Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Xbox is now XBOX

Vector illustration of the Xbox logo.

Xbox just allcapsmaxxed: Meet XBOX. This isn't a joke; Microsoft appears to be actually rebranding Xbox to XBOX. Asha Sharma, Xbox CEO, ran a poll on X earlier this week, asking fans whether Microsoft should use Xbox or XBOX. The results were in favor of XBOX, and the company has now renamed its X account.

Curiously, the Threads and Bluesky accounts for Xbox haven't been renamed yet, but if Microsoft is going ahead with a rebranding then I expect those will change soon. I asked Microsoft to comment on this potential Xbox rebranding and the company simply referred me to Sharma's post.

The use of all caps for Xbox is a return to original for …

Read the full story at The Verge.


Does Trump Mobile know how many stripes are on the American flag?

A still from Trump Mobile’s promotional video showing the T1 Phone surrounded by the accessories it ships with.
The T1 Phone has the wrong number of stripes, but it does at least have 50 stars. | Screenshot: Trump Mobile

Where's the Trump phone? We're going to keep talking about it every week. We've reached out, as usual, to ask about the Trump phone's whereabouts. This week, despite our best hopes, we still don't have our phone - but we do have some fresh doubts about the company's patriotic credentials.

This has been a momentous few days for Trump Mobile, in which it defied the haters by announcing that its phones will be shipping to buyers this very week. Not that there's any sign the company has actually done that, but I digress. Because what I really want to talk about today is the American flag.

I am not an American, which probably explains why I did …

Read the full story at The Verge.


Starbucks announces new round of corporate layoffs following cuts in Seattle tech roles

Starbucks’ Seattle headquarters. (GeekWire File Photo)

Starbucks announced Friday that it is laying off 300 additional corporate employees and closing several regional offices, after providing details earlier this week on the elimination of 61 tech roles in Seattle.

The cuts aim to “further sharpen focus, prioritize work, reduce complexity, and lower costs,” a spokesperson said by email. The company axed nearly 2,000 corporate roles last year, according to past reports.

Starbucks did not announce any new store closures, but will shutter offices in Atlanta, Burbank, Chicago and Dallas while maintaining its Seattle headquarters and offices in New York, Toronto and Coral Gables, Fla. The company is also opening a new office in Nashville.

The moves are part of the company’s “Back to Starbucks” strategy, launched by CEO Brian Niccol to bolster performance and refocus attention on its coffeehouses and customer service.

On a quarterly earnings call last month, Niccol highlighted several tech innovations aimed at improving coffeehouse efficiency and productivity:

  • Plans to install automated Mastrena machines that can pull four espresso shots in less than 30 seconds.
  • Improved use of its Smart Queue system, which uses algorithms to manage the flow of cafe, drive-thru, and mobile orders.
  • A digital system called the GROW Report that provides insights into coffeehouse performance.

Starbucks, which has 41,129 coffee shops worldwide, previously reported revenue growth of 8% compared to the same period last year.


Building a general-purpose accessibility agent—and what we learned in the process


It is an understatement to say agents have become a popular way of working with code. GitHub has adopted agent-based code creation and editing for many of its initiatives, including piloting an agent to help with our commitment to accessibility.

GitHub is currently piloting an experimental general-purpose accessibility agent to achieve two main goals:

  1. Providing engineers with reliable, just-in-time answers to accessibility questions in the GitHub Copilot CLI and the Copilot VS Code integration.
  2. Catching and automatically remediating simple, objective accessibility issues before they go to production.

To serve the second goal, the accessibility agent automatically evaluates changes that modify our front-end code.

To date, the agent has reviewed 3,535 pull requests, with a 68% resolution rate. In order of occurrence, the top five issue types center around:

  1. Making structure and relationships clear to assistive technologies
  2. Providing clear and concise names for interactive controls
  3. Ensuring users are aware of important announcements
  4. Ensuring there are text alternatives for non-text content
  5. Moving keyboard focus through pages and views in a logical order

Each of these issue types represents a barrier that, had it not been automatically removed, would have inhibited the use of GitHub for people who rely on assistive technology. Here’s a screenshot of it in action:

A GitHub Actions bot comment on a line of code in a Pull Request that suggests a fix to a content order accessibility issue. The comment reads, 'WCAG 1.3.2 Meaningful Sequence: The .header CSS class uses flex-direction: row-reverse, which causes the close button to appear first in the DOM (and screen reader reading order) but visually renders after the heading. This creates a mismatch between the programmatic reading sequence and the visual layout. A simpler approach is to swap the element order in the DOM and use regular flex-direction: row in the CSS, so the reading order matches what sighted users see:' Following that is a code suggestion that re-orders the heading and side panel toolbar, with the option to commit the suggestion to code. After that is a final comment that reads, 'This also requires updating .header in agent-task-content.module.css to change flex-direction: row-reverse → flex-direction: row.' Cropped screenshot.
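The fix the agent suggests here is a common pattern, and worth spelling out. A minimal sketch of the before and after (hypothetical component and markup, not GitHub's actual code):

```tsx
import React from "react";

// Before: the close button is first in the DOM so that
// flex-direction: row-reverse places it visually after the heading.
// Screen readers follow DOM order, so they announce "Close" before
// the heading; the reading order contradicts the visual order.
function HeaderBefore() {
  return (
    <div style={{ display: "flex", flexDirection: "row-reverse" }}>
      <button aria-label="Close">×</button>
      <h2>Agent task</h2>
    </div>
  );
}

// After: swap the DOM order and use plain row, so the programmatic
// reading sequence matches what sighted users see.
function HeaderAfter() {
  return (
    <div style={{ display: "flex", flexDirection: "row" }}>
      <h2>Agent task</h2>
      <button aria-label="Close">×</button>
    </div>
  );
}
```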

Interested? We’ll be outlining successes and lessons learned from this experiment, in the hope that it can help other teams on their accessibility journeys.

Mindset

The social model of disability teaches us that access barriers—and consequently impairment—can be created because of how an environment is built. The same thinking applies to digital experiences.

With the accessibility agent, we are not attempting to “solve” accessibility in isolation. We are instead attempting to augment our peers’ efforts, to better help them remove the barriers that may be created as a result of how we construct GitHub’s user interfaces.

The accessibility agent is not a “silver bullet” that can automatically address every hypothetical scenario. Understanding, honoring, and socializing this helped set the agent’s scope of responsibility, which sped up the experiment’s launch and led to more buy-in for the effort.

Past efforts

The European Accessibility Act is now in effect. Title II of the Americans with Disabilities Act is set to establish meeting WCAG 2.1 AA as the legal definition of done in April of 2027. LLM agents can read and take action on the accessibility tree.

To say it plainly: Organizations that have not already invested in manually identifying and remediating accessibility issues will be at a disadvantage. There are many reasons for this; one of them is that this groundwork is exactly what building an accessibility agent requires.

To that point, GitHub has a mature system in place for logging accessibility issues, as well as verifying fixes to issues are working as intended. This includes:

  • A structured template for reporting problems
  • Steps to reproduce the issue
  • A rich layer of metadata about the issue’s severity level, service area, and applicable WCAG success criterion
  • Crosslinks to the Pull Request that addressed the issue
  • Acceptance criteria

In addition, all the issues are centralized to a single repository. While this issue-logging effort predated the explosion in popularity of LLM tooling, its highly consistent and structured nature made it an ideal corpus of content for the accessibility agent to reference.

Because of this, we instructed the agent to investigate these issues to see if there are related code and language snippets it can extrapolate from. This is one area where the non-deterministic “fuzzy matching” behavior of LLMs acts as an asset rather than a possible liability.

Old gold

Much like with any other specialized domain area, vague instructions in a skill file won’t cut it. Telling an LLM to “use accessibility best practices” with a short list of examples won’t work well.

When generating code, LLMs have an unfortunate bias towards producing accessibility antipatterns since every major LLM currently available is trained on decades of inaccessible code.

To counteract this, the agent needs better content to draw from.

So, I enthusiastically recommend investing in manually cataloging and remediating accessibility issues. After some progress, this data can be incorporated into the agent.

The issues and their corresponding pull requests provide highly contextual examples for the LLM to reference, written using the conventions set up by the organization it is deployed in. This collection of issues and code is by far one of the strongest assets the agent draws from.

Efficient token consumption

Accessibility is a holistic concern, intersecting with code, design, copywriting, and numerous other disciplines involved with creating user interfaces.

A lot of accessibility work is also highly contextual, meaning that someone typically needs the full working picture of a problem before they’re able to give the appropriate advice for what to do.

Because of these two factors, a general-purpose accessibility agent can consume a ton of tokens when it performs work. This has three negative outcomes:

  1. An increased amount of unreliable output
  2. Slower response times
  3. Increased operational costs

It’s important to be diligent when structuring the agent. Here’s how we went about doing just that.

Use sub-agents

The accessibility agent started as a single monolithic agent, but quickly grew past the limitations of this approach. Because of this, we evolved it to use a sub-agent architecture.

A lot of guides recommend creating a large suite of sub-agents, each with its own specific area of responsibility. Here, the sub-agents are executed in parallel, with the main agent reconciling their output.

Surprisingly, this approach worked against us for the accessibility agent. Instead, we wound up using two dedicated sub-agents:

  1. The first sub-agent acts as a passive reviewer and researcher.
  2. The second sub-agent acts as an active implementer.

The two sub-agents are sandboxed and cannot directly pass content to each other. Instead, they generate a structured, templatized output. This output is then served to the parent orchestrating accessibility agent to consume, validate, and route.

A diagram demonstrating how the parent accessibility agent passes work sequentially from itself to a read-only reviewer sub-agent, then back to the parent agent, to a write and read-capable implementer sub-agent, then back again to the parent agent. The parent agent is contained in a column labeled, 'Tier 1 - Orchestration', and the two sub-agents are contained in a column labeled, 'Tier 2 - Specialists'. The first connecting line that shows the parent agent passing work off to the reviewer sub-agent is labeled, 'run sub-agent'. The second line that passes work back to the parent agent is labeled, 'structured findings'. The third line has the parent agent passing work to the implementer sub-agent, and is labeled 'Run sub-agent with structured findings'. The fourth and final line passes work from the implementer sub-agent back to the parent agent and is labeled, 'Changes or guidance generated'. The parent and sub-agents also have lists of responsibilities. The parent accessibility agent routes requests, locates code and skills, runs complexity scoring, validates outputs, manages escalation gates, manages re-audit loops, and answers research questions. The reviewer sub-agent performs code audits, conducts WCAG research, detects escalation triggers, and produces structured findings. The implementer sub-agent has two modes: a default code-change mode and a fallback guidance-only mode. The code-change mode fixes critical issues first, then addresses the rest. The guidance-only mode generates guidance docs. Both modes validate changes.

There are a few reasons for this approach:

  • Escalation checkpoints. The reviewer checks for areas where human intervention will likely be needed. This includes multiple high-severity WCAG failures, as well as a list of patterns that are known to be difficult to make accessible.
  • Complexity-based behavior. The agent is instructed to operate in a specialized guidance-only mode if the underlying code is deemed too complicated. Here, the parent accessibility agent acts as an arbiter, while the reviewer agent is “opinionless” and just reports the findings as instructed.
  • Filtering. The reviewer presents everything it finds. The parent accessibility agent then utilizes resources and skills to determine what is relevant to the request. The reviewer passing all its findings to the implementer would be costly and potentially set it on irrelevant and counter-productive tasks.
  • Traceability. Direct communication between sub-agents would remove the ability to create and review an audit trail of user and agent decisions. This is important given the agent’s instruction around complex patterns, as well as the highly contextual nature of accessibility work.
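Put together, the flow in the diagram above might look roughly like this in TypeScript. Every name, signature, and threshold here is an assumption for illustration; GitHub has not published the implementation:

```typescript
// Hypothetical sketch of the two-tier orchestration described above.

type Mode = "code-change" | "guidance-only";

interface ReviewerFindings {
  escalationNeeded: boolean;
  findings: string[];
}

// Stubbed sub-agents; the real ones are sandboxed LLM runs that
// exchange structured, templatized output via the parent agent.
async function runReviewerSubAgent(request: string): Promise<ReviewerFindings> {
  return { escalationNeeded: false, findings: [`reviewed: ${request}`] };
}

async function runImplementerSubAgent(
  findings: ReviewerFindings,
  mode: Mode
): Promise<string> {
  return `${mode}: handled ${findings.findings.length} finding(s)`;
}

// Illustrative stand-in for the complexity-scoring shell script
// described later in the post.
function scoreComplexity(files: string[]): number {
  return files.length; // placeholder heuristic
}

const COMPLEXITY_THRESHOLD = 5; // illustrative value, not GitHub's

async function orchestrate(request: string, changedFiles: string[]) {
  // The read-only reviewer always runs first; sub-agents never pass
  // content to each other directly.
  const findings = await runReviewerSubAgent(request);

  // Escalation gate: route to humans rather than generating code.
  if (findings.escalationNeeded) {
    return "Escalated: please consult the accessibility team.";
  }

  // Complexity gate: fall back to guidance-only mode for complex code.
  const mode: Mode =
    scoreComplexity(changedFiles) > COMPLEXITY_THRESHOLD
      ? "guidance-only"
      : "code-change";

  // The implementer receives only validated, structured findings.
  return runImplementerSubAgent(findings, mode);
}
```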

Execute instructions in a linear order

In addition to being a holistic concern, effective digital accessibility work also demands a methodical, detail-oriented approach.

The appeal of using sub-agents to increase the speed of the LLM’s reply is counterbalanced by our need for its results to be accurate. We found that compelling the agent to execute its sub-agent instructions in a fixed order was key.

We first establish a parent set of ordered phases. Each phase itself contains child ordered steps of instructions, which are accompanied by relevant resources and skills:

A diagram demonstrating how the research sub-agent uses ordered phases and ordered steps within each phase to produce structured output. The first phase is labeled, 'Phase 1 - Research', and contains 5 steps. The first step is labeled, 'WCAG SCs' and uses a skill called 'wcag-2.2-level-a-aa-success-criteria'. The second step is labeled, 'GitHub’s SC interpretation' and uses a skill called 'accessibility-check-wcag-sc-interpretation'. The third step is labeled, 'Assistive technology support' and uses a skill called 'accessibility-check-at-support'. The fourth step is labeled, 'Prior accessibility audits' and uses a skill called 'accessibility-search-prior-audits-general'. The fifth and final step for this phase is labeled, 'External W3C references' and is governed by a rule called 'Only if local searching is insufficient'. An arrow connects the first phase to the second phase, which is labeled, 'Phase 2 - Code audit'. The first step of phase 2 is labeled, 'Read source files on demand'. The second step is labeled, 'Incorporate user-provided URLs' and is governed by a rule that compels it to always fetch. The third step is labeled, 'Investigate provided URLs’ links' and is governed by a rule called 'search 1 level deep'. The fourth step is labeled, 'Run validation skills' and uses a resource called 'decision table'. The fifth step is labeled, 'Cross-reference findings' and uses a skill called 'use phase 1 research'. The sixth and final step of this phase is labeled, 'Re-review all content interacted with'. An arrow connects the second phase to the third phase, which is labeled, 'Phase 3 - Structured output'. The third phase contains a single step labeled, 'Findings report, output-schema-reviewer'. It has three subsections, 'Summary', 'Finding severity scoring', and 'Each finding includes'. The summary subsection contains an ordered list that reads, '1. total findings', '2. prior audits', '3. escalation needed', '4. escalation scope', and '5. Escalated findings'. Finding severity scoring has three levels, 'critical', 'warning', and 'info'. Each finding includes applicable WCAG SCs, applicable files and line numbers, current human-facing experience, expected human-facing experience, suggestion for remediation, and an escalation summary (if present).

The interesting bit about this linear order is that it mirrors how I would personally approach performing auditing, remediating, and reporting duties.
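One way to encode that ordering is as plain data the agent walks through sequentially. This is a sketch using the phase, step, and skill names from the diagram above; the instruction format GitHub actually uses is not shown in the post:

```typescript
// Hypothetical encoding of the reviewer's ordered phases and steps.

interface Step {
  label: string;
  uses?: string; // skill or resource named in the diagram
  rule?: string; // constraint governing the step
}

interface Phase {
  name: string;
  steps: Step[]; // executed strictly in order
}

const reviewerPhases: Phase[] = [
  {
    name: "Phase 1 - Research",
    steps: [
      { label: "WCAG SCs", uses: "wcag-2.2-level-a-aa-success-criteria" },
      { label: "GitHub's SC interpretation", uses: "accessibility-check-wcag-sc-interpretation" },
      { label: "Assistive technology support", uses: "accessibility-check-at-support" },
      { label: "Prior accessibility audits", uses: "accessibility-search-prior-audits-general" },
      { label: "External W3C references", rule: "Only if local searching is insufficient" },
    ],
  },
  {
    name: "Phase 2 - Code audit",
    steps: [
      { label: "Read source files on demand" },
      { label: "Incorporate user-provided URLs", rule: "Always fetch" },
      { label: "Investigate provided URLs' links", rule: "Search 1 level deep" },
      { label: "Run validation skills", uses: "decision table" },
      { label: "Cross-reference findings", uses: "use phase 1 research" },
      { label: "Re-review all content interacted with" },
    ],
  },
  {
    name: "Phase 3 - Structured output",
    steps: [{ label: "Findings report", uses: "output-schema-reviewer" }],
  },
];
```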

Use a template schema to pass around sub-agent content

The entire operation of the sandboxed sub-agents is built around template schema files. These files create consistency that is vital to keeping the agent focused and on track.

The two schema templates are:

  1. Reviewer template schema: This focuses on what to audit, and how to find applicable information about it.
  2. Implementer template schema: This focuses on what to fix and how to fix it.
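The template files themselves aren’t published in the post, but the phase diagram earlier names the fields the reviewer’s report carries. A minimal sketch of that shape, assuming TypeScript-style types:

```typescript
// Hypothetical shape of the reviewer's structured findings, assembled
// from the fields named in the post's diagrams. Illustrative only.

type Severity = "critical" | "warning" | "info";

interface Finding {
  wcagSuccessCriteria: string[];            // e.g. "1.3.2 Meaningful Sequence"
  locations: { file: string; lines: number[] }[];
  currentExperience: string;                // what AT users encounter today
  expectedExperience: string;               // what they should encounter
  remediationSuggestion: string;
  severity: Severity;
  escalationSummary?: string;               // present only when escalated
}

interface FindingsReport {
  summary: {
    totalFindings: number;
    priorAudits: number;
    escalationNeeded: boolean;
    escalationScope?: string;
    escalatedFindings: number;
  };
  findings: Finding[];
}
```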

Without the schema files in place, the agents would all attempt to communicate with each other arbitrarily. This would increase token expenditure, invite hallucinations, produce unnecessary code changes, and make auditing agent behavior difficult, if not impossible.

Acknowledging limitations

Another vital aspect of creating the accessibility agent is understanding areas where agents can fall short.

As the agent is not a turnkey “solution” for accessibility, we want to avoid situations where erroneous agent output is not sufficiently interrogated by the human using it. This is especially relevant when someone is not well-versed in digital accessibility considerations and practices.

Here’s what we did to accommodate the agent’s limitations:

Evaluate code complexity

We want to avoid scenarios where we would need to perform costly and time-intensive work to revisit an inaccessible solution that the agent “thinks” is accessible.

To aid with this problem, the accessibility agent uses a small shell script to analyze the code it is set to work on. The script itself is simple, using a small set of basic heuristics to evaluate the relative complexity and distill it down into a score.

This score is then ingested by the agent. If the score exceeds a set threshold, the agent is instructed not to execute code changes. Instead, it informs the person using the LLM that they should consult the accessibility team about what they are attempting to do.
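The post describes the scorer only as a small shell script with basic heuristics. As a sketch of the idea, rendered in TypeScript rather than shell (the specific heuristics, weights, and threshold below are assumptions, not GitHub’s):

```typescript
// Hypothetical complexity heuristics; the real agent uses a small
// shell script whose exact checks and weights are not published.
import { readFileSync } from "node:fs";

function complexityScore(files: string[]): number {
  let score = 0;
  for (const path of files) {
    const source = readFileSync(path, "utf8");
    const lines = source.split("\n");
    score += Math.floor(lines.length / 100); // sheer size
    score += (source.match(/useEffect|setTimeout/g) ?? []).length * 2; // state/async churn
    score += (source.match(/dangerouslySetInnerHTML/g) ?? []).length * 5;
    const maxIndent = Math.max(
      0,
      ...lines.map((l) => l.match(/^\s*/)![0].length)
    );
    score += Math.floor(maxIndent / 8); // deep nesting
  }
  return score;
}

// If the score exceeds a set threshold, the agent switches to
// guidance-only mode instead of changing code.
const TOO_COMPLEX = 25; // illustrative threshold
export const isTooComplex = (files: string[]) =>
  complexityScore(files) > TOO_COMPLEX;
```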

Identify high-risk patterns

It is a subtle point, but it is entirely possible for code to pass automated accessibility checks yet be functionally unusable.

As a companion to code complexity, the accessibility agent is instructed to avoid attempting code generation for patterns the accessibility team has identified as high-risk. This includes, but is not limited to: drag and drop, toasts, rich text editors, tree views, and data grids.

These patterns require a ton of focused attention and detail, and they sit outside an LLM’s current capabilities to produce in a way that actually works with assistive technology.

Not prohibiting high-risk patterns and high-complexity code environments would create unnecessary demands on everyone’s time to readdress the work, and it also represents a reputational risk for the accessibility team. We avoid this by shutting off the LLM’s ability to go down this pathway.
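The gate for these patterns can be as blunt as a denylist check. A minimal sketch, using the pattern names from this post (the matching logic itself is an assumption):

```typescript
// Hypothetical high-risk pattern gate, using the examples named above.
const HIGH_RISK_PATTERNS = [
  "drag and drop",
  "toast",
  "rich text editor",
  "tree view",
  "data grid",
];

// The agent is told to refuse code generation and route to humans
// when the request or diff touches one of these patterns.
export function isHighRisk(requestDescription: string): boolean {
  const text = requestDescription.toLowerCase();
  return HIGH_RISK_PATTERNS.some((pattern) => text.includes(pattern));
}
```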

Reduce bias to action

I am loath to anthropomorphize LLMs, but one quality they all seem to share is desperately wanting to produce content. For Copilot, that often means generating code.

We had to create anti-gaming instructions to prevent the LLM from finding sneaky ways around its instruction not to generate code when human expertise is needed, effectively stopping it from violating its own intervention rules.

Know that programmatically determinable issues don’t cover everything

Agent success metrics live within a larger context.

Of the 55 total WCAG level A and AA success criteria, only 35 can be detected via deterministic automated code checkers. This means that ~36% (20 of 55) of level A and AA success criteria cannot be discovered automatically.

A pie chart titled, 'WCAG A and AA Success Criterion'. The first of two slices is labeled, '36% require manual evaluation'. The second of two slices is labeled, '64% can be detected automatically'.

LLM-powered agent operation is making inroads on this ~36% gap, but it is not a perfect science. Because of this, it becomes important to manually identify accessibility barriers earlier, during design and prototyping efforts, the area where the majority of accessibility issues originate.

This thinking is also reflected in the agent’s escalation logic, in that members of the Accessibility team can pair with designers to help consider alternate approaches and brainstorm solutions that achieve business goals without compromising on accessibility.

This intervention and assistance ensures that potential downstream issues, along with costly and time-consuming redesigns, are stopped before they ever have a chance to get off the ground.

Manually evaluate agent output and adjust things that aren’t working as expected

We periodically perform manual review of agent output to determine its accuracy and efficacy. In addition, we have tooling in place to capture pull request reviewer sentiment. Both serve as strong signals for areas where the agent needs better instruction, as well as new resources and skills.

Learning in the open

To recap, we learned that the agent is:

  • Used to aid and augment existing accessibility efforts, not to replace them.
  • Significantly more effective when trained on manually audited and remediated accessibility issues for your specific experience.
  • Far more efficient with token consumption when utilizing sub-agents.
  • More accurate and effective when executing instructions in a methodical, linear fashion.
  • More consistent when set to use preformatted templates to pass information around.
  • Set to understand its limitations and route people to alternative support systems.
  • Improved when its output is periodically reviewed to identify areas it needs better instruction.

This journey is also not yet complete. The accessibility agent continues to be iterated upon in the hopes of helping ensure GitHub is an accessible and inclusive platform for all developers.

We hope that we can eventually open source the agent as part of our pledge to help improve the accessibility of open source software at scale. Until then, we hope that by sharing our learnings from this undertaking, other teams will have a resource to reference for their own accessibility efforts.

The post Building a general-purpose accessibility agent—and what we learned in the process appeared first on The GitHub Blog.


Raising the bar: Quality, shared responsibility, and the future of GitHub’s bug bounty program


The security research community is one of GitHub’s greatest assets. Every year, researchers from around the world help us find and fix vulnerabilities, making the platform safer for over 180 million developers. Our bug bounty program exists because we believe that collaboration with external researchers is one of the most effective ways to improve security, and we remain deeply committed to it.

But like every bug bounty program, we’re adapting to a changing landscape. We want to share what we’re seeing, what we’re doing about it, and how we think about the security boundaries of a platform like GitHub.

The volume problem

Over the past year, submission volume across the industry has grown significantly. New tools, including AI, have lowered the barrier to entry for security research, which in many ways is a positive development. More people exploring attack surfaces means more opportunities to find real issues.

However, alongside the growth in legitimate reports, we’ve seen a sharp increase in submissions that don’t demonstrate real security impact. These include reports without a proof of concept, theoretical attack scenarios that don’t hold up under scrutiny, and findings that are already covered by our published ineligible list. This isn’t unique to GitHub. Programs across the industry are grappling with the same challenge, and some have shut down entirely.

We don’t want to go that direction. Instead, we want to invest in making our program better.

What makes a strong submission

We’re raising the bar on what we consider a complete submission. Going forward, reports will be evaluated more strictly against these criteria:

  • A working proof of concept with demonstrated security impact. Show us the impact, don’t just describe it. What could an attacker actually achieve? We need a working proof of concept that demonstrates real exploitation and concrete security impact. Show us the boundary that can be crossed, not just that one theoretically exists. If your report says “this could lead to…” but doesn’t show that it does, it’s incomplete.
  • Awareness of scope and ineligible findings. Before submitting, review our scope and ineligible findings list. Reports covering known ineligible categories (DMARC/SPF/DKIM configuration, user enumeration, missing security headers without a demonstrated attack path, and others) will be closed as Not Applicable, which may impact your HackerOne Signal and reputation.
  • Validation before submission. No matter what tools you use (scanners, static analysis, AI assistants), you need to validate the output before submitting. A false positive that’s been manually reviewed is caught before it wastes anyone’s time. One that hasn’t is just noise.

We welcome AI in security research

We want to be explicit about this: we have no problem with researchers using AI tools. AI is a force multiplier, and we expect it to play an increasing role in security research. We use AI across our own internal security programs, and we’re seeing the best external researchers do the same. We welcome it.

What we need is the same standard we’ve always expected: validation. An AI-assisted finding that’s been verified, reproduced, and submitted with a working proof of concept is a great submission. An unvalidated output submitted as-is without reproduction or demonstrated impact is not. This isn’t a new standard. It’s the same standard we apply to scanner output, static analysis, or any other tool. The human researcher is accountable for the accuracy of the submission.

We’d also ask researchers to keep reports concise and structured. A strong report has three things: a short summary of the issue, clear steps to reproduce with supporting evidence (screenshots, HTTP requests, terminal output), and an impact statement explaining what an attacker can actually achieve. That’s it. Verbose reports such as multi-page theoretical narratives, restated background context, or AI-generated filler slow down triage because the actual finding gets buried. The clearer and more direct your report, the faster we can act on it.
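For illustration, a skeleton along these lines (the specifics are hypothetical) gives triage everything it needs:

  Summary: one or two sentences naming the vulnerability class and the affected feature.
  Steps to reproduce:
    1. The exact request or action taken, with the HTTP request/response attached.
    2. The observed result, with a screenshot or terminal output as evidence.
  Impact: what an attacker can actually achieve, as demonstrated by the proof of concept above.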

The tools don’t matter. The quality of the work does.

Understanding GitHub’s security model: Shared responsibility

One pattern we see frequently deserves its own discussion. Many reports describe scenarios where a user interacts with attacker-controlled content (a malicious repository, a crafted issue, untrusted code) and experiences an undesirable outcome. These reports are often well-written and technically accurate in their observations, but they misunderstand where the security boundary lies.

We invest heavily in systems and teams dedicated to detecting and handling malicious content across the platform, from automated scanning to manual review. That said, GitHub operates on a shared responsibility model. Users are responsible for:

  • Choosing which repositories, issues, and code they trust. GitHub hosts over 600 million repositories. Not all of them are benign. Users are expected to exercise judgment about what they interact with.
  • Reviewing content before executing or interacting with it. This applies to code, scripts, workflows, and any other executable content.
  • Understanding that cloning a repository means choosing to trust that code. Git hooks, build scripts, and other repository-level automation execute because the user chose to check out that repository.
  • Configuring their own environment securely. Token management, credential storage, and local security settings are the user’s responsibility.

When an “attack” requires the victim to actively seek out and engage with attacker-controlled content (cloning a malicious repo, asking an AI tool to analyze untrusted code, opening a crafted file), the security boundary is the user’s decision to trust that content. These scenarios generally don’t represent a bypass of GitHub’s security controls.

Common examples

To help researchers calibrate, here are patterns we see regularly that fall under shared responsibility:

  • Prompt injection via content the user chose to feed to an AI tool: the user decided to trust that content.
  • Git hooks or filters executing code in a repo the user checked out: this is how Git works by design.
  • Malicious content in a repository the user cloned: cloning is an act of trust.
  • LLM producing unexpected output when processing untrusted input: the user chose to provide that input.

Research in these areas is still extremely valuable. If you think you’ve found a blind spot in our defenses (a way to bypass an actual security control that affects users without requiring them to actively trust malicious content), that’s exactly what we want to hear about. Those findings are some of the most impactful submissions we receive. And if you come across content that violates our Terms of Service, please report it.

What this means for researchers

If you’re already submitting quality research, thank you. Nothing changes for you except faster response times as we reduce queue noise.

If you’re newer to bug bounty, welcome! Take a few minutes to read our scope, review the ineligible list, and invest in a working proof of concept before submitting. Quality submissions from new researchers are always valued and appreciated.

If you’ve been prioritizing volume, we’d encourage a shift toward depth. One well-researched, validated finding is worth more than 10 speculative ones, both in bounty payout and reputation. The researchers who earn the most from our program are the ones who go deep.

Changes to how we reward low-risk findings

Not every valid submission represents a meaningful security risk. Some reports identify hardening opportunities or documentation gaps that, while not exploitable, still lead to improvements we choose to make. We appreciate that work.

Going forward, we’re updating how we handle these cases. Submissions that don’t demonstrate significant security impact but do result in a code or documentation fix will be recognized with GitHub swag rather than a bounty payout. This lets us acknowledge the contribution while focusing our bounty resources on the findings that have the greatest impact on platform security.

We’d rather see researchers invest their time in deeper, high-impact research and be compensated accordingly than optimize for volume on low-risk findings.

Looking ahead

We’re committed to making GitHub’s bug bounty program one of the best in the industry, for researchers and for the security of the platform. That means faster triage, clearer communication, and ensuring that valid findings get the attention and compensation they deserve. Raising quality standards is part of that investment.

Security researchers make GitHub safer for every developer who depends on it. That work matters, and we don’t take it for granted.

Happy hacking! 🚀

The post Raising the bar: Quality, shared responsibility, and the future of GitHub’s bug bounty program appeared first on The GitHub Blog.


How Developers Should Build AI Tools – So The EU Doesn’t Lose IT


The August 2026 deadline for the EU AI Act is getting close, and companies and developers building AI products are starting to feel it.

High-risk AI systems need to be compliant by then, and the ones doing it well aren’t treating it as a last-minute legal scramble. They’re building compliance in from the start.

We sat down with Ervin Jagatic (AI Business Unit Director, Infobip) to talk about what that actually looks like at Infobip, and why compliance-by-design is turning into something engineers think about, not just lawyers.

Compliance starts in the design phase

AI Act compliance doesn’t start at deployment. Ervin is clear on this: it has to enter during system architecture, before a single line of agent code is written:

Compliance enters during the design phase – system architecture, data flow planning. Every layer of our AI Agents product, from planning to memory to tool execution, needs to be designed with traceability and human oversight in mind. We can’t bolt that on after the orchestrator is already coordinating multiple sub-agents autonomously.

The AI Act is changing product development in 3 ways

That shift has already changed how Infobip’s teams design and ship AI-powered features. Ervin points to three major changes that came directly from the AI Act.

1. Transparency and auditability

Transparency is the first. Infobip’s AI Agents documentation is explicit: “you cannot script exact responses” – agents “generate responses dynamically.”

That unpredictability is exactly why the company expanded its logging and analytics infrastructure, Ervin explains:

The AI Act’s transparency obligations pushed us to build comprehensive logging into our Insights and Analytics layer. Every agent execution now produces detailed logs – requests, responses, processing steps. That’s not just good engineering, it’s a direct response to auditability requirements.
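As a sketch of what one such execution record could capture (the field names below are assumptions; Infobip has not published its log schema):

```typescript
// Hypothetical audit-log entry for a single agent execution.
interface AgentExecutionLog {
  executionId: string;
  agentId: string;
  timestamp: string;         // ISO 8601
  request: string;           // what the agent was asked to do
  processingSteps: {         // each planning/tool/memory step
    step: string;
    tool?: string;
    startedAt: string;
    output: string;
  }[];
  response: string;          // what was returned to the user
  escalatedToHuman: boolean; // human-in-the-loop handoff marker
}
```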

2. Explicit guardrails instead of assumptions

The second shift relates to behavioral boundaries and guardrails. Infobip now requires customers to define capability boundaries, mandatory restrictions, and compliance rules directly inside every agent’s system prompt, Ervin points out:

Our own documentation warns that if you do not explicitly define these constraints, the agent makes assumptions. That design philosophy, forcing explicit guardrails rather than relying on implicit model behavior, comes directly from the Act’s emphasis on risk mitigation by design.
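In practice, that can mean spelling the constraints out verbatim in the system prompt rather than hoping the model infers them. A minimal sketch for a hypothetical support agent (the wording and structure are illustrative, not Infobip’s actual prompt format):

```typescript
// Hypothetical system-prompt fragment with explicit guardrails; the
// structure Infobip requires is not shown in the article.
const systemPrompt = `
You are a customer-support agent for <brand>.

Capability boundaries:
- You may answer questions about orders, shipping, and returns.
- You may look up order status via the provided order-lookup tool.

Mandatory restrictions:
- Never quote prices or discounts not returned by a tool.
- Never collect payment card numbers or authentication credentials.

Compliance rules:
- If the user asks about account security or identity, escalate to a
  human agent instead of answering.
`;
```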

3. Human oversight is a part of the architecture

The third shift is human oversight – not as an external policy layer, but built directly into the product architecture. Ervin explains:

AgentOS uses a human-in-the-loop model where complex issues are escalated from AI agents to human agents. We are talking about a core architectural decision that applies human oversight requirements while also improving the product.

Why compliance-by-design is becoming the standard

Ervin believes compliance-by-design is quickly becoming the new industry standard, particularly for teams building enterprise-grade AI systems:

For developers and ML engineers at Infobip, compliance-by-design means several practical things. It means every AI agent we build has a defined architecture where an orchestrator coordinates sub-agents, each with explicit scope, tools, and behavioral rules.

It also changes how engineering teams think about data. “It means our engineers think about data lineage and provenance from the moment they design a training pipeline, not because someone from legal asked them to, but because the architecture demands it,” Ervin points out.

To support that approach, Infobip invested heavily in tooling and analytics infrastructure that now serves both operational and regulatory purposes, Ervin said:

Our Insights and Analytics platform is our compliance infrastructure. When a regulator asks ‘show me how this AI system made this decision,’ we need to answer that question with structured evidence, not anecdotes.

Risk assessment depends on the use case

Internally, the company approaches risk assessment through a framework closely aligned with the AI Act’s four-tier classification model: unacceptable, high, limited, and minimal risk. However, Ervin notes that Infobip applies this framework at the feature level rather than only at the system level:

This is important because a platform like Infobip’s serves vastly different use cases. An AI gamification tool for lead generation on WhatsApp is a fundamentally different risk profile than an AI agent that handles authentication.

The company evaluates risk based on several factors, including the sensitivity of the data involved, the autonomy of the AI component, and the intended use case, Ervin explains:

Our internal process follows a lifecycle approach. During identification, we map known and foreseeable risks, including risks from reasonably foreseeable misuse. During estimation, we assess probability and severity. During mitigation, we implement design controls, testing procedures, and human oversight.

Monitoring continues after deployment through analytics infrastructure designed for drift detection, incident investigation, and performance tracking. For enterprise customers, risk assessment also becomes a collaborative process between Infobip and client compliance teams.

A bank using our AI agents to automate customer support has different risk considerations than a retail brand using the same technology for product recommendations. The platform is the same; the risk profile is not.

August 2026 is approaching…

As August 2026 closes in, Ervin says the conversation has shifted:

The question is no longer whether to integrate compliance into product development. The question is whether you’ve built the infrastructure to do it at speed.

The post How Developers Should Build AI Tools – So The EU Doesn’t Lose IT appeared first on ShiftMag.
