It's time: the Microsoft Partner of the Year Awards nomination window is now officially open until Tuesday, July 7, at 6:00 PM Pacific Time. Nominate your organization today for awards highlighting exceptional work across solution areas, industries, and regions.
The 2026 Partner of the Year Awards celebrate the many ways our partners are driving Frontier Transformation: turning AI ambition into high-impact business outcomes for customers.
Winning an award is a meaningful achievement that highlights the transformative influence partners like you have on customers around the world—and positions your organization for greater market recognition and new business opportunities. Winners receive benefits such as a customized logo and other go-to-market assets that signify award-winning status, as well as impactful press coverage.
To prepare your submissions, we recommend reviewing the following resources before completing your application:
Winners and finalists will be announced on November 11—just in time to celebrate together at Microsoft Ignite 2026. We look forward to reviewing another amazing set of nominations, so get started today.
Don’t miss this opportunity to celebrate your organization’s accomplishments on a global stage!
M.G. Siegler is the author of Spyglass.org. Siegler joins Big Technology to discuss whether Google is falling behind in AI as OpenAI and Anthropic push ahead with coding agents and super-app ambitions. Tune in to hear why AI agents may reshape the way people use the web, email, apps, and browsers, and why that could put Google in a difficult position. We also cover Apple’s upcoming WWDC, the rumored iPhone Fold, Meta’s messy subscription strategy, and Anthropic’s move toward an IPO. Hit play for a sharp, wide-ranging conversation on the biggest power shifts happening in tech right now.
Join Big Technology's AI Summit on June 18: summit.bigtechnology.com

Six months ago, the agentic coding tool was still an argument about form. By the start of June 2026, the argument is mostly over.
The four products that have come to define the category this year have spent the past several months quietly agreeing on what one of these things should be.
The clock starts in November. Google shipped Antigravity in public preview on November 18, 2025, the same day Gemini 3 arrived, and that release pushed the agent-first coding surface into the mainstream. Anthropic’s Claude Code, OpenAI’s Codex and Anysphere’s Cursor were already in the field.
Watching all four grow up over the same half-year tells you more than any single launch, because the interesting part came after the announcements. Think of it as the smartphone settling into a glass slab: Once everyone accepted the shape, the contest moved to the platform around it.
Claude Code stayed close to where it started. It lives in the terminal and leans on Anthropic’s long-context reasoning, compaction, and an approval-heavy flow, which makes it strong on large-codebase work where an agent has to hold a lot in its head before touching a line. Developers who want to read every change before it lands gravitate here. The friction is deliberate: on a serious codebase, the riskiest moment is the one just before a command runs or a file changes, and Claude Code puts a human at exactly that point.
Cursor went the other way and stayed model-agnostic. It runs inside a familiar VS Code surface and lets you point it at whichever frontier model you already pay for, so a team is not tied to one vendor’s release calendar. The deeper advantage is that it asks for no workflow migration. Developers add agency without leaving the files, tabs, diffs, and shortcuts they navigate by reflex, and the Composer agent now handles multi-file work without pulling them out of the editor.
Codex took the distribution route. Because Codex is packaged into ChatGPT plans for most users rather than carrying a price tag of its own, it reached scale faster than anything else in the category, even as heavier and business usage is now governed by Codex-specific limits and credits. OpenAI reported more than 3 million weekly developers in mid-April 2026 and more than 4 million by late May, with the real money coming from enterprise rollouts within ChatGPT Business and Enterprise.
Antigravity has traveled farthest from where it began. It launched as an AI-native IDE built on a fork of VS Code, then relaunched at Google I/O on May 19, 2026, as Antigravity 2.0: a five-surface platform spanning a standalone desktop app, a CLI, an SDK, a Managed Agents API inside the Gemini API, and an enterprise layer for Google Cloud customers.
The rebuild was not gentle. It removed the original IDE as the default and broke setups overnight, following an earlier round of anger in March 2026, when Google shifted to a credit-pack model and tightened quotas. Read against Google’s other moves, the real bet is a route from a local coding agent to a managed agent runtime on Google Cloud, with the same harness running in the desktop client, the CLI, the Gemini API, and the enterprise platform.
One name is deliberately missing from those four. GitHub Copilot shaped the whole category, and its coding agent now plans work, edits a branch and opens a pull request with enterprise controls attached. I kept the focus on the products that drove the agent-first conversation this year, but Copilot earns watching because GitHub already owns the place where issues, pull requests, reviews and Actions live, a home-field edge as agent-written work flows to where it gets merged.
Line the four up today, and the resemblances are hard to miss. They are converging on the same pattern: a terminal or command-line surface, explicit planning before execution, approval gates, access to external tools through the Model Context Protocol, and some form of delegated or parallel agent work. Four labs with very different cultures arrived at almost the same blueprint inside six months, which usually signals the design was less a choice than a discovery.
Ask any of them to fix a failing integration test across three files and the flow looks much the same: the agent reads the repo, proposes a plan, waits for approval, edits, runs the test, and reports back while you watch the diffs stream past. That sameness has quietly changed what one of these tools is. A coding agent now reads issues, edits branches, runs tests, calls tools, and opens pull requests, behaving less like autocomplete than like a junior teammate with commit access.
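The shared flow is easy to picture as a loop with a gate in it. What follows is a minimal Python sketch, not any vendor's implementation: `Step`, `Report`, and `run_task` are hypothetical names, the plan is assumed to have already been proposed by the model, and the approver and test runner are stand-in callables. Real agents also retry after a failing test, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    description: str           # what the agent proposes, shown to the human
    apply: Callable[[], None]  # performs the edit or command once approved


@dataclass
class Report:
    applied: list = field(default_factory=list)
    skipped: list = field(default_factory=list)
    tests_passed: bool = False


def run_task(plan, approve, run_tests):
    """Approval-gated loop: every proposed step waits on a human before it runs."""
    report = Report()
    for step in plan:
        if approve(step):  # the gate sits just before anything changes
            step.apply()
            report.applied.append(step.description)
        else:
            report.skipped.append(step.description)
    report.tests_passed = run_tests()  # verify only after the approved edits land
    return report


# Usage with stand-in callables in place of a real model and repo.
edits = []
plan = [
    Step("fix fixture in conftest.py", lambda: edits.append("conftest")),
    Step("delete the flaky test", lambda: edits.append("delete")),
]
report = run_task(
    plan,
    approve=lambda s: not s.description.startswith("delete"),  # reviewer rejects deletions
    run_tests=lambda: "conftest" in edits,
)
```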
The connector everyone points to is MCP, but the quieter standard forming inside the repository may matter more. The AGENTS.md convention turns the repo itself into the agent’s onboarding guide, spelling out how to run tests, what style to follow, and what not to touch, and Codex, Cursor, Copilot, and Windsurf all read it natively.
OpenAI started it; Google, Cursor, and Sourcegraph joined; and since December 2025, it has sat under the Agentic AI Foundation at the Linux Foundation alongside MCP. Convergence here is not total, since Claude Code still reads its own CLAUDE.md, but the direction points to a single instruction file that spans tools and makes an agent’s behavior portable.
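How an agent picks up the right instruction file is worth making concrete, especially in a monorepo with several of them. The sketch below is a hypothetical resolver in Python. It assumes the nearest file up the directory tree wins (the behavior AGENTS.md guidance describes for nested files), and it treats CLAUDE.md as a per-directory fallback purely to illustrate the split between conventions. `nearest_instructions` and `INSTRUCTION_FILES` are illustrative names, not any tool's API.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumed lookup order within each directory; not any tool's documented behavior.
INSTRUCTION_FILES = ("AGENTS.md", "CLAUDE.md")


def nearest_instructions(edited_file: Path, repo_root: Path) -> Optional[Path]:
    """Walk up from the edited file and return the closest instruction file, if any."""
    root = repo_root.resolve()
    directory = edited_file.resolve().parent
    while True:
        for name in INSTRUCTION_FILES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
        if directory == root or directory == directory.parent:
            return None  # hit the repo root (or filesystem root) with no match
        directory = directory.parent


# Usage: a throwaway repo with a root file and two package-level files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "pkg" / "api").mkdir(parents=True)
    (root / "pkg" / "web").mkdir(parents=True)
    (root / "AGENTS.md").write_text("Run tests with: make test\n")
    (root / "pkg" / "api" / "AGENTS.md").write_text("API-specific rules\n")
    (root / "pkg" / "web" / "CLAUDE.md").write_text("Web rules\n")
    hits = {
        "api": nearest_instructions(root / "pkg" / "api" / "handler.py", root),
        "web": nearest_instructions(root / "pkg" / "web" / "app.ts", root),
        "docs": nearest_instructions(root / "docs" / "guide.md", root),
    }
    results = {k: v.relative_to(root).as_posix() for k, v in hits.items()}
```

Here pkg/web resolves to its own CLAUDE.md even though a root AGENTS.md exists. Whether a real tool would merge both files is a separate question this sketch does not try to answer.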
What this convergence quietly did was demote the model. For most of 2025, the pitch was about whose model wrote better code. On SWE-bench Verified, the leading scores now sit within a narrow band of each other as of mid-May 2026, and Cursor will happily run any of them.
When the engine stops separating products, the difference moves to everything around it: the harness, the workflow, the approval model, and the distribution channel. I’d argue that is the most important shift of the last six months, and it is the reason a team’s choice now turns on fit rather than on which leaderboard a model topped last week.
Benchmarks still measure whether an agent can solve an isolated task, but in real repositories the hard part is landing a change that survives local conventions, CI, and a human reviewer, so teams are starting to route work by type rather than swear loyalty to one tool.
Lock-in builds in that same layer. A team that wires its review habits, skills, hooks, and subagent patterns around one tool does not switch lightly, and Antigravity’s painful CLI migration showed how much friction there is once a workflow is in place.
Pricing is where the four stop rhyming. The first thing to grasp is that an agent bills less like a seat than like a compute job: it reads large repos, spins up sandboxes, runs tests, and loops through retries before it lands a mergeable change. The number worth comparing is the cost per accepted change rather than the monthly sticker price, since cheap at the door is rarely cheap at scale once a team runs agents all day.
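The cost-per-accepted-change arithmetic is simple enough to sketch. The figures below are invented for illustration and are not any vendor's pricing. `cost_per_accepted_change` is a hypothetical helper that divides a period's total spend (seat plus metered usage) by the number of changes that actually merged.

```python
def cost_per_accepted_change(seat_cost: float, usage_cost: float, accepted: int) -> float:
    """Total spend for the period divided by the changes that actually merged."""
    if accepted <= 0:
        raise ValueError("no accepted changes, so cost per change is undefined")
    return (seat_cost + usage_cost) / accepted


# Invented monthly figures for two unnamed tools.
tools = {
    "cheap_seat": {"seat_cost": 20.0, "usage_cost": 180.0, "accepted": 40},
    "pricey_seat": {"seat_cost": 100.0, "usage_cost": 60.0, "accepted": 80},
}
costs = {name: cost_per_accepted_change(**t) for name, t in tools.items()}
```

With these made-up numbers, the $20 seat works out to $5 per merged change and the $100 seat to $2, which is exactly the inversion a sticker-price comparison hides.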
Codex is the outlier because it has no line item of its own and rides on top of ChatGPT plans, which drove its rapid growth, though heavier work is metered through Codex-specific credits. Cursor Pro and Claude Code’s entry tier both sit around the $20 mark as of June 2026, with usage-based costs layered on top, while Anthropic’s Max plans run well above that for power users.
Antigravity still carries preview-style access, but Google’s quota and plan changes, including a new $100-per-month AI Ultra tier announced around I/O, already show how quickly free access becomes unstable once agent workloads get expensive.
| Tool | Center of gravity | Where it tends to pull ahead |
|---|---|---|
| Claude Code | Terminal-native, approval-first | Deep reasoning and large-codebase work, for teams that want to read every diff |
| Cursor | Model-agnostic IDE | Editor-bound teams that want to choose their own model and avoid vendor lock-in |
| Codex | Bundled into ChatGPT | Fast reach and enterprise rollout, helped by having no separate price tag |
| Antigravity | Multi-surface platform | Google Cloud and Android shops wanting managed agents, with preview risk attached |
No team should read that table as a verdict. Most shops I talk to run two of these side by side, one in the terminal for serious refactors and one in the editor for everyday edits. The trap is that all four look almost identical in a demo, and the differences that bite show up later, in where the code runs, what the agent may touch, and what it costs over a week of real work. That layer is worth poking at before committing, far more than the SWE-bench number on the launch slide.
Any framing of Grok Build as something to watch for in the coming weeks is already out of date, because xAI has moved. Grok Build arrived in early beta in mid-May 2026 for the highest SuperGrok tier, and on May 25 xAI published its Grok Build announcement, opening access to all SuperGrok and X Premium Plus subscribers.
The tool is a terminal-native CLI backed by the grok-build-0.1A model, which xAI says it trained specifically for agentic coding, with a reported score of around 70.8 percent on SWE-bench Verified according to early third-party writeups.
Two design choices stand out. Grok Build runs up to eight subagents in parallel, each isolated in its own Git worktree, the boldest architecture bet anyone in the category has made. xAI also calls it local-first, with source code and credentials staying on the machine rather than going to xAI’s servers during a session, which appeals to teams in regulated work, though its compliance paperwork is still thinner than the marketing.
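The worktree-per-subagent idea can be illustrated with plain Git. This is a hypothetical Python sketch, not xAI's code: `worktree_commands` builds one `git worktree add -b <branch> <path> <commit-ish>` call per task so each parallel agent gets its own checkout and branch, and the cap of eight mirrors the figure reported for Grok Build. The `agent/` branch prefix and the `../agent-worktrees` location are assumptions.

```python
import subprocess
from pathlib import Path

MAX_SUBAGENTS = 8  # mirrors the "up to eight" figure reported for Grok Build


def worktree_commands(task_ids, base="HEAD", root=Path("../agent-worktrees")):
    """One `git worktree add` per task: separate checkout, separate branch."""
    if len(task_ids) > MAX_SUBAGENTS:
        raise ValueError(f"at most {MAX_SUBAGENTS} parallel subagents")
    commands = []
    for task in task_ids:
        branch = f"agent/{task}"   # hypothetical naming scheme
        path = root / task         # each subagent edits only inside this directory
        commands.append(["git", "worktree", "add", "-b", branch, str(path), base])
    return commands


def create_worktrees(task_ids, **kwargs):
    """Run the commands inside an existing Git repository."""
    for cmd in worktree_commands(task_ids, **kwargs):
        subprocess.run(cmd, check=True)
```

Because no two subagents share a working tree, their edits only meet at merge time. When a branch is merged or abandoned, `git worktree remove <path>` cleans up its checkout.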
Local execution is not local inference, so what actually matters is which repository context still gets sent to the model. The piece still missing is Arena Mode, which would generate several candidate outputs and let you pick the best; it has appeared in code traces but is not yet live in the beta.
With the launch behind it, the real test over the coming weeks is retention: whether Grok Build keeps developers in the terminal past the first week, whether Arena Mode ships and narrows the benchmark gap in practice, and whether the aggressive pricing pulls paying testers off the incumbents.
Six months of convergence has settled the shape of the agentic coding tool and turned the next phase into a contest over the harness, the price, and the habits a team builds around one product. A fifth terminal agent has now entered that contest with a large captive base inside X Premium Plus and an owner willing to spend, reason enough to watch how the incumbents answer.
The post Claude Code vs. Cursor vs. Codex vs. Antigravity — six months in appeared first on The New Stack.
If you’re already familiar with sandboxing as an isolation technique, sandbox security is the next layer: the policies, controls, and enforcement mechanisms that make sure those isolation boundaries actually hold under real-world pressure.
According to our State of Agentic AI report, 40% of respondents cite security as the top challenge in scaling agentic AI, and 43% point to increased security exposure from orchestration sprawl. As agents execute code, call APIs, and interact with live infrastructure, a sandbox without strong enforcement is a locked room with an open window.
This piece goes deeper into what sandbox security looks like day to day. We’ll cover how to choose the right implementation model and why this layer of security matters now more than ever as AI agents start executing code in your infrastructure.
Key takeaways
- Sandbox security is the practice of enforcing isolation boundaries and access controls around sandboxed environments to prevent threats from escaping containment.
- Effective sandbox security combines multiple layers: process isolation, network segmentation, resource limits, and runtime monitoring.
- As AI agents increasingly execute arbitrary code in production, sandbox security has become critical infrastructure for safe deployment.
Sandbox security is the set of controls and enforcement mechanisms that prevent untrusted or risky processes from breaching their isolation boundaries. Where sandboxing creates the boundary, sandbox security ensures it holds.
As we mentioned before, a sandbox without strong security controls is like a locked room with an open window. The isolation exists in theory, but the enforcement gaps leave room for escape.
For developers and platform engineers, this translates into concrete, daily decisions: which system calls an agent is allowed to make, whether a process can reach the network, how much memory or CPU it can consume, and what happens when it tries to exceed those limits. These are not abstract policy questions. They’re flags you set, profiles you configure, and defaults you either audit or accept on faith.
Sandbox security is not a single control. It’s a combination of mechanisms that work together to keep isolation boundaries intact. The most effective implementations layer several of these components so that a failure in one area does not compromise the entire sandbox.
Process isolation ensures that code running inside a sandbox has no visibility into processes on the host or in other sandboxes. On Linux, kernel namespaces handle this by partitioning process IDs, network interfaces, file systems, and user IDs into separate scopes. A process inside a namespace sees only what you’ve explicitly made available to it.
When things go wrong. Run a container with --pid=host and you’ve just given that workload a window into every process on the machine. It can enumerate services, identify targets, and attempt to interfere with them. That single flag turns your sandbox into a shared apartment.
Proper sandbox security eliminates this by enforcing strict namespace boundaries by default and flagging configurations that weaken them.
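A pre-flight check for namespace-weakening flags can be as small as the sketch below. The flags it looks for (--pid, --network and its older alias --net, --ipc, --uts, and --userns, each set to host) are real docker run options. The `weakened_namespaces` linter itself is hypothetical and deliberately naive: it scans every argument, including any that belong to the container's own command after the image name.

```python
HOST_NAMESPACE_FLAGS = {"--pid", "--network", "--net", "--ipc", "--uts", "--userns"}


def weakened_namespaces(argv: list) -> list:
    """Report every flag that shares a namespace with the host."""
    findings = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if "=" in arg:                      # --pid=host form
            flag, value = arg.split("=", 1)
        elif arg in HOST_NAMESPACE_FLAGS and i + 1 < len(argv):
            flag, value = arg, argv[i + 1]  # --pid host form
            i += 1                          # consume the separate value token
        else:
            flag, value = arg, None
        if flag in HOST_NAMESPACE_FLAGS and value == "host":
            findings.append(f"{flag}=host")
        i += 1
    return findings
```

Wired into CI or an admission step, a non-empty result can block the launch instead of just logging it.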
Even within a namespace, processes interact with the host kernel through system calls. System call filtering (commonly implemented through seccomp profiles on Linux) restricts which kernel functions a sandboxed process can invoke. Docker’s default seccomp profile blocks around 44 of the 300+ available Linux system calls. That’s a meaningful reduction in attack surface, but it’s a general-purpose default, not a tailored fit.
What to look for. High-security workloads benefit from custom seccomp profiles scoped to the specific application. A sandboxed process that needs to read files and make HTTP requests has no reason to call mount, init_module, or reboot. The tighter the profile, the fewer options an attacker has if they gain code execution inside the sandbox. It’s the same least-privilege thinking that underpins container security more broadly.
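Here's a minimal Python sketch that writes a deny-by-default seccomp profile in the JSON format Docker accepts through `docker run --security-opt seccomp=<file>`. The syscall list is an illustrative assumption for a process that reads files and makes HTTP requests, and it is intentionally incomplete: a real container runtime needs more calls than this just to start, so trace the actual workload before adopting anything like it.

```python
import json


def allowlist_profile(allowed: list) -> dict:
    """Deny-by-default profile: only the listed syscalls are permitted."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # anything not listed fails with an error
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [{"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"}],
    }


# Illustrative and incomplete: derive the real list by tracing the workload.
FILE_AND_HTTP = [
    "read", "write", "openat", "close", "fstat", "lseek", "mmap", "munmap",
    "brk", "futex", "socket", "connect", "sendto", "recvfrom", "poll",
    "epoll_wait", "exit_group",
]

profile = allowlist_profile(FILE_AND_HTTP)
print(json.dumps(profile, indent=2))  # save this output as the profile file
```

Because the default action is SCMP_ACT_ERRNO, mount, init_module, and reboot are refused simply by never appearing in the allowlist.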
A sandbox that can communicate freely with external systems or internal services is harder to defend. Network segmentation restricts what a sandboxed process can reach, limiting both inbound and outbound connections. That’s especially important for workloads that process untrusted input or execute arbitrary code.
How this applies to agents. AI agents that invoke external tools or APIs during execution present a unique challenge. Without network controls, a compromised agent could exfiltrate data to an external endpoint or pivot to internal services it was never intended to reach. Enforcing egress policies at the sandbox environment level ensures agents can only communicate with pre-approved destinations.
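An egress allowlist boils down to a check like the one below. This Python sketch is illustrative only: the hosts and the internal zone are placeholders, and in a real deployment the rule belongs at the network layer (an egress proxy, firewall, or network policy) so a compromised agent cannot simply skip it. `egress_allowed` is a hypothetical helper.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.github.com", "pypi.org"}  # exact hosts, illustrative
ALLOWED_SUFFIXES = (".internal.example.com",)   # placeholder internal zone


def egress_allowed(url: str) -> bool:
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False                 # no plaintext egress
    if parts.port not in (None, 443):
        return False                 # only the default HTTPS port
    host = parts.hostname            # lowercased, with any "user@" prefix removed
    if host is None:
        return False
    if host in ALLOWED_HOSTS:
        return True
    # The leading dot stops "evilinternal.example.com" from matching the zone.
    return any(host.endswith(suffix) for suffix in ALLOWED_SUFFIXES)
```

Checking the parsed `hostname` rather than matching on the raw string is what stops tricks like `https://api.github.com@evil.net/` or `api.github.com.evil.net` from slipping through.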
Resource exhaustion attacks do not require a sandbox escape, and that’s what makes them easy to overlook. A runaway process that consumes all available CPU or memory can take down every other workload on the same host without ever breaching an isolation boundary. Cgroups on Linux cap what each sandbox can consume, turning a potential host-wide outage into a single contained failure.
The tricky part is calibration. Set memory limits too low and legitimate workloads get OOM-killed. Set them too high and you’re back to sharing the blast radius. The most reliable approach is to monitor actual resource consumption over time, set limits based on observed peaks plus a margin, and treat the initial configuration as something you’ll tune rather than something you’ll get right on the first pass.
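The "observed peak plus a margin" rule translates into a couple of lines of arithmetic. In this Python sketch the 25 percent margin and the sample peaks are assumptions, and the helpers are hypothetical. The output uses Docker's real --memory and --memory-swap flags, where setting both to the same value leaves the container no swap beyond its memory limit.

```python
import math


def memory_limit_mib(observed_peaks_mib: list, margin: float = 0.25) -> int:
    """Highest observed peak plus a safety margin, rounded up to a whole MiB."""
    if not observed_peaks_mib:
        raise ValueError("need at least one observation")
    return math.ceil(max(observed_peaks_mib) * (1 + margin))


def docker_memory_flags(limit_mib: int) -> list:
    # Equal --memory and --memory-swap values mean no extra swap for the container.
    return [f"--memory={limit_mib}m", f"--memory-swap={limit_mib}m"]


limit = memory_limit_mib([300, 410, 380])
flags = docker_memory_flags(limit)
```

With peaks of 300, 410, and 380 MiB, the limit comes out to 513 MiB. Re-run the calculation as new observations arrive rather than treating the first result as final.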
Prevention is only part of the equation. You also need to know what’s happening inside the sandbox. Runtime monitoring tools observe system calls, file access patterns, network connections, and process behavior as they occur. When something deviates from the expected baseline, the system can alert operators or kill the process automatically. If you’re evaluating AI governance tools, you’ll find that many of these runtime observability capabilities overlap directly with agent monitoring requirements.
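Baseline-deviation logic is the heart of runtime monitoring, and a toy version fits in a few lines. In this hypothetical Python sketch, events arrive as a list of syscall names. Real monitors collect them in the kernel (for example via eBPF), and both the baseline and the kill-on set here are illustrative assumptions.

```python
from collections import Counter

BASELINE = {"read", "write", "openat", "close", "connect", "sendto", "recvfrom"}
KILL_ON = {"ptrace", "mount", "init_module"}  # deviations treated as hostile


def evaluate(events: list):
    """Compare observed syscalls with the baseline and pick a response."""
    deviations = Counter(e for e in events if e not in BASELINE)
    if any(e in KILL_ON for e in deviations):
        return "kill", deviations    # terminate the sandboxed process
    if deviations:
        return "alert", deviations   # unexpected but not known-hostile: notify operators
    return "ok", deviations
```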
Audit trails serve a different but equally important purpose. When an incident does happen, you need a forensic record of exactly what the sandboxed process did: which files it touched, which endpoints it called, which syscalls it made. That’s valuable for incident response and essential for compliance frameworks that require demonstrable evidence of isolation and access control.
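An audit trail is only useful if it can answer forensic questions later. This Python sketch records events as JSON lines, then asks which endpoints one sandbox contacted. The record fields, the `record` and `endpoints_called` helpers, and the in-memory buffer standing in for an append-only log are all assumptions for illustration.

```python
import io
import json
from datetime import datetime, timezone


def record(log, sandbox_id: str, kind: str, detail: str) -> None:
    """Append one audit event as a JSON line (kinds used here: file, network, syscall)."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "sandbox": sandbox_id,
        "kind": kind,
        "detail": detail,
    }
    log.write(json.dumps(entry) + "\n")


def endpoints_called(log_text: str, sandbox_id: str) -> list:
    """Forensic query: every network destination one sandbox contacted, in order."""
    entries = (json.loads(line) for line in log_text.splitlines() if line)
    return [e["detail"] for e in entries
            if e["sandbox"] == sandbox_id and e["kind"] == "network"]


log = io.StringIO()  # stands in for an append-only file or log shipper
record(log, "agent-7", "file", "/workspace/config.yaml")
record(log, "agent-7", "network", "api.github.com:443")
record(log, "agent-9", "network", "pypi.org:443")
calls = endpoints_called(log.getvalue(), "agent-7")
```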
Understanding the different sandboxing models is a good starting point, but the more useful question for sandbox security is: what does each model actually protect against, and what do you need to configure to make it hold? Here’s how they compare on the dimensions that matter for security decisions.
| Model | Isolation boundary | Key security controls | Best for | Watch out for |
|---|---|---|---|---|
| OS-level (namespaces, seccomp, MAC) | Shared kernel, separate namespaces | seccomp profiles, AppArmor/SELinux policies, read-only rootfs, capability dropping | Container runtimes, CI/CD jobs, most production workloads | Kernel vulnerabilities bypass all controls; defaults are permissive |
| VM-based (microVMs, hardware virtualization) | Separate kernel per sandbox | Hypervisor-enforced memory isolation, independent kernel patching, vTPM | Multi-tenant platforms, malware analysis, running fully untrusted code | Higher resource cost; networking and image management add ops complexity |
| Application-level (Wasm, browser tabs, language VMs) | Within-process memory and API restrictions | Memory-safe execution model, restricted host API surface, capability-based permissions | Plugin systems, edge functions, embedded scripting | App compromise bypasses internal sandbox; should never be the only layer |
The right choice depends on your threat model. For most containerized workloads, OS-level controls with a hardened seccomp profile and mandatory access control policy provide strong security at minimal overhead. VM-based isolation makes sense when you genuinely do not trust the code being executed, such as in multi-tenant environments or agent-driven code generation. Application-level sandboxing is a valuable addition in either case, but it should layer on top of kernel-level or hypervisor-level controls, never replace them.
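The decision in that paragraph can be written down as a small rule. This Python sketch is a simplification, not a complete threat model. The `Workload` fields are hypothetical, and the only thing the rule encodes is the ordering argued above: untrusted or multi-tenant code gets a VM boundary, everything else gets hardened OS-level controls, and application-level sandboxing is only ever appended on top.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    runs_untrusted_code: bool  # e.g. agent-generated code or third-party submissions
    multi_tenant: bool         # other tenants' workloads share the host
    embeds_plugins: bool       # extensions or scripts run inside your own process


def isolation_plan(w: Workload) -> list:
    # The base layer is always kernel-level or hypervisor-level.
    if w.runs_untrusted_code or w.multi_tenant:
        layers = ["vm: microVM with its own kernel"]
    else:
        layers = ["os: namespaces + hardened seccomp + AppArmor/SELinux"]
    # Application-level sandboxing is appended on top, never used alone.
    if w.embeds_plugins:
        layers.append("app: Wasm or language-VM sandbox for plugins")
    return layers
```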
Whichever model you choose, treat the default configuration as a starting point. The isolation technology matters, but the sticking point is usually whether anyone actually audited the settings. It’s the same discipline that software supply chain security applies at every layer of the stack: trust, but verify the configuration.
Traditional applications follow predictable execution paths. You can read the code, trace the logic, and anticipate the behavior. AI agents are a different story. They make decisions at runtime, generate and execute code on the fly, call external tools, and produce outputs that their own developers may not have anticipated. That autonomy is the whole point of agents, but it’s also what makes sandbox security non-negotiable.
In these situations, perimeter-based security is not sufficient. You need controls that constrain agent behavior at the execution level, regardless of what the agent decides to do. It’s a fundamentally different security challenge. Teams building AI agent sandboxes are converging on a few patterns that address the unique risks agents introduce.
When an AI agent invokes a tool (a code interpreter, a file manager, an API client), each tool execution should run inside its own sandbox with the minimum permissions required. If the agent’s tool-use layer is compromised, sandbox security prevents that compromise from reaching the host or other services.
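Per-tool sandboxes with minimum permissions can be expressed as one locked-down `docker run` invocation per tool call. The flags below are real Docker options. The tool names, the profiles, the `agent-egress` network (assumed to be created separately and routed through an egress proxy), and the limits are hypothetical values for illustration.

```python
TOOL_PROFILES = {
    "code_interpreter": {"network": "none", "memory": "512m"},
    "api_client": {"network": "agent-egress", "memory": "256m"},
}


def tool_sandbox_argv(tool: str, image: str, command: list) -> list:
    profile = TOOL_PROFILES[tool]  # KeyError for unknown tools: fail closed
    return [
        "docker", "run", "--rm",
        f"--network={profile['network']}",         # "none" means no network at all
        "--read-only", "--tmpfs", "/tmp",          # immutable rootfs, scratch space only
        "--cap-drop=ALL",                          # drop every Linux capability
        "--security-opt=no-new-privileges:true",   # block setuid-style escalation
        f"--memory={profile['memory']}",
        "--pids-limit=64",                         # caps runaway process creation
        image, *command,
    ]
```

An unknown tool name raises KeyError, so a tool without a profile fails closed instead of running with defaults.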
Agents often process sensitive data as part of their reasoning. Sandbox security controls which files, databases, and environment variables are visible inside the agent’s execution environment. A well-configured secure sandbox exposes only the data the agent needs for its current task, nothing more.
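Scoping what the agent can see usually means building its environment from an allowlist rather than inheriting the host's. In this Python sketch, `scoped_env` and `readonly_mounts` are hypothetical helpers, and the base keys and sample variables are assumptions. The `-v host:container:ro` syntax for read-only bind mounts is standard Docker.

```python
import os
from typing import Mapping, Optional

BASE_KEYS = {"PATH", "LANG"}  # assumed minimal plumbing most tools need


def scoped_env(task_keys, source: Optional[Mapping[str, str]] = None) -> dict:
    """Pass through only the variables the current task needs."""
    source = os.environ if source is None else source
    wanted = BASE_KEYS | set(task_keys)
    return {k: v for k, v in source.items() if k in wanted}


def readonly_mounts(paths) -> list:
    """Docker -v arguments exposing each host path read-only at the same location."""
    args = []
    for p in paths:
        args += ["-v", f"{p}:{p}:ro"]
    return args


host_env = {
    "PATH": "/usr/bin",
    "AWS_SECRET_ACCESS_KEY": "not-for-this-task",
    "DATABASE_URL": "postgres://prod",
    "REPORT_BUCKET": "reports-staging",
}
env = scoped_env({"REPORT_BUCKET"}, host_env)
mounts = readonly_mounts(["/data/reports"])
```

Pass the result as `env=` to subprocess.run, or as -e flags to docker run, so secrets like the cloud key in this example never enter the sandbox.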
Left unchecked, an agent with network access could make arbitrary HTTP requests, potentially exfiltrating data or interacting with unintended services. Network-level sandbox security restricts egress to an allowlist of approved endpoints.
Start with your threat model. Which workloads process untrusted input? Which ones execute arbitrary code or handle sensitive data? Those are your highest-priority candidates for hardened sandbox security.
From there, layer controls rather than relying on any single mechanism. Combine process isolation with system call filtering, add network segmentation, set resource limits, and enable runtime monitoring. Each layer addresses a different category of risk. Together, they create a posture where any single failure stays contained.
If you’re already running containers, much of the foundation is in place. Container runtimes provide namespace isolation, seccomp profiles, and cgroup limits out of the box. The next step is to actually audit those defaults against your requirements and tighten what needs tightening. Docker Sandboxes extend this with purpose-built microVM isolation for agent workloads.
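Auditing the defaults can start with the HostConfig block that `docker inspect` returns. The field names below (Privileged, ReadonlyRootfs, CapDrop, SecurityOpt, Memory, PidsLimit) appear in that output, but which findings count as problems is an assumption you should adjust to your own requirements. `audit` is a hypothetical helper, and `inspect_host_config` needs a running Docker daemon.

```python
import json
import subprocess


def inspect_host_config(container: str) -> dict:
    """Fetch HostConfig via `docker inspect` (requires a running Docker daemon)."""
    out = subprocess.run(["docker", "inspect", container],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)[0]["HostConfig"]


def audit(host_config: dict) -> list:
    findings = []
    if host_config.get("Privileged"):
        findings.append("privileged mode enabled")
    if not host_config.get("ReadonlyRootfs"):
        findings.append("root filesystem is writable")
    dropped = {c.upper() for c in (host_config.get("CapDrop") or [])}
    if "ALL" not in dropped:
        findings.append("capabilities not dropped (consider --cap-drop=ALL)")
    for opt in host_config.get("SecurityOpt") or []:
        if opt.startswith("seccomp") and opt.endswith("unconfined"):
            findings.append("seccomp disabled")
    if not host_config.get("Memory"):
        findings.append("no memory limit")
    if (host_config.get("PidsLimit") or 0) <= 0:
        findings.append("no PID limit")
    return findings
```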
Start with Docker Sandboxes to put sandbox security into practice.
Sandboxing is the technique of running code in an isolated environment. Sandbox security is the broader discipline of ensuring that isolation actually holds. It’s the policies, configurations, monitoring, and enforcement mechanisms that make a sandbox resistant to escape, resource abuse, and unauthorized access. You can have a sandbox without strong security, but the isolation it provides will be unreliable.
No single security measure can guarantee complete protection. Sandbox security significantly raises the bar by layering multiple controls (namespaces, seccomp, network policies, resource limits, runtime monitoring) so that an attacker would need to bypass several independent defenses. This defense-in-depth approach reduces risk to a level most organizations consider acceptable, especially when combined with regular patching and configuration audits.
The performance impact varies by implementation. OS-level controls like namespaces and seccomp add negligible overhead. Network policies and resource limits introduce minimal latency. VM-based sandbox security has higher overhead due to hardware virtualization, but technologies like microVMs have narrowed that gap significantly. For most workloads, it’s a trade-off that strongly favors security.
AI workloads, particularly agents that execute code dynamically, are among the highest-priority use cases for sandbox security. These workloads are inherently unpredictable, which is exactly why strong isolation boundaries are essential. Sandbox security ensures that even if an agent produces unexpected behavior, the impact stays contained within its execution environment.
Several frameworks reference isolation and access controls that map directly to sandbox security practices. SOC 2 requires logical access controls and monitoring. PCI DSS mandates network segmentation for systems handling payment data. FedRAMP and NIST 800-53 include specific controls around process isolation and boundary protection. Organizations pursuing these certifications often find that container-based sandbox security, guided by a structured AI governance framework, provides a strong implementation foundation.