Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Why we built ADK 2.0

This post answers the question "Why did we build ADK 2.0?" It explains the rationale, some of the features, and why a developer should consider upgrading. This will be published the day after ADK Go 2.0 launches.
Read the whole story
alvinashcraft
43 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

ML Development in VS Code with Google Cloud Power: Workbench Extension Now Available

The Google Cloud Workbench Notebooks extension for VS Code has officially launched, allowing developers to connect their local IDE to scalable, cloud-based Jupyter environments. This integration streamlines the machine learning lifecycle by eliminating context switching and providing direct access to high-performance Google Cloud infrastructure. To support transparency and community-driven innovation, the newly released extension is fully open-sourced and available on GitHub and the VS Code Marketplace.

Deploy to Kubernetes clusters

Publish and deploy your Aspire application to an existing Kubernetes cluster using Helm — generated charts, values overrides, secrets, and release management workflows.

Why AI Agents Need Isolation

AI coding agents are quickly becoming part of everyday development workflows. Today, AI tools can write and execute code, install dependencies, debug repositories, interact with APIs, automate terminal tasks, and modify project files. What once required constant developer involvement can increasingly be delegated to AI-assisted workflows. 

This shift is exciting, but it also raises an important question for software development: Should AI-generated code run directly on your machine? As AI agents become more capable, developers need safer ways to experiment, automate, and execute AI-assisted workflows.

That is where isolation becomes important. Docker Sandbox (sbx) introduces a more secure execution model for AI workflows by combining sandbox isolation, microVM-based protection, customizable environments, secure credential handling, and controlled network access. This article explores why isolation matters for AI agents, what Docker SBX changes, and how Sandbox Kits help create safer AI development environments.

The Shift From AI Assistance to AI Action

For years, AI developer tools mostly acted as assistants: they suggested code, explained concepts, or answered questions. Modern AI agents go further. They can run terminal commands, install packages, edit repositories, access external services, execute generated scripts, and interact directly with development environments. This shift moves AI systems from passive assistance toward active participation in software workflows. That creates new possibilities for productivity. It also introduces new risks.

AI systems generate outputs probabilistically. Even strong models can make mistakes, misunderstand context, or generate unsafe commands. A generated command might:

  • remove important files
  • expose credentials
  • install malicious dependencies
  • modify configurations unexpectedly
  • access sensitive local data

In traditional workflows, developers directly control these actions. With AI agents, developers increasingly supervise actions generated by the model itself. That changes the security model.

Why Isolation Matters

The core idea is simple: AI-generated actions should not automatically receive unrestricted access to a developer’s host machine. Isolation creates a controlled boundary between the host system, the AI agent, generated code, and the external tools and services the agent may interact with. That boundary helps reduce accidental filesystem damage, credential exposure, unrestricted network access, persistence risks, and unsafe experimentation.

One example discussed frequently in the Docker SBX community is running:

sudo rm -rf /*

inside a sandbox while the host machine remains protected. The example is intentionally dramatic, but it highlights an important point: AI-generated commands should execute inside environments designed to contain mistakes safely. Isolation is not just a security feature. It is becoming an important part of responsible AI-assisted development.

A New Approach to AI Agent Isolation

Containers already provide lightweight isolation and are foundational to modern development workflows. But AI workloads introduce additional considerations. A common question raised around Docker SBX is:

Why use microVMs instead of standard containers alone? Traditional containers share the host kernel.

For many workloads, that model works extremely well. 

However, AI agents may execute untrusted code, interact with external repositories, dynamically generate commands, access APIs and credentials, and automate sensitive workflows. These workflows can benefit from stronger isolation boundaries. Docker SBX introduces a microVM-based approach designed to provide additional protection while still maintaining a developer-friendly experience. 

Another recurring question has been: Why did Docker build its own VMM instead of using Firecracker?

The reasoning shared publicly is that Docker wanted an approach that works across Windows and Mac environments in addition to Linux-focused deployment scenarios. The goal is simple: AI tooling should remain accessible across developer operating systems while improving isolation for modern AI workflows.

Understanding Docker SBX

Docker SBX focuses on creating isolated environments for AI-assisted development. The platform emphasizes secure execution, sandboxed environments, controlled networking, safer credential handling, and customizable workflows. One particularly interesting part of SBX is how credentials are managed. According to the official documentation, credentials stay on the host and are routed through a proxy instead of directly entering the sandbox VM.

This matters because AI agents increasingly interact with APIs, model gateways, cloud services, development platforms, and external tooling. Reducing direct credential exposure helps improve the safety of these workflows. The official documentation also explains how the proxy-managed credential system works. Inside the sandbox, the agent works with a sentinel placeholder value. The proxy then replaces the outgoing authentication header with the real credential before the request leaves the sandbox environment. This means the real secret never directly enters the VM. That design reflects an increasingly important principle for AI tooling: safer execution environments matter just as much as model capability.
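The placeholder swap described above can be pictured with a small sketch. This is not Docker's implementation: the sentinel value, the `inject_credential` function, and the header layout are all assumptions made for illustration. It only shows the idea that the sandbox side builds requests with a placeholder and a host-side component substitutes the real secret.

```python
# Hypothetical placeholder value; the real sentinel used by SBX is not shown in the article.
SENTINEL = "SBX_PLACEHOLDER_TOKEN"


def inject_credential(headers: dict[str, str], real_secret: str) -> dict[str, str]:
    """Return a copy of the headers with the sentinel swapped for the real secret."""
    outgoing = dict(headers)  # never mutate the request the sandbox built
    auth = outgoing.get("Authorization", "")
    if SENTINEL in auth:
        outgoing["Authorization"] = auth.replace(SENTINEL, real_secret)
    return outgoing


# Inside the sandbox, the agent only ever sees the placeholder.
sandbox_headers = {"Authorization": f"Bearer {SENTINEL}", "Accept": "application/json"}

# On the host side, a proxy-like component holds the real secret and rewrites the header.
forwarded = inject_credential(sandbox_headers, real_secret="real-api-key")
```

The design point is that `sandbox_headers` never contains the real key, so code running inside the sandbox has nothing sensitive to leak.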

Sandbox Kits: Where Isolation Becomes Practical 

While exploring Docker SBX, one thing that stood out to me was that isolation is only part of the story. Running AI agents inside an isolated environment provides a stronger security boundary, but teams still need a practical way to configure, secure, and standardize those environments. That is where Sandbox Kits play an important role.

According to Docker’s documentation, a Kit can package tools, environment variables, credentials, network rules, files, startup commands, and even memory instructions for an agent into a single reusable specification. Rather than manually configuring every sandbox, teams can define these capabilities once and reuse them across projects and teams. 

What makes Kits particularly interesting is that they are not simply templates or setup scripts. Docker SBX applies and enforces Kit-defined capabilities at runtime. This means that tooling requirements, network policies, proxy-managed credentials, and agent guidance can travel with the sandbox environment itself rather than relying on manual configuration.

This becomes increasingly valuable as AI agents take on more responsibility. An organization may want every AI coding agent to start with approved tools, access only specific services, authenticate through proxy-managed credentials, and follow internal development standards. Without a reusable mechanism, maintaining those controls consistently across environments can quickly become difficult.

Sandbox Kits help address that challenge by turning environment configuration into a reusable artifact. Teams can package their requirements once and apply them repeatedly, creating more consistent and secure AI workflows while preserving the isolation boundaries provided by Docker SBX. MicroVM isolation provides the foundation, while Sandbox Kits help turn that foundation into repeatable day-to-day AI workflows.

Sandbox Kits Make AI Workflows Practical

One of the most interesting additions to Docker SBX is Sandbox Kits. A Kit packages reusable customizations for sandbox environments. According to the official documentation, Kits can install tools, configure environment variables, inject files, run startup commands, control allowed domains, and manage credentials through proxy-based injection. This allows teams to create repeatable AI environments tailored to their workflows. For example, a team could create a secure AI coding environment, a research sandbox, a data science workspace, a controlled API testing setup, or an internal experimentation environment.

Kits as Reusable AI Environment Blueprints

Sandbox Kits are useful not only for customizing individual sandboxes but also for creating consistent AI environments that can be reused across teams and projects. Instead of manually configuring environments every time an AI agent is launched, teams can create reusable Kits that package tools, network policies, credentials, files, startup logic, and agent instructions into a single definition. Docker SBX then applies and enforces those capabilities when the sandbox runs.

For example, an engineering team could create a coding-focused Kit that installs approved development tools, restricts outbound access to trusted services, injects shared configuration files, and provides secure access to internal APIs through proxy-managed credentials. Every AI coding session would start with the same controls and capabilities. Similarly, a research team could create an evaluation Kit that installs benchmark tooling, configures required dependencies, injects project instructions through agent memory, and standardizes how experiments are executed. This helps improve reproducibility while maintaining isolation.
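To make "restricts outbound access to trusted services" a little more concrete, here is a minimal sketch of a domain allowlist check. The function name, the example domains, and the subdomain-matching rule are assumptions for illustration; the article does not describe how SBX evaluates its network rules.

```python
def is_allowed(host: str, allowed_domains: set[str]) -> bool:
    """Allow a host if it equals an allowed domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Hypothetical allowlist for a coding-focused environment.
ALLOWED = {"api.github.com", "pypi.org"}
```

Matching on a leading dot keeps a lookalike such as `evilpypi.org` from passing as a subdomain of `pypi.org`.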

Another interesting capability is agent memory. Docker Kits can append instructions and guidance to files such as AGENTS.md or CLAUDE.md, allowing teams to provide project conventions, workflow guidance, or tool-specific instructions directly to the agent at startup. Taken together, these capabilities make Kits more than a customization feature. They provide a practical way to package secure AI environments that teams can share across projects. For example, a developer could start a sandbox with a custom Kit using:

sbx run claude --kit ./my-kit/

This launches an isolated environment with predefined tools, startup commands, and built-in security controls, making it easier to create repeatable AI environments safely.
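The agent memory idea mentioned earlier, appending guidance to a file such as AGENTS.md, can be sketched in a few lines. This is an illustration only, not how Docker Kits write the file: the `append_guidance` helper and the skip-if-present behavior are assumptions.

```python
import tempfile
from pathlib import Path


def append_guidance(memory_file: Path, guidance: str) -> bool:
    """Append guidance to an agent memory file unless it is already present."""
    existing = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    if guidance.strip() in existing:
        return False  # already present, nothing to do
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    memory_file.write_text(existing + separator + guidance.strip() + "\n", encoding="utf-8")
    return True


# Demonstrate against a throwaway directory rather than a real project.
with tempfile.TemporaryDirectory() as tmp:
    agents_md = Path(tmp) / "AGENTS.md"
    first = append_guidance(agents_md, "Run the linter before committing.")
    second = append_guidance(agents_md, "Run the linter before committing.")
    contents = agents_md.read_text(encoding="utf-8")
```

Making the append idempotent means a sandbox can be restarted repeatedly without the same instruction piling up in the file.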

The documentation also distinguishes between two types of Kits:

Mixin Kits vs Agent Kits

Docker SBX supports two different types of Kits, each designed for a different level of customization.

Mixin Kits

Mixin Kits extend an existing agent with additional capabilities. Rather than creating a completely new environment, they allow teams to layer functionality onto agents they already use. Common examples include:

  • installing linters or developer tools
  • injecting shared team configuration
  • providing access to approved external services
  • adding organization-specific instructions or workflows

This makes Mixin Kits useful when teams want to standardize capabilities without changing the underlying agent experience. Multiple Mixin Kits can also be stacked on the same sandbox, allowing teams to combine capabilities as their workflows evolve.
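One way to think about stacking is as a merge of several small specifications. The sketch below is purely hypothetical: `MixinSpec`, its fields, and the merge rules (tools and domains accumulate, later environment values win) are assumptions, not the Kit format or the precedence rules Docker SBX actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class MixinSpec:
    """Hypothetical stand-in for a Mixin Kit; field names are illustrative only."""
    tools: tuple[str, ...] = ()
    env: dict[str, str] = field(default_factory=dict)
    allowed_domains: frozenset[str] = frozenset()


def stack(*mixins: MixinSpec) -> MixinSpec:
    """Combine mixins in order: tools and domains accumulate, later env values win."""
    tools: list[str] = []
    env: dict[str, str] = {}
    domains: set[str] = set()
    for mixin in mixins:
        for tool in mixin.tools:
            if tool not in tools:  # keep first-seen order, drop duplicates
                tools.append(tool)
        env.update(mixin.env)
        domains |= mixin.allowed_domains
    return MixinSpec(tuple(tools), env, frozenset(domains))


linting = MixinSpec(("ruff",), {"LOG_LEVEL": "info"}, frozenset({"pypi.org"}))
web = MixinSpec(("eslint", "ruff"), {"LOG_LEVEL": "debug"}, frozenset({"registry.npmjs.org"}))
combined = stack(linting, web)
```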

Agent Kits

Agent Kits take a different approach. Instead of extending an existing agent, they define a complete agent environment from scratch. An Agent Kit can specify:

  • the container image
  • the agent entrypoint
  • networking behavior
  • credential configuration
  • persistence settings
  • startup and installation logic

This makes Agent Kits useful for organizations building internal agents, experimenting with custom agent architectures, or packaging specialized workflows that can be shared across teams. In practice, Mixin Kits help teams standardize and extend existing agents, while Agent Kits provide a framework for building and distributing entirely new agent experiences.

Why This Matters for AI Safety

Many conversations around AI safety focus on topics such as alignment, hallucinations, evaluations, misuse prevention, and model behavior. These are important challenges, but infrastructure-level safety is equally important as AI systems become more capable and autonomous. 

Even highly capable AI models can generate unsafe commands, misuse credentials, access unintended resources, and interact with untrusted code. For that reason, developers need strong runtime isolation, controlled execution environments, credential protections, network boundaries, and safer environments for experimentation. 

As AI agents become more autonomous, secure execution environments may become a foundational part of responsible AI development. Isolation is not about assuming AI will always fail. It is about building systems that safely contain mistakes when they happen. That principle has long existed in security engineering. Now it is becoming increasingly important for AI systems as well.

The Shift Toward Agentic Development

Many developers are already part of an AI adoption journey, even if they do not think of it that way. AI tools are rapidly moving from passive assistance toward:

  • autonomous execution
  • agentic workflows
  • AI-driven development environments
  • automated coding systems

That shift changes how developers think about security. Developers are no longer only running their own commands. They are increasingly reviewing and supervising commands generated by AI systems. As this transition continues, isolation may become a standard part of AI-assisted software development.

Architecture Diagram: Docker SBX Isolation Model

Figure 1: Docker SBX isolation model 

This architecture highlights the core SBX security model:

  • AI agents run inside an isolated sandbox
  • Credentials stay outside the sandbox
  • Outbound requests pass through a secure proxy layer
  • The host machine remains protected

Workflow Diagram: Secure AI Agent Execution

Figure 2: Secure AI agent execution workflow using Docker SBX 

This workflow shows:

1. The developer launches Docker SBX.

2. The AI agent runs inside an isolated sandbox.

3. The agent accesses external services safely.

4. Results return while the host machine remains protected.

Getting Started

Developers interested in experimenting with Docker SBX can explore the official Sandbox Kits documentation and SBX CLI reference to start building isolated AI workflows. Getting started is straightforward, as the standalone sbx tool installs quickly on macOS, Windows, and Linux without requiring full Docker Desktop dependencies. Even simple sandboxed setups can help create safer environments for AI-assisted development and experimentation.

Conclusion

AI coding agents are reshaping how software is built. But more capability also requires stronger safety boundaries. Docker SBX introduces an approach focused on isolation, microVM-based protection, secure execution, customizable sandbox environments, and safer AI-assisted workflows. Sandbox Kits further extend this model by making secure and repeatable AI environments easier to build and share.

As AI agents continue to evolve, secure execution environments may become just as important as the models themselves. Ultimately, the future of AI development is not only about building more capable systems. It is also about building systems that can operate safely. And isolation is becoming an important part of that future.


CodeRabbit becomes ESLint gold sponsor

CodeRabbit Donates to ESLint

We are happy to share that CodeRabbit has become an ESLint gold sponsor, donating $1,000 each month for the ongoing maintenance and development of ESLint! CodeRabbit is an AI-powered code review platform that provides fast, context-aware feedback on pull requests. It helps development teams automate and accelerate the code review process, catch bugs and logic errors early, and improve code quality and velocity.

Here’s what Santosh Yadav, Principal Developer Advocate at CodeRabbit, had to say about ESLint:

“We love Open Source and ESLint is critical for the ecosystem. We want to make Open Source more sustainable for maintainers. ESLint is shipped as part of the CodeRabbit tools, and we love to support our dependencies.”

– Santosh Yadav, Principal Developer Advocate, CodeRabbit

ESLint is extremely grateful for the support of the CodeRabbit team. Donations from sponsors like CodeRabbit allow ESLint to pay contributors for ongoing maintenance and development, helping to make the open-source ecosystem more sustainable for everyone.


Sandboxing AI Agents

It has become clear after many discussions with large enterprises that the interest and excitement around AI agents will only grow. Many enterprises now have C-level executives responsible for implementing AI, which brings associated budgets and measurable outcomes. Meanwhile, individual contributors are well along in their AI journey, using AI-assisted coding agents and general-purpose AI assistants.

Securing these agents is a top concern for enterprises. One common solution to improve the security of AI agents is to run them in a sandboxed environment. In this post, I’ll take a look at what it means to “sandbox” an AI agent in a production environment.

In brief

  • Local AI agents are general-purpose assistants that can perform almost any action on behalf of a user.
  • Local AI agents benefit from sandboxes as a countermeasure to their broad access to CLI tools, local files, and networks.
  • Shared AI agents are designed for specific tasks.
  • Shared AI agents should be decomposed into the agent harness and the tools called by the agent.
  • The tools called by shared AI agents are typical web services.
  • The term “sandbox” has little meaning for shared AI agents, as the tools can be secured with existing security policies and practices.

Distinguishing between local and shared agents

Before discussing what it means to sandbox an agent, it is important to distinguish between local and shared agents.

Local agents are the result of bespoke configuration in an individual’s own workspace. A local agent is the coding agent with a mishmash of MCP servers and personal credentials that a developer has set up to help them with their work, or the OpenClaw-style agent that runs in the background automating tasks like monitoring emails, browsing the web, or organizing files.

To use the pets/cattle analogy (where pets have names and are lovingly cared for while cattle are interchangeable), local agents are pets. Local agents must support a wide range of tasks, including code generation, running scripts, manipulating files, and answering questions. They were never intended to be distributed, and little thought is put into how they might be recreated. Each developer is responsible for their own local agent. In fact, much of the functionality provided by a local agent likely relies on MCP servers exposed by an IDE, which are not available outside a local development environment.

Shared (or managed) agents are designed to perform specialized tasks. They must be secure, testable, deployable, and supported. Shared agents will often be hosted as web applications, perhaps using protocols like the Model Context Protocol (MCP).

Local agents have unique security concerns. It is mesmerizing, and slightly horrifying, watching a local agent query the contents of your /etc/environment file to get the credentials required to execute a curl command as it doggedly attempts to upload a file to a remote server. Local agents are like sharing your keyboard with the most brilliant and amoral entity in the known universe.

Because local agents are general-purpose AI tools, they tend to have broad access to the CLI, local files, and networks. So it makes sense to run local agents in an isolated environment to distinguish between the trust granted to a user and the trust granted to the local AI agent.

Shared agents have a far narrower scope than local agents. Shared agents are designed to solve specific tasks and interact with the world through a small window. The limited scope of shared agents has implications for their security.

The focus of this post is on shared agents. This is not to diminish the security implications of local AI agents, but rather to note that shared agents, iteratively developed and deployed to a production environment, align very closely with the core functionality provided by Octopus.

But before we can understand what it means to sandbox a shared agent, we first need to understand the architecture of shared agents.

Shared agent architecture

At the heart of every AI agent is an LLM making decisions about how best to achieve its task.

For all their wonder and complexity, it is best to think of LLMs used by shared agents as string functions: the prompt string goes in, the response string comes out. (I’m going to ignore the social engineering security aspect of LLMs here, as the generated output of an LLM used by a shared agent is not typically consumed by a person.)

That is it. LLMs cannot, on their own, interact with the world. They cannot browse a web page, read a file, or save a record in a database.

Because LLMs can’t interact with the world, there is very little to contain in a sandbox.

However, this inability to interact with the world severely restricts the problems that LLMs can solve. A chatbot is about as complex a solution as you can build with an isolated LLM. To build useful AI agents, LLMs must be able to act.

This is where the concept of tool calling comes in. Tools are just a fancy way of describing code exposed to an LLM that can interact with the world. MCP is the most common interface through which LLMs learn about and execute tools.

When a tool like switch_lightbulb_on is exposed by an MCP server to an LLM, a prompt like “Switch on the lights” will cause a physical light bulb to turn on.

Treating the LLM and the tools it calls as separate concerns is crucial to understanding how sandboxes apply to shared AI agents.
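The separation can be sketched without any real model or MCP library. In the example below, `fake_llm` is a stand-in string function, `TOOLS` is a plain dictionary playing the role of a tool registry, and `run_agent` is a toy harness. All of these names are assumptions for illustration; only `switch_lightbulb_on` comes from the text above.

```python
from typing import Callable

lights = {"living_room": False}  # stand-in for a physical device


def switch_lightbulb_on(room: str) -> str:
    """The tool: ordinary code that changes something in the world."""
    lights[room] = True
    return f"{room} light is on"


TOOLS: dict[str, Callable[..., str]] = {"switch_lightbulb_on": switch_lightbulb_on}


def fake_llm(prompt: str) -> dict:
    """Stand-in for a model: a prompt goes in, a structured tool request comes out."""
    if "light" in prompt.lower():
        return {"tool": "switch_lightbulb_on", "arguments": {"room": "living_room"}}
    return {"tool": None, "arguments": {}}


def run_agent(prompt: str) -> str:
    """Harness: ask the model what to do, then execute the chosen tool itself."""
    request = fake_llm(prompt)
    tool = TOOLS.get(request["tool"])
    if tool is None:
        return "no tool called"
    return tool(**request["arguments"])


result = run_agent("Switch on the lights")
```

Note that `fake_llm` never touches `lights`. Only the tool does, which is why the tool is the part worth constraining.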

Sandboxing the tools

There are many industry examples demonstrating the pattern where the LLM is run as a regular service while the tools called by the LLM are isolated within a sandbox environment.

Some of these quotes have been edited for clarity.

Anthropic describes the LLM as the brain and the tools as the hands of an AI agent:

The solution we arrived at was to decouple what we thought of as the “brain” (Claude and its harness) from both the “hands” (sandboxes and tools that perform actions) and the “session” (the log of session events).

Notably, in this description, the hands include sandboxes.

Red Hat describes the separation of the brain and the hands, with the hands running in a sandbox, as “the right choice for multi-tenant agent platforms and production workloads”:

The agent’s “brain” (reasoning and orchestration) is decoupled from its “hands” (tool execution and code). The platform orchestrates the agent loop and delegates execution to disposable, stateless sandboxes that you control. Credentials are physically separated from the execution environment, injected at the network boundary rather than stored where agent-generated code can reach them. Both the Responses API and Anthropic’s Managed Agents follow this pattern, whether the sandbox runs in the provider’s cloud or on your own infrastructure through self-hosted environments. This is the right choice for multi-tenant agent platforms and production workloads.

How 11x Rebuilt Their Alice Agent: From ReAct to Multi-Agent with LangGraph notes that agents work best when tools do the heavy lifting:

Tools are preferable over skills. Don’t try to make your agent too smart. Just give it the right tools and tell it how to use them.

In the video Securing MCP in an Agentic World with Arjun Sambamoorthy from Cisco, Arjun describes the importance of run-time MCP security with sandboxes isolating MCP servers:

We should also sandbox and isolate MCP servers to make sure there’s no crosspollination that’s actually happening.

Agentic AI Safety & Security by Dawn Song describes the importance of decomposing systems to enforce the principle of least privilege:

The idea is that instead of building one monolithic agent with different components in one system, one can actually separate the overall agent system into separate components where each component can run its own, for example, container or context such that each separate component can have its own set of privileges depending on its needed capabilities and so on and hence enable and help enforce principle of least privilege.

OpenAI describes when to use a sandbox, and notes that “the sandbox stays focused on provider-specific execution”:

Use sandboxes when the agent needs to manipulate files, run commands, mount a data room, produce artifacts, expose a service, or continue stateful work later.

The key split is the boundary between the harness and compute. The harness is the control plane around the model: it owns the agent loop, model calls, tool routing, handoffs, approvals, tracing, recovery, and run state. Compute is the sandbox execution plane where model-directed work reads and writes files, runs commands, installs dependencies, uses mounted storage, exposes ports, and snapshots state.

Keeping those boundaries separate lets your application keep sensitive control plane work in trusted infrastructure while the sandbox stays focused on provider-specific execution.

Azure Container Apps Sandboxes provide a managed service where:

Agents can run anything safely - an agent spawns a sandbox, executes work inside it, and returns the output with no agent host privileges required.

AWS provides the Amazon Bedrock AgentCore Code Interpreter, which similarly provides a sandbox where untrusted code is run:

With the AgentCore Code Interpreter, AI agents can write and execute code securely in sandbox environments, enhancing their accuracy and expanding their ability to solve complex end-to-end tasks.

The provided diagram clearly shows the Agent and LLM sitting outside the sandbox, and the code being executed inside it:

AgentCore Code Interpreter Diagram

What is clear from these examples is that the LLM is hosted separately from the tools it calls, and it is the tools that are sandboxed, as this is where the real work is done.

What even is a sandbox?

When taking the approach of sandboxing tools, the next decision is which guardrails the sandbox must provide.

At the extreme end, a sandbox provides an environment in which untrusted scripts can run. An example of this is Intel DeepMath, which is a lightweight agent that specializes in solving mathematical problems by running small, sandboxed Python scripts that support and enhance its problem-solving process:

Instead of verbose text, the model emits tiny Python snippets for intermediate steps, runs them in a secure sandbox, and folds the results back into its reasoning, reducing errors and output length.

Your local coding assistant AI agent may even have produced Python scripts to modify files in bulk or search for text.

Because you can do almost anything with a Python script, you need a robust sandbox to prevent any malicious or undesirable actions from being executed.
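The "emit a snippet, run it, fold the result back" loop can be sketched as follows. This is an illustration only, and it is not a sandbox: a separate interpreter process with a timeout limits runaway scripts but does nothing to stop file or network access. The `run_snippet` helper is an assumption and is not taken from DeepMath; a production setup would run the same call inside a container or microVM.

```python
import subprocess
import sys
import tempfile


def run_snippet(code: str, timeout_s: float = 10.0) -> str:
    """Run a Python snippet in a separate interpreter and return its stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        completed = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* env vars and user site
            cwd=workdir,          # scratch directory, removed afterwards
            capture_output=True,
            text=True,
            timeout=timeout_s,    # raises subprocess.TimeoutExpired on runaway code
        )
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip())
    return completed.stdout.strip()


# A model-generated intermediate step: the sum of squares from 1 to 10.
answer = run_snippet("print(sum(i * i for i in range(1, 11)))")
```

The returned string is what the harness would feed back into the model's reasoning.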

Running untrusted code is an extreme example, though. Most tools will be far more routine, performing deterministic actions like returning data, sending messages, triggering a workflow, approving a request, etc. Indeed, most of the tools called by a shared AI agent are just wrappers around existing APIs.

The sandbox around these tools must address the same cross-cutting concerns as any web service container, like authentication, authorization, rate limiting, PII redaction, observability, CPU and memory limits, firewalls, etc.
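Two of those concerns, authorization and rate limiting, can be sketched as a thin wrapper around a tool. In practice a gateway or platform would usually provide these rather than application code; the `GuardedTool` class, the role names, and the fixed-window limit below are all assumptions for illustration.

```python
import time
from typing import Callable


class GuardedTool:
    """Wrap a tool with an authorization check and a fixed-window rate limit."""

    def __init__(self, func: Callable[[str], str], allowed_roles: set[str],
                 max_calls: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.func = func
        self.allowed_roles = allowed_roles
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock  # injectable so the limit can be tested without sleeping
        self.window_start = clock()
        self.calls = 0

    def __call__(self, role: str, payload: str) -> str:
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not call this tool")
        now = self.clock()
        if now - self.window_start >= self.window_s:
            self.window_start, self.calls = now, 0  # start a new window
        if self.calls >= self.max_calls:
            raise RuntimeError("rate limit exceeded")
        self.calls += 1
        return self.func(payload)


# Hypothetical tool: approve a request, callable by managers at most twice per minute.
approve = GuardedTool(lambda req: f"approved {req}", {"manager"}, max_calls=2, window_s=60.0)
```

Nothing in the wrapper is specific to AI agents, which is the point the surrounding text makes: these are the same controls any web service needs.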

At this point, it may not even make sense to talk about sandboxes at all. Any modern Platform as a Service (PaaS) or orchestration platform has almost certainly addressed these common security concerns, usually without using the term “sandbox.”

Do sandboxes make sense?

General-purpose local AI agents running in an individual’s workspace absolutely benefit from a sandbox. The fact that a local AI agent can and will do anything you ask (and sometimes things you don’t) means a specialized sandbox is a valid countermeasure.

In OpenClaw + Windows, Microsoft demonstrates how OpenClaw is prevented from making unwanted changes to the system by running it in a sandbox:

And you’ll notice down here in the corner we’ve got lots of permissions options along with our sandbox configuration. Now, this sandbox is really interesting because this is using MXC, the Microsoft Execution Containers.

You’ve got full support about what files and folders you want OpenClaw to have access to, and really granular security features like clipboard access or talking to the internet itself.

OpenClaw already has a rich safety layer, and that layer is only augmented more by appropriate containment that can be managed by me or policies applied by IT.

The concept of a sandbox is also applicable for the execution of generated scripts, which administrators must assume can perform any action.

However, the concept of a sandbox is less meaningful when used to isolate specific, deterministic tools required by shared agents. The security layer built into any modern PaaS offering already supports the cross-cutting security concerns required to host web-based services; authentication and authorization policies are available on APIs exposed by tools; and individual tools can be turned on and off as needed in an MCP server.

You could make a good argument that this collection of controls effectively serves as a sandbox. For example, agent-sandbox combines existing Kubernetes features to provide an AI agent sandbox.

But using the term “sandbox” feels more like a distraction from the implementation of existing, standard security controls applied to any web service because it implies that there is some unique security layer that is specifically required to support AI agents.

Conclusion

The term sandbox is thrown around a lot these days. You don’t have to look hard to find examples of AI agents going rogue and deleting files or trashing databases, and it is natural to assume that some kind of sandbox is required to rein in freewheeling AI agents.

But it is important to distinguish between general-purpose local AI agents that are incentivized to support any kind of action and specialized shared AI agents that are designed for a very specific purpose. Further decomposing shared AI agents into the agent harness and the tools highlights that it is the tools that need to be constrained. And centrally managed tools exposed as web services (with an MCP server being a specialized web server) already have a wealth of existing, comprehensive security controls available to secure them.

Enterprises should focus on constraining the tools used by shared AI agents, rather than being distracted by hype around sandboxes. Your existing best practices can be applied to centrally managed tools; there is no need to shoehorn in an additional security layer under the guise of a sandbox.

Happy Deployments!
