
Docker Model Runner on the new NVIDIA DGX Spark: a new paradigm for developing AI locally


We’re thrilled to bring NVIDIA DGX™ Spark support to Docker Model Runner. The new NVIDIA DGX Spark delivers incredible performance, and Docker Model Runner makes it accessible. With Model Runner, you can easily run and iterate on larger models right on your local machine, using the same intuitive Docker experience you already trust.

In this post, we’ll show how DGX Spark and Docker Model Runner work together to make local model development faster and simpler, covering the unboxing experience, how to set up Model Runner, and how to use it in real-world developer workflows.

What is NVIDIA DGX Spark?

NVIDIA DGX Spark is the newest member of the DGX family: a compact, workstation-class AI system, powered by the Grace Blackwell GB10 Superchip, that delivers incredible performance for local model development. Designed for researchers and developers, it makes prototyping, fine-tuning, and serving large models fast and effortless, all without relying on the cloud.

Here at Docker, we were fortunate to get a preproduction version of the DGX Spark. And yes, it’s every bit as impressive in person as it looks in NVIDIA’s launch materials.

Why Run Local AI Models and How Docker Model Runner and NVIDIA DGX Spark Make It Easy 

Many of us at Docker and across the broader developer community are experimenting with local AI models. Running locally has clear advantages:

  • Data privacy and control: no external API calls; everything stays on your machine
  • Offline availability: work from anywhere, even when you’re disconnected
  •  Ease of customization: experiment with prompts, adapters, or fine-tuned variants without relying on remote infrastructure

But there are also familiar tradeoffs:

  • Local GPUs and memory can be limiting for large models
  • Setting up CUDA, runtimes, and dependencies often eats time
  • Managing security and isolation for AI workloads can be complex

This is where DGX Spark and Docker Model Runner (DMR) shine. DMR provides an easy and secure way to run AI models in a sandboxed environment, fully integrated with Docker Desktop or Docker Engine. When combined with the DGX Spark’s NVIDIA AI software stack and large 128GB unified memory, you get the best of both worlds: plug-and-play GPU acceleration and Docker-level simplicity.

Unboxing NVIDIA DGX Spark

The device arrived well-packaged, sleek, and surprisingly small, looking more like a mini-workstation than a server.

Setup was refreshingly straightforward: plug in power, network, and peripherals, then boot into NVIDIA DGX OS, which comes with NVIDIA drivers, CUDA, and the NVIDIA AI software stack pre-installed.

[Image: Nvidia 1]

Once on the network, enabling SSH access makes it easy to integrate the Spark into your existing workflow.

This way, the DGX Spark becomes an AI co-processor for your everyday development environment, augmenting, not replacing, your primary machine.

Getting Started with Docker Model Runner on NVIDIA DGX Spark

Installing Docker Model Runner on the DGX Spark is simple and can be done in a matter of minutes.

1. Verify Docker CE is Installed

DGX OS comes with Docker Engine (CE) preinstalled. Confirm you have it:

docker version

If it’s missing or outdated, install it by following the standard Docker Engine installation instructions for Ubuntu.
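
If you need to set up Docker Engine from scratch, the commands below are a minimal sketch of the standard apt-repository setup on Ubuntu-based DGX OS; verify the details against Docker’s official Ubuntu instructions for your release:

# Minimal sketch: configure Docker's apt repository and install Docker Engine (Ubuntu-based DGX OS)
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io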

2. Install the Docker Model CLI Plugin

The Model Runner CLI is distributed as a Debian package via Docker’s apt repository. Once the repository is configured (see the linked instructions above), install it with the following commands:

sudo apt-get update
sudo apt-get install docker-model-plugin

Or use Docker’s handy installation script:

curl -fsSL https://get.docker.com | sudo bash

You can confirm it’s installed with:

docker model version
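
Recent versions of the plugin also ship a status subcommand. As a rough check (assuming your CLI version includes it), you can confirm that the Model Runner backend itself is up:

# Check whether the Model Runner backend is running (availability may vary by plugin version)
docker model status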

3. Pull and Run a Model

Now that the plugin is installed, let’s pull a model from the Docker Hub AI Catalog. For example, the Qwen 3 Coder model:

docker model pull ai/qwen3-coder
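
Once the pull completes, a quick sanity check is to list what’s in the local model store; the exact output format may vary between releases:

# List the models available locally
docker model list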

The Model Runner container will automatically expose an OpenAI-compatible endpoint at:

http://localhost:12434/engines/v1

You can verify it’s live with a quick test:

# Test via API

curl http://localhost:12434/engines/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"ai/qwen3-coder","messages":[{"role":"user","content":"Hello!"}]}'

# Or via CLI
docker model run ai/qwen3-coder
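
The CLI also accepts a one-shot prompt as an argument, which is handy for quick tests and scripting; this is a sketch assuming your plugin version supports inline prompts:

# One-shot prompt instead of an interactive chat session
docker model run ai/qwen3-coder "Write a function that reverses a string."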

GPUs are allocated to the Model Runner container via nvidia-container-runtime, and Model Runner automatically takes advantage of any available GPUs. To see GPU usage:

nvidia-smi
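
To double-check that Docker Engine is wired up to the NVIDIA runtime, and to watch utilization while requests are in flight, something along these lines works (the grep filter is just illustrative):

# Confirm the NVIDIA runtime is registered with Docker Engine
docker info | grep -i runtime

# Refresh GPU stats every second while sending requests to the model
watch -n 1 nvidia-smi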

4. Architecture Overview

Here’s what’s happening under the hood:

[ DGX Spark Hardware (GPU + Grace CPU) ]
                   │
        (NVIDIA Container Runtime)
                   │
          [ Docker Engine (CE) ]
                   │
     [ Docker Model Runner Container ]
                   │
      OpenAI-compatible API :12434

The NVIDIA Container Runtime bridges the NVIDIA GB10 Grace Blackwell Superchip drivers and Docker Engine, so containers can access CUDA directly. Docker Model Runner then runs inside its own container, managing the model lifecycle and providing the standard OpenAI API endpoint. (For more info on Model Runner architecture, see this blog).

From a developer’s perspective, you interact with models much like any other Dockerized service: docker model pull, list, inspect, and run all work out of the box.
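
For example, a typical day-to-day loop looks roughly like this (any model name other than ai/qwen3-coder is purely illustrative):

# Everyday Model Runner workflow, mirroring familiar Docker commands
docker model pull ai/qwen3-coder     # fetch a model from the Docker Hub AI Catalog
docker model list                    # see what is available locally
docker model inspect ai/qwen3-coder  # view model metadata
docker model run ai/qwen3-coder      # start an interactive chat session
docker model rm ai/qwen3-coder       # remove it when you are done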

Using Local Models in Your Daily Workflows

If you’re using a laptop or desktop as your primary machine, the DGX Spark can act as your remote model host. With a few SSH tunnels, you can both access the Model Runner API and monitor GPU utilization via the DGX dashboard, all from your local workstation.

1. Forward the DMR Port (for Model Access)

To access the DGX Spark over SSH, first set up an SSH server on it:


sudo apt install openssh-server
sudo systemctl enable --now ssh

Run the following command to access Model Runner from your local machine. Replace user with the username you configured when you first booted the DGX Spark, and replace dgx-spark.local with the DGX Spark’s IP address on your local network or a hostname configured in /etc/hosts.

ssh -N -L localhost:12435:localhost:12434 user@dgx-spark.local


This forwards the Model Runner API from the DGX Spark to your local machine.
Now, in your IDE, CLI tool, or app that expects an OpenAI-compatible API, just point it to:

http://localhost:12435/engines/v1/models

Set the model name (e.g. ai/qwen3-coder) and you’re ready to use local inference seamlessly.
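
Before wiring up your tools, a quick way to confirm the tunnel works is to hit the models endpoint from your laptop; you should see ai/qwen3-coder in the JSON response:

# From your laptop: list the models exposed through the SSH tunnel
curl http://localhost:12435/engines/v1/models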

2. Forward the DGX Dashboard Port (for Monitoring)

The DGX Spark exposes a lightweight browser dashboard showing real-time GPU, memory, and thermal stats, usually served locally at:

http://localhost:11000

You can forward it through the same SSH session or a separate one:

ssh -N -L localhost:11000:localhost:11000 user@dgx-spark.local

Then open http://localhost:11000 in your browser on your main workstation to monitor the DGX Spark performance while running your models.
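
If you prefer a single tunnel for both the API and the dashboard, you can combine the forwards in one command (or add equivalent LocalForward entries to your ~/.ssh/config):

# Forward the Model Runner API and the DGX dashboard over one SSH session
ssh -N \
  -L localhost:12435:localhost:12434 \
  -L localhost:11000:localhost:11000 \
  user@dgx-spark.local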

[Image: Nvidia 2]



This combination makes the DGX Spark feel like a remote, GPU-powered extension of your development environment. Your IDE or tools still live on your laptop, while model execution and resource-heavy workloads happen securely on the Spark.

Example application: Configuring OpenCode with Qwen3-Coder


Let’s make this concrete.

Suppose you use OpenCode, an open-source, terminal-based AI coding agent.

Once your DGX Spark is running Docker Model Runner with ai/qwen3-coder pulled and the port forwarded, you can configure OpenCode by adding the following to ~/.config/opencode/opencode.json (the baseURL points at DMR’s OpenAI-compatible endpoint tunneled to your local machine):

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Docker Model Runner",
      "options": {
        "baseURL": "http://localhost:12435/engines/v1"   // DMR’s OpenAI-compatible base
      },
      "models": {
        "ai/qwen3-coder": { "name": "Qwen3 Coder" }
      }
    }
  },
  "model": "ai/qwen3-coder"
}


Now run opencode and select Qwen3 Coder with the /models command:

[Image: Nvidia 3]


That’s it! Completions and chat requests will be routed through Docker Model Runner on your DGX Spark, meaning Qwen3-Coder now powers your agentic development experience locally.

[Image: Nvidia 4]


You can verify that the model is running by opening http://localhost:11000 (the DGX dashboard) to watch GPU utilization in real time while coding.
This setup lets you:

  • Keep your laptop light while leveraging the DGX Spark GPUs
  • Experiment with custom or fine-tuned models through DMR
  • Stay fully within your local environment for privacy and cost-control

Summary

Running Docker Model Runner on the NVIDIA DGX Spark makes it remarkably easy to turn powerful local hardware into a seamless extension of your everyday Docker workflow.

  • You install one plugin and use familiar Docker commands (docker model pull, docker model run).
  • You get full GPU acceleration through NVIDIA’s container runtime.
  • You can forward both the model API and the monitoring dashboard to your main workstation for effortless development and visibility.

This setup bridges the gap between developer productivity and AI infrastructure, giving you the speed, privacy, and flexibility of local execution with the reliability and simplicity Docker provides.

As local model workloads continue to grow, the DGX Spark + Docker Model Runner combo represents a practical, developer-friendly way to bring serious AI compute to your desk — no data center or cloud dependency required.

Learn more:

  • Read the official announcement of DGX Spark launch on NVIDIA newsroom
  • Check out the Docker Model Runner General Availability announcement
  • Visit our Model Runner GitHub repo. Docker Model Runner is open-source, and we welcome collaboration and contributions from the community! Star, fork and contribute.


VS Code now has a Marketplace for MCP Servers!? and more... - Developer News 41/2025

From: Noraa on Tech
Duration: 2:46
Views: 2

The #ai Model Context Protocol servers can now be directly installed from #vscode

00:00 Intro
00:11 Visual Studio Code
01:12 Github
02:04 Ticker

-----

Links

Visual Studio Code
• September 2025 (version 1.105) - https://code.visualstudio.com/updates/v1_105?WT.mc_id=MVP_274787
GitHub
• Upcoming changes to GitHub Dependabot pull request comment commands - https://github.blog/changelog/2025-10-07-upcoming-changes-to-github-dependabot-pull-request-comment-commands/
• GitHub now supports social login with Apple - https://github.blog/changelog/2025-10-07-github-now-supports-social-login-with-apple/
• Grok Code Fast 1 is now available in Visual Studio, JetBrains IDEs, Xcode, and Eclipse - https://github.blog/changelog/2025-10-06-grok-code-fast-1-is-now-available-in-visual-studio-jetbrains-ides-xcode-and-eclipse/
• Anthropic’s Claude Sonnet 4.5 is now generally available in GitHub Copilot - https://github.blog/changelog/2025-10-13-anthropics-claude-sonnet-4-5-is-now-generally-available-in-github-copilot/
• Upcoming deprecation of Claude Sonnet 3.5 - https://github.blog/changelog/2025-10-07-upcoming-deprecation-of-claude-sonnet-3-5/
Other
• Introducing Scratch Membership! - https://scratch.mit.edu/discuss/topic/843390/
• Announcing the new Azure DevOps Server RC Release - https://devblogs.microsoft.com/devops/announcing-the-new-azure-devops-server-rc-release/?WT.mc_id=MVP_274787
• Now open for building: Introducing Gemini CLI extensions - https://blog.google/technology/developers/gemini-cli-extensions/

-----

🐦X: https://x.com/theredcuber
🐙Github: https://github.com/noraa-junker
📃My website: https://noraajunker.ch


Is Context Engineering the Future of AI Development

Discover how Context Engineering is reshaping the future of AI development by giving large language models memory, reasoning, and awareness across applications, industries, and enterprise systems.

How Do LLMs Use Context to Generate Better Responses

Learn how large language models (LLMs) like GPT-5 and Gemini use context to understand intent, maintain conversation flow, and deliver accurate, human-like responses in AI systems.

Primed: Should You Hype Your AI Before You Start?


I’ve been having some amazing successes with agentic AI and coding lately. A run-down post on these would be interesting. You can find one write-up at Goodbye Wordpress, thanks AI. With this as the backdrop, let me ask you a question.

Should you amp up your AI at the start of a project?

We’ve heard the debates about whether you should be polite and thank your AI. I think yes, but never as a separate follow-up comment. But should we try to get it HYPED and excited to work on the project?

I just published my first solo book. One of the challenges is that on the buy page there is a simple pair of images/buttons: Buy on Gumroad and Buy on Amazon. Things are rarely as simple as they seem. The Kindle version of the book is available in 12 different locales (US, Canada, Germany, etc.). Each one of these is a different URL based on the location of the web visitor! That means the website should adapt to each visitor and point them at their store if possible. To accomplish this, I combined some magic from GeoIpLite, diskcache, and a text file full of links to various Amazon stores’ listings of my book. Rather than grinding through this, I asked Cursor and Claude Sonnet 4.5 to help make that page dynamic.

I am finding huge success if I have a top-tier model work with me to create a detailed, reviewed plan. Then have the AI work step by step through the plan.

I was excited for the book and wanted my coding buddy to share in my excitement!

And it delivered! “I’m absolutely ready to rock! 🎸”

I’m really enjoying this. I think going forward, each time I kick off one of these detailed, plan-based projects, I am going to hype my AI coding buddy. Even if it doesn’t make a difference, it makes it more fun. :)

BTW, I’m working on an Agentic Coding course at Talk Python. Join the mailing list if that sounds interesting. There will be plenty of excitement there too.

Cheers
Michael


Daily Reading List – October 13, 2025 (#647)


I never seem to get my demo apps working on the first pass, but it always turns out to be a (painful) blessing in disguise. Instead of taking a couple of hours to build an agent demo, it took a couple of weeks. But I was forced to read source code, experiment, and learn so much more than if it worked the first time. I’ll post my experiences tomorrow.

[blog] F*ck it and Let it Rip. Try your hardest and have fun. A performance approach mindset is the way to go.

[article] Becoming an AI-first business requires a total organizational mindshift. It’s true and many won’t make it. Not because they’re not smart, but because it takes a level of acceptable recklessness to institute the change.

[blog] I’m in Vibe Coding Hell. I liked the points here. There’s a new challenge for self-learners who used to be dependent on the tutorial to get work done; now they’re dependent on their AI tool.

[blog] Predictions 2026: Tech Leadership Will Be Wild — Bring Your Surfboard, Your Calculator, And Maybe A Clone. Yah, I can’t imagine being a team or organization leader in tech next year. Wait a minute.

[article] Salesforce bets on AI ‘agents’ to fix what it calls a $7 billion problem in enterprise software. The unique circus of Dreamforce is going on this week, so expect all sorts of announcements. Here’s one about the new AgentForce 360.

[blog] Quantum computing 101 for developers. My boss is deep into this, but I’ve only stayed peripherally aware. But I thought this was a good article for bringing folks up to speed.

[article] Java or Python for building agents? A silly question twelve months ago, not so much today.

[blog] Agents That Prove, Not Guess: A Multi-Agent Code Review System. It’s tempting to just dump a single prompt or pile of context into an agent and want something good back. But Ayo shows a better approach if you care about repeatability and transparency.

[paper] Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models. Nonstop research and experimentation into making AI models more trustworthy and useful.

[blog] What’s 🔥 in Enterprise IT/VC #467. This is one of my favorite weekend reads. Doesn’t hurt that I showed up in this one.

[blog] The Architect’s Dilemma. Good look at how you’d decide between using tools in your agent architecture, or use agents talking to agents.

Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:


