The world of local AI is moving at an incredible pace, and at the heart of this revolution is llama.cpp—the powerhouse C++ inference engine that brings Large Language Models (LLMs) to everyday hardware (and it’s also the inference engine that powers Docker Model Runner). Developers love llama.cpp for its performance and simplicity. And we at Docker are obsessed with making developer workflows simpler.
That’s why we’re thrilled to announce a game-changing new feature in llama.cpp: native support for pulling and running GGUF models directly from Docker Hub.
This isn’t about running llama.cpp in a Docker container. This is about using Docker Hub as a powerful, versioned, and centralized repository for your AI models, just like you do for your container images.
Managing AI models can be cumbersome. You’re often dealing with direct download links, manual version tracking, and scattered files. By integrating with Docker Hub, llama.cpp leverages a mature and robust ecosystem to solve these problems.
This new feature cleverly uses the Open Container Initiative (OCI) specification, which is the foundation of Docker images. The GGUF model file is treated as a layer within an OCI manifest, identified by a special media type like application/vnd.docker.ai.gguf.v3. For more details on why the OCI standard matters for models, check out our blog.
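If you're curious what such a manifest looks like, you can inspect it with the regular Docker CLI. A minimal sketch, assuming your Docker CLI version can read OCI artifact manifests from Docker Hub; the GGUF file should appear as a layer carrying the media type mentioned above:

# Fetch and print the OCI manifest for a model repository
docker manifest inspect ai/gemma3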
When you use the new --docker-repo flag, llama.cpp performs the following steps:

1. Resolves the model name to a Docker Hub repository, filling in the default ai/ organization and :latest tag if you don't specify them.
2. Fetches the OCI manifest for that repository from the registry.
3. Finds the GGUF file among the manifest's layers by its media type.
4. Downloads the layer, logging progress, and caches it locally so later runs reuse the file.

This entire process is seamless and happens automatically in the background.
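If you want to see what actually landed on disk, the downloaded model ends up in llama.cpp's local cache. A quick sketch, assuming a Linux setup where the cache lives under ~/.cache/llama.cpp (the exact location depends on your platform and build):

# List the locally cached model files (cache path is an assumption; it varies by platform)
ls ~/.cache/llama.cpp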
Ready to try it? If you have a recent build of llama.cpp, you can serve a model from Docker Hub with one simple command. The new flag is --docker-repo (or -dr).
Let’s run gemma3, a model available from Docker Hub.
# Now, serve a model from Docker Hub!
llama-server -dr gemma3
The first time you execute this, you’ll see llama.cpp log the download progress. After that, it will use the cached version. It’s that easy! The default organization is ai/, so gemma3 is resolved to ai/gemma3. The default tag is :latest, but a tag can be specified like :1B-Q4_K_M.
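For example, you can spell out both the organization and the tag, then talk to the running server over llama-server's OpenAI-compatible HTTP API (it listens on port 8080 by default; adjust the host or port if you've configured them differently):

# Pin an explicit organization and quantization tag instead of the ai/ and :latest defaults
llama-server -dr ai/gemma3:1B-Q4_K_M

# In another terminal, send a chat request to the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'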
For a complete Docker-integrated experience with OCI push and pull support, try out Docker Model Runner. The Docker Model Runner equivalent for chatting is:
# Pull, serve, and chat with a model from Docker Hub!
docker model run ai/gemma3
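If you'd rather pull ahead of time (say, in a CI step) and then check what's available locally, Docker Model Runner also provides pull and list subcommands:

# Download the model without starting a chat, then list local models
docker model pull ai/gemma3
docker model list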
This integration represents a powerful shift in how we think about distributing and managing AI artifacts. By using OCI-compliant registries like Docker Hub, the AI community can build more robust, reproducible, and scalable MLOps pipelines.
This is just the beginning. We envision a future where models, datasets, and the code that runs them are all managed through the same streamlined, developer-friendly workflow that has made Docker an essential tool for millions.
Check out the latest llama.cpp to try it out, and explore the growing collection of models on Docker Hub today!