
Treat test code like production code


You have to read and maintain test code, too.

I don't think I've previously published an article with the following simple message, which is clearly an omission on my part. Better late than never, though.

Treat test code like production code.

You should apply the same coding standards to test code as you do to production code. You should make sure the code is readable, well-factored, goes through review, etc., just like your production code.

Test mess #

It's not uncommon to encounter test code that has received a stepmotherly treatment. Such test code may still pay lip service to an organization's overall coding standards by having correct indents, placement of brackets, and other superficial signs of care. You don't have to dig deep, however, before you discover that the quality of the test code leaves much to be desired.

The most common problem is a disregard for the DRY principle. Duplication abounds. It's almost as though people feel unburdened by the shackles of good software engineering practices, and as a result relish the freedom to copy and paste.

That freedom is, however, purely illusory. We'll return to that shortly.

Perhaps the second-most common category of poor coding practices applied to test code is the high frequency of Zombie Code, that is, commented-out code.

Other, less frequent examples of bad practices include arbitrary waits instead of proper thread synchronization, unwrapping of monadic values (such as calling Task.Result instead of properly awaiting it), and so on.
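To make just one of these smells concrete: instead of guessing how long a background operation takes, a test can wait on an explicit signal. Here is a minimal Python sketch of the fix (my own illustration; the article's examples come from the .NET world, and the Worker class here is hypothetical):

import threading

class Worker:
    """Hypothetical system under test: does some work on a background
    thread and invokes registered callbacks when it finishes."""
    def __init__(self):
        self._callbacks = []

    def on_done(self, callback):
        self._callbacks.append(callback)

    def start(self):
        def run():
            # ... real work would happen here ...
            for callback in self._callbacks:
                callback()
        threading.Thread(target=run).start()

def test_worker_completion_with_proper_synchronization():
    # Instead of time.sleep(2) and hoping for the best, block on an
    # explicit signal, with a timeout only as a safety net.
    worker = Worker()
    done = threading.Event()
    worker.on_done(done.set)
    worker.start()
    assert done.wait(timeout=5), "worker did not signal completion within 5 seconds"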

I'm sure that you can think of other examples.

Why good code is important #

I think that I can understand why people treat test code as a second-class citizen. It seems intuitive, although the intuition is wrong. Nevertheless, I think it goes like this: Since the test code doesn't go into production, it's seen as less important. And as we shall see below, there are, indeed, a few areas where you can safely cut corners when it comes to test code.

As a general rule, however, it's a bad idea to slack on quality in test code.

The reason lies in why we even have coding standards and design principles in the first place. Here's a hint: It's not to placate the computer.

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."

The reason we do our best to write code of good quality is that if we don't, it's going to make our work more difficult in the future. Either our own, or someone else's. But frequently, our own.

Forty (or fifty?) years of literature on good software development practices grapple with this fundamental problem. This is why my most recent book is called Code That Fits in Your Head. We apply software engineering heuristics and care about architecture because we know that if we fail to structure the code well, our mission is in jeopardy: We will not deliver on time, on budget, or with working features.

Once we understand this, we see how this applies to test code, too. If you have good test coverage, you will likely have a substantial amount of test code. You need to maintain this part of the code base too. The best way to do so is to treat it like your production code. Apply the same standards and design principles to test code as you do to your production code. This especially means keeping test code DRY.
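To illustrate what DRY test code can look like, here is a minimal pytest-style sketch (my own illustration in Python; the ReservationService and the make_service helper are hypothetical stand-ins). A single well-named factory replaces the arrangement code that would otherwise be copied, slightly differently, into every test:

class ReservationService:
    """Hypothetical production code under test."""
    def __init__(self, capacity, existing_reservations=()):
        self.capacity = capacity
        self.reserved = sum(existing_reservations)

    def accept(self, quantity):
        return self.reserved + quantity <= self.capacity

def make_service(capacity=10, existing_reservations=()):
    # One well-named factory instead of copy-pasted setup in every test.
    return ReservationService(capacity, existing_reservations)

def test_accepts_reservation_when_capacity_is_available():
    sut = make_service(capacity=10)
    assert sut.accept(quantity=4)

def test_rejects_reservation_when_capacity_is_exceeded():
    sut = make_service(capacity=10, existing_reservations=(8,))
    assert not sut.accept(quantity=4)

If the construction of the service later changes, only make_service needs to change, not every test.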

Test-specific practices #

Since test code has a specialized purpose, you'll run into problems unique to that space. How should you structure unit tests? How should you organize them? How should you name them? How do you make them deterministic?

Fortunately, thoughtful people have collected and systematized their experience. The absolute most comprehensive such collection is xUnit Test Patterns, which has been around since 2007. Nothing in that book invalidates normal coding practices. Rather, it suggests specializations of good practices that apply to test code.

You may run into the notion that tests should be DAMP rather than DRY. If you expand the acronym, however, it stands for Descriptive And Meaningful Phrases, and you may realize that it's a desired quality of code independent of whether or not you repeat yourself. (Even the linked article fails, in my opinion, to erect a convincing dichotomy. Its notion of DRY is clearly not the one normally implied.) I think of the DAMP notion as related to Domain-Driven Design, which is another thematic take on making code fit in your head.

For a few years, however, I, too, believed that copy-and-paste was okay in test code, but I have long since learned that duplication slows you down in test code for exactly the same reason that it hurts in 'real' code. One simple change leads to Shotgun Surgery: many tests break, and you have to fix each one individually.

Dispensations #

All the same, there are exceptions to the general rule. In certain, well-understood ways, you can treat your test code with less care than production code.

Specifically, assuming that test code remains undeployed, you can skip certain security practices. You may, for example, hard-code test-only passwords directly in the tests. The code base that accompanies Code That Fits in Your Head contains an example of that.

You may also skip input validation steps, since you control the input for each test.

In my experience, security is the dominant exemption from the rule, but there may be other language- or platform-specific details where deviating from normal practices is warranted for test code.

One example may be in .NET, where a static code analysis rule may insist that you call ConfigureAwait. This rule is intended for library code that may run in arbitrary environments. When code runs in a unit-testing environment, on the other hand, the context is already known, and this rule can be dispensed with.

Another example is that in Haskell GHC may complain about orphan instances. In test code, it may occasionally be useful to give an existing type a new instance, most commonly an Arbitrary instance. While you can also get around this problem with well-named newtypes, you may also decide that orphan instances are no problem in a test code base, since you don't have to export the test modules as reusable libraries.

Conclusion #

You should treat test code like production code. The coding standards that apply to production code should also apply to test code. If you follow the DRY principle for production code, you should also follow the DRY principle in the test code base.

The reason is that most coding standards and design principles exist to make code easier to maintain. Since test code is also code, this still applies.

There are a few exceptions, most notably in the area of security, assuming that the test code is never deployed to production.


This blog is totally free, but if you like it, please consider supporting it.

What’s new at Stack Overflow: December 2025

Including a new MCP server, expanded access to a new question type, a long-requested community ask to make copying code easier, and more!

Cutting Through the Noise: Smarter Context Management for LLM-Powered Agents


Imagine you’re working on a project and jotting down every single idea, experiment, and failure. After a while, your notes pile up so high that finding what’s useful takes more time and energy than the work itself. A similar problem faces users of software engineering (SE) agents: the agents “take notes” on every generated output, iteratively adding the information to their context; this creates massive – and expensive – memory logs.

Huge contexts can be a problem for a couple of reasons. For one, AI models are priced per word (token), and as the context grows, so does the number of tokens you pay for. Allowing the context to grow without intervention also risks quickly exceeding the context window of modern LLMs. In addition, an agent's effective context size is, in reality, quite small (see this paper and this paper).

This means that agent-generated context actually quickly turns into noise instead of being useful information. Another way to look at it: agent contexts grow so rapidly that they become very expensive, yet do not deliver significantly better downstream task performance. Currently, we are wasting resources for a suboptimal return on investment.

If growing contexts are problematic, what measures are being taken to manage them? Surprisingly little, considering the consequences. So far, the focus has been on enhancing the agent’s planning capabilities through strategies such as scaling training data and environments (e.g. papers 1 and 2), as well as enhanced planning and search-efficient strategies (e.g. papers 1 and 2).

However, there is still a significant gap in the research on efficiency-based context management. Our researchers have addressed this gap with an empirical study of the major approaches in efficiency-based context management, plus a novel hybrid approach that achieves significant cost reduction. This research is part of Tobias Lindenbauer’s Master’s thesis at TUM’s Software Engineering and AI Lab. We will present our insights at the Deep Learning 4 Code workshop, part of the NeurIPS 2025 Conference in San Diego on December 6th, 2025.

In this post, we will describe:

  • The two main approaches to context management: observation masking and LLM summarization.
  • Our experiment and its results comparing these two approaches against a baseline.
  • Our hybrid solution and the broader application of our study.

Context management approaches

When AI agents work on complex coding tasks, they need to remember what they’ve done before, like which files they’ve read, what code they’ve tested, and how they’ve reasoned about errors. This “memory” is also known as context, and it helps the agents reason more effectively. However, managing that memory efficiently is a balancing act between giving the AI enough to think clearly and not overwhelming it with unnecessary clutter. 

Recently, several studies have taken a closer look at how the size of an AI’s context window affects its performance (e.g. this 2024 study and this 2025 one). These papers consistently show that as the context grows, language models often struggle to make good use of all the information they’re given. Even though context management plays a huge role in both how well agents perform and how costly they are to run, most research still treats it as more of an engineering detail than a core research problem.

In the current state of the art, there are two main approaches to the context management challenge. Note that the first approach is both the more recent and the more sophisticated one; OpenHands introduced it, and it is currently used in Cursor's and Warp's (proprietary) SE agent solutions. The second approach is a bit older and is the simpler of the two.

  • LLM summarization: another AI model generates short summaries 
  • Observation masking: older, less important bits of information are hidden

Both approaches preserve important context, which is fundamentally why they both work. The key difference is in how they do it. The following image depicts the difference, and the text below goes into further detail.

Context management strategies. Figure based on Lindenbauer et al. (2025), p. 4.

On the left-hand side of the image, we can see the default process (raw agent), with prompts omitted for simplicity. Each turn, represented by T1 and T2 in the left margin, comprises three parts: reasoning, action, and observation.

Depicted in the middle of the above image is LLM summarization. It reduces the resolution of all three parts of the involved turns by essentially compressing the long history that is generated (in other words, the trajectory) into a compact form. The yellow-framed square represents the summary of the first two turns, T1 and T2.

On the right-hand side of the image, we can see how observation masking targets the environment observation only, while preserving the action and reasoning history in full. Only the third part of each turn is hidden by a mask, shown here in green. Considering that a typical SE agent's turn heavily skews towards observation, it makes sense for this approach to reduce the resolution of only this specific turn element. The agent still has access to its past reasoning and decisions, but no longer reprocesses huge chunks of verbose text from earlier turns, such as test logs or full file reads.
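As a concrete, simplified illustration of observation masking, here is a minimal Python sketch. It reflects my own reading of the idea rather than SWE-agent's actual implementation, and the turn structure is an assumption:

PLACEHOLDER = "[old observation omitted for brevity]"

def mask_observations(turns, window=10):
    """Keep reasoning and actions intact; replace observations that fall
    outside the most recent `window` turns with a short placeholder."""
    cutoff = len(turns) - window
    return [
        {**turn, "observation": PLACEHOLDER} if i < cutoff else turn
        for i, turn in enumerate(turns)
    ]

# Example: a 50-turn trajectory in which only the last 10 observations
# survive in full; all reasoning and actions remain untouched.
trajectory = [
    {"reasoning": f"thought {i}", "action": f"run_tests --case {i}",
     "observation": "very long tool output..." * 100}
    for i in range(50)
]
context = mask_observations(trajectory, window=10)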

An additional difference between the two approaches concerns infinite contexts. LLM summarization theoretically allows the number of turns to scale indefinitely without the context scaling with it, because repeated summarization keeps the context within a large enough window. Observation masking, on the other hand, significantly slows the rate at which the context grows, but the context can still grow without bound if the number of turns is allowed to grow infinitely.

The following table presents the advantages and disadvantages of each approach.

Context management table

Recently, a few other researchers have developed context management tools and analyzed their performance with respect to efficiency. Studies in this domain include the following:

  1. MEM1, which explored dynamic state management for tasks like multi-hop question answering and web navigation. Still, that work didn’t compare against simpler omission-based methods like observation masking, and the benchmarks they used were relatively short and lightweight (just a few hundred tokens), unlike the far longer trajectories seen in SE agents. Note that this approach involves training the model.
  2. A variant of the LLM summarization approach to help SE agents manage their context more efficiently. However, they didn’t include a comparison with the simpler observation masking method. Their closest alternative, called the Delete baseline, drops entire dialogue turns instead of summarizing them. That might sound efficient, but since these agents mostly interact with their environments, deleting whole turns can interrupt their reasoning, as well as hurt performance.
  3. An approach using observation masking worked remarkably well for reinforcement learning and inference in both deep research and computer-use agents. Note that this approach involves training the model.

Note: While the first approach preceded our study, the other two were published after it was completed.

While the above studies deliver interesting results on efficient context management, there’s still a lot to learn about the optimal strategies of managing context efficiently for agents. In our study, we questioned whether complex summarization strategies are really necessary to keep things efficient. To explore this, we ran experiments using SWE-agent and OpenHands to see how simpler techniques stack up, which we will discuss in the next section.

Our empirical look at current context management approaches

The experiments put three memory strategies to the test. The strategies are the following, where the first is a baseline and the other two are the main objects of investigation.

  1. Letting memory grow unchecked – raw agent
  2. Trimming old observations with placeholders – observation masking
  3. Using a separate AI to summarize past steps – LLM summarization

As a baseline for the experiment, we looked at raw agent scaffolds using either ReAct or CodeAct. In these frameworks, the agent’s trajectory is a sequence of interactions with an environment, and the context is unbounded.

For the two main objects of our investigation, we analyzed the popular approaches through the following representative open-source implementations:

  • Environment observation masking through a rolling window (SWE-agent), which:
    • Keeps the agent’s reasoning and actions intact.
    • Replaces older observations with placeholders once they fall outside a fixed window, essentially telling the model, “some details omitted for brevity.”
  • Prompt-based LLM summarization (OpenHands), which:
    • Uses a separate summarizer language model to compress older interactions (i.e. observations, actions, and reasoning) into summaries.
    • Does not alter the most recent turns.

Based on preliminary experiments, we learned that to fairly compare the bounded property of LLM summarization to the slowed but unbounded growth of observation masking, we need to work with long-horizon task trajectories (described in our paper’s appendix). For the experiments reported here, we let our agents run for up to 250 turns. When it came to observation masking, we found that keeping a window of the latest 10 turns gave us the best balance between performance and efficiency. For LLM summarization, we summarized 21 turns at a time, always retaining the most recent 10 turns in full. 

Furthermore, we used state-of-the-art models, both open-weight (Qwen3) and proprietary (Gemini 2.5 Flash) models, ranging in size from 32B to 480B; we also covered both thinking and non-thinking regimes. All experiments were run on SWE-bench Verified, with 500 instances each. For more details on the configuration, see Section 3 of the paper.

Observation masking is simple but effective

As described in the previous section, our experiments investigated three memory strategies: the first is a baseline, and the other two are the main objects of investigation. The main result of our experiments is that observation masking outperforms LLM summarization in terms of overall efficiency and reliability.

  1. Letting memory grow unchecked – raw agent
  2. Trimming old observations with placeholders – observation masking
  3. Using a separate AI to summarize past steps – LLM summarization

Both approaches (2) and (3) consistently cut costs by over 50% compared to (1), which leaves the agent’s memory unmanaged. These efficiency-first context management approaches drastically reduce context growth, all without hurting the agents’ ability to solve problems.

Surprisingly, the simple approach of observation masking wasn’t just cheaper; it often matched or even slightly beat LLM summarization in solving benchmark tasks. In four out of five test settings, agents using observation masking paid less per problem and often performed better. 

For instance, with the powerful Qwen3-Coder 480B model, observation masking boosted solve rates by 2.6% compared to leaving the context unmanaged, while being 52% cheaper on average. In other situations, LLM summarization matched these results but rarely surpassed them. Even a slight difference – say, three cents per problem – can add up for large-scale AI deployments.

Agent-specific differences in performance

We also wanted to know whether the insights from the above experiments generalize to a different agent, using the same parameters but a smaller subset. To investigate, we ran extra tests using the OpenHands agent with both context management approaches on a slice of the SWE-bench Verified benchmark (50 instances instead of 500). In addition, the agent was limited to Gemini 2.5 Flash in non-thinking mode.

The extra tests involved applying the window value that was optimal for SWE-agent to the OpenHands agent, observing significant differences in performance, and then adjusting the parameter, which brought performance back up to standard. We found that observation masking matched LLM summarization in both cost savings and problem-solving ability – but only after tuning the masking "window" hyperparameter for each agent scaffold.

What does it mean to tune the masking window hyperparameter? Every agent keeps its conversation history differently. For the tools in our study, SWE-agent (using observation masking) skips over failed retry turns, while OpenHands (using LLM summarization) includes them all. 

Given these parameters, i.e. a context window limited to the last 10 turns, the differences in conversation history can be immense. If the agent were to fail for all 10 turns in a row, the OpenHands agent's context window would contain only erroneous observations. This can be quite problematic, potentially derailing the agent.

A larger window size is therefore necessary for agents like OpenHands, so that performance is not negatively affected. By tuning the window size, it is possible to account for such agent-specific differences and increase the window when necessary, as in the case of OpenHands.

LLM summarization and trajectory elongation

One thing we really wanted to know was why LLM summarization, despite being the more sophisticated approach, was often less cost-effective than plain old observation masking. Turns out, there’s an interesting twist: when agents use the summary approach, they tend to run for more steps, which means bigger bills and less efficiency in the end.

When we dug into the data, we found that for models like Gemini 2.5 Flash, using LLM summarization led to agents running for an average of 52 turns, a whopping 15% longer than with observation masking. The same thing happened with Qwen3-Coder 480B: summaries stretched out the process by about 15% compared to letting the agent just handle its raw history, and 13% longer compared to masking. 

Why? It seems like LLM-generated summaries may actually smooth over, or hide, signs indicating that the agent should already stop trying to solve the problem. This means it often keeps going past the point where it would normally stop – and where it would make sense to stop (see this paper on solve-rate plateaus).

So, even though summaries are designed to make things neater, they can backfire by encouraging the agent to keep chugging along, using more resources without solving problems any better. It’s a good reminder that sometimes, keeping things simple not only works, but works more efficiently.

What is more, every time the agent asks an LLM for a summary, it fires off a costly API call to process a long chunk of the trajectory, racking up additional charges. These summary-generation calls aren’t cheap, sometimes making up more than 7% of the total cost per instance, especially for the largest models. Because each LLM summarization API call has to work through a unique slice of the agent’s history, there’s very little cache reuse. When these summary costs are subtracted out, the gap in efficiency between LLM summarization and observation masking shrinks dramatically in most cases.

Both smart and simple strategies halved costs versus doing nothing, but simplicity often takes the prize for total efficiency and reliability. So, even though summarization sounds smart, in practice, it’s extra costly and doesn’t reliably outperform the simpler masking approach. This suggests that many current AI agents could reduce costs if they dialed down their reliance on summary calls or developed creative hybrid strategies that effectively combine both methods.

Our hybrid solution for even more efficiency

After seeing how both observation masking and LLM summarization held their own, with masking often as good as (or better than) summarizing, we wanted to see if combining the best parts of both would lead to even more savings. The fact that the approaches have complementary strengths, as shown in the table above, is a further argument for merging them.

Our hybrid approach works by relying on observation masking as the agent’s first line of defense against context bloat. We designed it this way because observation masking is fast and inexpensive: it hides old, noisy tool outputs with placeholders, keeping only the most relevant part of the agent’s work as it moves forward.

But instead of ignoring summaries altogether, the hybrid system occasionally uses LLM summarization as a last resort, creating a short, AI-generated recap of the full story when the context starts getting truly unwieldy. In our setup, masking handles most steps, and LLM summarization is triggered only after a large batch of turns has accumulated; the exact threshold is a tuned hyperparameter.
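A minimal sketch of that policy might look as follows. This is my own paraphrase of the setup rather than the paper's code; the summarize callable stands in for a call to the summarizer LLM, and the default thresholds are placeholders for the tuned hyperparameters discussed above:

PLACEHOLDER = "[old observation omitted for brevity]"

def mask(turns, window):
    # Hide observations older than the rolling window; keep reasoning/actions.
    cutoff = len(turns) - window
    return [
        {**t, "observation": PLACEHOLDER} if i < cutoff else t
        for i, t in enumerate(turns)
    ]

def manage_context(turns, summarize, mask_window=10, summarize_after=50):
    # Cheap path: observation masking handles most steps.
    if len(turns) <= summarize_after:
        return mask(turns, window=mask_window)
    # Rare, expensive path: compress everything except the recent tail
    # into a single summary turn produced by the summarizer LLM.
    old, recent = turns[:-mask_window], turns[-mask_window:]
    summary_turn = {"reasoning": summarize(old), "action": "", "observation": ""}
    return [summary_turn] + recent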

Advantages of our novel hybrid approach include: 

  1. It lets the agent rack up quick savings from observation masking, especially right at the beginning when new problems are still short, and context isn’t a problem. 
  2. It ensures that even for super-long or complex jobs, the occasional LLM summarization step prevents memory from spiraling out of control, without spending extra on summary generation during simple tasks.
  3. Because our approach does not involve training a model (i.e. changing weights), we can retrofit any existing model, including GPT-5 and Claude, with this approach. This means immediate savings, even on models that you cannot train yourself. To the best of our knowledge, concurrent approaches lack this applicability.

As we know from above, hyperparameter tuning is important, and this is relevant for our hybrid approach as well. Specifically, we tuned the window for masking and the number of turns before summarizing, adjusting both for each specific type of agent or job. When we reused settings that worked for one agent in another setup, we didn’t always get the best results.

Beyond tuning, we rigorously tested our approach, and the numbers support our claims. In tests with the demanding SWE-bench Verified benchmark and the super-sized Qwen3-Coder 480B model, the hybrid technique reduced costs by 7% compared to pure observation masking and by 11% compared to using only LLM summarization. It also nudged the percentage of successful answers up by about 2.6 points, all while saving a meaningful chunk of money, up to USD 35 across the entire benchmark, which really adds up when agents are running at scale.

Efficient context management

This study took a deep dive into different ways AI agents handle their growing context, testing them across a wide range of models and agent frameworks. On top of that, we were able to consistently reproduce these findings, provided that the trajectories are long enough and special attention is paid to parameter tuning.

The main takeaways from our study are:

  • Ignoring efficiency-based context management means ignoring important cost-saving strategies.
  • If you want AI agents that are both sharp and thrifty, don’t rely on just one context management strategy – our hybrid approach combines the strengths of both observation masking and LLM summarization.

In addition to the paper, we have made our code available online. Check it out and see the difference in context management.


Run Embedding Models and Unlock Semantic Search with Docker Model Runner


Embeddings have become the backbone of many modern AI applications. From semantic search to retrieval-augmented generation (RAG) and intelligent recommendation systems, embedding models enable systems to understand the meaning behind text, code, or documents, not just the literal words.

But generating embeddings comes with trade-offs. Using a hosted API for embedding generation often results in reduced data privacy, higher call costs, and time-consuming model regeneration. When your data is private or constantly evolving (think internal documentation, proprietary code, or customer support content), these limitations quickly become blockers.

Instead of sending data to a remote service, you can easily run local embedding models on-premises with Docker Model Runner. Model Runner brings the power of modern embeddings to your local environment, giving you privacy, control, and cost-efficiency out of the box. 

In this post, you’ll learn how to use embedding models for semantic search. We’ll start by covering the theory behind embedding and why developers should run them. Then, we’ll wrap up with a practical example, using Model Runner, to help you get started.

Understanding semantic search embeddings 

Let’s take a moment to first demystify what embeddings are.

Embeddings represent words, sentences, and even code as high-dimensional numerical vectors that capture semantic relationships. In this vector space, similar items cluster together, while dissimilar ones are farther apart.

For example, a traditional keyword search looks for exact matches. If you search for “authentication”, you’ll only find documents containing that exact term. But with embeddings, searching for “user login” might also surface results about authentication, session management, or security tokens because the model understands that these are semantically related ideas.

This makes embeddings the foundation for more intelligent search, retrieval, and discovery — where systems understand what you mean, not just what you type.

For a deeper perspective on how language and meaning intersect in AI, check out “The Language of Artificial Intelligence”.

How vector similarity enables semantic search with embeddings

Here’s where the math behind semantic search comes in, and it’s elegantly simple.

Once text is converted into vectors (lists of numbers), we can measure how similar two pieces of text are using cosine similarity:

cosine_similarity(A, B) = (A · B) / (‖A‖ × ‖B‖)

Where:

  • A is your query vector (e.g., “user login”),
  • B is another vector (e.g., a code snippet or document).

The result is a similarity score, typically between 0 and 1, where values closer to 1 mean the texts are more similar in meaning.

In practice:

  • A search query and a relevant document will have a high cosine similarity.
  • Irrelevant results will have low similarity.

This simple mathematical measure allows you to rank documents by how semantically close they are to your query, which powers features like:

  • Natural language search over docs or code
  • RAG pipelines that retrieve contextually relevant snippets
  • Deduplication or clustering of related content
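As a sketch of how that ranking works in practice, assuming you already have vectors for the query and for each document, the scoring step is only a few lines of Python (using numpy):

import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: (A · B) / (|A| * |B|)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_similarity(query_vector, documents):
    """Rank (doc_id, vector) pairs by similarity to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vector, vec))
              for doc_id, vec in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)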

With Model Runner, you can generate these embeddings locally, feed them into a vector database (like Milvus, Qdrant, or pgvector), and start building your own semantic search system without sending a single byte to a third-party API.

Why use Docker Model Runner to run embedding models

With Model Runner, you don’t have to worry about setting up environments or dependencies. Just pull a model, start the runner, and you’re ready to generate embeddings, all inside a familiar Docker workflow.

Full data privacy 

Your sensitive data never leaves your environment. Whether you’re embedding source code, internal documents, or customer content, you can rest assured that everything stays local — no third-party API calls, no network exposure.

Zero cost per embedding

There are no usage-based API costs. Once you have the model running locally, you can generate, update, or rebuild your embeddings as often as you need, at no extra cost.

That means iterating on your dataset or experimenting with new prompts won’t affect your budget.

Performance and control

Run the model that best fits your use case, leveraging your own CPU or GPU for inference.

Models are distributed as OCI artifacts, so they integrate seamlessly into your existing Docker workflows, CI/CD pipelines, and local development setups. This means you can manage and version models just like any other container image, ensuring consistency and reproducibility across environments.

Model Runner lets you bring models to your data, not the other way around, unlocking local, private, and cost-effective AI workflows.

Hands-on: Generating embeddings with Docker Model Runner

Now that we understand what embeddings are and how they capture semantic meaning, let’s see how simple it is to generate embeddings locally using Model Runner.

Step 1. Pull the model

docker model pull ai/qwen3-embedding

Step 2. Generate Embeddings

You can now send text to the Model Runner embeddings endpoint via curl or your preferred HTTP client:

curl http://localhost:12434/engines/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/qwen3-embedding",
    "input": "A dog is an animal"
  }'

The response will include a list of embedding vectors, numerical representations of your input text.

You can store these vectors in a vector database like Milvus, Qdrant, or pgvector to perform semantic search or similarity queries.

Example use case: Semantic search over your codebase

Let’s make it practical.

Imagine you want to enable semantic code search across your project repository.

The process will look like:

Step 1. Chunk and embed your code

Split your codebase into logical chunks. Generate embeddings for each chunk using your local Docker Model Runner endpoint.

Step 2. Store embeddings 

Save those embeddings along with metadata (file name, path, etc.). You would usually use a Vector Database to store these embeddings, but in this demo, we’re going to store them in a file for simplicity.

Step 3. Query by meaning

When a developer searches “user login”, you embed the query and compare it to your stored vectors using cosine similarity.
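Tying the three steps together, a minimal sketch of the query step might look like the following. The endpoint, model name, and request payload come from the curl example above; the OpenAI-style response schema and the embeddings.json file written in step 2 are assumptions made for illustration:

import json
import requests
import numpy as np

EMBED_URL = "http://localhost:12434/engines/v1/embeddings"  # Model Runner endpoint from the curl example
MODEL = "ai/qwen3-embedding"

def embed(text):
    # Assumes the response follows the OpenAI-style embeddings schema.
    response = requests.post(EMBED_URL, json={"model": MODEL, "input": text})
    response.raise_for_status()
    return np.array(response.json()["data"][0]["embedding"])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical file from step 2: a mapping of {"path/to/chunk": [floats], ...}.
with open("embeddings.json") as f:
    stored = {path: np.array(vec) for path, vec in json.load(f).items()}

query_vec = embed("user login")
ranked = sorted(stored.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
for path, vec in ranked[:5]:
    print(f"{cosine(query_vec, vec):.3f}  {path}")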

We have included a demo in the Docker Model Runner repository that does exactly that.

Figure 1: Codebase example demo with embeddings stats, example queries, and search results.

Conclusion

Embeddings help applications work with intelligent meaning, not just keywords. The old hassle was wiring up third-party APIs, juggling data privacy, and watching per-call costs creep up.

Docker Model Runner flips the script. Now, you can run embedding models locally where your data lives with full control over your data and infrastructure. Ship semantic search, RAG pipelines, or custom search with a consistent Docker workflow — private, cost-effective, and reproducible. 

No usage fees. No external dependencies. By bringing models directly to your data, Docker makes it easier than ever to explore, experiment, and innovate, safely and at your own pace.

How you can get involved

The strength of Docker Model Runner lies in its community, and there’s always room to grow. We need your help to make this project the best it can be. To get involved, you can:

  • Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.
  • Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!
  • Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Get started with Docker Model Runner

Learn more


Assign Linear issues to Copilot coding agent

From: GitHub
Duration: 0:58
Views: 72

You can now use the GitHub app for Linear to collaborate with GitHub Copilot right inside your Linear issues. This release introduces the Copilot coding agent, built to translate issues into code and pull requests. Find out more at https://gh.io/linear

#GitHub #GitHubCopilot #CopilotCodingAgent



Episode 97: We Wish You A Trader Joe's Holiday Shopping List


Oh by golly there's a lot to discover this holiday season at your neighborhood Trader Joe's! In this episode of Inside Trader Joe's, we're sharing an inside look at just a few – okay, 16, so... several? – of our favorite finds for this festive season. Breakfast goodies and crunchy cookies. A quartet of trios ready for gifting to the foodies on your list (including yourself!). Cheeses for all the occasions. And salads and apps to make every gathering great. Tune in, make a list, then hop on your sleigh and head to TJ's to pick up everything you need (and want!) to make your holidays merry and bright.

Transcript (PDF)





Download audio: https://traffic.libsyn.com/secure/insidetjs/Episode_97_We_Wish_You_A_Trader_Joes_Holiday_Shopping_List.mp3?dest-id=704103