
How to solve context size issues with context packing using Docker Model Runner and Agentic Compose


If you’ve worked with local language models, you’ve probably run into the context window limit, especially when using smaller models on less powerful machines. While it’s an unavoidable constraint, techniques like context packing make it surprisingly manageable.

Hello, I’m Philippe, a Principal Solutions Architect helping customers with their usage of Docker. In my previous blog post, I wrote about how to make a very small model useful by using RAG. I had limited the message history to 2 messages to keep the context length short.

But in some cases, you’ll need to keep more messages in your history. For example, a long conversation to generate code:

- generate an http server in golang
- add a human structure and a list of humans
- add a handler to add a human to the list
- add a handler to list all humans
- add a handler to get a human by id
- etc...

Let’s imagine we have a conversation for which we want to keep 10 messages in the history. Moreover, we’re using a very verbose model (which generates a lot of tokens), so we’ll quickly encounter this type of error:

error: {
    code: 400,
    message: 'request (8860 tokens) exceeds the available context size (8192 tokens), try increasing it',
    type: 'exceed_context_size_error',
    n_prompt_tokens: 8860,
    n_ctx: 8192
  },
  code: 400,
  param: undefined,
  type: 'exceed_context_size_error'
}


What happened?

Understanding context windows and their limits in local LLMs

Our LLM has a context window of limited size. If the conversation becomes too long, the request fails with an error like the one above.

This window is the total number of tokens the model can process at once, like a short-term working memory. Read this IBM article for a deep dive on context windows.

In the code snippet above, this size was set to 8192 tokens; this is the context size configured in the LLM engines that power local LLMs, like Docker Model Runner, Ollama, or llama.cpp.

This window includes everything: system prompt, user message, history, injected documents, and the generated response. Refer to this Redis post for more info. 

Example: if the model has 32k context, the sum (input + history + generated output) must remain ≤ 32k tokens. Learn more here.  
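
To make that arithmetic concrete, here is a minimal sketch (not from the original post) that budgets a 32k window using the same ~4-characters-per-token estimate the assistant code relies on later; the prompt, history, and message values are made up for illustration:

// Rough token estimate: ~4 characters per token (the heuristic used later in index.js).
const estimateTokens = (text) => Math.ceil(text.length / 4);

const contextSize = 32768; // the model's context window

// Hypothetical conversation state, for illustration only.
const systemPrompt = "You are a Golang expert. Answer with working code.";
const history = [
  ["user", "generate an http server in golang"],
  ["assistant", "package main ... // (previously generated code)"],
];
const userMessage = "add a handler to list all humans";

const promptTokens =
  estimateTokens(systemPrompt) +
  history.reduce((acc, [, content]) => acc + estimateTokens(content), 0) +
  estimateTokens(userMessage);

// Whatever is left is all the room the model has for its generated answer.
const roomForOutput = contextSize - promptTokens;
console.log({ promptTokens, roomForOutput });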

It’s possible to change the default context size (up or down) in the compose.yml file:

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m
    # Increased context size for better handling of larger inputs
    context_size: 16384

You can also do this with Docker with the following command: docker model configure --context-size 8192 ai/qwen2.5-coder

This solves the problem, but only partially. It’s not guaranteed that your model supports a larger context size (like 16384), and even if it does, a larger context can quickly degrade the model’s performance.

For example, with hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m, generation can become (much) slower as the number of tokens in the context approaches 16384, at least on my machine. Again, this will depend on the model’s capacity (read its documentation). And remember: the smaller the model, the harder it will be to handle a large context and stay focused.

Tip: always provide a way (a /clear command, for example) to empty or trim the message list in your application, whether automatically or manually. Keep the initial system instructions, though.
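
A minimal sketch of such a helper, assuming the same conversationMemory map and session id used in the assistant code later in this post (the function name is hypothetical):

// Hypothetical helper: wipe the stored history for a session, keeping nothing but
// what gets re-added on each turn (the system instructions and knowledge base).
function clearHistory(conversationMemory, sessionId = "default-session-id") {
  conversationMemory.set(sessionId, []);
  console.log("Conversation history cleared.");
}

// Usage inside the conversation loop, when the user types /clear:
// if (userMessage.trim() === "/clear") { clearHistory(conversationMemory); continue; }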

So we’re at an impasse. How can we go further with our small models?

Well, there is still a solution, which is called context packing.

Using context packing to fit more information into limited context windows

We can’t increase the context size indefinitely. To fit more information into the context anyway, we can use a technique called “context packing”: have the model itself (or another model) summarize the previous messages, then replace the history with that summary, freeing up space in the context.

So we decide that, beyond a certain token limit, we’ll summarize the history of previous messages and replace it with the generated summary.

I’ve therefore modified my example to add a context packing step. For the exercise, I decided to use another model to do the summarization.

Modification of the compose.yml file

I added a new model in the compose.yml file: ai/qwen2.5:1.5B-F16

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

  embedding-model:
    model: ai/embeddinggemma:latest

  context-packing-model:
    model: ai/qwen2.5:1.5B-F16

Then:

  • I added the model in the models section of the service that runs our program.
  • I increased the number of messages in the history to 10 (instead of 2 previously).
  • I set a token limit at 5120 before triggering context compression.
  • And finally, I defined instructions for the “context packing” model, asking it to summarize previous messages.

excerpt from the service:

golang-expert-v3:
  build:
    context: .
    dockerfile: Dockerfile

  environment:
    HISTORY_MESSAGES: 10
    TOKEN_LIMIT: 5120
    # ...

  configs:
    - source: system.instructions.md
      target: /app/system.instructions.md
    - source: context-packing.instructions.md
      target: /app/context-packing.instructions.md

  models:
    chat-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_CHAT

    context-packing-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_CONTEXT_PACKING

    embedding-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_EMBEDDING

You’ll find the complete version of the file here: compose.yml

System instructions for the context packing model

Still in the compose.yml file, I added a new system instruction for the “context packing” model, in a context-packing.instructions.md file:

context-packing.instructions.md:
  content: |
    You are a context packing assistant.
    Your task is to condense and summarize provided content to fit within token limits while preserving essential information.
    Always:
    - Retain key facts, figures, and concepts
    - Remove redundant or less important details
    - Ensure clarity and coherence in the condensed output
    - Aim to reduce the token count significantly without losing critical information

    The goal is to help fit more relevant information into a limited context window for downstream processing.

All that’s left is to implement the context packing logic in the assistant’s code.

 Applying context packing to the assistant’s code

First, I define the connection with the context packing model in the Setup part of my assistant:

// ChatOpenAI comes from LangChain's OpenAI integration (package name may vary by LangChain version)
import { ChatOpenAI } from "@langchain/openai";

const contextPackingModel = new ChatOpenAI({
  model: process.env.MODEL_RUNNER_LLM_CONTEXT_PACKING || `ai/qwen2.5:1.5B-F16`,
  apiKey: "",
  configuration: {
    baseURL: process.env.MODEL_RUNNER_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1/",
  },
  temperature: 0.0,
  top_p: 0.9,
  presencePenalty: 2.2,
});

I also retrieve the system instructions I defined for this model, as well as the token limit:

let contextPackingInstructions = fs.readFileSync('/app/context-packing.instructions.md', 'utf8');

let tokenLimit = parseInt(process.env.TOKEN_LIMIT) || 7168

Once in the conversation loop, I’ll estimate the number of tokens consumed by previous messages, and if this number exceeds the defined limit, I’ll call the context packing model to summarize the history of previous messages and replace this history with the generated summary (the assistant-type message: [“assistant”, summary]). Then I continue generating the response using the main model.

excerpt from the conversation loop:

 let estimatedTokenCount = messages.reduce((acc, [role, content]) => acc + Math.ceil(content.length / 4), 0);
  console.log(` Estimated token count for messages: ${estimatedTokenCount} tokens`);

  if (estimatedTokenCount >= tokenLimit) {
    console.log(` Warning: Estimated token count (${estimatedTokenCount}) exceeds the model's context limit (${tokenLimit}). Compressing conversation history...`);

    // Calculate original history size
    const originalHistorySize = history.reduce((acc, [role, content]) => acc + Math.ceil(content.length / 4), 0);

    // Prepare messages for context packing
    const contextPackingMessages = [
      ["system", contextPackingInstructions],
      ...history,
      ["user", "Please summarize the above conversation history to reduce its size while retaining important information."]
    ];

    // Generate summary using context packing model
    console.log(" Generating summary with context packing model...");
    let summary = '';
    const summaryStream = await contextPackingModel.stream(contextPackingMessages);
    for await (const chunk of summaryStream) {
      summary += chunk.content;
      process.stdout.write('\x1b[32m' + chunk.content + '\x1b[0m');
    }
    console.log();

    // Calculate compressed size
    const compressedSize = Math.ceil(summary.length / 4);
    const reductionPercentage = ((originalHistorySize - compressedSize) / originalHistorySize * 100).toFixed(2);

    console.log(` History compressed: ${originalHistorySize} tokens → ${compressedSize} tokens (${reductionPercentage}% reduction)`);

    // Replace all history with the summary
    conversationMemory.set("default-session-id", [["assistant", summary]]);

    estimatedTokenCount = compressedSize

    // Rebuild messages with compressed history
    messages = [
      ["assistant", summary],
      ["system", systemInstructions],
      ["system", knowledgeBase],
      ["user", userMessage]
    ];
  }

You’ll find the complete version of the code here: index.js

All that’s left is to test our assistant and have it hold a long conversation, to see context packing in action.

docker compose up --build -d
docker compose exec golang-expert-v3 node index.js

And after a while in the conversation, you should see the warning message about the token limit, followed by the summary generated by the context packing model, and finally, the reduction in the number of tokens in the history:

Estimated token count for messages: 5984 tokens
Warning: Estimated token count (5984) exceeds the model's context limit (5120). Compressing conversation history...
Generating summary with context packing model...
Sure, here's a summary of the conversation:

1. The user asked for an example in Go of creating an HTTP server.
2. The assistant provided a simple example in Go that creates an HTTP server and handles GET requests to display "Hello, World!".
3. The user requested an equivalent example in Java.
4. The assistant presented a Java implementation that uses the `java.net.http` package to create an HTTP server and handle incoming requests.

The conversation focused on providing examples of creating HTTP servers in both Go and Java, with the goal of reducing the token count while retaining essential information.
History compressed: 4886 tokens → 153 tokens (96.87% reduction)

This way, we ensure that our assistant can handle a long conversation while maintaining good generation performance.

Summary

The context window is an unavoidable constraint when working with local language models, particularly with small models and on machines with limited resources. However, by using techniques like context packing, you can easily work around this limitation. Using Docker Model Runner and Agentic Compose, you can implement this pattern to support long, verbose conversations without overwhelming your model.

All the source code is available on Codeberg: context-packing. Give it a try! 


Why 40% of AI projects will be canceled by 2027 (and how to stay in the other 60%)


The agentic AI race is on, and most organizations are at risk of losing it. Not because they lack ambition, but because they’re fighting three wars simultaneously without a unified strategy.

Looking at what separates successful agentic AI programs from the 40% that Gartner predicts will be canceled by 2027, the pattern is clear. Organizations aren’t failing at AI. They’re failing at the infrastructure that makes AI work at enterprise scale.

There are three underlying crises in most AI initiatives, and solving them independently doesn’t work. They have to be addressed together, through a unified AI connectivity program.

The three crises derailing agentic AI Infrastructure

Crisis #1: Building sustainable velocity

Everyone knows speed matters. Boards are demanding AI agents, executives are funding pilots, and teams are racing to deploy.

But urgency hasn’t translated to velocity. S&P Global reports that 42% of companies are abandoning AI initiatives before production. Organizations are deploying agents quickly, but then pulling them back just as quickly.

The uncomfortable truth is that many of the organizations that moved fastest are now moving backward. Consider McDonald’s terminating its AI voice ordering program after deploying it to over 100 locations, or the 39% of AI customer service chatbots that were pulled back or reworked.

Speed without first establishing a foundation creates technical debt that compounds until it forces a complete rebuild.

The organizations achieving sustainable velocity aren’t just moving fast. They’re moving fast on infrastructure that supports iteration rather than requiring restarts.

Crisis #2: The fragmentation tax

While teams race to deploy, Finance and FinOps teams are watching margins erode. 84% of companies report more than 6% gross margin erosion from AI costs, and 26% report erosion of 16% or more.

This isn’t coming from strategic over-investment, but from chaos: fragmented systems, untracked token consumption, zombie infrastructure, and redundant tooling scattered across teams that don’t know what the other is building.

There’s also the secondary problem that it’s not possible to monetize what can’t be measured. Organizations hemorrhaging margin simultaneously leave revenue on the table because they lack visibility into usage patterns, unit economics, and the data required for usage-based pricing.

Only 15% of companies can forecast AI costs within ±10% accuracy. Everyone else is operating on hope rather than data.

Crisis #3: The shadow AI time bomb

The third crisis is quieter but potentially more damaging. 86% of organizations have no visibility into their AI data flows. 20% of security breaches are now classified as Shadow AI incidents. And 96% of enterprises acknowledge that AI agents have either already introduced security risks or will soon.

Development teams under pressure to ship are spinning up LLM connections, routing sensitive data to models, and expanding agent-to-agent communication, all often without security review. The attack surface grows with every deployment, but visibility doesn’t keep up.

By the time organizations discover the problem — through a breach, a failed audit, or a regulatory inquiry — the damage is structural. Remediation means rollbacks, rebuilds, and reputational harm that takes years to recover from.

Why solving these separately doesn’t work

Most organizations get it wrong by treating speed, cost, and governance as independent problems that require separate solutions.

They task Dev and AI/ML teams with driving velocity, FinOps with controlling AI costs, and Security with building governance frameworks, all without a shared, unified approach: three workstreams and three organizational silos.

This introduces fragmentation. The structure designed to solve the problem makes it worse. This doesn’t mean these teams shouldn’t be the primary owners of their workstreams; it’s just that leaders shouldn’t approach these challenges in silos. Think about the relationships this way:

  • Governance: Without it, speed creates risk. Every agent deployed without proper controls expands the attack surface. Moving fast just accumulates vulnerabilities faster. In the long term, this slows you down. Governance–when done properly–will equate to speed.
  • Cost visibility: Otherwise, speed burns margin. Every deployment without unit economics is just a bet that the math will work out later. Moving fast means hemorrhaging money faster. And hemorrhaging money ultimately leaves less budget for innovation.
  • Speed: Without speed, governance becomes stagnant. Manual review cycles and approval processes that worked for traditional IT can’t scale to agentic workloads. Governance that slows deployment to a crawl isn’t governance — it’s a slow path to irrelevance.

The organizations that master all three simultaneously will reap the benefits, while those that try to solve them separately will see the gaps widen.

What winning looks like

The winners in the agentic era share a familiar pattern: they’ve built unified infrastructure that addresses speed, cost, and governance as a single integrated platform. This allows them to:

  • Deploy with confidence. Teams ship agents knowing that guardrails are automated, not manual. Security and compliance happen at the infrastructure layer, not through review meetings that add weeks to timelines.
  • Invest with clarity. Finance trusts forecasts because they’re based on consumption data. Product teams can model unit economics before launch. Cost attribution connects spending to business outcomes.
  • Monetize what they build. Usage-based pricing is possible because consumption is metered at every layer, and AI capabilities generate revenue streams.
  • See the full picture. Visibility spans the entire AI data path, not just LLM calls, but the APIs, events, MCP connections, and agent-to-agent communications that make up real-world agentic architectures.
  • Move faster over time. Each deployment builds on the last, institutional knowledge accumulates, and the platform gets smarter.

This is the flywheel in action: Governance enables speed, speed enables cost efficiency, cost efficiency funds further investment in governance and velocity. The three capabilities compound when unified and collapse when fragmented.

An animated infographic depicts the Agentic Innovation flywheel: a central hub surrounded by four strategic pillars connected in a circular loop.

AI connectivity: The unified platform approach

The solution to these compounding challenges isn’t another point tool. It’s a new architectural approach for how AI systems, APIs, and agents connect and run in production: AI connectivity.

AI connectivity is the unified governance and runtime layer that spans the full data path agents traverse, from APIs and events to LLM calls, MCP connections, and agent-to-agent communication.

Traditional API management handles request-response traffic between applications. AI Gateways handle traffic between agents and models. Alone, neither addresses the full scope of what agentic AI requires.

Agents don’t just call LLMs. They traverse the entire digital ecosystem, from invoking MCP tools to consuming APIs and event streams as context, coordinating with other agents, and accessing data sources across the enterprise. Each connection point requires visibility, control, and governance that all work together.

AI connectivity closes this gap by providing:

  • Unified traffic management across the protocols, contexts, and intelligence that make up the agentic stack: REST, GraphQL, gRPC, Kafka, WebSocket, MCP, LLMs, A2A, and more.
  • Consistent policy enforcement that applies security, compliance, and cost controls regardless of whether the traffic is a traditional API call or an agent reasoning through a multi-step workflow.
  • Full data path observability that shows not just what agents are doing, but what they’re connecting to, what data is flowing where, and what it costs.
  • Built-in monetization infrastructure that meters consumption at every layer, enabling usage-based pricing, cost attribution, and unit economics visibility.
  • Developer self-service that lets teams build and deploy without waiting for manual reviews.

When governance, cost visibility, and deployment velocity share a common platform, they reinforce one another rather than compete. Teams move fast with automatic guardrails, costs stay visible with metering built into the runtime, and security scales from policies enforced at the infrastructure layer.

Kong: The foundation for an AI connectivity strategy

Here’s what we’ve built at Kong.

Kong provides the AI connectivity layer that spans the whole data path across APIs, events, and AI-native traffic, with the governance, observability, and monetization infrastructure that sustainable AI programs require.

Organizations using Kong can see and control the entire AI data path from a single platform. They can enforce consistent policies across all traffic types, meter usage for cost attribution and revenue capture, and give developers self-service access to the infrastructure they need to build and deploy agents at scale.

This is AI connectivity in practice: a unified platform that makes the speed-cost-governance flywheel actually work.

The window is starting to close

The organizations that will dominate the agentic era are building their platform foundations today. They’re not waiting for the perfect solution; instead, they’re establishing the infrastructure to support increasingly sophisticated AI workloads.

Most enterprises are still struggling with fragmented tools and siloed approaches, and the market leadership opportunities remain wide open for those who move decisively.

But the window is closing. With each passing quarter, a few more organizations adopt the unified platform approach. Once leaders separate from the pack, catching up becomes exponentially more complicated.

The question isn’t whether AI connectivity matters. It’s whether you’re building on it or falling behind those who are.

The post Why 40% of AI projects will be canceled by 2027 (and how to stay in the other 60%) appeared first on The New Stack.


Bill Introduced To Replace West Virginia's New CS Course Graduation Requirement With Computer Literacy Proficiency

theodp writes: West Virginia lawmakers on Tuesday introduced House Bill 5387 (PDF), which would repeal the state's recently enacted mandatory stand-alone computer science graduation requirement and replace it with a new computer literacy proficiency requirement. Not too surprisingly, the Bill is being opposed by tech-backed nonprofit Code.org, which lobbied for the WV CS graduation requirement (PDF) just last year. Code.org recently pivoted its mission to emphasize the importance of teaching AI education alongside traditional CS, teaming up with tech CEOs and leaders last year to launch a national campaign to mandate CS and AI courses as graduation requirements. "It would basically turn the standalone computer science course requirement into a computer literacy proficiency requirement that's more focused on digital literacy," lamented Code.org as it discussed the Bill in a Wednesday conference call with members of the Code.org Advocacy Coalition, including reps from Microsoft's Education and Workforce Policy team. "It's mostly motivated by a variety of different issues coming from local superintendents concerned about, you know, teachers thinking that students don't need to learn how to code and other things. So, we are addressing all of those. We are talking with the chair and vice chair of the committee a week from today to try to see if we can nip this in the bud." Concerns were also raised on the call about how widespread the desire for more computing literacy proficiency (over CS) might be, as well as about legislators who are associating AI literacy more with digital literacy than CS. The proposed move from a narrower CS focus to a broader goal of computer literacy proficiency in WV schools comes just months after the UK's Department for Education announced a similar curriculum pivot to broader digital literacy, abandoning the narrower 'rigorous CS' focus that was adopted more than a decade ago in response to a push by a 'grassroots' coalition that included Google, Microsoft, UK charities, and other organizations.

Read more of this story at Slashdot.


Automate repository tasks with GitHub Agentic Workflows


Imagine visiting your repository in the morning and feeling calm because you see:

  • Issues triaged and labelled
  • CI failures investigated, with proposed fixes
  • Documentation updated to reflect recent code changes
  • Two new pull requests that improve testing awaiting your review

All of it visible, inspectable, and operating within the boundaries you’ve defined.

That’s the future powered by GitHub Agentic Workflows: automated, intent-driven repository workflows that run in GitHub Actions, authored in plain Markdown and executed with coding agents. They’re designed for people working in GitHub, from individuals automating a single repo to teams operating at enterprise or open-source scale.

At GitHub Next, we began GitHub Agentic Workflows as an investigation into a simple question: what does repository automation with strong guardrails look like in the era of AI coding agents? A natural place to start was GitHub Actions, the heart of scalable repository automation on GitHub. By bringing automated coding agents into actions, we can enable their use across millions of repositories, while keeping decisions about when and where to use them in your hands.

GitHub Agentic Workflows are now available in technical preview. In this post, we’ll explain what they are and how they work. We invite you to put them to the test, to explore where repository-level AI automation delivers the most value.

“Home Assistant has thousands of open issues. No human can track what’s trending or which problems affect the most users. I’ve built GitHub Agentic Workflows that analyze issues and surface what matters: that’s the kind of judgment amplification that actually helps maintainers.” - Franck Nijhof, lead of the Home Assistant project, one of the top projects on GitHub by contributor count

Agentic workflows also allow maintainers and the community to experiment with repository automation together. “Adopting GitHub’s Agentic Workflows has lowered the barrier for experimentation with AI tooling, making it significantly easier for staff, maintainers and newcomers alike. Inside of CNCF, we are benefiting from improved documentation automation along with improving team reporting across the organization. This isn’t just a technical upgrade for our community, it’s part of a cultural shift that empowers our ecosystem to innovate faster with AI and agentic tooling.” - Chris Aniszczyk, CTO of the Cloud Native Computing Foundation (CNCF), whose mission is to make cloud native computing ubiquitous across the world

Enterprises are seeing similar benefits at scale. “With GitHub Agentic Workflows, we’re able to expand how we apply agents to real engineering work at scale, including changes that span multiple repositories. The flexibility and built-in controls give us confidence to leverage Agentic Workflows across complex systems at Carvana.” - Alex Devkar, Senior Vice President, Engineering and Analytics, at Carvana

AI repository automation: A revolution through simplicity 

The concept behind GitHub Agentic Workflows is straightforward: you describe the outcomes you want in plain Markdown, add this as an automated workflow to your repository, and it executes using a coding agent in GitHub Actions.

This brings the power of coding agents into the heart of repository automation. Agentic workflows run as standard GitHub Actions workflows, with added guardrails for sandboxing, permissions, control, and review. When they execute, they can use different coding agent engines—such as Copilot CLI, Claude Code, or OpenAI Codex—depending on your configuration.

The use of GitHub Agentic Workflows makes entirely new categories of repository automation and software engineering possible, in a way that fits naturally with how developer teams already work on GitHub. All of them would be difficult or impossible to accomplish with traditional YAML workflows alone:

  1. Continuous triage: automatically summarize, label, and route new issues.
  2. Continuous documentation: keep READMEs and documentation aligned with code changes.
  3. Continuous code simplification: repeatedly identify code improvements and open pull requests for them.
  4. Continuous test improvement: assess test coverage and add high-value tests.
  5. Continuous quality hygiene: proactively investigate CI failures and propose targeted fixes.
  6. Continuous reporting: create regular reports on repository health, activity, and trends.

These are just a few examples of repository automations that showcase the power of GitHub Agentic Workflows. We call this Continuous AI: the integration of AI into the SDLC, enhancing automation and collaboration similar to continuous integration and continuous deployment (CI/CD) practices.

GitHub Agentic Workflows and Continuous AI are designed to augment existing CI/CD rather than replace it. They do not replace build, test, or release pipelines, and their use cases largely do not overlap with deterministic CI/CD workflows. Agentic workflows run on GitHub Actions because that is where GitHub provides the necessary infrastructure for permissions, logging, auditing, sandboxed execution, and rich repository context.

In our own usage at GitHub Next, we’re finding new uses for agentic workflows nearly every day. Throughout GitHub, teams have been using agentic workflows to create custom tools for themselves in minutes, replacing chores with intelligence or paving the way for humans to get work done by assembling the right information, in the right place, at the right time. A new world of possibilities is opening for teams and enterprises to keep their repositories healthy, navigable, and high-quality.

Let’s talk guardrails and control 

Designing for safety and control is non-negotiable. GitHub Agentic Workflows implements a defense-in-depth security architecture that protects against unintended behaviors and prompt-injection attacks.

Workflows run with read-only permissions by default. Write operations require explicit approval through safe outputs, which map to pre-approved, reviewable GitHub operations such as creating a pull request or adding a comment to an issue. Sandboxed execution, tool allowlisting, and network isolation help ensure that coding agents operate within controlled boundaries.

Guardrails like these make it practical to run agents continuously, not just as one-off experiments. See our security architecture for more details.

One alternative approach to agentic repository automation is to run coding agent CLIs, such as Copilot or Claude, directly inside a standard GitHub Actions YAML workflow. This approach often grants these agents more permission than is required for a specific task. In contrast, GitHub Agentic Workflows run coding agents with read-only access by default and rely on safe outputs for GitHub operations, providing tighter constraints, clearer review points, and stronger overall control.
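
For contrast, that conventional pattern might look like the sketch below. This is illustrative only and not taken from the post; the coding agent CLI name and prompt are placeholders, and the point is the broad write-scoped token handed to the agent up front:

# Illustrative only: a plain GitHub Actions workflow that runs a coding agent CLI
# with write permissions granted up front.
name: agent-in-plain-actions
on: workflow_dispatch
permissions:
  contents: write        # broad write access, whether or not the task needs it
  pull-requests: write
jobs:
  agent:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run a coding agent (hypothetical CLI and prompt)
        run: some-coding-agent-cli "investigate the flaky tests and open a PR"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}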

A simple example: A daily repo report  

Let’s look at an agentic workflow which creates a daily status report for repository maintainers.

In practice, you will usually use AI assistance to create your workflows. The easiest way to do this is with an interactive coding agent. For example, with your favorite coding agent, you can enter this prompt:

Generate a workflow that creates a daily repo status report for a maintainer. Use the instructions at https://github.com/github/gh-aw/blob/main/create.md

The coding agent will interact with you to confirm your specific needs and intent, write the Markdown file, and check its validity. You can then review, refine, and validate the workflow before adding it to your repository.

This will create two files in .github/workflows:

  • daily-repo-status.md (the agentic workflow)  
  • daily-repo-status.lock.yml (the corresponding agentic workflow lock file, which is executed by GitHub Actions) 

The file daily-repo-status.md will look like this: 

--- 
on: 
  schedule: daily 
 
permissions: 
  contents: read 
  issues: read 
  pull-requests: read 
 
safe-outputs: 
  create-issue: 
    title-prefix: "[repo status] " 
    labels: [report] 
 
tools: 
  github: 
---  
 
# Daily Repo Status Report 
 
Create a daily status report for maintainers. 
 
Include 
- Recent repository activity (issues, PRs, discussions, releases, code changes) 
- Progress tracking, goal reminders and highlights 
- Project status and recommendations 
- Actionable next steps for maintainers 
 
Keep it concise and link to the relevant issues/PRs.

This file has two parts: 

  1. Frontmatter (YAML between --- markers) for configuration 
  2. Markdown instructions that describe the job in natural language

The Markdown is the intent, but the trigger, permissions, tools, and allowed outputs are spelled out up front.

If you prefer, you can add the workflow to your repository manually: 

  1. Create the workflow: Add  daily-repo-status.md with the frontmatter and instructions.
  2. Create the lock file:  
    • gh extension install github/gh-aw  
    • gh aw compile
  3. Commit and push: Commit and push files to your repository.
  4. Add any required secrets: For example, add a token or API key for your coding agent.

Once you add this workflow to your repository, it will run automatically or you can trigger it manually using GitHub Actions. When the workflow runs, it creates a status report issue like this:

Screenshot of a GitHub issue titled "Daily Repo Report - February 9, 2026" showing key highlights, including 2 new releases, 1,737 commits from 16 contributors, 100 issues closed with 190 new issues opened, 50 pull requests merged from 93 opened pull requests, and 5 code quality issues opened.

What you can build with GitHub Agentic Workflows 

If you’re looking for further inspiration, Peli’s Agent Factory is a guided tour through a wide range of workflows, with practical patterns you can adapt, remix, and standardize across repos.

A useful mental model: if repetitive work in a repository can be described in words, it might be a good fit for an agentic workflow.

If you’re looking for design patterns, check out ChatOps, DailyOps, DataOps, IssueOps, ProjectOps, MultiRepoOps, and Orchestration.

Uses for agent-assisted repository automation often depend on particular repos and development priorities. Your team’s approach to software development will differ from those of other teams. It pays to be imaginative about how you can use agentic automation to augment your team for your repositories for your goals.

Practical guidance for teams 

Agentic workflows bring a shift in thinking. They work best when you focus on goals and desired outputs rather than perfect prompts. You provide clarity on what success looks like, and allow the workflow to explore how to achieve it. Some boundaries are built into agentic workflows by default, and others are ones you explicitly define. This means the agent can explore and reason, but its conclusions always stay within safe, intentional limits.

You will find that your workflows can range from very general (“Improve the software”) to very specific (“Check that all technical documentation and error messages for this educational software are written in a style suitable for an audience of age 10 or above”). You can choose the level of specificity that’s appropriate for your team.

GitHub Agentic Workflows use coding agents at runtime, which incur billing costs. When using Copilot with default settings, each workflow run typically incurs two premium requests: one for the agentic work and one for a guardrail check through safe outputs. The models used can be configured to help manage these costs. Today, automated uses of Copilot are associated with a user account. For other coding agents, refer to our documentation for details. Here are a few more tips to help teams get value quickly:

  • Start with low-risk outputs such as comments, drafts, or reports before enabling pull request creation.
  • For coding, start with goal-oriented improvements such as routine refactoring, test coverage, or code simplification rather than feature work.
  • For reports, use instructions that are specific about what “good” looks like, including format, tone, links, and when to stop.
  • Agentic workflows create an agent-only sub-loop that can run autonomously because the agents act under defined terms. But it’s important that humans stay in the broader loop of forward progress in the repository, through reports, issues, and pull requests. With GitHub Agentic Workflows, pull requests are never merged automatically, and humans must always review and approve.
  • Treat the workflow Markdown as code. Review changes, keep it small, and evolve it intentionally.

Continuous AI works best when used in conjunction with CI/CD. Don’t use agentic workflows as a replacement for GitHub Actions YAML workflows for CI/CD; this approach extends continuous automation to more subjective, repetitive tasks that traditional CI/CD struggles to express.

Build the future of automation with us   

GitHub Agentic Workflows are available now in technical preview and are a collaboration between GitHub, Microsoft Research, and Azure Core Upstream. We invite you to try them out and help us shape the future of repository automation.

We’d love for you to be involved! Share your thoughts in the Community discussion, or join us (and tons of other awesome makers) in the #agentic-workflows channel of the GitHub Next Discord. We look forward to seeing what you build with GitHub Agentic Workflows. Happy automating!

Try GitHub Agentic Workflows in a repo today! Install gh-aw, add a starter workflow or create one using AI, and run it. Then, share what you build (and what you want next).

The post Automate repository tasks with GitHub Agentic Workflows   appeared first on The GitHub Blog.


AGL 456: John Rood


About John

John Rood is the founder of Proceptual, an AI governance consulting and training company. John is a certified ISO 42001 Lead Auditor. He also teaches AI governance and safety at Michigan State University and the University of Chicago.


Today We Talked About

  • John’s background
  • AI Governance
  • Policy
  • Being Agile
  • Prompt injection
  • PII data
  • Incident Reporting Mechanism
  • Response
  • Training
  • Who and How?
  • ISO 42001
  • Relationships

 


Connect with John


Leave me a tip $
Click here to Donate to the show


I hope you enjoyed this show. Please head over to Apple Podcasts, subscribe, and leave me a rating and review; even one sentence will help spread the word. Thanks again!





Download audio: https://media.blubrry.com/a_geek_leader_podcast__/mc.blubrry.com/a_geek_leader_podcast__/AGL_456_John_Rood.mp3?awCollectionId=300549&awEpisodeId=11936177&aw_0_azn.pgenre=Business&aw_0_1st.ri=blubrry&aw_0_azn.pcountry=US&aw_0_azn.planguage=en&cat_exclude=IAB1-8%2CIAB1-9%2CIAB7-41%2CIAB8-5%2CIAB8-18%2CIAB11-4%2CIAB25%2CIAB26&aw_0_cnt.rss=https%3A%2F%2Fwww.ageekleader.com%2Ffeed%2Fpodcast

Why maintaining a codebase is so damn hard – with OhMyZSH creator Robby Russell [Podcast #207]


Today Quincy Larson interviews Robby Russell. Robby created the open-source project Oh My ZSH.

Oh My Zsh is a framework for managing your Zsh configuration for your command line terminal. It's been extremely popular among developers for more than a decade.

Robby is also the CEO of Planet Argon, a software consultancy he created two decades ago. He's done work for Nike and lots of other companies.

Note that this discussion is aimed at more advanced devs and engineering managers.

We talk about:

  • How a "Don't let that happen again" culture can make it take forever to get new code into production, and how to reverse this

  • Tips for reducing your team's dependency on that one developer who's been there for years

  • Robby's perspective on LLM tools and how they're speeding up his workflows


Watch the podcast on the freeCodeCamp.org YouTube channel or listen on your favorite podcast player.

Links from our discussion:

Community news section:

  1. Learn to code in Python from one of the greatest living Computer Science professors, Harvard's David J. Malan. This is the 2026 version of the famous CS50 course. It will teach you Python programming fundamentals like functions, conditionals, loops, libraries, file I/O, and more. If you're new to Python, or to coding in general, this is an excellent place to start. (25 hour YouTube course): https://www.freecodecamp.org/news/harvard-cs50-2026-free-computer-science-university-course

  2. That Harvard computer science course will get you started with programming. But where do you go from there? freeCodeCamp just published a helpful tutorial that will help you bridge from beginner projects to building real-world applications that solve real-world problems. (40 minute read): https://www.freecodecamp.org/news/how-to-go-from-hello-world-to-building-real-world-applications/

  3. freeCodeCamp also just published a comprehensive intro to OpenClaw. If you've heard of Clawd Bot or Moltbot, this is the same tool, which they renamed to avoid confusion with the Claude LLM tool. OpenClaw is an agent and messaging gateway that lets you automate digital tasks through platforms like Discord. First you'll learn how to set it up. Then you'll learn security practices like implementing Docker-based sandboxing to protect your host system while your agent executes complicated workflows on your behalf. (1 hour YouTube course): https://www.freecodecamp.org/news/openclaw-full-tutorial-for-beginners

  4. You may be using Bluetooth as you read this. It's been a key networking tool since 1999, and now it's getting 3 major upgrades: Passive Scanning, Bond Loss Reasons, and propagation of Service UUIDs. If you're interested in network engineering or IoT-style devices, this tutorial is well worth your read. (90 minute read): https://www.freecodecamp.org/news/how-aosp-16-bluetooth-scanner-works-the-ultimate-guide/

  5. Today's song of the week is the 2009 banger Sometimes by Australian band Miami Horror. I love the layered Peter Hook-style guitar riff, the anthemic synths, and the driving bassline. This is a perfect song to start off your morning. https://www.youtube.com/watch?v=Fn7FXGaHTNs

Get a freeCodeCamp tshirt for $20 with free shipping anywhere in the US: https://shop.freecodecamp.org



