Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

One Copilot to Rule Them All? Microsoft’s Unified AI Platform for Work and Life


Key Takeaways:

  • Microsoft is unifying consumer and enterprise Copilot into one platform to reduce fragmentation and create a more seamless AI experience across work and personal use.
  • The strategy may improve product coherence and speed innovation, but it also increases the risk of deeper customer lock-in to Microsoft’s cloud, apps, identity, and AI stack.
  • For IT leaders, the big questions are governance and control: data separation, admin policies, licensing, and how to manage Copilot across both personal and business contexts.

Microsoft went all-in on One Copilot earlier this year. In March 2026, CEO Satya Nadella reorganized the company’s AI efforts, consolidating the consumer and enterprise Copilot teams under one unified leadership. That means Microsoft’s once-separate personal and business Copilot projects, like Copilot for individuals and Microsoft 365 Copilot for enterprises, are now being built as one platform with a single boss, Jacob Andreou, reporting directly to Nadella.

Microsoft’s rationale is that if Copilot is supposed to be the everyday AI assistant across everything you do, it’s better designed as a single, integrated system rather than a scattered collection of AI features.

This consolidation makes pragmatic sense for Microsoft. For one, it reduces fragmentation. Instead of two different AI assistants with separate capabilities and roadmaps, you will get one Copilot spanning Windows, Microsoft 365 apps, Teams, and personal accounts. The unified approach could also accelerate innovation by aligning every Copilot feature on one foundation, with shared AI models and design ethos.

So where’s the catch? In a word: lock-in.

One Copilot everywhere inherently rests on being deeply embedded in Microsoft’s platform (cloud, OS, apps) at every turn. It’s a good strategy for Microsoft, boosting the “platform gravity” that keeps customers orbiting its services.

But if you’re an IT leader trying to maintain flexibility, the unified Copilot approach may heighten your long-term dependency on Microsoft’s stack. The more your users rely on the seamless Copilot spanning their lives, the less room you’ll have to adopt alternate AI solutions or switch providers down the road.

In effect, Microsoft is making the case: stick with us, we’ll make AI easy everywhere. It’s a compelling pitch, but be aware of the golden handcuffs.

A cautionary tale of Teams and Skype

Even as Microsoft’s Teams collaboration suite has become ubiquitous in business, it hasn’t matched the consumer reach of its predecessor Skype. Microsoft tried to collapse personal and enterprise communications into one platform, phasing out Skype in 2025 in favor of “Teams (Free) for personal use.” It even baked a “Chat with Teams” button into Windows 11 by default, a feature it has since removed.

But Teams hasn’t achieved the same consumer ubiquity Skype once had (the latter still boasted some 300 million monthly users as recently as 2019). This mixed track record of uniting enterprise and consumer experiences (with Teams still mainly perceived as a work app) stands as a cautionary tale for Copilot’s unification. It underscores that even sensible platform consolidation doesn’t guarantee broad adoption, especially if consumers see the product as an enterprise tool rather than an everyday essential.

Copilot data and identity boundaries

There are also practical governance questions. When Copilot lives in both personal and business contexts, data and identity boundaries become paramount. Today, enterprise admins can control and configure Microsoft 365 Copilot or Windows Copilot separately via policies and settings (like toggling Copilot on corporate devices, for example).

Under a unified Copilot, how will Microsoft ensure corporate data stays completely separated from personal Copilot interactions? Today, this is enforced by forcing you to switch between Microsoft work and personal accounts. That provides clear separation, but it doesn’t make for an elegant user experience.

With Copilot integration pervasive, companies may need to update internal guidelines, training employees on personal vs. work usage and adjusting compliance rules for AI-generated content.

Microsoft MAI

Notably, the Copilot mega-merger coincides with another strategic shift: Microsoft’s launch of its first in-house AI models (nicknamed “MAI”) for speech-to-text, voice, and image generation. Rolled out in April 2026, these models are Microsoft’s hedge to reduce dependence on OpenAI’s tech. The move aims to give Microsoft more control over the AI stack powering Copilot for cost efficiency, scale, and customization.

In short, the unified Copilot vision is increasingly backed by Microsoft’s own AI engines, potentially making it an even more fully Microsoft-native platform going forward.

Convenience comes with deeper platform entrenchment

Unifying Copilot under one umbrella is a forward-looking bet from Microsoft. It could usher in seamless, contextual AI assistance, one that follows a user from writing a Word report to planning a family holiday, no disjointed handoffs or app-hopping needed.

For IT decision-makers and practitioners, the upside is a more coherent AI deployment (one platform to manage) and a workforce that can reap productivity gains.

Just go in with eyes open: convenience comes with deeper platform entrenchment. Now is a good time to ask questions about licensing, admin controls, and data segregation, and to fine-tune your governance policies for a world where Copilot is everywhere.

The post One Copilot to Rule Them All? Microsoft’s Unified AI Platform for Work and Life appeared first on Petri IT Knowledgebase.


Best Python AI Frameworks in 2026


Whether you’re building chatbots, training computer vision models, or analyzing business data, choosing the right AI framework can make or break your project. Python has become the dominant language for AI and machine learning development, and the ecosystem of frameworks supporting this work has matured significantly.

The right framework choice depends on what you’re building. A production recommendation system has different requirements than a research prototype. A chatbot powered by large language models (LLMs) needs different tools than a fraud detection system analyzing tabular data.

Let’s explore seven essential frameworks and where each excels so you can find the best AI framework for your specific project.

What is an AI framework?

AI frameworks are pre-built libraries and tools that handle the complex mathematics, data structures, and computational operations underlying AI and machine learning models. Rather than implementing neural networks or gradient descent from scratch, AI frameworks provide abstractions that let you focus on model architecture, data preparation, and business logic.

These frameworks generally fall into three categories:

  • Deep learning frameworks like TensorFlow, PyTorch, and Keras specialize in neural networks and GPU acceleration for tasks involving images, text, and audio.
  • Classical and tabular machine learning frameworks like scikit-learn and XGBoost focus on statistical and tree-based models for structured data, powering many real-world AI systems, including forecasting, risk-scoring, and decision-automation solutions.
  • LLM and AI agent frameworks like LangChain and Hugging Face provide tools for building applications powered by large language models.

Why do AI frameworks matter? 

AI frameworks dramatically accelerate your development by providing tested, optimized implementations of complex algorithms. They offer strong community support with extensive documentation, tutorials, and troubleshooting resources. They provide production-ready tooling for deployment, monitoring, and scaling. They’re optimized for specific hardware like GPUs and TPUs, delivering performance that would be difficult to achieve with custom implementations.

Open-source vs. commercial AI frameworks

Open-source AI frameworks are the dominant model in AI development today, and they offer compelling advantages: community-driven innovation that speeds up feature development and bug fixes, and transparency that enables auditing and algorithm customization. There are also no licensing fees and no vendor lock-in, making them cost-effective for both experimentation and production deployment.

Commercial AI platforms also exist, with AWS SageMaker, Google Vertex AI, and Azure Machine Learning among the prominent examples. However, these platforms often use open-source frameworks underneath rather than competing with them directly. They provide managed infrastructure, automated workflows, and enterprise features on top of tools like TensorFlow and PyTorch.

If you’re thinking open source means they’re unsupported, think again. All seven frameworks below have robust ecosystems, and many are backed by major tech companies. Google supports TensorFlow, Meta backs PyTorch, and organizations like Microsoft contribute significantly to various projects in the ecosystem.

Top Python AI frameworks

These seven frameworks represent the essential toolkit for Python AI development in 2026. Each performs strongly in specific domains, and many developers use multiple frameworks depending on project requirements.

TensorFlow

TensorFlow is an open-source deep learning framework developed by Google for building and deploying machine learning models at enterprise scale. With a 37% market share in data science and machine learning and adoption by 25,000 companies globally, TensorFlow has proven itself in high-stakes production environments.

The framework evolved significantly from TensorFlow 1.x to 2.x, with Keras integration making it far more accessible while maintaining its enterprise-grade capabilities. If you’re building large-scale image recognition systems or natural language processing pipelines, or you need to deploy across web, mobile, and edge devices through TensorFlow Lite and TensorFlow.js, TensorFlow can help.

If you’re just getting started with TensorFlow, follow our step-by-step tutorial on how to train your first TensorFlow model using PyCharm.

Advantages of TensorFlow

  • Enterprise-grade scalability: Built for production from day one, TensorFlow handles massive datasets and distributed training across multiple GPUs and TPUs seamlessly. You can scale from experimentation to serving millions of predictions without switching tools.
  • Comprehensive deployment ecosystem: TensorFlow Serving handles model deployment, TensorFlow Lite optimizes for mobile and edge devices, and TensorFlow.js brings models to browsers. This complete deployment story reduces friction when moving from development to production.
  • TPU optimization: Native support for Google’s Tensor Processing Units delivers superior performance for large-scale training workloads, offering significantly better performance per watt than traditional hardware.
  • Strong industry adoption: Companies like Airbnb, Twitter, and Intel rely on TensorFlow for critical applications, giving you confidence in its production readiness and long-term viability.

Disadvantages of TensorFlow

  • Steeper learning curve: Despite Keras integration, TensorFlow’s complexity can overwhelm beginners, especially when you move beyond high-level APIs to custom implementations.
  • Verbose syntax for custom models: Building custom training loops or novel architectures requires significantly more code compared with PyTorch’s more Pythonic approach.
  • Debugging challenges: Graph-mode optimization (for example, code compiled with tf.function), while beneficial for performance, can make runtime errors harder to trace than in frameworks with dynamic computation graphs.

scikit-learn

scikit-learn is an open-source Python library for classical machine learning, providing simple and efficient tools for classification, regression, clustering, and dimensionality reduction. With adoption by over 16,000 companies worldwide, it’s your essential first stop for structured and tabular data before considering deep learning approaches.

The framework supports a wide range of supervised and unsupervised learning on structured business data, along with feature engineering and data preprocessing pipelines. Companies like J.P. Morgan use scikit-learn extensively for classification tasks and predictive analytics in financial decision-making.

Advantages of scikit-learn

  • Beginner-friendly API: Consistent, intuitive syntax across all algorithms makes learning and switching between models effortless. The fit/predict pattern works the same whether you’re using linear regression or random forests.
  • Comprehensive algorithm library: Its library covers virtually every classical ML algorithm – regression, classification, clustering, dimensionality reduction – with well-tested implementations ready for your projects.
  • Excellent for tabular data: On structured data, traditional algorithms often outperform deep learning, and scikit-learn gives you the tools to maximize this advantage.
  • Fast prototyping: Its simple syntax means you can build and test models in minutes, not hours, making it ideal for rapid experimentation.
  • Seamless integration: scikit-learn works perfectly with NumPy, pandas, and Matplotlib, fitting naturally into your data science workflows.
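
To make the fit/predict point concrete, here is a minimal sketch that uses only the standard library. The two classes (`MeanRegressor`, `SimpleLinearRegressor`) and the `mean_absolute_error` helper are hypothetical toys written for this illustration, not scikit-learn code; they only imitate the convention of a `fit` that returns the estimator and a `predict` that takes new rows.

```python
from statistics import mean


class MeanRegressor:
    """Toy baseline: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self  # returning self allows fit(...).predict(...) chaining

    def predict(self, X):
        return [self.mean_ for _ in X]


class SimpleLinearRegressor:
    """Toy one-feature least-squares fit: y = slope * x + intercept."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        x_bar, y_bar = mean(xs), mean(y)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, X):
        return [self.slope_ * row[0] + self.intercept_ for row in X]


def mean_absolute_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


X_train = [[1.0], [2.0], [3.0], [4.0]]
y_train = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
X_test = [[5.0], [6.0]]
y_test = [11.0, 13.0]

# The evaluation loop never needs to know which model it is holding.
scores = {}
for model in (MeanRegressor(), SimpleLinearRegressor()):
    preds = model.fit(X_train, y_train).predict(X_test)
    scores[type(model).__name__] = mean_absolute_error(y_test, preds)
```

Because both toys expose the same two methods, swapping one model for another changes a single line, which is the property the bullet above describes.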

Disadvantages of scikit-learn

  • No deep learning support: scikit-learn is not designed for neural networks – you’ll need to switch to TensorFlow or PyTorch for complex deep learning architectures.
  • Limited GPU acceleration: The framework is CPU-bound and struggles with very large datasets where GPU-accelerated frameworks perform better.
  • Not suited for unstructured data: Images, text, and audio require deep learning frameworks that can handle high-dimensional, unstructured inputs.

PyTorch

PyTorch is an open-source deep learning framework developed by Meta that prioritizes flexibility and a natural Python coding experience. It’s used in approximately 85% of deep learning research papers and has a 55% adoption rate in the research community. From its academic roots, PyTorch has evolved into a production-ready powerhouse.

The framework excels at cutting-edge research and experimentation with novel architectures. It supports natural language processing and generative AI models such as GPT, Llama, and Stable Diffusion, and enables computer vision research with custom model development. Its Pythonic philosophy makes it feel natural if you’re already comfortable with Python, reducing cognitive load and accelerating your development.

Advantages of PyTorch

  • Dynamic computation graphs: The define-by-run approach allows runtime model modifications, making debugging and experimentation intuitive. You can use standard Python control flow and debugging tools you already know.
  • Pythonic and readable: PyTorch code feels like native Python, not a separate language. This flattens your learning curve and makes code more maintainable.
  • Research-first innovation: Latest techniques and models appear in PyTorch first, driven by its dominance in academic research.
  • Strong ecosystem: Hugging Face Transformers, PyTorch Lightning, and extensive community packages provide specialized tools for virtually any task you’ll encounter.
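
The define-by-run idea can be sketched without PyTorch at all. The toy `Value` class below is an assumption made for illustration (it is not PyTorch's API): each arithmetic operation records its inputs as it executes, so ordinary Python `if` and `for` statements decide the shape of the graph at run time.

```python
class Value:
    """A scalar that records the operations applied to it as they run."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents    # Values this one was computed from
        self._grad_fns = grad_fns  # maps the output gradient to each parent's share

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Order the graph that the forward pass built, parents before children.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, accumulating gradients.
        for node in reversed(order):
            for parent, fn in zip(node._parents, node._grad_fns):
                parent.grad += fn(node.grad)


def model(x, w):
    """The graph depends on the data: plain Python control flow."""
    out = x * w
    if out.data > 10:   # branch chosen at run time
        out = out + x
    for _ in range(2):  # loop is unrolled into the graph as it executes
        out = out * w
    return out


x, w = Value(3.0), Value(2.0)
y = model(x, w)  # x*w = 6, so the branch is skipped: y = x * w**3
y.backward()
```

With `w = Value(4.0)` the same function takes the other branch and builds a different graph, and a standard debugger can step through either path line by line.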

Disadvantages of PyTorch

  • Deployment complexity: While TorchServe has improved the situation, PyTorch historically has had weaker production tooling compared to TensorFlow’s mature deployment ecosystem.
  • Manual training loops: Greater control means more boilerplate code for standard training patterns, though libraries like PyTorch Lightning address this.

Keras

Keras is a high-level deep learning API designed for fast experimentation with neural networks. With over 60,000 GitHub stars and integration as TensorFlow’s default interface, Keras has become synonymous with rapid prototyping and ease of use. The release of Keras 3.0 changed the game by adding multi-backend support for TensorFlow, JAX, and PyTorch.

The framework is ideal for rapidly prototyping neural network architectures, working on educational projects to learn deep learning fundamentals, or tackling deep learning tasks that don’t require low-level customization.

Advantages of Keras

  • Simplest API in deep learning: You can build sophisticated models in just a few lines of code with the Sequential or Functional API, offering the lowest barrier to entry in deep learning.
  • Multi-backend flexibility: Keras 3.0 runs on TensorFlow, JAX, or PyTorch – write once, run anywhere. This future-proofs your code and lets you switch backends as your needs change.
  • Built-in best practices: The API guides you toward sound model architecture decisions and incorporates best practices by default.
  • Fast experimentation: You can iterate quickly without wrestling with framework complexity, focusing on model design rather than implementation details.

Disadvantages of Keras

  • Limited low-level control: The abstraction layer sacrifices fine-grained control needed for cutting-edge research or novel architectures.
  • Performance overhead: The additional abstraction can introduce latency compared to native framework calls, though this is often negligible for most applications.
  • Less suitable for custom architectures: Highly novel model designs may require you to drop down to the underlying framework.

LangChain

LangChain is an open-source framework that helps you build applications powered by large language models, providing core components for prompt management, chains, memory, and agent orchestration. It acts as an abstraction layer to easily connect LLMs to external data sources and computational tools. With over 120,000 GitHub stars, the framework has become essential infrastructure for the AI agent revolution.

LangChain is most commonly used for building conversational AI and chatbots with memory and context, retrieval-augmented generation (RAG) systems for enterprise knowledge bases, and multi-agent systems with autonomous workflows.

If you want to go beyond the basics, read our LangChain Python Tutorial: A Complete Guide for 2026. It takes a deeper look at what LangChain offers and walks through real-world use cases for building AI agents in Python.

Advantages of LangChain

  • Comprehensive LLM orchestration: Handles everything from prompt management to chains, memory, and tool use, giving you a complete infrastructure for LLM applications in one package.
  • Provider-agnostic: Works seamlessly with OpenAI, Anthropic, Hugging Face, and local models, letting you avoid vendor lock-in and switch providers as your needs change.
  • Rich agent capabilities: LangGraph enables complex, stateful workflows with human-in-the-loop patterns, supporting sophisticated agentic behaviors.
  • Production-ready tooling: LangSmith provides monitoring, debugging, and tracing specifically designed for LLM applications, addressing the unique challenges you’ll face in production.
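
As a rough picture of what "chains" and "memory" mean, here is a standard-library sketch. Everything in it is hypothetical: `Step`, `build_prompt`, and `fake_llm` are illustrative names, and `fake_llm` simply echoes its prompt where a real application would call a model provider. The `|` composition loosely mirrors the pipe-style chaining associated with LCEL, but this is not LangChain code.

```python
class Step:
    """A unit of work that can be piped into another step with `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Compose: run this step, then feed its output to the next one.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


memory = []  # earlier questions, kept so later prompts carry context


def build_prompt(question):
    history = "; ".join(memory) or "none"
    memory.append(question)
    return f"History: {history}\nQuestion: {question}"


def fake_llm(prompt):
    """Stand-in for a model call (assumption): echoes the prompt with padding."""
    return "  " + prompt.replace("\n", " / ") + "  "


# prompt construction -> model call -> output parsing
chain = Step(build_prompt) | Step(fake_llm) | Step(str.strip)

first = chain.invoke("What is RAG?")
second = chain.invoke("Give an example.")
```

The second call sees the first question in its history, which is the behavior a memory component adds on top of a stateless model call.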

Disadvantages of LangChain

  • Learning curve for abstractions: LangChain Expression Language (LCEL) and framework-specific concepts take time to master, especially if you’re new to LLM orchestration.
  • Abstraction overhead: Additional layers between you and LLM APIs can sometimes obscure what’s happening, making debugging more challenging.
  • Fast-moving target: Frequent updates mean your code can become outdated quickly, requiring ongoing maintenance to stay current.

Hugging Face

Hugging Face is an open-source platform and library ecosystem for natural language processing and machine learning, with over one million models and 250,000 datasets to power your next project. It’s become a central hub for the AI community, with its Transformers library earning 150,000+ GitHub stars.

The platform is particularly effective at accessing and fine-tuning pre-trained transformer models like BERT, GPT, and Llama, building NLP applications without training models from scratch, and sharing and deploying custom models to the community.

For a practical example, read A Practical Guide to Fine-Tuning and Deploying GPT Models Using Hugging Face Transformers. It walks through using a pre-trained GPT model, fine-tuning it on custom data, and deploying the result with FastAPI.

Advantages of Hugging Face

  • Massive model repository: With hundreds of thousands of pre-trained models available, you rarely need to train from scratch. Models for virtually every task and language are ready for you to use.
  • Transformers library dominance: This is the de facto standard for NLP, computer vision, and multimodal models, with support for the latest architectures as soon as they’re published.
  • Framework interoperability: Models work with PyTorch, TensorFlow, and JAX, giving you maximum flexibility in your development workflow.
  • Inference infrastructure: Hosted inference APIs and Spaces make deployment straightforward without managing your own infrastructure.

Disadvantages of Hugging Face

  • Dependency complexity: The large dependency tree can lead to version conflicts and package management challenges, especially in complex environments.
  • Model quality variance: Community-contributed models vary in quality and may not be production-ready without thorough vetting and testing on your part.
  • Platform dependency: Heavy reliance on Hugging Face Hub creates some platform lock-in, though you can download models and host them independently.

XGBoost

XGBoost is an optimized gradient boosting library designed for speed and performance on structured data. The algorithm continues to dominate machine learning competitions alongside other gradient-boosted decision tree libraries, earning its reputation through battle-tested performance on real-world problems.

You can use the framework for predictive modeling on structured business data, including sales forecasting, risk assessment, and feature importance analysis for model interpretability. Its gradient-boosting approach achieves outstanding precision on structured data, powering reliable insights for business applications.

Advantages of XGBoost

  • Superior accuracy on tabular data: XGBoost often matches or outperforms deep learning on structured datasets, making it a strong default choice for business analytics and forecasting.
  • Built-in regularization: L1 and L2 regularization prevents overfitting better than basic gradient boosting, producing more robust models for your production systems.
  • Efficient computation: Handles large datasets efficiently with parallel processing and intelligent tree pruning, making it practical for production use.
  • Missing value handling: Automatically learns optimal strategies for missing data, reducing your preprocessing burden.
  • Feature importance scores: Built-in interpretability helps you understand model decisions, crucial for business applications and regulatory compliance.
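
To show how boosting and L2 regularization interact, the following is a deliberately small sketch in pure Python, assuming squared-error loss, a single feature, and depth-one trees (stumps). It is not XGBoost's implementation; the function names and the `lam` parameter are illustrative. For squared error, the regularized leaf value reduces to the sum of residuals divided by the leaf size plus `lam`, so a larger `lam` shrinks every update.

```python
def fit_stump(xs, residuals, lam):
    """Pick the threshold split that best fits the current residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        # L2-regularized leaf values: lam pulls each leaf toward zero.
        left_w = sum(left) / (len(left) + lam)
        right_w = sum(right) / (len(right) + lam)
        # Regularized split score for squared loss (constants dropped).
        score = (sum(left) ** 2 / (len(left) + lam)
                 + sum(right) ** 2 / (len(right) + lam))
        if best is None or score > best[0]:
            best = (score, threshold, left_w, right_w)
    return best[1:]


def boost(xs, ys, rounds=20, learning_rate=0.3, lam=1.0):
    """Fit stumps one after another, each on the residuals left by the last."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_w, right_w = fit_stump(xs, residuals, lam)
        stumps.append((threshold, left_w, right_w))
        preds = [p + learning_rate * (left_w if x <= threshold else right_w)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (lw if x <= t else rw)
                      for t, lw, rw in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
light = boost(xs, ys, lam=1.0)   # mild regularization
heavy = boost(xs, ys, lam=10.0)  # strong regularization, smaller steps
```

After the same number of rounds, the heavily regularized model sits further from the training targets than the lightly regularized one, which is the trade-off the regularization bullet refers to.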

Disadvantages of XGBoost

  • Not suitable for unstructured data: Images, text, and audio require deep learning approaches. XGBoost is designed specifically for tabular data.
  • Hyperparameter complexity: There are many parameters to tune for optimal performance, though tools like Optuna can automate this process for you.
  • Limited interpretability compared with simple models: While more explainable than deep neural networks, XGBoost’s ensemble structure is harder to interpret than linear or rule-based models, even with feature importance and SHAP analysis.

How to choose an AI framework

Selecting the best AI framework depends on your specific project characteristics, but in practice, the choice is rarely binary. Many successful teams use multiple frameworks together. A common and effective pattern is to use scikit-learn for preprocessing and feature engineering, PyTorch for research and model development, TensorFlow for production deployment, and LangChain for LLM-powered features.

Your decision will likely come down to data type, team expertise, and where your model needs to run. Use this table as a starting point:

Decision factor | Suitable frameworks

By modeling approach and prediction type
Single-value or label prediction (regression or classification using classical ML) | scikit-learn, XGBoost
Image and video modeling with neural networks | TensorFlow, PyTorch, Keras
Text and NLP with transformer models | Hugging Face, PyTorch, TensorFlow
LLM-powered and agent-based applications | LangChain, Hugging Face

By level of abstraction and control required
High-level APIs and rapid iteration | Keras, scikit-learn
Fine-grained control over training and architectures | PyTorch, TensorFlow
Research-driven experimentation and custom workflows | PyTorch
Managed LLM orchestration and tooling | LangChain

By deployment target
Production at scale | TensorFlow
Research/Experimentation | PyTorch
Mobile/Edge | TensorFlow Lite
Web applications | TensorFlow.js
LLM applications | LangChain

By task and project objective
Classical prediction and forecasting systems | scikit-learn, XGBoost
Neural network–based modeling | TensorFlow, PyTorch, Keras
Building and training novel architectures | PyTorch
Scalable production deployment | TensorFlow
LLM-powered features and workflows | LangChain, Hugging Face
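
One way to use the table is to intersect two of its sections. The sketch below transcribes the "modeling approach" and "abstraction and control" rows into dictionaries; the shortened keys and the `shortlist` helper are assumptions made for illustration, not part of any framework.

```python
# Rows transcribed from two sections of the table above (keys are shortened labels).
BY_TASK = {
    "classical prediction": {"scikit-learn", "XGBoost"},
    "image and video": {"TensorFlow", "PyTorch", "Keras"},
    "text and nlp": {"Hugging Face", "PyTorch", "TensorFlow"},
    "llm applications": {"LangChain", "Hugging Face"},
}
BY_CONTROL = {
    "high-level": {"Keras", "scikit-learn"},
    "fine-grained": {"PyTorch", "TensorFlow"},
    "research": {"PyTorch"},
    "managed llm": {"LangChain"},
}


def shortlist(task, control):
    """Return frameworks the table lists for both the task and the control level."""
    return sorted(BY_TASK[task] & BY_CONTROL[control])


vision_prototype = shortlist("image and video", "high-level")
nlp_custom = shortlist("text and nlp", "fine-grained")
```

An empty result, such as classical prediction combined with research-driven workflows, signals that the two requirements point to different tools and that a multi-framework setup may be the better fit.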

If your choice comes down to PyTorch or TensorFlow, read our dedicated PyTorch vs. TensorFlow: Choosing the Right Framework in 2026 guide, where we compare learning curves, deployment options, and use cases to help you choose the right deep learning framework.


When Context Collapses: Teaching Agents to Detect and Recover from Lost Memory


This is the eighth article in a series on agentic engineering and AI-driven development. Read part one here, part two here, part three here, part four here, part five here, part six here, and part seven here.

“640K ought to be enough for anybody.”—Bill Gates (allegedly)

If you’re building AI agents that do complex, multistep work, you’re going to run into context loss. The agent’s working memory fills up, older information gets silently dropped or compressed, and the agent keeps going without realizing it’s forgotten something. This article, the third in my Radar article trilogy about context management, walks through a pattern I’ve been refining for detecting and recovering from that problem, which I call the externalize-recognize-rehydrate pattern (or ERR, which I think is actually a pretty good acronym for an error recovery pattern): save your agent’s state to files on disk, detect when context has degraded, and reload from those files to recover. The individual techniques are standard practice in agent and skill engineering—checkpointing, progress files, state verification—but the real power comes from combining them into a coherent workflow that you can use live or build into your agents. I’ll walk through each step with specific prompts you can adapt for your own agents and coding sessions.

Which brings me to memory. Gates has said on multiple occasions that he never actually said that quote at the top of this article, but it endures because it captures one of the core limitations of that era, one that people struggled with constantly, in a way that we can laugh about now. Around that time I was using a 286 with 1 MB of RAM. That’s megabytes, not gigabytes. MS-DOS 3.3 gave me 640K of conventional memory plus 384K of upper memory, and I spent a lot of time figuring out how to use every bit of it. I configured memory managers, loaded device drivers high, used (and wrote!) terminate-and-stay-resident programs that moved themselves out of conventional memory to free up space, and generally treated memory as a resource that required active, deliberate engineering. There was a lot I wanted to do that didn’t fit into 640K, and like most people at the time, I went to some lengths to compensate for the memory limitations.

We’re at the 640K stage of AI development. The context window is the new RAM ceiling. Most of today’s models give you somewhere between 200K and 2M tokens of working memory (and, like memory in the late 1980s and early 1990s, those numbers are growing all the time), and if you’re building agents that do complex multistep work, you will hit that ceiling. When you do, the AI starts compacting: compressing or dropping older parts of the conversation to make room. And just like running out of conventional memory on a 286, things stop working right and you’re not sure why.
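
Here is a toy model of compaction, under loud assumptions: tokens are approximated by whitespace-separated words, and "compaction" means dropping the oldest messages and leaving a one-line marker. Real tools use proper tokenizers and usually write an actual summary, so treat `count_tokens` and `compact` as hypothetical names in a simplified picture.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def compact(messages, budget):
    """Drop the oldest messages until the conversation fits the token budget."""
    kept = list(messages)
    dropped = 0
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the oldest message goes first
        dropped += 1
    if dropped:
        # Only this marker survives from the dropped messages (it is not
        # counted against the budget in this sketch).
        kept.insert(0, f"[summary of {dropped} earlier messages]")
    return kept


history = [
    "plan has seven steps and we are on step two",  # 10 tokens
    "agent finished step one",                      # 4 tokens
    "agent finished step two",                      # 4 tokens
    "user asks for the next step",                  # 6 tokens
]

after = compact(history, budget=15)
plan_survived = any("seven steps" in message for message in after)
```

The message that held the plan is the oldest, so it is the first to go, and nothing in the remaining context tells the model that it ever existed.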

In 20 years we’ll be looking back at today’s puny context windows and wondering how developers in the 2020s managed to get anything done with just a few million tokens. Because none of this is new. In case you don’t believe me, here’s a photo of my dad at Princeton in the early 1970s working on an Evans and Sutherland LDS-1 graphics computer, the first commercial vector graphics machine, connected to a PDP-10 mainframe:

Keep on truckin

The actual LDS-1 is in the large cabinet in the background, directly behind the monitor. Sitting next to it, just out of the picture, is an even larger cabinet that holds a memory unit with 16K of magnetic core memory (technically 8K words).

So you can imagine that just a decade later, 640K in a tiny PC that fit on your desktop seemed extravagant.

In the last two articles in this series (“Why Doesn’t Anyone Teach Developers About Context Management?” and “Your AI Agent Already Forgot Half of What You Told It”), I talked about what context is and why context management matters, and I shared practical techniques and prompts for keeping important information in files instead of leaving it in the AI’s context window. This article gets more technical. I want to build on those strategies and talk about how to build agents that can detect when they’ve lost context and recover from it on their own.

Brute-forcing my way through context loss

I’ve been doing this kind of context management for a while now, long before the specific tools I’m about to describe existed. But a recent crash gave me a clean example of what the process looks like in its most brute-force form.

I was working in Copilot with a seven-step plan, going through it one step at a time, having another AI review each step before moving on. Steps one and two went fine. When it came time to do step three and I gave it the prompt, it jumped straight to step four. This kind of thing can be really frustrating, because it seems like an AI smart enough to implement a complex feature in code should be able to (ahem) count to four.

The key to not getting frustrated when the AI loses track of steps or can’t seem to count from prompt to prompt is to remember what it’s good at and how it remembers things. If the AI you’re using skips a step like this, check the conversation history. You’ll probably see something like “summarizing conversation history” or “compacting conversation” somewhere above your last message. That’s telling you that the AI lost track of where it was because that count was literally purged from its memory.

AIs are good at carrying out an instruction. They’re bad at keeping track of their own state over a long conversation, and the way they manage their memory is a big part of that. This article is about finding ways to build your AI tools so you’re not relying on them to do the thing they’re worst at.

But compaction isn’t the only way your AI loses context. A few weeks ago I was deep into a long session with Copilot, working through a multiphase code review. I’d spent a while building up context with the AI about my codebase and the decisions we’d made together. I was about to move on to the next phase, and then I got this:

Phase B

The entire context was wiped, which could have been a really frustrating problem, since I had a long history with the session, and it had built up a lot of knowledge about what we were doing. This turned out to be a bug in Opus 4.6’s interaction with Copilot’s conversation history, and I’ve seen other people hit the same thing. I was staring at a fresh prompt with nothing in it.

So I did something that, in retrospect, is a pretty good brute-force version of what this whole article is about. I recognized the context was gone (hard to miss when the whole conversation disappears). I copied the entire conversation out of Copilot and pasted it into a text file. Then I gave the new session a prompt:

We were in the middle of a long conversation, then I got an error and the entire context was wiped. I saved a copy of the conversation in #file:chat_history.txt, read it and bring yourself back up to speed.

And it worked! This brought the new session back to where I needed it to be.

That simple error and recovery actually outlines a pretty good pattern for dealing with context loss:

  1. Externalize the state. Get the important information out of the conversation and into a file on disk, where it won’t disappear when the context window reshuffles.
  2. Recognize the loss. Notice that the agent’s working context has been wiped or degraded, whether that’s obvious (like a crash) or subtle (like output that quietly stops making sense).
  3. Rehydrate from the file. Point a new session at that file and let it rebuild its understanding from what’s written down.

The individual mechanics are well-documented across cognitive science (cognitive offloading, task resumption), software engineering (the Memento pattern, React hydration), and knowledge management (the SECI model). I’m not claiming to have invented any of them. But the specific abstraction of these three phases into a unified, named pattern applied to AI context management is, as far as I can tell, new. It’s synthesis and codification, not invention.

In this case I did it with copy and paste, which isn’t particularly elegant, but it worked. Still, a raw conversation dump is a blunt instrument, because it’s both too much and too little. It’s too much because it’s full of noise: tool calls, dead ends, back-and-forth that doesn’t matter anymore. It’s too little because the context that got silently compressed away during the session is already gone. When you build these mechanisms into agents and skills, you can do it in a much more subtle and automated way.

Externalize: Add two layers of state to your agent

The idea behind externalization, or periodically saving your agent’s state, came out of a conversation I was having with an AI assistant while building the Quality Playbook, an open source AI coding skill that runs structured code reviews. The playbook runs a structured code review as a single process, but that process could easily turn into a 15-million-token request if you tried to do it all in one shot. I described in the previous article in this series how I broke it into six phases, and that was only possible because the context for each phase had already been externalized. Each phase reads its inputs from files, does its work, writes its outputs to files, and stops. The next phase picks up from the files, not from whatever the agent remembers. If this sounds like the familiar advice to ask the AI to plan before you ask it to implement, it’s the same principle applied to context management. Separating each step and persisting the output means you can inspect it, and the next step doesn’t depend on the agent’s memory.

But what should those files contain? I found that the AI is actually good at figuring that out. At some point I asked the assistant:

Would it make sense for the agent to record more context in files as it progresses, to make sure nothing is dropped along the way? It should work even if you break it into separate prompts, because the result from each step is persisted. Plus, we can audit its reasoning for debugging and improvement.

That prompt was all it took. The assistant designed the file structure itself: a progress tracker that records which phase is active and what’s been completed, a JSONL artifact file (JSONL is just a file with a bundle of JSON objects, with one record per line) where each pass appends its output, and a set of brief documents describing the purpose of each phase. You don’t need to overengineer this. Tell the agent what you’re trying to preserve and let it figure out the file layout.
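As a rough illustration, here is a minimal Python sketch of that kind of layout. The file names (progress.json, artifacts.jsonl, a briefs folder) and the field names are assumptions made up for this example, not the Quality Playbook’s actual structure.

```python
import json
from pathlib import Path

# Hypothetical file names; the real layout is whatever your agent settles on.
PROGRESS_FILE = "progress.json"
ARTIFACT_FILE = "artifacts.jsonl"


def init_state_dir(root: Path, phases: list[str]) -> None:
    """Create a progress tracker, an empty JSONL artifact, and one brief per phase."""
    root.mkdir(parents=True, exist_ok=True)
    progress = {"active_phase": phases[0], "completed_phases": []}
    (root / PROGRESS_FILE).write_text(json.dumps(progress, indent=2), encoding="utf-8")
    (root / ARTIFACT_FILE).touch()
    briefs = root / "briefs"
    briefs.mkdir(exist_ok=True)
    for phase in phases:
        brief = briefs / f"{phase}.md"
        if not brief.exists():
            brief.write_text(f"# {phase}\n\nPurpose: (fill in)\n", encoding="utf-8")


def append_artifact(root: Path, record: dict) -> None:
    """Append one JSON object as a single line of the JSONL artifact."""
    with (root / ARTIFACT_FILE).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_artifacts(root: Path) -> list[dict]:
    """Read every record back, one JSON object per non-empty line."""
    text = (root / ARTIFACT_FILE).read_text(encoding="utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Because each record is its own line, a pass can append without rewriting the file, and a later session can read just the last line to see where things stand.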

What emerged falls into two categories that I think of as execution continuity and task continuity:

  • Execution continuity is the state the agent needs to resume work in the middle of a task: what step it’s on, what it’s completed, what decisions it’s made so far. These files change constantly as the agent works.
  • Task continuity is the broader context that doesn’t change during execution: what the whole task is about, what success looks like, what the structural constraints are. These files are written once and read at every resumption.

When an agent needs to resume after suspected compaction, it reads back both layers. The task continuity files anchor it back to what the whole endeavor is about. The execution continuity files put it back in the middle of the work. Together, they give the agent enough information to continue without relying on anything that might have been compacted.
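Here is one way reading back both layers might look, sketched in Python. The function names and the idea of rendering the result as a single text block are assumptions for illustration. In practice the agent usually reads the files itself.

```python
import json
from pathlib import Path


def load_continuity(task_files: list[Path], progress_file: Path) -> dict:
    """Read both layers back from disk and return them as one structure."""
    # Task continuity: written once, read at every resumption.
    task = {p.name: p.read_text(encoding="utf-8") for p in task_files}
    # Execution continuity: changes constantly as the agent works.
    execution = json.loads(progress_file.read_text(encoding="utf-8"))
    return {"task": task, "execution": execution}


def resume_prompt(state: dict) -> str:
    """Render both layers as text to hand to a fresh or compacted session."""
    parts = ["Task context (stable):"]
    for name, body in state["task"].items():
        parts.append(f"--- {name} ---\n{body.strip()}")
    parts.append("Execution state (current):")
    parts.append(json.dumps(state["execution"], indent=2, sort_keys=True))
    return "\n\n".join(parts)
```

Keeping the two layers in separate files means the stable one never gets rewritten by accident while the agent is updating its position.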

The key is that externalization isn’t something you do once at the beginning of a task. You want the agent saving its state at frequent checkpoints so that if compaction happens mid-run, the most recent checkpoint is close to where the agent was working. Here’s the kind of instruction I gave the agent for tasks that processed records one at a time:

Update the progress file after every single record, not in batches. Write the output line first, then update the progress file with the new cursor and a fresh timestamp. If the progress file’s timestamp falls behind the output file’s, you’re batching and that’s wrong.

The frequency matters because context can compact at any point. If the agent only saves state at the end of a long run, compaction in the middle means losing everything since the start. If it checkpoints after every unit of work, the worst case is losing one unit.
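If the checkpoint were done by a script rather than by the agent following an instruction, it might look like this Python sketch. The record fields (index, summary) and the progress fields (cursor, updated_at) are hypothetical names chosen for the example.

```python
import json
import os
import time
from pathlib import Path


def checkpoint(output_path: Path, progress_path: Path, index: int, summary: str) -> None:
    """Persist one unit of work: output line first, then the progress file."""
    # 1. Append the output line and push it to disk before touching the cursor.
    with output_path.open("a", encoding="utf-8") as out:
        out.write(json.dumps({"index": index, "summary": summary}) + "\n")
        out.flush()
        os.fsync(out.fileno())

    # 2. Write the new cursor to a temp file, then swap it into place so a
    #    reader never sees a half-written progress file.
    progress = {"cursor": index + 1, "updated_at": time.time()}
    tmp = progress_path.with_name(progress_path.name + ".tmp")
    tmp.write_text(json.dumps(progress), encoding="utf-8")
    os.replace(tmp, progress_path)
```

Calling this once per record keeps the worst case at one lost unit of work, and the write order means the progress file never claims more than the output file contains.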

Two-layer externalization survives context reshaping, not only outright context loss. Even if the agent’s context window isn’t full, if the context has been reorganized or reprioritized (a compression that reshapes without truncating), the agent can reload the external files and know for certain what the ground truth is.

Recognize: Detecting loss from inside the agent

The second step in the pattern is to recognize that your agent has lost context, and it turns out to be the hardest part (at least with today’s AI technology). When the context window fills up, the AI compacts silently, and the agent keeps working without realizing it’s lost information. The agent can’t tell you it’s forgotten something, because it doesn’t know it forgot. Detecting that change turns out to be a nontrivial problem; I’ll walk you through an approach that helped me, and keep it general enough so you can do the same thing. The copy-and-paste approach works when the context loss is obvious, like a crash that wipes your whole conversation. But most context loss isn’t that visible.

I described context compaction in the previous article, but it’s worth restating the core problem from the agent’s perspective. Different tools handle context overflow differently: Some truncate older messages; some compress conversations into summaries; some use a sliding window. But they all have the same effect. Information disappears from the agent’s working context, and the agent doesn’t get notified.
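A toy Python model makes the “no notification” part concrete. This sketch assumes a sliding window that counts messages. Real tools count tokens and are far more sophisticated, so treat it as an illustration of the effect and not of any particular product.

```python
from collections import deque


class SlidingContext:
    """Toy sliding-window context: only the newest max_messages survive."""

    def __init__(self, max_messages: int) -> None:
        self.messages = deque(maxlen=max_messages)

    def add(self, message: str) -> None:
        # When the deque is full, the oldest message is dropped silently:
        # no exception, no return value, no notification.
        self.messages.append(message)

    def recall(self, keyword: str) -> list[str]:
        """Return every message still in the window that mentions keyword."""
        return [m for m in self.messages if keyword in m]


ctx = SlidingContext(max_messages=3)
ctx.add("decision: use PostgreSQL")
for step in range(1, 4):
    ctx.add(f"step {step} done")

print(ctx.recall("decision"))  # → []
```

The lookup returns an empty list, which is indistinguishable from “no decision was ever made.” That is the same ambiguity the agent faces.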

This was a challenge when I built the Quality Playbook, because it runs multiple passes over a codebase, each one reading source files, extracting requirements, and checking coverage. Each pass can involve enough work that it fills the context window multiple times over. And when context compacts mid-pass, the agent doesn’t know it happened. It keeps working, but the output starts silently degrading. So I started building mechanisms for the agent to detect compaction and recover by reading back the files it had written earlier. The patterns that came out of that work are general enough to apply to anyone building agents that need to survive context pressure.

From the agent’s perspective, compaction is seamless. It’s tracking state, referencing decisions made earlier in the conversation, and then at some point the earlier context is gone. But the agent can’t tell the difference between “I never knew that” and “I knew it but lost it.” It tries to reference something and finds nothing, or finds a compressed version that lost the nuance. And because the agent doesn’t know it lost anything, it doesn’t know it needs to recover.

This invisibility is the core problem. But it turns out you can work around it, and the next two sections walk through how.

Building a detection mechanism

Once you have files on disk, the question is what specifically to check and how to know when something has gone wrong. I landed on a mechanism while building the Quality Playbook’s requirement extraction pipeline. The playbook processes source documents in multiple passes, and each pass appends its output to a JSONL artifact file. After each unit of work, the agent also writes a progress record to a separate file: what it just finished, what it found, and where it should pick up next.

The detection mechanism comes from two rules I gave the agent. The idea is that the progress file tracks a cursor, which is just a position marker that tells the agent which record to process next. If the agent writes a record to the output file but then loses context before updating the progress file, those two files will be out of sync.

The agent didn’t need to understand any of that upfront; I just described the rules in plain language and let it figure out the implementation. The first rule establishes an invariant between the output file and the progress file:

Cursor advances only after the line is on disk. Write the summary line to the output file first, then update the progress file. The cursor must always equal the index of the next record that still needs to be processed.

The second rule told the agent how to check that invariant on startup:

On startup, read the progress file. Resume from its cursor value. Verify continuity: the last line in the output file should equal cursor minus one. If not, roll the cursor back to match disk state and report the discrepancy.

If the progress file says the cursor is at record 381, but the last line in the output file is record 379, something happened. The context compacted and the agent lost track of where it was. The divergence between the two files is the signal.
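The startup check is simple enough to express as a short Python sketch. It assumes each output line is a JSON object with a zero-based, contiguous index field and that the progress file holds a cursor field. Both are hypothetical names for this example.

```python
import json
from pathlib import Path


def last_index_on_disk(output_path: Path) -> int:
    """Return the index of the last output record, or -1 if there is none."""
    if not output_path.exists():
        return -1
    text = output_path.read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    return json.loads(lines[-1])["index"] if lines else -1


def verify_continuity(output_path: Path, progress_path: Path) -> tuple[int, bool]:
    """Return (cursor to resume from, whether a discrepancy was found)."""
    cursor = json.loads(progress_path.read_text(encoding="utf-8"))["cursor"]
    expected = last_index_on_disk(output_path) + 1
    if cursor == expected:
        return cursor, False
    # Disk is the source of truth: roll the cursor to match the output file.
    progress_path.write_text(json.dumps({"cursor": expected}), encoding="utf-8")
    return expected, True
```

With the numbers above (cursor 381, last record 379), this returns 380 and flags the discrepancy. It handles divergence in either direction, since it always recomputes the cursor from the output file.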

This worked because files on disk don’t change when context compacts. They’re written once and then read repeatedly. If what the agent thinks it knows doesn’t match what’s actually in the files, something shifted in the agent’s memory, not on disk. I ended up folding this check into a preamble that every session started with:

If this session has experienced auto-compaction, re-read the pass specification from disk. Do not try to reconstruct it from the compacted summary. Read the progress file. Read the last record of the JSONL artifact and confirm its index equals the cursor minus one. If not, roll the cursor back to match disk state. Disk is the source of truth. The conversation is not.

That preamble ran at the top of every session. During one particularly intensive day of pipeline development, I ran over a hundred Claude Code sessions with that exact instruction. Most of them completed without hitting compaction. But the ones that did hit it recovered cleanly, because the preamble told the agent exactly what to check and exactly what to do when the check failed.

The specific prompts I used are tied to the Quality Playbook’s file structure, but the technique generalizes. If you’re building any agent that does multistep work, you can adapt the same approach. Here’s a version you could drop into a session preamble or an agent’s system prompt:

Before continuing any task, read your progress file and your most recent output file. Compare them: does the progress file say you’ve completed work that isn’t reflected in the output? If so, trust the output file, roll back your progress to match, and note the discrepancy. Do not rely on what you remember from the conversation. The files on disk are the source of truth.

The wording doesn’t have to be precise. What matters is the structure: tell the agent where to look, what to compare, and which source to trust when they disagree.

But didn’t you just say the AI can’t detect its own compaction?

Right, and it can’t. What I described above isn’t the agent detecting compaction. It’s the agent running a deterministic check against files on disk and finding a discrepancy. The agent doesn’t need to know that compaction happened. It just needs to notice that two files disagree. Think of the agent as an amnesiac clerk. You don’t ask the clerk to remember what they did yesterday. You make the clerk check the physical ledger every time they sit down at the desk. If their notes disagree with the ledger, they’re trained to trust the ledger.

If you saw Christopher Nolan’s breakout movie Memento, you can think of your agent as Leonard Shelby, the character with anterograde amnesia played by Guy Pearce. You couldn’t ask Leonard to remember what he did yesterday. He had to check his tattoos every time he woke up. If his tattoos disagreed with what he was seeing, he trusted the tattoos (which leads to a major plot point, which I won’t spoil). Again, this isn’t a new idea. I mentioned the Memento pattern earlier; it shares the name, although the design pattern was cataloged in the 1994 Design Patterns book, several years before the film came out in 2000.

This is a classic reconciliation technique, and it shows up in both accounting and distributed systems. In double-entry bookkeeping, you record every transaction in two places and reconcile them regularly. If they disagree, you investigate. You don’t need to know why they diverged; the divergence itself is the signal. A write-then-commit protocol (the idea behind write-ahead logs and, loosely, two-phase commit) works the same way: write the data first, then update the record that says the data was written. If you find data without a matching record, or a record without matching data, something went wrong between the two steps.

That’s exactly what the cursor invariant does. The agent writes the output line first, then updates the progress file. If those two files are out of sync, something happened between the two writes. The agent doesn’t detect compaction. It detects a broken invariant, and it’s been told that when the invariant breaks, the files on disk win.

Three things make this work. First, the check is purely deterministic: read two files, compare two numbers, act on the result. There’s no reasoning involved, no judgment call about whether the agent “feels” like it lost context. I wrote about this principle in “Keep Deterministic Work Deterministic”; you never want an AI making decisions that a file comparison can make for it. Second, the files on disk don’t change when context compacts. They’re the stable reference point that the agent’s memory gets checked against. Third, the instruction to run the check lives in the system prompt or preamble, which is generally preserved even when conversation context gets compacted. The check survives the thing it’s designed to detect.

Rehydrate: Reading back the state

Rehydration is the process of reading back externalized state and rebuilding the agent’s working context. Once the agent detects compaction (or, more specifically and accurately, has enough evidence from the filesystem that compaction occurred), the recovery step is to read back the externalized files and rebuild. For the Quality Playbook, rehydration meant:

  1. Read the phase brief to re-anchor the purpose of this pass
  2. Read the progress file to know which unit is active and what’s been completed
  3. Read the tail of the JSONL artifact to confirm the last successfully written record
  4. Recompute the next unit of work from those files
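The four steps above can be sketched in Python as follows. The file paths are passed in by the caller, and the index field on each record is an assumption for illustration, not the playbook’s actual schema.

```python
import json
from pathlib import Path


def rehydrate(brief_path: Path, progress_path: Path, artifact_path: Path) -> dict:
    """Rebuild working state from disk, following the four steps above."""
    # 1. Re-anchor on the purpose of this pass.
    brief = brief_path.read_text(encoding="utf-8")
    # 2. Find out which unit is active and what has been completed.
    progress = json.loads(progress_path.read_text(encoding="utf-8"))
    # 3. Confirm the last record that actually reached the artifact file.
    text = artifact_path.read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    last_record = json.loads(lines[-1]) if lines else None
    # 4. Recompute the next unit from disk, not from memory of the conversation.
    next_unit = last_record["index"] + 1 if last_record is not None else 0
    return {
        "brief": brief,
        "progress": progress,
        "last_record": last_record,
        "next_unit": next_unit,
    }
```

Note that step 4 derives the next unit from the artifact file itself, so a stale cursor in the progress file can’t send the agent to the wrong record.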

This is different from just continuing without detection. Without detection, the agent tries to pick up where it left off and hopes it still has enough context. With detection, the agent knows something happened and deliberately reloads state before continuing.

You can make the rehydration process itself auditable. Instead of silently reading the files and resuming, have the agent write down what it learned:

Read the progress file and the JSONL artifact. Write a summary of what you learned: what pass is running, what unit is active, what the cursor position is, and how many requirements have been extracted so far. Then continue from there.

Writing a rehydration summary serves two purposes. It gives you visibility into what the agent understood and whether it rehydrated correctly. And it forces the agent to process the external files explicitly rather than just loading them into context. Explicit processing is more reliable than silent loading because the agent has to commit to an interpretation, and you can read that interpretation and catch mistakes.
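One way to check the agent’s summary is to compute the same facts directly from disk and compare. This Python sketch assumes hypothetical progress fields named pass, active_unit, and cursor, and appends the result to a log file so each rehydration leaves a trail.

```python
import json
from pathlib import Path


def write_rehydration_summary(progress_path: Path, artifact_path: Path,
                              summary_path: Path) -> str:
    """Summarize on-disk state and append the summary to an audit log."""
    progress = json.loads(progress_path.read_text(encoding="utf-8"))
    text = artifact_path.read_text(encoding="utf-8")
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    summary = (
        f"pass: {progress.get('pass', 'unknown')}\n"
        f"active unit: {progress.get('active_unit', 'unknown')}\n"
        f"cursor: {progress.get('cursor', 'unknown')}\n"
        f"records on disk: {len(records)}\n"
    )
    # Append rather than overwrite, so earlier rehydrations stay auditable.
    with summary_path.open("a", encoding="utf-8") as log:
        log.write(summary + "\n")
    return summary
```

If the agent’s written summary disagrees with this one, you know it misread the files before it does any more work.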

You can adapt this approach to any agent workflow where work happens in steps. The specific files and cursor values are particular to my pipeline, but the underlying technique is general: have the agent write its progress to a file after each step, and check that file against its output at the start of every session. And this advice isn’t just for writing agents or skills. Even in a live session with Claude Code, Cursor, or Copilot, you can tell the agent to periodically write a summary of what it’s done and what it plans to do next to a file on disk. If the session crashes or the context gets long enough to compact, you can point a new session at that file and pick up where you left off. The key is getting the state out of the conversation and onto disk before you need it.

Context management is an architectural concern

Every technique I’ve described in these articles comes down to the same principle: Important information shouldn’t live only in the agent’s context window. The previous articles covered how to put that information on disk. This one covers how to make the agent aware of its own limitations so it can recover when context pressure gets too high.

An agent that can detect its own degradation and correct for it is fundamentally more reliable than one that just keeps going. When the agent knows how to stop, check itself against ground truth, and reload what it lost, context pressure becomes a recoverable event instead of a slow, silent failure.

This concludes my trilogy of articles about context management. The first article was about understanding what context is and why it disappears. The second was about getting important information out of the conversation and onto disk before you need it. This one is about closing the loop: making the agent aware of its own limitations so it can detect degradation and recover from it. Together, they add up to treating context as an engineering problem rather than something you hope works out.

These are still early days. Context windows will get larger, compaction will get smarter, and some of the workarounds in this article will eventually be unnecessary. But the underlying principle won’t change: If your agent’s ability to do its job depends on information, that information needs to live somewhere more durable than working memory. That was true for my dad’s 32KB core memory at Princeton, it was true for my 640K of conventional RAM, and it’s true for today’s 200K-token context windows.

The Quality Playbook and Octobatch are open source projects where these techniques are used in production. Both are built using AI-driven development and available for exploration if you want to see how this looks in practice.


Disclosure: Aspects of the approach described in this article are the subject of US Provisional Patent Application No. 64/044,178, filed April 20, 2026, by the author. The open source Quality Playbook project (Apache 2.0) includes a patent grant to users of that project under the terms of the Apache 2.0 license.




Contribute to the State of PHP Survey


Together with The PHP Foundation, we’re embarking on an exciting expedition – the first annual State of PHP census that aims to tally the elePHPant population across the world. Come join us!

With the help of the community, we’ve designed a comprehensive survey that aims to unearth the PHP developer trends – from years of experience and favorite frameworks to AI preferences and beyond. Our goal is to draw the most representative picture of the PHP ecosystem and the developers behind it, and share insights that will benefit the community and help shape the future direction of PHP.

For this to be a success, we need your help. Our survey will only be truly representative if PHP developers of all backgrounds participate, regardless of where you live, how long you’ve been developing in PHP, what frameworks you prefer, or the tools you use. So take part in the State of PHP survey now and help us map the herd!

Survey participants can enter a drawing to win one of five EUR 500 (or equivalent in local currency) vouchers, which they can redeem for any prize they want via Tremendous (subject to the applicable terms and conditions).

State of PHP report

You can expect to read the travel logs from this journey in October 2026. We will publish the aggregated results in the State of PHP 2026 report, so be sure to follow The PHP Foundation and PhpStorm on X or sign up for our newsletter below to be among the first to see the results.

Thank you for being part of the first edition of the State of PHP!

The PHP Foundation and PhpStorm teams


Big Thinkers: Werner Vogels – The Operational Philosophy Behind AWS and Hyperscale Cloud Architecture

Modern cloud computing did not emerge simply because virtualization improved or because data centers became larger. It emerged because a generation of engineers learned how…

Anthropic apologizes for invisible Claude Fable guardrails


Anthropic has apologized for stealthily throttling its new AI model, Claude Fable 5, with hidden guardrails that undermine both researchers and rivals using it to develop competing systems. The company says it is reversing course and will be more transparent about when the restrictions kick in, even if that means Fable refuses more queries.

Fable is the first widely available model in Anthropic's Mythos class of AI systems, a group the company has spent months warning are too dangerous for public release. Anthropic says it has addressed some of those risks by launching Fable with safeguards that prevent it from responding to certain "high-r …

Read the full story at The Verge.
