Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
155294 stories
·
33 followers

The AI Agents Stack (2026 Edition)

1 Share

The following article originally appeared on Paolo Perrone’s The AI Engineer Substack and is being reposted here with the author’s permission.

Your team picks LangGraph for a customer support chatbot. Three weeks in, you’ve got 14 nodes in a state graph, a custom checkpointer writing to Redis, and retry logic for tool calls that fail once a week. The agent answers refund questions. It calls one API. A 50-line script on the OpenAI SDK with two MCP servers would have done the same thing. But nobody mapped which layers the problem actually needed.

In November 2024, Letta published an AI agents stack diagram that became the default reference for half the engineering teams I talk to. If you’ve seen a “layers of an agent” visual on LinkedIn or pinned in a Slack channel, it probably traces back to that article.

That diagram is 14 months old now, and a lot has changed since. MCP didn’t exist yet. Memory was still treated as a subset of your vector database. Nobody was shipping provider-native agent SDKs. Eval wasn’t even on the map. The stack has six layers in 2026, and at least three of them didn’t exist as distinct categories when Letta drew the original.

So we drew it from scratch. This is the 2026 version.

The minimum viable agent stack in 2026

TL;DR

That’s the starting stack. Add complexity when something specific breaks, not before.

What are we even mapping?

Before the stack, there was a loop. In “What Is an AI Agent?,” we defined an agent as the think-act-observe cycle: The model reasons about a task, takes an action (calls a tool, writes to memory), observes the result, and loops until the task is done. That loop is the atomic unit. Everything in this issue is infrastructure that makes that loop work reliably, at scale, in production.

The agent stack is not the LLM stack. A chatbot needs inference and maybe RAG. An agent needs state management across multistep execution, tool access governed by protocols, memory that persists across sessions, autonomous reasoning loops, and guardrails that constrain behavior in real time. That’s a fundamentally different set of infrastructure problems.

We’re mapping the six layers between your LLM and a production agent. We’re not covering training infrastructure, data pipelines, or model fine-tuning. Those are adjacent stacks. We covered RAG in depth in Issue #5. Today we’re zooming out to show where RAG fits in the bigger picture.

Three things redrew the map between 2024 and 2026. MCP standardized tool connectivity, and the entire tools layer is new because of it. Reasoning models changed what agents can do autonomously, with single-call agents replacing some multistep chains. And memory became a first-class architectural primitive, not an afterthought bolted onto a vector database.

How to evaluate each layer

When choosing tools at each layer, ask three questions. How much state do you need to manage? A stateless tool caller and a multi-session agent that learns over time are different engineering problems, and the layers where state management is hardest (memory, frameworks) are where most teams get stuck. How much vendor lock-in can you tolerate? MCP is an open standard, provider SDKs are not, and every tool choice either increases or decreases how painful your next migration will be. And how hard is it to go from demo to production? Some layers (model serving) have almost no gap, while others (eval, guardrails) have a massive one. The layer where you feel that gap most is the one to invest in first.

We take each layer from the bottom up, starting with the most stable and ending with the least mature.

Layer 1: Models and inference

How you run the model that powers your agent: call an API, use a managed open weight provider, or self-host.

Models & inference: key players

The inference layer changed more in tone than in substance. Reasoning models like o1, o3, DeepSeek R1, and Claude with extended thinking shifted what agents can plan and execute. Agents that previously needed multistep chains can now solve problems in a single reasoning call. Open weight models like Llama 3.3, DeepSeek V3, and Qwen 2.5 closed the quality gap dramatically, so “always use the biggest closed model” is no longer default advice. The emerging pattern is to prototype on closed source and deploy on open weight.

The honest take: This layer is commoditizing. Model differences matter less each quarter. The real decision is the cost and latency trade-off, not which model is “smartest.”

On the evaluation side, API calls are stateless. Send a request, get a response. Nothing to manage. Lock-in risk runs high for closed APIs because each model reasons differently, so switching providers means retuning prompts, adjusting for different failure modes, and retesting your eval suite. It’s low for open weight, where you can swap the model and keep the infra. The prototype-to-production gap is the smallest of any layer. Your demo API call is the same as your production API call.

Self-host when your agent call volume makes API pricing untenable or when you need sub-100ms latency that API round-trips can’t deliver.

Layer 2: Protocols and tools

How your agent calls external tools and APIs: through MCP servers, browser automation, or agent-to-agent protocols.

Protocols & tools: key players

This layer didn’t exist as a distinct category in 2024. Every framework had its own JSON schema for tool definitions. Now MCP is the standard, with 97M monthly SDK downloads, adoption by OpenAI, Google, and Microsoft, and a donation to the Linux Foundation.

Browser Use exploded in parallel, hitting 78K GitHub stars in under a year. Nobody was shipping browser agents in production in 2024. And agents can now talk to other agents. IBM launched ACP, and Google launched A2A. Neither is standard yet, but the problem they solve (agents coordinating with other agents) is real and growing.

Security is the open problem. Endor Labs analyzed 2,614 MCP servers and found 82% prone to path traversal and 67% to code injection.

The honest take: The protocol debate is over. MCP won. The only question left is how you lock down your MCP servers before someone exploits them.

State management is nonexistent here. Your agent calls a tool, gets a response, done. No session, no memory between calls. Lock-in risk is low because MCP is an open standard, so if you build MCP servers, any MCP-compatible agent can use them. The prototype-to-production gap is medium. Your demo MCP server works until someone sends a malicious tool description. Security and governance are the gap.

MCP standardized how agents use tools. It says nothing about how agents talk to each other. ACP and A2A are trying to solve that, but neither has reached critical mass. If you need multi-agent coordination today, you’re building it yourself at the framework layer. We covered MCP in depth in Issue #4.

Layer 3: Memory and knowledge

How your agent stores and retrieves what it knows: in-context state, vector search, or persistent memory across sessions.

Memory & knowledge: key players

All three tiers feed into the same place: The context window your agent sees on every call.

In 2024, memory meant “pick a vector database and do RAG.” In 2026, memory is a first-class architectural primitive with three distinct tiers. Context windows got massive. Gemini hit 1M+ tokens, Claude 200K. Bigger windows didn’t kill the need for memory. They changed the trade-off: What do you stuff in-context versus what do you retrieve on demand?

“Context engineering” replaced “prompt engineering” as the core discipline. Instead of writing a better prompt, you architect what information the agent sees on every call. Memory blocks appeared as named, structured fields in the context window that the agent can read and overwrite every turn. Instead of dumping everything into the system prompt, the agent manages its own state: what to keep, what to update, what to drop.

On the infrastructure side, pgvector became the default for teams that don’t need a dedicated vector database. It’s just Postgres with an extension. GraphRAG emerged as a second retrieval option: follow relationships between entities instead of matching embeddings, with Neo4j leading this space. Sleep-time compute, where agents process information during idle time, is research stage but signals where tier 3 is heading.

The honest take: Most teams overcomplicate memory. Start with conversation history in Postgres and a structured system prompt. Add vector search when your history exceeds context limits. Add agentic memory management only when your agent needs to learn across sessions.

This IS the state layer. You’re deciding what your agent remembers, how it retrieves it, and when it forgets. Highest complexity in the stack. Lock-in risk is medium. pgvector is portable because it’s just Postgres, while specialized tools like Mem0 or Zep are harder to migrate away from. The prototype-to-production gap is large. Demo memory works because context windows are big enough. Production memory breaks when conversations get long and your agent starts forgetting the important parts.

In-context memory breaks down when agents need to share memory across instances or maintain state across model provider switches. That’s where dedicated memory infrastructure like Letta, Zep, and Mem0 earns its keep.

Layer 4: Frameworks and SDKs

How you wire together the model calls, tool use, and control flow that make your agent work: a provider’s built-in toolkit (SDK), a graph-based framework like LangGraph, or raw code.

Frameworks & SDKs: key players

Every major AI lab now ships its own agent SDK. OpenAI has the Agents SDK (evolved from Swarm). Google released ADK. Microsoft has Semantic Kernel and AutoGen. Hugging Face built smolagents. Two years ago, LangChain was the only game. Now you pick between three camps: provider SDKs that are fast to start but locked to one model, graph-based frameworks like LangGraph that are portable but require more setup, or no framework at all. That choice didn’t exist in 2024.

LangGraph solidified as the graph-based orchestration leader with v1.0 released October 2025 and production deployments at Uber, JPMorgan, LinkedIn, and Klarna. LangChain agents are now built on LangGraph under the hood. Meanwhile, the “build it yourself” camp grew. Teams that tried LangChain in 2024 and fought the abstraction are now writing thin wrappers over provider APIs + MCP. No framework means full control. This works until your agent needs state management or complex branching.

A quick note on naming: “LangChain” and “LangGraph” are not the same thing. LangChain is the integration layer handling model connectors, tool calling, and prompt templates. LangGraph is the orchestration engine managing state, control flow, and graphs. Most production teams use both together, but LangGraph is where the agent logic lives.

The honest take: Most teams pick too much framework. If your agent calls a model and a few tools, you don’t need LangGraph. A provider SDK and a couple of tool calls will get you to production faster than any graph.

Provider SDKs manage state for you. LangGraph makes you define every state transition explicitly. Build-it-yourself means you roll your own. Lock-in risk is the highest in the stack. Your orchestration code doesn’t port. A LangGraph agent rewritten for CrewAI is a new codebase. Provider SDKs are worse because you’re locked to one model too. The prototype-to-production gap is large. Demo works because nothing goes wrong. Production means handling tool failures, retries, timeouts, and humans who need to approve before the agent acts.

The framework you pick determines your migration cost. Provider SDKs are fastest to start but lock you to one model. LangGraph is portable but complex. Building your own gives you full control until your agent outgrows your wrapper. MCP is the one layer that transfers across all three camps.

Layer 5: Eval and observability

How you measure whether your agent is doing its job: tracing runs, scoring outputs, and catching regressions before users do.

Eval & observability: key players

This layer barely existed in 2024. Now it’s the gap. LangChain’s State of Agent Engineering survey found 89% of teams with production agents have implemented observability, but only 52% have evals. That 37-point gap is where production quality dies.

“Evaluation as infrastructure” is converging on three tiers: fast checks on every PR (Did the agent call the right tools?), nightly regression suites that use an LLM to judge output quality, and continuous production monitoring that alerts when agent performance drifts. New agent-specific benchmarks have emerged too, including Context-Bench for memory management, Recovery-Bench for error recovery, and Terminal-Bench for coding agents.

The honest take: Most teams skip eval until something breaks in production. By then they’re debugging blind. The teams that don’t have this problem built evals before they deployed.

State management matters here because your agent runs 12 steps, step 3 picked the wrong tool, and steps 4–12 were doomed from there. If your eval only checks the final output, you’ll never know why. Lock-in risk is moderate. Most tools export OpenTelemetry traces, so switching observability providers is doable, but switching eval frameworks means rebuilding your test suites. The prototype-to-production gap is the biggest of any layer. Most prototypes have zero eval. You don’t feel the pain until production users find the failures for you.

Current eval tools are strongest for single-turn and tool-calling evaluation. Multi-agent evaluation, long-horizon task assessment, and evaluating agents that learn over time are all unsolved problems. If your agent does any of those, you’ll need custom eval infrastructure beyond what the platforms offer today.

Layer 6: Guardrails and safety

How you stop your agent from doing things it shouldn’t: filtering inputs, authorizing tool calls, and validating outputs.

Guardrails & safety: key players

Agent guardrails became a separate discipline from LLM guardrails. In 2024, guardrails meant input/output filters on a model. In 2026, your agent calls tools, spends money, and takes actions. Guardrails now means authorizing tool calls, enforcing rate limits, and validating what the agent actually did.

The “guardrails before action” pattern emerged from teams that learned the hard way. They now enforce authorization at the tool execution layer, not the output layer. By the time you filter the response, the agent already sent the email. OWASP published the MCP Top 10 (beta), which is the first real security checklist for tool-connected agents. Deployment is still DIY. LangGraph Cloud and Bedrock Agents exist, but most production teams are still deploying with FastAPI and their own infra. This layer is where you’ll spend the most unplanned engineering time.

The honest take: This is the least mature layer in the stack. No dominant framework, no established patterns. You’re writing policy code from scratch.

Guardrails need to know what the agent is doing right now to decide what it shouldn’t do next. That means tracking agent state in real time. Lock-in risk is low because most guardrails are custom policy code you write yourself. NeMo Guardrails is the closest thing to a framework, but you’ll still write most rules from scratch. The prototype-to-production gap is effectively infinite. Your demo has no guardrails because nobody’s trying to break it. Production will.

Current guardrails tools focus on single-agent systems. If you’re running multi-agent workflows where agents delegate to each other, guardrail propagation across agent boundaries is an unsolved problem. You’ll need custom authorization logic.

What are you building?

This is the decision that cuts through the framework confusion. The agent type determines which layers you invest in and which tools to pick at each one.

A stateless tool caller answers questions from a knowledge base, looks up an order, or checks inventory. You need a provider SDK, MCP, and Postgres. No framework, no vector database. This is a weekend project.

A multistep workflow processes a refund end to end, reviews a PR across five files, or triages and routes support tickets. Steps depend on each other, things fail in the middle, and humans need to approve before the agent acts. You need LangGraph, MCP, and eval. Build evals before you deploy because these agents break silently.

An agent that learns remembers your preferences across sessions, gets better at your codebase over time, or tracks project context across weeks. You need a memory-first architecture, a vector DB, and eval. Orchestration is the easy part. The hard part is deciding what to remember, what gets dropped, and how you stop old context from polluting new answers.

A multi-agent system has agents that delegate to other agents, split a research task across specialists, or run parallel workstreams. You need the full stack. Two agents passing context to each other is already hard to debug. Five is impossible without trace-level evals on every handoff. Build eval infrastructure before you build the second agent.

Pick your stack

Coding agents: All 6 layers in action

Coding agents like Cursor, Claude Code, Codex, and Windsurf are the most proven application of the AI agents stack. All six layers, working together.

At the inference layer, these tools serve hundreds of millions of daily requests. Cursor routes between Claude, GPT-4, and its own fine-tuned models depending on the task. At the protocols layer, MCP servers connect to editors, terminals, filesystems, and Git, which is how the agent reads your code and runs commands. The memory layer uses codebase-aware retrieval with reranking. The agent doesn’t read your whole repo. It retrieves the files that matter for this specific edit.

At the framework layer, these are custom orchestration systems with RL loops. Not LangGraph, not a provider SDK. Purpose-built control flow for code generation, review, and iteration. At the eval layer, Cursor retrains its acceptance-rate model every 90 minutes based on whether users accept or reject suggestions. That’s eval running in production, continuously. And at the guardrails layer, sandboxed execution prevents runaway agents. The agent can write code and run it, but inside a container that limits what it can touch.

The AI agent stack cheat sheet

Every layer scored on the three questions from the evaluation framework: How much state do you need to manage? How much vendor lock-in can you tolerate? And how hard is it to go from demo to production?

The agent stack cheat sheet

The bigger picture

Most teams are building like it’s still 2024. They pick LangGraph before they know if they need state. They add a vector database before they’ve outgrown Postgres. They design multi-agent architectures before they’ve shipped one agent that works. The decision flowchart above exists because a tool-calling chatbot and a multi-agent research system share almost no infrastructure. Treat them the same and you’ll overbuild the first and underbuild the second.

The teams that got past this run evals on every deploy, not once a quarter. Their guardrails sit at the tool call layer, not the output layer. Their memory architecture was designed, not inherited from whatever the framework defaulted to. Most teams ship the opposite: no evals, output-only filtering, and a system prompt that grows until the context window chokes. The gap isn’t talent or budget. It’s knowing which layers matter for your specific agent instead of half-building all six.

The stack is going to collapse. Provider SDKs are already absorbing memory, tool calling, and basic eval into a single API. By early 2027, most teams won’t build each layer separately. They’ll get an increasingly opinionated stack from their model provider and that will be fine for 80% of use cases. The other 20%, agents at scale where the defaults break, will still build custom at every layer. But even then, when something fails in production, you need to know which layer failed. That’s what this article is for.

Sources

  1. The AI Agents Stack,” Letta, November 2024.
  2. Donating the Model Context Protocol and Establishing the Agentic AI Foundation,” Anthropic, December 2025.
  3. 120+ Agentic AI Tools Mapped Across 11 Categories [2026],” StackOne, February 2026.
  4. Henrik Plate and Darren Meyer, Dependency Management Report, Endor Labs, January 2026.
  5. Jason Liu, Context Engineering Series: Building Better Agentic RAG Systems, August 2025.
  6. LangChain and LangGraph Agent Frameworks Reach v1.0 Milestones,” LangChain, October 2025.
  7. State of Agent Engineering, LangChain, December 2025.
  8. Yunfei Bai, Allie Colin, Kashif Imran, and Winnie Xiong, “Evaluating AI Agents: Real-World Lessons from Building Agentic Systems at Amazon,” Amazon, February 2026.
  9. OWASP MCP Top 10, OWASP.


Read the whole story
alvinashcraft
14 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

AI Arcade

1 Share

On Mondays we like to trawl the internet to find excellent builds from the Raspberry Pi community. Sometimes they end up in Raspberry Pi Official Magazine, like this AI Arcade by Grigor Todorov. Happy Maker Monday, friends!

Raspberry Pi is at the heart of many DIY games consoles. By running emulation software such as RetroPie, Recalbox, Batocera, and Lakka, our favourite computer is more than capable of playing a host of classic games from a host of systems spanning the 8-bit to 32-bit eras. When Grigor Todorov decided to produce his own console, however, he decided not to base it around existing systems like the Sega Mega Drive or SNES. Instead, he pursued a project that would allow him to enjoy new titles, made on the fly, using the magic of AI.

The wooden box is reused packaging from a set of glass cups
Games are generated via AI and they’re also saved for future play

Grigor is an avid gamer. “I enjoy playing games on my Steam Deck, especially roguelikes such as Hades and The Binding of Isaac,” he says. “I like games that are quick to start, replayable, and a little unpredictable, which strongly influenced the idea behind AI Arcade.” Having already owned a Raspberry Pi and an arcade joystick left over from other projects, Grigor was able to immediately start working on his console. He wanted it to generate as many games as possible using large language models (LLMs).

“I wanted to experiment with LLMs in a more playful, physical way,” he explains. “As LLMs have improved so quickly, I became curious about what the future of gaming might look like with AI involved. One possibility was a device that could generate endless new games, giving you a fresh experience every time.” For this he turned to ChatGPT and engaged in a spot of vibe-coding (which involves describing an app in natural language rather than directly writing code) over the course of an afternoon.

Code generation 

Using a Raspberry Pi felt natural for this project. “The Raspberry Pi is very versatile, compact, and power-efficient,” Grigor says. “It also has a huge online community and excellent documentation, which makes it ideal for projects like this. I like that the console could be made fully self-contained. I did not add a battery for this version, but it would be possible to turn it into a portable, self-powered box that only needs to be connected to a screen.”

Games can take a little while to generate, but a mini-game entertains the player in the meantime

From the start, Grigor wanted the console to be as easy to use as possible. “That was one of the main goals,” he says. “I wanted a Raspberry Pi computer to host a web page locally and automatically launch it full-screen on startup. I also wanted to ensure that the user needn’t understand anything technical. The aim was that people would just power it up, connect the HDMI cable to a screen, wait for it to load, and then either prompt the LLM to create a new game or play one that has already been generated.”

With all of that in mind, Grigor got down to generating the local web app. He got the program to produce a browser-based user interface that could be navigated using a joystick controller with two buttons via a browser Gamepad API. He also ensured the interface could be navigated using a keyboard. The web app was originally set up so that a completely random game would be generated straight away, but he found this could be expanded. “I found it more interesting when the player could steer the result a little,” he reveals. “The choices give the user some influence over the theme, mechanics, or style of the final game.”

Grigor says the games are simple and the gameplay doesn’t always make sense

The result was an app which uses two LLM API calls. “First, the app asks the model to generate four questions, each with two possible answers. This only takes a few seconds,” Grigor says. “The player then chooses between the answers, and their selections are used in a second prompt. That second prompt is used to generate an HTML file, which is then displayed as the game.”

Retro roots

Even though he was vibe-coding, it wasn’t really a shortcut to success. “One of the main challenges was making the experience feel smooth and appliance-like, rather than like a computer running a script,” he says. “Getting the Raspberry Pi to boot, host the local web page, open Chromium, and go full-screen automatically was an important part of that. Another challenge was dealing with the unpredictability of LLM-generated code. Sometimes the generated games work surprisingly well, and other times they are a bit broken. I tried to make the prompting more structured so that the output would be consistent and playable.”

The actual build is very straightforward. Grigor used an 8GB Raspberry Pi 4 computer

The end result is impressive. The system is able to produce simple, addictive games that evoke the nostalgia of a bygone era. “The gameplay can be genuinely interesting and the results are often more repeatable than I expected,” Grigor says. But the experiment still shows that AI has some way to go to be able to match the genius of past developers. “Sometimes the games are broken, but they are still fun to experiment with,” he adds. “The weakest part is usually the art. The assets are not especially strong, so I think the project could be improved by adding another AI model that specialises in generating higher-quality visuals.”

Still, it’s a fun project to try and, true to its retro roots, it doesn’t have to be a complicated build. The case itself is just a wooden packaging box with a joystick and two buttons and that’s just the way Grigor wanted it. “I like reusing old packaging and I thought the wooden box worked well for this kind of project,” he explains. “For simple builds, I do not think 3D printing is always necessary. Sometimes an old box and a hole-saw drill bit are enough. That said, I would love to remake it one day in a nicer oak box.”

When generating a game, users are asked four random questions which form the basis of the gameplay. This can then be enjoyed using the console’s joystick and arcade buttons. The games have a definite retro vibe

He’d also like to take the AI Arcade to another level. “I would like to lean more into the multiplayer side of arcades and possibly add extra buttons,” he says. “It would also be interesting to experiment with generating 3D games. At the moment, many of the LLM-generated games are quite similar. One idea I am interested in is pre-generating 20 to 30 game templates and then using local generation to tweak values, change mechanics and swap art assets. That could make the box work offline while still producing varied games.”

The post AI Arcade appeared first on Raspberry Pi.

Read the whole story
alvinashcraft
15 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

1011: tmux + Terminal Maxxing with Ben Vinegar

1 Share

Scott and Wes sit down with Ben Vinegar, former Syntax GM and founder of Modem.dev, to geek out over terminal-maxxing, from SSH-based development and tmux workflows to AI-powered coding agents. Ben also demos two of his open source tools: Hunk, a slick terminal code reviewer with 4k+ GitHub stars, and TermDraw, a terminal-based diagramming tool that posts directly to your agent.

Show Notes

Sick Picks

Shameless Plugs

Hit us up on Socials!

Syntax: X Instagram Tiktok LinkedIn Threads

Wes: X Instagram Tiktok LinkedIn Threads

Scott: X Instagram Tiktok LinkedIn Threads

Randy: X Instagram YouTube Threads





Download audio: https://traffic.megaphone.fm/FSI4346709770.mp3
Read the whole story
alvinashcraft
15 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

BONUS The Communication Tax — Why Your Team Collaborates Too Much and What to Cut First With Roman Nikolaev

1 Share

BONUS: The Communication Tax — Why Your Team Collaborates Too Much and What to Cut First

In this BONUS episode, Roman Nikolaev challenges one of the most deeply held beliefs in the agile world: that more collaboration is always better. As Head of Technology at Cambri, Roman has watched teams burn their best hours in meetings and handoffs that create the feeling of productivity without the outcomes. He shares practical tools — from the vacation test to RFC processes — that help teams find the minimum viable level of collaboration.

From Senior Engineer to Accidental Manager

"I kind of accidentally ended up in management. I didn't want to lead anyone, I wanted to be just a senior engineer doing my stuff. But somehow, four months in the job, I was already leading a team, and then one year after, I was head of technology."

 

Roman's career in engineering goes back to the early 2000s. When he changed jobs during COVID, he specifically didn't want a management role — he wanted to code. But within months he was leading a team, and within a year he was running the entire technical organization at Cambri. That unexpected shift from hands-on engineering to leading teams gave him a front-row seat to how collaboration actually works — and how often it doesn't. What he noticed was that the most important differentiator for technical teams isn't technical knowledge — it's communication, and the tax you pay when communication goes wrong.

The Communication Tax Is Real

"The communication tax is real. The less we need to pay for communication, the more we can concentrate and own things end to end."

 

Roman describes a pattern most teams will recognize: stakeholders inside and outside the team — product managers, QA, scrum masters, product owners — and at some point, it becomes a game of telephone. The people doing the actual work don't have the context they need. The result? Unnecessary features, wrong implementations, suboptimal technical solutions that don't scale. His argument isn't that collaboration is bad. It's that every handoff, every meeting, every "quick sync" has a cost — and most teams aren't honest about how much they're paying.

Handoffs Aren't Collaboration

"If you look at a typical software development lifecycle — a ticket created by a product owner, refinement with the team, development, code review, QA, acceptance — there are quite many handoffs. If we can reduce some of this, we get a more effective workflow."

 

Roman walks through the standard ticket lifecycle and counts the handoffs: PO creates ticket, team refines, developer picks it up, code review with other developers, QA phase, acceptance phase. Each transition is a potential information loss. His provocation: instead of involving more people when someone struggles with a task, give the person working on it the tools and knowledge to complete it independently. The trigger for his thinking was a real team conversation where someone suggested everyone should "jump on the ticket" to help. Roman's response: wouldn't it be better to equip the individual rather than create more dependencies?

Async Tools That Actually Work

"Instead of gathering a meeting where people come unprepared or with some raw ideas, we have ownership for a task. Someone takes their time, writes down their thoughts, options in a document, and then we assign people to review it."

 

Roman shares two async practices his teams use at Cambri. First, the RFC (Request for Comments) process on Confluence — one person owns a decision, writes it up with options, and assigned reviewers sign off asynchronously. It turns out to be more effective at finding better technical solutions while spreading knowledge without requiring synchronous deep-dives. Second, his Monday written updates: every week, he spends about 90 minutes writing a detailed post covering all project statuses, what happened last week, what's coming, and business context. The team feedback in skip-level meetings is consistently positive, and he fields far fewer questions about business context and priorities than before the practice started.

The Vacation Test

"One heuristic would be that if one of the team members goes on vacation, the rest of the team can continue working on their task."

 

Roman learned this the hard way. He went on a typical Finnish one-month vacation. Before leaving, he explained the architecture and intent for a key task to his team. He came back to discover they'd built the completely wrong thing — wasting one month of a two-month project. He spent the remaining time working weekends, on planes, on trains, just to hit the deadline. The lesson wasn't that he needed more collaboration or synchronous communication before leaving. It was that he needed better communication — and a way to test whether shared context actually exists. His heuristic: if Alice goes on vacation, can Bob continue from where she stopped? If not, you don't need more meetings. You need better async context-sharing.

Where to Start: Ownership First, Then Cut Meetings

"I would probably first look into if a particular initiative, a feature, or some kind of process has an owner and well-defined roles. Usually, if there is no clear owner, that leads to a lot of synchronous meetings."

 

For Scrum Masters and team leads looking for a practical starting point, Roman offers a two-step approach. First, ensure every initiative, feature, and process has a clear owner with well-defined roles. Without clear ownership, meetings multiply because nobody is sure who's responsible, so everyone attends everything. Second, look at the team calendar starting with the biggest meetings and ask: can this be an RFC? A message? An email? Then experiment — cancel a meeting, replace it with an async channel, and see what happens. You can always bring it back. In the agile world, Roman argues, we should embrace experimentation with our own processes, not just our products.

Recommended Resources

Roman recommends Team Topologies by Matthew Skelton and Manuel Pais. The book gave him a clear mental model for independent teams that own their area end to end — teams aligned to value streams that own the customer problem completely. For more of Roman's thinking on collaboration, check out his Substack newsletter: Is Your Collaboration Good or Evil? on High Impact Engineering.

About Roman Nikolaev

Roman Nikolaev is Head of Technology at Cambri. He's spent his career thinking about how teams actually get work done — and his contrarian view that most teams collaborate too much has sparked real debate in the agile community.

 

You can link with Roman Nikolaev on LinkedIn.





Download audio: https://traffic.libsyn.com/secure/scrummastertoolbox/20260608_Roman_Nikolaev_M.mp3?dest-id=246429
Read the whole story
alvinashcraft
15 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

518: Windows is Back! New Microsoft AI Coding Models

1 Share

Microsoft Build 2026 brought a major shift: homegrown AI models designed for efficiency and real-world developer workflows. From the cost-effective MAI Code 1 Flash to sandboxed code execution and Windows developer tools, discover how Microsoft is making powerful AI accessible without breaking the bank. James shares his hands-on experience proving that you don't need expensive flagship models for most agentic coding tasks.

Windows: https://developer.microsoft.com/en-us/windows/dev-tools
Microsoft AI: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/

Follow Us

⭐⭐ Review Us ⭐⭐

Machine transcription available on http://mergeconflict.fm

Support Merge Conflict





Download audio: https://aphid.fireside.fm/d/1437767933/02d84890-e58d-43eb-ab4c-26bcc8524289/5b4b8b33-39c3-4e9e-bb24-6025eec8eaa7.mp3
Read the whole story
alvinashcraft
15 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Microsoft unlocks Visual Studio for developers left behind by its own AI

1 Share

Microsoft used its Build 2026 conference last week to announce a series of updates to its flagship Visual Studio IDE centered on a theme the company is calling agents that participate in development rather than sit next to it — along with a long-awaited move to let developers bring their own AI models and keys to the IDE.

The announcements span debugging, profiling, testing, merge conflict resolution, .NET modernization, and a new model flexibility option that Microsoft says will open the door to teams whose environments have made the current AI integration a non-starter.

“Historically, AI integration in Visual Studio has been limited to a small set of sanctioned endpoints,” writes Mads Kristensen, principal product manager for Visual Studio, in a blog post accompanying the announcements. “That works for a lot of developers, but it has left real customers behind, including teams whose environments call for different choices.”

BYOK: The enterprise unlock

The bring-your-own-key (BYOK) announcement may be the most significant for enterprise shops. Microsoft is moving toward a model that lets developers use different AI models — whether running locally or in the cloud — rather than being locked to the handful of endpoints Visual Studio has historically supported.

Microsoft is willing to compete on flexibility rather than assuming developers will simply work within whatever AI stack Redmond has blessed.

That matters for teams with compliance requirements, cost constraints, or data sovereignty concerns that have prevented them from using Visual Studio’s AI features in their current form. The move also signals that Microsoft is willing to compete on flexibility rather than assuming developers will simply work within whatever AI stack Redmond has blessed.

Agents in the toolchain

Beyond BYOK, Microsoft’s bigger architectural push is embedding agents directly into the IDE’s existing toolchain — the debugger, profiler, and test runner — rather than treating AI as a parallel chat interface.

“This is not about replacing the tools you already rely on,” Kristensen writes. “It is about connecting them more effectively.”

The practical pitch is aimed at enterprise C# and C++ developers working in large codebases where, as Kristensen puts it, the hard problems are not “write this function” but “figure out why this thing is slow under load.” The agents are meant to help identify issues faster, explain what is happening, suggest fixes, and help validate results — all within the context of the existing debugger and profiler rather than requiring developers to context-switch to a chat window.

A dedicated Build session — “GitHub Copilot in Visual Studio: Agents That Debug, Profile, and Test” (BRK207), featuring Kristensen and Nik Karpinsky, Principal Software Engineer Lead at Microsoft, provides additional information on this topic.

Modernization gets more ambitious

Microsoft is also expanding what it calls GitHub Copilot modernization, the agent experience built into Visual Studio for upgrading applications to the latest .NET stack.

New this summer: the ability to migrate Web Forms applications to Blazor and add Aspire to existing apps for cloud-ready observability and orchestration. The modernization agent is designed to assess a project, build a migration plan, and execute upgrades step by step.

The pitch is aimed at teams that have been sitting on aging Web Forms codebases because the economics of a full rewrite never made sense. Whether the agent-assisted approach actually changes that calculus remains to be seen, but it is a more concrete use case than general-purpose code generation.

Smaller changes worth noting

Microsoft is also shipping a quality-of-life fix that addresses a scenario most Visual Studio developers have encountered: builds that run even when the Error List already shows obvious problems, only to fail on something that was visible up front. Going forward, Visual Studio will check errors and warnings before the build starts, Kristensen writes.

Going forward, Visual Studio will check errors and warnings before the build starts, Kristensen writes.

On the collaboration side, Microsoft is working on AI-assisted merge conflict resolution — not auto-merging, but helping to understand the conflict and make a decision. Also coming: Microsoft-authored skills that apply automatically based on project type and context, reducing the need for developers to know what to prompt for.

Under the hood

Underneath it all, Visual Studio is moving to the GitHub Copilot SDK as the foundation for its AI integration. The change will not be visible in any menu, but Microsoft says it will allow the company to move faster and stay aligned with the broader Copilot ecosystem.

The full set of announcements is available at the Visual Studio blog. Also, the Build sessions are streaming online for free at build.microsoft.com.

The post Microsoft unlocks Visual Studio for developers left behind by its own AI appeared first on The New Stack.

Read the whole story
alvinashcraft
15 minutes ago
reply
Pennsylvania, USA
Share this story
Delete
Next Page of Stories