We tend to assume that if every part of a system behaves correctly, the system itself will behave correctly. That assumption is deeply embedded in how we design, test, and operate software. If a service returns valid responses, if dependencies are reachable, and if constraints are satisfied, then the system is considered healthy. Even in distributed systems, where failure modes are more complex, correctness is still tied to the behavior of individual components. In modern AI systems, particularly those combining retrieval, reasoning, and tool invocation, this assumption is increasingly stressed under continuous operation.
This model works because most systems are built around discrete operations. A request arrives, the system processes it, and a result is returned. Each interaction is bounded, and correctness can be evaluated locally. But that assumption begins to break down in systems that operate continuously. In these systems, behavior is not the result of a single request. It emerges from a sequence of decisions that unfold over time. Each decision may be reasonable in isolation. The system may satisfy every local condition we know how to measure. And yet, when viewed as a whole, the outcome can be wrong.
One way to think about this is as a form of behavioral drift: systems that remain operational but gradually diverge from their intended trajectory. Nothing crashes. No alerts fire. The system continues to function. And still, something has gone off course.
The root of the issue is not that components are failing. It is that correctness no longer composes cleanly. In traditional systems, we rely on a simple intuition: If each part is correct, then the system composed of those parts will also be correct. This intuition holds when interactions are limited and well-defined.
In autonomous systems, that intuition becomes unreliable. Consider a system that retrieves information, reasons over it, and takes action. Each step in that process can be implemented correctly. Retrieval returns relevant data. The reasoning step produces plausible conclusions. The action is executed successfully. But correctness at each step does not guarantee correctness of the sequence.
The system might retrieve information that is contextually valid but incomplete or misaligned with the current task. The reasoning step might interpret it in a way that is locally consistent but globally misleading. The action might reinforce that interpretation by feeding it back into the system’s context. Each step is valid. The trajectory is not. This is what behavioral drift looks like in practice: locally correct decisions producing globally misaligned outcomes.
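To make this concrete, here is a deliberately toy sketch in Go. The step names, bias, and thresholds are invented for illustration: every iteration passes a per-step validity check, yet the small, locally acceptable shifts compound until the state is far from the intended target.

```go
package main

import "fmt"

// A toy model of a retrieve -> reason -> act loop. Each step nudges the
// system state by a small, locally acceptable amount. The per-step check
// passes every time, but the nudges compound into a large deviation.
func main() {
	const (
		target       = 1.0  // intended trajectory (illustrative)
		perStepLimit = 0.05 // each step may deviate at most 5% from the previous state
		stepBias     = 0.03 // small, "valid" drift introduced by every step
		iterations   = 40
	)

	state := float64(target)
	for i := 1; i <= iterations; i++ {
		next := state * (1 + stepBias) // retrieval, reasoning, and action each shift context slightly

		// Local validation: the step looks fine relative to the previous state.
		locallyValid := (next-state)/state <= perStepLimit

		state = next
		if i%10 == 0 {
			fmt.Printf("step %2d: locally valid=%v, deviation from intent=%.0f%%\n",
				i, locallyValid, (state-target)/target*100)
		}
	}
}
```

Nothing in this loop ever fails a local check; only the trajectory reveals the problem.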
In these systems, correctness is no longer a property of individual steps. It is a property of how those steps interact over time. This breakdown is subtle but fundamental. It means that testing individual components, even exhaustively, does not guarantee that the system will behave correctly when those components are composed into a continuously operating whole.
To understand why this happens, it helps to look at where behavior actually comes from. In many modern AI systems, behavior is not encoded directly in a single component. It emerges from interaction: the context assembled for each decision, the reasoning performed over that context, and the actions whose outcomes feed back into the system's next state.
Each of these elements operates with partial information. Each contributes to the next state of the system. The system evolves as these interactions accumulate. This pattern is especially visible in LLM-based and agentic AI systems, where context assembly, reasoning, and action selection are dynamically coupled. Under these conditions, behavior is dynamic and path dependent. Small differences early in a sequence can lead to large differences later on. A slightly suboptimal decision, repeated or combined with others, can push the system further away from its intended trajectory.
This is why behavior cannot be fully specified ahead of time. It is not simply implemented; it is produced. And because it is produced over time, it can also drift over time.
Modern observability systems are very good at telling us what a system is doing. We can measure latency, throughput, and resource utilization. We can trace requests across services. We can inspect logs, metrics, and traces in near real time. In many cases, we can reconstruct exactly how a particular outcome was produced. These signals are essential. They allow us to detect failures that disrupt execution. But they are tied to a particular model of correctness. They assume that if execution proceeds without errors and if performance remains within acceptable bounds, then the system is behaving as expected.
In systems exhibiting behavioral drift, that assumption no longer holds. A system can process requests efficiently while producing outputs that are progressively less aligned with its intended purpose. It can meet all its service-level objectives while still moving in the wrong direction. Observability captures activity. It does not capture alignment.
This distinction becomes more important as systems become more autonomous. In AI-driven systems, particularly those operating as long-lived agents, the gap between activity and alignment becomes operationally significant. The question is no longer just whether the system is working. It is whether it is still doing the right thing. This gap is where many modern systems begin to fail without appearing to fail.
A natural response to this problem is to add more validation. We can introduce checks at each stage: validating the context that is retrieved, checking the outputs of reasoning, and gating actions before they execute.
These mechanisms improve local correctness. They reduce the likelihood of obviously incorrect decisions. But they operate at the level of individual steps.
They answer questions like: Is this retrieval relevant? Is this output well-formed? Is this action permitted? They do not answer the harder question: Is the sequence of decisions, taken together, still moving toward the intended outcome?
A system can pass every validation check and still drift. Behavioral drift is not caused by invalid steps. It is caused by valid steps interacting in ways we did not anticipate. Increasing validation does not eliminate this problem. It only shifts where it appears, often pushing it further downstream, where it becomes harder to detect and correct.
If correctness does not compose automatically, then what determines system behavior? Increasingly, the answer is coordination. In traditional distributed systems, coordination refers to managing shared state, ensuring consistency, ordering operations, and handling concurrency. In autonomous systems, coordination extends to decisions.
The system must coordinate what information to retrieve, how to interpret it, which actions to take, and how the outcomes of those actions feed back into subsequent decisions.
This coordination is not centralized. It is distributed across models, planners, tools, and feedback loops. In agentic AI architectures, this coordination spans model inference, retrieval pipelines, and external system interactions. The system’s behavior is not defined by any single component. It emerges from the interaction between them.
In this sense, the system is no longer just the sum of its parts. The system is the coordination itself. Failures arise not from broken components, but from the dynamics of interaction: timing, sequencing, feedback, and context. This also explains why small inconsistencies can propagate and amplify. A slight mismatch in one part of the system can cascade through subsequent decisions, shaping the trajectory in ways that are difficult to anticipate or reverse.
One response to this complexity is to introduce more structure. Control planes, policy engines, and governance layers provide mechanisms to enforce constraints at key decision points. They can validate inputs, restrict actions, and ensure that certain conditions are met before execution proceeds. This is an important step. Without some form of structure, it becomes difficult to reason about system behavior at all. But structure alone is not sufficient.
Most control mechanisms operate at entry points. They evaluate decisions at the moment they are made. They determine whether a particular action should be allowed, whether a policy is satisfied, and whether a request can proceed. The problem is that many of the failures in autonomous systems do not originate at these entry points. They emerge during execution, as sequences of individually valid decisions interact in unexpected ways. A control plane can ensure that each step is permissible. It cannot guarantee that the sequence of steps will produce the intended outcome. This distinction is subtle but important: control provides structure, but not assurance.
Traditional monitoring focuses on events. A request is processed. A response is returned. An error occurs. Each event is evaluated independently. In systems exhibiting behavioral drift, behavior is better understood as a trajectory. A trajectory is a sequence of states connected by decisions. It captures how the system evolves over time. Two trajectories can consist of individually valid steps and still produce very different outcomes. One remains aligned. The other drifts. This represents a shift from failure as an event to failure as a trajectory, a distinction that traditional system models are not designed to capture.
Correctness is no longer about individual events. It is about the shape of the trajectory. This shift has implications not just for how we monitor systems, but for how we design them in the first place.
If failure manifests as drift, then detecting it requires a different set of signals. Instead of looking for errors, we need to look for patterns: gradual shifts in outputs, slow divergence from established baselines, and changes in how decisions accumulate over time.
These signals are not binary. They do not indicate that something is broken. They indicate that something is changing. The challenge is that change is not always failure. Systems are expected to adapt. Models evolve. Data shifts. The question is not whether the system is changing. It is whether the change remains aligned with intent. This requires a different kind of visibility, one that focuses on behavior over time rather than isolated events.
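As a hypothetical illustration of that kind of visibility, here is a toy trajectory-level check. The scores, window, and threshold are invented; a real system would use richer measures such as distributional distance, embedding similarity, or task-success trends over time.

```go
package main

import (
	"fmt"
	"math"
)

// driftScore compares the mean of a recent window of behavior against a
// baseline. It is a deliberately simple stand-in for richer trajectory
// signals such as distributional distance or embedding similarity.
func driftScore(baseline, recent []float64) float64 {
	return math.Abs(mean(recent) - mean(baseline))
}

func mean(xs []float64) float64 {
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// Hypothetical per-decision alignment scores (1.0 = fully on-intent).
	baseline := []float64{0.95, 0.96, 0.94, 0.97, 0.95}
	recent := []float64{0.93, 0.91, 0.90, 0.88, 0.87} // no single value is an error

	const threshold = 0.05 // tolerated divergence before flagging (illustrative)
	if score := driftScore(baseline, recent); score > threshold {
		fmt.Printf("behavioral drift suspected: divergence %.2f exceeds %.2f\n", score, threshold)
	}
}
```

The point is the shape of the comparison: a recent window against an established baseline, rather than any single event.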
Once drift is identified, the system needs a way to respond. Traditional responses, such as restarting, rolling back, or stopping the system, assume failure is discrete and localized. Behavioral drift is neither. What is needed is the ability to influence behavior while the system continues to operate. This might involve constraining the action space, adjusting decision selection, introducing targeted validation, or steering the system toward more stable trajectories. These are not binary interventions. They are continuous adjustments.
This perspective aligns with how control is handled in other domains. In control systems engineering, behavior is managed through feedback loops. The system is continuously monitored, and adjustments are made to keep it within desired bounds. Control is no longer just a gate. It becomes a continuous process that shapes behavior over time.
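A minimal sketch of that idea, assuming a single scalar drift measure and an abstract constraint to adjust; both are invented for illustration and this is not a real control-plane API.

```go
package main

import "fmt"

// A minimal proportional feedback loop: instead of a one-time allow/deny
// decision, the controller continuously tightens or relaxes a constraint
// (here an abstract "exploration budget") based on a measured drift signal.
func adjustBudget(budget, drift, setpoint, gain float64) float64 {
	err := drift - setpoint // how far behavior sits outside acceptable bounds
	budget -= gain * err    // tighten when drifting, relax when well aligned
	if budget < 0 {
		budget = 0
	}
	if budget > 1 {
		budget = 1
	}
	return budget
}

func main() {
	budget := 1.0
	for _, drift := range []float64{0.02, 0.08, 0.15, 0.10, 0.03} {
		budget = adjustBudget(budget, drift, 0.05, 2.0)
		fmt.Printf("measured drift %.2f -> exploration budget %.2f\n", drift, budget)
	}
}
```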
This leads to a different definition of reliability. A system can be available, responsive, and internally consistent—and still fail if its behavior drifts away from its intended purpose. Reliability becomes a question of alignment over time: whether the system remains within acceptable bounds and continues to behave in ways consistent with its goals.
If behavior is trajectory-based, then system design must reflect that. We need to monitor patterns, understand interactions, treat behavior as dynamic, and provide mechanisms to influence trajectories. We are very good at detecting failure as breakage. We are much less equipped to detect failure as drift. Behavioral drift accumulates gradually, often becoming visible only after significant misalignment has already occurred.
As systems become more autonomous, this gap will become more visible. The hardest problems will not be systems that fail loudly, but systems that continue working while gradually moving in the wrong direction. The question is no longer just how to build systems that work. It is how to build systems that continue to work for the reasons we intended.
I wanted to give an update on GitHub's availability in light of two recent incidents. Neither incident is acceptable, and we are sorry for the impact they had on you. I want to share some details on them, as well as explain what we've done and what we're doing to improve our reliability.
We started executing our plan to increase GitHub’s capacity by 10X in October 2025 with a goal of substantially improving reliability and failover. By February 2026, it was clear that we needed to design for a future that requires 30X today’s scale.
The main driver is a rapid change in how software is being built. Since the second half of December 2025, agentic development workflows have accelerated sharply. By nearly every measure, the direction is already clear: repository creation, pull request activity, API usage, automation, and large-repository workloads are all growing quickly.

This exponential growth does not stress one system at a time. A pull request can touch Git storage, mergeability checks, branch protection, GitHub Actions, search, notifications, permissions, webhooks, APIs, background jobs, caches, and databases. At high scale, small inefficiencies compound: queues deepen, cache misses become database load, indexes fall behind, retries amplify traffic, and one slow dependency can affect several product experiences.
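As a generic illustration of how retries can amplify load, here is a sketch of a common mitigation, capped attempts with jittered exponential backoff. This is illustrative only and not a description of GitHub's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// doWithBackoff illustrates why retry policy matters at scale: uncapped,
// immediate retries multiply the load on an already-slow dependency, while
// a small attempt cap plus jittered exponential backoff spreads it out.
func doWithBackoff(call func() error, maxAttempts int) error {
	delay := 50 * time.Millisecond
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err := call(); err == nil {
			return nil
		}
		if attempt == maxAttempts {
			break
		}
		jitter := time.Duration(rand.Int63n(int64(delay))) // avoid synchronized retry storms
		time.Sleep(delay + jitter)
		delay *= 2
	}
	return errors.New("dependency unavailable after retries")
}

func main() {
	flaky := func() error { return errors.New("timeout") } // always-failing stand-in
	fmt.Println(doWithBackoff(flaky, 3))
}
```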
Our priorities are clear: availability first, then capacity, then new features. We are reducing unnecessary work, improving caching, isolating critical services, removing single points of failure, and moving performance-sensitive paths into systems designed for these workloads. This is distributed systems work: reducing hidden coupling, limiting blast radius, and making GitHub degrade gracefully when one subsystem is under pressure. We’re making progress quickly, but these incidents are examples of where there’s still work to do.
Short term, we had to resolve a variety of bottlenecks that appeared faster than expected, ranging from moving webhooks to a different backend (out of MySQL) and redesigning the user session cache, to redoing authentication and authorization flows to substantially reduce database load. We also leveraged our migration to Azure to stand up a lot more compute.
Next we focused on isolating critical services like Git and GitHub Actions from other workloads and reducing the blast radius by eliminating single points of failure. This work started with careful analysis of dependencies and the different tiers of traffic to understand what needs to be pulled apart and how we can minimize the impact on legitimate traffic from various attacks. We then addressed those in order of risk. Similarly, we accelerated the migration of performance- and scale-sensitive code out of the Ruby monolith into Go.
While we were already migrating out of our smaller custom data centers into the public cloud, we started working on a path to multi-cloud. This longer-term measure is necessary to achieve the level of resilience, low latency, and flexibility that will be needed in the future.
The number of repositories on GitHub is growing faster than ever, but a much harder scaling challenge is the rise of large monorepos. For the last three months, we've been investing heavily in response to this trend, both within the Git system and in the pull request experience.
We will have a separate blog post soon describing the extensive work we've done and the new upcoming API design for greater efficiency and scale. As part of this work, we have invested in optimizing merge queue operations, since that is key for repos that have many thousands of pull requests a day.
The two recent incidents were different in cause and impact, but both reflect why we are increasing our focus on availability, isolation, and blast-radius reduction.
On April 23, pull requests experienced a regression affecting merge queue operations.
Pull requests merged through merge queue using the squash merge method produced incorrect merge commits when a merge group contained more than one pull request. In affected cases, changes from previously merged pull requests and prior commits were inadvertently reverted by subsequent merges.
During the impact window, 230 repositories and 2,092 pull requests were affected. We initially shared slightly higher numbers because our first assessment was intentionally conservative. The issue did not affect pull requests merged outside merge queue, nor did it affect merge queue groups using merge or rebase methods.
There was no data loss: all commits remained stored in Git. However, the state of affected default branches was incorrect, and we could not safely repair every repository automatically. More details are available in the incident root cause analysis.
This incident exposed multiple process failures, and we are changing those processes to prevent this class of issue from recurring.
On April 27, an incident affected our Elasticsearch subsystem, which powers several search-backed experiences across GitHub, including parts of pull requests, issues, and projects.
We are still completing the root cause analysis and will publish it shortly. What we know now is that the cluster became overloaded (likely due to a botnet attack) and stopped returning search results. There was no data loss, and Git operations and APIs were not impacted. However, parts of the UI that depended on search showed no results, which caused a significant disruption.
This is one of the systems we had not yet fully isolated to eliminate as a single point of failure, because other areas had been higher in our risk-prioritized reliability work. That impact is unacceptable, and we are using the same dependency and blast-radius analysis described above to reduce the likelihood and impact of this type of failure in the future.
We have also heard clear feedback that customers need greater transparency during incidents.
We recently updated the GitHub status page to include availability numbers. We have also committed to statusing incidents both large and small, so you do not have to guess whether an issue is on your side or ours.
We are continuing to improve how we categorize incidents so that the scale and scope are easier to understand. We are also working on better ways for customers to report incidents and share signals with us during disruptions.
GitHub’s role has always been to support developers on an open and extensible platform.
The team at GitHub is incredibly passionate about our work. We hear the pain you’re experiencing. We read every email, social post, support ticket, and we take it all to heart. We’re sorry.
We are committed to improving availability, increasing resilience, scaling for the future of software development, and communicating more transparently along the way.
According to the 2025 Go Developer Survey, 46% of Go developers use the language to build websites and/or web services. It’s therefore unsurprising that the topic of web frameworks frequently pops up in conversation and is often the subject of healthy debate.
The GoLand team enters the chat armed with data to answer the question: What are the most popular web frameworks for Go developers and why?
Developers from environments that rely heavily on frameworks, such as JavaScript, will naturally seek out frameworks to simplify or reduce their workload. Meanwhile, hardcore Gophers will reject external libraries and frameworks altogether as superfluous dependencies that ultimately make their work harder. Both sides are right to some extent; using Go’s standard library exclusively comes with its own set of pros and cons.
Go's standard library has a reputation for covering a lot of ground, and the net/http package is no exception. It provides a solid foundation for building web services; in particular, it already includes routing, middleware composition via handlers, and an HTTP server implementation. The case for sticking with net/http usually comes down to a few arguments:

- Many developers choose to rely solely on net/http, so as to avoid external dependencies. Third-party libraries are only good as long as they're maintained, and they can always introduce security risks and maintenance overhead. Especially in larger commercial projects, these limitations are often unacceptable.
- Teams that stick to net/http have full control over their code and don't have to pay the “overhead tax” for features they don't use that come with the libraries.
- Since every Go developer knows net/http (or so we hope), in commercial settings, this eliminates the need to learn new tools to become productive when a new developer is hired.

Whether you are convinced by the arguments for or against, the truth remains that – according to the JetBrains State of Developer Ecosystem Report 2025 – as much as 32% of Go developers use net/http, and its popularity remains largely unchanged.
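For reference, here is a minimal stdlib-only service, assuming Go 1.22 or later for the method-aware routing patterns on http.ServeMux; it is a sketch rather than production configuration.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// logging is middleware in its simplest form: a function that wraps an
// http.Handler and returns another http.Handler.
func logging(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log.Printf("%s %s", r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	// Method-aware patterns and path wildcards are available since Go 1.22.
	mux.HandleFunc("GET /users/{id}", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "user %s\n", r.PathValue("id"))
	})
	log.Fatal(http.ListenAndServe(":8080", logging(mux)))
}
```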

Unlike a lot of other languages – as is the case with Ruby and Rails, or Python with Django or Flask – Go doesn't have one dominant framework that every developer would recognize and use. While stdlib remains a popular choice, our data shows that it has a formidable opponent, used by almost half of Go developers – Gin.
On top of that, in our analysis of The Go Ecosystem in 2025, we’ve identified the most widely used web frameworks to be Gin (48%), Gorilla (17%), Echo (16%), and Fiber (11%).
Now, let’s look at how they stack up against each other and against net/http and see if there is a clear winner when it comes to web frameworks for Go (spoiler alert: There isn’t 😉).
Gin is an HTTP web framework for building REST APIs, web applications, and microservices in Go. It offers middleware support, JSON validation, route grouping, error management, and built-in rendering. Gin is highly extensible and remains the top choice for Go developers as one of the fastest regularly maintained frameworks with a developer-friendly API. It has over 88,000 stars on GitHub and a sizable community around it, so you can expect to find a lot of examples and support from other developers when you run into trouble.
You can create a router engine in Gin with (gin.Default()) or without (gin.New()) middleware attached, depending on how much control you need. gin.Default() comes with logger and recovery middleware out of the box. Other middleware lives in the official gin-contrib collection, where you can find a CORS mechanism, as well as authentication, session manager, pprof, and other tools.
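Here is a brief sketch of that setup: gin.New() with the same Logger and Recovery middleware that gin.Default() would attach, plus a route group. The route and port are placeholders.

```go
package main

import "github.com/gin-gonic/gin"

func main() {
	// gin.New() starts with no middleware attached; gin.Default() would wire
	// up Logger and Recovery for you. Here we attach them explicitly.
	r := gin.New()
	r.Use(gin.Logger(), gin.Recovery())

	// Route grouping keeps related endpoints (and their middleware) together.
	api := r.Group("/api")
	api.GET("/ping", func(c *gin.Context) {
		c.JSON(200, gin.H{"message": "pong"})
	})

	r.Run(":8080")
}
```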
Gin’s creators boast that its “performance [is] up to 40 times faster than Martini”, though in 2026, this probably doesn’t indicate much. Thankfully, they also run their own benchmarks, allowing you to compare Gin’s performance against a number of more modern libraries as well. And while Gin is not the most performant in all scenarios (Aero actually takes that cake), the results are still very close to the top across the board. This is largely due to zero-allocation routing, which keeps the app memory usage stable even under high traffic.
Gin is an opinionated framework, which means it follows its own approach. For example, it uses gin.Context instead of the standard context.Context. It is still built on top of net/http, but if your code depends heavily on Gin-specific features, moving to another framework later may require extra work.
Pick Gin if you want a framework that is fast, widely adopted, and backed by a large community and middleware ecosystem, and you don't mind its opinionated approach.
Echo is yet another high-performing and minimalist framework that has an HTTP router without dynamic memory allocation. It includes automatic TLS, support for HTTP/2, middleware, data binding and rendering, and various template engines. It’s also very extensible at various levels. It continues to grow in popularity, with 16% of Go developers declaring its use in 2025.
Echo has a broad catalog of official middleware that you will also find in other frameworks, such as CORS, JWT, and key authentication tools, as well as a logger and rate limiter.
Similarly to Gin, Echo is built on top of net/http, but it does deviate from it at times. For example, it uses echo.Context instead of the standard context.Context. Also, Echo handlers use the signature func(echo.Context) error rather than http.HandlerFunc, but Echo provides adapters to integrate standard net/http handlers when needed.
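A minimal sketch of both points, Echo's error-returning handlers and reuse of an existing net/http handler via echo.WrapHandler; the helper names are worth double-checking against the Echo docs for your version.

```go
package main

import (
	"net/http"

	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
)

func main() {
	e := echo.New()
	e.Use(middleware.Logger(), middleware.Recover())

	// Echo handlers return an error, which centralizes error handling.
	e.GET("/hello", func(c echo.Context) error {
		return c.String(http.StatusOK, "hello")
	})

	// An existing net/http handler can be adapted rather than rewritten.
	legacy := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("legacy handler"))
	})
	e.GET("/legacy", echo.WrapHandler(legacy))

	e.Logger.Fatal(e.Start(":8080"))
}
```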
Choose Echo if you want a minimalist, high-performing framework with automatic TLS, HTTP/2 support, a broad catalog of official middleware, and centralized error handling, and you can accept small deviations from net/http.
Chi also claims to be a lightweight yet robust composable router for building Go HTTP services. Some would argue it’s not really a framework, but it’s still quite popular with Go developers, ranking as the fifth most popular alternative (used by 12% of developers in 2025).
Chi’s authors claim it offers “an elegant and comfortable design” for large REST APIs, and the framework itself is deconstructed into smaller parts. You can use the standalone core router or extend it with subpackages for middleware, rendering, and/or docgen. Other key features include full compatibility with net/http and no external dependencies. Chi uses standard Go handler types and middleware shape.
This means that Chi is compatible with all standard middleware, giving you more flexibility. Its optional middleware package includes a suite of core net/http tools, and on top of that, there are also extra middleware and other packages. Some noteworthy middleware options include: CORS, JWT auth, request logger, and rate limiter tools.
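Here is a brief sketch of what that looks like in practice, using a few middleware from Chi's optional middleware package and a plain http.HandlerFunc; the route and payload are placeholders.

```go
package main

import (
	"encoding/json"
	"net/http"

	"github.com/go-chi/chi/v5"
	"github.com/go-chi/chi/v5/middleware"
)

func main() {
	r := chi.NewRouter()

	// Chi middleware is plain func(http.Handler) http.Handler, so anything
	// written for net/http plugs in unchanged.
	r.Use(middleware.RequestID, middleware.Logger, middleware.Recoverer)

	// Handlers are ordinary net/http handler functions.
	r.Get("/users/{id}", func(w http.ResponseWriter, req *http.Request) {
		json.NewEncoder(w).Encode(map[string]string{"id": chi.URLParam(req, "id")})
	})

	http.ListenAndServe(":8080", r)
}
```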
You can check Chi’s benchmarks here, though they are rather old.
It’s worth considering Chi if you want minimal dependencies and full compatibility with net/http. Perhaps you've even concluded that you don't need a framework at all and would rather stick to the standard library. If, however, you feel that a router would make your life easier, but you still want full compatibility with stdlib, Chi might be the choice for you.

Finally, there's Fiber – a framework that JavaScript developers in particular will be very fond of, as it's inspired by Express. It boasts robust routing, the ability to serve static files, API-readiness, a rate limiter, flexible middleware support, low memory footprint, support for template engines and WebSocket, and – you guessed it! – great performance. The Fiber team provides its own benchmarks here. It's the last framework on our list that's been adopted by over 10% of developers according to our survey.
What differentiates Fiber from the other frameworks we’ve already discussed is that it’s built on a different HTTP engine – fasthttp. It can interoperate with net/http; however, the compatibility is provided through adapters and not through shared foundations. As Fiber’s architecture is fundamentally different from the standard library, choosing it locks you in more than other frameworks do, and migrating between Fiber and net/http frameworks typically requires more refactoring.
Considering Fiber’s architecture, it’s no surprise that it offers the broadest built-in toolbox of all the frameworks discussed. On top of that, it also has a whole host of third-party middleware maintained by the Fiber team or the community. Some of the most popular middleware options for Fiber are: a template engine, adaptor that converts net/http handlers to Fiber handlers and vice versa, Helmet integration, and key authentication tools.
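A small sketch of both sides, a native Fiber handler plus a net/http handler wired in through the adaptor middleware; the exact helper names are worth verifying against the Fiber docs for your version.

```go
package main

import (
	"net/http"

	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/adaptor"
	"github.com/gofiber/fiber/v2/middleware/limiter"
)

func main() {
	app := fiber.New()

	// Built-in middleware, e.g. the rate limiter mentioned above.
	app.Use(limiter.New())

	// Fiber's own handler signature, running on fasthttp rather than net/http.
	app.Get("/hello", func(c *fiber.Ctx) error {
		return c.SendString("hello from fasthttp")
	})

	// Reusing a net/http handler goes through an adapter, not shared types.
	legacy := func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("adapted net/http handler"))
	}
	app.Get("/legacy", adaptor.HTTPHandlerFunc(legacy))

	app.Listen(":3000")
}
```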
Fiber is especially good for developers coming from Express and for performance-sensitive services where a low memory footprint matters, as long as full net/http compatibility is not a requirement.
Strictly speaking, Gorilla is not a full-blown framework but rather a toolkit, and Gorilla/mux is just a router. But we include it on the list due to its enduring presence, with 17% of developers still using it in 2025. Even with a sharp decline in popularity compared to 2020, it’s still the third most popular choice (after Gin and net/http), despite the project no longer being actively maintained by the original team (the last update was in November 2023). While community forks continue development, most new projects today veer towards alternatives, such as Chi, or the improved routing features in Go’s standard library.
| | net/http | Gin | Echo | Chi | Fiber |
|---|---|---|---|---|---|
| HTTP engine | stdlib | net/http | net/http | net/http | fasthttp |
| Compatibility with net/http | Native | Partial | Partial | Full | Via an adapter |
| Maintenance | Core Go | Active | Active | Active | Active |
| Performance | High | Very high | High | High | Extremely high |
| Learning curve | Low | Low | Medium | Very low | Medium |
| Standout features | No dependencies; standardized solution | Wide adoption; robust community support | Clean error handling; “batteries-included” approach | Closeness to stdlib; minimalist features | Similarity to Express; extremely high performance |
| Dependencies | None | Moderate | Moderate | Very low | Significant |
| Extensibility | Very high | Moderate – sometimes requires adaptation | Moderate – requires wrapping | Very high | Low – mostly incompatible with standard middleware |
| Ecosystem | The largest and most mature; supported by Google | Very mature with the largest community and adoption after net/http | Mature with a strong community and long-term maintenance | Mature with a sizeable community | Younger framework, with a community that’s still expanding |
| Relationship to net/http | Native | Opinionated, but built on net/http | Built on top of net/http | Designed around net/http – fully compatible | Not directly compatible, built on fasthttp |
To show you how each framework handles API ergonomics, how verbose it is, and how it deviates from idiomatic Go, here’s a sample of the same API endpoint implemented in different frameworks.
// net/http
http.HandleFunc("/users", func(w http.ResponseWriter, r *http.Request) {
json.NewEncoder(w).Encode(users)
})
// Gin
router.GET("/users", func(c *gin.Context) {
c.JSON(200, users)
})
// Echo
e.GET("/users", func(c echo.Context) error {
return c.JSON(200, users)
})
// Chi
r.Get("/users", func(w http.ResponseWriter, r *http.Request) {
json.NewEncoder(w).Encode(users)
})
// Fiber
app.Get("/users", func(c *fiber.Ctx) error {
return c.JSON(users)
})
Using frameworks is not the only thing that can make your life easier; your IDE can help you stay productive as well, without locking you in. Here’s how using GoLand can help you as a web developer.

Like in any other language, libraries are there to provide solutions to common problems and free up your time to focus on important work. While Go's standard library, and net/http in particular, contains everything you need to build a production-ready web server, the libraries described in this article will help you avoid a lot of boilerplate and simplify common tasks such as routing, middleware composition, and request binding.
In the Go ecosystem, frameworks are optional rather than foundational. Many production services are built directly on top of net/http, while others use lightweight routers or frameworks to simplify common tasks.
Because the standard library is so robust, for many engineers, there isn’t really a good justification for using frameworks that require learning their syntax, create extra dependencies, and force regular updates. There isn’t a single dominant framework because none of them is superior to stdlib; they just offer different tradeoffs.
Is net/http sufficient for building scalable APIs?

One of Go’s key differentiators is that, technically, you don’t need any external libraries for your application. Go’s standard library has everything you need, regardless of the scale of your project, and some hard-core Gophers stick to only that.
This is not to say that libraries are bad or useless – if they make your life easier and you’re aware of the tradeoffs, there’s really no reason not to use them.
While most Go web frameworks support common features such as routing, middleware, and JSON handling, they differ in architecture, ecosystem compatibility, and how closely they follow net/http. When deciding on the right one for you, focus primarily on what your team and project need.
If you want to stay as close as possible to net/http, Chi is your best bet.

Does Fiber lock you in?

To a certain extent – yes, at least more than the other frameworks mentioned in this article. Because Fiber was designed around fasthttp, it had to make architectural trade-offs.
The biggest limitation is that you cannot use the vast library of generic Go middleware with Fiber out of the box. You therefore need to use middleware maintained by the Fiber team or employ an adapter (adaptor.FromHTTP), which introduces performance overhead and effectively defeats the reason to use Fiber in the first place.
Every company, when evaluating new tools, technologies, or infrastructure, eventually runs into the same question:
“Should we build this ourselves or buy a ready-made solution?”
The default answer is often: “We can do it ourselves.” And technically, that’s true.
But the real question isn’t whether it’s possible. It’s how fast and how efficiently you can get there. How long will it take to get something working? And more importantly, how long will it take to make it reliable, maintainable, and usable across the company?
Those are very different problems, and in the context of agentic analytics, the gap between them is especially wide.
Most build vs. buy discussions are approached with a narrow perspective:
“If we simply plug an LLM into our database and documentation, will that give us what we need?”
In early demo stages, this option often works surprisingly well. But it reduces the problem to a single dimension: evaluating model performance on a limited snapshot of data and context.
What it doesn’t capture is what happens next, once the system is used across teams, over time, in real workflows. Questions of consistency, reuse, maintainability, cost, and integration – all of which matter in production – are often overlooked at this stage, even though they ultimately determine whether the system succeeds or fails.
So the real question isn’t: “Can we make this work?”
It’s: “What does it take to make this work reliably across the business, over time?”
In the early stages, DIY setups often look promising. You connect an agent to your warehouse, add some documentation, and run a few queries. The results can be impressive, especially compared to having no solution at all.
But the issues don’t show up in demos. They show up later as the scope expands and becomes more complex. They usually revolve around four main pitfalls commonly found in organizations.
1. Ambiguous business logic
DIY setups typically rely on documentation written in plain English, data catalogs, or a mix of both. These are quick to produce, but they leave room for interpretation, especially across teams that use different definitions for the same metrics.
Take a simple example: What does “active customer” actually mean?
Is it someone who logged in in the last 30 days, made a purchase, or holds an active subscription?
Without a formal definition, the agent has to guess the meaning. And those inferences are not stable; they shift depending on context, phrasing, or even the model’s behavior. Over time, this ambiguity accumulates and leads to inconsistent answers that often look correct but aren’t.
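As a hedged illustration of the difference, here is one way a formal definition could be encoded as data rather than prose. The type, field names, and SQL fragment below are invented for this sketch and do not reflect any particular product's semantic-layer schema.

```go
package main

import (
	"fmt"
	"time"
)

// MetricDefinition is a hypothetical stand-in for a semantic-layer entry:
// the definition lives in one structured place, with explicit windows and
// filters, instead of being re-interpreted from prose by every agent.
type MetricDefinition struct {
	Name        string
	Description string
	Window      time.Duration // "active" means activity within this window
	Predicate   string        // filter applied consistently to every query
}

var ActiveCustomer = MetricDefinition{
	Name:        "active_customer",
	Description: "Customer with at least one completed purchase in the window",
	Window:      30 * 24 * time.Hour,
	Predicate:   "purchases.completed_at >= now() - interval '30 days'",
}

func main() {
	fmt.Printf("%s: %s (window %v)\n",
		ActiveCustomer.Name, ActiveCustomer.Description, ActiveCustomer.Window)
}
```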
2. Answer quality is a context problem
It’s tempting to assume that better models will fix these issues. In reality, answer quality depends far more on the structure of the underlying context than on the model itself.
When metrics are defined in a structured and consistent way, queries become repeatable. The same question leads to the same result, grounded in the same logic.
Without that structure, each answer becomes a new interpretation. That’s why systems that perform well in controlled benchmarks can fail in production, where the same questions are asked repeatedly, by different people, in slightly different ways.
3. Two sources of noise
Agentic analytics sits at the intersection of two unavoidable sources of noise: the inherent ambiguity of natural-language questions, and the ambiguity of the business definitions those questions are answered against.
These two types of noise amplify each other. As both the questions and the definitions become more complex, the system becomes increasingly unstable – unless there is a clear, structured layer underneath.
4. Maintainability is the real bottleneck
This is where most DIY projects start to break down.
Even if the system works initially, it raises a series of difficult-to-answer questions as time progresses.
What happens when a metric definition changes? How do you correct inaccurate answers? How does the system evolve as new data sources or teams are added? And what happens when the person who built the original setup is no longer involved?
Beyond that, there are operational concerns: tracking usage and adoption, controlling cost per query, and monitoring answer quality over time.
At this point, the scope has expanded well beyond a simple agent. What started as a quick experiment has grown into a broader system encompassing a semantic layer, integrations, monitoring, and internal workflows. In practice, you are no longer building a tool; you are maintaining an internal product.
What begins as a lightweight prototype quickly turns into a multi-surface system. It needs to integrate into communication tools like Slack or Teams, provide usable interfaces, expose APIs or MCP servers, manage permissions, and offer visibility into performance and usage.
Each of these components introduces its own complexity. And most teams don’t account for this upfront, discovering it gradually, once the system is already in use.
This pattern shows up again and again: DIY approaches can work, but they rarely scale without significant ongoing investment.
When evaluating whether to build or buy, the decision usually comes down to a few core considerations: Who will own and maintain the system over time? What will it cost to operate, not just to build? And how will answer quality, adoption, and cost per query be tracked?
These questions matter far more than whether a prototype works on day one.
There are cases where building your own solution makes sense. Typically, these are large organizations with dedicated platform teams, the resources to maintain a semantic layer as a product, and a long-term commitment to developing internal tooling.
For most companies, however, that level of investment isn’t realistic.
In the end, the dilemma is simpler than it appears:
Do you want to build and maintain an internal product, or use an existing one?
And if you choose to buy, the more important question becomes whether the platform you choose will remain flexible, transparent, and maintainable over time.
Databao is built around this exact problem: making agentic analytics reliable without requiring teams to build and maintain the entire system themselves.
It focuses on generating a structured semantic layer automatically, keeping it aligned with real usage, integrating directly into existing workflows, and ensuring consistency over time.
If you’re evaluating how to bring AI into your analytics stack, we’d be happy to explore the process with you and provide a proof of concept for your individual use case.
We’re back with a fresh PowerToys release! This month introduces Power Display for controlling your monitors from the system tray, Grab And Move for quickly moving and resizing windows, and a wave of improvements to Command Palette and the Dock, along with updates across the utility suite. You can grab the update by checking for updates in PowerToys or by heading to the release page. Let’s dive in!
Introducing Grab And Move – drag and resize windows from anywhere (Preview)

This release introduces Grab And Move, a new utility that lets you drag and resize windows without having to target the title bar or window edges. Hold Alt + Left Click anywhere on a window to drag it, or Alt + Right Click to resize it from wherever your cursor is. For users who already use Alt as a system modifier, you can now choose to use the Win key instead.
Grab And Move is ideal for large monitors or windows that have moved off-screen, and it integrates with the existing Settings experience including GPO policy support, an OOBE page, and a modifier-agnostic configuration UI.
Meet Power Display: control your monitors right from the system tray (Preview)

Meet Power Display, a new utility that lets you control your hardware monitors right from the system tray. Once enabled, you can open the flyout from the tray icon or a configurable shortcut to quickly access your connected monitors. Power Display automatically detects your displays and, if supported, lets you adjust settings like volume, brightness, contrast, and color profile. No more reaching for those hard-to-find buttons on the back of your screen!
You can also create profiles to quickly switch between different setups with a single click. Profiles can be configured in Settings and will appear directly in the flyout for easy access.
Lastly, Power Display profiles can now be automatically switched with Light Switch. In the Light Switch settings, you can select a profile as an action, making it easy to adjust your monitor settings based on the current light or dark theme.
Command Palette: Compact Dock, Calculator history, and reliability

This release brings a large set of fixes and improvements to Command Palette and the Dock. Alongside a wide range of performance and stability improvements, this release also introduces new capabilities, including support for plain text and image viewer content types for extensions, making it possible to display raw text and zoomable images directly in the content pane. There is also a persistent calculator history with options to save, reuse, delete, and clear entries, plus a configurable primary action and the ability to replace the query on enter.
We’ve also made several improvements to the Dock experience. You can now choose to keep the Dock always on top of other windows. When the Dock is positioned at the top or bottom of the screen, a new Compact mode is available, offering a more condensed layout that hides the subtitle!
Pinning has also been improved. When you pin a command from Command Palette, a new dialog lets you choose where it appears in the Dock and whether to show or hide the title and subtitle.
This release also fixes two separate typing-crash scenarios, hardens extension loading so one faulty extension no longer takes down the whole list, improves indexer search with filename broadening and Windows Search availability indicators, and adds Windows Terminal profile pinning with per-profile icons.
Massive thanks to @jiripolasek for the sustained Command Palette work across this release!
Keyboard Manager improvements

In the last release, we introduced a new Keyboard Manager Editor that makes it easier to create and manage remappings. In this release, we are refining that experience further. You can now manually tweak recorded keys. After recording a remapping, each key becomes a dropdown, allowing you to adjust it or select keys that may not exist on your physical keyboard.
We also added a new action called Disabled, which lets you quickly disable specific keys or shortcuts.
We also fixed an important issue with multi-line text replacement, significantly improving reliability in chat apps and plain text editors.
ZoomIt gets scrolling screenshots

ZoomIt also brings several enhancements to productivity and capture workflows. You can now take scrolling screenshots, making it easier to capture long pages or content that extends beyond the visible screen. We’ve also added text extraction directly when snipping, so you can quickly grab and reuse text without extra steps. In addition, the break timer has been improved with a new screen saver mode, helping you step away and take breaks more effectively.
Other notable changes

For the full list of changes and fixes, check out the complete release notes on GitHub.
Big thanks to the community

As always, a big thank-you to everyone who contributed — we couldn’t do this release without you! Thanks @jiripolasek, @adelobosko, @oxygen-dioxide, @foxmsft, @RinZ27, @Salehnaz, @squirrelslair, @PesBandi, @daverayment, @raycheung, @jsoref, @Gijsreyn, @Jay-o-Way, and @markrussinovich for your pull requests!
We’re always happy to get your feedback and contributions – whether it’s a bug report, a feature idea, or a pull request. Head over to the PowerToys repo to jump in.
The full release notes can be found here.