
Why AI coding agents aren’t production-ready: Brittle context windows, broken refactors, missing operational awareness


Remember this Quora comment (which also became a meme)?

(Source: Quora)

In the pre-large language model (LLM) Stack Overflow era, the challenge was discerning which code snippets to adopt and adapt effectively. Now, while generating code has become trivially easy, the more profound challenge lies in reliably identifying and integrating high-quality, enterprise-grade code into production environments.

This article will examine the practical pitfalls and limitations observed when engineers use modern coding agents for real enterprise work, addressing the more complex issues around integration, scalability, accessibility, evolving security practices, data privacy and maintainability in live operational settings. We hope to balance out the hype and provide a more technically grounded view of the capabilities of AI coding agents.

Limited domain understanding and service limits

AI agents struggle significantly with designing scalable systems due to the sheer explosion of choices and a critical lack of enterprise-specific context. To describe the problem in broad strokes, large enterprise codebases and monorepos are often too vast for agents to directly learn from, and crucial knowledge is frequently fragmented across internal documentation and individual expertise.

More specifically, many popular coding agents encounter service limits that hinder their effectiveness in large-scale environments. Indexing features may fail or degrade in quality for repositories exceeding 2,500 files, or due to memory constraints. Furthermore, files larger than 500 KB are often excluded from indexing/search, which impacts established products with decades-old, larger code files (although newer projects may admittedly face this less frequently).
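To gauge where a given codebase stands against such limits, a quick audit script can help before pointing an agent at a repository. Below is a minimal sketch in Python (the thresholds mirror the figures cited above and are illustrative, not any vendor's official limits):

    import os

    MAX_FILES = 2500               # indexing may degrade beyond this (illustrative)
    MAX_FILE_BYTES = 500 * 1024    # files above ~500 KB are often skipped by indexers

    def audit_repo(root="."):
        total, oversized = 0, []
        for dirpath, dirnames, filenames in os.walk(root):
            # Skip folders that indexers typically ignore anyway.
            dirnames[:] = [d for d in dirnames if d not in {".git", "node_modules"}]
            for name in filenames:
                total += 1
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) > MAX_FILE_BYTES:
                    oversized.append(path)
        print(f"{total} files (agent limit ~{MAX_FILES}); {len(oversized)} exceed 500 KB")
        return total, oversized

    audit_repo()

Running this at the repo root gives an early signal of whether an agent's index is likely to silently omit files.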

For complex tasks involving extensive file contexts or refactoring, developers are expected to provide the relevant files while also explicitly defining the refactoring procedure and the surrounding build/command sequences used to validate the implementation without introducing feature regressions.

Lack of hardware context and usage

AI agents have demonstrated a critical lack of awareness regarding the host OS, command line and environment installations (conda/venv). This deficiency can lead to frustrating experiences, such as the agent attempting to execute Linux commands in PowerShell, consistently resulting in ‘unrecognized command’ errors. Furthermore, agents frequently exhibit inconsistent ‘wait tolerance’ when reading command outputs, prematurely declaring an inability to read results (and moving ahead to either retry or skip) before a command has even finished, especially on slower machines.
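One practical mitigation is to hand the agent an explicit snapshot of the environment rather than letting it guess. A minimal sketch (the helper name is ours; only the Python standard library is used):

    import os
    import platform
    import sys

    def environment_report() -> str:
        """Summarize OS, shell and Python environment for an agent prompt."""
        in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
        return "\n".join([
            f"OS: {platform.system()} {platform.release()}",
            f"Shell hint: {os.environ.get('COMSPEC') or os.environ.get('SHELL', 'unknown')}",
            f"Python: {platform.python_version()} at {sys.executable}",
            f"Virtualenv active: {in_venv}",
            f"Conda env: {os.environ.get('CONDA_DEFAULT_ENV', 'none')}",
        ])

    print(environment_report())

Pasting this report at the top of a session removes the most common source of wrong-shell commands.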

This isn't merely about nitpicking features; rather, the devil is in these practical details. These experience gaps manifest as real points of friction and necessitate constant human vigilance to monitor the agent’s activity in real time. Otherwise, the agent might ignore initial tool-call information and either stop prematurely or proceed with a half-baked solution that requires undoing some or all changes, re-triggering prompts and wasting tokens. Submitting a prompt on a Friday evening and expecting the code updates to be done by Monday morning is simply not guaranteed.

Hallucinations over repeated actions

Working with AI coding agents often presents the longstanding challenge of hallucinations: incorrect or incomplete pieces of information (such as small code snippets) within a larger set of changes, expected to be fixed by a developer with trivial-to-low effort. What becomes particularly problematic, however, is when the incorrect behavior is repeated within a single thread, forcing users to either start a new thread and re-provide all context, or intervene manually to “unblock” the agent.

For instance, during a Python Azure Functions setup, an agent tasked with implementing complex production-readiness changes encountered a file (see below) containing special characters (parentheses, period, star). These characters are commonly used in software to denote version ranges.

(Image created manually with boilerplate code. Source: Microsoft Learn and Editing Application Host File (host.json) in Azure Portal)

The agent incorrectly flagged this as an unsafe or harmful value, halting the entire generation process. This misidentification as an adversarial attack recurred four to five times despite various prompts attempting to restart or continue the modification. The version format is, in fact, boilerplate, present in a Python HTTP-trigger code template. The only successful workaround was to instruct the agent not to read the file, ask it to simply provide the desired configuration, assure it that the developer would manually add it to that file, confirm, and then ask it to continue with the remaining code changes.
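For context, the flagged value resembles the extension bundle version range in a default Azure Functions host.json, such as "[4.*, 5.0.0)" — interval notation, not an injection payload. The sketch below contrasts a naive character-based safety heuristic (which rejects it) with a pattern-aware check (which recognizes it); both heuristics are illustrative, not the agent's actual filter:

    import re

    BUNDLE_VERSION = "[4.*, 5.0.0)"   # interval notation used by extension bundles

    def naive_is_suspicious(value: str) -> bool:
        # Overly broad: flags any 'special' character outright.
        return any(ch in value for ch in "()[]*")

    # Pattern-aware: matches an open/closed version interval like "[4.*, 5.0.0)".
    INTERVAL = re.compile(r"^[\[\(][\d.*]+\s*,\s*[\d.*]+[\]\)]$")

    def is_version_range(value: str) -> bool:
        return INTERVAL.fullmatch(value) is not None

    print(naive_is_suspicious(BUNDLE_VERSION))   # True  -> false positive
    print(is_version_range(BUNDLE_VERSION))      # True  -> recognized as boilerplate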

The inability to exit a repeatedly faulty agent output loop within the same thread highlights a practical limitation that significantly wastes development time. In essence, developers now tend to spend their time debugging and refining AI-generated code rather than Stack Overflow snippets or their own.

Lack of enterprise-grade coding practices

Security best practices: Coding agents often default to less secure authentication methods like key-based authentication (client secrets) rather than modern identity-based solutions (such as Entra ID or federated credentials). This oversight can introduce significant vulnerabilities and increase maintenance overhead, as key management and rotation are complex tasks increasingly restricted in enterprise environments.
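As an illustration of the gap, here is a hedged Python sketch using the azure-identity and azure-storage-blob packages (the account URL is a placeholder): the client is constructed with DefaultAzureCredential, which resolves managed identity, workload identity or a developer sign-in at runtime, so no secret ever lands in code or config:

    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    # Identity-based: credential resolution (managed identity, CLI login,
    # federated/workload identity) happens at runtime -- nothing to rotate.
    client = BlobServiceClient(
        account_url="https://<your-account>.blob.core.windows.net",  # placeholder
        credential=DefaultAzureCredential(),
    )

    # The key-based alternative agents often emit (avoid where possible):
    # client = BlobServiceClient.from_connection_string("AccountName=...;AccountKey=...")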

Outdated SDKs and reinventing the wheel: Agents may not consistently leverage the latest SDK methods, instead generating more verbose and harder-to-maintain implementations. Piggybacking on the Azure Function example, agents have produced code using the pre-existing v1 SDK for read/write operations, rather than the much cleaner and more maintainable v2 SDK code. Developers must research the latest best practices online to build a mental map of dependencies and the expected implementation, ensuring long-term maintainability and reducing upcoming tech-migration efforts.
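For reference, the v2 Python programming model declares triggers and bindings with decorators on a FunctionApp object, replacing the v1 pattern of a main() function plus a function.json per function. A minimal HTTP trigger looks roughly like this:

    import azure.functions as func

    app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

    @app.route(route="hello")
    def hello(req: func.HttpRequest) -> func.HttpResponse:
        # v2 model: routing and bindings live in code via decorators,
        # so there is no separate function.json to keep in sync.
        name = req.params.get("name", "world")
        return func.HttpResponse(f"Hello, {name}!")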

Limited intent recognition and repetitive code: Even for smaller-scoped, modular tasks (which are typically encouraged to minimize hallucinations or debugging downtime) like extending an existing function definition, agents may follow the instruction literally and produce logic that turns out to be near-repetitive, without anticipating the upcoming or unarticulated needs of the developer. That is, in these modular tasks the agent may not automatically identify and refactor similar logic into shared functions or improve class definitions, leading to tech debt and harder-to-manage codebases, especially with vibe coding or lazy developers.
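A contrived sketch of the pattern: asked to 'add an email validator like the existing username validator', a literal-minded agent tends to produce a near-copy instead of extracting the shared logic (the function names here are hypothetical):

    # What a literal-minded agent tends to produce: near-duplicate validators.
    def validate_username(value: str) -> str:
        if not value or len(value) > 64:
            raise ValueError("invalid username")
        return value.strip().lower()

    def validate_email(value: str) -> str:
        if not value or len(value) > 64:
            raise ValueError("invalid email")
        return value.strip().lower()

    # What a maintainer would want: the shared rule factored out once.
    def _validate(value: str, field: str, max_len: int = 64) -> str:
        if not value or len(value) > max_len:
            raise ValueError(f"invalid {field}")
        return value.strip().lower()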

Simply put, those viral YouTube reels showcasing rapid zero-to-one app development from a single-sentence prompt simply fail to capture the nuanced challenges of production-grade software, where security, scalability, maintainability and future-resistant design architectures are paramount.

Confirmation bias alignment

Confirmation bias is a significant concern, as LLMs frequently affirm user premises even when the user expresses doubt and asks the agent to refine their understanding or suggest alternate ideas. This tendency, where models align with what they perceive the user wants to hear, leads to reduced overall output quality, especially for more objective/technical tasks like coding.

There is ample literature to suggest that if a model begins by outputting a claim like “You are absolutely right!”, the rest of the output tokens tend to justify this claim.

Constant need to babysit

Despite the allure of autonomous coding, the reality of AI agents in enterprise development often demands constant human vigilance. Instances like an agent attempting to execute Linux commands in PowerShell, raising false-positive safety flags or introducing inaccuracies for domain-specific reasons highlight critical gaps; developers simply cannot step away. Rather, they must constantly monitor the reasoning process and understand multi-file code additions to avoid wasting time on subpar responses.

The worst possible experience with agents is a developer accepting multi-file code updates riddled with bugs, then sinking time into debugging because of how ‘beautiful’ the code seemingly looks. This can even give rise to the sunk cost fallacy of hoping the code will work after just a few fixes, especially when the updates span multiple files in a complex or unfamiliar codebase with connections to multiple independent services.

It's akin to collaborating with a 10-year-old prodigy who has memorized ample knowledge and even addresses every piece of user intent, but prioritizes showing off that knowledge over solving the actual problem, and lacks the foresight required for success in real-world use cases.

This "babysitting" requirement, coupled with the frustrating recurrence of hallucinations, means that time spent debugging AI-generated code can eclipse the time savings anticipated with agent usage. Needless to say, developers in large companies need to be very intentional and strategic in navigating modern agentic tools and use-cases.

Conclusion

There is no doubt that AI coding agents have been nothing short of revolutionary, accelerating prototyping, automating boilerplate coding and transforming how developers build. The real challenge now isn’t generating code, it’s knowing what to ship, how to secure it and where to scale it. Smart teams are learning to filter the hype, use agents strategically and double down on engineering judgment.

As GitHub CEO Thomas Dohmke recently observed: The most advanced developers have “moved from writing code to architecting and verifying the implementation work that is carried out by AI agents.” In the agentic era, success belongs not to those who can prompt code, but those who can engineer systems that last.

Rahul Raja is a staff software engineer at LinkedIn.

Advitya Gemawat is a machine learning (ML) engineer at Microsoft.

Editor's note: The opinions expressed in this article are the authors' personal opinions and do not reflect the opinions of their employers.




Windows Authentication for Cloud-Native Identities: Modernizing Azure SQL Managed Instance (Preview)


Organizations moving to the cloud often face a critical challenge: maintaining seamless authentication for legacy applications without compromising security or user experience. Today, we’re excited to announce support for Windows Authentication for Microsoft Entra principals on Azure SQL Managed Instance, enabling cloud-native identities to authenticate using familiar Windows credentials.

Why This Matters

Traditionally, Windows Authentication relied on on-premises Active Directory, making it difficult for businesses adopting a cloud-only strategy to preserve existing authentication models. With this new capability:

  • Hybrid Identity Support: Users synchronized between on-premises AD DS and Microsoft Entra ID can continue using a single set of credentials for both environments.
  • Cloud-Only Identity (Preview): Identities that exist only in Microsoft Entra ID can now leverage Kerberos-based Windows Authentication for workloads like Azure SQL Managed Instance—without requiring domain controllers.

This means organizations can modernize infrastructure while maintaining compatibility with legacy apps, reducing friction during migration.
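The application-side change is typically minimal. As a rough sketch using pyodbc on an Entra-joined client with the ODBC Driver 18 for SQL Server (server and database names are placeholders): the connection string requests integrated Windows Authentication instead of embedding a password:

    import pyodbc

    # Trusted_Connection=yes requests integrated Windows Authentication; with
    # Microsoft Entra Kerberos, the ticket comes from the cloud KDC rather
    # than an on-premises domain controller.
    conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=<your-managed-instance>.database.windows.net;"  # placeholder
        "Database=<your-database>;"                             # placeholder
        "Trusted_Connection=yes;"
        "Encrypt=yes;"
    )
    print(conn.execute("SELECT SUSER_SNAME()").fetchone()[0])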

Key Benefits

  • Seamless Migration: Move legacy applications to Azure SQL Managed Instance without rewriting authentication logic.
  • Passwordless Security: Combine Windows Authentication with modern credentials like Windows Hello for Business or FIDO2 keys, enabling MFA and reducing password-related risks.
  • Cloud-Native Integration: Microsoft Entra Kerberos acts as a cloud-based Key Distribution Center (KDC), issuing Kerberos tickets for cloud resources such as Azure SQL Managed Instance and Azure Files.

Breaking Barriers to Cloud Migration

Many enterprises hesitate to migrate legacy apps because they depend on Windows Authentication. By extending this capability to cloud-native identities, we remove a major barrier—allowing customers to modernize at their own pace while leveraging familiar authentication models.



Contributor is Not the Magic Wand You May Think it is!

The code referenced in this blog can be found here! There are still moments when Azure catches me by surprise. Back when I first moved away from traditional on-prem environments, I was convinced the Contributor role was almost identical to Owner, just without the ability to manage access or view billing. Simple, right? Turns out that was completely wrong, and what I've learned since then keeps saving me headaches every time I spin up a new environment. My goal of this post is that it...


Create Your Own SMTP Server Using Aspire 13

Building your own email server is a rite of passage for many developers. It’s also a fantastic way to learn exactly how internet email works. In this post, we’re going to cover the code contained in my BlazorSMTPServer repository. We will walk through how email actually works, why sending it is so much harder than receiving it, and how to build a casual solution using Blazor and Azure. Important Note: This project is for educational purposes and personal experimentation. This is n

Immutable Collection Add() Trap: Don’t Get Burned — Use a Builder Instead

When using immutable collections in .NET, remember that the Add() method creates a new instance rather than mutating the original. Ignoring the return value leads to ineffective loops. Instead, use a builder for efficiency, as it offers faster performance and lower memory allocation while still ensuring immutability in the final result.
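The linked post covers the .NET specifics (System.Collections.Immutable and its builders); the same trap can be shown in Python terms with frozenset, purely as an analogue rather than the post's C# code:

    # Ineffective: frozenset is immutable, so union() returns a NEW set;
    # discarding the return value means fs never changes.
    fs = frozenset()
    for item in ["a", "b", "c"]:
        fs.union({item})        # result ignored -- fs is still empty!

    # Builder-style: accumulate in a mutable set, freeze once at the end.
    builder = set()
    for item in ["a", "b", "c"]:
        builder.add(item)
    fs = frozenset(builder)     # one immutable result, no per-iteration copies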






Which AI Agent Framework Should I Use

A complete guide to choosing the best AI agent framework for your project. Learn the differences between LangChain, LangGraph, AutoGen, Semantic Kernel, and CrewAI, and discover which one fits your use case, technical requirements, and scalability goals.