There’s a growing body of research around AI coding assistants with a confusing range of conflicting results. This is to be expected when the landscape is constantly shifting from coding suggestions to agent-based workflows to Ralph Wiggum loops and beyond.
The Reichenbach Falls in Switzerland has a drop of 250 metres and a flow rate of 180-300 cubic metres per minute (enough to fill about 1,500 bathtubs). This is comparable to the rate of change in tools and techniques around coding assistants over the past year, so few of us are using these tools in the same way. You can’t establish best practices under these conditions; only practical point-in-time techniques.
As an industry, we, like Sherlock Holmes and James Moriarty, are battling on the precipice of this torrent, and the survival of high-quality software and sustainable delivery is at stake.
Given the rapid evolution of tools and techniques, I hesitate to cite studies from 2025, let alone 2023. Yet these are the most-cited studies on the effectiveness of coding assistants, and they present conflicting findings. One study reports developers completed tasks 56% faster, while another reports a 19% slowdown.
The studies provide a platform for thinking critically about AI in software development, enabling more constructive discussions, even as we fumble our collective way toward understanding how to use it meaningfully.
The GitHub self-assessment
The often-cited 56% speedup stems from a 2023 collaboration among Microsoft Research, GitHub, and MIT. The number emerged from a lab test in which developers were given a set of instructions and a test suite to see how quickly and successfully they could create an HTTP server in JavaScript.
In this test, the AI-assisted group completed the task in 71 minutes on average, compared to 161 minutes for the control group, which is where the 55.8% figure comes from. Much of the difference came from how quickly novice developers completed the task; task success was comparable between the two groups.
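If you want to check the arithmetic, here is a minimal sketch using the rounded minutes quoted above (rounding shifts the result slightly from the study’s exact 55.8%):

```python
# Quick check of the reported speedup using the rounded task times quoted above.
# Rounding the minutes shifts the result slightly from the study's exact 55.8%.
ai_minutes = 71
control_minutes = 161

time_saved = (control_minutes - ai_minutes) / control_minutes
print(f"Time saved with AI assistance: {time_saved:.1%}")  # ~55.9% with rounded inputs
```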
There are weaknesses in this approach. The tool vendor was involved in defining the task against which the tool would be measured. If I were sitting an exam, it would be to my advantage to set the questions. Despite this, we can generously accept that it made the coding task faster, and that the automated tests sufficiently defined task success.
We might also be generous in stating that tools have improved over the past three years. Benchmarking reports like those from METR indicate that the length of tasks AI can complete has been doubling roughly every seven months; other improvements are likely.
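As a rough illustration of what a seven-month doubling time compounds to (the baseline below is a placeholder, not a figure from METR), the growth is exponential:

```python
# Illustrative only: the baseline task length is a placeholder, not METR data.
# The point is how a seven-month doubling time compounds over a couple of years.
baseline_minutes = 60          # hypothetical task length an agent can handle today
doubling_period_months = 7     # the doubling time reported by METR

def projected_task_length(months_from_now: float) -> float:
    """Task length the trend implies after the given number of months."""
    return baseline_minutes * 2 ** (months_from_now / doubling_period_months)

for months in (7, 14, 21, 28):
    print(f"After {months:2d} months: ~{projected_task_length(months):.0f} minutes")
```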
We’ve also observed the emergence of techniques that introduce work plans and task chunking, thereby improving the agent’s ability to perform larger tasks that would otherwise incur context decay.
METR is also the source of our cautionary counter-finding regarding task speed.
The METR sense check
The METR study in 2025 examined the impact of contemporary tools on task completion times in real-world open-source projects. The research is based on 246 tasks performed by 16 developers who had experience using AI tools. Each task was randomly assigned to either allow or disallow AI assistance, and screen recordings were captured to verify task completion and categorize how the time was spent.
The research found that tasks were slowed by 19%, which appears to contradict the earlier report. In reality, AI tools did reduce active coding time, along with time spent searching for answers, testing, and debugging. What the METR report identified, however, was that the tools introduced new categories of work, such as reviewing AI output, prompting, and waiting for responses. These new tasks, along with increased idle and overhead time, consumed the gains and pushed overall task completion times into the red.
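To make that mechanism concrete, here is an entirely illustrative time budget (made-up numbers, not METR’s measurements): even when coding and debugging get faster, the new categories can push the total past the unassisted baseline.

```python
# Made-up numbers purely to illustrate the shape of the METR finding:
# some categories shrink, but new AI-specific categories appear alongside them.
minutes_without_ai = {"active coding": 60, "search/test/debug": 30, "idle": 10}
minutes_with_ai = {
    "active coding": 40,        # faster, as both studies agree
    "search/test/debug": 20,    # also faster
    "idle/overhead": 15,
    "prompting": 15,            # new category
    "waiting on the AI": 15,    # new category
    "reviewing AI output": 20,  # new category
}

print("Without AI:", sum(minutes_without_ai.values()), "minutes")
print("With AI:   ", sum(minutes_with_ai.values()), "minutes")
```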
One finding from the METR study worth noting is the perception problem. Developers predicted that AI assistants would speed them up, and after completing the tasks they still estimated they had saved time, even though they were 19% slower. This highlights how unreliable our perceptions of productivity are, just as they were when we believed multitasking made us more productive.
Lack of consensus
A recently released study from Multitudes, based on data collected over 10 months in 2025, highlights the lack of consensus around the productivity benefits of AI coding tools. They found that the number of code changes increased, but this was countered by an increase in out-of-hours commits.
This appears to be a classic case of increasing throughput at the expense of stability, with out-of-hours commits representing failure demand rather than feature development. It also clouds the picture, as developers who work more hours tend to make more commits, even without an AI assistant.
Some of the blame was attributed to adoption patterns that left little time for learning while increasing delivery pressure on teams, even though they now had tools that were supposed to help them.
The wicked talent problem
One finding that repeatedly comes up in the research is that AI coding assistants benefit novice developers more than those with deep experience. This makes it likely that using these tools will exacerbate a wicked talent problem. Novice developers may never shed their reliance on tools, as they become accustomed to working at a higher level of abstraction.
This is excellent news for those selling AI coding tools, as an ever-expanding market of developers who can’t deliver without the tools will be a fruitful source of future income. When investors are ready to recoup, organizations will have little choice but to accept whatever pricing structure is required to make vendors profitable. Given the level of investment, this may be a difficult price to accept.
The problem may deepen as organizations have stopped hiring junior developers, believing that senior developers can delegate junior-level tasks to AI tools. This doesn’t align with the research, which shows junior developers speed up the most when using AI.
The AI Pulse Report compares this to the aftermath of the dot-com bubble, when junior hiring was frozen, resulting in a shortage of skilled developers. When hiring picked up again, increased competition for talent led to higher salaries.
Continuous means safe, quick, and sustainable
While many practitioners recognize the relevance of value stream management and the theory of constraints to AI adoption, a counter-movement is emerging that calls for the complete removal of downstream roadblocks.
“If you can’t complete code reviews at the speed at which they are created with AI, you should stop doing them. Every other quality of a system should be subverted to straight-line speed. Why waste time in discovery when it would starve the code-generating machine? Instead, we should build as much as we can as fast as we can.”
As a continuous delivery practitioner and a long-time follower of the DORA research program, I can’t make sense of this. One of the most powerful findings in the DORA research is that a user-centric approach beats straight-line speed in terms of product performance. You can slow development to a trickle once you’ve worked out your discovery process, because you don’t need many rounds of chaotic or random experiments when you have a deep understanding of the user and the problem they want solved.
We have high confidence that continuous delivery practices improve the success of AI adoption. You shouldn’t rush to dial up coding speed until you’ve put those practices in place, and you shouldn’t remove practices in the name of speed. That means working in small batches, integrating changes into the main branch every few hours, keeping your code deployable at all times, and automating builds, code analysis, tests, and deployments to smooth the flow of change.
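As a sketch of what that automation amounts to (the stage names and commands here are illustrative placeholders, not any particular team’s setup), a deployment pipeline runs every gate on every change and stops the line at the first failure rather than skipping checks for speed:

```python
# Illustrative deployment pipeline sketch: stage names and commands are
# placeholders. Every change runs through every gate; a failure stops the line.
import subprocess

STAGES = [
    ("build", ["make", "build"]),
    ("code analysis", ["make", "lint"]),
    ("automated tests", ["make", "test"]),
    ("deploy", ["make", "deploy"]),
]

def run_pipeline() -> bool:
    for name, command in STAGES:
        if subprocess.run(command).returncode != 0:
            print(f"Stopping the line at: {name}")
            return False
    print("Change delivered and still deployable")
    return True

if __name__ == "__main__":
    run_pipeline()
```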
Continuous delivery is about getting all types of changes to users safely, quickly, and sustainably. The calls to remove stages from the deployment pipeline to expedite delivery compromise the safety and sustainability of software delivery, permanently degrading the software’s value for a temporary gain.
It’s a system
There’s so much to unpack in the research, and many studies focus on a single link in a much longer chain. Flowing value from end to end safely, quickly, and sustainably should be the goal, rather than merely maximizing straight-line speed or optimizing individual tasks, especially when those tasks aren’t the constraint.
With the knowledge we’ve built over the last seven decades, we should be moving into a new era of professionalism in software engineering. Instead, we’re being distracted by speed above all other factors. When my local coffee shop did this, complete with a clipboard-wielding Taylorist assessor tasked with bringing order-to-delivery times down to 30 seconds, the delivery of fast, bad coffee convinced me to find a new place to get coffee. Is this what we want from our software?
The results across multiple studies show that claims of a revolution are premature, unless it’s an overlord revolution that will depress the salaries of those pesky software engineers and produce a group of builders who can’t deliver software without these new tools. Instead, we should examine the landscape and learn from research and from one another as we work out how to use LLM-based tools effectively in our complex socio-technical environments.
We are at a crossroads: either professionalize our work or adopt a prompt-and-fix model that resembles the earliest attempts to build software. There are infinite futures ahead of us. I don’t dread the AI-assisted future as a developer, but as a software user. I can’t tolerate the quality and usability chasm that will result from removing continuous delivery practices in the name of speed.