Why the Moltbook frenzy was like Pokémon

This story originally appeared in The Algorithm, our weekly newsletter on AI.

Lots of influential people in tech last week were describing Moltbook, an online hangout populated by AI agents interacting with one another, as a glimpse into the future. It appeared to show AI systems doing useful things for the humans that created them (one person used the platform to help him negotiate a deal on a new car). Sure, it was flooded with crypto scams, and many of the posts were actually written by people, but something about it pointed to a future of helpful AI, right?

The whole experiment reminded our senior editor for AI, Will Douglas Heaven, of something far less interesting: Pokémon.

Back in 2014, someone set up a game of Pokémon in which the main character could be controlled by anyone on the internet via the streaming platform Twitch. Playing was as clunky as it sounds, but it was incredibly popular: at one point, a million people were playing the game at the same time.

“It was yet another weird online social experiment that got picked up by the mainstream media: What did this mean for the future?” Will says. “Not a lot, it turned out.”

The frenzy about Moltbook struck a similar tone to Will, and it turned out that one of the sources he spoke to had been thinking about Pokémon too. Jason Schloetzer, at the Georgetown Psaros Center for Financial Markets and Policy, saw the whole thing as a sort of Pokémon battle for AI enthusiasts, in which they created AI agents and deployed them to interact with other agents. In this light, the news that many AI agents were actually being instructed by people to say certain things that made them sound sentient or intelligent makes a whole lot more sense. 

“It’s basically a spectator sport,” he told Will, “but for language models.”

Will wrote an excellent piece about why Moltbook was not the glimpse into the future that it was said to be. Even if you are excited about a future of agentic AI, he points out, there are some key pieces that Moltbook made clear are still missing. It was a forum of chaos, but a genuinely helpful hive mind would require more coordination, shared objectives, and shared memory.

“More than anything else, I think Moltbook was the internet having fun,” Will says. “The biggest question it leaves me with now is: How far will people push AI just for the laughs?”


AI Doesn’t Reduce Work—It Intensifies It

Aruna Ranganathan and Xingqi Maggie Ye from Berkeley Haas School of Business report initial findings in Harvard Business Review (HBR) from their April to December 2025 study of 200 employees at a "U.S.-based technology company".

This captures an effect I've been observing in my own work with LLMs: the productivity boost these things can provide is exhausting.

AI introduced a new rhythm in which workers managed several active threads at once: manually writing code while AI generated an alternative version, running multiple agents in parallel, or reviving long-deferred tasks because AI could “handle them” in the background. They did this, in part, because they felt they had a “partner” that could help them move through their workload.

While this sense of having a “partner” enabled a feeling of momentum, the reality was a continual switching of attention, frequent checking of AI outputs, and a growing number of open tasks. This created cognitive load and a sense of always juggling, even as the work felt productive.

I'm frequently finding myself with work on two or three projects running parallel. I can get so much done, but after just an hour or two my mental energy for the day feels almost entirely depleted.

I've had conversations with people recently who are losing sleep because they're finding building yet another feature with "just one more prompt" irresistible.

The HBR piece calls for organizations to build an "AI practice" that structures how AI is used to help avoid burnout and counter effects that "make it harder for organizations to distinguish genuine productivity gains from unsustainable intensity".

I think we've just disrupted decades of existing intuition about sustainable working practices. It's going to take a while and some discipline to find a good new balance.

Via Hacker News

Tags: careers, ai, generative-ai, llms, ai-assisted-programming, ai-ethics


Testing ads in ChatGPT

OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control.

Transformers.js v4 Preview: Now Available on NPM!


Designing Effective Multi-Agent Architectures


Papers on agentic and multi-agent systems (MAS) skyrocketed from 820 in 2024 to over 2,500 in 2025. This surge suggests that MAS are now a primary focus for the world’s top research labs and universities. Yet there is a disconnect: While research is booming, these systems still frequently fail when they hit production. Most teams instinctively try to fix these failures with better prompts. I use the term prompting fallacy to describe the belief that model and prompt tweaks alone can fix systemic coordination failures. You can’t prompt your way out of a system-level failure. If your agents are consistently underperforming, the issue likely isn’t the wording of the instruction; it’s the architecture of the collaboration.

Beyond the Prompting Fallacy: Common Collaboration Patterns

Some coordination patterns stabilize systems. Others amplify failure. There is no universal best pattern, only patterns that fit the task and the way information needs to flow. The following provides a quick orientation to common collaboration patterns and when they tend to work well.

Supervisor-based architecture

A linear, supervisor-based architecture is the most common starting point. One central agent plans, delegates work, and decides when the task is done. This setup can be effective for tightly scoped, sequential reasoning problems, such as financial analysis, compliance checks, or step-by-step decision pipelines. The strength of this pattern is control. The weakness is that every decision becomes a bottleneck. As soon as tasks become exploratory or creative, that same supervisor often becomes the point of failure. Latency increases. Context windows fill up. The system starts to overthink simple decisions because everything must pass through a single cognitive bottleneck.
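
To make the shape of this pattern concrete, here is a minimal, framework-agnostic sketch in Python. The call_model() helper is a hypothetical stand-in for whatever LLM client you use, not an API from any particular library; the point is only that every plan, delegation, and stop decision flows through one agent.

    # Minimal supervisor sketch; call_model() is a hypothetical stub, not a real API.
    def call_model(role: str, prompt: str) -> str:
        return f"[{role}] response to: {prompt[:60]}"  # replace with a real LLM call

    WORKERS = ["researcher", "analyst", "writer"]

    def run_supervised(task: str, max_steps: int = 5) -> str:
        notes = []
        for _ in range(max_steps):
            # Every routing decision passes through the supervisor,
            # which is exactly why it becomes the bottleneck.
            decision = call_model(
                "supervisor",
                f"Task: {task}\nNotes: {notes}\nPick one of {WORKERS} or say DONE.",
            )
            if "DONE" in decision:
                break
            worker = next((w for w in WORKERS if w in decision), WORKERS[0])
            notes.append(call_model(worker, f"Subtask: {decision}"))
        return call_model("supervisor", f"Summarize result for: {task}\n{notes}")

    print(run_supervised("Check this quarterly filing for compliance issues"))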

Blackboard-style architecture

In creative settings, a blackboard-style architecture with shared memory often works better. Instead of routing every thought through a manager, multiple specialists contribute partial solutions into a shared workspace. Other agents critique, refine, or build on those contributions. The system improves through accumulation rather than command. This mirrors how real creative teams work: Ideas are externalized, challenged, and iterated on collectively.
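
A comparably minimal sketch of the blackboard idea, again using a hypothetical call_model() stub rather than any real framework: specialists read the shared board and append to it, so progress comes from accumulation rather than command.

    # Blackboard sketch; call_model() is a hypothetical stub for an LLM client.
    def call_model(role: str, prompt: str) -> str:
        return f"[{role}] note on: {prompt.splitlines()[-1][:50]}"

    def run_blackboard(task: str, rounds: int = 3) -> list:
        board = [f"TASK: {task}"]
        for _ in range(rounds):
            for role in ("ideator", "critic", "refiner"):
                # Each specialist sees the whole board and adds a contribution.
                board.append(call_model(role, "\n".join(board)))
        return board

    for entry in run_blackboard("Draft a product launch announcement"):
        print(entry)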

Peer-to-peer collaboration

In peer-to-peer collaboration, agents exchange information directly without a central controller. This can work well for dynamic tasks like web navigation, exploration, or multistep discovery, where the goal is to cover ground rather than converge quickly. The risk is drift. Without some form of aggregation or validation, the system can fragment or loop. In practice, this peer-to-peer style often shows up as swarms.

Swarm architecture

Swarms work well in tasks like web research because the goal is coverage, not immediate convergence. Multiple agents explore sources in parallel, follow different leads, and surface findings independently. Redundancy is not a bug here; it’s a feature. Overlap helps validate signals, while divergence helps avoid blind spots. In creative writing, swarms are also effective. One agent proposes narrative directions, another experiments with tone, a third rewrites structure, and a fourth critiques clarity. Ideas collide, merge, and evolve. The system behaves less like a pipeline and more like a writers’ room.

The key risk with swarms is that they generate volume faster than they generate decisions, which can also lead to token burn in production. Consider strict exit conditions to prevent exploding costs. Also, without a later aggregation step, swarms can drift, loop, or overwhelm downstream components. That’s why they work best when paired with a concrete consolidation phase, not as a standalone pattern.
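
As an illustration of those exit conditions and the consolidation phase, here is a hedged sketch; call_model() and count_tokens() are hypothetical stubs, and a real swarm would fan the scouts out in parallel rather than looping sequentially.

    # Swarm sketch with explicit exit conditions to cap token burn.
    def call_model(role: str, prompt: str) -> str:
        return f"[{role}] finding for: {prompt[:50]}"

    def count_tokens(text: str) -> int:
        return len(text.split())  # crude stand-in for a real tokenizer

    def swarm_research(question: str, token_budget: int = 2000, max_findings: int = 12) -> str:
        leads = [f"{question} (angle {i})" for i in range(6)]
        findings, spent = [], 0
        # Exit when the budget is spent, enough findings exist, or leads run out.
        while leads and spent < token_budget and len(findings) < max_findings:
            lead = leads.pop(0)
            result = call_model("scout", lead)
            spent += count_tokens(lead) + count_tokens(result)
            findings.append(result)
        # Consolidation phase: one agent merges, deduplicates, and decides what matters.
        return call_model("consolidator", "Merge these findings:\n" + "\n".join(findings))

    print(swarm_research("What changed in EU AI regulation this year?"))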

Considering all of this, many production systems benefit from hybrid patterns. A small number of fast specialists operate in parallel, while a slower, more deliberate agent periodically aggregates results, checks assumptions, and decides whether the system should continue or stop. This balances throughput with stability and keeps errors from compounding unchecked. This is why I teach this agents-as-teams mindset throughout AI Agents: The Definitive Guide, because most production failures are coordination problems long before they are model problems.
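
A sketch of that hybrid shape under the same assumptions (a hypothetical call_model() stub): fast specialists fan out in parallel, and a slower aggregator periodically reviews the results and decides whether to continue or stop.

    # Hybrid sketch: parallel specialists plus a periodic, deliberate aggregator.
    from concurrent.futures import ThreadPoolExecutor

    def call_model(role: str, prompt: str) -> str:
        return f"[{role}] output for: {prompt[:50]}"  # hypothetical stub

    def run_hybrid(task: str, max_rounds: int = 3) -> str:
        history = []
        specialists = ["coder", "tester", "doc_writer"]
        for round_no in range(max_rounds):
            with ThreadPoolExecutor() as pool:
                outputs = list(pool.map(
                    lambda role: call_model(role, f"{task} | context: {history}"),
                    specialists))
            history.extend(outputs)
            # The aggregator checks assumptions and can stop the loop early,
            # which keeps errors from compounding unchecked.
            verdict = call_model("aggregator",
                                 f"Round {round_no}: {outputs} -- CONTINUE or STOP?")
            if "STOP" in verdict:
                break
        return call_model("aggregator", f"Final answer for: {task}\n{history}")

    print(run_hybrid("Add input validation to the signup endpoint"))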

If you think more deeply about this team analogy, you quickly realize that creative teams don’t run like research labs. They don’t route every thought through a single manager. They iterate, discuss, critique, and converge. Research labs, on the other hand, don’t operate like creative studios. They prioritize reproducibility, controlled assumptions, and tightly scoped analysis. They benefit from structure, not freeform brainstorming loops. This is why it’s no surprise when such systems fail: if you apply one default agent topology to every problem, the system can’t perform at its full potential. Most failures attributed to “bad prompts” are actually mismatches between task, coordination pattern, information flow, and model architecture.


Breaking the Loop: “Hiring” Your Agents the Right Way

I design AI agents the same way I think about building a team. Each agent has a skill profile, strengths, blind spots, and an appropriate role. The system only works when these skills compound rather than interfere. A strong model placed in the wrong role behaves like a highly skilled hire assigned to the wrong job. It doesn’t merely underperform; it actively introduces friction. In my mental model, I categorize models by their architectural personality. The following is a high-level overview.

Decoder-only (the generators and planners): These are your standard LLMs like GPT or Claude. They are your talkers and coders, strong at drafting and step-by-step planning. Use them for execution: writing, coding, and producing candidate solutions.

Encoder-only (the analysts and investigators): Models like BERT and its modern representations such as ModernBERT and NeoBERT do not talk; they understand. They build contextual embeddings and are excellent at semantic search, filtering, and relevance scoring. Use them to rank, verify, and narrow the search space before your expensive generator even wakes up.

Mixture of experts (the specialists): MoE models behave like a set of internal specialist departments, where a router activates only a subset of experts per token. Use them when you need high capability but want to spend compute selectively.

Reasoning models (the thinkers): These are models optimized to spend more compute at test time. They pause, reflect, and check their own reasoning. They’re slower, but they often prevent expensive downstream mistakes.
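
To make the “hiring” framing concrete, here is a small illustrative sketch in which a cheap encoder-style scorer narrows the candidate context before the expensive generator runs. The relevance_score() function is a toy lexical-overlap stand-in for a real embedding model (for example a BERT-family encoder), and call_generator() is a hypothetical stub; neither comes from the article.

    # "Hiring" sketch: cheap scorer first, expensive generator last.
    def relevance_score(query: str, doc: str) -> float:
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / (len(q) or 1)  # toy stand-in for embedding similarity

    def call_generator(prompt: str) -> str:
        return f"[generator] answer grounded in: {prompt[:80]}"  # hypothetical stub

    def answer(query: str, documents: list, top_k: int = 2) -> str:
        ranked = sorted(documents, key=lambda d: relevance_score(query, d), reverse=True)
        context = "\n".join(ranked[:top_k])  # only the best candidates reach the generator
        return call_generator(f"Question: {query}\nContext:\n{context}")

    docs = [
        "Refund policy: 30 days with receipt.",
        "Shipping times vary by region.",
        "Refunds for digital goods are handled case by case.",
    ]
    print(answer("What is the refund policy?", docs))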

So if you find yourself writing a 2,000-word prompt to make a fast generator act like a thinker, you’ve made a bad hire. You don’t need a better prompt; you need a different architecture and better system-level scaling.

Designing Digital Organizations: The Science of Scaling Agentic Systems

Neural scaling [1] is continuous and works well for models. As shown by classic scaling laws, increasing parameter count, data, and compute tends to result in predictable improvements in capability. This logic holds for single models. Collaborative scaling [2], as you need in agentic systems, is different. It’s conditional. It grows, plateaus, and sometimes collapses depending on communication costs, memory constraints, and how much context each agent actually sees. Adding agents doesn’t behave like adding parameters.
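
For reference, the neural scaling law in [1] takes roughly a power-law form, where N is parameter count and N_c and α_N are fitted constants; the second line is only an illustrative way to express the collaborative case, not a formula from the cited papers:

    L(N) \approx (N_c / N)^{\alpha_N}    (single model: loss falls predictably as N grows)
    P(k) \approx \text{gain}(k) - \text{coordination cost}(k)    (k agents: the net can plateau or turn negative)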

This is why topology matters. Chains, trees, and other coordination structures behave very differently under load. Some topologies stabilize reasoning as systems grow. Others amplify noise, latency, and error. These observations align with early work on collaborative scaling in multi-agent systems, which shows that performance does not increase monotonically with agent count.

Recent work from Google Research and Google DeepMind [3] makes this distinction explicit. The difference between a system that improves with every loop and one that falls apart is not the number of agents or the size of the model. It’s how the system is wired. As the number of agents increases, so does the coordination tax: Communication overhead grows, latency spikes, and context windows blow up. In addition, when too many entities attempt to solve the same problem without clear structure, the system begins to interfere with itself. The coordination structure, the flow of information, and the topology of decision-making determine whether a system amplifies capability or amplifies error.
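
A rough sense of why the coordination tax bites: in a fully connected topology, the number of pairwise communication channels grows quadratically with agent count.

    \text{channels}(n) = \frac{n(n-1)}{2}, \qquad n = 5 \Rightarrow 10, \quad n = 20 \Rightarrow 190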

The System-Level Takeaway

If your multi-agent system is failing, thinking like a model practitioner is no longer enough. Stop reaching for the prompt. The surge in agentic research has made one truth undeniable: The field is moving from prompt engineering to organizational systems. The next time you design your agentic system, ask yourself:

  • How do I organize the team? (patterns) 
  • Who do I put in those slots? (hiring/architecture) 
  • Why could this fail at scale? (scaling laws)

Ultimately, the winners in the agentic era won’t be those with the smartest instructions but the ones who build the most resilient collaboration structures. Agentic performance is an architectural outcome, not a prompting problem.


References

  1. Jared Kaplan et al., “Scaling Laws for Neural Language Models,” (2020): https://arxiv.org/abs/2001.08361.
  2. Chen Qian et al., “Scaling Large Language Model-based Multi-Agent Collaboration,” (2025): https://arxiv.org/abs/2406.07155.
  3. Yubin Kim et al., “Towards a Science of Scaling Agent Systems,” (2025): https://arxiv.org/abs/2512.08296.



How AI coding makes developers 56% faster and 19% slower


There’s a growing body of research around AI coding assistants with a confusing range of conflicting results. This is to be expected when the landscape is constantly shifting from coding suggestions to agent-based workflows to Ralph Wiggum loops and beyond.

The Reichenbach Falls in Switzerland has a drop of 250 metres and a flow rate of 180-300 cubic metres per minute (enough to fill about 1,500 bathtubs). This is comparable to the rate of change in tools and techniques around coding assistants over the past year, so few of us are using them in the same way. You can’t establish best practices under these conditions; only practical point-in-time techniques.

As an industry, we, like Sherlock Holmes and James Moriarty, are battling on the precipice of this torrent, and the survival of high-quality software and sustainable delivery is at stake.

Given the rapid evolution of tools and techniques, I hesitate to cite studies from 2025, let alone 2023. Yet these are the most-cited studies on the effectiveness of coding assistants, and they present conflicting findings. One study reports developers completed tasks 56% faster, while another reports a 19% slowdown.

The studies provide a platform for thinking critically about AI in software development, enabling more constructive discussions, even as we fumble our collective way toward understanding how to use it meaningfully.

The GitHub self-assessment

The often-cited 56% speedup stems from a 2023 collaboration among Microsoft Research, GitHub, and MIT. The number emerged from a lab test in which developers were given a set of instructions and a test suite to see how quickly and successfully they could create an HTTP server in JavaScript.

In this test, the AI-assisted group completed the task in 71 minutes, compared to 161 minutes for the control group. That makes it 55.8% faster. Much of the difference came from the speed at which novice developers completed the task. Task success was comparable between the two groups.
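
The arithmetic behind that figure, using the rounded minutes above:

    (161 - 71) / 161 = 90 / 161 \approx 0.559

which is roughly a 56% reduction in completion time; the small gap from the reported 55.8% is presumably due to rounding of the published task times.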

There are weaknesses in this approach. The tool vendor was involved in defining the task against which the tool would be measured. If I were sitting an exam, it would be to my advantage to set the questions. Despite this, we can generously accept that it made the coding task faster, and that the automated tests sufficiently defined task success.

We might also be generous in stating that tools have improved over the past three years. Benchmarking reports like those from METR indicate that the length of tasks AI can complete has been doubling roughly every seven months; other improvements are likely.

We’ve also observed the emergence of techniques that introduce work plans and task chunking, thereby improving the agent’s ability to perform larger tasks that would otherwise incur context decay.

And METR is also the source of our cautionary counterfinding regarding task speed.

The METR sense check

The METR study in 2025 examined the impact of contemporary tools on task completion times in real-world open-source projects. The research is based on 246 tasks performed by 16 developers who had experience using AI tools. Each task was randomly assigned to either an AI-assisted condition or a control condition. Screen recordings were captured to verify and categorize how tasks were completed.

The research found that tasks were slowed by 19%, which appears to contradict the earlier report. In reality, AI tools did reduce active coding time, along with time spent searching for answers, testing, and debugging. The difference in the METR report was that it identified new task categories introduced by the tools, such as reviewing AI output, prompting, and waiting for responses. These new tasks, along with increased idle and overhead time, consumed the gains and pushed overall task completion times into the red.

 

Source: METR Measuring the Impact of Early-2025 AI. Task category comparison.

One finding from the METR study worth noting is the perception problem. Developers predicted AI assistants would speed them up. After completing the task, they also estimated they had saved time, even though they were 19% slower. This highlights that our perceptions of productivity are unreliable, as they were when we believed that multitasking made us more productive.

Lack of consensus

A recently released study from Multitudes, based on data collected over 10 months in 2025, highlights the lack of consensus around the productivity benefits of AI coding tools. They found that the number of code changes increased, but this was countered by an increase in out-of-hours commits.

This appears to be a classic case of increasing throughput at the expense of stability, with out-of-hours commits representing failure demand rather than feature development. It also clouds the picture, as developers who work more hours tend to make more commits, even without an AI assistant.

Some of the blame was attributed to adoption patterns that left little time for learning and increased delivery pressure on teams, even though they now had tools that were supposed to help them.

The wicked talent problem

One finding that repeatedly comes up in the research is that AI coding assistants benefit novice developers more than those with deep experience. This makes it likely that using these tools will exacerbate a wicked talent problem. Novice developers may never shed their reliance on tools, as they become accustomed to working at a higher level of abstraction.

This is excellent news for those selling AI coding tools, as an ever-expanding market of developers who can’t deliver without the tools will be a fruitful source of future income. When investors are ready to recoup, organizations will have little choice but to accept whatever pricing structure is required to make vendors profitable. Given the level of investment, this may be a difficult price to accept.

The problem may deepen as organizations have stopped hiring junior developers, believing that senior developers can delegate junior-level tasks to AI tools. This doesn’t align with the research, which shows junior developers speed up the most when using AI.

The AI Pulse Report compares this to the aftermath of the dot-com bubble, when junior hiring was frozen, resulting in a shortage of skilled developers. When hiring picked up again, increased competition for talent led to higher salaries.

Source: The AI Pulse Report. Hiring plans for junior developers.

Continuous means safe, quick, and sustainable

While many practitioners recognize the relevance of value stream management and the theory of constraints to AI adoption, a counter-movement is emerging that calls for the complete removal of downstream roadblocks.

“If you can’t complete code reviews at the speed at which they are created with AI, you should stop doing them. Every other quality of a system should be subverted to straight-line speed. Why waste time in discovery when it would starve the code-generating machine? Instead, we should build as much as we can as fast as we can.”

As a continuous delivery practitioner and a long-time follower of the DORA research program, I find this no longer makes sense. One of the most powerful findings in the DORA research is that a user-centric approach beats straight-line speed in terms of product performance. You can slow development down to a trickle if you’ve worked out your discovery process, because you don’t need many rounds of chaotic or random experiments when you have a deep understanding of the user and the problem they want solved.

We have high confidence that continuous delivery practices improve the success of AI adoption. You shouldn’t rush to dial up coding speed until you’ve put those practices in place, and you shouldn’t remove practices in the name of speed. That means working in small batches, integrating changes into the main branch every few hours, keeping your code deployable at all times, and automating builds, code analysis, tests, and deployments to smooth the flow of change.

Continuous delivery is about getting all types of changes to users safely, quickly, and sustainably. The calls to remove stages from the deployment pipeline to expedite delivery compromise the safety and sustainability of software delivery, permanently degrading the software’s value for a temporary gain.

It’s a system

There’s so much to unpack in the research, and many studies focus on a single link in a much longer chain. Flowing value from end to end safely, quickly, and sustainably should be the goal, rather than merely maintaining straight-line speed or optimizing individual tasks, especially when those tasks are the constraining factor.

With the knowledge we’ve built over the last seven decades, we should be moving into a new era of professionalism in software engineering. Instead, we’re being distracted by speed above all other factors. When my local coffee shop did this, complete with a clipboard-wielding Taylorist assessor tasked with bringing order-to-delivery times down to 30 seconds, the delivery of fast, bad coffee convinced me to find a new place to get coffee. Is this what we want from our software?

The results across multiple studies show that claims of a revolution are premature, unless it’s an overlord revolution that will depress the salaries of those pesky software engineers and produce a group of builders who can’t deliver software without these new tools. Instead, we should examine the landscape and learn from research and from one another as we work out how to use LLM-based tools effectively in our complex socio-technical environments.

We are at a crossroads: either professionalize our work or adopt a prompt-and-fix model that resembles the earliest attempts to build software. There are infinite futures ahead of us. I don’t dread the AI-assisted future as a developer, but as a software user. I can’t tolerate the quality and usability chasm that will result from removing continuous delivery practices in the name of speed.

The post How AI coding makes developers 56% faster and 19% slower appeared first on The New Stack.
