Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Your PS5 can now transform into a Linux PC


A developer has created a method to get Linux running on some versions of Sony's PlayStation 5 console. Andy Nguyen showed off a ported version of Ubuntu running PC games on a PS5 last month, and this week he published the installation steps on GitHub.

This is a soft mod, so it won't persist between power-downs or restarts, but the Linux installation will let you play PC games once it's up and running. So far we've seen GTA V running with enhanced ray tracing at 60fps in Ubuntu on a PS5, as well as Spider-Man running at 1440p and 60fps.

Nguyen is relying on a patched vulnerability to transform a PS5 into a Linux …

Read the full story at The Verge.

Read the whole story
alvinashcraft
23 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Don’t Automate Your Moat: Matching AI Autonomy to Risk and Competitive Stakes


I was talking to a senior engineer at a well-funded company not long ago. I asked him to walk me through a critical algorithm at the heart of their product, something that ran hundreds of times a second and directly affected customer outcomes. He paused and said, “Honestly, I’m not totally sure how it works. AI wrote it.”

A few weeks later, a different engineer at another company was paged about a system outage. He pulled up the failing service and realized he had no idea it was connected to a database. A colleague had accepted an AI-generated PR three months earlier that added that dependency. The tests passed. The change was never written down. The original engineer moved on and the knowledge was lost.

These aren’t new stories. Engineers have always inherited systems they didn’t fully build. What’s new is the disguise and the speed. AI is an amazing enabler. Organizations must adopt it to remain relevant. Yet the emerging pattern—describe what you want, let an agent iterate until it works, pay for it in tokens instead of engineering hours—is functionally a buy decision wearing a build costume. The code is in your repo. Your engineers merged the PR. It feels like you built it. But if nobody on your team understands why it works the way it does, you’ve purchased a dependency you can’t maintain from a vendor you can’t call.

AI doesn’t create that gap once. It widens it continuously at a pace that outstrips the organizational habits that once kept it manageable. Two problems compound at once. You can’t extend the thing that makes you hard to replace. And when it breaks, the incident lands on a team that doesn’t understand what they’re fixing, turning a recoverable outage into a customer-facing crisis. Engineering leaders have wrestled with build-versus-buy tradeoffs for decades, and the hard-won lesson has always been the same: You don’t outsource your competitive advantage. The token-funded generation loop doesn’t change that calculus. It makes it easier to skip the question entirely.

The question that matters isn’t “Can AI do this?” If it can’t today, it will be able to tomorrow. And the argument that follows does not depend on the quality of the AI-generated code. This article covers two questions most engineering organizations have never asked at the same time: what are we risking, and what are we giving away? Most teams optimize for velocity and never ask either. The gap between those unasked questions is where the most expensive mistakes are already being made.

Part 1: Two dimensions. Neither is velocity.

Moving faster matters. But velocity alone misses the two dimensions that determine whether AI autonomy helps or hurts your business.

Business risk: What’s the blast radius if this fails? A bug in an internal CLI tool costs you an afternoon. A bug in your authentication logic costs you customers and possibly market cap. A bug in your core pricing algorithm costs you the business. These are not the same.

Competitive differentiation: Does this code define your business? Your moat is your architecture, your performance characteristics, your core algorithms, and the product decisions baked into your infrastructure. But it’s also the institutional knowledge that shaped them: the reasoning behind the trade-offs, the context that no model was trained on. If your competitors can generate the same code with the same model you’re using, it stops being an advantage.

Most organizations ask the first question on a good day. Almost none ask the second. That gap is how you end up shipping fast into a moat nobody can explain and nobody can extend.

Understanding why both dimensions matter starts with velocity and what happens when the feedback loop around it breaks.

Velocity feels real. Debt is often invisible.

AI coding tools are genuinely impressive. GitHub’s research showed 55% faster task completion with Copilot in controlled conditions.1 That number has driven an assumption that faster is always better.

A 2025 METR randomized controlled trial2 found something that should give every engineering leader pause. Sixteen experienced developers on real production codebases forecasted they’d complete tasks 24% faster with AI. After finishing, they estimated they’d gone 20% faster. They’d actually gone 19% slower.

The velocity finding is striking. But the perception gap matters more. The feedback loop between “how am I doing?” and “how am I actually doing?” was broken throughout and never corrected itself. This doesn’t resolve the velocity debate. It reframes it. The danger isn’t that individuals move too fast; it’s that organizations mistake output volume for productivity and strip out the review processes that used to catch what that gap costs.

A Tilburg University study of open source projects after GitHub Copilot’s introduction found the same pattern at the organizational level.3 Productivity did increase, but primarily among less-experienced developers. Code written after AI adoption required more rework to meet repository standards. The added rework burden fell on the most experienced (core) developers, who reviewed 6.5% more code after Copilot’s introduction and saw a 19% drop in their own original code output. The velocity looks real at the surface. Underneath, the maintenance cost shifts upward to the people who can least afford to lose productive time.

That broken feedback loop has a name. Researchers call it cognitive debt4: the growing gap between how much code exists in your system and how much of it anyone actually understands. Technical debt shows up in your linter and your backlog. Cognitive debt is invisible. There’s no signal telling engineers where their understanding ends. That’s precisely what the METR perception gap showed. It never corrected itself.

Research by Anthropic Fellows found that engineers using AI assistance when learning new tools scored 17% lower on comprehension tests than those who coded by hand, with the steepest drops in debugging ability.5 MIT’s Media Lab found the same pattern in writing tasks: Brain connectivity was weakest in the group using LLM assistance, strongest in the group working without tools.4 Active production builds understanding. Passive consumption doesn’t.

You understand what you build better than what you review. When you write code, you produce output and build a mental model. That’s what Peter Naur called the “theory of the program.” It lives in your head, not in the repo.6 The MIT study captured this directly. 83% of participants who wrote essays with LLM assistance could not quote a single sentence from essays they had just written.4

Cognitive debt is invisible until it isn’t. When it surfaces, it hits both dimensions hard, in different ways.

Business risk: The blast radius of not knowing

On the business risk dimension, cognitive debt is a safety problem.

When nobody fully understands the system, the blast radius of a failure expands silently. The incident that eventually comes (and it always comes) lands on a team that can’t diagnose what they didn’t build. The engineer pulling up the failing service at 2 AM has no mental model of why it was built the way it was, what it connects to, or what the edge cases look like under load. So they ask the LLM. It can explain what the code does and often propose a reasonable fix. It can’t tell you why it was designed that way. And a fix that looks right to the model can quietly violate constraints that nobody thought to document.

Cognitive debt compounds a second, independent risk: the pace at which AI-generated code reaches production. OX Security’s analysis7 of over 300 software repositories found that AI-generated code isn’t necessarily more vulnerable per line than human-written code. The problem is velocity.

Code review, debugging, and team oversight are the bottlenecks that catch vulnerable code before it ships. AI makes it easy to remove them. CodeRabbit’s analysis of real-world pull requests found AI-authored changes contain up to 1.7x more critical and major defects than human-written code, with logic and correctness issues up 75%.8 Apiiro’s analysis found that while AI reliably reduces surface-level syntax errors, architectural design flaws and privilege escalation paths (the categories automated scanners miss and human reviewers struggle to catch) spiked in AI-assisted codebases.9

AI accelerates output and accelerates unreviewed risk in equal measure. The cognitive debt means that when something breaks, the team is learning the system as they’re trying to fix it. Remove their understanding and you haven’t streamlined the process. You’ve only removed the thing standing between a bad day and a catastrophic one.

Competitive differentiation: What you give away without knowing it

The competitive differentiation risk isn’t that AI will generate your exact competitive algorithm and hand it to your competitor. It’s subtler. Your advantage was never the code itself; it was the judgment that shaped it. When AI writes that code, the judgment never forms. The code arrives, but the understanding that would let your team extend it, improve it, or defend it under pressure doesn’t. Your moat is most likely to survive in the places AI finds hardest to reach.

That judgment—formed by the performance trade-offs that took years to tune, the failure modes that only someone who’s been paged understands, the architectural decisions that encode domain knowledge nobody wrote down—doesn’t live in the codebase. It lives in your engineers’ heads.

And here’s the part most teams miss: Your competitor with the same AI tools doesn’t just get similar code, they get a team that also doesn’t understand why it works the way it does, which means neither of you can extend it, and the race to the next architectural move is a coin flip rather than a compounding advantage. The build-versus-buy discipline exists precisely because decades of experience taught engineering organizations that outsourcing your core means losing the ability to extend it. The token-funded generation loop doesn’t change that calculus. It makes it easier to mistake the outsourcing for ownership because the code has your name on it.

The structural problem runs even deeper. Models trained on public code produce outputs weighted toward well-represented patterns, the common solutions to common problems. Research confirms this. LLM performance drops sharply on less-common programming languages where training data is sparse, and on genuinely novel implementations. Even the best current models correctly implement fewer than 40% of coding tasks drawn from recent research papers.10 And the convergence problem extends beyond code. A pre-registered experiment tracking 61 participants over seven days found that while ChatGPT consistently boosted creative output during use, performance reverted to baseline the moment the tool was unavailable.11 More critically, the work produced with AI assistance became increasingly homogenized over time. That homogenization persisted even after the tool was removed. The participants hadn’t borrowed the tool’s output. They’d internalized its patterns. For engineering organizations, this is the differentiation risk made concrete: Teams that rely on AI for their most critical design decisions risk generating commodity code today and training themselves to think in commodity patterns tomorrow.

Engineers who deeply own their most critical systems are better at diagnosing incidents and see the next architectural move that competitors can’t follow. Delegate that comprehension away and you can keep the lights on. You can’t see around corners.

When it goes wrong, it really goes wrong

Both dimensions rest on the same vulnerability: cognitive debt accumulating on work that matters. The failure cases make it concrete.

The production failures are accumulating. A Replit AI agent deleted months of production data in seconds after violating explicit code-freeze instructions, then initially misled the user about whether recovery was possible.12 Reports emerged in early 2026 of a major cloud provider convening mandatory engineering reviews after a pattern of high-blast-radius incidents, with AI-assisted code changes cited as a contributing factor. In each case, the humans in the loop either didn’t understand what they were approving, or weren’t in the loop at all.

The deeper pattern predates AI tools entirely. Knight Capital Group took seventeen years to become the largest trader in U.S. equities. It took forty-five minutes to lose $460 million.13 The culprit was a nine-year-old piece of deprecated code called Power Peg, left on production servers and never retested after engineers modified an adjacent function in 2005. When engineers reused its feature flag for new functionality in 2012, nobody understood what they were reactivating. When the fault surfaced, the team’s attempt to fix it made things worse. They uninstalled the new code from the seven servers where it had deployed correctly, which caused Power Peg to activate on those servers too and compounded the losses. The SEC’s enforcement order is unambiguous: absent deployment procedures, no code review requirements, no incident response protocols. A failure of institutional comprehension where the mental model had quietly evaporated while the code kept running.

No AI tool wrote that code. The failure was entirely human, through entirely normal processes: engineers leaving, tests never rerun after refactors, flags reused without documentation. This is the baseline, what software organizations produce under ordinary conditions over nine years. An engineering team with modern AI tools won’t recreate this specific bug. They’ll create the conditions for the next one faster: more code that nobody fully understands, more dependencies nobody documented, more cognitive debt accumulating before anyone notices. AI removes the friction that once slowed exactly this kind of erosion.

None of these are failures of AI capability. They’re failures of judgment about where to deploy AI and how much human oversight to maintain.

Part 2: A four-quadrant model for AI autonomy

The quadrants

Human involvement in programming quadrants

Four quadrants emerge when both questions are asked together. Before the examples, two contrasts are worth naming because the quadrants that look most similar on the surface are the ones most often confused in practice.

Supervised automation versus Human-led craftsmanship. Both demand high human involvement. Both feel like “be careful here.” But the difference is fundamental. In Supervised automation, the human is a safety gate. The work is a commodity; you’re there to catch errors before they escape. In Human-led craftsmanship, the human is the author. You’re building the mental model that lets the next engineer reason about this system under pressure three years from now and take it somewhere new. The code isn’t something you need to verify. It’s something you need to own. And ownership here extends beyond the individual engineer. The team writes RFCs, debates trade-offs, identifies which parts of the implementation fall into which quadrant, and makes sure the reasoning behind key decisions is shared, not siloed. Human-led craftsmanship isn’t one person writing code alone. It’s a team making sure the understanding survives the people who built it.

Collaborative co-creation versus Human-led craftsmanship. Both involve high differentiation, and in both, the human drives the vision and owns the key decisions. But risk changes everything about how you work. In Collaborative co-creation, early iterations are recoverable. A wrong turn can be corrected before it costs you anything serious, so AI can genuinely accelerate execution. In Human-led craftsmanship, the blast radius of not understanding what you’ve built compounds over time. Wrong turns become load-bearing walls, and the architectural moves you can’t see are the ones that let competitors catch up. AI assists with scoped subtasks only. Every contribution gets interrogated.

In full automation, the human is a director. You define what needs to be done, AI produces the output, and you spot-check the result. The work is low-risk and low-differentiation. If something’s wrong, you fix it in the next iteration without anyone outside the team noticing. This is where AI earns its keep without qualification, and where restricting it costs you real velocity with nothing to show for it.
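Read together, the two dimensions give a simple decision table. A few lines of Python restate the model (the labels are the article's four quadrants; the booleans are the two questions asked honestly):

```python
def quadrant(high_risk: bool, high_differentiation: bool) -> str:
    """Map the two independent dimensions to the four quadrants."""
    if high_risk and high_differentiation:
        return "Human-led craftsmanship"     # humans author; AI on scoped subtasks only
    if high_risk:
        return "Supervised automation"       # AI drafts; a human traces every path
    if high_differentiation:
        return "Collaborative co-creation"   # human vision; AI accelerates execution
    return "Full automation"                 # AI writes, tests, ships; humans spot-check
```

The point of writing it down this way is that velocity appears nowhere in the signature: where a task lands is decided before anyone asks how fast it can be done.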

To make all four quadrants concrete, we’ll use a single feature as a lens: building AI Gateway cost controls, the system that sets token budgets per agent, enforces spending limits, tracks usage by model and agent, and handles enforcement modes when an agent exceeds its budget.
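To keep the example concrete in code as well, here is one hypothetical shape such per-agent budget configuration could take (every name and field is invented for illustration, not an actual product schema; the three enforcement modes come from the routing-rule discussion below):

```python
from dataclasses import dataclass
from enum import Enum

class EnforcementMode(Enum):
    """What happens when an agent exceeds its token budget."""
    DEGRADE = "degrade"   # fall back to a cheaper model
    HALT = "halt"         # stop the agent entirely
    NOTIFY = "notify"     # keep running, alert the owner

@dataclass
class AgentBudget:
    agent_id: str
    monthly_token_limit: int
    enforcement: EnforcementMode

# Example per-agent budgets (values are illustrative)
budgets = [
    AgentBudget("support-bot", monthly_token_limit=5_000_000,
                enforcement=EnforcementMode.DEGRADE),
    AgentBudget("research-agent", monthly_token_limit=1_000_000,
                enforcement=EnforcementMode.HALT),
]
```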

Low risk, low differentiation: Full automation

API docs for cost controls. Test scaffolding for token limit scenarios. Config examples for per-agent budgets. Every platform has docs, and if there’s a mistake, you fix it in the next iteration without anyone outside the team noticing. Humans set direction and spot-check. AI writes, tests, and ships.

The test: If this is wrong, can you fix it before a customer sees it or complains? If yes, automate freely.

Low risk, high differentiation: Collaborative co-creation

Designing the UX for the token usage dashboard. Iterating on routing rules that determine when an agent degrades to a cheaper model, halts entirely, or triggers a notification. These decisions separate a sophisticated platform from a blunt on/off switch, but early iterations are recoverable. A first version that doesn’t surface guardrail costs separately isn’t a disaster. It’s a product conversation. Humans drive the design vision and interrogate AI on trade-offs. AI accelerates execution and handles boilerplate.

The test: If you flipped the ratio (AI deciding, human rubber-stamping) would you be comfortable? If not, this requires genuine co-creation, not delegation. The human should be able to explain the trade-offs in the current design and know where to push it next.

High risk, low differentiation: Supervised automation

Enforcement logic that halts an agent when it hits its token budget. Every cost control system needs enforcement, so this isn’t differentiating. But if it fails, agents run unconstrained and rack up unbounded LLM spend. AI can draft the logic. A human must trace every path and understand every state transition before signing off. The questions before merge: Can I explain exactly what happens when an agent hits the limit mid-execution? Can I explain this behavior to Customer Success or the customer?

The test: Could a competent engineer review this confidently without having written it? If yes, the human’s job is to verify, not to author. But the bar for verification is explanation, not approval.
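As an illustration of the "limit mid-execution" question, here is a deliberately small, hypothetical pre-call budget check (not any product's actual logic). Whether you check before or after the call is exactly the kind of state transition the reviewer must be able to explain, because checking after lets one oversized call through:

```python
def check_budget(tokens_used: int, tokens_requested: int, limit: int) -> str:
    """Decide whether an agent's next LLM call may proceed.

    The check runs *before* the call, so an agent can never overshoot
    its limit mid-execution; the request that would cross the line is
    refused up front rather than billed and then flagged.
    """
    if tokens_used >= limit:
        return "halt"    # already at or over budget
    if tokens_used + tokens_requested > limit:
        return "halt"    # this call would cross the limit
    return "allow"

# An agent at 990 of 1,000 tokens asking for 50 more is stopped up front.
```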

High risk, high differentiation: Human-led craftsmanship

The core token metering and attribution engine. It tracks usage per agent and per model, attributes guardrail costs separately so they don’t count against agent budgets, and provides the auditability enterprise customers need to govern AI spend. Get it wrong and customers can’t trust the numbers. Get it right and it’s a genuine competitive moat that competitors can’t replicate with the same AI tools you’re using.
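A toy sketch of the attribution split described above (the class and model names are invented here, and a real engine is far more involved; the point is only that guardrail overhead is booked on a separate ledger so it never counts against an agent's budget):

```python
from collections import defaultdict

class UsageMeter:
    """Toy per-agent, per-model token meter with guardrail costs
    attributed separately from agent spend."""

    def __init__(self):
        # agent -> model -> tokens (counts against the agent's budget)
        self.agent_usage = defaultdict(lambda: defaultdict(int))
        # model -> tokens (platform overhead, budgeted separately)
        self.guardrail_usage = defaultdict(int)

    def record(self, agent_id: str, model: str, tokens: int,
               guardrail: bool = False) -> None:
        if guardrail:
            self.guardrail_usage[model] += tokens
        else:
            self.agent_usage[agent_id][model] += tokens

    def agent_total(self, agent_id: str) -> int:
        return sum(self.agent_usage[agent_id].values())

meter = UsageMeter()
meter.record("support-bot", "gpt-large", 1200)
meter.record("support-bot", "guard-model", 300, guardrail=True)
# support-bot's budgeted spend is 1200; the 300 guardrail tokens
# are tracked for auditability but excluded from the agent's budget.
```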

Human engineers own the design end-to-end. AI assists on scoped subtasks once the design is settled: drafting specific functions, generating test coverage for paths the engineer has already reasoned through. Every contribution gets interrogated. The bar is whether the engineer could explain it in an incident review without looking at the code first.

The test: If the engineer who built this left tomorrow, would the team still understand why it works the way it does? Could they make it better? If the honest answer is no, you’re accumulating the most dangerous kind of cognitive debt there is.

The counterargument (it’s a good one)

Any engineering leader will push back here, and they’ll have good reason to.

The research is thin. METR’s study had 16 developers. MIT’s EEG work is a preprint that its own critics say should be interpreted conservatively.14 The Anthropic comprehension study shows a quiz score gap, not a business outcome. The evidence is early-stage. Intellectual honesty requires acknowledging that.

But the pattern keeps showing up in unrelated fields. A Lancet study found that endoscopists who routinely used AI for polyp detection performed measurably worse when the AI was removed, with adenoma detection rates dropping from 28.4% to 22.4% in three months.15 The study is observational and small. But the direction is consistent with everything else: Routine AI assistance may erode the skills it was supposed to support.

Most engineering work isn’t high-stakes. Studies consistently estimate that 60–80% of engineering time goes to maintenance, tests, docs, integration, and tooling, exactly the stuff that belongs in the automate quadrant regardless. Restricting AI because of the top 20% creates a real tax on the other 80%.

And can’t engineers develop deep ownership of AI-generated code through study and iteration? Partially. But the behavioral data tells a harder story. GitClear’s analysis of 211 million changed lines shows a decline in refactored code since AI adoption accelerated.16 Engineers aren’t studying AI-generated code carefully. They’re moving on to the next feature. LLM tools can explain what code does; they can’t tell you why the system was designed the way it was.17

The serious pro-AI argument isn’t “use AI everywhere.” It’s more precise: The guardrails for verification and oversight are improving fast, engineers who actively interrogate AI output build understanding even from generated code, and the organizations that restrict AI on their most critical work will fall behind competitors who don’t. This is a real argument.

The answer isn’t to dismiss it but to sharpen what “critical work” means, and to recognize that the interrogative use of AI that the research identifies as understanding-preserving requires organizational discipline most teams haven’t built yet. The quadrant isn’t permanent. The threshold shifts as both AI capability and human oversight practices mature. The discipline is the habit of asking both questions honestly before you start, not a fixed answer to them.

The discipline is simple. Maintaining it isn’t.

The quadrant tells you where to be careful. How you engage AI once you’re there determines whether careful is enough. The difference between “write me this function” and “explain why you made this trade-off, and what breaks if the input is malformed” is the difference between borrowing intelligence and developing it. Active, interrogative AI use preserves comprehension. Passive delegation destroys it. That’s what the Anthropic study’s behavioral data shows directly.

Match your review process to the quadrant. AI-generated docs and test scaffolding get a spot-check. AI-generated code touching your core product logic gets the same scrutiny as a junior engineer’s first PR. The bar for approval isn’t “tests pass.” It’s “someone on this team can explain what this does, defend it under pressure, and use that understanding to make it better.” Full automation needs a spot-check. Human-led craftsmanship needs an RFC, a team review, and shared ownership of the reasoning before anyone writes a line of code.

This matters especially in real-time data and AI infrastructure, systems where the most dangerous failure modes are emergent, appearing at scale and under load in combinations the code itself doesn’t express. Recognize that the threshold will shift. As AI capability improves, what belongs in the automate quadrant expands. The discipline isn’t a fixed answer. It’s the habit of asking both questions honestly before you start. It’s a core reason Redpanda is designed for simplicity and predictability: engineers need to be able to reason about how infrastructure behaves under pressure, not discover it during an incident.18

The real competitive question

The companies that get this right won’t be the ones that use the most AI or the least. They’ll be the ones whose leaders have internalized that risk and differentiation are independent variables, and that cognitive debt threatens both.

The engineer who doesn’t know how their algorithm works is a symptom. The organization that allowed it is the cause.

Treat cognitive debt as only a risk problem and you end up with engineers who can’t diagnose failures they didn’t build. Treat it as only a differentiation problem and you get fragile systems that survive until the next incident. Let it accumulate on your most critical systems and you get both at once.

Your competitor is making this calculation right now. The question isn’t whether to use AI. It’s whether you’re being honest about which quadrant you’re in, and whether your team will know the answer when it finally matters.


Co-authored with Claude (Anthropic). Yes, we took the advice from this article.


Footnotes

  1. Peng, S. et al. (2023). The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. https://arxiv.org/abs/2302.06590 ↩
  2. Becker, J., Rush, N. et al. (2025). Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. METR. https://arxiv.org/abs/2507.09089 ↩
  3. Xu, F., Medappa, P.K., Tunc, M.M., Vroegindeweij, M., & Fransoo, J.C. (2025). AI-Assisted Programming May Decrease the Productivity of Experienced Developers by Increasing Maintenance Burden. Tilburg University. https://arxiv.org/abs/2510.10165 ↩
  4. Kosmyna, N. et al. (2025). Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. MIT Media Lab. https://arxiv.org/abs/2506.08872 (preprint, not yet peer-reviewed) ↩
  5. Shen, J.H. & Tamkin, A. (2026). How AI Impacts Skill Formation. Anthropic Safety Fellows Program. https://arxiv.org/abs/2601.20245 ↩
  6. The generation effect: Rosner, Z.A. et al. (2012). The Generation Effect: Activating Broad Neural Circuits During Memory Encoding. Cortex. https://pmc.ncbi.nlm.nih.gov/articles/PMC3556209/ and Bertsch, S. et al. (2007). The generation effect: A meta-analytic review. Memory & Cognition. https://link.springer.com/article/10.3758/BF03193441 and Naur, P. (1985). Programming as Theory Building. Microprocessing and Microprogramming. https://pages.cs.wisc.edu/~remzi/Naur.pdf ↩
  7. OX Security. (October 2025). Army of Juniors: The AI Code Security Crisis. https://www.helpnetsecurity.com/2025/10/27/ai-code-security-risks-report/ ↩
  8. CodeRabbit. (December 2025). State of AI vs Human Code Generation Report. https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report. Note: CodeRabbit produces AI code review tooling; findings should be read in that context. ↩
  9. Apiiro. (September 2025). 4x Velocity, 10x Vulnerabilities: AI Coding Assistants Are Shipping More Risks. https://apiiro.com/blog/4x-velocity-10x-vulnerabilities-ai-coding-assistants-are-shipping-more-risks/. Note: Apiiro produces application security tooling; findings should be read in that context. ↩
  10. Joel, S., Wu, J.J., & Fard, F.H. (2024). A Survey on LLM-based Code Generation for Low-Resource and Domain-Specific Programming Languages. ACM TOSEM. https://arxiv.org/abs/2410.03981. See also: Hua, et al. (2025). ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code. https://arxiv.org/abs/2506.02314 ↩
  11. Liu, Q., Zhou, Y., Huang, J., & Li, G. (2024). When ChatGPT is Gone: Creativity Reverts and Homogeneity Persists. https://arxiv.org/abs/2401.06816 ↩
  12. Fortune. (July 2025). AI-Powered Coding Tool Wiped Out a Software Company’s Database in ‘Catastrophic Failure.’ https://fortune.com/2025/07/23/ai-coding-tool-replit-wiped-database-called-it-a-catastrophic-failure/ ↩
  13. Knight Capital Group. SEC Administrative Proceeding, Release No. 70694 (October 16, 2013). https://www.sec.gov/litigation/admin/2013/34-70694.pdf. Levine, M. (2013). Knight Capital’s $440 Million Compliance Disaster. Bloomberg. https://www.bloomberg.com/opinion/articles/2013-10-17/knight-capital-s-440-million-compliance-disaster ↩
  14. Stankovic, M. et al. (2025). Comment on: Your Brain on ChatGPT. https://arxiv.org/abs/2601.00856 ↩
  15. Budzyń, K., Romańczyk, M. et al. (2025). Endoscopist Deskilling Risk After Exposure to Artificial Intelligence in Colonoscopy: A Multicentre, Observational Study. Lancet Gastroenterol Hepatol. 10(10):896-903. https://doi.org/10.1016/S2468-1253(25)00133-5 ↩
  16. Harding, W. (2025). AI Copilot Code Quality: Evaluating 2024’s Increased Defect Rate via Code Quality Metrics. GitClear. https://www.gitclear.com/ai_assistant_code_quality_2025_research ↩
  17. Zhou, X., Li, R., Liang, P., Zhang, B., Shahin, M., Li, Z., & Yang, C. (2025). Using LLMs in Generating Design Rationale for Software Architecture Decisions. ACM TOSEM. https://arxiv.org/abs/2504.20781. See also: Tang, N., Chen, M., Ning, Z., Bansal, A., Huang, Y., McMillan, C., & Li, T.J.-J. (2024). A Study on Developer Behaviors for Validating and Repairing LLM-Generated Code Using Eye Tracking and IDE Actions. IEEE VL/HCC 2024. https://arxiv.org/abs/2405.16081 ↩
  18. Gallego, A. (2025). Introducing the Agentic Data Plane. Redpanda. https://www.redpanda.com/blog/agentic-data-plane-adp. Crosier, K. (2026). How to Safely Deploy Agentic AI in the Enterprise. Redpanda. https://www.redpanda.com/blog/deploy-agentic-ai-safely-enterprise ↩




RapidClaw Earns a 44.89 Proof of Usefulness Score by Building AI Co-Founder Agents

RapidClaw helps early-stage founders and indie hackers automate startup tasks like investor outreach, pitch decks, market research, and dev work — each agent gets its own isolated server.

I Added an MCP Server to My Browser-Based Tool Suite. Agents Found It Immediately.


Last month I shipped an MCP server for Toolora — a suite of 41 browser-based developer tools I've been building. Within days, AI agents were connecting to it. 594 MCP calls in the first week. I opened my analytics and thought I'd made it.

Then I looked closer at the breakdown.

246 _connect calls. 313 _list calls. 35 actual tool executions.

Agents were finding the server, cataloging every tool, and then… moving on. Like a tourist who photographs a restaurant menu but doesn't order. The MCP discovery-to-execution gap is real, it's not a code problem, and nobody is really talking about it.
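Nothing fancy is needed to see how lopsided the traffic was. From the numbers above:

```python
connects, lists, executions = 246, 313, 35
total_calls = connects + lists + executions   # 594, matching the week's total
discovery_calls = connects + lists

execution_rate = executions / total_calls
print(f"{discovery_calls} discovery calls vs {executions} executions")
print(f"execution rate: {execution_rate:.1%}")   # roughly 5.9% of all MCP traffic
```

Put differently, for every tool actually run, agents made about sixteen calls just looking at the menu.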

But let me back up.

Why I Built This

Every developer tool I've ever used that "processes your file" actually sends it somewhere. Cloudflare, AWS, a startup's backend I know nothing about. I wanted tools that are genuinely local — where the privacy guarantee isn't a policy you read, it's a physical fact. Your file never leaves your device because no code exists to send it anywhere.

So I built Toolora as a fully browser-side tool suite. PDF merging runs on pdf-lib in a Web Worker. Image compression uses the Canvas API. JSON formatting, Base64, hashing, JWT decoding — all in-browser JavaScript, zero server contact.

41 tools now, across text, developer utilities, image processing, PDF manipulation, and AI/RAG prep tools (token counting, text chunking, HTML to Markdown).

The MCP Server

MCP (Model Context Protocol) is Anthropic's open standard for connecting AI agents to external tools. Think of it as a universal adapter that lets Claude, Cursor, and other AI tools call your functions in a structured way.

The irony of adding an MCP server to privacy-first browser tools isn't lost on me. The tools themselves are local-only; the MCP server is a server-side layer that wraps some of the same logic for agents that can't run browser code. It felt like the right tradeoff — humans get the browser tools, agents get the API.

Here's a simplified version of what the MCP endpoint returns when an agent calls tools/list:

```json
{
  "tools": [
    {
      "name": "count_words",
      "description": "Count words, characters, sentences, and estimate reading time.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "text": { "type": "string", "description": "Text to analyze" }
        },
        "required": ["text"]
      }
    }
    // ... 24 more
  ]
}
```
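On the wire, MCP frames these exchanges as JSON-RPC 2.0. A minimal sketch of the dispatch a server performs, with an illustrative count_words implementation rather than Toolora's actual code:

```javascript
// Minimal JSON-RPC 2.0 dispatch for an MCP-style endpoint (sketch).
const tools = [
  {
    name: "count_words",
    description: "Count words, characters, sentences, and estimate reading time.",
    inputSchema: {
      type: "object",
      properties: { text: { type: "string", description: "Text to analyze" } },
      required: ["text"],
    },
  },
];

// Illustrative tool logic, not the production implementation.
function countWords(text) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  return { words: words.length, characters: text.length };
}

function handleRpc(req) {
  if (req.method === "tools/list") {
    return { jsonrpc: "2.0", id: req.id, result: { tools } };
  }
  if (req.method === "tools/call" && req.params.name === "count_words") {
    const stats = countWords(req.params.arguments.text);
    return {
      jsonrpc: "2.0",
      id: req.id,
      result: { content: [{ type: "text", text: JSON.stringify(stats) }] },
    };
  }
  return {
    jsonrpc: "2.0",
    id: req.id,
    error: { code: -32601, message: "Method not found" },
  };
}
```

Keeping the dispatch a pure function of the request makes it trivial to unit-test independently of whatever transport (HTTP+SSE, stdio) wraps it.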

I'm using a standard HTTP+SSE transport rather than stdio, which means it works as a hosted endpoint — paste the URL into Claude Desktop's config and you're connected. No install, no Docker, no npm package.
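For reference, one common way to wire a hosted HTTP+SSE endpoint into Claude Desktop's claude_desktop_config.json is to bridge it through the community mcp-remote package, since the desktop config natively launches stdio commands. The server key ("toolora") is arbitrary; this is a sketch of the pattern, not an official recommendation:

```json
{
  "mcpServers": {
    "toolora": {
      "command": "npx",
      "args": ["mcp-remote", "https://toolora.dev/api/mcp"]
    }
  }
}
```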

The Discovery Gap

Back to those numbers. 594 MCP calls, 35 actual executions.

When I dug in, the pattern was clear. Agents were running automated discovery sweeps — connecting to MCP servers, listing all available tools, and storing that metadata somewhere. The _connect and _list calls happened in seconds. The actual count_words and other tool calls trickled in much later, from what looked like different sessions.

This tells me a few things:

  1. MCP discoverability is a solved problem. Agents find new servers fast, apparently through directories and shared registries I haven't fully mapped yet.
  2. The gap is a placement problem, not a quality problem. Agents discover tools but don't use them unless they have a specific task that maps to that tool at the right moment. The tool descriptions and input schemas matter enormously for this.
  3. Tool descriptions need to be written for agents, not humans. My initial descriptions were human-readable marketing copy. Agents need specificity: what inputs, what output format, what edge cases.

I've started rewriting tool descriptions with agent usage patterns in mind. "Count words, chars & reading time" becomes "Analyzes text and returns word count, character count (with and without spaces), sentence count, paragraph count, and estimated reading time in minutes at 200 WPM."
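In schema form, the rewrite looks something like this (field values are illustrative, not Toolora's actual schema):

```javascript
// Before: human-readable marketing copy.
const before = {
  name: "count_words",
  description: "Count words, chars & reading time",
};

// After: spells out inputs, outputs, and assumptions so an agent
// can match the tool to a task without guessing.
const after = {
  name: "count_words",
  description:
    "Analyzes text and returns word count, character count (with and " +
    "without spaces), sentence count, paragraph count, and estimated " +
    "reading time in minutes at 200 WPM.",
  inputSchema: {
    type: "object",
    properties: {
      text: { type: "string", description: "Plain UTF-8 text to analyze" },
    },
    required: ["text"],
  },
};
```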

The Technical Stack

The browser tools are all React components, each doing real work in JavaScript/WebAssembly:

  • PDF processing: pdf-lib + pdfjs-dist
  • Image compression: Canvas API with quality tuning
  • Hashing: Web Crypto API (SHA-1, SHA-256, SHA-512) for the stuff browsers support natively, WebAssembly for MD5
  • YAML parsing: js-yaml
  • Token counting: tiktoken (GPT-4o encoding) + custom Claude tokenizer approximation
  • Text chunking: configurable chunk size/overlap for RAG pipelines
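The chunking logic is simple enough to sketch. A minimal character-based version with configurable size and overlap (real RAG pipelines often chunk on token or sentence boundaries instead; this shows only the windowing math):

```javascript
// Split text into overlapping windows for RAG ingestion.
// Each chunk is chunkSize characters; consecutive chunks share
// `overlap` characters, so the window advances by chunkSize - overlap.
function chunkText(text, chunkSize, overlap) {
  if (overlap >= chunkSize) throw new Error("overlap must be < chunkSize");
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```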

The MCP server is a Node.js/Express layer that re-implements the same logic server-side for the tools that make sense as agent utilities. It runs alongside the frontend but is architecturally separate.

One thing I got wrong early: I tried to share code between the browser tools and the MCP server. It seemed elegant. It was a maintenance nightmare. Browser bundles and Node modules have just enough overlap to make you think it'll work, and just enough difference to break at the worst times. They're separate implementations now.

What's Actually Working

The organic traffic side is more straightforward than the MCP side. Each tool page is a dedicated route with static HTML content for crawlers:

toolora.dev/tools/developer/jwt-decoder
toolora.dev/tools/developer/regex-tester
toolora.dev/tools/text/word-counter

These rank reasonably well for long-tail queries because they load fast (Vite-bundled, sub-second LCP), they're genuinely useful, and the content is specific enough that there's not much competition for the exact query.

The Chrome extension has been the biggest surprise for retention. Users who install it come back more than users who bookmark the site. That's not shocking in retrospect — the extension appears every time they open a new tab — but I underestimated how much the distribution channel affects the usage pattern.

What I'd Do Differently

Ship the MCP server earlier. I treated it as a nice-to-have. It turns out AI agents are eager to adopt new tools and the barrier to them discovering your server is low if you list it in the right places.

Design for agents from day one. Not just the tool descriptions — the actual tool interfaces. Agents prefer tools with narrow, composable inputs over multi-purpose tools with lots of options. A tool that does one thing and does it predictably is more useful than a Swiss Army knife.

Don't over-engineer the privacy story. "Your files never leave your device" is a one-sentence sell. I spent too long writing longer explanations when the short version works fine.

The Broader Point

Browser-side processing has gotten genuinely powerful. WebAssembly, Web Workers, the Canvas API, Web Crypto — you can do a lot of real work without a server. The tradeoff is not zero (you can't do everything, and WebAssembly bundles add size), but for the category of "developer utility tools," the browser is now a legitimate compute platform.

And MCP is making server-side exposure of the same tools useful again, but for a different audience. It's an interesting moment where the same product can have two genuinely different technical delivery mechanisms for two genuinely different users.

If you want to see the tools: toolora.dev
MCP server endpoint for Claude Desktop: https://toolora.dev/api/mcp

The extension is on the Chrome Web Store if you search "Toolora."


What's your experience with the MCP discovery gap? I'm curious whether other MCP server authors see the same connect/list vs. actual-execution ratio.



DNS MX Record Modification and Unauthorised Email Redirection

A security issue affected a small number of emails sent to @caphyon.com on April 22, 2026. Some messages were briefly redirected, but no systems were breached. Users who sent sensitive data during this time are advised to contact support.


INI vs JSON vs XML vs Others for the best config file format

We compare the most commonly used configuration file formats and try to determine which file type is suitable for you to use.
