We kicked off our new weekly series This Week in AI on Monday, and we covered a lot of ground in 30 minutes, including an AI model that found security holes faster than decades of human auditing, a data center in Utah the size of two Manhattans, and a practical argument for why the harness you build around a model now matters more than which model you pick.
Here are a few takeaways from the conversation between host Eric Freeman, faculty member at UT Austin and a longtime friend of O’Reilly, and guest John Berryman, founder of Arcturus Labs, an early production engineer on GitHub Copilot, and coauthor of O’Reilly’s Prompt Engineering for LLMs. Watch the entire episode to find out why you should be building your own agent and why John believes eventually there will be no internet for humans.
You’ve probably already heard about Mythos. Anthropic’s internal testing of the frontier model surfaced thousands of previously unknown security vulnerabilities across major operating systems, browsers, and financial infrastructure, including a 27-year-old bug in OpenBSD. Anthropic chose not to release the model publicly and instead launched Project Glasswing, a restricted program giving monitored access to a small group of trusted partners for defensive patching.
That decision moved fast in Washington. In roughly six weeks, the conversation shifted from the light-touch national AI policy released in March to reported White House discussions of an executive order review process modeled on how the FDA handles drugs. Security researcher Bruce Schneier has questioned whether Mythos is uniquely capable here or whether similar results are achievable with cheaper public models, but as Freeman noted (paraphrasing Schneier), either way, it’s a problem that’s coming.
Anthropic leased xAI’s entire Colossus 1 supercluster in Memphis: more than 200,000 GPUs and 300 megawatts of power. A month before that deal, Anthropic expanded its agreement with Google and Broadcom for 3.5 gigawatts of capacity coming online in 2027. For context, that’s nearly 12 times the power of the Colossus 1 deal, in a single contract. After this episode aired, Anthropic announced that the xAI deal has been expanded to Colossus 2 as well.
Box Elder County, Utah, just approved a 40,000-acre AI data center called the Stratos project, backed by investor and TV personality Kevin O’Leary (a.k.a. Mr. Wonderful). It’s planned for 9 gigawatts at full buildout. That’s a footprint more than twice the size of Manhattan, powered by the equivalent of nine commercial nuclear reactors. And like many data center deals going forward, including Colossus above, it was approved over local protests.
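The scale comparisons in the last two paragraphs are easy to check with a few lines of arithmetic. This is a rough sketch: the Manhattan land area and the one-gigawatt-per-reactor figure are outside assumptions, not numbers from the episode.

```python
# Back-of-the-envelope check of the scale figures quoted above.
# Values marked "assumption" do not come from the episode.

COLOSSUS_1_MW = 300           # from the article
GOOGLE_BROADCOM_MW = 3_500    # 3.5 GW, from the article
STRATOS_ACRES = 40_000        # from the article
STRATOS_GW = 9                # from the article

ACRES_PER_SQ_MILE = 640       # exact unit conversion
MANHATTAN_SQ_MILES = 22.8     # assumption: approximate land area
REACTOR_GW = 1.0              # assumption: typical large commercial reactor

power_ratio = GOOGLE_BROADCOM_MW / COLOSSUS_1_MW
stratos_sq_miles = STRATOS_ACRES / ACRES_PER_SQ_MILE
manhattan_ratio = stratos_sq_miles / MANHATTAN_SQ_MILES
reactor_equivalents = STRATOS_GW / REACTOR_GW

print(f"Google/Broadcom vs. Colossus 1: {power_ratio:.1f}x")
print(f"Stratos footprint: {stratos_sq_miles:.1f} sq mi "
      f"({manhattan_ratio:.1f}x Manhattan)")
print(f"Reactor equivalents: {reactor_equivalents:.0f}")
```

Under those assumptions, the Google and Broadcom contract comes out a bit under 12 times the Colossus 1 power figure, and the Stratos site comes out closer to three Manhattans than two.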
Infrastructure at this incredible scale takes years to come online, and the companies making these bets are pricing in a world where model capability keeps scaling. Whether that assumption holds will determine a lot about what’s economically viable to build in the next decade.
John was on hand to rethink the agent harness, which, as he pointed out, entered a new phase with the step change in model capability that occurred in November and December of last year. He took Eric through the arc of AI product development, from document completion and chat loops to tool-calling agents, DAG-based workflows, and now the harness era represented by tools like Claude Code. Each progression added capability, John noted, but also complexity, and each generated a new class of problems around reliability and control. In our current moment, which John has dubbed the “age of the unharnessed agent,” agents are now within reach of everyone, not just software developers.
The payoff of this “unharnessed” era is control. John described a client engagement where he replaced a bespoke application with a skills-driven agent. Now domain experts with no development experience can read the agent’s behavior written in plain English and better understand it. As John explained,
Rather than building a bespoke agent. . ., I just built something that was just the agent harness, the agent, and I just gave it skills that describe what basically I learned in interviewing their experts, how they would work with these agents. And it worked perfectly. Not only does the agent stay on track and do what it needs to do these days, but it’s coded, as far as my client is concerned, in English.
The experts don’t have to complain to developers “this doesn’t work.” The experts can look at the English description of what’s going on and see problems, and maybe even fix it themselves. And I’m really excited to basically give that power into the hands of the people that know best how to change it, the experts.
That’s a different relationship between the experts and the tool than anything a wrapped commercial product offers.
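To make the idea of an agent “coded in English” concrete, here is a minimal sketch of how plain-English skills might be assembled into an agent’s instructions. The `Skill` class, `build_system_prompt` function, and the claims-review example are all hypothetical; John did not describe his implementation, and real harnesses load skills in their own ways.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """A plain-English instruction block a domain expert can read and edit."""
    name: str
    description: str   # when the agent should use this skill
    instructions: str  # how an expert would do the task, in prose


def build_system_prompt(role: str, skills: list[Skill]) -> str:
    """Assemble the agent's instructions from its role plus every skill."""
    sections = [f"You are {role}."]
    for skill in skills:
        sections.append(
            f"## Skill: {skill.name}\n"
            f"Use when: {skill.description}\n"
            f"{skill.instructions}"
        )
    return "\n\n".join(sections)


# Hypothetical skill, written the way an expert might describe the work.
claims_review = Skill(
    name="claims-review",
    description="the user asks whether a claim is ready to approve",
    instructions=(
        "Check that the claim has a signed form and a receipt. "
        "If either is missing, ask for it before doing anything else."
    ),
)

prompt = build_system_prompt("an assistant for the claims team", [claims_review])
print(prompt)
```

The point of the design is that the only thing an expert needs to change is the prose in `instructions`; no code has to be touched to correct the agent’s behavior.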
As Eric pointed out, recent Stanford research supports this broader point: Performance gaps between a bare model and a well-designed harness now often matter more than which underlying model you’re using. The benchmark that used to dominate buying decisions, which model scores highest, has been displaced by a harder question about which harness fits the task.
John closed with a demo of his personal agent moving from an Obsidian notebook into Wikipedia and back, carrying context across environments. He used it to illustrate a concept he called the “open agent protocol,” his term for a not-yet-existing standard where an agent receives environment-specific skills as it moves between contexts. The protocol doesn’t exist yet, but the demo made the direction clear.
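Since the protocol is only a concept, the following is a purely illustrative sketch of the idea: each environment hands the agent its own skills on entry, while the agent’s notes travel with it. Every name here (`Environment`, `Agent`, `enter`, the skill names) is invented for illustration and does not reflect John’s demo or any existing standard.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """A place the agent can work in, which advertises its own skills."""
    name: str
    skills: dict[str, str]  # skill name -> plain-English instructions


@dataclass
class Agent:
    """Keeps its notes across moves, but swaps skills per environment."""
    notes: list[str] = field(default_factory=list)
    active_skills: dict[str, str] = field(default_factory=dict)
    location: str = ""

    def enter(self, env: Environment) -> None:
        # Skills are replaced by whatever the new environment offers;
        # notes are left untouched so context carries over.
        self.location = env.name
        self.active_skills = dict(env.skills)
        self.notes.append(f"entered {env.name}")


# Stand-ins for a note-taking app and an online encyclopedia.
notebook = Environment("notebook", {"append-note": "Add text to the open note."})
encyclopedia = Environment("encyclopedia", {"lookup-article": "Find an article by title."})

agent = Agent()
agent.enter(notebook)
agent.notes.append("topic: tide pools")
agent.enter(encyclopedia)
agent.enter(notebook)

print(agent.location, list(agent.active_skills))
print(agent.notes)
```

After the round trip, the agent holds only the notebook’s skills again, but the note it took before leaving is still there.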
Join us and a rotating lineup of expert guests for weekly live tool demos and deeper dives into the topics that matter in AI. We’re taking next week off for Memorial Day in the US, but we’ll be back on June 1 with host Andreas Welsch and guests Maya Mikhailov and Doug Shannon to cut through another week of AI headlines and separate what actually drives business value from what looks good in a demo but goes nowhere in production. Our first few episodes are free and open to all if you’d like to attend live: register here.
We’ll continue to share full episodes and publish our takeaways here on Radar each Friday. You can also watch or listen on YouTube, Spotify, Apple, or wherever you get your podcasts.

Production incidents are a context problem. By the time engineers understand what’s happening, they’ve already bounced across several different tools – and the incident is still ongoing. PagerDuty thinks MCP is the fix.
When incidents hit production systems, engineers rarely stay inside one tool for long, jumping from logs to dashboards to runbooks, trying to reconstruct what is actually happening.
Talking to other builders, I found that almost everybody faces this context-switching problem.
Rocío Bayon (Product Manager) and Sebastian Villanelo (Sr. Forward Deployed Engineer) from PagerDuty think MCP is how you fix it.
Rocío explained that their MCP is solving the issue of context switching:
When an incident hits, the engineer has to go between 5 to 10 different tools to understand what’s happening.
That’s the real problem they’re trying to solve.
PagerDuty’s framing of MCP was interesting: neither Rocío nor Sebastian described MCP as just another integration layer. They framed it as connective tissue that gathers logs, alerts, runbooks, and incident context into a single workflow.
What the MCP does, it brings all that context into one platform where engineers are usually already working.
Most engineering organizations already have enormous amounts of observability data. The real problem is that it is scattered across systems, and engineers end up reconstructing operational context manually during incidents.
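As a rough illustration of what “reconstructing operational context” involves, the sketch below merges records from several tools into one time-ordered view. The record shapes, source names, and `gather_context` function are hypothetical stand-ins, not PagerDuty’s MCP server or its API.

```python
from datetime import datetime

# Hypothetical records, standing in for what separate tools would return.
logs = [{"at": datetime(2025, 1, 1, 12, 2), "text": "db connection timeout"}]
alerts = [{"at": datetime(2025, 1, 1, 12, 1), "text": "checkout latency high"}]
deploys = [{"at": datetime(2025, 1, 1, 11, 58), "text": "checkout v42 deployed"}]


def gather_context(sources: dict[str, list[dict]]) -> list[dict]:
    """Merge records from every source into one time-ordered list."""
    merged = [
        {"source": name, **record}
        for name, records in sources.items()
        for record in records
    ]
    return sorted(merged, key=lambda r: r["at"])


timeline = gather_context({"logs": logs, "alerts": alerts, "deploys": deploys})
for entry in timeline:
    print(entry["at"].time(), entry["source"], entry["text"])
```

Seen in one list, the deploy, the alert, and the timeout line up in order, which is the view engineers otherwise piece together by hand across tabs.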
Sebastian framed the problem as signal retrieval. Rather than feeding the model more information, the goal is pulling the relevant operational state around a specific incident.
If you have the right parameters or the queries and all this stuff, you will retrieve the exact information that you need.
That means narrowing context around the actual incident window. When an incident hits, it retrieves information around that time only, Sebastian explained.
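A minimal sketch of that kind of narrowing, assuming events are simple records with a timestamp: keep only what falls inside a window around the incident. The function name, the event shape, and the 15-minute and 5-minute defaults are illustrative assumptions, not details from the interview.

```python
from datetime import datetime, timedelta


def events_in_window(events, incident_at,
                     before=timedelta(minutes=15),
                     after=timedelta(minutes=5)):
    """Keep only events inside [incident_at - before, incident_at + after]."""
    start, end = incident_at - before, incident_at + after
    return [e for e in events if start <= e["at"] <= end]


# Hypothetical events spread across a morning.
events = [
    {"at": datetime(2025, 1, 1, 9, 0), "text": "nightly backup finished"},
    {"at": datetime(2025, 1, 1, 11, 50), "text": "error rate rising"},
    {"at": datetime(2025, 1, 1, 12, 3), "text": "db connection timeout"},
    {"at": datetime(2025, 1, 1, 13, 0), "text": "cache warmed"},
]

relevant = events_in_window(events, incident_at=datetime(2025, 1, 1, 12, 0))
print([e["text"] for e in relevant])
```

Only the two events near noon survive the filter, so a model working on the incident would see half as many records in this toy case.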
That also changes how they think about efficiency: reducing context switching directly affects operational speed, token usage, and cost.
You will see that information only with one call. And that saves a lot of tokens and time. That’s money and time.

Still, both of them were careful not to frame AI as autonomous incident management.
Rocío repeatedly emphasized that MCP and AI systems are primarily helping with context gathering and operational visibility, while engineers remain responsible for the high-risk decisions:
The AI is helping you, but the engineer is the one who is assessing and making decisions where there’s a high risk.
That human layer is intentional. PagerDuty’s broader vision seems less about replacing on-call engineers and more about reducing the operational overhead surrounding incidents. Their MCP systems help gather information, surface relationships between systems, and accelerate investigation workflows, but humans still decide what actually happens next.
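One common way to keep a human in that loop is an approval gate: low-risk actions run directly, and high-risk ones wait for an engineer’s decision. The sketch below shows the pattern in general terms. The interview did not describe how PagerDuty implements it, so `ProposedAction`, `execute`, and the risk flag are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    description: str
    high_risk: bool


def execute(action: ProposedAction,
            approve: Callable[[ProposedAction], bool]) -> str:
    """Run low-risk actions directly; ask a human before high-risk ones."""
    if action.high_risk and not approve(action):
        return f"skipped: {action.description}"
    return f"done: {action.description}"


def engineer_declines(action: ProposedAction) -> bool:
    # Stand-in for a real prompt to the on-call engineer.
    return False


fetch = execute(ProposedAction("fetch recent logs", high_risk=False), engineer_declines)
rollback = execute(ProposedAction("roll back deploy", high_risk=True), engineer_declines)
print(fetch)
print(rollback)
```

Here the log fetch goes through without asking, while the rollback is skipped because the engineer said no.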
Rocío also mentioned that their SRE agent is designed to support larger incident workflows beyond information retrieval:
It can also help you trigger those incident workflows. So it can help you resolve the incident. And it learns as it goes.
I asked Rocío and Sebastian how MCP fits into the tools they already use without becoming just another silo.
Both of them clearly framed MCP as anti-silo infrastructure, since it brings everything to one place. Rocío called MCP “the connective tissue between all these different tools.”
That framing probably captures the broader architectural challenge better than anything else in the interview.
Modern incident response already spans dozens of systems: observability platforms, deployment pipelines, CI/CD tooling, ticketing systems, infrastructure management, and communication layers.
AI systems inherit that fragmentation unless something explicitly connects operational state.
Sebastian mentioned that teams often react very differently to MCP systems. Some embrace them immediately while others remain skeptical, especially around security and predictability. For him, trust improves once systems consistently produce expected outcomes:
When a person or a teammate says “ah, I’m retrieving what I’m expecting to retrieve”, that will help them to trust it.
A lot of AI tooling discussions still focus on model capability, reasoning quality, or benchmark performance. But operational systems are usually adopted much more pragmatically. Engineers trust systems that behave predictably, retrieve the right operational context, and fit into workflows they already rely on.
The post When systems go down, devs still juggle 10 tabs. PagerDuty says MCP fixes that appeared first on ShiftMag.
Join us for the Big Technology AI Summit on June 18, 2026. Get your tickets here: summit.bigtechnology.com.... Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) OpenAI's revenue numbers come out ahead of its potential IPO filing 2) Why is OpenAI considering going public now? 3) Is OpenAI trying to IPO ahead of Anthropic? 4) Is the Iran War accelerating the timeline of these fundraisings? 5) What would the top of the AI boom look like? 6) SpaceX files for an IPO 7) Are datacenters in space a myth? 8) Eric Schmidt gets booed at a college commencement 9) Meta's mass layoff 10) Meta's keystroke tracking rationale 11) Marc Andreessen says AI won't file an HR complaint
---
Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice.
Want a discount for Big Technology on Substack + Discord? Here’s 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b