Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
154329 stories · 33 followers

Meta quietly launches a new Reddit-like app called Forum

The company describes the app as a "dedicated space built for deeper discussions, real answers and communities you care about."
alvinashcraft
35 minutes ago
Pennsylvania, USA

This Week in AI: Rethinking the Agent Harness


We kicked off our new weekly series This Week in AI on Monday, and we covered a lot of ground in 30 minutes, including an AI model that found security holes faster than decades of human auditing, a data center in Utah the size of two Manhattans, and a practical argument for why the harness you build around a model now matters more than which model you pick.

Here are a few takeaways from the conversation between host Eric Freeman, faculty member at UT Austin and a longtime friend of O’Reilly, and guest John Berryman, founder of Arcturus Labs, an early production engineer on GitHub Copilot, and coauthor of O’Reilly’s Prompt Engineering for LLMs. Watch the entire episode to find out why you should be building your own agent and why John believes eventually there will be no internet for humans.

AI’s security problem is now a policy problem

You’ve probably already heard about Mythos. Anthropic’s internal testing of the frontier model surfaced thousands of previously unknown security vulnerabilities across major operating systems, browsers, and financial infrastructure, including a 27-year-old bug in OpenBSD. Anthropic chose not to release the model publicly and instead launched Project Glasswing, a restricted program giving monitored access to a small group of trusted partners for defensive patching.

That decision moved fast in Washington. In roughly six weeks, the conversation shifted from the light-touch national AI policy released in March to reported White House discussions of an executive order review process modeled on how the FDA handles drugs. Security researcher Bruce Schneier has questioned whether Mythos is uniquely capable here or whether similar results are achievable with cheaper public models, but as Freeman noted (paraphrasing Schneier), either way, it’s a problem that’s coming.

The compute race is getting stranger

Anthropic leased xAI’s entire Colossus 1 supercluster in Memphis: more than 200,000 GPUs and 300 megawatts of power. A month before that deal, Anthropic expanded its agreement with Google and Broadcom for 3.5 gigawatts of capacity coming online in 2027. For context, that’s nearly 12 times the power capacity of the Colossus 1 deal, in a single contract. After this episode aired, Anthropic announced that the xAI deal has been expanded to include Colossus 2 as well.

Box Elder County, Utah, just approved a 40,000-acre AI data center called the Stratos project, backed by investor and TV personality Kevin O’Leary (a.k.a. Mr. Wonderful). It’s planned for 9 gigawatts at full buildout. That’s a footprint more than twice the size of Manhattan, powered by the equivalent of nine commercial nuclear reactors. And like many data center deals going forward, including Colossus above, it was approved over local protests.
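The scale comparisons in this section can be checked with quick arithmetic. The sketch below uses only the figures quoted above, plus two assumptions that do not come from the episode: a Manhattan land area of about 14,600 acres and roughly 1 gigawatt per commercial nuclear reactor.

```python
# Figures quoted in the text above.
COLOSSUS_1_MW = 300          # Colossus 1 lease: 300 megawatts
GOOGLE_BROADCOM_MW = 3_500   # Google/Broadcom agreement: 3.5 gigawatts
STRATOS_MW = 9_000           # Stratos project at full buildout: 9 gigawatts
STRATOS_ACRES = 40_000       # Stratos project footprint

# Assumptions (not from the episode).
MANHATTAN_ACRES = 14_600     # approximate land area of Manhattan
REACTOR_MW = 1_000           # rough output of one commercial reactor

power_ratio = GOOGLE_BROADCOM_MW / COLOSSUS_1_MW
area_ratio = STRATOS_ACRES / MANHATTAN_ACRES
reactor_equivalents = STRATOS_MW / REACTOR_MW

print(f"Google/Broadcom vs. Colossus 1 power: {power_ratio:.1f}x")
print(f"Stratos vs. Manhattan footprint: {area_ratio:.1f}x")
print(f"Stratos in reactor equivalents: {reactor_equivalents:.0f}")
```

Under these assumptions, the Google and Broadcom capacity works out to a bit under 12 times the Colossus 1 figure, and the Stratos footprint is comfortably more than two Manhattans.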

Infrastructure at this incredible scale takes years to come online, and the companies making these bets are pricing in a world where model capability keeps scaling. Whether that assumption holds will determine a lot about what’s economically viable to build in the next decade.

The harness matters more than the model

John was on hand to rethink the agent harness, which, as he pointed out, entered a new phase with the step change in model capability that occurred in November and December of last year. He took Eric through the arc of AI product development, from document completion and chat loops to tool-calling agents, DAG-based workflows, and now the harness era represented by tools like Claude Code. Each progression added capability, John noted, but also complexity, and each generated a new class of problems around reliability and control. In our current moment, which John has dubbed the “age of the unharnessed agent,” agents are now within reach of everyone, not just software developers.

The payoff of this “unharnessed” era is control. John described a client engagement where he replaced a bespoke application with a skills-driven agent. Now domain experts with no development experience can read the agent’s behavior written in plain English and better understand it. As John explained,

Rather than building a bespoke agent. . ., I just built something that was just the agent harness—the agent—and I just gave it skills that describe what basically I learned in interviewing their experts, how they would work with these agents. And it worked perfectly. Not only does the agent stay on track and do what it needs to do these days, but it’s coded, as far as my client is concerned, in English.

The experts don’t have to complain to developers “this doesn’t work.” The experts can look at the English description of what’s going on and see problems, and maybe even fix it themselves. And I’m really excited to basically give that power into the hands of the people that know best how to change it, the experts.

That’s a different relationship between the experts and the tool than anything a wrapped commercial product offers.
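To make the idea of an agent "coded in English" concrete, here is a minimal sketch of how a harness might fold expert-written skills into the instructions it gives a model. It is not John's implementation: the `Skill` class, the `build_system_prompt` function, and the example skill text are all hypothetical, and a real harness would also handle tool calls and the model loop.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """A plain-English description of how an expert handles one kind of task."""
    name: str
    instructions: str


def build_system_prompt(role: str, skills: list[Skill]) -> str:
    """Assemble the harness prompt: a generic role plus the expert-written skills."""
    sections = [f"You are {role}. Follow the skills below when they apply."]
    for skill in skills:
        sections.append(f"Skill: {skill.name}\n{skill.instructions.strip()}")
    return "\n\n".join(sections)


# Hypothetical skill, written the way a domain expert might phrase it.
skills = [
    Skill(
        name="triage-request",
        instructions=(
            "Ask for the account number first. "
            "Escalate anything unusual to a human reviewer."
        ),
    ),
]

prompt = build_system_prompt("an assistant for domain experts", skills)
print(prompt)
```

The point of the design is that the only part an expert needs to read or edit is the `instructions` text; the harness code around it stays generic.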

As Eric pointed out, recent Stanford research supports this broader point: Performance gaps between a bare model and a well-designed harness now often matter more than which underlying model you’re using. The benchmark that used to dominate buying decisions, which model scores highest, has been displaced by a harder question about which harness fits the task.

John closed with a demo of his personal agent moving from an Obsidian notebook into Wikipedia and back, carrying context across environments. He used it to illustrate a concept he called the “open agent protocol,” his term for a not-yet-existing standard where an agent receives environment-specific skills as it moves between contexts. The protocol doesn’t exist yet, but the demo made the direction clear.
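Since no such standard exists, any code can only illustrate the shape of the idea. The sketch below is purely hypothetical: the `Environment` and `Agent` classes and every skill name are invented for illustration. It shows an agent that keeps its own notes while swapping in whatever skills the current environment supplies.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """A place the agent can work in, offering its own plain-English skills."""
    name: str
    skills: dict[str, str]  # skill name -> instructions


@dataclass
class Agent:
    notes: list[str] = field(default_factory=list)  # context carried everywhere
    active_skills: dict[str, str] = field(default_factory=dict)
    location: str = ""

    def enter(self, env: Environment) -> None:
        """Replace the active skills with the environment's; notes are untouched."""
        self.location = env.name
        self.active_skills = dict(env.skills)

    def remember(self, note: str) -> None:
        """Record a note tagged with the environment it was made in."""
        self.notes.append(f"[{self.location}] {note}")


# Hypothetical environments, loosely mirroring the notebook-to-wiki demo.
notebook = Environment("notebook", {"edit-note": "Append findings under the relevant heading."})
wiki = Environment("encyclopedia", {"lookup": "Find the article and summarize its lead section."})

agent = Agent()
agent.enter(notebook)
agent.remember("Need background on a topic")
agent.enter(wiki)
agent.remember("Found a summary of the topic")
agent.enter(notebook)  # back where it started, with both notes intact

print(agent.active_skills)
print(agent.notes)
```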

What’s next

Join us and a rotating lineup of expert guests for weekly live tool demos and deeper dives into the topics that matter in AI. We’re taking next week off for Memorial Day in the US, but we’ll be back on June 1 with host Andreas Welsch and guests Maya Mikhailov and Doug Shannon to cut through another week of AI headlines and separate what actually drives business value from what looks good in a demo but goes nowhere in production. Our first few episodes are free and open to all if you’d like to attend live—register here.

We’ll continue to share full episodes and publish our takeaways here on Radar each Friday. You can also watch or listen on YouTube, Spotify, Apple, or wherever you get your podcasts.




Trump Mobile confirms it exposed customers’ personal data, unclear whether it will notify those affected

Lorenzo Franceschi-Bicchierai reports: Phone provider Trump Mobile has confirmed that it was exposing customers’ names, email addresses, mailing addresses, cell numbers, and order identifiers to the open internet. Chris Walker, a spokesperson for the Trump-branded phone maker, told TechCrunch that the company is investigating the exposure and has not found evidence that content or financial...


When systems go down, devs still juggle 10 tabs. PagerDuty says MCP fixes that


Production incidents are a context problem. By the time engineers understand what’s happening, they’ve already bounced across several different tools – and the incident is still ongoing. PagerDuty thinks MCP is the fix.

When incidents hit production systems, engineers rarely stay inside one tool for long, jumping from logs to dashboards to runbooks, trying to reconstruct what is actually happening.

Talking to other builders, it seemed like almost everybody faces this context-switching problem.

Rocío Bayon (Product Manager) and Sebastian Villanelo (Sr. Forward Deployed Engineer) from PagerDuty think MCP is how you fix it.

PagerDuty built their MCP to cut context switching

Rocío explained that their MCP is solving the issue of context switching:

When an incident hits, the engineer has to go between 5 to 10 different tools to understand what’s happening.

That’s the real problem they’re trying to solve.

PagerDuty’s framing of MCP was interesting: neither Rocío nor Sebastian described MCP as just another integration layer. They framed it as connective tissue that gathers logs, alerts, runbooks, and incident context into a single workflow.

What the MCP does, it brings all that context into one platform where engineers are usually already working.

Most engineering organizations already have enormous amounts of observability data. The real problem is that it is scattered across systems, and engineers end up reconstructing operational context manually during incidents.
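As a rough illustration of gathering scattered context into one place, the sketch below fans out to several sources and returns a single result for an incident. It does not use PagerDuty's MCP server or any real API: the fetcher functions, the `SOURCES` registry, and the sample data are all hypothetical stand-ins.

```python
from typing import Callable


# Hypothetical fetchers; a real setup would query actual tools here.
def fetch_alerts(incident_id: str) -> list[str]:
    return [f"{incident_id}: high error rate on an example service"]


def fetch_logs(incident_id: str) -> list[str]:
    return [f"{incident_id}: timeout while calling an example dependency"]


def fetch_runbooks(incident_id: str) -> list[str]:
    return [f"{incident_id}: see the example rollback runbook"]


SOURCES: dict[str, Callable[[str], list[str]]] = {
    "alerts": fetch_alerts,
    "logs": fetch_logs,
    "runbooks": fetch_runbooks,
}


def gather_context(incident_id: str) -> dict[str, list[str]]:
    """Collect everything known about one incident from every source."""
    return {name: fetch(incident_id) for name, fetch in SOURCES.items()}


context = gather_context("INC-1")
for source, items in context.items():
    print(source, items)
```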

Retrieve what you need, nothing more

Sebastian framed the problem as signal retrieval. Rather than feeding the model more information, the goal is pulling the relevant operational state around a specific incident.

If you have the right parameters or the queries and all this stuff, you will retrieve the exact information that you need.

That means narrowing context around the actual incident window. When an incident hits, it retrieves information around that time only, Sebastian explained.
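Narrowing retrieval to the incident window can be sketched as a simple time filter. This is an illustration of the idea rather than how PagerDuty implements it; the `events_in_window` function, the default window sizes, and the sample events are assumptions.

```python
from datetime import datetime, timedelta


def events_in_window(
    events: list[tuple[datetime, str]],
    incident_start: datetime,
    before: timedelta = timedelta(minutes=15),
    after: timedelta = timedelta(minutes=30),
) -> list[tuple[datetime, str]]:
    """Keep only events close to the incident start, dropping everything else."""
    earliest = incident_start - before
    latest = incident_start + after
    return [(ts, msg) for ts, msg in events if earliest <= ts <= latest]


# Hypothetical sample data for one day.
events = [
    (datetime(2026, 1, 1, 8, 0), "routine deploy finished"),
    (datetime(2026, 1, 1, 9, 50), "latency climbing"),
    (datetime(2026, 1, 1, 10, 5), "error rate spike"),
    (datetime(2026, 1, 1, 13, 0), "nightly job scheduled"),
]

relevant = events_in_window(events, incident_start=datetime(2026, 1, 1, 10, 0))
for ts, msg in relevant:
    print(ts.isoformat(), msg)
```

Only the two events near 10:00 survive the filter, which is the sense in which a narrower query means fewer tokens sent to the model.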

That also changes how they think about efficiency: reducing context switching directly affects operational speed, token usage, and cost.

You will see that information only with one call. And that saves a lot of tokens and time. That’s money and time.

Photo: Lea Lobor

AI helps but engineers still decide

Still, both of them were careful not to frame AI as autonomous incident management.

Rocío repeatedly emphasized that MCP and AI systems are primarily helping with context gathering and operational visibility, while engineers remain responsible for the high-risk decisions:

The AI is helping you, but the engineer is the one who is assessing and making decisions where there’s a high risk.

That human layer is intentional. PagerDuty’s broader vision seems less about replacing on-call engineers and more about reducing the operational overhead surrounding incidents. Their MCP systems help gather information, surface relationships between systems, and accelerate investigation workflows, but humans still decide what actually happens next.

Rocío also mentioned that their SRE agent is designed to support larger incident workflows beyond information retrieval:

It can also help you trigger those incident workflows. So it can help you resolve the incident. And it learns as it goes.

“MCP – the connective tissue between tools”

I asked Rocío and Sebastian how MCP fits into the tools they already use without becoming just another silo.

And both of them clearly framed MCP as anti-silo infrastructure since it brings everything to one place. Rocío called MCP “the connective tissue between all these different tools.”

That framing probably captures the broader architectural challenge better than anything else in the interview.

Modern incident response already spans dozens of systems: observability platforms, deployment pipelines, CI/CD tooling, ticketing systems, infrastructure management, and communication layers.

AI systems inherit that fragmentation unless something explicitly connects operational state.

Engineers trust systems that behave predictably

Sebastian mentioned that teams often react very differently to MCP systems. Some embrace them immediately while others remain skeptical, especially around security and predictability. For him, trust improves once systems consistently produce expected outcomes:

When a person or a teammate says “ah, I’m retrieving what I’m expecting to retrieve”, that will help them to trust it.

A lot of AI tooling discussions still focus on model capability, reasoning quality, or benchmark performance. But operational systems are usually adopted much more pragmatically. Engineers trust systems that behave predictably, retrieve the right operational context, and fit into workflows they already rely on.

The post When systems go down, devs still juggle 10 tabs. PagerDuty says MCP fixes that appeared first on ShiftMag.


Dispatches from O'Reilly: The accidental orchestrator

Experiments in agentic engineering and AI-driven development

Is OpenAI Ready To IPO?, The Datacenters in Space Myth, The Kids Boo AI


Join us for the Big Technology AI Summit on June 18, 2026. Get your tickets here: summit.bigtechnology.com.... Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) OpenAI's revenue numbers come out ahead of its potential IPO filing 2) Why is OpenAI considering going public now? 3) Is OpenAI trying to IPO ahead of Anthropic? 4) Is the Iran War accelerating the timeline of these fundraisings? 5) What would the top of the AI boom look like? 6) SpaceX files for an IPO 7) Are datacenters in space a myth? 8) Eric Schmidt gets booed at a college commencement 9) Meta's mass layoff 10) Meta's keystroke tracking rationale 11) Marc Andreessen says AI won't file an HR complaint

---

Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice.

Want a discount for Big Technology on Substack + Discord? Here’s 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b

Download audio: https://pdst.fm/e/tracking.swap.fm/track/t7yC0rGPUqahTF4et8YD/pscrb.fm/rss/p/traffic.megaphone.fm/AMPP9425036022.mp3