A new test framework is joining the F# ecosystem: Scriptorium. It is designed from the ground up as a cross-target testing framework, seamlessly supporting both F# .NET and Fable compilation targets (JavaScript only at the time of writing). fable-hub.github.io/Scriptorium/ #fsharp #fablecompiler
DataficationSDK/Verso — Extensible interactive notebook platform for .NET. Every built-in feature, from the C# kernel to the dashboard layout, is an extension built on the same public interfaces available to third-party authors. Runs in VS Code and the browser.
bartul/imperium — Strategy board game engine for Imperial (Mac Gerdts), exploring F# DDD and CQRS patterns
Add vertical scrollbar with mouse drag support to the main conversation view
Switching to Autopilot mode no longer triggers unexpected permission prompts for tool, path, or URL access
copilot --continue from a session's saved directory now refreshes the saved branch and git context instead of leaving them stale
Kill command safety filter no longer rejects valid commands that contain shell redirection like kill -0 <PID> 2>/dev/null.
Sessions now resume in their saved working directory; pass -C to override. Flags whose values are relative paths (e.g. --attachment, --log-dir) resolve from the saved cwd.
Context window tier selection (default ~200K vs 1M tokens) is now enforced end-to-end, so picking a tier actually constrains compaction, truncation, and token display
AI Credits usage correctly displays after sessions using the Responses API
Rendering no longer stutters when using tmux on Cygwin or mintty
Slash command picker keeps (experimental) and (staff) labels orange when the row is selected
Reasoning tokens display as a parenthetical on output token count in the token usage summary
Sessions containing events with non-URL strings in URL/URI fields resume without a 'Session file is corrupted' error
Requests that time out due to an HTTP/2 upload stall automatically retry over HTTP/1.1
Sessions no longer fail to load on Windows when a process exits with a high-bit exit code (e.g., .NET unhandled exceptions)
Timeline entry connector color matches surrounding elements when expanded
Gray background bar no longer appears behind user messages on terminals without truecolor support
Status line command supports plain shell commands in addition to executable script paths
Automatically prune old process log files from ~/.copilot/logs/ at startup to prevent unbounded disk growth
Polish /statusline picker with cleaner item descriptions and better spacing
Picker checkboxes now use a single-cell ▣/▢ glyph for tighter, more consistent rows across pickers
Custom agents support opt-in deferred tool loading via deferred-tool-loading in agent frontmatter, enabling tool-search discovery for agents with large tool lists
Exit summary displays AI Credits label with correct spacing before the value
/restart and /update preserve the current session ID after restarting
Legacy nested oauth.clientId and oauth.callbackPort keys in MCP server configs are now migrated to the supported oauthClientId and auth.redirectPort keys instead of being silently dropped
MCP OAuth re-authentication honors the configured redirectPort
PowerShell division operator no longer triggers false 'Allow directory access' prompts on Windows
/compact accepts optional focus instructions to shape the compaction summary
General-purpose subagents use GPT-5.4 or GPT-5.5 when available
/usage shows quota progress bars for session and weekly limits
AI credits error messages updated with clearer language and a Manage budget link
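The MCP OAuth key migration noted in the changelog above can be illustrated with a minimal sketch. This is not the CLI's actual code: the function name, the dictionary layout, and the `command` key in the usage example are assumptions made for illustration. Only the four key names (oauth.clientId, oauth.callbackPort, oauthClientId, auth.redirectPort) come from the changelog, and treating the dotted names as nested objects is also an assumption.

```python
import copy


def migrate_oauth_keys(server_config: dict) -> dict:
    """Return a copy of one server config with legacy OAuth keys moved.

    Hypothetical helper: assumes dotted names in the changelog denote
    nested objects, e.g. {"oauth": {"clientId": ...}}.
    """
    migrated = copy.deepcopy(server_config)
    legacy = migrated.get("oauth")
    if not isinstance(legacy, dict):
        return migrated  # nothing to migrate

    # oauth.clientId -> oauthClientId (an existing new-style value wins)
    if "clientId" in legacy:
        migrated.setdefault("oauthClientId", legacy.pop("clientId"))

    # oauth.callbackPort -> auth.redirectPort
    if "callbackPort" in legacy:
        auth = migrated.setdefault("auth", {})
        if isinstance(auth, dict):
            auth.setdefault("redirectPort", legacy.pop("callbackPort"))

    # Drop the legacy container only once it is empty, so unknown
    # keys are kept rather than silently discarded.
    if not legacy:
        del migrated["oauth"]
    return migrated


# Usage with a made-up server entry.
old = {"command": "example-server",
       "oauth": {"clientId": "abc", "callbackPort": 8123}}
new = migrate_oauth_keys(old)
```

Working on a deep copy keeps the original config untouched, which makes it easy to compare before and after or to fall back if the migrated form fails validation.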
We kicked off our new weekly series This Week in AI on Monday, and we covered a lot of ground in 30 minutes, including an AI model that found security holes faster than decades of human auditing, a data center in Utah the size of two Manhattans, and a practical argument for why the harness you build around a model now matters more than which model you pick.
Here are a few takeaways from the conversation between host Eric Freeman, faculty member at UT Austin and a longtime friend of O’Reilly, and guest John Berryman, founder of Arcturus Labs, an early production engineer on GitHub Copilot, and coauthor of O’Reilly’s Prompt Engineering for LLMs. Watch the entire episode to find out why you should be building your own agent and why John believes eventually there will be no internet for humans.
AI’s security problem is now a policy problem
You’ve probably already heard about Mythos. Anthropic’s internal testing of the frontier model surfaced thousands of previously unknown security vulnerabilities across major operating systems, browsers, and financial infrastructure, including a 27-year-old bug in OpenBSD. Anthropic chose not to release the model publicly and instead launched Project Glasswing, a restricted program giving monitored access to a small group of trusted partners for defensive patching.
That decision moved fast in Washington. In roughly six weeks, the conversation shifted from the light-touch national AI policy released in March to reported White House discussions of an executive order review process modeled on how the FDA handles drugs. Security researcher Bruce Schneier has questioned whether Mythos is uniquely capable here or whether similar results are achievable with cheaper public models, but as Freeman noted (paraphrasing Schneier), either way, it’s a problem that’s coming.
Box Elder County, Utah, just approved a 40,000-acre AI data center called the Stratos project, backed by investor and TV personality Kevin O’Leary (a.k.a. Mr. Wonderful). It’s planned for 9 gigawatts at full buildout. That’s a footprint more than twice the size of Manhattan, powered by the equivalent of nine commercial nuclear reactors. And like many data center deals going forward, including Colossus above, it was approved over local protests.
Infrastructure at this incredible scale takes years to come online, and the companies making these bets are pricing in a world where model capability keeps scaling. Whether that assumption holds will determine a lot about what’s economically viable to build in the next decade.
The harness matters more than the model
John was on hand to rethink the agent harness, which, as he pointed out, entered a new phase with the step change in model capability that occurred in November and December of last year. He took Eric through the arc of AI product development, from document completion and chat loops to tool-calling agents, DAG-based workflows, and now the harness era represented by tools like Claude Code. Each progression added capability, John noted, but also complexity, and each generated a new class of problems around reliability and control. In our current moment, which John has dubbed the “age of the unharnessed agent,” agents are now within reach of everyone, not just software developers.
The payoff of this “unharnessed” era is control. John described a client engagement where he replaced a bespoke application with a skills-driven agent. Now domain experts with no development experience can read the agent’s behavior written in plain English and better understand it. As John explained,
Rather than building a bespoke agent. . ., I just built something that was just the agent harness—the agent—and I just gave it skills that describe what basically I learned in interviewing their experts, how they would work with these agents. And it worked perfectly. Not only does the agent stay on track and do what it needs to do these days, but it’s coded, as far as my client is concerned, in English.
The experts don’t have to complain to developers “this doesn’t work.” The experts can look at the English description of what’s going on and see problems, and maybe even fix it themselves. And I’m really excited to basically give that power into the hands of the people that know best how to change it, the experts.
That’s a different relationship between the experts and the tool than anything a wrapped commercial product offers.
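To make the idea of an agent “coded in English” concrete, here is a minimal sketch of how a harness might fold plain-English skills into the instructions it hands to a model. It is not John's implementation and does not reflect how any particular harness loads skills. The `Skill` type, the `build_system_prompt` function, and the sample skill text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """One unit of expert knowledge, written in plain English."""
    name: str
    instructions: str  # editable by a domain expert, no code required


def build_system_prompt(role: str, skills: list[Skill]) -> str:
    """Join the agent's role and its skills into one prompt string."""
    sections = [role.strip()]
    for skill in skills:
        sections.append(f"Skill: {skill.name}\n{skill.instructions.strip()}")
    return "\n\n".join(sections)


# Usage with made-up content. Changing the agent's behavior means
# editing the English text below, not the Python around it.
skills = [
    Skill("triage", "Read the whole request before answering."),
    Skill("escalation", "If the request mentions a deadline, flag it."),
]
prompt = build_system_prompt("You assist the review team.", skills)
```

The point of the sketch is the separation: the harness code stays generic, while everything specific to the client lives in text that the experts themselves can read and revise.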
As Eric pointed out, recent Stanford research supports this broader point: Performance gaps between a bare model and a well-designed harness now often matter more than which underlying model you’re using. The benchmark that used to dominate buying decisions, which model scores highest, has been displaced by a harder question about which harness fits the task.
John closed with a demo of his personal agent moving from an Obsidian notebook into Wikipedia and back, carrying context across environments. He used it to illustrate a concept he called the “open agent protocol,” his term for a not-yet-existing standard where an agent receives environment-specific skills as it moves between contexts. The protocol doesn’t exist yet, but the demo made the direction clear.
What’s next
Join us and a rotating lineup of expert guests for weekly live tool demos and deeper dives into the topics that matter in AI. We’re taking next week off for Memorial Day in the US, but we’ll be back on June 1 with host Andreas Welsch and guests Maya Mikhailov and Doug Shannon to cut through another week of AI headlines and separate what actually drives business value from what looks good in a demo but goes nowhere in production. Our first few episodes are free and open to all if you’d like to attend live—register here.
We’ll continue to share full episodes and publish our takeaways here on Radar each Friday. You can also watch or listen on YouTube, Spotify, Apple, or wherever you get your podcasts.
Lorenzo Franceschi-Bicchierai reports: Phone provider Trump Mobile has confirmed that it was exposing customers’ names, email addresses, mailing addresses, cell numbers, and order identifiers to the open internet. Chris Walker, a spokesperson for the Trump-branded phone maker, told TechCrunch that the company is investigating the exposure and has not found evidence that content or financial...
Production incidents are a context problem. By the time engineers understand what’s happening, they’ve already bounced across several different tools – and the incident is still ongoing. PagerDuty thinks MCP is the fix.
When incidents hit production systems, engineers rarely stay inside one tool for long, jumping from logs to dashboards to runbooks, trying to reconstruct what is actually happening.
Talking to other builders, it seemed like almost everybody faces this context-switching problem.
Rocío Bayon (Product Manager) and Sebastian Villanelo (Sr. Forward Deployed Engineer) from PagerDuty think MCP is how you fix it.
PagerDuty built their MCP to cut context switching
Rocío explained that their MCP is solving the issue of context switching:
When an incident hits, the engineer has to go between 5 to 10 different tools to understand what’s happening.
That’s the real problem they’re trying to solve.
PagerDuty’s framing of MCP was interesting: neither Rocío nor Sebastian described MCP as just another integration layer. They framed it as connective tissue that gathers logs, alerts, runbooks, and incident context into a single workflow.
What the MCP does, it brings all that context into one platform where engineers are usually already working.
Most engineering organizations already have enormous amounts of observability data. The real problem is that it is scattered across systems, and engineers end up reconstructing operational context manually during incidents.
Retrieve what you need, nothing more
Sebastian framed the problem as signal retrieval. Rather than feeding the model more information, the goal is pulling the relevant operational state around a specific incident.
If you have the right parameters or the queries and all this stuff, you will retrieve the exact information that you need.
That means narrowing context around the actual incident window. When an incident hits, it retrieves information around that time only, Sebastian explained.
That also changes how they think about efficiency: reducing context switching directly affects operational speed, token usage, and cost.
You will see that information only with one call. And that saves a lot of tokens and time. That’s money and time.
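The incident-window retrieval Sebastian describes can be sketched in a few lines. This is an illustration under stated assumptions, not PagerDuty's MCP server or API: the `Event` type, the `incident_context` function, the default window sizes, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    source: str          # e.g. "logs", "alerts", "deploys"
    timestamp: datetime
    message: str


def incident_context(events, incident_start,
                     before=timedelta(minutes=15),
                     after=timedelta(minutes=30)):
    """Return only the events inside the incident window, oldest first.

    The window sizes are arbitrary defaults chosen for this sketch.
    """
    window_start = incident_start - before
    window_end = incident_start + after
    selected = [e for e in events
                if window_start <= e.timestamp <= window_end]
    return sorted(selected, key=lambda e: e.timestamp)


# Usage: events from several tools, filtered in a single call.
start = datetime(2025, 1, 1, 12, 0)
events = [
    Event("alerts", datetime(2025, 1, 1, 12, 5), "latency alert fired"),
    Event("deploys", datetime(2025, 1, 1, 9, 0), "routine deploy"),
    Event("logs", datetime(2025, 1, 1, 11, 50), "error rate rising"),
    Event("logs", datetime(2025, 1, 1, 13, 0), "recovered"),
]
context = incident_context(events, start)
```

Only the two events near the incident survive the filter, so whatever is passed on to a model is a fraction of the raw data. That is the token and time saving described above.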
Photo: Lea Lobor
AI helps but engineers still decide
Still, both of them were careful not to frame AI as autonomous incident management.
Rocío repeatedly emphasized that MCP and AI systems are primarily helping with context gathering and operational visibility, while engineers remain responsible for the high-risk decisions:
The AI is helping you, but the engineer is the one who is assessing and making decisions where there’s a high risk.
That human layer is intentional. PagerDuty’s broader vision seems less about replacing on-call engineers and more about reducing the operational overhead surrounding incidents. Their MCP systems help gather information, surface relationships between systems, and accelerate investigation workflows, but humans still decide what actually happens next.
Rocío also mentioned that their SRE agent is designed to support larger incident workflows beyond information retrieval:
It can also help you trigger those incident workflows. So it can help you resolve the incident. And it learns as it goes.
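One way to picture the division of labor described in this section is an approval gate: the system proposes actions, and anything marked high risk waits for an engineer. The sketch below is a hypothetical illustration, not PagerDuty's SRE agent. The `ProposedAction` type, the `run_with_approval` function, and the assumption that low-risk actions may run without sign-off are mine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    description: str
    high_risk: bool


def run_with_approval(action: ProposedAction,
                      execute: Callable[[ProposedAction], None],
                      approve: Callable[[ProposedAction], bool]) -> str:
    """Run an action, asking a human first when it is high risk."""
    if action.high_risk and not approve(action):
        return "skipped: engineer declined"
    execute(action)
    return "executed"


# Usage: record what runs, and decline every high-risk proposal.
executed = []
fetch = ProposedAction("collect recent alerts", high_risk=False)
rollback = ProposedAction("roll back production deploy", high_risk=True)

first = run_with_approval(fetch, executed.append, lambda a: False)
second = run_with_approval(rollback, executed.append, lambda a: False)
```

In a real system the `approve` callback would be a prompt to the on-call engineer rather than a lambda, but the shape is the same: the decision point for risky changes stays with a person.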
“MCP – the connective tissue between tools”
I asked Rocío and Sebastian how MCP fits into the tools they already use without becoming just another silo.
And both of them clearly framed MCP as anti-silo infrastructure since it brings everything to one place. Rocío called MCP “the connective tissue between all these different tools.”
That framing probably captures the broader architectural challenge better than anything else in the interview.
Modern incident response already spans dozens of systems: observability platforms, deployment pipelines, CI/CD tooling, ticketing systems, infrastructure management, and communication layers.
AI systems inherit that fragmentation unless something explicitly connects operational state.
Engineers trust systems that behave predictably
Sebastian mentioned that teams often react very differently to MCP systems. Some embrace them immediately while others remain skeptical, especially around security and predictability. For him, trust improves once systems consistently produce expected outcomes:
When a person or a teammate says “ah, I’m retrieving what I’m expecting to retrieve”, that will help them to trust it.
A lot of AI tooling discussions still focus on model capability, reasoning quality, or benchmark performance. But operational systems are usually adopted much more pragmatically. Engineers trust systems that behave predictably, retrieve the right operational context, and fit into workflows they already rely on.