AI is moving fast from research to real-world deployment, and when things go wrong, the consequences are no longer hypothetical. In this episode, Sean McGregor, co-founder of the AI Verification & Evaluation Research Institute and also the founder of the AI Incident Database, joins Chris and Dan to discuss AI safety, verification, evaluation, and auditing. They explore why benchmarks often fall short, what red-teaming at DEF CON reveals about machine learning risks, and how organizations can better assess and manage AI systems in practice.
As we move further into 2026, we’re excited to announce a variety of powerful new tools and reporting capabilities for Viva Insights that make it easier to understand how your organization's adoption and use of Microsoft 365 Copilot compares to others, both within and outside your company. You can also now learn more about the adoption and impact of agents built in Microsoft Copilot Studio, and there are new ways to share and customize reports more broadly across your organization. Let's dive in.
We’re thrilled to announce the initial rollout of the Agent Dashboard, a powerful new feature in the Viva Insights web app that provides leaders and analysts with actionable insights into agent adoption. With this new dashboard, users can dive into Copilot Credit usage – which measures how agents are used – and identify opportunities to optimize and track agent adoption over time.
To start, the dashboard covers adoption metrics aggregated at the user level for agents used within Microsoft 365 Copilot.
Read more about the Agent Dashboard here on the Microsoft 365 Copilot blog.
The Agent Dashboard is currently rolling out to public preview customers only. When you’re ready to start using it, learn how using our guide on MS Learn.
The Copilot Dashboard has focused on providing actionable insights into Copilot readiness, adoption, and impact trends for specific groups within the organization. Now, with benchmarks in the Copilot Dashboard, users can also see how their adoption compares to others, whether within their organization or at other companies.
Benchmarks in the Copilot Dashboard provide context around Copilot adoption trends, so users can compare usage across internal cohorts, or see how their adoption of Copilot compares to similar organizations.
Read more about benchmarks in the Copilot Dashboard here on the Microsoft 365 Copilot blog.
We’re excited to introduce the initial public preview rollout of Copilot metrics export from the Copilot Dashboard. This new capability gives organizations greater flexibility to analyze the usage of both Microsoft 365 Copilot and Copilot Chat beyond the dashboard, covering the past six months at the user level, with user identifiers removed to de-identify the data.
With this export tool, leaders and analysts with access to the global scope dashboard can download the data directly to support Copilot initiatives, such as tracking adoption and usage trends over time, or combining it with other data sources for custom analysis and reporting.
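For instance, once the export is downloaded, an analyst might chart weekly Copilot activity with a few lines of Python. This is only an illustrative sketch: the file name and column names below (such as metric_date and copilot_actions) are hypothetical placeholders, not the actual export schema.

```python
# Illustrative sketch only: the file name and column names used here
# (e.g. "metric_date", "copilot_actions") are hypothetical placeholders,
# not the documented schema of the Copilot metrics export.
import pandas as pd

# Load the de-identified, user-level export downloaded from the dashboard.
df = pd.read_csv("copilot_metrics_export.csv", parse_dates=["metric_date"])

# Aggregate Copilot actions per week to see the adoption trend across the
# six-month window covered by the export.
weekly = (
    df.groupby(pd.Grouper(key="metric_date", freq="W"))["copilot_actions"]
      .sum()
      .rename("copilot_actions_per_week")
)

print(weekly)
```

From there, the resulting data frame could be joined with other internal data sources to support the kind of custom analysis and reporting described above.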
Learn more about how to use the export feature on MS Learn.
In an exciting expansion of our reporting tools measuring the adoption and impact of Microsoft 365 Copilot, the Copilot Studio agents report is now broadly available. This powerful new Power BI template allows users to learn more about the impact of agents built in Microsoft Copilot Studio, and how they're deployed across channels in the organization.
The report provides insights about agents built in Microsoft Copilot Studio that are deployed across a variety of channels, including Microsoft Teams and Microsoft 365 Copilot, Facebook, mobile apps, and custom and demo websites.
To learn more about the report and how to run it, refer to our guide on MS Learn.
Existing tools for Power BI template reports in the Viva Insights web app allow users to customize their reports for their organization's needs, through their selection of filters, metrics, and organizational attributes. Now, an expanded toolkit allows Viva Insights analysts to further customize out-of-the-box Power BI reports to make them even more relevant to their organization.
With these new tools, analysts can, for example:
Users can now customize pre-built Power BI reports, such as the Copilot Studio agents or Copilot for Sales adoption reports, but not custom queries such as Person queries or Meeting queries. Users can also apply these customizations to report queries that they or other analysts in their organization have previously run.
To learn how to customize out-of-the-box reports, please see our guide on MS Learn.
Reading through my RSS feeds for the week, I was overwhelmed by the number of articles about agent-assisted coding (even articles about agents that aren’t supported by default). It felt like there was no other topic for the week!
Am I going to write more about it, too? Well, yes, but not about how to set it up or use it, or whether I like the Xcode 26.3 implementation. Instead, I’m going to look a little into a possible future of working with these tools.
You’ve probably watched Apple’s code-along session introducing the feature. It’s good, and the presenters do a good job of explaining it, but you’ll notice something about the prompts they use. They keep mentioning that you should use detailed prompts, but the ones they use are short and don’t look much like the ones I have had success with.
Don’t get me wrong, I’m not criticising. It’s very easy for me to say a video should use essay-length prompts, but that’s hard to do in a 30-45 minute tutorial session and I understand why the session is how it is.
Ambiguity in a prompt is the one thing that the constant improvement in models will never fix. What happens in the edge cases? What happens when there’s an error? A short prompt can’t contain the detail needed for predictable and reliable software. Some of the agents, like Claude Code, will now ask clarifying questions if they detect ambiguity, but in my opinion it’s much better to think through your software and specify more things ahead of time before letting the agent start to code.
Learning how to use an agent effectively is definitely a skill, and the spec for whatever new app or new feature you’re building is really important. One of the most effective techniques I have used came from watching this workshop by Peter Steinberger where he lets two different LLM contexts try to find and fix holes in a spec before coding. As long as you read the feedback and make decisions where they need to be made, this is an incredibly effective way of working.
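To make that concrete, here’s a rough sketch of the two-context idea in Python. It’s not Peter’s exact workflow, and the ask_model helper below is just a placeholder for whatever model API or CLI you prefer; the only point is that each reviewer gets its own, independent context.

```python
# Rough sketch of "two independent LLM contexts review the spec before coding".
# ask_model() is a stand-in for a real model call (API or CLI); nothing here
# reflects a specific vendor's SDK.

def ask_model(system_prompt: str, spec: str) -> str:
    # Placeholder: swap in a real model call here. Each call is stateless,
    # so the two reviewers never share context or anchor each other.
    return f"[{system_prompt[:30]}...] stub reply: no issues found"

def review_spec(spec: str) -> list[str]:
    """Ask two independently prompted reviewers to poke holes in a spec."""
    reviewers = [
        "You are a sceptical senior engineer. List ambiguities, missing edge "
        "cases, and error-handling gaps in the following spec.",
        "You are a QA lead. List anything in the following spec you could not "
        "write a concrete test for, and ask clarifying questions.",
    ]
    return [ask_model(system, spec) for system in reviewers]

if __name__ == "__main__":
    spec = "Users can export their data as CSV."  # deliberately under-specified
    for feedback in review_spec(spec):
        print(feedback)
        print("---")
```

You still read both sets of feedback and make the calls yourself; the point is simply to force the ambiguity out before the coding agent starts.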
What I’d like to see is some of these agentic tools starting to integrate that way of working; none of the UIs I have seen so far encourage it. The agent always lives in a sidebar or a smaller window, and nothing guides you down a path of making sure you have removed ambiguity before coding. I can imagine a future version of Xcode where the agent works with you to refine a spec for whatever change you’re planning, full screen in the editor, including using multiple context windows to push back against you. I’d love to see the spec in a completely separate context window from the coding agent, too. It would need a lot of work, and getting the UI right would be tricky, but I suspect it would be powerful. It would also shift the focus of the UI from “the agent is only important enough to live in a sidebar” to “the agent and your instructions are as important as the code”.
All of this is possible already, of course, but only by using multiple tools, and it takes careful model and context management to be effective. I’d love to see Apple and the Xcode team really lead from the front and reduce the friction of working this way. Maybe we’ll see something at this year’s WWDC?
– Dave Verwer
Shipping white-label apps used to mean repeating the same steps and signing in and out of App Store Connect dozens of times per release. With Runway, ship everything in one place, just once.
It’s that time of year again! The Swift Student Challenge is open for submissions until February 28th. It’s truly a unique experience for the winners, but it’s also a great opportunity to put your mind to a focused, constrained project for a couple of weeks to see what you can create. Check your eligibility (it’s broader than you might think) and get your application in!
SimTag adds a small, unobtrusive overlay to each iOS Simulator window showing the branch that build came from.
That’s it.
What a great idea. 👍
What’s this? An article that doesn’t even mention AI? 😂 Gabriel Theodoropoulos writes about how little tweaks to the built-in SwiftUI morphing animations can make all the difference to how polished your app feels. I love that there are videos for each step of the process, too. Great article.
It goes without saying that just reading alone won’t make you a Swift concurrency expert; you’ll need to continuously put everything you learn into practice.
And yet, also:
But those who do are guaranteed to become top 1% subject matter experts.
I love the format of this article; it’s great.
Yes, there were Swift talks at this year’s FOSDEM, but this one by Dylan Ayrey and Mike Nolan was the one that caught my eye. They ask whether open source is doomed in the age of LLMs, and it’s really a fantastic talk.
Talking of FOSDEM, there was also a pre-event focused entirely on Swift with nine short talks. Perfect for a few bite-sized watches this weekend.
First it became a typeset PDF again, and now a physical object!
A collection of upcoming CFPs (calls for papers) from across the internet and around the world.