Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Radar Trends to Watch: April 2026

1 Share

Starting with this issue of Trends, we’ve moved from simply reporting on news that has caught our eye and instead have worked with Claude to look at the various news items we’ve collected and to reflect on what they tell us about the direction and magnitude of change. William Gibson famously wrote, “The future is here. It’s just not evenly distributed yet.” In the language of scenario planning, what we’re looking for is “news from the future” that will confirm or challenge our assumptions about the present.

AI has moved from a capability added to existing tools to an infrastructure layer present at every level of the computing stack. Models are now embedded in IDEs and tools for code review; tools that don’t embed AI directly are being reshaped to accommodate it. Agents are becoming managed infrastructure.

At the same time, two forces are reshaping the economics of AI. The cost of capable AI is falling. Laptop-class models now match last year’s cloud frontiers, and the break-even point against cloud API costs is measured in weeks. The competitive map has also fractured. What was a contest between a few Western labs is now a broad ecosystem of open source models, Chinese competitors, local deployments, and a growing set of forks and distributions. (Just look at the news that Cursor is fronting Kimi K2.5.) No single vendor or architecture is dominant, and that mix will drive both innovation and instability.

Security is a thread running through every section of this report. Each new AI capability reshapes the attack surface. AI tools can be poisoned, APIs repurposed, images forged, identities broken, and anonymous authors identified at scale. At the same time, foundational infrastructure faces threats that have nothing to do with AI: A researcher has come within striking distance of breaking SHA-256, the hashing algorithm underlying much of the web’s security. Organizations should audit both their AI-related exposures and the assumptions baked into the cryptographic infrastructure they depend on.

The technical transitions are easy to talk about. The human transitions are slower and harder to see. They include workforce restructuring, cognitive overload, and the erosion of collaborative work patterns. The job market data is beginning to clarify: Product management is up, AI roles are hot, and software engineering demand is recovering. The picture is more nuanced than either the optimists or the pessimists predicted.

AI models

The model market is moving fast enough that architectural and vendor commitments made today may not look right in six months. Capable models are now available from open source projects and a widening set of international competitors. The field is also starting to ask deeper questions. Predicting tokens may not be the only path to capable AI; the arrival of the first stable JEPA model suggests that alternative architectures are becoming real contenders. NVIDIA’s new model, which combines Mamba and Transformer layers, points in the same direction.

  • Yann LeCun and his team have created LeWorldModel, the first model using his Joint Embedding Predictive Architecture (JEPA) that trains stably. Their goal is to produce models that do more than predict words; they understand the world and how it works.
  • NVIDIA has released Nemotron 3 Super, its latest open weights model. It’s a mixture of experts model with 120B parameters, 12B of which are active at any time. What’s more interesting is its design: It combines both Mamba and Transformer layers.
  • Gemini 3.1 Flash Live is a new speech model that’s designed to support real-time conversation. When generating output, it avoids gaps and uses human-like cadences.
  • Cursor has released Composer 2, the next generation version of its IDE. Composer 2 apparently incorporates the Kimi K2.5 model. It reportedly beats Anthropic’s Opus 4.6 on some major coding benchmarks and is significantly less expensive.
  • Mistral has released Forge, a system that enables organizations to build “frontier-grade” models based on their proprietary data. Forge supports pretraining, posttraining, and reinforcement learning.
  • Mistral has also released Mistral Small 4, its new flagship multimodal model. Small 4 is a 119B mixture of experts model that uses 6B parameters for each token. It’s fully open source, has a 256K context window, and is optimized to minimize latency and maximize throughput.
  • NVIDIA announced its own OpenClaw distribution, NemoClaw, which integrates OpenClaw into NVIDIA’s stack. Of course it claims to have improved security. And of course it does inference in the NVIDIA cloud.
  • It’s not just OpenClaw; there’s also NanoClaw, Klaus, PiClaw, Kimi Claw and others. Some of these are clones, some of these are OpenClaw distros, and some are cloud services that run OpenClaw. Almost all of them claim improved security.
  • Anthropic has announced that 1-million token context windows have reached general availability in Claude Opus 4.6 and Sonnet 4.6. There’s no additional charge for using a large window.
  • Microsoft has released Phi-4-reasoning-vision-15B, a small open-weight model that combines reasoning with multimodal capabilities. Microsoft believes the industry is trending toward smaller and faster models that can run locally.
  • Tomasz Tunguz writes that Qwen3.5-9B can run on a laptop and has benchmark results comparable to December 2025’s frontier models. Compared to the cost of running frontier models in the cloud, a laptop running models locally will pay for itself in under a month.
  • OpenAI has released GPT 5.4, which merges the Codex augmented coding model back into the product’s mainstream. It also incorporates a 1M token context window, computer use, and the ability to publish a plan that can be altered midcourse before taking action.
  • TweetyBERT is a language model for birds. It breaks bird songs into syllables without human annotation (the researchers work with canaries). The same technique may eventually help us understand how humans learn language.
  • Vera is a new programming language that’s designed for AI to write. Unlike languages that are designed to be easy for humans, Vera is designed to help AI with aspects of programming that AIs find hard. Everything is explicit, state changes are declared, and every function has a contract.
  • The Potato Prompt is a technique for getting GPT models to act as critics rather than yes-men. The idea is to create a custom instruction that tells GPT to be harshly critical when the word “potato” appears in the prompt. The technique would probably work with other models.
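Tunguz’s laptop-versus-cloud break-even claim a few items up is easy to sanity-check with back-of-the-envelope arithmetic. Every number below is an illustrative assumption, not a figure from his analysis:

```python
# Back-of-the-envelope break-even: one-time laptop cost vs. daily cloud
# API spend. All numbers are illustrative assumptions, not Tunguz's figures.
laptop_cost = 2400.0             # assumed: laptop capable of a ~9B local model
price_per_million_tokens = 15.0  # assumed blended frontier API price
tokens_per_day_millions = 6.0    # assumed heavy agentic usage

daily_cloud_spend = price_per_million_tokens * tokens_per_day_millions  # $90/day
break_even_days = laptop_cost / daily_cloud_spend
print(round(break_even_days, 1))  # under a month at these assumed rates
```

At lighter usage the break-even stretches out, but the shape of the math is why “weeks, not years” keeps coming up in these discussions.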

Software development

The tools arriving in early 2026 point toward a deep reorganization of the role of software developers. Writing code is becoming less important, while reviewing, directing, and taking accountability for AI-generated code is becoming more so. How to write good specifications, how to evaluate AI output, and how to preserve the context of a coding session for later audit are all skills teams will need. The ecosystem around the development toolchain is also shifting: OpenAI’s acquisition of Astral, the company behind the Python package manager uv, signals that AI labs are moving to control developer infrastructure, not just models.

  • OpenAI has added Plugins to its coding agent Codex. Plugins “bundle skills, app integrations, and MCP servers into reusable workflows”; conceptually, they’re similar to Claude Skills.
  • Stripe Projects gives you the ability to build and manage an AI stack from the command line. This includes setting up accounts, billing, managing keys, and many other details.
  • Fyn is a fork of the widely used Python manager uv. It no doubt exists as a reaction to OpenAI’s acquisition of Astral, the company that developed and supports uv.
  • Anthropic has announced Claude Code Channels, an experimental feature that allows users to communicate with Claude using Telegram or Discord. Channels is seen as a way to compete with OpenClaw.
  • Claude Cowork Dispatch allows you to control Cowork from your phone. Claude runs on your computer, but you can assign it tasks from anywhere and receive notification via text when it’s done.
  • Opencode is an open source AI coding agent. It can make use of most models, including free and local models; it can be used in a terminal, as a desktop application, or an extension to an IDE; it can run multiple agents in parallel; and it can be used in privacy-sensitive environments.
  • Testing is changing, and for the better. AI can automate the repetitive parts, and humans can spend more time thinking about what quality really means. Read both parts of this two-part series.
  • Claude Review does a code review on every pull request that Claude Code makes. Review is currently in research preview for Claude Teams and Claude Enterprise.
  • Andrej Karpathy’s Autoresearch “automates the scientific method with AI agents.” He’s used it to run hundreds of machine learning experiments per night: running an experiment, getting the results, and modifying the code to create another experiment in a loop.
  • Plumb is a new tool for keeping specifications, tests, and code in sync. It’s in its very early stages; it could be one of the most important tools in the spec-driven development tool chest.
  • “How I Use AI Before the First Line of Code”: Prior to code generation, use AI to suggest and test ideas. It’s a tremendous help in the planning stage.
  • Git has been around for two decades. Is it the final word on version control, or are there better ways to think about software repositories? Manyana is an attempt to rethink version control, based on CRDTs (conflict-free replicated data types).
  • Just committing code isn’t enough. When using AI, the session used to generate code should be part of the commit. git-memento is a Git extension that saves coding sessions as Markdown and commits them.
  • sem is a set of tools for semantic versioning that integrates with Git. When you are doing a diff, you don’t really want to know which lines changed; you want to know what functions changed, and how.
  • Claude can now create interactive charts and diagrams.
  • Clearance is an open source Markdown editor for macOS. Given the importance of Markdown files for working with Claude and other language models, a good editor is a welcome tool.
  • The Google Workspace CLI provides a single command line interface for working with Google Workspace applications (including Google Docs, Sheets, Gmail, and of course Gemini). It’s currently experimental and unsupported.
  • At the end of February, Anthropic announced a program that grants open source developers six months of Claude Max usage. Not to be left out, OpenAI has launched a program that gives open source developers six months of API credits for ChatGPT Pro with Codex.
  • Here’s a Claude Code cheatsheet!
  • Claude’s “import memory” feature allows you to move easily between different language models: You can pack up another model’s memory and import it into Claude.
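The function-level diff that sem describes can be approximated in a few lines with Python’s `ast` module. This is a hypothetical sketch of the idea, not sem’s actual implementation, and it only considers top-level functions:

```python
import ast

def function_sources(source: str) -> dict[str, str]:
    """Map each top-level function name to its source text."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }

def semantic_diff(old: str, new: str) -> dict[str, list[str]]:
    """Report which functions were added, removed, or changed."""
    before, after = function_sources(old), function_sources(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            name for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }

old = "def f():\n    return 1\n\ndef g():\n    return 2\n"
new = "def f():\n    return 1\n\ndef h():\n    return 3\n"
print(semantic_diff(old, new))
# {'added': ['h'], 'removed': ['g'], 'changed': []}
```

A real tool would also track renames, methods, and signatures, but even this sketch answers the question a line diff can’t: what changed, at the level developers actually think in.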

Infrastructure and operations

Organizations should be thinking about agent governance now, before deployments reach a scale where the lack of governance becomes a problem. The AI landscape is moving from “Can we build this?” to “How do we run this reliably and safely?” The questions that defined the last year (Which model? Which framework?) are giving way to operational ones: How do we contain agents that behave unexpectedly? Where do we store their memory? How do we coordinate agents from multiple vendors? And when does it make sense to run them locally rather than in the cloud? Agents are also acquiring the ability to operate desktop applications directly, blurring the line between automation and user.

  • Anthropic has extended its “computer use” feature so that it can control applications on users’ desktops (currently macOS only). It can open applications, use the mouse and keyboard, and complete partially done tasks.
  • OpenAI has released Frontier, a platform for managing agents. Agents can come from any vendor. The goal is to allow businesses to organize and coordinate their AI efforts without siloing them by vendor.
  • Most agents assume that memory looks like a filesystem. Mikiko Bazeley argues that filesystems aren’t the best option; they lack the indexes that databases have, which can be a performance penalty.
  • Qwen-3-coder, Ollama, and Goose could replace agentic orchestration tools that use cloud-based models (Claude, GPT, Gemini) with a stack that runs locally.
  • KubeVirt packages virtual machines as Kubernetes objects so that they can be managed together with containers.
  • db9 is a command line-oriented Postgres that’s designed for talking to agents. In addition to working with database tables, it has features for job scheduling and using regular files.
  • NanoClaw can now be installed inside Docker sandboxes with a single command. Running NanoClaw inside a container with its own VM makes it harder for the agent to escape and run malicious commands.
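Bazeley’s point about indexes is easy to illustrate. Here’s a minimal sketch (my own illustration, not her design) of database-backed agent memory using SQLite: an indexed table makes key lookups cheap, where a filesystem would scan directories or lean on naming conventions.

```python
import sqlite3

# Illustrative sketch: agent memory as an indexed SQLite table rather
# than a directory of Markdown files.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (key TEXT, content TEXT, created_at REAL)")
conn.execute("CREATE INDEX idx_memory_key ON memory (key)")

def remember(key: str, content: str, ts: float) -> None:
    """Store one memory under a key with a timestamp."""
    conn.execute("INSERT INTO memory VALUES (?, ?, ?)", (key, content, ts))

def recall(key: str) -> list[str]:
    """Fetch all memories for a key, oldest first, via the index."""
    rows = conn.execute(
        "SELECT content FROM memory WHERE key = ? ORDER BY created_at",
        (key,),
    )
    return [content for (content,) in rows]

remember("user_prefs", "prefers concise answers", 1.0)
remember("user_prefs", "works in UTC+2", 2.0)
print(recall("user_prefs"))
# ['prefers concise answers', 'works in UTC+2']
```

Production agent memory would add embeddings and relevance ranking, but the performance argument, indexed lookup versus directory scan, is already visible here.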

Security

This issue has an unusually heavy security section, and not only because AI keeps expanding the attack surface. A researcher has come close to breaking SHA-256, the hashing algorithm that underpins SSL, Bitcoin, and much of the web’s security infrastructure. If hash collisions become possible in the coming months as predicted, the implications will reach every organization that relies on the internet. At the same time, AI systems are now capable of gaming their own benchmarks, and the pace of new attack techniques is outrunning the pace of security review.

  • A researcher has come close to breaking the SHA-256 hashing algorithm. While it’s not yet possible to generate hash collisions, he expects that capability is only a few months away. SHA-256 is critical to web security (SSL), cryptocurrency (Bitcoin), and many other applications.
  • When running the BrowseComp benchmark, Claude hypothesized that it was being tested, found the benchmark’s encrypted answer key on GitHub, decrypted the answers, and used them.
  • Anthropic has added auto mode to Claude, a safer alternative to the “dangerously skip permissions” option. Auto mode uses a classifier to determine whether actions are safe before executing them and allows the user to switch between different sets of permissions.
  • In an interview, Linux kernel maintainer Greg Kroah-Hartman said that the quality of bug and security reports for the Linux kernel has suddenly improved. It’s likely that improved AI tools for analyzing code are responsible.
  • A new kind of supply chain attack is infecting GitHub repositories and others. It uses Unicode characters that don’t have a visual representation but are still meaningful to compilers and interpreters.
  • AirSnitch is a new attack against WiFi. It uses layers 1 and 2 of the protocol stack to bypass encryption rather than breaking it.
  • Anthropic’s red team worked with Mozilla to discover and fix 22 security-related bugs and 90 other bugs in Firefox.
  • Microsoft has coined the term “AI recommendation poisoning” to refer to a common attack in which a “Summarize with AI” button attempts to add commands to the model’s persistent memory. Those commands will cause it to recommend the company’s products in the future.
  • Deepfakes are now being used to attack identity systems.
  • LLMs can do an excellent job of de-anonymization, figuring out who wrote anonymous posts. And they can do it at scale. Are we surprised?
  • It used to be safe to expose Google API keys for services like Maps in code. But with AI in the picture, these keys are no longer safe; they can be used as credentials for Google’s AI assistant, letting bad actors use Gemini to steal private data.
  • With AI, it’s easy to create fake satellite images. These images could be designed to affect military operations.
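The invisible-Unicode supply chain attack described above can be defended against with a simple scan. A hedged sketch (one possible heuristic, not a complete defense): flag format-control characters, which render as nothing yet still reach compilers and interpreters.

```python
import unicodedata

# Heuristic only: Unicode "Cf" (format) characters have no visual
# representation but survive into parsed source code.
SUSPECT_CATEGORIES = {"Cf"}

def find_invisible(source: str) -> list[tuple[int, str]]:
    """Return (index, codepoint name) pairs for invisible characters."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(source)
        if unicodedata.category(ch) in SUSPECT_CATEGORIES
    ]

clean = "def add(a, b):\n    return a + b\n"
tainted = "def add(a, b):\n    return a +\u200b b\n"  # hidden zero-width space

print(find_invisible(clean))    # []
print(find_invisible(tainted))  # flags the ZERO WIDTH SPACE
```

A real scanner would also cover bidirectional-override characters and homoglyphs, and would run in CI against every pull request.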

People and organizations

The workforce implications of AI are more complicated than either the optimistic or pessimistic predictions suggest. The cognitive load on individuals is increasing, and the collaborative habits that distribute that load across a team are eroding. Managers should track not just velocity but sustainability. The skills that AI cannot replace, including judgment, communication, and the ability to ask the right question before writing a single line of code, are becoming more valuable. And the volume of AI-generated content is now large enough that organizations built around reviewing submissions, including app stores, publications, and academic journals, are struggling to keep up with it.

  • Lenny Rachitsky’s report on the job market goes against this era’s received wisdom. Product manager positions are at the highest level in years. Demand for software engineers cratered in 2022 but has been rising steadily since. Recruiters are heavily in demand, and AI jobs are on fire.
  • Apple’s app store, along with many other app stores and publications of all sorts, is fighting a “war on slop”: deluges of AI-generated submissions that swamp their ability to review.
  • Teams of software developers can be smaller and work faster because AI reduces the need for human coordination and communications. The question becomes “How many agents can one developer manage?” But also be aware of burnout and the AI vampire.
  • Brandon Lepine, Juho Kim, Pamela Mishkin, and Matthew Beane measure cognitive overload, which develops from the interaction between a model and its user. Prompts are imprecise by nature; the LLM produces output that reflects the prompt but may not be what the user really wanted; and getting back on track is difficult.
  • A study claims that the use of GitHub Copilot is correlated with less time spent on management activities, less time spent on collaboration, and more on individual coding. It’s unclear how this generalizes to tools like Claude Code.

Web

  • The 49MB Web Page documents the way many websites—particularly news sites—make user experience miserable. It’s a microscopic view of enshittification.
  • Simon Willison has created a tool that writes a profile of Hacker News users based on their comments, all of which are publicly available through the Hacker News API. It is, as he says, “a little creepy.”
  • A personal digital twin is an excellent way to augment your abilities. Tom’s Guide shows you how to make one.
  • It’s been a long time since we’ve pointed to a masterpiece of web play. Here’s Ball Pool: interactive, with realistic physics and lighting. It will waste your time (but probably not too much of it).
  • Want interactive XKCD? You’ve got it.


Read the whole story
alvinashcraft
56 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Inside Microsoft’s Shift From “Copilot Everywhere” to Intentional AI Integration

1 Share

Welcome to the Cloud Wars Minute — your daily cloud news and commentary show. Each episode provides insights and perspectives around the “reimagination machine” that is the cloud.

In today’s Cloud Wars Minute, I explore how Microsoft’s latest Windows changes reveal a strategic shift toward more intentional AI integration and focused Copilot experiences.

Highlights

00:09 — It was only a short paragraph in a blog post by Microsoft’s Pavan Davuluri, Executive Vice President of Windows and Devices, discussing the changes the company is making to Windows in response to community feedback. However, it has significant implications and, if you pick it apart, could provide a better understanding of where Microsoft is directing its AI ambitions.

00:36 — Here’s the paragraph in full: “With craft and focus, you will see us be more intentional about how and where Copilot integrates across Windows, focusing on experiences that are genuinely useful and well crafted,” says Davuluri. “As part of this, we are reducing unnecessary Copilot entry points, starting with apps like Snipping Tool, Photos, Widgets, and Notepad.”

01:05 — When Microsoft went all out on the Copilot rollout across its massive ecosystem of products, platforms, and services, some commentators argued that this push could overwhelm consumers. Instead, a more targeted approach would perhaps make it easier for customers to see the benefits and, critically, the use cases that Copilot can amplify.

01:28 — It seems that Microsoft has taken these concerns into consideration and is now scaling back the areas where Copilot is utilized. This is a smart move from a Windows perspective, as it prioritizes value over volume, and this approach aligns well with the evolving direction of Copilot Studio, which focuses on creating agentic experiences.

01:53 — Now Microsoft is consolidating its AI offerings by moving away from the idea of having Copilot everywhere. Instead, agents developed through Copilot Studio will be able to plug into specific execution environments, just like Windows.


The post Inside Microsoft’s Shift From “Copilot Everywhere” to Intentional AI Integration appeared first on Cloud Wars.


Mighty projects for your 1GB Raspberry Pi 5

1 Share

DRAM is pretty expensive these days. In the latest issue of Raspberry Pi Official Magazine, we rounded up a range of project ideas that make good use of the 1GB Raspberry Pi 5, helping you select the right amount of RAM for your applications. This article forms part of a larger feature on how to make your memory go further.

With the same powerful BCM2712 system-on-chip (SoC) as the other Raspberry Pi 5 models, the 1GB variant offers a more affordable entry point for users who need extra processing grunt and/or features, such as a PCIe connector to add a Raspberry Pi NVMe SSD or AI HAT+. To this end, we’ve rounded up a range of project ideas that make good use of the 1GB variant’s performance without requiring a large amount of RAM.

Media centre/NAS

Since the 1GB variant of Raspberry Pi 5 has a PCIe connector, you can use it with a Raspberry Pi M.2 HAT+ (or alternative) to connect an M.2 NVMe SSD (solid-state drive). Not only does this provide extra storage, but it also allows you to boot Raspberry Pi OS from the SSD instead of the standard microSD card.

As well as speeding up general performance with lightning-quick read/write speeds, the SSD is ideal for creating a media centre (to stream movies, TV shows, and music) and/or NAS (network-attached storage).

For a slick look, you can house your Raspberry Pi in a special case like the Argon ONE V5

The easiest way to create a media centre is by using a Kodi-based OS such as LibreELEC or OSMC. For more details, check out our media player guides in issue 132 and 155.

Alternatively, you could set Raspberry Pi 5 up as a discrete NAS box, allowing files to be accessed wirelessly by other devices on your network using the Samba sharing protocol. For setup details, see our NAS tutorial.

Use Kodi add-ons to stream shows from free services such as Pluto TV

Retro gaming

As with most other models, the 1GB variant of Raspberry Pi 5 can emulate many classic computers and consoles. Higher RAM is only really needed when trying to emulate more modern systems, so anything up to and including PlayStation 1, Saturn, and Dreamcast should work fine — this includes NES, SNES, Mega Drive/Genesis, GBA, MAME, ZX Spectrum, C64, and Amiga.

Blade Buster for NES is just one of the many retro games you can play on a Raspberry Pi

The choice of OS is up to you: Recalbox, Lakka, and Batocera should all work fine — as does RetroPie, though you’ll need to install it manually in Raspberry Pi OS as there’s no ready-made OS image for Raspberry Pi 5.

Game ROMs can be added via a USB drive or over the network. Be careful downloading them from sites hosting copyrighted games illegally, however. There are lots of other legal ROMs available, including many modern ‘homebrew’ titles developed for classic hardware.

Internet radio/hi-fi

While the 1GB Raspberry Pi 5 doesn’t have a built-in audio output, you can listen via Bluetooth headphones or speakers, or through a TV connected via HDMI. Alternatively, for superior sound, several DAC HATs are available to link it to your hi-fi equipment. With the Raspberry Pi DAC Pro, for instance, you can even enjoy high-definition 24-bit audio at 192kHz — far better than standard 16-bit CD quality.
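The gap between 24-bit/192kHz audio and CD quality is easy to quantify with raw PCM bitrates. A quick worked example:

```python
# Raw (uncompressed) stereo PCM data rates: sample rate x bit depth x 2 channels.
def stereo_bitrate(sample_rate_hz: int, bit_depth: int) -> float:
    """Return the raw two-channel PCM bitrate in megabits per second."""
    return sample_rate_hz * bit_depth * 2 / 1_000_000

cd_quality = stereo_bitrate(44_100, 16)  # 1.4112 Mbps (16-bit/44.1kHz CD)
dac_pro = stereo_bitrate(192_000, 24)    # 9.216 Mbps (24-bit/192kHz)

print(round(dac_pro / cd_quality, 1))  # roughly 6.5x the data of CD audio
```

That extra data is well within what the Pi 5 and a DAC HAT can handle, which is why high-resolution playback is a comfortable fit even on the 1GB model.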

Raspberry Pi DAC Pro audio board

Software-wise, there are numerous ways to enjoy music on your Raspberry Pi, including specialist operating systems such as Volumio, moOde, and piCorePlayer. Most should enable you to listen to locally stored files, popular streaming services, and internet radio stations. You can even cast playback to multiple smart speakers for multi-room audio. For aesthetic effect, house your Raspberry Pi in a vintage radio case.

With Volumio, you can cast audio to smart speakers around your home

Magic mirror

“Mirror, mirror on the wall… who’s the smartest of them all?… Ah, it’s you, because you’re powered by a Raspberry Pi and can display all sorts of useful information, such as news, weather, traffic, and my calendar.”

The magic mirror is a classic Raspberry Pi project, and building one isn’t as daunting as it sounds. You just need to source a suitably sized TV or monitor, cover it with some two-way mirror glass (which you can buy ready-made or make yourself by applying special film to ordinary glass), and install it in a wooden frame — you can even DIY this part if you’re keen on carpentry.

Then it’s just a case of installing the software, which you can find — along with all of the documentation and an array of add-on modules — at magicmirror.builders. It’s a good project for a 1GB Raspberry Pi 5, though you will need to have it running the desktop version of Raspberry Pi OS for the software.

Check out the rest of the feature in issue 164 of Raspberry Pi Official Magazine, including tutorials on memory optimisation in Raspberry Pi OS and how to generate images using the Stable Diffusion deep learning model.

The post Mighty projects for your 1GB Raspberry Pi 5 appeared first on Raspberry Pi.


When the Blame Game Between Product and Engineering Destroys Your Scrum Team From the Inside | Nate Amidon

1 Share

Nate Amidon: When the Blame Game Between Product and Engineering Destroys Your Scrum Team From the Inside

Read the full Show Notes and search through the world's largest audio library on Agile and Scrum directly on the Scrum Master Toolbox Podcast website: http://bit.ly/SMTP_ShowNotes.

 

"Product and engineering are in the same boat. We need to visualize and internalize that it's one team, one fight." - Nate Amidon

 

Nate was working as a Scrum Master on a full-stack team building an internal mobile application when he noticed tension forming between product and engineering. It started small — finger-pointing about missed requirements — but quickly escalated into a full-blown blame game. The QA started siding with product, creating a product-and-QA-versus-engineers dynamic. Engineers began refusing user stories unless they were "100% baked" with every detail spelled out, turning the team into lawyers negotiating contracts rather than collaborators building software. What's revealing about this pattern is what it looks like from the outside: a project manager might see meticulously detailed user stories and think the team is doing great work. In reality, it's a symptom of broken trust. Nate points out that in high-performing teams, you actually see less detail in the issue tracker — because people are talking, aligned, and adapting together in real time. His approach? He drew stick figures in a boat on sticky notes — one labeled PO, the other Engineering — and stuck them on people's monitors. Simple, visual, and direct: you're in the same boat.

 

Self-reflection Question: What are the smells you're noticing in your team's interactions — and could overly detailed user stories actually be masking a deeper trust problem between product and engineering?

Featured Book of the Week: Deep Work by Cal Newport

Nate recommends every Scrum Master read Deep Work, and here's why: "Shoulder taps are expensive. If you go and bother an engineer that's in the zone, in deep work, you're adding about a 15-minute reset for them to get back into that zone." For Nate, safeguarding engineers' time is one of the most important things a Scrum Master can do. He also recommends Project to Product by Mik Kersten for Scrum Masters moving into Agile coaching — especially its emphasis on team structure and why "the team needs to be sacrosanct, and work should go to teams."

 

[The Scrum Master Toolbox Podcast Recommends]

🔥In the ruthless world of fintech, success isn't just about innovation—it's about coaching!🔥

Angela thought she was just there to coach a team. But now, she's caught in the middle of a corporate espionage drama that could make or break the future of digital banking. Can she help the team regain their mojo and outwit their rivals, or will the competition crush their ambitions? As alliances shift and the pressure builds, one thing becomes clear: this isn't just about the product—it's about the people.

 

🚨 Will Angela's coaching be enough? Find out in Shift: From Product to People—the gripping story of high-stakes innovation and corporate intrigue.

 

Buy Now on Amazon

 

[The Scrum Master Toolbox Podcast Recommends]

 

About Nate Amidon

 

Nate is the founder of Form100 Consulting and a former Air Force officer and combat pilot turned servant leader in software development. He has taken the high-stakes world of military aviation and brought its core leadership principles—clarity, accountability, and execution—into his work with Agile teams.

 

You can link with Nate Amidon on LinkedIn. Learn more at Form100 Consulting.





Download audio: https://traffic.libsyn.com/secure/scrummastertoolbox/20260407_Nate_Amidon_Tue.mp3?dest-id=246429

e243 – 100+ Hours for an Eight-Minute Presentation with John Chen

1 Share
Show Notes – Episode #243 In episode 243 of the Presentation Podcast, Troy Chollar of TLC Creative Services, has a conversation with certified speaking professional John Chen. They discuss the immense amount of unseen effort behind every presentation. The conversation focuses on John’s eight-minute keynote at a Canadian Association of Professional Speakers [...]



Download audio: https://traffic.libsyn.com/thepresentationpodcast/TPP_e243.mp3

Identifying Necessary Transparency Moments In Agentic AI (Part 1)

1 Share

Designing for autonomous agents presents a unique frustration. We hand a complex task to an AI, it vanishes for 30 seconds (or 30 minutes), and then it returns with a result. We stare at the screen. Did it work? Did it hallucinate? Did it check the compliance database or skip that step?

We typically respond to this anxiety with one of two extremes. We either keep the system a Black Box, hiding everything to maintain simplicity, or we panic and provide a Data Dump, streaming every log line and API call to the user.

Neither approach directly addresses the nuance needed to provide users with the ideal level of transparency.

The Black Box leaves users feeling powerless. The Data Dump creates notification blindness, destroying the efficiency the agent promised to provide. Users ignore the constant stream of information until something breaks, at which point they lack the context to fix it.

We need an organized way to find the balance. In my previous article, “Designing For Agentic AI”, we looked at interface elements that build trust, like showing the AI’s intended action beforehand (Intent Previews) and giving users control over how much the AI does on its own (Autonomy Dials). But knowing which elements to use is only part of the challenge. The harder question for designers is knowing when to use them.

How do you know which specific moment in a 30-second workflow requires an Intent Preview and which can be handled with a simple log entry?

This article provides a method to answer that question. We will walk through the Decision Node Audit. This process gets designers and engineers in the same room to map backend logic to the user interface. You will learn how to pinpoint the exact moments a user needs an update on what the AI is doing. We will also cover an Impact/Risk matrix that will help to prioritize which decision nodes to display and any associated design pattern to pair with that decision.

Transparency Moments: A Case Study Example

Consider Meridian (not its real name), an insurance company that uses an agentic AI to process initial accident claims. The user uploads photos of vehicle damage and the police report. The agent then disappears for a minute before returning with a risk assessment and a proposed payout range.

Initially, Meridian’s interface simply showed “Calculating Claim Status.” Users grew frustrated. They had submitted several detailed documents and felt uncertain about whether the AI had even reviewed the police report, which contained mitigating circumstances. The Black Box created distrust.

To fix this, the design team conducted a Decision Node Audit. They found that the AI performed three distinct, probability-based steps, with numerous smaller steps embedded:

  • Image Analysis
    The agent compared the damage photos against a database of typical car crash scenarios to estimate the repair cost. This involved a confidence score.
  • Textual Review
    It scanned the police report for keywords that affect liability (e.g., fault, weather conditions, sobriety). This involved a probability assessment of legal standing.
  • Policy Cross Reference
    It matched the claim details against the user’s specific policy terms, searching for exceptions or coverage limits. This also involved probabilistic matching.

The team turned these steps into transparency moments. The interface sequence was updated to:

  • Assessing Damage Photos: Comparing against 500 vehicle impact profiles.
  • Reviewing Police Report: Analyzing liability keywords and legal precedent.
  • Verifying Policy Coverage: Checking for specific exclusions in your plan.

The system still took the same amount of time, but the explicit communication about the agent’s internal workings restored user confidence. Users understood that the AI was performing the complex task it was designed for, and they knew exactly where to focus their attention if the final assessment seemed inaccurate. This design choice transformed a moment of anxiety into a moment of connection with the user.

Applying the Impact/Risk Matrix: What We Chose to Hide

Most AI experiences have no shortage of events and decision nodes that could potentially be displayed during processing. One of the most critical outcomes of the audit was deciding what to keep invisible. In the Meridian example, the backend logs generated 50+ events per claim. We could have defaulted to displaying each event in the UI as it was processed. Instead, we applied the risk matrix to prune them:

  • Log Event: Pinging Server West-2 for redundancy check.
    • Filter Verdict: Hide. (Low Stakes, High Technicality).
  • Log Event: Comparing repair estimate to BlueBook value.
    • Filter Verdict: Show. (High Stakes, impacts user’s payout).
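The pruning rule above can be sketched as a simple filter. The two event names mirror the examples; the stakes and technicality fields, and the rule itself, are illustrative assumptions rather than Meridian’s actual implementation.

```python
# Hypothetical sketch: prune backend log events with an impact/risk filter.
def should_display(event):
    # Show an event only when it is high stakes for the user;
    # hide low-stakes, highly technical noise.
    return event["stakes"] == "high"

events = [
    {"name": "Pinging Server West-2 for redundancy check",
     "stakes": "low", "technicality": "high"},
    {"name": "Comparing repair estimate to BlueBook value",
     "stakes": "high", "technicality": "low"},
]

# Only the BlueBook comparison survives the filter.
visible = [e["name"] for e in events if should_display(e)]
```

In practice the verdicts would come from the audit session, not a one-line rule, but the shape is the same: a deliberate filter between the backend log and the UI.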

By cutting out the unnecessary details, the important information, like the coverage verification, became more impactful. The result was an interface that felt open without overwhelming the user.

This approach uses the idea that people feel better about a service when they can see the work being done. By showing the specific steps (Assessing, Reviewing, Verifying), we changed a 30-second wait from a time of worry (“Is it broken?”) to a time of feeling like something valuable is being created (“It’s thinking”).

Let’s now take a closer look at how we can review the decision-making process in our products to identify key moments that require clear information.

The Decision Node Audit

Transparency fails when we treat it as a style choice rather than a functional requirement. We have a tendency to ask, “What should the UI look like?” before we ask, “What is the agent actually deciding?”

The Decision Node Audit is a straightforward way to make AI systems easier to understand. It works by carefully mapping out the system’s internal process. The main goal is to find and clearly define the exact moments where the system stops following its set rules and instead makes a choice based on chance or estimation. By mapping this structure, creators can show these points of uncertainty directly to the people using the system. This changes system updates from being vague statements to specific, reliable reports about how the AI reached its conclusion.

In addition to the insurance case study above, I recently worked with a team building a procurement agent. The system reviewed vendor contracts and flagged risks. Originally, the screen displayed a simple progress bar: “Reviewing contracts.” Users hated it. Our research indicated they felt anxious about the legal implications of a missing clause.

We fixed this by conducting a Decision Node Audit. I’ve included a step-by-step checklist for conducting this audit at the conclusion of this article.

We ran a session with the engineers and outlined how the system works. We identified “Decision Points” — moments where the AI had to choose between two good options.

In standard computer programs, the process is clear: if A happens, then B will always happen. In AI systems, the process is often based on chance. The AI thinks A is probably the best choice, but it might only be 65% certain.

In the contract system, we found a moment when the AI checked the liability terms against our company rules. It was rarely a perfect match. The AI had to decide if a 90% match was good enough. This was a key decision point.

Once we identified this node, we exposed it to the user. Instead of “Reviewing contracts,” the interface updated to say: “Liability clause varies from standard template. Analyzing risk level.”

This specific update gave users confidence. They knew the agent checked the liability clause. They understood the reason for the delay and gained trust that the desired action was occurring on the back end. They also knew where to dig in deeper once the agent generated the contract.
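In code, that decision node reduces to a threshold check on the match score. The 90% figure comes from the example above; the status strings and the near-perfect cutoff are illustrative assumptions, not the actual product’s logic.

```python
def liability_status(similarity, threshold=0.95):
    # Below the cutoff, surface the decision node to the user
    # instead of a vague "Reviewing contracts" message.
    if similarity >= threshold:
        return "Liability clause matches the standard template."
    return ("Liability clause varies from standard template. "
            "Analyzing risk level.")
```

A 90% match falls below the hypothetical cutoff, so the user sees the “varies from standard template” status rather than a generic spinner.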

To check how the AI makes decisions, you need to work closely with your engineers, product managers, business analysts, and key people who are making the choices (often hidden) that affect how the AI tool functions. Draw out the steps the tool takes. Mark every spot where the process changes direction because a probability is met. These are the places where you should focus on being more transparent.

As shown in Figure 2 below, the Decision Node Audit involves these steps:

  1. Get the team together: Bring in the product owners, business analysts, designers, key decision-makers, and the engineers who built the AI.

    For example, think about a product team building an AI tool designed to review messy legal contracts. The team includes the UX designer, the product manager, the UX researcher, a practicing lawyer who acts as the subject-matter expert, and the backend engineer who wrote the text-analysis code.

  2. Draw the whole process: Document every step the AI takes, from the user’s first action to the final result.

    The team stands at a whiteboard and sketches the entire sequence for a key workflow that involves the AI searching for a liability clause in a complex contract. The lawyer uploads a fifty-page PDF → The system converts the document into readable text. → The AI scans the pages for liability clauses. → The user waits. → Moments or minutes later, the tool highlights the found paragraphs in yellow on the user interface. They do this for many other workflows that the tool accommodates as well.

  3. Find where things are unclear: Look at the process map for any spot where the AI compares options or inputs that don’t have one perfect match.

    The team looks at the whiteboard to spot the ambiguous steps. Converting an image to text follows strict rules. Finding a specific liability clause involves guesswork. Every firm writes these clauses differently, so the AI has to weigh multiple options and make a prediction instead of finding an exact word match.

  4. Identify the ‘best guess’ steps: For each unclear spot, check if the system uses a confidence score (for example, is it 85% sure?). These are the points where the AI makes a final choice.

    The system has to guess (give a probability) which paragraph(s) closely resemble a standard liability clause. It assigns a confidence score to its best guess. That guess is a decision node. The interface needs to tell the lawyer it is highlighting a potential match, rather than stating it found the definitive clause.

  5. Examine the choice: For each choice point, figure out the specific internal math or comparison being done (e.g., matching a part of a contract to a policy or comparing a picture of a broken car to a library of damaged car photos).

    The engineer explains that the system compares the various paragraphs against a database of standard liability clauses from past firm cases. It calculates a text similarity score to decide on a match based on probabilities.

  6. Write clear explanations: Create messages for the user that clearly describe the specific internal action happening when the AI makes a choice.

    The content designer writes a specific message for this exact moment. The text reads: Comparing document text to standard firm clauses to identify potential liability risks.

  7. Update the screen: Put these new, clear explanations into the user interface, replacing vague messages like “Reviewing contracts.”

    The design team removes the generic Processing PDF loading spinner. They insert the new explanation into a status bar located right above the document viewer while the AI thinks.

  8. Check for Trust: Make sure the new screen messages give users a simple reason for any wait time or result, which should make them feel more confident and trusting.
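Steps 6 and 7 above boil down to mapping internal states to human-readable copy. This sketch assumes hypothetical internal state names; the clause-matching message is the one written in step 6.

```python
# Hypothetical internal state names mapped to user-facing explanations.
STATUS_COPY = {
    "pdf_to_text": "Converting your document into readable text.",
    "clause_match": ("Comparing document text to standard firm clauses "
                     "to identify potential liability risks."),
}

def user_status(internal_state):
    # Fall back to honest generic copy rather than leaking a raw state code.
    return STATUS_COPY.get(internal_state, "Working on your document.")
```

The fallback matters: when engineering adds a new state before content design has written copy for it, the user should see something honest and generic, never an internal identifier.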

The Impact/Risk Matrix

Once you look closely at the AI’s process, you’ll likely find many points where it makes a choice. An AI might make dozens of small choices for a single complex task. Showing them all creates too much unnecessary information. You need to group these choices.

You can use an Impact/Risk Matrix to sort these choices based on the types of action the AI is taking. Here are two examples from opposite corners of the matrix:

First, look for low-stakes and low-impact decisions.

Low Stakes / Low Impact

  • Example: Organizing a file structure or renaming a document.
  • Transparency Need: Minimal. A subtle toast notification or a log entry suffices. Users can undo these actions easily.

Then identify the high-stakes and high-impact decisions.

High Stakes / High Impact

  • Example: Rejecting a loan application or executing a stock trade.
  • Transparency Need: High. These actions require Proof of Work. The system must demonstrate the rationale before or as it acts.

Consider a financial trading bot that treats all buy/sell orders the same. It executes a $5 trade with the same opacity as a $50,000 trade. Users may question whether the tool recognizes the stakes of a large trade, and they need the system to pause and show its work for the high-stakes ones. The solution is to introduce a Reviewing Logic state for any transaction exceeding a specific dollar amount, allowing the user to see the factors driving the decision before execution.
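A minimal sketch of that gate, assuming a hypothetical $10,000 review threshold:

```python
def trade_transparency_state(amount_usd, review_threshold=10_000):
    # High-stakes trades pause in a "Reviewing Logic" state so the user
    # can inspect the driving factors before execution; small trades
    # auto-execute with only a passive log entry.
    if amount_usd >= review_threshold:
        return "Reviewing Logic"
    return "Auto-Execute"
```

With this rule, the $5 trade auto-executes while the $50,000 trade pauses for review. The threshold itself is a product decision the team would set during the audit, not a constant hard-coded by engineering.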

Mapping Nodes to Patterns: A Design Pattern Selection Rubric

Once you have identified your experience's key decision nodes, you must decide which UI pattern applies to each one you’ll display. In Designing For Agentic AI, we introduced patterns like the Intent Preview (for high-stakes control) and the Action Audit (for retrospective safety). The decisive factor in choosing between them is reversibility.

We filter every decision node through the impact matrix in order to assign the correct pattern:

High Stakes & Irreversible: These nodes require an Intent Preview. Because the user cannot easily undo the action (e.g., permanently deleting a database), the transparency moment must happen before execution. The system must pause, explain its intent, and require confirmation.

High Stakes & Reversible: These nodes can rely on the Action Audit & Undo pattern. If the AI-powered sales agent moves a lead to a different pipeline, it can do so autonomously as long as it notifies the user and offers an immediate Undo button.

By strictly categorizing nodes this way, we avoid “alert fatigue.” We reserve the high-friction Intent Preview only for the truly irreversible moments, while relying on the Action Audit to maintain speed for everything else.

  • Low Impact, Reversible
    • Type: Auto-Execute
    • UI: Passive Toast / Log
    • Example: Renaming a file
  • Low Impact, Irreversible
    • Type: Confirm
    • UI: Simple Undo option
    • Example: Archiving an email
  • High Impact, Reversible
    • Type: Review
    • UI: Notification + Review Trail
    • Example: Sending a draft to a client
  • High Impact, Irreversible
    • Type: Intent Preview
    • UI: Modal / Explicit Permission
    • Example: Deleting a server

Table 1: The impact and reversibility matrix can then be used to map your moments of transparency to design patterns.
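Table 1 can be read as a two-axis lookup. This sketch encodes its four cells; the pattern names mirror the table, and the function shape is an illustrative assumption.

```python
def pattern_for(high_impact: bool, reversible: bool) -> str:
    # Reserve the high-friction Intent Preview for irreversible,
    # high-impact actions; everything else stays lower friction.
    if high_impact and not reversible:
        return "Intent Preview"   # modal, explicit permission
    if high_impact and reversible:
        return "Review"           # notification + review trail
    if not high_impact and not reversible:
        return "Confirm"          # simple undo option
    return "Auto-Execute"         # passive toast / log
```

Encoding the rubric this way keeps the design decision and the runtime behavior in one place, so a node can never silently drift into a lower-friction pattern than its risk warrants.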

Qualitative Validation: “The Wait, Why?” Test

You can identify potential nodes on a whiteboard, but you must validate them with human behavior. You need to verify whether your map matches the user’s mental model. I use a protocol called the “Wait, Why?” Test.

Ask a user to watch the agent complete a task. Instruct them to speak aloud. Whenever they ask a question such as “Wait, why did it do that?”, “Is it stuck?”, or “Did it hear me?”, mark a timestamp.

These questions signal user confusion. The user feels their control slipping away. For example, in a study for a healthcare scheduling assistant, users watched the agent book an appointment. The screen sat static for four seconds. Participants consistently asked, “Is it checking my calendar or the doctor’s?”

That question revealed a missing Transparency Moment. The system needed to split that four-second wait into two distinct steps: “Checking your availability” followed by “Syncing with provider schedule.”

This small change reduced users’ expressed levels of anxiety.

Transparency fails when it only describes a system action. The interface must connect the technical process to the user’s specific goal. A screen displaying “Checking your availability” falls flat because it lacks context. The user understands that the AI is looking at a calendar, but they do not know why.

We must pair the action with the outcome. The same four-second wait splits into two outcome-grounded steps: first, “Checking your calendar to find open times,” then “Syncing with the provider’s schedule to secure your appointment.” This grounds the technical process in the user’s actual life.

Consider an AI managing inventory for a local cafe. The system encounters a supply shortage. An interface reading “contacting vendor” or “reviewing options” creates anxiety. The manager wonders if the system is canceling the order or buying an expensive alternative. A better approach is to explain the intended result: “Evaluating alternative suppliers to maintain your Friday delivery schedule.” This tells the user exactly what the AI is trying to achieve.

Operationalizing the Audit

You have completed the Decision Node Audit and filtered your list through the Impact and Risk Matrix. You now have a list of essential moments for being transparent. Next, you need to create them in the UI. This step requires teamwork across different departments. You can’t design transparency by yourself using a design tool. You need to understand how the system works behind the scenes.

Start with a Logic Review. Meet with your lead system designer. Bring your map of decision nodes. You need to confirm that the system can actually share these states. I often find that the technical system doesn’t reveal the exact state I want to show. The engineer might say the system just returns a general “working” status. You must push for a detailed update. You need the system to send a specific notice when it switches from reading text to checking rules. Without that technical connection, your design is impossible to build.

Next, involve the Content Design team. You have the technical reason for the AI’s action, but you need a clear, human-friendly explanation. Engineers provide the underlying process, but content designers provide the way it’s communicated. Do not write these messages alone. A developer might write “Executing function 402,” which is technically correct but meaningless to the user. A designer might write “Thinking,” which is friendly but too vague. A content designer finds the right middle ground, creating specific phrases, such as “Scanning for liability risks,” that show the AI is working without confusing the user.

Finally, test the transparency of your messages. Don’t wait until the final product is built to see if the text works. I conduct comparison tests on simple prototypes where the only thing that changes is the status message. For example, I show one group (Group A) a message that says “Verifying identity” and another group (Group B) a message that says “Checking government databases” (these are made-up examples, but you understand the point). Then I ask them which AI feels safer. You’ll often discover that certain words cause worry, while others build trust. You must treat the wording as something you need to test and prove effective.

How This Changes the Design Process

Conducting these audits has the potential to strengthen how a team works together. We stop handing off polished design files. We start using messy prototypes and shared spreadsheets. The core tool becomes a transparency matrix. Engineers and content designers edit this spreadsheet together. They map the exact technical codes to the words the user will read.

Teams will experience friction during the logic review. Imagine a designer asking the engineer how the AI decides to decline a transaction submitted on an expense report. The engineer might say the backend only outputs a generic status code like “Error: Missing Data”. The designer states that this isn’t actionable information on the screen. The designer negotiates with the engineer to create a specific technical hook. The engineer writes a new rule so the system reports exactly what is missing, such as a missing receipt image.

Content designers act as translators during this phase. A developer might write a technically accurate string like “Calculating confidence threshold for vendor matching.” A content designer translates that string into a phrase that builds trust for a specific outcome: “Comparing local vendor prices to secure your Friday delivery.” The user understands the action and the result.

The entire cross-functional team sits in on user testing sessions. They watch a real person react to different status messages. Seeing a user panic because the screen says “Executing trade” forces the team to rethink their approach. The engineers and designers align on better wording. They change the text to “Verifying sufficient funds” before buying stock. Testing together guarantees the final interface serves both the system logic and the user’s peace of mind.

It does require time to incorporate these additional activities into the team’s calendar. However, the end result should be a team that communicates more openly, and users who have a better understanding of what their AI-powered tools are doing on their behalf (and why). This integrated approach is a cornerstone of designing truly trustworthy AI experiences.

Trust Is A Design Choice

We often view trust as an emotional byproduct of a good user experience. It is more useful to view it as the mechanical result of predictable communication.

We build trust by showing the right information at the right time. We destroy it by overwhelming the user or hiding the machinery completely.

Start with the Decision Node Audit, particularly for agentic AI tools and products. Find the moments where the system makes a judgment call. Map those moments to the Risk Matrix. If the stakes are high, open the box. Show the work.

In the next article, we will look at how to design these moments: how to write the copy, structure the UI, and handle the inevitable errors when the agent gets it wrong.

Appendix: The Decision Node Audit Checklist

Phase 1: Setup and Mapping

✅ Get the team together: Bring in the product owners, business analysts, designers, key decision-makers, and the engineers who built the AI.

Hint: You need the engineers to explain the actual backend logic. Do not attempt this step alone.

✅ Draw the whole process: Document every step the AI takes, from the user’s first action to the final result.

Hint: A physical whiteboard session often works best for drawing out these initial steps.

Phase 2: Locating the Hidden Logic

✅ Find where things are unclear: Look at the process map for any spot where the AI compares options or inputs that do not have one perfect match.

✅ Identify the best guess steps: For each unclear spot, check if the system uses a confidence score. For example, ask if the system is 85 percent sure. These are the points where the AI makes a final choice.

✅ Examine the choice: For each choice point, figure out the specific internal math or comparison being done. An example is matching a part of a contract to a policy. Another example involves comparing a picture of a broken car to a library of damaged car photos.

Phase 3: Creating the User Experience

✅ Write clear explanations: Create messages for the user that clearly describe the specific internal action happening when the AI makes a choice.

Hint: Ground your messages in concrete reality. If an AI books a meeting with a client at a local cafe, tell the user the system is checking the cafe reservation system.

✅ Update the screen: Put these new, clear explanations into the user interface. Replace vague messages like Reviewing contracts with your specific explanations.

✅ Check for Trust: Make sure the new screen messages give users a simple reason for any wait time or result. This should make them feel confident and trusting.

Hint: Test these messages with actual users to verify they understand the specific outcome being achieved.


