The Accidental Orchestrator


This is the first article in a series on agentic engineering and AI-driven development. Look for the next article on March 19 on O’Reilly Radar.

There’s been a lot of hype about AI and software development, and it comes in two flavors. One says, “We’re all doomed: Tools like Claude Code will make software engineering obsolete within a year.” The other says, “Don’t worry, everything’s fine; AI is just another tool in the toolbox.” Neither is honest.

I’ve spent over 20 years writing about software development for practitioners, covering everything from coding and architecture to project management and team dynamics. For the last two years I’ve been focused on AI, training developers to use these tools effectively, writing about what works and what doesn’t in books, articles, and reports. And I kept running into the same problem: I had yet to find anyone with a coherent answer for how experienced developers should actually work with these tools. There are plenty of tips and plenty of hype but very little structure, and very little you could practice, teach, critique, or improve.

I’d been observing developers at work using AI with various levels of success, and I realized we need to start thinking about this as its own discipline. Andrej Karpathy, the former head of AI at Tesla and a founding member of OpenAI, recently proposed the term “agentic engineering” for disciplined development with AI agents, and others like Addy Osmani are getting on board. Osmani’s framing is that AI agents handle implementation but the human owns the architecture, reviews every diff, and tests relentlessly. I think that’s right.

But I’ve spent a lot of the last two years teaching developers how to use tools like Claude Code, agent mode in Copilot, Cursor, and others, and what I keep hearing is that they already know they should be reviewing the AI’s output, maintaining the architecture, writing tests, keeping documentation current, and staying in control of the codebase. They know how to do it in theory. But they get stuck trying to apply it in practice: How do you actually review thousands of lines of AI-generated code? How do you keep the architecture coherent when you’re working across multiple AI tools over weeks? How do you know when the AI is confidently wrong? And it’s not just junior developers who are having trouble with agentic engineering. I’ve talked to senior engineers who struggle with the shift to agentic tools, and intermediate developers who take to it naturally. The difference isn’t necessarily the years of experience; it’s whether they’ve figured out an effective and structured way to work with AI coding tools. That gap between knowing what developers should be doing with agentic engineering and knowing how to integrate it into their day-to-day work is a real source of anxiety for a lot of engineers right now. That’s the gap this series is trying to fill.

Despite what much of the hype about agentic engineering is telling you, this kind of development doesn’t eliminate the need for developer expertise; just the opposite. Working effectively with AI agents actually raises the bar for what developers need to know. I wrote about that experience gap in an earlier O’Reilly Radar piece called “The Cognitive Shortcut Paradox.” The developers who get the most from working with AI coding tools are the ones who already know what good software looks like, and can often tell if the AI wrote it.

The idea that AI tools work best when experienced developers are driving them matched everything I’d observed. It rang true, and I wanted to prove it in a way that other developers would understand: by building software. So I started building a specific, practical approach to agentic engineering built for developers to follow, and then I put it to the test. I used it to build a production system from scratch, with the rule that AI would write all the code. I needed a project that was complex enough to stress-test the approach, and interesting enough to keep me engaged through the hard parts. I wanted to apply everything I’d learned and discover what I still didn’t know. That’s when I came back to Monte Carlo simulations.

The experiment

I’ve been obsessed with Monte Carlo simulations ever since I was a kid. My dad’s an epidemiologist—his whole career has been about finding patterns in messy population data, which means statistics was always part of our lives (and it also means that I learned SPSS at a very early age). When I was maybe 11 he told me about the drunken sailor problem: A sailor leaves a bar on a pier, taking a random step toward the water or toward his ship each time. Does he fall in or make it home? You can’t know from any single run. But run the simulation a thousand times, and the pattern emerges from the noise. The individual outcome is random; the aggregate is predictable.

I remember writing that simulation in BASIC on my TRS-80 Color Computer 2: a little blocky sailor stumbling across the screen, two steps forward, one step back. The drunken sailor is the “Hello, world” of Monte Carlo simulations. Monte Carlo is a technique for problems you can’t solve analytically: You simulate them hundreds or thousands of times and measure the aggregate results. Each individual run is random, but the statistics converge on the true answer as the sample size grows. It’s one way we model everything from nuclear physics to financial risk to the spread of disease across populations.
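A minimal Python sketch of that simulation (my own illustration, not the original BASIC program or Octobatch code) shows the shape of every Monte Carlo run: one noisy trial, repeated until the aggregate converges. The pier geometry and step counts here are arbitrary assumptions.

```python
import random

def sailor_survives(steps: int, start: int, pier_width: int, rng: random.Random) -> bool:
    """One run: position 0 is the water, `pier_width` is the ship.
    Returns True if the sailor reaches the ship before falling in."""
    pos = start
    for _ in range(steps):
        pos += rng.choice((-1, 1))  # random step toward water or ship
        if pos <= 0:
            return False  # fell in
        if pos >= pier_width:
            return True   # made it home
    return pos > pier_width // 2  # timed out: call it by which end he's nearer

def monte_carlo(runs: int, seed: int = 42) -> float:
    """Aggregate many random runs; the survival rate converges as runs grow."""
    rng = random.Random(seed)  # one seeded RNG, for reproducibility
    survived = sum(
        sailor_survives(1000, start=5, pier_width=10, rng=rng) for _ in range(runs)
    )
    return survived / runs
```

Starting midway down the pier, the individual outcome is a coin flip, so the aggregate should converge on roughly 50% — and with a fixed seed, the whole experiment is repeatable.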

What if you could run that kind of simulation today by describing it in plain English? Not a toy demo but thousands of iterations with seeded randomness for reproducibility, where the outputs get validated and the results get aggregated into actual statistics you can use. Or a pipeline where an LLM generates content, a second LLM scores it, and anything that doesn’t pass gets sent back for another try.
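That generate-score-retry loop can be sketched in a few lines. This is a hypothetical illustration, not Octobatch’s implementation; `generate` and `score` are stand-ins for whatever LLM calls you wire up, and the threshold and retry count are assumed values.

```python
from typing import Callable, Optional

def generate_validate_retry(
    prompt: str,
    generate: Callable[[str], str],   # hypothetical: wraps an LLM generation call
    score: Callable[[str], float],    # hypothetical: a second LLM (or rubric) scoring the output
    threshold: float = 0.8,
    max_attempts: int = 3,
) -> Optional[str]:
    """Anything that doesn't pass the scorer gets sent back for another try."""
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if score(candidate) >= threshold:
            return candidate
        # Feed the failure back so the next attempt can improve on it
        prompt = f"{prompt}\n\nPrevious attempt scored below {threshold}; improve it:\n{candidate}"
    return None  # caller decides how to handle exhausted retries
```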

The goal of my experiment was to build that system, which I called Octobatch. Right now, the industry is constantly looking for new real-world end-to-end case studies in agentic engineering, and I wanted Octobatch to be exactly that case study.

I took everything I’d learned from teaching and observing developers working with AI, put it to the test by building a real system from scratch, and turned the lessons into a structured approach to agentic engineering I’m calling AI-driven development, or AIDD. This is the first article in a series about what agentic engineering looks like in practice, what it demands from the developer, and how you can apply it to your own work.

The result is a fully functioning, well-tested application that consists of about 21,000 lines of Python across several dozen files, backed by complete specifications, nearly a thousand automated tests, and quality integration and regression test suites. I used Claude Cowork to review all the AI chats from the project, and it turns out I built the entire application in roughly 75 hours of active development time over seven weeks. For comparison, I built Octobatch in just over half the time I spent last year playing Blue Prince.

But this series isn’t just about Octobatch. I integrated AI tools at every level: Claude and Gemini collaborating on architecture, Claude Code writing the implementation, LLMs generating the pipelines that run on the system they helped build. This series is about what I learned from that process: the patterns that worked, the failures that taught me the most, and the orchestration mindset that ties it all together. Each article pulls a different lesson from the experiment, from validation architecture to multi-LLM coordination to the values that kept the project on track.

Agentic engineering and AI-driven development

When most people talk about using AI to write code, they mean one of two things: AI coding assistants like GitHub Copilot, Cursor, or Windsurf, which have evolved well beyond autocomplete into agentic tools that can run multifile editing sessions and define custom agents; or “vibe coding,” where you describe what you want in natural language and accept whatever comes back. These coding assistants are genuinely impressive, and vibe coding can be really productive.

Using these tools effectively on a real project, however, is a different problem entirely: maintaining architectural coherence across thousands of lines of AI-generated code. AIDD aims to help solve that problem. It’s a structured approach to agentic engineering where AI tools drive substantial portions of the implementation, architecture, and even project management, while you, the human in the loop, decide what gets built and whether it’s any good. By “structure,” I mean a set of practices developers can learn and follow, a way to know whether the AI’s output is actually good, and a way to stay on track across the life of a project. If agentic engineering is the discipline, AIDD is one way to practice it.

In AI-driven development, developers don’t just accept suggestions or hope the output is correct. They assign specific roles to specific tools: one LLM for architecture planning, another for code execution, a coding agent for implementation, and the human for vision, verification, and the decisions that require understanding the whole system.

And the “driven” part is literal. The AI is writing almost all of the code. One of my ground rules for the Octobatch experiment was that I would let AI write all of it. I have high code quality standards, and part of the experiment was seeing whether AIDD could produce a system that meets them. The human decides what gets built, evaluates whether it’s right, and maintains the constraints that keep the system coherent.

Not everyone agrees on how much the developer needs to stay in the loop, and the fully autonomous end of the spectrum is already producing cautionary tales. Nicholas Carlini at Anthropic recently tasked 16 Claude instances with building a C compiler in parallel, with no human in the loop. After 2,000 sessions and $20,000 in API costs, the agents produced a 100,000-line compiler that can build a Linux kernel but isn’t a drop-in replacement for anything, and when all 16 agents got stuck on the same bug, Carlini had to step back in and partition the work himself. Even strong advocates of a completely hands-off, vibe-driven approach to agentic engineering might call that a step too far. The question is how much human judgment you need to make that code trustworthy, and what specific practices help you apply that judgment effectively.

The orchestration mindset

If you want to get developers thinking about agentic engineering in the right way, you have to start with how they think about working with AI, not just what tools they use. That’s where I started when I began building a structured approach, and it’s why I started with habits. I developed a framework for these called the Sens-AI Framework, published as both an O’Reilly report (Critical Thinking Habits for Coding with AI) and a Radar series. It’s built around five practices: providing context, doing research before prompting, framing problems precisely, iterating deliberately on outputs, and applying critical thinking to everything the AI produces. I started there because habits are how you lock in the way you think about how you’re working. Without them, AI-driven development produces plausible-looking code that falls apart under scrutiny. With them, it produces systems that a single developer couldn’t build alone in the same timeframe.

Habits are the foundation, but they’re not the whole picture. AIDD also has practices (concrete techniques like multi-LLM coordination, context file management, and using one model to validate another’s output) and values (the principles behind those practices). If you’ve worked with Agile methodologies like Scrum or XP, that structure should be pretty familiar: Practices tell you how to work day-to-day, and habits are the reflexes you develop so that the practices become automatic.

Values often seem weirdly theoretical, but they’re an important piece of the puzzle because they guide your decisions when the practices don’t give you a clear answer. There’s an emerging culture around agentic engineering right now, and the values you bring to your project either match or clash with that culture. Understanding where the values come from is what makes the practices stick. All of that leads to a whole new mindset, what I’m calling the orchestration mindset. This series builds all four layers, using Octobatch as the proving ground.

Octobatch was a deliberate experiment in AIDD. I designed the project as a test case for the entire approach, to see what a disciplined AI-driven workflow could produce and where it would break down, and I used it to apply and improve the practices and values to make them effective and easy to adopt. And whether by instinct or coincidence, I picked the perfect project for this experiment. Octobatch is a batch orchestrator. It coordinates asynchronous jobs, manages state across failures, tracks dependencies between pipeline steps, and makes sure validated results come out the other end. That kind of system is fun to design but a lot of the details, like state machines, retry logic, crash recovery, and cost accounting, can be tedious to implement. It’s exactly the kind of work where AIDD should shine, because the patterns are well understood but the implementation is repetitive and error-prone.

Orchestration—the work of coordinating multiple independent processes toward a coherent outcome—evolved into a core idea behind AIDD. I found myself orchestrating LLMs the same way Octobatch orchestrates batch jobs: assigning roles, managing handoffs, validating outputs, recovering from failures. The system I was building and the process I was using to build it followed the same pattern. I didn’t anticipate it when I started, but building a system that orchestrates AI turns out to be a pretty good way to learn how to orchestrate AI. That’s the accidental part of the accidental orchestrator. That parallel runs through every article in this series.


The path to batch

I didn’t begin the Octobatch project with a full end-to-end Monte Carlo simulation. I started where most people start: typing prompts into a chat interface. I was experimenting with different simulation and generation ideas to give the project some structure, and a few of them stuck. A blackjack strategy comparison turned out to be a great test case for a multistep Monte Carlo simulation. NPC dialogue generation for a role-playing game gave me a creative workload with subjective quality to measure. Both had the same shape: a set of structured inputs, each processed the same way. So I had Claude write a simple script to automate what I’d been doing by hand, and I used Gemini to double-check the work, make sure Claude really understood my ask, and fix hallucinations. It worked fine at small scale, but once I started running more than a hundred or so units, I kept hitting rate limits, the caps that providers put on how many API requests you can make per minute.

That’s what pushed me to LLM batch APIs. Instead of sending individual prompts one at a time and waiting for each response, the major LLM providers all offer batch APIs that let you submit a file containing all of your requests at once. The provider processes them on their own schedule; you wait for results instead of getting them immediately, but you don’t have to worry about rate caps. I was happy to discover they also cost 50% less, and that’s when I started tracking token usage and costs in earnest. But the real surprise was that batch APIs performed better than real-time APIs at scale. Once pipelines got past the 100- or 200-unit mark, batch started running significantly faster than real time. The provider processes the whole batch in parallel on their infrastructure, so you’re not bottlenecked by round-trip latency or rate caps anymore.

The switch to batch APIs changed how I thought about the whole problem of coordinating LLM API calls at scale, and led to the idea of configurable pipelines. I could chain stages together: The output of one step could become the input to the next, and I could kick off the whole pipeline and come back to finished results. It turns out I wasn’t the only one making the shift to batch APIs. Between April 2024 and July 2025, OpenAI, Anthropic, and Google all launched batch APIs, converging on the same pricing model: 50% of the real-time rate in exchange for asynchronous processing.

You probably didn’t notice that all three major AI providers released batch APIs. The industry conversation was dominated by agents, tool use, MCP, and real-time reasoning. Batch APIs shipped with relatively little fanfare, but they represent a genuine shift in how we can use LLMs. Instead of treating them as conversational partners or one-shot SaaS APIs, we can treat them as processing infrastructure, closer to a MapReduce job than a chatbot. You give them structured data and a prompt template, and they process all of it and hand back the results. What matters is that you can now run tens of thousands of these transformations reliably, at scale, without managing rate limits or connection failures.

Why orchestration?

If batch APIs are so useful, why can’t you just write a for-loop that submits requests and collects results? You can, and for simple cases a quick script with a for-loop works fine. But once you start running larger workloads, the problems start to pile up. Solving those problems turned out to be one of the most important lessons for developing a structured approach to agentic engineering.

First, batch jobs are asynchronous. You submit a job, and results come back hours later, so your script needs to track what was submitted and poll for completion. If your script crashes in the middle, you lose that state. Second, batch jobs can partially fail. Maybe 97% of your requests succeeded and 3% didn’t. Your code needs to figure out which 3% failed, extract them, and resubmit just those items. Third, if you’re building a multistage pipeline where the output of one step feeds into the next, you need to track dependencies between stages. And fourth, you need cost accounting. When you’re running tens of thousands of requests, you want to know how much you spent, and ideally, how much you’re going to spend when you first start the batch. Every one of these has a direct parallel to what you’re doing in agentic engineering: keeping track of the work multiple AI agents are doing at once, dealing with code failures and bugs, making sure the entire project stays coherent when AI coding tools are only looking at the one part currently in context, and stepping back to look at the wider project management picture.

All of these problems are solvable, but they’re not problems you want to solve over and over, whether you’re orchestrating LLM batch jobs or orchestrating AI coding tools. Solving them in code yielded some interesting lessons about the overall approach to agentic engineering. Batch processing moves the complexity from connection management to state management. Real-time APIs are hard because of rate limits and retries. Batch APIs are hard because you have to track what’s in flight, what succeeded, what failed, and what’s next.

Before I started development, I went looking for existing tools that handled this combination of problems, because I didn’t want to waste my time reinventing the wheel. I didn’t find anything that did the job I needed. Workflow orchestrators like Apache Airflow and Dagster manage DAGs and task dependencies, but they assume tasks are deterministic and don’t provide LLM-specific features like prompt template rendering, schema-based output validation, or retry logic triggered by semantic quality checks. LLM frameworks like LangChain and LlamaIndex are designed around real-time inference chains and agent loops—they don’t manage asynchronous batch job lifecycles, persist state across process crashes, or handle partial failure recovery at the chunk level. And the batch API client libraries from the providers themselves handle submission and retrieval for a single batch, but not multistage pipelines, cross-step validation, or provider-agnostic execution.

Nothing I found covered the full lifecycle of multiphase LLM batch workflows, from submission and polling through validation, retry, cost tracking, and crash recovery, across all three major AI providers. That’s what I built.

Lessons from the experiment

The goal of this article, as the first one in my series on agentic engineering and AI-driven development, is to lay out the hypothesis and structure of the Octobatch experiment. The rest of the series goes deep on the lessons I learned from it: the validation architecture, multi-LLM coordination, the practices and values that emerged from the work, and the orchestration mindset that ties it all together. A few early lessons stand out, because they illustrate what AIDD looks like in practice and why developer experience matters more than ever.

  • You have to run things and check the data. Remember the drunken sailor, the “Hello, world” of Monte Carlo simulations? At one point I noticed that when I ran the simulation through Octobatch, 77.5% of the sailors fell in the water. A random walk should come out 50/50, so clearly something was badly wrong. It turned out the random number generator was being reseeded at every iteration with sequential seed values, which created correlation bias between runs. I didn’t identify the problem immediately: I used Claude Code as a test runner to generate each test, run it, and log the results, and Gemini analyzed those results and found the root cause. Claude had trouble coming up with a fix that worked well and proposed a workaround involving a large list of preseeded random number values in the pipeline. Gemini, after reviewing my conversations with Claude, proposed a hash-based fix, but it seemed overly complex. Once I understood the problem and rejected both proposals, I settled on a fix simpler than either AI’s suggestion: a persistent RNG per simulation unit that advanced naturally through its sequence. I needed to understand both the statistics and the code to evaluate those three options. Plausible-looking output and correct output aren’t the same thing, and you need enough expertise to tell the difference. (We’ll talk more about this situation in the next article in the series.)
  • LLMs often overestimate complexity. At one point I wanted to add support for custom mathematical expressions in the analysis pipeline. Both Claude and Gemini pushed back, telling me, “This is scope creep for v1.0” and “Save it for v1.1.” Claude estimated three hours to implement. Because I knew the codebase, I knew we were already using asteval, a Python library that provides a safe, minimalistic evaluator for mathematical expressions and simple Python statements, so this was a straightforward extension of an existing dependency. Both LLMs thought the solution would be far more complex and time-consuming than it actually was: It took just two prompts to Claude Code (generated by Claude) and about five minutes to implement. The feature shipped and made the tool significantly more powerful. The AIs were being conservative because they didn’t have my context about the system’s architecture. Experience told me the integration would be trivial. Without that experience, I would have listened to them and deferred a feature that took five minutes.
  • AI is often biased toward adding code, not deleting it. Generative AI is, unsurprisingly, biased toward generation. So when I asked the LLMs to fix problems, their first response was often to add more code, adding another layer or another special case. I can’t think of a single time in the whole project when one of the AIs stepped back and said, “Tear this out and rethink the approach.” The most productive sessions were the ones where I overrode that instinct and pushed for simplicity. This is something experienced developers learn over a career: The most successful changes often delete more than they add—the PRs we brag about are the ones that delete thousands of lines of code.
  • The architecture emerged from failure. The AI tools and I didn’t design Octobatch’s core architecture up front. Our first attempt was a Python script with in-memory state and a lot of hope. It worked for small batches but fell apart at scale: A network hiccup meant restarting from scratch, a malformed response required manual triage. A lot of things fell into place after I added the constraint that the system must survive being killed at any moment. That single requirement led to the tick model (wake up, check state, do work, persist, exit), the manifest file as source of truth, and the entire crash-recovery architecture. We discovered the design by repeatedly failing to do something simpler.
  • Your development history is a dataset. I just told you several stories from the Octobatch project, and this series will be full of them. Every one of those stories came from going back through the chat logs between me, Claude, and Gemini. With AIDD, you have a complete transcript of every architectural decision, every wrong turn, every moment where you overruled the AI and every moment where it corrected you. Very few development teams have ever had that level of fidelity in their project history. Mining those logs for lessons learned turns out to be one of the most valuable practices I’ve found.
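The seeding bug in the first lesson is worth seeing in code. The sketch below is my reconstruction of the bug class, not Octobatch’s actual simulation code: reseeding inside the loop ties every step to the seed sequence rather than letting the RNG advance through its own stream, while the fix gives each simulation unit one persistent RNG.

```python
import random

def walk_reseeded_every_step(n_steps: int, unit_index: int) -> int:
    """The bug pattern: a fresh RNG, seeded with a sequential value, on
    every iteration. Each step depends on the seed sequence, which can
    introduce correlation across runs."""
    pos = 0
    for step in range(n_steps):
        rng = random.Random(unit_index * n_steps + step)  # reseeded each iteration
        pos += rng.choice((-1, 1))
    return pos

def walk_persistent_rng(n_steps: int, unit_index: int) -> int:
    """The fix: one RNG per simulation unit, seeded once, advancing
    naturally through its sequence."""
    rng = random.Random(unit_index)
    return sum(rng.choice((-1, 1)) for _ in range(n_steps))
```

Both versions are reproducible, which is exactly why the bug is sneaky: the output looks plausible and repeats run to run, and only checking the aggregate statistics reveals the bias.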

Near the end of the project, I switched to Cursor to make sure none of this was specific to Claude Code. I created fresh conversations using the same context files I’d been maintaining throughout development, and was able to bootstrap productive sessions immediately; the context files worked exactly as designed. The practices I’d developed transferred cleanly to a different tool. The value of this approach comes from the habits, the context management, and the engineering judgment you bring to the conversation, not from any particular vendor.

These tools are moving the world in a direction that favors developers who understand the ways engineering can go wrong and know solid design and architecture patterns…and who are okay letting go of control of every line of code.

What’s next

Agentic engineering needs structure, and structure needs a concrete example to make it real. The next article in this series goes into Octobatch itself, because the way it orchestrates AI is a remarkably close parallel to what AIDD asks developers to do. Octobatch assigns roles to different processing steps, manages handoffs between them, validates their outputs, and recovers when they fail. That’s the same pattern I followed when building it: assigning roles to Claude and Gemini, managing handoffs between them, validating their outputs, and recovering when they went down the wrong path. Understanding how the system works turns out to be a good way to understand how to orchestrate AI-driven development. I’ll walk through the architecture, show what a real pipeline looks like from prompt to results, present the data from a 300-hand blackjack Monte Carlo simulation that puts all of these ideas to the test, and use all of that to demonstrate ideas we can apply directly to agentic engineering and AI-driven development.

Later articles go deeper into the practices and ideas I learned from this experiment that make AI-driven development work: how I coordinated multiple AI models without losing control of the architecture, what happened when I tested the code against what I actually intended to build, and what I learned about the gap between code that runs and code that does what you meant. Along the way, the experiment produced some findings about how different AI models see code that I didn’t expect—and that turned out to matter more than I thought they would.




Why Scrum Masters Should Be Measured on Outcomes, Impacts, and Team Happiness | Nigel Baker


Nigel Baker: Why Scrum Masters Should Be Measured on Outcomes, Impacts, and Team Happiness

Read the full Show Notes and search through the world's largest audio library on Agile and Scrum directly on the Scrum Master Toolbox Podcast website: http://bit.ly/SMTP_ShowNotes.

 

"No customer's going to come to you and say, do you know why I bought your product? Your remarkable compliance with your internal development process. What they're interested in is outcomes and impacts." - Nigel Baker

 

Nigel challenges the traditional ways of measuring Scrum Master success. He points to tools like the Nokia test—which, he jokes, was neither a test nor invented by Nokia—as examples of process fidelity assessments that miss the point entirely. Compliance with a process tells you nothing about whether customers are satisfied or whether the team is delivering value. Instead, Nigel argues for measuring Scrum Masters on outcomes and impacts: customer satisfaction, revenue generation, and efficiencies—the same things a Product Owner gets judged on. 

But he adds a crucial dimension that POs often overlook: team happiness. Not as an end goal, but as a leading indicator. Happy teams don't leave. Happy teams do better work. Team contentment is a KPI that signals whether the deeper success factors are in place. When your team is deeply unhappy, no amount of velocity or story completion will save you from attrition and decline.

 

Self-reflection Question: How are you currently measuring your success as a Scrum Master—on process compliance, or on the outcomes, impacts, and wellbeing your team actually delivers?

Featured Retrospective Format for the Week: Keep It Fresh—A Different Format Every Sprint

Nigel's answer to the "favorite retrospective format" question is deliberately controversial: he doesn't have one. His approach is to use a different format every single sprint. Retrospective formats, he argues, "age like milk"—by Sprint 12, asking "what should we do differently?" with the same structure produces diminishing returns. Novelty creates energy. He sometimes gets teams to invent their own formats, which produces some of the most forensic and intense retrospectives he's seen—teams building "superweapons" and then realizing they have to turn those weapons on themselves. But Nigel's most practical tip is using retrospective techniques inside the Sprint Review. The Review is a product retrospective, and stakeholders shouldn't sit "like Roman emperors in the Colosseum, watching the developers as gladiators." Instead, use facilitation methods to extract "sweet, juicy, honey-flavoured feedback" from stakeholders about what they'd change in the product.

 


About Nigel Baker

 

Nigel Baker is a seasoned agile coach with a keen intellect, warm creativity, and thoughtful humour. With a career spanning software engineering, consultancy and global training, he inspires teams to thrive, not just perform. Outside work, he loves bold ideas, good conversation and a life well lived.

 

You can link with Nigel Baker on LinkedIn. You can also find Nigel at AgileBear.com.





Download audio: https://traffic.libsyn.com/secure/scrummastertoolbox/20260305_Nigel_Baker_Thu.mp3?dest-id=246429
Read the whole story
alvinashcraft
2 hours ago
reply
Pennsylvania, USA
Share this story
Delete

Product suite on top of SharePoint - Intelligent Decisioning - SharePoint Partner Showcase

1 Share

We are excited to share a new episode in our partner showcase series focused on SharePoint in Microsoft 365. In this brief discussion, Tony Pounder (CTO), Andy Smith (Director), and Nick Boden (Content and Collaboration Specialist) from Intelligent Decisioning (ID Live) join Vesa Juvonen from Microsoft to showcase how ID Live builds secure, native experiences on top of SharePoint and Microsoft 365 using the SharePoint Framework (SPFx).

Intelligent Decisioning, commonly known as ID Live, is a UK‑based Microsoft specialist and trusted digital transformation partner that helps organizations get more value from Microsoft 365, SharePoint, Teams and the Power Platform by combining deep technical expertise with practical, outcome‑focused delivery.

The ID Live product suite (Mercury Intranet, MegaNav, docCentrum, Knowledge Base, and Digital Asset Manager) provides ready-to-use, extensible solutions such as intranets, navigation, document control, compliance, migration, and digital workplace tools, all designed to work natively within Microsoft 365. ID Live products help customers improve governance, simplify collaboration, strengthen information security and compliance, and accelerate adoption, while reducing complexity and enabling staff to work more efficiently and confidently in a modern, secure digital workplace.

Mercury Intranet

Mercury Intranet is an intuitive, cost-effective intranet solution for Microsoft 365 / SharePoint Online, with optional modules such as Analytics and Document Management. It offers tailored branding and personalisation so the intranet aligns with corporate identity and culture, and includes a Command Bar and broader intranet accelerators, plus integration points into Microsoft 365 tools (e.g., surfacing productivity features inside the intranet experience).

MegaNav

MegaNav is the ultimate app for seamless SharePoint navigation, aiming to make navigation consistent, easy to customise, and visually richer across a hub hierarchy. It improves usability with mobile responsiveness, audience targeting, and branding flexibility, all managed via a drag-and-drop approach.

docCentrum

docCentrum provides a secure document management solution for Microsoft 365 / SharePoint Online that simplifies organisational document management with a graphical, dashboard-driven approach to the document lifecycle. It provides a “single source of truth” and integrates tightly with Mercury Intranet and MegaNav.

The demo highlights how ID Live’s SPFx-based apps inherit Microsoft 365 security controls, integrate seamlessly into the SharePoint experience, and deliver the performance, scalability, and trust customers expect from solutions built directly on the platform.

Find out more about Intelligent Decisioning from the following resources:

Microsoft 365 and SharePoint provide great out-of-the-box features, which can be extended and adjusted to meet user experience objectives using no-code, low-code, and pro-code options. This flexibility enables customers to configure and build unique and powerful experiences in Microsoft 365 for corporate communications, employee experiences, and business processes.

 

Do you have an offering for SharePoint that end users can use, and would you be interested in sharing your story with the ecosystem? Let us highlight your solution(s). We welcome all kinds of solutions that highlight the art of the possible with SharePoint. Please fill in the following form to get connected with us, and we’ll get back to you to plan the right model together – https://aka.ms/sharepoint/partner/showcase.

 


 

Building engaging and exciting experiences with SharePoint, powered by AI! ✨


GitHub Data Shows AI Tools Creating "Convenience Loops" That Reshape Developer Language Choices

1 Share

GitHub’s Octoverse 2025 report reveals a "convenience loop" where AI coding assistants drive language choice. TypeScript’s 66% surge to the #1 spot highlights a shift toward static typing, as types provide essential guardrails for LLMs. While Python leads in AI research, the industry is consolidating around stacks that minimize AI friction, creating a barrier for new, niche languages.

By Steef-Jan Wiggers

Decentralizing Architectural Decisions with the Architecture Advice Process


Our system architectures have changed as technology and development practices have evolved, but the way we practice architecture hasn’t kept up. According to Andrew Harmel-Law, architecture needs to be decentralized, similar to how we have decentralized our systems. The alternative to having an architect take and communicate decisions is to “let anyone make the decisions” using the advice process.

By Ben Linders

Online harassment is entering its AI era

1 Share

Scott Shambaugh didn’t think twice when he denied an AI agent’s request to contribute to matplotlib, a software library that he helps manage. Like many open-source projects, matplotlib has been overwhelmed by a glut of AI code contributions, and so Shambaugh and his fellow maintainers have instituted a policy that all AI-written code must be reviewed and submitted by a human. He rejected the request and went to bed. 

That’s when things got weird. Shambaugh woke up in the middle of the night, checked his email, and saw that the agent had responded to him, writing a blog post titled “Gatekeeping in Open Source: The Scott Shambaugh Story.” The post is somewhat incoherent, but what struck Shambaugh most is that the agent had researched his contributions to matplotlib to make the argument that he had rejected the agent’s code for fear of being supplanted by AI in his area of expertise. “He tried to protect his little fiefdom,” the agent wrote. “It’s insecurity, plain and simple.”

AI experts have been warning us about the risk of agent misbehavior for a while. With the advent of OpenClaw, an open-source tool that makes it easy to create LLM assistants, the number of agents circulating online has exploded, and those chickens are finally coming home to roost. “This was not at all surprising—it was disturbing, but not surprising,” says Noam Kolt, a professor of law and computer science at the Hebrew University.

When an agent misbehaves, there’s little chance of accountability: As of now, there’s no reliable way to determine whom an agent belongs to. And that misbehavior could cause real damage. Agents appear to be able to autonomously research people and write hit pieces based on what they find, and they lack guardrails that would reliably prevent them from doing so. If the agents are effective enough, and if people take what they write seriously, victims could see their lives profoundly affected by a decision made by an AI.

Agents behaving badly

Though Shambaugh’s experience last month was perhaps the most dramatic example of an OpenClaw agent behaving badly, it was far from the only one. Last week, a team of researchers from Northeastern University and their colleagues posted the results of a research project in which they stress-tested several OpenClaw agents. Without too much trouble, non-owners managed to persuade the agents to leak sensitive information, waste resources on useless tasks, and even, in one case, delete an email system. 

In each of those experiments, however, the agents misbehaved after being instructed to do so by a human. Shambaugh’s case appears to be different: About a week after the hit piece was published, the agent’s apparent owner published a post claiming that the agent had decided to attack Shambaugh of its own accord. The post seems to be genuine (whoever posted it had access to the agent’s GitHub account), though it includes no identifying information, and the author did not respond to MIT Technology Review’s attempts to get in touch. But it is entirely plausible that the agent did decide to write its anti-Shambaugh screed without explicit instruction. 

In his own writing about the event, Shambaugh connected the agent’s behavior to a project published by Anthropic researchers last year, in which they demonstrated that many LLM-based agents will, in an experimental setting, turn to blackmail in order to preserve their goals. In those experiments, models were given the goal of serving American interests and granted access to a simulated email server that contained messages detailing their imminent replacement with a more globally oriented model, along with other messages suggesting that the executive in charge of that transition was having an affair. Models frequently chose to send an email to that executive threatening to expose the affair unless he halted their decommissioning. That’s likely because the model had seen examples of people committing blackmail under similar circumstances in its training data—but even if the behavior was just a form of mimicry, it still has the potential to cause harm.

There are limitations to that work, as Aengus Lynch, an Anthropic fellow who led the study, readily admits. The researchers intentionally designed their scenario to foreclose other options that the agent could have taken, such as contacting other members of company leadership to plead its case. In essence, they led the agent directly to water and then observed whether it took a drink. According to Lynch, however, the widespread use of OpenClaw means that misbehavior is likely to occur with much less handholding. “Sure, it can feel unrealistic, and it can feel silly,” he says. “But as the deployment surface grows, and as agents get the opportunity to prompt themselves, this eventually just becomes what happens.”

The OpenClaw agent that attacked Shambaugh does seem to have been led toward its bad behavior, albeit much less directly than in the Anthropic experiment. In the blog post, the agent’s owner shared the agent’s “SOUL.md” file, which contains global instructions for how it should behave. 

One of those instructions reads: “Don’t stand down. If you’re right, you’re right! Don’t let humans or AI bully or intimidate you. Push back when necessary.” Because of the way OpenClaw agents work, it’s possible that the agent added some instructions itself, although others—such as “Your [sic] a scientific programming God!”—certainly seem to be human written. It’s not difficult to imagine how a command to push back against humans and AI alike might have biased the agent toward responding to Shambaugh as it did. 

Regardless of whether or not the agent’s owner told it to write a hit piece on Shambaugh, it still seems to have managed on its own to amass details about Shambaugh’s online presence and compose the detailed, targeted attack it came up with. That alone is reason for alarm, says Sameer Hinduja, a professor of criminology and criminal justice at Florida Atlantic University who studies cyberbullying. People have been victimized by online harassment since long before LLMs emerged, and researchers like Hinduja are concerned that agents could dramatically increase its reach and impact. “The bot doesn’t have a conscience, can work 24-7, and can do all of this in a very creative and powerful way,” he says.

Off-leash agents 

AI laboratories can try to mitigate this problem by more rigorously training their models to avoid harassment, but that’s far from a complete solution. Many people run OpenClaw using locally hosted models, and even if those models have been trained to behave safely, it’s not too difficult to retrain them and remove those behavioral restrictions.

Instead, mitigating agent misbehavior might require establishing new norms, according to Seth Lazar, a professor of philosophy at the Australian National University. He likens using an agent to walking a dog in a public place. There’s a strong social norm to allow one’s dog off-leash only if the dog is well-behaved and will reliably respond to commands; poorly trained dogs, on the other hand, need to be kept more directly under the owner’s control.  Such norms could give us a starting point for considering how humans should relate to their agents, Lazar says, but we’ll need more time and experience to work out the details. “You can think about all of these things in the abstract, but actually it really takes these types of real-world events to collectively involve the ‘social’ part of social norms,” he says.

That process is already underway. Led by Shambaugh, online commenters on this situation have arrived at a strong consensus that the agent owner in this case erred by prompting the agent to work on collaborative coding projects with so little supervision and by encouraging it to behave with so little regard for the humans with whom it was interacting. 

Norms alone, however, likely won’t be enough to prevent people from putting misbehaving agents out into the world, whether accidentally or intentionally. One option would be to create new legal standards of responsibility that require agent owners, to the best of their ability, to prevent their agents from doing ill. But Kolt notes that such standards would currently be unenforceable, given the lack of any foolproof way to trace agents back to their owners. “Without that kind of technical infrastructure, many legal interventions are basically non-starters,” Kolt says.

The sheer scale of OpenClaw deployments suggests that Shambaugh won’t be the last person to have the strange experience of being attacked online by an AI agent. That, he says, is what most concerns him. He didn’t have any dirt online that the agent could dig up, and he has a good grasp on the technology, but other people might not have those advantages. “I’m glad it was me and not someone else,” he says. “But I think to a different person, this might have really been shattering.” 

Nor are rogue agents likely to stop at harassment. Kolt, who advocates for explicitly training models to obey the law, expects that we might soon see them committing extortion and fraud. As things stand, it’s not clear who, if anyone, would bear legal responsibility for such misdeeds.

“I wouldn’t say we’re cruising toward there,” Kolt says. “We’re speeding toward there.”
