Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
155617 stories
·
33 followers

Generative AI in the Real World: Agentic Systems Fundamentals with Maarten Grootendorst

1 Share

BERTopic creator and Google DeepMind developer relations engineer Maarten Grootendorst has spent years helping practitioners build intuition for how AI systems actually work—not just how to prompt them. Maarten joined Ben Lorica to cover the enduring relevance of embeddings and topic models in an LLM-dominated world, his hot take that agents are essentially just an “LLM in a for loop with some tools, some memory, and perhaps some guardrails,” and what separates genuine agentic behavior from a well-constructed pipeline. They also get into the practical trade-offs between open weight and proprietary models, the future of state space models and attention, and why Maarten worries that a generation of builders shipping code they can’t read may be storing up technical debt they can’t repay. “If you don’t really know how an LLM works,” he says, “that intuition [about how to use it effectively] is much more difficult to develop.”

About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2026, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

Check out other episodes of this podcast on the O’Reilly learning platform or follow us on YouTube, Spotify, Apple, or wherever you get your podcasts.

Transcript

This transcript was created with the help of AI and has been lightly edited for clarity.

0.50 
All right. So today we have Maarten Grootendorst. He is a developer relations engineer at Google DeepMind, and he is also the coauthor of two O’Reilly books, Hands-On Large Language Models and An Illustrated Guide to AI. And so, Maarten, welcome to the podcast.

01.10
Thank you. It’s wonderful to be here.

01.12 
So, I had you on the podcast—I was looking at it earlier this morning—August 2022, a few months before ChatGPT was released. 

01.23
It’s been a while. [laughs]

01.25
Yeah. Back then, what I wanted to talk to you about was, I was a user of your BERTopic library. For listeners who are not familiar, BERTopic was kind of a marriage between the transformer approach with topic modeling and Maarten wrote one of the more popular libraries for doing that. Actually, what’s happened to this whole topic of topic models?

01.58
Oh, yeah. I think it’s still going strong. You mentioned ChatGPT. So a lot of people say, “OK, just use that for topic modeling.” You can. It’s just very difficult to make sure you get a more structured, standardized output rerun thing, especially if [you have] millions of potential documents. And you can still use that on top of that. It’s still my baby of sorts, right? I mean, it’s been four years since we talked, and. . . I love working on that. I don’t have that much time to do it anymore, but it’s great.

02.36
Yeah. So I think one of the things that these large language models have done is kind of, I guess, cast by the wayside some of these earlier approaches for really wading through a lot of text. Unfortunately, I think people, as you mentioned, are trying to prompt their way into a topic model. But I think topic models themselves are still very useful. So one question to you, Maarten. What’s the level of usage of BERTopic now compared to when we talked?

03.13
It’s only grown since then.

03.17
Really?

03.18
Yeah. It surprised me too. [laughs] I think it’s because it’s easy to use. I did some, I think, cool tricks in there, but other than that, I think the main benefit was mostly just a nice user experience. And that helps people use something for a very specific task instead of trying to prompt your way towards something that might or might not work, and you still have to iterate over that. It just works out of the box. It’s not perfect. Nothing is. It’s not a free lunch. But yeah, I think that’s it.

03.55
One thing that’s happened, of course, is that this whole area of AI and NLP has gotten so democratized that. . . When we talked, I think the people who were using BERTopic at least had some notion of what NLP was and what text mining was, right? I would imagine now, in your role as a developer relations person, you encounter a lot of people who don’t come from a data science or ML background. And so they have no clue what topic models are, I would imagine.

04.34
Yeah, many don’t. It’s very interesting to see because you mentioned NLP and text mining and, well, [they’re] completely outdated terms now for some reason. It’s all AI. Let’s just call it AI and be done with it. [laughs] That’s not necessarily a bad thing, don’t get me wrong. It’s just very interesting to see how the field has evolved, but that also means that people don’t really look towards these “older techniques” that still drive much of the adoption of newer stuff.

Sometimes it feels like that, you know, AI and LLMs. . . It’s a hammer and we’re looking for nails to actually use it instead of, “OK, but we have packages for very specific things, and you can use LLMs on top of that.” You don’t have to. But it requires a bit of education on that end, because like you mentioned, a lot of people new to the field, you have to explain, “What are embeddings? What is clustering?” It’s also very interesting to see that even something like that needs to be explained a little bit in more detail. It’s a nice opportunity for me to explain stuff. I like doing that.

05.48
And the key here is that because a lot of people are entering this field and building things and they don’t necessarily know the prior art, so to speak, it seems like they might be leaving a lot of things on the table. Right? So in terms of, here’s my text or my data, I am just going to prompt and I think that I got everything out of it, but that’s not really the case for the most part.

06.24
No. Definitely not. There’s so many things that you can do with these systems, whether it’s on the LLM side or the agentic side or the topic modeling side. If you just know a little bit more on what’s going on under the hood then that helps you understand “When do I prompt? When do I not prompt? What’s going wrong?” That feeling, that intuition. You don’t just get it with building. Building’s very important, but if you don’t really know how an LLM works, that intuition is much more difficult to develop.

06.59
Which brings me to your two books, which are fantastic, which I think go a long way into helping people get that foundation. But let’s face it, a lot of people, Maarten. . . So let’s take your earlier book with Jay [Alammar], which is Hands-On Large Language Models. A lot of people may say, “I don’t have time to read this whole book.” So for someone who is a developer, doesn’t have a data science or ML background, what would be the most important concepts for large language models? Drill down on these three or four concepts that will set you up for success.

07.49 
From the top of my head, those are chapters two and three. So buy the book now. [laughs] I’m just kidding. Tokens. Super underappreciated.

08.03
Which now is a big topic because, as I joke, the CFO has now become the CTO, the chief token officer.

08.11
I didn’t know that one. That’s amazing. I’m gonna use it. But, yeah, tokens are now the thing, right? It’s what LLMs use to see the world, so to say—to interpret the world. And it’s how they communicate with the world. So it’s really important to know what tokens are. It helps you get into the realm of embeddings, which I still think is super fundamental to so many things we do.

And the second part is kind of an obvious one, but the attention mechanism, “Oh, wow. Why are these things so strong? What makes them so special?” Attention is an obvious one. We have other things like Mamba, recurrent neural networks, but it all starts from attention. So if you’re completely new to this field, those two. Yeah.

08.58 
Let’s take the topic of embeddings. I think at least that topic, Maarten, some people have had to play around with it, right? Because when LLMs first came online, the “Hello, World!” example was RAG, and one of the knobs that people were tuning was embedding, obviously chunking, so the information extraction, the search and retrieval—they’re all important. But one thing that people immediately tried to play around with was embeddings because they could go to places like Hugging Face:
Hey, let me try these four different embeddings.” Do you find that embeddings have a special place in that more people play around with embeddings and have some rudimentary understanding of embeddings?

09.50 
I have a sweet spot for embeddings because it’s the main part of BERTopic. But I think it’s so fundamental to so many things that we do in this field. Even things like RAG—which some people think is outdated. It actually isn’t. It’s very much alive and still kicking—runs on embeddings and understanding how they work will also help you understand how LLMs work. And it can be used in so many different ways. 

Sometimes we’re looking for bigger embedding models, more contextualized information. Great. [They] have their own purposes. And there are now certain parties focusing a little bit more on these static embeddings that are super fast and quick, like the old school embeddings that we used to have, and now in a new form that can be used in conjunction with coding agents to quickly search through repos and find the information that they’re looking for. Much of what we do is still search, and search revolves in big part on embeddings. And it’s just nice when you have text that you have one numerical representation for it—just that gives you so many opportunities to do so many cool things. . .

11.18
So when you’re trying to convince someone, Maarten, that “Hey, you should learn more about embeddings, because they’re important,” is there a canonical example that you use to say, “Hey, look, if you just understood embeddings and you made this one decision, look at the change in your application.” Is there a canonical example that you go to?

11.40
Oh, yeah, I love the question, but I don’t think I have an answer to that. Because, OK, so I’m a psychologist and I really like to say “it depends on,” and here it kind of depends on the application that you’re running, obviously. Contextualized versus noncontextualized embeddings is a very interesting example because the contextualized ones are generally larger. But there’s larger transformer-like models that require a lot of compute to run. So you can see the latency actually appearing in your search engines. Or if you connect your coding agent to one of those, it slows down because, you know, it needs to wait for the search compared to the faster static ones, for instance, like Model2Vec and stuff like that, which are tremendously fast. So amazing for those use cases, not that performance because they’re way smaller, obviously. And it’s these use cases where the building does get you a lot of intuition about when to use what instead of relaying that decision only to an agent. You’re still the one that needs to have the feeling, that gut feeling, to say this works better for my use case.

13.03
But I would say the reality is that people will go to some leaderboard.

13.09 
Yeah. That’s just the way it is.

13.13
So there we go. OK. So in this leaderboard here are the top 10. In this top 10, there’s some that look larger than the others. So I’ll try three or four of varying sizes. Is that a fair characterization of what normally happens?

13.32
Yeah that’s even what I always did. Just you know, top of the leaderboard, pick one or two. But then as you are more experienced with picking one, what about multilinguality? I’m Dutch. There aren’t that many very good Dutch embedding models—big problem there. There are things like matryoshka embeddings, where they’re embedding one embedding model, but they generate embeddings of different sizes for different purposes, which is also very interesting. So there’s all these types of small decisions and nuances that you can make. And we now have instruction-tuned embeddings, where you prefix it with an instruction that you want an embedding for clustering or for classification or for what have you. And then you suddenly see the nuances in selecting something.

14.27
So on the attention mechanism, again, I will play the role of someone who has no time. I don’t have time to read the chapter, Maarten. What are one to three things I should know about the attention mechanism?

14.44 
I think the most important thing about the attention mechanism is it contextualizes information. That’s by far the most important thing. When you look at the world before attention and after, it’s a little bit less black-and-white, obviously, but it puts stuff into context. You know, if you have the word “bank,” is it the bank of a river or a financial bank? And as we talk now with each other, there’s a lot of contextual stuff going on. You need to interpret what I’m saying, because if you only focus on what I say, you don’t know that that was actually a question beforehand that drives my answer. And I think that’s what makes attention so special. It tries to look at the entire thing instead of individual tokens or words.

15.34
Playing devil’s advocate, so you just explained it to me. Why do I have to learn more than that? [laughs]

15.40
Always learn more. [laughs]

15.44
Yeah, yeah, yeah. So you mentioned Mamba and the state space models. There was some excitement around them. So maybe give our listeners a high-level description of what these state space models are and what their current status is in the wild in terms of actual practical usage.

16.08 
State space models are a completely different way of approaching this attention mechanism, right? It almost does away with it and replaces it with something that is much, much faster. It’s a very complex and highly technical subject, so I don’t want to go too into that because it’s really confusing. [laughs]

So what you see happening is that people replace attention mechanisms. So you have a decoder and LLM, and it has several stacks of attention mechanism normally. What you can do is you can remove half of them with the very quick state space models that help speed up the inference—because that’s what we’re mostly bound now by, is inference speeds. People want more, more tokens. So it needs to be faster. So it’s, it’s a way to make it quicker.

17.13
Yeah. And so what is the actual implementation or adoption of state space models right now?

17.21
Mostly hybrid models. Models, stats, interleave the attention blocks, the decoder blocks with Mamba blocks as a way to make it faster, where some do it with, for example, local attention and global attention—one is more compute-intensive than others. Mamba is a way to do something similar, as a way to speed up that inference.

17.51
Your latest book is about agents: An Illustrated Guide to AI Agents. Before we dive in, in your mind, what makes a system truly agentic? In other words, before we started bandying around the word “agents,” people were using the term “robotic process automation” or something like that. So in your mind, what makes a system agentic?

18.22 
That’s actually been one of the more complex topics for us to actually describe, because the field has been changing so quickly. And what is fundamentally an agent when they change it every two months? It’s a little bit of a hot take, but I really do think that an agent is an LLM in a for loop with some tools, some memory, and perhaps some guardrails. And that really is essentially all it boils down to at its base.

18.55
You just described the harness basically. The hot term right now is harness engineering. So what is the real progress and what is just marketing when it comes to agents?

19.19 
Yeah, I agree very much with what you imply here because agents sound so cool, and they are cool, but the moment you give an LLM complete freedom, no constraints, just go off and do your stuff, it will fail horribly, horribly, horribly. Agents still need. . . And we can call them guardrails, but you can call them something else. They need direction. They need to be constrained a little bit in the things that they do. So yes, agents, there’s a lot of hype around that. I’m not a big fan of hype. It is what it is. But there are a lot of cool use cases for it because there’s a reason why coding agents are now the big thing. I’m using them myself daily because they make my life easier. But when we look at other use cases, we’re so early in AI progress. Yeah, coding works very nicely. But to ask an agent to book a vacation for me. Yeah. No.

20.35
It seems like that example of “I want to go on a trip. This trip will involve staying in five countries. And I want you to pick the best hotel for every country.” always was kind of the demo even during the robotic process automation. And as you alluded to, I don’t think we can do it quite yet. So here’s another family of agents, Maarten, that a lot of people are using now: deep research agents. Would you consider deep research an agent?

21.15
Maybe. It kind of depends on how it’s implemented. It depends. I’m sorry. I’m going to do that a couple of times, but. . . You can make it very structured, where you say, “OK, do the search on the archive, read the abstracts, make a summary. That’s it.” That’s not really. . .

21.38
It fits into your description in that you’re prompting an LLM. The LLM goes on a for loop where it uses as tools a search index, a knowledge graph. . .

21.53
Fair enough. Yeah. It makes the decision on its own when to use a tool, why to use a tool. Whereas you can also put it in a pipeline where you specifically say, “I always want you to do steps one, two, and three.” And an agent might decide to say, “OK, I’m going to do step 3, 3, 1, 2, 1, 3.” Decide on its own when and where to use specific tools. I think that’s maybe the best distinction you can make on what is and what isn’t an agent.

22.26
And then I guess it depends on the implementation, as you mentioned. But memory could also fill a role there, especially. . . Let’s say I’m using only one service—Google or Perplexity. Maybe it remembers over time what my preferences are. I don’t know if they actually implement it that way. But there’s potentially that aspect.

22.53
So how we phrase it in the book at least, we say, “OK, an agent is a reasoning LLM that has access to planning, tools, and memory,” because there’s no such thing as an agent that goes off and does three steps of something only to forget what the previous steps were. So I think memory is maybe a little bit underappreciated in the realm of agents, because imagine it has to go through an entire codebase and translate it from Python to C++ or Rust or what have you. It’s a very common example of things people want to do. That requires hundreds of steps to do, because it’s potentially a large codebase. How does it remember what it did when it did what, what the current state is, what what’s changed, etc., etc.? And you can write that in a Markdown file. That’s nice, but it also needs to understand, “OK, what’s the trajectory that I went through?” And you can do a lot of cool stuff with that trajectory, because that’s essentially the memory of an agent.

24.02
In your role in developer relations, I assume you talk to a lot of people who work in different companies. We’ve mentioned coding agents; we mentioned deep research. So what are some of the more common agents that people are building? They could be internal or external facing. So what are some of the more common agent types, I guess, that people are building?

24.29
Aside from the obvious, it depends on the industry. I do see coding agents actually being done quite a bit internally. Just trying to see how they can prevent data from being leaked elsewhere. Because a lot of processes now are very privacy sensitive. I came from healthcare before I joined DeepMind. And what you see in these kinds of fields is that, especially in Europe. . .

25.06
I imagine if you’re in finance in a hedge fund. . .

25.09
So yeah, same. . . And these are situations wherein people focus a lot on privacy and making sure that everything’s constrained within their environments. And you see a lot of people playing around with LLMs and then using harnesses—can be Hermes but also [taking] a more foundational agent and build[ing] stuff around that. Or the larger organizations that, well, just use whatever cloud offering there is and use an agent there. We’re so at the beginning of all of this. [laughs]

25.50
For me, the area where I see it being used—and this is not going to be a surprise to our listeners—is still the technical team bucket, which would be DevOps, data engineering, platform engineering. . . They’re building agents to help them do the work. But you might be interacting with a large website, and in the background, there’s a bunch of agents doing a lot of heavy lifting, moving data around for you to get the answer you want or whatever, or internal processes. But DevOps, I think they’re starting to build their own agents. I think, data engineering for pipelines, they’re building their own agents. I would imagine the people in security teams are also building agents because they have to go through lots of log files and. . .

26.55
A question for you then: Are they building agents, as in, you know, fully an agent, or are they building skills? Because I’ve seen a lot of people more focusing on creating skills and giving that to whatever agent is available. Or do you also see a lot of people actually building agents from scratch?

27.17
I think internally there are people who are building what we would consider agents in the sense that it would do a huge chunk of their normal work and they interact with it with prompting, but maybe they don’t consider it completely autonomous. So in the sense that many people who use coding agents, at least, the ones who know how to code, as you might still test and read some of the code, right?

27.50
Sometimes. Sometimes. [laughs]

27.52
Our listeners may be sharp, but there’s huge cohorts of people using coding agents who don’t know how to code or who are building websites and web applications. So in the data, in the DevOps, in the data engineering field, the kinds of agents they’re building are somewhat similar to the coding agents in that they’re doing a lot of the work, but they still have guardrails. I would say they’re still human-in-the-loop. Now, there’s also agents in the nontechnical fields, but they’re a little more. . . Maybe to your point, maybe they can be better described as skills, for example, in marketing or sales. Internally at some of these companies, they’re building things to help these teams be more independent from IT.

29.01
So yeah, you see mostly and we can call them skills, but we can also call them workflows or pipelines or just prompts. . .

29.10
Imagine you’re a marketing analyst at a big Fortune 500 company. And your job used to be to manage a bunch of ad campaigns and online campaigns. That was very manual, and so now you can automate a lot of that work. And then you might still have a dashboard where you can kind of see what’s going on. But the things that used to drive you crazy, now you can focus on other things.

29.46
But I am curious about the long-term effects of all of this, especially when, as you mentioned, a lot of people code without knowing how to code. I think that’s fun for a while but in the long term, stuff breaks and you don’t know where to start.

30.01
I don’t know about you, but I’ve come across people who literally don’t know how to code, who built a website, starting to have customers. Customers will file support questions or they say, “This part of your website doesn’t quite work.” Since they don’t know how to code, they go back to the same coding agent: “Hey, fix this.” The coding agent says I fixed it. They go back to the customer: “It’s fixed.” The customer goes, “It’s not fixed.” And so then this is when they start going “I need to hire someone to actually. . . Because now it actually needs to be fixed. And the holding agent can’t fix it.” So there are obviously dangers to going kind of completely wild on these technologies.

So open weights versus proprietary. This might be a sensitive topic to you because you have Gemini, but you guys also have Gemma.

31.09
I work on Gemma. Ask me everything about Gemma. [laughs]

31.12
[laughs] In your work—or not in your work, but in your day-to-day life, talking to friends, traveling, in your dev rel hat, what is a level of interest in open weights?

31.27
Oh, a lot, yeah. That’s for the most part because I’m in Europe. And Europe loves to say, “OK, we want to own things. We don’t want to push it over to someone else.” So there’s a lot of interest for open weight models. It’s way more than I initially thought because there was quite a big performance gap when ChatGPT came out, 3.5. But now they’re closing in. These models are extremely capable. You can run them on MacBooks. I mean, when Claude came out, I’ve seen so many threads of people buying Mac Studios just to be able to run whatever local LLM they have. So you see it in every part of the field, whether it’s very large organizations or very small, finance, healthcare, what have you.

32.25
One of the challenges with open weights is open weights is a business decision. And business decisions can be reversed. Meta Llama may no longer produce open weights. Alibaba—kind of mixed signals there. Some of the Chinese open weights providers are starting to send mixed signals. So it’s one thing to release an open weights model. But as you know, in this environment you have to release models at a regular cadence and that starts getting expensive. So I guess one of the challenges there for our whole community and industry is, you know, where is the steady supply of open weights models going to come from moving forward? Because basically, like I said, it’s a business decision, and a business decision is going to be reversed.

33.28
No, I agree on that. So in the general sense, that’s what we see happening. Some organizations stop doing open source, [or] less of it, focus on different things. It’s understandable in a way, because, you know. . .

33.45
And, you know, one of the obvious advantages of open weights is you can take the weights and run it in your cluster. And so you have control if. . . One of the things that annoys a lot of these enterprise teams is OK, so I’m really optimized for Claude 4.5. And then, hey, they are deprecating Claude 4.5, you know. So here at least you have control. And I think one of the things that most teams are starting to realize, Maarten, is actually I can use open weights for a lot of things because. . . Let’s say it’s so focused, like a simple sentiment analysis or whatever. I don’t need the most expensive models. And this I can control moving forward. So I think people and teams are discovering, “Hey, while I should be concerned that these open weights models may stop getting released, for some, for many of my tasks, maybe I don’t need the latest and greatest anyway.”

34.52
That can be the case. Yeah, because these models are very capable. I think there will always be a steady supply of open weight models. If we look at the status of the field now, many. . . Obviously Qwen, they’re doing an amazing job. Needs to be said. Same with Gemma, they’re also doing well.

35.14
The Qwen team lost a bunch of people, and I think there’s some worry that Alibaba may back off from. . .

35.23
I think they will continue. I don’t know, obviously, but I think it’s still a very good strategy to do.

35.30
And wait, Gemma is not as good as Gemini. [laughs]

35.33
We have good benchmarks. What is this? What is this? [laughs] No, but they serve different audiences. And what we see happening with open weights is you get so much back from giving open weights to the community. And DeepMind is a nice example. But the more labs obviously that have always given a lot to the community, when you do that, you also get a lot back, right? Because if people are super excited about Gemma 4—we released a model two days ago, 12B-1. And you see people using that for a lot of cool use cases. Driving research to create new things that, you know, we might not have thought of. That can be the case. You see Flash, for instance, which is a diffusion-based drafter, super fast, very incredible being used with Gemma 4. That’s cool. And it’s not to say that Gemma was the first one that drove that, but open weights in general allow a random person somewhere without access to thousands of GPUs to pretrain a model and still be able to do very cool and interesting research. So as long as I’m at DeepMind, I’m gonna make sure we’re gonna keep doing very cool Gemma stuff.

37.03
All right, so let’s close with a rapid fire round. So for each question, keep your answer under a minute. Question number one. OpenClaw. What says you, Maarten, about this trend around personal agents?

37.21
I love personal agents. They’re very cool and interesting. And at the same time, I’m very worried about the security of it. We’re seeing a lot of people’s keys being opened up, things that are being deleted that shouldn’t be deleted. And that’s because we’re in very early stages of all of this—just a little bit more time, and then it will be amazing.

37.46
Yeah. And run it locally with Gemma. [laughs]

37.50
Yeah, of course. [laughs] I’m not gonna sell too much. I love Gemma, I’m selling already too much.

37.57
Question number two: reinforcement learning. I’m a big fan. I always push out a post once a year at least, where I say it’s just around the corner. Now it seems like there’s a bit of a comeback with reinforcement, fine-tuning. Are you paying attention to reinforcement learning?

38.21
A lot. I have a couple of colleagues, and we started something called the RAG Pack with some bigger influencers, like Jay Allamar and Josh Starmer from StatQuest. And we did a course on reinforcement quite recently. It’s such a cool technology. It’s the technique that makes LLMs the way they are today. And there’s still a lot of new things coming up in that field to make them faster, more capable, multituning trajectories. Yeah, it’s the whole thing.

38.54
Third question: scaling loss. So Anthropic in particular is big on scaling loss: bigger models, more data, that’s the road to better and better models. So what’s your feeling right now about scaling loss.

39.11
They change quickly. We started with regular “more parameters, better model.” Then we switched to reasoning, where we said “longer reasoning, better model.” And now we’re slowly going towards the “longer trajectories, better model.” You know, more is better. I think they’re interesting, but they’re changing now so quickly that I’m wondering in half a year what the new scaling law and the new nifty thing is going to be.

39.39
So in closing, data centers. Data centers are a hot topic in the US. A lot of communities seem to be coalescing around opposing the build-out of data centers. So it’s a bit of a complicated issue in the sense that, you know, assuming that these AI technologies work and they get adopted, we will need compute in order for people to have access to these technologies. Otherwise, maybe the rich are the only ones who will have access to AI. On the other hand, the data centers themselves, you definitely need local input because, electricity, water, noise. . . And then unlike factories, they don’t really produce a lot of jobs because how many people do you really need to run a data center with all the DevOps agents now that we talked about? So what’s going on in data centers in Europe?

40.43
We don’t like them. I’m saying we—I’m Dutch. If I’m saying for the people of the Netherlands, we don’t like them generally. And that’s going to be very interesting moving forward because there’s still demand for AI. I know there’s a lot of people that don’t like it, but at the same time, there’s still a lot of people using it, and we need to find a way to balance that out. There’s no way forward otherwise, and I really hope we can focus more on efficiency when it comes to these compute-heavy things. That’s why I focus so much on Gemma. They’re small, capable models that you run on your cell phone. That’s great. Without needing to have these large data centers, aside from training, maybe, but that will always be there. We have to be honest about that. AI is here to stay. We just need to make it more efficient.

41.38
And with that, thank you, Maarten. And by the way, closing note about data centers, for our listeners, there’s a lot of announcements, right? Several gigawatts are being. . . Contracts being signed. But if you really follow what’s going on, there’s not a lot of build-out. There’s not a lot of data centers actually being built in and coming online. So… Thank you, Maarten.

42.07 
Thank you.



Read the whole story
alvinashcraft
8 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

PowerToys 0.1 Arrives With Improvements to Several Utilities

1 Share

Microsoft this week released PowerToys 0.1 with nice improvements across several of its utilities and no change at all to its bizarre version numbering scheme. “PowerToys 0.100 introduces the brand-new Shortcut Guide, a major Command Palette update with the new Extension Gallery and multi-monitor Dock support, and a wave of improvements to Power Display,” the […]

The post PowerToys 0.1 Arrives With Improvements to Several Utilities appeared first on Thurrott.com.

Read the whole story
alvinashcraft
9 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Tech, Community, and a Movie: MVPs Help Bring Stir Trek to Life

1 Share

What happens when you combine a full day of technical learning with a movie theater full of developers, designers, and tech leaders - and a shared commitment to giving back? You get Stir Trek: Tech & a Flick, a one-day community conference in Columbus, Ohio, that ends not with closing slides, but with popcorn and a blockbuster movie.

Since its first event in 2009, Stir Trek has built a reputation for being practical, welcoming, and unmistakably different. The format is simple: 50+ sessions of technical content, conversations with regional and national speakers, breakfast, lunch, movie refreshments, and a shared movie screening experience. But the impact goes beyond the agenda. Stir Trek also organizes a MEGA FOOD DRIVE to support local food banks and supports the Stir Scholarship, which provides support for women in Computer Science programs.

For Microsoft MVPs, that combination of technical learning, community connection, and service makes Stir Trek a natural place to show up, share knowledge, and help others take their next step.

Stir Trek attendees in the movie theater lobby

Why MVPs Show Up

This year, MVP speakers including Steve Smith, Barret Blake, Robert Fornal, Brian Gorman, Brian McKeiver, Cory House, Ed Charbeneau, Jay Harris, Joseph Guadagno, Matthew-Hope Eland, Sam Basu, and Samuel Gomez brought their expertise to the Stir Trek stage. Their sessions reflected what the MVP community does best: translate real-world experience into practical guidance that helps others learn, build, and grow.

For MVP Brian McKeiver, the chance to speak at Stir Trek was also a chance to meet technologists where they are right now. “What stood out to me at Stir Trek was the sheer curiosity that almost every person had this year about AI tooling like GitHub Copilot CLI and Microsoft Foundry because everyone is on the same learning curve,” he shared. “We are all trying to learn tips and tricks, best practices, and what not to do when building AI solutions.”

“Everyone is on the same learning curve.” - MVP Brian McKeiver

That focus on usefulness is part of what makes the event stand out. Stir Trek’s audience includes people across disciplines and experience levels, from software developers and engineers to designers, IT pros, tech leaders, and aspiring community contributors. For speakers, that means designing sessions that are approachable, relevant, and grounded in what practitioners can apply immediately.

MVP Brian McKeiver presenting at Stir Trek

MVP Robert Fornal brought that practical focus into his TypeScript session. “The session I brought to Stir Trek focused on TypeScript, which can be used right now, because I want developers to walk away with tangible improvements to their systems and processes,” he shared.

That curiosity reinforced the value of practical, community-led learning. It also showed why MVPs continue to invest their time in events where the audience is ready to engage deeply and learn together - even when showing up requires a significant personal commitment.

For MVP Joseph Guadagno, traveling from Arizona to Ohio to speak at Stir Trek was worth it because of the chance to connect with technologists from a different part of the country. “I get to meet technology people from a different part of the country which generally means different viewpoints and problems that need to be solved,” he shared. “The community impact I hoped to make was to further grow people. I hoped to at least meet and connect to one new person, which I did.”

A Conference That Feels Different

The movie-theater setting gives Stir Trek a character all its own. Instead of moving through a traditional conference center, attendees spend the day learning in theaters, connecting in shared spaces, and ending the experience together with a film. It creates a rhythm that feels both focused and fun.

Brian also pointed to the event’s unique rhythm. “The mix of technical sessions, hallway conversations, and a shared movie experience creates a community experience that really is unmatched,” he said. “Stir Trek is and always has been a pretty unique conference. The sense of overall community is very strong there.”

“The blend of technical sessions, hallway conversations, and a movie screening creates a community experience that really is unmatched.”

- MVP Brian McKeiver

That difference matters. The event is memorable not only because of the sessions, but because the structure invites people to stay, talk, laugh, learn, and participate in something shared. It lowers barriers, makes room for connection, and reminds attendees that community can be both purposeful and playful.

MVP Robert Fornal presenting at Stir Trek

For Robert Fornal, the format helps keep the focus on learning. “Stir Trek feels different from other technical conferences because of its unique theater environment and focused selection of high-quality presentations,” he said. “The movie-theater format changes the energy of the day by focusing the time on the presentation.”

“The movie theater snack that best captures the spirit of Stir Trek is trail mix, because it has a little bit of everything.” - MVP Kevin Griffin

The Community Work Behind the Curtain

Stir Trek is also a reminder that great community events do not happen by accident. MVP organizers and community leaders help create the conditions that make the day work - from program planning and speaker coordination to attendee experience and the details that make the event feel welcoming.

For organizers like MVP Kevin Griffin and MVP Carey Payette, the work reflects the same community-first mindset that defines the MVP Program. As Carey shared, one lesson from organizing Stir Trek is that accessibility goes beyond ticket price or session variety. “It is about creating a relaxed, friendly environment where people feel comfortable learning, connecting, and participating at whatever stage of their career they are in,” she said. “Stir Trek aims to keep prices low (budget cuts are very real in the tech industry) and offers scholarship tickets for students and the unemployed.”

The giving component is central to that mission. Through its annual MEGA FOOD DRIVE and the Stir Scholarship, Stir Trek connects technical learning with tangible community impact. In 2023, attendees donated more than 1,400 pounds of food, and the scholarship program has awarded more than $87,000 to support women in Computer Science programs.

The organizers and volunteers behind Stir Trek - including MVPs Matthew-Hope Eland (second from left, front row), Samuel Gomez (third from left, front row), Carey Payette (right side, front row), Kevin Griffin (second from right, back row), and Steve Smith (right side, back row)

Carey also described the impact organizers hope to create beyond the day itself: “A moment from organizing Stir Trek that reminded me why this work matters was hearing that attendees went back to work excited about what they learned. It is even better when those stories include people making professional connections, finding jobs, volunteering year after year, or giving their first tech talk at Stir Trek. That kind of impact makes all the planning worthwhile and proves that you can, in fact, build community inside a movie theater.”

“You can, in fact, build community inside a movie theater.” — MVP Carey Payette

Advice for Future Speakers, Organizers, and Community Builders

For anyone hoping to get more involved - whether as a future speaker, volunteer, organizer, or attendee - the MVPs emphasized starting with contribution. Attend with curiosity. Ask questions. Share what you are learning. Look for gaps you can help fill. Community impact often begins with one practical step.

For organizers, the advice is similar: start with the people you want to serve. “If a community wanted to create its own tech or shared experience event, I would encourage them to invite the people they would like to see in that environment,” said Kevin Griffin. “A lot of the success of Stir Trek was from us personally reaching out to people that we knew would make Stir Trek an amazing experience.”

What They Took Home

Like the best community events, Stir Trek sends people home with more than notes from a session. It gives attendees new ideas, new connections, and a reminder that technical communities thrive when people keep showing up for one another.

Brian McKeiver said one moment he will remember is the curiosity attendees brought to conversations about AI tooling like GitHub Copilot CLI and Microsoft Foundry. That shared sense of learning reinforced one of Stir Trek’s strengths: people were not just attending sessions; they were comparing experiences, asking practical questions, and learning alongside one another.

That mix of practical learning, community care, and shared fun is what makes Stir Trek memorable - and what makes MVP participation so meaningful. Whether they are speaking, organizing, mentoring, or simply making room for someone new to join the conversation, MVPs help events like Stir Trek become more than a day on the calendar. They become a place where community grows.

Want to learn more about the MVP Program?

To find an MVP and learn more about the MVP Program visit the MVP Communities website and follow our updates on LinkedIn.

Join us for a future live session through the Microsoft Reactor where we walk through what the MVP program is about, what we look for, and how nominations work. These sessions are designed to help you connect the dots between the work you’re already doing and the impact the MVP Program recognizes — with time for questions, examples, and real conversations. 

Read the whole story
alvinashcraft
9 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Three Smart Guys: Too Big to Flail?

1 Share
Directions' Barry Briggs and industry analysts George Gilbert and Peter O'Kelly outline the implications of a new computing architecture anchored by an intelligent client.



Download audio: https://www.directionsonmicrosoft.com/wp-content/uploads/2026/06/season1ep5TSG.mp3
Read the whole story
alvinashcraft
9 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Creating a multi-agent application – Part 1

1 Share

The following text was created by a multi-agent application designed to create blog posts. In my next post we’ll take the application apart, step by step. For now, here is a test run with the prompt Use of multiagents in writing a C# application.”

Draft created: 2653 characters
{‘author’: {‘draft’: ‘# Harnessing Multi-Agent Systems in C# Applications\n’
‘\n’
‘In the evolving landscape of software development, ‘
‘multi-agent systems (MAS) have emerged as a powerful ‘
‘paradigm, particularly in enhancing the functionality of ‘
‘applications. However, the integration of these systems ‘
‘into C# applications comes with its own set of ‘
‘challenges and considerations. This post explores the ‘
‘key aspects of implementing multi-agent systems in C#, ‘
‘drawing from recent research findings.\n’
‘\n’


‘## Understanding Multi-Agent Systems\n’
‘\n’
‘At its core, a multi-agent system consists of multiple ‘
‘autonomous agents that interact with one another to ‘
‘achieve specific goals. These agents can be designed to ‘
‘perform tasks collaboratively, leading to improved ‘
‘efficiency and problem-solving capabilities. However, as ‘
‘highlighted by Elliot One, simply increasing the number ‘
‘of agents does not guarantee better outcomes. In fact, ‘
‘it can complicate the debugging process, making it more ‘
‘difficult to trace failures and understand system ‘
‘behavior. This underscores the importance of thoughtful ‘
‘design and implementation when developing multi-agent ‘
‘systems.\n’
‘\n’


‘## Multi-Agent Architecture in .NET\n’
‘\n’
‘For developers looking to implement multi-agent systems ‘
‘in C#, the Microsoft Agent Framework provides a robust ‘
‘foundation. A recent tutorial video introduces the ‘
‘concept of multi-agent orchestration and workflows, ‘
‘offering insights into how these systems can be ‘
‘structured within .NET applications. This resource is ‘
‘invaluable for developers seeking to grasp the ‘
‘architectural considerations necessary for effective ‘
‘multi-agent implementation.\n’
‘\n’
‘## Practical Implementation\n’
‘\n’
‘Building a multi-agent system requires a structured ‘
‘approach. A Codelabs resource outlines a step-by-step ‘
‘guide for developers interested in practical ‘
‘implementation. This guide covers prerequisites, ‘
‘essential components, and best practices for creating a ‘
‘functional multi-agent system. By following this ‘
‘structured methodology, developers can mitigate common ‘
‘pitfalls and enhance the overall quality of their ‘
‘applications.\n’
‘\n’
‘## Conclusion\n’
‘\n’
‘While multi-agent systems hold significant potential for ‘
‘enhancing C# applications, developers must navigate the ‘
‘complexities associated with their implementation. By ‘
‘leveraging available resources and adhering to best ‘
‘practices, it is possible to create robust multi-agent ‘
‘systems that improve application functionality without ‘
‘succumbing to the common challenges of increased ‘
‘complexity. As the field continues to evolve, staying ‘
‘informed and educated on the latest developments will be ‘
‘crucial for developers aiming to harness the full power ‘
‘of multi-agent systems in their applications.’

————————————————————
>>REVIEWER Review: APPROVED – The draft effectively introduces the concept of multi-agent systems in C# applications an… ✓ Draft APPROVED {‘reviewer’: {‘next_step’: ‘END’, ‘review_notes’: ‘APPROVED’}}
————————————————————
>>>Blogger Blogger: Draft approved, ending workflow Decision: END Task: Report approved and complete {‘blogger’: {‘current_sub_task’: ‘Report approved and complete’, ‘next_step’: ‘END’}}

Read the whole story
alvinashcraft
10 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Your agent just scaffolded a project from 2020

1 Share

Your agent ran a scaffold command. Project generated, dependencies resolved, no errors. Everything looks fine. Except it’s based on the project structure from 2020, and neither you nor the agent noticed.

How npx picks the right-but-wrong version

When an agent scaffolds a project or runs a CLI tool, it often reaches for npx without specifying a version. Something like:

npx create-some-app my-project

Notice, that there’s no version pinned anywhere. The agent typed the package name and assumed it’d get the latest. That’s where things break.

When you run npx without a version, npm resolves the latest version that’s compatible with your current Node runtime. Since npm-pick-manifest v9.1.0 (shipped mid-2024), npm prioritizes versions whose engines field matches your Node version over the latest tag. If an old version has no engine constraints at all, npm considers it compatible with everything.

Say a package has these two versions on the registry:

// v2.0.0 (latest)
{ "engines": { "node": ">=22.14.0 <23.0.0" } }

// v1.3.0 (old, no engines field)
{}

On Node 24, npm skips v2.0.0 (upper bound excludes 24) and lands on v1.3.0, because no engines means “works everywhere.” The resolution chain looks like this:

Step What npm does Result
1 Check latest tag (v2.0.0) node <23.0.0 — skip
2 Check next highest (v1.9.0) node <23.0.0 — skip
3 Check v1.3.0 No engines field — compatible
4 Install v1.3.0 Done

This is how npx works for everyone, agents included. It reads package.json, not docs. It doesn’t know that a newer version exists and would work fine if you just had the right Node version. It follows the engine constraints mechanically, and if those constraints exclude your runtime, it moves on to the next compatible version. This behavior surprised some folks when it first shipped, but it’s been in npm since 10.8.2.

That’s the trap: the scaffold succeeds, the files look fine, nothing flags the mismatch. It all seems to work, just with something ancient.

The Node version manager trap

Things get worse if you use a Node version manager like fnm or nvm.

Version managers let you install multiple Node versions and easily switch between them. It’s convenient: you can run Node 22 for one project and Node 24 for another. But agents don’t think about this. When the agent opens a terminal and runs npx, it gets whatever Node version happens to be active. It doesn’t look for version requirements in the package it’s about to install. Instead, it uses the Node version that’s readily available.

So if you switched to Node 24 for a different project yesterday and forgot to switch back, your agent is now scaffolding with a Node version that pushes npm into its fallback behavior. The agent doesn’t know better. It ran the command, got output, and kept going.

How would you even catch this? npx succeeds, the files appear, the agent reports success. You’d have to inspect the generated package.json, notice the version mismatch, trace it back to your Node version, and connect that to npm’s engine resolution. That’s a lot of detective work for something that looks like it worked.

Old versions become a catch-all

Specifying engine constraints is the right thing to do. It keeps your package off runtimes you haven’t tested, and no one should stop doing it. But there’s a catch.

Any package that’s been around long enough will likely have older versions that didn’t specify engine constraints. Those unconstrained versions are technically compatible with every Node version, because they never said otherwise. So when npm looks for a compatible version, the latest unconstrained version becomes the catch-all package that npx will use.

This doesn’t affect every package equally. Most popular packages (Next.js, ESLint, Prettier, Angular CLI) use open-ended lower bounds like "node": ">=20". Those cover any newer Node version, so npm resolves to the latest just fine. The problem hits packages that use tight upper bounds ("node": ">=22.14.0 <23.0.0") or caret-bounded ranges ("node": "^16 || ^14 || ^12"). Those ranges exclude Node versions outside the tested range, which is exactly when npm starts looking for older alternatives.

Enterprise and platform tools are more likely to use this pattern. They certify specific Node LTS versions and explicitly exclude untested runtimes. Responsible engineering, but it means agents get silently downgraded when they run npx without pinning a version.

And the trend is toward more engine constraints, not fewer. Package authors are increasingly dropping support for older runtimes, and that’s good practice. But it also means more versions of more packages will trigger this fallback when paired with a Node version that’s outside the supported range.

Why agents get hit the hardest

A developer running npx might notice the version mismatch. Maybe the scaffold output mentions v1.11, maybe the generated files look unfamiliar, maybe the developer just knows what version they expect. Agents don’t have any of that context and without explicit instructions, they likely won’t pay attention to it.

All the agent sees is output. Command ran, exit code 0, files appeared. Done. It doesn’t compare expected vs. actual versions, doesn’t cross-reference the generated project structure against what it knows about the current version. It trusts the tool, and the tool silently gave it something old.

We found this while evaluating the Agent Experience of scaffolding SharePoint Framework (SPFx) projects. The agent ran npx without a version, our test machine was on a Node version outside SPFx’s supported range, and npm silently resolved v1.11.0 (published July 2020) instead of the latest v1.23.0. Walked back 12 versions and 6 years. The agent knew v1.23.0 was the latest. But when npx resolved v1.11.0, it saw a successful scaffold and moved on. It didn’t stop to ask: why did I get a version from 2020 when I know the latest is from 2026?

Any package with tight engine bounds would do the same thing. Agents inherit whatever runtime environment they land in and don’t validate it. Node version, npm cache, environment variables: they all shape what a command produces, and the agent takes every one at face value.

What you can do

Here are a few practices for you to consider to avoid this trap.

Pin versions in your prompts

If you’re asking an agent to scaffold a project, include the version you want. “Scaffold a project with create-next-app@15.3” is better than “scaffold a Next.js project.” It nudges the agent toward including a version specifier in the npx command. Not foolproof (the agent might still omit it), but it helps.

Pin versions in your extensions

If you’re building agent extensions (MCP servers, instruction files) that scaffold projects, hardcode the version or make it a required parameter. Don’t let the resolution algorithm decide. Your tool description should produce a command like npx some-package@1.5.0 with the version baked in.

Control your Node version

Use .node-version or .nvmrc in your project root and make sure your version manager respects it. Some version managers auto-switch when you cd into a directory (fnm does this with --use-on-cd). It won’t fix the agent’s behavior, but it reduces the chance of your terminal running an unexpected Node version when the agent starts.

Watch your upper bounds

If you’re a package author, a constraint like "node": ">=22.14.0 <23.0.0" means anyone on Node 23+ gets silently downgraded via npx. Maybe you only tested that range, fair. But if your package works on newer Node versions, a lower-bound-only constraint ("node": ">=22.14.0") avoids the silent fallback without giving up protection against genuinely incompatible runtimes.

Verify the output

After any scaffold command, check the version in the generated package.json. We trust scaffolding tools to give us the latest. With agents, trust but verify.

The real lesson

Runtime environments are invisible to agents. The agent sees your prompt, has access to tools, and runs commands. It doesn’t see your Node version or your version manager state. Either one can change what a command produces, and the agent can’t tell the difference between “worked correctly” and “worked, but with a 6-year-old package.”

If you’re building technology that developers use through agents, assume the runtime is unpredictable. Pin versions, communicate versions, and validate outputs. Don’t rely on resolution algorithms designed for humans who read terminal output. Agents might, but there’s no guarantee. They execute commands and take the results at face value.

The post Your agent just scaffolded a project from 2020 appeared first on Microsoft for Developers.

Read the whole story
alvinashcraft
10 minutes ago
reply
Pennsylvania, USA
Share this story
Delete
Next Page of Stories