Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

GeekWire’s AI summit is Tuesday: What to know if you’re attending our ‘Agents of Transformation’ event

There’s still time to grab a last-minute ticket for GeekWire’s Agents of Transformation, a half-day summit in Seattle on Tuesday that will explore how agentic AI is redefining work, creativity, and leadership.

Keep reading for details about our speaker lineup, the schedule, logistical information and more.

We look forward to seeing you at the event!

When: Tuesday, March 24, 1 – 6:30 p.m.

Location: Block 41 | 115 Bell St., Seattle, 98121

Schedule:

  • 1:00 PM – Doors open: check out the AWS Marketplace AI Innovator Spotlight Studio and the Startup Zone, and grab your barista-bot coffee.
  • 1:40 PM – Main stage program begins.
  • 5:00 PM – Reception – appetizers, drinks, and networking while exploring the Startup Zone demos and robotic cocktail bar.
  • 6:30 PM – Event concludes.

Parking: Multiple parking lots are available within a 3-block radius of Block 41.

What’s included:

  • Four fireside chats featuring leaders from Microsoft, AWS, OpenAI, and more.
  • Expert panel on practical uses of AI agents.
  • Startup Zone with live pitches from emerging AI companies.
  • AWS Marketplace AI Innovator Spotlight Studio (live thought leadership recordings).
  • Networking reception hosted by Nebius with appetizers and beverages.

Speakers:

  • Charles Lamanna, President of Business & Industry Copilot, Microsoft.
  • Julia White, VP & CMO, AWS.
  • Vijaye Raji, CTO of Applications, OpenAI.
  • Deepak Singh, VP of Kiro, AWS.
  • Expert panel: Angela Garinger (Outreach), Jeremy Tryba (AI2), Liat Ben-Zur (LBZ Advisory).

Tickets: A limited number of tickets are available here.

Questions? Contact us at events@geekwire.com.

This event builds on an ongoing GeekWire editorial series, underwritten by Accenture, spotlighting how startups, developers and tech giants are using intelligent agents to innovate.

Thanks to presenting sponsor Accenture; gold sponsors Nebius and AWS Marketplace; and silver sponsors Prime Team Partners, Astound Business Solutions, Pay-i and Cascade for helping to make the event possible. For sponsorship opportunities or any other inquiries about the event, contact events@geekwire.com.

Read the whole story
alvinashcraft
9 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Some writing advice from Project Hail Mary’s Andy Weir

Andy Weir on the set of Project Hail Mary. | Image: Amazon MGM Studios

Andy Weir has done pretty well when it comes to adaptations. His first novel, The Martian, was turned into a movie in 2015, and the Ridley Scott-directed picture earned more than $600 million at the box office. And Project Hail Mary just had a huge opening weekend that puts it on track to be one of the year's biggest movies. However, despite that success, Weir tells me that he does his best to keep the idea of an adaptation out of his mind when he starts a new novel. "I try not to think about it at all," he explains.

The reason, according to Weir, is that the two mediums are just so different. That's something he's learned over the last dec …

Read the full story at The Verge.

Apple’s Worldwide Developers Conference returns the week of June 8

Apple today announced it will host its annual Worldwide Developers Conference (WWDC) online from June 8-12.

GitHub expands application security coverage with AI‑powered detections

AI is accelerating software development and expanding the range of languages and frameworks used in modern repositories. Security teams are increasingly responsible for protecting code written across many ecosystems, not just the core enterprise languages traditionally covered by static analysis.

That’s why GitHub is introducing AI-powered security detections in GitHub Code Security to expand application security coverage across more languages and frameworks. These detections complement CodeQL by surfacing potential vulnerabilities in areas that are difficult to support with traditional static analysis alone. Public preview availability is planned for early Q2.

Expanding application security coverage with static analysis and AI

Static analysis remains an effective way to identify vulnerabilities in supported languages, which is why GitHub Code Security continues to rely on CodeQL for deep semantic analysis. But modern codebases often include scripts, infrastructure definitions, and application components built across many additional ecosystems.

To address this reality, GitHub Code Security extends coverage by pairing CodeQL with AI-powered security detections across additional languages and frameworks. This hybrid detection model helps surface vulnerabilities—and suggested fixes—directly to developers within the pull request workflow.

In internal testing, the system processed more than 170,000 findings over a 30-day period, with more than 80% positive developer feedback. Early results show strong coverage for ecosystems newly supported through AI-powered detections, including Shell/Bash, Dockerfiles, Terraform configurations (HCL), and PHP.

This capability sits within GitHub’s broader agentic detection platform, which powers security, code quality, and code review experiences across the developer workflow. What begins as expanded coverage establishes a foundation for evolving detections over time, pairing the precision of static analysis with deeper context and new vulnerability insights that emerge as development continues to accelerate.

Bringing expanded security coverage into pull requests

Pull requests are where developers already review and approve changes, making them the most effective place to surface security risks early. When a pull request is opened, GitHub Code Security automatically analyzes the changes using the most appropriate detection approach, whether that is static analysis powered by CodeQL or AI-powered security detections.

The results appear directly in the pull request alongside other code scanning findings, surfacing risks such as unsafe string-built SQL queries or commands, insecure cryptographic algorithms, and infrastructure configurations that expose sensitive resources.
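To make the first of those risk categories concrete, here is a generic illustration of a string-built SQL query and its parameterized fix (a sketch of the vulnerability class, not GitHub's actual detection output; the table and function names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user_unsafe(name):
    # The flagged pattern: user input concatenated directly into SQL,
    # so input like "' OR '1'='1" changes the query's meaning.
    query = "SELECT role FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # The typical remediation: a parameterized query keeps input as data.
    return conn.execute(
        "SELECT role FROM users WHERE name = ?", (name,)
    ).fetchall()

print(find_user_unsafe("' OR '1'='1"))  # injection matches every row
print(find_user_safe("' OR '1'='1"))    # returns no rows
```

The unsafe version leaks the admin row for an input that matches no real user; the parameterized version correctly returns nothing.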

By integrating security detections into the pull request workflow, GitHub helps teams catch and fix vulnerabilities earlier, without asking developers to leave the tools and processes they already use.

Turning expanded detection into review-ready fixes with Copilot Autofix

Identifying vulnerabilities early is only part of the challenge. Security teams must also ensure those issues are fixed quickly and safely.

GitHub Code Security connects detection to remediation with Copilot Autofix, which can suggest fixes that developers can review, test, and apply as part of the normal code review process.

Developers are already using Autofix at scale. It has fixed more than 460,000 security alerts in 2025, reaching resolution in 0.66 hours on average compared to 1.29 hours without Autofix.

Together, expanded detection and Copilot Autofix help teams move faster from finding risk to fixing it.

Enforce security outcomes at the point of merge

Because GitHub sits at the merge point of the development workflow, security teams can enforce outcomes where code is reviewed and approved, not after it ships. By bringing detection, remediation, and policy enforcement together in pull requests, GitHub helps teams reduce risk without slowing development.

At RSAC, GitHub will preview how AI-powered security detections expand application security coverage directly within pull requests. This demonstration reflects a broader direction: starting with expanded coverage today, and evolving toward deeper, AI-augmented static analysis as part of GitHub’s agentic detection platform. Visit GitHub at RSAC booth #2327 to see how hybrid detection, developer-native remediation, and platform governance work together to secure modern software development.

The post GitHub expands application security coverage with AI‑powered detections appeared first on The GitHub Blog.

Will machines ever be intelligent?

Technical advances are moving at such a rapid pace that it can be challenging to define the tomorrow we’re working toward. In The Shape of Things to Come, Microsoft Research leader Doug Burger and experts from across disciplines tease out the thorniest AI issues facing technologists, policymakers, business decision-makers, and other stakeholders today. The goal: to amplify the shared understanding needed to build a future in which the AI transition is a net positive. 

In this first episode of the series, Burger is joined by Nicolò Fusi of Microsoft Research and Subutai Ahmad of Numenta to examine whether today’s AI systems are truly intelligent. They compare transformer-based large language models (LLMs) with the human brain’s distributed, continuously learning architecture, exploring differences in efficiency, representation, and sensory-motor grounding. The discussion probes what intelligence really means, where current models excel or fall short, and what future AI systems might need to bridge the gap.

Transcript

[MUSIC] 

DOUG BURGER: This is The Shape of Things to Come, a Microsoft Research Podcast. I’m your host, Doug Burger. In this series, we’re going to venture to the bleeding edge of AI capabilities, dig down into the fundamentals, really try to understand them, and think about how these capabilities are going to change the world—for better and worse.   

In today’s podcast, I’m bringing on two AI researcher-experts: Nicolò Fusi, who is an expert in digital, transformer-based large language model architectures and learning, and Subutai Ahmad, who is an expert in biological architectures, specifically the human brain. And the question we’re going to discuss is, are machines intelligent?  

And what I mean by that: are digital intelligence, large language models, on a path to surpass humans, or are the architectures just so fundamentally different that one will do one set of things well, the other will do something else very well? And so we’ll be debating the architecture of intelligence across digital implementations and biological implementations because the answer to that question, I think, really will determine the shape of things to come. 

[MUSIC FADES] 

I’d like to ask each of my guests to introduce themselves. Tell me a little bit about your background and what you’re currently working on—to the extent you can talk about it—in AI. So, Nicolò, would you please start? 

NICOLÒ FUSI: Yeah, thank you, Doug, for having us and having me here. It’s so much fun. So I’m Nicolò Fusi. I’m a researcher at MSR [Microsoft Research]. So Doug is my boss, so I will be very, very, very good to Doug in this podcast.  

No, but jokes aside, my own background is in Bayesian nonparametrics. That’s what I started studying. So Gaussian processes and things like that. And then equally, I would say, in computational biology, because I found it, like, one of the most interesting use cases for AI techniques. And that, kind of, has been true throughout my career. And pretty much like everybody else, eventually, I moved away from the kernel methods and the Bayesian nonparametrics and I started working more on language models, transformer models, with a particular eye towards information theory and the connection between information theory and generative modeling. And that’s, kind of, one of the main things I do today other than, kind of, managing the research of people who do much more interesting work than I do. [LAUGHS]  

BURGER: I have to interject there, Nicolò, because you dragged a piece of bait across my path.  

FUSI: I figured.  

BURGER: You know, at Microsoft Research, I have a management rule that I can’t tell anyone what to do because we hire some of the best people in the world. You have to trust them. And everyone is always completely free to call BS on me. And so Nicolò was joking there; [LAUGHTER] he does not have to toe the party line. In fact, I encourage him not to. So, so … 

FUSI: I just have to be well-behaved. That’s the only thing I will say. [LAUGHS] 

BURGER: Yeah. Thank you, thank you for baiting me. [LAUGHS] Because he knew exactly what he was doing. And I love him for it.  

Subutai, can you tell us a little bit about yourself? 

SUBUTAI AHMAD: Sure. Thank you so much, Doug, for having me. I’m really looking forward to the conversation between us all.  

So I see myself fundamentally as a computer scientist. You know, I’ve been studying computer science for longer than I care to admit. But something changed for me during my undergrad years. I decided to minor in cognitive psychology, and I started to get really interested in how the brain works. 

And to me, understanding intelligence and implementing intelligence was the hardest problem a computer scientist could ever solve. So I got very, very interested in that. You know, I couldn’t see how to really commercialize that. I was very interested in making products and stuff. So I stopped, you know, working on that for a while. I did a number of startups doing computer vision, you know, video processing, a lot of that stuff. 

And then when Jeff Hawkins started Numenta back in 2005 with the idea of really deeply understanding how the brain works and figuring out how to apply that to AI, for me, it was like all my worlds coming together. This, like, this is what I had to do. None of us thought [LAUGHS] it would take as long as it did. We spent the last couple of decades really deeply trying to understand neuroscience from a computer scientist—from a programmer’s—standpoint, the underlying algorithms. And that’s really what I’m passionate about, just trying to translate what we understand about the neuroscience to today’s AI.  

And in terms of what we’re working on today, it’s, you know, the human—maybe we’ll get into some of this—the brain is super efficient in how it works—power efficient, energy efficient—and we’re trying to embody those ideas and trying to make AI a lot more efficient than it is today. 

BURGER: Great. I think we’ll get into efficiency a little bit later in the podcast because that’s a subject that’s near and dear to my heart, you know, being a computer architect originally by training.  

I want to go back to, you know, one of the reasons I got involved with Numenta is, you know, Subutai and I have been exchanging emails, like, discussing collaborations, you know, visiting each other through the years, and the thing that really stuck with me was when I read one of the earlier books from Jeff, On Intelligence. And there was an example in the book that talked about how, you know, the human brain learns continuously. I think biological organisms in general learn continuously.  

And the anecdote that I remember was this anecdote if you’re walking down your basement steps, you know, you’re walking down the stairs to your basement and there’s one step that’s always been a few inches off and you decide to fix it, and so you raise it so it’s even with the others, and then the next time you go down the stairs, you don’t remember and you’re wildly off and, you know, you hit that step, you hit it earlier or later than you anticipated, you go out of balance. You’re flailing around. You know, you get all this adrenaline. You think you’re going to pitch headfirst down the stairs. Hopefully you don’t. And then the second time you do it, you’re a little off balance, but it’s not crazy. And the third time you maybe notice a little bit, and the fourth time, it’s, like, it’s your basement stairs. 

And so somewhere between that first time down and the third and fourth times down, there are molecular changes in your brain that have learned the new timing of your basement steps. And I remember just that example vividly from the book. And that got me thinking, wow, this is so different from the way our digital AI works. I’ll turn it over to you to comment for that and then I think we’ll go into the digital. 

AHMAD: Yeah, no, that’s a great example. I think it’s remarkable how our brain is constantly modeling our entire world at such a granular level, and we’re not even aware of it perceptually. Like, you know, that example of the steps is probably not … you wouldn’t consciously be aware of it, yet if something is different about anything in your world that you’re very familiar with, you’ll instantly notice it. And then you’ll, you know, you’ll update your world model, you’ll adjust, and you’ll continue on. It’s really remarkable how the brain’s able to do that so seamlessly. 

BURGER: And a lot of that is based on neurotransmitters, right? Because there’s just a … you know, when you have that physical reaction to “I’m about to pitch down the stairs,” you get a flood of transmitters that actually changes the way your brain’s learning or at least the rate. 

AHMAD: Yeah, there’s a flood of neurotransmitters and neuromodulators, as well, that invoke change, sometimes very rapidly. Another example, you know, if you touch a hot stove—that’s the canonical example—you will learn that very, very quickly. So there’s a lot of chemical changes that happen. But it’s also really interesting that we can update things and update our world knowledge without impacting everything else that we know. This is something that’s very, very different, again, from today’s AI models. We’re able to make these changes in a very contextual and very, sort of, fine-grained way.  

BURGER: So, Nicolò, I want to go and talk a little bit now to transformers. So I think, you know, you and I and Subutai were all working in the AI field, you know, many years before 2017, when the transformer hit. You know, I was building, you know, with my team hardware to accelerate RNNs [recurrent neural networks], LSTMs [long short-term memory], you know, which had this awful loop-carried dependence, you know, the bottlenecked computation, and then the transformer was just much more parallelizable.  

So what do you think’s really going on in these things? And maybe we could start—I know you and I have talked a lot about this—maybe just start with the major blocks. You know, you’ve got the attention layer. You’ve got the feedforward layer. You’ve got, you know, the encoder stack and the decoder stack and the latent space in between. Can you just, kind of, walk us through those pieces at a high level and tell us what you think is going on? 

FUSI: Yeah. Yeah, I mean, I have a very opinionated view of why transformers are so great.  

BURGER: That’s why you’re here. [LAUGHS] 

FUSI: Maybe, like, yeah, maybe I’ll inject it. I don’t know. I don’t think it’s a super novel creative opinion, but it is an opinion. So I guess the two principal … the two main components you already described: the, you know, the transformer [read: attention] layers and the feedforward layers. One way to think about them is, how does information in your context relate to each other and what is every token referring to, for instance, in the case of transformers in language models? 

So by context, we mean, like, the information you feed through the model, that the model keeps continuously generating and appending to. 

BURGER: So like your chat history. 

FUSI: Your prompt. Your what? Your chat history or your particular prompt in a chat session.  

BURGER: OK.  

FUSI: That prompt, which is a sequence of words, gets discretized in a series of tokens. Tokens can be individual words, can be multiple words, kind of, connected together. The way we go from words to tokens typically is through an algorithm that tries to basically collapse as much as possible. Multiple words, like “the dog,” may be just one token as a first, kind of, level of compression to feed into the model. So it just tries to bring things together as efficiently as possible.  

Then there is, you know, within these models, there is a transformer layer. This transformer layer or this attention layer, sorry, tries to basically figure out what the “the” refers to—the term “the” in “the dog,” or “the dog jumps on the table,” “jumps” refers to the dog. So there is this kind of, like, mapping that happens.
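The mapping Fusi describes, scoring how strongly each token should attend to every other token, is the scaled dot-product attention at the heart of the transformer. A minimal single-head sketch, with tiny hand-picked vectors standing in for learned queries, keys, and values:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over toy vectors.

    Each output is a weighted mix of the value vectors, where the
    weights reflect how well a query matches each key."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# One query attending over two tokens: it aligns with token 1's key,
# so token 1's value dominates the output mix.
out = attention(queries=[[1.0, 0.0]],
                keys=[[0.1, 0.9], [1.0, 0.0]],
                values=[[0.0, 1.0], [1.0, 0.0]])
print(out)
```

In a real model the queries, keys, and values are all learned linear projections of the token embeddings, and many such heads run in parallel, but the "which token refers to which" computation is this weighted lookup.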

And then there is, like, feedforward layers, which in modern large language models, they store a lot of information. Like, that’s kind of, like, where the knowledge typically kind of sits in, the things that the model just knows. You know, that, I don’t know, if you slam your arm against [the] cup of water on your table, that cup of water falls off the table. That’s something that the model, kind of, has baked in through reading a lot about cups falling off of tables when they’re hit. 
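Structurally, the feedforward block Fusi mentions is just a two-layer network applied independently at each token position: expand, apply a nonlinearity, project back. A toy sketch with fixed illustrative weights (real models learn these, and the hidden layer is typically about four times the model width):

```python
def relu(x):
    return [max(0.0, xi) for xi in x]

def feedforward(x, w1, b1, w2, b2):
    """Position-wise feedforward: expand to a wider hidden layer,
    apply ReLU, then project back down to the model dimension.
    These weights are where much of an LLM's stored knowledge sits."""
    hidden = relu([sum(xi * w1[i][j] for i, xi in enumerate(x)) + b1[j]
                   for j in range(len(b1))])
    return [sum(hi * w2[i][j] for i, hi in enumerate(hidden)) + b2[j]
            for j in range(len(b2))]

# Toy 2 -> 4 -> 2 layer with fixed, made-up weights.
w1 = [[1.0, -1.0, 0.5, 0.0], [0.0, 1.0, -0.5, 1.0]]
b1 = [0.0, 0.0, 0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 1.0]]
b2 = [0.0, 0.0]
print(feedforward([1.0, 2.0], w1, b1, w2, b2))
```

The same tiny function, repeated per layer at enormous scale, is the "knowledge store" half of the two components being discussed here.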

So that’s, kind of, those are, for me, the two fundamental components, and the reason why I have an opinionated view is that, you know, honestly, I do believe that RNNs and, you know, even state-space—modern incarnations of state-space models—are good enough to learn over these, you know, language data or whatever or vision data or audio data. 

The good thing about transformers is that they do two things very well. One is they get out of the way. They don’t have this notion of “everything has to be encoded through a state” like recurrent networks. And two, they do that very computationally efficiently as you were saying. There isn’t a computational bottleneck. And so they created this nice overhang where they happen to be the right architecture at the right time to unlock enough flow of information through the model … 

BURGER: Yeah.  

FUSI: … that we could get through these amazing things. 

BURGER: Let me press you on one thing. Like, you know, in the attention blocks, you can figure out which words or which tokens relate to which tokens. So I put in the prompt and it’s finding all the relations and then feeding those relations up to, you know, the feedforward layer—well, the feedforward unit within a layer. And you said that knowledge is encoded there, but then what does it really mean for those maps to then access knowledge, but then you project it back into, you know, the output and then feed it up to the attention block in the next layer?  

FUSI: Again, yeah.  

BURGER: So it seems kind of weird that I’d be, like, accessing knowledge and then taking that knowledge, merging it, and going back to another attention map. 

FUSI: Well, you can see it as a mixing operation that happens in the feedforward part of the layer. You know, like, you’re attending, then you’re mixing, and, kind of, like, reprojecting to some space with higher-information content or, like, a different level of information extraction. And then you’re putting it back into, “OK, so let me do another round of processing” and, kind of, attending and then a mix again. And then I do it again and then I do it again.  

So I think that the information that is present in the prompt and, you know, the information that has been baked into the weights gets further and further refined. Whether that refinement is extraction of structure or aggregation into higher-level concepts, I’m not sure. I think it’s just structure gets extracted and things that are irrelevant get kind of pushed away. But that doesn’t necessarily mean that it gets aggregated through the architecture.  

BURGER: So now I’m going to try to, like, restate what I think I hear you saying. So, you know, we’re adding information and we’re kind of adding information at a higher level but not necessarily throwing away the low-level information, at least that’s not relevant, right?  

FUSI: Yeah. 

BURGER: Because, you know, if the higher-level stuff depends on the low-level stuff, I have to have that first. And so then you get to the top of the encoder block and you’re in the latent space with all of that information kind of maximized. Is that a way to think about it? And if you agree, can you talk about what the encoder block really is and what the latent space is? 

FUSI: I tend to agree, yes. I mean, there is … you’re describing … I think you’re describing what I think is happening, which is there is given the context in your prompt and given the task that the model perceives or, like, figures out that you’re doing, it has to highlight and pull out the relevant information. And it does that not by summarizing layer by layer, but it does it by, you know, increasing the prominence of that information and suppressing other things. So I think that’s ultimately what happens up to the point where you reach this beautiful point in concept space, which identifies both your intent and the things in the prompt and in the knowledge of the model that are necessary to solve it. 

BURGER: And so one last question, and then I want to go to Subutai for a second.  

So now when we go through the decoder stack, are we just going the other way and stripping out the high-level concepts early and then getting down to the granular tokens? Or, you know … because you go up through the encoder stack, those attention blocks and feedforward layers, to get to that magical latent space. And now we’re going to go the other direction. How do you think about that other direction through the decoder stack, which is the same primitives as the encoder stack? 

FUSI: Same primitives. You can think of it as kind of the reverse operation. Like you, you never lost information throughout. You just kind of suppressed or privileged different kinds of information. And now you’re basically just projecting it back out to a space that is, you know, intelligible. And it’s, kind of, where the model gets its … I hesitate to use the term reward because it has a particular implication, but that’s, kind of, where the loss gets computed and then gets pushed back through the model. 

BURGER: Right, as you’re trying to evolve and train all those parameters—the relationship between words, the information in the feedforward layers, the design of that latent space, and the extraction of the knowledge from it. 

FUSI: That’s right. And so in encoder-decoder model, you push through the whole thing, you decode back to a particular token, which for people who don’t know, it’s, like, literally a number out of a vocabulary, like word No. 487. And if it was word No. 1,500, you get, you know, like, … 

BURGER: Something else. 

FUSI: … a bad reward. Yeah. Yeah. And then … and if you got it right, you get a positive signal that then just flows back through the model. 

BURGER: I’d like to go over to Subutai now. So after hearing this, you’ve studied, you know, neuroscience and the neocortex and cortical columns and all of this for a long time, and you and I have had lots of debates. Is the human brain doing something different than that? You know, are we just building latent spaces, then extracting? The architecture is very different, but what’s going on under the hood? 

AHMAD: Yeah, the architecture is very different. You know, as Nicolò was describing what happens throughout a transformer stack, I was trying to relay and relate, you know, what we know in the brain, as well.  

In a typical, you know, transformer model, there is, at the end of the day, there is a single latent space from which the next token is output. That does not happen in the brain. There are thousands and thousands of latent spaces that are, sort of, collaborating together, if you will.  

You know, a lot of what we publish is under the moniker the Thousand Brains Theory of Intelligence. And Jeff published a book a few years ago on that. And that, kind of, dates back to discoveries in neuroscience from the ’60s and ’70s by the neuroscientist Vernon Mountcastle, who was a professor at Johns Hopkins. 

BURGER: Yup. 

AHMAD: And what he discovered … he made this remarkable discovery that, you know, our neocortex, which is the biggest part of our brain—that’s where all intelligent function happens—is actually composed of roughly 100,000 what he called cortical columns. 

BURGER: Right.  

AHMAD: And each cortical column is maybe 50,000 neurons. And there’s a very complex microcircuit and microarchitecture between the neurons in a cortical column.  

But then there’s 100,000 of them, and every part of your brain—whether it’s doing visual processing, auditory processing, language, thought, motor actions—they’re all composed of this, essentially, this same microarchitecture. And this was a remarkable discovery. It says that there’s a universal architecture. It’s not a simple one. It’s complex. But it’s repeated throughout the brain. 

And that’s where this, you know, the idea of the Thousand Brains … each of these cortical columns is actually a complete sensory-motor processing system. It has inputs; it has outputs. It’s getting sensory input. It’s sending outputs to motor systems. And it’s building, in our theory, complete world models. So there isn’t a single latent space. There’s thousands of these latent spaces. 

And each little cortical column is trying to understand its little bit of the world. You know, one cortical column might be getting, at the lowest level, maybe one degree of visual information from the top right-hand corner of your retina. Another one might be focusing on specific frequencies in the auditory range. You know, each one has its own little view of the world, and it’s building its own little world model. 

And then they all collaborate together. There’s no top or bottom here. There’s no homunculus in the brain. Everything is sort of equal. And they’re all simultaneously collaborating and voting and coming up to, you know, what is the, you know, consistent interpretation of all of these sensory inputs that we’re getting? What is the single consistent, you know, concept, if you will, and, based on that, make the motor actions that are most relevant to that. 

So it’s a sensory-motor loop. It’s a, you know, it’s a constantly recurring system; we’re constantly making predictions. As we discussed earlier, you know, we are constantly learning. Every cortical column is constantly updating its connections, constantly updating its weights. It’s building and incrementally improving its world model constantly. So it’s a massively distributed, you know, set of processing elements that we call cortical columns that are, they’re all equal, operating in parallel. 

So I think there are similarities, for sure, between them. But at least the way I described it, I think it’s very different in its operation than what I understand today’s LLMs to be. I don’t know if you agree with that or not. 

FUSI: Yeah, I … To better understand, I had a question, which is, are these cortical columns relying on the fact that these are essentially multiple views of the same process and those multiple views, like, the, you know, the part of the sensory input that gets allocated or subdivided, is it happening at the same time point? So in other words, if you could artificially delay by some time t some cortical columns with respect to the rest, would the learning suffer?  

AHMAD: Yes, absolutely. Yeah.  

FUSI: And so in other words, how important is it that it’s, kind of, on the same schedule? 

AHMAD: [LAUGHS] Yeah, I mean, that’s another … I mean, LLMs today, you know, you get your input, one layer processes it, then the next, then the next, and the other layers are not operating. In the brain, it’s not like that. Everything is operating in parallel asynchronously. And this is important. They’re constantly trying to make predictions and so on. So if you were to artificially slow down some of your cortical columns, you would absolutely suffer. Your thinking would absolutely suffer. 

BURGER: I wanted to interject here just because this is where … this discussion is where, you know, I got super interested in the difference and then spent a bunch of time with Subutai to learn from him. So if I think about my skin, you know, which is an organ, you know, as I understand it, there’s a cortical column attached to each patch of my skin and the size of that patch, kind of, corresponds to the nerve density there.  

AHMAD: That’s right. Yeah. 

BURGER: So in my brain, there is a set of cortical columns that are skin sensors, and I could actually … if I numbered all the cortical columns in the brain, I could draw a map on my skin and say, “This is No. 72 in this patch. This is No. 73 in this patch.” Now are human cortical columns, like, better than, say, what we see in a mouse? And, of course, this is a leading question because I know the answer. 

AHMAD: [LAUGHS] Yeah. So, yes, it, you know, cortical columns in your sensory areas, primary sensory areas, each, you know, pay attention to or get input from a, you know, some patch of your skin somewhere on your body. And there’s many more cortical columns associated with your fingertips than, you know, a square centimeter of your back, for example. So there’s definitely, you know, areas of sensory information that we pay a lot more attention to and devote a lot more physical resources to.  

In terms of a mouse and humans, it’s pretty remarkable that the cortical columns … so all mammals have cortical columns; all mammals have a neocortex. All mammals have cortical columns from a mouse all the way up to humans. And mice have cortical columns that are very, very similar to what a human has. It’s not identical. There are differences. But by and large, the architecture of a cortical column in a mouse is, you know, very, very similar to cortical columns in humans. Human cortical columns are bigger. There are more neurons, and there’s more detail there, but essentially, it’s the same. And …  

BURGER: Maybe just scaled up a little bit.  

AHMAD: Yeah. So evolution basically discovered this structure—that it’s really excellent for processing information and dealing with it—and then, you know, very fast in evolutionary time, basically figured out that if you could scale up the number of cortical columns, you get more intelligent animals. And that’s what happened very, very fast evolutionarily. 

FUSI: I didn’t know about the uneven allocation of cortical columns. Like, this is not … I’m not a neuroscientist, and so this is interesting because one of the biggest frustrations with many modern architectures of models is that they deploy a constant amount of computation no matter what the input is.  

So I go through the same number of layers whether I’m trying to predict the word “dog” after “the” or whether I’m trying to solve, like, give the final answer to a very complicated math question or, you know, whether a theorem was proven or not in the prompt. And so that’s interesting because, like, some current instantiations of modern architecture actually deploy … try to cluster things together such that you have a constant amount of information that you then push together through the model. [LAUGHTER] And so maybe like on my fingertips, I need more processing than I need on my elbow because, like, you know … and so this, kind of, makes sense. 

BURGER: Nicolò is being humble. He was working on this problem two years ago and told me about it. It was one of the things I learned from you that made me think differently. So … 

FUSI: I just like to refer to people who are working on this … [LAUGHS] 

BURGER: Random average people who are not all necessarily brilliant AI scientists.  

So the prediction part of this, though, is really what’s fascinating to me, because, again, something else Subutai and I discussed many years ago, you know, if I’m, like, moving my finger towards the table and…my brain is making predictions because I have a world model. It knows a table is there. And the cortical columns representing that patch of skin, as it’s getting closer, they’re starting to predict that I’m going to feel something that feels like the table. And, yup, there; I hit it. Prediction met.  

But if I touched it and it felt really icy cold or super hot or fluffy or not there—I pass through it—I’d get a flurry of activity because the prediction wouldn’t match the world model, and that’s where learning would happen.  

Subutai, does that sound like the right model and intuition?  

AHMAD: Yeah, that’s definitely a very important component of it. We’re constantly making predictions. And as you said, you know, you’re moving your right fingertip down; you know, perhaps you’ve never sat in this room before or, you know, seen this table before, you would still have a prediction, a very good prediction of it. 

BURGER: Yeah. Because you know what a table is. 

AHMAD: You know what a table is. And if it was different, you would, you know, you would notice it right away. But if your left hand, which you weren’t paying attention to, also felt icy cold, then you would notice that, as well. So you’re actually making not just one prediction; you’re making thousands and thousands of predictions constantly about … 

BURGER: Every cortical column. 

AHMAD: Every cortical column is making predictions. And if something were anomalous, highly anomalous, you would notice it. So this is something, you know, we don’t often realize; we’re making very, very granular predictions constantly. And when things are wrong, we do learn from it.  

And the other interesting thing—and this is, again, possibly different from how LLMs work—you know, if I were to tell you to touch the, you know, the bottom surface of the table, you could, without looking at the table or opening your eyes, move your finger in and touch the bottom of the table because you have a, you know, set of reference frames that relate to …  

BURGER: Yup … 

AHMAD: There you go. Yep. You’re able to do it. 

BURGER: I did it! Yeah. Amazing. 

AHMAD: Even though you maybe never have been in this room; maybe you’ve never seen this table before. It doesn’t matter. 

BURGER: I’ve been in this room because we had to prep for the podcast series. But I didn’t touch the underside of the table, that’s for sure. [LAUGHS] 

AHMAD: Yeah, exactly. [LAUGHS] So, you know, we know where things are in relation to each other, where our body is in relation to everything, and we can very, very rapidly learn. And again, if the bottom part of the table was anomalous, you would notice it and potentially remember that. 

FUSI: I’m not going to lie. I was expecting you to find something under that table, [LAUGHTER] like a talk show. 

AHMAD: Or chewing gum or something. 

FUSI: And if you reach under the table, you’re going to find a copy of my paper. [LAUGHS] 

BURGER: [LAUGHS] You know, if I was smarter and better prepared, that’s exactly what would have happened. But, sorry, guys.  

I think you told me something, Subutai, you know, that … and I’ll give a little bit of preamble.  

So, you know, the brain has these dendritic networks in each neuron, and they form synapses. And so a neuron fires, and, you know, the axon of the neuron that’s firing will propagate a signal through the synapses, which might do a little signal processing, to the dendrites of the downstream neurons, and those dendrites can then prime the downstream neuron to fire. That’s one of the fundamental mechanisms. And it’s the formation of those synapses, you know, between the upstream and downstream neurons, the dendrites, that seems to be the basis of learning, and to me, that feels a little bit like an attention map. 

AHMAD: Yes.  

BURGER: So maybe the dendritic network is doing something akin to self-attention, and we have some work going on in that direction at MSR. But the thing you told me was that your brain is actually forming an incredibly large number of synapses speculatively. In some sense, sampling the world when something happens in case it will recur. You know, it’s a more … maybe it’s a version of Hebbian learning, right? You know, things that fire together, wire together. 

AHMAD: Exactly. 

BURGER: But then if that pattern doesn’t recur, then they get pruned. And I’m just going to ask, you know, what is the fraction of your synapses that gets turned over every three or four days, you know, ballpark? 

AHMAD: OK. Yeah, I remember this. This was an absolute mind-blowing study in [The Journal of] Neuroscience. So, you know, the way a lot of learning happens in the brain is by adding and dropping connections. 

In AI models, it’s usually strengthening, you know, high-precision floating-point number, making it higher or lower. But you’re not adding and dropping connections. The connections are always—in fact, everything is fully connected, right, between layers. And so in the brain, you’re always adding and dropping connections. That’s a fundamental mechanism by which we learn, one of the fundamental mechanisms.  

What I read in this study is that they looked at adult mice, adult animals, and they were able to trace individual synapses in this particular part of the brain over the course of a couple of months. And what they found is that every four days, 30% of the synapses that were there before were no longer there four days later. And there was a new 30%. So there’s a huge number of connections that are constantly being added and constantly being pruned. And my theory of what’s going on there is that we’re always speculatively trying to learn things. 

So, you know, there’s all sorts of random coincidences and things that we are exposed to on a day-to-day basis. We’re constantly forming connections there because we don’t know what’s actually going to be required and what’s real and what’s random. Most of it’s random; most of it’s not necessary. And the stuff that actually is necessary will stay on. But we’re constantly trying to learn. 

This is a part of continuous learning that’s often not appreciated, I think, is that we’re constantly forming new connections, and then we prune the stuff that we don’t need. In an AI model, if you were to do that, it would just go, I don’t know, it would go bananas. [LAUGHTER]  
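The turnover pattern described above can be sketched as a toy simulation. The 30%-per-four-days figure is from the study being discussed; the mechanism below is a simplification for illustration only: each “four-day” step prunes speculative weak synapses and grows a fresh batch, while reinforced strong ones persist.

```python
# Toy simulation of speculative synapse turnover (illustrative only).
# Strong synapses persist; ~30% of the population -- drawn from the weak,
# speculative pool -- is pruned and replaced each "4-day" step.
import random

random.seed(1)
synapses = {i: "strong" if i < 30 else "weak" for i in range(100)}
next_id = 100

for step in range(3):  # three "4-day" periods
    weak = [i for i, s in synapses.items() if s == "weak"]
    for i in random.sample(weak, k=int(0.3 * len(synapses))):
        del synapses[i]                # prune connections never reinforced
    for _ in range(30):
        synapses[next_id] = "weak"     # speculative new connections
        next_id += 1

strong = sum(1 for s in synapses.values() if s == "strong")
print(strong, len(synapses))  # strong synapses all survive; total stays at 100
```

After a few steps, most of the weak population has been replaced while the strong core is untouched, matching the strong-versus-weak turnover breakdown mentioned in the conversation.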

BURGER: Well, so let’s double-click on that. So when you told me that, the way I … 

AHMAD: This is mind-blowing, this 30%.  

BURGER: It’s crazy.  

AHMAD: Your brain is going to be totally different a few days from now. 

BURGER: It’s so mind-blowing. When you told me that, I spent some time processing it, so a whole bunch of synapses were created and destroyed during that time.  

But it just made me think that we have, you know, we have all of these columns getting all of this input continuously. You know, eyes, hearing, smell, taste, skin, heat, and then, you know, interactions with people, and then planning and experiences, just at every level. And they’re constantly sampling all this noise coming in and basically filtering out the noise. It’s like, kind of, like a low-pass filter. But when something statistically significant recurs, it’s going to lock and then become persistent.  

AHMAD: Yeah, yeah, I think so. There’s so much that’s happening, and you’re constantly learning, and, you know, when you touch a hot stove or something, there’s a flood of dopamine specific to those areas that caused these synapses to strengthen very, very quickly. You know, most of these synapses that are learned are very, very weak synapses.  

BURGER: Yup. 

AHMAD: And so, yeah, you know, when you look … in this study, they also quantified the turnover in, kind of, strong synapses versus weak synapses. And it’s comforting to know that the strong synapses stay there. It’s really these weak synapses that are constantly added and dropped. And then some of them will become strong. 

BURGER: Now I want to go back … return to Nicolò, but with an observation.   

So when I’m training a transformer, it’s also a prediction-based system. You know, I’m running … I have my input in the training set; I have my masked token or the next token I’m trying to predict. I run it through. I look at how successfully did it make that prediction, and the worse it was, the, sort of, the steeper the error, you know, I drive back through the network. So, you know, if it’s spot-on, I don’t learn very much. But if the prediction is way off, I’ve got to change a bunch of stuff. That sounds analogous to what Subutai was just describing with the cortical columns. 

FUSI: No, that’s right. I mean, it ties into, I don’t know, one big pet peeve of mine in pretraining, in particular around pretraining these language models.  

BURGER: OK. 

FUSI: So again, for context, like, language models in particular, but, you know, many other instantiations of large models, are trained in a few phases usually. One of them is pretraining, where you have some ground truth text and you remove, let’s say, just the last word, and then you ask the model to predict the last word. And that’s when you get that loss. Do you get the word right? Do you get the word wrong?  

One of the big problems that I have is that, you know, in human experience, we do not get feedback on every single thought.  

The problem with language models, the way we are training them, at least in pretraining, is that they do a thing called teacher forcing. So they guess the word, then they immediately get the signal, and then the right word gets filled in, and then they predict the next one. 

So when you go through, like, a passage of text, you constantly get this reward. And it’s such a bizarre way to train a model. It’s necessary because you want a lot of flow of supervision. Like, you want, like, a lot of supervision to essentially use all the computation available. But at the same time, it actually makes the models arguably a little bit worse than what they would be if you had enough compute to train them without this. 

I went on a tangent just because it’s a pet peeve. [LAUGHS]  
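Teacher forcing, as described above, is easy to see in a toy next-token loop. The bigram “model” below is a stand-in (a lookup table of invented probabilities), not a real language model; the point is only the data flow: the true token is always fed back in, so a loss signal arrives at every single position.

```python
# Hypothetical sketch of teacher forcing in next-token pretraining.
# The "model" is just a table of made-up probabilities.
import math

def teacher_forced_losses(tokens, model_prob):
    """Feed the TRUE previous token at every step (teacher forcing),
    collecting an immediate per-token loss in bits."""
    losses = []
    for prev, nxt in zip(tokens, tokens[1:]):
        p = model_prob.get((prev, nxt), 1e-6)  # model's prob of the true next token
        losses.append(-math.log2(p))           # supervision at every position
    return losses

model_prob = {("the", "dog"): 0.5, ("dog", "ran"): 0.8, ("the", "cat"): 0.3}
per_token_loss = teacher_forced_losses(["the", "dog", "ran"], model_prob)
print(per_token_loss)  # one loss per transition -- no free-running generation
```

Notice the model never has to live with its own guess: even a wrong prediction is overwritten by the ground-truth token before the next step, which is exactly the constant stream of supervision being criticized.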

BURGER: It’s a really important point, though, because your goal when you’re training a model is to get to your loss target with the minimal cost and time. Or, of course, like, fixed budget and, like, lowest loss target. 

But, you know, biological systems, also, their goal is survival with energy minimization. And so, like, once you’ve built a world model that works, right, like touching the table, touching the underside of the table—nope, still nothing exciting there—like, it takes very little energy to do that. And I think a tragedy is that we all have these supercomputers in our heads. You know, the neocortex is what, about 10 watts? And it’s this amazing thing, right, that can compose symphonies. But once we have a world model, a lot of us just stop learning because it’s comfortable, right. You don’t have to perturb the state. You can go through … and, you know, I mean, how many of us go through every day and all of our predictions succeed [LAUGHTER], and there’s no surprises, you know?  

So all the new synapses get swept away, right. That’s not a goal of pretraining because then you’re just wasting energy. But we’re trying to minimize energy consumption. So it does feel, kind of, aligned to me in some sense. 

So I’ve got a straw man I want to hit you with, but before we do, Nicolò, I want you to talk about your view on compression, like LLMs as compressors, because I know this is something you’re very passionate about and opinionated about. And I’ve learned a lot from you on this, too. 

And then, Subutai, after this, I’d like to hear your biological response. I mean, your response from a biological perspective. [LAUGHTER] And …  

AHMAD: You’ll get both.  

BURGER: That’s right, of course. And then I want to try … I want to throw out this hybrid straw man. So, Nicolò, tell us about compression. 

FUSI: The view is that basically the generative models are compressors in an information theoretic sense, and so trying to come up with a better generative model is equivalent to trying to find the best compressor for some data. And … 

BURGER: Now when you say compressor, do you mean lossless or lossy? 

FUSI: I mean lossless.  

BURGER: OK. 

FUSI: You can basically look at literally my much-maligned objective function that you use for pretraining, which is, you know, next-token prediction, and you can basically draw a complete parallel to what you would do if you were trying to do compression, which is coming up with the shortest possible code for something that you’re trying to compress. 

And so the two things are the same, and it, kind of, fits into a broader picture that, you know, like, goes back to Occam’s razor and Kolmogorov complexity and Solomonoff’s principle of induction, which is, you want short descriptions for likely things that happen in the world and you want your algorithm that produces those short descriptions to be also short. That’s the minimum description length principle.  

And I do feel like it fits in, kind of, also what you were saying about the concept of you have a good world model, why look for surprise? Because it simultaneously affects both terms, both the algorithm, like your own world model, but also the loss that you incur when something unexpected happens. 

And so if I’m an agent in the world trying to minimize the minimum description length of the world, I’d like to go and seek some in-distribution data such that I don’t bump up my surprise term too much. 
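The lossless-compression equivalence described above has a compact numeric form: under arithmetic coding, a model that assigns probability p to each observed symbol can encode it in about −log2 p bits, so the summed next-token loss literally is the compressed length. A toy example with invented probabilities:

```python
# Sketch of the prediction/compression equivalence: total code length in bits
# is the sum of -log2(model probability) over the sequence, so a better
# next-symbol predictor is, literally, a better lossless compressor.
import math

def code_length_bits(seq, model_prob):
    return sum(-math.log2(model_prob(prev, nxt))
               for prev, nxt in zip(seq, seq[1:]))

seq = "abababab"  # 7 transitions, perfectly alternating
knows_pattern = lambda prev, nxt: 0.9 if nxt != prev else 0.1  # good model
no_model = lambda prev, nxt: 0.5                               # uniform guessing
print(code_length_bits(seq, knows_pattern))  # ~1.06 bits
print(code_length_bits(seq, no_model))       # 7.0 bits
```

The model that has captured the alternating pattern encodes the same data in a fraction of the bits, which is the sense in which training a better generative model and finding a better compressor are the same problem.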

BURGER: Right. And I think you said at some point that, you know, when I’m training a model, even if you reach the same loss point, you know, between Model A and Model B, if I have a steeper loss curve in Model A than Model B, you know, it’s getting to a better, sort of, compression-based vocabulary faster, which makes it more general. The shape of that curve matters from a compression perspective. 

FUSI: Yeah. I mean, I think it would help here to expand on what I was talking about in terms of, … 

BURGER: Yes. Please.  

FUSI: … like, minimum description length principle. The minimum description length principle is basically the loss of the model you’re training; that’s one component. And so it’s a sum over the mistakes you make at predicting or, you know, the mistakes you make at predicting each word. And that’s one term. And the other term is how long it takes you in code to describe the model and the training procedure, … 

BURGER: Right. 

FUSI: … to get to that training curve, to produce that training curve.  

BURGER: Right. 

FUSI: So, yes, if you look at collectively, one term is, kind of, fixed. It’s an amount of code it would take you to write out a language model, for instance, in code. Like, literally implement it, not the weights, just implement the initialization of it and then the training loop. And then on the other side, you have this training loss that gets generated as you start observing data. And, of course, because it’s a sum, you want to minimize really the area, like, you want to minimize the sum. And so, like, a flatter curve is much better than, like, the steeper curve, you know, even if it ends up at the end to be slightly better. 

BURGER: Yeah. Concave is better than convex. 

FUSI: Among other things, yes. [LAUGHTER] 
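One toy reading of the two-term sum just described, with all numbers invented: hold the model-code term fixed and compare two training curves that end at the same loss. The total description length tracks the area under the per-token loss curve, so the curve that comes down sooner compresses the data better.

```python
# MDL-flavored comparison of two hypothetical training curves with the same
# final loss. All numbers are invented for illustration.
curve_a = [4.0, 2.0, 1.0, 0.5, 0.5]   # drops quickly
curve_b = [4.0, 3.5, 3.0, 2.0, 0.5]   # same endpoint, descends late
model_code_bits = 1000                 # fixed cost: source code of model + training loop

mdl_a = model_code_bits + sum(curve_a)  # description length = code + summed loss
mdl_b = model_code_bits + sum(curve_b)
print(mdl_a, mdl_b)  # 1008.0 1013.0 -> A is the better compressor
```

Both runs finish at the same loss, but Model A pays less total surprise along the way, which is why the shape of the curve, not just its endpoint, matters under this view.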

BURGER: Sorry. So, you know, I think that we could do a whole episode on this compression view because it’s really fascinating. And the lossless part of it is what blew my mind. And I think, you know, I’m guessing there are multiple camps here, and you’re squarely in one camp, so I’m guessing we’ll get a bunch of feedback from the other camps. 

So, Subutai, you know, can I think of cortical columns as compressors? 

AHMAD: Yeah, it’s a good question. You know, I, you know, there’s so much in the compression literature that you can draw insight from. You know, if you look at the representations in cortical columns and the populations of neurons that carry them, you know, some of the things you have to deal with are that the brain doesn’t have a huge nuclear power plant attached to it. 

You know, we only have 12 watts or so to process everything we want to do, and the representations that evolution has discovered are incredibly sparse. And what that means is that you may have thousands and thousands of neurons in a layer, but only about 1% of them will actually be active at a time. And so it’s a very small subset of neurons that are actually active.  

I don’t know about this minimum description length, whether that applies. I can say a couple of things about that. There’s, you know, by and large, the representations are very sparse when you’re predicting well. When you see a surprise, there’s a burst of activity.  

BURGER: Yup. 

AHMAD: When there’s something that’s unusual, there’s a lot more neurons that fire, and … 

BURGER: That’s why learning is tiring!  

AHMAD: That’s why learning [LAUGHTER] … exactly. No, no, that’s right, that’s right.  

And so what we think is happening is that, you know, the actual representation of something is a very small number of neurons. When you’re surprised, there may be many things that are consistent with that surprise, and so your brain represents a union of all of those things at once. 

And when you have a very sparse representation, you can actually have a union of many, many different things without getting confused. So that’s what we think is going on there. So it is a very compressed, very efficient representation. And because it’s such a small percentage of neurons that are firing, we are very, very parsimonious in how we represent things and extremely energy efficient metabolically. 
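The union property described above is easy to demonstrate with sparse binary codes treated as small sets of active units. The sizes here are invented, chosen to give roughly the ~1% sparsity mentioned in the conversation:

```python
# Why sparse codes can hold a union of patterns without confusion: with only
# ~1% of units active, several stored patterns barely overlap an unrelated one.
import random

random.seed(0)
N, K = 2000, 20  # 2000 units, 20 active -> 1% sparsity

def pattern():
    return frozenset(random.sample(range(N), K))

stored = [pattern() for _ in range(5)]   # several hypotheses held at once
union = set().union(*stored)             # the "union" representation
novel = pattern()                        # an unrelated pattern

print(len(union & stored[0]) / K)  # 1.0 -> a stored pattern fully matches
print(len(union & novel) / K)      # tiny -> an unrelated pattern barely matches
```

Any pattern that is part of the union matches it perfectly, while a random unrelated pattern overlaps only a sliver, so membership checks stay reliable even when many hypotheses are superimposed.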

BURGER: I wanted to get to the efficiency point, but before I do, you know, you talk about this 1, you know, 1 to 2% of the neurons firing. But it’s, actually, the brain is actually much sparser than that at a fine grain, right?  

AHMAD: Yes, yes.  

BURGER: Because, you know, you have 1% of the neurons firing, but they aren’t connected to all the other neurons in the region. 

AHMAD: That’s right. Yeah. 

BURGER: So really the sparsity should be the product of the connectivity fraction times the activity factor. 

AHMAD: Yeah. Yeah. 

BURGER: Right. That’s about one out of 10,000. Something like that. 

AHMAD: Exactly. Yeah. So something like maybe 1% of the neurons are firing at any point in time, and maybe 1% of the connections that are possible are actually there at any point in time. So it’s a very, very small, you know, subnetwork through this massive network that’s actually being activated, a tiny percentage of neurons going through a very, very tiny piece of the full network. 
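As arithmetic, the effective sparsity the two converge on is just the product of the two fractions, both roughly 1% per the discussion:

```python
# Effective sparsity = fraction of neurons firing x fraction of possible
# connections actually present (both ~1% figures are from the discussion).
activity = 0.01
connectivity = 0.01
effective = activity * connectivity
print(effective)  # ~1e-4 -> about one part in ten thousand of the full network
```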

You know, it’s common to … some people say, “Oh, we’re only using 1% of our brain.” That’s not true. It just means at any point in time, you’re only using 1%, but at other points in time, a different 1% is being used. So, you know, the activity does move around quite a bit. But, at any point in time, it’s extremely small. 

BURGER: So, OK, the sparsity, I think, you know, the representation—how the brain is doing this compression biologically—is super fascinating. And I want to go on a little bit of a detour now to efficiency. So I remember in 2017 when in MSR we were building, you know, hardware acceleration for RNNs. 

And then the transformer hit, and they were optimized, you know, to be highly parallelizable across this quadratic attention map for GPUs. The way I would describe it is that that transition to semi-supervised training moved us from an era when we were really data limited, like you had to have good high-quality labeled data, to you were compute limited.  

And when that transition happened, we hockey-sticked from, “I’m building faster machines but I’m limited by data” to the bigger machine I can build, as long as I have enough, you know, unlabeled data of high quality, the better I can do with the model. And so we went on the supercomputing arms race, and now we’re building these, like, just gargantuan machines. 

And really, we’ve kind of been brute-forcing it. I mean, we’ve done a lot of things to optimize, like quantization, you know, and other and, you know, a better process node, you know, a better, more efficient tensor unit design. But to first order, we’ve been training bigger models by building bigger systems.  

And I just wonder, do you think that the brain at this 10 to 12 watts in the neocortex just has a fundamentally more efficient learning mechanism? Or do we think that, you know, what we’re doing in transformers in the most advanced silicon is as efficient, we’re just building much larger, more capable models? 

AHMAD: Oh, I think without a doubt, transformers are extremely inefficient and very, very brute force. We touched on this a little bit earlier in the attention mechanism, where we’re, you know, transformers are essentially comparing every token to every other token. I mean, there are architectures which reduce that, for sure, but it’s essentially an n-squared operation. And we’re doing this at every layer. 

I mean, there’s nothing like that in the brain. Our processing, you know, in some sense, the context for the very next word I’m about to say is my entire life, right? And the amount of time I take to produce the next word doesn’t depend on the length of the context at all. It’s a constant-time dependence on context. 

So it’s a significant, you know, reduction in the compute that’s required. You can kind of think about it like this: the brain, and when I say the brain, I mean the neocortex, has somewhere around maybe 70 trillion synapses. And it’s using only 12 watts. And a synapse is roughly equivalent to a parameter. 

And if you were to take the most efficient GPUs today and try to run a 70 trillion parameter model, it would be something like a megawatt of power. It’s tens of thous … it’s orders of magnitude more inefficient than what our brain is doing. So I absolutely believe that. 
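A back-of-envelope version of that comparison: the 70-trillion-synapse and 12-watt figures are the speaker’s, and the ~1 MW GPU estimate is the discussion’s ballpark, not a measurement.

```python
# Rough energy gap between the neocortex and GPUs running a same-size model.
# All figures are the conversation's ballpark numbers, not measurements.
brain_params = 70e12   # ~synapses, with synapse ~ parameter per the analogy above
brain_watts = 12
gpu_watts = 1e6        # "something like a megawatt" for a 70T-parameter model

ratio = gpu_watts / brain_watts
print(round(ratio))                  # ~83,000x -> nearly five orders of magnitude
print(brain_watts / brain_params)    # watts per synapse in the brain (~1.7e-13)
```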

BURGER: The metric I use, to go back to your point, you know, is, this is something, I think we talked about this back in the day, right? When, you know, after this kicked off for a few years, we were trying to project, like, how far would this go under the current model to inform the research and the directions you took. Which is why I got so interested in sparsity and working with you.  

And we would look at a training run and just say, how many joules did it take to train the whole model? How many parameters do we have? And sort of what’s our parameters per joule? And, if by that metric, you know, we were off by many orders of magnitude where the brain is, but I don’t know that that’s the right metric. So any thoughts on that? 

AHMAD: Yeah. I mean, in some ways, you know, transformers, you know, embody more knowledge in them than any human has.  

BURGER: Right.  

AHMAD: It has memorized, you know, the entire internet’s worth of knowledge, essentially. 

BURGER: All scientific papers … 

AHMAD: All scientific papers. You know, good and bad, whatever, you know, it has memorized everything. So that’s something that, you know, humans just cannot do. So there’s definitely stuff that’s better in transformers than humans.  

But fundamentally, I think, you know, we’re extremely efficient in how we process the next token or the next bit of information that’s coming in. And I think there’s a lot we can learn from the brain and apply to LLMs and future AI models there. 

FUSI: I was going to ask a question related to that because … forget memorizing the internet. But let me give you another example that transformers do really well. And I’m wondering, like, you know, the human aspect of this or the brain aspect of this because transformers, because of the n-square computation, they’re really good at stuff, like a needle in the haystack. 

So I can tell you right now, I can speak, I can talk to you, and I can tell you the password is something silly like “podcast microphone blue,” whatever. That’s the password. And then I can proceed and read the entire Odyssey or a bunch of other books to you out loud for the next 5 or 6 hours. And then I can ask the transformer, what was the password? And the transformer will do this nice n-square computation many times, and it will spit out the password.  

A human, you know, there will be a decay of that password. And then at some point, it won’t remember, and depending on the human, it may be in the first chapter of the Odyssey or like at the end, but … so fundamentally the type of computation that is done is very different. So it always makes me wonder about the efficiency because it’s just, like, it’s a different type of computation. So the efficiency of … like, efficiency is kind of like, what are you doing divided by how good are you at doing it. And so when the things we’re doing are so incomparable in many ways, that always makes me … always troubles me a little bit. I don’t know… I don’t know if there’s any question in there. [LAUGHTER] 

AHMAD: Yeah. I mean, transformers can do the stuff that humans find very, very difficult to do. Absolutely. You know, maybe there’s a way to get the best of both. I don’t know. You know, I don’t know that it’s fundamentally necessary to have such brute-force computation to get all of these features. 

FUSI: That’s right. 

BURGER: Yeah. Yeah, it is a weird thing because, you know, this is why memory palaces work so well. Like, there is a way, though, for a human to remember that my microphone is gray. It’s not actually blue, Nicolò. 

FUSI: Mine is blue. You don’t see it. It’s off camera. You see, your world model …  

BURGER: It’s off camera. Yeah, I know. I was just teasing you.  

But there’s a way, like, if I can just connect it to enough things, get that connectivity graph, then I’ll remember it because it’s captured the signal out of the noise and connected to enough things I can retrieve it. And retrieval would be a whole other topic we don’t have time to get into today.  

But I do … now, I want to go to the straw man. So let’s take continual learning off the table. Let’s imagine that, as I go through my day, I’m just saving all of the sensory data to put in my training set. And now imagine that I take 100,000 little transformer blocks, and I’m training them each with what they’re seeing. 

OK, I replay the day so I don’t have to, again, I don’t have to worry about continuous learning and whatever cross-cortical column, you know, routing feature of the outputs, the inputs, and there’s—Subutai, we’ve talked about this—there’s a complex set of wiring there to bring features from here to there that gets learned. If I replicated that, could a transformer block kind of do what the cortical columns are doing?

Could I just instrument all my sensory patches with little transformer blocks and then wire them up in the right way and have it work? 

AHMAD: I think there’ll be … there’s still a couple of things we need. One is that cortical columns are fundamentally sensory motor. And so they’re actually, each one, each cortical column is initiating actions, as well. So you cannot have a static dataset fundamentally ahead of time. It’s always dynamic because we’re constantly making movements to get the next bit of data. And so … 

BURGER: Couldn’t I tokenize that, though? 

AHMAD: I mean, you could tokenize the input and you can tokenize the output, but, you know, if you were to play the same set of inputs back again to a network that … a cortical column that’s randomly wired differently, it may make a different set of actions. And so as soon as it makes the first action that’s different, that dataset is no longer valid, right? It’s, you know, there is … you can’t fundamentally … you have to have a simulation of an environment rather than a static one-way dataset, if that makes sense.  

So I think that’s one piece that I think’s missing in transformers today, is this, sort of, sensory-motor loop. And then the other piece we talked about is continuous learning. 

BURGER: Yeah. 

AHMAD: I guess you said take it off the table, but … 

BURGER: It’s fundamental.  

AHMAD: Fundamental … different. Yeah, yeah. And maybe one other difference. We talked, you know, much earlier about a single latent space and the prediction that’s being made at the top of the transformer that you compute the loss function, and that’s back-propagated through the transformer. That’s not how neurons learn. Neurons are making … every neuron is actually making predictions, and every neuron is getting its input. 

And it’s learning independent of anything that happens at the top. And so it’s a much more granular learning signal. And information does flow from the top to bottom. But there’s also many, many other sources of information that it’s learning from. So it’s different in that sense, as well, mechanistically.  

BURGER: The reason I ask, and now I’d like to get into, you know, some of the … the fun speculation because I’ve just … it’s been a phenomenal discussion with the two. I think we’ve kind of elucidated the differences. Something I’ve wondered after I’ve talked to both of you … and, you know, Nicolò, kind of learning about this compression view of the world, lossless compression, and, Subutai, just, you know, the Thousand Brains Theory and these cortical columns and the sampling of, you know, the world to capture the signal that you can learn from. 

So let’s say that I was able to design a really small, efficient digital cortical column. Maybe it’s transformer-based with some, you know, a sparse representation and some sensory-motor mechanism built in. Maybe it’s more dendritic-based, you know, mapped into digital hardware. And I put a cortical column on every sensor I have in the world, associated with every person, and wire them up together with some of this and then have a, you know, billions of them that can form higher-level abstractions. Like, what do you think would happen? What could we do? 

AHMAD: That’s a fantastic thought exercise, I think [LAUGHS]. You know, again, assuming the cortical column is faithful and can generate, you know, or suggest motor actions, as well. I mean, in some sense, you could potentially have a super intelligent system, right, that’s far more intelligent than anything else on the planet.  

Now we’re scaling the number of cortical columns, you know, not from a mouse, you know, to a hundred thousand columns that a human might have, but potentially billions of cortical columns and way more. And there’s no reason to think there’s any fundamental limit there. So this sort of a system is, I think, the way that superintelligent systems will eventually be built.  

BURGER: But this is a very different direction … 

AHMAD: It’s a very different … 

BURGER: … than the one we’re currently headed down with, like, these monolithic models where we’re doing tons of RL, you know, to capture, you know, to get high-value human collaboration in distribution. 

AHMAD: Yes. It’s completely different than the direction we’re proceeding.  

So I think they, you know, to go down that path, there needs to be a fundamental rethinking of some of our assumptions, potentially even down to the hardware architectures that are necessary to implement it. The, you know, fundamental learning algorithms, the fundamental training paradigm. We talked about, you know, you can’t have a static dataset. You’re constantly moving around in the world and doing things. So it’s a very, very different way of going about AI than what we’re doing today. 

BURGER: Sounds like a great time to be an AI researcher. 

AHMAD: Absolutely. [LAUGHTER] 

BURGER: Nicolò, what was your reaction to that hypothesis? 

FUSI: It sounds super interesting. I mean, my brain was churning. You know, my background is very different. And so, like, I’m in a much worse position to answer this question. But I was starting to think, OK, so let’s say I do this. What would be my loss function? What, you know, how would information flow through the system? Like, sounds like cortical columns would each have their own loss that then I would aggregate—and then I would add a contribution that is, like, higher level. 

And then back to my question. You know, how is the temporal information coordinated? Because one way to see this is that, you know, the way I’m coming to understand this is that it’s kind of like a multi-view framework. 

You have the same phenomena represented by multiple independent but simultaneous views. And so part of me is like, it feels like you need to tie together these cortical columns in such a way that they all get that gradient feedback if you’re training with gradient-based methods, for instance. And so that, kind of, feels super, super interesting.

It is related to a lot of, you know, very superficially, to a lot of ideas in machine learning around, hey, is it better to have one giant super deep network? Is it better to have a bunch of shallow networks? But the difference is also in the way you train them, right? We typically train this bunch of shallow networks on kind of the same objective and the same data and not typically into an experiential cycle. Whereas this sounds like this is a different way to do it.  

BURGER: Right, right. I think … I want to pull this back around to the title of the podcast. And so I’ll share an observation. You know, so I’ve been using some of the latest models to code. You know, they’re getting better really fast. I’ve been using them to kind of relearn some of the physics that I never really understood deeply. 

You know, especially in general relativity, like E=mc². Like, why is c in there at all, right? Just stuff like that. Because now it can actually explain it to me, and I can keep beating at it until I understand it, and then, of course, work. 

And at some point, I asked the model, “Can you describe how I think?” And I was just curious. And it, you know, it gave me a page description that my jaw dropped because I said this, this thing knows me better than I know myself. I don’t think any human being, including me, could have captured kind of the way my approach to learning and my brain works, and I just read it as, like, like, yep, that’s right. And I learned something about myself.  

So I wouldn’t say that it passed the Turing test because this is way beyond Turing test. This was like, this thing knows me way better, you know, than I thought any machine ever could. I mean, I’m having a conversation with it. It could be human, but it’s superhuman. So in some sense, it’s like intelligent beyond human capabilities with its ability to discern patterns in how someone’s interacting.  And yet it’s a tool. You know, it’s not conscious. It doesn’t have agency, embodiment, emotion. It understands a lot of that stuff from the training data. But at the end of the day, it’s a stochastic parrot, right? It’s got, you know, it’s got the weights, and I give it a token, and it outputs a token. So, like, are these machines intelligent or not?  

FUSI: I’ll let Subutai answer first. [LAUGHS] 

AHMAD: OK. You know, you know, it’s definitely a savant, right? It knows a huge amount about the world. It’s absorbed a lot of stuff, and it can articulate that in ways that are just amazing. And, you know, it’s taken your chat history with, you know, presumably thousands of chats and able to summarize that in a way that’s remarkable. 

At the same time, I think, you know, transformers are not intelligent in the way that a three-year-old is, right? A three-year-old human is very curious, is constantly learning. It can learn almost anything. And, you know, a three-year-old Einstein was able to learn and eventually come up with theories that shook the world. That, you know, E=mc².

And so, you know, could a transformer do that? I don’t think so. And so I think there’s still a difference. There’s things it can do that are amazing. But there are still basic things that a child can do that transformers cannot do. So I think there’s still a gap there. Exactly how to articulate it, and how to bridge that gap, is, of course, the trillion-dollar question. But it is bridgeable. And there is a gap today. 

BURGER: Right. Nicolò? 

FUSI: You know, I think, from my perspective, they are intelligent. And from my perspective, I go back to the definition of intelligent, which is like, can you achieve your objectives in a variety of environments? It’s a very basic fundamental, but it’s kind of, you know, it can be embodied, a form of embodied intelligence, an agentic intelligence. If I plop you in an environment, and I give you an objective, can you achieve it? And the wilder the environment, the harder the task is.  

And I do think … I agree with Subutai. Like, there is a jaggedness of intelligence we keep describing.  

BURGER: Yup. 

FUSI: Like these things cannot be simultaneously super good, you know, Olympiad-level mathematicians and still give you stupid answers when you’re trying to, I don’t know, you know, figure out which cable goes where in your … in your car’s battery, you know, like, whatever. 

BURGER: [LAUGHS] Well, then it’s better than me. I’m not an Olympiad-level mathematician, and I do stupid stuff all the time. 

FUSI: I know exactly. Well, you know, whatever that was, that was a bad example. But you get it. But part of it goes back to the compression view. Like, I do believe that intelligence is compression: the ability to come up with succinct explanations for complex phenomena, and even succinct explanations for complex worlds, which then leads to your ability to operate within them. And the fact that we have these things that can prove crazy theorems but at the same time fail at fairly rudimentary tasks is a sign that, yes, transformers are great in terms of the inductive biases they put on the world and the computation, but we’re ultimately all subject to the No Free Lunch Theorem.

You know, across the world, the set of tasks that you could be pursuing. You know, you have certain inductive biases that kind of privilege certain tasks at the expense of others. And there isn’t, like, a thing yet that has expanded our set of tasks that are addressable. And so I do think that it’s a matter of rethinking our approach to a few things, whether I think likely both on the architecture front and on the losses and the way we train these systems front. I think there is an opportunity to expand the intelligent frontier of these models. But yeah, from my perspective, they are intelligent already just in a jagged way. 

BURGER: It’s such an interesting question, and I know a lot of people write a lot about this, so I don’t think treading any new ground here. But, you know, there’s the diversity of the tasks you can excel at. You know, are you able to handle nuance and understand things deeply? Are you able to learn continuously? Right now, the systems can’t, right. Are you embodied? I don’t know if that matters. Do you have an objective? Well, we could give them one. Are you conscious? Is that … I mean, that’s a whole other thing.  

So it just feels like there’s a bunch of check boxes, and we’ve checked a bunch of them, and a bunch of them are unchecked. And maybe there’s no consensus on, like, where that threshold is because there are many dimensions of intelligence, and some of which humans don’t even have. 

FUSI: And that’s why we have the term AGI and ASI, and people are debating the G and the S—what is general, what is specialized. So there is, like, it’s a huge discourse, like, for sure. But that’s why we had to start characterizing. But if you go back in the definition, going back to my schooling, go back to the definition of intelligence from Plato and Aristotle and Descartes, like, in some sense, you see the goalpost moving through the centuries around what we define as intelligent.  

BURGER: Right.  

FUSI: And I feel like we are still doing it. 

BURGER: Yeah. We’ll be doing it for a long time, you know, which in AI velocity is probably another like four or five years.  

Hey, I just want to thank you both for the dialogue. You know, I treasure both of you as, you know, intellects and scholars and friends. It was just a joy to nerd out with you all. So thank you both for taking the time. 

AHMAD: Thank you so much, Doug, for having me.  

FUSI: Thank you for having us. This was great. 

[MUSIC] 

STANDARD OUTRO: You’ve been listening to The Shape of Things to Come, a Microsoft Research Podcast. Check out more episodes of the podcast at aka.ms/researchpodcast or on YouTube and major podcast platforms. 

[MUSIC FADES] 


The post Will machines ever be intelligent?  appeared first on Microsoft Research.


The Missing Mechanisms of the Agentic Economy


For the past two years, I’ve been working with economist Ilan Strauss at the AI Disclosures Project. We started out by asking what regulators would need to know to ensure the safety of AI products that touch hundreds of millions of people. We are now exploring the missing mechanisms that are needed to enable the agentic economy.

This essay traces our path from disclosures through protocols to markets and mechanism design. Rather than simply stating our conclusions, I’m sharing our thought process and some of the conversations and historical examples that have shaped it.

We will be holding a number of focused convenings to explore these ideas over the next couple of months, and my hope is that shared context will enable more productive engagement with what is very much a work in progress.

The disclosure problem

Ilan Strauss and I started the AI Disclosures Project in early 2024 with a conviction that most regulators had little idea how AI worked or where it was going. The field was so young that many of the early regulatory proposals were misguided. We thought that regulators and industry should start by agreeing on standards for disclosure, so that we could all learn together as the technology develops. You can’t regulate what you don’t understand.

One of our first insights was that focusing solely on model safety was a mistake, much as if regulators inspected automobiles at the factory but completely ignored their use on the roads. We believed (and still do) that the focus should be on AI as deployed. And we believe that disclosures shouldn’t focus just on capabilities but on business models and the operating metrics that AI companies use to shape how their products operate.

Ilan and I had worked together previously with Mariana Mazzucato at University College London on what we called “algorithmic attention rents,” studying how platforms like Amazon and Google control user attention to extract economic rents from their suppliers. We observed that organic search at Google and Amazon was a huge advance in market coordination, using hundreds of signals to find the best match for a user’s intent. In effect, both companies had built a better “invisible hand.” And yet after decades of success, they turned away from that advance. To use Cory Doctorow’s coinage, they began “enshittifying” their services by substituting inferior paid results for the top organic search results in order to pad their bottom line.

We’d also watched social media start out with the promise of keeping you in touch with your friends and fostering productive conversations, but then begin to optimize for engagement at the expense of everything else. By the time anyone understood what was happening, the damage had been done. We can see the inflection point in their financial metrics, but neither regulators nor the public can see the changes in operating metrics that drove the financials. What if we could capture what good looks like before it gets enshittified, and identify how that changes over time?

We also observed that modern technology companies are completely different from industrial era corporations, where you can understand key elements of the business by tracing the inputs and the outputs through the financial statements. Instead, the business is largely driven by intangibles, which are lumped into one impenetrable black box.

We wanted to learn from that mistake. While the horse was already out of the barn on search and social media, we hoped to get disclosure of operating metrics into AI governance while there was still an appetite for regulation. Unfortunately, that window was very short. The failure turned out to be productive, though, because it forced us to think harder about regulation more broadly and what other leverage points might be found.

Protocols as functional disclosures

The first turn in our thinking came when we realized that disclosures aren’t just informational. The most important disclosures are functional. We came to see the parallels between disclosures and communications protocols, the agreed-on methods by which networked systems share information. For example, the HTTP protocol that underlies the World Wide Web specifies how a web browser and web server communicate in order to display a web page.

This is a structured communication with rules that must be followed and data that must be exchanged in a particular order. An HTTP request that identifies the user agent as a command-line program such as curl, rather than a graphical browser such as Chrome, triggers a different response from the server. The user-agent string isn’t a report filed with a regulator. It’s an operational signal embedded in the protocol, and it carries a lot of information.
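The User-Agent header is real HTTP, but the branching policy below is a hypothetical illustration of the point: a server reads the client's self-disclosure and varies its behavior accordingly.

```python
# Sketch: a server varying its response based on the User-Agent header.
# The header is standard HTTP; the branching policy is an invented example.

def negotiate_response(headers: dict) -> str:
    """Return a content type chosen from the client's declared user agent."""
    agent = headers.get("User-Agent", "").lower()
    if agent.startswith("curl") or agent.startswith("wget"):
        return "text/plain"       # command-line tools get plain text
    if "mozilla" in agent:
        return "text/html"        # graphical browsers get full HTML
    return "application/json"     # unknown agents get a machine-readable default

print(negotiate_response({"User-Agent": "curl/8.4.0"}))   # text/plain
print(negotiate_response({"User-Agent": "Mozilla/5.0"}))  # text/html
```

The disclosure is functional: the client's declaration directly changes what it receives, with no regulator in the loop.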

Once you see protocols as a system of functional disclosures, you start noticing that every regulatory system has a kind of communications and control protocol at its heart. Generally Accepted Accounting Principles (GAAP) or IFRS, the European equivalent, are protocols for communication between companies and their accountants, auditors, banks, investors, and tax authorities. Even road markings and road signs are a communications protocol, giving information to drivers about local conditions, laws, and the proper use of the road. These are slow, analog protocols, but they are protocols nonetheless.

Protocols can be inspected. Observability is the key to governance. Police observe speeders on the road; credit card processors and banks watch for credit card fraud on their payment networks; email processors filter spam as it passes through nodes on the network. The observability points for AI are still emerging, but that’s where regulators should be focused.

Even beyond being a locus for observability and regulability, protocols themselves do an enormous amount of the governing work in modern technology systems. Spanning everything from how packets get from one place to another to what gets displayed, who has permission to see it, and sometimes even what it costs, they ultimately determine who can interoperate with whom. That led us to an even bigger realization.

Protocols shape markets

Think about the early shape of the AI chatbot market. It was a winner-takes-all race to be the dominant platform for AI in the way Windows became the platform for PCs, or iOS and Android for phones. Whoever wins controls the market. Then Anthropic introduced MCP, the Model Context Protocol. All of a sudden, the landscape looked more like a web. There could be many winners. It didn’t matter what model you were running or whose APIs you were calling as long as you followed the protocol. And as the agentic AI market unfolded, the protocol wasn’t just MCP. An AI agent could be a user of the existing internet protocol stacks. Whether MCP itself survives or is superseded by other protocols, the shape of the market was transformed.

This insight reframed our whole project. Protocols are not just technical infrastructure. They are market-shaping mechanisms.

Workflows are also protocols

I talked last week with some of the folks working on the Long Now Foundation’s partnership with Ethereum’s Summer of Protocols project, and that widened my lens even further.

When software people hear “protocol,” we think of communication protocols: TCP/IP, HTTP, MCP, or, say, Stripe’s Machine Payment Protocol (MPP).

To the Long Now folks, a protocol is any standardized way of doing something. Wildfire management teams follow protocols. So do flood response teams, hospital emergency rooms, and air traffic controllers. Atul Gawande’s book The Checklist Manifesto was an attempt to establish a common protocol for surgical operating theaters. This is a very different definition of protocol, and yet putting the two meanings of the word into the same frame makes a new kind of sense.

In his introduction to the Summer of Protocols’ Protocol Reader, Venkatesh Rao cited Ethereum researcher Danny Ryan’s definition of a protocol as a “stratum of codified behavior” enabling coordination. He pointed out that protocols tend to become invisible once adopted. Rao calls this a “Whitehead advance,” after the philosopher Alfred North Whitehead’s observation that civilization advances by extending what we can do without thinking.

But he also made the thought-provoking point that a protocol is an “engineered argument,” in contrast with an API, which he says is an “engineered agreement” enforced by one dominant actor. There’s more to it than just the power asymmetry of enforced agreement, though. In a followup conversation, Venkatesh Rao noted that protocols are “not just codified modes of information exchange, but modes of live, structured, argumentation, often with an active computational element. For example, CSMA/CD (Ethernet) must detect packet collisions and compute and execute a random delay for retransmittal of packets. This is not mere structured communication. This is argumentation with what philosophers call dynamic semantics.”
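The CSMA/CD behavior Rao describes (detect a collision, then compute a randomized delay before retransmitting) is the classic truncated binary exponential backoff. A minimal sketch, with the slot time and cap following the classic 10 Mb/s Ethernet convention; this is illustrative, not a faithful implementation:

```python
import random

SLOT_TIME_US = 51.2   # classic 10 Mb/s Ethernet slot time, in microseconds
MAX_EXPONENT = 10     # the backoff window stops growing after 10 collisions

def backoff_delay(collisions: int) -> float:
    """Truncated binary exponential backoff: after the n-th collision,
    wait a random number of slots drawn from [0, 2**min(n, 10) - 1]."""
    exponent = min(collisions, MAX_EXPONENT)
    slots = random.randrange(0, 2 ** exponent)
    return slots * SLOT_TIME_US

# After the first collision the delay is 0 or 1 slot; the window doubles
# with each further collision, spreading retransmissions apart in time.
```

This is the "active computational element" in the argument: each station computes its own randomized answer to the collision rather than merely exchanging structured messages.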

Rao continued: “The moment you go beyond computing protocols, real-world feedback loops from material consequences become really important. For example, container-shipping is quite close architecturally to TCP/IP (the big difference being that packets can be dropped and retransmitted while lost containers are actually lost), but because it has a materially embodied feedback loop, regulatory mechanisms start to behave more like control systems than communication systems.”

I love the idea of protocols as an engineered argument. The dynamism this suggests is going to be ever more true in a future of agentic protocols. But this notion also triggered another thought, which is that markets are also engineered arguments. My bridge to this reformulation was the difference between de jure protocols that arise from a formal standards process, and de facto protocols that arise through market contention.

In the early days of the internet, the Internet Engineering Task Force (IETF) was all about engineered arguments. People had ideas about how the internet ought to work, and to prove their point they had to show up with interoperable implementations. No one had the ability to enforce anything. Agreement had to evolve. As Dave Clark famously put it, “We reject: kings, presidents, and voting. We believe in: rough consensus and running code.” The de facto protocols of the internet that emerged from the IETF ended up significantly outperforming the competing de jure networking protocols that emerged from telecommunications standards bodies. The IETF framed the argument; whoever showed up made their case and won or lost by way of adoption.

It also made me remember another decades-old story that I had lived through. Microsoft and Netscape were duking it out in the web server market and were building their own “engineered agreements” for what was up the stack from the base web server functionality. Everyone thought that Apache wasn’t keeping up, but it had a trump card: an extension layer. And that enabled all kinds of productive arguments among a market of competing developers, rather than a single engineered agreement imposed by either a dominant player or a dominant committee.

Rao also noted that protocols spread slowly but become nearly impossible to dislodge once established. For example, SMTP (the protocol for email) dates back to 1982, and has outlasted many competitors. There is a lot of path dependence. And so getting the first steps right is an important part of engineering the argument.

And in his essay “Standards Make the World” for the Summer of Protocols project, David Lang makes the point that technical standards form a third pillar of modern society, alongside private organizations and public institutions. They aren’t the state and they aren’t the market, but they’re essential to both. When they work well, standards become enabling technologies. The internet. The shipping container. Standard time. They are civilizational infrastructure.

In short, we are not just building communication protocols for software agents. We are developing a new way to standardize the best practices and workflows that will shape the human + AI future, allowing humans and agents to cooperate across organizations, industries, and borders.

Skills can also be seen as protocols

Once the Long Now team planted in my mind the connection between workflows and protocols, it occurred to me that Agent Skills are also a “stratum of codified behavior,” and perhaps even a set of competing “engineered arguments” for how to do work with AI.

At the simplest level, a Skill is a piece of structured knowledge: here’s how to create a Word document; here’s how to extract the text from a PDF; here’s how to publish on the Hugging Face Hub. There can be many Skills that attempt to codify the same knowledge, but some may be better than others. As Skills multiply, how will we find the best ones? This is in many ways analogous to the organic web search problem, which Google solved by aggregating hundreds of useful signals.
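By analogy with organic search, skill discovery might rank competing Skills by aggregating many signals. The signal names and weights below are purely hypothetical; the point is the shape of the mechanism, not any real registry's API:

```python
# Hypothetical ranking of competing Skills that codify the same task.
# Signal names and weights are invented for illustration.

WEIGHTS = {
    "adoption": 0.4,      # how widely the skill is installed and used
    "success_rate": 0.4,  # fraction of runs that complete correctly
    "freshness": 0.2,     # how recently it was updated
}

def score(skill: dict) -> float:
    """Weighted sum of signals, each normalized to [0, 1]."""
    return sum(WEIGHTS[name] * skill.get(name, 0.0) for name in WEIGHTS)

def rank(skills: list[dict]) -> list[dict]:
    return sorted(skills, key=score, reverse=True)

candidates = [
    {"name": "pdf-extract-a", "adoption": 0.9, "success_rate": 0.7, "freshness": 0.5},
    {"name": "pdf-extract-b", "adoption": 0.3, "success_rate": 0.95, "freshness": 0.9},
]
print([s["name"] for s in rank(candidates)])
```

A real system would need far more signals, and defenses against gaming them, which is exactly the arms race organic search has been fighting for decades.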

And we’re seeing that there is a kind of hierarchy of skills. Jesse Vincent’s Superpowers framework, which has become one of the most widely adopted open source projects in AI-assisted development, doesn’t just give agents individual capabilities. It encodes an entire software development methodology: brainstorm before you build, plan before you code, test before you ship, review before you merge. That’s a standardized workflow. It’s a lot like the kinds of protocol that the Long Now folks were talking about, expressed in a form that agents can follow.
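A methodology like that can itself be encoded as a protocol the agent must follow in order. The stage names come from the description above; the enforcement sketch is hypothetical:

```python
# Workflow-as-protocol sketch: stages must be completed in order.
# Stage names follow the methodology described above; the checker is illustrative.

STAGES = ["brainstorm", "plan", "code", "test", "review", "merge"]

class Workflow:
    def __init__(self):
        self.done = []

    def complete(self, stage: str) -> None:
        expected = STAGES[len(self.done)]
        if stage != expected:
            raise ValueError(f"expected {expected!r} before {stage!r}")
        self.done.append(stage)

wf = Workflow()
wf.complete("brainstorm")
wf.complete("plan")
# wf.complete("merge")  # would raise: expected 'code' before 'merge'
```

The interesting design question is where such checks should live: in the skill itself, in the agent runtime, or in an external auditor that observes the protocol traffic.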

The existing protocols that the protocol research community talks about, like wildfire management protocols or hospital triage protocols, encode best practices into a repeatable, teachable process for human teams. They have yet to be adapted for agents. And in fact, many of them are never going to be entirely agentic. We will need to build mechanisms for workflows that include both AI agents and humans working together.

Agent skills in some (but not all) areas raise the same questions that industrial standards have always raised: who decides what the best practice is? How do you verify quality? How do you govern updates? We may be talking about skills that encode the workflow for regulatory compliance in a specific industry, or for conducting an environmental impact assessment, or for managing a clinical trial. Are the standards de jure or de facto, the result of an engineered agreement by a committee or an engineered argument that enables a vibrant market?

At O’Reilly, this is something we think about a lot. We’re a company built on codifying expert knowledge. We’ve published books and organized conferences and online training that taught people how to do new things. Now we’re asking “What does it look like to publish the skills that teach agents how to do things? And how do we make sure those skills are discoverable, trustworthy, and monetizable, not just for us but for every domain expert who has knowledge worth encoding?” And how do they emerge from contention in a vibrant market rather than by decree?

We believe we’ll all be better off with an engineered argument than an engineered agreement. And that brings me to mechanism design.

The missing mechanisms

Economists use the term “mechanism design” to describe the engineering of rules and incentive structures that lead self-interested actors to produce outcomes that are good for everyone. It’s sometimes called “reverse game theory.” Rather than analyzing the equilibria that emerge from a given set of rules, you start with the outcome you want and work backward to design the rules that will get you there.

Mechanism design theory got its start in the 1960s when Leonid Hurwicz took up the problem of how a planner can make good decisions when the information needed to make them is scattered among many different people, each of whom has their own interests. His key insight was that people won’t reliably reveal what they know unless it’s in their interest to do so. So how do you design a system that aligns their incentives?

The field that Hurwicz founded and that Eric Maskin and Roger Myerson developed through the 1970s and 80s earned all three the Nobel Prize in Economics in 2007.

I first encountered the field when Jonathan Hall, at the time the Chief Economist at Uber, waved Al Roth’s book Who Gets What — and Why at me and said “This is my Bible.” In it, Roth describes his own work on mechanism design, which won him the 2012 Nobel Prize in Economics along with Lloyd Shapley. Roth applied mechanism design to kidney matching markets, markets for college admissions, for law clerks and judges, and for hospitals and medical residents. When I first talked to Jonathan and then Al Roth, my layman’s takeaway about mechanism design was that it was simply the application of economic theory to design better markets.

And I’ve since come to think even more broadly about what mechanism design might mean in a technology context. In my broader framing, packet switching was a breakthrough in mechanism design. So for that matter was TCP/IP, the World Wide Web, and the protocol-centric architecture of Unix/Linux, which enabled open source and the distributed, cooperative software development environment we take for granted today. PageRank and the rest of Google’s organic search system also seems to me to be a kind of mechanism design. So do Pay Per Click advertising and the Google ad auction. All of them are ways of aligning incentives such that self-interested actors produce outcomes that are good for others as well.
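The Google ad auction is a canonical worked example: a generalized second-price auction, in which the winner pays the runner-up's bid rather than their own. A single-item Vickrey sketch shows the core incentive property:

```python
# Vickrey (sealed-bid, second-price) auction: the classic mechanism-design
# result that bidding your true value is a dominant strategy.

def vickrey(bids: dict[str, float]) -> tuple[str, float]:
    """Winner is the highest bidder; the price is the second-highest bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price

winner, price = vickrey({"alice": 10.0, "bob": 7.0, "carol": 4.0})
print(winner, price)  # alice wins but pays 7.0, not her own bid of 10.0
```

Because the price is set by the other bids, overbidding risks paying more than the item is worth to you and underbidding risks losing a profitable item, so truthful bidding is optimal regardless of what anyone else does: self-interest aligned by the rules of the game.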

So that brings me back to AI. Right now, there’s a problem that makes the AI/human knowledge market less efficient than it could be. The disrespect for IP that has been shown by the AI labs and applications during the training stage, and even now during inference, has led to efforts by content owners to protect their content from AI. Do not crawl. Lawsuits. Reluctance to share information. Even the AI labs are complaining about the theft of their IP and trying to protect their model weights from distillation.

It’s an economy crying out for mechanism design.

The lesson of YouTube Content ID is worth learning. Twenty-five years ago, the music industry was in the same position that content creators are in today with AI. In response to unauthorized use of their music by creators, music publishers’ demand to YouTube was “Take it down.” But as Google engineer Doug Eck explained to me, YouTube came up with a better answer: “How about we help you monetize it instead?” I don’t know the details of how that decision was made but I do know the eventual outcome. Aligned incentives led to a vibrant creator economy in which YouTube’s video creators, the music companies, and Google all got to share in the value that was created.

That should give us inspiration for how to solve some of the problems we face now with AI. Whether it’s with Agent Skills, NotebookLM, or other emergent artifacts of the new AI/human knowledge economy, we need to align the incentives. If we can grow the pie in a way where no single gatekeeper captures the bulk of the benefit, there’s a way to create a vibrant market. But that requires building mechanisms that don’t exist yet.

What mechanisms are missing from the agentic economy? Here’s a partial list:

Skills markets. There’s an enormous economic opportunity for humans to create and trade skills that agents can use. These are not just simple aggregation of context with tool use instructions, but higher-level, industry-specific workflows that encode deep human expertise. At O’Reilly, we’re figuring out how to turn our knowledge and that of our authors into skills, how to make them discoverable, and how to sell them. But as of yet, there’s no way for a broader community of skill creators to participate.
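To make the idea of a tradable skill concrete, here is a minimal sketch of what a skill package might look like as a data structure. No standard schema exists yet; every field name here is a hypothetical illustration, not an actual O'Reilly or MCP format.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """A hypothetical tradable skill package: curated context plus
    tool-use instructions that encode human expertise."""
    name: str
    version: str
    publisher: str
    context: str        # the domain knowledge the agent loads
    instructions: str   # how and when to invoke the tools
    tools: list = field(default_factory=list)  # tool identifiers the skill depends on

def is_publishable(skill: Skill) -> bool:
    """The minimal checks a skills marketplace would need before listing."""
    return bool(skill.name and skill.version and skill.publisher and skill.instructions)

review_skill = Skill(
    name="contract-clause-review",
    version="1.0.0",
    publisher="example-publisher",
    context="Summaries of current contract-review guidance...",
    instructions="When asked to review a contract, load the context, then call the clause checker.",
    tools=["clause_checker"],
)
print(is_publishable(review_skill))  # True
```

Even a toy schema like this makes the marketplace problems visible: versioning, attribution to a publisher, and dependencies on tools that must themselves be discoverable.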

Quality and governance for skills. Some skills will need the same kinds of governance that industrial standards have. Who certifies that a medical skills package follows current clinical guidelines? Who updates it when the guidelines change? We haven’t begun to build the institutions that would govern agent skills at that level.

Registries and discovery. The MCP community has been working on a registry protocol, as has the Ethereum community.

This isn’t just a technical development but a business opportunity. I still remember when Network Solutions was running the original top-level internet domain name registry under contract from the National Science Foundation. When the government said it would end the payments, Network Solutions planned to walk away. Then they realized what they had. On the early internet, domain name registration became a surprisingly big business. Now it’s just boring civilizational infrastructure. Is there something similar for AI models, applications, and agents?

Organic search for agents. Google’s first great innovation on the web wasn’t how to make pay per click ads really work with a data-driven ad auction. It was organic search: a way of coordinating a market with hundreds of signals that ignored price and worked independently of whether the destination content was free or paid. The New York Times (or oreilly.com) is subscription-based, but that isn’t a factor in whether Google shows it to you. Google figured out signals that let them say, “This is the best result for this query.” Sites behind paywalls figured out how to disclose enough for people to decide whether they wanted to take the next step and enter into a transaction. That’s an engineered argument.

We’re going to need the equivalent for skills and agent services. We’ll start with curated marketplaces. Vercel already has one. But we’re a long way from anything as effective as Google’s organic search at its peak. The search space will be huge, with hundreds of millions, maybe billions of agents seeking the best way to accomplish trillions of distinct tasks. Skills can help them save on inference costs and deliver better results. The question is what signals will drive discovery of the best match.
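The core idea of organic ranking can be sketched in a few lines: combine weighted quality signals while deliberately never consulting price, just as Google's organic results ignored whether the destination content was free or paid. The signal names and weights below are invented for illustration; real agent-discovery signals are exactly the open question.

```python
# Toy "organic search for agents": rank candidate skills by weighted
# quality signals. Note that 'price' appears in the data but is never
# consulted by the scoring function -- that is the whole point.
WEIGHTS = {"task_success_rate": 0.5, "freshness": 0.2, "publisher_reputation": 0.3}

def organic_score(signals: dict) -> float:
    """Combine quality signals; price is intentionally ignored."""
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

candidates = {
    "skill-a": {"task_success_rate": 0.9, "freshness": 0.5,
                "publisher_reputation": 0.8, "price": 0.0},
    "skill-b": {"task_success_rate": 0.6, "freshness": 0.9,
                "publisher_reputation": 0.9, "price": 10.0},
}

best = max(candidates, key=lambda name: organic_score(candidates[name]))
print(best)  # skill-a wins on quality signals, not on being free
```

A production system would need hundreds of signals and defenses against gaming, but the design principle is the same: the ranking function is blind to who pays.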

Extension architectures. MCP’s extension model (including the new Apps Extension) is promising. This is the Apache model all over again: keep the core simple, let people layer different approaches on top, and let the market sort out which ones win. It is, in essence, an engineered argument rather than an engineered agreement.

Payment layers. Stripe has been working on agentic commerce, but it seems to be focused on traditional e-commerce transactions like booking a ticket or buying a product. What about a payment layer for skills? There have been proposals for monetizing MCP calls (pay per call, pay per token), but none has caught on yet. Coinbase’s x402 protocol may also end up playing a role.
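A pay-per-call layer for skills could be as simple as metering invocations against a price list. This is a minimal sketch of that idea only; the tool names, prices, and ledger design are all hypothetical, and no such MCP payment standard exists yet.

```python
from collections import defaultdict

# Hypothetical per-invocation prices for two metered tools.
PRICE_PER_CALL = {"summarize_filing": 0.002, "clause_checker": 0.01}

class Meter:
    """Records tool invocations per caller and totals what each owes."""
    def __init__(self):
        self.usage = defaultdict(int)

    def record(self, caller: str, tool: str) -> None:
        self.usage[(caller, tool)] += 1

    def invoice(self, caller: str) -> float:
        return sum(count * PRICE_PER_CALL[tool]
                   for (who, tool), count in self.usage.items() if who == caller)

meter = Meter()
for _ in range(3):
    meter.record("agent-1", "summarize_filing")
meter.record("agent-1", "clause_checker")
print(round(meter.invoice("agent-1"), 4))  # 0.016
```

The hard parts a real payment layer must solve are not the arithmetic but settlement, fraud, and disclosure of terms before the call is made, which is where server cards come in.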

Progressive access and authentication. MCP Server Cards promise to let a service specify its terms: here’s what we charge, here’s how you authenticate. That’s a functional disclosure layer that could enable commerce. It could enable progressive privileges: a free O’Reilly subscriber gets one set of tools, a paying subscriber gets a richer set, all on top of the same MCP server. Again, that’s an engineered argument with the market deciding the winners.
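Progressive privileges are easy to sketch: one server, one disclosure of which tools each tier can see. The tier names and tool names below are invented for illustration, not an actual O'Reilly or MCP Server Card schema.

```python
# Hypothetical tier-to-tools disclosure: the same server exposes a
# richer tool set to paying subscribers than to free ones.
TOOLS_BY_TIER = {
    "free": ["search_titles"],
    "paid": ["search_titles", "full_text_answers", "download_examples"],
}

def list_tools(tier: str) -> list:
    """What a server-card-style disclosure might expose for a given tier;
    unknown tiers get nothing."""
    return TOOLS_BY_TIER.get(tier, [])

print(list_tools("free"))
print(list_tools("paid"))
```

The interesting property is that the commercial terms live in the disclosure, not in the protocol: any client can read the card, compare tiers, and decide whether upgrading is worth it.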

Neutrality in agent routing. When ChatGPT decides to show you a Booking.com widget instead of an Airbnb widget, who made that choice, and on what basis? OpenAI claims commercial considerations aren’t a factor. That’s hard to take at face value. We need something like the original principle of organic search: surface the best result for the user, not the most profitable one for the platform.

We don’t know the future, but we can set ourselves up to shape it for the better

I’m old enough to remember when UUCP was giving way to the internet, and there was a real debate over whether explicit path routing or domain routing was better. In retrospect, it’s blindingly obvious that path routing wasn’t going to scale. But it’s worth remembering that at the time, people weren’t at all clear about that!
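The difference between the two is easy to see side by side. A UUCP bang path encoded the entire route, so every sender had to know the network topology; a domain address names only the destination and leaves routing to the network. The hostnames below are invented examples.

```python
def parse_bang_path(addr: str):
    """'hosta!hostb!hostc!user' -> (route hops, user).
    The sender must supply every hop: topology knowledge is the
    sender's burden, which is why this couldn't scale."""
    *hops, user = addr.split("!")
    return hops, user

def parse_domain(addr: str):
    """'user@example.com' -> (user, domain). No route is specified;
    resolution and routing are delegated to the network."""
    user, domain = addr.split("@")
    return user, domain

hops, user = parse_bang_path("seismo!uunet!oreilly!tim")
print(hops, user)
print(parse_domain("tim@oreilly.com"))
```

As the network grew, the per-sender cost of the first scheme grew with it, while the second stayed constant. That asymmetry, invisible at small scale, decided the argument.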

The same is true now. Some of what I’ve described will turn out to be the equivalent of explicit path routing: a dead end that was only plausible for a small scale network. Other parts will turn out to be as fundamental as DNS or HTTP. But we’re not trying to pick the winners. We’re trying to engineer the argument.

If we can enable better markets, it will allow a process of discovery. People try different things, most fail, some catch on. The job right now is to build the mechanisms that help the market to evolve.

We need mechanisms that no single gatekeeper can control. Modular, decentralized architectures let people experiment with business models, routing decisions, payment systems, and quality signals. And alongside those markets, we will eventually need institutions (some of which will be protocols) to maintain standards that will become the infrastructure of the next economy.

This article recapitulates a conversation with Ilan Strauss and Ido Salomon, and a separate conversation on the broader meaning of protocols in the context of industry workflows and civilizational infrastructure with Venkatesh Rao and Timber Schroff of the Ethereum Foundation’s Summer of Protocols program, and Denise Hearn and James Home of the Long Now Foundation. Rao’s Protocol Reader and David Lang’s “Standards Make the World,” published through the Summer of Protocols project, inform the argument about protocols as civilizational infrastructure.


