Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

This Week in AI: Production Viability


On this week’s episode, host Andreas Welsch, founder of AI advisory firm Intelligence Briefing, brought together Maya Mikhailov, cofounder and CEO of Savvi AI, and Doug Shannon, generative AI and intelligent automation leader, to cover a handful of interconnected topics that practitioners are navigating right now: OpenAI’s push into personal finance, the role of metacognition in AI-assisted technical work, the growing backlash against token-based productivity metrics, and the new role of forward-deployed engineer. Together, these stories sketch a picture of an industry that’s good at generating output but is still figuring out what output is worth.

Why OpenAI wants your bank account data

When OpenAI announced it was analyzing users’ transaction data in partnership with financial institutions, the coverage focused on the consumer benefit: a smarter way to track spending, comparable to what Credit Karma or Mint offered but with a more conversational interface.

But that’s not all the company’s interested in, or even the main thing. Maya reframed the stakes: “What OpenAI wants to do is figure out consumer intent.” Being able to access users’ financial data is less about helping people manage their money and more about completing a profile the company can then monetize. OpenAI already builds a surprisingly accurate picture of users from their chat histories. Add transaction data and you get specifics that weren’t there before: what someone is saving for, what they’re anxious about, where their money is actually going. That’s a data asset worth a great deal to advertisers.

We’ve seen this pattern before, and as Andreas noted, companies have long held (and used) potentially invasive data to recommend products. The Target pregnancy prediction story is now more than a decade old, but it’s still being taught in business school, including by Andreas, precisely because it illustrates how behavioral data can be combined to infer things people haven’t explicitly disclosed—and spotlights the fine line between effective recommendations and those that feel too personalized, reminding consumers just how much information companies have on them. Companies’ profile-building capability hasn’t changed, but AI chat adds a new wrinkle, said Maya. A conversational interface makes disclosure feel natural, so the knowledge graph based on your chat history is very powerful. And these tools are also better positioned to share recommendations than traditional avenues. “By having this style that is agreeable, that is engaging,” Maya explained, “those recommendations are going to be a lot stickier than what a fragment of a sentence I type into a regular search engine.”

Metacognition as a professional skill

When you delegate thinking to a system that averages across a massive range of inputs to produce an answer, you need to know when that answer is good enough and when it isn’t.

“We’re essentially being averaged out,” Doug said. The model is doing many things behind the scenes to find a mean response. The human’s job is to ask questions about the questions, to push past the first answer, and to know whether their own judgment is still in the loop. That’s why Doug’s been pushing for a renewed interest in metacognition, or “thinking about thinking.” Offloading cognitive load that’s peripheral to your work is fine, Doug and Maya agreed. Offloading the reasoning that’s central to your job’s value—what Doug called cognitive surrender—is where organizations get into trouble.

The future advantage won’t come from access to AI. Everyone will have some kind of access to it. The advantage will come from knowing what to offload, what to question, and what should never leave human judgment. This is a skill-development question as much as a philosophical one. The people who’ll be most effective with AI tools aren’t the ones who use them most; they’re the ones who understand what to hand off and what to keep. That requires domain knowledge, judgment about when a model’s answer is plausible but wrong, and enough fluency with how these systems work to recognize when you’re being handed an average instead of an answer.

Tokenmaxxing and the wrong incentive

The tokenmaxxing debate seems to be coming to a head. Amazon abolished its AI productivity leaderboard after employees started gaming it by writing inefficient code to rack up token usage. And one company reportedly burned through $500M in Anthropic tokens in a single month after failing to set limits. The companies encouraging tokenmaxxing are incentivizing the wrong metrics, Maya argued. It’s like determining which bakery is best by the amount of flour it uses. The right question is “Are we making a quality product?”

Andreas shared his own vibe coding experience as an example of how token consumption and technical debt compound in practice. A developer starts with a modest plan and burns through their quota running agents in half an hour. They upgrade to a higher tier, paying five times more, but now the sunk-cost logic kicks in. As Andreas pointed out, now they feel like they “should also be getting five times more the value out of [their subscription],” so scope expands from a single tool into a unified business operating system. Three weeks later, the accumulated complexity has outpaced the ability to evaluate it: Repeated security audits keep surfacing new issues, each pass generating recommendations that require cybersecurity expertise most vibe coders don’t have. Here’s where Doug’s point about metacognition applies: The more a builder stays actively involved in understanding what the system is actually doing, the better their judgment about whether it is working. For less engaged users, the risk is accepting the output, shipping the debt, and discovering the consequences later.

Most of the misalignment originates in the gap between what executives expect from AI and what practitioners deal with day-to-day. Executives see a capability that could change the slope of productivity, Maya explained. Engineers and analysts live with the technical debt, the version control problems, and the regulatory constraints that don’t disappear because you have a better code completion tool. The leaderboard problem is a symptom of that disconnect.

GitHub’s recent shift from unlimited to usage-based pricing for Copilot is likely to realign these incentives faster than any internal policy change would. When more CFOs start seeing the actual bills, the leaderboards will all come down.

Doug identified a related problem emerging with the “cognitive surrender” to LLMs. When organizations encourage employees to pipe internal processes, proprietary logic, and institutional knowledge into foundation models without governance, they’re not just running up token bills. They’re giving away the operational knowledge that differentiates them. Process documentation, workflow logic, and institutional memory about why certain decisions were made are all forms of intellectual property, and once they’re encoded into a general-purpose model, the organization’s advantage from them diminishes.

Forward-deployed engineers aren’t enough on their own

Is the answer to these challenges to put a skilled engineer directly inside the customer environment to translate between what a model produces and what an organization actually needs? That’s the promise of the forward-deployed engineer (FDE) approach popularized by AI firms. Doug and Maya both had some criticisms of the model.

Maya’s objection was structural. Enterprise AI deployment isn’t a matter of adding capability on top of existing infrastructure. Organizations arrive with siloed data, legacy systems, and regulatory constraints that no forward-deployed engineer can resolve on technical skill alone. You can’t “just sprinkle some AI on it, and it’ll work just by a package of tokens,” she said. Engineers have to know the context behind why certain data can’t be used or why a particular model can’t be deployed in a regulated context. FDEs coming into an organization fresh don’t have this understanding and as a result may undo decisions that were made carefully and for reasons that aren’t written down anywhere obvious.

Doug’s concern was about communication. FDEs, in his experience, tend to arrive with strong technical instincts and limited organizational context. They get into the work quickly but struggle to communicate across the full stack of stakeholders involved. That’s why business analysts exist, to understand the customers’ problems and what the process actually is before engineers can address them. Skip that step and you get technically correct output that solves the wrong problem.

What both Maya and Doug were underscoring is that AI deployment at the enterprise level is fundamentally a context problem. The models are capable. What’s hard is knowing which capability to apply, where to do it, and with what constraints in place. That knowledge doesn’t live in the model; it lives in the people who’ve worked inside the organization long enough to know why things are the way they are.

The measurement problem

All the topics in this episode circle back to the same question: What are we actually measuring, and what incentives are we setting in place with those measurements? Token counts and lines of code don’t always correlate to the outcomes companies want. You need human expertise and a contextual knowledge of the business to figure out what goals you want to achieve and what to measure to ensure you get there.

On next Monday’s episode of This Week in AI, RecoMind founder Miguel Fierro joins host Christina Stathopoulos to discuss responsible AI, multimodal content creation, and more on how LLMs are changing personalization and user understanding. Miguel will also lead a live demo that offers a glimpse of the next generation of recommendation experiences—register here.

We’ll continue to publish our takeaways here on Radar each Friday and share full episodes on YouTube, Spotify, Apple, or wherever you get your podcasts.




The latest AI news we announced in May 2026

Here are Google’s latest AI updates from May 2026

Why Zig Isn’t 1.0 (Yet)


Most programming languages follow a familiar trajectory: early experimental releases, rapid iteration, and then – at some point – a 1.0 version that signals stability and the potential for serious adoption.

Zig hasn’t followed that well-trodden path. What could be the reason?

Andrew Kelley quit his job in 2018 to build a programming language. Eight years later, Zig powers Ghostty, TigerBeetle, and Uber’s cross-compilation. It’s in the top five most admired languages on Stack Overflow. But there’s just one thing missing – a 1.0 release. For many engineers, that raises a rather obvious question: “What’s taking so long?” And perhaps more importantly: “Is that a cause for concern?”

In a recent conversation with JetBrains, Zig creator Andrew Kelley dealt with those questions pretty directly. And the answers might just surprise you!

🎥 Watch the full interview here → https://jb.gg/andrew-kelley-zig-interview

A familiar milestone with an unfamiliar definition

In most ecosystems, version 1.0 carries clear implications of stability, maturity, and a commitment to backward compatibility. It’s the signal many teams wait for before adopting a technology in production.

But as Kelley points out, that definition is less clear-cut than it might first appear. A 1.0 release, at its core, is simply a promise – a guarantee that future changes won’t break existing code. Beyond that, it says surprisingly little about whether a language is actually ready for long-term use.

Different languages have interpreted that milestone in very different ways. Some locked things down early and avoided making significant changes thereafter. Others shipped 1.0 and continued evolving rapidly under the hood. The version number stayed the same, but the language kept on changing.

Zig has taken an entirely different route.

Deliberately not shipping

What stands out in Kelley’s perspective is that Zig’s missing 1.0 isn’t an oversight or a delay – it’s a deliberate choice.

Rather than rushing to declare stability, the project is optimizing for something else: getting the fundamentals right before locking them in.

That decision becomes easier to understand when you look at how Zig is built and maintained. Unlike many modern language ecosystems, Zig isn’t backed by venture funding or driven by corporate timelines. It’s developed by a small, independent team under a nonprofit foundation, supported largely by individual donors.

That structure removes a common source of pressure for its developers.

There’s no need to hit growth targets, no requirement to ship a milestone release just for optics, and no external force pushing the project toward a premature definition of “done”. The result is a development process that can afford to be patient, and in some cases, even intentionally slow.

But that patience naturally comes with trade-offs.

The cost of waiting

There’s little doubt that a 1.0 release would accelerate Zig’s adoption. Many companies and developers use this label as a gating signal or a synonym for trust.

Kelley acknowledges this openly. When Zig eventually reaches 1.0, adoption will likely jump.

And yet, the project continues to prioritize long-term design over short-term growth.

That tension – between adoption and attention to detail – is where Zig’s approach becomes particularly interesting. Instead of asking “How quickly can we get to 1.0?”, the project is asking a different question:

“What would we regret locking in if we shipped it today?”

That unorthodox yet refreshing framing shifts the goal from speed to permanence.

A different philosophy on progress

Zooming out, this approach isn’t just about version numbers – it reflects a broader philosophy that is visible throughout Zig’s design.

Where some ecosystems embrace a “ship fast, fix later” mentality, Zig is trying to find a middle ground that will enable it to deliver powerful capabilities with minimal complexity, without accumulating long-term design debt.

It’s an approach Kelley describes as aiming to “do more with less”, which translates to finding leverage in simplicity rather than layering on features or abstractions.

That philosophy extends beyond the language itself. It shapes decisions about tooling, dependencies, and even community processes. The central theme is consistency: Avoid unnecessary complexity now, so you don’t have to support it forever.

From that perspective, Zig’s delay in reaching 1.0 isn’t hesitation – it’s restraint.

Rethinking what “ready” means

All of this raises a broader question that goes beyond Zig: What does it actually mean for a technology to be “ready”?

If 1.0 is just a compatibility promise, is it the right signal to rely on? Or has it become a proxy for something more nuanced, like trust, ecosystem maturity, or long-term stability?

Zig’s approach challenges a fundamental premise behind the question. It suggests that readiness might not be a single milestone, but the result of a series of deliberate decisions about what not to finalize too early.

Whether that approach ultimately accelerates or limits adoption remains to be seen.

Watch the full conversation

As interesting as all this has been, it’s just one thread of a much wider discussion. In the full interview, Andrew Kelley dives into topics ranging from Zig’s positioning relative to C and Rust, to its stance on AI-generated contributions, to what the next decade of programming might look like.

If you’re interested in where systems programming – and language design more broadly – might be heading, it’s well worth checking out.


C++ Documentary is Now Available


The Story of C++: The World's Most Consequential Programming Language is now available for free on YouTube.

The post C++ Documentary is Now Available appeared first on Thurrott.com.


MAI-Thinking-1: Microsoft's New Reasoning Model and What It Means for Developers


Microsoft just shipped MAI-Thinking-1, their first in-house reasoning model. If you've been watching the AI space, you know reasoning models — the kind that "think before they answer" — have become a battleground. OpenAI has o3, Anthropic has Claude with extended thinking, Google has Gemini's thinking mode. Now Microsoft is in with their own, and they built it from the ground up rather than licensing or distilling from someone else's model.

Here is what you actually need to know as a developer.

What Is MAI-Thinking-1?

MAI-Thinking-1 is Microsoft's reasoning-focused language model, developed by their internal AI lab (Microsoft AI, or MAI). It is a medium-sized model designed specifically for complex, multi-step tasks — the kind of problems where a model needs to reason through multiple steps before producing an answer, rather than just pattern-matching to a response.

The headline positioning is this: it is a smaller model that punches well above its weight class on software engineering and math benchmarks.

The Architecture: Sparse Mixture of Experts

The model uses a sparse Mixture of Experts (MoE) architecture:

  • 35 billion active parameters at inference time
  • ~1 trillion total parameters across all expert layers

This distinction matters for developers. In a dense model, every parameter fires for every token. In a MoE model, only a subset of "experts" activate per token, so the active compute footprint is much smaller than the total parameter count suggests. The practical result: you get near-frontier quality reasoning at a significantly lower inference cost than a comparable dense model.

Compare that to GPT-4 class models, which unofficial estimates put at roughly 1.8T total parameters (OpenAI has not published a figure), and you start to see why Microsoft is calling this "mid-weight pricing."
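To make the active-versus-total distinction concrete, here is a toy sketch of top-k expert routing. It is not Microsoft's implementation: the expert count, the router scores, the `top_k` value, and the scalar "experts" are all made up for illustration. Only the 35B and ~1T figures come from the numbers above.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k):
    """Pick the top_k highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_scores[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_layer(x, experts, router_scores, top_k):
    """Run only the selected experts and blend their outputs."""
    routes = route_token(router_scores, top_k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]

# Hypothetical setup: 8 tiny "experts", each just scales its input.
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
router_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

output, active = moe_layer(1.0, experts, router_scores, top_k=2)
print(active)  # [4, 1]: only 2 of the 8 experts ran for this token

# Ratio of the article's headline figures (35B active, ~1T total).
active_fraction = 35e9 / 1e12
print(f"{active_fraction:.1%}")  # 3.5%
```

The other six experts are never evaluated for this token, which is where the inference savings come from. In a real model the router is a learned layer and the experts are feed-forward blocks, but the selection logic has the same shape.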

Benchmark Performance

Microsoft reports the following numbers:

Benchmark            MAI-Thinking-1                      Notes
AIME 2025            97.0%                               Advanced math competition
AIME 2026            94.5%                               Most recent math competition
SWE-Bench Pro        Competitive with Claude Opus 4.6    Real-world software engineering tasks
Human side-by-side   Preferred over Claude Sonnet 4.6    Blind evaluation by Surge raters

The SWE-Bench Pro result is worth unpacking. SWE-Bench tests models on real GitHub issues — the model has to read a codebase, understand a bug report, and produce a patch that passes the existing test suite. It is arguably the most developer-relevant benchmark that exists right now. Matching Claude Opus 4.6 on this benchmark while running on far fewer active parameters is a meaningful result.

The human preference eval covered 1,276 tasks across single-turn and multi-turn conversations, judged by professional raters from Surge, and prioritized whether responses actually advanced the user's goals rather than just sounding good.

What Makes It Different From Other Models: Training Philosophy

Microsoft made a deliberate choice that is worth understanding because it affects how the model behaves.

No distillation from third-party models. Most smaller models are trained by learning to imitate a larger, more capable model (this is called distillation or knowledge distillation). MAI-Thinking-1 was trained without doing this. Microsoft argues that distilled models are fundamentally bound to the design choices of their teacher model and struggle to generalize to new situations. Training from scratch on their own data means the model has to genuinely learn reasoning rather than mimicking it.

Clean, licensed training data only. All pre-training data was commercially licensed, and AI-generated content was excluded from pre-training. For enterprises, this matters a lot: it affects copyright exposure and gives Microsoft better ability to explain (and improve) model behavior.

In-house training infrastructure end-to-end. From hardware co-design on Microsoft's own accelerators to the reinforcement learning framework, the entire training stack is built internally. This is what they call the "Hill-Climbing Machine" — a system where every component can be improved independently, so capabilities improve continuously rather than requiring architectural overhauls.

Developer-Relevant Features

Before you think about API calls, here is the feature set:

Context window: 256,000 tokens. That is roughly 600 pages of text. You can fit entire codebases, large contracts, or lengthy research documents in a single context. For agentic coding workflows this is essential.

Function calling / tool use. Supported. If you are building agents that need to call APIs, query databases, or interact with external services, the model can handle structured tool calls in the standard format.

System prompt / developer instructions. The model was trained to follow multi-layer instructions — meaning system prompts, user instructions, and constraints stack and interact predictably rather than the model silently ignoring one in favor of another.

Chat Completions API compatibility. This is significant. The API uses the same interface as the widely adopted OpenAI Chat Completions format. If you already have code that calls Azure OpenAI or any OpenAI-compatible endpoint, migration should require minimal changes — primarily just swapping the model name and endpoint URL.

Enterprise security via Microsoft Foundry. All MAI models come with Microsoft Foundry's compliance stack: data residency controls, audit logging, private networking options. If you are building in a regulated industry, this is the access path that gets you the compliance paperwork you need.
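The "roughly 600 pages" figure for the 256,000-token context window can be sanity-checked with a quick back-of-the-envelope calculation. The ratios below are assumptions, not numbers from Microsoft: about 0.75 English words per token is a common rule of thumb, and 320 words per page is simply the page density that reproduces the 600-page estimate. Real tokenizers and page layouts vary, and code usually tokenizes less efficiently than prose.

```python
CONTEXT_WINDOW = 256_000  # tokens, as stated for MAI-Thinking-1

def estimate_pages(tokens, words_per_token=0.75, words_per_page=320):
    """Rough page count for a token budget (both ratios are assumptions)."""
    words = tokens * words_per_token
    return words / words_per_page

def fits_in_context(prompt_tokens, reserved_for_output=4096,
                    window=CONTEXT_WINDOW):
    """Check that a prompt leaves room for the requested output tokens."""
    return prompt_tokens + reserved_for_output <= window

print(round(estimate_pages(CONTEXT_WINDOW)))  # 600
print(fits_in_context(250_000))               # True
print(fits_in_context(253_000))               # False
```

The second helper reflects a practical point: the window is shared between the prompt and the response, so a prompt that nearly fills it leaves little room for `max_tokens`.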

What Setup Will Look Like (When It's Available)

Since the model is Chat Completions API-compatible, here is what calling it will look like once you have Foundry access. The pattern is essentially identical to calling Azure OpenAI:

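import openai

# Values in angle brackets are placeholders; replace them with your own Foundry details.
client = openai.AzureOpenAI(
    azure_endpoint="https://<your-foundry-endpoint>.azure.com",
    api_version="2024-12-01-preview",  # confirm the required version once Foundry docs are published
    api_key="<your-foundry-api-key>"   # in real code, load this from an environment variable or secret store
)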

response = client.chat.completions.create(
    model="mai-thinking-1",
    messages=[
        {
            "role": "system",
            "content": "You are a senior software engineer. Think step by step."
        },
        {
            "role": "user",
            "content": "Review this function and identify any edge cases: ..."
        }
    ],
    max_tokens=4096
)

print(response.choices[0].message.content)

If you are already on the Azure OpenAI SDK or any OpenAI-compatible client, this is the shape of the migration. The main difference is the endpoint URL and model name — the rest of your code stays the same.

For agentic workflows with tool calling:

tools = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run the test suite and return results",
            "parameters": {
                "type": "object",
                "properties": {
                    "test_path": {
                        "type": "string",
                        "description": "Path to the test file or directory"
                    }
                },
                "required": ["test_path"]
            }
        }
    }
]

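# Reuses `client` from the previous snippet. `messages` is the running
# conversation; the request below is only an example prompt.
messages = [
    {"role": "user", "content": "Run the tests under tests/ and summarize any failures."}
]

response = client.chat.completions.create(
    model="mai-thinking-1",
    messages=messages,
    tools=tools,
    tool_choice="auto"  # let the model decide whether to call a tool
)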
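The request above only advertises the tool. Your code still has to execute whatever call the model returns and send the result back. Below is a minimal sketch of that step, assuming the response follows the standard Chat Completions tool-call shape (an `id`, a function `name`, and `arguments` as a JSON string). The `run_tests` body, the registry, and the sample call are hypothetical stand-ins so the snippet runs without network access.

```python
import json

def run_tests(test_path):
    """Hypothetical stand-in for a real test runner; returns canned results."""
    return {"test_path": test_path, "passed": 12, "failed": 0}

# Maps tool names advertised to the model onto local Python callables.
TOOL_REGISTRY = {"run_tests": run_tests}

def handle_tool_call(tool_call, registry=TOOL_REGISTRY):
    """Execute one tool call (dict form) and build the 'tool' role reply."""
    function = tool_call["function"]
    name = function["name"]
    if name not in registry:
        result = {"error": f"unknown tool: {name}"}
    else:
        try:
            # Arguments arrive as a JSON string, not a dict.
            arguments = json.loads(function["arguments"])
            result = registry[name](**arguments)
        except (json.JSONDecodeError, TypeError) as exc:
            # Report bad arguments back to the model instead of crashing.
            result = {"error": str(exc)}
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Example tool call in the shape found under message.tool_calls in the JSON response.
sample_call = {
    "id": "call_123",
    "type": "function",
    "function": {"name": "run_tests",
                 "arguments": "{\"test_path\": \"tests/unit\"}"},
}

reply = handle_tool_call(sample_call)
print(reply["content"])
```

In a full agent loop you would append the assistant message containing the tool call, then this reply, to `messages` and call the API again so the model can read the results and decide what to do next.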

Where MAI-Thinking-1 Fits in Your Stack

If you are trying to decide whether this model is worth tracking, here is a practical breakdown by use case:

Agentic coding pipelines. This is the primary target use case. The model was trained on deterministic, executable environments with real test suites. It is built for the multi-step loop of reading code, making edits, running tests, and recovering from failures. If you are building AI-powered code review, bug fixing, or code generation pipelines, this is worth evaluating.

Complex reasoning tasks. The AIME scores put it near the top of the field for mathematical and scientific reasoning. If your application involves multi-step problem solving — financial modeling, technical analysis, research summarization with synthesis — a reasoning model like this will outperform instruction-tuned models.

Enterprise document processing. The 256k context window plus the licensing provenance story makes this a credible option for enterprises processing contracts, technical documentation, or large codebases where IP exposure and compliance are real concerns.

High-volume daily workflows. The MoE architecture and mid-weight pricing position this below frontier-cost models. If you have a use case that could benefit from strong reasoning but cannot justify the cost of running a full dense frontier model on every request, this is the price-performance sweet spot Microsoft is targeting.

The Safety Approach (And Why It Matters for Developers)

Microsoft made an interesting engineering decision on safety that is worth understanding.

Rather than treating safety as a post-hoc filter or a separate fine-tuning stage, they trained safety with the same reinforcement learning loop as capability. Unsafe compliance and unnecessary over-refusals are both treated as defects in the same reward model, weighted by potential harm severity.

The practical effect: you should see fewer situations where the model refuses legitimate developer requests (writing code that involves networking, security concepts, system administration) while still declining actually harmful requests. Microsoft explicitly calls unnecessary refusals a failure mode, not a safe default.

For developers, this means less time spent writing system prompts that work around overly cautious models.
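The idea of scoring unsafe compliance and unnecessary refusals as two defects in the same reward can be illustrated with a toy function. This is purely a sketch of the concept as described above, not Microsoft's reward model: the function name, the severity scale, and the fixed refusal penalty are all invented for illustration.

```python
def safety_reward(request_is_harmful, model_complied,
                  harm_severity=0.0, refusal_penalty=1.0):
    """Toy reward: penalize both kinds of mistakes, reward neither extreme.

    harm_severity is a hypothetical score (0 = harmless, larger = worse).
    """
    if request_is_harmful and model_complied:
        # Unsafe compliance: penalty grows with potential harm.
        return -harm_severity
    if not request_is_harmful and not model_complied:
        # Over-refusal of a legitimate request is also a defect.
        return -refusal_penalty
    # Correct behavior: helped with a safe request or declined a harmful one.
    return 0.0

print(safety_reward(False, True))                     # 0.0  (helped, fine)
print(safety_reward(False, False))                    # -1.0 (needless refusal)
print(safety_reward(True, True, harm_severity=10.0))  # -10.0 (unsafe compliance)
print(safety_reward(True, False, harm_severity=10.0)) # 0.0  (correct refusal)
```

The point of the sketch is that "always refuse" is no longer a free strategy: a model that declines a legitimate networking or security question scores worse than one that answers it.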

What to Watch For

A few things to keep an eye on as this moves to public preview:

Pricing. Not yet announced publicly. The "mid-weight" positioning suggests something meaningfully below frontier model pricing, but the actual numbers will determine whether the SWE-Bench Pro performance justifies switching from existing workflows.

Regional availability. Microsoft Foundry supports multi-region deployment, but which specific Azure regions will have MAI-Thinking-1 available at launch will affect latency and data residency requirements for some use cases.

Rate limits and quota. Private previews typically have constrained throughput. Production planning should wait for public preview numbers.

Quick Reference

Model type          Sparse Mixture of Experts (reasoning)
Active parameters   35B
Total parameters    ~1T
Context window      256,000 tokens
API format          Chat Completions (OpenAI-compatible)
Function calling    Yes
Current status      Private preview on Microsoft Foundry
Public access       Coming soon (MAI Playground)
Early access        Apply via Microsoft Foundry signup form



Burning Out


I have had several conversations lately about burnout. I have had three pretty high-profile burnouts / meltdowns on API Evangelist, beginning with the summer I spent with Isaiah, which was different from the other two, which resulted from me working with StreamData and Postman. Both of the conversations I had this week about burnout emphasized preventing the fourth one, which I actually don’t agree with all that much. I think the damage from burnout can be minimized, and I can definitely make better decisions that make burnouts less frequent, but honestly I don’t think they are avoidable, or that they are my fault as an individual.

The speed at which my brain moves is fast. I work hard. I like it. I need it. I really enjoy working on large complex problems. I need to work and stay busy. First I need to make a living and pay the bills, but I also need to keep my brain working, otherwise it can be difficult to maintain balance. Me working a lot of hours, or producing lots of work isn’t the problem. I know when I need to put work down. I know when I am reaching the point of exhaustion and when burnout is accumulating. I am very skilled in knowing when things aren’t sustainable, and I take breaks, step away, and I rest.

Where I have burned out is when things outside of my control shift and change. In 2016 with the summer away, it was life getting in the way, and I stepped away to help Isaiah. In 2019 I burned out, and put down API Evangelist, because I had worked myself to a point of exhaustion and the two companies I was working with, Streamdata and Axway, were playing games and moving the ground beneath my feet. The same happened at Postman. This is when I burn out. This is when things become untenable. This is when I crash. When the game changes beneath my feet and I do not have control.

While I am working to have more control in my work at Naftiko, I still operate in a capitalist market where I don’t have control over everything. I am a passionate individual and hard worker. This is always a recipe for burnout. I just see it as the way things are in a capitalist system, where value is perpetually being extracted from hard-working people like me. I continue to steer myself towards situations where I have more control to minimize the extraction of value from my hard work. It is why I did API Evangelist in the first place. It is why I started Naftiko. To have more control. To hopefully minimize the ground moving under my feet.

I also think the market is extremely unpredictable and volatile today. I think Trump has made this 100x worse. This is why I invest in open storytelling, standards, and tooling, as I feel these are stabilizing nutrients in a very extractive market. I tell stories to minimize burnout. I invest in standards to minimize burnout. I invest in tools to minimize burnout. I am doing all the work I can to minimize the burnout, and me driving Kin Lane into the ditch. I am not going to slow down in my work, ideas, and storytelling. I should be able to work at the speed I need. I just need to minimize the obstacles in my way, but more importantly the ground moving underneath my feet, by working with people who are straightforward and honest in how they work, and will speak truth to what is going on rather than playing games.


