Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Meta AI gets two new models as Meta releases Llama 4

1 Share

Meta has announced the release of Llama 4, its newest collection of AI models that now power Meta AI on the web and in WhatsApp, Messenger, and Instagram Direct. The two models, also available to download from Meta or Hugging Face now, are Llama 4 Scout, a small model capable of “fitting in a single Nvidia H100 GPU,” and Llama 4 Maverick, which is more akin to GPT-4o and Gemini 2.0 Flash. And the company says it’s in the process of training Llama 4 Behemoth, which Meta CEO Mark Zuckerberg says on Instagram is “already the highest performing base model in the world.”

According to Meta, Scout has a 10-million-token context window — the working memory of an AI model — and beats Google’s Gemma 3 and Gemini 2.0 Flash-Lite models, as well as the open-source Mistral 3.1, “across a broad range of widely reported benchmarks,” while still “fitting in a single Nvidia H100 GPU.” It makes similar claims about its larger Maverick model’s performance versus OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash, and says its results are comparable to DeepSeek-V3 in coding and reasoning tasks using “less than half the active parameters,” or the variables that guide AI models’ behavior.

Visual comparison of model specs.

Meanwhile, Llama 4 Behemoth has 288 billion active parameters with 2 trillion parameters in total. The company again says Behemoth can outperform its competitors, in this case GPT-4.5 and Claude Sonnet 3.7, “on several STEM benchmarks.”

For Llama 4, Meta says it switched to a “mixture of experts” (MoE) architecture, an approach that conserves resources by using only the parts of a model that are needed for a given task. The company plans to discuss future plans for AI models and products at LlamaCon, which is taking place on April 29th.

As with its past models, Meta calls the Llama 4 collection “open-source,” although it has been criticized for its licenses’ less-than-open requirements. For instance, the Llama 4 license requires commercial entities with more than 700 million monthly active users to request a license from Meta before using its models, which the Open Source Initiative wrote in 2023 takes it “out of the category of ‘Open Source.’”

Read the whole story
alvinashcraft
2 hours ago
reply
Pennsylvania, USA
Share this story
Delete

Microsoft has created an AI-generated version of Quake

1 Share

Microsoft unveiled its Xbox AI era earlier this year with a new Muse AI model that can generate gameplay. While it looked like Muse was still an early Microsoft Research project, the Xbox maker is now allowing Copilot users to try out Muse through an AI-generated version of Quake II.

The tech demo is part of Microsoft’s Copilot for Gaming push, and features an AI-generated replica of Quake II that is playable in a browser. The Quake II level is very basic and includes blurry enemies and interactions, and Microsoft is limiting the amount of time you can even play this tech demo.

While Microsoft originally demonstrated its Muse AI model at 10fps and a 300 x 180 resolution, this latest demo runs at a playable frame rate and at a slightly higher resolution of 640 x 360. It’s still a very limited experience, though, and more of a hint at what might be possible in the future.

Microsoft is still positioning Muse as an AI model that can help game developers prototype games. When Muse was unveiled in February, Microsoft also mentioned it was exploring how this AI model could help improve classic games, such as Quake II, and bring them to modern hardware.

“You could imagine a world where from gameplay data and video that a model could learn old games and really make them portable to any platform where these models could run,” said Microsoft Gaming CEO Phil Spencer in February. “We’ve talked about game preservation as an activity for us, and these models and their ability to learn completely how a game plays without the necessity of the original engine running on the original hardware opens up a ton of opportunity.”

It’s clear that Microsoft is now training Muse on more games than just Bleeding Edge, and it’s likely we’ll see more short interactive AI game experiences in Copilot Labs soon. Microsoft is also working on turning Copilot into a coach for games, allowing the AI assistant to see what you’re playing and help with tips and guides. Part of that experience will be available to Windows Insiders through Copilot Vision soon.


GeekWire Podcast with Microsoft CEO Satya Nadella on the company’s 50th anniversary

1 Share
Microsoft CEO Satya Nadella at the company’s 50th anniversary event Friday. (GeekWire Photo / Kevin Lisota)

On this episode of the GeekWire Podcast, we talk with Microsoft CEO Satya Nadella about the company’s 50th anniversary, and where it’s headed from here.

Plus, highlights from Microsoft’s 50th anniversary event in Redmond, which featured a rare joint appearance by Nadella alongside former leaders Bill Gates and Steve Ballmer.

The day also reflected Microsoft’s role in an increasingly complex global landscape, with a CNBC interview focusing in part on the impact of tariffs on the company and the global economy, and a protest outside the event condemning the use of the company’s technologies to support Israel in the ongoing war in Gaza.

Subscribe to GeekWire in Apple Podcasts, Spotify, or wherever you listen.

Related coverage:

Microsoft@50 is an independent GeekWire editorial project supported by Accenture.

More: Microsoft@50


Claude Code and the Art of Test-Driven Development

1 Share

While I mostly like code completion, LLM assistants have given me quite a few problems in Visual Studio Code. When I was switching between large language model (LLM) extensions for reviews, VS Code almost had a meltdown. So while I wasn’t sure about a code assistant that doesn’t run inside the IDE, at least it wouldn’t have to play nice with VS Code.

Claude Code from Anthropic describes itself as an “agentic coding tool that lives in your terminal.” It reads your project and “streamlines your workflow.” I don’t really know what that last bit implies, but the rest of the buzzwords are fine. Claude Code is described as an evolving beta research preview, which could frankly describe every generative AI (GenAI) product out there. I’m aware that this is possibly the planned path to “vibe coding,” since it offers low- or no-code interaction for the developer. But I’ll ignore that for now.

I did want to see if it could do some test-driven development (TDD), however. Rather amusingly, there has been some pushback against TDD partly because LLMs struggle with it. LLMs are pretty good at generating passing tests after the code is done; unfortunately, writing the tests afterwards means you are just marking your own homework. But I have been told that LLMs can work with TDD.

Installing Claude Code

It needs Node.js 18+, so I open up my terminal:
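The install goes through npm; the commands below assume a global install is acceptable on your machine:

```shell
# Claude Code requires Node.js 18 or newer.
node --version

# Install the CLI globally via npm.
npm install -g @anthropic-ai/claude-code
```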

I’m good with that, so let’s put some coins in the slot. If you go to the Anthropic console, you can register and buy some tokens. It isn’t a massive imposition, but given I’m trying something from the lab, they could have gone a bit gentler:

The model wants to look at your project code, so I’ll start a console project with some tests and see if I can persuade it to do some TDD with me. If you look at the post I did on Codium, you will see some of the same techniques and code applied.

I open VS Code from a new directory, then use the command palette to create a new project as a Console App. I also want to use the NUnit test framework, so I add the configuration directly into my csproj.
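For reference, the NUnit wiring in the csproj looks something like this (the package versions are illustrative, not necessarily the exact ones I used):

```
<ItemGroup>
  <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.8.0" />
  <PackageReference Include="NUnit" Version="3.14.0" />
  <PackageReference Include="NUnit3TestAdapter" Version="4.5.0" />
</ItemGroup>
```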

I then define a minimal BankAccount class:

namespace BankAccount
{
    public class SavingsAccount
    {

    }
    class Program
    {
        static void Main(string[] args)
        {
            SavingsAccount account = new SavingsAccount(1000);
            account.Deposit(500);
            account.Withdraw(200);
            Console.WriteLine($"Current balance: {account.GetBalance()}");
        }
    }
}


Of course, the above won’t run — which is the point. I add in the basic test class just to make sure I have everything set up correctly:

using BankAccount;
using NUnit.Framework;

public class Tests
{
    [SetUp]
    public void Setup() { }

    [Test]
    public void Test1()
    {
        Assert.Pass();
    }
}


OK, so now we can do our senior engineer bit and define the tests so that our junior (Claude) can write the code. Here are the initial tests:

using System;
using BankAccount;
using NUnit.Framework;

public class Tests
{
    [SetUp]
    public void Setup()
    {
    }

    [Test]
    public void test_deposit()
    {
        SavingsAccount ba = new SavingsAccount();
        ba.Deposit(20);
        Assert.AreEqual("$20", ba.ShowBalance());
    }

    [Test]
    public void test_withdraw_more_than_balance()
    {
        SavingsAccount ba = new SavingsAccount();
        Assert.Throws<Exception>(() => ba.Withdraw(25));
    }
}


Now let’s complete the installation of Claude and put it to work. We move into the work directory and turn on Claude:

Which results in:

And yes, that is some nice ASCII art. Of course, I set this up with the Anthropic Console above.

We are then pushed out and into the browser. And I only just got into the terminal!

Now I’m in. Remember, this is a beta research preview.

To start, I can create a claude.md file that I can instruct Claude with. Hopefully, I can tell it here that we are doing TDD. I added a line into the generated file describing my intentions, but I have no idea whether it was effective.
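A TDD instruction in claude.md might look something like this (illustrative wording, not a quote of my actual file):

```
# Project notes for Claude

We are doing test-driven development. The tests are the specification:
write the minimum code needed to make the failing tests pass, and never
edit the tests themselves.
```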

After I requested it to make the tests pass, Claude wrote the code and we got passing tests:

Just to confirm, the tests did pass:

The generated code is fine. So we are doing TDD! And I would say it feels better having this happen outside the IDE.
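The screenshots don't reproduce well here, but an implementation along these lines would satisfy both tests (this is a reconstruction of the shape of the generated code, not Claude's verbatim output):

```
using System;

namespace BankAccount
{
    public class SavingsAccount
    {
        private decimal balance;

        public SavingsAccount() { }

        public SavingsAccount(decimal initialBalance)
        {
            balance = initialBalance;
        }

        public void Deposit(decimal amount)
        {
            balance += amount;
        }

        public void Withdraw(decimal amount)
        {
            // Withdrawing more than the balance throws, as the test demands.
            if (amount > balance)
                throw new Exception("Insufficient funds");
            balance -= amount;
        }

        public decimal GetBalance()
        {
            return balance;
        }

        public string ShowBalance()
        {
            return $"${balance}";
        }
    }
}
```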

Of course, this banking code is both basic and probably simple to find all over the web. So I will introduce the idea of daily interest via the tests:

[Test]
public void test_daily_interest_rate()
{
    SavingsAccount ba = new SavingsAccount();
    ba.SetDailyInterestRate(0.05m);
    ba.Deposit(100);
    ba.ApplyDailyInterest();
    Assert.AreEqual("$100.05", ba.ShowBalance());
}


I save this and again ask Claude to write the missing methods. It successfully suggests the new code needed:
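Reconstructing from memory rather than quoting the tool, the suggested additions looked roughly like this; note the multiplication that will turn out to be the bug:

```
private decimal dailyInterestRate;

public void SetDailyInterestRate(decimal rate)
{
    dailyInterestRate = rate;
}

public void ApplyDailyInterest()
{
    // Treats 0.05 as a multiplier (5%), not as a percentage (0.05%).
    balance += balance * dailyInterestRate;
}
```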

This is good. Obviously the savings account should come with a default daily rate, but in terms of agile coding, this progress is fine. Now, if you look closely, you will see a bug. I’m thinking that a daily interest rate is a percentage, but the code simply multiplies the balance by 0.05, which represents 5%. That would be a little high for daily interest. The bug is perfectly visible when we run the tests, though:

Think of this as TDD with a pair. I’ll tell Claude that the rate is supposed to be a percentage:

There we go. I’ll ask Claude to make the change. All good, but now we have a different bug when we run the tests:

This is just a format issue. We only want the precision of the balance to show two places after the decimal point to represent cents.

Claude understands this and makes the fix:
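Putting both corrections together, the relevant methods ended up in roughly this shape (again a reconstruction, not the exact generated code):

```
public void ApplyDailyInterest()
{
    // The rate is a percentage, so divide by 100: 0.05 means 0.05%.
    balance += balance * dailyInterestRate / 100m;
}

public string ShowBalance()
{
    // Show exactly two decimal places to represent cents.
    return $"${balance:0.00}";
}
```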

And now I have to fix my own test, which didn’t insist on the correct precision. With that done, we are finally good:

Conclusion

So I’ve proved that Claude Code can do TDD in principle. This makes me happy, as it leaves me with reasonably safe code without having to hope that my LLM pair partner understands everything. The LLM doesn’t technically understand anything; that isn’t a tool’s job. But it can follow instructions. It’s important to note that I didn’t let it run the tests — there is no need for it to loop around itself when a human can pick out a direction.

The discipline of TDD works quite well with LLM assistance, as the human developer can set the quality bar and define the design. In fact, this is often how senior engineers have worked in mixed teams, so it isn’t even a new type of relationship. I can only hope that future LLM assistants push further with TDD and close the trust gap they currently suffer from.

The post Claude Code and the Art of Test-Driven Development appeared first on The New Stack.


DOGE Is Planning a Hackathon at the IRS. It Wants Easier Access to Taxpayer Data

1 Share
DOGE operatives have repeatedly referred to the software company Palantir as a possible partner in creating a “mega API” at the IRS, sources tell WIRED.

Llama 4 is now available in Azure Databricks

1 Share

We are excited to announce the availability of Meta's Llama 4 in Azure Databricks. 

As you know, enterprises all over the world already use Llama models in Azure Databricks to power enterprise AI agents, workflows, and applications. Now, with Llama 4 on Azure Databricks, you can get higher quality, faster inference, and lower cost than with previous models.

Llama 4 Maverick, the highest-quality and largest Llama model from today's announcement, is built for developers building the next generation of AI products that combine multilingual fluency, precise image understanding, and security. With Maverick on Azure Databricks, you can:

  • Build domain-specific AI agents with your data
  • Run scalable inference with your data pipeline
  • Fine-tune for accuracy and alignment
  • Govern AI usage with Mosaic AI Gateway

The Azure Databricks Data Intelligence Platform makes it easy for you to securely connect Llama 4 to your enterprise data using Unity Catalog-governed tools to build agents with contextual awareness.

Enterprise data needs enterprise scale, whether that means summarizing documents or analyzing support tickets, but without the infrastructure overhead. With Azure Databricks workflows and Llama 4, you can use SQL or Python to run LLMs at scale without managing that infrastructure yourself.
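As a sketch of what that looks like in practice, Databricks SQL exposes an ai_query function for calling a served model from a query. The endpoint name below is an assumption for illustration; check your workspace for the actual serving endpoint name:

```sql
-- Summarize each support ticket with Llama 4 Maverick via a model serving endpoint.
SELECT
  ticket_id,
  ai_query(
    'databricks-llama-4-maverick',  -- serving endpoint name (assumed)
    CONCAT('Summarize this support ticket in one sentence: ', ticket_text)
  ) AS summary
FROM support_tickets;
```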

You can tune Llama 4 to your custom use case for accuracy and alignment such as assistant behavior or summarization. 

All of this comes with built-in security controls and compliant model usage via Mosaic AI Gateway on Azure Databricks, with PII detection, logging, and policy guardrails.


Llama 4 is available now in Azure Databricks, with more models becoming available in phases. Llama 4 Scout is coming soon, and you'll be able to pick the model that fits your workload best. Learn more about Llama 4 and supported models in Azure Databricks and get started today.