Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

DuckDuckGo Makes AI-Free Search Easier To Set as Default

1 Share

DuckDuckGo’s new No AI search extensions make AI-free search easier to set as a browser default, giving users a persistent alternative to AI-generated search results.

The post DuckDuckGo Makes AI-Free Search Easier To Set as Default appeared first on TechRepublic.

Read the whole story
alvinashcraft
3 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Windows is back on the Microsoft menu

I can't remember the last time Microsoft kicked off a Build keynote with Windows front and center, but that's exactly what CEO Satya Nadella did this week. Nadella didn't address the issues Microsoft is trying to fix in Windows 11 but chose to woo the audience with Microsoft's slick Surface RTX Spark Dev Kit instead, calling it a "dream machine."

Nadella unveiled the new Surface hardware just days after Nvidia officially returned to Windows on Arm with its new RTX Spark chips. Both companies are talking up these chips as some kind of new beginning for PCs, and it's clear that RTX Spark will drive local AI workloads in a way that Microsoft's …

Read the full story at The Verge.

The Tidy House

DJ Patil has spent the past several months on a listening tour. Wherever he travels, he finds a local university, pings faculty and students and anyone else who wants to show up, and runs an AMA. He’s heard from grad students who can’t get callbacks, hospital administrators dealing with federal policy changes that land like a change in the laws of physics, and executives who can’t forecast their AI spending past six months. He’s trying to synthesize all of it and help reframe the wider conversation.

DJ co-coined the term “data scientist,” served as America’s first chief data scientist under President Obama, and was chief scientist at LinkedIn. He’s a longtime O’Reilly author, going back to Building Data Science Teams and Ethics and Data Science, and he’s on the founding team at Devoted Health, where he’s spent the past decade building the kind of data infrastructure most organizations are still struggling to put in place. He calls it “the tidy house.” He sat down with me to talk about the gap between what the technology can do and what most institutions can actually absorb.

The broken promise

What DJ keeps hearing on his tour is anger and angst. One word that keeps coming up is “terrified.” Workers are worried about layoffs. Meanwhile, students, including those from top-tier universities like MIT, Carnegie Mellon, and UC Berkeley, have been applying to 300+ internships and getting fewer than 10 callbacks. Many had zero offers going into the summer. And the industry’s response has been to tell them to learn more AI and burn more tokens. What it comes down to, DJ explained, is “effectively a broken promise”:

We said, “Go to college, get these things, you’re going to get an internship, you’re going to get job training, you’re going to pay off your student loans, and then you’re going to have all the other things that are part of that social contract.”

What the students are feeling for the first time [is]. . .“Wait, if I can’t get this internship, . . .I’m fundamentally off trajectory from getting this job.” And it doesn’t have to be a technical person. It could be someone that is in marketing. It could be someone that’s in the liberal arts. It could be a researcher. . . .There are plenty of students that I have talked to who are supposed to be going to a doctoral PhD program or a medical school or something like that. The slots aren’t there because of the overall budget impacts. And so whether you call it AI impact or economic reframing, the thing is broken.

This is where I’ve been trying to build a counter narrative. The story coming from the AI labs is destructive: “We’re going to put all of you out of work, and we’ll figure out the rest once the intelligence explosion arrives.” That’s bad PR for AI, but it’s also magical thinking. An economy is a circulatory system. You can’t put your customers out of work and at the same time expect that the economy will hum along as usual. A catastrophic recession could easily interrupt the funding that keeps AI on its growth path and the concentration of value that they assume will fund universal basic income and an expanded safety net.

That’s why I’m a fan of mechanism design: start from the outcome you want, then figure out the rules of the game that produces it. Right now, they’ve designed a game that concentrates all the value in the hands of AI first movers. They could be designing a game that generates value throughout the economy. But they aren’t building affordances for that.

YouTube ContentID is a good example of mechanism design leading to economic value creation. When unauthorized music use by online video creators triggered a backlash from rights holders, YouTube replied to the takedown notices with a way for both the people who owned the music and the people who wanted to use it to get paid. A whole creator economy came out of that design choice. The labs have the same opportunity in front of them and mostly aren’t taking it.

DJ had one concrete mechanism in mind:

Imagine OpenAI and Anthropic and Microsoft. . .get together and [say], “If you’re building something for your local community, we’ll fully subsidize the token cost for some period of time.”. . .We’re talking about marginal token usage relatively on the spectrum of things, but the potential innovation and use of AI to help local communities could be astounding. You’re not putting anybody out of a job with that. . . .You’re filling the holes that already exist in the system.

The OpenAI Foundation just announced it will put $1 billion into public-benefit projects this year, including $250 million aimed at building economic futures. It’s a start. But it mostly seems designed to ameliorate the bad effects of AI rather than to forestall them by building a more inclusive AI future. If the labs start investing in the human-plus-AI economy rather than just studying the job losses, the payoff to local communities could be real.

A makerspace to bridge the internship gap

DJ’s plan is to build a bridge. He’s launching a program, basically a makerspace, for students who don’t have an internship this summer. Over two four-week sprints, an initial cohort will get mentors, speakers, and the space to explore whatever they’re interested in. It doesn’t have to be AI. Whether they’re doing investigative journalism, screenwriting, or building civic tech, participants will get some experience with current tools and produce a tangible asset they can use to prove what they know. As I told DJ in our conversation, I think he’s really on to something, and I’d love O’Reilly to be part of what he’s building.

There’s a kind of person who has always been at the center of the O’Reilly community and never waited for a job description. High school dropouts who started companies. People who looked around, found something that needed doing, and did it. DJ is one of them. He’s a community college kid who learned from a good local library, from the books with the “funny animals” on the cover, and from open source. That path is still open. The early O’Reilly business came out of exactly this instinct. We were a tech-writing consulting shop, and when we ran out of paid work, we wrote manuals that didn’t exist yet but that we thought were needed. Later, when there were big conferences for every corporate technology and none for open source, we ran the first one for Perl. Conferences became a whole new business for us. You look for the gap and you fill it.

DJ pushes the same idea down to the level of the neighborhood:

If you want to feel rewarded, go fix something in your neighborhood. Go help out the food pantry. Go help out the local foster child care system. Go help out. . .parks and rec. Use those skills to go do something, and then you’re going to see. . .people respond in a different way. . . .The target-rich area for problems is massive. You just have to look.

I’ve never bought the jobless-future story. Back when I wrote WTF? in 2016, I pointed out that there is so much around us that needs to be made better. The constraint has never been a shortage of problems. AI gives us new tools for solving them. It should be a way to put people to work, not out of work.

The organization is the bottleneck

DJ has also been visiting hospitals and clinics and talking to CIOs and CTOs as part of the tour, and what he’s seeing is alarming.

The federal changes to Medicaid and the Affordable Care Act are landing on systems that were already near collapse. Hospitals that depended on outpatient procedures like colonoscopies for margin are watching volumes drop 20% to 30% because people can’t afford insurance. Some are running $1 million a day behind, a $300 to $400 million shortfall for the year.

At the same time, AI companies are telling those same hospitals to move into the new world, and partly because of the “you will soon be replaced” narrative from the AI labs, labor is responding the way the Kaiser nurses did in California, where any use of AI was off the table as a bargaining condition. As DJ pointed out, we can’t afford to disregard AI when it has the potential to automate the most painful parts of healthcare workers’ jobs and let them “do the job they’re trained for” without the administrative burden. Businesses need to change not just their narrative but their strategy. They need to be saying, “We’re going to use AI to help you do more for our customers. We’re going to make your job more human and let the machines deal with the BS.”

The constraint here is organizational capacity, not technology. The Silicon Valley default assumes that incumbents will just get disrupted by startups, the way media was by Google and Meta and retail was by Amazon. There’s some truth to that. But disruption takes much longer than people think, and in a domain this central, the delay means real harm to real people. Healthcare is a third of the economy. You can’t just let it fail and rebuild it fresh while people depend on it for survival.

There’s a version of this where the efficiencies AI creates get plowed back into better patient care. There’s also the version that’s actually happening in most places, where private equity captures the savings as profit. The difference is institutional design, and that’s where reform isn’t happening. I saw this directly with a Code for America project called Clear My Record. A California initiative had turned a number of petty crimes into misdemeanors, but very few people were petitioning to have their status changed. We started using software to streamline an absurdly convoluted criminal record expungement process, but then we asked ourselves why we were helping people fill out forms that shouldn’t exist. The law had already changed the record. The process should have been a database update, not something that required a petition to the court. That’s the kind of problem AI was born to solve. It can help us refactor old stuck processes and move to something way better.

Done right, DOGE could have been an opportunity to carry out that kind of real institutional change at scale. Instead it became a wrecking ball, and it’s given the whole idea of institutional reform a bad name.

Data infrastructure is the competitive advantage

DJ’s term for the alternative he’s living with at Devoted is “the tidy house.” He built the boring infrastructure years before LLMs existed, and that’s why the company could move the moment AI arrived.

One of the ways we’ve tried to make this work is fundamentally still data 101, unified data environments, data flows that are clean, that have a lot of organization. . . .Because we invested so heavily in that infrastructure, the dumb, boring, painful parts of making sure you’ve got a really great data warehouse, great data engineering pipes, all of the metadata that goes with it, when AI shows up, you get to use it right away. Now you get to focus on the orchestration, the harness, all those pieces.

While other organizations are reconstructing ETL inside context windows and paying for it in GPU costs, Devoted’s team gets to work on the actual clinical problems. As DJ put it, transforming a healthcare system is “like walking and chewing gum while balancing bowling balls on your head and on a unicycle,” with the laws of physics changing on you the whole time. The organizations that come through it will be the ones that did the unglamorous work of keeping clean, flowing data with its lineage and metadata intact. The ones that didn’t will keep paying to reconstruct context they should have had all along.

The pharmacists who built their own agents

The tidy house pays off when you put the tools in the hands of people who already know the domain. At Devoted, clinicians are building things without waiting for a product manager to learn the problem first. These frontline workers have already spent decades understanding it.

A pharmacist. . .says, “Hey, you know what? I’m really worried when I see these kinds of drugs show up together. That’s not a good thing. . . .Why don’t I have an agent that alerts me every time this happens? I should just automate it because maybe one of the patients gets prescribed something by another provider and we don’t see it.” So the pharmacist [says,]. . .“I’m just going to build that agent.” Now I’ve got an agent always looking for bad drug interactions. And another pharmacist says, “I’ve got my own version of that.” . . .So I say, “Hey, agent, I want you to go ask all the pharmacists that we have a quick survey of what might be happening. . . .What are the universe of things that we should be watching out for?” Now I’ve got a robust medical layer. . .looking out and protecting all of our members from bad drug interactions.

One clinician automating the thing they’d always done by hand expands to cover an entire membership of patients. Having the right infrastructure makes it possible to act on decades of accumulated judgment at the scale of the whole system.
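
The pattern in that story, individual watch lists merged into one shared check, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the drug names are placeholders, the watch lists and the `flagged_pairs` helper are hypothetical, and nothing here reflects Devoted's actual systems or any real clinical guidance.

```python
from itertools import combinations

# Hypothetical watch lists from two pharmacists. The names are placeholders,
# not real drugs, and none of this is medical guidance.
pharmacist_1 = {frozenset({"drug_a", "drug_b"})}
pharmacist_2 = {frozenset({"drug_c", "drug_d"}), frozenset({"drug_a", "drug_b"})}

# Merging the individual lists gives one shared set of pairs to watch.
WATCH_PAIRS = pharmacist_1 | pharmacist_2


def flagged_pairs(active_prescriptions):
    """Return every watched pair that appears together in one patient's list."""
    meds = sorted(set(active_prescriptions))
    return [pair for pair in combinations(meds, 2) if frozenset(pair) in WATCH_PAIRS]


# Made-up patient records, keyed by a placeholder ID.
patients = {"p1": ["drug_a", "drug_x", "drug_b"], "p2": ["drug_c"]}

alerts = {}
for patient_id, meds in patients.items():
    hits = flagged_pairs(meds)
    if hits:
        alerts[patient_id] = hits

print(alerts)
```

Storing each pair as a `frozenset` makes the check independent of the order in which prescriptions arrive, which matters when they come from different providers.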

The histogram is still the most powerful product

You don’t need exotic tooling to get value out of data, and DJ has a way of puncturing the assumption that you do.

Oftentimes, I tell people, the most powerful data product you can build is still a histogram. Just give me a distribution of what’s going on. . . .AI gives us a tremendous opportunity to let people [access this data quickly], but we’ve got to figure out the guardrails, so people don’t ask [questions] or get answers. . .[without realizing] that there’s a flaw in how they’re asking it.

We’ve been in this loop since the beginning of the data movement, DJ explained. The stewards of the data warehouse stand at the gate and say, “You shall not pass!” Then democratization breaks it open, and the gatekeepers reconstitute themselves in the next era. Hadoop did it last time. LLMs are doing it now, and the temptation to insist that only experts can use the tools correctly is as strong as it’s ever been. You do need ways to catch errors. But the goal should always be access.
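
To make the histogram point concrete, here is a minimal sketch that needs nothing beyond the Python standard library. The wait-time values, the bin width, and the function names are all made up for illustration.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Group values into fixed-width bins and return {bin_start: count}."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def render(hist, bin_width):
    """Draw one row of '#' marks per non-empty bin."""
    rows = []
    for start, count in hist.items():
        label = f"{start}-{start + bin_width - 1}"
        rows.append(f"{label:>7} {'#' * count}")
    return "\n".join(rows)


# Made-up wait times in minutes, purely for illustration.
wait_minutes = [3, 7, 12, 14, 15, 18, 21, 22, 22, 25, 41, 95]

hist = text_histogram(wait_minutes, 10)
print(render(hist, 10))
```

Even this crude view shows a long tail (the lone value in the 90s) that a single average would hide. That is the kind of flaw in a question that a guardrail should surface.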

The real opportunity is in the layers above AI models

That’s a new discipline forming inside computer science. We are increasingly having to engineer the trade-offs between conventional software and LLMs, when to reach for a local or open weight model, and what inference actually costs against what it returns.

Getting that right requires an expanded view of what economists call mechanism design. While this isn’t how economists talk about it, many advances in technology are really a form of mechanism design: redesigning the rules of a game to get better outcomes. Pay-per-click advertising started as a crude auction that sold to the highest bidder, and then Google refined it into something that worked. Rob McCool wired a web server to a database with CGI and ushered in a decade of invention of new mechanisms for data-driven websites. Or take Apache Kafka, which DJ reminded us began as a project to help LinkedIn rein in its Splunk bill and only later became the foundation for a company and an ecosystem.

We’re at the front of an architectural innovation cycle now, and the biggest opportunities are probably not in the models themselves but in the layers above them. That’s also where a renaissance of open source for the AI era could happen.

DJ and I are both, as he says, “this giant human LLM, summarizing and distilling all the things we’re hearing” from a lot of people. What we’re hearing is that the technology is mostly ready, but our institutions are not. What’s lagging is the organizational and economic infrastructure that lets universities, hospitals, data teams, and the labs themselves actually deploy what’s been built.

It’s time to get busy!

On June 10, Harper Reed, cofounder of 2389 Research, will join me to talk about why the future of software depends on creativity, serendipity, and building weird stuff. And on July 9, Trail of Bits cofounder and CEO Dan Guido will stop by to share his playbook for going AI native. You can register to attend them live here. You can also follow Live with Tim O’Reilly on YouTube, Spotify, Apple, or wherever you get your podcasts.



GitHub Universe is back: All together now, in the agentic era

If you’ve been following all the AI agent conversations and wondering what’s useful versus what’s just noise, you’re not alone. There are ideas everywhere. What’s challenging is finding the time and a practical path from cool demos to workflows that make your day easier. GitHub Universe bridges that gap.

Universe is our flagship event for developers and the teams who support them—builders, maintainers, security practitioners, technical leaders, and partners—coming together for two days of learning, product exploration, and collaboration. One reason to come to Universe is the packed agenda. But perhaps the most crucial reason is the energy: the magic of what happens when you’re in the same room with people who speak your language and have solved (or are currently solving) the same problems as you.

Universe 2026: All together now, in the agentic era

Software development has always been deeply collaborative. Today, that collaboration goes beyond just people, extending to tools, integrations, and agents in one unified workflow.

GitHub Universe is where that workflow clicks into place: where builders become orchestrators and ideas shaping the industry show up in unexpected places.

Throughout the two days, you’ll attend exciting keynotes, panels, and sessions. Though some of the most valuable moments may just happen in between: a hallway conversation that saves you a week of trial and error, a live demo that sparks inspiration, a workshop where you get time with a workflow you can apply to a project, and a quick chat over really good donuts that turns into a future collaboration.

You’ll leave with practical examples, new approaches, and friends you can follow up with when you’re back in the day-to-day.

What’s new this year

We’re evolving the Universe experience based on what attendees loved last year, making it easier to learn, connect, and take action.

  • A new format for Ship & Tell sessions: A fast-paced lightning talk experience where the developer community shares what they’ve built, with time for Q&A.
  • Speaker After Parties: Deeper conversations in GitHub Central, where you can ask about the work behind the talks.
  • Discussions Lounge (powered by Braindate): Attendees can suggest topics and lead small-group discussions for an easy way to tap into the collective knowledge in the room.
  • The Open Source Zone is now The Source: A bigger open source presence with more projects, and better ways to meet the people behind them.

Look back at Universe

Want a feel for the in-person energy? Last year, Universe brought the fun in all directions: giant inflatable flowers, robot cotton-candy machines, hackable badges, and a Makerspace where you could build your own Octolamp.

And that was only last year. Imagine what we have planned for 2026.

Ready to join us?

Super Early Bird passes are available now at our best price of the year. You can even bring your whole team: Save an extra 20% when you buy four or more passes.

Register before prices go up on July 9.

The post GitHub Universe is back: All together now, in the agentic era appeared first on The GitHub Blog.

Cursor cuts prices and adds enterprise spend controls amid “tokenomics” reckoning

If there’s one big takeaway from the AI coding space this week, it’s that the era of flat-rate, all-you-can-code pricing is coming to an end — and the bill is arriving faster than some might have anticipated.

The clearest illustration came from GitHub, which retired Copilot’s fixed subscription model in favor of token-based billing, tying costs directly to consumption. The backlash, which had been brewing since the announcement in April, was real. Some subscribers reported projected monthly bills jumping tenfold overnight, with others characterizing the change as a bait-and-switch.

Then on Wednesday, the Linux Foundation announced plans for the Tokenomics Foundation, a new industry body backed by the likes of Google, Microsoft, Salesforce, JPMorgan Chase, and others, with a mandate to build open standards and frameworks around AI token production, consumption, and monetization — an acknowledgment that enterprises currently have no consistent, vendor-neutral way to measure or control what they owe.

Bringing visibility and control to the enterprise

Cursor, for its part, has clearly been paying attention. On Monday, the AI coding agent company restructured pricing for its Teams plan, cutting annual seat costs by 20% to $32 per user per month, while introducing a new Premium tier at $120 per month, with the promise of five times the usage of the standard seat at three times the price — explicitly targeting power users whose consumption had become hard to forecast.

Alongside this, Cursor added a dedicated usage pool for its own first-party Composer model, separate from the allowance for third-party models from the likes of Anthropic and OpenAI.

The update also includes a rebuilt spend alert feature, letting admins configure alerts based on dollar thresholds — per member or team-wide — delivered via Slack or email before an unexpected charge lands.
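
As a rough sketch of how dollar-threshold alerts of this kind could work, the Python below checks per-member and team-wide limits against current spend. It is not Cursor's API: the `Threshold` class, the scope names, and the spend figures are hypothetical, and delivery via Slack or email is left out.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    scope: str        # "member" or "team" (hypothetical scope names)
    limit_usd: float  # alert once spend reaches this amount


def triggered_alerts(spend_by_member, thresholds):
    """Return one message for every threshold that has been met or exceeded."""
    alerts = []
    team_total = sum(spend_by_member.values())
    for t in thresholds:
        if t.scope == "team":
            if team_total >= t.limit_usd:
                alerts.append(f"team spend ${team_total:.2f} reached ${t.limit_usd:.2f}")
        elif t.scope == "member":
            for member, spend in sorted(spend_by_member.items()):
                if spend >= t.limit_usd:
                    alerts.append(f"{member} spend ${spend:.2f} reached ${t.limit_usd:.2f}")
    return alerts


# Made-up month-to-date spend per member, in USD.
spend = {"ana": 42.0, "ben": 130.5, "chi": 12.25}
rules = [Threshold("member", 100.0), Threshold("team", 150.0)]

for message in triggered_alerts(spend, rules):
    print(message)
```

Evaluating the rules before each charge posts, rather than at invoice time, is what lets an alert arrive ahead of the unexpected bill.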

Spend alert

Fast-forward to Wednesday, and Cursor launched an enterprise governance layer aimed squarely at the IT and finance teams now responsible for keeping AI spend in check.

The new “organizations” structure lets large companies manage multiple Cursor deployments from a single dashboard, with budgets, model access, and agent permissions all configurable at the department level.

The idea is that different functions carry different risk profiles and different cost tolerances — a product or engineering team may warrant the full model roster and generous spending headroom, while a marketing or finance team might be locked to cheaper models, lower ceilings, and a requirement that agents get human sign-off before executing any command.

An org-level dashboard rolls up spend and token consumption across every team, filterable by user, team, or cloud agent, giving finance teams the visibility to run chargebacks by business unit.

Usage analytics by team

Collectively, these features are designed to bring visibility and control into enterprise settings, where unwieldy AI pricing is now top of mind for CFOs across sectors.

To understand why, it helps to follow the economics of tools like Cursor.

The wrapper squeeze

Unlike Anthropic or OpenAI, which charge for inference directly on a per-token basis, Cursor is a wrapper — it buys inference from frontier model providers at API rates and resells access to developers, historically at a flat monthly fee. That model worked when usage was modest, but it stopped working as agentic coding sessions grew longer, heavier, and far more token-hungry.

The ringfenced Composer pool is Cursor’s most telling response to that squeeze. Composer 2.5, Cursor’s own coding model, costs $0.50 per million input tokens and $2.50 per million output tokens. Claude Opus 4.7 and 4.8, by comparison, run at $5.00 input and $25.00 output — a tenfold difference on the tokens that matter most.
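
The arithmetic behind that gap is easy to check. The sketch below uses the per-million-token prices quoted above. The dictionary keys are just labels, and the session size (2 million input tokens, 200,000 output tokens) is a made-up example rather than a measured workload.

```python
# USD per million tokens, as quoted in the article. Keys are labels only.
PRICES = {
    "composer-2.5": {"input": 0.50, "output": 2.50},
    "claude-opus": {"input": 5.00, "output": 25.00},
}


def session_cost(model, input_tokens, output_tokens):
    """Cost in USD of one session at the listed per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical agentic session: 2M input tokens, 200k output tokens.
composer = session_cost("composer-2.5", 2_000_000, 200_000)
opus = session_cost("claude-opus", 2_000_000, 200_000)

print(f"Composer: ${composer:.2f}  Opus: ${opus:.2f}  ratio: {opus / composer:.1f}x")
```

Because both the input and output prices differ by the same factor, the ratio stays at ten whatever the mix of input and output tokens in a session.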

By giving Composer its own separate allowance, and automatically falling back to it when a user exhausts their third-party API allocation, Cursor is structurally nudging users toward cheaper inference it controls — and protecting its own margins in the process.
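
One way to picture that fallback is as a simple routing rule over two usage pools. The sketch below is an assumption about the general shape of such logic, not Cursor's implementation: the pool names, the abstract usage units, and the `route_request` function are all hypothetical.

```python
def route_request(cost, pools):
    """Charge the third-party pool if it can cover the cost, else fall back.

    Returns the name of the pool that was charged, or None if neither
    pool has enough allowance left.
    """
    for name in ("third_party", "first_party"):
        if pools[name] >= cost:
            pools[name] -= cost
            return name
    return None


# Made-up allowances in abstract usage units.
pools = {"third_party": 10.0, "first_party": 5.0}

# Four requests of equal size, routed in order.
choices = [route_request(4.0, pools) for _ in range(4)]
print(choices)
print(pools)
```

In this toy run the first two requests draw on the third-party pool, the third falls back to the first-party pool, and the fourth is refused because neither pool can cover it.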

This dynamic is playing out across the space. On Monday, JetBrains open-sourced Mellum2, a 12-billion-parameter coding model built for the infrastructure layer of agentic systems — routing, retrieval pipelines, and sub-agent tasks — as well as on-premises deployment in environments where hosted tools like Cursor and Claude Code can’t operate. While its predecessor, Mellum, handled code completion alone, Mellum2 is built for the broader coordination work that now defines how engineering teams deploy AI.

The method differs — Mellum2 is self-hostable, putting inference costs entirely in the hands of the team running it — but the underlying impulse is the same: reduce dependence on expensive third-party API calls.

Pricing scars

With GitHub facing the wrath of angry users this week over its Copilot overhaul, it’s worth noting that Cursor too has navigated the tricky terrain of pricing before.

In June 2025, the company launched its $200-per-month Ultra plan — made possible by multi-year volume deals with Anthropic, OpenAI, Google, and xAI. But at the same time, it switched its Pro plan from request-based to compute-based billing, a change that caught many users off guard and led to unexpected charges.

The execution of that change was rough enough that Cursor had to issue a public apology and refunds to affected users.

The moves this week are a different kind of response to the same underlying pressure. While the 2025 changes focused on restructuring Cursor’s charges and how they’re applied, this week’s updates give organizations the visibility and controls to manage what they’re already spending.

Whether it succeeds will depend partly on transparency. Cursor still doesn’t publish the actual size of its included usage pools, describing them only as “generous” — a vagueness the Tokenomics Foundation was arguably created to address.

As J.R. Storment, executive director of the FinOps Foundation, tells The New Stack, organizations currently have no consistent way to compare costs across providers or make informed decisions about AI deployment.

“Each hyperscaler and each model provider and each hardware provider will have their own approach, their own data, their own value metrics,” Storment says. “We aim to align consistent models between them as we’ve done previously.”

Until that changes, users on every platform are navigating the new token economy largely in the dark – which is why Cursor’s spend alerts, usage dashboards, and model access controls, however modest, are a step in the right direction.

The post Cursor cuts prices and adds enterprise spend controls amid “tokenomics” reckoning appeared first on The New Stack.

Google Gemma 4 12B nearly matches 26B benchmarks — and runs on your laptop

Google has introduced Gemma 4 12B, a new model designed to bring high-performance, multi-modal intelligence to standard laptops. Small enough to run locally on a mere 16GB of VRAM or unified memory, the latest Gemma model is drawing enthusiasm in early community conversations where developers welcome the idea of making high performance local. 

Almost as good as Gemma 4 26B, but much smaller

Size matters. The standout quality of Google’s model released on Wednesday is that, according to the company, it performs nearly as well as Gemma 4 26B — but at less than half the total memory footprint. A look at the benchmarks does indeed show 12B neck and neck with 26B’s performance, even pushing past the older model on DocVQA (i.e., Document Visual Question Answering). 

Source: Google

Making the firepower (or close to it) of Gemma 4 26B accessible on standard, consumer-grade laptops means practically anyone can run advanced, multi-step reasoning and agentic workflows — wherever they want, offline. Before, doing so required turning to Google’s other more powerful (but heavier) Gemma variants. 
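
A back-of-the-envelope calculation shows why parameter count matters for laptop memory. The sketch below estimates the space taken by model weights alone as parameters times bits per parameter. The precision levels are assumptions, since the article does not say which precision Google's 16GB figure refers to, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_gib(params_billions, bits_per_param):
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


# Assumed precisions: 16-bit, 8-bit, and 4-bit weights.
for params in (12, 26):
    sizes = ", ".join(
        f"{bits}-bit: {weight_gib(params, bits):.1f} GiB" for bits in (16, 8, 4)
    )
    print(f"{params}B -> {sizes}")
```

Under these assumptions, a 12B model at 8 bits per weight comes in under 16GB, while a 26B model at the same precision does not.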

In case you missed it, in April, Google released the latest four Gemma models — what it then called “our most intelligent open models to date.” That family release included two models for personal computers (26B and 31B) and two models for mobile and IoT devices (E2B and E4B). 

Now, Gemma 4 12B sits in the middle, giving developers more juice than is available via E2B and E4B but at a lighter weight than 26B and 31B.

The star attraction: native audio inputs 

Size matters, but it isn’t everything. Another reason Gemma 4 12B is turning developers’ heads is that its unified architecture enables native audio inputs. It’s Google’s first mid-sized model to do so. 

Unlike traditional multimodal models (including the rest of Google’s own Gemma family), Gemma 4 12B doesn’t use separate encoders to translate images and audio into representations for LLM processing. Instead, as Google describes in its launch blog post, the new model passes those inputs “directly into the LLM backbone,” thereby ditching the extra latency and memory usage that usually come with encoding work. 

How so? 

For images, Gemma 4 12B uses an embedding module instead of a vision encoder, allowing the LLM itself to take over visual processing. 

Audio processing, the tech company says, is even simpler; with no audio encoder to speak of, Gemma 4 12B simply “project[s] the raw audio signal into the same dimensional space as text tokens.” 

So far, so good

Gemma 4 12B’s grand entrance to the Reddit developer communities has, so far, received a rather warm welcome. 

In r/LocalLLaMA, one Redditor dubbed it “one of the most exciting models I’ve heard about in a long time.” In particular, the unified architecture is drawing attention, with another Redditor saying, “the native audio support on a non-tiny model is by far the most exciting thing about this for me.” 

There’s not been much time to take the new model for a spin, but the enthusiasm is there: “I have a lot of use [cases] that would greatly benefit if this works even decently well,” adds another Redditor.

As for potential drawbacks, one commenter on Hacker News points to what may be the model’s limited coding capabilities, a topic that is indeed absent from Google’s announcement: “It will likely not have good performance on coding in general, compared to other small models like Qwen 3.6 35B A3B, Gemma 4 26B A4B, Nvidia Nemotron 3 Nano 30B-A3B, gpt-oss-20b.” 

Another commenter agrees: “qwen IMO is far better for coding, esp agentic coding when combined with something like Pi, it comes probably close enough to Sonnet for a lot of use cases. Gemma family is better for almost all other tasks you’d use a local llm for.”

Is the future local? 

But acing coding benchmarks, it seems, may not be the point. What’s noteworthy is Gemma 4 12B’s rather hefty performance and less-than-hefty size. 

The fact that it can run locally on standard computers means developers don’t always need to look to the cloud for high-performance intelligence, which could have profound cost implications down the line. As one Redditor puts it: “Cloud is convenient, but you’re paying per token forever, and your prompts go through someone else’s server. local = one time setup, private, zero ongoing cost.” 

Perhaps Google does think the future is local. Last September, the technology company launched Google AI Edge Gallery, stating it wanted to make the open-source app “the most inspiring and helpful showcase for on-device AI.”

By bringing near-26B performance to standard, consumer-grade laptops, Google is bringing more attention to on-device AI, and developers are here for it. 

The post Google Gemma 4 12B nearly matches 26B benchmarks — and runs on your laptop appeared first on The New Stack.
