Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
154749 stories
·
33 followers

Ozempic May Be Reshaping the Brain, Scientists Say

1 Share
A research team found "extensive changes" on brain scans of 13 young women taking GLP-1 drugs, reports the Washington Post: Within only a few months, the brain connections in the salience network, which helps target attention, had multiplied... ["We didn't expect to see this effect, and we really don't know what it means," said an assistant professor assisting the research.] Ozempic and other GLP-1 drugs were initially understood as a metabolism breakthrough: medicines that act like hormones to control hunger, blood sugar and weight. But as researchers probe deeper into how the drugs work, early evidence suggests that GLP-1s may also be reshaping parts of the brain. Tens of millions of people are now taking the medications worldwide, turning what began as an obesity and diabetes treatment into what could be one of modern medicine's largest unplanned neuroscience experiments... Long before Oprah Winfrey and social media influencers helped popularize GLP-1 drugs, physician-scientist Lorenzo Leggio was studying them as a possible addiction treatment... Several major studies examining GLP-1 drugs on nicotine dependence, opioid- and cocaine-use disorders, gambling addiction and binge eating are also underway. "It's very exciting times, but we don't fully understand how it works," Leggio said... As evidence has grown that inflammation, metabolism and mental health may be far more connected than scientists once believed, researchers have become intrigued by patients who say GLP-1 drugs appear to ease anxiety, compulsive thinking and emotional distress. Daniel Drucker, a University of Toronto researcher and GLP-1 drug pioneer who receives funding from several drugmakers, said researchers are investigating the medications across a variety of psychiatric and neurological conditions, though none are approved for them. "We have so many anecdotal reports: They were treated for blood sugar and then they felt much happier. Or they took one dose of the drug and their brain fog cleared," he said. The article suggests social media complaints "raise deeper questions about what, exactly, these drugs are changing." "If GLP-1s alter the brain systems involved in reward, craving and motivation, researchers wonder, where is the line between quieting a person's destructive impulses and reshaping personality itself?"

Read more of this story at Slashdot.

Read the whole story
alvinashcraft
38 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

‘What a joke’: GitHub Copilot’s new token-based billing spurs consternation among devs

The golden age of Microsoft's GitHub Copilot appears to be at an end.

How to Use Build5Nines.SharpVector with Microsoft Agent Framework for Local RAG in C#

There is a moment every developer hits when building an AI agent: the demo works, the model responds, the tool calls fire, and everything feels…

Why GPT-5.4, Claude, and Gemini can’t agree on basic, real-world facts

As the frontier model race accelerates, AI devotees are splitting their loyalty across the major providers at both the user and developer levels. Differences in inference are an accepted norm — but most assume that, at the highest level, frontier LLMs would agree on basic, real-world facts.

Except that’s not the case.

An analysis published this month on the claim-verification platform Lenz found that across 1,000 recent real-user fact-check claims — statements about the world asserted as true — a panel of five frontier LLMs split on 67% of them, meaning at least one model dissented from the majority verdict, or no clear majority formed at all.

Pick a verdict from a 4-bucket rubric

The five models (GPT-5.4, Claude Opus 4.7, Gemini 3 Pro, Gemini 3 Pro + Search, Sonar Pro) were each given the same real-world claim and asked to pick a verdict from a 4-bucket rubric (True / Mostly True / Misleading / False). Because only one bucket can be correct per claim, any disagreement among the panel means at least one model is label-inconsistent.
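The split definition reported above can be sketched in a few lines of Python. This is a minimal illustration, not Lenz's code: the function names `majority_verdict` and `is_split`, the verdict strings, the sample panels, and the reading of "clear majority" as a strict majority (more than half the panel) are all assumptions.

```python
from collections import Counter

# The four rubric buckets named in the article.
BUCKETS = ("True", "Mostly True", "Misleading", "False")


def majority_verdict(verdicts):
    """Return the verdict held by more than half the panel, or None.

    Assumption: "clear majority" means a strict majority of the panel.
    """
    label, count = Counter(verdicts).most_common(1)[0]
    return label if count > len(verdicts) / 2 else None


def is_split(verdicts):
    """A claim is split if no majority forms or any model dissents from it."""
    majority = majority_verdict(verdicts)
    if majority is None:
        return True
    return any(v != majority for v in verdicts)


# Hypothetical panel outputs for three claims (five models each).
unanimous = ["False"] * 5
one_dissenter = ["True", "True", "True", "True", "Mostly True"]
no_majority = ["True", "True", "Misleading", "Misleading", "False"]

for panel in (unanimous, one_dissenter, no_majority):
    print(panel, "split" if is_split(panel) else "agreed")
```

Under this reading, a claim counts as split exactly when the panel is not unanimous, since a majority with no dissenters means all five models agree.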

According to Lenz, the “split across these five models is intentional” because it covers the spread of inference modes that are common in production AI systems.

How many types of inference are there?

Inference modes span latency-sensitive, throughput-aware, resource-constrained and scalable inference. In practice, inference is typically divided into two kinds: low-latency, high-throughput inference (e.g. for interactive chatbots), and offline or batch inference, where data is accumulated first and analyzed later, with the process optimized for cost.

Research informing the May 21 paper was led by Kosta Jordanov, founder of Lenz and co-founder of Wiser, an IT consulting and software engineering group headquartered in Sofia, Bulgaria.  

Jordanov tells The New Stack that the claims his team used in the research are real claims that users have fact-checked on Lenz since February 15, 2026. 

A fresh, real-world corpus of data

“We’ve excluded private claims, near-duplicate claims, and any claims containing personally identifiable information (PII),” Jordanov says. “The interesting thing about this corpus is that, unlike the standard benchmark questions, the models have not seen these claims during training; i.e., it’s a fresh real-world corpus across science, healthcare, politics, law, and other domains on topics that people care about and fact-check.” 

Beyond the 67% dissent metric, 34% of the claims are substantially disagreed on

Beyond the 67% dissent metric, 34% of the claims are substantially disagreed on (2+ buckets apart), and 21% are polar opposites (at least one model says False and at least one says True). At this level, we can start to see the path from dissent to disagreement having a real impact on live production AI systems and tools.
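The two severity tiers above can be computed from the ordinal position of each verdict on the rubric. The sketch below is illustrative only: it assumes "2+ buckets apart" means the distance between the two most extreme verdicts on a claim, and the helper names and sample claims are hypothetical rather than taken from the Lenz analysis.

```python
# Rubric buckets in order, so that distance between buckets is meaningful.
BUCKET_ORDER = ["True", "Mostly True", "Misleading", "False"]
RANK = {label: i for i, label in enumerate(BUCKET_ORDER)}


def bucket_spread(verdicts):
    """Distance in buckets between the two most extreme verdicts."""
    ranks = [RANK[v] for v in verdicts]
    return max(ranks) - min(ranks)


def is_substantial(verdicts):
    """Assumed reading of "2+ buckets apart"."""
    return bucket_spread(verdicts) >= 2


def is_polar(verdicts):
    """At least one model says True and at least one says False."""
    return "True" in verdicts and "False" in verdicts


def rate(claims, predicate):
    """Share of claims for which the predicate holds."""
    return sum(predicate(c) for c in claims) / len(claims)


# Hypothetical verdicts from a five-model panel on four claims.
claims = [
    ["True", "True", "True", "True", "True"],
    ["True", "True", "True", "Mostly True", "True"],
    ["True", "Mostly True", "Misleading", "True", "True"],
    ["True", "False", "False", "False", "Misleading"],
]

print("substantial:", rate(claims, is_substantial))
print("polar:", rate(claims, is_polar))
```

With a four-bucket rubric, a polar claim is simply one whose spread is the maximum of three buckets, so every polar claim is also a substantial one.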

What should AI developers think about this disconnect?

The practical takeaway is that on real-world claims, a single frontier LLM gives one opinion from a visibly unstable distribution. A second model often gives another. 

“For many applications, that’s fine,” Jordanov clarifies. “But if a software engineering team operates a system where legal, financial, or reputational risk is involved, and it delivers untrue or hallucinated content to users, you should think about the ways in which you validate the AI-generated content before it reaches users.” 

The question arises, then, why do frontier models converge confidently at True/False poles but fracture badly on middle-ground verdicts? Unfortunately, that’s a hard question to answer based on this research. One hypothesis Jordanov puts forward is that the Mostly True and Misleading categories are a bit more ambiguous than the True and False categories. 

“What we measured, though, is that some models use the middle buckets way less often than others – Gemini is quite ‘confident’ and classified only 6% of the claims in the two middle buckets vs. 45% for Opus 4.7,” he says.
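The middle-bucket figure quoted above is a simple per-model rate. A minimal sketch, assuming each model's verdicts are available as a list of rubric labels; the model names and data below are placeholders, not the study's results.

```python
# The two middle buckets of the four-bucket rubric.
MIDDLE_BUCKETS = {"Mostly True", "Misleading"}


def middle_bucket_rate(verdicts):
    """Fraction of a model's verdicts that fall in the two middle buckets."""
    if not verdicts:
        return 0.0
    return sum(v in MIDDLE_BUCKETS for v in verdicts) / len(verdicts)


# Hypothetical verdicts for two placeholder models over the same ten claims.
verdicts_by_model = {
    "confident_model": ["True", "False", "True", "True", "False",
                        "False", "True", "Misleading", "True", "False"],
    "hedging_model": ["Mostly True", "False", "True", "Misleading", "Misleading",
                      "False", "Mostly True", "Misleading", "True", "False"],
}

for model, verdicts in verdicts_by_model.items():
    print(f"{model}: {middle_bucket_rate(verdicts):.0%} in middle buckets")
```

A large gap between two models on this rate suggests they calibrate the rubric differently, which can produce disagreement even when they weigh the evidence the same way.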

Is Anthropic especially out of line?

Looking at the potential howlers here, if Claude Opus 4.7 (which had received early criticism) aligned with the peer majority least often at 70%, should that concern Anthropic?

“Not necessarily,” clarifies Jordanov. “Our limited preliminary research shows that the majority is often wrong, and sometimes we see wrong unanimous verdicts; i.e., having a different opinion than the majority does not necessarily mean being wrong.”

This research does not use any “ground truths” (indisputable real-world facts that have been widely validated and verified) and only measures the differences between the models’ verdicts. It cannot answer which model is correct for which claim.
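One plausible way to compute a peer-majority alignment rate like the 70% figure discussed above is sketched here. The article does not describe the exact method, so everything in this block is an assumption: the peer majority is taken as a strict majority among the other four models, claims where the peers form no majority are skipped, and the model names and verdicts are placeholders.

```python
from collections import Counter


def peer_majority(verdicts_by_model, model):
    """Strict-majority verdict among all models except `model`, or None."""
    peers = [v for name, v in verdicts_by_model.items() if name != model]
    label, count = Counter(peers).most_common(1)[0]
    return label if count > len(peers) / 2 else None


def alignment_rate(claims, model):
    """Share of claims where `model` matches its peers' majority verdict.

    Claims where the peers form no strict majority are skipped (assumption).
    Returns None if no claim has a peer majority.
    """
    matched = counted = 0
    for verdicts_by_model in claims:
        majority = peer_majority(verdicts_by_model, model)
        if majority is None:
            continue
        counted += 1
        matched += verdicts_by_model[model] == majority
    return matched / counted if counted else None


# Hypothetical verdicts from five placeholder models on three claims.
claims = [
    {"A": "True", "B": "True", "C": "True", "D": "True", "E": "Mostly True"},
    {"A": "False", "B": "True", "C": "True", "D": "True", "E": "True"},
    {"A": "True", "B": "True", "C": "False", "D": "False", "E": "Misleading"},
]

for model in "ABCDE":
    print(model, alignment_rate(claims, model))
```

As the article stresses, this number measures agreement with peers, not correctness: a model can score low here and still be right where the majority is wrong.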

Other studies in this space

Academic and commercially underpinned model research appears to be turning to this space right now. A study by Eddie Yang and Dashun Wang at Cornell University published in February notes that benchmarks underpin how progress in large language models (LLMs) is measured and trusted. 

“Yet our analyses reveal that apparent convergence in benchmark accuracy can conceal deep epistemic divergence. Using two major reasoning benchmarks — MMLU-Pro and GPQA — we show that LLMs achieving comparable accuracy still disagree on 16-66% of items, and 16-38% among top-performing frontier models,” wrote Yang & Wang in February.

Humans in the loop next

Jordanov confirms that this analysis is the first step. 

“We do plan a follow-up where we measure the models against human-provided labels, and also measure the source-based multi-step multi-model Lenz pipeline against those labels and against the frontier models,” Jordanov says. “The time-consuming part is the methodologically correct labeling by human experts in all of those domains, but we aim to publish in the coming months.”

The report concludes with a statement explaining that the point of this work isn’t to create a leaderboard.

The point is to map the structure of disagreement, i.e., where do frontier panels systematically diverge from a human consensus, where does Lenz diverge from both, how each individual model and Lenz align with the same human reference, and what categories of claims drive each kind of divergence (rubric ambiguity, temporal framing, domain specialization, calibration drift).

The post Why GPT-5.4, Claude, and Gemini can’t agree on basic, real-world facts appeared first on The New Stack.

JetBrains Simplifies Kotlin Multiplatform Project Structure

JetBrains announced a streamlined default structure for Kotlin Multiplatform (KMP) projects, replacing the all‑in‑one composeApp module with a shared module and separate app modules for each platform.

The shared module contains common code, while platform‑specific modules such as androidApp, desktopApp and webApp depend on it. This change clarifies module responsibilities and aligns projects with modern build conventions.

Developers can further split shared code into sharedLogic and sharedUI when using native UIs (for example, SwiftUI on iOS). Projects that include a server now gain a dedicated server module and a core module for shared models and validation logic.

The restructuring addresses issues in the previous template, which mixed multiplatform library code and application configuration, making it hard to tell where to place platform‑specific settings. It also prepares projects for Android Gradle Plugin 9.0 (AGP 9), which requires the Android entry point to be in its own module.

The new setup is already available via JetBrains’ project wizard, and migration of existing projects is optional except for AGP 9.0‑specific changes, which are mandatory for Android targets.

Developers can explore the new template at JetBrains’ KMP wizard and reference the migration guide for existing projects.

More information can be found on the official blog post:
https://blog.jetbrains.com/kotlin/2026/05/new-kmp-default-structure/

#550: AI Contributions and Maintainer Load in Open Source

You wake up, brew the coffee, open GitHub, and there it is. Another pull request on your open source project. Thirteen thousand lines added. No issue filed first. No discussion. Just "here, please review this for me."

Over the past year, GitHub activity has spiked roughly twelve times in a few short months, and a huge chunk of that signal is landing on the same small group of maintainers who were already stretched thin. The curl bug bounty got buried under AI-generated noise. Jazzband, the home of Django classics like pip-tools and the Django debug toolbar, hit what its maintainer called an "apocalypse" and started sunsetting. Even CPython just shipped fresh guidelines on AI-assisted contributions this week.

So what does all of this actually look like from the receiving end of the pull request?

On this episode, Paolo Melchiorre joins us to tell that story from inside the maintainer's chair. Paolo is a director of the Django Software Foundation, an organizer of PyCon Italy, a Django Girls coach, and he has spent the past year carefully collecting examples of how AI is reshaping open source contributions. The good, the bad, and the extra fingers.

We dig into his PyCon US talk on AI-assisted contributions and maintainer load, why AI is best understood as an amplifier rather than a new kind of contributor, the wildly different policies across 86 open source foundations, and whether projects banning AI today are reacting to last year's models.

Episode sponsors

AgentField AI
Talk Python Courses

Guest
Paolo Melchiorre: github.com

DSF: www.djangoproject.com
djangonaut-space: djangonaut.space
PyCon Italia: 2026.pycon.it
uDjango: github.com
My PyCon US 2026 post: www.paulox.net
AI-Assisted Contributions and Maintainer Load: www.paulox.net
Senior Engineer Tries Vibe Coding: www.youtube.com
Code Rabbit AI PR Reviews: www.coderabbit.ai
GitHub Usage Graphs: github.blog
Update on CPython's AI Policies: fosstodon.org
High-Quality Chaos from Curl: daniel.haxx.se
The Generative AI Policy Landscape in Open Source: redmonk.com

Watch this episode on YouTube: youtube.com
Episode #550 deep-dive: talkpython.fm/550
Episode transcripts: talkpython.fm

Theme Song: Developer Rap
🥁 Served in a Flask 🎸: talkpython.fm/flasksong

---== Don't be a stranger ==---
YouTube: youtube.com/@talkpython

Bluesky: @talkpython.fm
Mastodon: @talkpython@fosstodon.org
X.com: @talkpython

Michael on Bluesky: @mkennedy.codes
Michael on Mastodon: @mkennedy@fosstodon.org
Michael on X.com: @mkennedy




Download audio: https://talkpython.fm/episodes/download/550/ai-contributions-and-maintainer-load-in-open-source.mp3