Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Announcing the TechBash CLI: The AI-Powered Companion for Your Terminal

1 Share

As developers, our terminal is our command center. It’s where we write code, manage dependencies, and ship features. But what if your terminal could also help you plan your next dev conference, map your project's tech stack to relevant sessions, and organize your event notes?

Today, we are thrilled to announce the TechBash CLI, a developer-first complement to our event app. Built as a plugin extension for popular terminal-based AI assistants like GitHub Copilot CLI and Claude Code, the TechBash CLI connects your local development environment directly to the live TechBash 2026 session catalog.

Whether you are preparing your trip, sitting in the front row at the Kalahari Resort, or heading home to apply what you learned, the TechBash CLI has you covered.

What Can the TechBash CLI Do?

The TechBash CLI isn’t just a basic text scraper—it’s an intelligent tool designed to streamline your entire conference workflow in three phases:

1. Before TechBash: Plan Your Perfect Schedule

Instead of manually browsing through dozens of talks to see what fits your current work, the TechBash CLI inspects your actual project files. It reads configuration and dependency files—such as package.json, requirements.txt, *.csproj, or go.mod—maps your stack to relevant topics, and queries our live catalog hosted on Sessionize.

  • Match sessions to your project: Ask it, "What TechBash sessions should I attend?" and it will tailor recommendations based on your local codebase.
  • Filter by topic: "Show me the .NET / DevOps / soft-skills sessions."
  • Look up speakers or workshops: "Tell me about Mitchel Sellers" or "What's on the workshop day?"
  • Family track & Venue info: Bring the family along? Ask, "What's happening on Family Day?" or navigate seamlessly with "How do I get to Kalahari from EWR?"
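
The stack-matching step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin's actual code: the `MANIFEST_TOPICS` table, the topic names, and the `detect_topics` function are all hypothetical.

```python
from pathlib import Path

# Hypothetical mapping from manifest file patterns to session topics.
# The real plugin's mapping is not documented here.
MANIFEST_TOPICS = {
    "package.json": "JavaScript",
    "requirements.txt": "Python",
    "*.csproj": ".NET",
    "go.mod": "Go",
}


def detect_topics(project_dir):
    """Return the sorted topics whose manifest files exist in project_dir."""
    root = Path(project_dir)
    found = set()
    for pattern, topic in MANIFEST_TOPICS.items():
        # Path.glob handles both literal names and wildcard patterns.
        if any(root.glob(pattern)):
            found.add(topic)
    return sorted(found)
```

Calling `detect_topics(".")` in a project folder would return the list of topics to use as filters when querying the session catalog.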

2. During TechBash: Live Tracking & Easy Note-Taking

When you're on the ground in Pocono Manor, PA, the CLI turns into your live event copilot.

  • See what's on now: Ask "What's happening right now?" or "What's next in the main hall?" (Note: Room and time data populate dynamically once the final GridSmart schedule goes live.)
  • Capture notes instantly: Write your takeaways directly into your workflow without switching windows. For example: "Log a note from 'Restoring Lost Work in Git': great demo of reflog." The tool handles the custom markdown templates for you automatically.

3. After TechBash: Ship What You Learned

The conference doesn't end when you leave the Kalahari. The CLI helps you synthesize knowledge so you can share it with your team.

  • Review notes: Ask it to "Summarize my TechBash notes."
  • Draft reports: Tell it to "Draft a TechBash trip report" to easily justify that conference budget to your manager.

Getting Started (Quick Start)

Getting up and running takes less than a minute. No API keys are required, and all data is fetched live.

For GitHub Copilot CLI:

Open your terminal in any project folder and run:

/plugin install techbash/techbash-cli

Restart your session:

/restart

Test it out:

What TechBash sessions are relevant to my project?

For Claude Code:

Add the plugin from the marketplace:

/plugin marketplace add techbash/techbash-cli

Install the skill:

/plugin install techbash@techbash-marketplace

Scope, Architecture, & What’s Next

The TechBash CLI is highly focused on optimizing the TechBash 2026 (Oct 13–16, 2026) experience. Because it queries the Sessionize catalog live, an active network connection is required.

Behind the scenes, the skill relies on load-bearing YAML frontmatter and markdown templates (session-note.md, daily-rollup.md, trip-report.md) to deterministically manage your developer logs. Looking ahead, the architecture is slated for some exciting upgrades, including Zoho Backstage integration for sponsors and ticketing, alongside an @techbash/events-cli Node helper for even faster local search caching.
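
To make the frontmatter-plus-template idea concrete, here is a minimal Python sketch of how a session note could be rendered. The field names (`event`, `session`, `date`), the "Takeaways" heading, and the `render_session_note` function are assumptions for illustration; the actual layout of session-note.md is not shown in this post.

```python
from datetime import date


def render_session_note(session, note, day):
    """Build a markdown note with a YAML frontmatter header.

    The frontmatter fields below are hypothetical, not the plugin's schema.
    """
    # Escape double quotes so the title stays a valid YAML string.
    safe_title = session.replace('"', '\\"')
    frontmatter = "\n".join([
        "---",
        'event: "TechBash 2026"',
        f'session: "{safe_title}"',
        f"date: {day.isoformat()}",
        "---",
    ])
    return f"{frontmatter}\n\n## Takeaways\n\n- {note}\n"


# The day used here is illustrative, picked from the Oct 13-16 event window.
print(render_session_note(
    "Restoring Lost Work in Git", "great demo of reflog", date(2026, 10, 14)
))
```

Keeping the metadata in frontmatter means a later rollup or trip report can be built by parsing the headers instead of guessing at the note body.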

Join Us at TechBash 2026!

We built this tool because we want our community to have a seamless, developer-centric experience from the command line to the convention floor. Head over to the TechBash CLI GitHub Repository to check out the source code, view the workflow recipes, or drop some feedback in the issues section.

Haven't registered yet? Head over to techbash.com to grab your 3-Day or 4-Day passes and join us this October at the Kalahari Resort!

Read the whole story
alvinashcraft
3 hours ago
reply
Pennsylvania, USA
Share this story
Delete

v2026.5.28

openclaw 2026.5.28

Ozempic May Be Reshaping the Brain, Scientists Say

A research team found "extensive changes" on brain scans of 13 young women taking GLP-1 drugs, reports the Washington Post: Within only a few months, the brain connections in the salience network, which helps target attention, had multiplied... ["We didn't expect to see this effect, and we really don't know what it means," said an assistant professor assisting the research.] Ozempic and other GLP-1 drugs were initially understood as a metabolism breakthrough: medicines that act like hormones to control hunger, blood sugar and weight. But as researchers probe deeper into how the drugs work, early evidence suggests that GLP-1s may also be reshaping parts of the brain. Tens of millions of people are now taking the medications worldwide, turning what began as an obesity and diabetes treatment into what could be one of modern medicine's largest unplanned neuroscience experiments... Long before Oprah Winfrey and social media influencers helped popularize GLP-1 drugs, physician-scientist Lorenzo Leggio was studying them as a possible addiction treatment... Several major studies examining GLP-1 drugs on nicotine dependence, opioid- and cocaine-use disorders, gambling addiction and binge eating are also underway. "It's very exciting times, but we don't fully understand how it works," Leggio said... As evidence has grown that inflammation, metabolism and mental health may be far more connected than scientists once believed, researchers have become intrigued by patients who say GLP-1 drugs appear to ease anxiety, compulsive thinking and emotional distress. Daniel Drucker, a University of Toronto researcher and GLP-1 drug pioneer who receives funding from several drugmakers, said researchers are investigating the medications across a variety of psychiatric and neurological conditions, though none are approved for them. "We have so many anecdotal reports: They were treated for blood sugar and then they felt much happier. Or they took one dose of the drug and their brain fog cleared," he said. The article suggests social media complaints "raise deeper questions about what, exactly, these drugs are changing." "If GLP-1s alter the brain systems involved in reward, craving and motivation, researchers wonder, where is the line between quieting a person's destructive impulses and reshaping personality itself?"

Read more of this story at Slashdot.

‘What a joke’: GitHub Copilot’s new token-based billing spurs consternation among devs

The golden age of Microsoft's GitHub Copilot appears to be at an end.

How to Use Build5Nines.SharpVector with Microsoft Agent Framework for Local RAG in C#

There is a moment every developer hits when building an AI agent: the demo works, the model responds, the tool calls fire, and everything feels…

Why GPT-5.4, Claude, and Gemini can’t agree on basic, real-world facts

As the frontier model race accelerates, AI devotees are splitting their loyalty across the major providers at both the user and developer levels. Differences in inference are an accepted norm — but most assume that, at the highest level, frontier LLMs would agree on basic, real-world facts.

Except that’s not the case.

An analysis published this month on the claim-verification platform Lenz found that across 1,000 recent real-user fact-check claims — statements about the world asserted as true — a panel of five frontier LLMs split on 67% of them, meaning at least one model dissented from the majority verdict, or no clear majority formed at all.

Pick a verdict from a 4-bucket rubric

The five models (GPT-5.4, Claude Opus 4.7, Gemini 3 Pro, Gemini 3 Pro + Search, Sonar Pro) were each given the same real-world claim and asked to pick a verdict from a 4-bucket rubric (True / Mostly True / Misleading / False). Because only one bucket can be correct per claim, any disagreement among the panel means at least one model is label-inconsistent.
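
The "split" definition can be made concrete with a small Python sketch. It assumes that a "clear majority" means a strict majority (more than half of the panel), which the article does not spell out; the `classify_panel` function is illustrative and not taken from the Lenz analysis.

```python
from collections import Counter


def classify_panel(verdicts):
    """Return (majority_verdict_or_None, is_split) for one claim.

    Assumes "clear majority" means more than half of the panel.
    """
    label, count = Counter(verdicts).most_common(1)[0]
    majority = label if count > len(verdicts) / 2 else None
    # Split: no clear majority, or at least one model dissents from it.
    is_split = majority is None or count < len(verdicts)
    return majority, is_split


print(classify_panel(["True"] * 5))
print(classify_panel(["True", "True", "True", "Mostly True", "False"]))
print(classify_panel(["True", "True", "False", "False", "Misleading"]))
```

Under this reading, only a unanimous panel counts as not split, which is why the reported share is as high as it is.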

According to Lenz, the “split across these five models is intentional” because it covers the spread of inference modes that are common in production AI systems.

How many types of inference are there?

Inference modes range from latency-sensitive to throughput-aware, resource-constrained, and scalable. They are typically divided into two groups: low-latency, high-throughput inference (e.g. for interactive chatbots) and offline or batch inference, where data accumulates before it is analyzed and the process is optimized for cost.

Research informing the May 21 paper was led by Kosta Jordanov, founder of Lenz and co-founder of Wiser, an IT consulting and software engineering group headquartered in Sofia, Bulgaria.  

Jordanov tells The New Stack that the claims his team used in the research are real claims that users have fact-checked on Lenz since February 15, 2026. 

A fresh, real-world corpus of data

“We’ve excluded private claims, near-duplicate claims, and any claims containing personally identifiable information (PII),” Jordanov says. “The interesting thing about this corpus is that, unlike the standard benchmark questions, the models have not seen these claims during training — i.e., it’s a fresh real-world corpus across science, healthcare, politics, law, and other domains on topics that people care about and fact-check.”

Beyond the 67% dissent metric, 34% of the claims are substantially disagreed on

Beyond the 67% dissent metric, 34% of the claims drew substantial disagreement (verdicts two or more buckets apart), and 21% drew polar-opposite verdicts (at least one model says False and at least one says True). At this level, it becomes clear how dissent can turn into disagreement that has a real impact on live production AI systems and tools.
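
A short Python sketch shows one way these two thresholds could be computed. It assumes the rubric is ordered as listed earlier (True, Mostly True, Misleading, False) and that "2+ buckets apart" refers to the gap between the two most distant verdicts on a claim; the function names are hypothetical.

```python
# Rubric order as listed in the article; treating it as an ordinal scale
# is an assumption of this sketch.
RUBRIC = ["True", "Mostly True", "Misleading", "False"]


def spread(verdicts):
    """Distance in buckets between the two most distant verdicts."""
    positions = [RUBRIC.index(v) for v in verdicts]
    return max(positions) - min(positions)


def is_substantial(verdicts):
    """Verdicts at least two buckets apart."""
    return spread(verdicts) >= 2


def is_polar(verdicts):
    """At least one model says True and at least one says False."""
    return "True" in verdicts and "False" in verdicts


panel = ["True", "True", "Mostly True", "Misleading", "True"]
print(spread(panel), is_substantial(panel), is_polar(panel))
```

Every polar-opposite claim is also a substantial disagreement under this definition, which is consistent with the 21% figure being smaller than the 34% figure.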

What should AI developers think about this disconnect?

The practical takeaway is that on real-world claims, a single frontier LLM gives one opinion from a visibly unstable distribution. A second model often gives another. 

“For many applications, that’s fine,” Jordanov clarifies. “But if a software engineering team operates a system where legal, financial, or reputational risk is involved — and it delivers untrue or hallucinated content to users — you should think about the ways in which you validate the AI-generated content before it reaches users.”

That raises a question: why do frontier models converge confidently at the True/False poles but fracture badly on middle-ground verdicts? Unfortunately, that’s hard to answer based on this research. One hypothesis Jordanov puts forward is that the Mostly True and Misleading categories are more ambiguous than the True and False categories.

“What we measured, though, is that some models use the middle buckets way less often than others – Gemini is quite ‘confident’ and classified only 6% of the claims in the two middle buckets vs. 45% for Opus 4.7,” he says.
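
The middle-bucket figure is a simple share, sketched below in Python. The verdict list is made-up sample data for one hypothetical model, not data from the Lenz study.

```python
MIDDLE_BUCKETS = {"Mostly True", "Misleading"}


def middle_share(verdicts):
    """Fraction of a model's verdicts that fall in the two middle buckets."""
    return sum(v in MIDDLE_BUCKETS for v in verdicts) / len(verdicts)


# Made-up verdicts for one hypothetical model.
sample = ["True", "False", "Misleading", "True"]
print(middle_share(sample))  # 0.25
```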

Is Anthropic especially out of line?

Looking at the potential howlers here, if Claude Opus 4.7 (which had received early criticism) aligned with the peer majority least often at 70%, should that concern Anthropic?

“Not necessarily,” clarifies Jordanov. “Our limited preliminary research shows that the majority is often wrong, and sometimes we see wrong unanimous verdicts; i.e., having a different opinion than the majority does not necessarily mean being wrong.”

This research does not use any “ground truths” (indisputable real-world facts that have been widely validated and verified) and only measures the differences between the models’ verdicts. It cannot answer which model is correct for which claim.
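
One way to picture a "peer majority" agreement rate is the Python sketch below. It assumes the majority is taken over the other models only, that it must be a strict majority, and that claims without one are skipped; the article does not confirm these details, and the model names `A` to `E` are placeholders. As in the research, it measures agreement only and says nothing about which verdict is correct.

```python
from collections import Counter


def peer_alignment(claims, model):
    """Share of claims where `model` matches the majority of its peers.

    `claims` is a list of dicts mapping model name -> verdict.
    Claims with no strict peer majority are skipped (an assumption).
    """
    agreed = counted = 0
    for verdicts in claims:
        peers = [v for name, v in verdicts.items() if name != model]
        label, count = Counter(peers).most_common(1)[0]
        if count <= len(peers) / 2:
            continue  # no strict majority among the peers
        counted += 1
        agreed += verdicts[model] == label
    return agreed / counted if counted else None


# Placeholder panel with made-up verdicts.
claims = [
    {"A": "True", "B": "True", "C": "True", "D": "True", "E": "False"},
    {"A": "False", "B": "False", "C": "False", "D": "True", "E": "False"},
    {"A": "True", "B": "True", "C": "False", "D": "False", "E": "Misleading"},
]
print(peer_alignment(claims, "E"))  # 0.5
print(peer_alignment(claims, "A"))  # 1.0
```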

Other studies in this space

Academic and commercially underpinned model research appears to be turning to this space right now. A study by Eddie Yang and Dashun Wang at Cornell University published in February notes that benchmarks underpin how progress in large language models (LLMs) is measured and trusted. 

“Yet our analyses reveal that apparent convergence in benchmark accuracy can conceal deep epistemic divergence. Using two major reasoning benchmarks — MMLU-Pro and GPQA — we show that LLMs achieving comparable accuracy still disagree on 16-66% of items, and 16-38% among top-performing frontier models,” wrote Yang & Wang in February.
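
The gap between accuracy and agreement is easy to see with a toy example. The answer key and model answers below are invented for illustration and do not come from MMLU-Pro, GPQA, or the Cornell study.

```python
def accuracy(answers, key):
    """Fraction of items answered correctly."""
    return sum(a == k for a, k in zip(answers, key)) / len(key)


def disagreement(a, b):
    """Fraction of items on which two models give different answers."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Invented five-item benchmark and two hypothetical models.
key = ["A", "B", "C", "D", "A"]
model_x = ["A", "B", "C", "D", "B"]  # misses the last item
model_y = ["B", "B", "C", "D", "A"]  # misses the first item

print(accuracy(model_x, key), accuracy(model_y, key))  # 0.8 0.8
print(disagreement(model_x, model_y))  # 0.4
```

Both models score 80%, yet they disagree on 40% of the items because they fail on different questions.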

Humans in the loop next

Jordanov confirms that this analysis is the first step.

“We do plan a follow-up where we measure the models against human-provided labels, and also measure the source-based multi-step multi-model Lenz pipeline against those labels and against the frontier models,” Jordanov says. “The time-consuming part is the methodologically correct labeling by human experts in all of those domains, but we aim to publish in the coming months.”

The report concludes with a statement explaining that the point of this work isn’t to create a leaderboard.

The point is to map the structure of disagreement, i.e., where do frontier panels systematically diverge from a human consensus, where does Lenz diverge from both, how each individual model and Lenz align with the same human reference, and what categories of claims drive each kind of divergence (rubric ambiguity, temporal framing, domain specialization, calibration drift).

The post Why GPT-5.4, Claude, and Gemini can’t agree on basic, real-world facts appeared first on The New Stack.
