Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Vibe slop is the symptom. Context debt is the disease.

3D illustration of an exploded cube made of uneven interlocking blocks at varying heights, rendered in a gradient of purple to magenta.

Some of the engineers who made vibe coding possible have decided it’s a problem. Last month, The Wall Street Journal’s Christopher Mims interviewed Armin Ronacher and Mario Zechner, the engineers behind the Pi engine that powers OpenClaw. These engineers have been instrumental in popularizing these agentic tools, and their assessment was blunt: The tools are flooding the world with bad, sometimes dangerous code. “Eventually it will catch up to us,” Zechner told Mims. 

They’re referring to vibe slop: bug-ridden, inefficient, hard-to-maintain software produced by someone prompting it into existence. Ronacher and Zechner also raised an overlooked aspect of vibe coding. Sloppy code doesn’t just break more; it burns more compute, more memory, and more bandwidth, they said. They warned that some vibe-coded startups may not be able to pay their own compute bills.

I largely agree with Ronacher and Zechner. But someone put a sharper name to the problem, and he did it in a product launch post.

The real debt is context

On June 2, Postman, an AI-native platform, shipped what it calls the AI Engineer, and CEO Abhinav Asthana used the launch post to make the case.

Abhinav Asthana, CEO and founder of Postman

Bad code, he argues, is the part you can see. It’s visible, so it gets the blame.

The harder problem is everything wrapped around it, the services and APIs that stack up faster than anyone can keep track of. He calls it context debt, and I’m buying what he’s selling. 

The idea goes like this. With vibe-coded platforms, systems quickly evolve from MVPs into products filled with countless APIs, services, and databases, and those pieces interact in ways nobody fully designed or understands.

Every new service, every new platform, every agent-generated change stacks on past work, and the reasoning behind that past work is often lost as new work is added. Like technical debt, context debt compounds.

Unlike technical debt, you can’t refactor your way out of it because the debt isn’t in the code. It’s in what the code means and how it connects.

Most organizations still handle this the old-fashioned way, with a handful of senior engineers who keep the whole map in their heads. The org rests on the institutional knowledge of these engineers, and the process doesn’t scale. Now, when a fleet of coding agents starts building and shipping at machine speed, the rate of production outpaces the team’s ability to understand what has been built. 

That’s the trap. Fixing slop without fixing context moves the failure upstream. You start pushing code into a system no one understands anymore.

Asthana thinks the clock is short. He puts the window at six to nine months before context debt outruns most teams’ ability to manage it by hand, and he says the warning sign is already visible: early-stage startups that can’t keep their own architecture coherent. In his view, in this era of vibe coding, the smallest, fastest-moving companies are hitting the wall first.

Agentic engineering needs a memory

Simon Willison points to agentic engineering as a useful frame for this. The agents run in a loop, writing and executing code, while the real human skill lives in goal definition, tool prep, and verification. The human becomes the architect who defines, reviews, and integrates.

This framing defines how a person should work with an agent on a task. Context debt is the question of what happens across thousands of those tasks in one organization. An agent can verify that its code works. It has a much harder time verifying whether the API it just designed duplicates one already used by another team, or whether the contract it quietly changed breaks something nobody documented. 

Postman’s answer is what it calls a Context Graph: an always-on, continuously updated map of the APIs, services, and dependencies across a Postman organization that grounds the AI Engineer before it acts. The agents run in a sandbox, and write operations require a human in the loop; the output integrates with the existing PR review process. Asthana describes the risk model as that of a junior engineer whose work goes through code review. A regular coding agent can write a function. It can’t tell which of your 17 payment APIs you’re actually supposed to use.
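To make the idea concrete, here is a minimal sketch of one question such a map has to answer: if this API changes, who is affected? It is not Postman's implementation. The service names, the `DEPENDS_ON` structure, and the `downstream_of` helper are all hypothetical, and a real graph would be discovered and refreshed automatically rather than typed by hand.

```python
from collections import deque

# Hypothetical map: each service lists the APIs it calls directly.
DEPENDS_ON = {
    "checkout-web": ["payments-v2", "cart"],
    "billing-batch": ["payments-v2"],
    "cart": ["inventory"],
    "payments-v2": ["ledger"],
}

def downstream_of(api, depends_on):
    """Return every service that directly or transitively calls `api`."""
    # Invert the edges so we can walk from an API to its callers.
    callers = {}
    for service, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, set()).add(service)

    # Breadth-first walk over the inverted edges.
    affected, queue = set(), deque([api])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return sorted(affected)

print(downstream_of("ledger", DEPENDS_ON))
# ['billing-batch', 'checkout-web', 'payments-v2']
```

A change to the hypothetical `ledger` contract reaches three services, two of which never call it directly. That indirect reach is the part an agent working from a single repository cannot see.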

This matters more at scale. The largest Context Graphs Postman has mapped far exceed anything a senior engineer could hold in their head, Asthana told me. They cover more than 1,100 APIs at a major U.S. telecom, more than 2,600 at a global telecom, and over 11,000 at one large tech company.

Postman has been running the agents on its own engineering team and has shared some data with me. APIs touch 68% of the company’s pull request traffic, Asthana tells The New Stack, and Postman runs the AI Engineer across the API-related work. The most consequential catch so far, he said, was a set of downstream dependency changes that likely would have cleared review but failed in production.

Postman’s bet is that ten years of customers documenting their APIs within its tools give it a head start in building that map. That’s a real advantage, with a real catch. The map is only as good as the information behind it, and plenty of companies’ records are years out of date. Asthana cites a failure from Postman’s own use: An agent didn’t know about some live systems running in another data center, so it worked from an incomplete map and got part of its analysis wrong. The failure was the map. An out-of-date map can be worse than none, because the agent acts on it with confidence.

This is a category, not a feature

Whether Postman’s AI Engineer wins or not, the framing is starting to look like a trend. Cursor and Windsurf index your repository. Claude Code reads your CLAUDE.md and knows your codebase. GitHub is integrating Copilot deeper into the dependency graph. The major coding-agent companies seem to be heading to the same spot: Context is a bottleneck just like the model.

Expect more vendors in this space to announce a context layer, and most engineering teams will end up buying into one. And when they do, it’s important to consider whether their orgs’ context exists in a form that’s readable by a machine (and not in someone’s head).

Infrastructure is becoming more critical in this new agentic world. It’s the ground on which agents need to stand. Teams need to start taking context more seriously.

Context debt has always been around. Agents are making the problem more urgent. The Pi engineers are probably right that slop will catch up to us. The teams that get ahead of it won’t be the ones using the best model or coding agents, but the ones whose systems are available to humans and machines alike.

The post Vibe slop is the symptom. Context debt is the disease. appeared first on The New Stack.


Fresher Is Not Faster - Why Cloud Costs Refuse to Show Up in Real Time!

As with previous blog posts, all code can be found here! The Norns sat beside the Well of Urd weaving fate into the roots of Yggdrasil. The thread was already spun long before anyone knew how the story would end, and by the time the consequences arrived, they were simply discovering decisions that had already been made. I found myself thinking about that recently (HUGE Norse mythology fan) after yet another conversation about cloud billing, because the parallels are stronger than you might...


Open source maintainership in the age of AI


AI has really changed the game around software development. More people than ever are leveraging AI to contribute patches to projects they use. To me, this is a good thing, as more folks will contribute patches rather than fork a project or leave bugs unfixed. The main problem is that AI has made generating code fast, but there has been very little improvement in maintaining code bases. In this post, we will highlight the ways the Kubernetes community is adapting to the world of AI-assisted coding.

The first step of this journey was to develop an AI policy. This seems mundane and bureaucratic but there were many PRs that derailed into discussions around AI usage. The AI policy helps steer the conversation around the project's stance on AI and provides a clear signal to contributors on how to use these tools responsibly.

Kubernetes AI policy

The Kubernetes project has established clear guidelines for AI-assisted contributions that balance innovation with accountability. These policies are designed to maintain code quality and ensure human oversight while acknowledging that AI tools can be valuable aids in the development process.

Transparency first

Contributors must disclose when AI tools have been used to assist with a pull request. A simple statement in the PR description such as "This PR was written in part with the assistance of generative AI" is sufficient. This transparency helps reviewers understand the context and apply appropriate scrutiny.

Human accountability

While AI tools can assist, the human contributor remains fully responsible for every change. The policy explicitly prohibits:

  • Listing AI as a co-author on commits
  • Using AI co-signing on commits
  • Adding trailers like "assisted-by" or "co-developed" that attribute work to AI

This isn't about diminishing AI's role as a tool—it's about maintaining clear accountability. If something breaks, there needs to be a human who understands why and can fix it.
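As an illustration of how a project could check for this mechanically, the sketch below scans a commit message for the kinds of trailers listed above. It is not the Kubernetes project's tooling. The `FLAGGED_KEYS` and `AI_NAME_HINTS` values and the `find_ai_trailers` helper are assumptions made for the example, and matching on tool names is a rough heuristic at best.

```python
import re

# Assumed trailer keys, taken from the examples in the policy text above.
FLAGGED_KEYS = ("assisted-by", "co-developed")
# Hypothetical hints that a Co-authored-by line names an AI tool.
AI_NAME_HINTS = ("copilot", "claude", "chatgpt")

# Matches "Key: value" lines; this checks every line, not only the trailer block.
TRAILER_RE = re.compile(r"^([A-Za-z-]+):\s*(.+)$")

def find_ai_trailers(commit_message):
    """Return the lines that appear to attribute work to an AI tool."""
    hits = []
    for line in commit_message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if not match:
            continue
        key, value = match.group(1).lower(), match.group(2).lower()
        if key.startswith(FLAGGED_KEYS):
            hits.append(line.strip())
        elif key == "co-authored-by" and any(hint in value for hint in AI_NAME_HINTS):
            hits.append(line.strip())
    return hits

message = (
    "Fix flaky scheduler test\n\n"
    "Assisted-by: SomeTool\n"
    "Signed-off-by: Jane Dev <jane@example.com>\n"
)
print(find_ai_trailers(message))  # ['Assisted-by: SomeTool']
```

A check like this can only catch attribution in the commit metadata. The accountability the policy asks for still depends on a human who can explain the change.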

CLA enforcement for co-authors

The CNCF provides a tool for verifying contributor license agreements on each pull request. AI agents are not able to sign these agreements, so one enforcement step the project took was to enable the CLA check for co-authors. This gives reviewers a flag that the PR is not ready to merge.

Human engagement required

Perhaps the most critical aspect of the policy: reviewers expect to engage with humans, not with AI. Contributors cannot rely on AI to respond to review comments. If you cannot personally explain changes that AI helped generate, your PR will be closed. This requirement ensures that knowledge transfer happens and that contributors genuinely understand the code they're submitting.

Verification obligations

Contributors must verify AI-generated changes through code review, testing, and personal understanding. It's not enough for the code to work—you need to know why it works and be able to maintain it.

These policies reflect a mature approach to AI: embrace it as a tool, but never let it replace human judgment, understanding, or responsibility.

Automated AI reviews

There exist many tools to aid in reviewing code. AI pull request tools introduce governance challenges so one of the first tasks the community took on was to document the process for what is needed to bring in new AI tools. One of the major evaluation criteria for these tools is to find maintainers willing to test drive them in kubernetes-sigs repositories. Kueue, JobSet and Agent-Sandbox have been experimenting with these tools to provide more support for maintainers.

Copilot

One tool that many maintainers started using was GitHub Copilot. The CNCF provides access for maintainers, so this ended up being the first tool many tried. It gave maintainers useful experience in tuning reviews, but there were some growing pains. The biggest blocker for community adoption is the reliance on contributors having a Copilot license. Only maintainers were able to request Copilot reviews, and automated reviews of pull requests were out of reach for the community. One of the goals of AI review tools is to provide automated reviews that maintainers don't need to request. This demonstrated the need for organization-level control rather than relying on contributors having access.

CodeRabbit

In mid-2026, the Kubernetes community rolled out CodeRabbit to a few projects. As with Copilot, some tuning has been required to get better reviews, but the overall feedback has been positive. The tool offers a lot of configuration, and one of its most interesting uses comes from agent-sandbox.

AI pull request tools can be a quality gate. Contributors can at least get a quick spot check review without waiting for a maintainer. Agent-sandbox has added a label on PRs to reflect that there is still a need to resolve some of the comments from AI tools.

Next steps

The reality is that leveraging AI in open source projects is an area of active exploration. The community could use your help in tuning review tools, evaluating new tools, and assessing emerging technologies in the AI space.

Some areas we are exploring more:

  • The use of AI skills to reduce maintainer burnout.
  • AI assisted triage of failing tests.
  • Skills to aid the operational aspects of Kubernetes.

The performance dividend: Optimizing PostgreSQL on Azure directly in Visual Studio Code


Poor database performance is never just a database problem. In enterprise teams, it shows up as missed service-level agreements (SLAs), delayed releases, frustrated development teams, and rising operational risk. The performance problem compounds further in business impact, often resulting in frustrated customers, retention and conversion risk, and lost revenue.

I have seen this repeatedly while working with enterprises building and running large‑scale data platforms, both as a customer and partner, and now with Microsoft. When teams are forced to jump between SQL editors, monitoring dashboards, cloud portals, and documentation just to diagnose a slow query, the real cost is not just technical. It’s also time, trust, and momentum lost across the business.

A more integrated way to run PostgreSQL on Azure

This is why I am optimistic about where PostgreSQL on Azure stands today. Microsoft’s investment in open source and PostgreSQL has matured significantly over the last several years. Azure Database for PostgreSQL has evolved into a fully managed, fully open-source enterprise-ready platform, Azure HorizonDB has entered the conversation as the next-gen Postgres on Azure delivering 3x faster performance than self-managed Postgres, and Microsoft is extending that value directly into the tools developers and database administrators (DBAs) already use. The PostgreSQL extension for Visual Studio Code is a clear example of that progress, especially with its new performance‑enhancing capabilities.

Most enterprise teams do not lack tooling. They lack integration. Performance work often breaks down because insights live in one place, actions live in another, and context is lost in between. Microsoft’s direction with the PostgreSQL extension for VS Code focuses on closing those gaps by bringing development, diagnostics, and tuning into a single workflow.

The extension is designed to help teams manage PostgreSQL across the full lifecycle, from authoring queries and exploring schemas to monitoring server health and optimizing performance. For organizations standardizing PostgreSQL on Azure, this creates a more coherent operating model that reduces friction between developers, DBAs, and platform teams.

Seeing performance clearly with the Server Metrics Dashboard

One of the most impactful additions is the server metrics dashboard. For DBAs and platform engineers, this dashboard brings key performance signals such as CPU, memory, storage, and connections directly into VS Code. Instead of switching contexts to investigate an issue, teams can view metrics where they already work.

Because the dashboard is integrated with Azure, it provides Azure‑specific telemetry and historical insights that help teams understand trends, not just snapshots. When performance issues arise, the time from detection to investigation is significantly reduced.

From insight to action with Azure Advisor in VS Code

Observability only matters if it leads to action. The PostgreSQL extension surfaces Azure Advisor recommendations directly in the editor, connecting performance insights with concrete guidance. These recommendations can include suggestions around configuration, indexing, and resource optimization based on Azure telemetry.

For enterprise teams, this shortens the feedback loop. Instead of manually correlating metrics with best practices, teams receive contextual recommendations aligned to their actual workloads. This improves operational confidence and helps standardize tuning practices across environments.

Faster diagnosis with Query Plan visualization and AI assistance

Performance tuning often comes down to understanding query behavior. Recent improvements to the extension enhance query plan visualization, making execution plans easier to interpret during troubleshooting and optimization.
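For readers who have not worked with execution plans directly, the sketch below shows the kind of structure a plan viewer renders. It walks the JSON that PostgreSQL produces for `EXPLAIN (FORMAT JSON)` and lists the sequential scans. The sample plan values are made up, and the `find_seq_scans` helper is illustrative only; it says nothing about how the extension works internally.

```python
import json

# Illustrative, trimmed-down EXPLAIN (FORMAT JSON) output; the numbers are made up.
SAMPLE_PLAN = json.loads("""
[{"Plan": {
    "Node Type": "Hash Join",
    "Total Cost": 512.4,
    "Plans": [
      {"Node Type": "Seq Scan", "Relation Name": "orders", "Total Cost": 431.0},
      {"Node Type": "Hash", "Total Cost": 22.5, "Plans": [
        {"Node Type": "Index Scan", "Relation Name": "customers", "Total Cost": 18.3}
      ]}
    ]
}}]
""")

def find_seq_scans(node):
    """Recursively collect (relation, total cost) for every Seq Scan node."""
    found = []
    if node.get("Node Type") == "Seq Scan":
        found.append((node.get("Relation Name"), node.get("Total Cost")))
    # Child nodes, when present, live under the "Plans" key.
    for child in node.get("Plans", []):
        found.extend(find_seq_scans(child))
    return found

print(find_seq_scans(SAMPLE_PLAN[0]["Plan"]))  # [('orders', 431.0)]
```

A sequential scan is not automatically a problem, but on a large table it is often the first node worth a second look.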

Beyond visualization, Microsoft is embedding AI‑assisted query analysis and optimization directly into the workflow. Developers and DBAs can analyze query plans, understand potential bottlenecks, and explore optimization options without leaving VS Code. This does not replace deep PostgreSQL expertise, but it helps teams move faster and make better decisions earlier in the development cycle.

These capabilities are especially valuable for enterprise environments where not every developer is a PostgreSQL specialist, yet performance expectations remain high.

Better authoring experiences reduce performance issues upstream

Performance work does not start in production. It starts when schemas are designed and queries are written. The PostgreSQL extension improves this experience with schema‑aware IntelliSense, search_path‑aware query authoring, and reliable object explorer behavior for large and complex databases.

Developers can write, run, and refine SQL with better context, while DBAs benefit from more consistent and predictable interactions with large schema estates. Improvements to object explorer reliability also matter at enterprise scale, where long‑running sessions and frequent refreshes are common.

Combined with Microsoft Entra ID authentication and integrated Azure resource discovery, the extension provides a secure and governed way to work with PostgreSQL across development and production environments.

From tuning to performance payout

Taken together, these capabilities change the day‑to‑day experience of running PostgreSQL on Azure. Azure Database for PostgreSQL already delivers the managed fundamentals enterprises expect, including high availability, security, and best‑practice guidance. The PostgreSQL extension for VS Code extends that value into execution by making performance management part of the same workflow as development.

This integration is a practical differentiator. It reflects an understanding of how enterprise teams actually work and where time is lost today. Instead of adding more tools, Azure is tightening the loop between insight and action.

A look ahead: AI‑native PostgreSQL with Azure HorizonDB

As enterprises look toward AI‑native architectures, Microsoft is also introducing Azure HorizonDB in public preview. Azure HorizonDB is designed for cloud‑native, AI‑ready PostgreSQL‑compatible workloads that require advanced scalability and integrated AI capabilities.

For most production workloads today, Azure Database for PostgreSQL remains the recommended choice. Azure HorizonDB represents an adjacent, forward‑looking option for teams exploring what comes next for their AI‑powered applications.

Turning performance into a competitive advantage

The real advantage of these new capabilities is the way they come together to reduce friction, improve clarity, and help teams act faster. For enterprises managing PostgreSQL at scale, that translates directly into better reliability, faster delivery, and lower operational risk.

If you are running PostgreSQL on Azure today, now is a good time to see what this looks like in practice. Try the PostgreSQL extension for VS Code and connect it to your Postgres databases on Azure to diagnose issues faster, optimize performance with greater confidence, and keep critical workloads running the way your business and your customers expect.


Abstract 3D cubes and spheres floating on a blue grid background with a curved turquoise line.

The post The performance dividend: Optimizing PostgreSQL on Azure directly in Visual Studio Code appeared first on Microsoft Azure Blog.


ESLint v10.6.0 released


Highlights

New option checkRelationalComparisons in no-constant-binary-expression

ESLint v10.6.0 introduces a new option checkRelationalComparisons for the no-constant-binary-expression rule. When enabled, the rule reports relational comparisons using <, <=, >, or >= whose result is always constant based on their literal operands.

For example:

const value = "a" > "b"; // always `false`

while (0 <= 0) { // always `true`
    /* ... */
}

Rule refinements

The following rules have been tweaked to improve correctness and ensure more consistent or intuitive behavior in edge cases:

Features

Bug Fixes

Documentation

  • a83683d docs: Update README (GitHub Actions Bot)
  • f5449f9 docs: document userland patterns for global assertionOptions in RuleT… (#20986) (playgirl)
  • bea49f7 docs: Update README (GitHub Actions Bot)
  • e5f70f9 docs: update code-path diagrams (#20984) (Tanuj Kanti)
  • 8890c2d docs: add TypeScript config guidance for MCP server (#20796) (Pierluigi Lenoci)
  • 3eb3d9b docs: Update README (GitHub Actions Bot)
  • c5bb59c docs: Update README (GitHub Actions Bot)
  • eb3c97c docs: fix grammar in prefer-const rule description (#20983) (lumir)

Chores

  • 6a42034 ci: run ecosystem tests on main branch (#20891) (sethamus)
  • 3dbacdb ci: bump actions/checkout from 6 to 7 (#21014) (dependabot[bot])
  • c3abfca chore: correct JSDoc param types in html formatter (#21018) (Minseon Kim)
  • a832320 ci: split ecosystem tests into separate jobs (#21001) (xbinaryx)
  • 27166e7 chore: update ecosystem plugins (#21005) (ESLint Bot)
  • 865d76e ci: bump pnpm/action-setup from 6.0.8 to 6.0.9 (#20989) (dependabot[bot])
  • 27a88c9 chore: update dependency markdown-it to v14 in root (#20994) (Milos Djermanovic)
  • 970cea6 chore: update dependency markdown-it to v14 (#20993) (Milos Djermanovic)
  • b482120 chore: update dependency prettier to v3.8.4 (#20990) (renovate[bot])
  • 6993fb3 chore: update ecosystem plugins (#20985) (ESLint Bot)

How to Build a Personal AI Web Research Agent with Ollama and Qwen


In this tutorial, I’ll show you how to build an AI web research agent using Ollama, Qwen, and Python. The agent searches the web for a topic, fetches relevant pages, and uses a local LLM to generate a concise digest.

Table of Contents

Background

Most of us have used ChatGPT or Claude to send queries to a large language model. You've probably also seen hallucinations in the response when the model didn't know something, sometimes because its knowledge was out of date.

With the rise of tool calling, LLMs can now use tools to search the web for the latest information. They can then bring that information into context and use it to generate an output, summarize results, and extract key points from retrieved sources.

In this tutorial, I'll show you how I built a personal research agent that searches the internet for any topic and uses a local LLM to summarize what it finds. The model runs entirely on my own machine, which keeps the summarization step private, and the free tier of Ollama covers the web search, so there are no API costs.

To follow this tutorial, you'll need Ollama installed on your machine and a free Ollama account. The tutorial works on macOS, Windows, and Linux. I'm using a MacBook Pro with 32 GB of RAM, but you can run this on a lower-memory machine by choosing a smaller Qwen model from Ollama.

Motivation and Architecture

The motivation behind this project is to have agents running on my machine that can handle a variety of tasks every day. I can spin off agents to create a daily digest of AI news, surface the latest world events, or look for new job postings.

Running a local LLM also means the page contents and prompts sent to the model never leave my machine; only the search query goes to Ollama's web search API. My research history stays largely private, and there are no per-query API costs to worry about.

For this project, we'll use Ollama web search for retrieval and a local Qwen model for summarization (rather than relying on hosted chat tools like ChatGPT or Claude). The system diagram below shows how the agent works.

When run in the terminal, the agent asks the user what they want to research. It then calls the Ollama web search API to fetch the top 5 results for the query, downloads each of those pages, and extracts the readable text.

The extracted content from all five pages is sent to the local Qwen model in a single message, along with the user's prompt and an instruction: "Use these web results and page contents to answer in Markdown format." The model's response is then saved as a Markdown file on disk.

Diagram of the process: user prompt, Ollama web search API, top 5 result URLs, requests + BeautifulSoup, clean page text,  local Qwen model via Ollama, markdown digest saved to disk.

Step 1: Install Ollama and Get an API Key

To get started, install the Ollama application and create an account to get an API key. The free tier of Ollama will suffice for this tutorial.

Once you have the key, place it in an environment variable:

export OLLAMA_API_KEY="paste-key-here"
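The script later in this tutorial reads the key with `os.getenv`, which returns `None` without complaint if the variable is missing. A small guard like the sketch below fails early with a readable message instead. The `require_api_key` helper is hypothetical and not part of the Ollama library.

```python
import os

def require_api_key(env=os.environ):
    """Return the Ollama API key, or raise a clear error if it is missing."""
    key = env.get("OLLAMA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OLLAMA_API_KEY is not set; export it before running the agent."
        )
    return key

# Passing a plain dict makes the helper easy to try without touching your shell.
print(require_api_key({"OLLAMA_API_KEY": "example-key"}))  # example-key
```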

Step 2: Pull the Qwen Model

We'll use Qwen for this tutorial, an open-weight model that's currently one of the best smaller models available.

I'm using the 4-billion-parameter variant because it follows structured prompts well and runs on a laptop without a dedicated GPU. Other sizes, such as 2b and 9b, are also available.

To use qwen3.5:4b locally, pull it with Ollama. The 4b model is around 3.4 GB on my machine. If your machine has less RAM, you can use qwen3.5:0.8b instead.

ollama pull qwen3.5:4b

Step 3: Install Python Dependencies

python3 -m venv venv
source venv/bin/activate
pip install ollama requests beautifulsoup4

Step 4: Write the Agent Code

The Python code below does five things: it takes a research prompt from the terminal, calls Ollama's web search API for the top 5 results, downloads the pages using Requests and cleans each page's text using BeautifulSoup, sends everything to a local Qwen model with an instruction to summarize in Markdown, and saves the result to a timestamped .md file.

Save the code in a file named research_agent.py.

The summarization prompt is intentionally basic. Feel free to tweak it to match the kind of output you want.

import os
import json
import requests
import ollama
from bs4 import BeautifulSoup
from datetime import datetime
from pathlib import Path

API_KEY = os.getenv("OLLAMA_API_KEY")
SEARCH_URL = "https://ollama.com/api/web_search"
MODEL = "qwen3.5:4b"

# Search web using Ollama web search 
def search_web(query):
    response = requests.post(
        SEARCH_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"query": query, "max_results": 5},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("results", [])

# Fetch full web page content
def fetch_text(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        return ""
    soup = BeautifulSoup(response.text, "html.parser")
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)


def main():
    user_prompt = input("Enter your prompt: ").strip()
    if not user_prompt:
        print("Prompt cannot be empty.")
        return

    results = search_web(user_prompt)

    # For each url in web search result, fetch full content
    pages = []
    for item in results:
        url = item.get("url")
        if not url:
            continue

        print(f"Fetching: {url}")
        page_text = fetch_text(url)

        pages.append({
            "title": item.get("title", ""),
            "url": url,
            "snippet": item.get("content", ""),
            "page_text": page_text,
        })

    # Prompt to send to Qwen model with web data
    prompt = f"""
    User request:
    {user_prompt}

    Use these web results and page contents to answer in markdown format.

    Data:
    {json.dumps(pages, ensure_ascii=False)}
    """

    # Invoke local Qwen model 
    response = ollama.chat(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )

    digest = response.message.content

    # Build a unique filename using today's date and time
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    filename = f"digest-{timestamp}.md"

    # Save the digest to disk
    # Write as UTF-8 so non-ASCII characters in the digest save correctly on any OS
    with open(filename, "w", encoding="utf-8") as f:
        f.write(digest)

    print("Saved to digest")

if __name__ == "__main__":
    main()
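One tweak worth considering for the script above: it sends the full text of every page to the model, and five long pages can exceed what a small local model handles well. The sketch below caps each page before the prompt is built. The `trim_pages` helper and the `MAX_CHARS_PER_PAGE` value are assumptions for illustration, not figures from Ollama or Qwen.

```python
MAX_CHARS_PER_PAGE = 6000  # arbitrary assumption; tune for your model and RAM

def trim_pages(pages, max_chars=MAX_CHARS_PER_PAGE):
    """Return copies of the page dicts with page_text capped at max_chars."""
    trimmed = []
    for page in pages:
        text = page.get("page_text", "")
        # Copy the dict so the original pages list is left untouched.
        trimmed.append({**page, "page_text": text[:max_chars]})
    return trimmed

sample = [{"url": "https://example.com", "page_text": "x" * 10}]
print(len(trim_pages(sample, max_chars=4)[0]["page_text"]))  # 4
```

To use it, call `pages = trim_pages(pages)` in `main()` just before the prompt string is assembled. Cutting at a fixed character count is crude, since it can split a sentence, but it keeps the prompt size predictable.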

Step 5: Run the Agent

python research_agent.py

The script will prompt you to enter the topic you'd like to research.

Sample Output

The summarized digest is saved as a timestamped Markdown file. The agent also prints the source URLs as it fetches them.

Before trusting the summary, skim it and spot-check a claim or two against the original source. Local models are smaller than hosted frontier models and tend to hallucinate more. So spot-checking can help with accuracy.
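Part of that spot-check can be mechanical. The sketch below pulls the numbers out of a digest and reports any that never appear in the fetched page text. `unsupported_numbers` is a hypothetical helper, and a number that does appear in a source can still be attached to the wrong claim, so this narrows the manual check rather than replacing it.

```python
import re

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")

def unsupported_numbers(digest, page_texts):
    """Return numbers in the digest that appear in none of the source texts."""
    source_numbers = set()
    for text in page_texts:
        source_numbers.update(NUMBER_RE.findall(text))
    digest_numbers = NUMBER_RE.findall(digest)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return [n for n in dict.fromkeys(digest_numbers) if n not in source_numbers]

digest = "The model scored 51 and costs 10 dollars per million tokens."
sources = ["It scored 51 on the index.", "Pricing was not announced."]
print(unsupported_numbers(digest, sources))  # ['10']
```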

As a test run, I asked the research agent: "What's new in LLMs" and it fetched 5 web pages as seen below:

Enter your prompt: What's new in LLMs
Fetching: https://openai.com/nl-NL/index/chatgpt-memory-dreaming/
Fetching: https://pub.towardsai.net/tai-210-glm-5-2-closes-most-of-the-open-weight-gap-in-ten-weeks-2f970c5f1326
Fetching: https://www.globenewswire.com/news-release/2026/06/23/3315999/0/en/Multiverse-Computing-Launches-Pulsar-16B-in-collaboration-with-NVIDIA-Frontier-Grade-Reasoning-at-Half-the-Parameters.html
Fetching: https://thenextweb.com/news/anthropic-claude-tag-slack-always-on-ai-teammate
Fetching: https://www.aidoers.io/blog/claude-mythos-5-and-fable-5-explained-what-anthropic-actually-shipped

Saved to digest

The digest came out reasonably well-structured for a 4B local model. It's organized into sections with all the relevant data from the sources. I spot-checked the summary and it was accurate.

Here's what it produced:

# What's New in LLMs (June 2026)

The landscape of Large Language Models (LLMs) has evolved rapidly in June 2026, with significant updates in memory synthesis, new frontier models, enterprise integrations, and market dynamics.

## 1. Memory & Personalization: OpenAI’s "Dreaming" Update
OpenAI has deployed a new memory architecture for ChatGPT, referred to as **Dreaming V3**.
*   **Purpose:** Improves memory synthesis to optimize freshness, continuity, and relevance.
*   **Evolution:**
    *   **2024:** "Saved memories" (manual instruction-based).
    *   **2025:** "Dreaming V0" (background process curating memories from chat history).
    *   **2026:** **Dreaming V3** (significantly more capable and compute-efficient architecture).
*   **Impact:** Memory is now reviewable via a summary page, allowing users to update information and set instructions on topics to bring up.
*   **Availability:** Rolled out to ChatGPT Plus and Pro users in the US today, expanding to additional countries and Free/Go users over coming weeks.
*   **Capability:** The model now remembers specific user setups (e.g., photography gear preferences) and constraints (e.g., vegetarian diet, hotel AC preferences) without requiring explicit "remember" cues.

## 2. New Frontier Models & Benchmarks

### Claude Fable 5 & Mythos 5 (Anthropic)
*   **Classification:** Mythos-class tier, sitting above Opus in raw capability.
*   **Differentiation:** **Fable 5** is available to the public. **Mythos 5** is the identical model with cybersecurity safeguards removed, restricted to **Project Glasswing** partners only.
*   **Pricing:** $10 per million input tokens / $50 per million output tokens.
*   **Availability:** Included at no extra cost on Pro, Max, Team, and enterprise plans until June 22.
*   **Capabilities:** Significant jumps in **Knowledge work**, **Agentic coding**, **Vision**, **Legal reasoning**, and **Biology**.

### Z.ai GLM-5.2 (Open Weights)
*   **Release:** Z.ai (Z.AI) released GLM-5.2 under an MIT license on June 16, 2026.
*   **Performance:** Closed the open-weight gap in ten weeks. Scored **51** on the Artificial Analysis Intelligence Index.
    *   **Context:** Expanded from 200K to **1 million tokens**.
    *   **Architecture:** Utilizes "IndexShare" for long-context efficiency and "Compaction-aware reinforcement learning" for agents.
*   **Benchmarks:** Ranked third on the AA-Briefcase (91 held-out tasks), behind Fable and Opus 4.8 but ahead of GPT-5.5.
*   **Cost:** ~$0.52 per task (compared to $0.86 for GPT-5.5 and $1.80 for Opus 4.8).

### Multiverse Pulsar 16B (NVIDIA Collaboration)
*   **Parameters:** 16.15B total parameters (3.1B active).
*   **Performance:** Delivers 30B-class intelligence at half the parameter count.
*   **Validation:** Matches 30B-class architectures (e.g., Nemotron-3-Nano-30B-A3B) on reasoning, coding, and math.
*   **Deployment:** Available on Hugging Face under Apache 2.0 license. Optimized for lower-memory GPUs and single-node environments.

## 3. Enterprise Integration & Tools

*   **Claude Tag (Anthropic):**
    *   An "always-on AI teammate" available to **Claude Enterprise and Team** customers.
    *   **Features:** Lives inside Slack, follows conversations, learns context, and uses an **ambient mode** to proactively flag updates and tasks.
    *   **Scoping:** Identity-based permissions allow admins to restrict which channels/teams the AI can access.
*   **MCP Connectors (Anthropic):**
    *   Launched **Enterprise-Managed Authorization (EMA)**.
    *   Allows IT admins to provision connector access via identity providers (Okta) without individual OAuth flows.
*   **Perplexity Brain (Computer Agent):**
    *   Research preview for Max/Enterprise Max subscribers.
    *   Self-improving memory system that remembers what the agent *did* rather than user preferences.
    *   Results show 25% increase in answer correctness on repeated tasks.

## 4. Industry Trends & Personnel Moves

*   **Market Dynamics:** ChatGPT market share dropped below 50% (46.4% by May 2026). Claude leads in subscription conversion (13%).
*   **Talent Shifts:**
    *   **Noam Shazeer:** Co-inventor of Transformer (Google) joins OpenAI as Lead for Architecture Research.
    *   **John Jumper:** Nobel Laureate (DeepMind) joins Anthropic for AI-for-science infrastructure.
*   **Corporate M&A:**
    *   **SpaceX** acquires **Cursor** (Anysphere) for **$60 Billion** in a Q3 2026 deal to strengthen its AI coding division.
    *   **Alibaba** released the **Qwen-Robot Suite** (Qwen-RobotNav, Manip, World) for embodied intelligence and robotic control.

Conclusion

In this tutorial, you learned how to build a personal AI web research agent that searches the web, summarizes results with a local LLM, and saves a Markdown digest. The model runs entirely on your own machine, so nothing is sent to a hosted LLM; only the search queries and page requests go out to the web. You have full control over the model and prompts without any API costs.

From here, you can try new prompts to research different topics, tweak the system prompt to change the output, swap in other local models like Qwen 3.6 or Mistral, or extend the script to fit your own workflow. Happy tinkering!

If you enjoyed this tutorial, you can find more of my writing on my blog (recent posts include a system design paper series), my work on my personal website, and updates on LinkedIn.


