Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Junie CLI, the LLM-agnostic coding agent, is now in Beta


This March, we’re taking a major step forward in the development of Junie, the coding agent by JetBrains.

Meet Junie CLI, the evolution of Junie into a fully standalone AI agent. With the upcoming release of Junie CLI, you will be able to use Junie directly from the terminal, inside any IDE, in CI/CD, and on GitHub or GitLab. Why do we call it LLM-agnostic? Junie supports all the top-performing models from OpenAI, Anthropic, Google, and xAI (Grok), and will integrate the latest models as they are released.

Junie CLI supports all popular developer workflows, and we want it to be barrier-free from the very beginning:

  • One-click migration from other agents, such as Claude Code and Codex.
  • Flexible customization through guidelines, custom agents and agent skills, commands, MCP, and other agent configuration methods.
  • BYOK (Bring Your Own Key) pricing, allowing you to use your own model keys and run the agent without additional platform charges.

Note: To help you get started, we’re offering free access to Gemini 3 Flash for one week. It’s enabled by default, so you can install Junie CLI and begin using it right away at no cost. After the week, standard pricing applies. 

Bring JetBrains quality to any environment

Junie is powered by JetBrains intelligence, combining LLM capabilities with deep project context, structured understanding, and workflow awareness.

Junie demonstrates high performance across top-performing models, delivering strong benchmark results — even with fast, low-cost models like Gemini 3 Flash — while maintaining responsiveness and accuracy.

It’s designed to be:

  • LLM-agnostic and open to all high-performing models
  • Capable of solving even complex problems
  • Context-aware by default
  • Reliable and secure, supported by all required safeguards

Real-time Prompting

Work doesn’t stop while Junie runs. You can adjust instructions and add details in real time — refining outputs without restarting the process.

Codebase Intelligence

Junie isn’t just “AI in a terminal.” It’s a fully standalone agent with capabilities designed to move beyond simple prompting.

Easy MCP Configuration

Install popular MCP servers in a few clicks, with no manual configuration required. Junie can also detect when an MCP server could help with your task and recommend the most relevant option.

Next-task Prediction

By understanding the project context, Junie anticipates what you might need next. It doesn’t just react — it proactively supports your workflow, and can even remind you of things you might otherwise forget or miss.

Making the pricing model more affordable and open

We are designing the pricing in a completely new way. As usual, JetBrains AI licenses can be used to access Junie CLI. However, we believe that our first users deserve an even more transparent model.

We support BYOK (Bring Your Own Key), giving developers and teams the flexibility to choose their preferred model or easily test new ones. Whether you rely on specific providers for compliance, performance, cost management, or internal policies, Junie integrates seamlessly with your existing setup. This ensures teams can adopt Junie without compromising governance, security, or code quality.

Millions of tasks – and one agent to rule them all

We don’t work in a single environment anymore.

As a developer, you have to switch between different platforms all the time:  

  • IDEs
  • Terminals
  • Pull requests
  • CI/CD pipelines
  • Cloud platforms

Now Junie meets you where you are. By making Junie available outside JetBrains IDEs, we’re expanding from IDE-native AI to ecosystem-level AI – using one agent to connect platforms. This is a significant milestone for us and an important step toward enabling professional-level development, even outside the IDE.

Read the whole story
alvinashcraft
just a second ago
reply
Pennsylvania, USA
Share this story
Delete

Air Launches as Public Preview – A New Wave of Dev Tooling Built on 26 Years of Experience


Download Air – free for macOS. Windows and Linux versions coming soon.

We hold a principled optimism for agentic software development – and a pragmatic one. After 26 years of building developer tools, we have a clear view of what needs to be built and a strong conviction that agents will fundamentally change how software gets made. But new concepts are emerging faster than anyone can validate them, so we’d rather ship what works than hype what might.

The current state of working with coding agents is fragmented: Each agent runs in a separate tool, with a different setup, different context, and no structural understanding of your code. Air is an important piece in solving that puzzle, and today marks the launch of its Public Preview. It’s available to developers with a JetBrains AI subscription or those with existing subscriptions to agent providers (except Anthropic) and API keys.

A real agentic development environment, not a chat window

JetBrains Air is an agentic development environment for delegating coding tasks to multiple AI agents and running them concurrently. Like an IDE such as IntelliJ IDEA, Air is built on the idea of integrating essential tools into a single coherent experience. But there’s a key difference: IDEs add tools to the code editor, while Air builds tools around the agent. The new development experience is optimized for you to guide the agent and fine-tune its output.

Air helps you navigate your codebase. You can mention a specific line, commit, class, method, or other symbol when defining a task. As a result, the agent gets precise context instead of a blob of pasted text. And when the task is done, your review doesn’t stop at the code diff – Air lets you see the changes in the context of your entire codebase, and you’ll have essential tools like a terminal, Git client, and built-in preview right in front of you. 

Let’s be honest: Complex codebases aren’t yet ready for pure agentic coding. This is where our 26 years of experience building IDEs come into play. Air focuses on agent orchestration without replacing existing development workflows. Air handles the agent-powered development; your IDE handles the rest.

Switch agents freely, run tasks concurrently

Air supports Codex, Claude Agent, Gemini CLI, and Junie out of the box. AI vendors are leapfrogging each other – Air makes switching agents across projects a natural part of the workflow, not a migration. Air supports the Agent Client Protocol (ACP) and will soon add support for other agents available via ACP through the ACP Agent Registry.

Run agents locally by default, or isolate them in Docker containers and Git worktrees for sandboxing and concurrent work.

Air helps you avoid the mess of having multiple windows and terminal tabs open for each task. You see one task (meaning one agent session) at a time. You’ll get a notification when another task needs your attention, so you can quickly switch to it while the agent keeps working. Air then helps bring your changes from a container or worktree to your main copy.

Getting started

If you have a subscription to JetBrains AI Pro (which is included in the All Products Pack and dotUltimate) or AI Ultimate, all agents are included – just sign in with your JetBrains Account. Prefer to use your own API keys from Anthropic, OpenAI, or Google? You can bring them along! You can also use personal-use subscriptions from Google and OpenAI. If you take the BYOK (Bring Your Own Key) approach, your own keys will always be used first, and any usage not covered by those keys will default back to your JetBrains subscription. A dedicated offering for enterprises is coming soon.

Cloud execution (i.e. remote agent runs in isolated sandboxes) is in tech preview and will be available soon for Air users. 

Next step: Team collaboration

This release focuses on individual developer productivity. At the same time, we see this as a step toward a future where humans and agents collaborate more closely.

One insight we’ve gained from working with agents is that collaboration doesn’t start when reviewing agent output. It starts earlier, when defining the task itself. Teams benefit from refining and aligning on the task together before any agents get involved at all. We’ll share more on this soon.

Download Air, sign in, and start your first task. We listen to your feedback and are constantly using it to improve – join us on X or get in touch directly.





Download video: https://blog.jetbrains.com/wp-content/uploads/2026/03/main-screen-h265-10bit-crf18.mp4

Introducing the First Frontier Suite built on Intelligence + Trust


Today Microsoft is announcing:

  • Wave 3 of Microsoft 365 Copilot
  • Expanded model diversity with Claude and next-gen OpenAI models available today
  • General availability of Agent 365 on May 1 for $15 per user per month
  • General availability of the new Microsoft 365 E7: The Frontier Suite on May 1 for $99 per user per month

Frontier Transformation is a holistic reimagining of business, aligning AI with human ambition to achieve an organization’s highest aspirations. It is the next evolution of AI Transformation — not only do we need to deliver efficiency and productivity, but we need to democratize intelligence and do more for humanity. Companies do not want or need more AI experimentation. They need AI that delivers real business outcomes and growth.

In my daily conversations with customers and partners, they often ask what the most important components of an AI solution are. Is it the model? Is it silicon? At Microsoft, we believe the two most essential elements of Frontier Transformation are Intelligence + Trust. Organizations need to harness their own unique work intelligence as they build agents and solutions, and all AI artifacts across their technology stack must be observed, managed and secured to ensure they are delivering value responsibly.

Intelligence that shows up in real work 

I often say that zero-shot artifact creation is nothing more than a parlor trick. Models can reason over data, produce draft documents, presentations and spreadsheets, but they do not understand work. Real differentiation comes from intelligence — deep work context, embedded in the tools people already use. AI should amplify your intelligence but do so in a manner that protects your differentiation and unique value.

Work IQ amplifies an individual’s IQ by tapping into your organization’s IQ. It is the intelligence layer that enables Microsoft 365 Copilot and agents to know how you work, with whom you work, and the content upon which you collaborate. That is why Copilot is faster, more accurate and more trusted than solutions built on models and connectors alone.

This month, we are unleashing Work IQ with our next generation of agentic experiences in Wave 3 of Microsoft 365 Copilot in Word, Excel, PowerPoint and Outlook. Employees will have an enhanced chat experience in Copilot with the ability to create and augment artifacts, and the power to build their own agents within the canvas they work in every day.

Microsoft 365 Copilot is model diverse by design. Rather than betting on a single model, we built a system that makes every model useful at work. Customers get choice, performance and flexibility in an open, heterogeneous environment. Copilot leverages leading models from OpenAI and Anthropic, operating openly across clouds and data services without locking customers in. Claude is now available in mainline chat in Copilot via the Frontier program, alongside the latest generation of OpenAI models.

Microsoft 365 Copilot Wave 3 is not just a singular release of new capabilities but rather a commitment to continuous innovation. We will bring frontier capabilities with enterprise promises for our customers in an open and model diverse manner. Another great example of this is Copilot Cowork, which is in research preview. Built in close collaboration with Anthropic, we are bringing the technology that powers Claude Cowork into Microsoft 365 Copilot to enable long-running, multi-step work that unfolds over time.  Learn about our Wave 3 news.

These announcements come as our customers across industries are already seeing the value of Microsoft 365 Copilot. Microsoft recently delivered its strongest quarter yet with Copilot, with paid seats growing more than 160% year over year and daily active usage up ten times, as customers increasingly make Copilot a core part of everyday work. Expansion is also accelerating as the number of customers deploying Copilot at significant scale — more than 35,000 seats — tripled year over year. Just last week, Mercedes-Benz announced a global rollout of Microsoft 365 Copilot, following recent investments from NASA, Fiserv, ING, the University of Kentucky, the University of Manchester, the U.S. Department of the Interior and Westpac. This is in addition to the 90% of the Fortune 500 who now use Copilot.

Trust: from agent experimentation and sprawl to enterprise control 

The speed of agent development and proliferation tells us customers see value, but without guardrails the pace of adoption turns into blind spots, diminished ROI and real security risk. As AI agents become more capable and autonomous, trust is nonnegotiable. IDC predicts 1.3B agents in circulation by 2028, and 80% of the Fortune 500 are already using Microsoft agents, led by operationally complex industries like manufacturing, financial services and retail.

That is why I am excited to announce the May 1 general availability of Microsoft Agent 365, the control plane for AI agents. Priced at $15 per user per month, Agent 365 gives IT and security leaders a single place to observe, govern, manage and secure agents across the organization — using the same infrastructure, applications and protections they rely on to manage people today.

We are seeing tremendous momentum with our preview customers. In just two months, tens of millions of agents have appeared in the Agent 365 Registry. We have tens of thousands of customers that are already adopting Agent 365 to securely govern and scale AI agents across enterprise workflows.

At Microsoft, we are also using Agent 365 as Customer Zero and the early signals are clear. We now have visibility into more than 500,000 agents across the company with the most widely used focused on research, coding, sales intelligence, customer triage and HR self-service. That adoption is translating into real work. Over the past 28 days alone, agents have been generating more than 65,000 responses every day for employees. This is evidence that we are not simply experimenting, we are embedding agents in the flow of everyday work and empowering human ambition.

Introducing the Frontier Suite

To meet this demand, I am thrilled to announce we are bringing Intelligence + Trust together with Microsoft 365 E7: The Frontier Suite. Microsoft 365 E7 unifies Microsoft 365 E5, Microsoft 365 Copilot and Agent 365 into a single solution powered by Work IQ and integrated with the apps and security stack customers already rely on. It includes Microsoft Entra Suite and advanced Defender, Intune and Purview security capabilities, delivering comprehensive protection across agents and employees.

Customers have told us E5 alone is no longer enough; they do not want multiple tools stitched together, they want one trusted solution. At $99 per user, E7 is priced below purchasing these capabilities à la carte, giving customers a simpler, more cost-effective way to deploy enterprise AI at scale.

With the general availability of Agent 365 and the latest agentic experiences in Microsoft 365 Copilot offered as one Frontier suite, AI moves from experimentation to durable, enterprise-wide value, built on a foundation of Intelligence + Trust. This is how we make Frontier Transformation real. Microsoft is not just imagining the future of AI, we are empowering organizations across industries and around the world to build it.

The post Introducing the First Frontier Suite built on Intelligence + Trust appeared first on The Official Microsoft Blog.


Secure agentic AI for your Frontier Transformation


Today we shared the next step to make Frontier Transformation real for customers across every industry with Wave 3 of Microsoft 365 Copilot, Microsoft Agent 365, and Microsoft 365 E7: The Frontier Suite.

As our customers rapidly embrace agentic AI, chief information officers (CIOs), chief information security officers (CISOs), and security decision makers are asking urgent questions: How do I track and monitor all these agents? How do I know what they are doing? Do they have the right access? Can they leak sensitive data? Are they protected from cyberthreats? How do I govern them?

Agent 365 and Microsoft 365 E7: The Frontier Suite, generally available on May 1, 2026, are designed to help answer these questions and give organizations the confidence to go further with AI.

Agent 365—the control plane for agents

As organizations adopt agentic AI, growing visibility and security gaps can increase the risk of agents becoming double agents. Without a unified control plane, IT, security, and business teams lack visibility into which agents exist, how they behave, who has access to them, and what potential security risks exist across the enterprise. Microsoft Agent 365 gives you a unified control plane for agents that enables IT, security, and business teams to work together to observe, govern, and secure agents across your organization—including agents built with Microsoft AI platforms and agents from our ecosystem partners—using new Microsoft Security capabilities built into their existing flow of work.

Here is what that looks like in practice:

As we are now running Agent 365 in production, Avanade has real visibility into agent activity, the ability to govern agent sprawl, control resource usage, and manage agents as identity-aware digital entities in Microsoft Entra. This significantly reduces operational and security risk, represents a critical step forward in operationalizing the agent lifecycle at scale, and underscores Microsoft’s commitment to responsible, production-ready AI.

—Aaron Reich, Chief Technology and Information Officer, Avanade

Key Agent 365 capabilities include:

Observability for every role

With Agent 365, IT, security, and business teams gain visibility into all Agent 365 managed agents in their environment, understand how they are used, and can act quickly on performance, behavior, and risk signals relevant to their role—from within existing tools and workflows.

  • Agent Registry provides an inventory of agents in your organization, including agents built with Microsoft AI platforms, ecosystem partner agents, and agents registered through APIs. This agent inventory is available to IT teams in the Microsoft 365 admin center. Security teams see the same unified agent inventory in their existing Microsoft Defender and Purview workflows.
  • Agent behavior and performance observability provides detailed reports about agent performance, adoption and usage metrics, an agent map, and activity details.
  • Agent risk signals across Microsoft Defender*, Entra, and Purview* help security teams evaluate agent risk—just like they do for users—and block agent actions based on agent compromise, sign-in anomalies, and risky data interactions. Defender assesses risk of agent compromise, Entra evaluates identity risk, and Purview evaluates insider risk. IT also has visibility into these risks in the Microsoft 365 admin center.
  • Security policy templates, starting with Microsoft Entra, automate collaboration between IT and security. They enable security teams to define tenant-wide security policies that IT leaders can then enforce in the Microsoft 365 admin center as they onboard new agents.

*These capabilities are in public preview and will continue to be on May 1.

Secure and govern agent access

Unmanaged agents may create significant risk, from accessing resources unchecked to accumulating excessive privileges and being misused by malicious actors. With Microsoft Entra capabilities included in Agent 365, you can secure agent identities and their access to resources.

  • Agent ID gives each agent a unique identity in Microsoft Entra, designed specifically for the needs of agents. With Agent ID, organizations can apply trusted access policies at scale, reduce gaps from unmanaged identities, and keep agent access aligned to existing organizational controls.
  • Identity Protection and Conditional Access for agents extend existing user policies that make real-time access decisions based on risks, device compliance from Microsoft Intune, and custom security attributes to agents working on behalf of a user. These policies help prevent compromise and help ensure that agents cannot be misused by malicious actors.
  • Identity Governance for agents enables identity leaders to limit agent access to only the resources they need, with access packages that can be scoped to a subset of the user’s permissions, and includes the ability to audit access granted to agents.

Prevent data oversharing and ensure agent compliance

Microsoft Purview capabilities in Agent 365 provide comprehensive data security and compliance coverage for agents. You can protect agents from accessing sensitive data, prevent data leaks from risky insiders, and help ensure agents process data responsibly to support compliance with global regulations.

  • Data Security Posture Management provides visibility and insights into data risks for agents so data security admins can proactively mitigate those risks.
  • Information Protection helps ensure that agents inherit and honor Microsoft 365 data sensitivity labels so that they follow the same rules as users for handling sensitive data to prevent agent-led sensitive data leaks.
  • Inline Data Loss Prevention (DLP) for prompts to Microsoft Copilot Studio agents blocks sensitive information such as personally identifiable information, credit card numbers, and custom sensitive information types (SITs) from being processed in the runtime.
  • Insider Risk Management extends insider risk protection to agents to help ensure that risky agent interactions with sensitive data are blocked and flagged to data security admins.
  • Data Lifecycle Management enables data retention and deletion policies for prompts and agent-generated data so you can manage risk and liability by keeping the data that you need and deleting what you don’t.  
  • Audit and eDiscovery extend core compliance and records management capabilities to agents, treating AI agents as auditable entities alongside users and applications. This will help ensure that organizations can audit, investigate, and defensibly manage AI agent activity across the enterprise.
  • Communication Compliance extends to agent interactions to detect and enable human oversight of risky AI communications. This enables business leaders to extend their code of conduct and data compliance policies to AI communications.

Defend agents against emerging cyberthreats

To help you stay ahead of emerging cyberthreats, Agent 365 includes Microsoft Defender protections purpose-built to detect and mitigate specific AI vulnerabilities and threats such as prompt manipulation, model tampering, and agent-based attack chains.

  • Security posture management for Microsoft Foundry and Copilot Studio agents* detects misconfigurations and vulnerabilities in agents so security leaders can stay ahead of malicious actors by proactively resolving them before they become an attack vector.
  • Detection, investigation, and response for Foundry and Copilot Studio agents* enables the investigation and remediation of attacks that target agents and helps ensure that agents are accounted for in security investigations.
  • Runtime threat protection, investigation, and hunting** for agents that use the Agent 365 tools gateway helps organizations detect, block, and investigate malicious agent activities.

Agent 365 will be generally available on May 1, 2026, and priced at $15 per user per month. Learn more about Agent 365.

*These capabilities are in public preview and will continue to be on May 1.

**This new capability will enter public preview in April 2026 and will still be in preview on May 1.

Microsoft 365 E7: The Frontier Suite

Microsoft 365 E7 brings together intelligence and trust to enable organizations to accelerate Frontier Transformation, equipping employees with AI across email, documents, meetings, spreadsheets, and business application surfaces. It also gives IT and security leaders the observability and governance needed to operate AI at enterprise scale.

Microsoft 365 E7 includes Microsoft 365 Copilot, Agent 365, Microsoft Entra Suite, and Microsoft 365 E5 with advanced Defender, Entra, Intune, and Purview security capabilities to help secure users, delivering comprehensive protection across users and agents. It will be available for purchase on May 1, 2026, at a retail price of $99 per user per month. Learn more about Microsoft 365 E7.

End-to-end security for the agentic era

Frontier Transformation is anchored in intelligence and trust, and trust starts with security. Microsoft Security capabilities help protect 1.6 million customers at the speed and scale of AI.1 With Agent 365, we are extending these enterprise-grade capabilities so organizations can observe, secure, and govern agents, and with Microsoft 365 E7 we deliver comprehensive protection across agents and users.

Secure your Frontier Transformation today with Agent 365 and Microsoft 365 E7: The Frontier Suite. And join us at RSAC Conference 2026 to learn more about these new solutions and hear from industry experts and customers who are shaping how agents can be observed, governed, secured, and trusted in the real world.

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity) for the latest news and updates on cybersecurity.


1Microsoft Fiscal Year 2026 Second Quarter Earnings Conference Call.

The post Secure agentic AI for your Frontier Transformation appeared first on Microsoft Security Blog.


Soft Forks: How Agent Skills Create Specialized AI Without Training

1 Share

Our previous article framed the Model Context Protocol (MCP) as the toolbox that provides AI agents tools and Agent Skills as materials that teach AI agents how to complete tasks. This is different from pre- or posttraining, which determine a model’s general behavior and expertise. Agent Skills do not “train” agents. They soft-fork agent behavior at runtime, telling the model how to perform specific tasks that it may need.

The term soft fork comes from open source development. A soft fork is a backward-compatible change that does not require upgrading every layer of the stack. Applied to AI, this means skills modify agent behavior through context injection at runtime rather than changing model weights or refactoring AI systems. The underlying model and AI systems stay unchanged.

The architecture maps cleanly to how we think about traditional computing. Models are CPUs—they provide raw intelligence and compute capability. Agent harnesses like Anthropic’s Claude Code are operating systems—they manage resources, handle permissions, and coordinate processes. Skills are applications—they run on top of the OS, specializing the system for specific tasks without modifying the underlying hardware or kernel.

Figure 1: Agentic AI abstractions. Source: SkillsBench.ai

You don’t recompile the Linux kernel to run a new application. You don’t rearchitect the CPU to use a different text editor. You install a new application on top, using the CPU’s intelligence exposed and orchestrated by the OS. Agent Skills work the same way. They layer expertise on top of the agent harness, using the capabilities the model provides, without updating models or changing harnesses.

This distinction matters because it changes the economics of AI specialization. Fine-tuning demands significant investment in talent, compute, data, and ongoing maintenance every time the base model updates. Skills require only Markdown files and resource bundles.

How soft forks work

Skills achieve this through three mechanisms—the skill package format, progressive disclosure, and execution context modification.

The skill package is a folder. At minimum, it contains a SKILL.md file with frontmatter metadata and instructions. The frontmatter declares the skill’s name, description, allowed-tools, and version, followed by the actual expertise: context, problem-solving approaches, escalation criteria, and patterns to follow.

Figure 2. Frontmatter for Anthropic’s skill-creator package. The frontmatter lives at the top of Markdown files. Agents choose skills based on their descriptions.

The folder can also include reference documents, templates, resources, configurations, and executable scripts. It contains everything an agent needs to perform expert-level work for the specific task, packaged as a versioned artifact that you can review, approve, and deploy as a .zip file or .skill file bundle.

Figure 3. A Skill Object for Anthropic’s skill-creator. skill-creator contains SKILL.md, LICENSE.txt, Python scripts, and reference files.

Because the skill package format is just folders and files, you can use all the tooling we have built for managing code—track changes in Git, roll back bugs, maintain audit trails, and apply the best practices of the software development life cycle. The same format is also used to define subagents and agent teams, meaning a single packaging abstraction governs individual expertise, delegated workflows, and multi-agent coordination alike.

Progressive disclosure keeps skills lightweight. Only the frontmatter of SKILL.md loads into the agent’s context at session start. This respects the token economics of limited context windows. The metadata contains the name, description, model, license, version, and, importantly, allowed-tools. The full skill content loads only when the agent determines relevance and decides to invoke it. This is similar to how operating systems manage memory: applications load into RAM when launched, not all at once. You can have dozens of skills available without overwhelming the model’s context window, and the behavioral modification is present only when needed, never permanently resident.

Figure 4. Agent Skill execution flow. At session start, only frontmatter is loaded. Once the agent chooses a skill, it reads the full SKILL.md and executes with the skill’s permissions.

Execution context modification controls what skills can do. When an agent invokes a skill, the permission system narrows to the scope of the skill’s definition: specifically, the model and allowed-tools declared in its frontmatter. It reverts after execution completes. A skill can use a different model and a different set of tools from the parent session. This sandboxes the permission environment so that skills get only scoped access, not arbitrary system control, and ensures the behavioral modification operates within boundaries.
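The scope-and-revert behavior can be sketched with a context manager. Here `session` is a plain dict standing in for a real agent session, and the key names are assumptions for illustration:

```python
from contextlib import contextmanager

@contextmanager
def skill_scope(session: dict, skill_meta: dict):
    """Temporarily narrow a session's permissions to a skill's declared scope.

    `session` is a plain dict standing in for an agent session; the keys
    "allowed_tools" and "model" are illustrative, not a real harness API.
    """
    saved_tools = session["allowed_tools"]
    saved_model = session["model"]
    try:
        # The skill's frontmatter replaces the parent session's scope.
        session["allowed_tools"] = set(skill_meta["allowed-tools"].split(", "))
        session["model"] = skill_meta.get("model", saved_model)
        yield session
    finally:
        # Permissions revert once the skill finishes executing.
        session["allowed_tools"] = saved_tools
        session["model"] = saved_model
```

Even if the skill raises an exception, the `finally` block restores the parent scope, mirroring how the harness reverts permissions after execution completes.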

This is what separates skills from earlier approaches. OpenAI’s custom GPTs and Google’s Gemini Gems are useful but opaque, nontransferable, and impossible to audit. Skills are readable because they are Markdown. They are auditable because you can apply version control. They are composable because skills can stack. And they are governable because you can build approval workflows and rollback capability. You can read a SKILL.md to understand exactly why an agent behaves a certain way.

What the data shows

Building skills is easy with coding agents. Knowing whether they work is the hard part. Traditional software testing does not apply. You cannot write a unit test asserting that expert behavior occurred. The output might be correct while reasoning was shallow, or the reasoning might be sophisticated while the output has formatting errors.

SkillsBench is a benchmarking effort and framework designed to address this. It uses a paired evaluation design in which the same tasks are evaluated with and without skill augmentation. The benchmark contains 85 tasks, stratified across domains and difficulty levels. By comparing the same agent on the same task, with the only variable being the presence of a skill, SkillsBench isolates the causal effect of skills from model capability and task difficulty. Performance is measured using normalized gain: the fraction of possible improvement the skill actually captured.
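Normalized gain, as described, is the improvement divided by the available headroom. A sketch, assuming scores on a 0-to-1 scale (the exact scoring scale used by SkillsBench is not stated here):

```python
def normalized_gain(score_without, score_with, max_score=1.0):
    """Fraction of the possible improvement the skill actually captured.

    0.0 means no improvement over baseline; 1.0 means the skill
    closed the entire gap to a perfect score.
    """
    headroom = max_score - score_without
    if headroom <= 0:
        return 0.0  # baseline already at the ceiling; no gain is measurable
    return (score_with - score_without) / headroom

# Example: baseline 0.50, with skill 0.75 -> half of the 0.50 headroom captured.
print(normalized_gain(0.50, 0.75))  # 0.5
```

Dividing by headroom rather than reporting a raw delta is what lets tasks with very different baseline difficulty be compared on one scale.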

The findings from SkillsBench challenge the presumption that skills universally improve performance.

Skills improve average performance by 13.2 percentage points, but 24 of 85 tasks got worse. Manufacturing tasks gained 32 points; software engineering tasks lost 5. The aggregate number hides variance that domain-level evaluation reveals. This is precisely why soft forks need evaluation infrastructure: unlike hard forks, where you commit fully, soft forks let you measure before you deploy widely. Organizations should segment evaluations by domain and by task and test for regressions, not just improvements. For example, what improves document processing might degrade code generation.
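Segmenting by domain instead of reporting one aggregate number can be sketched as follows. The task records and domain names are illustrative, echoing the pattern above: gains in one domain, regressions in another.

```python
from collections import defaultdict

# (domain, baseline score, score with skill) -- illustrative numbers only.
results = [
    ("manufacturing", 0.40, 0.72),
    ("manufacturing", 0.50, 0.82),
    ("software_eng",  0.60, 0.55),
    ("software_eng",  0.70, 0.65),
]

by_domain = defaultdict(list)
for domain, without, with_skill in results:
    by_domain[domain].append(with_skill - without)

# Report the mean delta and the regression count per domain, so a gain
# in one domain cannot mask a loss in another.
for domain, deltas in by_domain.items():
    mean = sum(deltas) / len(deltas)
    regressions = sum(d < 0 for d in deltas)
    print(f"{domain}: mean delta {mean:+.2f}, regressions {regressions}/{len(deltas)}")
```

The regression count is the key addition: a domain can show a positive mean delta while individual tasks still get worse, and a deploy decision should look at both.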

Compact skills outperform comprehensive ones by more than 3x. Focused skills with dense guidance showed a +18.9 percentage point improvement, while comprehensive skills covering every edge case showed +5.7 points. Two to three skills per task is optimal; four or more show diminishing returns. The temptation when building skills is to include everything: every caveat, every exception, every piece of relevant context. Resist it, and let the model’s intelligence do the work. Small, targeted behavioral changes outperform comprehensive rewrites. Skill builders should start with minimum viable guidance and add detail only when evaluation shows specific gaps.

Models cannot reliably self-generate effective skills. SkillsBench tested a “bring your own skill” condition in which agents were prompted to generate their own procedural knowledge before attempting tasks. Performance stayed at baseline. Effective skills require human-curated domain expertise that models cannot reliably produce for themselves. AI can help with packaging and formatting, but the insight has to come from people who actually have the expertise. Human-curated insight is the bottleneck in building effective skills, not packaging or deployment.

Figure 5. Models cannot reliably self-generate effective skills without human feedback and verifications.

Skills can partially substitute for model scale. Claude Haiku, a small model, achieved a 25.2% pass rate with well-designed skills, slightly exceeding Claude Opus, the flagship model, without skills at 23.6%. Packaged expertise compensates for model intelligence on procedural tasks. This has cost implications: smaller models with skills may outperform larger models without them at a fraction of the inference cost. Soft forks democratize capability; you do not need the biggest model if you have the right expertise packaged.

Figure 6. Skills improve model performance and close the gap between small and large models.

Open questions

Many challenges remain unresolved. What happens when multiple skills conflict with each other during a session? How should organizations govern skill portfolios when teams each deploy their own skills onto shared agents? How quickly does encoded expertise become outdated, and what refresh cadence keeps skills effective without creating maintenance burden? Skills inherit whatever biases exist in their authors’ expertise, so how do you audit for that? And as the industry matures, how should evaluation infrastructure such as SkillsBench scale to keep pace with the growing complexity of skill-augmented systems?

These are not reasons to avoid skills. They are reasons to invest in evaluation infrastructure and governance practices alongside skill development. The capability to measure performance must evolve in lockstep with the technology itself.

Agent Skills advantage

Fine-tuning models for a single use case is no longer the only path to specialization. It demands significant investment in talent, compute, and data and creates a permanent divergence that requires reevaluation and potential retraining every time the base model updates. Fine-tuning across a broad set of capabilities to improve a foundation model remains sound, but fine-tuning for one narrow workflow is exactly the kind of specialization that skills can now achieve at a fraction of the cost.

Skills are not maintenance free. Just as applications sometimes break when operating systems update, skills need reevaluation when the underlying agent harness or model changes. But the recovery path is lighter: update the skills package, rerun the evaluation harness, and redeploy rather than retrain from a new checkpoint.

Mainframes gave way to client-server. Monoliths gave way to microservices. Specialized fine-tuned models are now giving way to agents augmented by specialized expertise artifacts. Models provide intelligence, agent harnesses provide runtime, skills provide specialization, and evaluation tells you whether it all works together.




Coming Off the Bench for Bluesky


I'm excited to tell you that I will serve as interim CEO at Bluesky, a company whose mission I believe in deeply.

I've been a partner at True Ventures for many years, and one of the great privileges of that job is getting a front-row seat to companies that are trying to do something genuinely hard. Bluesky is one of those companies, and when the moment came to contribute in a more hands-on fashion, the timing felt right.

How I Got Here

I've spent most of my career working on open platforms, from WordPress and Automattic to the Yahoo Developer Network and to open marketplace businesses like Bandcamp that we backed at True. What I've learned is that openness is not just a technical choice, it's a philosophical one. Decisions about who controls the network, who owns the data, and who captures the value shape what the internet becomes.

I first met Jay Graber and Rose Wang (Bluesky's CEO and COO) about two years ago, while I was back at Automattic, serving as their interim CEO. Automattic was a seed investor in Bluesky, so an introduction made sense. What I didn't expect was how quickly that first conversation would turn into conviction.

I'll be honest: I was skeptical about decentralized social. The vision was always compelling. A social web that no single company controls, where users own their identity and their relationships, where anyone can build on top of the protocol. But I'd seen enough promising decentralized projects fade or fragment that I had stopped expecting one to get to scale.

Bluesky changed that. Hearing their vision and, more importantly, learning about the architecture they'd built (the AT Protocol), I became a believer. This was a real, scalable foundation for a different kind of internet.

Over the last two years I've been an investor and an advisor to Bluesky, a fan cheering the team on as they pulled off something many said was impossible.

What Bluesky Has Built

Bluesky has cracked a problem that stumped the industry for years: how to build a social network that offers the best of both worlds, the personal freedom and ownership that come from being part of an open network and the immediacy and ease of use that people expect from modern social services.

That's not a small thing. A lot of people said you had to choose one or the other.

At Bluesky, a small and extraordinarily talented team has signed up over 40 million people, nurtured an open developer ecosystem with over 500 active apps, and scaled all the systems that make that experience smooth and possible: a consumer app, servers, onboarding, moderation, safety, and more. And they've done it while staying true to the open protocol underneath. Now it's time to build on that foundation and deliver more open goodness to the world.

Thank You, Jay

None of this would exist without Jay Graber. She had the vision, recruited the team, and drove the execution that got Bluesky to where it is today. I'm grateful for the trust she is placing in me to step in during this transition, and I'm excited to support her in her next chapter as Bluesky's Chief Innovation Officer. Her focus on the long-term architecture and vision of the protocol will propel us forward into exciting new territory.

What I'm Here to Do

For the Bluesky team: My job is to support you, not to change what's working. You've built something genuinely special, and my goal is to make sure you have everything you need to keep doing that.

For users: The commitment to an open, user-controlled social web isn't going anywhere. You own your identity, your data, your graph. If anything, we're doubling down.

For developers and atproto app builders: You are a core part of what makes this ecosystem work. Open platforms only thrive when third-party builders can trust them. We will continue to work on earning that trust and moving towards a fully decentralized system.

For anyone thinking about joining the Bluesky team: This is a rare moment. A platform with real technical foundations for decentralization, a passionate and growing community, and a lot of important and meaningful work still ahead. Come help build it!

PS: My role as interim CEO will be to help set up Bluesky's next phase of growth. While doing this work, I will remain active in my role as partner at True Ventures.
