
What's new in Viva Insights


As we move further into 2026, we’re excited to announce a variety of powerful new tools and reporting capabilities for Viva Insights that make it easier to understand how your organization’s adoption and use of Microsoft 365 Copilot compares with usage both within and outside your company. You can also now learn more about the adoption and impact of agents built in Microsoft Copilot Studio, and there are new ways to share and customize reports more broadly across your organization. Let’s dive in.

Agent Dashboard 

We’re thrilled to announce the initial rollout of the Agent Dashboard, a powerful new functionality for the Viva Insights web app that provides leaders and analysts with actionable insights into agent adoption. With this new dashboard, users can dive into Copilot Credit usage – which measures how agents are used – and identify opportunities to optimize and track agent adoption over time. 

To start, the dashboard covers adoption metrics aggregated at the user level for agents used within Microsoft 365 Copilot.  

Read more about the Agent Dashboard here on the Microsoft 365 Copilot blog. 

The Agent Dashboard is currently rolling out to public preview customers only. When you’re ready to start using it, learn how in our guide on MS Learn. 

Benchmarks in Copilot Dashboard 

The Copilot Dashboard has focused on providing actionable insights into Copilot readiness, adoption, and impact trends for specific groups within the organization. Now, with benchmarks in the Copilot Dashboard, users can also see how their adoption compares to others, either within their organization or at other companies.  

Benchmarks in the Copilot Dashboard provide context around Copilot adoption trends, so users can compare usage across internal cohorts, or see how their adoption of Copilot compares to similar organizations.  

Read more about benchmarks in the Copilot Dashboard here on the Microsoft 365 Copilot blog. 

Export Copilot metrics from the Copilot Dashboard  

We’re excited to introduce the initial public preview rollout of Copilot metrics export from the Copilot Dashboard. This new capability gives organizations greater flexibility to analyze the usage of both Microsoft 365 Copilot and Copilot Chat beyond the dashboard across the past six months at the de-identified user level, with user identifiers removed. 

With this export tool, leaders and analysts with access to the global scope dashboard can download the data directly to support Copilot initiatives, such as tracking adoption and usage trends over time, or combining it with other data sources for custom analysis and reporting. 

Learn more about how to use the export feature on MS Learn. 

Launch of Copilot Studio agents report 

In an exciting expansion of our reporting tools measuring the adoption and impact of Microsoft 365 Copilot, the Copilot Studio agents report is now broadly available. This powerful new Power BI template allows users to learn more about the impact of agents built in Microsoft Copilot Studio, and how they're deployed across channels in the organization. 

This report can help users answer questions like: 

  • What are the top agents being used? 
  • What are top agents' high-level KPIs, like sessions, satisfaction scores, and success rates? 
  • What is the impact of individual agents (conversational and autonomous), such as the split of engaged sessions and topics against actions and triggers distribution, as well as maker-led inputs? 

The report provides insights about agents built in Microsoft Copilot Studio that are deployed across a variety of channels, including Microsoft Teams and Microsoft 365 Copilot, Facebook, mobile apps, and custom and demo websites.  

To learn more about the report and how to run it, refer to our guide on MS Learn. 

New ability to customize out-of-the-box Power BI reports  

Existing tools for Power BI template reports in the Viva Insights web app allow users to customize their reports for their organization's needs, through their selection of filters, metrics, and organizational attributes. Now, an expanded toolkit allows Viva Insights analysts to further customize out-of-the-box Power BI reports to make them even more relevant to their organization. 

With these new tools, analysts can, for example: 

  • Add new visualizations, text boxes, and graphics 
  • Change the report's filters 
  • Add, rename, and rearrange report pages 
  • Save and delete your customized reports 

Users can now customize pre-built Power BI reports such as the Copilot Studio agents or Copilot for Sales adoption reports, but not custom queries such as Person queries or Meeting queries. Users can also customize any queries that they or other analysts in their organization have previously run. 

To learn how to customize out-of-the-box reports, please see our guide on MS Learn.


Issue 743


When reading through my RSS feeds for the week, the number of articles about agent-assisted coding (even ones not supported by default) I read was overwhelming. It felt like there was no other topic for the week!

Am I going to write more about it, too? Well, yes, but not on how to set it up, use it, or whether I like the Xcode 26.3 implementation. Instead, I’m going to look a little into a possible future of working with these tools.

You’ve probably watched Apple’s code-along session introducing the feature. It’s good, and the presenters do a good job of explaining it, but you’ll notice something about the prompts they use. They keep mentioning that you should use detailed prompts, but the ones they use are short and don’t look much like the ones I have had success with.

Don’t get me wrong, I’m not criticising. It’s very easy for me to say a video should use essay-length prompts, but that’s hard to do in a 30-45 minute tutorial session and I understand why the session is how it is.

Ambiguity in a prompt is the one thing that the constant improvement in models will never fix. What happens in the edge cases? What happens when there’s an error? A short prompt can’t contain the detail needed for predictable and reliable software. Some of the agents, like Claude Code, will now ask clarifying questions if they detect ambiguity, but in my opinion it’s much better to think through your software and specify more things ahead of time before letting the agent start to code.

Learning how to use an agent effectively is definitely a skill, and the spec for whatever new app or new feature you’re building is really important. One of the most effective techniques I have used came from watching this workshop by Peter Steinberger where he lets two different LLM contexts try to find and fix holes in a spec before coding. As long as you read the feedback and make decisions where they need to be made, this is an incredibly effective way of working.

What I’d like to see is some of these agentic tools start to integrate that way of working, and none of the UIs I have seen so far encourage this. The agent is always in a sidebar or smaller window and they don’t guide you down a path of making sure you have removed ambiguity before coding. I can imagine a future version of Xcode where the agent works with you to refine a spec for whatever change you’re planning, full screen in the editor, including using multiple context windows to push back against you. I’d love to see the spec in a completely separate context window to the coding agent, too. It would need a lot of work and getting the UI right would be tricky, but I suspect it would be powerful. It would also change the focus of the UI from “the agent is only important enough to live in a sidebar” to “the agent and your instructions are as important as the code”.

All of this is possible already, of course, but only by using multiple tools and you need careful model and context management for it to be effective. I’d love to see Apple and the Xcode team really lead from the front and reduce the friction of working this way. Maybe we’ll see something at this year’s WWDC?

– Dave Verwer

Release white-label apps with a single click

Shipping white-label apps used to mean repeating the same steps and signing in and out of App Store Connect dozens of times per release. With Runway, ship everything in one place, just once.

News

Swift Student Challenge for 2026

It’s that time of year again! The Swift Student Challenge is open for submissions until February 28th. It’s truly a unique experience for the winners, but it’s also a great opportunity to put your mind to a focused, constrained project for a couple of weeks to see what you can create. Check your eligibility (it’s broader than you might think) and get your application in!

Tools

SimTag: Context for your iOS Simulators

Aryaman Sharda:

SimTag adds a small, unobtrusive overlay to each iOS Simulator window showing the branch that build came from.

That’s it.

What a great idea. 👍

Code

Morphing Sheets Out of Buttons in SwiftUI

What’s this? An article that doesn’t even mention AI? 😂 Gabriel Theodoropoulos writes about how little tweaks to the built-in SwiftUI morphing animations can make all the difference to how polished your app feels. I love that there are videos for each step of the process, too. Great article.


Swift Concurrency from Zero to Hero

Alex Ozun:

It goes without saying that just reading alone won’t make you a Swift concurrency expert, you’ll need to continuously put everything you learn to practice.

And yet, also:

But those who do are guaranteed to become top 1% subject matter experts.

I love the format of this article, it’s great.

Videos

Let’s end open source together with this one simple trick

Yes, there were Swift talks at this year’s FOSDEM, but this one by Dylan Ayrey and Mike Nolan was the one that caught my eye. They ask the question of whether open source is doomed in the age of LLMs, and it’s really a fantastic talk.


Swift Pre-FOSDEM Community Event 2026

Talking of FOSDEM, there was also a pre-event focused entirely on Swift with nine short talks. Perfect for a few bite-sized watches this weekend.

And finally...

First it became a typeset PDF again, and now a physical object!


Call For Papers Listings for 2/13


A collection of upcoming CFPs (call for papers) from across the internet and around the world.

The post Call For Papers Listings for 2/13 appeared first on Leon Adato.


Reading List 354


This reading list is courtesy of Vivaldi browser, who pay me decent money to fight for a better web and don’t moan at me for reading all this stuff. We’ve just released Vivaldi 7.8 for mobile, with even more personalisation and zero “A.I.”, because it’s cream of the crop, not a stream of the Slop.


How to solve the context size issues with context packing with Docker Model Runner and Agentic Compose


If you’ve worked with local language models, you’ve probably run into the context window limit, especially when using smaller models on less powerful machines. While it’s an unavoidable constraint, techniques like context packing make it surprisingly manageable.

Hello, I’m Philippe, and I am a Principal Solutions Architect helping customers with their usage of Docker.  In my previous blog post, I wrote about how to make a very small model useful by using RAG. I had limited the message history to 2 to keep the context length short.
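Capping the history like that is a one-liner. Here is a minimal sketch, assuming the history is an array of [role, content] pairs as in the code shown later in this post:

// Minimal sketch: keep only the last N [role, content] pairs of the history.
// In the previous post, N was 2; later in this post it comes from HISTORY_MESSAGES.
const HISTORY_MESSAGES = parseInt(process.env.HISTORY_MESSAGES || "2", 10);

const history = [
  ["user", "generate an http server in golang"],
  ["assistant", "...generated Go code..."],
  ["user", "add a human structure and a list of humans"],
  ["assistant", "...generated Go code..."],
];

const trimmedHistory = history.slice(-HISTORY_MESSAGES);
console.log(trimmedHistory.length); // 2 with the default setting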

But in some cases, you’ll need to keep more messages in your history. For example, a long conversation to generate code:

- generate an http server in golang
- add a human structure and a list of humans
- add a handler to add a human to the list
- add a handler to list all humans
- add a handler to get a human by id
- etc...

Let’s imagine we have a conversation for which we want to keep 10 messages in the history. Moreover, we’re using a very verbose model (which generates a lot of tokens), so we’ll quickly encounter this type of error:

error: {
    code: 400,
    message: 'request (8860 tokens) exceeds the available context size (8192 tokens), try increasing it',
    type: 'exceed_context_size_error',
    n_prompt_tokens: 8860,
    n_ctx: 8192
  },
  code: 400,
  param: undefined,
  type: 'exceed_context_size_error'
}


What happened?

Understanding context windows and their limits in local LLMs

Our LLM has a context window, which has a limited size. This means that if the conversation becomes too long, the request fails with an error like the one above.

This window is the total number of tokens the model can process at once, like short-term working memory. Read this IBM article for a deep dive on context windows.

In our example in the code snippet above, this size was set to 8192 tokens by the LLM engine powering the local LLM (Docker Model Runner, Ollama, llama.cpp, and so on).

This window includes everything: system prompt, user message, history, injected documents, and the generated response. Refer to this Redis post for more info. 

Example: if the model has 32k context, the sum (input + history + generated output) must remain ≤ 32k tokens. Learn more here.  
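Before tuning anything, it helps to estimate how many tokens a request will consume. Here is a minimal sketch of the rough chars-divided-by-four heuristic used later in this post (an approximation, not a real tokenizer):

// Rough token estimate: ~4 characters per token. Good enough for a guardrail,
// not a substitute for the model's real tokenizer.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Estimate a full prompt: system instructions + history + new user message.
const estimatePromptTokens = (systemInstructions, history, userMessage) =>
  estimateTokens(systemInstructions) +
  history.reduce((acc, [, content]) => acc + estimateTokens(content), 0) +
  estimateTokens(userMessage);

// Example: check the estimate against an 8192-token context window.
const contextSize = 8192;
const estimate = estimatePromptTokens(
  "You are a Golang expert.",
  [["user", "generate an http server in golang"]],
  "add a handler to list all humans"
);
console.log(`${estimate} tokens, fits: ${estimate <= contextSize}`);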

It’s possible to change the default context size (up or down) in the compose.yml file:

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m
    # Increased context size for better handling of larger inputs
    context_size: 16384

You can also do this with the following Docker command: docker model configure --context-size 8192 ai/qwen2.5-coder

And so we solve the problem, but only part of the problem. Indeed, it’s not guaranteed that your model supports a larger context size (like 16384), and even if it does, it can very quickly degrade the model’s performance.

Thus, with hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m, when the number of tokens in the context approaches 16384 tokens, generation can become (much) slower (at least on my machine). Again, this will depend on the model’s capacity (read its documentation). And remember, the smaller the model, the harder it will be to handle a large context and stay focused.

Tip: always provide an option in your application (a /clear command, for example) to empty or reduce the message list, either automatically or manually. Keep the initial system instructions, though.
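As a minimal sketch of that tip (the conversationMemory map mirrors the one used later in this post; the command handling itself is hypothetical):

// Minimal sketch of a /clear command: wipe the stored history for the session,
// but leave the system instructions alone (they are added to each request separately).
const conversationMemory = new Map();
conversationMemory.set("default-session-id", [["user", "..."], ["assistant", "..."]]);

function handleCommand(input, sessionId = "default-session-id") {
  if (input.trim() === "/clear") {
    conversationMemory.set(sessionId, []); // drop all previous messages
    console.log("Conversation history cleared.");
    return true;  // handled, nothing is sent to the model
  }
  return false;   // not a command, continue with normal processing
}

handleCommand("/clear"); // history is now empty, system instructions untouched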

So we’re at an impasse. How can we go further with our small models?

Well, there is still a solution, which is called context packing.

Using context packing to fit more information into limited context windows

We can’t indefinitely increase the context size. To still fit more information into the context, we can use a technique called “context packing”, which consists of having the model itself summarize the previous messages (or entrusting that task to another model) and replacing the history with the summary, freeing up space in the context.

So we decide that from a certain token limit, we’ll have the history of previous messages summarized, and replace this history with the generated summary.

I’ve therefore modified my example to add a context packing step. For the exercise, I decided to use another model to do the summarization.

Modification of the compose.yml file

I added a new model in the compose.yml file: ai/qwen2.5:1.5B-F16

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

  embedding-model:
    model: ai/embeddinggemma:latest

  context-packing-model:
    model: ai/qwen2.5:1.5B-F16

Then:

  • I added the model in the models section of the service that runs our program.
  • I increased the number of messages in the history to 10 (instead of 2 previously).
  • I set a token limit at 5120 before triggering context compression.
  • And finally, I defined instructions for the “context packing” model, asking it to summarize previous messages.

excerpt from the service:

golang-expert-v3:
  build:
    context: .
    dockerfile: Dockerfile
  environment:
    HISTORY_MESSAGES: 10
    TOKEN_LIMIT: 5120
    # ...
  configs:
    - source: system.instructions.md
      target: /app/system.instructions.md
    - source: context-packing.instructions.md
      target: /app/context-packing.instructions.md
  models:
    chat-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_CHAT
    context-packing-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_CONTEXT_PACKING
    embedding-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_EMBEDDING

You’ll find the complete version of the file here: compose.yml

System instructions for the context packing model

Still in the compose.yml file, I added a new system instruction for the “context packing” model, in a context-packing.instructions.md file:

context-packing.instructions.md:
  content: |
    You are a context packing assistant.
    Your task is to condense and summarize provided content to fit within token limits while preserving essential information.
    Always:
    - Retain key facts, figures, and concepts
    - Remove redundant or less important details
    - Ensure clarity and coherence in the condensed output
    - Aim to reduce the token count significantly without losing critical information

    The goal is to help fit more relevant information into a limited context window for downstream processing.

All that’s left is to implement the context packing logic in the assistant’s code.

Applying context packing to the assistant’s code

First, I define the connection with the context packing model in the Setup part of my assistant:

// Import from the setup section of index.js, shown here for completeness:
import { ChatOpenAI } from "@langchain/openai";

// Connection to the model used only for summarizing ("packing") the history.
const contextPackingModel = new ChatOpenAI({
  model: process.env.MODEL_RUNNER_LLM_CONTEXT_PACKING || `ai/qwen2.5:1.5B-F16`,
  apiKey: "",
  configuration: {
    // Docker Model Runner exposes an OpenAI-compatible endpoint
    baseURL: process.env.MODEL_RUNNER_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1/",
  },
  temperature: 0.0,
  top_p: 0.9,
  presencePenalty: 2.2,
});

I also retrieve the system instructions I defined for this model, as well as the token limit:

let contextPackingInstructions = fs.readFileSync('/app/context-packing.instructions.md', 'utf8');

let tokenLimit = parseInt(process.env.TOKEN_LIMIT) || 7168

Once in the conversation loop, I’ll estimate the number of tokens consumed by previous messages, and if this number exceeds the defined limit, I’ll call the context packing model to summarize the history of previous messages and replace this history with the generated summary (the assistant-type message: [“assistant”, summary]). Then I continue generating the response using the main model.

excerpt from the conversation loop:

 let estimatedTokenCount = messages.reduce((acc, [role, content]) => acc + Math.ceil(content.length / 4), 0);
  console.log(` Estimated token count for messages: ${estimatedTokenCount} tokens`);

  if (estimatedTokenCount >= tokenLimit) {
    console.log(` Warning: Estimated token count (${estimatedTokenCount}) exceeds the model's context limit (${tokenLimit}). Compressing conversation history...`);

    // Calculate original history size
    const originalHistorySize = history.reduce((acc, [role, content]) => acc + Math.ceil(content.length / 4), 0);

    // Prepare messages for context packing
    const contextPackingMessages = [
      ["system", contextPackingInstructions],
      ...history,
      ["user", "Please summarize the above conversation history to reduce its size while retaining important information."]
    ];

    // Generate summary using context packing model
    console.log(" Generating summary with context packing model...");
    let summary = '';
    const summaryStream = await contextPackingModel.stream(contextPackingMessages);
    for await (const chunk of summaryStream) {
      summary += chunk.content;
      process.stdout.write('\x1b[32m' + chunk.content + '\x1b[0m');
    }
    console.log();

    // Calculate compressed size
    const compressedSize = Math.ceil(summary.length / 4);
    const reductionPercentage = ((originalHistorySize - compressedSize) / originalHistorySize * 100).toFixed(2);

    console.log(` History compressed: ${originalHistorySize} tokens → ${compressedSize} tokens (${reductionPercentage}% reduction)`);

    // Replace all history with the summary
    conversationMemory.set("default-session-id", [["assistant", summary]]);

    estimatedTokenCount = compressedSize

    // Rebuild messages with compressed history
    messages = [
      ["assistant", summary],
      ["system", systemInstructions],
      ["system", knowledgeBase],
      ["user", userMessage]
    ];
  }

You’ll find the complete version of the code here: index.js

All that’s left is to test our assistant and have it hold a long conversation, to see context packing in action.

docker compose up --build -d
docker compose exec golang-expert-v3 node index.js

And after a while in the conversation, you should see the warning message about the token limit, followed by the summary generated by the context packing model, and finally, the reduction in the number of tokens in the history:

Estimated token count for messages: 5984 tokens
Warning: Estimated token count (5984) exceeds the model's context limit (5120). Compressing conversation history...
Generating summary with context packing model...
Sure, here's a summary of the conversation:

1. The user asked for an example in Go of creating an HTTP server.
2. The assistant provided a simple example in Go that creates an HTTP server and handles GET requests to display "Hello, World!".
3. The user requested an equivalent example in Java.
4. The assistant presented a Java implementation that uses the `java.net.http` package to create an HTTP server and handle incoming requests.

The conversation focused on providing examples of creating HTTP servers in both Go and Java, with the goal of reducing the token count while retaining essential information.
History compressed: 4886 tokens → 153 tokens (96.87% reduction)

This way, we ensure that our assistant can handle a long conversation while maintaining good generation performance.

Summary

The context window is an unavoidable constraint when working with local language models, particularly with small models and on machines with limited resources. However, by using techniques like context packing, you can easily work around this limitation. Using Docker Model Runner and Agentic Compose, you can implement this pattern to support long, verbose conversations without overwhelming your model.

All the source code is available on Codeberg: context-packing. Give it a try! 


Why 40% of AI projects will be canceled by 2027 (and how to stay in the other 60%)


The agentic AI race is on, and most organizations are at risk of losing it. Not because they lack ambition, but because they’re fighting three wars simultaneously without a unified strategy.

Looking at what separates successful agentic AI programs from the 40% that Gartner predicts will be canceled by 2027, the pattern is clear. Organizations aren’t failing at AI. They’re failing at the infrastructure that makes AI work at enterprise scale.

There are three underlying crises in most AI initiatives, and solving them independently doesn’t work. They have to be addressed together, through a unified AI connectivity program.

The three crises derailing agentic AI infrastructure

Crisis #1: Building sustainable velocity

Everyone knows speed matters. Boards are demanding AI agents, executives are funding pilots, and teams are racing to deploy.

But urgency hasn’t translated to velocity. S&P Global reports that 42% of companies are abandoning AI initiatives before production. Organizations are deploying agents quickly, but then pulling them back just as quickly.

The uncomfortable truth is that many of the organizations that moved fastest are now moving backward. Consider McDonald’s terminating its AI voice ordering program after deploying it to over 100 locations, or the 39% of AI customer service chatbots that were pulled back or reworked.

Speed without first establishing a foundation creates technical debt that compounds until it forces a complete rebuild.

The organizations achieving sustainable velocity aren’t just moving fast. They’re moving fast on infrastructure that supports iteration rather than requiring restarts.

Crisis #2: The fragmentation tax

While teams race to deploy, Finance and FinOps teams are watching margins erode. 84% of companies report more than 6% gross margin erosion from AI costs, and 26% report erosion of 16% or more.

This isn’t coming from strategic over-investment, but from chaos: fragmented systems, untracked token consumption, zombie infrastructure, and redundant tooling scattered across teams that don’t know what the others are building.

There’s also the secondary problem that it’s not possible to monetize what can’t be measured. Organizations hemorrhaging margin simultaneously leave revenue on the table because they lack visibility into usage patterns, unit economics, and the data required for usage-based pricing.

Only 15% of companies can forecast AI costs within ±10% accuracy. Everyone else is operating on hope rather than data.

Crisis #3: The shadow AI time bomb

The third crisis is quieter but potentially more damaging. 86% of organizations have no visibility into their AI data flows. 20% of security breaches are now classified as Shadow AI incidents. And 96% of enterprises acknowledge that AI agents have either already introduced security risks or will soon.

Development teams under pressure to ship are spinning up LLM connections, routing sensitive data to models, and expanding agent-to-agent communication, all often without security review. The attack surface grows with every deployment, but visibility doesn’t keep up.

By the time organizations discover the problem — through a breach, a failed audit, or a regulatory inquiry — the damage is structural. Remediation means rollbacks, rebuilds, and reputational harm that takes years to recover from.

Why solving these separately doesn’t work

Most organizations get it wrong by treating speed, cost, and governance as independent problems that require separate solutions.

They task Dev and AI/ML teams with driving velocity, FinOps with controlling AI costs, and Security with building governance frameworks, all without a shared, unified approach: three workstreams and three organizational silos.

This introduces fragmentation. The structure designed to solve the problem makes it worse. This doesn’t mean these teams shouldn’t be the primary owners of their workstreams; it’s just that leaders shouldn’t approach these challenges in silos. Think about the relationships this way:

  • Governance: Without it, speed creates risk. Every agent deployed without proper controls expands the attack surface. Moving fast just accumulates vulnerabilities faster. In the long term, this slows you down. Governance, when done properly, equates to speed. 
  • Cost visibility: Otherwise, speed burns margin. Every deployment without unit economics is just a bet that the math will work out later. Moving fast means hemorrhaging money faster. And hemorrhaging money ultimately leaves less budget for innovation.
  • Speed: Without speed, governance becomes stagnant. Manual review cycles and approval processes that worked for traditional IT can’t scale to agentic workloads. Governance that slows deployment to a crawl isn’t governance — it’s a slow path to irrelevance.

The organizations that master all three simultaneously will reap the benefits, while those that try to solve them separately will see the gaps widen.

What winning looks like

The winners in the agentic era share a familiar pattern: they’ve built unified infrastructure that addresses speed, cost, and governance as a single integrated platform. This allows them to:

  • Deploy with confidence. Teams ship agents knowing that guardrails are automated, not manual. Security and compliance happen at the infrastructure layer, not through review meetings that add weeks to timelines.
  • Invest with clarity. Finance trusts forecasts because they’re based on consumption data. Product teams can model unit economics before launch. Cost attribution connects spending to business outcomes.
  • Monetize what they build. Usage-based pricing is possible because consumption is metered at every layer, and AI capabilities generate revenue streams.
  • See the full picture. Visibility spans the entire AI data path, not just LLM calls, but the APIs, events, MCP connections, and agent-to-agent communications that make up real-world agentic architectures.
  • Move faster over time. Each deployment builds on the last, institutional knowledge accumulates, and the platform gets smarter.

This is the flywheel in action: Governance enables speed, speed enables cost efficiency, cost efficiency funds further investment in governance and velocity. The three capabilities compound when unified and collapse when fragmented.

[Infographic: the Agentic Innovation flywheel, with four strategic pillars arranged in a loop around a central circle.]

AI connectivity: The unified platform approach

The solution to these compounding challenges isn’t another point tool. It’s a new architectural approach for how AI systems, APIs, and agents connect and run in production: AI connectivity.

AI connectivity is the unified governance and runtime layer that spans the full data path agents traverse, from APIs and events to LLM calls, MCP connections, and agent-to-agent communication.

Traditional API management handles request-response traffic between applications. AI Gateways handle traffic between agents and models. Alone, neither addresses the full scope of what agentic AI requires.

Agents don’t just call LLMs. They traverse the entire digital ecosystem, from invoking MCP tools to consuming APIs and event streams as context, coordinating with other agents, and accessing data sources across the enterprise. Each connection point requires visibility, control, and governance that all work together.

AI connectivity closes this gap by providing:

  • Unified traffic management across the protocols, contexts, and intelligence of the agentic stack: REST, GraphQL, gRPC, Kafka, WebSocket, MCP, LLMs, A2A, and more. 
  • Consistent policy enforcement that applies security, compliance, and cost controls regardless of whether the traffic is a traditional API call or an agent reasoning through a multi-step workflow.
  • Full data path observability that shows not just what agents are doing, but what they’re connecting to, what data is flowing where, and what it costs.
  • Built-in monetization infrastructure that meters consumption at every layer, enabling usage-based pricing, cost attribution, and unit economics visibility.
  • Developer self-service that lets teams build and deploy without waiting for manual reviews.

When governance, cost visibility, and deployment velocity share a common platform, they reinforce one another rather than compete. Teams move fast with automatic guardrails, costs stay visible with metering built into the runtime, and security scales from policies enforced at the infrastructure layer.

Kong: The foundation for an AI connectivity strategy

Here’s what we’ve built at Kong.

Kong provides the AI connectivity layer that spans the whole data path across APIs, events, and AI-native traffic, with the governance, observability, and monetization infrastructure that sustainable AI programs require.

Organizations using Kong can see and control the entire AI data path from a single platform. They can enforce consistent policies across all traffic types, meter usage for cost attribution and revenue capture, and give developers self-service access to the infrastructure they need to build and deploy agents at scale.

This is AI connectivity in practice: a unified platform that makes the speed-cost-governance flywheel actually work.

The window is starting to close

The organizations that will dominate the agentic era are building their platform foundations today. They’re not waiting for the perfect solution; instead, they’re establishing the infrastructure to support increasingly sophisticated AI workloads.

Most enterprises are still struggling with fragmented tools and siloed approaches, and the market leadership opportunities remain wide open for those who move decisively.

But the window is closing. With each passing quarter, a few more organizations adopt the unified platform approach. Once leaders separate from the pack, catching up becomes exponentially more complicated.

The question isn’t whether AI connectivity matters. It’s whether you’re building on it or falling behind those who are.

The post Why 40% of AI projects will be canceled by 2027 (and how to stay in the other 60%) appeared first on The New Stack.
