2026-06-02
- Add the /voice command to dictate prompts using local speech-to-text models
2026-06-02
Azure Content Understanding in Foundry Tools is Microsoft’s comprehensive content AI service. It ingests diverse data types — documents, audio, images, and video — and extracts the most critical information to power well-grounded, reliable generative AI and agentic solutions. Azure Content Understanding brings together Azure Document Intelligence’s proven traditional AI with advanced LLM-based content reasoning, enabling both structured and unstructured content extraction, as well as multimodal understanding to address your full spectrum of processing needs.
Leading organizations are already using Content Understanding to move from unstructured content to production-scale automation.
DataSnipper is embedding Content Understanding into everyday financial and audit workflows, allowing professionals to work directly with structured data derived from unstructured documents. As Vidya Peters, CEO of DataSnipper, shares, “By building with Azure Content Understanding, DataSnipper is turning unstructured documents into structured, actionable data, directly inside Excel. Together, we are enabling faster reviews, reliable evidence, and AI you can trust.”
FinHero is evolving from traditional document processing approaches with Azure Document Intelligence to more advanced, LLM-powered contextual reasoning using Content Understanding. By leveraging structured outputs across more complex document types and workflows, they are expanding automation beyond basic extraction into richer, end-to-end processing scenarios that support analytics and agent-driven applications.
Wolters Kluwer, for example, is applying CU across tax and financial workflows to provide measurable business outcomes. Adam Orentlicher, SVP CTO at Wolters Kluwer, noted “By integrating Content Understanding into our solutions, our customers turn complex, unstructured data into actionable insights—faster and more accurately. The result is streamlined workflows, less manual effort, and clear, measurable business value from AI.”
The signal from enterprise customers is clear: Azure Content Understanding is how enterprises operationalize unstructured content—at scale, across modalities, and in production.
Azure Content Understanding is advancing across the full developer workflow—from higher-quality extraction with GPT 5.2, to a more unified experience in Microsoft Foundry, to broader native file support and new integrations for agent and Markdown workflows. With SDKs for Python, Java, .NET, JavaScript, and TypeScript that are now generally available, these capabilities are ready to put into practice today across automation, RAG, and document processing scenarios. We’re also sharing an early look at what’s next in July, including new capabilities enabled by the next Content Understanding API version.
Analyzers in Content Understanding are powered by LLM and embedding models you deploy in Microsoft Foundry. At Build, we’re expanding LLM support to include the latest GPT 5 model family (GPT 5.x), starting with GPT 5.2 (available now). With GPT 5.2, custom field extraction is enhanced, avoiding the need for prompt engineering gymnastics. Whether it’s mixed layouts, domain-specific language, or multiple languages, extraction is more accurate right out of the box. Existing analyzers built on GPT 4.1 continue to run unchanged.
The upgrade is a two-step path you can typically complete in under 5 minutes:
As always, we recommend running side-by-side against your existing eval set before flipping production traffic, as confidence scores, latency, and output accuracy can all shift with a new model.
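A side-by-side comparison can be as simple as scoring both analyzers' extracted fields against the same labeled set. The sketch below assumes you have already run each analyzer and collected its output as plain dictionaries. The function names, the sample documents, and the exact-match scoring rule are illustrative assumptions, not part of the Content Understanding SDK.

```python
from typing import Dict, List

Fields = Dict[str, str]


def field_accuracy(predictions: List[Fields], expected: List[Fields]) -> float:
    """Fraction of expected fields whose predicted value matches exactly."""
    total = 0
    correct = 0
    for pred, gold in zip(predictions, expected):
        for name, gold_value in gold.items():
            total += 1
            if pred.get(name) == gold_value:
                correct += 1
    return correct / total if total else 0.0


def compare(baseline: List[Fields], candidate: List[Fields],
            expected: List[Fields]) -> Dict[str, float]:
    """Score both analyzers on the same eval set and report the difference."""
    base = field_accuracy(baseline, expected)
    cand = field_accuracy(candidate, expected)
    return {"baseline": base, "candidate": cand, "delta": cand - base}


# Hypothetical eval set and analyzer outputs (made-up values for illustration).
expected = [
    {"VendorName": "CONTOSO LTD.", "InvoiceDate": "2019-11-15"},
    {"VendorName": "FABRIKAM", "InvoiceDate": "2020-01-03"},
]
baseline_output = [
    {"VendorName": "CONTOSO LTD.", "InvoiceDate": "2019-11-15"},
    {"VendorName": "FABRIKAM INC", "InvoiceDate": "2020-01-03"},
]
candidate_output = [
    {"VendorName": "CONTOSO LTD.", "InvoiceDate": "2019-11-15"},
    {"VendorName": "FABRIKAM", "InvoiceDate": "2020-01-03"},
]

report = compare(baseline_output, candidate_output, expected)
print(report)
```

The same harness is a natural place to record latency and confidence scores per document, so all three signals are compared on identical inputs before traffic is switched.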
Microsoft Foundry brings all of your AI tools into a single, unified environment for building modern AI applications. We’re excited to announce that Content Understanding is now a first-class citizen in the new Microsoft Foundry portal. Instead of stitching together multiple tools and services, developers can now access Foundry models, prebuilt analyzers, and agentic integrations in one place, reducing the friction from experimentation to production.
With Content Understanding prebuilt analyzers now integrated into Foundry, you can:

You’re now ready to move to production. Once you are satisfied with the results, select the key icon to retrieve your endpoint and API key, and use the provided code snippets to integrate the analyzer into your application.
Interested in building a custom analyzer? Click Customize in CU Studio from the resource page.
Learn more: Foundry vs. Content Understanding Studio · Create a Microsoft Foundry resource
The shortest path from “I have a document” to “I have structured data” is the one where you don’t have to convert the file first. Azure Content Understanding now ingests a wider set of file types directly and extracts their context without a separate conversion step.
GET /contentunderstanding/analyzerResults/{operationId}/files/figures/{figureId}
Learn more: Supported document formats
We’re excited to announce that Content Understanding is now integrated with some of the most popular ways developers are building today, including Microsoft Agent Framework, Foundry IQ (Standard mode), LangChain, and MarkItDown. CU meets developers inside their favorite frameworks to make multimodal building easier. With the Microsoft Agent Framework integration, for example, an agent can hand off a PDF or image mid-turn and get back structured fields or layout-aware Markdown without your code needing to orchestrate the call. We’re also bringing CU to open-source tools like MarkItDown, the converter for turning any document into clean Markdown for LLM consumption. By bringing the power of Content Understanding Layout into MarkItDown, developers can generate layout-aware Markdown that preserves key structures like tables, headings, and figure descriptions. CU is also integrated into LangChain, for easily transforming unstructured content into structured Document objects, and into Foundry IQ (Standard mode) for built-in content extraction in Microsoft’s retrieval and agent workflows.
Register Content Understanding as a tool on your agent. The agent’s planner will call it whenever it needs to read a document:
# pip install agent-framework-azure-contentunderstanding
from azure.identity import DefaultAzureCredential  # assumed source of the credential class
from agent_framework_azure_contentunderstanding import (
    ContentUnderstandingContextProvider,
    AnalysisSection,
    ContentLimits,
)

# Minimal setup (uses prebuilt-read analyzer by default)
cu = ContentUnderstandingContextProvider(
    endpoint="https://my-resource.cognitiveservices.azure.com/",
    credential=DefaultAzureCredential(),
)

# Full configuration with a custom analyzer
cu = ContentUnderstandingContextProvider(
    endpoint="https://my-resource.cognitiveservices.azure.com/",
    credential=DefaultAzureCredential(),
    analyzer_id="my-custom-analyzer",
    max_wait=10.0,
    output_sections=[
        AnalysisSection.MARKDOWN,
        AnalysisSection.FIELDS,
        AnalysisSection.FIELD_GROUNDING,
    ],
    content_limits=ContentLimits(max_pages=50, max_file_size_mb=50),
)

# Snippet for use with agent.
# Agent and llm_client are assumed to be defined elsewhere in your application;
# run this inside an async function.
async with cu:
    agent = Agent(client=llm_client, context_providers=[cu])
    response = await agent.run(...)
The agent decides when to call analyze_document; you don’t have to. For domain-specific extraction, swap prebuilt-layout for one of the prebuilt analyzers or your own custom analyzer ID.
Learn more: Microsoft Agent Framework overview · Tool calling patterns
Install MarkItDown and configure it to use Content Understanding as the extraction backend. From there, any file you pass through convert() goes through CU and comes out as clean, layout-aware Markdown:
# pip install 'markitdown[az-content-understanding]'
from markitdown import MarkItDown
# Zero-config — auto-selects analyzer per file type
md = MarkItDown(cu_endpoint="<content_understanding_endpoint>")
result = md.convert("report.pdf") # documents → prebuilt-documentSearch
result = md.convert("meeting.mp4") # video → prebuilt-videoSearch
result = md.convert("call.wav") # audio → prebuilt-audioSearch
print(result.markdown)
# Full configuration with a custom analyzer
md = MarkItDown(
    cu_endpoint="<content_understanding_endpoint>",
    cu_analyzer_id="my-invoice-analyzer",
)
result = md.convert("invoice.pdf")
print(result.markdown)
# Output includes YAML front matter with extracted fields:
# ---
# contentType: document
# fields:
#   VendorName: CONTOSO LTD.
#   InvoiceDate: '2019-11-15'
# ---
# <!-- page 1 -->
# ...
The result is Markdown with headings, tables, and figure descriptions inline — exactly the shape downstream chunkers and embedding models prefer.
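If a downstream step needs the extracted fields separately from the Markdown body, the front matter can be split off before chunking. The following is a minimal sketch that handles only the flat, two-level shape in the sample output, assuming nested fields are indented under `fields:` as in standard YAML. The function name and sample string are hypothetical, and a real pipeline would more likely use a full YAML parser.

```python
from typing import Dict, Optional, Tuple


def split_front_matter(markdown: str) -> Tuple[Dict[str, object], str]:
    """Split '---' delimited front matter from the Markdown body.

    Only supports top-level 'key: value' pairs and one level of
    indented 'key: value' pairs under a bare 'key:' line.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, markdown

    meta: Dict[str, object] = {}
    current: Optional[Dict[str, str]] = None
    for raw in lines[1:end]:
        if not raw.strip():
            continue
        key, _, value = raw.strip().partition(":")
        value = value.strip().strip("'\"")
        if raw.startswith((" ", "\t")) and current is not None:
            current[key] = value          # nested entry, e.g. a field
        elif value == "":
            current = {}                  # start of a nested mapping
            meta[key] = current
        else:
            current = None
            meta[key] = value             # plain top-level entry
    body = "\n".join(lines[end + 1:])
    return meta, body


# Hypothetical sample mirroring the shape of the output shown above.
sample = (
    "---\n"
    "contentType: document\n"
    "fields:\n"
    "  VendorName: CONTOSO LTD.\n"
    "  InvoiceDate: '2019-11-15'\n"
    "---\n"
    "<!-- page 1 -->\n"
    "# Invoice\n"
)
meta, body = split_front_matter(sample)
print(meta["fields"])
```

Keeping the fields out of the body avoids embedding the same values twice when the Markdown is chunked for retrieval.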
Learn more: MarkItDown on GitHub · Build a RAG solution with Content Understanding
Everything above is available for you to try out today, but we have even more exciting features coming in July.
Here’s a sneak peek at what’s landing in July 2026:
If you’re attending Build 2026, join us at Session BRK242 — “Turn your agents into action” (recorded), where we’ll go deep on the agentic understanding mode demo and the Foundry IQ integration. If you’re not at Build, the recording will be online right after the session, and we’ll publish a follow-up dev blog when the July release ships, including more working code, region availability, and migration guidance for every customer currently on GPT 4.x.
The post Build smarter document workflows: What’s new in Azure Content Understanding at Build 2026 appeared first on Microsoft Foundry Blog.
Microsoft announced two new text LLMs this morning - MAI-Thinking-1 (reasoning, 35B parameters, available to "select early partners") and MAI-Code-1-Flash (5B parameters, "purpose-built for GitHub Copilot and VS Code to deliver high performance and lower cost [...] rolling out to GitHub Copilot individual users in Visual Studio Code"). I've not been able to try either of them just yet.
It's very interesting to see Microsoft releasing models with such low parameter counts, especially given how expensive larger models are to access right now. They claim MAI-Thinking-1 "is preferred to Sonnet 4.6 in our blind human side-by-side evaluations", which is impressive for a 35B model seeing as I frequently run models larger than that on my own laptop.
Tags: llm-release, generative-ai, ai, microsoft, llms

Today we’re announcing azure-functions-skills in public preview: a one-command way to give your favorite coding agent (GitHub Copilot CLI, Claude Code, Codex CLI, VS Code) the skills, agent definition, MCP servers, hooks, and instructions it needs to ship secure-by-default, scale-ready Azure Functions — end-to-end.
AI coding agents now write the first draft of your function, scaffold the infrastructure, and run the deploy command. But ask a general-purpose agent to build for Azure Functions and the output is usually a step behind. It leans on older programming models that have been superseded, and it has no knowledge of newer capabilities: the serverless agents runtime, Flex Consumption defaults, the new Azure MCP template service, the latest binding shapes, this week’s runtime improvements, or Go language support. Worse, the code it produces often leaves hardcoded keys, connection strings, and other secrets sitting in your function for you to clean up later, picks patterns that don’t scale (client-per-invocation, blocking I/O on the hot path), and skips identity-based access entirely. The code compiles, but it isn’t secure, isn’t current, and isn’t using what Azure Functions offers today.
azure-functions-skills closes that gap. The skills steer the agent toward managed identity, Key Vault references, Flex Consumption, and the binding and concurrency patterns that scale — and the built-in doctor catches the rest before deploy.
Try it now: npx @azure/functions-skills install
In about 5 minutes you’ll have a working Functions project scaffolded with managed identity, a deploy-ready workflow, and a doctor HTML report you can wire into CI.
Requirements: Node 18+, an Azure subscription, and one of: GitHub Copilot CLI, Claude Code, Codex CLI, or VS Code.
Availability:
azure-functions-skills is in public preview on npm as @azure/functions-skills and on the GitHub Copilot CLI / Claude Code / Codex plugin marketplaces. The skill set is intentionally small at launch and will grow with each Azure Functions release.
azure-functions-skills is a plugin for AI coding agents. It builds on the broader azure-skills plugin for cross-Azure scenarios, and it ships:
- Skills (setup, create, deploy, diagnostics, best-practices, health-status, inventory, doctor, feedback).
- An agent definition (functions-copilot) that routes user requests to the right skill and proposes the next workflow when one finishes.
- Instruction files (copilot-instructions.md, CLAUDE.md, AGENTS.md). Everything the agent needs to behave consistently across hosts.
- A CLI, @azure/functions-skills, that installs all of the above with one command, lets you run the agent (chat), and validates your project before deployment (doctor).

Names you’ll see in this post:
- @azure/functions-skills: the npm package and CLI you run.
- azure-functions-skills: the plugin (skills + instructions) the CLI installs.
- functions-copilot: the agent definition that routes you to the right skill.
Two design choices shape every feature:
- azure-skills plugin rather than reinvent it.

The azure-functions-agents skill is included from launch and supports the Azure Functions serverless agents runtime that just launched at Build 2026.
| Skill | What it does |
|---|---|
| azure-functions-setup | Detects Azure CLI / azd / Core Tools / language runtimes / the azure-skills plugin on your machine and walks you through installing what’s missing. |
| azure-functions-create | Scaffolds new Functions projects, or adds functions to an existing project, using the Azure MCP template service so you always start from the latest templates. |
| azure-functions-agents | Scaffolds, extends, deploys, and troubleshoots event-driven AI agents on the Azure Functions serverless agents runtime (azurefunctions-agents-runtime) that just launched at Build 2026. Picks the best deployable GPT model based on subscription / region quota, wires Microsoft Foundry, Connector Namespaces, and remote MCP servers, and offloads code execution or web browsing to Azure Container Apps dynamic sessions. |
| azure-functions-deploy | Hands off to the azure-skills prepare → validate → deploy workflow with Functions-specific guidance (Flex Consumption, functionAppConfig, private networking, identity). |
| azure-functions-best-practices | Reviews an existing Function App against current best practices and proposes prioritized, approval-gated remediations. |
| azure-functions-diagnostics | Investigates deployment failures, runtime errors, trigger / binding issues, and logging gaps. |
| azure-functions-health-status | Collects the current running state, metrics, Application Insights signals, Resource Health, and Activity Log. |
| azure-functions-inventory | Collects static specifications: SKU, runtime, networking, identity, settings, functions, and trigger inventory. |
| azure-functions-doctor | Pre-deployment validation, used by the doctor CLI command below. |
| azure-functions-feedback | Turns observations from a session into a previewed GitHub issue or PR against this repo. |
The set is intentionally small at launch. It already includes azure-functions-agents so you can scaffold and deploy on the Azure Functions serverless agents runtime that just launched at Build 2026. A skill to assist migrating worker code to Go is next.
Have a skill you’d like to see? Open an issue at https://github.com/Azure/azure-functions-skills/issues, or just run azure-functions-feedback mid-session and the skill itself will prepare the issue draft for you.
Each AI coding agent has its own plugin install flow, and several of them spread the work across multiple steps. The GitHub Copilot CLI plugin, in particular, can only be installed at user scope. That’s useful for skills, but not what you want for project-specific MCP servers, hooks, or instruction files that should live with your repository.
install collapses all of that into one command and applies the right split by default:
- CLAUDE.md / AGENTS.md) → the current directory. Committable alongside your code.

This keeps your user-scope agent context clean and makes the Azure Functions skills findable every time you open the workspace. If you want everything in the project, add --local:
# GitHub Copilot CLI (default: plugin user-scope, workspace artifacts here)
npx @azure/functions-skills install --agent ghcp
# Everything in the project
npx @azure/functions-skills install --agent ghcp --local
Use --agent claude for Claude Code or --agent codex for Codex CLI. The CLI also absorbs future plugin-flow changes so the command stays stable for users.
chat launches your installed agent of choice, already wired into the functions-copilot agent definition.
npx @azure/functions-skills chat
A typical first message looks like this:
“Create a Python HTTP trigger that reads from Cosmos DB using managed identity, and add a Service Bus output binding.”
The agent picks the right skills (create, then best-practices), uses the Azure MCP template service for the latest scaffold, and wires identity-based access by default. No keys in your repo.
The first time you run chat in a workspace, the setup skill auto-fires. It walks through prerequisites (Azure CLI, Azure Developer CLI, Core Tools, language runtimes, the azure-skills plugin) and offers to install anything missing, so a developer brand-new to Azure Functions can get to a working environment without bouncing between docs.
After setup, the agent suggests the most useful next skill based on your project state, which makes the rest of the catalog easy to discover.

Everything after -- is passed through to the underlying agent, so any agent-native flag you rely on still works. Subsequent chat runs skip setup because the per-workspace state lives under .azure-functions-skills/.
VS Code users get the same experience: open the workspace, pick the functions-copilot agent, and run the setup skill from there.

Do you know the top two causes of Azure Functions support incidents reported to our team?
Together, they account for roughly half of the Azure Functions support incidents we see internally — based on our analysis of Customer Reported Incidents (CRIs) in Q1 CY2026, about 53% were related to customer code or configuration issues. Preventing this class of issue before deploy time eliminates a large fraction of the problems customers report.
doctor checks a workspace for exactly those issues. It runs in two tiers:
- Tier 1 (deterministic): host.json shape, runtime version, trigger configuration, extension bundle range, deprecated settings, lockfile presence, tracked .env files, and a set of supply-chain checks (lifecycle scripts, unpinned production dependencies, install-script dependencies, and more) informed by the recent npm / PyPI compromises.
- Tier 2 (semantic, --deep): Uses your coding agent to find issues that can only be found by reading the code: client-per-invocation patterns, blocking I/O on the hot path, hardcoded secrets, Durable Functions non-determinism (Date.now(), Math.random(), network calls in orchestrators), credential collection patterns, and more.

Run it locally and get a self-contained HTML report (the --deep --accept-deep-risk flags opt into Tier 2 LLM checks; safe to run locally, see the CI note below before using in pipelines):
npx @azure/functions-skills doctor --dir . \
--deep --accept-deep-risk \
--agent github-copilot \
--format html --output doctor-report.html
A representative run looks like this:
Tier 1 (deterministic)
✓ host.json shape ok
✓ runtime version pinned (~4)
⚠ extension bundle range too broad host.json:5
⚠ unpinned production dependency semver:^7.0.0 → pin to 7.5.4
✗ tracked .env file with secret keys .env:3
Tier 2 (semantic, via --deep)
⚠ blocking I/O on hot path app/orders.py:42 (use async client)
✗ hardcoded connection string app/cosmos.py:11 (use Key Vault reference)
⚠ client-per-invocation pattern app/blob.py:18 (hoist client to module scope)
Summary: 2 critical, 4 warnings — see doctor-report.html
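To make the "client-per-invocation" finding concrete, here is a small sketch of the pattern and the "hoist client to module scope" remedy named in the report. `FakeBlobClient` is a hypothetical stand-in that only counts constructions; it is not an Azure SDK class, and the handlers are plain functions rather than real Functions triggers.

```python
from typing import Callable


class FakeBlobClient:
    """Stand-in for an SDK client; counts how many times it is constructed."""

    instances = 0

    def __init__(self) -> None:
        FakeBlobClient.instances += 1

    def read(self, name: str) -> str:
        return f"contents of {name}"


def handler_per_invocation(name: str) -> str:
    # Anti-pattern: a new client (and its connections) on every call.
    client = FakeBlobClient()
    return client.read(name)


# Preferred: create the client once at module scope and reuse it.
_shared_client = FakeBlobClient()


def handler_shared(name: str) -> str:
    return _shared_client.read(name)


def count_constructions(handler: Callable[[str], str], calls: int) -> int:
    """Return how many clients were constructed while serving `calls` requests."""
    FakeBlobClient.instances = 0
    for i in range(calls):
        handler(f"blob-{i}")
    return FakeBlobClient.instances


per_call = count_constructions(handler_per_invocation, 100)
shared = count_constructions(handler_shared, 100)
print(f"clients created per 100 calls: per-invocation={per_call}, shared={shared}")
```

With a real SDK client, each construction typically brings its own connection setup, which is why the per-invocation form tends to degrade under load.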

The same command can run in CI. Wire it into your deployment pipeline and you have shift-left for the configuration and code-quality issues that drive the majority of incidents, caught while the developer (or the agent acting for them) can still fix the diff cheaply.
--deep runs the coding agent with file-write and shell-execution permissions, so any input the agent sees becomes a potential prompt-injection surface. We default to refusing --deep on pull_request events. You can opt in with AZURE_FUNCTIONS_DOCTOR_TRUST_PR=1 for trusted mirror pipelines.
The recommended pattern:
- --no-deep (Tier 1 only). Fast, deterministic, safe to run on untrusted PR content.
- --deep on push: main, ideally gated behind a GitHub Environment with required reviewers and a scoped secret for the agent token.

See docs/doctor-guide.md and SECURITY.md for the full security model.
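The policy described above can be sketched as a small decision function that a pipeline wrapper might use to pick doctor flags. This is an illustration of the described behavior, not the CLI's own implementation. It assumes the standard GitHub Actions variables `GITHUB_EVENT_NAME` and `GITHUB_REF`, and the function name is hypothetical.

```python
from typing import List, Mapping


def choose_doctor_flags(env: Mapping[str, str]) -> List[str]:
    """Pick Tier 1 or Tier 2 flags for the doctor command based on CI context."""
    event = env.get("GITHUB_EVENT_NAME", "")
    ref = env.get("GITHUB_REF", "")
    trusted_pr = env.get("AZURE_FUNCTIONS_DOCTOR_TRUST_PR") == "1"

    deep = ["--deep", "--accept-deep-risk"]
    if event == "push" and ref == "refs/heads/main":
        return deep                      # trusted content already merged to main
    if event == "pull_request" and trusted_pr:
        return deep                      # explicit opt-in for trusted mirrors
    return ["--no-deep"]                 # default: deterministic checks only


pr_flags = choose_doctor_flags({"GITHUB_EVENT_NAME": "pull_request"})
main_flags = choose_doctor_flags(
    {"GITHUB_EVENT_NAME": "push", "GITHUB_REF": "refs/heads/main"}
)
print(pr_flags, main_flags)
```

Defaulting to `--no-deep` for anything unrecognized keeps the agent away from untrusted input unless a pipeline owner has opted in deliberately.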
| When you want to… | Use |
|---|---|
| Get your local environment ready for Functions development | azure-functions-setup |
| Start a new project or add a function | azure-functions-create |
| Build a scheduled or event-driven AI agent (daily briefing, inbox digest, connector-triggered workflow) | azure-functions-agents |
| Deploy to Azure | azure-functions-deploy |
| Catch problems before deployment | doctor CLI (or azure-functions-doctor) |
| Review an existing app against current best practices | azure-functions-best-practices |
| Investigate a failing or misbehaving Function App | azure-functions-diagnostics |
| Check the live health of a running app | azure-functions-health-status |
| Send us feedback or a feature request | azure-functions-feedback |
functions-copilot routes your request to the appropriate skill, and proposes the next step after each workflow.
Pick the agent you already use; the rest of the flow is the same.
# 1. Install the plugin (default: skills at user scope, workspace artifacts here)
npx @azure/functions-skills install --agent ghcp # GitHub Copilot CLI
npx @azure/functions-skills install --agent claude # Claude Code
npx @azure/functions-skills install --agent codex # Codex CLI
# 2. Launch the agent (setup skill auto-fires on first run)
npx @azure/functions-skills chat
# 3. Validate before deploy (--deep enables Tier 2 LLM checks; safe locally, see CI note)
npx @azure/functions-skills doctor --deep --accept-deep-risk \
--agent github-copilot \
--format html --output doctor-report.html
VS Code: after step 1, open the workspace in VS Code, select the functions-copilot agent in GitHub Copilot Chat, and run the setup skill. Same first-run experience as chat, just inside the IDE.
Prefer the skills scoped to the current project only? Add --local to step 1.
Full docs, CI recipes, and the supply-chain check reference live at https://github.com/Azure/azure-functions-skills.
azure-functions-skills is open source, MIT licensed, and developed in the open. The repository is the right place to:
- azure-functions-feedback mid-session and have the skill prepare the draft for you.
- CONTRIBUTING.md.

Repository: https://github.com/Azure/azure-functions-skills
We’re building the AI-era developer experience for Azure Functions in the open. Star the repo, open an issue, or run azure-functions-feedback mid-session and have the skill draft the issue for you. Tell us what to ship next.
The post Introducing azure-functions-skills: An AI-Era Workspace for Azure Functions (Preview) appeared first on Azure SDK Blog.
We released a set of AWS SDK Skills as part of the open-source Agent Toolkit for AWS. These are AI skills that teach coding agents how to follow AWS SDK best practices. The project is available on GitHub under the Apache-2.0 license.
AI coding agents know the general shape of AWS SDK usage, but they get the details wrong. They generate incorrect API names, use incorrect parameter types, and miss SDK-specific patterns like paginators, waiters, and high-level APIs such as the transfer manager for Amazon Simple Storage Service (Amazon S3). These errors are especially common for newer SDKs like the AWS SDK for Swift, where agents generate code that looks plausible but fails to compile.
As developers increasingly rely on AI agents to write AWS SDK code, we need to make sure those agents produce code that compiles, follows best practices, and uses each SDK the way it was intended to be used.
Skills are modular packages that give AI coding agents specialized SDK knowledge. Each skill is authored by the SDK team that owns the language, so it reflects the things agents consistently get wrong for that specific SDK. A skill includes:
- SKILL.md: core instructions with SDK-specific patterns and concrete examples
- references/: on-demand documentation for deeper topics, loaded only when needed
- scripts/: automation for build, test, and validation workflows

Skills are agent-agnostic. They work with any coding agent that supports the open skills format.
Code that doesn’t compile. This is the most common failure mode for newer SDKs where the agent’s training data is thin or out of date. The AWS SDK for Swift uses Swift concurrency throughout. Operations are async-throwing, and so are the convenience client constructors. Agents frequently miss this and produce code that looks reasonable but doesn’t build:
// What agents tend to write. Does not compile.
let client = S3Client()
let response = client.listBuckets(input: ListBucketsInput())
Both lines are wrong: S3Client() is async throws, and so is listBuckets. With the Swift skill installed, the agent writes the modern Swift concurrency form:
let config = try await S3Client.S3ClientConfig(region: "us-west-2")
let client = S3Client(config: config)
let response = try await client.listBuckets(input: ListBucketsInput())
The first version sends the developer back to the docs to figure out why a plausible-looking line won’t build. The second one runs.
Code that runs but performs poorly or costs more. Agents often skip SDK features that exist precisely to make AWS calls efficient: paginators for ListObjects and similar APIs, waiters for resource-state polling, and the SDK’s high-level file methods like upload_file / download_file for large transfers. A handwritten loop that calls ListObjects without pagination silently drops results past the first page, polling code without waiters burns API calls and risks throttling, and manual file I/O for S3 transfers gives up multipart uploads and parallelism. The code compiles and often appears to work in small tests, but breaks once you’re dealing with real data volumes. With a skill installed, the agent reaches for the right SDK feature for the job: paginators for list operations, waiters for state polling, and the high-level transfer methods for files.
Code that runs but has subtle bugs. Manually marshalling DynamoDB types like {"S": "value"} is easy to get slightly wrong in ways that fail only on certain inputs. Catching a generic Exception instead of typed exceptions like ConditionalCheckFailedException makes retry logic swallow real failures. With a skill installed, the agent reaches for the document client (which handles the conversion correctly) and uses typed exceptions tied to the actual operations it’s calling.
We evaluate each skill against a benchmark of real SDK tasks (Amazon S3 operations, Amazon DynamoDB queries, client configuration, presigned URL generation, credential management) and grade the generated code on whether it compiles, passes lint, and actually does what the task asked for (judged by an LLM). Every task runs twice: once with no skill installed, and once with the relevant skill loaded.
Across our test suite, code generated with a skill installed consistently passed more checks than code generated without one.
The following table summarizes the skills available at launch:
| Skill | SDK | What it covers |
|---|---|---|
| aws-sdk-swift-usage | AWS SDK for Swift | Async patterns, struct-based config types, client initialization |
| aws-sdk-js-v3-usage | AWS SDK for JavaScript v3 | Package structure, client styles, middleware, runtime validation |
| aws-sdk-python-usage | Boto3 / botocore | Client vs. resource interfaces, paginators, waiters, error handling |
You’ll need a coding agent that supports the open skills format. To install a skill from the Agent Toolkit for AWS, run:
npx skills add aws/agent-toolkit-for-aws/skills --skill <skill>
Replace <skill> with the one you want:
- aws-sdk-swift-usage
- aws-sdk-js-v3-usage
- aws-sdk-python-usage

Or pass --skill multiple times to install more than one.
If your favorite SDK is missing or you’ve seen agents make mistakes that aren’t covered yet, open an issue or submit a skill. Visit the repository on GitHub to try it out.