Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
152265 stories
·
33 followers

7 tips to optimize Azure Cosmos DB costs for AI and agentic workloads

1 Share

AI apps and agentic workloads expose inefficiencies in your data layer faster than any previous generation of apps. You’re storing embeddings, serving low-latency retrieval, handling bursty traffic from chat and orchestration, and often operating across regions. Done right, Azure Cosmos DB can support these patterns with high performance and cost controls built in. Done wrong, it is easy to over-provision or pay for inefficient queries.

Insights from a recently published Azure Cosmos DB cost savings white paper, combined with guidance from Microsoft leader John Savill and real-world feedback from Azure Cosmos DB customers, reveal a clear pattern. The teams that succeed financially do so by aligning Azure Cosmos DB design decisions to workload behavior early, then refining those decisions as applications scale.

Below are seven practical, field-tested tips to help you scale AI applications on Azure Cosmos DB while keeping costs under control.

Tip 1: Start free for dev/test so you’re not burning budget before launch

A surprisingly common cost trap is paying for non-production environments longer than necessary. Consider using two levers with Azure Cosmos DB: Free Tier and the Emulator. Each subscription gets a set amount of throughput and storage free each month, and you can develop locally with the emulator at zero cloud cost.
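To make the local-first workflow concrete, here is a minimal sketch of a connection helper that defaults to the Cosmos DB emulator and switches to a real account only when environment variables are set. The emulator's endpoint and authorization key below are the fixed, publicly documented values (the key is not a secret); the environment variable names are my own convention, not an SDK requirement.

```python
import os

# Well-known local emulator endpoint and auth key (published by Microsoft;
# the key is fixed for every emulator install and is not a secret).
EMULATOR_ENDPOINT = "https://localhost:8081"
EMULATOR_KEY = (
    "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw=="
)

def cosmos_connection_settings() -> dict:
    """Return connection settings: the local emulator by default, or a real
    account when COSMOS_ENDPOINT / COSMOS_KEY are set (names are our own)."""
    endpoint = os.environ.get("COSMOS_ENDPOINT", EMULATOR_ENDPOINT)
    key = os.environ.get("COSMOS_KEY", EMULATOR_KEY)
    return {"url": endpoint, "credential": key}
```

In a dev loop you would pass these to the azure-cosmos SDK, e.g. `CosmosClient(**cosmos_connection_settings())`, so the same code runs against the emulator for free and against the cloud account in production.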

Azure Cosmos DB reviewers on PeerSpot frequently mention ease of setup and cost management as part of the value story. A senior director of product management at Sitecore states that “the search, configuration, and ease of cost management have been a really great experience…Azure Cosmos DB has reduced our total cost of ownership significantly, allowing us to sell our product at extremely competitive pricing.”

In addition to easy setup and management, customers rely on Azure Cosmos DB to eliminate traditional database friction. Users benefit from:

  • No schema management – flexible JSON documents let you evolve the schema in application code
  • Automatic indexing – no need for manual tuning
  • Rich SDKs across all major languages, including Python, Node.js, Go, .NET, and Java
  • Serverless and autoscale – no need to manage capacity

Tip 2: Pick the right throughput mode early, then change it as needs evolve

Understanding Azure Cosmos DB service options and throughput modes is foundational: the free tier, serverless, provisioned (manual) throughput, and autoscale are distinct choices with different cost profiles.
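As a rough decision aid, the trade-off between the throughput modes can be sketched as a heuristic. The thresholds below are illustrative placeholders, not Microsoft guidance; tune them against your own traffic before relying on them.

```python
def pick_throughput_mode(avg_rups: float, peak_rups: float) -> str:
    """Illustrative heuristic for choosing a Cosmos DB throughput mode.
    Thresholds are placeholders -- tune them to real traffic, not defaults."""
    if peak_rups < 1000 and avg_rups < 100:
        return "serverless"            # intermittent, low traffic: pay per request
    if peak_rups >= 2 * avg_rups:
        return "autoscale"             # spiky: scale within a max-RU/s range
    return "provisioned (manual)"      # steady: fixed RU/s is cheapest
```

The point is not the exact numbers but the shape of the decision: intermittent workloads lean serverless, spiky workloads lean autoscale, and flat workloads are cheapest on manual provisioned throughput (or reserved capacity).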

The white paper translates this into clear guidance:

Novo Nordisk avoided paying for unused capacity by using serverless with a redesigned data model. Simon Kofod, Lead Software Developer at Novo Nordisk, said, “We went from a database that would set us back $240 per month to Azure Cosmos DB that costs less than a buck per month. And we can multiply this saving by four because we have four environments.”

Tip 3: If your AI traffic is spiky, autoscale is often the highest-impact cost lever

AI app demands tend to be uneven: launches, feature rollouts, prompt changes, batch jobs, or simply time-of-day usage can create bursts. Autoscale allows Azure Cosmos DB to scale up and down automatically within a defined range, helping you avoid overprovisioning for peak capacity that you rarely use. For workloads with steady, predictable usage, manual provisioned throughput or reserved capacity may deliver better long-term efficiency.
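A detail worth internalizing when sizing autoscale: it scales between 10% of the configured maximum and the maximum itself, and the maximum must be at least 1,000 RU/s in increments of 1,000. A tiny sketch of that floor calculation:

```python
def autoscale_range(max_rups: int) -> tuple[int, int]:
    """Autoscale scales between 10% of the configured maximum RU/s and the
    maximum itself, so a 4,000 RU/s ceiling idles at 400 RU/s."""
    if max_rups < 1000 or max_rups % 1000:
        raise ValueError("autoscale max must be >= 1000 RU/s in increments of 1000")
    return (max_rups // 10, max_rups)
```

Because you are billed for the highest RU/s reached in each hour, a ceiling sized for rare spikes still only costs the 10% floor during quiet hours, which is exactly why autoscale suits bursty AI traffic.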

Kinectify’s platform sees “very spiky load patterns,” and they describe their goal clearly: scale up fast, scale down when quiet, optimize cost.

Michael Calvin, CTO of Kinectify, said, “We have large volumes of data coming in and very spiky load patterns. So we needed a solution that could scale quickly and also scale down when we weren’t receiving those traffic patterns so we can optimize cost… auto-scaling has been invaluable for optimizing both cost and performance on our platform daily.”

Kinectify also implemented a tenant-based logical partition key and paired it with autoscale so they could share throughput across tenants while keeping the platform efficient.

Tip 4: Treat partitioning as a cost decision, not just a scale decision

Partition strategy determines whether your queries stay efficient or fan out across physical partitions. John Savill explicitly calls out the importance of choosing a partition key with high cardinality.

Veeam’s implementation connects cost directly to partition-aware architecture and search scope: “What Azure Cosmos DB does for us is deliver low operational overhead with infinite scaling capability… We can narrow down our search to a very limited space within physical partitions, and this saves costs and decreases the latency,” (Zack Rossman, Staff Software Engineer, Veeam).

Veeam used autoscale plus a hierarchical partitioning strategy to distribute billions of items without hot spots, keeping queries efficient at massive scale.
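The hierarchical approach can be sketched as a container definition. The shape below follows the Cosmos DB hierarchical partition key format (kind "MultiHash", version 2); the property names `tenantId` and `userId` are hypothetical, standing in for whatever a multi-tenant platform would key on.

```python
# Hypothetical container definition for a multi-tenant workload: a hierarchical
# partition key (kind "MultiHash") spreads tenants across physical partitions
# while letting queries scope to a tenant, or tenant + user, prefix.
container_definition = {
    "id": "events",
    "partitionKey": {
        "paths": ["/tenantId", "/userId"],  # broadest level first
        "kind": "MultiHash",
        "version": 2,
    },
}
```

Queries that supply the full key hit a single logical partition; queries that supply only the tenant prefix stay scoped to that tenant's partitions instead of fanning out across the whole container, which is the cost and latency win Veeam describes.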

Shahid Syed, director of technology at Unite Digital, touts Azure Cosmos DB partition-based scaling as having “significantly reduced costs of over $25,000 per month with minimal effort.”

Tip 5: Optimize RU consumption by aligning data models, queries, and indexing with access patterns

Request Units (RUs) are the currency of Azure Cosmos DB. Every read, write, and query consumes RUs, and inefficient operations increase costs even if provisioned throughput (RU/s) looks reasonable on paper.

John Savill emphasizes the importance of understanding which operations consume the most RUs and why. AI applications often rely on complex queries, vector searches, or high-volume reads and writes, all of which can drive RU usage if not carefully designed. Reducing document size where possible can lower storage costs and reduce RU consumption for reads and writes. In some scenarios, separating large embeddings from frequently accessed metadata can lead to more efficient access patterns.
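Measuring is the first step: every Cosmos DB response reports its RU cost in the `x-ms-request-charge` header. A minimal sketch of a meter that aggregates those charges per operation and flags anything over a budget; the budget value and operation labels are illustrative, and you would feed it charges read from real responses.

```python
from collections import defaultdict

class RuMeter:
    """Aggregate per-operation RU charges (as read from each response's
    x-ms-request-charge header) and flag anything over a budget."""
    def __init__(self, budget_per_op: float = 50.0):  # threshold is illustrative
        self.budget = budget_per_op
        self.totals: dict[str, float] = defaultdict(float)
        self.hot: list[tuple[str, float]] = []

    def record(self, operation: str, charge: float) -> None:
        self.totals[operation] += charge
        if charge > self.budget:
            self.hot.append((operation, charge))
```

Run this in dev/test against representative traffic, and the `hot` list tells you exactly which queries or writes deserve modeling or indexing attention before they drive the bill.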

The goal is not to prematurely optimize, but to ensure that document design reflects how data is actually used by inference paths, retrieval workflows, and downstream systems.

Novo Nordisk found a concrete modeling change reduced consumption and improved performance: instead of storing tasks as separate documents, they redesigned so an entire checklist and tasks lived in a single document.
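A sketch of that modeling change, with hypothetical field names: embedding tasks inside the checklist document turns "fetch a checklist and its tasks" into a single point read instead of a cross-document query.

```python
# Illustrative document shape (names are ours, not Novo Nordisk's): the
# checklist embeds its tasks, so one point read returns everything.
checklist_doc = {
    "id": "checklist-42",
    "lineId": "line-7",          # partition key value
    "tasks": [
        {"taskId": 1, "title": "Inspect valve", "done": True},
        {"taskId": 2, "title": "Log reading", "done": False},
    ],
}

def open_tasks(doc: dict) -> list[dict]:
    """All tasks in a checklist that are not yet complete."""
    return [t for t in doc["tasks"] if not t["done"]]
```

The trade-off is document size: this works when a checklist's tasks are bounded and usually read together, which is precisely the "align the model with access patterns" point above.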

RU consumption is not driven by queries alone – it is a combined effect of data modeling, indexing strategy, and how consistently your access patterns align with your partitioning model.
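Indexing is a concrete lever here: Cosmos DB indexes every property by default, and large vectors you never filter on inflate every write's RU charge. A sketch of an indexing policy that excludes a hypothetical `/embedding` property (the path name is illustrative) while keeping default indexing elsewhere:

```python
# Indexing policy sketch: keep default indexing for query paths but exclude a
# hypothetical /embedding property so large vectors don't inflate write RU
# charges. (Vector fields get a dedicated vector index when search is needed.)
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [
        {"path": "/embedding/*"},
        {"path": '/"_etag"/?'},
    ],
}
```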

Tip 6: Avoid paying for extra services you don’t need in your AI retrieval pipeline

Running multiple databases for related workloads or vector search often introduces unnecessary cost and complexity: duplicate throughput, fragmented data access, and higher operational overhead. By consolidating related datasets into a single Azure Cosmos DB account, teams can significantly improve cost efficiency without sacrificing scale or performance.

Lead Cloud Architect at Solliance Joel Hullen says, “[H]aving the vector store in Microsoft Azure Cosmos DB makes a lot of sense because the vector store lives in line with the data. It is in the same workspace and the same region. We do not have to worry about ingress and egress charges because with it being co-located with our data, we are going to have better performance.”
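With vectors living next to the operational data, retrieval becomes a single query in the Cosmos DB NoSQL query language using `VectorDistance`. A sketch, where the property name `c.embedding` and the toy query vector are illustrative:

```python
# Vector search sketch in the Cosmos DB NoSQL query language: VectorDistance
# scores items against a query embedding passed as a parameter. The property
# name (c.embedding) and the toy vector are illustrative.
vector_query = """
SELECT TOP 5 c.id, c.title,
       VectorDistance(c.embedding, @queryVector) AS score
FROM c
ORDER BY VectorDistance(c.embedding, @queryVector)
"""
parameters = [{"name": "@queryVector", "value": [0.01, -0.02, 0.03]}]
```

Because the same query can also filter on metadata in `c`, there is no second hop to a separate vector store, no duplicated throughput, and no cross-service egress.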

Consolidation allows you to:

  • Share throughput (RU/s) across workloads instead of over-provisioning each database independently
  • Reduce management and monitoring overhead by operating within a unified data plane
  • Enable simpler querying and data access patterns when applications need to reason across datasets
  • Eliminate duplicate infrastructure costs that add up quickly as systems scale

This pattern is particularly effective for SaaS platforms, internal line-of-business apps, and analytics-heavy workloads where datasets are highly related and benefit from centralized access.

Tip 7: Keep multi-region setups aligned to actual traffic patterns

Multi-region is a superpower, but it can quickly become expensive if regions are added by default rather than based on actual user demand. To keep costs under control, Azure Cosmos DB multi‑region setups should be intentionally aligned to where traffic is truly coming from.

A cost‑aware multi‑region strategy includes:

  • Adding regions only where there is sustained read or write traffic, rather than pre‑provisioning global coverage
  • Regularly reviewing per‑region usage metrics to identify regions that are underutilized
  • Using a single write region with selective read regions when user activity is geographically skewed
  • Removing or consolidating regions as traffic patterns change over time

Check out how provisioned throughput multiplies across regions and how to reason about the multi-region cost model.
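The core arithmetic is simple: provisioned throughput is billed per region, so every region you add multiplies the throughput bill. A back-of-envelope sketch; the per-100-RU/s hourly price below is a placeholder, not a quoted rate, so plug in current pricing for your regions.

```python
def monthly_throughput_cost(rups: int, regions: int,
                            price_per_100rups_hour: float = 0.008) -> float:
    """Back-of-envelope: provisioned RU/s are billed in every region, so each
    added region multiplies the throughput bill. Price is a placeholder."""
    hours_per_month = 730  # average hours in a month
    return (rups / 100) * price_per_100rups_hour * hours_per_month * regions
```

At the placeholder rate, 1,000 RU/s in one region is roughly $58/month; replicate it to three regions and the same throughput costs roughly three times as much, before storage, which is why trimming idle regions is such a direct saving.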

Scale AI without the surprise bill

Azure Cosmos DB provides the foundation required to build low-latency AI apps that reach any scale. The insights above underscore an important point: cost efficiency comes from intentional design, not shortcuts.

By choosing the right throughput model, optimizing RU usage, designing effective partitions, and aligning deployments with how apps behave in the real world, teams can support demanding AI workloads without sacrificing financial predictability.

The result is an AI-ready data platform that grows with your ambitions and stays aligned with your budget.

 

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of data, including vector data.

To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn. Join the discussion with other developers on the #nosql channel on the Microsoft Open Source Discord.

The post 7 tips to optimize Azure Cosmos DB costs for AI and agentic workloads appeared first on Azure Cosmos DB Blog.

Read the whole story
alvinashcraft
just a second ago
reply
Pennsylvania, USA
Share this story
Delete

2.7.2: Add admin protection error message for shadow admin scenarios (#40170)

1 Share
  • Add admin protection error message for shadow admin scenarios

When Windows Admin Protection is enabled, the elevated process runs as a
shadow admin with a different SID, so distributions registered under the
real user are not visible. Surface an informational message in two cases:

  1. Launching a distribution by name that is not found (WSL_E_DISTRO_NOT_FOUND)
  2. Listing distributions when none are registered (WSL_E_DEFAULT_DISTRO_NOT_FOUND)
  • formatting

  • Show admin protection message for non-elevated users too

When Admin Protection creates a shadow admin, distros registered under
the real user are invisible to the shadow admin and vice versa. Remove
the elevation check so the informational message appears for both
elevated and non-elevated callers.



Co-authored-by: Ben Hillis <benhill@ntdev.microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


DIY To Power Our Planet: An (Mini) Earth Day Field Guide

1 Share
DIY To Power Our Planet: An (Mini) Earth Day Field Guide

Earth Day 2026 isn’t about lofty pledges or abstract policy—it’s about what you can build, hack, repair, and share today. The theme, “Our Power, Our Planet,” shares some key tenets of Maker culture, elevating tools over talk, prototypes over promises, and communities over complacency. It’s a distillation and an exhortation to do what we can, […]

The post DIY To Power Our Planet: An (Mini) Earth Day Field Guide appeared first on Make: DIY Projects and Ideas for Makers.


Daily Reading List – April 22, 2026 (#769)

1 Share

Amazing first day of Google Cloud Next. The keynotes were fun, and I had a ton of productive customer chats.

[blog] Welcome to Google Cloud Next ‘26. Great roundup of what we’re announcing today, and why it matters.

[blog] The new Gemini Enterprise: one platform for agent development, orchestration, and governance. You deserve nice things! We’ve juiced-up our platform for building agents and doing work.

[blog] Gemini Enterprise for the agentic task force: introducing long-running agents, agentic collaboration spaces, advanced governance, and more. This post goes deeper into the Gemini Enterprise app and how we all can do more productive and interesting work.

[blog] Announcing Spanner Omni: Your infrastructure, Google’s innovation. The best data service in the cloud available … anywhere? Yup, you can now bring Spanner with you to run on any infrastructure.

[blog] What’s new with Databases: Powering the agentic future. Plenty of new and exciting things from our Data team.

[blog] What’s new in Cloud Run at Next ‘26. We haven’t let up on our serverless investment, and it’s a one-horse race right now. Cloud Run continues to introduce valuable and innovative capabilities.

[blog] What’s new in GKE at Next ‘26. Kubernetes is already a fundamental component for many companies, and with this AI surge, it’s finding new ways to be useful.

[blog] What’s next in Google AI infrastructure: Scaling for the agentic era. Compute, networking, and storage are still the bread-and-butter of a hyperscaler, and ours should be second-to-none.

[blog] Inside the eighth-generation TPU: An architecture deep dive. Big improvements, and two distinct systems.

[blog] Level Up Your Agents: Announcing Google’s Official Skills Repository. Fantastic work from my team here. We built a foundational set of agent skills you can use with your favorite agentic harness.

[article] 15 principles for managing up. Really good. “Managing up” has a bad rap, but it’s important to effectively and clearly communicate with the boss.

[article] Eclipse Foundation offers enterprise-grade open source alternative to Microsoft’s VS Code Marketplace. Already popular, this registry has over 10k extensions. I’m glad Google is a supporter.

[blog] Introducing the Builders Hub from the Google Developer Program. Tremendous work from my team to build this new experience. It pulls together more of Google’s dev products into a single place.

[article] SpaceX is working with Cursor and has an option to buy the startup for $60B. Wouldn’t have predicted that, but it also makes sense given the ambitions of those involved.

[blog] A Guide to 5 Agent Payment Protocols. It’s wild there are five of these already. But they don’t completely overlap. This post tries to separate the use cases for each.

[article] Employers say they struggle to find workers with the right AI skillset. This is specific to graduate hires. Universities need to quickly reset to ensure they’re churning out people who have the skills the market is willing to pay for.

Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:




Building agents that reach production systems with MCP

1 Share
Building agents that reach production systems with MCP

Gates Foundation To Cut 20% of Staff, Review Epstein Ties

1 Share
An anonymous reader quotes a report from Reuters: The Gates Foundation opened an external review earlier this year into its engagement with the late financier and convicted sex offender Jeffrey Epstein, the philanthropic group said on Tuesday. The foundation has been mired in controversy due to Chairman Bill Gates' association with Epstein. A release of emails in January by the U.S. Justice Department also showed communication between Epstein and the Gates Foundation's staff. "Early this year, Gates Foundation CEO Mark Suzman commissioned an external review to assess past foundation engagement with Epstein, and our current policies for vetting and developing new philanthropic partnerships," the foundation said in a statement. "That review is underway, and we expect the board and management will receive an update this summer," it added. The Wall Street Journal, which first reported the news earlier on Tuesday, said Suzman told staff in a memo, "this is a challenging time for our organization in many ways, but it also highlights the critical importance of taking the tough actions now." The WSJ also reports that the Gates Foundation will eliminate up to 500 jobs, or about 20% of its staff, by 2030. It said the foundation has a 2026 budget of about $9 billion, but plans to cap operating expenses at $1.25 billion. Further reading: The Bill Gates-Epstein Bombshell - and What Most People Get Wrong

Read more of this story at Slashdot.
