TypeScript didn’t just grow, it evolved with the community. Amanda Silver shares why open sourcing it was key to working alongside frameworks like Angular, React, and Vue.
Building Agents should be easy and it is with Microsoft Agent Framework. Plugging in memory makes the Agent remember you and your conversations. SQL Server is a great option for storing your memory in, and Azure SQL Database future proofs you. Your users deserve to be remembered so learn how.
✅ Chapters: 0:00 Intro 0:45 sample e-commerce app 1:49 agent recommendations 3:00 agent code walkthrough 3:20 context provider and history provider with Microsoft Agent Framework 5:04 more details on the history provider and how Microsoft Agent Framework helps 6:10 mssql-python driver 8:54 what else you can do with the framework 9:00 local to cloud, easily pluggable 10:10 final tips and tricks, Agent Framework UI
✅ Resources: Microsoft agent framework, Microsoft Agent Framework Overview | Microsoft Learn SQL Server Docker, Docker: Run Containers for SQL Server on Linux - SQL Server | Microsoft Learn Azure SQL Server, Connect to and Query Azure SQL Database Using Python and the mssql-python Driver - Azure SQL Databa…
For better or worse, my life has become infested by Jest mocks.
I come from the land of dependency injection (I even gave a talk once!), but whenever I talk about dependency injection to JavaScript developers I get this reaction:
I thought I could get by winging it but that never works in the long run, so I finally sat down and studied how Jest mocking actually works. Here are the notes from my research.
(I’m not aiming to write a complete guide here, just covering what I use regularly.)
Mocking modules
The classic example of needing to mock is when you want to test a component that depends on an API. You want to test the business logic, but you don't want to actually make network calls to an external service:
This is where jest.mock() comes into play. Calling it replaces the real module with a mock version:
Calling logic.foo() normally results in a real API call, but since I mocked api, it doesn't.
Because mock modules must be set up before you start importing modules, Jest will “hoist” all jest.mock() calls so they execute first. As such, it doesn’t matter where you put your calls to jest.mock() (which honestly spooks me a bit).
(Mock hoisting is why you can’t, for example, use constants in your mocks. While the constant might be defined in the file before calling jest.mock(), the execution order flips them the other way around!)
Automock
What is returned from a mocked module, anyways?
By default, Jest “automocks” modules, replacing their properties with mockable alternatives. For example, it replaces any function with a mock function (aka jest.fn()). It’ll walk object trees, so if you’ve got an object with a function property, it’ll mock that function as well.
Not all module properties are mocked, though. For example, constants (e.g. strings or numbers) are unaffected by automock. I couldn’t find any docs mapping type → mock behavior, so some trial-and-error may be required for unusual circumstances.
(Warning: “automock” is an overloaded term in Jest; it’s both a description of what happens to a mocked module by default and the name of the automock feature.)
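As a rough mental model of the tree walk described above, here is a toy version in plain JavaScript. It is not Jest's implementation: `makeStub` and `toyAutomock` are hypothetical names, and only functions, plain objects, and primitive values are modelled.

```javascript
// Toy model of automocking: NOT Jest's implementation, just the idea.
// makeStub and toyAutomock are hypothetical names for illustration.
function makeStub() {
  const calls = [];
  const stub = (...args) => {
    calls.push(args); // record the call; a bare stub returns undefined
  };
  stub.calls = calls;
  return stub;
}

function toyAutomock(value) {
  // Functions become recording stubs.
  if (typeof value === "function") return makeStub();
  // Plain objects are walked so nested functions get stubbed too.
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    const mocked = {};
    for (const [key, child] of Object.entries(value)) {
      mocked[key] = toyAutomock(child);
    }
    return mocked;
  }
  // Strings, numbers, and anything else not modelled pass through unchanged.
  return value;
}

const realApi = {
  BASE_URL: "https://example.test",
  get: (path) => `real response for ${path}`,
  auth: { login: (user) => `real login for ${user}` },
};

const mockedApi = toyAutomock(realApi);
mockedApi.get("/items"); // no real work happens, the call is only recorded
mockedApi.auth.login("ada");
console.log(mockedApi.BASE_URL, mockedApi.get.calls, mockedApi.auth.login.calls);
```

The constant survives untouched while both functions, including the nested one, are swapped for stubs that only record their arguments.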
Mocking behavior
Automock is all well and good for keeping the tests from crashing, but sometimes you need to replace behavior as well (such as returning a stub value).
Let's make our mocked API return a sample response:
We’ve mocked api.get() but the rest of the api module uses the actual implementation.
The factory function allows for all sorts of shenanigans, such as returning a module which doesn’t match the spec of the original module at all. That sucks, so I only use factory functions when needed.
Cleaning mocks
Reusing a mock usually means cleaning up between tests. There are three levels of cleaning:
Clear - clears mock usage metadata (e.g., how many times was this mock function called).
Reset - does the above plus replaces the mocked function with the default empty function (i.e. undoes calls like mockReturnValue()).
Restore - does all the above plus restores the function to its original (real) implementation. This only applies to mocks created with jest.spyOn().
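To make the difference between the three levels concrete, here is a toy spy in plain JavaScript that mimics the behavior as described in the list above. It is a simplified stand-in, not Jest's code; `toySpyOn` is a hypothetical helper.

```javascript
// Toy model of the three cleaning levels: NOT Jest's implementation.
// toySpyOn is a hypothetical helper that mimics the behavior described above.
function toySpyOn(object, method) {
  const original = object[method];
  let implementation = original; // calls through to the real function by default

  const spy = (...args) => {
    spy.calls.push(args);
    return implementation.apply(object, args);
  };
  spy.calls = [];

  spy.mockReturnValue = (value) => {
    implementation = () => value;
    return spy;
  };
  // Clear: forget usage metadata only.
  spy.mockClear = () => {
    spy.calls = [];
  };
  // Reset: clear, plus replace any stubbed behavior with an empty function.
  spy.mockReset = () => {
    spy.mockClear();
    implementation = () => undefined;
  };
  // Restore: reset, plus put the real function back on the object.
  spy.mockRestore = () => {
    spy.mockReset();
    object[method] = original;
  };

  object[method] = spy;
  return spy;
}

const api = { get: () => "real" };
const spy = toySpyOn(api, "get");
spy.mockReturnValue("stubbed");

const results = [];
results.push(api.get()); // stubbed value
spy.mockClear();
results.push(spy.calls.length, api.get()); // call count is wiped, the stub survives
spy.mockReset();
results.push(api.get()); // stub is gone, empty function returns undefined
spy.mockRestore();
results.push(api.get()); // real implementation is back
console.log(results);
```

Each level is a superset of the previous one, which is why the sketch implements reset by calling clear and restore by calling reset.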
You can also call jest.resetModules() to completely reset the modules in the cache, in case there’s some local state in a module that you need to clear between tests.
What’s new in Postman: AsyncAPI 3.0, performance streaming, and service accounts
This was a packed week for the Postman platform. Five features shipped that touch almost every part of the API lifecycle — from how you design event-driven APIs, to how you validate specifications, to how you stress-test services in your CI/CD pipeline. I’ve been digging into each one, and a few of these are going to change daily workflows for a lot of teams.
Here’s what landed and how to start using it.
AsyncAPI 3.0 gets first-class support in Spec Hub
If you’re building event-driven systems with Kafka, MQTT, WebSockets, or any message broker, this one matters. Postman now supports AsyncAPI 3.0 specifications with the same editing, validation, and preview experience you’re used to with OpenAPI.
The 3.0 release of AsyncAPI was a significant rewrite. It decoupled channels from operations, making specifications reusable and less ambiguous than the 2.x versions. That cleanup is what made broad tooling support practical — and now Postman is there.
What you can do
You can create AsyncAPI 3.0 specifications directly from the left sidebar, the + New menu, or through Postman Agent Mode. The import flow accepts v3 files through file upload, folder, link, or paste — the same paths you’d use for any other specification type.
The editor provides v3-specific validation, snippets, and $ref autocomplete. There’s a v3-aware sidebar outline that includes top-level Operations (a concept that didn’t exist as a standalone entity in v2). And the shared AsyncAPI live preview renders your documentation as you type.
Local workspaces also auto-detect v3 specification files on disk, so if you’re using Native Git workflows, your AsyncAPI 3.0 files show up without extra configuration.
Unified diagnostics pane for API specifications
The per-workbench validation pane is gone. In its place: a single, centralized diagnostics pane that consolidates all validation issues across every open API specification and Common Component tab.
This is one of those changes that sounds incremental but shifts how you work. Instead of switching between tabs to check each specification’s validation state, you get a single view with everything — errors, warnings, info messages — organized into a navigable tree grouped by file and rule source (syntax versus governance rules).
The status bar indicator
There’s a new status bar indicator at the bottom of the editor that shows real-time error, warning, and info counts for whichever tab you’re looking at. Click it, and you jump directly to the diagnostics pane filtered to that tab’s issues.
You can also toggle severity levels independently. Working through a governance rollout and want to focus on errors first? Turn off warnings and info. Come back to them later.
Move API specifications between workspaces
You can now move API specifications across workspaces the same way you’ve always moved collections. Open the specification options menu, select Move, choose a destination workspace, and done. Structure, metadata, and multi-file relationships all come along.
This works for every specification type Postman supports: OpenAPI 2.0, 3.0, and 3.1, AsyncAPI 2.0, GraphQL, and protobuf. Single-file and multi-file specifications both work.
It’s a small feature, but it removes a real friction point. I’ve seen teams duplicate specifications across workspaces because there was no move option — then struggle to keep them in sync. Now you can reorganize without that overhead.
Live streaming for CLI performance runs
This is the one that got the most internal excitement, and I get why. You can now trigger performance runs from the Postman CLI and watch results stream in real time in Postman.
Before this, running performance tests from a CI/CD pipeline meant waiting for the full run to complete before seeing any results. If your API failed 10 seconds in, you still waited for the entire test duration to elapse. That’s wasted pipeline time and delayed feedback.
How it works
Run your performance test from the Postman CLI using the postman performance run command — either locally or from your CI/CD pipeline. While the test runs, open Postman and you’ll see a live stream of response times, throughput, and error rates updating in real time.
The practical value here goes beyond watching graphs update. If an API breaks immediately under load, your team can stop the run and adjust without burning through the remaining test duration. That matters when your CI/CD pipeline is blocking a deploy.
Quality gates in your pipeline
The Postman CLI can also act as an automated quality gate. Define your thresholds — response time SLAs, maximum error rates, minimum throughput — and the Postman CLI returns a non-zero exit code if the run fails to meet them. Your pipeline stops the build before it reaches production.
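The gate logic itself is simple enough to sketch. The snippet below only illustrates the idea of mapping thresholds to an exit code; the Postman CLI performs this evaluation on its own, and the function name, metric fields, and threshold values here are all hypothetical rather than CLI options or output fields.

```javascript
// Hypothetical gate logic; field names and thresholds are illustrative only
// and are not Postman CLI options or output fields.
function evaluateGate(metrics, thresholds) {
  const failures = [];
  if (metrics.p95ResponseMs > thresholds.maxP95ResponseMs) {
    failures.push("response time");
  }
  if (metrics.errorRate > thresholds.maxErrorRate) {
    failures.push("error rate");
  }
  if (metrics.requestsPerSecond < thresholds.minRequestsPerSecond) {
    failures.push("throughput");
  }
  // Zero means "pass"; any non-zero code makes a CI step fail.
  return { failures, exitCode: failures.length === 0 ? 0 : 1 };
}

const thresholds = {
  maxP95ResponseMs: 500,
  maxErrorRate: 0.01,
  minRequestsPerSecond: 100,
};

const slowRun = evaluateGate(
  { p95ResponseMs: 850, errorRate: 0.002, requestsPerSecond: 120 },
  thresholds
);
const goodRun = evaluateGate(
  { p95ResponseMs: 310, errorRate: 0.002, requestsPerSecond: 120 },
  thresholds
);
console.log(slowRun, goodRun);
```

A CI system treats any non-zero exit code from a step as a failure, which is what stops the build.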
This follows the broader industry shift toward integrating performance testing directly into CI/CD rather than running it as a separate, post-deployment step. The difference is catching regressions before they ship, not after.
Available to customers on the Postman Solo plan and above.
Try it in your pipeline
Here’s what the Postman CLI integration looks like in a GitHub Actions workflow:
The exit code tells your pipeline whether the run passed or failed, so you can gate deployments on performance results without writing custom scripts.
Service accounts for Enterprise teams
Service accounts give Postman Enterprise teams a non-human identity for automation, integrations, and system-to-system interactions. If you’ve been using a team member’s personal API key for CI/CD integrations, this is the proper replacement.
The security model uses short-lived JWT tokens. You exchange a service account API key for a token that’s valid for 15 minutes, and that token encodes the account’s identity and permissions. No more long-lived credentials sitting in your pipeline configuration.
The API endpoint
There’s a single endpoint:
POST /service-account-tokens
Authorization: Bearer <service-account-api-key>
Content-Type: application/json
The response gives you a short-lived access token:
Use that token for your automated Postman API calls. When it expires, request a new one. The endpoint is rate limited to 10 requests per 10-second window per user, which is plenty for any automation pattern that isn’t doing something wrong.
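Because tokens expire after 15 minutes, automation usually caches the current token and requests a new one shortly before expiry. A minimal sketch of that pattern follows. `exchangeKeyForToken` is a hypothetical function you would supply to call the endpoint above; the sketch assumes it resolves to the token string, and the one-minute refresh margin is also an assumption.

```javascript
const TOKEN_LIFETIME_MS = 15 * 60 * 1000; // 15-minute validity, per the text above
const REFRESH_MARGIN_MS = 60 * 1000; // assumption: refresh one minute early

// exchangeKeyForToken is a hypothetical async function that performs the
// POST /service-account-tokens call and resolves to the token string.
// now is injectable so the cache can be exercised with a fake clock.
function createTokenCache(exchangeKeyForToken, now = () => Date.now()) {
  let token = null;
  let expiresAt = 0;

  return async function getToken() {
    const aboutToExpire = now() >= expiresAt - REFRESH_MARGIN_MS;
    if (token === null || aboutToExpire) {
      token = await exchangeKeyForToken();
      expiresAt = now() + TOKEN_LIFETIME_MS;
    }
    return token;
  };
}

// Usage with a fake exchange function (a real one would call the endpoint).
let issued = 0;
const getServiceToken = createTokenCache(async () => `fake-token-${++issued}`);
getServiceToken().then((value) => console.log(value));
```

Caching this way keeps a busy pipeline well under the rate limit, since the exchange happens roughly once per token lifetime rather than once per API call. Concurrent first calls could still trigger two exchanges; a production version would share the in-flight promise.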
Why this matters
Non-human identity management has become a critical concern as teams automate more of their API lifecycle. The two main risks with using personal API keys for automation are: the key has broader permissions than the automation needs, and the key stops working when that person leaves the team.
Service accounts solve both. You scope the permissions to what the automation actually requires, and the account isn’t tied to any individual’s employment status. Combined with short-lived tokens, the blast radius of a compromised credential shrinks from “everything that person has access to, indefinitely” to “a narrow set of permissions, for 15 minutes.”
This is Enterprise plan only.
Also in this release: Postman API v1.41.0
Alongside service accounts, the Postman API v1.41.0 adds AsyncAPI 3.0 support to the /specs endpoints. If you’re managing specifications programmatically, you can now create, read, and update AsyncAPI 3.0 specifications through the API — not only through the app.
What these updates add up to
Individually, each of these features fills a specific gap. Together, they tell a story about where Postman is heading: tighter feedback loops (live performance streaming), broader protocol coverage (AsyncAPI 3.0), better developer experience for specification management (unified diagnostics, cross-workspace moves), and proper infrastructure for automation at scale (service accounts).
The AsyncAPI 3.0 support is particularly interesting if you’re working in a hybrid architecture — REST APIs alongside event-driven services. Having both OpenAPI and AsyncAPI specifications in the same workspace, with the same validation and editing experience, means one fewer tool in your stack.
Start with whatever’s most relevant to your current work. If you’re running performance tests in CI/CD, try the live streaming. If you’ve been waiting on AsyncAPI 3.0 support, create your first specification in Spec Hub. If you’re an Enterprise team with personal API keys in your pipelines, swap them out for service accounts.
In early 2023, Slack faced a foundational challenge: serving Large Language Models (LLMs) at enterprise scale with the security, reliability, and performance our customers expect. Over three years, we evolved from basic infrastructure to orchestrating a sophisticated multi-cloud architecture. We didn’t just want shiny new models; we needed a system resilient to regional outages and GPU scarcity. Our journey moved through four distinct phases, shifting from reactive infrastructure management to proactive, multi-vendor orchestration.
Phase 1: The SageMaker Era
When we built the initial stages of Slack AI, AWS SageMaker was the natural starting point. It was a managed ML serving platform that offered the key things we were looking for: security, FedRAMP compliance, model availability, and control. We were able to leverage a sophisticated escrow virtual private cloud (VPC) strategy to establish a strict zero-knowledge environment: our data remained private to Slack, and the provider’s proprietary model weights remained inaccessible to us.
To maximize uptime for a global user base, we deployed these containers across multiple AWS regions. This required our teams to manage the operational lifecycle, including cross-region IAM roles, balanced routing across model endpoints, proactive capacity planning, and auto-scaling logic.
The Operational Reality
While SageMaker provided the necessary security, the overhead was immense. We faced three primary taxes:
Scaling Latency: Initialization times prevented instantaneous scaling.
Hardware Scarcity: Enterprise-grade Nvidia GPUs, such as the A100 (Ampere architecture) and the emerging H100 (Hopper architecture) instances, were often unavailable.
Over-Provisioning: Maintaining idle resources to meet peak SLAs.
By early 2024, we mitigated these via On-Demand Capacity Reservations (ODCR) and proactive, cron-based scaling. However, this reinforced a hard truth: we were spending too many engineering cycles on plumbing. To scale, we needed automated capacity, not manual coordination.
Feature Lag
As the AI ecosystem and feature usage accelerated, newer and higher quality models emerged quickly. While we were maintaining a custom serving solution on SageMaker, AWS was heavily prioritizing Amazon Bedrock, its purpose-built managed LLM service.
Hosting Anthropic models via an escrow VPC led to a “catch-up” cycle. Model iterations and optimizations often debuted on Bedrock weeks or months before SageMaker availability. For Slack, where staying at the bleeding edge of model quality is a competitive necessity, this gap became a significant driver for our next architectural evolution.
Phase 2: Migrating to Amazon Bedrock for Agility and Access
By mid-2024, Amazon Bedrock had matured significantly. It had achieved FedRAMP Moderate compliance and promised the same security posture that we required. The decision to migrate was a strategic pivot, as it offered three immediate advantages:
Operational Simplicity: We moved away from having to scale individual GPU instances to a fully managed AWS service.
Immediate Model Access: We eliminated model feature lag by gaining access to the latest models shortly after LLM providers made them publicly available.
Infrastructure Efficiency: Bedrock introduced Provisioned Throughput (PT) and On Demand (OD) infrastructure options, allowing us to tailor compute to specific use cases. We utilized PT for interactive, latency-sensitive features like channel summaries, while leveraging OD for bursty, scheduled workloads like Recap to eliminate costs for idle compute.
Understanding Provisioned Throughput
In the Bedrock ecosystem, capacity is measured in Model Units (MUs). Each MU provides a deterministic amount of throughput, measured in tokens per minute. Shifting from GPU instances to MUs allowed us to abstract away the hardware and focus entirely on raw throughput. To minimize migration risk, we prioritized provisioned throughput infrastructure first, leaving on demand infra as a fast follow.
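Sizing in MUs then reduces to simple arithmetic. The sketch below uses made-up numbers: the per-MU throughput, the peak load, and the 20% headroom are assumptions for illustration, not Bedrock or Slack figures.

```javascript
// All numbers are hypothetical; real per-MU throughput depends on the model.
function modelUnitsNeeded(peakTokensPerMinute, tokensPerMinutePerMu, headroom = 0.2) {
  // Add headroom to the peak, then round up: MUs are provisioned in whole units.
  const target = peakTokensPerMinute * (1 + headroom);
  return Math.ceil(target / tokensPerMinutePerMu);
}

// 900,000 tokens/min at peak, 100,000 tokens/min per MU, 20% headroom.
const units = modelUnitsNeeded(900000, 100000);
console.log(units);
```

With these inputs the target is about 1,080,000 tokens per minute, so the result rounds up to 11 MUs.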
The Zero Incident Migration
We executed the transition through a multi-stage migration strategy:
Compliance: Secured Legal, Security, and FedRAMP sign-offs before rerouting production traffic to maintain our existing high bar for data privacy.
Capacity: Conducted extensive load tests to map the exact number of Model Units (MUs) required to match our SageMaker baseline across diverse traffic profiles.
Quality: Used A/B testing and evaluation frameworks to compare environment outputs side-by-side, verifying both quality and latency parity.
Rollout: Implemented gradual traffic shifts via feature flags and instant rollback capabilities, ensuring 100% availability during the switch.
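The rollout step can be sketched as percentage-based routing behind a flag. This is a generic illustration, not Slack's implementation: the hash-based bucketing, the function names, and the idea of keying on a user id are all assumptions.

```javascript
// FNV-1a 32-bit hash: gives each id a stable bucket in [0, 100).
function bucketFor(id) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    hash ^= id.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash % 100;
}

// rolloutPercent plays the role of the feature flag: 0 sends everything to the
// old backend, 100 sends everything to the new one.
function chooseBackend(id, rolloutPercent) {
  return bucketFor(id) < rolloutPercent ? "bedrock" : "sagemaker";
}

const sampleIds = ["user-1", "user-2", "user-3", "user-4"];
for (const percent of [0, 25, 100]) {
  console.log(percent, sampleIds.map((id) => chooseBackend(id, percent)));
}
```

Because the bucket is derived from the id rather than from a random draw, a given user stays on the same backend as the percentage grows, and setting the flag back to 0 acts as an instant rollback.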
Achieving Operational Maturity
The migration to Bedrock delivered immediate, compounding wins for our engineering teams and our customers:
Engineering Efficiency & Enhanced Experience: By offloading the burden of self-managed infrastructure, we freed our engineers to focus on model performance and feature quality. Because Bedrock serves as the primary launchpad for new LLMs, we were able to deliver model upgrades and quality improvements to users weeks or months earlier than was possible on SageMaker, directly enhancing the user experience across the entire Slack AI suite. A prime example was our ability to quickly upgrade the AI Search features to new high-reasoning models, which led to more precise, context-aware answers.
Architectural Simplicity: We successfully moved away from the “infrastructure plumbing” of endpoint management, GPU instance lifecycle, and complex capacity reservation coordination. In this new model, we simply requested quota from AWS, they provisioned the MUs, and we served traffic. This allowed us to shift from reactive scaling to a strategic forecasting technique. By projecting our needs several weeks out, we gave our account teams ample time to secure capacity, ensuring we were always ahead of the demand curve.
The “Zero-Incident” Standard: Switching an entire backend while serving live traffic can sometimes be a recipe for disaster. We avoided that by being borderline obsessed with parity, and achieved zero customer-facing incidents. We didn’t just run unit tests; we ran massive load tests and shadow requests to find the exact “Model Unit” count that matched our old setup. We used feature flags to slowly bleed traffic over, so if anything looked even slightly off, we could yank it back in seconds. It wasn’t magic – it was just a lot of cautious plumbing.
This solidified a core Slack AI engineering principle: measure first, migrate gradually, and monitor continuously.
The Final Efficiency Gap
While Provisioned Throughput was a massive leap forward for predictable, consistent workloads, it wasn’t well suited to our more variable ones. We encountered two primary efficiency hurdles:
The Over-Provisioning Cycle: Our infrastructure needs are closely aligned with global workday traffic patterns. To ensure a snappy experience during the massive US East and West Coast morning surges – when users lean heavily on AI Summaries and Search to catch up on activity – we had to maintain a high baseline of MUs. While we saw steadier, lighter usage during the APAC and EU mornings, we had to provision for that absolute global peak. This meant we were often paying for significant underutilized capacity during the troughs between regional handoffs and over the weekends, creating a persistent efficiency gap.
The Commitment Lock-in: Provisioned Throughput often required commitments of one to six months. In the fast-moving world of LLMs, where a state-of-the-art model can be superseded in weeks, these commitments effectively slowed down our ability to upgrade. Even when a superior model was released, we often chose to wait for our existing commitments to expire before migrating.
These challenges led us to our next evolution: finding a way to balance the reliability of provisioned capacity with the economic and technical flexibility of On-Demand scaling.
Phase 3: Transitioning to Bedrock On-Demand
With high confidence in Bedrock and mature monitoring, we moved to close the final efficiency and quality gap. Historical analysis revealed that feature usage fluctuated with business hours, leaving some idle capacity overnight.
Rather than maintaining a static footprint for 24/7 peak capacity, moving to on-demand infrastructure allowed us to solve the idle capacity problem. It gave us the architectural agility to support highly variable workloads without the friction of manual over-provisioning. For features with a 10x variance between peak and off-peak hours, the efficiency gains were substantial. More importantly, it removed the technical bottleneck we faced in Phase 2: because we were no longer locked into multi-month commitments, we regained the freedom to migrate features to different models. This meant that as soon as a more performant model dropped and passed our internal quality and metrics bars, we could pivot our infrastructure to support it within a day, rather than waiting months for a contract to expire.
The Hybrid Strategy: Optimizing for Performance and Fluency
We didn’t simply flip a switch and move everything to On-Demand. To balance efficiency with a premium user experience, we implemented a Hybrid Routing strategy. We kept high-volume, latency-sensitive features on dedicated capacity (Provisioned Throughput) to ensure a consistent “snappy” feel. Simultaneously, we moved asynchronous, bursty workloads – like nightly Recaps – to On-Demand capacity. To bridge the gap, we engineered a Spillover Pattern: if a sudden surge pushed us beyond our reserved limits, excess requests automatically “spilled over” to on-demand endpoints, ensuring we never dropped a request due to capacity ceilings.
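A spillover pattern like the one described can be sketched in a few lines. This is a deliberately simplified model with hypothetical names: it counts in-flight requests against a fixed limit, whereas real provisioned capacity is measured in token throughput.

```javascript
// Hypothetical sketch: reserved capacity is modelled as a cap on in-flight requests.
function createSpilloverRouter(provisionedLimit) {
  let inFlightProvisioned = 0;

  return {
    // Pick a target for a new request.
    acquire() {
      if (inFlightProvisioned < provisionedLimit) {
        inFlightProvisioned += 1;
        return "provisioned";
      }
      return "on-demand"; // surge beyond reserved capacity spills over
    },
    // Call when a request finishes so its reserved slot is freed.
    release(target) {
      if (target === "provisioned" && inFlightProvisioned > 0) {
        inFlightProvisioned -= 1;
      }
    },
  };
}

const router = createSpilloverRouter(2);
const targets = [router.acquire(), router.acquire(), router.acquire()];
router.release(targets[0]); // one provisioned request completes
targets.push(router.acquire()); // the freed slot is reused
console.log(targets);
```

The third request spills to on-demand instead of being rejected, and traffic returns to provisioned capacity as soon as a slot frees up.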
Navigating the Trade-offs of On-Demand
Shifting to On-Demand traded rigid pre-planning for architectural agility, eliminating manual capacity management. By utilizing Bedrock’s ability to route across different US regions based on real-time availability, we were able to find capacity dynamically while adhering to our regional data boundaries. However, this flexibility introduced a new set of variables that we had to solve for:
Service Level Variability: Unlike the dedicated nature of PT, OD operates on a shared-resource model, which typically carries different uptime characteristics.
Regional Capacity Orchestration: Success with OD relies on the cloud provider’s ability to manage demand across their entire customer base in specific regions, rather than having specific hardware units explicitly reserved for Slack.
Concentration Risk: Relying too heavily on a single provider’s on-demand pool meant that any service-wide blip could have the potential to impact entire Slack AI features simultaneously.
Engineering for Resilience
To mitigate these risks, we didn’t just accept the trade-offs – we built a more intelligent AI Platform abstraction. We developed a model hierarchy for every AI feature, allowing our system to automatically fall back to different models if the primary model reached a degraded state. Examples of such degradation include elevated time-to-first-token latency, throttling errors, and a downward trend in customer feedback.
This hierarchy was a game-changer for model quality and reliability. If a specific model was underperforming or hitting limits in one region, the platform would reroute the request in real-time to another healthy endpoint. From the customer’s perspective, the experience remained seamless; they continued to receive high-quality results without ever knowing a complex failover had occurred behind the scenes.
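The fallback through a model hierarchy can be illustrated with a short sketch. Model names, stat fields, and thresholds below are hypothetical; only the degradation signals (time to first token, throttling) come from the text.

```javascript
// Hypothetical thresholds for what counts as "degraded".
const LIMITS = { maxTtftMs: 1500, maxThrottleRate: 0.05 };

function isDegraded(stats) {
  return stats.ttftMs > LIMITS.maxTtftMs || stats.throttleRate > LIMITS.maxThrottleRate;
}

// hierarchy is ordered from most to least preferred model for one feature.
function selectModel(hierarchy, statsByModel) {
  const healthy = hierarchy.find((model) => {
    const stats = statsByModel[model];
    return stats === undefined || !isDegraded(stats); // no data yet: assume healthy
  });
  // If every model looks degraded, stay on the primary rather than fail outright.
  return healthy ?? hierarchy[0];
}

const summaryHierarchy = ["model-a", "model-b", "model-c"];
const liveStats = {
  "model-a": { ttftMs: 2400, throttleRate: 0.01 }, // slow first token
  "model-b": { ttftMs: 600, throttleRate: 0.0 },
  "model-c": { ttftMs: 500, throttleRate: 0.0 },
};
const chosen = selectModel(summaryHierarchy, liveStats);
console.log(chosen);
```

Keeping the hierarchy as an ordered list per feature means the preference order is data rather than code, so it can change without touching the selection logic.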
While this internal fallback logic significantly increased our service resilience, it also highlighted two strategic gaps. First, no matter how many failovers we engineered within a single cloud, we remained susceptible to any potential provider-wide outage. Second, the AI landscape is moving with incredible velocity and remains highly fragmented. The state-of-the-art model for a specific task – whether it’s summarization, reasoning, or high-speed extraction – can change in a matter of weeks, and these leading models are often exclusive to specific cloud providers. Relying on any single vendor meant we might be artificially limiting our access to the highest-quality technology available. To ensure Slack AI always provides the best possible experience, we need the flexibility to go wherever the best models are while simultaneously meeting our security, compliance, and privacy standards.
As Slack AI scaled to millions of users, we realized that true enterprise-grade reliability and a “best-of-breed” model strategy required looking beyond any single provider. This realization was the primary catalyst for our latest evolution: the move to a Multi-Cloud architecture.
Phase 4: Expanding to a Multi-Cloud Strategy Ecosystem
We recognized that providing a world-class AI experience required the best of every ecosystem. By early 2026, we officially expanded our footprint to include Google Cloud Platform (GCP) Vertex AI, not just as a failover for redundancy, but as a strategic engine to accelerate product innovation through access to a broader catalog of state-of-the-art models. Our goal is simple: ensure Slack remains the most intelligent place to get work done. This move wasn’t made for the sake of complexity; it was a strategic shift driven by four key factors:
Infrastructural Redundancy & High Availability: For a mission-critical Digital HQ, uptime is the primary metric. While we continue to rely on third-party LLMs as a cornerstone for their consistency and reliability, a multi-cloud footprint removes large-scale, provider-level infrastructure disruptions as a single point of failure. If an entire cloud ecosystem experiences a regional or platform-wide disruption, our traffic can be rerouted to a separate, healthy stack without service interruption.
Model-to-Feature Optimization: The “one-size-fits-all” approach to LLMs quickly hits diminishing returns. By expanding our catalog to include multiple models, we gained the ability to match the specific latent strengths of a model to the specific requirements of a feature. This granular optimization led to immediate performance gains:
~10% improvement in quality metrics for complex reasoning tasks.
~67% reduction in latency for high-velocity, low-token workloads.
Access to Innovation: The AI landscape moves at extreme velocity with frequent vendor exclusivity. Multi-cloud ensures we are ready to integrate with the latest breakthroughs regardless of where they are hosted while upholding our compliance, privacy, and security promises.
Dynamic Workload Orchestration: Beyond simple failover, multiple providers allow for sophisticated traffic shaping. We can route requests based on real-time telemetry – evaluating not just provider health, but which endpoint offers the optimal performance profile for a given workload at that exact moment. This enhanced our infrastructure from a static resource into a dynamic, intelligent routing layer.
The Integration Journey
Building a production-ready GCP integration was a massive cross-functional effort. It required tight synchronization across teams such as Security, Risk and Compliance, Trust and Integrity, AI Quality, Legal, and Cloud Providers to ensure our data boundaries remained ironclad across the board. Expanding to GCP Vertex AI turned our infrastructure into a strategic engine for product innovation. Rather than being limited to any single provider’s catalog, we can now granularly match specific features to the models best suited for them – balancing factors like context window, latency, and reasoning capabilities. To make this a reality, we solved cold start engineering hurdles by implementing secretless authentication and an API Normalization layer that translates disparate provider signals into a unified language for our application logic.
Architectural Deep Dive: The Intelligent Routing Layer
The core technical challenge was building a system that abstracted away provider complexity. By enhancing our abstraction layer into an Intelligent Routing Layer, we ensured that users receive the fastest, highest-quality response available. If one model or provider slows down, the system instantly reroutes the request to a better-performing alternative, making the underlying complexity completely invisible to the user while maintaining a seamless experience. The routing layer has three components:
Metric-Driven Model Selection: We use our internal quality metrics to determine the optimal model for each feature. For instance, if our benchmarks show a specific LLM outperforms others for “Recaps,” the router directs traffic accordingly. Crucially, we always designate backup models for every feature; if the primary choice doesn’t meet our performance or quality thresholds in real-time, the system knows exactly where to go next.
Experimental Rules & A/B Testing: This capability has fundamentally changed our release velocity. When we wanted to test the latest LLMs, after our security and compliance verifications, for our Recaps feature, we were able to route a percentage of traffic to the new model with minimal code changes and an incredibly fast turnaround time. This allowed us to validate performance in the wild and tighten our feedback loop from weeks to days.
Automated Circuit Breaker & Health Monitoring: To move beyond manual failovers, we implemented an automated Circuit Breaker pattern. This system acts as a real-time watchdog, constantly monitoring health signals at the endpoint level. If a specific provider or model begins to exhibit signs of distress – such as an elevated Time to First Token (TTFT), a spike in 5xx error rates, or crossing a latency p90 threshold – the circuit “trips.” Once tripped, the routing layer automatically diverts traffic to a healthy alternative model based on the use case and complexity. Crucially, the breaker enters a partial-open state, allowing a small, controlled trickle of requests to reach the degraded endpoint. As the endpoint demonstrates sustained health, the system dynamically expands this trickle, incrementally ramping traffic back up until the breaker is fully “closed” and normal operations resume. This ensures a graceful recovery without overwhelming a stabilizing service.
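The trip-then-ramp behavior of the circuit breaker can be sketched as a small state machine. This is an illustrative model only: the thresholds, the 5% trickle, and the doubling ramp are assumptions, and the breaker is driven by one health verdict per monitoring window.

```javascript
// Hypothetical thresholds; the signals (TTFT, 5xx rate, p90 latency) come from the text.
const LIMITS = { maxTtftMs: 2000, max5xxRate: 0.02, maxP90Ms: 8000 };

function windowIsHealthy(w) {
  return (
    w.ttftMs <= LIMITS.maxTtftMs &&
    w.rate5xx <= LIMITS.max5xxRate &&
    w.p90Ms <= LIMITS.maxP90Ms
  );
}

function createBreaker({ trickle = 0.05, rampFactor = 2 } = {}) {
  let allowedShare = 1; // fraction of traffic allowed to reach the endpoint

  return {
    // Call once per monitoring window with that window's health verdict.
    report(healthy) {
      if (!healthy) {
        allowedShare = trickle; // trip: keep only a small probe trickle
      } else if (allowedShare < 1) {
        allowedShare = Math.min(1, allowedShare * rampFactor); // ramp back up
      }
    },
    state() {
      return allowedShare === 1 ? "closed" : "partial-open";
    },
    share() {
      return allowedShare;
    },
    // random is injectable so routing decisions can be tested deterministically.
    allow(random = Math.random) {
      return random() < allowedShare;
    },
  };
}

const breaker = createBreaker();
breaker.report(windowIsHealthy({ ttftMs: 5200, rate5xx: 0.01, p90Ms: 6000 }));
const afterTrip = { state: breaker.state(), share: breaker.share() };

const ramp = [];
for (let i = 0; i < 5; i++) {
  breaker.report(true); // five consecutive healthy windows
  ramp.push(breaker.share());
}
console.log(afterTrip, ramp, breaker.state());
```

Requests that `allow()` rejects would go to the fallback model. Any unhealthy window during the ramp drops the share back to the trickle, so recovery has to be sustained before the breaker closes.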
The Multi-Cloud Reality
Running a multi-cloud footprint at our scale is a major technical undertaking. It’s a conscious trade-off: we gain immense flexibility but it requires a much more sophisticated approach to how we manage our systems:
API and Behavioral Friction: Each provider has its own unique API patterns, proprietary error codes, and distinct rate-limiting behaviors. We had to build a robust normalization layer to ensure that a “Rate Limit Exceeded” from one provider and a “Throttling Exception” from another were handled identically by our application logic.
Operational Monitoring Complexity: To avoid blind spots, we couldn’t rely on the native dashboards of each cloud. We had to build a unified monitoring stack that integrates telemetry from the multiple clouds into a single view, ensuring our on-call engineers can diagnose issues without pivoting between consoles.
The Attribution Challenge: Accurately tracking the cost per feature internally becomes significantly harder when workloads are shifting dynamically between clouds. This required deep instrumentation across multiple billing systems to maintain financial transparency.
The On-Call Knowledge Gap: Our engineers can no longer be specialists in just one ecosystem. To support the platform effectively, they need to be provider agnostic, possessing deep expertise in the infrastructure patterns and networking nuances that span multiple major cloud environments. This shift requires a broader skill set to troubleshoot and maintain a distributed, multi-vendor footprint.
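The normalization layer from the first item above can be pictured as a mapping from provider-specific errors to one shared vocabulary. The sketch below is hypothetical: the error shapes (`name`, `status`), the provider labels, and the normalized kinds are assumptions for illustration, and real SDK errors carry different fields.

```javascript
// Hypothetical error shapes: real provider SDKs expose different fields.
const RULES = [
  { provider: "bedrock", matches: (e) => e.name === "ThrottlingException", kind: "THROTTLED" },
  { provider: "vertex", matches: (e) => e.status === 429, kind: "THROTTLED" },
  { provider: "bedrock", matches: (e) => e.name === "ServiceUnavailableException", kind: "UNAVAILABLE" },
  { provider: "vertex", matches: (e) => e.status === 503, kind: "UNAVAILABLE" },
];

// Translate a provider-specific error into the shape the application handles.
function normalizeError(provider, error) {
  const rule = RULES.find((r) => r.provider === provider && r.matches(error));
  return {
    provider,
    kind: rule ? rule.kind : "UNKNOWN",
    retryable: Boolean(rule), // in this sketch, only recognized errors are retried
  };
}

const fromBedrock = normalizeError("bedrock", { name: "ThrottlingException" });
const fromVertex = normalizeError("vertex", { status: 429 });
console.log(fromBedrock.kind === fromVertex.kind); // both map to the same kind
```

Application logic then branches on `kind` alone, so adding a provider means adding rules rather than touching every call site.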
While multi-cloud increases operational overhead, the trade-off is a superior service. We have removed single points of failure, improved quality benchmarks by matching features to specific model strengths, and gained the strategic leverage to adopt new innovations the moment they hit the market.
Reflections on the Path to Multi-Cloud
We arrived at a multi-cloud architecture not for the sake of complexity, but to enhance Slack’s standards for product innovation and reliability. Looking back, five themes stand out:
1. Scaling safely requires cross-functional (XFN) parity
The biggest hurdles in scaling AI aren’t just technical; they also include legal, risk, compliance, and security related tasks. Achieving deep alignment between these teams and engineering is what allowed us to scale to millions of users without compromising our trust standards.
2. The abstraction layer is a core requirement
As seen in our Phase 2 move, the most critical decision wasn’t which model to use, but how we built the logic around them. Agility and speed to market are our primary competitive edge.
3. Treat architecture as a living document
Managed services mature monthly. Because we remained provider-agnostic, we can now adopt breakthroughs in latency or reasoning without a total rewrite.
4. Reliability requires provider agnosticism
Internal failovers aren’t enough. Our move in Phase 4 to a multi-provider stack ensures Slack stays online even during any potential platform-wide cloud disruption.
5. Redefining the meaning of “Failure”
An LLM service that is “up” but slow is effectively broken. By treating signals such as p90 latency spikes and negative feedback trends as soft failures, our routing layer ensures users have a snappy experience.
The future of enterprise AI is multi-cloud, multi-model, and dynamically orchestrated. By prioritizing portability and staying close to the market, we haven’t just built a way to use AI – we’ve built a platform that harnesses the best the industry has to offer the moment it arrives. We’re looking forward to seeing what we build next!