
Scaling Camera File Processing at Netflix


Orchestrating Media Workflows Through Strategic Collaboration

Authors: Eric Reinecke, Bhanu Srikanth

Introduction to Content Hub’s Media Production Suite

At Netflix, we want to provide filmmakers with the tools they need to produce content at a global scale, with quick turnaround and choice from an extraordinary variety of cameras, formats, workflows, and collaborators. Every series or film arrives with its own creative ambitions and technical requirements. To reduce friction and keep productions moving smoothly, we built Netflix’s Media Production Suite (MPS) with the goal of automating repeatable tasks, standardizing key workflows, and giving productions more time to focus on creative collaboration and craftsmanship.

A critical part of this effort is how we handle image processing and camera metadata across the hundreds of hours and terabytes of camera footage that Netflix productions ingest on a daily basis. Rather than build every component from scratch, we chose to partner where it made sense, especially in areas where the industry already had trusted, battle-tested solutions.

This article explores how Netflix’s Media Production Suite integrates with FilmLight’s API (FLAPI) as the core studio media processing engine in Netflix’s cloud compute infrastructure, and how that collaboration helps us deliver smarter, more reliable workflows at scale.

Why We Built MPS

As Netflix’s production slate grew, so did the complexity of file-based workflows. We saw recurring challenges across productions:

  • File wrangling sapping time from creative decision-making
  • Inconsistent media handling across shows, regions, or vendors
  • Manual processes that were difficult to audit and prone to human error
  • Duplication of effort as teams reinvented similar workflows for each production

Content Hub Media Production Suite was created to address these pain points. MPS is designed to:

  • Bring efficiency, consistency, and quality control to global productions
  • Streamline media management and movement from production through post-production
  • Reduce time spent on non-creative file management
  • Minimize human error while maximizing creative time

To achieve this, MPS needed a robust, flexible, and trusted way to handle camera-original media and metadata at scale.

The Right Tool for the Job

From the start, we knew that building a world-class image processing engine in-house would be a significant, long-term commitment: one that would require deep, continuous collaboration with camera manufacturers and the wider industry.

When designing the system, we set out some core requirements:

  • Inspect, trim, and transcode original camera files and metadata for any Netflix production with trusted color science
  • Support a wide variety of cameras and recording formats used worldwide while staying current as new ones are released
  • Run well in our paved-path encoding infrastructure, enabling us to take advantage of proven compute and storage scalability with robust observability

FilmLight develops Baselight and Daylight, which are commonly used in the industry for color grading, dailies, and transcoding. Their FilmLight API (FLAPI) allows us to use that same media processing engine as a backend API.

Rather than duplicating that work, we chose to integrate. FilmLight became a trusted technology partner, and FLAPI is now a foundational part of how MPS processes media.

The Media Processing Engine

MPS is not a single application; it’s an ecosystem of tools and services that support Netflix productions globally. Within that ecosystem, the FilmLight API plays the following key roles.

  1. Parsing camera metadata on ingest

Productions upload media to Netflix’s Content Hub with ASC MHL (Media Hash List) files to ensure completeness and integrity of initial ingest, but soon after, it’s important to understand the technical characteristics of each piece of media. We call this workflow phase “inspection.”

Footage ingested with MPS is inspected using FLAPI and all metadata is indexed and stored

At this stage, we:

  • Use FLAPI to gather camera metadata from the original camera files
  • Conform workflow-critical fields to Netflix’s normalized schema
  • Make it searchable and reusable for downstream processes

This metadata is integral to:

  • Matching footage based on timing and reel name for automated retrieval
  • Debugging (e.g., why a shot looks a certain way after processing)
  • Validations and checks across the pipeline

FLAPI provides consistent, camera-aware insight into footage that may have originated anywhere in the world. Additionally, since we’re able to package FLAPI in a Docker image, we can deploy almost identical code to both cloud and our production compute and storage centers around the world, ensuring a consistent assessment of footage wherever it may exist.

  2. Generating VFX plates and other deliverables

Visual effects workflows constantly push image processing pipelines to their absolute limits. For MPS to succeed, it must generate images with accurate framing, consistent color management, and correct debayering/decoding parameters — all while maintaining rapid turnaround times.

To achieve this, we leverage Netflix’s Cosmos compute and storage platform and use open standards to provide predictable and consistent creative control.

At this phase, we use the FilmLight API to:

  • Debayer original camera files with the correct format-specific decoding parameters
  • Crop and de-squeeze images using Framing Decision Lists (ASC FDL) to ensure spatial creative decisions are preserved
  • Apply ACES Metadata Files (AMF), providing repeatable color pipelines from dailies through finishing
  • Generate an array of media deliverables in varied formats

These processes are automated, repeatable, and auditable. We deliver AMFs alongside the OpenEXRs to ensure recipients know exactly what color transforms are already applied, and which need to be applied to match dailies.

Because we use FilmLight’s tools on the backend, our workflow specialists can use Baselight on their workstations to manually validate pipeline decisions for productions before the first day of principal photography.

The Media Processing Factory in the Cloud

Finding an engine that competently processes media in line with open standards is an important part of the equation. To maximize impact, we want to make these tools available to all of the filmmakers we work with. Luckily, we’re no strangers to scaled processing at Netflix, and our Cosmos compute platform was ready for the job!

Cloud-first integration

The traditional model for this kind of processing in filmmaking has been to invest in beefy computers with large GPUs and high-performance storage arrays to rip through debayering and encoding at breakneck speed. However, constraints in the cloud environment are different.

Factors that are essential for tools in our runtime environment include that they:

  • Are packageable as Serverless Functions in Linux Docker images that can be quickly invoked to run a single unit of work and shut down on completion
  • Can run on CPU-only instances to allow us to take advantage of a wide array of available compute
  • Support headless invocation via Java, Python, or CLI
  • Operate statelessly, so when things do go wrong, we can simply terminate and re-launch the worker

Operating within these constraints lets us focus on increasing throughput via parallel encoding rather than on single-instance processing power. We can then target the sweet spot of the cost/performance efficiency curve while still hitting our target turnaround times.

When tools are API-driven, easily packaged in Linux containers, and don’t require a lot of external state management, Netflix can quickly integrate and deploy them with operational reliability. FilmLight API fit the bill for us. At Netflix, we leverage:

  • Java and Python as the primary integration languages
  • Ubuntu-based Docker images with Java and Python code to expose functionality to our workflows
  • CPU instances in the cloud and local compute centers for running inspection, rendering, and trimming jobs

While FLAPI also supports GPU rendering, CPU instances give us access to a much wider segment of Netflix’s vast encoding compute pool and free up GPU instances for other workloads.

To use the FilmLight API, we bundle it in a package that can be easily installed via a Dockerfile. We then built Cosmos Stratum Functions that accept an input clip, an output location, and varying parameters such as frame ranges and AMF or FDL files when debayering footage. These functions can be quickly invoked to process a single clip or a sub-segment of a clip, then shut down to free up resources.

Elastic scaling for production workloads

Production workloads are inherently spiky:

  • A quiet day on set may mean minimal new footage to inspect.
  • A full VFX turnover or pulling trimmed OCF for finishing might require thousands of parallel renders in a short time window.

By deploying FLAPI in the cloud as functions, MPS can:

  • Allocate compute on demand and release it when our work queue dies down
  • Avoid tying capacity to a fixed pool of local hardware
  • Smooth demand across many types of encoding workload in a shared resource pool

This elasticity lets us swarm pull requests to get them through quickly, then immediately yield resources back to lower priority workloads. Even in peak production periods, we avoid the pain of manually managing render queues and prioritization by avoiding fixed resource allocation. All this means lightning-fast turnaround times and less anxiety around deadlines for our filmmakers.

Designed for Seasoned Pros and Emerging Filmmakers

Netflix productions range from highly experienced teams with very specific workflows to newer teams who may be less familiar with potential pitfalls in complex file-based pipelines.

MPS is designed to support both:

  • Industry veterans who need to configure precise, bespoke workflows and trust that underlying image processing will respect those decisions.
  • Productions without a color scientist on staff — those who benefit from guardrails and sane defaults that help them avoid common workflow issues (e.g., mismatched color transforms, inconsistent debayering, or incomplete metadata handling).

The partnership with FilmLight lets Netflix focus on workflow design, orchestration, and production support, while FilmLight focuses on providing competent handling of a wide variety of camera formats with world-class image science!

Collaboration and Co-Evolution

Rather than making MPS a self-contained system, Netflix set out to integrate it into a wider tool ecosystem built on emerging open standards. Integrating FLAPI into our system requires more than an API reference; it requires an ongoing partnership. FilmLight worked closely with Netflix teams to:

  • Align on feature roadmaps, particularly around new camera formats and open standards
  • Validate the accuracy and performance of key operations
  • Debug edge cases discovered in large-scale, real-world workloads
  • Evolve the API in ways that serve both Netflix and the wider industry
  • Create a positive feedback cycle with open standards like ACES and ASC FDL to solve for gaps when the rubber hits the road

One example of this has been with the implementation of ACES 2. FilmLight’s developers quickly provided a roadmap for support. As our engineering teams collaborated on integration, we also provided feedback to the ACES technical leadership to quickly address integration challenges and test drive updates in our pipeline.

This collaborative relationship, built on open communication, joint validation, and feedback to the greater industry, is how we routinely work with FilmLight to ensure we’re not just building something that works for our shows, but also driving a healthy tooling and standards ecosystem.

Impact

While much of this work takes place behind the scenes, its impact is felt directly by our productions. Our goal in building MPS is for producers, post supervisors, and vendors to experience:

  • Fewer delays caused by missing, incomplete, or incorrect media
  • Faster turnaround on VFX plates and other technical deliverables
  • More predictable, consistent handoffs between editorial, color, and VFX
  • Less time spent troubleshooting technical issues, and more time focused on creative review

In practice, this often shows up as the absence of crisis: the time a VFX vendor doesn’t have to request a re-delivery, or the time editorial doesn’t have to wait for corrected plates, or the time the color facility doesn’t have to reinvent a tone-mapping path because the AMF and ACES pipeline are already in place.

Looking Ahead

As camera technology, codecs, open standards, and production workflows continue to evolve, so will MPS. The guiding principles remain:

  • Automate what’s repeatable
  • Centralize what benefits from standardization
  • Partner where deep domain expertise already exists

The integration with FilmLight API is one example of this philosophy in action. By treating image processing as a specialized discipline and collaborating with a trusted industry partner, Netflix is delivering smarter, more reliable workflows to productions worldwide.

At its core, this partnership supports a simple goal: reduce manual workflow and tool management, giving filmmakers more time to tell stories.

Acknowledgements

This project is the result of collaboration and iteration over many years. In addition to the authors, the following people have contributed to this work:

  • Matthew Donato
  • Prabh Nallani
  • Andy Schuler
  • Jesse Korosi

Scaling Camera File Processing at Netflix was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.


Defending against exceptions in a scope_exit RAII type


One of the handy helpers in the Windows Implementation Library (WIL) is wil::scope_exit. We’ve used it to simulate the finally keyword in other languages by arranging for code to run when control leaves a scope.

I’ve identified three places where exceptions can occur when using scope_exit.

auto cleanup = wil::scope_exit([captures] { action; });

One is at the construction of the lambda. What happens if an exception occurs during the initialization of the captures?

This exception occurs even before scope_exit is called, so there’s nothing that scope_exit can do. The exception propagates outward, and the action is never performed.

Another is at the point the scope_exit tries to move the lambda into cleanup. In a naïve implementation of scope_exit, the exception would propagate outward without the action ever being performed.

The third point is when the scope_exit is destructed. In that case, it’s an exception thrown from a destructor. Since destructors default to noexcept, this is by default a std::terminate. If you explicitly enable a throwing destructor, then what happens next depends on why the destructor is running. If it’s running due to executing leaving the block normally, then the exception propagates outward. But if it’s running due to unwinding as a result of some other exception, then that’s a std::terminate.

The dangerous parts are the first two cases, because those result in the exception being thrown (and possibly caught elsewhere) without the cleanup action ever taking place.

WIL addresses this problem by merely saying that if an exception occurs during copying/moving of the lambda, then the behavior is undefined.

C++ has an experimental scope_exit, and it addresses the problem a different way: if an exception occurs while the function object is being copied or moved into the scope_exit, the function is called before the exception propagates. (It can’t do anything about exceptions during construction of the lambda itself, and it also declares the behavior undefined if the lambda throws an exception.)

In practice, the problems with exceptions on construction or copy are immaterial because the lambda typically captures all values by reference ([&]), and those types of captures do not throw on construction or copy.

The post Defending against exceptions in a scope_exit RAII type appeared first on The Old New Thing.


openclaw 2026.4.23


2026.4.23

Changes

  • Providers/OpenAI: add image generation and reference-image editing through Codex OAuth, so openai/gpt-image-2 works without an OPENAI_API_KEY. Fixes #70703.
  • Providers/OpenRouter: add image generation and reference-image editing through image_generate, so OpenRouter image models work with OPENROUTER_API_KEY. Fixes #55066 via #67668. Thanks @notamicrodose.
  • Image generation: let agents request provider-supported quality and output format hints, and pass OpenAI-specific background, moderation, compression, and user hints through the image_generate tool. (#70503) Thanks @ottodeng.
  • Agents/subagents: add optional forked context for native sessions_spawn runs so agents can let a child inherit the requester transcript when needed, while keeping clean isolated sessions as the default; includes prompt guidance, context-engine hook metadata, docs, and QA coverage.
  • Agents/tools: add optional per-call timeoutMs support for image, video, music, and TTS generation tools so agents can extend provider request timeouts only when a specific generation needs it.
  • Memory/local embeddings: add configurable memorySearch.local.contextSize with a 4096 default so local embedding contexts can be tuned for constrained hosts without patching the memory host. (#70544) Thanks @aalekh-sarvam.
  • Dependencies/Pi: update bundled Pi packages to 0.70.0, use Pi's upstream gpt-5.5 catalog metadata for OpenAI and OpenAI Codex, and keep only local gpt-5.5-pro forward-compat handling.
  • Codex harness: add structured debug logging for embedded harness selection decisions so /status stays simple while gateway logs explain auto-selection and Pi fallback reasons. (#70760) Thanks @100yenadmin.

Fixes

  • Codex harness: route native request_user_input prompts back to the originating chat, preserve queued follow-up answers, and honor newer app-server command approval amendment decisions.
  • Codex harness/context-engine: redact context-engine assembly failures before logging, so fallback warnings do not serialize raw error objects. (#70809) Thanks @jalehman.
  • WhatsApp/onboarding: keep first-run setup entry loading off the Baileys runtime dependency path, so packaged QuickStart installs can show WhatsApp setup before runtime deps are staged. Fixes #70932.
  • Block streaming: suppress final assembled text after partial block-delivery aborts when the already-sent text chunks exactly cover the final reply, preventing duplicate replies without dropping unrelated short messages. Fixes #70921.
  • Codex harness/Windows: resolve npm-installed codex.cmd shims through PATHEXT before starting the native app-server, so codex/* models work without a manual .exe shim. Fixes #70913.
  • Slack/groups: classify MPIM group DMs as group chat context and suppress verbose tool/plan progress on Slack non-DM surfaces, so internal "Working…" traces no longer leak into rooms. Fixes #70912.
  • Agents/replay: stop OpenAI/Codex transcript replay from synthesizing missing tool results while still preserving synthetic repair on Anthropic, Gemini, and Bedrock transport-owned sessions. (#61556) Thanks @VictorJeon and @vincentkoc.
  • Telegram/media replies: parse remote markdown image syntax into outbound media payloads on the final reply path, so Telegram group chats stop falling back to plain-text image URLs when the model or a tool emits ![...](...) instead of a MEDIA: token. (#66191) Thanks @apezam and @vincentkoc.
  • Agents/WebChat: surface non-retryable provider failures such as billing, auth, and rate-limit errors from the embedded runner instead of logging surface_error and leaving webchat with no rendered error. Fixes #70124. (#70848) Thanks @truffle-dev.
  • WhatsApp: unify outbound media normalization across direct sends and auto-replies. Thanks @mcaxtr.
  • Memory/CLI: declare the built-in local embedding provider in the memory-core manifest, so standalone openclaw memory status, index, and search can resolve local embeddings just like the gateway runtime. Fixes #70836. (#70873) Thanks @mattznojassist.
  • Gateway/WebChat: preserve image attachments for text-only primary models by offloading them as media refs instead of dropping them, so configured image tools can still inspect the original file. Fixes #68513, #44276, #51656, #70212.
  • Plugins/Google Meet: hang up delegated Twilio calls on leave, clean up Chrome realtime audio bridges when launch fails, and use a flat provider-safe tool schema.
  • Media understanding: honor explicit image-model configuration before native-vision skips, including agents.defaults.imageModel, tools.media.image.models, and provider image defaults such as MiniMax VL when the active chat model is text-only. Fixes #47614, #63722, #69171.
  • Codex/media understanding: support codex/* image models through bounded Codex app-server image turns, while keeping openai-codex/* on the OpenAI Codex OAuth route and validating app-server responses against generated protocol contracts. Fixes #70201.
  • Providers/OpenAI Codex: synthesize the openai-codex/gpt-5.5 OAuth model row when Codex catalog discovery omits it, so cron and subagent runs do not fail with Unknown model while the account is authenticated.
  • Models/Codex: preserve Codex provider metadata when adding models from chat or CLI commands, so manually added Codex models keep the right auth and routing behavior. (#70820) Thanks @Takhoffman.
  • Providers/OpenAI: route openai/gpt-image-2 through configured Codex OAuth directly when an openai-codex profile is active, instead of probing OPENAI_API_KEY first.
  • Providers/OpenAI: harden image generation auth routing and Codex OAuth response parsing so fallback only applies to public OpenAI API routes and bounded SSE results. Thanks @Takhoffman.
  • OpenAI/image generation: send reference-image edits as guarded multipart uploads instead of JSON data URLs, restoring complex multi-reference gpt-image-2 edits. Fixes #70642. Thanks @dashhuang.
  • Providers/OpenRouter: send image-understanding prompts as user text before image parts, restoring non-empty vision responses for OpenRouter multimodal models. Fixes #70410.
  • Providers/Google: honor the private-network SSRF opt-in for Gemini image generation requests, so trusted proxy setups that resolve Google API hosts to private addresses can use image_generate. Fixes #67216.
  • Agents/transport: stop embedded runs from lowering the process-wide undici stream timeouts, so slow Gemini image generation and other long-running provider requests no longer inherit short run-attempt headers timeouts. Fixes #70423. Thanks @giangthb.
  • Providers/OpenAI: honor the private-network SSRF opt-in for OpenAI-compatible image generation endpoints, so trusted LocalAI/LAN image_generate routes work without disabling SSRF checks globally. Fixes #62879. Thanks @seitzbg.
  • Providers/OpenAI: stop advertising the removed gpt-5.3-codex-spark Codex model through fallback catalogs, and suppress stale rows with a GPT-5.5 recovery hint.
  • Control UI/chat: persist assistant-generated images as authenticated managed media and accept paired-device tokens for assistant media fetches, so webchat history reloads keep showing generated images. (#70719, #70741) Thanks @Patrick-Erichsen.
  • Control UI/chat: queue Stop-button aborts across Gateway reconnects so a disconnected active run is canceled on reconnect instead of only clearing local UI state. (#70673) Thanks @chinar-amrutkar.
  • Memory/QMD: recreate stale managed QMD collections when startup repair finds the collection name already exists, so root memory narrows back to MEMORY.md instead of staying on broad workspace markdown indexing.
  • Agents/OpenAI: surface selected-model capacity failures from PI, Codex, and auto-reply harness paths with a model-switch hint instead of the generic empty-response error. Thanks @vincentkoc.
  • Plugins/QR: replace legacy qrcode-terminal QR rendering with bounded qrcode-tui helpers for plugin login/setup flows. (#65969) Thanks @vincentkoc.
  • Voice-call/realtime: wait for OpenAI session configuration before greeting or forwarding buffered audio, and reject non-allowlisted Twilio callers before stream setup. (#43501) Thanks @forrestblount.
  • ACPX/Codex: stop materializing auth.json bridge files for Codex ACP, Codex app-server, and Codex CLI runs; Codex-owned runtimes now use their normal CODEX_HOME/~/.codex auth path directly.
  • Auto-reply/system events: route async exec-event completion replies through the persisted session delivery context, so long-running command results return to the originating channel instead of being dropped when live origin metadata is missing. (#70258) Thanks @wzfukui.
  • Gateway/sessions: extend the webchat session-mutation guard to sessions.compact and sessions.compaction.restore, so WEBCHAT_UI clients are rejected from compaction-side session mutations consistently with the existing patch/delete guards. (#70716) Thanks @drobison00.
  • QA channel/security: reject non-HTTP(S) inbound attachment URLs before media fetch, and log rejected schemes so suspicious or misconfigured payloads are visible during debugging. (#70708) Thanks @vincentkoc.
  • Plugins/install: link the host OpenClaw package into external plugins that declare openclaw as a peer dependency, so peer-only plugin SDK imports resolve after install without bundling a duplicate host package. (#70462) Thanks @anishesg.
  • Plugins/Windows: refresh the packaged plugin SDK alias in place during bundled runtime dependency repair, so gateway and CLI plugin startup no longer race on ENOTEMPTY/EPERM after same-guest npm updates.
  • Teams/security: require shared Bot Framework audience tokens to name the configured Teams app via verified appid or azp, blocking cross-bot token replay on the global audience. (#70724) Thanks @vincentkoc.
  • Plugins/startup: resolve bundled plugin Jiti loads relative to the target plugin module instead of the central loader, so Bun global installs no longer hang while discovering bundled image providers. (#70073) Thanks @yidianyiko.
  • Anthropic/CLI security: derive Claude CLI bypassPermissions from OpenClaw's existing YOLO exec policy, preserve explicit raw Claude --permission-mode overrides, and strip malformed permission-mode args instead of silently falling back to a bypass. (#70723) Thanks @vincentkoc.
  • Android/security: require loopback-only cleartext gateway connections on Android manual and scanned routes, so private-LAN and link-local ws:// endpoints now fail closed unless TLS is enabled. (#70722) Thanks @vincentkoc.
  • Pairing/security: require private-IP or loopback hosts for cleartext mobile pairing, and stop treating .local or dotless hostnames as safe cleartext endpoints. (#70721) Thanks @vincentkoc.
  • Plugins/security: stop setup-api lookup from falling back to the launch directory, so workspace-local extensions/<plugin>/setup-api.* files cannot be executed during provider setup resolution. (#70718) Thanks @drobison00.
  • Approvals/security: require explicit chat exec-approval enablement instead of auto-enabling approval clients just because approvers resolve from config or owner allowlists. (#70715) Thanks @vincentkoc.
  • Discord/security: keep native slash-command channel policy from bypassing configured owner or member restrictions, while preserving channel-policy fallback when no stricter access rule exists. (#70711) Thanks @vincentkoc.
  • Android/security: stop ASK_OPENCLAW intents from auto-sending injected prompts, so external app actions only prefill the draft instead of dispatching it immediately. (#70714) Thanks @vincentkoc.
  • Secrets/Windows: strip UTF-8 BOMs from file-backed secrets and keep unavailable ACL checks fail-closed unless trusted file or exec providers explicitly opt into allowInsecurePath. (#70662) Thanks @zhanggpcsu.
  • Agents/image generation: escape ignored override values in tool warnings so parsed MEDIA: directives cannot be injected through unsupported model options. (#70710) Thanks @vincentkoc.
  • QQBot/security: require framework auth for /bot-approve so unauthorized QQ senders cannot change exec approval settings through the unauthenticated pre-dispatch slash-command path. (#70706) Thanks @vincentkoc.
  • MCP/tools: stop the ACPX OpenClaw tools bridge from listing or invoking owner-only tools such as cron, closing a privilege-escalation path for non-owner MCP callers. (#70698) Thanks @vincentkoc.
  • Feishu/onboarding: load Feishu setup surfaces through a setup-only barrel so first-run setup no longer imports Feishu's Lark SDK before bundled runtime deps are staged. (#70339) Thanks @andrejtr.
  • Approvals/startup: let native approval handlers report ready after gateway authentication while replaying pending approvals in the background, so slow or failing replay delivery no longer blocks handler startup or amplifies reconnect storms.
  • WhatsApp/security: keep contact/vCard/location structured-object free text out of the inline message body and render it through fenced untrusted metadata JSON, limiting hidden prompt-injection payloads in names, phone fields, and location labels/comments.
  • Group-chat/security: keep channel-sourced group names and participant labels out of inline group system prompts and render them through fenced untrusted metadata JSON.
  • Agents/replay: preserve Kimi-style functions.<name>:<index> tool-call IDs during strict replay sanitization so custom OpenAI-compatible Kimi routes keep multi-turn tool use intact. (#70693) Thanks @geri4.
  • Discord/replies: preserve final reply permission context through outbound delivery so Discord replies keep the same channel/member routing rules at send time.
  • Plugins/startup: restore bundled plugin openclaw/plugin-sdk/* resolution from packaged installs and external runtime-deps stage roots, so Telegram/Discord no longer crash-loop with Cannot find package 'openclaw' after missing dependency repair. (#70852) Thanks @simonemacario.
  • CLI/Claude: run the same prompt-build hooks and trigger/channel context on claude-cli turns as on direct embedded runs, keeping Claude Code sessions aligned with OpenClaw workspace identity, routing, and hook-driven prompt mutations. (#70625) Thanks @mbelinky.
  • Discord/plugin startup: keep subagent hooks lazy behind Discord's channel entry so packaged entry imports stay narrow and report import failures with the channel id and entry path.
  • Memory/doctor: keep root durable memory canonicalized on MEMORY.md, stop treating lowercase memory.md as a runtime fallback, and let openclaw doctor --fix merge true split-brain root files into MEMORY.md with a backup. (#70621) Thanks @mbelinky.
  • Providers/Anthropic Vertex: restore ADC-backed model discovery after the lightweight provider-discovery path by resolving emitted discovery entries, exposing synthetic auth on bootstrap discovery, and honoring copied env snapshots when probing the default GCP ADC path. Fixes #65715. (#65716) Thanks @feiskyer.
  • Codex harness/status: pin embedded harness selection per session, show active non-PI harness ids such as codex in /status, and keep legacy transcripts on PI until /new or /reset so config changes cannot hot-switch existing sessions.
  • Gateway/security: fail closed on agent-driven gateway config.apply/config.patch runtime edits by allowlisting a narrow set of agent-tunable prompt, model, and mention-gating paths (including Telegram topic-level requireMention) instead of relying on a hand-maintained denylist of protected subtrees that could miss new sensitive config keys. (#70726) Thanks @drobison00.
  • Webhooks/security: re-resolve SecretRef-backed webhook route secrets on each request so openclaw secrets reload revokes the previous secret immediately instead of waiting for a gateway restart. (#70727) Thanks @drobison00.
  • Memory/dreaming: decouple the managed dreaming cron from heartbeat by running it as an isolated lightweight agent turn, so dreaming runs even when heartbeat is disabled for the default agent and is no longer skipped by heartbeat.activeHours. openclaw doctor --fix migrates stale main-session dreaming jobs in persisted cron configs to the new shape. Fixes #69811, #67397, #68972. (#70737) Thanks @jalehman.
  • Agents/CLI: keep --agent plus --session-id lookup scoped to the requested agent store, so explicit agent resumes cannot select another agent's session. (#70985) Thanks @frankekn.

Chat History Storage Patterns in Microsoft Agent Framework


When people talk about building AI agents, they usually focus on models, tools, and prompts. In practice, one of the most important architectural decisions is much simpler: where does the conversation history live?

Imagine a user asks your agent a complex question, clicks “try again,” explores two different answers in parallel, and then comes back tomorrow expecting the agent to remember everything. Whether that experience is possible depends on the answer to this question.

Your choice affects cost, privacy, portability, and the kinds of user experiences you can build. It also determines whether your application treats a conversation as a simple thread, a branchable tree, or just a list of messages you resend on every call.

This article explores the fundamental patterns for chat history storage, how different AI services implement them, and how Microsoft Agent Framework abstracts these differences to give you flexibility without complexity.

Why Chat History Storage Matters

Every time a user interacts with an AI agent, the model needs context from previous messages to provide coherent, contextual responses. Without this history, each interaction would be isolated. The agent couldn’t remember what was discussed moments ago.

The storage strategy you choose affects:

  • User experience: Can users resume conversations? Branch into different directions? Undo and try again?
  • Compliance: Where does conversation data live? Who controls it?
  • Architecture: How tightly coupled is your application to a specific provider?

The Two Fundamental Patterns

At the highest level, there are two approaches to managing chat history:

Service-Managed Storage

The AI service stores conversation state on its servers. Agent Framework holds a reference (like a conversation_id or thread_id) in the AgentSession, and the service automatically includes relevant history when processing requests.

chat history service managed image

Benefits:

  • Simpler client implementation
  • Service handles context window management and compaction automatically
  • Built-in persistence across sessions
  • Lower per-request payload size (just a reference ID, not full history)

Tradeoffs:

  • Data lives on provider’s servers
  • Less control over what context is included
  • No control over compaction strategy – you can’t customize what gets summarized, truncated, or dropped
  • Provider lock-in for conversation state
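
The reference-ID flow behind service-managed storage can be sketched with a toy in-process "service." All names here are hypothetical illustrations, not Agent Framework or provider APIs:

```python
# Toy illustration of service-managed storage: the client keeps only a
# conversation ID, while the "service" holds the full history server-side.
import uuid

class ToyConversationService:
    def __init__(self):
        self._store = {}  # conversation_id -> list of messages

    def create_conversation(self) -> str:
        cid = str(uuid.uuid4())
        self._store[cid] = []
        return cid

    def send(self, conversation_id: str, user_message: str) -> str:
        history = self._store[conversation_id]
        history.append({"role": "user", "content": user_message})
        # A real service would call the model with `history` here.
        reply = f"echo: {user_message} (context: {len(history)} messages)"
        history.append({"role": "assistant", "content": reply})
        return reply

service = ToyConversationService()
cid = service.create_conversation()   # the client persists only this ID
service.send(cid, "My name is Alice.")
print(service.send(cid, "What is my name?"))
```

Note that the second request carries no history at all — only the ID — yet the reply is built from full context, which is exactly why request payloads stay small in this model.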

Client-Managed Storage

Agent Framework maintains the full conversation history locally (in the AgentSession or associated history providers) and sends relevant messages with each request. The service is stateless. It processes the request and forgets.

chat history client managed image

Benefits:

  • Full control over data location and privacy
  • Easy to switch providers (no state migration)
  • Explicit control over what context is sent
  • Full control over compaction strategies – truncation, summarization, sliding window, tool-call collapse
  • Can implement custom context strategies

Tradeoffs:

  • Larger request payloads
  • Client must handle context window limits
  • Must implement and maintain compaction strategies as conversations grow
  • More complex client-side logic
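
By contrast, the client-managed pattern keeps the message list in the application and resends it on every call. A toy sketch (the function here stands in for any stateless chat-completions endpoint; nothing in it is a real API):

```python
# Toy illustration of client-managed storage: the application owns the
# message list; the "service" sees only what is sent and retains nothing.

def stateless_model_call(messages):
    # Stand-in for a stateless chat-completions endpoint.
    last_user = next(m["content"] for m in reversed(messages) if m["role"] == "user")
    return f"echo: {last_user} (saw {len(messages)} messages)"

history = []  # lives in the application, not the service

for turn in ["My name is Alice.", "What is my name?"]:
    history.append({"role": "user", "content": turn})
    reply = stateless_model_call(history)   # full history in every request
    history.append({"role": "assistant", "content": reply})

print(reply)
```

The growing `history` list is the source of both the flexibility (you decide what to send) and the costs (payload size, context-window management) listed above.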

Service-Managed Storage Models

Not all service-managed storage is equal. There are two distinct models that affect what you can build:

Linear (Single-Threaded) Conversations

This is the traditional chat model: messages form an ordered sequence. Each new message appends to the thread, and you can’t branch or fork the conversation.

Examples:

  • Microsoft Foundry Prompt Agents (conversations)
  • OpenAI Responses with Conversations API (conversations)
  • [DEPRECATED] OpenAI Assistants API (threads)

chat history linear image

Good for:

  • Chatbots and support agents
  • Simple Q&A flows
  • Scenarios requiring strict audit trails

Limitations:

  • Can’t “go back” and try a different response
  • No parallel exploration of different conversation paths

Forking-Capable Conversations

Modern Responses APIs introduce a more flexible model: each response has a unique ID, and new requests can reference any previous response as the conversation continuation point.

Examples:

  • Microsoft Foundry Responses endpoint
  • Azure OpenAI Responses API
  • OpenAI Responses API

chat history forking image

Good for:

  • Exploration and brainstorming applications
  • A/B testing different response strategies
  • “Undo” and “try again” functionality
  • Building tree-structured conversation UIs
  • Agentic workflows where multiple paths may be explored
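
The forking model reduces to a simple data structure: each response carries an ID and a parent pointer, so the conversation forms a tree rather than a list. A hypothetical sketch of that data model (not the actual Responses API):

```python
# Toy data model for forking conversations: every response gets an ID and a
# parent pointer, so any prior response can serve as a fork point.
import itertools

_ids = itertools.count(1)
responses = {}  # response_id -> {"parent", "user", "reply"}

def respond(user_message, previous_response_id=None):
    rid = f"resp_{next(_ids)}"
    responses[rid] = {"parent": previous_response_id, "user": user_message,
                      "reply": f"echo: {user_message}"}
    return rid

def lineage(response_id):
    # Walk parent pointers to reconstruct the context for one branch.
    chain = []
    while response_id is not None:
        chain.append(response_id)
        response_id = responses[response_id]["parent"]
    return list(reversed(chain))

r1 = respond("Suggest three vacation spots.")
r2a = respond("Tell me more about the first.", previous_response_id=r1)
r2b = respond("Actually, make them budget-friendly.", previous_response_id=r1)  # fork

print(lineage(r2a))  # ['resp_1', 'resp_2']
print(lineage(r2b))  # ['resp_1', 'resp_3'] -- shares r1, not r2a
```

"Try again" in a UI is then just a new `respond()` call pointing at the same parent as the answer being retried.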

Client-Managed Storage Patterns

When the AI service doesn’t store conversation state, your application takes full responsibility. This is the pattern used by many providers.

Providers using this model:

  • Azure OpenAI Chat Completions
  • OpenAI Chat Completions
  • Anthropic Claude
  • Ollama
  • Most open-source model APIs

Implementation Considerations

Context Window Management: You can’t send unlimited history. As conversations grow, you’ll need strategies like:

  • Truncating older messages
  • Summarizing earlier parts of the conversation
  • Selective inclusion based on relevance
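
The first of these strategies can be sketched in a few lines. The 4-characters-per-token estimate and all names here are illustrative assumptions, not Agent Framework behavior:

```python
# Minimal truncation sketch: drop the oldest messages until the remaining
# history fits a rough token budget.

def estimate_tokens(message):
    return max(1, len(message["content"]) // 4)  # crude heuristic

def truncate_history(messages, budget):
    kept = []
    used = 0
    for msg in reversed(messages):      # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = [
    {"role": "user", "content": "x" * 400},       # ~100 tokens, oldest
    {"role": "assistant", "content": "y" * 200},  # ~50 tokens
    {"role": "user", "content": "z" * 40},        # ~10 tokens, newest
]
print([estimate_tokens(m) for m in truncate_history(history, budget=80)])
```

A production version would also pin system instructions and count tokens with the model's actual tokenizer, but the shape of the loop is the same.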

Persistence: In-memory history works for demos and development, but production applications almost always need a durable store – a database, Redis, blob storage, or similar. This adds infrastructure and operational complexity that service-managed storage avoids entirely.
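
A minimal durable store can be as simple as JSON files on disk — a hypothetical stand-in for the database- or blob-backed history provider you would implement for production:

```python
# Minimal durable-store sketch: persist history as JSON so a conversation
# survives application restarts. Illustrative only; a real deployment would
# use a database, Redis, or blob storage behind the same interface.
import json
from pathlib import Path

class FileHistoryStore:
    def __init__(self, directory):
        self._dir = Path(directory)
        self._dir.mkdir(parents=True, exist_ok=True)

    def load(self, session_id):
        path = self._dir / f"{session_id}.json"
        return json.loads(path.read_text()) if path.exists() else []

    def append(self, session_id, message):
        messages = self.load(session_id)
        messages.append(message)
        (self._dir / f"{session_id}.json").write_text(json.dumps(messages))

store = FileHistoryStore("./chat_sessions")
store.append("alice", {"role": "user", "content": "My name is Alice."})
# ...application restarts...
print(len(FileHistoryStore("./chat_sessions").load("alice")))
```

Even this toy version surfaces the operational questions the paragraph above alludes to: concurrent writes, cleanup of stale sessions, and encryption at rest are all now your problem.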

Privacy Control: The upside is that conversation data never leaves your control unless you explicitly send it. This can be crucial for sensitive applications.

Compaction: The Hidden Complexity

When the service manages history, it also manages compaction – keeping the conversation context within the model’s token limits. You don’t have to think about it, but you also can’t control it.

With client-managed history, compaction becomes your responsibility. As conversations grow, you need explicit strategies to prevent context window overflows and control costs. Common approaches include:

  • Truncation – Drop the oldest messages beyond a threshold
  • Sliding window – Keep only the most recent N turns
  • Summarization – Replace older messages with an LLM-generated summary
  • Tool-call collapse – Replace verbose tool call/result pairs with compact summaries

Agent Framework provides built-in compaction strategies for all of these patterns, so you don’t have to build them from scratch. But you do need to choose, configure, and maintain the right strategy for your use case – a tradeoff that doesn’t exist with service-managed storage.
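
To make one of these patterns concrete, here is a sliding-window sketch. It is a generic illustration; Agent Framework's built-in strategies have their own names and configuration options:

```python
# Sliding-window sketch: keep only the most recent N turns. A "turn" here is
# a user message plus everything up to the next user message.

def sliding_window(messages, max_turns):
    turn_starts = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if len(turn_starts) <= max_turns:
        return messages
    return messages[turn_starts[-max_turns]:]

history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Plan a trip"},
    {"role": "assistant", "content": "Sure..."},
    {"role": "user", "content": "Make it cheaper"},
    {"role": "assistant", "content": "OK..."},
]
print([m["content"] for m in sliding_window(history, max_turns=2)])
```

Slicing on turn boundaries rather than raw message counts keeps user/assistant pairs intact, which avoids sending the model an orphaned reply with no matching question.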

How Agent Framework Handles the Differences

Microsoft Agent Framework provides a unified programming model that works regardless of which storage pattern the underlying service uses. This abstraction lives in two key components:

AgentSession: The Unified Conversation Container

Every conversation in Agent Framework is represented by an AgentSession. This object:

  • Contains any service-specific identifiers (thread IDs, response IDs)
  • Holds local state (for client-managed history scenarios). This may include:
    • The actual chat history
    • Storage identifiers for a custom database chat history store
  • Provides serialization for persistence across application restarts
// C#
// Create a session - works the same regardless of provider
AgentSession session = await agent.CreateSessionAsync();

// Use the session across multiple turns
var first = await agent.RunAsync("My name is Alice.", session);
var second = await agent.RunAsync("What is my name?", session);

// The session handles the details:
// - If service-managed: tracks the conversation_id internally
// - If client-managed: accumulates history locally
# Python
# Create a session - works the same regardless of provider
session = agent.create_session()

# Use the session across multiple turns
first = await agent.run("My name is Alice.", session=session)
second = await agent.run("What is my name?", session=session)

ChatHistoryProvider: Pluggable Storage Backends

When you need client-managed storage, history providers allow you to control where history lives and how it’s retrieved:

// C#
// Built-in in-memory provider (simplest and default option)
AIAgent agent = chatClient.AsAIAgent(new ChatClientAgentOptions
{
    ChatOptions = new() { Instructions = "You are a helpful assistant." },
    ChatHistoryProvider = new InMemoryChatHistoryProvider()
});

// Custom database-backed provider (you implement)
AIAgent agent = chatClient.AsAIAgent(new ChatClientAgentOptions
{
    ChatOptions = new() { Instructions = "You are a helpful assistant." },
    ChatHistoryProvider = new DatabaseChatHistoryProvider(dbConnection)
});
# Python
from agent_framework import InMemoryHistoryProvider
from agent_framework.openai import OpenAIChatCompletionClient

# Built-in in-memory provider (simplest and default option)
agent = OpenAIChatCompletionClient().as_agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
    context_providers=[InMemoryHistoryProvider("memory", load_messages=True)],
)

# Custom database-backed provider (you implement)
agent = OpenAIChatCompletionClient().as_agent(
  name="Assistant",
  instructions="You are a helpful assistant.",
  context_providers=[DatabaseHistoryProvider(db_client)],
)

Key design principle: Your application code doesn’t change when switching between service-managed and client-managed storage. The abstraction handles the details.

Transparent Mode Switching

Consider this scenario: you start with OpenAI Chat Completions (client-managed) and later want to try the Responses API (service-managed with forking). Your agent invocation code stays the same:

// C#
// Works with Chat Completions (client-managed)
var response = await agent.RunAsync("Hello!", session);

// Also works with Responses API (service-managed)
var response = await agent.RunAsync("Hello!", session);
# Python
# Works with Chat Completions (client-managed)
response = await agent.run("Hello!", session=session)

# Also works with Responses API (service-managed)
response = await agent.run("Hello!", session=session)

The session and provider handle the underlying differences. This decoupling is valuable for:

  • Experimenting with different providers
  • Migrating between services
  • Building provider-agnostic applications

Provider Comparison

Most AI services have a fixed storage model – the service either stores history or it doesn’t. The Responses API is the notable exception: it’s configurable.

Fixed-Mode Providers

These providers operate in a single storage mode:

Provider                       | Storage Location | Storage Model    | Compaction
-------------------------------|------------------|------------------|-----------
OpenAI Chat Completion         | Client           | N/A              | Developer
Azure OpenAI Chat Completion   | Client           | N/A              | Developer
Foundry Agent Service          | Service          | Linear (threads) | Service
Anthropic Claude               | Client           | N/A              | Developer
Ollama                         | Client           | N/A              | Developer
GitHub Copilot SDK             | Service          | N/A              | Service
[DEPRECATED] OpenAI Assistants | Service          | Linear (threads) | Service

Configurable: The Responses API

The Responses API (available from Microsoft Foundry, OpenAI, and Azure OpenAI) is a special case. It supports multiple storage modes controlled by configuration – primarily the store parameter:

Mode                 | Configuration     | Storage Location | Storage Model            | Compaction
---------------------|-------------------|------------------|--------------------------|-----------
Forking (default)    | store=true        | Service          | Forking via response IDs | Service
Client-managed       | store=false       | Client           | N/A                      | Developer
Linear conversations | Conversations API | Service          | Linear                   | Service

This makes the Responses API uniquely flexible:

  • store=true (default) – The service stores each response and its history. New requests can reference any prior response ID to continue from that point, enabling branching and forking. The service handles compaction.
  • store=false – The service is stateless. Agent Framework manages the full conversation history client-side using history providers – just like Chat Completions.
  • Conversations API – Built on top of Responses, this provides a linear thread model similar to Assistants. The service manages an ordered conversation and handles compaction. To enable this model, pass a conversation ID as input to the Responses API instead of a previous response ID.

Legend:

  • Storage Location: Where the canonical conversation state lives – “Service” (on the provider’s servers) or “Client” (in Agent Framework’s session/history providers).
  • Storage Model: For service-stored history, the shape – linear (thread) or forking (response IDs).
  • Compaction: Who keeps context within token limits. “Service” = automatic. “Developer” = you configure compaction strategies in Agent Framework.

Configuring Responses API Modes

Here’s how each mode looks in practice:

Mode 1: Forking with service storage (default)

This is the simplest setup – just create an agent from the Responses client. The service stores everything and supports forking via response IDs.

// C# - Responses API with store=true (default)
// The service stores each response and its history.
// Each response ID can be used as a fork point.
AIAgent agent = new OpenAIClient("<your_api_key>")
    .GetResponseClient("gpt-5.4-mini")
    .AsAIAgent(
        instructions: "You are a helpful assistant.",
        name: "ForkingAgent");

AgentSession session = await agent.CreateSessionAsync();
var response1 = await agent.RunAsync("What are three good vacation spots?", session);

// The session tracks the response ID internally.
// A new session forked from response1 could explore a different branch.
# Python - Responses API with store=true (default)
# The service stores each response and its history.
# Each response ID can be used as a fork point.
from agent_framework import Agent
from agent_framework.openai import OpenAIChatClient

agent = Agent(
    client=OpenAIChatClient(),
    name="ForkingAgent",
    instructions="You are a helpful assistant.",
)

session = agent.create_session()
response1 = await agent.run("What are three good vacation spots?", session=session)

# The session tracks the response ID internally.
# A new session forked from response1 could explore a different branch.

Mode 2: Client-managed with store=false

Here you use the same Responses client but disable service-side storage. Agent Framework manages history client-side, giving you full control over persistence and compaction.

// C# - Responses API with store=false
// The service is stateless - Agent Framework manages history.
AIAgent agent = new OpenAIClient("<your_api_key>")
    .GetResponseClient("gpt-5.4-mini")
    .AsIChatClientWithStoredOutputDisabled()
    .AsAIAgent(new ChatClientAgentOptions
    {
        ChatOptions = new() { Instructions = "You are a helpful assistant." },
        ChatHistoryProvider = new InMemoryChatHistoryProvider()
    });

AgentSession session = await agent.CreateSessionAsync();
var response = await agent.RunAsync("Hello!", session);
// History lives in the InMemoryChatHistoryProvider,
// not on the service. You control compaction.
# Python - Responses API with store=false
# The service is stateless - Agent Framework manages history.
from agent_framework import Agent, InMemoryHistoryProvider
from agent_framework.openai import OpenAIChatClient

agent = Agent(
    client=OpenAIChatClient(),
    name="StatelessAgent",
    instructions="You are a helpful assistant.",
    default_options={"store": False},
    context_providers=[InMemoryHistoryProvider("memory", load_messages=True)],
)

session = agent.create_session()
response = await agent.run("Hello!", session=session)
# History lives in the InMemoryHistoryProvider,
# not on the service. You control compaction.

Mode 3: Linear conversations

The Conversations API builds on Responses to provide a linear thread model. You create a server-side conversation first, and then bootstrap your session with it. This gives you service-managed storage with a simple, ordered history – similar to the deprecated Assistants API.

In C#, the FoundryAgent class provides a CreateConversationSessionAsync() convenience method that creates the server-side conversation and links it to a session in a single call:

// C# — Responses API with Conversations (via Foundry)
// CreateConversationSessionAsync() creates a server-side conversation
// that persists on the Foundry service and is visible in the Foundry Project UI.
AIProjectClient aiProjectClient = new(new Uri(endpoint), new DefaultAzureCredential());

FoundryAgent agent = aiProjectClient
    .AsAIAgent("gpt-5.4-mini", instructions: "You are a helpful assistant.", name: "ConversationAgent");

// One call creates the conversation and binds it to the session.
ChatClientAgentSession session = await agent.CreateConversationSessionAsync();

Console.WriteLine(await agent.RunAsync("What is the capital of France?", session));
Console.WriteLine(await agent.RunAsync("What about Germany?", session));
// Both responses are part of the same linear conversation thread
// managed by the service.
# Python — Responses API with Conversations (via Foundry)
# Use get_session with a conversation id from the conversation service to link to
# a server-side conversation.
from agent_framework.foundry import FoundryChatClient
from azure.identity import AzureCliCredential
from agent_framework import Agent

foundry_client = FoundryChatClient(credential=AzureCliCredential())
agent = Agent(
    client=foundry_client,
    instructions="You are a helpful assistant."
)

# Create a session with a conversation id from the conversations service
conversation_result = await foundry_client.client.conversations.create()
session = agent.get_session(service_session_id=conversation_result.id)

response1 = await agent.run("What is the capital of France?", session=session)
response2 = await agent.run("What about Germany?", session=session)
# Both responses are part of the same linear conversation thread
# managed by the service.

Decision Tree

This decision tree illustrates the main options to weigh when choosing a chat history storage mechanism.

chat history decision tree image

Conclusion

Chat history storage might seem like an implementation detail, but it fundamentally shapes what your AI application can do. Understanding the tradeoffs between service-managed and client-managed patterns—and between linear and forking models—helps you make architectural decisions that align with your requirements.

Microsoft Agent Framework’s session and provider abstractions give you the flexibility to start with one approach and evolve without rewriting your application logic. Whether you’re building a simple chatbot or a complex agentic system with branching conversations, the framework adapts to your chosen storage strategy.

The key takeaway: choose based on your actual requirements (privacy, control, capabilities), not just what’s easiest to start with. The right storage pattern will make your application more capable and maintainable in the long run.

For more details on implementing these patterns, see the Microsoft Agent Framework documentation.

The post Chat History Storage Patterns in Microsoft Agent Framework appeared first on Microsoft Agent Framework.


Qwen3.6-27B Brings Open-Weight Vision and Coding Power

Qwen3.6-27B is an open-weight multimodal model built for coding, reasoning, visual understanding, and long-context AI workflows.

Susurrus: Crafting a Cozy Watercolor World with Three.js and Shaders

A behind-the-scenes look at blending NPR shading, sound, and interaction to shape a meditative WebGL scene.



Videos:
  • https://codrops-1f606.kxcdn.com/codrops/wp-content/uploads/2026/04/susurrus-short-video.mp4?x30804
  • https://codrops-1f606.kxcdn.com/codrops/wp-content/uploads/2026/04/0414.mp4?x30804
  • https://codrops-1f606.kxcdn.com/codrops/wp-content/uploads/2026/04/spawn.mp4?x30804
  • https://codrops-1f606.kxcdn.com/codrops/wp-content/uploads/2026/04/dissssolve.mp4?x30804