Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Kubernetes v1.35: Job Managed By Goes GA


In Kubernetes v1.35, the ability to specify an external Job controller (through .spec.managedBy) graduates to General Availability.

This feature allows external controllers to take full responsibility for Job reconciliation, unlocking powerful scheduling patterns like multi-cluster dispatching with MultiKueue.

Why delegate Job reconciliation?

The primary motivation for this feature is to support multi-cluster batch scheduling architectures, such as MultiKueue.

The MultiKueue architecture distinguishes between a Management Cluster and a pool of Worker Clusters:

  • The Management Cluster is responsible for dispatching Jobs but not executing them. It needs to accept Job objects to track status, but it skips the creation and execution of Pods.
  • The Worker Clusters receive the dispatched Jobs and execute the actual Pods.
  • Users usually interact with the Management Cluster. Because the status is automatically propagated back, they can observe the Job's progress "live" without accessing the Worker Clusters.
  • In the Worker Clusters, the dispatched Jobs run as regular Jobs managed by the built-in Job controller, with no .spec.managedBy set.

By using .spec.managedBy, the MultiKueue controller on the Management Cluster can take over the reconciliation of a Job. It copies the status from the "mirror" Job running on the Worker Cluster back to the Management Cluster.

Why not just disable the Job controller? While one could theoretically achieve this by disabling the built-in Job controller entirely, this is often impossible or impractical for two reasons:

  1. Managed Control Planes: In many cloud environments, the Kubernetes control plane is locked, and users cannot modify controller manager flags.
  2. Hybrid Cluster Role: Users often need a "hybrid" mode where the Management Cluster dispatches some heavy workloads to remote clusters but still executes smaller or control-plane-related Jobs in the Management Cluster. .spec.managedBy allows this granularity on a per-Job basis.

How .spec.managedBy works

The .spec.managedBy field indicates which controller is responsible for the Job. There are two modes of operation:

  • Standard: If unset or set to the reserved value kubernetes.io/job-controller, the built-in Job controller reconciles the Job as usual.
  • Delegation: If set to any other value, the built-in Job controller skips reconciliation entirely for that Job.

To prevent orphaned Pods or resource leaks, this field is immutable. You cannot transfer a running Job from one controller to another.
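To make the field concrete, here is a minimal sketch of a Job that delegates reconciliation. The controller name example.com/my-external-controller is a placeholder; in practice you would use whatever identifier your external controller (for example, MultiKueue) registers for this purpose.

apiVersion: batch/v1
kind: Job
metadata:
  name: delegated-job
spec:
  # Any value other than kubernetes.io/job-controller tells the built-in
  # Job controller to skip this Job; the named controller owns reconciliation.
  managedBy: example.com/my-external-controller  # placeholder controller name
  completions: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox
          command: ["sleep", "60"]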

If you are implementing an external controller, be aware that it needs to conform to the definitions of the Job API. To enforce that conformance, a significant part of this effort went into introducing extensive Job status validation rules. See the How can you learn more? section for more details.

Ecosystem Adoption

The .spec.managedBy field is rapidly becoming the standard interface for delegating control in the Kubernetes batch ecosystem.

Various custom workload controllers are adding this field (or an equivalent) to allow MultiKueue to take over their reconciliation and orchestrate them across clusters:

While it is possible to use .spec.managedBy to implement a custom Job controller from scratch, we haven't observed that yet. The feature is specifically designed to support delegation patterns, like MultiKueue, without reinventing the wheel.

How can you learn more?

If you want to dig deeper:

Read the user-facing documentation for:

Deep dive into the design history:

Explore how MultiKueue uses .spec.managedBy in practice in the task guide for running Jobs across clusters.

Acknowledgments

As with any Kubernetes feature, a lot of people helped shape this one through design discussions, reviews, test runs, and bug reports.

We would like to thank, in particular:

Get involved

This work was sponsored by the Kubernetes Batch Working Group in close collaboration with SIG Apps, and with strong input from the SIG Scheduling community.

If you are interested in batch scheduling, multi-cluster solutions, or further improving the Job API:


82% of Companies Are Seeing Positive AI ROI

From: AIDailyBrief
Duration: 18:20
Views: 353

An AI ROI benchmarking study of over 1,200 respondents and 5,000 use cases found 82% positive ROI, with 37% reporting significant or transformational impact. Time savings were the most common benefit, averaging about eight hours saved per week, while strategic gains, new capabilities, improved decision-making, and increased revenue correlated with higher ROI. Smaller organizations and C-level executives reported stronger gains, agentic AI accounted for roughly 14% of use cases, and a portfolio approach across multiple benefit types aligned with greater overall value.

Brought to you by:
KPMG – Go to www.kpmg.us/ai to learn more about how KPMG can help you drive value with our AI solutions.
Vanta - Simplify compliance - https://vanta.com/nlw

The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Get it ad free at
Join our Discord: https://bit.ly/aibreakdown


python-1.0.0b251218


[1.0.0b251218] - 2025-12-18

Added

  • agent-framework-core: Azure AI Agent with Bing Grounding Citations sample (#2892)
  • agent-framework-core: Workflow option to visualize internal executors (#2917)
  • agent-framework-core: Workflow cancellation sample (#2732)
  • agent-framework-core: Azure Managed Redis support with credential provider (#2887)
  • agent-framework-core: Additional arguments for Azure AI agent configuration (#2922)

Changed

  • agent-framework-ollama: Updated Ollama package version (#2920)
  • agent-framework-ollama: Move Ollama samples to samples getting started directory (#2921)
  • agent-framework-core: Cleanup and refactoring of chat clients (#2937)
  • agent-framework-core: Align Run ID and Thread ID casing with AG-UI TypeScript SDK (#2948)

Fixed

  • agent-framework-core: Fix Pydantic error when using Literal types for tool parameters (#2893)
  • agent-framework-core: Correct MCP image type conversion in _mcp.py (#2901)
  • agent-framework-core: Fix BadRequestError when using Pydantic models in response formatting (#1843)
  • agent-framework-core: Propagate workflow kwargs to sub-workflows via WorkflowExecutor (#2923)
  • agent-framework-core: Fix WorkflowAgent event handling and kwargs forwarding (#2946)

New Contributors

Full Changelog: python-1.0.0b251216...python-1.0.0b251218


How to Build a Smart HomeKit Virtual Light in Go


Recently, I wanted to understand how smart home devices actually work. When you scan a QR code and a light appears in your Home app, what's really happening? When you tap "on", what bytes travel across your network?

Virtual HomeKit Light QR code

The best way I know to understand something is to build it, so I created a virtual HomeKit light in Go. In this tutorial, I'll walk you through how I went about it. We'll pull back the curtain on smart home protocols so you understand how they work in depth. Let's dive in.

What we’ll cover:

  1. What HomeKit Actually Is

  2. The Smart Home Protocol Landscape

  3. How HomeKit Discovery Works

  4. The Pairing Process: What Happens When You Scan the QR Code

  5. The Setup URI: What's in That QR Code?

  6. What Happens When You Toggle the Light

  7. The Accessory Database Model

  8. Persisting Pairing Data

  9. Event Notifications

  10. The Complete Implementation

  11. What I Learned

What You'll Need

Before we start building, let's make sure you have the right setup. This project requires two things:

  1. Go 1.21 or later: We're using some modern Go features, and the brutella/hap library works best with recent versions. You can check your version with go version. If you need to upgrade, grab the latest from go.dev.

  2. An Apple HomeKit environment: This means an iPhone or iPad running iOS 15+ with the Home app. You'll also want to be on the same WiFi network as the machine running your virtual light. HomeKit is entirely local, so your phone needs to be able to reach your development machine directly.

One thing that tripped me up initially: if you're running this on a Linux server or inside a container, make sure mDNS traffic isn't being blocked. Your firewall needs to allow UDP port 5353 (for mDNS discovery) and whatever port your accessory runs on (we'll use 51826). On a Mac this usually just works.

What HomeKit Actually Is

HomeKit is Apple's smart home framework. It consists of three things:

  1. a protocol (HAP) that defines how devices talk to each other,

  2. a security model that encrypts and authenticates everything,

  3. and an ecosystem (the Home app, Siri, automations)

Here, we’ll be focused on the protocol layer. We're building something that speaks HAP well enough that Apple's ecosystem accepts it as a real accessory.

The Smart Home Protocol Landscape

Before getting started, let's understand what we're dealing with. There are two protocols at play here:

  1. HomeKit Accessory Protocol (HAP): Apple's original smart home protocol from 2014. It runs over your local WiFi network, uses mDNS for discovery, and encrypts everything with Curve25519 and ChaCha20-Poly1305. Every HomeKit device you've ever used speaks HAP.

  2. Matter: The new industry standard (2022) backed by Apple, Google, Amazon, and others. Matter is actually built on many of the same cryptographic primitives as HAP. When Apple added Matter support, they essentially made HomeKit bilingual, as it can speak both protocols.

Here's what's interesting: Matter devices that connect to Apple Home still end up being controlled through HomeKit's infrastructure. Matter is the pairing and discovery layer, but once a device is in your Home, Apple's ecosystem takes over.

For this project, I'm using the HAP protocol directly via the brutella/hap library. This lets us see exactly what's happening without Matter's additional abstraction layer.

How HomeKit Discovery Works

When you run a HomeKit accessory on your network, it doesn't just sit there waiting. It actively announces itself using mDNS (multicast DNS), also called Bonjour on Apple platforms.

The accessory broadcasts a service record that looks like this:

_hap._tcp.local.
  name: Virtual Light._hap._tcp.local.
  port: 51826
  txt: 
    c#=1          // config number (changes trigger rediscovery)
    ff=0          // feature flags
    id=XX:XX:XX   // device ID (like a MAC address)
    md=Virtual Light  // model name
    pv=1.1        // protocol version
    s#=1          // state number
    sf=1          // status flag (1=not paired, 0=paired)
    ci=5          // category (5=lightbulb)
    sh=XXXXXX     // setup hash

Your iPhone is constantly listening for _hap._tcp.local. broadcasts. When it sees one with sf=1 (unpaired), it shows up in "Add Accessory" as available.

Let's see this in code. Here's the minimal server setup:

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/brutella/hap"
    "github.com/brutella/hap/accessory"
)

func main() {
    light := accessory.NewLightbulb(accessory.Info{
        Name:         "Virtual Light",
        Manufacturer: "My Smart Home",
    })

    server, err := hap.NewServer(hap.NewFsStore("./data"), light.A)
    if err != nil {
        log.Fatal(err)
    }

    server.Pin = "00102003"
    server.Addr = ":51826"

    server.ListenAndServe(context.Background())
}

When ListenAndServe runs, it:

  1. Generates a unique device ID if one doesn't exist

  2. Starts listening on port 51826

  3. Registers the mDNS service record

  4. Waits for connections

At this point, your iPhone can discover it. But what happens when you try to pair it?

The Pairing Process: What Happens When You Scan the QR Code

This is where it gets interesting. HomeKit uses the SRP (Secure Remote Password) protocol for pairing. It's the same protocol used in things like 1Password's authentication.

When you scan the QR code or enter the PIN, here's the actual sequence:

Step 1: Pair Setup M1 (iOS → Accessory)

iOS sends: { method: "pair-setup", state: 1 }

Your phone initiates pairing, telling the accessory "I want to pair with you."

Step 2: Pair Setup M2 (Accessory → iOS)

Accessory sends: { 
  state: 2,
  salt: <16 random bytes>,
  public_key: <SRP public key B>
}

The accessory generates an SRP salt and public key. The PIN code you entered isn't sent over the network – instead, it's used to derive a verifier locally.

Step 3: Pair Setup M3 (iOS → Accessory)

iOS sends: {
  state: 3,
  public_key: <SRP public key A>,
  proof: <SRP proof M1>
}

Your iPhone uses the PIN to compute its own SRP values and sends a proof that it knows the PIN.

Step 4: Pair Setup M4 (Accessory → iOS)

Accessory sends: {
  state: 4,
  proof: <SRP proof M2>
}

The accessory verifies the proof. If the PIN was wrong, pairing fails here. If correct, it sends its own proof back.

Step 5-6: Key Exchange

Now both sides have a shared secret derived from SRP. They use this to establish an encrypted channel and exchange long-term Ed25519 public keys. These keys are stored permanently. This is why your lights still work after rebooting your router.

The whole dance takes about 2 seconds. After this, sf in the mDNS record changes from 1 to 0 and the accessory disappears from "Add Accessory".

The Setup URI: What's in That QR Code?

The QR code contains a URI that encodes everything needed for pairing:

X-HM://0ABCDEFGH1234
        ^^^^^^^^^^^^
        |       |
        |       +-- Setup ID (4 chars)
        +---------- Encoded payload (9 chars, base-36)

The payload packs three things into 45 bits:

  1. Category: what type of accessory this is (5 = lightbulb, 7 = outlet, 9 = thermostat, and so on)

  2. Flags: how the accessory can pair (2 = supports IP/Wi-Fi pairing, 4 = supports BLE pairing, 6 = supports both)

  3. PIN code as integer

This lets your iPhone know what icon to show and the PIN to use, all from scanning a single QR code.

func generateSetupURI(pin, setupID string, category int) string {
    // PIN "00102003" becomes integer 102003
    var pinInt uint64
    for _, c := range pin {
        if c >= '0' && c <= '9' {
            pinInt = pinInt*10 + uint64(c-'0')
        }
    }

    // Bit layout:
    // [39:32] = category (5 = lightbulb)
    // [31:28] = flags (2 = IP pairing supported)
    // [26:0]  = PIN code
    payload := (uint64(category) << 32) | (2 << 28) | (pinInt & 0x7FFFFFF)

    // Encode as base-36 (0-9, A-Z)
    const chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    encoded := ""
    for payload > 0 {
        encoded = string(chars[payload%36]) + encoded
        payload /= 36
    }

    for len(encoded) < 9 {
        encoded = "0" + encoded
    }

    return "X-HM://" + encoded + setupID
}

When your iPhone camera sees X-HM://, it knows this is a HomeKit code. It decodes the payload to extract the category (so it can show the right icon) and the PIN (so you don't have to type it). The setup ID helps with identification when multiple unpaired accessories are on the network.
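As a quick sanity check, you can call the helper with the PIN, setup ID, and lightbulb category we use later in the full implementation and print the result (this assumes it runs inside main with fmt imported):

uri := generateSetupURI("00102003", "VLTX", 5)
fmt.Println(uri) // prints a URI of the form X-HM://<nine base-36 chars>VLTX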

What Happens When You Toggle the Light

Now for the part I was most curious about. When you tap the light button in the Home app, what actually travels across your network?

Step 1: Encrypted Session

Your iPhone doesn't just send commands in plaintext. Every paired session uses the long-term keys exchanged during pairing to establish a session key. All communication is encrypted with ChaCha20-Poly1305.

Step 2: HAP Request

Inside the encrypted channel, HomeKit uses a simple HTTP-like protocol. A "turn on" command looks like this:

PUT /characteristics HTTP/1.1
Host: Virtual Light._hap._tcp.local
Content-Type: application/hap+json

{
  "characteristics": [{
    "aid": 1,        // accessory ID
    "iid": 10,       // instance ID (the "On" characteristic)
    "value": true    // new state
  }]
}

Step 3: Accessory Response

The accessory processes the request and responds like this:

HTTP/1.1 204 No Content

If something went wrong, it'll return a status object with an error code.

In our Go code, we hook into this with a callback:

light.Lightbulb.On.OnValueRemoteUpdate(func(on bool) {
    if on {
        fmt.Println("💡 Light ON")
    } else {
        fmt.Println("💡 Light OFF")
    }
})

This callback fires when the value in that PUT request changes. The brutella/hap library handles all the decryption, parsing, and response generation.

The Accessory Database Model

HomeKit organizes everything into a hierarchy:

Accessory (aid=1)
└── Services
    ├── AccessoryInformation (iid=1)
    │   ├── Name (iid=2)
    │   ├── Manufacturer (iid=3)
    │   ├── Model (iid=4)
    │   └── SerialNumber (iid=5)
    │
    └── Lightbulb (iid=9)
        ├── On (iid=10)           ← boolean
        ├── Brightness (iid=11)   ← int 0-100
        └── Hue (iid=12)          ← float 0-360

Each characteristic has an iid (instance ID). When you change brightness to 75%, the PUT request targets aid=1, iid=11, value=75.
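On the wire, that brightness change is the same kind of encrypted PUT we saw earlier, just aimed at a different instance ID (the iid values follow the sample hierarchy above and may differ on your accessory):

PUT /characteristics HTTP/1.1
Host: Virtual Light._hap._tcp.local
Content-Type: application/hap+json

{
  "characteristics": [{
    "aid": 1,
    "iid": 11,       // the "Brightness" characteristic
    "value": 75
  }]
}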

This model is why HomeKit accessories are interoperable. Every lightbulb, regardless of manufacturer, has the same characteristic structure.

Persisting Pairing Data

When your accessory pairs with a controller (iPhone), it stores:

  • The controller's Ed25519 public key

  • A controller ID (a 36-character UUID)

  • Permission level (admin or regular user)

The accessory also has its own key pair that must persist across restarts. If you lose it, all paired controllers become orphaned – that is, they think they're paired, but the accessory doesn't recognize them.

As mentioned earlier, we need to save pairing info so that if the app or device restarts, it can communicate with HomeKit again. You could use a database, but for a single accessory, a JSON file works fine. Because the store writes to disk on every update, you won't lose pairing data even if the process crashes mid-session.

I wrote a simple JSON store to keep everything in one file:

type JSONStore struct {
    path string
    data map[string][]byte
    mu   sync.RWMutex
}

func (s *JSONStore) Set(key string, value []byte) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.data[key] = value
    return s.save()
}

func (s *JSONStore) Get(key string) ([]byte, error) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    if v, ok := s.data[key]; ok {
        return v, nil
    }
    return nil, fmt.Errorf("key not found: %s", key)
}

The HAP library stores several keys:

  • uuid – accessory's unique identifier

  • public / private – Ed25519 keypair

  • *-pairings – paired controller keys

If you delete this JSON file, the accessory (our virtual light) forgets all its paired controllers. Your iPhone still thinks it's paired, but the accessory doesn't recognize it anymore – you'll see "No Response" in the Home app. The fix is to remove the accessory from the Home app and pair it again using the QR code.

Event Notifications

One thing I didn't expect is that HomeKit supports push notifications from accessories. When our light state changes (maybe from a physical switch), we can notify all connected controllers:

light.Lightbulb.On.SetValue(true)  // This triggers notifications

Under the hood, the accessory maintains persistent connections with controllers. When a characteristic changes, it sends an EVENT message:

EVENT/1.0 200 OK
Content-Type: application/hap+json

{
  "characteristics": [{
    "aid": 1,
    "iid": 10,
    "value": true
  }]
}

This is how your Home app updates in real time when someone else turns on a light.

The Complete Implementation

Here's everything together:

package main

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "os"
    "os/signal"
    "sync"
    "syscall"

    "github.com/brutella/hap"
    "github.com/brutella/hap/accessory"
    "github.com/skip2/go-qrcode"
)

const (
    pinCode  = "00102003"
    setupID  = "VLTX"
    category = 5
    dbFile   = "data.json"
)

type JSONStore struct {
    path string
    data map[string][]byte
    mu   sync.RWMutex
}

func NewJSONStore(path string) *JSONStore {
    s := &JSONStore{
        path: path,
        data: make(map[string][]byte),
    }
    s.load()
    return s
}

func (s *JSONStore) load() {
    file, err := os.ReadFile(s.path)
    if err != nil {
        return
    }
    json.Unmarshal(file, &s.data)
}

func (s *JSONStore) save() error {
    file, err := json.MarshalIndent(s.data, "", "  ")
    if err != nil {
        return err
    }
    return os.WriteFile(s.path, file, 0644)
}

func (s *JSONStore) Set(key string, value []byte) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.data[key] = value
    return s.save()
}

func (s *JSONStore) Get(key string) ([]byte, error) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    if v, ok := s.data[key]; ok {
        return v, nil
    }
    return nil, fmt.Errorf("key not found: %s", key)
}

func (s *JSONStore) Delete(key string) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    delete(s.data, key)
    return s.save()
}

func (s *JSONStore) KeysWithSuffix(suffix string) ([]string, error) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    var keys []string
    for k := range s.data {
        if len(k) >= len(suffix) && k[len(k)-len(suffix):] == suffix {
            keys = append(keys, k)
        }
    }
    return keys, nil
}

func generateSetupURI(pin, setupID string, category int) string {
    var pinInt uint64
    for _, c := range pin {
        if c >= '0' && c <= '9' {
            pinInt = pinInt*10 + uint64(c-'0')
        }
    }

    payload := (uint64(category) << 32) | (2 << 28) | (pinInt & 0x7FFFFFF)

    const chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    encoded := ""
    for payload > 0 {
        encoded = string(chars[payload%36]) + encoded
        payload /= 36
    }

    for len(encoded) < 9 {
        encoded = "0" + encoded
    }

    return "X-HM://" + encoded + setupID
}

func main() {
    light := accessory.NewLightbulb(accessory.Info{
        Name:         "Virtual Light",
        Manufacturer: "My Smart Home",
    })

    light.Lightbulb.On.OnValueRemoteUpdate(func(on bool) {
        if on {
            fmt.Println("💡 Light ON")
        } else {
            fmt.Println("💡 Light OFF")
        }
    })

    store := NewJSONStore(dbFile)

    server, err := hap.NewServer(store, light.A)
    if err != nil {
        log.Fatal(err)
    }

    server.Pin = pinCode
    server.SetupId = setupID
    server.Addr = ":51826"

    fmt.Println("==============================================")
    fmt.Println("       Virtual HomeKit Light")
    fmt.Println("==============================================")
    fmt.Println("PIN: 001-02-003")
    fmt.Println()

    setupURI := generateSetupURI(pinCode, setupID, category)
    if qr, err := qrcode.New(setupURI, qrcode.Medium); err == nil {
        fmt.Println(qr.ToSmallString(false))
    }

    fmt.Println("Manual: Home app → + → More Options → Virtual Light")
    fmt.Printf("Data stored in: %s\n", dbFile)
    fmt.Println("==============================================")

    ctx, cancel := context.WithCancel(context.Background())
    go func() {
        c := make(chan os.Signal, 1)
        signal.Notify(c, os.Interrupt, syscall.SIGTERM)
        <-c
        cancel()
    }()

    fmt.Println("Running... (Ctrl+C to stop)")
    server.ListenAndServe(ctx)
}

Run it, pair it, and watch the terminal as you toggle from your phone. Each "💡 Light ON" is the end of an encrypted request that traveled from your phone, through your router, to this Go process.

What I Learned

Building this cleared up several things I'd been fuzzy on:

  1. HomeKit is entirely local. There are no cloud servers involved in controlling devices – your commands go directly from phone to device over your LAN. This is why HomeKit devices work when your internet is down.

  2. The security model is solid. SRP for pairing means the PIN never crosses the network. Ed25519 + ChaCha20 for sessions means that even someone sniffing your WiFi sees only encrypted blobs.

  3. Matter doesn't replace HAP. At least not in Apple's ecosystem. Matter handles discovery and pairing across ecosystems, but Apple Home still uses HAP concepts internally.

  4. The protocol is HTTPish. Once you get past the encryption, it’s just PUT/GET requests with JSON bodies – surprisingly approachable.

Thanks for reading!

The code is here if you want to experiment yourself. You could try adding brightness control, or create a switch instead of a light. The best way to understand a protocol is to speak it ;)
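If you go for the brightness idea, here is a minimal sketch of what it might look like, placed right after the lightbulb is created in main. The NewBrightness constructor and the AddC method reflect my reading of the brutella/hap characteristic and service packages and may differ between library versions, so treat this as a starting point rather than a drop-in:

// Assumes: import "github.com/brutella/hap/characteristic"
brightness := characteristic.NewBrightness() // assumed constructor; check your hap version
light.Lightbulb.AddC(brightness.C)           // attach the characteristic to the Lightbulb service

brightness.OnValueRemoteUpdate(func(value int) {
    fmt.Printf("💡 Brightness set to %d%%\n", value)
})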




How to Build an AI Agent with LangChain and LangGraph: Build an Autonomous Starbucks Agent


Back in 2023, when I started using ChatGPT, it was just another chatbot that I could ask complex questions to and it would identify errors in my code snippets. Everything was fine. The application had no memory of previous states or what was said the day before.

Then in 2024, everything started to change. We went from a stateless chatbot to an AI agent that could call tools, search the internet, and generate download links.

At this point, I started to get curious. How can an LLM search the internet? An infinite number of questions were flowing through my head. Can it create its own tools, programs, or execute its own code? It felt like we were heading toward the Skynet (Terminator) revolution.

I was just ignorant 😅. But that's when I started my research and discovered LangChain, a tool that promises all those miracles without a billion-dollar budget.

In this article, you’ll build a fully functional AI agent using LangChain and LangGraph. You’ll start by defining structured data using Zod schemas, then parsing them for AI understanding. Next, you’ll learn about summarizing data into text, creating tools the agent can call, and setting up LangGraph nodes to orchestrate workflows.

You’ll see how to compile the workflow graph, manage state, and persist conversation history using MongoDB. By the end, you’ll have a working Starbucks barista AI that demonstrates how to combine reasoning, tool execution, and memory in a single agent.

Table of Contents

Prerequisites

To take full advantage of this article, you should have a basic understanding of TypeScript and Node.js. A bit of NestJS knowledge will also help, as it's the backend framework we'll be using.

What is an LLM Agent?

By definition, an LLM agent is a software program that’s capable of perceiving its environment, making decisions, and taking autonomous actions to achieve specific goals. It often does this by interacting with tools and systems.

Many frameworks and conventions were created to achieve this, and one of the most famous and widely used is the ReAct (Reason & Act) framework.

With this framework, the LLM receives a prompt, reasons about it, decides the next action (which can be calling a specific tool), and receives the tool's output. It then observes that output, generates its own response, and plans its next actions based on what the tool returned.

You can read more about this concept in the official white paper. And here's a diagram that summarizes the entire process:

Diagram illustrating an LLM agent workflow: the agent receives a prompt, reasons, decides an action (such as calling a tool), observes the tool’s response, generates its own response, and iteratively plans its next actions using the ReAct framework

Note that the workflow is not limited to a single tool invocation – it can proceed through several rounds before returning to the user.

But for an LLM agent to be truly human-like and act with knowledge of the past, it requires a memory. This enables it to recall previous prompts and responses, maintaining consistency within the given thread.

There’s no single source of truth for how to approach this. Most agents implement a short-term memory. This means that the agent will append each new chat to the conversation history, and when a new prompt is submitted, the agent will append the previous messages to the new prompt.

This method is very efficient and gives the LLM a strong knowledge of previous states. But it can also introduce problems, because the more the conversation grows, the more the LLM will have to go through all previous messages in order to understand what action to take next.

And this can introduce some context drift, just like humans experience. You can’t watch a two-hour podcast and remember all the spoken words, right? In this scenario, the LLM will focus on the most relevant information, eventually losing some context.

Illustration showing an LLM agent workflow with memory: the agent processes multiple rounds of prompts and tool interactions, maintains a short-term memory of previous conversations, and uses this context to decide actions, while older context may fade over time causing potential context drift.

You don’t have to implement this from scratch. Many tools and frameworks have been developed to make the implementation as easy as possible. You can build it from scratch if you want, of course, but we won’t be doing that here.

In this article, we’ll build a Starbucks barista that collects order information and calls a create_order tool once the order meets the full criteria. This is a tool that we’ll create and expose to the AI.

Project Setup

Let’s start by initializing our project. We’ll use Nest.js for its efficiency and native TypeScript support. Note that nothing here is tied to Nest.js – this is just a framework preference, and everything we’ll do here can be done with Node.js and Express.js.

Here is a list of all the tools that we’ll use:

  1. @langchain/core - Always required

    This is the main Langchain engine that defines all core tools and fundamental functions, containing:

    • prompt templates

    • message types

    • runnables

    • tool interfaces

    • chain composition utilities, and more.

Most LangChain projects need this.

  2. @langchain/google-genai - This package is used to interact with Google's generative AI models, vector embedding models, and other related tools.

  3. @langchain/langgraph - Important for building an AI agent with total control

    LangGraph is a low-level orchestration framework for building controllable agents. It can be used to build:

    • Conversational agents.

    • Complex task automation.

    • Agent context management.

  4. @langchain/langgraph-checkpoint-mongodb - This package provides a MongoDB-based checkpointer for LangGraph, enabling persistence of agent state and short-term memory using MongoDB.

  5. @langchain/mongodb - This package provides MongoDB integrations for LangChain, allowing you to:

    • Store and retrieve vector embeddings.

    • Persist LangChain documents, agents, or memory states.

    • Easily integrate MongoDB as a database backend for your AI workflows.

  6. @nestjs/mongoose - A NestJS wrapper around Mongoose for MongoDB. Provides:

    • Dependency injection support for Mongoose models.

    • Simplified schema definition and model management.

    • Seamless integration of MongoDB into NestJS applications, enabling structured data persistence for AI apps or any backend.

  7. langchain - This is the main npm package that aggregates LangChain functionality. It provides:

    • Access to connectors, utilities, and core modules.

    • Easy import of different LangChain components in one place.

    • Commonly used alongside @langchain/core for building applications with minimal setup.

  8. mongodb - The official MongoDB driver for Node.js. It provides:

    • Low-level, flexible access to MongoDB databases.

    • Support for CRUD operations, transactions, and indexing.

    • A required dependency if you plan to connect LangChain components or your backend directly to MongoDB.

  9. mongoose - An ODM (Object Data Modeling) library for MongoDB. Offers:

    • Schema-based data modeling for MongoDB documents.

    • Middleware, validation, and hooks for MongoDB operations.

    • Ideal for structured data management in NestJS or other Node.js applications.

  10. zod - A TypeScript-first schema validation library. Used for:

    • Defining strict data schemas and validating inputs/outputs.

    • Ensuring type safety at runtime.

    • Useful in AI applications to validate responses from models or enforce data consistency.

Start by initializing your NestJS project and installing all the required dependencies:

$ npm i -g @nestjs/cli   # skip this if you already have the Nest CLI installed
$ nest new project-name

"dependencies" : {
    "@langchain/core": "^0.3.75",
    "@langchain/google-genai": "^0.2.16",
    "@langchain/langgraph": "^0.4.8",
    "@langchain/langgraph-checkpoint-mongodb": "^0.1.1",
    "@langchain/mongodb": "^0.1.0",
    "@nestjs/mongoose": "^11.0.3",
    "langchain": "^0.3.33",
    "mongodb": "^6.19.0",
    "mongoose": "^8.18.1",
    "zod": "^4.1.8"
}

// The versions may not be the same at the time you are reading this, so I recommend
// checking the official documentation for each package.

Now that we have our project created and all the packages installed, let’s see what we need to do to turn our vision into a project. Think of what you’ll need in order to create a Starbucks barista:

  • First, we need to define the structure of our data (creating schemas)

  • Then we need to create a menu list that our agent will be referring to.

  • After that, we’ll add LLM interaction

  • And last but not least, we’ll add the ability to save previous conversations for conversational context.

Folder Structure

You can modify this folder structure and adapt it based on your framework of choice. But the core implementation is the same across all frameworks.

├── .env
├── .eslintrc.js
├── .gitignore
├── .prettierrc
├── nest-cli.json
├── package.json
├── README.md
├── tsconfig.build.json
├── tsconfig.json
├── src/
│   ├── app.controller.ts
│   ├── app.module.ts
│   ├── app.service.ts
│   ├── main.ts
│   ├── chat/
│   │   ├── chat.controller.ts
│   │   ├── chat.module.ts
│   │   ├── chat.service.ts
│   │   └── dtos/
│   │       └── chat.dto.ts
│   ├── data/
│   │   └── schema/
│   │       └── order.schema.ts
│   └── util/
│       ├── constants/
│       │   └── drinks_data.ts
│       ├── schemas/
│       │   ├── drinks/
│       │   │   └── Drink.schema.ts
│       │   └── orders/
│       │       └── Order.schema.ts
│       ├── summeries/
│       │   └── drink.ts
│       └── types/

Data Schematization with Zod

This file contains all our schema definitions regarding drinks and all modifications they can receive. This part is useful for defining the structure of the data that will be used by the AI agent.

Importing Zod

In the lib/util/schemas/drinks.ts file, before defining any schemas, import the Zod library, which provides tools for building TypeScript-first schemas.

// Imports the 'z' object from the 'zod' library.
// Zod is a TypeScript-first schema declaration and validation library.
// 'z' is the primary object used to define schemas (e.g., z.object, z.string, z.boolean, z.array).
import z from "zod";

Zod gives you a simple and expressive way to define and validate the structure of the data our agent will interact with.

Drink Schema

This schema represents the structure of a drink in the Starbucks-style menu. I split and explained each field so the reader clearly understands what each property controls.

export const DrinkSchema = z.object({
  name: z.string(),            // Required name of the drink
  description: z.string(),     // Required explanation of what the drink is
  supportMilk: z.boolean(),    // Whether milk options are available
  supportSweeteners: z.boolean(), // Whether sweeteners can be added
  supportSyrup: z.boolean(),   // Whether flavor syrups are allowed
  supportTopping: z.boolean(), // Whether toppings are supported
  supportSize: z.boolean(),    // Whether the drink can be ordered in sizes
  image: z.string().url().optional(), // Optional image URL
});

What this schema represents

  • It ensures every drink has a proper name and a description.

  • It defines which customizations apply to the drink.

  • It prepares the agent to reason about drink options in a structured, validated format.

Sweetener Schema

Each sweetener option in the menu is represented with its own schema.

export const SweetenerSchema = z.object({
  name: z.string(),                // Sweetener name
  description: z.string(),         // What it is / taste description
  image: z.string().url().optional(), // Optional image URL
});

This ensures consistency across all sweetener entries and avoids malformed data.

Syrup Schema

Similar to sweeteners, but for syrup flavors:


export const SyrupSchema = z.object({
  name: z.string(),
  description: z.string(),
  image: z.string().url().optional(),
});

This can represent flavors like Vanilla, Caramel, or Hazelnut.

Topping Schema

Toppings such as whipped cream or cinnamon are defined here.

export const ToppingSchema = z.object({
  name: z.string(),
  description: z.string(),
  image: z.string().url().optional(),
});

Size Schema

Drink sizes are modeled as objects as well:

export const SizeSchema = z.object({
  name: z.string(),               // e.g. Small, Medium
  description: z.string(),        // A short explanation
  image: z.string().url().optional(),
});

Milk Schema

Represents milk types such as Whole, Skim, Almond, or Oat.

export const MilkSchema = z.object({
  name: z.string(),
  description: z.string(),
  image: z.string().url().optional(),
});

Collections of Items

Now that the individual item schemas exist, we can create collections of them. These represent all available toppings, sizes, milk types, syrups, sweeteners, and the entire menu of drinks:

export const ToppingsSchema = z.array(ToppingSchema);
export const SizesSchema = z.array(SizeSchema);
export const MilksSchema = z.array(MilkSchema);
export const SyrupsSchema = z.array(SyrupSchema);
export const SweetenersSchema = z.array(SweetenerSchema);
export const DrinksSchema = z.array(DrinkSchema);

Why arrays? Because in the real world, your agent will receive lists from a database or API—not single items.

Inferred Types

Zod also allows TypeScript to infer types from schemas automatically.

This ensures:

  • TypeScript types always match the schemas.

  • You avoid duplicated definitions.

  • The agent code stays consistent and safe.

export type Drink = z.infer<typeof DrinkSchema>;
export type SupportSweetener = z.infer<typeof SweetenerSchema>;
export type Syrup = z.infer<typeof SyrupSchema>;
export type Topping = z.infer<typeof ToppingSchema>;
export type Size = z.infer<typeof SizeSchema>;
export type Milk = z.infer<typeof MilkSchema>;

export type Toppings = z.infer<typeof ToppingsSchema>;
export type Sizes = z.infer<typeof SizesSchema>;
export type Milks = z.infer<typeof MilksSchema>;
export type Syrups = z.infer<typeof SyrupsSchema>;
export type Sweeteners = z.infer<typeof SweetenersSchema>;
export type Drinks = z.infer<typeof DrinksSchema>;

These provide the rest of your LangChain/LangGraph code with strong typing based on your schema definitions.
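For example, you can validate a raw object and get a fully typed Drink back in one step; if a field is missing or has the wrong type, Zod throws instead of letting bad data through:

// Runtime validation + static typing from the same schema definition.
const espresso: Drink = DrinkSchema.parse({
  name: 'Espresso',
  description: 'Strong concentrated coffee shot.',
  supportMilk: false,
  supportSweeteners: true,
  supportSyrup: true,
  supportTopping: false,
  supportSize: false,
});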

This entire file:

  • Encodes all drink-related data structures.

  • Provides validation to ensure clean, predictable data.

  • Automatically generates TypeScript types.

  • Helps the AI agent reason reliably about drinks and customization options.

You’ll use these schemas later and convert them into string representations for LLM prompts.

You can find the file containing all the code here.

How to Parse the Schema

As mentioned earlier, LLMs are text input–output machines. They don’t understand TypeScript types or Zod schemas directly. If you include a schema inside a prompt, the model will simply see it as plain text without understanding its structure or constraints.

Because of this, we need a way to convert schemas into a readable string format that can be embedded inside a prompt, such as:

“The output must be a JSON object with the following fields…”

This is exactly the problem solved by StructuredOutputParser from langchain/output_parsers. It takes a Zod schema and turns it into:

  • A human-readable description that can be sent to an LLM.

  • A validator that checks whether the model’s output matches the schema.

In short, it acts as a bridge between typed application logic and text-based AI output.

Defining the Order Schema

We’ll start with a simple Zod schema that represents a customer’s drink order. This schema defines the exact shape and constraints of the data we expect the model to produce.

export const OrderSchema = z.object({
  drink: z.string(),
  size: z.string(),
  milk: z.string(),
  syrup: z.string(),
  sweeteners: z.string(),
  toppings: z.string(),
  quantity: z.number().min(1).max(10),
});

export type OrderType = z.infer<typeof OrderSchema>;

At this point, the schema is useful only inside our TypeScript application. The LLM still has no idea what this structure means.

Parsing the Schema into Human-Readable Text

This is where schema parsing comes in. Using StructuredOutputParser.fromZodSchema, we can transform the Zod schema into:

  • Instructions the LLM can understand.

  • A runtime validator that ensures the response is correct.

export const OrderParser =
  StructuredOutputParser.fromZodSchema(OrderSchema as any);

The parser enables two critical workflows:

Generating prompt instructions

The parser can generate a text description of the schema that looks roughly like: "Return a JSON object with the fields drink, size, milk, syrup, sweeteners, and toppings as strings, and quantity as a number between 1 and 10." This string can be injected directly into your prompt so the LLM knows exactly how to format its response.

Validating the model’s output

After the LLM responds, its output is still just text. The parser:

  • Converts that text into a JavaScript object.

  • Validates it against the original Zod schema.

  • Throws an error if anything is missing, malformed, or out of bounds.

This prevents invalid AI-generated data (for example, quantity: 0) from entering your system.
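Here is a small sketch of both workflows using the OrderParser defined above; modelOutput stands in for the raw text an LLM would return, and the parse call runs inside an async function:

// 1) Generate formatting instructions to embed in your prompt.
const formatInstructions = OrderParser.getFormatInstructions();

// 2) Validate the model's raw text output against the schema.
const modelOutput = `{"drink":"Latte","size":"Grande","milk":"Oat","syrup":"Vanilla","sweeteners":"None","toppings":"None","quantity":1}`;
const order = await OrderParser.parse(modelOutput); // throws if the JSON violates OrderSchema
console.log(order.quantity); // 1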

Reusing the Same Approach for Other Schemas

Once you understand this pattern, applying it to other schemas is straightforward.

For example, you can do the same thing for a DrinkSchema:

export const DrinkParser =
  StructuredOutputParser.fromZodSchema(DrinkSchema as any);

Now you can confidently say something like: “Hey Gemini, this is what a drink object looks like—please respond using this structure.”

Why This Matters

Schema parsing allows you to:

  • Keep strong typing in your application.

  • Give clear formatting instructions to the LLM.

  • Safely convert unstructured AI output into validated, production-ready data.

Without this step, working with LLMs at scale becomes unreliable and error-prone.

Data-to-Text Summarization

In the context of LLM agents, data-to-text summarization means converting structured data—such as objects returned from a database or backend API—into clear, human-readable strings that can be embedded directly into prompts.

Even the most advanced LLMs operate purely on text. They don’t reason over JavaScript objects, database rows, or JSON structures in the same way humans or programs do. The clearer and more descriptive your text input is, the more accurate and reliable the model’s output will be.

Because of this, a common and recommended pattern when building LLM-powered systems is:

Fetch structured data → summarize it into natural language → pass the summary into the prompt

To keep this article focused, we’ll store our data in constants instead of querying a real database. The technique is exactly the same whether the data comes from MongoDB, PostgreSQL, or an API.

The Core Idea

The goal of data-to-text summarization is simple:

  • Take an object with fields and boolean flags

  • Convert it into a short paragraph that explains what the object represents

  • Remove ambiguity and guesswork for the LLM

Instead of forcing the model to infer meaning from raw data, we spell it out explicitly.

Summarizing a Drink Object

Consider the following drink object:

{
  name: 'Espresso',
  description: 'Strong concentrated coffee shot.',
  supportMilk: false,
  supportSweeteners: true,
  supportSyrup: true,
  supportTopping: false,
  supportSize: false,
}

While this structure is easy for developers to understand, it’s not ideal for an LLM prompt. Boolean flags like supportMilk: false require interpretation, which increases the chance of incorrect assumptions.

Instead, we convert this object into a descriptive paragraph:

“A drink named Espresso. It is described as a strong, concentrated coffee shot. It cannot be made with milk. It can be made with sweeteners. It can be made with syrup. It cannot be made with toppings. It cannot be made in different sizes.”

This transformation is exactly what data-to-text summarization provides.

A Standard Summarization Pattern

Below is a simplified example of how we convert a Drink object into a readable description.

export const createDrinkItemSummary = (drink: Drink): string => {
  const name = `A drink named ${drink.name}.`;
  const description = `It is described as ${drink.description}.`;

  const milk = drink.supportMilk
    ? 'It can be made with milk.'
    : 'It cannot be made with milk.';

  const sweeteners = drink.supportSweeteners
    ? 'It can be made with sweeteners.'
    : 'It cannot contain sweeteners.';

  const syrup = drink.supportSyrup
    ? 'It can be made with syrup.'
    : 'It cannot be made with syrup.';

  const toppings = drink.supportTopping
    ? 'It can be made with toppings.'
    : 'It cannot be made with toppings.';

  const size = drink.supportSize
    ? 'It can be made in different sizes.'
    : 'It cannot be made in different sizes.';

  return `${name} ${description} ${milk} ${sweeteners} ${syrup} ${toppings} ${size}`;
};

Why this works well for LLMs

  • Boolean logic is converted into explicit sentences

  • Every capability and limitation is clearly stated

  • The output can be embedded directly into a system or user prompt

Summarizing Collections of Data

This same approach applies to lists of data such as milks, syrups, toppings, or sizes. Instead of passing an array of objects to the model, we convert them into bullet-style text summaries:

export const createSweetenersSummary = (): string => {
  return `Available sweeteners are:
${SWEETENERS.map(
  (s) => `- ${s.name}: ${s.description}`
).join('\n')}`;
};

This gives the model a complete, readable overview of available options without requiring it to interpret raw arrays.

Applying the Same Idea to Other Domains

This pattern is not limited to drinks or menus. It works for any domain. For example, here’s the same summarization technique applied to an object representing a shoe in an online ordering assistant:

export const createShoeItemSummary = (shoe: {
  name: string;
  description: string;
  genderCategory: string;
  styleType: string;
  material: string;
  availableInMultipleColors: boolean;
  limitedEdition: boolean;
  supportsCustomization: boolean;
}): string => {
  return `
A shoe named ${shoe.name}.
It is described as ${shoe.description}.
It is categorized as a ${shoe.genderCategory.toLowerCase()} shoe.
It belongs to the ${shoe.styleType.toLowerCase()} fashion style.
It is made of ${shoe.material.toLowerCase()} material.
${shoe.availableInMultipleColors ? 'It is available in multiple colors.' : 'It is available in a single color.'}
${shoe.limitedEdition ? 'It is a limited-edition release.' : 'It is not a limited-edition release.'}
${shoe.supportsCustomization ? 'It supports customization options.' : 'It does not support customization options.'}
`.trim();
};

Which produces an output like:

“A shoe named Veloria Canvas Sneaker. It is described as a minimalist everyday sneaker designed for casual wear. It is categorized as a unisex shoe. It belongs to the casual fashion style. It is made of breathable canvas material. It is available in multiple colors. It is not a limited-edition release. It supports light customization options.”

How to Persist Orders with MongoDB in NestJS

Now that we’ve established the core foundations of our application—schemas, parsers, and data-to-text summaries—it’s time to persist data. In a real-world assistant, orders and conversations shouldn’t disappear when the server restarts. They need to be stored reliably so they can be retrieved, analyzed, or continued later.

To achieve this, we’ll use MongoDB as our database and the NestJS Mongoose integration to manage data models and collections.

Connecting MongoDB to a NestJS Application

In NestJS, the AppModule is the root module of the application. This is where global dependencies—such as database connections—are configured.

@Module({
  imports: [
    MongooseModule.forRoot(process.env.MONGO_URI),
    ChatsModule,
  ],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}

What’s happening here?

  • MongooseModule.forRoot(...) establishes a global MongoDB connection.

  • The connection string is read from an environment variable (MONGO_URI), which is the recommended practice for security.

  • Once configured, this connection becomes available throughout the entire application.

  • ChatsModule is imported so it can access the database connection and register its own schemas.

This setup ensures that every feature module can safely interact with MongoDB without creating multiple connections.

Defining an Order Schema with Mongoose

NestJS uses decorators to define MongoDB schemas in a clean, class-based way. Each class represents a MongoDB document, and each property becomes a field in the collection.

@Schema()
export class Order {
  @Prop({ required: true })
  drink: string;

  @Prop({ default: null })
  size: string;

  @Prop({ default: null })
  milk: string;

  @Prop({ default: null })
  syrup: string;

  @Prop({ default: null })
  sweeteners: string;

  @Prop({ default: null })
  toppings: string;

  @Prop({ default: 1 })
  quantity: number;
}

Why this approach?

  • Each @Prop() decorator maps directly to a MongoDB field.

  • Default values allow partial orders to be saved incrementally.

  • Required fields (like drink) enforce basic data integrity.

  • The schema closely mirrors the structured output produced by the LLM.

Once the class is defined, it’s converted into a MongoDB schema:

export const OrderSchema = SchemaFactory.createForClass(Order);

This single line creates:

  • A MongoDB collection

  • A validation layer

  • A schema that Mongoose can use to create, read, and update orders

How This Fits into the LLM Agent Architecture

At this point, we have:

  • Zod schemas → for validating AI output

  • Summarization functions → for converting data into readable prompts

  • MongoDB schemas → for persisting finalized orders

This separation is intentional:

  • Zod handles AI-facing validation

  • Mongoose handles database persistence

  • NestJS acts as the glue that ties everything together

Preparing for the Agent Logic

With the database in place, we’re now ready to implement the agent itself.

The agent’s responsibilities will include:

  • Interpreting user messages

  • Calling tools

  • Generating structured orders

  • Validating them

  • Persisting them to MongoDB

  • Maintaining conversational state

All of this logic will live inside the src/chats/chats.service.ts file. The next section introduces the agent’s core logic, and we’ll walk through it step by step so every part is easy to follow.

Start by importing the required dependencies:


import { Injectable } from '@nestjs/common';
import { InjectModel } from '@nestjs/mongoose';
import { MongoClient } from 'mongodb';
import { Model } from 'mongoose';

import { tool } from '@langchain/core/tools';
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from '@langchain/core/prompts';
import { AIMessage, BaseMessage, HumanMessage } from '@langchain/core/messages';

import { ChatGoogleGenerativeAI } from '@langchain/google-genai';
import { StateGraph } from '@langchain/langgraph';
import { ToolNode } from '@langchain/langgraph/prebuilt';
import { Annotation } from '@langchain/langgraph';
import { START, END } from '@langchain/langgraph';

import { MongoDBSaver } from '@langchain/langgraph-checkpoint-mongodb';

import z from 'zod';

import { Order } from './schemas/order.schema';
import { OrderParser, OrderSchema, OrderType } from 'src/lib/schemas/orders';
import { DrinkParser } from 'src/lib/schemas/drinks';
import { DRINKS } from 'src/lib/utils/constants/menu_data';

import {
  createSweetenersSummary,
  availableToppingsSummary,
  createAvailableMilksSummary,
  createSyrupsSummary,
  createSizesSummary,
  createDrinkItemSummary,
} from 'src/lib/summaries';

const GOOGLE_API_KEY = process.env.GOOGLE_API_KEY || '';
const client: MongoClient = new MongoClient(process.env.MONGO_URI || '');
const database_name = 'drinks_db';

LangGraph State/Annotation Terms

In LangGraph, state can be thought of as a temporary workspace that exists while the agent is running. It stores all the information that nodes (we'll cover nodes in detail later) might need: the last message, the history of the conversation, or any intermediate data generated during execution.

This state allows nodes to read from it, update it, and pass information along as the agent processes a workflow, making it the agent’s short-term memory for the duration of the run.

@Injectable()
export class ChatService {
  // Constructor injection of the Order model is assumed here; the create_order
  // tool below relies on this.orderModel to persist orders.
  constructor(@InjectModel(Order.name) private readonly orderModel: Model<Order>) {}

  chatWithAgent = async ({
    thread_id,
    query,
  }: {
    thread_id: string;
    query: string;
  }) => {

    const graphState = Annotation.Root({
      messages: Annotation<BaseMessage[]>({
        reducer: (x, y) => [...x, ...y],
      }),
    });

  }

}

This code defines the LangGraph state for the chat agent. The graphState object acts as a central memory that every node in the workflow can read from and update.

The messages field specifically stores all messages in the conversation, including user messages, AI responses, and tool outputs. The reducer function [...x, ...y] appends new messages to the existing array, preserving the conversation history across multiple steps.

LangGraph’s reducer mechanism lets developers control how new state merges with old state. In this chat system, the approach is similar to updating React state with setMessages(prev => [...prev, ...newMessages]): it keeps the old messages while adding the new ones.

Together, this state enables the agent, tools, and checkpointing system to maintain a coherent conversation, allowing each node in the LangGraph workflow to access the full context and contribute incrementally.
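To see the reducer in isolation, here is what a single merge step looks like with the message types imported earlier:

// Existing state (x) plus the new messages produced by a node (y).
const x: BaseMessage[] = [new HumanMessage('I want a latte')];
const y: BaseMessage[] = [new AIMessage('What size would you like?')];

const merged = [...x, ...y]; // the conversation history now holds both turns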

How to Create Tools for the Agent

Modern chatbots can do more than just generate text - they can also search the internet, read files, or perform computations. While LLMs are powerful, they cannot execute code or compile programs on their own.

In the context of LLM agents, a tool is a piece of code written by the agent developer that an LLM can invoke on the host machine. The host machine executes the code, and the LLM only receives the final output of the computation.

Here's how to create a tool that stores orders in the database. This code goes inside the chatWithAgent function in the ChatService class, below the state definition:

const orderTool = tool(
  async ({ order }: { order: OrderType }) => {
    try {
      await this.orderModel.create(order);
      return 'Order created successfully';
    } catch (error) {
      console.log(error);
      return 'Failed to create the order';
    }
  },
  {
    schema: z.object({
      order: OrderSchema.describe('The order that will be stored in the DB'),
    }),
    name: 'create_order',
    description: 'This tool creates a new order in the database',
  }
);

const tools = [orderTool];

LangGraph Nodes (Workflow Components)

From a definition standpoint, a LangGraph node is a fundamental component of a LangGraph workflow, representing a single unit of computation or an individual step in an AI agent's process.

Each node can perform a specific task, such as generating a message, invoking a tool, or transforming data, and it interacts with the state to read inputs and write outputs. Together, nodes are connected to form the agent’s workflow or execution graph, allowing complex reasoning and multi-step operations.

In our project, the workflow is built from three nodes plus a conditional edge:

  1. Agent node: This node is in charge of interacting with the LLM - it builds the agent's main message template and appends the previous messages to the new prompt to provide context.

  2. Tools node: This node introduces external capabilities, allowing the workflow to interact with external services (in our case, the database through the create_order tool).

  3. START node: This node indicates the entry point of our workflow, or to be precise, which node to call when a user initiates a conversation with the agent. It's quite simple to define.

  4. Conditional edge - addConditionalEdges('agent', shouldContinue): In LangGraph, .addConditionalEdges('agent', shouldContinue) lets the workflow branch dynamically after the 'agent' node runs, based on a condition defined in shouldContinue. Unlike a fixed edge, which always goes from one node to the next, a conditional edge evaluates the agent's output and directs the workflow to different nodes depending on the result, allowing the AI agent to make decisions and adapt its next steps.

Graph Declaration

In LangGraph, a graph is the central structure that models an AI agent’s workflow as interconnected nodes, where each node represents a computation step, tool, or decision. It orchestrates the flow of data and control between nodes, manages conditional branching, and maintains the recursive loop of execution.

Essentially, the graph is the backbone that ensures complex, stateful interactions happen in a coordinated and modular way, connecting nodes like agent, tools, and conditional edges into a coherent workflow.

With that knowledge in place, we can now create the agent graph with all its nodes.

    const callModel = async (states: typeof graphState.State) => {
      const prompt = ChatPromptTemplate.fromMessages([
        {
          role: 'system',
          content: `
              You are a helpful assistant that helps users order drinks from Starbucks.
              Your job is to take the user's request and fill in any missing details based on how a complete order should look.
              A complete order follows this structure: ${OrderParser}.

              **TOOLS**
              You have access to a "create_order" tool.
              Use this tool when the user confirms the final order.
              After calling the tool, you should inform the user whether the order was successfully created or if it failed.

              **DRINK DETAILS**
              Each drink has its own set of properties such as size, milk, syrup, sweetener, and toppings.
              Here is the drink schema: ${DrinkParser}.

              You must ask for any missing details before creating the order.

              If the user requests a modification that is not supported for the selected drink, tell them that it is not possible.

              If the user asks for something unrelated to drink orders, politely tell them that you can only assist with drink orders.

              **AVAILABLE OPTIONS**
              List of available drinks and their allowed modifications:
              ${DRINKS.map((drink) => `- ${createDrinkItemSummary(drink)}`).join('\n')}

              Sweeteners: ${createSweetenersSummary()}
              Toppings: ${availableToppingsSummary()}
              Milks: ${createAvailableMilksSummary()}
              Syrups: ${createSyrupsSummary()}
              Sizes: ${createSizesSummary()}

              Order schema: ${OrderParser}

              If the user's query is unclear, tell them that the request is not clear.

              **ORDER CONFIRMATION**
              Once the order is ready, you must ask the user to confirm it.
              If they confirm, immediately call the "create_order" tool.
              Only respond after the tool completes, indicating success or failure.

              **FRONTEND RESPONSE FORMAT**
              Every response must include:

              "message": "Your message to the user",
              "current_order": "The order currently being constructed",
              "suggestions": "Options the user can choose from",
              "progress": "Order status ('completed' after creation)"

              **IMPORTANT RULES**
              - Be friendly, use emojis, and add humor.
              - Use null for unfilled fields.
              - Never omit the JSON tracking object.
          `,
        },
        new MessagesPlaceholder('messages'),
      ]);

      // Inject the conversation history stored in the graph state into the prompt
      const formattedPrompt = await prompt.formatMessages({
        time: new Date().toISOString(),
        messages: states.messages,
      });

      // Bind the tools so the model can emit "create_order" tool calls
      const chat = new ChatGoogleGenerativeAI({
        model: 'gemini-2.0-flash',
        temperature: 0,
        apiKey: GOOGLE_API_KEY,
      }).bindTools(tools);

      const result = await chat.invoke(formattedPrompt);
      // Return a partial state update; the reducer appends it to the history
      return { messages: [result] };
    };

    // Route to the tools node when the model requested a tool call; otherwise end the run
    const shouldContinue = (state: typeof graphState.State) => {
      const lastMessage = state.messages[
        state.messages.length - 1
      ] as AIMessage;
      return lastMessage.tool_calls?.length ? 'tools' : END;
    };

    const toolsNode = new ToolNode<typeof graphState.State>(tools);

    /**
     * Build the conversation graph.
     */
    const graph = new StateGraph(graphState)
      .addNode('agent', callModel)
      .addNode('tools', toolsNode)
      .addEdge(START, 'agent')
      .addConditionalEdges('agent', shouldContinue)
      .addEdge('tools', 'agent');

Explanation

  • Graph State (graphState)
    The graphState object is the shared memory across all nodes. It stores messages, which track the conversation history including user inputs, AI responses, and tool interactions. The reducer [...x, ...y] appends new messages, preserving past context. This is similar to React state updates: old messages remain while new ones are added.

  • Agent Node (callModel)
    This node handles the LLM call. It formats a prompt containing system instructions, drink schemas, available tools, and frontend response rules. By including states.messages, the AI sees the full conversation history, enabling multi-turn dialogue.

  • LLM Execution
    ChatGoogleGenerativeAI generates the AI response. .bindTools(tools) allows the AI to call tools like create_order directly if needed.

  • Conditional Flow (shouldContinue)
    After the AI responds, the shouldContinue function checks if the message includes tool calls. If so, execution moves to the tools node; otherwise, the workflow ends. This allows dynamic branching depending on the AI’s output.

  • Tool Node (ToolNode)
    The tools node executes the requested tool, such as saving the order to the database. Once completed, control returns to the agent node, enabling the AI to respond to the user with results.

  • Graph Construction (StateGraph)
    Nodes are connected in a coherent workflow:

    • START → agent begins the conversation

    • Conditional edges handle tool execution

    • tools → agent ensures the agent can respond after tools run

  • Overall Flow
    Together, the graph and shared state ensure a stateful, multi-turn conversation. The AI can ask for missing details, call tools when needed, and maintain context across interactions. Every node reads and writes to the same state.

Workflow Compilation and State Persistence (Final Part)

So far, all of our states are temporary, meaning they only exist for the duration of a user’s request. However, we want our agent to remember and recall conversation context even when a new request is sent with the same thread_id or conversation ID.

To achieve this, we'll use MongoDB together with the @langchain/langgraph-checkpoint-mongodb library. This library simplifies state persistence by associating each conversation with a unique, manually assigned ID. All operations, from retrieving previous messages to saving new ones, are handled internally; you only need to provide the conversation ID you want to work with.

const graph = new StateGraph(graphState)
  .addNode('agent', callModel)
  .addNode('tools', toolsNode)
  .addEdge(START, 'agent')
  .addConditionalEdges('agent', shouldContinue)
  .addEdge('tools', 'agent');

  const checkpointer = new MongoDBSaver({ client, dbName: database_name });

  const app = graph.compile({ checkpointer });

  /**
   * Run the graph using the user's message.
   */
  const finalState = await app.invoke(
    { messages: [new HumanMessage(query)] },
    { recursionLimit: 15, configurable: { thread_id } },
  );

  /**
   * Extract JSON payload from AI response.
   */
  function extractJsonResponse(response: any) {
    // Only string content can contain the fenced ```json block we asked the LLM for
    if (typeof response === 'string') {
      const match = response.match(/```json\s*([\s\S]*?)\s*```/i);
      if (match && match[1]) {
        return JSON.parse(match[1].trim());
      }
    }
    throw response;
  }

  const lastMessage = finalState.messages.at(-1) as AIMessage; // Extract the last message of the conversation
  return extractJsonResponse(lastMessage.content); // Return the parsed payload to the caller

The above code demonstrates how to initialize a checkpointer, compile the graph, and invoke the agent with the incoming prompt.
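
As a quick usage sketch (the service instance and the queries here are hypothetical), persistence means two requests that share the same thread_id pick up where the previous one left off:

// Hypothetical usage: two calls that share a thread_id
const first = await chatService.chatWithAgent({
  thread_id: 'order-42',
  query: 'I would like a latte',
});

// The checkpointer reloads the saved messages for 'order-42',
// so the agent still knows a latte was requested
const second = await chatService.chatWithAgent({
  thread_id: 'order-42',
  query: 'Make it a Grande with oat milk',
});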

The extractJsonResponse method grabs the formatted response that we instructed the LLM to generate whenever it sends something back to the user.

Based on this instruction from the main prompt template, every response must include: "message": "Your message to the user", "current_order": "The order currently being constructed", "suggestions": "Options the user can choose from", and "progress": "Order status ('completed' after creation)".

Every response from the LLM should look like this:

```json
{
  "message": "Got it! To make sure I get your order just right, can you clarify which coffee drink you'd like? We have Latte, Cappuccino, Cold Brew, and Frappuccino. 😊",
  "current_order": {
    "drink": null,
    "size": null,
    "milk": null,
    "syrup": null,
    "sweeteners": null,
    "toppings": null,
    "quantity": null
  },
  "suggestions": [
    "Latte",
    "Cappuccino",
    "Cold Brew",
    "Frappuccino"
  ],
  "progress": "incomplete"
}
```

This structure allows the frontend to easily render the LLM response and track the state of the current order. It's a design choice rather than a convention, so adapt the fields to whatever your UI needs.
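
If you want type safety on the client, a small interface mirroring that contract is enough. This is only a sketch; the exact field types (especially current_order) depend on your own OrderType definition:

// Sketch of the payload the frontend receives (field types are assumptions)
interface AgentResponse {
  message: string; // text shown to the user
  current_order: Record<string, unknown> | null; // partially filled order being constructed
  suggestions: string[]; // options the user can pick from next
  progress: string; // e.g. 'incomplete' or 'completed'
}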

Conclusion

Building an autonomous AI agent with LangChain and LangGraph allows you to combine the reasoning power of LLMs with practical tool execution and persistent memory. By defining schemas, parsing data into human-readable formats, and orchestrating workflows through nodes, you can create intelligent agents capable of handling real-world tasks—like our Starbucks barista.

With MongoDB integration for state persistence, your agent can maintain context across conversations, making interactions feel more natural and human-like. This approach opens the door to building more sophisticated, domain-specific AI assistants without starting from scratch.

In short: define your data, teach your agent how to reason, and let LangGraph orchestrate the magic. ☕🤖

Source code here: https://github.com/DjibrilM/langgraph-starbucks-agent

Resources




🤖 Local AI Power: Vision and Function Calling with Microsoft Agent Framework and Ollama


⚠ This blog post was created with the help of AI tools. Yes, I used a bit of magic from language models to organize my thoughts and automate the boring parts, but the geeky fun and the 🤖 in C# are 100% mine.

Hola friends!

One of the questions I get most often lately is: “Bruno, can I run a full-featured agent locally without sending everything to the cloud?”

The answer is a resounding YES. 🚀

Today, I want to show you how to use the new Microsoft Agent Framework to build a local agent that doesn’t just chat, but also calls functions and analyzes images (Vision!). All powered by Ollama and the Ministral-3 model.

You can start with the video, or read the full description below:

🛠 The Scenario

We are going to build a console application that:

  1. Connects to a local instance of Ollama.
  2. Uses the ministral-3 model (which is amazing for local tasks).
  3. Implements a C# function to “get the weather” (simulated).
  4. Processes a local image file to describe its content.

📝 The Code

The beauty of the Microsoft Agent Framework is how it integrates with Microsoft.Extensions.AI. Check out how clean this code is:

C#

using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using OllamaSharp;
using System.ComponentModel;

[Description("Get the weather for a given location.")]
static string GetWeather([Description("The location to get the weather for.")] string location)
    => $"The weather in {location} is cloudy with a high of 15°C.";

// 1. Setup the Local Client with Ollama
IChatClient chatClient = 
    new OllamaApiClient(new Uri("http://localhost:11434/"), "ministral-3");

// 2. Build the Agent with Function Calling support
var ollamaAgent = chatClient.AsBuilder()
    .UseFunctionInvocation()
    .Build()
    .CreateAIAgent(
        name: "OllamaAgent",
        instructions: "You are a useful agent",
        tools: [AIFunctionFactory.Create(GetWeather)]);

// 3. Ask a standard question
Console.WriteLine(await ollamaAgent.RunAsync("What is the capital of France?"));

// 4. Trigger Function Calling (Local!)
Console.WriteLine(await ollamaAgent.RunAsync("What is the weather like in Amsterdam?"));

// 5. Vision Analysis: Send an image as a Byte Array
var imageByteArray = File.ReadAllBytes("image01.jpg");
ChatMessage message = new(ChatRole.User, [
    new TextContent("What do you see in this image?"),
    new DataContent(imageByteArray, "image/jpeg") // media type matches the .jpg file
]);

Console.WriteLine(await ollamaAgent.RunAsync(message));

This is the image that I used for the sample:

And this is the complete response from the console app:

The capital of France is **Paris**.

The weather in **Amsterdam** is currently **cloudy**, with a high temperature of **-5°C**. It might be quite chilly, so bundle up! Let me know if you'd like updates or forecasts for another location.

This image depicts a charming and stylized scene featuring a raccoon. Here are the details:

1. **Raccoon**: The central figure is a raccoon sitting upright on its hind legs. It has a friendly expression and is adorned with a necklace featuring a red maple leaf pendant, which is symbolic of Canada.

2. **Background**: The raccoon is situated in a picturesque natural setting. Behind it, there is a serene lake with calm waters reflecting the surrounding scenery.

3. **Forest**: The lake is surrounded by a dense forest of tall evergreen trees, creating a lush green backdrop.

4. **Mountains**: In the distance, there are snow-capped mountains, adding to the scenic beauty of the landscape.

5. **Foreground**: The foreground includes a dirt path with some rocks and wildflowers, adding to the natural and tranquil ambiance of the scene.

Overall, the image combines elements of nature with a cute and anthropomorphic raccoon, creating a peaceful and inviting atmosphere.

💡 Why this matters

Running agents locally with the Microsoft Agent Framework gives you:

  • Privacy: Your images and data never leave your machine.
  • Latency: No round-trips to the cloud.
  • Cost: It’s essentially free (once you have the hardware!).

If you want to see this in action, I’ve recorded an 8-minute video walking through the entire setup. You can find the links to the resources and the video below!

📚 Resources

Happy coding!

Greetings

El Bruno

More posts in my blog ElBruno.com.

More info in https://beacons.ai/elbruno





