This entry is part of the 2025 F# Advent Calendar, a month-long celebration across the community. It’s a privilege to contribute alongside so many talented developers who share a passion for this ever-growing language ecosystem.
Why ‘Ask AI’ Exists
Every blog accumulates something beyond articles: an implicit body of knowledge. Years of technical writing create relationships between concepts, build upon shared foundations, and form an interconnected corpus. In the case of this site, adding portfolio descriptions and company information to the blog entries brings the indexed document count to nearly 100.
That’s a substantial corpus covering everything from low-level memory safety to cloud architecture to hardware security. A visitor arriving with a specific question currently must navigate manually, scanning titles and skimming content to find relevant material. Even with filtering and sorting in the blog listing, narrowing down to what’s relevant for a particular visitor’s context is challenging. The friction is real.
An intelligent query system transforms passive content into an active knowledge resource. But the implementation matters. Ask AI doesn’t simply retrieve text and ask a language model to summarize it. It returns a synthesized answer alongside a rank-ordered list of source documents, each with relevance scores that explain why they contributed to the response. This transparency changes the nature of the interaction: the visitor can trust the answer because they can verify it.
The business case is straightforward. For a ‘deep tech’ startup developing complex hardware and software offerings, the blog represents intellectual capital. Making that capital discoverable through natural language questions reduces the barrier between a visitor’s curiosity and the company’s expertise. When someone asks a question, they get an answer grounded in actual content, with links to dive deeper.
This turns the concept of the static “FAQ” on its head. A resource that’s often outdated as soon as it’s deployed is now a living system that updates automatically as the site gains information and additional context.
What Cloudflare “AI Search” Provides
Cloudflare’s AI Search, the new name for the service previously known as AutoRAG, is a managed Retrieval-Augmented Generation pipeline. The service handles the complex orchestration that RAG systems require: document ingestion, intelligent chunking, embedding generation, vector storage, similarity search, and response synthesis. For developers who want RAG capabilities without building infrastructure, it’s a compelling starting point.
The implementation here works as follows: content (markdown documents from the site, covering blog entries, product descriptions, and company information) is synchronized into an R2 bucket. AI Search monitors that bucket, automatically processing new or modified documents. When content changes, the system chunks the text, generates embeddings using Cloudflare’s BGE model, and stores everything in a Vectorize index. At query time, the question is embedded, similar chunks are retrieved, and an LLM synthesizes a response using those chunks as context.
What makes this particularly useful is the metadata flow. Custom attributes attached to R2 objects propagate through the pipeline and appear in query responses. This means we can attach document titles, URLs, publication dates, and other context that helps both the LLM and the end user understand where information comes from.
[Diagram: content ingestion and query pipelines]
The diagram shows two pipelines: content flows from markdown through the sync script into R2, where AI Search indexes it; queries flow from the user through the ask-ai worker, which calls AI Search to retrieve context and generate responses.
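That metadata flow is easy to wire up from the sync side. The sketch below, assuming the same AWS S3 SDK that the sync script later in this post relies on, shows how a title and URL read from a document’s front matter might be attached as custom metadata during upload; uploadWithMetadata and its parameters are illustrative names, not code from the deployed script.

// A hedged sketch: attach title/url metadata so AI Search can surface them
// as attributes in query responses (names here are illustrative)
open System.IO
open Amazon.S3
open Amazon.S3.Model

let uploadWithMetadata (client: AmazonS3Client) (bucket: string) (key: string)
                       (bytes: byte[]) (title: string) (url: string) =
    async {
        use stream = new MemoryStream(bytes)
        let request =
            PutObjectRequest(
                BucketName = bucket,
                Key = key,
                InputStream = stream,
                ContentType = "text/markdown",
                DisablePayloadSigning = true)
        // Custom metadata travels with the R2 object and propagates through
        // the AI Search pipeline into response attributes
        request.Metadata.Add("title", title)
        request.Metadata.Add("url", url)
        let! _ = client.PutObjectAsync(request) |> Async.AwaitTask
        return ()
    }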
CloudflareFS: The Enabling Infrastructure
Here’s where the story becomes interesting. Cloudflare provides extensive TypeScript SDKs and a CLI tool called Wrangler for development and deployment. But for teams working in F#, context-switching to TypeScript for infrastructure feels unnecessary when, frankly, a better option in F# exists.
CloudflareFS is a collection of F# bindings that provide type-safe access to Cloudflare’s runtime and management APIs. The project leverages Fable to transpile F# to JavaScript, so the F# source ultimately runs as JavaScript inside Cloudflare Workers. But the bindings didn’t appear from thin air. They’re the product of a toolchain that transforms Cloudflare’s own specifications into idiomatic F#.
The Binding Generation Pipeline
Two foundational tools make CloudflareFS possible: Hawaii and Glutinum. Both represent years of community investment in F# tooling, and Ask AI wouldn’t exist without them.
Hawaii processes OpenAPI specifications and generates F# HTTP clients. Cloudflare publishes their management APIs as OpenAPI specs, which means Hawaii can automatically generate typed clients for provisioning R2 buckets, D1 databases, Workers deployments, and more. The generated code handles serialization, HTTP mechanics, and error responses, leaving application developers to focus on business logic.
Glutinum takes a different approach, transforming TypeScript definitions into F# interface bindings. Cloudflare’s Worker runtime types, their AI SDK, the D1 database client, and the R2 storage APIs are all published as TypeScript. Glutinum parses those definitions and produces F# interfaces that Fable can emit as the correct JavaScript calls.
The combination is powerful. Hawaii provides the management layer for provisioning and deploying resources from CI/CD pipelines or local development tools. Glutinum provides the runtime layer for code that actually executes inside Workers. Together, they enable a full F# workflow from development through deployment.
// Hawaii-generated management client example
// Provisions a D1 database via Cloudflare's API
let createDatabase (config: CloudflareConfig) (name: string) =
    async {
        let client = D1Client(config.ApiToken, config.AccountId)
        let! result = client.CreateDatabase({ Name = name })
        match result with
        | Ok database ->
            printfn "Created D1 database: %s (ID: %s)" name database.Uuid
            return Ok database.Uuid
        | Error e ->
            return Error $"Failed to create database: {e.Message}"
    }
// Glutinum-generated runtime bindings example
// Used inside a Worker at request time
let queryDatabase (db: D1Database) (question: string) =
    promise {
        let sql = "SELECT * FROM query_log WHERE query_text LIKE ?"
        let stmt = db.prepare(sql).bind($"%%{question}%%")
        let! result = stmt.all<QueryLogEntry>()
        return result.results |> Option.defaultValue (ResizeArray())
    }
Notice the different computation expressions. Management operations use standard F# async workflows since they run in .NET on developer machines or CI/CD systems. Runtime operations use promise, a Fable-provided computation expression that compiles to JavaScript Promises. This isn’t a departure from F# idioms; it’s an adaptation. The promise CE provides familiar let! and return syntax while ensuring the generated JavaScript uses native Promise semantics that Workers expect.
The ask-ai Worker Implementation
With bindings in place, implementing the worker is remarkably clean. The worker exposes two endpoints: a streaming /ask-stream for the interactive UI and a non-streaming /ask for simpler integrations.
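Before looking at each handler, here is a minimal sketch of how those two routes might be dispatched from the worker’s fetch entry point. The handleAskRequest name is a hypothetical stand-in for the non-streaming handler, and the sketch assumes the URL constructor is exposed through Globals alongside the Headers and Response bindings used later.

// A hedged routing sketch; handleAskRequest is hypothetical here
let handleFetch (request: Request) (env: WorkerEnv) (ctx: ExecutionContext) : JS.Promise<Response> =
    // Assumes the Request binding exposes url as a string and that
    // Globals.URL mirrors the Headers/Response bindings shown below
    let path = Globals.URL.Create(request.url).pathname
    match path with
    | "/ask-stream" -> handleAskStreamRequest request env ctx
    | "/ask" -> handleAskRequest request env ctx
    | _ -> promise { return jsonResponse { error = "Not found" } 404 }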
Worker Environment Bindings
Cloudflare Workers receive their configuration through environment bindings, typed references to resources like databases, AI services, and storage buckets. CloudflareFS defines these as F# interfaces:
[<AllowNullLiteral>]
[<Interface>]
type WorkerEnv =
    inherit Env
    abstract member DB: D1Database with get
    abstract member AI: Ai<obj> with get
    abstract member ALLOWED_ORIGIN: string with get
    abstract member AUTORAG_NAME: string with get
The Ai binding provides access to the full Workers AI surface, including the autorag method that returns an AI Search client. The D1Database binding enables analytics logging. Both are injected by Cloudflare at runtime, meaning the worker code never manages credentials or connection strings.
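As a small illustration of that analytics logging, the sketch below inserts a row into the query_log table queried earlier; the column names and the prepare/bind/run chaining follow D1’s client surface but are illustrative here rather than the production schema.

// A hedged sketch of D1 analytics logging via the Glutinum-generated binding
let logQuery (db: D1Database) (queryId: string) (question: string) =
    promise {
        let stmt =
            db.prepare("INSERT INTO query_log (id, query_text) VALUES (?, ?)")
              .bind(queryId, question)
        // run() executes the statement without materializing rows
        let! _ = stmt.run()
        return ()
    }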
Handling Queries with Persona Context
One design decision worth highlighting: Ask AI accepts optional persona and interest parameters that adjust how responses are framed. A business leader asking about the Fidelity framework receives different emphasis than an engineer asking the same question.
/// Build context prefix based on user persona and interests
let buildContextPrefix (persona: string option) (interests: string array) : string =
    let personaText =
        match persona |> Option.defaultValue "engineer" with
        | "business" -> "I am a business leader."
        | "academic" -> "I am an academic."
        | "security" -> "I am a security professional."
        | _ -> "I am an engineer."
    let interestText =
        let interestNames =
            interests
            |> Array.choose (function
                | "fidelity" -> Some "the Fidelity framework"
                | "cloudflarefs" -> Some "CloudflareFS"
                | _ -> None)
        if interestNames.Length > 0 then
            " I am principally interested in " + (String.concat " and " interestNames) + "."
        else
            ""
    personaText + interestText
This context prefix prepends the user’s question before it reaches AI Search, subtly steering the response toward the user’s perspective. The LLM sees “I am an engineer. I am principally interested in CloudflareFS. How do you deploy workers?” rather than just the bare question.
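For a different persona and interest combination, the prefix shifts accordingly:

// Illustrative call; the result follows directly from the function above
buildContextPrefix (Some "business") [| "fidelity" |]
// => "I am a business leader. I am principally interested in the Fidelity framework."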
Streaming Response Architecture
The streaming endpoint deserves attention because it demonstrates a pattern that’s increasingly common in AI applications: Server-Sent Events (SSE) that deliver both structured data and incremental text.
/// Handle streaming /ask-stream POST endpoint
/// Returns SSE with sources first, then streams AI response chunks
let handleAskStreamRequest (request: Request) (env: WorkerEnv) (ctx: ExecutionContext) : JS.Promise<Response> =
    promise {
        let startTime = DateTime.UtcNow
        let queryId = Guid.NewGuid().ToString()
        let! body = request.json<AskRequest>()
        let question = body.question.Trim()
        if String.IsNullOrWhiteSpace(question) then
            return jsonResponse { error = "Question is required" } 400
        else
            // Build context prefix from persona and interests
            let interests = if isNullOrUndefined body.interests then [||] else body.interests
            let contextPrefix = buildContextPrefix body.persona interests
            let fullQuery = if String.IsNullOrEmpty(contextPrefix) then question else $"{contextPrefix} {question}"
            let autorag = env.AI.autorag(env.AUTORAG_NAME)

            // Step 1: Get sources via search (no LLM, fast)
            let searchRequest: AutoRagSearchRequest = !!createObj [
                "query" ==> fullQuery
                "max_num_results" ==> 5
            ]
            let! searchResult = autorag.search(searchRequest)
            let sources = extractSources searchResult.data

            // Step 2: Get streaming AI response
            let streamRequest: AutoRagAiSearchRequestStreaming = !!createObj [
                "query" ==> fullQuery
                "max_num_results" ==> 5
                "stream" ==> true
            ]
            let! streamResponse = autorag.aiSearch(streamRequest)

            // Create a TransformStream to build our SSE response
            let transformStream: obj = emitJsExpr () "new TransformStream()"
            let readable: obj = transformStream?readable
            let writable: obj = transformStream?writable
            let writer: obj = writable?getWriter()
            let encoder: obj = emitJsExpr () "new TextEncoder()"

            // ... stream processing logic ...

            // Return SSE response immediately with the readable stream
            let headers = Globals.Headers.Create()
            headers.set("Content-Type", "text/event-stream")
            headers.set("Cache-Control", "no-cache")
            headers.set("Connection", "keep-alive")
            return Globals.Response.Create(!!readable, !!createObj [
                "status" ==> 200
                "headers" ==> headers
            ])
    }
The key here is the two-phase approach. First, we call search to get sources without invoking the LLM, which is fast. Those sources are sent to the client as an SSE event as soon as the AI response begins streaming, and the worker then forwards answer chunks as they arrive. The user sees sources almost immediately and watches the answer materialize, which creates a responsive experience even though the full result may take several seconds to generate.
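The elided stream-processing step is essentially SSE framing. A minimal sketch of what writing one event might look like, using the same dynamic-access style as the handler above; the event names and payload shapes here are illustrative, not the deployed wire format.

// A hedged sketch of SSE framing for the writer/encoder created above
let writeSseEvent (writer: obj) (encoder: obj) (eventName: string) (json: string) : unit =
    // Named event line, data payload, blank-line terminator
    let frame = $"event: {eventName}\ndata: {json}\n\n"
    writer?write(encoder?encode(frame)) |> ignore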
Source Extraction and URL Generation
AI Search returns metadata attached to retrieved documents, but the format requires transformation for the frontend:
/// Extract sources from AutoRAG search response data
let private extractSources (data: ResizeArray<AutoRagSearchResponse.data>) : SourceReference array =
    data.ToArray()
    |> Array.map (fun item ->
        let attrs = item.attributes
        let titleVal: obj = attrs.["title"]
        let urlVal: obj = attrs.["url"]
        let title =
            if isNullOrUndefined titleVal then filenameToTitle item.filename
            else string titleVal
        // Generate clean URL from filename if url metadata missing
        // Filename format: "section--slug.md" (e.g., "blog--my-post.md")
        let url =
            if isNullOrUndefined urlVal then
                let filename = item.filename.Replace(".md", "")
                if filename.Contains("--") then
                    let parts = filename.Split([|"--"|], StringSplitOptions.None)
                    let section = parts.[0]
                    let slug = parts.[1]
                    $"/{section}/{slug}/"
                else
                    $"/blog/{filename}/"
            else string urlVal
        { title = title; url = url; relevance = item.score }
    )
    |> Array.distinctBy (fun s -> s.url)
    |> Array.sortByDescending (fun s -> s.relevance)
This logic handles cases where metadata might be missing by deriving URLs from filenames. The filename convention, section--slug.md, encodes enough information to reconstruct the URL path. Deduplication ensures the same document doesn’t appear multiple times if multiple chunks matched.
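The filenameToTitle fallback used above isn’t shown in the excerpt. A minimal sketch of what such a helper might do, assuming the same section--slug naming convention:

// A hedged sketch: derive a readable title from a filename like "blog--my-post.md"
open System

let filenameToTitle (filename: string) =
    let slug =
        let name = filename.Replace(".md", "")
        if name.Contains("--") then name.Split([| "--" |], StringSplitOptions.None).[1]
        else name
    slug.Split('-')
    |> Array.filter (fun word -> word.Length > 0)
    |> Array.map (fun word -> string (Char.ToUpper word.[0]) + word.Substring(1))
    |> String.concat " "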
The Content Synchronization Script
Content reaches AI Search through R2, and the sync script manages that pipeline. Rather than using a Worker for this task, we use an F# script (sync-content.fsx) that runs in .NET, leveraging the AWS S3 SDK since R2 is S3-compatible.
#!/usr/bin/env dotnet fsi
// Sync site content to R2 bucket for AutoRAG indexing
#r "nuget: AWSSDK.S3, 3.7.305"

open System
open System.IO
open System.Net.Http
open System.Security.Cryptography
open System.Text
open Amazon.S3
open Amazon.S3.Model

// Content directories to sync (relative to hugo/content)
let contentDirs = [
    "blog", "/blog/"           // blog posts -> /blog/{slug}/
    "company", "/company/"     // company pages -> /company/{slug}/
    "portfolio", "/portfolio/" // portfolio pages -> /portfolio/{slug}/
]

// Helper: normalize filename to R2 key with section prefix
// e.g., ("blog", "my-post.md") -> "blog--my-post.md"
let toR2Key (section: string) (filename: string) =
    let normalizedFilename = filename.ToLowerInvariant().Replace(" ", "-")
    $"{section}--{normalizedFilename}"

// Upload new or modified files
for KeyValue(key, (localPath, section)) in localFiles do
    let localBytes = File.ReadAllBytes(localPath)
    let localMD5 = computeMD5 localBytes
    let needsUpload =
        match r2Objects.TryFind key with
        | Some etag -> etag <> localMD5 // ETag is MD5 for single-part uploads
        | None -> true                  // New file
    if needsUpload then
        use stream = new MemoryStream(localBytes)
        let request = PutObjectRequest(
            BucketName = bucketName,
            Key = key,
            InputStream = stream,
            ContentType = "text/markdown",
            DisablePayloadSigning = true // Required for R2 compatibility
        )
        let response = client.PutObjectAsync(request) |> Async.AwaitTask |> Async.RunSynchronously
        // ...
The script uses MD5 hashes to detect changes, only uploading files that have actually been modified. This makes it efficient to run frequently, whether manually during development or automatically in CI/CD. After uploading changes, the script triggers AI Search’s full scan endpoint:
// Trigger AutoRAG indexing if there were changes
if hasChanges then
    use httpClient = new HttpClient()
    let url = sprintf "https://api.cloudflare.com/client/v4/accounts/%s/autorag/rags/%s/full_scan" acctId ragName
    use request = new HttpRequestMessage(HttpMethod.Patch, url)
    request.Headers.Add("Authorization", sprintf "Bearer %s" token)
    request.Content <- new StringContent("{}", Encoding.UTF8, "application/json")
    let response = httpClient.SendAsync(request) |> Async.AwaitTask |> Async.RunSynchronously
    // ...
CI/CD Pipeline Potential
The architecture obviates the need for Wrangler, Cloudflare’s CLI tool, in favor of direct API calls through CloudflareFS bindings. This design choice enables something powerful: the entire deployment pipeline can be expressed in F# and integrated into .NET build systems.
Consider what a CI/CD pipeline might look like:
// Conceptual deployment orchestration
let deploy (config: CloudflareConfig) =
    async {
        // 1. Provision resources if they don't exist
        let! r2Result = R2.ensureBucket config "speakez-content"
        let! d1Result = D1.ensureDatabase config "speakez-ask-ai"

        // 2. Compile F# to JavaScript via Fable
        let! compileResult = Fable.compile "workers/ask-ai"

        // 3. Deploy worker with bindings
        let! deployResult =
            Workers.deploy config "ask-ai" {
                MainModule = "dist/Main.js"
                Bindings = [
                    AIBinding "AI"
                    D1Binding ("DB", d1Result.Id)
                    PlainText ("AUTORAG_NAME", "speakez-rag")
                ]
            }

        // 4. Sync content to R2
        let! syncResult = ContentSync.sync config "hugo/content"

        return deployResult
    }
This isn’t just a convenience; it’s a paradigm shift. Infrastructure provisioning, application deployment, and content synchronization become testable F# code rather than shell scripts calling CLI tools. Teams can use the same CI/CD patterns they apply to their application code.
A git-aware deployment can take this further. By analyzing diffs, the pipeline can determine the minimal deployment scope: cosmetic changes might only require a static site refresh, while worker code changes trigger full synchronization to the AI Search instance. Not only is the site smarter, but the deployment pipeline is more intelligent and nuanced as well.
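A minimal sketch of that scope detection, assuming changed paths come from something like git diff --name-only and that the directory names below reflect this repository’s layout:

// A hedged sketch of git-aware deployment scoping; paths and names are illustrative
type DeployScope =
    | StaticOnly   // templates, styles: rebuild and publish the static site
    | ContentSync  // markdown content: re-sync R2 and trigger AI Search indexing
    | WorkerDeploy // worker source: recompile with Fable and redeploy

let classifyChange (path: string) =
    if path.StartsWith("hugo/content/") then ContentSync
    elif path.StartsWith("workers/") then WorkerDeploy
    else StaticOnly

let deploymentScope (changedPaths: string list) =
    changedPaths |> List.map classifyChange |> List.distinct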
The Open Source Foundation
CloudflareFS wouldn’t exist without the F# community’s sustained investment in tooling. The project builds directly on three pillars:
Hawaii, designed by Zaid Ajaj, has been quietly transforming how F# developers consume REST APIs for years. Its Swagger/OpenAPI-to-F# generation provides the management layer that makes programmatic infrastructure control possible.
Glutinum, from Maxime Mangel and contributors, solves the complementary problem of TypeScript interop. As JavaScript ecosystems publish TypeScript definitions for everything, Glutinum makes those definitions accessible to F# developers.
Fable, the F# to JavaScript compiler, is the foundation everything else rests on. Alfonso Garcia-Caro and the Fable community have built something remarkable: a way to write F# and run it anywhere JavaScript runs. Without Fable, there would be no F# on Cloudflare Workers.
CloudflareFS is fully available as a free, open source resource. The goal is to expand what’s possible for F# developers targeting modern edge infrastructure. The bindings, the generators, and the samples are all available for the community to use, examine, and improve.
Closing Thoughts
Building Ask AI was an exercise in practical application of tools that others created. The feature works because Hawaii generates management clients, because Glutinum produces runtime bindings, because Fable transpiles F# to JavaScript that runs at the edge. The implementation code, the handlers and types and sync scripts, is the visible layer built atop invisible years of open source effort.
For those considering similar projects, the path is clearer than it’s ever been. F# developers can target Cloudflare’s global edge network without leaving their preferred language. They can provision infrastructure, deploy workers, and manage content through type-safe APIs. The friction that once made polyglot development feel mandatory has been engineered away by people who shared their work freely.
Thank you to everyone who contributed to making this possible. The F# community’s generosity continues to expand what individual developers can accomplish.
And don’t forget to try Ask AI for yourself!
If you’re curious to see a sample of the code, it’s available in the samples directory of the CloudflareFS repo.
