
React Basics: Memoization in React


Don’t let re-rendering hamstring your React app: memoize! Learn when and how to memoize to improve the performance of your React app.

If you’ve ever worked on a React app that started out snappy and slowly turned sluggish as it grew, you’ve probably run into one of React’s classic performance puzzles—re-renders.

You click a button, update some state and suddenly half your component tree decides to re-render itself. The UI still works, but something feels off. Things start to lag. And before long, you’re Googling “why is my React component re-rendering” for the 10th time that week.

That’s where memoization steps in. It’s one of those optimization tools that, once you understand it, gives you control over when your components and calculations actually update, instead of letting React do it blindly.

But if you’ve ever tried to optimize performance in a large React app, you know how easy it is to overdo it. Suddenly, your code is buried under layers of memo hooks, dependency arrays and “why isn’t this memoized” comments. Performance tuning starts to feel like superstition; sometimes it helps, sometimes it hurts, and often it just adds noise.

In this deep dive, we will unpack how memoization really works in React and how the landscape is changing with React 19’s compiler, which is rewriting the rules entirely. By the end, you’ll not only understand memoization, you’ll also know when to stop worrying about it and how React 19 is making manual memoization almost obsolete.

Why Memoization Matters

React’s rendering model is simple but aggressive. Whenever a parent component re-renders, its children will usually re-render too, even if their props haven’t changed. Most of the time, this is fine. But when your components grow complex or involve expensive computations (such as filtering, sorting or formatting data), unnecessary renders start to accumulate.

Let’s say you have a large list of users, and every time you type in a search box, the entire list recalculates and re-renders. Even if only one prop changed, React doesn’t know whether the results are the same, so it plays it safe and re-renders everything.

Memoization helps fix that by letting React remember what didn’t change.

What Is Memoization, Exactly?

Memoization is just a fancy term for caching the result of a function so you don’t recompute it unnecessarily.

Here’s a plain JavaScript example to show what it means:

function slowSquare(n) {
  console.log('Computing...');
  return n * n;
}

slowSquare(4); // "Computing..." → 16
slowSquare(4); // "Computing..." → 16 again

Every call recalculates. Now let’s memoize it:

const cache = {};
function memoizedSquare(n) {
  if (n in cache) return cache[n]; // 'in' check so a cached result of 0 still counts as a hit
  console.log('Computing...');
  cache[n] = n * n;
  return cache[n];
}
memoizedSquare(4); // "Computing..." → 16
memoizedSquare(4); // (no log) → 16 from cache

The second time, it skips the heavy work because the inputs haven’t changed.

That’s all memoization is. It’s a performance optimization that remembers past results based on inputs.

React applies this same concept to rendering and state updates. React’s job is to re-render your UI when state or props change. But sometimes, it’s re-rendering too much. So React gives us three tools to take control of this:

  • React.memo – memoizes components
  • useMemo – memoizes values
  • useCallback – memoizes functions

Each one targets a different kind of unnecessary work. Let’s go through them one by one.

React.memo: Memoizing Components

When a parent component re-renders, all its children re-render by default, even if their props are identical. React.memo changes that.

It wraps a functional component and tells React:

“If the props are the same as last time, skip re-rendering.”

Here’s an example:

const TodoCard = React.memo(function TodoCard({ name }) {
  console.log('Rendering:', name);
  return <div>{name}</div>;
});

If you render multiple TodoCard components inside a parent that re-renders often, you’ll notice only the ones with changed props actually re-render.

function TodoList({ todos }) {
  return (
    <div>
      {todos.map(todo => (
        <TodoCard key={todo.id} name={todo.name} />
      ))}
    </div>
  );
}

As long as each todo’s name prop stays the same, React.memo will skip re-rendering those cards, even when TodoList itself re-renders.

Shallow Comparison Caveat

React.memo uses a shallow comparison for props. That means if you pass a new object or array each time (even with the same contents), React will still think it changed:

<TodoCard todo={{ name: "Write React Article" }} /> // new object every render

In that case, memoization won’t help unless you also memoize the object reference with useMemo or stabilize it in another way.

useMemo: Memoizing Expensive Calculations

Sometimes, the performance hit doesn’t come from re-renders—it comes from recalculations inside the render function. That’s what useMemo is for.

useMemo lets you cache a computed value between renders, only recomputing when its dependencies change.

const cachedValue = useMemo(calculateValue, dependencies)

To cache a calculation between re-renders, wrap it in a useMemo call at the top level of your component:

import { useMemo } from 'react';

function TodoApp() {
  const filteredTodos = useMemo(() => {
    console.log("Filtering todos...");
    return todos.filter(todo =>
      todo.text.toLowerCase().includes(search.toLowerCase())
    );
  }, [todos, search]);
  // ...
}

When using useMemo, you need to pass two things:

  1. A calculation function that takes no arguments, like () =>, and returns the value you want to calculate.
  2. A list of dependencies that includes every value within your component that’s used inside the calculation.

On the initial render, React runs that calculation and stores the result.

On every subsequent render, React compares the current dependencies to the ones from the previous render (using Object.is for comparison).

  • If none of the dependencies have changed, React simply returns the previously cached value—it doesn’t rerun the calculation.
  • If at least one dependency has changed, React re-executes the function, updates the stored result and returns the new value.

In short, useMemo remembers the result of a computation between renders and only recalculates it when one of its dependencies changes. It’s React’s way of saying, “I’ve already done this work. Unless something important changed, let’s not do it again.”

Example: Optimizing a Filtered Todo List with useMemo

Let’s look at a simple example where useMemo actually makes a difference—a Todo list with a search filter.

Without useMemo, every time the component re-renders (even when unrelated state changes), your filter logic will run again. That’s fine for small data, but as your list grows, it can start to slow things down unnecessarily.

Here’s the straightforward version first:

function TodoApp() {
  const [search, setSearch] = useState("");
  const [todos, setTodos] = useState([
    { id: 0, text: "Todo 1", done: false },
    { id: 1, text: "Todo 2", done: true },
    { id: 2, text: "Todo 3", done: false },
  ]);

  const filteredTodos = todos.filter(todo =>
    todo.text.toLowerCase().includes(search.toLowerCase())
  );
  return (
    <div>
      <input
        type="text"
        placeholder="Search todos..."
        value={search}
        onChange={e => setSearch(e.target.value)}
      />
      <ul>
        {filteredTodos.map(todo => (
          <li key={todo.id}>{todo.text}</li>
        ))}
      </ul>
    </div>
  );
}

This works, but notice what happens when you type in the search box or add a new todo: React reruns the entire component, including the todos.filter() call.

Now imagine you have hundreds or thousands of todos, or that the filter operation involves heavier logic. You don’t want to run that unnecessarily on every render.

Here’s where useMemo helps:

function TodoApp() {
  const [search, setSearch] = useState("");
  const [todos, setTodos] = useState([
    { id: 1, text: "Buy groceries", done: false },
    { id: 2, text: "Read a book", done: true },
    { id: 3, text: "Go for a walk", done: false },
  ]);

  const filteredTodos = useMemo(() => {
    console.log("Filtering todos...");
    return todos.filter(todo =>
      todo.text.toLowerCase().includes(search.toLowerCase())
    );
  }, [todos, search]);

  return (
    <div>
      <input
        type="text"
        placeholder="Search todos..."
        value={search}
        onChange={e => setSearch(e.target.value)}
      />

      <ul>
        {filteredTodos.map(todo => (
          <li key={todo.id}>{todo.text}</li>
        ))}
      </ul>
    </div>
  );
}

Now, the filter only runs when either todos or search changes. If you trigger a re-render for any other reason (say, toggling a modal or updating unrelated state higher up in the tree), React will reuse the cached filtered list from the previous render.

useMemo shines in situations like this when you’re doing expensive or repetitive calculations during render that don’t need to run every single time. It’s not about squeezing out microseconds; it’s about keeping your renders predictable and efficient as your app scales.

In this todo example, the difference is subtle, but in a real-world app where filtering, sorting or formatting can get complex, memoization can save a noticeable amount of work.

useCallback: Memoizing Functions for Stability

If useMemo helps React remember values, then useCallback helps React remember functions.

At first, this might sound unnecessary—after all, functions are cheap to create, right? But in React, passing functions down as props can sometimes cause subtle and frustrating re-renders that you don’t expect.

The useCallback() hook is one of React’s built-in tools for optimizing re-renders. Its job is simple but powerful—it lets React remember your function definitions between renders, so they don’t get recreated every single time your component reruns.

In short, it memoizes the function itself.

Syntax:

const memoizedFunction = useCallback(() => {
  // Your logic here
}, [dependency1, dependency2, ...]);

The useCallback() hook takes two arguments:

  1. A function to memoize: This is the function you want React to remember. It can take any arguments and return any value.
  2. A dependency array: This lists every reactive value (state, props or variables) used inside that function. React will only recreate the function if one of those dependencies changes between renders.

Here’s what happens under the hood:

  • On the first render, React creates the function and returns it.
  • On later renders, React checks whether any dependencies have changed. If none have, React returns the same function instance as before; if at least one has changed, it creates a new function and returns that instead.

This simple mechanism keeps the function reference stable across renders, which is exactly what you want when passing callbacks to memoized child components.

Remember: Since useCallback is a hook, it must follow the rules of hooks. Call it only at the top level of a React component or custom hook, never inside loops, conditionals or nested functions.

Practical Example: Preventing Unnecessary Re-renders

Let’s bring this to life with a practical example.

Suppose you’re building a simple todo app that displays a list of tasks, with the ability to mark each one as done or undone.

Here’s the initial setup:

import React, { useState } from "react";

const TodoItem = React.memo(({ todo, onChange }) => {
  console.log(`Rendering ${todo.name} `);
  return (
    <div>
      <span style={{ textDecoration: todo.done ? "line-through" : "none" }}>
        {todo.name}
      </span>
      <button onClick={() => onChange(todo.id)}>
        {todo.done ? "Undone" : "Done"}
      </button>
    </div>
  );
});

const demoTodos = [
  { id: 0, name: "Todo 1", done: false },
  { id: 1, name: "Todo 2", done: false },
  { id: 2, name: "Todo 3", done: false },
  { id: 3, name: "Todo 4", done: false },
];

const TodoList = () => {
  const [todos, setTodos] = useState(demoTodos);
  function toggleTodo(id) {
    setTodos(prevTodos =>
      prevTodos.map(todo =>
        todo.id === id ? { ...todo, done: !todo.done } : todo
      )
    );
  }

  return (
    <div>
      <h2>Today’s Todos</h2>
      <ul>
        {todos.map(todo => (
          <li key={todo.id}>
            <TodoItem todo={todo} onChange={toggleTodo} />
          </li>
        ))}
      </ul>
    </div>
  );
};
export default TodoList;

The TodoItem component is wrapped in React.memo, meaning it shouldn’t re-render unless its props change. But if you try this out, you’ll notice something strange—clicking the button causes every TodoItem to re-render, not just the one you toggled:

Demo 1

Why?

Because each time TodoList re-renders, React creates a brand-new toggleTodo function. From React’s perspective, that means the onChange prop on every child is now different, even if the logic is the same, so React.memo decides to re-render everything.

That’s exactly the kind of subtle inefficiency useCallback fixes.

Fixing it with useCallback

Let’s wrap the toggleTodo function in useCallback to make its reference stable:

import React, { useState, useCallback } from "react";
const TodoItem = React.memo(({ todo, onChange }) => {
  console.log(`Rendering ${todo.name} `);
  return (
    <div>
      <span style={{ textDecoration: todo.done ? "line-through" : "none" }}>
        {todo.name}
      </span>
      <button onClick={() => onChange(todo.id)}>
        {todo.done ? "Undone" : "Done"}
      </button>
    </div>
  );
});
const demoTodos = [
  { id: 0, name: "Todo 1", done: false },
  { id: 1, name: "Todo 2", done: false },
  { id: 2, name: "Todo 3", done: false },
  { id: 3, name: "Todo 4", done: false },
];
const TodoList = () => {
  const [todos, setTodos] = useState(demoTodos);
  const toggleTodo = useCallback((id) => {
    setTodos(prevTodos =>
      prevTodos.map(todo =>
        todo.id === id ? { ...todo, done: !todo.done } : todo
      )
    );
  }, []);
  return (
    <div>
      <h2>Today’s Todos</h2>
      <ul>
        {todos.map(todo => (
          <li key={todo.id}>
            <TodoItem todo={todo} onChange={toggleTodo} />
          </li>
        ))}
      </ul>
    </div>
  );
};
export default TodoList;

Now, the toggleTodo function keeps the same reference between renders because its dependency array ([]) is empty. React creates it once and reuses it as long as the component stays mounted.

Since TodoItem receives a stable onChange prop, React.memo can finally do its job—only the toggled TodoItem re-renders, not the entire list.

In a small Todo app, this might feel like a micro-optimization. But in a real app, think of dozens of components, large lists or complex trees. Stabilizing your callbacks can prevent entire subtrees from re-rendering unnecessarily.

That’s not just about performance; it’s about keeping your app predictable, scalable and smooth.

Here’s the CodeSandbox demo if you want to play around with it. Try adding features like add, delete or edit, and see how useCallback helps control re-renders as your app grows.

When (and When Not) to Memoize

Memoization is powerful, but it’s not free. Every memoized value or component adds a bit of overhead. React needs to compare dependencies, store results and decide whether to reuse or recompute. Overusing it can actually make things slower or harder to read.

Use Memoization When:

  • A component re-renders often with identical props.
  • You’re doing expensive computations (sorting, filtering, formatting).
  • You’re passing callbacks to deeply nested children wrapped in React.memo.

Avoid Memoization When:

  • The component is small and cheap to render.
  • Props change frequently anyway.
  • You’re just memoizing out of habit.

In short: Measure first, optimize second. Don’t sprinkle memoization everywhere “just in case.”

React 19: Rethinking Memoization

Here’s where things get interesting. React 19 introduces several architectural improvements that make manual memoization far less critical than before. Under the hood, React’s new compiler can automatically memoize components and hooks at build time—analyzing your component’s dependencies and optimizing re-renders intelligently.

That means in most cases, you no longer need to reach for React.memo, useMemo or useCallback unless you’re dealing with very specific edge cases.

Here’s what changes:

  • Automatic memoization: The React compiler now detects pure components and automatically memoizes them, skipping re-renders when inputs haven’t changed.
  • Smarter dependency tracking: Instead of relying on manual dependency arrays, React can track dependencies automatically, reducing human error and cleanup code.
  • Simplified mental model: You can focus on writing clean, declarative code, and React handles the optimization behind the scenes.

In practice, this means your React 19 codebase becomes simpler and less noisy. Hooks like useCallback and useMemo are still available for advanced control, but they’re no longer the first thing you reach for—React just does the right thing by default.

Final Thoughts

Memoization in React is all about efficiency: caching results or function references to avoid redundant work.

Before React 19, we relied heavily on React.memo, useMemo and useCallback to manage this manually. But with React 19’s new compiler, much of that optimization happens automatically.

Still, knowing how memoization works gives you the mental model to understand what React is doing behind the scenes and when to step in yourself. Because even in the new world of auto-optimization, the best React developers understand the “why,” not just the “what.”


Building a RAG (Retrieval-Augmented Generation) in ASP.NET Core


RAG is a technique that enhances language models by integrating them with internal knowledge sources. In this post, you’ll understand the concept of RAG and learn how to implement it in an ASP.NET Core application, exploring a practical, real-world scenario.

The use of artificial intelligence has become increasingly common in web applications, so it’s important to consider how to optimize the use of AI via large language models (LLMs) for efficiency and cost reduction.

In this article, we’ll explore how to use the Retrieval-Augmented Generation (RAG) concept in ASP.NET Core projects to create an automated return policy system. We’ll understand how RAG combines information retrieval and text generation to enable LLMs to respond based on real, up-to-date data, reducing the need for retraining.

Understanding the RAG Concept

Retrieval Augmented Generation, or RAG, is a technique (sometimes referred to as an architecture) used to optimize the performance of an AI model. It consists of connecting a model to an internal knowledge base, which can be a text file or even a database, to provide more relevant answers without the need for additional training.

In simple terms, instead of relying only on its training data, RAG allows the model to retrieve up-to-date or domain-specific information to generate a more accurate answer.

How Does RAG Work?

RAG works by combining information retrieval models with generative AI models, and then returning a more accurate, grounded result. RAG systems typically follow a five-stage process:

  1. User Input (User Prompt)
    The user asks a question or sends a command, for example, “What are the company’s security policies?”

  2. Information Retrieval (Retrieval)
    A retrieval model, such as a vector search engine, is triggered to search for relevant data in an external knowledge base, such as corporate documents or databases. This step transforms the prompt into embeddings and searches for the semantically closest documents.

  3. Integration (Integration / Context Assembly)
    The most relevant information found is returned and combined with the original prompt. In this stage, the system assembles an augmented prompt that contains both the user’s question and the relevant snippets retrieved from the internal knowledge base.

  4. Generation (Augmented Generation)
    The language model receives this augmented prompt and generates a contextualized response, taking into account the retrieved data.

  5. Output to the User (Response Delivery)
    The final result is then delivered to the user, usually accompanied by references to the sources or links that support the answer.

These five steps constitute the complete RAG workflow, which goes beyond a simple question-and-answer approach. It combines querying, filtering, context assembly and contextualized generation, enabling more accurate and up-to-date responses. The image below summarizes this workflow:

RAG stages
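
To make these stages concrete before we build the real thing, here is a compressed sketch of how they can map onto C# code. It uses the same OpenAI .NET client calls that appear in the implementation later in this post; the _embeddingClient and _chatClient fields and the GetTopChunks helper are assumed to already exist, so treat this as an outline rather than the final code.

// 1. User input: the caller's question arrives as a plain string.
public async Task<string> AskAsync(string question)
{
    // 2. Retrieval: turn the question into an embedding and find the
    //    most semantically similar chunks in the knowledge base.
    var embedding = await _embeddingClient.GenerateEmbeddingAsync(question);
    var queryVector = embedding.Value.ToFloats().ToArray();
    var relevantChunks = GetTopChunks(queryVector, topN: 3);

    // 3. Integration: assemble an augmented prompt that combines the
    //    retrieved snippets with the original question.
    var context = string.Join("\n\n", relevantChunks);
    var messages = new List<ChatMessage>
    {
        ChatMessage.CreateSystemMessage("Answer using only the provided policy text."),
        ChatMessage.CreateUserMessage($"{context}\n\nQuestion: {question}")
    };

    // 4. Generation: the chat model produces a contextualized answer.
    var response = await _chatClient.CompleteChatAsync(messages);

    // 5. Response delivery: hand the generated text back to the user.
    return response.Value.Content[0].Text;
}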

Creating a RAG in ASP.NET Core

To practice using RAG, we will develop an API in ASP.NET Core that answers questions about a product return policy. The API will retrieve the information from a knowledge base, a text file containing the return policy. This data will be converted into embeddings and stored in an SQLite database.

Then, the API will send the relevant content to the OpenAI API, which will generate a contextualized response and return it in the request response.

You can check the complete source code in this GitHub repository: Return Policy source code.

Prerequisites

To practice the example in this tutorial, you will need to have the following:

  1. An OpenAI API key. If you don’t already have one, you can use this tutorial to create it: Get Started Integrating AI in Your ASP.NET Core Applications.

  2. A project created on the OpenAI website

  3. The following models configured in your API key:

Open AI models

You can use other models of your choice, but other libraries and additional configurations may be necessary.

So, to create the sample application and download the packages you can use the following commands on the terminal:

dotnet new web -o ReturnPolicy
cd ReturnPolicy
dotnet add package Microsoft.Data.Sqlite --version 9.0.10
dotnet add package OpenAI --version 2.5.0

Next, let’s create the single model class used in the application; it represents the question sent in the request. Create a new folder called “Models” and, inside it, add the following record:

namespace ReturnPolicy.Models;

public record QuestionRequest(string Question);

Creating the Return Policy File

Now let’s create a text file that will serve as the knowledge base to be sent to the model to formulate the response. It will contain a common example of a return policy. Create a new folder called “Data” and, inside it, create a file called return_policy.txt and add to it the following text:

Return Policy - Updated July 2025

Our customers may return most new, unopened items within 30 days of delivery for a full refund.
Products that are defective or damaged can be returned or exchanged at any time.

To be eligible for a return:

- The product must be in the same condition as received.
- Proof of purchase is required.
- Returns after 30 days are subject to manager approval.

Please contact our support team before sending any returns.

Creating the Logic for Generating the Response

Now, let’s create the logic to generate, register and retrieve the embeddings, as well as formulate the response generated by the model.

Create a new folder called “Services” and, inside it, add the class below.

Note that we will first create the class and throughout the post we will add methods to it until it is complete. This is intended to explain each method separately for better understanding.

using Microsoft.Data.Sqlite;
using OpenAI;
using OpenAI.Chat;
using OpenAI.Embeddings;

namespace ReturnPolicy.Services
{
    public class PolicyService
    {
        private readonly ChatClient _chatClient;
        private readonly EmbeddingClient _embeddingClient;
        private readonly string _policyPath;
        private readonly string _dbPath;

        public PolicyService(IConfiguration config)
        {
            var apiKey = config["OpenAI:ApiKey"];

            if (string.IsNullOrWhiteSpace(apiKey))
                throw new InvalidOperationException("Missing OpenAI:ApiKey in configuration.");

            var client = new OpenAIClient(apiKey);

            _chatClient = client.GetChatClient("gpt-4o-mini");
            _embeddingClient = client.GetEmbeddingClient("text-embedding-3-small");

            _policyPath = Path.Combine(Directory.GetCurrentDirectory(), "Data", "return_policy.txt");
            _dbPath = Path.Combine(Directory.GetCurrentDirectory(), "Data", "embeddings.db");

            InitializeDatabase();
            LoadPolicyIntoDatabaseAsync().Wait();
        }
    }
}

Here we are using the PolicyService class to integrate the application with the OpenAI API and prepare the necessary data to work with RAG.

At the beginning, four private fields are declared: _chatClient and _embeddingClient are responsible for communicating with the OpenAI API. The first handles the chat model (in this case, gpt-4o-mini), while the second works with the embeddings model.

Meanwhile, _policyPath stores the path to the text file containing the return policy we created earlier, and _dbPath indicates the location where the SQLite database will be created.

The class constructor starts by reading the OpenAI API key from the application’s configuration file. If the key is not present, an exception is thrown indicating that it is required. Then, an OpenAIClient object is created, which serves as an access point to the different OpenAI services.

With this client, two components are initialized: the _chatClient, which will be used to generate intelligent responses, and the _embeddingClient, which will handle the creation of the text embeddings for the policy. After that, we define the paths of the data files, so that both the original text and the embeddings database are stored within the project’s Data folder.

Finally, two important actions are performed: InitializeDatabase() to prepare the SQLite database, creating the necessary tables if they do not already exist, and LoadPolicyIntoDatabaseAsync().Wait(), which reads the content of the return policy file, generates the embeddings and saves them to the database for later use.

Now, let’s create the methods to insert and retrieve the embeddings from the database. In the PolicyService class, add the methods below:

private void InitializeDatabase()
{
    using var conn = new SqliteConnection($"Data Source={_dbPath}");
    conn.Open();

    var cmd = conn.CreateCommand();
    cmd.CommandText = @"CREATE TABLE IF NOT EXISTS PolicyChunks (
                        Id INTEGER PRIMARY KEY AUTOINCREMENT,
                        Text TEXT NOT NULL,
                        Embedding BLOB NOT NULL
                        );";
    cmd.ExecuteNonQuery();
}

private async Task LoadPolicyIntoDatabaseAsync()
{
    var policyText = await File.ReadAllTextAsync(_policyPath);
    var chunks = SplitIntoChunks(policyText, 500);

    using var conn = new SqliteConnection($"Data Source={_dbPath}");
    conn.Open();

    foreach (var chunk in chunks)
    {
        var checkCmd = conn.CreateCommand();

        checkCmd.CommandText = "SELECT COUNT(*) FROM PolicyChunks WHERE Text = $text";

        checkCmd.Parameters.AddWithValue("$text", chunk);

        bool exists = Convert.ToInt32(checkCmd.ExecuteScalar()) > 0;

        if (exists) continue;

        try
        {
            var embeddingResult = await _embeddingClient.GenerateEmbeddingAsync(chunk);
            
            float[] vector = embeddingResult.Value.ToFloats().ToArray();

            var insertCommand = conn.CreateCommand();

            insertCommand.CommandText = "INSERT INTO PolicyChunks (Text, Embedding) VALUES ($text, $embedding)";

            insertCommand.Parameters.AddWithValue("$text", chunk);
            insertCommand.Parameters.AddWithValue("$embedding", FloatArrayToBytes(vector));

            insertCommand.ExecuteNonQuery();
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}

Here, the InitializeDatabase() method creates the basic structure of the SQLite database, if it doesn’t already exist. First, it establishes a connection to the file defined in _dbPath, which is the same path configured in the class constructor. Then, it opens this connection and executes an SQL command responsible for creating the table PolicyChunks.

Note that the table consists of three columns: Id for the primary key, Text, which will store the text segment (or chunk) extracted from the policy file, and Embedding, a BLOB (Binary Large Object), that is, a binary field where the numerical vector that semantically represents that text segment will be stored. This means that the database is ready to receive and store the text data and their respective embeddings.

The LoadPolicyIntoDatabaseAsync() method is responsible for loading the policy content and saving its embeddings to the database.

First, it reads the entire content of the policy file, located at _policyPath, and then divides it into smaller parts using the SplitIntoChunks() method. This division is important because OpenAI’s language models have input size limits. Therefore, the text is broken into blocks of up to 500 characters.

Then, a new connection to the database is opened, and each text segment (chunk) is processed. For each segment, it checks if the content is already stored in the table. This is done through a query that counts how many records have the same text. If the segment already exists, it is ignored to avoid duplication.

When a new segment is found, the method asks the OpenAI embeddings model to generate a numerical vector representing the meaning of that text. The result is converted to a float array and then transformed into bytes using the FloatArrayToBytes() method, so it is compatible with the BLOB column in the database.

Finally, the vector and the text are inserted together into the PolicyChunks table.

If any error occurs during the process, such as a communication failure with the API, for example, the exception is displayed in the console, allowing processing to continue with the remaining segments.

Now let’s add the most important part of the service, where the intelligent search and generation of responses based on the company’s return policy takes place. So, add the following code to the PolicyService class:

public async Task<string> GetAnswerAsync(string question)
{
    var queryEmbedding = await _embeddingClient.GenerateEmbeddingAsync(question);
    var queryVector = queryEmbedding.Value.ToFloats().ToArray();

    var topChunks = GetTopChunks(queryVector, 3);

    var context = string.Join("\n\n", topChunks);

    List<ChatMessage> messages = new()
    {
        ChatMessage.CreateSystemMessage("You are a helpful assistant that answers based on company return policies."),
        ChatMessage.CreateUserMessage($"Use only the following policy text to answer the question:\n\n{context}\n\nQuestion: {question}")
    };

    var response = await _chatClient.CompleteChatAsync(messages);
    return response.Value.Content[0].Text.Trim();
}

private List<string> GetTopChunks(float[] queryEmbedding, int topN)
{
    using var conn = new SqliteConnection($"Data Source={_dbPath}");
    conn.Open();

    var selectCmd = conn.CreateCommand();

    selectCmd.CommandText = "SELECT Text, Embedding FROM PolicyChunks";
   
    using var reader = selectCmd.ExecuteReader();

    var scoredChunks = new List<(string Text, double Score)>();
    
    while (reader.Read())
    {
        var text = reader.GetString(0);
        var embeddingBytes = (byte[])reader["Embedding"];
        var embedding = BytesToFloatArray(embeddingBytes);

        var similarity = CosineSimilarity(embedding, queryEmbedding);
        scoredChunks.Add((text, similarity));
    }

    return scoredChunks
        .OrderByDescending(x => x.Score)
        .Take(topN)
        .Select(x => x.Text)
        .ToList();
}

private static IEnumerable<string> SplitIntoChunks(string text, int maxLength)
{
    for (int i = 0; i < text.Length; i += maxLength)
        yield return text.Substring(i, Math.Min(maxLength, text.Length - i));
}

private static double CosineSimilarity(float[] v1, float[] v2)
{
    double dot = 0.0, mag1 = 0.0, mag2 = 0.0;
    for (int i = 0; i < v1.Length; i++)
    {
        dot += v1[i] * v2[i];
        mag1 += v1[i] * v1[i];
        mag2 += v2[i] * v2[i];
    }
    return dot / (Math.Sqrt(mag1) * Math.Sqrt(mag2));
}

private static byte[] FloatArrayToBytes(float[] array)
{
    var bytes = new byte[array.Length * sizeof(float)];
    Buffer.BlockCopy(array, 0, bytes, 0, bytes.Length);
    return bytes;
}

private static float[] BytesToFloatArray(byte[] bytes)
{
    var floats = new float[bytes.Length / sizeof(float)];
    
    Buffer.BlockCopy(bytes, 0, floats, 0, bytes.Length);
    return floats;
}

Now, let’s analyze each method:

1. GetAnswerAsync(string question)

This method receives the user’s question and returns an AI-generated answer based on the policy content stored in the database.

First, it transforms the question into an embedding vector, using the same embedding model configured previously. This numerical vector (queryVector) represents the semantic meaning of the question.

Next, the method calls GetTopChunks(), which searches the database for the chunks most similar to the meaning of the question, i.e., the parts of the policy that contain information relevant to the answer. It requests the three most relevant chunks (topChunks), and then combines these texts into a single string called context.

With the context ready, the method assembles a list of messages to send to the chat model. The first message is an instruction, stating that the assistant should only answer based on the company policy. The second message contains both the policy text and the user’s original question.

Finally, the code calls the CompleteChatAsync() method of the OpenAI client, which returns a generated response based on the provided context. The final text is extracted, cleaned and then returned.

2. GetTopChunks(float[] queryEmbedding, int topN)

This method identifies which chunks in the database are most semantically similar to the question asked.

It opens a connection to the SQLite database and reads all records from the PolicyChunks table, which contains the texts and their embeddings.

For each record, it calculates the cosine similarity between the query vector (queryEmbedding) and the stored text vector. This calculation produces a number between -1 and 1 (with embedding vectors it is usually positive); the closer to 1, the more similar the meaning of the two texts.

After calculating the scores, the method sorts the results in descending order and returns the topN most relevant chunks (in this case, three). These chunks will serve as the knowledge base for the model to answer correctly.

3. SplitIntoChunks(string text, int maxLength)

This method divides long texts into smaller parts, respecting a defined maximum size (for example, 500 characters).

It iterates through the original text and, in each iteration, returns a chunk that can be processed without exceeding the model’s token limit. It is essential to use techniques like this when working with large documents in RAG systems.

4. CosineSimilarity(float[] v1, float[] v2)

This method is the mathematical basis of the semantic search system. It calculates the cosine of the angle between two vectors in the embedding space: the smaller the angle (and the larger the cosine), the closer the meanings of the texts represented by those vectors.

It is with this metric that the system determines which sections of the policy are most relevant to a specific question.

5. FloatArrayToBytes(float[] array) and BytesToFloatArray(byte[] bytes)

These two methods convert between arrays of numbers and bytes. Since SQLite does not have a native data type to store arrays of floats, the embeddings are converted into a binary format (BLOB) before being saved to the database. When they need to be used again, these bytes are converted back into an array of floats, preserving all the information of the original vector.

Adding the API Key

To access the OpenAI API, you need an API key. With the key in hand, add the following code to the application’s appsettings.json file:

"OpenAI": {
  "ApiKey": "YOUR_OPEN_AI_API_KEY"
},

Finally, in the Program class, add the following code:

using ReturnPolicy.Services;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSingleton<PolicyService>();

var app = builder.Build();
app.UseStaticFiles();
app.MapControllers();

app.Run();
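
The post relies on a controller to expose the endpoint used in the test below, but its code isn’t shown here (the real one lives in the linked GitHub repository). As a reference, here is a minimal sketch of what such a controller could look like; the class name and route attributes are assumptions on my part, chosen to match the POST api/policy/ask route used later.

using Microsoft.AspNetCore.Mvc;
using ReturnPolicy.Models;
using ReturnPolicy.Services;

namespace ReturnPolicy.Controllers;

[ApiController]
[Route("api/policy")]
public class PolicyController : ControllerBase
{
    private readonly PolicyService _policyService;

    public PolicyController(PolicyService policyService) =>
        _policyService = policyService;

    // POST api/policy/ask - receives { "question": "..." } and returns the generated answer.
    [HttpPost("ask")]
    public async Task<IActionResult> Ask([FromBody] QuestionRequest request)
    {
        if (string.IsNullOrWhiteSpace(request.Question))
            return BadRequest("A question is required.");

        var answer = await _policyService.GetAnswerAsync(request.Question);
        return Ok(new { answer });
    }
}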

Running the Application and Testing the RAG

Now that everything is configured, we can run the application and test the endpoint that will generate the response. In this post, we’ll use Progress Telerik Fiddler Everywhere for this. Run the application and make the following request:

Route: POST - https://localhost:PORT/api/policy/ask

Body:

{
    "question": "Can I return an opened product?"
}

So the generated response will be something like this:

Policy response

The complete response returned by the model was:

According to the return policy, most new, unopened items can be returned within 30 days for a full refund. If you have an opened product that is defective or damaged, it can be returned or exchanged at any time. If you need further assistance, please contact our support team.

Note that the answer provided by the model is aligned with the policy, which states that new, unopened items can be returned within 30 days. Furthermore, it demonstrates clarity and objectivity by directly addressing the user’s question, conveying confidence and concern for the customer, which contributes to a good support experience.

Conclusion

The RAG technique allows for the generation of contextualized and intelligent responses, integrating the retrieval of relevant information with advanced synthesis capabilities. The responses produced by RAGs reduce ambiguities, improve the user experience and increase the reliability of interactions, especially in scenarios where document-based accuracy is essential.

In this post, we created a complete RAG system in ASP.NET Core, integrating it with OpenAI services for generating embeddings and producing contextualized responses. I hope this content serves as a practical reference, facilitating the adoption of the RAG technique whenever you have the opportunity to apply it in your projects.

If this all seems like a lot of work, there are professional RAG platforms you can explore, such as Progress Agentic RAG.

Progress Agentic RAG

Progress Agentic RAG is a RAG-as-a-Service platform that simplifies the creation of retrieval-augmented generation solutions. Instead of requiring proprietary infrastructure or multiple separate tools, it offers a ready-to-use environment for indexing documents, files and even videos, along with integrated metrics to evaluate RAG quality.

In practice, Progress Agentic RAG stands out for:

  • Generative search for websites, which can interpret user intent and build responses using existing content on the page itself.
  • Use in sensitive areas, such as the financial sector, supporting decision-making without compromising privacy and security.
  • The ability to handle unstructured data, covering more than 60 formats: PDFs, videos, spreadsheets, texts, among others. This greatly helps teams like legal departments, who need quick and accurate answers.
  • Intelligent video indexing, allowing the location of specific segments and generating responses based on audiovisual content.

For developers, Progress Agentic RAG functions as an intelligence layer that can be integrated with minimal effort. This reduces the need to build a RAG pipeline from scratch and accelerates the development of generative AI-based solutions.

Keep reading: Understand more about RAG and get a walk-through of Progress Agentic RAG.


Building a Greenfield System with the Critter Stack


JasperFx Software works hand in hand with our clients to improve their outcomes on software projects using the “Critter Stack” (Marten and Wolverine). Based on our engagements with client projects, as well as the greater Critter Stack user base, we’ve built up quite a few optional usages and settings in the two frameworks to solve specific technical challenges.

The unfortunate reality of managing a long-lived application framework such as Wolverine, or a complicated library like Marten, is the need both to continuously improve the tools and to try really hard not to introduce regression errors for our clients when they upgrade. To that end, we’ve had to make several potentially helpful features “opt in,” meaning that users have to explicitly turn on feature-flag-type settings to get them. A common cause of this is any change that introduces database schema changes, as we try very hard to do that only in major version releases (Wolverine 5.0 added some new tables to SQL Server or PostgreSQL storage, for example).

And yes, we’ve still introduced regression bugs in Marten or Wolverine far more times than I’d like, even with trying to be careful. In the end, I think the only guaranteed way to constantly and safely improve tools like the Critter Stack is to just be responsive to whatever problems slip through your quality gates and try to fix those problems quickly to regain trust.

With all that being said, let’s pretend we’re starting a greenfield project with the Critter Stack and we want to build the best-performing system possible, with some added options for improved resiliency as well. To jump to the end state, this is what I’m proposing as an optimized greenfield setup for new users:

 var builder = Host.CreateApplicationBuilder();

builder.Services.AddMarten(m =>
{
    // Much more coming...
    m.Connection(builder.Configuration.GetConnectionString("marten"));

    // 50% improvement in throughput, less "event skipping"
    m.Events.AppendMode = EventAppendMode.Quick;
    // or if you care about the timestamps -->
    m.Events.AppendMode = EventAppendMode.QuickWithServerTimestamps;

    // 100% do this, but be aggressive about taking advantage of it
    m.Events.UseArchivedStreamPartitioning = true;

    // These cause some database changes, so can't be defaults,
    // but these might help "heal" systems that have problems
    // later
    m.Events.EnableAdvancedAsyncTracking = true;

    // Enables you to mark events as just plain bad so they are skipped
    // in projections from here on out.
    m.Events.EnableEventSkippingInProjectionsOrSubscriptions = true;

    // If you do this, just now you pretty well have to use FetchForWriting
    // in your commands
    // But also, you should use FetchForWriting() for command handlers 
    // any way
    // This will optimize the usage of Inline projections, but will force
    // you to treat your aggregate projection "write models" as being 
    // immutable in your command handler code
    // You'll want to use the "Decider Pattern" / "Aggregate Handler Workflow"
    // style for your commands rather than a self-mutating "AggregateRoot"
    m.Events.UseIdentityMapForAggregates = true;

    // Future proofing a bit. Will help with some future optimizations
    // for rebuild optimizations
    m.Events.UseMandatoryStreamTypeDeclaration = true;

    // This is just annoying anyway
    m.DisableNpgsqlLogging = true;
})
// This will remove some runtime overhead from Marten
.UseLightweightSessions()

.IntegrateWithWolverine(x =>
{
    // Let Wolverine do the load distribution better than
    // what Marten by itself can do
    x.UseWolverineManagedEventSubscriptionDistribution = true;
});

builder.Services.AddWolverine(opts =>
{
    // This *should* have some performance improvements, but would
    // require downtime to enable in existing systems
    opts.Durability.EnableInboxPartitioning = true;

    // Extra resiliency for unexpected problems, but can't be
    // defaults because this causes database changes
    opts.Durability.InboxStaleTime = 10.Minutes();
    opts.Durability.OutboxStaleTime = 10.Minutes();

    // Just annoying
    opts.EnableAutomaticFailureAcks = false;

    // Relatively new behavior that will store "unknown" messages
    // in the dead letter queue for possible recovery later
    opts.UnknownMessageBehavior = UnknownMessageBehavior.DeadLetterQueue;
});

using var host = builder.Build();

return await host.RunJasperFxCommands(args);

Now, let’s talk more about some of these settings…

Lightweight Sessions with Marten

The first option we’re going to explicitly add is to use “lightweight” sessions in Marten:

var builder = Host.CreateApplicationBuilder();

builder.Services.AddMarten(m =>
{
    // Elided configuration...
})
// This will remove some runtime overhead from Marten
.UseLightweightSessions()

By default, Marten will use a heavier version of IDocumentSession that incorporates an identity map internally to track documents (entities) already loaded by that session. Likewise, when you request to load an entity by its identity, Marten’s session will happily check whether it has already loaded that entity and hand the same object back to you without making a database call.

The identity map usage is mostly helpful when you have unclear or deeply nested call stacks where different elements of the code might try to load the same data as part of the same HTTP request or command handling. If you follow what we consider Critter Stack best practices, especially for Wolverine usage, you’ll know that we very strongly recommend against deep call stacks and excessive layering.

Moreover, I would argue that you should never need the identity map behavior if you were building a system with an idiomatic Critter Stack approach, so the default session type is actually harmful in that it adds extra runtime overhead. The “lightweight” sessions run leaner by completely eliminating all the dictionary storage and lookups.
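
To make the difference concrete, here is a minimal sketch of the two session types side by side. The User document type, the userId value and the connection string are placeholders for illustration only; the session factory methods are Marten’s own.

using Marten;

var store = DocumentStore.For(opts =>
{
    // Placeholder connection string.
    opts.Connection("Host=localhost;Database=app;Username=postgres;Password=postgres");
});

var userId = Guid.NewGuid(); // stand-in for the id of an existing document

// Identity-map session (Marten's default): the second load is served from the
// session's internal dictionary, so both variables point at the same instance.
await using (var identitySession = store.IdentitySession())
{
    var first = await identitySession.LoadAsync<User>(userId);
    var second = await identitySession.LoadAsync<User>(userId);
    // ReferenceEquals(first, second) == true (when the document exists)
}

// Lightweight session: no identity map, so each load goes to the database and
// hands back a separate instance.
await using (var lightweightSession = store.LightweightSession())
{
    var first = await lightweightSession.LoadAsync<User>(userId);
    var second = await lightweightSession.LoadAsync<User>(userId);
    // first and second are distinct objects
}

// Hypothetical document type used only for this illustration.
public record User(Guid Id, string Name);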

Why, you ask, is the identity map behavior the default?

  1. We were originally designing Marten as a near drop-in replacement for RavenDb in a big system, so we had to mimic that behavior right off the bat to be able to make the replacement in a timely fashion.
  2. If we changed the default behavior, it can easily break code in existing systems that upgrade in ways that are very hard to predict and unfortunately hard to diagnose. And of course, this is most likely a problem in the exact kind of codebases that are hard to reason about. How do I know this and why am I so very certain this is so you ask? Scar tissue.




Studying compiler error messages closely: Input file paths


A colleague was working on a project that used a number of data files to configure how the program worked. They wanted one portion of the configuration file to be included only if a particular build flag was set. Let’s say that the configuration file is C:\repos\contoso\config\Contoso.config.

<providers>
    <provider name="Widget" version="1.0"/> <!-- or 2.0 if useV2Widgets build flag is set -->
    <provider name="Gadget" version="1.0"/> <!-- only if useV2Widgets build flag is set -->
    <!-- other providers that are used regardless of the build flags -->
</providers>

They were adding a build flag to convert the code base to use 2.0 widgets, but they wanted the default to be 1.0; only people who build with the special build flag should get 2.0 widgets. It so happens that 2.0 widgets depend on gadgets, so they also wanted to add a gadget provider, but again only conditionally based on the build flag.

The configuration file itself doesn’t support conditionals. How can they get a configuration file to support conditionals when the file format does not support conditionals?

I suggested that they use a preprocessor to take the marked-up configuration file and produce a filtered output file, which becomes the actual configuration file. Upon closer investigation, it appeared that they were not the first project to need conditionals in their configuration file, and another team had already written a generic XML preprocessor that supports conditional elements based on build flags, and that other team even included instructions on their project wiki on how to include a preprocessor pass to their build configuration. The updated configuration file looks something like this:

<providers>
    <provider name="Widget" version="1.0" condition="!useV2Widgets"/>
    <provider name="Widget" version="2.0" condition="useV2Widgets"/>
    <provider name="Gadget" version="1.0" condition="useV2Widgets"/>
</providers>

However, after following the instructions on the wiki to update the configuration file to use the condition attribute and updating the build process to send the file through the “conditional build flags” preprocessor, there was still a build error:

Validation failure: C:\repos\contoso\config\Contoso.config(2): Invalid attribute 'condition'

The configuration validator was upset at the condition attribute, but when they compared their project to other projects that used the configuration preprocessor, those other projects used the condition attribute just fine.

Look carefully at the error message. In particular, look at the path to the file that the validator is complaining about.

The validator is complaining about the original unprocessed file.

They went to the effort of sending the unprocessed file through the conditional build flags preprocessor to produce a processed file that has the correct provider list based on the build flags. But they forgot to use the results of that hard work: They were still using the old unprocessed file. It’s like taking a photograph, doing digital touch-ups, but then uploading the original version instead of the touched-up version.

The fix was to update the project so that it consumed the processed file instead of the raw file.¹

Bonus chatter: To avoid this type of confusion, it is common to change the extension of the unprocessed file to emphasize that it needs to be preprocessed. That way, when you see an error in Contoso.config, you don’t have to spend the effort to figure out which Contoso.config the error is about.

In this case, they could rename the unprocessed file to Contoso.preconfig and have the processed output be Contoso.config. I choose this pattern because the validator may require that the file extension be .config.

Another pattern would be to call the unprocessed version Contoso-raw.config and the processed version Contoso.config.

If you don’t want to rename an existing file (say, because you are worried about merge errors if your change collides with others who are also modifying that file), you could leave the unprocessed file as Contoso.config and call the processed file Contoso-final.config.²

¹ The instructions on the wiki say “In your project file, change references from yourfile.ext to $(OutputDirectory)\yourfile.ext.” But in this case, the file was being used not by the project file but by a separate configuration tool. The team was too focused on reading the literal instructions without trying to understand why the instructions were saying the things that they did. In this case, the instructions were focused on consumption from the project file, since that was the use case of the team that wrote the tool originally. But if you understand what the steps are trying to accomplish, you should realize that the intention is to update the references to the old yourfile.ext in every location where you want to consume the postprocessed version.³

² I chose the suffix -final as a joking reference to the pattern of seeing files named Document-Final-Final-Final 2-USETHISONE.docx.

³ I took the liberty of updating the wiki to clarify that you need to update all references to yourfile.ext. The references usually come from the project file, but they could be in other places, too, such as a call to makepri.exe.



Code that fits in a context window


AI-friendly code?

On what's left of software-development social media, I see people complaining that as the size of a software system grows, large language models (LLMs) have an increasingly hard time advancing the system without breaking something else. Some people speculate that the context window size limit may have something to do with this.

As a code base grows, an LLM may be unable to fit all of it, as well as the surrounding discussion, into the context window. Or so I gather from what I read.

This doesn't seem too different from limitations of the human brain. To be more precise, a brain is not a computer, and while they share similarities, there are also significant differences.

Even so, a major hypothesis of mine is that what makes programming difficult for humans is that our short-term memory is shockingly limited. Based on that notion, a few years ago I wrote a book called Code That Fits in Your Head.

In the book, I describe a broad set of heuristics and practices for working with code, based on the hypothesis that working memory is limited. One of the most important ideas is the notion of Fractal Architecture. Regardless of the abstraction level, the code is composed of only a few parts. As you look at one part, however, you find that it's made from a few smaller parts, and so on.

A so-called 'hex-flower', rendered with aesthetics in mind.

I wonder if those notions wouldn't be useful for LLMs, too.



AI-Assisted Coding: Where It Helps and Where It Doesn’t


Discussions about AI-assisted coding are everywhere—and for good reason. The topic tends to stir up a mix of emotions. Some people are curious about the possibilities, some are excited about improving their day-to-day efficiency, and others are worried these tools will eventually get “smart” enough to replace them.

In this article, I will share my own experiences using AI as a coding assistant in my daily workflow.


My Background (and the Tools I Use)

For context, I’m a full stack engineer with 12 years of web development experience. My current focus is UI development with React and TypeScript.

Depending on the project, I use a variety of LLMs and AI tools, including:


Why Context Matters So Much

Regardless of which model you use, getting good results requires preparation. LLMs produce dramatically better output when they’re given sufficient context about:

  • The problem space
  • The tech stack
  • Architectural constraints
  • Coding standards and preferences

For example, if the only instruction provided is:

“Create a reusable React dropdown component”

…the response could reasonably be:

  • A fully custom component with inline styles
  • A ShadCN-based implementation assuming Tailwind
  • A wrapper around a Bootstrap dropdown

Without more information, the LLM has no idea:

  • Which version of React you’re using
  • Whether the app uses SSR
  • How important accessibility is
  • What design system or component library is standard in your project

Many LLMs won’t ask follow-up questions; they’ll just guess the “most likely” solution.


Global Instructions: The Real Productivity Unlock

You could solve this by writing extremely detailed prompts, but that quickly becomes tedious and undermines the efficiency gains AI is supposed to provide.

A better approach is to supply global context that applies to every prompt.

When using AI tools inside your IDE, this often means configuration files like:

  • CLAUDE.md (for Claude)
  • copilot-instructions.md (for GitHub Copilot)

These files are typically generated during a one-time setup. The AI scans the repository and records important assumptions, such as:

  • “This application uses .NET 8.0”
  • “UI components use ShadCN with Tailwind and Radix primitives”
  • “Authentication is handled via Microsoft Entra ID”

You can also manually update these files or even ask the LLM to update them for you.

If you ask for a “reusable React dropdown component” before and after generating these instruction files, the difference in output quality is usually dramatic. The AI can move faster and align with your repository’s conventions.

Tip: It can be beneficial to separate your instructions into smaller, more specific files in a docs folder (auth.md, data-fetching.md, etc), and point to them from your LLM-specific files. This lets you keep a single source of truth, while allowing multiple LLMs to work efficiently in your project.


The Limits of Context (and Hallucinations)

Even with excellent context, LLMs aren’t magic.

They’re still prone to hallucination (confidently producing content that is incorrect or completely fabricated). A common pattern looks like this:

“I understand now! The fix is…”

…followed by code that’s:

  • More complicated
  • Harder to reason about
  • Still incorrect

This leads to the real question:

When is it actually efficient to use LLMs, and what are they best at?

The strengths and limitations below reflect typical, out-of-the-box usage. In practice, the more effort you invest in context, instruction files, and guidance, the better the results tend to be.

Where AI Shines

In my experience, AI is most effective in these scenarios:

  • Quick prototypes, where code quality isn’t the top priority
  • Translating logic from one programming language to another
  • Single-purpose functions with complex logic that would normally require stepping through a debugger

Common examples:

  • Parsing authentication tokens
  • Formatting dates or strings in very specific ways
  • Creating and explaining regular expressions
  • Investigating and narrowing down error causes
  • Writing CSS or Tailwind classes

Styling is a bit of a toss-up. The AI often adds unnecessary styles, but if CSS isn’t your strong suit, it can still be a big help.


Where AI Falls Short

There are also clear areas where AI is far less effective (without additional guidance or setup):

  • High-level architecture and long-term planning
    LLMs don’t naturally think ahead unless explicitly told to, and even then the results often fall short of what an experienced architect would expect.
  • Producing high-quality, maintainable code quickly
    AI can generate a lot of code fast, but well-structured, modular code often takes longer to review and refactor than writing it yourself. I frequently spend significant time cleaning up AI-generated code.

Final Thoughts

After using AI in my everyday work, my conclusion is fairly simple:

AI is excellent at increasing speed and efficiency, but it does not replace good engineering judgment.

On its own, AI tends to optimize for immediacy rather than long-term maintainability. When left unguided, it will readily generate solutions that work today while introducing architectural fragility or technical debt tomorrow. That’s where skepticism is warranted.

That said, well-architected software is achievable with AI when the right conditions are in place. With strong global context, clearly defined architectural constraints, well-maintained instruction files, and, most importantly, a developer who understands what good architecture looks like, AI can become a genuinely effective collaborator.

Used thoughtfully, AI becomes a powerful accelerator. Used blindly, it can become technical debt.


