
Build an AI-Powered Rich Text Editor in .NET MAUI with AI AssistView

TL;DR: Discover how to build a smart Rich Text Editor using .NET MAUI AI AssistView. Empower your writing experience with built‑in AI capabilities such as paraphrasing, tone refinement, grammar correction, content expansion, and content shortening, all seamlessly integrated to help you write smarter and faster.

Enterprise applications generate large volumes of text, including incident reports, audit logs, customer communications, internal documentation, and policy content. As this volume grows, users increasingly expect smart writing assistance directly within their editing experience, such as grammar correction, paraphrasing, tone refinement, and content summarization.

Integrating AI into a rich text editor, however, is not just about calling a large language model (LLM). The real challenge is designing an experience that feels native, predictable, and maintainable without breaking content flow or architectural boundaries.

In this article, you’ll learn how to build a production‑ready, AI‑assisted rich text editor using .NET MAUI, Syncfusion® Rich Text Editor, and Syncfusion MAUI AI AssistView powered by Azure OpenAI. The solution keeps the editor as the single source of truth while using AssistView as a guided, action‑driven AI layer.

Why use AI AssistView instead of a custom chat UI?

Calling an LLM API is relatively straightforward. Delivering a reliable in‑app AI workflow is not.

Syncfusion AI AssistView is purpose‑built for contextual assistance inside applications rather than mimicking a generic chat interface. It provides:

  • Seamless integration with Syncfusion controls, keeping editor content and AI state synchronized.
  • A structured action → response → apply workflow, instead of free‑form chat.
  • Suggestion‑driven interactions, presenting AI results as product features rather than raw responses.
  • Customizable templates for headers, actions, and suggestion items.
  • Built‑in handling of real‑world scenarios, such as retries, errors, and transformation history.

This model works particularly well for enterprise editors where content ownership, traceability, and UX consistency matter.

How it works (end-to-end flow)

At a high level, the interaction model looks like this sequence:

  1. The user writes or edits content in SfRichTextEditor.
  2. The user selects an action in AI AssistView (e.g., Shorten or Paraphraser).
  3. The ViewModel builds an AI prompt using the selected action and editor HTML.
  4. The app calls Azure OpenAI through an application service layer.
  5. The AI response appears in AI AssistView.
  6. The user applies the result back to the editor with a single click.

This approach ensures that:

  • The editor remains authoritative
  • AI responses are explicit, reviewable, and reversible
  • The user’s writing flow is never interrupted
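
For step 3 in the sequence above, prompt construction can be as simple as pairing an instruction with the editor’s HTML. Here’s a minimal sketch of such a builder; the method name mirrors the GetUserAIPrompt call used later in this article, but the instruction strings are illustrative assumptions rather than the sample repository’s exact prompts:

// Sketch only: the instruction wording is an assumption, not the exact
// prompts used in the sample repository.
private string GetUserAIPrompt(string action, string editorHtml)
{
    string instruction = action switch
    {
        "Shorten" => "Shorten the following HTML content while preserving its meaning and tags.",
        "Elaborate" => "Expand the following HTML content with more detail, keeping the HTML structure.",
        "Grammar Checker" => "Correct grammar and spelling in the following HTML content without altering its tags.",
        _ => $"Apply the '{action}' transformation to the following HTML content.",
    };

    return $"{instruction}\n\n{editorHtml}";
}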

Building an AI-powered Rich Text Editor

This section outlines how a rich text editor is extended with built‑in AI assistance.

Step 1: Set up the Rich Text Editor

Start by creating a new .NET MAUI project and configuring the Rich Text Editor control following the official setup documentation.

Add the SfRichTextEditor as the primary editing surface and bind its HTML content to your ViewModel:

<rte:SfRichTextEditor x:Name="richTextEditor"  
                      ShowToolbar="True"  
                      HtmlText="{Binding EditorHtml, Mode=TwoWay}" />

Binding the editor content as HTML allows you to preserve formatting while enabling AI‑driven transformations.
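
A minimal ViewModel property backing this binding might look like the sketch below (class and member names are assumptions consistent with the bindings used throughout this article):

using System.ComponentModel;

public class AssistViewViewModel : INotifyPropertyChanged
{
    private string editorHtml = string.Empty;

    // Two-way bound to SfRichTextEditor.HtmlText; raising PropertyChanged
    // pushes programmatic updates (such as applied AI results) back to the editor.
    public string EditorHtml
    {
        get => editorHtml;
        set
        {
            if (editorHtml != value)
            {
                editorHtml = value;
                PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(EditorHtml)));
            }
        }
    }

    public event PropertyChangedEventHandler? PropertyChanged;
}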

Step 2: Add the AI AssistView Interface

Next, integrate the Syncfusion AI AssistView by following the official documentation.

The Syncfusion AI AssistView component can be opened by clicking the button located at the top‑right corner of the Rich Text Editor. It provides intelligent suggestions, responses, and follow‑up actions to enhance user interaction.

A customizable header template featuring prompts such as “How can I help you?” further improves usability. Additionally, you can bind collections from your view model to enable dynamic, interactive AI‑driven conversations.

Here’s how you can do it in code:

<aiassistview:SfAIAssistView x:Name="AssistView"
                             ShowHeader="True"
                             IsVisible="False"
                             HeaderTemplate="{StaticResource headerTemplate}"
                             AssistItems="{Binding AssistItems}"
                             Suggestions="{Binding Suggestions}"
                             SuggestionItemSelectedCommand="{Binding SuggestionItemSelectedCommand}" >
</aiassistview:SfAIAssistView>

<DataTemplate x:Key="headerTemplate">
    <StackLayout HorizontalOptions="Center"
                 Spacing="10"
                 Padding="10">
         <Label Text='&#xe7e1;'
                FontFamily="MauiSampleFontIcon"  
                FontSize="20"  
                HorizontalOptions="Center"  
                VerticalOptions="Center" />  
         <Label Text="How can I help you?"  
                FontAttributes="Bold"  
                FontSize="16"  
                VerticalOptions="Center" />  
    </StackLayout>  
</DataTemplate> 

Step 3: Set up the Azure OpenAI connection

The AI layer is powered by Azure OpenAI. Initialize a secure connection using your endpoint, API key, and deployment name.

Here’s the Azure OpenAI implementation:

private const string endpoint = "YOUR_END_POINT_NAME";
internal const string deploymentName = "DEPLOYMENT_NAME";
private const string key = "API_KEY";

// Build chat client with endpoint, key, deployment
private void GetAzureOpenAIKernel()
{
    var client = new AzureOpenAIClient(
        new Uri(endpoint),
        new AzureKeyCredential(key))
        .AsChatClient(modelId: deploymentName);
    this.Client = client;
} 

Once initialized and validated, the AI engine is ready to handle transformation requests. For detailed information on validation and service implementation, please refer to the AzureBaseService class available in the GitHub repository.
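
With the client initialized, a transformation request reduces to a single chat call. Below is a hedged sketch of what GetResultsFromAI might look like, assuming a recent Microsoft.Extensions.AI build where IChatClient exposes the GetResponseAsync extension; the authoritative implementation lives in the repository’s AzureBaseService:

// Sketch only: Client is the IChatClient built in GetAzureOpenAIKernel().
public async Task<string> GetResultsFromAI(string userAIPrompt)
{
    // Sends the prompt as a single user message and returns the reply text.
    var response = await Client.GetResponseAsync(userAIPrompt);
    return response.Text;
}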

Step 4: Suggestion system and user interaction

Defining AI suggestions

The suggestion system exposes available AI actions that users can apply to editor content, such as:

  • Paraphraser
  • Grammar Checker
  • Elaborate
  • Shorten

Below is the code you need:

public AssistViewViewModel()
{
    _suggestions = new ObservableCollection<ISuggestion>
    {
        new AssistSuggestion { Text = "Paraphraser" },
        new AssistSuggestion { Text = "Grammer Checker" },
        new AssistSuggestion { Text = "Elaborate" },
        new AssistSuggestion { Text = "Shorten" }
    };

    this.SuggestionItemSelectedCommand = new Command(
        obj => _ = OnSuggestionTapCommandAsync(obj));
}

For paraphrasing, you can dynamically inject tone‑based options:

  • Humanize: Conversational and relatable
  • Professional: Business‑oriented tone
  • Simple: Clear and easy to understand
  • Academic: Structured and scholarly
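
A sketch of how those tone options could be injected when the user selects Paraphraser (the helper name is hypothetical, following the suggestion pattern shown above):

// Hypothetical helper: swaps in tone-based follow-up suggestions
// after the user selects "Paraphraser".
private void ShowParaphraseToneOptions()
{
    this.Suggestions = new ObservableCollection<ISuggestion>
    {
        new AssistSuggestion { Text = "Humanize" },
        new AssistSuggestion { Text = "Professional" },
        new AssistSuggestion { Text = "Simple" },
        new AssistSuggestion { Text = "Academic" }
    };
}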

Handling user actions

All suggestion interactions flow through a single command handler in the ViewModel. When a user selects a suggestion, the handler identifies the requested transformation and routes it through the appropriate processing pipeline.

Add this to your project:

private async Task OnSuggestionTapCommandAsync(object obj)
{ 
    var args = obj as SuggestionItemSelectedEventArgs;
    if (args == null || args.SelectedItem is not ISuggestion s)
        return;
    await InputProcessingAsync(s.Text).ConfigureAwait(true);
}
  • Paraphrasing actions return both a transformed response and follow‑up suggestions.
  • Other actions return transformed content only.

This keeps interaction handling centralized and predictable.
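
The routing itself can be a simple branch. Here’s a minimal sketch of InputProcessingAsync under that design, reusing the hypothetical tone helper from earlier:

// Sketch: routes each suggestion to the appropriate pipeline.
private async Task InputProcessingAsync(string action)
{
    var requestItem = new AssistItem { Text = action };
    this.AssistItems.Add(requestItem);

    if (action == "Paraphraser")
    {
        // Paraphrasing first surfaces tone options as follow-up suggestions.
        ShowParaphraseToneOptions();
    }
    else
    {
        // All other actions go straight to the AI pipeline.
        await GetResult(requestItem).ConfigureAwait(true);
    }
}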

Step 5: Applying AI results to the editor

When an AI request is triggered, the system:

  1. Builds a prompt using the editor’s content
  2. Calls Azure OpenAI through a dedicated service
  3. Formats the response
  4. Displays it as an AssistView item with an Apply action

Here’s the GetResult implementation:

private async Task GetResult(object inputQuery)
{
    await Task.Delay(1000).ConfigureAwait(true);
    AssistItem request = (AssistItem)inputQuery;
    if (request != null) 
    {
        var userAIPrompt = GetUserAIPrompt(request.Text, EditorHtml);
        var response = await azureAIService!
            .GetResultsFromAI(userAIPrompt)
            .ConfigureAwait(true);
        response = response.Replace("\n", "<br>");
        AssistItem responseItem = new AssistItem()
        {
            Text = response,
            Suggestion = GetAcceptSuggestion(),
            RequestItem = inputQuery
        };
        this.AssistItems.Add(responseItem);
    }
}

Nothing is committed to the editor automatically. Users explicitly apply the result, ensuring full control and transparency over content changes.
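
A hedged sketch of that Apply step, assuming an Accept suggestion that carries its originating response item:

// Sketch: invoked when the user taps "Accept" on a response item.
// Assigning EditorHtml updates the editor through the two-way binding.
private void ApplyResponseToEditor(AssistItem responseItem)
{
    if (!string.IsNullOrEmpty(responseItem?.Text))
    {
        // Nothing is written until the user explicitly applies the result.
        this.EditorHtml = responseItem.Text;
    }
}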

Enterprise considerations

When moving this pattern into production, keep the following in mind:

  • Performance: Send selected text instead of full documents when possible.
  • Scalability: Add throttling and cancellation for rapid successive requests.
  • Maintainability: Isolate AI calls in a dedicated service layer.
  • Reliability: Handle empty responses and transient errors gracefully.
  • Cost control: Reset prompt history and limit maximum output length.
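
For the scalability point, cancelling a superseded request is often enough. A minimal sketch using CancellationTokenSource, assuming a GetResultsFromAI overload that accepts a CancellationToken:

private CancellationTokenSource? aiRequestCts;

// Sketch: cancel any in-flight AI request before starting a new one,
// so rapid successive actions don't pile up.
private async Task<string?> RequestTransformationAsync(string prompt)
{
    aiRequestCts?.Cancel();
    aiRequestCts = new CancellationTokenSource();

    try
    {
        return await azureAIService!.GetResultsFromAI(prompt, aiRequestCts.Token);
    }
    catch (OperationCanceledException)
    {
        return null; // Superseded by a newer request.
    }
}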

GitHub reference

Explore the complete .NET MAUI AI‑assisted Rich Text Editor sample implementation on GitHub.

Frequently Asked Questions

Can I customize the RichTextEditor toolbar?

Yes. You can show or hide commands using toolbar settings or build a fully custom toolbar that triggers editor commands programmatically.

Does this solution work offline?

The rich text editor supports basic offline editing. AI features require an active internet connection to access Azure OpenAI.

What are the cost considerations?

Costs depend on Azure OpenAI usage (token‑based) and licensing. Usage varies by model and request volume.

Can I use a different AI provider?

Yes. The architecture is provider‑agnostic. You can replace the Azure AI service with OpenAI, Claude, Gemini, or another compatible API.

What's the maximum content length the Rich Text Editor can handle?

While there’s no hard limit, performance degrades with very large documents (more than 10,000 words). Consider pagination, lazy loading, or chunking for lengthy content.

Can the AI AssistView display markdown-formatted responses?

By default, it renders plain text and HTML. To display Markdown, parse it to HTML using libraries like Markdig before adding it to the AssistItem.Text.

Does AI AssistView support voice input?

Voice input isn’t built-in. Integrate platform-specific speech recognition APIs to convert voice to text, then send the text as a request to the AI AssistView.

Supercharge your cross-platform apps with Syncfusion's robust .NET MAUI controls.

Conclusion

Thank you for reading! This article demonstrated how AI can be seamlessly integrated into a Syncfusion Rich Text Editor without disrupting the authoring experience. By combining a Syncfusion MAUI AI AssistView with a robust editor and a clean service architecture, you can deliver intelligent writing assistance that feels native, reliable, and enterprise‑ready.

Whether you’re building a documentation platform, content management system, or collaborative editor, this approach provides a solid foundation for AI‑powered authoring workflows.

If you’re a Syncfusion user, you can download the setup from the license and downloads page. Otherwise, you can download a free 30-day trial.

You can also contact us through our support forum, support portal, or feedback portal for queries. We are always happy to assist you!

How to Build Semantic Search for Documentation with NestJS, Qdrant and Xenova

In this post, we’ll build a semantic documentation search API that lets users ask natural-language questions instead of matching exact keywords. We’ll use Qdrant as the vector database, Xenova/transformers to generate local text embeddings and NestJS as our API to tie everything together.

We will learn how to run Qdrant with Docker, generate embeddings in Node.js and index docs as vectors with metadata in Qdrant. Our documentation API will provide a pure semantic search endpoint and a hybrid search endpoint that combines filters for an even more effective search.

Prerequisites

  • Basic knowledge of NestJS and TypeScript
  • Basic knowledge of HTTP, RESTful APIs, and cURL
  • Node.js and Docker should be installed

How Semantic Search Works

Semantic search focuses on meaning, not just words. It understands user intent and contextual meaning, then finds data with similar meaning rather than matching keywords. It does this by converting text into vectors (arrays of numbers) that capture meaning, and then comparing those vectors to find related information.

For example, if our docs contain the phrase “How to authenticate users using JWT” and a user searches for “login security setup,” semantic search can infer they mean the same thing.

What Is Qdrant?

Qdrant is a vector database built for speed. It stores vectors and handles nearest neighbor calculations quickly. Qdrant uses the HNSW algorithm (Hierarchical Navigable Small World) to find similar vectors and return results in milliseconds. We’ll use the official Docker image to run it locally, which keeps our environment clean and makes the database easy to start and stop.

What Is Xenova?

Xenova lets you run machine learning models directly in Node.js. We’ll use it through the @xenova/transformers package to generate embeddings locally. This means no API calls, no rate limits and our data doesn’t leave our machine. The model downloads once (~23 MB) and caches locally for future use.
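
As a quick taste before the full service later in this post, generating one embedding takes only a few lines (a sketch using the same model we adopt below):

import { pipeline } from '@xenova/transformers';

async function demo() {
  // First run downloads and caches the model; later runs load from cache.
  const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  const output = await extractor('How to authenticate users using JWT', {
    pooling: 'mean',
    normalize: true,
  });
  console.log(output.data.length); // 384-dimensional vector
}

demo().catch(console.error);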

Project Setup

First, create a NestJS project:

nest new semantic-search-api
cd semantic-search-api

Next, run the command below to install our dependencies:

npm install @nestjs/config @qdrant/js-client-rest @xenova/transformers uuid \
  && npm install --save-dev @types/uuid

In our install command, @nestjs/config is used to import environment variables into our app, @qdrant/js-client-rest is the JavaScript client for interacting with the Qdrant vector database, @xenova/transformers is used to generate local text embeddings, and uuid is used to create unique identifiers for documents and embeddings.

Running Qdrant with Docker

Instead of installing Qdrant directly, we’ll use Docker Compose to keep our environment clean. Create a docker-compose.yml file at the root of your project and paste the code below:

version: '3.8'

services:
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    restart: unless-stopped
    ports:
      - "6333:6333"  # REST API port
    volumes:
      - ./qdrant_storage:/qdrant/storage

Start the database in the background:

docker-compose up -d

Next, create a .env file and paste your Qdrant connection settings and embedding configuration:

QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=documentation
QDRANT_VECTOR_DIMENSION=384
HF_MODEL_CACHE=./models

The variables above configure Qdrant’s URL and collection name, set the vector dimension to 384 (which matches our embedding model), and specify where Xenova caches the downloaded model.

Next, let’s update the app module to import the ConfigModule, so that we can load environment variables in our app:

Update your app.module.ts file with the following:

import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';

@Module({
  imports: [
    ConfigModule.forRoot({
      isGlobal: true,
      envFilePath: '.env',
    }),
    // ... we will add other modules here later
  ],
})
export class AppModule {}

Project Structure

Our project structure will look like this:

src/
├── qdrant/
│   ├── qdrant.module.ts
│   └── qdrant.service.ts
├── embeddings/
│   ├── embeddings.module.ts
│   └── embeddings.service.ts
├── documents/
│   ├── documents.module.ts
│   ├── documents.controller.ts
│   ├── document-ingestion/
│   │   ├── document-ingestion.service.ts
│   │   └── document-ingestion.service.spec.ts
│   └── document-processor/
│       ├── document-processor.service.ts
│       └── document-processor.service.spec.ts
└── search/
    ├── search.module.ts
    ├── search.service.ts
    ├── search.service.spec.ts
    └── search.controller.ts

Run the command below to generate the necessary files:

nest g module qdrant && \
nest g service qdrant && \
nest g module embeddings && \
nest g service embeddings && \
nest g module documents && \
nest g service documents/document-processor && \
nest g service documents/document-ingestion && \
nest g controller documents && \
nest g module search && \
nest g service search && \
nest g controller search

Choosing an Embedding Model

For this project, we’ll be using Xenova/all-MiniLM-L6-v2 for embeddings. This model is great at producing sentence-level embeddings, which work well for semantic search over documentation. It is relatively small and fast, and this makes it practical to run in Node.js without requiring a GPU. It outputs fixed 384-dimensional vectors (arrays with a fixed length of 384), which match our Qdrant collection configuration.

The model runs completely locally. On first use, Xenova downloads and caches it, and every subsequent run uses the cached version.

Building the Embedding Service

Our EmbeddingService will be responsible for converting text into vectors. Open the embeddings.service.ts file and update it with the following:

import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import {
  pipeline,
  env,
  FeatureExtractionPipeline,
} from '@xenova/transformers';

export type EmbeddingVector = number[];

@Injectable()
export class EmbeddingsService {
  private extractor: FeatureExtractionPipeline | null = null;
  private readonly DIMENSION: number;

  constructor(private readonly configService: ConfigService) {
    const vectorDimensionEnv = this.configService.getOrThrow<string>(
      'QDRANT_VECTOR_DIMENSION',
    );
    this.DIMENSION = parseInt(vectorDimensionEnv, 10);
  }

  private async getExtractor(): Promise<FeatureExtractionPipeline> {
    if (!this.extractor) {
      env.localModelPath = this.configService.getOrThrow<string>(
        'HF_MODEL_CACHE',
      );

      console.log('Loading embedding model (first time only, ~5s)...');
      const pipe = await pipeline(
        'feature-extraction',
        'Xenova/all-MiniLM-L6-v2',
      );
      this.extractor = pipe;
      console.log('Embedding model loaded.');
    }

    return this.extractor;
  }

  async embed(text: string): Promise<EmbeddingVector> {
    const extractor = await this.getExtractor();
    const output = await extractor(text, {
      pooling: 'mean',
      normalize: true,
    });

    return Array.from(output.data as Float32Array);
  }

  async embedBatch(texts: string[]): Promise<EmbeddingVector[]> {
    const extractor = await this.getExtractor();
    const output = await extractor(texts, {
      pooling: 'mean',
      normalize: true,
    });

    const data = Array.from(output.data as Float32Array);
    return Array.from({ length: texts.length }, (_, i) =>
      data.slice(i * this.DIMENSION, (i + 1) * this.DIMENSION),
    );
  }

  async warmup(): Promise<void> {
    try {
      await this.embed('warmup');
      console.log('Embedding model warmup completed.');
    } catch (error) {
      console.error('Embedding model warmup failed:', error);
      throw error;
    }
  }
}

The extractor property stores our loaded embedding model. It is initialized as null and loaded lazily on first use. This means the model only downloads when it is actually needed, rather than slowing down application startup.

The getExtractor() method loads and caches the model. First, we check if this.extractor already exists. If it does, we return it. If not, we set env.localModelPath to tell Xenova where to cache the downloaded model files, then call pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2') to download and load the model.

The embed() method calls the extractor with two options: pooling: 'mean', which averages all token embeddings into a single vector, and normalize: true, which scales the vector to unit length (required for cosine similarity in Qdrant). The extractor returns a Float32Array, which we convert to a regular array using Array.from().

For embedBatch(), we pass an array of texts to the extractor. The model returns a flattened array containing all vectors concatenated together. We split this back into individual vectors by slicing out chunks of 384 values (our vector dimension). The first text gets indices 0–383, the second gets 384–767 and so on.

The warmup() method runs a dummy embedding to preload the model, preventing the first real user request from experiencing a delay while the model loads. Be sure to export the EmbeddingsService in the EmbeddingsModule.

Building the Qdrant Service

This service wraps the vector database and handles creating the collection as well as reading and writing vectors. Update the qdrant.service.ts file with the following:

import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import {
  QdrantClient,
  Schemas,
} from '@qdrant/js-client-rest';

export interface IQdrantPayload {
  title: string;
  category: string;
  url: string;
  text: string;
  chunkIndex: number;
  [key: string]: unknown;
}

export interface IQdrantPoint {
  id: string;
  vector: number[];
  payload: IQdrantPayload;
}

@Injectable()
export class QdrantService {
  private readonly client: QdrantClient;
  private readonly vectorDimension: number;
  private readonly collectionName: string;

  constructor(private readonly configService: ConfigService) {
    const url = this.configService.get<string>('QDRANT_URL');
    if (!url) {
      throw new Error('QDRANT_URL is not set in environment');
    }

    this.collectionName =
      this.configService.get<string>('QDRANT_COLLECTION') || 'documentation';

    const vectorDimensionEnv = this.configService.getOrThrow<string>('QDRANT_VECTOR_DIMENSION');
    this.vectorDimension = parseInt(vectorDimensionEnv, 10);

    this.client = new QdrantClient({ url });
  }

  getCollectionName(): string {
    return this.collectionName;
  }

  async setupCollection(): Promise<void> {
    const collections = await this.client.getCollections();
    const exists = collections.collections?.some(
      c => c.name === this.collectionName,
    );

    if (exists) {
      console.log(`✓ Collection "${this.collectionName}" already exists.`);
      return;
    }

    await this.client.createCollection(this.collectionName, {
      vectors: {
        size: this.vectorDimension,
        distance: 'Cosine',
      },
    });

    console.log(`✓ Created collection "${this.collectionName}".`);
  }

  async upsertPoints(points: IQdrantPoint[]): Promise<void> {
    await this.client.upsert(this.collectionName, {
      wait: true,
      points,
    });
  }

  async search(
    vector: number[],
    limit: number,
    filter?: Schemas['SearchRequest']['filter'],
  ): Promise<Schemas['ScoredPoint'][]> {
    const params: Schemas['SearchRequest'] = {
      vector,
      limit,
      with_payload: true,
      with_vector: false,
      ...(filter && { filter }),
    };

    return this.client.search(this.collectionName, params);
  }
}

We define an interface for our vector points. Each point has an ID, a vector and a payload. The payload holds metadata such as the text content and URL.

Our constructor reads our environment variables and creates a Qdrant client. The setupCollection() method checks if our collection exists and creates it if it doesn’t. We use Cosine distance, which is the standard for semantic similarity.

The upsertPoints() method saves vectors and their metadata to Qdrant. Finally, the search() method finds similar vectors. We request the payload but not the vector itself, since we only need the metadata for displaying results.

Document Chunking

Document Processing Service

LLMs and vector databases work best with smaller chunks of text. When large text, such as a 10-page document, is embedded as a single vector, specific details get lost.

While our Xenova model has a safe upper bound of approximately 2,000 characters, for best search quality it is advisable to embed text of 400–600 characters. Therefore, we need to split documents into chunks; for this project, we’ll aim for around 500 characters.

Our chunking strategy will aim for the maximum number of complete paragraphs we can fit within the 500-character limit. Then we’ll start a new chunk with an overlap from the end of the previous chunk. The purpose of the overlap is to preserve context across chunk boundaries.

Update your document-processor.service.ts file with the following:

import { Injectable } from '@nestjs/common';

export interface IDocumentMetadata {
  title: string;
  category: string;
  url: string;
}

export interface IDocumentChunk {
  text: string;
  chunkIndex: number;
  metadata: IDocumentMetadata;
}

@Injectable()
export class DocumentProcessorService {
  private readonly CHUNK_SIZE = 500; // characters
  private readonly OVERLAP = 50;     // characters

  chunkDocument(
    content: string,
    metadata: IDocumentMetadata,
  ): IDocumentChunk[] {
    const chunks: IDocumentChunk[] = [];
    const paragraphs = content
      .split('\n\n')
      .map(p => p.trim())
      .filter(p => p.length > 0);

    let currentChunk = '';
    let chunkIndex = 0;

    for (const paragraph of paragraphs) {
      const potentialChunk = currentChunk
        ? `${currentChunk}\n\n${paragraph}`
        : paragraph;

      if (potentialChunk.length > this.CHUNK_SIZE && currentChunk) {
        // Current chunk is full; emit it
        chunks.push({
          text: currentChunk,
          chunkIndex,
          metadata,
        });

        // Start new chunk with overlap (prefer complete sentence, fallback to word boundary)
        const overlap = this.findOverlap(currentChunk);
        currentChunk = overlap + '\n\n' + paragraph;
        chunkIndex++;
      } else {
        currentChunk = potentialChunk;
      }
    }

    // Emit last chunk
    if (currentChunk.length > 0) {
      chunks.push({
        text: currentChunk,
        chunkIndex,
        metadata,
      });
    }

    return chunks;
  }

  private findOverlap(text: string): string {
    const searchWindow = text.slice(-this.OVERLAP * 2);
    
    // Try to find last complete sentence
    const sentenceMatch = searchWindow.match(/[.!?]\s+([^.!?]+)$/);
    if (sentenceMatch) {
      return sentenceMatch[1].trim();
    }
    
    // Fallback: find word boundary near target overlap length
    const tail = text.slice(-this.OVERLAP * 1.5);
    const wordMatch = tail.match(/\s+(\S+.*)$/);
    if (wordMatch) {
      return wordMatch[1].trim();
    }
    
    // Last resort: from last space
    const lastSpace = text.lastIndexOf(' ', text.length - this.OVERLAP);
    return lastSpace !== -1 ? text.slice(lastSpace + 1) : text.slice(-this.OVERLAP);
  }
}

In the code above, we set our chunk size to 500 characters with a 50-character overlap. The overlap helps preserve context across chunk boundaries.

The chunkDocument() method splits text by paragraph, then accumulates paragraphs until the next one would exceed our limit. It then saves the chunk and starts a new one with an overlap from the end of the chunk that was just saved.

The findOverlap() method tries to find a complete sentence for the overlap first. If that fails, it looks for a word boundary. This keeps the overlap readable rather than cutting words in half.
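
A quick usage sketch makes the behavior concrete (longDocText stands in for any multi-paragraph string):

const processor = new DocumentProcessorService();
const chunks = processor.chunkDocument(longDocText, {
  title: 'API Authentication',
  category: 'Security',
  url: '/docs/auth',
});

// Each chunk stays near 500 characters, and consecutive chunks share a
// short sentence- or word-aligned overlap.
console.log(chunks.map(c => ({ index: c.chunkIndex, length: c.text.length })));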

Document Ingestion Service

This service processes raw documents, converts them to vectors and saves them in Qdrant. Update your document-ingestion.service.ts file with the following:

import { Injectable } from '@nestjs/common';
import { v4 as uuidv4 } from 'uuid';
import { DocumentProcessorService, IDocumentMetadata, IDocumentChunk } from '../document-processor/document-processor.service';
import { EmbeddingsService, EmbeddingVector } from '../../embeddings/embeddings.service';
import { QdrantService, IQdrantPoint, IQdrantPayload } from '../../qdrant/qdrant.service';

export interface IRawDocument extends IDocumentMetadata {
  content: string;
}

@Injectable()
export class DocumentIngestionService {
  constructor(
    private readonly processor: DocumentProcessorService,
    private readonly embeddings: EmbeddingsService,
    private readonly qdrant: QdrantService,
  ) {}

  /**
   * Ingest one or more raw documents:
   * - Chunk content into smaller overlapping pieces.
   * - Embed all chunk texts in a batch.
   * - Upsert points (vector + payload) into Qdrant.
   */
  async ingestDocuments(docs: IRawDocument[]): Promise<{
    status: 'ok' | 'error';
    documents: number;
    totalChunks: number;
    skipped: number;
    error?: string;
  }> {
    try {
      if (!docs?.length) {
        return { status: 'ok', documents: 0, totalChunks: 0, skipped: 0 };
      }

      await this.qdrant.setupCollection();

      let totalChunks = 0;
      let skipped = 0;

      for (const doc of docs) {
        const result = await this.ingestDocument(doc);
        if (result.success) {
          totalChunks += result.chunks;
        } else {
          skipped++;
        }
      }

      return {
        status: 'ok',
        documents: docs.length - skipped,
        totalChunks,
        skipped,
      };
    } catch (error) {
      console.error('Fatal error during ingestion:', error);
      return {
        status: 'error',
        documents: 0,
        totalChunks: 0,
        skipped: docs?.length ?? 0,
        error: error instanceof Error ? error.message : 'Unknown error',
      };
    }
  }

  private async ingestDocument(doc: IRawDocument): Promise<{
    success: boolean;
    chunks: number;
  }> {
    try {
      if (!doc.title || !doc.content) {
        console.warn('Skipping document, missing title or content');
        return { success: false, chunks: 0 };
      }

      const metadata: IDocumentMetadata = {
        title: doc.title,
        category: doc.category || 'uncategorized',
        url: doc.url || '',
      };

      const chunks = this.processor.chunkDocument(doc.content, metadata);
      if (chunks.length === 0) {
        console.warn(`Skipping "${doc.title}" - produced no chunks`);
        return { success: false, chunks: 0 };
      }

      const vectors = await this.embeddings.embedBatch(
        chunks.map(chunk => chunk.text),
      );

      if (vectors.length !== chunks.length) {
        console.error(
          `Error ingesting "${doc.title}": expected ${chunks.length} vectors, got ${vectors.length}`,
        );
        return { success: false, chunks: 0 };
      }

      const points = this.createQdrantPoints(chunks, vectors);
      await this.qdrant.upsertPoints(points);

      console.log(`✓ Ingested "${doc.title}" (${chunks.length} chunks).`);
      return { success: true, chunks: chunks.length };
    } catch (error) {
      console.error(`Error ingesting "${doc.title}":`, error);
      return { success: false, chunks: 0 };
    }
  }

  private createQdrantPoints(
    chunks: IDocumentChunk[],
    vectors: EmbeddingVector[],
  ): IQdrantPoint[] {
    return chunks.map((chunk, index) => ({
      id: uuidv4(),
      vector: vectors[index],
      payload: {
        title: chunk.metadata.title,
        category: chunk.metadata.category,
        url: chunk.metadata.url,
        text: chunk.text,
        chunkIndex: chunk.chunkIndex,
      },
    }));
  }
}

In our constructor, we inject three services: the document processor for chunking, the embeddings service for generating vectors and the Qdrant service for database operations.

The ingestDocuments() method processes multiple documents at once. First, it confirms that the Qdrant collection is set up, then it processes each document individually. If one document fails, we still proceed with the others while tracking which documents were skipped and which were successful.

The ingestDocument() method handles the actual ingestion for individual documents. It verifies that the document has the required fields, sets up metadata, chunks the content, generates embeddings and saves them to Qdrant. It also confirms that the number of vectors matches the number of chunks; if not, it logs an error and skips that document.

The createQdrantPoints() method is a helper that combines our chunks, vectors and metadata into the format Qdrant expects.

Documents Controller and Module

Update your documents.controller.ts file with the following:

import { Body, Controller, Post } from '@nestjs/common';
import { DocumentIngestionService, IRawDocument } from './document-ingestion/document-ingestion.service';

@Controller('documents')
export class DocumentsController {
  constructor(private readonly ingestionService: DocumentIngestionService) {}

  @Post('ingest')
  async ingest(@Body() body: { docs: IRawDocument[] }) {
    if (!body?.docs?.length) return { error: 'No documents provided' };
    return this.ingestionService.ingestDocuments(body.docs);
  }
}

Next, update the DocumentsModule to import the EmbeddingsModule and QdrantModule.
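
For reference, the wiring might look like this sketch (your generated module will already list the controller and services; only the imports array needs adding):

import { Module } from '@nestjs/common';
import { DocumentsController } from './documents.controller';
import { DocumentIngestionService } from './document-ingestion/document-ingestion.service';
import { DocumentProcessorService } from './document-processor/document-processor.service';
import { EmbeddingsModule } from '../embeddings/embeddings.module';
import { QdrantModule } from '../qdrant/qdrant.module';

@Module({
  imports: [EmbeddingsModule, QdrantModule],
  controllers: [DocumentsController],
  providers: [DocumentIngestionService, DocumentProcessorService],
})
export class DocumentsModule {}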

Building the Search Service

The search service handles user queries. Update your search.service.ts file with the following:

import { Injectable } from '@nestjs/common';
import { EmbeddingsService } from '../embeddings/embeddings.service';
import { QdrantService, IQdrantPayload } from '../qdrant/qdrant.service';
import { Schemas } from '@qdrant/js-client-rest';

export interface ISearchResult {
  title: string;
  snippet: string;
  url: string;
  category: string;
  score: number;
  chunkIndex: number;
}

@Injectable()
export class SearchService {
  private static readonly MIN_LIMIT = 1;
  private static readonly MAX_LIMIT = 100;
  private static readonly DEFAULT_LIMIT = 10;

  constructor(
    private readonly embeddings: EmbeddingsService,
    private readonly qdrant: QdrantService,
  ) {}

  private mapHits(hits: Schemas['ScoredPoint'][]): ISearchResult[] {
    return hits
      .filter(hit => hit.payload && hit.score !== undefined)
      .map(hit => {
        const payload = hit.payload as IQdrantPayload;
        return {
          title: String(payload?.title ?? ''),
          snippet: String(payload?.text ?? ''),
          url: String(payload?.url ?? ''),
          category: String(payload?.category ?? ''),
          score: hit.score ?? 0, //similarity score
          chunkIndex: Number(payload?.chunkIndex ?? 0),
        };
      });
  }

  private validateAndNormalizeQuery(query: string): string | null {
    const trimmed = query?.trim();
    return trimmed && trimmed.length > 0 ? trimmed : null;
  }

  private clampLimit(limit: number): number {
    return Math.max(SearchService.MIN_LIMIT, Math.min(SearchService.MAX_LIMIT, limit));
  }

  private createCategoryFilter(category: string | null | undefined): Schemas['SearchRequest']['filter'] | undefined {
    const trimmed = category?.trim();
    if (!trimmed) {
      return undefined;
    }
    return {
      must: [
        {
          key: 'category',
          match: { value: trimmed },
        },
      ],
    };
  }

  private async performSearch(
    query: string,
    limit: number,
    filter?: Schemas['SearchRequest']['filter'],
  ): Promise<ISearchResult[]> {
    try {
      const vector = await this.embeddings.embed(query);
      const hits = await this.qdrant.search(vector, limit, filter);
      return this.mapHits(hits);
    } catch (error) {
      console.error('Error performing search:', error);
      return [];
    }
  }

  
  async search(query: string, limit = SearchService.DEFAULT_LIMIT): Promise<ISearchResult[]> {
    const normalizedQuery = this.validateAndNormalizeQuery(query);
    if (!normalizedQuery) {
      return [];
    }
    return this.performSearch(normalizedQuery, this.clampLimit(limit));
  }

  async searchWithCategory(
    query: string,
    category?: string | null,
    limit = SearchService.DEFAULT_LIMIT,
  ): Promise<ISearchResult[]> {
    const normalizedQuery = this.validateAndNormalizeQuery(query);
    if (!normalizedQuery) {
      return [];
    }
    const filter = this.createCategoryFilter(category);
    return this.performSearch(normalizedQuery, this.clampLimit(limit), filter);
  }
}

The mapHits() method converts Qdrant’s raw response into a user-friendly format. The validateAndNormalizeQuery() method verifies we have an actual query string, clampLimit() keeps result counts within safe limits and createCategoryFilter() builds the Qdrant filter object when users filter by category.

The performSearch() method embeds the user query, searches Qdrant and maps the results.

The search() method uses only pure semantic search with no filters, while searchWithCategory() adds category filtering for more specific searches.

Search Controller

Our search controller exposes two endpoints. The /search endpoint provides pure semantic search, while /search/hybrid adds category filtering. Update your search.controller.ts file with the following:

import { Controller, Get, Query } from '@nestjs/common';
import { SearchService } from './search.service';

@Controller('search')
export class SearchController {
  constructor(private readonly searchService: SearchService) {}

  private parseLimit(limit: string | undefined): number {
    const parsed = Number(limit);
    return isNaN(parsed) || parsed <= 0 ? 10 : parsed;
  }

  @Get()
  async search(
    @Query('q') q: string,
    @Query('limit') limit?: string,
  ) {
    if (!q) {
      return { error: 'Query parameter "q" is required' };
    }
    return this.searchService.search(q, this.parseLimit(limit));
  }

  @Get('hybrid')
  async hybrid(
    @Query('q') q: string,
    @Query('category') category?: string,
    @Query('limit') limit?: string,
  ) {
    if (!q) {
      return { error: 'Query parameter "q" is required' };
    }
    return this.searchService.searchWithCategory(q, category, this.parseLimit(limit));
  }
}

Finally, update the SearchModule to import the EmbeddingsModule and QdrantModule.

Application Warmup

We don’t want the first user request to hang while our ML model loads, so we’ll add a warmup phase during application startup.

Update your main.ts file with the following:

import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';
import { EmbeddingsService } from './embeddings/embeddings.service';
import { QdrantService } from './qdrant/qdrant.service';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  try {
    console.log('Starting services warmup...');
    const embeddings = app.get(EmbeddingsService);
    const qdrant = app.get(QdrantService);
    
    await Promise.all([
      embeddings.warmup(),
      qdrant.setupCollection(),
    ]);
    console.log('✓ Services ready.');
  } catch (err) {
    console.error('Warmup failed', err);
    process.exit(1);
  }

  await app.listen(3000);
}
bootstrap();

This loads the ML model and sets up the database collection before accepting requests.

Testing the API

Run the following in your terminal to start your server:

npm run start:dev

You should see the warmup logs followed by the server start message.

Ingesting Documents

Let’s add some test documents:

curl -X POST http://localhost:3000/documents/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "docs": [
      {
        "title": "API Authentication",
        "category": "Security",
        "url": "/docs/auth",
        "content": "To access the API, you must use a Bearer token in the header. Tokens expire after 1 hour."
      },
      {
        "title": "Rate Limiting",
        "category": "Performance",
        "url": "/docs/rate-limits",
        "content": "We limit requests to 100 per minute per IP address. Exceeding this returns a 429 Too Many Requests error."
      }
    ]
  }'

Your response should be:

{
  "status": "ok",
  "documents": 2,
  "totalChunks": 2,
  "skipped": 0
}

Searching Documents

Let’s try a semantic search. Note that we’re not using the exact words “Bearer” or “header”:

curl "http://localhost:3000/search?q=how%20do%20I%20log%20in%20to%20the%20api"

The system understands that “log in” is semantically related to “authentication” and “Bearer token,” so it returns the Authentication document.

[Screenshot: sample response for the semantic search]

Let’s try searching with a category filter:

curl "http://localhost:3000/search/hybrid?q=api%20limits&category=Performance"

Your response should be:

[Screenshot: sample response with the category filter applied]

This searches for content semantically related to “api limits” but only returns results in the “Performance” category.

Conclusion

In this article, we learned how to run a vector database locally with Docker, generate embeddings without external APIs, chunk documents with overlapping windows and combine vector similarity with metadata filtering.

Possible next steps include swapping Xenova for OpenAI if we need larger models, or moving Qdrant to a cloud cluster for millions of vectors, while preserving the existing NestJS logic.

How DataOps principles help to reduce GenAI risk and improve data quality

John Kerski explains how DataOps principles improve trust, quality, and governance when using GenAI in data solutions. Includes John’s advice on how to apply these principles – plus tips for Git, testing, Microsoft Fabric, and more.

Over the past three years, the presence of Generative Artificial Intelligence (GenAI) in the world of data has profoundly changed how we build solutions. I’ve seen teams incorporate GenAI into their development processes as vendors continue introducing AI tools such as GitHub Copilot and Copilot for Fabric. The dependence on these tools to accelerate work is not much different from what IntelliSense and integrated development environments (IDEs) did years ago as the industry moved away from punch cards and assembly.

Yet, from what I have seen, AI only exacerbates existing problems with data and the processes we use to build solutions. AI is another tool we have, but productivity and trust in what our solutions produce can quickly be doused when it’s used incorrectly. Here are some of the issues I’ve seen with AI in data solutions in the past two years alone.

AI updates code with no audit trail

I’ve seen teams use AI to update the likes of SQL, Power Query and Python without considering how they’d roll back those changes if AI introduced a mistake. Whether it’s the model, poor constraints in prompting, or context rot, where the model no longer “sees” prior information that would make it more effective, AI can make updates to code that introduce errors. How do you identify what changed and then roll those changes back?

AI updates code with no safety net

Let’s be clear – GenAI is built and trained by humans, and humans are fallible. So, when you ask AI to build a new set of data transformations from an API source, should you trust it implicitly? How do you know the code it builds handles 400 errors gracefully, or backs off appropriately when it receives a 429 error? If I replaced the word AI with “junior data engineer,” would you answer those questions differently? From my experience, the answers should not be different.

AI answers your client’s questions with little oversight

I’ve encountered situations where agents in Copilot or Data Agents in Microsoft Fabric answer questions inconsistently. They may even answer questions they shouldn’t. Asking a Financial Data Agent for a good brownie recipe is not the desired outcome!

How DataOps principles help with AI usage

These issues are exactly why DataOps is more important than ever for your project teams. DataOps is a set of principles for reducing production errors while increasing the delivery of data solutions. The wonderful thing about principles is that they apply regardless of the tools or technologies involved.

So, I’d like to offer a few principles you should make inherent to your teams’ work. I’ll also include some tips for Fabric and Power BI that you should be applying with GenAI today. I put them in order so you can focus on one principle at a time, each over a 1-2 month period.

Principle #1: Make it reproducible

Reproducible results are required and therefore we version everything. That means data, low-level hardware and software configurations, and the code and configuration specific to each tool in the toolchain.

Tip #1: Embrace Git

Notice the emphasis on the words ‘we version’ above. In our industry, that means using Git. Git is fundamental to giving your teams peace of mind that, whether AI updates a notebook or a Power BI report, you know exactly what changed (and when).

There is a learning curve to Git, yet features like Fabric Git Integration make it easier than ever to save versions of your work. Fabric also provides support for version control with both Azure DevOps and GitHub.

For many teams, Git represents the steepest part of the learning curve. However, once your team builds the habit of cloning, committing, syncing, and merging their changes – and treats the repository as the single source of truth rather than the workspace – you’ll have a solid foundation for the DataOps principles that follow.

Principle #2: Improve cycle times

We should strive to minimize the time and effort required to turn a customer need into an analytic idea. We should create it in development, release it as a repeatable production process and, lastly, refactor and reuse that product.

Tip #2 – Identify how you use AI to update your code

The companies providing these GenAI tools are of course interested in growing their customer base. That’s why they offer personal productivity plans. However, are you aware of the terms of service attached to these plans? They’re likely less strict on data residency and privacy, especially the free ones (because free is not really free.) Do you know how many people on your teams are using their own personal plans? Many of these GenAI tools have access to the data and code you’re working on, and then send that data to centers and logs around the world.

I don’t intend to scare you away from using GenAI but you should consider these security aspects carefully. Fortunately, many enterprise-grade tools have different terms of service that are more favorable to company data privacy. Tools like GitHub Copilot Enterprise, for example, lets you isolate GenAI models to ones deployed in your Azure tenant with Foundry.

This isn’t a new concept that makes GenAI a security pariah, though. Many in our industry remember the security concerns around ‘Bring Your Own Device’ when mobile device usage dramatically increased. That was another tool that accelerated productivity and rankled IT security personnel.

Ultimately, as they did with ‘BYOD’, teams just need to consider the security concerns and implement practices to mitigate the risks. Remember, also – the tools to manage these risks will get better, so reassess the situation often.

Tip #3 – Use Visual Studio Code to aid development with Fabric

Visual Studio Code makes saving work to Azure DevOps and GitHub much easier. There are also a cadre of extensions that make working with Fabric easier, including Fabric Data Engineering and Microsoft Fabric MCP.

Plus, for Power BI Desktop development, VS Code is becoming a complementary tool that lets AI make changes to your models and reports through the Power BI Modeling MCP.

Tip #4 – Implement workspace governance

With Git in place, you can ensure your development work is separate from what your customers see. At a minimum, you should keep two workspaces: one for development and one for production. If you can afford it, you should also have separate Fabric capacities for production and development. That way, production won’t be impacted if you make a mistake, such as a notebook mistakenly running a merge of large tables that consumes a lot of capacity.

It also means you should have a capacity (albeit a smaller one like an F4) for Copilot for Fabric. AI assisting with building code or answering questions should not come at the detriment of processing data. Analytics work is a volatile aspect of Fabric consumption – it’s hard to predict because it’s an exploratory endeavor, and asking AI to help explore is just as unpredictable. Keep your data engineering Fabric consumption separate from your analytics consumption.

Principle #3: Quality is paramount

Analytic pipelines should be built with a foundation capable of automated detection of abnormalities (jidoka) and security issues in code, configuration, and data. It should also provide continuous feedback to operators for error avoidance (poka-yoke).

Tip #5 – See an error? Build a test

Testing is of utmost importance when AI is used in code generation. Without testing, how do we prove that AI didn’t introduce a mistake or fail to handle our requirements?

Whether AI accidentally changed a data type that broke a relationship, or didn’t use the DAX function TREATAS on the right column for a DAX measure, the solution is the same: build a test. And it’s no different if one of your team makes a mistake. Generating a test is the first thing you should do to help prevent it from happening again. Over time, you’ll build up a ‘safety net’ of tests to help prevent future mishaps.

Tip #6 – Use Pytest, wheel files, and environments for notebook development

Python has long had the ability to test transformations. When you have AI build Python transformations, you should have defined tests in pytest to validate those transformations. When those tests pass, the Python code should be compiled into wheel files and then added to an environment. This encapsulates your code with sound testing practices, reduces AI-driven regression errors, and protects the version of that code in environments.
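
As a concrete illustration, a pytest case guarding a transformation might look like this minimal, self-contained sketch (in a real project the function under test would live in your wheel-packaged library; the names here are hypothetical):

# test_transformations.py -- minimal pytest sketch; names are hypothetical.
def normalize_customer_names(names: list[str]) -> list[str]:
    """Example transformation: trim whitespace and title-case names."""
    return [n.strip().title() for n in names]

def test_normalize_customer_names_strips_and_titlecases():
    assert normalize_customer_names(["  alice SMITH ", "BOB jones"]) == [
        "Alice Smith",
        "Bob Jones",
    ]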

Note: I have a sample project you can use as an example to help you get started.

Tip #7 – Test your semantic models

With the advent of the Power BI project file format and the introduction of the Power BI Modeling MCP server (which gives GenAI tools the ability to directly update your model), I’ve seen teams let AI update DAX and Power Query without testing. But how do you know what AI updated is correct?

Well, you can use the DAX Query View Testing Pattern and User Defined Functions such as PQL.Assert to build tests against your model. You can validate that the content has the correct distinct columns, that DAX measures output consistent results under certain filters, and that the relationships in a model are preserved, all within the semantic model. This lays the foundation for automated testing.

Tip #8 – Test your data agents

The Fabric Data Agent SDK, built to be run in a notebook, can be used to impersonate a customer asking questions so you can inspect the results and ensure they are consistent and appropriate. You should also be testing with inappropriate questions.

Regarding this, your leadership should understand that the questions stored in the notebook may be unsuitable in pleasant conversation, but are necessary to make sure the agent responds appropriately. For example, by asking an agent to give the definition of a curse word, you can validate that the agent does not return the word. It should instead simply say it cannot answer the question.

[Screenshot: testing a Data Agent with various questions]

Principle #4: Monitor for quality and performance

Our goal is to have performance, security, and quality measures that are monitored continuously to detect unexpected variation and generate operational statistics.

Tip #9 – Testing does not stop once the solution has shipped!

To find issues before your customers do, it’s crucial to test and track your data’s journey to the customer during every step of the process. Eventhouse Monitoring allows workspace admins to start getting real-time insights into refreshes, and I have a template that can help.

This best practice also includes Data Agents, Fabric’s AI implementation for chatting with data. Both the data the agent queries and the model used to infer answers can, and will, change (including model deprecation.) The notebook I referenced earlier can also be used to test the agent and log results. Furthermore, these results can be logged to an eventhouse – giving you near real-time insights into the behavior of the Data Agent.

As of March 2026, seeing the prompts and conversations used by users with Data Agents is not built into the Fabric product. I’d instead suggest using Copilot Studio. While there is an additional cost, it does have more robust options. My hope is that we start seeing user activity sent to Eventhouse Monitoring in the near future.

In summary: why you should use DataOps principles with GenAI

GenAI has the capability to accelerate the delivery of solutions, and DataOps provides the principles to keep teams from crashing (both technically and personally). GenAI will continue to improve, introducing new capabilities that disrupt the industry. At the same time, DataOps principles are still as useful as ever. I hope, as a result of this article, you consider how they can make your teams better in these early days of GenAI.

FAQs: How DataOps principles help to reduce GenAI risk and improve data quality

1. How does Generative AI (GenAI) impact data engineering workflows?

Generative AI accelerates development but can introduce errors, inconsistent outputs, and governance challenges if not properly managed.

2. Why is DataOps important when using AI in data solutions?

DataOps ensures reproducibility, quality, and monitoring, helping teams reduce errors and maintain trust in AI-assisted workflows.

3. What are the risks of using AI-generated code in data projects?

Risks include lack of audit trails, missing error handling, security concerns, and untested code changes that may break pipelines.

4. How can teams safely use AI tools like GitHub Copilot?

Teams should implement version control (Git), enforce testing, review AI-generated code, and use enterprise-grade tools with proper data governance.

5. What role does testing play in AI-driven data development?

Testing validates AI-generated code, prevents regressions, and builds a safety net to ensure consistent and accurate data outputs.

6. How can Microsoft Fabric and Power BI teams manage AI risks?

By using Git integration, workspace separation, automated testing, and monitoring tools to maintain performance and data quality.

7. What is the best way to track AI-generated code changes?

Using version control systems like Git to maintain a clear audit trail and enable rollback of AI-generated updates.

The post How DataOps principles help to reduce GenAI risk and improve data quality appeared first on Simple Talk.

Learn T-SQL With Erik: Nonclustered Index Interactions

Full Transcript

Erik Darling here with Darling Data. And in today’s video, we are going to continue with our little baby slices of material from my Learn T-SQL with Erik course. And today we are going to talk about nonclustered indexes and their relationship to either the heap or clustered table that they were created on. And of course, again, this is not specifically like, here’s how to write a query. Learn T-SQL also covers how SQL Server stores data, and learning about how those data structures relate both to each other and to the base tables that they are created on is, in this case, very, very valuable for people who want to be serious practitioners of T-SQL, because without this knowledge, your queries are going to suck. They’re going to be terribly slow, and people are going to be angry at you. You can be as advanced as you want in the writing of T-SQL, but unless those queries ever finish, no one’s ever truly going to appreciate them. So, that’s why we must talk about these things. So, down in the video description, you will see all sorts of very helpful links for you to give me money. You can hire me for consulting. I still need to tell people what to do with their SQL servers, which is fine with me. I don’t mind doing that. In fact, I’m rather accustomed to it. You can purchase my training materials. There is a link, actually, with a discount code attached for this very course.
Right? Bloop, bloop, bloop, bloop, bloop, bloop, bloop. You can become a supporting member of the channel to keep the wind in my sails, the gas in my engine, whatever. Does the gas go into the engine? I don’t know. I’ve never had a car. But to keep creating this high-quality SQL Server content for you or one of your friends, if you share this video with one of them after you like and subscribe, if you haven’t done either of those already. I don’t want to be repetitive here, but unless I say these things, people just sort of passively watch and then I never see them again.
If you would like to monitor your SQL servers for free, I have a very, very free $0, zero-cost monitoring tool available up on GitHub. It’s this link, but it’s down in the video description. You got to go click on something, right? But it’s totally free, totally open source, no email, no phone home, no weird telemetry data. It’s just all the stuff that I would look at if I were a SQL Server monitoring tool, which now I am. So now you know what I’d look at. Congratulations.
Wait stats, blocking, deadlocks, queries, CPU, memory, disk I/O, all that fun stuff. You know, normal SQL Server monitoring stuff, just way better. And if you are the type of robot-loving person that seems to make up the majority of the IT world today, you can optionally opt in to using MCP servers to allow your robots to talk to your monitoring data, and only your monitoring data.
And they can make much better sense out of what I have collected than if you were to just go allow them to run random DMV queries on their own. Because, I mean, look, I'm not trying to get on the case of the LLMs and all that, but they still mess a lot of stuff up, and they still make a lot of stuff up, at least in my dealings with them.
No, I don’t know that I would just let them go do that. But the MCP tools that I built in here are all very well defined. The schema is all sort of predefined.
The results, like, the columns that they represent, are all predefined. And so it's much easier for them to manage that. There's nothing to hallucinate there. So they could still make things up, but they can't hallucinate things existing or not existing, which is supposedly a step in the right direction.
Anyway, if you would still, if you still value human interactions, I will be leaving the house. And when I leave the house, these are the places I’ll be going, right?
I’ll be in, well, let’s do this in order, kind of. I’m going to be at Pass on Tour. I’m going to lose my entire arm up here. I’m going to be at Pass on Tour in Chicago, May 7th and 8th, right?
And then from there I fly to Poland from May 11th to the 13th. And then from there I fly home. And after I'm home for about a month, I fly to Data Saturday Croatia, June 12th and 13th.
And then I’m going to be home for a little, as far as I know, I’m going to be home for a little bit. And then I go to Seattle, Washington, November 9th through 11th. So get your dance and rain boots on for that.
But for now, it is April. And April is baseball and floating hot dogs. Because this gentleman's hot dog is clearly either about to drop to the ground or, I don't know, maybe it's floating to his mouth. I don't know what happened there. But everyone's got beer and they're happy because they won.
At least they think they did. This guy looks angry. You know, like angry eyebrows. This guy’s kind of like sad looking. This guy’s angry looking too. Champion’s guy looks happy.
But he's, again, wearing red. So I don't know how he got into this picture. I feel like, oh, this isn't brown. I don't know. Someone's about to get shot, maybe. Anyway, let's talk about nonclustered indexes. Now, we cannot talk about nonclustered indexes without some sort of well-actually type person piping up and saying:
But what about, what about, what about, I don't know, Chicken Little? What if the sky does fall on your head? I don't know what to tell you.
What, won’t they slow down inserts, updates, and deletes? I’m like, yeah, merges too. They do have an impact on these things. But we’ve got this query over here that’s running for like a minute.
And if we add an index, it’ll run really fast. So maybe reducing that query’s impact on the server sort of amortizes the cost of maintaining this index for the inserts, updates, and deletes. Maybe that’ll happen.
Maybe there are trade-offs that we have to make in life. Maybe there are things that we as professionals in the database world have to test, to make sure that they work as expected.
I know, it’s crazy. So yes, every index is a separate object on disk and memory. It is a copy of the data from either the heap or clustered table.
Every index will add some overhead to modifications because you’re going to have to write those changes to the transaction log. You’re even going to have to lock them to complete writes. All right?
My goodness. Even indexes that are unused by reads (and we hope to get rid of those; we do seek to remove indexes that are not helping our queries go faster, so don't keep those around) need to be maintained, because SQL Server doesn't know when a query might show up and need to use them.
All right? So we need to read those indexes into the buffer pool. Sometimes a little teeny bit of the index if we do a seek. Sometimes a lot of an index if we do a scan.
Right? So the main problems that poor indexing habits can introduce to your systems typically consist of these three points.
Buffer pool contention because you have more objects vying for space in your buffer pool. You have more transaction log activity because there are more objects you need to write information about changes to the transaction log. And you may even see increased locking and maybe even increased lock escalation attempts and sometimes even successes.
Assuming that there are no competing locks that would prevent a lock escalation from becoming a success after it gets attempted. So there are things that we do have to be aware of when we are indexing things. So all of these things being true simultaneously, we must come to an agreement, you and I.
We must come to an arrangement that suits everybody. We must compromise and we must say that we need indexes but we should enjoy them responsibly. The same way that we enjoy other leisure activities responsibly so that we don’t end up throwing up on ourselves.
All right? But we also need to query responsibly. Don’t we?
The wider your indexes, or rather the wider your queries are, the wider your indexes may have to be in order for SQL Server to actually use them. All right? Because if we are writing select star-ish queries, or if we have queries that are, you know, attempting to sort of kitchen sink, evaluate 14 different columns in a table, well, if we don’t have indexes that SQL Server considers covering for those very wide queries, if our queries have a very wide berth, we’ve got wide hips on that query, well, guess what?
SQL Server might say, ah, no, I’m just going to scan that clustered index instead. And let’s talk about why that happens. That’s a great question.
Why does that happen? I've created this index on the reputation column in the users table. I built it with MAXDOP 1 so that I get a more consistent statistics histogram. Sometimes that works.
Sometimes it doesn’t. But anyway, if we, the important thing that I want to show you in this query, or rather across these three queries, is the cost of the key lookup in the plan.
Because remember, our index up here is only on the reputation column. And so when we go to get other columns from the table, and I’m telling SQL Server here for a bit of demo stability that it must use this index, or else it will be in big trouble, then SQL Server has to get this column from somewhere, and so a key lookup gets involved.
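To make the setup concrete, here's a minimal sketch of what a demo like this looks like, assuming the Stack Overflow-style Users table Erik demos against; the index name, the reputation value, and the exact hint are my own placeholders, not his actual script:

-- A narrow index on Reputation only, built with MAXDOP 1
-- for a more consistent statistics histogram.
CREATE INDEX ix_users_reputation
    ON dbo.Users (Reputation)
    WITH (MAXDOP = 1);

-- Forcing the index: AccountId is not in the index, so every
-- matching row costs a key lookup back to the clustered index.
SELECT u.AccountId
FROM dbo.Users AS u WITH (INDEX = ix_users_reputation)
WHERE u.Reputation = 2; -- any moderately selective value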
If we run this query, and we look at the cost of the key lookup, SQL Server says, I think that’s going to cost 47.3194 query bucks to fetch that one column, that one integer column, from the users table, a total of 15,656 times, right?
Because we have to do that many loops to go get that set of columns, or that column, from the clustered index down here. Okay, remember that number, gosh darn it: 47.3194, alright? What if we say, I need to go get an nvarchar(max) column? Because this AccountId column, again, this is an integer, right?
And this AboutMe column, well, this is not an integer at all. If that other helpful thing will show up, we can see this: AboutMe is indeed an nvarchar(max), alright? Max nvarchar there. How much does this cost?
Well, shockingly, 47.3194 query bucks. Oh dear. So, SQL Server assigns the same cost to going to get a 4 byte integer as it does to going to get a book with a key lookup.
Isn't that interesting? That completely leaves out that we might have to do LOB reads, or read off-row pages, or off-row data, or any of that other stuff. It just has the same cost, right?
Wild. And even still, if we tell SQL Server, hey, go get everything from that table. I guarantee you, my friends, this is not a metadata cost that we are concerned about here. Anyone who talks about metadata in these situations should be hit with a hammer.
We have the same plan, right? But we have, also, we have the same cost assigned here. 47.3194 query bucks.
So, getting everything from the table has the same cost, with a lookup, has the same cost as getting just one integer column from the table, right? So, you don’t even have to do a select star query.
Remember all the folks on LinkedIn who give you the groundbreaking advice to avoid select star. My goodness, you geniuses, you. They are not quite all there, are they?
Because you don’t have to do select star. All you have to do is get one extra column, even just one integer column that is not represented in your nonclustered index, for SQL Server to need to evaluate the cost of lookup versus clustered index scan plans, and to maybe not use your index just based on that one column difference.
So, while I do agree that select star makes things a little bit harder, it also doesn’t take select star to get index choice to be a sort of painful point during query optimization.
You can ignore that number. That was a previous run. I guess, I don’t know. I installed a cumulative update, math changed on me, or something. Hard to tell.
But, that's all. All I know is that the last time I ran through this, that's how much it cost. Funny. All right. So, let's see.
When we let SQL Server choose which index to use, right? And we’re going to look for reputation equals 16 now. SQL Server says 38,000. Nope.
Scanning the clustered index. That takes 234 milliseconds to do. Okay. That's fine. Keep in mind that the clustered index scan is a fixed cost, right? So, if we needed to go get, say, how many was that, 38,000? I think reputation 21 is like 58,000 or something. Yeah, 51,000. This will have the same fixed cost, right?
Because we still have to scan the clustered index and do the same amount of work either way. So, that’s 60, oh, let’s see. 68.3519 query bucks.
And that is, execution plan. 68.3519 query bucks. So, because scanning the clustered index represents the same amount of work, no matter how many rows we estimate might come out of that clustered index scan, it is a fixed cost based on pages and whatnot, right?
So, the bigger your clustered index is, the more expensive it gets, at least from a CPU and I/O perspective. Lookup costs are dynamic, and they are based on the number of times the lookup has to be performed. I find this to be a very sensible arrangement.
If we were to go look at the execution plan for this, we would see that the cost of this lookup is no longer 47.3194 query bucks, but for one single row it is, oh, you know what, oh, that's the right place: 0.0032831 query bucks.
A very small number. So, lookup costs are dynamic, based on the number of lookups SQL Server has to get done. There is some math done around how the first one is expensive, but the following ones are less expensive, because SQL Server figures the first one is going to come from an empty buffer pool, it assumes, and then maybe the data for the next one is going to be on the same page, so it can read that from memory.
So the first one's expensive, but the cost is reduced for each one after it. Good things to know when one is looking at query plans and whatnot.
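A rough sketch of that tipping point, with made-up reputation values; the point is only that the optimizer weighs a fixed scan cost against a per-row lookup cost:

-- Many estimated rows: the lookup would repeat tens of thousands
-- of times, so the optimizer eats the fixed cost of a clustered
-- index scan instead.
SELECT u.AccountId
FROM dbo.Users AS u
WHERE u.Reputation = 16;

-- Few estimated rows: a seek plus a handful of cheap, dynamically
-- costed lookups wins.
SELECT u.AccountId
FROM dbo.Users AS u
WHERE u.Reputation = 500000; -- illustrative, assumes few matches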
There is also a fun quirk with unique versus non-unique nonclustered indexes. In the users table, we have this account ID column, which is unique. It is entirely unique. It is a somewhat monotonically increasing number.
However, it is not exactly the same as the identity column that is the clustered primary key on the table, which is the ID column, but we can create a unique and a non-unique nonclustered index on the account ID column, right? I think I got that one. The other one’s showing me a red squiggle.
Yeah, okay, I got that one. So, when you create a non-unique nonclustered index, the clustered primary key acts as a hidden key column in the index definition. So, if we run this query and we tell SQL Server to use our non-unique index, we will get a double seek into the nonclustered index.
We can see both where ID and account ID are evaluated with the literal values that we passed in. So, we get a double seek there. In a unique nonclustered index, that clustered primary key column is more like an include in the index, where we are no longer able to double seek.
Notice that we do have a seek predicate here on account ID, but now we have a residual predicate up here on ID. So, that's the truth about that.
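Here's roughly what that demo looks like, if you want to reproduce it; the index names are mine, and the literal values are placeholders:

-- Same key column, different uniqueness.
CREATE UNIQUE INDEX ix_users_accountid_unique
    ON dbo.Users (AccountId);

CREATE INDEX ix_users_accountid_nonunique
    ON dbo.Users (AccountId);

-- Non-unique: the clustered primary key (Id) is a hidden key
-- column, so both predicates can be answered by a double seek.
-- Swap in the unique index and Id behaves like an include,
-- turning the Id predicate into a residual instead.
SELECT u.Id, u.AccountId
FROM dbo.Users AS u WITH (INDEX = ix_users_accountid_nonunique)
WHERE u.AccountId = 1
  AND u.Id = 1;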
But when I talked about how nonclustered indexes have a relationship both to the table that they get created on and to other nonclustered indexes, well, let's take a look at some of that. There's a type of execution plan SQL Server can use called index intersection. And hopefully, this all works out.
Where SQL Server has taken… Let’s do our magic zoom trick here. Oh, that’s too big. There we go.
SQL Server has done a seek into two non-clustered… So, what’s annoying is that it’s very… I don’t know why those ellipses kick in so soon, but if we were to look here, we would see an index seek on this nonclustered index, and then an index seek into this nonclustered index. And then SQL Server is able to merge join those two indexes together and produce the result that we care about.
And if we look at the merge join, it is a residual, well, this is a merge join, on the ID column in one index being matched to the ID column in the other index. So, when I talked about how nonclustered indexes inherit the clustered index key column in some form, depending on uniqueness, SQL Server does a merge join here because that ID column is the second key column. And so, that data is in order after we do equality predicates to the actual key column that we defined the index on.
Interesting stuff, right? If we look at index intersection, we can even get index intersection with a key lookup, right? And we’re going to hint SQL Server to use those indexes, right?
Because that’s the easiest way to get this sort of plan. But now, we have what we saw before, which is the merge join and the two nonclustered index thingies here. But now, we also have a lookup back to the clustered index to get that additional column, right?
Because now, we’re trying to get display name, but display name is not in either of our indexes. So, SQL Server can do index intersection plus a lookup. But the index hint order is significant, right?
So, up here, we have the non-unique account ID and reputation. And down here, we have reputation first and then the non-unique one, right? So, if you’re going to be the type of person who hints indexes, you may want to consider the order that you’re hinting them in because SQL Server will hit them in the order that you hint them, right? Because we hit reputation first in this one and we hit the non-unique one second in this one, right?
But up here in this query, right? If you look at this plan again, right? Execution plan, this hits non-unique first and then this one hits reputation second.
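For reference, hinting multiple indexes looks something like this sketch (again with my own index names and placeholder values); note the hint order, since the plan touches the indexes in the order they're listed:

-- Two seeks, merge joined on the shared clustered key column (Id).
SELECT u.Id
FROM dbo.Users AS u
    WITH (INDEX (ix_users_accountid_nonunique, ix_users_reputation))
WHERE u.AccountId = 1
  AND u.Reputation = 2;

-- Ask for a column that lives in neither index and you get
-- intersection plus a key lookup for DisplayName.
SELECT u.DisplayName
FROM dbo.Users AS u
    WITH (INDEX (ix_users_reputation, ix_users_accountid_nonunique))
WHERE u.AccountId = 1
  AND u.Reputation = 2;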
So, very interesting things when you start hinting in indexes, especially multiple indexes. We also have the concept of an index union plan. This is where SQL Server brings the results of two indexes together, like if you were doing a union or a union all type query, right?
So, this is merge join concatenation. This is not just a merge join. This is concatenating results from both of these.
And we’ll see once again that, well, we don’t really have much of a, we don’t have a join condition because this is not a merge join. It is a concatenation operator. But it still says merge join, which is a little confusing, but that’s okay.
I’m sure Microsoft is on the case, right? But we hit both of our nonclustered indexes and then we did that. And then we can also see that index union with a key lookup is also a perfectly valid plan choice, right? So, that’s all that happening in there.
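An index union sketch along the same lines, with placeholder values; OR predicates across two separately indexed columns is the classic shape:

-- Each branch seeks its own nonclustered index; a concatenation
-- operator (labeled merge join) glues the streams together, and
-- you may see a stream aggregate removing duplicates after it.
SELECT u.Id
FROM dbo.Users AS u
WHERE u.AccountId = 1
   OR u.Reputation = 2;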
So, index intersection happens most commonly with AND predicates. You can see, well, you can see there are a variety of different join types that might be chosen, sometimes even a hash join, depending on your index setup. Both indexes must be fairly selective.
The result set must be fairly small. The optimizer doesn’t choose these quite as often because usually it will do either a covering index or, you know, scan a clustered index or just do a regular old key lookup. Those don’t always require hints, but, you know, sometimes if you want it to happen, it’s not going to happen naturally.
You have to tell it to happen. Index union is more common with OR predicates. You see a concatenation operator that combines streams of data from two nonclustered indexes.
You may even see a stream aggregate to remove duplicates, which is a nice thing. Each branch has to be seekable. I guess it’s somewhat more commonly chosen than index intersection, but, you know, whatever.
They don’t get used a lot. Like, they don’t get, those plans don’t get chosen very often. The cost model assumes some overhead to, you know, doing that type of stuff that is unfriendly to them.
It prefers to hit a single index path when possible. Sometimes statistically, or rather, the statistics that you have, the histograms that you have, may not, like, present a very big benefit to the optimizer for using them. And then there’s always some risk of eliminating duplicates, adding a bunch of overhead, which, you know, for a, that’s why they have to be selective.
But they can be very beneficial if you have selective AND or OR conditions, and each predicate has a good candidate index to seek into. So, if you are sort of hoping for that type of execution plan, then you will quite often have to do some work to get one, either with a FORCESEEK hint or by hinting the indexes to use in the appropriate order. Anyway, that went on a lot longer than I thought it was gonna.
I’m gonna go finish my coffee now. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. Today is Friday. Nah, I shouldn’t have saved a long video for Friday.
No one watches the long videos on Friday. But I will see you on Monday, where we will do yet another Office Hours. And hope that someone out there cares.
Alright. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

The post Learn T-SQL With Erik: Nonclustered Index Interactions appeared first on Darling Data.


GPT-5.5 Outperforms (and Hallucinates), Kimi K2.6 Leads Open LLMs, AI Strains Climate Pledges, Strategic Thinking in LLMs vs. Humans

The Batch AI News and Insights: The ways we prompt AI are very different in 2026 than 2022 when ChatGPT came out.

Fighting Tool Sprawl: The Case for AI Tool Registries


As enterprise AI agent adoption scales, the absence of centralized, organization-level tool infrastructure is producing compounding costs. When adoption is built around optimizing for deployment speed, enterprises expose themselves to a combination of risks: duplicated engineering effort, security exposure, and operational opacity.

Every enterprise needs its own shared tool registry, one that reflects its specific regulatory environment, security posture, and operational conventions. To be clear, this is not an argument for a public package manager, something like npm, PyPI, or Maven. The infrastructure each enterprise needs is internal; scoped to its own teams, its own data, its own policies, its own domain. Trying to expand the scope beyond the confines of individual organizations would be premature standardization in a fast-moving, nascent space.

A shared enterprise tool registry is not an optimization or a nice-to-have. It is foundational infrastructure as agent deployments scale beyond early experiments. The case for it rests on two pillars: reducing coordination cost and enabling risk management, both for the humans building with agents and for the agents themselves.

AI agents depend on tools that retrieve data, write records, trigger workflows, and call external APIs. According to McKinsey, in most large organizations these tools are built by individual teams in an ad hoc fashion: undocumented, ungoverned, and invisible to the rest of the organization. This pattern is familiar to most engineering leaders, and the fragmentation it creates compounds with every new agent deployment. Teams rebuild what already exists elsewhere, security reviews miss tools that were never registered, and when something breaks, no one has a complete picture of what is running or why.

A coordination failure at infrastructure scale

The software industry solved an analogous problem decades ago with package managers. Centralized registries gave teams a way to discover, depend on, and govern shared code. The learning was clear: preventing duplication and inconsistency is an infrastructure problem, not a discipline problem.

The agent era presents the same problem in a new domain. When Kong launched its enterprise MCP Registry in February 2026, they explicitly called out the problems of manual MCP configuration, hardcoded and isolated tooling across teams, fragmented integrations, and limited organizational visibility.

Fragmented tool development is not a consequence of poor engineering practice. Rather, it is the predictable outcome of asking teams to solve an infrastructure problem at the application layer.

The visibility problem

Gravitee's "The State of AI Agent Security 2026" survey quantifies what happens when agent tooling is invisible to the people responsible for securing it. The survey found that only 14.4% of teams with agents beyond the planning phase have full security approval, and 88% of organizations had an agent-related security incident this year. Bad practices like shared API keys are endemic, with only 22% of organizations treating agents as independent identities. This governance gap transforms agents from productivity boosters into high-velocity liabilities capable of executing unauthorized actions or leaking sensitive data before a human can even intervene.

The story is clear: adoption is outpacing governance, and in a race for speed old lessons are having to be retaught. The majority of deployed agents (and the MCP servers powering them!) are operating without any security sign-off. This is not primarily a resourcing failure, and it is not something a registry alone solves. Security teams cannot review what they cannot discover, and without a registry, discovery is manual, incomplete, and stale. A registry does not make tools inherently secure; rather, it makes security work possible by ensuring tools exist not as transitory, ad hoc shims, but rather as inventoried artifacts that audits and policy can attach to.

It is worth revisiting public package managers here. These registries have not been able to eliminate a number of security problems, such as typosquatting, malicious packages, and dependency confusion, which shows clearly that centralization alone is not a security solution. But they also show the converse: a registry is a precondition for security. Numerous community responses to breaches in these ecosystems demonstrate the power centralization provides. Centralization does not guarantee security, but decentralization forfeits the means to coordinate it.

Governance requires shared context

The default posture in most agent deployments is permissive: tools are available unless explicitly blocked. AgilityFeat’s analysis of enterprise AI guardrails identifies the structural risk this creates, since an architecture not built on deny-by-default increases risk and creates upkeep costs.

Allow-by-default, replicated across dozens of independent agent deployments, produces an attack surface that scales with adoption. Inverting this requires a coordination point, a shared, organization-wide context. The registry itself isn’t a governance layer, but it is what makes governance possible. When every tool an agent can use is registered with ownership, version, and review status, the governance layer has something concrete to enforce against. Without that context, policy has to be reimplemented by every consuming team, and consistency becomes impossible.

Frontegg’s framework for AI agent governance describes what that policy layer looks like operationally: agent actions mapped to explicit, granular guardrails that define the operational boundaries for what any agent can attempt or execute. These guardrails live outside the registry, but they depend on it. A guardrail that references a tool the security team has never heard of cannot be written in the first place.

What a production-grade tool catalog requires

A mature enterprise tool registry has two core functions, discovery and versioning, and serves as the foundation for two others: certification metadata and access control. Think of it as an Internal Developer Portal (IDP) built for the agent era, solving the same coordination problem that IDPs solved for service teams but one layer up.

Discovery allows any team building an agent to search for existing tools before writing new ones. With ownership metadata, version history, and usage metrics centralized, duplication is reduced not through mandate but through reduced friction. A well-designed catalog goes further than a flat list: tools should be grouped hierarchically by functional domain so that both humans and agents can find relevant capabilities quickly.

Versioning closes a gap that neither discovery nor access controls address: When agent behavior changes, why did it change? A tool registry that tracks versions gives enterprises the visibility to answer that question. Was it the model? A tool prompt update? An underlying API change? Without proper versioning, finding the answer goes from a simple diff comparison to a time-consuming, manual investigation.

Certification status (things like security approval, API contract validation, PII handling checks) is metadata that the registry surfaces, not a boundary that the registry itself enforces. The actual review work happens through the security organization’s existing tooling. The registry’s contribution is making the result of that review visible at the moment a team is deciding whether to adopt a tool, ensuring the review actually informs the decision it was meant to inform.

Access control works the same way. A policy layer enforces authorization scoped to agent identity, team, environment, and action type, reading from the registry to know what tools exist and who owns them. The registry’s centralization lets access control be applied consistently, rather than forcing each team to come up with something bespoke.
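As a purely illustrative sketch of the metadata involved (not any vendor's actual schema), the record a registry keeps per tool might look something like this in SQL; every table and column name here is an assumption:

CREATE TABLE dbo.AgentToolRegistry
(
    ToolId              int IDENTITY PRIMARY KEY,
    ToolName            nvarchar(200) NOT NULL,
    FunctionalDomain    nvarchar(100) NOT NULL, -- hierarchical grouping for discovery
    OwningTeam          nvarchar(100) NOT NULL,
    ToolVersion         nvarchar(50)  NOT NULL, -- answers "why did agent behavior change?"
    CertificationStatus nvarchar(50)  NOT NULL, -- e.g., pending / approved / rejected
    PiiReviewed         bit           NOT NULL DEFAULT 0,
    RegisteredAtUtc     datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
    CONSTRAINT uq_toolname_version UNIQUE (ToolName, ToolVersion)
);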

None of this is achievable when each team maintains its own isolated tooling stack. Platform teams already understand why IDPs exist. The value of the paradigm in the agent context is no different.

The compounding cost of inaction

The cost of inaction is direct, not merely operational and security-related. Without a searchable, well-organized catalog of tools, teams continually reinvent the wheel, since it is easier to generate a tool than to find one that already exists. Duplication means redundancy and technical debt. A registry, by making tools discoverable and reusable, converts that redundant spend into capacity for actual work.

For platform engineering teams, the trajectory is clear. Agent adoption is increasing, tool duplication is increasing with it, and the shims that worked at small scale will not hold as the number of agents and tools grows. The security exposure documented in the Gravitee survey will widen, not narrow, without structural intervention.

The organizations that build centralized tool infrastructure now will be able to onboard new agents quickly, govern them consistently, and audit them when something goes wrong. Those that defer will rediscover, the hard way, what platform teams learned a decade ago: coordination problems do not resolve themselves at the application layer. They compound there.


