While pgvector enables powerful semantic search, it doesn’t keep embeddings in sync when your data changes; you have to regenerate them yourself. The pgai Vectorizer closes that gap by generating and updating PostgreSQL vector embeddings automatically whenever the source data changes.
It runs in the background using a worker that processes changes via queues, triggers, and embedding APIs. This makes it easy to build real-time semantic search in PostgreSQL using pgvector and TigerData’s pgai tools. Learn everything you need to know in this guide.
It’s no secret that PostgreSQL now stores vector embeddings of unstructured data using pgvector, enabling both relational and semantic search. When it comes to keeping your source data and corresponding AI-generated embeddings in sync during changes, however, pgvector falls short.
Simply put, it requires you to manually regenerate the vector embeddings to mirror any changes made in your PostgreSQL database. It doesn’t happen automatically.
Thankfully, the pgai Vectorizer tool, created by Timescale (now TigerData), is here to save the day. With a SQL command, it creates AI-generated vector embeddings and regenerates them when your source data changes.
Timescale also provides Docker images to quickly set up a PostgreSQL environment that is ready for pgai Vectorizer. In this article, I’ll use these images to demonstrate exactly how the tool works.
Before you continue reading…
Are you new to using pgvector in PostgreSQL? If so, please first read this article. It explains how pgvector works and how semantic search is handled in it.
What is the pgai Vectorizer tool for PostgreSQL?
The pgai Vectorizer tool uses pgvector under the hood to store and manage vector embeddings in PostgreSQL. It leverages pgai’s SQL functions to define how embeddings are generated: the embedding provider, the source table, the column that holds the raw text to embed, the formatting, and so on. The worker that generates the embeddings runs outside your database and is always on standby.
When you create a Vectorizer, it processes the embedding asynchronously, as follows:
- A queue is set up in the database to track the rows that need embedding.
- Triggers ensure new or updated rows are added to this queue.
- A background worker runs and polls the queue for pending jobs.
- The worker processes jobs in batches, calls the embedding API (e.g., OpenAI, Ollama), and writes embeddings back to the database.
- It then processes any failed jobs on the next polling cycle.
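The steps above can be pictured with a toy, in-memory model. This is only a sketch of the queue, trigger, and worker pattern, not pgai's implementation: `ToySourceTable`, `fake_embed`, and `run_worker_once` are hypothetical names, and `fake_embed` stands in for a real embedding API call.

```python
class ToySourceTable:
    """In-memory stand-in for a source table with an insert/update trigger."""

    def __init__(self):
        self.rows = {}    # primary key -> text to embed
        self.queue = []   # primary keys waiting to be embedded

    def upsert(self, pk, text):
        self.rows[pk] = text
        self.queue.append(pk)  # the "trigger": enqueue the changed row


def fake_embed(texts):
    """Placeholder for an embedding API call: one tiny vector per text."""
    return [[float(len(text))] for text in texts]


def run_worker_once(source, embeddings, embed=fake_embed, batch_size=2):
    """One polling cycle: embed pending rows in batches, re-queue failures."""
    pending = list(dict.fromkeys(source.queue))  # de-duplicate, keep order
    source.queue.clear()
    processed = 0
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        try:
            vectors = embed([source.rows[pk] for pk in batch])
        except Exception:
            source.queue.extend(batch)  # left for the next polling cycle
            continue
        for pk, vector in zip(batch, vectors):
            embeddings[pk] = vector
        processed += len(batch)
    return processed


if __name__ == "__main__":
    table, store = ToySourceTable(), {}
    table.upsert(1, "citrus")
    table.upsert(2, "tropical")
    table.upsert(1, "berries")  # same row updated again before the worker runs
    print(run_worker_once(table, store))  # 2 distinct rows embedded
```

Because a failed batch simply goes back on the queue, an outage at the embedding provider delays embeddings without blocking writes to the table.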
Because the worker runs outside PostgreSQL, your database is insulated from external API failures and latency problems. You can also scale the worker horizontally to handle larger embedding workloads.
Note that the tool is third-party, not an official PostgreSQL extension, and depends on pgvector for storage, indexing, and similarity search.
How to use pgai Vectorizer in a self-hosted PostgreSQL database
To use pgai Vectorizer in a self-hosted PostgreSQL database, you must:
- Install pgai and its Vectorizer component (vectorizer-worker) as Python libraries.
- Deploy the pgai PostgreSQL extension and run the Vectorizer via the pgai CLI.
- Build and install pgvectorscale from source if you want to use StreamingDiskANN indexing.
- Spin up and manage Vectorizer worker processes to manage embedding throughput.
See the official GitHub docs on how to use pgai Vectorizer on self-hosted and managed PostgreSQL databases.
Additionally, see the official GitHub docs containing the API reference for pgai Vectorizer.
Requirements to use pgai Vectorizer (what you need)
To use pgai Vectorizer, install Docker Engine and the Docker Compose plugin. If you’re on Windows or macOS, you also need Docker Desktop (which comes with Compose by default).
You also need an embedding provider API key. The choice of embedding provider is up to you, but I use OpenAI in this article.
How to create a database and pgai Vectorizer worker
Open up a docker-compose.yml file with your default code editor and paste in the code snippets below:
name: pgai
services:
  db:
    image: timescale/timescaledb-ha:pg17
    environment:
      POSTGRES_PASSWORD: postgres
      OPENAI_API_KEY: <your-openai-api-key>
    ports:
      - "5432:5432"
    volumes:
      - data:/home/postgres/pgdata/data
  vectorizer-worker:
    image: timescale/pgai-vectorizer-worker:latest
    environment:
      PGAI_VECTORIZER_WORKER_DB_URL: postgres://postgres:postgres@db:5432/postgres
      OPENAI_API_KEY: <your-openai-api-key>
    command: ["--poll-interval", "10s", "--log-level", "INFO"]
volumes:
  data:
This will pull and start a TimescaleDB PostgreSQL database instance and a single pgai Vectorizer worker. The database will be available on localhost:5432, and the vectorizer will automatically poll for embedding jobs every 10 seconds.
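To see what the worker's connection URL encodes, here is a small sketch that splits it into its parts with Python's standard library. The helper name `describe_db_url` is hypothetical. Note that the host is `db`, the Compose service name, which is how the worker container reaches the database container on the Compose network, and the password matches `POSTGRES_PASSWORD`.

```python
from urllib.parse import urlsplit


def describe_db_url(url):
    """Break a postgres:// URL into its connection components."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,               # Compose service name, not localhost
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(describe_db_url("postgres://postgres:postgres@db:5432/postgres"))
```

From your own machine, the same database is reached through the published port on localhost instead of the `db` hostname.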
Start both containers:
docker compose up -d
Verify that they are running:
docker compose ps
You should see output similar to this:
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
pgai-db-1 timescale/timescaledb-ha:pg17 "/docker-entrypoint.…" db 34 seconds ago Up 33 seconds 8008/tcp, 0.0.0.0:5432->5432/tcp, [::]:5432->5432/tcp, 8081/tcp
pgai-vectorizer-worker-1 timescale/pgai-vectorizer-worker:latest "python -m pgai vect…" vectorizer-worker 14 hours ago Up 27 minutes
How to set up and run pgai Vectorizer
To set up the pgai Vectorizer tool in your PostgreSQL database, run the following:
docker compose run --rm --entrypoint "python -m pgai install -d postgres://postgres:postgres@db:5432/postgres" vectorizer-worker
When the installation succeeds, you should see output similar to this:
2026-03-21 03:45:17 [info ] pgai 0.12.1 installed
The command installs the necessary database objects under the ai schema, which you can view with:
docker compose exec db psql -U postgres -c "\dt ai.*"
How to create a table and insert relational data in pgai Vectorizer
Connect to your database instance (db) interactively with:
docker compose exec db psql -U postgres
Create the articles table to work with:
CREATE TABLE articles (
    id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title TEXT,
    author TEXT,
    content TEXT
);
Insert articles into the articles table:
INSERT INTO articles (title, author, content)
VALUES
(
'The World of Citrus Fruits',
'John Doe',
'Citrus fruits are among the most widely cultivated fruits in the world, valued for their refreshing flavor and impressive nutritional profile. The citrus family includes oranges, lemons, limes, grapefruits, and tangerines. Oranges alone account for a significant portion of global fruit production, with Brazil, China, and the United States being the largest producers. Citrus fruits are an excellent source of vitamin C, a powerful antioxidant that supports immune function and skin health. Beyond vitamin C, they contain folate, potassium, and beneficial plant compounds like flavonoids and carotenoids linked to reduced risk of chronic diseases. Citrus cultivation dates back thousands of years, with origins traced to Southeast Asia before spreading through the Middle East, Mediterranean, and eventually the Americas. Modern citrus farming faces challenges including pests, diseases like citrus greening, and climate change, which threaten yields in major producing regions.'
),
(
'Tropical Fruits and Their Health Benefits',
'Jane Smith',
'Tropical fruits thrive in warm, humid climates near the equator, offering an extraordinary range of flavors, textures, and nutritional benefits. Mangoes, pineapples, papayas, bananas, coconuts, and guavas are among the most popular tropical fruits enjoyed globally. The mango, often called the king of fruits, is particularly rich in vitamin A, vitamin C, and folate, and contains powerful antioxidants like mangiferin with anti-inflammatory properties. Pineapple contains bromelain, a unique enzyme that aids protein digestion and reduces inflammation. Papaya is celebrated for its digestive enzyme papain, which soothes digestive discomfort. Bananas provide a quick source of energy through natural sugars while delivering potassium, magnesium, and vitamin B6. The cultivation of tropical fruits plays a vital economic role in many developing countries, providing livelihoods for millions of small-scale farmers across Africa, Asia, and Latin America.'
),
(
'Stone Fruits: Nature and Nutrition',
'Bob Johnson',
'Stone fruits, also known as drupes, are characterized by their fleshy outer layer surrounding a hard pit that contains the seed. Peaches, plums, cherries, apricots, and nectarines all belong to this group, sharing a similar botanical structure despite differences in flavor and texture. Peaches are perhaps the most iconic stone fruit, native to Northwest China and rich in vitamins A and C, potassium, and dietary fiber. Cherries have attracted significant scientific interest due to their high concentration of anthocyanins, powerful antioxidants linked to reduced muscle soreness, improved sleep quality, and lower risk of heart disease. Plums and prunes are well known for their digestive benefits, containing sorbitol and dietary fiber that promote healthy bowel function. Climate change poses a growing challenge to stone fruit farmers, as milder winters are disrupting the chilling requirements that these trees depend on to produce fruit successfully.'
);
How to create a vectorizer for your table in pgai Vectorizer
pgai’s ai schema provides several SQL functions to perform AI tasks in PostgreSQL. To create a vectorizer, use the ai.create_vectorizer function. Run the SQL query below to create a vectorizer for the articles table:
SELECT ai.create_vectorizer(
'articles'::regclass,
loading => ai.loading_column('content'),
embedding => ai.embedding_openai('text-embedding-3-small', 1536),
destination => ai.destination_table('articles_embeddings'),
-- E'' string so that \n is a real newline rather than a literal backslash and n
formatting => ai.formatting_python_template(E'Title: $title\nAuthor: $author\n$chunk')
);
From the SQL command above, the vectorizer will:
- Source data from the articles table, load content from the content column, and watch it for changes.
- Generate embeddings using OpenAI’s text-embedding-3-small model.
- Recursively split text into overlapping chunks to preserve context.
- Format input by prepending title and author metadata to each chunk.
- Store embeddings in a destination table (or view) named articles_embeddings.
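To make the chunking and formatting steps concrete, here is a minimal Python sketch. It is an illustration under assumptions, not pgai's code: `chunk_with_overlap` is a simplified fixed-size character splitter (pgai's recursive splitter is more involved), the chunk sizes are arbitrary, and `format_chunks` is a hypothetical helper. The `$title`, `$author`, and `$chunk` placeholders follow the syntax of Python's `string.Template`.

```python
from string import Template

# Same placeholders as the formatting template passed to the vectorizer.
TEMPLATE = Template("Title: $title\nAuthor: $author\n$chunk")


def chunk_with_overlap(text, chunk_size=200, overlap=40):
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def format_chunks(row, chunk_size=200, overlap=40):
    """Prepend title and author metadata to every chunk of a row's content."""
    return [
        TEMPLATE.substitute(title=row["title"], author=row["author"], chunk=chunk)
        for chunk in chunk_with_overlap(row["content"], chunk_size, overlap)
    ]


if __name__ == "__main__":
    article = {
        "title": "The World of Citrus Fruits",
        "author": "John Doe",
        "content": "Citrus fruits are among the most widely cultivated fruits. " * 8,
    }
    for formatted in format_chunks(article):
        print(formatted)
        print("---")
```

Each formatted string is what gets sent to the embedding model, so every chunk carries its article's title and author even when the chunk text itself never mentions them.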
After running this command, the vectorizer worker will automatically generate and sync embeddings. You can monitor its progress with: SELECT * FROM ai.vectorizer_status;
If the pending_items column shows a value greater than 0, the worker is still processing your embeddings. If it shows 0, the embeddings are up to date.
id | name | source_table | target_table | view | embedding_column | pending_items | disabled
----+----------------------------------+-----------------+----------------------------------+----------------------------+------------------+---------------+----------
1 | public_articles_embeddings_store | public.articles | public.articles_embeddings_store | public.articles_embeddings | embedding | 0 | f
(1 row)
Alternatively, you can stream real-time logs from the vectorizer worker when embeddings are being generated:
docker compose logs -f vectorizer-worker
You’ll see messages like running vectorizer, finished processing vectorizer and helpful messages for debugging in case there’s an error:
vectorizer-worker-1 | 2026-03-21 04:34:38 [info ] sleeping for 0:00:30 before polling for new work
vectorizer-worker-1 | 2026-03-21 04:35:08 [warning ] no vectorizers found
vectorizer-worker-1 | 2026-03-21 04:35:08 [info ] sleeping for 0:00:30 before polling for new work
vectorizer-worker-1 | 2026-03-21 04:35:38 [info ] running vectorizer vectorizer_id=1
vectorizer-worker-1 | 2026-03-21 04:35:56 [info ] finished processing vectorizer items=3 vectorizer_id=1
vectorizer-worker-1 | 2026-03-21 04:35:56 [info ] sleeping for 0:00:30 before polling for new work
vectorizer-worker-1 | 2026-03-21 04:36:26 [info ] running vectorizer vectorizer_id=1
vectorizer-worker-1 | 2026-03-21 04:36:27 [info ] finished processing vectorizer items=0 vectorizer_id=1
vectorizer-worker-1 | 2026-03-21 04:36:27 [info ] sleeping for 0:00:30 before polling for new work
The articles_embeddings view will include the original content of the content column, plus chunk and embedding for semantic search. You can query it with:
SELECT * FROM articles_embeddings LIMIT 1;
Setting the limit to 1 lets you inspect the structure and content of the articles_embeddings view without loading large amounts of data. If you’d like to view all of its contents, omit the LIMIT 1 clause.
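Semantic search over this view comes down to ranking chunks by how close their embeddings are to the embedding of a query. In PostgreSQL, pgvector does this in SQL (its `<=>` operator computes cosine distance); the Python sketch below shows the same ranking on made-up three-dimensional vectors. The vectors, the row layout, and the `nearest_chunks` helper are all illustrative assumptions, and real text-embedding-3-small vectors have 1536 dimensions.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 1 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_chunks(query_embedding, rows, k=2):
    """Return the k rows whose embeddings are closest to the query embedding."""
    ranked = sorted(rows, key=lambda r: cosine_distance(query_embedding, r["embedding"]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy stand-ins for rows of the view: a chunk plus its (made-up) embedding.
    rows = [
        {"chunk": "Citrus fruits are rich in vitamin C", "embedding": [1.0, 0.0, 0.0]},
        {"chunk": "Mangoes are rich in vitamins A and C", "embedding": [0.8, 0.6, 0.0]},
        {"chunk": "Stone fruits surround a hard pit", "embedding": [0.0, 0.0, 1.0]},
    ]
    query = [1.0, 0.1, 0.0]  # pretend embedding of "which fruits have vitamin C?"
    for row in nearest_chunks(query, rows):
        print(row["chunk"])
```

The query text must be embedded with the same model as the stored chunks; otherwise the distances are not comparable.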
How to automate embeddings in pgai Vectorizer
The vectorizer tracks insert, update, and delete operations on the source table, and the worker processes the affected embeddings in the background. This way, your view – articles_embeddings in this case – stays in sync with the latest content in the source table.
Update the articles table to trigger the vectorizer:
UPDATE articles
SET
title = 'The Brilliance of Berries',
content = 'Berries, including strawberries, blueberries, and raspberries, are vibrant fruits packed with fiber and antioxidants. Unlike citrus, they thrive in cooler temperate climates. Blueberries are famous for anthocyanins, which may help brain health and memory. They are often eaten fresh or used in desserts.'
WHERE id = 1;
Stream the logs of the vectorizer to watch it process the update:
docker compose logs -f vectorizer-worker
Insert an article into the articles table:
INSERT INTO articles (title, author, content)
VALUES (
'Why Papaya Is Called the Fruit of the Angels',
'Carlos Rivera',
'Papaya, once referred to as the fruit of the angels by Christopher Columbus, is a tropical fruit native to Central America and southern Mexico. Today it is cultivated across tropical regions worldwide, with India, Brazil, and Indonesia among the largest producers. The papaya plant is unique in that it can begin bearing fruit within the first year of planting, making it one of the fastest-yielding fruit crops in tropical agriculture. Papayas are exceptionally rich in vitamin C, vitamin A, folate, and potassium. They also contain lycopene, a powerful antioxidant associated with reduced risk of heart disease and certain cancers. The fruit is perhaps best known for containing papain, a proteolytic enzyme found in both the fruit and its latex that breaks down proteins and is widely used in meat tenderizers, digestive supplements, and pharmaceutical applications. Unripe green papaya is commonly used in savory dishes across Southeast Asia, particularly in the popular Thai green papaya salad. Ripe papaya has a soft, buttery texture and a sweet, musky flavor that makes it a staple breakfast fruit across many tropical countries. Papaya cultivation faces threats from the papaya ringspot virus, a destructive pathogen that devastated Hawaiian papaya crops in the 1990s before the introduction of genetically modified virus-resistant varieties saved the industry.'
);
The vectorizer will generate new embeddings for it in the background. And, with that, the process is complete.
What else can you do with pgai Vectorizer?
The pgai Vectorizer tool turns your PostgreSQL database into an AI powerhouse. What we’ve covered here is just one example of this. You can also perform a hybrid search and re-rank its results using re-ranking models like Cohere and Voyage AI. Or, why not translate natural language to SQL via the semantic catalog?
Whether you decide to use it for more than just automating vector embeddings or not, you’ve now seen the power of the pgai Vectorizer tool in PostgreSQL. With its vectorizer, manual embedding lifecycles and stale embeddings are a thing of the past.
In this article, you saw this firsthand, with every create, insert, and update command you made being picked up and processed in the background. I hope you found the guide helpful, and feel free to share your thoughts in the comments below!
Simple Talk is brought to you by Redgate Software
FAQs: How to automate vector embeddings with pgai Vectorizer in PostgreSQL
1. What is pgai Vectorizer in PostgreSQL?
It’s a tool from Timescale (TigerData) that automatically generates and updates AI embeddings in PostgreSQL using pgvector.
2. Does pgvector update embeddings automatically?
No. pgvector stores embeddings but requires manual updates when source data changes.
3. How does pgai Vectorizer work?
It uses SQL-defined configurations, triggers, a job queue, and a background worker to generate and refresh embeddings automatically.
4. What is pgai used for?
pgai enables AI workflows in PostgreSQL, including automatic embeddings, semantic search, and RAG pipelines.
5. Do I need Docker to use pgai Vectorizer?
No. This guide uses Docker because Timescale’s images provide a ready-made environment, but you can also install pgai and the vectorizer worker as Python libraries alongside a self-hosted PostgreSQL database.
The post How to automate vector embeddings with pgai Vectorizer in PostgreSQL appeared first on Simple Talk.
