Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

How to automate vector embeddings with pgai Vectorizer in PostgreSQL

pgvector enables powerful semantic search, but it doesn’t keep embeddings in sync when your data changes; you have to regenerate them manually. The pgai Vectorizer closes that gap by generating and updating PostgreSQL vector embeddings automatically whenever your data changes.

It runs in the background using a worker that processes changes via queues, triggers, and embedding APIs. This makes it easy to build real-time semantic search in PostgreSQL using pgvector and TigerData’s pgai tools. Learn everything you need to know in this guide.

It’s no secret that PostgreSQL now stores vector embeddings of unstructured data using pgvector, enabling both relational and semantic search. When it comes to keeping your source data and corresponding AI-generated embeddings in sync during changes, however, pgvector falls short.

Simply put, it requires you to manually regenerate the vector embeddings to mirror any changes made in your PostgreSQL database. It doesn’t happen automatically.

Thankfully, the pgai Vectorizer tool, created by Timescale (now TigerData), is here to save the day. With a SQL command, it creates AI-generated vector embeddings and regenerates them when your source data changes.

Timescale also provides Docker images to quickly set up a PostgreSQL environment that is ready for pgai Vectorizer. In this article, I’ll use these images to demonstrate exactly how the tool works.

Before you continue reading…
Are you new to using pgvector in PostgreSQL? If so, please first read this article. It explains how pgvector works and how semantic search is handled in it.

What is the pgai Vectorizer tool for PostgreSQL?

The pgai Vectorizer tool uses pgvector under the hood to store and manage vector embeddings in PostgreSQL. It leverages pgai’s SQL functions to define how embeddings are generated: the embedding provider to use, the source table, the column that holds the raw data to embed, the formatting, and so on. It runs outside your database and is always on standby.

When you create a Vectorizer, it processes embeddings asynchronously, as follows:

  • A queue is set up in the database to track the rows that need embedding.
  • Triggers ensure new or updated rows are added to this queue.
  • A background worker runs and polls the queue for pending jobs.
  • The worker processes jobs in batches, calls the embedding API (e.g., OpenAI, Ollama), and writes embeddings back to the database.
  • It then processes any failed jobs on the next polling cycle.

Because it runs outside PostgreSQL, your database is insulated from external API failures and latency problems. You can also scale the worker horizontally to handle more embedding workloads.

Note that the tool is third-party, not an official PostgreSQL extension, and depends on pgvector for storage, indexing, and similarity search.
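The queue, trigger, and worker flow described in this section can be sketched as a small in-memory model. This is an illustration only, not pgai’s implementation: `VectorizerSketch`, `FlakyEmbedder`, and the toy two-number "embedding" are hypothetical names and data invented for the example.

```python
from collections import deque


class FlakyEmbedder:
    """Hypothetical stand-in for an embedding API; the first call fails."""

    def __init__(self):
        self.calls = 0

    def embed(self, texts):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("simulated API outage")
        # Toy "embedding": text length and vowel count, not a real model.
        return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in texts]


class VectorizerSketch:
    def __init__(self, embedder, batch_size=2):
        self.rows = {}         # source table: id -> text
        self.queue = deque()   # ids of rows waiting for embedding
        self.embeddings = {}   # destination: id -> vector
        self.embedder = embedder
        self.batch_size = batch_size

    def upsert(self, row_id, text):
        """Plays the role of the trigger: every write enqueues the row id."""
        self.rows[row_id] = text
        if row_id not in self.queue:
            self.queue.append(row_id)

    def poll_once(self):
        """One polling cycle: drain the queue in batches, re-queue failures."""
        failed = []
        while self.queue:
            size = min(self.batch_size, len(self.queue))
            batch = [self.queue.popleft() for _ in range(size)]
            try:
                vectors = self.embedder.embed([self.rows[i] for i in batch])
            except ConnectionError:
                failed.extend(batch)  # picked up again on the next cycle
                continue
            self.embeddings.update(zip(batch, vectors))
        self.queue.extend(failed)
        return len(failed)
```

Writes to the source never wait on the embedding call, which is the point of the design: a failed batch simply stays queued until a later `poll_once()` succeeds.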

How to use pgai Vectorizer in a self-hosted PostgreSQL database

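To use pgai Vectorizer in a self-hosted PostgreSQL database, you need a database with pgai installed and a running vectorizer worker. The sections below walk through both.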

See the official GitHub docs on how to use pgai Vectorizer on self-hosted and managed PostgreSQL databases.

Additionally, see the official GitHub docs containing the API reference for pgai Vectorizer.

Requirements to use pgai Vectorizer (what you need)

To use pgai Vectorizer, install Docker Engine and the Docker Compose plugin. If you’re on Windows or macOS, you need Docker Desktop, which includes Compose by default.

You also need an embedding provider API key. The choice of embedding provider is up to you, but I use OpenAI in this article.

How to create a database and pgai Vectorizer worker

Create a docker-compose.yml file in your code editor and paste in the configuration below:

name: pgai
services:
  db:
    image: timescale/timescaledb-ha:pg17
    environment:
      POSTGRES_PASSWORD: postgres
      OPENAI_API_KEY: <your-openai-api-key>
    ports:
      - "5432:5432"
    volumes:
      - data:/home/postgres/pgdata/data
  vectorizer-worker:
    image: timescale/pgai-vectorizer-worker:latest
    environment:
      PGAI_VECTORIZER_WORKER_DB_URL: postgres://postgres:postgres@db:5432/postgres
      OPENAI_API_KEY: <your-openai-api-key>
    command: ["--poll-interval", "10s", "--log-level", "INFO"]
volumes:
  data:

This will pull and start a TimescaleDB PostgreSQL database instance and a single pgai Vectorizer worker. The database will be available on localhost:5432, and the vectorizer will automatically poll for embedding jobs every 10 seconds.

Start both containers:
docker compose up -d

Verify that they are running:
docker compose ps

You should have an output similar to this:

NAME                       IMAGE                                     COMMAND                  SERVICE             CREATED          STATUS          PORTS
pgai-db-1                  timescale/timescaledb-ha:pg17             "/docker-entrypoint.…"   db                  34 seconds ago   Up 33 seconds   8008/tcp, 0.0.0.0:5432->5432/tcp, [::]:5432->5432/tcp, 8081/tcp
pgai-vectorizer-worker-1   timescale/pgai-vectorizer-worker:latest   "python -m pgai vect…"   vectorizer-worker   14 hours ago     Up 27 minutes

How to set up, and run, pgai Vectorizer

To set up the pgai Vectorizer tool in your PostgreSQL database, run the following:
docker compose run --rm --entrypoint "python -m pgai install -d postgres://postgres:postgres@db:5432/postgres" vectorizer-worker

This installs the necessary database objects under the ai schema, which you can view with:
docker compose exec db psql -U postgres -c "\dt ai.*"

When the install command completes, it logs a line similar to:
2026-03-21 03:45:17 [info ] pgai 0.12.1 installed

How to create a table and insert relational data in pgai Vectorizer

Connect to your database instance (db) interactively with: docker compose exec db psql -U postgres

Create the table articles to work with:

CREATE TABLE articles (
    id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title TEXT,
    author TEXT,
    content TEXT
);

Insert articles into the articles table:

INSERT INTO articles (title, author, content)
VALUES 
(
    'The World of Citrus Fruits',
    'John Doe',
    'Citrus fruits are among the most widely cultivated fruits in the world, valued for their refreshing flavor and impressive nutritional profile. The citrus family includes oranges, lemons, limes, grapefruits, and tangerines. Oranges alone account for a significant portion of global fruit production, with Brazil, China, and the United States being the largest producers. Citrus fruits are an excellent source of vitamin C, a powerful antioxidant that supports immune function and skin health. Beyond vitamin C, they contain folate, potassium, and beneficial plant compounds like flavonoids and carotenoids linked to reduced risk of chronic diseases. Citrus cultivation dates back thousands of years, with origins traced to Southeast Asia before spreading through the Middle East, Mediterranean, and eventually the Americas. Modern citrus farming faces challenges including pests, diseases like citrus greening, and climate change, which threaten yields in major producing regions.'
),
(
    'Tropical Fruits and Their Health Benefits',
    'Jane Smith',
    'Tropical fruits thrive in warm, humid climates near the equator, offering an extraordinary range of flavors, textures, and nutritional benefits. Mangoes, pineapples, papayas, bananas, coconuts, and guavas are among the most popular tropical fruits enjoyed globally. The mango, often called the king of fruits, is particularly rich in vitamin A, vitamin C, and folate, and contains powerful antioxidants like mangiferin with anti-inflammatory properties. Pineapple contains bromelain, a unique enzyme that aids protein digestion and reduces inflammation. Papaya is celebrated for its digestive enzyme papain, which soothes digestive discomfort. Bananas provide a quick source of energy through natural sugars while delivering potassium, magnesium, and vitamin B6. The cultivation of tropical fruits plays a vital economic role in many developing countries, providing livelihoods for millions of small-scale farmers across Africa, Asia, and Latin America.'
),
(
    'Stone Fruits: Nature and Nutrition',
    'Bob Johnson',
    'Stone fruits, also known as drupes, are characterized by their fleshy outer layer surrounding a hard pit that contains the seed. Peaches, plums, cherries, apricots, and nectarines all belong to this group, sharing a similar botanical structure despite differences in flavor and texture. Peaches are perhaps the most iconic stone fruit, native to Northwest China and rich in vitamins A and C, potassium, and dietary fiber. Cherries have attracted significant scientific interest due to their high concentration of anthocyanins, powerful antioxidants linked to reduced muscle soreness, improved sleep quality, and lower risk of heart disease. Plums and prunes are well known for their digestive benefits, containing sorbitol and dietary fiber that promote healthy bowel function. Climate change poses a growing challenge to stone fruit farmers, as milder winters are disrupting the chilling requirements that these trees depend on to produce fruit successfully.'
);

How to create a vectorizer for your table in pgai Vectorizer

pgai’s ai schema provides several SQL functions to perform AI tasks in PostgreSQL. To create a vectorizer, use the ai.create_vectorizer function. Run the SQL query below to create a vectorizer for the articles table:

SELECT ai.create_vectorizer(
  'articles'::regclass,
  loading => ai.loading_column('content'),           
  embedding => ai.embedding_openai('text-embedding-3-small', 1536), 
  destination => ai.destination_table('articles_embeddings'),
  formatting => ai.formatting_python_template('Title: $title\\nAuthor: $author\\n$chunk') 
);

From the SQL command above, the vectorizer will:

  • Source data from the articles table, load contents from the content column, and watch it for changes.
  • Generate embeddings using OpenAI’s text-embedding-3-small model.
  • Split text recursively into overlapping chunks to preserve context.
  • Format input by prepending title and author metadata to each chunk.
  • Store embeddings in a destination table (articles_embeddings_store) and expose them through a view named articles_embeddings.
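To make the chunking and formatting steps in this list concrete, here is a minimal sketch. It assumes a simple fixed-size character splitter, which is deliberately simpler than a recursive text splitter, and it uses Python’s standard `string.Template`, whose `$name` placeholders match the style of the formatting template in the SQL above. The function names `chunk_text` and `format_chunk` are made up for illustration.

```python
from string import Template


def chunk_text(text, chunk_size, overlap):
    """Fixed-size character chunks; each chunk repeats the last
    `overlap` characters of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def format_chunk(template, row, chunk):
    """Fill $title, $author, and $chunk before the text is embedded."""
    return Template(template).substitute(row, chunk=chunk)


row = {"title": "The World of Citrus Fruits", "author": "John Doe"}
template = "Title: $title\nAuthor: $author\n$chunk"
formatted = [format_chunk(template, row, c)
             for c in chunk_text("Citrus fruits are widely cultivated.", 20, 5)]
```

Prepending the title and author to every chunk means each embedded piece carries its own context, even when the chunk itself starts mid-paragraph.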

After you run ai.create_vectorizer, the vectorizer worker will automatically generate and sync embeddings. You can monitor its progress with: SELECT * FROM ai.vectorizer_status;

If the pending_items column shows a value greater than 0, the worker is still processing your embeddings. If it shows 0, the embeddings are up to date.

 id |               name               |  source_table   |           target_table           |            view            | embedding_column | pending_items | disabled 
----+----------------------------------+-----------------+----------------------------------+----------------------------+------------------+---------------+----------
  1 | public_articles_embeddings_store | public.articles | public.articles_embeddings_store | public.articles_embeddings | embedding        |             0 | f
(1 row)

Alternatively, you can stream real-time logs from the vectorizer worker when embeddings are being generated:
docker compose logs -f vectorizer-worker

You’ll see messages like running vectorizer, finished processing vectorizer and helpful messages for debugging in case there’s an error:

vectorizer-worker-1  | 2026-03-21 04:34:38 [info     ] sleeping for 0:00:30 before polling for new work
vectorizer-worker-1  | 2026-03-21 04:35:08 [warning  ] no vectorizers found
vectorizer-worker-1  | 2026-03-21 04:35:08 [info     ] sleeping for 0:00:30 before polling for new work
vectorizer-worker-1  | 2026-03-21 04:35:38 [info     ] running vectorizer             vectorizer_id=1
vectorizer-worker-1  | 2026-03-21 04:35:56 [info     ] finished processing vectorizer items=3 vectorizer_id=1
vectorizer-worker-1  | 2026-03-21 04:35:56 [info     ] sleeping for 0:00:30 before polling for new work
vectorizer-worker-1  | 2026-03-21 04:36:26 [info     ] running vectorizer             vectorizer_id=1
vectorizer-worker-1  | 2026-03-21 04:36:27 [info     ] finished processing vectorizer items=0 vectorizer_id=1
vectorizer-worker-1  | 2026-03-21 04:36:27 [info     ] sleeping for 0:00:30 before polling for new work

The articles_embeddings view will include the original content of the content column, plus chunk and embedding for semantic search. You can query it with: SELECT * FROM articles_embeddings LIMIT 1;  

Setting the limit to 1 lets you inspect the structure and content of the articles_embeddings view without loading large amounts of data. If you’d like to view all of its contents, omit the LIMIT 1 clause.
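As a rough illustration of how the chunk and embedding columns support semantic search, the sketch below ranks rows by cosine distance, the measure behind pgvector’s `<=>` operator. The three-dimensional vectors and the `rank_chunks` helper are invented for the example; real embeddings from text-embedding-3-small have 1536 dimensions and the ranking would normally run inside PostgreSQL.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rank_chunks(query_vec, rows, k=2):
    """Return the k rows whose embeddings are closest to the query vector."""
    return sorted(rows, key=lambda r: cosine_distance(query_vec, r["embedding"]))[:k]


# Toy rows shaped like the view: a chunk of text plus its embedding.
rows = [
    {"chunk": "Citrus fruits are rich in vitamin C.", "embedding": [0.9, 0.1, 0.0]},
    {"chunk": "Stone fruits contain a hard pit.", "embedding": [0.0, 1.0, 0.0]},
    {"chunk": "Oranges and lemons are citrus.", "embedding": [1.0, 0.0, 0.0]},
]
top = rank_chunks([1.0, 0.0, 0.0], rows, k=2)
```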

How to automate embeddings in pgai Vectorizer

The Vectorizer worker picks up changes made by insert, update, and delete operations and processes embeddings in the background accordingly. This way, your view (articles_embeddings in this case) stays in sync with the latest content in the source table.

Update the articles table to trigger the vectorizer:

UPDATE articles 
SET 
    title = 'The Brilliance of Berries',
    content = 'Berries, including strawberries, blueberries, and raspberries, are vibrant fruits packed with fiber and antioxidants. Unlike citrus, they thrive in cooler temperate climates. Blueberries are famous for anthocyanins, which may help brain health and memory. They are often eaten fresh or used in desserts.'
WHERE id = 1;

Stream the logs of the vectorizer to view it processing the update, using: docker compose logs -f vectorizer-worker

Insert an article into the articles table:

INSERT INTO articles (title, author, content)
VALUES (
    'Why Papaya Is Called the Fruit of the Angels',
    'Carlos Rivera',
    'Papaya, once referred to as the fruit of the angels by Christopher Columbus, is a tropical fruit native to Central America and southern Mexico. Today it is cultivated across tropical regions worldwide, with India, Brazil, and Indonesia among the largest producers. The papaya plant is unique in that it can begin bearing fruit within the first year of planting, making it one of the fastest-yielding fruit crops in tropical agriculture. Papayas are exceptionally rich in vitamin C, vitamin A, folate, and potassium. They also contain lycopene, a powerful antioxidant associated with reduced risk of heart disease and certain cancers. The fruit is perhaps best known for containing papain, a proteolytic enzyme found in both the fruit and its latex that breaks down proteins and is widely used in meat tenderizers, digestive supplements, and pharmaceutical applications. Unripe green papaya is commonly used in savory dishes across Southeast Asia, particularly in the popular Thai green papaya salad. Ripe papaya has a soft, buttery texture and a sweet, musky flavor that makes it a staple breakfast fruit across many tropical countries. Papaya cultivation faces threats from the papaya ringspot virus, a destructive pathogen that devastated Hawaiian papaya crops in the 1990s before the introduction of genetically modified virus-resistant varieties saved the industry.'
);

The vectorizer will generate new embeddings for it in the background. And, with that, the process is complete.
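The effect of the update and insert above can be replayed against a toy model of the embeddings view. This is a hedged sketch, not how pgai stores data: `EmbeddingView` and `chunk` are hypothetical, and embeddings are left out so that only the sync behavior is visible.

```python
def chunk(text, size=80):
    """Naive fixed-size splitter used only for this illustration."""
    return [text[i:i + size] for i in range(0, len(text), size)]


class EmbeddingView:
    """Maps each source row id to the chunks currently derived from it."""

    def __init__(self):
        self.rows = {}

    def on_insert(self, row_id, content):
        self.rows[row_id] = chunk(content)

    def on_update(self, row_id, content):
        # All chunks from the old content are replaced, so nothing stale remains.
        self.rows[row_id] = chunk(content)

    def on_delete(self, row_id):
        self.rows.pop(row_id, None)


view = EmbeddingView()
view.on_insert(1, "Citrus fruits are among the most widely cultivated fruits. " * 3)
view.on_update(1, "Berries are vibrant fruits packed with fiber.")
view.on_insert(4, "Papaya is a tropical fruit native to Central America.")
```

After the replay, row 1 holds only berry text, which is the property the article relies on when it says the view stays in sync.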

What else can you do with pgai Vectorizer?

The pgai Vectorizer tool turns your PostgreSQL database into an AI powerhouse. What we’ve covered here is just one example of this. You can also perform hybrid search and re-rank the results using re-ranking models such as those from Cohere and Voyage AI. Or, why not translate natural language to SQL via the semantic catalog?

Whether you decide to use it for more than just automating vector embeddings or not, you’ve now seen the power of the pgai Vectorizer tool in PostgreSQL. With its vectorizer, manual embedding lifecycles and stale embeddings are a thing of the past.

In this article, you saw this firsthand, with every create, insert, and update command you made being picked up and processed in the background. I hope you found the guide helpful, and feel free to share your thoughts in the comments below!

FAQs: How to automate vector embeddings with pgai Vectorizer in PostgreSQL

1. What is pgai Vectorizer in PostgreSQL?

It’s a tool from Timescale (TigerData) that automatically generates and updates AI embeddings in PostgreSQL using pgvector.

2. Does pgvector update embeddings automatically?

No. pgvector stores embeddings but requires manual updates when source data changes.

3. How does pgai Vectorizer work?

It uses SQL-defined configurations, triggers, a job queue, and a background worker to generate and refresh embeddings automatically.

4. What is pgai used for?

pgai enables AI workflows in PostgreSQL, including automatic embeddings, semantic search, and RAG pipelines.

5. Do I need Docker to use pgai Vectorizer?

Yes, for self-hosted setups Docker is commonly used to run PostgreSQL and the vectorizer worker.

The post How to automate vector embeddings with pgai Vectorizer in PostgreSQL appeared first on Simple Talk.

Podcast: Chasing Efficient Java Development: From 1BRC to Developing Hardwood AI Natively

Gunnar Morling, technologist at Confluent and Java Champion, shares his experiences with building high-performance applications in Java, especially in the data space. He shares insights from experiments with building durable execution engines, bootstrapping, and developing Apache Hardwood, a minimal-dependency Java parser for Apache Parquet, in an AI-native way.

By Gunnar Morling
Your first look at Unreal Engine 6 comes courtesy of a much shinier looking Rocket League teaser

Epic Games have made no secret of the next release of Unreal Engine; they first shared some details about the follow-up back in 2024, namely that it's being built around the idea of "interoperable content" that can be switched between any game that uses the engine. Now, Unreal Engine 6 has been somewhat formally revealed through the lens of a "new era" of Rocket League during the game's Paris Majors semi-finals.

Computer-using agents in Microsoft Copilot Studio are now generally available

The next chapter of enterprise AI isn't about chatting with assistants—it's about agents that actually do the work. Until now, automating long-tail, UI-driven business processes meant either building and maintaining brittle RPA scripts or waiting on APIs that legacy systems were never going to expose.

That gap has kept some of the most valuable workflows—the ones buried in vendor portals, internal web apps, and proprietary line-of-business systems—out of reach for modern automation. For enterprise IT teams, the challenge hasn’t just been automating these workflows. It’s been doing so in a way that remains secure, governable, and scalable across the business.

The gap is now closing. Computer use in Microsoft Copilot Studio is now generally available, and we're expanding availability to all commercial geographies in Microsoft Power Platform.

New computer use features generally available

With this release, every Copilot Studio maker can build agents that don't just reason and respond—they take action directly inside any application a person can use. For IT teams, GA represents more than a new automation capability; it’s a shift toward a more governable and enterprise-ready model for AI-driven work. Organizations can better standardize how agents operate across applications while maintaining security, observability, and administrative control through the Power Platform admin center.

With this release, computer use delivers:

  • Global availability across all commercial Power Platform geos, so customers in every region can deploy computer use in agents under their tenant's data residency and compliance boundaries.
  • Secure authentication with built-in credentials and Azure Key Vault when signing in to websites or desktop applications.
  • Enterprise governance built in, with allow lists for websites or desktop applications and native Power Platform governance capabilities such as DLP policies, environment isolation, and audit trails.
  • Human-in-the-loop checkpoints for low-confidence steps, exceptions, and decisions that require an operator's approval.
  • Run history and observability, so makers and admins can see exactly what the agent saw, what it clicked, and why. Logs are also propagated to Purview and Dataverse for audits and admin review.
  • Model choice for your agents, with models from OpenAI and Anthropic.

Add computer use as a tool in a Copilot Studio agent

Reach every system, including the ones without APIs

Computer use gives an agent the same tools a person has: a browser, a screen, a keyboard, and the ability to read what's on the page and take the next logical step. Instead of brittle selector-based automation, the computer use tool uses vision and reasoning to navigate live UIs—adapting when layouts shift, fields move, or workflows branch.

For organizations with deep investments in proprietary platforms or third-party portals, this changes the math on automation. Workflows that previously required either a multi-quarter integration project or an army of contractors clicking through screens can now be handed to an agent.

For enterprise IT organizations, this can also reduce pressure to modernize or rebuild every legacy workflow before automation can begin. That helps teams extend the value of existing systems while still moving toward broader AI transformation goals.

Customer spotlight: Graebel automates global service order processing end to end

Graebel, a global leader in talent mobility with approximately 1,500 employees, manages thousands of cross-border employee relocations every year for multinational clients. A significant share of those relocation requests arrives the way most enterprise work arrives: as free-form emails, full of unstructured instructions, attachments, and edge cases. Each email had to be read, interpreted, and entered by hand into Graebel's proprietary Global Connect platform.

Global Connect couldn’t support an API-based integration, and earlier robotic process automation (RPA) attempts proved too rigid to keep up with the variability of human-written emails. Graebel needed automation that could use reasoning, not just click.

Working with GET AI and Microsoft, Graebel built and deployed the Graebel Service Order Agent, equipped with computer use, in Microsoft Copilot Studio. The agent now:

  • Monitors designated mailboxes and interprets unstructured service-order emails using Azure Content Understanding, extracting key data into structured form with confidence scoring.
  • Validates each request against Graebel's business rules, service logic, and compliance requirements before any action is taken.
  • Operates Global Connect directly through its UI—navigating screens, entering data, and completing transactions exactly as a trained human operator would, without APIs or platform redevelopment.
  • Escalates exceptions and low-confidence cases through human-in-the-loop workflows, preserving governance and service quality.
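The pattern in this list, extraction with confidence scores followed by validation and escalation, can be sketched generically. Nothing below reflects Graebel’s or Copilot Studio’s actual implementation: the `ExtractedField` type, the field names, and the 0.8 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical extractor


def route_request(fields, required, threshold=0.8):
    """Return ("automate", []) or ("escalate", [field names needing review])."""
    by_name = {f.name: f for f in fields}
    missing = {n for n in required if n not in by_name or not by_name[n].value}
    low_confidence = {f.name for f in fields if f.confidence < threshold}
    reasons = sorted(missing | low_confidence)
    return ("escalate", reasons) if reasons else ("automate", [])


fields = [
    ExtractedField("employee_name", "A. Example", 0.97),
    ExtractedField("destination", "Lyon", 0.62),
]
decision = route_request(fields, required=["employee_name", "destination", "start_date"])
```

Keeping the routing decision separate from the UI automation means a person reviews only the cases the extractor was unsure about.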

Architecture of Graebel’s Power Automate flow and custom Service Order agent

“By adopting Microsoft Copilot Studio and AI agents, we’ve moved beyond traditional automation to a more intelligent, scalable operating model. This initiative strengthens our ability to serve clients faster and more accurately while positioning Graebel for long-term growth.” - Matt Brownlee, Chief Revenue Officer, Graebel

The Service Order Agent is live today and processing real volume, with an architecture designed to scale across more than 30 relocation service categories. For Graebel, the results include a meaningful reduction in manual effort, faster service-order turnaround, more consistent data quality, and a repeatable blueprint for bringing intelligent automation to the rest of their operations.

Read how Graebel drives growth and automation with Power Platform and Copilot Studio.

How to get started

Ready to try computer‑using agents in Copilot Studio?

  1. Create or open an agent in Microsoft Copilot Studio.
  2. Go to Tools → Add tool → Add new computer use.
  3. Describe the task you want the agent to perform in natural language.

For deeper guidance, configuration details, and best practices, see the computer use documentation.

Before you go: We’re actively investing in advanced governance, operations, and scale for CUAs—and customer feedback directly informs the roadmap. Tell us what you think of the latest CUA updates today:

Reading Notes #699

This week's reading notes bring you the latest insights into AI, .NET, open-source development, and even a few social hacks! From exploring background tasks in Blazor to the fascinating debate on Markdown vs. HTML for AI output, this roundup has something for everyone.

Jean-Olivier P. presenting at MsDevMtl user group

Let me know if you find anything particularly interesting; I'd love to hear your thoughts!

Programming

AI

Open Source

Podcasts

Miscellaneous


Sharing my Reading Notes is a habit I started a long time ago, where I share a list of all the articles, blog posts, and books that catch my interest during the week. 

 ~frank
Automating Intakes to the Awesome Copilot Marketplace

One of the things that comes with maintaining the Awesome Copilot repo is that people want to contribute to it. And that’s great! We’ve had community contributions since the repo was first created. But recently we’ve been getting a growing number of requests from people wanting to list their external plugins in the marketplace — plugins that live in repos we don’t own, maintained by people we might not know.

That’s… a different proposition to someone submitting a PR with a resource that we can review directly.

The supply chain problem

Here’s the thing that kept me up at night: external plugins are essentially us telling Copilot “hey, go clone this repo and pull stuff out of it.” We become the front door. The marketplace passthrough. And if you’ve been paying any attention to the NPM ecosystem lately (or PyPI, or… look, pick your package manager, they’ve all had incidents), you know that being the trusted entry point to untrusted code is a really bad position to be in. Combine that with the rise of using AI in writing code, and Awesome Copilot being a directory of AI-powered plugins, and you have a recipe for… well, a lot of things that could go wrong.

I don’t need to be the reason someone’s machine gets owned because they trusted a plugin that we listed. That’s the kind of thing that keeps open source maintainers awake at 2am, staring at the ceiling, questioning their life choices.

So before we could open this up, we needed a process. Not just “yeah sure, send us a link and we’ll add it”. An actual, structured, auditable process with real security guardrails.

Designing the intake workflow

The goals were pretty straightforward:

  1. Transparency — submitters should be able to see exactly where they are in the process, and consumers should be able to see that a review happened.
  2. Automation — reduce the manual burden as much as possible, because I do not have time to manually validate every field in a submission.
  3. Security — pin submissions to immutable refs (SHAs or tags), not branches. Branches move with HEAD. SHAs cannot. And tags, well that can move, but they require a force push to change, which is at least a more deliberate action.
  4. Human oversight — automation is great, but someone with context still needs to make the final call.
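As a small illustration of the pinning goal, the sketch below checks whether a submitted ref is a full 40-character SHA-1 commit hash. It is an assumption about what such a check might look like, not the repo’s actual script, and it cannot tell a tag from a branch, since that requires querying the remote.

```python
import re

# A full SHA-1 commit id is exactly 40 hexadecimal characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_full_sha(ref):
    """True only for a complete commit hash; abbreviated hashes and
    branch or tag names are rejected because they can be ambiguous or move."""
    return FULL_SHA.fullmatch(ref.strip().lower()) is not None
```

Rejecting abbreviated hashes is a deliberate choice here: a short prefix can become ambiguous as a repository grows, while the full hash always names one commit.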

What we landed on is a GitHub Issues-based workflow, which honestly felt like the most natural fit. Issues are already where people interact with repos, they have structured forms, they support automation via Actions, and they’re public by default. So it’s approachable, transparent, and auditable.

How it actually works

The process flows like this:

1. Submission via Issue form

We created a form-based issue template that captures everything we need: plugin name, description, repo URL, the ref or SHA to review against, and so on.

The form-based approach is important here. It’s not a freeform text area where people can write whatever they want, it’s structured fields that we can programmatically parse. This means we can validate the submission before a human ever looks at it.
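To show why structured fields are easy to parse, here is a minimal sketch that reads an issue body laid out as `### Label` headings followed by the submitted value, with `_No response_` standing in for empty optional fields. The layout is assumed from how GitHub renders form submissions, and the labels in the sample are hypothetical, not the template’s real ones.

```python
import re


def parse_issue_form(body):
    """Turn '### Label' sections of an issue body into a dict of label -> value."""
    fields = {}
    # Everything before the first heading is ignored.
    for section in re.split(r"^### ", body, flags=re.MULTILINE)[1:]:
        label, _, value = section.partition("\n")
        value = value.strip()
        fields[label.strip()] = "" if value == "_No response_" else value
    return fields


sample = (
    "### Plugin name\n\nmy-plugin\n\n"
    "### Repository URL\n\nhttps://github.com/example/my-plugin\n\n"
    "### Ref or SHA\n\n_No response_\n"
)
parsed = parse_issue_form(sample)
```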

2. Automated validation

When an issue is opened using the template, a GitHub Action fires and runs a validation script. This script checks:

  • All required fields are populated
  • The referenced repository actually exists and is public
  • The ref or SHA provided actually resolves to something real on the remote repo
  • The license is something we’re comfortable with
  • The structure looks correct

If any of these checks fail, the automation comments on the issue explaining what’s wrong, and the submitter can fix their issue and trigger a re-run with /rerun-intake.
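The offline part of these checks can be sketched as a function that returns a list of problems and a helper that turns them into the comment posted on the issue. The field names, the license allow list, and the URL pattern are assumptions for illustration; the checks that need network access (repository exists, ref resolves) are intentionally left out.

```python
import re

REQUIRED = ("name", "description", "repo_url", "ref", "license")
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}  # hypothetical allow list
REPO_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+")


def validate_submission(fields):
    """Return a list of human-readable problems; empty means the checks passed."""
    errors = [f"Missing required field: {key}"
              for key in REQUIRED if not fields.get(key, "").strip()]
    url = fields.get("repo_url", "").strip()
    if url and not REPO_URL.fullmatch(url):
        errors.append(f"Repository URL is not a GitHub repo URL: {url}")
    license_id = fields.get("license", "").strip()
    if license_id and license_id not in ALLOWED_LICENSES:
        errors.append(f"License is not on the allow list: {license_id}")
    return errors


def render_comment(errors):
    """Build the comment body the automation would post on the issue."""
    if not errors:
        return "Validation passed."
    lines = ["Validation failed:"] + [f"- {e}" for e in errors]
    lines.append("Edit the issue, then comment /rerun-intake to validate again.")
    return "\n".join(lines)
```

Collecting every problem before commenting, rather than stopping at the first one, lets the submitter fix everything in a single edit.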

3. JSON generation

If validation passes, the action generates a comment containing the exact JSON blob that would need to be added to our repository’s plugin definitions. This serves two purposes: it shows the submitter exactly what will be added, and it gives us (the maintainers) a copy-paste-ready block for the actual PR. Think of it as getting a preview of the PR diff before we even create the PR.
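A sketch of the preview step might look like the following. The entry schema (`name`, `description`, `source.repo`, `source.ref`) is invented for the example and is not the repo’s real plugin definition format.

```python
import json


def build_plugin_entry(fields):
    """Serialize a validated submission as a JSON entry (hypothetical schema)."""
    entry = {
        "name": fields["name"],
        "description": fields["description"],
        "source": {"repo": fields["repo_url"], "ref": fields["ref"]},
    }
    return json.dumps(entry, indent=2)


def render_preview_comment(fields):
    """Wrap the JSON in a fenced block so it renders as code in the issue."""
    fence = "`" * 3
    return f"Proposed entry:\n\n{fence}json\n{build_plugin_entry(fields)}\n{fence}"
```

Generating the JSON from the same parsed fields that were validated keeps the preview and the eventual PR from drifting apart.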

4. Manual review

This is where the human comes in. A maintainer looks at the actual plugin and performs the same kind of review that we would for anything that comes in via a PR. Does it do what it claims? Is it useful? Is the quality reasonable? Does it follow our responsible AI policies? Does it pass the “vibe check”? (C’mon, it’s AI, there’s gotta be some vibes in there!) This is the part you can’t automate, because it requires judgement and context.

After the review, we either /approve or /reject via a comment, which triggers the next automation step.

5. Automated PR (or rejection notice)

If approved, an Action automatically creates the PR to add the plugin to the repo. If rejected, the submitter gets a comment explaining why, and the issue is closed.
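The slash commands mentioned in this workflow (/approve, /reject, /rerun-intake) could be detected with something like the sketch below. It assumes the command must be the first word of the first non-empty line, and that only commenters whose GitHub author association is OWNER, MEMBER, or COLLABORATOR may approve or reject; both rules are guesses for illustration, not the repo’s documented behavior.

```python
COMMANDS = {"/approve", "/reject", "/rerun-intake"}
MAINTAINER_ONLY = {"/approve", "/reject"}
MAINTAINER_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def parse_command(comment_body):
    """Return (command, remaining text) or None if the comment is not a command."""
    for line in comment_body.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        word = stripped.split()[0]
        if word in COMMANDS:
            return word, stripped[len(word):].strip()
        return None  # the first non-empty line is ordinary text
    return None


def is_authorized(command, author_association):
    """Submitters may re-run validation; only maintainers may decide."""
    if command in MAINTAINER_ONLY:
        return author_association in MAINTAINER_ASSOCIATIONS
    return command in COMMANDS
```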

The iterative part

One of the things I’m most pleased about is how well the iterative feedback loop works. Take issue #1813 as an example. Someone submitted a plugin for review, the automated review flagged some issues, they edited their submission, ran /rerun-intake, the automation validated again, found more issues, they fixed those too, and then it was ready for a manual review.

That whole back-and-forth happened without any human maintainer involvement. The submitter got immediate feedback, knew exactly what to fix, and could iterate on their own timeline. That’s the kind of developer experience I was aiming for.

(Ultimately, the plugin was rejected after I did a manual review, but I wanted to highlight the process, not the outcome.)

Re-review after six months

Something we built into the process is a staleness check. After six months, approved external plugins get flagged for re-review. This mirrors what we already do for resources directly in the repo (via our staleness report), and it ensures that the external plugins stay maintained, useful, and not quietly hijacked by someone buying an expired domain or taking over an abandoned repo.
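A staleness check of this kind comes down to date arithmetic. The sketch below is one possible way to do it with the standard library, counting in calendar months and clamping to the end of shorter months; the function names and the calendar-month interpretation of "six months" are assumptions.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole calendar months, clamping the day if needed."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def needs_rereview(approved_on, today, months=6):
    """True once the approval is at least `months` calendar months old."""
    return today >= add_months(approved_on, months)
```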

Individuals welcome

One decision I’m happy about is that this isn’t limited to organisations or official “partners”. If you’re an individual with a plugin that’s genuinely useful to the Copilot community, you can submit it through the same process. We might be a bit more thorough on the review (the rise of purely AI-generated submissions means quality varies… a lot), but the door is open.

Testing Actions is hard

I’ll be honest: I’m pretty amazed at just how smoothly the Actions and scripts are working, because they are super hard to test. You can’t really unit test “a GitHub Issue was opened with this specific form data and an Action should parse it, validate it, comment on it, and then wait for a slash command.” You kind of just… ship it and hope.

Ok, that’s not entirely true — we did test the validation logic in isolation, and we did dry-run the workflow a bunch of times against test issues. But the integration between all the moving parts (issue forms, action triggers, comment parsing, slash command detection, PR creation) is the kind of thing where you cross your fingers and watch the first real submission come through.

And it worked. First time. Well, mostly first time. Look, there were a few tweaks needed, but nothing that was visible to submitters, so it counts.

What’s next

Now that the process is live, the next challenge is scale. As more submissions come in, the manual review step becomes the bottleneck. I’m exploring ways to leverage the other automation workflows that we have in Awesome Copilot and apply them to this process, but baby steps.

But for now, I’m just enjoying the fact that we have a transparent, auditable, secure-ish process for external plugins, and that it all runs on GitHub’s native primitives — Issues, Actions, and PRs. No external services, no special tooling, just the platform doing what it does best.

If you’ve got a Copilot plugin you’d like listed in the marketplace, open a submission and let the robots take it from there. Well, the robots and me. Eventually me. I’ll get to it. One day. Promise.
