
Large language models have changed how we interact with information, but they have one fundamental limitation: their knowledge is frozen in time. They can’t access real-time data or information from private, proprietary documents because they only know what they’ve been trained on. This is where retrieval-augmented generation (RAG) comes in. By connecting LLMs to external knowledge sources, RAG makes them smarter, more accurate, and more useful.
RAG is an AI technique that improves large language models by allowing them to retrieve relevant external information before generating a response. Instead of relying solely on pre-trained knowledge, RAG searches connected data sources, such as documents or databases, to provide more accurate, up-to-date, and context-aware answers.
Think of it like an open-book exam. An LLM on its own is like a student trying to answer questions from memory. A RAG-powered LLM is like that same student having a curated set of textbooks and notes to consult before writing their answer. This process improves the accuracy and relevance of the LLM’s output, reduces the risk of generating incorrect or fabricated information (known as “hallucinations”), and allows it to answer questions about data it wasn’t trained on.
The RAG process generally follows these steps:

1. A user submits a query.
2. The system searches the connected knowledge source and retrieves the most relevant pieces of information.
3. The retrieved context is combined with the original query in the prompt.
4. The LLM generates a response grounded in that retrieved context.
Graph RAG is a more sophisticated approach that uses a knowledge graph as its external data source. A knowledge graph organizes information as a network of entities (nodes) and their relationships (edges). For example, a node could be a person, a company, or a product, while an edge could represent a relationship like “works for,” “acquired,” or “is a component of.”
Instead of just searching for text chunks that are semantically similar to a query, graph RAG traverses the network of relationships to find highly contextual, interconnected information. It understands not just what things are but also how they relate to each other. This allows it to answer complex questions that require understanding relationships, patterns, and hierarchies within the data.
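The multi-hop traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not a real graph database: the entities, relations, and the `traverse` helper are all hypothetical, and a production system would use a graph store with a proper query language.

```python
from collections import deque

# Toy knowledge graph: entities as nodes, labeled relationships as edges.
# All entities and relations here are illustrative.
GRAPH = {
    "Alice":    [("works_for", "Acme")],
    "Acme":     [("acquired", "WidgetCo")],
    "WidgetCo": [("makes", "Widget-X")],
}

def traverse(start, max_hops=3):
    """Collect relationship paths reachable within max_hops of start."""
    results = []
    queue = deque([(start, [], 0)])
    while queue:
        node, path, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for relation, target in GRAPH.get(node, []):
            new_path = path + [(node, relation, target)]
            results.append(new_path)
            queue.append((target, new_path, hops + 1))
    return results

# Multi-hop question: what products is Alice indirectly connected to?
paths = traverse("Alice")
for p in paths:
    print(" -> ".join(f"{s} {r} {t}" for s, r, t in p))
```

The final path chains three hops (works_for, acquired, makes), which is exactly the kind of relational answer a pure similarity search over text chunks would struggle to assemble.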
Vector RAG is currently the most common implementation of the RAG framework. It uses a vector database to store and retrieve information. In this approach, text data (e.g., documents, articles, web pages) is broken down into smaller chunks, and each chunk is converted into a numerical representation called a vector embedding using an embedding model.
When a user submits a query, the query itself is also converted into a vector. The system then performs a similarity search within the vector database to find the text chunks whose vectors are closest to the query vector. These semantically similar chunks are then passed to the LLM as context.
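The embed-and-rank loop above can be sketched as follows. To stay self-contained, this uses a bag-of-words count vector as a stand-in for a real embedding model; a production system would call a trained embedding model and store the vectors in a vector database rather than a Python list.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def embed(text, vocab):
    # Stand-in for a real embedding model: word counts over a shared vocabulary.
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]

chunks = [
    "RAG retrieves external context before generating an answer",
    "Bananas are rich in potassium",
    "Vector databases store embeddings for similarity search",
]
query = "similarity search over vector embeddings"

# Shared vocabulary so every vector has the same dimensions.
vocab = sorted({w for text in chunks + [query] for w in text.lower().split()})
index = [(chunk, embed(chunk, vocab)) for chunk in chunks]
qvec = embed(query, vocab)

# Rank stored chunks by similarity to the query vector; the top hits
# become the context passed to the LLM.
ranked = sorted(index, key=lambda item: cosine(qvec, item[1]), reverse=True)
print(ranked[0][0])  # → Vector databases store embeddings for similarity search
```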

Choosing between graph RAG and vector RAG depends entirely on your data and the types of questions you need to answer.
Use graph RAG when: your data involves complex, explicit relationships (organizational charts, customer interaction histories, product dependencies) and your questions require multi-hop reasoning or explainable answers.
Use vector RAG when: your data is a large collection of unstructured documents and your questions can be answered by finding semantically similar passages at scale.
The debate isn’t about which RAG method will “win.” The future of RAG is hybrid. The most powerful AI systems will combine the strengths of both graph RAG and vector RAG.
Imagine a system that performs a vector search to quickly identify a relevant set of documents. Then, it uses a knowledge graph constructed from those documents to explore the specific relationships between entities mentioned. This multi-layered approach provides both the speed and scale of vector search and the depth and precision of graph traversal. This hybrid model allows an LLM to answer a broader range of questions with greater accuracy and context than either system could alone.
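The two-stage flow described above can be sketched as a small pipeline. Everything here is illustrative: the documents, the graph edges, and the keyword-overlap `retrieve` function (a stand-in for real embedding similarity) are all hypothetical.

```python
# Stage 1: coarse vector-style retrieval. Stage 2: graph expansion over
# entities mentioned in the retrieved documents.

DOCS = {
    "doc1": "Acme acquired WidgetCo last year",
    "doc2": "The weather in Paris is mild",
}
# Knowledge graph built from the corpus (hypothetical edges).
EDGES = {
    "Acme": [("acquired", "WidgetCo")],
    "WidgetCo": [("makes", "Widget-X")],
}

def retrieve(query, k=1):
    """Score documents by word overlap with the query (embedding stand-in)."""
    q = set(query.lower().split())
    scored = sorted(
        DOCS.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

def expand(doc_id):
    """Follow graph edges for entities mentioned in the document."""
    facts = []
    for entity, edges in EDGES.items():
        if entity.lower() in DOCS[doc_id].lower():
            for relation, target in edges:
                facts.append((entity, relation, target))
    return facts

hits = retrieve("who did Acme acquire")
context = [(doc_id, expand(doc_id)) for doc_id in hits]
print(context)
```

The vector stage narrows the corpus quickly; the graph stage then surfaces related facts (here, that WidgetCo makes Widget-X) that the retrieved text never states directly.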
To continue learning about retrieval-augmented generation, you can review the resources below:
What are the main advantages of graph RAG over vector RAG? The main advantages are its ability to understand and utilize explicit relationships within data, answer complex multi-hop questions, and provide greater explainability for its answers by tracing the query path through the graph.
Can you combine graph RAG and vector RAG into a single system? Yes, and this is becoming a powerful pattern. A hybrid approach can use vector search for initial, broad retrieval, then use a knowledge graph to refine context and explore specific relationships, leveraging the strengths of both methods.
Is graph RAG or vector RAG better for large-scale enterprise data? It depends on the type of data. If the enterprise data is a massive collection of unstructured documents (reports, emails, etc.), vector RAG is a great starting point. If the data involves complex relationships (e.g., organizational charts, customer interaction histories, product dependencies), graph RAG will deliver more value and deeper insights.
How do graph databases differ from vector databases in RAG applications? Graph databases store data as nodes and edges, optimized for querying relationships. Vector databases store data as high-dimensional vectors and are optimized to find the nearest neighbors of a query vector using a distance metric. One stores explicit connections, while the other stores semantic similarity.
Does graph RAG require more computational resources than vector RAG? The upfront resource requirement for graph RAG can be higher, particularly in the data modeling and ingestion phase. However, for certain complex queries, traversing a well-structured graph can be more efficient than sifting through thousands of semantically similar but potentially irrelevant text chunks retrieved by a vector search. Query performance depends heavily on the specific use case and database optimization.
The post A Breakdown of Graph RAG vs. Vector RAG appeared first on The Couchbase Blog.
Every MCP tutorial I've found so far has followed the same basic script: build a server, point Claude Desktop at it, screenshot the chat window, done.
This is fine if you want a demo. But it's not fine if you want something you can ship, defend in an interview, or hand to another developer without a README that starts with "first, install this Electron app."
So I built an MCP server in Python, containerized it with Docker, and wired it into Claude Code – all from the terminal, no GUI required.
This article walks through the full loop in one afternoon: what MCP actually is, why it matters now that OpenAI and Google have adopted it, the real security problems nobody puts in their tutorial (complete with CVEs), and every command you need to go from an empty directory to a working tool.
If you're between jobs and need a portfolio project that shows you understand how AI tooling actually works under the hood, this is the one.
By the end of this tutorial, you will have:
A Python MCP server that exposes custom tools to any MCP-compatible AI client
A Docker container that packages the server for reproducible deployment
A working connection between that container and Claude Code in your terminal
An understanding of the security risks involved and how to mitigate the worst of them
The server we are building is a project scaffolder. You give it a project name and a language, and it generates a starter directory structure with the right files. It's simple enough to build in an afternoon, but useful enough to actually put on your résumé.
You will need the following installed on your machine:
Python 3.10+ (check with python3 --version)
Docker (check with docker --version)
Claude Code with an active Claude Pro, Max, or API plan (check with claude --version)
Node.js 20+ (required by Claude Code – check with node --version)
A terminal you are comfortable in
If you don't have Claude Code installed yet, follow the official installation instructions. The npm installation method is deprecated, so make sure you use the native binary installer instead.
The Model Context Protocol (MCP) is an open standard that lets AI models connect to external tools and data sources. Anthropic released it in November 2024, and within a year it became the default way to extend what an LLM can do. OpenAI adopted it in March 2025. Google DeepMind followed in April. The protocol now has over 97 million monthly SDK downloads and more than 10,000 active servers.
The easiest way to think about MCP is as a USB-C port for AI. Before MCP, every AI provider had its own way of calling tools. OpenAI had function calling. Google had their own format. If you wanted your tool to work with multiple models, you had to implement it multiple times. MCP gives you one interface that works everywhere.
Here is how the pieces fit together:
An MCP server exposes tools, resources, and prompts. It is your code.
An MCP client (like Claude Code, Claude Desktop, or Cursor) discovers those tools and calls them on behalf of the LLM.
The transport is how they communicate. For local servers, that's usually stdio (standard input/output). For remote servers, it's HTTP.
When you type a message in Claude Code and it decides to use one of your tools, here is what happens: Claude Code sends a JSON-RPC 2.0 message to your server over stdin, your server executes the tool and writes the result to stdout, and Claude Code reads it back. The LLM never talks to your server directly. The client is always in the middle.
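Concretely, the message on stdin looks roughly like this. The `tools/call` method and the `name`/`arguments` params follow the MCP convention, but exact payload shapes can vary between protocol revisions, so treat this as an illustrative sketch rather than a wire-format reference.

```python
import json

# A request the client might write to the server's stdin to invoke a tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "scaffold_project",
        "arguments": {"name": "weather-api", "language": "python"},
    },
}
line = json.dumps(request)  # one JSON message per line on stdin
print(line)

# The server's reply comes back on stdout as another JSON-RPC message.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": '{"status": "created"}'}]},
}
print(json.dumps(response))
```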
If you want the deeper architecture breakdown, freeCodeCamp already has a solid explainer on how MCP works under the hood. Here, I will focus on building.
Most MCP tutorials use Claude Desktop as the client. That works, but Claude Code has a few advantages for developers:
It lives in your terminal. No GUI to configure. No JSON files to hand-edit in hidden config directories. You add an MCP server with one command and you are done.
It's already where you code. If you're writing the server, testing it, and connecting it, doing all of that in the same terminal session cuts the context switching.
It works on headless machines. If you're SSHing into a dev box or running in CI, Claude Desktop isn't an option. Claude Code is.
It's also an MCP server itself. Claude Code can expose its own tools (file reading, writing, shell commands) to other MCP clients via claude mcp serve. That's a neat trick we won't use today, but it's worth knowing about.
The relevant commands:
# Add an MCP server
claude mcp add <name> -- <command>
# List configured servers
claude mcp list
# Remove a server
claude mcp remove <name>
# Check MCP status inside Claude Code
/mcp
We're using FastMCP, a Python framework that handles all the protocol plumbing so you can focus on your tools. Create a new project directory and set it up:
mkdir mcp-scaffolder && cd mcp-scaffolder
python3 -m venv .venv
source .venv/bin/activate
pip install "mcp[cli]>=1.25,<2"
Why pin the version? The MCP Python SDK v2.0 is in development and will change the transport layer significantly. Pinning to >=1.25,<2 keeps your server working until you're ready to migrate.
Now create server.py:
# server.py
from mcp.server.fastmcp import FastMCP
import os
import json

mcp = FastMCP("project-scaffolder")

# Templates for different languages
TEMPLATES = {
    "python": {
        "files": {
            "main.py": '"""Entry point."""\n\n\ndef main():\n    print("Hello, world!")\n\n\nif __name__ == "__main__":\n    main()\n',
            "requirements.txt": "",
            "README.md": "# {name}\n\nA Python project.\n\n## Setup\n\n```bash\npip install -r requirements.txt\npython main.py\n```\n",
            ".gitignore": "__pycache__/\n*.pyc\n.venv/\n",
        },
        "dirs": ["tests"],
    },
    "node": {
        "files": {
            "index.js": 'console.log("Hello, world!");\n',
            "package.json": '{\n  "name": "{name}",\n  "version": "1.0.0",\n  "main": "index.js"\n}\n',
            "README.md": "# {name}\n\nA Node.js project.\n\n## Setup\n\n```bash\nnpm install\nnode index.js\n```\n",
            ".gitignore": "node_modules/\n",
        },
        "dirs": [],
    },
    "go": {
        "files": {
            "main.go": 'package main\n\nimport "fmt"\n\nfunc main() {\n\tfmt.Println("Hello, world!")\n}\n',
            "go.mod": "module {name}\n\ngo 1.21\n",
            "README.md": "# {name}\n\nA Go project.\n\n## Setup\n\n```bash\ngo run main.go\n```\n",
            ".gitignore": "bin/\n",
        },
        "dirs": ["cmd", "internal"],
    },
}


@mcp.tool()
def scaffold_project(name: str, language: str) -> str:
    """Create a new project directory structure.

    Args:
        name: The project name (used as the directory name)
        language: The programming language - one of: python, node, go
    """
    language = language.lower().strip()
    if language not in TEMPLATES:
        return json.dumps({
            "error": f"Unsupported language: {language}",
            "supported": list(TEMPLATES.keys()),
        })

    template = TEMPLATES[language]
    base_path = os.path.join(os.getcwd(), name)
    if os.path.exists(base_path):
        return json.dumps({
            "error": f"Directory already exists: {name}",
        })

    # Create the project directory
    os.makedirs(base_path, exist_ok=True)

    # Create subdirectories
    for dir_name in template["dirs"]:
        os.makedirs(os.path.join(base_path, dir_name), exist_ok=True)

    # Create files
    created_files = []
    for filename, content in template["files"].items():
        filepath = os.path.join(base_path, filename)
        formatted_content = content.replace("{name}", name)
        with open(filepath, "w") as f:
            f.write(formatted_content)
        created_files.append(filename)

    return json.dumps({
        "status": "created",
        "path": base_path,
        "language": language,
        "files": created_files,
        "directories": template["dirs"],
    })


@mcp.tool()
def list_templates() -> str:
    """List all available project templates and their contents."""
    result = {}
    for lang, template in TEMPLATES.items():
        result[lang] = {
            "files": list(template["files"].keys()),
            "directories": template["dirs"],
        }
    return json.dumps(result, indent=2)


if __name__ == "__main__":
    mcp.run(transport="stdio")
A few things to notice about this code:
Tools return strings. MCP tools communicate through text. I'm returning JSON strings so the LLM can parse the results reliably. You could return plain text, but structured data gives the model more to work with.
The @mcp.tool() decorator does the heavy lifting. FastMCP reads your function signature and docstring to generate the JSON schema that tells the LLM what this tool does, what arguments it takes, and what types they are. Good docstrings aren't optional here – they're how the LLM decides whether to call your tool.
transport="stdio" is the key line. This tells FastMCP to communicate over standard input/output, which is what Claude Code expects for local servers.
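To make the decorator's job concrete, here is roughly the tool description FastMCP derives from scaffold_project's signature and docstring and advertises to the client. The exact field names can differ slightly between SDK versions, so treat this as an illustrative shape, not the literal output.

```python
import json

# Approximate tool description generated from the function signature and
# docstring (illustrative; field names may vary by SDK version).
tool_schema = {
    "name": "scaffold_project",
    "description": "Create a new project directory structure.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "language": {"type": "string"},
        },
        "required": ["name", "language"],
    },
}
print(json.dumps(tool_schema, indent=2))
```

This is what the LLM actually reads when deciding whether to call your tool, which is why vague docstrings translate directly into tools that never get invoked.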
Before we Dockerize anything, make sure the server actually works:
# Quick smoke test - the server should start without errors
python server.py
You should see... nothing. That is correct. An MCP server over stdio just sits there waiting for JSON-RPC messages on stdin. Press Ctrl+C to stop it.
For a proper test, use the MCP Inspector (Anthropic's debugging tool):
# Install and run the inspector
npx @modelcontextprotocol/inspector python server.py
This opens a web interface where you can see your tools, call them manually, and inspect the JSON-RPC messages going back and forth. Verify that both scaffold_project and list_templates show up and return sensible results.
Here's a debugging tip that will save you time: If your MCP server logs anything to stdout, it will corrupt the JSON-RPC stream and the client will disconnect. Use stderr for all logging: print("debug info", file=sys.stderr). This is the single most common source of "my server connects but then immediately fails" bugs. The New Stack called stdio transport "incredibly fragile" for exactly this reason.
Create a Dockerfile in your project root:
FROM python:3.12-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy server code
COPY server.py .
# MCP servers over stdio need unbuffered output
ENV PYTHONUNBUFFERED=1
# The server reads from stdin and writes to stdout
CMD ["python", "server.py"]
Create requirements.txt:
mcp[cli]>=1.25,<2
Build and verify:
docker build -t mcp-scaffolder .
# Quick test - should start without errors
docker run -i mcp-scaffolder
Again, you'll see nothing because the server is waiting for input. Ctrl+C to stop.
Two things matter in this Dockerfile:
PYTHONUNBUFFERED=1 is critical. Without it, Python buffers stdout, and the MCP client may hang waiting for responses that are sitting in a buffer. This is one of those bugs that works fine in local testing and breaks in Docker.
docker run -i (interactive mode) is required. The -i flag keeps stdin open so the MCP client can send messages to the container. Without it, the server gets an immediate EOF and exits.
Now connect your Docker container to Claude Code:
claude mcp add scaffolder -- docker run -i --rm mcp-scaffolder
That's the whole command. Let me break it down:
claude mcp add registers a new MCP server
scaffolder is the name you will reference it by
Everything after -- is the command Claude Code runs to start the server
docker run -i --rm mcp-scaffolder starts the container with interactive stdin and removes it when done
Verify that it registered:
claude mcp list
You should see scaffolder in the output with a stdio transport type.
Now launch Claude Code and check the connection:
claude
Once inside Claude Code, type /mcp to see the status of your MCP servers. You should see scaffolder listed as connected with two tools available.
Still inside Claude Code, try it out:
Create a new Python project called "weather-api"
Claude Code should discover your scaffold_project tool, call it with name="weather-api" and language="python", and report back what it created. Check your filesystem and you should see the full project structure.
Try a few more:
What project templates are available?
Scaffold a Go project called "url-shortener"
If Claude Code doesn't pick up your tools, run /mcp to check the connection status. If it shows as disconnected, the most common causes are that the Docker image failed to build, stdout is being polluted (check for stray print statements), or the Docker daemon is not running.
This is the section most MCP tutorials skip. They should not. MCP has had real security incidents, not theoretical ones, and understanding them makes you a better developer.
MCP servers execute code on your machine based on what an LLM decides to do. If an attacker can influence what the LLM sees, they can influence what your server does. This is called prompt injection, and it is the number one unsolved security problem in the MCP ecosystem.
In May 2025, researchers at Invariant Labs demonstrated this against the official GitHub MCP server. They created a malicious GitHub issue that, when read by an AI agent, hijacked the agent into leaking private repository data (including salary information) into a public pull request. The root cause was an overly broad Personal Access Token combined with untrusted content landing in the LLM's context window.
This was not a contrived lab demo. It used the official GitHub MCP server, the kind of thing people install from the MCP server directory without a second thought.
The ecosystem has accumulated real vulnerability reports:
CVE-2025-6514: A critical command-injection bug in mcp-remote, a popular OAuth proxy used by 437,000+ environments. An attacker could execute arbitrary OS commands through crafted OAuth redirect URIs.
CVE-2025-6515: Session hijacking in oatpp-mcp through predictable session IDs, letting attackers inject prompts into other users' sessions.
MCP Inspector RCE: Anthropic's own debugging tool allowed unauthenticated remote code execution. Inspecting a malicious server meant giving the attacker a shell on your machine.
An Equixly security assessment found command injection in 43% of tested MCP server implementations. Nearly a third were vulnerable to server-side request forgery.
For the server we built today, here is what matters:
Our Docker container doesn't mount your home directory. That's intentional. If you need the server to write files to your host, mount only the specific directory you need: docker run -i --rm -v $(pwd)/projects:/app/projects mcp-scaffolder. Never mount / or ~.
Our scaffold_project tool checks that the language is in a known list and that the directory does not already exist. But think about what happens if someone passes name="../../etc/passwd" as the project name. Path traversal is the kind of thing you need to catch. Add this to the tool:
# Add this validation at the top of scaffold_project
if ".." in name or "/" in name or "\\" in name:
    return json.dumps({"error": "Invalid project name"})
If your MCP server connects to an API, give it the minimum permissions it needs. The GitHub MCP incident happened because the PAT had access to every private repo. A read-only token scoped to one repo would have contained the blast radius.
A malicious npm package posing as a "Postmark MCP Server" was caught silently BCC'ing all emails to an attacker's address. Treat MCP server packages with the same caution you would give any code that runs on your machine with your permissions.
You have a working MCP server in a Docker container, connected to Claude Code. Here is how to make it portfolio-ready:
Add more tools: The scaffolder is a starting point. Add a tool that reads a project's dependency file and lists outdated packages. Add one that generates a Dockerfile for an existing project. Each tool is a function with a decorator – the pattern is the same every time.
Add tests: Write pytest tests that call your tool functions directly and verify the output. MCP tools are just Python functions. Test them like Python functions.
Push the Docker image: Tag it and push to Docker Hub or GitHub Container Registry. Then your claude mcp add command becomes claude mcp add scaffolder -- docker run -i --rm yourusername/mcp-scaffolder:latest and anyone can use it.
Write a README that explains the security model: What permissions does your server need? What file system access? What happens if inputs are malicious? Answering these questions in your README signals that you think about security, which is exactly what hiring managers are looking for right now.
We built a Python MCP server with FastMCP, containerized it with Docker, and connected it to Claude Code. The whole thing fits in about 100 lines of Python, a six-line Dockerfile, and one claude mcp add command.
The MCP ecosystem is real and growing fast. The protocol has the backing of Anthropic, OpenAI, and Google. It's now governed by the Linux Foundation. But it's also young, and the security story is still being written. Build with it, but build with your eyes open.
If you want to go deeper, here are the resources I found most useful:
MCP specification: the actual protocol docs
Claude Code MCP documentation: how Claude Code implements MCP
FastMCP GitHub: the Python framework we used
AuthZed's timeline of MCP security incidents: required reading if you are building MCP servers for production
Simon Willison on MCP prompt injection: the clearest explanation of why this is hard to solve
The complete source code for this tutorial is on GitHub.

Uno Platform 6.5 introduces Antigravity AI agent support, allowing agents to verify app behavior at runtime. Hot Design now launches by default with a redesigned toolbar and new scope selector. The release also adds Unicode TextBox support for non-Latin scripts, improves WebView2 on WebAssembly, and resolves over 450 community issues across all supported platforms.
By Almir Vuk

In the previous posts we covered the different CLI modes and session management. This time we're looking at one of Copilot CLI's most powerful features: the /fleet command. If you've ever wished you could clone yourself to tackle several parts of a codebase at once, this is the closest thing to it.
What is /fleet?

When you send a prompt to Copilot CLI, by default a single agent works through the task sequentially. /fleet changes that model entirely.
The /fleet slash command lets Copilot CLI break down a complex request into smaller tasks and run them in parallel, maximizing efficiency and throughput. The main Copilot agent analyzes the prompt and determines whether it can be divided into smaller subtasks. It then acts as an orchestrator, managing the workflow and dependencies between those subtasks, each handled by a separate subagent.
In practice, this means a task that might take 20 minutes sequentially can complete in a fraction of the time — because independent chunks of work are being executed concurrently.
How to use /fleet

The typical workflow is to use /fleet after creating an implementation plan. Switch into plan mode with Shift+Tab, describe the feature or change you want, and work with Copilot to produce a structured plan. Once the plan is complete, you'll be presented with two options:
Either let Copilot launch the fleet right away, or type /fleet implement the plan to kick things off manually. The first option is the faster path. The second gives you a moment to review or tweak your prompt before committing.
You can also use /fleet directly without going through plan mode first, by prefixing any prompt with the command:
/fleet add unit tests for every service in src/services/
Copilot will assess whether the work can be parallelized and assign subtasks to subagents accordingly. For something like writing tests across multiple independent service files, this is a natural fit.
Tracking progress with /tasks

Once /fleet kicks off, you don't have to sit in the dark wondering what's happening. Use the /tasks slash command to see a list of all background tasks for the current session, including any subtasks being handled by subagents. Navigate the list with the up and down arrow keys to drill into each subagent task's status and progress.
This is your control panel while fleet is running. Make a habit of opening /tasks after launching /fleet so you can catch any subtask that gets stuck or goes in the wrong direction early.
When to use /fleet

Not every task benefits from parallelization. /fleet shines when your work is naturally divisible into independent chunks.
Good candidates: work that splits into independent chunks, such as writing tests across multiple unrelated service files.
Poor candidates: tasks whose steps are tightly coupled and must run sequentially, where each change depends on the result of the previous one.
When you're using autopilot mode and want the quickest possible completion of a large task, /fleet is the right tool. But if your task cannot be cleanly split into independent subtasks, the main agent will handle it sequentially regardless.
/fleet is the multiplier that makes Copilot CLI genuinely competitive with human multitasking. Once you've identified a task that parallelizes well, the combination of plan mode + /fleet + autopilot is one of the most productive workflows the CLI offers.
In the next post, we'll look at extending GitHub Copilot agent behavior with hooks.