Embedding YouTube videos into your WordPress site is one of the simplest ways to add engaging content, and it only takes a few minutes!
This guide covers several ways to add a YouTube video to your WordPress site, whether you want quick and easy or more advanced control. It also includes customization tips, plugin options, and common fixes if something goes wrong.
If you’re using the WordPress block editor, the YouTube Embed block allows you to quickly add a video to a page or post. Simply follow these steps:

The video will appear directly in the editor. You can align it, resize the block, or add a caption if needed.
WordPress supports automatic embeds using oEmbed, so you can paste a YouTube URL on its own line, and WordPress will turn it into a video automatically. To do this:

This method is fast and doesn’t require any extra setup on your end.
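For example, a line containing nothing but the video URL is all WordPress needs to render the player (VIDEO_ID here is a placeholder for your video's ID):
https://www.youtube.com/watch?v=VIDEO_ID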
If you’re still using the Classic Editor, embedding simply requires pasting the video link. Follow these steps:

When you view the page, the embedded video will appear where you placed the link.
If you want more control over how your video appears, try using the embed code from YouTube. Just follow these steps:

This method is great for advanced layout or design needs. If you’re familiar with basic coding principles, you can customize the iframe code by adjusting attributes such as the width and height, or by appending player parameters (for example, a start time) to the embed URL.
Example:
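A typical customized embed looks something like this (the width, height, and start-time parameter below are just illustrative values):
<iframe width="560" height="315" src="https://www.youtube.com/embed/VIDEO_ID?start=30" title="YouTube video player" frameborder="0" allowfullscreen></iframe>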
Replace VIDEO_ID with the actual ID from your YouTube link.
Want to show a YouTube video in your sidebar or footer? If you’re using a block theme, simply:

If you’re using a classic theme, follow these steps:

Plugins give you more options for things like displaying video galleries and adding lightboxes. Here are a few you can choose from:
Most of the time, embedding works without problems. But here are some common issues and ways to fix them quickly:
YouTube is a great place to share and discover videos, but it’s not always the best fit for every site. If you want more control over how your videos are displayed, how quickly they load, or what shows up at the end, consider a dedicated video hosting solution like Jetpack VideoPress.
Jetpack VideoPress is built for WordPress, so you can upload and manage your videos directly from your dashboard. It gives you a fast, reliable player without third-party branding or distracting ads. Your visitors see your content, and nothing else.
Here’s what makes Jetpack VideoPress a smart choice:
If you’re publishing video content regularly and want more speed, control, and flexibility, VideoPress is an easy way to upgrade your setup without adding complexity.
You can learn more or get started here.

Key takeaways
Mistral has released a major new wave of open source models under Apache 2, including Mistral Large 3 and the Ministral 3 family. This generation reflects Mistral's continued move toward a fully open ecosystem, with open weights, multimodal capability, multilingual support, and upstream compatibility with vLLM. See the Mistral AI launch blog for more details, including benchmarks.
As part of this release, we collaborated with Mistral AI using the llm-compressor library to produce optimized FP8 and NVFP4 variants of Mistral Large 3, giving the community smaller and faster checkpoints that preserve strong accuracy.
With vLLM and Red Hat AI, developers and organizations can run these models on Day 0. There are no delays or proprietary forks. You can pull the weights and start serving the models immediately.
Mistral Large 3:
Ministral 3 (3B, 8B, 14B):
Licensing and openness:
The new Mistral 3 models are designed to work directly with upstream vLLM, giving users immediate access without custom integrations.
This makes vLLM the fastest path from model release to model serving.
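To get a feel for what that means in practice, here is a minimal sketch of serving the new flagship model with upstream vLLM (assuming a recent vLLM release and enough GPU memory for the checkpoint; the flags mirror the ones used later in this post):
pip install --upgrade vllm
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral \
  --tensor-parallel-size 8
The same pattern applies to the Ministral 3 checkpoints; substitute the model ID you want to serve.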
Quick side note: We at Red Hat hosted a vLLM meetup in Zurich together with Mistral AI in November 2025. You can view the meetup recording here for a primer and deep dive into Mistral AI's approach to building open, foundational models.
Red Hat AI includes OpenShift AI and the Red Hat AI Inference Server, both built on top of open source foundations. The platform gives users a secure and efficient way to run the newest open models without waiting for long integration cycles.
Red Hat AI Inference Server, built on vLLM, lets customers run open source LLMs in production environments on prem or in the cloud. With the current Red Hat preview build, you can experiment with Mistral Large 3 and Ministral 3 today. A free 60-day trial is available for new users.
If you are using OpenShift AI, you can import the following preview runtime as a custom image:
registry.redhat.io/rhaiis-preview/vllm-cuda-rhel9:mistral-3-series
You can then use it to serve the models in the standard way, and add vLLM parameters to enable features such as speculative decoding, function calling, and multimodal serving.
This gives teams a fast and reliable way to explore the new Apache licensed Mistral models on Red Hat's AI platform while full enterprise support is prepared for upcoming stable releases.
This guide explains how to serve and run inference on a large language model using Podman and Red Hat AI Inference Server, leveraging NVIDIA CUDA AI accelerators.
Make sure you meet the following requirements before proceeding:
System requirements
Software requirements
Technology Preview notice
The Red Hat AI Inference Server images used in this guide are a Technology Preview and not yet fully supported. They are for evaluation only, and production workloads should wait for the upcoming official GA release from the Red Hat container registries.
This section walks you through the steps to run a large language model with Podman and Red Hat AI Inference Server using NVIDIA CUDA AI accelerators. For deployments in OpenShift AI, simply import the image registry.redhat.io/rhaiis-preview/vllm-cuda-rhel9:mistral-3-series as a custom runtime and use it to serve the model in the standard way, optionally adding the vLLM parameters described in the following procedure to enable certain features (speculative decoding, function calling, and so on).
Open a terminal on your server and log in to registry.redhat.io:
podman login registry.redhat.io
podman pull registry.redhat.io/rhaiis-preview/vllm-cuda-rhel9:mistral-3-series
If SELinux is enabled on your system, allow container access to devices:
sudo setsebool -P container_use_devices 1
Create and set proper permissions for the cache directory:
mkdir -p rhaiis-cache
chmod g+rwX rhaiis-cache
Create or append your Hugging Face token to a local private.env file and source it:
echo "export HF_TOKEN=<your_HF_token>" > private.envsource private.envIf your system includes multiple NVIDIA GPUs connected via NVSwitch, perform the following steps:
To detect NVSwitch support, check for the presence of devices:
ls /proc/driver/nvidia-nvswitch/devices/
Example output:
0000:0c:09.0 0000:0c:0a.0 0000:0c:0b.0 0000:0c:0c.0 0000:0c:0d.0 0000:0c:0e.0
Start the NVIDIA Fabric Manager service:
sudo systemctl start nvidia-fabricmanager
Important
NVIDIA Fabric Manager is only required for systems with multiple GPUs using NVSwitch.
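If you want to confirm the service came up cleanly before continuing, a quick status check is enough (optional):
sudo systemctl status nvidia-fabricmanager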
Run the following command to verify GPU access inside a container:
podman run --rm -it \
--security-opt=label=disable \
--device nvidia.com/gpu=all \
nvcr.io/nvidia/cuda:12.4.1-base-ubi9 \
nvidia-smi
Start the Red Hat AI Inference Server container with the Mistral Large 3 FP8 model:
podman run --rm -it \
--device nvidia.com/gpu=all \
--shm-size=4g \
-p 8000:8000 \
--tmpfs /home/vllm/.cache:rw,exec,uid=2000,gid=2000 \
--env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
--env "HF_HUB_OFFLINE=0" \
-e HF_HUB_CACHE=/opt/app-root/src/.cache \
registry.redhat.io/rhaiis-preview/vllm-cuda-rhel9:mistral-3-series \
--model mistralai/Mistral-Large-3-675B-Instruct-2512 \
--tokenizer-mode mistral \
--config-format mistral \
--load-format mistral \
--kv-cache-dtype fp8 \
--tensor-parallel-size 8 \
--limit-mm-per-prompt '{"image":10}' \
--enable-auto-tool-choice \
--tool-call-parser mistral \
--host 0.0.0.0 \
--port 8000
Note: This configuration runs Mistral Large 3 (FP8) on a single 8x H200 node. Adjust the --tensor-parallel-size parameter to match other hardware setups.
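Once the server reports it is ready, you can smoke-test the OpenAI-compatible endpoint that vLLM exposes on port 8000. This is only an illustrative request; adjust the prompt and max_tokens as needed:
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Large-3-675B-Instruct-2512",
    "messages": [{"role": "user", "content": "Summarize what vLLM does in one sentence."}],
    "max_tokens": 128
  }'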
vLLM also supports calling user-defined functions. Make sure to run models with the following arguments.
podman run --rm -it \
--device nvidia.com/gpu=all \
--shm-size=4g \
-p 8000:8000 \
--tmpfs /home/vllm/.cache:rw,exec,uid=2000,gid=2000 \
--env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
--env "HF_HUB_OFFLINE=0" \
-e HF_HUB_CACHE=/opt/app-root/src/.cache \
registry.redhat.io/rhaiis-preview/vllm-cuda-rhel9:mistral-3-series \
--model mistralai/Mistral-Large-3-675B-Instruct-2512 \
--tokenizer-mode mistral \
--config-format mistral \
--load-format mistral \
--kv-cache-dtype fp8 \
--tensor-parallel-size 8 \
--limit-mm-per-prompt '{"image":10}' \
--enable-auto-tool-choice \
--tool-call-parser mistral
Speculative decoding with the draft model (EAGLE3):
podman run --rm -it \
--device nvidia.com/gpu=all \
--shm-size=4g \
-p 8000:8000 \
--tmpfs /home/vllm/.cache:rw,exec,uid=2000,gid=2000 \
--env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
--env "HF_HUB_OFFLINE=0" \
-e HF_HUB_CACHE=/opt/app-root/src/.cache \
registry.redhat.io/rhaiis-preview/vllm-cuda-rhel9:mistral-3-series \
--model mistralai/Mistral-Large-3-675B-Instruct-2512 \
--tokenizer-mode mistral \
--config-format mistral \
--load-format mistral \
--kv-cache-dtype fp8 \
--tensor-parallel-size 8 \
--limit-mm-per-prompt '{"image":10}' \
--host 0.0.0.0 \
--port 8000 \
--speculative_config '{
"model": "mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle",
"num_speculative_tokens": 3,
"method": "eagle",
"max_model_len": "16384"
}'
The release of Mistral Large 3 and Ministral 3 represents another major step for open source LLMs and the open infrastructure supporting them.
Coming soon:
Open models are evolving faster than ever, and with vLLM and Red Hat AI, developers and enterprises can experiment on Day 0 safely, openly, and at scale.
A command that many shells have had forever, and that PowerShell has had for a long time as well, is ‘cd -’, which means “change directory to the previous folder”. This means you could do something like this:
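For instance (a minimal sketch; any two directories will do):
cd C:\Windows\System32
# ...do some work...
cd -    # jumps back to the directory you came from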
This is quite handy in many situations. In particular, since I’ve modified my PowerShell profile to NOT start in my user folder but instead to start in my c:\dev folder (from which I can easily get to any repo I may be working from), I very often find myself using cd - whenever I create a new terminal in the folder I want to be in, such as when using VS Code.
In Visual Studio 2026 we introduced Copilot Profiler Agent, a new AI-powered assistant that helps you analyze and optimize performance bottlenecks in your code. By combining the power of GitHub Copilot with Visual Studio’s performance profiler, you can now ask natural language questions about performance, get insights into hot paths, and quickly identify optimization opportunities. Let’s walk through a real-world example of how this tool can help you make meaningful performance improvements.
To demonstrate the capabilities of the Copilot Profiler Agent, let’s optimize CsvHelper, a popular open-source project. You can follow along by cloning my fork of the repo and then checking out the commit right before the fix we’ll detail below with git checkout 435ff7c.
In one of my previous blog posts I added a CsvHelper.Benchmarks project that contains a benchmark for reading CSV records. This time I want to see if we can optimize writing CSV records instead. Normally I would start this investigation by creating a benchmark for the code that I want to optimize, and while we will still do that, we can have Copilot do the toil work for us. In the Copilot Chat window I can ask @Profiler Help me write a benchmark for the #WriteRecords method. The @Profiler gets us talking directly with the Copilot Profiler Agent, and #WriteRecords tells it exactly which method we are interested in benchmarking.
From here Copilot starts creating our new benchmark, asking us if it’s OK to install the profiler’s NuGet package to pull information from the benchmarks when it runs it. It also models the benchmarks after any existing benchmarks that it finds, so the resulting benchmark is very similar to the one we already wrote, keeping things consistent with the style of the repository. Lastly, it kicks off a build to make sure everything is good.
Once it’s done, it provides some useful follow-up prompts to start the investigation. We could click one of these to launch into our investigation, though I want to edit things slightly in the benchmark.
I tweaked the benchmark to have a few more fields for us to write, in this case 2 int fields and 2 string fields. When I originally had Copilot do this, before writing it up for this blog, it wrote to a new memory stream each time instead of writing to the same one. Writing into the same memory stream is probably the better way to go about things (you win this time, Copilot), but in my original PR to CsvHelper I didn’t, and it should be fine.
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using BenchmarkDotNet.Attributes;

public class BenchmarkWriteCsv
{
private const int entryCount = 10000;
private readonly List<Simple> records = new(entryCount);
public class Simple
{
public int Id1 { get; set; }
public int Id2 { get; set; }
public string Name1 { get; set; }
public string Name2 { get; set; }
}
[GlobalSetup]
public void GlobalSetup()
{
var random = new Random(42);
var chars = new char[10];
string getRandomString()
{
for (int i = 0; i < 10; ++i)
chars[i] = (char)random.Next('a', 'z' + 1);
return new string(chars);
}
for (int i = 0; i < entryCount; ++i)
{
records.Add(new Simple
{
Id1 = random.Next(),
Id2 = random.Next(),
Name1 = getRandomString(),
Name2 = getRandomString(),
});
}
}
[Benchmark]
public void WriteRecords()
{
using var stream = new MemoryStream();
using var streamWriter = new StreamWriter(stream);
using var writer = new CsvHelper.CsvWriter(streamWriter, CultureInfo.InvariantCulture);
writer.WriteRecords(records);
streamWriter.Flush();
}
}
Now to get started with the analysis, I can either ask the Profiler Agent to run the benchmark or just click on the follow-up prompt for @Profiler Run the benchmark and analyze results. From here Copilot edits my Main method, which at first glance might seem odd, but looking at the changes I see it made the necessary edits to use BenchmarkSwitcher so it can choose which benchmarks to run:
static void Main(string[] args)
{
// Use assembly-wide discovery so all benchmarks in this assembly are run,
// including the newly added BenchmarkWriteRecords.
_ = BenchmarkSwitcher.FromAssembly(typeof(BenchmarkEnumerateRecords).Assembly).Run(args);
}
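As a side note, because the Main method now uses BenchmarkSwitcher, you can also run the same benchmark outside the profiler with BenchmarkDotNet's usual filter arguments (a sketch; the project name comes from earlier in the post and the filter glob is illustrative):
dotnet run -c Release --project CsvHelper.Benchmarks -- --filter *BenchmarkWriteCsv*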
Then it kicks off a benchmarking run and when it’s done I’m presented with a diagsession where we can begin investigating.
Now comes the exciting part. After running the benchmark, the Profiler agent analyzes the trace and highlights where time is spent. I can ask the Profiler Agent questions about the trace and have it explain why code is slow or why certain optimizations could help. It has already pointed out that most of the time is spent in delegate compilation and invocation, which is done for each field in the CSV record. For a record with 4 fields written 10,000 times, that means 40,000 delegate invocations. Each invocation has overhead, and this is showing up as a hot path in the profiler.
I can ask the Profiler Agent: “How can I reduce the delegate invocation overhead?” or “Why is delegate invocation slow?” and the agent, like a patient teacher, will explain concepts and suggest fixes.
I’m going to click the @Profiler Optimize library to produce a single compiled write delegate (reduce multicast invokes) and see what it comes up with. The Profiler Agent makes an edit to ObjectRecordWriter and I can click on that in the chat window to see the diff of the changes it makes.
Looking at the current implementation, the code builds a list of delegates, one for each field:
var delegates = new List<Action<T>>();
foreach (var memberMap in members)
{
// ... field writing logic ...
delegates.Add(Expression.Lambda<Action<T>>(writeFieldMethodCall, recordParameter).Compile());
}
var action = CombineDelegates(delegates) ?? new Action<T>((T parameter) => { });
return action;
The issue is that CombineDelegates creates a multicast delegate which invokes each individual delegate separately in series. Instead, Profiler Agent is suggesting we use Expression.Block to combine all the expressions before compiling:
var expressions = new List<Expression>(members.Count);
foreach (var memberMap in members)
{
// ... field writing logic ...
expressions.Add(writeFieldMethodCall);
}
if (expressions.Count == 0)
{
return new Action<T>((T parameter) => { });
}
// Combine all field writes into a single block
var block = Expression.Block(expressions);
return Expression.Lambda<Action<T>>(block, recordParameter).Compile();
This change is small but elegant: instead of creating multiple delegates and invoking them sequentially, we create a single block expression containing all the field writes, then compile it once. Now all fields are written in a single call when we invoke the delegate for each record, with no additional delegate overhead.
After making this change, Copilot automatically reruns the benchmarks to measure the improvement. The results show approximately 24% better performance in this run with the profiler. Our previously staged PR for CsvHelper shows ~15% better performance. The CPU profiler confirms that we’ve eliminated the delegate invocation overhead and instead of 40,000 delegate calls for our 10,000 records with 4 fields each, we now have just 10,000 delegate calls.
This is a meaningful win for a library that’s already heavily optimized. For applications writing large CSV files with many fields, this improvement translates directly to reduced CPU time and faster processing. And because CsvHelper has millions of downloads, this optimization benefits a huge number of users. From here I went ahead and staged the PR, though Copilot helpfully provides more follow-up prompts regarding the type conversion and ShouldQuote logic so that I could continue to improve performance further.
What makes this workflow powerful is the combination of precise performance data from the Visual Studio Profiler with the analytical and code generation capabilities of Copilot. Instead of manually digging through CPU traces and trying to understand what the hot paths mean, you can ask natural language questions, get actionable insights, and quickly test ideas.
The agent doesn’t just tell you what’s slow – it helps you understand why it’s slow and suggests concrete ways to fix it. In this case, it identified that delegate invocation overhead was the bottleneck and suggested the Expression.Block optimization, which is exactly the right solution for this problem. It even reran the benchmarks to confirm the optimization!
We’ve shown how the Copilot Profiler Agent can help you take a real-world project, identify performance bottlenecks through natural language queries, and make meaningful improvements backed by data. The measure/change/measure cycle becomes much faster when you can ask questions about your performance data and get intelligent answers. We’d love to hear what you think!
Generative AI has evolved faster than almost any technology in recent history. But what’s coming next (what many are calling Gen AI 2.0) marks a deeper shift. It's not just about bigger models or more impressive outputs. It's about AI that truly collaborates, intuitively adapts and integrates into how we work every single day.
While the first generation of generative AI focused on responding to isolated prompts, Gen AI 2.0 focuses on understanding your broader context, taking proactive action and building alongside humans in a much more dynamic way. For creators, teams and companies of any size, this represents a powerful new foundation for unprecedented innovation.
This blog explains Gen AI 2.0 in a way that’s easy to understand, whether you're completely new to AI or already experimenting with its current capabilities, and explores how it will reshape the way we create, design and work in 2026 and beyond!
Think of Gen AI 2.0 as the intelligent evolution of generative artificial intelligence. Instead of simply generating text, images, or code on a single request, these new systems can understand the broader context of what you're trying to achieve and execute multi-step tasks across different tools and digital environments. This functionality is driven by seamless collaboration, allowing the systems to adapt and refine their behavior over time, often by orchestrating multiple specialized AI agents working together.
Or think of it as moving from “AI that answers your questions” to “AI that actively works with you to achieve your goals.”
In practical terms, Gen AI 2.0 can become your ultimate co-pilot, helping you brainstorm, prototype, test, document, build and refine… all while keeping your human intention firmly at the center of the creative process.
We’re standing at a pivotal moment. While 2025 might feel like a transition year, 2026 is poised to be the year AI becomes operational infrastructure for everyone.
Gen AI 2.0 is at the heart of that shift because it allows:
This isn’t about replacing human talent. It's about massively expanding what humans can envision and achieve.
The most important idea behind Gen AI 2.0 is that the AI doesn’t just sit on the sidelines as a passive tool: it actively participates as a teammate.
Context-Aware Systems: Instead of treating every prompt like a disconnected request, Gen AI 2.0 tools deeply understand where you are in a workflow: What are your overall goals? What steps have you already taken? What should logically come next?
Actionable Intelligence: These systems don’t just respond with text; they can act. They can draft full documents, generate functional code, restructure complex data, trigger automations, build UI components, or call external APIs to get things done.
Multi-Agent Collaboration: This shift means you can imagine having multiple AI “teammates,” each with a unique specialization, all working together under your expert direction.
Human Intention at the Center: Crucially, the AI doesn’t replace your judgment or creativity. It enhances it. Your personal taste, ethical considerations, unique vision and critical decision-making remain the anchors of the entire process.
One of the biggest and most exciting cultural shifts in Gen AI 2.0 is what we call “vibe coding.”
Instead of painstakingly writing every line of code or crafting every design element from scratch, you simply describe the desired outcomes, your high-level goals, any constraints, and even the overall “feel” or “vibe” of a system. The AI then intelligently generates the foundational scaffolding, offers various alternatives, and iterates on your vision.
You don't need to know what every function does or how every pixel is placed. You just need to know what you want to achieve.
The AI helps you explore possibilities, test ideas, identify and fix issues and evolve your system. This dramatically lowers the barrier to participation, allowing more people to build meaningful products and solutions without needing deep technical backgrounds.
Gen AI 2.0 uniquely empowers creators and builders to tackle challenges that previously required entire departments and massive resources. This means you can now build a polished Minimum Viable Product (MVP) in days, not months. The system automatically handles tasks like generating documentation, testing scripts, design assets, and automating complex workflows across tools. This capability allows you to iterate on ideas with near-instant feedback and scale your production dramatically without proportional headcount growth.
For startups, this dramatically levels the playing field. For individual creators, it expands the boundaries of what’s creatively possible.
Gen AI 2.0 will fundamentally reshape 2026 in three major ways. First, hybrid workflows will become the default, embedding AI into the initial design phase. Second, we will see the emergence of truly AI-native products; applications built from the ground up assuming AI is doing part of the work. Finally, creativity will scale far beyond human limits, making ideas that once seemed "too big," "too ambitious," or "too complex" achievable.
At Synergy Shock, we believe the future belongs to creators who build with intention. We operate on a simple philosophy: always keep humans at the center, using Gen AI to amplify creativity rather than automate it away.
Our solutions are designed to treat your workflows as interconnected ecosystems, not isolated pipelines, ensuring every piece of AI enhances critical human judgment.
If you’re ready to face the complexity of Gen AI adoption, let’s talk. We are here to help you turn ambition into resilient, measurable action.