Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Can You Really Trust AI-Generated Code? - JSJ 699

AI is writing more of our code than ever before—but should we actually trust it? In this episode of JavaScript Jabber, I sat down with Itamar Friedman from Qodo (formerly CodiumAI) to dig into one of the biggest questions developers are wrestling with right now: What happens when AI is generating code, reviewing code, and shaping how we ship software?

We explore where AI fits into modern code review, whether developers should be worried about job security, and how human responsibility still plays a critical role—even in an AI-powered workflow. From guardrails and quality standards to the future of agent-driven development, this conversation goes beyond hype and gets into what’s actually working today (and what still needs a human in the loop).

AI isn’t replacing developers—it’s changing how we build, review, and take ownership of software. If you enjoyed this conversation, make sure to rate, follow, share, and review JavaScript Jabber. It really helps the show, and it helps more developers join the conversation. Thanks for listening—and we’ll see you next time!

Become a supporter of this podcast: https://www.spreaker.com/podcast/javascript-jabber--6102064/support.



Download audio: https://dts.podtrac.com/redirect.mp3/api.spreaker.com/download/episode/69190814/jsj_699.mp3

What Does 1 Trillion Web Pages Sound Like?


For this special holiday episode, we’re celebrating the Internet Archive’s milestone of 1 trillion web pages archived with something a little different: live music created just for the occasion.

Join us for conversations with composer Erika Oba, composer Sam Reider, and cellist Kathryn Bates of the Del Sol Quartet, recorded around The Vast Blue We, the concert held at the Internet Archive to honor our shared digital memory. Two new commissions premiered that night: Oba’s “Blue Lights” and Reider’s “Quartet for a Trillion,” both written to capture the wonder and scale of the open web—and brought to life by Del Sol Quartet. Oba later reconfigured “Blue Lights” for a solo performance during The Web We’ve Built celebration.

In this episode, you’ll hear brief conversations with the artists about their creative process, followed by recordings from the performance itself. A short, reflective holiday release that celebrates collaboration, imagination, and what we can build together.

Check out all of the Future Knowledge episodes at https://archive.org/details/future-knowledge





Download audio: https://media.transistor.fm/648d6bed/0776f765.mp3

Microsoft denies rewriting Windows 11 using AI after an employee’s “one engineer, one month, one million code” post on LinkedIn causes outrage


Microsoft told Windows Latest that the company has no plans to use AI to rewrite Windows 11 in Rust, a memory-safe programming language widely regarded as more secure than C and C++. The clarification did not come out of nowhere: a top-level Microsoft engineer had made bold claims about using AI to replace C and C++ with Rust.

“My goal is to eliminate every line of C and C++ from Microsoft by 2030. Our strategy is to combine AI *and* Algorithms to rewrite Microsoft’s largest codebases. Our North Star is ‘1 engineer, 1 month, 1 million lines of code,’” Galen Hunt, a top-level engineer at Microsoft, wrote in a now-edited LinkedIn post.

“Eliminate every line of C and C++ from Microsoft by 2030” strongly suggests that Microsoft’s top-level engineer, who is responsible for several large-scale research projects, is talking about products like Windows. For those unaware, most of the Windows API-level code, and even its kernel, is written in C, while C++ powers some of the apps.

I also took a screenshot of the LinkedIn post before the top-level Microsoft engineer edited it:

Screenshot of the original LinkedIn post before it was edited

Honestly, most people would not have taken this seriously if it did not come from a top-level Microsoft engineer. When someone with that kind of title and long history at the company talks about eliminating C and C++ and using AI to rewrite large codebases, it sounds less like a random idea and more like something Microsoft is at least exploring.

Moreover, the LinkedIn post repeatedly used “our,” which makes it fairly clear he was speaking on behalf of the company.

Following the outrage over plans to “eliminate every line of C and C++ from Microsoft by 2030,” Microsoft told Windows Latest that there are no such plans. Frank X. Shaw, who is a top-level executive and heads communications for Microsoft, also confirmed to Windows Latest that the company has no plans to rewrite Windows 11 using AI.

Galen Hunt, who originally claimed C and C++ are being replaced with Rust using AI, also updated his LinkedIn post with the following clarification:

“It appears my post generated far more attention than I intended… with a lot of speculative reading between the lines.. Just to clarify… Windows is *NOT* being rewritten in Rust with AI.

My team’s project is a research project. We are building tech to make migration from language to language possible. The intent of my post was to find like-minded engineers to join us on the next stage of this multi-year endeavor—not to set a new strategy for Windows 11+ or to imply that Rust is an endpoint.”

While Galen Hunt says people were “reading between the lines,” the reaction did not come out of nowhere. His post used very direct language about eliminating C and C++ by 2030 and using AI plus algorithms to rewrite large codebases, along with a “1 engineer, 1 month, 1 million lines of code” line.

In fact, the top-level engineer’s edited post still says his team would have “1 engineer, 1 month, 1 million lines of code.”

The original wording is what made it sound broader than a small research effort.

The post Microsoft denies rewriting Windows 11 using AI after an employee’s “one engineer, one month, one million code” post on LinkedIn causes outrage appeared first on Windows Latest


How to deploy and benchmark vLLM with GuideLLM on Kubernetes


To truly understand an LLM's real production potential, you have to measure the performance of its serving engine. The high-performance inference technology at the core of Red Hat AI is based on the vLLM open-source project, whose performance optimization techniques are key to achieving speed and throughput at scale.

This article offers Kubernetes users a comprehensive, step-by-step approach to manually deploy and test the inference capabilities of vLLM. We will deploy the community version of a containerized vLLM server on OpenShift (as our Kubernetes distribution of choice) using NVIDIA GPUs and use GuideLLM, a specialized performance benchmarking tool, to generate the metrics needed to validate its capabilities under load. 

GuideLLM is designed to run performance benchmarks against LLM inference servers. It is able to simulate multiple simultaneous users by sending requests concurrently at various rates. This allows us to understand how the vLLM server behaves under load, measuring critical metrics like request throughput, latency, and tokens per second to evaluate its suitability for production workloads. Please refer to this article or video to learn more about GuideLLM.
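To get a feel for the command shape before wiring it into a Kubernetes Job, here is a minimal sketch of running GuideLLM locally against an OpenAI-compatible endpoint. The localhost URL, token counts, and 60-second duration are illustrative placeholders, and any flags not shown fall back to their defaults; the full flag set is explained in Step 2.

# Minimal local smoke test (assumes GuideLLM is installed locally; see the pip install step in Step 3)
guidellm benchmark run \
  --target http://localhost:8000 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --data '{"prompt_tokens":256,"output_tokens":128}' \
  --rate-type concurrent \
  --rate 1 \
  --max-seconds 60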

Prerequisites

Before you begin, ensure you have the following:

  • An OpenShift or Kubernetes cluster (this guide uses OpenShift version 4.17.15).
  • A node with NVIDIA GPUs (this guide uses NVIDIA A100s).
  • The NVIDIA GPU Operator installed on your cluster. This operator acts as a bridge between the GPUs on your nodes and the OpenShift scheduler. It manages drivers and exposes the GPU resource that pods need to request GPU access.
  • The oc command-line tool configured to access your cluster.
    • Note for Kubernetes Users: This guide uses OpenShift commands (oc), but can be easily adapted for Kubernetes by replacing oc with kubectl. OpenShift-specific features like Routes will have Kubernetes alternatives noted throughout the guide.

Step 1: Deploy vLLM on OpenShift/Kubernetes

Note on Inference Server Options:

This guide demonstrates vLLM deployment for simplicity and broad applicability across Kubernetes environments. However, for enterprise production deployments, Red Hat recommends using Red Hat AI Inference Server, which offers an enterprise-grade and supported version of vLLM. Alternatively, users can leverage Red Hat OpenShift AI, which expands Red Hat AI Inference Server's capabilities into a full, end-to-end gen AI/MLOps platform for the hybrid cloud.

For AI Inference Server or OpenShift AI deployment instructions, refer to the AI Inference Server documentation and OpenShift AI documentation.

The GuideLLM benchmarking methodology demonstrated in Step 2 and beyond applies equally to both vLLM and AI Inference Server deployments.

First, we will need to set up a project and service account to deploy the vLLM server.

1. Create a project and service account.

oc new-project vllm-inference
oc create serviceaccount vllm-sa -n vllm-inference

# Kubernetes Equivalent: Replace 'oc' with 'kubectl'.

2. Create a PersistentVolumeClaim (PVC) for saving our models.

The vLLM server needs to download the model weights from Hugging Face. We'll create a PVC to store these models persistently.

vllm-pvc.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-models-pvc
  namespace: vllm-inference
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: <your-storage-class>

Note: Replace <your-storage-class> with a storage class available on your cluster. You can find available classes by running oc get sc (or kubectl get sc for Kubernetes). If the command returns nothing, you will first need to create a storage class.

Since we're using ReadWriteOnce (RWO) access mode with a single replica, local storage classes work well for this use case. If you plan to scale to multiple replicas, you'll need to change the accessMode to ReadWriteMany (RWX) and use an RWX-compatible storage class. Common storage class examples:

For Local/Block Storage (RWO):

  • LVMS (Logical Volume Manager Storage): lvms-vg1

  • OpenShift Data Foundation: ocs-storagecluster-ceph-rbd

  • Local storage: local-path

  • Cloud providers: gp3 (AWS), standard-rwo (GKE), managed-csi (Azure)

If you need multiple replicas, these are for Shared/Network Storage (RWX):

  • NFS: nfs-client or managed-nfs-storage

  • OpenShift Data Foundation: ocs-storagecluster-cephfs

  • Cloud providers: efs-sc (AWS), filestore-csi (GKE), azurefile (Azure)

Apply the manifest to create the PVC:

oc apply -f vllm-pvc.yaml

3. Create a Hugging Face Secret

Many models, like Llama 3.1, require authentication with a Hugging Face token, which you can generate from your Hugging Face account. Create a secret to store your token.

oc create secret generic huggingface-secret \
--from-literal=hf_token=<your-hugging-face-token> \
-n vllm-inference
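As a quick sanity check (not part of the original steps), you can confirm the secret exists before moving on:

oc get secret huggingface-secret -n vllm-inference
# Kubernetes equivalent: kubectl get secret huggingface-secret -n vllm-inference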

4. Define and deploy vLLM

Now, create the Deployment manifest. This will pull the latest vLLM container image, mount the PVC and the secret, and start the server. This manifest defines everything our vLLM pod needs to run successfully.

vllm-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-llama-8b
  namespace: vllm-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-llama-8b
  template:
    metadata:
      labels:
        app: vllm-llama-8b
    spec:
      serviceAccountName: vllm-sa
      containers:
      - name: vllm
        image: vllm/vllm-openai:v0.11.2
        env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: huggingface-secret
              key: hf_token
        - name: HOME
          value: /models
        - name: HF_HOME
          value: /models/.cache
        - name: FLASHINFER_WORKSPACE_DIR
          value: /models/.cache/flashinfer
        command: ["/bin/sh", "-c"]
        args:
          - "python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.1-8B-Instruct --download-dir /models --tensor-parallel-size 1 --max-model-len 2048"
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: 1
        volumeMounts:
        - name: dshm
          mountPath: /dev/shm
        - name: model-storage
          mountPath: /models
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory
      - name: model-storage
        persistentVolumeClaim:
          claimName: vllm-models-pvc

Deploy the vLLM server:

oc apply -f vllm-deployment.yaml
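The first startup can take several minutes because the model weights are downloaded into the PVC. A couple of standard commands (an optional convenience, not required by the walkthrough) help you watch progress:

# Watch the vLLM pod start up; the first run downloads the Llama 3.1 8B weights (roughly 16 GB)
oc get pods -n vllm-inference -w

# Follow the server logs until the OpenAI-compatible API reports it is listening on port 8000
oc logs -f deployment/vllm-llama-8b -n vllm-inference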

5. Expose the service

To allow other applications (like our guidellm benchmark job) to access the vLLM server, we need to create a service.

oc expose deployment vllm-llama-8b --port=8000 --name=vllm-service
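You can verify that the Service was created and has a ready endpoint behind it (standard commands, shown here as an optional check):

oc get svc vllm-service -n vllm-inference
oc get endpoints vllm-service -n vllm-inference
# The ENDPOINTS column should show the vLLM pod IP on port 8000 once the pod is Ready.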

6. Create a route for external access (optional for testing)

This step creates an external route so you can test the vLLM deployment from outside the cluster using curl or other tools.

oc expose service vllm-service --name=vllm-route

Kubernetes alternative:

# For vanilla Kubernetes, Routes are not available. Instead, use one of these options:
a) LoadBalancer Service (if your cluster supports it):
   kubectl patch service vllm-service -p '{"spec":{"type":"LoadBalancer"}}'
b) NodePort (for testing):
   kubectl patch service vllm-service -p '{"spec":{"type":"NodePort"}}'
c) Ingress resource (requires an Ingress controller to be installed); see the sketch below
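For option (c), a minimal Ingress sketch might look like the following; the nginx ingress class and the vllm.example.com host are placeholders you would replace to match your ingress controller and DNS:

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vllm-ingress
  namespace: vllm-inference
spec:
  ingressClassName: nginx        # placeholder: use the class name of your ingress controller
  rules:
  - host: vllm.example.com       # placeholder hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: vllm-service
            port:
              number: 8000
EOF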

Important note on benchmarking:

While we're creating a route for external testing, our GuideLLM benchmark job will use the internal service endpoint (http://vllm-service.vllm-inference.svc.cluster.local:8000) instead. This ensures accurate performance metrics by avoiding external network latency and ingress overhead that could skew the results. Benchmarking from inside the cluster provides true application-to-service performance measurements.
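If you want to confirm the internal endpoint is reachable before launching the benchmark, one optional check (not part of the original steps) is a throwaway curl pod that queries vLLM's model listing endpoint:

# Runs a one-off pod inside the cluster, curls the internal service, then removes the pod
oc run curl-check --rm -it --restart=Never --image=curlimages/curl -n vllm-inference -- \
  curl -s http://vllm-service.vllm-inference.svc.cluster.local:8000/v1/models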

7. Test your deployed model by using curl

To verify that your model is deployed and accessible, send a chat completion request from your terminal using curl.

curl http://<your-route>/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "messages": [
    {"role": "user", "content": "What is Red Hat?"}
  ],
  "temperature": 0.1
}'
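If you skipped the route, a port-forward gives you an equivalent check without exposing the service externally (an optional alternative; the forward stays active until you press Ctrl+C):

# Forward local port 8000 to the vLLM service
oc port-forward svc/vllm-service 8000:8000 -n vllm-inference

# In a second terminal, run the same curl against localhost
curl http://localhost:8000/v1/models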

Step 2: Run GuideLLM as a Kubernetes job

With our vLLM instance running, follow these steps to launch a guidellm job.

1. Create a PVC for storing benchmark results.

Just like the vLLM pod, the job's pod is ephemeral and will be deleted after it completes. We need a separate PVC to persistently store the output JSON report from the benchmark.

guidellm-pvc.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: guidellm-results-pvc
  namespace: vllm-inference
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: <your-storage-class>

Apply the manifest:

oc apply -f guidellm-pvc.yaml
# Kubernetes Equivalent: Replace 'oc' with 'kubectl'.

2. Define and run the GuideLLM job

This manifest defines the benchmark job. Before we look at the complete manifest, let's understand the key GuideLLM benchmark flags we'll be using:

Understanding the benchmark configuration:

  • --target: The endpoint URL of the inference server to benchmark. We use the internal Kubernetes service DNS name for cluster-internal communication.

  • --model: The model ID to benchmark. Must match the model deployed in your vLLM server.

  • --processor: The tokenizer used to calculate token counts for statistics and synthetic data generation. Typically the same as the model ID.

  • --data: Defines the benchmark request shape. We use a JSON config specifying synthetic data with 1000 prompt tokens and 1000 output tokens per request.

  • --rate-type: The benchmark mode. "concurrent" maintains a fixed number of simultaneous requests. Other options include "poisson" (requests per second), "synchronous" (one at a time), and "sweep" (automatic load testing).

  • --rate: For concurrent mode, this specifies the number of concurrent users to test. In the example below, "1,2,4" means we'll run three separate benchmarks with 1, 2, and 4 simultaneous requests.

  • --max-seconds: Maximum duration for each benchmark iteration (300 seconds = 5 minutes per rate level).

  • --output-dir: The directory to save the JSON result files inside the container.

  • --outputs: The output files to create (json, csv, html).

guidellm-job.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: guidellm-benchmark-job
  namespace: vllm-inference
spec:
  template:
    spec:
      containers:
      - name: guidellm
        image: ghcr.io/vllm-project/guidellm:v0.5.0
        env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: huggingface-secret
              key: hf_token
        - name: HOME
          value: /results
        - name: HF_HOME
          value: /results/.cache
        command: ["guidellm"]
        args:
        - "benchmark"
        - "run"
        - "--target"
        - "http://vllm-service.vllm-inference.svc.cluster.local:8000"
        - "--model"
        - "meta-llama/Llama-3.1-8B-Instruct"
        - "--processor"
        - "meta-llama/Llama-3.1-8B-Instruct"
        - "--data"
        - '{"prompt_tokens":1000,"output_tokens":1000}'
        - "--rate-type"
        - "concurrent"
        - "--max-seconds"
        - "300"
        - "--rate"
        - "1,2,4"
        - "--output-dir"
        - "/results"
        - "--outputs"
        - "benchmark-results.json,benchmark-results.html"
        volumeMounts:
        - name: results-volume
          mountPath: /results
      volumes:
      - name: results-volume
        persistentVolumeClaim:
          claimName: guidellm-results-pvc
      restartPolicy: Never
  backoffLimit: 1

Run the job:

oc apply -f guidellm-job.yaml

You can monitor the status of the job by running oc get pods -w. Wait for the guidellm-benchmark-job pod to change its status from Running to Completed. Once it completes, you can view the benchmark output with oc logs <guidellm-pod-name>. When you are done, delete the job with oc delete job guidellm-benchmark-job.

Step 3: Retrieve the benchmark results

Once the job is complete, the results will be saved in the guidellm-results-pvc. To retrieve the files, we can create a temporary "helper" pod that mounts the same PVC, and then use oc cp to copy them to our local machine.

1. Create the helper pod: 

pvc-inspector-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: pvc-inspector
  namespace: vllm-inference
spec:
  containers:
  - name: inspector
    image: registry.access.redhat.com/ubi8/ubi
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: results-storage
      mountPath: /mnt/results
  volumes:
  - name: results-storage
    persistentVolumeClaim:
      claimName: guidellm-results-pvc

Apply it:

oc apply -f pvc-inspector-pod.yaml

2. Copy the results:

oc cp pvc-inspector:/mnt/results/benchmark-results.json ./benchmark-results.json 
oc cp pvc-inspector:/mnt/results/benchmark-results.html ./benchmark-results.html 

You will now have a benchmark-results.html and a benchmark-results.json file on your local machine with the detailed performance metrics from your vLLM server.
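Once the files are copied, the temporary helper pod is no longer needed and can be removed (a small cleanup step not included in the walkthrough above):

oc delete pod pvc-inspector -n vllm-inference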

3. View the GuideLLM results UI in a web browser.

Open the downloaded HTML file in a web browser to view the GuideLLM UI, which provides an interactive report with tables and visualizations of the benchmark results (Figure 1).

Figure 1: Workload details in the GuideLLM UI.

Figure 2 displays a summary of the metrics in the GuideLLM UI.

Figure 2: Summary of the metrics in the GuideLLM UI.

Figure 3 shows the latency in the GuideLLM UI.

Figure 3: Latency metrics in the GuideLLM UI.

Figure 4 shows the time and throughput metrics.

Figure 4: Time and throughput metrics in the GuideLLM UI.

4. Redisplay the benchmark results.

GuideLLM also provides a convenient way to redisplay the results from the saved JSON file using the from-file command. You will need to install GuideLLM on your local machine using pip as follows:

pip install guidellm[recommended]==0.5.0

guidellm benchmark from-file ./benchmark-results.json

This command will parse the JSON output and display the result tables as shown in Figure 5.

Figure 5: GuideLLM benchmark result tables.

Figure 6 shows more output.

Figure 6: Additional GuideLLM benchmark result output.

The output includes five main tables:

  1. Run summary info table: This table shows metadata about each benchmark run.

  2. Text metrics statistics table: This table offers a detailed breakdown of the text-related statistics for each benchmark run. It details the input and output token, word, and character statistics on a per-request and per-second basis.

  3. Request token statistics table: This table provides a statistical summary of the input, output and total token counts per request for each benchmark. 

  4. Request latency statistics table:

    1. Request latency: Median and p95 end-to-end latency in seconds - the total time from request to completion

    2. TTFT (Time to First Token): Median and p95 in milliseconds - measures how quickly the model starts generating a response after receiving a request. Lower is better and critical for interactive applications.

    3. ITL (Inter-Token Latency): Median and p95 in milliseconds - the time between consecutive tokens during generation. Consistent low ITL provides smooth streaming experiences.

    4. TPOT (Time Per Output Token): Median and p95 in milliseconds - the average time to generate each subsequent token. Lower values mean faster generation.

  5. Server throughput statistics table: This table displays the throughput performance metrics for each concurrent load level.

    1. Input Tok/sec: Input tokens processed per second. Higher values indicate better throughput.

    2. Output Tok/sec: Output tokens generated per second. Higher values indicate better throughput.

    3. Total Tok/sec: This is the total tokens (input + output) processed per second (a measure of overall system throughput).

Final thoughts

We have demonstrated a comprehensive benchmarking process for evaluating the production potential of large language models on OpenShift. By combining vLLM and GuideLLM, organizations can measure critical, real-world metrics such as request throughput and latency, which are essential for achieving enterprise-grade readiness.

To continue your journey, dive deeper into GuideLLM and its capabilities by watching this technical overview. Explore an enterprise-grade, supported solution for your production workloads by reviewing the official documentation for Red Hat AI Inference Server.

The post How to deploy and benchmark vLLM with GuideLLM on Kubernetes appeared first on Red Hat Developer.


Build Composable Storefronts Smarter and Faster with the PWA Kit MCP Server


Have you ever tried creating a composable storefront and found yourself scratching your head because everything is new? Building a composable storefront can be a time-consuming experience, requiring you to research the technology, understand the architecture, hunt down the docs, and figure out the development process. Now, the new PWA Kit Model Context Protocol (MCP) Server makes these development tasks easier and faster.

In this blog post, we’ll take a deep dive into the new PWA Kit MCP Server, which is currently in Developer Preview, and walk you through how to create a storefront, add a category list component to a new page, and then view the results in the browser.

Overview of the PWA Kit MCP Server

With the PWA Kit MCP Server, you can create a composable storefront, customize it with a page and a component, discover API hooks, figure out how to use APIs, and more. This server offers step-by-step guidance and hands-on support directly in your workspace, so you can confidently build your first component without getting lost before you even begin. 

You can use the PWA Kit MCP Server with an agentic integrated development environment (IDE), such as Cursor, AI agents, or agentic platforms, such as Claude, Codex, and Google Antigravity.

Composable storefronts are based on the Progressive Web App (PWA) Kit for web development. When building a storefront, the PWA Kit MCP Server provides you with a range of benefits, including:

  • Reduced time-to-market: By standardizing development workflows and providing AI-assisted code generation, development teams can accelerate storefront deployment cycles and get products to market faster
  • Lower development costs: The automated testing, best practice guidance, and code generation capabilities can reduce the need for extensive manual testing, code reviews, and troubleshooting, potentially lowering overall development expenses
  • Enhanced code quality and compliance: Automated accessibility and performance testing, combined with standardized development patterns, can improve overall code quality and help ensure compliance with web standards and accessibility requirements

Example use case

One of our partners wanted to generate an FAQ page using Cursor with the PWA Kit MCP Server configured. After they specified the requirements, the Cursor agent ran various MCP tools to generate the entire code for their new page.

Here’s how they described their experience:

“The tools we have used since release, the create component and create page tools of the PWA Kit MCP Server, are very helpful. They are able to identify the correct props and capture requirements as per prompt. We have tried very different components with the component tool, and it was able to give most things that are necessary. Those two tools really help with development.”

Karan Rajpara, Software Developer, The Commerce Team Global (a Salesforce Partner)

Before we dive deeper into the PWA Kit MCP Server, let’s first establish a quick understanding of the basics.

What is MCP, and how is it used with an agentic IDE?

MCP, or Model Context Protocol, is an open, standardized protocol that expands what a large language model (LLM) can do by giving it domain-specific knowledge and real, actionable capabilities beyond its defaults. When integrated with an AI agent, an MCP server allows the AI to connect to data sources, tools, and workflows, enabling the agent to access key information and perform domain-specific tasks. This transforms the AI agent into an intelligent copilot with the knowledge and capabilities of a specific external system.

The PWA Kit MCP Server is a local Standard Input/Output (STDIO) MCP server that runs within a developer’s local IDE and uses the local machine’s standard input and output streams. It offers intelligent tools that optimize your storefront developer experience and reduce development time and complexity. 

The PWA Kit MCP Server provides AI-assisted code generation and assistance, built-in best practices, and performance and accessibility testing — all within your development environment. Out of the box, it provides a set of MCP tools designed to streamline and optimize composable storefront development, helping you to quickly create, customize, and manage your storefront apps with confidence. 

Diagram showing the architecture of the PWA Kit MCP Server and the MCP host app

PWA Kit MCP Server tools: Overview and capabilities 

The PWA Kit MCP Server offers intelligent tools that are tailored to composable storefront development.

  • pwakit_create_storefront: Guides agents and developers in creating a PWA Kit project with @salesforce/pwa-kit-create-app.
  • pwakit_create_page: Generates a new composable storefront page with custom routing and components.
  • pwakit_create_component: Generates a new React component for the composable storefront.
  • pwakit_get_dev_guidelines: Provides best practices and guidance for building composable storefronts.
  • pwakit_recommend_hooks: Identifies and integrates hooks that solve specific use cases.
  • pwakit_run_site_test: Runs performance and accessibility audits on a provided site URL. Example: https://pwa-kit.mobify-storefront.com.
  • pwakit_install_agent_rules: Adds an agent guidelines rule file to your project that enables AI to make better use of the PWA Kit MCP Server.
  • pwakit_explore_scapi_shop_api: Explores and documents the B2C Salesforce Commerce API (SCAPI) endpoints, parameters, and usage examples.
  • scapi_custom_api_discovery: Discovers custom SCAPI APIs registered in Business Manager, and fetches the schema of those APIs. Requires that you set up credentials.

 

GitHub repo for the PWA Kit MCP Server source code 

If you want to inspect the source code of the PWA Kit MCP Server, or customize the server, check out the pwa-kit-mcp folder in GitHub. To point to the local PWA Kit MCP Server, update your MCP settings as outlined in the Development section of the GitHub README.

End-to-end demo: Create a storefront and add a page and a component

Ready? Let’s explore the power of using generative AI to accelerate your storefront development. We’ll use the PWA Kit MCP Server to create a storefront, add a category list component to a new page, and then view the results in the browser.

Configure PWA Kit MCP Server settings in Cursor

First, let’s configure the MCP settings in Cursor to add the PWA Kit MCP Server. The settings JSON looks like this:

{
  "mcpServers": {
    "pwa-kit": {
      "command": "npx",
      "args": ["-y", "@salesforce/pwa-kit-mcp"],
      "env": {
        "PWA_STOREFRONT_APP_PATH": "{{path-to-app-directory}}"
      }
    }
  }
}

 

The {{path-to-app-directory}} value is a placeholder for the app subfolder path, which we can fill after creating the storefront.

After the MCP server settings are saved, Cursor performs these actions:

  • Launches the PWA Kit MCP Server
  • Connects to the PWA Kit MCP Server as a client
  • Displays the available tools in the UI and how you can invoke them

Cursor settings showing the available tools for the PWA Kit MCP Server

You can return to the MCP tools in Cursor anytime to enable or disable specific tools or servers.

Create a storefront with the PWA Kit MCP Server

Now that the MCP settings are set up, let’s get started with using the PWA Kit MCP Server to create a storefront. In a new chat window, we’ll type this prompt:

Using the pwa-kit MCP server, create a storefront

It gives us the option of creating a storefront using the demo Commerce Cloud instance or a custom instance. We’ll choose the demo instance to speed things up. Cursor creates a new storefront project in our workspace with all the files based on the Retail React App template. 

The MCP tool creates a new storefront project in the Cursor workspace based on the Retail React App template

Remember the MCP settings earlier that contain a placeholder for the app path? We’ll replace the {{path-to-app-directory}} placeholder with the app subfolder’s path, which in our case, looks like this:

/Users/<username>/project-folder/demo-storefront/overrides/app

Next, let’s run and view the storefront in the browser. We’ll type these commands in the chat and run them:

cd <path-to-project>/demo-storefront && npm start

A browser window opens and shows our storefront at the default address of http://localhost:3000/.

Now, we’ve finished creating the storefront. Yay!

 


 

Create a component for product categories

The storefront shows merchandise from the demo instance. Let’s now customize it by adding a new page and a component that displays categories for products. We’ll first tell Cursor to create the component.

Create a component for displaying a list of product categories

Using the MCP tools, Cursor does some research to find the hook that gets product categories in the commerce-sdk-react package. It then creates the ProductCategoryList component and a test for it.

The MCP tool in Cursor creates the ProductCategoryList component, which is shown in the index.jsx file

Create a page that displays the new component

Next, we’ll instruct Cursor to create a new page and add the component to it.

Create a new page called shop-by-category with the title “Shop By Category”. Add the ProductCategoryList component to it.

Using the MCP tool, Cursor creates a new page and automatically adds a route — /shop-by-category — for the new page. Let’s view the new page and component in the browser. Navigate to http://localhost:3000/shop-by-category, and tada! We can now browse products by categories using the tiles on that new page.

Screenshot of a browser showing the product categories on the new page

Conclusion

With the PWA Kit MCP Server, you can tackle many storefront development tasks by writing just a few prompts. In addition, you can use Cursor’s AI capabilities to troubleshoot issues with your storefront code, add tests, and use MCP tools to explore hooks and the Salesforce B2C Commerce API (SCAPI), among other things. The MCP server is your friend for your composable storefront development projects!

Important Note: The PWA Kit MCP Server is currently available as a Developer Preview. Salesforce will announce its General Availability in documentation, press releases, or other public statements. All commands, parameters, and other features are subject to change or deprecation any time, with or without notice. As such, we recommend that you do not implement functionality in production with these commands or tools.

Resources

About the authors

Wei Liu is a Principal Engineer at Salesforce specializing in innovative storefront development solutions for Commerce Cloud. She focuses on elevating developer experience, building scalable architectures, and advancing next-generation commerce tooling. You can follow Wei on LinkedIn.

Katia Hage is a Lead Technical Writer at Salesforce. Katia writes documentation for storefront solutions for Commerce Cloud. Previously, Katia also covered documentation and Trailhead for the Salesforce Platform and Data Cloud.

The post Build Composable Storefronts Smarter and Faster with the PWA Kit MCP Server appeared first on Salesforce Developers Blog.




The Model Context Protocol's impact on 2025
