
Give Claude AI Full Access to Your Local Filesystem With MCP


We’ve done quite a bit of talking about Model Context Protocol (MCP), an open source standard that streamlines how AI models interact with APIs, but with Claude Desktop you can really see it in action. As a developer, much of your workflow lives on your laptop’s file system. In terms of reach, you would typically focus your development work in Claude Code and ask simple queries via the web or on mobile. But the Desktop product can be made aware of files on your machine, which has benefits that I will explore in this post.

Claude Desktop is a bit smarter today than when it first appeared. For a start, you no longer need to write the server code yourself. Claude Desktop provides prebuilt connectors to other services, although the basic pattern is to use an LLM to find documents and transform the information within them in a useful way. What we always want is a system that the LLM understands, so that we can just let it figure things out.

Installation and Initial Setup of Claude Desktop

I downloaded the 200MB .dmg for my M4, and installed it:

I signed in and was immediately moved to a web page. These pages were managed properly by the app, but you can see why users are wondering why they are using an app that immediately wants to use the web.

Next we see some immediate and useful hints about using Claude on your actual desktop, where it will be available when you need it:

I’m not sure hijacking the caps lock is a great plan — but if you think of your Mac as a dictation device, perhaps we won’t be typing much at all in the future?

I was then given the option to “update” to Claude Opus 4.5, which I took. But remember we aren’t necessarily focusing on straight coding for this app, so other models might be more suitable for your use case.

Understanding Payment Plans and Integrations

You then get confronted with the payment plans (although without showing usage limits), and naturally there is a free plan. I might already be part of a plan, but this interface didn’t tell me. According to Anthropic Console (now Claude Console) I might be on the API plan and have some credit. One day all this confusion will be gone, but for now we are in the innovation storm of the token economy.

Before I move on from payment plans, the first thing I noticed was this small discreet hint:

Now, I do want to check the Slack integration. A few years ago, I tried Slack integration with a web tool and it was fairly tricky. Unfortunately, the Slack connector appears to be available only on certain subscription tiers, for Claude Team and Enterprise plan customers who have installed Claude in the Slack app. This might well be a sensible business offering.

Configuring the Fileserver MCP for Local File Access

But we are not limited to preconfigured connectors; we can roll our own. Claude recognises skills, and we can make MCP servers to talk to our local apps. This is actually a good reason to use Opus 4.5. You will need to be on a paid plan to use skills, too. But our ambition here is just to let Claude work with the files on your local drive.

Right now, I don’t have any connectors up and running. So let’s check whether Claude can see our local files:

In order to change this, we are going to change the settings to allow the Fileserver MCP to fiddle with a limited set of files. At some point the server will need Node.js, so open a terminal and make sure you can do this:
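What we are confirming is simply that Node.js (and with it npx) is installed and on your path; a minimal version of that check, assuming a standard Node install, looks like this:

# Confirm Node.js and npx are available from the terminal
node --version   # prints something like v22.x.x
npx --version    # npx ships with Node, so any version output means you are fine

If neither command is found, install Node.js first (for example, from nodejs.org or via Homebrew).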

Now we will work with Claude Desktop’s settings directly (through the App’s Mac menu, or from the App settings) and go to the Developers section:

Hitting “Edit Config” will open up the MCP config JSON file. You can see that mine is empty:

Remember, Claude Desktop is the “host” and the Fileserver is an MCP server that Claude can call. So you want to have something like this:

{ 
  "mcpServers": { 
    "filesystem": { 
      "command": "npx", 
      "args": [
        "-y", 
        "@modelcontextprotocol/server-filesystem", 
        "/Users/eastmad/Downloads" 
      ] 
    } 
  } 
}


I’m eastmad but you probably are not; so obviously, use your name instead. I’ve given the server access to my Downloads folder to underline that this works in general — and you can carry on adding specific directories in that array. You can see we are using npx (hence you need Node) and this server is known as “filesystem.” This will give Claude full access, as I’ve explained — though confirmation will be required.
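For example, to also expose a projects folder you would just extend that array; the second path below is purely illustrative, so substitute your own directories:

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/eastmad/Downloads",
        "/Users/eastmad/Projects"
      ]
    }
  }
}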

OK, this configuration is read when the app starts, so close Claude Desktop and restart it.

Immediately on restart, Claude Desktop knows a little bit more:

But does Claude know what it knows? Let’s try that initial question again. Go back to that query and refresh it:

Once you allow the action:

That’s great, although it doesn’t mention writing to the disk, so let’s check it can:

Finder assures me this is no hallucination:

And yes, a classic haiku is not only 5-7-5 syllables, it should also mention a season.

Security Implications and Permissions

Now before you start witnessing the firepower of your fully armed and operational MCP server, make sure you understand the security implications. We are only working locally, but we are still sending information back and forth from Anthropic, so ensure that documents you read don’t tie identity information to any sensitive stuff. You can also give Claude access to other tools to allow it to work locally — indeed, this is where the true power lies.

At the moment, Claude will ask your permission for each access request. When I asked Claude to read the details of one email file, it stopped twice to check permissions.

Once you learn a bit more, you will be able to bypass these checks (if you truly wish to do so).

Final Thoughts on Using Claude With MCP

Claude Desktop can use a range of prebuilt MCP servers to connect to certain services — but remember, you will still have to engineer the permissions, credentials and any other admin things with third parties.

Payment plan and token usage systems remain murky, even though actual transactions are perfectly transparent. On the one hand, it is obvious that these systems will become federated and streamlined over time — although when that happens, the token economy may well be targeted for national taxation. So mild chaos does have some advantages for now.

The post Give Claude AI Full Access to Your Local Filesystem With MCP appeared first on The New Stack.


How AI Agents Are Changing the Way Developers Build Software


At this year’s Web Summit, one of the main topics was AI agents and the ways they are reshaping the world. Although a Stack Overflow survey showed that developers remain skeptical and agentic AI has not yet fully entered the mainstream, tech companies are firmly betting on this technology and its potential to shape the future across a wide range of fields.

Krešo Žmak, VP of Products at Infobip, held a roundtable on the first day, just before the conference opening, on the topic: From SaaS to Agents: How Personal AI Will Redefine Platforms and User Experience.

To bring you firsthand insights on this topic, we spoke with Krešo. In the interview that follows, you’ll learn everything you need to know about how AI will transform software-as-a-service as we know it and make the user experience truly extraordinary.

Why is data the key barrier to AI-powered platforms?

Krešo Žmak identified data as the biggest technical barrier in transitioning from a traditional SaaS model to agent-driven platforms.

“The biggest challenge lies in the data,” he said. “It involves combining different sources, managing access, hierarchy, structure, governance, availability, throughput, and so on, because AI is all about data.”

He contrasted deployment speed with real productivity:

You can deploy an agent in minutes by writing a prompt. That’s the initial step. But the real struggle begins when you try to make the agent productive. The first obstacle you face is how to access the data.

Even with modern tools, an issue remains: MCPs simplify access to backend and legacy systems, but the data behind those MCPs or APIs often stays unstructured and undocumented, and that is where the biggest challenge lies.

Architecting CPaaS for the agent-to-agent world

Krešo emphasized that AI “heavily affects the software industry”. He explained that CPaaS architectures must evolve without requiring a complete redesign.

“In the short to mid term, channels will remain the primary way of communication,” he said, “because business agents will continue to interact with end users”.

He emphasized the need for CPaaS platforms to introduce new interfaces that support agentic capabilities – such as MCP or agent-to-agent connections – to integrate agentic solutions directly into the CPaaS stack. In his view, MCP adoption is already increasing significantly compared with legacy API integrations.

Krešo described the long-term paradigm shift:

In the future, brands and end users will communicate through agents talking with personal agents. Communication will move from machine-to-person to machine-to-machine.

He predicted this revolution will become mainstream by 2028, though rapid AI innovation could accelerate it to next year.

Modern AI products can’t skip MCPs or agent talk

Krešo highlighted MCP and agent-to-agent interfaces as mandatory technologies for modern AI-first products.

CPaaS companies must expose APIs through MCP interfaces and build agents that handle parts of the CPaaS workload, such as onboarding, analytics, or messaging.

He clearly described this transition as the future, where agents will communicate with other CPaaS agents or consume MCP interfaces. This shift is already reshaping the CPaaS landscape.

LLMs need infrastructure to be reliable

Furthermore, Krešo argued that the core problem with LLMs doesn’t lie in the technology itself but in how people use it, particularly when it comes to hallucinations.

He explained that if you rely on an LLM directly, it will hallucinate and produce incorrect answers. He advised enterprises to build infrastructure around LLMs:

You need to wrap LLMs with RAG pipelines, enforce guardrails, and process outputs to check for compliance. You must anchor the LLM to the context it serves. An enterprise solution should only respond within the brand’s context.

Finally, he noted that LLMs require integration because an LLM cannot execute actions on its own. To make it useful, you must connect it to the enterprise backend system.

The future of software is a hybrid “chat-with-your-app” model

Krešo confirmed the rise of the “chatting with software” paradigm in modern user experience design. He explained that users increasingly activate services, check analytics, or launch campaigns by typing prompts or interacting conversationally with interfaces.

He predicted a hybrid model in which non-deterministic LLM interactions are paired with deterministic UI elements. Inside a portal, users may chat to trigger actions while still receiving structured analytical graphs and predictable outputs.

The future lies in combining classical web interfaces, UIs, and prompting to enhance capabilities.

AI-powered personalization means knowing users inside out

Krešo explained that AI significantly augments and accelerates business processes and provides much greater capability for processing and structuring data, which drives personalization. He clarified that true personalization is not about surface-level details but about understanding the full context of a user’s interaction, whether they are reaching out about a campaign, an invoice, or something entirely different.

He added that AI enables the summarization of large volumes of data by connecting various touchpoints between a brand and a user and turning them into meaningful insight far beyond classical personalization models.

Personalization means knowing the full context of the user, not just who they are.

Strong engineering fundamentals are crucial

Krešo urged developer teams to focus on core engineering knowledge rather than transient AI tools. He emphasized the importance of architectural skills:

Understanding design principles and architecture will become increasingly important. Developers must know how to build applications, how data connects to the user interface, and how the backend processes it.

He concluded that AI tools can accelerate development, but only engineers with strong fundamentals can use them to build robust solutions quickly.

The post How AI Agents Are Changing the Way Developers Build Software appeared first on ShiftMag.


Ignite Spotlight: Copilot Pages Agent Mode & Notebooks


Introduction

Microsoft Ignite has set the stage for the next era of AI-native work with Agent Mode in Copilot Pages. This breakthrough turns Copilot into a proactive, context-aware agent that doesn’t just respond—it manages entire workflows, keeps context across tasks, and drives collaboration without constant user input. In this post, created using Agent Mode itself, we’ll explore why this matters for organizations and how it fits alongside other Ignite innovations like Page History, Copilot Notebooks sharing, and Copilot Shortcuts.

  1. Introduction
  2. Agent Mode in Copilot Pages
    1. Why Executives Should Care
    2. Use Case: Writing This Blog Post with Agent Mode
  3. Version History in Copilot Pages
  4. Copilot Notebooks Sharing and other capabilities
    1. How to Invite People to a Notebook
    2. New Notebook Capabilities
  5. Copilot Shortcuts
    1. Available Shortcuts:
    2. Tips for Using Shortcuts:
  6. Conclusion

Agent Mode in Copilot Pages

Agent Mode is more than a feature—it’s a new way of working. Here’s what makes it powerful:

  • Persistent Context: Copilot retains instructions and applies them across tasks.
  • Multi-Step Workflow Execution: From ideation to final output, Copilot handles transitions without manual re-entry.
  • Real-Time Collaboration: Teams can co-author while Copilot manages structure and formatting.
  • Full Editing Flexibility: You and your colleagues can always edit the content in the page manually and also ask Copilot to make edits at any time, ensuring complete control and adaptability.

Note: At this point, you need to ensure GPT-5 is ON for Agent Mode to work. If you try anything too complex, Copilot may not create the page. However, you can create the page manually (the old way) from Copilot’s result and then use Agent Mode to edit its contents.

Imagine starting with: “Create a page for onboarding digital employees to the organization.” Copilot creates a structured page, organizes sections, and iterates based on feedback—all in one continuous flow.

Why Executives Should Care

  • Faster decision cycles.
  • Scalable collaboration.
  • Built-in governance and compliance.
  • Significant efficiency gains.

Use Case: Writing This Blog Post with Agent Mode

This very article is an example of Agent Mode in action:

  • Step 1: Ideation in Copilot Chat – We started by asking Copilot to draft a LinkedIn post about Agent Mode.
  • Step 2: Expand to Blog – Using Agent Mode, Copilot transformed the post into a full blog with structured sections.
  • Step 3: Add Ignite News – Copilot integrated additional announcements like Page History and Notebook Sharing.
  • Step 4: Iterate and Refine – We requested edits, added headers, and expanded explanations—all without losing context.

Learn more about Copilot Pages and Agent Mode here.


Version History in Copilot Pages

Copilot Pages are built for iteration, and Page History gives you confidence and control at every step—whether editing manually, with Copilot, or collaboratively.

  • Recover previous versions in seconds if something is deleted or overwritten.
  • Copy specific sections from older versions without overwriting current work.
  • Available today for all Microsoft 365 Copilot app users on web, Windows desktop, and Mac desktop.

How to Use Page History:

  1. Open your Copilot Page.
  2. Click Page History in the toolbar.
  3. Browse previous versions and select the one you need.
  4. Choose Restore Entire Page or manually copy content.

Note: Page History does not take continuous snapshots of your page—it only saves versions periodically. For immediate changes, you can and should still use Undo (Ctrl+Z) to revert to the previous moment. Use Page History when you need to restore or copy content from an earlier version beyond what undo can provide.


Copilot Notebooks Sharing and other capabilities

Collaboration just got easier. You can now share Copilot Notebooks with colleagues for brainstorming, planning, and experimentation.

  • Control access with Microsoft 365 permissions.
  • Share securely across your organization.

License Requirement: Copilot Notebooks is available for users with a Microsoft 365 Copilot license and requires access to OneDrive or SharePoint for storage.

How to Invite People to a Notebook

  1. Open your Copilot Notebook.
  2. Click the Share icon in the top-left corner of the notebook interface.
  3. In the sharing panel, type the name or email address of the person you want to invite in the input box.
  4. Click Invite to send the invitation.

What Happens Next:

  • People invited to the Notebook will receive an email and can open the Notebook.
  • All notebook content and links are shared, while chats and audio overviews remain private.
  • Under People, you’ll see the list of members with their roles (e.g., Owner, Editor).
  • Use the dropdown next to each name to remove them from the notebook.

Tip: If you don’t see the new sharing UI, press Shift + F5 to refresh the page and load the latest interface.

Learn more here.

New Notebook Capabilities

  • You can now create a PowerPoint presentation or Word document directly from your Notebook using the Create button in the top-right corner. This makes it easy to turn ideas into polished deliverables without leaving the Notebook.
  • The new UI also includes an Overview section (currently in Frontier preview) that generates a read-only summary and key insights of the Notebook contents.

Tip: If you are looking for how to create the Audio Overview, look where you add content; that is where you can create the Audio Overview or a Study Guide.


Copilot Shortcuts

Located at the bottom-right corner of the page, Copilot Shortcuts provide quick actions to refine and format your content without typing additional prompts. These shortcuts are essentially predefined prompts that you can click, and then Agent Mode executes the changes directly on the page. This makes editing faster and more intuitive.

Available Shortcuts:

  • Content Adjustments: Add details, make text shorter, change to an email, or convert to a blog post.
  • Tone Settings: Switch tone to Professional, Friendly, Sincere, or Assertive.
  • Formatting Options: Add emojis, section headers, introduction, conclusion, or apply all enhancements at once.

Tips for Using Shortcuts:

  • Use Add details when you need richer explanations or examples.
  • Apply Make shorter for concise summaries or executive-ready content.
  • Switch tone quickly to match your audience—Professional for formal reports, Friendly for internal updates.
  • Combine formatting options like Add section headers and Add conclusion to make content presentation-ready.
  • For fast transformations, use Change to an email or Change to a blog post to adapt content for different channels.

These shortcuts make Copilot even more efficient by enabling one-click improvements for style, tone, and structure.


Conclusion

Agent Mode, Page History, Notebook Sharing, and Copilot Shortcuts together represent a useful set of updates to Copilot Pages and Notebooks. These features don’t just save time—they fundamentally change how teams collaborate and innovate:

  • Agent Mode ensures continuity and accelerates workflows.
  • Page History provides confidence and flexibility for iterative work.
  • Notebook Sharing unlocks collaborative creativity across your organization, while Overview and Study Guide get you up to speed with the contents.
  • Copilot Shortcuts deliver instant refinements for tone, structure, and formatting.

By combining these capabilities, Microsoft 365 Copilot empowers organizations to move faster, stay aligned, and innovate without friction. The future of work isn’t just about automation—it’s about intelligent, adaptive systems that work alongside humans to deliver better outcomes.

I created most of the content on this page using Copilot Pages Agent Mode, only editing some parts when I had to. I recommend everyone start trying it out to see how content can be created with AI.






Run cost-effective AI workloads on OpenShift with AWS Neuron Operator


Large enterprises run LLM inference, training, and fine-tuning on Kubernetes for the scale and flexibility it provides. As organizations look to optimize both performance and cost, AWS Inferentia and Trainium chips provide a powerful, cost-effective option for accelerating these workloads, delivering up to 70% lower cost per inference compared to other instance types in many scenarios. Through a joint effort between AWS and Red Hat, these AWS AI chips are now available to customers using Red Hat OpenShift Service on AWS and self-managed OpenShift clusters on AWS, giving organizations more choice in how they design and run their AI platforms like Red Hat OpenShift AI.

The AWS Neuron Operator brings native support for AWS AI chips to Red Hat OpenShift, enabling you to run inference with full LLM support using frameworks like vLLM. This integration combines the cost benefits of AWS silicon with the enterprise features of OpenShift and the overall Red Hat AI capabilities.

What the AWS Neuron Operator does

The AWS Neuron Operator automates the deployment and management of AWS Neuron devices on OpenShift clusters. It handles four key tasks:

  • Kernel module deployment: Installs Neuron drivers using Kernel Module Management (KMM)
  • Device plug-in management: Exposes Neuron devices as schedulable resources
  • Intelligent scheduling: Deploys a custom Neuron-aware scheduler for optimal workload placement
  • Telemetry collection: Provides basic metrics through a node-metrics DaemonSet

The operator reconciles a custom resource called DeviceConfig that lets you configure images and target specific nodes in your cluster.

Joint development by AWS and Red Hat

This operator represents a collaboration between AWS and Red Hat engineering teams. The operator includes core functionality, Neuron integration, OpenShift integration patterns, and lifecycle management. Red Hat, as the originators of the operator framework before it became a CNCF project, developed the operator based on established best practices.

The project consists of two open source repositories:

Both repositories use automated GitHub Actions workflows to build and publish container images to public registries, making installation straightforward.

Why use AWS AI chips for LLM workloads

AWS Inferentia and Trainium chips are purpose-built for machine learning. Inferentia focuses on inference workloads, while Trainium handles both training and inference. Here's what makes them compelling for LLM deployments:

  • Cost efficiency: Run inference at up to 50% lower cost compared to GPU instances. For high-volume inference workloads, this translates to significant savings.
  • Performance: Inferentia2 delivers up to 4x higher throughput and 10x lower latency than first-generation Inferentia. Trainium offers high-performance training for models with hundreds of billions of parameters.
  • Framework support: The Neuron SDK integrates with popular frameworks including PyTorch, TensorFlow, and vLLM. You can deploy models from Hugging Face with minimal code changes.
  • Full LLM support: Run popular models like Llama 2, Llama 3, Mistral, and other transformer-based architectures. The vLLM integration provides optimized inference with features like continuous batching and PagedAttention.

Architecture overview

The operator uses several OpenShift and Kubernetes components to enable Neuron devices:

  • Node Feature Discovery (NFD): Detects Neuron PCI devices (vendor ID 1d0f) and labels nodes accordingly. This allows the operator to target the right nodes.
  • Kernel Module Management (KMM): Loads the Neuron kernel driver on nodes with compatible hardware. KMM handles kernel version matching automatically, even across OpenShift upgrades.
  • Custom Scheduler: A Neuron-aware scheduler extension that understands neuron core topology. This ensures workloads are placed on nodes with available neuron cores, not just nodes with Neuron devices.
  • Device plug-in: Exposes aws.amazon.com/neuron and aws.amazon.com/neuroncore as allocatable resources. Pods can request these resources in their resource limits.

The operator manages all these components through a single DeviceConfig custom resource, simplifying operations.

Installing the AWS Neuron Operator

You can install the operator through the OpenShift web console or using the command line. Both methods require three prerequisite operators from Red Hat.

Prerequisites

Before installing the AWS Neuron Operator, install these operators from OperatorHub:

  • Node Feature Discovery (NFD): Detects hardware features
  • Kernel Module Management (KMM): Manages kernel drivers
  • AWS Neuron Operator (by AWS): Manages Neuron devices

All three operators are available in the OpenShift OperatorHub catalog.

Installation via OpenShift console (recommended)

This method uses the OpenShift web console and is the easiest way to get started.

Step 1: Install Node Feature Discovery

  1. Open your cluster's web console.
  2. Navigate to Operators → OperatorHub.
  3. Search for Node Feature Discovery.
  4. Click Node Feature Discovery provided by Red Hat.
  5. Click Install, then Install again at the bottom.
  6. Once installed, click View Operator.
  7. Click Create Instance under NodeFeatureDiscovery.
  8. Click Create at the bottom (use default settings).

Step 2: Apply the NFD Rule for Neuron Devices

Create a file named neuron-nfd-rule.yaml:

apiVersion: nfd.openshift.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: neuron-nfd-rule
  namespace: ai-operator-on-aws
spec:
  rules:
    - name: neuron-device
      labels:
        feature.node.kubernetes.io/aws-neuron: "true"
      matchAny:
        - matchFeatures:
            - feature: pci.device
              matchExpressions:
                vendor: {op: In, value: ["1d0f"]}
                device: {op: In, value: [
                  "7064", "7065", "7066", "7067",
                  "7164", "7264", "7364"
                ]}

Apply it:

oc apply -f neuron-nfd-rule.yaml

This rule labels nodes that have AWS Neuron devices, making them discoverable by the operator.

Step 3: Install Kernel Module Management

  1. Go back to Operators → OperatorHub.
  2. Search for Kernel Module Management.
  3. Click Kernel Module Management provided by Red Hat.
  4. Click Install, then Install again.

Step 4: Install AWS Neuron Operator

  1. Go to Operators → OperatorHub.
  2. Search for AWS Neuron.
  3. Click AWS Neuron Operator provided by Amazon, Inc.
  4. Click Install, then Install again.
  5. Once installed, click View Operator.
  6. Click Create Instance under DeviceConfig.
  7. Update the YAML with your desired configuration (see below).
  8. Click Create.

Installation via command line

For automation or CI/CD pipelines, use the command-line installation method.

Step 1: Install Prerequisites

Install NFD and KMM operators through OperatorHub first, then create the NFD instance and apply the NFD rule shown above.

Step 2: Install the Operator

# Install the latest version
kubectl apply -f https://github.com/awslabs/operator-for-ai-chips-on-aws/releases/latest/download/aws-neuron-operator.yaml
# Or install a specific version
kubectl apply -f https://github.com/awslabs/operator-for-ai-chips-on-aws/releases/download/v0.1.1/aws-neuron-operator.yaml

Step 3: Create DeviceConfig

Create a file named deviceconfig.yaml:

apiVersion: k8s.aws/v1alpha1
kind: DeviceConfig
metadata:
  name: neuron
  namespace: ai-operator-on-aws
spec:
  driversImage: public.ecr.aws/q5p6u7h8/neuron-openshift/neuron-kernel-module:2.24.7.0
  devicePluginImage: public.ecr.aws/neuron/neuron-device-plugin:2.24.23.0
  customSchedulerImage: public.ecr.aws/eks-distro/kubernetes/kube-scheduler:v1.32.9-eks-1-32-24
  schedulerExtensionImage: public.ecr.aws/neuron/neuron-scheduler:2.24.23.0
  selector:
    feature.node.kubernetes.io/aws-neuron: "true"

Apply it:

oc apply -f deviceconfig.yaml

The operator will automatically append the kernel version to the driversImage at runtime, ensuring the correct driver is loaded.

Verify installation

Check that all components are running:

# Check operator pods
oc get pods -n ai-operator-on-aws
# Verify KMM module
oc get modules.kmm.sigs.x-k8s.io -A
# Check node labels
oc get nodes -l feature.node.kubernetes.io/aws-neuron=true
# Verify Neuron resources are available
kubectl get nodes -o json | jq -r '
  .items[]
  | select(((.status.capacity["aws.amazon.com/neuron"] // "0") | tonumber) > 0)
  | .metadata.name as $name
  | "\($name)\n  Neuron devices: \(.status.capacity["aws.amazon.com/neuron"])\n  Neuron cores: \(.status.capacity["aws.amazon.com/neuroncore"])"
'

You should see nodes with available Neuron devices and cores.
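As a quick smoke test, you can schedule a throwaway pod that requests a single Neuron core. This is only a sketch: the image is a generic placeholder (any image will do, since the point is scheduling against the extended resource), not a Neuron-specific tool:

apiVersion: v1
kind: Pod
metadata:
  name: neuron-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: check
    # Placeholder image; the test only exercises scheduling of the extended resource
    image: registry.access.redhat.com/ubi9/ubi-minimal
    command: ["sleep", "300"]
    resources:
      requests:
        aws.amazon.com/neuroncore: 1
      limits:
        aws.amazon.com/neuroncore: 1

If the pod lands on a Neuron-labeled node and reaches Running, the device plug-in and scheduler are doing their jobs; delete it afterwards with oc delete pod neuron-smoke-test.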

Running LLM inference with vLLM

Once the operator is installed, you can deploy LLM inference workloads using vLLM, a high-performance inference engine optimized for AWS Neuron.

Set up the inference environment

  1. Create a namespace:

    oc create namespace neuron-inference
  2. Create a PersistentVolumeClaim for model storage. This PVC stores the downloaded model, so you don't need to download it every time you restart the deployment.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: model-storage
      namespace: neuron-inference
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
  3. Create a Hugging Face token secret. Most LLM models require authentication to download from Hugging Face.

    oc create secret generic hf-token \
      --from-literal=token=YOUR_HF_TOKEN \
      -n neuron-inference
  4. Deploy the vLLM Inference Server. Create a deployment that downloads the model and runs the vLLM server:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: vllm-inference
      namespace: neuron-inference
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: vllm-inference
      template:
        metadata:
          labels:
            app: vllm-inference
        spec:
          initContainers:
          - name: model-downloader
            image: python:3.10-slim
            command:
            - /bin/bash
            - -c
            - |
              pip install huggingface_hub
              python -c "
              from huggingface_hub import snapshot_download
              import os
              token = os.environ.get('HF_TOKEN')
              snapshot_download('meta-llama/Llama-2-7b-hf', 
                              local_dir='/model',
                              token=token)
              "
            env:
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
            volumeMounts:
            - name: model-storage
              mountPath: /model
          containers:
          - name: vllm-server
            image: public.ecr.aws/neuron/vllm-neuron:latest
            command:
            - python
            - -m
            - vllm.entrypoints.openai.api_server
            - --model
            - /model
            - --tensor-parallel-size
            - "2"
            ports:
            - containerPort: 8000
              name: http
            resources:
              limits:
                aws.amazon.com/neuron: 2
              requests:
                aws.amazon.com/neuron: 2
            volumeMounts:
            - name: model-storage
              mountPath: /model
          volumes:
          - name: model-storage
            persistentVolumeClaim:
              claimName: model-storage
  5. Expose the Service. Create a service and route for external access:

    apiVersion: v1
    kind: Service
    metadata:
      name: vllm-service
      namespace: neuron-inference
    spec:
      selector:
        app: vllm-inference
      ports:
      - port: 8000
        targetPort: 8000
        name: http
    ---
    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      name: vllm-route
      namespace: neuron-inference
    spec:
      to:
        kind: Service
        name: vllm-service
      port:
        targetPort: http
      tls:
        termination: edge

    Apply all resources:

    oc apply -f pvc.yaml
    oc apply -f deployment.yaml
    oc apply -f service.yaml

Testing the inference endpoint

Once the vLLM server is running, you can send requests to the OpenAI-compatible API:

# Get the route URL
ROUTE_URL=$(oc get route vllm-route -n neuron-inference -o jsonpath='{.spec.host}')
# Send a test request
curl https://${ROUTE_URL}/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/model",
    "prompt": "Explain quantum computing in simple terms:",
    "max_tokens": 100,
    "temperature": 0.7
  }'

The vLLM server provides an OpenAI-compatible API, making it easy to integrate with existing applications.
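If your application already speaks the OpenAI API, pointing it at the route is usually all that is required. As an illustration (assuming the openai Python package; the API key value is a placeholder because vLLM does not enforce one by default):

from openai import OpenAI

# Point the standard OpenAI client at the vLLM route created above.
client = OpenAI(
    base_url="https://<ROUTE_URL>/v1",  # substitute the host from `oc get route vllm-route`
    api_key="not-needed",               # placeholder; vLLM accepts any value unless configured otherwise
)

response = client.completions.create(
    model="/model",                      # matches the --model path passed to the vLLM server
    prompt="Explain quantum computing in simple terms:",
    max_tokens=100,
    temperature=0.7,
)
print(response.choices[0].text)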

While the upstream vLLM project provides excellent capabilities today, Red Hat AI Inference Server, powered by vLLM, is designed to provide a hardened, enterprise-ready runtime for these workloads.

Cost optimization strategies

Running LLMs on AWS Neuron chips can significantly reduce your inference costs. Here are strategies to maximize savings:

  • Use Inferentia2 for inference-only workloads. Inferentia2 instances like inf2.xlarge start at a fraction of the cost of comparable GPU instances. For production inference, this is the most cost-effective option.
  • Leverage continuous batching. vLLM's continuous batching feature maximizes throughput by dynamically batching requests. This increases utilization and reduces cost per inference.
  • Right-size your instances. Start with smaller instance types and scale up based on actual usage. Inferentia2 instances come in various sizes from inf2.xlarge (1 Neuron device) to inf2.48xlarge (12 devices).
  • Use Spot instances for development. Red Hat OpenShift Service on AWS supports EC2 Spot instances through machine pools. Use Spot for development and testing environments to save up to 90%.
  • Cache models on persistent volumes. As shown in the vLLM example, caching models on PVCs eliminates repeated downloads and reduces startup time.

Monitoring and troubleshooting

The operator includes basic telemetry through the node-metrics DaemonSet. For production deployments, integrate with OpenShift monitoring.

Common issues

Here are some common issues and their troubleshooting steps.

Pods stuck in Pending state

Check that nodes have the feature.node.kubernetes.io/aws-neuron=true label and that Neuron resources are available:

oc describe node <node-name> | grep neuron

Driver not loading

Verify the KMM module is created and the DaemonSet is running:

oc get modules.kmm.sigs.x-k8s.io -A
oc get ds -n ai-operator-on-aws

Model download failures

Check that the Hugging Face token is valid and the model name is correct. Review init container logs:

oc logs <pod-name> -c model-downloader -n neuron-inference

Scheduler not placing pods

Ensure the custom scheduler is running and pods are using the correct scheduler name:

oc get pods -n ai-operator-on-aws | grep scheduler
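If the scheduler pod is healthy but workloads still land on the default scheduler, setting schedulerName explicitly in the workload's pod template usually resolves it. The name below is an assumption for illustration; confirm the actual value against the scheduler deployment created by your DeviceConfig:

# Fragment of a Deployment's pod template (not a complete manifest)
spec:
  template:
    spec:
      schedulerName: neuron-scheduler   # assumed name; check the scheduler pods listed above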

What's next

The AWS Neuron Operator for OpenShift enables enterprise-grade AI acceleration. As AWS continues to invest in purpose-built AI chips and Red Hat enhances OpenShift's AI capabilities, expect more features and optimizations.

To support this vision, Red Hat AI Inference Server support for AWS AI chips (Inferentia and Trainium) is coming by January 2026. This developer preview will allow you to run the supported Red Hat AI Inference Server on AWS silicon, combining the cost efficiency of AWS Neuron with the lifecycle support and security of Red Hat AI.

To get started today:

The combination of AWS AI chips and OpenShift provides a powerful platform for running cost-effective AI workloads at scale. Whether you're deploying LLMs for customer service, content generation, or data analysis, this integration makes it easier and more affordable.

 

Note

The AWS Neuron Operator is developed jointly by AWS and Red Hat. Contributions and feedback are welcome through the GitHub repositories.

The post Run cost-effective AI workloads on OpenShift with AWS Neuron Operator appeared first on Red Hat Developer.


freeCodeCamp's New Responsive Web Design Certification is Now Live


The freeCodeCamp community just published our new Responsive Web Design certification. You can now sit for the exam to earn the free verified certification, which you can add to your résumé, CV, or LinkedIn profile.

Each certification is filled with hundreds of hours worth of interactive lessons, workshops, labs, and quizzes.

List of HTML modules in the new Responsive Web Design certification

How Does the New RWD Certification Work?

The Responsive Web Design certification will teach you core concepts including semantic HTML, working with forms, the importance of accessibility, CSS Flexbox, responsive design, CSS Grid, and more.

The certification is broken down into several modules that include lessons, workshops, labs, review pages and quizzes to ensure that you truly understand the material before moving onto the next module.

The lessons are your first exposure to new concepts. They provide crucial theory and context for how things work in the software development industry.

These lessons include our new interactive editor so you can see previews of the code. You can also play around with the examples for deeper understanding and comprehension.

Example of using the interactive editor to explain how linear gradients work.

At the end of each lesson, there will be three comprehension check questions to test your understanding of the material from the lesson.

Example question from the working with forms quiz in the Responsive Web Design certification

After these lessons, you’ll head over to the workshops. These workshops are guided step-based projects that provide you with an opportunity to practice what you have learned in the lessons.

Example step from the Build a City Skyline workshop

After the workshops, you’ll complete a lab which will help you review what you have learned so far. This will give you the chance to start building projects on your own, which is a crucial skill for a developer. You’ll be presented with a list of user stories and will need to pass the tests to complete the lab.

Example user stories from the Build a Product Landing Page lab

At the end of each module, there is a review page containing a list of all of the concepts covered. You can use these review pages to help you study for the quizzes.

Portion of the review content from the CSS attribute selectors review page

The last portion of the module is the quiz. This is a 20 question multiple choice quiz designed to test your understanding from the material covered in the module. You’ll need to get 18 out of 20 correct to pass.

Example question from the CSS Grid Quiz

Throughout the certification, there will be five certification projects you’ll need to complete in order to qualify for the exam.

List of HTML certification projects in the new Responsive Web Design certification

Once you’ve completed all 5 certification projects, you’ll be able to take the 50 question exam using our new open source exam environment. The freeCodeCamp community designed this exam environment tool with two goals: respecting your privacy while also making it harder for people to cheat.

Once you download the app to your laptop or desktop, you can take the exam.

Screenshot from the Responsive Web Design Certification Exam page

Frequently Asked Questions

Is all of this really free?

Yes. freeCodeCamp has always been free, and we’ve now offered free verified certifications for more than a decade. These exams are just the latest expansion to our community’s free learning resources.

What prevents people from just cheating on the exams?

Our goal is to strike a balance between preventing cheating and respecting people's right to privacy.

We've implemented a number of reliable, yet non-invasive, measures to help prevent people from cheating on freeCodeCamp's exams:

  1. For each exam, we have a massive bank of questions and potential answers to those questions. Each time a person attempts an exam, they'll see only a small, randomized sampling of these questions.

  2. We only allow people to attempt an exam one time per week. This reduces their ability to "brute force" the exam.

  3. We have security in place to validate exam submissions and prevent man-in-the-middle attacks or manipulation of the exam environment.

  4. We manually review each passing exam for evidence of cheating. Our exam environment produces tons of metrics for us to draw from.

We take cheating, and any form of academic dishonesty, seriously. We will act decisively.

This said, no one's exam results will be thrown out without human review, and no one's account will be banned without warning based on a single suspicious exam result.

Are these exams “open book” or “closed book”?

All of freeCodeCamp’s exams are “closed book”, meaning you must rely only on your mind and not outside resources.

Of course, in the real world you’ll be able to look things up. And in the real world, we encourage you to do so.

But that is not what these exams are evaluating. These exams are instead designed to test your memory of details and your comprehension of concepts.

So when taking these exams, do not use outside assistance in the form of books, notes, AI tools, or other people. Use of any of these will be considered academic dishonesty.

Do you record my webcam, microphone, or require me to upload a photo of my personal ID?

No. We considered adding these as additional test-taking security measures. But we have less privacy-invading methods of detecting most forms of academic dishonesty.

If the environment is open source, doesn't that make it less secure?

"Given enough eyeballs, all bugs are shallow." – Linus Torvalds, the creator of Linux

Open source software projects are often more secure than their closed source equivalents. This is because a lot more people are scrutinizing the code. And a lot more people can potentially help identify bugs and other deficiencies, then fix them.

We feel confident that open source is the way to go for this exam environment system.

How can I contribute to the Exam Environment codebase?

It's fully open source, and we'd welcome your code contributions. Please read our general contributor onboarding documentation.

Then check out the GitHub repo.

You can help by creating issues to report bugs or request features.

You can also browse open help wanted issues and attempt to open pull requests addressing them.

Are the exam questions themselves open source?

For obvious exam security reasons, the exam question banks themselves are not publicly accessible. :)

These are built and maintained by freeCodeCamp's staff instructional designers.

What happens if I have internet connectivity issues mid-exam?

If you have internet connectivity issues mid-exam, the next time you try to submit an answer, you’ll be told there are connectivity issues. The system will keep prompting you to retry submitting until the connection succeeds.

What if my computer crashes mid-exam?

If your computer crashes mid exam, you’ll be able to re-open the Exam Environment. Then, if you still have time left for your exam attempt, you’ll be able to continue from where you left off.

Can I take exams in languages other than English?

Not yet. We’re working to add multi-lingual support in the future.

I have completed my exam. Why can't I see my results yet?

All exam attempts are reviewed by freeCodeCamp staff before we release the results. We do this to ensure the integrity of the exam process and to prevent cheating. Once your attempt has been reviewed, you'll be notified of your results the next time you log in to freeCodeCamp.org.

I am Deaf or hard of hearing. Can I still take the exams?

Yes! While some exams may include audio components, we do make written transcripts available for reading.

I am blind or have limited vision, and use a screen reader. Can I still take the exams?

We’re working on it. Our curriculum is fully screen reader accessible. We're still refining our screen reader usability for the Exam Environment app. This is a high priority for us.

I use a keyboard instead of a mouse. Can I navigate the exams using just a keyboard?

This is a high priority for us. We hope to add keyboard navigation to the Exam Environment app soon.

Are exams timed?

Yes, exams are timed. We err on the side of giving plenty of time to take the exam, to account for people who are non-native English speakers, or who have ADHD and other learning differences that can make timed exams more challenging.

If you have a condition that usually qualifies you for extra time on standardized exams, please email support@freecodecamp.org. We’ll review your request and see whether we can find a reasonable solution.

What happens if I fail the exam? Can I retake it?

Yes. You get one exam attempt per week. Afterward, if you don’t pass, there is a one-week (exactly 168 hours) “cool-down” period where you cannot take any freeCodeCamp exams. This is to encourage you to study and to pace yourself.

There is no limit to the number of times you can take an exam. So if you fail, study more, practice your skills more, then try again the following week.

Do I need to redo the projects if I fail the exam?

No. Once you’ve submitted a certification project, you do not need to ever submit it again.

You can re-do projects for practice, but we recommend that you instead build some of our many practice projects in freeCodeCamp’s developer interview job search section.

A screenshot of the "Prepare for the developer interview job search" section with lots of coding projects

What happens if I already have the old Legacy Responsive Web Design certification? Should I claim the new one?

The new certification has more theory and practice as well as an exam. So if you’re looking to brush up on your skills, then you can go through the new version of this certification.

What will happen to my existing coursework progress on the Full Stack Certification? Does it transfer over to the Responsive Web Design course?

If you’ve already started the Certified Full Stack Developer Curriculum, all of your previously completed work should already be saved there.

To be clear, we’ve copied over all of the coursework from the full stack certification to this newer certification.

Can I still continue with the current Full Stack Developer Certification and just not do the new certification?

We’ve moved the coursework for the Full Stack Developer Certification over and broken it up into smaller certifications. Currently there are seven courses available for you to go through. Here is the complete list:

The Certified Full Stack Developer Certification button will remain on the learn page for a short time to give people the opportunity to switch over to the new certifications. Over the next few months, though, this option will disappear.

List of all certifications on the freeCodeCamp learn page.

Will my legacy certifications become invalid?

No. Once you claim a certification, it’s yours to keep.

Also note that we previously announced that freeCodeCamp certifications would have an expiration date and require recertification. We don’t plan to implement this anytime soon. And if we do decide to, we will give everyone at least a year’s notice.

Will the exam be available to take on my phone?

At this time, no. You’ll need to use a laptop or desktop to download the exam environment and take the exam. We hope to eventually offer these certification exams on iPhone and Android.

I have a disability or health condition that is not covered here. How can I request accommodations?

If you need specific accommodations for the exam (for example extra time, breaks, or alternative formats), please email support@freecodecamp.org. We’ll review your request and see whether we can find a reasonable solution.

Anything else?

Good luck working through freeCodeCamp’s coursework, building projects, and preparing for these exams.

Happy coding!




One Key for Every Door: How Security Assertion Markup Language Simplifies Enterprise Access


Unlocking seamless access across enterprise apps, Security Assertion Markup Language (SAML) transforms identity management with secure, standards-based single sign-on.

by: Abhinay Labade, Sr Manager, Cybersecurity Operations

Quick Bytes:

  • Enterprise users often face friction and fatigue from managing multiple logins across critical applications
  • Security Assertion Markup Language (SAML) simplifies authentication by turning identity into a secure, portable assertion — like a master key for digital access
  • With SAML-powered single sign-on, McDonald’s teams access critical tools faster, safer, and with less friction

Managing access to multiple applications used to mean juggling countless usernames and passwords — a hassle for users and a possible security risk for IT teams. Enter SAML, the protocol that turns chaos into clarity.

The problem: Too many keys
Imagine an office building with dozens of rooms — ones for HR, Finance, Engineering, and Marketing. Each room has its own lock, and employees need a separate key for each. Keeping track of all those keys is difficult, and remembering which key fits which door is even harder.

This scenario mirrors the digital world before SAML, where every application requires its own credentials.

Meet Sam: Your identity concierge
To solve the challenge of managing different keys for each room, the company hires a trusted messenger named Sam.

Sam has a brilliant idea:

“Why not have one secure master key for all the rooms? I’ll verify who you are and give you a secure note that every room trusts.”

Sam is a metaphor for the identity provider (IdP) — a centralized system that verifies user identity. The rooms — HR, Finance, Engineering, and Marketing — are service providers (SPs), or the applications users need to access. The note Sam writes is the SAML assertion, a secure message that confirms your identity and permissions.

How Sam gets you through the door
When you try to access an enterprise application — say, the Finance room — you’re first met with a guard who doesn’t recognize you and redirects you to Sam. Sam represents the Identity Provider (IdP), a trusted system that verifies your credentials using methods like passwords and multifactor authentication.

Once verified, Sam writes a secure note — known as a SAML assertion — stating your identity, the time of verification, and your role, and signs it with a digital seal. You return to the Finance room with this note, and the guard, now confident in Sam’s signature, grants you access. What’s more, that same note allows you to enter other rooms — other enterprise applications — without needing to present new credentials each time. This is the power of single sign-on (SSO), enabled by SAML: one secure identity check, many trusted doors.

Why SAML is trusted

  • Digital signatures: Assertions are signed and tamper-proof
  • Secure transmission: They travel over HTTPS
  • Expiration: Assertions expire quickly — no stale notes
  • Optional encryption: For sensitive data, assertions can be encrypted

What’s inside a SAML assertion?
Sam’s note — the SAML assertion — is written in XML and includes:

  • Who you are (NameID)
  • When you were verified (AuthnStatement)
  • Your attributes (AttributeStatement), such as role or department

Example:

<saml:Assertion>
  <saml:Subject>
    <saml:NameID>abhinay.labade@example.com</saml:NameID>
  </saml:Subject>
  <saml:AuthnStatement AuthnInstant="2025-10-13T13:00:00Z"/>
  <saml:AttributeStatement>
    <saml:Attribute Name="Role">
      <saml:AttributeValue>Manager</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>

SAML assertions are signed using XML Digital Signature (XML DSig) and may be encrypted using XML Encryption standards. They are typically transmitted via browser redirects (HTTP POST or Redirect bindings).

SAML is more than just XML
At first glance, SAML might look like regular XML — but it’s far more than a simple markup language. XML is just a way to structure and store data, like a container. SAML builds on XML by adding strict schemas, digital signatures, and security rules that transform it into a full-fledged identity protocol. It defines what data must be included, how it should be structured, and how authentication and authorization information is securely exchanged between systems.

Security is baked into SAML. Assertions are signed to prevent tampering, can be encrypted for sensitive data, and include timestamps to prevent replay attacks. Without SAML, organizations would need to build their own authentication schema, encryption model, and trust framework from scratch. SAML provides all of this out of the box.
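To make those checks concrete, here is a minimal, illustrative Python sketch of the service-provider side. It only shows the validity-window and audience checks on a parsed assertion (the example assertion above omits the Conditions element for brevity); signature verification is deliberately left to a maintained SAML library, such as python3-saml, since hand-rolling that is exactly what the standard is meant to spare you:

from datetime import datetime, timezone
import xml.etree.ElementTree as ET

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

def conditions_are_valid(assertion_xml: str, expected_audience: str) -> bool:
    """Check the NotBefore/NotOnOrAfter window and the audience restriction
    of a SAML assertion. Signature verification is NOT performed here."""
    root = ET.fromstring(assertion_xml)
    conditions = root.find("saml:Conditions", SAML_NS)
    if conditions is None:
        return False  # a production SP should reject assertions without conditions

    def parse(ts: str) -> datetime:
        # SAML timestamps are UTC ISO 8601, e.g. 2025-10-13T13:00:00Z
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))

    # Reject expired or not-yet-valid assertions (replay protection).
    now = datetime.now(timezone.utc)
    if not (parse(conditions.get("NotBefore")) <= now < parse(conditions.get("NotOnOrAfter"))):
        return False

    # Enforce the audience restriction: the note must be addressed to this SP.
    audiences = [a.text for a in conditions.findall(".//saml:Audience", SAML_NS)]
    return expected_audience in audiences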

Security considerations

  • Always use HTTPS for transmission
  • Validate signatures to prevent tampering
  • Check timestamps to avoid replay attacks
  • Consider assertion encryption for sensitive data
  • Mitigate vulnerabilities like XML signature wrapping with best practices (e.g., validating audience restrictions, using short-lived assertions)

Unlocking the future of enterprise security
SAML isn’t just a protocol — it’s the key that unlocks a frictionless, secure digital workplace. By streamlining access and safeguarding identities, SAML empowers teams to focus on what matters most: driving innovation and collaboration. As organizations grow and evolve, SAML ensures that security keeps pace — making every login a gateway to possibility.


One Key for Every Door: How Security Assertion Markup Language Simplifies Enterprise Access was originally published in McDonald’s Technical Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
