Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Integrate Claude Code with Red Hat AI Inference Server on OpenShift


Agentic coding tools help developers build software efficiently. Claude Code, Anthropic's terminal-based coding agent, improves productivity by letting you interact with your codebase through natural language—directly from the console.

One advantage of Claude Code is its flexibility. Rather than being locked to Anthropic's cloud models, you can connect it to any backend that implements the Anthropic Messages API.

This article explores how to integrate Claude Code with a local model served by Red Hat AI Inference Server (a downstream version of vLLM) on Red Hat OpenShift. This approach keeps inference on your own infrastructure, so all prompts and responses stay within your environment while you retain Claude Code's developer-focused workflows.

Prerequisites

You will need:

Environment

I executed the steps in this article using an environment with the following specifications:

  • Single-node OpenShift 4.21
  • GPU: NVIDIA RTX 4060 Ti
  • CPU: Intel Core i7-14700 × 28
  • Host machine operating system: Fedora 43

Disclaimer

Because this testing machine is not part of a supported environment, this demo is for testing only and does not represent an official Red Hat support procedure.

Deploy the Red Hat AI Inference Server

The first step is to deploy Red Hat AI Inference Server. For this demo, I created a Helm chart to simplify the deployment in an OpenShift 4.21 environment. You can alternatively follow the manual deployment procedure.

Clone the project:

git clone https://github.com/alexbarbosa1989/rhai-helm

Set the minimal required environment variables:

export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export AUTHFILE=$XDG_RUNTIME_DIR/containers/auth.json
export STORAGECLASS=<ocp-storageclass>

Alternatively, you can configure your own values in the rhai-helm/values.yaml file—for example, a different model from Hugging Face or a custom namespace.

Hint: Before setting AUTHFILE, verify whether auth.json already exists at the expected path. This file is created automatically when you authenticate using Podman in the terminal.

podman login registry.redhat.io

Once you define the required environment variables, you can install the Helm chart. For example, to use the default rhai-helm/values.yaml, run:

helm install rhai-helm ./rhai-helm \
--create-namespace --namespace rhai-helm \
--set persistence.storageClass=$STORAGECLASS \
--set secrets.hfToken=$HF_TOKEN \
--set-file secrets.docker.dockercfg=$AUTHFILE

Check the created resources:

oc get secrets
oc get pvc model-cache
oc get deployment
oc get svc
oc get route

Finally, check the running pod. This might take a few minutes, depending on hardware resources.

oc get pod
NAME                        READY   STATUS    RESTARTS   AGE
qwen-coder-5f6668b767-hp585   1/1     Running   0          5m11s
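Optionally, you can confirm the inference server is answering before wiring up Claude Code. The sketch below assumes the chart's default namespace (rhai-helm) and a single TLS-terminated route; adjust the names to match your values.yaml:

```shell
# Grab the exposed route host (assumes a single route in the namespace)
ROUTE=$(oc get route -n rhai-helm -o jsonpath='{.items[0].spec.host}')

# vLLM exposes an OpenAI-compatible API; /v1/models should list the served model
curl -s "https://${ROUTE}/v1/models"
```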

Install and configure Claude Code

Configure this on your developer workstation. Follow the official installation instructions, or install it directly using the convenience script for Linux and macOS:

curl -fsSL https://claude.ai/install.sh | bash

Claude Code uses environment variables for configuration. By overriding the default Anthropic settings, you can redirect requests to a local model served by vLLM. Use this example configuration:

ANTHROPIC_BASE_URL="<RHAI-Inference-exposed-route>" \
ANTHROPIC_API_KEY="vllm" \
ANTHROPIC_DEFAULT_OPUS_MODEL="qwen-coder" \
ANTHROPIC_DEFAULT_SONNET_MODEL="qwen-coder" \
ANTHROPIC_DEFAULT_HAIKU_MODEL="qwen-coder" \
CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS="2000" \
CLAUDE_CODE_MAX_OUTPUT_TOKENS="4096" \
MAX_THINKING_TOKENS="0" \
claude

The ANTHROPIC_BASE_URL environment variable must point to the exposed OpenShift route of the Red Hat AI inference service. This is the endpoint Claude Code uses for all requests. 

Replace the example value with the route generated in your OpenShift cluster. Retrieve the route by running:

oc get route -n <namespace>
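For example, you could export the base URL directly from the route. The route name qwen-coder and namespace rhai-helm below are assumptions based on this demo's chart defaults; substitute your own:

```shell
# Build ANTHROPIC_BASE_URL from the OpenShift route host
export ANTHROPIC_BASE_URL="https://$(oc get route qwen-coder -n rhai-helm -o jsonpath='{.spec.host}')"
```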

Also, the values for CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS and CLAUDE_CODE_MAX_OUTPUT_TOKENS should be tuned according to your hardware capabilities to avoid exhausting the context window.

Once you set the environment variables, launching Claude Code prompts an interactive setup to initialize the workspace (Figure 1).

Figure 1: Claude Code setup.

Select ❯ 1. Yes, I trust this folder. At this point, Claude Code is fully initialized and ready for use, as shown in Figure 2.

Figure 2: Claude Code initialization.

In this example, the following instruction was provided:

❯ create a basic quarkus "hello" service

Claude Code immediately begins processing the request using the locally served model, as illustrated in Figure 3.

Figure 3: Claude Code interactive session.

You can also verify the interaction directly from the vLLM backend pod in the OpenShift cluster. Successful requests appear in the logs as calls to the /v1/messages API endpoint:

(APIServer pid=1) INFO:     10.128.0.2:43662 - "POST /v1/messages?beta=true HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.128.0.2:43664 - "POST /v1/messages?beta=true HTTP/1.1" 200 OK

This confirms that Claude Code successfully routes requests to the OpenShift-hosted inference service.

Key takeaways

By integrating Claude Code with a vLLM-based inference service on OpenShift, you gain access to effective AI-assisted coding workflows while keeping models, data, and inference under your control.

This demonstration uses a lightweight Qwen model. With specialized, higher-performance hardware, you can serve larger models that provide advanced coding and reasoning capabilities.

Overall, this approach combines the productivity of Claude Code with the security and scalability of OpenShift. It is a practical solution for organizations that need private, on-premises AI development environments.

The post Integrate Claude Code with Red Hat AI Inference Server on OpenShift appeared first on Red Hat Developer.

Read the whole story
alvinashcraft
just a second ago
reply
Pennsylvania, USA
Share this story
Delete

Is Event-Driven Architecture Overkill for Most Apps?


I get it. Most apps really are that simple. Typically, just CRUD. A user submits a form, you validate some data, save it in a database somewhere, and return a response. That is it. So is Event-Driven Architecture Overkill then?

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

Event-Driven Architecture Overkill?

This topic stemmed from a Reddit post that I thought was pretty interesting. Primarily because it’s confusing a lot of concepts and seems to miss the point of Event-Driven architecture. So that’s what this post is trying to clarify.

Pretty much my blog/channel.

I bring up these topics because I think they have a lot of utility and value within a given context. The problem is when there is confusion about what these ideas actually are, and you do not really understand the problem they are trying to solve.

Here’s what confuses me: a friend’s startup processes maybe 10k orders monthly, yet they’re implementing event sourcing. Another friend at a larger company says their monolith handles 100x that traffic just fine.

So what’s the reality? Are we overengineering because it’s trendy? Or am I missing something fundamental about when events actually matter? What are the real thresholds where event-driven becomes necessary?

Neither example has anything to do with the point.

Event sourcing has nothing to do with scaling. A monolith versus services has nothing to do with whether events make sense. So what is the reality? Are we overengineering because it is trendy, or are people missing something fundamental about what events actually are and where they matter?

The reality is, there is just a lot of confusion around event-driven architecture and the different ways you can use events. That is the confusing part.

Scaling is a benefit, not the reason

One of the first things to clear up is scaling.

Yes, scaling can be a benefit of using asynchronous messaging, but that is not the reason to use event-driven architecture.

Imagine a client interacting with an HTTP API. That API publishes messages to a topic on a message broker. Then a separate worker consumes those messages and interacts with the same database. This could even all be happening inside a monolith.

When people think about scaling an HTTP API, they usually think of adding more instances behind a load balancer so more requests can be processed concurrently. That same idea applies to workers consuming messages. You can scale out more workers and process more messages concurrently.

That is the competing consumers pattern.

So yes, there is a scaling benefit there. But again, that is not the primary reason you would apply event-driven architecture.
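The scaling mechanics above can be sketched in a few lines. This is a toy in-memory version of competing consumers, not a real broker, and all names are illustrative: several workers pull from one shared queue, each message is handled exactly once, and adding workers adds throughput.

```typescript
// Toy sketch of the competing consumers pattern (illustrative names).
type Message = { id: number; body: string };

class WorkQueue {
  private messages: Message[] = [];
  enqueue(m: Message) { this.messages.push(m); }
  dequeue(): Message | undefined { return this.messages.shift(); }
}

const queue = new WorkQueue();
for (let i = 1; i <= 6; i++) queue.enqueue({ id: i, body: `order ${i}` });

// Two competing consumers drain the same queue; whichever worker
// dequeues a message owns it, so no message is handled twice.
const workers = ["worker-a", "worker-b"];
const handled: string[] = [];
let turn = 0;
let m: Message | undefined;
while ((m = queue.dequeue()) !== undefined) {
  handled.push(`${workers[turn % workers.length]}:${m.id}`);
  turn++;
}
```

A real broker interleaves workers by availability rather than round robin, but the property is the same: one queue, many consumers, each message delivered to exactly one of them.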

Events as notifications between responsibilities

One of the main reasons to use event-driven architecture is that something happened in your system, and some other independent concern might need to react to it.

That means you have a publisher producing an event because something happened, but it has no idea whether there are any consumers, whether there is one consumer, two consumers, or ten. It does not know if any other part of the system even cares.

What you are doing is decoupling responsibilities in time.

A simple example is shipment delivery.

You order something online. You get tracking information. Maybe you use an app on your phone. Then all of a sudden you get a notification that your package has been delivered.

That notification is one responsibility.

The actual act of somebody dropping off the package, taking the picture, and persisting that to the system is a completely different responsibility.

So when the package gets delivered, that fact can be published to a message broker. From there, different parts of the system can react.

One consumer might send a webhook to some third party system. Another might send an SMS through Twilio. Another might send a push notification through Firebase. These are all different responsibilities, and they are not directly part of the delivery action itself.

That is the point.

Those downstream actions were not part of the decision making process of whether the delivery should happen. The delivery already happened. The decision was already made. Now other parts of the system are reacting to that fact.

That is a good way to think about events. They are facts. Something occurred, and now other responsibilities can respond to it independently.

That is one of the big benefits of event-driven architecture in the form of notifications.

Commands and events are not the same thing

This is another place where people get confused.

Just because you are using asynchronous messaging, a message broker, or queues, that does not automatically make your system event driven.

Commands and events are different things.

A command is trying to invoke behavior. It is something like DeliverShipment. It is an instruction. Typically, there is one consumer responsible for handling that command.

There may be multiple parts of the system that can send that command, but the command itself is about telling something to do something.

An event is different.

An event is a statement of fact. Something happened. ShipmentDelivered. PackageDelivered. OrderPlaced.

Typically, an event has one publisher, and there can be zero, one, or many consumers. The publisher does not know who those consumers are, and it does not need to.

That distinction matters.

When someone signs for a package on a device and that triggers a request to your HTTP API, that request is handling a command. It is doing the work of recording the delivery. It should not also be directly responsible for sending every webhook, every SMS, and every push notification as part of the same workflow.

Those are separate concerns. That is where events fit.
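As a sketch, the distinction might look like this in code (all names are hypothetical): the command has one handler that does the work, and the event fans out to subscribers the publisher never knows about.

```typescript
// A fact that happened, published for anyone who cares.
type ShipmentDelivered = { shipmentId: string; deliveredAt: string };

const deliveries: string[] = [];
const notifications: string[] = [];

// Independent responsibilities reacting to the fact: webhook, SMS, push.
// The publisher has no idea these exist.
const subscribers: Array<(e: ShipmentDelivered) => void> = [
  (e) => notifications.push(`webhook for ${e.shipmentId}`),
  (e) => notifications.push(`sms for ${e.shipmentId}`),
  (e) => notifications.push(`push for ${e.shipmentId}`),
];

function publish(e: ShipmentDelivered) {
  for (const s of subscribers) s(e); // zero, one, or many consumers
}

// Command handler: the single owner of the DeliverShipment behavior.
function handleDeliverShipment(shipmentId: string) {
  deliveries.push(shipmentId); // record the delivery: the decision itself
  publish({ shipmentId, deliveredAt: new Date().toISOString() }); // announce the fact
}

handleDeliverShipment("shipment-42");
```

The command handler records the delivery once; the notification concerns hang off the event, so adding a fourth subscriber never touches the delivery workflow.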

Event sourcing is about state, not notifications

There is even more confusion because events can serve different purposes.

Earlier I was talking about events as notifications. But event sourcing is something else entirely.

Event sourcing is about state.

Instead of recording current state, you record the series of events that state can be derived from.

That is a completely different utility than using events for notifications.

Take a shipment as an example. You might have events like:

  • Dispatched
  • ArrivedAtShipper
  • Loaded
  • Departed
  • ArrivedAtConsignee
  • Delivered

Each shipment has its own stream, its own series of events. Another shipment has a different stream.

Those events become your source of truth.

From that stream, you can derive state. You can project it into a relational table. You can put it into a document store. You can shape it however you want. At any point in time, you can rebuild the current state or derive what the state looked like earlier.

That is what event sourcing is.
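The idea of deriving state from a stream can be shown as a simple fold. The event names follow the shipment example above; the reducer and state shape are illustrative, not any particular library's API.

```typescript
// Events are facts in domain language; the stream is the source of truth.
type ShipmentEvent =
  | { type: "Dispatched" }
  | { type: "ArrivedAtShipper" }
  | { type: "Loaded" }
  | { type: "Departed" }
  | { type: "ArrivedAtConsignee" }
  | { type: "Delivered"; at: string };

type ShipmentState = { status: string; deliveredAt?: string };

// Derive state by applying each event in order.
function apply(state: ShipmentState, event: ShipmentEvent): ShipmentState {
  switch (event.type) {
    case "Delivered":
      return { status: "Delivered", deliveredAt: event.at };
    default:
      return { ...state, status: event.type };
  }
}

const stream: ShipmentEvent[] = [
  { type: "Dispatched" },
  { type: "Loaded" },
  { type: "Departed" },
  { type: "Delivered", at: "2024-06-01T10:00:00Z" },
];

// Current state is a fold over the whole stream...
const current = stream.reduce(apply, { status: "New" });
// ...and replaying a prefix rebuilds what the state looked like earlier.
const enRoute = stream.slice(0, 3).reduce(apply, { status: "New" });
```

The same stream could just as easily be projected into a relational table or a document: the fold function changes, the source of truth does not.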

So yes, you can be using events for collaboration and notifications between parts of your system. And yes, you can be using events as the source of truth with event sourcing. Those things can overlap.

But they are not the same thing.

Do not use events just because everyone else is

Do not use events because you want to use a broker. Do not use them because everybody is using Kafka. Do not use them because you heard it scales better.

That is not a good reason.

Use events for notifications because a business fact occurred and other consumers want to respond to that fact on their own timeline.

Use events when you want to remove temporal coupling.

Use them when you have integration boundaries between parts of your system, whether that is inside a monolith, between services, or with external systems.

Events can act as a contract at those boundaries.

And do not use event sourcing just because you want to be event-driven.

That is where people get themselves into trouble.

CRUD sourcing is not event sourcing

One of the worst examples of this confusion is when people use event sourcing even though all they are really doing is recording current state changes.

I often call this CRUD sourcing.

The event stream becomes something like:

  • ShipmentCreated
  • ShipmentUpdated
  • ShipmentUpdated
  • ShipmentUpdated

That is not event sourcing in any meaningful sense.

On the flip side, event sourcing makes sense when history matters.

When the sequence of events matters.
When understanding how something got to its current state matters.
When those events are real domain language.

In the shipment example, Dispatched, ArrivedAtShipper, Departed, ArrivedAtConsignee, Delivered. That is domain language. That history has meaning. That history has value.

That is the exact opposite of CRUD sourcing.

If you care about that history, and you want to use it as your source of truth and derive different views or models from it, then event sourcing can be a really good fit.

So, is event-driven architecture overkill?

It can be.

If your app really is just request-response, and you do not have different workflows, independent responsibilities, or integration boundaries, then yes, it could absolutely be overkill.

Do not invent problems for solutions you do not fully understand.

Scaling is a benefit in some cases, but it is not the reason to use events as notifications, and it is not the reason to use event sourcing.

Use events for notification purposes when you want to decouple responsibilities in time.

Use event sourcing when history matters, when the domain language matters, and when recording that sequence of facts as your source of truth gives you real value.

Do not make it more complicated than that.

Join CodeOpinon!
Developer-level members of my Patreon or YouTube channel get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out my Patreon or YouTube Membership for more info.

The post Is Event-Driven Architecture Overkill for Most Apps? appeared first on CodeOpinion.


Next.js vs Wasp: 40% Less Tokens for the Same App


TL;DR

We gave Claude Code the exact same feature prompt for two identical apps - one built with Next.js, the other built with Wasp - and measured everything that Claude Code did to implement the feature.

| Metric | Wasp | Next.js | Wasp's Advantage |
| --- | --- | --- | --- |
| Total cost | $2.87 | $5.17 | 44% cheaper |
| Total input & output tokens | 2.5M | 4.0M | 38% fewer |
| API calls | 66 | 96 | 31% fewer |
| Tool uses | 52 | 66 | 21% fewer |
| Files read | 12 | 15 | Smaller blast radius |
| Output tokens (code written) | 5,416 | 5,395 | ~same |
Not surprisingly, the savings were proportional to the amount of code the Wasp framework abstracts away, which we also measured by running a static token count across both codebases. In this case, the Wasp version reduced total code tokens by ~40%.
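The article's actual measurement scripts aren't reproduced here; as a rough illustration, a static token count can be sketched by walking a codebase and applying the common ~4-characters-per-token heuristic (the heuristic and the file extensions below are assumptions, not the authors' methodology):

```typescript
import * as fs from "fs";
import * as path from "path";

// Sum source characters under a directory, skipping node_modules.
function countChars(dir: string, exts: string[]): number {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name !== "node_modules") total += countChars(full, exts);
    } else if (exts.includes(path.extname(entry.name))) {
      total += fs.readFileSync(full, "utf8").length;
    }
  }
  return total;
}

// ~4 characters per token is a common rough heuristic for LLM tokenizers.
function estimateTokens(dir: string, exts = [".ts", ".tsx"]): number {
  return Math.round(countChars(dir, exts) / 4);
}
```

Running this against two codebases gives a quick, tokenizer-free way to compare how much raw material an AI agent has to wade through in each.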

Because the framework allows Claude Code to get the same work done in fewer tokens, it delivers a ~70% higher token efficiency (output per token).

So, if you're using AI coding tools daily, your framework choice might be your single biggest lever for improving your AI's ability to generate accurate, complex code, quickly and cheaply.

Wasp vs Next.js token efficiency
We compared the two frameworks on the same feature prompt in the same app: Vercel's SaaS Starter.

What Wasp Actually Is

Wasp is a full-stack framework for React, Node.js, and Prisma with batteries-included. Think Rails or Laravel-like productivity for the JS ecosystem but where authentication, routing, operations, database, and cron jobs are defined declaratively as config.

You write business logic and Wasp handles the boilerplate and glue code for you.

Here's a simple example of a Wasp app's config file:

main.wasp.ts
import { App } from 'wasp-config'

const app = new App('todoApp', {
  title: 'ToDo App',
  wasp: { version: '^0.21' },
  // head: []
});

app.auth({
  userEntity: User,
  methods: {
    google: {},
    email: {...}
  }
});

const mainPage = app.page('MainPage', {
  component: { importDefault: 'Main', from: '@src/pages/MainPage' }
});
app.route('RootRoute', { path: '/', to: mainPage });

app.query('getTasks', {
  fn: { import: 'getTasks', from: '@src/queries' },
  entities: ['Task']
});

app.job('taskReminderJob', {
  executor: 'PgBoss',
  perform: {
    fn: { import: 'sendTaskReminder', from: '@src/workers/taskReminder' }
  },
  schedule: { cron: "*/5 * * * *" },
  entities: ['Task']
});

export default app;

Wasp's opinionated approach via its config gives AI tools (and developers) a big advantage: it acts as a large "specification" that both you and your AI coding agents already understand and agree on.

This gives AI one common pattern to follow, fewer decisions to make, less boilerplate to write, and fewer tools to stitch together, making the entire development process more reliable.

Show, Don't Tell

Before the numbers mean anything, here's how Wasp's "declarative config" compares to Next.js's equivalent.

In this example, we're comparing the fundamental auth setup from the actual test apps. The first tab shows Wasp, and the second tab shows Next.js. Click between the tabs to see the difference.

main.wasp.ts
app.auth({
  userEntity: 'User',
  methods: {
    email: {
      fromField: {
        name: 'SaaS App',
        email: 'hello@example.com'
      },
      emailVerification: {
        clientRoute: 'EmailVerificationRoute'
      },
      passwordReset: {
        clientRoute: 'PasswordResetRoute'
      },
    },
  },
  onAfterSignup: { import: 'onAfterSignup', from: '@src/auth/hooks' },
  onAuthFailedRedirectTo: '/login',
  onAuthSucceededRedirectTo: '/dashboard',
});
lib/auth/session.ts + middleware.ts
// lib/auth/session.ts
import { compare, hash } from 'bcryptjs';
import { SignJWT, jwtVerify } from 'jose';
import { cookies } from 'next/headers';
import { NewUser } from '@/lib/db/schema';

const key = new TextEncoder().encode(process.env.AUTH_SECRET);
const SALT_ROUNDS = 10;

type SessionData = {
  user: { id: number };
  expires: string;
};

export async function signToken(payload: SessionData) {
  return await new SignJWT(payload)
    .setProtectedHeader({ alg: 'HS256' })
    .setIssuedAt()
    .setExpirationTime('1 day from now')
    .sign(key);
}

export async function verifyToken(input: string) {
  const { payload } = await jwtVerify(input, key, { algorithms: ['HS256'] });
  return payload as SessionData;
}

export async function setSession(user: NewUser) {
  const expiresInOneDay = new Date(Date.now() + 24 * 60 * 60 * 1000);
  const session: SessionData = {
    user: { id: user.id! },
    expires: expiresInOneDay.toISOString(),
  };
  const encryptedSession = await signToken(session);
  (await cookies()).set('session', encryptedSession, {
    expires: expiresInOneDay,
    httpOnly: true, secure: true, sameSite: 'lax',
  });
}

export async function getSession() {
  const session = (await cookies()).get('session')?.value;
  if (!session) return null;
  return await verifyToken(session);
}

export async function hashPassword(password: string) {
  return hash(password, SALT_ROUNDS);
}

export async function comparePasswords(plainText: string, hashed: string) {
  return compare(plainText, hashed);
}

// middleware.ts — route protection + token refresh
import { NextRequest, NextResponse } from 'next/server';
import { signToken, verifyToken } from '@/lib/auth/session';

const protectedRoutes = '/dashboard';

export async function middleware(request: NextRequest) {
  const { pathname } = request.nextUrl;
  const sessionCookie = request.cookies.get('session');
  const isProtectedRoute = pathname.startsWith(protectedRoutes);

  if (isProtectedRoute && !sessionCookie) {
    return NextResponse.redirect(new URL('/sign-in', request.url));
  }

  let res = NextResponse.next();

  if (sessionCookie && request.method === 'GET') {
    try {
      const parsed = await verifyToken(sessionCookie.value);
      const expiresInOneDay = new Date(Date.now() + 24 * 60 * 60 * 1000);
      res.cookies.set({
        name: 'session',
        value: await signToken({
          ...parsed,
          expires: expiresInOneDay.toISOString(),
        }),
        httpOnly: true, secure: true, sameSite: 'lax',
        expires: expiresInOneDay,
      });
    } catch (error) {
      res.cookies.delete('session');
      if (isProtectedRoute) {
        return NextResponse.redirect(new URL('/sign-in', request.url));
      }
    }
  }
  return res;
}

As you can see, Wasp's config is much more concise and readable than Next.js's.

Why This Matters for AI

Now that you can see the abstraction gap, here's why it compounds for AI:

  1. Performance degrades as context window fills up

    AI tools are stateless and have a finite context window, and performance degrades long before that window is completely full. A larger codebase fills it faster, which means earlier message compression, less room for reasoning, and degraded output. Developers also have to maintain detailed memory and skill files to help AI understand the app structure and implementation expectations.

  2. Signal-to-noise ratio

    More tokens isn't just more expensive, it's more noise. Wasp's config is pure signal for AI, acting like a spec and concise map of the app the AI can follow. Next.js's equivalent is spread across route files, API handlers, middleware, and config. Higher noise = higher chance of mistakes.

  3. Every LLM call to the API re-reads the codebase

    AI tools don't remember between turns. Each turn re-reads the session conversation and codebase context from scratch. A bigger codebase means every single turn costs more. In our test, cache reads alone cost $1.71 with Next.js vs. $1.09 for Wasp.

  4. It compounds over a project's lifetime

    This test only measured one full-stack feature (db model, server operation, client page), and the differences were already significant (2.5M tokens vs. 4.0M tokens). In larger codebases, these differences quickly compound.

The Full Results

Combined: Planning + Implementation Phases

| | Wasp | Next.js | Delta |
| --- | --- | --- | --- |
| Total cost | $2.87 | $5.17 | Next.js 80% more expensive |
| Total duration | 14.9m | 15.0m | Nearly identical |
| Total API calls | 66 | 96 | Next.js 45% more |
| Total tokens | 2,505,796 | 4,049,413 | Next.js 62% more |
| Total tool uses | 52 | 66 | Next.js 27% more |
| Subagents spawned | 3 | 3 | Same |
| Unique files read | 12 | 15 | |
| Files edited | 6 | 6 | Same |
| Files created | 2 | 3 | |

Token cost by category:

| | Wasp | Next.js | Delta |
| --- | --- | --- | --- |
| Input | $0.0005 | $0.0801 | |
| Output | $0.2099 | $0.2135 | Nearly identical |
| Cache read | $1.0861 | $1.7073 | Next.js 57% more |
| Cache creation | $1.3230 | $2.8184 | Next.js 113% more |
| Subagent | $0.25 | $0.35 | |

Here are the main takeaways:

  • Output tokens (what the AI wrote): nearly identical — $0.21 vs $0.21
  • Cache creation (what the AI first loaded): Next.js 113% more — $2.82 vs $1.32
  • Cache read (what the AI re-read each turn): Next.js 57% more — $1.71 vs $1.09

Cache creation cost. Claude caches prompts so that repeated context can be reused across API calls. But while reading from cache is cheap, creating new cache entries is expensive. Next.js had 2.2x more cache-creation tokens (343K vs. 155K); at $6.25/M, that's $2.15 vs. $0.97. The bigger codebase means more new content being loaded into cache.

Cache read cost. Each LLM call to the API re-reads the growing context. The Next.js codebase is bigger so each read costs more: $1.14 vs $0.67. Output tokens were nearly identical (~5,400), meaning the AI wrote roughly the same amount of code but had to read far more to do it.

The main difference comes from the Next.js codebase being bigger, meaning more tokens to load and re-read across every single API call to get the same output.

Context Efficiency: a New Evaluation Metric

The savings mirror what Wasp abstracts: authentication, routing, database management, operations, and jobs are defined in the config, so AI doesn't need to read, navigate, or generate layers of glue code.

And while this is a new approach in the framework space, Wasp is just following a general principle here: tools that make coding easier for humans make it easier for AI, too. They offload structural work and let the AI focus on business logic, giving a clearer path to generating complex code more accurately.

Think of it as a new evaluation metric for any framework: "context efficiency", or how much of an AI's context window goes to signal vs. boilerplate. Add it alongside DX, performance, and ecosystem when choosing your stack in the AI-assisted coding era.

Methodology

First, we took Vercel's official SaaS starter app in Next.js and converted it to a Wasp (React, Node.js, Prisma) app.

Then, we made sure to use the same models (Opus for planning and implementation, Haiku for exploring), same framework-agnostic prompt, and same plan-then-implement flow, outlined in the test protocol.

Finally, we created measurement scripts to pull metrics from Claude Code's detailed JSONL session transcripts.

We used Anthropic's API pricing as of March 2026 (per 1M tokens), e.g.:

| Model | Input | Output | Cache Read | Cache Create |
| --- | --- | --- | --- | --- |
| Opus 4.6 | $5.00 | $25.00 | $0.50 | $6.25 |

Fairness caveats: Claude has seen far more Next.js training data (advantage: Next.js). Wasp's codebase is ~40% smaller, but that's the point. And this is a single feature test, not a comprehensive benchmark.

Explore the comparison yourself: both apps, the test protocol, and measurement scripts are in the comparison repo.

Try Wasp

Want to try Wasp? Get started with:

npm i -g @wasp.sh/wasp-cli@latest

Then start a new Wasp app:

wasp new my-app
wasp start

And don't forget to add the Wasp Agent Plugin / Skills to turn your agent into a Wasp framework expert!


Announcing Rust 1.94.1


The Rust team has published a new point release of Rust, 1.94.1. Rust is a programming language that is empowering everyone to build reliable and efficient software.

If you have a previous version of Rust installed via rustup, getting Rust 1.94.1 is as easy as:

rustup update stable

If you don't have it already, you can get rustup from the appropriate page on our website.

What's in 1.94.1

Rust 1.94.1 resolves three regressions that were introduced in the 1.94.0 release, along with a security fix.

Contributors to 1.94.1

Many people came together to create Rust 1.94.1. We couldn't have done it without all of you. Thanks!


The World’s Crudest Chaos Monkey

1 Share

I’m working pretty hard this week and early next to deliver the CritterWatch MVP (our new management and observability console for the Critter Stack) to a JasperFx Software client. One of the things we need to do for testing is to fake out several failure conditions in message handlers to be able to test CritterWatch’s “Dead Letter Queue” management and alerting features. To that end, we have some fake systems that constantly process messages, and we’ve rigged up what I’m going to call the world’s crudest Chaos Monkey in Wolverine middleware:

    public static async Task Before(ChaosMonkeySettings chaos)
    {
        // Configurable slow handler for testing back pressure
        if (chaos.SlowHandlerMs > 0)
        {
            await Task.Delay(chaos.SlowHandlerMs);
        }

        if (chaos.FailureRate <= 0) return;

        // Chaos monkey — distribute failure rate equally across 5 exception types
        var perType = chaos.FailureRate / 5.0;
        var next = Random.Shared.NextDouble();

        if (next < perType)
        {
            throw new TripServiceTooBusyException("Just feeling tired at " + DateTime.Now);
        }

        if (next < perType * 2)
        {
            throw new TrackingUnavailableException("Tracking is down at " + DateTime.Now);
        }

        if (next < perType * 3)
        {
            throw new DatabaseIsTiredException("The database wants a break at " + DateTime.Now);
        }

        if (next < perType * 4)
        {
            throw new TransientException("Slow down, you move too fast.");
        }

        if (next < perType * 5)
        {
            throw new OtherTransientException("Slow down, you move too fast.");
        }
    }

And this to control it remotely in tests or just when doing exploratory manual testing:

    private static void MapChaosMonkeyEndpoints(WebApplication app)
    {
        var group = app.MapGroup("/api/chaos")
            .WithTags("Chaos Monkey");

        group.MapGet("/", (ChaosMonkeySettings settings) => Results.Ok(settings))
            .WithSummary("Get current chaos monkey settings");

        group.MapPost("/enable", (ChaosMonkeySettings settings) =>
        {
            settings.FailureRate = 0.20;
            return Results.Ok(new { message = "Chaos monkey enabled at 20% failure rate", settings });
        }).WithSummary("Enable chaos monkey with default 20% failure rate");

        group.MapPost("/disable", (ChaosMonkeySettings settings) =>
        {
            settings.FailureRate = 0;
            return Results.Ok(new { message = "Chaos monkey disabled", settings });
        }).WithSummary("Disable chaos monkey (0% failure rate)");

        group.MapPost("/failure-rate/{rate:double}", (double rate, ChaosMonkeySettings settings) =>
        {
            rate = Math.Clamp(rate, 0, 1);
            settings.FailureRate = rate;
            return Results.Ok(new { message = $"Failure rate set to {rate:P0}", settings });
        }).WithSummary("Set chaos monkey failure rate (0.0 to 1.0)");

        group.MapPost("/slow-handler/{ms:int}", (int ms, ChaosMonkeySettings settings) =>
        {
            ms = Math.Max(0, ms);
            settings.SlowHandlerMs = ms;
            return Results.Ok(new { message = $"Handler delay set to {ms}ms", settings });
        }).WithSummary("Set artificial handler delay in milliseconds (for back pressure testing)");

        group.MapPost("/projection-failure-rate/{rate:double}", (double rate, ChaosMonkeySettings settings) =>
        {
            rate = Math.Clamp(rate, 0, 1);
            settings.ProjectionFailureRate = rate;
            return Results.Ok(new { message = $"Projection failure rate set to {rate:P0}", settings });
        }).WithSummary("Set projection failure rate (0.0 to 1.0)");
    }
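For exploratory manual testing, these endpoints can be driven with curl. The base URL below is an assumption, so adjust it to match your launch profile:

    BASE=http://localhost:5000/api/chaos

    curl -s "$BASE/"                          # inspect current settings
    curl -s -X POST "$BASE/enable"            # enable at the default 20% failure rate
    curl -s -X POST "$BASE/failure-rate/0.5"  # raise the failure rate to 50%
    curl -s -X POST "$BASE/slow-handler/250"  # add a 250ms handler delay
    curl -s -X POST "$BASE/disable"           # back to normal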

In this case, the Before middleware is just baked into the message handlers, but in your own codebase the “chaos monkey” middleware could be applied only in testing environments through a Wolverine extension.

And I was probably listening to Simon & Garfunkel when I did the first cut at the chaos monkey.



How to Prevent Application Security Threats: A Developer’s Guide to Proactive Protection


Perimeter controls can reduce risk, but they can’t compensate for vulnerable code. Network segmentation and intrusion detection can limit blast radius and help you spot suspicious activity, but attackers still get in by exploiting weaknesses in your application. They’re not always trying to break through the perimeter. They’re walking straight through the front door by abusing the logic, inputs, and dependencies your software relies on.

Application security threats target the logic, dependencies, and configurations that make your software work. When attackers find a broken authentication mechanism or an unvalidated input field, perimeter defenses become far less effective. The breach happens inside the application itself, where network and perimeter-focused controls often lack the context to help.

The numbers tell the story plainly. Teams that catch security problems during development generally spend less time and money fixing vulnerabilities than teams that discover the same issues in production. Patching costs are only part of it. Add downtime, emergency deployments, incident response, and the reputational damage that can come with a public breach. Prevention tends to be cheaper than remediation, and it scales more effectively as systems grow.

Application Code: The Primary Attack Surface

You write business logic and pull in open-source libraries. You integrate third-party APIs. Authentication systems need configuration. Each layer introduces potential weaknesses that attackers actively hunt for.

The shift to API-driven architectures and microservices expands this attack surface even further. Every endpoint becomes a potential entry point. Every service-to-service communication requires authentication and authorization. Every data transformation needs input validation. The complexity multiplies faster than most security teams can reasonably monitor, which is exactly what attackers count on.

The OWASP Top 10 highlights many of the most common and high-impact application security risks, including broken access control, injection, cryptographic failures, and security misconfigurations. These issues show up repeatedly across real-world incidents because they’re often systemic: they stem from missing validation, inconsistent authorization, weak secrets management, and insecure defaults.

An attacker can submit carefully crafted SQL input into a vulnerable search box and potentially access data they shouldn’t. Broken authentication or session management can let them bypass login controls by abusing tokens or exploiting flawed password reset flows. Insecure deserialization can enable arbitrary code execution when an application accepts untrusted serialized objects. These failures originate in application code and configuration, not in the network perimeter.
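To make the SQL injection case concrete, here is a minimal sketch in Java. The table and class names are invented for illustration; the point is how a crafted input rewrites the query’s meaning, and how a bound parameter avoids it:

```java
// Illustrative only: names and queries are hypothetical.
public class SqlInjectionDemo {

    // Vulnerable: user input is concatenated directly into the SQL string.
    static String buildUnsafeQuery(String userInput) {
        return "SELECT * FROM products WHERE name = '" + userInput + "'";
    }

    // Safer: the value stays a bound parameter. With JDBC this placeholder
    // feeds a PreparedStatement, so the input is never parsed as SQL.
    static String buildParameterizedQuery() {
        return "SELECT * FROM products WHERE name = ?";
    }

    public static void main(String[] args) {
        String malicious = "x' OR '1'='1";
        // The injected OR clause makes the WHERE condition match every row.
        System.out.println(buildUnsafeQuery(malicious));
        System.out.println(buildParameterizedQuery());
    }
}
```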

Many organizations invest heavily in network security but still underinvest in application-layer safeguards. They monitor traffic patterns yet miss gaps like inconsistent input validation, weak session handling, or insecure configuration. Traditional security strategies can treat applications as black boxes behind protective barriers: scan periodically, patch what you find, and hope nothing slips through between assessments.

This reactive model fails because it assumes security happens around applications rather than within them. By the time a vulnerability is detected in production, it may already have been discovered, probed, or exploited. The longer the gap between introducing a flaw and finding it, the more expensive and risky remediation becomes.

Shift-Left Security in the Development Lifecycle

Effective prevention starts where vulnerabilities originate, which is in the development process itself. Shift-left security embeds testing and verification throughout your workflow rather than treating security as a final gate before deployment. The goal is catching problems when they’re cheapest and easiest to fix.

Your development lifecycle should include these testing approaches:

Static Application Security Testing (SAST)

SAST analyzes source code before it runs, catching problems like hardcoded credentials, unsafe function calls, and improper error handling while code is still in your IDE. It catches the obvious stuff, but it’s blind to what happens at runtime.

Dynamic Application Security Testing (DAST)

DAST simulates real attacks against running applications, probing endpoints to test how your application responds to malicious inputs and unexpected requests. It can uncover runtime issues, but it also tends to generate findings that require triage and tuning to separate real risk from noise.

Software Composition Analysis (SCA)

SCA tracks the open-source components your application depends on, helping you identify and remediate vulnerable dependencies before they reach production. It reflects how heavily modern applications rely on third-party and open-source components. When Log4Shell exposed millions of Java applications to remote code execution, SCA tools helped teams identify affected dependencies and prioritize remediation. You can’t protect what you can’t inventory.
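As a toy illustration of what SCA tooling automates, the sketch below flags a Log4Shell-era log4j-core coordinate in a dependency inventory. The class name and version check are simplified assumptions, not an authoritative vulnerability feed:

```java
import java.util.List;

// Simplified sketch of the SCA idea: match an inventory of
// "group:artifact:version" coordinates against a known-bad range
// (here, log4j-core 2.x before 2.17, per widely published guidance).
public class DependencyAudit {

    static boolean isVulnerableLog4j(String coordinate) {
        String[] parts = coordinate.split(":");
        if (parts.length != 3) return false;
        if (!parts[0].equals("org.apache.logging.log4j")
                || !parts[1].equals("log4j-core")) return false;
        String[] v = parts[2].split("\\.");
        int major = Integer.parseInt(v[0]);
        int minor = Integer.parseInt(v[1]);
        return major == 2 && minor < 17;
    }

    public static void main(String[] args) {
        List<String> deps = List.of(
                "org.apache.logging.log4j:log4j-core:2.14.1",
                "com.fasterxml.jackson.core:jackson-databind:2.15.2");
        deps.stream()
            .filter(DependencyAudit::isVulnerableLog4j)
            .forEach(d -> System.out.println("VULNERABLE: " + d));
    }
}
```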

Interactive Application Security Testing (IAST)

IAST combines static and dynamic approaches by instrumenting your application during QA testing to monitor data flow and execution paths in real time. It bridges the gap by seeing how data flows through your application during testing, which helps identify complex vulnerabilities that neither SAST nor DAST alone would catch. A SQL injection vulnerability might pass SAST if the code looks clean but fail IAST when testers exercise the feature and the tool observes unsanitized data reaching the database query.

Integrating these tools into your CI/CD pipeline makes security testing automatic rather than optional. Every commit triggers scans. Every pull request requires passing security checks before merging. Failed builds block vulnerable code from reaching production. Security becomes part of your definition of done, not something you bolt on at the end.
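One hypothetical shape for that pipeline integration is a GitHub Actions-style workflow that gates merges on the scans above. The script paths and action versions here are placeholders, not recommendations:

    name: security-checks
    on: [push, pull_request]

    jobs:
      scan:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Static analysis (SAST)
            run: ./scripts/run-sast.sh   # placeholder wrapper for your SAST tool
          - name: Dependency scan (SCA)
            run: ./scripts/run-sca.sh    # fail the build on known-vulnerable deps
          - name: Dynamic scan (DAST)
            run: ./scripts/run-dast.sh   # probe a staging deployment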

The key is calibrating these tools so they catch real problems without drowning developers in false positives. Misconfigured security tools get ignored, which defeats the entire purpose of integrating them in the first place.

Runtime Application Protection Beyond Development Testing

No amount of development-phase testing catches everything. Zero-day vulnerabilities appear in frameworks you depend on. Logic flaws hide in complex workflows. And configuration drift can introduce new exposure after release. That’s why many teams add runtime protection in production as part of a defense-in-depth strategy.

RASP is one category of runtime protection

Runtime Application Self-Protection (RASP) is one approach: it instruments an application to detect and respond to certain classes of in-app abuse while the application is running. RASP is often discussed alongside tools like WAFs and IDS/IPS, but it operates inside the application process rather than inspecting traffic from the outside.

Just as importantly, RASP isn’t the only runtime problem teams need to solve.

Runtime self-defense against tampering and reverse engineering

Many real-world attacks don’t look like “malicious HTTP traffic” at all. Instead, attackers reverse engineer applications to extract secrets, map proprietary logic, bypass license checks, or modify behavior. They attach debuggers, tamper with binaries, repackage mobile apps, or run code in compromised environments (like rooted or jailbroken devices) to gain leverage. These threats require code-level hardening and runtime integrity controls, not just vulnerability scanning.

Perimeter controls like WAFs and IDS/IPS can help reduce risk, but they typically operate outside the application and often lack visibility into what your code is doing internally. And they don’t address reverse engineering and tampering threats, where the attacker’s goal is to inspect, modify, or counterfeit your application rather than send obviously malicious requests.

Code obfuscation and integrity checks are designed to protect intellectual property and runtime integrity by making reverse engineering and tampering substantially harder. Attackers analyze application binaries to understand business logic, extract proprietary algorithms, and locate security controls they can bypass. They’re looking for hardcoded API keys, mapping out authentication flows, and figuring out where you validate licenses and what controls they can work around.

Obfuscation transforms readable methods and classes into symbols that are difficult to interpret and map back to intent. Control flow obfuscation restructures program logic to hide the original implementation. The application behaves the same, but the code becomes substantially more time-consuming to analyze and understand.
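A before/after sketch of renaming (illustrative Java only, not actual obfuscator output) shows why behavior is preserved while intent disappears:

```java
// Both methods compute the same result; the second strips every
// readable signal from names, as symbol renaming conceptually does.
public class RenamingDemo {

    static int calculateLoyaltyDiscount(int orderTotalCents, int memberYears) {
        int discountPercent = Math.min(memberYears * 2, 20);
        return orderTotalCents * discountPercent / 100;
    }

    // Post-renaming equivalent: identical behavior, meaningless symbols.
    static int a(int b, int c) {
        int d = Math.min(c * 2, 20);
        return b * d / 100;
    }

    public static void main(String[] args) {
        System.out.println(calculateLoyaltyDiscount(10_000, 5)); // 1000
        System.out.println(a(10_000, 5));                        // 1000
    }
}
```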

Security through obscurity gets dismissed as bad practice, and it is when it’s your only defense. But obfuscation matters when it protects your competitive advantage. In some markets, teams see competitors or fraudsters decompile mobile apps, extract API endpoints, and attempt to clone functionality that relies on the original developer’s infrastructure. Obfuscation won’t stop determined attackers, but it raises the cost of attack significantly.

Additional runtime protections address different threat scenarios:

  • Anti-debug: Detects debugging and dynamic analysis attempts and can respond (block, degrade, alert) based on policy
  • Tamper detection / integrity checks: Detects modified binaries, injected code, or repackaged apps before compromised code executes
  • Root/jailbreak detection (mobile): Detects compromised devices where attackers have elevated privileges

  • Least privilege and privilege boundaries (architecture-level): Limits blast radius by ensuring services and components run with only the permissions they need. When an attacker compromises one part of your system, proper privilege boundaries limit what they can reach. A compromised web service can’t access the database directly. A breached API endpoint can’t read files from the filesystem.
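One well-known JVM anti-debug heuristic, checking the launch arguments for the JDWP debug agent, can be sketched as follows. Commercial protections layer many additional and stronger checks, so treat this as an illustration only:

```java
import java.lang.management.ManagementFactory;

// Minimal sketch: detect whether this JVM was started with a debugger
// attached via JDWP (arguments like "-agentlib:jdwp=transport=dt_socket,...").
public class DebuggerCheck {

    static boolean jdwpAgentPresent() {
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.contains("jdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        if (jdwpAgentPresent()) {
            // Policy response: block, degrade, or alert.
            System.out.println("Debugger detected; applying policy response.");
        } else {
            System.out.println("No JDWP agent detected.");
        }
    }
}
```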

How PreEmptive Hardens Applications at the Code Level

If your threat model includes reverse engineering, tampering, repackaging, or running code in compromised environments, you need protections that ship with the application itself. That’s where code-level hardening and runtime integrity controls come in.

PreEmptive protects applications at the source, not the perimeter. Your code is the attack surface, and PreEmptive hardens it before it ever leaves the build pipeline.

Dotfuscator turns .NET applications into a reverse engineering nightmare. Your C# ships obfuscated and runtime-protected, fully functional but practically unreadable to anyone trying to pick it apart.

DashO brings that same level of protection to Java. JSDefender handles the JavaScript side, covering what runs in browsers and Node.js. All three integrate directly into your build process, so protection becomes part of shipping, not something bolted on after.

PreEmptive’s obfuscation isn’t a single technique. It’s a multi-layered defense system that compounds the difficulty of reverse engineering at every stage of analysis.

  • Renaming strips method and variable names down to meaningless symbols, so decompiled code reveals nothing about what it actually does.
  • Control flow transformation restructures your logic into functionally equivalent paths that resist static analysis.
  • String encryption locks down the endpoints, API keys, and configuration values that attackers typically search binaries for first.

Each layer alone raises the bar. Stacked together, they create a compounding effect. An attacker who gets past renamed symbols still faces restructured logic. If they untangle the control flow, encrypted strings block the shortcuts they’d normally rely on. Professional decompilation tools can recover structure, but they can’t recover intent when every readable signal has been stripped, scrambled, or buried.
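The string-encryption layer can be sketched with a deliberately weak XOR-plus-Base64 scheme; real products use far stronger constructions. The point is that the binary carries only an opaque blob, and the readable literal exists only at runtime:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: the key, scheme, and endpoint below are hypothetical.
public class StringEncryptionDemo {

    private static final byte KEY = 0x5A; // toy hardcoded key

    static String encode(String plain) {
        byte[] bytes = plain.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return Base64.getEncoder().encodeToString(bytes);
    }

    static String decode(String blob) {
        byte[] bytes = Base64.getDecoder().decode(blob);
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String blob = encode("https://api.example.com/v1/keys"); // hypothetical endpoint
        System.out.println(blob);          // what an attacker sees in the binary
        System.out.println(decode(blob));  // what the app uses at runtime
    }
}
```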

Runtime protections can detect and respond to debugging, tampering, and suspicious runtime conditions. The specific capabilities vary by platform and configuration, but may include anti-debug measures that hinder analysis in reverse engineering tools, integrity checks that detect modified binaries or repackaged apps, and runtime checks such as root/jailbreak detection for mobile applications.

Mobile applications can add protections such as root and jailbreak detection to identify execution on compromised devices where attackers have elevated privileges, then respond based on policy (for example, blocking execution, limiting functionality, or logging and alerting).

Your applications stop being easy targets for straightforward reverse engineering and tampering. They can detect and respond to analysis attempts, modification, and execution in compromised environments, depending on platform and configuration.

Protect Applications at the Code Level

Testing catches many vulnerabilities during development, but applications still face threats that no test plan anticipates. New exploits emerge constantly, and attackers evolve tactics to probe for weaknesses and abuse runtime conditions. That’s why many teams complement shift-left practices with code-level hardening and runtime protections that ship with the application, helping protect it where it runs rather than relying only on external controls.

PreEmptive gives you that control through obfuscation and runtime protections that deploy with your application. Your .NET, Java, and JavaScript applications gain defenses that frustrate reverse engineering and respond to tampering and analysis attempts without depending on external security infrastructure. Security travels wherever your code runs because it’s built into the application itself.

Start Your Free PreEmptive Trial Today!
