
Kubernetes v1.35: Introducing Workload Aware Scheduling


Scheduling large workloads is a much more complex and fragile operation than scheduling a single Pod, as it often requires considering all Pods together instead of scheduling each one independently. For example, when scheduling a machine learning batch job, you often need to place each worker strategically, such as on the same rack, to make the entire process as efficient as possible. At the same time, the Pods that are part of such a workload are very often identical from the scheduling perspective, which fundamentally changes how this process should look.

There are many custom schedulers adapted to perform workload scheduling efficiently, but considering how common and important workload scheduling is to Kubernetes users, especially in the AI era with the growing number of use cases, it is high time to make workloads a first-class citizen for kube-scheduler and support them natively.

Workload aware scheduling

The recent 1.35 release of Kubernetes delivered the first tranche of workload aware scheduling improvements. These are part of a wider effort to improve the scheduling and management of workloads. The effort will span many SIGs and releases, gradually expanding the capabilities of the system toward the north star goal: seamless workload scheduling and management in Kubernetes, including, but not limited to, preemption and autoscaling.

Kubernetes v1.35 introduces the Workload API, which you can use to describe the desired shape as well as the scheduling-oriented requirements of a workload. It comes with an initial implementation of gang scheduling that instructs kube-scheduler to schedule gang Pods in an all-or-nothing fashion. Finally, we improved scheduling of identical Pods (which typically make up a gang) to speed up the process, thanks to the opportunistic batching feature.

Workload API

The new Workload API resource is part of the scheduling.k8s.io/v1alpha1 API group. This resource acts as a structured, machine-readable definition of the scheduling requirements of a multi-Pod application. While user-facing workloads like Jobs define what to run, the Workload resource determines how a group of Pods should be scheduled and how its placement should be managed throughout its lifecycle.

A Workload allows you to define a group of Pods and apply a scheduling policy to them. Here is what a gang scheduling configuration looks like. You can define a podGroup named workers and apply the gang policy with a minCount of 4.

apiVersion: scheduling.k8s.io/v1alpha1
kind: Workload
metadata:
  name: training-job-workload
  namespace: some-ns
spec:
  podGroups:
  - name: workers
    policy:
      gang:
        # The gang is schedulable only if 4 pods can run at once
        minCount: 4

When you create your Pods, you link them to this Workload using the new workloadRef field:

apiVersion: v1
kind: Pod
metadata:
  name: worker-0
  namespace: some-ns
spec:
  workloadRef:
    name: training-job-workload
    podGroup: workers
  ...

How gang scheduling works

The gang policy enforces all-or-nothing placement. Without gang scheduling, a Job might be partially scheduled, consuming resources without being able to run, leading to resource wastage and potential deadlocks.

When you create Pods that are part of a gang-scheduled pod group, the scheduler's GangScheduling plugin manages the lifecycle independently for each pod group (or replica key):

  1. When you create your Pods (or a controller makes them for you), the scheduler blocks them from scheduling, until:

    • The referenced Workload object is created.
    • The referenced pod group exists in a Workload.
    • The number of pending Pods in that group meets your minCount.
  2. Once enough Pods arrive, the scheduler tries to place them. However, instead of binding them to nodes immediately, the Pods wait at a Permit gate.

  3. The scheduler checks if it has found valid assignments for the entire group (at least the minCount).

    • If there is room for the group, the gate opens, and all Pods are bound to nodes.
    • If only a subset of the group pods was successfully scheduled within a timeout (set to 5 minutes), the scheduler rejects all of the Pods in the group. They go back to the queue, freeing up the reserved resources for other workloads.

We'd like to point out that while this is a first implementation, the Kubernetes project firmly intends to improve and expand the gang scheduling algorithm in future releases. Benefits we hope to deliver include a single-cycle scheduling phase for a whole gang, workload-level preemption, and more, moving toward the north star goal.

Opportunistic batching

In addition to explicit gang scheduling, v1.35 introduces opportunistic batching. This is a Beta feature that improves scheduling latency for identical Pods.

Unlike gang scheduling, this feature does not require the Workload API or any explicit opt-in on the user's part. It works opportunistically within the scheduler by identifying Pods that have identical scheduling requirements (container images, resource requests, affinities, etc.). When the scheduler processes a Pod, it can reuse the feasibility calculations for subsequent identical Pods in the queue, significantly speeding up the process.

Most users will benefit from this optimization automatically, without taking any special steps, provided their Pods meet the following criteria.

Restrictions

Opportunistic batching works under specific conditions. All fields used by the kube-scheduler to find a placement must be identical between Pods. Additionally, using some features disables the batching mechanism for those Pods to ensure correctness.

Note that you may need to review your kube-scheduler configuration to ensure it is not implicitly disabling batching for your workloads.

See the docs for more details about restrictions.

The north star vision

The project has a broad ambition to deliver workload aware scheduling. These new APIs and scheduling enhancements are just the first steps. In the near future, the effort aims to tackle:

  • Introducing a workload scheduling phase
  • Improved support for multi-node DRA and topology aware scheduling
  • Workload-level preemption
  • Improved integration between scheduling and autoscaling
  • Improved interaction with external workload schedulers
  • Managing placement of workloads throughout their entire lifecycle
  • Multi-workload scheduling simulations

And more. The priority and implementation order of these focus areas are subject to change. Stay tuned for further updates.

Getting started

To try the workload aware scheduling improvements (a sample set of component flags follows this list):

  • Workload API: Enable the GenericWorkload feature gate on both kube-apiserver and kube-scheduler, and ensure the scheduling.k8s.io/v1alpha1 API group is enabled.
  • Gang scheduling: Enable the GangScheduling feature gate on kube-scheduler (requires the Workload API to be enabled).
  • Opportunistic batching: As a Beta feature, it is enabled by default in v1.35. You can disable it using the OpportunisticBatching feature gate on kube-scheduler if needed.
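For example, on a cluster where you can edit the control plane component flags directly (such as kubeadm static Pod manifests), the relevant settings might look like the sketch below; the --feature-gates and --runtime-config flags are standard Kubernetes component flags, while the gate names come from the list above:

# kube-apiserver: serve the alpha Workload API and enable the feature gate
kube-apiserver ... \
  --runtime-config=scheduling.k8s.io/v1alpha1=true \
  --feature-gates=GenericWorkload=true

# kube-scheduler: enable the Workload API support and gang scheduling
kube-scheduler ... \
  --feature-gates=GenericWorkload=true,GangScheduling=true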

We encourage you to try out workload aware scheduling in your test clusters and share your experiences and feedback to help shape the future of Kubernetes scheduling.



Building Chromium & V8 with Visual Studio 2026: December 2025


I don’t have to build Chrome (or v8) regularly. But I’ve had a recent occasion to look in the source code to explain an odd behaviour that I was seeing. In doing so, I revisited the steps for how to build Chrome on Windows. The steps evolve over time. This time around I decided that I would use Visual Studio 2026 to perform the build. I’ve now got a script that requires less interaction, which is great given how long a compilation takes. Don’t be surprised if it takes 10-20 hours for the code to compile depending on various factors (such as drive speed, internet bandwidth, processor speed, and other performance characteristics).

I have a build script for Windows and Visual Studio 2026. Unlike a previous version of this script from a post earlier this year, you don’t have to come along to click an OK button to keep the process moving. It should run from beginning to end. This script assumes that you already have Visual Studio 2026 and git installed. I’ve got two entry points for the process. One version of the script runs the Visual Studio installer to add the components that are needed (in case you haven’t added them) before starting the build process. Then it runs the steps to check out and start building. The other script only runs the steps necessary to check out and start building, assuming that you already have the necessary Visual Studio components installed.

If you look in the scripts, you will see that the call command is used a lot. This is because many of the commands for Google’s build tools are themselves batch files. If a batch file is invoked without the call command, control does not return to the invoking batch file when it finishes. If it is invoked with the call command, control returns to the line after the one that invoked it.

The script that installs the Visual Studio components invokes vs_installer.exe with several arguments to install the various components. I call it with the start /wait command so that the batch file pauses until the installer finishes. I’m using the Community edition of Visual Studio. If you are using a different edition, you’ll need to replace instances of the word “Community” with whatever is appropriate for your edition (“Enterprise” or “Professional”). The --passive argument tells the installer to run in passive mode. In this mode it will perform its tasks without requesting any input from the user. The argument --quiet could also work here, but --passive lets you see that the script is doing something.

 
pushd C:\Program Files (x86)\Microsoft Visual Studio\Installer\
start /wait vs_installer.exe install --passive --productid Microsoft.VisualStudio.Product.Community --ChannelId VisualStudio.18.Release --add Microsoft.VisualStudio.Workload.NativeDesktop  --add Microsoft.VisualStudio.Component.VC.ATLMFC  --add Microsoft.VisualStudio.Component.VC.Tools.ARM64 --add Microsoft.VisualStudio.Component.VC.MFC.ARM64  --add Microsoft.VisualStudio.Component.VC.Llvm.Clang --add Microsoft.VisualStudio.Component.VC.Llvm.ClangToolset --add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Llvm.Clang  --includeRecommended
popd
call checkout-chromium-and-build.cmd

Once the VS components are there, we are ready to start building. The Chromium build steps rely on git and some other tools/scripts from Google. Those tools haven’t been installed yet, but that doesn’t prevent environment variables from being created for where these tools will be. Since I only need these environment variables set for compilation of Chrome and v8, I find it easier to keep them in the batch file rather than set them on the system. In the following, I set up these environment variables to compile on the C: drive. On that drive, I’m putting the tools and the compiled code in child folders of \shares\projects\google. From there, Google’s tools will be in a subfolder called depot_tools and the Chromium and V8 code will be in a subfolder named chromium. The top of the second script sets up all of these environment variables.

ECHO ON
timeout /t 2100 /nobreak 
SET drive=c:
set googlePath=%drive%\shares\projects\google\
SET VS_EDITION=Community
SET NINJA_SUMMARIZE_BUILD=1
set PATH=%googlePath%depot_tools;%PATH%
SET DEPOT_TOOLS_WIN_TOOLCHAIN=0
SET vs2022_install=%drive%\Program Files\Microsoft Visual Studio\18\%VS_EDITION%

REM Cap the number of parallel build jobs; set to 0 to let the build system decide
SET PARALLEL_JOBS=32
IF %PARALLEL_JOBS% EQU 0 (
    SET JOBS_PARAMETER=
) ELSE (
    SET JOBS_PARAMETER=-j%PARALLEL_JOBS%
)

Checking out Chromium/V8

Installing Google’s build tools (depot_tools) and the Chromium source happens in the following script. These steps will create the folders that contain both. The call to gclient initializes Google’s build tools after they are present on the drive. Once those are installed, we can use the fetch command to retrieve the code for the repository of interest. We want the Chromium source, so we use fetch chromium to retrieve it. It will deposit the source code in the current folder. Before calling this command, we make a chromium folder to hold the source and call fetch from within it. Usually, after this command completes, the source code should be present in the folder. However, a few times during my tests, Google’s servers failed to deliver the code. When that happens, your source code folder ends up in a state in which only part of the code is checked out. Running gclient to perform a forced sync can resolve the problem. I’ve included a call to this time-consuming command in my script. Its presence is likely to just consume time without any real effect, but it will save some of you headaches. It’s up to you whether it gets removed. Once again, I decided to give weight to reliability over performance. After the source code is acquired, we can move into the src folder to configure and build.

%drive%
mkdir %googlePath%
cd %googlePath%
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
pushd %googlePath%depot_tools

call gclient

popd
mkdir chromium && cd chromium
call fetch chromium
call gclient sync -D --force --reset
cd src

Configuring and Compiling

I make several calls to configure and compile targets for the chromium code. The targets we care about are chromium and v8 for both release and debug modes. We use the gn command to generate build configurations. The command accepts as arguments the output folder and a string that has other project-specific build parameters. I call the command 5 times for 5 configurations.

gn gen out\Default 
gn gen out\v8Release --args="is_component_build=false is_debug=false symbol_level=1 v8_enable_object_print=true v8_enable_disassembler=true target_cpu=\"x64\" v8_static_library = true v8_use_external_startup_data=false v8_monolithic=true"
gn gen out\v8Debug --args="is_component_build=false is_debug=true  symbol_level=1 v8_enable_object_print=true v8_enable_disassembler=true target_cpu=\"x64\" v8_static_library = true v8_use_external_startup_data=false v8_monolithic=true"
gn gen out\ChromeDebug --args="is_debug=true"
gn gen out\ChromeRelease --args="is_debug=false symbol_level=1"

After configuration, I kick off the builds. Here, the JOBS_PARAMETER variable is used to limit the number of build jobs (threads) that spin up for compilation. If you set the PARALLEL_JOBS variable to 0 earlier, then the JOBS_PARAMETER variable will be blank, which lets the build system decide how many jobs to use on its own.
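The build invocations themselves aren't shown in this excerpt. A sketch of what they can look like, assuming depot_tools' autoninja wrapper and the usual chrome and v8_monolith targets, and appending output to the build_log.txt file discussed below:

call autoninja -C out\ChromeRelease %JOBS_PARAMETER% chrome >> %googlePath%build_log.txt 2>&1
call autoninja -C out\ChromeDebug %JOBS_PARAMETER% chrome >> %googlePath%build_log.txt 2>&1
call autoninja -C out\v8Release %JOBS_PARAMETER% v8_monolith >> %googlePath%build_log.txt 2>&1
call autoninja -C out\v8Debug %JOBS_PARAMETER% v8_monolith >> %googlePath%build_log.txt 2>&1

autoninja normally picks a job count on its own; the explicit %JOBS_PARAMETER% simply caps it when PARALLEL_JOBS is non-zero.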

Adjustments for Memory Limitations

When I was first working on this script, I was switching between my primary desktop and primary laptop. The desktop has 160 gigs of RAM, and the laptop 72 gigs. After I had the script working, I wanted to try it on a variety of hardware. I started running it on other computers to confirm my expectation that the script would obviously work on all of these other systems. That thinking was incorrect. On one system it would reliably fail. On another it would sometimes fail. When I traced through the error output, I came across a clear statement on the nature of the failure.

build step: cxx "./obj/v8/torque_generated_definitions/js-iterator-helpers-tq.obj"
stderr:
LLVM ERROR: out of memory
Allocation failed
PLEASE submit a bug report to https://crbug.com in the Tools>LLVM component, run tools/clang/scripts/process_crashreports.py (only if inside Google) to upload crash related files, and include the crash backtrace, preprocessed source, and associated run script.

The computer had run out of memory during the compilation. The compilation process runs many jobs (or threads) to perform the various steps of compilation in parallel. If you get too many jobs running, the system can exhaust its memory and memory allocations will fail. That’s what was happening on the other systems. The system with 16 gigs would sometimes fail, but if I restarted the build it would usually succeed. The system with 8 gigs would always fail. To adjust for this, I can manually cap the number of threads allowed. Once I did this, I could reliably get the script to compile successfully. In normal times I might use this as an opportunity to encourage others to upgrade their memory. When it comes to storage and RAM, my philosophy is “You don’t have enough until you have more than enough.” But in the last month or two, several memory manufacturers have announced they will be emphasizing more profitable markets and withdrawing from consumer sales. Chances are that most of the people reading this don’t have the resources of a data center available, and upgrading their memory just isn’t an option (unless they want to pay 200+% for memory). To accommodate this, I’m setting a cap of 32 jobs for the compilation process. I think that will make the script successful for most. If you want to allow the compilation process to attempt to maximize the number of threads, you can set the PARALLEL_JOBS variable in the script to 0. Yes, this will result in the compilation process being slower than maximum on very capable systems unless someone modifies the script. But I wanted this script to be more generally usable. Reliability was given more importance than performance.

Start, Sleep, Finish

The compilation process can take a long time. For v8, depending on your system, it can take an hour or more. For Chromium it can take much longer. This isn’t the type of compilation that you could start, go get coffee, and expect to be done when you get back. Rather, you could start it, go to bed, and wake up the next day having gotten plenty of rest only to find that the compilation process isn’t finished. The good news is that only your first compilation will take this long; subsequent ones can be faster, benefiting from the efforts of previous compilations. You’ll want to make sure that your computer doesn’t go to sleep during compilation. I’ve had this happen even when I set the power policy for the computer to never go to sleep. I’m tempted to resurface a mouse-jiggler to keep the machine awake. Those devices tend to be effective even when the computer is locked.

Given the massive amount of time this process can take, it is better if there is a specific machine designated to perform build tasks for Chromium and V8. You could do other work on a machine while compilation occurs. But you might find that your tasks compete with the build process.

How Long Did This Take?

Build times can vary widely between systems with different performance characteristics. If you want to see how long the various steps of the build process took, the batch file appends to a file named build_log.txt in the Google folder. You can view this file while the build is in progress, but you will need to make sure you don’t lock the file. Some text editors lock a file while it is being viewed. The safest way to view the file is to open a command prompt and dump its contents to the console (using the type command in the command prompt or the Get-Content command in PowerShell). You can review the output to see how long each phase of the build process took.
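For example, from a second command prompt (the path assumes the %googlePath% value set at the top of the script):

type c:\shares\projects\google\build_log.txt

In PowerShell, Get-Content with the -Wait parameter does the same thing and keeps tailing the file as new lines are appended.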

What about Building V8 Only?

Previously, I’ve shown how to check out only the v8 code and build it. I think I’m going to abandon that in favor of this approach, which can be used to build other projects. The difference comes down to selecting a build target.


Posts may contain products with affiliate links. When you make purchases using these links, we receive a small commission at no extra cost to you. Thank you for your support.

Mastodon: @j2inet@masto.ai
Instagram: @j2inet
Facebook: @j2inet
YouTube: @j2inet
Telegram: j2inet
Bluesky: @j2i.net




PDF Diff - Compare PDFs privately and securely in your browser




Bugs that survive the heat of continuous fuzzing


Even when a project has been intensively fuzzed for years, bugs can still survive.

OSS-Fuzz is one of the most impactful security initiatives in open source. In collaboration with the OpenSSF Foundation, it has helped to find thousands of bugs in open-source software.

Today, OSS-Fuzz fuzzes more than 1,300 open source projects at no cost to maintainers. However, continuous fuzzing is not a silver bullet. Even mature projects that have been enrolled for years can still contain serious vulnerabilities that go undetected. In the last year, as part of my role at GitHub Security Lab, I have audited popular projects and have discovered some interesting vulnerabilities.

Below, I’ll show three open source projects that were enrolled in OSS-Fuzz for a long time and yet critical bugs survived for years. Together, they illustrate why fuzzing still requires active human oversight, and why improving coverage alone is often not enough.

GStreamer

GStreamer is the default multimedia framework for the GNOME desktop environment. On Ubuntu, it’s used every time you open a multimedia file with Totem, access the metadata of a multimedia file, or even when generating thumbnails for multimedia files each time you open a folder.
In December 2024, I discovered 29 new vulnerabilities, including several high-risk issues.

To understand how 29 new vulnerabilities could be found in software that has been continuously fuzzed for seven years, let’s have a look at the public OSS-Fuzz statistics available here. If we look at the GStreamer stats, we can see that it has only two active fuzzers and code coverage of around 19%. By comparison, a heavily researched project like OpenSSL has 139 fuzzers (yes, 139 different fuzzers, that is not a typo).

Comparing OSS-Fuzz statistics for OpenSSL and GStreamer.

And the popular compression library bzip2 reports a code coverage of 93.03%, a number that is almost five times higher than GStreamer’s coverage.

OSS-Fuzz project statistics for the bzip2 compression library.

Even without being a fuzzing expert, we can guess that GStreamer’s numbers are not good at all.

And this brings us to our first reason: OSS-Fuzz still requires human supervision to monitor project coverage and to write new fuzzers for uncovered code. We have good hope that AI agents could soon help us fill this gap, but until that happens, a human needs to keep doing it by hand.

The other problem with OSS-Fuzz isn’t technical. It’s due to its users and the false sense of confidence they get once they enroll their projects. Many developers are not security experts, so for them, fuzzing is just another checkbox on their security to-do list. Once their project is “being fuzzed,” they might feel it is “protected by Google” and forget about it, even if the project actually fails during the build stage and isn’t being fuzzed at all (which happens to more than one project in OSS-Fuzz).

This shows that human security expertise is still required to maintain and support fuzzing for each enrolled project, and that doesn’t scale well with OSS-Fuzz’s success!

Poppler

Poppler is the default PDF parser library in Ubuntu. It’s the library used to render PDFs when you open them with Evince (the default document viewer in Ubuntu versions prior to 25.04) or Papers (the default document viewer for GNOME desktop and the default document viewer from newer Ubuntu releases).

If we check Poppler stats in OSS-Fuzz, we can see it includes a total of 16 fuzzers and that its code coverage is around 60%. Those are quite solid numbers; maybe not at an excellent level, but certainly above average.

That said, a few months ago, my colleague Kevin Backhouse published a 1-click RCE affecting Evince in Ubuntu. The victim only needs to open a malicious file for their machine to be compromised. The reason a vulnerability like this wasn’t found by OSS-Fuzz is a different one: external dependencies.

Poppler relies on a good bunch of external dependencies: freetype, cairo, libpng… And based on the low coverage reported for these dependencies in the Fuzz Introspector database, we can safely say that they have not been instrumented by libFuzzer. As a result, the fuzzer receives no feedback from these libraries, meaning that many execution paths are never tested.

Coverage report table showing line coverage percentages for various Poppler dependencies.

But it gets even worse: Some of Evince’s default dependencies aren’t included in the OSS-Fuzz build at all. That’s the case with DjVuLibre, the library where I found the critical vulnerability that Kevin later exploited.

DjVuLibre is a library that implements support for the DjVu document format, an open source alternative to PDF that was popular in the late 1990s and early 2000s for compressing scanned documents. It has become much less widely used since the standardization of the PDF format in 2008.

The surprising thing is that while this dependency isn’t included among the libraries covered by OSS-Fuzz, it is shipped by default with Evince and Papers. So these programs were relying on a dependency that was “unfuzzed” and at the same time, installed on millions of systems by default.

This is a clear example of how software is only as secure as the weakest dependency in its dependency graph.

Exiv2

Exiv2 is a C++ library used to read, write, delete, and modify Exif, IPTC, XMP, and ICC metadata in images. It’s used by many mainstream projects such as GIMP and LibreOffice among others.

Back in 2021, my teammate Kevin Backhouse helped improve the security of the Exiv2 project. Part of that work included enrolling Exiv2 in OSS-Fuzz for continuous fuzzing, which uncovered multiple vulnerabilities, like CVE-2024-39695, CVE-2024-24826, and CVE-2023-44398.

Despite the fact that Exiv2 has been enrolled in OSS-Fuzz for more than three years, new vulnerabilities have still been reported by other vulnerability researchers, including CVE-2025-26623 and CVE-2025-54080.

In that case, the reason is a very common scenario when fuzzing media formats: Researchers always tend to focus on the decoding part, since it is the most obviously exploitable attack surface, while the encoding side receives less attention. As a result, vulnerabilities in the encoding logic can remain unnoticed for years.

From a regular user perspective, a vulnerability in an encoding function may not seem particularly dangerous. However, these libraries are often used in many background workflows (such as thumbnail generation, file conversions, cloud processing pipelines, or automated media handling) where an encoding vulnerability can be more critical.

The five-step fuzzing workflow

At this point it’s clear that fuzzing is not a magic solution that will protect you from everything. To ensure a minimum level of quality, we need to follow some criteria.

In this section, you’ll find the fuzzing workflow I’ve been using with very positive results in the last year: the five-step fuzzing workflow (preparation – coverage – context – value – triaging).

Five-step fuzzing workflow diagram. (preparation - coverage - context - value - triaging)

Step 1: Code preparation

This step involves applying all the necessary changes to the target code to optimize fuzzing results. These changes include, among others:

  • Removing checksums
  • Reducing randomness
  • Dropping unnecessary delays
  • Signal handling

If you want to learn more about this step, check out this blog post
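For example, a common pattern across OSS-Fuzz projects is to gate integrity checks behind the FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION macro (the macro is a real OSS-Fuzz/libFuzzer convention; the parser and helper functions below are hypothetical):

#include <stdint.h>
#include <stddef.h>

// Hypothetical helpers, assumed to exist elsewhere in the project.
extern uint32_t compute_crc32(const uint8_t *data, size_t len);
extern int process_payload(const uint8_t *data, size_t len);

// Skip CRC validation in fuzzing builds so the fuzzer doesn't waste cycles
// trying to synthesize inputs with valid checksums.
int parse_chunk(const uint8_t *data, size_t len, uint32_t stored_crc) {
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
    if (compute_crc32(data, len) != stored_crc) {
        return -1;   // reject corrupted input in production builds
    }
#endif
    return process_payload(data, len);   // the code we actually want to fuzz
}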

Step 2: Improving code coverage

From the previous examples, it is clear that if we want to improve our fuzzing results, the first thing we need to do is to improve the code coverage as much as possible.

In my case, the workflow is usually an iterative process that looks like this:

Run the fuzzers > Check the coverage > Improve the coverage > Run the fuzzers > Check the coverage > Improve the coverage > …

The “check the coverage” stage is a manual step where I look over the LCOV report for uncovered code areas, and the “improve the coverage” stage is usually one of the following:

  • Writing new fuzzing harnesses to hit new code that would otherwise be impossible to hit
  • Creating new input cases to trigger corner cases

For an automated, AI-powered way of improving code coverage, I invite you to check out the Plunger module in my FRFuzz framework. FRFuzz is an ongoing project I’m working on to address some of the caveats in the fuzzing workflow. I will provide more details about FRFuzz in a future blog post.

Another question we can ask ourselves is: When can we stop increasing code coverage? In other words, when can we say the coverage is good enough to move on to the next steps?

Based on my experience fuzzing many different projects, I can say that this number should be >90%. In fact, I always try to reach that level of coverage before trying other strategies, or even before enabling detection tools like ASAN or UBSAN.

To reach this level of coverage, you will need to fuzz not only the most obvious attack vectors such as decoding/demuxing functions, socket-receivers, or file-reading routines, but also the less obvious ones like encoders/muxers, socket-senders, and file-writing functions.

You will also need to use advanced fuzzing techniques like:

  • Fault injection: A technique where we intentionally introduce unexpected conditions (corrupted data, missing resources, or failed system calls) to see how the program behaves. So instead of waiting for real failures, we simulate these failures during fuzzing (see the sketch after this list). This helps us uncover bugs in execution paths that are rarely executed, such as:
    • Failed memory allocations (malloc returning NULL)
    • Interrupted or partial reads/writes
    • Missing files or unavailable devices
    • Timeouts or aborted network connections

A good example of fault injection is the Linux kernel Fault injection framework

  • Snapshot fuzzing: Snapshot fuzzing takes a snapshot of the program at any interesting state, so the fuzzer can then restore this snapshot before each test case. This is especially useful for stateful programs (operating systems, network services, or virtual machines). Examples include the QEMU mode of AFL++ and the AFL++ Nyx mode.
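As a minimal sketch of the fault injection idea (a hypothetical allocation wrapper, not taken from any particular project), the harness can let the fuzzer decide when an allocation should fail:

#include <stdlib.h>
#include <stdint.h>

// Hypothetical fault-injection wrapper: fail_after is set by the harness
// from a few input bytes, so the fuzzer controls which allocation fails
// and the rarely-executed error-handling paths get exercised.
static uint32_t fail_after;
static uint32_t alloc_count;

void *fi_malloc(size_t size) {
    if (fail_after != 0 && ++alloc_count >= fail_after) {
        return NULL;   // simulate an out-of-memory condition
    }
    return malloc(size);
}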

Step 3: Improving context-sensitive coverage

By default, the most common fuzzers (e.g., AFL++, libFuzzer, and honggfuzz) track code coverage at the edge level. We can define an “edge” as a transition between two basic blocks in the control-flow graph. So if execution goes from block A to block B, the fuzzer records the edge A → B as “covered.” For each input the fuzzer runs, it updates a bitmap structure marking which edges were executed as a 0 or 1 value (currently implemented as a byte per edge in most fuzzers).

In the following example, you can see a code snippet on the left and its corresponding control-flow graph on the right:

Edge coverage explanation.
Edge coverage = { (0,1), (0,2), (1,2), (2,3), (2,4), (3,6), (4,5), (4,6), (5,4) }

Each numbered circle corresponds to a basic block, and the graph shows how those blocks connect and which branches may be taken depending on the input. This approach to code coverage has proven to be very powerful given its simplicity and efficiency.
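As a rough illustration of how the coverage map is updated at runtime (this mirrors the classic AFL-style instrumentation scheme rather than any one fuzzer's exact code):

#include <stdint.h>

#define MAP_SIZE 65536
static uint8_t coverage_map[MAP_SIZE];   // one byte per (hashed) edge
static uint32_t prev_location;           // updated at every instrumented block

// Called at the start of every basic block; cur_location is a compile-time
// random ID assigned to that block by the instrumentation pass.
static inline void trace_edge(uint32_t cur_location) {
    coverage_map[(cur_location ^ prev_location) % MAP_SIZE]++;
    prev_location = cur_location >> 1;   // shift so A->B and B->A map to different edges
}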

However, edge coverage has a big limitation: It doesn’t track the order in which blocks are executed. 

So imagine you’re fuzzing a program built around a plugin pipeline, where each plugin reads and modifies some global variables. Different execution orders can lead to very different program states, while the edge coverage can still look identical. Since the fuzzer thinks it has already explored all the paths, the coverage-guided feedback won’t keep guiding it, and the chances of finding new bugs will drop.

To address this, we can make use of context-sensitive coverage. Context-sensitive coverage not only tracks which edges were executed, but it also tracks what code was executed right before the current edge.

For example, AFL++ implements two different options for context-sensitive coverage:

  • Context-sensitive branch coverage: In this approach, every function gets its own unique ID. When an edge is executed, the fuzzer takes the IDs from the current call stack, hashes them together with the edge’s identifier, and records the combined value.

You can find more information on AFL++ implementation here

  • N-Gram Branch Coverage: In this technique, the fuzzer combines the current location with the previous N locations to create a context-augmented coverage entry. For example:
    • 1-Gram coverage: looks at only the previous location
    • 2-Gram coverage: considers the previous two locations
    • 4-Gram coverage: considers the previous four

You can see how to configure it in AFL++ here

In contrast to edge coverage, it’s not realistic to aim for a coverage >90% when using context-sensitive coverage. The final number will depend on the project’s architecture and on how deep into the call stack we decide to track. But based on my experience, anything above 60% can be considered a very good result for context-sensitive coverage.

Step 4: Improving value coverage

To explain this section, I’m going to start with an example. Take a look at the following web server code snippet:

Example of a simple webserver code snippet.

Here we can see that the function unicode_frame_size has been executed 1910 times. After all those executions, the fuzzer didn’t find any bugs. It looks pretty secure, right?

However, there is an obvious div-by-zero bug when r.padding == FRAME_SIZE * 2:

Simple div-by-zero vulnerability.

Since the padding is a client-controlled field, an attacker could trigger a DoS in the webserver by sending a request with a padding size of exactly 2156 * 2 = 4312 bytes. Pretty annoying that after 1910 iterations the fuzzer didn’t find this vulnerability, don’t you think?

Now we can conclude that even having 100% code coverage is not enough to guarantee that a code snippet is free of bugs. So how do we find these types of bugs? And my answer is: Value Coverage.

We can define value coverage as the coverage of values a variable can take. Or in other words, the fuzzer will now be guided by variable value ranges, not just by control-flow paths. 

If, in our earlier example, the fuzzer had value-covered the variable r.padding, it could have reached the value 4312 and in turn, detected the divide-by-zero bug.

So, how can we get the fuzzer to steer variable values down different execution paths? The first naive implementation that came to my mind was the following:

#include <stdint.h>
#include <limits.h>

// Maps different ranges of num to different execution paths so that
// coverage-guided fuzzers are rewarded for exploring new value ranges.
inline uint32_t value_coverage(uint32_t num) {

    uint32_t no_optimize = 0;

    if (num < UINT_MAX / 2) {
        no_optimize += 1;
        if (num < UINT_MAX / 4) {
            no_optimize += 2;
            // ...further subdivisions...
        } else {
            no_optimize += 3;
            // ...further subdivisions...
        }
    } else {
        no_optimize += 4;
        if (num < (UINT_MAX / 4) * 3) {
            no_optimize += 5;
            // ...further subdivisions...
        } else {
            no_optimize += 6;
            // ...further subdivisions...
        }
    }

    return no_optimize;
}

In this code, I implemented a function that maps different values of the variable num to different execution paths. Notice the no_optimize variable, which prevents the compiler from optimizing away some of the function’s execution paths.

After that, we just need to call the function for the variable we want to value-cover like this:

static volatile uint32_t vc_noopt;

uint32_t webserver::unicode_frame_size(const HttpRequest& r) {

   //A Unicode character requires two bytes
   vc_noopt = value_coverage(r.padding); //VALUE_COVERAGE
   uint32_t size = r.content_length / (FRAME_SIZE * 2 - r.padding);

   return size;
}

Given the huge number of execution paths this can generate, you should only apply it to certain variables that we consider “strategic.” By strategic, I mean those variables that can be directly controlled by the input and that are involved in critical operations. As you can imagine, selecting the right variables is not easy, and it mostly comes down to the developer’s and researcher’s experience.

The other option we have to reduce the total number of execution paths is to use the concept of “buckets”: instead of testing all 2^32 possible values of a 32-bit integer, we can group those values into buckets, where each bucket maps to a single execution path. With this strategy, we don’t need to test every single value and can still achieve good results.

These buckets also don’t need to be symmetrically distributed across the full range. We can emphasize certain subranges by creating smaller buckets, or create bigger buckets for ranges we are less interested in.
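A minimal sketch of the bucket idea, in the spirit of the value_coverage() helper above (everything here is a hypothetical illustration, not part of any fuzzer):

#include <stdint.h>

static volatile uint32_t vc_bucket_sink;   // prevents the compiler from folding the branches

// Hypothetical bucketed variant of value_coverage(): a handful of ranges
// instead of one path per value, with finer buckets near the small values.
static inline void value_coverage_buckets(uint32_t num) {
    if (num == 0)                  vc_bucket_sink = 0;  // zero is usually interesting
    else if (num < 16)             vc_bucket_sink = 1;  // tiny values
    else if (num < 4096)           vc_bucket_sink = 2;
    else if (num < 65536)          vc_bucket_sink = 3;
    else if (num < 16777216)       vc_bucket_sink = 4;  // up to 2^24
    else                           vc_bucket_sink = 5;  // everything else
}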

Now that I’ve explained the strategy, let’s take a look at what real-world options we have to get value coverage in our fuzzers:

  • AFL++ CmpLog / Clang trace-cmp: These focus on tracing comparison values (values used in calls to ==, memcmp, etc.). They wouldn’t help us find our divide-by-zero bug, since they only track values used in comparison instructions.
  • Clang trace-div + libFuzzer -use_value_profile=1: This one would work in our example, since it traces values involved in divisions. But it doesn’t give us variable-level granularity, so we can only limit its scope by source file or function, not by specific variable. That limits our ability to target only the “strategic” variables.

To overcome these problems with value coverage, I wrote my own custom implementation using the LLVM FunctionPass functionality. You can find more details about my implementation by checking the FRFuzz code here.

The last mile: almost undetectable bugs

Even when you make use of all up-to-date fuzzing resources, some bugs can still survive the fuzzing stage. Below are two scenarios that are especially hard to tackle with fuzzing.

Big input cases

These are vulnerabilities that require very large inputs to be triggered (on the order of megabytes or even gigabytes). There are two main reasons they are difficult to find through fuzzing:

  • Most fuzzers cap the maximum input size (for example 1 MB in the case of AFL), because larger inputs lead to longer execution times and lower overall efficiency.
  • The total possible input space is exponential: O(256ⁿ), where n is the size in bytes of the input data. Even when coverage-guided fuzzers use heuristic approaches to tackle this problem, fuzzing is still considered a sub-exponential problem, with respect to input size. So the probability of finding a bug decreases rapidly as the input size grows.

For example, CVE-2022-40303 is an integer overflow bug affecting libxml2 that requires an input larger than 2GB to be triggered.

Bugs that require “extra time” to be triggered

These are vulnerabilities that can’t be triggered within the typical per-execution time limit used by fuzzers. Keep in mind that fuzzers aim to be as fast as possible, often executing hundreds or thousands of test cases per second. In practice, this means per-execution time limits on the order of 1–10 milliseconds, which is far too short for some classes of bugs.

As an example, my colleague Kevin Backhouse found a vulnerability in the Poppler code that fits well in this category: the vulnerability is a reference-count overflow that can lead to a use-after-free vulnerability.

Reference counting is a way to track how many times a pointer is referenced, helping prevent vulnerabilities such as use-after-free or double-free. You can think of it as a semi-manual form of garbage collection.

In this case, the problem was that these counters were implemented as 32-bit integers. If an attacker can increment the counter up to 2^32 times, it will wrap the value back to 0 and then trigger a use-after-free in the code.

Kevin wrote a proof of concept that demonstrated how to trigger this vulnerability. The only problem is that it turned out to be quite slow, making exploitation unrealistic: The PoC took 12 hours to finish.

That’s an extreme example of a bug that needs “extra time” to manifest, but many vulnerabilities require at least seconds of execution to trigger. Even that is already beyond the typical limits of existing fuzzers, which usually set per-execution timeouts well under one second.

That’s why finding vulnerabilities that require seconds to trigger is almost a chimera for fuzzers. And this effectively discards a lot of real-world exploitation scenarios from what fuzzers can find.

It’s important to note that although fuzzer timeouts frequently turn out to be false alarms, it’s still a good idea to inspect them. Occasionally they expose real performance-related DoS bugs, such as quadratic loops.

How to proceed in these cases?

I would like to be able to give you a how-to guide for these scenarios. But the reality is we don’t have effective fuzzing strategies for these corner cases yet.

At the moment, mainstream fuzzers are not able to catch these kinds of vulnerabilities. To find them, we usually have to turn to other approaches: static analysis, concolic (symbolic + concrete) testing, or even the old-fashioned (but still very profitable) method of manual code review.

Conclusion

Despite the fact that fuzzing is one of the most powerful options we have for finding bugs in complex software, it’s not a fire-and-forget solution. Continuous fuzzing can identify vulnerabilities, but it can also fail to detect some attack vectors. Without human-driven work, entire classes of bugs have survived years of continuous fuzzing in popular and crucial projects. This was evident in the three OSS-Fuzz examples above.

I proposed a five-step fuzzing workflow that goes further than just code coverage, covering also context-sensitive coverage and value coverage. This workflow aims to be a practical roadmap to ensure your fuzzing efforts go beyond the basics, so you’ll be able to find more elusive vulnerabilities.

If you’re starting with open source fuzzing, I hope this blog post helped you better understand current fuzzing gaps and how to improve your fuzzing workflows. And if you’re already familiar with fuzzing, I hope it gives you new ideas to push your research further and uncover bugs that traditional approaches tend to miss.

Want to learn how to start fuzzing? Check out our Fuzzing 101 course at gh.io/fuzzing101 >

The post Bugs that survive the heat of continuous fuzzing appeared first on The GitHub Blog.


AI IDEs – What Do You Choose?

From: VisualStudio
Duration: 1:06:23
Views: 177

In this recorded Live! 360 session, Brian Randell breaks down the fast-moving world of AI-powered IDEs and developer tools—helping you decide what to use, when to use it, and why. From GitHub Copilot and Visual Studio 2026 to Claude Code, Cursor, and fully local models, this talk cuts through the hype with practical, experience-based guidance.

Through live demos and real workflows, Brian explores IDE-integrated AI, CLI-based agents, and local-first AI setups using tools like Claude, Ollama, and Copilot. You’ll see how agent modes, planning modes, MCP servers, and model selection impact productivity, cost, security, and reliability—and why AI works best as a developer companion, not a replacement.

🔑 What You’ll Learn
• How AI IDEs and agents are changing developer workflows
• The differences between IDE plugins, AI-first IDEs, and CLI agents
• When GitHub Copilot, Claude Code, Cursor, or local models make sense
• How agent mode, planning mode, and MCP servers work in practice
• Cost, context-window, and model-selection tradeoffs
• How to run AI tools locally for privacy, air-gapped, or offline environments
• Why documentation, constraints, and iteration matter when working with AI

⏱️ Chapters
00:00 Session intro + framing
02:30 Setting up the demo
03:25 Opening the project in VS Code (solution structure overview)
05:31 Start Demo: Claude Code in the terminal
06:02 Initializing Claude in the repo (creating its project context file)
10:38 Plan Mode: requesting new service methods + Dapper data layer
14:05 Current AI IDE landscape & market trends
19:55 Major architectures of AI IDEs: Plugin/Extension, Standalone, CLI/Terminal
22:19 Claude plan review + proceeding with changes
27:13 Market leaders and their distinguishing features
34:15 Visual Studio + GitHub Copilot overview (modes/models/MCP)
43:30 Local AI workflow: Ollama + Continue in VS Code
56:25 Tool roundup: Cursor, Windsurf, JetBrains, CLIs
1:03:00 Enterprise Needs: Security, compliance, governance, and audit trails
1:04:00 Final guidance: how to choose what’s right for you

👤 Speaker: Brian Randell

🔗 Links
Download Visual Studio 2026: http://visualstudio.com/download
Explore more Live! 360 sessions: https://aka.ms/L360Orlando25
Join upcoming VS Live! events: https://aka.ms/VSLiveEvents

#GitHubCopilot #VisualStudio2026 #DeveloperTools #AgenticAI


3 Easy PowerShell Scripting Tips for Coding Masochists

1 Share

Tip: Read the ‘Important Notes’ section, because these are notes that are important.

1. Optimize Authentication Requests

  • What: Check for Existing Authentication Contexts before Repeating
  • How: Use Connection Context output
  • Why: Avoid repetitive, unnecessary authentication
  • Where: Your script code, Azure Automation Runbooks
  • When: Right now

Most APIs provide return data or establish a “context” once you complete the authentication process. When using cmdlets like Connect-AzAccount, Connect-Entra, Connect-ExchangeOnline, Connect-MicrosoftTeams, Connect-MgGraph, Connect-PnPOnline, and so on, you can either redirect the output of these to a variable or use a context function to fetch them.

Why? If you run the same script or code block repeatedly and it prompts for authentication every time, it not only becomes a hassle, but it can also waste time. How much this factors into time savings will depend on your environment(s) and usage patterns. Consider the following code example:

# Comment heading, because you always have one, right?

Connect-AzAccount # will force authentication every time

# ...more code...

Each time you run that, it will prompt for the Azure authentication. A small change can make it so you only get prompted the first time…

if (-not (Get-AzContext)) {
    # will only get invoked when there is no existing context
    Connect-AzAccount
}

If you happen to work with multiple tenants, you may want to add a check for the specific tenant ID as well…

$tenantId = "your_kickass_tenant_id"

if ((Get-AzContext).Tenant.Id -ne $tenantId) {
    # only invoked if context doesn't match or there is no context
    Connect-AzAccount -Tenant $tenantId 
}

More examples…

$tenantId = "your_kickass_tenant_id"
if ((Get-EntraContext).TenantId -ne $tenantId) {
    # only invoked when you haven't had coffee
    Connect-Entra -Tenant $tenantId
}

if ((Get-MgContext).TenantId -ne $tenantId) {
    # only invoked when you're paying attention, same kick-ass Tenant Id most likely
    Connect-MgGraph -TenantId $tenantId -NoWelcome
}

$spoSiteUrl = "your_kickass_sharepoint_online_site_url"
if ((Get-PnPContext).Url -ne $spoSiteUrl) {
    # only invoked when you first connect to your kick-ass sharepoint site
    Connect-PnPOnline -Url $spoSiteUrl -Interactive
}

You can also use Get-PnPConnection as an alternative. The MicrosoftTeams module doesn’t have a context-related cmdlet that I know of, which kind of sucks, like a broken vacuum cleaner. But life isn’t all bad.

2. Avoid Re-typing Credentials

  • What: Avoid Re-entering Passwords, Tenant and Subscription IDs
  • How: Store Credentials, Tenant IDs, and Subscription IDs in Secret Vaults
  • Why: To reduce mistakes, limit security exposure
  • Where: On your computer, in Azure KeyVaults, or Azure Automation Credentials and Variables
  • When: As soon as possible

You may have noticed that some of the examples above define $tenantId or $spoSiteUrl as hard-coded values directly in the script. You may be doing this with other things like subscription IDs, resource groups, usernames, and more. This is VERY BAD – Do NOT do that!

Any sensitive values should be stored securely so that if your scripts land in the wrong hands, they don’t hand the keys to your stolen car.

If you’re using any of the PowerShell Connect- functions that support a -Credential parameter, you can save a little time by feeding that from a credential vault. One simple way to do this is with the SecretManagement module. This works with various credential vaults like Windows Credential Manager, LastPass, 1Password, BitWarden and more.

Note: This does not circumvent safety controls like Entra Privileged Identity Management (PIM)

$myCredential = Get-Secret -Name AzureLogin123 -Vault PersonalVault
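If you haven’t registered a vault yet, a minimal sketch of the one-time setup looks like this (the vault and secret names are just examples):

# One-time setup: install the modules and register a local SecretStore vault
Install-Module Microsoft.PowerShell.SecretManagement, Microsoft.PowerShell.SecretStore -Scope CurrentUser
Register-SecretVault -Name PersonalVault -ModuleName Microsoft.PowerShell.SecretStore -DefaultVault

# Store a credential once...
Set-Secret -Name AzureLogin123 -Secret (Get-Credential) -Vault PersonalVault

# ...then reuse it anywhere a Connect- cmdlet accepts -Credential
$myCredential = Get-Secret -Name AzureLogin123 -Vault PersonalVault
Connect-ExchangeOnline -Credential $myCredential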

3. Suppress Unwanted Noise

  • What: Disable or reduce unneeded output
  • How: Use parameters like -NoWelcome, -WarningAction SilentlyContinue, Out-Null (or $null = ... )
  • Why: Clean output reduces processing overhead and avoids pipeline noise
  • Where: Every Connect- cmdlet or function that returns noisy output that you aren’t putting to use.
  • When: Always

Each time you connect to Microsoft Graph, it displays a welcome message that looks like the top half of a CVS receipt, only without coupons. There’s a marketing tip for Microsoft: Inject coupon codes in your API connection responses. You’re welcome.

You will also see: “NOTE: You can use the -NoWelcome parameter to suppress this message.” So, guess what: You can add -NoWelcome to quiet it down. They don’t have a -STFU parameter, but you could always wrap that yourself.

In addition to benign output, there are situations where even Warning output can make things messy. For example, within Azure Automation Runbooks, if you have output sent to a Log Analytics Workspace, the Warning output stream doesn’t need idiotic boy-who-cried-wolf warnings filling up your logs.
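Putting that together, a few examples of quieting things down (parameter support varies by module and version, so treat these as a sketch):

# Suppress the Microsoft Graph welcome banner
Connect-MgGraph -TenantId $tenantId -NoWelcome

# Suppress the Exchange Online banner and downgrade warnings for this call
Connect-ExchangeOnline -ShowBanner:$false -WarningAction SilentlyContinue

# Discard output you don't need instead of letting it hit the pipeline
$null = New-Item -ItemType Directory -Path .\logs -Force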

Important Notes

These notes are important.

  • As with any PowerShell content, things will change over time and some parameters may be added, replaced or removed. The examples provided herein, forthwith, notwithstanding and ipso facto, lorem ipsum are semi-valid as of 2025, December 29, anno domini, 12:25:32 PM Eastern Standard Time, planet Earth, Solar System 423499934.
  • Never run any script code, provided by humans, reptiles or AI services, in any production environment without thoroughly testing in non-production environments. Unless of course, you just don’t care about being fired or sued, or sent to a torture facility in El Salvador.
  • References to trademark names like CVS are coincidental and imply no sponsorship, condonement, favor, agreements, contracts, eye winks, strange head nods, or thumbs-up gestures from either party. And who has time for parties. Anyhow, I have a prescription to pick up at CVS.
  • If you don’t care for humor, that’s okay. Neither do I.

