Writers Write shares writing prompts and writing resources. Use these 31 writing prompts for March 2026 to get you writing.
This year, I want to focus on the most common challenges we face as writers.
Every month, along with a prompt for every day, I'd like to share an exercise that helps us deal with challenges like writer's block, perfectionism, procrastination, and imposter syndrome.
We tend to want stories to start at the beginning, and that's where many writers get stuck. This month, let's skip the setup and dive straight into the middle of things.
Writing from the middle forces you to jump in with action and lets you feel the story's heartbeat. The characters already know each other. The stakes are high. You just need to hold on.
Skip your opening scene. Write something from the middle of your story. It can be a turning point, a fight, a revelation. Don't worry about continuity. Just follow the energy.
It removes pressure to be perfect and reminds you that writing is discovery, not performance.
Next month, we will discuss fear.
Good luck, writer.

Download your prompts here: Writing Prompts March 2026
If you would rather have a free daily writing prompt from us, sign up here: Join Our Newsletter
Make the most of your writing prompts. Read How To Use Writing Prompts
Happiness
Mia

by Mia Botha
Looking for more prompts?
Top Tip: Find out more about our workbooks and online courses in our shop.
The post 31 Writing Prompts For March 2026 appeared first on Writers Write.
If you've been paying attention to Claude lately, you've probably noticed one thing.
It's really good at Excel.
Not formulas.
Not basic charts.
I mean understanding structured data. Reasoning across multiple sheets. Spotting correlations. Pulling out insights. The kind of work that usually takes a data analyst or data scientist a lot of time.
Here's the part most people miss.
You already get that capability inside Microsoft 365 Copilot.
Let me show you what that actually looks like in the real world.
This is how the work showed up for me.
In this example scenario (all data is built using Copilot for demonstration purposes), an executive leader reached out and said:
"My CFO wants a Q2 financial summary of paramedic overtime across multiple counties."
What I got back was raw data.
Messy.
Unstructured.
No story.
Normally, this turns into hours of work.
Instead, I used Copilot in Excel with Agent Mode.
Six prompts.
That's it.
Copilot in Excel with Agent Mode is where Copilot goes from "helpful assistant" to something much closer to a reasoning partner.
Under the hood, it's reasoning across sheets, structuring the data, and surfacing insights.
This is also where you might notice model selection.
If your IT admin has enabled Anthropic, you'll see Claude as an option. If you don't, that's an admin setting, not a missing feature.
And that matters, because this is exactly the type of work Claude has been getting attention for.
Here's what I built using natural language.
I asked Copilot to take raw overtime data and turn it into structured tables that could actually support dashboards and reporting.
Without telling it which charts to build or how to lay things out, Copilot created a working dashboard and an SBAR-style report structure.
I prompted it to explain what was happening in the data. It called out overtime spikes in May and flagged operational risk.
I asked for CFO talking points.
Copilot generated them on the spot.
One prompt created an entirely new sheet to stress-test scenarios.
All of this happened inside Excel.
No exporting.
No rework.
No separate AI tool.
This is where the conversation usually goes sideways.
"Isn't this replacing data scientists?"
No.
What it replaced was busy work.
AI is very good at the mechanics: structuring data, summarizing, and formatting.
Humans are very good at judgment, context, and knowing which story matters.
In this scenario, I knew what the CFO cared about.
I knew the story that needed to be told.
I knew what to stress-test.
Copilot didnât replace that. It sped it up.
What used to take hours took about half an hour, including review.
That's not a loss of skill. That's leverage.
This is the mindset shift.
Stop thinking about Copilot as "AI that answers questions."
Start thinking about it as delegation.
I delegated the data structuring, the dashboard build, the analysis, and the talking points.
Then I reviewed it, applied judgment, and refined.
When I handed this back to the CFO, I didn't even use Agent Mode. I switched to standard Copilot and asked:
"Summarize this in one paragraph and give me three bullets a CFO would care about."
That was it.
Copilot understood the entire workbook and produced executive-ready talking points.
Claude's Excel capabilities are impressive.
What matters more is where you can actually use them.
Copilot brings that level of reasoning into the tools people already work in. Excel.
Teams.
Word.
AI isn't here to think for you.
It's here to handle the mechanics so you can focus on judgment, context, and decisions.
Thatâs the difference.
Go try it.
Have some fun.
And start delegating.
Authors: Harshad Sane, Andrew Halaney
Imagine this: you click play on Netflix on a Friday night, and behind the scenes hundreds of containers spring into action in a few seconds to answer your call. At Netflix, scaling containers efficiently is critical to delivering a seamless streaming experience to millions of members worldwide. To keep up with responsiveness at this scale, we modernized our container runtime, only to hit a surprising bottleneck: the CPU architecture itself.
Let us walk you through the story of how we diagnosed the problem and what we learned about scaling containers at the hardware level.
When application demand requires that we scale up our servers, we get a new instance from AWS. To use this new capacity efficiently, pods are assigned to the node until its resources are considered fully allocated. A node can go from no applications running to being maxed out within moments of being ready to receive these applications.
As we migrated more and more from our old container platform to our new container platform, we started seeing some concerning trends. Some nodes were stalling for long periods of time, with a simple health check timing out after 30 seconds. An initial investigation showed that the mount table length was increasing dramatically in these situations, and reading it alone could take upwards of 30 seconds. Looking at systemd's stack, it was clear that it was busy processing these mount events as well, which could lead to complete system lockup. Kubelet also timed out frequently talking to containerd in this period. Examining the mount table made it clear that these mounts were related to container creation.
The affected nodes were almost all r5.metal instances, and were starting applications whose container image contained many layers (50+).
The flamegraph in Figure 1 clearly shows where containerd spent its time. Almost all of the time is spent trying to grab a kernel-level lock as part of the various mount-related activities when assembling the container's root filesystem!

Looking closer, when user namespaces are in use, containerd performs an idmapped bind mount for each layer. These bind mounts are owned by the container's user range and are then used as the lowerdirs to create the overlayfs-based rootfs for the container. Once the overlayfs rootfs is mounted, the bind mounts are unmounted, since they are not needed once the overlayfs is constructed.
If a node is starting many containers at once, every CPU ends up busy trying to execute these mounts and umounts. The kernel VFS has various global locks related to the mount table, and each of these mounts requires taking that lock as we can see in the top of the flamegraph. Any system trying to quickly set up many containers is prone to this, and this is a function of the number of layers in the container image.
For example, assume a node is starting 100 containers, each with 50 layers in its image. Each container will need 50 bind mounts to do the idmap for each layer. The containerâs overlayfs mount will be created using those bind mounts as the lower directories, and then all 50 bind mounts can be cleaned up via umount. Containerd actually goes through this process twice, once to determine some user information in the image and once to create the actual rootfs. This means the total number of mount operations on the start up path for our 100 containers is 100 * 2 * (1 + 50 + 50) = 20200 mounts, all of which require grabbing various global mount related locks!
As alluded to in the introduction, Netflix has been undergoing a modernization of its container runtime. In the past a virtual kubelet + docker solution was used, whereas now a kubelet + containerd solution is being used. Both the old runtime and the new runtime used user namespaces, so what's the difference here?
Figure 2 below is a simplified example of what this idmap feature looks like:

As noted earlier, the issue was predominantly occurring on r5.metal instances. Once we identified the root issue, we could easily reproduce it by creating a container image with many layers and sending hundreds of workloads using that image to a test node.
To better understand why this bottleneck was more profound on some instances compared to others, we benchmarked container launches on different AWS instance types:
Figure 3 shows the baseline results from scaling containers on each instance type.

Using perf record and custom microbenchmarks, we can see the hottest code path was in the Linux kernel's Virtual Filesystem (VFS) path lookup code: specifically, a tight spin loop waiting on a sequence lock in path_init(). The CPU spent most of its time executing the pause instruction, indicating that many threads were spinning, waiting for the global lock, as shown in the disassembly snippet below:
path_init():
  ...
  mov    mount_lock,%eax
  test   $0x1,%al
  je     7c
  pause
  ...
Using Intel's Topdown Microarchitecture Analysis (TMA), we observed that a large share of cycles was being spent on contested memory accesses. Given that, the natural next step from a hardware perspective was to investigate how NUMA and Hyperthreading contribute to the contention on this subset of instances.
Non-Uniform Memory Access (NUMA) is a system design where each processor has its own local memory for faster access but relies on an interconnect to access the memory attached to a remote processor. Introduced in the 1990s to improve scalability in multiprocessor systems, NUMA boosts performance but also introduces higher latency when a CPU needs to access memory attached to another processor. Figure 4 is a simple image describing local vs. remote access patterns of a NUMA architecture.

AWS instances come in a variety of shapes and sizes. To obtain the largest core count, we tested the 2-socket 5th generation metal instances (r5.metal), on which containers were orchestrated by the Titus agent. Modern dual-socket architectures implement a NUMA design, leading to faster local but higher remote access latencies. Although container orchestration can maintain locality, global locks can easily run into high-latency effects due to remote synchronization. To test the impact of NUMA, we compared an AWS 48xl instance with 2 NUMA nodes (sockets) against an AWS 24xl instance with a single NUMA node (socket). As seen in Figure 5, the extra hop introduces high latencies, and hence failures, very quickly.


Some modern server CPUs use a mesh-style interconnect to link cores and cache slices, with each intersection managing cache coherence for a subset of memory addresses. In these designs, all communication passes through a central queueing structure, which can only handle one request for a given address at a time. When a global lock (like the mount lock) is under heavy contention, all atomic operations targeting that lock are funneled through this single queue, causing requests to pile up and resulting in memory stalls and latency spikes.
In some well-known mesh-based architectures, as shown in Figure 7 below, this central queue is called the "Table of Requests" (TOR), and it can become a surprising bottleneck when many threads are fighting for the same lock. If you've ever wondered why certain CPUs seem to "pause for breath" under heavy contention, this is often the culprit.

Some modern server CPUs use a distributed, chiplet-based architecture (Figure 8), where multiple core complexes, each with its own local last-level cache, are connected via a high-speed interconnect fabric. In these designs, cache coherence is managed within each core complex, and traffic between complexes is handled by a scalable control fabric. Unlike mesh-based architectures with centralized queueing structures, this distributed approach spreads contention across multiple domains, making severe stalls from global lock contention less likely. For those interested in the technical details, public documentation from major CPU vendors provides deeper insight into these distributed cache and chiplet designs.

Here is a comparison of the same workload run on m7i (centralized cache architecture) vs. m7a (distributed cache architecture). To keep the comparison close, Hyperthreading (HT) was disabled on m7i, given the regression seen in Figure 6, and experiments were run with the same core counts. The result shows a fairly consistent performance difference of approximately 20%, as shown in Figure 9.

To prove the above theory relating NUMA, HT, and micro-architecture, we developed a small microbenchmark that spawns a given number of threads, each of which spins on a globally contended lock. Running the benchmark at increasing thread counts reveals the latency characteristics of the system under different scenarios. For example, Figure 10 below shows the microbenchmark results across NUMA, HT, and different microarchitectures.

Results from this custom synthetic benchmark (pause_bench) confirmed the same pattern we saw in production: latency rises sharply with thread count, is amplified by the NUMA hop and Hyperthreading, and is far less severe on the distributed cache architecture.
While understanding the impacts of the hardware architecture is important for assessing possible mitigations, the root cause here is contention over a global lock. Working with containerd upstream, we arrived at two possible solutions: adopt the newer kernel mount API to cut down on the per-layer mounts, or change containerd's mount handling to minimize per-layer mount operations. Since using the newer API requires a newer kernel, we opted for the latter change to benefit more of the community. With that in place, containerd's flamegraph is no longer dominated by mount-related operations. In fact, as seen in Figure 11 below, we had to highlight them in purple to see them at all!

Our journey migrating to a modern kubelet + containerd runtime at Netflix revealed just how deeply intertwined software and hardware architecture can be when operating at scale. While kubelet/containerd's usage of unique container users brought significant security gains, it also surfaced new bottlenecks rooted in kernel and CPU architecture, particularly when launching hundreds of many-layered container images in parallel. Our investigation highlighted that not all hardware is created equal for this workload: centralized cache management amplified cache contention, while the distributed cache design scaled smoothly under load.
Ultimately, the best solution combined hardware awareness with software improvements. For an immediate mitigation, we chose to route these workloads to CPU architectures that scaled better under these conditions. By changing the software design to minimize per-layer mount operations, we eliminated the global lock as a launch-time bottleneck, unlocking faster, more reliable scaling regardless of the underlying CPU architecture. This experience underscores the importance of holistic performance engineering: understanding and optimizing both the software stack and the hardware it runs on is key to delivering seamless user experiences at Netflix scale.
We trust these insights will assist others in navigating the evolving container ecosystem, transforming potential challenges into opportunities for building robust, high-performance platforms.
Special thanks to the Titus and Performance Engineering teams at Netflix.
Mount Mayhem at Netflix: Scaling Containers on Modern CPUs was originally published in Netflix TechBlog on Medium.