This post first appeared on Nick Tune’s Weird Ideas and is being republished here with the author’s permission.
A well-crafted system prompt will increase the quality of code produced by your coding assistant. It does make a difference. If you provide guidelines in your system prompt for writing code and tests, coding assistants will follow the guidelines.
Although that depends on your definition of “will follow.” If your definition is “will follow often” then it’s accurate. If your definition is “will follow always” or even “will follow most of the time,” then it’s inaccurate (unless you’ve found a way to make them reliable that I haven’t—please let me know).
Coding agents will ignore instructions in the system prompt on a regular basis. As the context window fills up and starts to intoxicate them, all bets are off.
Even with the latest Opus 4.5 model, I haven’t noticed a major improvement. So if we can’t rely on models to follow system prompts, we need to invest in feedback cycles.
I’ll show you how I’m using Claude Code hooks to implement automatic code review on all AI-generated code so that code quality is higher before it reaches the human in the loop.
You can find a code example that demonstrates the concepts discussed in this post on my GitHub.
When I talk about auto code review in this post, I mean a fast feedback mechanism that catches common code quality issues. It runs whenever Claude has finished making edits, so it needs to be fast and efficient.
I also use coding assistants for detailed code reviews, when reviewing a PR, for example. That kind of review spins up multiple subagents and takes a bit longer. That’s not what I’m talking about here.

The purpose of the auto code review is to reinforce what’s in your system prompt, project documentation, and on-demand skills. Things that Claude may have ignored. Part of a multipronged approach.
Wherever possible, I recommend using your lint and test rules to bake in quality, and leave auto code review for more semantic issues that tools can’t check.
If you want to set a maximum length for your files or maximum level of indentation, then use your lint tool. If you want to enforce a minimum test coverage, use your test framework.
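To make that split concrete, here is a hedged sketch of what those mechanical rules might look like in a TypeScript project using ESLint’s flat config and Jest. The specific limits and file names are placeholders rather than recommendations from the post.

```ts
// eslint.config.js (flat config): mechanical limits the linter can enforce
export default [
  {
    rules: {
      'max-lines': ['error', { max: 300 }],           // maximum file length
      'max-depth': ['error', 3],                      // maximum nesting level
      'max-lines-per-function': ['error', { max: 50 }],
    },
  },
];
```

```ts
// jest.config.ts: fail the test run if coverage drops below a threshold
export default {
  coverageThreshold: {
    global: { lines: 80, branches: 80 },
  },
};
```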
A semantic code review looks at how well the code is designed. For example, naming: Does the code accurately describe the business concepts it represents?
AI will often default to names like “helper” and “utils.” But AI is also good at understanding the nuance and finding better names if you challenge it, and it can do this quickly. So this is a good example of a semantic rule.
You can ban certain words like “helper” and “utils” with lint tools. (I recommend doing that.) But that won’t catch everything.
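For the identifier half of that, ESLint’s built-in id-denylist rule can reject those names outright; the word list below is only an illustrative starting point.

```ts
// eslint.config.js: ban vague identifiers so the assistant has to pick a real name
export default [
  {
    rules: {
      'id-denylist': ['error', 'helper', 'helpers', 'util', 'utils', 'temp', 'data'],
    },
  },
];
```

It only checks identifiers, though, so a vague concept hiding behind a plausible-sounding name still slips through, which is exactly the gap the semantic review is there to cover.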
Another example is logic leaking out of the domain model. When a use case/application service queries an entity and then makes a decision, it’s highly likely your domain logic is leaking into the application layer. Not so easy to catch with lint tools, but worth addressing.
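Here is a hypothetical TypeScript illustration of that smell and the shape of the fix; the Order type and its rule are invented for the example.

```ts
// Before: the application service queries the entity and makes the decision,
// so the domain rule lives in the application layer.
interface OrderData {
  status: 'OPEN' | 'CANCELLED';
}

function cancelOrder(order: OrderData): void {
  if (order.status === 'OPEN') { // domain rule leaking into the use case
    order.status = 'CANCELLED';
  }
}

// After: the entity owns the rule, and the use case just tells it what to do.
class Order {
  private status: 'OPEN' | 'CANCELLED' = 'OPEN';

  cancel(): void {
    if (this.status !== 'OPEN') {
      throw new Error('Only open orders can be cancelled');
    }
    this.status = 'CANCELLED';
  }
}

function cancelOrderUseCase(order: Order): void {
  order.cancel(); // tell, don't ask
}
```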

Another example is default fallback values. When Claude has an undefined value where a value is expected, it will set a default value. It seems to hate throwing exceptions or challenging the type signature and asking, “Should we allow undefined here?” It wants to make the code run no matter what and no matter how much the system prompt tells it not to.

You can catch some of this with lint rules but it’s very nuanced and depends on the context. Sometimes falling back to a default value is correct.
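A small, made-up TypeScript example of the pattern the review looks for:

```ts
// What the assistant tends to write: silently paper over a missing value.
function greet(user: { name?: string }): string {
  const name = user.name ?? 'Unknown'; // default fallback hides the real problem
  return `Hello, ${name}`;
}

// What the review pushes for: tighten the type so the value cannot be missing,
// or fail loudly and ask whether undefined should really be allowed here.
function greetStrict(user: { name: string }): string {
  return `Hello, ${user.name}`;
}

function greetOrThrow(user: { name?: string }): string {
  if (user.name === undefined) {
    throw new Error('Expected user.name to be defined');
  }
  return `Hello, ${user.name}`;
}
```

The point isn’t that a fallback is always wrong; it’s that the review should force a deliberate decision instead of a silent default.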
If you’re using Claude Code and want to build an auto code review for checks that you can’t easily define with lint or testing tools, then a solution is to configure a script that runs on the Stop hook.
The Stop hook fires when Claude has finished working and passes control back to the user to make a decision. So here, you can trigger a subagent to perform the review on the modified files.
To trigger the subagent, the hook needs to return an error status code, which blocks the main agent and forces it to read the hook’s output.
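As a rough sketch of that mechanism, a Stop hook script in TypeScript could look like the following. It assumes the registration format, blocking exit code, and stderr behaviour described in the Claude Code hooks documentation at the time of writing, so check the current docs before relying on it; the file names and the modified-files log are my own placeholders, and the log itself is covered a little further down.

```ts
// stop-review-hook.ts: registered for the Stop event in .claude/settings.json, e.g.
// { "hooks": { "Stop": [{ "hooks": [{ "type": "command",
//     "command": "npx tsx stop-review-hook.ts" }] }] } }
import { existsSync, readFileSync, unlinkSync } from 'node:fs';

const LOG_FILE = '.claude/modified-files.log'; // maintained by a PostToolUse hook (see below)

const files = existsSync(LOG_FILE)
  ? [...new Set(readFileSync(LOG_FILE, 'utf8').split('\n').filter(Boolean))]
  : [];

if (files.length === 0) {
  process.exit(0); // nothing changed since the last review, so let Claude stop normally
}

unlinkSync(LOG_FILE); // clear the log so the next Stop only reviews new changes

// A blocking exit code stops the main agent from finishing; stderr is fed back
// to it as instructions, which is where the review subagent gets triggered.
process.stderr.write(
  'Use the code-reviewer subagent to review these files for naming, domain logic ' +
  `leakage, and default fallback values, then report the findings:\n${files.join('\n')}\n`
);
process.exit(2);
```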

I think it’s generally considered best practice to use a subagent focused on the review, with a very critical mindset. Asking the main agent to mark its own homework is obviously not a good approach, and doing the review in the main conversation would also use up your context window.
The solution I use is available on GitHub. You can install it as a plug-in in your repo and customize the code review instructions, or just use it as inspiration for your own solution. Any feedback is welcome.
In the example above you can see it took 52 seconds. Probably quicker than me reviewing and providing the feedback myself. But that’s not always the case. Sometimes it can take a few minutes.
If you’re sitting there blocked waiting for review, this might be slower than doing it yourself. But if you’re not blocked and are working on something else (or watching TV), this saves you time because the end result will be higher quality and require less of your time to review and fix.
I want my auto code review to only review files that have been modified since the last pull request. But Claude doesn’t provide this information in the context to the Stop hook.
I can find all modified or unstaged files using Git, but that’s not good enough; for one thing, it misses anything Claude has already committed.
What I do instead is hook into PostToolUse and keep a log of each modified file.

When the Stop hook is triggered, the review will find the files modified since the last review and ask the subagent to review only those. If there are no modified files, the code review is not activated.
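A minimal sketch of that logger follows, with the caveat that the stdin field names (tool_input.file_path in particular) reflect my reading of the hooks payload and should be verified against the current documentation.

```ts
// log-modified-files.ts: registered as a PostToolUse hook with a matcher
// such as "Edit|Write|MultiEdit" so it only fires for file-editing tools.
import { appendFileSync, mkdirSync } from 'node:fs';

const LOG_FILE = '.claude/modified-files.log';

let raw = '';
process.stdin.setEncoding('utf8');
process.stdin.on('data', (chunk) => (raw += chunk));
process.stdin.on('end', () => {
  const payload = JSON.parse(raw);
  // File-editing tools put the target path in tool_input.file_path.
  const filePath = payload?.tool_input?.file_path;
  if (typeof filePath === 'string') {
    mkdirSync('.claude', { recursive: true });
    appendFileSync(LOG_FILE, `${filePath}\n`);
  }
  process.exit(0); // never block here; this hook only records state
});
```

The Stop hook sketched earlier reads this log, de-duplicates it, and clears it once a review has been triggered, so each review only covers the latest changes.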
Unfortunately, the Stop hook is not 100% reliable for this use case, for a few reasons. Firstly, Claude might stop to ask a question, e.g. for you to clarify some requirements. You might not want the auto review to trigger there until you’ve answered Claude and it has finished.
The second reason is that Claude can commit changes before the Stop hook. So by the time the subagent performs the review, the changes are already committed to Git.
That might not be a problem, and there are simple ways to solve it if it is. It’s just one more thing to keep in mind and set up.
The ideal solution would be for Anthropic (or other tool vendors) to provide hooks at a higher level of abstraction, ones aligned with the software development workflow rather than low-level file modification operations.
What I would really love is a CodeReadyForReview hook which provides all the files that Claude has modified. Then we can throw away our custom solutions.
I don’t know if I’m not looking in the right places or if the information isn’t out there, but I feel like this solution is solving a problem that should already be solved.
I’d be really grateful if you can share any advice that helps to bake in code quality before the human in the loop has to review it.
Until then I’ll continue to use this auto code review solution. When you’re giving AI some autonomy to implement tasks and reviewing what it produces, this is a useful pattern that can save you time and reduce frustration from having to repeat the same feedback to AI.
My first encounter with DevOps was so simple that I didn’t even realize its power. Let me share the story so you can see how it went from accidental discovery to deliberate practice, and why it was such a dramatic pivot.
The backdrop to this pivotal moment was a software delivery setup you might find anywhere. The development team built software in a reasonably iterative and incremental fashion. About once a month, the developers created a gold copy and passed it to the ops team.
The ops team installed the software on our office instance (we drank our own champagne). After two weeks of smooth running, they promoted the version to customer instances.
It wasn’t a perfect process, but it benefited from muscle memory, so there wasn’t an urgent imperative to change it. The realization that a change was needed came from the first DevOps moment.
When the ops team deployed the new version, they would review the logs to see if anything interesting or unexpected popped up as a result of the deployment. If they found something, they couldn’t get a quick answer, and it sometimes meant they opted to roll back rather than wait.
This was a comic-strip situation because the development team was a few meters away in their team room. It’s incredible how something as simple as a door transforms co-located teams into remote workers.
The ops team raised their request through official channels, and the developers didn’t even know they were causing more work and stress because the ticket hadn’t reached them yet.
Thankfully, one of the ops team members highlighted this. The next time they started a deployment, a developer was paired with them to watch the logs. A low-fi solution and not one you’d think much about. That developer was me. For this post, we’ll call my ops team partner “Tony”.
The day-one experience of this new collaborative process didn’t seem groundbreaking. When a log message popped up that surprised Tony, it surprised me too. The messages weren’t any more helpful to a developer than they were to the ops team.
I could think through what might be happening, talk it through, and then Tony and I would come up with a theory. We’d test the theory by trying to make another similar log message appear. Then we’d scratch our heads and try to decide whether this could wait for a fix or warranted a rollback.
The plan to bring people from the two teams together was intended to remove the massive communication lag, and it did. But further improvements were to come as a side effect, yielding more significant gains.
As a developer, when you generate log messages and then have to interpret them, you’ve completed a pain loop. Pain loops are potent drivers of improvement.
Most organizations have unresolved pain pathways. That means someone creates pain, like a developer throwing thousands of vague exceptions every minute, and then someone else feels it, like Tony when he’s trying to work out what the log means.
There are two ways to resolve a pain pathway: route feedback from the person feeling the pain back to the person causing it, or close the loop so the person causing the pain feels it themselves. The second is far more effective.
If I’m the one who gets the electric shock when I press the button, I stop pushing it, even if someone in a white coat instructs me to continue the experiment.
With the pain loop connected, I realized we should log fewer messages to reduce the scroll and review burden. Instead of needing institutional knowledge of which messages were perpetually present and could therefore be ignored, we could stop logging them.
The (perhaps asymptotic) goal was to log only the events that required human review, with a toggle that let more verbose logging be generated on demand. Instead of scrolling through a near-infinite list of logs, you’d have a nearly empty view. If a log appeared, it was important enough to warrant your attention.
The next idea was to improve the information in the log messages. We could identify which customer or user experienced the error and provide context for it. By improving these error messages, we could often identify the bug before we even opened the code, dramatically reducing our investigation time.
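As a purely illustrative sketch, not the code from that system, the two ideas together might look something like this in TypeScript:

```ts
// A verbosity toggle plus structured context, so any log line that does appear
// tells you who was affected and what was happening.
const VERBOSE = process.env.LOG_VERBOSE === 'true';

function logDebug(message: string, context: Record<string, unknown> = {}): void {
  if (!VERBOSE) return; // routine noise stays hidden unless explicitly enabled
  console.log(JSON.stringify({ level: 'debug', message, ...context }));
}

function logError(message: string, context: Record<string, unknown> = {}): void {
  // Errors always surface, with enough context to start the investigation.
  console.error(JSON.stringify({ level: 'error', message, ...context }));
}

logError('Invoice export failed', { customerId: 'cust-42', invoiceId: 'inv-1001' });
```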
This process evolved into the three Fs of event logging.
Another thread that emerged from the simple act of sitting together during deployments was the realization that the deployment process was nasty. We created an installer file, and the ops team would move it to the target server, double-click it, then follow the prompts to configure the instance.
Having to paste configuration values into the installer was slow and error-prone. We spent a disproportionate amount of time improving this process.
Admittedly, we were solving this one “inside the box” by improving an individual installation with DIY scripts, a can of lubricating spray, and sticky tape. This didn’t improve the experience of repeating the install across several environments and multiple production instances.
However, I did get to experience the stress of deployments when their probability of success was anything less than “very high”. When deployments weren’t a solved problem, they could damage team reputation, erode trust, and reduce autonomy.
Failed deployments are the leading cause of organizations working in larger batches. Large batches are a leading cause of failed deployments. This is politely called a negative spiral, and you have to reverse it urgently if you want to survive.
The act of sitting a developer with an ops team member during deployments isn’t going to solve all your problems. As we scaled from 6 to 30 developers, pursued innovative new directions for our product, and repositioned our offering and pricing, new pain kept emerging. Continuous improvement really is a game of whack-a-mole, and there’s no final state.
Despite this, the simple act of sitting together, otherwise known as collaboration, caused a chain reaction of beneficial changes.
When you’re sitting with someone working on the same problem, all the departmental otherness evaporates. You’re just two humans trying to make things work.
Instead of holding developers accountable for feature throughput and the ops team for stability, we shared a combined goal of high throughput and high stability in software delivery.
That removed the goal conflict and encouraged us to share and solve common problems together. This also works when you repeat the alignment exercise with other areas, like compliance and finance.
The problem with our logging strategy was immediately apparent when one of the people generating the logs had to wade through them. This is a powerful motivator for change.
Identifying unresolved pain paths and closing the pain loop isn’t a form of punishment; it’s a moment of realization. It’s the reason we should all use the software we build: it highlights the unresolved pain paths we’re burdening our users with.
Pain loops are crucial to meaningful improvements in software delivery.
Great developers are experts at automating things. When you expose this skill set to repetitive work, a developer’s instinct is to eliminate the toil.
For the ops team, the step-by-step deployment checklist was just part of doing business. They were so familiar with the process that it became invisible.
When we reduced the toil, the ops team was definitely happier, even though we hadn’t solved all the rough edges yet.
The fully-formed ideas didn’t arrive immediately. The rough shapes were polished over time into a set of repeatable and connected DevOps habits.
The three Fs, incident causation principles, alerting strategy, and monitor selection guidelines graduated into deliberate approaches long after this story.
I developed an approach to software delivery improvement that used these ideas to address trust issues between developers and the business. By reducing negative signals caused by failed deployments and escaped bugs, we increased trust in the development team, enhanced their reputation, and increased their autonomy.
We combined these practices with Octopus Deploy for deployment and runbook automation, plus an observability platform, which meant the team spotted problems before users did. When there was a problem, it was trivial to fix, and the new version could be rolled out in no time.
Unlike in the original organization, where we increased collaboration between separate teams, here we created fully cross-functional teams that worked together all the time. Every skill required to deliver and operate the software was embedded, minimizing dependencies and the risk of silos, tickets, and bureaucracy.
These cross-functional teams also proved to be the best way to level up team members.
You can’t work with a database whizz for long before you start thinking about query performance, maintenance plans, and normalization. You build better software when you develop these skills. You can’t work with an infrastructure expert without learning about failovers, networking, and zero-downtime deployments. You build better software when you develop these skills, too.
When people say they can’t hire these highly skilled developers, they miss the crucial point. A team designed in this cross-functional style takes new team members and upgrades them into these impossible-to-find unicorns. You may start as a backend developer, a database administrator, or a test analyst, but you grow into a generalizing specialist with many new skills.
Creating these unicorn portals is the most valuable skill a development manager can bring to an organization. You need to hire to fill gaps and foster an environment where skills transfer fluidly throughout the team.
What became a sophisticated and repeatable process for team transformation could be traced back to that simple act of sitting together. It was a small, easy change that led to increased empathy and understanding, and then a whole set of improvements.
Staring at that rapid stream of logs was the pivot point that led to the healthiest and most human approach to DevOps.
We didn’t have the research to confirm it back then, but deployment automation, shared goals, observability, small batches, and Continuous Delivery are all linked to better outcomes for the people, teams, and organization. Everybody wins when you do DevOps right.
Happy deployments!