Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

The Invisible Rewrite: Modernizing the Kubernetes Image Promoter


Every container image you pull from registry.k8s.io got there through kpromo, the Kubernetes image promoter. It copies images from staging registries to production, signs them with cosign, replicates signatures across more than 20 regional mirrors, and generates SLSA provenance attestations. If this tool breaks, no Kubernetes release ships. Over the past few weeks, we rewrote its core from scratch, deleted 20% of the codebase, made it dramatically faster, and nobody noticed. That was the whole point.

A bit of history

The image promoter started in late 2018 as an internal Google project by Linus Arver. The goal was simple: replace the manual, Googler-gated process of copying container images into k8s.gcr.io with a community-owned, GitOps-based workflow. Push to a staging registry, open a PR with a YAML manifest, get it reviewed and merged, and automation handles the rest. KEP-1734 formalized this proposal.

In early 2019, the code moved to kubernetes-sigs/k8s-container-image-promoter and grew quickly. Over the next few years, Stephen Augustus consolidated multiple tools (cip, gh2gcs, krel promote-images, promobot-files) into a single CLI called kpromo. The repository was renamed to promo-tools. Adolfo Garcia Veytia (Puerco) added cosign signing and SBOM support. Tyler Ferrara built vulnerability scanning. Carlos Panato kept the project in a healthy and releasable state. 42 contributors made about 3,500 commits across more than 60 releases.

It worked. But by 2025 the codebase carried the weight of seven years of incremental additions from multiple SIGs and subprojects. The README said it plainly: you will see duplicated code, multiple techniques for accomplishing the same thing, and several TODOs.

The problems we needed to solve

Production promotion jobs for Kubernetes core images regularly took over 30 minutes and frequently failed with rate limit errors. The core promotion logic had grown into a monolith that was hard to extend and difficult to test, making new features like provenance or vulnerability scanning painful to add.

On the SIG Release roadmap, two work items had been sitting for a while: "Rewrite artifact promoter" and "Make artifact validation more robust". We had discussed these at SIG Release meetings and KubeCons, and the open research spikes on project board #171 captured eight questions that needed answers before we could move forward.

One issue to answer them all

In February 2026, we opened issue #1701 ("Rewrite artifact promoter pipeline") and answered all eight spikes in a single tracking issue. The rewrite was deliberately phased so that each step could be reviewed, merged, and validated independently. Here is what we did:

Phase 1: Rate Limiting (#1702). Rewrote rate limiting to properly throttle all registry operations with adaptive backoff.

Phase 2: Interfaces (#1704). Put registry and auth operations behind clean interfaces so they can be swapped out and tested independently.

Phase 3: Pipeline Engine (#1705). Built a pipeline engine that runs promotion as a sequence of distinct phases instead of one large function.

Phase 4: Provenance (#1706). Added SLSA provenance verification for staging images.

Phase 5: Scanner and SBOMs (#1709). Added vulnerability scanning and SBOM support. Flipped the default to the new pipeline engine. At this point we cut v4.2.0 and let it soak in production before continuing.

Phase 6: Split Signing from Replication (#1713). Separated image signing from signature replication into their own pipeline phases, eliminating the rate limit contention that caused most production failures.

Phase 7: Remove Legacy Pipeline (#1712). Deleted the old code path entirely.

Phase 8: Remove Legacy Dependencies (#1716). Deleted the audit subsystem, deprecated tools, and e2e test infrastructure.

Phase 9: Delete the Monolith (#1718). Removed the old monolithic core and its supporting packages. Thousands of lines deleted across phases 7 through 9.

Each phase shipped independently. v4.3.0 followed the next day with the legacy code fully removed.

With the new architecture in place, a series of follow-up improvements landed: parallelized registry reads (#1736), retry logic for all network operations (#1742), per-request timeouts to prevent pipeline hangs (#1763), HTTP connection reuse (#1759), local registry integration tests (#1746), the removal of deprecated credential file support (#1758), a rework of attestation handling to use cosign's OCI APIs and the removal of deprecated SBOM support (#1764), and a dedicated promotion record predicate type registered with the in-toto attestation framework (#1767). These would have been much harder to land without the clean separation the rewrite provided. v4.4.0 shipped all of these improvements and enabled provenance generation and verification by default.

The new pipeline

The promotion pipeline now has seven clearly separated phases:

Setup --> Plan --> Provenance --> Validate --> Promote --> Sign --> Attest

  • Setup: Validate options, prewarm the TUF cache.
  • Plan: Parse manifests, read registries, compute which images need promotion.
  • Provenance: Verify SLSA attestations on staging images.
  • Validate: Check cosign signatures; exit here for dry runs.
  • Promote: Copy images server-side, preserving digests.
  • Sign: Sign promoted images with keyless cosign.
  • Attest: Generate promotion provenance attestations using a dedicated in-toto predicate type.

Phases run sequentially, so each one gets exclusive access to the full rate limit budget. No more contention. Signature replication to mirror registries is no longer part of this pipeline and runs as a dedicated periodic Prow job instead.

Making it fast

With the architecture in place, we turned to performance.

Parallel registry reads (#1736):The plan phase reads 1,350 registries. We parallelized this and the plan phase dropped from about 20 minutes to about 2 minutes.

Two-phase tag listing (#1761): Instead of checking all 46,000 image groups across more than 20 mirrors, we first check only the source repositories. About 57% of images have no signatures at all because they were promoted before signing was enabled. We skip those entirely, cutting API calls roughly in half.

Source check before replication (#1727):Before iterating all mirrors for a given image, we check if the signature exists on the primary registry first. In steady state where most signatures are already replicated, this reduced the work from about 17 hours to about 15 minutes.

Per-request timeouts (#1763):We observed intermittent hangs where a stalled connection blocked the pipeline for over 9 hours. Every network operation now has its own timeout and transient failures are retried automatically.

Connection reuse (#1759): We started reusing HTTP connections and auth state across operations, eliminating redundant token negotiations. This closed a long-standing request from 2023.

By the numbers

Here is what the rewrite looks like in aggregate.

  • Over 40 PRs merged, 3 releases shipped (v4.2.0, v4.3.0, v4.4.0)
  • Over 10,000 lines added and over 16,000 lines deleted, a net reduction of about 5,000 lines (20% smaller codebase)
  • Performance drastically improved across the board
  • Robustness improved with retry logic, per-request timeouts, and adaptive rate limiting
  • 19 long-standing issues closed

The codebase shrank by a fifth while gaining provenance attestations, a pipeline engine, vulnerability scanning integration, parallelized operations, retry logic, integration tests against local registries, and a standalone signature replication mode.

No user-facing changes

This was a hard requirement. The kpromo cip command accepts the same flags and reads the same YAML manifests. The post-k8sio-image-promo Prow job continued working throughout. The promotion manifests in kubernetes/k8s.io did not change. Nobody had to update their workflows or configuration.

We caught two regressions early in production. One (#1731) caused a registry key mismatch that made every image appear as "lost" so that nothing was promoted. Another (#1733) set the default thread count to zero, blocking all goroutines. Both were fixed within hours. The phased release strategy (v4.2.0 with the new engine, v4.3.0 with legacy code removed) gave us a clear rollback path that we fortunately never needed.

What comes next

Signature replication across all mirror registries remains the most expensive part of the promotion cycle. Issue #1762 proposes eliminating it entirely by having archeio (the registry.k8s.io redirect service) route signature tag requests to a single canonical upstream instead of per-region backends. Another option would be to move signing closer to the registry infrastructure itself. Both approaches need further discussion with the SIG Release and infrastructure teams, but either one would remove thousands of API calls per promotion cycle and simplify the codebase even further.

Thank you

This project has been a community effort spanning seven years. Thank you to Linus, Stephen, Adolfo, Carlos, Ben, Marko, Lauri, Tyler, Arnaud, and many others who contributed code, reviews, and planning over the years. The SIG Release and Release Engineering communities provided the context, the discussions, and the patience for a rewrite of infrastructure that every Kubernetes release depends on.

If you want to get involved, join us in #release-management on the Kubernetes Slack or check out the repository.

Read the whole story
alvinashcraft
5 hours ago
Pennsylvania, USA

Scaling Jenkins: Central Controller vs Instance Sprawl


This article was brought to you by Kumar Harsh, draft.dev.

Jenkins has powered CI/CD pipelines for more than a decade. Many teams start with a single Jenkins controller and a handful of build jobs. At that stage, Jenkins feels simple and flexible.

The problem appears later.

As organizations grow, the number of pipelines increases rapidly. Teams add more agents, install more plugins, and create more complex workflows. Eventually the Jenkins controller becomes the bottleneck that limits build throughput and operational stability.

This article explains why scaling Jenkins becomes difficult, what architectural patterns teams use to manage growth, and how modern CI/CD platforms such as TeamCity approach the same challenge differently.

What does scaling Jenkins actually mean?

Scaling CI/CD is not just about the number of builds a system can run.

At enterprise scale, CI systems must handle:

  • hundreds or thousands of concurrent builds
  • multiple repositories and programming languages
  • complex artifact dependencies
  • short feedback cycles for developers
  • high reliability and compliance requirements

A CI platform that worked well for a few teams must now support an entire engineering organization.

Architecture becomes a critical factor.

Why Jenkins struggles at scale

Jenkins was originally designed around a single controller architecture. The controller performs several responsibilities:

  • scheduling build jobs
  • managing pipeline configuration
  • coordinating build agents
  • storing metadata and artifacts
  • serving the web interface
  • executing plugin logic

When the number of builds increases, these responsibilities compete for the same CPU, memory, and I/O resources.

Even if you add more agents, the controller itself may become the bottleneck.

Common symptoms include:

  • growing build queues
  • slow UI performance
  • controller instability
  • frequent restarts during upgrades

At small scale these problems are manageable. At enterprise scale they become operational risks.

Two ways teams try to scale Jenkins

Organizations usually attempt one of two strategies.

1. A large centralized controller

In this model, one powerful Jenkins controller manages all pipelines across the organization.

Advantages:

  • centralized governance
  • easier visibility across pipelines
  • consistent configuration

Challenges:

  • controller becomes a single point of failure
  • upgrades affect all builds
  • plugin conflicts can impact the entire system

2. Multiple Jenkins controllers

Many organizations split workloads across several controllers.

Each controller may support:

  • a specific product team
  • a set of repositories
  • a particular environment

Advantages:

  • reduced load per controller
  • partial isolation between teams

Challenges:

  • configuration drift
  • inconsistent plugin versions
  • duplicated maintenance work
  • fragmented governance

Over time this approach often leads to Jenkins instance sprawl.

Instead of one complex controller, organizations manage dozens of smaller Jenkins environments.

The plugin ecosystem at scale

The Jenkins plugin ecosystem is one of the platform’s biggest strengths. Integrations with version control systems, cloud platforms, and developer tools are usually implemented through plugins.

However, plugin management becomes significantly more complex as systems grow.

Common problems include:

  • dependency chains between plugins
  • incompatible plugin versions across controllers
  • controller restarts required for upgrades
  • abandoned or unmaintained plugins
  • security vulnerabilities in plugin code

A single plugin upgrade may trigger additional dependency updates. Administrators often need to test plugin combinations carefully before deploying them in production.

At enterprise scale, plugin management becomes an operational discipline of its own.

Operational costs of running Jenkins at scale

Infrastructure costs are only part of the equation.

Organizations running large Jenkins installations must also manage:

  • plugin lifecycle management
  • controller upgrades
  • security patching
  • access control governance
  • pipeline configuration maintenance

Downtime can affect hundreds of developers simultaneously. When builds stop, releases are delayed and engineering productivity drops.

In regulated environments, compliance requirements add another layer of complexity. Administrators must track plugin usage, credential access, and audit logs across multiple Jenkins instances.

How modern CI platforms approach scalability

Modern CI/CD platforms are increasingly designed with scalability as a core architectural principle.

Instead of relying heavily on plugins and controller customization, they focus on:

  • built-in integrations
  • predictable upgrade processes
  • clearer separation between orchestration and execution

This approach reduces operational overhead and improves system stability as organizations grow.

How TeamCity addresses CI/CD scaling

TeamCity uses a server-agent architecture that separates orchestration from build execution.

Key capabilities include:

  • native integrations for common tools
  • build chains for managing pipeline dependencies
  • built-in artifact management
  • configuration as code using Kotlin DSL
  • centralized governance and visibility

Because many integrations are built into the platform, organizations rely less on third-party extensions. This reduces dependency management and simplifies upgrades.

At larger scale, fewer moving parts can translate into more predictable CI/CD operations.

💡 Read also: Centralized Power: How TeamCity’s Architecture Solves Jenkins’ Scaling Problem

Jenkins vs TeamCity at scale

Capability            Jenkins                          TeamCity
Core architecture     Single controller with agents    Server and agents
Integrations          Plugin ecosystem                 Mostly built-in
Upgrade complexity    Plugin dependency management     Integrated release cycle
Governance            Varies across controllers        Centralized
Operational overhead  Higher at large scale            Typically lower

Both platforms can support enterprise CI/CD, but they approach scalability differently.

Evaluating CI/CD platforms for large organizations

When choosing a CI/CD platform, organizations should evaluate several factors:

Build scale
How many builds run daily and how quickly developers need feedback.

Governance requirements
Whether compliance or security standards require centralized visibility and control.

Operational complexity
How much engineering time can be dedicated to maintaining CI infrastructure.

Integration needs
Whether teams rely on highly customized integrations or prefer built-in capabilities.

Running a proof-of-concept migration with a small project is often the best way to compare platforms.

Conclusion

Jenkins remains one of the most widely used CI/CD tools in the industry. Its flexibility and plugin ecosystem helped it become the backbone of many engineering organizations.

However, scaling Jenkins often requires significant operational investment. Organizations must manage controllers, plugin dependencies, and infrastructure as their CI environments grow.

Platforms like TeamCity take a different architectural approach. By emphasizing built-in capabilities and centralized management, they aim to reduce the operational burden of running CI/CD at enterprise scale.

For teams reassessing their CI infrastructure, the key question is simple:

Do you want to engineer your CI platform, or focus on engineering your product?


🔔 New app Alert! Trowser


No one asked for it, and now it’s here! You can now pin a website in a small pane to your Windows System Tray! Why would you want to do this?? Well, many times I will pick up my phone for a quick scroll while I wait for a long-running task to fail. So why not embed that experience in the system tray?

I see this as a tool for anyone thinking of making a small app version of a website. Maybe it is a tool or feed or reference; now it can be a click away!

As always, Trowser is open source on GitHub. To install, just download the latest release from GitHub.

Please let me know what you think!

Joe






AI Snake-Oil Salesmen: How to Spot Them


Outside of my work at Redgate, I serve as an AI Advisor to organizations and individual investors. It’s a role I genuinely enjoy and it gives me the opportunity to help drive clear direction around AI projects, policies, and governance. But one increasingly frustrating pattern has emerged from this work over the past year, and I think it’s worth talking about: the rise of what I’ve started calling the AI snake-oil salesman.

Whenever a technology disrupts an industry as dramatically as AI has disrupted tech, the profit opportunity attracts not just innovators but also opportunists, con artists, and even charlatans. Fake products all dressed up in impressive language and AI tech. I’ve encountered them on advisory projects and in investment reviews, and I’ve gotten good at recognizing the patterns. These red flags aren’t secret and I truly believe anyone can learn to spot them, so I figured I’d save the technical community some due diligence and share what I’ve collected.

The Red Flags

  1. They have a solution without a problem. Or worse, they’re actively shopping for a problem their product can pretend to solve.
  2. Their website contradicts their pitch. They’ll tell you AI is at the core of everything they do, but their web presence was built with basic traditional tools and shows no evidence of the sophisticated infrastructure they’re describing. The contradiction is the tell, with not a line of React code in sight.
  3. All their content reads like it was written by AI. Lots of buzz words, impressive-sounding phrases, and absolutely nothing substantive about how their AI actually functions or what’s under the hood.
  4. In person, they speak in magic, not mechanics. Whether they’re talking about RAG, machine learning, or generative AI, it’s all buzz words and hand-waving. Ask them about workflows, architecture, security, or frameworks and watch the conversation go sideways.
  5. They ask what your problems are before telling you what their product does. This is a classic manipulation pattern. They’re reverse-engineering their pitch in real time to match whatever pain you just described.
  6. They target non-technical investors and unpaid technical talent. They’re looking for people with money who don’t know enough to ask hard questions, and developers willing to work for equity in something that doesn’t yet exist.
  7. They hide behind intellectual property. There’s a difference between protecting trade secrets and being unable to explain what you’re building. If someone can’t describe their product’s purpose clearly, without giving anything proprietary away, then that’s not caution, but a gap.

What Legitimate AI Work Actually Looks Like

For contrast: credible AI practitioners can explain their architecture in plain language. They know what problem they’re solving and why AI is the right tool for it and not just the trendy one. They welcome technical scrutiny rather than deflecting it. And they don’t need your investment before they can tell you what they’re building.

A Note on Hope and Reality

Every time I encounter one of these, I genuinely hope I’m wrong. I’d love to hear a year later that they found their footing and built something real. But so far, without exception, every AI snake-oil salesman I’ve flagged is still searching for traction, or worse, still searching for a problem, often well over a year later.

Do your due diligence. Ask hard questions. And if someone can’t answer them, that’s your answer.


Daily Reading List – March 16, 2026 (#742)


I waded a bit into the “MCP or not” debate by running some experiments to see how much MCP costs my custom-built agent. If you complement it with agent skills, the answer is “not too much.”

[blog] Become Builders, Not Coders. This is more of a directive versus suggestion at this point. What has to change and how do you do it? Here’s a post with advice.

[blog] Balancing AI tensions: Moving from AI adoption to effective SDLC use. The DORA team used some fresh research to understand how teams are using AI, where they get value, and where they stumble. The suggestions are very good.

[blog] Why context is the missing link in AI data security. These Google Cloud tools are really impressive at identifying and masking sensitive info. Now, with better context classifiers.

[blog] Run Karpathy’s autoresearch on a Google serverless stack for $2/hour. With the exception of doing massive training jobs, most of us can try out nearly anything with AI for a reasonable cost. I like Karl’s example here.

[article] Why the World Still Runs on SAP. Big ERP, CRM, and service management platforms aren’t going anywhere. But it’s going to get easier to set them up, use them, and operate them.

[article] You’re Not Paid to Write Code. I recognize that I’ve shared a lot of posts on this topic. But it’s important. We’re not just adding tools to the mix; we’re changing identities and habits. That takes repetitive reminders and motivation.

[blog] When to use WebMCP and MCP. Pay attention to WebMCP. It might turn out to be something fairly important.

[blog] BigQuery Studio is more useful than ever, with enhanced Gemini assistant. I like this surface, and it’s made data analytics so much simpler for experts and novices.

Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:




How I Set Up Claude Code to Run My Entire Dev Workflow


I've been using Claude Code as my primary development tool for the past few months. Not as a fancy autocomplete — as an actual workflow engine that handles deployments, testing, code review, and multi-step tasks autonomously.

Here's the setup that made it click.

The problem with default Claude Code

Out of the box, Claude Code is a capable terminal assistant. But it starts fresh every session. It doesn't know your project conventions. It doesn't remember that you prefer named exports, or that your deploy script needs a specific flag, or that the auth module has a known race condition.

Every session, you're re-explaining context. That's the gap.

The configuration stack that changed everything

1. CLAUDE.md — project context in one file

This is the single highest-leverage configuration. Create a CLAUDE.md in your project root with your tech stack, conventions, file structure, and build commands. Claude reads it automatically at session start.

# Project: My App

## Stack
- Next.js 14 (App Router), TypeScript strict, Tailwind
- Drizzle ORM with Postgres, deployed on Vercel

## Commands
- pnpm dev / pnpm build / pnpm test

## Conventions
- Named exports only
- Repository pattern for DB queries
- Never commit .env files

10 minutes to write. Immediately makes every session more productive.

2. Persistent memory

Create .claude/memory.md to maintain context across sessions. Track what you're working on, decisions already made, and known issues. Keep it under 100 lines — memory should be compact, not a changelog.
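A compact memory file might look something like this; the contents below are purely illustrative, not a prescribed format:

```markdown
# Session memory

## Current focus
- Migrating the payments module to the repository pattern

## Decisions made
- Named exports everywhere (matches CLAUDE.md)
- Drizzle migrations run via pnpm, never by hand

## Known issues
- Auth module: intermittent race condition on token refresh
```

Keeping it to decisions and open issues, rather than a running log, is what keeps it under the 100-line budget.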


3. Custom skills

Skills are reusable procedures in .claude/skills/ that Claude follows for repeatable tasks. My most-used skills:

  • Deploy — runs pre-checks, builds, deploys, verifies
  • DB migration — generates migration, tests on staging, applies to prod
  • PR review — checks security, runs tests, reviews against team standards
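A skill is essentially a written-down procedure. As a hypothetical example, a deploy skill might be a short checklist like this (the exact file layout is whatever your Claude Code version expects):

```markdown
# Skill: deploy

1. Run pnpm lint and pnpm test; stop on any failure.
2. Run pnpm build.
3. Deploy to production with the project's deploy command.
4. Verify the health endpoint responds before reporting success.
```

Because the steps live in the repo, every session runs the same deploy procedure instead of improvising one.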

4. Hooks for guardrails

Hooks in .claude/settings.json enforce rules automatically. My essentials:

  • Pre-commit: lint + typecheck before every commit
  • Pre-write: block files containing hardcoded secrets
  • Post-tool: log every file modification for audit
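As a rough illustration, a guardrail in .claude/settings.json might take a shape like the following. Treat the exact schema as an assumption here and check the current Claude Code hooks documentation for the authoritative format:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "./scripts/lint-and-typecheck.sh" }
        ]
      }
    ]
  }
}
```

The point is that hooks run deterministically outside the model, so a convention becomes something Claude cannot forget or skip.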

The result

With this setup, I can say "add Stripe webhook handling" and walk away. Claude Code reads my codebase, understands the architecture (from CLAUDE.md), follows my conventions (from memory), uses the right deploy process (from skills), and can't commit bad code (hooks).

It's not magic — it's configuration compounding over time.

Want the shortcut?

Building all this from scratch takes days. I found Claudify — a pre-built operating system for Claude Code with 1,700+ skills, persistent memory, and automated quality gates. It installs in one command and gives you the full configuration stack immediately.

Whether you build your own setup or use something pre-built, the key insight is the same: Claude Code's value isn't in the model — it's in the configuration layer around it.

What's your Claude Code setup look like? Drop your CLAUDE.md tips in the comments.
