Every container image you pull from registry.k8s.io got there throughkpromo, the Kubernetes image
promoter. It copies images from staging registries to
production, signs them with cosign, replicates
signatures across more than 20 regional mirrors, and generatesSLSA provenance attestations. If this tool breaks, no
Kubernetes release ships. Over the past few weeks, we rewrote its core from
scratch, deleted 20% of the codebase, made it dramatically faster, and
nobody noticed. That was the whole point.
A bit of history
The image promoter started in late 2018 as an internal Google project byLinus Arver. The goal was simple: replace the
manual, Googler-gated process of copying container images into k8s.gcr.io with
a community-owned, GitOps-based workflow. Push to a staging registry, open a PR
with a YAML manifest, get it reviewed and merged, and automation handles the
rest. KEP-1734
formalized this proposal.
In early 2019, the code moved to kubernetes-sigs/k8s-container-image-promoter
and grew quickly. Over the next few years,Stephen Augustus consolidated multiple tools(cip, gh2gcs, krel promote-images, promobot-files) into a single CLI
called kpromo. The repository was renamed topromo-tools.Adolfo Garcia Veytia (Puerco) added cosign signing
and SBOM support. Tyler Ferrara built
vulnerability scanning. Carlos Panato kept the project in a healthy and
releasable state. 42 contributors made about 3,500 commits across more than 60 releases.
It worked. But by 2025 the codebase carried the weight of seven years of incremental additions from multiple SIGs and subprojects. The READMEsaid it plainly:you will see duplicated code, multiple techniques for accomplishing the same thing, and several TODOs.
The problems we needed to solve
Production promotion jobs for Kubernetes core images regularly took over 30 minutes and frequently failed with rate limit errors. The core promotion logic had grown into a monolith that washard to extend and difficult to test, making new features like provenance or vulnerability scanning painful to add.
On the SIG Release roadmap,two work items had been sitting for a while: "Rewrite artifact promoter" and"Make artifact validation more robust". We had discussed these at SIG Release meetings and KubeCons, and the open research spikes onproject board #171 captured eight questions that needed answers before we could move forward.
One issue to answer them all
In February 2026, we opened issue #1701 ("Rewrite artifact promoter pipeline") and answered all eight spikes in a single tracking issue. The rewrite was deliberately phased so that each step could be reviewed, merged, and validated independently. Here is what we did:
Phase 1: Rate Limiting (#1702).Rewrote rate limiting to properly throttle all registry operations with adaptive backoff.
Phase 2: Interfaces (#1704).Put registry and auth operations behind clean interfaces so they can be swapped out and tested independently.
Phase 3: Pipeline Engine (#1705).Built a pipeline engine that runs promotion as a sequence of distinct phases instead of one large function.
Phase 4: Provenance (#1706).Added SLSA provenance verification for staging images.
Phase 5: Scanner and SBOMs (#1709).Added vulnerability scanning and SBOM support. Flipped the default to the new pipeline engine. At this point we cutv4.2.0 and let it soak in production before continuing.
Phase 6: Split Signing from Replication (#1713).Separated image signing from signature replication into their own pipeline phases, eliminating the rate limit contention that caused most production failures.
Phase 7: Remove Legacy Pipeline (#1712).Deleted the old code path entirely.
Phase 8: Remove Legacy Dependencies (#1716).Deleted the audit subsystem, deprecated tools, and e2e test infrastructure.
Phase 9: Delete the Monolith (#1718).Removed the old monolithic core and its supporting packages. Thousands of lines deleted across phases 7 through 9.
Each phase shipped independently.v4.3.0 followed the next day with the legacy code fully removed.
With the new architecture in place, a series of follow-up improvements landed:parallelized registry reads(#1736),retry logic for all network operations(#1742),per-request timeouts to prevent pipeline hangs(#1763),HTTP connection reuse(#1759),local registry integration tests(#1746),the removal of deprecated credential file support(#1758),a rework of attestation handling to use cosign's OCI APIs and the removal of deprecated SBOM support(#1764),and a dedicated promotion record predicate type registered with thein-toto attestation framework (#1767).These would have been much harder to land without the clean separation the rewrite provided.v4.4.0 shipped all of these improvements and enabled provenance generation and verification by default.
The new pipeline
The promotion pipeline now has seven clearly separated phases:
| Phase | What it does |
|---|---|
| Setup | Validate options, prewarm TUF cache. |
| Plan | Parse manifests, read registries, compute which images need promotion. |
| Provenance | Verify SLSA attestations on staging images. |
| Validate | Check cosign signatures, exit here for dry runs. |
| Promote | Copy images server-side, preserving digests. |
| Sign | Sign promoted images with keyless cosign. |
| Attest | Generate promotion provenance attestations using a dedicated in-toto predicate type. |
Phases run sequentially, so each one gets exclusive access to the full rate limit budget. No more contention. Signature replication to mirror registries is no longer part of this pipeline and runs as adedicated periodic Prow job instead.
Making it fast
With the architecture in place, we turned to performance.
Parallel registry reads (#1736):The plan phase reads 1,350 registries. We parallelized this and the plan phase dropped from about 20 minutes to about 2 minutes.
Two-phase tag listing (#1761):Instead of checking all 46,000 image groups across more than 20 mirrors, we first check only the source repositories. About 57% of images have no signatures at all because they were promoted before signing was enabled. We skip those entirely,cutting API calls roughly in half.
Source check before replication (#1727):Before iterating all mirrors for a given image, we check if the signature exists on the primary registry first. In steady state where most signatures are already replicated, this reduced the work from about 17 hours to about 15 minutes.
Per-request timeouts (#1763):We observed intermittent hangs where a stalled connection blocked the pipeline for over 9 hours. Every network operation now has its own timeout and transient failures are retried automatically.
Connection reuse (#1759):We started reusing HTTP connections and auth state across operations, eliminating redundant token negotiations. This closed along-standing request from 2023.
By the numbers
Here is what the rewrite looks like in aggregate.
- Over 40 PRs merged, 3 releases shipped (v4.2.0, v4.3.0, v4.4.0)
- Over 10,000 lines added and over 16,000 lines deleted, a net reduction of about 5,000 lines (20% smaller codebase)
- Performance drastically improved across the board
- Robustness improved with retry logic, per-request timeouts, and adaptive rate limiting
- 19 long-standing issues closed
The codebase shrank by a fifth while gaining provenance attestations, a pipeline engine, vulnerability scanning integration, parallelized operations, retry logic, integration tests against local registries, and a standalone signature replication mode.
No user-facing changes
This was a hard requirement. The kpromo cip command accepts the same flags and
reads the same YAML manifests. Thepost-k8sio-image-promo
Prow job continued working throughout. The promotion manifests inkubernetes/k8s.io did not change. Nobody
had to update their workflows or configuration.
We caught two regressions early in production. One (#1731)caused a registry key mismatch that made every image appear as "lost" so that nothing was promoted. Another (#1733)set the default thread count to zero, blocking all goroutines. Both were fixed within hours. The phased release strategy (v4.2.0 with the new engine, v4.3.0 with legacy code removed) gave us a clear rollback path that we fortunately never needed.
What comes next
Signature replication across all mirror registries remains the most expensive
part of the promotion cycle. Issue #1762
proposes eliminating it entirely by havingarcheio (the registry.k8s.io
redirect service) route signature tag requests to a single canonical upstream
instead of per-region backends. Another option would be to move signing closer
to the registry infrastructure itself. Both approaches need further discussion
with the SIG Release and infrastructure teams, but either one would remove
thousands of API calls per promotion cycle and simplify the codebase even
further.
Thank you
This project has been a community effort spanning seven years. Thank you toLinus,Stephen,Adolfo,Carlos,Ben,Marko,Lauri,Tyler,Arnaud, and many others who contributed code, reviews, and planning over the years. The SIG Release and Release Engineering communities provided the context, the discussions, and the patience for a rewrite of infrastructure that every Kubernetes release depends on.
If you want to get involved, join us in#release-management on the
Kubernetes Slack or check out therepository.


