As syntax becomes cheap and abundant, architectural control becomes the scarce resource. Effective governance starts upstream, where intent, constraints, and threat models shape the agent’s working context before generation begins. The goal isn’t better prompting but build-time boundaries that prevent structurally invalid code from entering the system.
The Frankenstein factories
The dark factories (as Dan Shapiro calls them) are running. Tokens fly through trycycles, features ship overnight, and codebases are ported before breakfast. The velocity is real. And comprehension debt (a term coined by Addy Osmani) is compounding in silence behind it.
What this era is producing, at scale, deserves its own name: Frankenstein factories. Not a critique of any single approach but a description of a structural condition—generation engines so effective at producing working syntax that they have industrialized the creation of architecturally ungovernable systems. The creature walks out of the laboratory impressive, functional, and alive on delivery day.
The crisis arrives the day someone must govern it. To govern a system means to hold it accountable to its design boundaries—the ability to look at it and reliably say why it works, what is permitted to touch what, and to categorically prevent forbidden state changes before they happen. Victor’s catastrophe was not the act of creation but the absent governing frame.
For prototyping or shipping features fast, unconstrained generation is a powerful tool. It optimizes for velocity, and it delivers. But for enterprise payment systems, insurance underwriting engines, logistics orchestrators, and regulated platforms, the question is not “Does the code ship?” but “Who is liable when it does the wrong thing?” Here, automating the word “YES” to every feature request does not solve the problem. It industrializes it.
Consider a standard Jira ticket: “Add an email notification after a successful payment.”
A junior developer might attempt to wedge the email-sending logic directly into the PaymentProcessor class. A senior architect catches this in code review: “No. Fire a PaymentSuccessEvent to the message bus.” That human friction—the architectural “No”—keeps the system maintainable.
Unconstrained AI agents lack this assertiveness. By default, they are the ultimate yes-men.
Hand that same ticket to a standard coding agent and it will not argue about bounded contexts. It will burn tokens until it produces 300 lines of syntactically perfect code, import an SMTP library directly into the core of your billing domain, and submit a pull request. The tests will pass; conventional feature tests make no assertion about bounded contexts. The CI pipeline will go green. And structurally, the system is now a disaster.
This happens not through malice but because of how agentic loops are built. Without explicit architectural constraints, the system’s emergent behavior is to fulfill immediate user intent. The agent is orchestrated to ship the feature, not to defend the architecture. Comprehension debt is the structural consequence: AI generates syntax faster than human beings can read or govern it. Expecting a probabilistic model to enforce structural integrity on its own is a category error. Without a governing frame, the agent will always take the path of least resistance to a “YES.”
You cannot fix code overproduction by hiring more people to read it nor by running the generation loop faster. The only scalable answer is to build a concrete riverbed before you turn on the water.
If the current era automates the word “YES,” we should automate the word “NO.”
Securing the runtime environment prevents the monster from escaping. But to prevent it from being built in the first place, we need to step back into the IDE and the CI/CD pipeline. We need to govern generation.
The great softening: Shifting risk from build time to runtime
Compilers never guaranteed correct software. You could write catastrophic logically broken systems in C, Java, or any other compiled language. But compilers served a crucial engineering purpose: They deterministically governed a specific layer of structural risk.
By enforcing hard execution constraints—syntax validity, type compatibility, linkage rules, and executable viability—the compiler acted as an automated boundary. It didn’t verify business intent, domain correctness, or architectural quality. What it did was eliminate an entire class of low-level structural failure before execution ever began.
That delegation of risk is one of the quiet triumphs of software engineering. Our discipline has always advanced by mechanizing one class of guarantees so humans can focus on the next layer of abstraction. We automated machine-level structural correctness so engineers could spend their cognitive energy on application logic. Later, we pushed more guarantees upward, into schemas, testing, static analysis, architectural patterns, and operational controls.
Over time, we also deliberately softened certain boundaries in exchange for speed. Dynamic languages, richer runtimes, reflection, and increasingly abstract frameworks all traded deterministic compile-time guarantees for developer velocity and flexibility. The newly exposed risk was absorbed elsewhere: runtime validation, automated testing, observability, and engineering discipline.
Today, with agentic AI, we are softening boundaries again, more radically than ever before.
Natural language has become a high-level control plane for software generation. Arbitrary text increasingly shapes executable behavior. And in that shift, we have blurred one of the oldest boundaries in computing: the separation between data and instructions.
Outside the model, that boundary still exists. Systems enforce permission scopes, schema contracts, sandboxing, and execution policies. But inside the inference context, those protections collapse into the same token stream.
System prompts, retrieved documents, user messages, tool outputs, and external content all flow through the same neural weights. There is no hard privilege boundary between instruction and input. Modern models may resist naive attacks like “Ignore previous instructions,” but they remain vulnerable to indirect injections disguised as legitimate operational context. A malicious instruction embedded in a customer email, a webpage, or a tool response is not processed as passive data. It can become behavioral influence.
Inside the context window, untrusted text can shape control flow. That is the real softening.
We are generating syntax at machine speed, but we have dissolved the structural gate that once constrained how systems were built. The result is a massive shift of risk from build time to runtime. Code that appears structurally sound during generation may violate architectural boundaries, introduce unsafe execution paths, or become behaviorally compromised the moment hostile context enters the loop.
The conclusion is straightforward: The fact that AI-generated code runs is no longer a meaningful proxy for system correctness.
Syntax is abundant. Execution is easy. Structural governance is what is missing.
We outsourced the writing of logic to machines, but we did not build a deterministic boundary that governs what those machines are allowed to generate.
If we want control back, we cannot rely on human code review at machine speed. We must rebuild the build-time gate.
From dependency bloat to tailor-made architecture
For decades, the industry’s default response to complexity was abstraction by accumulation: monolithic frameworks, sprawling dependency trees, and ever-thicker layers of indirection. Importing a 50-megabyte library to avoid repetitive boilerplate was a rational trade-off when developer time and cognitive bandwidth were the scarce resources. For AI agents, that trade-off changes.
This is not an argument against foundational infrastructure. Mature primitives—like SQLAlchemy in Python or Spring Boot in Java—remain essential precisely because their conventions are widely learned and predictable. The problem isn’t abstraction but opacity. When core business logic disappears behind proprietary decorators, internal frameworks, or custom orchestration layers, execution becomes a black box. An agent cannot safely reason about code it cannot trace. It needs direct visibility into causality: what changes state, what enforces invariants, and where responsibilities begin and end. Hidden flow degrades reasoning into guesswork; guesswork silently becomes architectural drift.
At the same time, AI drives the cost of procedural code toward zero. Boilerplate is no longer expensive. Clarity is. The design question shifts from “How much can we abstract away?” to “How much must remain explicit for safe reasoning?” The answer is tailor-made architecture: thin infrastructure, explicit domain logic, hard boundaries, and narrowly scoped components with visible contracts. The value is no longer in how much code you avoid writing but in how clearly the system declares its boundaries.
That same opacity also breaks verification. AI review can catch local defects, risky patterns, and implementation mistakes, but it remains blind to architectural drift and missing business intent unless those constraints are explicitly encoded. After all, if you ask a model to review code generated from the exact same vague Jira ticket, do you actually get verification, or do you just engineer a circular hallucination, where the AI politely revalidates its own blind spots?

The Context Compilation Pattern
The Context Compilation Pattern governs generation in the IDE and the CI/CD pipeline before a single syntactically plausible line ever reaches a human reviewer. If the Decision Intelligence Runtime (DIR) is the vault door that protects execution in production, context compilation is the blueprint that prevents the monster from being built in the lab.
This is not “prompt engineering,” which merely asks a probabilistic model for a better answer. What we need is build-time governance: two layers of defense assembled before the LLM inference is even triggered. The first is structured context injection (assembling the prompt from prioritized artifacts). The second is postgeneration static verification (deterministic AST checks that enforce rules no probabilistic model can override). The prompt structure biases generation toward compliant solutions; the static checks make declared, machine-verifiable boundary violations impossible to merge.
Deterministic build-time governance is not a return to formal software specification (like UML), nor is it merely “prompt engineering disguised as Markdown.” It’s a mechanical constraint on the generation space that makes explicitly declared boundary violations rejectable by design. Context compilation does not eliminate architectural review or replace engineering judgment. Instead, it ensures that the agent operates within a defined riverbed of allowed structural invariants.
Engineering evolves whenever implicit rules become explicit declarations. Application development is now crossing that boundary. The senior engineer’s new job is declarative boundary engineering: explicitly declaring what the system is absolutely forbidden from doing.
The failure is not in the frameworks. The failure is in the process: pointing an unconstrained AI agent at a codebase full of invisible magic and expecting a CI/CD pipeline designed for human-generated code to catch what goes wrong. The answer is to build a compiler for the agent’s context.
The Context Compilation Pattern is the staged pipeline that makes this concrete.

Step 1: The context artifacts
The most strategically valuable code in your repository may no longer live in src/. It lives in /context. The pipeline consumes versioned artifacts such as intent.md, boundaries.md, and threat-model.md, each authored by a specialist before a single line of code is generated. (Ownership and role responsibilities are covered in “Artifact-Bound Roles and Accountability” below.) What matters here is that these files are the inputs to the compiler: Without them, there’s nothing to compile.
To prevent cognitive overlap, their roles must be fiercely separated: boundaries.md declares structural invariants (e.g., dependency direction, allowed communication paths, and event emission), whereas threat-model.md models adversarial constraints as declarative abuse scenarios (e.g., prompt injection and secrets exfiltration) that must be mechanically blocked.
boundaries.md warrants a precise definition, because it anchors the entire build-time governance model. In practice, boundaries are typically defined at module or bounded-context granularity (e.g., /billing/* or /risk/*), not per class or per repository. They are implemented using hybrid artifacts: a natural language document designed to constrain the LLM, tightly paired with a deterministic rule for the CI runner.
Consider this concrete example of how an architectural boundary is explicitly declared and enforced:
1. boundaries.md (for the LLM context)
This Markdown file is injected into the agent’s prompt. It defines the vocabulary, architectural constraints, and allowed interactions.
Module: Billing
Ontology: Order, Invoice, PaymentEvent
Rule: Zero external network I/O is allowed in this domain. You must NEVER import requests or smtplib.
2. semgrep-rule.yml (for the CI/CD runner)
This static file goes to the CI pipeline to mechanize the boundary. It ensures the code check is fully deterministic.
rules:
# Block forbidden imports at the module boundary
- id: block-external-io-in-billing
patterns:
- pattern-either:
- pattern: import smtplib
- pattern: import requests
message: "Architecture Violation: External I/O is strictly forbidden in the billing domain."
severity: ERROR
languages: [python]
paths:
include: ["src/billing/**"]
# Domain layer must not talk to DB driver directly
- id: block-db-driver-in-domain
patterns:
- pattern-either:
- pattern: import sqlalchemy
- pattern: from sqlalchemy import ...
- pattern: import psycopg2
- pattern: from psycopg2 import ...
message: "Architecture Violation: Domain layer must use Repository abstraction, not database drivers directly."
severity: ERROR
languages: [python]
paths:
include:
- "src/billing/domain/**"
Crucially, these Semgrep/CI rules are human-authored (or human-reviewed) precommit artifacts. We don’t rely on an LLM to generate the security gates on the fly. The AI reads the Markdown to guide its generation; the CI runner executes the static YAML to enforce the boundary.
If these artifacts stay current, they actively govern the generated codebase. Stale or malformed context becomes context debt: The pipeline will enforce strictly whatever was declared, even if the declaration is wrong. Governance artifacts are production code. They require strict versioning, explicit ownership, and periodic review just like the executable logic they constrain. That’s why core artifacts like boundaries.md require rigorous peer review, not just casual updates.
Step 2: The context compiler
Dumping all Markdown files into the system prompt is sometimes acceptable for small projects and small artifacts. But as the codebase grows or the context window fills with too many competing constraints, models begin to suffer from “lost in the middle” degradation and silently ignore what matters most.
The term “context compiler” might sound like a magical enterprise heavy-lift, but the reality is entirely mundane. In its simplest form, it’s just a deterministic context assembly layer combined with a routing mechanism.
Instead of treating context as a flat pile of documents, the compiler assembles it into an ordered structure. Because different artifacts apply to different parts of the project, boundaries.md in the /billing module might enforce strict isolation, while the one in /frontend might be much more permissive.
In practice, the compiler may take one of these forms:
Manual selection: The developer simply points their IDE or agent to a structured set of Markdown files.
A mundane script: A basic Python or bash script that understands a directory structure. It concatenates the .md files to build the LLM’s system prompt and hands the .yml files directly to the CI runner.
Tool-mediated context protocols: Dedicated mechanisms (e.g., MCP) that allow the agent to query the workspace and dynamically assemble the required boundaries directly within the IDE, bypassing the need for manual script invocation.
Consider a practical directory structure:
/context
/global
coding-standards.md
/domain
/billing
boundaries.md
threat-model.md
semgrep-rule.yml
/risk
boundaries.md
threat-model.md
semgrep-rule.yml
/frontend
boundaries.md
threat-model.md
semgrep-rule.yml
When generating code for the billing module, the script reads /global and /billing. The compiler simply scopes the rules based on the directory, perfectly focusing the agent’s attention on the boundaries that matter while wiring the corresponding YAML rules for deterministic CI verification.
Step 3: Strict boundary hierarchy (resolving conflicts)
When faced with conflicting instructions, LLMs don’t throw a compilation error. They hallucinate a dangerous compromise. The compiler prevents this by enforcing a deterministic precedence of declared constraints before the prompt is assembled:
Threat model > Boundaries > Coding standards > Intent + acceptance criteria
Security and architectural boundaries unconditionally overrule feature delivery. This operates at two levels. At the prompt level (soft enforcement), constraint ordering biases generation toward compliant solutions. At the postgeneration level (hard enforcement), deterministic code checks parse the generated syntax, verify structural invariants, and instantly fail the build on violation.
“Resolution” in this context does not mean an LLM philosophically negotiating between two Markdown files. It means deterministic rejection via CI. If the intent.md asks to “email a receipt to the user,” but boundaries.md forbids external network calls in the billing module, an unconstrained AI might try to generate an SMTP call. The conflict is mechanically “resolved” when the CI pipeline runs a static rule (derived from semgrep-rule.yml) and instantly fails the build. The developer (context orchestrator) must then intervene and change the design to use an event bus instead. The hierarchy is enforced by deterministic code analysis, not LLM reasoning. A rejected build is not necessarily a rejected business need; it’s a signal that declared boundaries and intended capability must be reconciled explicitly before regeneration. (This mechanical rejection physically executes during the adversarial verification phase in step 5).
We do not use AI for this validation. We use existing, proven AST tools and code linters like Semgrep, Bandit, or CodeQL to enforce these boundaries in CI/CD.
However, we must be precise about what this governance actually achieves. Deterministic checks enforce invariants, not the architecture as a whole. You can statically enforce forbidden imports, forbidden outbound I/O, strict layering, and schema conformance. You cannot statically enforce domain semantics, aggregate ownership correctness, subtle coupling, or conceptual cohesion. Deterministic verification doesn’t prove architectural correctness. It proves compliance with explicitly declared structural invariants.
Step 4: Generation
Context as code matters only if generated syntax is verified against the same boundaries that shaped it. With a compiled, conflict-free context hierarchy, the developer agent generates code inside an isolated user space sandbox. In this fleeting fraction of a second, the agent inside the developer’s IDE consumes the narrowed, precompiled system prompt and outputs the actual payment_service.py. Its role is constrained synthesis: translating the boundaries in boundaries.md and the imperatives in intent.md into code.
Step 5: Adversarial verification (negative space)
This phase checks whether the generated code crossed a forbidden boundary. Before the development cycle begins, the adversarial context provider defines threat vectors in threat-model.md. Because a Markdown file only guides the LLM softly, the governance platform engineer bridges the gap to determinism by translating those declarative threats into matching executable rules (like semgrep-rule.yml) wired into the CI gates. If the threat model identifies server-side request forgery or secrets exfiltration as a risk for the /frontend module, the corresponding CI rule parses the generated code and instantly fails the build if a known attack pattern or insecure execution sink is detected.
The pipeline doesn’t ask an LLM to read the Markdown and assess if the code is safe. It mechanically executes the prewritten rules derived from it. If a generative agent helps draft the rule set, it does so before the cycle in an isolated sandbox, and a human reviews the result before it enters CI. Step 5 doesn’t prove overall correctness; it proves that declared structural and security boundaries are enforced.
Like any static gate, deterministic boundary checks trade flexibility for safety and will occasionally reject valid implementations. That friction is intentional: Explicit override and artifact refinement are part of the governance loop.
AI code review may identify suspicious code, but it cannot certify that declared boundaries survived generation. Step 5 therefore relies on deterministic CI rules, not on a probabilistic model interpreting the pull request.
Step 6: Acceptance verification (positive space)
This phase checks whether the generated code solves the business problem. The acceptance-criteria.md defines the expected behavior not as a vague user story, but as a machine-executable contract (e.g., using Gherkin syntax):
Scenario: Successful payment emits notification
Given a valid payment of 100 EUR
When the transaction completes
Then the PaymentSuccessEvent is published to the message bus
The CI pipeline parses this exact Markdown block and runs the corresponding test suite. Step 6 provides what step 5 cannot: verification against a declared delivery contract.
The code is approved only when it passes adversarial checks and satisfies the acceptance criteria. Without step 5, the system could violate structural boundaries. Without step 6, it could implement the wrong intent. Both contracts must hold.
Artifact-bound roles and accountability
The traditional SDLC is a linear cascade: Requirements flow to architecture, then to code, then to QA. In an era where a machine generates 10,000 lines of syntax in the time it takes to fetch a coffee, that handoff is a fatal bottleneck.
In the context matrix, specialists define parallel, independent constraint vectors before generation begins. The titles on business cards stay the same. The artifacts they produce change entirely.
| Old role | New role | Artifact | Responsibility |
| Business analyst | Intent definer | intent.md + acceptance-criteria.md | Define the “what” and the deterministic proof that it was delivered |
| Software architect | World builder | boundaries.md | Define domain ontology, architectural invariants, and allowed interaction patterns |
| QA & security engineer | Adversarial context provider | threat-model.md | Define threat vectors and abuse paths before generation |
| Platform engineer/DevOps | Governance platform engineer | Compiler pipeline + CI gates (semgrep-rule.yml) | Operationalize declared constraints into nonbypassable enforcement gates |
| Developer | Context orchestrator | coding-standards.md + critical code | Resolve artifact conflicts, steer generation workflows, implement critical paths, and refine context quality |
In this model, accountability is distributed and artifact bound. Rather than handing off work downstream, each role owns specific upstream activities and constraints.
- The intent definer (formerly business analyst): Owns the business reality. They translate user needs into
intent.mdand define hardacceptance-criteria.md(like BDD scenarios or API contracts). Their job is to formulate requirements so strictly that the pipeline can automatically prove delivery, acting as the first line of defense against vague “vibe coding.” - The world builder (formerly software architect): Owns the structural gravity. They write
boundaries.mdto establish the domain ontology and hard architectural boundaries. Instead of reviewing pull requests for drift, their daily activity is defining what modules are allowed to communicate and declaring the structural invariants the generated code must respect. - The adversarial context provider (formerly QA and security): Owns the negative space. They anticipate failure modes and define threat vectors via
threat-model.md. Their responsibility is identifying the precise abuse paths that the CI pipeline must block, ensuring an LLM never tests its own code. - The governance platform engineer (formerly platform engineer/DevOps): Owns the enforcement machinery. They build the context compiler pipeline and operationalize declared constraints into nonbypassable enforcement gates. Their responsibility is the deterministic enforcement pipeline that executes declared governance artifacts at precommit and CI/CD boundaries.
- The context orchestrator (formerly developer): Owns generation orchestration and critical handwritten paths. This is a hybrid reality, not the end of programming. They write
coding-standards.md, manually implement zero-trust paths, and resolve runtime exception requests. For the bulk of the system, their focus shifts to a meta-level: resolving conflicting constraints, tuning the prompt’s signal-to-noise ratio, and debugging why a given artifact failed to govern the agent properly.
When a failure occurs, the investigation shifts from “What was the agent thinking?” to “Which contract failed to govern?” Because the pipeline deterministically enforces what was explicitly declared, failures are no longer opaque hallucinations. They’re traceable collisions between artifact boundaries. A structural flaw cleanly points to an unbounded boundaries.md. When the pipeline is green and the contracts are honest, the orchestrator acts as a firewall against process failure, not a scapegoat for undocumented assumptions.

The economics of governance
Context compilation makes economic sense only when the cost of architectural failure exceeds the cost of explicit governance. It adds upfront design work and cognitive overhead, so its value depends on how expensive a wrong system decision would be.
For rapid prototyping, throwaway utility scripts, marketing sites, or low-stakes internal tools—where the worst-case consequence of a hallucination is a misaligned dashboard—let the generative engines run unconstrained. Velocity is the only thing that matters.
For safety-critical automation, trading platforms, healthcare orchestrators, and regulated enterprise systems, the economics invert. Velocity without deterministic boundaries is simply the speed at which you accumulate liability. A single unconstrained agent importing an insecure dependency into a payment core costs orders of magnitude more than the engineer-hours spent writing a boundaries.md contract.
You don’t build a bank vault door for a garden shed. You apply context compilation where the systemic cost of emergent architectural failure is catastrophic.
Automating the word “NO”
When code generation becomes cheap, architectural entropy tends to scale with it. That makes post hoc code review less effective, especially when reviewers spend their attention on machine-generated boilerplate. A more durable approach is context review: peer review of the declarative constraints that shape what the machine is allowed to build. A reviewed boundaries.md can guide many later development cycles. A reviewed pull request usually governs only a single change.
The discipline has shifted from imperative engineering of procedures to declarative engineering of boundaries.
Let’s return to the Jira ticket that started this discussion: “Add an email notification after a successful payment.”
The business analyst submits the intent.md. Before the developer agent sees the prompt, the context compiler activates—at the precommit gate or via tool-mediated context protocols (e.g., script or MCP) in the IDE—before a line is written. It retrieves the architect’s boundaries.md, which states, “The /domain module has zero external dependencies. No network calls.” The SMTP import collides with that boundary instantly. Even if the agent generates the import, the build will not survive it—the prompt biases generation toward compliant solutions, and the deterministic static check in step 5 rejects it at the declared boundary. The Frankenstein is caught in the pipeline, not discovered in production three release cycles later.
Code generation is becoming abundant. Architectural discipline is becoming scarce.
Context as code governs what may be generated. Responsibility-oriented agents govern what may be proposed. Decision Intelligence Runtime governs what may be executed. Three boundaries. One governing frame.
The highest-value engineering skill is no longer writing syntax. It’s engineering the conditions under which correct syntax can emerge.
That is the ability to automate the word “NO.”
This article concludes the three-part series on engineering boundaries in agentic AI. The repository at github.com/huka81/decision-intelligence-runtime contains an open source reference implementation of the concepts described in this series.