reuben.brooks

The Spec Is Not a Gate, It Is a Substrate

Four places one .shen file can enforce itself — and why that changes what a specification is for.

This series is about Shen-Backpressure — a project that uses a small, formally-defined specification language (Shen) to constrain and guide an AI coding loop. The natural reading of a tool like this is that it adds a new kind of gate to your CI. You write a spec. The spec gets checked. If the check passes, the code advances. If it fails, the code stops. Gate in, gate out. Specifications as checkpoints.

That reading is correct, and it is small. The more useful way to see what’s happening here is to notice that the same specs/core.shen file — the same handful of sequent-calculus rules about balances and transactions and tenants — is being enforced in four different places, by four different mechanisms, and all four are projections of one source. The spec isn’t a gate. It’s a substrate. Gates are just one of the surfaces it can render onto.

Once you hold that frame, the engineering problem reshapes itself. You stop asking “where in the pipeline should the check go?” and start asking “which surfaces does this invariant need to reach, and does my spec language let me project there?”

The four surfaces, made concrete

Surface one: the target language’s type system, at build time. The shengen tool parses the spec and emits Go (or Rust, or TypeScript, or Python, or Java) where every domain type has private fields and a validating constructor. Amount{v: -5} does not compile. NewAmount(-5) returns an error. The Go compiler — not Shen, not the developer, not the LLM — enforces that you cannot hold an Amount whose value is negative. The spec projected into the type system of the target language; the language’s own compiler became your proof checker.

Surface two: the spec’s internal logic, at build time. shen tc+ runs the sequent calculus over the spec itself. This does not check your code. It checks your rules. If you wrote two premises that can never be simultaneously satisfied, or a conclusion the premises don’t support, tc+ fails. This is the layer nobody in the empirical-validation school has an answer for, because no amount of running programs tells you whether your specification is internally consistent. You need a deductive check, and tc+ is one.

Surface three: the spec as a runtime oracle, at test time. This is where shen-derive lives, and it’s the move I find most interesting. The same .shen file that drives shengen also contains (define ...) blocks — the obvious-correct, slow, functional definition of what your code should compute. shen-derive embeds an s-expression evaluator, runs the spec on sampled inputs, and emits a table-driven test that compares the hand-written implementation against the spec pointwise. The spec literally executes. Not as the thing that ships, but as the oracle that judges the thing that ships. If the efficient Go implementation diverges from the spec on any sampled input, the test fails.

Surface four: boundary enforcement at I/O. shengen can emit scoped DB wrappers. A TenantAccess proof produces a TenantAccessDB struct where the tenant ID is captured and cannot be changed. Every query through that wrapper is automatically scoped to the verified tenant. The proof that was erected at the HTTP boundary travels all the way down to the SQL query without anyone having to remember to pass the right argument. The spec reached past compilation, past testing, into the runtime call to the database.

Four surfaces. One file. Edit the spec, every surface updates on the next run. This is the thing the tool explainers don’t say out loud: the architecture has collapsed what used to be four separate artifacts — type definitions, consistency checks, oracle tests, and authorization middleware — into a single source that projects into all of them.

Why this matters more than it sounds

In most production codebases, a business rule like “balance must cover transaction” lives in at least four places. There is a Confluence page. There is a validation function somewhere near the HTTP handler. There is a guard in the SQL layer. There are a handful of unit tests. Each of these is an independent restatement of the same rule, maintained by different people at different times with different incentives. They drift. Not because anyone is careless — because keeping four representations of the same idea synchronized is a task no process ever fully solves.

The substrate architecture collapses this by construction. The rule exists in one place. All enforcement surfaces are generated from that place. Drift is impossible not because you have good discipline but because there is no second copy to drift from. This is the same bet that schema-first tooling (Protobuf, OpenAPI, database migrations from models) makes at the data-shape layer, extended from shapes to logic.

The reason this newly matters in the AI-coding context is specific and important. LLMs are good at writing implementations from specs. They are bad at keeping specs and implementations synchronized across files by hand. If you generate everything downstream from one source, you don’t need the model to remember the rule — you need it to produce code that the generated enforcement surfaces accept. The model can forget the invariant entirely and still not violate it, because the compiler, the test, the database wrapper, and the type checker have all been handed a projection of it.

This is the deeper meaning of backpressure. Backpressure in the Ralph sense is the signal that travels back upstream when a gate rejects work. Backpressure in the substrate sense is something larger: every surface the spec reaches is a place where bad work cannot accumulate, because the surface itself is a projection of the rule. The rule doesn’t need to be checked against the surface — the surface is the rule, embodied in the local language of that enforcement context.

The substrate property and what it requires

For a specification language to work as a substrate, it has to have three properties. Not all spec languages have them.

It has to be logic you can hand to a program, not logic you can only hand to a human. A Confluence page is not a substrate. A Word document of requirements is not a substrate. An RFC is not a substrate. These are all artifacts where the rule is encoded in a form only humans can project from, which is why the four-places-drift problem is universal.

It has to be cheap to parse and manipulate. This is where Shen’s Lisp syntax earns its keep. The spec is already an AST. Generating Go validators from it is a tree walk, not a parser project. Embedding its runtime is “load the file and call eval.” The substrate property is theoretically available to any formally-defined spec language, but in practice the ones with clean s-expression or similar representations are the ones where the projection tooling is cheap enough to actually build.

It has to be expressive enough to span the surfaces you care about. A schema language like Protobuf handles data shapes beautifully but can’t express “balance must cover transaction” as a relation across fields. A type system like Rust’s handles static structural constraints but can’t run at test time as an oracle. Shen’s sequent calculus plus its executable (define ...) blocks sits in a sweet spot: it can express predicates, relations, proofs, and computations, and all three are parsable by the same tools.

The interesting question for anyone building in this space is not “which spec language is most rigorous?” but “which spec language projects cleanly into the most enforcement surfaces I need?” Rigor that can only enforce at the build-time boundary is a gate. Rigor that enforces at four boundaries is a substrate. The value of a specification scales with the number of surfaces it can reach, and the ceiling of that scaling is set by the language’s portability, not its theoretical strength.

The shape this implies

If you take the substrate view seriously, a few consequences follow.

The spec file becomes the single most important artifact in the repository. Everything else is derived, regenerable, disposable. Editing generated code is a category error — you’ve stopped editing the substrate and started editing one of its projections, which will be erased on the next run. The discipline the repo describes (“never edit guards_gen.go”) is not arbitrary; it’s the logical consequence of treating the spec as source.

The number of enforcement surfaces should grow over time. If the spec is a substrate and you’ve only projected it into two places, you’ve left value on the table. The repo’s trajectory — from shengen’s type generation to scoped DB wrappers to shen-derive’s runtime oracle — is the natural shape of a project whose authors have started asking “where else can this rule reach?” That question has no fixed answer. Every new surface is a new place where drift becomes impossible.

Specification skills become more valuable, and implementation skills become less so. If one spec file can drive four projections, the leverage on the spec is four times the leverage on any individual projection. The person who can write a precise, minimal, well-factored spec is doing something structurally different from the person who can write good Go. This is not a new idea — it’s been true since the first DSL compiler — but the AI-coding context sharpens it. The LLM can write the Go. What it cannot do for you is decide which invariants the Go must respect. That decision lives in the spec, and the spec is now the thing every enforcement surface is answering to.

The right mental model for a tool like Shen-Backpressure is not “a type checker with a codegen step.” It’s “a system for projecting a single logical artifact into every context where it can do useful work.” The gates are one projection. The runtime oracle is another. The DB wrappers are a third. The consistency check is a fourth. There will be more. That’s the point.