reuben.brooks

From Sequent Calculus to Guard Types: How Shengen Works

Six patterns classify every Shen datatype rule, and each generates different code. A tour of the codegen taxonomy and the hidden structure of the validation problem.

There is something striking about the ratios. The payment processor spec is 48 lines of Shen. The generated Go code is 120 lines. The email campaign spec is 93 lines; its generated code, 380 lines. The multi-tenant API spec runs to about 100 lines and produces 290 lines of Go. In every case, the spec is shorter than the generated code, easier to review, and encodes invariants that would otherwise require hundreds of lines of hand-written validation — validation that, in my experience, someone will eventually forget to call. I suspect most working programmers have an intuitive sense of this problem. We know that validation logic sprawls, that it becomes inconsistent, that the gap between what we intended and what we enforce widens quietly over time. What interests me about Shengen is that it takes a specific position on where that gap comes from and offers a surprisingly compact mechanism for closing it.

The core notation is sequent calculus, the same formalism used in programming language theory papers. Premises sit above a horizontal line, the conclusion below:

(datatype amount
X : number;
(>= X 0) : verified;
====================
X : amount;)

This reads: if X is a number, and X >= 0 is verified, then X inhabits the type amount. It is not a type annotation or a schema; it is a proof rule, a statement about what must be true for a value to carry this type. If you have read PL theory papers, you have seen this notation before. If you have not, I think it is more accessible than it looks at first glance, because each block is self-contained and there are only a few recurring patterns. You can learn to read it in an hour, which is more than I can say for most formal specification languages I have encountered.

What Shengen does with these blocks is classify them. Every datatype block falls into one of six patterns, and each pattern generates different code. I want to walk through all six, because I think the taxonomy itself reveals something interesting about the hidden structure of the validation problem — that what we usually treat as a single undifferentiated task (checking that data is good) actually decomposes into qualitatively distinct operations, each with different implications for code generation.

The first pattern is what I would call a wrapper: no validation, just naming.

(datatype account-id
X : string;
==============
X : account-id;)

This generates the most minimal Go imaginable:

type AccountId struct{ v string }
func NewAccountId(x string) AccountId { return AccountId{v: x} }
func (t AccountId) Val() string { return t.v }

No error return, no validation. This is pure naming — distinguishing an account ID from any other string at the type level. The compiler then prevents NewTransaction(amount, accountId, tenantId) when you meant NewTransaction(amount, fromAccountId, toAccountId). You would be amazed, or perhaps you would not be, at how many production bugs amount to “passed the right type of string to the wrong parameter.” This pattern addresses a class of error that exists entirely in the semantic gap between what the programmer meant and what the type system can see. It is, in a sense, the simplest possible proof: this string is not just any string, it is this particular kind of string.

The second pattern introduces runtime validation at the boundary — what I think of as the constrained type:

(datatype amount
X : number;
(>= X 0) : verified;
====================
X : amount;)

This generates:

type Amount struct{ v float64 }
func NewAmount(x float64) (Amount, error) {
    if !(x >= 0) {
        return Amount{}, fmt.Errorf("x must be >= 0: %v", x)
    }
    return Amount{v: x}, nil
}
func (t Amount) Val() float64 { return t.v }

The constructor now returns (Amount, error). The validation runs once, at construction time. After that, any function receiving an Amount knows it is non-negative without checking again. This is the principle sometimes called “parse, don’t validate” — the parsed representation carries the proof with it. I find this idea compelling in a way that goes beyond mere convenience. There is a hidden premise in the conventional approach to validation that I think deserves scrutiny: the assumption that checking a value’s properties is something you do at the point of use, repeatedly, scattered across a codebase. This seems natural, but it implicitly treats the knowledge that “this number is non-negative” as something that cannot be captured in the type system, that must be re-established each time. The constrained pattern rejects that premise. Multiple constraints compose straightforwardly; the dosage calculator’s patient-weight type, for instance, requires both (> X 0) and (<= X 500), and the generated constructor checks both.

The third pattern is the composite, where multiple fields are typed by guard types rather than primitives:

(datatype transaction
Amount : amount;
From : account-id;
To : account-id;
===================================
[Amount From To] : transaction;)

This generates:

type Transaction struct {
    amount Amount
    from   AccountId
    to     AccountId
}
func NewTransaction(amount Amount, from AccountId, to AccountId) Transaction {
    return Transaction{amount: amount, from: from, to: to}
}

The constructor takes guard types, not primitives. You cannot create a Transaction with a raw float64; you need an Amount, which means passing through NewAmount, which validates non-negativity. The type system chains the proofs. This is where I think the approach starts to exhibit something genuinely interesting from a theoretical standpoint. Each composite type is not merely a container of validated data; it is a node in a proof graph, and the edges of that graph are enforced by the compiler. A composite type is like a document that can only be assembled from notarized components — the notarization has already occurred and is attested by the type itself.

The fourth pattern is what Shengen calls guarded — a composite that also includes validation involving its fields:

(datatype balance-invariant
Bal : number;
Tx : transaction;
(>= Bal (head Tx)) : verified;
=======================================
[Bal Tx] : balance-checked;)

This generates:

type BalanceChecked struct {
    bal float64
    tx  Transaction
}
func NewBalanceChecked(bal float64, tx Transaction) (BalanceChecked, error) {
    if !(bal >= tx.Amount().Val()) {
        return BalanceChecked{}, fmt.Errorf("bal must be >= tx.amount")
    }
    return BalanceChecked{bal: bal, tx: tx}, nil
}

Note how (head Tx) becomes tx.Amount().Val(). Shengen resolves the Shen accessor chain — head of a three-element list means the first element, which maps to the amount field — into Go accessor calls. The premise (>= Bal (head Tx)) becomes bal >= tx.Amount().Val(). This is where the code generation bridge does real work, translating between the mathematical notation and idiomatic Go. I find this translation step particularly interesting because it sits at exactly the boundary where formal specification languages usually break down. The gap between “what the math says” and “what the code does” is precisely where subtle bugs live, and Shengen’s pattern-classification approach gives it a structured way to cross that gap rather than leaving it to ad-hoc interpretation.

The fifth pattern is perhaps the most powerful: the proof chain, which requires previous proofs as input.

(datatype safe-transfer
Tx : transaction;
Check : balance-checked;
=============================
[Tx Check] : safe-transfer;)

This generates:

type SafeTransfer struct {
    tx    Transaction
    check BalanceChecked
}
func NewSafeTransfer(tx Transaction, check BalanceChecked) SafeTransfer {
    return SafeTransfer{tx: tx, check: check}
}

There is no validation logic in the constructor, but it requires a BalanceChecked value, which can only exist if the balance check passed. The proof is transitive: SafeTransfer implies BalanceChecked, which implies bal >= tx.amount, which implies amount >= 0. Any function accepting a SafeTransfer gets all four guarantees for free. In the multi-tenant demo, this chain runs five links deep. At the end, a ResourceAccess type carries proofs of JWT validity, token freshness, authentication, tenant membership, and resource ownership — and the handler code simply accepts the type. I suspect this is where the real leverage lies. In conventional code, these kinds of transitive guarantees exist only as comments or as implicit contracts maintained by programmer discipline. The proof chain pattern makes them structural, enforced by the compiler, and — crucially — visible in the spec.

The sixth and final pattern handles sum types, where multiple paths lead to the same conclusion type:

(datatype human-principal
Auth : authenticated-user;
===========================
Auth : authenticated-principal;)

(datatype service-principal
Cred : service-credential;
============================
Cred : authenticated-principal;)

Two blocks produce the same conclusion type. Shengen generates a Go interface with a private marker method:

type AuthenticatedPrincipal interface {
    isAuthenticatedPrincipal()
}
type HumanPrincipal struct{ auth AuthenticatedUser }
func (t HumanPrincipal) isAuthenticatedPrincipal() {}
type ServicePrincipal struct{ cred ServiceCredential }
func (t ServicePrincipal) isAuthenticatedPrincipal() {}

The private marker method seals the interface — no external code can implement it. This gives you sum types in Go: an AuthenticatedPrincipal is either a HumanPrincipal or a ServicePrincipal, and nothing else. Cron jobs and API callers can flow through the same proof chain via different entry points. I think this pattern is worth pausing on because Go famously lacks algebraic data types, and the usual workarounds — empty interface values, stringly-typed discriminators — sacrifice precisely the kind of compile-time safety that the rest of the system is trying to establish. The sealed interface approach is a well-known Go idiom, but generating it from a formal spec closes the loop in a way that manual coding does not.

Having walked through the six patterns, I want to return to the question of the spec-to-code ratio, because I think there is something deeper going on than mere brevity. The payment processor spec is 48 lines; the hand-written equivalent, including tests for each validation, would be roughly 200 lines. The email campaign spec is 93 lines versus perhaps 600 hand-written. The multi-tenant API is 100 lines versus perhaps 500. The dosage calculator, 100 lines versus perhaps 700. The spec is reviewable in a way that the generated code is not, and does not need to be. A domain expert can read 48 lines of sequent calculus and verify that the payment invariants are correct. The generated code is correct by construction, modulo bugs in Shengen itself, a caveat I want to flag honestly because it matters: the trust model shifts from “did every developer correctly implement every validation” to “is the code generator correct.” This is a meaningful trade, not an elimination of risk. But I suspect it is a favorable one, because the generator is a single artifact that can be tested and audited, whereas hand-written validation is distributed across an entire codebase and maintained by an ever-changing cast of developers.

There is a meta-level to this system that I find intriguing. The create-shengen command is an 875-line specification that teaches an LLM to build a Shengen for any target language. It specifies the input grammar (Shen datatype blocks), the classification algorithm, the per-language enforcement mechanism (unexported fields in Go, pub(crate) in Rust, private constructors in TypeScript, closures in Python), and the code generation patterns for each of the six type categories. This is, in effect, a compiler-generating compiler. One LLM invocation produces a complete Shengen for Rust or Python or C#. The formal verification supply chain then becomes: one human writes a spec, one tool — potentially LLM-generated — produces guard types, and the target language’s compiler enforces them. I am not entirely sure what to make of this. On one hand, it is an elegant demonstration of how formal specifications can propagate through layers of tooling. On the other hand, it introduces a dependency on LLM correctness at the tooling layer, which raises questions I do not think have been fully answered yet about how one audits and trusts such a chain. It seems to me that the trust problem does not disappear; it shifts shape.

Shengen’s internals, roughly 1600 lines of Go, work in four stages. First, parsing: read .shen files, tokenize, parse (datatype ...) blocks as S-expression trees. Second, classification: build a symbol table mapping Shen type names to Go names and categories, using two-pass resolution to handle cases where the block name differs from the conclusion type name (for example, block balance-invariant concludes type balance-checked). Third, resolution: map verified premises to Go conditions, translate accessor chains like (head X), (tail X), (head (tail X)) to Go method calls, and handle structural equality premises where accessor resolution fails. Fourth, emission: generate the Go file with package declaration, imports, and one type block per datatype including struct, constructor, accessors, and a String method. The generated file always carries a DO NOT EDIT header, and Gate 5 of the audit process verifies that the committed file matches what Shengen would generate. Nobody can hand-edit the guards.

I want to close with an open question that I have been turning over. The six-pattern taxonomy feels right to me — it captures the validation structures I encounter in practice — but I wonder whether it is complete. Is there a seventh pattern that would naturally arise in, say, concurrent systems where proofs need to be invalidated and re-established? What about temporal properties, where a proof holds only for a bounded duration? The token freshness check in the multi-tenant demo gestures in this direction, but it treats time as just another constraint, not as a fundamentally different kind of premise. It seems to me that there might be a meaningful distinction between proofs that hold permanently once established and proofs that decay, and that this distinction could warrant its own pattern. I do not have a clear answer here, but I suspect that the boundary of the six-pattern taxonomy marks the boundary of the class of systems for which this approach is most naturally suited, and that understanding where it breaks down would teach us something about the deeper structure of the validation problem.