reuben.brooks

One Spec, Five Languages, Five Different Guarantees

The strength of a compile-time invariant is bounded by the weakest link in the enforcement chain — and different languages provide very different enforcement substrates.

Here is a question that I think deserves more attention than it has received: if you have a formal invariant, say “amounts must be non-negative,” and you generate enforcement code in five different languages from that single specification, do you actually get the same guarantee in each? The answer, it turns out, is no, and not even close. I suspect most developers have an intuitive sense that languages differ in their enforcement capabilities, but the nature of the divergence is more interesting than a simple ranking of “strong” versus “weak” might suggest. Understanding where and why the guarantees differ is, I think, the key to making this approach practical rather than merely elegant.

The Spec

Consider the following specification, written in Shen’s type system:

(datatype amount
  X : number;
  (>= X 0) : verified;
  ====================
  X : amount;)

This is the same regardless of target language. Shen does not care about Go or TypeScript. It is pure math: if X is a number and X is greater than or equal to zero, then X is an amount. The specification exists in a space that is, in a meaningful sense, prior to any particular language’s type system. This reminds me of how mathematical proofs exist independently of the notation used to express them, though the notation can constrain what is easy or hard to express. The interesting question, then, is what happens when this abstract specification becomes concrete code in a particular language. The answer reveals something about the hidden assumptions we carry when we think about type safety.

Tier 1: The Compiler as Structural Enforcer

In languages like Go, Rust, Java, Swift, Kotlin, and C#, the compiler itself becomes the enforcement mechanism. Consider Go:

type Amount struct{ v float64 } // lowercase v = unexported

func NewAmount(x float64) (Amount, error) {
    if !(x >= 0) {
        return Amount{}, fmt.Errorf("x must be >= 0: %v", x)
    }
    return Amount{v: x}, nil
}

If code outside the package attempts Amount{v: -5}, this is a compile error. Not a warning, not a lint suggestion, but a hard refusal to produce a binary. The Go compiler is the enforcer. Now, Rust takes this further in an interesting way:

pub struct Amount { v: f64 } // v is private by default

impl Amount {
    pub fn new(x: f64) -> Result<Self, String> {
        if !(x >= 0.0) {
            return Err(format!("x must be >= 0: {}", x));
        }
        Ok(Amount { v: x })
    }

    pub fn val(&self) -> f64 { self.v }
}

Not only is v private, but shengen can skip deriving Clone on guarded types, preventing bypass via cloning a partially constructed value. Rust’s ownership system adds an enforcement layer that Go simply does not have. I find this instructive because it illustrates how the same formal specification, when projected onto different type systems, picks up additional guarantees almost as a byproduct of the target language’s existing machinery. The ownership model was not designed for this purpose, but it serves it nonetheless.

What you get in Tier 1 is that structural bypass is a compile error. The constructor is the only path to the type. The compiler enforces the specification with zero runtime overhead beyond the validation check itself. If the code compiles, the invariant holds at every construction site. There is something deeply satisfying about this, and I think it is related to why formal methods have historically found their strongest foothold in languages with rich compile-time guarantees. The enforcement is not merely conventional; it is structural.

Tier 2: Compile-Time Guarantees with an Escape Hatch

TypeScript presents a more nuanced case, and I think the nuance is worth dwelling on because it reveals a hidden premise in how we typically think about type safety.

class Amount {
  private constructor(private readonly v: number) {}

  static create(x: number): Amount {
    if (!(x >= 0)) throw new Error(`x must be >= 0: ${x}`);
    return new Amount(x);
  }

  val(): number { return this.v; }
}

Writing new Amount(-5) produces a TypeScript compile error because the constructor is private. If you are writing TypeScript and running tsc, you get what appears to be a Tier 1-equivalent guarantee in your source code. But TypeScript transpiles to JavaScript, and after transpilation, private disappears. In the emitted JavaScript, new Amount(-5) works perfectly well. A determined developer, or an LLM that has been asked to write JavaScript rather than TypeScript, can bypass the constructor entirely. The runtime validation in create() still catches Amount.create(-5) because the throw survives transpilation, so you get runtime enforcement even in JavaScript, but the compile-time wall has a door in it.

I find this situation analogous to a concept from institutional economics: the difference between a rule enforced by physical constraint versus one enforced by social convention backed by occasional auditing. The TypeScript compiler provides genuine enforcement within its domain, but the transpilation boundary is a trust boundary. Code on the other side of that boundary operates under different rules entirely. Shengen for TypeScript also supports ECMAScript #private fields, which do survive transpilation, though browser support and tooling are still catching up. It seems to me that the TypeScript case is particularly revealing because it forces us to ask: what exactly do we mean when we say a type system “enforces” something? The enforcement is real, but it is bounded by the compilation context in a way that Tier 1 languages’ enforcement is not.

Tier 3: Convention and Runtime

Python, Ruby, Lua, and plain JavaScript occupy a different position on this spectrum, and I think it is important to be precise about what is and is not lost.

class Amount:
    def __init__(self, x: float):
        if not (x >= 0):
            raise ValueError(f"x must be >= 0: {x}")
        self._v = x

    @property
    def val(self) -> float:
        return self._v

Python has no compile-time privacy. Writing amount._v = -5 works. The underscore is a naming convention, not a language guarantee. Mypy catches type mismatches but not privacy violations. There is no compilation step that catches structural bypass. Amount(-5) raises a runtime ValueError, so the validation does run, but nothing prevents post-construction mutation or direct field access. For stronger enforcement in Tier 3 languages, shengen can use closures:

def new_amount(x: float):
    if not (x >= 0):
        raise ValueError(f"x must be >= 0: {x}")
    def val():
        return x
    return val

The captured x has no externally accessible name; short of poking at the function’s __closure__ internals, you cannot mutate it, and you cannot inspect it except through val(). This is genuinely stronger than the _v convention, but ergonomically worse, since every access becomes a function call rather than a property access. I suspect that in practice, the choice between class-based convention and closure-based enforcement in Python depends heavily on the specific context: how critical is the invariant, and how much ergonomic cost is the team willing to bear? This is not a purely technical question; it is a question about organizational trust, team norms, and the likelihood of accidental violation.
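To make the tradeoff concrete, here is a minimal, self-contained sketch (restating the class-based and closure-based guards from this section; nothing here is from shengen’s actual output) showing that the class guard can be silently violated after construction, while the closure guard exposes no attribute to mutate:

```python
class Amount:
    """Class-based guard: validation runs, but privacy is only conventional."""
    def __init__(self, x: float):
        if not (x >= 0):
            raise ValueError(f"x must be >= 0: {x}")
        self._v = x

    @property
    def val(self) -> float:
        return self._v


def new_amount(x: float):
    """Closure-based guard: the captured x is reachable only through val()."""
    if not (x >= 0):
        raise ValueError(f"x must be >= 0: {x}")
    def val():
        return x
    return val


# Both reject invalid construction at runtime.
try:
    Amount(-5)
except ValueError as e:
    print(e)  # x must be >= 0: -5

# The class allows a silent post-construction bypass...
a = Amount(5)
a._v = -5            # no tool objects; the invariant is now broken
print(a.val)         # -5

# ...while the closure has no attribute to assign to.
v = new_amount(5)
print(v())           # 5
print(hasattr(v, "_v"))  # False
```

The bypass on the class instance raises no error at any point, which is exactly the "softer backpressure" the Tier 3 discussion describes: nothing surfaces until a test happens to read the corrupted value.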

Three Layers of Protection

Every language, regardless of tier, receives three layers of protection from this approach. The first layer is runtime validation: the constructor checks the invariant, and this works identically everywhere. The second layer is compiler enforcement, which is where languages diverge dramatically. Go and Rust produce hard compile errors for structural bypass. TypeScript produces compile errors within its own context but loses them after transpilation. Python relies on naming convention alone. The third layer is deductive verification: Shen’s own type checker (shen tc+) proves the consistency of the specification as a subprocess, and this too is universal because it does not depend on the target language’s type system at all.

Layers one and three are universal precisely because they do not depend on the target language’s type system. Layer one runs in the constructor. Layer three runs in Shen. It is layer two where the interesting divergence occurs, and I think it is the most important layer for AI-assisted coding loops specifically because it is the layer that turns invariant violations into compiler errors, which are the kind of feedback that large language models are demonstrably best at acting on. There is a hidden premise here that I want to make explicit: the value of compile-time enforcement is not merely about catching errors early in some abstract sense. It is about the structure of the feedback signal. A compile error is unambiguous, localized, and actionable. A runtime error may be any of those things, or it may be none of them, depending on test coverage and the distance between the violation and its observable consequences.

Implications for AI-Assisted Development

In a Go coding loop, when an LLM writes Amount{v: -5}, it receives a clear, specific error message: “cannot refer to unexported field v in struct literal of type shenguard.Amount.” The LLM knows exactly what to do with this. It uses NewAmount(-5) instead, which returns an error, which it handles. The type system guided it to the correct path through what amounts to a structured conversation between the model and the compiler.

In a Python coding loop, the same conceptual mistake does not produce a compile error. The LLM writes Amount(-5), gets a runtime ValueError, and fixes it. But it could also write amount._v = -5 and no tool would catch this until a test happens to exercise that particular path. This does not mean Python is useless for specification-driven development. It means the backpressure is softer: runtime errors instead of compile errors, convention instead of enforcement. The audit gate becomes correspondingly more important in Tier 3 languages. If someone edits the guard file to skip validation, the audit catches it even when the compiler cannot.
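As an illustration of what such an audit gate might check, here is a hypothetical sketch (guard_still_validates and the inlined guard sources are inventions for this post, not shengen’s actual audit tooling): a static pass over the guard file’s AST that confirms the constructor still raises inside a conditional, i.e. that the runtime validation has not been edited away:

```python
import ast

GUARD_SOURCE = '''
class Amount:
    def __init__(self, x: float):
        if not (x >= 0):
            raise ValueError(f"x must be >= 0: {x}")
        self._v = x
'''

STRIPPED_SOURCE = '''
class Amount:
    def __init__(self, x: float):
        self._v = x  # validation deleted
'''

def guard_still_validates(source: str, cls: str = "Amount") -> bool:
    """Return True if the named class's __init__ still raises inside an
    if-statement, i.e. the invariant check survives in the guard file."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == cls:
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                    for stmt in ast.walk(item):
                        if isinstance(stmt, ast.If) and any(
                            isinstance(s, ast.Raise) for s in stmt.body
                        ):
                            return True
    return False

print(guard_still_validates(GUARD_SOURCE))    # True: validation intact
print(guard_still_validates(STRIPPED_SOURCE)) # False: guard was gutted
```

A check this shallow obviously cannot prove the validation is *correct*; it only detects the crude failure mode where the check is deleted outright, which is precisely the gap the compiler covers for free in Tier 1 languages.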

Choosing a Language for Formal Invariants

If you are choosing a language for a new project that will use AI coding with formal invariants, the enforcement spectrum matters in concrete ways. For safety-critical domains like healthcare, finance, or infrastructure, Tier 1 languages like Go or Rust make the compiler itself the enforcer, and this is probably where you want to be. For web applications with a TypeScript team, the tsc guarantees are real within your codebase, and the JavaScript escape hatch matters less if you control the build pipeline. For an existing Python codebase, Tier 3 still provides genuine value: the specification documents invariants, the runtime validates them, and shen tc+ proves consistency. You get perhaps sixty percent of the benefit with zero language migration cost, which is a tradeoff that may well be worth making depending on the circumstances.

The point, as I see it, is not that every language is equal. They manifestly are not. The point is that every language benefits from formal specification, and the specification itself is portable across all of them. Write the specification once, generate guard types for each language in your stack, and get the strongest guarantee each language can offer. This is a pragmatic position, not an idealistic one, and I think it is more honest about the messy reality of polyglot codebases than an approach that insists on a single “correct” language for formal methods.

The Self-Replicating Specification Chain

One final point that I think deserves attention: what about languages that shengen does not yet support? The create-shengen meta-specification is an 875-line document that teaches an LLM to build a shengen for any target language. It parameterizes the enforcement mechanism (unexported fields, private constructors, closures), error handling patterns (error returns, exceptions, Result types), naming conventions, value accessor syntax, sum type implementation, and set membership syntax. A single LLM invocation with this specification produces a working shengen for a new language. The formal verification supply chain is, in a meaningful sense, self-replicating. I am not entirely sure what to make of this. It is both powerful and somewhat unsettling, in the way that any self-referential system tends to be. The specification for generating specification-enforcers is itself a specification, and the question of who verifies the verifier is one that I suspect will become increasingly important as these systems proliferate.