mwaddip/design-by-contract-for-ai-agents.md

Design by Contract for AI Agent Orchestration

A methodology for coordinating multiple autonomous AI coding agents across a multi-repository codebase. Each component is maintained by a separate AI session, coordinated by a human operator through interface contracts.

The Problem

AI coding agents are competent within a single codebase but dangerous across boundaries. A session that knows domain A will confidently build something in domain B that compiles, passes tests, and is fundamentally wrong — because it doesn't understand the trust model on the other side.

The failure mode isn't incompetence. It's confident competence in the wrong domain.

The Solution: Contracts as the Leash

Borrow Bertrand Meyer's Design by Contract — preconditions, postconditions, invariants — and apply it at the architecture level, not the function level.

Every component boundary gets a written contract that defines:

What it provides (postconditions): CLI commands, API endpoints, config files, output formats
What it expects (preconditions): input formats, installed dependencies, available services
What it must not change (invariants): shared config schemas, file ownership, naming conventions

The contracts live in a shared repository (e.g. facts/) that every agent can read but only the main session can write. The contract is the single source of truth. Code is an implementation of the contract, not the other way around.
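
As a rough illustration, the three clause types can be modeled as plain data and checked against each other. This is a minimal sketch, not a prescribed format: the `Contract` class, the `unmet_expectations` helper, and the `installer` and `daemon` components are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """One component boundary, reduced to the three clause types."""
    component: str
    provides: tuple[str, ...] = ()         # postconditions
    expects: tuple[str, ...] = ()          # preconditions
    must_not_change: tuple[str, ...] = ()  # invariants


def unmet_expectations(consumer: Contract, providers: list[Contract]) -> list[str]:
    """Return every precondition of `consumer` that no provider promises."""
    offered = {item for p in providers for item in p.provides}
    return [need for need in consumer.expects if need not in offered]


# Hypothetical components, for illustration only.
installer = Contract("installer", provides=("/etc/app/config.yaml",))
daemon = Contract("daemon", expects=("/etc/app/config.yaml", "service:postgres"))

print(unmet_expectations(daemon, [installer]))  # ['service:postgres']
```

A precondition that nobody provides is a gap in the contracts, and it shows up before any component code is written.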

Architecture

Session Topology

                    ┌─────────────────┐
                    │ Human Operator  │
                    │ (the gate)      │
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │ Main Session    │
                    │ owns contracts  │
                    │ writes prompts  │
                    │ never writes    │
                    │ component code  │
                    └────────┬────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
┌────────▼────────┐ ┌────────▼────────┐ ┌────────▼────────┐
│ Component A     │ │ Component B     │ │ Component C     │
│ (own repo)      │ │ (own repo)      │ │ (own repo)      │
│ reads contract  │ │ reads contract  │ │ reads contract  │
│ owns its code   │ │ owns its code   │ │ owns its code   │
└─────────────────┘ └─────────────────┘ └─────────────────┘

Roles

Human operator: The only entity that can dispatch work between sessions. Approves every cross-session interaction. This is not a bottleneck — it is a security boundary.

Main session: Knows the full architecture through contracts, never through code. Writes interface contracts. Writes prompts for component sessions. Pulls component updates and verifies integration. Never edits component source code.

Component sessions: Expert within their contract boundary. Receive work as prompts (markdown files). Implement against the contract. Push to their own repo. Never edit files outside their working directory.

The Contract Change Pipeline

1. Update the contract. This is the spec.
2. Push the contract.
3. Each affected component session reads the updated contract.
4. Component sessions implement against the new contract.
5. The main session pulls all components and verifies integration.

Never change code first and contract after. The contract leads. Code follows.
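
One way the main session could check that code has caught up with the contract is to compare fingerprints. The sketch below assumes each component records the fingerprint of the contract revision it last implemented. That bookkeeping is an assumption made for illustration, as are the function names and the sample contract text.

```python
import hashlib


def contract_fingerprint(contract_text: str) -> str:
    """Short, stable identifier for one exact revision of a contract."""
    return hashlib.sha256(contract_text.encode("utf-8")).hexdigest()[:12]


def stale_components(contract_text: str, implemented: dict[str, str]) -> list[str]:
    """Names of components whose recorded fingerprint is not the current one."""
    current = contract_fingerprint(contract_text)
    return sorted(name for name, fp in implemented.items() if fp != current)


old = "## Postconditions\n- file F exists\n"
new = old + "- service S is running\n"

# Hypothetical state: component-a implemented the new contract, component-b has not.
implemented = {
    "component-a": contract_fingerprint(new),
    "component-b": contract_fingerprint(old),
}
print(stale_components(new, implemented))  # ['component-b']
```

Because the fingerprint comes from the contract text, a component can only claim the new revision after that revision exists. The contract leads by construction.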

Core Principles

1. The Contract is the Interface

Don't document what the code does. Define what the boundary requires. A good contract produces correct implementations even when the prompt doesn't explicitly specify details.

Investment in the contract saves investment in the prompt.

2. Working Directory is the Boundary

Every session's working directory is its boundary. Files outside it belong to someone else. If code outside your repo needs changes, write a prompt for the operator to deliver — don't edit it yourself.

This is bidirectional. The main session doesn't edit component code. Component sessions don't edit the main repo. The human carries prompts between them. Every boundary crossing goes through a human.
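
The boundary rule is mechanical enough to check before any write. The following is a minimal sketch, assuming a session knows its own working directory; `inside_boundary` and the example paths are hypothetical.

```python
from pathlib import Path


def inside_boundary(workdir: Path, target: Path) -> bool:
    """True if `target` resolves to a location inside `workdir`.

    Relative targets are interpreted against `workdir`; `..` segments and
    absolute paths are resolved first, so they cannot escape unnoticed.
    """
    root = workdir.resolve()
    candidate = (root / target).resolve()
    return candidate == root or root in candidate.parents


workdir = Path("/srv/component-a")  # hypothetical session directory
print(inside_boundary(workdir, Path("src/main.py")))               # True
print(inside_boundary(workdir, Path("../component-b/schema.py")))  # False
print(inside_boundary(workdir, Path("/etc/app/config.yaml")))      # False
```

A `False` result is the signal to write a prompt for the operator instead of editing the file.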

3. The Human is the Gate

Cross-session dispatch always requires human approval. Even when the dispatch is obviously correct. Even when the agent is in a flow and "just needs a quick thing" from another session. Especially then — that's exactly when boundaries erode.

The pattern:

Agent: [DISPATCH] component-B: update schema to match new contract — send? (y/n)
Human: y

One line to ask, one character to approve. Zero exceptions.
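
A sketch of that gate as code, assuming the agent runtime can block on operator input. `request_dispatch` and its `ask` parameter are hypothetical; `ask` defaults to the built-in `input` and is injectable so the gate can be exercised without a terminal.

```python
from typing import Callable


def request_dispatch(target: str, summary: str,
                     ask: Callable[[str], str] = input) -> bool:
    """Ask the operator to approve one cross-session dispatch.

    Only an explicit "y" approves. Anything else, including an empty
    answer, is treated as a refusal.
    """
    answer = ask(f"[DISPATCH] {target}: {summary} - send? (y/n) ")
    return answer.strip().lower() == "y"


# Simulated operator answers, for illustration only.
print(request_dispatch("component-B", "update schema", ask=lambda _: "y"))  # True
print(request_dispatch("component-B", "update schema", ask=lambda _: ""))   # False
```

Defaulting to refusal matters here: silence or a typo must never count as approval.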

4. Presence as State

Prefer observable state over configured state. If a component exists, the feature is supported. If a file is present, the step is complete. Don't configure what you can observe. Don't coordinate what you can derive.

Applied to contracts: if the contract defines it, the component must implement it. If the contract doesn't mention it, the component shouldn't assume it.
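
For example, progress can be read off the filesystem instead of being tracked in a separate status file. This is a minimal sketch; the step names and artifact paths in `STEP_ARTIFACTS` are invented for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping: step name -> artifact that the step leaves behind.
STEP_ARTIFACTS = {
    "configure": "config.yaml",
    "build": "dist/app.bin",
}


def completed_steps(root: Path) -> list[str]:
    """Derive progress from what exists on disk, not from a status file."""
    return [step for step, rel in STEP_ARTIFACTS.items() if (root / rel).exists()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "config.yaml").write_text("key: value\n")
    observed = completed_steps(root)

print(observed)  # ['configure']
```

There is no flag to forget to set and no status file to fall out of sync with the artifacts.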

5. Interface Integrity

When two components disagree, the contract is wrong or one implementation is wrong. Never write an adapter to bridge the mismatch. Never wrap the disagreement in glue code. Trace the error to its source and fix it there.

Adapters hide bugs. Adapters breed more adapters. Fix the interface.

6. Derivation Over Configuration

If both sides of a boundary can compute a value from shared inputs, don't configure it. Every configured value is a synchronization point that can drift. Every derived value is a function of those inputs, so it can't.
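
A small sketch of the idea: the component that writes and the component that reads both call the same pure function, so no setting has to be kept in sync. The `/run/<project>/<component>.sock` layout and the `socket_path` name are assumptions chosen for illustration.

```python
from pathlib import PurePosixPath


def socket_path(project: str, component: str) -> PurePosixPath:
    """Compute a component's socket location from names both sides already know."""
    return PurePosixPath("/run") / project / f"{component}.sock"


# The server binds here and the client connects here. Neither reads a config value.
print(socket_path("app", "component-b"))  # /run/app/component-b.sock
```

In this scheme the function's rule would itself belong in the contract, since it is part of the boundary.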

Contract Format

Contracts are markdown files. Dense, structured, no prose. Optimized for LLM consumption.

A contract typically contains:

# Component Interface Specification

## CLI Commands
| Command | Arguments | Output | Exit codes |
|---------|-----------|--------|------------|
| `tool action` | `--flag <value>` | JSON to stdout | 0=success, 1=failure |

## Config Files
| File | Owner | Schema |
|------|-------|--------|
| `/etc/app/config.yaml` | This component writes, others read | `key: type` |

## Preconditions
- Package X must be installed
- Service Y must be running
- Config file Z must exist with keys A, B, C

## Postconditions
- After finalization, file F exists with permissions P
- Service S is enabled and running

## Invariants
- Shared config schema is defined by the contract, not by any component
- Field names are role-descriptive, not domain-specific
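
Because the format is this regular, a contract can also be read mechanically. The sketch below pulls the rows of one table out of a contract. It assumes simple pipe tables with no `|` characters inside cells, and `parse_table` is a hypothetical helper, not part of any existing tool.

```python
def parse_table(markdown: str, heading: str) -> list[dict[str, str]]:
    """Return the rows of the pipe table under `## <heading>` as dicts."""
    rows: list[list[str]] = []
    in_section = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            if in_section:
                break  # reached the next section
            in_section = stripped[3:].strip() == heading
        elif in_section and stripped.startswith("|"):
            rows.append([cell.strip() for cell in stripped.strip("|").split("|")])
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the |---| separator line
    return [dict(zip(header, row)) for row in body]


contract = """# Component Interface Specification

## CLI Commands
| Command | Arguments | Output | Exit codes |
|---------|-----------|--------|------------|
| `tool action` | `--flag <value>` | JSON to stdout | 0=success, 1=failure |

## Preconditions
- Package X must be installed
"""

commands = parse_table(contract, "CLI Commands")
print(commands[0]["Exit codes"])  # 0=success, 1=failure
```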

Prompt Format

Prompts are the communication channel between the main session and component sessions. They are markdown files that describe what needs to change and why.

A good prompt:

States the goal (what the contract requires)
Provides context (why this change is needed)
References the contract (which section defines the expected behavior)
Does NOT prescribe implementation details (the component session knows its own code)

A bad prompt:

Includes code to copy-paste
Tells the session which files to edit
Assumes knowledge of the component's internal architecture

The contract defines what. The prompt explains why. The component session decides how.
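
Some of these rules can be checked roughly before a prompt is handed to the operator. The heuristics below are assumptions made for illustration (a code fence means copy-paste code, a filename with a common source extension means the prompt names files) and will miss cases. `lint_prompt` is a hypothetical helper.

```python
import re

FENCE = "`" * 3  # the markdown code-fence marker
SOURCE_FILE = re.compile(r"\b[\w/-]+\.(?:py|js|ts|go|rs)\b")


def lint_prompt(prompt: str) -> list[str]:
    """Return the prompt-format rules this prompt appears to break."""
    problems = []
    if FENCE in prompt:
        problems.append("contains code to copy-paste")
    if "contract" not in prompt.lower():
        problems.append("does not reference the contract")
    if SOURCE_FILE.search(prompt):
        problems.append("names specific source files")
    return problems


good = ("Goal: emit the role field defined in the contract's Config Files section. "
        "Context: the installer now reads it at boot.")
bad = "Edit src/schema.py to rename the field."

print(lint_prompt(good))  # []
print(lint_prompt(bad))   # ['does not reference the contract', 'names specific source files']
```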

Anti-Patterns

"I'll just fix both sides"

An agent sees a mismatch and edits code in two repositories. The second repo's session has no knowledge of the change. The next pull overwrites it silently.

Fix: Write a prompt. Let each session own its own code.

"This is so small it doesn't need approval"

An agent dispatches a trivial fix to another session without asking the human. The "trivial" fix conflicts with work the other session is mid-way through.

Fix: Every dispatch goes through the human. The smaller it seems, the more important the ask.

"The contract uses the wrong term for my domain"

An agent encounters a contract that uses generic terminology instead of domain-specific language. Instead of implementing under the contract's field name, it renames the field to match its domain's vocabulary, breaking every other component that reads that field.

Fix: The contract defines the vocabulary. Implement under the contract's terms. If the contract needs updating, update it first — through the main session.

"I'll add a wrapper to make it work"

Two components disagree on a format. An agent writes an adapter layer to translate between them. Now three things can be wrong instead of one, and the adapter must be maintained forever.

Fix: Find which side is wrong. Fix that side. Delete the adapter.

Verification

Every component delivery should be verified against available sources of truth:

The contract — does the implementation match what was specified?
The reference implementation — does it behave the same as the existing system? (if applicable)
Reality — does it work with real data, not just test fixtures?

If any two agree and the third does not, that is a finding. If the contract and the reference implementation disagree with each other, reality is the tiebreaker.
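
The three-way comparison can be sketched as a small function that names the source to investigate. It assumes the three observations have already been reduced to comparable values. `odd_one_out` and the sample values are hypothetical.

```python
from typing import Optional


def odd_one_out(contract: object, reference: object, reality: object) -> Optional[str]:
    """Name the source that disagrees with the other two.

    Returns None when all three agree and "all differ" when no two agree.
    """
    if contract == reference == reality:
        return None
    if contract == reference:
        return "reality"
    if contract == reality:
        return "reference"
    if reference == reality:
        return "contract"
    return "all differ"


# Hypothetical check: which exit code does a failed run produce?
print(odd_one_out(contract=1, reference=1, reality=1))  # None
print(odd_one_out(contract=1, reference=2, reality=2))  # contract
print(odd_one_out(contract=1, reference=1, reality=2))  # reality
```

The function only locates the disagreement. Deciding which side to fix is still a judgment for the main session and the operator.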

Scaling

This methodology scales by adding sessions, not by making sessions larger. Each new component gets its own session, its own contract, and its own boundary. The main session's job grows with the number of contracts, not with the amount of component code, because it only coordinates contracts and verifies integration.

Origin

This methodology was not designed. It was discovered by building a system where AI agents kept breaking each other's work until the boundaries became explicit enough that they couldn't. Every rule exists because its absence caused a specific, painful failure.

The contracts are not documentation. They are the architecture.