the harness

the deliverable

We ship a harness,
not a deck.

Most consulting ends with a document, and a document cannot enforce itself. We end with a running system — and what you get is an operation that enforces its own rules, frees your team from mechanical rework, and records every decision so nobody re-argues last quarter's conclusions.

what it is

A working system that runs alongside your team — not a report about one.

It sits inside the work your people already do — the documents, the decisions, the daily process. It does not make the judgment calls, and it is not the AI itself. What it does is hold the rules your operation runs on, turn the way your team works into repeatable procedure, and keep a record of every task, decision, check, and result. After fourteen weeks, the rules, the history, and the way of working are yours to keep.

architecture

  1. spec your rules
  2. skill your methods
  3. agent who acts
  4. runtime the record
  5. evidence the proof

Each decision is recorded against the rule it followed — then feeds the next one. The chain is the product.

what it enforces

“Done” means the evidence exists — not that someone thinks it’s good.

four places where “should” becomes “must”

  1. Finish is evidence-gated

    The system will not mark work done until every required check has passed, the artifacts exist, and the verification is on record. A claim of “done” with nothing behind it is simply not accepted.

  2. Critical checks can’t be self-graded

    The concerns most tempting to wave through are confirmed by an automated test, not by the agent’s own say-so. The agent cannot mark them passed — only the test can.

  3. Sensitive work is reviewed before it starts

    When a task touches a sensitive surface, a security review must be on record before the work is even handed off. The review is a precondition, not an afterthought.

  4. Drift is caught, not discovered

    The rules you set are checked against what the system actually does. When the two drift apart, it flags the gap — so the way things are meant to run and the way they actually run never quietly diverge.

how the work divides

The agent that builds is never the agent that judges.

Work moves between defined roles through a written hand-off, not a chat thread. Each role has one job — and the one who checks the work starts fresh: it re-runs the checks itself and points to the exact evidence, instead of taking the builder’s word for it.

  1. one

    Architect

    Writes the binding design — the contract the build must satisfy — before a line of code is written.

  2. two

    Builder

    Implements against that contract, recording evidence as it goes.

  3. three

    Evaluator

    Checks the work independently, in a fresh context — runs the tests, cites the evidence. Not a rubber stamp.

  4. four

    Reconciler

    Keeps the written rules and the running code in agreement as both keep changing.

in practice

A rule, enforced — not a paragraph in a doc.

Take one rule: no piece of work moves forward until someone has reviewed it. The system makes that sign-off a precondition — it will not let the work advance while the review is missing, and it records the review, the work, the checks, and the result as it goes. The rule does not live in a handbook nobody opens. It is enforced as the work happens, and every step is on the record.

the record

Every decision is written down where it can be read back.

Each task, decision, check, and result is kept in a durable, searchable record — one you can export, and one that survives even if the work is interrupted. The trail is not a by-product; it is part of what you are buying. When a decision is questioned next quarter, the answer — and the rule it followed — is already on file.

not locked in

One method. Your choice of engine.

The same method runs across the major AI agent platforms, so your operation is never married to a single vendor’s AI. The rules, the roles, and the record are yours; the underlying AI is a part you can swap.

  • Claude Code
  • Codex CLI
  • Gemini CLI

open by design

specability is our own agent harness — and we are opening it.

We do not sell a system we would not run ourselves. The harness behind every engagement is the same one we are preparing to open-source. A firm that runs its own system is the firm to trust with yours.

request early access →

how it maps

one engagement · one artifact

  1. one

    Discovery

    Find the invariants. Read the history; name the few rules that were always doing the work, and the one surface that is genuinely judgment.

  2. two

    Specification

    Encode them, once, where everyone can see them. The spec is not documentation of the system; it is the system's source of truth.

  3. three

    Harness

    Ship the thing that runs the spec. The boundary between rule and judgment is now enforced by software, not described in a slide.