Welcome to Mutagen

# Mutagen

I've been shipping a thing called Mutagen for a few months now, and it's finally cooked enough to point at in public. It's open source, MIT, and lives at [github.com/CHKDSKLabs/Mutagen](https://github.com/CHKDSKLabs/Mutagen).

Mutagen is a Rust harness that runs an agentic design workflow on top of Claude Code and Codex CLI. It owns the things you cannot trust prompts with — queue selection, stage transitions, scope enforcement, evidence bundling, retry policy, verdict persistence — and hands the rest to a fixed cast of thirteen personas with bounded mandates.

The working rule, taped to the inside of the binary:

> If a behavior matters, the harness enforces it or records it. If the only control is "the prompt said pretty please," that is not a control plane.

## Why it exists

If you've spent any time wiring agents together you've seen the failure modes:

- The agent decides Tuesday that the schema migration belongs in the API layer, and there's no mechanism to say no.
- The same workflow behaves three different ways across three different hosts because half the rules live in markdown that one host enforces and another one politely describes.
- A reviewer agent passes a slice that breaks an invariant from a design doc written four stages ago, because nothing actually carried that invariant forward as a check.
- "Stage 3 retries" turns out to mean "the agent asked itself nicely."

Mutagen takes the position that orchestration in prose is a smell. The Rust crate (`mutagen-harness`) is the canonical runtime: it reads the queue, selects the next ready slice deterministically, materializes evidence, dispatches to a host adapter, parses the verdict back, and persists state. Hosts plug in. Personas execute. Nothing critical is held together by politeness.
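To make "selects the next ready slice deterministically" concrete, here is a minimal sketch of one way such a selection could work. It is not lifted from `mutagen-harness`: the `Slice` and `Status` types, the `order` field, and the tie-break on id are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical slice status; the real harness may track more states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    InProgress,
    Done,
}

/// Hypothetical queue entry: an id, a queue position, and the ids it depends on.
#[derive(Debug, Clone)]
pub struct Slice {
    pub id: String,
    pub order: u32,
    pub deps: Vec<String>,
    pub status: Status,
}

/// Pick the next ready slice: it must be pending and every dependency must be done.
/// Candidates are ranked by `order`, then by id, so the answer does not depend on
/// the order in which the queue happened to be loaded.
pub fn next_ready(queue: &[Slice]) -> Option<&Slice> {
    let done: HashSet<&str> = queue
        .iter()
        .filter(|s| s.status == Status::Done)
        .map(|s| s.id.as_str())
        .collect();

    queue
        .iter()
        .filter(|s| s.status == Status::Pending)
        .filter(|s| s.deps.iter().all(|d| done.contains(d.as_str())))
        .min_by(|a, b| a.order.cmp(&b.order).then_with(|| a.id.cmp(&b.id)))
}
```

The point of the explicit tie-break is that two runs over the same queue state always pick the same slice, which is what makes the selection auditable.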

## The cast

Thirteen personas, each with a narrow mandate:

- April: interviewer; authors the upstream design bundle (PRD / ADR / DDD / ISC / DSD).
- Shredder: principal architect; consumes the bundle and slices it into a dependency-ordered queue.
- Karai: dispatcher; routes slices to executors and validates returns.
- Bebop: standard execution muscle (CRUD, UI, business logic, plumbing).
- Baxter: algorithmic and math-heavy slices.
- Krang: Layer 1 foundation and infrastructure.
- Chaplin: Layer 2 data, schemas, migrations.
- Tatsu: Layer 3 security; threat models before code.
- Metalhead: observability; SLOs, alerts, dashboards, runbooks.
- Bishop: principal-engineer review of completed slices.
- Tiger Claw: adversarial defect hunting; writes new attack tests.
- Splinter: human-facing documentation.
- Traag: scope guardian; every write passes through him before touching disk.

The pipeline is roughly: April authors → Shredder slices → Karai dispatches → the right executor implements → Bishop reviews → Tiger Claw attacks → Splinter documents → harness records verdicts and rotates the queue.
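One way to keep that ordering out of prompts is to encode it as a type. The sketch below is an assumption about shape, not the harness's actual state machine: the `Stage` enum, its variant names, and `advance` are hypothetical, and it models only the happy path (no retry loop back to the executor).

```rust
/// Hypothetical stage enum mirroring the pipeline order described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Author,    // April
    Slice,     // Shredder
    Dispatch,  // Karai
    Implement, // the matching executor
    Review,    // Bishop
    Attack,    // Tiger Claw
    Document,  // Splinter
    Record,    // harness persists verdicts and rotates the queue
}

impl Stage {
    /// The only stage that may follow `self` on the happy path.
    pub fn next(self) -> Option<Stage> {
        use Stage::*;
        match self {
            Author => Some(Slice),
            Slice => Some(Dispatch),
            Dispatch => Some(Implement),
            Implement => Some(Review),
            Review => Some(Attack),
            Attack => Some(Document),
            Document => Some(Record),
            Record => None,
        }
    }

    /// Refuse any transition that skips or reorders a stage.
    pub fn advance(self, to: Stage) -> Result<Stage, String> {
        if self.next() == Some(to) {
            Ok(to)
        } else {
            Err(format!("illegal transition {:?} -> {:?}", self, to))
        }
    }
}
```

With this shape, "skip review because the agent felt confident" is an `Err`, not a judgment call.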

## What's actually enforced

The Claude Code build ships a `PreToolUse` hook that physically blocks writes outside the active slice's manifest before they happen. Codex doesn't have a hook contract on Windows yet, so there, scope is advisory and reviewers are the backstop. The harness still writes the manifest between stages, so the audit trail is intact, but enforcement degrades gracefully.
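The core of a scope check like this is a path comparison. The following is a minimal sketch of the kind of test a hook could call before a write, assuming a manifest is just a list of allowed relative path prefixes; the `Manifest` struct and `write_allowed` function are hypothetical and say nothing about the real manifest format or hook wiring. It normalizes lexically and does not resolve symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical manifest: relative path prefixes the active slice may write under.
pub struct Manifest {
    pub allowed: Vec<PathBuf>,
}

/// Lexically normalize a relative path. Returns None for absolute paths
/// and for paths that climb above the workspace root via `..`.
fn normalize(path: &Path) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::Normal(part) => out.push(part),
            Component::CurDir => {}
            Component::ParentDir => {
                // pop() returns false once there is nothing left to pop.
                if !out.pop() {
                    return None;
                }
            }
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}

/// True only if the normalized target sits under one of the manifest prefixes.
/// `Path::starts_with` compares whole components, so `src/orders_extra`
/// does not match the prefix `src/orders`.
pub fn write_allowed(manifest: &Manifest, target: &Path) -> bool {
    match normalize(target) {
        Some(path) => manifest.allowed.iter().any(|root| path.starts_with(root)),
        None => false,
    }
}
```

On a host with a blocking hook the `false` case would stop the write; on an advisory host the same result could only be logged for reviewers.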

The harness owns:

- Queue mutation (every status flip, retry counter, completion timestamp goes through one canonical path).
- Evidence assembly (slice-scoped bundle materialized to `.mutagen/state/evidence/<slice_id>.md` before dispatch).
- Stage prompts (canonical builder for author and review; no per-host re-assembly in markdown).
- Verdict normalization (Tiger Claw's QA report parsed into a machine-readable retry contract).
- Cohort execution (bounded-parallel siblings fan out into isolated git worktrees and reconcile back into the main tree in queue order).
- Notifications (queue-clear, structural failure, scope violation, retry-budget exhaustion, layer-complete: all canonical intents, with optional Pushover transport).
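As an illustration of verdict normalization, here is a rough sketch of turning a free-text QA report into a decision the harness can act on. The `VERDICT: PASS|FAIL` line format, the `NextAction` enum, and the `decide` function are all invented for this example; the real report format and retry contract are defined by the harness, not by this snippet.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// Hypothetical machine-readable outcome derived from a report plus retry state.
#[derive(Debug, PartialEq, Eq)]
pub enum NextAction {
    Complete,
    Retry { attempt: u32 },
    BudgetExhausted,
    StructuralFailure,
}

/// Find the first line shaped like `VERDICT: PASS` or `VERDICT: FAIL`
/// (an assumed format; the value is matched case-insensitively).
pub fn parse_verdict(report: &str) -> Option<Verdict> {
    report.lines().find_map(|line| {
        let rest = line.trim().strip_prefix("VERDICT:")?;
        match rest.trim().to_ascii_uppercase().as_str() {
            "PASS" => Some(Verdict::Pass),
            "FAIL" => Some(Verdict::Fail),
            _ => None,
        }
    })
}

/// Combine the parsed verdict with the retry counters into one decision.
/// A report with no parseable verdict is treated as a structural failure
/// rather than silently counted as a pass.
pub fn decide(report: &str, retries_used: u32, retry_budget: u32) -> NextAction {
    match parse_verdict(report) {
        Some(Verdict::Pass) => NextAction::Complete,
        Some(Verdict::Fail) if retries_used < retry_budget => NextAction::Retry {
            attempt: retries_used + 1,
        },
        Some(Verdict::Fail) => NextAction::BudgetExhausted,
        None => NextAction::StructuralFailure,
    }
}
```

The retry decision lives in code with an explicit budget, so "Stage 3 retries" means a counter went up, not that the agent asked itself nicely.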

## Install

### Claude Code

```
/plugin marketplace add CHKDSKLabs/Mutagen
/plugin install mutagen@mutagen-marketplace
```

### Codex CLI

```bash
git clone https://github.com/CHKDSKLabs/Mutagen.git
export MUTAGEN_ROOT="/absolute/path/to/Mutagen/plugins/mutagen"
```

Then register the marketplace entry at `~/.agents/plugins/marketplace.json` or use the repo-local `.agents/plugins/marketplace.json`. Skills are invoked explicitly: `$mutagen-elicit`, `$mutagen-slice`, `$mutagen-execute-next`, and six others. All nine are configured with `allow_implicit_invocation: false`. Mutagen is a workflow, not a helpful tool, so explicit invocation is the only trigger.

## Where to start

The repo ships with a populated reference workspace at [examples/orders-demo/](https://github.com/CHKDSKLabs/Mutagen/tree/main/examples/orders-demo): five upstream design documents, a slice queue, and a Tiger Claw review report in their canonical filesystem layout. If you want to see what the pipeline produces before you run it, start there.

For the full feature surface, the plugin README is the source of truth. For harness internals (the state machine, the artifact contracts, the host abstraction), see `harness/`.

## What's next

A few things on the near horizon:

- Codex hooks on Windows, when upstream lands. That closes the last advisory gap.
- More host adapters. The trait is small and deliberate; Claude and Codex fit, anything else with a CLI prompt surface should fit.
- Inference-provider direct prompting (Ollama and LM Studio) for the slices that don't need a full agentic launcher in the loop.
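For anyone sizing up a new host adapter, here is a hedged sketch of what a small adapter trait could look like. It is not the trait in the repo: `HostAdapter`, `DispatchRequest`, `HostOutput`, `EchoHost`, and `run_slice` are hypothetical names used only to show the shape, so check `harness/` for the real contract.

```rust
/// Hypothetical request handed to a host: which slice, and the assembled prompt.
pub struct DispatchRequest {
    pub slice_id: String,
    pub prompt: String,
}

/// Hypothetical raw output returned by a host, before verdict parsing.
pub struct HostOutput {
    pub raw: String,
}

/// A deliberately small adapter surface: identify yourself, say whether you
/// can hard-block out-of-scope writes, and run one prompt.
pub trait HostAdapter {
    fn name(&self) -> &str;
    fn enforces_scope(&self) -> bool;
    fn dispatch(&self, req: &DispatchRequest) -> Result<HostOutput, String>;
}

/// Toy host that echoes the prompt back; stands in for a real CLI launcher.
pub struct EchoHost;

impl HostAdapter for EchoHost {
    fn name(&self) -> &str {
        "echo"
    }

    fn enforces_scope(&self) -> bool {
        false // no hook contract, so scope would be advisory on this host
    }

    fn dispatch(&self, req: &DispatchRequest) -> Result<HostOutput, String> {
        Ok(HostOutput {
            raw: format!("{}: {}", req.slice_id, req.prompt),
        })
    }
}

/// The caller stays host-agnostic: it only sees the trait.
pub fn run_slice(host: &dyn HostAdapter, req: &DispatchRequest) -> Result<String, String> {
    let out = host.dispatch(req)?;
    let mode = if host.enforces_scope() { "enforced" } else { "advisory" };
    Ok(format!("[{} scope={}] {}", host.name(), mode, out.raw))
}
```

Exposing scope enforcement as a capability on the adapter is one way to let the harness record, per host, whether a write was blocked or merely flagged.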

Bug reports, host adapters, and persona proposals welcome. The contributor bar is the same as the runtime bar: be precise, cite sources, don't widen scope silently. See CONTRIBUTING.md before opening a 400-line PR.

CHKDSK Labs