Standing orders for a fleet of agent captains.
In case signals can neither be seen nor perfectly understood, no Captain can do very wrong if he places his ship alongside that of an Enemy.
A frigate sails for the West Indies. Twelve weeks each way, plus station time. The Admiral won't see that ship again for half a year. No telegraph. No radio. Once the captain clears the Channel, he's on his own.
The Admiral has four tools to bind him:
Nelson's genius: write everything down in advance, trust the captains. He called them a Band of Brothers — and made sure they didn't need micromanagement, because it wasn't possible.
A senior engineer hands a task to an agent (Cursor, Claude Code, or Codex) and walks away. For minutes, hours, or days, there is no real-time visibility. No telegraph here either.
The same four tools, modernized:
Same horizon problem. Same solution. The Band of Brothers is software now, and the Admiralty is one person with eleven ships at sea.
This talk is the contemporary answer to a 220-year-old problem: how an Admiral keeps a fleet honest without being on every quarterdeck.
Half the team designs. The other half writes code with AI agents. The product is the merger of the two: fast, well-branded, fully-tested software standing up in days instead of quarters.
Brand systems, UX, full-stack code. We ship the whole thing — site, app, integrations, deploys.
Cursor, Claude Code, Codex. They write the bulk of the code. We review, shape, gate, ship.
If the standing orders aren't right, the agents will happily generate ten thousand lines of code we wouldn't be proud of.
Different stacks. Different deploy targets. Different brand worlds. Same flag.
The test pyramid assumes one thing: the only way code goes wrong is by failing a test.
Half of what broke during this morning's v3.1 rollout wasn't a test failure. It was a CVE in a dependency. A "Looser" word that retext-profanities lit up. Five page.waitForURL calls that would each hang E2E for 30 seconds. Unused exports. CSS-bytes drift. Schema drift. Bundle bloat.
A pyramid of tests catches none of these. They aren't test failures at all.
Sets strategy. Picks which ships sail. Writes the standing orders. The human engineer.
Commands one ship. Wears multiple hats throughout the watch. Replaceable, interchangeable, follows orders. The AI agent.
Each layer of the gate is a duty the captain stands. Helmsman, Sailing Master, Lookout, Master at Arms — sixteen of them in our wardroom.
The SKILL files. Read on entry, carried on the voyage. The captain's autonomy is bounded by what's written here.
Even one with strong opinions. Even one in a hurry. Even one fresh from another ship. The Admiral never has to police; the duties do.
Plus The Master's Log — the daily code-quality report PR. The bound book the Admiral reads when each ship returns.
Holds the heading set by the Sailing Master. Watches the compass continuously and corrects small drift before it accumulates.
Catches stylistic and rule-level drift the moment it appears in source. Doesn't let it compound into structural mess.
Knows where the ship is, where it's going, and whether the planned course is feasible. Validates that course-changes are consistent with the chart.
Validates the shape of the code is internally consistent — types flow, contracts hold, schemas align. Rejects incoherent state transitions before runtime sees them.
Spots threats from far away — other ships, weather, hazards on the horizon — before they're visible from the deck. Provides early warning.
Detects future runtime failures from current source. Catches the class of bug that would only show up in production or in slow CI feedback.
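A minimal sketch of what a Lookout-style preflight could look like, assuming it works by scanning spec source for page.waitForURL calls that lack waitUntil: "commit". The function name, the finding shape, and the text-matching strategy are all assumptions; the actual e2e:preflight script is not shown in this deck.

```typescript
// Hypothetical preflight helper. It matches on source text, not on an AST,
// so parentheses inside string or regex arguments can confuse it.
interface PreflightFinding {
  line: number; // 1-based line where the call starts
  call: string; // the call text as found in the source
}

function findWaitForUrlWithoutCommit(source: string): PreflightFinding[] {
  const findings: PreflightFinding[] = [];
  const marker = "page.waitForURL(";
  let from = 0;
  while (true) {
    const start = source.indexOf(marker, from);
    if (start === -1) break;
    // Walk forward from the opening parenthesis to the one that closes the call.
    let depth = 0;
    let end = start + marker.length - 1;
    for (; end < source.length; end++) {
      const ch = source[end];
      if (ch === "(") depth++;
      if (ch === ")" && --depth === 0) break;
    }
    const call = source.slice(start, end + 1);
    if (!/waitUntil\s*:\s*["'`]commit["'`]/.test(call)) {
      const line = source.slice(0, start).split("\n").length;
      findings.push({ line, call });
    }
    from = start + marker.length;
  }
  return findings;
}
```

A gate built on this would fail whenever the returned list is non-empty, which moves a slow E2E hang forward to a fast source-level check.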
Knows the local waters intimately and cons the ship through known hazards in this specific harbor. Doesn't pretend to know the open ocean.
Makes local-iteration gates fast by analyzing only what changed in this commit/push. Leaves full-ocean analysis to CI.
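One way the Pilot's "only what changed" selection could work, as a sketch. It assumes the changed-file list comes from the tab-separated output of git diff --name-status; the extension filter and the function name are illustrative assumptions, not the fleet's actual configuration.

```typescript
// Hypothetical selector for local gates. Which extensions count as
// "analyzable" is an assumption made for this example.
const ANALYZABLE_EXTENSIONS = /\.(ts|tsx|js|jsx|css)$/;

function selectChangedTargets(nameStatusOutput: string): string[] {
  return nameStatusOutput
    .split("\n")
    .map((row) => row.trim())
    .filter((row) => row.length > 0)
    .map((row) => row.split("\t"))
    // Deleted files (status "D") no longer exist, so there is nothing to analyze.
    .filter((cols) => (cols[0] ?? "").charAt(0) !== "D")
    // For renames ("R100<TAB>old<TAB>new") the last column is the current path.
    .map((cols) => cols[cols.length - 1] ?? "")
    .filter((path) => ANALYZABLE_EXTENSIONS.test(path));
}
```

The returned paths can be handed to whichever local gate runs next; full-repository analysis stays in CI.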
Enforces the ship's Articles regardless of the violator's rank. Doesn't write the rules — ensures they apply equally to everyone aboard, including senior officers.
Enforces policy the team has agreed to — hard, with no informal bypass. Branch protection, enforce_admins: true, commitlint, secrets sweep. Even the most senior contributor follows the same gates.
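To make the secrets sweep concrete, here is a small sketch of a line-by-line pattern scan. The two patterns (an AWS-style access key ID and a PEM private-key header) and the function name are assumptions chosen for illustration; the deck does not say which tool or rules the fleet's sweep uses.

```typescript
// Hypothetical secrets sweep. The pattern list is illustrative, not exhaustive.
const SECRET_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "AWS access key ID", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { label: "private key block", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
];

function sweepForSecrets(text: string): string[] {
  const hits: string[] = [];
  text.split("\n").forEach((lineText, index) => {
    for (const { label, pattern } of SECRET_PATTERNS) {
      // The patterns carry no global flag, so test() keeps no state between calls.
      if (pattern.test(lineText)) hits.push(`line ${index + 1}: ${label}`);
    }
  });
  return hits;
}
```

In keeping with the Master at Arms duty, any hit fails the gate for every contributor, with no informal bypass.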
Detects and responds to live external threats. Knows the armament, the powder, the drill. Fires when there's a target.
Detects newly-disclosed external threats — vulnerabilities published since our last build — and refuses to ship until they're addressed. Live npm audit --json.
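A sketch of how the output of npm audit --json could be turned into a pass/fail decision. It reads the severity counts that npm 7 and later report under metadata.vulnerabilities; treating high and critical as the blocking severities is an assumption, as is the function name.

```typescript
// Minimal view of the npm audit --json report: only the fields this sketch reads.
interface AuditSeverityCounts {
  info: number;
  low: number;
  moderate: number;
  high: number;
  critical: number;
  total: number;
}

interface AuditReportSummary {
  metadata: { vulnerabilities: AuditSeverityCounts };
}

// Returns the number of advisories that should stop the ship from sailing.
// Assumption: only high and critical advisories block.
function countBlockingAdvisories(auditJson: string): number {
  const parsed = JSON.parse(auditJson) as AuditReportSummary;
  const counts = parsed.metadata.vulnerabilities;
  return counts.high + counts.critical;
}
```

A gate would run the audit, pass its output to this function, and fail the step on any non-zero result.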
Knows what's in the magazine, the pantry, the rope locker. Tracks expirations, reorder cycles, and accepted-risk inventory the captain has approved.
Maintains a deliberate inventory of dependencies, replacement schedules, and explicitly-accepted risks. Dependabot grouped weekly PRs + frozen audit baseline.
Two duties, two different jobs. The Master Gunner doesn't read the inventory ledger; the Purser doesn't fire on attackers. We need both, separately.
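The split between the two duties can be sketched as a set difference: the Purser's ledger holds advisories that were explicitly accepted, and anything live that is not on the ledger is the Master Gunner's to act on. The identifier format and the function name below are hypothetical, and the real baseline file layout is not described in this deck.

```typescript
// Hypothetical comparison of a frozen baseline against live findings.
// Both lists hold advisory identifiers as plain strings.
function advisoriesNotInBaseline(acceptedBaseline: string[], liveFindings: string[]): string[] {
  const accepted = new Set(acceptedBaseline);
  return liveFindings.filter((id) => !accepted.has(id)).sort();
}
```

An empty result means every live finding is an accepted risk already on the ledger; a non-empty result is a new target.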
Treats the wounded. Reactive — the Surgeon doesn't prevent injury; he keeps casualties alive and refuses to clear them back to duty until they're stable.
Refuses to merge code that drops the test-coverage floor. Doesn't grow the suite — gates regression of what's already covered. vitest line/branch/function/statement thresholds.
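The Surgeon's rule reduces to a floor comparison across the four coverage metrics. The sketch below shows that comparison in isolation; vitest can enforce thresholds through its own coverage settings, so this standalone function and its type names are illustrative assumptions rather than the fleet's setup.

```typescript
type CoverageMetric = "lines" | "branches" | "functions" | "statements";
type CoverageTotals = Record<CoverageMetric, number>;

// Returns every metric whose current percentage has dropped below its floor.
// An empty result means the change may merge as far as coverage is concerned.
function coverageRegressions(floor: CoverageTotals, current: CoverageTotals): CoverageMetric[] {
  const metrics: CoverageMetric[] = ["lines", "branches", "functions", "statements"];
  return metrics.filter((metric) => current[metric] < floor[metric]);
}
```

The check only refuses drops; it never asks for more coverage than the floor, which is why raising the floor is a separate duty.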
The Surgeon's apprentice. Routine sick-bay maintenance, supply prep, vitamin discipline. Keeps the crew healthy enough that the Surgeon's caseload stays manageable.
Actively grows the test suite. Writes tests where they don't yet exist so coverage rises over time, not just holds steady. Currently empty in 10 of 10 ships.
We have the Surgeon hired everywhere — bfd-platform's coverage frozen at 38 %, aatm-brain's at 45 %. The Mate is the largest open hire across the Navy.
Daily upkeep of the working surfaces — rigging tension, sail trim, deck condition, ropes, anchors. Keeps the ship moving efficiently through the water.
Maintains the byte-level health of the shipped artifact — stylesheet size, selector counts, !important counts, bundle bytes. Surface condition of what reaches the user.
Inspects the hull for rot, damage, weak planks, water intrusion. Repairs structural problems before they sink the ship. The bones; the Boatswain handles the skin.
Detects structural decay — dead exports, circular dependencies, code duplication, broken layer boundaries. Stops rot from compounding into rewrites. Fallow project graph.
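As an illustration of one Carpenter check, circular dependencies can be found with a depth-first search over the import graph. This is a generic sketch of that technique, not how Fallow implements it; the graph shape and function name are assumptions.

```typescript
// Module name mapped to the modules it imports.
type ImportGraph = Record<string, string[]>;

// Returns the first cycle found as a path that starts and ends on the same
// module, or null when the graph has no cycles.
function findImportCycle(graph: ImportGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const path: string[] = [];

  function visit(node: string): string[] | null {
    if (state.get(node) === "done") return null;
    if (state.get(node) === "visiting") {
      // Back edge: the cycle runs from the first visit of `node` to here.
      return [...path.slice(path.indexOf(node)), node];
    }
    state.set(node, "visiting");
    path.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}
```

Reporting the full path, rather than a yes/no answer, tells the captain which import to cut.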
Cuts, sews, and repairs the sails. Sails convert wind into motion — without them the ship can't reliably go anywhere even if everything else is in order.
Maintains the design-system rigging that makes UI output consistent and accessible to actual users. Hired only on bfd-platform; empty on the other 8 ships with UI.
565 fallow clone groups + 990 health findings, all baselined during v3.1 adoption. Frozen rot — can't get worse — but the rot is real, and burning it down is a separate workstream.
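"Frozen rot" behaves like a ratchet: counts may fall but never rise. A minimal sketch of that rule follows, with hypothetical metric names and a plain object standing in for whatever baseline format the tooling actually stores.

```typescript
type RotBaseline = Record<string, number>;

// Metrics whose current count exceeds the frozen baseline. A metric that is
// missing from the baseline is treated as frozen at zero.
function ratchetViolations(baseline: RotBaseline, current: RotBaseline): string[] {
  return Object.keys(current)
    .filter((metric) => (current[metric] ?? 0) > (baseline[metric] ?? 0))
    .sort();
}

// After a passing run, lower each baseline entry to the current count so that
// any burn-down becomes the new ceiling.
function tightenBaseline(baseline: RotBaseline, current: RotBaseline): RotBaseline {
  const next: RotBaseline = { ...baseline };
  for (const metric of Object.keys(current)) {
    next[metric] = Math.min(baseline[metric] ?? Infinity, current[metric] ?? Infinity);
  }
  return next;
}
```

With a baseline of 565 clone groups and 990 health findings, a run that reports 566 clone groups fails, while a run that reports 980 health findings passes and lowers that ceiling to 980.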
Runs the watch bill and timed drills. Discipline of the working crew. A slack First Lieutenant loses battles.
CI Test Gate parallelized — security, vitest, fallow+CSS, E2E. needs: test-gate blocks deploy.
Times the watches. Bell strikes per turn; eight bells = end of watch. Regardless of weather or mood.
90 s local / 300 s CI ceiling on npm run check. Hard-kill 30 s past. Raising the glass needs a captain's note.
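The glass can be read as a three-way decision on elapsed time. The sketch below uses the 90 s and 300 s ceilings and the 30 s margin stated above; treating the margin as a window in which the run has already failed but has not yet been killed is an assumption, as are the names.

```typescript
type Watch = "local" | "ci";
type GlassVerdict = "within-glass" | "over-ceiling" | "hard-kill";

const CEILING_SECONDS: Record<Watch, number> = { local: 90, ci: 300 };
const HARD_KILL_MARGIN_SECONDS = 30;

// Classifies a run of `npm run check` by how long it has been running.
function readGlass(watch: Watch, elapsedSeconds: number): GlassVerdict {
  const ceiling = CEILING_SECONDS[watch];
  if (elapsedSeconds <= ceiling) return "within-glass";
  if (elapsedSeconds <= ceiling + HARD_KILL_MARGIN_SECONDS) return "over-ceiling";
  return "hard-kill";
}
```

Keeping the ceilings in one table makes "raising the glass" a visible, reviewable change.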
Official drawn record of coastlines, channels, hazards. Updated only at port, only by authorized hand.
Playwright PNG baselines. Each PR diffs against them. Updates require deliberate --update-snapshots.
Estimating current position from heading × speed × time-elapsed since last fix. Truth between observations.
PostHog event capture, funnels, retention. Real on aatm-brain. v3.2: standardize across the Navy.
Daily official record. Same bound book; new page every day. Read in port to plan the next voyage.
Daily report cron + persistent-PR pattern. peter-evans/create-pull-request@v7. Diff is the metric history.
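"Diff is the metric history" only holds if the report is rendered deterministically, so that the day-over-day diff shows changed values and nothing else. A sketch of such a renderer follows; the report layout, metric names, and function name are assumptions, since the actual Master's Log generator is not shown here.

```typescript
// Renders one day's log page with metrics in sorted order, so two consecutive
// pages differ only where a value actually changed.
function renderMastersLog(date: string, metrics: Record<string, number>): string {
  const rows = Object.keys(metrics)
    .sort()
    .map((metric) => `| ${metric} | ${metrics[metric]} |`);
  return [`Master's Log ${date}`, "", "| Metric | Value |", "|---|---|", ...rows, ""].join("\n");
}
```

Writing this output to the same file every day and committing it to the persistent PR makes the PR's diff the history.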
| Duty | IDE (ms · on save) | Pre-commit (~3 s · on commit) | Pre-push (~70 s · on push) | CI Test Gate (~3 min · parallel) | Cron (daily · persistent PR) |
|---|---|---|---|---|---|
| Helmsman | ● | ● | ● | ● | |
| Sailing Master | ● | ● | ● | ||
| Lookout | ● | ● | |||
| Pilot | ● | ● | |||
| Master at Arms | ● | ● | ● | ||
| Master Gunner | ● | ● | ● | ||
| Purser | ● | ● | ● | ||
| Surgeon | ● | ● | |||
| Boatswain | if CSS | ● | ● | ||
| Carpenter | ● | ● | ● | ||
| Sailmaker | ● | ● | |||
| First Lieutenant | | | | ● | |
| Half-Hour Glass | ● | ● | |||
| The Charts | ● | ||||
| Dead Reckoning | v3.2 | ||||
| Master's Log | | | | | ● |
The cheapest watch wins. Catch at IDE → free. Catch at CI → minutes. Catch in production → a deploy and an apology. Every duty is shoved as far left as it'll go.
No || true. No informational tier. No --no-verify. If a step is worth running, it's worth failing on. Half-on means the next captain copies the half-on pattern into the next ship.
e2e:preflight caught 5 page.waitForURL calls in aatm-brain missing waitUntil: "commit". 5 × 30 s hangs prevented.
npm audit found 14 high-severity advisories on bfd-front-door, 1 on bfd-platform. Two open work orders, separate PRs.
Bring coverage from 38 % → 80 % on bfd-platform, 45 % → 80 % on aatm-brain, freeze at meaningful floors on the other 8. The Surgeon refuses regression but nobody is currently growing the suite. Largest single hire across the Navy.
Stand up token systems on aatm-brain, ncee, front-door, mcp, playbook, style-guide, cli, widget, muster. Layers 3 (theme) and 4 (contrast) currently sit empty everywhere except bfd-platform. Until we hire here, every UI surface is hand-stitched canvas.
Two consecutive Test Gate runs on bfd-platform failed on flaky E2E suites; the third run passed. Reliable drills require deterministic tests. Quarantine the flakes or fix them, but don't keep retrying.
PostHog instrumentation real today only on aatm-brain. v3.2 proposes an analytics:check gate verifying touched user-flow components include posthog.capture() calls + headline metrics in the daily Master's Log. The Reckoning is the truth between fixed observations.
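The proposed analytics:check gate could be sketched as a scan over touched user-flow components for a posthog.capture() call. Since the gate is only a v3.2 proposal, everything here is hypothetical: the function name, the input shape (path mapped to file contents), and the choice to match on source text.

```typescript
// Hypothetical analytics:check core. `touched` maps each changed user-flow
// component path to its source text; the result lists files with no capture call.
function componentsMissingCapture(touched: Record<string, string>): string[] {
  return Object.keys(touched)
    .filter((path) => !/\bposthog\.capture\s*\(/.test(touched[path] ?? ""))
    .sort();
}
```

A text match cannot tell a meaningful event from a token one, so a gate like this sets a floor for instrumentation and leaves event design to review.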
1. ~/.claude/skills/code-quality-setup/SKILL.md is symlinked into Claude Code, Codex, and Cursor. Every captain reads it before generating code in the new repo.
2. Drop the per-repo SKILL template at .cursor/skills/code-quality/SKILL.md. Fill in which duties are filled, pending, or N/A.
3. In priority order: Helmsman + Sailing Master → Half-Hour Glass + Master's Log → Pilot → Master at Arms → Surgeon → Boatswain + Carpenter → Master Gunner + Purser → Lookout → Sailmaker → First Lieutenant → Charts → Dead Reckoning.
4. Update ~/.claude/skills/code-quality-setup/per-repo/INDEX.md: name, cron hour, filled vs. pending duties. The convoy register is the Admiralty's roster.
5. Open the iter-N PR. Each layer commit gets its own message. Gaps land documented, not pretended-shipped. PR description names every filled duty + every documented gap.
Don't skip step 2. The captain reads the per-repo SKILL on subsequent visits — an undocumented ship is one the next agent will misunderstand.
Before writing code on a repo, find ~/.claude/skills/code-quality-setup/SKILL.md. Symlinked into all three agents. Find an answer before asking me.
Every file the stack needs has a copy-paste template in the standing orders. Don't reinvent the Half-Hour Glass. Don't fork the Master's Log generator. Use what's there.
If a duty is hurting more than it helps, say so. The ratchet only works if the floors match the actual ship. We tighten together. The Articles are revised in port, not at sea.
Surgeon's Mate fleet-wide. Sailmaker on 8 of 10 ships. Drillmaster (First Lieutenant) on bfd-platform. Pick a ship, pick a duty, hire one this quarter.
That's what we built on 2026-04-28. Eleven ships under one flag, one wardroom of duties, sixteen seats — three of them still empty. The Navy is real; the gaps are documented; the persistent log fires daily.
Nelson didn't win Trafalgar because his captains went off-script. He won it because his standing orders were detailed enough that they didn't have to ask.