Why this analogy is the right one

The 220-year-old horizon problem

A First Lord at the Admiralty couldn't see his fleet
for six to eighteen months.

1805

A frigate sails for the West Indies. Twelve weeks each way, plus station time. The Admiral won't see that ship again for half a year. No telegraph. No radio. Once the captain clears the Channel, he's on his own.

The Admiral has four tools to bind him:

The Articles of War — punishable regardless of context
Standing orders — read in port, carried on the voyage
A trained crew — every warrant officer drilled the same way
The ship's log — read on return, the only audit trail there is

Nelson's genius: write everything down in advance, trust the captains. He called them a Band of Brothers — and made sure they didn't need micromanagement, because it wasn't possible.

2026

A senior engineer hands a task to an agent — Cursor, Claude Code, Codex — and walks away. For minutes-to-hours-to-days, no real-time visibility. No telegraph here either.

The same four tools, modernized:

The Articles → the doctrine (the ratchet rule, no --no-verify)
Standing orders → the SKILL files, read on entry to every repo
A trained crew → the wardroom of duties wired into every gate
The ship's log → the daily code-quality report PR

Same horizon problem. Same solution. The Band of Brothers is software now, and the Admiralty is one person with eleven ships at sea.

This talk is the contemporary answer to a 220-year-old problem: how an Admiral keeps a fleet honest without being on every quarterdeck.

The Admiral-at-home problem · 1805 ↔ 2026 · 03 / 20

Setting the stage

Who we are

Black Flag is a design + AI consultancy.
We build software our clients couldn't.

Half the team designs. The other half writes code with AI agents. The product is the merger of the two: fast, well-branded, fully tested software that stands up in days instead of quarters.

01 · WHAT

Design + product engineering for ambitious clients.

Brand systems, UX, full-stack code. We ship the whole thing — site, app, integrations, deploys.

02 · HOW

AI agents are first-class teammates.

Cursor, Claude Code, Codex. They write the bulk of the code. We review, shape, gate, ship.

03 · WHY THIS TALK

Agents don't get tired of bad habits.

If the standing orders aren't right, the agents will happily generate ten thousand lines of code we wouldn't be proud of.

What BFD ships · 04 / 20

Eleven ships under one flag

10 clients · 11+ active product repos

Every line of code we wrote today is on its way
to one of these.

Healthspan Wealth

Care-coord SaaS for advisor-served wealthy aging families. 7+ active repos.

NCEE

Education research org. Interactive blueprint + scorecard system.

totumai

Personality-driven advisor comms. Survey tool + Angela prototype.

National Library of Medicine

PubMed-adjacent AI research, biomedical clinical-trial matching.

Advisorpedia

Financial-advisor digital content platform.

Voltage Control

Facilitation training + cert. AI-tools-meets-IAF practice.

Nxxting

Design + innovation consultancy. AI prototyping partnerships.

Love 4 Dogs

Pet-care startup. GPS, scheduling, photo updates.

BFD internal

Platform, MCP servers, CLI, widget, muster, playbook.

+ next quarter

Every signed client adds another ship. The Admiralty has to absorb them.

Different stacks. Different deploy targets. Different brand worlds. Same flag.

The convoy · 05 / 20

The classic answer

The shape every web team starts with

The testing pyramid is a great mental model
for one specific kind of failure.

Unit tests
Many · fast · cheap
Integration
Some · medium
Functional
Few · slow
E2E
Smallest · slowest

It assumes one thing: the only way code goes wrong is by failing a test.

Half of what broke during this morning's v3.1 rollout wasn't a test failure. It was a CVE in a dependency. The word "Looser", which retext-profanities flagged. Five page.waitForURL calls that would each hang E2E for 30 seconds. Unused exports. CSS-bytes drift. Schema drift. Bundle bloat.

A pyramid of tests catches none of these. They aren't test failures at all.

The pyramid · and what it misses · 06 / 20

The shape we actually need

Reframe

The Black Flag Navy.

01 · ADMIRALTY

The Admiral

Sets strategy. Picks which ships sail. Writes the standing orders. The human engineer.

02 · QUARTERDECK

The Captain

Commands one ship. Wears multiple hats throughout the watch. Replaceable, interchangeable, follows orders. The AI agent.

03 · WARDROOM

The Duties

Each layer of the gate is a duty the captain stands. Helmsman, Sailing Master, Lookout, Master at Arms — sixteen of them in our wardroom.

04 · BOUND BOOK

The Standing Orders

The SKILL files. Read on entry, carried on the voyage. The captain's autonomy is bounded by what's written here.

The duties enforce themselves on every captain.

Even one with strong opinions. Even one in a hurry. Even one fresh from another ship. The Admiral never has to police; the duties do.

The Navy · structure · 07 / 20

Sixteen duties + the Admiral's instruments

The wardroom roster · 16 duties + 1 log

Sixteen Royal Navy ranks and instruments.
Each maps to a specific layer of code-quality work.

NAVIGATION

Helmsman

ESLint + format

NAVIGATION

Sailing Master

TypeScript + contracts

NAVIGATION

Lookout

preflight + JSX prose

NAVIGATION

Pilot

diff-scoped analysis

DISCIPLINE

Master at Arms

policy enforcement

THREATS

Master Gunner

live npm audit

SUPPLY

Purser

Dependabot + baseline

HEALTH

Surgeon

coverage threshold

HEALTH · GAP

Surgeon's Mate

test authorship — unfilled fleet-wide

UPKEEP

Boatswain

Wallace + bundle

STRUCTURE

Carpenter

fallow project graph

RIGGING · GAP

Sailmaker

tokens + contrast — unfilled in 8 of 10

DRILLS

First Lieutenant

CI Test Gate orchestration

INSTRUMENT

The Half-Hour Glass

wall-time budget

INSTRUMENT

The Charts

visual regression baselines

INSTRUMENT · v3.2

Dead Reckoning

PostHog telemetry

Plus The Master's Log — the daily code-quality report PR. The bound book the Admiral reads when each ship returns.

Wardroom roster · 16 duties + 1 log · 08 / 20

Navigation · 4 of 16

Knowing where you are + what's around

Navigation duties.

Lexical analysis + style

Helmsman

On the ship

Holds the heading set by the Sailing Master. Watches the compass continuously and corrects small drift before it accumulates.

In the code

Catches stylistic and rule-level drift the moment it appears in source. Doesn't let it compound into structural mess.

Type checking + contracts

Sailing Master

On the ship

Knows where the ship is, where it's going, and whether the planned course is feasible. Validates that course-changes are consistent with the chart.

In the code

Validates the shape of the code is internally consistent — types flow, contracts hold, schemas align. Rejects incoherent state transitions before runtime sees them.

Future-failure detection

Lookout

On the ship

Spots threats from far away — other ships, weather, hazards on the horizon — before they're visible from the deck. Provides early warning.

In the code

Detects future runtime failures from current source. Catches the class of bug that would only show up in production or in slow CI feedback.
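One instance this deck names later is a page.waitForURL call that omits waitUntil: "commit". A minimal sketch of a scan for that pattern follows. It assumes a line-by-line regex rather than a real AST walk, and findRiskyWaits and WaitFinding are hypothetical names, not the team's e2e:preflight implementation.

```typescript
// Hypothetical preflight scan: flag page.waitForURL(...) calls whose
// arguments do not set waitUntil: "commit". A regex sketch, not an AST walk,
// so it assumes each call sits on one line with no nested parentheses.
interface WaitFinding {
  line: number; // 1-based line number in the scanned source
  text: string; // the offending line, trimmed
}

function findRiskyWaits(source: string): WaitFinding[] {
  const findings: WaitFinding[] = [];
  const callPattern = /\.waitForURL\(([^)]*)\)/;
  const commitPattern = /waitUntil\s*:\s*["']commit["']/;
  source.split("\n").forEach((text, index) => {
    const match = callPattern.exec(text);
    if (match !== null && !commitPattern.test(match[1] ?? "")) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}

// Illustrative input only; these lines are not taken from any real spec file.
const sampleSpec = [
  'await page.waitForURL("**/dashboard");',
  'await page.waitForURL("**/login", { waitUntil: "commit" });',
  'await page.waitForURL("**/report", { waitUntil: "load" });',
].join("\n");

const riskyWaits = findRiskyWaits(sampleSpec);
// riskyWaits holds lines 1 and 3; line 2 already waits on "commit".
```

A scan like this runs in milliseconds, which is what lets the Lookout stand watch before the slow E2E run rather than after it.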

Differential analysis

Pilot

On the ship

Knows the local waters intimately and cons the ship through known hazards in this specific harbor. Doesn't pretend to know the open ocean.

In the code

Makes local-iteration gates fast by analyzing only what changed in this commit/push. Leaves full-ocean analysis to CI.
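A minimal sketch of diff scoping, under stated assumptions: the changed-path list is taken to come from something like git diff --name-only --cached, and the extension list, scopeToLintable, and planLocalLint are hypothetical names chosen for illustration.

```typescript
// Hypothetical diff-scoping helper. The changed paths are passed in so the
// sketch stays free of process and filesystem calls.
const LINTABLE_PATTERN = /\.(ts|tsx|js|jsx)$/;

function scopeToLintable(changedPaths: string[]): string[] {
  return changedPaths
    .map((path) => path.trim())
    .filter((path) => path.length > 0)
    .filter((path) => LINTABLE_PATTERN.test(path));
}

// Decide what the local gate runs: nothing, or the linter on just these files.
function planLocalLint(changedPaths: string[]): { run: boolean; files: string[] } {
  const files = scopeToLintable(changedPaths);
  return { run: files.length > 0, files };
}

// Illustrative paths only.
const lintPlan = planLocalLint(["src/app.tsx", "README.md", "styles/site.css", "lib/util.ts"]);
// lintPlan.files is ["src/app.tsx", "lib/util.ts"]; the other paths belong to other duties.
```

The full-repo pass still runs in CI, so a file the local scope skips is not a file that escapes the gate.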

Navigation duties · 09 / 20

Discipline + threats · 3 of 16

Rules and adversaries

Discipline + threats.

Policy enforcement

Master at Arms

On the ship

Enforces the ship's Articles regardless of the violator's rank. Doesn't write the rules — ensures they apply equally to everyone aboard, including senior officers.

In the code

Enforces policy the team has agreed to — hard, with no informal bypass. Branch protection, enforce_admins: true, commitlint, secrets sweep. Even the most senior contributor follows the same gates.

Live CVE scanning

Master Gunner

On the ship

Detects and responds to live external threats. Knows the armament, the powder, the drill. Fires when there's a target.

In the code

Detects newly-disclosed external threats — vulnerabilities published since our last build — and refuses to ship until they're addressed. Live npm audit --json.

Supply-chain ledger

Purser

On the ship

Knows what's in the magazine, the pantry, the rope locker. Tracks expirations, reorder cycles, and accepted-risk inventory the captain has approved.

In the code

Maintains a deliberate inventory of dependencies, replacement schedules, and explicitly-accepted risks. Dependabot grouped weekly PRs + frozen audit baseline.

Cannon kills threats. Purser knows what's in the magazine.

Two duties, two different jobs. The Master Gunner doesn't read the inventory ledger; the Purser doesn't fire on attackers. We need both, separately.
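The split can be sketched as two small functions. This is an illustration under assumptions: the AuditReport shape is a trimmed guess at the vulnerabilities map that npm audit --json prints on npm 7 and later, the baseline format is hypothetical, and the package names are invented.

```typescript
// Sketch of the Master Gunner / Purser split. AuditReport is an assumed,
// trimmed view of `npm audit --json` output; the baseline is hypothetical.
type Severity = "info" | "low" | "moderate" | "high" | "critical";

interface AuditReport {
  vulnerabilities: Record<string, { severity: Severity }>;
}

const BLOCKING: Severity[] = ["high", "critical"];

// Master Gunner: every package carrying a high or critical advisory right now.
function liveThreats(report: AuditReport): string[] {
  return Object.keys(report.vulnerabilities)
    .filter((pkg) => {
      const entry = report.vulnerabilities[pkg];
      return entry !== undefined && BLOCKING.indexOf(entry.severity) !== -1;
    })
    .sort();
}

// Purser: subtract the accepted-risk ledger; anything left blocks the ship.
function unacceptedThreats(report: AuditReport, acceptedBaseline: string[]): string[] {
  return liveThreats(report).filter((pkg) => acceptedBaseline.indexOf(pkg) === -1);
}

// Invented package names, for illustration only.
const sampleReport: AuditReport = {
  vulnerabilities: {
    "example-http": { severity: "high" },
    "example-logger": { severity: "low" },
    "example-parser": { severity: "critical" },
  },
};

const blockers = unacceptedThreats(sampleReport, ["example-parser"]);
// blockers is ["example-http"]: example-parser is an accepted risk,
// example-logger is below the blocking bar.
```

Keeping the two functions separate mirrors the slide: the Gunner never consults the ledger, and the Purser never decides what counts as a threat.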

Discipline + threats · 10 / 20

Health · 2 of 16

The crew's body

Health duties.

Coverage threshold

Surgeon

On the ship

Treats the wounded. Reactive — the Surgeon doesn't prevent injury; he keeps casualties alive and refuses to clear them back to duty until they're stable.

In the code

Refuses to merge code that drops the test-coverage floor. Doesn't grow the suite — gates regression of what's already covered. vitest line/branch/function/statement thresholds.

Test authorship · GAP

Surgeon's Mate

On the ship

The Surgeon's apprentice. Routine sick-bay maintenance, supply prep, vitamin discipline. Keeps the crew healthy enough that the Surgeon's caseload stays manageable.

In the code

Actively grows the test suite. Writes tests where they don't yet exist so coverage rises over time, not just holds steady. Currently empty in 10 of 10 ships.

The Surgeon gates regression. The Surgeon's Mate prevents it by growing the suite.

We have the Surgeon hired everywhere — bfd-platform's coverage frozen at 38 %, aatm-brain's at 45 %. The Mate is the largest open hire across the Navy.

Health duties · 11 / 20

Maintenance · 3 of 16

The vessel itself

Maintenance duties.

Built-artifact hygiene

Boatswain

On the ship

Daily upkeep of the working surfaces — rigging tension, sail trim, deck condition, ropes, anchors. Keeps the ship moving efficiently through the water.

In the code

Maintains the byte-level health of the shipped artifact — stylesheet size, selector counts, !important counts, bundle bytes. Surface condition of what reaches the user.

Static project-graph

Carpenter

On the ship

Inspects the hull for rot, damage, weak planks, water intrusion. Repairs structural problems before they sink the ship. The bones; the Boatswain handles the skin.

In the code

Detects structural decay — dead exports, circular dependencies, code duplication, broken layer boundaries. Stops rot from compounding into rewrites. Fallow project graph.

Tokens + contrast · GAP

Sailmaker

On the ship

Cuts, sews, and repairs the sails. Sails convert wind into motion — without them the ship can't reliably go anywhere even if everything else is in order.

In the code

Maintains the design-system rigging that makes UI output consistent and accessible to actual users. Hired only on bfd-platform; empty on the other 8 ships with UI.

1,555 open work orders for the Carpenter on bfd-platform.

565 fallow clone groups + 990 health findings, all baselined during v3.1 adoption. Frozen rot — can't get worse — but the rot is real, and burning it down is a separate workstream.

Maintenance duties · 12 / 20

Drills + records · 5 of 17

Officer + instruments + log

Drills, instruments, and the log.

CI orchestration

First Lieutenant

Ship

Runs the watch bill and timed drills. Discipline of the working crew. A slack First Lieutenant loses battles.

Code

CI Test Gate parallelized — security, vitest, fallow+CSS, E2E. needs: test-gate blocks deploy.

Wall-time budget

Half-Hour Glass

Ship

Times the watches. Bell strikes per turn; eight bells = end of watch. Regardless of weather or mood.

Code

90 s local / 300 s CI ceiling on npm run check. Hard-kill 30 s past. Raising the glass needs a captain's note.
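A minimal sketch of how the glass could be read. The numbers are the ones on this slide; the three-way verdict is an assumption about what happens between the ceiling and the hard-kill point, and readGlass and killDeadlineMs are hypothetical names.

```typescript
// Hypothetical wall-time budget. Ceilings from the slide: 90 s local,
// 300 s in CI, hard-kill 30 s past the ceiling.
type GlassVerdict = "within-budget" | "over-budget" | "hard-kill";

const CEILING_SECONDS = { local: 90, ci: 300 };
const GRACE_SECONDS = 30;

function readGlass(elapsedSeconds: number, where: "local" | "ci"): GlassVerdict {
  const ceiling = CEILING_SECONDS[where];
  if (elapsedSeconds <= ceiling) return "within-budget";
  // Assumption: past the ceiling the gate fails but the run may still finish.
  if (elapsedSeconds <= ceiling + GRACE_SECONDS) return "over-budget";
  return "hard-kill";
}

// The deadline a runner would hand to its process timeout, in milliseconds.
function killDeadlineMs(where: "local" | "ci"): number {
  return (CEILING_SECONDS[where] + GRACE_SECONDS) * 1000;
}

// 67.4 s is the bfd-platform pipeline time reported in the rollout notes.
const rolloutRun = readGlass(67.4, "local");
```

Raising either ceiling would mean editing CEILING_SECONDS in a commit, which is where the captain's note belongs.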

Visual regression

The Charts

Ship

Official drawn record of coastlines, channels, hazards. Updated only at port, only by authorized hand.

Code

Playwright PNG baselines. Each PR diffs against them. Updates require deliberate --update-snapshots.

Behavioral telemetry · v3.2

Dead Reckoning

Ship

Estimating current position from heading × speed × time-elapsed since last fix. Truth between observations.

Code

PostHog event capture, funnels, retention. Real on aatm-brain. v3.2: standardize across the Navy.

Persistent observability

The Master's Log

Ship

Daily official record. Same bound book; new page every day. Read in port to plan the next voyage.

Code

Daily report cron + persistent-PR pattern. peter-evans/create-pull-request@v7. Diff is the metric history.

Drills, instruments, the log · 13 / 20

When each duty stands watch

The watch bill

Each duty stands a specific watch.
Earlier watches catch cheaper.

Duty	IDE (ms · on save)	Pre-commit (~3 s · on commit)	Pre-push (~70 s · on push)	CI Test Gate (~3 min · parallel)	Cron (daily · persistent PR)
Helmsman	●	●	●	●
Sailing Master		●	●	●
Lookout			●	●
Pilot		●	●
Master at Arms		●	●	●
Master Gunner			●	●	●
Purser			●	●	●
Surgeon			●	●
Boatswain		if CSS	●	●
Carpenter		●	●	●
Sailmaker			●	●
First Lieutenant				●
Half-Hour Glass			●	●
The Charts				●
Dead Reckoning					v3.2
Master's Log					●

The cheapest watch wins. Catch at IDE → free. Catch at CI → minutes. Catch in production → a deploy and an apology. Every duty is shoved as far left as it'll go.
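The watch bill can also be held as data, which makes "as far left as it'll go" something a script can check. A sketch, transcribed from the table above; the names WATCH_BILL, earliestWatch, and dutiesOnWatch are hypothetical, and Dead Reckoning is left out because its cron slot is planned for v3.2.

```typescript
// The watch bill as data. Watches are ordered cheapest first.
const WATCH_ORDER = ["ide", "pre-commit", "pre-push", "ci", "cron"] as const;
type Watch = (typeof WATCH_ORDER)[number];

const WATCH_BILL: Record<string, Watch[]> = {
  "Helmsman": ["ide", "pre-commit", "pre-push", "ci"],
  "Sailing Master": ["pre-commit", "pre-push", "ci"],
  "Lookout": ["pre-push", "ci"],
  "Pilot": ["pre-commit", "pre-push"],
  "Master at Arms": ["pre-commit", "pre-push", "ci"],
  "Master Gunner": ["pre-push", "ci", "cron"],
  "Purser": ["pre-push", "ci", "cron"],
  "Surgeon": ["pre-push", "ci"],
  "Boatswain": ["pre-commit", "pre-push", "ci"], // pre-commit only if CSS changed
  "Carpenter": ["pre-commit", "pre-push", "ci"],
  "Sailmaker": ["pre-push", "ci"],
  "First Lieutenant": ["ci"],
  "Half-Hour Glass": ["pre-push", "ci"],
  "The Charts": ["ci"],
  "Master's Log": ["cron"],
};

// The cheapest watch on which a duty stands, or undefined for an unknown duty.
function earliestWatch(duty: string): Watch | undefined {
  const stood = WATCH_BILL[duty];
  if (stood === undefined) return undefined;
  for (const watch of WATCH_ORDER) {
    if (stood.indexOf(watch) !== -1) return watch;
  }
  return undefined;
}

// Every duty standing a given watch, in roster order.
function dutiesOnWatch(watch: Watch): string[] {
  return Object.keys(WATCH_BILL).filter(
    (duty) => (WATCH_BILL[duty] ?? []).indexOf(watch) !== -1,
  );
}
```

For example, earliestWatch("Helmsman") returns "ide", and dutiesOnWatch("cron") lists the Master Gunner, the Purser, and the Master's Log.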

The watch bill · sequencing · 14 / 20

The doctrine

The Articles bind everyone — even the Admiral

The ratchet:
tighter is free, loosening is ceremony.

MEASURE
Capture the current value of every metric. Don't aspire. Whatever the number is today is the freeze point.

FREEZE
Commit that value as the threshold. Zero headroom. Any increase fails the gate. The build won't go green until the metric returns to the frozen value or below.

RATCHET DOWN
When a metric improves, edit the threshold to the new lower value. No ceremony, no discussion. The wheel only turns one direction.

RATCHET UP
Raising a threshold requires a commit message that explicitly names the new value AND the reason. "Bumped wallace 248833 → 248861 after react 19.2.5" is the format. Public reasoning forces the call to be deliberate.

NEVER
No || true. No informational tier. No --no-verify. If a step is worth running it's worth failing on. Half-on means the next captain copies the half-on pattern into the next ship.
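The measure, freeze, and ratchet steps can be sketched as one comparison. This is an illustration for lower-is-better metrics only; checkRatchet, gatePasses, and the metric keys are hypothetical names, and the measured values are invented apart from the 248833 → 248861 pair quoted above.

```typescript
// Hypothetical ratchet check for lower-is-better metrics (bytes, counts).
type RatchetOutcome =
  | { kind: "fail"; metric: string; frozen: number; measured: number }
  | { kind: "tighten"; metric: string; frozen: number; measured: number }
  | { kind: "hold"; metric: string };

function checkRatchet(
  frozen: Record<string, number>,
  measured: Record<string, number>,
): RatchetOutcome[] {
  return Object.keys(frozen).map((metric): RatchetOutcome => {
    const threshold = frozen[metric];
    const value = measured[metric];
    // A frozen metric that was not measured fails: no informational tier.
    if (value === undefined || value > threshold) {
      return { kind: "fail", metric, frozen: threshold, measured: value ?? NaN };
    }
    if (value < threshold) {
      // Improvement: the threshold should be edited down to the new value.
      return { kind: "tighten", metric, frozen: threshold, measured: value };
    }
    return { kind: "hold", metric };
  });
}

function gatePasses(outcomes: RatchetOutcome[]): boolean {
  return outcomes.every((outcome) => outcome.kind !== "fail");
}

const outcomes = checkRatchet(
  { cssBytes: 248833, cloneGroups: 565, healthFindings: 990 },
  { cssBytes: 248861, cloneGroups: 560, healthFindings: 990 },
);
// cssBytes fails (28 bytes over), cloneGroups can tighten to 560,
// healthFindings holds, and gatePasses(outcomes) is false.
```

Nothing in the sketch can raise a threshold; that path stays outside the code, in a commit message that names the new value and the reason.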

The Articles · the ratchet · 15 / 20

2026-04-28 rollout

What happened this morning · 10 PRs across the convoy

What the Navy caught.
What it surfaced as gaps.

Duty	What happened	Outcome
Lookout	e2e:preflight caught 5 page.waitForURL calls in aatm-brain missing waitUntil: "commit"	5 × 30s hangs prevented
Helmsman	alex flagged "Looser" as profanity (homophone of "loser") in bfd-platform's doctrine doc	Caught at first push · fixed in c803b8b
Half-Hour Glass	bfd-platform full pipeline measured 67.4 s under the 90 s local ceiling	Confirmed: gate runs, times, fails fast
Master Gunner	Live npm audit found 14 high-severity advisories on bfd-front-door, 1 on bfd-platform	Two open work orders, separate PRs
Master at Arms	Daily-report cron 401'd at PR creation: org-level Actions PR-create permission was off	One-time org setting flip required
First Lieutenant	bfd-platform Test Gate E2E flaked through 2 consecutive runs, passed on retry #3	Drillmaster slacking · v3.2 work
Pilot	Diff-scoped lint kept pre-commit at ~3 s across all 11 ships	Local loop fast, full lint in CI
Carpenter	565 fallow clone groups + 990 health findings baselined on bfd-platform	Frozen (can't get worse) · burndown pending
Sailmaker	Layers 3–4 (tokens + contrast) found unfilled on 8 of 10 ships	Largest standing gap in the Navy
Master's Log	Daily-report workflow live on 10 of 10 ships, staggered cron 06:00–15:00 UTC	Logs ready · waiting on org perm

10/10 · PRs merged on main

9/10 · Production deploys green

7 · Real issues caught mid-flight

What the Navy caught on 2026-04-28 · 16 / 20

What v3.2 needs to staff

Empty seats in the wardroom

The hires the Navy still needs to make.

Surgeon's Mate · 10 of 10 ships

Test authorship

Bring coverage from 38 % → 80 % on bfd-platform, 45 % → 80 % on aatm-brain, freeze at meaningful floors on the other 8. The Surgeon refuses regression but nobody is currently growing the suite. Largest single hire across the Navy.

Sailmaker · 8 of 10 ships

Tokens + contrast

Stand up token systems on aatm-brain, ncee, front-door, mcp, playbook, style-guide, cli, widget, muster. Layers 3 (theme) and 4 (contrast) currently sit empty everywhere except bfd-platform. Until we hire here, every UI surface is hand-stitched canvas.

First Lieutenant · partial · bfd-platform

Drill reliability

Two consecutive Test Gate runs on bfd-platform failed on flaky E2E suites; the third retry passed. Reliable drills require deterministic tests. Quarantine the flakes or fix them, but don't keep retrying.

Dead Reckoning · v3.2 standard candidate

Behavioral telemetry

PostHog instrumentation real today only on aatm-brain. v3.2 proposes an analytics:check gate verifying touched user-flow components include posthog.capture() calls + headline metrics in the daily Master's Log. The Reckoning is the truth between fixed observations.
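A rough sketch of what the proposed analytics:check gate might do, assuming a plain text scan for posthog.capture( stands in for a real check. The gate does not exist yet, so TouchedComponent, missingCapture, and the file paths are all hypothetical.

```typescript
// Hypothetical analytics:check: given the touched user-flow components and
// their source text, list the ones with no posthog.capture( call at all.
interface TouchedComponent {
  path: string;
  source: string;
}

function missingCapture(components: TouchedComponent[]): string[] {
  return components
    .filter((component) => !/\bposthog\.capture\(/.test(component.source))
    .map((component) => component.path);
}

// Invented components, for illustration only.
const touched: TouchedComponent[] = [
  { path: "flows/Signup.tsx", source: 'posthog.capture("signup_submitted");' },
  { path: "flows/Checkout.tsx", source: "function Checkout() { return null; }" },
];

const uninstrumented = missingCapture(touched);
// uninstrumented is ["flows/Checkout.tsx"], so the gate would fail on that file.
```

A text scan only proves a call is present, not that it fires on the right interaction, so it is a floor rather than a measure of telemetry quality.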

Open hires · v3.2 · 17 / 20

Adoption playbook

Admiralty procedure for commissioning

When client #11 lands —
how the team commissions a new ship.

STEP 1

Read the standing orders

~/.claude/skills/code-quality-setup/SKILL.md is symlinked into Claude Code, Codex, and Cursor. Every captain reads it before generating code in the new repo.

STEP 2

Commission the ship

Drop the per-repo SKILL template at .cursor/skills/code-quality/SKILL.md. Fill in which duties are filled, pending, or N/A.

STEP 3

Hire the wardroom

In priority order: Helmsman + Sailing Master → Half-Hour Glass + Master's Log → Pilot → Master at Arms → Surgeon → Boatswain + Carpenter → Master Gunner + Purser → Lookout → Sailmaker → First Lieutenant → Charts → Dead Reckoning.

STEP 4

Add to the convoy register

Update ~/.claude/skills/code-quality-setup/per-repo/INDEX.md — name, cron hour, filled vs. pending duties. The convoy register is the Admiralty's roster.

STEP 5

Sail

Open the iter-N PR. Each layer commit gets its own message. Gaps land documented, not pretended-shipped. PR description names every filled duty + every documented gap.

The team's responsibility on every new repo: walk steps 1–5.

Don't skip step 2. The captain reads the per-repo SKILL on subsequent visits — an undocumented ship is one the next agent will misunderstand.
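Step 4's register row can be sketched as a small record. This is a guess at a shape, not the format of INDEX.md: RegisterEntry, cronHourFor, and commission are hypothetical names, and one ship per hour across the 06:00 to 15:00 UTC window is an assumption drawn from the staggered cron range reported in the rollout notes.

```typescript
// Hypothetical shape for one row of the convoy register, plus a helper that
// staggers daily-report cron hours across an assumed 06:00-15:00 UTC window.
interface RegisterEntry {
  ship: string;
  cronHourUtc: number;
  filled: string[];
  pending: string[];
}

const FIRST_HOUR_UTC = 6;
const LAST_HOUR_UTC = 15;

function cronHourFor(shipIndex: number): number {
  const slots = LAST_HOUR_UTC - FIRST_HOUR_UTC + 1; // ten one-hour slots
  return FIRST_HOUR_UTC + (shipIndex % slots);
}

function commission(
  ship: string,
  shipIndex: number,
  filled: string[],
  pending: string[],
): RegisterEntry {
  return { ship, cronHourUtc: cronHourFor(shipIndex), filled, pending };
}

// Cron expression for the daily report: minute 0 of the assigned hour, every day.
function cronExpression(entry: RegisterEntry): string {
  return `0 ${entry.cronHourUtc} * * *`;
}

// Index 10 is the eleventh ship; with ten slots it wraps back to 06:00 UTC.
const newShip = commission(
  "client-11",
  10,
  ["Helmsman", "Sailing Master"],
  ["Surgeon's Mate", "Sailmaker"],
);
```

Listing pending duties next to filled ones keeps the gaps documented in the same place the cron hour lives.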

Commissioning a new ship · 18 / 20

What I'm asking from you

From the Admiralty

Four things from the team.

1. Read the standing orders.

Before writing code on a repo, find ~/.claude/skills/code-quality-setup/SKILL.md. Symlinked into all three agents. Find an answer before asking me.

2. Use the templates.

Every file the stack needs has a copy-paste template in the standing orders. Don't reinvent the Half-Hour Glass. Don't fork the Master's Log generator. Use what's there.

3. Push back on the doctrine.

If a duty is hurting more than it helps, say so. The ratchet only works if the floors match the actual ship. We tighten together. The Articles are revised in port, not at sea.

4. Hire the open positions.

Surgeon's Mate fleet-wide. Sailmaker on 8 of 10 ships. Drillmaster (First Lieutenant) on bfd-platform. Pick a ship, pick a duty, hire one this quarter.

The asks · 19 / 20

Back to the Admiralty

If we write the standing orders well, the duties enforce themselves on every captain — and no captain can do very wrong if the gates pass.

That's what we built on 2026-04-28. Eleven ships under one flag, one wardroom of duties, sixteen seats — three of them still empty. The Navy is real; the gaps are documented; the persistent log fires daily.

Nelson didn't win Trafalgar because his captains went off-script. He won it because his standing orders were detailed enough that they didn't have to ask.

Thanks.
Questions?

The Black Flag Navy · close · 20 / 20
