Black Flag Design
Tech talk · 30 min
Code quality at scale

The Black Flag Navy.

Standing orders for a fleet of agent captains.

Keith Pattison · Black Flag Design v3.1 · 2026-04-28
The Black Flag Navy · BFD code quality v3.1 · 01 / 20
The frame
221 years before this rollout
In case signals can neither be seen nor perfectly understood, no Captain can do very wrong if he places his ship alongside that of an Enemy.
— Vice-Admiral Lord Horatio Nelson
Trafalgar Memorandum, 9 October 1805
Written twelve days before Trafalgar, as the standing-orders document for his captains.
Vice-Admiral Lord Horatio Nelson, by Lemuel Francis Abbott (c. 1798)
Trafalgar Memorandum · 1805 · 02 / 20
Why this analogy is the right one
The 220-year-old horizon problem

A First Lord at the Admiralty couldn't see his fleet for six to eighteen months.

1805

A frigate sails for the West Indies. Twelve weeks each way, plus station time. The Admiral won't see that ship again for half a year. No telegraph. No radio. Once the captain clears the Channel, he's on his own.

The Admiral has four tools to bind him:

  • The Articles of War — punishable regardless of context
  • Standing orders — read in port, carried on the voyage
  • A trained crew — every warrant officer drilled the same way
  • The ship's log — read on return, the only audit trail there is

Nelson's genius: write everything down in advance, trust the captains. He called them a Band of Brothers — and made sure they didn't need micromanagement, because it wasn't possible.

2026

A senior engineer hands a task to an agent (Cursor, Claude Code, Codex) and walks away. For minutes, hours, or even days there is no real-time visibility. No telegraph here either.

The same four tools, modernized:

  • The Articles → the doctrine (the ratchet rule, no --no-verify)
  • Standing orders → the SKILL files, read on entry to every repo
  • A trained crew → the wardroom of duties wired into every gate
  • The ship's log → the daily code-quality report PR

Same horizon problem. Same solution. The Band of Brothers is software now, and the Admiralty is one person with eleven ships at sea.

This talk is the contemporary answer to a 220-year-old problem: how an Admiral keeps a fleet honest without being on every quarterdeck.

The Admiral-at-home problem · 1805 ↔ 2026 · 03 / 20
Setting the stage
Who we are

Black Flag is a design + AI consultancy.
We build software our clients couldn't.

Half the team designs. The other half writes code with AI agents. The product is the merger of the two: fast, well-branded, fully-tested software standing up in days instead of quarters.

01 · WHAT

Design + product engineering for ambitious clients.

Brand systems, UX, full-stack code. We ship the whole thing — site, app, integrations, deploys.

02 · HOW

AI agents are first-class teammates.

Cursor, Claude Code, Codex. They write the bulk of the code. We review, shape, gate, ship.

03 · WHY THIS TALK

Agents don't get tired of bad habits.

If the standing orders aren't right, the agents will happily generate ten thousand lines of code we wouldn't be proud of.

What BFD ships · 04 / 20
Eleven ships under one flag
10 clients · 11+ active product repos

Every line of code we wrote today is on its way to one of these.

Healthspan Wealth
Care-coord SaaS for advisor-served wealthy aging families. 7+ active repos.
NCEE
Education research org. Interactive blueprint + scorecard system.
totumai
Personality-driven advisor comms. Survey tool + Angela prototype.
National Library of Medicine
PubMed-adjacent AI research, biomedical clinical-trial matching.
Advisorpedia
Financial-advisor digital content platform.
Voltage Control
Facilitation training + cert. AI-tools-meets-IAF practice.
Nxxting
Design + innovation consultancy. AI prototyping partnerships.
Love 4 Dogs
Pet-care startup. GPS, scheduling, photo updates.
BFD internal
Platform, MCP servers, CLI, widget, muster, playbook.
+ next quarter
Every signed client adds another ship. The Admiralty has to absorb them.

Different stacks. Different deploy targets. Different brand worlds. Same flag.

The convoy · 05 / 20
The classic answer
The shape every web team starts with

The testing pyramid is a great mental model for one specific kind of failure.

Unit tests
Many · fast · cheap
Integration
Some · medium
Functional
Few · slow
E2E
Smallest · slowest

It assumes one thing: the only way code goes wrong is by failing a test.

Half of what broke during this morning's v3.1 rollout wasn't a test failure. It was a CVE in a dependency. The word "Looser" tripping retext-profanities. Five page.waitForURL calls that would each hang E2E for 30 seconds. Unused exports. CSS-bytes drift. Schema drift. Bundle bloat.

A pyramid of tests catches none of these. They aren't tests at all.

The pyramid · and what it misses · 06 / 20
The shape we actually need
Reframe

The Black Flag Navy.

01 · ADMIRALTY

The Admiral

Sets strategy. Picks which ships sail. Writes the standing orders. The human engineer.

02 · QUARTERDECK

The Captain

Commands one ship. Wears multiple hats throughout the watch. Replaceable, interchangeable, follows orders. The AI agent.

03 · WARDROOM

The Duties

Each layer of the gate is a duty the captain stands. Helmsman, Sailing Master, Lookout, Master at Arms — sixteen of them in our wardroom.

04 · BOUND BOOK

The Standing Orders

The SKILL files. Read on entry, carried on the voyage. The captain's autonomy is bounded by what's written here.

The duties enforce themselves on every captain.

Even one with strong opinions. Even one in a hurry. Even one fresh from another ship. The Admiral never has to police; the duties do.

The Navy · structure · 07 / 20
Sixteen duties + the Admiral's instruments
The wardroom roster · 16 duties + 1 log

Sixteen Royal Navy ranks and instruments.
Each maps to a specific layer of code-quality work.

NAVIGATION
Helmsman
ESLint + format
NAVIGATION
Sailing Master
TypeScript + contracts
NAVIGATION
Lookout
preflight + JSX prose
NAVIGATION
Pilot
diff-scoped analysis
DISCIPLINE
Master at Arms
policy enforcement
THREATS
Master Gunner
live npm audit
SUPPLY
Purser
Dependabot + baseline
HEALTH
Surgeon
coverage threshold
HEALTH · GAP
Surgeon's Mate
test authorship — unfilled fleet-wide
UPKEEP
Boatswain
Wallace + bundle
STRUCTURE
Carpenter
fallow project graph
RIGGING · GAP
Sailmaker
tokens + contrast — unfilled in 8 of 10
DRILLS
First Lieutenant
CI Test Gate orchestration
INSTRUMENT
The Half-Hour Glass
wall-time budget
INSTRUMENT
The Charts
visual regression baselines
INSTRUMENT · v3.2
Dead Reckoning
PostHog telemetry

Plus The Master's Log — the daily code-quality report PR. The bound book the Admiral reads when each ship returns.

Wardroom roster · 16 duties + 1 log · 08 / 20
Navigation · 4 of 16
Knowing where you are + what's around

Navigation duties.

Lexical analysis + style
Helmsman
On the ship

Holds the heading set by the Sailing Master. Watches the compass continuously and corrects small drift before it accumulates.

In the code

Catches stylistic and rule-level drift the moment it appears in source. Doesn't let it compound into structural mess.

Type checking + contracts
Sailing Master
On the ship

Knows where the ship is, where it's going, and whether the planned course is feasible. Validates that course-changes are consistent with the chart.

In the code

Validates the shape of the code is internally consistent — types flow, contracts hold, schemas align. Rejects incoherent state transitions before runtime sees them.

Future-failure detection
Lookout
On the ship

Spots threats from far away — other ships, weather, hazards on the horizon — before they're visible from the deck. Provides early warning.

In the code

Detects future runtime failures from current source. Catches the class of bug that would only show up in production or in slow CI feedback.
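
One way a Lookout-style preflight could work is a plain text scan of E2E sources for page.waitForURL calls that pass no waitUntil option. The sketch below is an assumption about the shape of such a check, not the actual e2e:preflight script: findUnboundedWaits and Finding are hypothetical names, and the scan is a line-based heuristic that assumes each call fits on one line.

```typescript
// Hypothetical preflight helper (not the real e2e:preflight script).
// Flags lines that call page.waitForURL(...) without mentioning waitUntil.
// Heuristic only: it reads text line by line and does not parse TypeScript.
export interface Finding {
  line: number; // 1-based line number in the scanned source
  text: string; // the offending line, trimmed
}

export function findUnboundedWaits(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    const callsWaitForURL = text.indexOf("page.waitForURL(") !== -1;
    const hasWaitUntil = text.indexOf("waitUntil") !== -1;
    if (callsWaitForURL && !hasWaitUntil) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}
```

A gate built on this would fail the push whenever the returned array is non-empty. A multi-line call would need an AST-based check instead.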

Differential analysis
Pilot
On the ship

Knows the local waters intimately and cons the ship through known hazards in this specific harbor. Doesn't pretend to know the open ocean.

In the code

Makes local-iteration gates fast by analyzing only what changed in this commit/push. Leaves full-ocean analysis to CI.
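
A minimal sketch of diff scoping, assuming the changed paths come from something like git diff --name-only (one path per line). lintTargets and the extension list are hypothetical choices for illustration; the real Pilot may select files differently.

```typescript
// Hypothetical Pilot helper: narrows a list of changed paths to lintable files.
// Assumed input: the text output of `git diff --name-only`, one path per line.
const LINTABLE_EXTENSIONS = [".ts", ".tsx", ".js", ".jsx"];

export function lintTargets(diffOutput: string): string[] {
  return diffOutput
    .split("\n")
    .map((path) => path.trim())
    .filter((path) => path.length > 0)
    .filter((path) =>
      LINTABLE_EXTENSIONS.some((ext) => path.slice(-ext.length) === ext),
    );
}
```

The pre-commit hook would hand only these paths to the linter, and CI would still lint the whole tree.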

Navigation duties · 09 / 20
Discipline + threats · 3 of 16
Rules and adversaries

Discipline + threats.

Policy enforcement
Master at Arms
On the ship

Enforces the ship's Articles regardless of the violator's rank. Doesn't write the rules — ensures they apply equally to everyone aboard, including senior officers.

In the code

Enforces policy the team has agreed to — hard, with no informal bypass. Branch protection, enforce_admins: true, commitlint, secrets sweep. Even the most senior contributor follows the same gates.

Live CVE scanning
Master Gunner
On the ship

Detects and responds to live external threats. Knows the armament, the powder, the drill. Fires when there's a target.

In the code

Detects newly-disclosed external threats — vulnerabilities published since our last build — and refuses to ship until they're addressed. Live npm audit --json.
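
A sketch of how the Master Gunner's refusal could be computed from npm audit --json output. It assumes the report shape used by npm 7 and later, where metadata.vulnerabilities holds counts keyed by severity. blockingCount and the default list of blocking severities are hypothetical.

```typescript
// Hypothetical gate helper: counts advisories at blocking severities.
// Assumes the npm 7+ audit report shape (metadata.vulnerabilities counts).
export type Severity = "info" | "low" | "moderate" | "high" | "critical";

interface AuditReport {
  metadata?: { vulnerabilities?: Partial<Record<Severity, number>> };
}

export function blockingCount(
  auditJson: string,
  blocking: Severity[] = ["high", "critical"],
): number {
  const report = JSON.parse(auditJson) as AuditReport;
  const counts = report.metadata?.vulnerabilities ?? {};
  // Missing severities are treated as zero rather than as an error.
  return blocking.reduce((sum, level) => sum + (counts[level] ?? 0), 0);
}
```

A non-zero result would fail the gate. Which severities count as blocking is a policy decision for the standing orders, not something this sketch settles.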

Supply-chain ledger
Purser
On the ship

Knows what's in the magazine, the pantry, the rope locker. Tracks expirations, reorder cycles, and accepted-risk inventory the captain has approved.

In the code

Maintains a deliberate inventory of dependencies, replacement schedules, and explicitly-accepted risks. Dependabot grouped weekly PRs + frozen audit baseline.
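
The frozen audit baseline can be read as a set comparison: advisories present now but not in the accepted list are new work, and accepted entries that no longer appear can be dropped from the ledger. A minimal sketch, with hypothetical names (diffAgainstBaseline, BaselineDiff) and placeholder advisory IDs in the usage note:

```typescript
// Hypothetical Purser helper: compares current advisory IDs to an accepted baseline.
export interface BaselineDiff {
  unaccepted: string[]; // present now, never accepted: should fail the gate
  stale: string[]; // accepted earlier, no longer present: safe to remove
}

export function diffAgainstBaseline(
  current: string[],
  baseline: string[],
): BaselineDiff {
  const accepted = new Set(baseline);
  const live = new Set(current);
  return {
    unaccepted: current.filter((id) => !accepted.has(id)),
    stale: baseline.filter((id) => !live.has(id)),
  };
}
```

For example, with current IDs ADVISORY-A and ADVISORY-C against a baseline of ADVISORY-A and ADVISORY-B, ADVISORY-C is unaccepted and ADVISORY-B is stale.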

The Master Gunner fires on threats. The Purser knows what's in the magazine.

Two duties, two different jobs. The Master Gunner doesn't read the inventory ledger; the Purser doesn't fire on attackers. We need both, separately.

Discipline + threats · 10 / 20
Health · 2 of 16
The crew's body

Health duties.

Coverage threshold
Surgeon
On the ship

Treats the wounded. Reactive — the Surgeon doesn't prevent injury; he keeps casualties alive and refuses to clear them back to duty until they're stable.

In the code

Refuses to merge code that drops the test-coverage floor. Doesn't grow the suite — gates regression of what's already covered. vitest line/branch/function/statement thresholds.
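
Vitest can enforce coverage floors through its own threshold options. The sketch below only illustrates the comparison behind the Surgeon's refusal. coverageRegressions and the Coverage type are hypothetical, and the numbers in the usage note are made up apart from the 38 % floor quoted in this deck.

```typescript
// Hypothetical Surgeon helper: lists the metrics that fell below their frozen floor.
export type Metric = "lines" | "branches" | "functions" | "statements";
export type Coverage = Record<Metric, number>;

const METRICS: Metric[] = ["lines", "branches", "functions", "statements"];

export function coverageRegressions(current: Coverage, floor: Coverage): Metric[] {
  // Equal to the floor passes; only a drop below it is a regression.
  return METRICS.filter((metric) => current[metric] < floor[metric]);
}
```

With every floor frozen at 38, a run measuring 37.9 for lines and 38 or more elsewhere would report lines as the only regression.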

Test authorship · GAP
Surgeon's Mate
On the ship

The Surgeon's apprentice. Routine sick-bay maintenance, supply prep, vitamin discipline. Keeps the crew healthy enough that the Surgeon's caseload stays manageable.

In the code

Actively grows the test suite. Writes tests where they don't yet exist so coverage rises over time, not just holds steady. Currently empty in 10 of 10 ships.

The Surgeon gates regression. The Surgeon's Mate prevents it by growing the suite.

We have the Surgeon hired everywhere — bfd-platform's coverage frozen at 38 %, aatm-brain's at 45 %. The Mate is the largest open hire across the Navy.

Health duties · 11 / 20
Maintenance · 3 of 16
The vessel itself

Maintenance duties.

Built-artifact hygiene
Boatswain
On the ship

Daily upkeep of the working surfaces — rigging tension, sail trim, deck condition, ropes, anchors. Keeps the ship moving efficiently through the water.

In the code

Maintains the byte-level health of the shipped artifact — stylesheet size, selector counts, !important counts, bundle bytes. Surface condition of what reaches the user.

Static project-graph
Carpenter
On the ship

Inspects the hull for rot, damage, weak planks, water intrusion. Repairs structural problems before they sink the ship. The bones; the Boatswain handles the skin.

In the code

Detects structural decay — dead exports, circular dependencies, code duplication, broken layer boundaries. Stops rot from compounding into rewrites. Fallow project graph.
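
Circular dependencies are one kind of structural decay that is easy to show on a small graph. The sketch below is a generic depth-first search over a module graph, written for illustration; it is not how the fallow tool works internally. findCycle and the Graph type are hypothetical.

```typescript
// Generic cycle detection on a module graph (depth-first search).
// Graph maps a module name to the modules it imports.
export type Graph = Record<string, string[]>;

export function findCycle(graph: Graph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  const visit = (node: string): string[] | null => {
    state.set(node, "visiting");
    stack.push(node);
    for (const dep of graph[node] ?? []) {
      const seen = state.get(dep);
      if (seen === "visiting") {
        // dep is already on the current path, so the path from dep back to dep is a cycle.
        return stack.slice(stack.indexOf(dep)).concat(dep);
      }
      if (seen === undefined) {
        const cycle = visit(dep);
        if (cycle) return cycle;
      }
    }
    stack.pop();
    state.set(node, "done");
    return null;
  };

  for (const node of Object.keys(graph)) {
    if (!state.has(node)) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}
```

For a graph where a imports b, b imports c, and c imports a, the function returns the path a, b, c, a. An acyclic graph returns null.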

Tokens + contrast · GAP
Sailmaker
On the ship

Cuts, sews, and repairs the sails. Sails convert wind into motion — without them the ship can't reliably go anywhere even if everything else is in order.

In the code

Maintains the design-system rigging that makes UI output consistent and accessible to actual users. Hired only on bfd-platform; empty on the other 8 ships with UI.

1,555 open work orders for the Carpenter on bfd-platform.

565 fallow clone groups + 990 health findings, all baselined during v3.1 adoption. Frozen rot — can't get worse — but the rot is real, and burning it down is a separate workstream.

Maintenance duties · 12 / 20
Drills + records · 4 of 16 + the log
Officer + instruments + log

Drills, instruments, and the log.

CI orchestration
First Lieutenant
Ship

Runs the watch bill and timed drills. Discipline of the working crew. A slack First Lieutenant loses battles.

Code

CI Test Gate parallelized — security, vitest, fallow+CSS, E2E. needs: test-gate blocks deploy.

Wall-time budget
Half-Hour Glass
Ship

Times the watches. Bell strikes per turn; eight bells = end of watch. Regardless of weather or mood.

Code

90 s local / 300 s CI ceiling on npm run check. Hard-kill 30 s past. Raising the glass needs a captain's note.
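
The glass can be sketched as a three-way verdict on elapsed wall time. The ceilings (90 s local, 300 s CI) and the 30 s grace window come from the line above. The reading that a run is killed once it passes ceiling plus 30 s is an assumption, and glassVerdict is a hypothetical name.

```typescript
// Hypothetical Half-Hour Glass helper: classifies a run by wall time.
export type Env = "local" | "ci";
export type GlassVerdict = "within-budget" | "over-budget" | "hard-kill";

const CEILING_SECONDS: Record<Env, number> = { local: 90, ci: 300 };
const GRACE_SECONDS = 30; // assumed meaning of "hard-kill 30 s past"

export function glassVerdict(elapsedSeconds: number, env: Env): GlassVerdict {
  const ceiling = CEILING_SECONDS[env];
  if (elapsedSeconds <= ceiling) return "within-budget";
  if (elapsedSeconds < ceiling + GRACE_SECONDS) return "over-budget";
  return "hard-kill";
}
```

Under these assumptions the 67.4 s bfd-platform run reported in this deck is within budget locally, and a 120 s local run would be hard-killed.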

Visual regression
The Charts
Ship

Official drawn record of coastlines, channels, hazards. Updated only at port, only by authorized hand.

Code

Playwright PNG baselines. Each PR diffs against them. Updates require deliberate --update-snapshots.

Behavioral telemetry · v3.2
Dead Reckoning
Ship

Estimating current position from heading × speed × time-elapsed since last fix. Truth between observations.

Code

PostHog event capture, funnels, retention. Real on aatm-brain. v3.2: standardize across the Navy.

Persistent observability
The Master's Log
Ship

Daily official record. Same bound book; new page every day. Read in port to plan the next voyage.

Code

Daily report cron + persistent-PR pattern. peter-evans/create-pull-request@v7. Diff is the metric history.
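
For the diff to serve as metric history, the report has to render the same way every day so that only changed numbers show up. A minimal sketch of such a renderer, assuming a flat map of metric names to numbers; renderLog and the output layout are hypothetical, not the actual Master's Log generator.

```typescript
// Hypothetical log renderer: stable ordering keeps day-to-day diffs small.
export function renderLog(date: string, metrics: Record<string, number>): string {
  const rows = Object.keys(metrics)
    .sort() // alphabetical order, independent of insertion order
    .map((name) => name + ": " + String(metrics[name]));
  return ["Master's Log " + date].concat(rows).join("\n");
}
```

Because the keys are sorted, two runs with the same metrics in a different insertion order produce identical text, and the persistent PR diff shows only real movement.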

Drills, instruments, the log · 13 / 20
When each duty stands watch
The watch bill

Each duty stands a specific watch.
Earlier watches catch cheaper.

Watches (columns):
IDE · ms · on save
Pre-commit · ~3 s · on commit
Pre-push · ~70 s · on push
CI Test Gate · ~3 min · parallel
Cron · daily · persistent PR

Duties (rows):
Helmsman
Sailing Master
Lookout
Pilot
Master at Arms
Master Gunner
Purser
Surgeon
Boatswain (if CSS)
Carpenter
Sailmaker
First Lieutenant
Half-Hour Glass
The Charts
Dead Reckoning (v3.2)
Master's Log

The cheapest watch wins. Catch at IDE → free. Catch at CI → minutes. Catch in production → a deploy and an apology. Every duty is shoved as far left as it'll go.

The watch bill · sequencing · 14 / 20
The doctrine
The Articles bind everyone — even the Admiral

The ratchet:
tighter is free, loosening is ceremony.

MEASURE · Capture the current value of every metric. Don't aspire. Whatever the number is today is the freeze point.
FREEZE · Commit that value as the threshold. Zero headroom. Any increase fails the gate. The build won't go green until the metric returns to the frozen value or below.
RATCHET DOWN · When a metric improves, edit the threshold to the new lower value. No ceremony, no discussion. The wheel only turns one direction.
RATCHET UP · Raising a threshold requires a commit message that explicitly names the new value AND the reason. "Bumped wallace 248833 → 248861 after react 19.2.5" is the format. Public reasoning forces the call to be deliberate.
NEVER · No || true. No informational tier. No --no-verify. If a step is worth running it's worth failing on. Half-on means the next captain copies the half-on pattern into the next ship.
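
A minimal sketch of the ratchet as a comparison, assuming each metric declares which direction counts as improvement (byte counts fall, coverage rises). The names ratchet, Direction, and isValidBump are hypothetical and not taken from the standing orders. The commit-message pattern is inferred from the single example quoted above, so treat it as a guess at the format.

```typescript
// Hypothetical ratchet helpers. Nothing here is the BFD implementation.
export type Direction = "lower-is-better" | "higher-is-better";
export type Outcome = "fail" | "hold" | "tighten";

// Compares a measured value to the frozen threshold.
// "tighten" means the threshold can be edited to the measured value for free.
export function ratchet(frozen: number, measured: number, direction: Direction): Outcome {
  if (measured === frozen) return "hold";
  const improved =
    direction === "lower-is-better" ? measured < frozen : measured > frozen;
  return improved ? "tighten" : "fail";
}

// Assumed loosening format: "Bumped <metric> <old> → <new> <reason>".
const BUMP_PATTERN = /^Bumped (\S+) (\d+) → (\d+) (\S.*)$/;

export function isValidBump(message: string): boolean {
  const match = BUMP_PATTERN.exec(message);
  if (match === null) return false;
  // The old and new values must differ, and the pattern already requires a reason.
  return match[2] !== match[3];
}
```

With the wallace example above, a measurement of 248861 against a frozen 248833 fails the gate until either the bytes come back down or a commit in the Bumped format raises the threshold.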
The Articles · the ratchet · 15 / 20
2026-04-28 rollout
What happened this morning · 10 PRs across the convoy

What the Navy caught.
What it surfaced as gaps.

Duty | What happened | Outcome
Lookout | e2e:preflight caught 5 page.waitForURL calls in aatm-brain missing waitUntil: "commit" | 5 × 30 s hangs prevented
Helmsman | alex flagged "Looser" as profanity (homophone of "loser") in bfd-platform's doctrine doc | Caught at first push · fixed in c803b8b
Half-Hour Glass | bfd-platform full pipeline measured 67.4 s under the 90 s local ceiling | Confirmed: gate runs, times, fails fast
Master Gunner | Live npm audit found 14 high-severity advisories on bfd-front-door, 1 on bfd-platform | Two open work orders, separate PRs
Master at Arms | Daily-report cron 401'd at PR creation: org-level Actions PR-create permission was off | One-time org setting flip required
First Lieutenant | bfd-platform Test Gate E2E flaked through 2 consecutive runs, passed on retry #3 | Drillmaster slacking · v3.2 work
Pilot | Diff-scoped lint kept pre-commit at ~3 s across all 11 ships | Local loop fast, full lint in CI
Carpenter | 565 fallow clone groups + 990 health findings baselined on bfd-platform | Frozen, can't get worse · burndown pending
Sailmaker | Layers 3–4 (tokens + contrast) found unfilled on 8 of 10 ships | Largest standing gap in the Navy
Master's Log | Daily-report workflow live on 10 of 10 ships, staggered cron 06:00–15:00 UTC | Logs ready · waiting on org perm

10/10 · PRs merged on main
9/10 · Production deploys green
7 · Real issues caught mid-flight
What the Navy caught on 2026-04-28 · 16 / 20
What v3.2 needs to staff
Empty seats in the wardroom

The hires the Navy still needs to make.

Surgeon's Mate · 10 of 10 ships
Test authorship

Bring coverage from 38 % → 80 % on bfd-platform, 45 % → 80 % on aatm-brain, freeze at meaningful floors on the other 8. The Surgeon refuses regression but nobody is currently growing the suite. Largest single hire across the Navy.

Sailmaker · 8 of 10 ships
Tokens + contrast

Stand up token systems on aatm-brain, ncee, front-door, mcp, playbook, style-guide, cli, widget, muster. Layers 3 (theme) and 4 (contrast) currently sit empty everywhere except bfd-platform. Until we hire here, every UI surface is hand-stitched canvas.

First Lieutenant · partial · bfd-platform
Drill reliability

Two consecutive Test Gate runs on bfd-platform failed flaky E2E suites; the third retry passed. Reliable drills require deterministic tests. Quarantine the flakes or fix them — but don't keep retrying.
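
Quarantining flakes starts with telling them apart from honest failures. A minimal sketch, assuming the CI run records one pass or fail result per attempt for each test; classifyTest and its verdict names are hypothetical.

```typescript
// Hypothetical drill classifier: a test that both failed and passed
// across attempts of the same commit is treated as flaky.
export type Attempt = "pass" | "fail";
export type TestVerdict = "passing" | "failing" | "flaky" | "not-run";

export function classifyTest(attempts: Attempt[]): TestVerdict {
  if (attempts.length === 0) return "not-run";
  const passed = attempts.some((attempt) => attempt === "pass");
  const failed = attempts.some((attempt) => attempt === "fail");
  if (passed && failed) return "flaky";
  return passed ? "passing" : "failing";
}
```

Two failures followed by a pass, the pattern described above, classifies as flaky rather than passing. That makes the test a candidate for quarantine or repair instead of another retry.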

Dead Reckoning · v3.2 standard candidate
Behavioral telemetry

PostHog instrumentation real today only on aatm-brain. v3.2 proposes an analytics:check gate verifying touched user-flow components include posthog.capture() calls + headline metrics in the daily Master's Log. The Reckoning is the truth between fixed observations.
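
The proposed analytics:check gate does not exist yet, so the following is only a guess at its core: given the sources of the touched user-flow components, list those with no posthog.capture( call. missingCapture is a hypothetical name, the check is a plain substring test rather than an AST analysis, and deciding which files count as user-flow components is left to the caller.

```typescript
// Hypothetical core of an analytics:check gate.
// Input: map of touched component path -> file contents.
// Output: sorted paths whose source never calls posthog.capture(.
export function missingCapture(touched: Record<string, string>): string[] {
  return Object.keys(touched)
    .filter((path) => (touched[path] ?? "").indexOf("posthog.capture(") === -1)
    .sort();
}
```

A non-empty result would fail the gate and name the components that need instrumentation.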

Open hires · v3.2 · 17 / 20
Adoption playbook
Admiralty procedure for commissioning

When client #11 lands —
how the team commissions a new ship.

STEP 1

Read the standing orders

~/.claude/skills/code-quality-setup/SKILL.md is symlinked into Claude Code, Codex, and Cursor. Every captain reads it before generating code in the new repo.

STEP 2

Commission the ship

Drop the per-repo SKILL template at .cursor/skills/code-quality/SKILL.md. Fill in which duties are filled, pending, or N/A.

STEP 3

Hire the wardroom

In priority order: Helmsman + Sailing Master → Half-Hour Glass + Master's Log → Pilot → Master at Arms → Surgeon → Boatswain + Carpenter → Master Gunner + Purser → Lookout → Sailmaker → First Lieutenant → Charts → Dead Reckoning.

STEP 4

Add to the convoy register

Update ~/.claude/skills/code-quality-setup/per-repo/INDEX.md — name, cron hour, filled vs. pending duties. The convoy register is the Admiralty's roster.

STEP 5

Sail

Open the iter-N PR. Each layer commit gets its own message. Gaps land documented, not pretended-shipped. PR description names every filled duty + every documented gap.

The team's responsibility on every new repo: walk steps 1–5.

Don't skip step 2. The captain reads the per-repo SKILL on subsequent visits — an undocumented ship is one the next agent will misunderstand.

Commissioning a new ship · 18 / 20
What I'm asking from you
From the Admiralty

Four things from the team.

1. Read the standing orders.

Before writing code on a repo, find ~/.claude/skills/code-quality-setup/SKILL.md. Symlinked into all three agents. Find an answer before asking me.

2. Use the templates.

Every file the stack needs has a copy-paste template in the standing orders. Don't reinvent the Half-Hour Glass. Don't fork the Master's Log generator. Use what's there.

3. Push back on the doctrine.

If a duty is hurting more than it helps, say so. The ratchet only works if the floors match the actual ship. We tighten together. The Articles are revised in port, not at sea.

4. Hire the open positions.

Surgeon's Mate fleet-wide. Sailmaker on 8 of 10 ships. Drillmaster (First Lieutenant) on bfd-platform. Pick a ship, pick a duty, hire one this quarter.

The asks · 19 / 20
Close
Back to the Admiralty

If we write the standing orders well, the duties enforce themselves on every captain — and no captain can do very wrong if the gates pass.

That's what we built on 2026-04-28. Eleven ships under one flag, one wardroom of duties, sixteen seats — three of them still empty. The Navy is real; the gaps are documented; the persistent log fires daily.

Nelson didn't win Trafalgar because his captains went off-script. He won it because his standing orders were detailed enough that they didn't have to ask.

Thanks.
Questions?

The Black Flag Navy · close · 20 / 20