AI Engineering Governance · 2026

The End of Naked AI Coding: Accelerating Multi-Agent Development with a Deterministic Safety Brake

When N Agents Concurrently Mutate a Single Repo: How to Leverage the Atomic Foundry and 5D CID to Forge a Rock-Solid System Governance Foundation.

by Eagl Huang · June 4, 2026 · 22 min read

Picture your repo as a foundry. Every piece of code an AI writes gets a name (its 5D CID), a SHA256 evidence packet you can replay months later, and a seven-stage trip from ORIENT to CLOSURE. What rolls off the line is a cell with a name, a contract, an audit trail, and a slot ready to be reused elsewhere. With this brake in place, you can run five agents in parallel and the repo still holds together.

7-stage lifecycle: ORIENT → NEXT → LOCK → EXECUTE → EVIDENCE → HANDOFF → CLOSURE. Each stage is doctor-detectable; you can't skip stages.

5D CID fingerprint: Strict / Interface / Effects / Semantic / Behavior. Only Interface ships today; the rest are roadmap, and non-deterministic signals never enter the identity hash.

closure-packet.v1 evidence chain: Every close writes a replay-grade packet with commandRuns SHA256, targetCommit, and governedTreeSha. The schema is live.

TypeScript · Node ≥20 · 5 active atoms · 7 INV-ATM red lines · 11 behavior families · Charter v2.0.0

1. The walls you hit when many AIs build one project at once

Around day thirty you realize that babysitting three agents on your repo is more tiring than just writing it yourself.

Fig. 1: Three agents can't see each other's context, so each writes a near-identical retry helper — different names, same meaning. That is the "reinvented wheel" wall, and nobody has the global view to notice it.

Coding with a single AI assistant feels great at first. It fills in files, writes tests, drafts your README — progress without typing. Then you open a few more: one agent on this task, another on that one, a third opening a PR in your editor. You expect 3× the speed; you hit four walls.

Wall one: reinvented work. Agent A writes an HTTP retry helper. Three weeks later Agent B writes an almost identical one in another feature. Two weeks after that, Agent C writes a third. Each agent only sees its own slice, so nobody notices "this already exists somewhere."

Wall two: scope collisions. Three agents edit one repo, and no mechanism tells them "someone is already touching this file" or "this write permission only goes to one of you this round." You find out at git merge — far too late.

Wall three: slow structural decay. Every feature is correct, yet six months later files keep swelling: tasks.ts grows from 300 lines to 1,500, and nobody watches for "this file should be split" or "twelve of these thirty atoms are doing the same thing." Each part is fine; the whole is quietly rotting. This is structural entropy — code review can't catch it, it only shows up over time.

Wall four is the subtlest: no identity. However beautifully it's written, every snippet an AI produces is an anonymous string living inside one project's git history. The project next door, which needs a semantically identical helper, has no way to find it, and the next new project will reinvent it again. Every piece of code is an orphan.

This article is how I answer those four walls. The core is one metaphor: treat the whole repo as a foundry. Every casting that comes off the line is a code cell with an ID card — name, fingerprint, status, evidence. It can be governed, audited, and reassembled later.

2. The atomic foundry: turning code into cells with identity

The word "atom" has a specific meaning here. It's not a task, not a commit, not a prompt. It's a code cell with a spec:

It has a name: every atom has a unique ID, e.g. ATM-MAP-0042
It has a semantic fingerprint: input signature + output signature + invariants are normalized and SHA256-hashed — that is the CID
It has a version and a status: active / deprecated / expired / quarantined
It has a caller graph: who calls it and what it calls are both recorded
It has an evidence history: every change leaves a SHA256 record
It grows: it can split, merge, evolve, or go dormant (sweep)

Here's what an atom actually looks like in the registry — a structured record with fields, a fingerprint, and a history:

{
  "atomId": "ATM-MAP-0042",
  "name": "idempotent-http-retry",
  "status": "active",
  "semanticFingerprint": "cid:a3f9c2e81b40",   // ← this is its ID card
  "inputs":  ["url: string", "opts: RetryOpts"],
  "outputs": ["Promise<Response>"],
  "invariants": ["same idempotencyKey => at most one effect"],
  "callers": ["ATM-MAP-0017", "ATM-MAP-0031"],   // who uses it
  "bornBy": "behavior.atomize",
  "evidenceRefs": ["TASK-AAO-0042.closure-packet.json"]
}

Fig. 2: Every atom is a little cell wearing an ID badge. The cid: on the badge is its semantic fingerprint — the card that lets it be queried, deduplicated, and rediscovered by future projects.

The foundry metaphor is precise. A factory ships finished assemblies; a foundry ships castings, each stamped with a serial number — traceable on its own, with its own recipe, ready to flow into the next product. Code should work the same way.

Once your repo enters this mode, an AI agent's job becomes very simple: "cast a new atom" or "run a governed transformation on an existing one." It can no longer drop floating snippets, because the foundry only accepts things with serial numbers.

3. Above atoms there's the map: a governance replacement surface for legacy modules

You might be wondering: "OK, every piece of code becomes an ID-bearing cell — but what about that 1500-line legacy checkout.py? If I shred it into 30 atoms, am I done?" Almost, but not quite. Atoms own capability; composition semantics belong to the map.

An atom is a cell; a map is an organ. Even a perfectly shredded legacy module needs a single thing that can stand up and say "I'm the new checkout, the old one can retire now." That thing is the map.

3.1 Why atom maps exist

Slicing finely (§2) comes with a price: the integrated entry point is gone. A checkout flow turned into 30 atoms still owes you these answers:

Which atom is the new entry point? Which atoms play entry-adapter, validator, side-effect-adapter?
Where do data-flow / control-flow / validation edges run between atoms?
Does it actually behave like the old checkout.py? Which known divergences are acceptable?
If something goes wrong, is there a clean rollback path back to legacy?

A single atom can answer none of these, so ATM adds a layer above atoms — the atomic map. Its schema is atomic-map.schema.json, and it lives parallel to atoms at atomic_workbench/maps/<mapId>/map.spec.json.

3.2 The four replacement contracts

A real map goes beyond "a list of members plus edges." It promises four things — together they form the governance replacement surface:

Structural semantics

members[].role (entry-adapter / validator / side-effect-adapter…) and edges[].edgeKind (data-flow / control-flow / validation). No role + edgeKind, no promotion.

Replacement declaration

replacement.legacyUris[] explicitly enumerates what is being replaced, e.g. legacy://app/checkout.py#L1-L1500. Without this field the map stays in draft forever.

Verifiable equivalence

A map-equivalence-report proves the map behaves like the legacy entry. Known divergences are allowed — but only if explicitly enumerated in knownDivergences[].

Safe exit

Reaching legacy-retired requires a rollback-proof or retirement-proof. No reverse path, no retirement.
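The four contracts can be read as a promotion gate. The sketch below is a minimal illustration under stated assumptions, not ATM's actual checker: `MapSketch`, `exitProof`, and `promotionBlockers` are hypothetical names, and only `role`, `edgeKind`, `legacyUris`, `knownDivergences`, and the stage names are taken from the text above.

```typescript
// Hypothetical shapes: only role, edgeKind, legacyUris, knownDivergences and the
// stage names come from the article; everything else is assumed for illustration.
type Stage = "draft" | "shadow" | "canary" | "active" | "legacy-retired";

interface MapSketch {
  members: { atomId: string; role?: string }[];
  edges: { from: string; to: string; edgeKind?: string }[];
  replacement?: { legacyUris: string[] };
  equivalenceReport?: { passed: boolean; knownDivergences: string[] };
  exitProof?: "rollback-proof" | "retirement-proof";
}

// Lists the contracts that still block promotion of `map` to `target`.
function promotionBlockers(map: MapSketch, target: Stage): string[] {
  const blockers: string[] = [];
  if (target === "draft") return blockers; // drafts carry no obligations yet

  // Structural semantics: every member has a role, every edge has an edgeKind.
  const structured =
    map.members.every((m) => Boolean(m.role)) &&
    map.edges.every((e) => Boolean(e.edgeKind));
  if (!structured) blockers.push("structural-semantics");

  // Replacement declaration: legacyUris must enumerate what is being replaced.
  if (!map.replacement || map.replacement.legacyUris.length === 0) {
    blockers.push("replacement-declaration");
  }

  // Verifiable equivalence: assumed here to be required from `active` onward.
  const needsEquivalence = target === "active" || target === "legacy-retired";
  if (needsEquivalence && !(map.equivalenceReport && map.equivalenceReport.passed)) {
    blockers.push("verifiable-equivalence");
  }

  // Safe exit: retirement needs a rollback-proof or retirement-proof.
  if (target === "legacy-retired" && !map.exitProof) blockers.push("safe-exit");
  return blockers;
}
```

An empty result means the gate is open; a non-empty one names the contract that is still missing, which is easier to act on than a bare boolean.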

3.3 Five-stage replacement lifecycle: draft → shadow → canary → active → legacy-retired

Replacing legacy is never a single switch flip. It is a gated five-stage pipeline; every transition is guarded by police + evidence:

Fig. 3: The atom map's five-stage replacement lifecycle. Legacy is never cut over in one step: it is first shadowed for comparison, trialed in canary, promoted to active only once equivalence is proven, and finally retired only with a rollback proof in hand.
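The forward path of that pipeline behaves like a small state machine. The following is a sketch only: `advance` and its `gatePassed` flag are hypothetical stand-ins for whatever the police and evidence checks actually return, and rollback transitions are deliberately left out.

```typescript
const STAGES = ["draft", "shadow", "canary", "active", "legacy-retired"] as const;
type Stage = (typeof STAGES)[number];

// Moves a map one stage forward. `gatePassed` is a hypothetical flag that stands in
// for "police + evidence" approving this specific transition.
function advance(current: Stage, requested: Stage, gatePassed: boolean): Stage {
  const from = STAGES.indexOf(current);
  const to = STAGES.indexOf(requested);
  if (to !== from + 1) {
    throw new Error(`illegal transition ${current} -> ${requested}: stages cannot be skipped`);
  }
  if (!gatePassed) {
    throw new Error(`transition ${current} -> ${requested} blocked: gate evidence missing`);
  }
  return requested;
}
```

Modeling the stages as an ordered tuple keeps "no skipping" a one-line index comparison instead of a hand-written transition table.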

3.4 Three data tiers: what's disposable, what's load-bearing

The v2 plan (map-replacement-protocol v2-r2) splits everything into three tiers, each with its own tolerance rule:

Tier	Examples	Rule
Source-of-truth	atom source, `map.spec.json`, capsule manifest	Must be verifiable and rebuildable; never overwritten by cache
Derived state	registry, Mermaid diagrams, health reports	Deletable and rebuildable; must carry `generatedAt`
Local volatile	daemon PID, guide cache	Off-git by default; safe to delete on corruption

Combined with M26 Rescue Police and M27 Disaster Recovery CLI (two task cards in map-replacement-protocol), this guarantees a golden property: as long as source-of-truth survives, every registry and derived report can be rebuilt from scratch. Even after a botched retirement or corrupted derived data, you still have a road back.

So atom maps are where ATM truly handles "legacy module."^* Atoms (§2) tell you what a piece of code is; maps tell you how a chunk of legacy gets replaced in an orderly, evidence-backed, rollback-safe way. Without the map, you've only pinned cells to a wall. With the map, you have an organ you can actually transplant.

4. CID conflict adjudication: the key that actually lets two agents work in parallel

By now we have ID-bearing cells (atoms) and contract-bound organs (maps). One key question in real multi-agent work still stands: given two task cards, how does the framework decide whether they can run in parallel?

Git solves the physical conflict: you and I edited the same line. CID solves the logical conflict: you and I touched the same contract fingerprint. They work together; you want both.

4.1 Why "same file blocks" is the wrong question

Traditional collaboration tools detect conflicts on one signal: did two people touch the same file? Good enough for a pair of humans in an IDE. With five agents in one monorepo it goes wrong in two ways:

False positives: two agents edit different functions inside tasks.ts — logically independent, yet serialized into "same file, must wait."
False negatives: two agents touch different files but both change the public contract of a shared generator or validator. Git sees nothing wrong; deployment blows up.

ATM's move: demote "file overlap" to a secondary clue and put "CID match" in front as the primary signal. The real conflict is when both sides touch the same CID.

4.2 Eight verdicts

Given tasks A and B, the parallel conflict advisor emits one of eight verdicts (ordered by severity):

Verdict	Meaning	Action
`parallel-safe`	Different CIDs, no file overlap, no shared surface	✅ Both agents work independently; framework stays out of the way
`needs-physical-split`	Different CIDs but overlapping file ranges	🔄 Parallel-OK; routed to deterministic composer or Neutral Write Steward
`blocked-cid-conflict`	Same `atom_cid`	❌ Real semantic conflict — serialize or escalate to Captain
`blocked-shared-generator`	Same code generator	❌ Split the atom or serialize
`blocked-shared-validator`	Same validator	❌ Same as above
`blocked-shared-projection`	Same view / index	❌ Same as above
`blocked-shared-artifact`	Same output artifact	❌ Same as above
`blocked-active-lease`	Already held by an active lease	❌ Wait for release or change scope

Fig. 4: The advisor's three main paths. The middle one, needs-physical-split, is where CID gets smarter than "same file = locked": the physical layer goes to Git, the logical layer goes to CID, and the two tracks run in parallel.
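To make the verdict table concrete, here is a rough sketch of how such an advisor could be ordered. The verdict names come from the table; `TaskScope`, `adjudicate`, the precedence among the blocked verdicts, and the use of whole file paths instead of file ranges are all assumptions made for illustration.

```typescript
type Verdict =
  | "parallel-safe"
  | "needs-physical-split"
  | "blocked-cid-conflict"
  | "blocked-shared-generator"
  | "blocked-shared-validator"
  | "blocked-shared-projection"
  | "blocked-shared-artifact"
  | "blocked-active-lease";

// Hypothetical summary of what one task card intends to touch.
interface TaskScope {
  atomCids: string[];
  files: string[];
  generators: string[];
  validators: string[];
  projections: string[];
  artifacts: string[];
  blockedByActiveLease: boolean;
}

const overlaps = (a: string[], b: string[]): boolean =>
  a.some((x) => b.indexOf(x) !== -1);

// CID and shared-surface checks run before file overlap, so "same file" alone
// never blocks: it only routes the pair to a physical split.
function adjudicate(a: TaskScope, b: TaskScope): Verdict {
  if (a.blockedByActiveLease || b.blockedByActiveLease) return "blocked-active-lease";
  if (overlaps(a.atomCids, b.atomCids)) return "blocked-cid-conflict";
  if (overlaps(a.generators, b.generators)) return "blocked-shared-generator";
  if (overlaps(a.validators, b.validators)) return "blocked-shared-validator";
  if (overlaps(a.projections, b.projections)) return "blocked-shared-projection";
  if (overlaps(a.artifacts, b.artifacts)) return "blocked-shared-artifact";
  if (overlaps(a.files, b.files)) return "needs-physical-split";
  return "parallel-safe";
}
```

Two tasks that edit different functions in tasks.ts with different CIDs come out as needs-physical-split, while two tasks in different files that share a validator come out as blocked-shared-validator, which matches the false-positive and false-negative cases described in 4.1.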

4.3 Not just "blocked" — also "here's how to split"

When the verdict is blocked-cid-conflict or needs-physical-split, the advisor doesn't just say ❌ and walk away. It emits a merge plan that contains:

Reason: which CID, which shared generator, which artifact actually collides
Recommended lane: deterministic composer (ranges are clear, tooling can merge), Neutral Write Steward (both sides have semantic judgment, need a neutral writer), or shared-surface guard (the surface itself needs splitting first)
Hotspot report: how many times this same conflict point has shown up across recent tasks — the higher the count, the more urgent it becomes to split that atom or that file
Required evidence: which validators must pass before close after the merge

The design philosophy: send out the traffic radar first, dispatch the merge mechanic only when two cars actually want the same lane. The framework lets tasks run freely most of the time, escalating to serialization or arbitration only when a real CID collision shows up.

The key insight: this is a logical-layer solution, with Git owning the physical layer. Git takes care of history, branches, merges, commits, rollback. CID takes care of the contract question: "do these two changes collide in meaning?" With atoms and atom maps, CID has units to point at; with CID, parallel multi-agent dev lifts from "praying nothing collides" up to "decidable by computation." All three layers have to be in place together for the brake to really hold.^*

5. The governance skeleton: seven stages + five atoms + the full behavior set

Before we talk about the "brake," we need to look at where the brake is mounted. The foundry's skeleton splits one piece of work into seven machine-detectable stages, and uses two stable outward contracts — atoms and behaviors — to strap every change down. (One agent doing the job end-to-end is the older shape; this one survives N agents better.) The diagram below is what's shipped today:

Fig. 5: ATM's seven-stage governance lifecycle. Each stage has its own CLI command, its own doctor check, and its own verification contract. The strip at the bottom is the "stable outward governance units": five active atoms (governed tier) handling identity / provisioning / neutrality / fingerprint, plus ten plugin behaviors (plus two capsule-tier helpers: anchorize, promote) — the legal verbs you can apply to atoms.

What's an atom? Not a task, not a commit. It's the smallest governable unit with a unique ATM-{BUCKET}-{NNNN} id, a hash-locked spec, and evidence-required validation. Five active atoms in the registry right now (governed tier): seed (0001), neutrality scanner (0003), atom generator (0004), semantic fingerprint (0005), generator self-test fixture (FIXTURE-0001). You can't edit atomic-registry.json directly — every new atom has to go through ATM-CORE-0004 and pass the INV-ATM-003 schema gate (which is one of the seven charter red lines).

What's a behavior? The legal verbs you can apply to atoms, all consolidated in plugin-behavior-pack, grouped by family:

Split family

split divides an existing atom into clearer boundaries; atomize extracts a brand-new governed unit from broader / legacy material.

Merge family

merge combines two atoms; dedup-merge collapses duplicates; compose assembles multiple units into a higher-level result.

Evolution family

evolve advances an atom version in place; polymorphize preserves identity but switches to a variant.

Lifecycle family

expire formally retires; sweep cleans up residue. Both currently dry-run / proposal only, no direct host apply.

Propagation family

infect propagates a governed change into downstream dependents under neutrality scan + human review. This is the transport mechanism for the "atom market" idea in §12.

Lifecycle role matrix

There's an atom × lifecycle role matrix (lead / assist / detect / not involved). E.g. semantic fingerprint (0005) is the most widely distributed atom — it plays a role in NEXT / EXECUTE / EVIDENCE / HANDOFF / CLOSURE.

6. 5D CID: a multi-dimensional ID card for every atom

This is the heart of the foundry metaphor. Every atom carries an ID card — a CID (content-addressable identifier). It's a fingerprint of what the atom publicly declares it does (not a fingerprint of "what the code means," which computer science still can't actually deliver).

A CID is deterministic — it looks only at what the atom declares for itself (input / output ports, language, performance budget) and skips source parsing, AST analysis, LLM calls, embeddings. The same declaration always yields the same card; non-deterministic signals stay out of this hash.

A concrete example. Two agents write a retry helper. The implementations look completely different:

// Version A: for-loop
async function retry(url, opts) { /* ... */ }

// Version B: while-loop, arrow fn
const withRetry = async (u, o) => { /* ... */ };

Different implementations, so their literal-source hash (CID.Strict) differs. But both atoms declared the same outward contract:

// Each atom's own atom spec — this is what CID actually hashes
const spec = {
  inputs: [
    { name: "url",  kind: "string" },
    { name: "opts", kind: "RetryOpts" }
  ],
  outputs: [
    { name: "result", kind: "Promise<Response>" }
  ],
  language: { primary: "typescript" },
  validation: { evidenceRequired: true },
  performanceBudget: { hotPath: false, inputMutation: "forbidden", maxDurationMs: 5000 }
};

// CID.Interface = SHA256 over the normalized declared contract
const cidInterface = "sf:sha256:" + sha256(canonicalize(spec));

// Because both atoms declared the same contract:
//   A → sf:sha256:a3f9c2e8…1b40
//   B → sf:sha256:a3f9c2e8…1b40   ← same CID.Interface

The process skips "effects extraction," "invariant inference," and "variable-name stripping" on purpose. CID reads what the atom publicly promised as its interface and execution constraints, rather than guessing. Matching declarations → matching hashes → same ID card. This is engineering-tractable, and it's what ATM runs today.
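For readers who want to run the idea, here is a self-contained sketch using Node's built-in crypto module. The `canonicalize` below is a hypothetical normalization (recursive key sorting, array order preserved); the article does not specify ATM's actual canonical form, so treat the resulting hashes as illustrative only.

```typescript
import { createHash } from "node:crypto";

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical canonical form: object keys sorted recursively, array order kept.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalize(value[k] as Json)).join(",") + "}";
  }
  return JSON.stringify(value);
}

// CID.Interface as described above: SHA256 over the normalized declared contract.
function cidInterface(spec: Json): string {
  return "sf:sha256:" + createHash("sha256").update(canonicalize(spec)).digest("hex");
}

// Two atoms declaring the same contract, written with different key order.
const specA: Json = {
  inputs: [{ name: "url", kind: "string" }, { name: "opts", kind: "RetryOpts" }],
  outputs: [{ name: "result", kind: "Promise<Response>" }],
  language: { primary: "typescript" },
};
const specB: Json = {
  language: { primary: "typescript" },
  outputs: [{ kind: "Promise<Response>", name: "result" }],
  inputs: [{ kind: "string", name: "url" }, { kind: "RetryOpts", name: "opts" }],
};

console.log(cidInterface(specA) === cidInterface(specB)); // true
```

Nothing in this path reads source code or calls a model, which is the point: the same declaration always produces the same string.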

Identity has five possible dimensions; only CID.Interface is shipped right now. The rest are roadmap:

Dimension	What it certifies	Status
`CID.Strict`	SHA256 of normalized source; for git-level tamper checks	partial (capsule / hash-lock)
`CID.Interface`	The one above — declared contract fingerprint	✅ shipped (= today's semanticFingerprint)
`CID.Effects`	Effect tags (NET / FS / DB…); pure vs IO can't substitute	proposed, not built
`CID.Semantic`	Embedding vector; only as a ranking hint for dedup candidates	proposed, not built
`CID.Behavior`	Bound to a test-harness id: "indistinguishable under this harness"	proposed, not built

The last three (Effects / Semantic / Behavior) stay in advisory land — never mixed into the identity hash. This firewall is the determinism axiom: identity stays stable, recomputable, and independent of model-version drift.

Fig. 6: Feed in the atom's declared contract (ports, language, execution constraints). The machine normalizes it, stamps on a SHA256 seal, and emits a CID.Interface. Different implementations are fine — as long as they declare the same outward contract, they receive the same ID card.

With that card in hand, here's what becomes possible — flagged honestly with what already runs and what's still on the roadmap:

In-project interface dedup (shipped): the Dedup Police compares CID.Interface prefixes and instantly catches "two atoms declared the exact same contract — should they merge?" This is an advisory finding; it never auto-acts.
Cross-project interface lookup (query layer shipped; cross-org index pending): across ATM repos already on your machine, this lookup works today; a true cross-organization index does not yet exist.
Cross-organization atom market (roadmap, gated by local maturity): see §12 for the honest boundary.

CIDs also clear up an old question: "how much interface-level duplication does my codebase have?" That used to be fragile static analysis; with every atom carrying a deterministic fingerprint, it becomes an O(1) lookup. One thing to keep in mind — matching interface means matching contract, not matching behavior. Behavior equivalence belongs to runtime testing (which is where CID.Behavior and the zero-trust sandbox are headed on the roadmap). CID itself only covers "who declared what contract."
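The lookup claim can be illustrated with a plain hash map. This is a sketch under assumptions: `RegistryEntry`, `groupByCid`, and `duplicateGroups` are hypothetical names, and it matches full fingerprints exactly rather than comparing prefixes as the Dedup Police is described as doing.

```typescript
// Hypothetical registry row: just enough to index atoms by declared contract.
interface RegistryEntry {
  atomId: string;
  cidInterface: string;
}

// Builds the index once; afterwards "who else declared this contract?" is a
// single Map lookup (constant time on average).
function groupByCid(entries: RegistryEntry[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const entry of entries) {
    const group = index.get(entry.cidInterface) ?? [];
    group.push(entry.atomId);
    index.set(entry.cidInterface, group);
  }
  return index;
}

// Every group with more than one member is a dedup candidate, not a verdict:
// matching contract does not imply matching behavior.
function duplicateGroups(index: Map<string, string[]>): string[][] {
  const groups: string[][] = [];
  index.forEach((atomIds) => {
    if (atomIds.length > 1) groups.push(atomIds);
  });
  return groups;
}
```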

7. Permission is the gear: leases replace folder isolation

Two agents cannot hold the same write key at the same time — this layer stops it before git ever runs.

The place multi-agent work crashes hardest is "who may write what." Traditional answers — folder permissions or tool allowlists — share the same flaw: they're static and don't track the task.

The foundry splits permission into named objects, each with a mode:

Permission	Mode	Default holder	Meaning
`task.lifecycle`	exclusive	Captain	close / checkpoint / advance
`git.write`	exclusive	Captain	stage / commit / tag / push
`file.write`	exclusive	Implementer	writes source; scope must be a subset of allowedFiles
`file.read`	shareable	many	read files and source
`exec.validator`	shareable	many	run non-mutating validation commands
`database.write`	exclusive	explicit grant	write external DB; not in the default recipe
`ci.write`	exclusive	explicit grant	trigger or modify CI
`web.download`	exclusive	explicit grant	download external data or packages

An exclusive permission goes to only one holder at a time. When a second agent reaches for it, the framework returns ATM_TEAM_PERMISSION_CONFLICT before git is even involved. Live, it looks like this:

$ node atm.mjs team lease --agent builder-01 \
    --permission file.write --paths "packages/cli/src/commands/**"
✓ lease granted — file.write → builder-01

# A second agent reaches for the same write key:
$ node atm.mjs team lease --agent builder-02 \
    --permission file.write --paths "packages/cli/src/commands/**"
✗ ATM_TEAM_PERMISSION_CONFLICT
  file.write is exclusive and already held by builder-01 (since 12:04:11)
  → builder-02 must wait for release, or take a non-overlapping scope

Fig. 7: file.write is a single key. While builder-01 holds it, builder-02 can't even reach it — the conflict is stopped at the permission layer, long before a git merge.

Permissions can be handed over, but only through team lease and team release on the CLI — every handover records a timestamp, the agent ID, and the permission scope. That lease history travels with the closure-packet into permanent evidence.
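The exclusive versus shareable rule is small enough to sketch in a few lines. The permission modes are copied from the table above; `LeaseTable` and its methods are hypothetical, and the sketch ignores path scopes, so it treats any second holder of an exclusive permission as a conflict even where ATM would allow a non-overlapping scope.

```typescript
type Mode = "exclusive" | "shareable";

// Modes copied from the permission table; the class itself is a hypothetical sketch.
const MODES: Record<string, Mode> = {
  "task.lifecycle": "exclusive",
  "git.write": "exclusive",
  "file.write": "exclusive",
  "file.read": "shareable",
  "exec.validator": "shareable",
};

interface LeaseEvent {
  at: string;
  agent: string;
  permission: string;
  action: "lease" | "release";
}

class LeaseTable {
  private holders: { agent: string; permission: string }[] = [];
  readonly history: LeaseEvent[] = []; // every handover is recorded

  lease(agent: string, permission: string, at: string = new Date().toISOString()): void {
    const mode: Mode | undefined = MODES[permission];
    if (mode === undefined) throw new Error("unknown permission: " + permission);
    const others = this.holders.filter((h) => h.permission === permission && h.agent !== agent);
    if (mode === "exclusive" && others.length > 0) {
      throw new Error(
        "ATM_TEAM_PERMISSION_CONFLICT: " + permission +
          " is exclusive and already held by " + others[0].agent,
      );
    }
    this.holders.push({ agent, permission });
    this.history.push({ at, agent, permission, action: "lease" });
  }

  release(agent: string, permission: string, at: string = new Date().toISOString()): void {
    this.holders = this.holders.filter((h) => !(h.agent === agent && h.permission === permission));
    this.history.push({ at, agent, permission, action: "release" });
  }
}
```

Because the check happens when the lease is requested, the second writer is rejected before it has produced any diff that would later need merging.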

8. Shared memory: charter is the supreme authority, ATMChart is the runtime brake, closure-packet is the evidence chain

The other place multi-AI work tips over is context drift — Agent A told me an assumption, Agent B never heard it, and B writes code that doesn't line up with A. By the time you notice, three files have drifted apart. The foundry's response is direct: write the governance rules into three layers of machine-readable contract, and let the machine enforce them on the spot:

Supreme authority = the atomic charter (.atm/charter/atomic-charter.md) + 7 INV-ATM. The charter is the framework's final arbiter, currently at v2.0.0; INV-ATM-001~007 are immovable red lines, enforced by three mechanisms (gate / doctor / waiver-required). Breaking an invariant cannot be routed around — it must go through the charter waiver flow (behavior.evolve + charterWaiver + HumanReviewDecision).
Runtime brake = the ATMChart (.atm/memory/atm-chart.md) SHA256 drift detection. Not supreme authority, but the lock that catches anyone trying to act on a stale map.
Evidence chain = closure-packet.v1. Each close writes to a permanent ledger that you can re-walk command-by-command months later for audit.

The latter two mechanisms (ATMChart, closure-packet) are covered below; the seven INV-ATM red lines join the police family discussion in the next section.

ATMChart drift detection

Five core governance schemas (guards, charter, integrations, agent-prompt, upgrade-proposal) are SHA256-locked into the frontmatter of .atm/memory/atm-chart.md. Any schema change → drift → ATM_CHART_STALE blocks it: an AI cannot act on a stale map. Every chart carries the framework version and supported range; an old version is rejected with ATM_CHART_VERSION_UNSUPPORTED.
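Drift detection of this kind reduces to comparing recorded hashes with freshly computed ones. A minimal sketch, assuming the locked hashes have already been parsed out of the chart frontmatter into a plain object; `driftedSchemas` and `assertChartFresh` are hypothetical helper names, and only the `ATM_CHART_STALE` code comes from the text.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (text: string): string =>
  createHash("sha256").update(text).digest("hex");

// locked:  schema name -> SHA256 recorded in the chart frontmatter.
// current: schema name -> schema text as it exists right now (undefined if missing).
function driftedSchemas(
  locked: Record<string, string>,
  current: Record<string, string | undefined>,
): string[] {
  return Object.keys(locked).filter((name) => {
    const text = current[name];
    return text === undefined || sha256Hex(text) !== locked[name];
  });
}

// Refuses to proceed when any locked schema has changed or disappeared.
function assertChartFresh(
  locked: Record<string, string>,
  current: Record<string, string | undefined>,
): void {
  const drifted = driftedSchemas(locked, current);
  if (drifted.length > 0) {
    throw new Error("ATM_CHART_STALE: " + drifted.join(", "));
  }
}
```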

Closure-packet evidence chain

Each task writes an atm.closurePacket.v1 on close: commitDelta, commandRuns (the SHA256 of each command's stdout/stderr), runnerVersion, teamRunId. Months later you can re-verify whether the recorded output still matches a fresh run. That hash chain is the biggest difference between ATM and a "PR comment."

Opened up, a closure-packet looks like this — it records "which commands I ran, their exit codes, and the SHA256 of their output," rather than just an agent saying "I'm done":

{
  "schemaId": "atm.closurePacket.v1",
  "taskId": "TASK-AAO-0042",
  "commandRuns": [
    { "command": "npm run typecheck",
      "exitCode": 0,
      "stdoutSha256": "sha256:dade4751…655da0" },
    { "command": "node atm.mjs test --spec",
      "exitCode": 0,
      "stdoutSha256": "sha256:9f1c0a…b3e7" }
  ],
  "commitDelta": { "changedFiles": 3, "governedTreeSha": "9f1c2d…" },
  "teamRunId": "team-abc123",
  "runnerVersion": "0.1.0"
}

Note how exitCode and stdoutSha256 are bound together. An agent trying to pass off an exitCode: 1 validator as "passing" gets nowhere — a non-zero exit can only be recorded as a diagnostic, never as a pass. The evidence speaks for itself.
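The binding between exit code and output hash can be sketched as follows. The `command`, `exitCode`, and `stdoutSha256` fields mirror the packet above; `recordRun`, `classify`, and `replayMatches` are hypothetical helpers, and stdout is passed in as a string so the example stays free of process spawning.

```typescript
import { createHash } from "node:crypto";

interface CommandRun {
  command: string;
  exitCode: number;
  stdoutSha256: string;
}

const sha256Tag = (text: string): string =>
  "sha256:" + createHash("sha256").update(text).digest("hex");

// Captures one command run in the shape used by commandRuns[].
function recordRun(command: string, exitCode: number, stdout: string): CommandRun {
  return { command, exitCode, stdoutSha256: sha256Tag(stdout) };
}

// A non-zero exit can only ever be a diagnostic, never a pass.
function classify(run: CommandRun): "pass" | "diagnostic" {
  return run.exitCode === 0 ? "pass" : "diagnostic";
}

// Replay check: a fresh run matches when exit code and output hash both agree.
function replayMatches(recorded: CommandRun, freshExitCode: number, freshStdout: string): boolean {
  return recorded.exitCode === freshExitCode && recorded.stdoutSha256 === sha256Tag(freshStdout);
}
```

Storing the hash rather than the output keeps the packet small while still letting an auditor detect any change in what the command printed.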

Together these two mechanisms give you one strong guarantee: the rule's version, hash, and supported range are written on the map in plain sight. A stale map gets caught the moment an agent tries to act on it — "I thought the rule was X" stops being a valid excuse.

9. 7 INV-ATM red lines + resident police families: contract brake & structural conscience

While you sleep, the patrol keeps walking; the invariants never clock out.

The foundry's "brake" comes in two layers. The contract-layer brake is the seven INV-ATM red lines on the charter (public, stable, written into the framework). The detection-layer brake is the resident police families (advisory; they run automatically and leave the verdict to humans). They split labor like this:

Red line (INV-ATM-00X)	Brake type	Plain meaning
INV-ATM-001 No second registry	gate (hard reject)	The whole repo gets exactly one atomic-registry source of truth.
INV-ATM-002 Lock before edit	doctor (steer to fix)	You must hold a `ScopeLock` before editing.
INV-ATM-003 Schema-validated promotion only	gate	All atom upgrades must pass schema validation; touching the registry directly is forbidden.
INV-ATM-004 No competing highest authority	doctor	No host rule can claim a status equal to or higher than the charter.
INV-ATM-005 Host rule amendments require waiver flow	waiver-required	The legal path around any invariant: `behavior.evolve` + `charterWaiver` + `HumanReviewDecision`.
INV-ATM-006 Framework work tracking stays target-local	doctor	Framework task ledgers must live in the framework repo's own `.atm/history/tasks/`.
INV-ATM-007 Public framework docs remain English-only	doctor	Framework public docs must be English and adopter-neutral; ATM-CORE-0003 enforces it.

The seven INV-ATM are a public contract; breaking one requires a charter waiver. The police families are advisory detection — they're not in the charter, but they walk through the registry and source daily, catching chronic structural decay. Safety of one mutation belongs to INV-ATM; the health of the repo after hundreds of mutations belongs to the police families.

A few of the most common police families (the full roster has more than a dozen):

Dedup Police

Scans for CID prefix overlap. The moment two atoms have near-identical fingerprints, it emits an advisory finding routed to behavior.dedup-merge.

Demand Police

Scans for caller-graph overload. When an atom's caller count crosses a threshold (default 6) → it recommends behavior.split.

Decomposition Police

Uses source-inventory to count LOC; over the god-file threshold → it drafts an atomize plan. A daily cap stops it from flooding a human with 100 findings at once.

Quality Police

Compares quality metrics before and after. Regression over threshold → blocker. A new atom below baseline → it fails the quality gate.

Evolution / Polymorph

Watches whether an atom should upgrade to a next version, or be abstracted into a template + variants. Advisory only; the final call returns to the human.

Rollback Police

Guards reversibility. Any mutation must have rollback proof before apply, or it's blocked. This line protects every map replacement / evolution / atomization / infect / expire / retirement flow.

Fig. 8: The police bot walks the registry, spots a cell whose fingerprint is suspiciously close to another, plants a dup? flag, and emits a finding — then stops right there. It only reports; it never merges on its own.

These police share one iron rule: directApplyAllowed: false. They can find problems, suggest routes, draft plans — and they always leave the actual change for a human to apply. A finding, opened up, is quite plain; the line that matters is the last one:

{
  "policeFamily": "dedup",
  "severity": "advisory",
  "trigger": "semantic-fingerprint-overlap",
  "scope": "ATM-MAP-0042,ATM-MAP-0088",
  "message": "Two atoms have highly overlapping fingerprints; consider merging",
  "routeHint": "behavior.dedup-merge",
  "directApplyAllowed": false   // ← police only report; acting needs a human
}

Every apply passes through ReviewAdvisory machine-finding + HumanReviewDecision; the human-led promotion from advisory to blocker is a step the framework keeps in your hands on purpose.
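As an illustration of the report-only rule, here is a sketch of a Demand Police style scan that emits findings in the shape shown above. `demandPatrol`, `AtomCallers`, and the `caller-graph-overload` trigger string are assumptions; the default threshold of 6 and the `behavior.split` route come from the text, and "crosses" is read here as strictly greater than.

```typescript
interface Finding {
  policeFamily: string;
  severity: "advisory";
  trigger: string;
  scope: string;
  message: string;
  routeHint: string;
  directApplyAllowed: false; // typed as the literal false: a patrol cannot opt in
}

// Hypothetical input: one row per atom with its recorded callers.
interface AtomCallers {
  atomId: string;
  callers: string[];
}

// Flags atoms whose caller count exceeds the threshold and suggests a split.
function demandPatrol(atoms: AtomCallers[], threshold: number = 6): Finding[] {
  return atoms
    .filter((atom) => atom.callers.length > threshold)
    .map((atom): Finding => ({
      policeFamily: "demand",
      severity: "advisory",
      trigger: "caller-graph-overload",
      scope: atom.atomId,
      message: `${atom.atomId} has ${atom.callers.length} callers (threshold ${threshold}); consider splitting`,
      routeHint: "behavior.split",
      directApplyAllowed: false,
    }));
}
```

Typing `directApplyAllowed` as the literal `false` moves the iron rule into the type system: a patrol that tried to return `true` would not compile.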

The foundry deliberately leaves "24/7 fully automatic evolution" off the table. When N agents are already editing the repo, dropping in an autonomous refactoring robot only manufactures write-write conflict storms. The police see; the humans touch.

10. From git clone to first task close: the minimum adoption path

It sounds complex, but the adoption barrier is low. ATM's official distribution unit is a single-file runner, atm.mjs (a strip-types bundle of about 3.1 MB with zero transitive deps). Deployment is an explicit framework → adopter sync, not an npm install. Here's the minimum path from zero to your first closure-packet:

# 0-15 min: get the runner + ORIENT
# Framework side (onefile is already built):
#   npm run build  →  release/atm-onefile/atm.mjs (~3.1 MB)
# Adopter side: sync the onefile in
node release/atm-onefile/atm.mjs internal-release sync \
    --repo /path/to/my-app --json
cd /path/to/my-app
node atm.mjs orient --cwd . --json
node atm.mjs doctor  --cwd . --json
# watch each doctor check (lock-before-edit, charter-integrity, neutrality-scanner-active...) go green

# 15-30 min: NEXT, then LOCK
node atm.mjs next --prompt "your first small change" --json
# get the nextAction → execute its command string (e.g. lock acquire)
node atm.mjs lock acquire --workItem <workItemId> \
    --files src/foo.ts --reason "first small change" --json

# 30-60 min: EXECUTE → EVIDENCE
# edit within the lock scope, run your tests
npm test
node atm.mjs evidence add --task <workItemId> --actor codex-main \
    --kind test --summary "unit tests passed" --artifacts reports/test.json --json

# 60-80 min: HANDOFF + verify
node atm.mjs evidence verify --task <workItemId> --gate close --json
node atm.mjs handoff summarize --task <workItemId> --json

# 80-90 min: CLOSURE
node atm.mjs tasks close --task <workItemId> --actor codex-main --status done --json
# writes .atm/history/evidence/<taskId>/closure-packet.json
# containing commandRuns[].stdoutSha256 + targetCommit + governedTreeSha
# Any future audit replays the same commands → byte-for-byte comparison

What do you own at the ninety-minute mark?

A commit carrying a teamRunId and a reference to its closure evidence
A .atm/history/evidence/TASK-XXXX.closure-packet.json with the stdout SHA256 of every command
A patrol-report.md recording the structural health at close time^*
A team-memory-shard.md filing this task's lesson, to be re-read by a human or the Captain when opening the next task^*

The first two are the most direct difference between the foundry and ordinary vibe coding: on top of the finished code, you also get permanent evidence and a structural snapshot. The memory shard and patrol report are templates filed into git, available for a human or the Captain to cite on the next task; the framework itself doesn't auto-inject them into the next agent's context.

11. Where it tends to fail (and the doctor checks that catch it)

Honestly, the foundry has its share of pitfalls. The six scenarios below are the ones I hit most often; they are also the ones that atm doctor calls out by name. Each maps to a specific check; on failure, doctor returns a firstFailedCheck plus a suggestion the agent can run as-is, with no guesswork:

lock-before-edit (INV-ATM-002)

Editing without holding a lock. The next call to atm next returns blocked, with a suggestion to run atm lock acquire --files <list>. This is the most common failure, and the easiest to fix.

evidence-freshness

Trying to close with old evidence after a task was reopened. The gate blocks and demands evidence dated after reopenedAt. Introduced by commit 85b92ce to stop "quietly reopen, reuse old evidence, close again."

git-head-evidence

Closure packet's targetCommit doesn't match the current git HEAD. Doctor rejects and asks for rebase or reclose. Keeps "what the packet describes" and "what the repo actually is" from drifting.

registry-sole-source (INV-ATM-001)

A second atomic-registry.json shows up (usually some plugin "for convenience"). Hard gate rejection — fold it back into the root registry via ATM-CORE-0004.

neutrality-scanner-active (INV-ATM-007)

Framework public docs contain adopter-private terms or non-English content. ATM-CORE-0003 reports a violation; if you can't fix it, file a charter waiver.

charter-integrity

charter.md and charter-invariants.json are out of sync (charterVersion / lastAmendedAt mismatch). Doctor asks you to align both. This is the health check on the supreme authority itself.
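Two of the checks above, evidence-freshness and git-head-evidence, boil down to small predicates. The sketch below shows the rule each one enforces as I described it. It is not the framework's implementation: reopenedAt and targetCommit come from the text, while the EvidenceRecord shape, the recordedAt field, and both function names are hypothetical.

```typescript
// Hypothetical record shape; `recordedAt` is an assumed field name.
interface EvidenceRecord {
  kind: string;
  recordedAt: string; // ISO 8601 timestamp
}

// evidence-freshness: after a reopen, only evidence recorded strictly later
// than reopenedAt may count toward the close gate.
function freshEvidence(
  evidence: EvidenceRecord[],
  reopenedAt: string | null,
): EvidenceRecord[] {
  if (reopenedAt === null) return evidence; // never reopened: everything counts
  const cutoff = Date.parse(reopenedAt);
  return evidence.filter((e) => Date.parse(e.recordedAt) > cutoff);
}

// git-head-evidence: the packet must describe the commit the repo is on.
function packetMatchesHead(targetCommit: string, currentHead: string): boolean {
  return targetCommit === currentHead;
}
```

If freshEvidence returns an empty list for a reopened task, the gate has nothing to close on, which is exactly the "quietly reopen, reuse old evidence, close again" path being shut.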

All six share one trait: they're governance-discipline problems, not framework bugs. The framework already provides the interception, the structured failure report, and the ready-to-run fix command. Wire atm doctor into your pre-commit hook or CI step, and the brake keeps working while you sleep.
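For the pre-commit or CI wiring, a minimal sketch might look like the following. It runs the same node atm.mjs doctor --cwd . --json command from the quickstart and turns the result into an exit code. The article only names firstFailedCheck and suggestion in the failure report; treating their absence as "all green", along with the DoctorReport and GateResult types and both function names, is an assumption of this sketch, so check the real JSON output before relying on it.

```typescript
import { spawnSync } from "node:child_process";

// Assumed subset of the doctor JSON output. Only firstFailedCheck and
// suggestion are named in the article; everything else here is hypothetical.
interface DoctorReport {
  firstFailedCheck?: string;
  suggestion?: string;
}

interface GateResult {
  exitCode: 0 | 1;
  message: string;
}

// Pure decision step: map a parsed doctor report to a pass/fail result.
function gateFromDoctor(report: DoctorReport): GateResult {
  if (!report.firstFailedCheck) {
    return { exitCode: 0, message: "doctor: all checks green" };
  }
  const hint = report.suggestion ? ` Suggested fix: ${report.suggestion}` : "";
  return {
    exitCode: 1,
    message: `doctor: ${report.firstFailedCheck} failed.${hint}`,
  };
}

// Runs the doctor command from the quickstart and gates on its output.
// Not invoked here; call it from a pre-commit hook or a CI entry point.
function runDoctorGate(cwd: string): GateResult {
  const result = spawnSync("node", ["atm.mjs", "doctor", "--cwd", ".", "--json"], {
    cwd,
    encoding: "utf8",
  });
  if (result.error || !result.stdout) {
    return { exitCode: 1, message: "doctor: could not run or produced no output" };
  }
  return gateFromDoctor(JSON.parse(result.stdout) as DoctorReport);
}
```

In a hook script you would call runDoctorGate(process.cwd()), print the message, and exit with the returned code. Keeping the decision in a pure function makes it easy to test without a repo on disk.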

12. A CID-native atom market: the north star, not next month's release

Everything so far has been about a single project. From here on it's roadmap territory. I want to draw the line clearly between what this road can deliver and what's still north star, so nobody walks away thinking "next month."

The dominoes line up like this: CID.Interface is deterministic, so the same outward declaration produces the same CID.Interface — author and project don't matter. If ten thousand projects adopt the same governance protocol, each produces a list of CID-bearing atoms. One day in a new project you need idempotent-http-retry-with-jitter:

Locally compute the requirement's CID.Interface prefix
Query the cross-organization CID index
Match N candidate atoms (different implementations, languages, quality / trust tiers)
Each carries its own closure-packet, evidence, clean police record, and declared capabilities (network / file / DB permissions)
A human reviews → approves → atm infect wires it into your project, building the caller graph against your local contracts
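The first step, computing a CID.Interface locally, depends on one property: the same outward declaration must hash to the same value no matter who wrote it or how they ordered its fields. A minimal sketch of that property follows, assuming a sorted-key canonical form fed into SHA-256. The InterfaceDecl shape, the canonicalize rules, and the interfaceFingerprint name are all hypothetical; the framework's real canonicalization is defined in its spec, not here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical interface declaration, for illustration only.
interface InterfaceDecl {
  name: string;
  inputs: Record<string, string>;
  output: string;
}

// Serialize with recursively sorted object keys so that key order in the
// source declaration cannot change the resulting hash.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 over the canonical form: deterministic for a given declaration.
function interfaceFingerprint(decl: InterfaceDecl): string {
  return createHash("sha256").update(canonicalize(decl), "utf8").digest("hex");
}
```

Two projects that declare the same name, inputs, and output get the same fingerprint even if their JSON lists the keys in a different order, while any change to the declared contract yields a different one. That is the only thing a cross-project index would need in order to match on it.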

It fills a real gap that npm, Docker Hub, and Stack Overflow each leave open: npm hands you a package but not its internals; Stack Overflow hands you a snippet with no audit trail; Docker hands you an image at far too coarse a grain. Function-grain, governance-laden, machine-searchable code cells: this slot deserves to exist.

A few important boundaries. A cross-organization market is the last epoch, conditioned on Trust Tier and a zero-trust sandbox landing first. It sits alongside npm and Docker rather than replacing them — it fills the "function-grain, governed, recomposable" gap that those leave open. infect always runs advisory + human-reviewed; a stranger's atom never auto-wires itself into your code.^*

So, honestly: the CID atom market is the north star, not a promise for the next version. What I can deliver on a sensible timeline is making CID.Interface + the evidence chain + the patrol police work solidly inside a single project. The market emerges naturally once the same protocol gets stretched across organizations; it just needs ground beneath it first.

Back to the picture I ultimately want to see: every piece of code an AI writes should be a named, auditable, findable cell instead of an orphan stuck inside one repo. You don't have to wait for the market to start. It begins in the repo you're already in today, the moment every atom gets a deterministic CID.Interface.

13. Closing

In 2025 everyone competed on "how smart is my AI agent." In 2026 the question I want to ask is: when N smart agents edit one repo at once, does that repo still hold together?

My answer is the foundry + the multi-dimensional ID card + the seven-stage brake. Turn every piece of code into a named cell, every change into a hashed evidence record. Pin the seven INV-ATM invariants as charter red lines, keep the resident police families watching structural health, and let ORIENT→CLOSURE run as a lifecycle that won't let you skip stages. A few more rules in exchange for a real payoff: six months later my repo is still readable, still auditable, still reusable, and five agents in parallel can't smash it.

For a single small feature, this protocol is overkill. For a project that runs for months, has multiple AIs in parallel, may face compliance, and whose good atoms you'd like to share — a foundry with brakes is exactly the right shape.

This is an intro. For the actual implementation, API, police-family mechanics, and permission-model details, see github.com/eaglhuang/AI-Atomic-Framework. The framework is Apache 2.0 — fork the whole protocol, swap the governance bundle for your own, or use it in part (just the evidence chain, or just ATMChart).

* Notes: roadmap & current implementation status

To keep this article readable, "shipped" and "planned" sit on the same map. Here's the line, drawn cleanly — skim it once, come back when you need it.

Shipped (works once you clone): the seven-stage lifecycle + the node atm.mjs onefile runner; CID.Interface (=today's semanticFingerprint); ScopeLock + INV-ATM-002 "lock-before-edit"; closure-packet.v1 (with commandRuns SHA256); ATMChart drift detection; the seven INV-ATM red lines; advisory detection for Atomization / Dedup / Demand / Quality / Lifecycle / Rollback / Boundary / Map-Integration police; atomic-map.schema.json v0.1.0 + basic map integration tests.
Planned (roadmap, marked with ^* in the body):
- The other four CID dimensions (Strict / Effects / Semantic / Behavior): only Interface is shipped; the rest are proposals. Non-deterministic signals are forever banned from the identity hash.
- Team Agents lane: the team lease CLI, the permission table (task.lifecycle / git.write / file.write …), 15-role 4-layer collaboration, team-memory-shard.md, and patrol-report.md are all proposals in this lane. Today's lease only uses ScopeLockRecord + INV-ATM-002 — the minimum path. Memory shards and patrol reports are templates; they are NOT auto-loaded into the next agent's context. They are git markdown files, to be cited by a human or the Captain when the next task opens.
- Map Replacement Protocol v2-r2: §3's replacement.legacyUris[], the five-stage lifecycle (draft → shadow → canary → active → legacy-retired), map-equivalence-report, the create-map / test --equivalence-fixtures / test --propagate CLI, M26 Rescue Police, and M27 Disaster Recovery CLI belong to TASK-MRP-0011~0027 (17 task cards). MRP-0018 / 0021 are done; MRP-0026 is in progress; MRP-0022 (Daemon) / MRP-0024 (Guide Cache) are opt-in, off by default, with a kill switch.
- Parallel Conflict Advisor (§4): the atm tasks parallel CLI, the eight verdicts, and the merge-plan / hotspot reports described in §4 belong to TASK-CID-0005 (P0 planning-only contract card). CID / semanticFingerprint itself is deterministic and already shipped; the advisor's CLI contract is defined and the implementation lands in subsequent AAF target-repo cards. Write Broker / Steward lane integration (TASK-CID-0009~0012) is on the same P0 planning lane.
- CID atom market (§12): a cross-organization market is the north star. It needs the local layer to mature first, Trust Tier institutionalized, and a zero-trust sandbox shipped. It does not replace npm / Docker — it fills the "function-grain, governed, recomposable" middle layer. infect is always advisory + human-reviewed.
- Tier 3 evidence: local sandbox evidence only reaches Tier 2. To climb to Tier 3 / marketplace-grade, a closure must carry an external AttestationProvider signature with verifiable provenance (GitHub Actions is the first reference adapter). Mutation testing and adversarial QA never sit on the synchronous close path — they live in async police.

Formal specs, charter text, and warrant examples are at github.com/eaglhuang/AI-Atomic-Framework.

About the author

Eagl Huang is an independent engineer based in Taiwan, currently building AI-Atomic-Framework — a governance layer for multiple AI agents working on a single repo — and 3KLife, an AI-driven Three Kingdoms game project. He runs five agents concurrently against the same codebase every day, which is exactly why he cares whether this brake actually holds.


If this helped, share it with anyone else being driven mad by multi-agent dev. Thoughts and corrections welcome via GitHub Discussions.