Agent Workflow

How a Stronger Agent Can Lead Multi-Agent Development Without Turning the Repo into Chaos

When more models join the room, two illusions appear fast: everybody looks busy, and every report sounds like progress. The practical fix is much simpler than the tooling debate. We need stable dispatch slips, stable report formats, file boundaries, validator rules, and clear human checkpoints. Codex, Claude Code, Copilot, Gemini, MCP, and skills can all work on the same line when the language of coordination stays shared.

Lead model: Owns sequencing, boundaries, and final judgment. It does not need to write every patch itself.
Dispatch format: A shared format matters more than a shared vendor. That is what makes collaboration neutral.
Report symmetry: If reports mirror the dispatch fields, handoff quality goes up and confusion drops fast.

Why multi-agent teams get stuck

Most projects do not slow down because one model is weak. They slow down because every model is speaking in a different format.

As soon as multiple agents share one repository, familiar problems show up. Someone was asked to do read-only inspection and started editing anyway. Someone fixed a narrow bug and bundled a mountain of generated artifacts into the same commit. Someone reported success without listing validators, pre-existing failures, or the rest of the dirty working tree.

These are not separate accidents. They come from the same gap: no shared dispatch contract and no shared reporting contract. If the lead model keeps improvising the workflow in plain chat, the context gets heavier, the token bill climbs, and even the human reviewer loses the thread.

A practical leadership loop

[Diagram: four roles in one lane. Human + Lead Agent (sequence, scope, and final routing); Cheap sidecar (read-only scan, grep, risk check; spend tokens late, not early); Execution agent (Claude Code / Codex / Copilot; touch only the approved bag); Review agent (scope audit, review, validator check; PASS / CONCERN / BLOCK). Flow: Dispatch slip → Validation → Report → Human decision. PR, merge, push, Draft, and Ready all stay at the end of the lane.]
Figure 1: The stronger model keeps the map. Cheap sidecars scan first, execution agents patch, review agents judge, and the human keeps the final steering wheel.

Use cheap sidecars first

If the scope is still fuzzy, send a cheaper model to do the read-only preflight. Let it grep, list conflicts, and suggest validators before the stronger model spends serious context.

The lead model does not need to code every branch

A stronger model creates more value when it sequences work, resolves tradeoffs, and integrates results. Writing every patch itself is usually the slower option.

The dispatch slip is the floor, not the wallpaper

The real value of a dispatch slip is simple: any agent can read it and instantly know what it may touch, what it must not touch, and what the return format must look like.

The first line of the report must be: Roster code: 005; Model: <actual model used>

Task: State the purpose of this turn in one sentence.

repo:
Path or worktree location

Context in plain language:
Use 2-5 sentences to explain:
- why this work matters now
- how it relates to the previous step
- what decision this turn is supposed to unlock

Please do:
1. ...
2. ...
3. ...

Please report:
1. Conclusion: PASS / CONCERN / BLOCK
2. ...
3. ...
4. ...

Do not:
- ...
- ...
- ...

One plain-language summary:
Say what bag of work this really is.

Field | Why it matters | Practical note
First-line roster code | Confirms who caught the ball | Human-facing dispatch slips should use visible roster IDs like 001, 004, or 007.
Plain-language context | Explains why the turn exists | Keep it short. Focus on the knot this turn is supposed to untie.
Please do | Turns the task into executable steps | One slip, one main job. Analysis, delivery, and review work best when split.
Please report | Locks the return shape | The more stable this section is, the faster the lead model can integrate the result.
Do not | Prevents accidental scope drift | Lines like “do not stage, commit, or push” save an absurd amount of cleanup.
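
One way to keep the slip stable is to render it from a small data structure instead of retyping it in chat. The sketch below is only an illustration: the `DispatchSlip` class and its field names are hypothetical, chosen to match the template fields above, and the layout is an assumption rather than a fixed standard.

```python
from dataclasses import dataclass, field


def _numbered(items: list[str]) -> str:
    """Format items as a 1-based numbered list."""
    return "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))


@dataclass
class DispatchSlip:
    """Hypothetical container for the dispatch slip fields described above."""

    roster_code: str   # visible roster ID, e.g. "005"
    task: str          # purpose of the turn in one sentence
    repo: str          # path or worktree location
    context: str       # 2-5 plain-language sentences
    please_do: list[str] = field(default_factory=list)
    please_report: list[str] = field(default_factory=list)
    do_not: list[str] = field(default_factory=list)
    summary: str = ""  # what bag of work this really is

    def render(self) -> str:
        """Return the slip as plain text, in the same field order every time."""
        # The verdict is always report item 1, so the lead can find it quickly.
        report_items = ["Conclusion: PASS / CONCERN / BLOCK", *self.please_report]
        sections = [
            "The first line of the report must be: "
            f"Roster code: {self.roster_code}; Model: <actual model used>",
            f"Task: {self.task}",
            f"repo:\n{self.repo}",
            f"Context in plain language:\n{self.context}",
            "Please do:\n" + _numbered(self.please_do),
            "Please report:\n" + _numbered(report_items),
            "Do not:\n" + "\n".join(f"- {item}" for item in self.do_not),
            f"One plain-language summary:\n{self.summary}",
        ]
        return "\n\n".join(sections)
```

Calling `DispatchSlip(...).render()` produces the same skeleton for every assignee, which means the only things that change between turns are the field values.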

Reports should mirror dispatch slips

The most expensive report is a report that reads like a diary entry. The best report mirrors the original dispatch fields. If the slip asked for file boundaries, validators, and a PASS or BLOCK judgment, the reply should not invent a different structure halfway through.

[Diagram: dispatch slip fields paired with return report fields.]
Dispatch slip | Return report
1. Visible roster code and model | 1. Same roster code and model
2. Task and repo | 2. Actual repo worked on
3. Plain-language context | 3. Key findings
4. Please do | 4. Steps actually taken
5. Please report | 5. PASS / CONCERN / BLOCK
6. Do not | 6. Whether forbidden zones stayed untouched
7. Plain-language summary | 7. One sentence for the lead
Figure 2: When dispatch and report share the same skeleton, tool differences fade into the background. What remains is a neutral collaboration contract.
Capability roles such as “fast executor” or “scope auditor” work well inside the captain’s head. Human-facing dispatch slips should still use visible roster IDs. That keeps the workflow legible when the roster changes every day.
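
A report that mirrors the slip can be checked mechanically before the lead model spends any context on it. The following is a minimal sketch, assuming reports are plain text, open with the roster line from the template, and carry exactly one verdict line. The function name `check_report` and the exact line patterns are assumptions made for illustration.

```python
import re

# Assumed shape of the first report line, taken from the slip template.
FIRST_LINE = re.compile(r"^Roster code: (\d+); Model: (.+)$")
# A verdict line must end right after one of the three verdict words.
VERDICT = re.compile(
    r"^\s*(?:\d+\.\s*)?Conclusion:\s*(PASS|CONCERN|BLOCK)\s*$", re.MULTILINE
)


def check_report(report: str, expected_roster: str) -> list[str]:
    """Return a list of shape problems; an empty list means the shape is fine."""
    problems = []
    lines = report.strip().splitlines()
    first = lines[0].strip() if lines else ""

    match = FIRST_LINE.match(first)
    if not match:
        problems.append("first line is not 'Roster code: <id>; Model: <model>'")
    else:
        if match.group(1) != expected_roster:
            problems.append(
                f"roster code {match.group(1)} does not match slip {expected_roster}"
            )
        if match.group(2).strip().startswith("<"):
            problems.append("model placeholder was not filled in")

    if len(VERDICT.findall(report)) != 1:
        problems.append("expected exactly one 'Conclusion: PASS | CONCERN | BLOCK' line")
    return problems
```

A bare "done" fails both checks, and so does a report that echoes the unfilled template. This checks shape only; whether the verdict is justified still needs a reviewer.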

How humans fit into the loop

Humans create the most value at three points: deciding priority, deciding who may touch the main lane, and deciding when something is allowed to merge. Hold those three levers tightly and the whole system feels much calmer.

Bind the roster first

Start the day by mapping visible IDs to current people or models. Example: 001 handles docs and checklists, 004 handles route judgment, 007 handles fast execution. Internally you may still map capability roles. The outward-facing slip should stay numeric.
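
The daily binding can live in a small lookup table that rejects anything other than a visible numeric ID. This is a sketch under assumptions: the three-digit ID format and the `bind_roster` helper are hypothetical, the duties come from the example above, and the model names are deliberate placeholders.

```python
import re

ROSTER_ID = re.compile(r"\d{3}")  # assumed format: three digits, e.g. "007"


def bind_roster(entries: dict) -> dict:
    """Validate today's roster and return a copy keyed by visible ID."""
    roster = {}
    for roster_id, entry in entries.items():
        if not ROSTER_ID.fullmatch(roster_id):
            raise ValueError(f"roster ID must be three digits: {roster_id!r}")
        if not entry.get("model") or not entry.get("duty"):
            raise ValueError(f"roster entry {roster_id} needs a model and a duty")
        roster[roster_id] = dict(entry)
    return roster


# Duties follow the example in the text; model names are placeholders.
TODAY = bind_roster({
    "001": {"model": "<model A>", "duty": "docs and checklists"},
    "004": {"model": "<model B>", "duty": "route judgment"},
    "007": {"model": "<model C>", "duty": "fast execution"},
})
```

Capability labels such as "fast executor" can stay in the `duty` field, while the key that appears on the slip stays numeric.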

Decide which bag leaves first

Source fixes, evidence, generated artifacts, and docs each deserve their own bag. Smaller bags move first. Larger bags can sit in Draft review until the path clears.
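
Sorting changed files into bags is easy to automate once the team agrees on path rules. The sketch below assumes a hypothetical layout (`dist/` and `build/` for generated artifacts, `evidence/` and `*.log` for evidence, `docs/` and `*.md` for docs); real repositories will need their own patterns.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical path rules; the first matching rule wins.
BAG_RULES = [
    ("generated artifacts", ["dist/*", "build/*", "*.min.js"]),
    ("evidence", ["evidence/*", "*.log"]),
    ("docs", ["docs/*", "*.md"]),
]


def classify(path: str) -> str:
    """Return the bag a changed file belongs to."""
    for bag, patterns in BAG_RULES:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return bag
    return "source fix"  # anything unmatched is treated as source


def split_into_bags(paths: list[str]) -> list[tuple[str, list[str]]]:
    """Group paths by bag and order the bags smallest first."""
    bags = defaultdict(list)
    for path in paths:
        bags[classify(path)].append(path)
    # Smaller bags move first; the name breaks ties for a stable order.
    return sorted(bags.items(), key=lambda item: (len(item[1]), item[0]))
```

Feeding it the output of `git diff --name-only` gives the lead a suggested departure order, which a human can still override.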

Human checkpoint | What to do | Common mistake
Before dispatch | Confirm roster, priority, allowed files, and forbidden zones | Giving only a verbal instruction and leaving no transferable text behind
During execution | Step in only for decisions that truly need judgment | Spending senior time on low-level inspection work
At the end | Read the PR, validator results, and report shape before merge or push | Trusting “done” as a complete status by itself

Useful prompts and skill examples

Real teams often mix several tools at once: Codex manages worktrees, Claude Code handles larger patches, Copilot fills focused gaps, smaller Gemini or Flash-class models scout first, and MCP connects repos, issues, preview tools, or design surfaces. The useful trick is making all of them speak the same dispatch language.

Read-only scout prompt

The first line of the report must be: Roster code: 002; Model: <actual model used>

Task: perform a read-only preflight and tell me whether this work is safe to start.

repo:
/workspace/example-repo

Plain-language context:
The main lane is blocked by a large PR. This turn only needs to answer:
- which files are already being touched
- which validators matter
- whether this task should be split into separate bags

Please do:
1. git status -sb
2. git diff --name-only
3. classify source fix / generated artifacts / docs / evidence

Do not:
- edit or commit

Please report:
1. Conclusion: PASS / CONCERN / BLOCK
2. Recommended split
3. Recommended next assignee

Execution prompt

The first line of the report must be: Roster code: 007; Model: <actual model used>

Task: land only these three source files. Do not mix generated artifacts.

repo:
/workspace/example-repo

Plain-language context:
This turn is only for the source-fix bag. Evidence and runtime artifacts stay outside.

Please do:
1. run the minimum validators
2. stage only the allowlisted files
3. commit
4. return commit SHA and git status -sb

Please report:
1. Conclusion: PASS / CONCERN / BLOCK
2. Commit SHA
3. Validator result
4. Anything intentionally excluded
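
The "stage only the allowlisted files" step in the execution prompt can be verified before the commit happens. Below is a minimal sketch, assuming the allowlist is a set of repo-relative paths copied from the slip. The helper names and the verdict policy are assumptions for illustration; `git diff --cached --name-only` is the standard way to list staged files.

```python
import subprocess


def staged_paths(repo: str) -> list[str]:
    """List staged files in the repo using `git diff --cached --name-only`."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def outside_allowlist(staged: list[str], allowlist: set[str]) -> list[str]:
    """Return staged paths that the dispatch slip did not approve."""
    return sorted(path for path in staged if path not in allowlist)


def commit_verdict(staged: list[str], allowlist: set[str]) -> str:
    """Assumed policy: BLOCK on extras, CONCERN if nothing is staged, else PASS."""
    if outside_allowlist(staged, allowlist):
        return "BLOCK"
    if not staged:
        return "CONCERN"
    return "PASS"
```

An execution agent could run `commit_verdict(staged_paths(repo), allowlist)` right before committing and paste the result into the report next to the commit SHA.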

Useful skill and MCP combinations

  • Dispatch-standard skill: fixed fields, fixed report rules, fixed forbidden list.
  • Task-opening skill: task card creation, shard updates, dry-run opening report.
  • Encoding guard skill: scan touched files before a doc or HTML change turns into a mojibake party.
  • Browser QA skill: open the page, check the layout, verify language switching.
  • MCP repo / issue / design connectors: shared data sources mean fewer “I thought this was already done” moments.

A few painful traps that show up a lot

You asked for inspection and got a patch

If the dispatch slip does not lock down read-only mode and explicitly ban commit actions, many agents will interpret any named task as a start signal.

A small fix travels with a giant bag

Source fix, runtime artifacts, and governance evidence should not ride in the same commit. The review surface becomes huge, and rollback becomes expensive.

Someone pushes from a dirty working tree

When a working tree already contains background modifications, direct pushes can drag unrelated work along for the ride. Clean branches and Draft PRs save a lot of pain.
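
A pre-push guard for this trap can be as small as parsing `git status --porcelain`. The sketch below assumes the version 1 porcelain format, where each line starts with a two-character status code. The verdict policy (untracked files raise a CONCERN, tracked changes BLOCK) is an assumption, not a rule from any tool.

```python
def tree_verdict(porcelain: str) -> str:
    """Map `git status --porcelain` output to PASS / CONCERN / BLOCK."""
    # Do not strip the whole output: a leading space is part of the status code.
    entries = [line for line in porcelain.splitlines() if line.strip()]
    if not entries:
        return "PASS"  # clean working tree
    # "??" marks untracked files; any other code is a tracked change.
    if all(line[:2] == "??" for line in entries):
        return "CONCERN"
    return "BLOCK"
```

Here a clean tree passes, stray untracked files are flagged for a human to look at, and tracked modifications stop the push until they move to their own branch or bag.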

Everyone reports in a different shape

One person writes a summary, another pastes a SHA, another only says “done.” The captain then burns time translating formats instead of moving the project.

The healthiest pattern stays consistent: send a cheap read-only scout first, dispatch through a fixed field set, and let a review agent close the turn with PASS, CONCERN, or BLOCK. Large bags can wait in Draft PRs. Smaller bags can keep the road clear.

A solid starting checklist

  1. Choose one lead model to coordinate the turn.
  2. Bind the current roster first. Human-facing slips should keep visible numeric IDs.
  3. Use a cheap model for read-only scouting before the stronger model spends heavy context.
  4. Keep one dispatch slip focused on one main job.
  5. Use the same fields for dispatch and report.
  6. Split source fixes, evidence, generated artifacts, and docs into separate bags.
  7. Push fixed procedures into skills and MCP-driven tools. Leave judgment to humans and stronger models.
  8. Save PR, merge, and push actions for the end of the lane.

A crowded tool stack is manageable. A crowded conversation format is where teams usually lose the thread.