Agent Workflow
How a Stronger Agent Can Lead Multi-Agent Development Without Turning the Repo into Chaos
When more models join the room, two illusions appear fast: everybody looks busy, and every report sounds like progress. The practical fix is simpler than the tooling debate suggests: stable dispatch slips, stable report formats, file boundaries, validator rules, and clear human checkpoints. Codex, Claude Code, Copilot, Gemini, MCP, and skills can all work in the same lane as long as the coordination language stays shared.
Why multi-agent teams get stuck
Most projects do not slow down because one model is weak. They slow down because every model is speaking in a different format.
As soon as multiple agents share one repository, familiar problems show up. Someone was asked to do read-only inspection and started editing anyway. Someone fixed a narrow bug and bundled a mountain of generated artifacts into the same commit. Someone reported success without listing validators, pre-existing failures, or the rest of the dirty working tree.
These are not separate accidents. They come from the same gap: no shared dispatch contract and no shared reporting contract. If the lead model keeps improvising the workflow in plain chat, the context gets heavier, the token bill climbs, and even the human reviewer loses the thread.
A practical leadership loop
Use cheap sidecars first
If the scope is still fuzzy, send a cheaper model to do the read-only preflight. Let it grep, list conflicts, and suggest validators before the stronger model spends serious context.
The lead model does not need to code every branch
A stronger model creates more value when it sequences work, resolves tradeoffs, and integrates results. Writing every patch itself is usually the slower option.
The dispatch slip is the floor, not the wallpaper
The real value of a dispatch slip is simple: any agent can read it and instantly know what it may touch, what it must not touch, and what the return format must look like.
The first line of the report must be: Roster code: 005; Model: <actual model used>
Task: State the purpose of this turn in one sentence.
repo:
Path or worktree location
Plain-language context:
Use 2-5 sentences to explain:
- why this work matters now
- how it relates to the previous step
- what decision this turn is supposed to unlock
Please do:
1. ...
2. ...
3. ...
Please report:
1. Conclusion: PASS / CONCERN / BLOCK
2. ...
3. ...
4. ...
Do not:
- ...
- ...
- ...
One plain-language summary:
Say what bag of work this really is.
| Field | Why it matters | Practical note |
|---|---|---|
| First-line roster code | Confirms who caught the ball | Human-facing dispatch slips should use visible roster IDs like 001, 004, or 007. |
| Plain-language context | Explains why the turn exists | Keep it short. Focus on the knot this turn is supposed to untie. |
| Please do | Turns the task into executable steps | One slip, one main job. Analysis, delivery, and review work best when split. |
| Please report | Locks the return shape | The more stable this section is, the faster the lead model can integrate the result. |
| Do not | Prevents accidental scope drift | Lines like “do not stage, commit, or push” save an absurd amount of cleanup. |
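One way to keep the slip stable is to generate it from a small data structure instead of retyping it in chat. The sketch below is only an illustration: the `DispatchSlip` class and its field names are hypothetical, chosen to follow the template fields above, and the rendered wording is an assumption you would adapt to your own slip.

```python
from dataclasses import dataclass, field


@dataclass
class DispatchSlip:
    """Hypothetical container for the slip fields described above."""
    roster_code: str            # visible numeric ID, e.g. "007"
    task: str                   # one-sentence purpose of the turn
    repo: str                   # path or worktree location
    context: str                # 2-5 sentences of plain-language context
    please_do: list[str]
    please_report: list[str]
    do_not: list[str] = field(default_factory=list)
    summary: str = ""

    def render(self) -> str:
        """Render the slip as plain text, in the field order of the template."""
        lines = [
            "The first line of the report must be: "
            f"Roster code: {self.roster_code}; Model: <actual model used>",
            f"Task: {self.task}",
            "repo:",
            self.repo,
            "Plain-language context:",
            self.context,
            "Please do:",
        ]
        lines += [f"{i}. {step}" for i, step in enumerate(self.please_do, 1)]
        lines.append("Please report:")
        lines += [f"{i}. {item}" for i, item in enumerate(self.please_report, 1)]
        if self.do_not:
            lines.append("Do not:")
            lines += [f"- {item}" for item in self.do_not]
        if self.summary:
            lines.append("One plain-language summary:")
            lines.append(self.summary)
        return "\n".join(lines)


slip = DispatchSlip(
    roster_code="002",
    task="perform a read-only preflight",
    repo="/workspace/example-repo",
    context="The main lane is blocked by a large PR.",
    please_do=["git status -sb", "git diff --name-only"],
    please_report=["Conclusion: PASS / CONCERN / BLOCK"],
    do_not=["edit, stage, commit, or push"],
)
print(slip.render())
```

Because every slip comes out of the same `render` method, the field order cannot drift between dispatches, which is the property the table is asking for.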
Reports should mirror dispatch slips
The most expensive report is a report that reads like a diary entry. The best report mirrors the original dispatch fields. If the slip asked for file boundaries, validators, and a PASS or BLOCK judgment, the reply should not invent a different structure halfway through.
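The mirroring rule can be checked mechanically before the lead model reads a single sentence. The following is a minimal sketch, assuming reports are plain text, roster codes are three digits, and the verdict sits on its own `Conclusion:` line; `check_report` is a hypothetical helper, not part of any tool named here.

```python
import re

VERDICTS = {"PASS", "CONCERN", "BLOCK"}
FIRST_LINE = re.compile(r"^Roster code: (\d{3}); Model: (.+)$")
# Accepts "Conclusion: PASS" with or without a leading list number.
CONCLUSION = re.compile(r"^\s*(?:\d+\.\s*)?Conclusion:\s*(\w+)\s*$", re.MULTILINE)


def check_report(report: str, expected_roster: str) -> list[str]:
    """Return shape problems; an empty list means the report mirrors the slip."""
    problems = []
    lines = report.strip().splitlines()
    first = FIRST_LINE.match(lines[0].strip()) if lines else None
    if first is None:
        problems.append("first line is not 'Roster code: NNN; Model: <name>'")
    elif first.group(1) != expected_roster:
        problems.append(
            f"roster code {first.group(1)} does not match expected {expected_roster}"
        )
    verdict = CONCLUSION.search(report)
    if verdict is None or verdict.group(1) not in VERDICTS:
        problems.append("missing a single PASS / CONCERN / BLOCK conclusion")
    return problems


good = "Roster code: 007; Model: example-model\n1. Conclusion: PASS\n2. Commit SHA: abc1234"
print(check_report(good, "007"))
print(check_report("done", "007"))
```

A report that fails this check can be bounced back automatically, so the lead model only spends context on replies that already have the agreed shape.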
How humans fit into the loop
Humans create the most value at three points: deciding priority, deciding who may touch the main lane, and deciding when something is allowed to merge. Hold those three levers tightly and the whole system feels much calmer.
Bind the roster first
Start the day by mapping visible IDs to current people or models. Example: 001 handles docs and checklists, 004 handles route judgment, 007 handles fast execution. Internally you may still map capability roles. The outward-facing slip should stay numeric.
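A roster binding can be as small as one validated dictionary. This sketch reuses the example IDs and duties from the paragraph above; the three-digit rule and the helper names `bind_roster` and `assignee_for` are assumptions made for illustration.

```python
import re


def bind_roster(assignments: dict[str, str]) -> dict[str, str]:
    """Validate a day's roster: keys are visible three-digit IDs, values are duties."""
    for code in assignments:
        if not re.fullmatch(r"\d{3}", code):
            raise ValueError(f"roster code must be three digits, got {code!r}")
    return dict(assignments)


roster = bind_roster({
    "001": "docs and checklists",
    "004": "route judgment",
    "007": "fast execution",
})


def assignee_for(code: str) -> str:
    """Look up the duty bound to a roster code today; unbound codes fail loudly."""
    if code not in roster:
        raise KeyError(f"roster code {code} is not bound today")
    return roster[code]


print(assignee_for("004"))
```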
Decide which bag leaves first
Source fixes, evidence, generated artifacts, and docs each deserve their own bag. Smaller bags move first. Larger bags can wait in a Draft PR until the path clears.
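Splitting changed files into bags is simple enough to script. The sketch below assumes a purely path-based rule set; the directory prefixes and suffixes are hypothetical placeholders, and every repository will need its own.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical path rules, checked in order; adjust them per repository.
RULES = [
    ("generated artifacts", ("dist/", "build/", "generated/")),
    ("evidence", ("evidence/", "reports/")),
    ("docs", ("docs/",)),
]
DOC_SUFFIXES = {".md", ".rst", ".txt"}


def classify(path: str) -> str:
    """Assign one changed path to a bag; anything unmatched counts as a source fix."""
    for bag, prefixes in RULES:
        if path.startswith(prefixes):
            return bag
    if PurePosixPath(path).suffix in DOC_SUFFIXES:
        return "docs"
    return "source fix"


def split_into_bags(paths: list[str]) -> list[tuple[str, list[str]]]:
    """Group paths by bag and order the bags smallest first."""
    bags = defaultdict(list)
    for path in paths:
        bags[classify(path)].append(path)
    return sorted(bags.items(), key=lambda item: len(item[1]))


changed = ["src/app.py", "dist/app.js", "dist/app.css", "dist/app.map", "README.md"]
for bag, files in split_into_bags(changed):
    print(bag, files)
```

The input list is the kind of output `git diff --name-only` produces, so a read-only scout can run the classification without touching the working tree.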
| Human checkpoint | What to do | Common mistake |
|---|---|---|
| Before dispatch | Confirm roster, priority, allowed files, and forbidden zones | Giving only a verbal instruction and leaving no transferable text behind |
| During execution | Step in only for decisions that truly need judgment | Spending senior time on low-level inspection work |
| At the end | Read the PR, validator results, and report shape before merge or push | Trusting “done” as a complete status by itself |
Useful prompts and skill examples
Real teams often mix several tools at once: Codex manages worktrees, Claude Code handles larger patches, Copilot fills focused gaps, smaller Gemini or Flash-class models scout first, and MCP connects repos, issues, preview tools, or design surfaces. The useful trick is making all of them speak the same dispatch language.
Read-only scout prompt
The first line of the report must be: Roster code: 002; Model: <actual model used>
Task: perform a read-only preflight and tell me whether this work is safe to start.
repo:
/workspace/example-repo
Plain-language context:
The main lane is blocked by a large PR. This turn only needs to answer:
- which files are already being touched
- which validators matter
- whether this task should be split into separate bags
Please do:
1. git status -sb
2. git diff --name-only
3. classify source fix / generated artifacts / docs / evidence
Please report:
1. Conclusion: PASS / CONCERN / BLOCK
2. Recommended split
3. Recommended next assignee
Do not:
- edit, stage, commit, or push
Execution prompt
The first line of the report must be: Roster code: 007; Model: <actual model used>
Task: land only these three source files. Do not mix generated artifacts.
repo:
/workspace/example-repo
Plain-language context:
This turn is only for the source-fix bag. Evidence and runtime artifacts stay outside.
Please do:
1. run the minimum validators
2. stage only the allowlisted files
3. commit
4. return commit SHA and git status -sb
Please report:
1. Conclusion: PASS / CONCERN / BLOCK
2. Commit SHA
3. Validator result
4. Anything intentionally excluded
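The "stage only the allowlisted files" step is easy to verify before the commit happens. This is a rough sketch: `unexpected_staged` and `staged_files` are hypothetical helpers, and the allowlist is assumed to be a set of repository-relative paths taken from the slip.

```python
import subprocess


def staged_files(repo: str) -> list[str]:
    """List paths currently staged in the repo, via `git diff --cached --name-only`."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def unexpected_staged(staged: list[str], allowlist: set[str]) -> list[str]:
    """Return staged paths that are not on the slip's allowlist, sorted."""
    return sorted(set(staged) - allowlist)


# Example with literal paths, so it runs without a repository.
allowlist = {"src/a.py", "src/b.py"}
print(unexpected_staged(["src/a.py", "dist/a.js"], allowlist))
```

If the returned list is not empty, the agent should report CONCERN or BLOCK instead of committing.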
Useful skill and MCP combinations
- Dispatch-standard skill: fixed fields, fixed report rules, fixed forbidden list.
- Task-opening skill: task card creation, shard updates, dry-run opening report.
- Encoding guard skill: scan touched files before a doc or HTML change turns into a mojibake party.
- Browser QA skill: open the page, check the layout, verify language switching.
- MCP repo / issue / design connectors: shared data sources mean fewer “I thought this was already done” moments.
A few painful traps that show up a lot
You asked for inspection and got a patch
If the dispatch slip does not lock down read-only mode and explicitly ban commit actions, many agents will interpret any named task as a start signal.
A small fix travels with a giant bag
Source fix, runtime artifacts, and governance evidence should not ride in the same commit. The review surface becomes huge, and rollback becomes expensive.
Someone pushes from a dirty working tree
When a working tree already contains background modifications, direct pushes can drag unrelated work along for the ride. Clean branches and Draft PRs save a lot of pain.
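A dirty tree can be detected before any commit or push by reading `git status --porcelain` output. The sketch below handles only the simple two-letter status plus path form of the version 1 format; renames and quoted paths are deliberately not handled, and `unrelated_changes` is a hypothetical helper name.

```python
def dirty_entries(porcelain: str) -> list[tuple[str, str]]:
    """Parse `git status --porcelain` (v1) lines into (status, path) pairs."""
    entries = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY path"
        # Columns 0-1 are the status letters, column 2 is a space, then the path.
        entries.append((line[:2], line[3:]))
    return entries


def unrelated_changes(porcelain: str, allowlist: set[str]) -> list[str]:
    """Return modified or untracked paths that do not belong to this turn."""
    return [path for _, path in dirty_entries(porcelain) if path not in allowlist]


sample = " M src/app.py\n?? scratch/notes.txt\nM  docs/guide.md\n"
print(unrelated_changes(sample, {"src/app.py"}))
```

Any path in the result is background work that belongs on a separate clean branch or in its own Draft PR.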
Everyone reports in a different shape
One agent writes a summary, another pastes a SHA, another only says “done.” The lead model then burns time translating formats instead of moving the project.
The healthiest pattern stays consistent: send a cheap read-only scout first, dispatch through a fixed field set, and let a review agent close the turn with PASS, CONCERN, or BLOCK. Large bags can wait in Draft PRs. Smaller bags can keep the road clear.
A solid starting checklist
- Choose one lead model to coordinate the turn.
- Bind the current roster first. Human-facing slips should keep visible numeric IDs.
- Use a cheap model for read-only scouting before the stronger model spends heavy context.
- Keep one dispatch slip focused on one main job.
- Use the same fields for dispatch and report.
- Split source fixes, evidence, generated artifacts, and docs into separate bags.
- Push fixed procedures into skills and MCP-driven tools. Leave judgment to humans and stronger models.
- Save PR, merge, and push actions for the end of the lane.
A crowded tool stack is manageable. A crowded conversation format is where teams usually lose the thread.