3KProject Engineering Notes
How I Use Harness Engineering to Make 3KProject More Stable
This article is my working note on how I engineer the workflow around 3KProject. The project already had instructions, skills, task cards, a doc-id registry, a context budget guard, UI contract validation, and runtime smoke checks. What actually made the agent more stable, though, was moving the workflow away from model guessing and toward computation-backed validation.
A Core Idea
Harness is not the model itself. It is the full set of mechanisms around the model that make mistakes less likely and make recovery easier when mistakes do happen.
If you think of a coding agent as a fast engine for producing code, then the harness is the steering wheel, the brakes, the dashboard, and the fuse box. My experience inside 3KProject is simple: without that harness, the model can look brilliant at times but stays unstable; with it, spec summaries, task cards, validators, and handoffs start behaving like a repeatable production line.
The Two Axes That Matter Most
When I planned these tools for 3KProject, the most useful move was reducing control to two simple axes. The first is guidance before action versus feedback after action. The second is what can be decided computationally versus what still requires semantic judgment. That framing tells me where to invest first, instead of blindly piling on more prompt text.
Feedforward
Before the agent edits anything, give it a spec summary, file scope, task contracts, and explicit no-go zones. The goal is not a longer prompt. The goal is fewer wasted attempts.
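File scope and no-go zones can be enforced mechanically instead of only being stated in the prompt. Here is a minimal sketch, assuming a hypothetical task-card shape in which `fileScope` and `noGo` are lists of path prefixes; the field names and the `assets/scripts/...` paths are illustrative, not 3KProject's actual format.

```javascript
// Hypothetical task-card fields: fileScope and noGo are lists of path prefixes.
// Returns one short message per changed file that breaks the card's boundaries.
function checkScope(card, changedFiles) {
  const violations = [];
  for (const file of changedFiles) {
    // Normalize Windows separators so prefixes compare consistently.
    const path = file.replace(/\\/g, "/");
    if (card.noGo.some((prefix) => path.startsWith(prefix))) {
      violations.push(`${path}: inside a no-go zone`);
    } else if (!card.fileScope.some((prefix) => path.startsWith(prefix))) {
      violations.push(`${path}: outside the task's file scope`);
    }
  }
  return violations;
}

// Illustrative card; paths are made up for the demo.
const card = {
  id: "T-001",
  fileScope: ["assets/scripts/ui/", "docs/ui/"],
  noGo: ["assets/scripts/ui/generated/"],
};

console.log(checkScope(card, [
  "assets/scripts/ui/PanelView.ts",
  "assets/scripts/ui/generated/Bindings.ts",
  "assets/scripts/battle/Damage.ts",
]));
```

Prefixes end with `/` so that `ui/` cannot accidentally match a sibling such as `uikit/`. A real check would probably take the changed-file list from `git diff --name-only`.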
Feedback
Return the results of executable validation immediately after each change. If the failure message is short and specific enough, even a smaller model can converge through local fixes.
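One way to keep failure messages short is to filter raw validator output before it reaches the agent. This is a rough sketch with a hypothetical `summarizeFailure` helper; the `error|fail` pattern is a crude heuristic chosen for illustration, and each real validator would deserve its own parser.

```javascript
// Keep only the first few error-looking lines so the agent sees a short, specific message.
function summarizeFailure(gateName, rawOutput, maxLines = 3) {
  const lines = rawOutput
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => /error|fail/i.test(line)); // heuristic: keep lines that look like failures
  const shown = lines.slice(0, maxLines);
  const hidden = lines.length - shown.length;
  const header = `[${gateName}] ${lines.length} problem(s)`;
  const footer = hidden > 0 ? [`... ${hidden} more omitted`] : [];
  return [header, ...shown, ...footer].join("\n");
}

// Illustrative tsc-style output; file names are made up.
const raw = [
  "Compiling...",
  "src/ui/Panel.ts(12,5): error TS2322: Type 'string' is not assignable to type 'number'.",
  "src/ui/Panel.ts(30,1): error TS2304: Cannot find name 'foo'.",
  "src/ui/List.ts(8,9): error TS2339: Property 'id' does not exist on type 'Row'.",
  "src/ui/List.ts(9,9): error TS2339: Property 'name' does not exist on type 'Row'.",
  "Done.",
].join("\n");

console.log(summarizeFailure("syntax-check", raw));
```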
Split Harness Into Three Buckets So Governance Stays Focused
Early in 3KProject I fell into a common trap: whenever something felt important, I added another rule, and eventually the rules became an unmaintainable pile. The steadier approach was to split harness work into three governance dimensions: maintainability, architecture fitness, and behavioural correctness.
This split matters in 3KProject because it turns quality from an abstract slogan into concrete responsibilities. Maintainability protects day-to-day changeability. Architecture fitness protects long-term evolution. Behavioural correctness checks whether the system actually does the right thing. When all three are mixed together, the rules keep growing while the feedback loop gets weaker.
3KProject’s Current Strengths and Gaps
Looking back at the current internal flow of 3KProject, feedforward is in decent shape. Instructions, workflow skills, task cards, consensus docs, the doc-id registry, and the context budget guard are all in active use. The weaker area is computational feedback, which is still incomplete.
What 3KProject Already Got Right
Writing experience down as instructions, skill flows, task cards, summary cards, and spec indexes was my first real step in turning an external harness into project infrastructure. Those mechanisms genuinely improved the agent’s first-pass success rate.
What I Most Want to Add Next
Without high-frequency, low-cost computational feedback, the agent is still guessing in the end. On the surface it looks like engineering, but in practice it only moves human review further downstream.
The Small-Model Workflow I Want to Land in 3KProject
The most valuable part of this approach for me is that it does not only serve the strongest models. 3KProject has a lot of internal specs, UI surfaces, data, and tooling. As long as I split work into atomic steps and give every step explicit inputs, outputs, and validation commands, even medium or small models can deliver complex features reliably.
Once the workflow looks like the diagram above, the LLM’s job inside 3KProject becomes very simple: read the task card, edit locally, read the failure message, and repair locally. I try to hand every judgment about whether something is actually correct to the type system, spec validators, boundary checkers, and fixture comparisons.
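The read, edit, validate, repair cycle can be sketched as a bounded loop. `editWithModel` and `runGates` are hypothetical hooks, not existing 3KProject APIs, and both are synchronous here only to keep the example short; real model calls would be asynchronous.

```javascript
// A bounded edit-and-repair loop: the model edits, deterministic gates judge,
// and only the short failure message goes back to the model.
// editWithModel and runGates are hypothetical hooks supplied by the caller.
function repairLoop({ card, editWithModel, runGates, maxAttempts = 3 }) {
  let feedback = null; // first attempt: the model sees only the task card
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    editWithModel(card, feedback);
    const result = runGates(card);
    if (result.ok) return { ok: true, attempts: attempt };
    feedback = result.message; // the next edit is driven by this message
  }
  // Out of budget: stop and escalate instead of letting the model keep guessing.
  return { ok: false, attempts: maxAttempts, lastFailure: feedback };
}

// Demo with stubs: the gates fail once, then pass on the second attempt.
const seen = [];
let runs = 0;
const outcome = repairLoop({
  card: { id: "T-004" },
  editWithModel: (_card, fb) => seen.push(fb),
  runGates: () => (++runs < 2 ? { ok: false, message: "[ui-contract] 1 problem(s)" } : { ok: true }),
});
console.log(outcome, seen);
```

The attempt cap matters as much as the loop itself: when the gates keep failing, the right move is to escalate to a human.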
Example 1: Atomic Task Decomposer
node tools/task-decomposer.js \
  --feature "UI contract gate consolidation" \
  --spec "docs/ui/UI-tech-spec.md" \
  --output-dir "docs/tasks/"
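task-decomposer.js is on the list of tools still to build, so the command above describes intent rather than an existing script. A minimal sketch of its output side might look like this, assuming the ordered step descriptions already exist (from a model or a human) and that each card carries a hypothetical `dependsOn` field so the cards run in sequence. The step texts in the demo are invented.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical output side of task-decomposer.js: one atomic card per step,
// each depending on the previous one so the cards execute in sequence.
function buildTaskCards(feature, spec, steps, validate) {
  return steps.map((step, i) => ({
    id: `T-${String(i + 1).padStart(3, "0")}`,
    feature,
    spec,
    goal: step,
    dependsOn: i === 0 ? [] : [`T-${String(i).padStart(3, "0")}`],
    validate, // every card carries the commands that decide whether it is done
  }));
}

// Write each card as its own JSON file so the agent can load exactly one at a time.
function writeTaskCards(cards, outputDir) {
  fs.mkdirSync(outputDir, { recursive: true });
  return cards.map((card) => {
    const file = path.join(outputDir, `${card.id}.json`);
    fs.writeFileSync(file, JSON.stringify(card, null, 2) + "\n", "utf8");
    return file;
  });
}

const cards = buildTaskCards(
  "UI contract gate consolidation",
  "docs/ui/UI-tech-spec.md",
  ["Inventory the existing UI validators", "Route them through one gate entry point", "Delete the duplicated checks"],
  ["node tools_node/validate-ui-specs.js"]
);
console.log(cards.map((c) => `${c.id} <- [${c.dependsOn}] ${c.goal}`).join("\n"));
```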
Example 2: Computational Gate Configuration
{
  "gates": [
    {
      "name": "syntax-check",
      "cmd": "npx tsc --noEmit --project tsconfig.json",
      "priority": 1
    },
    {
      "name": "encoding-check",
      "cmd": "node tools_node/check-encoding-touched.js",
      "priority": 2
    },
    {
      "name": "domain-data",
      "cmd": "node tools_node/validate-generals-data.js",
      "priority": 3
    },
    {
      "name": "ui-contract",
      "cmd": "node tools_node/validate-ui-specs.js",
      "priority": 4
    },
    {
      "name": "runtime-registry",
      "cmd": "node tools_node/check-ui-runtime-state-registry.js",
      "priority": 5
    },
    {
      "name": "import-boundary",
      "cmd": "node tools_node/check-import-boundaries.js",
      "priority": 5
    }
  ]
}
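Here is a sketch of how a compute-gate.js entry point could consume this configuration. The staging behaviour is my assumption, not something the config itself defines: gates run in ascending `priority`, gates sharing a priority (like `runtime-registry` and `import-boundary`) form one stage, and the run stops after the first stage with a failure so the cheapest failure surfaces first. The demo swaps in a fake runner so it does not need tsc or the real validators.

```javascript
const { spawnSync } = require("node:child_process");

// Default runner: execute one gate command through the shell and capture its output.
function runCommand(cmd) {
  const r = spawnSync(cmd, { shell: true, encoding: "utf8" });
  return { code: r.status ?? 1, output: `${r.stdout ?? ""}${r.stderr ?? ""}` };
}

// Run gates in ascending priority; gates with equal priority form one stage.
// Stop after the first stage that has any failure.
function runGates(config, run = runCommand) {
  const stages = new Map();
  for (const gate of config.gates) {
    if (!stages.has(gate.priority)) stages.set(gate.priority, []);
    stages.get(gate.priority).push(gate);
  }
  const priorities = [...stages.keys()].sort((a, b) => a - b);
  const results = [];
  for (const p of priorities) {
    const failed = [];
    for (const gate of stages.get(p)) {
      const { code, output } = run(gate.cmd);
      results.push({ name: gate.name, ok: code === 0 });
      if (code !== 0) failed.push({ name: gate.name, output });
    }
    if (failed.length > 0) return { ok: false, stage: p, failed, results };
  }
  return { ok: true, results };
}

// Demo with a fake runner instead of real commands: the data gate fails.
const demoConfig = {
  gates: [
    { name: "syntax-check", cmd: "tsc", priority: 1 },
    { name: "domain-data", cmd: "validate-data", priority: 3 },
    { name: "ui-contract", cmd: "validate-ui", priority: 4 },
  ],
};
const fake = (cmd) =>
  cmd === "validate-data" ? { code: 1, output: "generals.json: missing id" } : { code: 0, output: "" };
const report = runGates(demoConfig, fake);
console.log(report.ok, report.stage, report.failed.map((f) => f.name));
```

Injecting the runner keeps the ordering logic testable without real tools; in practice `runGates` would receive the parsed JSON above and use the default `runCommand`.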
The Five Tools I Most Want to Add Next
3KProject does not really lack process documents. What it lacks are feedback gates that can run every day and feed their results straight back into the agent. So for me, the efficient move is not writing more policy but building the few tools that would actually be used daily.
- compute-gate.js: unify type checks, encoding checks, data validation, and contract validation behind one entry point.
- check-import-boundaries.js: turn module boundaries from verbal convention into tool-enforced guardrails.
- approved-fixture-check.js: preserve human-approved expected outputs as comparable baselines.
- task-decomposer.js: split large requests into atomic task cards that can be executed in sequence.
- harness-health-report.js: produce a report in a fixed format showing which guides and sensors are still hollow.
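As an example of turning a verbal convention into a guardrail, here is a rough sketch of the core of an import-boundary check. The layer prefixes in `RULES` are hypothetical, only relative imports are resolved, and the regex scan is a simplification; a production check would more likely use a real parser such as the TypeScript compiler API.

```javascript
const path = require("node:path");

// Hypothetical boundary rules: files under `from` must not import anything under `forbid`.
const RULES = [
  { from: "src/core/", forbid: ["src/ui/", "src/platform/"] },
  { from: "src/ui/", forbid: ["src/platform/"] },
];

// Matches `... from "x"`, `require("x")`, and side-effect `import "x"` lines.
const IMPORT_RE = /\bfrom\s+["']([^"']+)["']|\brequire\(\s*["']([^"']+)["']\s*\)|^\s*import\s+["']([^"']+)["']/gm;

function checkImportBoundaries(file, source) {
  const violations = [];
  const rules = RULES.filter((r) => file.startsWith(r.from));
  for (const m of source.matchAll(IMPORT_RE)) {
    const spec = m[1] ?? m[2] ?? m[3];
    if (!spec.startsWith(".")) continue; // package imports are out of scope for this sketch
    // Resolve the relative specifier against the importing file's directory.
    const target = path.posix.join(path.posix.dirname(file), spec);
    for (const rule of rules) {
      const hit = rule.forbid.find((prefix) => target.startsWith(prefix));
      if (hit) violations.push(`${file}: imports ${spec} (${target}), which crosses into ${hit}`);
    }
  }
  return violations;
}

const source = [
  'import { clamp } from "./math";',
  'import { PanelView } from "../ui/PanelView";',
  'const fs = require("fs");',
].join("\n");
console.log(checkImportBoundaries("src/core/battle.ts", source));
```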
What these five tools share is that they translate abstract knowledge into repeatable operations. In 3KProject, once a rule becomes executable, the agent can actually be governed. Once a result becomes measurable, I can discuss improvement with the team instead of debating feelings.
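For approved-fixture-check.js, the essential property is that the baseline changes only through an explicit human approval, never as a side effect of the agent's run. A minimal sketch, with hypothetical `compareToApproved` and `approve` functions and a plain-text baseline file:

```javascript
const fs = require("node:fs");

// Compare a freshly produced output against a human-approved baseline file.
// Only the first differing line is reported, to keep the feedback short.
function compareToApproved(approvedPath, actual) {
  if (!fs.existsSync(approvedPath)) {
    return { ok: false, message: `${approvedPath}: no approved baseline yet; a human must approve one first` };
  }
  const expected = fs.readFileSync(approvedPath, "utf8").replace(/\r\n/g, "\n").split("\n");
  const got = actual.replace(/\r\n/g, "\n").split("\n");
  const n = Math.max(expected.length, got.length);
  for (let i = 0; i < n; i++) {
    if (expected[i] !== got[i]) {
      const want = JSON.stringify(expected[i] ?? "<end of file>");
      const have = JSON.stringify(got[i] ?? "<end of file>");
      return { ok: false, message: `${approvedPath}:${i + 1}: expected ${want}, got ${have}` };
    }
  }
  return { ok: true, message: `${approvedPath}: matches approved output` };
}

// Explicit human action: record the reviewed output as the new approved baseline.
function approve(approvedPath, actual) {
  fs.writeFileSync(approvedPath, actual, "utf8");
}
```

A human would call `approve(path, output)` once after reviewing an output; every later run calls `compareToApproved(path, output)` and, on mismatch, hands the one-line message back to the agent.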
Rollout Order: Patch the Cheap, Stable, Daily Sensors First
When I set the rollout order for 3KProject, I care about which checks are cheapest, most stable, and most likely to run every day. The best strategy is to start with low-cost, high-frequency checks, then add behaviour and architecture guards, and only later take on expensive semantic review.
In one sentence, my priority order in 3KProject is this: let machines do the deterministic judgment they are good at first, then let models do the new-content generation and semantic understanding they are good at.
The Final Decision Rule
Back in 3KProject, my decision rule is very simple now: if a quality judgment can be made by scripts, the type system, schema checks, snapshots, or fixtures, then I should not leave it to an LLM to guess. The model is most valuable when it generates something new, understands ambiguous requirements, and fills in missing context, not when it acts like an expensive, unstable if/else engine.
Better Handed to Models
Requirement decomposition, code writing, refactoring suggestions, document consolidation, semantic diff reading, and high-level design tradeoffs.
Better Handed to Tools
Type correctness, naming rules, module boundaries, data formats, fixed-output comparison, and encoding integrity.
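Encoding integrity is a good example of a check that belongs to tools. The gate configuration references `check-encoding-touched.js`, whose implementation is not shown in these notes; this is an independent sketch of the kind of per-file test such a tool could run, assuming the touched-file list comes from something like `git diff --name-only`. It flags a UTF-8 BOM, invalid UTF-8, and literal U+FFFD replacement characters.

```javascript
const fs = require("node:fs");

// Check one file's bytes for common encoding problems in edited files.
function checkEncoding(file, bytes = fs.readFileSync(file)) {
  const problems = [];
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    problems.push(`${file}: starts with a UTF-8 BOM`);
  }
  let text;
  try {
    // fatal: true makes the decoder throw on invalid UTF-8 instead of substituting U+FFFD.
    text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    problems.push(`${file}: not valid UTF-8`);
    return problems;
  }
  // A literal U+FFFD usually means text was already mis-decoded and saved back.
  const bad = text.indexOf("\uFFFD");
  if (bad !== -1) problems.push(`${file}: contains U+FFFD replacement character at index ${bad}`);
  return problems;
}

console.log(checkEncoding("demo.ts", Buffer.from([0x61, 0xff])));
```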
Inside 3KProject, my rule now is to turn as much model-side judgment as possible into scripts, and save the LLM for the work that really needs understanding and creation.