fix-agents-md — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-1487`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Created	2026-05-02T02:31:00.718635052+00:00
Started	2026-05-02T02:31:45.179229704+00:00
Completed	2026-05-02T02:52:42.139061195+00:00
Tags	`priority-high,fix,docs,agents,prompting`, `eval-scheduled`
Eval score	0.76
└ blocking impact	0.80
└ completeness	0.70
└ constraint fidelity	0.10
└ coordination overhead	0.85
└ correctness	0.75
└ downstream usability	0.75
└ efficiency	0.75
└ intent fidelity	0.76
└ style adherence	0.80

Description

Codex chat agents are observed doing implementation work themselves (writing code, making changes) instead of dispatching to worker tasks via wg add. The chat agent contract is supposed to be 'thin task-creator, not implementer'. Claude chat agents follow this; codex doesn't.

User report 2026-05-01: 'we have the codex .chat- agents always doing work themselves rather than making wg tasks. is there a prompting gap with codex/claude? like AGENTS.md or the .chat prompting isn't as clearly saying hey, don't just do the work unless the user asks you to.'

Confirmed root cause

After reorg-separate-universal (Apr 29) split CLAUDE.md into layer-2 only (project-specific) + bundled wg agent-guide (universal role contract), the same surgery was NOT applied to AGENTS.md.

Current state:

CLAUDE.md (5145 bytes): layer-2 only, says 'run wg agent-guide for the universal contract'
AGENTS.md (7687 bytes): has the OLDER mixed content with the role contract INLINE, never updated post-reorg

Net effect:

claude agents → read CLAUDE.md → run wg agent-guide → see the canonical, possibly more directive role contract
codex agents → read AGENTS.md → see older inline role contract → less consistent enforcement

Plus likely behavioral asymmetry: codex's 'be helpful, do the work' baseline is stronger than its instruction-following especially when the role contract feels softer/older than the bundled version.

Spec

Fix 1: bring AGENTS.md into parity with CLAUDE.md

AGENTS.md becomes layer-2 only (workgraph-as-a-project content)
Strip the inline universal role contract
Add the same 'run wg agent-guide for the universal contract' pointer that CLAUDE.md has
Both files point at the SAME bundled source of truth — no drift

Fix 2: strengthen wg agent-guide's chat-agent role language

The current bundled agent-guide should be loud about 'DO NOT WRITE CODE. DO NOT IMPLEMENT.' Specifically:

Lead with the role distinction prominently — the FIRST thing a chat agent reads should be 'You are a chat agent. Your job is to create wg tasks via wg add, NOT to do the work yourself.'
Add concrete anti-patterns: 'Don't run cargo build. Don't open the editor. Don't grep for code. Use wg add to dispatch every code-touching action to a worker.'
Add explicit list of things chat agents CAN do: wg show, wg list, wg log, wg add, wg edit, wg publish. Things they CAN'T do: cargo, grep on source, edit files in src/, etc.

Fix 3 (optional, codex-specific): codex chat spawn includes an extra system-prompt addendum

If codex's behavioral baseline still pulls toward 'do work' even with strengthened agent-guide, add a codex-specific addendum at chat spawn that says (loudly) 'STOP. Do not write code. Use wg add for any implementation. The user is talking to you to ORCHESTRATE work, not to receive it.' Inject this when spawning codex chat tabs specifically.

This is the same kind of asymmetry the codex bypass-flag fix addressed (different handler needs different treatment). Acceptable to ship.

Validate behavior empirically

The proof is empirical: a codex chat agent receiving a 'fix bug X' request should respond with 'I'll file this as a wg task' + actual wg add invocation, NOT with 'Let me look at the code...' followed by editing.

Validation

Failing test or behavioral repro: spawn a codex chat agent, give it a code-touching request ('fix bug Y in src/foo.rs'). Pre-fix: agent reads source / makes edits. Post-fix: agent files wg add and waits for the worker.
AGENTS.md is now layer-2 only (workgraph-project context only); same shape as CLAUDE.md
grep AGENTS.md for inline role-contract content: zero matches (or only pointers to wg agent-guide)
wg agent-guide content updated with stronger / clearer chat-agent role language
Same behavioral test passes for claude chat agent (no regression)
If Fix 3 implemented: codex chat spawn args include the system-prompt addendum
cargo build + cargo test pass
cargo install --path . was run before claiming done

Process note

This is exactly the kind of asymmetry a comprehensive doc-sync would have caught — both files have similar surface but different age. The doc-sync function template should be amended to: 'AGENTS.md and CLAUDE.md should be checked together; any drift between them is a bug, not an intentional difference.'

The autohaiku evaluator-grade-zero bug from earlier today (.workgraph paths) and this bug share a root cause: agent-visible documentation drift. Each instance feels small but they compound — agents make decisions based on stale text and we don't notice until something goes wrong.

## Description
Codex chat agents are observed doing implementation work themselves (writing code, making changes) instead of dispatching to worker tasks via `wg add`. The chat agent contract is supposed to be 'thin task-creator, not implementer'. Claude chat agents follow this; codex doesn't.

## Confirmed root cause

After reorg-separate-universal (Apr 29) split CLAUDE.md into layer-2 only (project-specific) + bundled wg agent-guide (universal role contract), the same surgery was NOT applied to AGENTS.md.

Current state:
- **CLAUDE.md** (5145 bytes): layer-2 only, says 'run `wg agent-guide` for the universal contract'
- **AGENTS.md** (7687 bytes): has the OLDER mixed content with the role contract INLINE, never updated post-reorg

Net effect:
- claude agents → read CLAUDE.md → run `wg agent-guide` → see the canonical, possibly more directive role contract
- codex agents → read AGENTS.md → see older inline role contract → less consistent enforcement

Plus likely behavioral asymmetry: codex's 'be helpful, do the work' baseline is stronger than its instruction-following especially when the role contract feels softer/older than the bundled version.

## Spec

### Fix 1: bring AGENTS.md into parity with CLAUDE.md
- AGENTS.md becomes layer-2 only (workgraph-as-a-project content)
- Strip the inline universal role contract
- Add the same 'run `wg agent-guide` for the universal contract' pointer that CLAUDE.md has
- Both files point at the SAME bundled source of truth — no drift

### Fix 2: strengthen wg agent-guide's chat-agent role language
The current bundled agent-guide should be loud about 'DO NOT WRITE CODE. DO NOT IMPLEMENT.' Specifically:
- Lead with the role distinction prominently — the FIRST thing a chat agent reads should be 'You are a chat agent. Your job is to create wg tasks via `wg add`, NOT to do the work yourself.'
- Add concrete anti-patterns: 'Don't run `cargo build`. Don't open the editor. Don't grep for code. Use `wg add` to dispatch every code-touching action to a worker.'
- Add explicit list of things chat agents CAN do: `wg show`, `wg list`, `wg log`, `wg add`, `wg edit`, `wg publish`. Things they CAN'T do: `cargo`, `grep` on source, edit files in src/, etc.

### Fix 3 (optional, codex-specific): codex chat spawn includes an extra system-prompt addendum
If codex's behavioral baseline still pulls toward 'do work' even with strengthened agent-guide, add a codex-specific addendum at chat spawn that says (loudly) 'STOP. Do not write code. Use wg add for any implementation. The user is talking to you to ORCHESTRATE work, not to receive it.' Inject this when spawning codex chat tabs specifically.

This is the same kind of asymmetry the codex bypass-flag fix addressed (different handler needs different treatment). Acceptable to ship.

### Validate behavior empirically
The proof is empirical: a codex chat agent receiving a 'fix bug X' request should respond with 'I'll file this as a wg task' + actual `wg add` invocation, NOT with 'Let me look at the code...' followed by editing.

## Validation
- [ ] Failing test or behavioral repro: spawn a codex chat agent, give it a code-touching request ('fix bug Y in src/foo.rs'). Pre-fix: agent reads source / makes edits. Post-fix: agent files `wg add` and waits for the worker.
- [ ] AGENTS.md is now layer-2 only (workgraph-project context only); same shape as CLAUDE.md
- [ ] grep AGENTS.md for inline role-contract content: zero matches (or only pointers to wg agent-guide)
- [ ] wg agent-guide content updated with stronger / clearer chat-agent role language
- [ ] Same behavioral test passes for claude chat agent (no regression)
- [ ] If Fix 3 implemented: codex chat spawn args include the system-prompt addendum
- [ ] cargo build + cargo test pass
- [ ] cargo install --path . was run before claiming done

## Process note
This is exactly the kind of asymmetry a comprehensive doc-sync would have caught — both files have similar surface but different age. The doc-sync function template should be amended to: 'AGENTS.md and CLAUDE.md should be checked together; any drift between them is a bug, not an intentional difference.'

Depends on

done .assign-fix-agents-md

Required by

(none)

Log

2026-05-02T02:31:00.687074625+00:00 Task paused
2026-05-02T02:31:00.778336100+00:00 Task published
2026-05-02T02:31:41.809518595+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Implementation + documentation fix requiring careful validation of agent-prompt behavior; Careful Programmer tradeoff matches the correctness-critical nature and exhaustive testing requirement emphasized in validation criteria.
2026-05-02T02:31:45.179235033+00:00 Spawned by coordinator --executor claude --model opus
2026-05-02T02:31:54.145495627+00:00 Starting: reading AGENTS.md, CLAUDE.md, and wg agent-guide source
2026-05-02T02:33:06.460170272+00:00 Plan: (1) rewrite AGENTS.md to layer-2-only mirror of CLAUDE.md; (2) strengthen src/text/agent_guide.md by moving chat-agent contract to lead with louder anti-patterns + can/can't lists; (3) add codex-specific 'STOP. Do not write code.' addendum in codex_handler.rs first-turn prompt; (4) add a unit test asserting the addendum appears.
2026-05-02T02:52:00.721293962+00:00 Validated: cargo build clean (warnings only). cargo test --bin wg -- agent_guide codex_handler: 15/15 pass. Full suite 3350/3351 (1 pre-existing flaky tmux test passes in isolation). cargo install --path . done. wg agent-guide now leads with STOP banner.
2026-05-02T02:52:33.417442984+00:00 Committed: bf583d80e — pushed to origin/wg/agent-1487/fix-agents-md
2026-05-02T02:52:42.139074209+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
2026-05-02T02:56:37.192348383+00:00 PendingEval → Done (evaluator passed; downstream unblocks)