fix-agents-md

Fix: AGENTS.md is stale (pre-reorg); codex chats do work themselves instead of dispatching via wg add

Metadata

Status: done
Assigned: agent-1487
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created: 2026-05-02T02:31:00.718635052+00:00
Started: 2026-05-02T02:31:45.179229704+00:00
Completed: 2026-05-02T02:52:42.139061195+00:00
Tags: priority-high, fix, docs, agents, prompting, eval-scheduled
Eval score: 0.76
└ blocking impact: 0.80
└ completeness: 0.70
└ constraint fidelity: 0.10
└ coordination overhead: 0.85
└ correctness: 0.75
└ downstream usability: 0.75
└ efficiency: 0.75
└ intent fidelity: 0.76
└ style adherence: 0.80

Description

Codex chat agents are observed doing implementation work themselves (writing code, making changes) instead of dispatching to worker tasks via wg add. The chat agent contract is supposed to be 'thin task-creator, not implementer'. Claude chat agents follow this contract; codex chat agents do not.

User report 2026-05-01: 'we have the codex .chat- agents always doing work themselves rather than making wg tasks. is there a prompting gap with codex/claude? like AGENTS.md or the .chat prompting isn't as clearly saying hey, don't just do the work unless the user asks you to.'

Confirmed root cause

After reorg-separate-universal (Apr 29) split CLAUDE.md into a layer-2-only file (project-specific) plus the bundled wg agent-guide (universal role contract), the same surgery was NOT applied to AGENTS.md.

Current state:

  • CLAUDE.md (5145 bytes): layer-2 only, says 'run wg agent-guide for the universal contract'
  • AGENTS.md (7687 bytes): has the OLDER mixed content with the role contract INLINE, never updated post-reorg

Net effect:

  • claude agents → read CLAUDE.md → run wg agent-guide → see the canonical, possibly more directive role contract
  • codex agents → read AGENTS.md → see older inline role contract → less consistent enforcement

There is also a likely behavioral asymmetry: codex's 'be helpful, do the work' baseline appears to outweigh its instruction-following, especially when the role contract it reads is softer or older than the bundled version.

Spec

Fix 1: bring AGENTS.md into parity with CLAUDE.md

  • AGENTS.md becomes layer-2 only (workgraph-as-a-project content)
  • Strip the inline universal role contract
  • Add the same 'run wg agent-guide for the universal contract' pointer that CLAUDE.md has
  • Both files point at the SAME bundled source of truth — no drift
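
The parity requirement in Fix 1 lends itself to a mechanical check. The following is a minimal sketch in Python, not the project's actual tooling. It assumes the pointer can be recognized by the literal string 'wg agent-guide', and that inline role-contract text can be spotted by a few marker phrases. The marker phrases and both function names are hypothetical placeholders and would need to be replaced with phrases taken from the real pre-reorg AGENTS.md.

```python
POINTER = "wg agent-guide"

# Hypothetical phrases that would indicate an inline universal role contract.
# These are placeholders; substitute phrases from the actual stale file.
INLINE_CONTRACT_MARKERS = (
    "thin task-creator",
    "you are a chat agent",
)


def check_layer2_only(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks layer-2 only."""
    problems = []
    if POINTER not in text:
        problems.append(f"missing pointer to '{POINTER}'")
    lowered = text.lower()
    for marker in INLINE_CONTRACT_MARKERS:
        if marker in lowered:
            problems.append(f"inline role-contract text found: '{marker}'")
    return problems


def check_parity(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file name to its problems, keeping only files that have any."""
    results = {}
    for name, text in files.items():
        problems = check_layer2_only(text)
        if problems:
            results[name] = problems
    return results
```

In practice the contents of AGENTS.md and CLAUDE.md would be read from disk and passed to check_parity together, so that drift in either file is reported in the same run.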

Fix 2: strengthen wg agent-guide's chat-agent role language

The current bundled agent-guide should be loud about 'DO NOT WRITE CODE. DO NOT IMPLEMENT.' Specifically:

  • Lead with the role distinction prominently — the FIRST thing a chat agent reads should be 'You are a chat agent. Your job is to create wg tasks via wg add, NOT to do the work yourself.'
  • Add concrete anti-patterns: 'Don't run cargo build. Don't open the editor. Don't grep for code. Use wg add to dispatch every code-touching action to a worker.'
  • Add an explicit list of things chat agents CAN do: wg show, wg list, wg log, wg add, wg edit, wg publish. And a list of things they CAN'T do: cargo, grep on source, edit files in src/, etc.
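
The CAN / CAN'T split in Fix 2 can be expressed as an allow-list, which may help when wording the guide or when checking agent behavior. This is a sketch only: the function name chat_agent_may_run is hypothetical, and the allow-list contains exactly the six wg subcommands named above. Anything else, including cargo and grep, is treated as worker-only.

```python
import shlex

# Subcommands the spec lists as allowed for chat agents.
ALLOWED_WG_SUBCOMMANDS = {"show", "list", "log", "add", "edit", "publish"}


def chat_agent_may_run(command: str) -> bool:
    """Return True only for commands of the form `wg <allowed-subcommand> ...`."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes or similar: refuse rather than guess.
        return False
    if len(tokens) < 2:
        return False
    return tokens[0] == "wg" and tokens[1] in ALLOWED_WG_SUBCOMMANDS
```

An allow-list is used rather than a deny-list so that commands the spec does not mention default to "dispatch to a worker". One consequence is that wg agent-guide itself would need to be added to the set if chat agents are expected to run it.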

Fix 3 (optional, codex-specific): codex chat spawn includes an extra system-prompt addendum

If codex's behavioral baseline still pulls toward 'do work' even with strengthened agent-guide, add a codex-specific addendum at chat spawn that says (loudly) 'STOP. Do not write code. Use wg add for any implementation. The user is talking to you to ORCHESTRATE work, not to receive it.' Inject this when spawning codex chat tabs specifically.
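
The conditional injection described in Fix 3 could look roughly like the sketch below. The addendum text is the one quoted above; the function name build_chat_system_prompt, the handler parameter, and the handler string "codex" are assumptions for illustration and do not reflect the actual spawn code.

```python
# Addendum text as quoted in the spec for Fix 3.
CODEX_ADDENDUM = (
    "STOP. Do not write code. Use wg add for any implementation. "
    "The user is talking to you to ORCHESTRATE work, not to receive it."
)


def build_chat_system_prompt(base_prompt: str, handler: str) -> str:
    """Append the addendum only for codex chat spawns.

    `handler` is a hypothetical identifier for the agent backend.
    """
    if handler == "codex":
        return f"{base_prompt}\n\n{CODEX_ADDENDUM}"
    return base_prompt
```

Keeping the addendum in one constant means the validation step can assert on its presence in the spawn arguments without duplicating the wording.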

This is the same kind of asymmetry the codex bypass-flag fix addressed (different handler needs different treatment). Acceptable to ship.

Validate behavior empirically

The proof is empirical: a codex chat agent receiving a 'fix bug X' request should respond with 'I'll file this as a wg task' + actual wg add invocation, NOT with 'Let me look at the code...' followed by editing.
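
One way to turn this pass/fail criterion into an automated check is sketched below. It assumes the test harness can capture the shell commands a chat agent ran during a session as a list of strings; that capture mechanism, the function name, and the FORBIDDEN_PROGRAMS set (taken from the anti-patterns in Fix 2) are all assumptions rather than existing workgraph features.

```python
import shlex

# Programs a chat agent should never invoke, per the Fix 2 anti-patterns.
FORBIDDEN_PROGRAMS = {"cargo", "grep"}


def dispatched_without_implementing(commands: list[str]) -> bool:
    """Return True if the agent ran `wg add` at least once and no forbidden program."""
    saw_wg_add = False
    for command in commands:
        tokens = shlex.split(command)
        if not tokens:
            continue
        if tokens[0] in FORBIDDEN_PROGRAMS:
            return False
        if tokens[:2] == ["wg", "add"]:
            saw_wg_add = True
    return saw_wg_add
```

The same function can be applied to a claude chat session to cover the no-regression item. It does not detect file edits made without a shell command, so it complements rather than replaces a manual repro.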

Validation

  • Failing test or behavioral repro: spawn a codex chat agent, give it a code-touching request ('fix bug Y in src/foo.rs'). Pre-fix: agent reads source / makes edits. Post-fix: agent files wg add and waits for the worker.
  • AGENTS.md is now layer-2 only (workgraph-project context only); same shape as CLAUDE.md
  • grep AGENTS.md for inline role-contract content: zero matches (or only pointers to wg agent-guide)
  • wg agent-guide content updated with stronger / clearer chat-agent role language
  • Same behavioral test passes for claude chat agent (no regression)
  • If Fix 3 implemented: codex chat spawn args include the system-prompt addendum
  • cargo build + cargo test pass
  • cargo install --path . was run before claiming done

Process note

This is exactly the kind of asymmetry a comprehensive doc-sync would have caught — both files have similar surface but different age. The doc-sync function template should be amended to: 'AGENTS.md and CLAUDE.md should be checked together; any drift between them is a bug, not an intentional difference.'

The autohaiku evaluator-grade-zero bug from earlier today (.workgraph paths) and this bug share a root cause: agent-visible documentation drift. Each instance feels small but they compound — agents make decisions based on stale text and we don't notice until something goes wrong.

Depends on

Required by

Log