fix-codex-agent-2

Fix: codex agent token tracking — parser doesn't handle codex's exec --json output

Metadata

Statusdone
Assignedagent-2269
Agent identityf51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Modelcodex:gpt-5.5
Created2026-05-04T13:40:47.410021877+00:00
Started2026-05-04T14:14:11.503338859+00:00
Completed2026-05-04T15:00:48.811887610+00:00
Tagspriority-high,fix,bug,codex,observability,cost, eval-scheduled
Eval score0.81
└ blocking impact0.85
└ completeness0.80
└ coordination overhead0.80
└ correctness0.85
└ downstream usability0.75
└ efficiency0.72
└ intent fidelity0.90
└ style adherence0.85

Description

Description

Codex agent task runs produce NO token usage / cost stats, while claude agent runs produce them correctly. Confirmed empirically:

$ wg show implement-nex-chat   (codex:gpt-5.5)
  [no Tokens/Cost line]

$ wg show fix-nex-tui          (claude:opus)
  Tokens: 0/140k (in/out) +54M cached
  Cost: $33.10

User report 2026-05-04: 'Looks like token tracking isn't really working for Codex.'

Likely cause

The token-extraction parser was written for claude's stream-json output format (where each assistant message includes a usage object with input_tokens, output_tokens, cache_read_input_tokens, etc.). Codex's exec --json output has a different shape and our parser doesn't recognize it.

Verify by:

  1. Reading a recent codex run's raw output — what does codex emit for token usage?
  2. Comparing to a recent claude run's raw output
  3. Identifying the format-specific extraction code in the workgraph token parser

Investigation + fix

  1. Find the parser — likely src/graph.rs::parse_token_usage_live (cited in diagnose-tui-scales) or src/messages.rs or wherever per-task token aggregation happens
  2. Identify the claude-specific JSON shape it handles
  3. Find what codex actually emits — read /home/erik/workgraph/.wg/output/<codex-task>/log.json or wherever codex task logs are stored
  4. Add a codex-shape arm to the parser; aggregate the same way (input/output/cache + cost)

Validation

  • Failing test: parse a known codex exec --json output sample, assert tokens > 0
  • Same test for claude (no regression on existing path)
  • Live smoke: run a small task on codex:gpt-5.5; wg show <task> after completion shows Tokens + Cost lines populated
  • Cost calculation: codex uses different per-MTok pricing than claude (gpt-5.5 = $5/$30; claude:opus = $15/$75). Ensure cost calc reads the right pricing per provider.
  • Aggregate views (wg agency stats, wg spend, dashboards) reflect codex tokens correctly
  • No regression of nex / openrouter / other handler token tracking (audit while at it — same class of gap may apply)
  • cargo build + cargo test pass
  • Permanent smoke scenario added that exercises codex token capture
  • cargo install --path . was run before claiming done

Coordinate

This is the same class of bug as fix-replace-stale's residual gap (clap defaults missed by a sweep) and fix-agents-md (AGENTS.md missed by reorg). Cross-handler parity gets missed when fixes are written claude-first and not validated against codex/nex. Worth noting in the doc-sync function template's audit-terminology-consistency template: 'cross-handler parity audit — every claude-specific code path should have an equivalent for codex and nex unless explicitly marked claude-only'.

Depends on

Required by

Log