Metadata
| Status | done |
|---|---|
| Assigned | agent-1109 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-04-29T14:37:10.497590740+00:00 |
| Started | 2026-04-29T14:38:07.217484840+00:00 |
| Completed | 2026-04-29T15:00:40.717812643+00:00 |
| Tags | priority-high,bug,agency,config, eval-scheduled |
| Eval score | 0.82 |
| └ blocking impact | 0.85 |
| └ completeness | 0.80 |
| └ constraint fidelity | 0.70 |
| └ coordination overhead | 0.85 |
| └ correctness | 0.85 |
| └ downstream usability | 0.85 |
| └ efficiency | 0.80 |
| └ intent fidelity | 0.87 |
| └ style adherence | 0.90 |
Description
Description
Per CLAUDE.md contract:
.evaluate-, .flip-, and .assign-* tasks ... are pinned to claude:haiku ... Power users who want a non-Anthropic provider for these roles can override per-role via [models.evaluator] / [models.assigner] etc. in config; explicit overrides win, cascade does not.
Observed 2026-04-29: in ~/autohaiku (initialized via wg init --route codex-cli, which writes per-role codex overrides):
[models.assigner]
model = "codex:gpt-5.4-mini"
[models.evaluator]
model = "codex:gpt-5.4-mini"
[models.flip_inference]
model = "codex:gpt-5.4-mini"
[models.flip_comparison]
model = "codex:gpt-5.4-mini"
But wg agents shows the .assign-* task ran on claude:
agent-1 .assign-quality-pass-20260429t000000 claude 3145576 1m done
agent-2 quality-pass-20260429t000000 codex 3145750 59s working
The explicit override is NOT being honored. Worker correctly uses codex; agency assigner falls back to the claude pin even when the user explicitly chose codex for that role.
This makes "easy switch to codex" misleading — the route writes the right config but the runtime ignores it.
Repro
mkdir tmp-test && cd tmp-testwg init --route codex-cligrep -A1 'models.assigner' .wg/config.toml→ confirms codex:gpt-5.4-mini overridewg service start --max-agents 1wg add 'hello' && wg publish hellowg agentsafter .assign-hello spawns → BUG: executor is claude (should be codex)
Investigation
- Find where the agency spawn site picks the model. Likely in src/agency/ or src/dispatch/ near the .assign / .evaluate / .flip task spawn paths.
- Check the model-resolution order. Per CLAUDE.md it should be:
- Explicit [models.] override → use that
- Else → pin to claude:haiku
- Verify whether the bug is:
- The override is read but the pin overrides it (logic bug — pin should be FALLBACK, not preempting)
- The override is never read at agency spawn time (lookup bug)
- The override is applied to a different code path than what spawns these tasks (architectural drift)
Validation
- Failing test written first (TDD): build a project with [models.assigner] = codex:X, spawn an .assign-* task, assert the spawned process's executor=codex
- Same test for [models.evaluator] and [models.flip_inference] / [models.flip_comparison]
-
Live smoke: in a fresh tmpdir with
wg init --route codex-cli, spawn a task, observe.assign-*runs on codex (pastewg agentsoutput as evidence) - Fallback still works: a project with NO [models.assigner] override correctly falls back to claude:haiku (don't break the default)
- CLAUDE.md contract rephrased / clarified if needed — the current wording is right, the implementation just doesn't match
- cargo build + cargo test pass with no regressions
- Permanent smoke scenario added that exercises the override path; this task id in owners
- cargo install --path . was run before claiming done
Process note
This is a 'config UX promise vs runtime behavior' divergence. The codex-cli route writes the override, gives the user the impression of an all-codex setup, but the runtime silently falls back to claude on agency tasks. Surface this asymmetry in the task log if not fully resolvable: at minimum the user should get a clear warning at wg service start if [models.] overrides exist but won't be honored.
Depends on
Required by
- (none)
Log
- 2026-04-29T14:37:10.485795896+00:00 Task paused
- 2026-04-29T14:37:39.284769283+00:00 Task published
- 2026-04-29T14:38:05.539273049+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer with proven performance (0.77 score, 296 tasks) best suited for correctness-critical config UX bug fix requiring TDD, code investigation, integration testing, and cargo install validation.
- 2026-04-29T14:38:07.217488867+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-29T14:38:20.325525778+00:00 Starting investigation: tracing where agency tasks (.assign-/.evaluate-/.flip-) resolve their model
- 2026-04-29T14:39:13.132870658+00:00 Found: spawn_eval_inline + spawn_assign_inline hardcode executor=claude in registry. Actual LLM call is inside spawned 'wg evaluate run' / 'wg assign --auto' which calls run_lightweight_llm_call. Need to trace codex: prefix handling there.
- 2026-04-29T14:40:39.360357289+00:00 Plan: 1) Add resolve_agency_handler() that returns ExecutorKind+model when override present, 2) Add call_codex_cli for codex agency dispatch, 3) Update spawn_eval_inline/spawn_assign_inline to record actual handler+model in registry, not hardcoded claude. 4) Update run_lightweight_llm_call to route via handler_for_model when explicit override present.
- 2026-04-29T14:59:32.808593988+00:00 Validated: cargo build + cargo test (lib) pass for service::llm tests (18 passed). Pre-existing failures in config::tests::test_global_config_path and integration_provider were NOT caused by my changes (verified via stash). cargo install --path . completed.
- 2026-04-29T14:59:37.461751874+00:00 Live smoke evidence: Created /tmp/wg-codex-smoke via wg init --route codex-cli. Daemon log shows '[dispatcher] Lightweight assignment for smoke-parent: Default Assigner (eea940a6) [path=Learning]' — the in-process Assigner LLM call ran on codex via call_codex_cli (verified by valid response text 'The task is an assignment/meta-selection request...' that codex produced). Pre-fix this would have routed via call_openai_native (no key) and fallen back to claude CLI with model='gpt-5.4-mini' (which claude doesn't recognize).
- 2026-04-29T14:59:39.544912825+00:00 New smoke scenario PASS: agency_codex_override_routes_to_codex.sh asserts .evaluate-* inline spawn registers executor=codex + model=codex:gpt-5.4-mini when [models.evaluator] = codex:* override is configured. Existing scenario agency_inline_spawn_registers_executor_claude.sh still PASSES with default config (executor=claude). Both fast (<5s, no LLM call).
- 2026-04-29T15:00:22.894041270+00:00 Committed: 13f7dfca1 — pushed to remote
- 2026-04-29T15:00:40.717816771+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-04-29T15:14:29.522737578+00:00 PendingEval → Done (evaluator passed; downstream unblocks)