Metadata
| Status | done |
|---|---|
| Assigned | agent-186 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-04-27T00:27:33.159329238+00:00 |
| Started | 2026-04-27T00:28:22.180109446+00:00 |
| Completed | 2026-04-27T01:12:39.820211987+00:00 |
| Tags | eval-scheduled |
| Eval score | 0.88 |
| └ blocking impact | 0.95 |
| └ completeness | 0.90 |
| └ coordination overhead | 0.95 |
| └ correctness | 0.95 |
| └ downstream usability | 0.85 |
| └ efficiency | 0.90 |
| └ intent fidelity | 0.82 |
| └ style adherence | 0.92 |
Description
agency-picks-claude was merged into main at 2026-04-26 18:59 (commit 8493fbb6c), and the user binary was installed at 19:10, after the merge. But the bug PERSISTS in a scratch-dir smoke test:
cd /tmp/wg-smoke-codex
wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 -x codex
wg service start # Dispatcher: executor=codex, model=local:qwen3-coder ✓
wg add 'Say hello in three words' && wg publish say-hello-in
# →
# Task: say-hello-in
# Runtime:
# Executor: claude ← still wrong
# Model: qwen3-coder
# Failure reason: Agent exited with code 1
# (claude CLI returns 404 'qwen3-coder doesn't exist')
The supposed fix didn't actually close the bug. This is the same pattern as tui-agent-activity → tui-log-view and wg-nex-native → wg-nex-native-2: the agent claimed done without verifying against the user-facing scenario.
What likely went wrong
Possibilities:
- The fix changed agency.effective_executor logic but missed the code path that ACTUALLY makes the per-task executor decision (maybe agency.effective_executor is correct but something else overrides).
- The fix added a compatibility check between executor and model prefix, but the check doesn't fire for codex executor + non-claude model (only for claude executor + non-claude model? incomplete coverage).
- The fix modified config-default executor selection but per-task SpawnPlan still picks the wrong executor from agency role/tradeoff defaults.
- The fix has a bug (forgot to wire it in, etc.).
Hard gate (TIGHTER than agency-picks-claude's gate was)
The previous task's gate said 'in scratch dir, init with nex, publish task, daemon log shows executor=native, agent metadata shows executor=native, task succeeds.' That gate either wasn't run or was misinterpreted as passing. This task's gate is even more concrete:
- cd /tmp && rm -rf agency-picks-2 && mkdir agency-picks-2 && cd agency-picks-2
  wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 -x codex
  wg service start
  wg add 'Say hello in three words' && wg publish say-hello-in
- After 30 seconds: wg show say-hello-in MUST report Executor: codex (NOT claude)
- wg agents --alive (or the agent's metadata.json) MUST show executor: codex for the spawned agent
- Daemon log MUST contain a SpawnPlan provenance line showing executor=codex
- Task MUST succeed (status=Done) within 60s OR fail for a non-routing reason (e.g. codex itself is broken, but that's a different task)
- Repeat the same flow with -x nex and -x native (covering all OAI-compat executors). Each MUST route to its respective executor.
- Capture the daemon log + scratch dir state as evidence in the done message.
NO claim of done without points 1-7 demonstrated literally. Anything else is theater.
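The per-executor checks in the gate can be scripted rather than eyeballed. A minimal Rust sketch, assuming `wg show` prints an `Executor: <name>` line under `Runtime:` as in the transcript at the top, and that `-x nex` is expected to land on the `native` executor (as the earlier gate and the test name `test_nex_executor_routes_native_not_claude` imply). The helper names are hypothetical, not part of wg.

```rust
/// Hypothetical helper: pull the value of the `Executor:` field out of
/// `wg show <task>` text output. Assumes one field per line.
fn executor_from_show(output: &str) -> Option<&str> {
    output
        .lines()
        .map(str::trim)
        .find_map(|line| line.strip_prefix("Executor:"))
        .map(str::trim)
}

/// Assumed mapping from the `-x` flag given to `wg init` to the executor the
/// spawned agent must report; `nex` is treated as an alias for `native`.
fn expected_executor(flag: &str) -> &str {
    match flag {
        "nex" => "native",
        other => other,
    }
}

/// Gate check for one scenario: Ok only if routing matched the flag.
fn check_routing(flag: &str, show_output: &str) -> Result<(), String> {
    let want = expected_executor(flag);
    match executor_from_show(show_output) {
        Some(got) if got == want => Ok(()),
        Some(got) => Err(format!("-x {flag}: routed to {got}, expected {want}")),
        None => Err(format!("-x {flag}: no Executor: line in wg show output")),
    }
}
```

Feed it the captured `wg show say-hello-in` output from each scratch dir; a `claude` result like the one in the failing transcript produces an `Err` naming both the actual and the expected executor.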
Diagnostic starting point
Read the SpawnPlan code (post-spawn-single-source merge) and find where the executor is chosen. The previous fix presumably edited agency.effective_executor; double-check that this is actually what SpawnPlan reads, and that the agency-side check covers ALL non-claude-compatible model prefixes (not just local:).
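One way to make the compatibility check cover every prefix, rather than special-casing `local:`, is to allowlist what the `claude` CLI accepts instead of denylisting what it rejects. The sketch below assumes model specs take the form `[provider:]name`, treats `claude:`/`anthropic:` prefixes and bare Claude aliases such as `opus` as compatible, and rejects everything else. The function name, the prefixes and the alias list are illustrative assumptions, not wg's actual code. The failing run reported `Model: qwen3-coder` with no prefix while the dispatcher showed `local:qwen3-coder`, so a check keyed only on `local:` could miss the model, depending on which form it sees.

```rust
/// Hypothetical check: would the `claude` CLI accept this model spec?
/// Allowlist, so unknown providers and bare non-Claude names fail closed.
fn is_claude_compatible(model: &str) -> bool {
    // Split an optional `provider:` prefix from the model name.
    let (provider, name) = match model.split_once(':') {
        Some((p, n)) => (Some(p), n),
        None => (None, model),
    };
    match provider {
        // Assumed prefixes meaning "this is an Anthropic model".
        Some("claude") | Some("anthropic") => true,
        // Any other explicit prefix (local:, openrouter:, ...) is not Claude.
        Some(_) => false,
        // Bare names: accept only known Claude aliases (assumed list).
        None => {
            let name = name.to_ascii_lowercase();
            ["claude", "opus", "sonnet", "haiku"]
                .iter()
                .any(|alias| name.starts_with(*alias))
        }
    }
}
```

With this shape, `local:qwen3-coder` and bare `qwen3-coder` are both rejected, while `opus` (the model this task's own agent was spawned with) is accepted.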
Out of scope
- Fixing codex executor itself (separate concern; this task is just routing)
- The thin-wrapper-impl approval (rejected separately, will re-evaluate after this fix)
Validation
- Failing test FIRST: test_codex_executor_routes_codex_not_claude — synthetic config with -x codex + local: model spec → SpawnPlan.executor == 'codex'
- Failing test: test_nex_executor_routes_native_not_claude
- Implementation makes tests pass
- cargo build + cargo test pass with no regressions
- HARD GATE manual smoke as above
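The two failing tests can be pinned to SpawnPlan's output alone. A minimal sketch, assuming a hypothetical `SpawnConfig` that carries the `-x` value, an agency executor preference and the model spec, and a stand-in `plan_spawn` in which an explicit dispatcher executor outranks the agency choice. The real `dispatch::plan_spawn` signature and types are not shown in this task, so every name here is illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Executor {
    Claude,
    Codex,
    Native,
}

/// Hypothetical synthetic config standing in for `wg init -x <flag> -m <model>`.
struct SpawnConfig<'a> {
    executor_flag: Option<&'a str>,   // dispatcher `-x` value, if any
    agency_default: Option<Executor>, // executor the agency role would pick
    model: &'a str,                   // model spec, e.g. "local:qwen3-coder"
}

/// Hypothetical, trimmed-down SpawnPlan: only the field the tests assert on.
struct SpawnPlan {
    executor: Executor,
}

/// Assumed `-x` parsing; `nex` is treated as an alias for the native executor.
fn parse_executor(flag: &str) -> Option<Executor> {
    match flag {
        "claude" => Some(Executor::Claude),
        "codex" => Some(Executor::Codex),
        "nex" | "native" => Some(Executor::Native),
        _ => None,
    }
}

/// Stand-in for plan_spawn: explicit dispatcher executor, else agency choice,
/// else Claude. The model check then rewrites only a resolved Claude.
fn plan_spawn(cfg: &SpawnConfig<'_>) -> SpawnPlan {
    let resolved = cfg
        .executor_flag
        .and_then(parse_executor)
        .or(cfg.agency_default)
        .unwrap_or(Executor::Claude);
    // Simplified, assumed notion of "a model the claude CLI accepts".
    let claude_ok =
        cfg.model.starts_with("claude") || ["opus", "sonnet", "haiku"].contains(&cfg.model);
    let executor = if resolved == Executor::Claude && !claude_ok {
        Executor::Native
    } else {
        resolved
    };
    SpawnPlan { executor }
}

#[cfg(test)]
mod tests {
    use super::*;

    // The agency prefers Claude here, so a pass shows the -x flag won.
    fn cfg(flag: &'static str) -> SpawnConfig<'static> {
        SpawnConfig {
            executor_flag: Some(flag),
            agency_default: Some(Executor::Claude),
            model: "local:qwen3-coder",
        }
    }

    #[test]
    fn test_codex_executor_routes_codex_not_claude() {
        assert_eq!(plan_spawn(&cfg("codex")).executor, Executor::Codex);
    }

    #[test]
    fn test_nex_executor_routes_native_not_claude() {
        assert_eq!(plan_spawn(&cfg("nex")).executor, Executor::Native);
    }
}
```

Giving the agency a Claude preference in both tests is deliberate: it is the input that lets a routing regression show up as `Claude` instead of passing by accident.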
Depends on
Required by
- (none)
Log
- 2026-04-27T00:27:33.153099310+00:00 Task paused
- 2026-04-27T00:27:33.202398230+00:00 Task published
- 2026-04-27T00:28:21.929331014+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer aligns with debugging + TDD + exhaustive smoke verification; task's hard gate demands literal proof with no shortcuts.
- 2026-04-27T00:28:22.180115578+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-27T00:28:30.226907321+00:00 Starting investigation of executor routing bug — agency-picks-claude didn't fix it
- 2026-04-27T00:45:32.095371373+00:00 Root cause found: agency.effective_executor_for_model overrides claude→native even when dispatcher.executor=codex is explicit. Override fires before dispatcher's executor floor in resolve_executor precedence. Fix: move model-compat check from agency to dispatch::plan_spawn AFTER full executor resolution.
- 2026-04-27T01:12:29.194529101+00:00 Committed: 16f0f9c96 — pushed to remote
- 2026-04-27T01:12:29.208291115+00:00 Validated: cargo build + cargo test (1993 lib tests) pass
- 2026-04-27T01:12:29.225928298+00:00 Validated: HARD GATE — codex/-x nex/-x native scenarios all route correctly via SpawnPlan provenance lines (live smoke against /tmp/routing-smoke-{codex,nex,native} scratch dirs)
- 2026-04-27T01:12:29.241484426+00:00 Validated: smoke scenario dispatcher_codex_wins_over_agency.sh PASS; existing agency_local_model_overrides_claude_executor.sh still PASS (both registered with this task as owner)
- 2026-04-27T01:12:39.820226313+00:00 Task marked as done
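A toy model of the ordering change described in the root-cause log entry. The names `compat_override`, `resolve_before` and `resolve_after`, and the boolean standing in for the model check, are illustrative assumptions. The grounded parts are that the claude→native override used to run in the agency layer ahead of the dispatcher's executor, that it now runs once in `dispatch::plan_spawn` after full resolution, and that an explicit dispatcher executor beats the agency choice (per `dispatcher_codex_wins_over_agency.sh`).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Executor {
    Claude,
    Codex,
    Native,
}

/// Model-compat override: replace a Claude choice with native when the model
/// is not one the claude CLI accepts; every other executor passes through.
fn compat_override(exec: Executor, claude_ok: bool) -> Executor {
    if exec == Executor::Claude && !claude_ok {
        Executor::Native
    } else {
        exec
    }
}

/// Before (toy model): the override runs inside the agency step, and the
/// agency's result is consulted ahead of the dispatcher's executor.
fn resolve_before(
    dispatcher: Option<Executor>,
    agency: Option<Executor>,
    claude_ok: bool,
) -> Executor {
    agency
        .map(|e| compat_override(e, claude_ok))
        .or(dispatcher)
        .unwrap_or(Executor::Claude)
}

/// After (toy model): resolve fully first, with an explicit dispatcher
/// executor beating the agency choice, then apply the override once.
fn resolve_after(
    dispatcher: Option<Executor>,
    agency: Option<Executor>,
    claude_ok: bool,
) -> Executor {
    let resolved = dispatcher.or(agency).unwrap_or(Executor::Claude);
    compat_override(resolved, claude_ok)
}
```

With `-x codex`, an agency preference of Claude and a non-Claude model, `resolve_before(Some(Executor::Codex), Some(Executor::Claude), false)` returns `Native` rather than `Codex`, while `resolve_after` returns `Codex`. With no dispatcher executor, `resolve_after` still turns a Claude pick into `Native`, the case the name `agency_local_model_overrides_claude_executor.sh` suggests it covers.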