agency-still-picks

Agency STILL picks claude executor for local model — agency-picks-claude (merged at 18:59) didn't fix it

Metadata

Status: done
Assigned: agent-186
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created: 2026-04-27T00:27:33.159329238+00:00
Started: 2026-04-27T00:28:22.180109446+00:00
Completed: 2026-04-27T01:12:39.820211987+00:00
Tags: eval-scheduled
Eval score: 0.88
└ blocking impact: 0.95
└ completeness: 0.90
└ coordination overhead: 0.95
└ correctness: 0.95
└ downstream usability: 0.85
└ efficiency: 0.90
└ intent fidelity: 0.82
└ style adherence: 0.92

Description

agency-picks-claude merged into main at 2026-04-26 18:59 (commit 8493fbb6c), and the user binary was installed at 19:10, after the merge. The bug still PERSISTS in a scratch-dir smoke test:

cd /tmp/wg-smoke-codex
wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 -x codex
wg service start  # Dispatcher: executor=codex, model=local:qwen3-coder ✓
wg add 'Say hello in three words' && wg publish say-hello-in
# →
# Task: say-hello-in
# Runtime:
#   Executor: claude    ← still wrong
#   Model: qwen3-coder
# Failure reason: Agent exited with code 1
# (claude CLI returns 404 'qwen3-coder doesn't exist')

The supposed fix did not close the bug. This is the same pattern as tui-agent-activity → tui-log-view and wg-nex-native → wg-nex-native-2: the agent claimed done without verifying against the user-facing scenario.

What likely went wrong

Possibilities:

  1. The fix changed the agency.effective_executor logic but missed the code path that ACTUALLY makes the per-task executor decision (agency.effective_executor may be correct while something else overrides it).
  2. The fix added a compatibility check between executor and model prefix, but the check doesn't fire for codex executor + non-claude model (perhaps it only fires for claude executor + non-claude model, i.e. incomplete coverage).
  3. The fix modified config-default executor selection, but the per-task SpawnPlan still picks the wrong executor from agency role/tradeoff defaults.
  4. The fix itself has a bug (e.g. it was never wired in).

Hard gate (TIGHTER than agency-picks-claude's gate was)

The previous task's gate said 'in scratch dir, init with nex, publish task, daemon log shows executor=native, agent metadata shows executor=native, task succeeds.' That gate either wasn't run or was misinterpreted as passing. This task's gate is even more concrete:

  1. cd /tmp && rm -rf agency-picks-2 && mkdir agency-picks-2 && cd agency-picks-2
  2. wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 -x codex
  3. wg service start
  4. wg add 'Say hello in three words' && wg publish say-hello-in
  5. After 30 seconds:
    • wg show say-hello-in MUST report Executor: codex (NOT claude)
    • wg agents --alive (or the agent's metadata.json) MUST show executor: codex for the spawned agent
    • Daemon log MUST contain a SpawnPlan provenance line showing executor=codex
    • Task MUST succeed (status=Done) within 60s OR fail for a non-routing reason (e.g. codex itself is broken — but that's a different task)
  6. Repeat the same flow with -x nex and -x native (covering all OAI-compat executors). Each MUST route to its respective executor.
  7. Capture the daemon log + scratch dir state as evidence in the done message.

NO claim of done without points 1-7 demonstrated literally. Anything else is theater.
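
The `wg show` check in step 5 could be automated rather than eyeballed. The following is a minimal sketch, assuming the output carries an `Executor: <name>` line laid out as in the smoke transcript above; `reported_executor` is a hypothetical helper, not an existing wg function.

```rust
/// Hypothetical helper: pull the executor name out of `wg show` output.
/// Assumes a line of the form "Executor: <name>", possibly indented.
fn reported_executor(show_output: &str) -> Option<&str> {
    show_output
        .lines()
        // Keep the remainder of the first line that starts with "Executor:".
        .find_map(|line| line.trim().strip_prefix("Executor:"))
        .map(str::trim)
}
```

A gate script would then compare `reported_executor(output)` with `Some("codex")` and treat both a mismatch and a missing line as failures.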

Diagnostic starting point

Read the SpawnPlan code (post-spawn-single-source merge) — find where executor is chosen. The previous fix presumably edited agency.effective_executor; double-check that's actually what SpawnPlan reads, and that the agency-side check covers ALL non-claude-compatible model prefixes (not just local:).
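
A minimal sketch of what "covers ALL non-claude-compatible model prefixes" could look like: allowlist the model specs the claude executor can serve instead of denylisting `local:`, so an unprefixed spec such as `qwen3-coder` is rejected too. `is_claude_compatible` is a hypothetical name. The `claude:` provider prefix, the `claude-` name prefix, and the `opus` / `sonnet` / `haiku` aliases are assumptions for illustration; the real set has to come from the codebase.

```rust
/// Hypothetical helper: true only for model specs the claude executor can serve.
/// The accepted names are illustrative assumptions, not the project's actual list.
fn is_claude_compatible(model_spec: &str) -> bool {
    // Split off an optional provider prefix such as "local:".
    let (provider, name) = match model_spec.split_once(':') {
        Some((p, n)) => (Some(p), n),
        None => (None, model_spec),
    };
    // Any explicit provider other than "claude" rules the claude executor out.
    if matches!(provider, Some(p) if p != "claude") {
        return false;
    }
    // Allowlist: everything not recognised here is treated as incompatible.
    name.starts_with("claude-") || matches!(name, "opus" | "sonnet" | "haiku")
}
```

With an allowlist, a newly introduced provider prefix defaults to "not claude" instead of silently falling through to the claude executor.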

Out of scope

  • Fixing codex executor itself (separate concern; this task is just routing)
  • The thin-wrapper-impl approval (rejected separately, will re-evaluate after this fix)

Validation

  • Failing test FIRST: test_codex_executor_routes_codex_not_claude — synthetic config with -x codex + local: model spec → SpawnPlan.executor == 'codex'
  • Failing test: test_nex_executor_routes_native_not_claude — synthetic config with -x nex + local: model spec → SpawnPlan.executor == 'native'
  • Implementation makes tests pass
  • cargo build + cargo test pass with no regressions
  • HARD GATE manual smoke as above
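
The two named tests might take roughly the following shape. This is a sketch only: `SpawnPlan` here is a one-field stand-in for the real type, and `plan_for` is a hypothetical placeholder for whatever function actually builds the plan from config. The real tests should call the real resolver so that they fail before the fix lands. The `nex` → `native` mapping follows the previous task's gate.

```rust
/// Hypothetical stand-in for the real plan type; only the field under test.
#[derive(Debug, PartialEq)]
struct SpawnPlan {
    executor: String,
}

/// Hypothetical placeholder for the resolver under test: the explicitly
/// configured executor wins, and the `nex` alias maps to `native`.
fn plan_for(configured_executor: &str, _model_spec: &str) -> SpawnPlan {
    let executor = match configured_executor {
        "nex" => "native",
        other => other,
    };
    SpawnPlan { executor: executor.to_string() }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_codex_executor_routes_codex_not_claude() {
        let plan = plan_for("codex", "local:qwen3-coder");
        assert_eq!(plan.executor, "codex");
    }

    #[test]
    fn test_nex_executor_routes_native_not_claude() {
        let plan = plan_for("nex", "local:qwen3-coder");
        assert_eq!(plan.executor, "native");
    }
}
```

These unit tests pin only the routing decision; they do not replace the manual smoke, which exercises the installed binary and the daemon.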

Depends on

Required by

Log