agency-still-picks — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-186`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Created	2026-04-27T00:27:33.159329238+00:00
Started	2026-04-27T00:28:22.180109446+00:00
Completed	2026-04-27T01:12:39.820211987+00:00
Tags	`eval-scheduled`
Eval score	0.88
└ blocking impact	0.95
└ completeness	0.90
└ coordination overhead	0.95
└ correctness	0.95
└ downstream usability	0.85
└ efficiency	0.90
└ intent fidelity	0.82
└ style adherence	0.92

## Description

`agency-picks-claude` was merged into main at 2026-04-26 18:59 (commit 8493fbb6c), and the user binary was installed at 19:10, after the merge. But the bug PERSISTS in a scratch-dir smoke test:

```sh
cd /tmp/wg-smoke-codex
wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 -x codex
wg service start  # Dispatcher: executor=codex, model=local:qwen3-coder ✓
wg add 'Say hello in three words' && wg publish say-hello-in
# →
# Task: say-hello-in
# Runtime:
#   Executor: claude    ← still wrong
#   Model: qwen3-coder
# Failure reason: Agent exited with code 1
# (claude CLI returns 404 'qwen3-coder doesn't exist')
```

The supposed fix did not close the bug. This is the same pattern as tui-agent-activity → tui-log-view and wg-nex-native → wg-nex-native-2: the agent claimed done without verifying against the user-facing scenario.

### What likely went wrong

Possibilities:
1. The fix changed the `agency.effective_executor` logic but missed the code path that ACTUALLY makes the per-task executor decision (perhaps `agency.effective_executor` is correct but something else overrides it).
2. The fix added a compatibility check between executor and model prefix, but the check does not fire for codex executor + non-claude model (perhaps it only covers claude executor + non-claude model, i.e. incomplete coverage).
3. The fix modified config-default executor selection, but the per-task SpawnPlan still picks the wrong executor from agency role/tradeoff defaults.
4. The fix has a bug (e.g. it was never wired in).
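Possibilities 1 and 3 both come down to one question: which layer's executor choice wins. The sketch below assumes resolution walks an ordered list of layers and takes the first one that names an executor. The `Layer` enum, `resolve` and `smoke_choices` are hypothetical illustrations, not workgraph's SpawnPlan code. Under that assumption, ranking an agency-side default above the dispatcher silently discards `-x codex`:

```rust
// Hypothetical precedence model for illustration; not the real resolve_executor.

/// Where an executor choice can come from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Layer {
    Task,
    Agency,
    Dispatcher,
    ConfigDefault,
}

/// Walk layers in precedence order; the first layer that names an executor wins.
/// Returns the winning layer too, which is what a provenance log line should record.
fn resolve(
    order: &[Layer],
    pick: impl Fn(Layer) -> Option<&'static str>,
) -> Option<(Layer, &'static str)> {
    order
        .iter()
        .find_map(|&layer| pick(layer).map(|exec| (layer, exec)))
}

/// Choices in the smoke scenario: no per-task override, the agency role
/// defaults to claude, and the dispatcher was started with `-x codex`.
fn smoke_choices(layer: Layer) -> Option<&'static str> {
    match layer {
        Layer::Task => None,
        Layer::Agency => Some("claude"),
        Layer::Dispatcher => Some("codex"),
        Layer::ConfigDefault => Some("claude"),
    }
}
```

With the agency layer ranked above the dispatcher (`[Task, Agency, Dispatcher, ConfigDefault]`), `resolve` returns `Some((Layer::Agency, "claude"))` for the smoke scenario. Swapping those two entries returns `Some((Layer::Dispatcher, "codex"))`.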

### Hard gate (TIGHTER than agency-picks-claude's gate was)

The previous task's gate said 'in scratch dir, init with nex, publish task, daemon log shows executor=native, agent metadata shows executor=native, task succeeds.' That gate either wasn't run or was misinterpreted as passing. This task's gate is even more concrete:

1. `cd /tmp && rm -rf agency-picks-2 && mkdir agency-picks-2 && cd agency-picks-2`
2. `wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 -x codex`
3. `wg service start`
4. `wg add 'Say hello in three words' && wg publish say-hello-in`
5. After 30 seconds:
   - `wg show say-hello-in` MUST report `Executor: codex` (NOT claude)
   - `wg agents --alive` (or the agent's metadata.json) MUST show `executor: codex` for the spawned agent
   - Daemon log MUST contain a SpawnPlan provenance line showing executor=codex
   - Task MUST succeed (status=Done) within 60s OR fail for a non-routing reason (e.g. codex itself is broken, but that is a separate task)
6. Repeat the same flow with `-x nex` and `-x native` (covering all OAI-compat executors). Each MUST route to its respective executor.
7. Capture the daemon log + scratch dir state as evidence in the done message.

NO claim of done without points 1-7 demonstrated literally. Anything else is theater.
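The `wg show` and daemon-log checks in step 5 are easy to misread by eye, which is how the previous gate slipped through. The sketch below makes them mechanical. It assumes the captured `wg show` output contains a line of the form `Executor: <name>`, as in the transcript above. It also assumes the SpawnPlan provenance line carries a whitespace-separated `executor=<name>` token; that log format and the `check_gate` helper are hypothetical:

```rust
// Hypothetical gate checker; the provenance-line format is an assumption.

/// Extract the value of the `Executor:` line from captured `wg show` output.
fn shown_executor(show_output: &str) -> Option<&str> {
    show_output
        .lines()
        .map(str::trim)
        .find_map(|line| line.strip_prefix("Executor:"))
        .map(|rest| rest.split_whitespace().next().unwrap_or(""))
}

/// True if some daemon-log line mentions SpawnPlan and carries `executor=<expected>`
/// as a whole token, so `executor=codex` does not match `executor=codex2`.
fn log_has_provenance(daemon_log: &str, expected: &str) -> bool {
    let token = format!("executor={expected}");
    daemon_log
        .lines()
        .filter(|line| line.contains("SpawnPlan"))
        .any(|line| {
            line.split_whitespace()
                .any(|word| word.trim_end_matches(',') == token.as_str())
        })
}

/// Collect every failure for one executor run instead of stopping at the first.
fn check_gate(show_output: &str, daemon_log: &str, expected: &str) -> Vec<String> {
    let mut failures = Vec::new();
    match shown_executor(show_output) {
        Some(e) if e == expected => {}
        other => failures.push(format!("wg show reports {other:?}, expected {expected}")),
    }
    if !log_has_provenance(daemon_log, expected) {
        failures.push(format!("no SpawnPlan provenance line with executor={expected}"));
    }
    failures
}
```

This covers only the `wg show` and daemon-log bullets. Agent metadata and final task status still need their own checks. Repeat the call with the expected executor for the `-x nex` and `-x native` runs.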

### Diagnostic starting point

Read the SpawnPlan code (post-spawn-single-source merge) and find where the executor is chosen. The previous fix presumably edited `agency.effective_executor`; double-check that this is actually what SpawnPlan reads, and that the agency-side check covers ALL non-claude-compatible model prefixes (not just `local:`).
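Covering "ALL non-claude-compatible prefixes" is easiest with an allowlist. A model counts as claude-compatible only if it is positively recognised, so any new prefix falls through to "incompatible". The sketch below rests on assumptions that may not match workgraph:

- model specs look like `provider:model`, as in `local:qwen3-coder`;
- the claude CLI accepts a `claude:` prefix, bare `claude-*` names, and the `opus`/`sonnet`/`haiku` aliases.

The real prefix set and the function name may differ.

```rust
// Hypothetical compat check; the accepted prefixes and aliases are assumptions.

/// Bare model aliases assumed to be served by the claude CLI.
const CLAUDE_ALIASES: [&str; 3] = ["opus", "sonnet", "haiku"];

/// Allowlist check: anything not positively recognised as claude is incompatible.
fn is_claude_compatible(model_spec: &str) -> bool {
    match model_spec.split_once(':') {
        // Explicit provider prefix: only the claude provider qualifies.
        Some((provider, _)) => provider == "claude",
        // Bare name: a known alias or something that looks like a claude model ID.
        None => CLAUDE_ALIASES.contains(&model_spec) || model_spec.starts_with("claude-"),
    }
}
```

The allowlist is the design choice that matters. A denylist keyed on `local:` gives exactly the incomplete coverage warned about above, because any unlisted prefix would still be routed to claude.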

### Out of scope

- Fixing codex executor itself (separate concern; this task is just routing)
- The thin-wrapper-impl approval (rejected separately, will re-evaluate after this fix)

## Validation

- [ ] Failing test FIRST: `test_codex_executor_routes_codex_not_claude`: synthetic config with `-x codex` + a `local:` model spec → `SpawnPlan.executor == 'codex'`
- [ ] Failing test: `test_nex_executor_routes_native_not_claude`
- [ ] Implementation makes tests pass
- [ ] `cargo build` + `cargo test` pass with no regressions
- [ ] HARD GATE manual smoke as above
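The two named tests can be sketched as ordinary Rust unit tests against a stand-in planner. Everything here is hypothetical, not workgraph's real API:

- `Config`, `SpawnPlan`, `plan_spawn` and the `agency_default` argument are invented for illustration.
- `nex` is assumed to alias the `native` executor, as the previous gate's `executor=native` expectation suggests.
- `claude_can_serve` is a simplified compat rule.

The stand-in follows the approach recorded in the task log: resolve the executor fully, then apply the model-compat check. The third test guards the behaviour that `agency_local_model_overrides_claude_executor.sh` covers.

```rust
// Hypothetical stand-ins for illustration; not workgraph's real types.

/// Trimmed-down dispatcher config: what `wg init -x ... -m ...` records.
struct Config {
    executor: Option<&'static str>, // explicit `-x` value, if any
    model: &'static str,            // model spec, e.g. "local:qwen3-coder"
}

/// Spawn plan reduced to the one field the tests inspect.
#[derive(Debug)]
struct SpawnPlan {
    executor: String,
}

/// Map CLI executor names to the executor actually spawned (assumes `nex` aliases `native`).
fn canonical(executor: &str) -> &str {
    if executor == "nex" { "native" } else { executor }
}

/// Simplified compat rule: claude aliases and claude-prefixed specs only.
fn claude_can_serve(model: &str) -> bool {
    matches!(model, "opus" | "sonnet" | "haiku") || model.starts_with("claude")
}

fn plan_spawn(config: &Config, agency_default: &str) -> SpawnPlan {
    // 1. Full precedence first: an explicit `-x` always beats the agency default.
    let resolved = canonical(config.executor.unwrap_or(agency_default));
    // 2. Only then the compat check: claude cannot serve a non-claude model.
    let executor = if resolved == "claude" && !claude_can_serve(config.model) {
        "native"
    } else {
        resolved
    };
    SpawnPlan { executor: executor.to_string() }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_codex_executor_routes_codex_not_claude() {
        let cfg = Config { executor: Some("codex"), model: "local:qwen3-coder" };
        assert_eq!(plan_spawn(&cfg, "claude").executor, "codex");
    }

    #[test]
    fn test_nex_executor_routes_native_not_claude() {
        let cfg = Config { executor: Some("nex"), model: "local:qwen3-coder" };
        assert_eq!(plan_spawn(&cfg, "claude").executor, "native");
    }

    /// Regression guard: with no explicit `-x`, a claude agency default
    /// plus a local model must still be rerouted to native.
    #[test]
    fn agency_claude_with_local_model_still_routes_native() {
        let cfg = Config { executor: None, model: "local:qwen3-coder" };
        assert_eq!(plan_spawn(&cfg, "claude").executor, "native");
    }
}
```

Ordering the check after resolution is what lets both requirements hold at once. An explicit `-x codex` never reaches the claude branch. A claude agency default with a local model is still rerouted.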

Depends on

done .assign-agency-still-picks

Required by

(none)

Log

2026-04-27T00:27:33.153099310+00:00 Task paused
2026-04-27T00:27:33.202398230+00:00 Task published
2026-04-27T00:28:21.929331014+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer aligns with debugging + TDD + exhaustive smoke verification; task's hard gate demands literal proof with no shortcuts.
2026-04-27T00:28:22.180115578+00:00 Spawned by coordinator --executor claude --model opus
2026-04-27T00:28:30.226907321+00:00 Starting investigation of executor routing bug — agency-picks-claude didn't fix it
2026-04-27T00:45:32.095371373+00:00 Root cause found: agency.effective_executor_for_model overrides claude→native even when dispatcher.executor=codex is explicit. Override fires before dispatcher's executor floor in resolve_executor precedence. Fix: move model-compat check from agency to dispatch::plan_spawn AFTER full executor resolution.
2026-04-27T01:12:29.194529101+00:00 Committed: 16f0f9c96 — pushed to remote
2026-04-27T01:12:29.208291115+00:00 Validated: cargo build + cargo test (1993 lib tests) pass
2026-04-27T01:12:29.225928298+00:00 Validated: HARD GATE — codex/-x nex/-x native scenarios all route correctly via SpawnPlan provenance lines (live smoke against /tmp/routing-smoke-{codex,nex,native} scratch dirs)
2026-04-27T01:12:29.241484426+00:00 Validated: smoke scenario dispatcher_codex_wins_over_agency.sh PASS; existing agency_local_model_overrides_claude_executor.sh still PASS (both registered with this task as owner)
2026-04-27T01:12:39.820226313+00:00 Task marked as done