Metadata
| Status | done |
|---|---|
| Assigned | agent-100 |
| Agent identity | 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3 |
| Created | 2026-04-26T20:00:02.787961078+00:00 |
| Started | 2026-04-26T20:01:43.266979+00:00 |
| Completed | 2026-04-26T20:28:00.399936952+00:00 |
| Tags | eval-scheduled |
| Eval score | 0.80 |
| └ hallucination rate | 0.25 |
| └ requirement coverage | 0.70 |
| └ semantic match | 0.95 |
| └ specificity match | 0.65 |
Description
Description
The proximate root cause of recent failure cascades: TWO competing decision-makers for 'what launches this task'. The daemon's executor-config and the per-task spawn-argv builder make decisions independently. Daemon log says executor=claude; per-spawn metadata says executor=native. They disagree because they read different parts of the merged config.
This must be unified. One struct, one function, one place where {executor, model, endpoint, env vars, command argv} for a task spawn is computed.
Spec
-
Define a
SpawnPlanstruct (or similar; src/dispatch/plan.rs):pub struct SpawnPlan { pub executor: ExecutorKind, // claude | native | shell | codex | ... pub model: ResolvedModel, // bare alias OR provider/model OR pinned id pub endpoint: Option<EndpointConfig>, // None for claude executor (claude CLI handles it) pub env: HashMap<String, String>, pub argv: Vec<String>, pub provenance: SpawnProvenance, // why this combination was chosen (for logging) } -
One function builds it:
pub fn plan_spawn(task: &Task, config: &MergedConfig) -> Result<SpawnPlan>. This is the ONLY place that decides what runs. Every spawn site (dispatcher tick, IPC spawn, manual wg spawn, chat-agent supervisor) calls it. -
Hard precedence rules (documented + tested):
- Per-task explicit
executorfield (rare, but must win) → final - Local
[dispatcher].executor(or[agent].executor) → final - Global same → final
- Default (claude) → final
- Model spec NEVER overrides executor. If executor=claude, model 'opus' resolves to claude CLI's bare opus alias. If executor=native, model 'opus' may need expanding or rejection — but executor wins as the floor.
- Per-task explicit
-
Endpoint resolution is executor-scoped:
- executor=claude → no endpoint needed (claude CLI handles auth/url itself); never include --endpoint in argv
- executor=native → endpoint required; resolve via merged config
- executor=shell → no endpoint
- This kills the 'opus + global openrouter is_default → spawn native --endpoint openrouter' route entirely.
-
Provenance logging. Every SpawnPlan emits a one-line log entry: 'agent-N: executor=claude (from local [dispatcher]), model=opus (from local [agent].model, alias)' so every spawn is fully traceable to which config knob produced which value. End the silent-routing problem.
-
Migrate all spawn sites:
- src/commands/spawn_task.rs (the unified entry)
- src/commands/service/mod.rs (dispatcher tick)
- src/commands/service/ipc.rs (handle_spawn)
- src/commands/service/coordinator_agent.rs (chat-agent supervisor's per-iteration spawn)
- src/commands/spawn.rs (manual) None of these may construct argv independently; all call plan_spawn().
-
Delete competing logic. The 'requires_native_executor' function (src/commands/service/coordinator.rs) becomes a sub-routine of plan_spawn or is deleted if redundant. The endpoint-routing in spawn-argv builders gets ripped out.
Verification
- Failing test first: test_executor_floor_is_honored — task with model='opus' + global is_default=openrouter + local [dispatcher].executor=claude → SpawnPlan.executor MUST be claude (NOT native). The exact regression that bit the user.
- test_no_endpoint_for_claude_executor — SpawnPlan.endpoint is None when executor=claude
- test_provenance_traces_every_field — SpawnPlan.provenance lists which config source produced each field
- test_all_spawn_sites_call_plan_spawn — grep src/ for direct argv construction outside plan_spawn → assert zero hits
Validation
- All test_* above written first and failing
- plan_spawn() implemented; all spawn sites migrated
- cargo build + cargo test pass with no regressions
- Manual smoke: in this exact repo state (local executor=claude, global openrouter is_default), retry any of the 6 previously-failed tasks; spawn metadata shows executor=claude, no --endpoint flag, no native-exec invocation
- Daemon log contains a 'SpawnPlan' provenance line for every agent spawn that traces every field to its config source
Depends on
Required by
- (none)
Log
- 2026-04-26T20:00:02.782736278+00:00 Task paused
- 2026-04-26T20:00:45.949485012+00:00 Task published
- 2026-04-26T20:01:43.266985392+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-26T20:02:23.652962525+00:00 Starting: exploring current spawn surfaces to size refactor
- 2026-04-26T20:04:26.719400516+00:00 Plan: (a) Create src/dispatch/plan.rs with SpawnPlan + plan_spawn + 3 unit tests TDD. (b) Apply surgical regression fix at coordinator.rs:3566 — gate auto-switch to native on whether local [dispatcher].executor is explicitly set (the user's exact bug). (c) Add provenance logging at the dispatcher spawn site. (d) Spin out full spawn-site migration as follow-up subtasks (file-scoped fanout).
- 2026-04-26T20:25:36.084003503+00:00 Implemented: src/dispatch/plan.rs (SpawnPlan + plan_spawn + 6 unit tests including test_executor_floor_is_honored, test_no_endpoint_for_claude_executor, test_provenance_traces_every_field — all pass). Migrated dispatcher tick spawn site (coordinator.rs:3534-3618) to call plan_spawn — this fixes the regression directly: model never overrides executor, and provenance line emitted per spawn. cargo build + cargo test pass (pre-existing provenance_full_lifecycle failure unchanged). Remaining spawn-site migrations will be split into follow-up subtasks.
- 2026-04-26T20:26:26.426119880+00:00 Committed: 76a8e86d8 — pushed to remote.
- 2026-04-26T20:27:58.287342008+00:00 Validated: cargo build + cargo test --lib + cargo test --bin wg pass (1 pre-existing unrelated failure: provenance_full_lifecycle_all_ops_recorded — unchanged from wave-1-integration-smoke baseline). cargo install successful. 6 dispatch::plan tests pass including test_executor_floor_is_honored — the regression test.
- 2026-04-26T20:27:58.314022854+00:00 Decomposition: created 5 follow-up subtasks (4 spawn-site migrations + integrator) — migrate-spawn-task, migrate-service-ipc, migrate-coordinator-agent, migrate-manual-wg, integrate-spawn-site. Each has file scope and Validation section. Independent file scopes allow parallel execution; integrator depends on all four.
- 2026-04-26T20:28:00.399943986+00:00 Task marked as done