Metadata
| Status | abandoned ‖ paused |
|---|---|
| Created | 2026-04-26T15:15:41.448453417+00:00 |
Description
Daemon boot calls `CoordinatorState::load_all(dir)`. When no `service/coordinator-state-N.json` files exist (fresh install or after `rm -rf .wg`), the legacy fallback at `src/commands/service/mod.rs:574-578` synthesizes a `(0, default)` entry. The daemon then spawns a Coordinator-0 supervisor for it, which formats the task id `.coordinator-0` and tries to run `wg spawn-task .coordinator-0`. No such task exists in the graph: the actual coordinator created via the TUI is `.coordinator-1` or higher, because `find_next_fresh_coordinator_id` skipped 0 after a chat dir for coordinator-0 had existed once.
Symptom in daemon log:
[INFO] Coordinator-0: spawning via `wg spawn-task .coordinator-0` (executor=claude, model=None)
[ERROR] Coordinator-0: failed to spawn ... (os error 2)
[ERROR] Coordinator-0: 3 restarts in last 10 minutes, pausing for 584s
Net effect: every fresh wg init produces a ghost coordinator that burns the restart budget, and the user-created coordinator never actually gets a working supervisor (because the supervisor is bound to the ghost id, not the real task).
Fix
- Don't synthesize a phantom coordinator from the absence of state files. The legacy fallback should only fire if there's evidence a coordinator-0 ever existed (e.g. a `.wg/chat/coordinator-0/` dir, or a `.coordinator`/`.coordinator-0` task in the graph). No state file + no chat dir + no graph task → no coordinator. An empty list is the correct return.
- Tie supervisor lifecycle to graph state, not state files. The daemon should derive 'which coordinators need supervisors' from `tasks().filter(coordinator-loop tag, status != Abandoned, !archived)`. State files are overrides, not the source of truth for existence.
- Defensive check in `subprocess_coordinator_loop`: before spawning, verify the task id exists in the graph. If not, log a clear error ('Coordinator-N orphaned: task .coordinator-N not in graph; supervisor exiting') and exit the loop instead of restart-looping.
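
A minimal sketch of the evidence-gated fallback from the first fix item, not the actual `load_all` code. The `CoordinatorState` struct here is a hypothetical stand-in, and `legacy_coordinator_zero_exists` and `legacy_fallback` are illustrative names. It also assumes the caller can pass the graph's task ids in as plain strings.

```rust
use std::path::Path;

/// Hypothetical stand-in for the real per-coordinator state type.
#[derive(Debug, Default, Clone, PartialEq)]
struct CoordinatorState {
    paused: bool,
}

/// True if there is evidence that a legacy coordinator-0 ever existed:
/// a `chat/coordinator-0` directory under the `.wg` dir, or a
/// `.coordinator` / `.coordinator-0` task id in the graph.
/// (Assumed helper; not part of the existing codebase.)
fn legacy_coordinator_zero_exists(wg_dir: &Path, graph_task_ids: &[&str]) -> bool {
    let chat_dir = wg_dir.join("chat").join("coordinator-0");
    chat_dir.is_dir()
        || graph_task_ids
            .iter()
            .any(|id| *id == ".coordinator" || *id == ".coordinator-0")
}

/// Fallback for the case where no `service/coordinator-state-N.json`
/// files were found. Returns an empty list unless legacy evidence exists.
fn legacy_fallback(wg_dir: &Path, graph_task_ids: &[&str]) -> Vec<(u32, CoordinatorState)> {
    if legacy_coordinator_zero_exists(wg_dir, graph_task_ids) {
        vec![(0, CoordinatorState::default())]
    } else {
        Vec::new()
    }
}
```

With this shape, a fresh `.wg` directory and a graph containing only `.coordinator-1` yields an empty list, so no Coordinator-0 supervisor would be started.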
Files to touch
- `src/commands/service/mod.rs`: fix `CoordinatorState::load_all` to not synthesize coordinator 0; or better, deprecate `load_all` in favor of a graph-driven enumeration in the daemon boot path.
- `src/commands/service/coordinator_agent.rs`: add the orphaned-task guard before spawn.
- Daemon boot logic (wherever `load_all` is consumed at boot): switch to graph-driven coordinator enumeration.
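
The graph-driven enumeration and the orphaned-task guard named in this list could be sketched as below. The `Task` and `Status` types are simplified assumptions rather than the real graph types, and `coordinator_ids_needing_supervisors` and `orphan_check` are hypothetical names. The error string follows the wording proposed under Fix.

```rust
/// Simplified task status; the real enum likely has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Open,
    Abandoned,
}

/// Simplified stand-in for a graph task (assumed shape).
struct Task {
    id: String,
    tags: Vec<String>,
    status: Status,
    archived: bool,
}

/// Derive which coordinators need supervisors from graph state:
/// tasks tagged `coordinator-loop` that are neither abandoned nor archived.
/// Ids that do not match `.coordinator-N` are ignored.
fn coordinator_ids_needing_supervisors(tasks: &[Task]) -> Vec<u32> {
    let mut ids: Vec<u32> = tasks
        .iter()
        .filter(|t| t.tags.iter().any(|tag| tag == "coordinator-loop"))
        .filter(|t| t.status != Status::Abandoned && !t.archived)
        .filter_map(|t| t.id.strip_prefix(".coordinator-")?.parse::<u32>().ok())
        .collect();
    ids.sort_unstable();
    ids.dedup();
    ids
}

/// Guard to run before spawning: returns the task id if it exists in the
/// graph, otherwise an error message the supervisor can log before exiting
/// its loop instead of restarting.
fn orphan_check(coordinator_id: u32, tasks: &[Task]) -> Result<String, String> {
    let task_id = format!(".coordinator-{coordinator_id}");
    if tasks.iter().any(|t| t.id == task_id) {
        Ok(task_id)
    } else {
        Err(format!(
            "Coordinator-{coordinator_id} orphaned: task {task_id} not in graph; supervisor exiting"
        ))
    }
}
```

Returning a `Result` from the guard keeps the decision (exit versus spawn) with the caller, so the restart budget is never touched for an id that has no task behind it.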
Validation
- Failing tests first:
  - `test_load_all_returns_empty_when_no_state_and_no_legacy`: ensures a fresh install doesn't synthesize Coordinator-0
  - `test_supervisor_exits_when_task_missing`: guard in `subprocess_coordinator_loop`
  - `test_daemon_boot_enumerates_coordinators_from_graph`: boot path picks up `.coordinator-N` tasks via tag scan, not state files
- Implementation makes tests pass
- `cargo build` + `cargo test` pass with no regressions
- Manual smoke (in scratch dir):
  - `rm -rf .wg && wg init -x claude && wg service start`
  - tail daemon.log: NO 'Coordinator-0: spawning' lines, NO 'failed to spawn' restart loop
  - Open `wg tui`, create coordinator named 'test'
  - Daemon log shows 'Coordinator-1: subprocess running (pid X, executor=claude)' (matching the actual task `.coordinator-1`)
  - Send a chat message in TUI; coordinator responds
Depends on
- (none)
Required by
- (none)
Log
- 2026-04-26T15:15:41.448277233+00:00 Task paused
- 2026-04-26T16:02:08.135038958+00:00 Task abandoned