Metadata
| Status | done |
|---|---|
| Assigned | agent-95 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-04-26T14:32:27.951357335+00:00 |
| Started | 2026-04-26T19:43:09.182334474+00:00 |
| Completed | 2026-04-26T20:41:44.544438113+00:00 |
| Tags | eval-scheduled |
| Eval score | 0.04 |
| └ blocking impact | 0.00 |
| └ completeness | 0.00 |
| └ coordination overhead | 0.20 |
| └ correctness | 0.05 |
| └ downstream usability | 0.00 |
| └ efficiency | 0.00 |
| └ intent fidelity | 0.09 |
| └ style adherence | 0.05 |
Description
The coordinator currently polls the graph on a fixed interval (default 60s, also 30s and 10s in different places). Replace the pure-polling design with event-driven graph watching + a slower safety timer:
- Primary trigger: filesystem watch on `.wg/graph.jsonl` (and any other coordinator-relevant files: `.wg/service/*`, `.wg/agency/*` if the coordinator reacts to those). New events wake the coordinator immediately.
- Safety timer: fires every 30s (configurable) for work that isn't graph-change-driven — cycle_delay scheduling, agent heartbeat / timeout reaping, model registry refresh, compaction trigger checks, anything else time-based.
Requirements
- File watcher: use the `notify` crate (or `notify-debouncer-mini` for built-in debounce). Watch `.wg/graph.jsonl` and emit a 'graph changed' event.
- Debounce: a single `wg add` (or any `wg` command) can cause multiple writes within milliseconds. Coalesce events with a short debounce window (50–200ms) so the coordinator wakes once per logical change, not once per fsync.
- Self-write filtering: when the coordinator itself writes the graph (e.g. updating task status), don't wake itself. Either ignore writes that happen between 'I'm about to write' and 'I'm done writing', or rely on the debounce + idempotent loop body.
- Fallback when watcher unavailable: inotify isn't available everywhere (some NFS mounts, WSL1, certain remote/sandbox filesystems). Detect at startup, log one clear warning, fall back to a short poll (e.g. 5s).
- Config consolidation: the current config has three intervals — `[coordinator] interval`, `[coordinator] poll_interval`, `[agent] interval`. Audit what each one governs, document it in the config schema, and where two can collapse into one (now that polling is the safety timer, not the primary trigger), collapse them. Default safety timer = 30s. Don't break existing configs — keep accepting the old keys with deprecation warnings.
- TUI responsiveness: when a user adds a task in one terminal, the TUI in another terminal should reflect it within a second. Verify this in manual smoke.
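The debounce requirement can be sketched independently of the watcher backend. Below is a minimal std-only coalescer — a hypothetical helper, not code from the repo; the real implementation would likely sit behind `notify-debouncer-mini` instead. It blocks for the first raw event, then keeps draining until the window elapses with no new events, so a burst of writes from one `wg add` becomes a single wake:

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Coalesce a burst of raw change events into a single logical wake.
/// Blocks until the first event arrives, then keeps draining until `window`
/// passes with no new events. Returns false if the watcher side hung up
/// before any event arrived.
fn wait_for_change(rx: &Receiver<()>, window: Duration) -> bool {
    if rx.recv().is_err() {
        return false; // sender dropped: watcher is gone
    }
    loop {
        match rx.recv_timeout(window) {
            Ok(()) => continue, // still inside the write burst
            Err(RecvTimeoutError::Timeout) => return true, // burst settled: one wake
            Err(RecvTimeoutError::Disconnected) => return true, // remaining events already drained
        }
    }
}
```

Because the loop body is idempotent, this coalescing also covers most of the self-write filtering requirement: extra wakes are cheap no-ops, not correctness bugs.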
Non-goals
- Don't replace the polling fallback entirely.
- Don't change the agent-spawning logic, only what triggers it.
- Don't try to watch every file in `.wg/` — start with `graph.jsonl` and add others only if a clear coordinator-relevant event is missed.
Files likely to touch (best guess from grep, implementer should verify)
- `src/service/coordinator.rs` (or wherever the main loop lives) — replace the `sleep(poll_interval)` with a `select!` on (watcher_event, safety_tick, shutdown_signal).
- `src/config.rs` — schema changes for consolidating intervals + adding watcher-related options (`debounce_ms`, `fallback_poll_interval`).
- `Cargo.toml` — add `notify` (or `notify-debouncer-mini`).
- Tests in `tests/` for the new behavior.
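If the main loop is async, a `tokio::select!` over the watcher channel, an interval tick, and the shutdown signal is the natural shape. For a threaded loop, the same two-trigger wait can be sketched with a std channel; `next_wake` here is a hypothetical helper, not the actual coordinator code:

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

enum Wake {
    GraphChanged, // watcher delivered a debounced graph event
    SafetyTick,   // safety timer elapsed with no graph activity
}

/// One blocking wait of the coordinator loop: wake immediately on a graph
/// change, otherwise fall through to the time-based work after `safety`.
fn next_wake(graph_rx: &Receiver<()>, safety: Duration) -> Wake {
    match graph_rx.recv_timeout(safety) {
        Ok(()) => Wake::GraphChanged,
        Err(RecvTimeoutError::Timeout) => Wake::SafetyTick,
        // Watcher channel gone (e.g. watcher thread crashed): degrade to
        // pure timer-driven polling rather than spinning.
        Err(RecvTimeoutError::Disconnected) => {
            std::thread::sleep(safety);
            Wake::SafetyTick
        }
    }
}
```

The key property is that cycle_delay scheduling, heartbeat reaping, and other timer-driven work run on every iteration regardless of which arm fired, so the loop body stays identical to today's polling body.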
Edge cases to handle
- Watcher process crashes mid-run → restart it once, then fall back to polling.
- Repo on NFS / Docker volume / network filesystem → fallback path must work cleanly.
- Multiple coordinators in different worktrees — don't wake on each other's graph writes (different .wg dirs, so naturally isolated, but verify).
- Graph file missing at startup → wait for it to appear (don't crash); useful for the `wg init` race.
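The missing-graph edge case amounts to a bounded wait before installing the watcher. A hypothetical startup helper, assuming nothing about the real codebase beyond the file path:

```rust
use std::path::Path;
use std::time::Duration;

/// Wait for graph.jsonl to appear instead of crashing at startup, which
/// covers the `wg init` race. Returns whether the file showed up within
/// the retry budget.
fn wait_for_graph(path: &Path, poll: Duration, max_tries: u32) -> bool {
    for _ in 0..max_tries {
        if path.exists() {
            return true; // safe to install the file watcher now
        }
        std::thread::sleep(poll);
    }
    path.exists()
}
```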
Validation
- Failing tests written first:
- test_coordinator_wakes_on_graph_write (write to graph.jsonl while coordinator idle → coordinator processes within 200ms, well before safety timer)
- test_coordinator_debounces_burst_writes (10 writes in 50ms → coordinator wakes ≤ 2 times, not 10)
- test_coordinator_safety_timer_fires_with_no_graph_changes (no writes for 30s → safety timer triggers a loop iteration)
- test_coordinator_falls_back_when_watcher_init_fails (inject failure → service still works, logs warning, polls at fallback interval)
- test_config_legacy_poll_interval_accepted_with_deprecation_warning
- Implementation makes all tests pass
- cargo build + cargo test pass with no regressions
- Manual smoke:
- Start service. In another terminal, `wg add 'foo'`. Within 1s, the new task is visible in `wg list` AND the coordinator log shows a wake event.
- Run `wg tui` in one pane, `wg add 'bar'` in another — TUI updates within a second.
- Test on a tmpfs / NFS-mounted `.wg` dir if available (or simulate watcher failure) — confirm fallback poll engages.
Depends on
Required by
- (none)
Log
- 2026-04-26T14:32:27.951083506+00:00 Task paused
- 2026-04-26T16:03:14.701272310+00:00 Task published
- 2026-04-26T16:14:03.137328279+00:00 Spawned by coordinator --executor native --model claude-opus-4-6
- 2026-04-26T16:14:03.162054938+00:00 Task marked as failed: Agent exited with code 1
- 2026-04-26T16:17:15.909496234+00:00 Task reset for retry from failed (attempt #2)
- 2026-04-26T18:59:15.121311333+00:00 Spawned by coordinator --executor native --model opus
- 2026-04-26T18:59:15.208802321+00:00 Task marked as failed: Agent exited with code 1
- 2026-04-26T19:10:28.329358216+00:00 Task reset for retry from failed (attempt #3)
- 2026-04-26T19:10:30.919905093+00:00 Spawned by coordinator --executor native --model opus
- 2026-04-26T19:10:30.941831526+00:00 Task marked as failed: Agent exited with code 1
- 2026-04-26T19:43:06.354144224+00:00 Task reset for retry from failed (attempt #4)
- 2026-04-26T19:43:09.182338882+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-26T19:43:18.594780163+00:00 Starting fresh attempt — previous failed on Anthropic API key. Investigating coordinator structure first.
- 2026-04-26T20:25:21.239359823+00:00 Implementation complete: GraphWatcher (notify-debouncer-mini) wired into daemon main loop via self-pipe; 4 unit tests + 7 alias integration tests pass; manual smoke test confirms direct file writes wake the daemon and self-write filter prevents feedback loop.
- 2026-04-26T20:41:30.173679930+00:00 Committed: 0eded76bc — pushed to remote (wg/agent-95/coordinator-inotify-graph)
- 2026-04-26T20:41:44.544446398+00:00 Task marked as done