Metadata
| Status | failed |
|---|---|
| Assigned | agent-1398 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-05-01T18:38:30.148363534+00:00 |
| Started | 2026-05-01T20:01:34.986086738+00:00 |
| Tags | priority-high, fix, perf, tui, eval-scheduled |
| Tokens | 2653677 in / 9778 out |
| Failure reason | rescue eval unavailable after 2 attempts; falling back to terminal failure |
Description
diagnose-tui-scales (agent-1347) identified the root cause and spec'd 6 prioritized fixes. Read its log via wg show diagnose-tui-scales for the full forensic analysis and benchmark scenarios.
Root cause (already proven)
Not quadratic: O(N) work per refresh multiplied by a high refresh rate. With 8 active agents each appending to output.log every ~100ms, the recursive fs watcher fans in 80+ events/sec, each triggering a full pipeline pass that re-reads the same files multiple times, re-walks archive dirs, etc. The single-threaded main loop means PTY keystroke echo waits behind graph render.
The 6 fixes (apply in priority order)
Fix 1 (HIGHEST IMPACT) — message_stats + coordinator_message_status caching + fold-into-one-pass
- Files: src/messages.rs:207-256, :667-713, src/commands/viz/mod.rs:736-756, src/tui/viz_viewer/state.rs:7092-7138
- Cache per (task_id, mtime) in VizApp
- fs watcher already knows changed paths — surface that path list (currently discarded at start_fs_watcher state.rs:7107) and selectively invalidate
- The two functions each call list_messages() — fold into ONE pass that reads the file once
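The (task_id, mtime)-keyed cache could look roughly like this minimal sketch (StatsCache, MessageStats, and get_or_compute are illustrative names, not the actual VizApp API): because a changed file gets a new mtime, a stale entry is simply never hit again, and invalidation falls out of the key.

```rust
use std::collections::HashMap;
use std::time::SystemTime;

// Illustrative stand-in for the parsed message stats.
#[derive(Clone)]
struct MessageStats {
    count: usize,
}

// Cache keyed by (task_id, mtime): a rewritten file yields a new mtime,
// so stale entries are never returned; old keys can be pruned lazily.
struct StatsCache {
    entries: HashMap<(String, SystemTime), MessageStats>,
}

impl StatsCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    // Returns cached stats for this (task, mtime), computing once on a miss.
    fn get_or_compute<F>(&mut self, task_id: &str, mtime: SystemTime, compute: F) -> MessageStats
    where
        F: FnOnce() -> MessageStats,
    {
        self.entries
            .entry((task_id.to_string(), mtime))
            .or_insert_with(compute)
            .clone()
    }
}

fn main() {
    let mut cache = StatsCache::new();
    let mtime = SystemTime::UNIX_EPOCH;
    let mut computes = 0;
    for _ in 0..3 {
        cache.get_or_compute("task-1", mtime, || {
            computes += 1;
            MessageStats { count: 42 }
        });
    }
    // Only the first lookup pays the parse cost.
    assert_eq!(computes, 1);
    println!("computes = {}", computes);
}
```

The same shape covers the selective-invalidation bullet: when the watcher surfaces a changed path, drop only the entries whose task maps to that path.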
Fix 2 — live_token_usage + agency_token_usage caching
- Files: src/commands/viz/mod.rs:646-733, src/graph.rs:914 (parse_token_usage_live)
- Cache per (agent_id, output_log_mtime) and per (task_id, lifecycle_member_mtime)
- Currently re-walks log/agents//* every refresh
Fix 3 — eliminate second graph load in apply_sort_mode
- Files: src/tui/viz_viewer/state.rs:7815-7835
- Pass already-loaded WorkGraph in (or precompute and cache the status_map)
- Two graph loads per refresh is pure waste
Fix 4 — throttle viz regen
- Files: src/tui/viz_viewer/state.rs:7143-7250
- Add 'last_full_refresh_at' guard; cap at ~200ms (5fps) regardless of fs event rate
- Current: full pipeline runs every wakeup
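The last_full_refresh_at guard can be sketched as follows (RefreshThrottle and its method names are assumptions; only the ~200ms cap comes from the spec): every wakeup still fires, but the heavy pipeline only runs when the interval has elapsed.

```rust
use std::time::{Duration, Instant};

// Hypothetical throttle guard: skip the full pipeline unless at least
// `min_interval` has elapsed since the last full pass.
struct RefreshThrottle {
    last_full_refresh_at: Option<Instant>,
    min_interval: Duration,
}

impl RefreshThrottle {
    fn new(min_interval: Duration) -> Self {
        Self { last_full_refresh_at: None, min_interval }
    }

    // Returns true (and records the time) only when a full refresh is due.
    fn should_refresh(&mut self, now: Instant) -> bool {
        match self.last_full_refresh_at {
            Some(prev) if now.duration_since(prev) < self.min_interval => false,
            _ => {
                self.last_full_refresh_at = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut throttle = RefreshThrottle::new(Duration::from_millis(200));
    let t0 = Instant::now();
    assert!(throttle.should_refresh(t0)); // first wakeup runs
    assert!(!throttle.should_refresh(t0 + Duration::from_millis(100))); // suppressed
    assert!(throttle.should_refresh(t0 + Duration::from_millis(250))); // due again
    println!("throttle ok");
}
```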
Fix 5 (THE INPUT-LATENCY KILLER) — decouple chat PTY render from graph render
Two options; prefer (b), but (a) is acceptable as a v1:
(a) Cheap: in chat_pty_mode, when redraw is triggered by chat_pty_has_new_bytes() (PTY echo) but NOT by graph state change, skip load_viz_from_graph + apply_sort_mode + load_stats_from_graph. Render cached lines verbatim. Keystrokes echo at PTY speed.
(b) Better: move maybe_refresh's heavy work to a background thread that posts a snapshot via a channel; the main loop only reads the latest snapshot. The fs watcher already runs in its own thread, so this is a natural extension.
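Option (b) can be sketched with a std::sync::mpsc channel (VizSnapshot is a hypothetical stand-in for the pipeline output): the worker posts snapshots as they complete, and the main loop drains the channel and keeps only the newest one, so rendering never blocks on pipeline work.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical snapshot of the heavy pipeline output.
#[derive(Debug)]
struct VizSnapshot {
    revision: u64,
}

fn main() {
    let (tx, rx) = mpsc::channel::<VizSnapshot>();

    // Worker thread: does the expensive work and posts snapshots.
    let worker = thread::spawn(move || {
        for revision in 1..=3 {
            thread::sleep(Duration::from_millis(10)); // simulate heavy pipeline
            if tx.send(VizSnapshot { revision }).is_err() {
                break; // main loop is gone
            }
        }
    });

    worker.join().unwrap();

    // Main loop: drain the channel, keep only the newest snapshot.
    // Stale intermediate snapshots are discarded without being rendered.
    let mut latest = None;
    while let Ok(snap) = rx.try_recv() {
        latest = Some(snap);
    }
    assert_eq!(latest.unwrap().revision, 3);
    println!("rendered latest snapshot");
}
```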
Fix 6 — per-agent tail thread for stream parsing
- Files: src/tui/viz_viewer/state.rs:10857-end of update_agent_streams
- Move agent stream parsing off the main thread
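A sketch of the incremental tailing each per-agent thread would loop over (LogTail is a hypothetical name; real code would also need rotation/truncation handling): track a byte offset and parse only what was appended since the last poll, instead of re-reading the whole log.

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};
use std::path::PathBuf;

// Hypothetical incremental tailer: remembers how far it has read so each
// poll only returns bytes appended since the previous one.
struct LogTail {
    path: PathBuf,
    offset: u64,
}

impl LogTail {
    fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into(), offset: 0 }
    }

    // Read only the newly appended bytes, if any.
    fn read_new(&mut self) -> std::io::Result<Vec<u8>> {
        let mut f = File::open(&self.path)?;
        let len = f.metadata()?.len();
        if len <= self.offset {
            return Ok(Vec::new()); // nothing new (or file was truncated)
        }
        f.seek(SeekFrom::Start(self.offset))?;
        let mut buf = Vec::with_capacity((len - self.offset) as usize);
        f.read_to_end(&mut buf)?;
        self.offset = len;
        Ok(buf)
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("wg_tail_demo.log");
    std::fs::write(&path, b"line 1\n")?;

    let mut tail = LogTail::new(&path);
    assert_eq!(tail.read_new()?, b"line 1\n".to_vec());

    // Simulate an agent appending to output.log.
    let mut f = std::fs::OpenOptions::new().append(true).open(&path)?;
    f.write_all(b"line 2\n")?;

    // Second poll sees only the appended bytes.
    assert_eq!(tail.read_new()?, b"line 2\n".to_vec());
    std::fs::remove_file(&path)?;
    Ok(())
}
```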
Validation
Each fix has a dedicated benchmark scenario from the diagnose (A-E):
- A. tui_idle_fps — wg tui at idle, 1000 tasks, no agents. Measure render fps.
- B. tui_loaded_cpu — same, with a fixture simulating 8 agents appending to output.log every 100ms for 30s; ASSERT wg tui CPU < 40%.
- C. tui_chat_input_latency — wg tui in chat_pty_mode against a 1000-task graph + 8 simulated writers; drive 50 keystrokes via tmux send-keys; ASSERT p99 echo delay < 50ms.
- D. cargo bench bench_generate_viz_output — N ∈ {100, 500, 1000, 2000}; ASSERT near-linear scaling and < 50ms at N=1000.
- E. cargo bench bench_message_stats_pair — fold-to-one-pass; ASSERT < 50% of baseline.
Done when:
- Failing tests/benchmarks are written first, per the diagnose's spec
- Each of the 6 fixes is applied
- All 5 benchmark scenarios PASS their asserts
- Live smoke against this project (~250 tasks, 8 agents busy): chat input latency feels snappy; CPU stays well under 100%; the viewport doesn't lag
- No regression of revert-redo-fix's last_interaction_at primitive (when it lands first)
- cargo build + cargo test pass
- Permanent smoke scenarios A-E are added to the manifest with this task id in owners
- cargo install --path . was run before claiming done
Why depends on revert-redo-fix
Both touch src/tui/viz_viewer/state.rs heavily (apply_sort_mode, maybe_refresh, scroll/sort logic); serializing the two tasks avoids a merge fight. revert-redo-fix's last_interaction_at primitive may also offer cleaner integration points for caching keys — e.g., (task_id, last_interaction_at) as a cache key invalidates naturally on interaction.
Process note
This is a substantial multi-fix task. Apply all 6 in priority order. The diagnose did the design work; the implementer executes against file:line spec. If any one fix turns out wrong/incomplete, file follow-up rather than abandoning all six.
Depends on
Required by
Log
- 2026-05-01T18:38:30.132616504+00:00 Task paused
- 2026-05-01T18:38:30.192664692+00:00 Task published
- 2026-05-01T18:38:53.556262858+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer's 0.80 score and 511 codebase tasks make it ideal for precision-critical performance optimization; Careful tradeoff matches methodical application of 6 coordinated fixes with comprehensive benchmark validation.
- 2026-05-01T20:01:34.986094102+00:00 Spawned by coordinator --executor claude --model opus
- 2026-05-01T20:01:49.621866048+00:00 Starting work — clean worktree, no prior WIP. Plan: read diagnose-tui-scales spec, then apply 6 fixes in priority order with benchmarks.
- 2026-05-01T20:38:14.057726880+00:00 Agent exited without wg done — entering failed-pending-eval for rescue evaluation
- 2026-05-01T20:41:30.673157955+00:00 FailedPendingEval → Failed (rescue eval unavailable after 2 attempts)