Metadata
| Status | failed |
|---|---|
| Assigned | agent-1398 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-05-01T18:38:30.148363534+00:00 |
| Started | 2026-05-01T20:01:34.986086738+00:00 |
| Tags | priority-high, fix, perf, tui, eval-scheduled |
| Tokens | 2653677 in / 9778 out |
| Failure reason | rescue eval unavailable after 2 attempts; falling back to terminal failure |
Description
diagnose-tui-scales (agent-1347) identified the root cause and spec'd 6 prioritized fixes. Read its log via wg show diagnose-tui-scales for the full forensic analysis and benchmark scenarios.
Root cause (already proven)
Not quadratic: O(N) work per refresh multiplied by a high refresh rate. With 8 active agents each appending to output.log every ~100ms, the recursive fs watcher fans in 80+ events/sec, each triggering a full pipeline pass that re-reads the same files multiple times, re-walks archive dirs, etc. The single-threaded main loop means PTY keystroke echo waits behind graph render.
The 6 fixes (apply in priority order)
Fix 1 (HIGHEST IMPACT) — message_stats + coordinator_message_status caching + fold-into-one-pass
- Files: src/messages.rs:207-256, :667-713, src/commands/viz/mod.rs:736-756, src/tui/viz_viewer/state.rs:7092-7138
- Cache per (task_id, mtime) in VizApp
- fs watcher already knows changed paths — surface that path list (currently discarded at start_fs_watcher state.rs:7107) and selectively invalidate
- The two functions each call list_messages() — fold into ONE pass that reads the file once
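The (task_id, mtime)-keyed cache could look roughly like this minimal sketch (StatsCache, MessageStats, and get_or_compute are illustrative names, not the actual VizApp API): because a changed file gets a new mtime, a stale entry is simply never hit again, and invalidation falls out of the key.

```rust
use std::collections::HashMap;
use std::time::SystemTime;

// Illustrative stand-in for the parsed message stats.
#[derive(Clone)]
struct MessageStats {
    count: usize,
}

// Cache keyed by (task_id, mtime): a rewritten file yields a new mtime,
// so stale entries are never returned; old keys can be pruned lazily.
struct StatsCache {
    entries: HashMap<(String, SystemTime), MessageStats>,
}

impl StatsCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    // Returns cached stats for this (task, mtime), computing once on a miss.
    fn get_or_compute<F>(&mut self, task_id: &str, mtime: SystemTime, compute: F) -> MessageStats
    where
        F: FnOnce() -> MessageStats,
    {
        self.entries
            .entry((task_id.to_string(), mtime))
            .or_insert_with(compute)
            .clone()
    }
}

fn main() {
    let mut cache = StatsCache::new();
    let mtime = SystemTime::UNIX_EPOCH;
    let mut computes = 0;
    for _ in 0..3 {
        cache.get_or_compute("task-1", mtime, || {
            computes += 1;
            MessageStats { count: 42 }
        });
    }
    // Only the first lookup pays the parse cost.
    assert_eq!(computes, 1);
    println!("computes = {}", computes);
}
```

The same shape covers the selective-invalidation bullet: when the watcher surfaces a changed path, drop only the entries whose task maps to that path.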
Fix 2 — live_token_usage + agency_token_usage caching
- Files: src/commands/viz/mod.rs:646-733, src/graph.rs:914 (parse_token_usage_live)
- Cache per (agent_id, output_log_mtime) and per (task_id, lifecycle_member_mtime)
- Currently re-walks log/agents//* every refresh
Fix 3 — eliminate second graph load in apply_sort_mode
- Files: src/tui/viz_viewer/state.rs:7815-7835
- Pass already-loaded WorkGraph in (or precompute and cache the status_map)
- Two graph loads per refresh is pure waste
Fix 4 — throttle viz regen
- Files: src/tui/viz_viewer/state.rs:7143-7250
- Add 'last_full_refresh_at' guard; cap at ~200ms (5fps) regardless of fs event rate
- Current: full pipeline runs every wakeup
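The last_full_refresh_at guard can be sketched as follows (RefreshThrottle and its method names are assumptions; only the ~200ms cap comes from the spec): every wakeup still fires, but the heavy pipeline only runs when the interval has elapsed.

```rust
use std::time::{Duration, Instant};

// Hypothetical throttle guard: skip the full pipeline unless at least
// `min_interval` has elapsed since the last full pass.
struct RefreshThrottle {
    last_full_refresh_at: Option<Instant>,
    min_interval: Duration,
}

impl RefreshThrottle {
    fn new(min_interval: Duration) -> Self {
        Self { last_full_refresh_at: None, min_interval }
    }

    // Returns true (and records the time) only when a full refresh is due.
    fn should_refresh(&mut self, now: Instant) -> bool {
        match self.last_full_refresh_at {
            Some(prev) if now.duration_since(prev) < self.min_interval => false,
            _ => {
                self.last_full_refresh_at = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut throttle = RefreshThrottle::new(Duration::from_millis(200));
    let t0 = Instant::now();
    assert!(throttle.should_refresh(t0)); // first wakeup runs
    assert!(!throttle.should_refresh(t0 + Duration::from_millis(100))); // suppressed
    assert!(throttle.should_refresh(t0 + Duration::from_millis(250))); // due again
    println!("throttle ok");
}
```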
Fix 5 (THE INPUT-LATENCY KILLER) — decouple chat PTY render from graph render
Two options; prefer (b), but (a) is acceptable as a v1:
(a) Cheap: in chat_pty_mode, when redraw is triggered by chat_pty_has_new_bytes() (PTY echo) but NOT by graph state change, skip load_viz_from_graph + apply_sort_mode + load_stats_from_graph. Render cached lines verbatim. Keystrokes echo at PTY speed.
(b) Better: move maybe_refresh's heavy work to a background thread that posts a snapshot via a channel; the main loop only reads the latest snapshot. The fs watcher already runs in its own thread, so this is a natural extension.
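Option (b) can be sketched with a std::sync::mpsc channel (VizSnapshot is a hypothetical stand-in for the pipeline output): the worker posts snapshots as they complete, and the main loop drains the channel and keeps only the newest one, so rendering never blocks on pipeline work.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical snapshot of the heavy pipeline output.
#[derive(Debug)]
struct VizSnapshot {
    revision: u64,
}

fn main() {
    let (tx, rx) = mpsc::channel::<VizSnapshot>();

    // Worker thread: does the expensive work and posts snapshots.
    let worker = thread::spawn(move || {
        for revision in 1..=3 {
            thread::sleep(Duration::from_millis(10)); // simulate heavy pipeline
            if tx.send(VizSnapshot { revision }).is_err() {
                break; // main loop is gone
            }
        }
    });

    worker.join().unwrap();

    // Main loop: drain the channel, keep only the newest snapshot.
    // Stale intermediate snapshots are discarded without being rendered.
    let mut latest = None;
    while let Ok(snap) = rx.try_recv() {
        latest = Some(snap);
    }
    assert_eq!(latest.unwrap().revision, 3);
    println!("rendered latest snapshot");
}
```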
Fix 6 — per-agent tail thread for stream parsing
- Files: src/tui/viz_viewer/state.rs:10857-end of update_agent_streams
- Move agent stream parsing off the main thread
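A sketch of the incremental tailing each per-agent thread would loop over (LogTail is a hypothetical name; real code would also need rotation/truncation handling): track a byte offset and parse only what was appended since the last poll, instead of re-reading the whole log.

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};
use std::path::PathBuf;

// Hypothetical incremental tailer: remembers how far it has read so each
// poll only returns bytes appended since the previous one.
struct LogTail {
    path: PathBuf,
    offset: u64,
}

impl LogTail {
    fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into(), offset: 0 }
    }

    // Read only the newly appended bytes, if any.
    fn read_new(&mut self) -> std::io::Result<Vec<u8>> {
        let mut f = File::open(&self.path)?;
        let len = f.metadata()?.len();
        if len <= self.offset {
            return Ok(Vec::new()); // nothing new (or file was truncated)
        }
        f.seek(SeekFrom::Start(self.offset))?;
        let mut buf = Vec::with_capacity((len - self.offset) as usize);
        f.read_to_end(&mut buf)?;
        self.offset = len;
        Ok(buf)
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("wg_tail_demo.log");
    std::fs::write(&path, b"line 1\n")?;

    let mut tail = LogTail::new(&path);
    assert_eq!(tail.read_new()?, b"line 1\n".to_vec());

    // Simulate an agent appending to output.log.
    let mut f = std::fs::OpenOptions::new().append(true).open(&path)?;
    f.write_all(b"line 2\n")?;

    // Second poll sees only the appended bytes.
    assert_eq!(tail.read_new()?, b"line 2\n".to_vec());
    std::fs::remove_file(&path)?;
    Ok(())
}
```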
Validation
Each fix has a dedicated benchmark scenario from the diagnose (A-E):
- A. tui_idle_fps — wg tui at idle, 1000 tasks, no agents. Measure render fps.
- B. tui_loaded_cpu — same, with a fixture simulating 8 agents appending to output.log every 100ms for 30s; ASSERT wg tui CPU < 40%.
- C. tui_chat_input_latency — wg tui in chat_pty_mode against a 1000-task graph + 8 simulated writers; drive 50 keystrokes via tmux send-keys; ASSERT p99 echo delay < 50ms.
- D. cargo bench bench_generate_viz_output — N ∈ {100, 500, 1000, 2000}; ASSERT near-linear scaling and < 50ms at N=1000.
- E. cargo bench bench_message_stats_pair — fold-to-one-pass; ASSERT < 50% of baseline.
Done when:
- Failing tests/benchmarks are written first, per the diagnose's spec
- Each of the 6 fixes is applied
- All 5 benchmark scenarios PASS their asserts
- Live smoke against this project (~250 tasks, 8 agents busy): chat input latency feels snappy; CPU stays well under 100%; the viewport doesn't lag
- No regression of revert-redo-fix's last_interaction_at primitive (when it lands first)
- cargo build + cargo test pass
- Permanent smoke scenarios A-E are added to the manifest with this task id in owners
- cargo install --path . was run before claiming done
Why depends on revert-redo-fix
Both touch src/tui/viz_viewer/state.rs heavily (apply_sort_mode, maybe_refresh, scroll/sort logic); serializing the two tasks avoids a merge fight. revert-redo-fix's last_interaction_at primitive may also offer cleaner integration points for caching keys — e.g., (task_id, last_interaction_at) as a cache key invalidates naturally on interaction.
Process note
This is a substantial multi-fix task. Apply all 6 in priority order. The diagnose did the design work; the implementer executes against file:line spec. If any one fix turns out wrong/incomplete, file follow-up rather than abandoning all six.
Depends on
Required by
Log
- 2026-05-01T18:38:30.132616504+00:00 Task paused
- 2026-05-01T18:38:30.192664692+00:00 Task published
- 2026-05-01T18:38:53.556262858+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer's 0.80 score and 511 codebase tasks make it ideal for precision-critical performance optimization; Careful tradeoff matches methodical application of 6 coordinated fixes with comprehensive benchmark validation.
- 2026-05-01T20:01:34.986094102+00:00 Spawned by coordinator --executor claude --model opus
- 2026-05-01T20:01:49.621866048+00:00 Starting work — clean worktree, no prior WIP. Plan: read diagnose-tui-scales spec, then apply 6 fixes in priority order with benchmarks.
- 2026-05-01T20:38:14.057726880+00:00 Agent exited without wg done — entering failed-pending-eval for rescue evaluation
- 2026-05-01T20:41:30.673157955+00:00 FailedPendingEval → Failed (rescue eval unavailable after 2 attempts)