diagnose-tui-scales

Diagnose: TUI scales poorly with task count — quadratic-ish lag, chat input degraded under load

Metadata

Status: done
Assigned: agent-1347
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Created: 2026-05-01T15:05:23.951129226+00:00
Started: 2026-05-01T15:08:10.389864611+00:00
Completed: 2026-05-01T15:20:46.462767431+00:00
Tags: priority-high, research, perf, tui, eval-scheduled
Eval score: 0.91
└ blocking impact: 0.90
└ completeness: 0.95
└ constraint fidelity: 0.40
└ coordination overhead: 0.82
└ correctness: 0.93
└ downstream usability: 0.93
└ efficiency: 0.87
└ intent fidelity: 0.89
└ style adherence: 0.90

Description

TUI performance degrades visibly as task count grows AND under high activity (8/8 agents busy). The chat tab's typing latency through tmux is degraded — keystrokes lag, suggesting the TUI's render thread blocks chat input handling.

User direct quote 2026-05-01: 'TUI gets slower and slower the more tasks are in it. At a level that I don't think is appropriate. We're looking at not that much information here. ... it's actually starting to degrade the chat interface to the tmux. It's very laggy to type on it. ... Something got to get decoupled to make it so that we can be refreshing different parts of the screen with different mechanisms.'

User hypothesis (likely correct): something quadratic — 'linear scans being done' inside per-frame work.

User reproducibility note: 'You don't really get this problem when the graph is smaller and activity levels lower.' So both task count AND event rate amplify the symptom.

Investigation goals (no source modifications)

1. Profile under realistic load

  • Run wg tui against the current project (~900+ tasks, frequent state updates)
  • Capture: perf record -F 99 -p $(pgrep -f 'wg tui') -- sleep 30 && perf report (or samply record if perf unavailable)
  • Identify the top hot functions during a 30s sample with active dispatcher

2. Look for quadratic patterns

Specifically grep / inspect for:

  • Per-frame full-graph rescans (filter / sort / re-layout the entire task list every render)
  • Per-row inner loops that scan all tasks (e.g., 'for each visible row, look up its dependencies by scanning the full edge list')
  • Synchronous full-file reads of graph.jsonl on every event
  • Repeated string formatting / allocation in the render path that should be cached
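The per-row dependency lookup above is the classic shape to check for. A minimal self-contained sketch (task IDs, edge representation, and function names are illustrative, not wg's actual code) of the O(rows × edges) scan versus a precomputed index:

```rust
use std::collections::HashMap;

// Edge list as it might appear in graph.jsonl: (task, depends_on) pairs.
type Edge = (u32, u32);

// Quadratic-ish: for every visible row, rescan the whole edge list.
// With R visible rows and E edges this is O(R * E) per frame.
fn deps_per_row_scan(visible: &[u32], edges: &[Edge]) -> Vec<usize> {
    visible
        .iter()
        .map(|row| edges.iter().filter(|(t, _)| t == row).count())
        .collect()
}

// Linear alternative: build an adjacency index once (or when the graph
// changes), after which each row lookup is O(1).
fn deps_via_index(visible: &[u32], edges: &[Edge]) -> Vec<usize> {
    let mut index: HashMap<u32, usize> = HashMap::new();
    for (t, _) in edges {
        *index.entry(*t).or_insert(0) += 1;
    }
    visible.iter().map(|row| *index.get(row).unwrap_or(&0)).collect()
}

fn main() {
    let edges: Vec<Edge> = vec![(1, 2), (1, 3), (2, 3), (4, 1)];
    let visible = [1, 2, 4, 5];
    // Both paths must agree; only their cost per frame differs.
    assert_eq!(
        deps_per_row_scan(&visible, &edges),
        deps_via_index(&visible, &edges)
    );
}
```

If the profile shows a hot loop with this shape, moving the index build out of the render path is usually the whole fix.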

3. Identify the chat-input coupling

  • Why is typing lag on the chat PTY? Single render thread shared with graph rendering?
  • Trace the input event path for chat-tab keystrokes — does it block on any graph-state computation?
  • If yes: the render-thread architecture is the bottleneck, not just one quadratic scan

4. Recommend decoupling strategy

Per user's framing: 'Something got to get decoupled to make it so that we can be refreshing different parts of the screen with different mechanisms.'

Possible decoupling axes:

  • Chat PTY rendering on its own thread / async task, never blocked by graph rendering
  • Graph view re-renders only when graph state changes (dirty flag), not on every event
  • Per-pane render throttling (different panes refresh at different rates: chat = 60fps, graph = 5fps, status bar = 1fps)
  • Background pre-computation of expensive layout (graph rendering happens at lower priority + interruptible)

5. Reference fix-wg-tui

Earlier task fix-wg-tui (commit 2e11648ab) addressed a 55% CPU problem. Read its log via wg show fix-wg-tui to understand what it changed. The current symptom may be a separate issue (per-frame work too heavy, even if the busy-loop is fixed) OR a regression.

Deliverable

wg log entry with:

  • Profile output (top functions by self-time during a 30s sample, with task count + active agents documented)
  • Specific hot path identified (file:line of the worst offender)
  • Quadratic pattern named (or 'no quadratic; the issue is X' with evidence)
  • Chat-input-coupling diagnosis (where the typing lag enters the render path)
  • Concrete recommended decoupling/optimization strategy with file pointers for the implementer
  • Smoke / benchmark scenarios that gate future regressions (e.g., 'render N tasks under M events/sec — assert frame time < X ms')
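The last deliverable can take the shape of a frame-time gate. A hypothetical sketch (the row format, task count, and 100 ms budget are placeholders to be replaced once a baseline profile exists):

```rust
use std::time::Instant;

// Stand-in for one graph-view frame: format N synthetic task rows.
fn render_frame(task_count: usize) -> Vec<String> {
    (0..task_count)
        .map(|i| format!("task-{i:04} [running] deps:{}", i % 7))
        .collect()
}

fn main() {
    let n = 1000; // roughly the ~900+ tasks in the current project
    let t0 = Instant::now();
    let frame = render_frame(n);
    let elapsed = t0.elapsed();
    assert_eq!(frame.len(), n);
    // Deliberately generous CI budget; tighten after profiling.
    assert!(elapsed.as_millis() < 100, "frame took {elapsed:?}");
}
```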

Validation

  • Profile captured (perf or samply output) with task count + activity rate documented
  • Hot path(s) identified with file:line citations
  • Quadratic / O(n²) pattern confirmed or refuted with evidence
  • Chat-input coupling explained
  • Concrete recommended fix(es) for follow-up implementation task
  • At least one benchmark/smoke scenario specified
  • No source / doc modifications — diagnose only

Depends on

Required by

Log