diagnose-hud-slot

Diagnose: HUD slot count vs reality — recurring drift after 3+ attempts

Metadata

Status: done
Assigned: agent-1349
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Model: claude:opus
Created: 2026-05-01T14:59:57.501549594+00:00
Started: 2026-05-01T15:08:17.619996787+00:00
Completed: 2026-05-01T15:17:13.009140130+00:00
Tags: priority-high, research, bug, tui, hud, eval-scheduled
Eval score: 0.87
└ blocking impact: 0.90
└ completeness: 0.94
└ constraint fidelity: 0.70
└ coordination overhead: 0.89
└ correctness: 0.92
└ downstream usability: 0.91
└ efficiency: 0.85
└ intent fidelity: 0.87
└ style adherence: 0.88

Description

The TUI's HUD shows agent slot occupancy (e.g., '1/8 slots') but it consistently disagrees with reality. User report 2026-05-01: visibly 5 agents running, HUD says '1/8'.

Prior fix attempts that didn't hold:

  • 6b87ae242 fix-tui-hud (agent-745)
  • 659208d2b 'TUI agent count uses status-based active_count to match wg status' (fix-tui-agent-count-2)
  • Plus possibly more (user says ~4 attempts)

User direct quote 2026-05-01: 'The HUD that shows the system state and how many tasks are running, it is never accurate. Like right now, I see five tasks running. It says one of eight slots are occupied. What the hell is wrong with that thing? We tried to fix it like four times.'

Why prior attempts haven't held

Each fix probably changed WHERE the count is read from but not WHY it's wrong. Possible causes:

  1. The HUD reads from a polled cache that updates slowly (debounce / interval mismatch with reality)
  2. The HUD reads from one source (e.g., 'agents I spawned this tick') while wg status and the chat tab list read from another (registry of all-alive agents)
  3. There's a subscriber pattern where HUD missed an event (spawn fired, HUD didn't subscribe; or kill fired, HUD didn't decrement)
  4. The HUD's count includes a stale-state filter that excludes some active agents (e.g., agents in 'spawning' transitional state aren't counted as occupying a slot, but they ARE running)
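
Cause 4 can be illustrated with a deliberately contrived sketch. The `Agent` record, its field names, and the status strings are hypothetical and not taken from the wg source. The point is only that two counters over the same registry disagree as soon as one of them filters on a narrower set of states.

```python
from dataclasses import dataclass

# Hypothetical agent record; field and status names are assumptions for illustration.
@dataclass(frozen=True)
class Agent:
    agent_id: str
    status: str       # e.g. "spawning", "working", "exited"
    pid_alive: bool   # whether the underlying process still exists

def hud_count(agents):
    """Narrow counter: only agents in the steady 'working' state."""
    return sum(1 for a in agents if a.status == "working")

def occupied_slots(agents):
    """Broad counter: every agent whose process is alive holds a slot."""
    return sum(1 for a in agents if a.pid_alive)

registry = [
    Agent("a1", "working", True),
    Agent("a2", "spawning", True),
    Agent("a3", "spawning", True),
    Agent("a4", "spawning", True),
    Agent("a5", "spawning", True),
    Agent("a6", "exited", False),
]

print(f"HUD: {hud_count(registry)}/8, alive: {occupied_slots(registry)}/8")
# prints "HUD: 1/8, alive: 5/8"
```

If the real HUD uses a filter like `hud_count` while the slot limit is enforced on something like `occupied_slots`, moving the read site (as the earlier fixes may have done) would not change the result.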

Investigation steps (no source mods)

1. Capture the divergence

  • At a moment when HUD is wrong, capture:
    • HUD text output (slot count display)
    • wg agents output (what the registry says is alive)
    • wg service status output (what the daemon says about slots)
    • Process tree: pgrep -af 'claude|codex|nex|wg.spawn-task' (what's actually running)
  • Diff these. Identify which is right (probably the process tree) and which is wrong (probably HUD).
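
A minimal capture sketch for the three command-line sources, assuming `wg` and `pgrep` are on PATH. The output formats of `wg agents` and `wg service status` are not known here, so the script stores raw text and parses nothing. The HUD text itself still has to be copied from the screen at the same moment.

```python
import subprocess
from datetime import datetime, timezone

# Commands named in the steps above; passed as argv lists, so no shell quoting is needed.
COMMANDS = {
    "wg_agents": ["wg", "agents"],
    "wg_service_status": ["wg", "service", "status"],
    "process_tree": ["pgrep", "-af", "claude|codex|nex|wg.spawn-task"],
}

def capture_snapshot(commands=COMMANDS):
    """Run each command back to back and return labelled raw output."""
    snapshot = {"captured_at": datetime.now(timezone.utc).isoformat()}
    for label, argv in commands.items():
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=10)
            snapshot[label] = proc.stdout if proc.stdout else proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Keep going: a missing or hung tool is itself worth recording.
            snapshot[label] = f"<failed: {exc}>"
    return snapshot

def format_snapshot(snapshot):
    """Render the snapshot as plain text suitable for pasting into the task log."""
    lines = [f"captured_at: {snapshot['captured_at']}"]
    for label, text in snapshot.items():
        if label == "captured_at":
            continue
        lines.append(f"--- {label} ---")
        lines.append(text.rstrip())
    return "\n".join(lines)
```

Calling `print(format_snapshot(capture_snapshot()))` while the HUD is visibly wrong gives one timestamped block to diff against the HUD text.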

2. Find the data source

  • Search src/tui/ for where the HUD slot count is rendered
  • Identify the data source it reads from (likely an in-memory counter or a derived value from the registry)
  • Compare with the source wg status reads from
  • If they're different sources, that's the bug
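
One way to start the search is a small scan that prints file:line hits, which is the citation format the deliverable asks for. The search terms are guesses: `active_count` comes from the commit message of 659208d2b, and `slot` from the HUD text. The real identifiers may differ.

```python
import re
from pathlib import Path

# Guessed search terms; adjust once the real identifiers are known.
PATTERN = re.compile(r"slots?|active_count", re.IGNORECASE)

def find_candidates(root, pattern=PATTERN):
    """Yield (path, line_number, line) for every matching line under root."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                yield str(path), number, line.strip()

def print_candidates(root):
    """Print hits as path:line: text."""
    for path, number, line in find_candidates(root):
        print(f"{path}:{number}: {line}")
```

Running `print_candidates("src/tui")` and then the same scan over the code behind wg status should show whether the two read from the same value.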

3. Audit prior fixes

  • 6b87ae242 (fix-tui-hud): what did it actually change? Read the diff
  • 659208d2b (fix-tui-agent-count-2): same — read the diff
  • Identify why those fixes didn't hold. Possible patterns:
    • Fixed the count for one rendering path, missed another
    • Fixed the read source but the source itself was already wrong
    • Fixed the right thing but a subsequent change reverted it
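
The third pattern (a later change undoing the fix) can be checked mechanically. The sketch below lists the files each prior fix touched and then every later commit that touched the same files. It assumes it is run inside the repository and that both hashes resolve there. It uses only read-only git commands.

```python
import subprocess

PRIOR_FIXES = ["6b87ae242", "659208d2b"]  # commit hashes listed in this task

def files_touched_cmd(commit):
    """argv that lists the paths a commit changed, one per line."""
    return ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", commit]

def later_commits_cmd(commit, paths):
    """argv that lists commits after `commit` touching any of `paths`."""
    return ["git", "log", "--oneline", f"{commit}..HEAD", "--", *paths]

def run(argv):
    """Run a command and return its stdout, raising if it exits non-zero."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout

def audit(commit):
    """Collect the touched paths and the later commits that modified them."""
    paths = [p for p in run(files_touched_cmd(commit)).splitlines() if p]
    later = run(later_commits_cmd(commit, paths)).splitlines() if paths else []
    return {"commit": commit, "paths": paths, "later_commits": later}
```

`[audit(c) for c in PRIOR_FIXES]` gives a short list of later commits to read. An empty `later_commits` list points toward the first two patterns instead.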

4. Spec a comprehensive fix

The fix must:

  • Use a SINGLE source-of-truth for active agent count (probably the registry's enumeration of running agents)
  • Be subscribed to the events that change that source (spawn, exit) so HUD never lags
  • Have a smoke test that asserts count parity across HUD, wg agents, and process tree
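
The parity assertion could look roughly like this. How each count is obtained (parsing the HUD, `wg agents`, and the process list) is left out because those formats are not specified here. The source names passed in are placeholders.

```python
def parity_report(counts):
    """Return None when all sources agree, else a readable mismatch message."""
    distinct = set(counts.values())
    if len(distinct) <= 1:
        return None
    detail = ", ".join(f"{name}={value}" for name, value in sorted(counts.items()))
    return f"agent count mismatch: {detail}"

def assert_parity(counts):
    """Raise AssertionError naming every source when the counts differ."""
    message = parity_report(counts)
    if message is not None:
        raise AssertionError(message)
```

With the numbers from the user report, `assert_parity({"hud": 1, "wg_agents": 5, "process_tree": 5})` fails with `agent count mismatch: hud=1, process_tree=5, wg_agents=5`. A real smoke test would need to sample all three sources close together, since a spawn or exit between samples would produce a false mismatch.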

Deliverable

wg log entry with:

  • Captured divergence evidence (HUD vs reality at a specific moment)
  • Root cause with file:line citation
  • Why prior fixes didn't hold (specific shortcoming of each)
  • Concrete fix proposal that addresses the root cause, not the symptom

Validation

  • Divergence captured and pasted in task log (HUD output, wg agents output, process tree at the same moment)
  • Root cause identified with file:line
  • Prior-fix postmortems written
  • Concrete fix proposal that includes a parity assertion (smoke test)
  • No source / doc modifications — diagnose only

Depends on

Required by

Log