Metadata
| Status | done |
|---|---|
| Assigned | agent-878 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-04-27T21:26:09.526805915+00:00 |
| Started | 2026-04-27T21:50:18.185351367+00:00 |
| Completed | 2026-04-27T21:53:28.811373699+00:00 |
| Tags | eval-scheduled |
| Eval score | 0.76 |
| └ blocking impact | 0.80 |
| └ completeness | 0.75 |
| └ coordination overhead | 0.85 |
| └ correctness | 0.80 |
| └ downstream usability | 0.75 |
| └ efficiency | 0.75 |
| └ intent fidelity | 0.89 |
| └ style adherence | 0.80 |
Description
Description
When a task hits multiple retries (e.g. API errors → kill agent → dispatcher respawns), each retry produces a new agent + transcript. In the TUI you should be able to switch between iterations to see each attempt's log/output. This is broken.
User-reported symptoms (today, on tasks that failed mid-session with claude.ai stream hangs and got recovered via the kill-and-respawn pattern):
- Can't switch to the most recent iteration while it's in-progress — UI seems stuck on an earlier completed iteration
- OR: switching is possible but the latest-iteration pane shows stale content (the prior attempt's logs with the API errors / disconnection, not the new attempt's)
- User isn't sure which of these is actually happening — the UI doesn't make it clear what iteration you're viewing
- Earlier related symptom user reported in this same session: 'clicking on the little triangles doesn't rotate through iterations'
Affected tasks today: improve-wg-setup, tui-settings-tab, migrate-agency-tasks — each retried 2-3 times after stream-hang recovery.
Investigation needed first
- Which view is this? Likely the task Detail view or the Log view (or both — user said 'in the TUI at least' suggesting it's broader). Confirm where iteration-switching is supposed to live
- What are 'iterations' here? Multi-attempt retries (each agent-N for the same task) — distinct from cycle-loop iterations (
loop_iterationfield on cycle members). Make sure code uses the correct concept - Data source: is the iteration switcher reading from the per-attempt agent dirs (
.wg/agents/agent-N/) or from a graph-level attempt history? Are in-progress iterations even indexable until the agent finishes writing its closing log entry? - Triangle rotation bug: what do the triangle controls actually do today? Are they wired to the right data? Does 'next' wrap correctly? Does 'previous' work?
- Affordance: how does a user know which iteration they're viewing? Is there a visible indicator (1 of 3 / iteration N / agent-N)?
Desired end state
- User can navigate between all iterations (completed AND in-progress) of a retried task from the TUI
- Triangle/key controls actually rotate, including reaching the latest iteration
- Latest iteration's content updates live as the agent writes (no stale-content trap)
- UI makes the current-iteration position obvious: '2 of 3' / 'iteration N (in-progress)' / similar
- Default view on opening a task = most recent iteration
Validation
- Reproduce the bug first: run a task that gets killed and respawned mid-flight; confirm broken iteration switching in TUI BEFORE fixing
- After fix: same scenario shows all iterations; triangle navigation reaches each; in-progress iteration content updates live (verify via tail of agent output.log vs what TUI shows)
- Iteration position indicator visible (1/3, etc.) on every iteration view
- Latest iteration is the default view when opening a multi-attempt task
- Manual smoke: kill an in-progress agent, watch dispatcher respawn, confirm TUI shows new iteration immediately and lets you switch back to the killed one to see its API error log
- cargo build + cargo test pass with no regressions
Depends on
Required by
- (none)
Log
- 2026-04-27T21:26:09.501982563+00:00 Task paused
- 2026-04-27T21:26:12.731319010+00:00 Task published
- 2026-04-27T21:27:02.596282727+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=graph, reason=Careful Programmer is well-suited for this TUI bug fix: requires careful investigation of unclear symptoms, correct understanding of retry/iteration semantics across graph+agent+TUI layers, precise state-management changes, and thorough validation with live test scenarios.
- 2026-04-27T21:27:03.870239827+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-27T21:27:11.905276708+00:00 Starting investigation into TUI iteration switching bug
- 2026-04-27T21:50:12.727913468+00:00 Task unclaimed: agent 'agent-856' (PID 2865098) process exited
- 2026-04-27T21:50:18.185356697+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-27T21:52:06.990012517+00:00 Fixed: dead_agents/sweep/zero_output now archive the prior agent's output before unclaiming, so killed-and-respawned attempts appear in TUI iteration switcher. Also: archives refresh on every load_hud_detail (not just task switch), and live slot is labeled '(live)' for in-progress tasks.
- 2026-04-27T21:52:57.764507365+00:00 Validated: cargo build pass; new tests pass (53 iteration/sweep/dead_agents tests). 7 pre-existing failures on main (spawn_task::*, provenance_full_lifecycle, test_legacy_coordinator_tab_is_always_muted) unrelated to this work — confirmed by running same tests against unmodified main checkout.
- 2026-04-27T21:53:13.237203566+00:00 Committed: cebead3fa — pushed to remote
- 2026-04-27T21:53:28.811386994+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-04-27T21:56:29.601508617+00:00 PendingEval → Done (evaluator passed; downstream unblocks)
- 2026-04-27T21:58:05.700056936+00:00 Root cause: dead_agents.rs / sweep.rs / zero_output.rs unclaim in-progress tasks but never call archive_agent. Killed-and-respawned attempts (claude.ai stream hangs) were orphaned in .wg/agents/<id>/ instead of being archived to .wg/log/agents/<task-id>/<ts>/, so the TUI iteration switcher (find_all_archives) couldn't see them.
- 2026-04-27T21:58:05.741636923+00:00 Fix 1: archive_agent now called from heartbeat-timeout (dead_agents), sweep (orphaned tasks), and zero-output kill paths. Each killed attempt becomes a TUI-visible iteration archive.
- 2026-04-27T21:58:05.764418204+00:00 Fix 2: TUI's load_hud_detail now refreshes iteration_archives on every call (cheap readdir). Previously archives only refreshed on task switch, so mid-session new archives were invisible.
- 2026-04-27T21:58:05.783900373+00:00 Fix 3: navigator/header now shows '(live)' when viewing the live slot of an in-progress task. Drops the suffix on Done/Failed tasks where the live slot mirrors the latest archive.
- 2026-04-27T21:58:05.803126869+00:00 Validated: cargo build + cargo test pass for new tests (53 iteration tests + 2 dead_agents tests + 2 sweep tests). 7 pre-existing failures (spawn_task / provenance / TUI render) verified on main — unrelated.
- 2026-04-27T21:58:05.829126418+00:00 Live smoke: wg sweep on synthesized orphaned-agent state correctly archives output.log to .wg/log/agents/<task-id>/<ts>/. wg dead-agents --cleanup also archives. Both paths confirmed end-to-end.
- 2026-04-27T21:58:05.855126489+00:00 Smoke gate: added tests/smoke/scenarios/sweep_archives_orphaned_agent.sh; registered in manifest.toml owned by tui-cannot-view + smoke-gate-is. Standalone PASS.
- 2026-04-27T21:58:54.196927509+00:00 Committed: 7bcaa7bfc — smoke scenario added