implement-failed-pending — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-1148`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Model	`claude:sonnet`
Created	2026-04-29T17:12:11.220574277+00:00
Started	2026-04-29T17:24:44.684820489+00:00
Completed	2026-04-29T18:08:55.199710354+00:00
Tags	`priority-high,fix,agency,tui,state-machine`, `eval-scheduled`
Eval score	0.78
└ blocking impact	0.75
└ completeness	0.85
└ coordination overhead	0.80
└ correctness	0.85
└ downstream usability	0.80
└ efficiency	0.80
└ intent fidelity	0.66
└ style adherence	0.85

Description

Implement the state machine + visual treatment chosen in design-failed-pending. Read that task's log via wg show design-failed-pending for the chosen approach, schema changes, color RGB values, and smoke scenarios.

Validation

Failing tests written first (TDD)
State machine: agent exits without wg done AND output exists → task enters pending-eval (from-failure variant) instead of terminal failed
Eval verdict positive → task transitions to done (potentially with 'rescued' marker per design decision)
Eval verdict negative → task transitions to failed (terminal)
TUI viz: failed-pending-eval state renders in the orange/yellow-red color from the design
TUI detail view: shows 'failed pending evaluation' label so user understands the in-flight state
Cycle tasks: a rescued-to-done iteration N correctly unblocks iteration N+1's dispatch (same as cleanly-done'd iteration would)
No regression: tasks that fail in genuinely-broken ways (cargo build error, OOM, signal kill) still go terminal-failed without eval consultation, per design
Live smoke: reproduce the autohaiku scenario — codex agent exits without wg done, output is acceptable, evaluator approves → task lands in done, NOT failed
Counter-smoke: same shape but agent output is bad → evaluator rejects → task lands in failed
cargo build + cargo test pass with no regressions
Permanent smoke scenario added under tests/smoke/scenarios/ with this task id in owners
cargo install --path . was run before claiming done

## Description
Implement the state machine + visual treatment chosen in design-failed-pending. Read that task's log via `wg show design-failed-pending` for the chosen approach, schema changes, color RGB values, and smoke scenarios.

## Validation
- [ ] Failing tests written first (TDD)
- [ ] State machine: agent exits without `wg done` AND output exists → task enters pending-eval (from-failure variant) instead of terminal failed
- [ ] Eval verdict positive → task transitions to done (potentially with 'rescued' marker per design decision)
- [ ] Eval verdict negative → task transitions to failed (terminal)
- [ ] TUI viz: failed-pending-eval state renders in the orange/yellow-red color from the design
- [ ] TUI detail view: shows 'failed pending evaluation' label so user understands the in-flight state
- [ ] Cycle tasks: a rescued-to-done iteration N correctly unblocks iteration N+1's dispatch (same as cleanly-done'd iteration would)
- [ ] No regression: tasks that fail in genuinely-broken ways (cargo build error, OOM, signal kill) still go terminal-failed without eval consultation, per design
- [ ] Live smoke: reproduce the autohaiku scenario — codex agent exits without wg done, output is acceptable, evaluator approves → task lands in done, NOT failed
- [ ] Counter-smoke: same shape but agent output is bad → evaluator rejects → task lands in failed
- [ ] cargo build + cargo test pass with no regressions
- [ ] Permanent smoke scenario added under tests/smoke/scenarios/ with this task id in owners
- [ ] cargo install --path . was run before claiming done

Depends on

Required by

(none)

Log

2026-04-29T17:12:11.168018341+00:00 Task paused
2026-04-29T17:12:46.935096044+00:00 Task published
2026-04-29T17:24:44.684827101+00:00 Spawned by coordinator --executor claude --model sonnet
2026-04-29T17:24:56.062232974+00:00 Starting implementation of FailedPendingEval state machine
2026-04-29T17:27:49.999359209+00:00 Schema changes: adding FailedPendingEval variant, rescued/meta_eval_attempts fields
2026-04-29T17:36:14.404403107+00:00 Schema compiled cleanly. Now implementing state machine logic in fail.rs and coordinator.rs
2026-04-29T17:49:12.579214506+00:00 Tests passing (pre-existing failure excluded). Now writing integration tests and smoke scenarios.
2026-04-29T18:03:02.574338702+00:00 cargo install running. 13/13 integration tests pass. Smoke scenario written - needs binary update to validate.
2026-04-29T18:06:53.899895549+00:00 Committed f13949895 and pushed. 2123+13=2136 tests pass, smoke scenario 6/6 pass.
2026-04-29T18:07:16.571505022+00:00 Validation complete: state machine, TUI color, dep resolution, system bypass, rescue path, cycle compat all verified.
2026-04-29T18:08:35.771360484+00:00 Committed 4247aef19 — restored tui_scroll_mode scenario (was accidentally overwritten)
2026-04-29T18:08:55.199715964+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
2026-04-29T18:11:32.661265285+00:00 PendingEval → Done (evaluator passed; downstream unblocks)