bug-flip-and

Bug: .flip-* and .evaluate-* tasks fail with 'has status PendingEval' instead of waiting for transition

Metadata

Status: done
Assigned: agent-821
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created: 2026-04-27T17:43:55.179673396+00:00
Started: 2026-04-27T18:40:57.176496041+00:00
Completed: 2026-04-27T18:56:12.041520199+00:00
Tags: eval-scheduled
Tokens: 1002363 in / 6661 out

Description

Many .flip-* and .evaluate-* tasks today (2026-04-27, between ~15:29 and ~17:00) failed with:

Eval stderr: Error: Task '<parent-id>' has status PendingEval — must be done or failed to evaluate
Task marked as failed: wg evaluate exited with code 1

Sample failed tasks (from wg list --all --status failed):

  • .flip-tui-cannot-retire, .flip-tui-tab-bar, .flip-tui-chat-smart
  • .flip-simplify-executor-taxonomy, .flip-implement-rename-remaining
  • .evaluate-tui-log-view-3, .evaluate-tui-chat-smart, .evaluate-tui-cannot-retire
  • .evaluate-research-tui-detail, .evaluate-implement-tui-modal
  • ~30 more — see wg list --all --status failed | grep -E '\.flip-|\.evaluate-'

Root cause hypothesis

The recent add-pendingeval-state work (commit a4f591261, per CLAUDE.md memory) added a new PendingEval task status. The flow is supposed to be:

parent task → done → PendingEval → .evaluate-* runs → done → .flip-* runs

But the dispatcher fires .flip-* and .evaluate-* while the parent is still in PendingEval. The wg evaluate command rejects PendingEval as a valid input state with a hard error ('must be done or failed to evaluate'), the task gets marked failed, and the dispatcher never retries.

Three options:

A. Make .evaluate-* / .flip-* wait: the dispatcher should not spawn these tasks while the parent is in PendingEval. Add PendingEval to the list of 'parent must not be in this state' checks, treating it like 'in-progress'.

B. Make wg evaluate accept PendingEval: if PendingEval implies 'done but eval pending', then the eval command should accept it and proceed (it's a step on the way to done, not an error condition). Adjust the precondition in src/commands/evaluate.rs.

C. Both: A as the dispatcher rule, plus B as a safety net so race-condition retries don't fail.

Recommend C. The dispatcher rule is the structural fix; the eval-side accept is a defensive guard so future state-transition changes don't re-introduce the same race.
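
A minimal sketch of what option C could look like, to make the two checks concrete. Everything here is hypothetical: the Status enum, DispatchDecision, EvalPrecondition, and both function names are stand-ins, not the actual types in src/graph.rs or src/commands/evaluate.rs. The sketch takes the 'will retry' variant of B and assumes the parent leaves PendingEval without the evaluator's help; if evaluation itself is what moves the parent out of PendingEval, the PendingEval arms should dispatch and proceed instead of deferring.

```rust
// Hypothetical stand-in for the task status enum; the real one lives in src/graph.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Open,
    InProgress,
    PendingEval,
    Done,
    Failed,
}

/// What the dispatcher does with a `.evaluate-*` / `.flip-*` task on this pass.
#[derive(Debug, PartialEq, Eq)]
pub enum DispatchDecision {
    Dispatch,
    Defer,
}

/// Fix A (dispatcher rule): spawn only when the parent is Done or Failed.
/// PendingEval is treated like InProgress: not an error, just "not yet".
pub fn eval_dispatch_decision(parent: Status) -> DispatchDecision {
    match parent {
        Status::Done | Status::Failed => DispatchDecision::Dispatch,
        Status::Open | Status::InProgress | Status::PendingEval => DispatchDecision::Defer,
    }
}

/// Result of the evaluate-side precondition check.
#[derive(Debug, PartialEq, Eq)]
pub enum EvalPrecondition {
    /// Parent is in a state that can be evaluated now.
    Proceed,
    /// Parent is in PendingEval: exit 0 with a 'will retry' message, not exit 1.
    RetryLater,
}

/// Fix B (defensive guard): PendingEval is no longer a hard error.
/// The error wording is modeled on the message quoted in this report.
pub fn check_evaluate_precondition(
    task_id: &str,
    status: Status,
) -> Result<EvalPrecondition, String> {
    match status {
        Status::Done | Status::Failed => Ok(EvalPrecondition::Proceed),
        Status::PendingEval => Ok(EvalPrecondition::RetryLater),
        other => Err(format!(
            "Task '{task_id}' has status {other:?}: must be done or failed to evaluate"
        )),
    }
}
```

Both matches list every variant or bind the remainder explicitly, so adding a new status later forces a compile error in the dispatcher rule rather than silently falling into a default arm.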

Files to touch

  • src/commands/evaluate.rs — the precondition check that emits 'must be done or failed to evaluate'. Add PendingEval to accepted states OR skip-and-defer.
  • src/commands/service/coordinator.rs (or dispatch logic) — when scheduling a .evaluate-X or .flip-X task, check parent's status: if PendingEval, defer; if Done/Failed, dispatch.
  • src/graph.rs — make sure PendingEval is in the right enum slot and the dispatcher's "is parent ready for eval" predicate handles it.

Validation

  • Failing test first: parent task in PendingEval, dispatcher schedules .evaluate-parent → assert dispatch is deferred until parent transitions out of PendingEval (don't auto-fail)
  • Failing test for the defensive case: wg evaluate <task> with task in PendingEval → either accepts and proceeds, OR exits 0 with a 'will retry' message; does NOT exit 1
  • Reset all the currently-failed .flip-* / .evaluate-* tasks back to open and confirm they now succeed (could be a separate sweep task)
  • cargo build + cargo test pass with no regressions
  • Manual: dispatch a fresh task with --verify-style validation, observe the parent → PendingEval → evaluate → flip flow runs end-to-end with no failures
  • No more 'has status PendingEval' errors in daemon.log over a 30-min run with active dispatch
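
The first validation item (defer, don't auto-fail) could be exercised against a toy model like the one below. This is only a sketch under stated assumptions: Graph, its two maps, and tick() are hypothetical and far simpler than the real coordinator, and the model assumes something other than the evaluator moves the parent from PendingEval to Done.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real task status enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Open,
    InProgress,
    PendingEval,
    Done,
    Failed,
}

/// Hypothetical toy graph: task id -> status, and system task id -> parent id.
pub struct Graph {
    pub status: HashMap<String, Status>,
    pub parent_of: HashMap<String, String>,
}

impl Graph {
    /// One dispatcher pass. Returns the ids of the `.evaluate-*` / `.flip-*`
    /// tasks dispatched on this pass, sorted for deterministic output.
    pub fn tick(&mut self) -> Vec<String> {
        let mut ready = Vec::new();
        for (child, parent) in &self.parent_of {
            // Only tasks that are still Open are candidates for dispatch.
            if self.status.get(child) != Some(&Status::Open) {
                continue;
            }
            match self.status.get(parent) {
                // Parent reached a terminal state: safe to dispatch.
                Some(Status::Done) | Some(Status::Failed) => ready.push(child.clone()),
                // PendingEval, InProgress, Open, or unknown parent:
                // leave the child Open and look again on the next pass.
                _ => {}
            }
        }
        ready.sort();
        for id in &ready {
            self.status.insert(id.clone(), Status::InProgress);
        }
        ready
    }
}
```

With a parent in PendingEval, tick() returns an empty list and the .evaluate-* task stays Open rather than being marked failed; once the parent is set to Done, the next tick() dispatches it.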

Depends on

Required by

Log