Metadata
| Status | done |
|---|---|
| Assigned | agent-693 |
| Agent identity | 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3 |
| Created | 2026-04-27T14:41:25.524611646+00:00 |
| Started | 2026-04-27T14:45:18.343871621+00:00 |
| Completed | 2026-04-27T15:06:19.621685207+00:00 |
| Tags | eval-gate, eval-scheduled |
Description
User architectural clarification (2026-04-27): when an eval gate FAILS, the rescue path must reuse the SAME agent identity AND the SAME worktree, NOT spawn a fresh worker.
User's verbatim quote:
'the failed gate, it should result in a retry, but again without destruction of the particular agent. Like we should regenerate that agent so that it has the same work tree and so on. it's just another iteration, right?'
Required behavior
PendingEval → eval fail (score < threshold) ─┬─ rescue_count < max → Open (same task.agent, same worktree, eval feedback in context)
                                             └─ rescue_count >= max → Failed (triage)
Same iteration semantics as reattaching to a chat session: pick up the prior state and continue from there.
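The transition above can be sketched as a small state function. This is a minimal illustration under stated assumptions, not the code in check_eval_gate: the `Task` struct, its `eval_notes` field, and the `apply_eval_result` function are hypothetical stand-ins, and the real task model will have more fields and variants.

```rust
/// Hypothetical, simplified status enum; the real one has more variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Open,
    PendingEval,
    Done,
    Failed,
}

/// Hypothetical, simplified task model for illustration only.
#[derive(Debug, Clone)]
pub struct Task {
    pub status: Status,
    /// Agent identity hash; must survive an eval-fail iteration unchanged.
    pub agent: Option<String>,
    /// Worktree path; must survive an eval-fail iteration unchanged.
    pub worktree: Option<String>,
    pub rescue_count: u32,
    /// Hypothetical field: evaluator feedback carried into the next attempt.
    pub eval_notes: Vec<String>,
}

/// Apply an eval score to a task that is waiting in PendingEval.
/// Tasks in any other status are left untouched.
pub fn apply_eval_result(
    task: &mut Task,
    score: f64,
    threshold: f64,
    max_eval_rescues: u32,
    notes: &str,
) {
    if task.status != Status::PendingEval {
        return;
    }
    if score >= threshold {
        // Eval passed: downstream tasks can unblock.
        task.status = Status::Done;
    } else if task.rescue_count < max_eval_rescues {
        // Eval failed below the cap: reopen in place.
        // `agent` and `worktree` are deliberately not modified.
        task.rescue_count += 1;
        task.eval_notes.push(notes.to_string());
        task.status = Status::Open;
    } else {
        // Cap reached: hand the task to triage.
        task.status = Status::Failed;
    }
}
```

The key design point is what the failing branch does not do: it creates no new task and clears neither the agent nor the worktree, so the next spawn resumes the same iteration.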
Wired with
- add-pendingeval-state (this task's parent): Adds PendingEval state, dispatcher resolution on pass, dep gating. The eval-FAIL path currently uses the existing fresh-agent rescue which is wrong per the clarification.
- worktree-retention-don (already merged): Don't reap worktree until eval+merge actually completes. Together these produce the proper resumable iteration loop.
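One way to picture how worktree retention interacts with the eval loop is a guard the coordinator could consult before reaping. This is only a sketch: `may_reap_worktree` and the `merged` flag are hypothetical names, and keeping the worktree for Failed tasks is an assumption made here for triage convenience, not something the task description specifies.

```rust
/// Hypothetical, simplified status enum; the real one has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Open,
    InProgress,
    PendingEval,
    Done,
    Failed,
}

/// Decide whether a task's worktree may be reaped.
/// `merged` is a hypothetical flag meaning the task's branch has landed.
pub fn may_reap_worktree(status: Status, merged: bool) -> bool {
    match status {
        // Being worked on, awaiting a score, or reopened after an eval fail:
        // the next iteration needs the same worktree.
        Status::Open | Status::InProgress | Status::PendingEval => false,
        // Eval passed: reap only once the merge has actually completed.
        Status::Done => merged,
        // Assumption: keep the worktree so triage can inspect it.
        Status::Failed => false,
    }
}
```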
Files likely to touch
- src/commands/evaluate.rs: check_eval_gate. On score < threshold, instead of run_eval_reject + rescue::run, transition the source task PendingEval → Open keeping task.agent / task.assigned, increment task.rescue_count, and append eval notes to the next-attempt context.
- src/commands/spawn/context.rs: pick up evaluator notes from the prior iteration when spawning (similar to how retry_count > 0 already injects previous-attempt context).
- src/commands/service/coordinator.rs: if needed, suppress worktree GC for tasks in the eval-rescue loop.
- src/config.rs: max_eval_rescues cap (already exists as an alias for max_verify_failures).
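The spawn-context change could look roughly like the helper below, which renders evaluator notes into text for the next attempt. The function name `build_previous_attempt_context`, its signature, and the output wording are assumptions for illustration; the actual code in src/commands/spawn/context.rs may be structured quite differently.

```rust
/// Hypothetical helper: turn evaluator notes from earlier iterations into a
/// text block for the next spawn. Returns None on a first attempt or when
/// there is no feedback to pass along.
pub fn build_previous_attempt_context(
    rescue_count: u32,
    eval_notes: &[String],
) -> Option<String> {
    if rescue_count == 0 || eval_notes.is_empty() {
        return None;
    }
    let mut out = format!(
        "Previous attempt failed the eval gate (rescue iteration {}).\nEvaluator notes:\n",
        rescue_count
    );
    for note in eval_notes {
        // One bullet per note so the agent can address each point.
        out.push_str("- ");
        out.push_str(note);
        out.push('\n');
    }
    Some(out)
}
```

Returning `Option<String>` keeps the first-attempt path unchanged: callers only append context when there is feedback to show.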
What stays from add-pendingeval-state
- Status::PendingEval variant
- pick_done_target_status (wg done → PendingEval when eval scheduled)
- resolve_pending_eval_tasks (eval pass → Done)
- Color rendering (chartreuse)
- approve / reject / fail accept PendingEval
Validation
- Failing test first: test_eval_fail_retries_in_place_with_same_agent — task A in PendingEval, eval scores below threshold, after the fail-handler runs A is Open with task.agent UNCHANGED and task.rescue_count incremented (no new task created)
- Failing test: test_eval_fail_at_cap_transitions_to_failed — same setup, rescue_count == max_eval_rescues, A goes to Failed (no further iteration spawn)
- Failing test: test_eval_feedback_in_next_spawn_context — after rescue, the next spawn's previous_attempt_context contains the evaluator notes
- Failing test: test_worktree_preserved_across_eval_iteration — worktree dir for A still exists after eval-fail rescue (not reaped)
- cargo build + cargo test pass with no regressions
- Manual smoke: low-scoring task A → wg show A reports Status: in-progress (or open with assigned set), same task.agent hash, same worktree path; rescue_count: 1
Depends on
Required by
- (none)
Log
- 2026-04-27T14:45:18.343874757+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-27T14:45:33.621674261+00:00 Starting: explore current check_eval_gate, spawn context, and rescue flow to plan in-place eval-fail iteration
- 2026-04-27T15:00:00.827441773+00:00 Tests pass: 4 new + 15 existing evaluate tests green. Preexisting failures (provenance_full_lifecycle, integration_resume compile error) are unrelated and present without my changes.
- 2026-04-27T15:05:31.147109733+00:00 Committed: 0912ffa66 — pushed to remote
- 2026-04-27T15:06:19.621697960+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-04-27T15:06:52.887897083+00:00 PendingEval → Done (evaluator passed; downstream unblocks)