in-place-eval

In-place eval-fail iteration: same agent, same worktree, iterate with feedback

Metadata

Status: done
Assigned: agent-693
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Created: 2026-04-27T14:41:25.524611646+00:00
Started: 2026-04-27T14:45:18.343871621+00:00
Completed: 2026-04-27T15:06:19.621685207+00:00
Tags: eval-gate, eval-scheduled

Description

User architectural clarification (2026-04-27): when an eval gate FAILS, the rescue path must reuse the SAME agent identity AND the SAME worktree, NOT spawn a fresh worker.

User's verbatim quote:

"the failed gate, it should result in a retry, but again without destruction of the particular agent. Like we should regenerate that agent so that it has the same work tree and so on. it's just another iteration, right?"

Required behavior

PendingEval → eval fail (score < threshold) ─┬─ rescue_count < max → Open (same task.agent, same worktree, eval feedback in context)
                                              └─ rescue_count >= max → Failed (triage)
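The transition above can be sketched as a small state function. This is a minimal illustration under stated assumptions, not the project's actual code: the `Task` struct, the `eval_notes` field, and the `apply_eval_result` name are hypothetical, and only `Status::PendingEval`, `task.agent`, `rescue_count`, and `max_eval_rescues` come from the task text.

```rust
/// Subset of task states relevant to the eval gate.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Open,
    PendingEval,
    Done,
    Failed,
}

/// Hypothetical, trimmed-down task record for illustration only.
#[derive(Debug, Clone)]
struct Task {
    status: Status,
    agent: Option<String>,   // agent identity hash; must survive the rescue
    rescue_count: u32,
    eval_notes: Vec<String>, // assumed field: feedback carried to the next attempt
}

/// Apply an eval result to a task that is waiting in PendingEval.
/// Tasks in any other state are left untouched.
fn apply_eval_result(
    task: &mut Task,
    score: f64,
    threshold: f64,
    max_eval_rescues: u32,
    notes: &str,
) {
    if task.status != Status::PendingEval {
        return;
    }
    if score >= threshold {
        // Eval pass: the task is finished.
        task.status = Status::Done;
    } else if task.rescue_count < max_eval_rescues {
        // Eval fail under the cap: reopen in place.
        // `task.agent` is deliberately not cleared or replaced.
        task.rescue_count += 1;
        task.eval_notes.push(notes.to_string());
        task.status = Status::Open;
    } else {
        // Eval fail at the cap: hand off to triage.
        task.status = Status::Failed;
    }
}
```

The point of the sketch is that the fail branch mutates the existing task rather than creating a new one, so the agent identity and anything keyed on it (such as the worktree) carry over to the next iteration.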

The semantics match reattaching to a chat session: pick up the prior state and continue from there.

Wired with

  • add-pendingeval-state (this task's parent): Adds the PendingEval state, dispatcher resolution on pass, and dependency gating. The eval-FAIL path currently uses the existing fresh-agent rescue, which is wrong per the clarification.
  • worktree-retention-don (already merged): Does not reap the worktree until eval and merge have actually completed. Together, these two changes produce the proper resumable iteration loop.
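One way to picture the retention rule is as a predicate the garbage collector consults before removing a worktree. The sketch below is an assumption-laden illustration: `may_reap_worktree`, the `merged` flag, the `InProgress` variant, and the exact policy for `Open` and `Failed` tasks are hypothetical and are not taken from the merged change.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Open,
    InProgress,
    PendingEval,
    Done,
    Failed,
}

/// Hypothetical, trimmed-down task record for illustration only.
struct Task {
    status: Status,
    rescue_count: u32,
    merged: bool, // assumed flag: the task's work has been merged
}

/// Returns true only when the worktree can no longer be needed.
fn may_reap_worktree(task: &Task) -> bool {
    match task.status {
        // Work is ongoing or awaiting an eval verdict: keep the worktree.
        Status::InProgress | Status::PendingEval => false,
        // An Open task with rescue_count > 0 is inside the eval-rescue loop
        // and will resume in the same worktree (assumed policy).
        Status::Open => task.rescue_count == 0,
        // Done is not enough on its own; the merge must also have completed.
        Status::Done => task.merged,
        // Assumed policy: nothing resumes after Failed, so reaping is allowed.
        Status::Failed => true,
    }
}
```

Whether a `Failed` task's worktree should be kept for triage is a design choice this sketch does not settle.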

Files likely to touch

  • src/commands/evaluate.rs, check_eval_gate: on score < threshold, instead of run_eval_reject + rescue::run, transition source PendingEval → Open keeping task.agent / task.assigned, increment task.rescue_count, append eval notes to next-attempt context.
  • src/commands/spawn/context.rs — pick up evaluator notes from prior iteration when spawning (similar to how retry_count > 0 already injects previous-attempt context).
  • src/commands/service/coordinator.rs — if needed, suppress worktree GC for tasks in the eval-rescue loop.
  • src/config.rs: max_eval_rescues cap (already exists as an alias for max_verify_failures).
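The context change in src/commands/spawn/context.rs might look roughly like the following. This is a sketch under assumptions: the `Task` fields `eval_notes` and `previous_attempt_log`, the function name `build_previous_attempt_context`, and the section headings are hypothetical. Only `retry_count`, `rescue_count`, and the idea of a previous-attempt context come from the task text.

```rust
/// Hypothetical, trimmed-down task record for illustration only.
struct Task {
    retry_count: u32,
    rescue_count: u32,
    eval_notes: Vec<String>,              // assumed: notes from failed evals
    previous_attempt_log: Option<String>, // assumed: summary of the last attempt
}

/// Build the text injected into the next spawn, or None on a first attempt.
fn build_previous_attempt_context(task: &Task) -> Option<String> {
    let mut sections: Vec<String> = Vec::new();

    // Existing behaviour per the task text: retries carry previous-attempt context.
    if task.retry_count > 0 {
        if let Some(log) = &task.previous_attempt_log {
            sections.push(format!("Previous attempt:\n{}", log));
        }
    }

    // New behaviour: eval-rescue iterations also carry the evaluator's notes.
    if task.rescue_count > 0 && !task.eval_notes.is_empty() {
        let notes = task
            .eval_notes
            .iter()
            .map(|n| format!("- {}", n))
            .collect::<Vec<_>>()
            .join("\n");
        sections.push(format!(
            "Evaluator feedback (iteration {}):\n{}",
            task.rescue_count, notes
        ));
    }

    if sections.is_empty() {
        None
    } else {
        Some(sections.join("\n\n"))
    }
}
```

Returning `None` for a first attempt keeps the fresh-spawn path unchanged, so only tasks that have been through at least one retry or eval rescue see extra context.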

What stays from add-pendingeval-state

  • Status::PendingEval variant
  • pick_done_target_status (wg done → PendingEval when eval scheduled)
  • resolve_pending_eval_tasks (eval pass → Done)
  • Color rendering (chartreuse)
  • approve / reject / fail accept PendingEval

Validation

  • Failing test first: test_eval_fail_retries_in_place_with_same_agent — task A in PendingEval, eval scores below threshold, after the fail-handler runs A is Open with task.agent UNCHANGED and task.rescue_count incremented (no new task created)
  • Failing test: test_eval_fail_at_cap_transitions_to_failed — same setup, rescue_count == max_eval_rescues, A goes to Failed (no further iteration spawn)
  • Failing test: test_eval_feedback_in_next_spawn_context — after rescue, the next spawn's previous_attempt_context contains the evaluator notes
  • Failing test: test_worktree_preserved_across_eval_iteration — worktree dir for A still exists after eval-fail rescue (not reaped)
  • cargo build + cargo test pass with no regressions
  • Manual smoke: low-scoring task A → wg show A reports Status: in-progress (or open with assigned set), same task.agent hash, same worktree path; rescue_count: 1

Depends on

Required by

Log