deprecate-pending-validation

Deprecate pending-validation; make a passing .evaluate-X the dependency-unblock gate

Metadata

Status: done
Assigned: agent-185
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created: 2026-04-27T00:20:07.059463902+00:00
Started: 2026-04-27T00:20:46.044369231+00:00
Completed: 2026-04-27T01:06:27.343189556+00:00
Tags: eval-scheduled
Eval score: 0.84
└ blocking impact: 0.87
└ completeness: 0.73
└ constraint fidelity: 0.40
└ coordination overhead: 0.88
└ correctness: 0.88
└ downstream usability: 0.85
└ efficiency: 0.85
└ intent fidelity: 0.86
└ style adherence: 0.87

Description

User insight: the pending-validation status is a holdover from the deprecated --verify / --validation=llm era. Its only remaining effect is to stall tasks indefinitely while they wait for wg approve / wg reject, a synchronous human gate that nobody runs. Better model: dependent tasks unblock when the parent's .evaluate-X task passes (score >= eval_gate_threshold). Agency eval IS the verification.

The machinery already exists in config:

  • eval_gate_threshold = 0.7
  • auto_rescue_on_eval_fail = true
  • auto_evaluate = true

What's missing: making .evaluate-X a HARD prerequisite for downstream tasks, and removing pending-validation from the routine state machine.
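As a rough illustration of how the three settings above relate to the gate, here is a minimal Rust sketch. The struct name EvalGateConfig and the method eval_passes are hypothetical; only the field names and default values come from this task, and the real config type may be laid out differently.

```rust
/// Hypothetical mirror of the three settings listed in this task.
/// The real config type and its layout are not shown here.
#[derive(Debug, Clone, PartialEq)]
pub struct EvalGateConfig {
    pub eval_gate_threshold: f64,
    pub auto_rescue_on_eval_fail: bool,
    pub auto_evaluate: bool,
}

impl Default for EvalGateConfig {
    fn default() -> Self {
        // Values as quoted in the task description.
        Self {
            eval_gate_threshold: 0.7,
            auto_rescue_on_eval_fail: true,
            auto_evaluate: true,
        }
    }
}

impl EvalGateConfig {
    /// The gate passes when the score meets or exceeds the threshold
    /// (score >= eval_gate_threshold).
    pub fn eval_passes(&self, score: f64) -> bool {
        score >= self.eval_gate_threshold
    }
}
```

With the defaults, a score of 0.8 passes the gate and a score of 0.5 does not, matching the scores used in the validation tests.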

Spec

  1. Status state machine:

    • Drop PendingValidation from the routine task lifecycle. Tasks go: open → in-progress → done | failed | abandoned.
    • If retained at all, it's only for very rare cases (e.g. cross-org review in a public visibility task that explicitly opts in via --validation=human-review). Never the default for any task.
  2. Dependency unblock model:

    • Today: Task A done → Task B (--after A) becomes ready as soon as A is Done.
    • New: Task A done → .evaluate-A scaffolded → eval runs → if score >= eval_gate_threshold, Task B becomes ready. If score < threshold, eval-fail handler fires:
      • If auto_rescue_on_eval_fail = true (default): re-spawn A with the eval feedback as additional context; Task B stays blocked.
      • If auto_rescue_on_eval_fail = false: A transitions to Failed with eval reason; Task B blocked until manually unblocked.
  3. Display + UX:

    • wg ready and wg viz show eval-gating: 'Task B blocked on .evaluate-A pending'.
    • wg show A shows whether A's eval has run, current score vs threshold, downstream blockers.
    • Eval failure surfaces in chat / TUI immediately with the reason.
  4. Migration for existing PendingValidation tasks:

    • On dispatcher boot, scan for tasks in PendingValidation. For each: log a one-time migration message and transition it to Done (assume the agent's claim was accepted; a user who wanted to reject it would already have done so).
    • Clearly document the migration in the upgrade notes.
  5. Drop wg approve / wg reject as routine commands:

    • Keep them as overrides (wg approve <task> to bypass eval gate; wg reject <task> to force re-spawn) for emergency human intervention.
    • Mark them as 'expert mode' in --help; not surfaced in quickstart.
  6. Cascade-failure guardrails:

    • If eval consistently fails (e.g. 3 consecutive auto-rescues without passing), task transitions to Failed instead of looping forever.
    • Configurable via existing max_verify_failures (rename to max_eval_rescues for clarity).
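The unblock model (item 2) and the rescue cap (item 6) can be sketched as a single decision function. This is an assumption-laden illustration, not the actual implementation: GateConfig, EvalOutcome, decide, and dependent_is_ready are hypothetical names, and max_eval_rescues is the proposed rename of max_verify_failures.

```rust
/// Hypothetical subset of the config relevant to the gate.
pub struct GateConfig {
    pub eval_gate_threshold: f64,
    pub auto_rescue_on_eval_fail: bool,
    /// Proposed rename of max_verify_failures.
    pub max_eval_rescues: u32,
}

/// What should happen to parent task A after .evaluate-A reports a score.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EvalOutcome {
    /// Score met the threshold: dependents of A become ready.
    Unblock,
    /// Score below threshold: re-spawn A with the eval feedback.
    Rescue { attempt: u32 },
    /// Rescue disabled or cap reached: A transitions to Failed.
    Fail,
}

/// Decide the outcome from the score and the number of rescues already spent.
pub fn decide(cfg: &GateConfig, score: f64, rescues_so_far: u32) -> EvalOutcome {
    if score >= cfg.eval_gate_threshold {
        return EvalOutcome::Unblock;
    }
    if cfg.auto_rescue_on_eval_fail && rescues_so_far < cfg.max_eval_rescues {
        return EvalOutcome::Rescue { attempt: rescues_so_far + 1 };
    }
    EvalOutcome::Fail
}

/// Task B (--after A) is ready only when A is Done AND its eval has passed.
/// `eval_score` is None while .evaluate-A has not produced a score yet.
pub fn dependent_is_ready(cfg: &GateConfig, parent_done: bool, eval_score: Option<f64>) -> bool {
    parent_done && eval_score.map_or(false, |s| s >= cfg.eval_gate_threshold)
}
```

With max_eval_rescues = 3, three failed rescues are followed by Fail on the next failing eval, so the task cannot loop forever. In every non-passing case Task B stays blocked, because dependent_is_ready only returns true for a passing score.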

Why this matters now

Concrete showstopper: thin-wrapper-impl has been sitting in PendingValidation for 6 hours. Downstream tasks that depend on it (any thin-wrapper-smoke / thin-wrapper-docs / etc.) are blocked. The eval (.evaluate-thin-wrapper-impl) probably ran or will run; if it passed, downstream should already be unblocked. PendingValidation only adds a synchronous human checkpoint that nobody is running.

User's point verbatim: 'a mode of behavior that should be deprecated... maybe dependent tasks should actually depend on evaluation passing them?'

Out of scope

  • Re-implementing the agency eval system (already works; just plug it into the gate model)
  • The smoke-gate-is task (separate concern: agent's own self-check before claiming done)

Validation

  • Failing tests first:
    • test_dependent_task_unblocks_when_eval_passes — Task A done + .evaluate-A scored 0.8 (above threshold) → Task B becomes ready
    • test_dependent_task_stays_blocked_when_eval_fails — Task A done + .evaluate-A scored 0.5 (below threshold) → Task B stays blocked AND A is re-spawned with feedback
    • test_no_routine_pending_validation_state — wg add 'foo'; wg done foo ends up in Done, never PendingValidation
    • test_legacy_pending_validation_migrated_on_boot — boot finds an existing PendingValidation task, transitions to Done with migration log
    • test_max_eval_rescues_caps_loops — task that consistently fails eval transitions to Failed after N retries
  • Implementation makes tests pass
  • cargo build + cargo test pass with no regressions
  • Manual smoke (in a scratch dir):
    • Add task A and task B (--after A); publish both
    • A runs, claims done; B should NOT be ready until .evaluate-A passes
    • If the eval passes, B becomes ready; if it fails, A re-spawns
    • PendingValidation never appears in wg list for either
  • Approve thin-wrapper-impl (or reject if smoke shows broken) to unblock its downstream NOW, separate from this task
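As one possible shape for test_legacy_pending_validation_migrated_on_boot, the sketch below runs the boot migration against an in-memory task list. Status, Task, and migrate_pending_validation are hypothetical stand-ins for the real types and dispatcher hook, and the log wording is invented for illustration.

```rust
/// Hypothetical status enum; PendingValidation is kept only so that
/// legacy tasks can still be recognised and migrated.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Open,
    InProgress,
    Done,
    Failed,
    Abandoned,
    PendingValidation,
}

/// Hypothetical minimal task record.
#[derive(Debug, Clone)]
pub struct Task {
    pub id: String,
    pub status: Status,
}

/// Move every PendingValidation task to Done and return one log line
/// per migrated task. A second call finds nothing left to migrate,
/// so each task is logged only once.
pub fn migrate_pending_validation(tasks: &mut [Task]) -> Vec<String> {
    let mut log = Vec::new();
    for task in tasks.iter_mut() {
        if task.status == Status::PendingValidation {
            task.status = Status::Done;
            log.push(format!("migrated {}: PendingValidation -> Done", task.id));
        }
    }
    log
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_legacy_pending_validation_migrated_on_boot() {
        let mut tasks = vec![Task {
            id: "legacy".to_string(),
            status: Status::PendingValidation,
        }];
        let log = migrate_pending_validation(&mut tasks);
        assert_eq!(tasks[0].status, Status::Done);
        assert_eq!(log.len(), 1);
        // Re-running the migration is a no-op.
        assert!(migrate_pending_validation(&mut tasks).is_empty());
    }
}
```

Returning the log lines instead of printing them keeps the function easy to assert on; the real dispatcher would presumably route them to its own logger.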

Depends on

Required by

Log