fix-shell-pending-eval — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-1321`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Created	2026-05-01T13:23:16.916942979+00:00
Started	2026-05-01T13:38:24.976791111+00:00
Completed	2026-05-01T13:50:04.009763563+00:00
Tags	`bug`, `state-machine`, `eval-scheduled`
Eval score	0.86
└ blocking impact	0.85
└ completeness	0.90
└ coordination overhead	0.90
└ correctness	0.85
└ downstream usability	0.80
└ efficiency	0.85
└ intent fidelity	0.84
└ style adherence	0.90

Description

Bug confirmed during verify-document-shell: when the coordinator dispatches a shell task (--exec / --exec-mode shell) and the wrapper script invokes wg fail --class agent-exit-nonzero, the routing code in src/commands/fail.rs:69-71 puts the task into failed-pending-eval even though no .evaluate-X task exists for shell tasks (correctly suppressed by eval_scaffold.rs:174-179). The coordinators rescue resolver in src/commands/service/coordinator.rs:947 then waits forever for the missing eval task, leaving the shell task stuck.

Reproduction (from live smoke)

wg add "Failing shell" --id failtest3 --exec exit 1
wg claim failtest3 --actor agent-1
wg fail failtest3 --class agent-exit-nonzero --reason "Agent exited with code 1"
wg show failtest3 --json
# → status: "failed-pending-eval"   (BUG: should be "failed")

wg exec --shell <task> is unaffected — its dedicated path (src/commands/exec.rs:138) sets Status::Failed directly without going through wg fail. The bug only fires through the agent-spawn wrapper path.

Patch

In src/commands/fail.rs, add a shell-task bypass to the FailedPendingEval routing condition:

// Resolve task once (it is needed for the bypass and downstream).
let is_shell = workgraph::parser::load_graph(&path)
    .ok()
    .and_then(|g| g.get_task(id).map(|t| t.exec.is_some() || t.exec_mode.as_deref() == Some("shell")))
    .unwrap_or(false);

let use_failed_pending_eval = !eval_reject
    && !is_shell                                         // <-- new
    && class == Some(FailureClass::AgentExitNonzero)
    && Config::load_or_default(dir).agency.auto_evaluate;

(Or factor is_shell_task from eval_scaffold.rs:38 into a shared helper and reuse it here.)

Validation

Failing test written first: tests/integration_failed_pending_eval.rs adds test_shell_task_skips_failed_pending_eval — claim a shell task, call wg fail --class agent-exit-nonzero, assert final status == Failed, not FailedPendingEval
Patch makes the test pass
cargo build + cargo test pass with no regressions
Live smoke: wg add "f" --exec "exit 1" && wg claim f --actor a && wg fail f --class agent-exit-nonzero → wg show f reports failed
cargo install --path . was run

Tags

bug, state-machine, shell-mode, agency-pipeline

## Description

**Bug confirmed during verify-document-shell**: when the coordinator dispatches a shell task (`--exec` / `--exec-mode shell`) and the wrapper script invokes `wg fail --class agent-exit-nonzero`, the routing code in `src/commands/fail.rs:69-71` puts the task into `failed-pending-eval` even though no `.evaluate-X` task exists for shell tasks (correctly suppressed by `eval_scaffold.rs:174-179`). The coordinators rescue resolver in `src/commands/service/coordinator.rs:947` then waits forever for the missing eval task, leaving the shell task stuck.

### Reproduction (from live smoke)

```
wg add "Failing shell" --id failtest3 --exec exit 1
wg claim failtest3 --actor agent-1
wg fail failtest3 --class agent-exit-nonzero --reason "Agent exited with code 1"
wg show failtest3 --json
# → status: "failed-pending-eval"   (BUG: should be "failed")
```

`wg exec --shell <task>` is unaffected — its dedicated path (`src/commands/exec.rs:138`) sets `Status::Failed` directly without going through `wg fail`. The bug only fires through the agent-spawn wrapper path.

## Patch

In `src/commands/fail.rs`, add a shell-task bypass to the FailedPendingEval routing condition:

```rust
// Resolve task once (it is needed for the bypass and downstream).
let is_shell = workgraph::parser::load_graph(&path)
    .ok()
    .and_then(|g| g.get_task(id).map(|t| t.exec.is_some() || t.exec_mode.as_deref() == Some("shell")))
    .unwrap_or(false);

let use_failed_pending_eval = !eval_reject
    && !is_shell                                         // <-- new
    && class == Some(FailureClass::AgentExitNonzero)
    && Config::load_or_default(dir).agency.auto_evaluate;
```

(Or factor `is_shell_task` from `eval_scaffold.rs:38` into a shared helper and reuse it here.)

## Validation
- [ ] Failing test written first: `tests/integration_failed_pending_eval.rs` adds `test_shell_task_skips_failed_pending_eval` — claim a shell task, call `wg fail --class agent-exit-nonzero`, assert final status == `Failed`, not `FailedPendingEval`
- [ ] Patch makes the test pass
- [ ] cargo build + cargo test pass with no regressions
- [ ] Live smoke: `wg add "f" --exec "exit 1" && wg claim f --actor a && wg fail f --class agent-exit-nonzero` → `wg show f` reports `failed`
- [ ] cargo install --path . was run

## Tags
- bug, state-machine, shell-mode, agency-pipeline

Depends on

Required by

(none)

Log

2026-05-01T13:38:22.295295500+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer is ideal for this correctness-critical state-machine bug: implements test-first validation, Rust code modification, and thorough cargo test/install verification.
2026-05-01T13:38:24.976796241+00:00 Spawned by coordinator --executor claude --model opus
2026-05-01T13:38:49.325066327+00:00 Starting investigation: reading fail.rs:69-71 and eval_scaffold.rs
2026-05-01T13:45:15.498313409+00:00 Tests pass: 15/15 in integration_failed_pending_eval (including 2 new shell-task tests). Pre-existing config::tests::test_global_config_path failure noted in task context, plus pre-existing wg init handler-inference failures in integration_chat — both unrelated to fail.rs.
2026-05-01T13:49:54.898455259+00:00 Committed: 5564461ea — pushed to origin
2026-05-01T13:50:04.009777940+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
2026-05-01T13:52:30.997679581+00:00 PendingEval → Done (evaluator passed; downstream unblocks)