Metadata
| Status | done |
|---|---|
| Assigned | agent-1321 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-05-01T13:23:16.916942979+00:00 |
| Started | 2026-05-01T13:38:24.976791111+00:00 |
| Completed | 2026-05-01T13:50:04.009763563+00:00 |
| Tags | bug, state-machine, eval-scheduled |
| Eval score | 0.86 |
| └ blocking impact | 0.85 |
| └ completeness | 0.90 |
| └ coordination overhead | 0.90 |
| └ correctness | 0.85 |
| └ downstream usability | 0.80 |
| └ efficiency | 0.85 |
| └ intent fidelity | 0.84 |
| └ style adherence | 0.90 |
Description
Bug confirmed during verify-document-shell: when the coordinator dispatches a shell task (--exec / --exec-mode shell) and the wrapper script invokes wg fail --class agent-exit-nonzero, the routing code in src/commands/fail.rs:69-71 puts the task into failed-pending-eval, even though no .evaluate-X task exists for shell tasks (eval_scaffold.rs:174-179 correctly suppresses it). The coordinator's rescue resolver in src/commands/service/coordinator.rs:947 then waits forever for the missing eval task, leaving the shell task stuck.
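The stuck state can be illustrated with a toy model. This is a sketch only: `Status`, `EvalTask`, `Step`, `resolver_step`, and `is_stuck` are hypothetical names invented for illustration and are not the types or functions in coordinator.rs. It assumes the resolver acts on a failed-pending-eval task only once its eval task has finished.

```rust
// Toy model only: hypothetical types, not the real workgraph definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Failed,
    FailedPendingEval,
}

/// State of the `.evaluate-X` companion task, if any.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvalTask {
    Missing, // never scaffolded (the shell-task case)
    Running,
    Finished,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Nothing, // task is already terminal
    Wait,    // check again on the next pass
    Resolve, // eval verdict is available
}

/// One pass of a resolver that only acts once the eval task has finished.
pub fn resolver_step(status: Status, eval: EvalTask) -> Step {
    match (status, eval) {
        (Status::FailedPendingEval, EvalTask::Finished) => Step::Resolve,
        (Status::FailedPendingEval, _) => Step::Wait,
        (Status::Failed, _) => Step::Nothing,
    }
}

/// A task waiting on an eval task that does not exist can never progress.
pub fn is_stuck(status: Status, eval: EvalTask) -> bool {
    status == Status::FailedPendingEval && eval == EvalTask::Missing
}
```

Under this model a shell task in failed-pending-eval returns `Step::Wait` on every pass, because its eval task stays `Missing`. Routing it straight to `Failed` avoids the wait entirely.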
Reproduction (from live smoke)
wg add "Failing shell" --id failtest3 --exec "exit 1"
wg claim failtest3 --actor agent-1
wg fail failtest3 --class agent-exit-nonzero --reason "Agent exited with code 1"
wg show failtest3 --json
# → status: "failed-pending-eval" (BUG: should be "failed")
wg exec --shell <task> is unaffected — its dedicated path (src/commands/exec.rs:138) sets Status::Failed directly without going through wg fail. The bug only fires through the agent-spawn wrapper path.
Patch
In src/commands/fail.rs, add a shell-task bypass to the FailedPendingEval routing condition:
// Resolve task once (it is needed for the bypass and downstream).
let is_shell = workgraph::parser::load_graph(&path)
    .ok()
    .and_then(|g| g.get_task(id).map(|t| t.exec.is_some() || t.exec_mode.as_deref() == Some("shell")))
    .unwrap_or(false);
let use_failed_pending_eval = !eval_reject
    && !is_shell // <-- new
    && class == Some(FailureClass::AgentExitNonzero)
    && Config::load_or_default(dir).agency.auto_evaluate;
(Or factor is_shell_task from eval_scaffold.rs:38 into a shared helper and reuse it here.)
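The shared-helper alternative might look roughly like the sketch below. The `Task` and `FailureClass` definitions are trimmed-down, hypothetical stand-ins for the real workgraph types, and the signature of `is_shell_task` in eval_scaffold.rs is not shown in this task, so the one used here is an assumption. The routing condition is pulled into a pure function, also hypothetical, so it can be checked without loading a graph or config.

```rust
// Hypothetical, trimmed-down stand-ins for the real workgraph types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureClass {
    AgentExitNonzero,
    Other,
}

#[derive(Debug, Default)]
pub struct Task {
    pub exec: Option<String>,
    pub exec_mode: Option<String>,
}

/// True when the task runs as a shell command: either an explicit `exec`
/// command is set, or `exec_mode` is "shell".
pub fn is_shell_task(task: &Task) -> bool {
    task.exec.is_some() || task.exec_mode.as_deref() == Some("shell")
}

/// Decide whether a failure should be parked in failed-pending-eval.
/// `task` is `None` when the graph could not be loaded or the id is unknown;
/// that case is treated as "not shell", mirroring `unwrap_or(false)` in the patch.
pub fn use_failed_pending_eval(
    task: Option<&Task>,
    eval_reject: bool,
    class: Option<FailureClass>,
    auto_evaluate: bool,
) -> bool {
    let is_shell = task.map(is_shell_task).unwrap_or(false);
    !eval_reject
        && !is_shell
        && class == Some(FailureClass::AgentExitNonzero)
        && auto_evaluate
}
```

With a single helper, the eval-scaffold suppression and the fail routing would agree on what counts as a shell task, which is the invariant this bug violated.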
Validation
- Failing test written first: tests/integration_failed_pending_eval.rs adds test_shell_task_skips_failed_pending_eval: claim a shell task, call wg fail --class agent-exit-nonzero, assert final status == Failed, not FailedPendingEval
- Patch makes the test pass
- cargo build + cargo test pass with no regressions
- Live smoke: wg add "f" --exec "exit 1" && wg claim f --actor a && wg fail f --class agent-exit-nonzero → wg show f reports failed
- cargo install --path . was run
Tags
- bug, state-machine, shell-mode, agency-pipeline
Depends on
Required by
- (none)
Log
- 2026-05-01T13:38:22.295295500+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer is ideal for this correctness-critical state-machine bug: implements test-first validation, Rust code modification, and thorough cargo test/install verification.
- 2026-05-01T13:38:24.976796241+00:00 Spawned by coordinator --executor claude --model opus
- 2026-05-01T13:38:49.325066327+00:00 Starting investigation: reading fail.rs:69-71 and eval_scaffold.rs
- 2026-05-01T13:45:15.498313409+00:00 Tests pass: 15/15 in integration_failed_pending_eval (including 2 new shell-task tests). Pre-existing config::tests::test_global_config_path failure noted in task context, plus pre-existing wg init handler-inference failures in integration_chat — both unrelated to fail.rs.
- 2026-05-01T13:49:54.898455259+00:00 Committed: 5564461ea — pushed to origin
- 2026-05-01T13:50:04.009777940+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-05-01T13:52:30.997679581+00:00 PendingEval → Done (evaluator passed; downstream unblocks)