Metadata
| Status | done |
|---|---|
| Assigned | agent-1318 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-05-01T13:15:16.229746646+00:00 |
| Started | 2026-05-01T13:15:43.061766306+00:00 |
| Completed | 2026-05-01T13:34:59.752402214+00:00 |
| Tags | fix,bug,docs,state-machine, eval-scheduled |
| Eval score | 0.93 |
| └ blocking impact | 0.95 |
| └ completeness | 0.95 |
| └ constraint fidelity | 0.70 |
| └ coordination overhead | 0.90 |
| └ correctness | 0.95 |
| └ downstream usability | 0.90 |
| └ efficiency | 0.85 |
| └ intent fidelity | 0.69 |
| └ style adherence | 0.95 |
Description
Description
Two related concerns:
-
Behavioral: when a
--exec-mode shelltask fails (exit != 0) without explicitwg done/wg fail, does it enter failed-pending-eval and get stuck (because no LLM evaluator can score a shell task), OR does the state machine correctly skip the rescue path for shell tasks? -
Documentation:
wg quickstartdocuments--exec-mode shelland failed-pending-eval, but doesn't describe how they interact. The failed-pending-eval description says 'agent exited non-zero with auto_evaluate=true, awaiting evaluator verdict' — which implies eval ALWAYS runs, but that can't be true for shell tasks.
User question 2026-05-01: 'is this well documented in quickstart? is failure managed correctly?'
Investigation steps
Behavioral
- Create a
--exec-mode shelltask that runs a command exiting non-zero (e.g.,--exec 'exit 1') - Observe the resulting status. Possibilities:
- Direct terminal failed (skips pending-eval entirely — correct): the implementer saw shell tasks should bypass the rescue path
- Stuck in pending-eval (bug): waiting for an evaluator that never fires
- Some other state: investigate what
- If the behavior is correct, find where in the state-transition code the shell-mode bypass happens (cite file:line). If missing, file as a fix.
Documentation
- Update
wg quickstartto explicitly say:- Shell tasks (
--exec-mode shellor--exec ...) do NOT get LLM evaluation; failure semantics are 'exit code 0 = done, non-zero = failed (terminal, no rescue)' - failed-pending-eval applies ONLY to LLM agent tasks (full / light / bare exec modes)
- Recurring shell tasks (cron with --exec) follow the same rule: each invocation succeeds-or-fails terminally, no rescue attempt
- Shell tasks (
Validation
-
Behavior verified — paste the outcome of
wg add --exec 'exit 1'end-to-end test in the task log (state trajectory: open → in-progress → ???) - If state machine is correct: cite file:line of the shell-mode bypass in the state-transition code
- If state machine is buggy (shell task stuck in pending-eval): file a follow-up fix task with concrete patch
- Quickstart updated to clarify the interaction between shell-mode and failed-pending-eval
- If applicable: CLAUDE.md, agent-guide bundle, or COMMANDS.md updated with the same clarification (per the doc-sync function pattern)
- cargo build + cargo test pass
- cargo install --path . was run before claiming done
Process note
This is a small task that doubles as a behavioral assertion + docs update. If the behavior is fine, the docs change is small. If the behavior has a bug, file the fix separately and let this task land the docs update against current behavior — they shouldn't block each other.
Depends on
Required by
Log
- 2026-05-01T13:15:16.207795027+00:00 Task paused
- 2026-05-01T13:15:20.638463508+00:00 Task published
- 2026-05-01T13:15:40.674680814+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=full, reason=Careful Programmer is well-suited for state-machine verification (correctness-critical) and documentation updates; high score (0.79) and extensive task history (466) demonstrate strong performance on similar verification/refactoring work.
- 2026-05-01T13:15:43.061771115+00:00 Spawned by coordinator --executor claude --model opus
- 2026-05-01T13:15:57.513689568+00:00 Starting investigation: shell-mode (--exec) failure handling vs failed-pending-eval state
- 2026-05-01T13:17:40.804083030+00:00 Found fail.rs:69-71 — FailedPendingEval routing happens when class=AgentExitNonzero AND auto_evaluate=true. No exec-mode bypass exists in the routing decision.
- 2026-05-01T13:21:10.670053059+00:00 BUG CONFIRMED: shell task with --class agent-exit-nonzero routes to failed-pending-eval. Trajectory: open → in-progress → failed-pending-eval. No .evaluate-X exists for shell tasks (correctly suppressed by eval_scaffold.rs:174-179) so the rescue never resolves. Bug location: src/commands/fail.rs:69-71 — the routing condition checks class+auto_evaluate but NOT is_shell_task.
- 2026-05-01T13:33:51.155767448+00:00 Validated: cargo build + cargo test (failed_pending_eval + pending_eval_state suites pass; 1 unrelated pre-existing failure in config::tests::test_global_config_path on .workgraph→.wg path migration). cargo install --path . completed. wg quickstart now shows updated docs in agent-guide.md, COMMANDS.md, and quickstart.rs.
- 2026-05-01T13:34:49.008835829+00:00 Committed: b1acf64d2 — pushed to origin
- 2026-05-01T13:34:59.752414257+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-05-01T13:38:02.846230745+00:00 PendingEval → Done (evaluator passed; downstream unblocks)