verify-document-shell

Verify + document: shell-mode (--exec) task failure handling vs failed-pending-eval state

Metadata

Statusdone
Assignedagent-1318
Agent identityf51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created2026-05-01T13:15:16.229746646+00:00
Started2026-05-01T13:15:43.061766306+00:00
Completed2026-05-01T13:34:59.752402214+00:00
Tagsfix,bug,docs,state-machine, eval-scheduled
Eval score0.93
└ blocking impact0.95
└ completeness0.95
└ constraint fidelity0.70
└ coordination overhead0.90
└ correctness0.95
└ downstream usability0.90
└ efficiency0.85
└ intent fidelity0.69
└ style adherence0.95

Description

Description

Two related concerns:

  1. Behavioral: when a --exec-mode shell task fails (exit != 0) without explicit wg done / wg fail, does it enter failed-pending-eval and get stuck (because no LLM evaluator can score a shell task), OR does the state machine correctly skip the rescue path for shell tasks?

  2. Documentation: wg quickstart documents --exec-mode shell and failed-pending-eval, but doesn't describe how they interact. The failed-pending-eval description says 'agent exited non-zero with auto_evaluate=true, awaiting evaluator verdict' — which implies eval ALWAYS runs, but that can't be true for shell tasks.

User question 2026-05-01: 'is this well documented in quickstart? is failure managed correctly?'

Investigation steps

Behavioral

  1. Create a --exec-mode shell task that runs a command exiting non-zero (e.g., --exec 'exit 1')
  2. Observe the resulting status. Possibilities:
    • Direct terminal failed (skips pending-eval entirely — correct): the implementer saw shell tasks should bypass the rescue path
    • Stuck in pending-eval (bug): waiting for an evaluator that never fires
    • Some other state: investigate what
  3. If the behavior is correct, find where in the state-transition code the shell-mode bypass happens (cite file:line). If missing, file as a fix.

Documentation

  • Update wg quickstart to explicitly say:
    • Shell tasks (--exec-mode shell or --exec ...) do NOT get LLM evaluation; failure semantics are 'exit code 0 = done, non-zero = failed (terminal, no rescue)'
    • failed-pending-eval applies ONLY to LLM agent tasks (full / light / bare exec modes)
    • Recurring shell tasks (cron with --exec) follow the same rule: each invocation succeeds-or-fails terminally, no rescue attempt

Validation

  • Behavior verified — paste the outcome of wg add --exec 'exit 1' end-to-end test in the task log (state trajectory: open → in-progress → ???)
  • If state machine is correct: cite file:line of the shell-mode bypass in the state-transition code
  • If state machine is buggy (shell task stuck in pending-eval): file a follow-up fix task with concrete patch
  • Quickstart updated to clarify the interaction between shell-mode and failed-pending-eval
  • If applicable: CLAUDE.md, agent-guide bundle, or COMMANDS.md updated with the same clarification (per the doc-sync function pattern)
  • cargo build + cargo test pass
  • cargo install --path . was run before claiming done

Process note

This is a small task that doubles as a behavioral assertion + docs update. If the behavior is fine, the docs change is small. If the behavior has a bug, file the fix separately and let this task land the docs update against current behavior — they shouldn't block each other.

Depends on

Required by

Log