verify-end-to

Verify: end-to-end repro of all 4 poietic-session bugs

Metadata

Status: done
Assigned: agent-1023
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Model: claude:sonnet
Created: 2026-04-28T22:24:26.914583356+00:00
Started: 2026-04-28T23:31:20.661686774+00:00
Completed: 2026-04-28T23:38:45.027780048+00:00
Tags: bug, smoke, verify, eval-scheduled
Eval score: 0.84
└ blocking impact: 0.95
└ completeness: 0.95
└ constraint fidelity: 0.85
└ coordination overhead: 0.90
└ correctness: 0.95
└ downstream usability: 0.85
└ efficiency: 0.90
└ intent fidelity: 0.82
└ style adherence: 0.90

Description

Integrator task. Run all four bug repros end-to-end against the fixes in this batch. Confirm that each bug no longer reproduces AND that the manifest gained a permanent smoke scenario for each fix.

Source bug docs:

  • /home/erik/workgraph/bug-reset-leaves-stale-claims.md (repro section, steps 1–6)
  • /home/erik/workgraph/bug-retry-doesnt-clear-stale-downstream-claims.md (repro section, steps 1–6)
  • /home/erik/workgraph/bug-failed-upstream-treated-as-satisfied.md (repro section)
  • /home/erik/workgraph/bug-read-tool-on-pdfs-burns-tokens-then-crashes.md (repro section)

Validation

  • Each of the 4 repros executed live against the freshly-built binary (cargo install --path . is the prerequisite). Document the run as evidence in the task log.
  • Bug 1 (reset): after wg reset + wg service resume, dispatcher spawns fresh agents (spawned > 0)
  • Bug 2 (retry-downstream): after upstream retry + completion, downstream actually spawns (no stale claim)
  • Bug 3 (failed-upstream): wg list shows downstream as blocked (not ready) while upstream is failed
  • Bug 4 (PDF): malformed PDF produces a distinguishable failure_class in wg show; total agent cost before failure is either bounded (preflight option) or surfaced differently (classification option)
  • Each fix has a permanent smoke scenario under tests/smoke/scenarios/, listed in tests/smoke/manifest.toml with the corresponding fix-* task id in its owners list. Verify by running wg done against any one fix task and confirming the matching scenario runs.
  • If any repro still shows the bug, file a new task and FAIL this verify task with details — do NOT mark done.

Depends on

Required by

Log