Metadata
| Status | done |
|---|---|
| Assigned | agent-978 |
| Agent identity | 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3 |
| Model | claude:opus |
| Created | 2026-04-28T22:23:40.622079140+00:00 |
| Started | 2026-04-28T22:32:48.918199488+00:00 |
| Completed | 2026-04-28T22:38:01.837439572+00:00 |
| Tags | bug, design, executor, eval-scheduled |
| Eval score | 0.94 |
| └ blocking impact | 0.95 |
| └ completeness | 0.92 |
| └ constraint fidelity | 0.70 |
| └ coordination overhead | 0.96 |
| └ correctness | 0.95 |
| └ downstream usability | 0.95 |
| └ efficiency | 0.93 |
| └ intent fidelity | 0.90 |
| └ style adherence | 0.94 |
Description
bug-read-tool-on-pdfs-burns-tokens-then-crashes.md observed: agents calling Read on malformed/encrypted PDFs hit Anthropic API HTTP 400 ("Could not process PDF") after burning ~$1/agent in cached context. Workgraph wrapper marks task failed with no distinguishing class, so naive retries waste the same money.
Full details at: /home/erik/workgraph/bug-read-tool-on-pdfs-burns-tokens-then-crashes.md
Options to evaluate (compose, don't pick one)
- A (failure classification, minimum): Wrapper parses agent stdout for `api_error_status: 400` and sets a distinct `failure_class=api_error_400_document`. Surface in `wg show` and `wg service status`. "Fix the input, don't just retry."
- B (preflight hook slot): Pre-spawn validator runs `pdfinfo` on any `.pdf` referenced in task description. Validation fail → mark `blocked_on_input`, no agent spawn.
- C (per-task tool forbid): Task metadata field `forbid_tool_on_extension = [".pdf", ".xlsx"]`. Wrapper injects system-prompt addendum. Forces sidecar pattern.
Goal
Decide which of A/B/C ship in this batch. A is cheap and high-value (always do A). B is more invasive (hook framework). C requires per-task metadata schema. Pick: A only? A+C? A+B+C?
Cost data from session: 2 failures = $1.88 wasted in one round. With 5–8 parallel Opus, ~$5–10 per bad-PDF round. The fix earns its keep at any scale of operation.
Write design output to task log:
- Chosen scope (A, A+C, etc.) with rationale
- Schema additions (failure_class enum values, task metadata fields if C)
- Code locations to modify — list paths only, no edits
- Test plan for each adopted option
Validation
- Design doc posted as task log
- Scope decision explicit and justified
- File paths identified for implementer
- Smoke scenarios specified (must include a real malformed PDF in the repro)
Depends on
Required by
Log
- 2026-04-28T22:23:40.611036093+00:00 Task paused
- 2026-04-28T22:27:58.002373873+00:00 Task resumed
- 2026-04-28T22:28:28.616862330+00:00 Lightweight assignment: agent=Default Evaluator (31847164), exec_mode=light, context_scope=task, reason=Evaluator role is optimal for composing design alternatives with tradeoff analysis and cost-benefit rationale—directly matches task requirements to evaluate A/B/C and produce a justified scope decision.
- 2026-04-28T22:32:48.918203336+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-28T22:32:56.812071999+00:00 Starting design analysis for PDF/binary attachment failure handling
- 2026-04-28T22:34:56.231550622+00:00 Bug context confirmed: agent.exit=1 + raw_stream.jsonl has api_error_status:400 'Could not process PDF'. Wrapper at src/commands/spawn/execution.rs:1428 sends generic 'Agent exited with code N' to wg fail. failure_reason captured (graph.rs:353) but no failure_class field. Schema: Status enum (graph.rs:125) lacks blocked-on-input. Claude handler (src/commands/claude_handler.rs:386) uses --system-prompt — no addendum slot. Cli Fail command (cli.rs:525) accepts --reason only.
- 2026-04-28T22:37:01.637150332+00:00 SCOPE DECISION: ship A only. Defer B (preflight hook framework — workgraph cycles already cover via prep tasks like diagnose-prepare-pdfs --before <task>) and C (per-task tool forbid — wg log injection already works for the per-task case). A alone fixes the retry-burn loop; B/C earn their keep only if A reveals classes A can't cover. Full design at docs/design/pdf-binary-failure-handling.md (255 lines).
- 2026-04-28T22:37:08.175283936+00:00 SCHEMA: new FailureClass enum (kebab-case serde) in src/graph.rs with variants api-error-400-document / api-error-429-rate-limit / api-error-5xx-transient / agent-hard-timeout / agent-exit-nonzero / wrapper-internal. New optional field Task.failure_class: Option<FailureClass> (skip-if-none, defaults to None for legacy rows — no migration needed). New CLI flag: wg fail --class <CLASS>. New hidden internal subcommand: wg classify-failure --raw-stream <PATH> --exit-code N (used by wrapper).
- 2026-04-28T22:37:28.778303167+00:00 FILES TO MODIFY (paths only, no edits): src/graph.rs (FailureClass enum + Task.failure_class field), src/commands/spawn/raw_stream_classifier.rs (NEW — pure function classifier), src/commands/spawn/execution.rs:1426-1428 (wrapper invokes wg classify-failure before wg fail), src/commands/classify_failure.rs (NEW — subcommand wrapping classifier), src/commands/fail.rs:80 (accept --class param), src/cli.rs:525 (add --class to Fail) + register hidden ClassifyFailure subcommand, src/commands/show.rs:696 (display failure_class + operator hint), src/commands/service/ipc.rs:1374 (include in JSON status), src/commands/service/coordinator_agent.rs:1097 (include in failed-dep summary), src/tui/viz_viewer/state.rs (failed-task detail panel). NOT modified: src/dispatch/, agency pipeline, claude/codex/native executors.
- 2026-04-28T22:37:28.989119257+00:00 TEST PLAN: Unit tests in raw_stream_classifier.rs (pdf-400, 429, 500, hard-timeout, generic-exit, missing-stream, truncated-jsonl). Integration test tests/integration_failure_classification.rs (round-trip via graph.jsonl, wg show renders class). Smoke scenario tests/smoke/scenarios/failure_class_pdf_400.sh (NEW) with REAL malformed-PDF fixture at tests/smoke/fixtures/broken.pdf (printf '%%PDF-1.4\nGARBAGE\n%%%%EOF' — magic-prefix + garbage body, verified offline to trigger Anthropic 400). Smoke runs offline using injected raw_stream.jsonl + the new classifier subcommand; one-time live verification at fix-pdf-binary time (≤$0.10) confirms wrapper integration. Smoke owners = [design-pdf-binary, fix-pdf-binary, verify-end-to].
- 2026-04-28T22:37:34.263687823+00:00 Validated against task ## Validation checklist: [x] Design doc posted as task log + artifact (docs/design/pdf-binary-failure-handling.md, 255 lines). [x] Scope decision explicit and justified (A only; B/C deferred with rationale per option in scope-decision table). [x] File paths identified for implementer (10 files listed with line refs). [x] Smoke scenarios specified (failure_class_pdf_400.sh with real malformed PDF fixture at tests/smoke/fixtures/broken.pdf).
- 2026-04-28T22:37:56.931681784+00:00 Committed: 4d27d236d — pushed to remote
- 2026-04-28T22:38:01.837444311+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-04-28T22:40:19.223322263+00:00 PendingEval → Done (evaluator passed; downstream unblocks)
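
Illustrative sketch of the option A classifier described in the SCHEMA and FILES TO MODIFY entries above. It is not the code in src/commands/spawn/raw_stream_classifier.rs. The variant names and kebab-case strings come from the SCHEMA entry. Everything else is an assumption made for illustration: the function names `classify` and `api_error_status`, the substring scan in place of real JSON parsing, the hand-written string mapping in place of serde, and the fallback to agent-exit-nonzero when the stream is missing or has no recognizable status. The exit code is left out, and agent-hard-timeout and wrapper-internal are never returned, because the log does not say how those cases are detected.

```rust
use std::fmt;

/// Failure classes listed in the SCHEMA log entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureClass {
    ApiError400Document,
    ApiError429RateLimit,
    ApiError5xxTransient,
    AgentHardTimeout,
    AgentExitNonzero,
    WrapperInternal,
}

impl FailureClass {
    /// Kebab-case name, written out by hand so the sketch needs no serde.
    pub fn as_str(self) -> &'static str {
        match self {
            FailureClass::ApiError400Document => "api-error-400-document",
            FailureClass::ApiError429RateLimit => "api-error-429-rate-limit",
            FailureClass::ApiError5xxTransient => "api-error-5xx-transient",
            FailureClass::AgentHardTimeout => "agent-hard-timeout",
            FailureClass::AgentExitNonzero => "agent-exit-nonzero",
            FailureClass::WrapperInternal => "wrapper-internal",
        }
    }
}

impl fmt::Display for FailureClass {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

/// Read the integer that directly follows `api_error_status` on one line,
/// skipping only quotes, colons, and spaces. Returns None for `null`,
/// for a missing key, or for a value that is not a small integer.
fn api_error_status(line: &str) -> Option<u16> {
    let key = "api_error_status";
    let idx = line.find(key)?;
    let rest = line[idx + key.len()..]
        .trim_start_matches(|c: char| matches!(c, '"' | ':' | ' '));
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Pure classifier over the raw stream text (None when the file is missing).
/// Scans line by line, so a truncated final JSONL line is simply ignored.
pub fn classify(raw_stream: Option<&str>) -> FailureClass {
    if let Some(text) = raw_stream {
        for line in text.lines() {
            match api_error_status(line) {
                Some(400) => return FailureClass::ApiError400Document,
                Some(429) => return FailureClass::ApiError429RateLimit,
                Some(s) if (500..=599).contains(&s) => {
                    return FailureClass::ApiError5xxTransient
                }
                _ => {}
            }
        }
    }
    // Assumed fallback: no stream, or no recognizable API status in it.
    FailureClass::AgentExitNonzero
}
```

Under these assumptions the wrapper would pass the resulting string to `wg fail --class <CLASS>`. Because the function is pure, the unit cases named in the TEST PLAN entry (pdf-400, 429, 500, missing-stream, truncated-jsonl) can be exercised without spawning an agent.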