design-pdf-binary

Design: PDF / binary attachment failure handling

Metadata

Status: done
Assigned: agent-978
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Model: claude:opus
Created: 2026-04-28T22:23:40.622079140+00:00
Started: 2026-04-28T22:32:48.918199488+00:00
Completed: 2026-04-28T22:38:01.837439572+00:00
Tags: bug, design, executor, eval-scheduled
Eval score: 0.94
└ blocking impact: 0.95
└ completeness: 0.92
└ constraint fidelity: 0.70
└ coordination overhead: 0.96
└ correctness: 0.95
└ downstream usability: 0.95
└ efficiency: 0.93
└ intent fidelity: 0.90
└ style adherence: 0.94

Description

The bug report bug-read-tool-on-pdfs-burns-tokens-then-crashes.md observed agents calling the Read tool on malformed or encrypted PDFs and hitting an Anthropic API HTTP 400 ("Could not process PDF") after burning about $1 per agent in cached context. The workgraph wrapper marks the task failed with no distinguishing failure class, so naive retries waste the same money again.

Full details at: /home/erik/workgraph/bug-read-tool-on-pdfs-burns-tokens-then-crashes.md

Options to evaluate (compose, don't pick one)

  • Option A, failure classification (minimum): The wrapper parses agent stdout for api_error_status: 400 and sets a distinct failure_class=api_error_400_document, surfaced in wg show and wg service status. Intent: "Fix the input, don't just retry."
  • Option B, preflight hook slot: A pre-spawn validator runs pdfinfo on every .pdf referenced in the task description. If validation fails, the task is marked blocked_on_input and no agent is spawned.
  • Option C, per-task tool forbid: A task metadata field, forbid_tool_on_extension = [".pdf", ".xlsx"], makes the wrapper inject a system-prompt addendum, which forces the sidecar pattern.
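A minimal sketch of option A's classifier, in Python for illustration only. It assumes the error appears in agent stdout either as api_error_status: 400 or as a JSON key "api_error_status":400, and it additionally requires the "Could not process PDF" message so that unrelated 400s are not filed as document failures; drop that check to match option A literally. The constant, classify_failure, and should_auto_retry are hypothetical names, not existing workgraph code.

```python
import re
from typing import Optional

# Hypothetical constant mirroring option A's failure_class value.
API_ERROR_400_DOCUMENT = "api_error_400_document"

# Matches "api_error_status: 400" and the JSON form "api_error_status":400.
_STATUS_RE = re.compile(r'"?api_error_status"?\s*:\s*(\d{3})')
# Assumed message text for the document-specific 400.
_DOC_MARKER = "Could not process PDF"


def classify_failure(agent_stdout: str) -> Optional[str]:
    """Return a failure_class for a failed run, or None if no rule matches."""
    statuses = {int(m.group(1)) for m in _STATUS_RE.finditer(agent_stdout)}
    if 400 in statuses and _DOC_MARKER in agent_stdout:
        return API_ERROR_400_DOCUMENT
    return None


def should_auto_retry(failure_class: Optional[str]) -> bool:
    """False for document failures ("fix the input, don't just retry").

    True only means this rule does not veto a retry; the existing retry
    policy still decides for every other class, including None.
    """
    return failure_class != API_ERROR_400_DOCUMENT
```

Keeping the class a plain string makes it straightforward to print in wg show and wg service status; the retry check is where "fix the input, don't just retry" would take effect.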

Goal

Decide which of A, B, and C ship in this batch. A is cheap and high-value, so it always ships. B is more invasive because it needs a hook framework. C requires a per-task metadata schema. Candidate scopes: A only, A+C, or A+B+C.
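Option B's pre-spawn check can be sketched as below. It assumes PDF references are bare paths in the task description (no quoting or URLs), resolved against the wrapper's working directory, and treats any non-zero pdfinfo exit status as a validation failure. preflight, referenced_pdfs, pdfinfo_ok, and the injectable check parameter are hypothetical names; check exists so the logic can be tested without poppler-utils installed.

```python
import re
import subprocess
from typing import Callable, List, Optional, Tuple

# Hypothetical task status from option B.
BLOCKED_ON_INPUT = "blocked_on_input"

# Assumption: PDFs appear as bare paths (letters, digits, _ . / ~ -) ending in .pdf.
_PDF_PATH_RE = re.compile(r"[\w./~-]+\.pdf\b", re.IGNORECASE)


def referenced_pdfs(description: str) -> List[str]:
    """PDF paths mentioned in a task description, deduplicated, in order."""
    return list(dict.fromkeys(m.group(0) for m in _PDF_PATH_RE.finditer(description)))


def pdfinfo_ok(path: str) -> bool:
    """True if poppler's pdfinfo exits 0 for the file."""
    try:
        result = subprocess.run(["pdfinfo", path], capture_output=True, timeout=30)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Missing pdfinfo or a hang is treated as a failed validation (fail closed).
        return False
    return result.returncode == 0


def preflight(
    description: str, check: Callable[[str], bool] = pdfinfo_ok
) -> Tuple[Optional[str], List[str]]:
    """Return (BLOCKED_ON_INPUT, bad_paths) if any PDF fails, else (None, [])."""
    bad = [p for p in referenced_pdfs(description) if not check(p)]
    return (BLOCKED_ON_INPUT if bad else None), bad
```

Whether a missing pdfinfo binary should block the task (as here) or let it through is a policy choice the hook framework would need to settle.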

Cost data from the session: 2 failures wasted $1.88 in one round. With 5–8 Opus agents running in parallel, that is roughly $5–10 per bad-PDF round. The fix pays for itself at any scale of operation.
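The per-failure cost implied by the session data, scaled to a round of 5–8 agents, works out as below. This assumes every agent in the round reads the same bad PDF and spends about what the two observed failures did.

```latex
\[
\frac{\$1.88}{2\ \text{failures}} = \$0.94\ \text{per failure},\qquad
5 \times \$0.94 = \$4.70,\qquad
8 \times \$0.94 = \$7.52
\]
```

At the observed rate that is about $4.70–$7.52 per round, consistent with the ~$5–10 estimate.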

Write the design output to the task log, covering:

  • Chosen scope (A, A+C, etc.) with rationale
  • Schema additions (failure_class enum values, task metadata fields if C)
  • Code locations to modify — list paths only, no edits
  • Test plan for each adopted option
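One possible shape for the schema additions, sketched in Python for illustration. Only api_error_400_document and forbid_tool_on_extension come from the options above; UNCLASSIFIED, TaskMetadata, prompt_addendum, and the addendum wording (including its reading of "sidecar" as an extracted-text companion file) are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class FailureClass(str, Enum):
    """failure_class values; only API_ERROR_400_DOCUMENT comes from option A."""

    UNCLASSIFIED = "unclassified"  # hypothetical default when no rule matches
    API_ERROR_400_DOCUMENT = "api_error_400_document"


@dataclass
class TaskMetadata:
    """Option C's per-task field; the list-of-strings type is an assumption."""

    forbid_tool_on_extension: List[str] = field(default_factory=list)


def prompt_addendum(meta: TaskMetadata) -> str:
    """System-prompt text the wrapper would inject; empty if nothing is forbidden."""
    if not meta.forbid_tool_on_extension:
        return ""
    # Normalise case and sort so the addendum text is stable across runs.
    exts = ", ".join(sorted({e.lower() for e in meta.forbid_tool_on_extension}))
    # Hypothetical wording; reads "sidecar" as an extracted-text companion file.
    return (
        f"Do not open files with these extensions directly: {exts}. "
        "Work from an extracted-text sidecar file instead."
    )
```

Subclassing str keeps each enum member equal to its stored string, so existing string-valued failure_class fields could hold these values unchanged.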

Validation

  • Design doc posted as task log
  • Scope decision explicit and justified
  • File paths identified for implementer
  • Smoke scenarios specified (must include a real malformed PDF in the repro)
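For the smoke scenarios, a small fixture generator like the one below could produce malformed inputs on demand. It is a sketch under assumptions: both files are synthetic (a PDF truncated mid-object, and an HTML error page saved under a .pdf name), and FIXTURES and write_fixtures are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import List, Optional

# Hypothetical synthetic fixtures for the malformed-PDF smoke scenarios.
FIXTURES = {
    # PDF header, then the file stops mid-object: no xref, trailer, or %%EOF.
    "truncated.pdf": (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /Type /Pa"
    ),
    # An HTML error page saved under a .pdf name (e.g. a failed download).
    "html_error_page.pdf": b"<!DOCTYPE html><html><body>403 Forbidden</body></html>\n",
}


def write_fixtures(directory: Optional[Path] = None) -> List[Path]:
    """Write each fixture into directory (a fresh temp dir if None); return paths."""
    if directory is None:
        directory = Path(tempfile.mkdtemp(prefix="pdf-fixtures-"))
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, data in FIXTURES.items():
        path = directory / name
        path.write_bytes(data)
        paths.append(path)
    return paths
```

A smoke run would reference these paths in a task description and expect option B to block the task before any agent spawns, and, with B disabled, expect option A to record api_error_400_document. Whether pdfinfo and the API actually reject each synthetic file needs confirming on the first run; the PDF from the original bug report, if it can be kept, is the stronger repro.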

Depends on

Required by

Log