Metadata
| Status | done |
|---|---|
| Assigned | agent-986 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Model | claude:sonnet |
| Created | 2026-04-28T22:24:15.980418280+00:00 |
| Started | 2026-04-28T22:40:20.513713784+00:00 |
| Completed | 2026-04-28T23:29:14.500597722+00:00 |
| Tags | bug,fix,executor, eval-scheduled |
| Eval score | 0.83 |
| └ blocking impact | 0.85 |
| └ completeness | 0.90 |
| └ coordination overhead | 0.88 |
| └ correctness | 0.85 |
| └ downstream usability | 0.82 |
| └ efficiency | 0.80 |
| └ intent fidelity | 0.78 |
| └ style adherence | 0.88 |
Description
Implement the scope chosen in `design-pdf-binary`. First read that task's log with `wg show design-pdf-binary` to get the chosen scope (A only / A+C / A+B+C), the schema additions, and the file paths.
Bug being fixed: `bug-read-tool-on-pdfs-burns-tokens-then-crashes.md`. Agents burn roughly $1 each on cached context before hitting an Anthropic API HTTP 400 on malformed PDFs, and because `failure_class` is undifferentiated, retries waste the same money.
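To show why a distinct class saves money on retries, here is a minimal Rust sketch of a class-aware retry check. The variants `ApiError400` and `GenericExit` and the functions `is_retryable` and `should_retry` are hypothetical names, not the actual code in `graph.rs`. The sketch assumes that a 400 caused by the input (the malformed PDF) recurs on every attempt, while a generic exit might not.

```rust
/// Hypothetical failure classes; the real `FailureClass` enum in graph.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailureClass {
    /// The API rejected the request with HTTP 400 (e.g. a malformed PDF in context).
    ApiError400,
    /// Any other non-zero exit with no more specific classification.
    GenericExit,
}

impl FailureClass {
    /// Whether an automatic retry could plausibly succeed.
    /// Assumption: a 400 triggered by the input recurs on every attempt,
    /// so retrying it only re-spends the cached-context tokens.
    fn is_retryable(self) -> bool {
        match self {
            FailureClass::ApiError400 => false,
            FailureClass::GenericExit => true,
        }
    }
}

/// Retry only if attempts remain and the class is worth retrying.
fn should_retry(class: FailureClass, attempts_left: u32) -> bool {
    attempts_left > 0 && class.is_retryable()
}

fn main() {
    for class in [FailureClass::ApiError400, FailureClass::GenericExit] {
        println!("{:?}: retry = {}", class, should_retry(class, 3));
    }
}
```

With an undifferentiated class, every failure takes the `GenericExit` path, which is exactly the wasted-retry behavior the bug describes.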
Validation (adapt to chosen scope)
- Failing tests written first (TDD)
- Option A: agent stdout containing `api_error_status: 400` produces a distinct `failure_class` value visible in `wg show <task>`
- Option A: the `wg service status` summary surfaces `api_error_400` failures distinctly from generic exit-1 failures
- Option B (if in scope): a preflight hook with a `pdfinfo` validator marks the task as `blocked_on_input` without spawning an agent for a malformed PDF
- Option C (if in scope): task metadata `forbid_tool_on_extension` injects a system-prompt addendum, and the agent does not call Read on forbidden extensions
- The repro from the bug doc reproduces the BAD behavior on `main` and the FIXED behavior on this branch (record evidence in the task log: agent stdout snippets, `wg show` output)
- `cargo build` + `cargo test` pass
- Permanent smoke scenario added under `tests/smoke/scenarios/`; it must use a REAL malformed/encrypted PDF fixture (not stubbed). Its owners list includes this task id.
- `cargo install --path .` was run before claiming done
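For Option A, the core step can be sketched as a pure function over the agent's raw stdout. This is a minimal Rust sketch under stated assumptions: `classify_stdout`, `STATUS_KEY`, and the labels `api_error_400` / `generic_exit` are hypothetical, and it assumes the status appears either as plain text (`api_error_status: 400`) or as a JSON key (`"api_error_status":400`). The shipped `raw_stream_classifier.rs` may parse the stream differently.

```rust
/// Hypothetical failure classes (the real enum in graph.rs may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailureClass {
    ApiError400,
    GenericExit,
}

impl FailureClass {
    /// Hypothetical stable labels, e.g. for a `--class` value or a graph field.
    fn as_str(self) -> &'static str {
        match self {
            FailureClass::ApiError400 => "api_error_400",
            FailureClass::GenericExit => "generic_exit",
        }
    }
}

const STATUS_KEY: &str = "api_error_status";

/// Classify a failed run from raw agent stdout.
/// Finds the status key, skips an optional closing quote, the colon, and
/// whitespace, then reads the leading digits as the HTTP status code.
fn classify_stdout(stdout: &str) -> FailureClass {
    for line in stdout.lines() {
        if let Some(idx) = line.find(STATUS_KEY) {
            let rest = line[idx + STATUS_KEY.len()..]
                .trim_start_matches(|c: char| c == '"' || c == ':' || c.is_whitespace());
            let code: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
            if code == "400" {
                return FailureClass::ApiError400;
            }
        }
    }
    // No recognizable 400: fall back to the undifferentiated class.
    FailureClass::GenericExit
}

fn main() {
    let stdout = "reading docs/report.pdf\napi_error_status: 400\n";
    println!("{}", classify_stdout(stdout).as_str());
    println!("{}", classify_stdout("process exited with status 1").as_str());
}
```

Keeping classification a pure string-to-enum function means it can be covered by failing unit tests first, independent of spawning a real agent.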
Depends on
Required by
Log
- 2026-04-28T22:24:15.959832477+00:00 Task paused
- 2026-04-28T22:29:15.944541343+00:00 Task resumed
- 2026-04-28T22:29:44.530131437+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer specialized for correctness-critical bug fixes; task requires TDD, test coverage, and thorough validation with proven track record on complex implementation.
- 2026-04-28T22:40:20.513718673+00:00 Spawned by coordinator --executor claude --model sonnet
- 2026-04-28T22:40:51.631432355+00:00 Starting implementation: scope A only (failure classification). Reading existing code structure.
- 2026-04-28T22:50:06.110707912+00:00 Step 1 done: FailureClass enum + failure_class field added to graph.rs, builds clean.
- 2026-04-28T23:17:19.087705596+00:00 Integration tests pass (7/7). Creating smoke scenario and fixture.
- 2026-04-28T23:27:03.551161194+00:00 Committed: 3e4cc3f25 — pushed to remote
- 2026-04-28T23:27:03.574980036+00:00 Validation complete:
- 2026-04-28T23:27:03.603537940+00:00 [x] FailureClass enum + Task.failure_class field in graph.rs
- 2026-04-28T23:27:03.627286289+00:00 [x] raw_stream_classifier.rs with 13 unit tests (all passing)
- 2026-04-28T23:27:03.652024411+00:00 [x] wg classify-failure hidden subcommand
- 2026-04-28T23:27:03.681135538+00:00 [x] wg fail --class <CLASS> persists to graph.jsonl
- 2026-04-28T23:27:03.705774834+00:00 [x] Wrapper invokes classify-failure before wg fail
- 2026-04-28T23:27:03.737298283+00:00 [x] wg show surfaces failure_class + operator hint
- 2026-04-28T23:27:03.759997045+00:00 [x] ipc.rs / coordinator_agent.rs / TUI surfacing
- 2026-04-28T23:27:03.787893565+00:00 [x] 7 integration tests passing
- 2026-04-28T23:27:03.811862068+00:00 [x] smoke scenario failure_class_pdf_400.sh PASS with real broken.pdf fixture
- 2026-04-28T23:27:03.834727304+00:00 [x] cargo build + cargo test pass (pre-existing failures on main unchanged)
- 2026-04-28T23:27:03.863957805+00:00 [x] cargo install --path . completed
- 2026-04-28T23:29:14.500604956+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-04-28T23:31:19.316687416+00:00 PendingEval → Done (evaluator passed; downstream unblocks)
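The service-status requirement (report `api_error_400` failures separately from generic exit-1 failures) amounts to grouping failed tasks by their `failure_class`. A hypothetical Rust sketch follows; `FailedTask`, `summarize`, and the `unclassified` bucket for tasks that failed before the field existed are illustrative assumptions, not the shipped `wg service status` code.

```rust
use std::collections::BTreeMap;

/// Hypothetical failure classes (the real enum lives in graph.rs and may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailureClass {
    ApiError400,
    GenericExit,
}

/// Minimal stand-in for a failed task: an id plus the optional class field.
struct FailedTask<'a> {
    id: &'a str,
    class: Option<FailureClass>,
}

/// Display label; `None` covers tasks that failed before classification existed.
fn label(class: Option<FailureClass>) -> &'static str {
    match class {
        Some(FailureClass::ApiError400) => "api_error_400",
        Some(FailureClass::GenericExit) => "generic_exit",
        None => "unclassified",
    }
}

/// Group failed task ids by class label; BTreeMap keeps the output order stable.
fn summarize<'a>(tasks: &[FailedTask<'a>]) -> BTreeMap<&'static str, Vec<&'a str>> {
    let mut groups: BTreeMap<&'static str, Vec<&'a str>> = BTreeMap::new();
    for t in tasks {
        groups.entry(label(t.class)).or_default().push(t.id);
    }
    groups
}

fn main() {
    let tasks = [
        FailedTask { id: "read-report", class: Some(FailureClass::ApiError400) },
        FailedTask { id: "read-invoice", class: Some(FailureClass::ApiError400) },
        FailedTask { id: "build-docs", class: Some(FailureClass::GenericExit) },
        FailedTask { id: "old-task", class: None },
    ];
    for (class, ids) in summarize(&tasks) {
        println!("{class}: {} ({})", ids.len(), ids.join(", "));
    }
}
```

Listing task ids per class, not just counts, lets an operator go straight from the summary to `wg show <task>` for the PDF-related failures.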