fix-readme-s — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-2289`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Model	`codex:gpt-5.5`
Created	2026-05-04T15:16:14.781930817+00:00
Started	2026-05-04T15:17:18.707217097+00:00
Completed	2026-05-04T15:30:35.812525644+00:00
Tags	`fix,docs,readme,evaluation`, `eval-scheduled`
Eval score	0.90
└ blocking impact	0.90
└ completeness	1.00
└ constraint fidelity	0.85
└ coordination overhead	0.80
└ correctness	0.95
└ downstream usability	0.75
└ efficiency	0.85
└ intent fidelity	0.82
└ style adherence	0.90

Description

README.md contains a 'Terminal-Bench evaluation' section presenting null-result data from an early prototype run. The run was buggy and not representative. STRAIGHT REMOVE — not demote, not archive-with-preface. Just remove the reference from the README entirely.

User direct quotes 2026-05-04:

'we need a task to remove the tb references they are very stale and not appropriate in the readme!'
'we should straight remove the reference to terminalbench!'
'it's not right'
'it was a messed up run'

What to change

README.md

- DELETE the entire Terminal-Bench evaluation section (the table with 52.3%/51.4%/49.0%, the 'no statistically significant difference' framing, the easy/medium/hard breakdown, and any link to terminal-bench/BLOG.md from README)
- DELETE any other reference to Terminal-Bench in the README
- Do NOT replace with a half-archival note; straight remove

terminal-bench/ directory

- Leave it untouched on disk (git history preserves the work)
- No need for archival prefaces or 'this is superseded' notices in the directory itself; it's just not promoted from the README anymore

Why straight remove

The run was 'messed up' (user's words). The data is unreliable. Keeping it in the README — even framed as 'historical' or 'superseded' — still presents stale buggy-prototype data as the project's quantitative section. A skeptical reader's takeaway is the same regardless of framing: 'they have a null result, workgraph doesn't help.' Straight removal eliminates that misread entirely.

If a real evaluation is run later, that goes into the README. Until then: no eval section. Simpler than complicated archival framing.

Validation

- grep README.md for 'terminal' / 'tb' / 'bench' / '52.3' / '51.4' / '49.0': all matches removed
- No new 'archival' / 'superseded' framing added; just removed
- terminal-bench/ directory contents untouched (preserved on disk for reproducibility, just not surfaced from README)
- cargo build + cargo test pass (defensive; docs only)
- cargo install --path . was run before claiming done
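
The grep and directory checks above could be scripted roughly as follows. This is a minimal sketch, not the commands the agent actually ran: `check_readme` and `check_tb_dir_untouched` are hypothetical helper names, and matching 'tb' as a whole word only (so that words like "outbound" do not trigger it) is an assumption added here, not part of the task text. It relies on `grep -w`, which GNU and BSD grep both support.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of workgraph.

# Print any leftover Terminal-Bench strings in a README.
# Returns 0 if the file is clean, 1 if matches remain, 2 if the file is missing.
check_readme() {
    readme="${1:-README.md}"
    [ -f "$readme" ] || return 2
    status=0
    # -n shows line numbers, -i ignores case, -E enables alternation.
    # Dots in the percentages are escaped so they match literally.
    if grep -n -i -E 'terminal|bench|52\.3|51\.4|49\.0' "$readme"; then
        status=1
    fi
    # Assumption: 'tb' is matched as a whole word to avoid noise.
    if grep -n -i -w 'tb' "$readme"; then
        status=1
    fi
    return "$status"
}

# Succeeds when git reports no modified, deleted, or untracked paths
# under terminal-bench/ relative to the current commit.
check_tb_dir_untouched() {
    [ -z "$(git status --porcelain -- terminal-bench/)" ]
}

# Example usage from the repository root, before committing:
#   check_readme README.md && check_tb_dir_untouched && cargo build && cargo test
```

Any line the first helper prints still needs a human look, since a word like 'terminal' can legitimately appear in a README for unrelated reasons.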

Per skip-back-compat-ceremony memory

Hard removal is the standing default. No deprecation framing. Just delete the section.

Depends on

done .assign-fix-readme-s

Required by

done rewrite-readme-manifesto-5

Log

2026-05-04T15:16:14.745600296+00:00 Task paused
2026-05-04T15:16:52.378176078+00:00 Task published
2026-05-04T15:17:15.445159848+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer is the best fit for straightforward documentation cleanup requiring thorough verification (grep, build, test, install validation).
2026-05-04T15:17:18.707222677+00:00 Spawned by coordinator --executor codex --model gpt-5.5
2026-05-04T15:17:33.338067537+00:00 Starting README cleanup; no unread messages at session start
2026-05-04T15:29:12.697421563+00:00 Validation: README stale Terminal-Bench strings/links/percentages removed; no archival/superseded framing added; terminal-bench directory diff is empty
2026-05-04T15:29:48.471160542+00:00 Validation: cargo build passed; cargo test failed in pre-existing integration_cycle_detection wg init/no-model tests unrelated to README docs; cargo install --path . completed; committed 864549f21
2026-05-04T15:30:35.812533269+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
2026-05-04T15:33:00.637798830+00:00 PendingEval → Done (evaluator passed; downstream unblocks)