Metadata
| Status | abandoned |
|---|---|
| Created | 2026-05-01T22:08:23.157722705+00:00 |
| Tags | grant, urgent, review, v3 |
Description
Erik wants an independent review of workgraph_google_application_FINAL_v3.md from a posture closer to an actual Google.org program officer reading a stack of 200 applications looking for reasons to cut. The auto-eval scored v3 at 0.78 LLM / 0.95 FLIP: passing, but the LLM dimension dropped from the ~0.94 average earlier in the day, suggesting weaknesses the grader sensed but didn't articulate.
Critical addition: web-link verification is part of the review. The application references poietic.life, GitHub repos, the WorkGraph docs site, and cited papers. If any of those are broken, mismatched, or weaker than what the application implies, a reviewer notices. This must be checked.
What to read
- workgraph_google_application_FINAL_v3.md: committed at 70c8e7f on worktree branch wg/agent-77/v3-assemble-stitch. Use `git show 70c8e7f:workgraph_google_application_FINAL_v3.md` to read it.
- workgraph_google_application_FINAL_v2.md (post-audit-fix on main): for comparison
- ~/poietic.life/notes/v3-spine-brief.md: the intended frame
- ~/poietic.life/notes/v3-assembly-summary-20260501.md: what the assembler did
- CLAUDE.md: project context, key narrative decisions, attribution rules
What to do
Part 1: Hostile reviewer pass on v3
Adopt the posture of a Google.org program officer who must cut 95%+ of applications. Read v3 looking for reasons to discount it. Specifically:
- Overclaim hunt. Where does v3 promise more than it can deliver in 36 months? Flag exact section and quote.
- Vague-claim hunt. Where does v3 use language that sounds important but doesn't commit to anything verifiable? ("reliable", "careful", "auditable" are tells if not operationally defined.)
- Authority gaps. Where does v3 invoke founder track records that aren't actually relevant to the proposed work?
- Internal inconsistencies. Do §17 (approach) and §26 (track record) and §29 (theory of change) tell the same story? Where do they drift?
- The 'where's the science?' test. A reviewer used to seeing scientific deliverables may bounce off infrastructure-as-deliverable framing. Has §30's mitigation actually answered this objection or just acknowledged it?
- Comparison to Liverpool Hive Mind. Does §28 land the complementary positioning, or does it accidentally invite a 'didn't we already fund this' read?
Identify the 3 weakest sections and write specific surgical fix proposals for each ("in §X, replace 'Y' with 'Z' because..."). Identify the 3 strongest sections; these stay untouched.
Part 2: v2 vs v3 honest comparison
Re-do the v1-vs-v2 style comparison but for v2 vs v3. Six dimensions:
- Translational impact (Google.org cares about real-world benefit)
- Defensibility under expert review
- Authenticity to founders' track record
- Demonstration credibility
- Risk of falling apart under scrutiny
- Fit to Google's stated priorities (Functional Genomics framing)
State which is stronger on each dimension. Recommend v2 OR v3 OR a fold-in. Be willing to recommend v2 if v3 has regressions v2 didn't.
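The per-dimension verdicts above feed one overall recommendation. As a bookkeeping sketch only (the dimension keys are paraphrased from this task, and the "any split verdict suggests a fold-in" decision rule is my assumption, not a rule the task states):

```python
# Six comparison dimensions, paraphrased from the task description.
DIMENSIONS = (
    "translational impact",
    "defensibility under expert review",
    "authenticity to founders' track record",
    "demonstration credibility",
    "risk under scrutiny",
    "fit to Google's stated priorities",
)

def recommend(winners):
    """Suggest 'v2', 'v3', or 'fold-in' from per-dimension winners.

    winners: dict mapping each dimension to 'v2', 'v3', or 'tie'.
    Assumed rule: if each version wins at least one dimension, fold the
    stronger parts together rather than pick a loser wholesale.
    """
    v2_wins = sum(1 for w in winners.values() if w == "v2")
    v3_wins = sum(1 for w in winners.values() if w == "v3")
    if v2_wins and v3_wins:
        return "fold-in"
    if v2_wins > v3_wins:
        return "v2"
    if v3_wins > v2_wins:
        return "v3"
    return "fold-in"  # all ties: neither version dominates
```

The reviewer still writes the prose verdict; this only keeps the tally honest when the six answers are in.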
Part 3: Web-link verification
For EVERY URL in v3, verify it:
- Resolves (200 status, not 404 / dead / parked).
- Loads content that actually matches what v3 implies.
- Is not weaker than the application implies.
Specifically check:
- poietic.life: does the landing page deliver on the v3 framing? Public benefit statement present? Founder bios consistent?
- github.com/orgs/poietic-pbc: repos visible? Look credible? Anything stale or embarrassing (the deep-research-competition KRAS scaffold)?
- github.com/graphwork/workgraph: actively developed? README delivers what v3 implies? Recent commits?
- graphwork.github.io: docs site loads? Substantive?
- Any cited paper DOIs / arXiv links / PubMed links: resolve correctly? Cite the right thing?
For each URL, log: URL, status, brief assessment (matches application / weaker than application / mismatch / dead). Flag mismatches as MUST FIX.
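The task mandates WebFetch for the content-match judgment, so the sketch below covers only the mechanical "resolves" check and the table row format. Function names and the `link-check` user agent are mine, not part of the task tooling:

```python
import urllib.request
import urllib.error

def fetch_status(url, timeout=15):
    """Return the HTTP status code for url, or None if the request fails outright."""
    req = urllib.request.Request(url, headers={"User-Agent": "link-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, e.g. 404 on a dead page
    except urllib.error.URLError:
        return None      # DNS failure, refused connection, timeout

def status_bucket(status):
    """Coarse triage. A 200 only proves the page resolves; whether the content
    matches, is weaker than implied, or mismatches still needs a human read."""
    if status is None or status >= 400:
        return "dead"
    return "resolves"

def table_row(url, status, assessment, action):
    """One row of the review note's verification table."""
    return f"| {url} | {status} | {assessment} | {action} |"
```

Note that parked domains typically return 200, so "resolves" is necessary but nowhere near sufficient; the matches / weaker / mismatch call stays manual.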
Output
Write ~/poietic.life/notes/v3-standout-review-20260501.md with sections:
- Headline verdict (one paragraph): would you fund this if you were the program officer? Why or why not?
- Three weakest sections with surgical fixes
- Three strongest sections (don't break them)
- v2 vs v3 honest comparison (six dimensions + recommendation)
- Web-link verification table (URL | status | assessment | action)
- MUST FIX before submit (consolidated punch list, distinguishing content fixes from manual Erik-only steps)
- Optional improvements if time permits (deeper revisions, not blockers)
Cap: 1500 words total. Be terse and concrete.
Log a one-paragraph summary on this task with `wg log`.
Constraints
- Adopt actual hostile-reviewer posture, not 'mostly positive with minor notes.' If v3 has real weaknesses, name them.
- For web-link verification, USE WebFetch on each URL. Do not assume.
- No em-dashes (CLAUDE.md style rule).
- Do not modify the v3 application file. Output is the review note only.
- If v2 is stronger overall, say so. The point is honest critique, not v3 advocacy.
Validation
- All seven listed inputs read
- Three weakest sections identified with surgical fixes
- Three strongest sections identified
- v2 vs v3 six-dimension comparison with verdict
- Every URL in v3 verified via WebFetch
- Web-link table includes status + assessment for each URL
- MUST FIX punch list distinguishes content fixes from Erik-only steps
- Review note at ~/poietic.life/notes/v3-standout-review-20260501.md
- Under 1500 words
Depends on
- (none)
Required by
- (none)
Log
- 2026-05-01T22:08:23.155716617+00:00 Task paused
- 2026-05-01T22:08:48.369435959+00:00 Task abandoned