audit-existing-manuscript — octopus01:/moosefs/erikg/phrs

Metadata

Status	abandoned ‖ paused
Created	2026-05-05T22:54:16.687013174+00:00
Tags	`paper-prep`, `audit`, `rewrite-anchor`
Failure reason	rewriting with Nature companion + BoG-this-week constraints; will depend on park-off-target-and-anchor

Description

CRITICAL CONTEXT — read this carefully before doing anything else.

The user has reviewed the previously-rendered MANUSCRIPT_DRAFT.pdf and rejected it as fundamentally off-target. Verbatim user feedback:

> 'It doesn't match at all what we've been up to. Only in the most superficial way. ... I don't think anything that's been done is actually correct. ... The paper drafts you produced is completely useless.'
> 'Even the figures don't make any sense. ... What you have is like, it's from another dimension. It focuses on all the minutiae that we use at the end of this abstract, effectively, to bolster our argument. Instead, you make it the core argument of the paper, and it makes it incredibly boring.'
> 'Furthermore, you missed the connection to the human pan-genome reference version 2, HPRC v2, and that's so fundamental, it was a companion paper to that manuscript. So this needs to be reflected.'

THE CANONICAL ABSTRACT (this is the anchor — every rewrite decision must serve it):

Title: Concerted evolution and unorthodox recombination of human subtelomeres
Authors: Andrea Guarracino and Erik Garrison

Human subtelomeric regions are among the most dynamic and structurally complex parts of our genome, yet their interchromosomal relationships have remained difficult to characterize due to the limitations of both assembly completeness and alignment methodology. Here we present the most comprehensive survey of subtelomeric sequence relationships to date, leveraging 466 near-complete haplotype assemblies from the Human Pangenome Reference Consortium (HPRC) version 2. To analyze these regions, we introduce the implicit pangenome graph, a reference-free alignment approach that performs all-to-all pairwise comparisons across haplotypes—sampling approximately 12% of all possible combinations—without imposing chromosomal partitioning or positional bias. This yields a truly unbiased view of interchromosomal homology across the pangenome, where every haplotype serves as its own point of reference, allowing a systematic and universal view of human subtelomeric evolution.

A genome-wide survey of alignment identity reveals extended regions of interchromosomal homology spanning tens to hundreds of kilobases at nearly all subtelomeres—comparable in scale to canonical pseudohomologous systems such as PAR2 on the sex chromosomes. This dramatically expands the scope of known pseudohomologous regions in the human genome to include almost all subtelomeric regions. Cladistic analysis based on neighbor-joining trees of subtelomeric similarity uncovers both expected relationships—Xp/Yp and Xq/Yq via the pseudoautosomal regions, acrocentric short arms—and novel associations, including strong 10p–18p homology, a tightly linked clade involving 22q, 21q, 19q, 1q, 13q, and 17q, and extended DUX4-containing homology between 4q and 10q with wide copy number diversity. A large clade of many chromosome arms shares homology at moderate similarity, suggesting broad ongoing interchromosomal exchange. Principal component and community detection analyses of the similarity matrix further resolve subtelomeric clustering across human populations. We hypothesize that these patterns are maintained by recombination facilitated by the physical proximity of subtelomeres at the nuclear envelope, and evaluate this using Hi-C-derived three-dimensional genome maps. Our work exposes the extent to which ongoing recombination shapes these highly dynamic and poorly understood regions of the genome.

KEY CLAIMS THAT MUST BE THE CORE OF THE PAPER (extracted from abstract):
1. This is a COMPANION PAPER to HPRC v2. That framing is non-negotiable; it must be in the intro and positioning.
2. Methodological contribution: 'implicit pangenome graph' — reference-free, all-to-all, samples ~12% of haplotype pairs, no chromosomal partitioning bias. This is core, not minutiae.
3. Empirical contribution: extended interchromosomal subtelomeric homology at tens-to-hundreds-of-kb scale across nearly all subtelomeres — comparable to PAR2.
4. Specific cladistic findings to report:
- Expected: Xp/Yp & Xq/Yq via PARs; acrocentric short arms.
- Novel: 10p–18p homology; the {22q,21q,19q,1q,13q,17q} clade; extended DUX4-containing 4q–10q homology with copy-number diversity; large moderate-similarity clade.
5. PCA + community detection → subtelomeric clustering across human populations.
6. Hi-C 3D maps → tests the 'recombination via nuclear-envelope proximity' hypothesis.
7. Framing thesis: ongoing recombination shapes subtelomeres → 'concerted evolution and unorthodox recombination.'
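
A quick sanity check on the scale implied by claim 2, as a minimal sketch. It assumes "all possible combinations" means unordered pairs of the 466 haplotypes; that is an assumption, since the actual sampling unit could instead be ordered pairs or individual sequences. `pair_budget` is a hypothetical helper, not part of any existing script.

```python
from math import comb

N_HAPLOTYPES = 466       # from the abstract
SAMPLED_FRACTION = 0.12  # "approximately 12%" from the abstract


def pair_budget(n: int, fraction: float) -> tuple[int, int]:
    """Return (all unordered pairs, approximate number of sampled pairs)."""
    total = comb(n, 2)  # n * (n - 1) / 2
    return total, round(total * fraction)


total_pairs, sampled_pairs = pair_budget(N_HAPLOTYPES, SAMPLED_FRACTION)
print(total_pairs, sampled_pairs)  # 108345 13001
```

Under that assumption, the methods-recovery tasks should expect on the order of 13,000 haplotype-pair alignments. A recovered count far from this suggests the sampling unit is something other than unordered haplotype pairs.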

EXISTING MATERIALS TO AUDIT (do not delete; just classify):
- paper_prep/synthesis/MANUSCRIPT_DRAFT.md — current rendered draft, user says off-target
- paper_prep/synthesis/MANUSCRIPT_SKELETON.md
- paper_prep/synthesis/CAPTIONS.md
- paper_prep/synthesis/WORK_DECOMPOSITION.md
- paper_prep/synthesis/SCRIPT_INVENTORY.md
- paper_prep/synthesis/STATS_AUDIT.md
- paper_prep/synthesis/ACCEPTANCE_CHECKLIST.md
- paper_prep/synthesis/ARCHITECT_TASK_BRIEF.md
- paper_prep/synthesis/TALK_OUTLINE_15MIN.md
- paper_prep/synthesis/VERSIONS.md
- paper_prep/figures/fig1, fig2, fig3, fig4, ed1, ed2, ed3, ed4, ed5, ed8 — directories with figure assets, captions, source scripts. Read each fig directory's README/notes/captions to determine actual content — do NOT trust the directory names alone.
- Scattered top-level files in repo root: many *_ora_*, *_phyper_*, *_copy_number_enrichment_*.md/csv/R files. The user's comment 'minutiae used to bolster the argument made the core' suggests these copy-number-enrichment / ORA materials were inappropriately elevated to core. Catalog them but mark as supporting-material-at-most.

TASK — produce three deliverables under paper_prep/synthesis/:

Deliverable 1: ABSTRACT.md
- Save the canonical abstract above verbatim. No edits, no embellishment.
- This becomes the anchor every future rewrite task references.

Deliverable 2: AUDIT_REPORT.md
Structure:
## Synthesis docs audit
For each .md file under paper_prep/synthesis/ (except ABSTRACT.md), one row:
Filename | Topic actually covered | Aligned with abstract? (Y/partial/N) | Salvageable content (specific section refs) | Reason for verdict
## Figures audit
For each fig{1..4} and ed{1..8} directory that exists (the materials list above names ed1 through ed5 and ed8, with no ed6 or ed7), one row:
Figure | Title/topic per local notes | Maps to which abstract claim? | Aligned? (Y/partial/N) | Action recommended (keep / minor revise / redo / scrap) | Reason
## Data & code assets audit
For each abstract claim (1–7 above), list the specific data files / scripts in the repo that support it, OR mark 'MISSING — needs to be produced.' The concrete assets to chase are the 466-haplotype HPRC v2 dataset, the ~12% pairwise sampling, the implicit pangenome graph alignments, NJ trees, the similarity matrix, PCA, community detection results, and the Hi-C cross-check (eight items, which support the seven claims above rather than mapping onto them one-to-one).
## Off-target materials inventory
List the copy-number-enrichment / ORA / phyper / weighted-hypergeometric files cluttering the repo root and paper_prep/synthesis/. Note: do NOT delete; just inventory and recommend whether each could plausibly serve as supplementary material (most probably cannot).
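
The inventory step can be done read-only. A minimal sketch, assuming the three glob patterns quoted in the materials list are sufficient. Files for the weighted-hypergeometric analysis have no pattern given in this task, so the tuple would need extending. `inventory_off_target`, `as_markdown_rows`, and the table columns are hypothetical.

```python
from pathlib import Path

# Patterns quoted from the materials list; extend as the audit finds more.
OFF_TARGET_PATTERNS = ("*_ora_*", "*_phyper_*", "*_copy_number_enrichment_*")


def inventory_off_target(root: Path, patterns=OFF_TARGET_PATTERNS) -> list[Path]:
    """List files directly under `root` matching any pattern. Read-only."""
    hits: set[Path] = set()
    for pattern in patterns:
        hits.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(hits)


def as_markdown_rows(paths: list[Path]) -> str:
    """Render the hits as a table skeleton for the auditor to fill in."""
    rows = ["| File | Plausible as supplementary? | Reason |", "|---|---|---|"]
    rows += [f"| {p.name} | TBD | TBD |" for p in paths]
    return "\n".join(rows)
```

Calling `as_markdown_rows(inventory_off_target(Path(".")))` from the repo root would produce the skeleton. Nothing is moved or deleted, in line with the constraints of this task.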

Deliverable 3: REWRITE_PLAN.md
A bounded task decomposition for the rewrite. Constraints:
- Total tasks should be in the 20–60 range. Stop at 60. If you feel the urge to make 100+, prune.
- Each task entry follows the template:
### TASK-NN: <imperative title>
Inputs: <specific files / data sources>
Output: <specific filename + format>
Acceptance: <one checkable condition>
Depends on: <prior TASK-NN ids, or 'none'>
- Sequence the plan: data/methods recovery → results sections → figures (revise vs redo) → integration → render. Reuse the typst pipeline that just succeeded (paper_prep/synthesis/MANUSCRIPT_DRAFT.typ via pandoc 3.5 + typst 0.13.1 — see render-manuscript-draft-4 task log for the working incantation).
- Explicitly include a TASK that establishes the HPRC v2 companion-paper framing in the introduction. This is non-negotiable.
- Explicitly include TASK(s) that produce or recover the implicit-pangenome-graph methods description and the ~12%-pairwise-sampling methods description.
- Do NOT pre-decompose work whose shape will only be clear after the audit (e.g., individual figure redos). Instead, produce one umbrella task per figure that reads 'Decide based on AUDIT_REPORT.md whether to keep/revise/redo, and execute.' Detailed sub-tasks can be added after this audit returns.
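
The template and the 20–60 bound above can be checked mechanically. A minimal sketch, assuming each field sits on its own line exactly as in the template and that dependencies are written as `TASK-NN` ids; `parse_plan` and `plan_problems` are hypothetical helpers.

```python
import re

REQUIRED_FIELDS = ("Inputs", "Output", "Acceptance", "Depends on")
TASK_HEADING = re.compile(r"^### (TASK-\d+):\s*(.*)$")


def parse_plan(text: str) -> dict[str, dict[str, str]]:
    """Parse '### TASK-NN: title' entries into {task_id: {field: value}}."""
    tasks: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        heading = TASK_HEADING.match(line)
        if heading:
            current = heading.group(1)
            tasks[current] = {"title": heading.group(2).strip()}
            continue
        if current is None:
            continue
        for field in REQUIRED_FIELDS:
            prefix = field + ":"
            if line.startswith(prefix):
                tasks[current][field] = line[len(prefix):].strip()
    return tasks


def plan_problems(tasks, min_tasks: int = 20, max_tasks: int = 60) -> list[str]:
    """Return human-readable violations; an empty list means the plan passes."""
    problems = []
    if not min_tasks <= len(tasks) <= max_tasks:
        problems.append(f"task count {len(tasks)} outside {min_tasks}-{max_tasks}")
    seen: set[str] = set()
    for task_id, fields in tasks.items():  # dicts preserve file order
        for field in REQUIRED_FIELDS:
            if not fields.get(field):
                problems.append(f"{task_id}: missing {field}")
        for dep in re.findall(r"TASK-\d+", fields.get("Depends on", "")):
            if dep not in seen:
                problems.append(f"{task_id}: depends on {dep}, which is not a prior task")
        seen.add(task_id)
    return problems
```

Requiring every dependency to name an earlier task also rules out cycles, which keeps the plan dispatchable in file order.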

ACCEPTANCE:
- paper_prep/synthesis/ABSTRACT.md exists and matches the abstract embedded above character-for-character, ignoring whitespace differences only.
- paper_prep/synthesis/AUDIT_REPORT.md exists; covers ALL existing synthesis .md files and ALL figure directories; uses the structured tables described above.
- paper_prep/synthesis/REWRITE_PLAN.md exists; contains 20–60 tasks; each task has Inputs/Output/Acceptance/Depends-on filled in; explicitly addresses HPRC-v2-companion-framing and implicit-pangenome-graph methods.
- Commit with message 'docs: audit + rewrite plan anchored on canonical abstract'.
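
The first acceptance item can be checked with a whitespace-tolerant comparison. A minimal sketch, assuming "whitespace tolerance" means that runs of spaces, tabs, and newlines are interchangeable while every other character must match; that reading is an assumption. The helper names are hypothetical.

```python
from pathlib import Path

DELIVERABLES = ("ABSTRACT.md", "AUDIT_REPORT.md", "REWRITE_PLAN.md")


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends."""
    return " ".join(text.split())


def abstract_matches(saved: str, canonical: str) -> bool:
    """True when the two texts differ only in whitespace."""
    return normalize_whitespace(saved) == normalize_whitespace(canonical)


def missing_deliverables(synthesis_dir: Path) -> list[str]:
    """Names of the three deliverables not yet present in the directory."""
    return [name for name in DELIVERABLES if not (synthesis_dir / name).is_file()]
```

Punctuation is compared exactly, so a dash or quote character silently swapped for a look-alike would still fail the check.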

DO NOT in this task:
- Rewrite any section of the manuscript.
- Redo any figure.
- Delete or move any existing files.
- Run any data analysis.
This task is purely classification and planning. Subsequent tasks will be dispatched from REWRITE_PLAN.md.

Depends on

(none)

Required by

(none)

Log

2026-05-05T22:54:16.670820255+00:00 Task paused
2026-05-05T23:09:07.769212514+00:00 Task abandoned: rewriting with Nature companion + BoG-this-week constraints; will depend on park-off-target-and-anchor