## Description
Produce Extended Data ED1 (4 panels) and ED2 (4 panels). Implement directly — do not decompose further.
## File scope
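- `paper_prep/figures/ed1/`: `figure_ed1.pdf`, `figure_ed1.png`, generating script (`figure_ed1.R` or `figure_ed1.py`), `caption.md`, `sources.tsv`
- `paper_prep/figures/ed2/`: `figure_ed2.pdf`, `figure_ed2.png`, generating script (`figure_ed2.R` or `figure_ed2.py`), `caption.md`, `sources.tsv`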
## Figure spec (excerpt from MANUSCRIPT_SKELETON.md ED1, ED2)
### ED1 — Pipeline and per-arm flank inventory
| Panel | Content | Status | Source |
|---|---|---|---|
| ED1a | Pipeline schematic: 465 assemblies → 18,827 flanks → 15,668 PHRs → 15/50 communities | GENERATE | new schematic |
| ED1b | Per-arm flank counts (48 arms) with assembly QC overlay | GENERATE | `contig_classifications.tsv` — `SURVEY_01 §3` |
| ED1c | PHR length distribution (median 105 kb, mean 144 kb) | GENERATE | `all-vs-all.1Mb.p95.id95.len.tsv` — `SURVEY_01 §3` |
| ED1d | Chr18_q (NA18982#1) chimera evidence — wfmash + minimap2 dotplot + NNN gap + Flagger | GENERATE | `SURVEY_01 §1.5, §5 item 6` |
### ED2 — Sequence-level (50-community) detail
| Panel | Content | Status | Source |
|---|---|---|---|
| ED2a | UMAP / force-directed layout coloured by 50-community partition | READY/composite | `plot-seq-community-structure.R` outputs (`/moosefs/.../similarity/`) |
| ED2b | Within-community Jaccard distance bimodality (C1, C2, C3, C5, C6, C7, C11, C12) | GENERATE | `similarity.tsv.gz` per community subsets — `SURVEY_04 §1.10, §6 F10` |
| ED2c | Cross-arm affinity circular plot — 41 arms with edges weighted by absorbed sequences | GENERATE | `cross_arm_affinity_sequences.tsv` — `SURVEY_01 §6 F5` |
| ED2d | Confusion matrix Arm-Leiden vs Sequence-Leiden (15 × 50; ARI 0.35, NMI 0.76) | GENERATE | `arm-leiden` vs `seq-leiden` assignment TSVs |
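For ED2d, the confusion matrix and the ARI quoted in the panel title can be recomputed from two aligned label vectors. The following is a minimal standard-library sketch, not the pipeline's own implementation: the function names are hypothetical, and it assumes the arm-level and sequence-level assignments have already been loaded into two lists in the same sequence order (the column layout of the assignment TSVs is not specified here).

```python
from collections import Counter
from math import comb


def confusion_matrix(labels_a, labels_b):
    """Return (row_labels, col_labels, matrix) of co-assignment counts."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have equal length")
    counts = Counter(zip(labels_a, labels_b))
    rows = sorted(set(labels_a))
    cols = sorted(set(labels_b))
    matrix = [[counts.get((r, c), 0) for c in cols] for r in rows]
    return rows, cols, matrix


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie form) from pair counts."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have equal length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions agree trivially

    # Pairs placed together in both partitions, and in each one separately.
    together_both = sum(comb(n, 2) for n in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(n, 2) for n in Counter(labels_a).values())
    together_b = sum(comb(n, 2) for n in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are all singletons
    return (together_both - expected) / (maximum - expected)
```

With the real assignments, `confusion_matrix` should yield a 15 × 50 table of counts suitable for a heatmap, and `adjusted_rand_index` gives a value to compare against the ARI 0.35 stated in the spec. NMI is not covered by this sketch.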
## Validation
- [ ] All 8 panels (4 in ED1 + 4 in ED2); sources.tsv per ED
- [ ] Each caption ≤ 200 words and cites ≥ 2 metrics with their source TSV paths
- [ ] PDF + PNG per ED
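The checklist above can be partly automated. The sketch below is one possible check, with hypothetical function names. It assumes `caption.md` and `sources.tsv` sit in the same directory as the figure files, and it uses the number of distinct `.tsv` / `.tsv.gz` paths mentioned in a caption as a rough proxy for "≥ 2 metrics with TSV paths". It does not verify panel count or that the metrics themselves are present.

```python
import re
from pathlib import Path

MAX_CAPTION_WORDS = 200
MIN_TSV_PATHS = 2  # proxy for ">= 2 metrics with TSV paths" (assumption)

# Matches tokens such as dir/name.tsv or name.tsv.gz
TSV_PATH_RE = re.compile(r"[\w./\-]+\.tsv(?:\.gz)?\b")


def count_words(text):
    """Whitespace-delimited word count."""
    return len(text.split())


def tsv_paths(text):
    """Distinct TSV paths mentioned in the text."""
    return set(TSV_PATH_RE.findall(text))


def check_figure_dir(fig_dir, stem):
    """Return a list of human-readable problems; empty means all checks pass."""
    fig_dir = Path(fig_dir)
    problems = []

    # Required deliverables: PDF + PNG, caption, and sources table.
    for name in (f"{stem}.pdf", f"{stem}.png", "caption.md", "sources.tsv"):
        if not (fig_dir / name).is_file():
            problems.append(f"missing {name}")

    caption = fig_dir / "caption.md"
    if caption.is_file():
        text = caption.read_text(encoding="utf-8")
        words = count_words(text)
        if words > MAX_CAPTION_WORDS:
            problems.append(f"caption has {words} words (limit {MAX_CAPTION_WORDS})")
        found = tsv_paths(text)
        if len(found) < MIN_TSV_PATHS:
            problems.append(f"caption cites {len(found)} TSV path(s), need {MIN_TSV_PATHS}")
    return problems
```

Calling `check_figure_dir("paper_prep/figures/ed1", "figure_ed1")` and the equivalent for `ed2` returns an empty list when the file and caption checks pass.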
## Inputs
- `paper_prep/synthesis/MANUSCRIPT_SKELETON.md`
- `paper_prep/surveys/SURVEY_01_pipeline.md`, `SURVEY_04_heterogeneity.md`