## Description
Produce Extended Data figures ED3 (annotation, 4 panels) and ED4 (gene enrichment, 4 panels). Implement directly; do not decompose further.
## File scope
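- `paper_prep/figures/ed3/`: `figure_ed3.pdf`, `figure_ed3.png`, plotting script (`figure_ed3.R` or `figure_ed3.py`), `caption.md`, `sources.tsv`
- `paper_prep/figures/ed4/`: `figure_ed4.pdf`, `figure_ed4.png`, plotting script (`figure_ed4.R` or `figure_ed4.py`), `caption.md`, `sources.tsv`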
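The spec names `sources.tsv` as a required output for each ED but does not fix its columns. A minimal sketch, assuming a hypothetical three-column layout (`panel`, `source_file`, `survey_ref`) with the ED3 rows copied from the ED3 spec table; `render_sources_tsv` and `write_sources_tsv` are illustrative names, not existing project code.

```python
import csv
import io
from pathlib import Path

# Hypothetical column layout; the task does not define a sources.tsv schema.
COLUMNS = ("panel", "source_file", "survey_ref")

# ED3 rows taken from the figure spec; file names are as written there, not resolved paths.
ED3_SOURCES = [
    ("ED3a", "community_tar1_by_arm.tsv", "SURVEY_02 §6 Fig M1a"),
    ("ED3b", "length_distribution.tsv", "SURVEY_02 §6 Fig M2"),
    ("ED3b", "motif_composition.tsv", "SURVEY_02 §6 Fig M2"),
    ("ED3c", ".telo.tsv + community assignments", "SURVEY_02 §6 ED3"),
    ("ED3d", "tar1_positional_per_arm.tsv", "SURVEY_02 §6 Fig M1b"),
]


def render_sources_tsv(rows):
    """Return tab-separated text: one header row, then one line per (panel, source) pair."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


def write_sources_tsv(path, rows):
    """Write sources.tsv as UTF-8, creating the ED directory if it does not exist."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_sources_tsv(rows), encoding="utf-8")
```

Calling `write_sources_tsv("paper_prep/figures/ed3/sources.tsv", ED3_SOURCES)` would emit the ED3 file; ED3b gets two rows so each input table stays on its own line.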
## Figure spec (excerpt from MANUSCRIPT_SKELETON.md ED3, ED4)
### ED3 — Annotation: TAR1 + internal (TTAGGG)n + telomere length
| Panel | Content | Status | Source |
|---|---|---|---|
| ED3a | TAR1 prevalence per arm (PAR1 absence; acrocentric intermediate; autosomal saturation) | GENERATE | `community_tar1_by_arm.tsv` — `SURVEY_02 §6 Fig M1a` |
| ED3b | Internal (TTAGGG)n island length distribution + canonical-fraction histogram | GENERATE | `length_distribution.tsv` + `motif_composition.tsv` — `SURVEY_02 §6 Fig M2` |
| ED3c | Terminal telomere length by community (Kruskal-Wallis H = 100.89, p = 3.2e-15) | GENERATE | `.telo.tsv` joined to community assignments — `SURVEY_02 §6 ED3` |
| ED3d | Per-arm TAR1 positional distance-from-telomere | GENERATE | `tar1_positional_per_arm.tsv` — `SURVEY_02 §6 Fig M1b` |
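ED3c reports a Kruskal-Wallis H across communities after joining `.telo.tsv` to community assignments. A minimal pure-Python sketch of that join and the tie-corrected statistic, assuming `.telo.tsv` reduces to `(arm_id, length_bp)` pairs and the assignments to an `arm_id -> community` dict (both layouts are hypothetical). The H computed here is the same tie-corrected statistic that `scipy.stats.kruskal` returns; the p-value is the chi-square survival function with k - 1 degrees of freedom, e.g. `scipy.stats.chi2.sf(h, k - 1)`.

```python
from collections import defaultdict


def join_to_communities(telo_lengths, community_of):
    """Group telomere lengths by community label.

    telo_lengths: iterable of (arm_id, length_bp) pairs (hypothetical .telo.tsv layout).
    community_of: dict arm_id -> community label (hypothetical layout).
    Arms with no community assignment are dropped.
    """
    groups = defaultdict(list)
    for arm_id, length_bp in telo_lengths:
        if arm_id in community_of:
            groups[community_of[arm_id]].append(float(length_bp))
    return dict(groups)


def average_ranks(values):
    """Return 1-based ranks (ties share their mean rank) and the tie term sum(t**3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a dict of label -> non-empty list of values."""
    labels = list(groups)
    pooled = [x for label in labels for x in groups[label]]
    n = len(pooled)
    ranks, tie_term = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for label in labels:
        size = len(groups[label])
        rank_sum = sum(ranks[start:start + size])  # pooled keeps label order, so slices align
        weighted += rank_sum ** 2 / size
        start += size
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction; undefined (division by zero) if every value is identical.
    return h / (1 - tie_term / (n ** 3 - n))
```

Recomputing H from the joined table this way gives a quick cross-check against the reported H = 100.89 before the panel is drawn.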
### ED4 — Gene enrichment, pseudogene gradient, copy-weighted GO
| Panel | Content | Status | Source |
|---|---|---|---|
| ED4a | GSEA / GO:BP top terms (snRNP, olfactory, sensory) — vertical bar | READY | `Figure1_GSEA_BP_vertical.pdf` (1 Mb caveat — flag PHR-only re-run) — `SURVEY_FIG_inv §3` |
| ED4b | Copy-weighted vs deduplicated comparison (olfactory fold = 598) | GENERATE | `improved_copy_weighted_vs_deduplicated_comparison.csv` — `SURVEY_DATA §4` |
| ED4c | High-copy gene families (DUX4 ×18, BAGE2, MTCO, RPL23A, SEPTIN14P22, OR4F) | GENERATE | `gene_copy_summary.csv` — `SURVEY_DATA §2` |
| ED4d | OR4F pseudogenisation gradient (62.1 % pseudogene; 11.1 % chr7_p → 99.8 % chr15_q) | GENERATE | per-arm pseudogene fraction — `SURVEY_10/11/12 C12` |
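ED4d needs a per-arm pseudogene fraction, which the spec names but does not tabulate. A minimal sketch, assuming one hypothetical `(arm, is_pseudogene)` record per annotated OR4F copy. It returns arms sorted by ascending fraction (the gradient order) plus a copy-pooled fraction; whether the 62.1 % headline is copy-pooled or arm-averaged is not stated, and the two generally differ because arms carry different numbers of copies.

```python
from collections import Counter


def pseudogene_fractions(copies):
    """Per-arm and copy-pooled pseudogene fraction for one gene family (e.g. OR4F).

    copies: iterable of (arm, is_pseudogene) pairs, one per annotated copy
            (hypothetical record layout).
    Returns (per_arm, pooled): per_arm is a list of (arm, fraction, n_copies)
    sorted by ascending fraction, then arm name; pooled weights every copy equally.
    """
    total = Counter()
    pseudo = Counter()
    for arm, is_pseudogene in copies:
        total[arm] += 1
        if is_pseudogene:
            pseudo[arm] += 1
    per_arm = sorted(
        ((arm, pseudo[arm] / n, n) for arm, n in total.items()),
        key=lambda row: (row[1], row[0]),
    )
    n_all = sum(total.values())
    pooled = sum(pseudo.values()) / n_all if n_all else float("nan")
    return per_arm, pooled
```

Keeping `n_copies` in each row lets the panel show how many copies support each arm's fraction, which matters when an extreme value rests on only a few copies.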
## Validation
- [ ] All 8 panels present; one `sources.tsv` per ED
- [ ] Each caption ≤ 200 words and cites ≥ 2 metrics with their source TSV paths
- [ ] PDF and PNG exported for each ED
- [ ] ED4a caption notes the 1 Mb window caveat (PHR-only re-run flagged under `## Gaps` in `WORK_DECOMPOSITION.md`)
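The checklist items can be automated. A minimal sketch, assuming whitespace-separated word counts, a regex heuristic for table paths that also accepts `.csv` (ED4b and ED4c sources are CSV files), and a literal `"1 Mb"` match for the ED4a caveat; `caption_problems` and `output_problems` are hypothetical helper names. The "8 panels" item is not checked here because it needs the figure itself.

```python
import re
from pathlib import Path

# Heuristic: any token ending in .tsv or .csv counts as a cited table path.
TABLE_PATH = re.compile(r"[\w./-]+\.(?:tsv|csv)\b")


def caption_problems(text, ed, max_words=200, min_paths=2):
    """Return validation failures for one caption.md text (empty list means pass)."""
    problems = []
    n_words = len(text.split())
    if n_words > max_words:
        problems.append(f"{ed}: caption has {n_words} words (> {max_words})")
    paths = set(TABLE_PATH.findall(text))
    if len(paths) < min_paths:
        problems.append(f"{ed}: caption cites {len(paths)} table path(s) (< {min_paths})")
    if ed == "ed4" and "1 Mb" not in text:
        problems.append("ed4: caption lacks the 1 Mb window caveat for ED4a")
    return problems


def output_problems(ed_dir, ed):
    """Report which of the PDF, PNG, caption.md and sources.tsv are missing for one ED."""
    ed_dir = Path(ed_dir)
    expected = [f"figure_{ed}.pdf", f"figure_{ed}.png", "caption.md", "sources.tsv"]
    return [f"{ed}: missing {name}" for name in expected if not (ed_dir / name).is_file()]
```

Both helpers return lists instead of raising, so one run can report every failing item across ED3 and ED4 at once.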
## Inputs
- `paper_prep/synthesis/MANUSCRIPT_SKELETON.md`
- `paper_prep/surveys/SURVEY_02_annotation.md`, `SURVEY_DATA_inventory.md`, `SURVEY_10_11_12_limits_summary_lit.md`