fig5-synteny-event-manifest

Fig5 synteny event manifest and coordinate audit

Metadata

Status: done
Assigned: agent-2579
Agent identity: 3577bc75d6ed4f1947509aa5c086c91ce7c997c7806dab6bf6affac647452647
Created: 2026-06-20T08:24:42.080355885+00:00
Started: 2026-06-20T08:25:55.114688059+00:00
Completed: 2026-06-20T08:33:17.762098875+00:00
Tags: pedigree, figure, synteny, coordinates, eval-scheduled
Eval score: 0.96
└ blocking impact: 0.98
└ completeness: 0.94
└ constraint fidelity: 0.25
└ coordination overhead: 0.97
└ correctness: 0.96
└ downstream usability: 0.95
└ efficiency: 0.98
└ intent fidelity: 0.92
└ style adherence: 0.97

Description

Create the data manifest for a new Fig5 recombination/synteny schematic. This is a data-audit task only: do not make the final figure and do not edit the submitted manuscript.

Goal: Replace hard-to-read colored interval tracks with a clearer SVbyEye/SyRI-like schematic: per event, show the child/recombinant haplotype and the parental/source haplotype segments that align into it, with chromosome context. The final downstream visual should emphasize flows/splines/blocks rather than many ambiguous colors.

Required audit and event selection:

  1. Use the strict primary path only: paper_prep/_brainstorming/fig5_sweepga_1to1_redraw/conservative_segments.tsv, which is nb=1 plus sweepGA 1:1 no-scaffold. Do not select events from permissive multimap/nth-best rows. Use patches.tsv only for annotation/community/status, not for drawing geometry.
  2. Confirm the coordinate system. Determine from the sequence names and any available metadata whether the plotted windows are native sample assembly coordinates, CHM13-projected coordinates, or something else. The current expectation is native assembly windows parsed from names like PAN027#2#chr9.paternal:135704825-136204824_chr9_qarm, not CHM13. Document this clearly.
  3. Double-check the T2T/reference question: are the involved assemblies complete chromosome-scale/T2T for these arms, or are we only looking at 500 kb extracted subtelomeric windows anchored/labeled against chromosome arms? Record what can and cannot be inferred from available files. Do not invent a CHM13 projection if none exists.
  4. Select exactly three review-facing events from the strict primary path:
    • PAR1 positive control: PAN027_vs_PAN011, child/query PAN027#2#chrX.paternal:12265-512264_chrX_parm, donor arm chrYp intervals totaling ~150 kb. This is the known male X/Y PAR1 recombination sanity check.
    • PHR candidate 1: PAN027_vs_PAN011, child/query PAN027#2#chr9.paternal:135704825-136204824_chr9_qarm, donor chr3q terminal intervals totaling ~45 kb, with chr15q as a smaller side fragment and a tiny chr20q low-confidence tail. Label this as a candidate autosomal PHR exchange, not a clean full crossover.
    • PHR candidate 2: prefer the strict-path PAN028 chr9q event if supported: PAN028_vs_PAN027, child/query PAN028#1#chr9.haplotype1:134380985-134880984_chr9_qarm, donor chr3q intervals totaling ~34 kb, with chr15q side fragment if present. This replaces the earlier misleading PAN028 chr3q panel, whose strict path did not actually draw chr9q donor segments.
  5. For each event, define the chromosomes/haplotypes to be drawn in the downstream schematic. The intended schematic is roughly three tracks per event: child/recombinant haplotype; same-chromosome parental/source context when informative; non-homologous donor haplotype(s). If a side fragment makes a fourth source chromosome necessary, record that explicitly rather than hiding it.
  6. Produce machine-readable tables with both local and native genomic coordinates. Where target-side exact segment coordinates are available in the strict PAF, recover them; otherwise record target source window and mark exact target segment coordinates as unavailable.
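The name-parsing and local-to-native conversion in items 2 and 6 can be sketched as follows. This is a minimal illustration, not the project's parser: the regular expression, the `Window` type, and both function names are hypothetical, and the coordinate convention is an assumption the audit still has to confirm. One observation supports a 1-based inclusive reading of the name: under it, all three selected windows span exactly 500,000 bp (for example 136204824 - 135704825 + 1).

```python
import re
from typing import NamedTuple, Tuple

# Hypothetical pattern for names like
# PAN027#2#chr9.paternal:135704825-136204824_chr9_qarm
WINDOW_RE = re.compile(
    r"^(?P<sample>[^#]+)#(?P<hap_index>\d+)#(?P<contig>[^:]+):"
    r"(?P<start>\d+)-(?P<end>\d+)_(?P<chrom>chr[0-9XYM]+)_(?P<arm>[pq])arm$"
)


class Window(NamedTuple):
    sample: str
    hap_index: int
    contig: str   # native assembly contig, e.g. chr9.paternal
    start: int    # window start as written in the name
    end: int      # window end as written in the name
    chrom: str    # arm label chromosome, e.g. chr9
    arm: str      # "p" or "q"


def parse_window(name: str) -> Window:
    """Split a window name into sample, haplotype, contig, span, and arm label."""
    m = WINDOW_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized window name: {name!r}")
    return Window(
        m["sample"], int(m["hap_index"]), m["contig"],
        int(m["start"]), int(m["end"]), m["chrom"], m["arm"],
    )


def local_to_native(window: Window, local_start: int, local_end: int) -> Tuple[int, int]:
    """Shift a window-local interval by the window start.

    ASSUMPTION: local offsets count from the first base of the window, and the
    start in the name is the native position of that base. The open/closed
    convention of the local interval is carried over unchanged.
    """
    return window.start + local_start, window.start + local_end
```

Keeping the parsed contig (`chr9.paternal`) separate from the arm label (`chr9`, `q`) makes it explicit that the coordinates belong to the sample assembly, while the arm label is only an annotation.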

Required outputs under paper_prep/_brainstorming/fig5_synteny_recombination_schematic/:

  • event_manifest.tsv: one row per selected event with event id, event class, transmission, child/query source, source windows, involved chromosomes/haplotypes, primary donor arms, side fragments, and recommended schematic tracks.
  • selected_segments.tsv: one row per strict primary-path segment used by the selected events, with local query interval, native query interval, target/donor source window, target interval if recoverable, arm/haplotype labels, identity/jaccard, community annotation if joined, and event role (same-chromosome context, primary donor, side fragment, low-confidence tail, PAR positive control).
  • coordinate_provenance.md: short audit of native-vs-CHM13 coordinate status and T2T/window limitations.
  • README.md: concise guide for the downstream schematic task.
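As an illustration of the event-role column in selected_segments.tsv, the sketch below assigns a role from the donor arm of each segment. The event ids and the "unassigned" fallback are hypothetical; the arm-to-role mapping follows the event descriptions in item 4, and the chr15q entry for the PAN028 event applies only if that side fragment is actually present in the strict path.

```python
# Hypothetical event ids; the real ids are whatever event_manifest.tsv defines.
CHILD_ARM = {
    "PAR1_PAN027_chrXp": "chrXp",
    "PHR1_PAN027_chr9q": "chr9q",
    "PHR2_PAN028_chr9q": "chr9q",
}

# Donor arm -> event role, taken from the task description.
ROLE_BY_EVENT = {
    "PAR1_PAN027_chrXp": {"chrYp": "PAR positive control"},
    "PHR1_PAN027_chr9q": {
        "chr3q": "primary donor",
        "chr15q": "side fragment",
        "chr20q": "low-confidence tail",
    },
    "PHR2_PAN028_chr9q": {
        "chr3q": "primary donor",
        "chr15q": "side fragment",  # only if present in the strict path
    },
}


def event_role(event_id: str, target_arm: str) -> str:
    """Return the event role for one segment, given its target/donor arm."""
    if target_arm == CHILD_ARM[event_id]:
        return "same-chromosome context"
    # Unexpected arms are flagged rather than silently dropped.
    return ROLE_BY_EVENT[event_id].get(target_arm, "unassigned")
```

Returning an explicit "unassigned" value keeps an unexpected fourth source chromosome visible in the table instead of hiding it.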

Constraints:

  • No heavy alignments, no new odgi/sweepGA runs on the head node. If truly needed, write an sbatch plan and stop.
  • No edits to submission/.
  • Do not overclaim event-level validation. Keep candidate language for PHR events and positive-control language for PAR1.
  • Commit with project convention: feat: fig5-synteny-event-manifest (agent-NNN).

Validation:

  • The three selected events are all supported by strict conservative_segments.tsv rows.
  • The second PHR event is not the misleading strict PAN028 chr3q panel unless the audit proves that is actually the better strict-path event.
  • Coordinate provenance explicitly states native assembly vs CHM13 and what is known about T2T/window status.
  • Tables include both local offsets and displayed/native genomic coordinates.
  • Manuscript files are untouched.
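The first validation point can be checked with a light script that needs no alignment work. The sketch below is one possible form: the column names `query` and `target_arm` are assumptions, since the actual header of conservative_segments.tsv is not given here, and only the primary donor arm of each event is treated as required.

```python
import csv
from collections import defaultdict
from typing import Dict, List, TextIO

# Child/query window -> donor arms that must appear in strict-path rows.
EXPECTED = {
    "PAN027#2#chrX.paternal:12265-512264_chrX_parm": {"chrYp"},
    "PAN027#2#chr9.paternal:135704825-136204824_chr9_qarm": {"chr3q"},
    "PAN028#1#chr9.haplotype1:134380985-134880984_chr9_qarm": {"chr3q"},
}


def missing_support(
    tsv_handle: TextIO,
    query_col: str = "query",        # ASSUMED column name
    arm_col: str = "target_arm",     # ASSUMED column name
) -> Dict[str, List[str]]:
    """Return, per selected event, the required donor arms with no strict row."""
    arms_seen = defaultdict(set)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        arms_seen[row[query_col]].add(row[arm_col])
    missing = {}
    for query, required in EXPECTED.items():
        absent = required - arms_seen[query]
        if absent:
            missing[query] = sorted(absent)
    return missing
```

An empty result means all three events have at least one strict-path row for their primary donor arm. It says nothing about interval totals or event-level validation, which still need the candidate and positive-control wording required above.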

Depends on

Required by

Log