fig5-whole-genome-alignment-overview

Whole-genome Fig5 alignment overview panels

Metadata

Status: done
Assigned: agent-2733
Agent identity: 46f6237a65ec4f1002c4d3fb201dc8633638d0947c276be7008c227e1051ba5e
Created: 2026-06-24T13:37:05.921743499+00:00
Started: 2026-06-25T10:12:26.149269094+00:00
Completed: 2026-06-25T10:43:00.611348607+00:00
Tags: fig5, whole-genome, sweepga, wfmash, untangle, query-grid, eval-scheduled

Description

Generate clean whole-genome Fig5 overview plots for untangle, SweepGA/FastGA, and wfmash.

Goal: show the whole-genome context explicitly, not only PAR/PHR candidate zooms. The plot must make it possible to see genome-wide alignment/support behavior across all query chromosomes while still exposing the chrX/chrY PAR1 and chr9q/chr3q recombination-candidate intervals as callouts.

Required inputs/methods:

  • Untangle-style source: use the strict/corrected untangle whole-genome overview artifacts from fig5-untangle-whole-genome-overview.
  • SweepGA/FastGA f16: use the query-grid chopped + SweepGA 1:1 ANI-filtered outputs already produced for chunk lengths 10kb, 5kb, and 2kb.
  • SweepGA/FastGA f32: wait for fig5-sweepga-fastga-frequency32-raw, fig5-f32-query-grid-chop-filter-rerun, and fig5-f32-query-grid-overlap-audit; use the same query-grid/1:1 ANI-filtered outputs.
  • wfmash -p95 (updated binary): use raw whole-genome wfmash PAFs after the same query-grid chopping and SweepGA 1:1 ANI filtering from fig5-wfmash-query-grid-chop-filter.
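The task does not spell out how query-grid chopping is implemented. The sketch below assumes it means splitting each alignment's query interval at every query coordinate that is a multiple of the chunk length (10kb, 5kb, or 2kb). It tracks only the query side; projecting target coordinates would need the CIGAR, and the SweepGA 1:1 ANI filter is not shown. `Interval` and `chop_to_query_grid` are hypothetical names, not SweepGA or wfmash code.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Interval:
    """Query-side interval of one alignment record (0-based, half-open)."""
    query: str
    start: int
    end: int
    target: str


def chop_to_query_grid(rec: Interval, chunk_len: int) -> Iterator[Interval]:
    """Split rec at every query coordinate that is a multiple of chunk_len."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    pos = rec.start
    while pos < rec.end:
        # Next grid boundary strictly after pos, clipped to the record end.
        boundary = (pos // chunk_len + 1) * chunk_len
        stop = min(boundary, rec.end)
        yield Interval(rec.query, pos, stop, rec.target)
        pos = stop
```

For example, a chrX record spanning 8,000 to 23,000 chopped at 10kb yields pieces [8000, 10000), [10000, 20000), and [20000, 23000). Because every method is cut on the same fixed grid, pieces from f16, f32, and wfmash line up exactly before filtering.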

Visualization requirements:

  • Primary panel is whole genome: one horizontal genome track per method/comparison/chop length or a compact faceted equivalent, with every query chromosome/arm shown in query-coordinate order. Do not emit a candidate-only plot as the main result.
  • Bin query coordinates consistently, preferably 500kb or 1Mb for whole-genome readability, and color each bin by retained target chromosome/arm or dominant target family after filtering. Include missing/no-support as an explicit neutral state.
  • Include a query-arm x target-arm support matrix for each method/chop setting, or a compact summary table/heatmap if the full matrix is too large.
  • Include PAR1 and chr9q/chr3q candidate callouts below the whole-genome tracks, using the same coordinates as the current query-grid panels, but make them subordinate to the whole-genome context.
  • Avoid full raw alignment ribbon spaghetti; aggregate rows into binned support/winner tracks and matrices so the result is legible.
  • Fix the prior confusing legend: no overlapping bottom legends, stable colors for chromosomes/arms, and clear labels for method, comparison, chop length, and filter mode.
  • Use chromosome coordinates, not just arbitrary 500kb window indices. If windows are binned, axes must still report chromosome coordinates.
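The binning, winner coloring, support matrix, and stable-color requirements above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's code. It assumes records have already been chopped and 1:1 ANI-filtered, that each carries a target chromosome or arm label, and that "dominant target" means the label covering the most base pairs in a bin. `bin_support`, `support_matrix`, `stable_colors`, and `NO_SUPPORT` are hypothetical names. Each output row keeps `bin_start`/`bin_end` in bp, so axes can report chromosome coordinates instead of window indices.

```python
from collections import defaultdict

NO_SUPPORT = "no_support"  # explicit neutral state for bins without retained alignments


def bin_support(records, chrom_lengths, bin_size=1_000_000):
    """Return (query, bin_start, bin_end, winner, winner_bp) rows for every query bin.

    records: iterable of (query_chrom, start, end, target_label) tuples that already
    passed filtering, in 0-based half-open query coordinates.
    chrom_lengths: {query_chrom: length}; its insertion order sets the track order.
    """
    cover = defaultdict(lambda: defaultdict(int))  # (query, bin_idx) -> target -> bp
    for query, start, end, target in records:
        if end <= start:
            continue  # skip empty intervals
        for idx in range(start // bin_size, (end - 1) // bin_size + 1):
            lo = max(start, idx * bin_size)
            hi = min(end, (idx + 1) * bin_size)
            cover[(query, idx)][target] += hi - lo
    rows = []
    for query, length in chrom_lengths.items():
        for idx in range((length + bin_size - 1) // bin_size):
            bin_start = idx * bin_size
            bin_end = min(length, bin_start + bin_size)
            per_target = cover.get((query, idx), {})
            if per_target:
                # Most-covered target wins; ties go to the alphabetically first label.
                winner = max(sorted(per_target), key=per_target.__getitem__)
                rows.append((query, bin_start, bin_end, winner, per_target[winner]))
            else:
                rows.append((query, bin_start, bin_end, NO_SUPPORT, 0))
    return rows


def support_matrix(records):
    """Sum retained bp per (query_label, target_label) pair, e.g. arm x arm."""
    matrix = defaultdict(int)
    for query, start, end, target in records:
        matrix[(query, target)] += max(0, end - start)
    return dict(matrix)


def stable_colors(all_labels, palette):
    """Map labels to colors by sorted order; pass the union of labels from every panel."""
    colors = {label: palette[i % len(palette)] for i, label in enumerate(sorted(all_labels))}
    colors[NO_SUPPORT] = "#d9d9d9"  # neutral grey, identical in every panel
    return colors
```

Keeping `winner_bp` next to the winning label lets a track shade bins by support depth as well as color them by target. Deriving colors from the sorted union of labels keeps chromosome/arm colors identical across methods and chop lengths.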

Deliverables:

  • paper_prep/_brainstorming/fig5_whole_genome_alignment_overview/fig5_whole_genome_alignment_overview.{pdf,png,svg}
  • paper_prep/_brainstorming/fig5_whole_genome_alignment_overview/whole_genome_binned_support.tsv
  • paper_prep/_brainstorming/fig5_whole_genome_alignment_overview/whole_genome_support_matrix.tsv
  • paper_prep/_brainstorming/fig5_whole_genome_alignment_overview/whole_genome_method_manifest.tsv
  • README.md describing exact inputs, binning, scoring/filtering, and how to regenerate.
  • validate_outputs.sh that checks that all expected outputs exist and that each TSV has a non-empty row count.
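The deliverable is a shell script, but the checks it needs can be sketched in Python for clarity. The sketch assumes README.md lives in the same output directory and that each TSV has exactly one header line. `validate` and `data_rows` are hypothetical names; the file names come from the deliverables list.

```python
from pathlib import Path

OUT_DIR = Path("paper_prep/_brainstorming/fig5_whole_genome_alignment_overview")
FIGURES = [f"fig5_whole_genome_alignment_overview.{ext}" for ext in ("pdf", "png", "svg")]
TABLES = [
    "whole_genome_binned_support.tsv",
    "whole_genome_support_matrix.tsv",
    "whole_genome_method_manifest.tsv",
]


def data_rows(path: Path) -> int:
    """Count non-blank lines after the (assumed single) header line of a TSV."""
    with path.open() as handle:
        lines = [line for line in handle if line.strip()]
    return max(0, len(lines) - 1)


def validate(out_dir: Path = OUT_DIR) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name in FIGURES + ["README.md"]:
        path = out_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(f"missing or empty: {name}")
    for name in TABLES:
        path = out_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif data_rows(path) == 0:
            problems.append(f"no data rows: {name}")
    return problems
```

A wrapper can print each returned problem and exit non-zero when the list is non-empty, which is the behavior a CI step or a Slurm job epilogue would expect from validate_outputs.sh.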

Acceptance criteria:

  • Whole-genome panels exist for untangle, SweepGA f16, SweepGA f32, and wfmash where input data are complete.
  • The main visual makes all query chromosomes visible; candidate zooms are secondary callouts.
  • The figure documents query-grid chunk length, f16/f32 or wfmash method, and 1:1 ANI filtering.
  • No heavy PAF aggregation is run on the head node; use Slurm for large scans and commit only lightweight outputs/manifests.
  • Commit with message: feat: fig5-whole-genome-alignment-overview (agent-NNN)

Depends on

Required by

Log