fig5-minimap2-longrun-chop-sweep

Long-run minimap2 Fig5 PAF chop and sweep filter

Metadata

Status: done
Assigned: agent-2652
Agent identity: 46f6237a65ec4f1002c4d3fb201dc8633638d0947c276be7008c227e1051ba5e
Created: 2026-06-21T19:09:27.164713108+00:00
Started: 2026-06-21T19:11:20.343278597+00:00
Completed: 2026-06-22T05:46:25.378352745+00:00
Tags: pedigree, fig5, minimap2, pafchop, sweepga, whole-genome-alignment, chr3-homology, eval-scheduled
Eval score: 0.78
└ blocking impact: 0.80
└ completeness: 0.75
└ constraint fidelity: 0.55
└ coordination overhead: 0.84
└ correctness: 0.71
└ downstream usability: 0.83
└ efficiency: 0.84
└ intent fidelity: 0.86
└ style adherence: 0.87

Description

Motivation: The previous minimap2 asm5 all-chain attempt (fig5-whole-genome-minimap2-asm5-allchains) launched the correct full whole-genome commands with /home/erikg/bin/minimap2 v2.31-r1302, but jobs 1704346-1704348 were cancelled after ~2h35m with no complete PAF rows copied back. That result is not evaluable; it is not a chr3-negative result. Per the user correction during the frequency16 work, do not classify these long whole-genome local-homology jobs as pathological just because early output is header-only or unflushed. Allow roughly 8 h of runtime before cancelling, unless there is a hard failure, OOM, /dev/shm exhaustion, or a manual cancel.

Task: Rerun the same three full whole-genome minimap2 comparisons and then, if complete raw PAFs are produced, test whether the chr3 homology is preserved after chopping and sweepGA PAF filtering.

Primary minimap2 command shape:

/home/erikg/bin/minimap2 -x asm5 -c --eqx -P --q-occ-frac=0 -t <threads> TARGET.fa QUERY.fa | pigz -p <threads> > OUT.paf.gz

Comparisons:

  • PAN027pat_vs_PAN011_joint
  • PAN027mat_vs_PAN010_joint
  • PAN028mat_vs_PAN027_joint
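As an illustration only, the command shape above can be rendered once per comparison with a small Python helper. The input layout used here (`<input_dir>/<name>.target.fa` and `<input_dir>/<name>.query.fa`) and the function names are hypothetical; the task does not say how the target and query FASTAs are named, so adjust the paths to the real package layout.

```python
import shlex

MINIMAP2 = "/home/erikg/bin/minimap2"  # binary path given in the task

COMPARISONS = [
    "PAN027pat_vs_PAN011_joint",
    "PAN027mat_vs_PAN010_joint",
    "PAN028mat_vs_PAN027_joint",
]


def minimap2_command(target_fa, query_fa, out_paf_gz, threads):
    """Render the primary command shape as a single shell pipeline string."""
    align = [
        MINIMAP2, "-x", "asm5", "-c", "--eqx", "-P", "--q-occ-frac=0",
        "-t", str(threads), target_fa, query_fa,
    ]
    compress = ["pigz", "-p", str(threads)]
    return (
        f"{shlex.join(align)} | {shlex.join(compress)} "
        f"> {shlex.quote(out_paf_gz)}"
    )


def commands_for(input_dir, out_dir, threads):
    """Build one pipeline per comparison.

    Assumption (not from the task): inputs are named
    <input_dir>/<name>.target.fa and <input_dir>/<name>.query.fa.
    """
    return {
        name: minimap2_command(
            f"{input_dir}/{name}.target.fa",
            f"{input_dir}/{name}.query.fa",
            f"{out_dir}/{name}.paf.gz",
            threads,
        )
        for name in COMPARISONS
    }
```

Writing the rendered strings into the Slurm scripts and the command logs keeps the logged command identical to the executed one, which is what the acceptance check on exact flags relies on.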

Requirements:

  • Use full whole-genome query/target FASTAs from the updated sweepGA/minimap2 packages or regenerate identically. No chromosome-only/window-only FASTAs.
  • Use Slurm. Request enough wall time (an allocation of at least 24 h is fine) and explicitly do not cancel before ~8 h solely because the node-local PAF is header-sized or unflushed.
  • Keep scratch/output on appropriate node-local scratch (/dev/shm if used and sufficient, or a documented safe alternative if /dev/shm is too tight for minimap2 output). Record the scratch decision.
  • Capture exact binary provenance for /home/erikg/bin/minimap2: which, realpath, version, sha256, help text.
  • Raw PAF first: inspect whether chr3 target rows overlap the PAN027/PAN028 chr9 candidate windows before any chopping/filtering.
  • If raw PAFs complete, run pafchop-rs with configurable split length set to 10 kb (l10000_o0) on the minimap2 PAFs.
  • Then run sweepGA PAF filtering on the chopped minimap2 PAFs for at least many:many and 4:many with --scaffold-jump 0, using /home/erikg/.cargo/bin/sweepga and recording command logs. Do not realign with sweepGA here; use sweepGA as the PAF filtering/sweeping stage.
  • Summarize chr3 target support at three layers: raw minimap2 PAF, chopped minimap2 PAF, and sweepGA-filtered chopped minimap2 PAF.
  • Compare to updated wfmash p95 positive evidence and updated sweepGA/FastGA default negative evidence.
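The raw-PAF inspection required above could be sketched as follows. This is a minimal sketch, not the project's actual script. It assumes the chr9 candidate windows are given in query coordinates, that sequence names follow a PanSN-like pattern ending in `#chr3` (both assumptions; swap the sides or the pattern if the real data differ), and it uses the standard 12 mandatory PAF columns. The same function can be applied unchanged to the chopped and sweepGA-filtered PAFs to fill the three-layer summary.

```python
import gzip
import re


def open_text(path):
    """Open a plain or gzip-compressed PAF as text."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def chr3_window_support(lines, windows, target_pattern=r"(?:^|#)chr3$"):
    """Count PAF rows per window that hit a chr3 target.

    lines: iterable of PAF lines (for example an open file handle).
    windows: list of (label, query_name, start, end), 0-based half-open,
             i.e. the same coordinate convention as PAF.
    target_pattern: regex for chr3 target names (assumed naming scheme).
    """
    is_chr3 = re.compile(target_pattern).search
    counts = {label: 0 for label, _, _, _ in windows}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # skip blank or truncated (unflushed) lines
            continue
        qname, qstart, qend = fields[0], int(fields[2]), int(fields[3])
        if not is_chr3(fields[5]):
            continue
        for label, wname, wstart, wend in windows:
            if qname == wname and overlaps(qstart, qend, wstart, wend):
                counts[label] += 1
    return counts
```

Usage would look like `with open_text("OUT.paf.gz") as fh: chr3_window_support(fh, windows)`, with `windows` loaded from the candidate-window definitions used in the wfmash p95 analysis.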

Output package: Create paper_prep/_brainstorming/pedigree_whole_genome_minimap2_asm5_allchains_longrun_chop_sweep/ containing a README, config, scripts, logs, summaries, and the paths/checksums of the ignored raw/chopped/filtered PAFs.

Required summaries:

  • summaries/minimap2_binary.tsv
  • summaries/sweepga_binary.tsv
  • summaries/slurm_jobs.tsv
  • summaries/paf_file_summary.tsv
  • summaries/raw_candidate_window_support.tsv
  • summaries/chop_manifest.tsv if raw PAFs complete
  • summaries/filter_manifest.tsv if filtering is run
  • summaries/minimap2_chop_sweep_chr3_support_summary.tsv
  • if still no complete PAF after allowed runtime: summaries/longrun_runtime_diagnosis.tsv
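For the two binary summaries listed above (summaries/minimap2_binary.tsv and summaries/sweepga_binary.tsv), one possible way to capture the required provenance fields (which, realpath, version, sha256, help text) is sketched below. The column names and the `--version` / `--help` defaults are assumptions; pass whatever flags the tool actually accepts.

```python
import hashlib
import os
import shutil
import subprocess


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_text(argv):
    """Run a command; return stdout and stderr merged, since tools differ
    in which stream they use for version and help output."""
    done = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        text=True, check=False,
    )
    return done.stdout.strip()


def binary_provenance(binary, version_args=("--version",), help_args=("--help",)):
    """Collect which, realpath, version, sha256, and help text for a binary."""
    which = shutil.which(binary)
    if which is None:
        raise FileNotFoundError(binary)
    real = os.path.realpath(which)
    return {
        "which": which,
        "realpath": real,
        "version": run_text([which, *version_args]),
        "sha256": sha256_of(real),
        "help_text": run_text([which, *help_args]),
    }


def to_tsv(record):
    """Render one record as a header line plus one value line.
    Tabs and newlines inside values are escaped so the row stays on one line."""
    header = "\t".join(record)
    values = "\t".join(
        v.replace("\t", " ").replace("\n", "\\n") for v in record.values()
    )
    return header + "\n" + values + "\n"
```

For example, `to_tsv(binary_provenance("/home/erikg/bin/minimap2"))` would give the content for the minimap2 summary; hashing the resolved real path rather than the symlink records the file that is actually executed.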

Acceptance:

  • Direct answer: did minimap2 emit chr3 target rows overlapping the PAN027/PAN028 chr9 candidate windows in raw PAF?
  • Direct answer: if raw PAF exists, does 10 kb chopping plus sweepGA PAF filtering retain/catch that chr3 homology?
  • The report does not treat early header-only/unflushed PAF state as a negative result.
  • Exact logs prove minimap2 v2.31-r1302, -x asm5, -P, --q-occ-frac=0, full whole-genome inputs, and any sweepGA PAF filtering commands.
  • No submission/ files are modified and no Fig5 schematic is created.
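The cancellation and reporting rules in this task can be written down as two small predicates, which may help keep the report from conflating "no output yet" with "no chr3 homology". This is only a sketch of the policy as stated here; the function names and return labels are hypothetical, and the state string is assumed to be the Slurm job state as reported by `sacct` (for example COMPLETED or CANCELLED).

```python
MIN_HOURS_BEFORE_CANCEL = 8.0  # the "~8 h scale" from the task description


def may_cancel(elapsed_hours, hard_failure=False, scratch_exhausted=False):
    """A job may be cancelled only on hard failure (including OOM),
    scratch exhaustion, or after the minimum runtime has elapsed.
    A header-only or unflushed PAF is deliberately not an input here."""
    return (
        hard_failure
        or scratch_exhausted
        or elapsed_hours >= MIN_HOURS_BEFORE_CANCEL
    )


def classify_raw_result(slurm_state, paf_complete, chr3_rows):
    """Classify the raw-PAF layer for one comparison.

    Returns 'not_evaluable' unless the job completed and a complete PAF
    exists; only then can the result be chr3-positive or chr3-negative.
    """
    if slurm_state != "COMPLETED" or not paf_complete:
        return "not_evaluable"
    return "chr3_positive" if chr3_rows > 0 else "chr3_negative"
```

Under this sketch, the earlier run (cancelled after ~2h35m with no complete PAF) classifies as `not_evaluable`, matching the motivation above.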

Depends on

Required by

Log