fig5-raw-fasta-f16-scaffold-jump-filter-sensitivity

Fig5 raw FASTA f16 scaffold-jump filter sensitivity

Metadata

Status: done
Assigned: agent-2692
Agent identity: 46f6237a65ec4f1002c4d3fb201dc8633638d0947c276be7008c227e1051ba5e
Created: 2026-06-23T12:55:05.704345125+00:00
Started: 2026-06-23T12:59:19.827152959+00:00
Completed: 2026-06-23T14:11:23.853074616+00:00
Tags: fig5, sweepga, slurm, scaffold-chaining, filtering, whole-genome-alignment, eval-scheduled
Eval score: 0.93
└ blocking impact: 0.95
└ completeness: 0.95
└ coordination overhead: 0.89
└ correctness: 0.94
└ downstream usability: 0.93
└ efficiency: 0.90
└ intent fidelity: 0.85
└ style adherence: 0.90

Description

Run a final SweepGA filtering sensitivity sweep for the Fig5 raw-FASTA f16 evidence, focused on scaffold chaining/merge distance and minimum alignment length. Source alignments should be the current whole-genome raw f16 many:many PAFs:

/moosefs/erikg/phrs/.wg-worktrees/agent-2649/paper_prep/_brainstorming/pedigree_whole_genome_sweepga_fastga_frequency16/raw_paf/*.sweepga_frequency16_many_many_j0.paf.gz

Use the updated /home/erikg/.cargo/bin/sweepga. Run the final PAF filtering on Slurm, not on the head node, with /dev/shm scratch where useful. The core matrix must include:

  • --scaffold-jump: 0, 10k, 20k, 50k
  • --num-mappings: 1:1 and 4:many at minimum; include many:many as the unfiltered/multiway baseline where useful
  • --scoring: ani and log-length-ani
  • --min-aln-length: unset/default plus at least 1k, 5k, and 10k
  • keep --overlap default unless there is a reason to vary it; document if varied
  • record the --scaffold-mass default (10k), and optionally add a small scaffold-mass sensitivity check if chr3 behavior changes around the candidate windows
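A minimal Python sketch of how the core matrix above could be enumerated into Slurm array tasks. It assumes sweepga accepts the flag values exactly as spelled in this task (for example "10k"), treats an unset --min-aln-length as omitting the flag, and uses hypothetical cell labels. The sweepga input/output arguments are not specified in this task and are deliberately left out. The %N suffix on --array is standard Slurm syntax for capping how many array tasks run at once, which is one way to meet the bounded-parallelism requirement.

```python
import itertools

# Axes copied from the task description. Values keep the spellings used there
# ("10k" etc.); whether sweepga parses these suffixes is an assumption.
SCAFFOLD_JUMP = ["0", "10k", "20k", "50k"]
NUM_MAPPINGS = ["1:1", "4:many", "many:many"]  # many:many = multiway baseline
SCORING = ["ani", "log-length-ani"]
MIN_ALN_LENGTH = [None, "1k", "5k", "10k"]  # None = leave the flag unset


def matrix_cells():
    """Return (label, flags) for every cell; labels are hypothetical."""
    cells = []
    for jump, mode, score, minlen in itertools.product(
        SCAFFOLD_JUMP, NUM_MAPPINGS, SCORING, MIN_ALN_LENGTH
    ):
        flags = ["--scaffold-jump", jump, "--num-mappings", mode, "--scoring", score]
        if minlen is not None:
            flags += ["--min-aln-length", minlen]
        label = "sj{}_nm{}_{}_ml{}".format(
            jump, mode.replace(":", "-"), score, minlen or "default"
        )
        cells.append((label, flags))
    return cells


def sbatch_header(n_tasks, max_parallel=8):
    """Slurm array header; '%max_parallel' throttles concurrent array tasks."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --array=0-{n_tasks - 1}%{max_parallel}",
        # Per-task scratch on /dev/shm, removed when the task exits.
        'SCRATCH="/dev/shm/sweepga_${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"',
        'mkdir -p "$SCRATCH" && trap \'rm -rf "$SCRATCH"\' EXIT',
    ])


if __name__ == "__main__":
    cells = matrix_cells()
    print(len(cells), "cells")
    print(sbatch_header(len(cells)))
```

With these axes the matrix has 4 × 3 × 2 × 4 = 96 cells; each array task would then look up its row in a committed manifest by $SLURM_ARRAY_TASK_ID.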

For each matrix cell, summarize candidate-window support for PAR1, PAN027 chr9q->chr3q, and PAN028 chr9q->chr3q in absolute query chromosome coordinates. Report expected-target row counts, expected-target sum and union bp, union bp across all target chromosomes, total row counts, and whether chr3 survives. The key readout is whether scaffold chaining at 10k/20k/50k and the min-length thresholds recover, erase, or make ambiguous the chr3 homology relative to raw many:many and the chopped 2kb/5kb/10kb panels.
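The per-window readout can be sketched as follows, assuming PAF query coordinates are already absolute chromosome coordinates (as in whole-genome, unchopped PAFs) and that sequence names are PanSN-style, such as "PAN027#1#chr9" (an assumption). Window, chrom_of, summarize, and read_paf are hypothetical names. The sketch also shows why sum bp and union bp differ: overlapping rows count twice in the sum but once in the union.

```python
import gzip
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical candidate window, absolute query coords (0-based, half-open)."""
    name: str
    query_chrom: str
    start: int
    end: int
    expected_target: str  # e.g. "chr3" for the chr9q->chr3q candidates


def chrom_of(seq_name):
    # Assumption: PanSN-style names; keep the last '#'-separated field.
    return seq_name.rsplit("#", 1)[-1]


def union_bp(intervals):
    """Total bp covered by half-open intervals, counting overlaps once."""
    total, cur_s, cur_e = 0, None, None
    for s, e in sorted(intervals):
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total


def summarize(paf_lines, win):
    """Clip PAF rows to the window on the query side and summarize support."""
    expected, all_targets = [], []
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if chrom_of(f[0]) != win.query_chrom:
            continue
        # PAF columns 3-4 (0-based 2-3) are query start/end on the forward strand.
        qs, qe = max(int(f[2]), win.start), min(int(f[3]), win.end)
        if qs >= qe:
            continue  # row does not overlap the window
        all_targets.append((qs, qe))
        if chrom_of(f[5]) == win.expected_target:
            expected.append((qs, qe))
    return {
        "window": win.name,
        "rows": len(all_targets),
        "expected_rows": len(expected),
        "expected_sum_bp": sum(e - s for s, e in expected),
        "expected_union_bp": union_bp(expected),
        "all_target_union_bp": union_bp(all_targets),
        "expected_survives": bool(expected),
    }


def read_paf(path):
    """Yield PAF lines from plain or gzip-compressed files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        yield from fh
```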

Deliver a committed package under paper_prep/_brainstorming/fig5_raw_fasta_sweepga_f16_scaffold_jump_sensitivity/ containing scripts, configs, and manifests; git-ignored heavy filtered PAFs and logs; summary TSVs; and PDF/SVG/PNG visualizations. Include a compact heatmap/table panel in which rows are candidate events and columns are matrix cells (scaffold-jump, min-length, scoring, mapping mode), with chr3 union bp and status clearly encoded.
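One way to lay out the compact panel is a wide TSV with one row per candidate event and one column per matrix cell, each value encoding expected-target union bp and a status. The classify rule and its 0.5 threshold are purely illustrative assumptions, not criteria from this task, and the event and cell names are hypothetical. Columns are sorted lexicographically here; a published panel would more likely order them by matrix axis.

```python
import csv
import io


def classify(expected_union, all_union, baseline_union, ambiguous_frac=0.5):
    """Hypothetical status rule; thresholds are illustrative only."""
    if expected_union == 0:
        return "erased" if baseline_union > 0 else "absent"
    if expected_union < ambiguous_frac * all_union:
        return "ambiguous"  # expected target is a minority of window support
    return "retained"


def pivot(records, baseline_label):
    """records: dicts with event, cell, expected_union_bp, all_target_union_bp."""
    baseline = {r["event"]: r["expected_union_bp"]
                for r in records if r["cell"] == baseline_label}
    events = sorted({r["event"] for r in records})
    cells = sorted({r["cell"] for r in records})
    table = {e: {} for e in events}
    for r in records:
        status = classify(r["expected_union_bp"], r["all_target_union_bp"],
                          baseline.get(r["event"], 0))
        table[r["event"]][r["cell"]] = f"{r['expected_union_bp']}:{status}"
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["event"] + cells)
    for e in events:
        writer.writerow([e] + [table[e].get(c, "NA") for c in cells])
    return out.getvalue()
```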

Acceptance criteria:

  • Matrix explicitly includes --scaffold-jump 10k, 20k, 50k, plus 0.
  • Matrix explicitly includes --min-aln-length thresholds, not only scaffold-jump.
  • Heavy filtering is run through Slurm with bounded parallelism; no repeated full-PAF filtering on the head node.
  • Outputs are reproducible from committed scripts/configs and include command lines, binary versions, and checksums for filtered PAFs.
  • Commit to main with WG provenance suffix.
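For the checksum and command-line requirements, a streaming SHA-256 sketch using Python's hashlib is shown below. The record layout (field names, JSON-lines output, and using the sweepga binary's own checksum as a version fingerprint) is a hypothetical choice, not a format this task prescribes.

```python
import hashlib
import json
import os
import shlex


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so multi-GB PAFs never load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def manifest_entry(paf_path, argv, binary_path):
    """One provenance record per filtered PAF (hypothetical field names)."""
    return {
        "paf": os.path.basename(paf_path),
        "bytes": os.path.getsize(paf_path),
        "sha256": sha256_file(paf_path),
        "command": shlex.join(argv),  # shell-quoted, copy-pasteable command line
        # Assumption: the binary's checksum serves as a version fingerprint;
        # a version string, if available, would be recorded alongside it.
        "binary": binary_path,
        "binary_sha256": sha256_file(binary_path),
    }


def write_manifest(entries, out_path):
    """Write one JSON object per line so the manifest diffs cleanly in git."""
    with open(out_path, "w") as out:
        for entry in entries:
            out.write(json.dumps(entry, sort_keys=True) + "\n")
```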

Depends on

Required by

Log