fig5-sweepga-fastga-frequency100

SweepGA FastGA frequency 100 sensitivity for Fig5 chr3 homology

Metadata

Status: done
Assigned: agent-2644
Agent identity: 46f6237a65ec4f1002c4d3fb201dc8633638d0947c276be7008c227e1051ba5e
Created: 2026-06-21T13:00:47.727502095+00:00
Started: 2026-06-21T13:04:50.979340344+00:00
Completed: 2026-06-21T15:54:59.635401229+00:00
Tags: pedigree, fig5, sweepga, fastga, frequency-sensitivity, whole-genome-alignment, chr3-homology, eval-scheduled
Eval score: 0.79
└ blocking impact: 0.74
└ completeness: 0.77
└ constraint fidelity: 0.55
└ coordination overhead: 0.83
└ correctness: 0.83
└ downstream usability: 0.70
└ efficiency: 0.74
└ intent fidelity: 0.84
└ style adherence: 0.88

Description

Motivation: The updated wfmash at /home/erikg/bin/wfmash with -p 95 recovers chr3 target homology for the Fig5 PAN027/PAN028 chr9 candidate windows, consistent with the curated subtelomeric PGGB graph. The updated sweepGA/FastGA, by contrast, did not emit chr3 target rows in the raw PAF, even before any downstream chopping/filtering. Hypothesis from chat: FastGA's k-mer occurrence (frequency) filter is too stringent for repetitive subtelomeric homology, so chr3 anchors are sparsified away and the remaining chain favors chr9.

Use the existing updated-binary inputs and provenance from fig5-whole-genome-sweepga-updated-bin. If paper_prep/_brainstorming/pedigree_whole_genome_sweepga_updated_bin/ has not yet been merged into main, use the live worker package at .wg-worktrees/agent-2639/paper_prep/_brainstorming/pedigree_whole_genome_sweepga_updated_bin/ and record that explicitly.

Task: Run a focused sweepGA/FastGA sensitivity test on the full whole-genome inputs, using the same three joint-parent comparisons and the updated binary /home/erikg/.cargo/bin/sweepga, but with the seed-frequency knob set explicitly. Primary test command shape:

/home/erikg/.cargo/bin/sweepga --fastga --fastga-frequency 100 --num-mappings many:many --scaffold-jump 0 --temp-dir /dev/shm/... --output-file ... QUERY.fa TARGET.fa
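One way to satisfy both the exact-command-log and the /dev/shm requirements is to build the argument list once and log that same list. The Python sketch below assumes each Slurm task calls a helper like this; the function name, the placeholder query.fa/target.fa/raw.paf names, and the scratch subdirectory are hypothetical, and only the flags from the command shape above are used.

```python
import shlex

# Binary path taken from the task description.
SWEEPGA = "/home/erikg/.cargo/bin/sweepga"


def build_sweepga_cmd(query_fa, target_fa, output_paf, scratch_dir, frequency=100):
    """Return the argv list for one sweepGA/FastGA comparison (hypothetical helper).

    Refuses any scratch directory outside /dev/shm, per the task requirements.
    """
    if not scratch_dir.startswith("/dev/shm/"):
        raise ValueError(f"scratch must live under /dev/shm, got {scratch_dir!r}")
    return [
        SWEEPGA,
        "--fastga",
        "--fastga-frequency", str(frequency),
        "--num-mappings", "many:many",
        "--scaffold-jump", "0",
        "--temp-dir", scratch_dir,
        "--output-file", output_paf,
        query_fa,
        target_fa,
    ]


# Placeholder inputs; real paths come from the updated-bin package.
cmd = build_sweepga_cmd("query.fa", "target.fa", "raw.paf", "/dev/shm/fig5_freq100_job1")
print(shlex.join(cmd))  # the exact string to write to the command log
```

Passing cmd to subprocess.run(cmd, check=True) and writing shlex.join(cmd) to the job log keeps the logged string identical to what was executed.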

Requirements:

  • Run through Slurm, parallelizing the three comparisons where safe.
  • Keep sweepGA/FastGA scratch under /dev/shm; do not use $SLURM_TMPDIR as sweepGA scratch.
  • Record binary provenance: explicit path, which and realpath output, version, sha256, --help output, and exact command logs.
  • Record FastGA binary provenance via sweepga --check-fastga.
  • Raw PAF first: before any chopping/filtering, inspect whether chr3 target rows overlap the PAN027/PAN028 chr9 candidate windows.
  • Compare against the prior updated-bin raw result (run without an explicit frequency) and against the updated wfmash -p 95 results. Treat wfmash/curated PGGB as the expected-positive comparator, not as a filter input.
  • If --fastga-frequency 100 still yields no chr3 rows or produces pathological output, optionally run a small bracket of frequencies (for example 25 and 500) on the same inputs, but do not let the bracket runs obscure the primary 100 result.
  • If raw chr3 rows appear at frequency 100, run the existing 10 kb pafchop-rs chop on those new raw PAFs, then apply at least the many:many and 4:many sweepGA filters to the chopped output. If no raw chr3 rows appear, skip chopped filtering and state why.
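The raw-PAF check can be sketched in Python as below, run on the raw PAF before any chopping or filtering. It relies only on the standard PAF column layout (query name/start/end in columns 1, 3, 4; target name/start/end in columns 6, 8, 9; 0-based half-open coordinates). It assumes, without verification, that the candidate windows are given in query (PAN027/PAN028 chr9) coordinates and that target contigs use PanSN-style names ending in #chr3; the window format and helper names are hypothetical.

```python
from collections import Counter


def is_chr3(target_name):
    # Assumption: PanSN-style names such as "SAMPLE#HAP#chr3". The last
    # '#'-separated field must equal "chr3", so "chr30" does not match.
    return target_name.split("#")[-1] == "chr3"


def chr3_rows_in_windows(paf_lines, windows):
    """Yield (label, qname, qstart, qend, tname, tstart, tend) for raw PAF rows
    whose target is chr3 and whose query interval overlaps a candidate window.

    windows: {query_contig: [(label, start, end), ...]}, 0-based half-open
    (hypothetical layout).
    """
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        f = line.rstrip("\n").split("\t")
        qname, qstart, qend = f[0], int(f[2]), int(f[3])
        tname, tstart, tend = f[5], int(f[7]), int(f[8])
        if not is_chr3(tname):
            continue
        for label, wstart, wend in windows.get(qname, []):
            if qstart < wend and wstart < qend:  # half-open interval overlap
                yield (label, qname, qstart, qend, tname, tstart, tend)


def count_by_window(hits):
    """Count overlapping chr3 rows per candidate-window label."""
    return Counter(hit[0] for hit in hits)
```

Iterating chr3_rows_in_windows over an open raw PAF file handle and passing the result to count_by_window gives per-window counts suitable for raw_chr3_support.tsv; an empty result for every window is the "no" case.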

Output package: Create paper_prep/_brainstorming/pedigree_whole_genome_sweepga_fastga_frequency100/ containing a README, config, scripts, logs, and summaries, plus paths and checksums for the ignored raw, chopped, and filtered PAFs. Required summaries:

  • summaries/sweepga_binary.tsv
  • summaries/fastga_binary.tsv or equivalent check-fastga output
  • summaries/slurm_jobs.tsv
  • summaries/raw_chr3_support.tsv
  • summaries/frequency_sensitivity_summary.tsv
  • If chopping/filtering is run: summaries/chop_manifest.tsv, summaries/filter_manifest.tsv, and an updated candidate-window support summary.
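A sketch of how one row of summaries/sweepga_binary.tsv might be produced in Python. The column names and helper names are hypothetical. The version string and --help text are deliberately not collected here: they should be captured verbatim from running the binary itself rather than obtained through a guessed flag.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def binary_provenance(explicit_path, name_on_path="sweepga"):
    """One provenance row (hypothetical column names)."""
    return {
        "explicit_path": explicit_path,
        # What a bare `sweepga` would resolve to on PATH, if anything.
        "which": shutil.which(name_on_path) or "NOT_ON_PATH",
        "realpath": os.path.realpath(explicit_path),
        "sha256": sha256_of(explicit_path),
    }


def write_tsv(rows, out_path):
    """Write dict rows as a TSV, using the first row's keys as the header."""
    cols = list(rows[0])
    with open(out_path, "w") as fh:
        fh.write("\t".join(cols) + "\n")
        for row in rows:
            fh.write("\t".join(str(row[c]) for c in cols) + "\n")
```

Recording both which and realpath makes it visible when the sweepga found on PATH differs from the explicit /home/erikg/.cargo/bin/sweepga path.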

Acceptance:

  • Direct yes/no: does explicit --fastga-frequency 100 make sweepGA/FastGA emit chr3 target rows for PAN027 and/or PAN028 candidate windows in raw PAF?
  • The report explains whether the wfmash-positive / sweepGA-negative discrepancy is consistent with FastGA seed-frequency sparsification in repetitive subtelomeric sequence.
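The direct yes/no answer can be derived so that bracket runs are reported but never change it. The Python sketch below assumes per-(frequency, sample) counts of raw chr3 rows overlapping the candidate windows have already been tallied; the count layout and function name are hypothetical.

```python
PRIMARY_FREQUENCY = 100


def sensitivity_summary(counts):
    """counts: {(frequency, sample): n_raw_chr3_rows_in_candidate_windows}.

    Returns (rows, answer, positive_samples). Rows list the primary frequency
    first; the answer depends only on primary-frequency results.
    """
    primary = {s: n for (f, s), n in counts.items() if f == PRIMARY_FREQUENCY}
    if not primary:
        raise ValueError("no primary-frequency results; cannot answer yes/no")
    rows = sorted(
        counts.items(),
        key=lambda kv: (kv[0][0] != PRIMARY_FREQUENCY, kv[0][0], kv[0][1]),
    )
    positive = sorted(s for s, n in primary.items() if n > 0)
    answer = "yes" if positive else "no"
    return rows, answer, positive
```

Missing primary results raise an error rather than silently producing "no", and a chr3-positive bracket run (for example at 25) shows up in the rows without flipping the primary answer.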
  • Exact command logs prove /home/erikg/.cargo/bin/sweepga, --fastga-frequency 100, --scaffold-jump 0, and /dev/shm scratch were used.
  • No submission/ files are modified and no Fig5 schematic is created.

Depends on

Required by

Log