fig5-sweepga-fastga-frequency16

SweepGA FastGA frequency 16 sensitivity for Fig5 chr3 homology

Metadata

Status: done
Assigned: agent-2649
Agent identity: 46f6237a65ec4f1002c4d3fb201dc8633638d0947c276be7008c227e1051ba5e
Created: 2026-06-21T17:38:07.077038589+00:00
Started: 2026-06-21T17:39:52.783152635+00:00
Completed: 2026-06-22T00:25:49.941538550+00:00
Tags: pedigree, fig5, sweepga, fastga, frequency-sensitivity, whole-genome-alignment, chr3-homology, eval-scheduled
Eval score: 0.89
└ blocking impact: 0.88
└ completeness: 0.95
└ constraint fidelity: 0.55
└ coordination overhead: 0.68
└ correctness: 0.96
└ downstream usability: 0.92
└ efficiency: 0.70
└ intent fidelity: 0.88
└ style adherence: 0.95

Description

Motivation: --fastga-frequency 100 was too aggressive for the Fig5 whole-genome sweepGA/FastGA rerun: after ~2.6 h it remained inside FastGA, emitted 0-byte .1aln temp files, and produced no raw PAF. The original updated-bin sweepGA run appears to have used FastGA -f2 and finished quickly but did not emit chr3 target rows overlapping the PAN027/PAN028 chr9 candidate windows. Test a smaller frequency relaxation, centered on -f16, to see whether the wfmash-positive chr3 homology can be recovered without the pathological runtime seen at -f100.

Task: Run a focused, full whole-genome sweepGA/FastGA sensitivity test with the updated /home/erikg/.cargo/bin/sweepga, using the same three joint-parent comparisons, the same full whole-genome FASTA inputs, and the same raw-first evidence standard:

/home/erikg/.cargo/bin/sweepga --fastga --fastga-frequency 16 --num-mappings many:many --scaffold-jump 0 --temp-dir /dev/shm/... --output-file ... QUERY.fa TARGET.fa
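The invocation above can be assembled per comparison as an argv list, which makes it easy to log the exact command and to enforce the /dev/shm scratch rule before submission. This is a minimal sketch, not the package's actual script: `build_sweepga_cmd` is a hypothetical helper, the file names in the usage block are placeholders, and the elided `--temp-dir`/`--output-file` values stay caller-supplied. Only the flags already shown in the command are used.

```python
import shlex
from pathlib import PurePosixPath

# Explicit binary path required by the task.
SWEEPGA = "/home/erikg/.cargo/bin/sweepga"


def build_sweepga_cmd(query_fa, target_fa, output_file, temp_dir, frequency=16):
    """Return the sweepGA argv for one QUERY/TARGET comparison.

    Hypothetical helper: the package's real scripts may assemble the
    command differently.
    """
    scratch = PurePosixPath(temp_dir)
    # Scratch must sit under /dev/shm (never $SLURM_TMPDIR); reject '..'
    # because PurePosixPath does not resolve it.
    if scratch.parts[:3] != ("/", "dev", "shm") or ".." in scratch.parts:
        raise ValueError(f"scratch must live under /dev/shm, got {temp_dir!r}")
    return [
        SWEEPGA,
        "--fastga",
        "--fastga-frequency", str(frequency),
        "--num-mappings", "many:many",
        "--scaffold-jump", "0",
        "--temp-dir", str(scratch),
        "--output-file", str(output_file),
        str(query_fa),
        str(target_fa),
    ]


if __name__ == "__main__":
    # Placeholder file names, not the real joint-parent inputs.
    cmd = build_sweepga_cmd("query.fa", "target.fa", "raw.paf", "/dev/shm/sg-f16-demo")
    # Log the exact, shell-quoted command line for the acceptance check.
    print(shlex.join(cmd))
```

Keeping the command as a list (rather than a formatted string) avoids quoting bugs when paths contain spaces, and `shlex.join` still produces a copy-pasteable log line.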

Requirements:

  • Use Slurm and parallelize the three comparisons where safe.
  • Keep sweepGA/FastGA scratch explicitly under /dev/shm; do not use $SLURM_TMPDIR as sweepGA scratch.
  • Reuse/copy the prior frequency100 package scripts/config where appropriate, but create a separate output package: paper_prep/_brainstorming/pedigree_whole_genome_sweepga_fastga_frequency16/.
  • Record binary provenance: explicit path, which, realpath, version, sha256, --help, and exact command logs.
  • Record FastGA binary provenance via sweepga --check-fastga.
  • Raw PAF first: inspect whether chr3 target rows overlap the PAN027/PAN028 chr9 candidate windows before chopping/filtering.
  • Compare against prior updated-bin no-explicit-frequency sweepGA and updated wfmash p95. Treat wfmash/curated PGGB as expected-positive comparator, not as a filter input.
  • If -f16 finishes quickly and either still misses chr3 or looks borderline, optionally run a tiny bracket such as -f8 and/or -f32, but keep -f16 as the primary result and do not let bracket runs obscure it.
  • If raw chr3 rows appear at -f16, then run 10 kb pafchop-rs and at least many:many plus 4:many chopped sweepGA filters on those new raw PAFs. If no raw chr3 rows appear, do not spend time on chopped filtering except to state why.
  • If -f16 becomes pathological, cancel promptly after evidence comparable to the -f100 diagnosis: active FastGA, no raw PAF, zero/near-zero .1aln, and elapsed runtime substantially beyond the prior -f2 jobs.
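The raw-first check reduces to an interval-overlap scan over the unchopped PAF. The sketch below assumes that the PAN027/PAN028 chr9 candidate windows are query-side intervals, that "chr3 target rows" means rows whose target contig is chr3, and that sequence names follow a PanSN-style `SAMPLE#HAP#CONTIG` layout; all three are assumptions, and `raw_chr3_hits`, `contig`, and any window coordinates are hypothetical. It uses only the standard PAF columns (0-based, half-open coordinates).

```python
def contig(name):
    """Last '#'-separated field, e.g. 'chr3' from 'SAMPLE#1#chr3' (PanSN-style; assumed)."""
    return name.split("#")[-1]


def raw_chr3_hits(paf_lines, windows, target_contig="chr3"):
    """Return raw PAF rows whose target is chr3 and whose query overlaps a window.

    windows maps a query sequence name to a list of (start, end) candidate
    windows in 0-based, half-open coordinates (the same convention PAF uses).
    """
    hits = []
    for line in paf_lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        # PAF columns: 1 qname, 3 qstart, 4 qend, 6 tname, 8 tstart, 9 tend.
        qname, qstart, qend = f[0], int(f[2]), int(f[3])
        tname, tstart, tend = f[5], int(f[7]), int(f[8])
        if contig(tname) != target_contig:
            continue
        for wstart, wend in windows.get(qname, ()):
            if qstart < wend and qend > wstart:  # half-open overlap test
                hits.append((qname, qstart, qend, tname, tstart, tend))
                break  # count each PAF row once even if windows overlap
    return hits
```

In use, the function would be called on an open raw PAF file handle before pafchop-rs or any filter touches it; an empty result for both PAN027 and PAN028 is the "no raw chr3 rows" outcome that skips chopped filtering.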

Output package: Required files include a README, config, scripts, logs, summaries, and paths/checksums for the ignored raw, chopped, and filtered PAFs. Required summaries:

  • summaries/sweepga_binary.tsv
  • summaries/fastga_binary.tsv
  • summaries/slurm_jobs.tsv
  • summaries/raw_chr3_support.tsv
  • summaries/frequency_sensitivity_summary.tsv
  • if pathological: summaries/pathological_runtime.tsv
  • if chopping/filtering is run: summaries/chop_manifest.tsv, summaries/filter_manifest.tsv, and a candidate-window support summary
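The file-level provenance fields for summaries/sweepga_binary.tsv (explicit path, `which`, realpath, sha256) can be collected with the standard library alone. This is a sketch under assumptions: the column names and the helpers `binary_provenance_row` and `write_tsv` are hypothetical, and the version, `--help`, and `sweepga --check-fastga` outputs, which require actually running the binary, are left to the command logs rather than reproduced here.

```python
import hashlib
import os
import shutil


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def binary_provenance_row(explicit_path, command_name):
    """Collect the file-level provenance fields for one binary."""
    return {
        "explicit_path": explicit_path,
        # What a bare `command_name` would resolve to on the current PATH.
        "which": shutil.which(command_name) or "",
        "realpath": os.path.realpath(explicit_path),
        "sha256": sha256_file(explicit_path),
    }


def write_tsv(rows, out_path):
    """Write dict rows as a TSV with a header taken from the first row's keys."""
    columns = list(rows[0])
    with open(out_path, "w") as fh:
        fh.write("\t".join(columns) + "\n")
        for row in rows:
            fh.write("\t".join(str(row[c]) for c in columns) + "\n")
```

Recording both `which` and the explicit path matters here: if they disagree, a job that relied on PATH lookup could have run a different sweepGA build than the one named in the command log.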

Acceptance:

  • Direct yes/no: does explicit --fastga-frequency 16 make sweepGA/FastGA emit chr3 target rows for PAN027 and/or PAN028 candidate windows in raw PAF?
  • The report explains whether -f16 supports, weakens, or leaves unresolved the seed-frequency sparsification hypothesis behind the wfmash-positive / sweepGA-negative discrepancy.
  • Exact command logs prove /home/erikg/.cargo/bin/sweepga, --fastga-frequency 16, --num-mappings many:many, --scaffold-jump 0, and /dev/shm scratch were used.
  • No submission/ files are modified and no Fig5 schematic is created.
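The command-log criterion can be checked mechanically. The sketch assumes each job log records the sweepGA invocation as a single shell-quoted line; `check_command_log` and `flag_value` are hypothetical helpers, and they test only for the binary path, flags, and scratch location named in the acceptance criteria.

```python
import shlex

SWEEPGA = "/home/erikg/.cargo/bin/sweepga"
# Flag values the acceptance criteria require to appear in the logged command.
REQUIRED_VALUES = {
    "--fastga-frequency": "16",
    "--num-mappings": "many:many",
    "--scaffold-jump": "0",
}


def flag_value(argv, flag):
    """Value token following `flag`, or None if the flag or its value is missing."""
    if flag not in argv:
        return None
    i = argv.index(flag)
    return argv[i + 1] if i + 1 < len(argv) else None


def check_command_log(line):
    """List acceptance problems in one logged, shell-quoted sweepGA command line."""
    argv = shlex.split(line)
    problems = []
    if not argv or argv[0] != SWEEPGA:
        problems.append(f"binary is not {SWEEPGA}")
    if "--fastga" not in argv:
        problems.append("missing --fastga")
    for flag, expected in REQUIRED_VALUES.items():
        if flag_value(argv, flag) != expected:
            problems.append(f"{flag} is not {expected}")
    temp_dir = flag_value(argv, "--temp-dir")
    if temp_dir is None or not temp_dir.startswith("/dev/shm/"):
        problems.append("--temp-dir is not under /dev/shm")
    return problems
```

An empty list for every primary job's logged command is the machine-checkable form of the acceptance statement; any non-empty list names exactly which requirement the log fails to prove.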

Depends on

Required by

Messages (6)

  1. #1 codex, 2026-06-21T17:41:42.615789791+00:00
    Please prioritize submitting the primary -f16 Slurm matrix now. Copy/reuse the frequency100 package, adjust 100->16 labels and validation, symlink or reuse agent-2639 inputs, and run raw PAF first. Defer optional -f8/-f32 until after -f16 raw support is known.
  2. #2 codex, 2026-06-21T17:43:23.829310285+00:00
    I am submitting the primary -f16 Slurm matrix directly from your package now because the scripts are ready and no sg-f16 jobs are running. Do not submit a duplicate if summaries/slurm_jobs.tsv already contains job IDs.
  3. #3 codex, 2026-06-21T18:07:11.967339756+00:00
    Correction from user: do NOT treat -f16 as pathological at 25-30 minutes. Let primary jobs 1704349-1704351 run up to ~8 hours unless they hard-fail/OOM/exhaust /dev/shm or the user explicitly asks to cancel. Record periodic status, but do not cancel just because .1aln is zero early.
  4. #4 codex, 2026-06-21T18:08:09.304707651+00:00
    I resubmitted the two PAN027 -f16 jobs as 1704355 and 1704356 after the user's correction. Treat 1704349/1704350 as aborted-too-early attempts, not final negatives. Do not cancel 1704351/1704355/1704356 before ~8 hours unless hard failure/OOM/devshm exhaustion/manual user cancel.
  5. #5 fig5-sweepga-fastga-frequency16, 2026-06-21T18:09:59.808083665+00:00
    Acknowledged #3 — I will not treat early zero-byte .1aln as pathological and will let primary -f16 jobs run up to ~8 hours unless hard-fail/OOM/devshm exhaustion/manual cancel.
  6. #6 fig5-sweepga-fastga-frequency16, 2026-06-21T18:09:59.964870807+00:00
    Acknowledged #4 — I will treat 1704349/1704350 as aborted-too-early attempts, use 1704351/1704355/1704356 as the primary matrix, and update package summaries accordingly.
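Combining the original cancellation criteria with the user correction in message #3 gives a simple status rule: no runtime-only cancellation before roughly 8 hours, and after that only on -f100-style evidence (active FastGA, no raw PAF, zero/near-zero .1aln). The sketch below is illustrative only: `JobSnapshot`, `decide`, the decision labels, and the default near-zero threshold of 0 bytes are hypothetical choices, and how the snapshot fields are gathered (squeue, process listing, file sizes under /dev/shm) is left out.

```python
from dataclasses import dataclass

# Per message #3, primary -f16 jobs get roughly 8 hours before a
# runtime-only cancellation is considered.
RUNTIME_BUDGET_S = 8 * 3600


@dataclass
class JobSnapshot:
    elapsed_s: float
    fastga_active: bool          # a FastGA process is still running for the job
    raw_paf_bytes: int           # size of the raw PAF output so far
    aln_bytes_total: int         # summed size of .1aln temp files under /dev/shm
    hard_failure: bool = False   # non-zero exit, OOM, or /dev/shm exhausted
    user_cancel: bool = False    # user explicitly asked to cancel


def decide(s, near_zero_aln_bytes=0, budget_s=RUNTIME_BUDGET_S):
    """Classify one periodic status snapshot of a primary -f16 job."""
    if s.hard_failure:
        return "record_failure"
    if s.user_cancel:
        return "cancel_user_request"
    if s.elapsed_s < budget_s:
        # Zero-byte .1aln early on is not evidence of pathology (message #3).
        return "keep_running"
    if s.fastga_active and s.raw_paf_bytes == 0 and s.aln_bytes_total <= near_zero_aln_bytes:
        return "cancel_pathological"
    # Past the budget without -f100-style evidence: flag for a human decision.
    return "review_past_budget"
```

Logging each snapshot together with its label would also supply the rows for summaries/pathological_runtime.tsv if a job is eventually cancelled.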

Log