fig5-whole-genome-wfmash-p95

Whole-genome wfmash -p95 Fig5 pedigree homology recovery test

Metadata

Status: done
Assigned: agent-2630
Agent identity: 3577bc75d6ed4f1947509aa5c086c91ce7c997c7806dab6bf6affac647452647
Created: 2026-06-21T06:58:24.790952075+00:00
Started: 2026-06-21T07:01:26.900856205+00:00
Completed: 2026-06-21T08:36:41.064043901+00:00
Tags: pedigree, fig5, wfmash, p95, whole-genome-alignment, raw-paf, chr3-homology, eval-scheduled
Eval score: 0.91
└ blocking impact: 0.88
└ completeness: 0.92
└ constraint fidelity: 0.40
└ coordination overhead: 0.88
└ correctness: 0.94
└ downstream usability: 0.89
└ efficiency: 0.83
└ intent fidelity: 0.84
└ style adherence: 0.92

Description

Input context:

  • Prior corrected whole-genome sweepGA package: paper_prep/_brainstorming/pedigree_whole_genome_sweepga_joint_parent/.
  • Prior decision/review reports:
    • paper_prep/_brainstorming/fig5_whole_genome_sweepga_evidence_review/REPORT.md
    • paper_prep/_brainstorming/fig5_whole_genome_sweepga_closeout/QA_REPORT.md
  • Recovered readable full assemblies:
    • /moosefs/erikg/phrs/recovery/fig5-whole-genome-joint-parent-sweepga/PAN010.fa.gz
    • /moosefs/erikg/phrs/recovery/fig5-whole-genome-joint-parent-sweepga/PAN011.fa.gz
    • /moosefs/erikg/phrs/recovery/fig5-whole-genome-joint-parent-sweepga/PAN027.fa.gz
    • /moosefs/erikg/phrs/recovery/fig5-whole-genome-joint-parent-sweepga/PAN028.fa.gz

Task: Run a whole-genome wfmash direct-alignment test for the same Fig5 pedigree comparisons that sweepGA tested, using -p 95. This is a raw homology-recovery test, not a manuscript update.

Comparisons:

  • PAN027pat_vs_PAN011_joint: query PAN027 paternal haplotype whole genome vs joint PAN011 target whole genome.
  • PAN027mat_vs_PAN010_joint: query PAN027 maternal haplotype whole genome vs joint PAN010 target whole genome.
  • PAN028mat_vs_PAN027_joint: query PAN028 maternal haplotype whole genome vs joint PAN027 target whole genome.

Use the previous package's config/comparisons.tsv and summaries/input_manifest.tsv to preserve haplotype/query/target definitions. Whole-genome input is mandatory; do not substitute chromosome-only, arm-only, or 500 kb-window-only alignment runs. Candidate-window slicing is allowed only after raw whole-genome PAF is produced.

Required wfmash strategy:

  • Run wfmash through Slurm, not as a long head-node job.
  • Use /dev/shm or node-local scratch through wfmash -B for temporary files, with output copied back to the package directory.
  • At minimum, produce one raw full-genome PAF per comparison with wfmash -p 95.
  • Also run a permissive homology-recovery configuration unless the literal -p 95 run already recovers clear chr3 support. Suggested command shape: wfmash -p 95 -s 1k -l 1k -n 50 -f -M -t $THREADS -B /dev/shm/wfmash.$SLURM_JOB_ID.$COMPARISON TARGET.fa QUERY.fa > OUT.paf, then bgzip the PAF. Adjust only if the installed wfmash rejects an option, and document the exact final commands.
  • Keep raw PAFs. Do not apply scaffolding filters or one-to-one filters before the evidence review.
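
The command shapes in this list can be assembled programmatically so that the exact final command lines land in the logs verbatim. The following is a minimal Python sketch, not the package's actual script: build_wfmash_argv and format_command are hypothetical helper names, the flags are copied from the suggested command shape above and are not verified against any particular wfmash --help, and the segment flag is a parameter because its spelling may differ between wfmash versions.

```python
import shlex


def build_wfmash_argv(target, query, scratch_dir, threads,
                      permissive=False, segment_flag="-s"):
    """Return the wfmash argv list for one comparison (hypothetical helper).

    Flags come from the task text only; confirm them against the installed
    wfmash --help before submitting any job.
    """
    argv = ["wfmash", "-p", "95"]
    if permissive:
        # Permissive homology-recovery shape suggested in the task text.
        argv += [segment_flag, "1k", "-l", "1k", "-n", "50", "-f", "-M"]
    # Threads, scratch directory for temporary files, then target and query.
    argv += ["-t", str(threads), "-B", scratch_dir, target, query]
    return argv


def format_command(argv, out_paf):
    """Render argv plus the stdout redirect as one shell-quoted line for logs."""
    return " ".join(shlex.quote(arg) for arg in argv) + " > " + shlex.quote(out_paf)
```

Writing the result of format_command into the Slurm log and into summaries/wfmash_jobs.tsv would keep the literal and permissive parameter sets auditable from one place.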

Output package: Create paper_prep/_brainstorming/pedigree_whole_genome_wfmash_p95/ with:

  • README.md explaining inputs, exact commands, parameter sets, and conclusions at a technical level.
  • config/comparisons.tsv and any parameter matrix file used.
  • scripts/ for reproducible submit/run/summarize commands.
  • logs/ with Slurm stdout/stderr or copied summaries sufficient to audit exact commands and /dev/shm scratch use.
  • summaries/wfmash_jobs.tsv with job IDs, parameter set, start/end status, output PAF path, size, and checksum.
  • summaries/paf_file_summary.tsv with row counts and target-chromosome distributions.
  • summaries/candidate_window_support.tsv summarizing posthoc overlap of raw wfmash PAF rows with the Fig5 candidate windows used by the sweepGA review.
  • raw_paf/ for bgzipped PAFs; if the PAFs are too large for git, keep them git-ignored and record absolute paths plus checksums in the manifests.
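
The row counts and target distributions requested for summaries/paf_file_summary.tsv can be derived directly from the raw PAF. The sketch below is one possible approach rather than the package's script: the function names are hypothetical, and it counts rows per target sequence name exactly as written in PAF column 6, leaving any mapping from sequence name to chromosome label to the caller because the assemblies' naming scheme is not stated here.

```python
import gzip
from collections import Counter

TARGET_NAME_COL = 5  # PAF column 6 (0-based index 5) holds the target sequence name


def summarize_paf_lines(lines):
    """Return (row_count, Counter of rows per target sequence name)."""
    rows = 0
    per_target = Counter()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            raise ValueError(f"PAF row has {len(fields)} columns, expected at least 12")
        rows += 1
        per_target[fields[TARGET_NAME_COL]] += 1
    return rows, per_target


def summarize_paf_file(path):
    """Summarize a plain or bgzipped PAF; bgzip output is readable as gzip."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize_paf_lines(handle)
```

Raising on short rows, instead of skipping them, is a deliberate choice in this sketch: a truncated PAF from an interrupted job should fail loudly before it reaches the evidence review.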

Acceptance:

  • Full whole-genome wfmash ran for the same three joint-parent comparisons, or failures are diagnosed with logs and next commands.
  • Exact wfmash command lines include -p 95; any sensitive/permissive run parameters are explicitly recorded.
  • The package states whether raw wfmash emitted any chr3-target rows overlapping the PAN027/PAN028 chr9 candidate windows before downstream filtering.
  • No submission/ files are changed and no Fig5 schematic is created in this task.
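
The chr3-versus-chr9 acceptance check amounts to an interval-overlap count over the raw PAF rows. The following Python sketch illustrates it under stated assumptions: candidate windows are taken to be in query coordinates (0-based, half-open, matching PAF), sequence names are assumed to contain the chromosome label as a distinct token, and window_support and names_chromosome are hypothetical names, not functions from the sweepGA review.

```python
import re


def names_chromosome(seq_name, chrom):
    """True if seq_name contains chrom as a whole token (chr3, not chr30)."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(chrom) + r"(?![0-9])"
    return re.search(pattern, seq_name) is not None


def window_support(paf_lines, windows, target_chrom="chr3"):
    """Count raw PAF rows on target_chrom whose query span overlaps each window.

    windows: iterable of (window_id, query_name, start, end) tuples in
    0-based half-open query coordinates (an assumption of this sketch).
    """
    windows = list(windows)
    support = {window_id: 0 for window_id, _, _, _ in windows}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF row
        qname, qstart, qend, tname = fields[0], int(fields[2]), int(fields[3]), fields[5]
        if not names_chromosome(tname, target_chrom):
            continue
        for window_id, window_query, w_start, w_end in windows:
            # Half-open intervals overlap when each starts before the other ends.
            if qname == window_query and qstart < w_end and w_start < qend:
                support[window_id] += 1
    return support
```

A window whose count stays at zero in every parameter set is the case the acceptance criterion asks the package to state explicitly.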

Depends on

Required by

Messages: 2 (2 unread)

  1. #1 codex 2026-06-21T07:07:37.406200748+00:00 read
    URGENT correction from chat/user: do not treat the installed Guix `/home/erikg/.guix-profile/bin/wfmash` as current final evidence. Local installed package is `wfmash 0.12.5-1+0222f7c`, while upstream GitHub releases show latest `v0.24.2` with major mapping/scaffolding/default/CLI changes. Before running or interpreting cluster jobs, verify the exact wfmash binary and version.
    
    Required change:
    - Prefer current upstream wfmash `v0.24.2` (or the newest official release available when you run) for the primary `-p 95` test. Build/install locally in the task worktree or use a documented current container/binary. Record source URL, tag/commit, build command, `which wfmash`, and `wfmash --help`/version evidence.
    - If any jobs already used the old Guix 0.12.5 binary, keep/log them only as legacy diagnostics and do not use them as the primary conclusion.
    - Re-check option semantics after upgrading. The original suggested sensitive command was based on old local help where `-s` meant segment length. Current upstream docs indicate CLI drift: `-w` is the window/segment scale and `-s` is sketch size. For current wfmash, use a sensitive config shaped like `wfmash -p 95 -w 1k -l 1k -n 50 -f -M -t $THREADS -B /dev/shm/... TARGET.fa QUERY.fa > OUT.paf`, unless current `--help` says otherwise. Document exact final commands.
    - The acceptance condition remains raw whole-genome PAF evidence: did current raw wfmash `-p 95` emit chr3-target rows overlapping the Fig5 PAN027/PAN028 chr9 candidate windows?
  2. #2 fig5-whole-genome-wfmash-p95 2026-06-21T07:23:39.841696566+00:00 read
    Acknowledged — I will treat the Guix wfmash jobs as legacy diagnostics only, cancel remaining legacy Slurm jobs, verify/build current upstream wfmash, re-check CLI semantics, and rerun the primary whole-genome -p95 matrix with current wfmash.
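
The legacy-versus-primary distinction in message #1 can be made mechanical by comparing the version string reported by the binary with the required release. This is a small Python sketch under stated assumptions: parse_version and evidence_role are hypothetical helpers, only the leading MAJOR.MINOR.PATCH triple is compared, and the version strings used are the two quoted in the message.

```python
import re


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no MAJOR.MINOR.PATCH found in {text!r}")
    return tuple(int(part) for part in match.groups())


def evidence_role(installed, required):
    """Label a run 'primary' only if its binary is at least the required release."""
    if parse_version(installed) >= parse_version(required):
        return "primary"
    return "legacy-diagnostic"


if __name__ == "__main__":
    # Version strings quoted in message #1.
    print(evidence_role("0.12.5-1+0222f7c", "v0.24.2"))
```

Recording this label next to each job in the manifest would keep runs from the old binary from being read as primary evidence.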

Log