fig5-wfmash-query-grid-chop-filter

Query-grid filter updated wfmash whole-genome PAFs

Metadata

Status: done
Assigned: agent-2719
Agent identity: 46f6237a65ec4f1002c4d3fb201dc8633638d0947c276be7008c227e1051ba5e
Created: 2026-06-24T13:33:08.123344812+00:00
Started: 2026-06-24T13:34:48.698074900+00:00
Completed: 2026-06-24T13:51:54.662622483+00:00
Tags: fig5, wfmash, query-grid, sweepga-filter, whole-genome, eval-scheduled
Eval score: 0.94
└ blocking impact: 0.97
└ completeness: 0.95
└ constraint fidelity: 0.85
└ coordination overhead: 0.93
└ correctness: 0.96
└ downstream usability: 0.95
└ efficiency: 0.90
└ intent fidelity: 0.82
└ style adherence: 0.92

Description

Run exactly the same query-grid chop/filter workflow on the updated-bin wfmash -p95 whole-genome PAFs.

Goal: make wfmash comparable to the SweepGA/FastGA query-grid results for both candidate-window panels and whole-genome overview plots.

Inputs:

  • Raw wfmash PAFs are currently ignored heavy files under /moosefs/erikg/phrs/.wg-worktrees/agent-2636/paper_prep/_brainstorming/pedigree_whole_genome_wfmash_p95_updated_bin/raw_paf/updated_bin_v0.24.2-12-ge040aa10/*.paf.gz
  • Existing provenance summaries are in paper_prep/_brainstorming/pedigree_whole_genome_wfmash_p95_updated_bin/summaries/.

Required processing:

  • Verify every raw wfmash PAF has cg:Z on all rows sampled and fail clearly if any row cannot be chopped exactly.
  • Chop from raw with pafchop-rs --chunk-mode query-grid --overlap 0 for lengths 10000, 5000, 2000.
  • Filter chopped PAFs with SweepGA PAF filtering: --num-mappings 1:1 --scaffold-jump 0 --scoring ani --overlap 0.
  • Use distinct output directories under paper_prep/_brainstorming/pedigree_whole_genome_wfmash_p95_updated_bin/query_grid_filter/, e.g. chopped_paf_qgrid_l{N}_o0 and filtered_paf_qgrid_l{N}_o0.
  • Use Slurm if any step is non-trivial, but these wfmash PAFs are small enough that a short compute-node or carefully bounded local run may be acceptable; do not run an hours-long job on the head node.
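The cg:Z precondition in the first bullet could be checked along the following lines. This is only a sketch: the function names rows_missing_cigar and check_paf are hypothetical, and it scans every row rather than a sample, on the assumption that these wfmash PAFs are small enough for a full pass.

```python
import gzip


def rows_missing_cigar(lines):
    """Return 1-based row numbers whose optional fields lack a cg:Z tag."""
    missing = []
    for row_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        # PAF has 12 mandatory columns; optional tags start at column 13.
        if not any(tag.startswith("cg:Z:") for tag in fields[12:]):
            missing.append(row_number)
    return missing


def check_paf(path):
    """Exit with a clear message if any row of a gzipped PAF lacks cg:Z."""
    with gzip.open(path, "rt") as handle:
        missing = rows_missing_cigar(handle)
    if missing:
        raise SystemExit(
            f"{path}: {len(missing)} row(s) lack cg:Z, first at row {missing[0]}; "
            "cannot chop exactly"
        )
```

Calling check_paf on each raw *.paf.gz before chopping would stop the run at the first file that cannot be chopped exactly.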

Validation/deliverables:

  • Run pigz -t on all chopped and filtered outputs and write sha256 sidecars.
  • Write query_grid_filter_manifest.tsv with method=wfmash_p95_updated_bin, comparison_id, chop length, raw/chopped/filtered paths, row counts, commands, binary paths/checksums.
  • Write candidate-window support summary after filtering, including chr3 retained row count/summed bp/query-union bp for PAN027 and PAN028.
  • Write README section explaining this is wfmash raw WGA plus the same query-grid/SweepGA 1:1 post-filter used for SweepGA.
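The three candidate-window quantities (retained row count, summed bp, query-union bp) differ only in how overlapping rows are counted. A minimal sketch follows, assuming "summed bp" means the sum of query spans (query end minus query start) and that the rows have already been restricted to the window of interest; support_summary is a hypothetical helper, not part of any tool named here.

```python
from collections import defaultdict


def support_summary(intervals):
    """Summarise (query_name, query_start, query_end) rows.

    Returns the row count, the summed query span, and the union of query
    spans, where overlapping spans on the same query are counted once.
    """
    intervals = list(intervals)
    summed_bp = sum(end - start for _, start, end in intervals)

    by_query = defaultdict(list)
    for name, start, end in intervals:
        by_query[name].append((start, end))

    union_bp = 0
    for spans in by_query.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current merged span.
                cur_end = max(cur_end, end)
            else:
                union_bp += cur_end - cur_start
                cur_start, cur_end = start, end
        union_bp += cur_end - cur_start

    return {"rows": len(intervals), "summed_bp": summed_bp, "query_union_bp": union_bp}
```

With a zero-overlap query grid and 1:1 filtering, summed bp and query-union bp would be expected to be close; a large gap between them suggests overlapping retained rows.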

Acceptance criteria:

  • All 9 comparison x length filtered wfmash outputs exist and validate.
  • Outputs are ready for whole-genome overview plotting.
  • Commit with message: feat: fig5-wfmash-query-grid-chop-filter (agent-NNN)
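The first acceptance criterion can be checked mechanically. The sketch below enumerates the expected filtered outputs, tests each gzip stream by decompressing it fully, and writes a sha256 sidecar. The per-comparison file name ({comparison_id}.paf.gz) and the comparison IDs themselves are assumptions made for illustration; only the directory pattern filtered_paf_qgrid_l{N}_o0 and the three lengths come from the task text.

```python
import gzip
import hashlib
from pathlib import Path

LENGTHS = (10000, 5000, 2000)  # chop lengths named in the task


def expected_outputs(root, comparison_ids, lengths=LENGTHS):
    """List one filtered PAF path per comparison x length (file name is assumed)."""
    return [
        Path(root) / f"filtered_paf_qgrid_l{n}_o0" / f"{cid}.paf.gz"
        for cid in comparison_ids
        for n in lengths
    ]


def validate_and_checksum(path):
    """Test gzip integrity, write '<path>.sha256', and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as raw:
        for block in iter(lambda: raw.read(1 << 20), b""):
            digest.update(block)
    # Reading to the end makes the gzip module verify the stream's CRC;
    # a corrupt or truncated file raises an exception here.
    with gzip.open(path, "rb") as stream:
        while stream.read(1 << 20):
            pass
    sidecar = Path(str(path) + ".sha256")
    sidecar.write_text(f"{digest.hexdigest()}  {Path(path).name}\n")
    return digest.hexdigest()
```

Three comparison IDs with the three lengths give the nine paths the criterion refers to; any path that is missing or fails validate_and_checksum means the criterion is not met.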

Depends on

Required by

Log