fig5-sweepga-f16-validated-chop-rerun-after-real-audit

Fig5 sweepGA f16 validated chop rerun after real audit

Metadata

Status: open ‖ paused
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Created: 2026-06-22T19:56:58.747917953+00:00
Started: 2026-06-22T20:32:35.097545567+00:00
Tags: pedigree, fig5, sweepga, pafchop, slurm, validation, plot, eval-scheduled

Description

Paused downstream rerun. Do not unpause until redo-sweepga-paf-filter-identity-scoring-audit has produced both required files and explicitly says the rerun may proceed.

Important stale state:

  • Do not reuse outputs from cancelled Slurm array 1705408 or old f16 chopped outputs created before PAF semantics validation.
  • The old task audit-sweepga-paf-filter-identity-scoring is invalid for gating because it did not produce SWEEPGA_PAF_FILTER_IDENTITY_AUDIT.md or the TSV audit summary.
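The gating rule above could be checked mechanically before unpausing. The following is a minimal sketch, not part of the task tooling: the function name, the TSV path argument, and the approval phrase are all hypothetical, since this task does not state the TSV filename or the exact wording the audit will use.

```python
from pathlib import Path


def rerun_may_proceed(audit_md, audit_tsv, approval_phrase):
    """Return True only if both audit files exist, are non-empty,
    and the Markdown report contains the approval phrase.

    audit_tsv and approval_phrase are caller-supplied assumptions;
    the task description does not fix either of them.
    """
    md, tsv = Path(audit_md), Path(audit_tsv)
    for path in (md, tsv):
        if not path.is_file() or path.stat().st_size == 0:
            return False
    # Case-insensitive search for the explicit go-ahead statement.
    return approval_phrase.lower() in md.read_text().lower()
```

A missing or empty file returns False rather than raising, so the check can sit at the top of a submission script and keep the task paused by default.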

Goal: rerun the Fig5 sweepGA f16 sensitivity using exact CIGAR-based PAF chopping, small configurable chunk sizes (no larger than 10 kb unless the audit justifies otherwise), and the command line approved by SWEEPGA_PAF_FILTER_IDENTITY_AUDIT.md.

Required work:

  • Run heavy chopping/sweep filtering only through Slurm, never on the head node. Shared-node jobs are preferred; use a few cores per task and /dev/shm for scratch where needed.
  • Use the repaired pafchop-rs build that recomputes qstart/qend/tstart/tend/col10/col11 and CIGAR-derived tags. Reject raw PAF rows that cannot be exactly split.
  • Run a small matrix of chunk sizes including 10 kb and at least one smaller size if practical, plus the f16 raw/un-chopped reference.
  • Apply sweepGA filtering exactly as approved by the audit. The expected command line is --num-mappings 1:1 --scaffold-jump 0, with identity/ANI scoring only if the audit passes.
  • Summarize whether chr3 target homology survives 1:1 filtering and compare to wfmash/minimap2/PGGB untangling evidence already collected.
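To illustrate what "exact CIGAR-based chopping" means for the coordinate and count columns, here is a minimal Python sketch. It is not the pafchop-rs implementation. The names chop_paf_row and _emit are hypothetical, and the sketch assumes that chunks are bounded by target span, that the cg:Z: tag uses only =, X, I and D, and that all other optional tags are dropped rather than recomputed. Rows with M operations or a missing CIGAR are rejected because matches cannot be counted exactly.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([=XIDM])")


def _emit(fields, chunk, q0, q1, t0, t1):
    """Build one chopped PAF row with recomputed coordinates and counts."""
    out = fields[:12]
    out[2], out[3] = str(q0), str(q1)
    out[7], out[8] = str(t0), str(t1)
    out[9] = str(sum(n for n, op in chunk if op == "="))  # col10: matches
    out[10] = str(sum(n for n, _ in chunk))               # col11: block length
    out.append("cg:Z:" + "".join(f"{n}{op}" for n, op in chunk))
    return "\t".join(out)


def chop_paf_row(line, chunk_size=10_000):
    """Split one PAF row into rows of at most chunk_size target bases."""
    f = line.rstrip("\n").split("\t")
    cg = next((t[5:] for t in f[12:] if t.startswith("cg:Z:")), None)
    if cg is None:
        raise ValueError("no cg:Z: tag; row cannot be split exactly")
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cg)]
    if "".join(f"{n}{op}" for n, op in ops) != cg:
        raise ValueError("unparseable CIGAR")
    if any(op == "M" for _, op in ops):
        raise ValueError("M ops hide mismatches; row cannot be split exactly")
    qstart, qend, tstart, tend = int(f[2]), int(f[3]), int(f[7]), int(f[8])
    if sum(n for n, op in ops if op != "D") != qend - qstart:
        raise ValueError("CIGAR query span disagrees with qstart/qend")
    if sum(n for n, op in ops if op != "I") != tend - tstart:
        raise ValueError("CIGAR target span disagrees with tstart/tend")

    def coords(cq, q_off, ct, t_off):
        # On '-', the CIGAR walks the query from qend downwards.
        if f[4] == "+":
            return qstart + cq, qstart + q_off, tstart + ct, tstart + t_off
        return qend - q_off, qend - cq, tstart + ct, tstart + t_off

    rows, chunk = [], []
    cq = ct = q_off = t_off = 0  # chunk start and running alignment offsets
    for n, op in ops:
        while n > 0:
            # Insertions consume no target, so they are never split.
            take = n if op == "I" else min(n, chunk_size - (t_off - ct))
            chunk.append((take, op))
            n -= take
            if op != "D":
                q_off += take
            if op != "I":
                t_off += take
            if t_off - ct == chunk_size:
                rows.append(_emit(f, chunk, *coords(cq, q_off, ct, t_off)))
                chunk, cq, ct = [], q_off, t_off
    if chunk:
        rows.append(_emit(f, chunk, *coords(cq, q_off, ct, t_off)))
    return rows
```

In this sketch a chunk may begin or end with an insertion or deletion; whether such boundary indels should be trimmed is a policy decision for the audited tool, not something this example settles.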

Acceptance criteria:

  • Slurm job IDs and executable paths are recorded.
  • No whole-genome pafchop/sweepGA work ran on the head node.
  • Deliverables include a TSV summary and a plot suitable for updating the Fig5 sweepGA panel/key supplemental figure.
  • The report clearly says whether the result supports smaller chopping, f16 frequency, minimap2/wfmash cross-validation, or a sweepGA limitation.

Depends on

Required by

Log