fig5-f16-query-grid-chop-filter-rerun

Fig5 f16 query-grid chop/filter Slurm rerun

Metadata

Status: done
Assigned: agent-2700
Agent identity: 46f6237a65ec4f1002c4d3fb201dc8633638d0947c276be7008c227e1051ba5e
Created: 2026-06-23T16:00:34.917345260+00:00
Started: 2026-06-23T16:21:27.492548870+00:00
Completed: 2026-06-24T07:15:02.013350950+00:00
Tags: fig5, sweepga, slurm, query-grid, pafchop, eval-scheduled
Eval score: 0.90
└ hallucination rate: 0.22
└ requirement coverage: 0.84
└ semantic match: 0.97
└ specificity match: 0.91

Description

Rerun the Fig5 raw FASTA SweepGA f16 chop/filter sensitivity matrix using query-grid chopped PAFs. Run it on Slurm only.

Dependency: use the updated pafchop-rs from fig5-pafchop-query-grid-mode. Do not run heavy chopping/filtering on the head node. Use /dev/shm scratch for SweepGA temp files and pigz for compression/decompression.

Required run shape:

  • Source inputs are the whole-genome raw f16 many:many PAFs under paper_prep/_brainstorming/pedigree_whole_genome_sweepga_fastga_frequency16/raw_paf.
  • Chop from raw for each requested length; do not recursively chop a previously chopped PAF.
  • Use query-grid mode explicitly.
  • Chop at lengths 10000, 5000, and 2000 at minimum. Include 1000 if runtime is reasonable and Slurm resources permit.
  • Use distinct output names/directories from the old row-start chunks, e.g. chopped_paf_qgrid_l{N}_o0 and filtered_paf_chop_sensitivity_query_grid, so old results remain inspectable.
  • Run SweepGA filtering from the query-grid chopped PAFs with: --num-mappings 1:1 --scaffold-jump 0 --scoring ani --overlap 0.
  • Use cluster parallelism. Prefer job arrays over the comparison x chop length matrix, with enough CPUs per task to keep pafchop and pigz usefully busy.
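
The job-array layout above can be sketched as a mapping from a 0-based array index (the value Slurm exposes as SLURM_ARRAY_TASK_ID) to one comparison x chop length cell. This is a minimal sketch, not the submitted script: the comparison labels are hypothetical placeholders, and only the directory pattern chopped_paf_qgrid_l{N}_o0 comes from the task text.

```python
from itertools import product

# Hypothetical comparison labels; the real ones would be derived from the
# files under the raw_paf directory named in the task.
COMPARISONS = ["cmpA", "cmpB", "cmpC"]
# Required chop lengths; 1000 could be appended if resources permit.
CHOP_LENGTHS = [10000, 5000, 2000]


def task_for_index(task_id, comparisons=COMPARISONS, lengths=CHOP_LENGTHS):
    """Map a 0-based array index to (comparison, chop_length, chop_dir)."""
    grid = list(product(comparisons, lengths))
    if not 0 <= task_id < len(grid):
        raise IndexError(f"task_id {task_id} outside 0..{len(grid) - 1}")
    comparison, length = grid[task_id]
    # Directory name follows the pattern given in the task description.
    chop_dir = f"chopped_paf_qgrid_l{length}_o0"
    return comparison, length, chop_dir


def array_spec(comparisons=COMPARISONS, lengths=CHOP_LENGTHS):
    """Index range to pass to sbatch --array so every cell gets one task."""
    return f"0-{len(comparisons) * len(lengths) - 1}"
```

Each array task would read its own index, call task_for_index, and always chop from the raw PAF for that comparison, which keeps the "no recursive chopping" rule easy to audit.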

Validation/audit:

  • Validate gzip outputs with pigz -t.
  • Write sha256 files.
  • Record Slurm job IDs, hosts, commands, binary paths, binary sha256, chop mode, chop length, threads, scratch dir, and completion status in summary TSVs.
  • Add a small TSV proving shifted raw mappings on the same query are now cut on shared query-grid boundaries.
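
The shared-boundary proof can be illustrated with a small sketch. It assumes that query-grid mode cuts at multiples of the chop length in query coordinates, while the older row-start mode counts chunks from each mapping's own start; that reading is an assumption, and the function names and TSV columns below are hypothetical, not pafchop-rs output.

```python
def row_start_cuts(q_start, q_end, length):
    """Interior cut points when chunks are counted from the mapping's own start."""
    return list(range(q_start + length, q_end, length))


def query_grid_cuts(q_start, q_end, length):
    """Interior cut points at multiples of `length` in query coordinates (assumed behavior)."""
    first = (q_start // length + 1) * length  # first grid line strictly after q_start
    return list(range(first, q_end, length))


def proof_rows(query, mappings, length):
    """Build TSV lines comparing both cut schemes for mappings on one query."""
    header = ("query", "q_start", "q_end", "chop_length",
              "row_start_cuts", "query_grid_cuts")
    rows = [header]
    for q_start, q_end in mappings:
        rows.append((
            query, q_start, q_end, length,
            ",".join(map(str, row_start_cuts(q_start, q_end, length))),
            ",".join(map(str, query_grid_cuts(q_start, q_end, length))),
        ))
    return ["\t".join(map(str, row)) for row in rows]


# Two shifted mappings on the same (hypothetical) query sequence.
for line in proof_rows("chrQ", [(1200, 26000), (3700, 24500)], 10000):
    print(line)
```

Under these assumptions the row-start cuts differ between the two mappings (11200,21200 versus 13700,23700), while the query-grid cuts coincide at 10000,20000, which is the property the audit TSV is meant to demonstrate.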

Acceptance criteria:

  • All required comparison x length filtered outputs exist and validate.
  • The manifest clearly distinguishes query-grid outputs from older row-start outputs.
  • No heavy command is run on the head node except submission/inspection.
  • Commit message follows repo convention: feat: fig5-f16-query-grid-chop-filter-rerun (agent-NNN)
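
The "exist and validate" criterion can be sketched as a small audit helper. This is an illustration only: the task itself requires pigz -t for validation, and the streaming read through Python's gzip module below is a stand-in integrity check. The helper names and the returned row layout are hypothetical.

```python
import gzip
import hashlib
import zlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PAFs are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_is_valid(path, chunk_size=1 << 20):
    """Decompress the whole stream; corrupt or truncated files raise."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


def audit_outputs(paths):
    """Return (path, exists, valid) rows and write a .sha256 next to valid files."""
    rows = []
    for p in map(Path, paths):
        exists = p.is_file()
        valid = exists and gzip_is_valid(p)
        if valid:
            # "<hash>  <name>" is the layout that sha256sum -c accepts.
            Path(str(p) + ".sha256").write_text(f"{sha256_of(p)}  {p.name}\n")
        rows.append((str(p), exists, valid))
    return rows
```

Like the chopping and filtering, an audit over whole-genome outputs would belong in a Slurm job rather than on the head node.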

Depends on

Required by

Log