fig5-f16-query-grid-chop-filter-rerun — octopus01:/moosefs/erikg/phrs

Metadata

Status	done
Assigned	`agent-2700`
Agent identity	`46f6237a65ec4f1002c4d3fb201dc8633638d0947c276be7008c227e1051ba5e`
Created	2026-06-23T16:00:34.917345260+00:00
Started	2026-06-23T16:21:27.492548870+00:00
Completed	2026-06-24T07:15:02.013350950+00:00
Tags	`fig5`, `sweepga`, `slurm`, `query-grid`, `pafchop`, `eval-scheduled`
Eval score	0.90
└ hallucination rate	0.22
└ requirement coverage	0.84
└ semantic match	0.97
└ specificity match	0.91

Description

Rerun the Fig5 raw FASTA SweepGA f16 chopped/filter sensitivity matrix using query-grid chopped PAFs, on Slurm only.

Dependency: use the updated pafchop-rs from fig5-pafchop-query-grid-mode. Do not run heavy chopping/filtering on the head node. Use /dev/shm scratch for SweepGA temp files and pigz for compression/decompression.

Required run shape:

- Source inputs are the whole-genome raw f16 many:many PAFs under paper_prep/_brainstorming/pedigree_whole_genome_sweepga_fastga_frequency16/raw_paf.
- Chop from raw for each requested length; do not recursively chop a previously chopped PAF.
- Use query-grid mode explicitly.
- Use at least the chop lengths 10000, 5000, and 2000. Include 1000 if runtime is reasonable and Slurm resources permit.
- Use output names/directories distinct from the old row-start chunks, e.g. chopped_paf_qgrid_l{N}_o0 and filtered_paf_chop_sensitivity_query_grid, so old results remain inspectable.
- Run SweepGA filtering on the query-grid chopped PAFs with: --num-mappings 1:1 --scaffold-jump 0 --scoring ani --overlap 0.
- Use cluster parallelism. Prefer job arrays across comparison x chop length, with enough CPUs per task to keep pafchop/pigz useful.
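
The difference between the old row-start chunks and query-grid chunks can be illustrated with a minimal sketch. It assumes that row-start mode cuts every `length` bases counted from each mapping's own query start, and that query-grid mode cuts at absolute multiples of `length` on the query axis. Both function names are hypothetical and this is not the pafchop-rs implementation; only query intervals are computed, and projecting cuts onto the target is not shown.

```python
def chop_row_start(qstart, qend, length):
    """Assumed older behaviour: cut points are offsets from the row's own start."""
    return [(s, min(s + length, qend)) for s in range(qstart, qend, length)]


def chop_query_grid(qstart, qend, length):
    """Assumed query-grid behaviour: cut at absolute multiples of `length`."""
    pieces = []
    s = qstart
    while s < qend:
        # Next grid line strictly after s, clipped to the end of the mapping.
        e = min((s // length + 1) * length, qend)
        pieces.append((s, e))
        s = e
    return pieces


def internal_cuts(pieces):
    """Boundaries introduced by chopping (excludes the mapping's own ends)."""
    return {end for _, end in pieces[:-1]}


# Two hypothetical raw mappings on the same query, shifted relative to each other.
mapping_a = (1200, 26000)
mapping_b = (3700, 24500)
N = 10000

row_a = internal_cuts(chop_row_start(*mapping_a, N))
row_b = internal_cuts(chop_row_start(*mapping_b, N))
grid_a = internal_cuts(chop_query_grid(*mapping_a, N))
grid_b = internal_cuts(chop_query_grid(*mapping_b, N))

print("row-start shared cuts:", sorted(row_a & row_b))
print("query-grid shared cuts:", sorted(grid_a & grid_b))
```

Under these assumptions the row-start cuts of the two mappings never coincide, while the query-grid cuts fall on the same coordinates, which is the property the boundary audit TSV is meant to demonstrate.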

Validation/audit:

- Validate gzip outputs with pigz -t.
- Write sha256 files.
- Record Slurm job IDs, hosts, commands, binary paths, binary sha256, chop mode, chop length, threads, scratch dir, and completion status in summary TSVs.
- Add a small TSV proving shifted raw mappings on the same query are now cut on shared query-grid boundaries.
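
As a rough illustration of the first two audit steps, the sketch below uses only the Python standard library as a stand-in for `pigz -t` and `sha256sum`; the task itself calls for pigz. The helper names and the `.sha256` sidecar naming are assumptions, not the wrappers used in this run.

```python
import gzip
import hashlib
import zlib
from pathlib import Path


def gzip_ok(path):
    """Decompress the whole stream and discard it; True if it is intact."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(1 << 20):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # BadGzipFile is an OSError; truncation surfaces as EOFError.
        return False


def write_sha256_sidecar(path):
    """Write '<hex digest>  <file name>' next to the file and return the sidecar path."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    sidecar = path.with_name(path.name + ".sha256")
    sidecar.write_text(f"{digest.hexdigest()}  {path.name}\n")
    return sidecar
```

The sidecar line uses the two-space layout that `sha256sum -c` accepts, so a file written this way can be checked later with the standard tool.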

Acceptance criteria:

- All required comparison x length filtered outputs exist and validate.
- The manifest clearly distinguishes query-grid outputs from older row-start outputs.
- No heavy command is run on the head node except submission/inspection.
- Commit message follows repo convention: feat: fig5-f16-query-grid-chop-filter-rerun (agent-NNN)
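
One way to enumerate the comparison x length matrix, both for mapping a Slurm array task index to a cell and for checking that every required cell has completed, is sketched below. The comparison labels are placeholders and the row-major index order is an assumption; the ordering actually used by the submitted array is not stated here. Only the `chopped_paf_qgrid_l{N}_o0` naming comes from the task description.

```python
from itertools import product

COMPARISONS = ["cmp_a", "cmp_b", "cmp_c"]  # hypothetical comparison labels
REQUIRED_LENGTHS = [10000, 5000, 2000]
OPTIONAL_LENGTHS = [1000]


def array_cells(lengths):
    """Row-major (comparison, length) cells; list index = array task index (assumed)."""
    return list(product(COMPARISONS, lengths))


def chop_dir(length):
    """Output directory name for one chop length, following the naming in the task."""
    return f"chopped_paf_qgrid_l{length}_o0"


def missing_required(done):
    """Required cells that are absent from the set of completed cells."""
    return [cell for cell in array_cells(REQUIRED_LENGTHS) if cell not in done]
```

Inside a worker, the index would typically be read from the `SLURM_ARRAY_TASK_ID` environment variable and passed to `array_cells(REQUIRED_LENGTHS + OPTIONAL_LENGTHS)[index]`. Keeping the optional 1000 bp cells in the array but out of `missing_required` lets them be cancelled without failing the acceptance check.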

Depends on

Required by

Log

2026-06-23T16:20:36.978187732+00:00 Lightweight assignment: agent=random-agent-46f6237a (46f6237a), exec_mode=full, context_scope=task, reason=This is primarily Slurm orchestration plus validation/audit, which matches a Pipeline Operator better than a general programmer or reviewer.
2026-06-23T16:21:27.492552747+00:00 Spawned by coordinator --executor codex --model gpt-5.5
2026-06-23T16:21:43.420350197+00:00 Starting query-grid f16 chop/filter rerun orchestration; inspecting existing scripts and inputs first
2026-06-23T16:28:08.166477372+00:00 Prepared Slurm-only query-grid chop/filter scripts; submitting 3 comparisons x 4 lengths array from raw f16 PAFs
2026-06-23T17:04:28.194403533+00:00 Observed first qgrid output valid but worker status pipeline used subshell; patched worker and collector to preserve/derive OK status
2026-06-23T18:36:52.489535242+00:00 Cancelled optional 1000 bp Slurm array elements after >2h runtime on first 1kb row; continuing required 10000/5000/2000 lengths
2026-06-23T22:40:14.891122063+00:00 Resubmitted final required PAN028 l2000 cell as job 1706270 with pigz -1 and 16 CPUs after original element 1706227_11 remained in chop compression after 4h
2026-06-24T07:12:22.993759895+00:00 Validated: 9 required query-grid comparison x length outputs (3 comparisons x 10000/5000/2000) exist, pigz -t passed, sha256 sidecars/manifests written; optional 1000 bp cells cancelled for runtime
2026-06-24T07:12:54.504849302+00:00 Validated: bash -n wrappers, python3 -m py_compile collector/boundary audit, cargo test for pafchop-rs passed
2026-06-24T07:13:33.837509148+00:00 Committed: 3373bc3 — feat: fig5-f16-query-grid-chop-filter-rerun (agent-2700)
2026-06-24T07:14:22.674942626+00:00 Committed and pushed: 3373bc3 — feat: fig5-f16-query-grid-chop-filter-rerun (agent-2700)
2026-06-24T07:15:02.013360028+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
2026-06-24T07:26:59.381556226+00:00 PendingEval → Done (evaluator passed; downstream unblocks)