fig5-pafchop-query-grid-mode

Fig5 pafchop query-grid chunk mode

Metadata

Status: done
Assigned: agent-2697
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created: 2026-06-23T15:59:46.652454058+00:00
Started: 2026-06-23T16:01:02.117273252+00:00
Completed: 2026-06-23T16:13:26.626008932+00:00
Tags: fig5, pafchop, rust, sweepga, query-grid, eval-scheduled
Eval score: 0.84
└ blocking impact: 0.95
└ completeness: 0.82
└ constraint fidelity: 0.85
└ coordination overhead: 0.85
└ correctness: 0.78
└ downstream usability: 0.86
└ efficiency: 0.91
└ intent fidelity: 0.80
└ style adherence: 0.93

Description

Implement absolute query-coordinate grid chopping for pafchop-rs and wire the f16 chopping wrappers to use it explicitly.

Problem: pafchop currently chops each PAF row starting from that row's own q_start. Two alignments that cover the same query interval but start at different q_start positions are therefore chopped out of phase with each other, so their 2 kb / 5 kb / 10 kb chunks are offset and do not line up. For Fig5 identity-per-chunk SweepGA filtering we need chunks aligned to query-space coordinates, e.g. boundaries at 0, N, 2N, 3N, ... on each query contig, so that competing mappings are compared over the same query slices.
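
To make the phase problem concrete, here is a minimal sketch of row-start chunking on query coordinates only. The function name `row_start_chunks` and the `Interval` alias are hypothetical and are not taken from pafchop-rs; the real tool also clips the CIGAR and recomputes the target-side columns, which this sketch ignores.

```rust
/// Half-open query interval [start, end). Hypothetical alias for illustration.
type Interval = (u64, u64);

/// Row-start chunking (assumed behavior, per the problem statement):
/// boundaries are offsets from the row's own q_start, so the phase of the
/// chunks depends on where each alignment happens to begin.
fn row_start_chunks(q_start: u64, q_end: u64, length: u64) -> Vec<Interval> {
    assert!(length > 0, "chunk length must be positive");
    let mut chunks = Vec::new();
    let mut start = q_start;
    while start < q_end {
        // Each chunk is `length` long, except a shorter final chunk.
        let end = (start + length).min(q_end);
        chunks.push((start, end));
        start = end;
    }
    chunks
}
```

With length 10, a row spanning 7-27 yields 7-17 and 17-27, while a row spanning 12-32 on the same query yields 12-22 and 22-32. The two rows overlap on 12-27 but share no identical chunk, so there is no common slice on which to compare their identities.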

Required implementation:

  • Add an explicit query-grid mode to pafchop-rs. Acceptable CLI shape: --chunk-mode query-grid / --grid query / --query-grid; choose a clear name and document it in --help.
  • In query-grid mode, for --length N --overlap 0, emit chunk intervals equal to [kN,(k+1)N) intersected with the PAF row query interval. Example: q_start=7 q_end=27 length=10 => 7-10, 10-20, 20-27. Two rows on the same query overlapping 10-20 must produce that exact 10-20 chunk.
  • Preserve the existing exact CIGAR clipping and recompute PAF columns 2/3/7/8/9/10 (0-based: query start/end, target start/end, residue matches, alignment block length) plus cg/cs/NM/dv/de/df exactly. No interpolation.
  • Preserve threaded behavior and input-order deterministic output.
  • Add tests covering shifted q_start grid boundaries, reverse strand, CIGAR crossing grid boundaries, and threaded output equals sequential output in query-grid mode.
  • Add summary/provenance fields so outputs distinguish row-start chunking from query-grid chunking.
  • Update the f16 relevant wrappers to call query-grid mode explicitly. Do not overwrite existing row-start chopped outputs silently; use a distinct directory/name convention if needed.
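
The interval rule for --length N --overlap 0 can be sketched as follows. This is only an illustration of the query-side arithmetic under stated assumptions: the name `query_grid_chunks` is hypothetical, coordinates are taken to be 0-based half-open as in PAF, and the exact CIGAR clipping and column/tag recomputation that the task requires are left out.

```rust
/// Query-grid chunking sketch (hypothetical helper, not the pafchop-rs API).
/// Returns the grid cells [k*N, (k+1)*N) intersected with the row's
/// half-open query interval [q_start, q_end), in increasing order.
fn query_grid_chunks(q_start: u64, q_end: u64, length: u64) -> Vec<(u64, u64)> {
    assert!(length > 0, "chunk length must be positive");
    let mut chunks = Vec::new();
    let mut start = q_start;
    while start < q_end {
        // First absolute grid boundary strictly greater than `start`.
        let next_boundary = (start / length + 1) * length;
        // Clip the final chunk to the end of the alignment.
        let end = next_boundary.min(q_end);
        chunks.push((start, end));
        start = end;
    }
    chunks
}
```

For q_start=7, q_end=27, length=10 this gives 7-10, 10-20, 20-27, matching the example in the requirements, and a second row spanning 3-25 also produces the exact 10-20 chunk. Because PAF reports query start and end on the forward strand of the query even for reverse-strand rows, the query-side intervals are the same for both strands; only the projection onto the target, which needs the CIGAR walk, runs in the opposite direction.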

Validation commands:

  • cargo test --manifest-path paper_prep/_brainstorming/pafchop-rs/Cargo.toml
  • cargo build --release --manifest-path paper_prep/_brainstorming/pafchop-rs/Cargo.toml

Acceptance criteria:

  • Tests pass.
  • Existing row-start behavior remains available or backwards-compatible unless deliberately documented.
  • Fig5 f16 wrappers use query-grid mode explicitly for future reruns.
  • Commit message follows repo convention: feat: fig5-pafchop-query-grid-mode (agent-NNN)

Depends on

Required by

Log