redo-sweepga-paf-filter-identity-scoring-audit

Redo sweepGA PAF filter identity scoring audit

Metadata

Status: done
Assigned: agent-2671
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Created: 2026-06-22T19:55:05.683616485+00:00
Started: 2026-06-22T19:56:31.296357774+00:00
Completed: 2026-06-22T20:06:05.941224315+00:00
Tags: sweepga, paf, validation, scoring, fig5, eval-scheduled
Eval score: 0.94
└ blocking impact: 0.96
└ completeness: 0.95
└ constraint fidelity: 0.85
└ coordination overhead: 0.90
└ correctness: 0.96
└ downstream usability: 0.94
└ efficiency: 0.90
└ intent fidelity: 0.83
└ style adherence: 0.94

Description

Replacement for the invalid task audit-sweepga-paf-filter-identity-scoring, which was marked done without producing the required audit deliverables.

Goal: prove exactly how the currently installed /home/erikg/.cargo/bin/sweepga scores and filters PAF records when applying --num-mappings 1:1 / many:many, especially whether --scoring ani ranks by per-chunk identity rather than raw length or length*identity.
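
The three candidate ranking rules named in the goal can be written down as a minimal sketch, which helps when choosing fixtures that tell them apart. These are hypotheses to test, not sweepGA's confirmed behaviour. The `Chunk` class, the `RULES` table and the chunk names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One competing PAF record, reduced to the two fields that define identity."""
    name: str
    matches: int    # PAF col10: number of residue matches
    block_len: int  # PAF col11: alignment block length

    @property
    def identity(self) -> float:
        return self.matches / self.block_len


# Candidate scoring rules. These are hypotheses only; which one the installed
# binary actually applies is what the audit has to establish.
RULES = {
    "identity": lambda c: c.identity,
    "length": lambda c: c.block_len,
    "length*identity": lambda c: c.block_len * c.identity,
}


def predicted_winners(chunks):
    """Return the name of the top-scoring chunk under each candidate rule."""
    return {rule: max(chunks, key=score).name for rule, score in RULES.items()}


# Pair A: separates "identity" from the two length-driven rules.
pair_a = [Chunk("short_clean", 990, 1000), Chunk("long_noisy", 1800, 2000)]
# Pair B: separates "length" from "length*identity".
pair_b = [Chunk("short_clean", 990, 1000), Chunk("longer_poor", 900, 1200)]

for label, pair in (("A", pair_a), ("B", pair_b)):
    print(label, predicted_winners(pair))
```

Equal-length competitors cannot separate identity from length*identity, because both rules then pick the same record. That is why the unequal-length pairs carry most of the discriminating power in this sketch.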

Required work:

  • Inspect the actual sweepGA binary/help/version and source if available; record the executable path and version/hash where possible.
  • Build minimal synthetic PAF fixtures with equal-length and unequal-length competing chunks, with recomputed col10/col11 fields (residue matches / alignment block length, whose ratio is the identity) and cg:Z tags where sweepGA expects them.
  • Run sweepGA locally only on tiny fixtures, never on whole-genome data from the head node.
  • Explicitly test default scoring and --scoring ani behavior for --num-mappings 1:1 and many:many if supported.
  • Confirm whether filtering is per PAF row/chunk, whether any trivial merging/chaining occurs inside sweepGA before scoring, and which PAF columns/tags affect the score.
  • Write the required Markdown audit and TSV summary. The task is not complete without both deliverables.
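
A fixture row for the steps above could be generated along these lines. This is a sketch under stated assumptions: alignments are ungapped, the CIGAR uses extended `=`/`X` operators, and sequence names, lengths and the `paf_row` helper are hypothetical. Whether sweepGA reads `cg:Z` in this form, or at all, is one of the things the audit has to confirm.

```python
def paf_row(qname, qstart, qend, tname, tstart, tend, mismatches,
            qlen=10000, tlen=10000, strand="+", mapq=60):
    """Build one tab-separated PAF line for an ungapped synthetic alignment.

    col10 (matches) and col11 (block length) are recomputed from the
    coordinates and the mismatch count so they stay consistent with cg:Z.
    """
    block_len = qend - qstart
    if tend - tstart != block_len:
        raise ValueError("ungapped fixture needs equal query and target spans")
    if not 0 <= mismatches < block_len:
        raise ValueError("mismatches must be in [0, block_len)")
    matches = block_len - mismatches
    # Assumption: extended CIGAR, all matches first, then all mismatches.
    cigar = f"{matches}=" + (f"{mismatches}X" if mismatches else "")
    fields = [qname, qlen, qstart, qend, strand,
              tname, tlen, tstart, tend,
              matches, block_len, mapq, f"cg:Z:{cigar}"]
    return "\t".join(str(f) for f in fields)


# Two chunks competing for the same query interval (hypothetical names).
fixture = [
    paf_row("q1", 0, 1000, "tA", 0, 1000, mismatches=10),    # identity 0.99
    paf_row("q1", 0, 1000, "tB", 0, 1000, mismatches=100),   # identity 0.90
]
print("\n".join(fixture))
```

Keeping the row builder this small makes it easy to verify by eye that col10/col11 equals the intended identity before any sweepGA run.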

Acceptance criteria:

  • SWEEPGA_PAF_FILTER_IDENTITY_AUDIT.md states the exact safe command line to use for the f16 chopped rerun, or states that no safe command exists.
  • sweepga_paf_filter_identity_audit.tsv has one row per synthetic fixture/test with command, expected winner, observed winner, pass/fail, and interpretation.
  • The audit explicitly says whether the downstream Fig5 f16 validated chop rerun may proceed.
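
The TSV deliverable described in the acceptance criteria could be assembled as in the sketch below. The column names, the `summary_tsv` helper and the example record are assumptions chosen for illustration; the command string is a placeholder, not a verified sweepGA invocation, and the example values are not real results.

```python
import csv
import io

# Assumed column names, following the fields listed in the acceptance criteria.
COLUMNS = ["fixture", "command", "expected_winner",
           "observed_winner", "pass_fail", "interpretation"]


def summary_tsv(results):
    """Render one TSV row per fixture/test, deriving pass/fail from the winners."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for result in results:
        row = dict(result)
        same = row["expected_winner"] == row["observed_winner"]
        row["pass_fail"] = "pass" if same else "fail"
        writer.writerow(row)
    return buf.getvalue()


# Placeholder record showing the shape only; values are illustrative.
example = [{
    "fixture": "equal_length_pair",
    "command": "<command under test>",
    "expected_winner": "tA",
    "observed_winner": "tA",
    "interpretation": "<filled in after the run>",
}]
print(summary_tsv(example), end="")
```

Deriving `pass_fail` from the two winner columns, rather than typing it by hand, keeps the verdict consistent with the recorded observation.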

Depends on

Required by

Log