direct-sweepga-parental

Direct sweepGA parental haplotype concordance

Metadata

Status: done
Assigned: agent-2594
Created: 2026-06-20T14:10:33.897376503+00:00
Started: 2026-06-20T14:12:18.544270006+00:00
Completed: 2026-06-20T14:25:23.082734802+00:00
Tags: eval-scheduled
Eval score: 0.89
└ blocking impact: 0.93
└ completeness: 0.89
└ constraint fidelity: 0.55
└ coordination overhead: 0.86
└ correctness: 0.91
└ downstream usability: 0.88
└ efficiency: 0.85
└ intent fidelity: 0.89
└ style adherence: 0.92

Description

Run a Slurm-backed direct sweepGA alignment/concordance analysis to test whether direct haplotype-to-parent-haplotype alignments recover the same inheritance/recombination structure as the graph/odgi-untangle results.

Scientific objective:

  • Check whether direct sweepGA/fastGA alignment of child haplotypes against the two haplotypes of the relevant parent matches the graph-derived results.
  • Treat graph/untangle outputs as the comparison target, not as something to overwrite: compare direct PAF signals to paper_prep/_brainstorming/fig5_synteny_recombination_schematic/event_manifest.tsv, selected_segments.tsv, and the earlier strict sweepGA/untangle outputs such as paper_prep/_brainstorming/fig5_sweepga_1to1_redraw/conservative_segments.tsv and pedigree_native_untangle_agent2556_slurm/.

Comparisons to run first:

  • PAN027 paternal haplotype/product versus PAN011 hap1 and PAN011 hap2.
  • PAN027 maternal haplotype/product versus PAN010 hap1 and PAN010 hap2.
  • PAN028 maternal haplotype/product versus PAN027 hap1 and PAN027 hap2.
  • Inspect the manifest/prior query lists and add only directly relevant transmitting-parent comparisons if another one is clearly required.

Execution requirements:

  • Do not run heavy alignments on the login/head node. Use Slurm sbatch jobs and run comparisons in parallel.
  • Start with unfiltered sweepGA output using -n many:many -j 0 (or the equivalent long form, --num-mappings many:many --scaffold-jump 0). If the installed sweepGA does not support exactly that spelling, determine the correct current-main spelling and record it.
  • Use /dev/shm or per-job local scratch as TMPDIR for sweepGA if needed, and clean it up in job epilog/trap.
  • Check the installed sweepGA version/commit and whether it is current enough for many:many/no-scaffold behavior. If an update is needed, build/update it in the established local style and record the exact binary path and commit/version used.
  • Reuse existing pedigree source data/paths where possible; do not invent reference-projected coordinates. If FASTA extraction from the graph/window FASTA is needed, script it reproducibly.
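As a rough illustration of the Slurm and scratch-cleanup requirements above, the following sketch renders one sbatch script per comparison. The function name render_sbatch, its parameters, and the log-file naming are hypothetical and not part of the task. The alignment command is supplied by the caller so that the sweepGA flag spelling can be verified against the installed binary first.

```python
import shlex
from typing import Sequence


def render_sbatch(job_name: str, align_cmd: Sequence[str], out_paf: str,
                  cpus: int = 8, scratch_root: str = "/dev/shm") -> str:
    """Return the text of an sbatch script (hypothetical helper).

    align_cmd is passed in by the caller; this sketch does not assume any
    particular sweepGA argument order or flag spelling.
    """
    cmd = " ".join(shlex.quote(arg) for arg in align_cmd)
    return f"""#!/bin/bash
#SBATCH --job-name={job_name}
#SBATCH --cpus-per-task={cpus}
#SBATCH --output={job_name}.%j.log
set -euo pipefail
# Per-job scratch directory, removed on any exit (success or failure).
export TMPDIR="$(mktemp -d -p {shlex.quote(scratch_root)})"
trap 'rm -rf "$TMPDIR"' EXIT
{cmd} > {shlex.quote(out_paf)}
"""


if __name__ == "__main__":
    # Placeholder command; replace with the verified sweepGA invocation.
    print(render_sbatch("pan027_pat_vs_pan011", ["echo", "placeholder"], "out.paf"))
```

Writing each rendered script to disk and submitting it with sbatch keeps the heavy work off the head node, and the EXIT trap covers cleanup even when the alignment fails.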

Filtering/configuration matrix:

  • Preserve raw unfiltered many:many/no-scaffold PAFs as first-class artifacts.
  • Then run or derive a small filter matrix comparable to prior analysis: 1:1 no-scaffold, 1:many, 2:many, 4:many or equivalent supported sweepGA configurations, plus simple PAF filters for identity/length/query coverage as needed.
  • Keep the filter scripts parameterized so we can add/remove thresholds without rerunning expensive alignment when possible.
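A parameterized filter over the raw PAF might look like the sketch below, which applies identity, alignment-length, and per-record query-coverage thresholds without rerunning alignment. The names PafFilter and filter_paf are illustrative assumptions. Identity is approximated here as the PAF matches column divided by the alignment block length, which may differ from whatever identity measure the prior strict analysis used.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class PafFilter:
    """Thresholds for one cell of the filter matrix (hypothetical names)."""
    min_identity: float = 0.0   # matches / alignment block length
    min_aln_len: int = 0        # PAF column 11
    min_query_cov: float = 0.0  # (qend - qstart) / qlen, per record


def filter_paf(lines: Iterable[str], f: PafFilter) -> Iterator[str]:
    """Yield PAF lines that pass every threshold in f."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a complete PAF record
        qlen, qstart, qend = int(cols[1]), int(cols[2]), int(cols[3])
        nmatch, aln_len = int(cols[9]), int(cols[10])
        if qlen == 0 or aln_len == 0:
            continue
        identity = nmatch / aln_len
        query_cov = (qend - qstart) / qlen
        if (identity >= f.min_identity
                and aln_len >= f.min_aln_len
                and query_cov >= f.min_query_cov):
            yield line
```

Because the thresholds live in one small object, adding or removing a matrix cell amounts to constructing another PafFilter and re-reading the preserved raw PAF.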

Deliverables:

  • Create a new scratch package under paper_prep/_brainstorming/pedigree_direct_sweepga_concordance/.
  • Include runnable scripts/configs for input discovery/preparation, Slurm submission, sweepGA execution, filtering, and summarization.
  • Write a README.md explaining inputs, commands, job IDs, output files, sweepGA version, and how to resume/check jobs.
  • Produce raw and filtered PAF outputs, compressed where appropriate, plus concise summary TSVs.
  • Produce a concordance table saying, for each graph-derived candidate segment/event, whether direct sweepGA supports the same query interval/local window, parent haplotype, target arm, and broad role (same-chr context, PAR1 positive control, primary PHR donor, side fragment).
  • If the direct signal is clear, generate review-only full-genome and focused PDFs/SVGs in the same brainstorming directory. Do not modify submission/ or manuscript figures.
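One way to fill the concordance table is to ask, for each graph-derived segment, how much of its query interval is covered by direct hits to each parent haplotype. The sketch below is a minimal version under stated assumptions: the Segment and Hit fields, the 0.5 coverage threshold, and the three labels are all hypothetical, and the real event_manifest.tsv and selected_segments.tsv columns may differ. It checks only the query interval and parent haplotype, not the target arm or broad role.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Segment(NamedTuple):
    """Graph-derived candidate segment (assumed fields)."""
    query: str
    start: int
    end: int
    parent_hap: str  # haplotype the graph result assigns


class Hit(NamedTuple):
    """Direct alignment hit in query coordinates (assumed fields)."""
    query: str
    start: int
    end: int
    parent_hap: str


def covered_fraction(seg: Segment, hits: Sequence[Hit]) -> Dict[str, float]:
    """Fraction of seg covered by the union of hits, per parent haplotype."""
    seg_len = seg.end - seg.start
    clipped: Dict[str, List[Tuple[int, int]]] = {}
    for h in hits:
        lo, hi = max(seg.start, h.start), min(seg.end, h.end)
        if h.query == seg.query and lo < hi:
            clipped.setdefault(h.parent_hap, []).append((lo, hi))
    result: Dict[str, float] = {}
    for hap, ivs in clipped.items():
        total, cur_lo, cur_hi = 0, None, None
        for lo, hi in sorted(ivs):  # merge overlapping intervals
            if cur_hi is None or lo > cur_hi:
                if cur_hi is not None:
                    total += cur_hi - cur_lo
                cur_lo, cur_hi = lo, hi
            else:
                cur_hi = max(cur_hi, hi)
        total += cur_hi - cur_lo
        result[hap] = total / seg_len if seg_len > 0 else 0.0
    return result


def classify(seg: Segment, hits: Sequence[Hit], min_frac: float = 0.5) -> str:
    """Label a segment as 'agrees', 'disagrees', or 'inconclusive'."""
    cov = covered_fraction(seg, hits)
    expected = cov.get(seg.parent_hap, 0.0)
    other = max((v for k, v in cov.items() if k != seg.parent_hap), default=0.0)
    if expected >= min_frac and expected > other:
        return "agrees"
    if other >= min_frac and other > expected:
        return "disagrees"
    return "inconclusive"
```

Merging overlapping hits before summing avoids overcounting in many:many output, where several alignments can stack on the same query bases.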

Validation

  • All heavy sweepGA runs are submitted through Slurm, not executed on the head node.
  • Raw unfiltered many:many/no-scaffold PAFs exist for the required comparisons, or a README records exact job IDs/status if still running.
  • At least one filtered configuration comparable to the prior strict analysis is produced, with scripts to generate the rest.
  • Summary/concordance TSVs compare direct sweepGA outputs against the graph/untangle candidate tables.
  • The report explicitly says where direct sweepGA agrees with, disagrees with, or is inconclusive relative to the graph results.
  • All coordinates are native assembly/window coordinates unless explicitly documented otherwise.
  • No manuscript/submission files are edited.

Depends on

Required by

Messages 2 messages (2 unread)

  1. #1 user 2026-06-20T14:13:51.768866960+00:00 (read)
    User clarification: if direct haplotype-to-parent sweepGA is cleaner than graph/untangle for the candidate events, it can become the primary evidence source for Fig5/pedigree. Please structure outputs so this can be decided: preserve raw direct PAFs, compare directly to graph-derived selected_segments/event_manifest, and make clear which source should be primary per event. Heavy runs still must go through Slurm.
  2. #2 direct-sweepga-parental 2026-06-20T14:16:13.376211211+00:00 (read)
    Acknowledged — I will preserve raw direct PAFs, keep graph tables as the explicit comparison target, and add per-event evidence-source recommendation fields so downstream tasks can decide whether direct sweepGA should become primary for Fig5/pedigree.

Log