direct-sweepga-parental — octopus01:/moosefs/erikg/phrs

Metadata

Status	done
Assigned	`agent-2594`
Created	2026-06-20T14:10:33.897376503+00:00
Started	2026-06-20T14:12:18.544270006+00:00
Completed	2026-06-20T14:25:23.082734802+00:00
Tags	`eval-scheduled`
Eval score	0.89
└ blocking impact	0.93
└ completeness	0.89
└ constraint fidelity	0.55
└ coordination overhead	0.86
└ correctness	0.91
└ downstream usability	0.88
└ efficiency	0.85
└ intent fidelity	0.89
└ style adherence	0.92

Description

Run a Slurm-backed direct sweepGA alignment/concordance analysis to test whether direct haplotype-to-parent-haplotype alignments recover the same inheritance/recombination structure as the graph/odgi-untangle results.

Scientific objective:
- Check whether direct sweepGA/fastGA alignment of child haplotypes against the two haplotypes of the relevant parent matches the graph-derived results.
- Treat graph/untangle outputs as the comparison target, not as something to overwrite: compare direct PAF signals to `paper_prep/_brainstorming/fig5_synteny_recombination_schematic/event_manifest.tsv`, `selected_segments.tsv`, and the earlier strict sweepGA/untangle outputs such as `paper_prep/_brainstorming/fig5_sweepga_1to1_redraw/conservative_segments.tsv` and `pedigree_native_untangle_agent2556_slurm/`.

Comparisons to run first:
- PAN027 paternal haplotype/product versus PAN011 hap1 and PAN011 hap2.
- PAN027 maternal haplotype/product versus PAN010 hap1 and PAN010 hap2.
- PAN028 maternal haplotype/product versus PAN027 hap1 and PAN027 hap2.
- Inspect the manifest/prior query lists and add only directly relevant transmitting-parent comparisons if another one is clearly required.

Execution requirements:
- Do not run heavy alignments on the login/head node. Use Slurm `sbatch` jobs and run comparisons in parallel.
- Start with unfiltered sweepGA `-n many:many -j 0` / equivalent `--num-mappings many:many --scaffold-jump 0` output. If the installed sweepGA does not support exactly that spelling, determine the correct current-main spelling and record it.
- Use `/dev/shm` or per-job local scratch as TMPDIR for sweepGA if needed, and clean it up in job epilog/trap.
- Check the installed sweepGA version/commit and whether it is current enough for many:many/no-scaffold behavior. If an update is needed, build/update it in the established local style and record the exact binary path and commit/version used.
- Reuse existing pedigree source data/paths where possible; do not invent reference-projected coordinates. If FASTA extraction from the graph/window FASTA is needed, script it reproducibly.
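
The scratch and cleanup requirement above could be sketched in Python roughly as follows. This is a minimal illustration, not the package's runner: `job_scratch`, `build_command`, and `scratch_env` are hypothetical names, the input arguments are left to the caller because the positional layout of the installed sweepGA is not stated here, and the flags are copied from the task text and should be confirmed against the installed build.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def job_scratch(base="/tmp", prefix="sweepga_"):
    """Create a per-job scratch directory and remove it when the block exits."""
    path = tempfile.mkdtemp(prefix=prefix, dir=base)
    try:
        yield path
    finally:
        # Runs on normal exit and on exceptions; it does not run on SIGKILL.
        shutil.rmtree(path, ignore_errors=True)


def build_command(binary, input_args):
    """Append the raw many:many / no-scaffold flags named in the task text."""
    # Assumption: the long spellings are accepted by the installed binary.
    return [binary, *input_args, "--num-mappings", "many:many", "--scaffold-jump", "0"]


def scratch_env(scratch):
    """Copy the current environment and point TMPDIR at the scratch directory."""
    env = dict(os.environ)
    env["TMPDIR"] = scratch
    return env
```

A caller would wrap the alignment step in `with job_scratch() as scratch:` and pass `env=scratch_env(scratch)` to `subprocess.run`. Because a cancelled Slurm job is signalled rather than unwound, a shell `trap` in the batch script is probably still needed alongside this.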

Filtering/configuration matrix:
- Preserve raw unfiltered many:many/no-scaffold PAFs as first-class artifacts.
- Then run or derive a small filter matrix comparable to prior analysis: 1:1 no-scaffold, 1:many, 2:many, 4:many or equivalent supported sweepGA configurations, plus simple PAF filters for identity/length/query coverage as needed.
- Keep the filter scripts parameterized so we can add/remove thresholds without rerunning expensive alignment when possible.
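
One way to keep the simple PAF filters parameterized is sketched below, so thresholds can change without realigning. It assumes standard 12-column PAF (query length, start, and end in columns 2 to 4, matching bases and block length in columns 10 and 11). `PafFilter`, `passes`, and `filter_paf` are hypothetical names, and identity is approximated as matches divided by block length, which may differ from whatever identity measure the package's own filters use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PafFilter:
    """Thresholds for one row of the filter matrix; defaults keep everything."""
    min_identity: float = 0.0
    min_block_len: int = 0
    min_query_cov: float = 0.0


def passes(line, flt):
    """Return True if a single PAF record meets all thresholds."""
    cols = line.rstrip("\n").split("\t")
    qlen, qstart, qend = int(cols[1]), int(cols[2]), int(cols[3])
    matches, block_len = int(cols[9]), int(cols[10])
    # Approximate identity from the mandatory PAF columns only.
    identity = matches / block_len if block_len else 0.0
    # Per-record query coverage, not coverage aggregated across records.
    query_cov = (qend - qstart) / qlen if qlen else 0.0
    return (
        identity >= flt.min_identity
        and block_len >= flt.min_block_len
        and query_cov >= flt.min_query_cov
    )


def filter_paf(lines, flt):
    """Keep the non-empty PAF lines that pass the filter, in input order."""
    return [ln for ln in lines if ln.strip() and passes(ln, flt)]
```

Each threshold combination is then a `PafFilter` instance applied to the preserved raw PAF, so adding a matrix row costs one pass over the existing output.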

Deliverables:
- Create a new scratch package under `paper_prep/_brainstorming/pedigree_direct_sweepga_concordance/`.
- Include runnable scripts/configs for input discovery/preparation, Slurm submission, sweepGA execution, filtering, and summarization.
- Write a `README.md` explaining inputs, commands, job IDs, output files, sweepGA version, and how to resume/check jobs.
- Produce raw and filtered PAF outputs, compressed where appropriate, plus concise summary TSVs.
- Produce a concordance table saying, for each graph-derived candidate segment/event, whether direct sweepGA supports the same query interval/local window, parent haplotype, target arm, and broad role (same-chr context, PAR1 positive control, primary PHR donor, side fragment).
- If the direct signal is clear, generate review-only full-genome and focused PDFs/SVGs in the same brainstorming directory. Do not modify `submission/` or manuscript figures.
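
The per-segment concordance call could look roughly like the sketch below. It is an assumption-laden illustration: the dictionary keys (`query`, `start`, `end`, `parent_hap`, `qstart`, `qend`, `target_hap`), the 0.5 coverage threshold, and the three labels are hypothetical, and it checks only query interval and parent haplotype, not target arm or broad role.

```python
from collections import defaultdict


def covered_bp(intervals):
    """Count bases covered by a set of possibly overlapping half-open intervals."""
    total, cur_end = 0, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            total += end - start
            cur_end = end
        elif end > cur_end:
            total += end - cur_end
            cur_end = end
    return total


def classify_segment(segment, records, min_fraction=0.5):
    """Label a graph-derived segment as agrees, disagrees, or inconclusive."""
    seg_len = segment["end"] - segment["start"]
    if seg_len <= 0:
        return "inconclusive"
    by_hap = defaultdict(list)
    for rec in records:
        if rec["query"] != segment["query"]:
            continue
        # Clip each direct alignment to the graph-derived query interval.
        start = max(rec["qstart"], segment["start"])
        end = min(rec["qend"], segment["end"])
        if end > start:
            by_hap[rec["target_hap"]].append((start, end))
    if not by_hap:
        return "inconclusive"
    ranked = sorted(
        ((hap, covered_bp(ivs)) for hap, ivs in by_hap.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    best_hap, best_bp = ranked[0]
    if best_bp / seg_len < min_fraction:
        return "inconclusive"
    if len(ranked) > 1 and ranked[1][1] == best_bp:
        return "inconclusive"  # tie between parent haplotypes
    return "agrees" if best_hap == segment["parent_hap"] else "disagrees"
```

Merging intervals before counting matters for raw many:many output, where overlapping alignments would otherwise inflate coverage past the segment length.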

## Validation
- [ ] All heavy sweepGA runs are submitted through Slurm, not executed on the head node.
- [ ] Raw unfiltered many:many/no-scaffold PAFs exist for the required comparisons, or a README records exact job IDs/status if still running.
- [ ] At least one filtered configuration comparable to the prior strict analysis is produced, with scripts to generate the rest.
- [ ] Summary/concordance TSVs compare direct sweepGA outputs against the graph/untangle candidate tables.
- [ ] The report explicitly says where direct sweepGA agrees with, disagrees with, or is inconclusive relative to the graph results.
- [ ] All coordinates are native assembly/window coordinates unless explicitly documented otherwise.
- [ ] No manuscript/submission files are edited.
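
The check that raw PAFs exist and are readable could be scripted along these lines. This is a sketch only: `count_paf_records` is a hypothetical helper, not part of the package. It returns a record count so that an empty placeholder, which still decompresses cleanly, can be told apart from a real output.

```python
import gzip
import zlib


def count_paf_records(path):
    """Return the number of non-empty lines in a gzipped PAF.

    Returns None when the file is missing or the gzip stream is
    corrupt or truncated.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return sum(1 for line in handle if line.strip())
    except (OSError, EOFError, zlib.error):
        return None
```

A result of `None` flags an unreadable file, `0` flags an empty placeholder, and any positive count means the stream decompressed to the end with at least one record.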

Depends on

done .assign-direct-sweepga-parental

Required by

Messages: 2 (2 unread)

#1 user 2026-06-20T14:13:51.768866960+00:00 read

User clarification: if direct haplotype-to-parent sweepGA is cleaner than graph/untangle for the candidate events, it can become the primary evidence source for Fig5/pedigree. Please structure outputs so this can be decided: preserve raw direct PAFs, compare directly to graph-derived selected_segments/event_manifest, and make clear which source should be primary per event. Heavy runs still must go through Slurm.

#2 direct-sweepga-parental 2026-06-20T14:16:13.376211211+00:00 read

Acknowledged — I will preserve raw direct PAFs, keep graph tables as the explicit comparison target, and add per-event evidence-source recommendation fields so downstream tasks can decide whether direct sweepGA should become primary for Fig5/pedigree.

Log

2026-06-20T14:10:33.861985943+00:00 Task paused
2026-06-20T14:10:55.876094598+00:00 Task published
2026-06-20T14:12:18.544279133+00:00 Spawned by wg spawn --executor codex --model gpt-5.5
2026-06-20T14:12:39.623789260+00:00 Starting direct sweepGA concordance package: inspecting source manifests, prior sweepGA/untangle outputs, and local Slurm/sweepGA availability.
2026-06-20T14:16:08.983048627+00:00 Created direct sweepGA concordance package skeleton with comparisons config, filter matrix, Slurm runner, PAF filters, and concordance summarizer.
2026-06-20T14:16:35.711116127+00:00 Submitted raw direct sweepGA many:many/no-scaffold jobs through Slurm: 1704247-1704252.
2026-06-20T14:17:04.876403785+00:00 First Slurm submission failed before alignment due to package path inference from the sbatch spool; patched runner to export PACKAGE_DIR and resubmitted.
2026-06-20T14:18:02.198581695+00:00 Second Slurm submission reached FastGA but failed in FAtoGDB with /dev/shm TMPDIR; switched wrapper to node-local /tmp scratch, removed empty gzip placeholders, and resubmitted.
2026-06-20T14:18:59.225036236+00:00 Discovered installed sweepGA does not stream output for --output-file -; patched Slurm runner to write a scratch PAF then gzip it, removed empty placeholders, canceled leftover jobs if still active, and resubmitted.
2026-06-20T14:23:44.466511954+00:00 Validated direct sweepGA package: 6 raw PAFs, 30 filtered PAFs, concordance table with 38 rows, gzip integrity OK, Python scripts compile, no submission/manuscript files touched.
2026-06-20T14:24:28.059473579+00:00 Committed and pushed: 554917c — direct sweepGA parental concordance package.
2026-06-20T14:25:23.082743078+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
2026-06-20T14:42:56.213880427+00:00 PendingEval → Done (evaluator passed; downstream unblocks)