Metadata
| Status | done |
|---|---|
| Assigned | agent-2553 |
| Agent identity | 289ccc9f03fc7c121a5ab8d685ffd018371bcdac67ceab1d50b03e7347d29155 |
| Created | 2026-06-18T20:11:14.727622047+00:00 |
| Started | 2026-06-18T20:15:29.643716235+00:00 |
| Completed | 2026-06-18T20:27:28.510781043+00:00 |
| Tags | pedigree, untangle, recombination, eval-scheduled |
| Tokens | 2077509 in / 22374 out |
| Eval score | 0.77 |
| └ blocking impact | 0.90 |
| └ completeness | 0.66 |
| └ constraint fidelity | 0.40 |
| └ coordination overhead | 0.78 |
| └ correctness | 0.77 |
| └ downstream usability | 0.79 |
| └ efficiency | 0.88 |
| └ intent fidelity | 0.84 |
| └ style adherence | 0.82 |
Description
Objective: improve the WashU pedigree tract-calling analysis so it recovers interpretable recombination-tract candidates from odgi untangle alignments without defaulting to an arbitrary nth.best=1 projection. The key question is whether consecutive m1000 or lower-threshold untangle runs can be merged into biologically meaningful tracts when multimapping and equivalent donors are represented explicitly.
Scientific framing:
- The pedigree remains a supportive compatibility analysis, not a new headline result. Preserve candidate language.
- The aim is to measure tract lengths more honestly, especially through repeats and equivalent haplotypes, and to show when untangle is genuinely inconclusive rather than pretending a first-best donor is unique.
- The wfmash 1 kb segment length is not a hard lower bound on tract length. Treat it as a parameter of graph/seed construction, not as proof that alignments must occur in exact 1 kb increments.
Required inputs and starting points:
- Existing WashU untangle BEDs under /moosefs/guarracino/HPRCv2/PHR_III/pedigrees/washu/untangle/, especially PAN027_vs_PAN010.e50000.m1000.bed.gz, PAN027_vs_PAN011.e50000.m1000.bed.gz, and PAN028_vs_PAN027.e50000.m1000.bed.gz.
- Existing patch table: /moosefs/guarracino/HPRCv2/PHR_III/pedigrees/washu/untangle/recombination/patches.tsv.
- Existing code and reports: scripts/pedigree/patch_tract_lengths.py, scripts/pedigree/run_patch_tract_lower_merge.sh, scripts/pedigree/patch_tract_length_summary.tsv, scripts/pedigree/patch_tract_lower_merge_summary.tsv, paper_prep/_brainstorming/pedigree_patch_tract_lengths.md, and the pedigree Methods in submission/paper.tex.
- sweepga is available at /home/erikg/.cargo/bin/sweepga. Inspect sweepga --help. Its --num-mappings option may be useful for retaining n:m-best mappings if a PAF or FASTA-derived path is practical. If sweepga does not fit odgi untangle BEDs cleanly, document why and implement an equivalent interval-sweep merger instead.
Implementation requirements:
- Add a reproducible script, preferably scripts/pedigree/untangle_multimap_tracts.py, that runs from the repo root on moosefs.
- Support parameters for top-N mappings, score delta or tie epsilon, minimum segment score, maximum bridge gap, and bridge mode. Do not hard-code nth.best=1 as the only interpretation.
- Build equivalence classes per child/query interval: collect all donor/reference hits that are tied or near-tied to the best hit under the chosen threshold. Keep exact donor haplotype, chromosome arm, and any available community annotation.
- Merge adjacent/consecutive runs when donor equivalence classes are compatible. At minimum distinguish these resolvability classes: unique donor haplotype, unique donor arm with multiple haplotypes, same-community ambiguous donors, cross-community ambiguous donors, and unresolved/no-call.
- Explore merging through repeats/ambiguous segments rather than breaking every tract at a multimapping interval. Bridge only when flanking evidence remains compatible, and record the bridge length and reason. Provide sensitivity across at least two gap/bridge settings.
- Compare against the current first-best behavior: the existing high-confidence m1000 patch table and the lower-merge m0/n1 run-level summary. Report how many tracts merge, split, or become ambiguous under the multimap-aware method.
- Quantify tract-length distributions and compare them against the primate literature ranges already discussed: 22-95 bp, 318-688 bp, and 159-1376 bp. Include counts, proportions, medians, IQRs, and min/max under each parameter setting.
- Produce a small visual artifact that makes the case visible for representative Fig. 5/WashU regions: unique best segments, equivalent alternatives, bridged ambiguous/repeat intervals, and final tract calls. Use PDF/PNG or TSV plus a plotting script in paper_prep/_brainstorming/pedigree_multimap_tracts/.
- Produce a concise Markdown report explaining what was tried, whether sweepga was used or rejected, the recommended default parameters, and which claims are supported versus inconclusive.
- Only edit the manuscript if the result is robust and useful. If editing, keep it light: one Methods sentence or one cautious Results sentence. Do not promote the pedigree analysis to a headline result, do not add defensive caveats, and keep candidate/compatible-with wording.
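The equivalence-class and resolvability requirements above can be sketched as follows. This is a minimal illustration under stated assumptions, not the committed caller: the `Hit` record, its field names, the class labels, and the default thresholds are all hypothetical, and real hits would first have to be parsed from the odgi untangle BEDs.

```python
from collections import namedtuple

# Hypothetical hit record; field names are illustrative, not untangle BED columns.
Hit = namedtuple("Hit", "donor_hap donor_arm community score")

def equivalence_class(hits, tie_eps=0.01, min_score=0.0, top_n=5):
    """Return the hits tied or near-tied with the best hit for one query interval."""
    kept = sorted((h for h in hits if h.score >= min_score),
                  key=lambda h: h.score, reverse=True)[:top_n]
    if not kept:
        return []
    best = kept[0].score
    return [h for h in kept if best - h.score <= tie_eps]

def resolvability(eq_class):
    """Map an equivalence class to one of five resolvability labels."""
    if not eq_class:
        return "unresolved"
    haps = {h.donor_hap for h in eq_class}
    arms = {h.donor_arm for h in eq_class}
    comms = {h.community for h in eq_class}
    if len(haps) == 1:
        return "unique_haplotype"
    if len(arms) == 1:
        return "unique_arm_multi_haplotype"
    # Assumption: a missing community annotation (None) cannot support a
    # same-community call, so it falls through to cross-community.
    if len(comms) == 1 and None not in comms:
        return "same_community_ambiguous"
    return "cross_community_ambiguous"

hits = [
    Hit("hapA", "chr13p", "c1", 0.990),
    Hit("hapB", "chr13p", "c1", 0.985),
    Hit("hapC", "chr21p", "c1", 0.900),
]
print(resolvability(equivalence_class(hits)))
```

With the default `tie_eps`, the third hit falls outside the tie window, so the interval is reported as one arm with two candidate haplotypes rather than as a unique first-best donor.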
Expected outputs:
- scripts/pedigree/untangle_multimap_tracts.py or a clearly named equivalent.
- A summary TSV in scripts/pedigree/ with parameter settings and tract length statistics.
- A tract-level TSV with resolvability class and donor equivalence metadata.
- paper_prep/_brainstorming/pedigree_multimap_tracts.md.
- Representative visualization files under paper_prep/_brainstorming/pedigree_multimap_tracts/.
- If paper.tex is touched, rebuild submission/paper.pdf and confirm that `grep -c undefined submission/paper.log` reports 0.
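The tract length statistics expected in the summary TSV could be computed along these lines. This is a sketch only: the function name and the output column names are hypothetical, the three ranges are taken from the description, and treating each range as inclusive at both ends is an assumption.

```python
import statistics

# Primate literature ranges from the task description, in bp (assumed inclusive).
LITERATURE_RANGES_BP = [(22, 95), (318, 688), (159, 1376)]

def summarize_lengths(lengths):
    """Summarize tract lengths for one parameter setting as a flat dict."""
    lengths = sorted(lengths)
    n = len(lengths)
    if n == 0:
        return {"n": 0}
    if n >= 2:
        q1, med, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    else:
        # statistics.quantiles needs at least two data points.
        q1 = med = q3 = float(lengths[0])
    row = {"n": n, "min": lengths[0], "max": lengths[-1],
           "median": med, "q1": q1, "q3": q3}
    for lo, hi in LITERATURE_RANGES_BP:
        count = sum(lo <= x <= hi for x in lengths)
        row[f"n_{lo}_{hi}bp"] = count
        row[f"frac_{lo}_{hi}bp"] = count / n
    return row

row = summarize_lengths([50, 400, 500, 1000, 5000])
print(row)
```

The ranges overlap (318-688 bp lies inside 159-1376 bp), so the per-range counts are not mutually exclusive and their proportions need not sum to 1.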
Acceptance criteria:
- The analysis no longer silently treats first-best untangle as uniquely true when multiple equivalent donors exist.
- Consecutive m1000 and lower-threshold runs are explicitly tested for mergeability.
- Multimapping is represented as evidence/resolvability, not discarded noise.
- The report makes clear whether the result strengthens conversion-vs-crossover tract-length interpretation or remains inconclusive.
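One way to test mergeability of consecutive runs is an interval sweep over position-sorted segments, each carrying its donor equivalence class as a set. The sketch below is illustrative and assumes a hypothetical segment tuple `(start, end, donors)`; it merges adjacent segments whose donor sets intersect, bridges across no-call segments only when the flanks still share a donor, and records the bridged length. Narrowing the tract's donor set to the intersection is a design choice made here, not something the task prescribes.

```python
def merge_tracts(segments, max_gap=0, max_bridge=500):
    """Merge position-sorted (start, end, donors) segments into tracts.

    donors is a set of donor labels; an empty set marks a no-call segment.
    """
    tracts = []
    pending = False  # True while no-call segments separate us from the last tract
    for start, end, donors in segments:
        donors = set(donors)
        if not donors:
            pending = True
            continue
        cur = tracts[-1] if tracts else None
        if cur is not None:
            gap = start - cur["end"]
            shared = cur["donors"] & donors
            # Bridging over no-call sequence uses the wider max_bridge limit.
            limit = max_bridge if pending else max_gap
            if shared and gap <= limit:
                cur["end"] = end
                cur["donors"] = shared
                if pending:
                    cur["bridged_bp"] += gap
                pending = False
                continue
        tracts.append({"start": start, "end": end,
                       "donors": donors, "bridged_bp": 0})
        pending = False
    # Trailing no-call segments have no right flank and are not attached to any tract.
    return tracts

segments = [
    (0, 1000, {"hapA"}),
    (1000, 1400, {"hapA", "hapB"}),  # multimapping, still compatible with hapA
    (1400, 1700, set()),             # no-call interval
    (1700, 2500, {"hapA"}),
    (2500, 3000, {"hapC"}),
]
for bridge in (100, 500):
    print(bridge, merge_tracts(segments, max_bridge=bridge))
```

Running the same segments under two `max_bridge` values gives the kind of sensitivity comparison the description asks for: the 300 bp no-call interval is bridged at 500 but breaks the tract at 100.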
Depends on
Required by
Log
- 2026-06-18T20:13:45.083937819+00:00 Spawn failed (attempt 1/5): Invalid --timeout value. exec_mode=full, executor=codex
- 2026-06-18T20:14:11.857408685+00:00 Spawn failed (attempt 2/5): Invalid --timeout value. exec_mode=full, executor=codex
- 2026-06-18T20:14:37.331391486+00:00 Spawn failed (attempt 1/5): Invalid --timeout value. exec_mode=full, executor=codex
- 2026-06-18T20:15:03.428137481+00:00 Spawned by coordinator --executor codex --model gpt-5.5
- 2026-06-18T20:15:10.031797776+00:00 Task reset for retry from in-progress (attempt #1) — killed agent agent-2552 (PID 4605) — reason: clear invalid 1d timeout from task metadata and retry spawn
- 2026-06-18T20:15:29.643721155+00:00 Spawned by coordinator --executor codex --model gpt-5.5
- 2026-06-18T20:15:40.064977968+00:00 Starting review/implementation pass; checking prior WIP and existing pedigree artifacts
- 2026-06-18T20:16:48.683090823+00:00 Implementing multimap-aware untangle tract caller with configurable tie/top-N and bridge settings
- 2026-06-18T20:25:35.771178774+00:00 Validated: ran untangle_multimap_tracts.py on m1000 WashU BEDs; wrote tract TSV, summary TSV, Markdown report, and representative SVG/TSV
- 2026-06-18T20:26:01.036362309+00:00 Validated: python3 -m py_compile passed for the new tract caller and plot script
- 2026-06-18T20:26:55.696722660+00:00 Committed: 0b1b7ba — pushed to remote
- 2026-06-18T20:27:28.510788798+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-06-18T20:34:53.798255990+00:00 PendingEval → Done (evaluator passed; downstream unblocks)