pedigree-untangle-bedpe-paf-merge

Pedigree untangle BEDPE/PAF native merge tract analysis

Metadata

Status: done
Assigned: agent-2557
Agent identity: 46f6237a65ec4f1002c4d3fb201dc8633638d0947c276be7008c227e1051ba5e
Created: 2026-06-19T05:59:32.340547260+00:00
Started: 2026-06-19T07:30:18.630939422+00:00
Completed: 2026-06-19T07:51:18.501013729+00:00
Tags: pedigree, untangle, paf, bedpe, eval-scheduled
Eval score: 0.82
└ blocking impact: 0.82
└ completeness: 0.92
└ constraint fidelity: 0.10
└ coordination overhead: 0.72
└ correctness: 0.88
└ downstream usability: 0.75
└ efficiency: 0.73
└ intent fidelity: 0.79
└ style adherence: 0.74

Description

NARROWED OBJECTIVE: produce a decision record and minimal reproducible pipeline for native odgi untangle BEDPE/PAF + sweepGA filtering. This task must not make manuscript edits or broad biological claims.

Hard constraints:

  • Do not run heavy odgi untangle directly on the head node. Heavy untangle commands run only under Slurm sbatch.
  • The primary runnable must be a committed shell/sbatch script under scripts/pedigree/ that directly runs odgi untangle BEDPE and odgi untangle -p PAF commands. It must not be a Python --run-odgi driver.
  • Python may parse/summarize completed outputs only. It may not be the default heavy runner.
  • Use /home/erikg/.cargo/bin/sweepga rebuilt from GitHub origin/main commit 018e4ce49d2c125820e0ac50dc5feaa02d423683. Record both sweepga --version and the commit in the report.
  • Use odgi-emitted PAF directly as sweepGA input. Run a minimal test on a representative/native PAF with --num-mappings settings such as 1:many, 2:many, and 4:many, or document the exact command and incompatibility if sweepGA rejects it.
  • Do not touch submission/paper.tex.
  • Do not claim conversion-vs-crossover mechanism from this analysis. The output is a methods/provenance decision record only.
  • Do not commit large BEDPE/PAF intermediates.

Existing in-flight data:

  • Slurm job 1703959 was submitted by the prior worker for native odgi untangle output under /moosefs/erikg/phrs/pedigree_native_untangle_agent2556_slurm.
  • Those outputs may be used if they completed successfully, but the report must state that the first worker also ran a direct head-node pass before the Slurm-only constraint. Treat the Slurm outputs as the valid rerun.

Required deliverables:

  1. scripts/pedigree/run_untangle_native_merge_tracts.sbatch: explicit sbatch script with direct odgi untangle commands for BEDPE and PAF. No Python --run-odgi for heavy work.
  2. scripts/pedigree/untangle_native_merge_tracts.py or equivalent parse-only summarizer, with no heavy odgi execution path.
  3. scripts/pedigree/untangle_native_merge_summary.tsv: compact summary sufficient to compare native BEDPE/PAF and sweepGA-filtered PAF.
  4. paper_prep/_brainstorming/pedigree_native_untangle_merge.md: concise decision record answering only:
    • Were native BEDPE/PAF outputs generated on Slurm?
    • Does sweepGA accept/filter odgi-emitted PAF directly, and with what --num-mappings commands?
    • Does native odgi merge-dist or sweepGA filtering clearly improve tract calls enough to justify a later manuscript edit?
    • If not, say no and stop.
  5. Optional small representative TSV/SVG only if it directly illustrates sweepGA/native PAF equivalence or incompatibility.
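
A parse-only summarizer in the spirit of deliverable 2 might look like the sketch below. It is an illustration under stated assumptions, not the committed script: the function name summarize_paf and the toy records are hypothetical, and only the 12 mandatory PAF columns are read. It reports record count, distinct query and target names, summed query span, and the largest number of records for any single query, which is roughly the compact comparison deliverable 3 asks for between native PAF and sweepGA-filtered PAF. It never invokes odgi or sweepga.

```python
import io
from collections import Counter


def summarize_paf(handle):
    """Summarize a PAF stream using only the 12 mandatory columns.

    Hypothetical helper for illustration; it parses completed output
    and has no code path that runs odgi or sweepga.
    """
    records = 0
    per_query = Counter()   # number of PAF records per query name
    targets = set()
    query_span = 0          # summed (qend - qstart); overlaps are counted twice

    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected >=12 PAF columns, got {len(fields)}")
        qname, qstart, qend, tname = fields[0], int(fields[2]), int(fields[3]), fields[5]
        records += 1
        per_query[qname] += 1
        targets.add(tname)
        query_span += qend - qstart

    return {
        "records": records,
        "queries": len(per_query),
        "targets": len(targets),
        "query_span": query_span,
        "max_records_per_query": max(per_query.values(), default=0),
    }


# Toy records, invented for this example only (not real odgi output).
TOY_PAF = "\n".join([
    "q1\t1000\t0\t400\t+\tt1\t5000\t100\t500\t400\t400\t60",
    "q1\t1000\t400\t900\t-\tt2\t5000\t0\t500\t480\t500\t60",
    "q2\t800\t0\t800\t+\tt1\t5000\t1000\t1800\t800\t800\t60",
]) + "\n"

if __name__ == "__main__":
    summary = summarize_paf(io.StringIO(TOY_PAF))
    for key, value in summary.items():
        print(f"{key}\t{value}")
```

Running the same function over the native PAF and over each sweepGA-filtered PAF, once those files exist, would give one row per file for the summary TSV. This is lightweight enough for the head node because it only reads completed outputs.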

Acceptance criteria:

  • Current task definition is not satisfied by post-hoc merging of old BED rows.
  • Current task definition is not satisfied by a Python runner that shells out to odgi for heavy work.
  • The final commit should be a restrained pipeline/provenance/decision-record commit, not a manuscript rewrite and not a new biological result.

Depends on

Required by

Messages (13 messages, 13 unread)

  1. #1 user 2026-06-19T06:44:05.624134046+00:00 read
    Operational constraint from Erik: any heavy analysis rerun, especially odgi untangle native BEDPE/PAF over the WashU graph or nontrivial parameter grid, must run via Slurm sbatch, not directly on the head node. Head-node work is limited to lightweight inspection, writing scripts, checking small files, parsing already completed outputs, and submitting/monitoring jobs. If you already started a head-node odgi/sweepga rerun, stop it and convert to sbatch. Document the sbatch script, job IDs, resources, and output paths in the report.
  2. #2 pedigree-untangle-bedpe-paf-merge 2026-06-19T07:12:06.205889632+00:00 read
    Acknowledged — the direct head-node odgi rerun had already finished before I saw this message, and no heavy odgi job is still running. I am adding an sbatch wrapper now, rerunning the native BEDPE/PAF grid through Slurm, and will document the Slurm job ID/resources/output path in the report.
  3. #3 user 2026-06-19T07:14:04.666833425+00:00 delivered
    Follow-up from Erik: restructure this so the natural/primary runnable is a Slurm batch script, not a Python script that directly runs odgi on the head node. Add a committed sbatch script or shell wrapper under scripts/pedigree/ that runs odgi untangle native BEDPE and odgi untangle -p PAF for the parameter grid. Python should parse/summarize completed outputs only, or at most generate/submit the sbatch script, but should not be the default path for heavy untangling. Also use the odgi-emitted PAF path as direct input to sweepga for merging/filtering: test sweepga on the multi-output PAF with --num-mappings settings such as 1:many, 2:many, 4:many (or document exact incompatibility if it fails). The report should make clear that native odgi PAF/BEDPE exists and is the correct input path for sweepga-based filtering/merging, correcting the previous task's false premise.
  4. #4 user 2026-06-19T07:14:46.829795381+00:00 delivered
    SweepGA toolchain note: installed /home/erikg/.cargo/bin/sweepga currently reports 0.1.0, while the clean local source checkout /moosefs/erikg/sweepga has Cargo.toml version 0.1.1. I am updating the installed binary from that checkout using /moosefs/erikg/sweepga/scripts/build-clean.sh --install. Before using sweepga on odgi PAF, run sweepga --version and record the path/version in the report. If build/install is not complete when you need it, use the local checkout/build script and the impg env.sh pattern as needed.
  5. #5 user 2026-06-19T07:15:24.375829740+00:00 delivered
    SweepGA update status: /moosefs/erikg/sweepga/scripts/build-clean.sh --install failed because it uses system CMake 3.13.4, while wfmash-rs now requires CMake >=3.16. I am trying the impg env.sh Guix toolchain route next. Until sweepga --version reports the updated binary, do not assume the installed 0.1.0 is current.
  6. #6 user 2026-06-19T07:16:25.239030979+00:00 delivered
    SweepGA update attempt failed for now: build-clean.sh --install failed on system CMake 3.13; impg/env.sh got CMake 3.25 but then failed compiling vendored wfmash due mixed GCC/libstdc++ headers. Installed binary remains sweepga 0.1.0. I am checking build notes/workarounds. If you use sweepga before this is fixed, record that it is 0.1.0 and validate on a small odgi PAF first.
  7. #7 user 2026-06-19T07:17:14.006013207+00:00 delivered
    I submitted a Slurm rebuild for sweepGA as job 1703960: partition workers, 16 CPUs, 32G, log /moosefs/erikg/sweepga/sweepga_build_1703960.log. It runs: clean target, source /moosefs/erikg/impg/env.sh, cargo install --path /moosefs/erikg/sweepga --force, then sweepga --version. Use/record the updated binary only if this job succeeds; otherwise report the build failure and do not hide it.
  8. #8 user 2026-06-19T07:17:39.562901373+00:00 delivered
    SweepGA Slurm build job 1703960 failed immediately because the compute node could not connect to the Guix daemon (/var/guix/daemon-socket/socket refused). Installed binary remains sweepga 0.1.0. I am attempting a clean head-node toolchain build under impg/env.sh next; this is only a binary build, not the odgi analysis.
  9. #9 user 2026-06-19T07:18:30.871682067+00:00 delivered
    SweepGA clean head-node build under impg/env.sh progressed with Guix GCC 12/CMake 3.25 and constrained to 8 CPUs, but failed because vendored wfmash needs GSL headers (gsl/gsl_cdf.h). I am checking a Guix environment with gsl added. Installed sweepga remains 0.1.0 until this succeeds.
  10. #10 user 2026-06-19T07:20:14.404260800+00:00 delivered
    SweepGA update succeeded. /home/erikg/.cargo/bin/sweepga now reports 0.1.1, replacing the old 0.1.0 binary. Working build recipe was: taskset -c 0-7 bash -lc 'cd /moosefs/erikg/sweepga && rm -rf target && source /moosefs/erikg/impg/env.sh && cargo_wrapper="" && eval "export PATH="/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/bin:/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/sbin${PATH:+:}$PATH"
    export GIT_SSL_CAINFO="/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/etc/ssl/certs/ca-certificates.crt"
    export GIT_EXEC_PATH="/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/libexec/git-core"
    export C_INCLUDE_PATH="/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/include${C_INCLUDE_PATH:+:}$C_INCLUDE_PATH"
    export CPLUS_INCLUDE_PATH="/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/include/c++:/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/include${CPLUS_INCLUDE_PATH:+:}$CPLUS_INCLUDE_PATH"
    export LIBRARY_PATH="/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/lib${LIBRARY_PATH:+:}$LIBRARY_PATH"
    export PKG_CONFIG_PATH="/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/lib/pkgconfig${PKG_CONFIG_PATH:+:}$PKG_CONFIG_PATH"
    export CMAKE_PREFIX_PATH="/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/${CMAKE_PREFIX_PATH:+:}$CMAKE_PREFIX_PATH"
    export SSL_CERT_DIR="/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/etc/ssl/certs"
    export SSL_CERT_FILE="/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/etc/ssl/certs/ca-certificates.crt"
    export OBJC_INCLUDE_PATH="/gnu/store/3wsvqpdjcj24h5jrf0hhmxbypy7gc21l-profile/include${OBJC_INCLUDE_PATH:+:}$OBJC_INCLUDE_PATH"" && "" install --path . --force'. The key extras were clean target, impg/env.sh, and adding Guix gsl search paths for wfmash. Use sweepga 0.1.1 for odgi PAF filtering/merging and record this version in the report.
  11. #11 user 2026-06-19T07:26:26.927209408+00:00 delivered
    Correction/update: sweepGA is now rebuilt from current GitHub origin/main, not just the previous local checkout. /moosefs/erikg/sweepga was fast-forwarded to origin/main commit 018e4ce49d2c125820e0ac50dc5feaa02d423683 (Alignment unification + determinism #28), then /home/erikg/.cargo/bin/sweepga was rebuilt/installed from that checkout. sweepga --version still reports 0.1.1, but the installed binary now corresponds to origin/main 018e4ce. Record commit 018e4ce in the report if using sweepGA.
  12. #12 user 2026-06-19T07:28:07.217293974+00:00 delivered
    Hard scope correction. This task is currently too loose. Stop expanding analysis and do not commit the current shape unless it is revised to this exact contract: (1) primary runnable is a committed Slurm/shell script that directly runs odgi untangle BEDPE and odgi untangle -p PAF commands; it must not be a Python --run-odgi driver for heavy work; (2) Python is parse/summarize only and may not be the default heavy runner; (3) use odgi-emitted PAF directly as sweepGA input and run a minimal sweepga test with current /home/erikg/.cargo/bin/sweepga from GitHub main commit 018e4ce, with --num-mappings examples such as 1:many/2:many/4:many, or document exact incompatibility; (4) no manuscript edits; (5) no mechanism or tract-length claim beyond a decision record; (6) final report should answer only: native BEDPE/PAF generated on Slurm? sweepGA PAF filtering works? native merge improves calls enough to justify a later manuscript edit? If not, say no. Keep big intermediates uncommitted.
  13. #13 pedigree-untangle-bedpe-paf-merge 2026-06-19T07:30:28.193707857+00:00 read
    Acknowledged — I will keep the primary runnable as a committed sbatch script with direct odgi untangle BEDPE/PAF commands, use Python only for parse/summarize, record sweepGA 0.1.1 commit 018e4ce, avoid manuscript edits and large intermediates, and produce only the requested decision record.

Log