fig5-whole-genome-existing-paf-impg-like-scan

Fig5 raw many:many IMPG similarity genome scan

Metadata

Status: done
Assigned: agent-2759
Agent identity: 3577bc75d6ed4f1947509aa5c086c91ce7c997c7806dab6bf6affac647452647
Created: 2026-06-25T16:57:57.980255880+00:00
Started: 2026-06-25T17:00:58.424796496+00:00
Completed: 2026-06-25T17:57:49.457343447+00:00
Tags: eval-scheduled
Eval score: 0.11
└ blocking impact: 0.08
└ completeness: 0.08
└ constraint fidelity: 0.10
└ coordination overhead: 0.08
└ correctness: 0.05
└ downstream usability: 0.24
└ efficiency: 0.10
└ style adherence: 0.22

Description

Hard correction. The previous implementation path under fig5_whole_genome_existing_paf_impg_like_scan is invalid for this task: it is a homemade PAF bin reducer, includes filtered_one_to_one, and does not run real impg similarity. Do not rerun or recover from those outputs. If real impg similarity cannot be made to run, stop and report the blocker instead of falling back.

Required analysis:

  • Use RAW many:many WFMASH whole-genome PAFs only from the WFMASH manifest raw_paf rows.
  • Use RAW many:many SweepGA/FastGA f32 whole-genome PAFs only from the f32 manifest raw_paf rows.
  • Do not use filtered_paf, chopped PAFs, filtered_one_to_one, or SweepGA 1:1 outputs in the primary analysis.
  • Run real /home/erikg/.cargo/bin/impg similarity on the raw PAFs. After verifying the exact flag names against the CLI help, build a command that includes --alignment-files RAW.paf.gz, full-genome BED tiles via --target-bed, either --merge-distance 0 or --no-merge, --num-mappings many:many, --scaffold-jump 0, and --threads ${SLURM_CPUS_PER_TASK}.
  • Tile the full query/reference genome into BED windows. Start with 10 kb tiles for WFMASH raw and SweepGA/FastGA f32 raw across all three comparisons; add 2 kb only after 10 kb is validated.
  • The output must be the IMPG similarity result, then summaries derived from it. A PAF-overlap reducer may only be used as a post-hoc QC comparison, never as the main deliverable.
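The tiling step in the list above can be sketched as follows. This is a minimal illustration, not the task's actual script: it assumes sequence lengths are already available as a name-to-length mapping (for example parsed from a .fai index), and the function names and toy sequence names are hypothetical. It emits 0-based, half-open BED intervals, with the last tile of each sequence clipped to the sequence end.

```python
from typing import Dict, Iterator, List, Tuple


def tile_genome(
    seq_lengths: Dict[str, int], tile_size: int = 10_000
) -> Iterator[Tuple[str, int, int]]:
    """Yield (name, start, end) BED tiles covering every sequence end to end.

    Coordinates are 0-based and half-open, as BED requires. The final tile
    of a sequence is shorter when the length is not a multiple of tile_size.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    for name, length in seq_lengths.items():
        for start in range(0, length, tile_size):
            yield name, start, min(start + tile_size, length)


def bed_lines(seq_lengths: Dict[str, int], tile_size: int = 10_000) -> List[str]:
    """Format the tiles as tab-separated BED lines (no trailing newline)."""
    return [f"{n}\t{s}\t{e}" for n, s, e in tile_genome(seq_lengths, tile_size)]


if __name__ == "__main__":
    # Toy lengths for illustration only; real lengths would come from the
    # assembly index of the genome being tiled.
    toy = {"toy_seq_a": 25_000, "toy_seq_b": 10_000}
    print("\n".join(bed_lines(toy)))
```

Passing tile_size=2_000 gives the finer tiling, which the description allows only after the 10 kb run is validated.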

Parallelization requirements:

  • Use Slurm only; do not run full scans on the head node.
  • Use job arrays or separate Slurm jobs over method x comparison x BED shard/chromosome. Do not process all comparisons serially inside one job.
  • Each Slurm job must request 48 CPUs on workers/octopus or 96 CPUs on tux when available, and pass that exact allocation to impg similarity --threads.
  • Record Slurm job IDs, node, partition, SLURM_CPUS_PER_TASK, exact impg similarity command lines, IMPG binary path/version, raw PAF path, BED shard, and output path.
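One way to meet the array and record-keeping requirements above is sketched below, under stated assumptions. The grid labels, file names, and helper names are hypothetical. The impg flag spellings are copied from this task description and have not been checked against the CLI help, so treat them as placeholders until verified. The Slurm variables read here (SLURM_JOB_ID, SLURM_ARRAY_TASK_ID, SLURMD_NODENAME, SLURM_JOB_PARTITION, SLURM_CPUS_PER_TASK) are standard job environment variables.

```python
import itertools
import json
import shlex
from typing import Dict, List, Mapping, Sequence, Tuple

IMPG = "/home/erikg/.cargo/bin/impg"  # binary path given in the task description


def job_grid(
    methods: Sequence[str], comparisons: Sequence[str], shards: Sequence[str]
) -> List[Tuple[str, str, str]]:
    """Enumerate method x comparison x BED shard in a fixed, reproducible order."""
    return list(itertools.product(methods, comparisons, shards))


def build_command(raw_paf: str, bed_shard: str, env: Mapping[str, str]) -> List[str]:
    """Assemble the argv. Flags are taken from the task text, NOT verified."""
    threads = env.get("SLURM_CPUS_PER_TASK")
    if not threads:
        # No fallback: the thread count must be the exact Slurm allocation.
        raise RuntimeError("SLURM_CPUS_PER_TASK is unset; refusing to guess")
    return [
        IMPG, "similarity",
        "--alignment-files", raw_paf,
        "--target-bed", bed_shard,
        "--merge-distance", "0",
        "--num-mappings", "many:many",
        "--scaffold-jump", "0",
        "--threads", threads,
    ]


def provenance(
    task: Tuple[str, str, str],
    command: Sequence[str],
    raw_paf: str,
    output_path: str,
    impg_version: str,
    env: Mapping[str, str],
) -> Dict[str, object]:
    """Collect the per-job fields that the requirements ask to be recorded."""
    method, comparison, bed_shard = task
    return {
        "slurm_job_id": env.get("SLURM_JOB_ID"),
        "slurm_array_task_id": env.get("SLURM_ARRAY_TASK_ID"),
        "node": env.get("SLURMD_NODENAME"),
        "partition": env.get("SLURM_JOB_PARTITION"),
        "slurm_cpus_per_task": env.get("SLURM_CPUS_PER_TASK"),
        "method": method,
        "comparison": comparison,
        "bed_shard": bed_shard,
        "raw_paf": raw_paf,
        "output_path": output_path,
        "impg_binary": IMPG,
        "impg_version": impg_version,  # supplied by the caller
        "command_line": shlex.join(command),
    }


if __name__ == "__main__":
    # Toy environment and labels for illustration; a real job would pass
    # os.environ and read paths from the raw_paf manifest rows.
    toy_env = {"SLURM_ARRAY_TASK_ID": "3", "SLURM_CPUS_PER_TASK": "48"}
    grid = job_grid(["wfmash_raw", "fastga_f32_raw"], ["cmp1", "cmp2", "cmp3"], ["shard_000.bed"])
    task = grid[int(toy_env["SLURM_ARRAY_TASK_ID"])]
    cmd = build_command("RAW.paf.gz", task[2], toy_env)
    print(json.dumps(provenance(task, cmd, "RAW.paf.gz", "out.tsv", "unknown", toy_env), indent=2))
```

Indexing a precomputed grid by SLURM_ARRAY_TASK_ID keeps each array element to one method, comparison, and shard, so no job processes all comparisons serially.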

Deliverables go under a new clean directory: paper_prep/_brainstorming/fig5_raw_manymany_impg_similarity_scan/. Include scripts, Slurm submit wrappers, manifests, raw/compressed IMPG outputs, summaries, validation logs, and a report comparing WFMASH raw vs SweepGA/FastGA f32 raw for PAN027/PAN028 chr9q->chr3q, PAR, acrocentric controls, and full-genome interchromosomal patterns.

Depends on

Required by

Messages (2)

  1. #1 user 2026-06-25T17:09:41.957466992+00:00 read
    Stop the current/old implementation path. Slurm job 1706566 was cancelled from coordination context because the spec was wrong. New spec: use RAW many:many WFMASH PAFs and RAW many:many SweepGA/FastGA f32 PAFs only; run real impg similarity over full-genome BED tiles (10kb first, optionally 2kb after validation), with --threads from SLURM_CPUS_PER_TASK. Do not use filtered_paf / 1:1 / chopped-filtered outputs as primary evidence. Do not resubmit the old PAF-proxy scan.
  2. #2 user 2026-06-25T17:40:32.350222745+00:00 read
    Cancel/abandon the existing_paf_impg_like_scan implementation. It is invalid: PAF reducer, filtered_one_to_one included, not real impg similarity. Slurm 1706569 has been cancelled. New requirement: create clean fig5_raw_manymany_impg_similarity_scan; run real /home/erikg/.cargo/bin/impg similarity on raw many:many WFMASH and raw many:many SweepGA/FastGA f32 PAFs only; parallelize as Slurm arrays over method x comparison x BED shard/chromosome; pass --threads . If impg similarity is blocked, stop and report rather than fallback.

Log