fig5-raw-manymany-impg-similarity-2kb-sharded

Fig5 raw many:many IMPG similarity 2kb sharded

Metadata

Status: done
Assigned: agent-2837
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Created: 2026-06-27T11:17:22.132544854+00:00
Started: 2026-06-27T11:19:15.350170134+00:00
Completed: 2026-06-27T11:48:10.519787483+00:00
Tags: fig5, impg, slurm, raw-manymany, sharded, eval-scheduled
Eval score: 0.76
└ blocking impact: 0.78
└ completeness: 0.57
└ constraint fidelity: 0.55
└ coordination overhead: 0.82
└ correctness: 0.74
└ downstream usability: 0.84
└ efficiency: 0.86
└ intent fidelity: 0.87
└ style adherence: 0.91

Description

Corrected replacement run of the Fig5 IMPG similarity analysis at 2 kb resolution.

Goal:

  • Run IMPG similarity over full-genome 2 kb target windows for the existing raw unfiltered many:many alignments.
  • Do not run WFMASH, SweepGA, FastGA, minimap2, seqwish, odgi, or any new alignment/graph construction. This task consumes already-generated PAFs and FASTAs only.

Inputs / evidence layers:

  • WFMASH updated-bin raw many:many unfiltered PAFs from paper_prep/_brainstorming/pedigree_whole_genome_wfmash_p95_updated_bin/summaries/query_grid_filter_manifest.tsv, using only raw_paf.
  • SweepGA/FastGA f32 raw many:many unfiltered PAFs from paper_prep/_brainstorming/pedigree_whole_genome_sweepga_fastga_frequency32/summaries/query_grid_chop_filter_manifest.tsv, using only raw_paf.
  • Query/target FASTA paths from paper_prep/_brainstorming/pedigree_whole_genome_wfmash_p95_updated_bin/summaries/input_manifest.tsv.
  • Previous failed 10 kb task fig5-raw-manymany-impg-similarity-fullbed may be used only for scripts, manifests, WFMASH command validation, and existing BGZF-normalized SweepGA PAF copies if validated with bgzip. Do not treat its partial SweepGA TSVs as valid evidence.
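
A quick structural pre-check for reused BGZF copies could look like the sketch below. It only inspects the first block header and the trailing end-of-file block, so it is a cheap filter and not a substitute for validating with bgzip itself. The function name `looks_like_bgzf` is hypothetical, and the check assumes the `BC` subfield is the first entry in the gzip extra field, which is how htslib-written files are laid out.

```python
import struct

# The 28-byte empty BGZF block used as an end-of-file marker.
BGZF_EOF = (
    b"\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00"
    b"BC\x02\x00\x1b\x00\x03\x00" + b"\x00" * 8
)


def looks_like_bgzf(path):
    """Return True if the file starts with a BGZF block header and ends with the EOF block."""
    with open(path, "rb") as fh:
        head = fh.read(18)
        if len(head) < 18:
            return False
        # gzip magic bytes, deflate method, FEXTRA flag set.
        if head[0:4] != b"\x1f\x8b\x08\x04":
            return False
        # XLEN is a little-endian uint16 at offset 10.
        xlen = struct.unpack("<H", head[10:12])[0]
        # Assumption: the 'BC' subfield (2-byte payload) comes first in the extra field.
        if xlen < 6 or head[12:14] != b"BC" or head[14:16] != b"\x02\x00":
            return False
        fh.seek(0, 2)
        if fh.tell() < len(BGZF_EOF):
            return False
        fh.seek(-len(BGZF_EOF), 2)
        return fh.read() == BGZF_EOF
```

A plain gzip file fails the header test because its FEXTRA flag is not set, which is the failure mode this check is meant to catch before a shard job is submitted.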

Required execution shape:

  • Build exact full-genome target BEDs with 2,000 bp windows from each target FASTA .fai. Do not expand windows to fixed display widths; the last window of each contig may be shorter than 2,000 bp.
  • Use impg similarity --alignment-files EXISTING_RAW_OR_BGZF_PAF --target-bed SHARD_2KB.bed --sequence-files QUERY.fa TARGET.fa --gfa-engine poa --no-merge --num-mappings many:many --scaffold-jump 0 --threads ${SLURM_CPUS_PER_TASK}.
  • Because the monolithic 10 kb SweepGA run timed out (24h/48 CPUs), shard the 2 kb BEDs and submit Slurm arrays/jobs over the shards. Every Slurm job must pass exactly ${SLURM_CPUS_PER_TASK} to IMPG as --threads. Choose shard size and concurrency pragmatically so that work runs in parallel across the cluster; do not launch a monolithic full-BED job.
  • For SweepGA raw PAFs, IMPG requires BGZF; reuse validated BGZF copies from the previous task if present, otherwise bgzip-normalize the existing raw PAFs only. This is not a new alignment.
  • Record exact raw source PAF path, IMPG alignment PAF path, query FASTA, target FASTA, BED shard, command, Slurm job ID, node/partition, SLURM_CPUS_PER_TASK, IMPG version/path, and output path for every shard.
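
The windowing, sharding, and command shape described above can be sketched as follows. This is a minimal illustration, not the scripts used for the task: the helper names `fai_windows`, `shard`, and `impg_command` are hypothetical, and the shard size is left as a caller-chosen parameter. The IMPG flags are copied from the required command above; the window coordinates follow the 0-based, half-open BED convention.

```python
def fai_windows(fai_lines, window=2000):
    """Yield (contig, start, end) BED intervals tiling each contig in a .fai index.

    The last window of a contig is truncated at the contig end, never padded.
    """
    for line in fai_lines:
        if not line.strip():
            continue
        # .fai columns: name, length, offset, linebases, linewidth
        name, length = line.rstrip("\n").split("\t")[:2]
        length = int(length)
        for start in range(0, length, window):
            yield name, start, min(start + window, length)


def shard(windows, windows_per_shard):
    """Group consecutive windows into lists of at most windows_per_shard entries."""
    batch = []
    for win in windows:
        batch.append(win)
        if len(batch) == windows_per_shard:
            yield batch
            batch = []
    if batch:
        yield batch


def impg_command(paf, bed, query_fa, target_fa, threads):
    """Build the per-shard IMPG argument list; threads should be SLURM_CPUS_PER_TASK."""
    return [
        "impg", "similarity",
        "--alignment-files", paf,
        "--target-bed", bed,
        "--sequence-files", query_fa, target_fa,
        "--gfa-engine", "poa",
        "--no-merge",
        "--num-mappings", "many:many",
        "--scaffold-jump", "0",
        "--threads", str(threads),
    ]
```

Inside a Slurm job the thread count would typically be read from `os.environ["SLURM_CPUS_PER_TASK"]` and passed through unchanged, and the returned argument list can be stored verbatim in the shard manifest as the recorded command.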

Required comparisons:

  • PAN027mat_vs_PAN010_joint
  • PAN027pat_vs_PAN011_joint
  • PAN028mat_vs_PAN027_joint

Required methods:

  • wfmash_p95_updated_bin
  • sweepga_fastga_frequency32

Deliverables:

  • One finalized compressed 2 kb IMPG similarity TSV per method x comparison, assembled from completed shards with header/format handled correctly.
  • Shard manifest and Slurm manifest with success/failure state for all shards.
  • Summary tables: per-window target similarity/support, top/all interchromosomal targets, chr9q->chr3q windows, PAR controls, acrocentric controls, and full-genome target-pattern tracks.
  • Concise report explaining that this is raw unfiltered PAF-backed IMPG similarity over 2 kb target windows, with no new alignments.
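
Assembling one finalized TSV from completed shards mainly requires care with the header. The sketch below assumes each shard output is a gzip-compressed TSV whose first line is a header; that layout is an assumption about the shard files, and the function name `merge_shard_tsvs` is hypothetical. It writes the header once, refuses shards whose header differs, and returns the number of data rows written.

```python
import gzip


def merge_shard_tsvs(shard_paths, out_path):
    """Concatenate gzip-compressed shard TSVs in the given order, keeping one header."""
    header = None
    rows = 0
    with gzip.open(out_path, "wt") as out:
        for path in shard_paths:
            with gzip.open(path, "rt") as fh:
                first = fh.readline()
                if not first:
                    # Completely empty shard file: nothing to copy.
                    continue
                if header is None:
                    header = first
                    out.write(header)
                elif first != header:
                    raise ValueError(f"header mismatch in {path}")
                for line in fh:
                    out.write(line)
                    rows += 1
    return rows
```

Passing the shard paths in BED order keeps the merged rows in genome order, and comparing the returned row count against the sum of per-shard row counts is a simple completeness check to note in the shard manifest.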

Depends on

Required by

Log