fig5-raw-manymany-impg-similarity-fullbed

Fig5 raw many:many IMPG similarity full-BED jobs

Metadata

Status: failed
Assigned: agent-2764
Agent identity: 46f6237a65ec4f1002c4d3fb201dc8633638d0947c276be7008c227e1051ba5e
Created: 2026-06-25T17:59:05.908719131+00:00
Started: 2026-06-26T03:56:26.098268728+00:00
Tags: eval-scheduled
Failure reason: Attempted and monitored the required raw many:many IMPG full-BED execution. WFMASH completed for all three comparisons (jobs 1706572, 1706573, 1706574) and produced finalized .tsv.gz outputs. SweepGA/FastGA f32 full-BED jobs 1706581 and 1706582 timed out at 24 h on workers (48 CPUs), leaving partial uncompressed outputs (6.4G and 4.9G); job 1706583 was cancelled once this blocker was confirmed. Per task instructions, the agent stopped and reported the blocker plus logs in paper_prep/_brainstorming/fig5_raw_manymany_impg_similarity_scan/REPORT.md. The six required complete outputs and the derived summaries are therefore unavailable. Recommendation: retry SweepGA on tux (96 CPUs) with a longer walltime, and shard only if that still fails. Documentation was committed and pushed in ef24d4a.

Description

Clean replacement for the invalid existing-PAF reducer task. There must be only one valid output area: paper_prep/_brainstorming/fig5_raw_manymany_impg_similarity_scan/. Do not use paper_prep/_brainstorming/fig5_whole_genome_existing_paf_impg_like_scan/ except as a failure record.

Use real IMPG similarity. IMPG similarity has built-in parallelism over the target BED/region list; do not shard regions first unless a full-BED job fails or exceeds limits. The default execution unit should be one Slurm job per raw PAF evidence layer/comparison, giving IMPG the full BED of tiled regions and all allocated threads.

Inputs:

  • WFMASH updated-bin raw many:many whole-genome PAFs from paper_prep/_brainstorming/pedigree_whole_genome_wfmash_p95_updated_bin/summaries/query_grid_filter_manifest.tsv, using only the raw_paf column.
  • SweepGA/FastGA f32 raw many:many whole-genome PAFs from paper_prep/_brainstorming/pedigree_whole_genome_sweepga_fastga_frequency32/summaries/query_grid_chop_filter_manifest.tsv, using only the raw_paf column.
  • Same query/target FASTA naming from paper_prep/_brainstorming/pedigree_whole_genome_wfmash_p95_updated_bin/summaries/input_manifest.tsv.
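The input rule above (take only the raw_paf column from each manifest) can be sketched as follows. This is a minimal sketch that assumes the manifests are tab-separated with a header row; only the raw_paf column name comes from the task description, and the other column names in the example are hypothetical.

```python
import csv
import io


def raw_paf_paths(manifest_text):
    """Return the non-empty raw_paf values from a TSV manifest, in file order.

    Assumption: the manifest is tab-separated and has a header row that
    contains a column named raw_paf. All other columns are ignored.
    """
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    if "raw_paf" not in (reader.fieldnames or []):
        raise KeyError("manifest has no raw_paf column")
    return [row["raw_paf"] for row in reader if row["raw_paf"]]


if __name__ == "__main__":
    # Hypothetical manifest content; real column names other than raw_paf
    # are not given in the task description.
    example = (
        "comparison\traw_paf\tfiltered_paf\n"
        "cmp_a\tcmp_a.raw.paf.gz\tcmp_a.filtered.paf.gz\n"
    )
    print(raw_paf_paths(example))
```

Failing loudly when raw_paf is missing guards against silently falling back to a filtered column, which the task forbids as primary evidence.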

Required command shape, after verifying against impg similarity --help:

    /home/erikg/.cargo/bin/impg similarity --alignment-files RAW.paf.gz --target-bed full_genome_10kb.bed --merge-distance 0 --no-merge --num-mappings many:many --scaffold-jump 0 --threads ${SLURM_CPUS_PER_TASK}

If IMPG rejects the redundant combination of --merge-distance 0 and --no-merge, use the exact valid combination it accepts, but keep the no-merging/no-chaining behavior.
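One way to assemble this command reproducibly is as an argument list, so that the exact command can be recorded alongside the job. The sketch below copies the flags verbatim from the task description; they have not been checked against impg similarity --help here. The keep_merge_distance switch is a hypothetical helper for the fallback case where IMPG rejects the redundant pair.

```python
import os
import shlex

IMPG = "/home/erikg/.cargo/bin/impg"  # path taken from the task description


def impg_similarity_argv(raw_paf, target_bed, threads, keep_merge_distance=True):
    """Build the impg similarity argument list described in the task.

    Flags are copied from the task text, not verified against --help.
    Set keep_merge_distance=False (hypothetical fallback) if IMPG rejects
    --merge-distance 0 together with --no-merge.
    """
    argv = [IMPG, "similarity",
            "--alignment-files", raw_paf,
            "--target-bed", target_bed]
    if keep_merge_distance:
        argv += ["--merge-distance", "0"]
    argv += ["--no-merge",
             "--num-mappings", "many:many",
             "--scaffold-jump", "0",
             "--threads", str(threads)]
    return argv


if __name__ == "__main__":
    # Inside a Slurm job SLURM_CPUS_PER_TASK is set when --cpus-per-task
    # is requested; the default of 1 here is only for running outside Slurm.
    threads = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    argv = impg_similarity_argv("RAW.paf.gz", "full_genome_10kb.bed", threads)
    print(shlex.join(argv))  # exact command string for the job record
```

Printing the joined command rather than re-typing it keeps the recorded command identical to what was executed.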

Execution:

  • Build full-genome 10 kb BED tiles first. Add 2 kb only after the 10 kb run validates.
  • Submit Slurm jobs in parallel over method x comparison raw PAFs. There are expected to be six primary jobs: 2 methods x 3 comparisons.
  • Each job should request 48 CPUs on workers/octopus, or 96 CPUs on tux, and pass exactly that count to impg similarity --threads.
  • Record exact commands, raw PAF paths, BED path, job IDs, node/partition, SLURM_CPUS_PER_TASK, IMPG version/path, and output paths.
  • Do not use filtered_paf, filtered_one_to_one, chopped filtered PAFs, or PAF-overlap reducer output as primary evidence.
  • If IMPG similarity cannot process a full BED in one job, stop and report the blocker plus logs; only then propose region sharding.
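The tiling and job layout described in the execution steps can be sketched as below. This is an illustration under stated assumptions: chromosome lengths are supplied as a plain mapping (for example, read from a FASTA index), the method labels and comparison names are hypothetical placeholders, and the CPU counts per partition are the ones stated in the task.

```python
from itertools import product

# CPU counts per partition as stated in the task description.
CPUS_BY_PARTITION = {"workers": 48, "tux": 96}

# Hypothetical labels for the two raw PAF evidence layers.
METHODS = ("wfmash", "sweepga_fastga_f32")


def tile_bed(chrom_lengths, window=10_000):
    """Yield (chrom, start, end) tiles covering each sequence.

    BED coordinates are 0-based and half-open, so consecutive tiles share
    a boundary value without overlapping. The last tile of a sequence is
    truncated at the sequence length.
    """
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, window):
            yield chrom, start, min(start + window, length)


def bed_lines(tiles):
    """Format tiles as tab-separated BED lines."""
    return [f"{chrom}\t{start}\t{end}" for chrom, start, end in tiles]


def job_matrix(comparisons, partition="workers"):
    """One job per method x comparison, each given the full BED."""
    cpus = CPUS_BY_PARTITION[partition]
    return [
        {"method": m, "comparison": c, "partition": partition, "cpus": cpus}
        for m, c in product(METHODS, comparisons)
    ]


if __name__ == "__main__":
    # Toy lengths for illustration only, not real chromosome sizes.
    for line in bed_lines(tile_bed({"chrA": 25_000, "chrB": 8_000})):
        print(line)
    # Placeholder comparison names; the real ones come from the manifests.
    for job in job_matrix(["cmp_1", "cmp_2", "cmp_3"]):
        print(job)
```

With three comparisons the matrix has six entries, matching the six primary jobs the task expects. Calling tile_bed with window=2_000 would produce the 2 kb tiles that are to be added only after the 10 kb run validates.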

Deliverables:

  • Raw/compressed IMPG similarity outputs for WFMASH raw and SweepGA/FastGA f32 raw across all three comparisons.
  • Summaries derived from IMPG output: per-window target similarity/support, top/all interchromosomal targets, chr9q->chr3q windows, PAR, acrocentric controls, and full-genome target-pattern tracks.
  • Concise report explaining methods and results, under paper_prep/_brainstorming/fig5_raw_manymany_impg_similarity_scan/.

Depends on

Required by

Log