finalize-fig5-raw-after-arrays

Finalize Fig5 raw many:many IMPG 2kb shards when Slurm arrays complete

Metadata

Status: open
Created: 2026-06-27T15:46:22.357517260+00:00
Tags: eval-scheduled, fig5

Description

Recover from the premature failure of finalize-fig5-raw. Do not rerun WFMASH, SweepGA/FastGA, minimap2, seqwish, odgi, or any alignment. Use the existing 2 kb IMPG Slurm arrays submitted by fig5-raw-manymany-impg-similarity-2kb-sharded: 1706840,1706841,1706842,1706843,1706844,1706845.

Important paths:

  • Live shard outputs/logs are under /moosefs/erikg/phrs/.wg-worktrees/agent-2837/paper_prep/_brainstorming/fig5_raw_manymany_impg_similarity_2kb_sharded/
  • Main target path is /moosefs/erikg/phrs/paper_prep/_brainstorming/fig5_raw_manymany_impg_similarity_2kb_sharded/

First check Slurm with squeue/sacct. If any of arrays 1706840-1706845 are still RUNNING/PENDING, do not run finalization against incomplete shards. Log exact state and create a new delayed follow-up task rather than marking this as a pipeline/data failure.
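
The gating check above can be sketched as follows. This is a minimal illustration and not part of the existing scripts: the function names `summarize_sacct` and `query_sacct` and the four labels (`active`, `ok`, `failed`, `missing`) are hypothetical, and the set of states treated as non-terminal is an assumption. It relies on the `sacct` options `-j`, `-X`, `-n`, `-P`, and `-o JobID,State`.

```python
import subprocess
from collections import defaultdict

# Array job IDs listed in this task.
ARRAY_IDS = ["1706840", "1706841", "1706842", "1706843", "1706844", "1706845"]

# Assumption: states treated as "not yet terminal" for gating purposes.
ACTIVE = {"PENDING", "RUNNING", "COMPLETING", "CONFIGURING",
          "SUSPENDED", "REQUEUED", "RESIZING"}


def summarize_sacct(text, array_ids=ARRAY_IDS):
    """Map each array ID to 'active', 'ok', 'failed', or 'missing'.

    `text` is the output of `sacct -X -n -P -o JobID,State`,
    i.e. one `JobID|State` pair per line.
    """
    states = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        job_id, _, state = line.partition("|")
        # Array tasks appear as 1706840_12 or 1706840_[13-150].
        array_id = job_id.split("_", 1)[0].split(".", 1)[0]
        # sacct can print e.g. "CANCELLED by 1001"; keep the first word.
        states[array_id].add(state.split()[0] if state.strip() else "UNKNOWN")

    summary = {}
    for array_id in array_ids:
        seen = states.get(array_id, set())
        if not seen:
            summary[array_id] = "missing"
        elif seen & ACTIVE:
            summary[array_id] = "active"
        elif seen == {"COMPLETED"}:
            summary[array_id] = "ok"
        else:
            summary[array_id] = "failed"
    return summary


def query_sacct(array_ids=ARRAY_IDS):
    """Run sacct for the given arrays and return its raw text output."""
    cmd = ["sacct", "-j", ",".join(array_ids),
           "-X", "-n", "-P", "-o", "JobID,State"]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Calling `summarize_sacct(query_sacct())` on the cluster gives one label per array. Any `active` entry means finalization should be deferred to a follow-up task, and any `failed` or `missing` entry should be diagnosed from the shard logs before proceeding.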

Once all six arrays are terminal and successful, normalize tmp shard filenames if needed. Then either run scripts/finalize_2kb_sharded_impg.py directly against the live agent-2837 output tree, or first sync the live outputs into the main target tree and run finalization there. Preserve the all-hit assembled outputs for audit. For plotting summaries and tracks, select the single best similarity/support hit per 2 kb query window with deterministic tie-breaking: highest similarity/ANI/support score first, then longest aligned/support length, then stable lexical order of target coordinates. Document the exact rule in REPORT.md.
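
The best-hit rule can be illustrated with a short sketch. The field names (`query_name`, `query_start`, `query_end`, `similarity`, `aligned_length`, `target_name`, `target_start`, `target_end`) are hypothetical and not taken from the IMPG output. Reading "stable lexical target coordinates" as target name compared as a string, then start and end compared as integers, is also an assumption; whichever interpretation is used should be the one recorded in REPORT.md.

```python
def best_hit_per_window(hits):
    """Return one hit per query window, ordered by window.

    Ranking (smaller tuple wins): higher similarity, then longer
    aligned length, then target name / start / end in ascending order.
    """
    def rank(hit):
        return (
            -hit["similarity"],
            -hit["aligned_length"],
            hit["target_name"],   # string comparison
            hit["target_start"],  # integer comparison (assumption)
            hit["target_end"],
        )

    best = {}
    for hit in hits:
        window = (hit["query_name"], hit["query_start"], hit["query_end"])
        # Keep the hit with the smallest rank tuple for each window.
        if window not in best or rank(hit) < rank(best[window]):
            best[window] = hit
    return [best[window] for window in sorted(best)]
```

Because the rank tuple is a total order over the listed fields, the result does not depend on the order in which shards were assembled, which is what makes the selection deterministic.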

Validation:

  • All 906 shard rows in manifests/shard_completion_manifest.tsv are OK, or any failed Slurm shard is diagnosed concretely with log paths.
  • Six assembled compressed outputs exist, one per method x comparison.
  • Assembled all-hit outputs preserve complete IMPG similarity records for audit.
  • Plotting tables reduce to one best hit per 2 kb window: per_window_target_similarity_support.tsv and full_genome_target_pattern_tracks.tsv.
  • Summary tables include top_interchromosomal_targets.tsv, all_interchromosomal_targets.tsv, chr9q_chr3q_windows.tsv, par_controls.tsv, acrocentric_controls.tsv.
  • REPORT.md records final Slurm state, live/source paths, output paths, row counts, and best-hit tie-breaking.
  • Commit and push changes, then report whether this supersedes failed finalize-fig5-raw.
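
The first validation item can be checked with a sketch like the one below. The column name `status` is an assumption, since the layout of manifests/shard_completion_manifest.tsv is not described in this task, and `check_manifest` is a hypothetical helper rather than part of the existing scripts.

```python
import csv


def check_manifest(path, expected_rows=906, status_column="status"):
    """Summarize a tab-separated shard manifest with a header row.

    Returns the row count, whether it matches `expected_rows`, and the
    rows whose status column is anything other than "OK".
    """
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    not_ok = [row for row in rows if row.get(status_column) != "OK"]
    return {
        "rows": len(rows),
        "row_count_ok": len(rows) == expected_rows,
        "not_ok": not_ok,
    }
```

Rows returned in `not_ok` are the shards to diagnose against their Slurm log paths before the validation list can be considered satisfied.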

Depends on

Required by

Log