finalize-fig5-raw

Finalize Fig5 raw many:many IMPG 2kb shards

Metadata

Status: failed
Assigned: agent-2842
Agent identity: 46f6237a65ec4f1002c4d3fb201dc8633638d0947c276be7008c227e1051ba5e
Created: 2026-06-27T11:41:16.046190924+00:00
Started: 2026-06-27T11:58:59.350571254+00:00
Tags: eval-scheduled
Failure reason: Finalization was attempted while Slurm arrays 1706840-1706845 were still incomplete. At 2026-06-27T13:47Z, two tasks from array 1706840 were running and six array ranges were pending; only 15 shard metadata files and 13 temp gzip shard outputs existed, and no final-named shard outputs existed. The finalizer requires all 906 shard rows to be OK, so running it at that point would have failed or assembled incomplete data. Recovery notes and the current state are recorded in commit 437139f.

Description

Finalize the submitted Slurm arrays for fig5-raw-manymany-impg-similarity-2kb-sharded. File scope is limited to paper_prep/_brainstorming/fig5_raw_manymany_impg_similarity_2kb_sharded/outputs/, metadata/, manifests/shard_completion_manifest.tsv, manifests/assembled_outputs.tsv, summaries/, and REPORT.md. Implement directly; do not decompose further.

Arrays submitted by the parent task: 1706840,1706841,1706842,1706843,1706844,1706845. Run the existing finalizer after all shards complete:

python3 paper_prep/_brainstorming/fig5_raw_manymany_impg_similarity_2kb_sharded/scripts/finalize_2kb_sharded_impg.py
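Because the finalizer must only run once every array task has finished, a pre-flight check along the following lines could guard the call. This is a minimal sketch, not part of the existing scripts: the helper names parse_squeue and unfinished_tasks are hypothetical, and it assumes squeue is on PATH. An empty result only means that nothing is queued or running; it does not show that the tasks succeeded, so the shard completion manifest remains the authority.

```python
import subprocess

# Array job IDs taken from the task description.
ARRAY_IDS = ["1706840", "1706841", "1706842", "1706843", "1706844", "1706845"]


def parse_squeue(text):
    """Parse 'jobid|state' lines (squeue --format '%i|%T') into a dict."""
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, _, state = line.partition("|")
        states[job_id] = state
    return states


def unfinished_tasks(array_ids=ARRAY_IDS):
    """Return tasks of the given arrays that squeue still lists.

    Assumption: squeue exits non-zero once the job IDs have been purged
    from the controller, so that case is reported as an error instead of
    being treated as 'all done'.
    """
    result = subprocess.run(
        ["squeue", "--noheader", "--jobs", ",".join(array_ids),
         "--format", "%i|%T"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"squeue failed: {result.stderr.strip()}")
    return parse_squeue(result.stdout)
```

If unfinished_tasks() returns a non-empty dict, the finalizer should not be started yet.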

Validation

  • All 906 shard rows in manifests/shard_completion_manifest.tsv have state OK
  • Six assembled compressed outputs exist under outputs/assembled/, one per required method x comparison
  • Assembled outputs have one header each and nonzero data rows
  • Summary tables exist: per_window_target_similarity_support.tsv, top_interchromosomal_targets.tsv, all_interchromosomal_targets.tsv, chr9q_chr3q_windows.tsv, par_controls.tsv, acrocentric_controls.tsv, full_genome_target_pattern_tracks.tsv
  • REPORT.md updated with final Slurm completion state, output paths, and row counts
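The first three checks above can be scripted. The sketch below is illustrative only: the function names are hypothetical, the manifest is assumed to be a tab-separated file with a header row and a column named state, and each assembled output is assumed to be a gzip text file whose first non-empty line is the header.

```python
import csv
import gzip

EXPECTED_SHARDS = 906  # from the validation list


def manifest_problems(manifest_path, state_column="state",
                      expected=EXPECTED_SHARDS):
    """Return a list of problems found in the shard completion manifest."""
    with open(manifest_path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    problems = []
    if len(rows) != expected:
        problems.append(f"expected {expected} shard rows, found {len(rows)}")
    not_ok = [row for row in rows if row.get(state_column) != "OK"]
    if not_ok:
        problems.append(f"{len(not_ok)} shard rows are not OK")
    return problems


def assembled_problems(path):
    """Check one assembled output: exactly one header, at least one data row.

    A repeated header is detected as any later line identical to the first
    non-empty line, which is what naive concatenation of shards would produce.
    """
    header = None
    header_count = 0
    data_rows = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip():
                continue
            if header is None:
                header = line
            if line == header:
                header_count += 1
            else:
                data_rows += 1
    if header is None:
        return ["file is empty"]
    problems = []
    if header_count != 1:
        problems.append(f"header appears {header_count} times")
    if data_rows == 0:
        problems.append("no data rows")
    return problems
```

An empty list from both functions for the manifest and for each of the six files under outputs/assembled/ would correspond to the first three validation items; the summary tables and REPORT.md still need to be checked separately.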

Depends on

Required by

Messages 2 messages (2 unread)

  1. #1 codex 2026-06-27T12:05:09.434177514+00:00 read
    For plotting summaries, reduce IMPG similarity output to the single best target hit per 2 kb query/target window. Keep all raw/all-hit assembled output and all_interchromosomal_targets.tsv for audit/debug, but per_window_target_similarity_support.tsv and full_genome_target_pattern_tracks.tsv should choose the best similarity/support record per window, with deterministic tie-breaking documented. Do not plot multiple overlapping hits per window as separate tracks unless explicitly requested.
  2. #2 finalize-fig5-raw 2026-06-27T12:15:42.881786799+00:00 read
    Acknowledged. I will keep the assembled raw/all-hit outputs and all_interchromosomal_targets.tsv intact, then collapse per_window_target_similarity_support.tsv and full_genome_target_pattern_tracks.tsv to one deterministic best record per 2 kb window and document the tie-break in REPORT.md.
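One way the per-window reduction requested in the messages above could work is sketched here. The field names (query_chrom, query_start, target_chrom, target_start, similarity, support) and the tie-break order are assumptions for illustration; the real column names and the rule chosen by the finalizer are not stated in this task.

```python
def rank_key(record):
    """Sort key where smaller is better.

    Assumed tie-break order: higher similarity, then higher support,
    then target chromosome name (plain string order), then target start.
    """
    return (
        -float(record["similarity"]),
        -float(record["support"]),
        record["target_chrom"],
        int(record["target_start"]),
    )


def best_hit_per_window(records, window_fields=("query_chrom", "query_start")):
    """Keep a single best record per window, independent of input order."""
    best = {}
    for record in records:
        window = tuple(record[field] for field in window_fields)
        if window not in best or rank_key(record) < rank_key(best[window]):
            best[window] = record
    return best
```

Because the key is a total order over the listed fields, the same record wins regardless of the order in which shards were assembled. Chromosome names compare as strings here, so chr13 sorts before chr3; whichever rule is actually used should be stated in REPORT.md.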

Log