review-zoom-v5-phr-jaccard-method-slide

Add PHR Jaccard similarity workflow slide to review zoom v5

Metadata

Status: done
Assigned: agent-1079
Agent identity: 3577bc75d6ed4f1947509aa5c086c91ce7c997c7806dab6bf6affac647452647
Created: 2026-05-07T15:51:54.781401653+00:00
Started: 2026-05-07T16:13:46.471702952+00:00
Completed: 2026-05-07T16:33:53.189478117+00:00
Tags: review-zoom, review-zoom-v5, methods, jaccard, phr-similarity, eval-scheduled
Eval score: 0.95
  └ blocking impact: 0.96
  └ completeness: 0.95
  └ coordination overhead: 0.94
  └ correctness: 0.96
  └ downstream usability: 0.97
  └ efficiency: 0.94
  └ intent fidelity: 0.88
  └ style adherence: 0.94

Description

Add one concise methods/process slide to the review-zoom v5 deck after the current v5 fan-in render completes.

User intent:

  • We have been glossing over how the PHR similarity/Jaccard matrix is generated.
  • The slide should be step-by-step and visual-friendly; Erik will illustrate it manually if needed.
  • Keep it compact enough for a talk slide, but precise about the bundle/self-comparison issue.

Placement:

  • Start from the output of review-zoom-v5-enrichment-fanin-render.
  • Insert this methods slide immediately before the first PHR similarity heatmap/community slide, or wherever it best introduces the 07a tree-to-community story.
  • Keep existing v5 additions intact: the 07a two-slide heatmap sequence, enrichment slides, v4 slide10a X-axis fix, and revision notes.

Core slide content to express:

  1. Detect candidate PHR intervals from HPRCv2 subtelomeric flanks using the all-vs-all alignment, indexed and queried via IMPG; keep inter-chromosomal sharing that falls within the terminal analysis window.
  2. Extract the PHR sequence for every sample/haplotype/chromosome-arm call. These sequences form arm-specific bundles, e.g. all haplotype paths assigned to chr9q.
  3. Build one PGGB graph from the full collection of PHR sequences, not one graph per arm pair.
  4. Compute pairwise graph-path Jaccard with odgi similarity --all -P: for two paths/bundles, the shared graph nodes are the intersection, all traversed graph nodes are the union, and Jaccard = |intersection| / |union|.
  5. Aggregate from sequence/path pairs to chromosome-arm bundles: for every arm A x arm B, average the Jaccard values over all haplotype-path pairs in bundle A and bundle B.
  6. Include same-arm bundle comparisons. The arm-level A x A value is not forced to 1 because it averages many distinct haplotypes/paths within the same arm, not only each path compared to itself. This is a within-arm heterogeneity signal.
  7. Convert similarity to distance as needed (distance = 1 - Jaccard) and use the arm-level matrix for UPGMA/Leiden community views.
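
Steps 4 to 7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline itself: each path is represented as a plain set of graph node IDs, nodes are counted unweighted (no node-length weighting), self-pairs (a path against itself) are included in the bundle average, and the function names and the toy bundles are hypothetical. In the real workflow the pairwise values come from ODGI rather than from this code.

```python
from itertools import product


def jaccard(nodes_a, nodes_b):
    """Jaccard similarity of two paths given as collections of node IDs."""
    a, b = set(nodes_a), set(nodes_b)
    union = a | b
    # Two empty paths share nothing; report 0.0 instead of dividing by zero.
    return len(a & b) / len(union) if union else 0.0


def arm_level_matrix(bundles):
    """Average path-pair Jaccard for every arm A x arm B, including A x A.

    bundles maps arm name -> {path name -> set of node IDs}.
    Assumption: every path pair counts, including a path against itself.
    """
    matrix = {}
    for arm_a, arm_b in product(sorted(bundles), repeat=2):
        values = [
            jaccard(nodes_p, nodes_q)
            for nodes_p in bundles[arm_a].values()
            for nodes_q in bundles[arm_b].values()
        ]
        matrix[(arm_a, arm_b)] = sum(values) / len(values)
    return matrix


def to_distance(matrix):
    """Convert similarity to distance: distance = 1 - Jaccard."""
    return {pair: 1.0 - value for pair, value in matrix.items()}


# Hypothetical toy bundles: two haplotype paths on chr9q, one on chr1p.
toy_bundles = {
    "chr9q": {"hapA#chr9q": {1, 2, 3, 4}, "hapB#chr9q": {1, 2, 5, 6}},
    "chr1p": {"hapA#chr1p": {1, 2, 3, 7}},
}

similarity = arm_level_matrix(toy_bundles)
distance = to_distance(similarity)
```

In the toy data the chr9q x chr9q cell averages four path pairs (two self-pairs at 1 and two cross-haplotype pairs at 2/6), so it comes out below 1 even though both paths belong to the same arm.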

Suggested slide title:

  • "How we turn PHR paths into the similarity matrix"

Suggested slide text skeleton:

  • IMPG defines the PHR interval calls from HPRCv2 all-vs-all subtelomeric alignments.
  • PGGB builds one graph over all extracted PHR paths.
  • ODGI reports graph-node Jaccard for every pair of paths.
  • Arm-pair cells are bundle averages across haplotypes, including A x A.
  • A x A can be < 1: same arm, different haplotypes, different graph paths.
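
The last two bullets can be backed by a short formula if the slide needs one. The notation here is an assumption introduced for illustration: N(p) is the set of graph nodes traversed by path p, |A| is the number of haplotype paths in arm bundle A, nodes are counted unweighted, and the bundle average includes each path compared to itself.

```latex
\[
J(p, q) = \frac{|N(p) \cap N(q)|}{|N(p) \cup N(q)|},
\qquad
S(A, B) = \frac{1}{|A|\,|B|} \sum_{p \in A} \sum_{q \in B} J(p, q)
\]
\[
S(A, A) = \frac{1}{|A|^{2}} \Bigl( |A| + \sum_{\substack{p, q \in A \\ p \neq q}} J(p, q) \Bigr) \le 1
\]
```

Under these assumptions, equality holds only when every pair of paths in the bundle traverses exactly the same node set; any haplotype-level difference within the arm pulls S(A, A) below 1.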

Validation:

  • v5 PDF renders successfully after adding the methods slide.
  • The slide explicitly names IMPG, PGGB, ODGI/Jaccard, arm/haplotype bundles, and same-arm self-bundle averaging.
  • The slide says A x A / self-bundle values are not necessarily 1 and explains why in one sentence.
  • The added text is not overloaded; it should be readable on one slide and suitable for a hand-drawn/illustrated process figure.
  • REVISION_NOTES_V5.md records the added methods slide and its purpose.
  • New/changed slide PNG exports are generated and nonblank.
  • git diff --check passes.
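
The naming requirement in the second validation bullet could be spot-checked automatically. The following is only a sketch: it assumes the slide text is available as a plain string, and the term list and the function name missing_terms are hypothetical choices for illustration, not part of the deck tooling.

```python
# Hypothetical list of terms the methods slide is expected to name.
REQUIRED_TERMS = ("IMPG", "PGGB", "ODGI", "Jaccard", "haplotype", "A x A")


def missing_terms(slide_text, required=REQUIRED_TERMS):
    """Return the required terms that do not appear in the slide text.

    Matching is case-insensitive substring matching, so "haplotypes"
    satisfies "haplotype".
    """
    lowered = slide_text.lower()
    return [term for term in required if term.lower() not in lowered]


# The suggested slide text skeleton, used here as sample input.
skeleton = """
IMPG defines the PHR interval calls from HPRCv2 all-vs-all subtelomeric alignments.
PGGB builds one graph over all extracted PHR paths.
ODGI reports graph-node Jaccard for every pair of paths.
Arm-pair cells are bundle averages across haplotypes, including A x A.
A x A can be < 1: same arm, different haplotypes, different graph paths.
"""

gaps = missing_terms(skeleton)
```

An empty result means every listed term appears at least once; it does not check that the explanation itself is correct or readable.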

Depends on

Required by

Log