review-zoom-v6-community-assignment-method-slide

Add community assignment methods slide to review zoom

Metadata

Status: done
Assigned: agent-1104
Agent identity: 3577bc75d6ed4f1947509aa5c086c91ce7c997c7806dab6bf6affac647452647
Created: 2026-05-07T18:25:15.410250301+00:00
Started: 2026-05-07T18:43:34.744513573+00:00
Completed: 2026-05-07T18:58:24.522599585+00:00
Tags: review-zoom, review-zoom-v6, communities, methods, leiden, slides, eval-scheduled
Eval score: 0.95
└ blocking impact: 0.97
└ completeness: 0.97
└ constraint fidelity: 0.55
└ coordination overhead: 0.94
└ correctness: 0.96
└ downstream usability: 0.96
└ efficiency: 0.92
└ intent fidelity: 0.87
└ style adherence: 0.95

Description

Add one concise methods slide explaining how the C1-C15 arm-level community assignments were made.

User intent:

  • The talk needs a brief slide on community assignment methods/algorithm/parameters.
  • Keep it readable and non-defensive: enough detail to make the heatmaps/community calls credible, not a methods dump.
  • Clarify that there were few manually chosen parameters, but not literally none. The community labels are algorithmic; biological names/interpretations were added afterward.

Placement:

  • Add this as a follow-on to the current review-zoom deck chain, after review-zoom-v6-pggb-graph-black, to avoid racing current v6 edits.
  • Insert near the PHR Jaccard/similarity-method slide and before the tree/community heatmap sequence if possible.
  • Preserve all current v6 content: Dip-C slides, PGGB graph recolor, enrichment slides, v5/v6 fixes.

Core slide title:

  • "How we assigned PHR communities"

Core slide content to express:

  1. Start with 15,668 PHR paths from HPRCv2 haplotypes/arms with inter-chromosomal PHR signal.
  2. Build one PGGB graph (pggb -p 95) and compute all-vs-all graph-path Jaccard with odgi similarity --all -P.
  3. Collapse path-level similarities to chromosome arms: for each arm pair A x B, average all haplotype/path pair distances, using distance = 1 - Jaccard, producing a 41 x 41 arm-level distance matrix.
  4. Primary clustering: Leiden on a fully connected weighted graph of arms, with edge weights w_ij = exp(-d_ij / median(d)).
  5. Parameter selection: scan Leiden resolution 0.1-3.0 in 0.01 steps and choose the partition with maximum mean silhouette. Result: 15 arm-level communities; optimal resolution 1.16; silhouette 0.347.
  6. Robustness/comparison: UPGMA average-linkage on the same distance matrix gives 14 communities with similar silhouette 0.342 and agrees on 12/15 Leiden communities; differences are boundary cases around f7501-like arms.
  7. Biological labels (D4Z4, acrocentric p, PAR1/PAR2, f7501/OR4F etc.) were assigned after clustering for interpretation, not used as inputs.
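
Steps 3 and 4 above can be illustrated with a minimal sketch. It is not the code in plot-similarity-subtelo.R. The path names, arm names, and Jaccard values are made-up toy data, and both function names are hypothetical. Two choices are assumptions rather than facts from the source: within-arm path pairs are skipped, and median(d) is taken over the off-diagonal arm-pair distances.

```python
import math
from collections import defaultdict
from statistics import mean, median


def arm_distance_matrix(path_jaccard, path_to_arm):
    """Average path-level distances (1 - Jaccard) for every pair of distinct arms."""
    buckets = defaultdict(list)
    for (p, q), jaccard in path_jaccard.items():
        arm_p, arm_q = path_to_arm[p], path_to_arm[q]
        if arm_p == arm_q:
            continue  # assumption: within-arm pairs are not used
        buckets[frozenset((arm_p, arm_q))].append(1.0 - jaccard)
    return {pair: mean(ds) for pair, ds in buckets.items()}


def leiden_edge_weights(arm_dist):
    """Edge weights w_ij = exp(-d_ij / median(d)) for a fully connected arm graph."""
    scale = median(arm_dist.values())  # assumption: median over off-diagonal pairs
    return {pair: math.exp(-d / scale) for pair, d in arm_dist.items()}


# Toy input: four paths on three arms (all names and values are illustrative).
path_to_arm = {
    "h1#chr1_p": "chr1_p",
    "h2#chr1_p": "chr1_p",
    "h1#chr4_q": "chr4_q",
    "h1#chr10_q": "chr10_q",
}
path_jaccard = {
    ("h1#chr1_p", "h1#chr4_q"): 0.2,
    ("h2#chr1_p", "h1#chr4_q"): 0.4,
    ("h1#chr1_p", "h1#chr10_q"): 0.1,
    ("h2#chr1_p", "h1#chr10_q"): 0.1,
    ("h1#chr4_q", "h1#chr10_q"): 0.9,
    ("h1#chr1_p", "h2#chr1_p"): 0.95,
}

arm_dist = arm_distance_matrix(path_jaccard, path_to_arm)
weights = leiden_edge_weights(arm_dist)
for pair in arm_dist:
    print(sorted(pair), round(arm_dist[pair], 3), round(weights[pair], 3))
```

Dividing by the median distance keeps the weights on a comparable scale whatever the absolute range of the Jaccard distances. The resulting weighted graph is what a Leiden implementation would take as input; the Leiden step itself is not shown here.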

Suggested short on-slide wording:

  • "No gene labels or 3D data were used to define communities."
  • "Inputs: graph-path Jaccard only."
  • "One automated choice: Leiden resolution selected by silhouette scan."
  • "Output: 15 communities across 41 arms with detected inter-chromosomal PHR signal."
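
The silhouette criterion behind the "one automated choice" wording can be sketched as follows. This is an illustration of selecting a partition by maximum mean silhouette on a precomputed distance matrix, not the pipeline's actual code. The four items, their distances, the two candidate partitions, and the resolution keys are hand-written stand-ins for real Leiden runs, and the function names are hypothetical.

```python
from statistics import mean


def mean_silhouette(dist, labels):
    """Mean silhouette over all items; dist[i][j] is a symmetric distance lookup."""
    clusters = {}
    for item, lab in labels.items():
        clusters.setdefault(lab, []).append(item)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two communities")
    scores = []
    for i, lab_i in labels.items():
        own = [j for j in clusters[lab_i] if j != i]
        if not own:
            scores.append(0.0)  # usual convention: singletons score 0
            continue
        a = mean(dist[i][j] for j in own)  # mean distance within own community
        b = min(  # mean distance to the nearest other community
            mean(dist[i][j] for j in members)
            for lab, members in clusters.items()
            if lab != lab_i
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def pick_best(partitions, dist):
    """Return (resolution, score) of the partition with maximum mean silhouette."""
    scored = {res: mean_silhouette(dist, labels) for res, labels in partitions.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Toy distances: A/B are close, C/D are close, everything else is far.
pairs = {("A", "B"): 0.1, ("C", "D"): 0.1, ("A", "C"): 0.9,
         ("A", "D"): 0.9, ("B", "C"): 0.9, ("B", "D"): 0.9}
dist = {k: {} for k in "ABCD"}
for (i, j), d in pairs.items():
    dist[i][j] = dist[j][i] = d

# Stand-ins for partitions produced at two hypothetical resolutions.
partitions = {
    1.0: {"A": 0, "B": 0, "C": 1, "D": 1},
    2.0: {"A": 0, "B": 1, "C": 1, "D": 2},
}
best_res, best_score = pick_best(partitions, dist)
print(best_res, round(best_score, 3))
```

In the real scan the dictionary of partitions would hold one entry per resolution step from 0.1 to 3.0, each produced by a Leiden run on the weighted arm graph.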

Important caveats:

  • Do not confuse arm-level C1-C15 with the separate sequence-level 50-community partition. If mentioned, say sequence-level communities are a separate finer-grained analysis.
  • Do not imply all 48 arms were clustered; seven zero-signal arms were excluded from the 41 x 41 matrix.
  • Do not imply that CHM13-called PHR intervals exist for every community-assigned arm.

Source anchors:

  • subtelomeric_analysis_report.md section 5 and 6.1.
  • /moosefs/guarracino/HPRCv2/scripts/similarity/plot-similarity-subtelo.R
  • /moosefs/guarracino/HPRCv2/PHR_III/similarity/hprcv2.1Mb.subtelo.arm_dist_matrix.tsv
  • /moosefs/guarracino/HPRCv2/PHR_III/similarity/hprcv2.1Mb.subtelo.arm-leiden-k15.assignments.tsv

Deliverables:

  • Updated review-zoom PDF using the latest current deck version as base.
  • slides/v2-review-zoom/_revision_assets/v6/community_assignment_method/README.md
  • Any small schematic/asset needed for the slide under slides/v2-review-zoom/_revision_assets/v6/community_assignment_method/
  • Updated revision notes for the deck version being modified.
  • Page PNG export for the new methods slide and nearby context.

Validation:

  • PDF renders successfully.
  • New slide explains inputs, aggregation, Leiden algorithm, resolution/silhouette selection, and UPGMA comparison in one slide.
  • Slide explicitly says no gene labels or 3D data were used for assignment.
  • Slide includes the key numbers: 15,668 paths, 41 x 41 arms, 15 Leiden communities, resolution 1.16, silhouette 0.347.
  • Slide does not conflate arm-level C1-C15 with sequence-level 50-community clustering.
  • Page PNG export is nonblank/readable at 1920x1080.
  • git diff --check passes.
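
The key-number check in the list above could be automated along these lines. This is a hedged sketch: it assumes the slide text has already been extracted from the PDF into a plain string by some other tool, and the pattern names, regular expressions, and function name are all hypothetical.

```python
import re

# One pattern per key number; lookarounds keep "15" from matching inside
# "15,668" or labels such as "C15".
KEY_PATTERNS = {
    "15,668 paths": r"15,668",
    "41 x 41 arms": r"41\s*[x×]\s*41",
    "15 communities": r"(?<![\w.,])15(?!\d|[.,]\d)",
    "resolution 1.16": r"(?<![\d.])1\.16(?!\d)",
    "silhouette 0.347": r"(?<![\d.])0\.347(?!\d)",
}


def missing_key_numbers(slide_text):
    """Return the names of key numbers that do not appear in the slide text."""
    return [
        name
        for name, pattern in KEY_PATTERNS.items()
        if not re.search(pattern, slide_text)
    ]


# Illustrative slide text, not the actual slide wording.
complete = (
    "15,668 PHR paths -> 41 x 41 arm matrix -> "
    "15 communities (resolution 1.16, silhouette 0.347)"
)
partial = "15,668 paths across C1-C15, 41 × 41 arms"

print(missing_key_numbers(complete))
print(missing_key_numbers(partial))
```

An empty list means every key number was found. The check says nothing about readability or layout, which still need the PNG export to be inspected.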

Depends on

Required by

Log