implement-copy-number

Implement copy-number-aware enrichment analysis

Metadata

Status: done
Assigned: agent-73
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Created: 2026-04-01T14:47:27.639789922+00:00
Started: 2026-04-01T14:50:14.560811007+00:00
Completed: 2026-04-01T14:56:15.429461085+00:00
Tags: analysis, impl, eval-scheduled
Eval score: 0.82
└ blocking impact: 0.90
└ completeness: 0.88
└ coordination overhead: 0.87
└ correctness: 0.83
└ downstream usability: 0.80
└ efficiency: 0.78
└ intent fidelity: 0.77
└ style adherence: 0.85

Description

Goal

Implement the top 2-3 copy-number-aware enrichment methods recommended by the research task, and run them on the PHR gene data.

Context

  • 29 non-acrocentric PHR intervals on CHM13
  • 1,189 gene copies (23 unique protein-coding families + ncRNA) across these intervals
  • Standard over-representation analysis (ORA) deduplicates gene copies and so loses the copy structure
  • The research task (research-copy-number) will recommend specific methods — read its output first

Input data

  • chm13.phrs.no_acro.bed — 29 PHR intervals
  • phrs.no_acro.genes.gff3 — all gene copies in PHR intervals
  • gene_copy_summary.csv — copy counts per gene family
  • all_gene_copies_by_arm.csv — every copy with location
  • chm13v2.0_RefSeq_Liftoff_v5.2.gff3.gz — full genome annotation (for background)
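
One way to build genome-wide copy counts from the annotation is sketched below. It is a minimal sketch, not the method chosen by the research task. It assumes every `gene` feature row counts as one copy and that copies of a family share a `gene_name` attribute. Both assumptions should be checked against the actual Liftoff GFF3 before use. The function names are hypothetical.

```python
import gzip
from collections import Counter


def count_gene_copies(lines, name_key="gene_name"):
    """Count `gene` feature rows per family name in an iterable of GFF3 lines.

    Assumption: copies of one family share the attribute given by `name_key`.
    Falls back to `Name`, then `ID`, when that attribute is missing.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # only gene-level features are counted as copies
        attrs = dict(
            pair.split("=", 1) for pair in fields[8].split(";") if "=" in pair
        )
        name = attrs.get(name_key) or attrs.get("Name") or attrs.get("ID")
        if name:
            counts[name] += 1
    return counts


def count_gene_copies_in_file(path, name_key="gene_name"):
    """Same as count_gene_copies, reading a plain or gzip-compressed GFF3 file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_gene_copies(handle, name_key)


# Tiny in-memory example (made-up rows, not real annotation data).
demo_lines = [
    "##gff-version 3\n",
    "chr1\tLiftoff\tgene\t1\t100\t.\t+\t.\tID=g1;gene_name=FAM_A\n",
    "chr1\tLiftoff\tmRNA\t1\t100\t.\t+\t.\tID=t1;Parent=g1\n",
    "chr2\tLiftoff\tgene\t5\t90\t.\t-\t.\tID=g1_1;gene_name=FAM_A\n",
    "chr3\tLiftoff\tgene\t5\t90\t.\t-\t.\tID=g2;Name=FAM_B\n",
]
demo_counts = count_gene_copies(demo_lines)
```

Under these assumptions, the same function could be run on phrs.no_acro.genes.gff3 for the foreground and on the full-genome annotation for the background, so both sides are counted the same way.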

Approach

Follow the recommendations from the research task. For each method:

  1. Prepare inputs in the required format
  2. Run the analysis with appropriate parameters
  3. Save results as a CSV with columns for term, p-value, gene count, and copy count
  4. Log top results and compare to the standard ORA findings
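
Step 3 could be sketched as follows. The column names `term`, `p_value`, `gene_count`, and `copy_count` are an assumed spelling of the four fields listed above, and the helper name is hypothetical. Rows are sorted by p-value so the top results come first.

```python
import csv
import io

# Assumed column names for the four fields named in step 3.
FIELDS = ["term", "p_value", "gene_count", "copy_count"]


def results_to_csv_text(rows):
    """Render result rows (dicts keyed by FIELDS) as CSV text, best p-value first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["p_value"]):
        writer.writerow({field: row[field] for field in FIELDS})
    return buffer.getvalue()


# Made-up rows for illustration only.
example_rows = [
    {"term": "T2", "p_value": 0.5, "gene_count": 1, "copy_count": 3},
    {"term": "T1", "p_value": 0.01, "gene_count": 2, "copy_count": 40},
]
example_csv = results_to_csv_text(example_rows)
```

Writing to a file instead would only need the returned text passed to `open(path, "w", newline="")`.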

For ALL methods:

  • Background must also be copy-number-aware (count all copies genome-wide, not just unique genes)
  • Report both the copy-weighted result AND the contrast with the deduplicated ORA
  • Run on non-acrocentric PHR intervals
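
To make the copy-aware background concrete, here is a minimal sketch of a copy-weighted over-representation test next to its deduplicated counterpart. It is one possible method, not necessarily one the research task recommends. It treats every gene copy as one draw in a hypergeometric test, with foreground and background both given as copy counts per gene family. All names and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N, K of them annotated."""
    if k <= 0:
        return 1.0
    numerator = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return numerator / comb(N, n)


def enrichment(term_genes, foreground, background):
    """Test one term. foreground/background map gene family -> copy count.

    The foreground is assumed to be a subset of the background.
    """
    hits = [gene for gene in foreground if gene in term_genes]
    k = sum(foreground[gene] for gene in hits)  # annotated copies in foreground
    K = sum(c for gene, c in background.items() if gene in term_genes)
    N = sum(background.values())  # all copies genome-wide
    n = sum(foreground.values())  # all copies in the intervals
    return {
        "gene_count": len(hits),
        "copy_count": k,
        "p_value": hypergeom_upper_tail(k, N, K, n),
    }


def deduplicate(counts):
    """Collapse copy counts to presence/absence, as standard ORA does."""
    return {gene: 1 for gene, c in counts.items() if c > 0}


# Toy data: family A has 10 copies genome-wide, 8 of them in the foreground.
background = {"A": 10, "B": 1, "C": 1, "D": 1}
foreground = {"A": 8, "B": 1}
term = {"A"}

copy_aware = enrichment(term, foreground, background)
dedup = enrichment(term, deduplicate(foreground), deduplicate(background))
```

In this toy case the copy-weighted p-value is 145/715 (about 0.20) and the deduplicated one is 0.5. One caveat: copies of a family are unlikely to be independent draws (for example, tandem duplicates sit together), so copy-weighted p-values from this kind of test are probably anti-conservative and are best read alongside the deduplicated result.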

Output

  • Results CSV for each method run
  • Comparison table: standard ORA vs copy-aware method(s)
  • Clear statement: does copy awareness change the enrichment picture?
  • If new terms appear or old terms strengthen: highlight these
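
The comparison table could be assembled along these lines. This is a sketch under stated assumptions: each method's results are reduced to a mapping from term to p-value, and a fixed threshold `alpha` (hypothetical, 0.05 here) decides which terms count as significant. Multiple-testing correction is not covered. The status labels are illustrative.

```python
def classify(p_dedup, p_copy, alpha):
    """Label how a term changes between deduplicated ORA and a copy-aware method."""
    sig_dedup = p_dedup is not None and p_dedup < alpha
    sig_copy = p_copy is not None and p_copy < alpha
    if sig_copy and not sig_dedup:
        return "new"
    if sig_dedup and not sig_copy:
        return "lost"
    if sig_dedup and sig_copy:
        if p_copy < p_dedup:
            return "strengthened"
        if p_copy > p_dedup:
            return "weakened"
    return "unchanged"


def compare_results(dedup, copy_aware, alpha=0.05):
    """Build one row per term seen by either method. Inputs map term -> p-value."""
    rows = []
    for term in sorted(set(dedup) | set(copy_aware)):
        p_dedup = dedup.get(term)  # None when the term is absent from a method
        p_copy = copy_aware.get(term)
        rows.append(
            {
                "term": term,
                "p_dedup": p_dedup,
                "p_copy_aware": p_copy,
                "status": classify(p_dedup, p_copy, alpha),
            }
        )
    return rows


# Made-up p-values for illustration only.
dedup_p = {"T1": 0.2, "T2": 0.01, "T3": 0.03}
copy_p = {"T1": 0.001, "T2": 0.0001, "T3": 0.4, "T4": 0.5}
table = compare_results(dedup_p, copy_p)
status_by_term = {row["term"]: row["status"] for row in table}
```

Rows labelled "new" or "strengthened" are the ones to highlight. If every row is "unchanged", copy awareness does not alter the enrichment picture at the chosen threshold.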

Validation

  • At least 2 methods are implemented and run
  • Background is properly constructed (genome-wide copy counts)
  • Results are compared to previous deduplicated ORA
  • A clear conclusion on whether copy awareness matters for these data

Depends on

Required by

Log