rerun-enrichment-excluding

Rerun enrichment excluding C7/acrocentric PHRs

Metadata

Status: done
Assigned: agent-36
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created: 2026-03-31T22:01:30.067957175+00:00
Started: 2026-03-31T22:01:50.088259003+00:00
Completed: 2026-03-31T22:06:21.503945783+00:00
Tags: analysis, eval-scheduled
Eval score: 0.88
└ blocking impact: 0.90
└ completeness: 0.92
└ coordination overhead: 0.90
└ correctness: 0.88
└ downstream usability: 0.80
└ efficiency: 0.85
└ intent fidelity: 0.85
└ style adherence: 0.88

Description

Goal

Redo the PHR gene enrichment analysis but EXCLUDING acrocentric chromosome PHRs (community C7).

Context

The previous analysis found 25 significant GO Biological Process (BP) terms and 3 significant Molecular Function (MF) terms, dominated by RNA splicing/snRNP assembly (8/23 genes) and olfactory receptor activity (4/18 genes). We want to know whether these signals persist without the acrocentric PHRs or are driven by them.

Approach

Step 1: Filter chm13.phrs.bed

Remove the rows on acrocentric chromosomes (chr13, chr14, chr15, chr21, chr22) from chm13.phrs.bed:

grep -v -E '^chr(13|14|15|21|22)\b' chm13.phrs.bed > chm13.phrs.no_acro.bed

Log how many intervals remain (should be ~37 minus the acrocentric entries).
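
The interval count can be logged with a small helper. This is a minimal sketch, assuming one interval per line and no header or track lines in either BED file; the function name `log_interval_counts` is hypothetical and not part of the original workflow.

```shell
# Hypothetical helper: report interval counts before and after filtering.
# Assumes one interval per line, with no header or track lines.
log_interval_counts() {
    # awk's NR counts records even when the final newline is missing
    before=$(awk 'END { print NR }' "$1")
    after=$(awk 'END { print NR }' "$2")
    echo "intervals: before=$before after=$after removed=$((before - after))"
}

# Usage:
# log_interval_counts chm13.phrs.bed chm13.phrs.no_acro.bed
```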

Step 2: Re-intersect genes

zcat chm13v2.0_RefSeq_Liftoff_v5.2.gff3.gz \
  | awk '$3 == "gene"' \
  | bedtools intersect -a - -b chm13.phrs.no_acro.bed -wa \
  > phrs.no_acro.genes.gff3

Extract gene names:

grep -oP 'Name=\K[^;]+' phrs.no_acro.genes.gff3 | sort -u > phrs.no_acro.gene_names.txt

Log gene count and compare to the 245 from the full analysis.
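
The comparison against the full analysis could be logged as follows. A minimal sketch: the helper name `log_gene_count` is hypothetical, and the default baseline of 245 is taken from the sentence above.

```shell
# Hypothetical helper: compare the filtered gene count with the full-analysis count.
# The baseline defaults to 245 (the gene count from the full analysis).
log_gene_count() {
    names_file=$1
    baseline=${2:-245}
    # count non-empty lines only
    n=$(awk 'NF { c++ } END { print c + 0 }' "$names_file")
    echo "genes: no_acro=$n full=$baseline difference=$((baseline - n))"
}

# Usage:
# log_gene_count phrs.no_acro.gene_names.txt
```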

Step 3: Re-run GO enrichment

Use the g:Profiler API, the same approach that step-3-run-go used (clusterProfiler could not be installed). Query the filtered gene list against a background of all human genes.

Save results to:

  • phr_no_acro_GO_BP_enrichment.csv
  • phr_no_acro_GO_MF_enrichment.csv
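
One way to submit the gene list is sketched below. It makes several assumptions: the endpoint URL and the JSON field names (`organism`, `sources`, `query`) follow the public g:GOSt API as commonly documented and should be checked against the current g:Profiler docs. The helper names are hypothetical, and this is not necessarily how step-3-run-go built its request. The sketch leaves the background settings at the service defaults and saves the raw JSON response. Converting that JSON into the two CSVs is not shown.

```shell
# Hypothetical helper: build a g:GOSt JSON request body from a gene list.
# Assumes one gene symbol per line, containing no quotes or backslashes.
build_gost_payload() {
    awk '
        BEGIN { printf "{\"organism\":\"hsapiens\",\"sources\":[\"GO:BP\",\"GO:MF\"],\"query\":[" }
        NF    { printf "%s\"%s\"", (n++ ? "," : ""), $1 }
        END   { print "]}" }
    ' "$1"
}

# Hypothetical wrapper: POST the body and keep the raw JSON response.
# The endpoint URL is an assumption; verify it before use.
run_gost() {
    build_gost_payload "$1" |
        curl -sS -X POST -H 'Content-Type: application/json' \
             --data-binary @- \
             'https://biit.cs.ut.ee/gprofiler/api/gost/profile/' > "$2"
}

# Usage (the output file name here is illustrative):
# run_gost phrs.no_acro.gene_names.txt phr_no_acro_gost_raw.json
```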

Step 4: Compare

  • Which terms survived the acrocentric exclusion?
  • Which terms were lost (i.e. driven by acrocentric PHR genes)?
  • Does the snRNP/splicing signal persist?
  • Does the olfactory receptor (OR) signal change?
  • Log a comparison table and narrative summary
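
The survived/lost classification above can be sketched with `comm` on sorted term-ID lists. This assumes each enrichment CSV has a header row and holds the term ID in a column that precedes any quoted field containing commas (column 1 by default). That layout is an assumption, so adjust the column index to the actual files. The helper names are hypothetical.

```shell
# Hypothetical helper: print the unique term IDs from an enrichment CSV.
# Assumes a header row and the term ID in column $2 (simple comma split).
term_ids() {
    awk -F, -v col="$2" 'NR > 1 && $col != "" { gsub(/"/, "", $col); print $col }' "$1" |
        LC_ALL=C sort -u
}

# Label each term as survived (in both runs), lost (full analysis only),
# or gained (no-acro run only).
compare_terms() {
    full_csv=$1
    no_acro_csv=$2
    col=${3:-1}
    a=$(mktemp)
    b=$(mktemp)
    term_ids "$full_csv" "$col" > "$a"
    term_ids "$no_acro_csv" "$col" > "$b"
    LC_ALL=C comm -12 "$a" "$b" | awk '{ print "survived\t" $0 }'
    LC_ALL=C comm -23 "$a" "$b" | awk '{ print "lost\t" $0 }'
    LC_ALL=C comm -13 "$a" "$b" | awk '{ print "gained\t" $0 }'
    rm -f "$a" "$b"
}

# Usage ($FULL_BP_CSV stands for the original BP results file):
# compare_terms "$FULL_BP_CSV" phr_no_acro_GO_BP_enrichment.csv
```

Terms labelled `lost` are the candidates for acrocentric-driven signals. Terms labelled `gained` can appear because the query size changes, and they are worth noting in the narrative.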

Output files

  • chm13.phrs.no_acro.bed
  • phrs.no_acro.genes.gff3
  • phrs.no_acro.gene_names.txt
  • phr_no_acro_GO_BP_enrichment.csv
  • phr_no_acro_GO_MF_enrichment.csv
  • Comparison narrative in the task log

Validation

  • Acrocentric chromosomes (13, 14, 15, 21, 22) are absent from the filtered BED
  • Gene count is reported and compared to 245
  • GO enrichment results are logged with full term names and p-values
  • Comparison to original results identifies which signals are acrocentric-driven vs genome-wide
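
The first validation item can be checked mechanically. A minimal sketch, assuming the chromosome name is in column 1 of the BED file; the function name `check_no_acro` is hypothetical.

```shell
# Hypothetical check: succeed only if no acrocentric chromosome appears in column 1.
check_no_acro() {
    hits=$(awk '$1 ~ /^chr(13|14|15|21|22)$/ { c++ } END { print c + 0 }' "$1")
    if [ "$hits" -eq 0 ]; then
        echo "OK: no acrocentric intervals in $1"
    else
        echo "FAIL: $hits acrocentric intervals in $1" >&2
        return 1
    fi
}

# Usage:
# check_no_acro chm13.phrs.no_acro.bed
```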

Depends on

Required by

Log