go-enrichment-on

GO enrichment on protein-coding genes only

Metadata

Status: done
Assigned: agent-61
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created: 2026-04-01T12:49:44.455131558+00:00
Started: 2026-04-01T12:50:09.835684602+00:00
Completed: 2026-04-01T12:54:26.985883845+00:00
Tags: analysis, eval-scheduled
Eval score: 0.86
└ blocking impact: 0.95
└ completeness: 0.90
└ coordination overhead: 0.90
└ correctness: 0.85
└ downstream usability: 0.78
└ efficiency: 0.85
└ intent fidelity: 0.87
└ style adherence: 0.88
└ style adherence0.88

Description

Goal

Rerun GO enrichment using ONLY protein-coding genes from the PHR intervals (excluding acrocentrics), to test whether any functional pathway enrichment exists beyond the lncRNA/pseudogene/miRNA signal.

Context

Previous analysis found 220 genes in non-acrocentric PHRs. The GO enrichment was dominated by:

  • 8 LOC lncRNAs (snRNP signal)
  • 36 MIR8078 copies (miRNA signal)
  • OR pseudogenes + LINC genes (olfactory signal)
  • IL9R pseudogenes (silencing signal)

Step 2 found only ~27 protein-coding genes out of 245 total. We want to know: does ANY enrichment survive when restricted to protein-coding genes?

Approach

Step 1: Extract protein-coding genes only

From phrs.no_acro.genes.gff3, filter for protein-coding genes:

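# Keep only features annotated as protein-coding (gene_biotype attribute in column 9)
grep 'gene_biotype=protein_coding' phrs.no_acro.genes.gff3 > phrs.no_acro.coding_genes.gff3
# If the GFF3 stores the biotype under a different attribute key, match on that key instead
# Extract unique gene names (-P and \K require GNU grep)
grep -oP 'Name=\K[^;]+' phrs.no_acro.coding_genes.gff3 | sort -u > phrs.no_acro.coding_gene_names.txt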

Log the count and list all protein-coding gene names. We expect ~20-27 genes.
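
A more defensive variant of the extraction above can be sketched as a shell function. It is only an illustration: the function name extract_coding_genes is hypothetical, and it assumes the biotype is stored in column 9 as either gene_biotype=protein_coding or biotype=protein_coding. It matches whole attributes rather than substrings and prints the unique-name count for the log.

```shell
# Sketch: filter a GFF3 for protein-coding features and list unique gene names.
# Assumption: biotype is stored as gene_biotype= or biotype= in column 9.
extract_coding_genes() {
    in_gff="$1"; out_gff="$2"; out_names="$3"
    # Keep non-comment lines with an exact protein_coding biotype attribute.
    awk -F'\t' '
        /^#/ { next }
        {
            n = split($9, attr, ";")
            coding = 0
            for (i = 1; i <= n; i++)
                if (attr[i] == "gene_biotype=protein_coding" || attr[i] == "biotype=protein_coding")
                    coding = 1
            if (coding) print
        }' "$in_gff" > "$out_gff"
    # Pull the Name= attribute; sort -u collapses multi-copy genes to one entry.
    awk -F'\t' '
        {
            n = split($9, attr, ";")
            for (i = 1; i <= n; i++)
                if (attr[i] ~ /^Name=/) print substr(attr[i], 6)
        }' "$out_gff" | sort -u > "$out_names"
    # Report the number of unique names for the log.
    echo "protein-coding genes: $(wc -l < "$out_names" | tr -d ' ')"
}
```

Usage would be: extract_coding_genes phrs.no_acro.genes.gff3 phrs.no_acro.coding_genes.gff3 phrs.no_acro.coding_gene_names.txt. Because sort -u merges copies that share a name, the reported count is of unique gene names, not of gene features.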

Step 2: Run GO enrichment via g:Profiler

Use the same g:Profiler API approach as step-3-run-go. Query with ONLY the protein-coding gene list. Background: all human genes.

Save results to:

  • phr_coding_only_GO_BP_enrichment.csv
  • phr_coding_only_GO_MF_enrichment.csv
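
The query itself could look roughly like the following sketch. It is not the step-3-run-go implementation, which is not shown here. The function names build_gost_payload and run_gost are hypothetical, the 0.05 threshold is an assumed default, and domain_scope is set to "known" as one reading of "all human genes" (g:Profiler's own default is "annotated"). Whichever settings step-3-run-go used should be kept so the comparison is like-for-like. run_gost needs curl and jq and makes a network call.

```shell
# Build the g:GOSt JSON request body from a one-gene-per-line file.
# Assumption: gene symbols contain no quotes or backslashes.
build_gost_payload() {
    names_file="$1"; source_id="$2"   # source_id: GO:BP or GO:MF
    query=$(awk 'NF { printf "%s\"%s\"", sep, $1; sep = "," }' "$names_file")
    printf '{"organism":"hsapiens","query":[%s],"sources":["%s"],"user_threshold":0.05,"domain_scope":"known"}\n' \
        "$query" "$source_id"
}

# POST the payload and flatten the "result" array into CSV.
# The header is written first, so the CSV exists even when nothing is enriched.
run_gost() {
    names_file="$1"; source_id="$2"; out_csv="$3"
    echo "term_id,term_name,p_value,term_size,intersection_size" > "$out_csv"
    build_gost_payload "$names_file" "$source_id" |
        curl -sS -X POST -H 'Content-Type: application/json' -d @- \
            https://biit.cs.ut.ee/gprofiler/api/gost/profile/ |
        jq -r '.result[] | [.native, .name, .p_value, .term_size, .intersection_size] | @csv' >> "$out_csv"
}
```

Usage would be: run_gost phrs.no_acro.coding_gene_names.txt GO:BP phr_coding_only_GO_BP_enrichment.csv, and the same with GO:MF for the second file.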

Step 3: Report and compare

  • If enrichment is found: what terms? Are they different from the full-gene-set analysis?
  • If NO enrichment: log this clearly, since it confirms that PHR functional enrichment is driven by ncRNA/pseudogene content, not protein-coding pathways
  • List all protein-coding genes with their chromosomal location and brief function
  • Compare: are the OR4F protein-coding copies enough to drive olfactory enrichment alone?
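
The term-level comparison with the full-gene-set analysis can be sketched as a set comparison on GO IDs. This is a minimal illustration with a hypothetical function name, compare_terms. It assumes both CSVs have a header row and carry the GO term ID in the first column, which may not match the layout of the files that step-3-run-go wrote.

```shell
# Compare GO term IDs between the full-gene-set and coding-only result CSVs.
# Assumption: both files have a header row and the term ID in column 1.
compare_terms() {
    full_csv="$1"; coding_csv="$2"
    full_ids=$(mktemp); coding_ids=$(mktemp)
    # Drop the header, take column 1, strip CSV quotes, de-duplicate.
    tail -n +2 "$full_csv"   | cut -d, -f1 | tr -d '"' | sort -u > "$full_ids"
    tail -n +2 "$coding_csv" | cut -d, -f1 | tr -d '"' | sort -u > "$coding_ids"
    # comm needs sorted input: -12 = in both, -13 = second only, -23 = first only.
    echo "shared: $(comm -12 "$full_ids" "$coding_ids" | paste -s -d ' ' -)"
    echo "coding-only: $(comm -13 "$full_ids" "$coding_ids" | paste -s -d ' ' -)"
    echo "full-only: $(comm -23 "$full_ids" "$coding_ids" | paste -s -d ' ' -)"
    rm -f "$full_ids" "$coding_ids"
}
```

An empty "shared" line together with an empty "coding-only" line would correspond to the no-enrichment outcome. For the OR4F question, one option is to rerun the query on a name list restricted to the OR4F entries (for example with grep '^OR4F') and see whether the olfactory terms still appear.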

Output

  • Gene list file: phrs.no_acro.coding_gene_names.txt
  • Enrichment CSVs (even if empty)
  • Full log of all protein-coding gene names, locations, and functions
  • Clear conclusion: enrichment found or not
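
The location part of the gene log can be produced directly from the filtered GFF3. The sketch below uses a hypothetical function name, list_gene_locations, and assumes standard GFF3 columns (seqid in column 1, start and end in columns 4 and 5, strand in column 7). It does not cover the brief function descriptions, which have to come from another source.

```shell
# Print name, chromosome, start, end, strand for every feature in a GFF3.
# Assumption: standard 9-column GFF3 with a Name= attribute in column 9.
list_gene_locations() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^#/ { next }
        {
            name = "NA"
            n = split($9, attr, ";")
            for (i = 1; i <= n; i++)
                if (attr[i] ~ /^Name=/) name = substr(attr[i], 6)
            print name, $1, $4, $5, $7
        }' "$1" | sort -t "$(printf '\t')" -k1,1 -k2,2 -k3,3n
}
```

Usage would be: list_gene_locations phrs.no_acro.coding_genes.gff3. Genes with several copies appear once per copy, so the multi-copy families are visible in the log even though the name list is de-duplicated.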

Validation

  • Protein-coding gene count is reported
  • All protein-coding genes are listed by name
  • GO enrichment results (or lack thereof) are clearly logged
  • Comparison to full-gene-set enrichment is addressed

Depends on

Required by

Log