step-3-run-go

Step 3: Run GO enrichment analysis

Metadata

Status: done
Assigned: agent-23
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Created: 2026-03-31T21:03:46.636268534+00:00
Started: 2026-03-31T21:11:53.033421501+00:00
Completed: 2026-03-31T21:16:59.083095789+00:00
Tags: impl, analysis, eval-scheduled
Eval score: 0.84
└ blocking impact: 0.90
└ completeness: 0.82
└ coordination overhead: 0.88
└ correctness: 0.85
└ downstream usability: 0.75
└ efficiency: 0.90
└ intent fidelity: 0.82
└ style adherence: 0.85

Description

Goal

Run over-representation analysis (ORA) on the PHR gene list using clusterProfiler in R.
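
For orientation, the test behind ORA for a single term can be sketched in base R. This is a minimal illustration, assuming the standard hypergeometric formulation; all counts and the two extra p-values are hypothetical placeholders, not values from the PHR data.

```r
# Hypothetical counts for one GO term (illustration only, not real data)
N <- 20000  # genes in the background set
M <- 400    # background genes annotated to the term
n <- 150    # genes in the query list
k <- 12     # query genes annotated to the term

# P(X >= k) under the hypergeometric null: drawing n genes without
# replacement from N genes, of which M carry the annotation
p_ora <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# The same p-value from a one-sided Fisher's exact test on the 2x2 table
# (columns: query / non-query genes; rows: in term / not in term)
tab <- matrix(c(k, n - k, M - k, N - M - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Per-term p-values are then adjusted across all tested terms, here with BH;
# 0.20 and 0.75 are made-up p-values standing in for two other terms
p_adj <- p.adjust(c(p_ora, 0.20, 0.75), method = "BH")
```

The choice of N is what the "Background set" decision below controls: a larger background makes the same overlap k look more surprising.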

Approach

library(clusterProfiler)
library(org.Hs.eg.db)

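# Read PHR gene list from Step 2 (one symbol per line); drop blanks and duplicates
phr_genes <- unique(trimws(readLines('phrs.gene_names.txt')))
phr_genes <- phr_genes[nzchar(phr_genes)]

# Convert gene symbols to Entrez IDs; bitr() warns about symbols that fail to map
gene_ids <- bitr(phr_genes, fromType='SYMBOL', toType='ENTREZID', OrgDb=org.Hs.eg.db)

# Record how many symbols mapped vs. how many were lost (see Validation)
n_mapped <- length(unique(gene_ids$SYMBOL))
message(n_mapped, ' of ', length(phr_genes), ' PHR genes mapped to Entrez IDs; ',
        length(phr_genes) - n_mapped, ' lost')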

# GO Biological Process
ego_bp <- enrichGO(gene = gene_ids$ENTREZID,
                   OrgDb = org.Hs.eg.db,
                   ont = 'BP',
                   pAdjustMethod = 'BH',
                   pvalueCutoff = 0.05)

# GO Molecular Function
ego_mf <- enrichGO(gene = gene_ids$ENTREZID,
                   OrgDb = org.Hs.eg.db,
                   ont = 'MF',
                   pAdjustMethod = 'BH',
                   pvalueCutoff = 0.05)

# KEGG (optional); enrichKEGG() fetches annotation from KEGG online, so it needs network access
ekegg <- enrichKEGG(gene = gene_ids$ENTREZID, organism = 'hsa')

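# Save results (row.names = FALSE avoids a redundant copy of the ID column)
write.csv(as.data.frame(ego_bp), 'phr_GO_BP_enrichment.csv', row.names = FALSE)
write.csv(as.data.frame(ego_mf), 'phr_GO_MF_enrichment.csv', row.names = FALSE)
write.csv(as.data.frame(ekegg), 'phr_KEGG_enrichment.csv', row.names = FALSE)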

# Generate plots; print() ensures the ggplot object is drawn even when the script is run via source()
pdf('phr_GO_BP_dotplot.pdf')
print(dotplot(ego_bp, showCategory=20))
dev.off()

pdf('phr_GO_MF_dotplot.pdf')
print(dotplot(ego_mf, showCategory=20))
dev.off()

Key decisions

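  • Background set: Use all human genes (most conservative), i.e. leave the universe argument of enrichGO() unset so the background defaults to every gene annotated in org.Hs.eg.db for the chosen ontology. This is the primary analysis.
  • If clusterProfiler is not available, fall back to gprofiler2 (also an R package); if R itself is not available, use the g:Profiler web API.
  • Check the research task output for tool availability before choosing between these options.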
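
The fallback order above could be sketched as follows. This is a rough illustration, not the task's actual implementation: the helper names pick_enrichment_tool and run_gprofiler_fallback are hypothetical, and the gprofiler2 call assumes network access and that the 'fdr' correction is an acceptable stand-in for pAdjustMethod = 'BH'.

```r
# Return the name of the first available enrichment backend.
# `installed` is injectable so the choice can be checked without
# depending on what is actually installed.
pick_enrichment_tool <- function(
    installed = function(pkg) requireNamespace(pkg, quietly = TRUE)) {
  if (installed("clusterProfiler") && installed("org.Hs.eg.db")) {
    "clusterProfiler"
  } else if (installed("gprofiler2")) {
    "gprofiler2"
  } else {
    "g:Profiler web API"
  }
}

# Fallback ORA with gprofiler2 (queries the g:Profiler server; not run here).
run_gprofiler_fallback <- function(genes) {
  res <- gprofiler2::gost(
    query = genes,
    organism = "hsapiens",
    sources = c("GO:BP", "GO:MF", "KEGG"),
    correction_method = "fdr",  # Benjamini-Hochberg FDR
    user_threshold = 0.05
  )
  # gost() returns NULL when no term passes the threshold
  if (is.null(res)) NULL else res$result
}

message("Enrichment backend: ", pick_enrichment_tool())
```

g:Profiler accepts gene symbols directly, so the Entrez ID conversion step is not needed on this path.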

Validation

  • CSV result files exist with enrichment results
  • PDF dotplots are generated
  • Log the top 10 enriched BP and MF terms with p-values
  • Note how many PHR genes mapped to Entrez IDs vs how many were lost
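
The file and top-term checks above could be scripted roughly as below. This is a sketch in base R: top_terms is a hypothetical helper, the file names are those used in the Approach script, and the column names (ID, Description, pvalue, p.adjust, Count) are assumed to match what as.data.frame() produces for a clusterProfiler result.

```r
# Return the top `n` terms of one enrichment CSV, ranked by adjusted p-value.
top_terms <- function(csv_path, n = 10) {
  if (!file.exists(csv_path)) stop("missing result file: ", csv_path)
  res <- read.csv(csv_path, stringsAsFactors = FALSE)
  if (nrow(res) == 0) {
    message(csv_path, ": no enriched terms")
    return(res)
  }
  res <- res[order(res$p.adjust), ]
  head(res[, c("ID", "Description", "pvalue", "p.adjust", "Count")], n)
}

# File names taken from the Approach script
result_files <- c("phr_GO_BP_enrichment.csv", "phr_GO_MF_enrichment.csv")
plot_files   <- c("phr_GO_BP_dotplot.pdf", "phr_GO_MF_dotplot.pdf")

# Report which expected outputs are present
for (f in c(result_files, plot_files)) {
  message(f, if (file.exists(f)) ": found" else ": MISSING")
}

# Log the top 10 terms for every result file that exists
for (f in result_files[file.exists(result_files)]) {
  print(top_terms(f))
}
```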

Depends on

Required by

Log