detailed-mapping-enriched

Detailed mapping: enriched genes → chromosomes, arms, communities

Metadata

Status: done
Assigned: agent-58
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created: 2026-03-31T22:45:45.604937995+00:00
Started: 2026-03-31T22:46:05.901938749+00:00
Completed: 2026-03-31T22:52:57.966301276+00:00
Tags: analysis, detail, eval-scheduled
Eval score: 0.88
└ blocking impact: 0.90
└ completeness: 0.95
└ coordination overhead: 0.95
└ correctness: 0.90
└ downstream usability: 0.92
└ efficiency: 0.85
└ intent fidelity: 0.87
└ style adherence: 0.90

Description

Goal

Create a detailed, specific mapping of every gene driving the GO enrichment signals to its exact chromosomal location, PHR interval, and Leiden community.

Context

The GO enrichment found 3 functional clusters, each driven by specific genes. The synthesis was too vague, though: we need SPECIFICS for the paper. The previous research identified the genes but did not map them to chromosomal locations or communities.

Genes to map:

snRNP/spliceosome cluster (8 LOC lncRNAs): LOC101928344, LOC101928626, LOC101928932, LOC101929650, LOC101929756, LOC101929819, LOC101929823, LOC101929828

Olfactory receptor GO term drivers (4 LINC genes, NOT actual OR genes): identify these 4 LINC genes from the enrichment results. Also map the 14 actual OR4F/OR4G genes that are in PHRs but did not drive GO enrichment.

miRNA/silencing cluster (5 genes): IL9R, IL9RP1, IL9RP3, IL9RP4, IQSEC3P3. Also MIR8078 (38 copies): map all copies.

Approach

  1. Get coordinates: For each gene above, extract its chr:start-end from phrs.no_acro.genes.gff3 (or the full phrs.genes.gff3).

  2. Map to PHR intervals: Cross-reference each gene's coordinates with chm13.phrs.no_acro.bed — which PHR interval does each gene fall in? Which chromosome arm (p or q)?

  3. Map to Leiden communities: Use column 4 of the PHR BED file (the comma-separated chromosome list) to identify which community/sharing pattern each gene's PHR belongs to. Cross-reference with Andrea's section 9 community assignments if possible (from subtelomeric_analysis_report.md).

  4. Create a master table with columns:

    • Gene name
    • Gene type (lncRNA, protein-coding, pseudogene, miRNA)
    • Functional cluster (snRNP, OR, miRNA/silencing)
    • Chromosome
    • Arm (p or q)
    • PHR interval (start-end)
    • Leiden community (if mappable)
    • Brief function note
  5. Create a per-arm summary: For each chromosome arm with enriched genes, list what's there. This lets us say things like "chr1p PHR contains X, Y, Z genes from communities C3 and C5".
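Steps 1 to 3 above could be sketched roughly as follows. This is a minimal illustration, not the pipeline that was actually run. Several things are assumptions: that gene records in the GFF3 use the feature types `gene` or `pseudogene`, that the gene symbol sits in a `Name` or `gene` attribute, and that arm assignment can be done with a per-chromosome centromere midpoint (the `centromere_mid` mapping is hypothetical and would have to come from a separate annotation). The only fixed facts relied on are the formats themselves: GFF3 coordinates are 1-based and inclusive, BED coordinates are 0-based and half-open. Column 4 of the PHR BED gives a sharing pattern; turning that into a Leiden community label still needs the assignments in subtelomeric_analysis_report.md.

```python
import io

GENE_TYPES = {"gene", "pseudogene"}  # assumption: feature types used in the GFF3


def parse_gff3_genes(handle, wanted):
    """Return {name: [(chrom, start, end), ...]} for the wanted gene names.

    Coordinates are kept as in GFF3 (1-based, inclusive). A list is used
    because multi-copy genes can occur at several loci.
    """
    found = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in GENE_TYPES:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("Name") or attrs.get("gene")  # assumption: attribute keys
        if name in wanted:
            found.setdefault(name, []).append((cols[0], int(cols[3]), int(cols[4])))
    return found


def parse_phr_bed(handle):
    """Return [(chrom, start, end, sharing)] with BED coordinates (0-based, half-open).

    `sharing` is column 4 split on commas, i.e. the chromosome list.
    """
    intervals = []
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        cols = line.rstrip("\n").split("\t")
        sharing = tuple(cols[3].split(",")) if len(cols) > 3 else ()
        intervals.append((cols[0], int(cols[1]), int(cols[2]), sharing))
    return intervals


def overlapping_phrs(chrom, start, end, intervals):
    """PHR intervals that overlap a GFF3 feature on the same chromosome."""
    start0 = start - 1  # convert the 1-based GFF3 start to 0-based
    return [iv for iv in intervals
            if iv[0] == chrom and iv[1] < end and start0 < iv[2]]


def arm_of(chrom, position, centromere_mid):
    """'p' if the position lies before the centromere midpoint, else 'q'."""
    return "p" if position < centromere_mid[chrom] else "q"


# Toy inputs with invented names and coordinates, only to show the call pattern.
gff = io.StringIO("chrT\ttoy\tgene\t101\t200\t.\t+\t.\tID=g1;Name=GENE_A\n")
bed = io.StringIO("chrT\t0\t500\tchrT,chrU\n")
genes = parse_gff3_genes(gff, {"GENE_A"})
phrs = parse_phr_bed(bed)
for name, loci in genes.items():
    for chrom, start, end in loci:
        for phr in overlapping_phrs(chrom, start, end, phrs):
            print(name, chrom, start, end, phr[1], phr[2], ",".join(phr[3]),
                  arm_of(chrom, start, {"chrT": 1000}))
```

A linear scan over the intervals is enough at this scale (a few dozen genes against one BED file); an interval index would only matter for much larger inputs.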

Output

  • enriched_genes_detailed_map.csv — the master table
  • enriched_genes_per_arm.md — per-arm narrative summary
  • Log the full master table so we can read it directly
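One way to produce the two outputs listed above is sketched below. The column keys are an assumed snake_case rendering of the master-table columns from the Approach section, and the rows are toy placeholders rather than real results. Functions write to any file-like handle, so the same code can target enriched_genes_detailed_map.csv, enriched_genes_per_arm.md, or the log.

```python
import csv
import io
from collections import defaultdict

# Assumed column keys, following the master-table columns listed in the Approach.
COLUMNS = ["gene", "gene_type", "functional_cluster", "chromosome", "arm",
           "phr_start", "phr_end", "leiden_community", "function_note"]


def write_master_table(rows, handle):
    """Write the master table as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def per_arm_summary(rows):
    """Return one Markdown bullet per chromosome arm, listing genes and communities."""
    by_arm = defaultdict(list)
    for row in rows:
        by_arm[(row["chromosome"], row["arm"])].append(row)
    lines = []
    # Note: plain string sort, so "chr10" sorts before "chr2".
    for (chrom, arm), members in sorted(by_arm.items()):
        genes = ", ".join(sorted(r["gene"] for r in members))
        comms = ", ".join(sorted({r["leiden_community"] or "unassigned"
                                  for r in members}))
        lines.append(f"- {chrom}{arm}: {genes} (communities: {comms})")
    return "\n".join(lines)


# Toy rows with invented values, only to show the shape of the outputs.
demo_rows = [
    {"gene": "GENE_A", "gene_type": "lncRNA", "functional_cluster": "snRNP",
     "chromosome": "chrT", "arm": "p", "phr_start": 0, "phr_end": 500,
     "leiden_community": "C1", "function_note": "toy"},
    {"gene": "GENE_B", "gene_type": "miRNA", "functional_cluster": "miRNA/silencing",
     "chromosome": "chrT", "arm": "p", "phr_start": 0, "phr_end": 500,
     "leiden_community": "", "function_note": "toy"},
]

buffer = io.StringIO()
write_master_table(demo_rows, buffer)
print(buffer.getvalue())          # logging the full table so it can be read directly
print(per_arm_summary(demo_rows))
```

Rows with an empty community are reported as "unassigned" in the summary, which keeps the "if mappable" caveat visible instead of silently dropping those genes.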

Validation

  • Every gene from the 3 clusters is mapped to a specific chr:start-end
  • Every gene is assigned to a PHR interval
  • Community assignments are provided where possible
  • The per-arm summary covers all arms with enriched genes
  • We can answer: "which genes, on which chromosomes, in which communities?"
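The first two validation points can be checked mechanically. The sketch below is illustrative only: it assumes each master-table row is a dict with `gene`, `chromosome`, `phr_start`, and `phr_end` keys (hypothetical names), and it covers just the genes named in this task. The 4 LINC genes and the 14 OR4F/OR4G genes are left out because they are not listed here by name.

```python
from collections import Counter

# Gene lists copied from the task description.
SNRNP_GENES = ["LOC101928344", "LOC101928626", "LOC101928932", "LOC101929650",
               "LOC101929756", "LOC101929819", "LOC101929823", "LOC101929828"]
MIRNA_GENES = ["IL9R", "IL9RP1", "IL9RP3", "IL9RP4", "IQSEC3P3"]
MIR8078_COPIES = 38


def validate(rows):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    counts = Counter(row["gene"] for row in rows)

    # Every named gene must appear at least once in the master table.
    for gene in SNRNP_GENES + MIRNA_GENES:
        if counts[gene] == 0:
            problems.append(f"{gene}: not mapped")

    # All MIR8078 copies must be present.
    if counts["MIR8078"] != MIR8078_COPIES:
        problems.append(
            f"MIR8078: {counts['MIR8078']} of {MIR8078_COPIES} copies mapped")

    # Every row needs a chromosome and a PHR interval. A start of 0 is valid,
    # so only None and the empty string count as missing.
    for row in rows:
        missing = (not row.get("chromosome")
                   or row.get("phr_start") in (None, "")
                   or row.get("phr_end") in (None, ""))
        if missing:
            problems.append(f"{row['gene']}: missing chromosome or PHR interval")
    return problems


# Toy rows with invented coordinates, only to show a passing run.
complete = [{"gene": g, "chromosome": "chrT", "phr_start": 0, "phr_end": 10}
            for g in SNRNP_GENES + MIRNA_GENES + ["MIR8078"] * MIR8078_COPIES]
print(validate(complete))
```

Community assignments are deliberately not enforced, since the task only asks for them "where possible".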

Depends on

Required by

Log