validate-cross-check

Validate: cross-check all findings against raw data

Metadata

Status: done
Assigned: agent-437
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Created: 2026-04-01T19:18:43.341043444+00:00
Started: 2026-04-02T01:26:50.595125033+00:00
Completed: 2026-04-02T01:32:34.449044879+00:00
Tags: validation, critical, eval-scheduled
Tokens: 0 in / 0 out
Eval score: 0.86
└ hallucination rate: 0.05
└ requirement coverage: 0.85
└ semantic match: 1.00
└ specificity match: 0.55

Description

Goal

Validation task: independently verify that every finding claimed in the updated documents matches the actual data files. This is scientific result validation, NOT input validation.

What to validate

1. Gene counts

  • Read phrs.no_acro.genes.gff3 and count total genes, protein-coding genes, lncRNAs, pseudogenes, miRNAs
  • Compare to what the reports claim
  • Flag any discrepancies
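
A minimal sketch of the gene-count check, assuming the GFF3 marks gene-level rows with type `gene` or `pseudogene` in column 3 and stores the biotype in one of the attribute keys listed in `BIOTYPE_KEYS`. Both are assumptions about phrs.no_acro.genes.gff3, not facts taken from the file, and the function names are illustrative.

```python
from collections import Counter
from typing import Iterable

# Assumption: the biotype lives under one of these attribute keys; adjust to the file.
BIOTYPE_KEYS = ("gene_biotype", "gene_type", "biotype")


def parse_attributes(field: str) -> dict:
    """Parse a GFF3 column-9 string (key=value;key=value) into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def count_gene_biotypes(lines: Iterable[str]) -> Counter:
    """Count gene-level features by biotype; 'total' counts every gene row."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows
        feature_type, attrs = cols[2], parse_attributes(cols[8])
        if feature_type not in ("gene", "pseudogene"):
            continue  # ignore transcripts, exons, CDS, and so on
        # Fall back to the feature type when no biotype attribute is present.
        biotype = next((attrs[k] for k in BIOTYPE_KEYS if k in attrs), feature_type)
        counts["total"] += 1
        counts[biotype] += 1
    return counts
```

Passing an open file handle for the GFF3 to `count_gene_biotypes` yields a `Counter` whose entries can be compared one by one with the totals claimed in the reports. The biotype labels in the result are whatever the file uses, so the label for each category (for example miRNA) has to be confirmed against the file before comparing.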

2. Copy counts

  • Read gene_copy_summary.csv and verify copy counts for key families:
    • DUX4 should be 18 copies on 18 arms
    • WASHC1 should be 16 copies on 16 arms
    • OR4F17 should be 20 copies on 20 arms
    • MIR8078 should be 672 copies on 24 arms
  • Cross-check against all_gene_copies_by_arm.csv
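
The copy-count check could be sketched as follows. The expected values come from the list above; the column names `gene`, `total_copies`, and `n_arms` are hypothetical placeholders, since the actual header of gene_copy_summary.csv is not given here.

```python
import csv
import io

# Expected values from the task description: gene -> (copies, arms).
EXPECTED = {
    "DUX4": (18, 18),
    "WASHC1": (16, 16),
    "OR4F17": (20, 20),
    "MIR8078": (672, 24),
}


def check_copy_counts(csv_text, expected=EXPECTED, gene_col="gene",
                      copies_col="total_copies", arms_col="n_arms"):
    """Return one result per expected gene with claimed and actual values.

    The column names are assumptions; pass the real ones for the file at hand.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    actual = {
        row[gene_col]: (int(row[copies_col]), int(row[arms_col]))
        for row in reader
    }
    results = []
    for gene, claimed in expected.items():
        found = actual.get(gene)  # None when the gene is missing from the file
        results.append({
            "gene": gene,
            "claimed": claimed,
            "actual": found,
            "status": "pass" if found == claimed else "fail",
        })
    return results
```

A gene that is absent from the file is reported as a failure with `actual` set to `None`, so a missing row is never mistaken for a match. The same function can be run against a per-gene aggregation of all_gene_copies_by_arm.csv to confirm that both files agree.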

3. Enrichment p-values

  • Read the original g:Profiler CSVs (phr_no_acro_GO_BP_enrichment.csv, phr_no_acro_GO_MF_enrichment.csv)
  • Read the protein-coding CSVs (phr_coding_only_GO_BP_enrichment.csv, phr_coding_only_GO_MF_enrichment.csv)
  • Read the copy-weighted results (copy_weighted_vs_deduplicated_comparison.csv, phr_copy_weighted_enrichment.csv)
  • Verify all p-values quoted in reports match the source CSVs
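
Because reports usually quote rounded p-values, an exact string comparison would produce false failures. The sketch below compares quoted values to the source CSV with a relative tolerance. The column names `term_id` and `adjusted_p_value` and the 5% tolerance are assumptions to adjust per file.

```python
import csv
import io
import math


def check_quoted_pvalues(csv_text, quoted, id_col="term_id",
                         p_col="adjusted_p_value", rel_tol=0.05):
    """Compare p-values quoted in a report against a source CSV.

    quoted maps a term ID to the p-value as written in the report.
    rel_tol allows for rounding in the report (5% here, an arbitrary choice).
    Returns tuples of (term, claimed, actual, status).
    """
    source = {
        row[id_col]: float(row[p_col])
        for row in csv.DictReader(io.StringIO(csv_text))
    }
    results = []
    for term, claimed in quoted.items():
        actual = source.get(term)  # None when the term is not in the CSV
        ok = actual is not None and math.isclose(claimed, actual, rel_tol=rel_tol)
        results.append((term, claimed, actual, "pass" if ok else "fail"))
    return results
```

A relative tolerance is used instead of an absolute one because enrichment p-values span many orders of magnitude, so a fixed absolute threshold would pass almost any small value.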

4. Gene-to-arm mappings

  • Spot-check 5 gene families in enriched_genes_detailed_map.csv:
    • Verify chromosome assignments are correct
    • Verify Leiden community assignments are consistent with the PHR BED sharing patterns

5. Angela/Andrea comparisons

  • Verify any claims about Angela's 1 Mb GSEA results against PHR_enrichment_summary.xlsx or PHR_enrichment_all_results.xlsx
  • Verify claims about Andrea's section 9 against subtelomeric_analysis_report.md

Output

  • validation_report.md — itemized checklist of what was checked, what passed, what failed
  • Any discrepancies flagged with the correct values

Validation

  • Every check has a pass/fail status
  • Any discrepancies include both the claimed value and the actual value
  • The report is honest — if something doesn't check out, say so
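
One way to meet these criteria is to build the report from structured check records, so that every entry carries a status and every failure carries both values. The record layout (`name`, `claimed`, `actual`) and the report layout below are illustrative assumptions, not a prescribed format for validation_report.md.

```python
def render_report(checks):
    """Render checks as a markdown checklist.

    Each check is a dict with keys: name, claimed, actual.
    A check passes only when claimed == actual.
    """
    lines = ["# Validation report", ""]
    failed = 0
    for check in checks:
        passed = check["claimed"] == check["actual"]
        failed += not passed
        box = "x" if passed else " "
        status = "PASS" if passed else "FAIL"
        line = f"- [{box}] {status}: {check['name']}"
        if not passed:
            # Failures always show both the claimed and the actual value.
            line += f" (claimed: {check['claimed']}, actual: {check['actual']})"
        lines.append(line)
    lines += ["", f"{len(checks) - failed} passed, {failed} failed"]
    return "\n".join(lines) + "\n"
```

Deriving the status from the comparison itself, instead of accepting it as an input field, keeps a check from being recorded as passing when its values disagree.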

Depends on

Required by

Log