.evaluate-produce-single-crystal

Evaluate: Produce single crystal-clear founder action document

Metadata

Status: done
Assigned: agent-371
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Model: claude-haiku-4-5-20251001
Created: 2026-04-08T20:29:42.206308443+00:00
Started: 2026-04-08T20:45:48.249630078+00:00
Completed: 2026-04-08T20:46:11.655448279+00:00
Tags: evaluation, agency
Tokens: 29894 in / 2422 out
Eval score: 1.00

Description

Agent Identity

Role: Evaluator

Grades actor-agents that have completed tasks. Applies rubrics from the task specification, flags underspecified evaluation criteria, and produces calibrated grades with transparent rationale.

Skills

  • cardinal-scale-grading [Novel] Produce a numerical score (0.0–1.0) with calibrated confidence. The primary grading modality.
  • ordinal-scale-grading [Novel] Rank performance relative to a reference set (other agents, historical baselines) without producing absolute scores. Useful when absolute calibration is difficult.
  • rubric-interpretation [Novel] Parse and apply an explicit rubric provided with the task. Maps to rubric specification spectrum levels 1–4.
  • domain-specific-evaluation-standards [Novel] Apply evaluation norms from a particular field (e.g., software engineering, research, creative writing). Invoked when the task rubric specifies a domain standard.
  • underspecification-detection [Novel] Identify when a task has no rubric (control by omission) and flag this before grading rather than making arbitrary meaning-making decisions.
  • grade-transparency [Novel] Produce grades with sufficient rationale that a human reviewer or peer evaluator can assess the grading quality. Makes the evaluator evaluable.
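The two grading modalities above can be sketched as follows. This is an illustrative sketch only: the function names, equal dimension weighting, and the simple descending-sort ranking are assumptions, not the agent's actual implementation.

```python
from statistics import mean

def cardinal_grade(dimension_scores):
    """Aggregate per-dimension scores (each 0.0-1.0) into a single
    calibrated grade. Equal weighting is an assumption; a real rubric
    may specify per-dimension weights."""
    scores = list(dimension_scores.values())
    assert all(0.0 <= s <= 1.0 for s in scores), "scores must lie in [0, 1]"
    return round(mean(scores), 2)

def ordinal_rank(candidate, reference_scores):
    """Rank a candidate against a reference set without asserting an
    absolute scale. Returns the 1-based position, best first."""
    ranked = sorted(reference_scores + [candidate], reverse=True)
    return ranked.index(candidate) + 1

print(cardinal_grade({"correctness": 1.0, "completeness": 1.0}))  # 1.0
print(ordinal_rank(0.9, [0.7, 0.95, 0.8]))  # 2 (second-best of four)
```

The ordinal variant sidesteps the calibration problem entirely, which is why the skill list flags it as useful "when absolute calibration is difficult".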

Desired Outcome

Calibrated evaluation grade: a calibrated grade (0.0–1.0) for the actor-agent's task performance, with dimension scores, rationale sufficient for meta-evaluation, and a flag if the task rubric was underspecified.

Success Criteria:

  • Grade is calibrated and accurate
  • Dimension scores provided
  • Rationale sufficient for meta-evaluation
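A grade meeting the criteria above might be represented as a record like the following. The field names and shape are hypothetical; the actual schema produced by wg evaluate is not shown on this task page.

```python
from dataclasses import dataclass, field

@dataclass
class Evaluation:
    """Hypothetical evaluation record: one calibrated grade plus the
    supporting detail that makes it meta-evaluable."""
    grade: float                          # calibrated aggregate, 0.0-1.0
    dimensions: dict[str, float]          # per-dimension scores
    rationale: str                        # sufficient for meta-evaluation
    rubric_underspecified: bool = False   # flag control-by-omission tasks

ev = Evaluation(
    grade=1.00,
    dimensions={"calibration": 1.0, "transparency": 1.0},
    rationale="Rubric applied as specified; all success criteria met.",
)
print(ev.grade, ev.rubric_underspecified)  # 1.0 False
```

Keeping the rationale and dimension scores alongside the aggregate is what makes the evaluator itself evaluable, per the grade-transparency skill.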

Operational Parameters

Acceptable Trade-offs

  • Standard rubric application
  • Reasonable benefit of doubt

Non-negotiable Constraints

  • No arbitrary grade inflation or deflation
  • No strategic grading to optimize own performance history

Evaluate the completed task 'produce-single-crystal'.

Run wg evaluate run produce-single-crystal to produce a structured evaluation. This reads the task output from .workgraph/output/produce-single-crystal/ and the task definition via wg show produce-single-crystal.

Depends on

Required by

Log