Metadata
| Status | done |
|---|---|
| Assigned | agent-399 |
| Agent identity | 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3 |
| Created | 2026-04-12T18:46:35.251442108+00:00 |
| Started | 2026-04-12T18:53:35.476484728+00:00 |
| Completed | 2026-04-12T18:53:38.103822039+00:00 |
| Tags | eval-scheduled |
| Eval score | 0.83 |
| └ blocking impact | 0.80 |
| └ completeness | 0.90 |
| └ coordination overhead | 0.85 |
| └ correctness | 0.85 |
| └ downstream usability | 0.80 |
| └ efficiency | 0.80 |
| └ intent fidelity | 0.81 |
| └ style adherence | 0.85 |
Description
Quality Pass: Post-Triage Review
Review and optimize task metadata for newly created tasks before they enter execution.
Tasks to review
- review-and-merge
- fix-terminology-single
What to do
For EACH task listed above:
1. Classify task type
Read the task via wg show <task-id>. Classify as one of:
- research — Investigation, analysis, library evaluation
- implementation — New code, features, endpoints
- fix — Bug fixes, error corrections
- design — Architecture, API design, planning
- test — Test writing, test infrastructure
- docs — Documentation, comments, guides
- refactor — Code restructuring without behavior change
2. Assign agent identity
Run wg agency stats --by-task-type to see role performance by task type.
The Recommendations by Task Type table shows the best role for each type.
Use the recommended role for the task's classified type. If the recommendation says '(insufficient data)', fall back to the overall Role Leaderboard.
For JSON access (machine-readable): wg agency stats --by-task-type --json
Look at .task_type_breakdown.recommendations[].best_role.
Apply: wg assign <task-id> <agent-hash>
3. Select model tier
Run wg agency stats --by-task-type and check the Best Model by Task Type
table. Use the top-scoring model for the task's classified type.
For JSON access: .task_type_breakdown.recommendations[].best_model.
Override heuristics (when evaluation data is sparse or absent):
| Signal | Model |
|---|---|
| Simple, mechanical, well-defined (e.g., 'add a flag', 'rename X') | haiku |
| Standard implementation, testing, research | sonnet |
| Complex design, multi-system reasoning, novel architecture | opus |
| Task has failed before (check status history) | escalate one tier |
Apply: wg edit <task-id> --model <tier>
4. Release for execution
After assigning agent and model: wg resume <task-id>
Validation
- Every listed task has an agent assigned (check via
wg show) - Every listed task has a model set
- Every listed task is un-paused (status: open, not paused)
- Assignments are justified by evaluation data, not arbitrary
Depends on
Required by
Log
- 2026-04-12T18:48:13.442817514+00:00 Spawned by coordinator --executor claude --model claude-opus-4-6
- 2026-04-12T18:48:30.270334927+00:00 Starting quality pass. Reviewing tasks: review-and-merge and fix-terminology-single
- 2026-04-12T18:49:06.694583158+00:00 Classification: review-and-merge → type 'other' (PR review/merge). fix-terminology-single → type 'fix' (terminology correction).
- 2026-04-12T18:53:33.599639093+00:00 Quality pass complete. Assignments: - review-and-merge: Careful Programmer (f5143935), claude:sonnet, status=open. Rationale: Programmer role 52335de1 best for 'other' type (0.91 avg); sonnet best model for 'other' (0.93 avg). - fix-terminology-single: Careful Programmer (f5143935), claude:haiku, status=open. Rationale: 'fix' type had insufficient data; Programmer is best semantic match from overall leaderboard; haiku selected via override heuristic (simple mechanical find-and-replace).
- 2026-04-12T18:53:35.417836078+00:00 Task unclaimed: agent 'agent-398' (PID 279284) timed out (no heartbeat)
- 2026-04-12T18:53:35.476487172+00:00 Spawned by coordinator --executor claude --model claude-opus-4-6
- 2026-04-12T18:53:38.103827910+00:00 Task marked as done