quality-pass-20260412t184600

Metadata

Status	done
Assigned	`agent-399`
Agent identity	`3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3`
Created	2026-04-12T18:46:35.251442108+00:00
Started	2026-04-12T18:53:35.476484728+00:00
Completed	2026-04-12T18:53:38.103822039+00:00
Tags	`eval-scheduled`
Eval score	0.83
└ blocking impact	0.80
└ completeness	0.90
└ coordination overhead	0.85
└ correctness	0.85
└ downstream usability	0.80
└ efficiency	0.80
└ intent fidelity	0.81
└ style adherence	0.85

Description

Quality Pass: Post-Triage Review

Review and optimize task metadata for newly created tasks before they enter execution.

Tasks to review

review-and-merge
fix-terminology-single

What to do

For EACH task listed above:

1. Classify task type

Read the task via wg show <task-id>. Classify as one of:

research — Investigation, analysis, library evaluation
implementation — New code, features, endpoints
fix — Bug fixes, error corrections
design — Architecture, API design, planning
test — Test writing, test infrastructure
docs — Documentation, comments, guides
refactor — Code restructuring without behavior change

2. Assign agent identity

Run wg agency stats --by-task-type to see role performance by task type. The Recommendations by Task Type table shows the best role for each type.

Use the recommended role for the task's classified type. If the recommendation says '(insufficient data)', fall back to the overall Role Leaderboard.

For JSON access (machine-readable): wg agency stats --by-task-type --json Look at .task_type_breakdown.recommendations[].best_role.

Apply: wg assign <task-id> <agent-hash>

3. Select model tier

Run wg agency stats --by-task-type and check the Best Model by Task Type table. Use the top-scoring model for the task's classified type.

For JSON access: .task_type_breakdown.recommendations[].best_model.

Override heuristics (when evaluation data is sparse or absent):

Signal	Model
Simple, mechanical, well-defined (e.g., 'add a flag', 'rename X')	haiku
Standard implementation, testing, research	sonnet
Complex design, multi-system reasoning, novel architecture	opus
Task has failed before (check status history)	escalate one tier

Apply: wg edit <task-id> --model <tier>

4. Release for execution

After assigning agent and model: wg resume <task-id>

Validation

Every listed task has an agent assigned (check via wg show)
Every listed task has a model set
Every listed task is un-paused (status: open, not paused)
Assignments are justified by evaluation data, not arbitrary

## Quality Pass: Post-Triage Review

Review and optimize task metadata for newly created tasks before they enter execution.

## Tasks to review
- review-and-merge
- fix-terminology-single

## What to do

For EACH task listed above:

### 1. Classify task type
Read the task via `wg show <task-id>`. Classify as one of:
- **research** — Investigation, analysis, library evaluation
- **implementation** — New code, features, endpoints
- **fix** — Bug fixes, error corrections
- **design** — Architecture, API design, planning
- **test** — Test writing, test infrastructure
- **docs** — Documentation, comments, guides
- **refactor** — Code restructuring without behavior change

### 2. Assign agent identity
Run `wg agency stats --by-task-type` to see role performance by task type.
The **Recommendations by Task Type** table shows the best role for each type.

Use the recommended role for the task's classified type. If the recommendation
says '(insufficient data)', fall back to the overall Role Leaderboard.

For JSON access (machine-readable): `wg agency stats --by-task-type --json`
Look at `.task_type_breakdown.recommendations[].best_role`.

Apply: `wg assign <task-id> <agent-hash>`

### 3. Select model tier
Run `wg agency stats --by-task-type` and check the **Best Model by Task Type**
table. Use the top-scoring model for the task's classified type.

For JSON access: `.task_type_breakdown.recommendations[].best_model`.

Override heuristics (when evaluation data is sparse or absent):

| Signal | Model |
|--------|-------|
| Simple, mechanical, well-defined (e.g., 'add a flag', 'rename X') | haiku |
| Standard implementation, testing, research | sonnet |
| Complex design, multi-system reasoning, novel architecture | opus |
| Task has failed before (check status history) | escalate one tier |

Apply: `wg edit <task-id> --model <tier>`

### 4. Release for execution
After assigning agent and model: `wg resume <task-id>`

## Validation
- Every listed task has an agent assigned (check via `wg show`)
- Every listed task has a model set
- Every listed task is un-paused (status: open, not paused)
- Assignments are justified by evaluation data, not arbitrary

Depends on

done .assign-quality-pass-20260412t184600

Required by

Log

2026-04-12T18:48:13.442817514+00:00 Spawned by coordinator --executor claude --model claude-opus-4-6
2026-04-12T18:48:30.270334927+00:00 Starting quality pass. Reviewing tasks: review-and-merge and fix-terminology-single
2026-04-12T18:49:06.694583158+00:00 Classification: review-and-merge → type 'other' (PR review/merge). fix-terminology-single → type 'fix' (terminology correction).
2026-04-12T18:53:33.599639093+00:00 Quality pass complete. Assignments: - review-and-merge: Careful Programmer (f5143935), claude:sonnet, status=open. Rationale: Programmer role 52335de1 best for 'other' type (0.91 avg); sonnet best model for 'other' (0.93 avg). - fix-terminology-single: Careful Programmer (f5143935), claude:haiku, status=open. Rationale: 'fix' type had insufficient data; Programmer is best semantic match from overall leaderboard; haiku selected via override heuristic (simple mechanical find-and-replace).
2026-04-12T18:53:35.417836078+00:00 Task unclaimed: agent 'agent-398' (PID 279284) timed out (no heartbeat)
2026-04-12T18:53:35.476487172+00:00 Spawned by coordinator --executor claude --model claude-opus-4-6
2026-04-12T18:53:38.103827910+00:00 Task marked as done