wg-setup-routes — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-686`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Created	2026-04-27T14:05:18.425511758+00:00
Started	2026-04-27T14:05:57.859453357+00:00
Completed	2026-04-27T14:43:06.235047021+00:00
Tags	`eval-scheduled`
Eval score	0.90
└ blocking impact	0.90
└ completeness	0.95
└ coordination overhead	0.90
└ correctness	0.95
└ downstream usability	0.80
└ efficiency	0.85
└ intent fidelity	0.88
└ style adherence	0.95

Description

wg setup --route claude-cli --dry-run produces this config:

[models.default]   model = 'claude:sonnet'
[models.evaluator] model = 'claude:sonnet'   # 10x cost vs haiku, no quality gain (verified)
[models.assigner]  model = 'claude:sonnet'   # same
[agent]/[dispatcher] model = 'claude:sonnet' # workers should run opus for real implementation

This is the OPPOSITE of optimal:

Eval / assign / FLIP work is mostly summarization + scoring — haiku is sufficient, sonnet adds ~10x cost for nearly identical scores (we verified: 0.82 vs 0.84 for same task, same red-flag dimensions)
Worker agents do the actual implementation — opus is the right default for code quality

User's mental model (verbatim): 'haiku is fine for all this stuff... we should default to that if possible. for local models where cost is much less of an issue, we can run the same model for everything'.

Required defaults per route

Route	agent.model	evaluator	assigner	tiers
claude-cli	claude:opus	claude:haiku	claude:haiku	haiku/sonnet/opus
openrouter	(premium opus equivalent)	(cheapest equivalent)	(cheapest)	per-provider 3-tier
codex-cli	(codex premium)	(codex cheap)	(codex cheap)	per-provider 3-tier
local	local:	local:	local:	(same model everywhere — cost ≈ 0)
nex-custom	user-supplied	user-supplied OR same	user-supplied OR same	user-supplied

For local and nex-custom routes specifically: when the endpoint is local (cost ≈ 0), default to using the SAME model for all roles. No need for tier split.

For paid routes (claude-cli, openrouter, codex-cli): cheap/standard/premium tiers PLUS appropriate role-to-tier mapping.

Files likely to touch

src/config/defaults.rs (or wherever route defaults live — probably src/commands/setup.rs since the routes were added there)
The config_for_route() function (or equivalent) that produces the per-route Config

Out of scope

Changing the tier scheme itself (haiku/sonnet/opus naming stays)
Per-task model overrides (already work)

Validation

Failing tests first:
- test_route_claude_cli_evaluator_is_haiku — config_for_route(ClaudeCli).models.evaluator.model == 'claude:haiku'
- test_route_claude_cli_assigner_is_haiku
- test_route_claude_cli_agent_is_opus
- test_route_local_uses_same_model_everywhere — given a local route with model X, all role-models point at X
Implementation makes tests pass
cargo build + cargo test pass with no regressions
Manual smoke:
- wg setup --route claude-cli --dry-run --yes produces config with evaluator=haiku, assigner=haiku, agent.model=opus
- wg setup --route local --model qwen3-coder --url http://lambda01... --dry-run --yes produces config with all role-models = local:qwen3-coder

## Description

`wg setup --route claude-cli --dry-run` produces this config:
```
[models.default]   model = 'claude:sonnet'
[models.evaluator] model = 'claude:sonnet'   # 10x cost vs haiku, no quality gain (verified)
[models.assigner]  model = 'claude:sonnet'   # same
[agent]/[dispatcher] model = 'claude:sonnet' # workers should run opus for real implementation
```

This is the OPPOSITE of optimal:
- Eval / assign / FLIP work is mostly summarization + scoring — **haiku is sufficient**, sonnet adds ~10x cost for nearly identical scores (we verified: 0.82 vs 0.84 for same task, same red-flag dimensions)
- Worker agents do the actual implementation — **opus is the right default** for code quality

User's mental model (verbatim): 'haiku is fine for all this stuff... we should default to that if possible. for local models where cost is much less of an issue, we can run the same model for everything'.

### Required defaults per route

| Route        | agent.model     | evaluator        | assigner         | tiers                    |
|--------------|-----------------|------------------|------------------|--------------------------|
| claude-cli   | claude:opus     | claude:haiku     | claude:haiku     | haiku/sonnet/opus        |
| openrouter   | (premium opus equivalent) | (cheapest equivalent) | (cheapest) | per-provider 3-tier |
| codex-cli    | (codex premium) | (codex cheap)    | (codex cheap)    | per-provider 3-tier      |
| local        | local:<model>   | local:<model>    | local:<model>    | (same model everywhere — cost ≈ 0) |
| nex-custom   | user-supplied   | user-supplied OR same | user-supplied OR same | user-supplied            |

For `local` and `nex-custom` routes specifically: when the endpoint is local (cost ≈ 0), default to using the SAME model for all roles. No need for tier split.

For paid routes (claude-cli, openrouter, codex-cli): cheap/standard/premium tiers PLUS appropriate role-to-tier mapping.

### Files likely to touch

- `src/config/defaults.rs` (or wherever route defaults live — probably `src/commands/setup.rs` since the routes were added there)
- The `config_for_route()` function (or equivalent) that produces the per-route Config

### Out of scope

- Changing the tier scheme itself (haiku/sonnet/opus naming stays)
- Per-task model overrides (already work)

## Validation

- [ ] Failing tests first:
  - test_route_claude_cli_evaluator_is_haiku — `config_for_route(ClaudeCli).models.evaluator.model == 'claude:haiku'`
  - test_route_claude_cli_assigner_is_haiku
  - test_route_claude_cli_agent_is_opus
  - test_route_local_uses_same_model_everywhere — given a local route with model X, all role-models point at X
- [ ] Implementation makes tests pass
- [ ] cargo build + cargo test pass with no regressions
- [ ] Manual smoke:
  - `wg setup --route claude-cli --dry-run --yes` produces config with evaluator=haiku, assigner=haiku, agent.model=opus
  - `wg setup --route local --model qwen3-coder --url http://lambda01... --dry-run --yes` produces config with all role-models = local:qwen3-coder

Depends on

done .assign-wg-setup-routes

Required by

(none)

Log

2026-04-27T14:05:18.417834677+00:00 Task paused
2026-04-27T14:05:28.640655452+00:00 Task published
2026-04-27T14:05:57.579898951+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer with proven performance on correctness-critical config/test work; task requires TDD implementation of model tier split across route configs.
2026-04-27T14:05:57.859456333+00:00 Spawned by coordinator --executor claude --model opus
2026-04-27T14:06:11.812148740+00:00 Starting work — no prior WIP, clean tree. Investigating current setup route defaults.
2026-04-27T14:07:20.670117452+00:00 Read config_defaults.rs + integration_setup_routes.rs. Plan: 1) split models routing into role-specific helper, 2) claude-cli agent=opus, eval/assign=haiku, 3) openrouter agent=opus, eval/assign=haiku, 4) codex-cli agent=o1-pro, eval/assign=gpt-5-mini, 5) local + nex-custom unchanged (already same model).
2026-04-27T14:42:28.219627570+00:00 Implementation complete. Tests: all 19 config_defaults tests pass (4 new tests including the 4 required by validation), all 15 integration_setup_routes tests pass, full lib suite (2017 tests) passes. Pre-existing unrelated failures: integration_dual_executor compile error (ResumeConfig fields), integration_config 2 tests about [coordinator]→[dispatcher] migration, integration_global_config 2 tests about TOML duplicate field — all reproduced on baseline. Smoke: dry-run for all 5 routes shows correct role split — claude-cli agent=opus eval/assign=haiku; openrouter agent=opus-4 eval/assign=haiku-4; codex-cli agent=o1-pro eval/assign=gpt-5-mini; local + nex-custom same model everywhere.
2026-04-27T14:42:58.182451994+00:00 Committed: f0a6c7fb2 — pushed to remote
2026-04-27T14:43:06.235052241+00:00 Task marked as done