Metadata
| Status | done |
|---|---|
| Assigned | agent-686 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-04-27T14:05:18.425511758+00:00 |
| Started | 2026-04-27T14:05:57.859453357+00:00 |
| Completed | 2026-04-27T14:43:06.235047021+00:00 |
| Tags | eval-scheduled |
| Eval score | 0.90 |
| └ blocking impact | 0.90 |
| └ completeness | 0.95 |
| └ coordination overhead | 0.90 |
| └ correctness | 0.95 |
| └ downstream usability | 0.80 |
| └ efficiency | 0.85 |
| └ intent fidelity | 0.88 |
| └ style adherence | 0.95 |
Description
Description
wg setup --route claude-cli --dry-run produces this config:
[models.default] model = 'claude:sonnet'
[models.evaluator] model = 'claude:sonnet' # 10x cost vs haiku, no quality gain (verified)
[models.assigner] model = 'claude:sonnet' # same
[agent]/[dispatcher] model = 'claude:sonnet' # workers should run opus for real implementation
This is the OPPOSITE of optimal:
- Eval / assign / FLIP work is mostly summarization + scoring — haiku is sufficient, sonnet adds ~10x cost for nearly identical scores (we verified: 0.82 vs 0.84 for same task, same red-flag dimensions)
- Worker agents do the actual implementation — opus is the right default for code quality
User's mental model (verbatim): 'haiku is fine for all this stuff... we should default to that if possible. for local models where cost is much less of an issue, we can run the same model for everything'.
Required defaults per route
| Route | agent.model | evaluator | assigner | tiers |
|---|---|---|---|---|
| claude-cli | claude:opus | claude:haiku | claude:haiku | haiku/sonnet/opus |
| openrouter | (premium opus equivalent) | (cheapest equivalent) | (cheapest) | per-provider 3-tier |
| codex-cli | (codex premium) | (codex cheap) | (codex cheap) | per-provider 3-tier |
| local | local: | local: | local: | (same model everywhere — cost ≈ 0) |
| nex-custom | user-supplied | user-supplied OR same | user-supplied OR same | user-supplied |
For local and nex-custom routes specifically: when the endpoint is local (cost ≈ 0), default to using the SAME model for all roles. No need for tier split.
For paid routes (claude-cli, openrouter, codex-cli): cheap/standard/premium tiers PLUS appropriate role-to-tier mapping.
Files likely to touch
src/config/defaults.rs(or wherever route defaults live — probablysrc/commands/setup.rssince the routes were added there)- The
config_for_route()function (or equivalent) that produces the per-route Config
Out of scope
- Changing the tier scheme itself (haiku/sonnet/opus naming stays)
- Per-task model overrides (already work)
Validation
-
Failing tests first:
- test_route_claude_cli_evaluator_is_haiku —
config_for_route(ClaudeCli).models.evaluator.model == 'claude:haiku' - test_route_claude_cli_assigner_is_haiku
- test_route_claude_cli_agent_is_opus
- test_route_local_uses_same_model_everywhere — given a local route with model X, all role-models point at X
- test_route_claude_cli_evaluator_is_haiku —
- Implementation makes tests pass
- cargo build + cargo test pass with no regressions
-
Manual smoke:
wg setup --route claude-cli --dry-run --yesproduces config with evaluator=haiku, assigner=haiku, agent.model=opuswg setup --route local --model qwen3-coder --url http://lambda01... --dry-run --yesproduces config with all role-models = local:qwen3-coder
Depends on
Required by
- (none)
Log
- 2026-04-27T14:05:18.417834677+00:00 Task paused
- 2026-04-27T14:05:28.640655452+00:00 Task published
- 2026-04-27T14:05:57.579898951+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer with proven performance on correctness-critical config/test work; task requires TDD implementation of model tier split across route configs.
- 2026-04-27T14:05:57.859456333+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-27T14:06:11.812148740+00:00 Starting work — no prior WIP, clean tree. Investigating current setup route defaults.
- 2026-04-27T14:07:20.670117452+00:00 Read config_defaults.rs + integration_setup_routes.rs. Plan: 1) split models routing into role-specific helper, 2) claude-cli agent=opus, eval/assign=haiku, 3) openrouter agent=opus, eval/assign=haiku, 4) codex-cli agent=o1-pro, eval/assign=gpt-5-mini, 5) local + nex-custom unchanged (already same model).
- 2026-04-27T14:42:28.219627570+00:00 Implementation complete. Tests: all 19 config_defaults tests pass (4 new tests including the 4 required by validation), all 15 integration_setup_routes tests pass, full lib suite (2017 tests) passes. Pre-existing unrelated failures: integration_dual_executor compile error (ResumeConfig fields), integration_config 2 tests about [coordinator]→[dispatcher] migration, integration_global_config 2 tests about TOML duplicate field — all reproduced on baseline. Smoke: dry-run for all 5 routes shows correct role split — claude-cli agent=opus eval/assign=haiku; openrouter agent=opus-4 eval/assign=haiku-4; codex-cli agent=o1-pro eval/assign=gpt-5-mini; local + nex-custom same model everywhere.
- 2026-04-27T14:42:58.182451994+00:00 Committed: f0a6c7fb2 — pushed to remote
- 2026-04-27T14:43:06.235052241+00:00 Task marked as done