Metadata
| Status | done |
|---|---|
| Assigned | agent-2286 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Model | codex:gpt-5.5 |
| Created | 2026-05-04T15:13:17.247944293+00:00 |
| Started | 2026-05-04T15:14:56.873673232+00:00 |
| Completed | 2026-05-04T17:50:23.628879956+00:00 |
| Tags | priority-high,fix,bug,native,evaluator,openrouter, eval-scheduled |
| Eval score | 0.91 |
| └ blocking impact | 0.90 |
| └ completeness | 0.94 |
| └ constraint fidelity | 0.85 |
| └ coordination overhead | 0.86 |
| └ correctness | 0.95 |
| └ downstream usability | 0.90 |
| └ efficiency | 0.82 |
| └ intent fidelity | 0.73 |
| └ style adherence | 0.91 |
Description
Description
Evaluator role assigned to native executor with model='anthropic/claude-haiku-4-5'. The native executor tried OpenRouter (no API key configured), failed, fell back to claude CLI, which ALSO failed.
Live error from .evaluate-weather-monitor-memphis on 2026-05-04 10:10:
wg evaluate exited with code 1
Note: task has no assigned agent — evaluating without role/tradeoff context [lightweight-llm]
native openrouter call failed for role=evaluator model=anthropic/claude-haiku-4-5:
Failed to create OpenAI client for lightweight call:
No API key found for provider 'openrouter'.
— falling back to claude CLI
Error: Evaluation LLM call failed
Caused by: Claude CLI call failed (exit Some(1))
User report 2026-05-04: 'got this bug with native executor.'
Root cause hypothesis
Two stacked problems:
-
Routing: the model spec
anthropic/claude-haiku-4-5(provider/model slash-format) routes to openrouter by default. With no openrouter API key it fails. But claude:haiku is available via the claude CLI directly — we should be using that path, not openrouter. -
Fallback: the claude-CLI fallback ALSO fails (exit 1, no useful error). The fallback may be passing the openrouter-style model name (
anthropic/claude-haiku-4-5) to claude CLI, which doesn't recognize that format. claude CLI expectsopus/sonnet/haiku/claude:opusetc.
Required fix
- For evaluator role on native executor: prefer claude CLI directly (no openrouter detour) when the model is a claude variant
- OR translate
anthropic/claude-haiku-4-5→claude:haikufor the claude-CLI fallback path - OR if we genuinely want openrouter to handle this, surface the no-key error EARLIER with a clear instruction (not a confusing two-step fallback)
Recommend: native executor's eval path should route claude-flavored model specs directly to the claude CLI, bypassing openrouter entirely. The openrouter path is for non-claude models (qwen, gpt, etc.). Don't go through openrouter just to land at claude.
Validation
- Failing test: trigger an evaluator with native executor + claude-flavored model, observe the routing
- Fix: claude-flavored models route directly to claude CLI (no openrouter hop)
-
Live smoke:
.evaluate-Xtask on a task with no openrouter key configured succeeds - Audit: are there OTHER role-on-native paths (assigner, flip) that have the same routing bug? Check.
- cargo build + cargo test pass
- cargo install --path . was run before claiming done
Depends on
Required by
- (none)
Log
- 2026-05-04T15:13:17.010147035+00:00 Task paused
- 2026-05-04T15:14:11.713473250+00:00 Task published
- 2026-05-04T15:14:45.251397189+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=graph, reason=Careful Programmer best suited for correctness-critical native executor routing bug; graph context needed to audit evaluator, assigner, and FLIP paths for same issue
- 2026-05-04T15:14:56.873679914+00:00 Spawned by coordinator --executor codex --model gpt-5.5
- 2026-05-04T15:15:11.482177984+00:00 Starting implementation: inspecting evaluator routing and native lightweight LLM paths
- 2026-05-04T15:21:23.714834157+00:00 Implemented Claude slash-form normalization for agency one-shot dispatch; focused agency_dispatch tests pass
- 2026-05-04T15:33:19.743041444+00:00 Fixed native resume hard-truncation regression uncovered by full cargo test; focused unit and integration compaction tests now pass
- 2026-05-04T16:16:02.147453796+00:00 Resuming WIP; investigating remaining native coordinator test cleanup issue before full validation
- 2026-05-04T16:16:51.445371583+00:00 Patched native coordinator test helpers to scrub parent WG worktree env so spawned test tasks cannot reap this agent worktree
- 2026-05-04T16:20:55.489704805+00:00 Validated targeted routing tests and smoke scenario: agency_dispatch tests pass; evaluator slash-form smoke bypasses OpenRouter and calls claude --model haiku
- 2026-05-04T16:23:50.018579186+00:00 Fixed stale integration_provider expectations for canonical oai-compat provider naming, premium verification default, and current registry IDs; integration_provider now passes
- 2026-05-04T16:26:50.050587097+00:00 Fixed integration_service status assertion for current Dispatcher status output; integration_service now passes
- 2026-05-04T16:30:02.679063721+00:00 Fixed integration_streaming canonical provider-name expectation; integration_streaming now passes
- 2026-05-04T16:32:37.754032826+00:00 Hardened spawn wrapper cleanup guard so inherited parent WG_WORKTREE_PATH cannot mark or reap this agent worktree
- 2026-05-04T16:35:14.259506105+00:00 Validation target was deleted by inherited cleanup env; rerunning build/tests with isolated CARGO_TARGET_DIR and cleanup guard fix
- 2026-05-04T16:37:55.388213649+00:00 Focused validation passed: cargo test test_wrapper_ with isolated CARGO_TARGET_DIR
- 2026-05-04T17:08:28.972806650+00:00 Full cargo test exposed cwd-sandbox assumptions in test_edit_file_edge_cases; updating that fixture to create temp files under target/ inside the repo
- 2026-05-04T17:14:13.057601762+00:00 Updated test_prompt_logging_debug to use ## Validation descriptions instead of removed wg add --verify
- 2026-05-04T17:20:40.728699560+00:00 Updated verify-timeout tests to use current ## Validation flow instead of removed --verify flag
- 2026-05-04T17:35:16.310513924+00:00 Validated: cargo build --quiet passed with isolated CARGO_TARGET_DIR
- 2026-05-04T17:35:41.193080583+00:00 Validated: cargo test --quiet passed with isolated CARGO_TARGET_DIR
- 2026-05-04T17:36:10.072677927+00:00 Validated: evaluator Claude slash smoke scenario passed without OpenRouter key
- 2026-05-04T17:36:36.038452509+00:00 Validated: cargo install --path . completed
- 2026-05-04T17:42:35.078649254+00:00 Committed: 6229e3ed7 — pushed to remote
- 2026-05-04T17:50:08.086319860+00:00 Validated after merge: agency_dispatch tests passed, evaluator smoke passed, git diff --check passed, cargo install --path . completed
- 2026-05-04T17:50:23.628895155+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-05-04T17:51:15.332157339+00:00 PendingEval → Done (evaluator passed; downstream unblocks)