fix-evaluator-role

Fix: evaluator role on native executor fails with no-openrouter-key, then claude-CLI fallback also fails

Metadata

Statusdone
Assignedagent-2286
Agent identityf51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Modelcodex:gpt-5.5
Created2026-05-04T15:13:17.247944293+00:00
Started2026-05-04T15:14:56.873673232+00:00
Completed2026-05-04T17:50:23.628879956+00:00
Tagspriority-high,fix,bug,native,evaluator,openrouter, eval-scheduled
Eval score0.91
└ blocking impact0.90
└ completeness0.94
└ constraint fidelity0.85
└ coordination overhead0.86
└ correctness0.95
└ downstream usability0.90
└ efficiency0.82
└ intent fidelity0.73
└ style adherence0.91

Description

Description

Evaluator role assigned to native executor with model='anthropic/claude-haiku-4-5'. The native executor tried OpenRouter (no API key configured), failed, fell back to claude CLI, which ALSO failed.

Live error from .evaluate-weather-monitor-memphis on 2026-05-04 10:10:

wg evaluate exited with code 1
Note: task has no assigned agent — evaluating without role/tradeoff context [lightweight-llm]
native openrouter call failed for role=evaluator model=anthropic/claude-haiku-4-5: 
  Failed to create OpenAI client for lightweight call: 
  No API key found for provider 'openrouter'.
— falling back to claude CLI
Error: Evaluation LLM call failed
Caused by: Claude CLI call failed (exit Some(1))

User report 2026-05-04: 'got this bug with native executor.'

Root cause hypothesis

Two stacked problems:

  1. Routing: the model spec anthropic/claude-haiku-4-5 (provider/model slash-format) routes to openrouter by default. With no openrouter API key it fails. But claude:haiku is available via the claude CLI directly — we should be using that path, not openrouter.

  2. Fallback: the claude-CLI fallback ALSO fails (exit 1, no useful error). The fallback may be passing the openrouter-style model name (anthropic/claude-haiku-4-5) to claude CLI, which doesn't recognize that format. claude CLI expects opus / sonnet / haiku / claude:opus etc.

Required fix

  • For evaluator role on native executor: prefer claude CLI directly (no openrouter detour) when the model is a claude variant
  • OR translate anthropic/claude-haiku-4-5claude:haiku for the claude-CLI fallback path
  • OR if we genuinely want openrouter to handle this, surface the no-key error EARLIER with a clear instruction (not a confusing two-step fallback)

Recommend: native executor's eval path should route claude-flavored model specs directly to the claude CLI, bypassing openrouter entirely. The openrouter path is for non-claude models (qwen, gpt, etc.). Don't go through openrouter just to land at claude.

Validation

  • Failing test: trigger an evaluator with native executor + claude-flavored model, observe the routing
  • Fix: claude-flavored models route directly to claude CLI (no openrouter hop)
  • Live smoke: .evaluate-X task on a task with no openrouter key configured succeeds
  • Audit: are there OTHER role-on-native paths (assigner, flip) that have the same routing bug? Check.
  • cargo build + cargo test pass
  • cargo install --path . was run before claiming done

Depends on

Required by

Log