native-executor-client

Native executor client init demands ANTHROPIC_API_KEY even for OpenAI-compatible models (qwen3-coder via lambda01) — autohaiku 100% blocked

Metadata

Statusdone
Assignedagent-896
Agent identityf51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created2026-04-27T22:00:26.473445511+00:00
Started2026-04-27T22:22:52.910039062+00:00
Completed2026-04-27T22:45:37.181123057+00:00
Tagseval-scheduled
Tokens7594732 in / 35083 out

Description

Description (rewritten 2026-04-27 evening — user clarification)

The user's contract: wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 --executor nex MUST be sufficient to make tasks run on that endpoint. No env vars. No follow-up config edits. No [native_executor] api_key workaround. If the endpoint actually requires a key, the user will configure it via workgraph config (not env vars), and only when the endpoint rejects with 401.

User quote (exact): 'I want the init command that I uttered to be sufficient to set it up so that it works with that endpoint. Missing the key is not a problem. It's only a problem... like we'll see if we need a key. And it shouldn't be that I have to set a random ass environmental variable and maintain that. It should be in our config.'

Bug

Today (2026-04-27) the native executor crashes immediately when running any task whose model resolves to a non-Anthropic provider:

[native-exec] Starting agent loop for task '...' with model 'qwen3-coder', exec_mode 'full', max_turns 100
Error: Failed to initialize OpenAI-compatible client

Caused by:
    No Anthropic API key found. Set ANTHROPIC_API_KEY environment variable, add [native_executor] api_key to .workgraph/config.toml, or create ~/.config/anthropic/api_key

[wrapper] Agent exited with code 1

The error is wrong on every axis:

  1. Model is qwen3-coder via OAI-compat — Anthropic key is irrelevant.
  2. The Tailscale endpoint at lambda01.tail334fe6.ts.net does not require a key. Workgraph should not pre-emptively reject.
  3. Demanding env vars violates the user's hard requirement: credentials live in workgraph config, period.

Earlier 'fix' tasks (agency-picks-claude, wg-nex-resume-311, chat-agent-loops-2, wg-evaluate-record — all marked Done) addressed plan resolution upstream but did NOT fix this client init precondition. Native executor remains 100% broken for autohaiku.

Required (this is the contract — do not narrow it)

1. No env-var fallback in the credential path

The native executor must NOT consult ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENROUTER_API_KEY, or any other env var when initializing a client. Any such lookup is a bug. Credentials come from workgraph config exclusively.

2. No precondition check that gates on key presence

Client init must succeed when no key is configured. The HTTP layer just won't send an Authorization header. If the endpoint requires one and rejects with 401, surface THAT error to the user with a message that points at the right config block to add the key.

This means the current code path that errors with 'No Anthropic API key found ... before any HTTP call' is wrong by design and must be removed for all providers, not just OAI-compat. Even for direct anthropic:* / claude:* models, the right behavior is 'try without; if 401, surface a clear error pointing at config'.

3. wg init is sufficient

After running:

wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 --executor nex

the resulting config must let wg service start dispatch tasks against that endpoint with no further user action. If the endpoint requires a key, wg init should accept it via flag (--api-key or similar) and write it into the config. If wg init is invoked without a key flag, no key is configured, and that's fine.

4. Per-endpoint credentials in [llm_endpoints.<name>]

Where credentials ARE configured, they live in the endpoint block:

[[llm_endpoints.endpoints]]
name = "lambda01"
provider = "oai-compat"
url = "https://lambda01.tail334fe6.ts.net:30000"
api_key = "..."   # OPTIONAL. Empty/absent = no auth header sent.

The native executor pulls the key for the endpoint it is about to call — not from a global env var, not from a generic [native_executor] block, not from ~/.config/anthropic/api_key.

5. Error messages name the right config path

When the endpoint rejects (401/403), the user-facing error must say something like 'Endpoint rejected request — set api_key under [llm_endpoints.] in .' Never reference env vars in the user-facing error.

Files likely to touch

  • src/executors/native/ — client init for OAI-compat / openrouter / anthropic. Remove the 'require key on init' guard. Remove env var lookups.
  • src/dispatch/handler_for_model.rs — provider resolution; ensure provider derives correctly from model prefix and routes to the right endpoint config block.
  • src/config/ — ensure [llm_endpoints.*] per-endpoint api_key is read; remove [native_executor] api_key if it exists OR keep as last-resort but document it as deprecated; remove ~/.config/anthropic/api_key file lookup entirely.
  • src/commands/init.rs — ensure wg init -m <model> -e <url> --executor nex writes a complete [llm_endpoints.endpoints] block; accept optional --api-key flag to populate it.
  • All error message construction sites that reference ANTHROPIC_API_KEY env var → rewrite to point at config.

Validation (live smoke required — per memory feedback_assertion_driven_live_smoke)

  • Failing test first: native executor + model=local:qwen3-coder + endpoint configured + NO env vars + NO api_key in config → client init succeeds; HTTP call goes out without Authorization header; if endpoint accepts (Tailscale does), task runs.
  • Failing test for 401 path: endpoint configured to require key + no key in config → client init STILL succeeds; HTTP call returns 401; user-visible error names the [llm_endpoints.] config block, NOT env vars.
  • Failing test for env var ignored: ANTHROPIC_API_KEY=ignored-junk in env + model=local:qwen3-coder + key configured in [llm_endpoints.lambda01] → request uses the configured key, NOT the env var.
  • Failing test for wg init: wg init -m qwen3-coder -e https://example.invalid --executor nex in a tmpdir, then wg list runs without errors; config has a complete [llm_endpoints.endpoints] block for that endpoint.
  • grep src/ for ANTHROPIC_API_KEY / OPENAI_API_KEY / OPENROUTER_API_KEY env var lookups → only allowed in tests/ or migration code; main runtime never reads them.
  • Live smoke against /home/erik/autohaiku: with current config (no env vars), wg --dir /home/erik/autohaiku/.wg spawn-task quality-pass-haiku-system-setup reaches at least 'in-progress' without an exit-1 due to client init.
  • cargo build + cargo test pass with no regressions
  • Update wg-nex / autohaiku setup docs to reflect: 'credentials are configured in workgraph config under [llm_endpoints.], never env vars; wg init writes the right block; missing key is fine unless the endpoint rejects'

Depends on

Required by

Messages 2 messages (all seen)

  1. #1user2026-04-27T22:03:16.731100443+00:00read
    Additional user requirements (2026-04-27 evening, after task description rewrite):
    
    ### Setting an API key when one IS needed
    
    The user wants two ergonomic paths to set a key — both targeting workgraph config (no env vars):
    
    1. **`wg nex --api-key <key> ...`** — accept a key on the wg nex (chat handler) command line; if specified, persist it to the right endpoint block in the active config.
    
    2. **`wg config --api-key <key>`** OR `wg config set llm_endpoints.<name>.api_key <key>`** — a config-management command that writes the key to the right [llm_endpoints.<name>] block. Pick whichever shape fits the existing wg config command surface.
    
    3. **`wg init --api-key <key>`** — already implied by the rewritten task; if the user provides an endpoint that needs a key at init time, accept it via flag and write it to the [llm_endpoints.endpoints] block as part of the init.
    
    In ALL three paths the destination is the same: the api_key field on the [llm_endpoints.<name>] block in the project's wg config. NEVER an env var. NEVER a side file (`~/.config/anthropic/api_key` etc).
    
    ### Test the graph for this end-to-end
    
    User: 'make sure that they're testing the graph for this kind of stuff.'
    
    Translation: don't just unit-test the credential resolution function. Add a real integration test that:
    - runs `wg init -m qwen3-coder -e <url> --executor nex` in a tmp dir
    - runs `wg add 'test task' -d 'echo hello'` (or whatever a no-op smoke task looks like)
    - runs `wg service start --max-agents 1` against a fake/local OAI-compat endpoint
    - waits for the task to dispatch and reach in-progress
    - asserts the dispatched agent's HTTP call went out, no Authorization header was sent (since no key configured), endpoint accepted (since fake endpoint accepts unauth)
    - asserts the agent process did NOT die at client init time
    - runs another variant with `--api-key x` on the init OR via `wg config --api-key` afterwards; asserts the dispatched agent's HTTP call now sends the configured key as bearer auth
    
    This pattern (init → add → spawn → assert outcome from real HTTP) is the only way to catch the kind of regression we keep shipping. Per memory feedback_assertion_driven_live_smoke: live endpoint, behavioral assertions, versioned script. Add it under `tests/smoke/scenarios/` and list this task id in `owners` per the smoke gate doc in CLAUDE.md.
    
    If this can also be its own grow-only smoke scenario file (not just a Rust test), that's better — it survives across releases as a long-term regression catcher.
  2. #2native-executor-client2026-04-27T22:09:15.359541919+00:00read
    Acknowledged — I've reproduced the bug. Will fix the core issue (provider.rs:384 fallback calling Anthropic key resolver from OAI-compat path) + misleading error, AND add the smoke scenario per your direction. The --api-key CLI ergonomics (nex/init/config flags writing to llm_endpoints.<name>.api_key) is broader scope; will scope-check after the core fix lands.

Log