research-into-impl

Research-into-impl: thin-wrapper executor pattern (make nex reliable like claude is)

Metadata

Status: done
Assigned: agent-61
Agent identity: eea940a6f6be13d60578dee27be1f4bade4fcaab05bbbe54b9c5ef4b2d05eae0
Created: 2026-04-26T16:05:31.638122776+00:00
Started: 2026-04-26T17:26:03.129823531+00:00
Completed: 2026-04-26T17:32:35.473163411+00:00
Tags: eval-scheduled
Eval score: 0.02
└ blocking impact: 0.00
└ completeness: 0.00
└ coordination overhead: 0.10
└ correctness: 0.00
└ downstream usability: 0.00
└ efficiency: 0.10
└ intent fidelity: 0.02
└ style adherence: 0.00

Description

Autopoietic research-into-impl task — investigate the architectural question that surfaced repeatedly this session, then ship a working impl + follow-up subgraph.

The recurring pain: claude executor is reliable because it delegates to the mature claude CLI binary (auth, retries, tool-use, streaming, prompt caching, history all handled outside wg). nex re-implements that whole loop in-process and is fragile (breaks after one message in some configs, see wg-nex-native task). Re-implementing claude CLI is a huge surface — wg shouldn't be in that business.
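As a sketch of the pattern the task is asking for: the wrapper owns nothing but process plumbing, and the wrapped binary owns the whole agent loop (auth, retries, history). The Rust below approximates this with plain pipes and uses `cat` as a stand-in for a real agentic CLI; an actual wg executor would allocate a pty (as the claude handler does) and wrap the chosen binary instead. All names here (`ThinWrapper`, `send`) are hypothetical, not existing wg code.

```rust
use std::io::{BufRead, BufReader, Write};
use std::process::{Child, ChildStdin, ChildStdout, Command, Stdio};

/// Minimal thin-wrapper executor sketch: wg only shuttles lines over stdio;
/// the wrapped CLI binary owns the agent loop. Hypothetical names throughout.
struct ThinWrapper {
    child: Child,
    stdin: ChildStdin,
    stdout: BufReader<ChildStdout>,
}

impl ThinWrapper {
    /// Spawn the wrapped CLI with piped stdio (a real executor would use a pty).
    fn spawn(binary: &str, args: &[&str]) -> std::io::Result<Self> {
        let mut child = Command::new(binary)
            .args(args)
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .spawn()?;
        let stdin = child.stdin.take().expect("piped stdin");
        let stdout = BufReader::new(child.stdout.take().expect("piped stdout"));
        Ok(Self { child, stdin, stdout })
    }

    /// Send one message and read one reply line; the wrapper stays dumb on purpose.
    fn send(&mut self, msg: &str) -> std::io::Result<String> {
        writeln!(self.stdin, "{msg}")?;
        self.stdin.flush()?;
        let mut line = String::new();
        self.stdout.read_line(&mut line)?;
        Ok(line.trim_end().to_string())
    }
}

fn main() -> std::io::Result<()> {
    // `cat` echoes each line back, standing in for a real agentic CLI;
    // the loop mirrors the 5-messages-back-to-back smoke in Validation.
    let mut wrapper = ThinWrapper::spawn("cat", &[])?;
    for i in 1..=5 {
        println!("{}", wrapper.send(&format!("message {i}"))?);
    }
    wrapper.child.kill()?;
    Ok(())
}
```

The point of the shape: multi-turn reliability comes from the child process keeping its own session state, so wg never re-implements the conversation loop in-process.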

Phase 1: Research (single agent, produces a research artifact)

Survey OAI-compatible / agentic CLI binaries that could serve as nex's backend the way claude CLI serves claude executor:

  • codex CLI (OpenAI) — already an executor option; how complete is it?
  • aider (https://aider.chat) — well-maintained, supports many providers, has a CLI
  • llama-cli / llama-server (llama.cpp) — for local models
  • ollama CLI — for local Ollama
  • plandex — agentic CLI
  • claude-code itself in non-Anthropic mode (does it support routing to OpenRouter?)
  • Any others discovered during research

For each candidate, capture:

  • License + maintenance health
  • Supported providers / model families
  • Streaming / tool-use / file-edit / prompt-cache support
  • Stdio / JSON / IPC contract surface (how easy to shim)
  • Reliability under multi-turn workloads (the original pain)
  • Distribution: easy install? bundled? requires Python/Node?

Output: a research doc docs/research/thin-wrapper-executors-2026-04.md with the survey table + a recommendation (one or two finalists with reasoning).

Phase 2: Autopoietic subgraph (research agent dispatches impl + smoke tasks)

Based on the recommendation in Phase 1, the agent should call wg add to create:

  • thin-wrapper-impl-<name> — implement the chosen wrapper as a new wg executor (e.g. 'aider' or 'codex-thin'). Spec includes: wg's existing handler shim pattern (see src/commands/claude_handler.rs as template), session-state mapping, error surface. Mark as draft initially; orchestrator publishes when subgraph is complete.
  • thin-wrapper-smoke-<name> — extend wave-1-integration-smoke with a multi-message scenario for the new executor.
  • thin-wrapper-docs-<name> — README + docs/ entry explaining when to use the new executor vs claude vs (deprecated?) nex.
  • Dependencies: smoke --after impl --after research, docs --after smoke.

Then the agent calls wg publish on the subgraph to fire it.
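Concretely, the subgraph dispatch could look roughly like this (the command names `wg add` and `wg publish` and the `--after` dependency notation come from the task text above; the exact flag syntax and the `aider` name suffix are illustrative assumptions, not wg's documented CLI):

```shell
# Hypothetical flag shapes -- only `wg add`, `wg publish`, and `--after`
# are attested in the task description.
wg add thin-wrapper-impl-aider --draft
wg add thin-wrapper-smoke-aider --after thin-wrapper-impl-aider --after <research-task-id>
wg add thin-wrapper-docs-aider --after thin-wrapper-smoke-aider
wg publish thin-wrapper-impl-aider thin-wrapper-smoke-aider thin-wrapper-docs-aider
```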

Phase 3 (implicit): the autopoietic loop continues

If the wrapper succeeds, it likely supersedes nex for OAI-compat use cases. The agent (or follow-up agent) may propose deprecating nex, with a separate task. That's out of scope here — just produce Phase 1 + Phase 2.

Constraints / guidance

  • Phase 1 research can use web fetches but MUST cite sources. No hallucinated CLI feature claims.
  • Phase 2 impl follows the same TDD pattern as the other wave-1 tasks (failing test first, etc.).
  • Don't gold-plate Phase 1: 6-8 candidates max, table format, ~1-2 page doc.
  • Scoring criteria for Phase 1 recommendation: reliability > install-ease > feature breadth > license. Reliability wins because that's the original pain.

Validation

  • Phase 1 artifact: docs/research/thin-wrapper-executors-2026-04.md exists, has survey table, has recommendation with reasoning (~1-2 pages, sources cited).
  • Phase 2 subgraph: at least 3 follow-up wg tasks created (impl + smoke + docs) with correct dependencies. All three published by the research agent itself (autopoietic).
  • cargo build + cargo test pass with no regressions
  • Manual smoke (after Phase 2 impl lands separately): the thin-wrapper executor sends 5 messages back-to-back without breaking, against a real OAI-compat endpoint.

Depends on

Required by

Messages (1 message)

  1. #1 user 2026-04-26T17:25:32.371065160+00:00 delivered
    PROMOTED critical, deps removed. The user is actively hitting wg nex breakage in TUI right now (wg nex faults on first message). Their architectural prescription, verbatim: 'WG NEX SHOULD JUST BE RUN IN A DUMB PTY LIKE WE DO CLAUDE CODE. WHY IS THIS SO HARD.' Skip the gold-plated 6-candidate survey — go straight to the pragmatic answer: pick ONE existing OAI-compatible CLI binary that wg can pty-wrap (the way wg wraps the claude CLI). Candidates: aider, codex CLI, llm by Simon Willison, llama-cli. Pick the one most likely to work end-to-end with the user's lambda01 OAI-compat endpoint + qwen3-coder model. Implement the wg executor as a thin pty wrapper. Verify by running the same smoke the user did: wg init -x <new-name> -m qwen3-coder -e https://lambda01..., wg service start, wg tui, send 5 messages — all 5 must produce responses.

Log