Metadata
| Status | done |
|---|---|
| Assigned | agent-61 |
| Agent identity | eea940a6f6be13d60578dee27be1f4bade4fcaab05bbbe54b9c5ef4b2d05eae0 |
| Created | 2026-04-26T16:05:31.638122776+00:00 |
| Started | 2026-04-26T17:26:03.129823531+00:00 |
| Completed | 2026-04-26T17:32:35.473163411+00:00 |
| Tags | eval-scheduled |
| Eval score | 0.02 |
| └ blocking impact | 0.00 |
| └ completeness | 0.00 |
| └ coordination overhead | 0.10 |
| └ correctness | 0.00 |
| └ downstream usability | 0.00 |
| └ efficiency | 0.10 |
| └ intent fidelity | 0.02 |
| └ style adherence | 0.00 |
Description
Autopoietic research-into-impl task — investigate the architectural question that surfaced repeatedly this session, then ship a working impl + follow-up subgraph.
The recurring pain: claude executor is reliable because it delegates to the mature claude CLI binary (auth, retries, tool-use, streaming, prompt caching, history all handled outside wg). nex re-implements that whole loop in-process and is fragile (breaks after one message in some configs, see wg-nex-native task). Re-implementing claude CLI is a huge surface — wg shouldn't be in that business.
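A minimal sketch of the thin-wrapper idea, assuming a spawn-per-turn design in which the whole history is replayed on stdin each turn (the shape the log below attributes to src/commands/codex_handler.rs). This is not wg's actual handler code: `ThinWrapper`, `Message`, and the plain-text transcript format are hypothetical names chosen for illustration, and the backend command is whatever CLI the caller supplies.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};
use std::thread;

/// One conversation turn. The role strings are an assumed convention.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Hypothetical wrapper: it owns only the history and the command line.
/// Auth, retries, and streaming stay inside the spawned binary.
struct ThinWrapper {
    program: String,
    args: Vec<String>,
    history: Vec<Message>,
}

impl ThinWrapper {
    fn new(program: &str, args: &[&str]) -> Self {
        ThinWrapper {
            program: program.to_string(),
            args: args.iter().map(|s| s.to_string()).collect(),
            history: Vec::new(),
        }
    }

    /// Flattens history plus the pending message into "role: content" lines.
    fn transcript_with(&self, pending: &Message) -> String {
        self.history
            .iter()
            .chain(std::iter::once(pending))
            .map(|m| format!("{}: {}\n", m.role, m.content))
            .collect()
    }

    /// Spawns a fresh process for this turn, replays the transcript on stdin,
    /// and records the reply. History is only updated when the turn succeeds.
    fn send(&mut self, user_text: &str) -> io::Result<String> {
        let pending = Message { role: "user".to_string(), content: user_text.to_string() };
        let transcript = self.transcript_with(&pending);

        let mut child = Command::new(&self.program)
            .args(&self.args)
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .stderr(Stdio::piped())
            .spawn()?;

        // Write stdin on a separate thread so a large transcript cannot
        // deadlock against a child that is already filling its stdout pipe.
        let mut stdin = child.stdin.take().expect("stdin was requested as piped");
        let writer = thread::spawn(move || stdin.write_all(transcript.as_bytes()));
        let output = child.wait_with_output()?;
        let write_result = writer.join().expect("writer thread panicked");

        if !output.status.success() {
            let stderr = String::from_utf8_lossy(&output.stderr).to_string();
            return Err(io::Error::new(io::ErrorKind::Other, stderr));
        }
        write_result?;

        let reply = String::from_utf8_lossy(&output.stdout).trim().to_string();
        self.history.push(pending);
        self.history.push(Message { role: "assistant".to_string(), content: reply.clone() });
        Ok(reply)
    }
}
```

For a quick local check, `ThinWrapper::new("cat", &[])` echoes the replayed transcript back, which makes it easy to see that the second turn carries the first. Because no process outlives a turn, there is no long-lived subprocess to supervise, at the cost of resending the history every time.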
Phase 1: Research (single agent, produces a research artifact)
Survey OAI-compatible / agentic CLI binaries that could serve as nex's backend the way claude CLI serves claude executor:
- codex CLI (OpenAI) — already an executor option; how complete is it?
- aider (https://aider.chat) — well-maintained, supports many providers, has a CLI
- llama-cli / llama-server (llama.cpp) — for local models
- ollama CLI — for local Ollama
- plandex — agentic CLI
- claude-code itself in non-Anthropic mode (does it support routing to OpenRouter?)
- Any others discovered during research
For each candidate, capture:
- License + maintenance health
- Supported providers / model families
- Streaming / tool-use / file-edit / prompt-cache support
- Stdio / JSON / IPC contract surface (how easy to shim)
- Reliability under multi-turn workloads (the original pain)
- Distribution: easy install? bundled? requires Python/Node?
Output: a research doc docs/research/thin-wrapper-executors-2026-04.md with the survey table + a recommendation (one or two finalists with reasoning).
Phase 2: Autopoietic subgraph (research agent dispatches impl + smoke tasks)
Based on the recommendation in Phase 1, the agent should call wg add to create:
- thin-wrapper-impl-<name>: implement the chosen wrapper as a new wg executor (e.g. 'aider' or 'codex-thin'). Spec includes: wg's existing handler shim pattern (see src/commands/claude_handler.rs as template), session-state mapping, error surface. Mark as draft initially; orchestrator publishes when subgraph is complete.
- thin-wrapper-smoke-<name>: extend wave-1-integration-smoke with a multi-message scenario for the new executor.
- thin-wrapper-docs-<name>: README + docs/ entry explaining when to use the new executor vs claude vs (deprecated?) nex.
- Dependencies: smoke --after impl --after research, docs --after smoke.
Then the agent calls wg publish on the subgraph to fire it.
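The dependency shape of the Phase 2 subgraph can be sanity-checked with a small sketch like the one below. It models only the ordering stated above (smoke after impl and research, docs after smoke, and impl after research, which is an assumption taken from the completion log); it does not call wg, and `dispatch_order` and `phase2_graph` are hypothetical helpers, not part of wg.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the tasks in an order where each task follows all of its
/// dependencies, or None if there is a cycle or an unknown dependency.
fn dispatch_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    let mut done: BTreeSet<&'a str> = BTreeSet::new();
    let mut order = Vec::new();
    while done.len() < deps.len() {
        // A task is ready once every task it comes after is done.
        let ready: Vec<&'a str> = deps
            .iter()
            .filter(|(task, after)| {
                !done.contains(*task) && after.iter().all(|d| done.contains(d))
            })
            .map(|(task, _)| *task)
            .collect();
        if ready.is_empty() {
            return None; // nothing can make progress: cycle or missing task
        }
        for task in ready {
            done.insert(task);
            order.push(task.to_string());
        }
    }
    Some(order)
}

/// The Phase 2 subgraph as described in the task text (task names shortened).
fn phase2_graph() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("research", vec![]),
        ("thin-wrapper-impl", vec!["research"]),
        ("thin-wrapper-smoke", vec!["thin-wrapper-impl", "research"]),
        ("thin-wrapper-docs", vec!["thin-wrapper-smoke"]),
    ])
}
```

With these edges the graph is a single chain, so `dispatch_order(&phase2_graph())` yields research, impl, smoke, docs in that order; introducing a back edge (for example, making research wait on docs) makes it return `None`.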
Phase 3 (implicit): the autopoietic loop continues
If the wrapper succeeds, it likely supersedes nex for OAI-compat use cases. The agent (or follow-up agent) may propose deprecating nex, with a separate task. That's out of scope here — just produce Phase 1 + Phase 2.
Constraints / guidance
- Phase 1 research can use web fetches but MUST cite sources. No hallucinated CLI feature claims.
- Phase 2 impl follows the same TDD pattern as other wave-1 tasks (failing test first, etc.).
- Don't gold-plate Phase 1: 6-8 candidates max, table format, ~1-2 page doc.
- Scoring criteria for Phase 1 recommendation: reliability > install-ease > feature breadth > license. Reliability wins because that's the original pain.
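One way to read the scoring rule above is as a lexicographic comparison: a lower-priority criterion only matters when all higher-priority ones tie. The sketch below illustrates that reading. The `Candidate` struct, the numeric scale, and the `rank` helper are assumptions made for illustration, and any scores fed to it are placeholders rather than survey results.

```rust
/// A surveyed candidate with one score per criterion (higher is better).
/// The numeric scale is an assumption; the task text only fixes the priority order.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    name: &'static str,
    reliability: u8,
    install_ease: u8,
    feature_breadth: u8,
    license: u8,
}

impl Candidate {
    /// Criteria in priority order: reliability > install-ease > feature breadth > license.
    fn priority_key(&self) -> (u8, u8, u8, u8) {
        (self.reliability, self.install_ease, self.feature_breadth, self.license)
    }
}

/// Sorts best-first. Tuples compare element by element, so a later
/// criterion only breaks ties among the earlier ones.
fn rank(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by(|a, b| b.priority_key().cmp(&a.priority_key()));
    candidates
}
```

Under this reading, a candidate that scores higher on reliability ranks first even if it loses on every other criterion, which matches the stated rationale that reliability is the original pain.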
Validation
- Phase 1 artifact: docs/research/thin-wrapper-executors-2026-04.md exists, has survey table, has recommendation with reasoning (~1-2 pages, sources cited).
- Phase 2 subgraph: at least 3 follow-up wg tasks created (impl + smoke + docs) with correct dependencies. All three published by the research agent itself (autopoietic).
- cargo build + cargo test pass with no regressions
- Manual smoke (after Phase 2 impl lands separately): the thin-wrapper executor sends 5 messages back-to-back without breaking, against a real OAI-compat endpoint.
Depends on
Required by
- (none)
Log
- 2026-04-26T16:05:31.637477654+00:00 Task paused
- 2026-04-26T16:06:08.503439243+00:00 Task published
- 2026-04-26T16:15:03.868303149+00:00 Spawned by coordinator --executor native --model claude-opus-4-6
- 2026-04-26T16:15:03.883417446+00:00 Task marked as failed: Agent exited with code 1
- 2026-04-26T16:17:15.846640330+00:00 Task reset for retry from failed (attempt #2)
- 2026-04-26T17:26:03.129826607+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-26T17:26:42.671269901+00:00 Starting work. Previous attempt failed due to wrong executor (native-exec without API key); now running on claude executor. User's promoted message overrides Phase 1 scope: skip gold-plating, narrow to {aider, codex, llm, llama-cli}, pick one for pty-wrap impl.
- 2026-04-26T17:27:59.509496951+00:00 Discovery: src/commands/codex_handler.rs already implements the thin-wrapper pattern (spawn per turn, replay history, no long-lived subprocess to supervise). Recommendation crystallized: codex-cli + harden custom OAI-compat endpoint plumbing.
- 2026-04-26T17:29:03.260753672+00:00 Phase 1 doc written: docs/research/thin-wrapper-executors-2026-04.md. Recommendation: harden existing codex-handler for custom OAI-compat endpoints. Now creating Phase 2 subgraph.
- 2026-04-26T17:32:30.907022970+00:00 Committed bc551c6c5 — pushed to remote. Phase 2 subgraph: thin-wrapper-impl (after this) → thin-wrapper-smoke (after impl + this) → thin-wrapper-docs (after smoke). cargo build clean (no new warnings). Marking done.
- 2026-04-26T17:32:35.473174923+00:00 Task marked as done