Metadata
| Status | abandoned |
|---|---|
| Assigned | agent-159 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-04-26T23:00:44.728798063+00:00 |
| Started | 2026-04-26T23:02:44.605706231+00:00 |
| Tags | eval-scheduled |
Description
User insight: wg nex works fine as a TASK agent (spawns, processes, exits). What's broken is the chat-handler integration — wg nex --chat <ref> faults on the second message in a chat session. User quote: 'we can use nex as a task agent. it's just the damn tui integration which should be like falling off a log if we just rebuild it based on how claude is integrated and codex presumably.'
The chat-handler shim ALREADY exists. claude-handler.rs:1-7 says: 'Peer of wg nex --chat <ref>: where nex IS a native handler that speaks chat/*.jsonl directly, this handler spawns the claude CLI'. So nex's chat-handler is wired in but its multi-message handling has a regression.
This is the OPPOSITE direction from the research-into-impl 'thin-wrapper around an external OAI-compat CLI' approach — that's still valid as a future option, but THIS task is the targeted fix for what already exists.
Diagnose
- Reproduce in a scratch dir against lambda01 endpoint:
    rm -rf /tmp/wg-nex-chat && mkdir /tmp/wg-nex-chat && cd /tmp/wg-nex-chat
    wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 -x nex
    wg service start
    wg service create-chat --name test --executor native --model qwen3-coder
    wg tui  # OR programmatic via wg chat send / wg msg send
    # send msg 1: 'hi' (should get response)
    # send msg 2: 'hi again' (currently faults)
- Capture the exact stack trace / error from the daemon log + chat session jsonl + handler log when msg 2 faults
- Identify where in the nex chat-handler code path the regression is (inbox cursor mismanagement? session state not carried across turns? tool-result accumulation? message-id collision?)
Fix model: claude-handler.rs
claude-handler.rs gets multi-message handling right. Read its approach:
- inbox cursor: tracks last answered message id; only processes ids > cursor
- session lock: held for handler lifetime; released cleanly on exit
- subprocess: claude CLI process is long-lived across turns; not restarted per message
- restart on crash: supervisor restarts handler; claude process picks up from chat session jsonl
Mirror this structure for the nex chat path. nex doesn't have a separate CLI subprocess (it's in-process), so the equivalent is: keep the nex loop's conversation state alive across inbox polls; don't reinit the LLM client per message.
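The cursor and state-retention points above can be sketched roughly as follows. This is an illustration only, not the actual wg code: `InboxMessage`, `ChatSession`, and `poll_once` are hypothetical names, message ids are assumed to be increasing integers, and the `respond` closure stands in for the in-process LLM call.

```rust
/// Hypothetical inbox entry; the real chat/*.jsonl schema may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct InboxMessage {
    pub id: u64,
    pub text: String,
}

/// Hypothetical per-session state that lives for the handler's lifetime.
#[derive(Debug, Default)]
pub struct ChatSession {
    /// Id of the last message that was answered; 0 means none yet.
    pub cursor: u64,
    /// Conversation history carried across polls (never reset per message).
    pub history: Vec<String>,
}

impl ChatSession {
    /// Process one inbox snapshot: answer only ids above the cursor,
    /// in id order, advancing the cursor after each answer.
    /// `respond` is an assumed stand-in for the model call; it sees the
    /// full history, including earlier turns.
    pub fn poll_once<F>(&mut self, inbox: &[InboxMessage], mut respond: F) -> Vec<(u64, String)>
    where
        F: FnMut(&[String]) -> String,
    {
        let mut pending: Vec<&InboxMessage> =
            inbox.iter().filter(|m| m.id > self.cursor).collect();
        pending.sort_by_key(|m| m.id);

        let mut replies = Vec::new();
        for msg in pending {
            self.history.push(format!("user: {}", msg.text));
            let reply = respond(self.history.as_slice());
            self.history.push(format!("assistant: {}", reply));
            self.cursor = msg.id;
            replies.push((msg.id, reply));
        }
        replies
    }
}
```

Because the same `ChatSession` value is reused for every poll, a second message sees the first turn in `history` and an already-answered id is never processed twice. Whether the real fault lies in the cursor or in the session state is still to be confirmed by the repro.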
Files likely to touch
- src/commands/nex.rs (the wg nex --chat entry point; likely)
- src/executor/native/agent.rs (nex's loop)
- src/executor/native/inbox.rs (inbox handling — strong suspect for the bug)
- Compare side-by-side with src/commands/claude_handler.rs as reference
Hard gate
Before claiming done:
- Run the verbatim repro above
- Send AT LEAST 5 messages back-to-back; ALL 5 must produce non-fault responses
- Capture daemon log + handler log + chat session jsonl as evidence
- Add the multi-message scenario to the smoke manifest (per smoke-gate-is) so this regression is locked
Validation
- Failing test first: test_nex_chat_handler_multi_message — programmatic 5-message roundtrip against a stub OAI endpoint; all 5 succeed
- Implementation: identify and fix the regression (likely inbox cursor or session state)
- cargo build + cargo test pass with no regressions
- HARD GATE: live repro against lambda01 produces 5 successful responses; evidence attached
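A rough sketch of the shape the failing-first test could take. The real test_nex_chat_handler_multi_message would drive the nex chat handler against a stub OAI endpoint; here `ChatBackend`, `StubBackend`, and `run_roundtrip` are hypothetical stand-ins that only show the 5-message roundtrip pattern with one long-lived backend and history carried across turns.

```rust
/// Hypothetical abstraction over the model endpoint, so a stub can replace it.
pub trait ChatBackend {
    /// Return a reply for the conversation so far, or an error string on fault.
    fn complete(&mut self, history: &[String]) -> Result<String, String>;
}

/// Stub that echoes the latest message and counts how often it was called.
#[derive(Default)]
pub struct StubBackend {
    pub calls: usize,
}

impl ChatBackend for StubBackend {
    fn complete(&mut self, history: &[String]) -> Result<String, String> {
        self.calls += 1;
        let last = history.last().ok_or_else(|| "empty history".to_string())?;
        Ok(format!("echo[{}]: {}", self.calls, last))
    }
}

/// Send each message in order through ONE backend instance, carrying the
/// history across turns. Stops at the first fault and reports the turn number.
pub fn run_roundtrip<B: ChatBackend>(
    backend: &mut B,
    messages: &[&str],
) -> Result<Vec<String>, String> {
    let mut history: Vec<String> = Vec::new();
    let mut replies = Vec::new();
    for (turn, msg) in messages.iter().enumerate() {
        history.push(msg.to_string());
        let reply = backend
            .complete(&history)
            .map_err(|e| format!("turn {} faulted: {}", turn + 1, e))?;
        history.push(reply.clone());
        replies.push(reply);
    }
    Ok(replies)
}
```

A test built on this would call `run_roundtrip` with five messages and assert that it returns `Ok` with five replies; a fault on the second message would surface as an `Err` naming turn 2.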
Out of scope
- The thin-wrapper-around-codex/aider approach (separate path via research-into-impl + thin-wrapper-impl tasks)
- TUI dialog fixes for picking nex executor (that's tui-new-coordinator)
- Making nex feature-parity with claude CLI (auth, prompt cache, etc) — just fix the multi-message break
Depends on
Required by
- (none)
Log
- 2026-04-26T23:00:44.725452585+00:00 Task paused
- 2026-04-26T23:01:46.035179763+00:00 Task published
- 2026-04-26T23:02:43.052031761+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer is ideal for this correctness-critical bug fix requiring deep understanding of existing code (claude-handler as reference), identification of inbox/session state regression, and thorough validation (5-message live repro + smoke test).
- 2026-04-26T23:02:44.605713705+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-26T23:02:59.613160423+00:00 Starting investigation: comparing nex chat-handler vs claude-handler.rs structure
- 2026-04-26T23:03:23.208729207+00:00 Reading agent.rs to find chat-loop and multi-message handling
- 2026-04-26T23:04:06.554644291+00:00 Task abandoned