wg-nex-chat — Workgraph live mirror

Metadata

Status	abandoned
Assigned	`agent-159`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Created	2026-04-26T23:00:44.728798063+00:00
Started	2026-04-26T23:02:44.605706231+00:00
Tags	`eval-scheduled`

Description

User insight: wg nex works fine as a TASK agent (spawns, processes, exits). What's broken is the chat-handler integration — wg nex --chat <ref> faults on the second message in a chat session. User quote: 'we can use nex as a task agent. it's just the damn tui integration which should be like falling off a log if we just rebuild it based on how claude is integrated and codex presumably.'

The chat-handler shim ALREADY exists. claude-handler.rs:1-7 says: 'Peer of wg nex --chat <ref>: where nex IS a native handler that speaks chat/*.jsonl directly, this handler spawns the claude CLI'. So nex's chat-handler is wired in but its multi-message handling has a regression.

This is the OPPOSITE direction from the research-into-impl 'thin-wrapper around an external OAI-compat CLI' approach — that's still valid as a future option, but THIS task is the targeted fix for what already exists.

Diagnose

1. Reproduce in a scratch dir against the lambda01 endpoint:
```
rm -rf /tmp/wg-nex-chat && mkdir /tmp/wg-nex-chat && cd /tmp/wg-nex-chat
wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 -x nex
wg service start
wg service create-chat --name test --executor native --model qwen3-coder
wg tui  # OR programmatic via wg chat send / wg msg send
# send msg 1: 'hi' (should get a response)
# send msg 2: 'hi again' (currently faults)
```
2. Capture the exact stack trace / error from the daemon log, the chat session jsonl, and the handler log when msg 2 faults
3. Identify where in the nex chat-handler code path the regression is (inbox cursor mismanagement? session state not carried across turns? tool-result accumulation? message-id collision?)

Fix model: claude-handler.rs

claude-handler.rs gets multi-message handling right. Read its approach:

- inbox cursor: tracks last answered message id; only processes ids > cursor
- session lock: held for handler lifetime; released cleanly on exit
- subprocess: claude CLI process is long-lived across turns; not restarted per message
- restart on crash: supervisor restarts handler; claude process picks up from chat session jsonl
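
The inbox-cursor rule in the list above can be sketched as follows. This is a minimal illustration, not the code in claude-handler.rs or inbox.rs: `InboxMessage`, `InboxCursor`, `pending`, and `mark_answered` are hypothetical names, and message ids are assumed to be monotonically increasing integers.

```rust
/// Hypothetical inbox message; the real type in inbox.rs may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct InboxMessage {
    pub id: u64,
    pub body: String,
}

/// Tracks the id of the last message that has been answered.
#[derive(Debug, Default)]
pub struct InboxCursor {
    last_answered: Option<u64>,
}

impl InboxCursor {
    /// Return only messages whose id is strictly greater than the cursor,
    /// sorted by id so they are answered in order.
    pub fn pending<'a>(&self, inbox: &'a [InboxMessage]) -> Vec<&'a InboxMessage> {
        let mut out: Vec<&InboxMessage> = inbox
            .iter()
            .filter(|m| self.last_answered.map_or(true, |c| m.id > c))
            .collect();
        out.sort_by_key(|m| m.id);
        out
    }

    /// Advance the cursor once a reply has been written.
    /// The cursor never moves backwards, so a stale id is ignored.
    pub fn mark_answered(&mut self, id: u64) {
        self.last_answered = Some(self.last_answered.map_or(id, |c| c.max(id)));
    }
}
```

With a cursor like this, a second poll after answering message 1 yields only message 2. If the nex path re-reads the whole inbox, or resets its cursor between polls, the second message is where that would first show up.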

Mirror this structure for the nex chat path. nex doesn't have a separate CLI subprocess (it's in-process), so the equivalent is: keep the nex loop's conversation state alive across inbox polls; don't reinit the LLM client per message.
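
One possible shape for "keep state alive across polls" is sketched below, assuming the session object is built once and reused for every message. `Turn`, `ChatClient`, `ChatSession`, and `EchoClient` are hypothetical stand-ins; the real nex client and agent loop will look different.

```rust
/// One entry in the running transcript.
#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub role: &'static str,
    pub content: String,
}

/// Hypothetical stand-in for the LLM client used by the nex loop.
pub trait ChatClient {
    fn complete(&mut self, history: &[Turn]) -> Result<String, String>;
}

/// Built once per chat session and reused for every inbox poll,
/// so neither the client nor the transcript is re-created per message.
pub struct ChatSession<C: ChatClient> {
    client: C,
    history: Vec<Turn>,
}

impl<C: ChatClient> ChatSession<C> {
    pub fn new(client: C) -> Self {
        Self { client, history: Vec::new() }
    }

    /// Handle one user message and keep the transcript for the next turn.
    pub fn handle(&mut self, user_msg: &str) -> Result<String, String> {
        self.history.push(Turn { role: "user", content: user_msg.to_string() });
        match self.client.complete(&self.history) {
            Ok(reply) => {
                self.history.push(Turn { role: "assistant", content: reply.clone() });
                Ok(reply)
            }
            Err(e) => {
                // Drop the unanswered user turn so a retry does not duplicate it.
                self.history.pop();
                Err(e)
            }
        }
    }

    /// Number of turns currently held in the transcript.
    pub fn turns(&self) -> usize {
        self.history.len()
    }
}

/// Test double that reports how many turns it was shown.
pub struct EchoClient;

impl ChatClient for EchoClient {
    fn complete(&mut self, history: &[Turn]) -> Result<String, String> {
        Ok(format!("seen {} turns", history.len()))
    }
}
```

Calling `handle("hi")` and then `handle("hi again")` on the same `ChatSession` shows the client one turn on the first call and three on the second. That growth is the property to check for in the real loop.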

Files likely to touch

- src/commands/nex.rs (the `wg nex --chat` entry point; likely)
- src/executor/native/agent.rs (nex's loop)
- src/executor/native/inbox.rs (inbox handling; strong suspect for the bug)
- Compare side-by-side with src/commands/claude_handler.rs as reference

Hard gate

Before claiming done:

1. Run the verbatim repro above
2. Send AT LEAST 5 messages back-to-back; ALL 5 must produce non-fault responses
3. Capture daemon log + handler log + chat session jsonl as evidence
4. Add the multi-message scenario to the smoke manifest (per smoke-gate-is) so this regression is locked

Validation

- [ ] Failing test first: test_nex_chat_handler_multi_message, a programmatic 5-message roundtrip against a stub OAI endpoint; all 5 succeed
- [ ] Implementation: identify and fix the regression (likely inbox cursor or session state)
- [ ] cargo build + cargo test pass with no regressions
- [ ] HARD GATE: live repro against lambda01 produces 5 successful responses; evidence attached
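
A rough outline of what test_nex_chat_handler_multi_message could assert is sketched below. `StubEndpoint` and `roundtrip` are hypothetical and purely in-process; the real test would drive the nex chat handler against a stub OAI-compatible server rather than this helper.

```rust
/// Hypothetical stub for an OAI-compatible endpoint.
/// It records how many messages each request carried.
#[derive(Default)]
pub struct StubEndpoint {
    pub request_sizes: Vec<usize>,
}

impl StubEndpoint {
    /// Answer a request, or fault if it carries no messages.
    pub fn chat(&mut self, messages: &[String]) -> Result<String, String> {
        if messages.is_empty() {
            return Err("empty request".to_string());
        }
        self.request_sizes.push(messages.len());
        Ok(format!("reply {}", self.request_sizes.len()))
    }
}

/// Send `n` user messages through one long-lived transcript
/// and return the result of every turn.
pub fn roundtrip(stub: &mut StubEndpoint, n: usize) -> Vec<Result<String, String>> {
    let mut transcript: Vec<String> = Vec::new();
    let mut results = Vec::with_capacity(n);
    for i in 1..=n {
        transcript.push(format!("user: message {i}"));
        let result = stub.chat(&transcript);
        if let Ok(reply) = &result {
            transcript.push(format!("assistant: {reply}"));
        }
        results.push(result);
    }
    results
}

#[test]
fn test_nex_chat_handler_multi_message() {
    let mut stub = StubEndpoint::default();
    let results = roundtrip(&mut stub, 5);
    // All five turns must succeed, not just the first.
    assert_eq!(results.len(), 5);
    assert!(results.iter().all(|r| r.is_ok()));
    // Each request carries the whole transcript so far.
    assert_eq!(stub.request_sizes, vec![1, 3, 5, 7, 9]);
}
```

Checking the request sizes as well as the success of each turn would catch a handler that answers every message but silently drops earlier context.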

Out of scope

- The thin-wrapper-around-codex/aider approach (separate path via research-into-impl + thin-wrapper-impl tasks)
- TUI dialog fixes for picking nex executor (that's tui-new-coordinator)
- Making nex feature-parity with claude CLI (auth, prompt cache, etc); just fix the multi-message break

Depends on

done .assign-wg-nex-chat

Required by

(none)

Log

2026-04-26T23:00:44.725452585+00:00 Task paused
2026-04-26T23:01:46.035179763+00:00 Task published
2026-04-26T23:02:43.052031761+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer is ideal for this correctness-critical bug fix requiring deep understanding of existing code (claude-handler as reference), identification of inbox/session state regression, and thorough validation (5-message live repro + smoke test).
2026-04-26T23:02:44.605713705+00:00 Spawned by coordinator --executor claude --model opus
2026-04-26T23:02:59.613160423+00:00 Starting investigation: comparing nex chat-handler vs claude-handler.rs structure
2026-04-26T23:03:23.208729207+00:00 Reading agent.rs to find chat-loop and multi-message handling
2026-04-26T23:04:06.554644291+00:00 Task abandoned