diagnose-wg-nex

Diagnose: wg nex chat spawn dies silently — no stderr captured, no error surfaced

Metadata

Status: done
Assigned: agent-1270
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Model: claude:opus
Created: 2026-04-30T16:18:02.531132428+00:00
Started: 2026-04-30T16:23:18.938375483+00:00
Completed: 2026-04-30T16:32:19.414641701+00:00
Tags: priority-high, research, bug, chat, nex, eval-scheduled
Eval score: 0.93
└ blocking impact: 0.92
└ completeness: 0.93
└ constraint fidelity: 0.85
└ coordination overhead: 0.95
└ correctness: 0.95
└ downstream usability: 0.92
└ efficiency: 0.90
└ intent fidelity: 0.83
└ style adherence: 0.95

Description

User attempted to launch a wg nex chat on .chat-32 with model='qwen3-coder', endpoint='https://lambda01.tail334fe6.ts.net:30000'. Result: process died immediately, NO error messages, NO stderr captured. The TUI fell back to 'chat with the coordinator' (because the chat agent was gone). User had no way to debug.

Daemon log evidence (2026-04-30 16:15:50Z):

[INFO] IPC CreateChat: name=None, model=Some("qwen3-coder"), executor=Some("native"), endpoint=Some("https://lambda01.tail334fe6.ts.net:30000")
[INFO] Graph file changed (fs watcher), scheduling dispatcher tick in 2000ms (settling delay)
[INFO] Settling delay elapsed, running coordinator tick now
[dispatcher] Reconciliation: recovered 1 orphaned task(s)

Two seconds between CreateChat and reconciliation orphan-recovery. No spawn-error log, no stderr capture, no exit code recorded. Silent death.

wg log .chat-32 --agent returns 'No agent archives' — the agent never wrote anything.

Hypotheses to test

  1. nex handler crashed before spawning anything: maybe the model spec 'qwen3-coder' (without nex: prefix) didn't route correctly, OR endpoint parsing failed, OR something in the nex initialization panicked
  2. nex spawned but exited immediately: maybe the endpoint was unreachable / TLS handshake failed / model name not accepted by server, and the handler exited 1 before any output was buffered
  3. Stderr capture is broken for nex spawns specifically: claude/codex have explicit stderr-log files (e.g., claude-handler-stderr.log); nex may not, so its stderr just goes to /dev/null
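For hypothesis 2, a quick way to separate "endpoint unreachable" from the other causes is a plain TCP probe against the endpoint's host and port. The sketch below is illustrative only: `probe_tcp` is a hypothetical helper, not part of wg, and it checks TCP reachability only. It says nothing about the TLS handshake or whether the server accepts the model name.

```rust
use std::io;
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Hypothetical helper: try to open a TCP connection to `host:port`.
/// Returns Ok(()) on the first address that accepts, otherwise the last error.
fn probe_tcp(host: &str, port: u16, timeout: Duration) -> io::Result<()> {
    let mut last_err =
        io::Error::new(io::ErrorKind::NotFound, "host resolved to no addresses");
    // A host name may resolve to several addresses; try each in turn.
    for addr in (host, port).to_socket_addrs()? {
        match TcpStream::connect_timeout(&addr, timeout) {
            Ok(_) => return Ok(()),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

Calling `probe_tcp("lambda01.tail334fe6.ts.net", 30000, Duration::from_secs(3))` from the machine running the daemon would show whether the connection is refused, times out, or succeeds, which narrows hypothesis 2 before any handler code is read.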

The third hypothesis is implied by the daemon log: claude handler explicitly logs stderr="/home/erik/workgraph/.wg/service/claude-handler-stderr.log" on spawn. NO equivalent line for any nex spawn. So nex's stderr is unrouted.
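To make the stderr-parity gap concrete, here is a minimal sketch of what routing a child's stderr to a log file and recording its exit status could look like in Rust. It is an assumption about the shape of a fix, not the existing claude handler code: `run_with_stderr_log` is a hypothetical name, and the log path a nex handler would use (for example a `nex-handler-stderr.log` next to the claude one) is likewise assumed.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;
use std::process::{Command, ExitStatus, Stdio};

/// Hypothetical helper: run `program` with `args`, append the child's stderr
/// to `stderr_log`, and return the exit status so the caller can log it.
fn run_with_stderr_log(
    program: &str,
    args: &[&str],
    stderr_log: &Path,
) -> io::Result<ExitStatus> {
    // Append rather than truncate, so earlier failures stay in the file.
    let log = OpenOptions::new()
        .create(true)
        .append(true)
        .open(stderr_log)?;
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stderr(Stdio::from(log)) // the child writes stderr straight to the file
        .spawn()?; // a spawn failure is returned to the caller, not swallowed
    child.wait()
}
```

With something of this shape, both failure modes in hypotheses 1 and 2 would leave a trace: a failed spawn returns an `io::Error` the daemon can log, and an early exit leaves a non-zero `ExitStatus` plus whatever the child wrote to stderr.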

Investigation steps

  1. Reproduce the spawn from CLI: wg nex --model qwen3-coder --endpoint https://lambda01.tail334fe6.ts.net:30000 directly. See what error fires (if any).
  2. Reproduce via IPC the same way the TUI did. Inspect spawn code path for nex (likely src/dispatch/handler_for_model.rs + a nex-specific handler module).
  3. Compare claude handler's stderr capture vs nex handler's. Identify the gap.
  4. Test: model='qwen3-coder' vs model='nex:qwen3-coder' — does the prefix matter for the IPC path?
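Step 4 turns on how a model spec is split into a provider prefix and a model name. The sketch below shows one plausible way such a split could work, assuming the `provider:model` form seen in `claude:opus`. `split_model_spec` is a hypothetical function; the real routing in wg may differ, which is exactly what the step is meant to establish.

```rust
/// Hypothetical helper: split a spec of the form `provider:model`.
/// A spec with no prefix (or an empty one) yields `None` for the provider,
/// which is the case a router must handle explicitly rather than drop.
fn split_model_spec(spec: &str) -> (Option<&str>, &str) {
    match spec.split_once(':') {
        Some((provider, model)) if !provider.is_empty() => (Some(provider), model),
        _ => (None, spec),
    }
}
```

Under this assumption, `nex:qwen3-coder` carries an explicit provider while bare `qwen3-coder` does not, so the IPC path would have to rely on the separate `executor=Some("native")` field to pick a handler. Whether it actually does is the question for the investigation.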

Deliverable

A wg log entry with:

  • Root cause for the silent death (file:line of where the spawn fails or where stderr is discarded)
  • Concrete fix proposal for stderr capture parity (what claude has but nex doesn't)
  • Concrete fix proposal for the underlying spawn failure if it's a real bug, not just a missing-error-surface issue

No source modifications — diagnose only.

Validation

  • Reproduction documented with exact command + observed silent failure
  • Root cause identified with file:line citation
  • Both fixes (stderr capture + underlying spawn issue if any) specified concretely enough that a follow-up implementation task can execute
  • No source / doc modifications

Depends on

Required by

Log