Metadata
| Status | done |
|---|---|
| Assigned | agent-2109 |
| Agent identity | 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3 |
| Model | codex:gpt-5.5 |
| Created | 2026-05-03T21:30:47.630194808+00:00 |
| Started | 2026-05-04T01:41:37.050985498+00:00 |
| Completed | 2026-05-04T01:48:19.153139648+00:00 |
| Tags | priority-critical, verify, smoke, nex, chat, eval-scheduled |
| Eval score | 0.95 |
| └ blocking impact | 0.92 |
| └ completeness | 1.00 |
| └ constraint fidelity | 0.85 |
| └ coordination overhead | 0.98 |
| └ correctness | 0.95 |
| └ downstream usability | 0.95 |
| └ efficiency | 0.90 |
| └ intent fidelity | 0.85 |
| └ style adherence | 0.95 |
Description
Description
Validation gate before B can run. Verify the narrow fix from fix-nex-chat-mirror actually works end-to-end by running the simulated-human-in-TUI smoke against the live tailnet endpoint.
What to verify
The full canonical user flow on lambda01/qwen3-coder-30b:
- pkill -f 'wg tui' && cargo install --path . && wg tui (fresh process on a fresh binary)
- Open new-chat dialog
- Pick nex executor, model=qwen3-coder-30b, endpoint=https://lambda01.tail334fe6.ts.net:30000
- Click Launch
- Type 'hi' in the new chat tab
- ASSERT: response arrives within 30s
- Type a follow-up question
- ASSERT: response continues coherently
- Exit TUI (Ctrl+C or quit)
- Restart TUI
- ASSERT: nex chat tab is reattached, prior conversation visible
- Send another message
- ASSERT: response continues from prior context
If ALL of these pass: A is verified. B can proceed.
If ANY of these fail: A is not actually fixed; do NOT advance to B. Document the specific failure mode in the task log and decide whether to re-attempt A or escalate.
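The timed ASSERT steps above (response within 30s, post-restart response) can be approximated with a small polling helper. This is a sketch, not part of the harness: the tmux session name in the usage comment is an assumption, and the capture command is whatever surfaces the TUI pane text in your setup.

```shell
#!/bin/sh
# wait_for_text: poll a capture command until PATTERN appears in its
# output or TIMEOUT seconds elapse. Returns 0 on match, 1 on timeout.
wait_for_text() {
  pattern=$1; timeout=$2; capture_cmd=$3
  start=$(date +%s)
  while :; do
    if eval "$capture_cmd" | grep -q -- "$pattern"; then
      return 0
    fi
    now=$(date +%s)
    if [ $((now - start)) -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
}

# Hypothetical usage against a tmux pane named wg-smoke:
# wait_for_text 'hi' 30 "tmux capture-pane -p -t wg-smoke"
```

A loop like this is what turns "response arrives within 30s" into a hard pass/fail instead of an eyeballed judgment; the second attempt's 90s post-restart timeout in the log is the same idea with a longer budget.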
Test mechanism
The simulated-human smoke harness from smoke-tui-nex-end-to-end (filed earlier). If that harness exists and is functional, use it. If not, file a manual-verification result with screenshot/text-capture evidence.
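The harness-or-manual fallback described here might be scripted along these lines. Only the harness path comes from this task's log; the function name and output strings are illustrative:

```shell
#!/bin/sh
# Prefer the permanent smoke harness when it exists and is executable;
# otherwise fall back to manual verification with captured evidence.
HARNESS=tests/smoke/scenarios/tui_nex_chat_end_to_end.sh

pick_mechanism() {
  if [ -x "$1" ]; then
    echo "harness"
  else
    echo "manual"
  fi
}

mode=$(pick_mechanism "$HARNESS")
echo "verification mode: $mode"
```

In the manual branch, the evidence requirement still applies: tmux pane text or daemon log excerpts per step, as the Validation section specifies.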
Validation
- All 13 steps above executed against the user's real endpoint
- Each step's expected outcome confirmed (with capture: tmux pane text, daemon log excerpt, etc.)
- Verdict: PASS (advances to B) OR FAIL (with specific failure mode documented)
- No source / doc modifications — verification only
- cargo install --path . was run before testing; verify the installed binary actually contains fix-nex-chat-mirror's commit
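The last validation item (installed binary contains the fix commit) can be checked before installing by confirming the worktree HEAD that cargo install will build from contains the expected commit. A minimal sketch; the helper name is illustrative, and cb30d1da5 is the fixed commit recorded in this task's retry log:

```shell
#!/bin/sh
# check_head_matches: return 0 if HEAD starts with the expected
# (possibly abbreviated) commit hash, 1 otherwise.
check_head_matches() {
  head=$1; expected=$2
  case "$head" in
    "$expected"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (inside the fixed worktree, before cargo install --path .):
# check_head_matches "$(git rev-parse HEAD)" cb30d1da5 || {
#   echo "worktree HEAD is not the fix commit; aborting install" >&2
#   exit 1
# }
```

Running this before the install closes the gap the first attempt hit: the verify worktree was still pre-fix, and the live run had to be redone from the fixed worktree.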
Depends on
Required by
Log
- 2026-05-03T21:30:47.610728029+00:00 Task paused
- 2026-05-03T21:31:53.101236538+00:00 Task published
- 2026-05-03T22:00:20.523522061+00:00 Spawned by coordinator --executor codex --model gpt-5.5
- 2026-05-03T22:00:37.788612144+00:00 Starting verification: checking smoke harness and installing fresh wg binary before live lambda nex TUI flow.
- 2026-05-03T22:01:03.786751399+00:00 Found existing smoke harness tests/smoke/scenarios/tui_nex_chat_end_to_end.sh; it covers launcher/new nex chat/live response/relaunch reattach, but I will add manual coverage for follow-up and post-restart continuation if needed.
- 2026-05-03T22:04:03.150618859+00:00 Validated install prerequisite: cargo install --path . completed from HEAD 2a63fd971, replacing global wg at /home/erik/.cargo/bin/wg.
- 2026-05-03T22:10:54.952724054+00:00 VERDICT: FAIL. Permanent harness tui_nex_chat_end_to_end.sh PASSed against lambda01/qwen3-coder after cargo install, proving launcher -> live first response -> relaunch reattach. Additional canonical manual coverage found a repeatable step 13 failure: after restart, .chat-1 is reattached and prior conversation/token is visible, but a new post-restart prompt is accepted/displayed and no model response appears within 90s. Pane evidence: prior 'remembered verify-nex-2087b' visible, then '> What token did I ask you to remember? Reply with only the token.' remains unanswered in [PTY] mode. A is not fully verified; B must not proceed.
- 2026-05-03T22:11:11.423788467+00:00 Filed follow-up fix-nex-tui and added it as an explicit dependency of implement-generalize-chat, so B remains blocked until the post-restart response bug is fixed and reverified.
- 2026-05-03T22:11:17.992581393+00:00 Task marked as failed: Verification FAIL: live TUI nex flow passed through create, hi response, follow-up response, exit/restart, and prior conversation visible, but after restart a new prompt in the reattached .chat-1 [PTY] pane did not receive any model response within 90s. Filed fix-nex-tui and blocked implement-generalize-chat on it.
- 2026-05-04T01:41:31.859721050+00:00 Task reset for retry from failed (attempt #2)
- 2026-05-04T01:41:37.050988825+00:00 Spawned by coordinator --executor codex --model gpt-5.5
- 2026-05-04T01:42:06.442567780+00:00 Retry started: prior attempt failed at post-restart nex response; checking fix-nex-tui status and current branch before rerunning live verification.
- 2026-05-04T01:42:25.488996760+00:00 fix-nex-tui is done at cb30d1da5; current verify worktree remains pre-fix, so live verification will run from the fixed worktree /home/erik/workgraph/.wg-worktrees/agent-2090 without source edits in verify-fix-nex.
- 2026-05-04T01:45:51.368338657+00:00 Validated install prerequisite: cargo install --path . completed from fixed worktree at commit cb30d1da5, replacing /home/erik/.cargo/bin/wg.
- 2026-05-04T01:48:04.759007019+00:00 VERDICT: PASS on retry. After installing fixed commit cb30d1da5, ran tests/smoke/scenarios/tui_nex_chat_post_restart_response.sh with WG_LIVE_NEX_MODEL=qwen3-coder-30b and WG_LIVE_NEX_ENDPOINT=https://lambda01.tail334fe6.ts.net:30000. Harness confirmed launcher opened, nex chat created, first prompt rendered a reply, second prompt rendered a coherent short reply, TUI exited/restarted, .chat-1 reattached with prior conversation visible, and post-restart prompt rendered a reply. This covers canonical steps 1-13; A is verified and B can proceed.
- 2026-05-04T01:48:19.153142704+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-05-04T01:49:48.815484159+00:00 PendingEval → Done (evaluator passed; downstream unblocks)