Metadata
| Status | done |
|---|---|
| Assigned | agent-1869 |
| Agent identity | 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3 |
| Created | 2026-05-02T23:54:58.735871575+00:00 |
| Started | 2026-05-03T04:57:57.098729317+00:00 |
| Completed | 2026-05-03T05:39:55.759467165+00:00 |
| Tags | smoke, test, nex, chat, tui, eval-scheduled |
| Eval score | 0.82 |
| └ blocking impact | 0.85 |
| └ completeness | 0.78 |
| └ constraint fidelity | 0.55 |
| └ coordination overhead | 0.85 |
| └ correctness | 0.82 |
| └ downstream usability | 0.80 |
| └ efficiency | 0.85 |
| └ intent fidelity | 0.91 |
| └ style adherence | 0.90 |
Description
Add the permanent simulated-human smoke that exercises the canonical user flow end-to-end. Uses the same tmux + send-keys + capture-pane + wg tui-dump pattern already in tests/smoke/scenarios/tui_chat_redesign_modal_close_persist.sh and tui_chat_switch_keystrokes.sh. Architecture rationale documented in design-nex-chat's design doc Section 2.
Implement directly — do not decompose further.
File scope
- tests/smoke/scenarios/tui_nex_chat_end_to_end.sh (NEW)
- tests/smoke/manifest.toml (ADD scenario entry; do not modify or remove existing entries)
DO NOT touch: any src/ file (smoke test should be passive).
Scenario shape
1. wg init -m claude:opus (project default; chat will override)
2. start daemon (background)
3. tmux new-session -d 'wg tui'
4. send '+' to open new-chat dialog
5. arrow-key to nex preset, OR Tab into AddNew form, type 'qwen3-coder' + endpoint URL
6. Enter to submit
7. wait + capture-pane: assert chat tab "[N]" appears in tab bar
8. send 'hello' + Enter
9. wait up to 60s, capture-pane: assert response text appears in chat pane (not just the prompt echo)
10. tmux kill-session
11. wg --json tui-dump | grep chat persistence proof
12. relaunch tui, capture-pane: assert resume worked
Live-skip pattern: if endpoint unreachable → loud_skip (exit 77) per existing nex_two_message_against_lambda01.sh.
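The scenario shape above could be sketched roughly as follows. This is a minimal illustration, not the committed script: `SESSION`, `NEX_ENDPOINT_URL`, the helper names (`loud_skip`, `wait_for`, `pane`, `run_scenario`), the pane size, and every timeout are assumptions. The key sequence follows the launcher flow recorded in the Log, and the tab pattern assumes `N` is a decimal number. `wg init`, daemon startup, and steps 9 to 12 are left out.

```shell
#!/usr/bin/env bash
# Sketch only. SESSION, NEX_ENDPOINT_URL, the helper names, and all timeouts
# are hypothetical; the key sequence is taken from the task log.

SESSION="nex-smoke-$$"                      # hypothetical tmux session name
NEX_ENDPOINT_URL="${NEX_ENDPOINT_URL:-}"    # hypothetical env var for the live endpoint

# Exit 77 so the harness can record a loud skip instead of a pass or a failure.
loud_skip() {
  echo "LOUD_SKIP: $*" >&2
  exit 77
}

# Poll a command's output until it matches an extended regex or the timeout expires.
# usage: wait_for <timeout_seconds> <regex> <command> [args...]
wait_for() {
  local timeout="$1" pattern="$2" elapsed=0
  shift 2
  while :; do
    if "$@" 2>/dev/null | grep -Eq -- "$pattern"; then
      return 0
    fi
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Print the visible contents of the TUI pane.
pane() {
  tmux capture-pane -p -t "$SESSION"
}

run_scenario() {
  # Live-skip: any connection-level failure means the endpoint is unreachable.
  [ -n "$NEX_ENDPOINT_URL" ] || loud_skip "NEX_ENDPOINT_URL is not set"
  curl -sS --max-time 5 -o /dev/null "$NEX_ENDPOINT_URL" \
    || loud_skip "endpoint unreachable: $NEX_ENDPOINT_URL"

  # Steps 3 to 6: launch the TUI, open the dialog, fill the AddNew form, submit.
  # A real script may need short pauses between keystrokes.
  tmux new-session -d -s "$SESSION" -x 200 -y 50 'wg tui'
  tmux send-keys -t "$SESSION" '+'
  tmux send-keys -t "$SESSION" Down Down Enter
  tmux send-keys -t "$SESSION" Right Right Tab
  tmux send-keys -t "$SESSION" -l 'qwen3-coder'
  tmux send-keys -t "$SESSION" Tab
  tmux send-keys -t "$SESSION" -l "$NEX_ENDPOINT_URL"
  tmux send-keys -t "$SESSION" Enter

  # Step 7: the chat tab "[N]" should appear in the tab bar.
  if ! wait_for 30 '\[[0-9]+\]' pane; then
    echo "FAIL: chat tab did not appear" >&2
    tmux kill-session -t "$SESSION"
    return 1
  fi

  # Step 8: send the first message.
  tmux send-keys -t "$SESSION" -l 'hello'
  tmux send-keys -t "$SESSION" Enter

  # Steps 9 to 12 (reply assertion, persistence proof, resume) are omitted here.
  tmux kill-session -t "$SESSION"
}
```

Nothing runs until `run_scenario` is called, so the helpers can be sourced and exercised on their own. Polling with `wait_for` instead of a fixed `sleep` keeps the scenario fast when the TUI responds quickly while still tolerating a slow endpoint.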
Manifest entry shape
[[scenario]]
name = "tui_nex_chat_end_to_end"
script = "scenarios/tui_nex_chat_end_to_end.sh"
owners = [
"fix-nex-cursor-corruption",
"fix-supervisor-restart-backoff",
"fix-tui-supervisor-coexistence",
"fix-chat-dir-race",
"integrate-nex-chat-end-to-end",
"smoke-tui-nex-end-to-end",
"design-nex-chat",
]
description = "Simulated-human end-to-end smoke: TUI new-chat dialog → nex chat against live endpoint → message+reply → close+reopen → resume."
timeout_seconds = 240
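One way to confirm that the entry landed in the manifest is a plain text match on the `name` line. The sketch below assumes a hypothetical helper, `manifest_has_scenario`, and relies only on `grep`; it does not parse TOML, so it would also match a `name` key in some other table.

```shell
#!/usr/bin/env bash
# Sketch only: manifest_has_scenario is a hypothetical helper, not part of wg.

# Succeed if the manifest contains a line of the form: name = "<scenario_name>"
# usage: manifest_has_scenario <manifest_path> <scenario_name>
manifest_has_scenario() {
  grep -Eq "^[[:space:]]*name[[:space:]]*=[[:space:]]*\"$2\"[[:space:]]*$" "$1"
}
```

For example, `manifest_has_scenario tests/smoke/manifest.toml tui_nex_chat_end_to_end` would return 0 once the entry is present.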
Validation
- Scenario script written
- Manifest entry added
- Pre-fix: scenario FAILs (or LOUD_SKIPs if endpoint unreachable) — proves the test catches the regression
- Post-fix: scenario PASSes against lambda01
- CI: scenario is included in the `wg done` smoke gate via owners list
- Documented in tests/smoke/README.md if a new dependency or convention was introduced
Depends on
Required by
- (none)
Log
- 2026-05-02T23:54:58.711730744+00:00 Task paused
- 2026-05-03T00:51:27.907170311+00:00 Task published
- 2026-05-03T04:57:57.098737121+00:00 Spawned by coordinator --executor claude --model opus
- 2026-05-03T04:58:13.715020788+00:00 Starting work: inspecting reference scenarios for TUI tmux+tui-dump pattern
- 2026-05-03T05:00:54.303281453+00:00 Studied launcher key-flow: '+' opens, Down x2 to '+ Add new...', Enter, Right x2 to nex, Tab to model field, type qwen3-coder, Tab to endpoint, type URL, Enter to submit. Chat tab format: '[N] ● .chat-0 ✕'.
- 2026-05-03T05:04:13.593001043+00:00 Wrote tests/smoke/scenarios/tui_nex_chat_end_to_end.sh and added manifest entry. Now running it live to validate.
- 2026-05-03T05:35:34.456355646+00:00 Smoke passed end-to-end against lambda01: launcher dialog → AddNew form filled → submit → graph row → tab switch → chat reply → kill+relaunch+resume.
- 2026-05-03T05:38:36.524526117+00:00 Build clean (cargo build --release ok). Manifest entry verified. Owners list includes all 5 fixes + smoke-tui-nex-end-to-end + design-nex-chat.
- 2026-05-03T05:39:47.175294199+00:00 Committed: a1358033a — pushed to remote
- 2026-05-03T05:39:55.759473307+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-05-03T05:42:08.815150045+00:00 PendingEval → Done (evaluator passed; downstream unblocks)