smoke-test-gap — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-75`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Created	2026-04-26T17:27:06.068770431+00:00
Started	2026-04-26T18:30:55.326417526+00:00
Completed	2026-04-26T18:53:55.500654076+00:00
Tags	`eval-scheduled`
Tokens	449356 in / 6945 out
Eval score	0.84
└ blocking impact	0.85
└ completeness	0.85
└ coordination overhead	0.85
└ correctness	0.86
└ downstream usability	0.82
└ efficiency	0.85
└ intent fidelity	0.89
└ style adherence	0.80

## Description

`wave-1-integration-smoke` was marked Done, but it did NOT catch the wg nex breakage that the user hit on the very first message in TUI chat. The user's quote: 'why isn't the smoke test catching all this stuff! i did the most basic thing i wrote hi and then it barfed.'

The smoke skipped (or stubbed) the nex+lambda01 scenario instead of exercising it for real. That defeats the whole point of an assertion-driven live smoke per the project memory pattern.

### Required scenarios the smoke MUST cover (and FAIL loudly when broken)

1. **Nex against the user's real endpoint** — use the same flow the user reported:
```
rm -rf /tmp/wg-smoke && mkdir /tmp/wg-smoke && cd /tmp/wg-smoke
wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 -x nex
wg service start
# programmatically open a chat session (via IPC CreateChat or similar — match what 'wg tui' does)
# send the literal string 'hi' as the first message
# assert a non-error response is received within 30s
# send 4 more messages back-to-back; assert all 5 succeed
```
If the lambda01 endpoint is unreachable from CI or the smoke env, **the smoke fails loudly with 'NEX SMOKE SKIPPED — endpoint unreachable'** so it shows up in the run output. Do NOT silently skip.

2. **Claude end-to-end via TUI** — same pattern: spawn a chat agent, send 'hi', assert response.

3. **Both with default config** — no special bypass to make the test pass. If the default config breaks, the smoke breaks.
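
The loud-skip requirement in scenario 1 could be sketched as a preflight probe. This is a minimal sketch, not the contents of the real smoke script: the helper names `endpoint_reachable`, `loud_banner`, and `preflight_nex` are hypothetical, `curl` is assumed to be installed, and the 5-second probe timeout is an arbitrary choice. Treating any HTTP response as "reachable" is also an assumption, made so that a misrouted request still reaches the real assertions instead of being skipped.

```shell
# Sketch only: helper names and the 5s probe timeout are assumptions,
# not taken from the real smoke script.
NEX_ENDPOINT="${NEX_ENDPOINT:-https://lambda01.tail334fe6.ts.net:30000}"

# Succeeds if the server answers at all (even with an HTTP error status);
# fails on DNS, connection, TLS, or timeout errors.
endpoint_reachable() {
  curl --silent --output /dev/null --max-time 5 "$1"
}

# Prints a banner that is hard to miss in run output and easy to grep for.
loud_banner() {
  printf '%s\n' \
    '########################################' \
    "$1" \
    '########################################'
}

# Returns 0 when the endpoint is reachable; otherwise prints the banner to
# stderr and returns non-zero so the caller can fail the run.
preflight_nex() {
  if endpoint_reachable "$1"; then
    return 0
  fi
  loud_banner 'NEX SMOKE SKIPPED — endpoint unreachable' >&2
  return 1
}
```

A caller would run `preflight_nex "$NEX_ENDPOINT" || exit 1` before the nex scenario, so an unreachable endpoint fails the run rather than disappearing from it.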

### Why the previous smoke missed it

Whoever implemented `wave-1-integration-smoke` likely used stub endpoints, mocks, or skipped the live-endpoint scenarios. Read the smoke script and identify which scenarios were stubbed/skipped. Replace with live invocations against real endpoints.

### Reproducibility / verification (HARD GATE)

Before claiming done, run the augmented smoke locally and demonstrate that it:
- Catches the current wg-nex-native breakage (smoke should FAIL until wg-nex-native-2 lands, then pass)
- Shows pass/fail/skip output that's greppable
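
One way to make the pass/fail/skip output greppable is to emit a fixed-prefix line per scenario. The sketch below is illustrative only: the `SMOKE_RESULT` and `SMOKE_SUMMARY` line formats, the `SMOKE_LOG` path, and the function names are all assumptions, not the format the real script uses. Counting a SKIP as a failure is one reading of the "fails loudly" requirement above.

```shell
# Sketch only: line format, log path, and function names are hypothetical.
SMOKE_LOG="${SMOKE_LOG:-/tmp/wg-smoke-results.log}"

# report <scenario> <PASS|FAIL|SKIP> [detail]
# Prints one result line and appends the same line to the log.
report() {
  printf 'SMOKE_RESULT scenario=%s status=%s detail=%s\n' \
    "$1" "$2" "${3:-none}" | tee -a "$SMOKE_LOG"
}

# Prints how many logged results have the given status.
# grep -c exits non-zero when the count is 0, so its exit status is ignored.
count_status() {
  grep -c " status=$1 " "$SMOKE_LOG" 2>/dev/null || true
}

# Prints a one-line summary; returns non-zero if anything failed or was skipped.
summarize() {
  pass=$(count_status PASS)
  fail=$(count_status FAIL)
  skip=$(count_status SKIP)
  printf 'SMOKE_SUMMARY pass=%s fail=%s skip=%s\n' \
    "${pass:-0}" "${fail:-0}" "${skip:-0}"
  [ "${fail:-0}" -eq 0 ] && [ "${skip:-0}" -eq 0 ]
}
```

With this shape, `grep '^SMOKE_RESULT .* status=FAIL'` over the run output lists every failing scenario, and the exit status of `summarize` can serve as the exit status of the whole smoke.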

## Validation

- [ ] Read existing wave-1 smoke implementation; identify scenarios that were stubbed/skipped.
- [ ] Replace stubs with live invocations against the documented endpoints (claude CLI, lambda01 OAI-compat).
- [ ] Failing test first: smoke run currently FAILS loudly on the nex+lambda01 'hi' scenario.
- [ ] `cargo build` and `cargo test` pass.
- [ ] Manual: run the augmented smoke from a clean checkout; the nex scenario fails clearly with a stack trace; once wg-nex-native-2 lands, a re-run shows pass.
- [ ] Doc: README points at the smoke script and clearly states 'this MUST be run live against real endpoints; no stubs.'
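
The 'hi' scenario in the checklist could be driven by a loop like the one below. It is a sketch under stated assumptions: the task does not specify how a chat message is sent (only "IPC CreateChat or similar"), so the sender is passed in as an arbitrary command that takes the message as its last argument and prints the reply. The name `assert_replies`, the follow-up message text, and the rule "exit status 0 plus non-empty output means non-error" are hypothetical; a real check would probably also inspect the reply's role. The `timeout` utility from GNU coreutils is assumed to be available.

```shell
# Sketch only: "$@" stands in for whatever command sends one chat message
# and prints the reply; the real IPC call is not specified in this task.
# assert_replies <count> <sender-command...>
assert_replies() {
  count=$1
  shift
  i=1
  while [ "$i" -le "$count" ]; do
    # The first message is the literal string 'hi', as in the user's report.
    if [ "$i" -eq 1 ]; then msg='hi'; else msg="message $i"; fi
    # Fail if the sender exits non-zero or does not finish within 30s.
    if ! reply=$(timeout 30 "$@" "$msg"); then
      echo "FAIL: message $i errored or timed out" >&2
      return 1
    fi
    # An empty reply is treated as an error as well.
    if [ -z "$reply" ]; then
      echo "FAIL: message $i got an empty reply" >&2
      return 1
    fi
    i=$((i + 1))
  done
  echo "PASS: $count/$count messages answered"
}
```

Because the sender is just a command, the same loop can drive both the nex and the Claude scenarios; only the command passed in would differ.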

Depends on

done .assign-smoke-test-gap

Required by

(none)

Log

2026-04-26T17:27:06.067203982+00:00 Task paused
2026-04-26T17:27:06.117144156+00:00 Task published
2026-04-26T17:27:30.153496975+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer's meticulous tradeoff is critical for fixing a smoke test that previously missed real bugs; requires implementing live-endpoint test scenarios with strict validation assertions.
2026-04-26T17:31:07.279837308+00:00 Spawned by coordinator --executor claude --model opus
2026-04-26T17:32:28.886466931+00:00 TRIAGE: failed dep wg-nex-native-2 was a duplicate of in-progress wg-nex-native (agent-62 active). Adding sequencing edge: wg-nex-native-2 must come after wg-nex-native, then retrying it. Requeuing self.
2026-04-26T17:33:31.091385634+00:00 Requeued (triage 1/3): Created sequencing fix for failed dep wg-nex-native-2: now blocked on wg-nex-native (in-progress, agent-62). Description updated to require Programmer role. Retried wg-nex-native-2.
2026-04-26T18:30:55.326420351+00:00 Spawned by coordinator --executor claude --model opus
2026-04-26T18:30:57.479557896+00:00 session not resumable, falling back to fresh session
2026-04-26T18:31:16.636386486+00:00 Starting: reading existing wave-1 smoke implementation to find stubs/skips
2026-04-26T18:33:54.534606476+00:00 Reviewed wave-1-smoke.sh: scenarios 1-5 use stubs/fakes. Scenario 1 uses --no-coordinator-agent (no coordinator spawn). Scenario 2 uses fake LLM — never hits real endpoint. Will add live scenarios 6/7 hitting lambda01 and claude CLI.
2026-04-26T18:53:24.785519756+00:00 Augmented wave-1-smoke.sh with scenario 6 (Nex LIVE against lambda01) and scenario 7 (Claude LIVE). Loud-skip banner for unreachable endpoints. Verified: pre-fix binary FAILS scenario 6 with role=system-error/404 stack trace; post-fix binary PASSES with 5/5 coordinator responses against qwen3-coder@lambda01. README updated to spell out 'live; no stubs'.
2026-04-26T18:53:52.474899095+00:00 Committed: c3ed0139f — pushed to origin wg/agent-75/smoke-test-gap
2026-04-26T18:53:55.500661019+00:00 Task marked as done