Metadata
| Status | done |
|---|---|
| Assigned | agent-75 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-04-26T17:27:06.068770431+00:00 |
| Started | 2026-04-26T18:30:55.326417526+00:00 |
| Completed | 2026-04-26T18:53:55.500654076+00:00 |
| Tags | eval-scheduled |
| Tokens | 449356 in / 6945 out |
| Eval score | 0.84 |
| └ blocking impact | 0.85 |
| └ completeness | 0.85 |
| └ coordination overhead | 0.85 |
| └ correctness | 0.86 |
| └ downstream usability | 0.82 |
| └ efficiency | 0.85 |
| └ intent fidelity | 0.89 |
| └ style adherence | 0.80 |
Description
wave-1-integration-smoke was claimed Done but it did NOT catch the wg nex breakage that the user immediately hit on the very first message in TUI chat. The user's quote: 'why isn't the smoke test catching all this stuff! i did the most basic thing i wrote hi and then it barfed.'
The smoke skipped (or stubbed) the nex+lambda01 scenario instead of exercising it for real. That defeats the whole point of an assertion-driven live smoke per the project memory pattern.
Required scenarios the smoke MUST cover (and FAIL loudly when broken)
- Nex against the user's real endpoint. Use the same flow the user reported:

      rm -rf /tmp/wg-smoke && mkdir /tmp/wg-smoke && cd /tmp/wg-smoke
      wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 -x nex
      wg service start
      # programmatically open a chat session (via IPC CreateChat or similar; match what 'wg tui' does)
      # send the literal string 'hi' as the first message
      # assert a non-error response is received within 30s
      # send 4 more messages back-to-back; assert all 5 succeed

  If the lambda01 endpoint is unreachable from CI or the smoke env, the smoke fails loudly with 'NEX SMOKE SKIPPED — endpoint unreachable' so it shows up in the run output. Do NOT silently skip.
- Claude end-to-end via TUI. Same pattern: spawn a chat agent, send 'hi', assert response.
- Both with default config. No special bypass to make the test pass. If the default config breaks, the smoke breaks.
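The "fail loudly, never silently skip" rule for the nex scenario could be sketched as a preflight check like the one below. This is only an illustration, not the code in the smoke script: the function names `endpoint_reachable` and `nex_preflight` are hypothetical, and probing with `curl` under a 5-second limit is an assumption. Only the banner text comes from the description.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only the banner wording is taken from the task description.

# Succeeds if the endpoint answers at all (any HTTP status) within 5 seconds.
# Using curl for the probe is an assumption, not necessarily what the smoke does.
endpoint_reachable() {
    curl --silent --output /dev/null --max-time 5 "$1"
}

# Prints a greppable banner on stderr and returns non-zero when the endpoint
# cannot be reached, so the caller fails instead of quietly moving on.
nex_preflight() {
    local url="$1"
    if endpoint_reachable "$url"; then
        echo "NEX SMOKE PREFLIGHT OK ($url)"
        return 0
    fi
    echo "NEX SMOKE SKIPPED — endpoint unreachable ($url)" >&2
    return 1
}
```

A caller might run `nex_preflight https://lambda01.tail334fe6.ts.net:30000 || exit 1` before starting the scenario, so an unreachable endpoint turns the whole run red rather than dropping the scenario.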
Why the previous smoke missed it
Whoever implemented wave-1-integration-smoke likely used stub endpoints, mocks, or skipped the live-endpoint scenarios. Read the smoke script and identify which scenarios were stubbed/skipped. Replace with live invocations against real endpoints.
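A quick first pass over the smoke script can be done with `grep`, as in the rough sketch below. The helper name `find_stub_hints` and the keyword list are assumptions chosen for illustration. The keywords mirror the terms used in this task (stub, mock, fake, skip) plus the `--no-coordinator-agent` flag mentioned in the log. Matches are only hints and still need to be read in context.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list numbered lines that hint at stubbed or skipped
# scenarios. The keyword list is an assumption, not an exhaustive rule.
find_stub_hints() {
    grep -nEi -e 'stub|mock|fake|skip|--no-coordinator-agent' -- "$1"
}
```

For example, `find_stub_hints wave-1-smoke.sh` (the path to the script is assumed) prints each matching line prefixed with its line number, and returns non-zero when nothing matches.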
Reproducibility / verification (HARD GATE)
Before claiming done: run the augmented smoke locally and demonstrate it:
- Catches the current wg-nex-native breakage (smoke should FAIL until wg-nex-native-2 lands, then pass)
- Shows pass/fail/skip output that's greppable
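One way to keep the pass/fail/skip output greppable is to print a fixed-prefix line per scenario plus a one-line summary, as sketched below. The `SMOKE` prefix, the function names `record` and `summary`, and the choice to count a skip as a failing run are all assumptions made for this illustration. The last of these follows the "do NOT silently skip" requirement above.

```shell
#!/usr/bin/env bash
# Hypothetical result bookkeeping; the line format is an assumption.
pass=0
fail=0
skip=0

# record STATUS NAME [DETAIL]: print one greppable line and bump a counter.
record() {
    local status="$1" name="$2" detail="${3:-}"
    case "$status" in
        PASS) pass=$((pass + 1)) ;;
        FAIL) fail=$((fail + 1)) ;;
        SKIP) skip=$((skip + 1)) ;;
        *) echo "unknown status: $status" >&2; return 2 ;;
    esac
    printf 'SMOKE %s %s%s\n' "$status" "$name" "${detail:+ ($detail)}"
}

# Print the totals; succeed only when nothing failed and nothing was skipped,
# so a skipped live scenario cannot produce a green run.
summary() {
    printf 'SMOKE SUMMARY pass=%d fail=%d skip=%d\n' "$pass" "$fail" "$skip"
    [ "$fail" -eq 0 ] && [ "$skip" -eq 0 ]
}
```

With this shape, `grep '^SMOKE FAIL'` or `grep '^SMOKE SKIP'` over the run output lists exactly the scenarios that need attention, and the script can end with `summary || exit 1`.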
Validation
- Read existing wave-1 smoke implementation; identify scenarios that were stubbed/skipped.
- Replace stubs with live invocations against the documented endpoints (claude CLI, lambda01 OAI-compat).
- Failing test first: smoke run currently FAILS loudly on the nex+lambda01 'hi' scenario.
- cargo build + cargo test pass.
- Manual: run the augmented smoke from a clean checkout; nex scenario fails clearly with stack trace; once wg-nex-native-2 lands, re-run shows pass.
- Doc: README points at the smoke script and clearly states 'this MUST be run live against real endpoints; no stubs.'
Depends on
Required by
- (none)
Log
- 2026-04-26T17:27:06.067203982+00:00 Task paused
- 2026-04-26T17:27:06.117144156+00:00 Task published
- 2026-04-26T17:27:30.153496975+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer's meticulous tradeoff is critical for fixing a smoke test that previously missed real bugs; requires implementing live-endpoint test scenarios with strict validation assertions.
- 2026-04-26T17:31:07.279837308+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-26T17:32:28.886466931+00:00 TRIAGE: failed dep wg-nex-native-2 was a duplicate of in-progress wg-nex-native (agent-62 active). Adding sequencing edge: wg-nex-native-2 must come after wg-nex-native, then retrying it. Requeuing self.
- 2026-04-26T17:33:31.091385634+00:00 Requeued (triage 1/3): Created sequencing fix for failed dep wg-nex-native-2: now blocked on wg-nex-native (in-progress, agent-62). Description updated to require Programmer role. Retried wg-nex-native-2.
- 2026-04-26T18:30:55.326420351+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-26T18:30:57.479557896+00:00 session not resumable, falling back to fresh session
- 2026-04-26T18:31:16.636386486+00:00 Starting: reading existing wave-1 smoke implementation to find stubs/skips
- 2026-04-26T18:33:54.534606476+00:00 Reviewed wave-1-smoke.sh: scenarios 1-5 use stubs/fakes. Scenario 1 uses --no-coordinator-agent (no coordinator spawn). Scenario 2 uses fake LLM — never hits real endpoint. Will add live scenarios 6/7 hitting lambda01 and claude CLI.
- 2026-04-26T18:53:24.785519756+00:00 Augmented wave-1-smoke.sh with scenario 6 (Nex LIVE against lambda01) and scenario 7 (Claude LIVE). Loud-skip banner for unreachable endpoints. Verified: pre-fix binary FAILS scenario 6 with role=system-error/404 stack trace; post-fix binary PASSES with 5/5 coordinator responses against qwen3-coder@lambda01. README updated to spell out 'live; no stubs'.
- 2026-04-26T18:53:52.474899095+00:00 Committed: c3ed0139f — pushed to origin wg/agent-75/smoke-test-gap
- 2026-04-26T18:53:55.500661019+00:00 Task marked as done