smoke-test-gap

Smoke test gap: wave-1-integration-smoke did not catch wg nex breakage; must include user-endpoint live smoke

Metadata

Status: done
Assigned: agent-75
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created: 2026-04-26T17:27:06.068770431+00:00
Started: 2026-04-26T18:30:55.326417526+00:00
Completed: 2026-04-26T18:53:55.500654076+00:00
Tags: eval-scheduled
Tokens: 449356 in / 6945 out
Eval score: 0.84
└ blocking impact: 0.85
└ completeness: 0.85
└ coordination overhead: 0.85
└ correctness: 0.86
└ downstream usability: 0.82
└ efficiency: 0.85
└ intent fidelity: 0.89
└ style adherence: 0.80

Description

wave-1-integration-smoke was marked Done, but it did NOT catch the wg nex breakage that the user hit on the very first message in TUI chat. The user's quote: 'why isn't the smoke test catching all this stuff! i did the most basic thing i wrote hi and then it barfed.'

The smoke skipped (or stubbed) the nex+lambda01 scenario instead of exercising it for real. That defeats the whole point of an assertion-driven live smoke per the project memory pattern.

Required scenarios the smoke MUST cover (and FAIL loudly when broken)

  1. Nex against the user's real endpoint — use the same flow the user reported:

    rm -rf /tmp/wg-smoke && mkdir /tmp/wg-smoke && cd /tmp/wg-smoke
    wg init -m qwen3-coder -e https://lambda01.tail334fe6.ts.net:30000 -x nex
    wg service start
    # programmatically open a chat session (via IPC CreateChat or similar — match what 'wg tui' does)
    # send the literal string 'hi' as the first message
    # assert a non-error response is received within 30s
    # send 4 more messages back-to-back; assert all 5 succeed
    

    If the lambda01 endpoint is unreachable from CI or the smoke env, the smoke fails loudly with 'NEX SMOKE SKIPPED — endpoint unreachable' so it shows up in the run output. Do NOT silently skip.

  2. Claude end-to-end via TUI — same pattern: spawn a chat agent, send 'hi', assert response.

  3. Both with default config — no special bypass to make the test pass. If the default config breaks, the smoke breaks.
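The message loop shared by scenarios 1 and 2 (send 'hi', then four more messages, and require every reply to arrive without error within 30 seconds) might be sketched as below. This is a minimal sketch under stated assumptions: `run_chat_scenario` and the `SENDER` argument are hypothetical names, not part of wg, and `SENDER` stands in for whatever executable wraps the real IPC call. It also assumes `timeout` from GNU coreutils is available.

```shell
#!/usr/bin/env bash
# Sketch only. SENDER is a hypothetical executable that posts one message to an
# open chat session and prints the reply; it is not a real wg subcommand.

# Usage: run_chat_scenario SENDER MESSAGE...
# Prints one PASS/FAIL line per message and returns nonzero on the first failure.
run_chat_scenario() {
  local sender="$1"
  shift
  local n=0 msg reply
  for msg in "$@"; do
    n=$((n + 1))
    # timeout(1) from GNU coreutils kills the sender after 30 seconds.
    if ! reply="$(timeout 30 "$sender" "$msg")"; then
      echo "FAIL message $n: sender errored or timed out"
      return 1
    fi
    # A successful exit with no text still counts as a failure.
    if [ -z "$reply" ]; then
      echo "FAIL message $n: empty reply"
      return 1
    fi
    echo "PASS message $n"
  done
  return 0
}
```

A call such as `run_chat_scenario ./send-chat hi two three four five` (where `./send-chat` is again a placeholder) would cover the five-message requirement, and the nonzero return value lets the surrounding smoke script stop on the first broken reply.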

Why the previous smoke missed it

Whoever implemented wave-1-integration-smoke likely used stub endpoints or mocks, or skipped the live-endpoint scenarios. Read the smoke script and identify which scenarios were stubbed or skipped, then replace them with live invocations against real endpoints.
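As a first pass over the existing script, a keyword search can point at lines worth reading by hand. The sketch below is only a heuristic: `find_stub_markers` is a hypothetical helper, the keyword list is a guess at what a stubbed or skipped scenario might contain, and the path to the smoke script is left to the caller because it is not given here.

```shell
#!/usr/bin/env bash
# Sketch only: a heuristic scan, not a replacement for reading the script.

# Usage: find_stub_markers PATH_TO_SMOKE_SCRIPT
# Prints "line-number:text" for every line that hints at stubbing or skipping.
# Exits nonzero when nothing matches (standard grep behaviour).
find_stub_markers() {
  grep -n -i -E 'stub|mock|skip|localhost|127\.0\.0\.1' "$1"
}
```

Every hit still needs a manual check, since a match on `skip` may be a legitimate loud-skip message rather than a silently bypassed scenario.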

Reproducibility / verification (HARD GATE)

Before claiming done, run the augmented smoke locally and demonstrate that it:

  • Catches the current wg-nex-native breakage (smoke should FAIL until wg-nex-native-2 lands, then pass)
  • Shows pass/fail/skip output that's greppable
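One way to keep the output greppable is to emit exactly one fixed-format line per scenario and to treat a skip as a failure for the purpose of the exit code. The helper below is a sketch under assumptions: the `SMOKE` prefix, the `report` function, the `SMOKE_FAILED` variable, and the scenario names are all illustrative choices, not existing conventions in the smoke script.

```shell
#!/usr/bin/env bash
# Sketch only: the SMOKE prefix and scenario names are illustrative choices.
SMOKE_FAILED=0

# Usage: report STATUS SCENARIO [DETAIL]
# Emits one line per scenario, e.g. "SMOKE FAIL nex-lambda01-hi: <detail>".
report() {
  local status="$1" scenario="$2" detail="${3:-}"
  echo "SMOKE $status $scenario${detail:+: $detail}"
  # A skip counts as a failure so it cannot pass silently.
  case "$status" in
    FAIL|SKIP) SMOKE_FAILED=1 ;;
  esac
}
```

With this shape, `grep -E '^SMOKE (FAIL|SKIP)' smoke.log` lists every problem in a captured run, and the script can end with `exit "$SMOKE_FAILED"` so CI sees a nonzero status. Call `report` directly rather than inside `$(...)`, since a subshell would lose the update to `SMOKE_FAILED`.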

Validation

  • Read existing wave-1 smoke implementation; identify scenarios that were stubbed/skipped.
  • Replace stubs with live invocations against the documented endpoints (claude CLI, lambda01 OAI-compat).
  • Failing test first: smoke run currently FAILS loudly on the nex+lambda01 'hi' scenario.
  • cargo build + cargo test pass.
  • Manual: run the augmented smoke from a clean checkout; nex scenario fails clearly with stack trace; once wg-nex-native-2 lands, re-run shows pass.
  • Doc: README points at the smoke script and clearly states 'this MUST be run live against real endpoints; no stubs.'

Depends on

Required by

Log