wave-1-integration-smoke

Wave 1 integration smoke test: assertion-driven live coverage of TUI + claude + coordinator + nex paths

Metadata

Status: done
Assigned: agent-9
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created: 2026-04-26T16:00:22.192009760+00:00
Started: 2026-04-26T16:03:44.274732291+00:00
Completed: 2026-04-26T16:13:53.190021390+00:00
Tags: eval-scheduled
Eval score: 0.75
└ blocking impact: 0.70
└ completeness: 0.70
└ coordination overhead: 0.82
└ correctness: 0.76
└ downstream usability: 0.70
└ efficiency: 0.75
└ intent fidelity: 0.84
└ style adherence: 0.85

Description

After this wave of fixes lands (Bug A ghost coordinator already fixed; pending: tui-new-coord-dialog, launcher-history, wg-nex-native, stale-model-alias, rename-dispatcher-daemon, wg-setup-5-smooth-2, remove-validation-cli, coordinator-inotify-graph), we need a single integration smoke that exercises the whole stack against a live binary and asserts behavior — not just 'the build succeeds' or 'unit tests pass.'

Per the assertion-driven-live-smoke memory: versioned scripts + behavioral assertions + live endpoint. This is the only pattern that catches design bugs before they ship.

Scenarios to cover

  1. Claude end-to-end:

    • rm -rf scratch && cd scratch
    • wg init -x claude
    • wg service start
    • Assert daemon log shows NO 'Coordinator-0' phantom (Bug A regression check)
    • wg add 'echo hello' → wg publish <id> → wait for done → assert task status=done within 60s
    • Open wg tui non-interactively (or via expect/pexpect), create coordinator named 'test', send a message, assert response received within 30s
  2. Nex end-to-end:

    • Same scratch dir flow with wg init -x nex -m <local-model> -e <local-endpoint> (or skip if no local endpoint configured — emit clear SKIP, not failure)
    • Send TWO messages back-to-back, assert both responses received (Bug 'wg nex breaks after one message' regression check)
  3. Setup routes (after wg-setup-5-smooth-2 lands):

    • wg setup --route claude-cli --yes produces complete config
    • wg setup --route openrouter --api-key-env OPENROUTER_API_KEY --yes produces complete config
    • Assert no empty [tiers] in either
  4. Launcher history flow:

    • CLI wg nex -m foo -e http://example (one message, quit)
    • Open TUI, assert new-coordinator dialog offers 'nex / foo / example' as recall
  5. Model alias resolution:

    • wg add 'x' --model claude:sonnet → wg show <id> reports current sonnet 4.6 id, not the dated 4.0 string
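
Several scenarios above share the same shape: trigger something, then assert an outcome within a deadline (60s for task completion, 30s for a coordinator response). A minimal sketch of that polling step follows. The helper names wait_until and task_is_done are hypothetical, and the grep pattern assumes that wg show <id> prints the word "status" followed by "done" on one line; the real output format may differ, so the pattern should be adjusted against the live binary.

```bash
#!/usr/bin/env bash

# wait_until TIMEOUT_SECONDS CMD [ARGS...]
# Re-run CMD about once per second until it succeeds.
# Returns 0 as soon as CMD succeeds, 1 if TIMEOUT_SECONDS elapse first.
wait_until() {
  local timeout=$1
  shift
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep 1
  done
}

# Hypothetical predicate for "task is done".
# ASSUMPTION: `wg show <id>` prints a line matching "status ... done";
# this output format is not confirmed by the task description.
task_is_done() {
  wg show "$1" 2>/dev/null | grep -qiE 'status.*done'
}

# Usage inside a scenario function (capturing the task id is left to the
# real script, since the output of `wg add` is not specified here):
#   wait_until 60 task_is_done "$task_id" || return 1
```

Keeping the deadline logic in one helper means each scenario states only its predicate and its time budget, which keeps the per-scenario assertions short and greppable.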

Form

  • Bash script in tests/integration/wave-1-smoke.sh (or scripts/smoke/wave-1.sh)
  • Each scenario is a function with a one-line description that performs the steps, asserts the outcome with grep/jq/wg show, and exits non-zero on first failure with a clear 'SCENARIO N failed: ' message.
  • Skip cleanly when prerequisites unavailable (e.g. no local OAI endpoint for nex scenario) — print SKIP, continue.
  • Run from CI ideally; for now, doc says 'run before merging any wave-1 task.'
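
The form described above could be sketched roughly as follows. This is an illustration, not the shipped script: run_scenario, SKIP_RC, the return-code-77 convention for SKIP, and the NEX_ENDPOINT variable are all assumptions introduced here, and the scenario body is a placeholder.

```bash
#!/usr/bin/env bash
set -u

# ASSUMPTION: a scenario function returns 77 to signal SKIP.
# This convention is illustrative, not taken from the task description.
SKIP_RC=77

# run_scenario N DESCRIPTION FUNC
# Runs FUNC, prints one greppable line, and exits non-zero on first failure.
run_scenario() {
  local n=$1 desc=$2 fn=$3 rc=0
  "$fn" || rc=$?
  case $rc in
    0)          echo "SCENARIO $n PASS: $desc" ;;
    "$SKIP_RC") echo "SCENARIO $n SKIP: $desc" ;;
    *)          echo "SCENARIO $n failed: $desc" >&2
                exit 1 ;;
  esac
}

# Placeholder for scenario 2. NEX_ENDPOINT is a hypothetical variable
# standing in for "a local endpoint is configured".
scenario_nex() {
  [[ -n "${NEX_ENDPOINT:-}" ]] || return "$SKIP_RC"
  # ... real steps and assertions would go here ...
}

# Example driver line (commented out so that sourcing this file has no
# side effects):
#   run_scenario 2 "nex: two back-to-back messages" scenario_nex
```

Every scenario emits exactly one line beginning with "SCENARIO N", so CI logs can be filtered with a single grep, and a SKIP never changes the exit status.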

Out of scope

  • Performance benchmarking (separate concern)
  • TUI mouse / scroll testing (covered by the dialog task; if hard to script, exercise it manually in this smoke and document)

Validation

  • Failing test first: literal — the smoke script exists and runs end-to-end against a freshly-built wg binary, asserting all scenarios pass on a clean tree.
  • cargo build + cargo test pass with no regressions
  • Manual: run the smoke script from a clean checkout; all non-SKIP scenarios pass; output is greppable/CI-friendly
  • Doc note added to README.md or docs/ pointing at the smoke script and saying 'run this after any wave-1 task lands'

Depends on

Required by

Log