Metadata
| Status | done |
|---|---|
| Assigned | agent-9 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-04-26T16:00:22.192009760+00:00 |
| Started | 2026-04-26T16:03:44.274732291+00:00 |
| Completed | 2026-04-26T16:13:53.190021390+00:00 |
| Tags | eval-scheduled |
| Eval score | 0.75 |
| └ blocking impact | 0.70 |
| └ completeness | 0.70 |
| └ coordination overhead | 0.82 |
| └ correctness | 0.76 |
| └ downstream usability | 0.70 |
| └ efficiency | 0.75 |
| └ intent fidelity | 0.84 |
| └ style adherence | 0.85 |
Description
Description
After this wave of fixes lands (Bug A ghost coordinator already fixed; pending: tui-new-coord-dialog, launcher-history, wg-nex-native, stale-model-alias, rename-dispatcher-daemon, wg-setup-5-smooth-2, remove-validation-cli, coordinator-inotify-graph), we need a single integration smoke that exercises the whole stack against a live binary and asserts behavior — not just 'the build succeeds' or 'unit tests pass.'
Per the assertion-driven-live-smoke memory: versioned scripts + behavioral assertions + live endpoint. The only pattern that catches design bugs before they ship.
Scenarios to cover
-
Claude end-to-end:
- rm -rf scratch && cd scratch
wg init -x claudewg service start- Assert daemon log shows NO 'Coordinator-0' phantom (Bug A regression check)
wg add 'echo hello'→wg publish <id>→ wait for done → assert task status=done within 60s- Open
wg tuinon-interactively (or via expect/pexpect), create coordinator named 'test', send a message, assert response received within 30s
-
Nex end-to-end:
- Same scratch dir flow with
wg init -x nex -m <local-model> -e <local-endpoint>(or skip if no local endpoint configured — emit clear SKIP, not failure) - Send TWO messages back-to-back, assert both responses received (Bug 'wg nex breaks after one message' regression check)
- Same scratch dir flow with
-
Setup routes (after wg-setup-5-smooth-2 lands):
wg setup --route claude-cli --yesproduces complete configwg setup --route openrouter --api-key-env OPENROUTER_API_KEY --yesproduces complete config- Assert no empty [tiers] in either
-
Launcher history flow:
- CLI
wg nex -m foo -e http://example(one message, quit) - Open TUI, assert new-coordinator dialog offers 'nex / foo / example' as recall
- CLI
-
Model alias resolution:
wg add 'x' --model claude:sonnet→wg show <id>reports current sonnet 4.6 id, not the dated 4.0 string
Form
- Bash script in tests/integration/wave-1-smoke.sh (or scripts/smoke/wave-1.sh)
- Each scenario is a function with one-line description, performs the steps, asserts the outcome with grep/jq/wg show, exits non-zero on first failure with a clear 'SCENARIO N failed: ' message.
- Skip cleanly when prerequisites unavailable (e.g. no local OAI endpoint for nex scenario) — print SKIP, continue.
- Run from CI ideally; for now, doc says 'run before merging any wave-1 task.'
Out of scope
- Performance benchmarking (separate concern)
- TUI mouse / scroll testing (covered by the dialog task; if hard to script, exercise it manually in this smoke and document)
Validation
- Failing test first: literal — the smoke script exists and runs end-to-end against a freshly-built wg binary, asserting all scenarios pass on a clean tree.
- cargo build + cargo test pass with no regressions
- Manual: run the smoke script from a clean checkout; all non-SKIP scenarios pass; output is greppable/CI-friendly
- Doc note added to README.md or docs/ pointing at the smoke script and saying 'run this after any wave-1 task lands'
Depends on
Required by
- (none)
Log
- 2026-04-26T16:00:22.191718913+00:00 Task paused
- 2026-04-26T16:03:14.701285896+00:00 Task published
- 2026-04-26T16:03:44.274735297+00:00 Spawned by coordinator --executor claude --model claude-opus-4-6
- 2026-04-26T16:03:52.223758477+00:00 Starting implementation of wave-1 integration smoke test script
- 2026-04-26T16:05:41.563868584+00:00 Writing scripts/smoke/wave-1-smoke.sh — 5 scenarios per task description
- 2026-04-26T16:13:08.294432471+00:00 Validated: smoke script runs e2e — 3 PASS, 1 FAIL (known: empty [tiers] needs wg-setup-5-smooth-2), 1 SKIP (launcher-history not landed). cargo build passes. Pre-existing test failure in provenance_full_lifecycle unrelated to changes.
- 2026-04-26T16:13:41.640040009+00:00 Committed: d838e4b4a — pushed to remote
- 2026-04-26T16:13:53.190027953+00:00 Task marked as done