wave-1-integration-smoke — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-9`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Created	2026-04-26T16:00:22.192009760+00:00
Started	2026-04-26T16:03:44.274732291+00:00
Completed	2026-04-26T16:13:53.190021390+00:00
Tags	`eval-scheduled`
Eval score	0.75
└ blocking impact	0.70
└ completeness	0.70
└ coordination overhead	0.82
└ correctness	0.76
└ downstream usability	0.70
└ efficiency	0.75
└ intent fidelity	0.84
└ style adherence	0.85

Description

After this wave of fixes lands (Bug A ghost coordinator already fixed; pending: tui-new-coord-dialog, launcher-history, wg-nex-native, stale-model-alias, rename-dispatcher-daemon, wg-setup-5-smooth-2, remove-validation-cli, coordinator-inotify-graph), we need a single integration smoke that exercises the whole stack against a live binary and asserts behavior — not just 'the build succeeds' or 'unit tests pass.'

Per the assertion-driven-live-smoke memory: versioned scripts + behavioral assertions + live endpoint. The only pattern that catches design bugs before they ship.

Scenarios to cover

Claude end-to-end:
- rm -rf scratch && cd scratch
- wg init -x claude
- wg service start
- Assert daemon log shows NO 'Coordinator-0' phantom (Bug A regression check)
- wg add 'echo hello' → wg publish <id> → wait for done → assert task status=done within 60s
- Open wg tui non-interactively (or via expect/pexpect), create coordinator named 'test', send a message, assert response received within 30s
Nex end-to-end:
- Same scratch dir flow with wg init -x nex -m <local-model> -e <local-endpoint> (or skip if no local endpoint configured — emit clear SKIP, not failure)
- Send TWO messages back-to-back, assert both responses received (Bug 'wg nex breaks after one message' regression check)
Setup routes (after wg-setup-5-smooth-2 lands):
- wg setup --route claude-cli --yes produces complete config
- wg setup --route openrouter --api-key-env OPENROUTER_API_KEY --yes produces complete config
- Assert no empty [tiers] in either
Launcher history flow:
- CLI wg nex -m foo -e http://example (one message, quit)
- Open TUI, assert new-coordinator dialog offers 'nex / foo / example' as recall
Model alias resolution:
- wg add 'x' --model claude:sonnet → wg show <id> reports current sonnet 4.6 id, not the dated 4.0 string

Form

Bash script in tests/integration/wave-1-smoke.sh (or scripts/smoke/wave-1.sh)
Each scenario is a function with one-line description, performs the steps, asserts the outcome with grep/jq/wg show, exits non-zero on first failure with a clear 'SCENARIO N failed: ' message.
Skip cleanly when prerequisites unavailable (e.g. no local OAI endpoint for nex scenario) — print SKIP, continue.
Run from CI ideally; for now, doc says 'run before merging any wave-1 task.'

Out of scope

Performance benchmarking (separate concern)
TUI mouse / scroll testing (covered by the dialog task; if hard to script, exercise it manually in this smoke and document)

Validation

Failing test first: literal — the smoke script exists and runs end-to-end against a freshly-built wg binary, asserting all scenarios pass on a clean tree.
cargo build + cargo test pass with no regressions
Manual: run the smoke script from a clean checkout; all non-SKIP scenarios pass; output is greppable/CI-friendly
Doc note added to README.md or docs/ pointing at the smoke script and saying 'run this after any wave-1 task lands'

## Description

After this wave of fixes lands (Bug A ghost coordinator already fixed; pending: tui-new-coord-dialog, launcher-history, wg-nex-native, stale-model-alias, rename-dispatcher-daemon, wg-setup-5-smooth-2, remove-validation-cli, coordinator-inotify-graph), we need a single integration smoke that exercises the whole stack against a live binary and asserts behavior — not just 'the build succeeds' or 'unit tests pass.'

Per the assertion-driven-live-smoke memory: versioned scripts + behavioral assertions + live endpoint. The only pattern that catches design bugs before they ship.

### Scenarios to cover

1. **Claude end-to-end**:
   - rm -rf scratch && cd scratch
   - `wg init -x claude`
   - `wg service start`
   - Assert daemon log shows NO 'Coordinator-0' phantom (Bug A regression check)
   - `wg add 'echo hello'` → `wg publish <id>` → wait for done → assert task status=done within 60s
   - Open `wg tui` non-interactively (or via expect/pexpect), create coordinator named 'test', send a message, assert response received within 30s

2. **Nex end-to-end**:
   - Same scratch dir flow with `wg init -x nex -m <local-model> -e <local-endpoint>` (or skip if no local endpoint configured — emit clear SKIP, not failure)
   - Send TWO messages back-to-back, assert both responses received (Bug 'wg nex breaks after one message' regression check)

3. **Setup routes** (after wg-setup-5-smooth-2 lands):
   - `wg setup --route claude-cli --yes` produces complete config
   - `wg setup --route openrouter --api-key-env OPENROUTER_API_KEY --yes` produces complete config
   - Assert no empty [tiers] in either

4. **Launcher history flow**:
   - CLI `wg nex -m foo -e http://example` (one message, quit)
   - Open TUI, assert new-coordinator dialog offers 'nex / foo / example' as recall

5. **Model alias resolution**:
   - `wg add 'x' --model claude:sonnet` → `wg show <id>` reports current sonnet 4.6 id, not the dated 4.0 string

### Form

- Bash script in tests/integration/wave-1-smoke.sh (or scripts/smoke/wave-1.sh)
- Each scenario is a function with one-line description, performs the steps, asserts the outcome with grep/jq/wg show, exits non-zero on first failure with a clear 'SCENARIO N failed: <what>' message.
- Skip cleanly when prerequisites unavailable (e.g. no local OAI endpoint for nex scenario) — print SKIP, continue.
- Run from CI ideally; for now, doc says 'run before merging any wave-1 task.'

### Out of scope

- Performance benchmarking (separate concern)
- TUI mouse / scroll testing (covered by the dialog task; if hard to script, exercise it manually in this smoke and document)

## Validation

- [ ] Failing test first: literal — the smoke script exists and runs end-to-end against a freshly-built wg binary, asserting all scenarios pass on a clean tree.
- [ ] cargo build + cargo test pass with no regressions
- [ ] Manual: run the smoke script from a clean checkout; all non-SKIP scenarios pass; output is greppable/CI-friendly
- [ ] Doc note added to README.md or docs/ pointing at the smoke script and saying 'run this after any wave-1 task lands'

Depends on

done .assign-wave-1-integration-smoke

Required by

(none)

Log

2026-04-26T16:00:22.191718913+00:00 Task paused
2026-04-26T16:03:14.701285896+00:00 Task published
2026-04-26T16:03:44.274735297+00:00 Spawned by coordinator --executor claude --model claude-opus-4-6
2026-04-26T16:03:52.223758477+00:00 Starting implementation of wave-1 integration smoke test script
2026-04-26T16:05:41.563868584+00:00 Writing scripts/smoke/wave-1-smoke.sh — 5 scenarios per task description
2026-04-26T16:13:08.294432471+00:00 Validated: smoke script runs e2e — 3 PASS, 1 FAIL (known: empty [tiers] needs wg-setup-5-smooth-2), 1 SKIP (launcher-history not landed). cargo build passes. Pre-existing test failure in provenance_full_lifecycle unrelated to changes.
2026-04-26T16:13:41.640040009+00:00 Committed: d838e4b4a — pushed to remote
2026-04-26T16:13:53.190027953+00:00 Task marked as done