Metadata
| Status | done |
|---|---|
| Assigned | agent-102 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-04-26T20:00:02.907277292+00:00 |
| Started | 2026-04-26T20:01:44.444708556+00:00 |
| Completed | 2026-04-26T21:01:42.324052337+00:00 |
| Tags | eval-scheduled |
| Eval score | 0.90 |
| └ blocking impact | 0.90 |
| └ completeness | 0.95 |
| └ coordination overhead | 0.92 |
| └ correctness | 0.92 |
| └ downstream usability | 0.90 |
| └ efficiency | 0.88 |
| └ intent fidelity | 0.88 |
| └ style adherence | 0.92 |
Description
Multiple recent 'done' tasks regressed live behavior:
- wg-nex-native (still broken; user repro'd same fault after 'fix')
- rename-dispatcher-daemon (resurrected Bug A orphan ghost)
- wire-priority-field (priority schema mid-migration; integer in graph.jsonl breaks readers)
- model-is-not (the 'done' itself was good, but it exposed downstream routing breakage that a smoke would have caught)
Pattern: agent claims done, smoke is theater, regression ships, user hits it. The smoke-test-gap task is supposed to fix this but evidently the smoke still doesn't have teeth.
Spec
The smoke must be a HARD GATE on wg done, not just a recommendation. If a defined smoke scenario fails, wg done rejects with the specific scenario that broke.
- Define a structured smoke manifest (e.g. tests/smoke/manifest.toml) listing scenarios with: name, repro script, assertion, owner-task (which task this scenario protects).
- `wg done <task>` runs all scenarios where the task is in the owner-task list, OR all scenarios if --full-smoke. If any fail, `wg done` exits non-zero with which scenarios failed.
- Agents (per CLAUDE.md and the agent prompt) MUST run the smoke before `wg done`. Add this as a hard requirement in the agent prompt template.
- Smoke runs against LIVE binaries and LIVE endpoints (no stubs). If an endpoint is unreachable, scenario emits SKIP loud (not silent).
- The smoke manifest is grow-only: every regression in this session adds a scenario. wg-nex-native bug, Bug A ghost, model routing — each gets a permanent scenario.
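
The selection and gating rules above can be sketched in Rust roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/smoke.rs: the `Scenario`, `Outcome`, `select`, and `gate` names and fields are hypothetical, manifest parsing is left out, and the scenario runner is passed in as a closure so the gate logic can be read on its own.

```rust
/// Result of running one smoke scenario (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Pass,
    /// The scenario's assertion failed; the message says why.
    Fail(String),
    /// The scenario could not run (e.g. endpoint unreachable); reported loudly.
    Skip(String),
}

/// One manifest entry. Field names are assumptions, not the real schema.
#[derive(Debug, Clone)]
pub struct Scenario {
    pub name: String,
    pub owner_tasks: Vec<String>,
}

/// Pick the scenarios that protect `task`, or every scenario under --full-smoke.
pub fn select<'a>(manifest: &'a [Scenario], task: &str, full_smoke: bool) -> Vec<&'a Scenario> {
    manifest
        .iter()
        .filter(|s| full_smoke || s.owner_tasks.iter().any(|t| t == task))
        .collect()
}

/// Hard gate: Ok(names of skipped scenarios) when nothing failed,
/// Err(one "name: reason" line per failed scenario) otherwise.
pub fn gate<F>(selected: &[&Scenario], mut run: F) -> Result<Vec<String>, Vec<String>>
where
    F: FnMut(&Scenario) -> Outcome,
{
    let mut failed = Vec::new();
    let mut skipped = Vec::new();
    for s in selected.iter().copied() {
        match run(s) {
            Outcome::Pass => {}
            Outcome::Fail(why) => failed.push(format!("{}: {}", s.name, why)),
            Outcome::Skip(why) => {
                // A skip never passes silently: it is printed and returned.
                eprintln!("SKIP {}: {}", s.name, why);
                skipped.push(s.name.clone());
            }
        }
    }
    if failed.is_empty() {
        Ok(skipped)
    } else {
        Err(failed)
    }
}
```

In this sketch a caller would map `Err(failed)` to a non-zero exit that lists each failed scenario, and would surface the skipped names alongside a successful `done`.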
Required initial scenarios
- nex_two_message_against_lambda01 — repro the exact 'wg init -x nex -m qwen3-coder -e https://lambda01...; wg tui; send 'hi'' user smoke
- dispatcher_boot_no_orphan_supervisor — Bug A regression check (covered by bug-a-regression-test task; reuse)
- claude_executor_with_global_openrouter_default — local=claude + global openrouter is_default → spawn metadata shows executor=claude, no native-exec
- priority_int_and_string_deserialize — graph.jsonl with both `'priority':10` and `'priority':'high'` reads cleanly
- chat_create_via_ipc_works — open chat, send 'hi', get response within 30s
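
As an illustration of what priority_int_and_string_deserialize is meant to protect, the sketch below accepts a priority written either as an integer or as a string. It is a std-only approximation under assumptions: `Priority` and `parse_priority` are hypothetical names, the input is the raw JSON value text of the field (with standard JSON double quotes), and string escapes are rejected instead of decoded. The project's actual reader is not shown in this task and may work differently.

```rust
/// Priority as it may appear in graph.jsonl while the schema is mid-migration.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Priority {
    Int(i64),
    Label(String),
}

/// Parse the raw JSON value text of a `priority` field.
/// Accepts an integer such as `10` or a quoted string such as `"high"`.
/// Returns None for anything else, including strings that contain escapes.
pub fn parse_priority(raw: &str) -> Option<Priority> {
    let raw = raw.trim();

    // Integer form: the whole token must parse as an i64.
    if let Ok(n) = raw.parse::<i64>() {
        return Some(Priority::Int(n));
    }

    // String form: must be wrapped in double quotes.
    let inner = raw.strip_prefix('"')?.strip_suffix('"')?;
    if inner.is_empty() || inner.contains('"') || inner.contains('\\') {
        return None;
    }
    Some(Priority::Label(inner.to_string()))
}
```

In a serde-based reader the same tolerance is commonly expressed with an untagged enum; whether this codebase does so is not stated here.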
Out of scope
- Replacing the existing wave-1 smoke (extend it, don't replace)
- CI integration (this task is the local-gate; CI plumbing is follow-up)
Validation
- Failing tests first: test_done_blocks_when_smoke_scenario_fails, test_done_succeeds_when_all_owned_scenarios_pass
- Implementation: smoke manifest + wg done integration + agent prompt update
- cargo build + cargo test pass with no regressions
- Manual: try `wg done <task>` in scratch dir while a known scenario is broken; assert wg done refuses with a specific failure message
- Manual: all 5 initial scenarios above run live against real endpoints, with clear SKIP on unreachable endpoints
Depends on
Required by
- (none)
Log
- 2026-04-26T20:00:02.901543890+00:00 Task paused
- 2026-04-26T20:00:46.009667983+00:00 Task published
- 2026-04-26T20:01:42.527056154+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=graph, reason=Careful Programmer best fits correctness-critical infrastructure (smoke gate blocks task completion); Careful tradeoff suits high-risk validation work with live endpoint testing; 41-task track record demonstrates reliable implementation execution.
- 2026-04-26T20:01:44.444715489+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-26T20:01:57.376659621+00:00 Starting: planning approach — manifest + wg done integration + tests
- 2026-04-26T20:49:59.030502498+00:00 Implemented smoke gate: src/smoke.rs (manifest loader + scenario runner), wired into wg done as a hard gate before status transitions. Added --full-smoke (run all scenarios) and --skip-smoke (refused for agents unless WG_SMOKE_AGENT_OVERRIDE=1).
- 2026-04-26T20:49:59.039200680+00:00 Added 5 initial scenarios under tests/smoke/scenarios/ + manifest. Live-verified against the installed wg binary: 4/5 PASS (including nex_two_message against lambda01), 1 loud-SKIP for missing LLM creds. End-to-end: gate refused done with broken scenario, allowed done with --skip-smoke (human) and loud warn, blocked --skip-smoke for agent.
- 2026-04-26T20:49:59.052860111+00:00 Tests: 6 integration tests in tests/integration_smoke_gate.rs + 8 unit tests in src/smoke.rs all pass. cargo test full suite: 1 pre-existing failure (provenance_full_lifecycle_all_ops_recorded), confirmed pre-existing on clean checkout. No regressions caused by this change.
- 2026-04-26T20:49:59.064616167+00:00 Updated executor.rs prompt template (REQUIRED_WORKFLOW_SECTION) and CLAUDE.md so future agents see the smoke gate as part of the wg done contract.
- 2026-04-26T21:01:34.397106179+00:00 Committed: 9d1e60295 — pushed to origin wg/agent-102/smoke-gate-is
- 2026-04-26T21:01:42.324068737+00:00 Task marked as done
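
The --skip-smoke behavior recorded in the log (a human may skip with a loud warning; an agent is refused unless WG_SMOKE_AGENT_OVERRIDE=1) reduces to a small decision function. The sketch below is an assumption-laden illustration: `SkipDecision` and `decide_skip` are hypothetical names, and how the real code detects that the caller is an agent is not described in this task, so that is passed in as a plain boolean.

```rust
/// Outcome of a --skip-smoke request (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum SkipDecision {
    /// Skip is permitted, but the caller should print a loud warning.
    AllowedWithWarning,
    /// Skip is refused; the smoke gate must run.
    Refused,
}

/// Decide whether --skip-smoke is honored.
/// `override_value` is the value of WG_SMOKE_AGENT_OVERRIDE, if it is set.
pub fn decide_skip(is_agent: bool, override_value: Option<&str>) -> SkipDecision {
    if !is_agent || override_value == Some("1") {
        SkipDecision::AllowedWithWarning
    } else {
        SkipDecision::Refused
    }
}
```

A caller could obtain the second argument with `std::env::var("WG_SMOKE_AGENT_OVERRIDE").ok()` and pass it via `as_deref()`.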