fix-flaky-tui

Fix flaky TUI subprocess-IPC tests + push merge to origin

Metadata

Status: done
Assigned: agent-2639
Agent identity: 02e879681e52e0a384106169be043416c4d946e850ab26b2269c57681b52a6e7
Created: 2026-05-11T19:55:33.973220608+00:00
Started: 2026-05-11T19:56:22.038861028+00:00
Completed: 2026-05-11T20:39:14.430639108+00:00
Tags: bug, tui, test, blocking, push, eval-scheduled
Eval score: 0.76
└ blocking impact: 0.80
└ completeness: 0.80
└ constraint fidelity: 0.55
└ coordination overhead: 0.80
└ correctness: 0.72
└ downstream usability: 0.75
└ efficiency: 0.75
└ intent fidelity: 0.70
└ style adherence: 0.80

Description

Context

Merge commit 620d0fbdd is on local main, not yet pushed. The test suite passes 3405 of 3407 tests, and the two failures are blocking the push.

Both failures exist on upstream pre-merge (verified via git show FETCH_HEAD:src/tui/viz_viewer/event.rs) — they are NOT regressions from the merge resolution. They are pre-existing flakes in the TUI subprocess-IPC test path. The merge resolution itself is clean.

Failing tests

  1. tui::viz_viewer::event::chat_tab_navigation_tests::click_close_button_enqueues_delete_coordinator_ipc
  2. tui::viz_viewer::event::chat_tab_navigation_tests::chat_manager_bulk_abandon_enqueues_per_selected

Observed symptom

Both tests panic with a 10-second timeout on cmd_rx.recv_timeout():

panicked at src/tui/viz_viewer/event.rs:8421:14:
click_close_button must enqueue an IPC subprocess: Timeout

Each test simulates a UI action (clicking a tab close button, multi-selecting + bulk-abandoning chat tasks) that should enqueue an IPC subprocess via exec_command, then waits for a CommandResult on cmd_rx. The subprocess channel never delivers within 10s.
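The wait pattern described above can be sketched in isolation. This is a minimal illustration, not the project's code: CommandResult's fields, exec_command_sketch, and wait_for_result are hypothetical stand-ins, and a background thread takes the place of the real subprocess. It shows why a worker that never reports surfaces as Timeout (the same value that ends the panic message) rather than as a disconnect.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the real CommandResult type.
#[derive(Debug, PartialEq)]
struct CommandResult {
    label: String,
    success: bool,
}

/// Hypothetical stand-in for exec_command: does the work on a background
/// thread and reports back over the channel. `deliver = false` models a
/// worker that starts but never produces a result.
fn exec_command_sketch(tx: Sender<CommandResult>, label: &str, deliver: bool) {
    let label = label.to_string();
    thread::spawn(move || {
        if deliver {
            let _ = tx.send(CommandResult { label, success: true });
        } else {
            // Hold the sender open without sending, so the receiver sees
            // Timeout instead of Disconnected.
            thread::sleep(Duration::from_millis(500));
            drop(tx);
        }
    });
}

/// What the test does: block on the channel until a result or the deadline.
fn wait_for_result(
    rx: &Receiver<CommandResult>,
    timeout: Duration,
) -> Result<CommandResult, RecvTimeoutError> {
    rx.recv_timeout(timeout)
}

fn main() {
    // Healthy path: the result arrives well before the deadline.
    let (tx, rx) = mpsc::channel();
    exec_command_sketch(tx, "delete-coordinator", true);
    println!("{:?}", wait_for_result(&rx, Duration::from_secs(1)));

    // Failing path: nothing is ever sent, so the wait ends in Timeout.
    let (tx, rx) = mpsc::channel();
    exec_command_sketch(tx, "delete-coordinator", false);
    println!("{:?}", wait_for_result(&rx, Duration::from_millis(50)));
}
```

If the real tests see Timeout and not Disconnected, some sender is presumably still alive when the deadline expires, which may help narrow down where the result is getting stuck.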

The test environment shows deprecated-key warnings from a synthesized /tmp/.tmpXXX/.wg/config.toml (legacy [coordinator]/coordinator.executor keys). Two hypotheses: the test fixture's config may trigger a warn-and-bail in the subprocess startup path, or the IPC machinery's spawn-and-handshake may be timing-fragile under load.

What to do

  1. Diagnose. Run both tests in isolation, capture the subprocess stderr (the tests likely swallow it). Determine whether: (a) subprocess fails to spawn, (b) subprocess spawns but never produces a CommandResult, (c) channel routing drops the result.
  2. Fix the root cause. Likely candidates:
    • Test fixture's config.toml uses deprecated keys that the current code path now hard-rejects → update fixture
    • The IPC channel setup has a race that can outlast the 10s timeout → fix the race, don't just bump the timeout
    • exec_command's spawn path changed semantics in recent commits (fix-tui-must, fix-tui-chat-2) → reconcile
  3. Do NOT #[ignore] the tests — they cover real behavior (DeleteCoordinator IPC routing, bulk-abandon enqueue). If they are flaky-by-design, that's a real defect.
  4. Verify the full suite is green. cargo test must show test result: ok for every binary.
  5. Push the merge to origin.
    git status                    # confirm clean, on main, ahead by N
    git push origin main          # push 620d0fbdd + your fix commit(s)
    
    Branch protection / hooks should be respected. Don't --force or --no-verify.
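
For the diagnosis step, one way to separate case (a) from the others is to run the same command line outside the TUI with stderr captured rather than swallowed. The sketch below is an assumption-laden helper, not part of the codebase: Diagnosis and diagnose are hypothetical names, and the actual program and arguments that exec_command spawns have to be read from the source. It does not cover case (c), which requires tracing the channel inside the test.

```rust
use std::process::{Command, Stdio};

/// Hypothetical classification of what happened to the child process.
#[derive(Debug)]
enum Diagnosis {
    /// Case (a): the OS could not start the program at all.
    SpawnFailed(String),
    /// The program ran to completion; the exit code and stderr show
    /// whether it bailed out during startup.
    Exited { code: Option<i32>, stderr: String },
}

/// Runs `program` with `args`, waits for it to exit, and keeps its stderr.
fn diagnose(program: &str, args: &[&str]) -> Diagnosis {
    let output = Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .output();
    match output {
        Err(e) => Diagnosis::SpawnFailed(e.to_string()),
        Ok(out) => Diagnosis::Exited {
            code: out.status.code(),
            stderr: String::from_utf8_lossy(&out.stderr).into_owned(),
        },
    }
}

fn main() {
    // Pass the command line under investigation as arguments; with none
    // given, a deliberately missing binary demonstrates the spawn-failure arm.
    let argv: Vec<String> = std::env::args().skip(1).collect();
    let (program, rest) = match argv.split_first() {
        Some((p, r)) => (p.as_str(), r.iter().map(String::as_str).collect::<Vec<_>>()),
        None => ("no-such-binary-for-this-sketch", Vec::new()),
    };
    match diagnose(program, &rest) {
        Diagnosis::SpawnFailed(err) => println!("(a) spawn failed: {err}"),
        Diagnosis::Exited { code, stderr } => {
            println!("exited with {code:?}; stderr:\n{stderr}")
        }
    }
}
```

A non-zero exit code together with the deprecated-key warnings on stderr would point at the fixture hypothesis; a clean exit would shift suspicion to result routing.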

Background — what the merge contained

620d0fbdd is the resolution of two competing fixes for the profile-overlay bug:

  • Local (agent-2620 fix-codex-init): profile = full Config snapshot, wg profile use <name> is a file swap, no overlay/merge logic
  • Upstream (agent-49 fix-profile-application): kept overlay, made it strict via deny_unknown_fields allowlists

Resolved in favor of file-swap. Profile = snapshot, local always wins over global. See the merge commit message for full detail.

Out of scope

  • Reverting or modifying the profile design (already settled)
  • Changing the merge resolution
  • Refactoring the TUI IPC layer beyond what the bug requires

Validation

  • Both named tests diagnosed; root cause documented in commit message (not just 'made it pass')
  • cargo test --bin wg click_close_button_enqueues_delete_coordinator_ipc passes 3 runs in a row
  • cargo test --bin wg chat_manager_bulk_abandon_enqueues_per_selected passes 3 runs in a row
  • Full cargo test is green (zero failed)
  • cargo build clean
  • cargo install --path . so the global wg is current
  • git push origin main succeeds; git status shows 'up to date with origin/main' after
  • The fix commit AND the merge commit 620d0fbdd both visible on origin (git log --oneline origin/main | head -5)
  • No #[ignore] attributes added, no --no-verify or --force used

Depends on

Required by

Log