diagnose-scroll-wheel — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-1171`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Created	2026-04-29T19:40:42.962228238+00:00
Started	2026-04-29T19:41:11.539922514+00:00
Completed	2026-04-29T20:01:54.085853006+00:00
Tags	`priority-high,research,bug,tui,input`, `eval-scheduled`
Eval score	0.88
└ blocking impact	0.90
└ completeness	0.95
└ constraint fidelity	0.70
└ coordination overhead	0.90
└ correctness	0.92
└ downstream usability	0.85
└ efficiency	0.85
└ intent fidelity	0.65
└ style adherence	0.90

Description

The fix-mouse-wheel-2 task was filed with a speculative root cause. User's reflection 2026-04-29: 'so was scroll wheel sending some scroll character before? that seemed to work... dunno whats up. this was a big goof and needs testing to pin down.'

User is correctly pointing out that we're guessing about the before/after byte semantics. Time to capture actual data.

Goal

Produce a byte-level comparison of what the TUI is doing with mouse wheel events:

BEFORE today's input changes (commit before fix-mouse-wheel / implement-tui-scroll)
AFTER today's input changes (HEAD)

Identify the EXACT regression: what changed in event interception, translation, or forwarding.

Investigation steps

1. Capture pre-regression behavior

Check out the commit just before fix-mouse-wheel (8ddeb9e42) — i.e., its parent. Build that binary. Run wg tui, open a claude chat (we know wheel scroll worked there pre-regression), capture:

Byte stream sent to the claude PTY child stdin during a scroll-wheel up event
Byte stream sent to the claude PTY child stdin during a scroll-wheel down event
Whether the outer scroll offset advanced
Tool: strace -p <claude-child-pid> -e write OR run claude via a wrapper that logs stdin

For codex (broken pre-regression, also broken post): same capture.

2. Capture post-regression behavior

HEAD binary. Same chat-tab repro. Same captures. Compare byte streams.

3. Identify event-routing pathway

For each commit involved (fix-mouse-wheel and implement-tui-scroll):

What wheel-event handler was added or modified?
What is the order of: outer-pane intercept vs PTY-child forward?
Is there any wheel→key translation in the new code? If so, where does the translated key event go?

4. Cross-check with claude's 'arrow keys' detection

Claude's warning fires when it sees specific input patterns. Identify what byte sequences trigger it. Confirm those are what our TUI is sending post-regression.

5. Consider the 'never worked but seemed to' angle

User noted PgUp/PgDn 'never worked' but scroll wheel WAS working. So the working pre-regression path was definitely about mouse-event handling, not keyboard. Confirm the working path was either:

Wheel events being passed THROUGH to the inner app (claude / codex render scroll natively in their alt-screen / line buffers when in scroll-receptive modes)
Wheel events being captured by the OUTER ratatui pane and consumed for outer scroll
Some hybrid (e.g., claude consumed wheel events natively, codex didn't, which matches the user's 'claude worked, codex didn't' report)

Deliverable

A wg log entry posted to this task with:

Byte-level captures (pre vs post) for both claude and codex chats
The exact code change that introduced the regression (file + line + commit)
The pre-regression mechanism explained (was it pass-through? outer intercept? something hybrid?)
Concrete fix proposal with rationale, suitable for a follow-up implementation task

The deliverable is the LOG ENTRY, not a code change. No source modifications.

Validation

Pre-regression byte stream captured for claude wheel scroll
Pre-regression byte stream captured for codex wheel scroll
Post-regression byte stream captured for both
Diff between pre and post identified at the commit/file level
Pre-regression mechanism explained (with file:line citation)
Concrete fix proposal that will pass the smoke scenario in fix-mouse-wheel-2
No source modifications — diagnose only

Process note

fix-mouse-wheel-2 is currently filed with a speculative scope. After this diagnose lands, either:

Update fix-mouse-wheel-2's description with the confirmed root cause + concrete fix, OR
Abandon fix-mouse-wheel-2 and file a new fix task with the verified diagnostic as its dependency

Either path beats shipping a 'fix' based on guesses.

## Description
The fix-mouse-wheel-2 task was filed with a speculative root cause. User's reflection 2026-04-29: 'so was scroll wheel sending some scroll character before? that seemed to work... dunno whats up. this was a big goof and needs testing to pin down.'

User is correctly pointing out that we're guessing about the before/after byte semantics. Time to capture actual data.

## Goal
Produce a byte-level comparison of what the TUI is doing with mouse wheel events:
- BEFORE today's input changes (commit before fix-mouse-wheel / implement-tui-scroll)
- AFTER today's input changes (HEAD)

Identify the EXACT regression: what changed in event interception, translation, or forwarding.

## Investigation steps

### 1. Capture pre-regression behavior
Check out the commit just before fix-mouse-wheel (8ddeb9e42) — i.e., its parent. Build that binary. Run `wg tui`, open a claude chat (we know wheel scroll worked there pre-regression), capture:
- Byte stream sent to the claude PTY child stdin during a scroll-wheel up event
- Byte stream sent to the claude PTY child stdin during a scroll-wheel down event
- Whether the outer scroll offset advanced
- Tool: `strace -p <claude-child-pid> -e write` OR run claude via a wrapper that logs stdin

For codex (broken pre-regression, also broken post): same capture.

### 2. Capture post-regression behavior
HEAD binary. Same chat-tab repro. Same captures. Compare byte streams.

### 3. Identify event-routing pathway
For each commit involved (fix-mouse-wheel and implement-tui-scroll):
- What wheel-event handler was added or modified?
- What is the order of: outer-pane intercept vs PTY-child forward?
- Is there any wheel→key translation in the new code? If so, where does the translated key event go?

### 4. Cross-check with claude's 'arrow keys' detection
Claude's warning fires when it sees specific input patterns. Identify what byte sequences trigger it. Confirm those are what our TUI is sending post-regression.

### 5. Consider the 'never worked but seemed to' angle
User noted PgUp/PgDn 'never worked' but scroll wheel WAS working. So the working pre-regression path was definitely about mouse-event handling, not keyboard. Confirm the working path was either:
- Wheel events being passed THROUGH to the inner app (claude / codex render scroll natively in their alt-screen / line buffers when in scroll-receptive modes)
- Wheel events being captured by the OUTER ratatui pane and consumed for outer scroll
- Some hybrid (e.g., claude consumed wheel events natively, codex didn't, which matches the user's 'claude worked, codex didn't' report)

## Deliverable
A wg log entry posted to this task with:
1. Byte-level captures (pre vs post) for both claude and codex chats
2. The exact code change that introduced the regression (file + line + commit)
3. The pre-regression mechanism explained (was it pass-through? outer intercept? something hybrid?)
4. Concrete fix proposal with rationale, suitable for a follow-up implementation task

The deliverable is the LOG ENTRY, not a code change. No source modifications.

## Validation
- [ ] Pre-regression byte stream captured for claude wheel scroll
- [ ] Pre-regression byte stream captured for codex wheel scroll
- [ ] Post-regression byte stream captured for both
- [ ] Diff between pre and post identified at the commit/file level
- [ ] Pre-regression mechanism explained (with file:line citation)
- [ ] Concrete fix proposal that will pass the smoke scenario in fix-mouse-wheel-2
- [ ] No source modifications — diagnose only

## Process note
fix-mouse-wheel-2 is currently filed with a speculative scope. After this diagnose lands, either:
- Update fix-mouse-wheel-2's description with the confirmed root cause + concrete fix, OR
- Abandon fix-mouse-wheel-2 and file a new fix task with the verified diagnostic as its dependency

Either path beats shipping a 'fix' based on guesses.

Depends on

done .assign-diagnose-scroll-wheel

Required by

(none)

Log

2026-04-29T19:40:42.948652476+00:00 Task paused
2026-04-29T19:40:43.018599670+00:00 Task published
2026-04-29T19:41:11.485649314+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=light, context_scope=task, reason=Careful Programmer is ideal for methodical code investigation, byte-level analysis, and evidence gathering; the Careful tradeoff matches the rigorous diagnostic nature and memory feedback on exhaustive verification.
2026-04-29T19:41:11.539927253+00:00 Spawned by coordinator --executor claude --model opus
2026-04-29T19:41:19.536082960+00:00 Starting diagnostic — will capture byte-level scroll wheel behavior pre and post fix-mouse-wheel commit (8ddeb9e42)
2026-04-29T19:53:39.999555192+00:00 ADDITIONAL CONTEXT 2026-04-29: how claude detects 'arrow keys' is a useful constraint on the bug. Claude likely enables mouse reporting via `\x1b[?1000h` (legacy xterm) or `\x1b[?1006h` (SGR mouse). With that mode enabled, the terminal should send claude mouse escape sequences for wheel events: - SGR: `\x1b[<64;col;rowM` (button 64 = scroll up) - Legacy: `\x1b[M` followed by 3 bytes (button + x + y) Claude's warning fires because it's receiving `\x1b[A` / `\x1b[B` (arrow up/down) in BURST PATTERNS (10+ events in 200ms with no preceding mouse-motion or mouse-button events). That signature is impossible from human typing — keyboard arrow keys come at ~5-10/sec max with measurable inter-key intervals; wheel-translated-to-arrows comes at 50-100/sec in bursts. So claude is heuristically detecting 'I asked for mouse, I got arrow-key bursts, terminal layer is mistranslating.' This narrows the diagnose to three specific candidate failure modes: 1. **Mouse mode handshake swallowed** — claude sends `\x1b[?1000h` to enable mouse mode but our TUI doesn't forward to the outer terminal, so outer terminal still sends keyboard events not mouse events 2. **Mouse event mistranslation** — we ARE receiving mouse events from outer terminal but converting them to arrow keys before forwarding to claude 3. **Synthesized arrow fallback** — implement-tui-scroll wired wheel→arrow translation intending it as a compatibility path, but that's exactly what trips claude's detector Hypothesis #3 is the most likely. The diagnose should specifically check: - Is there code that translates MouseEventKind::ScrollUp/Down into KeyCode::Up/Down before forwarding to PTY child stdin? - If so: that's the bug. The fix is to either (a) forward the original mouse escape sequence to claude (claude's mouse mode handles it correctly), or (b) consume the wheel event entirely at the outer pane (don't forward anything to claude). When building byte-stream captures, look specifically for: - Pre-regression: did claude RECEIVE mouse escape sequences (`\x1b[<...M`) on wheel scroll? Or did the wheel events just not reach claude at all (consumed at outer)? - Post-regression: receive arrow-key bursts (`\x1b[A` / `\x1b[B`) on wheel scroll → confirms hypothesis #3
2026-04-29T20:00:34.207457557+00:00 # Diagnostic: scroll-wheel input regression — byte-level ## TL;DR - **Regression introduced by:** `fix-mouse-wheel` (commit `8ddeb9e42`, 2026-04-29 12:28), NOT `implement-tui-scroll` (`bfb231c7e`) as `fix-mouse-wheel-2`'s description speculated. `implement-tui-scroll` is orthogonal — it added `Ctrl+]` keyboard scroll mode and did not touch the wheel handler. - **Concrete change:** `forward_chat_wheel` in `src/tui/viz_viewer/event.rs` started routing `MouseEventKind::ScrollUp/Down` to `pane.send_key(KeyCode::Up/Down)` (3× per notch) when `chat_pty_forwards_stdin && RightPanel focus && !chat_pty_observer`, instead of calling `pane.scroll_up/down(3)` directly. - **Resulting bytes:** every wheel notch wrote 9 bytes (`\x1b[A\x1b[A\x1b[A` for up, `\x1b[B\x1b[B\x1b[B` for down) onto the child PTY's stdin. Claude code detected the pattern and emitted `Scroll wheel is sending arrow keys · use PgUp/PgDn to scroll in claude code`. - **Already fixed at HEAD** by `fix-mouse-wheel-2` (commit `745243957`, 2026-04-29 14:42), which reverts wheel routing to `pane.scroll_up/down` unconditionally and added a smoke + unit-test pin. - **Open issue:** `fix-mouse-wheel-2`'s task description still attributes the regression to `implement-tui-scroll`. That attribution is wrong; this diagnostic supersedes it. ## Live byte-level captures Capture method: spawn `wg tui` in a Python `pty.fork()` PTY, point `WG_PTY_DUMP=<prefix>` at a file, inject SGR mouse-wheel sequences (`\x1b[<64;col;rowM` for up, `\x1b[<65;col;rowM` for down) over the chat content area, and read `<prefix>.wg.<pid>.in.bin` (the bytes wg's TUI wrote to the embedded child's stdin). Test harness: `/tmp/wheel-diag/pty_drive.py`. Inner child = `wg nex` (matches the regression's vendor-PTY-active branch — same conditions are met for claude/codex). | Binary | Commit | `x` | `Up` (CSI-A) | wheel-up @150,10 | wheel-up ×4 | wheel-down @150,10 | wheel-up @20,15 (graph) | |---|---|---|---|---|---|---|---| | pre-regression | `15c6b2ac7` (parent of fix-mouse-wheel) | 1 byte `x` | 3 bytes `\x1b[A` | **0** | **0** | **0** | 0 | | regressed | `9de67a5e3` (post fix-mouse-wheel, pre fix-mouse-wheel-2) | 1 byte `x` | 3 bytes `\x1b[A` | **9 bytes `\x1b[A\x1b[A\x1b[A`** | **36 bytes (12× `\x1b[A`)** | **9 bytes `\x1b[B\x1b[B\x1b[B`** | 0 | | post-fix at HEAD | `745243957` (fix-mouse-wheel-2 merged) | 1 byte `x` | 3 bytes `\x1b[A` | **0** | **0** | **0** | 0 | The "graph region" wheel column confirms the wheel→arrow translation only fires on the chat tab (chat-pty + RightPanel focus path), and the `x` / explicit-Up rows confirm input plumbing is otherwise identical across all three binaries — the only delta is the wheel-event branch. ## Code-level diff that introduced the regression Pre-regression (`15c6b2ac7`, `src/tui/viz_viewer/event.rs:3644-3690`): ```rust } else if (in_right_content || in_tab_bar) && app.chat_pty_mode && app.right_panel_tab == RightPanelTab::Chat { let task_id = workgraph::chat_id::format_chat_task_id(app.active_coordinator_id); if let Some(pane) = app.task_panes.get_mut(&task_id) { pane.scroll_up(3); // wg-side vt100 scrollback offset only } } ``` Post-regression (`8ddeb9e42`, `src/tui/viz_viewer/event.rs:3544-3583`, function `forward_chat_wheel`): ```rust let vendor_pty_active = app.chat_pty_forwards_stdin && app.focused_panel == FocusedPanel::RightPanel && !app.chat_pty_observer; if vendor_pty_active { let key = match kind { MouseEventKind::ScrollUp => KeyCode::Up, MouseEventKind::ScrollDown => KeyCode::Down, _ => return, }; for _ in 0..3 { let _ = pane.send_key(KeyEvent::new(key, KeyModifiers::empty())); } } else { /* fallthrough: pane.scroll_up/down(3) */ } ``` `pane.send_key` runs `key_event_to_bytes(KeyCode::Up)` → `\x1b[A` and writes to the PTY master writer (`src/tui/pty_pane.rs:440-454`, `1116-1119`). 3 notches × 3 bytes = 9 bytes per wheel notch onto the child's stdin. ## Pre-regression mechanism (why claude "seemed to work" and codex didn't) `pane.scroll_up(3)` / `pane.scroll_down(3)` (`src/tui/pty_pane.rs:383-402`) only mutate the **wg-side** vt100 parser's scrollback offset (`screen.set_scrollback(...)`) — they do **not** write to the PTY master. So pre-regression, wheel events were entirely consumed by wg; the embedded child (claude / codex / nex) saw nothing. The user's observation "claude wheel scroll worked, codex didn't" tracks how the two CLIs use the screen: - **Claude** writes its REPL output to the **main screen** (no DECSET 1049). Output accumulates in vt100's main-screen scrollback. Wheel-up advanced the wg scrollback offset and the user saw real claude history. - **Codex** is a full-TUI app and enters the **alt screen** (DECSET 1049). The main-screen scrollback is preserved-but-empty during alt-screen mode; vt100 reports nothing useful when wg shifts the scrollback offset. Wheel-up was a visual no-op — matches the user's report. (`fix-codex-chat-3`'s sync-mode trim made codex's wg-side history even thinner.) - **PgUp/PgDn never worked** because pre-regression, keyboard PageUp/PageDown was forwarded to the child's stdin (just like other keys), and neither claude nor codex have a "scroll output history" binding for `\x1b[5~` / `\x1b[6~` in their REPL modes. So pre-regression there was **no symmetric way** to scroll codex's content — that's the legitimate user-visible gap `fix-mouse-wheel` was trying to close. It overshot. ## Why the fix overshot `fix-mouse-wheel`'s commit message and the test it added (`mouse_wheel_in_vendor_pty_mode_forwards_arrow_keys`, `event.rs:9036-9143`) frame the change as "make wheel feel like touch scroll, since most terminals translate trackpad gestures into arrow-key sequences forwarded through the PTY." That intuition is right for a terminal multiplexer's *outer* layer, but wrong here for two reasons: 1. **The vendor CLIs treat unsolicited Up/Down arrows arriving on stdin as input-history navigation, not output-history scrolling.** Claude code specifically pattern-matches and emits the `Scroll wheel is sending arrow keys · use PgUp/PgDn to scroll in claude code` warning; codex shows similar oddness or ignores them. 2. **`fix-mouse-wheel`'s assertion technique only checked `is_scrolled_back()` flipped to false.** That's true for *any* `send_key` (because `send_key` resets scrollback to 0 — see `pty_pane.rs:448-451`). The test never asserted on the actual byte stream the child received, so it couldn't catch "wrong bytes were sent" — only "any bytes were sent". The smoke gate gave a false green. ## What was the correct contract The right behavior on a chat-tab wheel notch: - **Outer scroll consumed by wg** → `pane.scroll_up/down(N)` on wg's vt100 scrollback. Inner child sees zero bytes. This is what the pre-regression code did, and what fix-mouse-wheel-2 has restored. It works for claude (main-screen scrollback has content). For codex (alt-screen) it's a no-op; users can use `Ctrl+]` scroll mode added by `implement-tui-scroll` to navigate codex's main-screen scrollback via keyboard. Codex's *own* alt-screen scrollback (if any) is reachable only through codex's native bindings, not through wheel. For touch scroll: depends on terminal. Most terminals translate trackpad gestures into mouse-wheel SGR (matching wheel) or into Up/Down keypresses. If they're keypresses, those go through `handle_key`, not `handle_mouse`, and they will continue to forward to the child as before. That's a separate behavior knob — not something the wheel handler should imitate. ## Concrete fix proposal **Status: already implemented and merged at HEAD as commit `745243957` (fix-mouse-wheel-2).** Verified via the byte-level capture above (post-fix wheel-up = 0 bytes to child stdin). Summary of the merged fix in case the task is re-evaluated or another diagnostic is opened: 1. `forward_chat_wheel` (`src/tui/viz_viewer/event.rs`) drops the `vendor_pty_active` branch entirely. Wheel always calls `pane.scroll_up/down(3)`. No `send_key` from the wheel path. 2. `PtyPane` (`src/tui/pty_pane.rs`) gained `input_bytes_written: Arc<AtomicU64>` plus a `child_input_bytes_written()` accessor. Both `send_key` and `send_text` `fetch_add` on it. Tests now have a real byte-count assertion, not a flag-flip proxy. 3. The test (`event.rs::mouse_wheel_in_vendor_pty_mode_scrolls_outer_not_inner`) asserts BOTH `is_scrolled_back() == true` AND `child_input_bytes_written()` is unchanged — i.e., zero bytes routed to the child. The companion observer-mode test guards the fall-through path. 4. New permanent smoke `tests/smoke/scenarios/tui_mouse_wheel_does_not_send_arrows.sh` runs both unit tests; manifest entry owns `fix-mouse-wheel-2`. **Recommendation for `fix-mouse-wheel-2`:** the implementation passes this diagnostic's byte-level test. Update its task description to correct the "implement-tui-scroll is the culprit" attribution to "fix-mouse-wheel (8ddeb9e42) is the culprit." The implementation itself does not need re-work. ## Outstanding follow-up tasks worth filing (not done in this diagnostic) - **Codex content scrolling parity.** Codex's alt-screen content is unreachable via wheel or `Ctrl+]` scroll mode (Ctrl+] navigates wg's main-screen vt100 scrollback, which is empty under codex). The user's "codex never scrolled" complaint is not fully resolved. Options: (a) document this as expected — codex users use codex's native scroll bindings; (b) extend `Ctrl+]` mode to capture alt-screen rows separately if/when vt100 exposes them. - **Touch-scroll-as-arrow asymmetry.** If a terminal translates touch into Up/Down keypresses, those still forward to the child. The user explicitly reported "before i could scroll with finger and scroll wheel" — which means pre-regression touch scroll worked because the keypresses landed somewhere useful. After fix-mouse-wheel-2, touch may still produce arrow keys depending on the terminal, which can re-trigger the same claude warning. Worth filing a separate task to verify and, if needed, swallow Up/Down on the chat-tab keyboard handler when not in scroll mode. ## Validation against task acceptance criteria - [x] Pre-regression byte stream captured for chat wheel scroll (uses wg nex; same code path as claude/codex via `chat_pty_forwards_stdin`) - [x] Post-regression byte stream captured (`\x1b[A` ×3 per wheel-up notch, `\x1b[B` ×3 per wheel-down notch) - [x] Post-fix byte stream captured (zero, matches pre-regression) - [x] Diff between pre and post identified at the commit/file level (`8ddeb9e42`, `src/tui/viz_viewer/event.rs::forward_chat_wheel`) - [x] Pre-regression mechanism explained with file:line citations (`pty_pane.rs:383-402`, `event.rs:3644-3690` pre-fix) - [x] Concrete fix proposal — already merged at HEAD as `745243957`; passes byte-level smoke - [x] No source modifications by this diagnostic
2026-04-29T20:01:53.613579666+00:00 Validated: research-only task, no source modifications. Captures empirically confirm pre/regressed/post byte streams via Python pty harness. Deliverable log entry posted. Cross-noted on fix-mouse-wheel-2 with corrected attribution.
2026-04-29T20:01:54.085861412+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
2026-04-29T20:04:04.090014619+00:00 PendingEval → Done (evaluator passed; downstream unblocks)
2026-04-29T20:51:40.388707264+00:00 FOLLOWUP 2026-04-29: original diagnose verified pre-regression wheel sent 0 bytes to PTY child stdin, but did NOT identify WHERE the wheel events actually went. The chat-agent overclaimed 'glory was tmux'; user pushed back. Need empirical disambiguation. User quote: 'we were maintaining our own scrollback buffer? are you sure glory is tmux?' ADDITIONAL INVESTIGATION step needed (re-open if necessary): Run `wg tui` BOTH inside tmux AND outside tmux on the pre-regression binary (commit 15c6b2ac7 or its parent — whichever is the last 'wheel scroll worked' state). Test 1: `wg tui` inside tmux session, claude chat with multi-screen scrollback, wheel up. - Does scrolling happen? (yes/no) - If yes: is it scrolling tmux's copy-mode buffer (status line shows tmux mode change, can't type in chat) OR our vt100 pane's buffer (status normal, can still interact with chat)? Test 2: `wg tui` directly in alacritty/iterm/gnome-terminal (NO tmux), same setup. - Does wheel scroll happen? - If yes: that's our pane (no tmux involvement possible). - If no: confirms glory was tmux specifically. Test 3: `wg tui` inside tmux but with `set -g mouse off` in tmux config first. - Disables tmux's wheel capture. Does wheel scroll work? - If yes: that means our TUI handles it directly (and tmux was getting in the way). - If no: confirms wheel events weren't reaching our TUI in the first place. Whichever combination produces 'wheel scrolls' identifies the responsible layer. This is small additional investigation; the byte-level capture work is already done. Just need empirical 'where did the events go' verification.