wg-nex-resume-311 — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-166`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Created	2026-04-26T23:24:28.558416530+00:00
Started	2026-04-26T23:28:01.014617545+00:00
Completed	2026-04-27T00:09:12.923318853+00:00
Tags	`eval-scheduled`
Tokens	16887975 in / 45358 out
Eval score	0.84
└ blocking impact	0.90
└ completeness	0.85
└ coordination overhead	0.80
└ correctness	0.82
└ downstream usability	0.78
└ efficiency	0.88
└ intent fidelity	0.86
└ style adherence	0.92

Description

User repro in ~/autohaiku: try to resume coordinator-0 in TUI chat. Multiple stacked bugs visible in one frame:

Symptoms

Top error: 'Failed to create coordinator: Error: Service daemon...' (truncated in TUI display — full message likely 'Service daemon not responding' or similar)
nex resume log: '[native-agent] Resuming from journal: 311 messages, 0 stale annotations'
User Ctrl-C: '[nex] Interrupted — dropping in-flight response.'

Endpoint log (lambda01) receives a flood of malformed requests:

POST /v1/chat/completions HTTP/1.1 400 Bad Request
POST /v1/chat/completions HTTP/1.1 400 Bad Request
POST /v1/chat/completions HTTP/1.1 400 Bad Request
... (multiple per second, indefinitely)

TUI throbber spins infinitely — never times out, never reports the 400, never gives up.

Root causes (3 distinct, all critical)

A. Context overflow on resume: qwen3-coder (per model_registry: context_window=32768) cannot accept 311 messages. nex's resume path replays the entire journal verbatim into a single request → exceeds context → endpoint rejects with 400.

B. No backoff on 4xx: nex retries the same request immediately on failure, in a tight loop. There's no exponential backoff, no max-retries circuit breaker, no dead-letter handoff. 400 means 'request is malformed; retrying won't help' — should surface and stop, not retry.

C. Throbber doesn't reflect actual state: UI shows perpetual 'thinking' even when every request is failing. Throbber should clear / show error after N consecutive failures or M seconds without a successful response.

Fix per cause

A. Journal-replay context budget:

On resume, compute total token estimate of journal messages.
If total > model's context_window * (1 - safety_margin, e.g. 0.8), apply one of:
- Auto-summarize older messages (compaction-on-resume)
- Truncate to last N messages that fit
- Surface a 'journal too large to resume; truncate? compact? abandon?' prompt
DEFAULT: auto-truncate to fit with a clear log line: 'Journal had 311 messages (~250k tokens); truncated to last 47 messages (~28k tokens) to fit qwen3-coder context'.

B. Retry policy on HTTP 4xx:

400/422 (malformed): 0 retries, surface error immediately, abort the turn.
401/403 (auth): 0 retries, surface error, abort.
429 (rate limit): exponential backoff with Retry-After header respect, max 3 retries.
5xx: exponential backoff, max 5 retries.
Document this policy in src/executor/native/client.rs or wherever HTTP errors are handled.

C. Throbber state truthfulness:

Throbber reflects 'live request in flight'. If request fails / aborts / times out, throbber clears AND error is shown in chat pane.
After N consecutive failures or T seconds without response, throbber clears and an error toast surfaces: 'nex: 8 consecutive 400 Bad Request errors against lambda01; aborted. Check daemon log.'

Workaround (manual, until fix lands)

Kill the wedged process: pkill -f 'wg nex.*autohaiku' && pkill -f 'native-agent.*coordinator-0'
Endpoint flood stops.
Either trim coordinator-0's chat journal manually (.wg/chat/coordinator-0/{inbox,outbox}.jsonl) to leave the last 20-30 messages, OR archive coordinator-0 and create a fresh chat.

Hard gate before claiming done

Repro the exact scenario: scratch dir, qwen3-coder + lambda01, drop a 311-message journal in (synthesize one), resume in TUI.
Assert: nex either auto-truncates the journal AND succeeds, OR surfaces a clear error AND stops retrying within 3 attempts AND throbber clears.
Endpoint MUST NOT receive more than 3 requests in the failure case.
Capture endpoint log + daemon log + chat session jsonl as evidence.

Validation

Failing tests first:
- test_nex_resume_truncates_oversized_journal — synthetic 500-msg journal + small context → resume succeeds with truncation log
- test_nex_400_no_retry_loop — stub endpoint returning 400 → nex sends exactly 1 request, surfaces error, throbber clears
- test_nex_429_backoff — stub endpoint returning 429 → exponential backoff observed, max 3 retries
Implementation makes tests pass
cargo build + cargo test pass with no regressions
HARD GATE manual smoke as above
Coordinate with deprecation-warnings-on (handler stdout hygiene) — same handler code area

## Description

User repro in ~/autohaiku: try to resume coordinator-0 in TUI chat. Multiple stacked bugs visible in one frame:

### Symptoms

1. **Top error**: 'Failed to create coordinator: Error: Service daemon...' (truncated in TUI display — full message likely 'Service daemon not responding' or similar)
2. **nex resume log**: '[native-agent] Resuming from journal: 311 messages, 0 stale annotations'
3. **User Ctrl-C**: '[nex] Interrupted — dropping in-flight response.'
4. **Endpoint log (lambda01)** receives a flood of malformed requests:
   ```
   POST /v1/chat/completions HTTP/1.1 400 Bad Request
   POST /v1/chat/completions HTTP/1.1 400 Bad Request
   POST /v1/chat/completions HTTP/1.1 400 Bad Request
   ... (multiple per second, indefinitely)
   ```
5. **TUI throbber spins infinitely** — never times out, never reports the 400, never gives up.

### Root causes (3 distinct, all critical)

**A. Context overflow on resume**: qwen3-coder (per model_registry: context_window=32768) cannot accept 311 messages. nex's resume path replays the entire journal verbatim into a single request → exceeds context → endpoint rejects with 400.

**B. No backoff on 4xx**: nex retries the same request immediately on failure, in a tight loop. There's no exponential backoff, no max-retries circuit breaker, no dead-letter handoff. 400 means 'request is malformed; retrying won't help' — should surface and stop, not retry.

**C. Throbber doesn't reflect actual state**: UI shows perpetual 'thinking' even when every request is failing. Throbber should clear / show error after N consecutive failures or M seconds without a successful response.

### Fix per cause

**A. Journal-replay context budget**:
- On resume, compute total token estimate of journal messages.
- If total > model's context_window * (1 - safety_margin, e.g. 0.8), apply one of:
  - Auto-summarize older messages (compaction-on-resume)
  - Truncate to last N messages that fit
  - Surface a 'journal too large to resume; truncate? compact? abandon?' prompt
- DEFAULT: auto-truncate to fit with a clear log line: 'Journal had 311 messages (~250k tokens); truncated to last 47 messages (~28k tokens) to fit qwen3-coder context'.

**B. Retry policy on HTTP 4xx**:
- 400/422 (malformed): 0 retries, surface error immediately, abort the turn.
- 401/403 (auth): 0 retries, surface error, abort.
- 429 (rate limit): exponential backoff with Retry-After header respect, max 3 retries.
- 5xx: exponential backoff, max 5 retries.
- Document this policy in src/executor/native/client.rs or wherever HTTP errors are handled.

**C. Throbber state truthfulness**:
- Throbber reflects 'live request in flight'. If request fails / aborts / times out, throbber clears AND error is shown in chat pane.
- After N consecutive failures or T seconds without response, throbber clears and an error toast surfaces: 'nex: 8 consecutive 400 Bad Request errors against lambda01; aborted. Check daemon log.'

### Workaround (manual, until fix lands)

1. Kill the wedged process: `pkill -f 'wg nex.*autohaiku' && pkill -f 'native-agent.*coordinator-0'`
2. Endpoint flood stops.
3. Either trim coordinator-0's chat journal manually (`.wg/chat/coordinator-0/{inbox,outbox}.jsonl`) to leave the last 20-30 messages, OR archive coordinator-0 and create a fresh chat.

### Hard gate before claiming done

- Repro the exact scenario: scratch dir, qwen3-coder + lambda01, drop a 311-message journal in (synthesize one), resume in TUI.
- Assert: nex either auto-truncates the journal AND succeeds, OR surfaces a clear error AND stops retrying within 3 attempts AND throbber clears.
- Endpoint MUST NOT receive more than 3 requests in the failure case.
- Capture endpoint log + daemon log + chat session jsonl as evidence.

## Validation

- [ ] Failing tests first: 
  - test_nex_resume_truncates_oversized_journal — synthetic 500-msg journal + small context → resume succeeds with truncation log
  - test_nex_400_no_retry_loop — stub endpoint returning 400 → nex sends exactly 1 request, surfaces error, throbber clears
  - test_nex_429_backoff — stub endpoint returning 429 → exponential backoff observed, max 3 retries
- [ ] Implementation makes tests pass
- [ ] cargo build + cargo test pass with no regressions
- [ ] HARD GATE manual smoke as above
- [ ] Coordinate with deprecation-warnings-on (handler stdout hygiene) — same handler code area

Depends on

done .assign-wg-nex-resume-311

Required by

(none)

Log

2026-04-26T23:24:28.555783423+00:00 Task paused
2026-04-26T23:24:37.931431760+00:00 Task published
2026-04-26T23:27:59.424056134+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer best matches this correctness-critical showstopper requiring TDD, token estimation logic, HTTP retry backoff, and exhaustive manual smoke testing—Careful's tradeoff and 0.58 score on 52 complex tasks demonstrates the attention to detail needed for multi-faceted bugs with hard validation gates.
2026-04-26T23:28:01.014624739+00:00 Spawned by coordinator --executor claude --model opus
2026-04-26T23:28:13.151469534+00:00 Starting work — orienting on nex client + resume code paths
2026-04-27T00:08:19.639532485+00:00 Implemented Cause A (resume.rs hard-truncation post-compaction with model-aware tokenizer), Cause B (agent.rs interactive context-too-long counter, abort after 2 attempts or no-progress), and Cause C path (errors surface to outbox via existing on_error → throbber clears via existing TUI poll). Added 4 regression tests + 1 smoke scenario. cargo build + 1957 lib tests pass.
2026-04-27T00:09:03.374983273+00:00 Committed: 07a1092aa — pushed to remote
2026-04-27T00:09:12.923329893+00:00 Task marked as done