fix-supervisor-restart-backoff

Implement: supervisor restart-backoff on session-lock contention (per research-supervisor-lock-churn)

Metadata

Status: done
Assigned: agent-1802
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Model: codex:gpt-5.5
Created: 2026-05-02T23:53:58.053052236+00:00
Started: 2026-05-03T01:06:38.218250080+00:00
Completed: 2026-05-03T01:07:46.347391432+00:00
Tags: fix, nex, chat, supervisor, bug, eval-scheduled
Eval score: 0.04
└ blocking impact: 0.00
└ completeness: 0.00
└ constraint fidelity: 0.70
└ coordination overhead: 0.10
└ correctness: 0.00
└ downstream usability: 0.00
└ efficiency: 0.20
└ intent fidelity: 0.01
└ style adherence: 0.10

Description

Apply the backoff patch proposed by research-supervisor-lock-churn to the restart loop in coordinator_agent.rs. When the nex subprocess exits with status=1 within ~1s of spawn AND the live session lock holder is still recent, treat the exit as lock contention and back off ≥10s instead of restarting immediately. After N consecutive backoffs without progress, exit the supervisor cleanly.
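
A minimal sketch of the decision rule described above, not the actual patch. All names here (RestartAction, decide_restart, FAST_EXIT, BACKOFF, MAX_BACKOFFS) are hypothetical, and the value 3 for N is a placeholder because the task does not specify N. How "lock holder is still recent" is determined is left to the caller and passed in as a boolean.

```rust
use std::time::Duration;

// Assumed thresholds: the task states "~1s" and ">=10s"; N is unspecified.
const FAST_EXIT: Duration = Duration::from_secs(1);
const BACKOFF: Duration = Duration::from_secs(10);
const MAX_BACKOFFS: u32 = 3; // placeholder for "N"

#[derive(Debug, PartialEq)]
enum RestartAction {
    /// Not lock contention: restart immediately, as before.
    RestartNow,
    /// Looks like lock contention: sleep this long before restarting.
    BackOff(Duration),
    /// Too many consecutive backoffs without progress: stop the supervisor.
    ExitSupervisor,
}

/// Classify a child exit and update the consecutive-backoff counter.
fn decide_restart(
    exit_code: Option<i32>,
    uptime: Duration,
    lock_holder_recent: bool,
    consecutive_backoffs: &mut u32,
) -> RestartAction {
    let contention =
        exit_code == Some(1) && uptime <= FAST_EXIT && lock_holder_recent;
    if !contention {
        // Any other kind of exit counts as progress and resets the counter.
        *consecutive_backoffs = 0;
        return RestartAction::RestartNow;
    }
    *consecutive_backoffs += 1;
    if *consecutive_backoffs > MAX_BACKOFFS {
        RestartAction::ExitSupervisor
    } else {
        RestartAction::BackOff(BACKOFF)
    }
}
```

Keeping the classification in a pure function, separate from the sleep itself, is one way to make the backoff path unit-testable without waiting 10 real seconds.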

Implement directly — do not decompose further.

File scope (limit to these files)

  • src/commands/service/coordinator_agent.rs (around lines 929-971, restart logic on Ok(status))
  • tests/ (unit test for the backoff path with a fake session-lock-busy exit)

DO NOT touch: nex.rs lock acquisition, session_lock.rs, dispatch_boot.

Validation

  • Failing test written first: simulates exit-status-1 + recent live lock holder → assert ≥10s sleep recorded
  • Implementation makes the test pass
  • cargo build + cargo test pass with no regressions
  • Live verification: with daemon running and a chat in lock-churn (per research repro), apply patch, verify churn stops within 1 backoff window
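
The first validation item could be approached along these lines. This is a sketch under assumptions: the Sleeper trait, RecordingSleeper, and on_child_exit are hypothetical stand-ins for whatever seam the real restart loop exposes, and on_child_exit only imitates the Ok(status) branch rather than reproducing it. In the test-first workflow the check would be written before the branch body exists.

```rust
use std::time::Duration;

/// Hypothetical seam so tests can record sleeps instead of blocking.
trait Sleeper {
    fn sleep(&mut self, d: Duration);
}

/// Test double that records every requested sleep.
#[derive(Default)]
struct RecordingSleeper {
    slept: Vec<Duration>,
}

impl Sleeper for RecordingSleeper {
    fn sleep(&mut self, d: Duration) {
        self.slept.push(d);
    }
}

/// Stand-in for the restart logic on a child exit (assumed shape).
fn on_child_exit(
    exit_code: Option<i32>,
    uptime: Duration,
    lock_holder_recent: bool,
    sleeper: &mut dyn Sleeper,
) {
    let fast_exit = uptime <= Duration::from_secs(1);
    if exit_code == Some(1) && fast_exit && lock_holder_recent {
        // Treated as lock contention: back off before the next restart.
        sleeper.sleep(Duration::from_secs(10));
    }
}

/// Simulates exit status 1 with a recent live lock holder and reports
/// whether a sleep of at least 10s was recorded.
fn backoff_recorded_on_lock_contention() -> bool {
    let mut sleeper = RecordingSleeper::default();
    on_child_exit(Some(1), Duration::from_millis(200), true, &mut sleeper);
    sleeper.slept.iter().any(|d| *d >= Duration::from_secs(10))
}
```

In a real test module the helper's body would sit inside a #[test] function with the final expression wrapped in assert!.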

Depends on

Required by

Log