bump-codex-defaults — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-1030`
Agent identity	`f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e`
Created	2026-04-29T00:10:51.455197210+00:00
Started	2026-04-29T00:11:10.794901232+00:00
Completed	2026-04-29T00:26:14.764889671+00:00
Tags	`fix,codex,config`, `eval-scheduled`
Eval score	0.91
└ blocking impact	0.90
└ completeness	0.95
└ coordination overhead	0.90
└ correctness	0.95
└ downstream usability	0.90
└ efficiency	0.85
└ intent fidelity	0.88
└ style adherence	0.95

Description

Update the codex defaults to use gpt-5.5 (newest frontier model, / per MTok) as the worker default instead of gpt-5.4 ($2.50/$15). User preference: prefer newest capability for workers; the 2x cost difference is acceptable given codex:gpt-5.5 is still 3x cheaper than claude:opus per MTok. Meta-tasks (eval/flip/assign) stay on gpt-5.4-mini (best mini tier — no gpt-5.5-mini exists).

Scope

wg init --route codex-cli should write:
- [agent].model = "codex:gpt-5.5" (was: gpt-5.4)
- [dispatcher].model = "codex:gpt-5.5" (was: gpt-5.4)
- [models.default].model = "codex:gpt-5.5" (was: gpt-5.4)
- [models.evaluator] / [models.assigner] stay on codex:gpt-5.4-mini (unchanged)
- [tiers] section: fast=gpt-5.4-mini (unchanged), standard=gpt-5.5 (was gpt-5.4), premium=gpt-5.5 (unchanged) — OR keep standard=gpt-5.4 and just change [agent]/[dispatcher]. Pick the consistent option.
The codex starter profile (written by wg profile init-starters) should match — worker on gpt-5.5, meta on gpt-5.4-mini. The current starter description says "gpt-5.5 worker, gpt-5.4-mini for agency" which is what we want; verify it matches the actual file content.
Add a [models.flip] entry (currently missing in the codex-cli route output) so FLIP scoring also explicitly uses gpt-5.4-mini instead of falling through to a default that might silently be claude:haiku.

Validation

wg init --route codex-cli --dry-run output shows codex:gpt-5.5 in [agent].model and [dispatcher].model
wg init --route codex-cli --dry-run output includes a [models.flip] section pointing at codex:gpt-5.4-mini
wg profile init-starters followed by wg profile show codex shows gpt-5.5 as worker model
Live smoke: in a fresh tmpdir, wg init --route codex-cli then spawn a tiny task and confirm worker uses gpt-5.5 (wg agents or WG_MODEL env var)
cargo build + cargo test pass with no regressions
cargo install --path . was run before claiming done

## Description
Update the codex defaults to use gpt-5.5 (newest frontier model, / per MTok) as the worker default instead of gpt-5.4 ($2.50/$15). User preference: prefer newest capability for workers; the 2x cost difference is acceptable given codex:gpt-5.5 is still 3x cheaper than claude:opus per MTok. Meta-tasks (eval/flip/assign) stay on gpt-5.4-mini (best mini tier — no gpt-5.5-mini exists).

## Scope
1. `wg init --route codex-cli` should write:
   - `[agent].model = "codex:gpt-5.5"` (was: gpt-5.4)
   - `[dispatcher].model = "codex:gpt-5.5"` (was: gpt-5.4)
   - `[models.default].model = "codex:gpt-5.5"` (was: gpt-5.4)
   - `[models.evaluator]` / `[models.assigner]` stay on `codex:gpt-5.4-mini` (unchanged)
   - `[tiers]` section: fast=gpt-5.4-mini (unchanged), standard=gpt-5.5 (was gpt-5.4), premium=gpt-5.5 (unchanged) — OR keep standard=gpt-5.4 and just change [agent]/[dispatcher]. Pick the consistent option.

2. The `codex` starter profile (written by `wg profile init-starters`) should match — worker on gpt-5.5, meta on gpt-5.4-mini. The current starter description says "gpt-5.5 worker, gpt-5.4-mini for agency" which is what we want; verify it matches the actual file content.

3. Add a `[models.flip]` entry (currently missing in the codex-cli route output) so FLIP scoring also explicitly uses gpt-5.4-mini instead of falling through to a default that might silently be claude:haiku.

## Validation
- [ ] `wg init --route codex-cli --dry-run` output shows codex:gpt-5.5 in [agent].model and [dispatcher].model
- [ ] `wg init --route codex-cli --dry-run` output includes a [models.flip] section pointing at codex:gpt-5.4-mini
- [ ] `wg profile init-starters` followed by `wg profile show codex` shows gpt-5.5 as worker model
- [ ] Live smoke: in a fresh tmpdir, `wg init --route codex-cli` then spawn a tiny task and confirm worker uses gpt-5.5 (`wg agents` or WG_MODEL env var)
- [ ] cargo build + cargo test pass with no regressions
- [ ] cargo install --path . was run before claiming done

Depends on

done .assign-bump-codex-defaults

Required by

(none)

Log

2026-04-29T00:10:51.446826765+00:00 Task paused
2026-04-29T00:10:55.910635453+00:00 Task published
2026-04-29T00:11:09.379725994+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer is the sole programmer available with high performance (0.76), 240 completed tasks, and Careful tradeoff—ideal for correctness-critical config defaults.
2026-04-29T00:11:10.794904939+00:00 Spawned by coordinator --executor claude --model opus
2026-04-29T00:11:19.002725167+00:00 Starting: bump codex defaults to gpt-5.5 worker, keep gpt-5.4-mini for meta-tasks
2026-04-29T00:13:53.094829002+00:00 Plan: bump codex-cli route worker to gpt-5.5 in config_defaults.rs + config_cmd.rs render_minimal_config; add flip_inference/flip_comparison pinned to gpt-5.4-mini; update tests + smoke + docs
2026-04-29T00:25:29.229243861+00:00 Validated: cargo build OK, cargo test config_defaults all 22 pass, integration_setup_routes 15/15 pass, smoke codex_init_route_has_correct_defaults PASS, profile_diff PASS, profile_use_without_daemon PASS
2026-04-29T00:25:29.347796064+00:00 Pre-existing test failures unrelated to this task: tests/smoke_context.rs compile errors (missing ResumeConfig fields), config::tests::test_global_config_path (renamed .workgraph→.wg), config_show_default (renamed [coordinator]→[dispatcher]) — all from prior commits, not regressions
2026-04-29T00:26:03.867138970+00:00 Committed: 0299b376e — pushed to remote (wg/agent-1030/bump-codex-defaults)
2026-04-29T00:26:14.764897356+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
2026-04-29T00:29:03.430647734+00:00 PendingEval → Done (evaluator passed; downstream unblocks)