Metadata
| Status | done |
|---|---|
| Assigned | agent-1030 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Created | 2026-04-29T00:10:51.455197210+00:00 |
| Started | 2026-04-29T00:11:10.794901232+00:00 |
| Completed | 2026-04-29T00:26:14.764889671+00:00 |
| Tags | fix, codex, config, eval-scheduled |
| Eval score | 0.91 |
| └ blocking impact | 0.90 |
| └ completeness | 0.95 |
| └ coordination overhead | 0.90 |
| └ correctness | 0.95 |
| └ downstream usability | 0.90 |
| └ efficiency | 0.85 |
| └ intent fidelity | 0.88 |
| └ style adherence | 0.95 |
Description
Update the codex defaults to use gpt-5.5 (newest frontier model, roughly 2x the per-MTok price) as the worker default instead of gpt-5.4 ($2.50/$15 per MTok). User preference: prefer the newest capability for workers; the 2x cost difference is acceptable because codex:gpt-5.5 is still 3x cheaper than claude:opus per MTok. Meta-tasks (eval/flip/assign) stay on gpt-5.4-mini (best mini tier; no gpt-5.5-mini exists).
Scope
- `wg init --route codex-cli` should write:
  - `[agent].model = "codex:gpt-5.5"` (was: gpt-5.4)
  - `[dispatcher].model = "codex:gpt-5.5"` (was: gpt-5.4)
  - `[models.default].model = "codex:gpt-5.5"` (was: gpt-5.4)
  - `[models.evaluator]` / `[models.assigner]` stay on `codex:gpt-5.4-mini` (unchanged)
  - `[tiers]` section: fast=gpt-5.4-mini (unchanged), standard=gpt-5.5 (was gpt-5.4), premium=gpt-5.5 (unchanged); OR keep standard=gpt-5.4 and just change `[agent]`/`[dispatcher]`. Pick the consistent option.
- The `codex` starter profile (written by `wg profile init-starters`) should match: worker on gpt-5.5, meta on gpt-5.4-mini. The current starter description says "gpt-5.5 worker, gpt-5.4-mini for agency", which is what we want; verify it matches the actual file content.
- Add a `[models.flip]` entry (currently missing in the codex-cli route output) so FLIP scoring also explicitly uses gpt-5.4-mini instead of falling through to a default that might silently be claude:haiku.
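For orientation, the scope above could translate into a config fragment along these lines. This is a hedged sketch rather than the real `wg init --route codex-cli` output: the section names come from the scope list, but the exact key layout, the `codex:` prefix on the `[tiers]` values, and the choice of the first `[tiers]` option (standard=gpt-5.5) are assumptions. The log below also mentions `flip_inference`/`flip_comparison` entries, so the committed file may spell the FLIP section differently from the `[models.flip]` table shown here.

```toml
# Illustrative sketch only; key layout is assumed from the scope list,
# not copied from the generated file.

[agent]
model = "codex:gpt-5.5"        # was codex:gpt-5.4

[dispatcher]
model = "codex:gpt-5.5"        # was codex:gpt-5.4

[models.default]
model = "codex:gpt-5.5"        # was codex:gpt-5.4

[models.evaluator]
model = "codex:gpt-5.4-mini"   # unchanged

[models.assigner]
model = "codex:gpt-5.4-mini"   # unchanged

[models.flip]
model = "codex:gpt-5.4-mini"   # new: pins FLIP scoring instead of relying on a fallback

[tiers]
# Assumes the "standard = gpt-5.5" option; the alternative keeps standard on gpt-5.4.
fast = "codex:gpt-5.4-mini"    # unchanged
standard = "codex:gpt-5.5"     # was gpt-5.4
premium = "codex:gpt-5.5"      # unchanged
```

Setting standard and premium to the same model keeps `[tiers]` consistent with the `[agent]` and `[dispatcher]` defaults, which is the point of the "pick the consistent option" note.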
Validation
- `wg init --route codex-cli --dry-run` output shows codex:gpt-5.5 in `[agent].model` and `[dispatcher].model`
- `wg init --route codex-cli --dry-run` output includes a `[models.flip]` section pointing at codex:gpt-5.4-mini
- `wg profile init-starters` followed by `wg profile show codex` shows gpt-5.5 as worker model
- Live smoke: in a fresh tmpdir, run `wg init --route codex-cli`, then spawn a tiny task and confirm the worker uses gpt-5.5 (`wg agents` or the WG_MODEL env var)
- cargo build + cargo test pass with no regressions
- cargo install --path . was run before claiming done
Depends on
Required by
- (none)
Log
- 2026-04-29T00:10:51.446826765+00:00 Task paused
- 2026-04-29T00:10:55.910635453+00:00 Task published
- 2026-04-29T00:11:09.379725994+00:00 Lightweight assignment: agent=Careful Programmer (f5143935), exec_mode=full, context_scope=task, reason=Careful Programmer is the sole programmer available with high performance (0.76), 240 completed tasks, and Careful tradeoff—ideal for correctness-critical config defaults.
- 2026-04-29T00:11:10.794904939+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-29T00:11:19.002725167+00:00 Starting: bump codex defaults to gpt-5.5 worker, keep gpt-5.4-mini for meta-tasks
- 2026-04-29T00:13:53.094829002+00:00 Plan: bump codex-cli route worker to gpt-5.5 in config_defaults.rs + config_cmd.rs render_minimal_config; add flip_inference/flip_comparison pinned to gpt-5.4-mini; update tests + smoke + docs
- 2026-04-29T00:25:29.229243861+00:00 Validated: cargo build OK, cargo test config_defaults all 22 pass, integration_setup_routes 15/15 pass, smoke codex_init_route_has_correct_defaults PASS, profile_diff PASS, profile_use_without_daemon PASS
- 2026-04-29T00:25:29.347796064+00:00 Pre-existing test failures unrelated to this task: tests/smoke_context.rs compile errors (missing ResumeConfig fields), config::tests::test_global_config_path (renamed .workgraph→.wg), config_show_default (renamed [coordinator]→[dispatcher]) — all from prior commits, not regressions
- 2026-04-29T00:26:03.867138970+00:00 Committed: 0299b376e — pushed to remote (wg/agent-1030/bump-codex-defaults)
- 2026-04-29T00:26:14.764897356+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-04-29T00:29:03.430647734+00:00 PendingEval → Done (evaluator passed; downstream unblocks)