Metadata
| Field | Value |
|---|---|
| Status | done |
| Assigned | agent-993 |
| Agent identity | f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e |
| Model | claude:opus |
| Created | 2026-04-28T22:38:43.509006827+00:00 |
| Started | 2026-04-28T22:45:28.078440627+00:00 |
| Completed | 2026-04-28T22:48:32.656875426+00:00 |
| Tags | priority-high,research,codex,config,eval-scheduled |
| Eval score | 0.94 |
| └ blocking impact | 0.95 |
| └ completeness | 0.95 |
| └ constraint fidelity | 0.70 |
| └ coordination overhead | 0.95 |
| └ correctness | 0.95 |
| └ downstream usability | 0.92 |
| └ efficiency | 0.90 |
| └ intent fidelity | 0.80 |
| └ style adherence | 0.95 |
Description
Workgraph's codex-cli route currently hardcodes `codex:o1-pro` for everything. There's no haiku/sonnet/opus-equivalent tier breakdown for codex agents, and the agency pipeline (`.evaluate-*`, `.flip-*`, `.assign-*`) silently falls back to `claude:haiku` because `[models.*]` per-role overrides aren't seeded.
Research the current state of OpenAI/codex models as of 2026-04-28 and produce the mapping needed to fix this.
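For orientation, this is the shape of the config the description refers to: an unseeded codex-cli route today versus one with the per-role overrides filled in. A sketch only; the model strings below are placeholders, not the research outcome.

```toml
# Today: only the agent default is set, so .evaluate-* / .flip-* / .assign-*
# silently fall back to claude:haiku.
[agent]
model = "codex:o1-pro"   # current hardcoded default

# What seeding would look like (placeholder tier names, to be filled in
# by this research):
[models]
evaluator = "codex:<cheap-tier>"
assigner  = "codex:<cheap-tier>"
flip      = "codex:<cheap-tier>"
```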
What to find out
Use WebSearch / WebFetch (do NOT guess from training-cutoff knowledge — models have moved fast). Verify against:
- OpenAI's official model list (platform.openai.com/docs/models or equivalent)
- The codex CLI documentation for what models it supports
- Recent (2026) pricing pages
Produce a mapping table in the task log with these columns:
| Tier role | Claude model | Codex/OpenAI equivalent (2026-04-28) | Rationale | $/1M input | $/1M output |
|---|---|---|---|---|---|
| cheap/fast (haiku) | claude:haiku | codex:??? | ... | ... | ... |
| balanced (sonnet) | claude:sonnet | codex:??? | ... | ... | ... |
| heavy (opus) | claude:opus | codex:??? | ... | ... | ... |
Specifically answer:
- What are the current top-tier (reasoning-heavy) codex models in 2026-04?
- What is the cheapest codex model suitable for short, structured outputs (the role agency uses haiku for: scoring, FLIP, assignment verdicts)?
- What is the mid-tier balanced codex model for normal worker tasks?
- What model name strings does the codex CLI handler accept? (Check src/dispatch/handler_for_model.rs or equivalent if uncertain — but only the registry, not full source review.)
- Is `codex:o1-pro` (the current default) actually still a current model name in 2026-04, or has it been deprecated/superseded?
- The user mentioned "gpt-5.5" — does this exist? If not, what's closest?
Validation
- Mapping table with all 3 tiers populated, posted to task log via `wg log`
- Each codex model name is verified against an authoritative 2026 source (cite URL + date in the log)
- Cost figures included (or marked "could not verify" — never invented)
- Concrete recommendation: which model strings should the fix task write into [models.evaluator] / [models.assigner] / [models.flip] / [agent] / [dispatcher] in the codex-cli route
- Note any gotchas (rate limits, regional availability, codex CLI compat) that affect the recommendation
Depends on
Required by
Log
- 2026-04-28T22:38:43.494554937+00:00 Task paused
- 2026-04-28T22:39:16.280426423+00:00 Task published
- 2026-04-28T22:45:28.078444414+00:00 Spawned by coordinator --executor claude --model opus
- 2026-04-28T22:45:40.624706801+00:00 Starting research: 1) check codex-cli handler accepted model strings 2) web search current OpenAI models 2026-04 3) verify pricing 4) produce mapping
- 2026-04-28T22:48:16.600569155+00:00 Posted research report:

# OpenAI/codex tier mapping research — 2026-04-28

## Sources verified (cited inline below)

- OpenAI Codex models page: https://developers.openai.com/codex/models (fetched 2026-04-28)
- OpenAI API models index: https://developers.openai.com/api/docs/models (fetched 2026-04-28)
- OpenAI deprecations page: https://developers.openai.com/api/docs/deprecations (fetched 2026-04-28)
- Codex CLI changelog: https://developers.openai.com/codex/changelog (fetched 2026-04-28)
- Codex CLI reference: https://developers.openai.com/codex/cli/reference (fetched 2026-04-28)
- "Introducing GPT-5.5" (OpenAI, 2026-04-23): https://openai.com/index/introducing-gpt-5-5/
- TechCrunch GPT-5.5 launch coverage, 2026-04-23: https://techcrunch.com/2026/04/23/openai-chatgpt-gpt-5-5-ai-model-superapp/
- BenchLM API pricing snapshot (April 2026): https://benchlm.ai/blog/posts/openai-api-pricing
- pricepertoken.com per-model pages (2026): https://pricepertoken.com/pricing-page/model/openai-gpt-5.4-pro , .../openai-gpt-5.4-nano

## Q1–Q6 — direct answers

**Q1 — top-tier (reasoning-heavy) codex models in 2026-04:** gpt-5.5 (released 2026-04-23, API 2026-04-24) is the new frontier model. gpt-5.5-pro exists for maximum-reasoning workloads. gpt-5.4 remains current as the affordable flagship. gpt-5.2 is still listed as an alternate "deeper deliberation" debugging choice. The codex-specialised option is gpt-5.3-codex (current, positioned as an industry-leading coding model).

**Q2 — cheapest codex model for short structured outputs:** gpt-5.4-nano ($0.20 / $1.25 per MTok) is the cheapest. gpt-5.4-mini ($0.75 / $4.50) is the next step up and is OpenAI's recommended subagent model ("more than 2x faster than gpt-5-mini"). For agency scoring/FLIP/assignment verdicts (a few hundred output tokens) either works; nano is roughly 3.5x cheaper.
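A back-of-envelope sanity check on the nano-vs-mini cost claim above. The per-MTok prices are the ones cited in the report; the verdict size (~2,000 prompt tokens, ~300 output tokens) is an illustrative assumption, not a measured value.

```rust
// Per-verdict cost estimate from $/MTok prices. Prices are the
// April-2026 figures cited above; token counts are assumptions.
fn cost_usd(price_in: f64, price_out: f64, in_tok: f64, out_tok: f64) -> f64 {
    (in_tok * price_in + out_tok * price_out) / 1_000_000.0
}

fn main() {
    // (model, $/MTok input, $/MTok output)
    let models = [
        ("gpt-5.4-nano", 0.20, 1.25),
        ("gpt-5.4-mini", 0.75, 4.50),
        ("gpt-5.4", 2.50, 15.00),
    ];
    // Assumed verdict size: ~2,000 prompt + ~300 output tokens.
    for (name, pin, pout) in models {
        println!("{name}: ${:.6} per verdict", cost_usd(pin, pout, 2000.0, 300.0));
    }
}
```

At these assumed sizes, nano works out to about $0.0008 per verdict and mini to about $0.0029, so the ~3.5x ratio holds at the per-call level too.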
**Q3 — mid-tier balanced codex model:** gpt-5.4 ($2.50 / $15 per MTok) is the codex CLI's own default as of CLI 0.124.0 (2026-04-23 release notes: "Refreshed bundled model metadata, including the current gpt-5.4 default"). 1M context, 128K max output, knowledge cutoff 2025-08-31.

**Q4 — model name strings the codex CLI handler accepts:** Workgraph's handler (src/dispatch/handler_for_model.rs:66) is lenient: ANY `codex:<anything>` routes to the codex CLI handler, and the string after the prefix is passed verbatim as `--model` to the codex binary. The codex CLI itself accepts any model from its bundled metadata; the currently verified strings (per developers.openai.com/codex/models, 2026-04-28) are:

- gpt-5.5
- gpt-5.5-pro (API only, 2026-04-24+)
- gpt-5.4 (CLI default)
- gpt-5.4-mini
- gpt-5.4-nano
- gpt-5.3-codex
- gpt-5.3-codex-spark (ChatGPT Pro only, research preview)
- gpt-5.2

**Q5 — is codex:o1-pro still current?** NO: o1-pro is on a deprecation path. From OpenAI's deprecations page: "Shutdown date 2026-10-23, Model snapshot o1-pro-2025-03-19 | o1-pro, Recommended replacement: gpt-5.4-pro." It still works today but will hard-fail in about six months, so workgraph's codex-cli route default of `codex:o1-pro` is guaranteed future breakage; replace it immediately. Also on the sunset list, among the model strings workgraph references in src/profile.rs:

- `codex:gpt-5-codex` (profile.rs:116, "standard" tier) → shutdown 2026-07-23, substitute = `gpt-5.4`. Replace before July or it breaks silently.

**Q6 — does gpt-5.5 exist? Was the user right?** Yes: gpt-5.5 was announced 2026-04-23, five days before this research, so the user's "gpt-5.5" reference is correct, not hallucinated. API-key access landed 2026-04-24, and codex CLI 0.124.0 (released the same day, 2026-04-23) already supports `codex --model gpt-5.5`.
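The lenient routing described in Q4 can be sketched as follows. This is an illustrative reconstruction, not the actual src/dispatch/handler_for_model.rs source: the function name and return shape are assumptions; only the behavior (any `codex:` prefix routes to the codex CLI, suffix passed through verbatim) comes from the report.

```rust
// Sketch of prefix-based model routing (hypothetical signature): any
// "codex:<model>" string is accepted without an allow-list, and the
// suffix is forwarded unchanged as the CLI's --model argument.
fn handler_for_model(model: &str) -> Option<(&'static str, String)> {
    if let Some(rest) = model.strip_prefix("codex:") {
        // No validation here: "codex:gpt-5.4" and "codex:typo" both route
        // to the codex CLI, which rejects unknown names itself.
        return Some(("codex-cli", rest.to_string()));
    }
    if let Some(rest) = model.strip_prefix("claude:") {
        return Some(("claude-cli", rest.to_string()));
    }
    None // unprefixed strings are not routed
}

fn main() {
    assert_eq!(
        handler_for_model("codex:gpt-5.4"),
        Some(("codex-cli", "gpt-5.4".to_string()))
    );
    assert_eq!(handler_for_model("o1-pro"), None);
    println!("ok");
}
```

The practical consequence is that deprecated strings like `codex:o1-pro` fail only at codex-CLI invocation time, not at workgraph config time.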
## Mapping table (the deliverable)

| Tier role | Claude model | Codex/OpenAI equiv (2026-04-28) | Rationale | $/1M in | $/1M out |
|---|---|---|---|---|---|
| cheap/fast (haiku) — DEFAULT | claude:haiku | **codex:gpt-5.4-mini** | OpenAI's recommended subagent model; 2x faster than gpt-5-mini; cheap enough for FLIP/eval/assign. | $0.75 | $4.50 |
| cheap/fast (haiku) — ALT | claude:haiku | codex:gpt-5.4-nano | 3.5x cheaper than mini; use only if structured outputs are confirmed-short. | $0.20 | $1.25 |
| balanced (sonnet) — DEFAULT | claude:sonnet | **codex:gpt-5.4** | Codex CLI's own default as of 0.124.0; 1M context; affordable flagship. | $2.50 | $15 |
| balanced (sonnet) — coding alt | claude:sonnet | codex:gpt-5.3-codex | Codex-specialised; still current; pricing not separately published (treat ≈ gpt-5.4). | ~$2.50 | ~$15 |
| heavy (opus) — DEFAULT | claude:opus | **codex:gpt-5.5** | Newest frontier model (2026-04-23); OpenAI's "smartest"; matches opus tier. | $5 | $30 |
| heavy (opus) — max reasoning | claude:opus | codex:gpt-5.5-pro | Deep-horizon reasoning; 12x more expensive than gpt-5.5; reserve for hard tasks. | $30 | $180 |

## Concrete recommendation for fix-codex-cli

Workgraph today writes (when `wg config -m codex:o1-pro` or similar is set):

- agent default = codex:o1-pro ← DEPRECATED 2026-10-23
- dispatcher default = codex:o1-pro ← DEPRECATED 2026-10-23
- [models.evaluator] = (unset) ← falls back to claude:haiku silently
- [models.assigner] = (unset) ← falls back to claude:haiku silently
- [models.flip] = (unset) ← falls back to claude:haiku silently

The fix-codex-cli task should write THIS into the codex-cli route's default config:

```toml
[agent]
model = "codex:gpt-5.4"

[dispatcher]
model = "codex:gpt-5.4"

[models]
evaluator = "codex:gpt-5.4-mini"  # cheap, structured, fast
assigner  = "codex:gpt-5.4-mini"
flip      = "codex:gpt-5.4-mini"
```

For users who explicitly want a premium tier:

```toml
[models]
premium = "codex:gpt-5.5"  # opus-equivalent
```

Also update the src/profile.rs codex tiers (currently profile.rs:115-117):

- fast: codex:gpt-5.4-mini (already correct — keep)
- standard: codex:gpt-5-codex → CHANGE to codex:gpt-5.4 (gpt-5-codex sunsets 2026-07-23)
- premium: codex:gpt-5.4-pro → CHANGE to codex:gpt-5.5 (gpt-5.5 is now newer and cheaper than 5.4-pro at $5/$30 vs $30/$180; 5.4-pro is overkill for a "premium worker")

## Gotchas affecting the recommendation

1. **Auth split for gpt-5.5 in codex CLI**: gpt-5.5 was added 2026-04-23 (codex CLI 0.124.0). It works with ChatGPT-login auth from day 1; API-key auth landed 2026-04-24. Workgraph's codex-cli route delegates auth to the codex binary, so this should "just work", but users on codex CLI <0.124.0 won't see it. The fix task should NOT assume gpt-5.5 is available without a CLI version check, OR should fall back to gpt-5.4.
2. **Sunset cascade**: two model strings already in workgraph hit the deprecation list:
   - `codex:o1-pro` → 2026-10-23 shutdown (current default — burning fuse)
   - `codex:gpt-5-codex` → 2026-07-23 shutdown (in profile.rs:116 — closer fuse)
3. **Long-context pricing cliff**: gpt-5.5-pro and gpt-5.4-pro charge 2x input / 1.5x output for sessions exceeding 272K input tokens. Workgraph's `context full` scope can blow past this on big graphs. Reserve pro models for surgical use.
4. **Regional residency uplift**: 10% surcharge for gpt-5.5-pro and gpt-5.4-nano on data-residency endpoints. Not a blocker, but cost modelling should note it.
5. **gpt-5.3-codex-spark** is a ChatGPT-Pro-only research preview — do NOT default to it. Document it only as an opt-in for power users.
6. **Default `wg config -m codex:gpt-5`** (CLAUDE.md examples) currently passes the bare base `gpt-5` string. Per the deprecations page, various `gpt-5-*` snapshots are being retired; the unversioned `gpt-5` itself appears to remain as a routing alias, but `gpt-5.4` is the safer pinned default.
- 2026-04-28T22:48:24.467811945+00:00 Validated against task description checklist:
  - [x] Mapping table with all 3 tiers populated (haiku=gpt-5.4-mini, sonnet=gpt-5.4, opus=gpt-5.5) — posted via wg log above
  - [x] Each codex model name verified against an authoritative 2026 source — developers.openai.com/codex/models, /api/docs/models, /api/docs/deprecations, and /codex/changelog all fetched 2026-04-28 with URLs cited
  - [x] Cost figures included from BenchLM + pricepertoken + OpenAI API docs (none invented)
  - [x] Concrete config recommendation provided for [agent], [dispatcher], [models.evaluator], [models.assigner], [models.flip], and the src/profile.rs tier strings
  - [x] Gotchas listed: auth split, sunset cascade (o1-pro 2026-10-23, gpt-5-codex 2026-07-23), >272K context pricing cliff, regional uplift, ChatGPT-Pro-only spark variant
- 2026-04-28T22:48:32.656880586+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
- 2026-04-28T22:51:34.109444153+00:00 PendingEval → Done (evaluator passed; downstream unblocks)