I currently use GPT‑5.1-Codex High and have a workflow that works well with the 5-hour/weekly limits, credits, etc. If I use GPT‑5.1-Codex-Max Medium or GPT‑5.1-Codex-Max High, how will that compare to GPT‑5.1-Codex High in terms of cost, credits, and limits? I don't think that's clear. "Reduced tokens" makes me think it'll be priced similarly or lower, but "Max" makes me think it'll be priced higher.
Would it make sense to have a similar feature in Codex CLI? I often do "spec-driven development", which is basically a loop of:
research -> implementation plan -> actual implementation (based on research + plan) -> validation
I have multiple subagents that I use for each phase that (based on subjective judgement) improve the output quality (vs keeping everything, every tool use etc. in the "main" context window).
Codex CLI is great and I use it often but I'd like to have more of these convenient features for managing context from CC. I'm super happy that compaction is now available, hopefully we'll get more features for managing context.
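To make the research -> plan -> implementation -> validation loop above concrete, here is a minimal sketch of the idea, not how any particular tool implements it. It assumes each subagent is just a callable that takes a prompt and returns text; `Agent`, `PhaseResult`, and `run_pipeline` are hypothetical names for illustration. The point is that each phase starts from a fresh prompt containing only the final outputs of earlier phases, so intermediate tool calls never reach the main context.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a subagent: takes a prompt, returns its final text.
Agent = Callable[[str], str]

PHASES = ("research", "plan", "implement", "validate")


@dataclass
class PhaseResult:
    name: str
    output: str


def run_pipeline(task: str, agents: dict[str, Agent]) -> list[PhaseResult]:
    """Run each phase with a fresh prompt built only from earlier final outputs."""
    results: list[PhaseResult] = []
    for name in PHASES:
        # The subagent sees the task plus prior phases' summaries,
        # never the tool calls or scratch work that produced them.
        handoff = "\n".join(f"{r.name}: {r.output}" for r in results)
        prompt = f"[{name}] task: {task}\n{handoff}"
        results.append(PhaseResult(name, agents[name](prompt)))
    return results
```

In practice each callable would wrap a real subagent invocation; the orchestration itself stays this small.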
It would be nice if users of the codex-cli that are just using API keys as a way to handle rate limits and billing could receive these new models at the same time. I appreciate the reasoning behind delayed 'actual API' release, but I've found the rate limiting to be quite annoying, and my own API keys don't have this limitation.
Re: rate limits, I'm not sure they can yet, given capacity. See Jensen's comment today about their cloud GPUs being sold out. So capacity increases await the ongoing data center build-out.
Will -minis come for the codex family of models? About two months ago I used 5-mini as a daily driver for a few weeks and quite liked it, it seemed capable enough on small tasks with some hand holding and the speed/price were great as well.
Sorry, I don't like the Max model; it feels like it needs a lot more guiding. The plans it writes are better, however, so I tried feeding them back in (meta-prompt style), and that's working okay so far. This is on a very large repository.
Did you guys fix not being able to enable web searches or configure no timeouts for specific commands in the SDK? (Error 124 is way too common for long-running tasks.)
Probably that, before, it was given system instructions on how to do compaction, and now compaction is learned by the model, making it a native ability that needs no extra instructions in the prompt.
Continuous pre-training or fine-tuning, instead of inference-time instructions. It's also possible synthetic data for this purpose was in the pre-training as well, and they're now getting it to behave the way they'd like.
I think the point here is not that it does compaction (which Codex also already does) - but that the model was trained with examples of the Codex compaction, so it should perform better when compaction has taken place (a common source for drops in performance for earlier models).
I am also trying to understand the difference between compaction, and what IDEs like Cursor do when they "summarize" context over long-running conversations.
Is this saying that said summarization now happens at the model level? Or are there other differences?
My understanding is that they trained it to explicitly use a self-prune/self-edit tool that trims or summarizes portions of its message history (e.g. tool results from file explorations, messages that are no longer relevant, etc.) during the session, rather than "panic-compacting" at the end. In any case, it would be good if it does something like this.
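As a rough illustration of the self-pruning idea described above (a toy sketch, not how Codex actually implements compaction): walk the history oldest-first and replace stale tool outputs with a short stub until the history fits a token budget, leaving the most recent messages untouched. The message shape, the `prune_history` name, and the 4-characters-per-token estimate are all assumptions for the example.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def prune_history(messages: list[dict], budget: int, keep_recent: int = 2) -> list[dict]:
    """Stub out the oldest tool results until the history fits the budget."""
    pruned = [dict(m) for m in messages]  # copy so the caller's list is untouched

    def total() -> int:
        return sum(estimate_tokens(m["content"]) for m in pruned)

    # Messages inside the recent window are never pruned.
    cutoff = max(0, len(pruned) - keep_recent)
    for m in pruned[:cutoff]:
        if total() <= budget:
            break
        if m["role"] == "tool":
            m["content"] = "[tool output pruned]"
    return pruned
```

This is best-effort: if the protected recent messages alone exceed the budget, the result will still be over it. A learned approach would presumably also decide what to summarize rather than just dropping tool output.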
I don't see how their business would succeed. So far they are burning billions of investment dollars on compute with barely any revenue. Side hustles like Sora are a disaster: each video costs so much money to generate, and they will never bring in any money.
- New benchmark SOTAs with 77.9% on SWE-Bench-Verified, 79.9% on SWE-Lancer, and 58.1% on TerminalBench 2.0
- Natively trained to work across many hours across multiple context windows via compaction
- 30% more token-efficient at the same reasoning level across many tasks
Let us know what you think!