OpenAI shipped GPT-5.5 this week. The launch post calls it “our smartest and most intuitive to use model yet.”
That sentence, almost word-for-word, has been deployed at every major OpenAI release since GPT-4. It’s the AI lab equivalent of a Bond film insisting this one is darker, grittier, more grown up. By the time you’ve heard it five times you stop asking whether it’s true and start asking what the phrase is actually for.
It’s for keeping the launch cadence going. That’s fine. Marketing has a job. But “smartest yet” tells you nothing about what to do on Monday morning, which is the only question that matters if you’re shipping production AI.
Three claims, one with teeth
OpenAI is pitching three things with GPT-5.5: better agentic coding, better knowledge work on a computer, and fewer tokens to do the same job.
Two of those are vibes. “Better at agentic coding” is unfalsifiable until SWE-bench Verified scores land and people other than OpenAI run them. “More intuitive computer use” is a feeling, not a feature — and the launch copy actively hedges, saying GPT-5.5 “brings us closer to the feeling that the model can actually use the computer with you.” That’s not a capability claim. That’s a vibe statement dressed in capability clothing.
The third claim is the one to take seriously. Fewer tokens for the same task is measurable. It shows up on your invoice. You can A/B it on your own workload tomorrow morning, and unlike the agentic claims, you don’t need to wait for METR or anyone else to tell you whether it’s real. If the numbers hold, that’s a meaningful efficiency gain, possibly the most consequential part of this release, even though it’s the least quotable.
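The token claim is cheap to check because the data is already in every API response: the OpenAI Python SDK reports `response.usage.completion_tokens` on each call, so you collect that number for the same prompts on both models and compare. A minimal sketch of the comparison, with illustrative token counts (the API plumbing is one field read, as noted in the comments):

```python
# Sketch: A/B the token-efficiency claim on your own workload.
# The percentage maths is the only part shown; getting the numbers is
# one field read per call (response.usage.completion_tokens in the
# OpenAI Python SDK). Token counts below are illustrative, not measured.

def pct_token_savings(baseline_tokens: int, candidate_tokens: int) -> float:
    """Percentage fewer completion tokens the candidate used vs the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline token count must be positive")
    return 100.0 * (1.0 - candidate_tokens / baseline_tokens)

def summarise(results: list[tuple[int, int]]) -> float:
    """Mean per-task savings across (baseline, candidate) token pairs."""
    savings = [pct_token_savings(b, c) for b, c in results]
    return sum(savings) / len(savings)

# Usage pairs collected by running the same prompts on both models:
runs = [(812, 590), (1_204, 990), (640, 655)]
print(f"mean savings: {summarise(runs):.1f}%")
```

Run it against a sample of your real traffic, not toy prompts — the claim only matters at your prompt distribution, not theirs.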
Treat the rest as hypotheses.
The Codex angle is the actual story

The bit getting under-discussed — and it’s getting under-discussed because it doesn’t fit on a launch banner — is that GPT-5.5 is now the default in Codex, and Codex now has computer-use skills baked in. Clicking. Typing. Navigating interfaces.
OpenAI is no longer positioning Codex as a code assistant. They’re positioning it as the surface where the model operates the computer for you. Coding is just the first muscle. The same loop — see the screen, decide, act, observe the result — is the one you need for any knowledge work that lives across multiple apps.
This is the same Rubicon Anthropic crossed with Claude’s computer use, and the one Google is pushing toward with Gemini. Three frontier labs converging on the same thesis: the IDE is the new browser, and the agent is the new user.
If you’re a developer, the practical question stops being which model is best. It becomes which agent harness is least painful to live inside for eight hours a day. That’s a workflow question, not a benchmark question — and most procurement processes don’t know how to evaluate it.
The migration command nobody’s talking about
Buried in the launch is this:
$ openai-docs migrate this project to gpt-5.5
You run that in Codex and it drafts a migration plan for your existing API integration.
This is small but telling. OpenAI is admitting that “just point at the new model name” isn’t real advice — that prompt regressions, behaviour drift, and new failure modes are the actual cost of model upgrades. Building a migrate-my-codebase command into the tool is the right framing.
Whether it works well is a separate question. But every other lab will copy this within six months, because the alternative — making customers eat the migration cost themselves — only works while you’re the only game in town. And nobody is the only game in town any more.
The bit I want to flag carefully: OpenAI claims GPT-5.5 is “better at persisting across the loop” — meaning it can explore an idea, gather evidence, test assumptions, interpret results, and decide what to try next, without losing the thread.
That would be transformative. It would also fix the central failure mode of every agentic system shipped to date.
I want to believe. But I’ve heard this song before. It was called AutoGPT. It was called BabyAGI. It was called every agentic framework from 2023 onwards, and what each of them taught us was that persistence across a research loop is the hardest unsolved problem in applied AI — not because models are dumb, but because they hallucinate intermediate state and compound errors over long horizons.
Maybe GPT-5.5 has cracked it. Maybe the architecture changes are real and the persistence is genuine. We’ll know in a fortnight, when METR’s long-horizon task numbers land. Until then, treat the claim like a film trailer: enticing, edited for effect, no obligation to match the final cut.
What to actually do this week
If you’re on the OpenAI API: run your existing eval suite against GPT-5.5 before you migrate anything in production. The token-cost claim is the easiest to verify — measure it on your workload, not theirs.
If you’re a Codex user: this is your default now whether you wanted that or not. Pay attention to whether your tool-use success rate moves over the next two weeks.
If you’re multi-vendor — Claude, Gemini, OpenAI in rotation — GPT-5.5 doesn’t change your strategy. The whole point of last year’s prompt-engineering work was to keep these models swappable. Don’t undo that work for one launch.
And whatever you do, don’t migrate on a Friday.
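The swappability point is less a tool and more a discipline: every call site goes through one narrow interface, so changing vendors is a config edit rather than a rewrite. A minimal sketch of that shape — the provider classes here are illustrative stubs, not real SDK adapters, and the model names are placeholders:

```python
# Sketch: keep vendors swappable behind one narrow interface.
# EchoStub stands in for a real SDK adapter; in practice each entry in
# PROVIDERS would wrap a vendor client behind the same complete() method.
from typing import Protocol


class Completer(Protocol):
    def complete(self, prompt: str) -> str: ...


class EchoStub:
    """Test double: tags the prompt instead of calling a real API."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    def complete(self, prompt: str) -> str:
        return f"[{self.tag}] {prompt}"


# Swapping models is an edit here, nowhere else.
PROVIDERS: dict[str, Completer] = {
    "openai": EchoStub("gpt-5.5"),
    "anthropic": EchoStub("claude"),
}


def run(provider: str, prompt: str) -> str:
    # Call sites never name a vendor SDK directly.
    return PROVIDERS[provider].complete(prompt)


print(run("openai", "summarise this ticket"))
```

If last year’s prompt work already got you to this shape, a new launch is a one-line dictionary change plus an eval run — which is the whole point.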
RogueLoop. Where AI meets real-world innovation.