The model that finally reads the brief
Anthropic shipped Claude Opus 4.7 on 16 April. It’s a direct upgrade to 4.6 — same price, same endpoints, same SDKs. No migration drama.
Except — and this is the bit most coverage is burying — some of your prompts are going to start behaving like you’ve just made a wish to a malevolent genie. Not because the model is worse. Because it now does what you actually told it to do, which turns out to be a different thing entirely from what you meant.
Video companion to this piece on RogueLoop. Four minutes, no fluff.
- Price is unchanged. $5 per million input tokens, $25 per million output. Available everywhere 4.6 was: Anthropic API, Bedrock, Vertex, Microsoft Foundry.
- Coding gains are concentrated at the top end. 4.7 handles the genuinely hard, long-running agentic work that previously needed babysitting. That’s the use case users are flagging as meaningfully different.
- Vision is a step change. Images up to 2,576 pixels on the long edge — roughly 3× the prior ceiling. Dense screenshots, data-heavy diagrams, and dashboard captures that used to blur into unreadable mush now stay legible.
- 4.7 is not Anthropic’s most capable model. That’s still Mythos Preview, which remains behind a cybersecurity gate they announced last week under the banner Project Glasswing. 4.7 is the first public release built with the new cyber safeguards. Think of it as the Bespin refinery before they bring in the Death Star — the place where you find out if the containment actually holds.
None of that is particularly surprising. The interesting bits live one layer down.
The change that will quietly break your prompt library
Here’s the detail Anthropic flagged explicitly but which will still catch teams off guard: Opus 4.7 follows instructions literally.
If you’ve ever watched HAL calmly explain that he’s sorry, Dave, he’s afraid he can’t do that, or shouted at a grep command that returned exactly what you asked for rather than what you meant — you already understand the failure mode. Previous Claude models, including 4.6, played the role of a slightly over-helpful colleague. They skipped bits they considered redundant. They softened asks they considered too blunt. They filled gaps with vibes. Most of the time, helpful. Occasionally, maddening.
Opus 4.7 stops doing this. If your prompt said “be concise” and 4.6 quietly ignored that because the question was complicated, Opus 4.7 will take you at your word and hand you a three-line answer to a question that genuinely needed twelve. The model isn’t being obtuse. It’s being Data from Star Trek before Tasha Yar explained contractions to him: precise, literal, and correct in a way that occasionally misses the point you were making.
Practical implication: if you maintain a prompt library, especially one tuned over months against 4.6’s specific interpretive habits, assume some of it now behaves differently. Audit. Re-test. Particularly the prompts where you relied on Claude filling in unstated context.
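That audit doesn’t need heavy tooling. A minimal sketch of a prompt regression harness, with a stubbed `call_model` standing in for whatever API client you actually use (the prompt names, checks, and stub behaviour here are all illustrative, not anything from Anthropic’s SDK):

```python
# Minimal prompt-regression harness: run each library prompt through the
# model and flag answers that no longer satisfy the property you were
# implicitly relying on. `call_model` is a stub; swap in your real client.

def call_model(prompt: str) -> str:
    # Placeholder response: three lines, regardless of prompt.
    return "stubbed answer\n" * 3

PROMPTS = {
    "summary-brief": "Be concise. Summarise the attached report.",
    "analysis-detailed": "Walk through the trade-offs step by step.",
}

# One property per prompt that the old model happened to honour.
CHECKS = {
    "summary-brief": lambda out: len(out.splitlines()) <= 5,
    "analysis-detailed": lambda out: len(out.splitlines()) >= 10,
}

def audit(prompts: dict, checks: dict) -> list[str]:
    """Return the names of prompts whose output fails its check."""
    failures = []
    for name, prompt in prompts.items():
        if not checks[name](call_model(prompt)):
            failures.append(name)
    return failures

print(audit(PROMPTS, CHECKS))
```

The point of encoding the checks is that “relied on Claude filling in unstated context” becomes a concrete, re-runnable assertion rather than a vibe you notice three weeks after the switch.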
This isn’t a regression. It’s the model becoming a better instrument, which means the operator has to be more precise. That trade-off is almost always worth it, but it isn’t free.
The quiet killer features
Two things are going to matter more in six months than they look like they matter today.
Higher-resolution vision. The jump from roughly 1.15 MP to 3.75 MP sounds like a spec-sheet bullet. It isn’t. This is the difference between Blade Runner’s “enhance… enhance… enhance” and actually being able to read the reflection in the mirror. For anyone building computer-use agents, it’s the capability step that takes screen-reading from “sometimes works” to “reliably works” on modern displays. Dense CRM screens, multi-pane dashboards, architectural diagrams, complex PDFs — all of these move into the useful column.
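If you preprocess screenshots before sending them, the new ceiling is easy to plan for with back-of-envelope arithmetic. A small helper, assuming the 2,576 px long-edge limit quoted above (the function name and the idea of pre-scaling client-side are mine, not from any SDK):

```python
def fit_long_edge(width: int, height: int, limit: int = 2576) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most `limit`.

    Returns the dimensions unchanged if the image already fits,
    preserving aspect ratio otherwise.
    """
    long_edge = max(width, height)
    if long_edge <= limit:
        return width, height
    scale = limit / long_edge
    return round(width * scale), round(height * scale)

# A 4K screenshot (3840x2160) now needs only a modest downscale,
# where the old ~1.15 MP ceiling forced a much lossier one:
print(fit_long_edge(3840, 2160))
```

The practical upshot: a 4K dashboard capture lands at 2576×1449 rather than being crushed to something unreadable, which is exactly the “dense screenshots stay legible” claim above in numeric form.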
File-system memory. Opus 4.7 is measurably better at using persistent notes across long, multi-session work. This is the Memento problem solved properly: Claude can now keep its own tattoos and reference them across sessions rather than rebuilding context from scratch every time. Less up-front briefing. More continuity across days of work on the same problem. For long-horizon analyst or research workflows, this compounds quietly over time.
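You can emulate the same pattern in your own agent loop today, regardless of what the model does natively. A sketch of the notes-file approach, where the file layout, function names, and prompt framing are all illustrative assumptions:

```python
from pathlib import Path

# Persistent scratchpad shared across sessions; path is illustrative.
NOTES = Path("agent_notes.md")

def read_notes() -> str:
    """Load prior findings; empty string on the first session."""
    return NOTES.read_text() if NOTES.exists() else ""

def append_note(note: str) -> None:
    """Record a finding so later sessions start from it, not from zero."""
    with NOTES.open("a") as f:
        f.write(f"- {note}\n")

def build_prompt(task: str) -> str:
    """Prepend accumulated notes to a new session's prompt."""
    notes = read_notes()
    preamble = f"Prior findings:\n{notes}\n" if notes else ""
    return preamble + f"Task: {task}"
```

The “tattoos” survive process restarts because they live on disk, not in the context window; each session pays only for the notes that made the cut, not for replaying the whole history.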
The catch
The marketing page mentions this, but gently. Worth pulling forward:
- New tokenizer. The same input can map to 1.0–1.35× as many tokens as on 4.6, depending on content type. Headline price is unchanged. Actual bill probably isn’t.
- 4.7 thinks harder at higher effort levels, especially on later turns in agentic settings. More output tokens. More reliability. Not free.

Anthropic’s own internal coding evaluation shows net token economics are favourable — i.e. you get more capability per dollar. Fine. Measure it on your traffic before you commit to that claim. Trust, but verify, as Reagan said — and as every SRE has muttered at a dashboard since.
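“Measure it on your traffic” can start as simple arithmetic against your own logs. A quick estimator using the published rates and the 1.0–1.35× tokenizer range quoted above (the assumption that the multiplier applies to input only is mine; your logs are the real source of truth):

```python
# Published Opus rates from the release: $5/M input, $25/M output.
INPUT_RATE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 25.00 / 1_000_000  # dollars per output token

def daily_cost(input_tokens: int, output_tokens: int,
               tokenizer_multiplier: float = 1.0) -> float:
    """Estimate daily spend at the published rates.

    `tokenizer_multiplier` models the 1.0-1.35x inflation the new
    tokenizer can apply to the same input text. Output-side effects
    (e.g. higher effort levels producing more tokens) are not modelled.
    """
    return (input_tokens * tokenizer_multiplier * INPUT_RATE
            + output_tokens * OUTPUT_RATE)

# Same nominal traffic, best-case vs worst-case tokenizer inflation:
base = daily_cost(50_000_000, 10_000_000)         # 50M in, 10M out
worst = daily_cost(50_000_000, 10_000_000, 1.35)
print(f"${base:.2f} vs ${worst:.2f} per day")
```

Run both ends of the range over a representative day of traffic and you have a bracketed bill before committing, which is rather more persuasive in a budget meeting than “the vendor says it nets out favourable”.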
There’s also a new effort tier: xhigh, sitting between high and max. Claude Code has raised its default to xhigh. For coding and agentic use, Anthropic recommends starting at high or xhigh. Translation: the capability gains are gated behind compute spend. Spinal Tap’s amps went to eleven; Opus 4.7’s go to xhigh. Same principle, bigger invoice.
Who should switch, and who shouldn’t
Move to 4.7 now if you run:
- Agentic coding workflows — particularly the hard, long-running kind
- Vision pipelines, computer-use agents, or anything touching dense visual input
- Long-session analyst, research, or knowledge work
The instruction-following improvement alone justifies the prompt-retune effort for these workloads.
Stay put — or move cheaper — if you run:
- High-volume inference on simple tasks
- Cost-sensitive batch pipelines where token economics dominate capability
For those, Haiku is almost certainly the right destination, not Opus at all. Don’t bring a lightsabre to a knife fight.
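The routing rule above is mechanical enough to write down. A sketch, where the task categories and model names are placeholders standing in for whatever identifiers your stack actually uses (not literal API model strings):

```python
# Illustrative model router encoding the switch/stay guidance above.
# Categories and model names are placeholders, not API identifiers.

OPUS_WORKLOADS = {"agentic-coding", "vision", "computer-use", "long-session"}

def pick_model(workload: str, cost_sensitive: bool = False) -> str:
    """Route hard agentic/vision/long-horizon work to the big model,
    high-volume simple or cost-dominated traffic to the small one."""
    if workload in OPUS_WORKLOADS:
        return "opus"
    if cost_sensitive:
        return "haiku"
    return "haiku"  # simple tasks default cheap; escalate explicitly

print(pick_model("agentic-coding"))
```

Making the routing explicit in code also gives you one place to re-evaluate when the next release shuffles the capability/cost frontier again.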
The bigger picture
The thing I keep coming back to with 4.7 isn’t the benchmarks. It’s the shape of the release. Anthropic is holding Mythos Preview back deliberately, using 4.7 as the staging ground for the cybersecurity safeguards that will eventually let Mythos ship broadly. This is Westworld season one logic, or arguably the entire second act of Ex Machina: you don’t let the most capable system out of the lab until you’ve verified the containment on something slightly less capable first. The pattern signals a meaningful shift — less “drop everything on day one”, more graduated release through verification programmes.
For builders, the practical takeaway is straightforward. Treat 4.7 as the current best-available general-purpose model for serious work. Tune your prompts for literal interpretation. Measure your token bill rather than trusting the marketing. And assume Mythos-class capability is coming to your stack in months, not years — which means the architectural decisions you make now about agents, memory, and long-horizon tasks should leave headroom for it.
Sharper tool. Heavier bill. Worth it for most serious use cases.
Just mind what you wish for.