
Claude Opus 4.7 — what’s new and why it matters

Claude Opus 4.7, the best publicly available LLM: SWE-bench Pro 64.3%, 3× higher image resolution, 3× fewer agent errors. Benchmarks, changes, pricing and Mythos.


Claude Opus 4.7 is the new leader among publicly available LLMs: best-in-class coding, over 3× higher image resolution, three times fewer tool-use errors, and the same price. And Mythos is lurking in the background.

Anthropic released Claude Opus 4.7 today (April 16, 2026). It’s the new flagship model and, at least until OpenAI and Google respond, the best publicly available LLM on the market. Here’s what changed, what it means in practice, and why you should pay attention to what Anthropic didn’t release.

Key changes at a glance

Coding: a massive leap. SWE-bench Pro: 64.3% (vs 53.4% for Opus 4.6). CursorBench: 70.0% (vs 58.0%). This isn’t a minor tweak; it’s a jump of 10–12 percentage points on benchmarks that measure real-world programming tasks. On SWE-bench Pro, GPT-5.4 scores 57.7% and Gemini 3.1 Pro 54.2%, so Opus 4.7 leads.
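Because "percentage points" and "percent" are easy to mix up, here is a quick Python check of the gaps using only the scores quoted above. The `point_gain` helper is purely illustrative.

```python
# Scores quoted in this article (Anthropic's figures, April 16, 2026).
scores = {
    "SWE-bench Pro": {"Opus 4.7": 64.3, "Opus 4.6": 53.4},
    "CursorBench": {"Opus 4.7": 70.0, "Opus 4.6": 58.0},
}

def point_gain(new: float, old: float) -> float:
    """Absolute gain in percentage points (not a relative percentage)."""
    return round(new - old, 1)

for bench, s in scores.items():
    gain = point_gain(s["Opus 4.7"], s["Opus 4.6"])
    print(f"{bench}: +{gain} points")
```

The gaps come out at 10.9 and 12.0 points. In relative terms, the SWE-bench Pro jump is roughly 20% (10.9 / 53.4).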

3× higher-resolution vision. New maximum resolution: 2,576 pixels on the longer edge (~3.75 megapixels), more than 3× the previous pixel count. In visual navigation tests (no tools): 79.5% vs 57.7% for Opus 4.6. Claude now genuinely “sees” screenshots, interfaces, and documents instead of guessing from blurry thumbnails.
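If you prepare screenshots client-side, a minimal sketch of fitting an image to the 2,576 px longer-edge limit while preserving aspect ratio might look like this. It assumes the longer edge is the only constraint; the API may apply its own resizing and other limits (total pixels, file size), so check Anthropic's vision documentation. `fit_to_long_edge` and `megapixels` are hypothetical helper names.

```python
MAX_LONG_EDGE = 2576  # longer-edge maximum quoted for Opus 4.7

def fit_to_long_edge(width: int, height: int, max_edge: int = MAX_LONG_EDGE) -> tuple[int, int]:
    """Scale (width, height) down so the longer edge is at most max_edge,
    preserving aspect ratio. Images already within the limit are unchanged."""
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height
    scale = max_edge / long_edge
    return max(1, round(width * scale)), max(1, round(height * scale))

def megapixels(width: int, height: int) -> float:
    return width * height / 1_000_000

# A 4K (3840x2160) screenshot is downscaled; a 1920x1080 one passes through as-is.
w, h = fit_to_long_edge(3840, 2160)
print(w, h, round(megapixels(w, h), 2))
```

For a 16:9 screenshot this lands at 2576×1449 (about 3.73 MP). Other aspect ratios reach slightly different pixel counts at the same longer edge.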

Fewer tool errors. 14% better results on complex multi-step agentic workflows, while using fewer tokens and making three times fewer tool-use errors. For people building AI agents, this is huge: less retry logic, less debugging, less frustration.
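Why a 3× cut in per-call errors matters more than it sounds: in a multi-step agent, every tool call has to succeed. The sketch below assumes independent calls with one fixed error rate, which is a simplification. The 3% baseline and the 20-step workflow are hypothetical numbers, not figures from Anthropic.

```python
def workflow_success_rate(per_call_error: float, steps: int) -> float:
    """Probability that every tool call in a workflow succeeds, assuming
    independent calls with the same per-call error rate (a simplification)."""
    return (1 - per_call_error) ** steps

STEPS = 20                  # hypothetical workflow length
OLD_ERROR = 0.03            # hypothetical per-call tool error rate
NEW_ERROR = OLD_ERROR / 3   # "three times fewer" errors

for label, p in [("before", OLD_ERROR), ("after", NEW_ERROR)]:
    rate = workflow_success_rate(p, STEPS)
    print(f"{label}: {rate:.1%} of {STEPS}-step runs finish without a tool error")
```

With these made-up inputs, error-free runs rise from roughly 54% to roughly 82%. The longer the workflow, the bigger the payoff from a lower per-call error rate.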

New effort level: “xhigh”. It sits between the existing “high” and “max” levels, giving better results than “high” without the token cost of “max”. A practical sweet spot for most tasks.

/ultrareview. A new Claude Code command that runs an in-depth code review covering architecture, security, performance, and maintainability, more thorough than the standard review. In CodeRabbit’s tests, Opus 4.7 found 68 out of 100 real bugs vs 55 for Opus 4.6: 13 more bugs, a 24% relative improvement.
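The gap between "13 more bugs" and "24% better" is the difference between absolute and relative change. Here is a quick Python check using the CodeRabbit figures quoted above. `relative_improvement` is an illustrative helper.

```python
def relative_improvement(new: float, old: float) -> float:
    """Relative change, as a fraction of the old value."""
    return (new - old) / old

found_opus_47, found_opus_46, total_bugs = 68, 55, 100  # CodeRabbit figures quoted above

print(f"Share of bugs found: {found_opus_47 / total_bugs:.0%} vs {found_opus_46 / total_bugs:.0%}")
print(f"Absolute gain: {found_opus_47 - found_opus_46} bugs")
print(f"Relative gain: {relative_improvement(found_opus_47, found_opus_46):.0%}")
```

Because the test set has exactly 100 bugs, the 13-bug gain is also 13 percentage points. Relative to Opus 4.6's 55, it works out to about 24%.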

New tokenizer. A tokenizer is the component that splits text into “tokens” (word fragments) before the model processes it, so it determines how many tokens a given query consumes. The improved tokenizer boosts text-processing efficiency. Note, however, that the same input may now produce 1.0–1.35× as many tokens as before, which is worth factoring into your API budget.
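To see what the 1.0–1.35× range means for a budget, here is a minimal cost sketch at the $5 input / $25 output per-million-token prices listed under Pricing. The monthly workload is hypothetical. Applying the multiplier to output tokens as well as input is an assumption that gives a conservative upper bound.

```python
INPUT_PRICE_PER_M = 5.00    # USD per 1M input tokens (Opus 4.7 list price)
OUTPUT_PRICE_PER_M = 25.00  # USD per 1M output tokens

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

def cost_range(old_input: int, old_output: int,
               low: float = 1.0, high: float = 1.35) -> tuple[float, float]:
    """Cost range if the same text tokenizes to low..high times as many tokens.
    Scaling output tokens too is an assumption (conservative upper bound)."""
    return (cost_usd(round(old_input * low), round(old_output * low)),
            cost_usd(round(old_input * high), round(old_output * high)))

# Hypothetical monthly workload, measured with the old tokenizer.
lo, hi = cost_range(old_input=10_000_000, old_output=2_000_000)
print(f"${lo:,.2f} to ${hi:,.2f} per month")
```

For this example workload, a $100 monthly bill could grow to as much as $135 with unchanged prices. Measuring your own prompts under the new tokenizer is the only way to know where in that range you land.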

Benchmarks — Opus 4.7 vs the competition

Anthropic’s official benchmarks are one thing, but it’s worth looking at independent Vals.ai measurements from April 16, 2026 — covering Opus 4.7, Opus 4.6, Sonnet 4.6, Gemini 3.1 Pro, and GPT-5.4:

Vals.ai benchmarks — Claude Opus 4.7 vs competition, April 16, 2026

Additionally, from Anthropic’s official benchmarks:

Benchmark                     Opus 4.7   Opus 4.6   GPT-5.4
SWE-bench Pro                 64.3%      53.4%      57.7%
CursorBench                   70.0%      58.0%      n/a
Vision (no-tool navigation)   79.5%      57.7%      n/a
API price (in/out, per 1M)    $5/$25     $5/$25     $2.50/$15

A few things stand out:

Opus 4.7 leads in 6 out of 8 Vals.ai categories. Exceptions: CorpFin v2 (credit agreements), where the older Opus 4.6 is marginally better (67.02% vs 66.08%), and ProofBench (formal math proofs), where GPT-5.4 wins (56.00% vs 54.00%).

Vibe Coding Bench (building apps from scratch): 71.00% vs 67.42% for GPT-5.4. This benchmark measures what people actually do — you tell the AI “build me an app” and see what comes out. Opus 4.7 is the best. Notably, Gemini 3.1 Pro scores a dismal 32.03% here.

Terminal-Bench: Opus 4.7 finally leads (68.54%). Terminal-Bench used to be Codex/GPT territory; now Opus 4.7 beats GPT-5.4 (58.43%) and narrowly edges out Gemini 3.1 Pro (67.42%). This changes the picture from our earlier Codex vs Claude Code comparison.

What this means in practice

For developers

Opus 4.7 solves tasks that Opus 4.6 couldn’t handle. Developers report the model passed Terminal-Bench tests where earlier Claude versions failed, and solved a tricky concurrency bug that Opus 4.6 couldn’t crack. If you use Claude Code — it simply works better. Fewer corrections, less “no, I meant something else,” fewer iterations.

For people building agents

Three times fewer tool-use errors is a game-changer. AI agents that previously needed retry logic and error handling at every step now get through complex workflows with fewer hiccups. If you’re building automations in n8n, Make, or directly on the API, Opus 4.7 is significantly more reliable.
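Fewer tool errors mean a lighter safety net, not none. Here is a minimal, provider-agnostic sketch of the kind of per-step retry wrapper mentioned above, using exponential backoff with jitter. `flaky_tool` is a hypothetical stand-in for a real tool or API call, and the attempt count and delays are arbitrary choices.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], max_attempts: int = 3, base_delay: float = 0.5) -> T:
    """Call fn, retrying on exception with exponential backoff plus jitter.
    Re-raises the last exception once max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:  # in real code, catch only transient error types
            if attempt == max_attempts:
                raise
            # Wait base_delay * 2^(attempt - 1), plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")

# Hypothetical flaky tool: fails twice, then succeeds.
calls = {"n": 0}

def flaky_tool() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("tool call failed")
    return "ok"

print(with_retries(flaky_tool, max_attempts=3, base_delay=0.01))
```

With a more reliable model, the same wrapper simply fires less often, so you can keep `max_attempts` low without losing many runs.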

For regular users

Better vision means better analysis of screenshots, documents, photos. If you upload an invoice photo, an error screenshot, or a document scan to Claude — results will be more accurate. The rest of the changes (coding, agents) are more developer-facing.

The elephant in the room: Claude Mythos

Alongside the Opus 4.7 launch, Anthropic mentioned something more interesting: Claude Mythos Preview. This is their most powerful model to date, available by invitation only as part of Project Glasswing (a defensive cybersecurity initiative).

Mythos’s numbers are in a different league: 93.9% on SWE-bench, 97.6% on USAMO (the USA Mathematical Olympiad, a prestigious math competition), and the ability to discover zero-day exploits. Opus 4.7 is deliberately weaker than Mythos: Anthropic intentionally limited its cybersecurity capabilities.

Why does this matter? Because Anthropic is openly saying: “we have something far more powerful, but we’re releasing a weaker version because we need to test safety measures first.” That’s rare in an industry where everyone rushes to release. And it suggests that Mythos (or a Mythos-class model) will reach public distribution within months — after Opus 4.7 serves as a testing ground for safety filters.

Gizmodo put it aptly: “Anthropic Releases Claude Opus 4.7 to Remind Everyone How Great Mythos Is.”

Pricing

No changes: $5 input / $25 output per million tokens — identical to Opus 4.6. Available on Anthropic’s API, Amazon Bedrock, Google Cloud, and Azure.

Subscription plans (Claude Pro, Max) remain unchanged — Opus 4.7 replaces 4.6 as the default model.

My take

Opus 4.7 is a solid upgrade, not a revolution. The coding benchmark jumps are real and noticeable; the model is clearly better at hard tasks. The 3× jump in image resolution is a big plus. Fewer tool-use errors are what every agent builder has been waiting for.

But the real story is Mythos. Anthropic showed they have a model that beats everything on the market — and deliberately didn’t release it. This is a company that thinks about safety differently than OpenAI (release everything, fix later). Whether that’s good or bad depends on your perspective. But the fact remains: Anthropic is sitting on something we haven’t seen yet.

For today: if you use Claude, the update is free and worth it. If you don’t, Opus 4.7 is a good reason to try it.

Sources: Anthropic (anthropic.com) — official Opus 4.7 announcement. Benchmarks: SWE-bench Pro, CursorBench, no-tool vision. Industry reactions: VentureBeat, CNBC, Axios, Gizmodo, 9to5Mac, The Decoder, CodeRabbit. As of April 16, 2026.

Written by MML Studio
