Claude Opus 4.7 Review: Anthropic Changed the Coding AI Game — Here’s What You Need to Know

Anthropic dropped Claude Opus 4.7 with zero advance warning to the public, and the benchmark numbers are not normal. A 13% jump in coding performance over an already dominant model. Vision capability that more than tripled overnight. A new effort tier that didn’t exist yesterday. And all of this at the exact same price as the previous version.
I’ve been tracking every major AI model release for the past two years, and I’ll say it directly: this is one of the more consequential single-model upgrades Anthropic has shipped. Not because it reinvents everything; it doesn’t. But because it extends Claude’s lead in the specific areas where enterprises and developers actually spend money.
Let me break down exactly what changed in this Claude Opus 4.7 Review, what it means, and whether you should care.
What Is Claude Opus 4.7?
Claude Opus 4.7 is Anthropic’s most capable generally available AI model as of today. It replaces Opus 4.6 at the top of the Claude lineup, the same lineup that includes Sonnet 4.6 for everyday tasks and the restricted-access Claude Mythos for a handful of security partners. Opus 4.7 is what most developers and businesses can actually use.
Think of it like this: if the Claude model family is a car brand, Mythos is the prototype in a locked lab. Opus 4.7 is the flagship in the showroom. You can buy it, drive it, and build on it today.
The model ID is claude-opus-4-7, and it’s live right now across the Claude platform, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing stays at $5 per million input tokens and $25 per million output tokens, unchanged from Opus 4.6. That means Anthropic is offering substantially better performance at the same cost. That’s not nothing.
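To make those rates concrete, here is a back-of-the-envelope cost check. The $5/$25 per-million-token figures come from the release; the workload numbers are made up for illustration.

```python
# Per-request cost at the stated Opus 4.7 rates ($5/$25 per MTok).
INPUT_PER_MTOK = 5.00    # USD per million input tokens
OUTPUT_PER_MTOK = 25.00  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    return (input_tokens / 1_000_000) * INPUT_PER_MTOK + \
           (output_tokens / 1_000_000) * OUTPUT_PER_MTOK

# Example: a 20k-token prompt with a 2k-token reply
print(round(request_cost(20_000, 2_000), 4))  # 0.15
```

At these rates, even a large-context request stays in the cents range; the output side dominates the bill for long generations.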
It’s also worth noting this is Anthropic’s third Opus release in six months: Opus 4.5 in November, 4.6 in February, and now 4.7 in April. The pace is real and it’s not slowing down. If you’re building on Claude, you should be thinking about upgrade cycles every eight weeks, not every year.
Claude Opus 4.7 Features — What Actually Changed

1. Coding Performance: The Numbers Are Hard to Ignore
On SWE-bench Pro, the benchmark that tests a model’s ability to resolve real GitHub issues from production codebases rather than toy problems, Opus 4.7 hits 64.3%. That’s up from 53.4% on Opus 4.6. GPT-5.4 scores 57.7% on the same test. Gemini 3.1 Pro lands at 54.2%.
On SWE-bench Verified, a curated subset of the same benchmark, Opus 4.7 reaches 87.6% compared to Opus 4.6’s 80.8% and Gemini 3.1 Pro’s 80.6%. CursorBench, which measures how well the model performs as an actual coding assistant inside an IDE, jumped from 58% to 70%.
Anthropic’s internal 93-task coding benchmark showed a 13% lift over Opus 4.6, including four tasks that neither Opus 4.6 nor Sonnet 4.6 could complete at all. On Rakuten-SWE-Bench, Opus 4.7 resolves three times more production tasks than its predecessor. Those aren’t incremental gains.
For developers using AI coding assistants in daily work, this matters immediately. GitHub Copilot already runs on Claude and is deploying Opus 4.7 today. Cursor has announced 50% promotional pricing for Opus 4.7 adoption. The model that powers the tools millions of engineers use every day just got significantly better.
2. Vision: 3.75 Megapixels Changes What’s Possible
Previous Claude models topped out at 1.15 megapixels. Opus 4.7 accepts images up to 2,576 pixels on the long edge — roughly 3.75 megapixels. That’s more than a 3x increase in visual capacity, and it shows up immediately in practice.
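A quick pre-flight check against those limits can be sketched in a few lines. The 2,576 px long-edge and ~3.75 MP figures come from the article; the uniform-downscale policy below is an illustrative assumption, not Anthropic’s documented resize rule.

```python
# Sketch: check an image against the stated Opus 4.7 vision limits.
MAX_LONG_EDGE = 2_576      # pixels on the longest side (per the article)
MAX_PIXELS = 3_750_000     # ~3.75 megapixels (per the article)

def fits_vision_limits(width: int, height: int) -> bool:
    """True if the image needs no resizing under the stated limits."""
    return max(width, height) <= MAX_LONG_EDGE and width * height <= MAX_PIXELS

def downscale_factor(width: int, height: int) -> float:
    """Uniform scale factor that brings an image within both limits (1.0 = no resize)."""
    if fits_vision_limits(width, height):
        return 1.0
    edge_scale = MAX_LONG_EDGE / max(width, height)
    area_scale = (MAX_PIXELS / (width * height)) ** 0.5
    return min(edge_scale, area_scale)

print(fits_vision_limits(2576, 1440))          # True
print(round(downscale_factor(4000, 3000), 3))  # 0.559
```

The useful takeaway: a standard 2560×1440 screenshot now fits without downscaling, which is exactly the class of input that used to lose detail.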
The practical impact is straightforward. Screenshots used to lose detail that mattered. Dense diagrams required workarounds. Scanned contracts with small print were unreliable. At 3.75 megapixels, coordinate mapping is now 1:1 with actual pixels — which means computer use automation that previously failed too often to be production-viable becomes genuinely reliable.
On visual acuity benchmarks, Opus 4.7 scores 98.5% compared to Opus 4.6’s 54.5%. On visual navigation without tools, it reaches 79.5% versus 57.7% for its predecessor. For teams using Claude in document analysis, financial reporting, or legal review workflows, this is where the upgrade pays off most immediately.
3. xhigh Effort Level: Finer Control Over Reasoning Depth
Opus 4.7 introduces a new effort parameter called xhigh that sits between the existing high and max levels. At xhigh, the model invests significantly more tokens in internal reasoning before responding. Anthropic recommends starting with high or xhigh for coding and agentic tasks where quality matters more than speed. Claude Code now defaults to xhigh across all plans.
At a 100k-token thinking budget, the new xhigh level already scores 71% on Anthropic’s reasoning benchmark, ahead of Opus 4.6’s max setting at 200k tokens. You get better results with fewer thinking tokens. That’s the kind of efficiency improvement that matters at production scale.
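A request using the effort tier might look like the sketch below. The `effort` field name, its placement, and the set of tier names are assumptions based on the article’s description, not a confirmed API shape; check Anthropic’s migration guide before relying on it.

```python
# Hypothetical request payload using the effort tiers described above.
def build_request(prompt: str, effort: str = "xhigh") -> dict:
    allowed = {"high", "xhigh", "max"}  # tiers named in the article; others may exist
    if effort not in allowed:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        "effort": effort,  # assumed field name, per the article's description
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Refactor this module for thread safety.")
print(req["effort"])  # xhigh
```

The practical guidance from Anthropic stands regardless of the exact wire format: default to high or xhigh for coding and agentic work, and reserve max for the hardest problems.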
4. Agentic Reliability: 14% Better, One-Third the Tool Errors
For complex multi-step workflows, Anthropic reports a 14% improvement over Opus 4.6 with fewer tokens and a third of the tool errors. The model is the first to pass what Anthropic calls “implicit-need tests” — tasks where the model must infer which tools or actions are required rather than being told explicitly.
This is genuinely important for anyone building enterprise AI agent deployments. Tool errors in agentic pipelines cascade. One failure can break a workflow that took dozens of steps to reach. Reducing tool errors by two-thirds is not a minor improvement — it’s the difference between a pipeline that runs reliably and one that requires constant intervention.
Anthropic also says Opus 4.7 is engineered to sustain focus over hours-long workflows. Multi-agent coordination — orchestrating parallel AI workstreams rather than processing tasks sequentially — is now built in. For teams running Claude across code review, document analysis, and data processing at the same time, this translates directly to throughput.
5. File System Memory: Context That Survives Sessions
Opus 4.7 has improved file system-based memory. Agents that write to and read from scratchpads or notes files across long sessions get noticeably more reliable behavior. Multi-session work that previously lost context now holds it.
In plain terms: if you’re using Claude to manage a project that runs across multiple days, the model remembers what it learned in earlier sessions and uses that to move faster on new tasks. Less up-front context required each time. That’s a real quality-of-life improvement for anyone doing serious knowledge work with the model.
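The pattern the model is better at can be sketched as a simple notes file that an agent appends to and re-reads across sessions. The file layout and naming here are illustrative assumptions; Claude’s internal memory format is not public.

```python
# Illustrative scratchpad pattern for cross-session agent memory.
from pathlib import Path

NOTES = Path("agent_notes.md")  # assumed location, for illustration only

def append_note(note: str) -> None:
    """Persist a finding so a later session can pick it up."""
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

def load_notes() -> list[str]:
    """Return prior findings, or an empty list on a fresh project."""
    if not NOTES.exists():
        return []
    return [line[2:].strip()
            for line in NOTES.read_text(encoding="utf-8").splitlines()
            if line.startswith("- ")]

append_note("Auth module uses legacy session tokens")
print(load_notes())  # ['Auth module uses legacy session tokens'] on a fresh run
```

What changed in 4.7 is not the mechanism but the reliability: agents that follow this read-before-acting discipline reportedly lose context far less often across long sessions.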
6. New /ultrareview Command in Claude Code
Claude Code now ships with an /ultrareview slash command that runs a dedicated review session across your changes and flags what a careful human reviewer would catch. It’s a different experience from a standard /review: according to early users, the depth and specificity are on another level. For engineering teams where code review quality matters, this is worth testing immediately.
7. Cybersecurity Safeguards: A Direct Response to Mythos
Opus 4.7 is the first Claude model with automated detection and blocking for prohibited cybersecurity uses. This comes directly from the Claude Mythos situation. Anthropic stated it would test new cyber safeguards on less capable models before deploying Mythos-class capabilities more broadly — Opus 4.7 is that model.
Security professionals doing legitimate work can apply through Anthropic’s new Cyber Verification Program. This isn’t just compliance theater. The dual-use concerns around powerful AI models are real, and Anthropic is building the infrastructure to handle them before they become crises. For enterprises in regulated industries, this is a meaningful trust signal.
Claude Opus 4.7 Benchmark
Here’s the full picture across the benchmarks that matter in April 2026. These numbers come from Anthropic’s official release and independent evaluations published today.
| Benchmark | Claude Opus 4.7 | Claude Opus 4.6 | GPT-5.4 Pro | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Pro | 64.3% | 53.4% | 57.7% | 54.2% |
| SWE-bench Verified | 87.6% | 80.8% | ~80% | 80.6% |
| CursorBench | 70% | 58% | — | — |
| GPQA Diamond | 94.2% | ~91% | 94.4% | 94.3% |
| MMMLU | 91.5% | 91.1% | — | 92.6% |
| Visual Acuity | 98.5% | 54.5% | — | — |
| Finance Agent v1.1 | 64.4% | — | — | — |
| Terminal-Bench 2.0 | 69.4% | — | 75.1% | 68.5% |
| BrowseComp | 79.3% | — | 89.3% | — |
The honest read: Opus 4.7 wins clearly on coding (SWE-bench Pro, SWE-bench Verified, CursorBench), vision, and agentic reasoning. GPT-5.4 holds the lead on terminal execution and web browsing. Gemini 3.1 Pro stays competitive on multilingual tasks. Graduate-level reasoning is a statistical tie across all three frontier models; they’ve effectively saturated GPQA Diamond, and the differentiation is moving elsewhere.
Claude Opus 4.7 Use Cases: Who Gains the Most from Opus 4.7

Software Engineering Teams
This is the most obvious case. If your team is using Claude Code, Cursor, or GitHub Copilot, you’re already running on Claude. The jump from 58% to 70% on CursorBench isn’t just a benchmark number; it’s the difference between an AI pair programmer that gets it right the first time and one that needs two or three rounds of correction.
Warp reported that Opus 4.7 passed Terminal-Bench tasks that prior Claude models had failed, and solved a tricky concurrency bug Opus 4.6 couldn’t crack. Box’s evaluation showed 56% fewer model calls and 50% fewer tool calls compared to Opus 4.6: the same work getting done with less compute.
Enterprise Document Analysis
The vision upgrade is a category shift for teams processing scanned contracts, technical drawings, financial statements, and dense reports. Databricks reported 21% fewer errors on document reasoning compared to Opus 4.6. At higher resolution, OCR errors decrease, coordinate mapping becomes reliable, and diagrams that were previously interpretable only by humans can be processed at scale.
For legal teams doing contract review, financial analysts working through reports with embedded tables and charts, and life sciences companies handling patent documentation, this is a meaningful operational upgrade.
Agentic Workflow Builders
The 14% improvement in complex multi-step workflows with a third of the tool errors is the headline for anyone building AI workflow automation. Notion’s team said Opus 4.7 is the first model to pass their implicit-need tests: the model infers what tools are needed rather than requiring explicit instruction. That’s the difference between an agent that can be trusted and one that needs supervision.
Genspark’s Super Agent team specifically cited loop resistance as the critical differentiator. A model that loops indefinitely on even one in eighteen queries wastes compute and blocks users. Opus 4.7 achieves the highest quality-per-tool-call ratio they’ve measured. For teams managing production agent pipelines, that reliability improvement translates directly to cost and user experience.
Marketers and Content Professionals
The improved long-context coherence and memory across sessions make Opus 4.7 stronger for extended content projects. If you’re working on a content strategy that spans dozens of documents, or managing brand guidelines across a long project, the model’s ability to hold context across sessions and apply it consistently is genuinely useful. The writing quality lead that Claude has maintained over its competitors hasn’t diminished — it’s carried forward into 4.7.
This connects directly to how creators use AI tools for content creation at scale: the difference between a model that drifts in tone over a 10,000-word project and one that stays consistent is significant in production workflows.
Claude Opus 4.7 vs Claude Opus 4.6 — Is the Upgrade Worth It?
The pricing is identical. Same $5 per million input tokens, same $25 per million output tokens. The model is available on the same plans and the same infrastructure. The migration is a drop-in replacement: change the model ID from claude-opus-4-6 to claude-opus-4-7 and, in most cases, you’re done.
There are two things to plan for. First, Opus 4.7 uses a new tokenizer that may produce up to 35% more tokens for the same text compared to Opus 4.6. The per-token price hasn’t changed, but your effective cost per request may increase depending on content type. Test your most common use cases before migrating production workloads. Second, extended thinking budgets and sampling parameter controls have been removed from the API. If your implementation relies on those, you’ll need to adapt.
For most teams, the upgrade is straightforward: better performance at the same price. The vision improvement alone justifies migration for any workflow involving images or documents. The coding gains are significant enough that delaying the upgrade costs you productivity every day you wait.
Claude Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro

The honest answer: no single model wins everywhere, and the right choice depends on your specific workload. But here’s the clearest breakdown I can give you.
Choose Opus 4.7 if you’re primarily doing software engineering, complex agentic workflows, enterprise document analysis, or long-form knowledge work. It’s the best generally available model for those tasks today. The SWE-bench lead is real and meaningful. The vision upgrade is not matched by competitors at this price point.
Consider GPT-5.4 if terminal execution and autonomous web browsing are central to your workflow. Its 75.1% on Terminal-Bench 2.0 versus Opus 4.7’s 69.4% is a real gap for DevOps and infrastructure work. BrowseComp at 89.3% versus 79.3% matters for research and web automation tasks.
Gemini 3.1 Pro makes sense if you need the 2M context window you can’t get anywhere else, and at $2/$12 per million tokens it’s roughly 60% cheaper than Opus 4.7. For high-volume workloads where you need frontier-adjacent performance on a budget, Gemini is a legitimate choice. It leads on multilingual benchmarks and is the strongest option for Google ecosystem integration.
My take: the best production setups in 2026 don’t pick one model. They route. Opus 4.7 for complex coding and agent work. Gemini for high-volume and research. GPT-5.4 for terminal tasks. The cost of that flexibility is lower than the cost of using the wrong model for every task.
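A routing layer like the one described above can be as simple as a lookup table. The task categories and the non-Claude model IDs below are illustrative; only claude-opus-4-7 is a confirmed ID from the release.

```python
# Minimal model-routing sketch following the "route, don't pick one" approach.
ROUTES = {
    "coding": "claude-opus-4-7",
    "agentic": "claude-opus-4-7",
    "documents": "claude-opus-4-7",
    "terminal": "gpt-5.4",            # hypothetical ID; strongest on Terminal-Bench
    "browsing": "gpt-5.4",            # hypothetical ID; strongest on BrowseComp
    "high_volume": "gemini-3.1-pro",  # hypothetical ID; cheapest per token
}

def pick_model(task_type: str) -> str:
    """Route a task to the model the benchmarks above favor."""
    return ROUTES.get(task_type, "claude-opus-4-7")  # default to the generalist

print(pick_model("terminal"))  # gpt-5.4
print(pick_model("coding"))    # claude-opus-4-7
```

In production you would route on classifier output or request metadata rather than a hand-set label, but the economics are the same: the routing table is cheap, and using the wrong model for every task is not.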
For a broader view of how these models fit into larger toolsets, see our best AI chatbots for 2026 comparison and the best AI tools roundup.
Claude Opus 4.7 Pricing and Access
Pricing is unchanged from Opus 4.6:
| Plan / Tier | Pricing | Access |
|---|---|---|
| API (base) | $5/MTok input · $25/MTok output | All API users |
| Prompt caching | Up to 90% savings on cached input | API |
| Batch API | 50% discount on input + output | API |
| US-only inference | 1.1x multiplier | API |
| Claude Pro / Max | $20/mo · $100/mo | claude.ai |
| Team / Enterprise | Custom | claude.ai |
One technical note: the new tokenizer can increase token counts by 1.0 to 1.35x depending on content. If you’re running high-volume batch jobs, factor that into your cost modeling before migrating. For most conversational and single-request use cases, the difference will be negligible.
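Modeling the worst case is straightforward arithmetic. The 1.0–1.35x range comes from the article; the workload volumes below are made up for illustration.

```python
# Cost delta from the tokenizer change (1.0x-1.35x, per the article).
INPUT_PER_MTOK = 5.00    # USD per million input tokens
OUTPUT_PER_MTOK = 25.00  # USD per million output tokens

def monthly_cost(input_mtok: float, output_mtok: float,
                 token_factor: float = 1.0) -> float:
    """Monthly USD cost; token_factor inflates counts for the new tokenizer."""
    return token_factor * (input_mtok * INPUT_PER_MTOK + output_mtok * OUTPUT_PER_MTOK)

baseline = monthly_cost(500, 100)        # 500 MTok in, 100 MTok out on Opus 4.6
worst = monthly_cost(500, 100, 1.35)     # same workload, worst-case tokenizer
print(baseline)          # 5000.0
print(round(worst, 2))   # 6750.0
```

For a workload at this scale, the worst-case tokenizer hit adds $1,750/month despite unchanged per-token prices, which is why measuring your actual token inflation before migrating batch jobs matters.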
For developers building with the API, the key detail is that Opus 4.7 removes manual thinking budgets and sampling parameters. If your prompts rely on those controls, you’ll need to adapt. Anthropic’s migration guide covers the specifics.
Claude Opus 4.7 Pros and Cons

What Works Well
- Coding benchmark leadership — SWE-bench Pro at 64.3% is the highest score among publicly available models, and the gap over GPT-5.4 (57.7%) is meaningful rather than marginal.
- Vision that actually works — 98.5% visual acuity versus 54.5% for Opus 4.6 is a genuine category shift, not an incremental upgrade. Computer use automation workflows that previously required workarounds now run cleanly.
- Same pricing — Anthropic is delivering 13% better coding performance, 3x better vision, and improved agentic reliability at the same $5/$25 per million token price. That’s unusual in a market where every capability improvement comes with a price increase.
- Multi-agent coordination — Running parallel workstreams is now built in. For enterprise teams managing complex multi-step pipelines, the throughput improvement is real.
- Honest instruction following — The model does what you ask more precisely. Fewer silent interpretations, fewer skipped steps. For production workflows, predictability is a feature.
What to Watch Out For
- Terminal-Bench gap — GPT-5.4 scores 75.1% versus Opus 4.7’s 69.4%. If DevOps and infrastructure automation are your primary workload, this gap matters. It’s the one area where switching from Opus 4.6 to 4.7 doesn’t clearly advance your position.
- New tokenizer increases costs on some content — Up to 35% more tokens for the same text is real, even if the per-token price hasn’t changed. Test your specific workloads before migrating production pipelines.
- BrowseComp trails GPT-5.4 — 79.3% versus 89.3%. If web research and browsing automation are core to your workflows, factor this in.
- API breaking changes — Extended thinking budgets and sampling parameters are gone. Teams relying on those will need to refactor. It’s not complicated, but it requires time.
- Still below Mythos — Anthropic’s own restricted model scores 77.8% on SWE-bench Pro compared to Opus 4.7’s 64.3%. The model that’s actually better isn’t available to most users. That’s not a knock on Opus 4.7 — Mythos is locked for reasons — but it’s worth knowing the ceiling exists.
Limitations and Risks
The 1M token context window is half of Gemini 3.1 Pro’s 2M. For most enterprise use cases that’s sufficient, but for teams that need to pass entire large codebases or massive document collections in a single prompt, Gemini has a real structural advantage.
The cybersecurity safeguards in Opus 4.7 are new and built on models of harmful use that will continue to evolve. Security professionals doing legitimate penetration testing or vulnerability research now need to apply through Anthropic’s Cyber Verification Program. That adds friction to a workflow that used to be frictionless. It’s the right call from a safety perspective, but teams should know it’s there.
As with any frontier model, the benchmark scores represent best-case performance under controlled conditions. Real-world production performance varies by prompt quality, task complexity, and domain. Running your own evals on representative workloads before migrating is always the right approach, regardless of how good the headline numbers look.
Who Should Use Claude Opus 4.7?
Yes, use Opus 4.7 if: You’re building software with AI assistance, managing agentic pipelines in production, processing high-volume documents with vision components, doing financial or legal knowledge work, or building enterprise applications that need reliable multi-step reasoning over long horizons. This is the best generally available model for those use cases today.
Probably yes if: You’re currently on Opus 4.6 and satisfied with it. The migration is nearly zero-friction, the performance improvements are real, and the price is the same. Unless the tokenizer change significantly affects your costs or the API breaking changes require substantial refactoring, there’s limited reason to stay on the older model.
Consider alternatives if: Your workflows are terminal-heavy and DevOps-focused (GPT-5.4 has an edge), cost is your primary constraint and you’re running high-volume batch jobs where the tokenizer increase matters (Gemini 3.1 Pro at $2/$12 is compelling), or you’re doing web research automation where BrowseComp performance matters.
For solopreneurs and individuals, Claude Pro at $20/month gives you Opus 4.7 without the API complexity. If you’re already a Pro subscriber, you have access today. For those building AI workflows as a solopreneur, Opus 4.7 is worth testing on your most demanding tasks immediately.
Claude Opus 4.7 and the Bigger Picture: Where Anthropic Is Heading

Anthropic is at an unusual moment. The company is running at a $30 billion annualized revenue rate. Claude hit number one on the App Store earlier this year. Eight of the Fortune 10 are now Claude customers. Claude Code alone reached $2.5 billion in annualized revenue in February. The company raised $30 billion at a $380 billion valuation in February and is in early IPO conversations.
Opus 4.7 is the commercial engine behind those numbers. It’s the model enterprises are paying for, the model developers build on, and the model that has to justify the company’s trajectory. A 13% coding improvement and 3x vision upgrade at unchanged pricing signals that Anthropic is willing to invest in performance to hold that position.
The Claude Mythos situation is also relevant context. Anthropic built a model powerful enough that they felt unable to release it publicly — the cybersecurity capabilities were too significant. Opus 4.7 is, in part, a vehicle for testing the safeguards that would eventually allow more capable models to be deployed responsibly. That’s a different approach than OpenAI or Google are taking, and it creates a different kind of trust relationship with enterprise customers. Read more about the Claude Mythos situation if you want the full background.
The next release in this cadence — whether it’s Opus 4.8, a new Sonnet, or something unexpected — will likely arrive in June or July based on the two-month pattern. GPT-5.5 (Spud) and Grok 5 are also expected in Q2. The competitive pressure from all sides is real and accelerating. Today’s benchmark leader is next month’s second place.
Final Verdict: Is Claude Opus 4.7 Worth It?
Yes. It’s not complicated.
If you’re using Opus 4.6 today, the upgrade is a near-zero-friction process with meaningful performance gains at the same price. If you’re evaluating frontier AI models for the first time, Opus 4.7 is the strongest generally available option for software engineering, enterprise document analysis, and agentic workflows as of April 16, 2026.
The honest caveat is that no model wins everything. GPT-5.4 has a real edge on terminal execution. Gemini 3.1 Pro offers 2M context and better economics for high-volume use. The right answer for most serious use cases is model routing, not model loyalty.
But when someone asks me which AI model I’d trust with genuinely hard coding work, a complex multi-step enterprise pipeline, or a document analysis task where accuracy matters — today, after reviewing the numbers, the answer is Claude Opus 4.7. It’s earned that position. Until Spud, Grok 5, or something we haven’t heard about yet changes the landscape again, this is the benchmark to beat.
For more context on how Claude stacks up against alternatives, see our GPT-5.5 review and the full DeepSeek V4 analysis to understand where the competition stands. The 2026 AI statistics page also has useful context on where the industry is heading.
Frequently Asked Questions
What is Claude Opus 4.7?
Claude Opus 4.7 is Anthropic’s most capable generally available AI model, released April 16, 2026. It replaces Claude Opus 4.6 as the top tier in Anthropic’s commercial lineup, with improvements in coding, vision, agentic reasoning, and instruction following at the same pricing of $5/$25 per million tokens.
How does Claude Opus 4.7 compare to GPT-5.4?
Opus 4.7 leads on SWE-bench Pro (64.3% vs GPT-5.4’s 57.7%), SWE-bench Verified (87.6% vs ~80%), and CursorBench (70%, with no published GPT-5.4 score). GPT-5.4 leads on Terminal-Bench 2.0 (75.1% vs 69.4%) and BrowseComp (89.3% vs 79.3%). GPQA Diamond is a statistical tie at 94.2% vs 94.4%.
What is the price of Claude Opus 4.7?
$5 per million input tokens and $25 per million output tokens — unchanged from Opus 4.6. Prompt caching offers up to 90% savings on cached input, and the Batch API provides a 50% discount. Claude Pro subscribers ($20/month) can access the model directly through claude.ai.
Is Claude Opus 4.7 available on GitHub Copilot?
Yes. GitHub Copilot is deploying Opus 4.7 today and it will replace Opus 4.5 and 4.6 in the model picker for Copilot Pro+ over the coming weeks. A 7.5x premium request multiplier applies as part of promotional pricing until April 30th.
What is the difference between Claude Opus 4.7 and Claude Mythos?
Claude Mythos Preview is Anthropic’s most powerful model overall, scoring 93.9% on SWE-bench Verified and 77.8% on SWE-bench Pro. However, it is restricted to roughly 50 Project Glasswing partner organizations due to cybersecurity concerns and is not publicly available. Opus 4.7 is the most capable model anyone can actually use today.
What does xhigh mean in Claude Opus 4.7?
xhigh is a new effort parameter that sits above the existing high level and below max. At xhigh, the model invests more tokens in reasoning before responding, producing better outputs on complex problems at the cost of higher latency and token usage. Anthropic recommends xhigh for coding and agentic tasks where quality matters more than speed. Claude Code defaults to xhigh for all plans.
Does Claude Opus 4.7 support a 1 million token context window?
Yes. Claude Opus 4.7 includes the full 1M token context window at standard pricing, the same as Opus 4.6. Gemini 3.1 Pro offers a 2M context window, which remains Gemini’s structural advantage for workloads that require processing extremely large documents or codebases in a single pass.
When the Benchmarks Don’t Match Reality (A Personal Take)
I tested Claude Opus 4.7 on a real-world workflow instead of controlled benchmarks: specifically, debugging a messy production script with incomplete documentation and inconsistent naming. It’s the kind of task that doesn’t show up cleanly on SWE-bench. The first result was impressive, but not perfect. It solved about 80% of the issue correctly, then made a subtle assumption that would have broken the system if deployed without review. That’s where the difference between benchmark performance and actual usage becomes obvious. Opus 4.7 is faster at getting to a “mostly correct” answer, but it still requires human judgment at the final step.
In my case, fixing the last 20% took more time than expected, not because the model failed, but because it was confidently slightly wrong. And that’s a dangerous combination if you’re not paying attention.
Another thing I noticed: the model performs significantly better when you guide it tightly. Loose prompts lead to over-engineered solutions, while precise constraints produce cleaner outputs.
This isn’t new, but with Opus 4.7 the gap feels wider. In other words, the model is more powerful but also less forgiving if you’re vague. That’s not a flaw, but it does shift more responsibility back to the user.




