I’ll be honest — I wasn’t expecting it to ship today. I’d been watching the Project Glasswing rollout since April, tracking every breadcrumb Anthropic dropped about Claude Mythos, and my working assumption was that the public version would land sometime in Q3, quietly, with a lengthy safety disclaimer and a hefty waitlist. Then I woke up this morning, checked my feed, and there it was: Claude Fable 5. Available now. No waitlist. No special access required. Free on my Pro plan until June 22.

I’ve tested a lot of models this year — GPT-5.5, Gemini 3.1 Pro, Claude Opus 4.7. Each one came with good benchmarks and some real-world limitations I had to discover myself. Fable 5 hit different within the first twenty minutes of testing. Not because it’s faster or writes prettier prose. Because when I gave it a genuinely hard, multi-step coding task — one that I’d used to stress-test five other models — it didn’t just finish. It finished, caught its own error in one of the sub-tasks, corrected it, and wrote a cleaner solution than the one I’d expected. That’s the thing nobody tells you: the gap doesn’t show on quick demos. It shows when you let it run.

What surprised me most: On a real project I handed it — migrating a Python data pipeline with conflicting dependencies across three files — Fable 5 completed the full refactor, wrote the updated unit tests, and flagged a latent bug in the original code I hadn’t noticed. The whole thing took one prompt and about four minutes of autonomous execution. Opus 4.8 needed three rounds of iteration to get close.

This review covers everything you need to know about Claude Fable 5 on day one: what it actually is, the benchmarks that matter (and which ones to ignore), how it stacks up against GPT-5.5 and Gemini 3.1 Pro on the work people actually do, the prompts I found most useful, and an honest look at who should pay the premium.


What Is Claude Fable 5, Exactly?

Claude Fable 5 is Anthropic’s first publicly available Mythos-class model. The Mythos class sits above Opus in Anthropic’s model hierarchy — this isn’t a rebrand or a marketing tier. It’s architecturally and capability-wise a different generation from Opus 4.8.

The name matters here. Fable comes from the Latin fabula — meaning “that which is told” — and shares its roots with the Greek mythos. Anthropic released two products today from the same underlying model: Claude Fable 5 (the general-access version with safety classifiers) and Claude Mythos 5 (the same model with some safeguards lifted, restricted to vetted Project Glasswing partners working in cybersecurity and critical infrastructure).

Fable 5 isn’t a neutered version of Mythos. For the vast majority of tasks — coding, knowledge work, analysis, vision, long-context reasoning — the performance difference between Fable 5 and Mythos 5 is within 1–3 percentage points. The difference only appears on the specific categories where Fable’s classifiers kick in: cybersecurity exploitation, offensive biology research, and distillation. Less than 5% of sessions trigger a fallback at all. For everyone outside those restricted verticals, Fable 5 is the frontier.

Omar Diani — Expert Note

Anthropic is doing something I haven’t seen before: pricing a top-tier model at less than half what its restricted predecessor cost. Mythos Preview was effectively enterprise-only. Fable 5 at $10/$50 per million tokens is steep compared to Opus 4.8 at $5/$25, but it’s commercially viable for serious production workloads. The math changes when it finishes in one pass what Opus needs three attempts for.


Key Features of Claude Fable 5

1. Long-Horizon Agentic Coding

This is where Fable 5’s lead is not marginal — it’s categorical. The model is explicitly designed to work autonomously for days on complex, multi-stage tasks. Stripe gave it a 50-million-line Ruby codebase and asked it to complete a codebase-wide migration. The model finished in one day. A full engineering team doing the same work by hand would have taken over two months. On SWE-Bench Pro — which tests end-to-end GitHub issue resolution on real repositories — Fable 5 scores 80.3%, against 69.2% for Opus 4.8, 58.6% for GPT-5.5, and 54.2% for Gemini 3.1 Pro.

On the harder FrontierCode Diamond benchmark — which tests whether models can solve difficult coding problems while meeting production-quality standards — Fable 5 scores 29.3%. Opus 4.8 sits at 13.4%. GPT-5.5 is at 5.7%. That gap is not noise. I’ve seen the pattern in my own testing: give Fable 5 a production-grade problem with messy constraints and it produces cleaner, more maintainable code with fewer rounds of iteration. The model understands what done means in a real codebase, not just a benchmark harness.

2. Memory and Long-Context Focus

Fable 5 handles a 1M+ token context window and actively uses self-generated notes to improve output across extended runs. Anthropic ran it through the deck-building game Slay the Spire to test this: when the model had access to persistent file-based memory, its performance improved three times more than it did for Opus 4.8 under the same conditions. Fable also reached the game’s final act three times more often than Opus. That’s a meaningful proxy for how the model handles multi-session, multi-context professional work.

3. Vision and Multimodal Capabilities

Fable 5 is the current state-of-the-art model for vision-based tasks. It can reconstruct a web app’s source code from screenshots alone, extract precise numerical data from dense scientific figures, and understand spatial layouts with a level of accuracy that previous Claude models — even with helper tooling — couldn’t reach. The Pokémon FireRed demonstration is illustrative: past Claude models needed complex helper harnesses to navigate the game. Fable 5 completed it with vision alone and no additional scaffolding.

4. Knowledge Work and Finance

On Hebbia’s Finance Benchmark for senior-level reasoning, Fable 5 scored highest of any model tested. IMC, the trading firm, reported that the model aced their trading analysis evaluations — factual lookup, conceptual reasoning, root-cause analysis, and expected-value analysis — nearly across the board. Hex said Fable 5 was the first model to break 90% on their core analytics benchmark of long-running analytical tasks, calling out “a 10-point jump over Opus” and “strong judgment on the hardest questions.”

5. Scientific Research Capabilities (Mythos 5 tier)

This is the more restricted territory. Using Mythos 5, Anthropic’s internal protein design team accelerated aspects of the drug design process by approximately ten times. The model executed all tasks a scientist normally handles — choosing binding sites, selecting and running protein design tools, recovering from failures — without human assistance. Nine of fourteen protein targets from this study yielded strong candidates currently under investigation.

For Claude Fable 5 users, some of these capabilities are present but limited by the safety classifiers. The genomics research work — where Mythos 5 assembled single-cell data for millions of cells across 138 animal species and trained a machine learning model that outperformed a recent Science publication — required the full Mythos 5 access. Worth knowing what you’re getting vs. what stays gated.


Claude Fable 5 Benchmark Scores vs. Competitors

Here’s the full head-to-head comparison against Opus 4.8, GPT-5.5, and Gemini 3.1 Pro. Note the asterisks — starred rows reflect Mythos 5 scores, not Fable 5, because the safety fallbacks reduce performance on those specific categories.

BenchmarkClaude Fable 5Claude Opus 4.8GPT-5.5Gemini 3.1 Pro
SWE-Bench Pro (agentic coding)80.3%69.2%58.6%54.2%
FrontierCode Diamond (production quality)29.3%13.4%5.7%
Terminal-Bench 2.188.0%*83.4%70.7%
GDPval-AA (knowledge work)19321890
OSWorld-Verified (computer use)85.0%83.4%83.4%
Spatial Reasoning38.6%*
Legal tasks13.3%
Health66.0%*
Prompt Injection Resistance (k=100)4.8%9.6%30.8%45.5%

* Asterisked scores = Mythos 5 figure (Fable 5 lands closer to Opus 4.8 on those categories due to safety fallbacks). Source: Anthropic launch post, June 9, 2026; Digital Applied analysis.

The pattern is consistent: the harder and longer the task, the wider Fable 5’s lead grows. Quick factual queries? Marginal difference over Opus 4.8. Multi-day autonomous work with complex dependencies? The gap is enormous and practically visible within an hour of testing.


Pricing and Availability

Plan / AccessPriceFable 5 AccessNotes
Claude Pro / Max / TeamExisting planFree until June 22Usage credits required after June 23
Claude Enterprise (seat-based)Existing planFree until June 22Then usage credits apply
Claude API (direct)$10 input / $50 output per MTokAvailable nowModel ID: claude-fable-5. Batch: $5/$25 per MTok. 90% prompt caching discount.
GitHub CopilotUsage-based billingAvailable nowPro+, Business, Enterprise. Data retention required.
Amazon Bedrock / Vertex AI / Azure FoundryProvider list pricingAvailable nowSame underlying model, platform-native deployment
Claude Mythos 5$10 input / $50 output per MTokRestrictedProject Glasswing partners only. Cybersecurity / bio researchers via trusted access.
Important for GitHub Copilot users: Claude Fable 5 requires data retention as part of Anthropic’s safety classifier infrastructure. Anthropic retains prompts and outputs for up to 30 days to run the safety classifiers, then deletes them. This data is not used for model training. All other Claude models in Copilot (Opus 4.8, Sonnet 4.5, Haiku 4.5) continue with Zero Data Retention. Enterprise admins must manually enable the Fable 5 policy — it’s off by default.

Testing Methodology and Prompts That Actually Work

I ran Fable 5 across five testing categories: agentic coding, long-context analysis, vision tasks, structured knowledge work, and general reasoning under ambiguity. Here’s what I found useful, including three prompts you can use today.

Test 1: Autonomous Refactoring Prompt

This one I use to stress-test every new coding model. It requires the model to understand intent across multiple files, handle conflicting constraints, and produce production-quality output without hand-holding.

Prompt — Tested on Claude Fable 5 (claude-fable-5)

You are working in a Python 3.11 codebase. The data pipeline in /src/pipeline.py imports from /src/utils.py and /src/validators.py. The pipeline currently breaks on nested JSON inputs with null values in optional fields.

Your tasks:
1. Audit all three files and identify the root cause of the null-handling failure.
2. Refactor the validation logic to handle nulls gracefully without changing the public API signature.
3. Update the unit tests in /tests/test_pipeline.py to cover the new null cases.
4. If you find any other latent bugs during the audit, document them in a comment block at the top of each file.

Do not ask clarifying questions. Work autonomously and explain your decisions inline.

Result: Fable 5 completed the full refactor in one pass, found a latent race condition in the original validator I hadn’t noticed, documented it with a clear comment block, and wrote four additional test cases covering edge scenarios I hadn’t specified. Total time: ~3.5 minutes. Opus 4.8 required three rounds of feedback to reach comparable output quality.

Test 2: Long-Document Financial Synthesis

Prompt — Knowledge Work Testing

Attached are three annual reports (2023, 2024, 2025) for a mid-cap SaaS company. Your task is to:
1. Identify the three largest drivers of margin compression over the three-year period.
2. Flag any discrepancies between stated revenue growth and actual free cash flow trends.
3. Produce a one-page executive summary with your findings, using a senior research analyst’s voice.
4. Note any risks that appear in the footnotes but aren’t discussed in the main narrative.

Cite the specific pages and tables you’re drawing from.

Result: Fable 5 pulled cross-period comparisons I hadn’t spotted, caught a footnote disclosure about a deferred revenue reclassification that shifted the 2024 growth numbers, and wrote a genuinely tight executive summary. The page-citation habit it developed — without being asked explicitly — is the kind of thing that makes outputs actually usable in professional settings.

Test 3: Vision-to-Code Reconstruction

Prompt — Vision Testing

Here is a screenshot of a web app dashboard (attached). Reconstruct the HTML and CSS needed to replicate this layout as closely as possible. Use modern CSS Grid and Flexbox. Produce clean, commented code I can drop directly into a new project without modification.
Result: Fable 5 produced a clean, well-commented HTML/CSS layout that captured the structural hierarchy, spacing, and color relationships from the screenshot. Previous Claude models needed explicit pixel measurements and repeated correction. Fable produced it in a single pass, and the code ran without modification. Impressive for a task that’s basically reading a compressed image and inferring intent.

Understanding the Safeguard Fallback System

This is the most misunderstood part of today’s launch. When Fable 5’s classifiers detect a query in one of three categories — cybersecurity, biology/chemistry, or distillation — the response is automatically handled by Claude Opus 4.8 instead. You’ll be informed when this happens.

In practice: more than 95% of sessions involve zero fallbacks. For the average developer, researcher, analyst, or content creator, you’ll never encounter it. The classifiers are tuned conservatively, which means some benign requests — especially adjacent to security topics — will occasionally trigger a fallback you didn’t expect. Anthropic acknowledges this and is working to reduce false positives post-launch.

The fallback to Opus 4.8 matters because Opus 4.8 is itself a strong model. You don’t get a refusal — you get a response from a highly capable model on a different tier. For those building in cybersecurity-adjacent spaces, this behavior is worth testing before committing to a production deployment.

Data retention note (GitHub Copilot users): Fable 5 requires Anthropic to retain prompts and outputs for up to 30 days to run safety classifiers. Data is not used for training and is deleted after 30 days. This is a meaningful policy difference from other Claude models in Copilot, and it matters for enterprise compliance conversations. Source: GitHub Changelog, June 9, 2026.

PrimeAIcenter Score (PAC Score)

PrimeAIcenter Score — Claude Fable 5
Tested June 9, 2026 · Omar Diani · 7-dimension methodology

8.9/10

Coding
9.8

Reasoning
9.4

Context Handling
9.3

Accuracy
9.0

Reliability
8.7

UI/UX
8.4

Pricing / Value
7.2

Methodology: PAC Scores are based on direct hands-on testing across the categories above. Coding was evaluated using multi-file refactoring tasks, SWE-Bench-style problems, and vision-to-code reconstruction. Reasoning was evaluated with multi-step logic chains, financial synthesis, and contradictory-evidence prompts. Context handling was tested using 200K+ token documents with cross-reference questions. Accuracy compared outputs against verified sources. Reliability tracked fallback frequency and output consistency across identical prompts. Pricing reflects value relative to comparable models at their respective price points. Scores reflect testing done June 9, 2026, on claude-fable-5 via the Claude API. Scores may change as the model is refined post-launch.


Pros and Cons

What’s Good
  • Best agentic coding model publicly available — the SWE-Bench Pro lead is real and I felt it in testing.
  • Long-context memory is genuinely improved vs. Opus 4.8. The model stays focused and self-corrects across extended runs.
  • Vision tasks are the new frontier — reconstructing apps from screenshots without scaffolding is a practical capability, not a party trick.
  • Less than half the price of Mythos Preview. Commercially viable for serious production workloads.
  • Available on every major platform on day one: API, Bedrock, Vertex, Foundry, GitHub Copilot.
  • Prompt injection resistance is class-leading. 4.8% attack success rate vs. 30.8% for GPT-5.5.
  • Free on Pro/Max/Team plans until June 22 — plenty of time to run real workload tests before deciding on credits.
What’s Not
  • $50/million output tokens is expensive for high-volume, production API workloads. Cost discipline is non-optional.
  • The safeguard fallback triggers on some benign requests in security-adjacent domains — frustrating if your work lives near those edges.
  • Data retention requirement for GitHub Copilot integration is a hard constraint for enterprise compliance in certain industries.
  • Computer use benchmark doesn’t lead the field — Mythos Preview edges it 85.4% to 85.0% on OSWorld-Verified. Not a weakness, but not the headline either.
  • Post-June 22, access on subscription plans requires usage credits. Pricing structure is still being defined and could change.
  • Cybersecurity, biology, and distillation tasks route to Opus 4.8, not Fable 5. If those verticals are your primary use case, Fable 5 isn’t the model you need.


Claude Fable 5 vs. GPT-5.5 vs. Gemini 3.1 Pro

The direct comparison most people need. I’ve tested all three. Here’s the honest breakdown:

On agentic coding: Fable 5 wins, and it’s not close. 80.3% vs. 58.6% (GPT-5.5) vs. 54.2% (Gemini 3.1 Pro) on SWE-Bench Pro. If you’re building or maintaining complex codebases and need a model to work autonomously for hours at a time, Fable 5 is the right call. GPT-5.5 is strong in its own Codex CLI harness — cross-lab terminal benchmarks are harness-confounded — but on the neutral benchmarks, Fable 5 leads clearly.

On pricing: GPT-5.5 costs roughly half of Fable 5 per token. If your work is routine — writing, summarization, classification, moderate-complexity code — the capability premium for Fable 5 doesn’t earn itself. Run both on your actual workload during the free window (before June 22) and see if the quality gap justifies the cost difference.

On vision: Fable 5 is the clearest win. GPT-5.5 and Gemini 3.1 Pro both have capable vision, but Fable 5’s ability to handle complex, structured visual data — scientific figures, UI reconstruction, game environments — without additional scaffolding is a practical differentiator I noticed immediately in testing.

On safety posture: Both Anthropic and OpenAI gate cybersecurity and biology behind safety classifiers and vetted-access programs. This isn’t a Fable 5-specific limitation — it’s the new industry standard at the frontier tier. Gemini 3.1 Pro’s prompt injection resistance is notably weaker (45.5% attack success rate vs. Fable’s 4.8%).

For the deep dive on how I tested GPT-5.5 and Gemini 3.1 Pro, see those full reviews. The three-way benchmark comparison in the table above covers the publicly available numbers.


What Surprised Me Most About Claude Fable 5

I expected the coding benchmarks. I expected the long-context improvements. What I didn’t expect was the self-validation behavior. At the highest effort level, Fable 5 reflects on and validates its own work before declaring a task complete. Yusuke Kaji, GM of AI for Business at Kintsugi, flagged this in early testing: “the extra thinking pays for itself.” That’s not marketing language — I saw it in my own testing. On the refactoring task I ran, the model caught an error in its own initial solution before I had to give feedback. That’s a different category of behavior than “the model is smart.”

The other surprise: the performance on Ethan Mollick’s informal testing, published on his Substack today, corroborated what I was seeing independently. He noted that Fable 5 “outperformed basically every other public model I have used by a considerable margin” and described it working “up to a dozen hours executing on multi-page specifications.” Given that Mollick tests dozens of models methodically and doesn’t give away strong praise easily, that’s a signal worth taking seriously.

Andrej Karpathy called it “a major-version-bump-deserving step change forward.” I don’t use language like that lightly. But after a few hours of testing across domains, I understand why he said it. This is not an incremental release.


Who Should Use Claude Fable 5?

💻

Software Engineers
Working on complex, multi-file codebases where long-horizon autonomy matters. The SWE-Bench Pro lead is real and felt in practice.

📊

Financial Analysts
Senior-level financial reasoning, document synthesis, and chart interpretation — Hebbia’s Finance Benchmark confirms the edge.

🔬

Life Sciences Researchers
Within the Fable 5 safety bounds (non-dual-use biology). Novel hypothesis generation is significantly improved over Opus 4.8.

🤖

AI Agent Builders
Building pipelines where the agent needs to run for hours, recover from failures, and validate its own output before delivering.

⚖️

Legal Teams
Harvey’s early testing showed Fable 5’s redlines matched or beat their current model in blind review. First meaningful legal AI upgrade in months.

🎯

Enterprise Power Users
Anyone running multi-day analytical, writing, or research workflows where output quality justifies the premium per-token cost.

Who should stick with Opus 4.8 or a cheaper model: If your use case is routine — chatbot responses, basic summarization, content drafts, simple classification — the cost difference between Fable 5 and Opus 4.8 won’t be offset by capability gains you’d actually notice. Fable 5’s advantage is concentrated in hard, long, complex tasks. For everything else, Opus 4.8 is excellent and half the price.


Limitations and Honest Caveats

A few things worth flagging before you commit to a production rollout:

The asterisks in the benchmark table matter. Several of Anthropic’s headline numbers — Terminal-Bench 2.1, spatial reasoning, health — are Mythos 5 scores, not Fable 5 scores. Fable 5 lands closer to Opus 4.8 on those categories due to safety fallbacks. Anthropic is transparent about this, but coverage of the launch has been less so. Read the fine print before citing those numbers internally.

Cost management is a real operational concern. Fable 5 at $50/million output tokens is token-hungry on long tasks. If you’re running multi-hour autonomous workflows, you need a cost cap and a routing strategy. Use prompt caching aggressively — the 90% input token discount is significant. Reserve Fable 5 for the tasks that justify it and route simpler work to Opus 4.8 or Sonnet.

It’s day one. Watcher Kaji’s comment about “extra thinking paying for itself” reflects early testing under controlled conditions. Real-world production behavior — especially for agentic pipelines running unattended for hours — will surface edge cases that don’t appear in benchmarks. Test on your actual workload before making production decisions.


Final Verdict

The Best AI Coding Model Available Today. Worth the Premium If the Work Is Hard Enough.

Claude Fable 5 is a genuine step change — not an incremental release, not a renamed model with tweaked hyperparameters. The SWE-Bench Pro lead (80.3% vs. 58.6% for GPT-5.5) is large enough to feel in practice, the self-validation behavior is a new category of capability, and the long-context memory improvements are real. The pricing is steep, and the safeguard fallback is a constraint you need to understand before deploying. But if you’re working on complex, long-horizon coding, research, or analytical tasks, Fable 5 earns its cost. Run it on your actual work before June 22 while it’s free on paid plans. That’s the honest advice.

Try Claude Fable 5 Free →


Frequently Asked Questions

What is Claude Fable 5?
Claude Fable 5 is Anthropic’s first publicly available Mythos-class AI model, released June 9, 2026. It’s the same underlying architecture as Claude Mythos Preview — Anthropic’s top-tier restricted model — made safe for general use through a set of safety classifiers that route sensitive queries in cybersecurity, biology, and distillation to Claude Opus 4.8 instead.

How is Claude Fable 5 different from Claude Mythos 5?
They share the same underlying model. Fable 5 has safety classifiers that reroute certain high-risk queries (cybersecurity, biology, distillation) to Opus 4.8. Mythos 5 has those safeguards lifted in some areas and is restricted to vetted Project Glasswing partners. For 95%+ of use cases, Fable 5 performance is within 1–3 percentage points of Mythos 5.

What is the API model ID for Claude Fable 5?
The official API model string is claude-fable-5. Access it through the Anthropic Claude Platform, Amazon Bedrock, Vertex AI, or Microsoft Foundry.

How much does Claude Fable 5 cost?
$10 per million input tokens and $50 per million output tokens. Batch pricing is $5/$25 per million tokens. Prompt caching reduces input costs by 90%. Free on Pro, Max, Team, and seat-based Enterprise plans until June 22, 2026. Usage credits required after June 23.

What is Claude Fable 5’s context window?
Claude Fable 5 has a 1M+ token context window with a maximum output of 128K tokens per request.

Is Claude Fable 5 available on GitHub Copilot?
Yes. Claude Fable 5 is available to Copilot Pro+, Business, and Enterprise users across VS Code, Visual Studio, JetBrains, Xcode, Eclipse, the CLI, github.com, and GitHub Mobile. Important: it requires data retention (up to 30 days) to operate Anthropic’s safety classifiers — different from other Claude models in Copilot. Enterprise admins must manually enable it in Copilot settings.

What are Claude Fable 5’s benchmark scores?
Key scores: SWE-Bench Pro 80.3% (vs. Opus 4.8 at 69.2%, GPT-5.5 at 58.6%, Gemini 3.1 Pro at 54.2%). FrontierCode Diamond 29.3% (vs. Opus 4.8 at 13.4%, GPT-5.5 at 5.7%). GDPval-AA knowledge work score 1932 (vs. Opus 4.8 at 1890). Prompt injection resistance 4.8% attack success rate (vs. 30.8% for GPT-5.5 and 45.5% for Gemini 3.1 Pro). Note: some benchmarks in Anthropic’s table show Mythos 5 scores — see the asterisks.

What happens when Claude Fable 5’s safety classifiers trigger?
Queries in cybersecurity, biology/chemistry, and distillation automatically route to Claude Opus 4.8 instead. Users are informed when this happens. The fallback triggers in less than 5% of sessions. You receive a response from Opus 4.8 — not a refusal. Anthropic is working to reduce false positive rates post-launch.

Is Claude Fable 5 better than GPT-5.5 for coding?
On agentic coding benchmarks, yes — clearly. Fable 5 scores 80.3% on SWE-Bench Pro vs. GPT-5.5 at 58.6%. On FrontierCode Diamond, Fable 5 scores 29.3% vs. GPT-5.5 at 5.7%. GPT-5.5 costs roughly half the price and performs well in its own Codex CLI harness. The honest answer: Fable 5 wins on hard, complex, long-horizon coding tasks. GPT-5.5 wins on price-per-token for routine coding work.

What is Project Glasswing and how does it relate to Claude Fable 5?
Project Glasswing is Anthropic’s trusted-access program, launched in April 2026, that gave vetted cybersecurity organizations and critical infrastructure providers access to Claude Mythos Preview. Claude Mythos 5 continues through this program with upgraded capabilities. Claude Fable 5 is the public version of the same underlying model, designed for general access with safety classifiers in place. Learn more at anthropic.com/glasswing.

What is the PrimeAIcenter Score for Claude Fable 5?
Claude Fable 5 earns a PAC Score of 8.9/10 based on hands-on testing by Omar Diani on June 9, 2026. Scores by dimension: Coding 9.8, Reasoning 9.4, Context Handling 9.3, Accuracy 9.0, Reliability 8.7, UI/UX 8.4, Pricing/Value 7.2. The lower pricing score reflects the $50/M output token cost compared to alternatives.


Sources and Further Reading