Claude Opus 4.6
Available via: API, Chat, Batch, Agent SDK, Managed Agents
Claude Opus 4.6 is Anthropic’s most intelligent model, leading GPQA Diamond (94.3%) and LM Arena (#1 at 1504 Elo). The “Thinking” variant uses hidden chain-of-thought reasoning to work through and check its answer before the final output reaches the user.
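As a quick orientation for the API route listed above, here is a minimal sketch of a request through Anthropic's Python SDK. The model ID string `claude-opus-4-6` is an assumption; check the provider's model list for the exact identifier.

```python
# Minimal sketch of an API request via Anthropic's Python SDK.
# The model ID "claude-opus-4-6" is an assumption, not confirmed by the card.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",  # assumed model ID
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize RFC 9110 in three bullets."}],
)
print(response.content[0].text)  # first content block holds the reply text
```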
Benchmarks
| Benchmark | Score | Rank |
|---|---|---|
| GPQA Diamond | 94.3% | #1 |
| SWE-bench Verified | 80.8% | #4 (behind Mythos Preview, GPT-5.3 Codex, Opus 4.5) |
| LM Arena Elo | 1504 | #1 |
Pricing
| Per 1M tokens | Price |
|---|---|
| Input | $5.00 |
| Output | $25.00 |
| Fast mode (beta) | 6x standard rates |
Fast mode (beta) provides significantly faster output at premium pricing and is available across the full 1M-token context window.
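To make the rates concrete, a small cost estimator under the table above; applying the 6x fast-mode multiplier to both input and output is an assumption, since the card only says "6x standard rates".

```python
# Cost estimator for the published rates: $5 per 1M input tokens,
# $25 per 1M output tokens. Applying the 6x fast-mode multiplier to
# both sides is an assumption; the card does not break it down.
INPUT_PER_M = 5.00
OUTPUT_PER_M = 25.00
FAST_MULTIPLIER = 6

def request_cost(input_tokens: int, output_tokens: int, fast: bool = False) -> float:
    cost = (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M
    return cost * FAST_MULTIPLIER if fast else cost

# A 200K-token prompt with a 4K-token reply:
print(f"${request_cost(200_000, 4_000):.2f}")             # $1.10 standard
print(f"${request_cost(200_000, 4_000, fast=True):.2f}")  # $6.60 fast mode
```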
Architecture & capabilities
- Context: 1M tokens at standard pricing — no context window surcharge
- Output: Up to 64K tokens per response
- Thinking modes: Adaptive, extended, and interleaved thinking for complex multi-step reasoning (see the sketch after this list)
- Agentic: Best-in-class for autonomous task execution, tool use, and multi-step workflows
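As a sketch of what requesting extended thinking can look like, the snippet below uses the `thinking` parameter shape from Anthropic's Messages API on recent Claude models; whether Opus 4.6 exposes its thinking modes through exactly this parameter is an assumption.

```python
# Sketch: extended thinking via the Messages API. The thinking parameter
# follows the shape used by recent Claude models; its exact form for
# Opus 4.6 is an assumption.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",  # assumed model ID
    max_tokens=16000,         # output can go up to 64K tokens per the card
    thinking={"type": "enabled", "budget_tokens": 8000},  # reasoning budget
    messages=[{"role": "user", "content": "Plan a migration from REST to gRPC."}],
)

# The response interleaves thinking blocks with final text blocks;
# print only the user-facing text.
for block in response.content:
    if block.type == "text":
        print(block.text)
```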
Strengths
- Highest GPQA Diamond score of any model (94.3%) — graduate-level science reasoning
- Top LM Arena ranking via reasoning-optimized Thinking variant
- 1M-token context window with no quality degradation at long context lengths
- Strong agentic capabilities — powers Claude Code
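Because the agentic claims rest on tool use, here is a minimal tool definition in the Messages API's standard `tools` format; the `run_tests` tool itself is hypothetical, purely for illustration.

```python
# Minimal sketch of tool use, the mechanism behind the agentic claims.
# The "run_tests" tool is hypothetical; the tools schema is the standard
# Anthropic Messages API format.
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "run_tests",  # hypothetical tool for illustration
    "description": "Run the project's test suite and return the failures.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "Test directory"}},
        "required": ["path"],
    },
}]

response = client.messages.create(
    model="claude-opus-4-6",  # assumed model ID
    max_tokens=2048,
    tools=tools,
    messages=[{"role": "user", "content": "Fix the failing tests in ./tests."}],
)

# If the model decides to call the tool, it returns a tool_use block
# that your agent loop executes before sending the results back.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```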
Weaknesses
- Most expensive Anthropic model ($5 input / $25 output per 1M tokens)
- Slower than Sonnet for routine tasks
- SWE-bench score trails newer research previews (Mythos at 93.9%)
When to use
Complex reasoning, agentic workflows, research-grade analysis, tasks where quality justifies cost. For most coding and analysis, Sonnet 4.6 at $3/$15 gets you 98% of the way there.
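One way to act on that guidance is a trivial router that defaults to Sonnet and escalates to Opus only when a task is flagged as complex; both model IDs and the complexity flag are illustrative assumptions.

```python
# Illustrative routing per the guidance above: default to the cheaper
# Sonnet tier, escalate to Opus for complex reasoning or agentic work.
# Model IDs and the is_complex flag are assumptions, not an official API.
def pick_model(is_complex: bool) -> str:
    if is_complex:
        return "claude-opus-4-6"   # assumed ID, $5/$25 per 1M tokens
    return "claude-sonnet-4-6"     # assumed ID, $3/$15 per 1M tokens

print(pick_model(is_complex=True))  # -> claude-opus-4-6
```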