insidejob
closed source Anthropic

Claude Opus 4.6

undisclosed
Context: 1M tokens
Max output: 64K tokens
Architecture: Dense transformer
Pricing (per 1M tokens): $5 in / $25 out

Benchmark scores

SWE-bench Verified: 80.8%
GPQA Diamond: 94.3%
LM Arena Elo: 1504
Available via: API, Chat, Batch, Agent SDK, Managed Agents

Claude Opus 4.6 is Anthropic’s most intelligent model, leading GPQA Diamond (94.3%) and LM Arena (#1 at 1504 Elo). The “Thinking” variant uses hidden chain-of-thought to debug outputs before the user sees them.

Benchmarks

Benchmark            Score    Rank
GPQA Diamond         94.3%    #1
SWE-bench Verified   80.8%    #4 (behind Mythos Preview, GPT-5.3 Codex, Opus 4.5)
LM Arena Elo         1504     #1

Pricing

Per 1M tokens
Input               $5.00
Output              $25.00
Fast mode (beta)    6x standard rates

Fast mode provides significantly faster output at premium pricing (6x the standard rates). It is available across the full 1M context window.
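The rates above translate into a per-request cost with simple arithmetic. The following is a minimal sketch, assuming the fast-mode multiplier applies to both the input and the output rate (the page says only "6x standard rates"). The helper name `request_cost_usd` and its parameters are illustrative and not part of any Anthropic SDK.

```python
# Listed rates from the pricing table, in USD per 1M tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00
# Assumption: the fast-mode multiplier applies to input and output alike.
FAST_MODE_MULTIPLIER = 6


def request_cost_usd(input_tokens: int, output_tokens: int,
                     fast_mode: bool = False) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    cost = (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000
    return cost * FAST_MODE_MULTIPLIER if fast_mode else cost


if __name__ == "__main__":
    # Example request: 200K tokens in, 8K tokens out.
    print(f"standard: ${request_cost_usd(200_000, 8_000):.2f}")
    print(f"fast:     ${request_cost_usd(200_000, 8_000, fast_mode=True):.2f}")
```

Because output tokens cost five times as much as input tokens, a long prompt with a short answer is often cheaper than a short prompt with a long answer.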

Architecture & capabilities

  • Context: 1M tokens at standard pricing — no context window surcharge
  • Output: Up to 64K tokens per response
  • Thinking modes: Adaptive thinking, extended thinking, and interleaved thinking for complex multi-step reasoning
  • Agentic: Best-in-class for autonomous task execution, tool use, and multi-step workflows
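The context and output limits listed above can be checked before a request is sent. This is a rough sketch under two assumptions that the page does not confirm: "1M" and "64K" are read as round decimal figures, and the prompt and the generated output share the same context window. The function name `fits_limits` is hypothetical.

```python
# Limits as listed above. Assumption: "1M" and "64K" are round decimal
# figures; the exact token limits may differ.
CONTEXT_LIMIT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 64_000


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within both listed limits.

    Assumption: prompt and output tokens count against one shared window.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_LIMIT_TOKENS


if __name__ == "__main__":
    print(fits_limits(900_000, 64_000))  # within both limits
    print(fits_limits(950_000, 64_000))  # prompt plus output exceeds the window
```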

Strengths

  • Highest GPQA Diamond score of any model (94.3%) — graduate-level science reasoning
  • Top LM Arena ranking via reasoning-optimized Thinking variant
  • 1M context with no degradation at long contexts
  • Strong agentic capabilities — powers Claude Code

Weaknesses

  • Most expensive Anthropic model ($5/$25 per 1M tokens)
  • Slower than Sonnet for routine tasks
  • SWE-bench Verified score (80.8%) trails newer research previews (Mythos Preview at 93.9%)

When to use

Best suited to complex reasoning, agentic workflows, research-grade analysis, and tasks where quality justifies the cost. For most coding and analysis, Sonnet 4.6 at $3/$15 per 1M tokens gets you 98% of the way there.
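The cost side of that trade-off can be made concrete. The sketch below uses only the two price pairs quoted on this page ($5/$25 for Opus 4.6 and $3/$15 for Sonnet 4.6). The names `cost_usd` and `sonnet_share_of_opus` are illustrative helpers, not part of any SDK. Both Sonnet rates are exactly three fifths of the matching Opus rates, so the ratio does not depend on the mix of input and output tokens.

```python
from fractions import Fraction

# USD per 1M tokens as quoted on this page: (input rate, output rate).
OPUS_4_6 = (Fraction(5), Fraction(25))
SONNET_4_6 = (Fraction(3), Fraction(15))


def cost_usd(rates, input_tokens: int, output_tokens: int) -> Fraction:
    """Exact cost in USD for one request at the given rates."""
    in_rate, out_rate = rates
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def sonnet_share_of_opus(input_tokens: int, output_tokens: int) -> Fraction:
    """Sonnet cost as a fraction of Opus cost for the same token counts."""
    if input_tokens <= 0 and output_tokens <= 0:
        raise ValueError("at least one token count must be positive")
    return (cost_usd(SONNET_4_6, input_tokens, output_tokens)
            / cost_usd(OPUS_4_6, input_tokens, output_tokens))


if __name__ == "__main__":
    # The share is the same for input-heavy and output-heavy requests.
    print(sonnet_share_of_opus(200_000, 8_000))
    print(sonnet_share_of_opus(1_000, 60_000))
```

At these list prices the same workload on Sonnet 4.6 costs 60% of what it costs on Opus 4.6, so the decision comes down to whether the quality difference is worth the remaining 40%.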