Claude Opus 4.6
Available via: API, Chat, Batch, Agent SDK, Managed Agents
Claude Opus 4.6 is Anthropic’s most intelligent model, leading GPQA Diamond (94.3%) and LM Arena (#1 at 1504 Elo). The “Thinking” variant uses hidden chain-of-thought reasoning to work through and check its answer before the final output reaches the user.
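As a quick orientation for the API route listed above, here is a minimal sketch of a request through Anthropic's Python SDK. The model ID string `claude-opus-4-6` is an assumption; check the provider's model list for the exact identifier.

```python
# Minimal sketch of an API request via Anthropic's Python SDK.
# The model ID "claude-opus-4-6" is an assumption, not confirmed by the card.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",  # assumed model ID
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize RFC 9110 in three bullets."}],
)
print(response.content[0].text)  # first content block holds the reply text
```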
Benchmarks
| Benchmark | Score | Rank |
|---|---|---|
| GPQA Diamond | 94.3% | #1 |
| SWE-bench Verified | 80.8% | #4 (behind Mythos Preview, GPT-5.3 Codex, Opus 4.5) |
| LM Arena Elo | 1504 | #1 |
Pricing
| Per 1M tokens | Price |
|---|---|
| Input | $5.00 |
| Output | $25.00 |
| Fast mode (beta) | 6x standard rates |
Fast mode (beta) provides significantly faster output at premium pricing and is available across the full 1M-token context window.
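To make the rates concrete, a small cost estimator under the table above; applying the 6x fast-mode multiplier to both input and output is an assumption, since the card only says "6x standard rates".

```python
# Cost estimator for the published rates: $5 per 1M input tokens,
# $25 per 1M output tokens. Applying the 6x fast-mode multiplier to
# both sides is an assumption; the card does not break it down.
INPUT_PER_M = 5.00
OUTPUT_PER_M = 25.00
FAST_MULTIPLIER = 6

def request_cost(input_tokens: int, output_tokens: int, fast: bool = False) -> float:
    cost = (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M
    return cost * FAST_MULTIPLIER if fast else cost

# A 200K-token prompt with a 4K-token reply:
print(f"${request_cost(200_000, 4_000):.2f}")             # $1.10 standard
print(f"${request_cost(200_000, 4_000, fast=True):.2f}")  # $6.60 fast mode
```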
Architecture & capabilities
- Context: 1M tokens at standard pricing — no context window surcharge
- Output: Up to 64K tokens per response
- Thinking modes: Adaptive, extended, and interleaved thinking for complex multi-step reasoning (see the sketch after this list)
- Agentic: Best-in-class for autonomous task execution, tool use, and multi-step workflows
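As a sketch of what requesting extended thinking can look like, the snippet below uses the `thinking` parameter shape from Anthropic's Messages API on recent Claude models; whether Opus 4.6 exposes its thinking modes through exactly this parameter is an assumption.

```python
# Sketch: extended thinking via the Messages API. The thinking parameter
# follows the shape used by recent Claude models; its exact form for
# Opus 4.6 is an assumption.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",  # assumed model ID
    max_tokens=16000,         # output can go up to 64K tokens per the card
    thinking={"type": "enabled", "budget_tokens": 8000},  # reasoning budget
    messages=[{"role": "user", "content": "Plan a migration from REST to gRPC."}],
)

# The response interleaves thinking blocks with final text blocks;
# print only the user-facing text.
for block in response.content:
    if block.type == "text":
        print(block.text)
```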
Strengths
- Highest GPQA Diamond score of any model (94.3%) — graduate-level science reasoning
- Top LM Arena ranking via reasoning-optimized Thinking variant
- 1M-token context window with no quality degradation at long context lengths
- Strong agentic capabilities — powers Claude Code
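Because the agentic claims rest on tool use, here is a minimal tool definition in the Messages API's standard `tools` format; the `run_tests` tool itself is hypothetical, purely for illustration.

```python
# Minimal sketch of tool use, the mechanism behind the agentic claims.
# The "run_tests" tool is hypothetical; the tools schema is the standard
# Anthropic Messages API format.
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "run_tests",  # hypothetical tool for illustration
    "description": "Run the project's test suite and return the failures.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "Test directory"}},
        "required": ["path"],
    },
}]

response = client.messages.create(
    model="claude-opus-4-6",  # assumed model ID
    max_tokens=2048,
    tools=tools,
    messages=[{"role": "user", "content": "Fix the failing tests in ./tests."}],
)

# If the model decides to call the tool, it returns a tool_use block
# that your agent loop executes before sending the results back.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```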
Weaknesses
- Most expensive Anthropic model ($5 input / $25 output per 1M tokens)
- Slower than Sonnet for routine tasks
- SWE-bench score trails newer research previews (Mythos at 93.9%)
When to use
Complex reasoning, agentic workflows, research-grade analysis, tasks where quality justifies cost. For most coding and analysis, Sonnet 4.6 at $3/$15 gets you 98% of the way there.
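One way to act on that guidance is a trivial router that defaults to Sonnet and escalates to Opus only when a task is flagged as complex; both model IDs and the complexity flag are illustrative assumptions.

```python
# Illustrative routing per the guidance above: default to the cheaper
# Sonnet tier, escalate to Opus for complex reasoning or agentic work.
# Model IDs and the is_complex flag are assumptions, not an official API.
def pick_model(is_complex: bool) -> str:
    if is_complex:
        return "claude-opus-4-6"   # assumed ID, $5/$25 per 1M tokens
    return "claude-sonnet-4-6"     # assumed ID, $3/$15 per 1M tokens

print(pick_model(is_complex=True))  # -> claude-opus-4-6
```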