Benchmarks
Leaderboard snapshots tracked over time. Within each table, models are ranked by score, highest first.

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Claude Opus 4.6 | Anthropic | 94.3 |
| 2 | GPT-5.4 | OpenAI | 92 |
| 3 | GPT-5.3 Codex | OpenAI | 91.5 |
| 4 | Gemini 3.1 Pro | Google | 90.8 |
| 5 | Claude Sonnet 4.6 | Anthropic | 88.5 |
| 6 | Grok 4.20 | xAI | 86.2 |
| 7 | DeepSeek V4 | DeepSeek | 84 |

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Claude Mythos Preview | Anthropic | 93.9 |
| 2 | GPT-5.3 Codex | OpenAI | 85 |
| 3 | Claude Opus 4.5 | Anthropic | 80.9 |
| 4 | Claude Opus 4.6 | Anthropic | 80.8 |
| 5 | Claude Sonnet 4.6 | Anthropic | 79.6 |
| 6 | Gemini 3.1 Pro | Google | 78.8 |
| 7 | GPT-5.4 | OpenAI | 77.2 |
| 8 | DeepSeek V4 | DeepSeek | 72.5 |

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Claude Opus 4.6 Thinking | Anthropic | 1504 |
| 2 | Gemini 3.1 Pro Preview | Google | 1493 |
| 3 | Grok 4.20 Beta1 | xAI | 1491 |
| 4 | GPT-5.4 High | OpenAI | 1484 |
| 5 | Claude Sonnet 4.6 Thinking | Anthropic | 1478 |
| 6 | GPT-5.4 | OpenAI | 1470 |
| 7 | Gemini 3.1 Flash | Google | 1455 |
| 8 | DeepSeek V4 | DeepSeek | 1445 |
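
The sort-by-score rule stated at the top can be checked mechanically. The following is a minimal sketch, not part of the leaderboard itself: `parse_rows`, `is_ranked_by_score`, and `SNAPSHOT` are hypothetical names introduced for illustration, and the sample rows are copied from the first table above. It assumes each data row has exactly four pipe-delimited cells with an integer rank in the first.

```python
def parse_rows(table_text):
    """Parse pipe-delimited rows into (rank, model, provider, score) tuples."""
    rows = []
    for line in table_text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip header and separator rows: only data rows start with a digit.
        if len(cells) != 4 or not cells[0].isdigit():
            continue
        rank, model, provider, score = cells
        rows.append((int(rank), model, provider, float(score)))
    return rows


def is_ranked_by_score(rows):
    """True if ranks run 1..n and scores are in non-increasing order."""
    ranks = [r[0] for r in rows]
    scores = [r[3] for r in rows]
    return (ranks == list(range(1, len(rows) + 1))
            and scores == sorted(scores, reverse=True))


# Sample rows copied from the first table; the variable name is hypothetical.
SNAPSHOT = """
| # | Model | Provider | Score |
| 1 | Claude Opus 4.6 | Anthropic | 94.3 |
| 2 | GPT-5.4 | OpenAI | 92 |
| 3 | GPT-5.3 Codex | OpenAI | 91.5 |
"""

rows = parse_rows(SNAPSHOT)
print(is_ranked_by_score(rows))
```

Scores are compared only within a single table, since the third table uses a different scale from the first two.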