
LM Arena (Chatbot Arena) Elo Rankings

Rank  Model                        Provider   Elo score
1     Claude Opus 4.6 Thinking     Anthropic  1504
2     Gemini 3.1 Pro Preview       Google     1493
3     Grok 4.20 Beta1              xAI        1491
4     GPT-5.4 High                 OpenAI     1484
5     Claude Sonnet 4.6 Thinking   Anthropic  1478
6     GPT-5.4                      OpenAI     1470
7     Gemini 3.1 Flash             Google     1455
8     DeepSeek V4                  DeepSeek   1445

LM Arena (formerly LMSYS Chatbot Arena) ranks models using Elo ratings from crowdsourced human pairwise comparisons. Users chat with two anonymous models and vote for the better response.
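
Under the classic Elo scheme the leaderboard is named for, each vote nudges the two models' ratings toward the observed outcome. Here is a minimal Python sketch of that mechanism, assuming a conventional chess-style K-factor of 32; LM Arena's production pipeline fits ratings over the full vote set rather than streaming one update per vote, so treat this as an illustration, not the site's code:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model
    (win probability, with draws counted as half)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Update both ratings after one pairwise vote.

    k=32 is a conventional chess value assumed here for
    illustration; it is not LM Arena's parameter.
    """
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b
```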

  • Reasoning-optimized models dominate. Claude Opus 4.6 Thinking uses a hidden chain of thought to check and refine its answer before the user sees it.
  • Grok 4.20 disrupts the top tier, climbing to #3 globally and surpassing GPT-5.4 High.
  • Gemini 3.1 Pro Preview outperforms GPT-5.4 High by 9 Elo points in the text arena (see the worked example after this list).
  • Anything above 1400 Elo is considered frontier-level performance.
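
To put the 9-point gap in the third bullet in perspective, the expected_score sketch above converts an Elo difference into an expected head-to-head result (ratings taken from the table; this uses the illustrative helper defined earlier, not LM Arena's code):

```python
# 1493 (Gemini 3.1 Pro Preview) vs. 1484 (GPT-5.4 High): a 9-point gap.
p = expected_score(1493, 1484)
print(round(p, 3))  # 0.513: roughly a 51/49 split in head-to-head votes
```

Small gaps at the top of the table therefore correspond to near-coin-flip preferences, which is why positions among the frontier models can swap as fresh votes arrive.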

The leaderboard updates daily as thousands of new human comparisons are processed.
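
As a rough sketch of how a batch of new comparisons could be folded into an updated ranking, reusing elo_update from above (the vote tuple format and the 1000-point starting rating are assumptions for illustration, not details of LM Arena's pipeline):

```python
from collections import defaultdict

def rank_models(votes, initial=1000.0):
    """Fold pairwise votes (model_a, model_b, a_won) into a sorted leaderboard."""
    ratings = defaultdict(lambda: initial)
    for a, b, a_won in votes:
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], a_won)
    # Highest rating first, matching the table above.
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
```

Note that this streaming update is order-dependent: the same votes replayed in a different order can land on slightly different ratings, which is one reason production leaderboards prefer fitting all votes jointly.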