insidejob
Closed source · OpenAI

GPT-5.4

Context: 256K tokens
Max output: 32K tokens
Architecture: Dense transformer (speculated MoE)
Pricing (per 1M tokens): $2.50 in / $10.00 out

Benchmark scores

GPQA Diamond: 92%
GDPval: 83%
LM Arena Elo: 1484
Available via: API, Chat, Batch, Assistants

GPT-5.4, released March 5, 2026, set record benchmark scores, particularly on computer-use tasks (OSWorld-Verified, WebArena Verified), and posted a record 83% on GDPval.

Benchmarks

Benchmark          Score    Notes
GPQA Diamond       92.0%    #2, behind Claude Opus 4.6
GDPval             83.0%    Record score
LM Arena Elo       1484     #4 (Standard), #2 with High variant
OSWorld-Verified   Record   Computer-use benchmark
WebArena Verified  Record   Web navigation benchmark

Pricing

Per 1M tokens
Input   $2.50
Output  $10.00

Competitively priced: cheaper than Claude Opus for comparable frontier performance.
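
To make the per-token rates concrete, here is a minimal sketch of a per-request cost estimate. It assumes the listed prices apply linearly to every token, with no caching, batch, or variant-specific discounts; the helper name estimate_cost is hypothetical and not part of any OpenAI SDK.

```python
from decimal import Decimal

# Listed prices in USD per 1M tokens (taken from the pricing table).
INPUT_PRICE_PER_M = Decimal("2.50")
OUTPUT_PRICE_PER_M = Decimal("10.00")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes flat per-token pricing with no discounts of any kind.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_M
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_M)
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 100,000 input tokens and 8,000 output tokens.
    print(estimate_cost(100_000, 8_000))  # 0.33
```

Under these assumptions, a request with 100,000 input tokens and 8,000 output tokens costs about $0.33: $0.25 for input and $0.08 for output. Decimal is used instead of float to avoid rounding drift when many small costs are summed.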

Variants

Variant            Use case
GPT-5.4 Standard   General-purpose
GPT-5.4 Thinking   Extended reasoning with chain-of-thought
GPT-5.4 Pro        Maximum quality, higher cost

Strengths

  • Best computer-use model available (OSWorld, WebArena records)
  • Strong price/performance: $2.50 in / $10 out per 1M tokens undercuts Opus pricing
  • GDPval record suggests strong real-world task completion

Weaknesses

  • Trails Claude Opus on GPQA Diamond by 2.3 points
  • 256K context window vs Claude’s 1M
  • OpenAI stopped reporting SWE-bench Verified scores (data contamination concerns)
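
To see what the 256K context limit means in practice, the following rough sketch checks whether a prompt fits once room is reserved for the reply. It rests on two assumptions that the source does not confirm: that "256K" and "32K" mean 256,000 and 32,000 tokens, and that output tokens count against the same context window as the prompt. The function names are hypothetical.

```python
# Assumption: "256K" and "32K" are read as round thousands of tokens.
CONTEXT_WINDOW = 256_000
MAX_OUTPUT = 32_000


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT,
                      context_window: int = CONTEXT_WINDOW) -> int:
    """Largest prompt that still leaves room for the reserved output.

    Assumes prompt and output share one context window.
    """
    if not 0 <= reserved_output <= context_window:
        raise ValueError("reserved_output must be within the context window")
    return context_window - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Return True if the prompt plus reserved output fits in the window."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)


if __name__ == "__main__":
    print(max_prompt_tokens())  # 224000
    print(fits(200_000))        # True
    print(fits(230_000))        # False
```

Under these assumptions, reserving the full 32K output budget leaves about 224K tokens for the prompt, so inputs larger than that would need to be chunked or summarized first.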

What’s next

GPT-5.5 (codenamed “Spud”) has completed pretraining, and another major release is expected soon.