Llama 4 Maverick
400B total, 17B active (MoE)
Benchmark scores
SWE-bench Verified: 68.5
GPQA Diamond: 78
Available via: Self-hosted, OpenRouter, Together AI, Fireworks
Llama 4 Maverick pushes the open-weight frontier with a 1M-token context window and competitive benchmark scores at zero licensing cost.
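The hosted providers listed above generally expose an OpenAI-compatible chat-completions endpoint. A minimal sketch of calling Maverick through OpenRouter follows; the model slug and endpoint path are assumptions, so check the provider's model list for the exact identifier before use.

```python
import json

# Assumed OpenRouter endpoint and model slug -- verify against the
# provider's documentation; slugs change between releases.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "meta-llama/llama-4-maverick"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_request("Summarize mixture-of-experts in one sentence.")
print(json.dumps(payload, indent=2))

# Actually sending it requires an API key:
# import urllib.request
# req = urllib.request.Request(
#     OPENROUTER_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Authorization": "Bearer <YOUR_KEY>",
#              "Content-Type": "application/json"},
# )
# resp = urllib.request.urlopen(req)
```

Because the request shape is OpenAI-compatible, the same payload works against Together AI or Fireworks by swapping the base URL and model identifier.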
Benchmarks
| Benchmark | Score | Notes |
|---|---|---|
| SWE-bench Verified | ~68.5% | Strong for open-weight |
| GPQA Diamond | ~78.0% | Approaching closed-source models |
Pricing
Self-hosted: free (open weights under the Llama 4 Community License, which allows commercial use with some restrictions). Hosted provider pricing varies:
| Provider | Input/Output (per 1M) |
|---|---|
| Together AI | ~$0.80/$0.80 |
| OpenRouter | ~$0.50/$0.50 |
| Fireworks | ~$0.60/$0.60 |
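The per-token rates above translate into workload costs with simple arithmetic. A sketch for a hypothetical workload of 1M input and 200K output tokens per day, using the approximate rates from the table (prices change frequently, so treat the numbers as illustrative):

```python
# Approximate per-1M-token rates from the pricing table above.
PRICES = {  # provider: (input $/1M tokens, output $/1M tokens)
    "Together AI": (0.80, 0.80),
    "OpenRouter": (0.50, 0.50),
    "Fireworks": (0.60, 0.60),
}

def daily_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one day's traffic at the listed rates."""
    p_in, p_out = PRICES[provider]
    return input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out

for name in PRICES:
    print(f"{name}: ${daily_cost(name, 1_000_000, 200_000):.2f}/day")
```

At this volume the gap between providers is cents per day; the choice matters more at sustained high throughput, where self-hosting can undercut all of them.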
Architecture
Mixture-of-Experts with 400B total parameters but only 17B active per token. This means:
- Per-token compute comparable to a 17B dense model
- Quality approaching a much larger dense model
- Dramatically lower compute cost than the parameter count suggests, though all 400B weights must still be resident in memory
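The "17B active of 400B total" numbers come from top-k expert routing: a gating network scores every expert for each token and only the top k actually run. A minimal sketch of that mechanism, with an illustrative expert count and k (not Maverick's actual configuration):

```python
import math
import random

NUM_EXPERTS = 16  # illustrative, not Maverick's real expert count
TOP_K = 2         # experts actually executed per token

def route(gate_logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize
    their weights, so the chosen experts' outputs can be mixed."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
print(route(logits))  # only TOP_K of NUM_EXPERTS experts run for this token
```

Since routing happens per token, every token still touches the shared layers plus a small expert subset: compute scales with the active parameters, while memory scales with the total.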
Llama 4 family
| Model | Params | Active | Context | Use case |
|---|---|---|---|---|
| Scout | 109B | 17B | 10M | Efficient, long-context |
| Maverick | 400B | 17B | 1M | Quality-focused |
Strengths
- 1M-token context window — several times larger than most closed-source competitors'
- Zero cost for self-hosted deployment
- MoE architecture keeps inference fast despite 400B params
- Open weights enable fine-tuning and customization
Weaknesses
- Benchmark scores trail frontier closed-source models by 10-15 points
- Requires significant GPU resources for self-hosting (multiple A100s/H100s)
- No official hosted API from Meta
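The multi-GPU requirement follows from back-of-envelope arithmetic: even with only 17B parameters active per token, all 400B weights must fit in GPU memory. A rough estimator, assuming 80GB cards (A100/H100) and ~20% overhead for KV cache and activations:

```python
import math

def gpus_needed(total_params_b: float, bytes_per_param: float,
                gpu_mem_gb: float = 80, overhead: float = 1.2) -> int:
    """Estimate how many GPUs are needed to hold the weights.
    total_params_b is in billions, so params * bytes gives GB directly;
    the overhead factor is a rough allowance for KV cache etc."""
    weight_gb = total_params_b * bytes_per_param
    return math.ceil(weight_gb * overhead / gpu_mem_gb)

print(gpus_needed(400, 2))    # bf16: ~12 x 80GB GPUs
print(gpus_needed(400, 1))    # fp8:  ~6 x 80GB GPUs
print(gpus_needed(400, 0.5))  # 4-bit quantized: ~3 x 80GB GPUs
```

These are coarse estimates (real deployments also budget for activation memory and parallelism inefficiencies), but they show why quantization is usually the first lever for bringing self-hosting cost down.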