Gemma 4
26B (MoE)
Benchmark scores
GPQA Diamond: 72
Available via: Self-hosted, Ollama, LM Studio
Gemma 4 brings genuine reasoning capability to consumer hardware, running at 85 tokens/second on a single consumer GPU.
Key specs
| Spec | Value |
|---|---|
| Parameters | 26B (MoE) |
| Disk size | ~14 GB |
| Speed | 85 tok/s on consumer GPU |
| Context | 128K tokens |
| Cost | Free (open weights) |
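A back-of-the-envelope check on the table above, assuming the ~14 GB disk figure refers to quantized weights (an assumption, not stated in the spec): dividing storage size by parameter count gives the effective bits per parameter, which lands near 4-bit quantization.

```python
def effective_bits_per_param(disk_gb: float, params_billion: float) -> float:
    """Effective storage bits per parameter for a quantized checkpoint.

    Uses decimal GB (1 GB = 1e9 bytes); binary GiB would shift the
    result slightly but not the conclusion.
    """
    total_bits = disk_gb * 1e9 * 8  # GB -> bits
    return total_bits / (params_billion * 1e9)

# ~14 GB on disk for 26B parameters
bits = effective_bits_per_param(14, 26)
print(f"{bits:.2f} bits/param")  # prints "4.31 bits/param", i.e. ~4-bit quantization
```

At roughly 4.3 bits per parameter, the weights fit comfortably alongside a KV cache in 16 GB of VRAM, which is consistent with the hardware requirements listed below.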
Why it matters
Gemma 4 is significant because it demonstrates that MoE architectures can deliver meaningful quality improvements at sizes that actually run on hardware people own. You don’t need an H100 cluster; a MacBook Pro or a gaming PC with 16GB+ VRAM will do.
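To put the 85 tok/s figure in perspective, a quick sketch of wall-clock generation time at a steady decode rate (the token counts are illustrative, not measured):

```python
def generation_seconds(output_tokens: int, toks_per_sec: float = 85.0) -> float:
    """Wall-clock time to stream a completion at a steady decode rate.

    Ignores prompt-processing (prefill) time, which adds a bit up front.
    """
    return output_tokens / toks_per_sec

# Typical completion lengths: short answer, long answer, small document
for n in (100, 500, 2000):
    print(f"{n:>5} tokens -> {generation_seconds(n):.1f} s")
```

A 500-token answer streams in under six seconds, which is why the speed claim matters more than the raw benchmark numbers for interactive local use.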
Strengths
- Runs on consumer hardware (16GB+ VRAM)
- Fast inference: 85 tok/s without specialized infrastructure
- Google’s training data and methodology at open-source scale
- Excellent for local/private deployment
Weaknesses
- Benchmarks trail frontier models significantly (GPQA ~72%)
- Context capped at 128K tokens, shorter than some frontier offerings
- Limited to text (no multimodal)
- Smaller community than Llama ecosystem