
Gemma 4

Parameters: 26B (MoE)
Context: 128K tokens
Max output: 16K tokens
Architecture: Mixture-of-Experts, 26B parameters
Pricing (per 1M tokens): Free

Benchmark scores

GPQA Diamond: 72

Available via: Self-hosted, Ollama, LM Studio
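For the Ollama route, getting started is a two-command affair. A minimal sketch, assuming a hypothetical `gemma4` model tag (the actual tag in the Ollama library may differ; check `ollama list` or the library page):

```shell
# Hypothetical model tag -- verify the real name in the Ollama model library
ollama pull gemma4

# One-shot prompt from the command line
ollama run gemma4 "Summarize mixture-of-experts routing in two sentences."
```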

Gemma 4 brings genuine reasoning capability to consumer hardware — it runs at 85 tokens/second on a single consumer GPU.

Key specs

Spec        Value
Parameters  26B (MoE)
Disk size   ~14 GB
Speed       85 tok/s on consumer GPU
Context     128K tokens
Cost        Free (open weights)
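The numbers above are mutually consistent, as a quick back-of-envelope check shows. (The ~4-bit quantization level is an inference from the disk size, not something stated in the spec sheet.)

```python
PARAMS = 26e9          # 26B parameters
DISK_GB = 14           # ~14 GB on disk
TOK_PER_S = 85         # quoted decode speed
MAX_OUTPUT = 16_000    # 16K-token output cap

# Implied bits per weight: disk bytes * 8 / parameter count.
# ~4.3 bits/weight is consistent with 4-bit quantization plus overhead.
bits_per_weight = DISK_GB * 1e9 * 8 / PARAMS
print(f"{bits_per_weight:.1f} bits/weight")   # -> 4.3 bits/weight

# Worst-case wall-clock time to emit a full 16K-token response.
seconds = MAX_OUTPUT / TOK_PER_S
print(f"{seconds / 60:.1f} minutes")          # -> 3.1 minutes
```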

Why it matters

Gemma 4 is significant because it demonstrates that MoE architectures can deliver meaningful quality improvements at sizes that actually run on hardware people own. You don’t need an H100 cluster — a MacBook Pro or a gaming PC with 16GB+ VRAM will do.
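The MoE trick can be sketched with a toy top-k gating step: per token, a router scores every expert and only the top-scoring few actually run, so active compute is a fraction of total parameters even though all of them sit in memory. The expert count and k below are illustrative; Gemma 4's actual routing configuration isn't specified above.

```python
import math

def top_k_gate(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# 8 illustrative experts; with k=2, only ~1/4 of the expert parameters
# do work for this token, which is why a 26B MoE can decode quickly.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
print(top_k_gate(logits, k=2))  # experts 1 and 4 carry all the gate weight
```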

Strengths

  • Runs on consumer hardware (16GB+ VRAM)
  • Fast inference — 85 tok/s without specialized infrastructure
  • Google’s training data and methodology at open-source scale
  • Excellent for local/private deployment

Weaknesses

  • Benchmarks trail frontier models significantly (GPQA ~72%)
  • 128K context window is modest compared with the longer windows frontier models now offer
  • Limited to text (no multimodal)
  • Smaller community than Llama ecosystem