
DeepSeek V4 at $0.28/M — what 1T parameters means for cost

DeepSeek V4 costs $0.28 per million input tokens. Claude Opus costs $5. That’s an 18x price difference. But is V4 actually 18x worse? Let’s do the math.

Cost per task: real workloads

Assumptions: average task uses 10K input + 3K output tokens.

| Model | Input cost | Output cost | Total per task | 10,000 tasks |
|---|---|---|---|---|
| Claude Opus 4.6 | $0.050 | $0.075 | $0.125 | $1,250 |
| GPT-5.4 | $0.025 | $0.030 | $0.055 | $550 |
| Claude Sonnet 4.6 | $0.030 | $0.045 | $0.075 | $750 |
| Gemini 3.1 Pro | $0.020 | $0.036 | $0.056 | $560 |
| DeepSeek V4 | $0.003 | $0.003 | $0.006 | $60 |
| DeepSeek V4 Lite | $0.001 | $0.002 | $0.003 | $30 |

DeepSeek V4 is 21x cheaper than Claude Opus per task. At 10,000 tasks, you save $1,190.
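The arithmetic behind the table is simple enough to sketch. One caveat: the post only quotes V4's input price ($0.28/M); the output rate below ($1.00/M) is inferred from the table's $0.003-per-3K-tokens figure, and the Opus rates are likewise back-calculated from the table.

```python
# Per-task cost from per-million-token prices, under the post's
# assumption of 10K input + 3K output tokens per task.
PRICES = {
    # (input $/M tokens, output $/M tokens) -- back-calculated from the table
    "claude-opus-4.6": (5.00, 25.00),
    "deepseek-v4": (0.28, 1.00),
}

def cost_per_task(model, input_tokens=10_000, output_tokens=3_000):
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

opus = cost_per_task("claude-opus-4.6")  # $0.050 + $0.075 = $0.125
v4 = cost_per_task("deepseek-v4")        # $0.0028 + $0.0030 = $0.0058
print(f"Opus: ${opus:.3f}/task, V4: ${v4:.4f}/task")
print(f"Savings over 10,000 tasks: ${10_000 * (opus - v4):,.0f}")
```

(The table rounds V4's per-task cost up to $0.006, which is where the headline 21x and $1,190 figures come from.)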

But what do you lose?

| Benchmark | DeepSeek V4 | Claude Opus | Gap |
|---|---|---|---|
| SWE-bench Verified | 72.5% | 80.8% | -8.3 pts |
| GPQA Diamond | 84.0% | 94.3% | -10.3 pts |
| LM Arena Elo | 1445 | 1504 | -59 pts |

V4 trails Opus by 8-10 points on major benchmarks. That’s significant for frontier tasks (novel reasoning, PhD-level science, complex code architecture) but often invisible for routine work (summarization, data extraction, classification, format conversion).

The 80/20 rule for model selection

Use DeepSeek V4 when:

  • Task is well-defined (extraction, classification, summarization)
  • You’re processing high volume (thousands of items)
  • Quality difference between 84% and 94% doesn’t matter for your use case
  • Cost is a constraint

Use frontier models (Opus, GPT-5.4, Gemini 3.1) when:

  • Task requires novel reasoning or creativity
  • Errors are expensive (security review, medical, legal)
  • You need the best available quality regardless of cost

The smart approach: Use DeepSeek V4 for data gathering and preprocessing, then pass the structured results to Opus for analysis and synthesis. This is exactly what multi-model agent pipelines are designed for.

Context window trade-off

DeepSeek V4’s 128K context is much smaller than Claude’s 1M or Llama’s 10M. For document analysis, code review of large repos, or conversation with long history, this is a real limitation.
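A quick way to work around the smaller window is to budget tokens up front and chunk anything that won't fit. This sketch uses the common (and rough) ~4-characters-per-token heuristic; real token counts vary by tokenizer:

```python
# Rough context budgeting for a 128K-token window, leaving headroom
# for the system prompt and the model's output.
CONTEXT_LIMIT = 128_000
CHARS_PER_TOKEN = 4  # crude heuristic, not exact

def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def split_for_context(text: str, budget: int = 100_000) -> list[str]:
    """Split text into chunks of at most `budget` estimated tokens."""
    chunk_chars = budget * CHARS_PER_TOKEN
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
```

For a 500-page repo dump this means multiple passes and a merge step, which is exactly the kind of friction the 1M-context models avoid.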

Self-hosting economics

V4 is open-weight, so you can self-host for zero marginal token cost. But 1T parameters requires a serious GPU cluster. Rough estimate: 8x H100 for inference, ~$25K/month in cloud GPU costs. Only worth it at extremely high volume (millions of requests/month).
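The break-even point falls out directly from the numbers above (the ~$25K/month GPU figure and the $0.006/task API cost from the earlier table, both rough estimates):

```python
# Break-even: monthly GPU spend vs. paying the API per task.
GPU_MONTHLY = 25_000       # 8x H100 in the cloud, rough estimate
API_COST_PER_TASK = 0.006  # DeepSeek V4 API, 10K in + 3K out per task

break_even_tasks = GPU_MONTHLY / API_COST_PER_TASK
print(f"Break-even: {break_even_tasks:,.0f} tasks/month")  # ~4.2M tasks
```

Below roughly four million tasks a month, the API is cheaper, and that's before counting the engineering time to run a 1T-parameter serving stack.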