
DeepSeek V4 at $0.28/M — what 1T parameters means for cost

DeepSeek V4 costs $0.28 per million input tokens. Claude Opus costs $5. That’s an 18x price difference. But is V4 actually 18x worse? Let’s do the math.

Cost per task: real workloads

Assumptions: average task uses 10K input + 3K output tokens.

| Model | Input cost | Output cost | Total per task | 10,000 tasks |
|---|---|---|---|---|
| Claude Opus 4.6 | $0.050 | $0.075 | $0.125 | $1,250 |
| GPT-5.4 | $0.025 | $0.030 | $0.055 | $550 |
| Claude Sonnet 4.6 | $0.030 | $0.045 | $0.075 | $750 |
| Gemini 3.1 Pro | $0.020 | $0.036 | $0.056 | $560 |
| DeepSeek V4 | $0.003 | $0.003 | $0.006 | $60 |
| DeepSeek V4 Lite | $0.001 | $0.002 | $0.003 | $30 |

DeepSeek V4 is 21x cheaper than Claude Opus per task. At 10,000 tasks, you save $1,190.
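The arithmetic behind the table is simple enough to sketch. One caveat: the post only quotes V4's input price ($0.28/M); the output rate below ($1.00/M) is inferred from the table's $0.003-per-3K-tokens figure, and the Opus rates are likewise back-calculated from the table.

```python
# Per-task cost from per-million-token prices, under the post's
# assumption of 10K input + 3K output tokens per task.
PRICES = {
    # (input $/M tokens, output $/M tokens) -- back-calculated from the table
    "claude-opus-4.6": (5.00, 25.00),
    "deepseek-v4": (0.28, 1.00),
}

def cost_per_task(model, input_tokens=10_000, output_tokens=3_000):
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

opus = cost_per_task("claude-opus-4.6")  # $0.050 + $0.075 = $0.125
v4 = cost_per_task("deepseek-v4")        # $0.0028 + $0.0030 = $0.0058
print(f"Opus: ${opus:.3f}/task, V4: ${v4:.4f}/task")
print(f"Savings over 10,000 tasks: ${10_000 * (opus - v4):,.0f}")
```

(The table rounds V4's per-task cost up to $0.006, which is where the headline 21x and $1,190 figures come from.)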

But what do you lose?

| Benchmark | DeepSeek V4 | Claude Opus | Gap |
|---|---|---|---|
| SWE-bench Verified | 72.5% | 80.8% | -8.3 pts |
| GPQA Diamond | 84.0% | 94.3% | -10.3 pts |
| LM Arena Elo | 1445 | 1504 | -59 pts |

V4 trails Opus by 8-10 points on major benchmarks. That’s significant for frontier tasks (novel reasoning, PhD-level science, complex code architecture) but often invisible for routine work (summarization, data extraction, classification, format conversion).

The 80/20 rule for model selection

Use DeepSeek V4 when:

  • Task is well-defined (extraction, classification, summarization)
  • You’re processing high volume (thousands of items)
  • Quality difference between 84% and 94% doesn’t matter for your use case
  • Cost is a constraint

Use frontier models (Opus, GPT-5.4, Gemini 3.1) when:

  • Task requires novel reasoning or creativity
  • Errors are expensive (security review, medical, legal)
  • You need the best available quality regardless of cost

The smart approach: Use DeepSeek V4 for data gathering and preprocessing, then pass the structured results to Opus for analysis and synthesis. This is exactly what multi-model agent pipelines are designed for.

Context window trade-off

DeepSeek V4’s 128K context is much smaller than Claude’s 1M or Llama’s 10M. For document analysis, code review of large repos, or conversation with long history, this is a real limitation.
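A quick way to work around the smaller window is to budget tokens up front and chunk anything that won't fit. This sketch uses the common (and rough) ~4-characters-per-token heuristic; real token counts vary by tokenizer:

```python
# Rough context budgeting for a 128K-token window, leaving headroom
# for the system prompt and the model's output.
CONTEXT_LIMIT = 128_000
CHARS_PER_TOKEN = 4  # crude heuristic, not exact

def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def split_for_context(text: str, budget: int = 100_000) -> list[str]:
    """Split text into chunks of at most `budget` estimated tokens."""
    chunk_chars = budget * CHARS_PER_TOKEN
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
```

For a 500-page repo dump this means multiple passes and a merge step, which is exactly the kind of friction the 1M-context models avoid.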

Self-hosting economics

V4 is open-weight, so you can self-host for zero marginal token cost. But 1T parameters requires a serious GPU cluster. Rough estimate: 8x H100 for inference, ~$25K/month in cloud GPU costs. Only worth it at extremely high volume (millions of requests/month).
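The break-even point falls out directly from the numbers above (the ~$25K/month GPU figure and the $0.006/task API cost from the earlier table, both rough estimates):

```python
# Break-even: monthly GPU spend vs. paying the API per task.
GPU_MONTHLY = 25_000       # 8x H100 in the cloud, rough estimate
API_COST_PER_TASK = 0.006  # DeepSeek V4 API, 10K in + 3K out per task

break_even_tasks = GPU_MONTHLY / API_COST_PER_TASK
print(f"Break-even: {break_even_tasks:,.0f} tasks/month")  # ~4.2M tasks
```

Below roughly four million tasks a month, the API is cheaper, and that's before counting the engineering time to run a 1T-parameter serving stack.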