insidejob

Daily Summary

Saturday, April 11, 2026

First edition. March 2026 was the densest model release window in AI history — GPT-5.4, Gemini 3.1, DeepSeek V4 (1T params), and Claude Managed Agents all shipped. Open-source models now match proprietary on many benchmarks.

New Releases

  • Claude Managed Agents entered public beta (April 1) — fully managed agent harness with secure sandboxing, $0.08/session hour
  • Qwen 3.6 Plus released April 2 with 1M native context window (4x larger than Qwen 3.5)
  • GPT-5.4 shipped March 5 with record GDPval (83%) and computer-use benchmarks. GPT-5.5 (“Spud”) has completed pretraining
  • Gemini 3.1 Pro leads SWE-bench Verified at 78.80% and GPQA Diamond at 94.3%

Models

  • DeepSeek V4 — 1T parameters, available on OpenRouter at $0.28/M input tokens since March 11
  • Gemma 4 — Google’s 26B MoE model runs at 85 tok/s on consumer hardware
  • Llama 4 Maverick — 400B params (17B active via MoE), 10M context window
  • Claude Opus 4.6 and Sonnet 4.6 hold strong — Sonnet at 79.6% SWE-bench for $3/$15 per million tokens

Frameworks & Tools

  • Claude Code SDK renamed to Claude Agent SDK — available in Python (claude-agent-sdk) and TypeScript (@anthropic-ai/claude-agent-sdk)
  • Agent SDK v0.2.101 is the latest release

Security

  • OWASP LLM Top 10 continues to evolve as agents gain autonomy
  • Key risks: prompt injection in agentic contexts, excessive agency, supply chain attacks via MCP servers
  • AI red teaming and adversarial ML research remain active frontiers

Industry

  • LLM Stats tracked 255 model releases from major organizations in Q1 2026
  • March-April 2026 is being called the densest model release window in AI history
  • Open-source models are now on par with proprietary models in many areas

Sources