Latest News 7
A comprehensive guide to MITRE ATLAS — 16 tactics, 84 techniques, and 42 case studies for understanding adversarial threats to AI/ML systems.
A technical breakdown of prompt injection attack classes, real CVEs, and the defense mechanisms that work — and those that don't.
Three frontier models in a single month — GPT-5.4, Gemini 3.1 Ultra, and Grok 4.20 — plus major open-source releases.
As AI agents gain autonomy, the OWASP LLM Top 10 tracks the most critical security risks for large language model applications.
Anthropic renames the SDK to reflect its broader applications beyond coding. Now available in Python and TypeScript.
A fully managed agent harness for running Claude autonomously with secure sandboxing, multi-agent coordination, and server-sent event streaming.
The largest freely available AI model at 1T parameters, hosted on OpenRouter at $0.28/M input tokens.
Releases 3
- Fully managed agent harness on Anthropic infrastructure
- Secure sandboxing and long-running sessions
- Multi-agent coordination in research preview
- Record 83% on GDPval
- Record scores on OSWorld-Verified and WebArena Verified
- Standard, Thinking, and Pro variants
- 1M context window at standard pricing
- Opus 80.8% and Sonnet 79.6% on SWE-bench Verified
- Adaptive, extended, and interleaved thinking
Models 8 pricing per 1M tokens
| Model | Provider | Input/Output ($) |
|---|---|---|
| Qwen 3.6 Plus | Alibaba | $0.3/$1.2 |
| Gemma 4 | Google | free |
| DeepSeek V4 | DeepSeek | $0.28/$1.1 |
| GPT-5.4 | OpenAI | $2.5/$10 |
| Gemini 3.1 Pro | Google | $2/$12 |
| Claude Opus 4.6 | Anthropic | $5/$25 |
| Claude Sonnet 4.6 | Anthropic | $3/$15 |
| Llama 4 Maverick | Meta | free |
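
The per-1M-token prices in this table can be turned into a per-request estimate with simple arithmetic. The following is a minimal sketch, not an official calculator: the `PRICES` dictionary and the `request_cost` helper are hypothetical names introduced here, the figures are copied from the table, and the models listed as free are left out. It assumes billing is linear in token count, which ignores any caching, batch, or long-context pricing a provider may apply.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table above.
# PRICES and request_cost are illustrative names, not part of any provider SDK.
PRICES = {
    "Qwen 3.6 Plus": (0.30, 1.20),
    "DeepSeek V4": (0.28, 1.10),
    "GPT-5.4": (2.50, 10.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Claude Opus 4.6": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, assuming linear per-token billing."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Compare every priced model on the same workload: 200k input, 50k output tokens.
    for name in sorted(PRICES, key=lambda m: request_cost(m, 200_000, 50_000)):
        print(f"{name:<20} ${request_cost(name, 200_000, 50_000):.4f}")
```

Because output tokens cost roughly four to six times as much as input tokens for every priced model here, generation-heavy workloads shift the comparison more than the input price alone suggests.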
Security 2
Benchmarks 3
GPQA Diamond
- Claude Opus 4.6 94.3
- GPT-5.4 92
- GPT-5.3 Codex 91.5
- Gemini 3.1 Pro 90.8
- Claude Sonnet 4.6 88.5
SWE-bench Verified
- Claude Mythos Preview 93.9
- GPT-5.3 Codex 85
- Claude Opus 4.5 80.9
- Claude Opus 4.6 80.8
- Claude Sonnet 4.6 79.6
LM Arena (Chatbot Arena) Elo Rankings
- Claude Opus 4.6 Thinking 1504
- Gemini 3.1 Pro Preview 1493
- Grok 4.20 Beta1 1491
- GPT-5.4 High 1484
- Claude Sonnet 4.6 Thinking 1478
Trends 1 snapshots
| Model | Arena Elo | GPQA Diamond | Input $/1M tokens |
|---|---|---|---|
| Claude Opus 4.6 | 1504 | 94.3% | $5 |
| Gemini 3.1 Pro | 1493 | 90.8% | $2 |
| GPT-5.4 | 1484 | 92% | $2.5 |
| Claude Sonnet 4.6 | 1478 | 88.5% | $3 |
| DeepSeek V4 | 1445 | 84% | $0.28 |
| Llama 4 Maverick | n/a | 78% | free |
| Qwen 3.6 Plus | n/a | 82% | $0.3 |
| Gemma 4 | n/a | 72% | free |