Interesting and early. Worth a spike or exploration session.
Llama 4
Competitive open-weight multimodal models with industry-leading context windows — but the benchmark manipulation scandal and unreleased Behemoth are serious credibility issues.
LLM·Open-source·Multimodal
llama.com
Our Take
What It Is
Llama 4 is Meta's latest open-weight model family, released April 2025. Scout (109B total, 17B active, 16 experts) handles up to 10M tokens of context. Maverick (400B total, 17B active, 128 experts) is fine-tuned to 1M tokens. Both are natively multimodal with image understanding built into the architecture. The flagship Behemoth (2T total, 288B active) was announced but remains unreleased as of March 2026.
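The MoE sparsity figures above can be made concrete with a quick calculation. This is an illustrative sketch using only the parameter counts quoted in this entry, not an official sizing tool:

```python
# Active-parameter fraction per forward pass for each Llama 4 variant,
# using the figures quoted above (billions of parameters).
variants = {
    "Scout":    {"total_b": 109,  "active_b": 17,  "experts": 16},
    "Maverick": {"total_b": 400,  "active_b": 17,  "experts": 128},
    "Behemoth": {"total_b": 2000, "active_b": 288, "experts": None},  # unreleased
}

for name, v in variants.items():
    frac = v["active_b"] / v["total_b"]
    print(f"{name}: {v['active_b']}B of {v['total_b']}B active ({frac:.1%} per token)")
```

Maverick activates only about 4% of its weights per token, which is the basis for the "fraction of dense model costs" claim in the Strengths list below.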
Why It Matters
Llama 4 is Emerging because the models are genuinely useful but the launch was badly damaged by a confirmed benchmark manipulation scandal. Departing Meta AI chief Yann LeCun confirmed in January 2026 that results were "fudged" — Meta used a non-public experimental variant for leaderboard submissions. CEO Zuckerberg reportedly "lost confidence in everyone involved," and the GenAI organisation was sidelined.
Despite the controversy, Scout and Maverick are available with downloadable weights and competitive pricing ($0.11/M input for Scout). For teams that need long-context multimodal capabilities with open weights, Llama 4 remains a practical option — you just can't trust the published benchmarks.
Key Developments
- Jan 2026: Departing Meta AI chief confirms benchmark results were "fudged" using a non-public model variant.
- Post-launch: GenAI organisation sidelined. Behemoth postponed indefinitely (11+ months past family launch).
- Independent testing: Real-world performance showed regression versus Llama 3 on coding tasks.
- Ongoing: Scout and Maverick weights available on Hugging Face for download and self-hosting.
What to Watch
Behemoth is the question mark. If Meta releases it with genuine benchmarks under new leadership, Llama 4's position improves significantly. If Behemoth remains vapourware, the credibility damage compounds and teams will look to Llama 3 (proven) or DeepSeek R1 (competitive, MIT licensed) instead. The organisational restructuring at Meta AI is the signal to track.
Strengths
- Industry-leading context window: Scout supports 10M tokens (trained on up to 256K). Maverick fine-tuned to 1M tokens.
- Efficient MoE architecture: 17B active parameters for both models, enabling deployment at a fraction of dense model costs.
- Competitive pricing: Scout at $0.11/M input tokens, Maverick at $0.50/M — significantly cheaper than proprietary alternatives.
- Natively multimodal: First Llama generation with native image understanding integrated into the architecture.
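The pricing gap is easy to quantify. A minimal sketch using the input-token rates quoted above (output-token rates are omitted because this entry doesn't list them):

```python
# Input-token cost at the API prices quoted above ($ per million input tokens).
PRICE_PER_M = {"Scout": 0.11, "Maverick": 0.50}

def input_cost_usd(model: str, input_tokens: int) -> float:
    """Dollar cost of processing `input_tokens` input tokens."""
    return PRICE_PER_M[model] * input_tokens / 1_000_000

# Example: a workload of 500M input tokens per month.
monthly_tokens = 500_000_000
for model in PRICE_PER_M:
    print(f"{model}: ${input_cost_usd(model, monthly_tokens):,.2f}/month")
```

At that volume Scout comes to roughly $55 and Maverick to $250 per month for input tokens alone, which is the scale of difference behind the "competitive pricing" claim.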
Considerations
- Confirmed benchmark manipulation: Meta used a non-public variant for leaderboard submissions. Real-world performance didn't match claims.
- 10M context is extrapolated: No model was actually trained on prompts longer than 256K tokens. The 10M claim is not demonstrated at scale.
- Behemoth indefinitely delayed: The flagship 2T-parameter model remains unreleased 11+ months after the family launch.
- Organisational instability: GenAI team sidelined post-launch. Unclear roadmap under restructured leadership.