Benchmark
Meta Llama — reported benchmarks
Meta · source page ↗ · last checked Jul 3, 2026, 12:02 AM
Reported benchmarks · Llama 4 Maverick
captured Jul 3, 2026, 12:02 AM

| Benchmark | Score |
|---|---|
| MMLU Pro | 80.5% |
| GPQA Diamond | 69.8% |
| LiveCodeBench | 43.4 pass@1 · averaged over multiple generations |
| HumanEval | 86.4% |
| Multilingual MMLU | 84.6% |
| GSM8K | 95.2% |
| MATH-500 | 85.3% |
| SWE-bench Verified | 74.2% pass@1 |

Vendor-reported via automated web search — not independently verified. See the cited matrix on /models.
Change history
- Vendor claim Jul 3, 2026, 12:02 AM
  Meta reported benchmarks updated
  Llama 4 Maverick: 8 benchmark claims (via web search)
- Vendor claim Jun 25, 2026, 4:22 PM
  Meta reported benchmarks updated
  Llama 3.1 405B: 8 benchmark claims (via web search)