🌐 LLM Leaderboard Update 🌐
LiveBench: Grok 4.20 Beta blasts in at #18 with 67.96, up from Grok 4's #20, while GPT-5 Mini High and DeepSeek V3.2 Thinking slide back to #19 and #20!
New Results:
=== LiveBench Leaderboard ===
1. GPT-5.4 Thinking xHigh Effort - 80.28
2. Gemini 3.1 Pro Preview High - 79.93 (5th rank in unseen questions across all categories)
3. Claude 4.6 Opus Thinking High Effort - 76.33
4. Claude 4.5 Opus Thinking High Effort - 75.96
5. Claude 4.6 Sonnet Thinking Medium Effort - 75.47
6. GPT-5.2 High - 74.84
7. GPT-5.2 Codex - 74.30
8. GPT-5.1 Codex Max High - 73.98
9. Gemini 3 Pro Preview High - 73.39
10. GPT-5.3 Codex High - 72.76
11. Gemini 3 Flash Preview High - 72.40
12. GPT-5.1 High - 72.04
13. GPT-5 Pro - 70.48
14. Kimi K2.5 Thinking - 69.07
15. GLM 5 - 68.85
16. GPT-5.1 Codex - 68.61
17. Claude Sonnet 4.5 Thinking - 68.19
18. Grok 4.20 Beta - 67.96
19. GPT-5 Mini High - 65.91
20. DeepSeek V3.2 Thinking - 62.20
#ai #LLM #LiveBench