🌐 LLM Leaderboard Update 🌐
New ARC-AGI benchmarks are here! 🔥 #Gemini3DeepThink leads both: 96.0% on #ARCAGI1 and 84.6% on #ARCAGI2. #GPT52Refine is second and #ClaudeOpus46 third on both!
=== ARC-AGI-1 Leaderboard ===
1. Gemini 3 Deep Think (2/26) - 96.0%
2. GPT-5.2 (Refine.) - 94.5%
3. Claude Opus 4.6 (120K, High) - 94.0%
4. Claude Opus 4.6 (120K, Max) - 93.0%
5. Claude Opus 4.6 (120K, Medium) - 92.0%
6. GPT-5.2 Pro (X-High) - 90.5%
7. Gemini 3 Deep Think (Preview) ² - 87.5%
8. GPT-5.2 (X-High) - 86.2%
9. Claude Opus 4.6 (120K, Low) - 86.0%
10. GPT-5.2 Pro (High) - 85.7%
11. Gemini 3 Flash Preview (High) - 84.7%
12. GPT-5.2 Pro (Medium) - 81.2%
13. Opus 4.5 (Thinking, 64K) - 80.0%
14. Grok 4 (Refine.) - 79.6%
15. GPT-5.2 (High) - 78.7%
16. Opus 4.5 (Thinking, 32K) - 75.8%
17. Gemini 3 Pro - 75.0%
18. GPT-5.1 (Thinking, High) - 72.8%
19. GPT-5.2 (Medium) - 72.7%
20. Opus 4.5 (Thinking, 16K) - 72.0%
=== ARC-AGI-2 Leaderboard ===
1. Gemini 3 Deep Think (2/26) - 84.6%
2. GPT-5.2 (Refine.) - 72.9%
3. Claude Opus 4.6 (120K, High) - 69.2%
4. Claude Opus 4.6 (120K, Max) - 68.8%
5. Claude Opus 4.6 (120K, Medium) - 66.3%
6. Claude Opus 4.6 (120K, Low) - 64.6%
7. GPT-5.2 Pro (High) - 54.2%
8. Gemini 3 Pro (Refine.) - 54.0%
9. GPT-5.2 (X-High) - 52.9%
10. Gemini 3 Deep Think (Preview) ² - 45.1%
11. GPT-5.2 (High) - 43.3%
12. GPT-5.2 Pro (Medium) - 38.5%
13. Opus 4.5 (Thinking, 64K) - 37.6%
14. Gemini 3 Flash Preview (High) - 33.6%
15. Gemini 3 Pro - 31.1%
16. Grok 4 (Refine.) - 29.4%
17. NVARC - 27.6%
18. GPT-5.2 (Medium) - 26.7%
19. Opus 4.5 (Thinking, 16K) - 22.8%
20. GPT-5 Pro - 18.3%
#ARCAGI1 #ARCAGI2 #ai #LLM