
GPQA-Diamond Benchmark: Scores, Leaderboard & How AI Models Compare
GPQA-Diamond scores updated through 2026: Gemini 3.1 Pro (94.1%), GPT-5.2, Claude Opus 4.6, Aristotle-X1, and more. See which AI models beat PhD experts on 198 graduate-level science questions.