
Humanity's Last Exam: The AI Benchmark for LLM Reasoning
Learn about Humanity's Last Exam (HLE), the advanced AI benchmark created to test true LLM reasoning with graduate-level questions that stump current models.

