Achieving IMO Gold: Robust Mathematical Reasoning with LLMs

Location: 

Astro-Math Bldg. 202

Day and Time: 

2025-11-03 (Monday) 14:00 - 15:00

Abstract: 

In 2025, an advanced version of Gemini with Deep Think achieved a historic gold-medal standard at the International Mathematical Olympiad (IMO), the world’s most prestigious competition for young mathematicians. This breakthrough represents a significant leap in AI reasoning: our model solved five of the six problems end-to-end in natural language, producing rigorous, human-readable proofs directly from the problem statements, all within the official competition time limit.

This talk will present the high-level ideas and design principles behind the model and explain how it achieves robust mathematical reasoning. A critical challenge in advancing mathematical AI is that existing evaluation benchmarks are often too simplistic or check only short final answers, so they fail to assess the deep, multi-step reasoning required for Olympiad-level mathematics. We will describe how we addressed this issue by establishing a new north star for our research: a suite of rigorous evaluation benchmarks designed to reward the entire problem-solving process. The talk will detail the design of these benchmarks and explore how they, in turn, guided the research and design principles that were fundamental to achieving gold-level performance.

Refreshments: 13:30
Organizers:
Olivier Hénot (NTU), Chun-Ju Lai (AS), Chao Ming Lin (NTU), Colin McSwiggen (AS), Masao Oi (NTU)
