DeepMind reports research-level results from Gemini Deep Think

Google DeepMind has published two research papers describing how its Gemini Deep Think system is being used to assist with research-level problems in mathematics, computer science, physics, and economics.

The announcement, posted February 11 on the official DeepMind blog^[1], outlines a math-focused research agent and a broader set of human-AI collaboration techniques. The papers are titled “Towards Autonomous Mathematics Research” and “Accelerating Scientific Research with Gemini: Case Studies and Common Techniques.”^[2]^[3]

Gemini Deep Think previously achieved gold-medal-standard performance at the 2025 International Mathematical Olympiad and later reached similar results at the International Collegiate Programming Contest.^[1] The new work moves beyond student-level benchmarks into professional research settings.

Yossi Matias, Head of Google Research, also posted about the research papers on X.

The first paper introduces a math research agent internally codenamed Aletheia.^[2]

According to the authors, the system operates through an iterative generate-verify-revise loop. It includes a natural language verifier designed to identify logical flaws in candidate solutions and can restart or revise its reasoning when errors are detected. The agent can also explicitly report failure when it cannot solve a problem.
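The loop described above can be illustrated with a minimal Python sketch. This is not DeepMind's implementation: the function names, the `Verdict` type, the retry limits, and the toy stand-ins are all hypothetical, chosen only to show how generation, verification, revision, restarts, and an explicit failure report could fit together.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool            # True if the verifier found no flaw
    critique: str = ""  # description of the flaw, if one was found


def generate_verify_revise(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Verdict],
    revise: Callable[[str, str, str], str],
    max_revisions: int = 3,  # hypothetical limit, not from the paper
    max_restarts: int = 2,   # hypothetical limit, not from the paper
) -> Optional[str]:
    """Return a candidate that passed verification, or None to report failure."""
    for _ in range(max_restarts + 1):
        candidate = generate(problem)  # fresh attempt (a restart)
        for revision in range(max_revisions + 1):
            verdict = verify(problem, candidate)
            if verdict.ok:
                return candidate
            if revision < max_revisions:
                # Feed the verifier's critique back into the next draft.
                candidate = revise(problem, candidate, verdict.critique)
    return None  # explicit failure instead of an unverified answer


# Toy stand-ins so the sketch runs end to end; a real system would call a model.
def toy_generate(problem: str) -> str:
    return "5"


def toy_verify(problem: str, candidate: str) -> Verdict:
    return Verdict(ok=(candidate == "4"), critique="arithmetic error")


def toy_revise(problem: str, candidate: str, critique: str) -> str:
    return "4"


result = generate_verify_revise("2 + 2", toy_generate, toy_verify, toy_revise)
print(result if result is not None else "failed to solve")
```

Returning `None` rather than the last unverified draft mirrors the article's point that the agent can explicitly report failure when it cannot solve a problem.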

The paper describes several results achieved with varying levels of autonomy.

Among them:

A fully AI-generated research paper in arithmetic geometry calculating structure constants known as eigenweights.
A human-AI collaboration paper proving bounds related to independent sets in interacting particle systems.
A semi-autonomous evaluation of 700 open problems listed in Bloom’s Erdős Conjectures database, including autonomous solutions to four open questions.
Contributions of intermediate propositions to two additional research papers.

The second paper documents 18 research problems addressed in collaboration with academic experts.^[3]

Examples described in the paper include:

Constructing a counterexample that refuted a conjecture proposed in 2015 in online submodular optimization.
Applying tools from continuous mathematics, such as the Kirszbraun theorem and Stone-Weierstrass theorem, to problems including Max-Cut and Steiner Tree.
Providing a theoretical explanation for an adaptive penalty mechanism in machine learning optimization.
Extending an auction-theory result from rational-number domains to continuous real numbers.
Deriving an analytical solution in cosmic string physics using Gegenbauer polynomials to convert an infinite series into a finite sum.

According to the blog, roughly half of the results described are targeted at strong conferences, including one already accepted at ICLR 2026. The company also says that all benchmark results cited in the blog were graded by human experts.^[1]

DeepMind describes the effort as part of ongoing research into how advanced reasoning systems can assist professional scientific workflows.

Update (Feb. 12, 2026):

A day after Google DeepMind published its blog post outlining how its Gemini Deep Think system is assisting with research-level mathematics and scientific work, Google announced a major upgrade to Gemini 3 Deep Think.^[4]

The update expands benchmark performance claims and, for the first time, makes Deep Think available via the Gemini API through an early access program for researchers and enterprises.^[4]

Google says the updated model achieves:

48.4% on Humanity’s Last Exam (without tools)
84.6% on ARC-AGI-2 (verified by the ARC Prize Foundation)
An Elo rating of 3455 on Codeforces
Gold-medal-level results on the 2025 International Mathematical Olympiad
Gold-level performance on the written sections of the 2025 International Physics and Chemistry Olympiads

Dr Haozhe "Harry" Wang, Assistant Professor at Duke University, said his team used Deep Think in their work to design new semiconductors. According to Dr Wang, the system helped identify a “sweet spot in parameters” for 2D materials and generated a full thermal profile to guide optimization.

References
