DeepMind reports research-level results from Gemini Deep Think
Written by Joseph Nordqvist
Google DeepMind has published two research papers describing how its Gemini Deep Think system is being used to assist with research-level problems in mathematics, computer science, physics, and economics.
The announcement, posted February 11 on the official DeepMind blog,[1] outlines a math-focused research agent and a broader set of human-AI collaboration techniques. The papers are titled “Towards Autonomous Mathematics Research” and “Accelerating Scientific Research with Gemini: Case Studies and Common Techniques.”[2][3]
Gemini Deep Think previously achieved gold-medal-standard performance at the International Mathematical Olympiad in 2025 and later reached similar results at the International Collegiate Programming Contest.[1] The new work moves beyond student-level benchmarks into professional research settings.
Yossi Matias, Head of Google Research, also posted about the research papers on X.
The first paper introduces a math research agent internally codenamed Aletheia.[2]
According to the authors, the system operates through an iterative generate-verify-revise loop. It includes a natural language verifier designed to identify logical flaws in candidate solutions and can restart or revise its reasoning when errors are detected. The agent can also explicitly report failure when it cannot solve a problem.
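The loop described above can be illustrated with a minimal Python sketch. This is not Aletheia's implementation, which the article does not detail: the function names `generate`, `verify`, and `revise`, the `Verdict` type, and the restart and revision budgets are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool        # True when the verifier found no flaw
    flaw: str = ""  # description of the flaw found, if any


def solve(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Verdict],
    revise: Callable[[str, str, str], str],
    max_restarts: int = 3,   # hypothetical budget, not from the paper
    max_revisions: int = 4,  # hypothetical budget, not from the paper
) -> Optional[str]:
    """Return a candidate that passed verification, or None to report failure."""
    for _ in range(max_restarts):
        candidate = generate(problem)  # fresh attempt
        for attempt in range(max_revisions + 1):
            verdict = verify(problem, candidate)
            if verdict.ok:
                return candidate
            if attempt < max_revisions:
                # Feed the detected flaw back into the next revision.
                candidate = revise(problem, candidate, verdict.flaw)
        # Revision budget exhausted: restart from scratch.
    return None  # explicit failure instead of an unverified answer


if __name__ == "__main__":
    # Toy stand-ins: "solve" x * x == 49 by guessing 5 and nudging upward.
    gen = lambda p: "5"
    ver = lambda p, c: Verdict(int(c) ** 2 == 49, "square is not 49")
    rev = lambda p, c, flaw: str(int(c) + 1)
    print(solve("x*x == 49", gen, ver, rev))  # prints 7
```

Returning `None` mirrors the agent's ability to report failure: a candidate is only returned after the verifier accepts it.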
The paper describes several results achieved with varying levels of autonomy, among them:
- A fully AI-generated research paper in arithmetic geometry calculating structure constants known as eigenweights.
- A human-AI collaboration paper proving bounds related to independent sets in interacting particle systems.
- A semi-autonomous evaluation of 700 open problems listed in Bloom’s Erdős Conjectures database, including autonomous solutions to four open questions.
- Contributions of intermediate propositions to two additional research papers.
The second paper documents 18 research problems addressed in collaboration with academic experts.[3]
Examples described in the paper include:
- Constructing a counterexample that refuted a conjecture proposed in 2015 in online submodular optimization.
- Applying tools from continuous mathematics, such as the Kirszbraun theorem and Stone-Weierstrass theorem, to problems including Max-Cut and Steiner Tree.
- Providing a theoretical explanation for an adaptive penalty mechanism in machine learning optimization.
- Extending an auction-theory result from rational-number domains to continuous real numbers.
- Deriving an analytical solution in cosmic string physics using Gegenbauer polynomials to convert an infinite series into a finite sum.
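As background on the last example only: the Gegenbauer polynomials $C_n^{(\alpha)}(x)$ are defined by the standard generating function below, valid for $|t| < 1$, $|x| \le 1$ and $\alpha > 0$. Identities of this kind are what allow some infinite series in these polynomials to be summed exactly. This is textbook material, not the derivation from the paper, which is not reproduced here.

```latex
\frac{1}{\left(1 - 2xt + t^{2}\right)^{\alpha}}
  = \sum_{n=0}^{\infty} C_n^{(\alpha)}(x)\, t^{n}
```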
According to the blog, roughly half of the results described are targeting strong conferences, and one has already been accepted at ICLR 2026. The company also says that all benchmark results cited in the blog were graded by human experts.[1]
DeepMind describes the effort as part of ongoing research into how advanced reasoning systems can assist professional scientific workflows.
Update (Feb. 12, 2026):
A day after Google DeepMind published its blog post on research-level mathematics and scientific work, Google announced a major upgrade to Gemini 3 Deep Think.[4]
The update expands benchmark performance claims and, for the first time, makes Deep Think available via the Gemini API through an early access program for researchers and enterprises.[4]
Google says the updated model achieves:
- 48.4% on Humanity’s Last Exam (without tools)
- 84.6% on ARC-AGI-2 (verified by the ARC Prize Foundation)
- An Elo rating of 3455 on Codeforces
- Gold-medal-level results on the 2025 International Mathematical Olympiad
- Gold-level performance on the written sections of the 2025 International Physics and Chemistry Olympiads
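To give a rough sense of what a rating gap means, the classic Elo model predicts an expected score from the difference between two ratings. The sketch below uses that textbook formula. Codeforces runs its own Elo-like rating system, so treating its ratings this way is an assumption, and the 3055-rated opponent is a made-up comparison point rather than a figure from Google.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expected score of player A against player B (0 to 1)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # A 400-point gap: the reported 3455 rating against a hypothetical 3055 opponent.
    print(round(expected_score(3455, 3055), 3))  # prints 0.909
```

Under this model, every 400 points of rating advantage multiplies the odds of coming out ahead by ten.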
Dr Haozhe "Harry" Wang, Assistant Professor at Duke University, said his team used Deep Think in their work to design new semiconductors. According to Dr Wang, the system helped identify a “sweet spot in parameters” for 2D materials and generated a full thermal profile to guide optimization.
Editorial Transparency
This article was produced with the assistance of AI tools as part of our editorial workflow. All analysis, conclusions, and editorial decisions were made by human editors.
References
1. Accelerating Mathematical and Scientific Discovery with Gemini Deep Think. Thang Luong and Vahab Mirrokni, Google DeepMind, February 11, 2026. (Primary)
2. Towards Autonomous Mathematics Research. Tony Feng, Trieu H. Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung, Joonkyung Lee, Carlo Pagano, Sang-hyun Kim, Federico Pasqualotto, Sergei Gukov, Jonathan N. Lee, Junsu Kim, Kaiying Hou, Golnaz Ghiasi, Yi Tay, YaGuang Li, Chenkai Kuang, Yuan Liu, Hanzhao (Maggie) Lin, Evan Zheran Liu, Nigamaa Nayakanti, Xiaomeng Yang, Heng-tze Cheng, Demis Hassabis, Koray Kavukcuoglu, Quoc V. Le, Thang Luong, Google, February 11, 2026.
3. Accelerating Scientific Research with Gemini: Case Studies and Common Techniques. David P. Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, Mohammad Hossein Bateni, Simina Branzei, Michael P. Brenner, Lin Chen, Ying Feng, Lance Fortnow, Gang Fu, Ziyi Guan, Zahra Hadizadeh, Mohammad T. Hajiaghayi, Mahdi JafariRaviz, Adel Javanmard, Karthik C. S., Ken-ichi Kawarabayashi, Ravi Kumar, Silvio Lattanzi, Euiwoong Lee, Yi Li, Ioannis Panageas, Dimitris Paparas, Benjamin Przybocki, Bernardo Subercaseaux, Ola Svensson, Shayan Taherijam, Xuan Wu, Eylon Yogev, Morteza Zadimoghaddam, Samson Zhou, Vahab Mirrokni, arXiv, February 3, 2026.
4. Gemini 3 Deep Think: Advancing science, research and engineering. The Deep Think Team, Google, February 12, 2026.