Gemini 3.1 Pro claims top-tier reasoning gains
Written by Joseph Nordqvist/
4 min read
Google has released Gemini 3.1 Pro, a new version of its “Pro” model line that it says upgrades the core reasoning capabilities behind recent Gemini 3 advances. [1]
The company is rolling it out starting February 19, 2026 across consumer products and developer and enterprise platforms.
Context and background
Gemini 3.1 Pro arrives about a week after Google DeepMind’s update to Gemini 3 Deep Think, which the company positioned as a research-focused system for science and engineering workflows. [2]
Google describes 3.1 Pro as the “core intelligence” behind those improvements, now packaged for broader day-to-day use.
Key details
Google says Gemini 3.1 Pro is designed for tasks where a short, direct answer is not enough, including multi-step reasoning, synthesis, and complex problem solving.
Availability (starting today):
Developers (preview): via the Gemini API in Google AI Studio, plus access through Gemini CLI, Android Studio, and Google’s agentic development tooling (including “Antigravity,” per Google’s announcement).
Enterprise: via Vertex AI and Gemini Enterprise.
Consumers: via the Gemini app and NotebookLM, with higher usage limits tied to Google’s paid tiers.
Benchmark claims
Google highlights performance on ARC-AGI-2, reporting a verified score of 77.1% for Gemini 3.1 Pro and describing it as more than double Gemini 3 Pro’s reasoning performance on that benchmark.

Google also publishes a broader benchmark table for Gemini 3.1 Pro across reasoning, coding, agentic tool use, multilingual evaluation, and long-context testing.

Google also showcased the new model’s SVG generation capabilities (from the official Google blog):
I used the new model to create an animated hummingbird SVG and it was surprisingly good:
Why this matters
For most people, the practical question is not whether a model can answer trivia. It is whether it can reliably complete multi-step work without losing the thread.
Google is positioning Gemini 3.1 Pro as a stronger default model for those longer tasks, while simultaneously pushing it into products where users expect it to be dependable, including NotebookLM for research workflows and Vertex AI for production deployments.
The benchmark emphasis also signals where Google wants to compete: not only on general reasoning, but on agent-style evaluation categories such as tool use and coding. The company’s publication of an evaluation methodology document makes it easier to compare results, but it does not eliminate the need to examine how each benchmark was run and whether conditions match real-world use.
Keeping up with other SOTA model releases
Over the past few weeks, the frontier model market has moved quickly, with major labs shipping new flagships and specialized variants in short succession. OpenAI introduced GPT-5.3-Codex and later GPT-5.3-Codex-Spark, positioning them around agentic coding and faster interactive development workflows.
Anthropic has also updated its top tier models, releasing Claude Opus 4.6 (and a fast mode for it) and then, in the same month, releasing Claude Sonnet 4.6; with both announcements emphasizing stronger reasoning and coding performance, plus support for long-context work.
Google’s release of Gemini 3.1 Pro fits into the same pattern. The company is framing 3.1 Pro as a step up in “core reasoning,” and is rolling it out broadly across consumer, developer, and enterprise surfaces rather than limiting it to a single product. It also comes just a day after Google began rolling out Lyria 3, a built-in tool for generating short music tracks, in the Gemini app.
Editorial Transparency
This article was produced with the assistance of AI tools as part of our editorial workflow. All analysis, conclusions, and editorial decisions were made by human editors. Read our Editorial Guidelines
References
- 1.
Gemini 3.1 Pro: A smarter model for your most complex tasks — The Gemini Team, Google, February 19, 2026
Primary - 2.
Gemini 3 Deep Think: Advancing science, research and engineering — The Deep Think team, Google, February 12, 2026
Was this useful?
More in Models
View all- Gemma 4 on an iPhone: here's what a 2B model can actually doApril 8, 2026
- Gemma 4 E4B benchmark: 53 tests on a MacBook Pro M4 (24GB)April 3, 2026
- Karpathy says LLM memory features "trying too hard"March 26, 2026
- Moonshot AI proposes new method for how LLM layers share information, claims 1.25x compute advantageMarch 16, 2026
Related stories
GPT-5.4 arrives with native computer use, 1M context, and a new tool search capability
OpenAI released GPT-5.4 today, billing it as its most capable model to date for professional work, coding, and automated tasks.
March 5, 2026
ModelsGoogle releases Gemini 3.1 Flash-Lite
Google has released a preview of Gemini 3.1 Flash-Lite, positioning it as the fastest and most cost-efficient model in the Gemini 3 series.
March 3, 2026
ModelsAlibaba launches Qwen 3.5 small model series with sub-1B edge options
Alibaba's Qwen team has released four new small language models — Qwen3.5-0.8B, 2B, 4B, and 9B — alongside their base model counterparts, extending the Qwen3.5 architecture to compact, resource-efficient deployments.
March 2, 2026
IndustryClaude hits #1 on the US App Store after Pentagon dispute
After the Pentagon banned Anthropic for refusing to allow Claude to be used for mass surveillance and autonomous weapons, users responded by making Claude the #1 app on the US App Store, overtaking ChatGPT for the first time.
March 1, 2026