Without tools, the model scored 48.4% on the “Humanity’s Last Exam” (HLE) benchmark and 84.6% on the ARC-AGI-2 test. It also reached gold-medal level in the written portions of the 2025 International Physics and Chemistry Olympiads. Google said the new model is designed to help researchers tackle “unsolvable” problems, ranging from identifying flaws in research papers to optimizing semiconductor crystal growth.
Google (NASDAQ:GOOGL) has significantly upgraded Gemini 3 Deep Think, its deep thinking model, moving its reasoning capabilities from abstract theory toward practical applications. The upgrade focuses on complex challenges in modern scientific research and engineering and marks a strategic investment by Google in the enterprise AI market.
On Thursday, February 12, Google officially announced the Gemini 3 Deep Think upgrade, stating that the updated model achieved breakthrough results across several industry benchmarks, including 84.6% in the ARC-AGI-2 test (verified by the ARC Prize Foundation) and an Elo score of 3455 on the competitive programming platform Codeforces.

The upgraded deep thinking model is now available to Google AI Ultra subscribers, and selected researchers, engineers, and enterprise users can get early access through the Gemini API. Google reported that the model has already shown practical value in real-world research, from detecting logical flaws in research papers to optimizing semiconductor material growth processes.
This release positions Google to directly compete with OpenAI’s o1 series and Anthropic’s Claude in the AI reasoning model race. As general AI capabilities become increasingly commoditized, specialized reasoning abilities have become the new battleground in the enterprise market. The launch of the deep thinking model signals that Google is unwilling to concede in this high-value sector.
From Benchmark Results to Gold Medal Performance
Google highlighted the deep thinking model’s performance on rigorous academic benchmarks. In addition to the results above, Gemini 3 Deep Think reached gold-medal level in the written portions of the 2025 International Physics and Chemistry Olympiads and scored 50.5% on CMT-Benchmark, an advanced theoretical physics test.
Comparative results from Google show that Gemini 3 Deep Think surpassed the strongest models from Anthropic and OpenAI in several tests and also outperformed the Gemini 3 Pro preview version. For instance, on the ARC-AGI-2 test, Gemini 3 Deep Think scored 84.6%, while Anthropic’s Claude Opus 4.6 Thinking Max achieved 68.8% and OpenAI’s GPT-5.2 Thinking xhigh scored 52.9%.

Google’s team stated that this upgrade was developed in close collaboration with scientists and researchers to address research challenges that lack clear boundaries or single correct answers and that often involve messy or incomplete data. The model combines deep scientific knowledge with practical engineering capabilities, bridging the gap between abstract theory and practical applications.
Beyond breakthroughs in mathematics and programming, the deep thinking model has extended its performance to multiple scientific fields, including chemistry, physics, and theoretical physics. This broad applicability means the model is no longer limited to specific disciplines and can serve as a cross-disciplinary research tool.
Real-World Application Cases Validate Its Value
Early test users have demonstrated the model’s real-world potential. Lisa Carbone, a mathematician at Rutgers University who researches the mathematical structures required for high-energy physics, used the deep thinking model to review a highly specialized mathematical paper. The model identified a subtle logical flaw that had gone undetected through peer review.
At Duke University, the Wang Lab used the deep thinking model to optimize a manufacturing method for complex crystal growth, with the aim of discovering potential semiconductor materials. The model designed a recipe that grew thin films over 100 microns in thickness, achieving a precision that prior methods had been unable to reach.
Anupam Pathak, head of research and development at Google’s Platforms & Devices Division and former CEO of Liftware, tested the upgraded deep thinking model to accelerate the design of physical components.
Another use case showcased by Google demonstrated how the upgraded Gemini 3 Deep Think could convert sketches into 3D printable physical models. The model can analyze blueprints, model complex shapes, and generate the necessary files for 3D printing.
Strategic Positioning in the Enterprise Market
This upgrade reflects a broader shift in the AI industry, from general chatbots to specialized reasoning engines that can tackle professional-grade problems. For enterprise clients, evaluation criteria are changing. The question is no longer only which AI can write code or summarize documents the fastest, but also whether a model can reason: whether it can handle complex financial models, analyze experimental data, identify methodological flaws, or assist in patent research and drug discovery.
Google’s advantage lies in its integration capabilities. The deep thinking model is not an isolated tool but part of the broader Gemini ecosystem, meaning it can leverage Google’s vast knowledge graph, scientific datasets, and research partnerships. Researchers using the deep thinking model through Google Cloud theoretically have access to computational power and data sources that standalone AI services cannot match.
On Thursday, the company posted on X, saying, “The upgraded deep thinking model is driving discoveries and helping researchers solve ‘unsolvable’ problems—from finding flaws in research papers to optimizing semiconductor (crystal) growth.” This statement underscores the model’s transition from benchmark tests to real-world applications.
From a product strategy perspective, Google is targeting both consumer and enterprise users. Google AI Ultra subscribers can immediately access the model via the Gemini app, while scientists, engineers, and enterprise users can apply for early access through the Gemini API. This layered strategy reflects Google’s dual goal of maintaining a presence in the consumer market while vying for high-value enterprise clients.
AI Reasoning Model Competition Heats Up
The launch of the deep thinking model puts Google in direct competition with OpenAI and Anthropic in the AI reasoning race. OpenAI’s o1 model reportedly spends more time “thinking” before generating responses, using reinforcement learning to improve reasoning chains. Anthropic’s Claude 3 has also carved out a niche in research and analytical tasks. Now, Google has staked its claim in the same field, backed by the infrastructure and distribution advantages of being integrated into Workspace and Cloud Platform.
For professional users, this means choosing between fast general responses and slower, deeper reasoning, and that choice could become a new architectural decision. Applications may route simple queries to standard models while escalating complex ones to the reasoning model, creating a layered approach to AI reasoning.
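The layered routing idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tier labels, the keyword list, the length cutoff, and the threshold are all hypothetical, and no real model API is called. A production router would more likely use a trained classifier or a cost budget than a keyword heuristic.

```python
import re
from dataclasses import dataclass

# Hypothetical tier labels; these are NOT real model identifiers.
STANDARD_TIER = "standard-model"
REASONING_TIER = "reasoning-model"

# Assumed keywords that hint at multi-step reasoning (illustrative only).
REASONING_HINTS = {"prove", "derive", "optimize", "debug", "flaw", "verify"}


@dataclass
class RoutingDecision:
    tier: str   # which tier the query is sent to
    score: int  # heuristic complexity score that led to the decision


def complexity_score(query: str) -> int:
    """One point per distinct hint word, plus one point for long queries."""
    words = re.findall(r"[a-z]+", query.lower())
    score = len(REASONING_HINTS.intersection(words))
    if len(words) > 40:  # assumed cutoff for a "long" query
        score += 1
    return score


def route(query: str, threshold: int = 1) -> RoutingDecision:
    """Escalate to the reasoning tier when the score reaches the threshold."""
    score = complexity_score(query)
    tier = REASONING_TIER if score >= threshold else STANDARD_TIER
    return RoutingDecision(tier=tier, score=score)


if __name__ == "__main__":
    for q in (
        "What time zone is Tokyo in?",
        "Find the logical flaw in this derivation and optimize the recipe.",
    ):
        print(q, "->", route(q).tier)
```

Raising `threshold` keeps more traffic on the faster tier, while lowering it escalates more queries to the slower reasoning tier, which is the trade-off between speed and depth that the article describes.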
Google posted on X on Thursday: “Gemini 3 Deep Think performed exceptionally well in pushing the frontiers of intelligence in benchmark tests. Specific data: 48.4% in ‘Humanity’s Last Exam’ (without tools), 84.6% in ARC-AGI-2 (verified by the ARC Prize Foundation), and an Elo rating of 3455 on Codeforces.”
Google also pointed out that the model now excels in fields like chemistry and physics.
The true test of this competition, however, will not be press releases but real-world adoption. If research institutions and engineering firms begin using the deep thinking model to tackle complex tasks, it will validate Google’s judgment that the future of enterprise AI lies in depth, not speed. The company has made it clear: it is competing for the high-end sector of the AI market, where reasoning matters more than conversation.

