Google’s Gemini lineup now spans four active models, three model generations, and a pricing range from $0.10 per million input tokens (Gemini 2.0 Flash) to $12.00 per million output tokens (Gemini 3.1 Pro Preview). That is more than a 100x spread between the cheapest and most capable options, which means the choice matters.
This guide covers every current Gemini model, what each one is built for, and where Gemini genuinely pulls ahead of competitors. It ends with a decision table so you can match your use case to the right model without reading the full spec sheet.
Where Gemini Stands Out in 2026
Gemini is not the only multimodal AI family on the market. GPT-5, Claude 4.x, and others all process text, images, audio, and video. What actually differentiates Gemini is more specific:
| Feature | Gemini | Competitors (GPT-5, Claude 4.x) |
|---|---|---|
| Context Length | 1 million tokens (Gemini 2.5 Pro, 3.1 Pro Preview) | 400K tokens (GPT-5) |
| Benchmark Leadership | Highest published scores (as of March 2026) on ARC-AGI-2 (77.1%) and GPQA Diamond (94.3%), via Gemini 3.1 Pro Preview | No higher published scores on these benchmarks as of March 2026 |
| Native Integration | Google Search, Workspace (Docs, Gmail, Sheets), Google Cloud | Less integrated with Google ecosystem |
| Grounding | Built in, with live Google Search results and cited sources | Generally an add-on rather than built in |
| Price-to-Performance | Gemini 2.5 Flash at $0.30 per million input tokens, 1M context window | Generally higher cost for comparable models with similar features |
Current Gemini Models
Gemini 3.1 Pro Preview
Released February 19, 2026, Gemini 3.1 Pro Preview is Google’s current flagship and the most capable model in the lineup. The “.1” increment signals a focused intelligence upgrade over Gemini 3 Pro rather than a full architectural rebuild: the same multimodal foundation with substantially stronger reasoning.
The reasoning improvement is significant. Gemini 3.1 Pro Preview scored 77.1% on ARC-AGI-2, more than double the 31.1% posted by its predecessor Gemini 3 Pro just three months earlier. It also achieved 94.3% on GPQA Diamond (graduate-level science) and 80.6% on SWE-Bench Verified (real-world software engineering). These are benchmark-leading scores across all publicly available models as of this writing.
The model includes an adjustable reasoning system with Low, Medium, and High thinking levels, letting you trade response speed for reasoning depth depending on the task.
Specifications
| Metric | Value |
|---|---|
| Context window | 1M tokens |
| Max output | 64K tokens |
| Input cost | $2.00 / 1M ($4.00 / 1M for prompts above 200K tokens) |
| Output cost | $12.00 / 1M |
| Speed | ~109 tok/s |
| Knowledge cutoff | Jan 2025 |
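The tiered input pricing above is easy to misread, so here is a minimal cost sketch. It assumes, as with earlier Gemini long-context tiers, that the higher rate applies to the entire prompt once it crosses 200K tokens; verify against current Google AI Studio pricing before budgeting.

```python
def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one Gemini 3.1 Pro Preview request cost in USD (list prices).

    Assumption: the $4.00/1M rate applies to the whole prompt once it
    exceeds 200K tokens, matching earlier Gemini long-context tiers.
    """
    input_rate = 2.00 if input_tokens <= 200_000 else 4.00  # $ / 1M input tokens
    output_rate = 12.00                                     # $ / 1M output tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 100K-token prompt with a 4K-token answer:
print(request_cost(100_000, 4_000))   # 0.248 (about 25 cents per request)
```

At that rate, a workload of 1,000 such requests per day runs roughly $248/day, which is why the cheaper tiers below matter for volume work.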
Capabilities
| Capability | Detail |
|---|---|
| Modalities | Text, image, audio, video input |
| Tools | Function calling, web search (Grounding), code execution, file upload |
| Reasoning | Low / Medium / High thinking levels |
Best for
| Use case focus | Notes |
|---|---|
| Primary strengths | Complex multi-step reasoning, scientific research, agentic workflows, hard coding problems |
Gemini 2.5 Pro
Gemini 2.5 Pro is Google’s primary production-ready model for complex tasks. Released May 2025, it predates the 3.1 Pro Preview but remains widely used for its strong balance of capability, context length, and cost. At $1.25/1M input tokens, it’s meaningfully cheaper than 3.1 Pro Preview while still supporting 1M token context windows and integrated thinking mode.
The main trade-off versus 3.1 Pro Preview is reasoning benchmark performance. For document analysis, coding, and detailed Q&A, the gap is marginal. For cutting-edge research or benchmark-sensitive tasks, 3.1 Pro Preview is the upgrade.
[Chart: input cost ($/1M, list, ≤200K context), output cost ($/1M), and speed (tok/s), Gemini 2.5 Pro vs Gemini 3.1 Pro Preview. Values match the Specifications tables.]
Specifications
| Metric | Value |
|---|---|
| Context window | 1M tokens |
| Max output | 64K tokens |
| Input cost | $1.25 / 1M ($2.50 / 1M for prompts above 200K tokens) |
| Output cost | $10.00 / 1M |
| Speed | ~148 tok/s |
| Knowledge cutoff | Jan 2025 |
Highlights
| Area | Detail |
|---|---|
| Role | Primary production-ready Pro model for complex tasks |
| Thinking | Integrated thinking mode |
| Economics | Lower list input/output pricing than 3.1 Pro Preview with same 1M / 64K context and output limits |
Best for
| Use case focus | Notes |
|---|---|
| Primary strengths | Long-document analysis, complex Q&A, production coding tasks, teams wanting Pro-quality at a lower price than 3.1 |
Gemini 2.5 Flash
Gemini 2.5 Flash is Google’s best value model for high-volume use cases. Released June 2025, it balances speed and quality well: at $0.30/1M input tokens, it costs less than a quarter of Gemini 2.5 Pro while still supporting a 1M context window, thinking mode, and Grounding.
The speed improvement over 2.5 Pro is real. At around 201 tokens per second versus Pro’s 148, Flash is noticeably faster for applications where latency matters. For most content generation, summarization, classification, and chatbot tasks, Flash delivers results close to Pro quality at a fraction of the cost.
[Chart: input cost ($/1M, list), output cost ($/1M), and speed (tok/s), Gemini 2.5 Flash vs Gemini 2.5 Pro. Pro list input uses the ≤200K tier ($1.25/1M).]
Specifications
| Metric | Value |
|---|---|
| Context window | 1M tokens |
| Max output | 64K tokens |
| Input cost | $0.30 / 1M |
| Output cost | $2.50 / 1M |
| Speed | ~201 tok/s |
| TTFT | 0.46s |
| Knowledge cutoff | Jan 2025 |
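The TTFT and throughput figures above combine into a rough end-to-end latency estimate. A sketch, assuming total time ≈ time-to-first-token plus output tokens divided by streaming rate; real latency varies with load, prompt size, and thinking budget.

```python
def est_latency(output_tokens: int, ttft: float = 0.46, tok_per_s: float = 201.0) -> float:
    """Rough latency model: TTFT + generation time at steady streaming rate.

    Defaults use the Gemini 2.5 Flash figures from the table above; this is
    a back-of-envelope estimate, not a guarantee.
    """
    return ttft + output_tokens / tok_per_s

# A typical 500-token chatbot reply:
print(round(est_latency(500), 2))   # ~2.95 seconds
```

The same formula with Pro's ~148 tok/s gives about 3.8 seconds for the same reply, which is the latency gap the section above describes.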
Highlights
| Area | Detail |
|---|---|
| Value | Best value tier for high-volume traffic; list input under a quarter of 2.5 Pro |
| Features | 1M context, thinking mode, and Grounding (web search) |
| Latency | Higher tok/s than 2.5 Pro for latency-sensitive workloads |
Best for
| Use case focus | Notes |
|---|---|
| Primary strengths | High-volume applications, chatbots, summarization, content generation, cost-sensitive production workloads |
Gemini 2.0 Flash
Gemini 2.0 Flash is Google’s most affordable model still in active service. At $0.10/1M input tokens, it is the lowest-cost Gemini option for teams running very high volumes at tight budgets. The trade-off is capability and output length: the model uses an older architecture and is capped at 8K tokens of output per response, limiting its usefulness for longer-form generation.
For new projects, Gemini 2.5 Flash is the recommended alternative. Gemini 2.0 Flash remains relevant for legacy integrations or extreme-volume scenarios where the per-token cost difference is the deciding factor.
[Chart: input cost ($/1M, list), output cost ($/1M), and speed (tok/s), Gemini 2.0 Flash vs Gemini 2.5 Flash.]
Specifications
| Metric | Value |
|---|---|
| Context window | 1M tokens |
| Max output | 8K tokens per response |
| Input cost | $0.10 / 1M |
| Output cost | $0.40 / 1M |
| Speed | ~233 tok/s |
| Knowledge cutoff | 2024 |
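The 8K output cap is the main practical limit here. A quick feasibility check for long-form drafts, using a rough heuristic of ~1.3 tokens per English word (an assumption, not a tokenizer; actual ratios vary by content):

```python
MAX_OUTPUT_TOKENS = 8_000   # Gemini 2.0 Flash per-response cap
TOKENS_PER_WORD = 1.3       # rough heuristic only, not a real tokenizer

def fits_output_cap(word_count: int) -> bool:
    """Estimate whether a draft of the given word count fits one response."""
    return word_count * TOKENS_PER_WORD <= MAX_OUTPUT_TOKENS

print(fits_output_cap(3_000))   # True  (~3,900 estimated tokens)
print(fits_output_cap(7_000))   # False (~9,100 estimated tokens)
```

Anything past roughly 6,000 words needs either multiple chained calls or a move to 2.5 Flash's 64K output limit.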
Highlights
| Area | Detail |
|---|---|
| Pricing | Lowest list input cost among active Gemini models shown here |
| Limits | Older architecture; 8K max output caps long-form generation vs 2.5 Flash (64K) |
| Guidance | New projects: prefer Gemini 2.5 Flash; keep 2.0 Flash for legacy or extreme-volume cost optimization |
Best for
| Use case focus | Notes |
|---|---|
| Primary strengths | Lowest-cost production tasks, legacy integrations, extreme-volume batch processing |
Thinking Mode and Grounding: Features, Not Separate Models
Two capabilities often show up as separate entries when comparing Gemini options. They are not distinct models.
Thinking Mode: Integrated into Gemini 2.5 Flash, 2.5 Pro, and 3.1 Pro Preview, thinking mode enables the model to reason through problems step by step before generating a response. In Gemini 3.1 Pro Preview, this is adjustable (Low, Medium, High) to trade speed for depth. It is not a separate model to select; it is enabled within the model you are already using.
Grounding: Also built into current Gemini models, Grounding links model responses to live Google Search results. This reduces hallucinations on time-sensitive topics and provides cited sources alongside the answer. It is particularly useful when recency of information matters. Enable it via the API or Google AI Studio without switching models.
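Both features are request-level settings rather than separate model choices. The sketch below builds a hypothetical generateContent request body with both enabled for Gemini 2.5 Flash; the field names (`google_search`, `thinkingConfig`/`thinkingBudget`) reflect our reading of the Gemini REST API and should be checked against the current API reference before use.

```python
import json

# Hypothetical generateContent request body enabling Grounding and a
# thinking budget in one call. Field names are assumptions based on the
# Gemini REST API as we understand it; verify against current docs.
payload = {
    "contents": [{"parts": [{"text": "What changed in the EU AI Act this month?"}]}],
    "tools": [{"google_search": {}}],               # Grounding via live Google Search
    "generationConfig": {
        "thinkingConfig": {"thinkingBudget": 1024}  # thinking-token budget
    },
}

print(json.dumps(payload, indent=2))
```

The point is that switching these on is a config change, not a model swap: the same payload shape works whichever Gemini model name you POST it to.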
Gemini 2.5 Pro vs Gemini 2.5 Flash: Which Should You Use?
These are the two most commonly compared models in the Gemini family. They share the same context window, the same knowledge cutoff, and both support thinking mode and Grounding. The difference comes down to cost, speed, and reasoning depth.
Cost & speed at a glance
[Chart: input cost ($/1M, list), output cost ($/1M), and speed (tok/s) for both models. Values match the comparison table below.]
Full comparison
| | Gemini 2.5 Pro | Gemini 2.5 Flash |
|---|---|---|
| Released | May 2025 | June 2025 |
| Input price | $1.25/1M tokens | $0.30/1M tokens |
| Output price | $10.00/1M tokens | $2.50/1M tokens |
| Context window | 1M tokens | 1M tokens |
| Max output | 64K tokens | 64K tokens |
| Speed | ~148 tok/s | ~201 tok/s |
| Knowledge cutoff | January 2025 | January 2025 |
| Thinking mode | Yes (integrated) | Yes (integrated) |
| Grounding | Yes | Yes |
| Best for | Complex reasoning, research, nuanced coding | High-volume apps, summarization, cost-sensitive tasks |
Bottom line: For most teams, Gemini 2.5 Flash is the right starting point. It handles the majority of content, summarization, coding, and Q&A tasks at 24% of the cost. Move to Gemini 2.5 Pro when task complexity requires deeper reasoning, or to Gemini 3.1 Pro Preview when you need the strongest reasoning available at any price.
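The cost gap is easy to sanity-check from list prices. A back-of-envelope sketch, assuming a typical request of 10K input and 1K output tokens; the model id strings here are labels for the example, not verified API identifiers.

```python
PRICES = {                      # (input $/1M, output $/1M), list pricing
    "gemini-2.5-pro":   (1.25, 10.00),
    "gemini-2.5-flash": (0.30, 2.50),
}

def cost(model: str, in_tok: int, out_tok: int) -> float:
    """Per-request cost in USD at list prices."""
    in_rate, out_rate = PRICES[model]
    return (in_tok * in_rate + out_tok * out_rate) / 1_000_000

pro = cost("gemini-2.5-pro", 10_000, 1_000)      # $0.0225
flash = cost("gemini-2.5-flash", 10_000, 1_000)  # $0.0055
print(f"Flash is {flash / pro:.0%} of Pro's cost per request")   # ~24%
```

The ratio holds across workload shapes because both the input and output prices sit at roughly a quarter of Pro's.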
Gemini Models Comparison Table
All current Gemini models at a glance. Pricing is per 1 million tokens via the Google AI Studio API.
[Chart: input cost ($/1M), output cost ($/1M), speed (tok/s), and max output per response for all four models. All share a 1M token context window; max output is 8K for 2.0 Flash vs 64K for the others.]
| Model | Best for |
|---|---|
| Gemini 3.1 Pro Preview | Advanced reasoning, research, agentic workflows |
| Gemini 2.5 Pro | Complex tasks, long-context analysis, coding |
| Gemini 2.5 Flash | High-volume tasks, best price-performance |
| Gemini 2.0 Flash | Budget use, legacy integrations |
Speed figures are approximate median throughput from benchmark data. Pricing verified from Google AI Studio and Vertex AI, March 27, 2026. Gemini 3.1 Pro Preview input/output costs double above 200K tokens in a single context.
Which Gemini Model Should You Use?
Use this table to match your task to the right model. If you’re not sure where to start, Gemini 2.5 Flash handles the majority of workloads at a strong price-to-quality ratio.
| If you need… | Best model | Runner-up | Avoid |
|---|---|---|---|
| Maximum reasoning quality | Gemini 3.1 Pro Preview | Gemini 2.5 Pro | Gemini 2.0 Flash (older arch) |
| Complex tasks / research | Gemini 2.5 Pro | Gemini 3.1 Pro Preview | Gemini 2.0 Flash |
| High-volume / budget | Gemini 2.5 Flash | Gemini 2.0 Flash | Gemini 3.1 Pro Preview (cost) |
| Long-document analysis | Gemini 2.5 Pro (1M ctx) | Gemini 3.1 Pro Preview | Gemini 2.0 Flash (8K output) |
| Real-time web answers | Any model + Grounding | Any model + Grounding | Any model without Grounding |
| Coding and STEM | Gemini 3.1 Pro Preview | Gemini 2.5 Pro | Gemini 2.0 Flash |
| Google Workspace tasks | Gemini 2.5 Pro | Gemini 2.5 Flash | Gemini 2.0 Flash |
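The decision table can also be encoded as a simple lookup that defaults to the recommended starting point. A sketch; the use-case keys and model id strings are our own labels for illustration, not API identifiers.

```python
# Decision-table lookup. Keys and model ids are illustrative labels only.
BEST_MODEL = {
    "max_reasoning":      "gemini-3.1-pro-preview",
    "complex_research":   "gemini-2.5-pro",
    "high_volume_budget": "gemini-2.5-flash",
    "long_documents":     "gemini-2.5-pro",
    "coding_stem":        "gemini-3.1-pro-preview",
    "workspace_tasks":    "gemini-2.5-pro",
}

def pick_model(use_case: str) -> str:
    # Default to 2.5 Flash, the article's recommended starting point.
    return BEST_MODEL.get(use_case, "gemini-2.5-flash")

print(pick_model("coding_stem"))     # gemini-3.1-pro-preview
print(pick_model("anything_else"))   # gemini-2.5-flash
```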
TeamAI gives your team access to Gemini 2.5 Pro, Gemini 3 Pro Preview, GPT-5, Claude 4.5 Sonnet, DeepSeek, and 20+ other models in a single shared workspace. Switch between models in one click, share prompts and workflows across your team, and compare outputs side by side. See how it works at teamai.com
Legacy and Discontinued Gemini Models
These models are no longer the recommended choice for new projects. They are listed here for teams managing existing integrations or tracking the Gemini lineage.
| Model | Context | Input $/1M | Output $/1M | Status |
|---|---|---|---|---|
| Gemini 2.0 Pro Experimental | 2M | N/A | N/A | Preview discontinued. Superseded by Gemini 2.5 Pro. |
| Gemini 2.0 Flash Thinking | 1M | N/A | N/A | Not a separate model. Thinking is now integrated into 2.5 Flash. |
| Gemini 1.5 Pro / Flash | 1M | Deprecated | Deprecated | Superseded. Use Gemini 2.5 Pro or Flash. |
| Gemini 1.0 / PaLM 2 | N/A | Deprecated | Deprecated | Fully retired. PaLM 2 was Gemini’s predecessor. |