Google Cloud AI, the practical way
An architecture-first reference for Google Cloud's AI stack as of June 2026. The headline change: at Cloud Next '26, Vertex AI became the Gemini Enterprise Agent Platform and Agentspace folded into Gemini Enterprise. This portal covers the new platform, Gemini models, agents, grounding, generative media, and TPUs - trade-offs and risks, no marketing.
vertex-ai - same lineage.
Google's 2026 pitch is owning all four layers: custom silicon (Ironwood and 8th-gen TPU 8t/8i), frontier models (Gemini 3.x, plus open Gemma 4), the platform (Gemini Enterprise Agent Platform with Agent Studio, ADK, A2A orchestration, Agent Registry/Gateway/Identity/Observability), and distribution (Workspace, 3B+ users). Model Garden fronts 200+ models including Anthropic Claude. If your data is in BigQuery and you want Gemini + TPU economics, this is the strongest full-stack story in the market - at the cost of a platform that just went through a large rebrand.
The Google Cloud AI mental model
What sets Google Cloud apart in 2026
| Differentiator | What it means in practice |
|---|---|
| Full-stack ownership | Silicon (TPU), frontier model (Gemini), platform, and distribution (Workspace) under one roof. No other vendor owns all four - it shows up as tight integration and price/perf. |
| Gemini multimodal + long context | Native text/image/audio/video with very long context windows. Strong for document, video, and multimodal RAG without bolt-ons. |
| BigQuery data gravity | If your analytics live in BigQuery, Gemini-in-BigQuery and BQ vector search bring AI to the data, so you avoid building a separate export pipeline. |
| Agent-native platform | ADK, A2A protocol (multi-vendor agent interop), Agent Registry/Gateway/Identity/Observability - an opinionated, governed agent runtime. |
| TPU economics | Ironwood and 8th-gen TPUs give a training/inference cost lever, and first-class hosting for Gemini and open models. |
Where Google Cloud is weaker (be honest)
How to read this portal
Each flagship service tab has sub-tabs (Overview / Architecture / Components / Models / Risks / When to use) with a reference-architecture diagram. If you only read one sub-tab, read Risks.
What's New - late 2025 through June 2026
Material changes that affect architecture, cost, or risk. Curated.
Cloud Next '26 (April 2026) was a reset: Vertex AI became the Gemini Enterprise Agent Platform, Agentspace was absorbed into Gemini Enterprise, and Google announced the A2A protocol v1.0 for multi-vendor agent interop, ADK 1.0, managed MCP servers via Apigee, and the 8th-gen TPU 8t/8i. On the model side, Gemini 3.x (3.1 Pro/Flash, 3.1 Flash Image), Lyria 3, and open Gemma 4 landed, with 200+ models in Model Garden including Anthropic Claude.
| Date | Release | Why it matters |
|---|---|---|
| Late 2025 | Ironwood TPU (7th gen) GA | ~4.6 PFLOPS/chip, scales to 9,216-chip superpods. Big jump in training/inference capacity and price/perf. |
| Q1 2026 | Gemini 3.x family | Gemini 3.1 Pro and Flash; very long context and stronger multimodal/reasoning. Confirm model IDs and lifecycle before pinning. |
| Apr 2026 | Gemini Enterprise Agent Platform (Vertex AI rebrand) | One platform to build, scale, govern, optimize agents - Agent Studio, ADK, Agent Runtime, A2A Orchestration, Registry, Gateway, Identity, Observability, Simulation, Evaluation. |
| Apr 2026 | Gemini Enterprise (Agentspace absorbed) | Packaged enterprise search + agents tied to Workspace; partner agents from Box, Workday, Salesforce, ServiceNow. |
| Apr 2026 | A2A protocol v1.0; ADK 1.0 | Agent-to-agent interop standard in production at ~150 orgs; stable ADK across four languages. Multi-vendor agent ecosystems become real. |
| Apr 2026 | Managed MCP servers via Apigee; Project Mariner | Apigee bridges existing APIs to agents as MCP tools; Mariner is a browsing agent. Lowers integration cost. |
| Apr 2026 | 8th-gen TPU 8t / 8i | 8t for training (scales to ~9,600 TPUs, 2 PB shared HBM), 8i for inference (Boardfly topology, ~80% better perf/$). Cost lever widens. |
| 2026 | Gemma 4 (open), Lyria 3 (music), Gemini Flash Image | Open-weight option for on-prem/customization; generative audio and image breadth. |
Service Map
The Google Cloud AI services worth knowing, grouped by what you do with them.
Formerly Vertex AI. Model Garden, ADK, Agent Runtime, A2A, Registry, Gateway, Identity, Observability, training, RAG.
Frontier multimodal (Pro/Flash) and open-weight Gemma; 200+ models in Model Garden incl. Anthropic Claude.
Packaged search + agents + Workspace; partner agents; the Agentspace successor.
Agent Development Kit, Agent-to-Agent protocol, Agent Engine/Runtime, managed MCP via Apigee.
Image, video, speech, and music generation - all on the platform.
Gemini-in-BigQuery, BQ vector search, AlloyDB AI (ScaNN), Vector Search, Spanner/Firestore vectors.
Ironwood (v7), 8th-gen 8t/8i, A3/A4 GPU VMs (Blackwell), AI Hypercomputer.
RAG Engine, grounding with Google Search, Vertex AI Search retrieval.
Prompt/response screening, safety filters, and responsible-AI tooling.
Gemini Enterprise Agent Platform (formerly Vertex AI)
The single platform to choose models, build and govern agents, evaluate, and ship - the center of gravity for AI on Google Cloud.
At Cloud Next '26, Google folded all Vertex AI capabilities into this platform and organized it around four jobs: build, scale, govern, optimize. It gives you Model Garden (200+ models), a managed agent stack (Agent Studio, ADK, Runtime, A2A), grounding/RAG, training/tuning, generative media, and Model Armor - all under IAM, VPC-SC, and Google Cloud billing. The agent stack is the new center of gravity; the classic ML tools (training, prediction, pipelines) are still here under the new name.
What problem this solves
Enterprises don't want to wire a model API, a vector store, a guardrail service, an eval harness, an agent orchestrator, and monitoring from separate vendors. This platform's offer is one governed surface where you pick models from one catalog, build and register agents, ground them in your data, and observe them in production - with Gemini and TPU economics underneath. The trade-off is the rebrand: names, SDKs, and IAM roles are mid-migration, so onboarding has a terminology tax.
vertex-ai SDK/URL paths largely still work during the transition.
Reference architecture
Network and identity
The platform supports VPC Service Controls and Private Service Connect so model and agent traffic stays inside your perimeter. Authorization is Google Cloud IAM; agents get first-class Agent Identity with scoped permissions. Use CMEK for data at rest and keep secrets in Secret Manager.
Where the data goes
Google's stated position is that your prompts and data are not used to train the foundation models, and data stays within your project and chosen region. Confirm the specific Gemini model's region/availability and lifecycle before designing around it - versions have explicit expiry.
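Because model versions carry explicit expiry, it can help to make the check mechanical. The sketch below is one way to flag a pinned version that is close to retirement. The model IDs and dates are hypothetical placeholders, not real Google values, and the 90-day warning window is an arbitrary assumption; populate the table from the official lifecycle page.

```python
from datetime import date

# Hypothetical lifecycle table: these IDs and dates are placeholders,
# not real Google values. Fill in from the official model-lifecycle page.
MODEL_RETIREMENT = {
    "example-flash-001": date(2026, 12, 1),
    "example-pro-001": date(2027, 3, 1),
}


def days_until_retirement(model_id: str, today: date) -> int:
    """Return the number of days before the pinned model version retires."""
    if model_id not in MODEL_RETIREMENT:
        raise KeyError(f"no lifecycle entry for {model_id!r}; confirm before pinning")
    return (MODEL_RETIREMENT[model_id] - today).days


def needs_migration(model_id: str, today: date, warn_days: int = 90) -> bool:
    """True when the pinned version retires within the warning window."""
    return days_until_retirement(model_id, today) <= warn_days
```

A check like this could run in CI so that an expiring pin fails a build well before it fails in production.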
| Component | What it does |
|---|---|
| Agent Studio | Low/no-code design surface for building and testing agents. |
| Agent Development Kit (ADK 1.0) | Code-first framework for agents; stable across four languages. |
| Agent Runtime / Engine | Managed, scalable execution for agents in production. |
| A2A Orchestration | Agent-to-Agent protocol (v1.0) for multi-agent and cross-vendor coordination. |
| Agent Gateway | Connect tools/APIs (incl. managed MCP via Apigee) with governance. |
| Agent Identity / Registry | First-class agent identities + a catalog to version and govern agents in your estate. |
| Observability / Simulation / Evaluation | Trace, test against simulated and real traffic, and evaluate agent quality. |
| Model Garden + training | 200+ models, plus tuning and custom training on TPU/GPU. |
| Lever | How it bills | Control |
|---|---|---|
| Gemini API | Per input/output token (and per modality), per model tier. | Flash for routine; Pro only when needed; cap output; context caching. |
| Provisioned Throughput | Reserved capacity for steady high volume. | Commit after you know the load. |
| Agent Runtime | Model tokens × steps + runtime + tool calls. | Cap loop length; route routine steps to a cheaper model. |
| Training / tuning | Accelerator-hours (TPU/GPU). | Tune only with evidence; TPUs for price/perf. |
- Use the platform for any GenAI workload on Google Cloud - one governed surface for models, agents, grounding, and evals.
- Lead with Gemini Flash + the agent stack; reserve Provisioned Throughput once volume is steady.
- Adopt A2A only when you have multi-agent or multi-vendor needs.
- Use Gemini Enterprise (packaged) if you want to buy a finished, M365-style assistant rather than build one.
Model Garden
One catalog, 200+ models - Google first-party, open, and third-party - behind the platform's API and governance.
| Source | Models |
|---|---|
| Google first-party | Gemini 3.x (Pro, Flash, Flash Image), Imagen, Veo, Chirp, Lyria 3. |
| Google open | Gemma 4 (open weights) for customization and on-prem/edge. |
| Third-party | Anthropic Claude (Opus/Sonnet/Haiku), Meta Llama, Mistral, and more. |
Gemini Models
Google's frontier multimodal family - the default model on the platform.
| Tier | Best for |
|---|---|
| Gemini 3.x Pro | Hardest reasoning, agents, long-context analysis, coding. |
| Gemini 3.x Flash | Cost/latency-optimized high-volume tasks; the workhorse. |
| Gemini Flash Image | Native image generation/editing within the Gemini family. |
| Gemma 4 (open) | Open-weight option for customization, on-prem, and edge. |
Agents - ADK & A2A
Google's opinionated, governed agent stack: a code-first kit, a managed runtime, and an interop protocol.
The agent stack is a code-first kit (ADK 1.0, four languages), a managed Agent Runtime/Engine, and the A2A protocol v1.0 for agents to discover and call each other across teams and vendors. Tools connect via managed MCP (Apigee bridges existing APIs), and Project Mariner adds browser use. Agent Identity, Registry, and Observability make it governable - which matters once agents call agents.
What problem this solves
Building one agent is easy; operating many, safely, is not. The stack standardizes how agents are built (ADK), run (Runtime), talk to each other (A2A), reach tools (MCP/Apigee), and get governed (Identity, Registry, Observability). A2A in particular makes multi-vendor agent ecosystems real - your agent can call a partner's agent as an interoperable endpoint.
Reference architecture
| Piece | Role |
|---|---|
| Agent Development Kit (ADK) | Build agents in code with tools, memory, and orchestration; v1.0 stable in four languages. |
| Agent Runtime / Engine | Managed hosting and scaling for agents in production. |
| A2A protocol (v1.0) | Open standard for agents to discover and call each other across teams and vendors. |
| Managed MCP (via Apigee) | Expose existing APIs to agents as governed MCP tools. |
| Project Mariner | Browser-using agent for web tasks. |
- Use ADK + Agent Runtime for any agent heading to production.
- Adopt A2A when the problem decomposes into specialists or spans vendors.
- Bridge existing APIs as MCP tools via Apigee rather than rebuilding integrations.
- Govern via Agent Registry + Identity before scaling agent count.
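To make "cap loop length" and "governed tools" concrete, here is a minimal plan/act loop in plain Python. It does not use ADK or any Google API; `TOOLS`, `run_agent`, and `demo_plan` are hypothetical names chosen for illustration, and a real agent would have a model in place of the toy planner.

```python
from typing import Callable

# Hypothetical tool registry: plain callables standing in for governed tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda q: f"result for {q}",
}


def run_agent(plan: Callable[[list[str]], tuple[str, str]], max_steps: int = 5) -> dict:
    """Run a plan/act loop until the planner finishes or the step cap is hit.

    `plan` receives the observations so far and returns (action, argument).
    """
    observations: list[str] = []
    for step in range(1, max_steps + 1):
        action, arg = plan(observations)
        if action == "finish":
            return {"answer": arg, "steps": step, "capped": False}
        if action not in TOOLS:
            # Unregistered tools are refused rather than executed.
            observations.append(f"error: unknown tool {action}")
            continue
        observations.append(TOOLS[action](arg))
    # The cap bounds cost: a planner that never finishes stops here.
    return {"answer": None, "steps": max_steps, "capped": True}


def demo_plan(observations: list[str]) -> tuple[str, str]:
    """Toy planner: look something up once, then finish with that observation."""
    if not observations:
        return ("lookup", "invoice 42")
    return ("finish", observations[-1])
```

The two design choices worth copying are the hard step cap, which bounds token spend per run, and the refusal of any tool that is not in the registry.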
GCP vs AWS vs OCI vs Azure
A practitioner's quick read. Every cloud does the basics; differences are in defaults, data gravity, and silicon.
| Dimension | Google Cloud | AWS | OCI | Azure |
|---|---|---|---|---|
| Frontier own model | Gemini 3.x | Nova (mid); Claude hosted | None (partners) | OpenAI GPT-5.x |
| Model breadth (managed) | Model Garden (200+) | Bedrock (widest) | Broad (OCI Gen AI) | Foundry Models (1000+) |
| Agents | Platform + A2A | AgentCore | Enterprise AI Agents | Foundry Agent Service |
| Custom silicon | TPU (Ironwood/8th) | Trainium/Inferentia | GPU (NVIDIA) | Maia (emerging) |
| Data gravity | BigQuery | S3/Redshift | Oracle DB 26ai (in-DB vectors) | Fabric/Synapse |
| Distribution | Workspace (3B+) | Console/partners | Oracle apps/EBS | M365 |
| Best when | BigQuery/Workspace central; want Gemini + TPU full stack | Already on AWS; want model choice + silicon economics | Run Oracle DB/EBS; want in-DB vectors + sovereignty | Microsoft-centric; want OpenAI + M365 |
Sources
Primary Google material used for this portal (June 2026). Verify specifics against current docs - names and versions are mid-transition.
- Gemini Enterprise Agent Platform (formerly Vertex AI)
- Introducing the Gemini Enterprise Agent Platform (Cloud Blog)
- Welcome to Google Cloud Next '26
- Generative AI on Vertex AI - release notes · Model versions & lifecycle
- Cloud TPU · Model Garden
Gemini Enterprise
The packaged enterprise product - search and agents over your company's knowledge, tied to Workspace. The successor to Agentspace.
Gemini Enterprise gives business users a governed assistant that searches across enterprise systems and runs pre-built or custom agents, with connectors and partner agents (Box, Workday, Salesforce, ServiceNow). For Workspace customers it is the path of least resistance to enterprise GenAI - the buy option that sits on top of the platform's build option.
| Capability | What it gives you |
|---|---|
| Enterprise search | Permission-aware search across connected systems and documents. |
| Pre-built & partner agents | Ready agents from Google and partners (Box, Workday, Salesforce, ServiceNow). |
| Workspace integration | Assistance in Gmail, Docs, Sheets, Meet for 3B+ users. |
| Governance | Inherits IAM and data-access controls; managed via the Agent Registry. |
Grounding & RAG
Keep answers tied to your data and to fresh facts.
| Option | Use |
|---|---|
| RAG Engine | Managed retrieval pipeline: ingest, chunk, embed, retrieve - minimal code. |
| Grounding with Google Search | Ground responses in live web results with citations. |
| Vertex AI Search retrieval | Enterprise retrieval over your indexed corpora, permission-aware. |
| BigQuery / AlloyDB / Vector Search | Bring your own vector store when you want control or data locality. |
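The ingest, chunk, embed, retrieve sequence is easier to reason about with a toy version in hand. The sketch below is not the RAG Engine API. It uses word counts as a stand-in embedding and cosine similarity for retrieval, and every function name is an assumption made for illustration.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into chunks of `size` words (no overlap, for simplicity)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real pipeline calls an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Chunk size and `k` are the knobs that drive both answer quality and embedding cost, which is why the pricing advice elsewhere in this portal says to right-size chunks.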
Vertex AI Build (training & tuning)
The classic ML platform under the new name - train, tune, deploy, and run MLOps.
| Capability | Use |
|---|---|
| Tuning | Supervised fine-tuning and distillation of Gemini/open models for your task. |
| Custom training | Train your own models on CPU/GPU/TPU with managed jobs. |
| Prediction / endpoints | Online and batch serving with autoscaling. |
| Pipelines / Feature Store / Eval | MLOps: reproducible pipelines, features, and evaluation. |
| Colab Enterprise / Notebooks | Managed notebooks for development. |
Generative Media
Image, video, speech, and music generation - all first-party on the platform.
| Model | Modality |
|---|---|
| Imagen | Image generation and editing. |
| Veo | Text/image-to-video generation. |
| Chirp | Speech-to-text and text-to-speech. |
| Lyria 3 | Music generation. |
| Gemini Flash Image | Image generation/editing inside the Gemini family. |
Vectors & Data
Where embeddings and ground-truth live. Pick by where your data already is.
| Store | Best for |
|---|---|
| BigQuery vector search | Vectors next to your analytics data; Gemini-in-BigQuery for SQL-native AI. Strongest when BQ is your warehouse. |
| AlloyDB AI (pgvector + ScaNN) | Low-latency vectors beside operational Postgres data, with Google's ScaNN index. |
| Vertex AI Vector Search | Purpose-built, high-scale vector search (formerly Matching Engine). |
| Spanner / Firestore vectors | Vectors in globally-distributed or document/app databases. |
TPUs & GPUs
Google's silicon is the cost/perf lever; NVIDIA GPUs are the compatibility lever.
| Silicon | Role |
|---|---|
| TPU Ironwood (v7) | ~4.6 PFLOPS/chip; 9,216-chip superpods (~42.5 EFLOPS). Frontier training and inference. |
| TPU 8t (8th gen, training) | Scales to ~9,600 TPUs and ~2 PB shared HBM per superpod; ~3x Ironwood, up to ~2x perf/Watt. |
| TPU 8i (8th gen, inference) | Boardfly topology connecting ~1,152 TPUs/pod; ~3x on-chip SRAM; ~80% better perf/$ for inference. |
| A3 / A4 GPU VMs | NVIDIA H100/H200/Blackwell for max framework/CUDA compatibility. |
| AI Hypercomputer | The integrated supercomputing architecture (silicon + network + software) under it all. |
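As a sanity check on the Ironwood row, the superpod figure is just per-chip peak times chip count. The short calculation below uses the rounded ~4.6 PFLOPS value from the table, so it lands slightly under the quoted ~42.5 EFLOPS; the gap is rounding in the per-chip number. These are peak figures, not sustained throughput.

```python
PFLOPS_PER_CHIP = 4.6        # approximate per-chip peak, as quoted in the table
CHIPS_PER_SUPERPOD = 9_216   # Ironwood superpod size, as quoted in the table


def superpod_eflops(pflops_per_chip: float, chips: int) -> float:
    """Aggregate peak compute in EFLOPS (1 EFLOPS = 1,000 PFLOPS)."""
    return pflops_per_chip * chips / 1_000


total = superpod_eflops(PFLOPS_PER_CHIP, CHIPS_PER_SUPERPOD)
print(f"{total:.1f} EFLOPS")  # prints "42.4 EFLOPS"
```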
Governance & Safety
Independent screening and responsible-AI controls for prompts, responses, and models.
| Control | What it does |
|---|---|
| Model Armor | Screen prompts and responses for prompt injection, jailbreaks, sensitive data, and unsafe content - independent of the model. |
| Safety filters | Configurable content-safety thresholds on Gemini. |
| Responsible AI tooling | Evaluation, explainability, and safety guidance. |
| Agent Identity / IAM | Least-privilege access for agents and humans across the platform. |
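"Independent of the model" means the screen wraps the call rather than living inside it. The sketch below illustrates that shape only. It is not Model Armor and does not call any Google API, and the two regex patterns are deliberately crude, hypothetical stand-ins for what a managed screening service detects.

```python
import re
from typing import Callable

# Hypothetical deny patterns for illustration only; a managed screening
# service uses far richer detection than two regular expressions.
INJECTION = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def screen(text: str) -> list[str]:
    """Return the list of findings for a piece of text (empty means clean)."""
    findings = []
    if INJECTION.search(text):
        findings.append("prompt_injection")
    if SSN_LIKE.search(text):
        findings.append("sensitive_data")
    return findings


def guarded_call(model: Callable[[str], str], prompt: str) -> str:
    """Screen the prompt, call the model, then screen the response."""
    if screen(prompt):
        return "[blocked: prompt failed screening]"
    response = model(prompt)
    if screen(response):
        return "[blocked: response failed screening]"
    return response
```

Because `guarded_call` accepts any callable, the same screen applies whether the model behind it is Gemini, Claude, or an open model.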
Architecture Patterns
The shapes most Google Cloud GenAI workloads fall into.
Gemini Enterprise over your connectors + Workspace, or a custom RAG Engine app on the platform with Model Armor.
ADK + Agent Runtime + Agent Identity + Gateway (MCP via Apigee) + Observability. Add A2A for multi-agent.
Gemini-in-BigQuery and BQ vector search bring generation and retrieval to data already in the warehouse.
Gemini long-context over documents/video; Imagen/Veo for generation; SynthID for provenance.
Tune Gemma 4 or an open model, serve on TPU/GPU endpoints; distill to cut cost.
Gemini for Workspace and Code Assist - buy the assistant in the tools people already use.
Decision Matrix
Fast answers for design reviews.
| Question | Default answer |
|---|---|
| Which model? | Gemini 3.x Flash for volume; 3.x Pro for hardest reasoning; Claude (also in Model Garden) when it wins your eval; Gemma 4 for open/on-prem. |
| Buy or build the assistant? | Gemini Enterprise first; build on the platform for bespoke logic. |
| Agent framework? | ADK + Agent Runtime; adopt A2A only when you have multi-agent/multi-vendor needs. |
| Where do vectors live? | BigQuery if data is there; AlloyDB AI for operational/low-latency; Vector Search for largest dedicated indexes. |
| TPU or GPU? | TPU for Gemini/open models at volume (price/perf); GPU for specific CUDA/framework needs. |
| RAG how? | RAG Engine or Vertex AI Search for managed; bring-your-own vector store for control. |
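Two rows of the matrix can be written as small functions, which is handy when the defaults need to be reviewed or tested. The function names, the parameters, and especially the order in which conditions are checked are assumptions made for this sketch; the matrix itself does not define a precedence.

```python
def pick_model(hard_reasoning: bool, needs_open_weights: bool,
               claude_wins_eval: bool = False) -> str:
    """Default model choice, following the 'Which model?' row.

    Precedence (open weights, then eval winner, then tier) is an assumption.
    """
    if needs_open_weights:
        return "Gemma 4"
    if claude_wins_eval:
        return "Claude"
    return "Gemini 3.x Pro" if hard_reasoning else "Gemini 3.x Flash"


def pick_vector_store(data_in_bigquery: bool, low_latency_operational: bool) -> str:
    """Default vector store, following the 'Where do vectors live?' row.

    Checking BigQuery first is an assumption about precedence.
    """
    if data_in_bigquery:
        return "BigQuery vector search"
    if low_latency_operational:
        return "AlloyDB AI"
    return "Vertex AI Vector Search"
```

Writing the defaults down this way makes disagreements explicit: if a team wants AlloyDB AI to win over BigQuery for a latency-sensitive path, that is a one-line change to argue about.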
Pricing & Cost Control
Shape, not exact numbers - rates change and vary by model/region. Confirm on Google Cloud pricing pages.
| Lever | How it bills | Control |
|---|---|---|
| Gemini API | Per input/output token (and per modality), per model tier. | Flash for routine work; Pro only when needed; cap output; use context caching. |
| Provisioned Throughput | Reserved capacity for steady high volume. | Commit after you know the load. |
| Vector Search / RAG | Index storage + query + embedding tokens. | Right-size chunks; prune stale docs; prefer BigQuery if data is there. |
| Training / endpoints | Accelerator-hours (TPU/GPU) + serving. | Autoscale; batch where possible; TPUs for price/perf. |
| Agents | Model tokens × steps + tool calls + runtime. | Cap loop length; route routine steps to a cheaper model. |
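The agent row is the one that surprises people, because cost multiplies by step count. The sketch below turns "model tokens × steps + tool calls + runtime" into arithmetic. All rates are hypothetical placeholders, not Google prices, and the flat per-tool-call and per-second terms are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical unit prices in USD; replace with current published rates."""
    input_per_1k_tokens: float
    output_per_1k_tokens: float
    per_tool_call: float
    runtime_per_second: float


def agent_run_cost(rates: Rates, steps: int, input_tokens_per_step: int,
                   output_tokens_per_step: int, tool_calls: int,
                   runtime_seconds: float) -> float:
    """Estimate one agent run: model tokens x steps + tool calls + runtime."""
    model = steps * (
        input_tokens_per_step / 1000 * rates.input_per_1k_tokens
        + output_tokens_per_step / 1000 * rates.output_per_1k_tokens
    )
    return model + tool_calls * rates.per_tool_call + runtime_seconds * rates.runtime_per_second


# Placeholder rates chosen for easy arithmetic, not realism.
placeholder = Rates(input_per_1k_tokens=0.5, output_per_1k_tokens=1.0,
                    per_tool_call=0.25, runtime_per_second=0.125)
four_steps = agent_run_cost(placeholder, 4, 2000, 1000, tool_calls=2, runtime_seconds=8)
two_steps = agent_run_cost(placeholder, 2, 2000, 1000, tool_calls=2, runtime_seconds=8)
print(four_steps, two_steps)  # prints "9.5 5.5"
```

Halving the step count removes most of the cost in this example, which is the reasoning behind capping loop length before tuning anything else.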
Risks & Gotchas
Read this one.