Google Cloud AI, the practical way

An architecture-first reference for Google Cloud's AI stack as of June 2026. The headline change: at Cloud Next '26, Vertex AI became the Gemini Enterprise Agent Platform and Agentspace folded into Gemini Enterprise. This portal covers the new platform, Gemini models, agents, grounding, generative media, and TPUs - trade-offs and risks, no marketing.

Refreshed June 2026 | Architecture-first | Enterprise focus | Vendor-neutral

Naming, 2026

Vertex AI is now the Gemini Enterprise Agent Platform. All former Vertex AI capabilities (Model Garden, training, prediction, RAG, Vector Search) are delivered through it. Gemini Enterprise is the packaged product for enterprises (search + agents + Workspace), having absorbed Agentspace. Docs and many URLs still say vertex-ai - same lineage.

TL;DR

Google's 2026 pitch is owning all four layers: custom silicon (Ironwood and 8th-gen TPU 8t/8i), frontier models (Gemini 3.x, plus open Gemma 4), the platform (Gemini Enterprise Agent Platform with Agent Studio, ADK, A2A orchestration, Agent Registry/Gateway/Identity/Observability), and distribution (Workspace, 3B+ users). Model Garden fronts 200+ models including Anthropic Claude. If your data is in BigQuery and you want Gemini + TPU economics, this is the strongest full-stack story in the market - at the cost of a platform that just went through a large rebrand.

The Google Cloud AI mental model

Figure 1 - Google's AI stack. Most teams enter at Layer 3/2 (Gemini Enterprise / the platform); drop to Layer 1 for data gravity and TPU economics.

What sets Google Cloud apart in 2026

Differentiator	What it means in practice
Full-stack ownership	Silicon (TPU), frontier model (Gemini), platform, and distribution (Workspace) under one roof. No other vendor owns all four - it shows up as tight integration and price/perf.
Gemini multimodal + long context	Native text/image/audio/video with very long context windows. Strong for document, video, and multimodal RAG without bolt-ons.
BigQuery data gravity	If your analytics live in BigQuery, Gemini-in-BigQuery and BQ vector search bring AI to the data without a separate pipeline.
Agent-native platform	ADK, A2A protocol (multi-vendor agent interop), Agent Registry/Gateway/Identity/Observability - an opinionated, governed agent runtime.
TPU economics	Ironwood and 8th-gen TPUs give a training/inference cost lever, and first-class hosting for Gemini and open models.

Where Google Cloud is weaker (be honest)

Rebrand churn & naming

The Cloud Next '26 rename (Vertex AI to Gemini Enterprise Agent Platform; Agentspace into Gemini Enterprise) changes a lot of names at once. Docs, SDKs, and URLs still mix old and new names. Budget time for learning the terminology, and check the model-lifecycle pages before pinning versions.

Enterprise footprint & habits

AWS and Azure have deeper enterprise install bases and more third-party tooling. Choosing GCP for AI often means swimming against an existing AWS/Azure data estate unless BigQuery/Workspace are already central.

How to read this portal

Each flagship service tab has sub-tabs (Overview / Architecture / Components / Pricing / Risks / When to use) with a reference-architecture diagram. If you only read one sub-tab, read Risks.

What's New - late 2025 through June 2026

Material changes that affect architecture, cost, or risk. Curated.

TL;DR

Cloud Next '26 (April 2026) was a reset: Vertex AI to Gemini Enterprise Agent Platform, Agentspace to Gemini Enterprise, the A2A protocol v1.0 for multi-vendor agent interop, ADK 1.0, managed MCP servers via Apigee, and 8th-gen TPU 8t/8i. Model-wise, Gemini 3.x (3.1 Pro/Flash, 3.1 Flash Image), Lyria 3, and open Gemma 4 landed, with 200+ models in Model Garden including Anthropic Claude.

Date	Release	Why it matters
Late 2025	Ironwood TPU (7th gen) GA	~4.6 PFLOPS/chip, scales to 9,216-chip superpods. Big jump in training/inference capacity and price/perf.
Q1 2026	Gemini 3.x family	Gemini 3.1 Pro and Flash; very long context and stronger multimodal/reasoning. Confirm model IDs and lifecycle before pinning.
Apr 2026	Gemini Enterprise Agent Platform (Vertex AI rebrand)	One platform to build, scale, govern, optimize agents - Agent Studio, ADK, Agent Runtime, A2A Orchestration, Registry, Gateway, Identity, Observability, Simulation, Evaluation.
Apr 2026	Gemini Enterprise (Agentspace absorbed)	Packaged enterprise search + agents tied to Workspace; partner agents from Box, Workday, Salesforce, ServiceNow.
Apr 2026	A2A protocol v1.0; ADK 1.0	Agent-to-agent interop standard in production at ~150 orgs; stable ADK across four languages. Multi-vendor agent ecosystems become real.
Apr 2026	Managed MCP servers via Apigee; Project Mariner	Apigee bridges existing APIs to agents as MCP tools; Mariner is a browsing agent. Lowers integration cost.
Apr 2026	8th-gen TPU 8t / 8i	8t for training (scales to ~9,600 TPUs and ~2 PB shared HBM), 8i for inference (Boardfly topology, ~80% better perf/$). The cost lever widens.
2026	Gemma 4 (open), Lyria 3 (music), Gemini Flash Image	Open-weight option for on-prem/customization; generative audio and image breadth.

Practical read

If you built on Vertex AI Agent Builder in 2025, your concepts map forward - but re-check service names, IAM roles, and SDK packages against the new platform docs. New agent work should target ADK + the Agent Runtime and consider A2A if you have multi-agent or multi-vendor needs.

Service Map

The Google Cloud AI services worth knowing, grouped by what you do with them.

PLATFORM - Gemini Enterprise Agent Platform

Formerly Vertex AI. Model Garden, ADK, Agent Runtime, A2A, Registry, Gateway, Identity, Observability, training, RAG.

MODELS - Gemini 3.x & Gemma 4

Frontier multimodal (Pro/Flash) and open-weight Gemma; 200+ models in Model Garden incl. Anthropic Claude.

ENTERPRISE - Gemini Enterprise

Packaged search + agents + Workspace; partner agents; the Agentspace successor.

AGENTS - ADK + A2A

Agent Development Kit, Agent-to-Agent protocol, Agent Engine/Runtime, managed MCP via Apigee.

MEDIA - Imagen / Veo / Chirp / Lyria

Image, video, speech, and music generation - all on the platform.

DATA - BigQuery AI & Vector Search

Gemini-in-BigQuery, BQ vector search, AlloyDB AI (ScaNN), Vector Search, Spanner/Firestore vectors.

SILICON - TPUs & GPUs

Ironwood (v7), 8th-gen 8t/8i, A3/A4 GPU VMs (NVIDIA H100/H200/Blackwell), AI Hypercomputer.

GROUND - Grounding & RAG

RAG Engine, grounding with Google Search, Vertex AI Search retrieval.

GOVERN - Model Armor & Responsible AI

Prompt/response screening, safety filters, and responsible-AI tooling.

How to read this

The flagship services (Gemini Enterprise Agent Platform, Agents/ADK) carry full sub-tabs - Overview / Architecture / Components / Pricing / Risks / When-to-use - with reference-architecture diagrams. Secondary services use a single rich page with the same architecture-first, risk-honest treatment. If you're scoping production, read a service's Risks before its Overview.

Gemini Enterprise Agent Platform (was Vertex AI)

The single platform to choose models, build and govern agents, evaluate, and ship - the center of gravity for AI on Google Cloud.

Sub-tabs: Overview / Architecture / Components / Pricing model / Risks & gotchas / When to use

TL;DR

At Cloud Next '26, Google folded all Vertex AI capabilities into this platform and organized it around four jobs: build, scale, govern, optimize. It gives you Model Garden (200+ models), a managed agent stack (Agent Studio, ADK, Runtime, A2A), grounding/RAG, training/tuning, generative media, and Model Armor - all under IAM, VPC-SC, and Google Cloud billing. The agent stack is the new center of gravity; the classic ML tools (training, prediction, pipelines) are still here under the new name.

What problem this solves

Enterprises don't want to wire a model API, a vector store, a guardrail service, an eval harness, an agent orchestrator, and monitoring from separate vendors. This platform's offer is one governed surface where you pick models from one catalog, build and register agents, ground them in your data, and observe them in production - with Gemini and TPU economics underneath. The trade-off is the rebrand: names, SDKs, and IAM roles are mid-migration, so onboarding has a terminology tax.

Mental note

Think of this as Vertex AI's superset: the model and training tools you knew, plus a full agent-operations layer. Old vertex-ai SDK/URL paths largely still work during the transition.

Reference architecture

Figure - Gemini Enterprise Agent Platform. One governed surface over Model Garden, the agent stack, grounding, training, and observability; Model Armor sits in-line.

Network and identity

The platform supports VPC Service Controls and Private Service Connect so model and agent traffic stays inside your perimeter. Authorization is Google Cloud IAM; agents get first-class Agent Identity with scoped permissions. Use CMEK for data at rest and keep secrets in Secret Manager.

Where the data goes

Google's stated position is that your prompts and data are not used to train the foundation models, and data stays within your project and chosen region. Confirm the specific Gemini model's region/availability and lifecycle before designing around it - versions have explicit expiry.

Component	What it does
Agent Studio	Low/no-code design surface for building and testing agents.
Agent Development Kit (ADK 1.0)	Code-first framework for agents; stable across four languages.
Agent Runtime / Engine	Managed, scalable execution for agents in production.
A2A Orchestration	Agent-to-Agent protocol (v1.0) for multi-agent and cross-vendor coordination.
Agent Gateway	Connect tools/APIs (incl. managed MCP via Apigee) with governance.
Agent Identity / Registry	First-class agent identities + a catalog to version and govern agents in your estate.
Observability / Simulation / Evaluation	Trace, test against simulated and real traffic, and evaluate agent quality.
Model Garden + training	200+ models, plus tuning and custom training on TPU/GPU.

Lever	How it bills	Control
Gemini API	Per input/output token (and per modality), per model tier.	Flash for routine; Pro only when needed; cap output; context caching.
Provisioned Throughput	Reserved capacity for steady high volume.	Commit after you know the load.
Agent Runtime	Model tokens × steps + runtime + tool calls.	Cap loop length; route routine steps to a cheaper model.
Training / tuning	Accelerator-hours (TPU/GPU).	Tune only with evidence; TPUs for price/perf.

Rule of thumb

Lead with Gemini Flash and context caching; reserve Provisioned Throughput once volume is steady; keep Pro for the hard prompts only.
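
A minimal sketch of that rule of thumb as a routing function. The tier labels, the `needs_deep_reasoning` flag, and the long-prompt threshold are illustrative assumptions, not platform settings; how you decide a prompt is "hard" should come from your own evals.

```python
from dataclasses import dataclass

# Hypothetical tier labels; substitute the pinned model IDs you actually use.
FLASH_TIER = "flash"
PRO_TIER = "pro"


@dataclass(frozen=True)
class Request:
    prompt_tokens: int
    needs_deep_reasoning: bool = False  # set by your own classifier or eval


def choose_tier(req: Request, long_prompt_threshold: int = 200_000) -> str:
    """Default to the cheap tier; escalate only for flagged or very long prompts."""
    if req.needs_deep_reasoning or req.prompt_tokens > long_prompt_threshold:
        return PRO_TIER
    return FLASH_TIER
```

Keeping the decision in one function makes the escalation rate easy to log, which is the number that drives the bill.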

Rebrand & naming drift

Vertex AI to Gemini Enterprise Agent Platform; Agentspace to Gemini Enterprise. SDKs, IAM roles, and URLs mix old/new. Confirm current names and check model lifecycle pages before pinning.

Model version expiry

Gemini versions have explicit lifecycles and can be retired. Pin versions, monitor deprecations, and test before upgrading - auto-latest will bite you.
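
One way to make "monitor deprecations" concrete is a small check that runs in CI. This is a sketch under stated assumptions: the model IDs and retirement dates below are invented for illustration, and treating a `-latest` suffix as a floating alias is an assumed naming convention. Take real values from the model-lifecycle pages.

```python
from datetime import date, timedelta

# Hypothetical model IDs and retirement dates, for illustration only.
RETIREMENT_DATES = {
    "example-flash-001": date(2026, 9, 30),
    "example-pro-001": date(2027, 3, 31),
}


def check_pin(model_id: str, today: date, warn_days: int = 90) -> str:
    """Classify a configured model as ok, expiring, retired, unpinned, or unknown."""
    if model_id.endswith("-latest"):
        return "unpinned"  # floating alias: behavior can change underneath you
    retire_on = RETIREMENT_DATES.get(model_id)
    if retire_on is None:
        return "unknown"  # not in the table: look it up before shipping
    if today >= retire_on:
        return "retired"
    if retire_on - today <= timedelta(days=warn_days):
        return "expiring"
    return "ok"
```

Failing the build on anything other than `ok` or `expiring` turns a surprise outage into a scheduled upgrade.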

Quotas & TPU capacity

Accelerator and API quotas throttle real workloads; TPU capacity can be contended. Request increases early and design for backoff.
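
"Design for backoff" usually means capped exponential backoff with jitter. A minimal sketch follows; `QuotaExceeded` is a stand-in for whatever rate-limit error your client library raises, and the default delays are arbitrary starting points rather than recommended values.

```python
import random
import time


class QuotaExceeded(Exception):
    """Stand-in for the rate-limit error your client raises (for example HTTP 429)."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep, rng=random.random):
    """Retry fn() on QuotaExceeded with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * ceiling)  # full jitter: wait somewhere in [0, ceiling)
```

The `sleep` and `rng` parameters exist so the retry schedule can be tested without real waiting.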

When to use

Use the platform for any GenAI workload on Google Cloud - one governed surface for models, agents, grounding, and evals.
Lead with Gemini Flash + the agent stack; reserve Provisioned Throughput once volume is steady.
Adopt A2A only when you have multi-agent or multi-vendor needs.
Use Gemini Enterprise (packaged) if you would rather buy a packaged assistant, M365-style, than build one.

Model Garden

One catalog, 200+ models - Google first-party, open, and third-party - behind the platform's API and governance.

Source	Models
Google first-party	Gemini 3.x (Pro, Flash, Flash Image), Imagen, Veo, Chirp, Lyria 3.
Google open	Gemma 4 (open weights) for customization and on-prem/edge.
Third-party	Anthropic Claude (Opus/Sonnet/Haiku), Meta Llama, Mistral, and more.

Why it matters

Claude and Gemini on the same platform means you can route the hardest tasks to whichever model wins your eval, without leaving Google Cloud's governance and billing.
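
"Whichever model wins your eval" can be reduced to a small selection rule. The sketch below assumes you already have per-example scores in [0, 1] for each candidate; the model names, the `min_margin` default, and the incumbent rule are illustrative choices, not part of any platform API.

```python
from statistics import mean
from typing import Dict, List, Optional


def pick_winner(scores: Dict[str, List[float]], min_margin: float = 0.02,
                incumbent: Optional[str] = None) -> str:
    """Return the model with the best mean eval score.

    If an incumbent is given, switch only when the best challenger beats it
    by at least min_margin, so eval noise does not cause churn.
    """
    means = {name: mean(vals) for name, vals in scores.items() if vals}
    best = max(means, key=means.get)
    if incumbent in means and means[best] - means[incumbent] < min_margin:
        return incumbent
    return best
```

The margin matters because switching models has its own cost: prompts, safety settings, and pinned versions all need re-testing.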

Gemini Models

Google's frontier multimodal family - the default model on the platform.

Tier	Best for
Gemini 3.x Pro	Hardest reasoning, agents, long-context analysis, coding.
Gemini 3.x Flash	Cost/latency-optimized high-volume tasks; the workhorse.
Gemini Flash Image	Native image generation/editing within the Gemini family.
Gemma 4 (open)	Open-weight option for customization, on-prem, and edge.

Version lifecycle

Gemini versions rev quickly and have explicit lifecycle/expiry. Pin a version, watch the model-versions page, and test before auto-upgrading.

Agents - ADK & A2A

Google's opinionated, governed agent stack: a code-first kit, a managed runtime, and an interop protocol.

Sub-tabs: Overview / Architecture / Protocols & tools / Risks & gotchas / When to use

TL;DR

The agent stack is a code-first kit (ADK 1.0, four languages), a managed Agent Runtime/Engine, and the A2A protocol v1.0 for agents to discover and call each other across teams and vendors. Tools connect via managed MCP (Apigee bridges existing APIs), and Project Mariner adds browser use. Agent Identity, Registry, and Observability make it governable - which matters once agents call agents.

What problem this solves

Building one agent is easy; operating many, safely, is not. The stack standardizes how agents are built (ADK), run (Runtime), talk to each other (A2A), reach tools (MCP/Apigee), and get governed (Identity, Registry, Observability). A2A in particular makes multi-vendor agent ecosystems real - your agent can call a partner's agent as an interoperable endpoint.

Reference architecture

Figure - The agent stack. ADK agents run on the Agent Runtime, call specialist agents over A2A and tools over managed MCP, ground in your data, and are governed + observed centrally.

Piece	Role
Agent Development Kit (ADK)	Build agents in code with tools, memory, and orchestration; v1.0 stable in four languages.
Agent Runtime / Engine	Managed hosting and scaling for agents in production.
A2A protocol (v1.0)	Open standard for agents to discover and call each other across teams and vendors.
Managed MCP (via Apigee)	Expose existing APIs to agents as governed MCP tools.
Project Mariner	Browser-using agent for web tasks.

Multi-agent blast radius

A2A makes agent ecosystems powerful and harder to reason about. Use Agent Identity, scoped permissions, Registry governance, budgets, and Observability from day one - don't let agents call agents without audit.
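
To illustrate the idea of scoped permissions plus audit, here is a toy in-process policy check. It is not the platform's Agent Identity or Registry API; the class, its fields, and the agent names are hypothetical and only show the shape of "deny by default, log every decision".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class CallPolicy:
    """Which caller agents may invoke which callee agents (hypothetical model)."""
    allowed: Dict[str, Set[str]]
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def authorize(self, caller: str, callee: str) -> bool:
        """Deny by default and record every decision, including denials."""
        ok = callee in self.allowed.get(caller, set())
        self.audit_log.append((caller, callee, ok))
        return ok
```

Logging denials as well as approvals is the point: an agent probing for endpoints it should not reach is exactly what you want to see early.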

Tool auth & data egress

MCP tools and partner A2A agents can read and send data. Vet every tool/agent, scope consent, and keep traffic inside VPC-SC where compliance requires.

Runaway loops

Cap steps and tokens per conversation; route routine steps to Gemini Flash and reserve Pro for the hard parts.
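
A sketch of a per-conversation cap, assuming your agent loop can report tokens used after each step. The class and exception names are hypothetical; a managed runtime may offer its own limits, which would be preferable where available.

```python
class BudgetExceeded(Exception):
    """Raised when a conversation would exceed its step or token cap."""


class ConversationBudget:
    """Hard caps on steps and tokens for one agent conversation."""

    def __init__(self, max_steps: int, max_tokens: int):
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.steps = 0
        self.tokens = 0

    def charge(self, tokens_used: int) -> None:
        """Record one step; raise instead of recording if a cap would be exceeded."""
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded(f"step cap {self.max_steps} reached")
        if self.tokens + tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token cap {self.max_tokens} reached")
        self.steps += 1
        self.tokens += tokens_used
```

Call `charge()` once per loop iteration and treat `BudgetExceeded` as a normal exit path that returns a partial answer or hands off to a human.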

When to use

Use ADK + Agent Runtime for any agent heading to production.
Adopt A2A when the problem decomposes into specialists or spans vendors.
Bridge existing APIs as MCP tools via Apigee rather than rebuilding integrations.
Govern via Agent Registry + Identity before scaling agent count.

GCP vs AWS vs OCI vs Azure

A practitioner's quick read. Every cloud does the basics; differences are in defaults, data gravity, and silicon.

Dimension	Google Cloud	AWS	OCI	Azure
Own frontier model	Gemini 3.x	Nova (mid); Claude hosted	None (partners)	OpenAI GPT-5.x
Model breadth (managed)	Model Garden (200+)	Bedrock (widest)	Broad (OCI Gen AI)	Foundry Models (1000+)
Agents	Platform + A2A	AgentCore	Enterprise AI Agents	Foundry Agent Service
Custom silicon	TPU (Ironwood/8th)	Trainium/Inferentia	GPU (NVIDIA)	Maia (emerging)
Data gravity	BigQuery	S3/Redshift	Oracle DB 26ai (in-DB vectors)	Fabric/Synapse
Distribution	Workspace (3B+)	Console/partners	Oracle apps/EBS	M365
Best when	BigQuery/Workspace central; want Gemini + TPU full stack	Already on AWS; want model choice + silicon economics	Run Oracle DB/EBS; want in-DB vectors + sovereignty	Microsoft-centric; want OpenAI + M365

Honest take

The cloud your data and identity already live in usually wins - gravity beats a marginally better model. GCP's edge is a genuine full-stack story when BigQuery and Workspace are already yours.

Sources

Primary Google material used for this portal (June 2026). Verify specifics against current docs - names and versions are mid-transition.

Accuracy note

Compiled by Brijesh Gogia for expertoracle.com. Independent and not affiliated with Google. Google Cloud's AI naming changed substantially at Cloud Next '26 - treat this as orientation and confirm in the console/docs before designing.

Gemini Enterprise

The packaged enterprise product - search and agents over your company's knowledge, tied to Workspace. The successor to Agentspace.

Gemini Enterprise gives business users a governed assistant that searches across enterprise systems and runs pre-built or custom agents, with connectors and partner agents (Box, Workday, Salesforce, ServiceNow). For Workspace customers it is the path of least resistance to enterprise GenAI - the buy option that sits on top of the platform's build option.

Capability	What it gives you
Enterprise search	Permission-aware search across connected systems and documents.
Pre-built & partner agents	Ready agents from Google and partners (Box, Workday, Salesforce, ServiceNow).
Workspace integration	Assistance in Gmail, Docs, Sheets, Meet for 3B+ users.
Governance	Inherits IAM and data-access controls; managed via the Agent Registry.

Build vs buy

For an internal knowledge assistant, pilot Gemini Enterprise before building custom RAG - the connectors, permissions, and Workspace integration save real work. Build on the platform when you need bespoke logic.

Grounding & RAG

Keep answers tied to your data and to fresh facts.

Option	Use
RAG Engine	Managed retrieval pipeline: ingest, chunk, embed, retrieve - minimal code.
Grounding with Google Search	Ground responses in live web results with citations.
Vertex AI Search retrieval	Enterprise retrieval over your indexed corpora, permission-aware.
BigQuery / AlloyDB / Vector Search	Bring your own vector store when you want control or data locality.

Retrieval is the failure point

Most RAG quality problems are retrieval, not the model. Tune chunking, add rerankers, and evaluate retrieval before blaming Gemini.
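
Evaluating retrieval on its own needs only a labeled set of queries with known relevant document IDs. A minimal sketch of two standard metrics, recall@k and reciprocal rank, follows; the document IDs in any usage are placeholders.

```python
from typing import List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant doc IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0
```

Average both over the query set before and after a chunking or reranker change; if recall@k is low, no model upgrade will fix the answers.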

Vertex AI Build (training & tuning)

The classic ML platform under the new name - train, tune, deploy, and run MLOps.

Capability	Use
Tuning	Supervised fine-tuning and distillation of Gemini/open models for your task.
Custom training	Train your own models on CPU/GPU/TPU with managed jobs.
Prediction / endpoints	Online and batch serving with autoscaling.
Pipelines / Feature Store / Eval	MLOps: reproducible pipelines, features, and evaluation.
Colab Enterprise / Notebooks	Managed notebooks for development.

Order of operations

Prompt + grounding first; tune only with evidence the base model misses your bar; distill to cut run-cost once a tuned large model proves out.

Generative Media

Image, video, speech, and music generation - all first-party on the platform.

Model	Modality
Imagen	Image generation and editing.
Veo	Text/image-to-video generation.
Chirp	Speech-to-text and text-to-speech.
Lyria 3	Music generation.
Gemini Flash Image	Image generation/editing inside the Gemini family.

Provenance

Google applies SynthID watermarking to generated media - useful for content-credential and compliance requirements.

Vectors & Data

Where embeddings and ground-truth live. Pick by where your data already is.

Store	Best for
BigQuery vector search	Vectors next to your analytics data; Gemini-in-BigQuery for SQL-native AI. Strongest when BQ is your warehouse.
AlloyDB AI (pgvector + ScaNN)	Low-latency vectors beside operational Postgres data, with Google's ScaNN index.
Vertex AI Vector Search	Purpose-built, high-scale vector search (formerly Matching Engine).
Spanner / Firestore vectors	Vectors in globally-distributed or document/app databases.

Default

If your data is in BigQuery, start there. Use AlloyDB AI for operational/low-latency, Vector Search for the largest dedicated indexes.
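
The default above can be written down as a small decision helper for design reviews. The flag names are illustrative, and the precedence (a specific scale or latency need overrides the BigQuery default) is an assumption about how to read the rule, not something the products enforce.

```python
def pick_vector_store(data_in_bigquery: bool, needs_low_latency_operational: bool,
                      very_large_dedicated_index: bool) -> str:
    """Encode the default: specific needs first, then follow the data."""
    if very_large_dedicated_index:
        return "Vector Search"
    if needs_low_latency_operational:
        return "AlloyDB AI"
    if data_in_bigquery:
        return "BigQuery vector search"
    return "undecided: pick by where the data already lives"
```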

TPUs & GPUs

Google's silicon is the cost/perf lever; NVIDIA GPUs are the compatibility lever.

Silicon	Role
TPU Ironwood (v7)	~4.6 PFLOPS/chip; 9,216-chip superpods (~42.5 EFLOPS). Frontier training and inference.
TPU 8t (8th gen, training)	Scales to ~9,600 TPUs and ~2 PB shared HBM per superpod; ~3x Ironwood, up to ~2x perf/Watt.
TPU 8i (8th gen, inference)	Boardfly topology connecting ~1,152 TPUs/pod; ~3x on-chip SRAM; ~80% better perf/$ for inference.
A3 / A4 GPU VMs	NVIDIA H100/H200/Blackwell for max framework/CUDA compatibility.
AI Hypercomputer	The integrated supercomputing architecture (silicon + network + software) under it all.
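
The Ironwood superpod figure in the table is per-chip compute multiplied by chip count. A quick arithmetic check, using the rounded ~4.6 PFLOPS/chip value from the table (these are peak numbers and say nothing about achieved utilization):

```python
PFLOPS_PER_CHIP = 4.6       # approximate per-chip figure quoted in the table
CHIPS_PER_SUPERPOD = 9_216


def superpod_eflops(pflops_per_chip: float, chips: int) -> float:
    """Aggregate peak compute in EFLOPS (1 EFLOPS = 1,000 PFLOPS)."""
    return pflops_per_chip * chips / 1_000


ironwood = superpod_eflops(PFLOPS_PER_CHIP, CHIPS_PER_SUPERPOD)
print(f"{ironwood:.1f} EFLOPS")  # prints "42.4 EFLOPS"
```

The rounded input gives ~42.4 EFLOPS, in line with the ~42.5 EFLOPS quoted; the small gap is presumably rounding in the per-chip figure.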

Architect's lever

For Gemini and TPU-friendly open models at volume, TPUs can win decisively on price/perf. Keep GPUs where a specific CUDA/framework path is required.

Governance & Safety

Independent screening and responsible-AI controls for prompts, responses, and models.

Control	What it does
Model Armor	Screen prompts and responses for prompt injection, jailbreaks, sensitive data, and unsafe content - independent of the model.
Safety filters	Configurable content-safety thresholds on Gemini.
Responsible AI tooling	Evaluation, explainability, and safety guidance.
Agent Identity / IAM	Least-privilege access for agents and humans across the platform.

Apply at the platform layer

Put Model Armor between the app and the model so the same policy holds regardless of which model an agent selects.
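
The placement can be sketched as a wrapper around any model callable. This is not Model Armor's API: `screen` below is a toy stand-in (a regex for 16-digit card-like numbers) for a call to a real screening service, and all names are hypothetical. The point is only that the policy lives outside the model, so it applies to whichever model is plugged in.

```python
import re
from typing import Callable

# Toy policy: flag anything that looks like a 16-digit card number.
# A real deployment would call a screening service here instead.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def screen(text: str) -> bool:
    """Return True if the text passes the (toy) policy."""
    return CARD_PATTERN.search(text) is None


def guarded(model_call: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap any model callable so prompts and responses pass the same policy."""
    def wrapper(prompt: str) -> str:
        if not screen(prompt):
            return "[blocked: prompt failed screening]"
        response = model_call(prompt)
        if not screen(response):
            return "[blocked: response failed screening]"
        return response
    return wrapper
```

Because `guarded` takes a plain callable, swapping the underlying model does not change the policy path.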

Architecture Patterns

The shapes most Google Cloud GenAI workloads fall into.

1. Enterprise assistant

Gemini Enterprise over your connectors + Workspace, or a custom RAG Engine app on the platform with Model Armor.

2. Production agent

ADK + Agent Runtime + Agent Identity + Gateway (MCP via Apigee) + Observability. Add A2A for multi-agent.

3. SQL-native AI

Gemini-in-BigQuery and BQ vector search bring generation and retrieval to data already in the warehouse.

4. Multimodal pipeline

Gemini long-context over documents/video; Imagen/Veo for generation; SynthID for provenance.

5. Custom/open model service

Tune Gemma 4 or an open model, serve on TPU/GPU endpoints; distill to cut cost.

6. Workspace-embedded

Gemini for Workspace and Code Assist - buy the assistant in the tools people already use.

Decision Matrix

Fast answers for design reviews.

Question	Default answer
Which model?	Gemini 3.x Flash for volume; 3.x Pro for hardest reasoning; Claude (also in Model Garden) when it wins your eval; Gemma 4 for open/on-prem.
Buy or build the assistant?	Gemini Enterprise first; build on the platform for bespoke logic.
Agent framework?	ADK + Agent Runtime; adopt A2A only when you have multi-agent/multi-vendor needs.
Where do vectors live?	BigQuery if data is there; AlloyDB AI for operational/low-latency; Vector Search for largest dedicated indexes.
TPU or GPU?	TPU for Gemini/open models at volume (price/perf); GPU for specific CUDA/framework needs.
RAG how?	RAG Engine or Vertex AI Search for managed; bring-your-own vector store for control.

Pricing & Cost Control

Shape, not exact numbers - rates change and vary by model/region. Confirm on Google Cloud pricing pages.

Lever	How it bills	Control
Gemini API	Per input/output token (and per modality), per model tier.	Flash for routine work; Pro only when needed; cap output; use context caching.
Provisioned Throughput	Reserved capacity for steady high volume.	Commit after you know the load.
Vector Search / RAG	Index storage + query + embedding tokens.	Right-size chunks; prune stale docs; prefer BigQuery if data is there.
Training / endpoints	Accelerator-hours (TPU/GPU) + serving.	Autoscale; batch where possible; TPUs for price/perf.
Agents	Model tokens × steps + tool calls + runtime.	Cap loop length; route routine steps to a cheaper model.

The agent cost trap

Agent loops multiply token cost by steps. Budget per-conversation, log token usage, and use Flash for routing/routine steps with Pro reserved for the hard parts.
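
A worked example of that multiplication. The rates below are hypothetical placeholders in USD per million tokens, not real prices, and the model assumes every step sends and receives the same token counts.

```python
def conversation_cost(steps: int, input_tokens_per_step: int,
                      output_tokens_per_step: int,
                      usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost in USD when every step sends and receives a fixed token count."""
    per_step = (input_tokens_per_step * usd_per_m_input
                + output_tokens_per_step * usd_per_m_output) / 1_000_000
    return steps * per_step


# Hypothetical rates (USD per million tokens), for illustration only.
single = conversation_cost(1, 4_000, 500, usd_per_m_input=1.0, usd_per_m_output=4.0)
looped = conversation_cost(12, 4_000, 500, usd_per_m_input=1.0, usd_per_m_output=4.0)
```

With these placeholder numbers a single call costs about $0.006 and a 12-step loop about $0.072. If each step re-sends a growing history, input tokens rise per step and the real figure grows faster than this linear estimate.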

Risks & Gotchas

Read this one.

Rebrand & naming drift

Vertex AI to Gemini Enterprise Agent Platform, Agentspace to Gemini Enterprise. SDKs, IAM roles, and URLs mix old/new. Confirm current names and check model lifecycle pages before pinning.

Model version expiry

Gemini versions have explicit lifecycles and can be retired. Pin versions, monitor deprecations, and test before upgrading - auto-latest will bite you.

Multi-agent sprawl

A2A and agent-of-agents are powerful and hard to govern. Enforce Agent Identity, least privilege, Registry governance, budgets, and Observability from the start.

Data residency & grounding

Grounding with Google Search and external tools can move data off your boundary. Confirm residency; prefer in-boundary retrieval where compliance requires.

Quotas & capacity

Accelerator and API quotas throttle real workloads; TPU capacity can be contended. Request increases early and design for backoff.

Estate fit

GCP AI shines when BigQuery/Workspace are central. If your data lives in AWS/Azure, weigh egress and identity friction before committing.