As of 26 June 2026

Google Cloud AI, the practical way

An architecture-first reference for Google Cloud's AI stack as of June 2026. The headline change: at Cloud Next '26, Vertex AI became the Gemini Enterprise Agent Platform and Agentspace folded into Gemini Enterprise. This portal covers the new platform, Gemini models, agents, grounding, generative media, and TPUs - trade-offs and risks, no marketing.

Refreshed June 2026 - Architecture-first - Enterprise focus - Vendor-neutral
Naming, 2026
Vertex AI is now the Gemini Enterprise Agent Platform. All former Vertex AI capabilities (Model Garden, training, prediction, RAG, Vector Search) are delivered through it. Gemini Enterprise is the packaged product for enterprises (search + agents + Workspace), having absorbed Agentspace. Docs and many URLs still say vertex-ai - same lineage.
TL;DR

Google's 2026 pitch is owning all four layers: custom silicon (Ironwood and 8th-gen TPU 8t/8i), frontier models (Gemini 3.x, plus open Gemma 4), the platform (Gemini Enterprise Agent Platform with Agent Studio, ADK, A2A orchestration, Agent Registry/Gateway/Identity/Observability), and distribution (Workspace, 3B+ users). Model Garden fronts 200+ models including Anthropic Claude. If your data is in BigQuery and you want Gemini + TPU economics, this is the strongest full-stack story in the market - at the cost of a platform that just went through a large rebrand.

The Google Cloud AI mental model

LAYER 3 - AGENTS & ASSISTANTS (consume)
Gemini Enterprise (search + agents) - Agent Studio / Workspace Studio - Gemini Code Assist - Gemini for Workspace
Pre-built or low/no-code. Governed by IAM + Agent Identity. Configure tools and data, not weights.

LAYER 2 - GEMINI ENTERPRISE AGENT PLATFORM (build)
Model Garden (200+) - ADK - Agent Runtime - A2A Orchestration - Gateway - Registry - Observability - Eval
Grounding & RAG Engine - training/tuning - Imagen / Veo / Chirp / Lyria - Model Armor
Build, scale, govern, optimize agents and models on one platform (formerly Vertex AI).

LAYER 1 - DATA & INFRASTRUCTURE (ground)
BigQuery (+ vector search, Gemini in BQ) - AlloyDB AI - Vector Search - Spanner / Firestore vectors - Cloud Storage
TPU Ironwood (v7) & 8th-gen 8t/8i - A3/A4 GPUs (Blackwell) - AI Hypercomputer
Your data and vectors live here, next to BigQuery and Google's custom silicon.
Figure 1 - Google's AI stack. Most teams enter at Layer 3/2 (Gemini Enterprise / the platform); drop to Layer 1 for data gravity and TPU economics.

What sets Google Cloud apart in 2026

Differentiator | What it means in practice
Full-stack ownership | Silicon (TPU), frontier model (Gemini), platform, and distribution (Workspace) under one roof. No other vendor owns all four - it shows up as tight integration and price/perf.
Gemini multimodal + long context | Native text/image/audio/video with very long context windows. Strong for document, video, and multimodal RAG without bolt-ons.
BigQuery data gravity | If your analytics live in BigQuery, Gemini-in-BigQuery and BQ vector search bring AI to the data with no pipeline.
Agent-native platform | ADK, A2A protocol (multi-vendor agent interop), Agent Registry/Gateway/Identity/Observability - an opinionated, governed agent runtime.
TPU economics | Ironwood and 8th-gen TPUs give a training/inference cost lever, and first-class hosting for Gemini and open models.

Where Google Cloud is weaker (be honest)

Rebrand churn & naming
The Cloud Next '26 rename (Vertex AI to Gemini Enterprise Agent Platform; Agentspace into Gemini Enterprise) changes many names at once. Docs, SDKs, and URLs still mix old and new names. Budget time for learning the terminology, and check the model-lifecycle pages before pinning versions.
Enterprise footprint & habits
AWS and Azure have deeper enterprise install bases and more third-party tooling. Choosing GCP for AI often means swimming against an existing AWS/Azure data estate unless BigQuery/Workspace are already central.

How to read this portal

Each flagship service tab has sub-tabs (Overview / Architecture / Components / Pricing / Risks / When to use) with a reference-architecture diagram. If you only read one sub-tab, read Risks.

What's New - late 2025 through June 2026

Material changes that affect architecture, cost, or risk. Curated.

TL;DR

Cloud Next '26 (April 2026) was a reset: Vertex AI to Gemini Enterprise Agent Platform, Agentspace to Gemini Enterprise, the A2A protocol v1.0 for multi-vendor agent interop, ADK 1.0, managed MCP servers via Apigee, and 8th-gen TPU 8t/8i. Model-wise, Gemini 3.x (3.1 Pro/Flash, 3.1 Flash Image), Lyria 3, and open Gemma 4 landed, with 200+ models in Model Garden including Anthropic Claude.

Date | Release | Why it matters
Late 2025 | Ironwood TPU (7th gen) GA | ~4.6 PFLOPS/chip, scales to 9,216-chip superpods. Big jump in training/inference capacity and price/perf.
Q1 2026 | Gemini 3.x family | Gemini 3.1 Pro and Flash; very long context and stronger multimodal/reasoning. Confirm model IDs and lifecycle before pinning.
Apr 2026 | Gemini Enterprise Agent Platform (Vertex AI rebrand) | One platform to build, scale, govern, optimize agents - Agent Studio, ADK, Agent Runtime, A2A Orchestration, Registry, Gateway, Identity, Observability, Simulation, Evaluation.
Apr 2026 | Gemini Enterprise (Agentspace absorbed) | Packaged enterprise search + agents tied to Workspace; partner agents from Box, Workday, Salesforce, ServiceNow.
Apr 2026 | A2A protocol v1.0; ADK 1.0 | Agent-to-agent interop standard in production at ~150 orgs; stable ADK across four languages. Multi-vendor agent ecosystems become real.
Apr 2026 | Managed MCP servers via Apigee; Project Mariner | Apigee bridges existing APIs to agents as MCP tools; Mariner is a browsing agent. Lowers integration cost.
Apr 2026 | 8th-gen TPU 8t / 8i | 8t for training (scale to ~9,600 TPUs, 2 PB shared HBM), 8i for inference (Boardfly topology, ~80% better perf/$). Cost lever widens.
2026 | Gemma 4 (open), Lyria 3 (music), Gemini Flash Image | Open-weight option for on-prem/customization; generative audio and image breadth.
Practical read
If you built on Vertex AI Agent Builder in 2025, your concepts map forward - but re-check service names, IAM roles, and SDK packages against the new platform docs. New agent work should target ADK + the Agent Runtime and consider A2A if you have multi-agent or multi-vendor needs.

Service Map

The Google Cloud AI services worth knowing, grouped by what you do with them.

PLATFORM - Gemini Enterprise Agent Platform
Formerly Vertex AI. Model Garden, ADK, Agent Runtime, A2A, Registry, Gateway, Identity, Observability, training, RAG.

MODELS - Gemini 3.x & Gemma 4
Frontier multimodal (Pro/Flash) and open-weight Gemma; 200+ models in Model Garden incl. Anthropic Claude.

ENTERPRISE - Gemini Enterprise
Packaged search + agents + Workspace; partner agents; the Agentspace successor.

AGENTS - ADK + A2A
Agent Development Kit, Agent-to-Agent protocol, Agent Engine/Runtime, managed MCP via Apigee.

MEDIA - Imagen / Veo / Chirp / Lyria
Image, video, speech, and music generation - all on the platform.

DATA - BigQuery AI & Vector Search
Gemini-in-BigQuery, BQ vector search, AlloyDB AI (ScaNN), Vector Search, Spanner/Firestore vectors.

SILICON - TPUs & GPUs
Ironwood (v7), 8th-gen 8t/8i, A3/A4 GPU VMs (Blackwell), AI Hypercomputer.

GROUND - Grounding & RAG
RAG Engine, grounding with Google Search, Vertex AI Search retrieval.

GOVERN - Model Armor & Responsible AI
Prompt/response screening, safety filters, and responsible-AI tooling.

How to read this
The flagship services (Gemini Enterprise Agent Platform, Agents/ADK) carry full sub-tabs - Overview / Architecture / Components / Pricing / Risks / When-to-use - with reference-architecture diagrams. Secondary services use a single rich page with the same architecture-first, risk-honest treatment. If you're scoping production, read a service's Risks before its Overview.

Gemini Enterprise Agent Platform (formerly Vertex AI)

The single platform to choose models, build and govern agents, evaluate, and ship - the center of gravity for AI on Google Cloud.

Sub-tabs: Overview / Architecture / Components / Pricing model / Risks & gotchas / When to use

TL;DR

At Cloud Next '26, Google folded all Vertex AI capabilities into this platform and organized it around four jobs: build, scale, govern, optimize. It gives you Model Garden (200+ models), a managed agent stack (Agent Studio, ADK, Runtime, A2A), grounding/RAG, training/tuning, generative media, and Model Armor - all under IAM, VPC-SC, and Google Cloud billing. The agent stack is the new center of gravity; the classic ML tools (training, prediction, pipelines) are still here under the new name.

What problem this solves

Enterprises don't want to wire a model API, a vector store, a guardrail service, an eval harness, an agent orchestrator, and monitoring from separate vendors. This platform's offer is one governed surface where you pick models from one catalog, build and register agents, ground them in your data, and observe them in production - with Gemini and TPU economics underneath. The trade-off is the rebrand: names, SDKs, and IAM roles are mid-migration, so onboarding has a terminology tax.

Mental note
Think of this as Vertex AI's superset: the model and training tools you knew, plus a full agent-operations layer. Old vertex-ai SDK/URL paths largely still work during the transition.

Reference architecture

Google Cloud project - VPC-SC, IAM, Private Service Connect
Application: GKE / Cloud Run / Functions; Vertex / Gen AI SDK
Platform endpoint: Private Service Connect; Model Armor in-line
Model Garden (200+):
  ▸ Gemini 3.x Pro / Flash / Flash Image
  ▸ Gemma 4 (open weights)
  ▸ Anthropic Claude, Llama, Mistral
  ▸ Imagen / Veo / Chirp / Lyria
  ▸ tuned + imported models
Agent stack: Agent Studio - ADK - Agent Runtime; A2A orchestration, connected agents
Grounding & RAG: RAG Engine, Vertex AI Search, Google Search; BigQuery / AlloyDB / Vector Search
Govern: Model Armor, Agent Registry; Agent Identity, IAM; Responsible AI
Observe & evaluate: Agent Observability, traces; Simulation, Evaluation; Cloud Monitoring
Train & tune: tuning, custom training on TPU / GPU; pipelines, registry
Figure - Gemini Enterprise Agent Platform. One governed surface over Model Garden, the agent stack, grounding, training, and observability; Model Armor sits in-line.

Network and identity

The platform supports VPC Service Controls and Private Service Connect so model and agent traffic stays inside your perimeter. Authorization is Google Cloud IAM; agents get first-class Agent Identity with scoped permissions. Use CMEK for data at rest and keep secrets in Secret Manager.

Where the data goes

Google's stated position is that your prompts and data are not used to train the foundation models, and data stays within your project and chosen region. Confirm the specific Gemini model's region/availability and lifecycle before designing around it - versions have explicit expiry.

Components

Component | What it does
Agent Studio | Low/no-code design surface for building and testing agents.
Agent Development Kit (ADK 1.0) | Code-first framework for agents; stable across four languages.
Agent Runtime / Engine | Managed, scalable execution for agents in production.
A2A Orchestration | Agent-to-Agent protocol (v1.0) for multi-agent and cross-vendor coordination.
Agent Gateway | Connect tools/APIs (incl. managed MCP via Apigee) with governance.
Agent Identity / Registry | First-class agent identities + a catalog to version and govern agents in your estate.
Observability / Simulation / Evaluation | Trace, test against simulated and real traffic, and evaluate agent quality.
Model Garden + training | 200+ models, plus tuning and custom training on TPU/GPU.

Pricing model

Lever | How it bills | Control
Gemini API | Per input/output token (and per modality), per model tier. | Flash for routine; Pro only when needed; cap output; context caching.
Provisioned Throughput | Reserved capacity for steady high volume. | Commit after you know the load.
Agent Runtime | Model tokens x steps + runtime + tool calls. | Cap loop length; route cheap model for routine steps.
Training / tuning | Accelerator-hours (TPU/GPU). | Tune only with evidence; TPUs for price/perf.
Rule of thumb
Lead with Gemini Flash and context caching; reserve Provisioned Throughput once volume is steady; keep Pro for the hard prompts only.
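
The levers in the pricing table can be turned into a rough per-request estimate. The sketch below is a minimal illustration, not a pricing calculator: the tier labels, every rate in `RATES`, and the idea that cached input tokens bill at a single lower rate are assumptions made up for the example. Real rates and caching terms come from the Google Cloud pricing pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierRate:
    """Hypothetical USD rates per one million tokens (placeholders, not real prices)."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: float


# Illustrative numbers only; take real rates from the Google Cloud pricing pages.
RATES = {
    "flash": TierRate(input_per_m=0.50, output_per_m=2.00, cached_input_per_m=0.125),
    "pro": TierRate(input_per_m=4.00, output_per_m=16.00, cached_input_per_m=1.00),
}


def request_cost(tier, input_tokens, output_tokens, cached_tokens=0, max_output_tokens=None):
    """Estimate the cost of one request in USD.

    cached_tokens is the part of input_tokens served from a context cache.
    max_output_tokens, when set, models the "cap output" lever.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    rate = RATES[tier]
    if max_output_tokens is not None:
        output_tokens = min(output_tokens, max_output_tokens)
    fresh_input = input_tokens - cached_tokens
    total = (fresh_input * rate.input_per_m
             + cached_tokens * rate.cached_input_per_m
             + output_tokens * rate.output_per_m)
    return total / 1_000_000


if __name__ == "__main__":
    # Same 20k-token prompt and 1k-token answer, priced three ways.
    print(request_cost("pro", 20_000, 1_000))
    print(request_cost("flash", 20_000, 1_000))
    print(request_cost("flash", 20_000, 1_000, cached_tokens=18_000))
```

Running the three calls side by side shows how much of the bill is tier choice and how much is repeated context, which is the comparison to make before committing to Provisioned Throughput.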

Risks & gotchas

Rebrand & naming drift
Vertex AI to Gemini Enterprise Agent Platform; Agentspace to Gemini Enterprise. SDKs, IAM roles, and URLs mix old/new. Confirm current names and check model lifecycle pages before pinning.
Model version expiry
Gemini versions have explicit lifecycles and can be retired. Pin versions, monitor deprecations, and test before upgrading - auto-latest will bite you.
Quotas & TPU capacity
Accelerator and API quotas throttle real workloads; TPU capacity can be contended. Request increases early and design for backoff.
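
"Design for backoff" usually means retrying throttled calls with exponentially growing, randomized delays. The sketch below uses only the standard library. `QuotaExceeded` is a hypothetical stand-in for whatever rate-limit error your client library raises, and the base and cap values are arbitrary starting points rather than recommendations.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical stand-in for the rate-limit error raised by your client."""


def backoff_delays(count: int, base: float = 1.0, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter delays: delay i is uniform in [0, min(cap, base * 2**i)]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** i)) for i in range(count)]


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base: float = 1.0, cap: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on QuotaExceeded with a jittered delay between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(max_attempts - 1, base, cap)
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry logic can be unit-tested without waiting; in production the default `time.sleep` applies.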

When to use

  • Use the platform for any GenAI workload on Google Cloud - one governed surface for models, agents, grounding, and evals.
  • Lead with Gemini Flash + the agent stack; reserve Provisioned Throughput once volume is steady.
  • Adopt A2A only when you have multi-agent or multi-vendor needs.
  • Use Gemini Enterprise (packaged) if you want the M365-style buy rather than build.

Model Garden

One catalog, 200+ models - Google first-party, open, and third-party - behind the platform's API and governance.

Source | Models
Google first-party | Gemini 3.x (Pro, Flash, Flash Image), Imagen, Veo, Chirp, Lyria 3.
Google open | Gemma 4 (open weights) for customization and on-prem/edge.
Third-party | Anthropic Claude (Opus/Sonnet/Haiku), Meta Llama, Mistral, and more.
Why it matters
Claude and Gemini on the same platform means you can route the hardest tasks to whichever model wins your eval, without leaving Google Cloud's governance and billing.

Gemini Models

Google's frontier multimodal family - the default model on the platform.

Tier | Best for
Gemini 3.x Pro | Hardest reasoning, agents, long-context analysis, coding.
Gemini 3.x Flash | Cost/latency-optimized high-volume tasks; the workhorse.
Gemini Flash Image | Native image generation/editing within the Gemini family.
Gemma 4 (open) | Open-weight option for customization, on-prem, and edge.
Version lifecycle
Gemini versions rev quickly and have explicit lifecycle/expiry. Pin a version, watch the model-versions page, and test before auto-upgrading.
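
One way to act on the lifecycle advice is a small check in CI that compares the model IDs you have pinned against their published retirement dates. The sketch below is a minimal illustration: the model IDs and dates in `RETIREMENT_DATES` are invented placeholders, and in practice you would maintain that table by hand from the model-versions page.

```python
from datetime import date, timedelta

# Hypothetical model IDs and retirement dates, for illustration only.
# Real values come from the model-versions / lifecycle page.
RETIREMENT_DATES = {
    "example-flash-001": date(2026, 9, 30),
    "example-pro-001": date(2027, 3, 31),
}


def lifecycle_status(model_id, today, warn_days=60):
    """Return 'unknown', 'retired', 'expiring', or 'ok' for a pinned model ID."""
    retire_on = RETIREMENT_DATES.get(model_id)
    if retire_on is None:
        return "unknown"    # a floating alias, or an ID missing from the table
    if today >= retire_on:
        return "retired"
    if retire_on - today <= timedelta(days=warn_days):
        return "expiring"   # inside the warning window: schedule upgrade tests
    return "ok"


if __name__ == "__main__":
    for pinned in ("example-flash-001", "example-pro-001", "latest"):
        print(pinned, lifecycle_status(pinned, date.today()))
```

Treating an unlisted ID as "unknown" rather than "ok" is deliberate: a floating alias such as a latest pointer then fails the check instead of passing silently.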

Agents - ADK & A2A

Google's opinionated, governed agent stack: a code-first kit, a managed runtime, and an interop protocol.

Sub-tabs: Overview / Architecture / Protocols & tools / Risks & gotchas / When to use

TL;DR

The agent stack is a code-first kit (ADK 1.0, four languages), a managed Agent Runtime/Engine, and the A2A protocol v1.0 for agents to discover and call each other across teams and vendors. Tools connect via managed MCP (Apigee bridges existing APIs), and Project Mariner adds browser use. Agent Identity, Registry, and Observability make it governable - which matters once agents call agents.

What problem this solves

Building one agent is easy; operating many, safely, is not. The stack standardizes how agents are built (ADK), run (Runtime), talk to each other (A2A), reach tools (MCP/Apigee), and get governed (Identity, Registry, Observability). A2A in particular makes multi-vendor agent ecosystems real - your agent can call a partner's agent as an interoperable endpoint.

Reference architecture

Platform project - Agent Identity, IAM, Model Armor
ADK agent: model + instructions + tools, on Agent Runtime/Engine
Connected agents (A2A): specialists, cross-vendor; orchestrator delegates
Tools (governed): managed MCP via Apigee; Mariner browser, functions
Knowledge / grounding: RAG Engine, Vertex AI Search; BigQuery / AlloyDB vectors
Govern: Agent Registry + Identity; Model Armor, IAM scopes
Observe: Agent Observability, traces; Simulation + Evaluation
Figure - The agent stack. ADK agents run on the Agent Runtime, call specialist agents over A2A and tools over managed MCP, ground in your data, and are governed + observed centrally.

Protocols & tools

Piece | Role
Agent Development Kit (ADK) | Build agents in code with tools, memory, and orchestration; v1.0 stable in four languages.
Agent Runtime / Engine | Managed hosting and scaling for agents in production.
A2A protocol (v1.0) | Open standard for agents to discover and call each other across teams and vendors.
Managed MCP (via Apigee) | Expose existing APIs to agents as governed MCP tools.
Project Mariner | Browser-using agent for web tasks.

Risks & gotchas

Multi-agent blast radius
A2A makes agent ecosystems powerful and harder to reason about. Use Agent Identity, scoped permissions, Registry governance, budgets, and Observability from day one - don't let agents call agents without audit.
Tool auth & data egress
MCP tools and partner A2A agents can read and send data. Vet every tool/agent, scope consent, and keep traffic inside VPC-SC where compliance requires.
Runaway loops
Cap steps and tokens per conversation; route routine steps to Gemini Flash and reserve Pro for the hard parts.
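
Capping steps and tokens can be as simple as a budget object that every loop iteration must charge against. The sketch below is framework-agnostic and does not use ADK: `LoopBudget`, `pick_tier`, and `run_agent` are hypothetical names, the tier labels are placeholders rather than model IDs, and the planned steps stand in for real model calls.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a conversation crosses its step or token cap."""


@dataclass
class LoopBudget:
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Record one step and its tokens; raise as soon as either cap is crossed."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step cap {self.max_steps} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token cap {self.max_tokens} exceeded")


def pick_tier(step_is_hard: bool) -> str:
    """Route routine steps to the cheaper tier; labels are placeholders, not model IDs."""
    return "pro" if step_is_hard else "flash"


def run_agent(planned_steps, budget):
    """planned_steps: iterable of (is_hard, tokens_used) pairs standing in for model calls."""
    trace = []
    for is_hard, tokens_used in planned_steps:
        budget.charge(tokens_used)   # raises once a cap is crossed, ending the loop
        trace.append(pick_tier(is_hard))
    return trace
```

In a real loop the token count is known only after a call returns, so the token cap trips one step late; leave headroom in `max_tokens` accordingly.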

When to use

  • Use ADK + Agent Runtime for any agent heading to production.
  • Adopt A2A when the problem decomposes into specialists or spans vendors.
  • Bridge existing APIs as MCP tools via Apigee rather than rebuilding integrations.
  • Govern via Agent Registry + Identity before scaling agent count.
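
The governance advice above (identity, scoped permissions, registry, audit) can be pictured with a toy in-memory model. This is a conceptual sketch only: it is not the A2A wire format, the Agent Registry API, or Agent Identity, and every class and field name here is hypothetical. It shows the shape of the rule "no agent-to-agent call without a permission check and an audit record".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class AgentRecord:
    """Hypothetical registry entry: an identity, its handler, and whom it may call."""
    name: str
    handler: Callable[[str], str]
    may_call: FrozenSet[str] = frozenset()


@dataclass
class Registry:
    agents: Dict[str, AgentRecord] = field(default_factory=dict)
    # Each audit entry is (caller, callee, outcome).
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def register(self, record: AgentRecord) -> None:
        self.agents[record.name] = record

    def delegate(self, caller: str, callee: str, task: str) -> str:
        """Run task on callee if the caller's identity allows it; log every attempt."""
        if caller not in self.agents or callee not in self.agents:
            self.audit_log.append((caller, callee, "unknown-agent"))
            raise LookupError(f"unregistered agent in call {caller} -> {callee}")
        if callee not in self.agents[caller].may_call:
            self.audit_log.append((caller, callee, "denied"))
            raise PermissionError(f"{caller} is not allowed to call {callee}")
        self.audit_log.append((caller, callee, "allowed"))
        return self.agents[callee].handler(task)
```

Denied and unknown calls are logged as well as allowed ones, because the refused attempts are often the most useful lines in an audit trail.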

GCP vs AWS vs OCI vs Azure

A practitioner's quick read. Every cloud does the basics; differences are in defaults, data gravity, and silicon.

Dimension | Google Cloud | AWS | OCI | Azure
Frontier own model | Gemini 3.x | Nova (mid); Claude hosted | None (partners) | OpenAI GPT-5.x
Model breadth (managed) | Model Garden (200+) | Bedrock (widest) | Broad (OCI Gen AI) | Foundry Models (1000+)
Agents | Platform + A2A | AgentCore | Enterprise AI Agents | Foundry Agent Service
Custom silicon | TPU (Ironwood/8th) | Trainium/Inferentia | GPU (NVIDIA) | Maia (emerging)
Data gravity | BigQuery | S3/Redshift | Oracle DB 26ai (in-DB vectors) | Fabric/Synapse
Distribution | Workspace (3B+) | Console/partners | Oracle apps/EBS | M365
Best when | BigQuery/Workspace central; want Gemini + TPU full stack | Already on AWS; want model choice + silicon economics | Run Oracle DB/EBS; want in-DB vectors + sovereignty | Microsoft-centric; want OpenAI + M365
Honest take
The cloud your data and identity already live in usually wins - gravity beats a marginally better model. GCP's edge is a genuine full-stack story when BigQuery and Workspace are already yours.

Sources

Primary Google material used for this portal (June 2026). Verify specifics against current docs - names and versions are mid-transition.

Accuracy note
Compiled by Brijesh Gogia for expertoracle.com. Independent and not affiliated with Google. Google Cloud's AI naming changed substantially at Cloud Next '26 - treat this as orientation and confirm in the console/docs before designing.

Gemini Enterprise

The packaged enterprise product - search and agents over your company's knowledge, tied to Workspace. The successor to Agentspace.

Gemini Enterprise gives business users a governed assistant that searches across enterprise systems and runs pre-built or custom agents, with connectors and partner agents (Box, Workday, Salesforce, ServiceNow). For Workspace customers it is the path of least resistance to enterprise GenAI - the buy option that sits on top of the platform's build option.

Capability | What it gives you
Enterprise search | Permission-aware search across connected systems and documents.
Pre-built & partner agents | Ready agents from Google and partners (Box, Workday, Salesforce, ServiceNow).
Workspace integration | Assistance in Gmail, Docs, Sheets, Meet for 3B+ users.
Governance | Inherits IAM and data-access controls; managed via the Agent Registry.
Build vs buy
For an internal knowledge assistant, pilot Gemini Enterprise before building custom RAG - the connectors, permissions, and Workspace integration save real work. Build on the platform when you need bespoke logic.

Grounding & RAG

Keep answers tied to your data and to fresh facts.

Option | Use
RAG Engine | Managed retrieval pipeline: ingest, chunk, embed, retrieve - minimal code.
Grounding with Google Search | Ground responses in live web results with citations.
Vertex AI Search retrieval | Enterprise retrieval over your indexed corpora, permission-aware.
BigQuery / AlloyDB / Vector Search | Bring your own vector store when you want control or data locality.
Retrieval is the failure point
Most RAG quality problems are retrieval, not the model. Tune chunking, add rerankers, and evaluate retrieval before blaming Gemini.
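
Evaluating retrieval on its own needs only a small labeled set: for each test question, the document IDs the retriever returned and the IDs a human judged relevant. The sketch below computes two standard metrics, recall@k and mean reciprocal rank, in plain Python. The document IDs in the usage example are made up, and the functions are independent of any particular retriever or of RAG Engine.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant must not be empty")
    return len(relevant.intersection(retrieved[:k])) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1/rank of the first relevant hit, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k):
    """runs: list of (retrieved_ids, relevant_ids) pairs, one per test question."""
    if not runs:
        raise ValueError("runs must not be empty")
    n = len(runs)
    return {
        "recall@k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }


if __name__ == "__main__":
    runs = [
        (["d3", "d1", "d7"], {"d1", "d9"}),  # one of two relevant docs found, at rank 2
        (["d2", "d4"], {"d2"}),              # the only relevant doc found at rank 1
    ]
    print(evaluate(runs, k=2))
```

Re-running the same labeled set after each chunking or reranker change gives a before/after number for retrieval that is separate from any judgment of the generated answer.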

Vertex AI Build (training & tuning)

The classic ML platform under the new name - train, tune, deploy, and run MLOps.

Capability | Use
Tuning | Supervised fine-tuning and distillation of Gemini/open models for your task.
Custom training | Train your own models on CPU/GPU/TPU with managed jobs.
Prediction / endpoints | Online and batch serving with autoscaling.
Pipelines / Feature Store / Eval | MLOps: reproducible pipelines, features, and evaluation.
Colab Enterprise / Notebooks | Managed notebooks for development.
Order of operations
Prompt + grounding first; tune only with evidence the base model misses your bar; distill to cut run-cost once a tuned large model proves out.

Generative Media

Image, video, speech, and music generation - all first-party on the platform.

Model | Modality
Imagen | Image generation and editing.
Veo | Text/image-to-video generation.
Chirp | Speech-to-text and text-to-speech.
Lyria 3 | Music generation.
Gemini Flash Image | Image generation/editing inside the Gemini family.
Provenance
Google applies SynthID watermarking to generated media - useful for content-credential and compliance requirements.

Vectors & Data

Where embeddings and ground-truth live. Pick by where your data already is.

Store | Best for
BigQuery vector search | Vectors next to your analytics data; Gemini-in-BigQuery for SQL-native AI. Strongest when BQ is your warehouse.
AlloyDB AI (pgvector + ScaNN) | Low-latency vectors beside operational Postgres data, with Google's ScaNN index.
Vertex AI Vector Search | Purpose-built, high-scale vector search (formerly Matching Engine).
Spanner / Firestore vectors | Vectors in globally-distributed or document/app databases.
Default
If your data is in BigQuery, start there. Use AlloyDB AI for operational/low-latency, Vector Search for the largest dedicated indexes.
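
The default can be written down as a tiny rule function for design reviews. The sketch below encodes only what the table and the default above say. The `Workload` fields are hypothetical, and the precedence (specialized needs override the BigQuery default) is an assumption of this example, not a stated rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload, for illustration."""
    data_in_bigquery: bool = False
    operational_low_latency: bool = False     # vectors beside transactional Postgres data
    very_large_dedicated_index: bool = False  # scale is the dominant requirement


def default_vector_store(w: Workload) -> str:
    """Return a starting-point store name; precedence is this sketch's assumption."""
    if w.very_large_dedicated_index:
        return "Vector Search"
    if w.operational_low_latency:
        return "AlloyDB AI"
    if w.data_in_bigquery:
        return "BigQuery vector search"
    return "undecided"  # pick by where the data already lives


if __name__ == "__main__":
    print(default_vector_store(Workload(data_in_bigquery=True)))
    print(default_vector_store(Workload(operational_low_latency=True)))
```

Returning "undecided" rather than a guess keeps the rule honest for cases the default does not cover, such as data that lives in Spanner or Firestore.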

TPUs & GPUs

Google's silicon is the cost/perf lever; NVIDIA GPUs are the compatibility lever.

Silicon | Role
TPU Ironwood (v7) | ~4.6 PFLOPS/chip; 9,216-chip superpods (~42.5 EFLOPS). Frontier training and inference.
TPU 8t (8th gen, training) | Scales to ~9,600 TPUs and ~2 PB shared HBM per superpod; ~3x Ironwood, up to ~2x perf/Watt.
TPU 8i (8th gen, inference) | Boardfly topology connecting ~1,152 TPUs/pod; ~3x on-chip SRAM; ~80% better perf/$ for inference.
A3 / A4 GPU VMs | NVIDIA H100/H200/Blackwell for max framework/CUDA compatibility.
AI Hypercomputer | The integrated supercomputing architecture (silicon + network + software) under it all.
Architect's lever
For Gemini and TPU-friendly open models at volume, TPUs can win decisively on price/perf. Keep GPUs where a specific CUDA/framework path is required.
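
Two of the figures in the table are worth checking by hand. The sketch below multiplies the rounded per-chip number by the superpod size, and converts a perf/$ gain into a cost change for the same work. Reading "~80% better perf/$" as 1.8x the work per dollar is an assumption of this example; the per-chip figure is the table's approximate value, so the product lands slightly under the quoted ~42.5 EFLOPS.

```python
PFLOPS_PER_IRONWOOD_CHIP = 4.6   # "~4.6 PFLOPS/chip" from the table (rounded)
CHIPS_PER_SUPERPOD = 9_216


def superpod_eflops(pflops_per_chip, chips):
    """Aggregate peak compute in EFLOPS (1 EFLOPS = 1,000 PFLOPS)."""
    return pflops_per_chip * chips / 1_000.0


def relative_cost(perf_per_dollar_gain):
    """Cost of the same work after a perf/$ gain, relative to 1.0 before.

    A gain of 0.80 means 1.8x the work per dollar, so the same work
    costs 1 / 1.8 of what it did.
    """
    return 1.0 / (1.0 + perf_per_dollar_gain)


if __name__ == "__main__":
    print(superpod_eflops(PFLOPS_PER_IRONWOOD_CHIP, CHIPS_PER_SUPERPOD))
    print(relative_cost(0.80))
```

Under that reading, an 80% perf/$ improvement cuts the bill for a fixed workload by roughly 44%, not 80%, which is the number to carry into a cost model.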

Governance & Safety

Independent screening and responsible-AI controls for prompts, responses, and models.

Control | What it does
Model Armor | Screen prompts and responses for prompt injection, jailbreaks, sensitive data, and unsafe content - independent of the model.
Safety filters | Configurable content-safety thresholds on Gemini.
Responsible AI tooling | Evaluation, explainability, and safety guidance.
Agent Identity / IAM | Least-privilege access for agents and humans across the platform.
Apply at the platform layer
Put Model Armor between the app and the model so the same policy holds regardless of which model an agent selects.
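
The placement matters more than the screening logic: if the policy wraps the call site, it applies to every model behind it. The sketch below shows that shape with a toy wrapper. It is not the Model Armor API. The two regular expressions are illustrative stand-ins for a real screening service, and `guarded`, `screen_prompt`, and `screen_response` are hypothetical names.

```python
import re
from typing import Callable

# Illustrative patterns only; a real screening service covers far more than this.
INJECTION = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class Blocked(Exception):
    """Raised when a prompt fails screening."""


def screen_prompt(text: str) -> None:
    """Reject prompts that match the toy injection pattern."""
    if INJECTION.search(text):
        raise Blocked("prompt matched an injection pattern")


def screen_response(text: str) -> str:
    """Redact email addresses from model output."""
    return EMAIL.sub("[redacted email]", text)


def guarded(model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap any model callable so the same policy runs before and after it."""
    def call(prompt: str) -> str:
        screen_prompt(prompt)
        return screen_response(model(prompt))
    return call
```

Because `guarded` takes any callable, swapping the model underneath (Gemini, Claude, a tuned Gemma) leaves the policy untouched, which is the point of applying it at the platform layer.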

Architecture Patterns

The shapes most Google Cloud GenAI workloads fall into.

1. Enterprise assistant

Gemini Enterprise over your connectors + Workspace, or a custom RAG Engine app on the platform with Model Armor.

2. Production agent

ADK + Agent Runtime + Agent Identity + Gateway (MCP via Apigee) + Observability. Add A2A for multi-agent.

3. SQL-native AI

Gemini-in-BigQuery and BQ vector search bring generation and retrieval to data already in the warehouse.

4. Multimodal pipeline

Gemini long-context over documents/video; Imagen/Veo for generation; SynthID for provenance.

5. Custom/open model service

Tune Gemma 4 or an open model, serve on TPU/GPU endpoints; distill to cut cost.

6. Workspace-embedded

Gemini for Workspace and Code Assist - buy the assistant in the tools people already use.

Decision Matrix

Fast answers for design reviews.

Question | Default answer
Which model? | Gemini 3.x Flash for volume; 3.x Pro for hardest reasoning; Claude (also in Model Garden) when it wins your eval; Gemma 4 for open/on-prem.
Buy or build the assistant? | Gemini Enterprise first; build on the platform for bespoke logic.
Agent framework? | ADK + Agent Runtime; adopt A2A only when you have multi-agent/multi-vendor needs.
Where do vectors live? | BigQuery if data is there; AlloyDB AI for operational/low-latency; Vector Search for largest dedicated indexes.
TPU or GPU? | TPU for Gemini/open models at volume (price/perf); GPU for specific CUDA/framework needs.
RAG how? | RAG Engine or Vertex AI Search for managed; bring-your-own vector store for control.

Pricing & Cost Control

Shape, not exact numbers - rates change and vary by model/region. Confirm on Google Cloud pricing pages.

Lever | How it bills | Control
Gemini API | Per input/output token (and per modality), per model tier. | Flash for routine work; Pro only when needed; cap output; use context caching.
Provisioned Throughput | Reserved capacity for steady high volume. | Commit after you know the load.
Vector Search / RAG | Index storage + query + embedding tokens. | Right-size chunks; prune stale docs; prefer BigQuery if data is there.
Training / endpoints | Accelerator-hours (TPU/GPU) + serving. | Autoscale; batch where possible; TPUs for price/perf.
Agents | Model tokens x steps + tool calls + runtime. | Cap loop length; route cheap model for routine steps.
The agent cost trap
Agent loops multiply token cost by steps. Budget per-conversation, log token usage, and use Flash for routing/routine steps with Pro reserved for the hard parts.
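
The multiplication is easy to make concrete. The sketch below implements the "model tokens x steps + tool calls + runtime" shape from the table with hypothetical rates. All prices, token counts, and the per-tool-call charge are invented for illustration, and it assumes every step uses the same number of tokens, which real loops with growing context will exceed.

```python
def conversation_cost(steps, input_tokens_per_step, output_tokens_per_step,
                      input_rate_per_m, output_rate_per_m,
                      tool_calls=0, cost_per_tool_call=0.0, runtime_cost=0.0):
    """Model tokens x steps + tool calls + runtime, in USD (all rates hypothetical)."""
    per_step = (input_tokens_per_step * input_rate_per_m
                + output_tokens_per_step * output_rate_per_m) / 1_000_000
    return steps * per_step + tool_calls * cost_per_tool_call + runtime_cost


def within_budget(cost, budget):
    """True when a conversation stays within its per-conversation budget."""
    return cost <= budget


if __name__ == "__main__":
    single = conversation_cost(1, 2_000, 500, 0.5, 2.0)
    looped = conversation_cost(10, 2_000, 500, 0.5, 2.0,
                               tool_calls=4, cost_per_tool_call=0.001,
                               runtime_cost=0.01)
    print(single, looped, within_budget(looped, budget=0.05))
```

Summing two calls, one with routine-step counts at a cheaper rate and one with hard-step counts at a higher rate, gives a quick estimate of what routing saves before any traffic exists.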

Risks & Gotchas

Read this one.

Rebrand & naming drift
Vertex AI to Gemini Enterprise Agent Platform, Agentspace to Gemini Enterprise. SDKs, IAM roles, and URLs mix old/new. Confirm current names and check model lifecycle pages before pinning.
Model version expiry
Gemini versions have explicit lifecycles and can be retired. Pin versions, monitor deprecations, and test before upgrading - auto-latest will bite you.
Multi-agent sprawl
A2A and agent-of-agents are powerful and hard to govern. Enforce Agent Identity, least privilege, Registry governance, budgets, and Observability from the start.
Data residency & grounding
Grounding with Google Search and external tools can move data off your boundary. Confirm residency; prefer in-boundary retrieval where compliance requires.
Quotas & capacity
Accelerator and API quotas throttle real workloads; TPU capacity can be contended. Request increases early and design for backoff.
Estate fit
GCP AI shines when BigQuery/Workspace are central. If your data lives in AWS/Azure, weigh egress and identity friction before committing.