Oracle AI, the practical way

This portal covers Oracle's full AI stack as of June 24, 2026. From OCI Generative AI Service, Enterprise AI Agents, and Oracle AI Data Platform, to AI Vector Search in Oracle AI Database 26ai, to the 22 Fusion Agentic Applications launched in March 2026. Architecture, trade-offs, risks, pricing. No marketing talk.

Refreshed June 2026 Architecture-first Enterprise focus Vendor-neutral

TL;DR

Oracle's AI story in 2026 has three centers of gravity. OCI Enterprise AI packages Generative AI Models, Enterprise AI Agents, and Governance into a managed build platform. Fusion Agentic Applications (GA Mar 2026) ships 22 pre-built agentic apps embedded inside Fusion Cloud ERP, HCM, SCM, and CX. Underneath everything sits Oracle AI Database 26ai with native vector search, Select AI, Private Agent Factory, and Private AI Services Container for workloads that must stay database-close or private. If you're an enterprise already on Oracle, this stack is increasingly hard to ignore.

How this portal is organized

Left sidebar groups Oracle AI into seven layers. Each service has its own page with tabs: Overview, Architecture, Models or Features, Pricing, Risks, and When to use. The bottom of the sidebar has decision matrices, architecture patterns, and a cross-cloud comparison.

NEWGenerative AI

Cohere Command A family, Llama 4 Scout & Maverick, xAI Grok 4.x, NVIDIA Nemotron, Google Gemini options, OpenAI open-weight models, importable compatible models, and dedicated AI clusters.

GAAI Vector Search

Native VECTOR datatype, HNSW + IVF indexes, unified hybrid search across vector + relational + JSON + graph + spatial, plus private agents and private AI containers.

NEWFusion Agentic Apps

22 pre-built agentic applications for Finance, HR, SCM, and CX. Native to the transactional system, governed by Fusion roles.

NEWAI Data Platform

Governed data discovery, catalogs, workspaces, pipelines, notebooks, RBAC, and agent-ready data connections for AI teams that need a shared data plane.

Who this is for

Enterprise architects, Oracle DBAs and Apps DBAs moving into AI, technical leads scoping pilots, and anyone who has to defend an Oracle-vs-hyperscaler choice in a steering committee. Assumes you already know cloud, databases, and identity. Does not assume you know what an embedding is, and explains the AI-specific bits as it goes.

The Oracle AI mental model

Think of Oracle's AI in three layers stacked on top of each other.

Figure 1 · Oracle AI is layered. Most enterprises start at Layer 3 (Fusion apps) or Layer 1 (vector search in the DB they already own).

What sets Oracle apart in 2026

Differentiator	What it means in practice
Vectors live in the source-of-truth DB	No separate Pinecone or Weaviate. Transactionally consistent vector search next to your rows. Means RAG ground truth and operational data never drift.
Unified hybrid search	One SQL can join vector similarity with relational predicates, JSON paths, graph hops, and spatial filters. Other vendors need an orchestration layer.
Multi-model gateway (BYO LLM)	Cohere, Meta Llama 4, xAI Grok, NVIDIA Nemotron, one service, one endpoint, one bill. Useful when you want vendor optionality without re-architecting.
Fusion-native agents	22 agentic apps run inside Fusion with full role security, approval hierarchies, and transactional context. Other vendors' agents have to bolt on to ERP via APIs.
Sovereign and Gov coverage	OCI Gen AI is GA in US Gov, US Classified, UAE Central, EU sovereign. Often the only path for regulated workloads.
Free-tier vector search	AI Vector Search ships at no extra licence cost in 26ai. Compared to Postgres + pgvector + vector-DB-as-a-service, the TCO comparison is brutal for the competition if you already pay for the DB.

What Oracle is still weak at (be honest)

Gaps worth naming

Oracle does not own a frontier model. It packages partner/open model families and import paths behind OCI controls. If your differentiation depends on a single exact proprietary model/version that is not exposed in your target OCI region, Oracle is not where you go for that workload, use the model vendor directly or via AWS/Azure, behind a model gateway. Oracle's bet is enterprise integration, governance, data gravity, and private deployment patterns, not owning the top model leaderboard.

Tooling maturity

Compared with AWS Bedrock Studio, SageMaker Studio, and Azure AI Foundry, the developer experience is improving fast but still feels Oracle-Console-y. Notebooks, MLOps lineage, and prompt-engineering UIs are catching up, not leading.

How to read the rest of this portal

Each service tab follows the same shape: Overview → Architecture → Models/Features → Pricing → Risks. If you only have time for one tab, read Risks. The other tabs tell you what something does. Risks tells you what burns you in production.

What's New - Q4 2025 through June 2026

Material changes that affect architecture, cost, or risk decisions. Curated, not a press-release dump.

TL;DR

Three things matter most. One: Oracle AI Database 26ai moved AI into the database, including vector search, Select AI, private agents, hybrid RAG exposed as MCP tools, and private vector-service containers. Two: OCI Enterprise AI is now a broader platform: Generative AI Models, Enterprise AI Agents, and Governance rather than only an endpoint catalog, and the June 2026 wave widened model choice (Nemotron 3 Ultra, Qwen, Gemma, gpt-oss on B200), promoted Cohere Rerank 4 to on-demand, and added OCI Resource Analytics. Three: Fusion Agentic Applications launched Mar 2026 with 22 pre-built agents inside Fusion ERP/HCM/SCM and expanded to CX in Apr 2026.

June 2026 in one line

The June refresh is mostly platform breadth, not new tech direction: more models you can import and host (Nemotron 3 Ultra, Qwen, Gemma, OpenAI gpt-oss on B200), Cohere Rerank 4 now on-demand and on dedicated clusters, full multimodal (Embed 4 + xAI Voice), a new Abu Dhabi footprint for Enterprise AI, and OCI Resource Analytics for AI-queryable cloud-estate data. Source: Oracle "What's New in AI, June 2026" (published June 11, 2026).

Major releases timeline

Date	Release	Why it matters
Oct 2025	Database 23ai renamed Oracle AI Database 26ai	Aligns with calendar versioning. AI Vector Search now standard, not an add-on. Branding tells customers AI is a first-class workload.
Jan 2026	Oracle AI Database 26ai Linux x86-64 on-prem (RU 23.26.1)	Enterprises can run AI Vector Search on their existing Exadata / commodity Linux without going to OCI.
Jan 2026	OCI Gen AI in US Classified Cloud	Top Secret / classified workloads can now use Gen AI without leaving the Oracle classified environment.
Jan 2026	xAI Grok 4.1 Fast + Cohere Command A Vision, Command A Reasoning	Cheaper Grok variant for high-volume. Cohere adds vision and reasoning variants for enterprise agent patterns.
Mar 2026	OCI Enterprise AI GA	Oracle formalizes the stack around Generative AI Models, Enterprise AI Agents, and Enterprise AI Governance.
Mar 2026	Enterprise AI Agents GA	Agent runtime expands beyond basic RAG into a managed platform with tools, vector stores, responses API, and governance hooks.
Mar 2026	Fusion Agentic Applications launch (22 apps)	Native ERP/HCM/SCM agents. Not bolt-on. Approval hierarchies and Fusion roles flow through automatically.
Mar 2026	AI Agent Studio adds Agentic Applications Builder	No-code orchestration of Oracle, partner, and external agents. Free with Fusion subscription.
Mar 2026	AI guardrails for OCI Gen AI on-demand	Native guardrail evaluation changes the production control model: validate prompts and responses at the platform layer, not only in app code.
Mar 2026	NVIDIA GTC 2026: OCI Superclusters with GB200 NVL72	For frontier training. Less relevant to most enterprises but signals Oracle's continuing GPU access advantage.
Apr 2026	RU 23.26.2 for Oracle AI Database 26ai	Quarterly cadence now driving vector improvements (DML on HNSW, hybrid search refinements).
Apr 2026	Fusion Agentic Apps for CX (Sales, Service, Marketing)	Expands from back-office (ERP/HCM) into customer-facing flows.
Apr 2026	NVIDIA Nemotron 3 Nano Omni on OCI Gen AI	Adds a strong small-model option for multimodal use cases on commodity GPUs.
May 2026	Import compatible models into OCI Gen AI	Architecturally important: teams can bring compatible models such as Qwen/Gemma-style models into the OCI Gen AI control plane instead of leaving everything in Data Science.
May 2026	Cohere Embed 4 supports mixed text + image input	Useful for multimodal RAG over PDFs, slides, screenshots, catalog images, and claims packets.
May 2026	Cohere Rerank 4 on OCI Gen AI	Better second-stage retrieval quality for RAG. Drop-in upgrade for existing RAG pipelines.
May 2026	xAI Voice text-to-speech on OCI Gen AI	Adds hosted TTS to Oracle's Gen AI layer; use for call-center summaries, training narration, accessibility, and agent voice responses.
May 2026	Grok 4.3 on OCI	Model catalog expansion for reasoning-heavy research and analysis workloads. Confirm region/model availability before designing around it.
May 2026	OCI Gen AI in UAE Central (Abu Dhabi)	Sovereignty win for GCC customers and BFSI. Bedrock and Azure OpenAI parity issue for the region.
Jun 2026	Deprecated Gen AI APIs become unavailable	Do not build new integrations on legacy GenerateText/SummarizeText style APIs. Use the current chat / responses APIs and SDK patterns.
Jun 2026	Cohere Rerank 4 now on-demand and on Dedicated AI Clusters	Reranking is no longer dedicated-cluster only. On-demand pricing lowers the barrier to adding a second-stage reranker to existing RAG pipelines. Quality lift over raw vector search for little engineering cost.
Jun 2026	NVIDIA Nemotron 3 Ultra on OCI Enterprise AI (dedicated clusters)	Open-weights frontier reasoning/agentic model you host on Oracle-recommended GPUs behind a managed OCI endpoint. Option for teams that want a strong open model under their own control plane, not a vendor API.
Jun 2026	New Model Import models: Alibaba Qwen, Google Gemma; gpt-oss-20b/120b on B200 in Abu Dhabi	Widens the bring-your-own-model catalog inside the OCI Gen AI control plane. gpt-oss on B200 clusters in UAE Central pairs open OpenAI weights with sovereign-region hosting.
Jun 2026	Multimodal: Cohere Embed 4 (text/image/combined) + xAI Voice TTS in Enterprise AI	Confirms multimodal RAG and voice as first-class on the platform. Embed 4 handles mixed text-image inputs; xAI Voice covers narration, accessibility, and agent voice responses.
Jun 2026	OCI Enterprise AI GA in UAE Central (Abu Dhabi)	Moves beyond a single Gen AI endpoint to the full Enterprise AI stack in-region, on-demand or dedicated. Data-residency win for GCC and BFSI customers.
Jun 2026	OCI Resource Analytics for cloud-estate intelligence	Near-real-time view of resources, relationships, and config metadata across regions/tenancies. Runs on Oracle AI Database with Select AI and MCP server support, so agents and assistants can query your estate in natural language.
Jun 2026	OCI AI Accelerator Packs + Enterprise AI Chat reference architecture	Preconfigured, self-service AI solutions launchable from the OCI Console, plus a published reference architecture and GitHub deployment guide for enterprise-grade AI chat. Lowers time-to-first-pilot.
Jun 2026	Hybrid RAG in 26ai exposed as an MCP tool	Oracle guidance now shows turning a 26ai vector index into an MCP tool for hybrid (vector + keyword) RAG. Signals MCP becoming the default integration surface between the database and agents.

Practical implications for architects

If you have an existing Oracle DB estate

26ai upgrade unlocks RAG without buying a vector DB. Plan an architecture review: which workloads can move to in-DB embeddings vs which need OCI Gen AI Agents service? The decision often comes down to whether the corpus is mostly structured (DB-side) or mostly documents (Agents service).

If you run Fusion Apps

Pilot 1-2 of the 22 Agentic Apps now. They are included in your Fusion subscription. The build-vs-buy math for custom agents got worse, Oracle's are pre-wired into roles, approvals, and data. Build only what Fusion does not cover.

If you are building greenfield AI on OCI

Start with Enterprise AI Agents and the current Responses-style APIs for managed RAG, tools, vector stores, and governance. Drop into Data Science only when you need custom model training or hosting AI Quick Actions doesn't cover.

If multiple teams need the same AI data

Use Oracle AI Data Platform to define governed data products, catalogs, owners, lineage, RBAC, and refresh rules. Do not let every chatbot create its own private copy of the corpus.

If sovereignty drives the decision

OCI's reach into Gov, Classified, UAE, and EU sovereign regions has widened. For workloads where the data physically cannot leave a jurisdiction, OCI Gen AI is increasingly the only major-vendor option with a GA service.

Watch-out: version naming

Internally, 26ai is version 23.26.1 (Jan 2026 RU). Patches, MOS notes, and many docs still say "23ai" or "Database 23c/23ai". Don't let the marketing rename confuse procurement or patching playbooks. 26ai = 23ai with calendar-year naming. Same product family.

Service Map

Every Oracle AI service worth knowing, in one diagram. Use this to orient before you go deep.

Figure 2 · Oracle AI service map, June 2026. Layers from consume (top) to infrastructure (bottom).

Reading the map

The top band is where you start if you are buying outcomes, pick a Fusion agent, hook up APEX RAG, or use a managed Gen AI Agent. The middle band is where you start if you are building, pick a model, write prompts, expose endpoints. The bottom two bands are where the data and compute live; you do not get to ignore them, because they drive cost and latency.

A common mistake

Teams start at the platform band (Gen AI Service) when an app-band service (Generative AI Agents) would have shipped in a month instead of six. Always check whether a managed service one layer up does what you need before you build it yourself.

OCI Generative AI Service GA

Oracle's managed foundation-model platform. Use hosted models, import compatible models, build Enterprise AI Agents, apply guardrails, or rent a dedicated AI cluster. Region-restricted, IAM-governed, OCI-billed.

Official documentation ↗

Overview

Architecture

Capabilities

Pricing model

Risks & gotchas

When to use

TL;DR

One platform, several control points. You can call hosted models for chat, embeddings, rerank, and text-to-speech; import compatible models into the Gen AI control plane; build Enterprise AI Agents; and apply native guardrails. You pick on-demand for elasticity or dedicated AI clusters for steady throughput, isolation, fine-tuning, and private capacity.

What problem this solves

Most enterprises don't want to manage GPU clusters, model weights, guardrail services, vector-store plumbing, and vendor contracts separately. They want a single OCI-governed surface with IAM, private networking, logging, cost controls, and the ability to swap models without rewriting the app. That's the offer. The trade-off is catalog and feature availability vary sharply by region and model family.

Two consumption modes: pick one per workload

Mode	How you pay	Latency & isolation	Best for
On-demand	Per 1M characters (input + output for generation; input only for embeddings)	Shared GPU pool. Burst-tolerant. Variable latency under load.	Prototyping, low-volume prod, spiky workloads, dev environments.
Dedicated AI Cluster	Hourly per cluster, irrespective of utilization	Dedicated GPUs in your tenancy. Stable latency. Tenancy-isolated.	Steady high-volume traffic, regulated data, sub-second SLA, fine-tuned models, custom-trained adapters, imported compatible models.

Rule of thumb

Cross over to a dedicated cluster when your on-demand bill exceeds the cluster hourly rate by ~30%. Below that, on-demand is cheaper and operationally simpler. Above it, dedicated wins on cost and latency.

Reference architecture

Figure · OCI Generative AI Service reference. Service gateway keeps traffic on the Oracle backbone, never the public internet.

Network and identity

The Gen AI Service endpoint is reachable through a Service Gateway and supports private endpoint patterns for workloads that should stay off the public internet. Authentication is OCI IAM, calls from compute instances use instance principals or resource principals, external apps use signed requests or governed API-key patterns. Authorization is granted via IAM policies on the generative-ai-family resource type. You can scope to compartments, model endpoints, and agent resources.

Where the data goes

Oracle's stated position is that on-demand requests are not used to train shared models. Dedicated clusters provide stronger tenant isolation. Logs of prompts and completions can be captured to OCI Logging at your discretion. For regulated data, prefer dedicated clusters, private endpoints, zero-trust network controls where available, and VCN-side egress rules so the only allowed path is to OCI services you approved.

Capability matrix (June 2026)

Capability	On-demand	Dedicated cluster	Notes
Text generation	●	●	All chat / completion models.
Embeddings	●	●	Cohere Embed v4 multimodal, English + multilingual.
Reranking	●	●	Cohere Rerank 4 (May 2026).
Vision (image input)	●	●	Cohere Command A Vision, Llama 4 multimodal, Grok 4.3 vision.
Text-to-speech	●	◐	xAI Voice support arrives through OCI Gen AI, separate from OCI Speech transcription.
Function calling / tools	●	●	Standard JSON-mode tool calling on Cohere & Llama.
Responses API / hosted tools	●	◐	Use the current Responses-style APIs for agentic applications; do not build on deprecated text APIs.
Streaming output	●	●	SSE.
Import compatible models	○	●	Use when a model is compatible with the OCI Gen AI serving path but is not yet in the managed catalog.
Fine-tuning (LoRA / T-Few)	○	●	Requires dedicated cluster.
Custom-trained adapters	○	●	Per-customer model endpoints.
Long context (>200K)	◐	●	Llama 4 Maverick & Grok 4.20 support longer contexts; quotas tighter on shared pool.
Content moderation / guardrails	●	●	Native guardrail evaluation for prompts/responses; supported controls vary by on-demand vs dedicated endpoint.

Region availability (as of June 2026)

Always confirm in the Console, but at time of writing OCI Gen AI is GA in: US (Chicago, Phoenix, Ashburn), Frankfurt, London, Amsterdam, Tokyo, Osaka, Sydney, Mumbai, Hyderabad, São Paulo, Toronto, Saudi Arabia (Jeddah, Riyadh), Israel, Singapore, Seoul, UAE Central (Abu Dhabi, full Enterprise AI as of June 2026), US Gov Cloud, US Classified Cloud. Not every model is in every region, Grok and Nemotron have narrower footprints, Cohere is widest. Abu Dhabi now also hosts imported OpenAI gpt-oss-20b/120b on B200 dedicated clusters.

Region trap

Model availability lags region launch. A region may have Cohere Command A but not Llama 4 Maverick. Pin model+region in your architecture review before committing.

Pricing mental model

On-demand pricing is per 1 million characters, not per token. Character count includes whitespace. For generation, you pay input + output; for embeddings, input only. For dedicated clusters you pay hourly per unit, where one "unit" is a specific GPU configuration that varies by model family. Verify current rates on the OCI pricing page, they move.

Cost behaviour by mode

Workload	Best mode	Why
POC / pilot, <5M chars/day	On-demand	Pay only for what you call. Cluster idle cost would dominate.
Steady 50M+ chars/day, predictable	Dedicated cluster	Hourly rate amortizes well past a threshold; latency stable.
Fine-tuned Cohere for legal review	Dedicated cluster (required)	Custom adapters only deploy on dedicated.
Multi-tenant SaaS, bursty per customer	On-demand with circuit breakers	Quotas + retry shed load gracefully; cluster overprovisioning expensive.
Regulated data with isolation requirement	Dedicated cluster	Tenant isolation is the buying criterion, not cost.

Hidden cost drivers

System prompt bloat. Every request pays for the full system prompt. A 4KB persona prompt at scale dominates the bill. Use prompt caching where supported, or templatize.
Naive RAG context windows. Stuffing 20 chunks into context costs 20x more than stuffing 4. Use reranking (Rerank 4) to cut to top-K, then send.
Retries and timeouts. A 504 retry is two billed calls. Cap retries, log them, set them as alarms.
Egress. Calls from outside OCI to the Gen AI endpoint can incur OCI egress on the response path. Keep callers inside OCI when volume is high.
Embeddings re-runs. Re-embedding your whole corpus when you change models is expensive. Version your embeddings and decide policy up front.

Risks to think about before production

Risk	Impact	Mitigation
Model deprecation	Apps break when a model is retired	Abstract model name behind a config flag. Test against the next model in the family early. Subscribe to OCI release notes for Gen AI.
Quota throttling under burst	5xx during peak, lost revenue	Set Alarms on 429/503 from the endpoint. Request quota increases proactively. Move hot workloads to dedicated.
Region-model mismatch	Cannot deploy because model missing in region	Document model-region matrix as part of architecture review. Use a different region for inference if data residency allows.
Cross-tenant prompt leaking	Sensitive data echoed to other tenants in your SaaS	Per-tenant prompt isolation, no global cache of completions, audit log review.
Hallucinated tool calls	Agent calls wrong API with wrong args	Strict JSON schema validation, dry-run flag, idempotent tool design, human-in-loop on side-effectful tools.
Prompt injection from documents	RAG-fed document overrides system prompt	Use a defensive system prompt; classify retrieved chunks; mark untrusted content explicitly; pre-scan inbound docs.
Cost runaway from agent loops	Recursive agents consume thousands of dollars overnight	Per-session token budget, max-step cap, alarm on cost per session, kill switch.
Compliance audit gaps	Cannot prove what the model said to whom and when	Always log prompt + completion + model + version to OCI Logging. Retain per regulatory policy.

Use Gen AI Service when…

You need an LLM endpoint inside your OCI tenancy with IAM, VCN, and audit aligned to your existing controls.
You want vendor optionality across Cohere, Llama, Grok, NVIDIA, Gemini-style integrations, and importable compatible models without running each stack yourself.
You are building a custom app or pipeline, not consuming a pre-built one.

Skip it and use something else when…

Your use case is covered by a Fusion Agentic App, use that instead, it ships in days.
You need a specific frontier model or exact version that is not exposed in your OCI region, go direct, use that vendor's platform, or isolate the exception behind a model gateway.
Your corpus is mostly documents and you want managed RAG, use OCI Generative AI Agents (Knowledge Bases) instead of writing your own retrieval.
Your traffic is <100K requests/month, the per-character bill is fine, but evaluate whether OpenAI direct is operationally simpler given your team's existing tooling.

OCI Generative AI Agents GA ENTERPRISE AI AGENTS

Oracle's managed agent runtime. Build RAG agents, tool-using agents, and Responses API applications with vector stores, knowledge bases, hosted tools, session state, and governance hooks.

Official documentation ↗

Overview

Architecture

Knowledge bases

Tools & orchestration

Risks & gotchas

When to use

TL;DR

Gen AI Agents started as managed RAG; by 2026 it is part of OCI Enterprise AI Agents. It wraps the Gen AI model layer with knowledge bases, vector stores, hosted tools, multi-turn session state, Responses-style APIs, and governance controls. Result: managed agents without owning every piece of retrieval, tool-calling, and audit plumbing. Trades flexibility for time-to-market.

What you get out of the box

Knowledge Bases backed by Object Storage, OCI OpenSearch, or Oracle Database 23ai/26ai Vector Search.
Vector Stores for reusable retrieval assets across assistants and hosted tools.
Multi-turn conversations with automatic context retention per session.
Custom instructions at agent level (system-prompt-like persona).
Responses API patterns for streaming, tools, file search, and stateful turns.
Guardrails integrated with OCI Enterprise AI Governance.
Human-in-the-loop approval steps as a first-class concept.
Citations back to source documents so users can verify.

What it is not

It is not a replacement for Fusion Agentic Apps when a pre-built agent already covers the process. It is not a free-form "let the model do anything" runtime; tools, memory, files, and guardrails still need explicit design. And it does not remove the need to think carefully about chunking, embedding, metadata, identity, and retrieval quality, managed doesn't mean magic.

Reference architecture

Figure · A request flows client → session → agent → LLM. The agent decides whether to query the KB, call a tool, or just answer. Citations flow back.

Three knowledge base source types

Source	Best for	How indexing works	Trade-offs
OCI Object Storage	Static document corpora (PDFs, DOCX, MD)	Service ingests files on a schedule, chunks, embeds, stores in a service-managed vector store	Simplest. Limited tuning. Refresh latency on source changes. Good first move.
OCI Search with OpenSearch	BYO indexed corpus where you control chunking and metadata	You ingest and index into OpenSearch yourself; agent queries it; chunks must be <512 tokens	You own the pipeline. More work. Better when corpus is large or filtering is heavy.
Oracle AI Database 23ai/26ai	RAG over relational + document corpora with security filters	Documents and vectors live in the DB; uses native VECTOR datatype and HNSW/IVF; agent issues hybrid SQL	Best when the DB is already your source of truth. Row-level security flows through. Requires DBA skills.

Decision shortcut

If your corpus is <10K docs and rarely changes, use Object Storage. If you need to filter by user permissions or join with relational data, use Oracle Database 23ai. If you have an existing search team and a large heterogeneous corpus, use OpenSearch.

Tools = how the agent acts

An agent without tools is a chatbot. With tools, it can call APIs, query systems, write to records, search files, or invoke governed internal services. The agent picks tools by function-calling on the underlying LLM. You define each tool with a name, description, and JSON schema. The runtime validates the model's output against the schema before invoking the backing OCI Function, HTTP endpoint, integration flow, or MCP-style tool server.

Common tool patterns

Read-only lookups: query Fusion HCM, Service Cloud, custom APIs. Safe to call freely.
Side-effectful actions: submit POs, create tickets, send emails. Wrap in human-in-loop approval.
Computation: call a calculator/converter function. Cheap to allow.
Hosted tool use: file search, code execution, and vector-store lookup where the Responses API supports it.
Long-running jobs: start a workflow in OCI Functions, return a job ID, poll for status.

Multi-agent orchestration

Native multi-agent orchestration is provided through the AI Agent Studio Agentic Applications Builder (Mar 2026 release). It lets you compose multiple Gen AI Agents into a workflow with shared memory, conditional routing, and ROI measurement. For Fusion customers it is free and the recommended path. For non-Fusion environments, you can compose at the application layer with the SDK.

Risks and gotchas

Risk	What goes wrong	What to do
Stale knowledge base	Source bucket updates but the index hasn't re-ingested yet	Configure ingestion schedule, monitor lag, surface a "Last refreshed" timestamp to users.
Wrong chunk size	Retrievals are too narrow or too wide, hurting answer quality	Default chunking rarely optimal for technical docs. Pilot with OpenSearch where you control it.
Citations don't match claim	LLM invents text but cites a real chunk	Strict prompting + post-hoc verification step. For high-stakes use, render the chunk text alongside the answer.
Permission bleed	Agent returns a doc the user shouldn't see	Filter at retrieval time using user identity. With 23ai KB this is straightforward via VPD. With Object Storage you must bucket-segregate or pre-filter.
Tool failure cascade	Tool returns error, agent retries, retries, retries	Cap retries, expose tool errors as plain text in the conversation, add max-step.
Slow first-token under load	Cold-start on the underlying LLM hurts perceived latency	Pre-warm via synthetic traffic; for SLA-critical agents use dedicated cluster.

Use Generative AI Agents when

You need an internal RAG chatbot over a corpus, and you don't want to build a retrieval pipeline.
You want managed citations and multi-turn out of the box.
Your knowledge base is one of the three supported source types.

Skip when

You need exotic retrieval (graph RAG, multi-hop reasoning across sources), build with the Gen AI Service directly.
Your use case is in Fusion, use the Fusion Agentic App instead.
You need on-prem inference, Agents is a managed service in the cloud.

Foundation Model Catalog

Model families and model-delivery paths to understand on OCI Generative AI as of June 24, 2026. Always confirm exact model IDs and region availability in the OCI Console before implementation.

Official documentation ↗

TL;DR

Oracle is not trying to own one frontier model. The 2026 architecture is a model platform: Oracle-hosted Cohere, Meta, xAI, and NVIDIA families; Google Gemini options where exposed through the OCI Gen AI integration path; OpenAI open-weight models through AI Quick Actions; and compatible-model imports for dedicated serving. Treat model identity as configuration, not application logic.

Models by family

Cohere: Oracle's strategic partner

Model	Type	Strengths	Best for
Cohere Command A	Chat / generation	Strong RAG behavior, enterprise tone, multilingual	Default general-purpose chat agent for enterprise apps
Cohere Command A Vision Jan 2026	Multimodal	Image + text understanding	Document understanding pipelines, screenshot Q&A
Cohere Command A Reasoning Jan 2026	Reasoning	Chain-of-thought, multi-step planning	Agent planning, complex tool selection
Cohere Embed v4	Embeddings	Multilingual, multimodal, 1024-dim or 256-dim	Default embedding model for RAG on OCI
Cohere Rerank 4 Jun 2026	Reranker	Pairwise scoring of query vs candidate; cuts top-N to top-K. Now available on-demand and on dedicated clusters (Jun 2026), not dedicated-only	Second-stage RAG retrieval; quality lift over raw vector search

Meta: open weights, broad fit

Model	Strengths	Best for
Meta Llama 4 Scout	Efficient, smaller MoE; cheap inference	High-volume classification, summarization, lightweight RAG
Meta Llama 4 Maverick	Larger MoE; long context; multimodal	Long-document analysis, complex multi-doc RAG
Meta Llama 3.3 70B	Dense, well-understood baseline	Fine-tune target where you have abundant labeled data

xAI Grok

Model	Strengths	Best for
Grok 4.3 May 2026	Strong reasoning, real-world knowledge breadth	Research assistants, analyst summarization
Grok 4.20	General-purpose chat, faster variant where regionally available	Consumer-facing agents where latency matters
Grok 4.20 Multi-Agent	Model-side support for multi-agent style orchestration where available	Workflows with multiple specialist sub-agents
Grok 4.1 Fast Jan 2026	Lowest-cost Grok variant	High-volume routing, low-complexity tasks

NVIDIA

Model	Strengths	Best for
Nemotron 3 Nano Omni Apr 2026	Small footprint, multimodal, optimized for NVIDIA stack	Edge-ish inference, multimodal classification, cost-sensitive workloads
Nemotron 3 Ultra Jun 2026	Open weights, training data, and recipes; frontier reasoning and agentic performance. Hosted via OCI Enterprise AI imported-model deployment on dedicated AI clusters	Teams that want a strong open model on Oracle-recommended GPUs behind a managed OCI endpoint and their own control plane

Other delivery paths: Gemini, gpt-oss, and imported compatible models

Path	What it means	Best for
Google Gemini model options	Use when exposed through the OCI Gen AI integration path in your region and tenancy. Treat availability as a region-specific architecture dependency.	Teams that want Gemini behavior but need OCI-side governance, networking, or billing alignment.
OpenAI gpt-oss in AI Quick Actions	Open-weight OpenAI models deployed through OCI Data Science / AI Quick Actions rather than the managed Gen AI on-demand catalog.	Private/custom deployments where open weights matter more than a managed token endpoint.
Import compatible models Jun 2026	Bring compatible model artifacts into OCI Generative AI dedicated serving so they can use the same endpoint and governance patterns. June 2026 added Alibaba Qwen and Google Gemma families, plus OpenAI gpt-oss-20b/120b on B200 clusters in Abu Dhabi.	Model standardization when your chosen model is not yet a first-party catalog model.
Direct external API	Keep Claude/GPT/Gemini direct calls behind your own model gateway when the exact model, version, or region is not available on OCI.	Exception workloads where model quality beats platform consolidation.

Choosing a model: quick heuristic

If your need is…	Start with
Default enterprise chat / RAG agent	Cohere Command A
Image + text in the same prompt	Cohere Command A Vision or Llama 4 Maverick
Complex multi-step planning	Cohere Command A Reasoning or Grok 4.3
Cheap, high-volume classification	Grok 4.1 Fast or Llama 4 Scout
Long context (>200K tokens)	Llama 4 Maverick or Grok 4.20
Multi-agent native orchestration	Grok multi-agent variants where available, or Enterprise AI Agents / AI Agent Studio for platform orchestration
Text-to-speech agent output	xAI Voice on OCI Gen AI
Embeddings for RAG	Cohere Embed v4
Reranking RAG candidates	Cohere Rerank 4
Fine-tuning on your data	Llama 3.3 70B (mature), Cohere via dedicated cluster, or AI Quick Actions for open-weight models

Model availability moves

Oracle ships new models monthly. Always confirm the current list and per-region availability in the OCI Console under Generative AI before locking in a model name in code or a contract.

Embeddings & Rerank

The unglamorous half of RAG. Get embeddings wrong and the LLM has no chance.

Official documentation ↗

What embeddings actually are

An embedding is a fixed-length vector of floats that represents semantic content. Two passages that mean similar things produce vectors that point in similar directions. Vector search finds the K nearest neighbors of a query vector and returns the passages they came from. That is retrieval. The LLM then writes an answer from those passages. That is the generation.

Cohere Embed 4: the default on OCI

Property	Value
Default dimensions	1024 (with a 256-dim variant for cost/storage sensitivity)
Languages	100+ via the multilingual variant
Input modalities	Text, image, and mixed text+image input for multimodal retrieval patterns
Max input	~512 tokens per chunk (chunk first, embed second)
Where it runs	OCI Generative AI Service, on-demand or dedicated

Chunking discipline (the part teams skip)

Chunk by structure first, length second. Split on headings, paragraphs, table rows, not arbitrary character counts.
Aim for ~300-500 tokens per chunk. Smaller chunks improve precision; larger improve context.
Overlap by 10-15%. Prevents losing the cross-boundary sentence.
Carry metadata. Source URI, page number, last modified, owning department. You will need this for filtering and citations.
Re-embed on policy change. Switching embedding models or dimensions means re-embedding the entire corpus. Plan version, cost, and rollback upfront.

Two-stage retrieval (the pattern that wins)

Figure · Two-stage retrieval. Vector recall is wide and cheap; rerank is narrow and precise.

Hybrid search: keyword + vector

Vector search misses queries like "Form 1099-B" because the model treats it as similar to many other tax forms. Keyword search nails it. Hybrid combines both with a weighted score. Oracle AI Database 26ai's Unified Hybrid Vector Search supports this natively in one SQL. OCI OpenSearch supports it as well via score blending.

Enterprise AI Governance & Guardrails

The platform controls around models and agents: guardrails, private endpoints, API keys, IAM, audit, and network isolation.

Official documentation ↗

What Oracle provides natively

Guardrails for OCI Generative AI: prompt and response checks for unsafe content, prompt injection, and other policy violations.
On-demand guardrail evaluation: call guardrails directly around a model request, or compose guardrails into your agent path.
Dedicated endpoint guardrails: inform/block behavior for dedicated AI cluster endpoints where supported.
PII detection via OCI Language Service, still useful as a deterministic pre/post filter when you need explicit PII categories.
Agent-level governance: citations, tool schemas, human-in-loop approvals, and session limits.
Private networking controls: private endpoints, service gateways, IAM policies, resource principals, and zero-trust network patterns where available.

What still belongs in your architecture

Control	Why Oracle's platform control is not enough by itself	Architecture move
Domain policy	Generic guardrails do not know your business rules, competitors, contract terms, or regulatory scope.	Keep a domain policy layer in your app or agent instructions, then post-check outputs against explicit policy.
Authorization	A model can only be safe if retrieval and tools enforce the user's actual entitlements.	Filter before retrieval. Use VPD / row-level security in 26ai, Fusion roles in Fusion, and compartment/IAM boundaries in OCI.
Tool safety	Guardrails do not make a side-effectful tool safe.	Schema validation, dry-run mode, idempotency keys, approval gates, and maximum-step budgets.
Audit evidence	Compliance needs exact inputs, outputs, model, version, user, tool calls, and citations.	Write structured audit events to OCI Logging or your SIEM for every model/agent turn.
Network isolation	Private endpoint support must still be paired with route, DNS, and egress controls.	Use service gateways/private endpoints, deny public egress, and document every approved outbound path.

Layered defense pattern

Figure · Guardrails are one layer. Production governance also needs identity, retrieval filtering, tool control, audit, and network isolation.

The one rule

Treat every retrieved chunk as untrusted user input. Your system prompt and guardrail policy must say so explicitly. This single discipline blocks most indirect prompt injection.

Oracle AI Vector Search GA 26ai · Jan 2026

Vectors as a first-class datatype, inside Oracle Database, indexed by HNSW or IVF, and joinable with relational, JSON, graph, and spatial in a single SQL statement.

Official documentation ↗

Overview

Architecture

Features (26ai)

Indexes: HNSW vs IVF

Pricing & sizing

Risks & gotchas

When to use

TL;DR

If your data already lives in Oracle Database, AI Vector Search means RAG without a separate vector store. The vector lives next to the row. Permissions, backups, replication, failover, all reuse what you already operate. Comparable functionally to pgvector, Pinecone, or Weaviate, but with the killer feature of Unified Hybrid Search: vectors joined with relational predicates, JSON paths, graph hops, and spatial filters in one query. No separate orchestration layer.

What it actually is

A new native datatype, VECTOR(dimensions, format), with two index types (HNSW and IVF), a SQL function set (VECTOR_DISTANCE, VECTOR_EMBEDDING, VECTOR_CHUNKS), and the ability to load ONNX embedding models into the database so embedding generation happens server-side without network calls. Plus the SQL planner has been extended to combine vector predicates with normal predicates intelligently.

Why this is a big deal for Oracle shops

No new license. Included in all editions of 26ai, including Standard Edition 2.
No new operational team. Your existing DBAs run it.
Row-level security flows through. A VPD policy that protects the row also protects its vector.
Backups already cover it. RMAN, Data Guard, GoldenGate just work.
Transactionally consistent retrieval. Vector search returns results consistent with your read snapshot, a property no standalone vector DB offers.

Architecture

Figure · 26ai keeps relational, vector, JSON, spatial, graph, and ONNX-embedding in a single engine. SQL is the API.

Feature set: what 26ai adds vs first 23ai release

Feature	Status	Why it matters
VECTOR datatype	GA	First-class storage. Variable dimensions and formats (FLOAT32, FLOAT16, INT8, BINARY).
HNSW index	GA	In-memory graph index. Fastest recall for moderate corpora that fit in Vector Memory Pool.
IVF index	GA	On-disk partitioning index. Scales to very large corpora without memory pressure.
HNSW with DML 26ai	GA	Transactionally consistent vector queries even with concurrent inserts/updates, including on RAC.
Unified Hybrid Vector Search 26ai	GA	Mix vector + relational + JSON + spatial + graph predicates in one query, planned together.
In-DB ONNX embedding	GA	Generate vectors server-side. No network egress for the embedding step.
DBMS_VECTOR_CHAIN	GA	PL/SQL package for chunk → embed → store → retrieve pipelines.
Distance functions	GA	L2, cosine, dot, Hamming, Manhattan. Pick per use case.
Quantization	GA	Reduces vector storage 4-32x with controlled accuracy loss.
Globally Distributed DB vector search	GA	Vector search across sharded deployments. For geo-distributed corpora.
Free in Autonomous DB free tier	GA	Try it on the always-free ATP without spending a cent.

Unified Hybrid Search: one query, many predicates

The standout 26ai capability. A single SQL can ask: "Find me passages semantically similar to this query, where the owning department is HR, that mention 'parental leave' (text predicate), authored after Jan 2025, in any of these JSON-tagged jurisdictions." The optimizer plans vector and non-vector predicates together. In other architectures this requires post-filtering or a metadata sidecar. In 26ai it's one statement.

Index trade-offs

Property	HNSW	IVF
Storage	In-memory (Vector Memory Pool)	On-disk
Query speed	Sub-millisecond for moderate corpora	Single-digit ms with right partitioning
Build cost	Higher; graph construction	Lower; partition-based
DML support	Yes (26ai), transactionally consistent	Yes
Best fit	≤ few million vectors, latency-sensitive	Tens of millions+, memory-constrained
RAC behavior	Replicated on all instances	Distributed across instances
Tuning knobs	M, ef_construction, ef_search	nlist, nprobe

Pick HNSW by default; switch to IVF when the index doesn't fit in memory

The Vector Memory Pool is sized via parameter vector_memory_size. Monitor it. If HNSW pages start spilling, performance collapses and you should re-plan as IVF or shard the corpus.

Licensing

AI Vector Search is included in all editions of Oracle AI Database 26ai at no additional license cost. Standard Edition 2, Enterprise Edition, Autonomous Database, Exadata Cloud@Customer, Exadata Database Service, all include it. This is the single biggest commercial pivot from prior versions, where vector workloads required additional features or third-party tools.

What you actually pay for

CPU/memory of the DB hosting vectors. Plan extra memory for HNSW (rule of thumb: count × dim × 4 bytes × 1.5 overhead).
Storage for vectors. A 1024-dim FLOAT32 vector ≈ 4 KB. 10M vectors ≈ 40 GB. Add overhead for indexes.
Embedding generation. If you use OCI Gen AI for embeddings, you pay per character at the Gen AI rate. If you use ONNX in-DB, no per-call charge, just CPU.
RAC + Data Guard if you need HA. Standard DB licensing rules apply.

Sizing example

Corpus	Vectors	Storage (FLOAT32)	HNSW RAM ballpark
Internal wiki, 50K docs × 10 chunks each	500,000	~2 GB	~6-10 GB
Product catalog with descriptions + reviews	5,000,000	~20 GB	~60-100 GB
Legal corpus, fine-grained	50,000,000	~200 GB	HNSW won't fit; use IVF

Risks and gotchas

Risk	What goes wrong	Mitigation
Vector Memory Pool spill	HNSW degrades when index doesn't fit; latency blows up silently	Monitor `v$vector_memory_pool`; alarm on usage > 80%; pre-plan IVF migration path.
Re-embedding cost	Switching embedding models requires regenerating all vectors	Version the embedding model in metadata; batch re-embed; budget the LLM cost.
Chunking baked into the table	Bad chunk size hurts forever unless re-ingested	Store raw doc + chunks separately; design re-chunkability from day one.
RAC HNSW replication overhead	HNSW index duplicated on every instance; memory bloat at scale	For very large indexes on RAC, consider IVF or distribute across shards.
Quantization accuracy loss	FLOAT32→INT8 saves space but can shift top-K results	A/B test recall before adopting; keep one full-precision baseline.
Hybrid query plan surprises	Optimizer picks wrong order; vector predicate evaluated on too many rows	Use SQL hints, gather stats on vector columns, test with EXPLAIN PLAN.
ONNX model drift	Embedding model loaded into DB grows stale vs the OCI hosted version	Pin a model version per table; document upgrade procedure.
PII in vectors	Embeddings can leak the original text via inversion attacks	Treat vector columns as PII; protect with VPD; encrypt at rest (TDE on by default in Autonomous).

Use AI Vector Search when

Your source-of-truth data already lives in Oracle Database.
You need vector retrieval to respect existing row-level security.
You want to join vector similarity with relational, JSON, spatial, or graph predicates in one query.
You don't want to operate a separate vector DB.
You need on-prem inference (Exadata, Linux x86-64), 26ai is on-prem GA Jan 2026.

Skip and use something else when

Your data lives outside Oracle and pulling it in is impractical, use OCI OpenSearch or a Knowledge Base backed by Object Storage.
You need exotic ANN algorithms (DiskANN, ScaNN) that 26ai doesn't ship, go to a specialist vector DB.
You're a Postgres shop without Oracle, pgvector is fine for moderate scale.

Select AI GA

Natural language to SQL inside the database. PL/SQL package, four modes, multiple LLM providers, RAG-capable. Available in Autonomous and on-prem 26ai.

Official documentation ↗

TL;DR

Select AI lets users ask the database in plain English. Behind the scenes DBMS_CLOUD_AI sends the question plus schema metadata to an LLM (OpenAI, Cohere, Azure OpenAI, or OCI Gen AI), gets SQL back, and either runs it (runsql), shows it (showsql), explains the result (narrate), or chats (chat). Reported accuracy ~95% on TPC-H. Useful for analyst self-service. Not a replacement for hand-tuned queries on hot paths.

The four modes

Mode	What it returns	Typical use
`runsql`	Executes the generated SQL and returns rows	Self-service reporting for trusted users
`showsql`	Returns SQL text without executing	Analyst review before running; explainability
`narrate`	Returns SQL + natural-language explanation of results	Business-user dashboards, embedded BI
`chat`	General chat with the underlying LLM, no SQL focus	General-purpose assistant from within the DB

Provider integration

Select AI is provider-pluggable. You create an AI profile that names a provider (OpenAI, Cohere, Azure OpenAI, OCI Generative AI) and credentials, then attach it to a session. Switching providers is a config change, not a code change. Credentials live in Vault.

Where it fits in an enterprise

Good fit

Internal analytics self-service. Quarter-end ad-hoc questions. Sales ops, finance ops, customer support analytics. Embedded chat-with-data in APEX apps. Low-volume, knowledgeable users who can spot a wrong SQL.

Poor fit

Production OLTP queries (latency, predictability). External customer-facing chat (cost, security, schema leakage). Tables with unstable schemas or cryptic column names (the LLM gets confused). High-volume bursty workloads (cost spikes).

Risks specific to Select AI

Schema disclosure. The LLM sees your table and column names. If those reveal sensitive structure, scope it via grants and avoid passing schemas with regulated-data hints in their names.
Wrong SQL that runs. The model may produce SQL that returns wrong numbers without erroring. Prefer showsql for non-trivial questions and let a human approve.
Cost surprises. A natural-language question can produce a SQL that table-scans a fact table. Add query timeouts and resource manager plans.
Cross-database queries. Don't expect the model to understand database links or sharded topologies without explicit metadata coaching.

In-Database ONNX Embeddings

Load an embedding model into Oracle AI Database 26ai. Generate vectors with a SQL function. No network call, no API key, no per-character cost.

Official documentation ↗

The pattern

Most embedding pipelines call out to a hosted model (OCI Gen AI, OpenAI). That introduces latency, cost per call, and a data-leak surface. In-DB ONNX inverts the dependency: you load the embedding model into the DB once, then call VECTOR_EMBEDDING(text USING model_name) as a function in any SQL. Embeddings happen on the DB server.

Why architects care

No egress. Embedding data never leaves the DB box. Critical for regulated content.
No per-call cost. Pay for CPU you already own, not per million characters.
Lower latency on bulk re-embed. Eliminate network round-trip per chunk.
Simpler ops. No external service dependency in the embedding pipeline.

Trade-offs

Concern	In-DB ONNX	Hosted (OCI Gen AI)
Latency per call	Lower (no network)	Higher (network + service)
Cost per call	None, pay for DB CPU	Per character
Model freshness	You manage upgrades	Oracle maintains
Model selection	Anything in ONNX format ≤ size limit	Curated set
CPU pressure on DB	Yes, sizing concern	None
Compliance / sovereignty	Strongest (data never leaves)	Service-bound

Where it slots in

Figure · Path A is simpler ops, Path B is cheaper at volume and the only option for high-sovereignty data.

Don't embed on the hot path without thinking

Calling VECTOR_EMBEDDING inside an OLTP transaction will tax the DB CPU and burn redo. Embed at ingest time, store the vector, query the stored vector, same as you would with any external embedding pipeline.

Oracle AI Database Private Agent Factory 26ai

A no-code/private agent factory for enterprise data. Use it when business users or engineers need knowledge agents grounded in approved repositories, files, web sources, and Oracle Database data without exposing the workflow through a general-purpose SaaS chatbot layer.

Official documentation ↗

TL;DR

Private Agent Factory matters because it treats Oracle's database and enterprise repositories as the trust boundary. It includes no-code agent creation, pre-built assistants, prompt lab patterns, knowledge agents, approved data sources, embeddings, and private retrieval. This is the right pattern when you need grounded agents over enterprise content without pushing sensitive schema and documents into a separate chatbot platform.

Reference architecture

Figure · Private Agent Factory is strongest when SQL privileges, vector search, and audit logs must stay database-native.

Use it when

The corpus is private enterprise content: database rows, internal sites, file shares, SharePoint, Google Drive, or uploaded documents.
You need no-code agent creation for business users while preserving engineered controls around approved sources and model management.
You need explainable retrieval over documents and vectors without standing up a separate vector DB.

Do not use it when

The agent is primarily a Fusion process agent covered by Fusion Agentic Apps or AI Agent Studio.
The corpus is mostly non-Oracle documents in object stores and a managed OCI Generative AI Agent would ship faster.
You need a consumer-grade assistant UX with broad channels, analytics, and bot lifecycle tools; evaluate Oracle Digital Assistant or app-layer tooling.

Oracle Private AI Services Container 26ai

A lightweight containerized web service for Oracle AI Database 26ai that offloads expensive vector work outside the database: embedding generation and HNSW vector-index creation.

Official documentation ↗

TL;DR

Private AI Services Container is not a private LLM chatbot runtime. Current docs describe two services: a Vector Embedding Service and a Vector Index Service. It can run in your data center or cloud compute, does not require internet access, processes requests statelessly, and helps free database CPU/GPU capacity for search and transactional work.

Architecture decision

Question	Use in-DB ONNX / DB CPU	Use Private AI Services Container
Embedding volume is low or DB CPU is available	Simple and local	Probably unnecessary
Embedding/index creation is expensive	Can starve database resources	Offload work to external compute while storing vectors in Oracle AI Database
Need GPU-accelerated HNSW index creation	Limited by DB host capability	Use the Vector Index Service with NVIDIA GPU-backed compute
Need no-internet/private operation	Good if model already loaded in DB	Good: container can run without internet and is called by DBMS_VECTOR or REST clients
Need hosted chat / reasoning model	Not the right layer	Not the right layer; use OCI Gen AI, Private Agent Factory with configured LLMs, or a model gateway

Two services in the container

Service	What it does	How it connects
Vector Embedding Service	Generates embeddings outside the database and stores/uses them with Oracle AI Database similarity search.	Called from `DBMS_VECTOR` procedures such as `UTL_TO_EMBEDDING` / `UTL_TO_EMBEDDINGS`, or via REST/OpenAI SDK-style clients.
Vector Index Service	Offloads HNSW vector index creation to GPU-backed compute for faster index builds.	Referenced from `CREATE VECTOR INDEX` parameters that point at the container REST endpoint and API key.

Risks

Model freshness. You manage embedding model updates; stale embeddings quietly degrade retrieval quality.
Capacity sizing. Offloaded vector work shifts latency and throughput onto your container hosts.
Patch ownership. Treat the container like production infrastructure, not a demo appliance.
Endpoint security. Protect the container endpoint and API key; it can be invoked by database jobs or REST clients.
Audit consistency. Log embedding/index jobs, model versions, container version, target table/index, and caller.

OCI Vision GA

Pretrained and custom-trainable image analysis. Object detection, classification, OCR, document image understanding. API + Console + SDK.

Official documentation ↗

TL;DR

Two modes. Pretrained: call an API, get labels/boxes/text/faces. Cheapest, fastest, no setup. Custom: upload labeled images, train your own classifier or detector through the Console. Useful when off-the-shelf labels miss your domain (manufacturing defects, retail SKUs).

Capabilities

Capability	Pretrained	Custom training	Typical use
Object detection	Yes	Yes	Count items, locate defects, retail shelf scanning
Image classification	Yes	Yes	Tag content, route images by category
OCR (text in images)	Yes	-	Receipt scanning, signage extraction
Document image analysis	Yes	-	Forms, tables, overlaps with Document Understanding
Face detection	Yes	-	Privacy-aware face blur, attendance counting

Indicative pricing (verify on the OCI pricing page)

Pretrained image analysis is in the low-cents-per-thousand-images range. Custom model training is hourly per GPU-hour. Always check current numbers before committing.

When to use Vision vs Document Understanding

Rule

If the input is a document (PDF, invoice, form), start with Document Understanding: it's purpose-built for tables, forms, key-value. If the input is a scene (a photo of a shelf, a manufacturing line, a security camera frame), use Vision.

Risks

Custom-trained models drift as products and packaging change. Retrain quarterly or on accuracy degradation alarms.
OCR accuracy degrades on low-quality scans. Pre-process (deskew, contrast) before sending.
Face detection has compliance implications. Document the legal basis before deploying.

OCI Language GA

NLU primitives for text: sentiment, entity recognition, PII detection, key phrase extraction, language detection, classification, translation.

Official documentation ↗

TL;DR

Not an LLM. A set of classical NLP services with pretrained models, exposed as APIs. Cheap per call, deterministic outputs, easy to embed in pipelines. Use for the boring-but-essential text tasks where you don't need generation, PII scrubbing, sentiment scoring on tickets, language routing on multilingual input.

Capabilities

Capability	Use case
Sentiment analysis	Customer feedback triage, NPS-style scoring
Aspect-based sentiment	"The screen is great but the battery is poor" → screen+, battery-
Named entity recognition (NER)	Extract people, orgs, locations, dates
PII detection	Pre/post filter for LLM pipelines
Key phrase extraction	Auto-tag content
Language detection	Route multilingual tickets
Text classification	Custom-trainable category labels
Translation	Common language pairs; not best-in-class, fine for internal use

Language coverage

Most analytical features (sentiment, NER) cover English, Spanish, French, German, Portuguese, Italian out of the box. Coverage varies by feature, check the docs per service. For broader language coverage, pair with an LLM via Gen AI.

Where Language slots into Gen AI

The pattern that works: use Language as cheap pre/post filters around expensive LLM calls. Detect language to pick the right system prompt, scrub PII before sending to the model, classify intent to skip the LLM when a deterministic answer exists. This drops Gen AI cost by 30-60% on a typical customer-support workload.

OCI Speech GA

Speech-to-text (ASR) for audio files and streams. Multiple languages, speaker diarization, SRT/VTT output, profanity handling.

Official documentation ↗

Capabilities

Batch transcription of audio/video files in Object Storage.
Real-time streaming for low-latency captioning use cases.
Speaker diarization ("who spoke when") for call recordings and meetings.
Normalization of times, addresses, numbers, URLs in the output text.
Profanity filter: remove, mask, or tag.
SRT/VTT subtitle output for video.
Custom vocabulary for domain words the base model mis-hears.

Common architectures

Contact center analytics

Recordings land in Object Storage → Speech transcribes with diarization → Language extracts sentiment + entities → Gen AI summarizes the call → write back to CRM. End-to-end at <30¢ per call typically.

Meeting summarization

Teams/Zoom recording → Speech with diarization → Gen AI summarizes per speaker, extracts decisions, generates action items → write to Asana/Jira via tool call from an agent.

Risks

Background noise drops accuracy. Pre-process or use a dedicated noise-suppression step before transcription.
Diarization struggles with overlapping speakers. Document accuracy expectations to stakeholders.
Audio data residency matters more than most teams think, keep buckets in the right region.
Real-time streaming has stricter quotas; plan capacity before peak loads (e.g. live earnings calls).

xAI Voice on OCI Generative AI May 2026

Text-to-speech through the OCI Generative AI model layer. Treat it as output generation for voice agents, training narration, call-center assist, and accessibility workflows.

Official documentation ↗

TL;DR

OCI Speech is speech-to-text. xAI Voice is text-to-speech. Keep the distinction clear in architecture diagrams: Speech turns audio into text; xAI Voice turns model output or authored content into audio. Voice quality, latency, language support, and region availability must be tested in the exact OCI region you plan to use.

Reference pattern: voice agent response

Figure · Full voice loop: OCI Speech for input, Gen AI/agent for reasoning, xAI Voice for output.

Use it when

You need voice output from an Oracle-hosted Gen AI workflow without procuring a separate TTS vendor.
You are building an internal assistant, training-content generator, or call-center agent response path.
You can tolerate region/model availability checks and quality testing before launch.

Risks

Latency. Voice adds another model call after generation. Stream audio where possible.
Unsafe audio. Guardrail text before synthesizing. Audio moderation after generation is harder.
Voice consistency. Pin the voice/model choice and test regression on every catalog update.
Cost. Cache repeated announcements and training clips instead of regenerating.

OCI Document Understanding GA Generative extraction 2026

Extract text, tables, key-value pairs, signatures, and classifications from PDFs and document images. 2026 update added generative extraction for context-aware parsing.

Official documentation ↗

TL;DR

The boring backbone of most enterprise AI projects. Invoices, contracts, KYC packets, claims forms. Document Understanding handles OCR, layout, table detection, and key-value extraction. The 2026 generative extraction upgrade improves accuracy on free-form fields and complex tables by adding LLM-grade context reasoning.

Capabilities

Feature	What it does
Text extraction	OCR with layout preservation
Table extraction	Detect tables, extract as structured rows + cells
Key-value extraction	Pretrained for invoices, receipts, IDs; custom-trainable for your forms
Document classification	Route into the right downstream queue
Signature detection	Flag whether a signature is present in a region
Generative extraction 2026	LLM-backed extraction for ambiguous fields, free-form sections, multi-column layouts

Pricing structure

Charged per transaction (a page or a document, depending on the operation). First 5,000 transactions per month are free: useful for low-volume pilots and Always-Free tier exploration.

Reference pipeline

Figure · Reference document automation pipeline.

Risks

Custom KV models need labeled data, budget annotation time honestly.
Generative extraction is more accurate but slower and more expensive than classic OCR-only. Mix modes based on document type.
Tables with merged cells or nested headers still cause issues, sample your hardest documents in PoC.

OCI Anomaly Detection GA

Multivariate time-series anomaly detection using Oracle Labs' MSET2 algorithm. Trained on your historical normal-operation data; scores new observations as anomalous or not.

Official documentation ↗

Why this exists

Most enterprise anomaly problems aren't univariate (a single sensor spike). They're multivariate, "this combination of pressure, temperature, vibration, and current is unusual together even though no single value is out of spec." MSET2 (Multivariate State Estimation Technique) was developed at Oracle Labs for nuclear plant monitoring; it generalizes to manufacturing, fleet telemetry, and IT ops.

How it works

Train on a window of historical "normal" data, sensor values, KPI series, whatever is multivariate and time-aligned.
Score new observations, returns an anomaly score per timestamp and per signal.
Explain: identifies which signals are contributing to the anomaly.

Where it fits

Industrial

Production lines, turbines, HVAC, refrigeration. Multivariate sensor data already collected; just needs a model and a daily training refresh.

IT ops

Application telemetry, error rate, latency, throughput, GC time. Catches issues that single-metric alerts miss.

Risks

Concept drift. What was normal six months ago isn't now. Retrain on a rolling window.
Cold start. Needs enough clean historical normal data, typically weeks to months.
False positives. Tune detection sensitivity per use case; pair with operator runbook.
Not a forecasting service. Use OCI Forecasting if you need next-value prediction.

OCI Forecasting GA

AutoML for univariate and multivariate time-series. Pick a target series, optional exogenous regressors, and a horizon; the service auto-selects and trains a model.

Official documentation ↗

What it gives you

Point forecasts + prediction intervals.
Auto algorithm selection across classical (ARIMA, ETS) and ML (Prophet-style, gradient boosted) approaches.
Holiday and seasonality handling out of the box.
Multi-horizon forecasts.
Explanation of which features drive a forecast.

Common use cases

Domain	Series	Why Forecasting helps
Retail	Daily demand per SKU per store	Replenishment planning
Finance ops	AR / AP cash flow	Working capital forecasting
Workforce	Contact volume per skill per 30-min	Staff scheduling
Energy	Load per substation	Procurement and dispatch

Risks

Garbage in, garbage out, handle missing values and outliers upstream.
Forecasts are only as good as the regressors you provide; promos, holidays, pricing must be fed in if they drive the series.
Auto model selection isn't auto governance, log the chosen model + features per retrain cycle for audit.

Oracle AI Data Platform 2026

The governed data plane for AI teams. Use it to organize data products, catalogs, connections, metadata, workspaces, notebooks, and pipelines so agents and models do not each invent their own data access layer.

Official documentation ↗

TL;DR

AI projects fail when every assistant has its own copy of data, metadata, and permissions. Oracle AI Data Platform is the control layer for turning enterprise data into governed, reusable AI assets: catalogs, data products, agent-ready connections, notebooks, Spark workflows, and repeatable data pipelines.

Architecture role

Figure · AI Data Platform is not a model runtime; it is the shared data control plane for AI workloads.

When to use it

Situation	Why AI Data Platform helps
Multiple teams are building RAG over the same documents	Create governed, reusable vector/index assets instead of duplicate chunk/embed pipelines.
Agents need data from many Oracle and non-Oracle sources	Centralize connection, lineage, ownership, and refresh rules.
Data products already exist or are being formalized	Expose them to AI consumers with ownership and policy instead of raw tables/buckets.
Data engineering owns pipelines, app teams own agents	Separate concerns cleanly: Data Flow and catalog upstream, Enterprise AI Agents downstream.

Risks

Governance theater. Catalog entries without owners, refresh SLAs, and access rules do not help agents.
Pipeline duplication. Decide which datasets, catalogs, and metadata flows are authoritative and version them like APIs.
Latency. A shared data plane is not always the fastest path for hot transactional reads; keep OLTP queries close to the source DB.
Team boundaries. Data product owners must participate in AI design, or agents will be grounded on misunderstood data.

OCI Data Science GA

JupyterLab notebooks, MLOps pipelines, model catalog, model deployment, jobs, monitoring. Built on conda environments and an Operator pattern.

Official documentation ↗

TL;DR

The home of custom ML on OCI. Notebook sessions (JupyterLab) for exploration, Jobs for batch training, Pipelines for orchestration, Model Catalog for governance, Model Deployment for inference endpoints. If you're building a model from scratch, classical ML or fine-tuning an open-weight LLM, this is where you live. For pre-built FM deployment, use AI Quick Actions (it sits on top of Data Science).

Building blocks

Component	What it is	When you use it
Notebook sessions	Managed JupyterLab on VM or BM (CPU or GPU)	Exploration, prototyping, training scripts
Conda environments	Curated environments incl. PyTorch, TF, RAPIDS, LangChain, Oracle SDKs	Reproducible runtime
Jobs	Run notebooks or scripts on demand on chosen shapes	Batch training, scheduled retraining
Pipelines	DAG of jobs with input/output passing	Multi-step training and eval workflows
Model Catalog	Versioned registry with metadata, provenance, tags	Governance, audit, hand-off to deploy
Model Deployment	Managed HTTP endpoint with autoscale	Hosting custom models for inference
Model Monitoring	Drift, performance, schema integrity over time	Production health
Feature Store	Centralized feature definitions, online/offline	Multi-team ML at scale

Where it fits in the AI stack

Data Science is the "build your own" layer. If your problem is solved by Cohere or Llama as-is, you don't need Data Science, use Gen AI Service. If you need a custom model (classical ML, fine-tuned open-weight LLM, vision model, time-series), you do. Many enterprises end up using Data Science only for the 10-20% of use cases that don't fit a managed AI service, the rest go through Gen AI Service or AI Quick Actions.

Risks and ops realities

Idle notebook spend. Notebook sessions running on a GPU cost real money even when idle. Auto-stop policies are essential.
Environment sprawl. Teams customize conda environments; reproducibility erodes. Pin environments per project.
Model-to-production gap. Notebook code rarely runs cleanly as a Job. Budget time for the productionization step every project.
Compliance for training data. Training datasets often contain PII. Treat them with the same controls as the source-of-record DB.

AI Quick Actions GA Llama 4 + gpt-oss 2026

No-code foundation-model deployment and fine-tuning. Pick a model from a catalog, click Deploy, get an endpoint. Or pick a model, point at training data, click Fine-tune.

Official documentation ↗

What's in the model catalog (June 2026)

Meta Llama 4: Scout, Maverick.
Meta Llama 3.x: including Llama 3.2 90B Vision Instruct.
OpenAI open-weight: gpt-oss-120b, gpt-oss-20b.
Phi, Falcon, Mistral, Granite, pre-cached, faster cold start.
Bring-your-own from Hugging Face, direct import.

What it actually saves you

If you've ever deployed an open-weight LLM on raw GPU instances, you've burned days on Docker images, vLLM/TGI tuning, autoscaling, health checks, log shipping. AI Quick Actions does all of that with one click. You give up some flexibility (you can't pick exactly which serving runtime version, for instance) for an order of magnitude faster time-to-endpoint.

When AI Quick Actions vs Gen AI Service

Question	Answer
You want a managed endpoint with no infra	Gen AI Service
You need a model not in Gen AI catalog (e.g. Mistral, Falcon, gpt-oss)	AI Quick Actions
You need fine-tuning with custom data	AI Quick Actions OR Gen AI dedicated cluster
You need on-demand pay-per-token	Gen AI Service (AQA is dedicated GPU)
You need to import from Hugging Face	AI Quick Actions
You need to keep the model entirely in your tenancy	AI Quick Actions (dedicated by definition)

Risks

Dedicated GPU cost, deployment runs hourly regardless of traffic. Auto-shutdown for dev/test environments.
Model size vs shape, Llama 4 Maverick won't fit on a single GPU; AQA picks the right shape, but you should understand the floor cost.
Fine-tuning quality depends on data quality more than algorithm choice. The clicky UI doesn't change that.

Fusion Agentic Applications GA · Mar 2026

22 pre-built AI agents embedded inside Oracle Fusion Cloud. Native to the transactional system. Governed by Fusion roles, approvals, and data. CX expansion in Apr 2026.

Official documentation ↗

TL;DR

This is Oracle's biggest 2026 application-layer announcement. Twenty-two agentic apps that live inside Fusion ERP, HCM, SCM, and (since Apr 2026) CX. They are not chatbots, not copilots, and not add-ons. They run inside the transactional system, see the same data and approval hierarchies users see, and execute work autonomously when allowed. If you run Fusion, you already paid for this, pilot it before you build anything custom.

What "agentic" means here

Oracle's definition: outcome-driven, proactive, reasoning, and engineered for enterprise execution. Concretely, an agentic app does four things a copilot doesn't: it (1) initiates work without a user prompt, (2) plans across multiple steps and tools, (3) executes within the transactional system using existing roles, and (4) measures outcomes back. The boundary between "agent" and "automation" is fuzzy, but the integration depth is the meaningful difference.

Where the 22 agentic apps sit (representative: Oracle keeps adding)

Pillar	Example agentic apps
Finance	Procure-to-Pay agent · Expense intake agent · Period-close anomaly investigation · Collections triage
HR / HCM	Workforce Operations agent (scheduling, payroll issue triage) · Recruiting assistant · Performance review summarization · Time-off conflict resolver
Supply chain	Demand-supply imbalance investigator · Supplier risk monitor · Logistics exception handler · Quality issue triage
CX Apr 2026	Sales next-best-action · Service case summarization & resolution · Marketing campaign optimizer

Why this changes build-vs-buy math

Before Fusion Agentic Apps, an enterprise wanting a "smart period-close" agent had to (a) buy or build an LLM platform, (b) integrate it with Fusion Financials data, (c) replicate role permissions, (d) wire it into approval workflows, (e) operate it. Now Oracle ships steps (a)-(e) as a configured agentic app under your existing Fusion subscription. The build case has to clear a much higher bar.

For large enterprise Fusion customers

Run a pilot on 1-2 agentic apps in a sandbox. The TCO conversation versus building custom on OCI Gen AI is now lopsided whenever the use case maps to a Fusion agent. Treat custom builds as the exception, not the default.

Risks

Change management. Agentic apps execute work, that means human approvers see different workflows. Governance and communications matter more than the tech.
Configuration drift. Each agentic app has settings. Track them per environment and put them under change control.
Data quality exposed. Agents reason on Fusion data. If your master data is messy, agents are less useful. Fix data first.
Vendor coupling. Deeper dependency on Fusion. Be intentional about which agents you adopt and which you keep optionality on.

AI Agent Studio for Fusion Expanded Mar 2026

Build, connect, and orchestrate agents that work alongside Fusion. Includes the Agentic Applications Builder, content intelligence, contextual memory, ROI measurement, and workflow tools. Included with Fusion subscriptions at no extra cost.

Official documentation ↗

What you can build

Custom agents on top of Fusion data and APIs, using Oracle, partner, or external agents as building blocks.
Agentic applications (workflows of agents) via the no-code Agentic Applications Builder.
External integrations: pull in Slack, Teams, Microsoft 365, ServiceNow, and similar.
ROI dashboards: measure agent impact (time saved, cycle time, decision accuracy).

Studio vs OCI Generative AI Agents: what's the difference?

Question	AI Agent Studio (Fusion)	OCI Generative AI Agents
Audience	Fusion customers and Fusion partners	Any OCI customer building agents
Data integration	Native to Fusion data + roles	Object Storage, OpenSearch, 23ai
UX	Low-code/no-code builder	API + SDK + Console
Cost	Free with Fusion subscription	Pay per use (model + retrieval costs)
Best fit	Workflows touching Fusion records	RAG over enterprise corpora outside Fusion

They are complementary, not redundant. A bank could use OCI Generative AI Agents for an internal policy chatbot over a SharePoint-style document corpus, and AI Agent Studio for finance-close agents that operate inside Fusion ERP.

Oracle Digital Assistant GA

Oracle's conversational-assistant platform for enterprise channels. Use it when the hard problem is bot lifecycle, skills, channel delivery, and human-agent handoff rather than raw model prompting.

Official documentation ↗

TL;DR

Oracle Digital Assistant (ODA) is still relevant in the GenAI era. It gives you channel adapters, skills, conversation flows, analytics, human handoff, and Fusion/Oracle app integration. OCI Generative AI Agents can power the intelligence behind a bot; ODA is often the front door and lifecycle layer for chat experiences.

ODA vs Enterprise AI Agents

Question	Oracle Digital Assistant	OCI Enterprise AI Agents
Primary job	Conversation UX, channels, skills, routing, handoff	LLM reasoning, RAG, tools, vector stores, responses API
Best user surface	Web chat, mobile, messaging, service channels, app-embedded assistants	API-driven agents embedded into apps, workflows, or custom UIs
Human handoff	First-class pattern	Design through tools/workflows
Knowledge grounding	Can integrate LLM/GenAI capabilities into skills	Native knowledge bases/vector stores
Best fit	Customer/service bot with channels and lifecycle management	Enterprise RAG/tool agent behind one or more apps

Reference pattern

Figure · ODA is the conversation/product layer; Enterprise AI Agents are the reasoning and retrieval layer.

Risks

Wrong layer. Do not rebuild RAG plumbing in ODA skills when Enterprise AI Agents already provides it.
Channel complexity. Each channel has identity, session, and attachment quirks; test the exact deployment channel.
Handoff design. Human handoff must include transcript, context, model answer, and source citations, not just "transfer to agent."

APEX AI GA 24.2 RAG & AI configs

Low-code AI inside Oracle APEX. AI Assistant for developers and end users, AI-driven data modeling, dynamic-action generative text, AI Configurations + RAG Sources, vector search integration.

Official documentation ↗

TL;DR

If you build internal apps with APEX, AI is no longer something you bolt on. AI Configurations let you define a system prompt + model + RAG sources once, then reuse across pages. Dynamic Actions "Show AI Assistant" and "Generate Text with AI" embed chat and generation in two clicks. Search Configurations wire 26ai Vector Search into your search pages without writing the SQL yourself. For Oracle-shop developers, this is the fastest path from "we have a wiki" to "we have a RAG chatbot over our wiki" in production.

What APEX 24.2 adds (relevant to AI)

Feature	What it does
AI Configurations (Shared Component)	Bundle system prompt + welcome message + RAG sources. Reuse across pages.
RAG Sources	Point at 23ai Vector Search tables, REST endpoints, or APEX queries.
Show AI Assistant (Dynamic Action)	Chat panel using the chosen AI Configuration.
Generate Text with AI (Dynamic Action)	Generate content on demand from a user prompt + template.
AI-Driven Data Modeling	Describe a model in plain English; APEX generates tables, sample data.
Search Configuration with Vector Search	Add semantic search to APEX page items without hand-writing SQL.
APEX_AI PL/SQL package	Programmatic access from PL/SQL when the dynamic actions aren't enough.

Provider support

APEX talks to OCI Generative AI, OpenAI, Cohere, and Azure OpenAI through provider configurations. The AI Configuration abstracts the provider, apps don't change when you swap.

Reference pattern: RAG chatbot in APEX in a day

Create a VECTOR column on your content table in 23ai/26ai. Populate it (in-DB ONNX or Cohere via DBMS_VECTOR_CHAIN).
Create a RAG Source pointing to that table.
Create an AI Configuration with a system prompt + that RAG Source + your preferred provider.
On any page, add the "Show AI Assistant" dynamic action bound to that configuration.
Ship.

For Apps DBAs and Oracle-shop developers

APEX AI is the lowest-effort way to add a working RAG agent to an internal tool. If you can write a SQL query and click in a Console, you can ship one in a day. The hard part, chunking, embedding pipeline, RAG orchestration, is hidden behind shared components.

Risks

Provider credentials live in APEX Web Credentials, protect them like any other secret.
Embed cost is real if you store all your content in the DB and embed via OCI Gen AI. Use in-DB ONNX where possible.
Audit trail of LLM calls isn't automatic, log via APEX_AI calls to a log table for compliance.

MySQL HeatWave GenAI GA

In-database LLMs, automated vector store, lakehouse access, and natural-language chat, all inside MySQL HeatWave. Multilingual, JavaScript-callable, VLM-enhanced PDF parsing as of MySQL 9.4.2.

Official documentation ↗

TL;DR

For MySQL shops, HeatWave GenAI is the analog of what 26ai is for Oracle DB shops: vector search + LLM access without leaving the database. Differences: HeatWave bundles in-database LLMs (you don't have to call out), it has tight Object Storage ingestion that auto-parses PDFs/PPTs/HTML/DOC, and it integrates with HeatWave Lakehouse so you can query non-MySQL data alongside MySQL data. The lakehouse + vector store combination is the most differentiated part of the offering.

Components

Component	What it is
In-database LLMs	Models that run inside HeatWave for generation, summarization, chat, no external API call required
Vector store	Inbuilt store for embeddings + similarity search
Automated ingestion	Parses PDF (incl. scanned), PPT, TXT, HTML, DOC from Object Storage; chunks; embeds; loads
VLM-based PDF parsing	Vision-Language-Model enhanced extraction for complex PDFs (tables, charts). Added MySQL 9.4.2.
Lakehouse Navigator	UI to browse MySQL + Object Storage data, load into vector store
JavaScript stored programs	Invoke GenAI from JS inside HeatWave; preprocess SQL data, call LLMs, post-process
Multilingual	Supports 24+ languages across the GenAI APIs

When to choose HeatWave GenAI over OCI Gen AI + 23ai

You already run MySQL HeatWave for analytics.
Your source data is heterogeneous (MySQL + Object Storage + S3) and you want lakehouse ingestion.
You want LLM inference inside the database (no egress to an external service).
You don't need Oracle Database features (PL/SQL, VPD, RAC).

Risks

Sizing, in-DB LLM inference is GPU-intensive on the HeatWave node. Pick shapes deliberately.
Available model family inside HeatWave is narrower than OCI Gen AI's catalog.
HeatWave is OCI-first; some features are not available on AWS or Azure deployments of HeatWave.

AI Infrastructure: GPU shapes & networking

What you actually rent when you need raw inference or training capacity. NVIDIA H100 / H200 / B200 / GB200 NVL72, RDMA cluster networks, dedicated regions, sovereign deployments.

Official documentation ↗

TL;DR

Oracle's GPU story is unusually strong because of long-standing NVIDIA collaboration and aggressive supply commitments. H100/H200/B200 bare-metal shapes plus GB200 NVL72 Superclusters (announced expansion at GTC 2026) are available in commercial, gov, classified, sovereign, and dedicated regions. For most enterprise AI, you don't touch these directly, Gen AI Service, AI Quick Actions, and Fusion abstract them away. You care about GPU shapes when (a) you're fine-tuning a 70B+ model, (b) you're hosting custom inference, or (c) you're doing frontier training.

Shape family (illustrative: confirm in OCI Compute docs)

Shape family	GPU	Per node	Typical use
BM.GPU.A100.8	NVIDIA A100 80 GB	8 GPUs · NVLink	Mature training and inference baseline
BM.GPU.H100.8	NVIDIA H100 80 GB	8 GPUs · NVLink	Default for fine-tuning 70B-class models
BM.GPU.H200.8	NVIDIA H200 141 GB	8 GPUs · NVLink	Long-context inference, larger models in fewer nodes
BM.GPU.B200	NVIDIA B200	Blackwell-class	New-generation inference and training
GB200 NVL72 Supercluster	NVIDIA GB200 NVL72	Rack-scale	Frontier training, very large model serving

Cluster networks & RDMA

For multi-node training, GPUs talk to each other faster than they talk to anything else. OCI Cluster Networks use RDMA over Converged Ethernet (RoCE) with very low latency and high bandwidth between bare-metal GPU nodes. If you're training a model that doesn't fit on one node, this is the lever that determines wall-clock time.

Sovereignty and region matrix

Region type	Notable AI availability
Commercial OCI (50+ regions)	Full Gen AI catalog, GPU shapes, Data Science
US Gov Cloud	OCI Gen AI GA, full service set
US Classified Cloud	OCI Gen AI GA (May 2026), select services
UAE Central (Abu Dhabi)	OCI Gen AI GA (May 2026)
EU Sovereign	OCI Gen AI subset, full data residency
Dedicated Region (DRCC)	Full OCI in your data center, including AI services where licensed

Where Oracle has a real edge

For workloads that absolutely cannot leave a jurisdiction (sovereign, gov, classified, regulated industries), Oracle is often the only hyperscaler with GA Gen AI in the right region. Don't underweight this for procurement.

Architecture Patterns

Five reference patterns that cover most enterprise Oracle AI projects. Each names the services, the data flow, and the failure modes.

Pattern 1: Internal RAG chatbot over enterprise documents

Pattern 1 · The most common Oracle RAG architecture in 2026.

Pattern 2: In-database RAG inside an APEX app

For Oracle-shop teams that already use APEX, this collapses the stack dramatically. The data, the embeddings, the search, the chat UI, all inside the database and APEX. Provider call out to OCI Gen AI for generation only. Fastest time-to-prod for internal tools.

Pattern 2 · APEX + 26ai = minimal-moving-parts RAG. Ideal for internal tools.

Pattern 3: Document automation pipeline

Invoices, claims, contracts arrive as PDFs and need to become structured records. Doc Understanding extracts, validation rules check, exceptions route to humans, results land in the system of record. Add a Gen AI step to summarize or classify when needed.

Pattern 4: Fusion-native agentic workflow

For Fusion customers, this is now the default. Pick an agentic app, configure thresholds and approval routing, monitor outcomes. Custom logic goes into AI Agent Studio. Custom data integrations via Fusion REST APIs or OIC. Almost never needs to call OCI Gen AI directly, the agent uses the embedded LLM.

Pattern 5: Custom fine-tuned model deployment

Niche but real. You have labeled data and a use case where a 7B-13B fine-tuned open-weight model beats prompting a frontier model on cost and accuracy. Pipeline: Data Science notebook for preparation → AI Quick Actions or Gen AI dedicated cluster for fine-tuning → Model Deployment endpoint → integrate via your app. Reserve this for cases where you've already proven a managed model doesn't work.

A pattern most teams skip too long

Many Oracle customers default to Pattern 5 (custom) when Pattern 1 or 4 would have shipped in weeks. The bias toward "we'll build it ourselves" wastes quarters. Always justify why a managed pattern doesn't work before going custom.

Decision Matrix

Quick answers to the questions that come up in every architecture review.

Do I use OCI Generative AI Service or direct model APIs?

Use OCI Gen AI when you need OCI-native IAM, private networking, billing, audit, guardrails, and a supported or importable model fits. Go direct when you need an exact model/version that is not available in your target OCI region.

Vector store: 26ai or OpenSearch or a third-party vector DB?

26ai if data is already in Oracle DB or you need joins with relational/JSON/spatial/graph predicates. OpenSearch if you have a search team operating it. Third-party only when you need a retrieval engine or managed ecosystem Oracle does not provide.

On-demand, dedicated AI cluster, or imported model?

On-demand for pilots and bursty workloads. Dedicated for predictable volume, isolation, fine-tuning, or regulated data. Imported compatible models when your model is not in the OCI managed catalog but you still want OCI endpoint/governance patterns.

Build a custom agent or use Fusion Agentic Apps?

If you run Fusion and the use case maps to one of the 22 apps, use Fusion. Custom build only when no Fusion app exists or your scenario is fundamentally outside Fusion's data.

Enterprise AI Agents or AI Data Platform first?

Use Enterprise AI Agents first when one team needs one agent quickly. Use AI Data Platform first when many teams need reusable governed catalogs, data products, lineage, RBAC, notebooks, and workflow ownership.

Embed via in-DB ONNX or via Gen AI Service?

In-DB ONNX for high-volume corpora, sovereignty needs, or zero-egress requirements. Gen AI Service for best-quality embeddings and low operational burden. Often a mix: in-DB for bulk, Gen AI for occasional re-embeds.

Enterprise AI Agents or roll-your-own RAG?

Enterprise AI Agents unless you need non-standard retrieval, graph RAG, custom ranking, or a non-Oracle model gateway. Then build with Gen AI Service + 26ai/OpenSearch and keep the agent framework thin.

AI Quick Actions or Data Science Model Deployment?

AI Quick Actions if the model is in its catalog or on Hugging Face. Data Science Model Deployment if you have a custom-trained model that isn't an LLM.

Select AI for analytics?

Yes for knowledgeable internal analysts on Autonomous DB. No for customer-facing or production-OLTP queries.

APEX AI vs custom front-end?

APEX AI for internal tools and rapid POCs. Custom front-end when you need pixel-perfect UX or external-facing branding.

Private Agent Factory or OCI Enterprise AI Agents?

Private Agent Factory when the trust boundary is the Oracle Database and you want database-native security/audit. Enterprise AI Agents when the trust boundary is OCI and you need broader managed tools, vector stores, and hosted APIs.

Need to run vector AI privately?

26ai on Exadata or Linux x86-64 gives you vector search + in-DB ONNX embeddings. Private AI Services Container offloads embedding generation and HNSW index creation while keeping vectors tied to Oracle AI Database. DRCC is the larger cloud-in-your-data-center option for broader OCI services.

Pricing & Cost Control

Where the money actually goes, and the levers that move it.

Verify exact rates

Numbers here are descriptive of cost behavior, not procurement quotes. Always check the OCI pricing pages, ask your account team for current rates, and rebuild the model in your own cost calculator before signing anything.

Cost drivers by service

Service	Unit	What inflates the bill
OCI Gen AI on-demand	Per 1M characters	System prompt bloat, oversized RAG context, retries, naive token-by-token streaming logging
OCI Gen AI dedicated cluster	Per hour	Cluster idle outside business hours, over-provisioned units, dev/test left running
Imported compatible models	Dedicated serving + storage	Wrong shape, oversized context windows, duplicate model copies across compartments
AI Vector Search	DB CPU + memory + storage	HNSW Vector Memory Pool sizing, re-embeds, dense quantization, RAC HNSW replication
Enterprise AI Agents	Underlying Gen AI + retrieval + hosted tools	Long sessions retained, large KBs over-ingested, no rerank cap, tool loops
xAI Voice / TTS	Generated audio output	Regenerating static content, long answers, no audio cache
Oracle AI Data Platform	Platform resources, Spark/workflow runs, workspaces, storage	Duplicate catalogs, unnecessary reprocessing, stale pipelines rerun in full
OCI Vision	Per 1000 images / GPU-hour	Custom training reruns, full-fidelity images when down-sampled would work
OCI Document Understanding	Per transaction (first 5K free/mo)	Re-running on bad PDFs, generative-extraction mode by default
OCI Speech	Per audio-minute	Real-time streams left open, retries on noisy audio
OCI Language	Per API call	Calling Language inside per-token pipelines when batched would do
Data Science	Notebook session VM/GPU hours, Jobs, Model Deployment	Idle GPU notebooks, dev model deployments running 24/7
AI Quick Actions	Dedicated GPU hours	POC deployments forgotten, oversized shapes
Private AI Services Container	Private compute + storage + ops	Under-sized hosts, unmanaged embedding-model updates, duplicated environments
Fusion Agentic Apps	Included with Fusion subscription	Included, no marginal cost beyond the model usage in the underlying Gen AI Service if you customize
AI Agent Studio	Included with Fusion subscription	Same, no marginal cost
HeatWave GenAI	Per HeatWave node (GPU shapes)	Wrong shape for in-DB LLM inference

The cost-control checklist

Token / character budgets per session. Hard cap. Alarm on breach. Kill on hard breach.
Rerank to top-K. Reduce context size by 60-80% with no quality loss in most RAG.
Prompt caching. Where the model supports it, cache the system prompt.
Audio caching. For xAI Voice, store repeated greetings, disclosures, and training clips rather than regenerate.
Idle-shutdown on every notebook session, every dev Model Deployment, every dev AQA deployment.
Tag everything. OCI cost tracking by tag is the only way to attribute spend across teams.
Quotas. Service limits + Resource Manager quotas prevent a runaway agent from consuming a quarter of budget overnight.
Two-tier deployments. Cheaper model for routing/classification, expensive model only on the path that needs it.
Egress. Keep callers inside OCI when calling Gen AI at volume. Outbound from OCI adds up.

Indicative cost shape (illustrative, not a quote)

Workload	Dominant cost	Order of magnitude
Pilot RAG chatbot, 1K queries/day	Gen AI on-demand per character	Tens to low hundreds USD/month
Production internal chatbot, 50K queries/day	Gen AI on-demand + reranker	Low thousands USD/month
High-volume customer-facing assistant, 1M queries/day	Gen AI dedicated cluster	Tens of thousands USD/month per cluster
Fine-tuned model serving sustained traffic	Dedicated cluster + storage	Tens of thousands USD/month per cluster
Voice-enabled agent	Gen AI text + xAI Voice TTS + audio storage	Similar to chatbot cost plus generated-audio usage
Document automation, 100K pages/month	Doc Understanding transactions	Hundreds to low thousands USD/month
26ai vector search on Exadata	DB CPU/memory	Often $0 incremental over existing DB licence

Risks & Gotchas

The honest list. The stuff you want to hear before pilot, not after go-live.

Model and provider risks

Risk	What happens	What to do
Model deprecation	App breaks when a model retires	Abstract model name; test against the family's next-gen early; monitor release notes.
Deprecated Gen AI APIs	Legacy text-generation integrations stop working after API retirement windows	Use current chat/responses SDK paths; inventory old GenerateText/SummarizeText-style calls before June 2026.
Catalog drift	Models available in one region but not another	Document the model-region matrix; revisit at every release; pin a fallback model.
Pricing changes mid-contract	Per-character rates change; budget overruns	Negotiate dedicated commitments for predictable spend; alarm on weekly rate change.
Vendor strategy shifts	A model is dropped from OCI catalog for partnership reasons	Treat model identity as a config; have a tested second choice.
"Best" model isn't on OCI	Stakeholders ask "why not Claude/GPT?"	Document the multi-criteria choice openly; price out hybrid (OCI gov regions + direct API for some workloads).

RAG quality risks

Risk	What happens	What to do
Bad chunking	Retrievals miss the answer that's actually in the corpus	Pilot chunk sizes; structural chunking over fixed-length; measure recall against a gold set.
No reranker	Vector top-5 contains noise, LLM hallucinates around it	Add Cohere Rerank 4 as standard.
Stale knowledge base	Index lags source-of-truth changes	Schedule + alarms on ingestion lag; expose "Last refreshed" to users.
Hallucinated citations	Answer claims a chunk supports it when it doesn't	Render source chunks alongside; post-hoc verification step for high-stakes outputs.
Duplicate AI data pipelines	Different teams transform or ingest the same corpus differently and get inconsistent answers	Use AI Data Platform or a shared data-product process to declare authoritative datasets, owners, RBAC, and refresh rules.

Security risks

Risk	What happens	What to do
Prompt injection from documents	Retrieved doc instructs the model to override system prompt	Defensive system prompt; mark retrieved chunks explicitly as untrusted; classify chunks for adversarial content.
PII leakage	Model echoes PII back to wrong user	PII scrub on input + output; per-user data isolation in retrieval; audit log of prompt + completion.
Cross-tenant leakage in SaaS	Cached completions surface across tenants	Per-tenant prompts; no global cache; tenant-scoped sessions.
Schema disclosure via Select AI	LLM sees sensitive column names	Grant minimally; avoid sensitive hints in column names; review queries before runsql.
Vector inversion attack	Embeddings reverse-engineered to recover text	Treat vectors as PII; protect with VPD; TDE at rest.
Private vector container drift	Container embedding model or index service falls behind the database/vector-search design	Patch and test Private AI Services Container on a monthly cadence; track embedding model, container version, and index parameters in audit logs.

Operational risks

Risk	What happens	What to do
Cost runaway	Recursive agent or buggy loop burns a budget overnight	Per-session budgets; max-step caps; alarms on cost-per-session anomalies; kill switch.
Quota throttling	503s during peak, customer pain	Pre-warm; raise quotas before peaks; move hot workloads to dedicated.
Audit gaps	Cannot prove what was said in regulated context	Log prompt + completion + model + version + user; retain per policy; index for review.
Model output schema breaks	Tool call args mis-formatted	Strict JSON schema; validate before dispatch; fallback to clarifying turn.
Notebook GPU left idle	$thousands wasted per month per team	Auto-stop after N minutes; weekly idle report; tag-based chargeback.

Strategic risks

Vendor lock-in. Going deep on Fusion Agentic Apps deepens Fusion lock-in. Worth doing where the apps fit, but be explicit about which workloads stay portable.
Skills. Oracle AI requires SQL/PL/SQL + cloud + AI knowledge. The unicorn engineer who has all three is rare. Plan training; pair Apps DBAs with data scientists.
Pace of change. Oracle ships new models monthly. Architectures that assume model stability age fast. Build for swappability.
Realistic accuracy expectations. 95% accurate is great in lab, terrible if a 5% wrong tax filing creates regulatory exposure. Match accuracy expectation to consequence.

OCI vs AWS vs Azure vs GCP: AI services

Honest four-way side-by-side as of June 24, 2026. Not Oracle marketing, not anyone's marketing. Names and prices move monthly, so verify in each console before you commit.

TL;DR

All four are now broad model platforms plus a managed agent runtime plus a governance layer. The real differences in mid-2026 are (1) who owns the frontier model (AWS has the Anthropic relationship and Amazon Nova; Microsoft Foundry has OpenAI and Phi; Google owns Gemini outright via DeepMind; OCI owns none and instead resells Cohere, xAI, Meta, NVIDIA and lets you import the rest), (2) data gravity (OCI wins when your system of record is Oracle Database or Fusion; Google wins when it is BigQuery), and (3) sovereignty (OCI has the widest GA Gen AI footprint across gov, classified, EU sovereign, and GCC; Google now runs Gemini fully air-gapped on-prem via Google Distributed Cloud; AWS only opened its European Sovereign Cloud in Jan 2026 with a thin model catalog). Two naming changes to know: Azure AI Foundry is now Microsoft Foundry (effective Jan 1, 2026), and Vertex AI is now the Gemini Enterprise Agent Platform (Google Cloud Next, Apr 2026).

Generative AI platforms

Aspect	OCI Enterprise AI / Gen AI	AWS Bedrock	Microsoft Foundry (was Azure AI Foundry)	Google Cloud (Gemini Enterprise, was Vertex AI)
Frontier / foundation models	Cohere, Meta Llama 4, xAI Grok, NVIDIA Nemotron 3, Gemini options; imported Qwen, Gemma, OpenAI gpt-oss	Anthropic Claude (Opus 4.8/4.7, Sonnet 4.6, Haiku 4.5), Amazon Nova 2, Meta Llama, Mistral, DeepSeek, Qwen, Cohere, and OpenAI GPT-5.5/5.4	OpenAI GPT-5 family + GPT-5.5, Anthropic Claude, Meta Llama, Mistral, xAI Grok, Microsoft Phi	Google Gemini 3 Pro / 3 Flash (in-house); 200+ in Model Garden incl. Anthropic Claude, Meta Llama, Mistral, open Gemma
Owns a frontier model?	No (platform/aggregator strategy)	Anthropic stake; Amazon Nova in-house	OpenAI partnership; Phi in-house SLMs	Yes - Gemini, built in-house by Google DeepMind
Managed agent runtime	OCI Enterprise AI Agents (GA Mar 2026): RAG, tools, vector stores, Responses-style API, governance hooks	Bedrock AgentCore (GA Oct 2025) + Managed Harness (GA Apr 2026): runtime, gateway, memory, identity, policy, code interpreter, browser, evals	Foundry Agent Service: GPT-5 family, model router, built-in browser automation and MCP tools	Gemini Enterprise Agent Platform: ADK (stable v1.0), Agent Engine runtime + Memory Bank, Agent Studio, A2A protocol v1.0
Vector in source DB	Oracle AI Database 26ai native VECTOR type + HNSW/IVF; hybrid RAG exposed as MCP tool	Aurora / RDS pgvector; Aurora DSQL vectors where applicable; OpenSearch	Azure AI Search; Azure SQL / SQL Server vector; Fabric	AlloyDB AI (pgvector + ScaNN), BigQuery vector search, Spanner vector
Guardrails / safety	Enterprise AI Governance + native guardrails (prompt + response eval, Mar 2026) + Language PII filters	Bedrock Guardrails (mature) + Automated Reasoning checks	Azure AI Content Safety + Foundry guardrails/evaluations	Model Armor AI firewall (prompt-injection, data-leak, content filters; multi-model)
Shared AI data plane	Oracle AI Data Platform + OCI Resource Analytics (Jun 2026)	SageMaker Unified Studio + Glue / S3 Tables	Microsoft Fabric + OneLake + Foundry	BigQuery + Vertex + Dataplex governance
Model routing / cost control	Flexible model routing (work across models, not one-size-fits-all)	Per-model selection; intelligent prompt routing on Bedrock	Model router: Quality / Cost / Balanced modes, up to ~60% inference savings	Vertex model selection; per-agent pricing in the Gemini Enterprise product
Apps integration	Fusion Agentic Apps native (22 agents, ERP/HCM/SCM/CX)	Amazon Q Business, Amazon Connect	Microsoft 365 Copilot, Dynamics 365	Google Workspace (Gemini in Docs/Gmail/Sheets), Workspace Studio
On-demand pricing unit	Per 10,000 transactions (characters) for several models; token-based for newer ones	Per 1M input/output tokens	Per 1M input/output tokens	Per 1M input/output tokens
Developer tooling maturity	Improving fast but still behind on breadth	Mature (Bedrock Studio, SageMaker, AgentCore)	Mature (Foundry portal, VS Code, GitHub)	Mature (Vertex / Gemini platform, Colab Enterprise, Workbench)

How to read the model row

The headline in 2026 is that all four sell each other's neighbors. OpenAI's open gpt-oss models run on OCI and Bedrock. Anthropic Claude runs on Bedrock, Microsoft Foundry, and Google Cloud. The one model that stays first-party is Google's Gemini, which you only get on GCP (with narrow exceptions where OCI exposes it). The lock-in is no longer the model. It is the data plane, the agent runtime, and the governance model around it.

Model availability, side by side (June 2026)

Model family	OCI	AWS Bedrock	Microsoft Foundry	Google Cloud
Google Gemini 3 (Pro / Flash)	~ limited, where exposed	✗	✗	✓ first-party flagship
Anthropic Claude (Opus 4.8 / Sonnet 4.6 / Haiku 4.5)	✗ not first-party (call direct)	✓ flagship	✓ available	✓ Model Garden
OpenAI GPT-5 / GPT-5.5 (hosted API)	✗	~ GPT-5.5 / 5.4 added	✓ primary	✗
OpenAI gpt-oss (open weights)	✓ import + AI Quick Actions	✓	✓	✓ self-deploy
Cohere Command / Embed 4 / Rerank 4	✓ strategic partner	✓	~ partial	~ Model Garden
Meta Llama 4 (Scout / Maverick)	✓	✓	✓	✓
xAI Grok (4.x)	✓	~ select	✓	✗
NVIDIA Nemotron 3 (Nano Omni / Ultra)	✓ dedicated clusters	~	~	~ via NIM
Amazon Nova 2	✗	✓ in-house	✗	✗
Microsoft Phi	✗	✗	✓ in-house	✗
Alibaba Qwen / Google Gemma (open)	✓ import	✓	✓	✓ Gemma in-house

✓ = first-party / managed · ~ = partial, region-limited, or recently added · ✗ = not native (use direct API or a gateway). Always confirm exact model IDs and regions in the console.

Pricing, normalized (representative, June 2026)

Read this before the table

Comparing list prices across clouds is a trap. OCI bills several on-demand models per 10,000 transactions (characters), while AWS, Azure, and Google bill per token. Roughly, 1 token ≈ 4 characters in English, so 10,000 characters ≈ 2,500 tokens, but this varies by language and tokenizer. Note Google charges Gemini 3 Pro at a higher rate once a request crosses 200K input tokens. The figures below are representative list prices pulled from vendor and third-party pricing pages in June 2026, normalized to USD per 1M tokens (input / output) where possible. Treat them as order-of-magnitude, not quotes. Verify on the official pricing pages.

Item	OCI	AWS Bedrock	Microsoft Foundry	Google Cloud
Flagship reasoning model (in / out per 1M tok)	Grok / Cohere top tier, token-based up to ~$10.7 in (varies by model)	Claude Opus 4.8 ≈ $5 / $25	GPT-5.5 ≈ $5 / $30	Gemini 3 Pro ≈ $2 / $12 (≤200K ctx); $4 / $18 beyond
Mid-tier workhorse (in / out per 1M tok)	Cohere Command A / Llama 4 (low per-character rates)	Claude Sonnet 4.6 ≈ $3 / $15	GPT-5 mini (lower-cost tier)	Gemini 3 Flash ≈ $0.50 / $3.00
Cheapest small model	Llama 4 Scout ≈ $0.0018 / 10K transactions	Amazon Nova Micro ≈ $0.035 / 1M in	GPT-5 nano / Phi (low-latency tier)	Gemini 3 Flash-Lite tier
Embeddings	Cohere Embed 4 ≈ $0.001 / 10K transactions	Titan / Nova multimodal embeddings, per 1M tok	Azure OpenAI embeddings, per 1M tok	Vertex AI text embeddings, per 1M tok
Reranker	Cohere Rerank 4 on-demand; dedicated ≈ $10 / cluster-hour	Cohere Rerank via Bedrock	via Azure AI Search semantic ranker	Vertex AI Ranking API / grounding
Dedicated / provisioned	Per AI-unit-hour (e.g. large Cohere ≈ $24, large Meta ≈ $12)	Provisioned Throughput (model units / hour)	Provisioned Throughput Units (PTUs)	Provisioned Throughput (GSUs)
Prompt caching discount	Model-dependent	Up to ~90% on cached input	Up to ~90% on cached input	Context caching discount

Sources: Oracle OCI Generative AI pricing page; Anthropic / AWS Bedrock pricing; Microsoft Foundry pricing; Google Vertex / Gemini pricing; third-party aggregators (June 2026). Prices change without notice.

Agents & RAG platforms, in depth

Capability	OCI Enterprise AI Agents	AWS Bedrock AgentCore	Foundry Agent Service	Gemini Enterprise Agent Platform
GA status	GA Mar 2026	AgentCore GA Oct 2025; Managed Harness GA Apr 2026	GA; GPT-5 family rolling into the runtime	GA; rebranded from Vertex AI at Cloud Next, Apr 2026
Managed RAG / knowledge stores	Built-in vector stores + 26ai + OpenSearch; Object Storage ingestion	Bedrock Knowledge Bases (managed ingestion + vector store)	Foundry vector index + Azure AI Search	Vertex AI Search + grounding; AlloyDB / BigQuery vectors
Tools / function calling	Tools + Responses-style API	Gateway turns APIs/Lambdas into agent tools	Built-in tools + MCP + browser automation	Function calling + tools; A2A protocol v1.0 for agent-to-agent
Memory / identity	Session state; governance hooks	Managed memory, identity, policy engine	Thread state; Entra ID identity	Agent Engine Sessions + Memory Bank; Google IAM
Built-in browser / code exec	Via tools / custom	Yes: browser tool + code interpreter built in	Yes: browser automation; code interpreter	Yes: code execution + computer-use tooling
Observability / evals	Governance + monitoring hooks	Built-in evaluations + observability	Foundry evaluations + tracing	Vertex evaluations + tracing
Standout strength	Wired into Oracle data + Fusion roles, approvals, RBAC	Most complete standalone agent infra; model-agnostic	Tight M365 / Entra / GitHub fit + model router economics	Owns Gemini end-to-end; ADK + A2A; tight Workspace + BigQuery fit

Architect's read on agents

If you are building agents in the abstract, Bedrock AgentCore is the most complete runtime today. If your agents mostly act on Oracle data or inside Fusion, OCI's tighter coupling to roles, approvals, and the database usually beats a more capable but disconnected runtime. Foundry wins when the agent lives in the Microsoft 365 / Entra world. Google's platform wins when you want one vendor from the chip (TPU) to the model (Gemini) to the agent (ADK + A2A), or when your data already sits in BigQuery and Workspace.

Sovereignty & governance

Dimension	OCI	AWS	Microsoft	Google
US government	US Gov Cloud + US Classified Cloud with GA Gen AI (since Jan 2026)	GovCloud (US) + Secret / Top Secret regions	Azure Government + classified offerings	Assured Workloads for Gov; IL5-capable regions
EU sovereign	EU Sovereign Cloud (GA, EU-operated)	European Sovereign Cloud GA Jan 15, 2026 (Germany; partition aws-eusc)	Microsoft Cloud for Sovereignty + EU Data Boundary	Sovereign Cloud; partner-operated (T-Systems Germany, S3NS / Thales France)
Gen AI in the sovereign region?	Yes broad model set GA in sovereign/regulated regions	Limited Bedrock present but only Nova Lite / Pro at ESC launch (no Claude/Llama/Mistral)	Varies by offering and region	Yes Gemini runs fully air-gapped on-prem via GDC
GCC / Middle East	Saudi (Jeddah, Riyadh), UAE Central (Abu Dhabi, full Enterprise AI Jun 2026), Israel	UAE, Bahrain regions (model availability varies)	UAE, Qatar regions (model availability varies)	Saudi (Dammam), Qatar (Doha), Israel regions
Guardrails maturity	Native platform guardrails (Mar 2026) + governance + PII filters	Bedrock Guardrails (most mature) + Automated Reasoning	Content Safety + Foundry evaluations	Model Armor AI firewall (multi-model, in-line)
Sovereign-region cost note	Standard regional pricing	~15% premium in ESC; 2 AZs; no Free Tier at launch	Varies by sovereign offering	GDC air-gapped needs Google-supplied hardware

Sovereignty: where OCI and Google now lead

Sovereignty is the dimension where the usual pecking order flips. OCI has the widest set of frontier and open models GA inside gov, classified, EU-sovereign, and GCC cloud regions. Google has taken a different and arguably stronger path for the hardest cases: Gemini now runs fully air-gapped on-prem through Google Distributed Cloud, even on a single disconnected server. AWS only opened its European Sovereign Cloud in January 2026, and at launch Bedrock there is limited to Amazon Nova Lite and Pro. Rule of thumb: for in-region cloud, shortlist OCI. For a true air-gap or on-prem mandate, shortlist OCI and Google.

When each wins

Buying criterion	Winner	Why
You already run Oracle DB / Fusion	OCI	Data gravity + Fusion Agentic Apps + free vector search
You want the Gemini model specifically	GCP	Gemini is first-party to Google; only narrow exposure elsewhere
You're BigQuery / Workspace native	GCP	Data gravity in BigQuery; Gemini wired into Docs, Sheets, Gmail
You need a specific frontier model/version right now	AWS, Azure, GCP, or direct API	OCI catalog is broad in 2026, but exact model/version/region still decides.
You need M365 / Dynamics integration	Azure	Copilot ecosystem
You're AWS-native on infra	AWS	IAM / VPC / observability already there
Sovereign data with GA Gen AI in region	OCI	Reach into Gov, Classified, UAE, EU Sov
True air-gapped or on-prem Gen AI	GCP or OCI	Gemini runs air-gapped on GDC; OCI has classified / sovereign regions
Pure consumer SaaS at low cost	AWS, GCP, or direct API	Wider model price competition; Gemini Flash is cheap
Document-heavy enterprise back office	Tie (all four good)	Each has competent doc AI + RAG

Quick alignment (informal)

OCI	AWS	Microsoft Foundry (was Azure AI Foundry)	Google Cloud
OCI Generative AI Service	Bedrock	Microsoft Foundry / Azure OpenAI	Vertex AI / Gemini API
OCI Enterprise AI Agents	Bedrock Agents + KB + AgentCore	Foundry Agent Service	Gemini Enterprise Agent Platform (ADK + Agent Engine)
Oracle AI Data Platform	SageMaker Unified Studio / Glue / S3 Tables	Fabric / OneLake / Foundry data plane	BigQuery + Dataplex + Vertex
AI Vector Search (26ai)	OpenSearch · Aurora pgvector	Azure AI Search · SQL DB vector	AlloyDB AI · BigQuery vectors
OCI Data Science · AI Quick Actions	SageMaker · Bedrock Marketplace	Foundry · Azure ML	Vertex AI Workbench · Model Garden
OCI Vision	Rekognition	Azure AI Vision	Cloud Vision / Vertex Vision
OCI Language	Comprehend	Azure AI Language	Cloud Natural Language
OCI Speech + xAI Voice	Transcribe + Polly	Azure AI Speech	Speech-to-Text + Text-to-Speech (Chirp)
OCI Document Understanding	Textract	Azure Document Intelligence	Document AI
OCI Anomaly Detection	Lookout for Metrics (deprecating)	Anomaly Detector (deprecating)	Timeseries Insights / BigQuery ML
OCI Forecasting	SageMaker Canvas Forecast	Azure ML AutoML Forecasting	Vertex AI Forecasting / BigQuery ML
Fusion Agentic Apps	Q Business	M365 Copilot	Gemini for Workspace / Agentspace
HeatWave GenAI	(no direct equivalent)	(no direct equivalent)	BigQuery ML + Gemini

The unfashionable truth

For most enterprises, the right answer is multi-cloud AI. OCI for Oracle-data-anchored workloads and sovereign deployments. AWS or Azure for the frontier-model workloads. Google when you want Gemini, BigQuery-anchored AI, or air-gapped on-prem. Pretending one vendor wins everything is a procurement narrative, not an architecture one.

Sources used for this June 2026 refresh

Primary Oracle docs and blogs the content was anchored to, plus the competitive sources used for the OCI vs AWS vs Azure comparison. Verify the latest before locking in commitments.

June 2026 update (new this refresh)

OCI vs AWS vs Azure vs GCP comparison sources

OCI Generative AI

OCI Enterprise AI Agents and Governance

Oracle AI Database 26ai / 23ai Vector Search

Oracle AI Data Platform

Select AI

Fusion Agentic Apps & AI Agent Studio

APEX AI

Oracle Digital Assistant

MySQL HeatWave GenAI

OCI Data Science & AI Quick Actions

OCI AI Services

Infrastructure / NVIDIA partnership

Verification discipline

Pricing pages change without notice. Release notes get amended. Service names get renamed (see 23ai → 26ai). Always confirm in the OCI Console for your region and the official pricing pages before commitments.