Oracle AI, the practical way

This portal covers Oracle's full AI stack as of June 24, 2026. From OCI Generative AI Service, Enterprise AI Agents, and Oracle AI Data Platform, to AI Vector Search in Oracle AI Database 26ai, to the 22 Fusion Agentic Applications launched in March 2026. Architecture, trade-offs, risks, pricing. No marketing talk.

Refreshed June 2026 Architecture-first Enterprise focus Vendor-neutral
TL;DR

Oracle's AI story in 2026 has three centers of gravity. OCI Enterprise AI packages Generative AI Models, Enterprise AI Agents, and Governance into a managed build platform. Fusion Agentic Applications (GA Mar 2026) ships 22 pre-built agentic apps embedded inside Fusion Cloud ERP, HCM, SCM, and CX. Underneath everything sits Oracle AI Database 26ai with native vector search, Select AI, Private Agent Factory, and Private AI Services Container for workloads that must stay database-close or private. If you're an enterprise already on Oracle, this stack is increasingly hard to ignore.

How this portal is organized

Left sidebar groups Oracle AI into seven layers. Each service has its own page with tabs: Overview, Architecture, Models or Features, Pricing, Risks, and When to use. The bottom of the sidebar has decision matrices, architecture patterns, and a cross-cloud comparison.

NEWGenerative AI

Cohere Command A family, Llama 4 Scout & Maverick, xAI Grok 4.x, NVIDIA Nemotron, Google Gemini options, OpenAI open-weight models, importable compatible models, and dedicated AI clusters.

GAAI Vector Search

Native VECTOR datatype, HNSW + IVF indexes, unified hybrid search across vector + relational + JSON + graph + spatial, plus private agents and private AI containers.

NEWFusion Agentic Apps

22 pre-built agentic applications for Finance, HR, SCM, and CX. Native to the transactional system, governed by Fusion roles.

NEWAI Data Platform

Governed data discovery, catalogs, workspaces, pipelines, notebooks, RBAC, and agent-ready data connections for AI teams that need a shared data plane.

Who this is for

Enterprise architects, Oracle DBAs and Apps DBAs moving into AI, technical leads scoping pilots, and anyone who has to defend an Oracle-vs-hyperscaler choice in a steering committee. Assumes you already know cloud, databases, and identity. Does not assume you know what an embedding is, and explains the AI-specific bits as it goes.

The Oracle AI mental model

Think of Oracle's AI in three layers stacked on top of each other.

LAYER 3 · APPLICATION AI (consume) Fusion Agentic Apps · AI Agent Studio · APEX AI Assistant · ODA · Industry SaaS (NetSuite, Health, Hospitality) Pre-built. Governed by app roles. Customize via low-code, not by training models. LAYER 2 · PLATFORM AI (build) OCI Generative AI Models · Enterprise AI Agents · Governance · AI Data Platform · AI Quick Actions OCI Vision · Language · Speech · xAI Voice · Document Understanding · Anomaly Detection · Forecasting Consume hosted models, import compatible models, build agents, govern responses, fine-tune, deploy, monitor. LAYER 1 · DATA & INFRASTRUCTURE (ground) Oracle AI Database 26ai (Vector + Select AI + Private Agent Factory) · MySQL HeatWave GenAI · Object Storage · OpenSearch Private AI Services Container · GPU shapes: H100, H200, B200, GB200 · Sovereign & Gov regions This is where your enterprise data lives. Vectors, private tools, and sometimes the model stay next to the rows.
Figure 1 · Oracle AI is layered. Most enterprises start at Layer 3 (Fusion apps) or Layer 1 (vector search in the DB they already own).

What sets Oracle apart in 2026

DifferentiatorWhat it means in practice
Vectors live in the source-of-truth DBNo separate Pinecone or Weaviate. Transactionally consistent vector search next to your rows. Means RAG ground truth and operational data never drift.
Unified hybrid searchOne SQL can join vector similarity with relational predicates, JSON paths, graph hops, and spatial filters. Other vendors need an orchestration layer.
Multi-model gateway (BYO LLM)Cohere, Meta Llama 4, xAI Grok, NVIDIA Nemotron, one service, one endpoint, one bill. Useful when you want vendor optionality without re-architecting.
Fusion-native agents22 agentic apps run inside Fusion with full role security, approval hierarchies, and transactional context. Other vendors' agents have to bolt on to ERP via APIs.
Sovereign and Gov coverageOCI Gen AI is GA in US Gov, US Classified, UAE Central, EU sovereign. Often the only path for regulated workloads.
Free-tier vector searchAI Vector Search ships at no extra licence cost in 26ai. Compared to Postgres + pgvector + vector-DB-as-a-service, the TCO comparison is brutal for the competition if you already pay for the DB.

What Oracle is still weak at (be honest)

Gaps worth naming
Oracle does not own a frontier model. It packages partner/open model families and import paths behind OCI controls. If your differentiation depends on a single exact proprietary model/version that is not exposed in your target OCI region, Oracle is not where you go for that workload, use the model vendor directly or via AWS/Azure, behind a model gateway. Oracle's bet is enterprise integration, governance, data gravity, and private deployment patterns, not owning the top model leaderboard.
Tooling maturity
Compared with AWS Bedrock Studio, SageMaker Studio, and Azure AI Foundry, the developer experience is improving fast but still feels Oracle-Console-y. Notebooks, MLOps lineage, and prompt-engineering UIs are catching up, not leading.

How to read the rest of this portal

Each service tab follows the same shape: Overview → Architecture → Models/Features → Pricing → Risks. If you only have time for one tab, read Risks. The other tabs tell you what something does. Risks tells you what burns you in production.

What's New - Q4 2025 through June 2026

Material changes that affect architecture, cost, or risk decisions. Curated, not a press-release dump.

TL;DR

Three things matter most. One: Oracle AI Database 26ai moved AI into the database, including vector search, Select AI, private agents, hybrid RAG exposed as MCP tools, and private vector-service containers. Two: OCI Enterprise AI is now a broader platform: Generative AI Models, Enterprise AI Agents, and Governance rather than only an endpoint catalog, and the June 2026 wave widened model choice (Nemotron 3 Ultra, Qwen, Gemma, gpt-oss on B200), promoted Cohere Rerank 4 to on-demand, and added OCI Resource Analytics. Three: Fusion Agentic Applications launched Mar 2026 with 22 pre-built agents inside Fusion ERP/HCM/SCM and expanded to CX in Apr 2026.

June 2026 in one line
The June refresh is mostly platform breadth, not new tech direction: more models you can import and host (Nemotron 3 Ultra, Qwen, Gemma, OpenAI gpt-oss on B200), Cohere Rerank 4 now on-demand and on dedicated clusters, full multimodal (Embed 4 + xAI Voice), a new Abu Dhabi footprint for Enterprise AI, and OCI Resource Analytics for AI-queryable cloud-estate data. Source: Oracle "What's New in AI, June 2026" (published June 11, 2026).

Major releases timeline

DateReleaseWhy it matters
Oct 2025Database 23ai renamed Oracle AI Database 26aiAligns with calendar versioning. AI Vector Search now standard, not an add-on. Branding tells customers AI is a first-class workload.
Jan 2026Oracle AI Database 26ai Linux x86-64 on-prem (RU 23.26.1)Enterprises can run AI Vector Search on their existing Exadata / commodity Linux without going to OCI.
Jan 2026OCI Gen AI in US Classified CloudTop Secret / classified workloads can now use Gen AI without leaving the Oracle classified environment.
Jan 2026xAI Grok 4.1 Fast + Cohere Command A Vision, Command A ReasoningCheaper Grok variant for high-volume. Cohere adds vision and reasoning variants for enterprise agent patterns.
Mar 2026OCI Enterprise AI GAOracle formalizes the stack around Generative AI Models, Enterprise AI Agents, and Enterprise AI Governance.
Mar 2026Enterprise AI Agents GAAgent runtime expands beyond basic RAG into a managed platform with tools, vector stores, responses API, and governance hooks.
Mar 2026Fusion Agentic Applications launch (22 apps)Native ERP/HCM/SCM agents. Not bolt-on. Approval hierarchies and Fusion roles flow through automatically.
Mar 2026AI Agent Studio adds Agentic Applications BuilderNo-code orchestration of Oracle, partner, and external agents. Free with Fusion subscription.
Mar 2026AI guardrails for OCI Gen AI on-demandNative guardrail evaluation changes the production control model: validate prompts and responses at the platform layer, not only in app code.
Mar 2026NVIDIA GTC 2026: OCI Superclusters with GB200 NVL72For frontier training. Less relevant to most enterprises but signals Oracle's continuing GPU access advantage.
Apr 2026RU 23.26.2 for Oracle AI Database 26aiQuarterly cadence now driving vector improvements (DML on HNSW, hybrid search refinements).
Apr 2026Fusion Agentic Apps for CX (Sales, Service, Marketing)Expands from back-office (ERP/HCM) into customer-facing flows.
Apr 2026NVIDIA Nemotron 3 Nano Omni on OCI Gen AIAdds a strong small-model option for multimodal use cases on commodity GPUs.
May 2026Import compatible models into OCI Gen AIArchitecturally important: teams can bring compatible models such as Qwen/Gemma-style models into the OCI Gen AI control plane instead of leaving everything in Data Science.
May 2026Cohere Embed 4 supports mixed text + image inputUseful for multimodal RAG over PDFs, slides, screenshots, catalog images, and claims packets.
May 2026Cohere Rerank 4 on OCI Gen AIBetter second-stage retrieval quality for RAG. Drop-in upgrade for existing RAG pipelines.
May 2026xAI Voice text-to-speech on OCI Gen AIAdds hosted TTS to Oracle's Gen AI layer; use for call-center summaries, training narration, accessibility, and agent voice responses.
May 2026Grok 4.3 on OCIModel catalog expansion for reasoning-heavy research and analysis workloads. Confirm region/model availability before designing around it.
May 2026OCI Gen AI in UAE Central (Abu Dhabi)Sovereignty win for GCC customers and BFSI. Bedrock and Azure OpenAI parity issue for the region.
Jun 2026Deprecated Gen AI APIs become unavailableDo not build new integrations on legacy GenerateText/SummarizeText style APIs. Use the current chat / responses APIs and SDK patterns.
Jun 2026Cohere Rerank 4 now on-demand and on Dedicated AI ClustersReranking is no longer dedicated-cluster only. On-demand pricing lowers the barrier to adding a second-stage reranker to existing RAG pipelines. Quality lift over raw vector search for little engineering cost.
Jun 2026NVIDIA Nemotron 3 Ultra on OCI Enterprise AI (dedicated clusters)Open-weights frontier reasoning/agentic model you host on Oracle-recommended GPUs behind a managed OCI endpoint. Option for teams that want a strong open model under their own control plane, not a vendor API.
Jun 2026New Model Import models: Alibaba Qwen, Google Gemma; gpt-oss-20b/120b on B200 in Abu DhabiWidens the bring-your-own-model catalog inside the OCI Gen AI control plane. gpt-oss on B200 clusters in UAE Central pairs open OpenAI weights with sovereign-region hosting.
Jun 2026Multimodal: Cohere Embed 4 (text/image/combined) + xAI Voice TTS in Enterprise AIConfirms multimodal RAG and voice as first-class on the platform. Embed 4 handles mixed text-image inputs; xAI Voice covers narration, accessibility, and agent voice responses.
Jun 2026OCI Enterprise AI GA in UAE Central (Abu Dhabi)Moves beyond a single Gen AI endpoint to the full Enterprise AI stack in-region, on-demand or dedicated. Data-residency win for GCC and BFSI customers.
Jun 2026OCI Resource Analytics for cloud-estate intelligenceNear-real-time view of resources, relationships, and config metadata across regions/tenancies. Runs on Oracle AI Database with Select AI and MCP server support, so agents and assistants can query your estate in natural language.
Jun 2026OCI AI Accelerator Packs + Enterprise AI Chat reference architecturePreconfigured, self-service AI solutions launchable from the OCI Console, plus a published reference architecture and GitHub deployment guide for enterprise-grade AI chat. Lowers time-to-first-pilot.
Jun 2026Hybrid RAG in 26ai exposed as an MCP toolOracle guidance now shows turning a 26ai vector index into an MCP tool for hybrid (vector + keyword) RAG. Signals MCP becoming the default integration surface between the database and agents.

Practical implications for architects

If you have an existing Oracle DB estate

26ai upgrade unlocks RAG without buying a vector DB. Plan an architecture review: which workloads can move to in-DB embeddings vs which need OCI Gen AI Agents service? The decision often comes down to whether the corpus is mostly structured (DB-side) or mostly documents (Agents service).

If you run Fusion Apps

Pilot 1-2 of the 22 Agentic Apps now. They are included in your Fusion subscription. The build-vs-buy math for custom agents got worse, Oracle's are pre-wired into roles, approvals, and data. Build only what Fusion does not cover.

If you are building greenfield AI on OCI

Start with Enterprise AI Agents and the current Responses-style APIs for managed RAG, tools, vector stores, and governance. Drop into Data Science only when you need custom model training or hosting AI Quick Actions doesn't cover.

If multiple teams need the same AI data

Use Oracle AI Data Platform to define governed data products, catalogs, owners, lineage, RBAC, and refresh rules. Do not let every chatbot create its own private copy of the corpus.

If sovereignty drives the decision

OCI's reach into Gov, Classified, UAE, and EU sovereign regions has widened. For workloads where the data physically cannot leave a jurisdiction, OCI Gen AI is increasingly the only major-vendor option with a GA service.

Watch-out: version naming
Internally, 26ai is version 23.26.1 (Jan 2026 RU). Patches, MOS notes, and many docs still say "23ai" or "Database 23c/23ai". Don't let the marketing rename confuse procurement or patching playbooks. 26ai = 23ai with calendar-year naming. Same product family.

Service Map

Every Oracle AI service worth knowing, in one diagram. Use this to orient before you go deep.

CONSUME · Pre-built application AI Fusion Agentic Apps 22 agents · ERP/HCM/SCM/CX AI Agent Studio Build/orchestrate agents APEX AI Assistant Low-code + RAG Industry / Digital Assistants NetSuite, ODA, Health, Hospitality Enterprise AI Agents RAG · tools · responses BUILD · Platform AI services OCI Generative AI Service Models · import · voice · guardrails AI Data Platform Data products · vectors · lineage Data Science + AQA MLOps · FM deploy · fine-tune Embeddings · Rerank · Governance Embed 4 · Rerank 4 · guardrails OCI Vision Image / OCR OCI Language NLU / NER / Sent Speech + xAI Voice ASR / TTS / Diarize Doc Understanding Extract / Classify Anomaly Detect MSET2 time-series Forecasting AutoML time-series DATA · Where embeddings & ground truth live Oracle AI Database 26ai VECTOR datatype · HNSW · IVF Select AI · ONNX in-DB Private agents · hybrid search MySQL HeatWave GenAI In-DB LLMs Vector store Lakehouse OCI Object Storage Source docs for RAG Bucket = KB source OCI Search with OpenSearch Pre-indexed corpora Hybrid keyword + vector INFRASTRUCTURE · GPU shapes & networking BM.GPU.H100.8 · BM.GPU.H200.8 · BM.GPU.B200 · GB200 NVL72 Superclusters · RDMA cluster networks Regions: 50+ commercial · 3 Gov · 1 Classified · UAE · EU Sovereign · Dedicated Region
Figure 2 · Oracle AI service map, June 2026. Layers from consume (top) to infrastructure (bottom).

Reading the map

The top band is where you start if you are buying outcomes, pick a Fusion agent, hook up APEX RAG, or use a managed Gen AI Agent. The middle band is where you start if you are building, pick a model, write prompts, expose endpoints. The bottom two bands are where the data and compute live; you do not get to ignore them, because they drive cost and latency.

A common mistake
Teams start at the platform band (Gen AI Service) when an app-band service (Generative AI Agents) would have shipped in a month instead of six. Always check whether a managed service one layer up does what you need before you build it yourself.

OCI Generative AI Service GA

Oracle's managed foundation-model platform. Use hosted models, import compatible models, build Enterprise AI Agents, apply guardrails, or rent a dedicated AI cluster. Region-restricted, IAM-governed, OCI-billed.

Overview
Architecture
Capabilities
Pricing model
Risks & gotchas
When to use
TL;DR

One platform, several control points. You can call hosted models for chat, embeddings, rerank, and text-to-speech; import compatible models into the Gen AI control plane; build Enterprise AI Agents; and apply native guardrails. You pick on-demand for elasticity or dedicated AI clusters for steady throughput, isolation, fine-tuning, and private capacity.

What problem this solves

Most enterprises don't want to manage GPU clusters, model weights, guardrail services, vector-store plumbing, and vendor contracts separately. They want a single OCI-governed surface with IAM, private networking, logging, cost controls, and the ability to swap models without rewriting the app. That's the offer. The trade-off is catalog and feature availability vary sharply by region and model family.

Two consumption modes: pick one per workload

ModeHow you payLatency & isolationBest for
On-demandPer 1M characters (input + output for generation; input only for embeddings)Shared GPU pool. Burst-tolerant. Variable latency under load.Prototyping, low-volume prod, spiky workloads, dev environments.
Dedicated AI ClusterHourly per cluster, irrespective of utilizationDedicated GPUs in your tenancy. Stable latency. Tenancy-isolated.Steady high-volume traffic, regulated data, sub-second SLA, fine-tuned models, custom-trained adapters, imported compatible models.
Rule of thumb
Cross over to a dedicated cluster when your on-demand bill exceeds the cluster hourly rate by ~30%. Below that, on-demand is cheaper and operationally simpler. Above it, dedicated wins on cost and latency.

Reference architecture

Customer VCN (private subnets, NSG-protected) Application OKE pod / OCI Functions / Compute VM / APEX OCI SDK call Gen AI Service Endpoint Regional, private Service Gateway access IAM + Resource Principal Model dispatch ▸ Cohere Command A / A Vision / A Reasoning ▸ Meta Llama 4 Scout · Maverick · Llama 3.3 ▸ xAI Grok 4.x + xAI Voice TTS ▸ NVIDIA Nemotron · Google Gemini options · OpenAI gpt-oss ▸ Cohere Embed 4 · Rerank 4 · imported compatible models On-demand inference pool Shared GPU fleet, per-character billing No capacity provisioning Per-tenant quotas apply Dedicated AI Cluster Customer-tenanted GPUs (BM.GPU.A100/H100) Hourly billing · Reserved capacity Required for fine-tuning & custom adapters RAG context source Oracle AI Database 26ai (VECTOR + Hybrid search) or OpenSearch / Object Storage Observability OCI Logging · Monitoring Audit logs of all model calls Usage metrics per compartment Security OCI IAM policies on model + endpoint Vault for prompt templates / API keys Data never used to train shared models
Figure · OCI Generative AI Service reference. Service gateway keeps traffic on the Oracle backbone, never the public internet.

Network and identity

The Gen AI Service endpoint is reachable through a Service Gateway and supports private endpoint patterns for workloads that should stay off the public internet. Authentication is OCI IAM, calls from compute instances use instance principals or resource principals, external apps use signed requests or governed API-key patterns. Authorization is granted via IAM policies on the generative-ai-family resource type. You can scope to compartments, model endpoints, and agent resources.

Where the data goes

Oracle's stated position is that on-demand requests are not used to train shared models. Dedicated clusters provide stronger tenant isolation. Logs of prompts and completions can be captured to OCI Logging at your discretion. For regulated data, prefer dedicated clusters, private endpoints, zero-trust network controls where available, and VCN-side egress rules so the only allowed path is to OCI services you approved.

Capability matrix (June 2026)

CapabilityOn-demandDedicated clusterNotes
Text generationAll chat / completion models.
EmbeddingsCohere Embed v4 multimodal, English + multilingual.
RerankingCohere Rerank 4 (May 2026).
Vision (image input)Cohere Command A Vision, Llama 4 multimodal, Grok 4.3 vision.
Text-to-speechxAI Voice support arrives through OCI Gen AI, separate from OCI Speech transcription.
Function calling / toolsStandard JSON-mode tool calling on Cohere & Llama.
Responses API / hosted toolsUse the current Responses-style APIs for agentic applications; do not build on deprecated text APIs.
Streaming outputSSE.
Import compatible modelsUse when a model is compatible with the OCI Gen AI serving path but is not yet in the managed catalog.
Fine-tuning (LoRA / T-Few)Requires dedicated cluster.
Custom-trained adaptersPer-customer model endpoints.
Long context (>200K)Llama 4 Maverick & Grok 4.20 support longer contexts; quotas tighter on shared pool.
Content moderation / guardrailsNative guardrail evaluation for prompts/responses; supported controls vary by on-demand vs dedicated endpoint.

Region availability (as of June 2026)

Always confirm in the Console, but at time of writing OCI Gen AI is GA in: US (Chicago, Phoenix, Ashburn), Frankfurt, London, Amsterdam, Tokyo, Osaka, Sydney, Mumbai, Hyderabad, São Paulo, Toronto, Saudi Arabia (Jeddah, Riyadh), Israel, Singapore, Seoul, UAE Central (Abu Dhabi, full Enterprise AI as of June 2026), US Gov Cloud, US Classified Cloud. Not every model is in every region, Grok and Nemotron have narrower footprints, Cohere is widest. Abu Dhabi now also hosts imported OpenAI gpt-oss-20b/120b on B200 dedicated clusters.

Region trap
Model availability lags region launch. A region may have Cohere Command A but not Llama 4 Maverick. Pin model+region in your architecture review before committing.

Pricing mental model

On-demand pricing is per 1 million characters, not per token. Character count includes whitespace. For generation, you pay input + output; for embeddings, input only. For dedicated clusters you pay hourly per unit, where one "unit" is a specific GPU configuration that varies by model family. Verify current rates on the OCI pricing page, they move.

Cost behaviour by mode

WorkloadBest modeWhy
POC / pilot, <5M chars/dayOn-demandPay only for what you call. Cluster idle cost would dominate.
Steady 50M+ chars/day, predictableDedicated clusterHourly rate amortizes well past a threshold; latency stable.
Fine-tuned Cohere for legal reviewDedicated cluster (required)Custom adapters only deploy on dedicated.
Multi-tenant SaaS, bursty per customerOn-demand with circuit breakersQuotas + retry shed load gracefully; cluster overprovisioning expensive.
Regulated data with isolation requirementDedicated clusterTenant isolation is the buying criterion, not cost.

Hidden cost drivers

  • System prompt bloat. Every request pays for the full system prompt. A 4KB persona prompt at scale dominates the bill. Use prompt caching where supported, or templatize.
  • Naive RAG context windows. Stuffing 20 chunks into context costs 20x more than stuffing 4. Use reranking (Rerank 4) to cut to top-K, then send.
  • Retries and timeouts. A 504 retry is two billed calls. Cap retries, log them, set them as alarms.
  • Egress. Calls from outside OCI to the Gen AI endpoint can incur OCI egress on the response path. Keep callers inside OCI when volume is high.
  • Embeddings re-runs. Re-embedding your whole corpus when you change models is expensive. Version your embeddings and decide policy up front.

Risks to think about before production

RiskImpactMitigation
Model deprecationApps break when a model is retiredAbstract model name behind a config flag. Test against the next model in the family early. Subscribe to OCI release notes for Gen AI.
Quota throttling under burst5xx during peak, lost revenueSet Alarms on 429/503 from the endpoint. Request quota increases proactively. Move hot workloads to dedicated.
Region-model mismatchCannot deploy because model missing in regionDocument model-region matrix as part of architecture review. Use a different region for inference if data residency allows.
Cross-tenant prompt leakingSensitive data echoed to other tenants in your SaaSPer-tenant prompt isolation, no global cache of completions, audit log review.
Hallucinated tool callsAgent calls wrong API with wrong argsStrict JSON schema validation, dry-run flag, idempotent tool design, human-in-loop on side-effectful tools.
Prompt injection from documentsRAG-fed document overrides system promptUse a defensive system prompt; classify retrieved chunks; mark untrusted content explicitly; pre-scan inbound docs.
Cost runaway from agent loopsRecursive agents consume thousands of dollars overnightPer-session token budget, max-step cap, alarm on cost per session, kill switch.
Compliance audit gapsCannot prove what the model said to whom and whenAlways log prompt + completion + model + version to OCI Logging. Retain per regulatory policy.

Use Gen AI Service when…

  • You need an LLM endpoint inside your OCI tenancy with IAM, VCN, and audit aligned to your existing controls.
  • You want vendor optionality across Cohere, Llama, Grok, NVIDIA, Gemini-style integrations, and importable compatible models without running each stack yourself.
  • You are building a custom app or pipeline, not consuming a pre-built one.

Skip it and use something else when…

  • Your use case is covered by a Fusion Agentic App, use that instead, it ships in days.
  • You need a specific frontier model or exact version that is not exposed in your OCI region, go direct, use that vendor's platform, or isolate the exception behind a model gateway.
  • Your corpus is mostly documents and you want managed RAG, use OCI Generative AI Agents (Knowledge Bases) instead of writing your own retrieval.
  • Your traffic is <100K requests/month, the per-character bill is fine, but evaluate whether OpenAI direct is operationally simpler given your team's existing tooling.

OCI Generative AI Agents GA ENTERPRISE AI AGENTS

Oracle's managed agent runtime. Build RAG agents, tool-using agents, and Responses API applications with vector stores, knowledge bases, hosted tools, session state, and governance hooks.

Overview
Architecture
Knowledge bases
Tools & orchestration
Risks & gotchas
When to use
TL;DR

Gen AI Agents started as managed RAG; by 2026 it is part of OCI Enterprise AI Agents. It wraps the Gen AI model layer with knowledge bases, vector stores, hosted tools, multi-turn session state, Responses-style APIs, and governance controls. Result: managed agents without owning every piece of retrieval, tool-calling, and audit plumbing. Trades flexibility for time-to-market.

What you get out of the box

  • Knowledge Bases backed by Object Storage, OCI OpenSearch, or Oracle Database 23ai/26ai Vector Search.
  • Vector Stores for reusable retrieval assets across assistants and hosted tools.
  • Multi-turn conversations with automatic context retention per session.
  • Custom instructions at agent level (system-prompt-like persona).
  • Responses API patterns for streaming, tools, file search, and stateful turns.
  • Guardrails integrated with OCI Enterprise AI Governance.
  • Human-in-the-loop approval steps as a first-class concept.
  • Citations back to source documents so users can verify.

What it is not

It is not a replacement for Fusion Agentic Apps when a pre-built agent already covers the process. It is not a free-form "let the model do anything" runtime; tools, memory, files, and guardrails still need explicit design. And it does not remove the need to think carefully about chunking, embedding, metadata, identity, and retrieval quality, managed doesn't mean magic.

Reference architecture

OCI Generative AI Agents service (managed) Client Web/Mobile/APEX Slack / Teams / API Session manager Multi-turn history Per-user context Agent runtime Plans · Tools · Reasoning Custom instructions OCI Gen AI Service Cohere / Llama / Grok / import Responses API + guardrails RETRIEVAL LAYER Knowledge Base Source: Object Storage / OS / 26ai Chunk · Embed · Index Auto-refresh on source change Hybrid keyword + vector Reranker Cohere Rerank 4 Cuts to top-K Improves precision Citation tracker Source URI per chunk Returned to client Audit-trail-ready TOOL LAYER Function tools Call APIs · Fusion · ATP Functions · OIC · MCP tools Human-in-loop Approval gates For side-effectful actions Guardrails PII · toxicity · jailbreak Pre + post filters Observability Per-turn traces OCI Logging
Figure · A request flows client → session → agent → LLM. The agent decides whether to query the KB, call a tool, or just answer. Citations flow back.

Three knowledge base source types

SourceBest forHow indexing worksTrade-offs
OCI Object StorageStatic document corpora (PDFs, DOCX, MD)Service ingests files on a schedule, chunks, embeds, stores in a service-managed vector storeSimplest. Limited tuning. Refresh latency on source changes. Good first move.
OCI Search with OpenSearchBYO indexed corpus where you control chunking and metadataYou ingest and index into OpenSearch yourself; agent queries it; chunks must be <512 tokensYou own the pipeline. More work. Better when corpus is large or filtering is heavy.
Oracle AI Database 23ai/26aiRAG over relational + document corpora with security filtersDocuments and vectors live in the DB; uses native VECTOR datatype and HNSW/IVF; agent issues hybrid SQLBest when the DB is already your source of truth. Row-level security flows through. Requires DBA skills.
Decision shortcut
If your corpus is <10K docs and rarely changes, use Object Storage. If you need to filter by user permissions or join with relational data, use Oracle Database 23ai. If you have an existing search team and a large heterogeneous corpus, use OpenSearch.

Tools = how the agent acts

An agent without tools is a chatbot. With tools, it can call APIs, query systems, write to records, search files, or invoke governed internal services. The agent picks tools by function-calling on the underlying LLM. You define each tool with a name, description, and JSON schema. The runtime validates the model's output against the schema before invoking the backing OCI Function, HTTP endpoint, integration flow, or MCP-style tool server.

Common tool patterns

  • Read-only lookups: query Fusion HCM, Service Cloud, custom APIs. Safe to call freely.
  • Side-effectful actions: submit POs, create tickets, send emails. Wrap in human-in-loop approval.
  • Computation: call a calculator/converter function. Cheap to allow.
  • Hosted tool use: file search, code execution, and vector-store lookup where the Responses API supports it.
  • Long-running jobs: start a workflow in OCI Functions, return a job ID, poll for status.

Multi-agent orchestration

Native multi-agent orchestration is provided through the AI Agent Studio Agentic Applications Builder (Mar 2026 release). It lets you compose multiple Gen AI Agents into a workflow with shared memory, conditional routing, and ROI measurement. For Fusion customers it is free and the recommended path. For non-Fusion environments, you can compose at the application layer with the SDK.

Risks and gotchas

RiskWhat goes wrongWhat to do
Stale knowledge baseSource bucket updates but the index hasn't re-ingested yetConfigure ingestion schedule, monitor lag, surface a "Last refreshed" timestamp to users.
Wrong chunk sizeRetrievals are too narrow or too wide, hurting answer qualityDefault chunking rarely optimal for technical docs. Pilot with OpenSearch where you control it.
Citations don't match claimLLM invents text but cites a real chunkStrict prompting + post-hoc verification step. For high-stakes use, render the chunk text alongside the answer.
Permission bleedAgent returns a doc the user shouldn't seeFilter at retrieval time using user identity. With 23ai KB this is straightforward via VPD. With Object Storage you must bucket-segregate or pre-filter.
Tool failure cascadeTool returns error, agent retries, retries, retriesCap retries, expose tool errors as plain text in the conversation, add max-step.
Slow first-token under loadCold-start on the underlying LLM hurts perceived latencyPre-warm via synthetic traffic; for SLA-critical agents use dedicated cluster.

Use Generative AI Agents when

  • You need an internal RAG chatbot over a corpus, and you don't want to build a retrieval pipeline.
  • You want managed citations and multi-turn out of the box.
  • Your knowledge base is one of the three supported source types.

Skip when

  • You need exotic retrieval (graph RAG, multi-hop reasoning across sources), build with the Gen AI Service directly.
  • Your use case is in Fusion, use the Fusion Agentic App instead.
  • You need on-prem inference, Agents is a managed service in the cloud.

Foundation Model Catalog

Model families and model-delivery paths to understand on OCI Generative AI as of June 24, 2026. Always confirm exact model IDs and region availability in the OCI Console before implementation.

TL;DR

Oracle is not trying to own one frontier model. The 2026 architecture is a model platform: Oracle-hosted Cohere, Meta, xAI, and NVIDIA families; Google Gemini options where exposed through the OCI Gen AI integration path; OpenAI open-weight models through AI Quick Actions; and compatible-model imports for dedicated serving. Treat model identity as configuration, not application logic.

Models by family

Cohere: Oracle's strategic partner

ModelTypeStrengthsBest for
Cohere Command AChat / generationStrong RAG behavior, enterprise tone, multilingualDefault general-purpose chat agent for enterprise apps
Cohere Command A Vision Jan 2026MultimodalImage + text understandingDocument understanding pipelines, screenshot Q&A
Cohere Command A Reasoning Jan 2026ReasoningChain-of-thought, multi-step planningAgent planning, complex tool selection
Cohere Embed v4EmbeddingsMultilingual, multimodal, 1024-dim or 256-dimDefault embedding model for RAG on OCI
Cohere Rerank 4 Jun 2026RerankerPairwise scoring of query vs candidate; cuts top-N to top-K. Now available on-demand and on dedicated clusters (Jun 2026), not dedicated-onlySecond-stage RAG retrieval; quality lift over raw vector search

Meta: open weights, broad fit

ModelStrengthsBest for
Meta Llama 4 ScoutEfficient, smaller MoE; cheap inferenceHigh-volume classification, summarization, lightweight RAG
Meta Llama 4 MaverickLarger MoE; long context; multimodalLong-document analysis, complex multi-doc RAG
Meta Llama 3.3 70BDense, well-understood baselineFine-tune target where you have abundant labeled data

xAI Grok

ModelStrengthsBest for
Grok 4.3 May 2026Strong reasoning, real-world knowledge breadthResearch assistants, analyst summarization
Grok 4.20General-purpose chat, faster variant where regionally availableConsumer-facing agents where latency matters
Grok 4.20 Multi-AgentModel-side support for multi-agent style orchestration where availableWorkflows with multiple specialist sub-agents
Grok 4.1 Fast Jan 2026Lowest-cost Grok variantHigh-volume routing, low-complexity tasks

NVIDIA

ModelStrengthsBest for
Nemotron 3 Nano Omni Apr 2026Small footprint, multimodal, optimized for NVIDIA stackEdge-ish inference, multimodal classification, cost-sensitive workloads
Nemotron 3 Ultra Jun 2026Open weights, training data, and recipes; frontier reasoning and agentic performance. Hosted via OCI Enterprise AI imported-model deployment on dedicated AI clustersTeams that want a strong open model on Oracle-recommended GPUs behind a managed OCI endpoint and their own control plane

Other delivery paths: Gemini, gpt-oss, and imported compatible models

PathWhat it meansBest for
Google Gemini model optionsUse when exposed through the OCI Gen AI integration path in your region and tenancy. Treat availability as a region-specific architecture dependency.Teams that want Gemini behavior but need OCI-side governance, networking, or billing alignment.
OpenAI gpt-oss in AI Quick ActionsOpen-weight OpenAI models deployed through OCI Data Science / AI Quick Actions rather than the managed Gen AI on-demand catalog.Private/custom deployments where open weights matter more than a managed token endpoint.
Import compatible models Jun 2026Bring compatible model artifacts into OCI Generative AI dedicated serving so they can use the same endpoint and governance patterns. June 2026 added Alibaba Qwen and Google Gemma families, plus OpenAI gpt-oss-20b/120b on B200 clusters in Abu Dhabi.Model standardization when your chosen model is not yet a first-party catalog model.
Direct external APIKeep Claude/GPT/Gemini direct calls behind your own model gateway when the exact model, version, or region is not available on OCI.Exception workloads where model quality beats platform consolidation.

Choosing a model: quick heuristic

If your need is…Start with
Default enterprise chat / RAG agentCohere Command A
Image + text in the same promptCohere Command A Vision or Llama 4 Maverick
Complex multi-step planningCohere Command A Reasoning or Grok 4.3
Cheap, high-volume classificationGrok 4.1 Fast or Llama 4 Scout
Long context (>200K tokens)Llama 4 Maverick or Grok 4.20
Multi-agent native orchestrationGrok multi-agent variants where available, or Enterprise AI Agents / AI Agent Studio for platform orchestration
Text-to-speech agent outputxAI Voice on OCI Gen AI
Embeddings for RAGCohere Embed v4
Reranking RAG candidatesCohere Rerank 4
Fine-tuning on your dataLlama 3.3 70B (mature), Cohere via dedicated cluster, or AI Quick Actions for open-weight models
Model availability moves
Oracle ships new models monthly. Always confirm the current list and per-region availability in the OCI Console under Generative AI before locking in a model name in code or a contract.

Embeddings & Rerank

The unglamorous half of RAG. Get embeddings wrong and the LLM has no chance.

What embeddings actually are

An embedding is a fixed-length vector of floats that represents semantic content. Two passages that mean similar things produce vectors that point in similar directions. Vector search finds the K nearest neighbors of a query vector and returns the passages they came from. That is retrieval. The LLM then writes an answer from those passages. That is the generation.

Cohere Embed 4: the default on OCI

PropertyValue
Default dimensions1024 (with a 256-dim variant for cost/storage sensitivity)
Languages100+ via the multilingual variant
Input modalitiesText, image, and mixed text+image input for multimodal retrieval patterns
Max input~512 tokens per chunk (chunk first, embed second)
Where it runsOCI Generative AI Service, on-demand or dedicated

Chunking discipline (the part teams skip)

  • Chunk by structure first, length second. Split on headings, paragraphs, table rows, not arbitrary character counts.
  • Aim for ~300-500 tokens per chunk. Smaller chunks improve precision; larger improve context.
  • Overlap by 10-15%. Prevents losing the cross-boundary sentence.
  • Carry metadata. Source URI, page number, last modified, owning department. You will need this for filtering and citations.
  • Re-embed on policy change. Switching embedding models or dimensions means re-embedding the entire corpus. Plan version, cost, and rollback upfront.

Two-stage retrieval (the pattern that wins)

User query "How do I reset a Pi at quarter end?" Stage 1: Recall Vector search Top 50 candidates Stage 2: Rerank Cohere Rerank 4 Top 5 with scores LLM Generates answer + citations Why two stages? Vector search is fast but noisy. Reranker is slower but precise. Combine: recall many, rerank to a few. Net effect: better answers, lower LLM cost (smaller context), better citations. Without rerank, RAG quality plateaus around top-5 vector recall, usually not enough for high-precision tasks.
Figure · Two-stage retrieval. Vector recall is wide and cheap; rerank is narrow and precise.

Hybrid search: keyword + vector

Vector search misses queries like "Form 1099-B" because the model treats it as similar to many other tax forms. Keyword search nails it. Hybrid combines both with a weighted score. Oracle AI Database 26ai's Unified Hybrid Vector Search supports this natively in one SQL. OCI OpenSearch supports it as well via score blending.

Enterprise AI Governance & Guardrails

The platform controls around models and agents: guardrails, private endpoints, API keys, IAM, audit, and network isolation.

What Oracle provides natively

  • Guardrails for OCI Generative AI: prompt and response checks for unsafe content, prompt injection, and other policy violations.
  • On-demand guardrail evaluation: call guardrails directly around a model request, or compose guardrails into your agent path.
  • Dedicated endpoint guardrails: inform/block behavior for dedicated AI cluster endpoints where supported.
  • PII detection via OCI Language Service, still useful as a deterministic pre/post filter when you need explicit PII categories.
  • Agent-level governance: citations, tool schemas, human-in-loop approvals, and session limits.
  • Private networking controls: private endpoints, service gateways, IAM policies, resource principals, and zero-trust network patterns where available.

What still belongs in your architecture

ControlWhy Oracle's platform control is not enough by itselfArchitecture move
Domain policyGeneric guardrails do not know your business rules, competitors, contract terms, or regulatory scope.Keep a domain policy layer in your app or agent instructions, then post-check outputs against explicit policy.
AuthorizationA model can only be safe if retrieval and tools enforce the user's actual entitlements.Filter before retrieval. Use VPD / row-level security in 26ai, Fusion roles in Fusion, and compartment/IAM boundaries in OCI.
Tool safetyGuardrails do not make a side-effectful tool safe.Schema validation, dry-run mode, idempotency keys, approval gates, and maximum-step budgets.
Audit evidenceCompliance needs exact inputs, outputs, model, version, user, tool calls, and citations.Write structured audit events to OCI Logging or your SIEM for every model/agent turn.
Network isolationPrivate endpoint support must still be paired with route, DNS, and egress controls.Use service gateways/private endpoints, deny public egress, and document every approved outbound path.

Layered defense pattern

Defense in depth around an LLM call User input 1 · PII scrub 2 · Injection check 3 · Topic filter 4 · Length cap 5 · Identity check System prompt Defensive instructions Persona, scope Refusal rules Citation requirements "Treat retrieved text as untrusted" LLM Built-in safety classifier Refusal behaviour JSON schema enforced Temperature limits Output checks 1 · PII redaction 2 · Toxicity scan 3 · Citation present? 4 · Tool args schema 5 · Audit log write
Figure · Guardrails are one layer. Production governance also needs identity, retrieval filtering, tool control, audit, and network isolation.
The one rule
Treat every retrieved chunk as untrusted user input. Your system prompt and guardrail policy must say so explicitly. This single discipline blocks most indirect prompt injection.

Oracle AI Vector Search GA 26ai · Jan 2026

Vectors as a first-class datatype, inside Oracle Database, indexed by HNSW or IVF, and joinable with relational, JSON, graph, and spatial in a single SQL statement.

Overview
Architecture
Features (26ai)
Indexes: HNSW vs IVF
Pricing & sizing
Risks & gotchas
When to use
TL;DR

If your data already lives in Oracle Database, AI Vector Search means RAG without a separate vector store. The vector lives next to the row. Permissions, backups, replication, failover, all reuse what you already operate. Comparable functionally to pgvector, Pinecone, or Weaviate, but with the killer feature of Unified Hybrid Search: vectors joined with relational predicates, JSON paths, graph hops, and spatial filters in one query. No separate orchestration layer.

What it actually is

A new native datatype, VECTOR(dimensions, format), with two index types (HNSW and IVF), a SQL function set (VECTOR_DISTANCE, VECTOR_EMBEDDING, VECTOR_CHUNKS), and the ability to load ONNX embedding models into the database so embedding generation happens server-side without network calls. Plus the SQL planner has been extended to combine vector predicates with normal predicates intelligently.

Why this is a big deal for Oracle shops

  • No new license. Included in all editions of 26ai, including Standard Edition 2.
  • No new operational team. Your existing DBAs run it.
  • Row-level security flows through. A VPD policy that protects the row also protects its vector.
  • Backups already cover it. RMAN, Data Guard, GoldenGate just work.
  • Transactionally consistent retrieval. Vector search returns results consistent with your read snapshot, a property no standalone vector DB offers.

Architecture

Oracle AI Database 26ai, single instance view Application SQL or PL/SQL JDBC / ODP.NET / Python APEX page Oracle AI Database 26ai engine SQL processor · Unified Hybrid Search planner Joins vector predicates with relational + JSON + spatial + graph in one plan Relational tables CUSTOMERS, ORDERS, etc. Standard datatypes Indexes: B-tree, bitmap VPD-protected rows Vector columns VECTOR(1024, FLOAT32) Indexes: HNSW (in-mem) or IVF (on disk) DML-consistent JSON · Spatial · Graph JSON-relational duality SDO_GEOMETRY Property graph (SQL/PGQ) All joinable with vector In-DB ONNX runtime Load embedding model into DB DBMS_VECTOR_CHAIN.LOAD_ONNX_MODEL VECTOR_EMBEDDING(text USING model) No network call · No leak risk Server-side embedding generation External LLM calls (optional) DBMS_VECTOR_CHAIN / Select AI Provider creds in Vault OCI Gen AI · OpenAI · Cohere · Azure For generation, summarization DB stays the orchestrator
Figure · 26ai keeps relational, vector, JSON, spatial, graph, and ONNX-embedding in a single engine. SQL is the API.

Feature set: what 26ai adds vs first 23ai release

FeatureStatusWhy it matters
VECTOR datatypeGAFirst-class storage. Variable dimensions and formats (FLOAT32, FLOAT16, INT8, BINARY).
HNSW indexGAIn-memory graph index. Fastest recall for moderate corpora that fit in Vector Memory Pool.
IVF indexGAOn-disk partitioning index. Scales to very large corpora without memory pressure.
HNSW with DML 26aiGATransactionally consistent vector queries even with concurrent inserts/updates, including on RAC.
Unified Hybrid Vector Search 26aiGAMix vector + relational + JSON + spatial + graph predicates in one query, planned together.
In-DB ONNX embeddingGAGenerate vectors server-side. No network egress for the embedding step.
DBMS_VECTOR_CHAINGAPL/SQL package for chunk → embed → store → retrieve pipelines.
Distance functionsGAL2, cosine, dot, Hamming, Manhattan. Pick per use case.
QuantizationGAReduces vector storage 4-32x with controlled accuracy loss.
Globally Distributed DB vector searchGAVector search across sharded deployments. For geo-distributed corpora.
Free in Autonomous DB free tierGATry it on the always-free ATP without spending a cent.

Unified Hybrid Search: one query, many predicates

The standout 26ai capability. A single SQL can ask: "Find me passages semantically similar to this query, where the owning department is HR, that mention 'parental leave' (text predicate), authored after Jan 2025, in any of these JSON-tagged jurisdictions." The optimizer plans vector and non-vector predicates together. In other architectures this requires post-filtering or a metadata sidecar. In 26ai it's one statement.

Index trade-offs

PropertyHNSWIVF
StorageIn-memory (Vector Memory Pool)On-disk
Query speedSub-millisecond for moderate corporaSingle-digit ms with right partitioning
Build costHigher; graph constructionLower; partition-based
DML supportYes (26ai), transactionally consistentYes
Best fit≤ few million vectors, latency-sensitiveTens of millions+, memory-constrained
RAC behaviorReplicated on all instancesDistributed across instances
Tuning knobsM, ef_construction, ef_searchnlist, nprobe
Pick HNSW by default; switch to IVF when the index doesn't fit in memory
The Vector Memory Pool is sized via parameter vector_memory_size. Monitor it. If HNSW pages start spilling, performance collapses and you should re-plan as IVF or shard the corpus.

Licensing

AI Vector Search is included in all editions of Oracle AI Database 26ai at no additional license cost. Standard Edition 2, Enterprise Edition, Autonomous Database, Exadata Cloud@Customer, Exadata Database Service, all include it. This is the single biggest commercial pivot from prior versions, where vector workloads required additional features or third-party tools.

What you actually pay for

  • CPU/memory of the DB hosting vectors. Plan extra memory for HNSW (rule of thumb: count × dim × 4 bytes × 1.5 overhead).
  • Storage for vectors. A 1024-dim FLOAT32 vector ≈ 4 KB. 10M vectors ≈ 40 GB. Add overhead for indexes.
  • Embedding generation. If you use OCI Gen AI for embeddings, you pay per character at the Gen AI rate. If you use ONNX in-DB, no per-call charge, just CPU.
  • RAC + Data Guard if you need HA. Standard DB licensing rules apply.

Sizing example

CorpusVectorsStorage (FLOAT32)HNSW RAM ballpark
Internal wiki, 50K docs × 10 chunks each500,000~2 GB~6-10 GB
Product catalog with descriptions + reviews5,000,000~20 GB~60-100 GB
Legal corpus, fine-grained50,000,000~200 GBHNSW won't fit; use IVF

Risks and gotchas

RiskWhat goes wrongMitigation
Vector Memory Pool spillHNSW degrades when index doesn't fit; latency blows up silentlyMonitor v$vector_memory_pool; alarm on usage > 80%; pre-plan IVF migration path.
Re-embedding costSwitching embedding models requires regenerating all vectorsVersion the embedding model in metadata; batch re-embed; budget the LLM cost.
Chunking baked into the tableBad chunk size hurts forever unless re-ingestedStore raw doc + chunks separately; design re-chunkability from day one.
RAC HNSW replication overheadHNSW index duplicated on every instance; memory bloat at scaleFor very large indexes on RAC, consider IVF or distribute across shards.
Quantization accuracy lossFLOAT32→INT8 saves space but can shift top-K resultsA/B test recall before adopting; keep one full-precision baseline.
Hybrid query plan surprisesOptimizer picks wrong order; vector predicate evaluated on too many rowsUse SQL hints, gather stats on vector columns, test with EXPLAIN PLAN.
ONNX model driftEmbedding model loaded into DB grows stale vs the OCI hosted versionPin a model version per table; document upgrade procedure.
PII in vectorsEmbeddings can leak the original text via inversion attacksTreat vector columns as PII; protect with VPD; encrypt at rest (TDE on by default in Autonomous).

Use AI Vector Search when

  • Your source-of-truth data already lives in Oracle Database.
  • You need vector retrieval to respect existing row-level security.
  • You want to join vector similarity with relational, JSON, spatial, or graph predicates in one query.
  • You don't want to operate a separate vector DB.
  • You need on-prem inference (Exadata, Linux x86-64), 26ai is on-prem GA Jan 2026.

Skip and use something else when

  • Your data lives outside Oracle and pulling it in is impractical, use OCI OpenSearch or a Knowledge Base backed by Object Storage.
  • You need exotic ANN algorithms (DiskANN, ScaNN) that 26ai doesn't ship, go to a specialist vector DB.
  • You're a Postgres shop without Oracle, pgvector is fine for moderate scale.

Select AI GA

Natural language to SQL inside the database. PL/SQL package, four modes, multiple LLM providers, RAG-capable. Available in Autonomous and on-prem 26ai.

TL;DR

Select AI lets users ask the database in plain English. Behind the scenes DBMS_CLOUD_AI sends the question plus schema metadata to an LLM (OpenAI, Cohere, Azure OpenAI, or OCI Gen AI), gets SQL back, and either runs it (runsql), shows it (showsql), explains the result (narrate), or chats (chat). Reported accuracy ~95% on TPC-H. Useful for analyst self-service. Not a replacement for hand-tuned queries on hot paths.

The four modes

ModeWhat it returnsTypical use
runsqlExecutes the generated SQL and returns rowsSelf-service reporting for trusted users
showsqlReturns SQL text without executingAnalyst review before running; explainability
narrateReturns SQL + natural-language explanation of resultsBusiness-user dashboards, embedded BI
chatGeneral chat with the underlying LLM, no SQL focusGeneral-purpose assistant from within the DB

Provider integration

Select AI is provider-pluggable. You create an AI profile that names a provider (OpenAI, Cohere, Azure OpenAI, OCI Generative AI) and credentials, then attach it to a session. Switching providers is a config change, not a code change. Credentials live in Vault.

Where it fits in an enterprise

Good fit

Internal analytics self-service. Quarter-end ad-hoc questions. Sales ops, finance ops, customer support analytics. Embedded chat-with-data in APEX apps. Low-volume, knowledgeable users who can spot a wrong SQL.

Poor fit

Production OLTP queries (latency, predictability). External customer-facing chat (cost, security, schema leakage). Tables with unstable schemas or cryptic column names (the LLM gets confused). High-volume bursty workloads (cost spikes).

Risks specific to Select AI

  • Schema disclosure. The LLM sees your table and column names. If those reveal sensitive structure, scope it via grants and avoid passing schemas with regulated-data hints in their names.
  • Wrong SQL that runs. The model may produce SQL that returns wrong numbers without erroring. Prefer showsql for non-trivial questions and let a human approve.
  • Cost surprises. A natural-language question can produce a SQL that table-scans a fact table. Add query timeouts and resource manager plans.
  • Cross-database queries. Don't expect the model to understand database links or sharded topologies without explicit metadata coaching.

In-Database ONNX Embeddings

Load an embedding model into Oracle AI Database 26ai. Generate vectors with a SQL function. No network call, no API key, no per-character cost.

The pattern

Most embedding pipelines call out to a hosted model (OCI Gen AI, OpenAI). That introduces latency, cost per call, and a data-leak surface. In-DB ONNX inverts the dependency: you load the embedding model into the DB once, then call VECTOR_EMBEDDING(text USING model_name) as a function in any SQL. Embeddings happen on the DB server.

Why architects care

  • No egress. Embedding data never leaves the DB box. Critical for regulated content.
  • No per-call cost. Pay for CPU you already own, not per million characters.
  • Lower latency on bulk re-embed. Eliminate network round-trip per chunk.
  • Simpler ops. No external service dependency in the embedding pipeline.

Trade-offs

ConcernIn-DB ONNXHosted (OCI Gen AI)
Latency per callLower (no network)Higher (network + service)
Cost per callNone, pay for DB CPUPer character
Model freshnessYou manage upgradesOracle maintains
Model selectionAnything in ONNX format ≤ size limitCurated set
CPU pressure on DBYes, sizing concernNone
Compliance / sovereigntyStrongest (data never leaves)Service-bound

Where it slots in

Path A: External embeddings DB row → call OCI Gen AI / OpenAI → receive vector → store in VECTOR column + Best model quality + Oracle maintains updates - Network + per-char cost - Data egress Path B: In-DB ONNX DB row → VECTOR_EMBEDDING(text USING model) → vector materialized inline → stored locally + Zero egress · zero per-call cost + Server-side · bulk-efficient - DB CPU pressure - You own model lifecycle
Figure · Path A is simpler ops, Path B is cheaper at volume and the only option for high-sovereignty data.
Don't embed on the hot path without thinking
Calling VECTOR_EMBEDDING inside an OLTP transaction will tax the DB CPU and burn redo. Embed at ingest time, store the vector, query the stored vector, same as you would with any external embedding pipeline.

Oracle AI Database Private Agent Factory 26ai

A no-code/private agent factory for enterprise data. Use it when business users or engineers need knowledge agents grounded in approved repositories, files, web sources, and Oracle Database data without exposing the workflow through a general-purpose SaaS chatbot layer.

TL;DR

Private Agent Factory matters because it treats Oracle's database and enterprise repositories as the trust boundary. It includes no-code agent creation, pre-built assistants, prompt lab patterns, knowledge agents, approved data sources, embeddings, and private retrieval. This is the right pattern when you need grounded agents over enterprise content without pushing sensitive schema and documents into a separate chatbot platform.

Reference architecture

Private agent architecture anchored in Oracle AI Database 26ai Private app APEX / Java / Python Authenticated user Private Agent Factory No-code agent definition Knowledge agents · prompt lab Approved data sources Oracle AI Database 26ai Tables · files · repositories VECTOR indexes · hybrid search Roles · auditing · approved sources SharePoint · Google Drive · internal sites Optional model path: OCI Gen AI, private AI container, or approved external model gateway
Figure · Private Agent Factory is strongest when SQL privileges, vector search, and audit logs must stay database-native.

Use it when

  • The corpus is private enterprise content: database rows, internal sites, file shares, SharePoint, Google Drive, or uploaded documents.
  • You need no-code agent creation for business users while preserving engineered controls around approved sources and model management.
  • You need explainable retrieval over documents and vectors without standing up a separate vector DB.

Do not use it when

  • The agent is primarily a Fusion process agent covered by Fusion Agentic Apps or AI Agent Studio.
  • The corpus is mostly non-Oracle documents in object stores and a managed OCI Generative AI Agent would ship faster.
  • You need a consumer-grade assistant UX with broad channels, analytics, and bot lifecycle tools; evaluate Oracle Digital Assistant or app-layer tooling.

Oracle Private AI Services Container 26ai

A lightweight containerized web service for Oracle AI Database 26ai that offloads expensive vector work outside the database: embedding generation and HNSW vector-index creation.

TL;DR

Private AI Services Container is not a private LLM chatbot runtime. Current docs describe two services: a Vector Embedding Service and a Vector Index Service. It can run in your data center or cloud compute, does not require internet access, processes requests statelessly, and helps free database CPU/GPU capacity for search and transactional work.

Architecture decision

QuestionUse in-DB ONNX / DB CPUUse Private AI Services Container
Embedding volume is low or DB CPU is availableSimple and localProbably unnecessary
Embedding/index creation is expensiveCan starve database resourcesOffload work to external compute while storing vectors in Oracle AI Database
Need GPU-accelerated HNSW index creationLimited by DB host capabilityUse the Vector Index Service with NVIDIA GPU-backed compute
Need no-internet/private operationGood if model already loaded in DBGood: container can run without internet and is called by DBMS_VECTOR or REST clients
Need hosted chat / reasoning modelNot the right layerNot the right layer; use OCI Gen AI, Private Agent Factory with configured LLMs, or a model gateway

Two services in the container

ServiceWhat it doesHow it connects
Vector Embedding ServiceGenerates embeddings outside the database and stores/uses them with Oracle AI Database similarity search.Called from DBMS_VECTOR procedures such as UTL_TO_EMBEDDING / UTL_TO_EMBEDDINGS, or via REST/OpenAI SDK-style clients.
Vector Index ServiceOffloads HNSW vector index creation to GPU-backed compute for faster index builds.Referenced from CREATE VECTOR INDEX parameters that point at the container REST endpoint and API key.

Risks

  • Model freshness. You manage embedding model updates; stale embeddings quietly degrade retrieval quality.
  • Capacity sizing. Offloaded vector work shifts latency and throughput onto your container hosts.
  • Patch ownership. Treat the container like production infrastructure, not a demo appliance.
  • Endpoint security. Protect the container endpoint and API key; it can be invoked by database jobs or REST clients.
  • Audit consistency. Log embedding/index jobs, model versions, container version, target table/index, and caller.

OCI Vision GA

Pretrained and custom-trainable image analysis. Object detection, classification, OCR, document image understanding. API + Console + SDK.

TL;DR

Two modes. Pretrained: call an API, get labels/boxes/text/faces. Cheapest, fastest, no setup. Custom: upload labeled images, train your own classifier or detector through the Console. Useful when off-the-shelf labels miss your domain (manufacturing defects, retail SKUs).

Capabilities

CapabilityPretrainedCustom trainingTypical use
Object detectionYesYesCount items, locate defects, retail shelf scanning
Image classificationYesYesTag content, route images by category
OCR (text in images)Yes-Receipt scanning, signage extraction
Document image analysisYes-Forms, tables, overlaps with Document Understanding
Face detectionYes-Privacy-aware face blur, attendance counting

Indicative pricing (verify on the OCI pricing page)

Pretrained image analysis is in the low-cents-per-thousand-images range. Custom model training is hourly per GPU-hour. Always check current numbers before committing.

When to use Vision vs Document Understanding

Rule
If the input is a document (PDF, invoice, form), start with Document Understanding: it's purpose-built for tables, forms, key-value. If the input is a scene (a photo of a shelf, a manufacturing line, a security camera frame), use Vision.

Risks

  • Custom-trained models drift as products and packaging change. Retrain quarterly or on accuracy degradation alarms.
  • OCR accuracy degrades on low-quality scans. Pre-process (deskew, contrast) before sending.
  • Face detection has compliance implications. Document the legal basis before deploying.

OCI Language GA

NLU primitives for text: sentiment, entity recognition, PII detection, key phrase extraction, language detection, classification, translation.

TL;DR

Not an LLM. A set of classical NLP services with pretrained models, exposed as APIs. Cheap per call, deterministic outputs, easy to embed in pipelines. Use for the boring-but-essential text tasks where you don't need generation, PII scrubbing, sentiment scoring on tickets, language routing on multilingual input.

Capabilities

CapabilityUse case
Sentiment analysisCustomer feedback triage, NPS-style scoring
Aspect-based sentiment"The screen is great but the battery is poor" → screen+, battery-
Named entity recognition (NER)Extract people, orgs, locations, dates
PII detectionPre/post filter for LLM pipelines
Key phrase extractionAuto-tag content
Language detectionRoute multilingual tickets
Text classificationCustom-trainable category labels
TranslationCommon language pairs; not best-in-class, fine for internal use

Language coverage

Most analytical features (sentiment, NER) cover English, Spanish, French, German, Portuguese, Italian out of the box. Coverage varies by feature, check the docs per service. For broader language coverage, pair with an LLM via Gen AI.

Where Language slots into Gen AI

The pattern that works: use Language as cheap pre/post filters around expensive LLM calls. Detect language to pick the right system prompt, scrub PII before sending to the model, classify intent to skip the LLM when a deterministic answer exists. This drops Gen AI cost by 30-60% on a typical customer-support workload.

OCI Speech GA

Speech-to-text (ASR) for audio files and streams. Multiple languages, speaker diarization, SRT/VTT output, profanity handling.

Capabilities

  • Batch transcription of audio/video files in Object Storage.
  • Real-time streaming for low-latency captioning use cases.
  • Speaker diarization ("who spoke when") for call recordings and meetings.
  • Normalization of times, addresses, numbers, URLs in the output text.
  • Profanity filter: remove, mask, or tag.
  • SRT/VTT subtitle output for video.
  • Custom vocabulary for domain words the base model mis-hears.

Common architectures

Contact center analytics

Recordings land in Object Storage → Speech transcribes with diarization → Language extracts sentiment + entities → Gen AI summarizes the call → write back to CRM. End-to-end at <30¢ per call typically.

Meeting summarization

Teams/Zoom recording → Speech with diarization → Gen AI summarizes per speaker, extracts decisions, generates action items → write to Asana/Jira via tool call from an agent.

Risks

  • Background noise drops accuracy. Pre-process or use a dedicated noise-suppression step before transcription.
  • Diarization struggles with overlapping speakers. Document accuracy expectations to stakeholders.
  • Audio data residency matters more than most teams think, keep buckets in the right region.
  • Real-time streaming has stricter quotas; plan capacity before peak loads (e.g. live earnings calls).

xAI Voice on OCI Generative AI May 2026

Text-to-speech through the OCI Generative AI model layer. Treat it as output generation for voice agents, training narration, call-center assist, and accessibility workflows.

TL;DR

OCI Speech is speech-to-text. xAI Voice is text-to-speech. Keep the distinction clear in architecture diagrams: Speech turns audio into text; xAI Voice turns model output or authored content into audio. Voice quality, latency, language support, and region availability must be tested in the exact OCI region you plan to use.

Reference pattern: voice agent response

User audio Phone / web mic OCI Speech ASR transcript Agent / LLM RAG + tools + policy xAI Voice TTS output Audio reply Stream/playback Production controls Moderate the text before TTS. Log final text + voice model + audio object URI. Cache static narration. Cap response length.
Figure · Full voice loop: OCI Speech for input, Gen AI/agent for reasoning, xAI Voice for output.

Use it when

  • You need voice output from an Oracle-hosted Gen AI workflow without procuring a separate TTS vendor.
  • You are building an internal assistant, training-content generator, or call-center agent response path.
  • You can tolerate region/model availability checks and quality testing before launch.

Risks

  • Latency. Voice adds another model call after generation. Stream audio where possible.
  • Unsafe audio. Guardrail text before synthesizing. Audio moderation after generation is harder.
  • Voice consistency. Pin the voice/model choice and test regression on every catalog update.
  • Cost. Cache repeated announcements and training clips instead of regenerating.

OCI Document Understanding GA Generative extraction 2026

Extract text, tables, key-value pairs, signatures, and classifications from PDFs and document images. 2026 update added generative extraction for context-aware parsing.

TL;DR

The boring backbone of most enterprise AI projects. Invoices, contracts, KYC packets, claims forms. Document Understanding handles OCR, layout, table detection, and key-value extraction. The 2026 generative extraction upgrade improves accuracy on free-form fields and complex tables by adding LLM-grade context reasoning.

Capabilities

FeatureWhat it does
Text extractionOCR with layout preservation
Table extractionDetect tables, extract as structured rows + cells
Key-value extractionPretrained for invoices, receipts, IDs; custom-trainable for your forms
Document classificationRoute into the right downstream queue
Signature detectionFlag whether a signature is present in a region
Generative extraction 2026LLM-backed extraction for ambiguous fields, free-form sections, multi-column layouts

Pricing structure

Charged per transaction (a page or a document, depending on the operation). First 5,000 transactions per month are free: useful for low-volume pilots and Always-Free tier exploration.

Reference pipeline

PDFs / images Object Storage bucket Event trigger Doc Understanding OCR · tables · KV Generative extraction Validation Confidence thresholds Human-in-loop System of record Fusion / EBS / ATP Via OIC / Functions For low-volume: 5K transactions/month free covers most pilots. For high-volume: rate-limit upstream; batch ingestion; track per-page cost trend.
Figure · Reference document automation pipeline.

Risks

  • Custom KV models need labeled data, budget annotation time honestly.
  • Generative extraction is more accurate but slower and more expensive than classic OCR-only. Mix modes based on document type.
  • Tables with merged cells or nested headers still cause issues, sample your hardest documents in PoC.

OCI Anomaly Detection GA

Multivariate time-series anomaly detection using Oracle Labs' MSET2 algorithm. Trained on your historical normal-operation data; scores new observations as anomalous or not.

Why this exists

Most enterprise anomaly problems aren't univariate (a single sensor spike). They're multivariate, "this combination of pressure, temperature, vibration, and current is unusual together even though no single value is out of spec." MSET2 (Multivariate State Estimation Technique) was developed at Oracle Labs for nuclear plant monitoring; it generalizes to manufacturing, fleet telemetry, and IT ops.

How it works

  • Train on a window of historical "normal" data, sensor values, KPI series, whatever is multivariate and time-aligned.
  • Score new observations, returns an anomaly score per timestamp and per signal.
  • Explain: identifies which signals are contributing to the anomaly.

Where it fits

Industrial

Production lines, turbines, HVAC, refrigeration. Multivariate sensor data already collected; just needs a model and a daily training refresh.

IT ops

Application telemetry, error rate, latency, throughput, GC time. Catches issues that single-metric alerts miss.

Risks

  • Concept drift. What was normal six months ago isn't now. Retrain on a rolling window.
  • Cold start. Needs enough clean historical normal data, typically weeks to months.
  • False positives. Tune detection sensitivity per use case; pair with operator runbook.
  • Not a forecasting service. Use OCI Forecasting if you need next-value prediction.

OCI Forecasting GA

AutoML for univariate and multivariate time-series. Pick a target series, optional exogenous regressors, and a horizon; the service auto-selects and trains a model.

What it gives you

  • Point forecasts + prediction intervals.
  • Auto algorithm selection across classical (ARIMA, ETS) and ML (Prophet-style, gradient boosted) approaches.
  • Holiday and seasonality handling out of the box.
  • Multi-horizon forecasts.
  • Explanation of which features drive a forecast.

Common use cases

DomainSeriesWhy Forecasting helps
RetailDaily demand per SKU per storeReplenishment planning
Finance opsAR / AP cash flowWorking capital forecasting
WorkforceContact volume per skill per 30-minStaff scheduling
EnergyLoad per substationProcurement and dispatch

Risks

  • Garbage in, garbage out, handle missing values and outliers upstream.
  • Forecasts are only as good as the regressors you provide; promos, holidays, pricing must be fed in if they drive the series.
  • Auto model selection isn't auto governance, log the chosen model + features per retrain cycle for audit.

Oracle AI Data Platform 2026

The governed data plane for AI teams. Use it to organize data products, catalogs, connections, metadata, workspaces, notebooks, and pipelines so agents and models do not each invent their own data access layer.

TL;DR

AI projects fail when every assistant has its own copy of data, metadata, and permissions. Oracle AI Data Platform is the control layer for turning enterprise data into governed, reusable AI assets: catalogs, data products, agent-ready connections, notebooks, Spark workflows, and repeatable data pipelines.

Architecture role

Oracle AI Data Platform as the data plane between sources and agents Enterprise sources Oracle DB · Fusion Object Storage · SaaS Streams · apps AI Data Platform Catalog · data products Spark workflows · pipelines Notebooks · RBAC · lineage Policies · lineage · owners Agents Enterprise AI Fusion APEX Models Gen AI Data Science AQA Key idea: one governed data-product / vector-index layer feeds many AI experiences, instead of one-off ingestion per chatbot.
Figure · AI Data Platform is not a model runtime; it is the shared data control plane for AI workloads.

When to use it

SituationWhy AI Data Platform helps
Multiple teams are building RAG over the same documentsCreate governed, reusable vector/index assets instead of duplicate chunk/embed pipelines.
Agents need data from many Oracle and non-Oracle sourcesCentralize connection, lineage, ownership, and refresh rules.
Data products already exist or are being formalizedExpose them to AI consumers with ownership and policy instead of raw tables/buckets.
Data engineering owns pipelines, app teams own agentsSeparate concerns cleanly: Data Flow and catalog upstream, Enterprise AI Agents downstream.

Risks

  • Governance theater. Catalog entries without owners, refresh SLAs, and access rules do not help agents.
  • Pipeline duplication. Decide which datasets, catalogs, and metadata flows are authoritative and version them like APIs.
  • Latency. A shared data plane is not always the fastest path for hot transactional reads; keep OLTP queries close to the source DB.
  • Team boundaries. Data product owners must participate in AI design, or agents will be grounded on misunderstood data.

OCI Data Science GA

JupyterLab notebooks, MLOps pipelines, model catalog, model deployment, jobs, monitoring. Built on conda environments and an Operator pattern.

TL;DR

The home of custom ML on OCI. Notebook sessions (JupyterLab) for exploration, Jobs for batch training, Pipelines for orchestration, Model Catalog for governance, Model Deployment for inference endpoints. If you're building a model from scratch, classical ML or fine-tuning an open-weight LLM, this is where you live. For pre-built FM deployment, use AI Quick Actions (it sits on top of Data Science).

Building blocks

ComponentWhat it isWhen you use it
Notebook sessionsManaged JupyterLab on VM or BM (CPU or GPU)Exploration, prototyping, training scripts
Conda environmentsCurated environments incl. PyTorch, TF, RAPIDS, LangChain, Oracle SDKsReproducible runtime
JobsRun notebooks or scripts on demand on chosen shapesBatch training, scheduled retraining
PipelinesDAG of jobs with input/output passingMulti-step training and eval workflows
Model CatalogVersioned registry with metadata, provenance, tagsGovernance, audit, hand-off to deploy
Model DeploymentManaged HTTP endpoint with autoscaleHosting custom models for inference
Model MonitoringDrift, performance, schema integrity over timeProduction health
Feature StoreCentralized feature definitions, online/offlineMulti-team ML at scale

Where it fits in the AI stack

Data Science is the "build your own" layer. If your problem is solved by Cohere or Llama as-is, you don't need Data Science, use Gen AI Service. If you need a custom model (classical ML, fine-tuned open-weight LLM, vision model, time-series), you do. Many enterprises end up using Data Science only for the 10-20% of use cases that don't fit a managed AI service, the rest go through Gen AI Service or AI Quick Actions.

Risks and ops realities

  • Idle notebook spend. Notebook sessions running on a GPU cost real money even when idle. Auto-stop policies are essential.
  • Environment sprawl. Teams customize conda environments; reproducibility erodes. Pin environments per project.
  • Model-to-production gap. Notebook code rarely runs cleanly as a Job. Budget time for the productionization step every project.
  • Compliance for training data. Training datasets often contain PII. Treat them with the same controls as the source-of-record DB.

AI Quick Actions GA Llama 4 + gpt-oss 2026

No-code foundation-model deployment and fine-tuning. Pick a model from a catalog, click Deploy, get an endpoint. Or pick a model, point at training data, click Fine-tune.

What's in the model catalog (June 2026)

  • Meta Llama 4: Scout, Maverick.
  • Meta Llama 3.x: including Llama 3.2 90B Vision Instruct.
  • OpenAI open-weight: gpt-oss-120b, gpt-oss-20b.
  • Phi, Falcon, Mistral, Granite, pre-cached, faster cold start.
  • Bring-your-own from Hugging Face, direct import.

What it actually saves you

If you've ever deployed an open-weight LLM on raw GPU instances, you've burned days on Docker images, vLLM/TGI tuning, autoscaling, health checks, log shipping. AI Quick Actions does all of that with one click. You give up some flexibility (you can't pick exactly which serving runtime version, for instance) for an order of magnitude faster time-to-endpoint.

When AI Quick Actions vs Gen AI Service

QuestionAnswer
You want a managed endpoint with no infraGen AI Service
You need a model not in Gen AI catalog (e.g. Mistral, Falcon, gpt-oss)AI Quick Actions
You need fine-tuning with custom dataAI Quick Actions OR Gen AI dedicated cluster
You need on-demand pay-per-tokenGen AI Service (AQA is dedicated GPU)
You need to import from Hugging FaceAI Quick Actions
You need to keep the model entirely in your tenancyAI Quick Actions (dedicated by definition)

Risks

  • Dedicated GPU cost, deployment runs hourly regardless of traffic. Auto-shutdown for dev/test environments.
  • Model size vs shape, Llama 4 Maverick won't fit on a single GPU; AQA picks the right shape, but you should understand the floor cost.
  • Fine-tuning quality depends on data quality more than algorithm choice. The clicky UI doesn't change that.

Fusion Agentic Applications GA · Mar 2026

22 pre-built AI agents embedded inside Oracle Fusion Cloud. Native to the transactional system. Governed by Fusion roles, approvals, and data. CX expansion in Apr 2026.

TL;DR

This is Oracle's biggest 2026 application-layer announcement. Twenty-two agentic apps that live inside Fusion ERP, HCM, SCM, and (since Apr 2026) CX. They are not chatbots, not copilots, and not add-ons. They run inside the transactional system, see the same data and approval hierarchies users see, and execute work autonomously when allowed. If you run Fusion, you already paid for this, pilot it before you build anything custom.

What "agentic" means here

Oracle's definition: outcome-driven, proactive, reasoning, and engineered for enterprise execution. Concretely, an agentic app does four things a copilot doesn't: it (1) initiates work without a user prompt, (2) plans across multiple steps and tools, (3) executes within the transactional system using existing roles, and (4) measures outcomes back. The boundary between "agent" and "automation" is fuzzy, but the integration depth is the meaningful difference.

Where the 22 agentic apps sit (representative: Oracle keeps adding)

PillarExample agentic apps
FinanceProcure-to-Pay agent · Expense intake agent · Period-close anomaly investigation · Collections triage
HR / HCMWorkforce Operations agent (scheduling, payroll issue triage) · Recruiting assistant · Performance review summarization · Time-off conflict resolver
Supply chainDemand-supply imbalance investigator · Supplier risk monitor · Logistics exception handler · Quality issue triage
CX Apr 2026Sales next-best-action · Service case summarization & resolution · Marketing campaign optimizer

Why this changes build-vs-buy math

Before Fusion Agentic Apps, an enterprise wanting a "smart period-close" agent had to (a) buy or build an LLM platform, (b) integrate it with Fusion Financials data, (c) replicate role permissions, (d) wire it into approval workflows, (e) operate it. Now Oracle ships steps (a)-(e) as a configured agentic app under your existing Fusion subscription. The build case has to clear a much higher bar.

For large enterprise Fusion customers
Run a pilot on 1-2 agentic apps in a sandbox. The TCO conversation versus building custom on OCI Gen AI is now lopsided whenever the use case maps to a Fusion agent. Treat custom builds as the exception, not the default.

Risks

  • Change management. Agentic apps execute work, that means human approvers see different workflows. Governance and communications matter more than the tech.
  • Configuration drift. Each agentic app has settings. Track them per environment and put them under change control.
  • Data quality exposed. Agents reason on Fusion data. If your master data is messy, agents are less useful. Fix data first.
  • Vendor coupling. Deeper dependency on Fusion. Be intentional about which agents you adopt and which you keep optionality on.

AI Agent Studio for Fusion Expanded Mar 2026

Build, connect, and orchestrate agents that work alongside Fusion. Includes the Agentic Applications Builder, content intelligence, contextual memory, ROI measurement, and workflow tools. Included with Fusion subscriptions at no extra cost.

What you can build

  • Custom agents on top of Fusion data and APIs, using Oracle, partner, or external agents as building blocks.
  • Agentic applications (workflows of agents) via the no-code Agentic Applications Builder.
  • External integrations: pull in Slack, Teams, Microsoft 365, ServiceNow, and similar.
  • ROI dashboards: measure agent impact (time saved, cycle time, decision accuracy).

Studio vs OCI Generative AI Agents: what's the difference?

QuestionAI Agent Studio (Fusion)OCI Generative AI Agents
AudienceFusion customers and Fusion partnersAny OCI customer building agents
Data integrationNative to Fusion data + rolesObject Storage, OpenSearch, 23ai
UXLow-code/no-code builderAPI + SDK + Console
CostFree with Fusion subscriptionPay per use (model + retrieval costs)
Best fitWorkflows touching Fusion recordsRAG over enterprise corpora outside Fusion

They are complementary, not redundant. A bank could use OCI Generative AI Agents for an internal policy chatbot over a SharePoint-style document corpus, and AI Agent Studio for finance-close agents that operate inside Fusion ERP.

Oracle Digital Assistant GA

Oracle's conversational-assistant platform for enterprise channels. Use it when the hard problem is bot lifecycle, skills, channel delivery, and human-agent handoff rather than raw model prompting.

TL;DR

Oracle Digital Assistant (ODA) is still relevant in the GenAI era. It gives you channel adapters, skills, conversation flows, analytics, human handoff, and Fusion/Oracle app integration. OCI Generative AI Agents can power the intelligence behind a bot; ODA is often the front door and lifecycle layer for chat experiences.

ODA vs Enterprise AI Agents

QuestionOracle Digital AssistantOCI Enterprise AI Agents
Primary jobConversation UX, channels, skills, routing, handoffLLM reasoning, RAG, tools, vector stores, responses API
Best user surfaceWeb chat, mobile, messaging, service channels, app-embedded assistantsAPI-driven agents embedded into apps, workflows, or custom UIs
Human handoffFirst-class patternDesign through tools/workflows
Knowledge groundingCan integrate LLM/GenAI capabilities into skillsNative knowledge bases/vector stores
Best fitCustomer/service bot with channels and lifecycle managementEnterprise RAG/tool agent behind one or more apps

Reference pattern

Channels Web · mobile · service Digital Assistant Skills · flows · handoff Analytics · channel state Enterprise AI Agent RAG · tools · guardrails Responses API Systems Fusion · Service · DB Knowledge bases · APIs
Figure · ODA is the conversation/product layer; Enterprise AI Agents are the reasoning and retrieval layer.

Risks

  • Wrong layer. Do not rebuild RAG plumbing in ODA skills when Enterprise AI Agents already provides it.
  • Channel complexity. Each channel has identity, session, and attachment quirks; test the exact deployment channel.
  • Handoff design. Human handoff must include transcript, context, model answer, and source citations, not just "transfer to agent."

APEX AI GA 24.2 RAG & AI configs

Low-code AI inside Oracle APEX. AI Assistant for developers and end users, AI-driven data modeling, dynamic-action generative text, AI Configurations + RAG Sources, vector search integration.

TL;DR

If you build internal apps with APEX, AI is no longer something you bolt on. AI Configurations let you define a system prompt + model + RAG sources once, then reuse across pages. Dynamic Actions "Show AI Assistant" and "Generate Text with AI" embed chat and generation in two clicks. Search Configurations wire 26ai Vector Search into your search pages without writing the SQL yourself. For Oracle-shop developers, this is the fastest path from "we have a wiki" to "we have a RAG chatbot over our wiki" in production.

What APEX 24.2 adds (relevant to AI)

FeatureWhat it does
AI Configurations (Shared Component)Bundle system prompt + welcome message + RAG sources. Reuse across pages.
RAG SourcesPoint at 23ai Vector Search tables, REST endpoints, or APEX queries.
Show AI Assistant (Dynamic Action)Chat panel using the chosen AI Configuration.
Generate Text with AI (Dynamic Action)Generate content on demand from a user prompt + template.
AI-Driven Data ModelingDescribe a model in plain English; APEX generates tables, sample data.
Search Configuration with Vector SearchAdd semantic search to APEX page items without hand-writing SQL.
APEX_AI PL/SQL packageProgrammatic access from PL/SQL when the dynamic actions aren't enough.

Provider support

APEX talks to OCI Generative AI, OpenAI, Cohere, and Azure OpenAI through provider configurations. The AI Configuration abstracts the provider, apps don't change when you swap.

Reference pattern: RAG chatbot in APEX in a day

  1. Create a VECTOR column on your content table in 23ai/26ai. Populate it (in-DB ONNX or Cohere via DBMS_VECTOR_CHAIN).
  2. Create a RAG Source pointing to that table.
  3. Create an AI Configuration with a system prompt + that RAG Source + your preferred provider.
  4. On any page, add the "Show AI Assistant" dynamic action bound to that configuration.
  5. Ship.
For Apps DBAs and Oracle-shop developers
APEX AI is the lowest-effort way to add a working RAG agent to an internal tool. If you can write a SQL query and click in a Console, you can ship one in a day. The hard part, chunking, embedding pipeline, RAG orchestration, is hidden behind shared components.

Risks

  • Provider credentials live in APEX Web Credentials, protect them like any other secret.
  • Embed cost is real if you store all your content in the DB and embed via OCI Gen AI. Use in-DB ONNX where possible.
  • Audit trail of LLM calls isn't automatic, log via APEX_AI calls to a log table for compliance.

MySQL HeatWave GenAI GA

In-database LLMs, automated vector store, lakehouse access, and natural-language chat, all inside MySQL HeatWave. Multilingual, JavaScript-callable, VLM-enhanced PDF parsing as of MySQL 9.4.2.

TL;DR

For MySQL shops, HeatWave GenAI is the analog of what 26ai is for Oracle DB shops: vector search + LLM access without leaving the database. Differences: HeatWave bundles in-database LLMs (you don't have to call out), it has tight Object Storage ingestion that auto-parses PDFs/PPTs/HTML/DOC, and it integrates with HeatWave Lakehouse so you can query non-MySQL data alongside MySQL data. The lakehouse + vector store combination is the most differentiated part of the offering.

Components

ComponentWhat it is
In-database LLMsModels that run inside HeatWave for generation, summarization, chat, no external API call required
Vector storeInbuilt store for embeddings + similarity search
Automated ingestionParses PDF (incl. scanned), PPT, TXT, HTML, DOC from Object Storage; chunks; embeds; loads
VLM-based PDF parsingVision-Language-Model enhanced extraction for complex PDFs (tables, charts). Added MySQL 9.4.2.
Lakehouse NavigatorUI to browse MySQL + Object Storage data, load into vector store
JavaScript stored programsInvoke GenAI from JS inside HeatWave; preprocess SQL data, call LLMs, post-process
MultilingualSupports 24+ languages across the GenAI APIs

When to choose HeatWave GenAI over OCI Gen AI + 23ai

  • You already run MySQL HeatWave for analytics.
  • Your source data is heterogeneous (MySQL + Object Storage + S3) and you want lakehouse ingestion.
  • You want LLM inference inside the database (no egress to an external service).
  • You don't need Oracle Database features (PL/SQL, VPD, RAC).

Risks

  • Sizing, in-DB LLM inference is GPU-intensive on the HeatWave node. Pick shapes deliberately.
  • Available model family inside HeatWave is narrower than OCI Gen AI's catalog.
  • HeatWave is OCI-first; some features are not available on AWS or Azure deployments of HeatWave.

AI Infrastructure: GPU shapes & networking

What you actually rent when you need raw inference or training capacity. NVIDIA H100 / H200 / B200 / GB200 NVL72, RDMA cluster networks, dedicated regions, sovereign deployments.

TL;DR

Oracle's GPU story is unusually strong because of long-standing NVIDIA collaboration and aggressive supply commitments. H100/H200/B200 bare-metal shapes plus GB200 NVL72 Superclusters (announced expansion at GTC 2026) are available in commercial, gov, classified, sovereign, and dedicated regions. For most enterprise AI, you don't touch these directly, Gen AI Service, AI Quick Actions, and Fusion abstract them away. You care about GPU shapes when (a) you're fine-tuning a 70B+ model, (b) you're hosting custom inference, or (c) you're doing frontier training.

Shape family (illustrative: confirm in OCI Compute docs)

Shape familyGPUPer nodeTypical use
BM.GPU.A100.8NVIDIA A100 80 GB8 GPUs · NVLinkMature training and inference baseline
BM.GPU.H100.8NVIDIA H100 80 GB8 GPUs · NVLinkDefault for fine-tuning 70B-class models
BM.GPU.H200.8NVIDIA H200 141 GB8 GPUs · NVLinkLong-context inference, larger models in fewer nodes
BM.GPU.B200NVIDIA B200Blackwell-classNew-generation inference and training
GB200 NVL72 SuperclusterNVIDIA GB200 NVL72Rack-scaleFrontier training, very large model serving

Cluster networks & RDMA

For multi-node training, GPUs talk to each other faster than they talk to anything else. OCI Cluster Networks use RDMA over Converged Ethernet (RoCE) with very low latency and high bandwidth between bare-metal GPU nodes. If you're training a model that doesn't fit on one node, this is the lever that determines wall-clock time.

Sovereignty and region matrix

Region typeNotable AI availability
Commercial OCI (50+ regions)Full Gen AI catalog, GPU shapes, Data Science
US Gov CloudOCI Gen AI GA, full service set
US Classified CloudOCI Gen AI GA (May 2026), select services
UAE Central (Abu Dhabi)OCI Gen AI GA (May 2026)
EU SovereignOCI Gen AI subset, full data residency
Dedicated Region (DRCC)Full OCI in your data center, including AI services where licensed
Where Oracle has a real edge
For workloads that absolutely cannot leave a jurisdiction (sovereign, gov, classified, regulated industries), Oracle is often the only hyperscaler with GA Gen AI in the right region. Don't underweight this for procurement.

Architecture Patterns

Five reference patterns that cover most enterprise Oracle AI projects. Each names the services, the data flow, and the failure modes.

Pattern 1: Internal RAG chatbot over enterprise documents

User Web · Slack · Teams Gen AI Agents Session + tools Knowledge Base 23ai · OS · OpenSearch Cohere Rerank 4 Cohere Command A Variants If corpus is in Object Storage and rarely changes: Gen AI Agents with Object Storage KB → simplest. If corpus is structured + needs row-level security: Gen AI Agents with 23ai KB → VPD flows through. If you already operate OpenSearch: Gen AI Agents with OpenSearch KB → reuse your indexing.
Pattern 1 · The most common Oracle RAG architecture in 2026.

Pattern 2: In-database RAG inside an APEX app

For Oracle-shop teams that already use APEX, this collapses the stack dramatically. The data, the embeddings, the search, the chat UI, all inside the database and APEX. Provider call out to OCI Gen AI for generation only. Fastest time-to-prod for internal tools.

User APEX page APEX AI Assistant AI Configuration + RAG Source Oracle AI Database 26ai VECTOR column · HNSW In-DB ONNX embeddings OCI Gen AI Generation only Strengths: Lowest moving parts. Reuses existing DB roles & backups. Weakness: Constrained to APEX UI. Not for customer-facing high-traffic SaaS.
Pattern 2 · APEX + 26ai = minimal-moving-parts RAG. Ideal for internal tools.

Pattern 3: Document automation pipeline

Invoices, claims, contracts arrive as PDFs and need to become structured records. Doc Understanding extracts, validation rules check, exceptions route to humans, results land in the system of record. Add a Gen AI step to summarize or classify when needed.

Pattern 4: Fusion-native agentic workflow

For Fusion customers, this is now the default. Pick an agentic app, configure thresholds and approval routing, monitor outcomes. Custom logic goes into AI Agent Studio. Custom data integrations via Fusion REST APIs or OIC. Almost never needs to call OCI Gen AI directly, the agent uses the embedded LLM.

Pattern 5: Custom fine-tuned model deployment

Niche but real. You have labeled data and a use case where a 7B-13B fine-tuned open-weight model beats prompting a frontier model on cost and accuracy. Pipeline: Data Science notebook for preparation → AI Quick Actions or Gen AI dedicated cluster for fine-tuning → Model Deployment endpoint → integrate via your app. Reserve this for cases where you've already proven a managed model doesn't work.

A pattern most teams skip too long
Many Oracle customers default to Pattern 5 (custom) when Pattern 1 or 4 would have shipped in weeks. The bias toward "we'll build it ourselves" wastes quarters. Always justify why a managed pattern doesn't work before going custom.

Decision Matrix

Quick answers to the questions that come up in every architecture review.

Do I use OCI Generative AI Service or direct model APIs?
Use OCI Gen AI when you need OCI-native IAM, private networking, billing, audit, guardrails, and a supported or importable model fits. Go direct when you need an exact model/version that is not available in your target OCI region.
Vector store: 26ai or OpenSearch or a third-party vector DB?
26ai if data is already in Oracle DB or you need joins with relational/JSON/spatial/graph predicates. OpenSearch if you have a search team operating it. Third-party only when you need a retrieval engine or managed ecosystem Oracle does not provide.
On-demand, dedicated AI cluster, or imported model?
On-demand for pilots and bursty workloads. Dedicated for predictable volume, isolation, fine-tuning, or regulated data. Imported compatible models when your model is not in the OCI managed catalog but you still want OCI endpoint/governance patterns.
Build a custom agent or use Fusion Agentic Apps?
If you run Fusion and the use case maps to one of the 22 apps, use Fusion. Custom build only when no Fusion app exists or your scenario is fundamentally outside Fusion's data.
Enterprise AI Agents or AI Data Platform first?
Use Enterprise AI Agents first when one team needs one agent quickly. Use AI Data Platform first when many teams need reusable governed catalogs, data products, lineage, RBAC, notebooks, and workflow ownership.
Embed via in-DB ONNX or via Gen AI Service?
In-DB ONNX for high-volume corpora, sovereignty needs, or zero-egress requirements. Gen AI Service for best-quality embeddings and low operational burden. Often a mix: in-DB for bulk, Gen AI for occasional re-embeds.
Enterprise AI Agents or roll-your-own RAG?
Enterprise AI Agents unless you need non-standard retrieval, graph RAG, custom ranking, or a non-Oracle model gateway. Then build with Gen AI Service + 26ai/OpenSearch and keep the agent framework thin.
AI Quick Actions or Data Science Model Deployment?
AI Quick Actions if the model is in its catalog or on Hugging Face. Data Science Model Deployment if you have a custom-trained model that isn't an LLM.
Select AI for analytics?
Yes for knowledgeable internal analysts on Autonomous DB. No for customer-facing or production-OLTP queries.
APEX AI vs custom front-end?
APEX AI for internal tools and rapid POCs. Custom front-end when you need pixel-perfect UX or external-facing branding.
Private Agent Factory or OCI Enterprise AI Agents?
Private Agent Factory when the trust boundary is the Oracle Database and you want database-native security/audit. Enterprise AI Agents when the trust boundary is OCI and you need broader managed tools, vector stores, and hosted APIs.
Need to run vector AI privately?
26ai on Exadata or Linux x86-64 gives you vector search + in-DB ONNX embeddings. Private AI Services Container offloads embedding generation and HNSW index creation while keeping vectors tied to Oracle AI Database. DRCC is the larger cloud-in-your-data-center option for broader OCI services.

Pricing & Cost Control

Where the money actually goes, and the levers that move it.

Verify exact rates
Numbers here are descriptive of cost behavior, not procurement quotes. Always check the OCI pricing pages, ask your account team for current rates, and rebuild the model in your own cost calculator before signing anything.

Cost drivers by service

ServiceUnitWhat inflates the bill
OCI Gen AI on-demandPer 1M charactersSystem prompt bloat, oversized RAG context, retries, naive token-by-token streaming logging
OCI Gen AI dedicated clusterPer hourCluster idle outside business hours, over-provisioned units, dev/test left running
Imported compatible modelsDedicated serving + storageWrong shape, oversized context windows, duplicate model copies across compartments
AI Vector SearchDB CPU + memory + storageHNSW Vector Memory Pool sizing, re-embeds, dense quantization, RAC HNSW replication
Enterprise AI AgentsUnderlying Gen AI + retrieval + hosted toolsLong sessions retained, large KBs over-ingested, no rerank cap, tool loops
xAI Voice / TTSGenerated audio outputRegenerating static content, long answers, no audio cache
Oracle AI Data PlatformPlatform resources, Spark/workflow runs, workspaces, storageDuplicate catalogs, unnecessary reprocessing, stale pipelines rerun in full
OCI VisionPer 1000 images / GPU-hourCustom training reruns, full-fidelity images when down-sampled would work
OCI Document UnderstandingPer transaction (first 5K free/mo)Re-running on bad PDFs, generative-extraction mode by default
OCI SpeechPer audio-minuteReal-time streams left open, retries on noisy audio
OCI LanguagePer API callCalling Language inside per-token pipelines when batched would do
Data ScienceNotebook session VM/GPU hours, Jobs, Model DeploymentIdle GPU notebooks, dev model deployments running 24/7
AI Quick ActionsDedicated GPU hoursPOC deployments forgotten, oversized shapes
Private AI Services ContainerPrivate compute + storage + opsUnder-sized hosts, unmanaged embedding-model updates, duplicated environments
Fusion Agentic AppsIncluded with Fusion subscriptionIncluded, no marginal cost beyond the model usage in the underlying Gen AI Service if you customize
AI Agent StudioIncluded with Fusion subscriptionSame, no marginal cost
HeatWave GenAIPer HeatWave node (GPU shapes)Wrong shape for in-DB LLM inference

The cost-control checklist

  • Token / character budgets per session. Hard cap. Alarm on breach. Kill on hard breach.
  • Rerank to top-K. Reduce context size by 60-80% with no quality loss in most RAG.
  • Prompt caching. Where the model supports it, cache the system prompt.
  • Audio caching. For xAI Voice, store repeated greetings, disclosures, and training clips rather than regenerate.
  • Idle-shutdown on every notebook session, every dev Model Deployment, every dev AQA deployment.
  • Tag everything. OCI cost tracking by tag is the only way to attribute spend across teams.
  • Quotas. Service limits + Resource Manager quotas prevent a runaway agent from consuming a quarter of budget overnight.
  • Two-tier deployments. Cheaper model for routing/classification, expensive model only on the path that needs it.
  • Egress. Keep callers inside OCI when calling Gen AI at volume. Outbound from OCI adds up.

Indicative cost shape (illustrative, not a quote)

WorkloadDominant costOrder of magnitude
Pilot RAG chatbot, 1K queries/dayGen AI on-demand per characterTens to low hundreds USD/month
Production internal chatbot, 50K queries/dayGen AI on-demand + rerankerLow thousands USD/month
High-volume customer-facing assistant, 1M queries/dayGen AI dedicated clusterTens of thousands USD/month per cluster
Fine-tuned model serving sustained trafficDedicated cluster + storageTens of thousands USD/month per cluster
Voice-enabled agentGen AI text + xAI Voice TTS + audio storageSimilar to chatbot cost plus generated-audio usage
Document automation, 100K pages/monthDoc Understanding transactionsHundreds to low thousands USD/month
26ai vector search on ExadataDB CPU/memoryOften $0 incremental over existing DB licence

Risks & Gotchas

The honest list. The stuff you want to hear before pilot, not after go-live.

Model and provider risks

RiskWhat happensWhat to do
Model deprecationApp breaks when a model retiresAbstract model name; test against the family's next-gen early; monitor release notes.
Deprecated Gen AI APIsLegacy text-generation integrations stop working after API retirement windowsUse current chat/responses SDK paths; inventory old GenerateText/SummarizeText-style calls before June 2026.
Catalog driftModels available in one region but not anotherDocument the model-region matrix; revisit at every release; pin a fallback model.
Pricing changes mid-contractPer-character rates change; budget overrunsNegotiate dedicated commitments for predictable spend; alarm on weekly rate change.
Vendor strategy shiftsA model is dropped from OCI catalog for partnership reasonsTreat model identity as a config; have a tested second choice.
"Best" model isn't on OCIStakeholders ask "why not Claude/GPT?"Document the multi-criteria choice openly; price out hybrid (OCI gov regions + direct API for some workloads).

RAG quality risks

RiskWhat happensWhat to do
Bad chunkingRetrievals miss the answer that's actually in the corpusPilot chunk sizes; structural chunking over fixed-length; measure recall against a gold set.
No rerankerVector top-5 contains noise, LLM hallucinates around itAdd Cohere Rerank 4 as standard.
Stale knowledge baseIndex lags source-of-truth changesSchedule + alarms on ingestion lag; expose "Last refreshed" to users.
Hallucinated citationsAnswer claims a chunk supports it when it doesn'tRender source chunks alongside; post-hoc verification step for high-stakes outputs.
Duplicate AI data pipelinesDifferent teams transform or ingest the same corpus differently and get inconsistent answersUse AI Data Platform or a shared data-product process to declare authoritative datasets, owners, RBAC, and refresh rules.

Security risks

RiskWhat happensWhat to do
Prompt injection from documentsRetrieved doc instructs the model to override system promptDefensive system prompt; mark retrieved chunks explicitly as untrusted; classify chunks for adversarial content.
PII leakageModel echoes PII back to wrong userPII scrub on input + output; per-user data isolation in retrieval; audit log of prompt + completion.
Cross-tenant leakage in SaaSCached completions surface across tenantsPer-tenant prompts; no global cache; tenant-scoped sessions.
Schema disclosure via Select AILLM sees sensitive column namesGrant minimally; avoid sensitive hints in column names; review queries before runsql.
Vector inversion attackEmbeddings reverse-engineered to recover textTreat vectors as PII; protect with VPD; TDE at rest.
Private vector container driftContainer embedding model or index service falls behind the database/vector-search designPatch and test Private AI Services Container on a monthly cadence; track embedding model, container version, and index parameters in audit logs.

Operational risks

RiskWhat happensWhat to do
Cost runawayRecursive agent or buggy loop burns a budget overnightPer-session budgets; max-step caps; alarms on cost-per-session anomalies; kill switch.
Quota throttling503s during peak, customer painPre-warm; raise quotas before peaks; move hot workloads to dedicated.
Audit gapsCannot prove what was said in regulated contextLog prompt + completion + model + version + user; retain per policy; index for review.
Model output schema breaksTool call args mis-formattedStrict JSON schema; validate before dispatch; fallback to clarifying turn.
Notebook GPU left idle$thousands wasted per month per teamAuto-stop after N minutes; weekly idle report; tag-based chargeback.

Strategic risks

  • Vendor lock-in. Going deep on Fusion Agentic Apps deepens Fusion lock-in. Worth doing where the apps fit, but be explicit about which workloads stay portable.
  • Skills. Oracle AI requires SQL/PL/SQL + cloud + AI knowledge. The unicorn engineer who has all three is rare. Plan training; pair Apps DBAs with data scientists.
  • Pace of change. Oracle ships new models monthly. Architectures that assume model stability age fast. Build for swappability.
  • Realistic accuracy expectations. 95% accurate is great in lab, terrible if a 5% wrong tax filing creates regulatory exposure. Match accuracy expectation to consequence.

OCI vs AWS vs Azure vs GCP: AI services

Honest four-way side-by-side as of June 24, 2026. Not Oracle marketing, not anyone's marketing. Names and prices move monthly, so verify in each console before you commit.

TL;DR

All four are now broad model platforms plus a managed agent runtime plus a governance layer. The real differences in mid-2026 are (1) who owns the frontier model (AWS has the Anthropic relationship and Amazon Nova; Microsoft Foundry has OpenAI and Phi; Google owns Gemini outright via DeepMind; OCI owns none and instead resells Cohere, xAI, Meta, NVIDIA and lets you import the rest), (2) data gravity (OCI wins when your system of record is Oracle Database or Fusion; Google wins when it is BigQuery), and (3) sovereignty (OCI has the widest GA Gen AI footprint across gov, classified, EU sovereign, and GCC; Google now runs Gemini fully air-gapped on-prem via Google Distributed Cloud; AWS only opened its European Sovereign Cloud in Jan 2026 with a thin model catalog). Two naming changes to know: Azure AI Foundry is now Microsoft Foundry (effective Jan 1, 2026), and Vertex AI is now the Gemini Enterprise Agent Platform (Google Cloud Next, Apr 2026).

Generative AI platforms

AspectOCI Enterprise AI / Gen AIAWS BedrockMicrosoft Foundry (was Azure AI Foundry)Google Cloud (Gemini Enterprise, was Vertex AI)
Frontier / foundation modelsCohere, Meta Llama 4, xAI Grok, NVIDIA Nemotron 3, Gemini options; imported Qwen, Gemma, OpenAI gpt-ossAnthropic Claude (Opus 4.8/4.7, Sonnet 4.6, Haiku 4.5), Amazon Nova 2, Meta Llama, Mistral, DeepSeek, Qwen, Cohere, and OpenAI GPT-5.5/5.4OpenAI GPT-5 family + GPT-5.5, Anthropic Claude, Meta Llama, Mistral, xAI Grok, Microsoft PhiGoogle Gemini 3 Pro / 3 Flash (in-house); 200+ in Model Garden incl. Anthropic Claude, Meta Llama, Mistral, open Gemma
Owns a frontier model?No (platform/aggregator strategy)Anthropic stake; Amazon Nova in-houseOpenAI partnership; Phi in-house SLMsYes - Gemini, built in-house by Google DeepMind
Managed agent runtimeOCI Enterprise AI Agents (GA Mar 2026): RAG, tools, vector stores, Responses-style API, governance hooksBedrock AgentCore (GA Oct 2025) + Managed Harness (GA Apr 2026): runtime, gateway, memory, identity, policy, code interpreter, browser, evalsFoundry Agent Service: GPT-5 family, model router, built-in browser automation and MCP toolsGemini Enterprise Agent Platform: ADK (stable v1.0), Agent Engine runtime + Memory Bank, Agent Studio, A2A protocol v1.0
Vector in source DBOracle AI Database 26ai native VECTOR type + HNSW/IVF; hybrid RAG exposed as MCP toolAurora / RDS pgvector; Aurora DSQL vectors where applicable; OpenSearchAzure AI Search; Azure SQL / SQL Server vector; FabricAlloyDB AI (pgvector + ScaNN), BigQuery vector search, Spanner vector
Guardrails / safetyEnterprise AI Governance + native guardrails (prompt + response eval, Mar 2026) + Language PII filtersBedrock Guardrails (mature) + Automated Reasoning checksAzure AI Content Safety + Foundry guardrails/evaluationsModel Armor AI firewall (prompt-injection, data-leak, content filters; multi-model)
Shared AI data planeOracle AI Data Platform + OCI Resource Analytics (Jun 2026)SageMaker Unified Studio + Glue / S3 TablesMicrosoft Fabric + OneLake + FoundryBigQuery + Vertex + Dataplex governance
Model routing / cost controlFlexible model routing (work across models, not one-size-fits-all)Per-model selection; intelligent prompt routing on BedrockModel router: Quality / Cost / Balanced modes, up to ~60% inference savingsVertex model selection; per-agent pricing in the Gemini Enterprise product
Apps integrationFusion Agentic Apps native (22 agents, ERP/HCM/SCM/CX)Amazon Q Business, Amazon ConnectMicrosoft 365 Copilot, Dynamics 365Google Workspace (Gemini in Docs/Gmail/Sheets), Workspace Studio
On-demand pricing unitPer 10,000 transactions (characters) for several models; token-based for newer onesPer 1M input/output tokensPer 1M input/output tokensPer 1M input/output tokens
Developer tooling maturityImproving fast but still behind on breadthMature (Bedrock Studio, SageMaker, AgentCore)Mature (Foundry portal, VS Code, GitHub)Mature (Vertex / Gemini platform, Colab Enterprise, Workbench)
How to read the model row
The headline in 2026 is that all four sell each other's neighbors. OpenAI's open gpt-oss models run on OCI and Bedrock. Anthropic Claude runs on Bedrock, Microsoft Foundry, and Google Cloud. The one model that stays first-party is Google's Gemini, which you only get on GCP (with narrow exceptions where OCI exposes it). The lock-in is no longer the model. It is the data plane, the agent runtime, and the governance model around it.

Model availability, side by side (June 2026)

Model familyOCIAWS BedrockMicrosoft FoundryGoogle Cloud
Google Gemini 3 (Pro / Flash)~ limited, where exposed first-party flagship
Anthropic Claude (Opus 4.8 / Sonnet 4.6 / Haiku 4.5) not first-party (call direct) flagship available Model Garden
OpenAI GPT-5 / GPT-5.5 (hosted API)~ GPT-5.5 / 5.4 added primary
OpenAI gpt-oss (open weights) import + AI Quick Actions self-deploy
Cohere Command / Embed 4 / Rerank 4 strategic partner~ partial~ Model Garden
Meta Llama 4 (Scout / Maverick)
xAI Grok (4.x)~ select
NVIDIA Nemotron 3 (Nano Omni / Ultra) dedicated clusters~~~ via NIM
Amazon Nova 2 in-house
Microsoft Phi in-house
Alibaba Qwen / Google Gemma (open) import Gemma in-house

✓ = first-party / managed · ~ = partial, region-limited, or recently added · ✗ = not native (use direct API or a gateway). Always confirm exact model IDs and regions in the console.

Pricing, normalized (representative, June 2026)

Read this before the table
Comparing list prices across clouds is a trap. OCI bills several on-demand models per 10,000 transactions (characters), while AWS, Azure, and Google bill per token. Roughly, 1 token ≈ 4 characters in English, so 10,000 characters ≈ 2,500 tokens, but this varies by language and tokenizer. Note Google charges Gemini 3 Pro at a higher rate once a request crosses 200K input tokens. The figures below are representative list prices pulled from vendor and third-party pricing pages in June 2026, normalized to USD per 1M tokens (input / output) where possible. Treat them as order-of-magnitude, not quotes. Verify on the official pricing pages.
ItemOCIAWS BedrockMicrosoft FoundryGoogle Cloud
Flagship reasoning model (in / out per 1M tok)Grok / Cohere top tier, token-based up to ~$10.7 in (varies by model)Claude Opus 4.8 ≈ $5 / $25GPT-5.5 ≈ $5 / $30Gemini 3 Pro ≈ $2 / $12 (≤200K ctx); $4 / $18 beyond
Mid-tier workhorse (in / out per 1M tok)Cohere Command A / Llama 4 (low per-character rates)Claude Sonnet 4.6 ≈ $3 / $15GPT-5 mini (lower-cost tier)Gemini 3 Flash ≈ $0.50 / $3.00
Cheapest small modelLlama 4 Scout ≈ $0.0018 / 10K transactionsAmazon Nova Micro ≈ $0.035 / 1M inGPT-5 nano / Phi (low-latency tier)Gemini 3 Flash-Lite tier
EmbeddingsCohere Embed 4 ≈ $0.001 / 10K transactionsTitan / Nova multimodal embeddings, per 1M tokAzure OpenAI embeddings, per 1M tokVertex AI text embeddings, per 1M tok
RerankerCohere Rerank 4 on-demand; dedicated ≈ $10 / cluster-hourCohere Rerank via Bedrockvia Azure AI Search semantic rankerVertex AI Ranking API / grounding
Dedicated / provisionedPer AI-unit-hour (e.g. large Cohere ≈ $24, large Meta ≈ $12)Provisioned Throughput (model units / hour)Provisioned Throughput Units (PTUs)Provisioned Throughput (GSUs)
Prompt caching discountModel-dependentUp to ~90% on cached inputUp to ~90% on cached inputContext caching discount

Sources: Oracle OCI Generative AI pricing page; Anthropic / AWS Bedrock pricing; Microsoft Foundry pricing; Google Vertex / Gemini pricing; third-party aggregators (June 2026). Prices change without notice.

Agents & RAG platforms, in depth

CapabilityOCI Enterprise AI AgentsAWS Bedrock AgentCoreFoundry Agent ServiceGemini Enterprise Agent Platform
GA statusGA Mar 2026AgentCore GA Oct 2025; Managed Harness GA Apr 2026GA; GPT-5 family rolling into the runtimeGA; rebranded from Vertex AI at Cloud Next, Apr 2026
Managed RAG / knowledge storesBuilt-in vector stores + 26ai + OpenSearch; Object Storage ingestionBedrock Knowledge Bases (managed ingestion + vector store)Foundry vector index + Azure AI SearchVertex AI Search + grounding; AlloyDB / BigQuery vectors
Tools / function callingTools + Responses-style APIGateway turns APIs/Lambdas into agent toolsBuilt-in tools + MCP + browser automationFunction calling + tools; A2A protocol v1.0 for agent-to-agent
Memory / identitySession state; governance hooksManaged memory, identity, policy engineThread state; Entra ID identityAgent Engine Sessions + Memory Bank; Google IAM
Built-in browser / code execVia tools / customYes: browser tool + code interpreter built inYes: browser automation; code interpreterYes: code execution + computer-use tooling
Observability / evalsGovernance + monitoring hooksBuilt-in evaluations + observabilityFoundry evaluations + tracingVertex evaluations + tracing
Standout strengthWired into Oracle data + Fusion roles, approvals, RBACMost complete standalone agent infra; model-agnosticTight M365 / Entra / GitHub fit + model router economicsOwns Gemini end-to-end; ADK + A2A; tight Workspace + BigQuery fit
Architect's read on agents
If you are building agents in the abstract, Bedrock AgentCore is the most complete runtime today. If your agents mostly act on Oracle data or inside Fusion, OCI's tighter coupling to roles, approvals, and the database usually beats a more capable but disconnected runtime. Foundry wins when the agent lives in the Microsoft 365 / Entra world. Google's platform wins when you want one vendor from the chip (TPU) to the model (Gemini) to the agent (ADK + A2A), or when your data already sits in BigQuery and Workspace.

Sovereignty & governance

DimensionOCIAWSMicrosoftGoogle
US governmentUS Gov Cloud + US Classified Cloud with GA Gen AI (since Jan 2026)GovCloud (US) + Secret / Top Secret regionsAzure Government + classified offeringsAssured Workloads for Gov; IL5-capable regions
EU sovereignEU Sovereign Cloud (GA, EU-operated)European Sovereign Cloud GA Jan 15, 2026 (Germany; partition aws-eusc)Microsoft Cloud for Sovereignty + EU Data BoundarySovereign Cloud; partner-operated (T-Systems Germany, S3NS / Thales France)
Gen AI in the sovereign region?Yes broad model set GA in sovereign/regulated regionsLimited Bedrock present but only Nova Lite / Pro at ESC launch (no Claude/Llama/Mistral)Varies by offering and regionYes Gemini runs fully air-gapped on-prem via GDC
GCC / Middle EastSaudi (Jeddah, Riyadh), UAE Central (Abu Dhabi, full Enterprise AI Jun 2026), IsraelUAE, Bahrain regions (model availability varies)UAE, Qatar regions (model availability varies)Saudi (Dammam), Qatar (Doha), Israel regions
Guardrails maturityNative platform guardrails (Mar 2026) + governance + PII filtersBedrock Guardrails (most mature) + Automated ReasoningContent Safety + Foundry evaluationsModel Armor AI firewall (multi-model, in-line)
Sovereign-region cost noteStandard regional pricing~15% premium in ESC; 2 AZs; no Free Tier at launchVaries by sovereign offeringGDC air-gapped needs Google-supplied hardware
Sovereignty: where OCI and Google now lead
Sovereignty is the dimension where the usual pecking order flips. OCI has the widest set of frontier and open models GA inside gov, classified, EU-sovereign, and GCC cloud regions. Google has taken a different and arguably stronger path for the hardest cases: Gemini now runs fully air-gapped on-prem through Google Distributed Cloud, even on a single disconnected server. AWS only opened its European Sovereign Cloud in January 2026, and at launch Bedrock there is limited to Amazon Nova Lite and Pro. Rule of thumb: for in-region cloud, shortlist OCI. For a true air-gap or on-prem mandate, shortlist OCI and Google.

When each wins

Buying criterionWinnerWhy
You already run Oracle DB / FusionOCIData gravity + Fusion Agentic Apps + free vector search
You want the Gemini model specificallyGCPGemini is first-party to Google; only narrow exposure elsewhere
You're BigQuery / Workspace nativeGCPData gravity in BigQuery; Gemini wired into Docs, Sheets, Gmail
You need a specific frontier model/version right nowAWS, Azure, GCP, or direct APIOCI catalog is broad in 2026, but exact model/version/region still decides.
You need M365 / Dynamics integrationAzureCopilot ecosystem
You're AWS-native on infraAWSIAM / VPC / observability already there
Sovereign data with GA Gen AI in regionOCIReach into Gov, Classified, UAE, EU Sov
True air-gapped or on-prem Gen AIGCP or OCIGemini runs air-gapped on GDC; OCI has classified / sovereign regions
Pure consumer SaaS at low costAWS, GCP, or direct APIWider model price competition; Gemini Flash is cheap
Document-heavy enterprise back officeTie (all four good)Each has competent doc AI + RAG

Quick alignment (informal)

OCIAWSMicrosoft Foundry (was Azure AI Foundry)Google Cloud
OCI Generative AI ServiceBedrockMicrosoft Foundry / Azure OpenAIVertex AI / Gemini API
OCI Enterprise AI AgentsBedrock Agents + KB + AgentCoreFoundry Agent ServiceGemini Enterprise Agent Platform (ADK + Agent Engine)
Oracle AI Data PlatformSageMaker Unified Studio / Glue / S3 TablesFabric / OneLake / Foundry data planeBigQuery + Dataplex + Vertex
AI Vector Search (26ai)OpenSearch · Aurora pgvectorAzure AI Search · SQL DB vectorAlloyDB AI · BigQuery vectors
OCI Data Science · AI Quick ActionsSageMaker · Bedrock MarketplaceFoundry · Azure MLVertex AI Workbench · Model Garden
OCI VisionRekognitionAzure AI VisionCloud Vision / Vertex Vision
OCI LanguageComprehendAzure AI LanguageCloud Natural Language
OCI Speech + xAI VoiceTranscribe + PollyAzure AI SpeechSpeech-to-Text + Text-to-Speech (Chirp)
OCI Document UnderstandingTextractAzure Document IntelligenceDocument AI
OCI Anomaly DetectionLookout for Metrics (deprecating)Anomaly Detector (deprecating)Timeseries Insights / BigQuery ML
OCI ForecastingSageMaker Canvas ForecastAzure ML AutoML ForecastingVertex AI Forecasting / BigQuery ML
Fusion Agentic AppsQ BusinessM365 CopilotGemini for Workspace / Agentspace
HeatWave GenAI(no direct equivalent)(no direct equivalent)BigQuery ML + Gemini
The unfashionable truth
For most enterprises, the right answer is multi-cloud AI. OCI for Oracle-data-anchored workloads and sovereign deployments. AWS or Azure for the frontier-model workloads. Google when you want Gemini, BigQuery-anchored AI, or air-gapped on-prem. Pretending one vendor wins everything is a procurement narrative, not an architecture one.

Sources used for this June 2026 refresh

Primary Oracle docs and blogs the content was anchored to, plus the competitive sources used for the OCI vs AWS vs Azure comparison. Verify the latest before locking in commitments.

June 2026 update (new this refresh)

OCI vs AWS vs Azure vs GCP comparison sources

OCI Generative AI

OCI Enterprise AI Agents and Governance

Oracle AI Database 26ai / 23ai Vector Search

Oracle AI Data Platform

Select AI

Fusion Agentic Apps & AI Agent Studio

APEX AI

Oracle Digital Assistant

MySQL HeatWave GenAI

OCI Data Science & AI Quick Actions

OCI AI Services

Infrastructure / NVIDIA partnership

Verification discipline
Pricing pages change without notice. Release notes get amended. Service names get renamed (see 23ai → 26ai). Always confirm in the OCI Console for your region and the official pricing pages before commitments.