Oracle AI, the practical way
This portal covers Oracle's full AI stack as of June 24, 2026. From OCI Generative AI Service, Enterprise AI Agents, and Oracle AI Data Platform, to AI Vector Search in Oracle AI Database 26ai, to the 22 Fusion Agentic Applications launched in March 2026. Architecture, trade-offs, risks, pricing. No marketing talk.
Oracle's AI story in 2026 has three centers of gravity. OCI Enterprise AI packages Generative AI Models, Enterprise AI Agents, and Governance into a managed build platform. Fusion Agentic Applications (GA Mar 2026) ships 22 pre-built agentic apps embedded inside Fusion Cloud ERP, HCM, SCM, and CX. Underneath everything sits Oracle AI Database 26ai with native vector search, Select AI, Private Agent Factory, and Private AI Services Container for workloads that must stay database-close or private. If you're an enterprise already on Oracle, this stack is increasingly hard to ignore.
How this portal is organized
Left sidebar groups Oracle AI into seven layers. Each service has its own page with tabs: Overview, Architecture, Models or Features, Pricing, Risks, and When to use. The bottom of the sidebar has decision matrices, architecture patterns, and a cross-cloud comparison.
Cohere Command A family, Llama 4 Scout & Maverick, xAI Grok 4.x, NVIDIA Nemotron, Google Gemini options, OpenAI open-weight models, importable compatible models, and dedicated AI clusters.
Native VECTOR datatype, HNSW + IVF indexes, unified hybrid search across vector + relational + JSON + graph + spatial, plus private agents and private AI containers.
22 pre-built agentic applications for Finance, HR, SCM, and CX. Native to the transactional system, governed by Fusion roles.
Governed data discovery, catalogs, workspaces, pipelines, notebooks, RBAC, and agent-ready data connections for AI teams that need a shared data plane.
Who this is for
Enterprise architects, Oracle DBAs and Apps DBAs moving into AI, technical leads scoping pilots, and anyone who has to defend an Oracle-vs-hyperscaler choice in a steering committee. Assumes you already know cloud, databases, and identity. Does not assume you know what an embedding is, and explains the AI-specific bits as it goes.
The Oracle AI mental model
Think of Oracle's AI in three layers stacked on top of each other.
What sets Oracle apart in 2026
| Differentiator | What it means in practice |
|---|---|
| Vectors live in the source-of-truth DB | No separate Pinecone or Weaviate. Transactionally consistent vector search next to your rows. Means RAG ground truth and operational data never drift. |
| Unified hybrid search | One SQL can join vector similarity with relational predicates, JSON paths, graph hops, and spatial filters. Other vendors need an orchestration layer. |
| Multi-model gateway (BYO LLM) | Cohere, Meta Llama 4, xAI Grok, NVIDIA Nemotron, one service, one endpoint, one bill. Useful when you want vendor optionality without re-architecting. |
| Fusion-native agents | 22 agentic apps run inside Fusion with full role security, approval hierarchies, and transactional context. Other vendors' agents have to bolt on to ERP via APIs. |
| Sovereign and Gov coverage | OCI Gen AI is GA in US Gov, US Classified, UAE Central, EU sovereign. Often the only path for regulated workloads. |
| Free-tier vector search | AI Vector Search ships at no extra licence cost in 26ai. Compared to Postgres + pgvector + vector-DB-as-a-service, the TCO comparison is brutal for the competition if you already pay for the DB. |
What Oracle is still weak at (be honest)
How to read the rest of this portal
Each service tab follows the same shape: Overview → Architecture → Models/Features → Pricing → Risks. If you only have time for one tab, read Risks. The other tabs tell you what something does. Risks tells you what burns you in production.
What's New - Q4 2025 through June 2026
Material changes that affect architecture, cost, or risk decisions. Curated, not a press-release dump.
Three things matter most. One: Oracle AI Database 26ai moved AI into the database, including vector search, Select AI, private agents, hybrid RAG exposed as MCP tools, and private vector-service containers. Two: OCI Enterprise AI is now a broader platform: Generative AI Models, Enterprise AI Agents, and Governance rather than only an endpoint catalog, and the June 2026 wave widened model choice (Nemotron 3 Ultra, Qwen, Gemma, gpt-oss on B200), promoted Cohere Rerank 4 to on-demand, and added OCI Resource Analytics. Three: Fusion Agentic Applications launched Mar 2026 with 22 pre-built agents inside Fusion ERP/HCM/SCM and expanded to CX in Apr 2026.
Major releases timeline
| Date | Release | Why it matters |
|---|---|---|
| Oct 2025 | Database 23ai renamed Oracle AI Database 26ai | Aligns with calendar versioning. AI Vector Search now standard, not an add-on. Branding tells customers AI is a first-class workload. |
| Jan 2026 | Oracle AI Database 26ai Linux x86-64 on-prem (RU 23.26.1) | Enterprises can run AI Vector Search on their existing Exadata / commodity Linux without going to OCI. |
| Jan 2026 | OCI Gen AI in US Classified Cloud | Top Secret / classified workloads can now use Gen AI without leaving the Oracle classified environment. |
| Jan 2026 | xAI Grok 4.1 Fast + Cohere Command A Vision, Command A Reasoning | Cheaper Grok variant for high-volume. Cohere adds vision and reasoning variants for enterprise agent patterns. |
| Mar 2026 | OCI Enterprise AI GA | Oracle formalizes the stack around Generative AI Models, Enterprise AI Agents, and Enterprise AI Governance. |
| Mar 2026 | Enterprise AI Agents GA | Agent runtime expands beyond basic RAG into a managed platform with tools, vector stores, responses API, and governance hooks. |
| Mar 2026 | Fusion Agentic Applications launch (22 apps) | Native ERP/HCM/SCM agents. Not bolt-on. Approval hierarchies and Fusion roles flow through automatically. |
| Mar 2026 | AI Agent Studio adds Agentic Applications Builder | No-code orchestration of Oracle, partner, and external agents. Free with Fusion subscription. |
| Mar 2026 | AI guardrails for OCI Gen AI on-demand | Native guardrail evaluation changes the production control model: validate prompts and responses at the platform layer, not only in app code. |
| Mar 2026 | NVIDIA GTC 2026: OCI Superclusters with GB200 NVL72 | For frontier training. Less relevant to most enterprises but signals Oracle's continuing GPU access advantage. |
| Apr 2026 | RU 23.26.2 for Oracle AI Database 26ai | Quarterly cadence now driving vector improvements (DML on HNSW, hybrid search refinements). |
| Apr 2026 | Fusion Agentic Apps for CX (Sales, Service, Marketing) | Expands from back-office (ERP/HCM) into customer-facing flows. |
| Apr 2026 | NVIDIA Nemotron 3 Nano Omni on OCI Gen AI | Adds a strong small-model option for multimodal use cases on commodity GPUs. |
| May 2026 | Import compatible models into OCI Gen AI | Architecturally important: teams can bring compatible models such as Qwen/Gemma-style models into the OCI Gen AI control plane instead of leaving everything in Data Science. |
| May 2026 | Cohere Embed 4 supports mixed text + image input | Useful for multimodal RAG over PDFs, slides, screenshots, catalog images, and claims packets. |
| May 2026 | Cohere Rerank 4 on OCI Gen AI | Better second-stage retrieval quality for RAG. Drop-in upgrade for existing RAG pipelines. |
| May 2026 | xAI Voice text-to-speech on OCI Gen AI | Adds hosted TTS to Oracle's Gen AI layer; use for call-center summaries, training narration, accessibility, and agent voice responses. |
| May 2026 | Grok 4.3 on OCI | Model catalog expansion for reasoning-heavy research and analysis workloads. Confirm region/model availability before designing around it. |
| May 2026 | OCI Gen AI in UAE Central (Abu Dhabi) | Sovereignty win for GCC customers and BFSI. Bedrock and Azure OpenAI parity issue for the region. |
| Jun 2026 | Deprecated Gen AI APIs become unavailable | Do not build new integrations on legacy GenerateText/SummarizeText style APIs. Use the current chat / responses APIs and SDK patterns. |
| Jun 2026 | Cohere Rerank 4 now on-demand and on Dedicated AI Clusters | Reranking is no longer dedicated-cluster only. On-demand pricing lowers the barrier to adding a second-stage reranker to existing RAG pipelines. Quality lift over raw vector search for little engineering cost. |
| Jun 2026 | NVIDIA Nemotron 3 Ultra on OCI Enterprise AI (dedicated clusters) | Open-weights frontier reasoning/agentic model you host on Oracle-recommended GPUs behind a managed OCI endpoint. Option for teams that want a strong open model under their own control plane, not a vendor API. |
| Jun 2026 | New Model Import models: Alibaba Qwen, Google Gemma; gpt-oss-20b/120b on B200 in Abu Dhabi | Widens the bring-your-own-model catalog inside the OCI Gen AI control plane. gpt-oss on B200 clusters in UAE Central pairs open OpenAI weights with sovereign-region hosting. |
| Jun 2026 | Multimodal: Cohere Embed 4 (text/image/combined) + xAI Voice TTS in Enterprise AI | Confirms multimodal RAG and voice as first-class on the platform. Embed 4 handles mixed text-image inputs; xAI Voice covers narration, accessibility, and agent voice responses. |
| Jun 2026 | OCI Enterprise AI GA in UAE Central (Abu Dhabi) | Moves beyond a single Gen AI endpoint to the full Enterprise AI stack in-region, on-demand or dedicated. Data-residency win for GCC and BFSI customers. |
| Jun 2026 | OCI Resource Analytics for cloud-estate intelligence | Near-real-time view of resources, relationships, and config metadata across regions/tenancies. Runs on Oracle AI Database with Select AI and MCP server support, so agents and assistants can query your estate in natural language. |
| Jun 2026 | OCI AI Accelerator Packs + Enterprise AI Chat reference architecture | Preconfigured, self-service AI solutions launchable from the OCI Console, plus a published reference architecture and GitHub deployment guide for enterprise-grade AI chat. Lowers time-to-first-pilot. |
| Jun 2026 | Hybrid RAG in 26ai exposed as an MCP tool | Oracle guidance now shows turning a 26ai vector index into an MCP tool for hybrid (vector + keyword) RAG. Signals MCP becoming the default integration surface between the database and agents. |
Practical implications for architects
26ai upgrade unlocks RAG without buying a vector DB. Plan an architecture review: which workloads can move to in-DB embeddings vs which need OCI Gen AI Agents service? The decision often comes down to whether the corpus is mostly structured (DB-side) or mostly documents (Agents service).
Pilot 1-2 of the 22 Agentic Apps now. They are included in your Fusion subscription. The build-vs-buy math for custom agents got worse, Oracle's are pre-wired into roles, approvals, and data. Build only what Fusion does not cover.
Start with Enterprise AI Agents and the current Responses-style APIs for managed RAG, tools, vector stores, and governance. Drop into Data Science only when you need custom model training or hosting AI Quick Actions doesn't cover.
Use Oracle AI Data Platform to define governed data products, catalogs, owners, lineage, RBAC, and refresh rules. Do not let every chatbot create its own private copy of the corpus.
OCI's reach into Gov, Classified, UAE, and EU sovereign regions has widened. For workloads where the data physically cannot leave a jurisdiction, OCI Gen AI is increasingly the only major-vendor option with a GA service.
Service Map
Every Oracle AI service worth knowing, in one diagram. Use this to orient before you go deep.
Reading the map
The top band is where you start if you are buying outcomes, pick a Fusion agent, hook up APEX RAG, or use a managed Gen AI Agent. The middle band is where you start if you are building, pick a model, write prompts, expose endpoints. The bottom two bands are where the data and compute live; you do not get to ignore them, because they drive cost and latency.
OCI Generative AI Service GA
Oracle's managed foundation-model platform. Use hosted models, import compatible models, build Enterprise AI Agents, apply guardrails, or rent a dedicated AI cluster. Region-restricted, IAM-governed, OCI-billed.
One platform, several control points. You can call hosted models for chat, embeddings, rerank, and text-to-speech; import compatible models into the Gen AI control plane; build Enterprise AI Agents; and apply native guardrails. You pick on-demand for elasticity or dedicated AI clusters for steady throughput, isolation, fine-tuning, and private capacity.
What problem this solves
Most enterprises don't want to manage GPU clusters, model weights, guardrail services, vector-store plumbing, and vendor contracts separately. They want a single OCI-governed surface with IAM, private networking, logging, cost controls, and the ability to swap models without rewriting the app. That's the offer. The trade-off is catalog and feature availability vary sharply by region and model family.
Two consumption modes: pick one per workload
| Mode | How you pay | Latency & isolation | Best for |
|---|---|---|---|
| On-demand | Per 1M characters (input + output for generation; input only for embeddings) | Shared GPU pool. Burst-tolerant. Variable latency under load. | Prototyping, low-volume prod, spiky workloads, dev environments. |
| Dedicated AI Cluster | Hourly per cluster, irrespective of utilization | Dedicated GPUs in your tenancy. Stable latency. Tenancy-isolated. | Steady high-volume traffic, regulated data, sub-second SLA, fine-tuned models, custom-trained adapters, imported compatible models. |
Reference architecture
Network and identity
The Gen AI Service endpoint is reachable through a Service Gateway and supports private endpoint patterns for workloads that should stay off the public internet. Authentication is OCI IAM, calls from compute instances use instance principals or resource principals, external apps use signed requests or governed API-key patterns. Authorization is granted via IAM policies on the generative-ai-family resource type. You can scope to compartments, model endpoints, and agent resources.
Where the data goes
Oracle's stated position is that on-demand requests are not used to train shared models. Dedicated clusters provide stronger tenant isolation. Logs of prompts and completions can be captured to OCI Logging at your discretion. For regulated data, prefer dedicated clusters, private endpoints, zero-trust network controls where available, and VCN-side egress rules so the only allowed path is to OCI services you approved.
Capability matrix (June 2026)
| Capability | On-demand | Dedicated cluster | Notes |
|---|---|---|---|
| Text generation | ● | ● | All chat / completion models. |
| Embeddings | ● | ● | Cohere Embed v4 multimodal, English + multilingual. |
| Reranking | ● | ● | Cohere Rerank 4 (May 2026). |
| Vision (image input) | ● | ● | Cohere Command A Vision, Llama 4 multimodal, Grok 4.3 vision. |
| Text-to-speech | ● | ◐ | xAI Voice support arrives through OCI Gen AI, separate from OCI Speech transcription. |
| Function calling / tools | ● | ● | Standard JSON-mode tool calling on Cohere & Llama. |
| Responses API / hosted tools | ● | ◐ | Use the current Responses-style APIs for agentic applications; do not build on deprecated text APIs. |
| Streaming output | ● | ● | SSE. |
| Import compatible models | ○ | ● | Use when a model is compatible with the OCI Gen AI serving path but is not yet in the managed catalog. |
| Fine-tuning (LoRA / T-Few) | ○ | ● | Requires dedicated cluster. |
| Custom-trained adapters | ○ | ● | Per-customer model endpoints. |
| Long context (>200K) | ◐ | ● | Llama 4 Maverick & Grok 4.20 support longer contexts; quotas tighter on shared pool. |
| Content moderation / guardrails | ● | ● | Native guardrail evaluation for prompts/responses; supported controls vary by on-demand vs dedicated endpoint. |
Region availability (as of June 2026)
Always confirm in the Console, but at time of writing OCI Gen AI is GA in: US (Chicago, Phoenix, Ashburn), Frankfurt, London, Amsterdam, Tokyo, Osaka, Sydney, Mumbai, Hyderabad, São Paulo, Toronto, Saudi Arabia (Jeddah, Riyadh), Israel, Singapore, Seoul, UAE Central (Abu Dhabi, full Enterprise AI as of June 2026), US Gov Cloud, US Classified Cloud. Not every model is in every region, Grok and Nemotron have narrower footprints, Cohere is widest. Abu Dhabi now also hosts imported OpenAI gpt-oss-20b/120b on B200 dedicated clusters.
Pricing mental model
On-demand pricing is per 1 million characters, not per token. Character count includes whitespace. For generation, you pay input + output; for embeddings, input only. For dedicated clusters you pay hourly per unit, where one "unit" is a specific GPU configuration that varies by model family. Verify current rates on the OCI pricing page, they move.
Cost behaviour by mode
| Workload | Best mode | Why |
|---|---|---|
| POC / pilot, <5M chars/day | On-demand | Pay only for what you call. Cluster idle cost would dominate. |
| Steady 50M+ chars/day, predictable | Dedicated cluster | Hourly rate amortizes well past a threshold; latency stable. |
| Fine-tuned Cohere for legal review | Dedicated cluster (required) | Custom adapters only deploy on dedicated. |
| Multi-tenant SaaS, bursty per customer | On-demand with circuit breakers | Quotas + retry shed load gracefully; cluster overprovisioning expensive. |
| Regulated data with isolation requirement | Dedicated cluster | Tenant isolation is the buying criterion, not cost. |
Hidden cost drivers
- System prompt bloat. Every request pays for the full system prompt. A 4KB persona prompt at scale dominates the bill. Use prompt caching where supported, or templatize.
- Naive RAG context windows. Stuffing 20 chunks into context costs 20x more than stuffing 4. Use reranking (Rerank 4) to cut to top-K, then send.
- Retries and timeouts. A 504 retry is two billed calls. Cap retries, log them, set them as alarms.
- Egress. Calls from outside OCI to the Gen AI endpoint can incur OCI egress on the response path. Keep callers inside OCI when volume is high.
- Embeddings re-runs. Re-embedding your whole corpus when you change models is expensive. Version your embeddings and decide policy up front.
Risks to think about before production
| Risk | Impact | Mitigation |
|---|---|---|
| Model deprecation | Apps break when a model is retired | Abstract model name behind a config flag. Test against the next model in the family early. Subscribe to OCI release notes for Gen AI. |
| Quota throttling under burst | 5xx during peak, lost revenue | Set Alarms on 429/503 from the endpoint. Request quota increases proactively. Move hot workloads to dedicated. |
| Region-model mismatch | Cannot deploy because model missing in region | Document model-region matrix as part of architecture review. Use a different region for inference if data residency allows. |
| Cross-tenant prompt leaking | Sensitive data echoed to other tenants in your SaaS | Per-tenant prompt isolation, no global cache of completions, audit log review. |
| Hallucinated tool calls | Agent calls wrong API with wrong args | Strict JSON schema validation, dry-run flag, idempotent tool design, human-in-loop on side-effectful tools. |
| Prompt injection from documents | RAG-fed document overrides system prompt | Use a defensive system prompt; classify retrieved chunks; mark untrusted content explicitly; pre-scan inbound docs. |
| Cost runaway from agent loops | Recursive agents consume thousands of dollars overnight | Per-session token budget, max-step cap, alarm on cost per session, kill switch. |
| Compliance audit gaps | Cannot prove what the model said to whom and when | Always log prompt + completion + model + version to OCI Logging. Retain per regulatory policy. |
Use Gen AI Service when…
- You need an LLM endpoint inside your OCI tenancy with IAM, VCN, and audit aligned to your existing controls.
- You want vendor optionality across Cohere, Llama, Grok, NVIDIA, Gemini-style integrations, and importable compatible models without running each stack yourself.
- You are building a custom app or pipeline, not consuming a pre-built one.
Skip it and use something else when…
- Your use case is covered by a Fusion Agentic App, use that instead, it ships in days.
- You need a specific frontier model or exact version that is not exposed in your OCI region, go direct, use that vendor's platform, or isolate the exception behind a model gateway.
- Your corpus is mostly documents and you want managed RAG, use OCI Generative AI Agents (Knowledge Bases) instead of writing your own retrieval.
- Your traffic is <100K requests/month, the per-character bill is fine, but evaluate whether OpenAI direct is operationally simpler given your team's existing tooling.
OCI Generative AI Agents GA ENTERPRISE AI AGENTS
Oracle's managed agent runtime. Build RAG agents, tool-using agents, and Responses API applications with vector stores, knowledge bases, hosted tools, session state, and governance hooks.
Gen AI Agents started as managed RAG; by 2026 it is part of OCI Enterprise AI Agents. It wraps the Gen AI model layer with knowledge bases, vector stores, hosted tools, multi-turn session state, Responses-style APIs, and governance controls. Result: managed agents without owning every piece of retrieval, tool-calling, and audit plumbing. Trades flexibility for time-to-market.
What you get out of the box
- Knowledge Bases backed by Object Storage, OCI OpenSearch, or Oracle Database 23ai/26ai Vector Search.
- Vector Stores for reusable retrieval assets across assistants and hosted tools.
- Multi-turn conversations with automatic context retention per session.
- Custom instructions at agent level (system-prompt-like persona).
- Responses API patterns for streaming, tools, file search, and stateful turns.
- Guardrails integrated with OCI Enterprise AI Governance.
- Human-in-the-loop approval steps as a first-class concept.
- Citations back to source documents so users can verify.
What it is not
It is not a replacement for Fusion Agentic Apps when a pre-built agent already covers the process. It is not a free-form "let the model do anything" runtime; tools, memory, files, and guardrails still need explicit design. And it does not remove the need to think carefully about chunking, embedding, metadata, identity, and retrieval quality, managed doesn't mean magic.
Reference architecture
Three knowledge base source types
| Source | Best for | How indexing works | Trade-offs |
|---|---|---|---|
| OCI Object Storage | Static document corpora (PDFs, DOCX, MD) | Service ingests files on a schedule, chunks, embeds, stores in a service-managed vector store | Simplest. Limited tuning. Refresh latency on source changes. Good first move. |
| OCI Search with OpenSearch | BYO indexed corpus where you control chunking and metadata | You ingest and index into OpenSearch yourself; agent queries it; chunks must be <512 tokens | You own the pipeline. More work. Better when corpus is large or filtering is heavy. |
| Oracle AI Database 23ai/26ai | RAG over relational + document corpora with security filters | Documents and vectors live in the DB; uses native VECTOR datatype and HNSW/IVF; agent issues hybrid SQL | Best when the DB is already your source of truth. Row-level security flows through. Requires DBA skills. |
Tools = how the agent acts
An agent without tools is a chatbot. With tools, it can call APIs, query systems, write to records, search files, or invoke governed internal services. The agent picks tools by function-calling on the underlying LLM. You define each tool with a name, description, and JSON schema. The runtime validates the model's output against the schema before invoking the backing OCI Function, HTTP endpoint, integration flow, or MCP-style tool server.
Common tool patterns
- Read-only lookups: query Fusion HCM, Service Cloud, custom APIs. Safe to call freely.
- Side-effectful actions: submit POs, create tickets, send emails. Wrap in human-in-loop approval.
- Computation: call a calculator/converter function. Cheap to allow.
- Hosted tool use: file search, code execution, and vector-store lookup where the Responses API supports it.
- Long-running jobs: start a workflow in OCI Functions, return a job ID, poll for status.
Multi-agent orchestration
Native multi-agent orchestration is provided through the AI Agent Studio Agentic Applications Builder (Mar 2026 release). It lets you compose multiple Gen AI Agents into a workflow with shared memory, conditional routing, and ROI measurement. For Fusion customers it is free and the recommended path. For non-Fusion environments, you can compose at the application layer with the SDK.
Risks and gotchas
| Risk | What goes wrong | What to do |
|---|---|---|
| Stale knowledge base | Source bucket updates but the index hasn't re-ingested yet | Configure ingestion schedule, monitor lag, surface a "Last refreshed" timestamp to users. |
| Wrong chunk size | Retrievals are too narrow or too wide, hurting answer quality | Default chunking rarely optimal for technical docs. Pilot with OpenSearch where you control it. |
| Citations don't match claim | LLM invents text but cites a real chunk | Strict prompting + post-hoc verification step. For high-stakes use, render the chunk text alongside the answer. |
| Permission bleed | Agent returns a doc the user shouldn't see | Filter at retrieval time using user identity. With 23ai KB this is straightforward via VPD. With Object Storage you must bucket-segregate or pre-filter. |
| Tool failure cascade | Tool returns error, agent retries, retries, retries | Cap retries, expose tool errors as plain text in the conversation, add max-step. |
| Slow first-token under load | Cold-start on the underlying LLM hurts perceived latency | Pre-warm via synthetic traffic; for SLA-critical agents use dedicated cluster. |
Use Generative AI Agents when
- You need an internal RAG chatbot over a corpus, and you don't want to build a retrieval pipeline.
- You want managed citations and multi-turn out of the box.
- Your knowledge base is one of the three supported source types.
Skip when
- You need exotic retrieval (graph RAG, multi-hop reasoning across sources), build with the Gen AI Service directly.
- Your use case is in Fusion, use the Fusion Agentic App instead.
- You need on-prem inference, Agents is a managed service in the cloud.
Foundation Model Catalog
Model families and model-delivery paths to understand on OCI Generative AI as of June 24, 2026. Always confirm exact model IDs and region availability in the OCI Console before implementation.
Oracle is not trying to own one frontier model. The 2026 architecture is a model platform: Oracle-hosted Cohere, Meta, xAI, and NVIDIA families; Google Gemini options where exposed through the OCI Gen AI integration path; OpenAI open-weight models through AI Quick Actions; and compatible-model imports for dedicated serving. Treat model identity as configuration, not application logic.
Models by family
Cohere: Oracle's strategic partner
| Model | Type | Strengths | Best for |
|---|---|---|---|
| Cohere Command A | Chat / generation | Strong RAG behavior, enterprise tone, multilingual | Default general-purpose chat agent for enterprise apps |
| Cohere Command A Vision Jan 2026 | Multimodal | Image + text understanding | Document understanding pipelines, screenshot Q&A |
| Cohere Command A Reasoning Jan 2026 | Reasoning | Chain-of-thought, multi-step planning | Agent planning, complex tool selection |
| Cohere Embed v4 | Embeddings | Multilingual, multimodal, 1024-dim or 256-dim | Default embedding model for RAG on OCI |
| Cohere Rerank 4 Jun 2026 | Reranker | Pairwise scoring of query vs candidate; cuts top-N to top-K. Now available on-demand and on dedicated clusters (Jun 2026), not dedicated-only | Second-stage RAG retrieval; quality lift over raw vector search |
Meta: open weights, broad fit
| Model | Strengths | Best for |
|---|---|---|
| Meta Llama 4 Scout | Efficient, smaller MoE; cheap inference | High-volume classification, summarization, lightweight RAG |
| Meta Llama 4 Maverick | Larger MoE; long context; multimodal | Long-document analysis, complex multi-doc RAG |
| Meta Llama 3.3 70B | Dense, well-understood baseline | Fine-tune target where you have abundant labeled data |
xAI Grok
| Model | Strengths | Best for |
|---|---|---|
| Grok 4.3 May 2026 | Strong reasoning, real-world knowledge breadth | Research assistants, analyst summarization |
| Grok 4.20 | General-purpose chat, faster variant where regionally available | Consumer-facing agents where latency matters |
| Grok 4.20 Multi-Agent | Model-side support for multi-agent style orchestration where available | Workflows with multiple specialist sub-agents |
| Grok 4.1 Fast Jan 2026 | Lowest-cost Grok variant | High-volume routing, low-complexity tasks |
NVIDIA
| Model | Strengths | Best for |
|---|---|---|
| Nemotron 3 Nano Omni Apr 2026 | Small footprint, multimodal, optimized for NVIDIA stack | Edge-ish inference, multimodal classification, cost-sensitive workloads |
| Nemotron 3 Ultra Jun 2026 | Open weights, training data, and recipes; frontier reasoning and agentic performance. Hosted via OCI Enterprise AI imported-model deployment on dedicated AI clusters | Teams that want a strong open model on Oracle-recommended GPUs behind a managed OCI endpoint and their own control plane |
Other delivery paths: Gemini, gpt-oss, and imported compatible models
| Path | What it means | Best for |
|---|---|---|
| Google Gemini model options | Use when exposed through the OCI Gen AI integration path in your region and tenancy. Treat availability as a region-specific architecture dependency. | Teams that want Gemini behavior but need OCI-side governance, networking, or billing alignment. |
| OpenAI gpt-oss in AI Quick Actions | Open-weight OpenAI models deployed through OCI Data Science / AI Quick Actions rather than the managed Gen AI on-demand catalog. | Private/custom deployments where open weights matter more than a managed token endpoint. |
| Import compatible models Jun 2026 | Bring compatible model artifacts into OCI Generative AI dedicated serving so they can use the same endpoint and governance patterns. June 2026 added Alibaba Qwen and Google Gemma families, plus OpenAI gpt-oss-20b/120b on B200 clusters in Abu Dhabi. | Model standardization when your chosen model is not yet a first-party catalog model. |
| Direct external API | Keep Claude/GPT/Gemini direct calls behind your own model gateway when the exact model, version, or region is not available on OCI. | Exception workloads where model quality beats platform consolidation. |
Choosing a model: quick heuristic
| If your need is… | Start with |
|---|---|
| Default enterprise chat / RAG agent | Cohere Command A |
| Image + text in the same prompt | Cohere Command A Vision or Llama 4 Maverick |
| Complex multi-step planning | Cohere Command A Reasoning or Grok 4.3 |
| Cheap, high-volume classification | Grok 4.1 Fast or Llama 4 Scout |
| Long context (>200K tokens) | Llama 4 Maverick or Grok 4.20 |
| Multi-agent native orchestration | Grok multi-agent variants where available, or Enterprise AI Agents / AI Agent Studio for platform orchestration |
| Text-to-speech agent output | xAI Voice on OCI Gen AI |
| Embeddings for RAG | Cohere Embed v4 |
| Reranking RAG candidates | Cohere Rerank 4 |
| Fine-tuning on your data | Llama 3.3 70B (mature), Cohere via dedicated cluster, or AI Quick Actions for open-weight models |
Embeddings & Rerank
The unglamorous half of RAG. Get embeddings wrong and the LLM has no chance.
What embeddings actually are
An embedding is a fixed-length vector of floats that represents semantic content. Two passages that mean similar things produce vectors that point in similar directions. Vector search finds the K nearest neighbors of a query vector and returns the passages they came from. That is retrieval. The LLM then writes an answer from those passages. That is the generation.
Cohere Embed 4: the default on OCI
| Property | Value |
|---|---|
| Default dimensions | 1024 (with a 256-dim variant for cost/storage sensitivity) |
| Languages | 100+ via the multilingual variant |
| Input modalities | Text, image, and mixed text+image input for multimodal retrieval patterns |
| Max input | ~512 tokens per chunk (chunk first, embed second) |
| Where it runs | OCI Generative AI Service, on-demand or dedicated |
Chunking discipline (the part teams skip)
- Chunk by structure first, length second. Split on headings, paragraphs, table rows, not arbitrary character counts.
- Aim for ~300-500 tokens per chunk. Smaller chunks improve precision; larger improve context.
- Overlap by 10-15%. Prevents losing the cross-boundary sentence.
- Carry metadata. Source URI, page number, last modified, owning department. You will need this for filtering and citations.
- Re-embed on policy change. Switching embedding models or dimensions means re-embedding the entire corpus. Plan version, cost, and rollback upfront.
Two-stage retrieval (the pattern that wins)
Hybrid search: keyword + vector
Vector search misses queries like "Form 1099-B" because the model treats it as similar to many other tax forms. Keyword search nails it. Hybrid combines both with a weighted score. Oracle AI Database 26ai's Unified Hybrid Vector Search supports this natively in one SQL. OCI OpenSearch supports it as well via score blending.
Enterprise AI Governance & Guardrails
The platform controls around models and agents: guardrails, private endpoints, API keys, IAM, audit, and network isolation.
What Oracle provides natively
- Guardrails for OCI Generative AI: prompt and response checks for unsafe content, prompt injection, and other policy violations.
- On-demand guardrail evaluation: call guardrails directly around a model request, or compose guardrails into your agent path.
- Dedicated endpoint guardrails: inform/block behavior for dedicated AI cluster endpoints where supported.
- PII detection via OCI Language Service, still useful as a deterministic pre/post filter when you need explicit PII categories.
- Agent-level governance: citations, tool schemas, human-in-loop approvals, and session limits.
- Private networking controls: private endpoints, service gateways, IAM policies, resource principals, and zero-trust network patterns where available.
What still belongs in your architecture
| Control | Why Oracle's platform control is not enough by itself | Architecture move |
|---|---|---|
| Domain policy | Generic guardrails do not know your business rules, competitors, contract terms, or regulatory scope. | Keep a domain policy layer in your app or agent instructions, then post-check outputs against explicit policy. |
| Authorization | A model can only be safe if retrieval and tools enforce the user's actual entitlements. | Filter before retrieval. Use VPD / row-level security in 26ai, Fusion roles in Fusion, and compartment/IAM boundaries in OCI. |
| Tool safety | Guardrails do not make a side-effectful tool safe. | Schema validation, dry-run mode, idempotency keys, approval gates, and maximum-step budgets. |
| Audit evidence | Compliance needs exact inputs, outputs, model, version, user, tool calls, and citations. | Write structured audit events to OCI Logging or your SIEM for every model/agent turn. |
| Network isolation | Private endpoint support must still be paired with route, DNS, and egress controls. | Use service gateways/private endpoints, deny public egress, and document every approved outbound path. |
Layered defense pattern
Oracle AI Vector Search GA 26ai · Jan 2026
Vectors as a first-class datatype, inside Oracle Database, indexed by HNSW or IVF, and joinable with relational, JSON, graph, and spatial in a single SQL statement.
If your data already lives in Oracle Database, AI Vector Search means RAG without a separate vector store. The vector lives next to the row. Permissions, backups, replication, failover, all reuse what you already operate. Comparable functionally to pgvector, Pinecone, or Weaviate, but with the killer feature of Unified Hybrid Search: vectors joined with relational predicates, JSON paths, graph hops, and spatial filters in one query. No separate orchestration layer.
What it actually is
A new native datatype, VECTOR(dimensions, format), with two index types (HNSW and IVF), a SQL function set (VECTOR_DISTANCE, VECTOR_EMBEDDING, VECTOR_CHUNKS), and the ability to load ONNX embedding models into the database so embedding generation happens server-side without network calls. Plus the SQL planner has been extended to combine vector predicates with normal predicates intelligently.
Why this is a big deal for Oracle shops
- No new license. Included in all editions of 26ai, including Standard Edition 2.
- No new operational team. Your existing DBAs run it.
- Row-level security flows through. A VPD policy that protects the row also protects its vector.
- Backups already cover it. RMAN, Data Guard, GoldenGate just work.
- Transactionally consistent retrieval. Vector search returns results consistent with your read snapshot, a property no standalone vector DB offers.
Architecture
Feature set: what 26ai adds vs first 23ai release
| Feature | Status | Why it matters |
|---|---|---|
| VECTOR datatype | GA | First-class storage. Variable dimensions and formats (FLOAT32, FLOAT16, INT8, BINARY). |
| HNSW index | GA | In-memory graph index. Fastest recall for moderate corpora that fit in Vector Memory Pool. |
| IVF index | GA | On-disk partitioning index. Scales to very large corpora without memory pressure. |
| HNSW with DML 26ai | GA | Transactionally consistent vector queries even with concurrent inserts/updates, including on RAC. |
| Unified Hybrid Vector Search 26ai | GA | Mix vector + relational + JSON + spatial + graph predicates in one query, planned together. |
| In-DB ONNX embedding | GA | Generate vectors server-side. No network egress for the embedding step. |
| DBMS_VECTOR_CHAIN | GA | PL/SQL package for chunk → embed → store → retrieve pipelines. |
| Distance functions | GA | L2, cosine, dot, Hamming, Manhattan. Pick per use case. |
| Quantization | GA | Reduces vector storage 4-32x with controlled accuracy loss. |
| Globally Distributed DB vector search | GA | Vector search across sharded deployments. For geo-distributed corpora. |
| Free in Autonomous DB free tier | GA | Try it on the always-free ATP without spending a cent. |
Unified Hybrid Search: one query, many predicates
The standout 26ai capability. A single SQL can ask: "Find me passages semantically similar to this query, where the owning department is HR, that mention 'parental leave' (text predicate), authored after Jan 2025, in any of these JSON-tagged jurisdictions." The optimizer plans vector and non-vector predicates together. In other architectures this requires post-filtering or a metadata sidecar. In 26ai it's one statement.
Index trade-offs
| Property | HNSW | IVF |
|---|---|---|
| Storage | In-memory (Vector Memory Pool) | On-disk |
| Query speed | Sub-millisecond for moderate corpora | Single-digit ms with right partitioning |
| Build cost | Higher; graph construction | Lower; partition-based |
| DML support | Yes (26ai), transactionally consistent | Yes |
| Best fit | ≤ few million vectors, latency-sensitive | Tens of millions+, memory-constrained |
| RAC behavior | Replicated on all instances | Distributed across instances |
| Tuning knobs | M, ef_construction, ef_search | nlist, nprobe |
vector_memory_size. Monitor it. If HNSW pages start spilling, performance collapses and you should re-plan as IVF or shard the corpus.
Licensing
AI Vector Search is included in all editions of Oracle AI Database 26ai at no additional license cost. Standard Edition 2, Enterprise Edition, Autonomous Database, Exadata Cloud@Customer, Exadata Database Service, all include it. This is the single biggest commercial pivot from prior versions, where vector workloads required additional features or third-party tools.
What you actually pay for
- CPU/memory of the DB hosting vectors. Plan extra memory for HNSW (rule of thumb: count × dim × 4 bytes × 1.5 overhead).
- Storage for vectors. A 1024-dim FLOAT32 vector ≈ 4 KB. 10M vectors ≈ 40 GB. Add overhead for indexes.
- Embedding generation. If you use OCI Gen AI for embeddings, you pay per character at the Gen AI rate. If you use ONNX in-DB, no per-call charge, just CPU.
- RAC + Data Guard if you need HA. Standard DB licensing rules apply.
Sizing example
| Corpus | Vectors | Storage (FLOAT32) | HNSW RAM ballpark |
|---|---|---|---|
| Internal wiki, 50K docs × 10 chunks each | 500,000 | ~2 GB | ~6-10 GB |
| Product catalog with descriptions + reviews | 5,000,000 | ~20 GB | ~60-100 GB |
| Legal corpus, fine-grained | 50,000,000 | ~200 GB | HNSW won't fit; use IVF |
Risks and gotchas
| Risk | What goes wrong | Mitigation |
|---|---|---|
| Vector Memory Pool spill | HNSW degrades when index doesn't fit; latency blows up silently | Monitor v$vector_memory_pool; alarm on usage > 80%; pre-plan IVF migration path. |
| Re-embedding cost | Switching embedding models requires regenerating all vectors | Version the embedding model in metadata; batch re-embed; budget the LLM cost. |
| Chunking baked into the table | Bad chunk size hurts forever unless re-ingested | Store raw doc + chunks separately; design re-chunkability from day one. |
| RAC HNSW replication overhead | HNSW index duplicated on every instance; memory bloat at scale | For very large indexes on RAC, consider IVF or distribute across shards. |
| Quantization accuracy loss | FLOAT32→INT8 saves space but can shift top-K results | A/B test recall before adopting; keep one full-precision baseline. |
| Hybrid query plan surprises | Optimizer picks wrong order; vector predicate evaluated on too many rows | Use SQL hints, gather stats on vector columns, test with EXPLAIN PLAN. |
| ONNX model drift | Embedding model loaded into DB grows stale vs the OCI hosted version | Pin a model version per table; document upgrade procedure. |
| PII in vectors | Embeddings can leak the original text via inversion attacks | Treat vector columns as PII; protect with VPD; encrypt at rest (TDE on by default in Autonomous). |
Use AI Vector Search when
- Your source-of-truth data already lives in Oracle Database.
- You need vector retrieval to respect existing row-level security.
- You want to join vector similarity with relational, JSON, spatial, or graph predicates in one query.
- You don't want to operate a separate vector DB.
- You need on-prem inference (Exadata, Linux x86-64), 26ai is on-prem GA Jan 2026.
Skip and use something else when
- Your data lives outside Oracle and pulling it in is impractical, use OCI OpenSearch or a Knowledge Base backed by Object Storage.
- You need exotic ANN algorithms (DiskANN, ScaNN) that 26ai doesn't ship, go to a specialist vector DB.
- You're a Postgres shop without Oracle, pgvector is fine for moderate scale.
Select AI GA
Natural language to SQL inside the database. PL/SQL package, four modes, multiple LLM providers, RAG-capable. Available in Autonomous and on-prem 26ai.
Select AI lets users ask the database in plain English. Behind the scenes DBMS_CLOUD_AI sends the question plus schema metadata to an LLM (OpenAI, Cohere, Azure OpenAI, or OCI Gen AI), gets SQL back, and either runs it (runsql), shows it (showsql), explains the result (narrate), or chats (chat). Reported accuracy ~95% on TPC-H. Useful for analyst self-service. Not a replacement for hand-tuned queries on hot paths.
The four modes
| Mode | What it returns | Typical use |
|---|---|---|
runsql | Executes the generated SQL and returns rows | Self-service reporting for trusted users |
showsql | Returns SQL text without executing | Analyst review before running; explainability |
narrate | Returns SQL + natural-language explanation of results | Business-user dashboards, embedded BI |
chat | General chat with the underlying LLM, no SQL focus | General-purpose assistant from within the DB |
Provider integration
Select AI is provider-pluggable. You create an AI profile that names a provider (OpenAI, Cohere, Azure OpenAI, OCI Generative AI) and credentials, then attach it to a session. Switching providers is a config change, not a code change. Credentials live in Vault.
Where it fits in an enterprise
Internal analytics self-service. Quarter-end ad-hoc questions. Sales ops, finance ops, customer support analytics. Embedded chat-with-data in APEX apps. Low-volume, knowledgeable users who can spot a wrong SQL.
Production OLTP queries (latency, predictability). External customer-facing chat (cost, security, schema leakage). Tables with unstable schemas or cryptic column names (the LLM gets confused). High-volume bursty workloads (cost spikes).
Risks specific to Select AI
- Schema disclosure. The LLM sees your table and column names. If those reveal sensitive structure, scope it via grants and avoid passing schemas with regulated-data hints in their names.
- Wrong SQL that runs. The model may produce SQL that returns wrong numbers without erroring. Prefer
showsqlfor non-trivial questions and let a human approve. - Cost surprises. A natural-language question can produce a SQL that table-scans a fact table. Add query timeouts and resource manager plans.
- Cross-database queries. Don't expect the model to understand database links or sharded topologies without explicit metadata coaching.
In-Database ONNX Embeddings
Load an embedding model into Oracle AI Database 26ai. Generate vectors with a SQL function. No network call, no API key, no per-character cost.
The pattern
Most embedding pipelines call out to a hosted model (OCI Gen AI, OpenAI). That introduces latency, cost per call, and a data-leak surface. In-DB ONNX inverts the dependency: you load the embedding model into the DB once, then call VECTOR_EMBEDDING(text USING model_name) as a function in any SQL. Embeddings happen on the DB server.
Why architects care
- No egress. Embedding data never leaves the DB box. Critical for regulated content.
- No per-call cost. Pay for CPU you already own, not per million characters.
- Lower latency on bulk re-embed. Eliminate network round-trip per chunk.
- Simpler ops. No external service dependency in the embedding pipeline.
Trade-offs
| Concern | In-DB ONNX | Hosted (OCI Gen AI) |
|---|---|---|
| Latency per call | Lower (no network) | Higher (network + service) |
| Cost per call | None, pay for DB CPU | Per character |
| Model freshness | You manage upgrades | Oracle maintains |
| Model selection | Anything in ONNX format ≤ size limit | Curated set |
| CPU pressure on DB | Yes, sizing concern | None |
| Compliance / sovereignty | Strongest (data never leaves) | Service-bound |
Where it slots in
VECTOR_EMBEDDING inside an OLTP transaction will tax the DB CPU and burn redo. Embed at ingest time, store the vector, query the stored vector, same as you would with any external embedding pipeline.
Oracle AI Database Private Agent Factory 26ai
A no-code/private agent factory for enterprise data. Use it when business users or engineers need knowledge agents grounded in approved repositories, files, web sources, and Oracle Database data without exposing the workflow through a general-purpose SaaS chatbot layer.
Private Agent Factory matters because it treats Oracle's database and enterprise repositories as the trust boundary. It includes no-code agent creation, pre-built assistants, prompt lab patterns, knowledge agents, approved data sources, embeddings, and private retrieval. This is the right pattern when you need grounded agents over enterprise content without pushing sensitive schema and documents into a separate chatbot platform.
Reference architecture
Use it when
- The corpus is private enterprise content: database rows, internal sites, file shares, SharePoint, Google Drive, or uploaded documents.
- You need no-code agent creation for business users while preserving engineered controls around approved sources and model management.
- You need explainable retrieval over documents and vectors without standing up a separate vector DB.
Do not use it when
- The agent is primarily a Fusion process agent covered by Fusion Agentic Apps or AI Agent Studio.
- The corpus is mostly non-Oracle documents in object stores and a managed OCI Generative AI Agent would ship faster.
- You need a consumer-grade assistant UX with broad channels, analytics, and bot lifecycle tools; evaluate Oracle Digital Assistant or app-layer tooling.
Oracle Private AI Services Container 26ai
A lightweight containerized web service for Oracle AI Database 26ai that offloads expensive vector work outside the database: embedding generation and HNSW vector-index creation.
Private AI Services Container is not a private LLM chatbot runtime. Current docs describe two services: a Vector Embedding Service and a Vector Index Service. It can run in your data center or cloud compute, does not require internet access, processes requests statelessly, and helps free database CPU/GPU capacity for search and transactional work.
Architecture decision
| Question | Use in-DB ONNX / DB CPU | Use Private AI Services Container |
|---|---|---|
| Embedding volume is low or DB CPU is available | Simple and local | Probably unnecessary |
| Embedding/index creation is expensive | Can starve database resources | Offload work to external compute while storing vectors in Oracle AI Database |
| Need GPU-accelerated HNSW index creation | Limited by DB host capability | Use the Vector Index Service with NVIDIA GPU-backed compute |
| Need no-internet/private operation | Good if model already loaded in DB | Good: container can run without internet and is called by DBMS_VECTOR or REST clients |
| Need hosted chat / reasoning model | Not the right layer | Not the right layer; use OCI Gen AI, Private Agent Factory with configured LLMs, or a model gateway |
Two services in the container
| Service | What it does | How it connects |
|---|---|---|
| Vector Embedding Service | Generates embeddings outside the database and stores/uses them with Oracle AI Database similarity search. | Called from DBMS_VECTOR procedures such as UTL_TO_EMBEDDING / UTL_TO_EMBEDDINGS, or via REST/OpenAI SDK-style clients. |
| Vector Index Service | Offloads HNSW vector index creation to GPU-backed compute for faster index builds. | Referenced from CREATE VECTOR INDEX parameters that point at the container REST endpoint and API key. |
Risks
- Model freshness. You manage embedding model updates; stale embeddings quietly degrade retrieval quality.
- Capacity sizing. Offloaded vector work shifts latency and throughput onto your container hosts.
- Patch ownership. Treat the container like production infrastructure, not a demo appliance.
- Endpoint security. Protect the container endpoint and API key; it can be invoked by database jobs or REST clients.
- Audit consistency. Log embedding/index jobs, model versions, container version, target table/index, and caller.
OCI Vision GA
Pretrained and custom-trainable image analysis. Object detection, classification, OCR, document image understanding. API + Console + SDK.
Two modes. Pretrained: call an API, get labels/boxes/text/faces. Cheapest, fastest, no setup. Custom: upload labeled images, train your own classifier or detector through the Console. Useful when off-the-shelf labels miss your domain (manufacturing defects, retail SKUs).
Capabilities
| Capability | Pretrained | Custom training | Typical use |
|---|---|---|---|
| Object detection | Yes | Yes | Count items, locate defects, retail shelf scanning |
| Image classification | Yes | Yes | Tag content, route images by category |
| OCR (text in images) | Yes | - | Receipt scanning, signage extraction |
| Document image analysis | Yes | - | Forms, tables, overlaps with Document Understanding |
| Face detection | Yes | - | Privacy-aware face blur, attendance counting |
Indicative pricing (verify on the OCI pricing page)
Pretrained image analysis is in the low-cents-per-thousand-images range. Custom model training is hourly per GPU-hour. Always check current numbers before committing.
When to use Vision vs Document Understanding
Risks
- Custom-trained models drift as products and packaging change. Retrain quarterly or on accuracy degradation alarms.
- OCR accuracy degrades on low-quality scans. Pre-process (deskew, contrast) before sending.
- Face detection has compliance implications. Document the legal basis before deploying.
OCI Language GA
NLU primitives for text: sentiment, entity recognition, PII detection, key phrase extraction, language detection, classification, translation.
Not an LLM. A set of classical NLP services with pretrained models, exposed as APIs. Cheap per call, deterministic outputs, easy to embed in pipelines. Use for the boring-but-essential text tasks where you don't need generation, PII scrubbing, sentiment scoring on tickets, language routing on multilingual input.
Capabilities
| Capability | Use case |
|---|---|
| Sentiment analysis | Customer feedback triage, NPS-style scoring |
| Aspect-based sentiment | "The screen is great but the battery is poor" → screen+, battery- |
| Named entity recognition (NER) | Extract people, orgs, locations, dates |
| PII detection | Pre/post filter for LLM pipelines |
| Key phrase extraction | Auto-tag content |
| Language detection | Route multilingual tickets |
| Text classification | Custom-trainable category labels |
| Translation | Common language pairs; not best-in-class, fine for internal use |
Language coverage
Most analytical features (sentiment, NER) cover English, Spanish, French, German, Portuguese, Italian out of the box. Coverage varies by feature, check the docs per service. For broader language coverage, pair with an LLM via Gen AI.
Where Language slots into Gen AI
The pattern that works: use Language as cheap pre/post filters around expensive LLM calls. Detect language to pick the right system prompt, scrub PII before sending to the model, classify intent to skip the LLM when a deterministic answer exists. This drops Gen AI cost by 30-60% on a typical customer-support workload.
OCI Speech GA
Speech-to-text (ASR) for audio files and streams. Multiple languages, speaker diarization, SRT/VTT output, profanity handling.
Capabilities
- Batch transcription of audio/video files in Object Storage.
- Real-time streaming for low-latency captioning use cases.
- Speaker diarization ("who spoke when") for call recordings and meetings.
- Normalization of times, addresses, numbers, URLs in the output text.
- Profanity filter: remove, mask, or tag.
- SRT/VTT subtitle output for video.
- Custom vocabulary for domain words the base model mis-hears.
Common architectures
Recordings land in Object Storage → Speech transcribes with diarization → Language extracts sentiment + entities → Gen AI summarizes the call → write back to CRM. End-to-end at <30¢ per call typically.
Teams/Zoom recording → Speech with diarization → Gen AI summarizes per speaker, extracts decisions, generates action items → write to Asana/Jira via tool call from an agent.
Risks
- Background noise drops accuracy. Pre-process or use a dedicated noise-suppression step before transcription.
- Diarization struggles with overlapping speakers. Document accuracy expectations to stakeholders.
- Audio data residency matters more than most teams think, keep buckets in the right region.
- Real-time streaming has stricter quotas; plan capacity before peak loads (e.g. live earnings calls).
xAI Voice on OCI Generative AI May 2026
Text-to-speech through the OCI Generative AI model layer. Treat it as output generation for voice agents, training narration, call-center assist, and accessibility workflows.
OCI Speech is speech-to-text. xAI Voice is text-to-speech. Keep the distinction clear in architecture diagrams: Speech turns audio into text; xAI Voice turns model output or authored content into audio. Voice quality, latency, language support, and region availability must be tested in the exact OCI region you plan to use.
Reference pattern: voice agent response
Use it when
- You need voice output from an Oracle-hosted Gen AI workflow without procuring a separate TTS vendor.
- You are building an internal assistant, training-content generator, or call-center agent response path.
- You can tolerate region/model availability checks and quality testing before launch.
Risks
- Latency. Voice adds another model call after generation. Stream audio where possible.
- Unsafe audio. Guardrail text before synthesizing. Audio moderation after generation is harder.
- Voice consistency. Pin the voice/model choice and test regression on every catalog update.
- Cost. Cache repeated announcements and training clips instead of regenerating.
OCI Document Understanding GA Generative extraction 2026
Extract text, tables, key-value pairs, signatures, and classifications from PDFs and document images. 2026 update added generative extraction for context-aware parsing.
The boring backbone of most enterprise AI projects. Invoices, contracts, KYC packets, claims forms. Document Understanding handles OCR, layout, table detection, and key-value extraction. The 2026 generative extraction upgrade improves accuracy on free-form fields and complex tables by adding LLM-grade context reasoning.
Capabilities
| Feature | What it does |
|---|---|
| Text extraction | OCR with layout preservation |
| Table extraction | Detect tables, extract as structured rows + cells |
| Key-value extraction | Pretrained for invoices, receipts, IDs; custom-trainable for your forms |
| Document classification | Route into the right downstream queue |
| Signature detection | Flag whether a signature is present in a region |
| Generative extraction 2026 | LLM-backed extraction for ambiguous fields, free-form sections, multi-column layouts |
Pricing structure
Charged per transaction (a page or a document, depending on the operation). First 5,000 transactions per month are free: useful for low-volume pilots and Always-Free tier exploration.
Reference pipeline
Risks
- Custom KV models need labeled data, budget annotation time honestly.
- Generative extraction is more accurate but slower and more expensive than classic OCR-only. Mix modes based on document type.
- Tables with merged cells or nested headers still cause issues, sample your hardest documents in PoC.
OCI Anomaly Detection GA
Multivariate time-series anomaly detection using Oracle Labs' MSET2 algorithm. Trained on your historical normal-operation data; scores new observations as anomalous or not.
Why this exists
Most enterprise anomaly problems aren't univariate (a single sensor spike). They're multivariate, "this combination of pressure, temperature, vibration, and current is unusual together even though no single value is out of spec." MSET2 (Multivariate State Estimation Technique) was developed at Oracle Labs for nuclear plant monitoring; it generalizes to manufacturing, fleet telemetry, and IT ops.
How it works
- Train on a window of historical "normal" data, sensor values, KPI series, whatever is multivariate and time-aligned.
- Score new observations, returns an anomaly score per timestamp and per signal.
- Explain: identifies which signals are contributing to the anomaly.
Where it fits
Production lines, turbines, HVAC, refrigeration. Multivariate sensor data already collected; just needs a model and a daily training refresh.
Application telemetry, error rate, latency, throughput, GC time. Catches issues that single-metric alerts miss.
Risks
- Concept drift. What was normal six months ago isn't now. Retrain on a rolling window.
- Cold start. Needs enough clean historical normal data, typically weeks to months.
- False positives. Tune detection sensitivity per use case; pair with operator runbook.
- Not a forecasting service. Use OCI Forecasting if you need next-value prediction.
OCI Forecasting GA
AutoML for univariate and multivariate time-series. Pick a target series, optional exogenous regressors, and a horizon; the service auto-selects and trains a model.
What it gives you
- Point forecasts + prediction intervals.
- Auto algorithm selection across classical (ARIMA, ETS) and ML (Prophet-style, gradient boosted) approaches.
- Holiday and seasonality handling out of the box.
- Multi-horizon forecasts.
- Explanation of which features drive a forecast.
Common use cases
| Domain | Series | Why Forecasting helps |
|---|---|---|
| Retail | Daily demand per SKU per store | Replenishment planning |
| Finance ops | AR / AP cash flow | Working capital forecasting |
| Workforce | Contact volume per skill per 30-min | Staff scheduling |
| Energy | Load per substation | Procurement and dispatch |
Risks
- Garbage in, garbage out, handle missing values and outliers upstream.
- Forecasts are only as good as the regressors you provide; promos, holidays, pricing must be fed in if they drive the series.
- Auto model selection isn't auto governance, log the chosen model + features per retrain cycle for audit.
Oracle AI Data Platform 2026
The governed data plane for AI teams. Use it to organize data products, catalogs, connections, metadata, workspaces, notebooks, and pipelines so agents and models do not each invent their own data access layer.
AI projects fail when every assistant has its own copy of data, metadata, and permissions. Oracle AI Data Platform is the control layer for turning enterprise data into governed, reusable AI assets: catalogs, data products, agent-ready connections, notebooks, Spark workflows, and repeatable data pipelines.
Architecture role
When to use it
| Situation | Why AI Data Platform helps |
|---|---|
| Multiple teams are building RAG over the same documents | Create governed, reusable vector/index assets instead of duplicate chunk/embed pipelines. |
| Agents need data from many Oracle and non-Oracle sources | Centralize connection, lineage, ownership, and refresh rules. |
| Data products already exist or are being formalized | Expose them to AI consumers with ownership and policy instead of raw tables/buckets. |
| Data engineering owns pipelines, app teams own agents | Separate concerns cleanly: Data Flow and catalog upstream, Enterprise AI Agents downstream. |
Risks
- Governance theater. Catalog entries without owners, refresh SLAs, and access rules do not help agents.
- Pipeline duplication. Decide which datasets, catalogs, and metadata flows are authoritative and version them like APIs.
- Latency. A shared data plane is not always the fastest path for hot transactional reads; keep OLTP queries close to the source DB.
- Team boundaries. Data product owners must participate in AI design, or agents will be grounded on misunderstood data.
OCI Data Science GA
JupyterLab notebooks, MLOps pipelines, model catalog, model deployment, jobs, monitoring. Built on conda environments and an Operator pattern.
The home of custom ML on OCI. Notebook sessions (JupyterLab) for exploration, Jobs for batch training, Pipelines for orchestration, Model Catalog for governance, Model Deployment for inference endpoints. If you're building a model from scratch, classical ML or fine-tuning an open-weight LLM, this is where you live. For pre-built FM deployment, use AI Quick Actions (it sits on top of Data Science).
Building blocks
| Component | What it is | When you use it |
|---|---|---|
| Notebook sessions | Managed JupyterLab on VM or BM (CPU or GPU) | Exploration, prototyping, training scripts |
| Conda environments | Curated environments incl. PyTorch, TF, RAPIDS, LangChain, Oracle SDKs | Reproducible runtime |
| Jobs | Run notebooks or scripts on demand on chosen shapes | Batch training, scheduled retraining |
| Pipelines | DAG of jobs with input/output passing | Multi-step training and eval workflows |
| Model Catalog | Versioned registry with metadata, provenance, tags | Governance, audit, hand-off to deploy |
| Model Deployment | Managed HTTP endpoint with autoscale | Hosting custom models for inference |
| Model Monitoring | Drift, performance, schema integrity over time | Production health |
| Feature Store | Centralized feature definitions, online/offline | Multi-team ML at scale |
Where it fits in the AI stack
Data Science is the "build your own" layer. If your problem is solved by Cohere or Llama as-is, you don't need Data Science, use Gen AI Service. If you need a custom model (classical ML, fine-tuned open-weight LLM, vision model, time-series), you do. Many enterprises end up using Data Science only for the 10-20% of use cases that don't fit a managed AI service, the rest go through Gen AI Service or AI Quick Actions.
Risks and ops realities
- Idle notebook spend. Notebook sessions running on a GPU cost real money even when idle. Auto-stop policies are essential.
- Environment sprawl. Teams customize conda environments; reproducibility erodes. Pin environments per project.
- Model-to-production gap. Notebook code rarely runs cleanly as a Job. Budget time for the productionization step every project.
- Compliance for training data. Training datasets often contain PII. Treat them with the same controls as the source-of-record DB.
AI Quick Actions GA Llama 4 + gpt-oss 2026
No-code foundation-model deployment and fine-tuning. Pick a model from a catalog, click Deploy, get an endpoint. Or pick a model, point at training data, click Fine-tune.
What's in the model catalog (June 2026)
- Meta Llama 4: Scout, Maverick.
- Meta Llama 3.x: including Llama 3.2 90B Vision Instruct.
- OpenAI open-weight: gpt-oss-120b, gpt-oss-20b.
- Phi, Falcon, Mistral, Granite, pre-cached, faster cold start.
- Bring-your-own from Hugging Face, direct import.
What it actually saves you
If you've ever deployed an open-weight LLM on raw GPU instances, you've burned days on Docker images, vLLM/TGI tuning, autoscaling, health checks, log shipping. AI Quick Actions does all of that with one click. You give up some flexibility (you can't pick exactly which serving runtime version, for instance) for an order of magnitude faster time-to-endpoint.
When AI Quick Actions vs Gen AI Service
| Question | Answer |
|---|---|
| You want a managed endpoint with no infra | Gen AI Service |
| You need a model not in Gen AI catalog (e.g. Mistral, Falcon, gpt-oss) | AI Quick Actions |
| You need fine-tuning with custom data | AI Quick Actions OR Gen AI dedicated cluster |
| You need on-demand pay-per-token | Gen AI Service (AQA is dedicated GPU) |
| You need to import from Hugging Face | AI Quick Actions |
| You need to keep the model entirely in your tenancy | AI Quick Actions (dedicated by definition) |
Risks
- Dedicated GPU cost, deployment runs hourly regardless of traffic. Auto-shutdown for dev/test environments.
- Model size vs shape, Llama 4 Maverick won't fit on a single GPU; AQA picks the right shape, but you should understand the floor cost.
- Fine-tuning quality depends on data quality more than algorithm choice. The clicky UI doesn't change that.
Fusion Agentic Applications GA · Mar 2026
22 pre-built AI agents embedded inside Oracle Fusion Cloud. Native to the transactional system. Governed by Fusion roles, approvals, and data. CX expansion in Apr 2026.
This is Oracle's biggest 2026 application-layer announcement. Twenty-two agentic apps that live inside Fusion ERP, HCM, SCM, and (since Apr 2026) CX. They are not chatbots, not copilots, and not add-ons. They run inside the transactional system, see the same data and approval hierarchies users see, and execute work autonomously when allowed. If you run Fusion, you already paid for this, pilot it before you build anything custom.
What "agentic" means here
Oracle's definition: outcome-driven, proactive, reasoning, and engineered for enterprise execution. Concretely, an agentic app does four things a copilot doesn't: it (1) initiates work without a user prompt, (2) plans across multiple steps and tools, (3) executes within the transactional system using existing roles, and (4) measures outcomes back. The boundary between "agent" and "automation" is fuzzy, but the integration depth is the meaningful difference.
Where the 22 agentic apps sit (representative: Oracle keeps adding)
| Pillar | Example agentic apps |
|---|---|
| Finance | Procure-to-Pay agent · Expense intake agent · Period-close anomaly investigation · Collections triage |
| HR / HCM | Workforce Operations agent (scheduling, payroll issue triage) · Recruiting assistant · Performance review summarization · Time-off conflict resolver |
| Supply chain | Demand-supply imbalance investigator · Supplier risk monitor · Logistics exception handler · Quality issue triage |
| CX Apr 2026 | Sales next-best-action · Service case summarization & resolution · Marketing campaign optimizer |
Why this changes build-vs-buy math
Before Fusion Agentic Apps, an enterprise wanting a "smart period-close" agent had to (a) buy or build an LLM platform, (b) integrate it with Fusion Financials data, (c) replicate role permissions, (d) wire it into approval workflows, (e) operate it. Now Oracle ships steps (a)-(e) as a configured agentic app under your existing Fusion subscription. The build case has to clear a much higher bar.
Risks
- Change management. Agentic apps execute work, that means human approvers see different workflows. Governance and communications matter more than the tech.
- Configuration drift. Each agentic app has settings. Track them per environment and put them under change control.
- Data quality exposed. Agents reason on Fusion data. If your master data is messy, agents are less useful. Fix data first.
- Vendor coupling. Deeper dependency on Fusion. Be intentional about which agents you adopt and which you keep optionality on.
AI Agent Studio for Fusion Expanded Mar 2026
Build, connect, and orchestrate agents that work alongside Fusion. Includes the Agentic Applications Builder, content intelligence, contextual memory, ROI measurement, and workflow tools. Included with Fusion subscriptions at no extra cost.
What you can build
- Custom agents on top of Fusion data and APIs, using Oracle, partner, or external agents as building blocks.
- Agentic applications (workflows of agents) via the no-code Agentic Applications Builder.
- External integrations: pull in Slack, Teams, Microsoft 365, ServiceNow, and similar.
- ROI dashboards: measure agent impact (time saved, cycle time, decision accuracy).
Studio vs OCI Generative AI Agents: what's the difference?
| Question | AI Agent Studio (Fusion) | OCI Generative AI Agents |
|---|---|---|
| Audience | Fusion customers and Fusion partners | Any OCI customer building agents |
| Data integration | Native to Fusion data + roles | Object Storage, OpenSearch, 23ai |
| UX | Low-code/no-code builder | API + SDK + Console |
| Cost | Free with Fusion subscription | Pay per use (model + retrieval costs) |
| Best fit | Workflows touching Fusion records | RAG over enterprise corpora outside Fusion |
They are complementary, not redundant. A bank could use OCI Generative AI Agents for an internal policy chatbot over a SharePoint-style document corpus, and AI Agent Studio for finance-close agents that operate inside Fusion ERP.
Oracle Digital Assistant GA
Oracle's conversational-assistant platform for enterprise channels. Use it when the hard problem is bot lifecycle, skills, channel delivery, and human-agent handoff rather than raw model prompting.
Oracle Digital Assistant (ODA) is still relevant in the GenAI era. It gives you channel adapters, skills, conversation flows, analytics, human handoff, and Fusion/Oracle app integration. OCI Generative AI Agents can power the intelligence behind a bot; ODA is often the front door and lifecycle layer for chat experiences.
ODA vs Enterprise AI Agents
| Question | Oracle Digital Assistant | OCI Enterprise AI Agents |
|---|---|---|
| Primary job | Conversation UX, channels, skills, routing, handoff | LLM reasoning, RAG, tools, vector stores, responses API |
| Best user surface | Web chat, mobile, messaging, service channels, app-embedded assistants | API-driven agents embedded into apps, workflows, or custom UIs |
| Human handoff | First-class pattern | Design through tools/workflows |
| Knowledge grounding | Can integrate LLM/GenAI capabilities into skills | Native knowledge bases/vector stores |
| Best fit | Customer/service bot with channels and lifecycle management | Enterprise RAG/tool agent behind one or more apps |
Reference pattern
Risks
- Wrong layer. Do not rebuild RAG plumbing in ODA skills when Enterprise AI Agents already provides it.
- Channel complexity. Each channel has identity, session, and attachment quirks; test the exact deployment channel.
- Handoff design. Human handoff must include transcript, context, model answer, and source citations, not just "transfer to agent."
APEX AI GA 24.2 RAG & AI configs
Low-code AI inside Oracle APEX. AI Assistant for developers and end users, AI-driven data modeling, dynamic-action generative text, AI Configurations + RAG Sources, vector search integration.
If you build internal apps with APEX, AI is no longer something you bolt on. AI Configurations let you define a system prompt + model + RAG sources once, then reuse across pages. Dynamic Actions "Show AI Assistant" and "Generate Text with AI" embed chat and generation in two clicks. Search Configurations wire 26ai Vector Search into your search pages without writing the SQL yourself. For Oracle-shop developers, this is the fastest path from "we have a wiki" to "we have a RAG chatbot over our wiki" in production.
What APEX 24.2 adds (relevant to AI)
| Feature | What it does |
|---|---|
| AI Configurations (Shared Component) | Bundle system prompt + welcome message + RAG sources. Reuse across pages. |
| RAG Sources | Point at 23ai Vector Search tables, REST endpoints, or APEX queries. |
| Show AI Assistant (Dynamic Action) | Chat panel using the chosen AI Configuration. |
| Generate Text with AI (Dynamic Action) | Generate content on demand from a user prompt + template. |
| AI-Driven Data Modeling | Describe a model in plain English; APEX generates tables, sample data. |
| Search Configuration with Vector Search | Add semantic search to APEX page items without hand-writing SQL. |
| APEX_AI PL/SQL package | Programmatic access from PL/SQL when the dynamic actions aren't enough. |
Provider support
APEX talks to OCI Generative AI, OpenAI, Cohere, and Azure OpenAI through provider configurations. The AI Configuration abstracts the provider, apps don't change when you swap.
Reference pattern: RAG chatbot in APEX in a day
- Create a VECTOR column on your content table in 23ai/26ai. Populate it (in-DB ONNX or Cohere via DBMS_VECTOR_CHAIN).
- Create a RAG Source pointing to that table.
- Create an AI Configuration with a system prompt + that RAG Source + your preferred provider.
- On any page, add the "Show AI Assistant" dynamic action bound to that configuration.
- Ship.
Risks
- Provider credentials live in APEX Web Credentials, protect them like any other secret.
- Embed cost is real if you store all your content in the DB and embed via OCI Gen AI. Use in-DB ONNX where possible.
- Audit trail of LLM calls isn't automatic, log via APEX_AI calls to a log table for compliance.
MySQL HeatWave GenAI GA
In-database LLMs, automated vector store, lakehouse access, and natural-language chat, all inside MySQL HeatWave. Multilingual, JavaScript-callable, VLM-enhanced PDF parsing as of MySQL 9.4.2.
For MySQL shops, HeatWave GenAI is the analog of what 26ai is for Oracle DB shops: vector search + LLM access without leaving the database. Differences: HeatWave bundles in-database LLMs (you don't have to call out), it has tight Object Storage ingestion that auto-parses PDFs/PPTs/HTML/DOC, and it integrates with HeatWave Lakehouse so you can query non-MySQL data alongside MySQL data. The lakehouse + vector store combination is the most differentiated part of the offering.
Components
| Component | What it is |
|---|---|
| In-database LLMs | Models that run inside HeatWave for generation, summarization, chat, no external API call required |
| Vector store | Inbuilt store for embeddings + similarity search |
| Automated ingestion | Parses PDF (incl. scanned), PPT, TXT, HTML, DOC from Object Storage; chunks; embeds; loads |
| VLM-based PDF parsing | Vision-Language-Model enhanced extraction for complex PDFs (tables, charts). Added MySQL 9.4.2. |
| Lakehouse Navigator | UI to browse MySQL + Object Storage data, load into vector store |
| JavaScript stored programs | Invoke GenAI from JS inside HeatWave; preprocess SQL data, call LLMs, post-process |
| Multilingual | Supports 24+ languages across the GenAI APIs |
When to choose HeatWave GenAI over OCI Gen AI + 23ai
- You already run MySQL HeatWave for analytics.
- Your source data is heterogeneous (MySQL + Object Storage + S3) and you want lakehouse ingestion.
- You want LLM inference inside the database (no egress to an external service).
- You don't need Oracle Database features (PL/SQL, VPD, RAC).
Risks
- Sizing, in-DB LLM inference is GPU-intensive on the HeatWave node. Pick shapes deliberately.
- Available model family inside HeatWave is narrower than OCI Gen AI's catalog.
- HeatWave is OCI-first; some features are not available on AWS or Azure deployments of HeatWave.
AI Infrastructure: GPU shapes & networking
What you actually rent when you need raw inference or training capacity. NVIDIA H100 / H200 / B200 / GB200 NVL72, RDMA cluster networks, dedicated regions, sovereign deployments.
Oracle's GPU story is unusually strong because of long-standing NVIDIA collaboration and aggressive supply commitments. H100/H200/B200 bare-metal shapes plus GB200 NVL72 Superclusters (announced expansion at GTC 2026) are available in commercial, gov, classified, sovereign, and dedicated regions. For most enterprise AI, you don't touch these directly, Gen AI Service, AI Quick Actions, and Fusion abstract them away. You care about GPU shapes when (a) you're fine-tuning a 70B+ model, (b) you're hosting custom inference, or (c) you're doing frontier training.
Shape family (illustrative: confirm in OCI Compute docs)
| Shape family | GPU | Per node | Typical use |
|---|---|---|---|
| BM.GPU.A100.8 | NVIDIA A100 80 GB | 8 GPUs · NVLink | Mature training and inference baseline |
| BM.GPU.H100.8 | NVIDIA H100 80 GB | 8 GPUs · NVLink | Default for fine-tuning 70B-class models |
| BM.GPU.H200.8 | NVIDIA H200 141 GB | 8 GPUs · NVLink | Long-context inference, larger models in fewer nodes |
| BM.GPU.B200 | NVIDIA B200 | Blackwell-class | New-generation inference and training |
| GB200 NVL72 Supercluster | NVIDIA GB200 NVL72 | Rack-scale | Frontier training, very large model serving |
Cluster networks & RDMA
For multi-node training, GPUs talk to each other faster than they talk to anything else. OCI Cluster Networks use RDMA over Converged Ethernet (RoCE) with very low latency and high bandwidth between bare-metal GPU nodes. If you're training a model that doesn't fit on one node, this is the lever that determines wall-clock time.
Sovereignty and region matrix
| Region type | Notable AI availability |
|---|---|
| Commercial OCI (50+ regions) | Full Gen AI catalog, GPU shapes, Data Science |
| US Gov Cloud | OCI Gen AI GA, full service set |
| US Classified Cloud | OCI Gen AI GA (May 2026), select services |
| UAE Central (Abu Dhabi) | OCI Gen AI GA (May 2026) |
| EU Sovereign | OCI Gen AI subset, full data residency |
| Dedicated Region (DRCC) | Full OCI in your data center, including AI services where licensed |
Architecture Patterns
Five reference patterns that cover most enterprise Oracle AI projects. Each names the services, the data flow, and the failure modes.
Pattern 1: Internal RAG chatbot over enterprise documents
Pattern 2: In-database RAG inside an APEX app
For Oracle-shop teams that already use APEX, this collapses the stack dramatically. The data, the embeddings, the search, the chat UI, all inside the database and APEX. Provider call out to OCI Gen AI for generation only. Fastest time-to-prod for internal tools.
Pattern 3: Document automation pipeline
Invoices, claims, contracts arrive as PDFs and need to become structured records. Doc Understanding extracts, validation rules check, exceptions route to humans, results land in the system of record. Add a Gen AI step to summarize or classify when needed.
Pattern 4: Fusion-native agentic workflow
For Fusion customers, this is now the default. Pick an agentic app, configure thresholds and approval routing, monitor outcomes. Custom logic goes into AI Agent Studio. Custom data integrations via Fusion REST APIs or OIC. Almost never needs to call OCI Gen AI directly, the agent uses the embedded LLM.
Pattern 5: Custom fine-tuned model deployment
Niche but real. You have labeled data and a use case where a 7B-13B fine-tuned open-weight model beats prompting a frontier model on cost and accuracy. Pipeline: Data Science notebook for preparation → AI Quick Actions or Gen AI dedicated cluster for fine-tuning → Model Deployment endpoint → integrate via your app. Reserve this for cases where you've already proven a managed model doesn't work.
Decision Matrix
Quick answers to the questions that come up in every architecture review.
Pricing & Cost Control
Where the money actually goes, and the levers that move it.
Cost drivers by service
| Service | Unit | What inflates the bill |
|---|---|---|
| OCI Gen AI on-demand | Per 1M characters | System prompt bloat, oversized RAG context, retries, naive token-by-token streaming logging |
| OCI Gen AI dedicated cluster | Per hour | Cluster idle outside business hours, over-provisioned units, dev/test left running |
| Imported compatible models | Dedicated serving + storage | Wrong shape, oversized context windows, duplicate model copies across compartments |
| AI Vector Search | DB CPU + memory + storage | HNSW Vector Memory Pool sizing, re-embeds, dense quantization, RAC HNSW replication |
| Enterprise AI Agents | Underlying Gen AI + retrieval + hosted tools | Long sessions retained, large KBs over-ingested, no rerank cap, tool loops |
| xAI Voice / TTS | Generated audio output | Regenerating static content, long answers, no audio cache |
| Oracle AI Data Platform | Platform resources, Spark/workflow runs, workspaces, storage | Duplicate catalogs, unnecessary reprocessing, stale pipelines rerun in full |
| OCI Vision | Per 1000 images / GPU-hour | Custom training reruns, full-fidelity images when down-sampled would work |
| OCI Document Understanding | Per transaction (first 5K free/mo) | Re-running on bad PDFs, generative-extraction mode by default |
| OCI Speech | Per audio-minute | Real-time streams left open, retries on noisy audio |
| OCI Language | Per API call | Calling Language inside per-token pipelines when batched would do |
| Data Science | Notebook session VM/GPU hours, Jobs, Model Deployment | Idle GPU notebooks, dev model deployments running 24/7 |
| AI Quick Actions | Dedicated GPU hours | POC deployments forgotten, oversized shapes |
| Private AI Services Container | Private compute + storage + ops | Under-sized hosts, unmanaged embedding-model updates, duplicated environments |
| Fusion Agentic Apps | Included with Fusion subscription | Included, no marginal cost beyond the model usage in the underlying Gen AI Service if you customize |
| AI Agent Studio | Included with Fusion subscription | Same, no marginal cost |
| HeatWave GenAI | Per HeatWave node (GPU shapes) | Wrong shape for in-DB LLM inference |
The cost-control checklist
- Token / character budgets per session. Hard cap. Alarm on breach. Kill on hard breach.
- Rerank to top-K. Reduce context size by 60-80% with no quality loss in most RAG.
- Prompt caching. Where the model supports it, cache the system prompt.
- Audio caching. For xAI Voice, store repeated greetings, disclosures, and training clips rather than regenerate.
- Idle-shutdown on every notebook session, every dev Model Deployment, every dev AQA deployment.
- Tag everything. OCI cost tracking by tag is the only way to attribute spend across teams.
- Quotas. Service limits + Resource Manager quotas prevent a runaway agent from consuming a quarter of budget overnight.
- Two-tier deployments. Cheaper model for routing/classification, expensive model only on the path that needs it.
- Egress. Keep callers inside OCI when calling Gen AI at volume. Outbound from OCI adds up.
Indicative cost shape (illustrative, not a quote)
| Workload | Dominant cost | Order of magnitude |
|---|---|---|
| Pilot RAG chatbot, 1K queries/day | Gen AI on-demand per character | Tens to low hundreds USD/month |
| Production internal chatbot, 50K queries/day | Gen AI on-demand + reranker | Low thousands USD/month |
| High-volume customer-facing assistant, 1M queries/day | Gen AI dedicated cluster | Tens of thousands USD/month per cluster |
| Fine-tuned model serving sustained traffic | Dedicated cluster + storage | Tens of thousands USD/month per cluster |
| Voice-enabled agent | Gen AI text + xAI Voice TTS + audio storage | Similar to chatbot cost plus generated-audio usage |
| Document automation, 100K pages/month | Doc Understanding transactions | Hundreds to low thousands USD/month |
| 26ai vector search on Exadata | DB CPU/memory | Often $0 incremental over existing DB licence |
Risks & Gotchas
The honest list. The stuff you want to hear before pilot, not after go-live.
Model and provider risks
| Risk | What happens | What to do |
|---|---|---|
| Model deprecation | App breaks when a model retires | Abstract model name; test against the family's next-gen early; monitor release notes. |
| Deprecated Gen AI APIs | Legacy text-generation integrations stop working after API retirement windows | Use current chat/responses SDK paths; inventory old GenerateText/SummarizeText-style calls before June 2026. |
| Catalog drift | Models available in one region but not another | Document the model-region matrix; revisit at every release; pin a fallback model. |
| Pricing changes mid-contract | Per-character rates change; budget overruns | Negotiate dedicated commitments for predictable spend; alarm on weekly rate change. |
| Vendor strategy shifts | A model is dropped from OCI catalog for partnership reasons | Treat model identity as a config; have a tested second choice. |
| "Best" model isn't on OCI | Stakeholders ask "why not Claude/GPT?" | Document the multi-criteria choice openly; price out hybrid (OCI gov regions + direct API for some workloads). |
RAG quality risks
| Risk | What happens | What to do |
|---|---|---|
| Bad chunking | Retrievals miss the answer that's actually in the corpus | Pilot chunk sizes; structural chunking over fixed-length; measure recall against a gold set. |
| No reranker | Vector top-5 contains noise, LLM hallucinates around it | Add Cohere Rerank 4 as standard. |
| Stale knowledge base | Index lags source-of-truth changes | Schedule + alarms on ingestion lag; expose "Last refreshed" to users. |
| Hallucinated citations | Answer claims a chunk supports it when it doesn't | Render source chunks alongside; post-hoc verification step for high-stakes outputs. |
| Duplicate AI data pipelines | Different teams transform or ingest the same corpus differently and get inconsistent answers | Use AI Data Platform or a shared data-product process to declare authoritative datasets, owners, RBAC, and refresh rules. |
Security risks
| Risk | What happens | What to do |
|---|---|---|
| Prompt injection from documents | Retrieved doc instructs the model to override system prompt | Defensive system prompt; mark retrieved chunks explicitly as untrusted; classify chunks for adversarial content. |
| PII leakage | Model echoes PII back to wrong user | PII scrub on input + output; per-user data isolation in retrieval; audit log of prompt + completion. |
| Cross-tenant leakage in SaaS | Cached completions surface across tenants | Per-tenant prompts; no global cache; tenant-scoped sessions. |
| Schema disclosure via Select AI | LLM sees sensitive column names | Grant minimally; avoid sensitive hints in column names; review queries before runsql. |
| Vector inversion attack | Embeddings reverse-engineered to recover text | Treat vectors as PII; protect with VPD; TDE at rest. |
| Private vector container drift | Container embedding model or index service falls behind the database/vector-search design | Patch and test Private AI Services Container on a monthly cadence; track embedding model, container version, and index parameters in audit logs. |
Operational risks
| Risk | What happens | What to do |
|---|---|---|
| Cost runaway | Recursive agent or buggy loop burns a budget overnight | Per-session budgets; max-step caps; alarms on cost-per-session anomalies; kill switch. |
| Quota throttling | 503s during peak, customer pain | Pre-warm; raise quotas before peaks; move hot workloads to dedicated. |
| Audit gaps | Cannot prove what was said in regulated context | Log prompt + completion + model + version + user; retain per policy; index for review. |
| Model output schema breaks | Tool call args mis-formatted | Strict JSON schema; validate before dispatch; fallback to clarifying turn. |
| Notebook GPU left idle | $thousands wasted per month per team | Auto-stop after N minutes; weekly idle report; tag-based chargeback. |
Strategic risks
- Vendor lock-in. Going deep on Fusion Agentic Apps deepens Fusion lock-in. Worth doing where the apps fit, but be explicit about which workloads stay portable.
- Skills. Oracle AI requires SQL/PL/SQL + cloud + AI knowledge. The unicorn engineer who has all three is rare. Plan training; pair Apps DBAs with data scientists.
- Pace of change. Oracle ships new models monthly. Architectures that assume model stability age fast. Build for swappability.
- Realistic accuracy expectations. 95% accurate is great in lab, terrible if a 5% wrong tax filing creates regulatory exposure. Match accuracy expectation to consequence.
OCI vs AWS vs Azure vs GCP: AI services
Honest four-way side-by-side as of June 24, 2026. Not Oracle marketing, not anyone's marketing. Names and prices move monthly, so verify in each console before you commit.
All four are now broad model platforms plus a managed agent runtime plus a governance layer. The real differences in mid-2026 are (1) who owns the frontier model (AWS has the Anthropic relationship and Amazon Nova; Microsoft Foundry has OpenAI and Phi; Google owns Gemini outright via DeepMind; OCI owns none and instead resells Cohere, xAI, Meta, NVIDIA and lets you import the rest), (2) data gravity (OCI wins when your system of record is Oracle Database or Fusion; Google wins when it is BigQuery), and (3) sovereignty (OCI has the widest GA Gen AI footprint across gov, classified, EU sovereign, and GCC; Google now runs Gemini fully air-gapped on-prem via Google Distributed Cloud; AWS only opened its European Sovereign Cloud in Jan 2026 with a thin model catalog). Two naming changes to know: Azure AI Foundry is now Microsoft Foundry (effective Jan 1, 2026), and Vertex AI is now the Gemini Enterprise Agent Platform (Google Cloud Next, Apr 2026).
Generative AI platforms
| Aspect | OCI Enterprise AI / Gen AI | AWS Bedrock | Microsoft Foundry (was Azure AI Foundry) | Google Cloud (Gemini Enterprise, was Vertex AI) |
|---|---|---|---|---|
| Frontier / foundation models | Cohere, Meta Llama 4, xAI Grok, NVIDIA Nemotron 3, Gemini options; imported Qwen, Gemma, OpenAI gpt-oss | Anthropic Claude (Opus 4.8/4.7, Sonnet 4.6, Haiku 4.5), Amazon Nova 2, Meta Llama, Mistral, DeepSeek, Qwen, Cohere, and OpenAI GPT-5.5/5.4 | OpenAI GPT-5 family + GPT-5.5, Anthropic Claude, Meta Llama, Mistral, xAI Grok, Microsoft Phi | Google Gemini 3 Pro / 3 Flash (in-house); 200+ in Model Garden incl. Anthropic Claude, Meta Llama, Mistral, open Gemma |
| Owns a frontier model? | No (platform/aggregator strategy) | Anthropic stake; Amazon Nova in-house | OpenAI partnership; Phi in-house SLMs | Yes - Gemini, built in-house by Google DeepMind |
| Managed agent runtime | OCI Enterprise AI Agents (GA Mar 2026): RAG, tools, vector stores, Responses-style API, governance hooks | Bedrock AgentCore (GA Oct 2025) + Managed Harness (GA Apr 2026): runtime, gateway, memory, identity, policy, code interpreter, browser, evals | Foundry Agent Service: GPT-5 family, model router, built-in browser automation and MCP tools | Gemini Enterprise Agent Platform: ADK (stable v1.0), Agent Engine runtime + Memory Bank, Agent Studio, A2A protocol v1.0 |
| Vector in source DB | Oracle AI Database 26ai native VECTOR type + HNSW/IVF; hybrid RAG exposed as MCP tool | Aurora / RDS pgvector; Aurora DSQL vectors where applicable; OpenSearch | Azure AI Search; Azure SQL / SQL Server vector; Fabric | AlloyDB AI (pgvector + ScaNN), BigQuery vector search, Spanner vector |
| Guardrails / safety | Enterprise AI Governance + native guardrails (prompt + response eval, Mar 2026) + Language PII filters | Bedrock Guardrails (mature) + Automated Reasoning checks | Azure AI Content Safety + Foundry guardrails/evaluations | Model Armor AI firewall (prompt-injection, data-leak, content filters; multi-model) |
| Shared AI data plane | Oracle AI Data Platform + OCI Resource Analytics (Jun 2026) | SageMaker Unified Studio + Glue / S3 Tables | Microsoft Fabric + OneLake + Foundry | BigQuery + Vertex + Dataplex governance |
| Model routing / cost control | Flexible model routing (work across models, not one-size-fits-all) | Per-model selection; intelligent prompt routing on Bedrock | Model router: Quality / Cost / Balanced modes, up to ~60% inference savings | Vertex model selection; per-agent pricing in the Gemini Enterprise product |
| Apps integration | Fusion Agentic Apps native (22 agents, ERP/HCM/SCM/CX) | Amazon Q Business, Amazon Connect | Microsoft 365 Copilot, Dynamics 365 | Google Workspace (Gemini in Docs/Gmail/Sheets), Workspace Studio |
| On-demand pricing unit | Per 10,000 transactions (characters) for several models; token-based for newer ones | Per 1M input/output tokens | Per 1M input/output tokens | Per 1M input/output tokens |
| Developer tooling maturity | Improving fast but still behind on breadth | Mature (Bedrock Studio, SageMaker, AgentCore) | Mature (Foundry portal, VS Code, GitHub) | Mature (Vertex / Gemini platform, Colab Enterprise, Workbench) |
gpt-oss models run on OCI and Bedrock. Anthropic Claude runs on Bedrock, Microsoft Foundry, and Google Cloud. The one model that stays first-party is Google's Gemini, which you only get on GCP (with narrow exceptions where OCI exposes it). The lock-in is no longer the model. It is the data plane, the agent runtime, and the governance model around it.
Model availability, side by side (June 2026)
| Model family | OCI | AWS Bedrock | Microsoft Foundry | Google Cloud |
|---|---|---|---|---|
| Google Gemini 3 (Pro / Flash) | ~ limited, where exposed | ✗ | ✗ | ✓ first-party flagship |
| Anthropic Claude (Opus 4.8 / Sonnet 4.6 / Haiku 4.5) | ✗ not first-party (call direct) | ✓ flagship | ✓ available | ✓ Model Garden |
| OpenAI GPT-5 / GPT-5.5 (hosted API) | ✗ | ~ GPT-5.5 / 5.4 added | ✓ primary | ✗ |
| OpenAI gpt-oss (open weights) | ✓ import + AI Quick Actions | ✓ | ✓ | ✓ self-deploy |
| Cohere Command / Embed 4 / Rerank 4 | ✓ strategic partner | ✓ | ~ partial | ~ Model Garden |
| Meta Llama 4 (Scout / Maverick) | ✓ | ✓ | ✓ | ✓ |
| xAI Grok (4.x) | ✓ | ~ select | ✓ | ✗ |
| NVIDIA Nemotron 3 (Nano Omni / Ultra) | ✓ dedicated clusters | ~ | ~ | ~ via NIM |
| Amazon Nova 2 | ✗ | ✓ in-house | ✗ | ✗ |
| Microsoft Phi | ✗ | ✗ | ✓ in-house | ✗ |
| Alibaba Qwen / Google Gemma (open) | ✓ import | ✓ | ✓ | ✓ Gemma in-house |
✓ = first-party / managed · ~ = partial, region-limited, or recently added · ✗ = not native (use direct API or a gateway). Always confirm exact model IDs and regions in the console.
Pricing, normalized (representative, June 2026)
| Item | OCI | AWS Bedrock | Microsoft Foundry | Google Cloud |
|---|---|---|---|---|
| Flagship reasoning model (in / out per 1M tok) | Grok / Cohere top tier, token-based up to ~$10.7 in (varies by model) | Claude Opus 4.8 ≈ $5 / $25 | GPT-5.5 ≈ $5 / $30 | Gemini 3 Pro ≈ $2 / $12 (≤200K ctx); $4 / $18 beyond |
| Mid-tier workhorse (in / out per 1M tok) | Cohere Command A / Llama 4 (low per-character rates) | Claude Sonnet 4.6 ≈ $3 / $15 | GPT-5 mini (lower-cost tier) | Gemini 3 Flash ≈ $0.50 / $3.00 |
| Cheapest small model | Llama 4 Scout ≈ $0.0018 / 10K transactions | Amazon Nova Micro ≈ $0.035 / 1M in | GPT-5 nano / Phi (low-latency tier) | Gemini 3 Flash-Lite tier |
| Embeddings | Cohere Embed 4 ≈ $0.001 / 10K transactions | Titan / Nova multimodal embeddings, per 1M tok | Azure OpenAI embeddings, per 1M tok | Vertex AI text embeddings, per 1M tok |
| Reranker | Cohere Rerank 4 on-demand; dedicated ≈ $10 / cluster-hour | Cohere Rerank via Bedrock | via Azure AI Search semantic ranker | Vertex AI Ranking API / grounding |
| Dedicated / provisioned | Per AI-unit-hour (e.g. large Cohere ≈ $24, large Meta ≈ $12) | Provisioned Throughput (model units / hour) | Provisioned Throughput Units (PTUs) | Provisioned Throughput (GSUs) |
| Prompt caching discount | Model-dependent | Up to ~90% on cached input | Up to ~90% on cached input | Context caching discount |
Sources: Oracle OCI Generative AI pricing page; Anthropic / AWS Bedrock pricing; Microsoft Foundry pricing; Google Vertex / Gemini pricing; third-party aggregators (June 2026). Prices change without notice.
Agents & RAG platforms, in depth
| Capability | OCI Enterprise AI Agents | AWS Bedrock AgentCore | Foundry Agent Service | Gemini Enterprise Agent Platform |
|---|---|---|---|---|
| GA status | GA Mar 2026 | AgentCore GA Oct 2025; Managed Harness GA Apr 2026 | GA; GPT-5 family rolling into the runtime | GA; rebranded from Vertex AI at Cloud Next, Apr 2026 |
| Managed RAG / knowledge stores | Built-in vector stores + 26ai + OpenSearch; Object Storage ingestion | Bedrock Knowledge Bases (managed ingestion + vector store) | Foundry vector index + Azure AI Search | Vertex AI Search + grounding; AlloyDB / BigQuery vectors |
| Tools / function calling | Tools + Responses-style API | Gateway turns APIs/Lambdas into agent tools | Built-in tools + MCP + browser automation | Function calling + tools; A2A protocol v1.0 for agent-to-agent |
| Memory / identity | Session state; governance hooks | Managed memory, identity, policy engine | Thread state; Entra ID identity | Agent Engine Sessions + Memory Bank; Google IAM |
| Built-in browser / code exec | Via tools / custom | Yes: browser tool + code interpreter built in | Yes: browser automation; code interpreter | Yes: code execution + computer-use tooling |
| Observability / evals | Governance + monitoring hooks | Built-in evaluations + observability | Foundry evaluations + tracing | Vertex evaluations + tracing |
| Standout strength | Wired into Oracle data + Fusion roles, approvals, RBAC | Most complete standalone agent infra; model-agnostic | Tight M365 / Entra / GitHub fit + model router economics | Owns Gemini end-to-end; ADK + A2A; tight Workspace + BigQuery fit |
Sovereignty & governance
| Dimension | OCI | AWS | Microsoft | |
|---|---|---|---|---|
| US government | US Gov Cloud + US Classified Cloud with GA Gen AI (since Jan 2026) | GovCloud (US) + Secret / Top Secret regions | Azure Government + classified offerings | Assured Workloads for Gov; IL5-capable regions |
| EU sovereign | EU Sovereign Cloud (GA, EU-operated) | European Sovereign Cloud GA Jan 15, 2026 (Germany; partition aws-eusc) | Microsoft Cloud for Sovereignty + EU Data Boundary | Sovereign Cloud; partner-operated (T-Systems Germany, S3NS / Thales France) |
| Gen AI in the sovereign region? | Yes broad model set GA in sovereign/regulated regions | Limited Bedrock present but only Nova Lite / Pro at ESC launch (no Claude/Llama/Mistral) | Varies by offering and region | Yes Gemini runs fully air-gapped on-prem via GDC |
| GCC / Middle East | Saudi (Jeddah, Riyadh), UAE Central (Abu Dhabi, full Enterprise AI Jun 2026), Israel | UAE, Bahrain regions (model availability varies) | UAE, Qatar regions (model availability varies) | Saudi (Dammam), Qatar (Doha), Israel regions |
| Guardrails maturity | Native platform guardrails (Mar 2026) + governance + PII filters | Bedrock Guardrails (most mature) + Automated Reasoning | Content Safety + Foundry evaluations | Model Armor AI firewall (multi-model, in-line) |
| Sovereign-region cost note | Standard regional pricing | ~15% premium in ESC; 2 AZs; no Free Tier at launch | Varies by sovereign offering | GDC air-gapped needs Google-supplied hardware |
When each wins
| Buying criterion | Winner | Why |
|---|---|---|
| You already run Oracle DB / Fusion | OCI | Data gravity + Fusion Agentic Apps + free vector search |
| You want the Gemini model specifically | GCP | Gemini is first-party to Google; only narrow exposure elsewhere |
| You're BigQuery / Workspace native | GCP | Data gravity in BigQuery; Gemini wired into Docs, Sheets, Gmail |
| You need a specific frontier model/version right now | AWS, Azure, GCP, or direct API | OCI catalog is broad in 2026, but exact model/version/region still decides. |
| You need M365 / Dynamics integration | Azure | Copilot ecosystem |
| You're AWS-native on infra | AWS | IAM / VPC / observability already there |
| Sovereign data with GA Gen AI in region | OCI | Reach into Gov, Classified, UAE, EU Sov |
| True air-gapped or on-prem Gen AI | GCP or OCI | Gemini runs air-gapped on GDC; OCI has classified / sovereign regions |
| Pure consumer SaaS at low cost | AWS, GCP, or direct API | Wider model price competition; Gemini Flash is cheap |
| Document-heavy enterprise back office | Tie (all four good) | Each has competent doc AI + RAG |
Quick alignment (informal)
| OCI | AWS | Microsoft Foundry (was Azure AI Foundry) | Google Cloud |
|---|---|---|---|
| OCI Generative AI Service | Bedrock | Microsoft Foundry / Azure OpenAI | Vertex AI / Gemini API |
| OCI Enterprise AI Agents | Bedrock Agents + KB + AgentCore | Foundry Agent Service | Gemini Enterprise Agent Platform (ADK + Agent Engine) |
| Oracle AI Data Platform | SageMaker Unified Studio / Glue / S3 Tables | Fabric / OneLake / Foundry data plane | BigQuery + Dataplex + Vertex |
| AI Vector Search (26ai) | OpenSearch · Aurora pgvector | Azure AI Search · SQL DB vector | AlloyDB AI · BigQuery vectors |
| OCI Data Science · AI Quick Actions | SageMaker · Bedrock Marketplace | Foundry · Azure ML | Vertex AI Workbench · Model Garden |
| OCI Vision | Rekognition | Azure AI Vision | Cloud Vision / Vertex Vision |
| OCI Language | Comprehend | Azure AI Language | Cloud Natural Language |
| OCI Speech + xAI Voice | Transcribe + Polly | Azure AI Speech | Speech-to-Text + Text-to-Speech (Chirp) |
| OCI Document Understanding | Textract | Azure Document Intelligence | Document AI |
| OCI Anomaly Detection | Lookout for Metrics (deprecating) | Anomaly Detector (deprecating) | Timeseries Insights / BigQuery ML |
| OCI Forecasting | SageMaker Canvas Forecast | Azure ML AutoML Forecasting | Vertex AI Forecasting / BigQuery ML |
| Fusion Agentic Apps | Q Business | M365 Copilot | Gemini for Workspace / Agentspace |
| HeatWave GenAI | (no direct equivalent) | (no direct equivalent) | BigQuery ML + Gemini |
Sources used for this June 2026 refresh
Primary Oracle docs and blogs the content was anchored to, plus the competitive sources used for the OCI vs AWS vs Azure comparison. Verify the latest before locking in commitments.
June 2026 update (new this refresh)
- What's New in Oracle AI, June 2026 edition (published June 11, 2026)
- Cohere Rerank 4 on OCI (on-demand + dedicated)
- NVIDIA Nemotron 3 Ultra imported-model deployment
- Model Import: Qwen, Gemma, gpt-oss on B200 (release notes)
- OCI Enterprise AI in UAE Central (Abu Dhabi) release notes
- OCI Resource Analytics announcement
- Hybrid RAG in Oracle 26ai as an MCP tool
OCI vs AWS vs Azure vs GCP comparison sources
- OCI Generative AI pricing
- Amazon Bedrock AgentCore GA (Oct 2025)
- Bedrock AgentCore Managed Harness GA (Apr 2026)
- Opening the AWS European Sovereign Cloud (Jan 2026)
- AWS European Sovereign Cloud launch press release
- GPT-5.5 in Microsoft Foundry
- Microsoft Foundry model router (Quality/Cost/Balanced)
- GPT-5 family powers Foundry Agent Service
- Azure AI Foundry renamed Microsoft Foundry (Ignite 2025)
- Anthropic Claude API pricing (Opus 4.8 / Sonnet 4.6 / Haiku 4.5)
- Gemini Enterprise Agent Platform (formerly Vertex AI)
- Introducing Gemini 3 Flash (Google blog)
- Gemini / Vertex AI generative AI pricing
- Run Gemini air-gapped on-prem with Google Distributed Cloud
- Google Model Armor (AI firewall)
OCI Generative AI
- Generative AI documentation
- OCI Generative AI overview
- Generative AI release notes
- OCI Enterprise AI overview
- OCI Enterprise AI GA announcement
- What's New in AI, May 2026 (Oracle blog)
- OCI Gen AI in US Classified Cloud
- Enterprise AI, comparing models on OCI
OCI Enterprise AI Agents and Governance
- Gen AI Agents service overview
- Enterprise AI platform: models, agents, governance
- Announcing the OCI Gen AI Agents RAG service
- AI-powered enterprise search with Gen AI Agents + 23ai + OpenSearch
Oracle AI Database 26ai / 23ai Vector Search
- Introducing Oracle AI Database 26ai
- 26ai release updates (AI Vector Search)
- Oracle AI Vector Search product page
- AI Vector Search key features
- 26ai on-prem Linux x86-64 availability
- Oracle AI Database 26ai AI, ML, and Analytics docs bookshelf
- Oracle AI Database Private Agent Factory User's Guide
- Oracle Private AI Services Container User's Guide
- HNSW and IVF in 26ai (community deep dive)
Oracle AI Data Platform
- Introduction to Oracle AI Data Platform
- Oracle AI Data Platform documentation
- IAM policies for Oracle AI Data Platform Workbench
Select AI
- Autonomous AI Database Select AI overview
- Select AI User's Guide
- Use Select AI for Natural Language Interaction
Fusion Agentic Apps & AI Agent Studio
- Oracle introduces Fusion Agentic Applications (Mar 2026)
- AI Agent Studio expansion (Mar 2026)
- Fusion Agentic Apps for CX (Apr 2026)
- AI Agents for Fusion Applications
APEX AI
- APEX 24.2 GA announcement
- AI Configurations and RAG Sources in APEX
- AI-Driven Data Modeling in APEX 24.2
- Including Generative AI in APEX apps (docs)
Oracle Digital Assistant
- Oracle Digital Assistant documentation
- Oracle Digital Assistant features
- Deploy an Oracle Digital Assistant chatbot powered by OCI Generative AI Agents
MySQL HeatWave GenAI
OCI Data Science & AI Quick Actions
- AI Quick Actions overview
- Introducing AI Quick Actions
- Deploy Llama 4 with AI Quick Actions
- OpenAI gpt-oss models in AI Quick Actions
OCI AI Services
- OCI AI Services hub
- OCI Vision
- OCI Language pricing
- OCI Speech pricing
- OCI Document Understanding pricing