As on 26 June 2026
← expertoracle.com

AWS AI, the practical way

An architecture-first reference for the Amazon AI stack as of June 2026. From Amazon Bedrock and the Nova model family, to Bedrock AgentCore for production agents, to SageMaker for custom models. Trade-offs, pricing shape, and risks. No marketing.

Refreshed June 2026Architecture-firstEnterprise focusVendor-neutral
Naming, 2026
No big rebrand on AWS this year - the shift is in the agent story: Bedrock Agents (2025) became Bedrock AgentCore, a managed runtime with Memory, Gateway, Identity, Observability, Web Search, and (preview) Payments. SageMaker was repositioned as the unified center for data + analytics + AI, with SageMaker Unified Studio now GA.
TL;DR

AWS's AI story in 2026 has three layers. Amazon Bedrock is the managed gateway to dozens of foundation models (Anthropic Claude, Meta Llama, Mistral, DeepSeek, NVIDIA Nemotron, and Amazon's own Nova) behind one API, with Knowledge Bases, Guardrails, customization, and evaluations. Bedrock AgentCore turned the 2025 agent preview into a managed runtime for production agents. SageMaker - now the unified center for data + analytics + AI - is where you train, fine-tune, and host custom models. If you already run on AWS, the data-gravity, IAM integration, and custom-silicon economics make this stack the path of least resistance.

The AWS AI mental model

LAYER 3 - ASSISTANTS & AGENTS (consume) Amazon Q Developer & Q Business - Bedrock Agents / AgentCore - Amazon Quick - Q in QuickSight / Connect Pre-built or low-code. Governed by IAM. You configure tools and knowledge, not model weights. LAYER 2 - PLATFORM AI (build) Amazon Bedrock: model catalog - Knowledge Bases - Guardrails - Flows - Evaluations - AgentCore SageMaker AI - Unified Studio - HyperPod - JumpStart - applied AI (Rekognition, Textract, Transcribe…) Consume hosted models, import/customize, build agents, govern responses, fine-tune, deploy, monitor. LAYER 1 - DATA & INFRASTRUCTURE (ground) S3 (incl. S3 Vectors) - OpenSearch / Aurora pgvector / Kendra - Bedrock Data Automation - Lake Formation Trainium2/3 - Inferentia2 - EC2 P5/P6 (Blackwell) / G7 - UltraClusters - SageMaker HyperPod Your data and vectors live here, next to the rest of your AWS estate and IAM.
Figure 1 - The AWS AI stack is layered. Start at Layer 3/2 (Bedrock); drop to Layer 1 only when you need custom training or chips.

What sets AWS apart in 2026

DifferentiatorWhat it means in practice
Widest managed model catalogBedrock fronts Anthropic Claude, Meta Llama, Mistral, Cohere, AI21, DeepSeek, NVIDIA Nemotron, Stability, and Amazon Nova behind one API and one bill. Switching models is a parameter change, not a re-architecture.
Anthropic relationship + TrainiumDeep Anthropic partnership (Project Rainier Trainium clusters) means frontier Claude models are first-class on Bedrock, often with strong price/perf on AWS silicon.
AgentCore as managed runtimeMemory, Gateway (tools/MCP), Identity, Observability, Browser, Code Interpreter, Web Search, and Payments (preview) - framework-agnostic (Strands, LangChain, OpenAI Agents SDK, Claude Agent SDK).
Data gravity + IAMIf your data is already in S3/Redshift/Aurora, RAG ground truth and access control are native. No new identity plane.
Custom silicon economicsTrainium/Inferentia give a cost lever for training and high-volume inference that pure-GPU clouds cannot match on price.

Where AWS is weaker (be honest)

Own frontier model
Amazon Nova is competitive on price/latency and improving fast, but it is not the model you reach for when you need the absolute top of the reasoning leaderboard - that is usually Claude (also on Bedrock) or a competitor's flagship. Amazon's bet is breadth, integration, and silicon economics, not owning the #1 model.
Surface area & sprawl
The catalog of overlapping services (Bedrock vs SageMaker vs Q vs applied-AI, three vector stores, two studios) is large. Picking the right primitive is itself an architecture decision - see the Decision Matrix.

How to read this portal

Each flagship service tab has sub-tabs (Overview / Architecture / Capabilities / Pricing / Risks / When to use) with a reference-architecture diagram. If you only read one sub-tab, read Risks. The others tell you what something does; Risks tells you what bites you in production.

What's New - late 2025 through June 2026

Material changes that affect architecture, cost, or risk. Curated, not a press-release dump.

TL;DR

The dominant 2026 theme is agents going to production: Bedrock AgentCore added managed Knowledge Bases, a managed agent harness, native Web Search, and (preview) autonomous Payments. Model breadth widened (NVIDIA Nemotron 3 Super on Bedrock, Nova Forge for Nova customization, Reinforcement Fine-Tuning). And SageMaker was repositioned as the unified data+AI center, with SageMaker Unified Studio now GA and Amazon Q Developer embedded throughout.

DateReleaseWhy it matters
Dec 2025Next-gen SageMaker + Unified Studio (re:Invent)SageMaker repositioned as the single center for data, analytics, and AI - Glue, EMR, Athena, Redshift, Bedrock, and SageMaker AI in one workspace with a lakehouse.
Dec 2025Trainium3 announcedNext-gen training/inference silicon; continues AWS's price/perf lever vs pure-GPU stacks. Confirm region/instance availability before designing around it.
Jan 2026SageMaker Unified Studio GA + Amazon Q Developer GA in StudioData professionals get GenAI assistance across the lifecycle; Bedrock and SageMaker AI usable from one IDE.
Feb 2026Reinforcement Fine-Tuning in BedrockTailor models to narrow tasks with reward signals - higher accuracy on domain workflows without full training.
Mar 2026NVIDIA Nemotron 3 Super on Bedrock; Nova Forge SDKOpen-weight frontier reasoning model managed; Nova Forge lets enterprises customize Nova on their data and deploy inside Bedrock.
Apr 2026AgentCore Payments (preview)Agents can autonomously pay for APIs, MCP servers, web content, and other agents - built with Coinbase and Stripe. New control-plane and audit considerations.
May 2026Agent Toolkit for AWS; AgentCore managed harnessDeclare and run an agent in ~3 API calls, no orchestration code. Lowers time-to-first-agent dramatically.
Jun 2026AWS Summit NY: Managed Knowledge Bases (Smart Parsing, Agentic Retriever), Web Search on AgentCore (GA), Amazon Quick, S3 Annotations, EC2 G7 (RTX PRO Blackwell)Fully-managed RAG with multi-format parsing; grounded answers with zero data egress; mutable per-object context in S3; new inference GPU tier.
Practical read
If you piloted Bedrock Agents in 2025, plan a migration review to AgentCore: the managed Memory, Gateway, Identity, and Observability replace a lot of custom glue. If you run SageMaker Studio (classic), plan the move to Unified Studio.

Service Map

The AWS AI services worth knowing, grouped by what you do with them.

COREAmazon Bedrock

Managed multi-model API: catalog, Knowledge Bases, Guardrails, Flows, Evaluations, customization, AgentCore.

MODELSAmazon Nova

Amazon's own FM family: Micro, Lite, Pro, Premier, plus Canvas (image), Reel (video), Sonic (speech). Forge to customize.

AGENTSBedrock AgentCore

Runtime, Memory, Gateway, Identity, Observability, Browser, Code Interpreter, Web Search, Payments (preview).

BUILDSageMaker AI + Unified Studio

Train, fine-tune, host custom models; HyperPod for FM training; one studio over data+analytics+AI.

ASSISTAmazon Q

Q Developer (coding/ops agent), Q Business (enterprise RAG assistant), Q in QuickSight/Connect, Amazon Quick.

APPLIEDApplied AI

Rekognition, Textract, Comprehend, Transcribe, Polly, Translate, Lex, Kendra, Personalize.

DATAVectors & Data

S3 Vectors, OpenSearch vector, Aurora/RDS pgvector, MemoryDB, Kendra GenAI Index, Bedrock Data Automation.

SILICONChips & GPUs

Trainium2/3, Inferentia2, EC2 P5/P6 (Blackwell), G7, UltraClusters, Capacity Blocks.

GOVERNGuardrails

Content filters, denied topics, PII redaction, contextual grounding, Automated Reasoning checks.

How to read this
The flagship services (Amazon Bedrock, Bedrock AgentCore, SageMaker AI) carry full sub-tabs - Overview / Architecture / Capabilities / Pricing / Risks / When-to-use - with reference-architecture diagrams. Secondary services use a single rich page with the same architecture-first, risk-honest treatment. If you're scoping production, read a service's Risks before its Overview.

Amazon Bedrock

The managed, serverless gateway to foundation models. One API, one IAM model, one bill, many vendors.

Official documentation ↗

Overview
Architecture
Capabilities
Pricing model
Risks & gotchas
When to use
TL;DR

Bedrock exposes many foundation models through a unified API (Converse / InvokeModel). You never manage servers; you pay per token on-demand, reserve capacity with Provisioned Throughput, or run Batch. Around the models sit Knowledge Bases (managed RAG), Guardrails (independent safety), Flows (orchestration), Evaluations, and customization. It is the default starting point for almost any GenAI workload on AWS.

What problem this solves

Most teams don't want to run GPU fleets, manage model weights, stand up a guardrail service, wire a vector store, and negotiate vendor contracts separately. Bedrock's offer is one IAM-governed, serverless surface where you swap Claude / Llama / Nova with a parameter, apply the same Guardrail policy across models, and keep data inside your AWS account. The trade-off is that exact model and feature availability varies by region - confirm before you design.

Two consumption modes, plus batch

ModeHow you payBest for
On-demandPer input/output token, no commitment.Prototyping, variable/low volume, model comparison.
Provisioned ThroughputReserved model units (hourly + term commitments).Steady high volume needing predictable latency and cost.
BatchDiscounted vs on-demand, asynchronous.Large non-interactive jobs: enrichment, classification, embedding generation.
Rule of thumb
Start on-demand. Move to Provisioned Throughput when sustained traffic makes reserved units cheaper than pay-go and you need latency guarantees. Push bulk, non-interactive work to Batch for the discount.

Reference architecture

Your AWS account / VPC - IAM-governed, PrivateLink to Bedrock Application Lambda / ECS / EKS / EC2 AWS SDK - Converse API Bedrock runtime endpoint Regional, PrivateLink-capable Guardrails in-line Model dispatch (catalog) ▸ Anthropic Claude (Opus/Sonnet/Haiku) ▸ Amazon Nova (Micro/Lite/Pro/Premier) ▸ Llama / Mistral / DeepSeek / Nemotron ▸ Cohere Embed / Rerank, Titan, Stability ▸ Marketplace + imported custom models On-demand / Provisioned / Batch per-token, reserved units, or async batch cross-region inference, prompt caching AgentCore runtime (optional) Memory, Gateway/MCP, Identity, Observability framework-agnostic agents Knowledge Bases (RAG) S3 + connectors, Smart Parsing OpenSearch / S3 Vectors / Aurora Agentic Retriever Govern & observe Guardrails, Evaluations CloudWatch, CloudTrail model-invocation logging Identity & security IAM policies, resource controls KMS keys, PrivateLink, no public egress data not used to train base models
Figure - Amazon Bedrock reference shape. PrivateLink + IAM keep traffic off the public internet; Guardrails sit in-line; Knowledge Bases and AgentCore are opt-in around the model call.

Network and identity

Bedrock is reachable over PrivateLink (VPC endpoints) so model traffic never traverses the public internet. Authorization is IAM: scope policies to specific models, Knowledge Bases, Guardrails, and agents; callers use roles (instance/task/Lambda execution roles). Encrypt with KMS and prefer customer-managed keys for regulated data.

Where the data goes

AWS's stated position is that your prompts and completions are not used to train the base foundation models and stay within your account and region. You control model-invocation logging to CloudWatch/S3. For data residency, pin the region and confirm the model is available there; cross-region inference can move data across regions, so weigh it against compliance needs.

Capability matrix (June 2026)

CapabilityStatusNotes
Model catalog + MarketplaceFirst-party + partner FMs; 100+ via Marketplace; import custom weights.
Knowledge Bases (RAG)Managed RAG; 2026 adds Smart Parsing + Agentic Retriever.
GuardrailsContent filters, denied topics, PII, contextual grounding, Automated Reasoning checks.
FlowsVisual orchestration of prompts, models, KBs, Lambda.
EvaluationsAutomatic + LLM-as-judge model and RAG evaluation.
CustomizationFine-tuning, continued pre-training, distillation, Reinforcement Fine-Tuning.
Prompt caching / cross-regionCut cost/latency on repeated context; auto-route to capacity in other regions.
Batch inferenceDiscounted asynchronous processing at scale.
AgentCoreManaged agent runtime - its own tab.

How Bedrock bills

LeverHow it billsControl
On-demandPer input/output token, per model.Right-size model per task; cache prompts; cap output tokens.
Provisioned ThroughputReserved model units (hourly + term).Commit only after you know the steady load.
BatchDiscounted vs on-demand.Use for non-interactive enrichment.
Knowledge BasesStorage + query + embedding tokens (+ vector store).Tune chunk size; prune stale docs; pick S3 Vectors for cost.
Cost surprise
On-demand token pricing varies 10-30x between a flagship and a small model. A chatty agent loop on a flagship is the classic surprise bill. Set budgets, cache, and route small models for routine steps.
Model/region availability
Not every model is in every region. Confirm the exact model+region before you design around it; cross-region inference helps but has data-residency implications.
Cost blowouts
Flagship models in agent loops dominate bills. Budget per conversation, right-size the model per task, and cache repeated context.
Guardrails are opt-in
Guardrails are not applied unless you attach them. Make them part of the deployment, not an afterthought, and validate prompts and responses at the platform layer.
Quotas
Default token/request quotas throttle real workloads. Request increases early; design for throttling and backoff.
  • Use Bedrock for almost any GenAI workload on AWS - model optionality, managed RAG/guardrails/evals, no infrastructure.
  • Drop to SageMaker only when you need custom training, exotic hosting, or a model not in the catalog.
  • Add AgentCore when the workload is an agent heading to production.
  • Lead with a small model + prompt caching for cost; reserve Provisioned Throughput once volume is steady.

Foundation Model Catalog

Indicative view of model families on Bedrock in 2026. Exact versions and regions change frequently - confirm in the console.

Official documentation ↗

ProviderFamiliesTypical use
AnthropicClaude (Opus / Sonnet / Haiku tiers)Top-tier reasoning, agents, coding, long context. The frontier default on Bedrock.
AmazonNova Micro / Lite / Pro / Premier; Canvas, Reel, SonicCost/latency-optimized text and multimodal; image, video, speech generation.
MetaLlama (open weights)Open-weight workloads, customization, on-prem parity.
MistralMistral / MixtralEfficient European open-weight options.
DeepSeekDeepSeek-R1 and successorsStrong open reasoning at low cost.
NVIDIANemotron 3 (Super)Open-weight frontier reasoning/agentic, hosted managed.
Cohere / AI21 / StabilityCommand / Embed / Rerank, Jamba, Stable Diffusion / ImageEmbeddings, reranking, long-context, image generation.
Embeddings + rerank
For RAG, pair an embedding model (Amazon Titan Text Embeddings, Cohere Embed) with a reranker (Cohere Rerank) for a quality lift at low engineering cost.
Pin model IDs
Models rev and deprecate. Pin a specific model ID in production, watch deprecation notices, and evaluate before auto-upgrading.

Amazon Nova

Amazon's own foundation-model family - optimized for price, latency, and AWS integration.

Official documentation ↗

ModelModalityBest for
Nova MicroTextCheapest, fastest text - classification, routing, simple extraction at scale.
Nova LiteMultimodal (text+image/video in)Low-cost multimodal understanding, high-volume workloads.
Nova ProMultimodalBalanced capability/cost for most enterprise tasks and agents.
Nova PremierMultimodal, most capableComplex reasoning; also the teacher model for distillation.
Nova CanvasImage generationStudio-quality images with content credentials/watermarking.
Nova ReelVideo generationShort-form video from text/image prompts.
Nova SonicSpeech-to-speechReal-time voice interactions with low latency.
Nova Forge (2026)
Forge SDK lets you customize Nova on domain data (fine-tune/distill) and deploy directly within Bedrock - useful when you want Nova's economics with your own task accuracy.
Positioning
Use Nova where cost and latency dominate and the task is well-scoped. For the hardest reasoning, A/B it against Claude on the same Bedrock API before committing.

Amazon Bedrock AgentCore GA

The managed runtime for production agents. Framework-agnostic - bring Strands, LangChain, OpenAI Agents SDK, or the Claude Agent SDK.

Official documentation ↗

Overview
Architecture
Modules
Risks & gotchas
When to use
TL;DR

AgentCore makes the hard parts of running agents - memory, tool auth, networking, identity, tracing, and safety - managed concerns, and standardizes the integration surface (MCP, OpenAPI). The 2026 managed harness lets you declare and run an agent in ~3 API calls with no orchestration code. It is framework-agnostic, so you keep your agent logic and let AWS run the plumbing.

What problem this solves

Hand-built agent loops prototype fast and operate badly: state, retries, tool credentials, private networking, observability, and guardrails all become your code. AgentCore turns those into managed modules you opt into, and adds capabilities most teams can't easily build - a managed Browser, sandboxed Code Interpreter, zero-egress Web Search, and (preview) autonomous Payments.

Migrate 2025 pilots
If you built on Bedrock Agents + custom glue in 2025, AgentCore's managed Memory, Gateway, Identity, and Observability replace most of that scaffolding. Move for the operational maturity alone.

Module map

Your agent (any framework) - Runtime / managed harness Memoryshort + long term Gatewaytools / MCP / APIs Identityscoped access Observabilitytraces / eval Browserheadless web Code Interpretersandboxed exec Web Searchgrounded, zero-egress Paymentspreview
Figure - AgentCore modules. Mix and match; you don't have to adopt all of them.
ModuleWhat it gives youStatus
Runtime / HarnessManaged serverless execution; declare and run an agent in ~3 API calls, no orchestration code.GA
MemoryShort-term and long-term memory so agents retain context across turns and sessions.GA
GatewayTurn APIs, Lambda, and MCP servers into governed agent tools with auth and access control.GA
IdentityScoped, least-privilege access for agents; policies verified by Automated Reasoning (same tech as IAM/S3).GA
ObservabilityTraces of every step and tool call, and where the agent went off track; evaluation vs real traffic.GA
Browser & Code InterpreterHeadless browsing and sandboxed code execution as managed tools.GA
Web SearchGrounded, cited answers from the live web with zero data egress from your AWS environment.GA
PaymentsAgents autonomously pay for APIs, MCP servers, content, and other agents (Coinbase/Stripe).Preview
Agentic payments = new risk class
An agent that can spend money needs hard budget caps, human-in-the-loop thresholds, and immutable audit. Treat AgentCore Payments as a controlled pilot, not a default.
Runaway loops & actions
Unbounded loops and broad tool access cause cost blowouts and unintended actions. Enforce step caps, budgets, least-privilege Identity, and human approval on high-impact tools.
Tool/data egress
Gateway tools and Web Search can move data. Vet tools, prefer scoped auth, and keep agents on private networking with egress controls. Web Search is zero-egress by design - confirm other tools are too.
  • Use AgentCore for any agent heading to production - managed memory, tools, identity, and observability beat hand-rolled glue.
  • Adopt modules incrementally - you don't need all of them; start with Runtime + Memory + Gateway + Observability.
  • Gate Payments behind budgets and human approval; pilot before trusting autonomous spend.
  • Pair with Bedrock Guardrails so safety holds regardless of the model the agent uses.

AWS vs OCI vs Azure vs GCP

A practitioner's quick read. Every cloud does the basics; the differences are in defaults, data gravity, and silicon.

DimensionAWSOCIAzureGCP
Model breadth (managed)Widest (Bedrock)Broad (OCI Gen AI)Foundry Models (1000+)Model Garden (200+)
Frontier own modelNova (mid); Claude hostedNone (partners)OpenAI GPT-5.xGemini 3.x
AgentsAgentCoreEnterprise AI AgentsFoundry Agent ServiceAgent Platform + A2A
Custom siliconTrainium/InferentiaGPU (NVIDIA)Maia (emerging)TPU (Ironwood/8th)
Data gravityS3 / RedshiftOracle DB 26ai (in-DB vectors)Fabric / OneLakeBigQuery
Best whenAlready on AWS; want model choice + silicon economicsRun Oracle DB/EBS; want in-DB vectors + sovereigntyMicrosoft shop; want OpenAI + M365BigQuery/Workspace central; want Gemini + TPU
Honest take
The cloud you already run is usually the right one for GenAI - data gravity and IAM beat a marginally better model. AWS's edge is the widest model catalog plus custom-silicon economics; its tax is service sprawl.

Sources

Primary AWS material used for this portal (June 2026). Verify specifics against current docs before committing - this space moves weekly.

Accuracy note
Compiled by Brijesh Gogia for expertoracle.com. Independent and not affiliated with Amazon/AWS. Model names, availability, and pricing change frequently - treat this as orientation, confirm in the AWS console/docs before designing.

Knowledge Bases & RAG

Managed retrieval-augmented generation - the most common enterprise GenAI pattern.

Official documentation ↗

Ingestion (offline) & Query (online) S3 + connectorsdocs, sites, DBs Smart Parsing + embedmulti-format prep Vector storeOpenSearch / S3 VectorsAurora pgvector Query (agent/app)via Bedrock Agentic Retrievermulti-step + rerank Grounded answermodel + citations Guardrailsgrounding check
Figure - Bedrock Knowledge Bases. 2026 adds Smart Parsing (multi-format prep) and an Agentic Retriever for multi-step queries; Guardrails check grounding on the way out.
Use Knowledge Bases whenBuild your own when
You want managed RAG with minimal code, the corpus is mostly documents, and Smart Parsing / built-in retrieval quality matter.You need fine control over chunking, hybrid search, custom rerankers, or a vector store you already operate.
Bedrock Data Automation
For multimodal corpora (documents, images, audio, video), BDA extracts structured output to feed a Knowledge Base - cleaner than rolling your own parsers.
Retrieval is the failure point
Most "the model hallucinated" incidents are retrieval misses. Tune chunking, add a reranker, enable contextual-grounding Guardrails, and evaluate retrieval before blaming the LLM.

Guardrails

An independent safety layer you apply to any model - first-party or imported.

Official documentation ↗

ControlWhat it catches
Content filtersHate, insults, sexual, violence, misconduct, prompt attacks - tunable thresholds.
Denied topicsBlock subjects out of scope for your application.
Sensitive info / PIIDetect and redact or block PII and custom regex patterns.
Contextual groundingScore answers for grounding against source and relevance to the query - reduce hallucination.
Automated Reasoning checksMathematically verify outputs against encoded policies/rules - high-assurance domains.
Apply at the platform layer
Guardrails sit between the app and the model, so the same policy applies regardless of which model the agent picks. Validate prompts and responses here, not only in app code.
Opt-in
Guardrails do nothing until attached to an invocation or agent. Make attaching them part of the deployment template.

SageMaker AI

Where you train, fine-tune, and host models when Bedrock's managed path isn't enough.

Official documentation ↗

Overview
Architecture
Components
When to use
TL;DR

SageMaker AI is the full-control path: managed training jobs, real-time / serverless / async / batch endpoints, JumpStart for one-click model deploy/fine-tune, HyperPod for large-scale FM training, and MLOps (Pipelines, Model Registry, Clarify, Model Monitor). Reach for it when Bedrock's managed surface can't express what you need - custom training, exotic hosting, or a model outside the catalog.

Bedrock vs SageMaker

Bedrock = consume and customize managed models, fast, serverless. SageMaker = own the training, hosting, and MLOps. Many teams use both: Bedrock for the app, SageMaker for the custom model behind it. Start in Bedrock; drop here only when you must.

Reference architecture

SageMaker AI - IAM, VPC, S3 data lake Data (S3 / lakehouse)Glue / Feature Store Train / Fine-tunejobs, JumpStart, HyperPod Model Registryversions, approval Endpointsreal-time / serverless / async / batch Pipelines (MLOps)reproducible, lineage, approval gates Clarify + Model Monitorbias / explainability / drift
Figure - SageMaker AI. Data to training to registry to endpoints, wrapped in MLOps pipelines with bias/drift monitoring.
ComponentUse
JumpStartOne-click deploy/fine-tune of open and partner foundation models.
Training & InferenceManaged training jobs; real-time / serverless / async / batch endpoints with autoscaling.
HyperPodResilient, self-healing clusters for FM pre-training and heavy fine-tuning across thousands of accelerators.
Pipelines / Model RegistryReproducible MLOps pipelines, lineage, approval gates, deployment.
Clarify / Model MonitorBias/explainability and production drift detection.
  • Use SageMaker for custom training, specialized hosting, large-scale FM training (HyperPod), or models outside the Bedrock catalog.
  • Stay in Bedrock for consume/customize-managed; come here only for full control.
  • Use Unified Studio as the front door tying data, analytics, and these AI tools together.

SageMaker Unified Studio GA

The single workspace over data, analytics, and AI - the front door to the next-gen SageMaker.

Official documentation ↗

Unified Studio brings EMR, Glue, Athena, Redshift, Bedrock, and SageMaker AI into one IDE on a lakehouse foundation, with Amazon Q Developer embedded for code, troubleshooting, and ETL. It stitches the data and AI lifecycles together so the same governed data powers analytics and model building, and it replaces the older SageMaker Studio Classic experience.

LakehouseGlue / EMR / AthenaRedshiftBedrockQ DeveloperGovernance / catalog
Migration
If you run Studio Classic, plan the move to Unified Studio - newer Bedrock and governance features land here first.

Model Customization

Four ways to make a model better at your task, from cheapest to most involved.

Official documentation ↗

TechniqueWhenCost/effort
Prompt + RAGMost tasks - ground the model in your data without changing weights.Low
Fine-tuningConsistent style/format or narrow task accuracy from labeled examples.Medium
Reinforcement Fine-TuningOptimize toward a reward signal where correctness is checkable (2026).Medium-High
DistillationTeach a small, cheap model from a large one - keep quality, cut cost/latency.Medium
Continued pre-trainingInject large domain corpora; rarely needed for most enterprises.High
Order of operations
Exhaust prompt engineering and RAG first. Fine-tune only with evidence the base model can't hit your accuracy/format bar. Distill once a fine-tuned large model proves out, to cut run-cost.

Amazon Q

AWS's family of GenAI assistants for developers, businesses, and operations.

Official documentation ↗

ProductWhat it does
Q DeveloperAgentic coding and ops assistant - code generation, transformation/modernization, troubleshooting, AWS console help. Embedded in IDEs and SageMaker Unified Studio.
Q BusinessEnterprise RAG assistant over your apps and documents (40+ connectors), with access controls inherited from the source systems.
Amazon Quick2026 evolution toward autonomous background agents with specialized expertise; an activity feed across email, messaging, calendar, and tasks.
Q in QuickSight / ConnectNatural-language BI and contact-center assistance embedded in those services.
Build vs buy
For internal knowledge assistants, pilot Q Business before building custom RAG - the connectors and permission inheritance save real engineering. Build on Bedrock when you need bespoke UX or logic Q can't express.

Applied AI Services

Task-specific managed APIs - no model selection, just call them.

Official documentation ↗

ServiceTask
RekognitionImage/video analysis: labels, faces, moderation, text-in-image.
TextractDocument extraction: text, forms, tables, queries from PDFs/images.
ComprehendNLP: entities, sentiment, key phrases, PII, custom classification.
TranscribeSpeech-to-text with diarization, custom vocabulary, call analytics.
PollyText-to-speech with neural and generative voices.
TranslateNeural machine translation across many languages.
LexConversational bots (the engine behind many IVR/chat flows).
KendraEnterprise search; the GenAI Index feeds RAG with permission-aware retrieval.
PersonalizeReal-time recommendations from your interaction data.
Trend
Several classic tasks (doc extraction, classification, summarization) are increasingly done with Bedrock + a multimodal model or Bedrock Data Automation. Use the applied service when it is cheaper, lower-latency, or compliance-certified for that exact task; reach for Bedrock when you need flexibility.

Vectors & Data

Where your embeddings and ground-truth live. Pick by scale, latency, and what you already run.

Official documentation ↗

StoreBest for
S3 VectorsCost-optimized vector storage/query at massive scale directly in S3 (2026) - cheapest for large, less latency-sensitive corpora.
OpenSearch Serverless (vector)Low-latency hybrid (keyword + vector) search; the common Knowledge Base default.
Aurora / RDS PostgreSQL (pgvector)Vectors next to relational data with transactional consistency.
MemoryDB / DocumentDB / Neptune AnalyticsIn-memory vectors, document-store vectors, and graph+vector analytics respectively.
Kendra GenAI IndexManaged, permission-aware retrieval index purpose-built for RAG.
Default
Most teams start with a Bedrock Knowledge Base on OpenSearch Serverless. Move to S3 Vectors for cost at scale, or pgvector when vectors must sit beside operational rows.

Chips & GPUs

The silicon under the stack. AWS's custom chips are the cost lever; NVIDIA GPUs are the compatibility lever.

Official documentation ↗

SiliconRole
Trainium2 / Trainium3AWS training (and increasingly inference) accelerators; Trn2 UltraServers and Project Rainier power large Anthropic/enterprise training at strong price/perf.
Inferentia2Cost-efficient high-volume inference.
EC2 P5 / P6 (NVIDIA Blackwell)Top-end GPU training/inference; maximum framework compatibility.
EC2 G7 (RTX PRO Blackwell)2026 graphics/inference tier for cost-effective serving and visual workloads.
UltraClusters / Capacity Blocks / HyperPodNetwork-dense GPU/Trainium fabrics; reserve capacity windows; resilient FM-training clusters.
Architect's lever
For high-volume inference, benchmark Inferentia2/Trainium against GPU instances - the price difference can dominate TCO. Keep GPUs where you need a specific CUDA/framework path.

Architecture Patterns

The handful of shapes most AWS GenAI workloads fall into.

1. Managed RAG assistant

Bedrock + Knowledge Base (OpenSearch/S3 Vectors) + Guardrails, fronted by API Gateway/Lambda or Q Business. The default enterprise knowledge assistant.

2. Production agent

AgentCore Runtime + Memory + Gateway (tools/MCP) + Identity + Observability. Add Web Search for grounding. Human-in-the-loop on high-impact actions.

3. Custom model service

SageMaker fine-tune/host (or import to Bedrock) behind a private endpoint; distill to cut cost once quality is proven.

4. Multimodal pipeline

Bedrock Data Automation extracts from docs/images/audio/video into structured output feeding a Knowledge Base or warehouse.

5. Batch enrichment

Bedrock batch inference over large datasets in S3 for classification, summarization, or embedding generation at lowest cost.

6. Embedded BI/ops assistant

Q in QuickSight/Connect, or Q Developer in the SDLC - buy the assistant rather than build it.

Decision Matrix

Fast answers to the questions that come up in every design review.

QuestionDefault answer
Consume a model or train one?Consume via Bedrock. Train/fine-tune in SageMaker only with evidence the base model can't meet the bar.
Which model?Claude for hardest reasoning/agents; Nova for cost/latency; Llama/Mistral/DeepSeek for open-weight/customization. A/B on the same Bedrock API.
Build RAG or use Knowledge Bases?Knowledge Bases unless you need bespoke chunking/hybrid/rerank control.
Bedrock Agents or AgentCore?AgentCore for anything heading to production - managed memory, tools, identity, observability.
Which vector store?OpenSearch Serverless default; S3 Vectors for cost at scale; pgvector for vectors beside relational data.
Buy an assistant or build?Q Business/Q Developer first; build on Bedrock when you need custom UX/logic.
GPU or AWS silicon?Trainium/Inferentia for cost at volume; NVIDIA for specific framework/CUDA needs.

Pricing & Cost Control

Shape, not exact numbers - rates change and vary by model/region. Always confirm in the AWS pricing pages.

LeverHow it billsControl
Bedrock on-demandPer input/output token, per model.Right-size model per task; cache prompts; cap output tokens.
Provisioned ThroughputReserved model units (hourly).For steady high volume; commit only after you know the load.
Batch inferenceDiscounted vs on-demand.Use for non-interactive enrichment jobs.
Knowledge Bases / OpenSearchStorage + query + embedding tokens.Tune chunk size; prune stale docs; pick S3 Vectors for cost.
SageMakerTraining + endpoint instance-hours.Serverless/async endpoints; autoscale to zero where possible.
AgentsModel tokens x steps + tool calls.Cap loop length; cheap model for routing, strong model only when needed.
The agent cost trap
Agent loops multiply token cost by the number of steps. A 10-step loop on a flagship model is the most common surprise bill. Budget per-conversation, log token usage, and route to small models for routine steps.

Risks & Gotchas

Read this one. What actually bites teams in production.

Model/region drift
Models and versions change and aren't uniform across regions. Pin model IDs, monitor deprecations, and test before auto-upgrading.
Runaway agent cost & actions
Unbounded loops and tool access cause both cost blowouts and unintended actions. Enforce step caps, budgets, least-privilege Identity, and human approval on high-impact tools. For AgentCore Payments, treat spend as a first-class control.
Data egress & residency
Cross-region inference and external tools (web search, third-party MCP) can move data. Confirm residency; prefer zero-egress options where compliance requires.
Service sprawl & lock-in
Mixing Bedrock, SageMaker, Q, and three vector stores creates operational complexity and AWS-specific coupling. Standardize on a few primitives; keep prompts/eval portable.
Guardrails are opt-in
Retrieval quality and missing guardrails, not the model, are usually the failure. Attach Guardrails by default and use contextual grounding + rerankers before blaming the LLM.
Quotas
Default account quotas (tokens/min, requests/min, concurrent training) throttle real workloads. Request increases early; design for backoff.