⏱ 20 min read
Most teams still talk about retrieval-augmented generation as if it were a prompt trick.
It isn’t.
RAG is not a clever sidecar bolted onto a language model. It is a data platform wearing an AI badge. The sooner an enterprise accepts that, the sooner it stops building fragile demos and starts building systems that survive contact with production. The retrieval layer is where your organization’s meaning gets negotiated: what a “customer” is, which policy version counts, how a claim relates to a contract, whether a product bulletin supersedes a safety notice, and who is allowed to see any of it. That is not prompt engineering. That is architecture.
This is the central mistake in many AI programs. Teams buy a vector database, wire up embeddings, dump documents into it, and call the result “enterprise knowledge.” Then they act surprised when the assistant hallucinates with confidence, returns stale pricing, leaks confidential guidance, or gives the right answer from the wrong source. The failure is rarely in the model. The failure is in treating retrieval like search infrastructure rather than like a governed, domain-aware data product.
A retrieval architecture worth deploying in a large enterprise behaves less like a chatbot backend and more like a composed information supply chain. It ingests from operational systems, events, content repositories, and process applications. It transforms documents into chunks, chunks into representations, representations into retrievable knowledge objects. It preserves lineage, policy boundaries, temporal validity, and domain semantics. It reconciles competing truths. It gives different answers to different users for good reasons. In short: it looks suspiciously like a modern data platform, because that is what it is.
And once you say it plainly, the design decisions get sharper. You stop asking “Which vector DB should we use?” as if the storage engine were the architecture. You start asking “What are the bounded contexts?”, “Where do canonical facts come from?”, “What event contracts tell us a policy has changed?”, “How do we re-index without poisoning retrieval?”, “What happens when source systems disagree?”, “How do we trace generated answers back to domain authority?”, and “How do we migrate from document dumps to governed knowledge products without freezing delivery for a year?”
Those are the right questions. They belong to enterprise architecture, domain-driven design, and migration planning. They also happen to be the difference between an AI assistant that impresses in a workshop and one that earns a place in a regulated business process.
Context
The market packaged RAG as a shortcut. A quick way to inject enterprise knowledge into a general-purpose model without expensive fine-tuning. Fair enough. That framing helped adoption. But it also smuggled in a dangerous simplification: if the model is the brain, retrieval must be just memory. And memory, in this caricature, is simple.
Enterprise memory is never simple.
Most organizations do not have “a knowledge base.” They have contracts in content management systems, support procedures in SharePoint, SKUs in ERP, entitlements in CRM, claims in workflow tools, audit evidence in case systems, engineering bulletins in PLM, and half the important truth trapped in email attachments that should have died years ago. The idea that one can flatten all this into embeddings and call it done is the architectural equivalent of pouring concrete over exposed wiring.
A serious retrieval system sits in the middle of a larger topology: source systems of record, event streams, document pipelines, metadata enrichment, access-control propagation, semantic indexing, lexical search, graph relationships, cache layers, orchestration services, and model gateways. The generated answer is the visible tip. Underneath is a platform.
This is why RAG architecture matters most in enterprises with fragmented information landscapes, strict compliance requirements, and strong domain boundaries. In these environments, retrieval is not just “find similar text.” Retrieval is “find the right authoritative material, valid for this user, in this jurisdiction, at this moment, with lineage and confidence.” The shape of that sentence should tell you immediately that this is data architecture territory.
Problem
The core problem is not that large language models lack knowledge. The core problem is that enterprise knowledge is messy, contextual, distributed, and contested.
If your retrieval layer ignores that, it produces three predictable pathologies.
First, it collapses domain semantics. A “customer” in billing is not always a “customer” in servicing. A “policy” in legal is not a “policy” in insurance operations. A “case” in support is not a “case” in claims. If your ingestion pipeline treats all content as generic text blobs, the system will retrieve semantically adjacent but operationally wrong information. The answer sounds plausible because the words overlap. The business outcome is wrong because the domain meaning does not.
Second, it loses temporal truth. Enterprises live in versions. Product guidance changes. Policies are superseded. Exceptions expire. Jurisdictional interpretations vary. Retrieval systems that do not model validity windows, effective dates, and supersession relationships create a quiet catastrophe: answers grounded in obsolete authority.
Third, it breaks trust boundaries. Most retrieval demos are built by people with admin rights over a sample corpus. Real enterprises have entitlements, need-to-know partitions, legal holds, Chinese walls, and regional restrictions. A retrieval system that cannot propagate and enforce these controls reliably will either leak information or become so constrained that nobody trusts it.
The temptation is to solve these with one more filter, one more ranker, one more metadata tag. That usually helps for a quarter and fails by the second scale-up. Because the real issue is architectural: retrieval is being treated like an application feature instead of a data platform capability.
Forces
Several forces pull on retrieval architecture at once, and they rarely point in the same direction.
Accuracy versus freshness.
Batch indexing is easier to manage and cheaper to run. Event-driven updates are fresher but harder to reconcile, monitor, and replay. In domains like customer support, hours of staleness may be acceptable. In payments fraud or claims adjudication, they may not.
Semantic richness versus operational simplicity.
Chunking raw documents into vectors is simple. Building domain-aware knowledge objects with relationships, taxonomies, and entity normalization is not. But the latter is often what separates “search-ish” retrieval from decision-grade retrieval.
Autonomy versus consistency.
Domain teams want to publish their own knowledge products and evolve independently. Platform teams want common ingestion contracts, metadata standards, and governance. Push too hard either way and you get either chaos or paralysis.
Recall versus precision.
Broad retrieval maximizes the chance of finding something relevant. Narrow retrieval reduces noise. LLMs are especially sensitive to this balance because stuffing too much context into the prompt degrades answer quality just as surely as missing a key source.
Latency versus control.
A retrieval pipeline with policy checks, graph expansion, re-ranking, and reconciliation is safer and smarter. It is also slower. Users do not care about your architectural purity if the assistant takes 12 seconds to answer a routine question.
Central platform versus embedded domain ownership.
A central AI platform can standardize embeddings, model gateways, observability, and controls. But if it swallows domain semantics into one giant corpus, it recreates the enterprise data swamp with shinier tooling.
This is where domain-driven design is useful, not as a ritual, but as a survival mechanism. Retrieval systems need bounded contexts. They need explicit published language. They need anti-corruption layers where legacy content semantics leak into cleaner domain models. Otherwise the retrieval layer becomes a semantic junk drawer.
Solution
The right move is to design AI retrieval as a domain-oriented data platform.
That means a few concrete things.
Treat each major domain as a producer of governed knowledge products, not just a supplier of documents. A claims domain publishes policy clauses, adjudication guidance, procedural playbooks, exception rules, and historical case exemplars with explicit metadata, validity, source authority, access policy, and semantic identifiers. A product domain publishes service bulletins, SKU relationships, troubleshooting flows, and compatibility matrices. A legal domain publishes approved policy interpretations and regulatory mappings. Each bounded context owns its semantics.
Then provide a common retrieval platform beneath those products. This platform handles ingestion frameworks, event subscriptions, document processing, chunking standards, embedding generation, lexical indexing, vector storage, graph links, metadata filtering, access-control enforcement, observability, lineage, and retrieval orchestration. The platform is shared. The meaning is not.
This is the crucial distinction. Shared infrastructure, federated semantics.
A mature implementation usually combines several retrieval modes:
- lexical search for exact language and known phrases
- vector search for semantic similarity
- metadata filters for policy, entitlement, geography, and time
- graph traversal for relationship-aware expansion
- re-ranking for final context selection
- citation assembly for traceability
Do not pick one retrieval mechanism and make it a religion. Enterprises need an ensemble.
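To make the ensemble concrete, here is a minimal sketch of blended ranking with the policy filter applied first. All names, weights, and scores are illustrative, not a reference to any particular product API:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    lexical: float   # e.g. BM25-style score, normalized to [0, 1]
    semantic: float  # e.g. cosine similarity from the vector index
    allowed: bool    # outcome of metadata/entitlement filtering

def ensemble_rank(candidates, w_lexical=0.4, w_semantic=0.6, top_k=5):
    """Enforce policy filters first, then blend lexical and semantic scores."""
    permitted = [c for c in candidates if c.allowed]
    ranked = sorted(
        permitted,
        key=lambda c: w_lexical * c.lexical + w_semantic * c.semantic,
        reverse=True,
    )
    return ranked[:top_k]

hits = ensemble_rank([
    Candidate("policy-v3", lexical=0.9, semantic=0.7, allowed=True),
    Candidate("memo-old", lexical=0.8, semantic=0.9, allowed=False),  # entitlement filter wins
    Candidate("faq-12", lexical=0.2, semantic=0.8, allowed=True),
])
print([h.doc_id for h in hits])  # ['policy-v3', 'faq-12']
```

Note that the entitlement-blocked memo never reaches ranking at all: filtering before scoring is what keeps "most similar" from beating "permitted."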
At the orchestration layer, the system should accept an intent-rich query, not just a string. User identity, role, channel, language, product context, customer segment, geography, and process step all matter. “Can I waive this fee?” means something very different in a branch workflow than in internal policy audit. Retrieval needs that context upstream, not bolted on after the search results arrive.
This is also where reconciliation enters the picture. Enterprises routinely have overlapping or conflicting content. A local operating procedure may conflict with a global policy. A product bulletin may supersede a support article. A CRM note may describe an exception that is not reflected in the official rulebook. A retrieval platform has to decide whether to merge, rank, flag, or suppress conflicting knowledge. That is not merely a search concern; it is a policy and data stewardship concern.
Here is the high-level topology.
The diagram is simple on purpose. Real platforms have more boxes. But the shape matters: source truth, domain curation, multiple indexes, policy-aware retrieval orchestration, then generation.
Not the other way around.
Architecture
A useful way to describe the architecture is in layers.
1. Source and change layer
The first responsibility is capturing information from systems of record and systems of engagement. This usually includes APIs, CDC pipelines, file drops, content repository connectors, and event streams. Kafka often belongs here because it gives a durable, replayable backbone for content and metadata changes. If a product manual changes, or a policy version is published, or a claim reaches a new adjudication state, those events should feed the retrieval platform with enough signal to decide whether to reprocess and re-index.
Do not confuse “document arrived” with “knowledge changed.” A good event contract distinguishes cosmetic edits from semantic changes. That sounds fussy until you have re-embedded ten million chunks because someone changed a footer template.
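A cheap way to separate cosmetic from semantic change is to fingerprint only the normalized substantive text. This sketch assumes template footers can be identified by a simple marker; real template-stripping logic is messier:

```python
import hashlib
import re

def semantic_fingerprint(body: str) -> str:
    """Hash only the normalized substantive text, so cosmetic edits
    (whitespace, casing, boilerplate footers) do not trigger re-embedding.
    The footer rule here is a placeholder for real template detection."""
    text = re.sub(r"(?im)^footer:.*$", "", body)     # drop assumed footer lines
    text = re.sub(r"\s+", " ", text).strip().lower()  # normalize whitespace and case
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

old = "Claims must be filed within 30 days.\nFooter: rev 2024-01"
new = "Claims  must be filed within 30 days.\nFooter: rev 2025-06"
print(semantic_fingerprint(old) == semantic_fingerprint(new))  # True: cosmetic edit only
```

An ingestion pipeline can compare the stored fingerprint against the incoming one and emit a “knowledge changed” event only on mismatch, which is what spares you the ten-million-chunk re-embedding.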
2. Domain curation layer
This is where domain-driven design earns its keep.
Each bounded context maps raw source artifacts into a domain model. A claims team may define objects like CoverageClause, ExceptionRule, AdjudicationStep, JurisdictionVariant, and CasePattern. A service domain may define ServiceBulletin, DiagnosticProcedure, CompatiblePart, and RecallNotice. This curation layer also normalizes identities, tags validity periods, records source authority, and maps terms into a ubiquitous language.
You can think of this as the anti-corruption layer between messy enterprise content and retrieval-grade knowledge.
Without it, your vector index becomes a landfill.
3. Knowledge shaping layer
This layer transforms domain objects into retrievable forms. Chunking is part of it, but chunking is the least interesting part. More important are the chunk boundaries, semantic labels, relationship links, citation references, ACL inheritance, temporal metadata, and embedding strategy. Some content should be chunked by section; some by decision rule; some by procedural step; some should not be chunked at all.
This is also where entity extraction and graph construction can help. If “Premium Waiver,” “Hardship Exception,” and “Fee Adjustment” are connected concepts in one domain but not another, the graph makes that explicit.
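As a sketch, knowledge shaping can be a small function that splits a curated object by the chosen strategy while every chunk inherits the parent’s lineage, ACL, and validity metadata. Keys and strategies here are illustrative:

```python
def shape_chunks(domain_object: dict, strategy: str = "by_section") -> list[dict]:
    """Turn a curated domain object into retrievable chunks; each chunk
    inherits the parent's lineage, ACL, and validity metadata."""
    inherited = {k: domain_object[k] for k in ("source_uri", "acl", "valid_from")}
    if strategy == "by_section":
        parts = domain_object["body"].split("\n## ")
    elif strategy == "whole":    # some content must not be split at all
        parts = [domain_object["body"]]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [
        {"chunk_id": f'{domain_object["id"]}#{i}', "text": p.strip(), **inherited}
        for i, p in enumerate(parts) if p.strip()
    ]

chunks = shape_chunks({
    "id": "sb-104", "source_uri": "plm://bulletins/104",
    "acl": ("service_techs",), "valid_from": "2025-07-01",
    "body": "Overview text\n## Diagnostics\nSteps...\n## Parts\nUse part P-9.",
})
print(len(chunks), chunks[1]["acl"])  # 3 ('service_techs',)
```

The interesting decision is not the splitting itself but which strategy each content type gets, and the guarantee that no chunk ever escapes its parent’s access policy.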
4. Retrieval orchestration layer
At query time, an orchestrator combines retrieval strategies. It may first classify intent, map terms to a domain, derive filters from user and process context, query lexical and vector indexes, expand along graph relationships, reconcile duplicates, and re-rank results before packaging context for the model. In many enterprises, this orchestration belongs in a service layer rather than inside one vendor product, because the policy logic, domain routing, and observability requirements are too important to outsource blindly.
5. Generation and control layer
The model gateway should provide model selection, prompt templates, safety controls, token accounting, caching, and audit logs. The answer must preserve citations and confidence signals from retrieval. If the model cannot find high-confidence authoritative context, the safest answer may be “I don’t know” or “Escalate.” Enterprises need systems that know when to stop.
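The “know when to stop” rule can be sketched as a gate between retrieval and generation. The thresholds below are placeholders; in practice they come from evaluation on curated test sets, not intuition:

```python
def answer_or_abstain(contexts, min_confidence=0.7, min_sources=1):
    """Gate generation on retrieval quality: generate only when enough
    approved, high-confidence context exists. Thresholds are illustrative."""
    authoritative = [
        c for c in contexts
        if c["authority"] == "approved" and c["score"] >= min_confidence
    ]
    if len(authoritative) < min_sources:
        return {"action": "escalate", "reason": "no high-confidence authoritative context"}
    return {"action": "generate", "contexts": authoritative}

decision = answer_or_abstain([
    {"authority": "draft", "score": 0.9},     # confident but not authoritative
    {"authority": "approved", "score": 0.5},  # authoritative but not confident
])
print(decision["action"])  # escalate
```

Note that neither context passes: one fails on authority, the other on confidence. Refusing here is the correct behavior, not a failure.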
The bounded-context view is worth making explicit.
The anti-corruption layer is not optional in most enterprises. Legacy repositories encode local assumptions, outdated taxonomies, and accidental semantics. If you let those leak directly into retrieval, the model will cheerfully amplify the confusion.
Migration Strategy
No enterprise with real history gets to rebuild retrieval architecture from scratch. You migrate, or you die trying.
The sensible migration path is a progressive strangler. Start with a narrow, high-value domain where retrieval quality matters and where source authority is reasonably clear. Build the platform spine once: ingestion framework, identity propagation, metadata model, retrieval orchestration, citations, telemetry. Then onboard one domain knowledge product at a time. Let the old search and content systems continue serving users while the new path gradually takes traffic.
The first phase is usually read-only augmentation. The assistant answers questions using curated retrieval, but it does not drive workflow decisions. This lets you measure answer quality, source coverage, and user trust without operational blast radius.
The second phase is guided task support. Retrieval is embedded into workflow applications: case handling, service consoles, underwriting workbenches. The assistant surfaces authoritative passages, next-best procedures, or checklist guidance, but a human still makes the decision.
The third phase is decision-adjacent automation, where structured retrieval outputs feed downstream rules or agent workflows. This is where governance, reconciliation, and failure handling become non-negotiable.
A strangler migration usually follows that progression: dual-run the old and new paths, shift traffic domain by domain, and retire legacy search only once the new path has earned trust.
A few migration rules matter.
Do not bulk-ingest everything first.
That creates a giant low-quality corpus and a false sense of progress. Start with the authoritative 20 percent that drives 80 percent of important queries.
Keep reconciliation visible.
When the new platform and legacy systems disagree, log it, surface it, and route it to data stewards. Silent divergence is poison.
Version your knowledge products.
You will change chunking strategies, metadata schemas, and embedding models. If you cannot reprocess and compare versions safely, your migration will become superstition.
Use replayable event streams.
Kafka earns its place here because migration involves backfills, reprocessing, and dual-run validation. You want to replay change events into new pipelines without inventing heroics.
Design fallback early.
If the domain-aware retrieval path fails, what should happen? Fall back to lexical search? Use only approved FAQs? Route to human? Decide this before launch, not after the first executive demo.
Reconciliation deserves its own paragraph because it is where many AI initiatives quietly fail. During migration, you will have overlapping corpora: old portal content, manually curated FAQs, and newly normalized domain products. They will conflict. Some conflicts are benign duplicates. Some reveal broken publishing processes. Some expose that nobody actually knows which source is authoritative. Good. Better to discover that in the retrieval platform than in a regulatory review. Create explicit stewardship workflows for conflict resolution, supersession rules, and exception approval. Retrieval architecture surfaces organizational truth debt. That is a feature.
Enterprise Example
Consider a large insurer operating across multiple regions. It wants an internal claims assistant to help handlers answer questions such as: “Is windshield damage covered under this policy in this state?”, “What documents are required before settlement?”, and “What changed in the fraud escalation process last quarter?”
The raw materials live everywhere. Policy language sits in a document management system. Jurisdictional endorsements are managed separately. Claims procedures are in SharePoint. Fraud triggers come from a risk platform. Historical exemplars live in the case system. Legal interpretations are published as memos. None of these repositories agree on identifiers. Some refer to products by marketing name, others by underwriting code.
A naive RAG implementation would index all of it and hope semantic search sorts things out. It won’t. The assistant will retrieve clauses from the wrong product family, procedures from the wrong region, and memos that were superseded three months ago.
The insurer instead defines several knowledge products:
- CoverageKnowledgeProduct, owned by underwriting
- ClaimsProcedureKnowledgeProduct, owned by operations
- FraudGuidanceKnowledgeProduct, owned by risk
- RegulatoryInterpretationKnowledgeProduct, owned by legal
Each product publishes normalized entities with product codes, state applicability, effective dates, supersession links, confidence metadata, and ACLs. Kafka streams policy publication events, procedure updates, and memo releases into the platform. A claims-domain anti-corruption layer maps legacy labels into canonical identifiers. Query orchestration derives context from the case screen: state, product, claim type, user role, and workflow stage. Retrieval combines lexical search for exact policy clauses, vector search for semantically similar procedures, and graph expansion to related endorsements and fraud rules.
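Deriving filters from the case screen can be as simple as a mapping function. The keys below are illustrative, not the insurer’s actual schema:

```python
def filters_from_case(case):
    """Map case-screen context to retrieval filters (keys are illustrative)."""
    return {
        "jurisdiction": case["state"],
        "product_code": case["product"],    # canonical code, not marketing name
        "doc_types": {"claim": ("procedure", "coverage_clause")}.get(case["claim_type"], ()),
        "as_of": case["opened_on"],         # temporal validity, not "latest"
        "acl_groups": case["user_roles"],
    }

filters = filters_from_case({
    "state": "US-FL", "product": "AU-17", "claim_type": "claim",
    "opened_on": "2026-02-10", "user_roles": ("claims_handler",),
})
print(filters["jurisdiction"])  # US-FL
```

The handler never types any of this; it is derived from where they are standing in the workflow, which is what keeps Florida endorsements from leaking into a Texas claim.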
Now the assistant can answer: “For this auto policy in Florida, windshield damage is covered under endorsement X effective from Jan 1, 2026. Before settlement, collect documents A and B. If invoice pattern Y appears, follow fraud escalation procedure version 3.2.” It cites each source and suppresses legal memos the user is not entitled to see.
That is not a chatbot trick. That is a domain-aware information platform embedded in claims operations.
Operational Considerations
Production retrieval architecture lives or dies on operations, not diagrams.
Observability must span the full chain: source event received, transform applied, domain object published, index updated, retrieval query executed, result selected, prompt assembled, answer returned. You need to know not just that latency spiked, but whether the spike came from ACL filtering, graph expansion, model congestion, or index refresh lag.
Quality measurement has to go beyond generic LLM evaluation. Measure retrieval recall for curated test sets. Measure citation accuracy. Measure stale-answer rate. Measure source-authority violations. Measure semantic drift after embedding upgrades. Measure how often fallback paths are used. In regulated settings, measure “answer abstention quality” too; refusing correctly is a capability.
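Two of these metrics, retrieval recall on a curated test set and stale-answer rate, can be sketched in a few lines. Data and names here are toy:

```python
def recall_at_k(test_set, retrieve, k=5):
    """Fraction of curated queries whose expected source appears in the
    top-k results. `retrieve` is any callable returning ranked doc ids."""
    hits = sum(1 for query, expected in test_set if expected in retrieve(query)[:k])
    return hits / len(test_set)

def stale_answer_rate(answers):
    """Share of answers citing a source already superseded at answer time."""
    stale = sum(1 for a in answers if a["cited_version"] < a["current_version"])
    return stale / len(answers)

test_set = [("windshield coverage FL", "cc-842-v2"), ("fraud escalation", "fg-17")]
fake_retrieve = lambda q: ["cc-842-v2", "misc-1"] if "windshield" in q else ["old-doc"]
print(recall_at_k(test_set, fake_retrieve, k=2))  # 0.5
print(stale_answer_rate([
    {"cited_version": 2, "current_version": 3},
    {"cited_version": 3, "current_version": 3},
]))  # 0.5
```

The value is in tracking these over time: an embedding upgrade that nudges recall up while pushing stale-answer rate up faster is a regression, not an improvement.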
Access control propagation is usually underestimated. If source ACLs do not map cleanly into the retrieval store, stop and solve that before scaling. Security retrofits are expensive and humiliating.
Cost control matters. Embedding every tiny source change is a tax on indecision. Intelligent change detection, tiered indexing, and selective reprocessing will save real money. So will caching retrieval results for common intents where policy allows.
Data retention and legal hold rules apply. If the source document must be deleted, retained, or frozen for litigation, the retrieval platform has to honor that. “It’s only in the vector index” is not a defense anyone should want to test.
Model and index evolution should be decoupled. You want the option to change the generation model without rebuilding the retrieval substrate, and vice versa. Tight coupling feels convenient until procurement, compliance, or pricing forces a change.
Tradeoffs
There is no free architecture here.
A domain-oriented retrieval platform is more work than a document dump into a vector store. It requires domain ownership, metadata discipline, governance, and event design. It also requires admitting that enterprise truth is plural and negotiated. Some organizations would rather buy another portal than face that fact.
The payoff is not elegance. The payoff is operational trust.
Still, the tradeoffs are real:
- Central standards slow local experimentation.
- Rich semantics increase implementation cost.
- Event-driven freshness increases operational complexity.
- Multiple retrieval modes increase tuning burden.
- Strict ACL enforcement can reduce recall and perceived usefulness.
- Reconciliation workflows create organizational friction.
That friction is not accidental. It is where your architecture meets your politics.
Failure Modes
The most common failure mode is the vector swamp: vast quantities of chunked text with inconsistent metadata, weak source lineage, and no real domain model. It demos well. It degrades quietly. Nobody notices until users stop trusting it.
Another is authority inversion: the assistant retrieves the most semantically similar text rather than the most authoritative source. In consumer search this is annoying. In enterprise operations it is dangerous.
Then there is stale truth, where change capture is incomplete or delayed. The assistant answers from yesterday’s policy in today’s workflow. This is especially ugly when the generated prose sounds crisp and certain.
A fourth is semantic bleed across contexts. Terms from one bounded context contaminate another. “Suspension,” “closure,” “write-off,” “waiver,” and “settlement” all carry context-sensitive meanings. Retrieval without domain boundaries turns these into traps.
Finally, security mismatch remains the career-limiting failure. If source permissions and retrieval permissions drift apart, the architecture is broken no matter how clever the prompting is.
When Not To Use
Do not build a full retrieval platform for every AI problem.
If the domain is small, stable, and low-risk, a simpler knowledge base or curated prompt context may be enough. If your use case depends mostly on structured transactional data rather than unstructured knowledge, then a tool-using agent over APIs and databases may be a better fit than heavy RAG. If your source estate is so chaotic that no one can identify authority, a retrieval platform will surface the mess but not magically resolve it. In that case, start with governance and content rationalization.
And if the business is not prepared to assign domain owners, define semantics, and steward reconciliation, then it should not pretend it is building enterprise AI. It is building a demo estate.
Related Patterns
Several architecture patterns sit naturally beside this approach.
Data mesh thinking helps because it frames domains as owners of data products. Retrieval knowledge products fit that model well, provided the platform supplies common interoperability.
CQRS can be useful where operational systems own writes and retrieval indexes are specialized read models optimized for AI consumption.
Event-driven architecture is a strong fit for freshness, replay, and progressive migration. Kafka is often the backbone, not because it is fashionable, but because durable event history makes reprocessing survivable.
Knowledge graphs complement vector search when relationships, hierarchy, supersession, and entity identity matter.
Strangler fig migration is the right modernization pattern for replacing brittle enterprise search and document portals incrementally.
Anti-corruption layers are essential wherever legacy taxonomies and content structures would otherwise pollute the domain model.
Summary
RAG is not a retrieval trick. It is a data platform problem with an AI interface.
That one sentence clears away a lot of confusion. It tells us why domain semantics matter, why bounded contexts matter, why reconciliation matters, why Kafka and event-driven change capture matter, why citations and authority ranking matter, and why migration must be progressive rather than revolutionary. It also explains why so many first-generation enterprise RAG systems disappoint: they optimize embeddings before they understand meaning.
Build retrieval as shared platform infrastructure with federated domain-owned knowledge products. Normalize legacy chaos through anti-corruption layers. Use multiple retrieval modes. Propagate identity and policy all the way through. Embrace reconciliation as part of the design, not an embarrassing afterthought. Migrate with a strangler pattern and keep fallback paths alive until trust is earned.
The model may write the sentence. But the platform decides whether the sentence deserves to exist.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralised data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.