Most enterprise AI programs begin with a flattering lie.
The lie is that the hard part is the model.
So teams spend months debating model providers, GPU capacity, prompt frameworks, vector databases, agent runtimes, and benchmark scores. They hold architecture reviews full of arrows pointing at LLMs as if intelligence itself were the center of gravity. But once these systems reach production, the truth arrives with very little ceremony: the model is often the easiest component to replace. The hard part is deciding what data reaches it, under what semantics, through which controls, with what timing guarantees, and how the result gets reconciled back into the business.
That is why an enterprise AI platform is, in practice, a data routing layer.
Not a chatbot shell. Not a prompt catalog. Not a magical “AI fabric.” A routing layer.
It sits between domains, policies, systems of record, event streams, inference endpoints, and user-facing workflows. Its job is not merely to call a model. Its job is to shape inference topology: where inference happens, what context is attached, which policies apply, how partial results are handled, and how decisions are observed, corrected, and audited.
This distinction matters because enterprises do not suffer from a shortage of models. They suffer from semantic drift, duplicated orchestration, uncontrolled data movement, and brittle coupling between business workflows and probabilistic services. In other words, they suffer from architecture problems.
And architecture problems don’t yield to better prompts.
What follows is an opinionated view: if you are building AI in a serious enterprise, think less about “AI apps” and more about inference topology. Think in domains. Think in flows. Think in event boundaries and policy enforcement points. Think in reconciliation paths for when the model is wrong, late, or unavailable. If you do that, your platform starts looking less like a collection of AI tools and more like what it really is: a controlled routing system for business meaning.
Context
The last decade trained enterprise architects to think in APIs, events, and product-aligned microservices. Systems were decomposed around business capabilities. Kafka became the backbone for streaming facts across domains. Teams learned, sometimes painfully, that a service boundary is not a deployment trick but a semantic commitment.
AI reopens that lesson.
The naive approach inserts model calls directly into channels: the web app calls the LLM, the claims app calls the classifier, the support bot calls retrieval and generation, the fraud engine calls a feature store and a model endpoint. Every product team builds some variation of context assembly, prompt management, fallback logic, content filtering, provider switching, and response logging. The result looks agile for about six months. Then it calcifies into a distributed mess of duplicated policy and inconsistent semantics.
One team defines “customer” as CRM profile plus billing history. Another includes recent support incidents. A third redacts fields differently. A fourth sends raw notes to a public model because “it was just a prototype.” You can guess what happens next: compliance panics, security inserts gateways after the fact, platform teams scramble to centralize controls, and the business wonders why every AI use case feels custom.
This is not new. We have seen the pattern before with integration platforms, service buses, API gateways, and event backbones. The names change; the gravity does not. Whenever a capability depends on moving information between domains under policy, topology becomes architecture.
AI simply raises the stakes because inference is probabilistic, context-hungry, and operationally expensive. A bad API call usually fails loudly. A bad AI route often succeeds quietly.
That is worse.
Problem
Most enterprise AI architectures are built around inference endpoints, but production reality is driven by routing decisions.
A customer service answer may need:
- customer identity from CRM
- policy coverage from an insurance system
- recent claim events from Kafka
- sensitive note redaction from a privacy service
- retrieval from a knowledge corpus
- a model selected by jurisdiction and cost tier
- post-processing by a rules engine
- human review if confidence is low
- result reconciliation back into a case management system
The business requirement is not “call model X.” The requirement is “produce an answer in a workflow, using governed context, under domain rules, with a recoverable audit trail.”
That is routing.
The same applies to document extraction, fraud triage, next-best-action, underwriting summarization, coding assistance, pricing recommendations, and internal knowledge assistants. In every case, the critical decisions are upstream and downstream of the model:
- what facts are assembled
- whether those facts are fresh enough
- how domain language is translated
- whether the use case requires synchronous or asynchronous inference
- who owns correction
- how outputs become business state
When teams ignore that, they create a familiar anti-pattern: AI orchestration embedded inside every application. The short-term benefit is speed. The long-term cost is semantic entropy.
This entropy shows up in several ugly ways:
- Context fragmentation
Different teams build different views of the same business entity.
- Policy inconsistency
PII redaction, retention rules, and provider restrictions vary by implementation.
- Coupling to model vendors
Provider-specific prompts, schemas, and failure handling leak into product code.
- No reconciliation path
AI outputs are accepted optimistically with no durable correction model.
- Observability without meaning
Metrics report latency and token counts, but not business correctness by domain outcome.
- Migration paralysis
Once model calls are woven into dozens of services, changing topology becomes a change program.
The platform problem is not “How do we expose AI to teams?” It is “How do we route data and decisions through inference in a way that preserves domain integrity?”
Forces
Enterprise architecture is the art of surviving conflicting truths. AI adds more of them.
1. Domain semantics versus centralized reuse
A platform team wants consistency. Domain teams want control. Both are right.
A claims domain understands the difference between first notice of loss, adjudication, reserve adjustment, and settlement exception. A central AI team does not. But domain teams should not each reinvent provider routing, redaction, caching, and fallback controls. The right split is neither centralizing everything nor federating everything. It is centralizing routing capabilities while keeping domain semantics local.
That is textbook domain-driven design, though people often forget the “domain” part when the topic becomes AI.
2. Synchronous user expectations versus asynchronous enterprise reality
Users expect instant answers. Enterprises run on eventually consistent workflows.
Some inference belongs in the request path: search augmentation, chat response drafting, agent assistance. Other inference should be event-driven: claim document extraction, policy anomaly detection, case summarization, downstream enrichment. Treating all AI as synchronous creates latency, cost, and resilience problems. Treating all AI as async degrades user experience.
Topology matters because different workloads need different routes.
3. Cost versus quality
The best model is rarely the one you should call every time.
You may want a small model for triage, a larger model for escalation, deterministic rules for known paths, and human review for edge cases. This is not a model decision. It is a routing policy decision informed by business value.
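As a sketch, a routing policy like this fits in a few lines. Everything here — the tier names, thresholds, and the `RouteDecision` shape — is illustrative, not a real platform API:

```python
from dataclasses import dataclass

@dataclass
class RouteDecision:
    target: str   # which inference route to take
    reason: str

def select_route(intent: str, risk_score: float, confidence_needed: float) -> RouteDecision:
    """Pick a route by business value, not by 'best model'. Thresholds are placeholders."""
    if risk_score < 0.2:
        # Known, low-risk path: deterministic rules, no model call at all.
        return RouteDecision("rules-engine", "low risk, known path")
    if risk_score < 0.7:
        # Common case: small, cheap model for triage.
        return RouteDecision("small-model", "routine triage")
    if confidence_needed > 0.9:
        # High stakes and high required confidence: escalate to a human.
        return RouteDecision("human-review", "high risk, high required confidence")
    # Ambiguous but automatable: larger model for escalation.
    return RouteDecision("large-model", "escalation tier")
```

The point is that the decision tree is owned as policy, not buried inside each application's model call.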
4. Governance versus delivery speed
Security teams want approved providers, jurisdiction controls, data minimization, retention, and auditability. Delivery teams want velocity. If governance is bolted on later, it becomes a tax. If governance is built into routing, it becomes a path selection rule.
5. Freshness versus stability
Some inferences need live operational data. Others need curated snapshots to avoid nondeterministic prompts and replay issues. Route live data everywhere and you lose reproducibility. Route only snapshots and you lose relevance.
6. Enterprise integration gravity
Most serious AI use cases touch Kafka, service APIs, identity systems, MDM, content stores, and systems of record. The architecture must respect the existing integration estate, not fantasize that an AI platform will replace it.
It won’t.
Solution
Treat the AI platform as an inference routing layer with explicit domain contracts.
That means the platform owns the mechanics of inference routing:
- policy enforcement
- provider and model selection
- prompt and tool execution infrastructure
- context assembly framework
- retrieval plumbing
- observability
- fallback and retry behavior
- asynchronous job handling
- response normalization
- audit and lineage
But it does not own business meaning. Domains own:
- entity definitions
- event semantics
- workflow boundaries
- acceptance criteria
- human review rules
- correction and reconciliation processes
- business KPI instrumentation
This separation is the difference between useful centralization and another platform that everyone bypasses.
A good inference routing layer behaves like a logistics network for meaning. It does not generate value by itself. It moves the right material to the right place under the right conditions. It knows which roads are allowed, which cargo needs special handling, and where customs checks occur. It does not pretend all packages are the same.
Core principles
1. Route by intent, not by model
Applications should ask for a business capability, not a provider-specific invocation.
Bad:
call-gpt4-with-this-prompt
Better:
- generate_claim_summary
- classify_fraud_signal
- draft_customer_response
Intent-based routing preserves the option to change models, prompts, tools, and policies without rewriting every caller.
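A minimal sketch of what that looks like at the call site. The `IntentRouter` class and the registered handler are hypothetical; the point is that callers name a business capability and never see a provider:

```python
from typing import Any, Callable

class IntentRouter:
    """Callers ask for a business capability; the platform binds it to a route."""
    def __init__(self) -> None:
        self._routes: dict[str, Callable[[dict], Any]] = {}

    def register(self, intent: str, handler: Callable[[dict], Any]) -> None:
        self._routes[intent] = handler

    def invoke(self, intent: str, context: dict) -> Any:
        if intent not in self._routes:
            raise KeyError(f"no route registered for intent '{intent}'")
        return self._routes[intent](context)

router = IntentRouter()
# The handler hides model choice, prompt, and provider behind the intent name.
router.register("generate_claim_summary", lambda ctx: f"summary of claim {ctx['claim_id']}")

result = router.invoke("generate_claim_summary", {"claim_id": "C-1042"})
```

Swapping the model behind `generate_claim_summary` now touches one registration, not every caller.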
2. Keep domain context products separate from platform mechanics
The claims domain should publish a context contract for “claim summary context.” The platform should know how to assemble, redact, cache, and route it. The platform should not invent claim semantics.
3. Make reconciliation a first-class path
Inference outputs must be correctable. If the model extracts a diagnosis code incorrectly or drafts a wrong explanation of benefits, there must be a workflow for review, correction, replay, and lineage.
4. Support multiple inference topologies
You need request/response, event-driven enrichment, batch scoring, streaming inference, human-in-the-loop escalation, and hybrid patterns. One topology will not fit every use case.
5. Measure business outcomes, not just technical throughput
Latency matters. Token cost matters. But business accuracy by domain scenario matters more.
Architecture
A practical architecture usually has five layers.
- Experience and process layer
Portals, agent desktops, digital channels, BPM/workflow tools, case management.
- Domain services and event backbone
Microservices, Kafka topics, domain APIs, systems of record.
- Inference routing layer
The AI platform proper: policy, orchestration, context assembly, provider abstraction, prompt/tool runtime, guardrails, observability.
- Knowledge and context sources
Document stores, search indexes, vector retrieval, feature stores, master data, policy repositories.
- Inference execution endpoints
LLMs, classifiers, embeddings, OCR, speech, custom ML services.
This is not an ESB revival with better marketing. The difference is in the contracts. An old-school integration bus often centralized business transformation logic until it became a bottleneck. A sound inference routing layer centralizes generic inference concerns while domain transformations remain close to the domain.
That boundary is everything.
Domain semantics discussion
If you ignore semantics, your AI platform becomes a very expensive string processor.
Take “customer.” In a bank, the retail customer domain, fraud domain, collections domain, and onboarding domain each have legitimate but different views. An onboarding assistant may need KYC status and document deficiencies. A collections assistant may need delinquency stage and hardship indicators. A fraud triage service may need device risk and linked-party analysis. These are not technical variations; they are bounded contexts.
So the platform should not expose one giant “customer context” endpoint. That way lies leakage, over-fetching, and accidental data exposure.
Instead, each domain should define context products with explicit semantics:
- OnboardingApplicantContext
- CollectionsAccountContext
- FraudCaseContext
The routing layer can then apply common mechanics to all of them:
- fetch and merge
- redact
- enrich
- cache
- route to suitable model
- normalize output
- emit audit event
This is domain-driven design with operational teeth.
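Here is a minimal sketch of those mechanics, with assumed field names and a deliberately crude redaction rule. The domain supplies the contract (which sources, which fields to redact); the platform executes it generically:

```python
def assemble_context(raw_sources: list[dict], redacted_fields: set[str]) -> dict:
    """Generic platform mechanics: merge domain sources, then redact by policy."""
    merged: dict = {}
    for source in raw_sources:      # fetch and merge
        merged.update(source)
    for field in redacted_fields:   # redact per the domain's contract
        if field in merged:
            merged[field] = "***REDACTED***"
    return merged

# The fraud domain owns this contract; the platform only executes it.
fraud_case_context = assemble_context(
    [{"case_id": "F-77", "device_risk": 0.83},
     {"linked_parties": 2, "ssn": "123-45-6789"}],
    redacted_fields={"ssn"},
)
```

Enrichment, caching, model routing, and audit emission would slot into the same pipeline as further generic steps.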
Inference topology patterns
At least four topologies appear repeatedly.
1. Inline assist
The user is waiting. Latency budgets are strict. The route must be fast, bounded, and often partial.
2. Event-driven enrichment
A domain event lands on Kafka. The routing layer enriches a case, document, or entity asynchronously.
3. Batch or backlog processing
Large document sets, historical backfills, nightly prioritization. Cheap models and resilient queues matter more than chat-like responsiveness.
4. Escalation topology
A cheap route handles the common case. A richer route or human review handles ambiguity.
A mature platform supports all four without making every application team solve them independently.
Kafka and microservices
Kafka is particularly useful when AI should follow business events rather than sit awkwardly in front of them.
Examples:
- ClaimSubmitted triggers document classification and summary generation.
- PaymentExceptionRaised triggers anomaly explanation.
- CustomerInteractionClosed triggers auto-summarization and next-best-action recommendation.
- ProductSpecUpdated triggers embedding refresh and retrieval index update.
The pattern is simple: domain services publish facts. The routing layer subscribes where AI enrichment is appropriate. It does not become the source of truth. It emits derived facts or recommendations back into the ecosystem, ideally on separate topics with clear semantics such as ClaimSummaryGenerated or FraudCaseTriageSuggested.
That separation matters. Generated output is not the same as approved business state.
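To make the topology concrete without standing up a broker, here is an in-memory stand-in for the Kafka pattern. Topic and field names are illustrative; a real implementation would use a Kafka client, but the shape is the same: consume a domain fact, enrich, and emit a derived fact on its own clearly named topic, marked as a suggestion rather than approved state:

```python
from collections import defaultdict

# In-memory stand-in for Kafka topics.
topics: dict[str, list[dict]] = defaultdict(list)

def publish(topic: str, event: dict) -> None:
    topics[topic].append(event)

def on_claim_submitted(event: dict) -> None:
    # The routing layer enriches asynchronously, then emits a *derived* fact.
    summary = f"auto-summary for claim {event['claim_id']}"  # model call elided
    publish("ClaimSummaryGenerated", {
        "claim_id": event["claim_id"],
        "summary": summary,
        "status": "suggested",   # generated output, not approved business state
    })

publish("ClaimSubmitted", {"claim_id": "C-9"})
for event in topics["ClaimSubmitted"]:
    on_claim_submitted(event)
```

Note that the routing layer never mutates the source topic; it only adds derived facts downstream.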
Migration Strategy
Most firms already have AI logic scattered across applications. So the migration strategy should be progressive strangler, not big bang.
Big bang rewrites are how architecture becomes theatre.
A better sequence looks like this:
Step 1: Identify duplicated inference mechanics
Find the repeated capabilities:
- provider SDK wrappers
- prompt templates
- PII redaction
- retry logic
- output parsing
- usage logging
- fallback between models
These are your first candidates for centralization.
Step 2: Introduce intent-based APIs
Wrap existing direct model calls with business-intent interfaces. Do not change domain behavior yet; simply hide provider details.
Step 3: Externalize policy and routing
Move model selection, redaction, provider restrictions, and prompt/tool configuration into the platform. Applications still call the same business intents, but mechanics become centralized.
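One hedged way to picture externalized policy: a declarative document the platform resolves per intent and jurisdiction. All keys and values below are invented for illustration:

```python
# Hypothetical policy document; in practice this would live in config, not code.
routing_policy = {
    "generate_claim_summary": {
        "allowed_providers": ["internal-llm", "approved-cloud"],
        "jurisdiction_overrides": {"EU": {"allowed_providers": ["internal-llm"]}},
        "redact": ["ssn", "medical_notes"],
        "fallback": "rules-engine",
    },
}

def resolve_policy(intent: str, jurisdiction: str) -> dict:
    """Applications keep calling the same intent; policy is resolved centrally."""
    base = dict(routing_policy[intent])
    override = base.get("jurisdiction_overrides", {}).get(jurisdiction, {})
    base.update(override)
    return base

eu_policy = resolve_policy("generate_claim_summary", "EU")
```

Changing a jurisdiction rule is now a policy edit, not a code change in every application.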
Step 4: Shift asynchronous use cases onto Kafka-driven flows
Document extraction, summarization after case closure, enrichment after event arrival—move these out of request threads and into durable event processing.
Step 5: Establish reconciliation
Create review queues, correction UIs, lineage IDs, replay support, and event models for accepted versus suggested outputs.
Step 6: Refactor domain context products
As teams gain confidence, replace ad hoc data gathering with explicit context contracts owned by domains.
Step 7: Retire direct model integrations
Only after policy, observability, and reconciliation are in place should teams remove direct provider dependencies from applications.
Reconciliation discussion
Reconciliation is where many AI architectures quietly fail.
A generated summary may be edited by a human. An extracted field may be corrected. A recommendation may be accepted, overridden, or ignored. If you overwrite the original state without lineage, you lose the ability to learn, audit, or replay. If you treat suggestions as final truth, you invite silent corruption.
A robust model uses at least three states:
- generated suggestion
- human or rule-validated acceptance
- committed business fact
The routing layer should assign correlation IDs and version context inputs. Domain workflows should record whether outputs were accepted or corrected. Kafka topics can then carry both generated and reconciled events.
This is especially important during migration. For a while, old and new paths may coexist. Reconciliation becomes the mechanism that lets you compare outcomes safely.
Enterprise Example
Consider a global insurer modernizing claims operations across property, auto, and travel lines.
They began, as many do, with local experiments. The call center had a summarization bot. Property claims had document extraction using one cloud provider. Auto had a fraud note classifier built by a data science team. Travel claims used another vendor entirely for multilingual response drafting. Every team was productive. Every team was also creating a future problem.
The same claimant data was routed differently by line of business. Some flows redacted medical details; others did not. Several use cases embedded provider-specific prompts inside microservices. There was no common lineage model. Case workers corrected AI outputs every day, but those corrections disappeared into screens rather than feeding improvement loops. Security found outbound data patterns they could not explain with confidence.
The insurer did not need “one model.” It needed one routing strategy.
What they built
They introduced an inference routing layer between claims applications and model providers.
The claims domain defined context products:
- FNOLContext
- ClaimDocumentContext
- AdjusterCaseContext
- FraudReferralContext
A Kafka backbone already existed, so they used events aggressively:
- ClaimOpened
- DocumentReceived
- CaseAssigned
- InvestigationRequested
- ClaimClosed
The routing layer subscribed to these events and triggered different inference topologies:
- OCR and extraction on DocumentReceived
- summarization on CaseAssigned
- triage recommendation on InvestigationRequested
- closure summary on ClaimClosed
For real-time agent assist, the call center UI invoked an intent like draft_claimant_explanation. The routing layer assembled only the context allowed for that jurisdiction, selected a lower-latency model for common interactions, and escalated to a stronger model only when confidence was low or the conversation involved policy interpretation.
Why it worked
Because domain semantics stayed with the claims teams.
The central platform team did not define what a reserve adjustment meant or which investigation reasons mattered. It handled redaction, provider switching, audit logs, prompt runtime, and normalization. Claims teams owned acceptance logic and review workflows.
The key operational improvement
They created a reconciliation model in the case management system:
- AI suggestion stored as suggestion
- adjuster edits captured separately
- approved case summary committed as business record
- acceptance/correction event emitted for analytics
Within six months, they could answer questions that had previously been impossible:
- Which use cases save handling time by claim type?
- Which jurisdictions require a different routing policy?
- Where are human corrections concentrated?
- Which providers create cost spikes without corresponding business value?
- Which context fields correlate with bad recommendations?
That is the kind of visibility enterprises actually need. Not “our prompt score improved by 12%.”
Operational Considerations
Operational excellence here is not just uptime. It is controlled meaning under production stress.
Observability
Track technical and business metrics together:
- latency by route and use case
- token and provider cost
- retry rates
- fallback frequency
- policy denials
- confidence distribution
- human correction rates
- business outcome deltas
A route that is cheap and fast but repeatedly corrected is not a good route.
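As a toy illustration of that point — with an entirely made-up weighting — a route score that counts human corrections against a route will rank a cheap-but-wrong route below a pricier-but-right one:

```python
def route_score(cost_per_call: float, latency_ms: float, correction_rate: float) -> float:
    """Lower is better. The weights are placeholders; the shape is what matters:
    a high human-correction rate must dominate cheapness."""
    return cost_per_call + latency_ms / 1000 + correction_rate * 10

cheap_but_wrong = route_score(cost_per_call=0.001, latency_ms=300, correction_rate=0.4)
pricier_but_right = route_score(cost_per_call=0.02, latency_ms=900, correction_rate=0.05)
```

Any real scoring function would be calibrated per use case, but it must include the business-correction signal, not just latency and tokens.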
Caching and replay
Context caching can reduce latency and cost, but stale context can poison outputs. Cache immutable or slow-changing enrichment aggressively. Be more careful with operational state.
Replay matters for incidents, audits, and migrations. Version prompts, tools, context schemas, and policy sets so you can explain why a result occurred.
Security and privacy
Put policy enforcement before provider invocation, not after. Redact or tokenize sensitive fields based on domain and jurisdiction. Keep provider-specific legal constraints outside application code.
Resilience
Models time out. Providers throttle. Retrieval stores go stale. Tool calls fail halfway through. Your routing layer needs:
- circuit breakers
- provider failover
- degraded modes
- asynchronous fallback
- dead-letter queues for event-driven flows
Graceful degradation is underrated. Sometimes the right answer is a rules-only response plus a message that richer analysis is pending.
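A minimal circuit-breaker sketch with a degraded fallback; the threshold and the rules-only response are placeholders, not tuned values:

```python
class CircuitBreaker:
    """Open the circuit after repeated failures; serve a degraded route instead."""
    def __init__(self, failure_threshold: int = 3) -> None:
        self.failures = 0
        self.failure_threshold = failure_threshold

    def call(self, primary, degraded):
        if self.failures >= self.failure_threshold:
            return degraded()          # circuit open: skip the provider entirely
        try:
            result = primary()
            self.failures = 0          # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            return degraded()

breaker = CircuitBreaker(failure_threshold=2)

def flaky_model():
    raise TimeoutError("provider throttled")

def rules_only():
    return "rules-only answer; richer analysis pending"

answers = [breaker.call(flaky_model, rules_only) for _ in range(3)]
```

After the second failure the third call never touches the provider; the caller still gets a bounded, honest response.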
Product operating model
Treat major intents as products with owners, SLAs, and domain stewards. “Case summarization” is not a prompt; it is an operational capability.
Tradeoffs
There is no free lunch here. Just better bills.
Central routing layer adds governance and consistency
Good.
It also adds another platform dependency
Also true.
If badly designed, it becomes a queue of requests waiting on a central team, or a pseudo-ESB that swallows domain logic. If over-governed, teams will route around it. If under-governed, it solves nothing.
Intent abstraction protects you from model churn
Good.
It may hide useful provider-specific features
Also true.
Sometimes a domain genuinely benefits from a specific tool-calling behavior or model capability. The answer is not to expose raw providers everywhere; it is to allow controlled escape hatches with explicit ownership.
Event-driven enrichment improves resilience
Good.
It introduces eventual consistency
Also true.
Some business users hate waiting for enrichment to arrive. You need clear workflow design and service expectations.
Reconciliation improves trust and learning
Good.
It adds process overhead
Also true.
Every correction loop requires UI, data design, and ownership. But skipping it simply moves the cost into operational confusion.
Failure Modes
Most platforms fail in predictable ways.
1. The platform team captures domain logic
Now every change to claims or onboarding semantics requires a central backlog ticket. Delivery slows. Domain teams bypass the platform.
2. The platform is only a thin provider proxy
You centralize SDK calls but not policy, lineage, context assembly, or reconciliation. You get little benefit and plenty of ceremony.
3. One giant context object
It seems convenient. It becomes a privacy risk and a semantic junk drawer.
4. No distinction between suggestion and fact
Generated output is written directly into systems of record. Corrections happen later, if ever. Trust collapses after the first incident.
5. Metrics without business truth
The dashboard celebrates low latency while operations teams quietly ignore the outputs.
6. Migration stalls at adapters
Teams wrap direct model calls but never move routing policy or reconciliation into the platform. You end up with a prettier version of the old mess.
7. Kafka topics become AI exhaust pipes
The routing layer emits poorly defined “AIResult” events with unclear ownership or semantics. Downstream consumers guess what they mean. That never ends well.
When Not To Use
This pattern is not universal.
Do not build a full inference routing layer when:
- you have a single low-risk use case with limited data sensitivity
- the application is isolated and unlikely to spread
- domain semantics are trivial
- there is no meaningful need for provider switching or audit
- a small team can own the end-to-end workflow without cross-enterprise reuse
A departmental prototype, an internal coding assistant for a single engineering group, or a one-off content generation tool may not deserve this machinery.
Also avoid it if your organization lacks even basic domain boundaries. If “customer data” is still a political argument rather than a managed concept, an AI routing platform will not fix your operating model. It will merely expose its weaknesses.
Architecture cannot compensate indefinitely for organizational ambiguity.
Related Patterns
A few patterns sit close to this one.
API Gateway
Useful at channel ingress, but too shallow for full inference orchestration. Good for authentication and routing, insufficient for context semantics and reconciliation.
Event-Driven Architecture
Essential for asynchronous inference and enrichment. Kafka is often the right backbone when AI should react to business events rather than hijack user requests.
Backend for Frontend
Helpful when AI experiences differ by channel, but it should call intent-based inference capabilities rather than own model orchestration itself.
Strangler Fig Migration
The right migration style for scattered AI integrations. Replace mechanics incrementally while preserving domain workflows.
Domain-Driven Design
Absolutely central. Bounded contexts should define inference inputs and acceptance semantics. Without DDD thinking, AI platforms become integration mud.
Human-in-the-Loop Workflow
Not a concession. A serious architectural component for high-risk or low-confidence routes.
Data Products
Relevant, but context products for inference need stronger attention to timeliness, redaction, and workflow semantics than many generic analytical data products provide.
Summary
Enterprise AI is not primarily a model problem. It is a routing problem shaped by semantics, policy, timing, and correction.
The winning architecture is not one where every application talks directly to a model, nor one where a central AI team hoards domain logic. It is one where domains define meaning and workflows, while a shared inference routing layer handles the mechanics of getting governed context to the right inference path and bringing outputs back safely.
Think in intents, not model calls.
Think in context products, not giant payloads.
Think in Kafka events where enrichment belongs off the request path.
Think in progressive strangler migration, not heroic rewrites.
Think in reconciliation, because generated output is not business truth.
If you do that, the platform becomes something useful and durable. Not an AI fashion accessory, but an operational backbone for probabilistic computing inside a real enterprise.
And that, in the end, is the point.
The model may be clever. The architecture must be wiser.