Your AI Platform Is Just ETL with Better Marketing

Every few years, the industry reinvents a very old machine and gives it a better suit.

Yesterday it was business intelligence. Then big data. Then data lakehouse. Now it is AI platform. The decks get prettier. The budgets get larger. The promises get grander. But if you strip away the orchestration layer, the vector store, the prompt templates, the model gateway, and the breathless keynote language, a surprising amount of what people call an AI platform is still a pipeline that extracts data, transforms it into something useful, and loads it somewhere else for consumption.

That is not a criticism. It is the point.

The trouble starts when organizations pretend they have built something fundamentally new and therefore exempt from the disciplines that made enterprise integration work in the first place. They forget lineage. They ignore reconciliation. They bypass domain ownership. They shove every data shape into a “feature store” or “knowledge layer” and hope embeddings will erase semantic ambiguity. Then they act shocked when the model hallucinates, the retrieval pipeline goes stale, downstream services diverge, and nobody can explain which version of “customer” the AI assistant just used to deny a claim or approve a loan.

An AI platform is not magic. It is a distributed data system with probabilistic components. Which means all the old problems return, just wearing different badges: consistency, identity, latency, ownership, quality, observability, backfills, retries, schema drift, and failure isolation. Add models and prompts, and you do not remove enterprise architecture. You need more of it.

This is where a lot of teams lose the plot. They design around models instead of domains. They optimize for experimentation but not for operational truth. They build a magnificent prompt pipeline on top of deeply ambiguous business semantics. The result is a machine that is very good at sounding confident while being organizationally untrustworthy.

The better way is almost boring. Start with domain-driven design. Treat AI capabilities as consumers and producers of business meaning, not as a magical layer floating above the enterprise. Design your platform the way you would design any serious integration architecture: bounded contexts, explicit contracts, event streams where they fit, reconciliation where they must exist, and migration paths that assume the legacy estate will fight back.

In other words: if your AI platform cannot survive the same scrutiny as an ETL estate, it is not a platform. It is a demo environment with a budget.

Context

The enterprise AI wave arrived in three rough acts.

First came experimentation: notebooks, isolated model endpoints, and a small skunkworks team that could summarize documents and classify support tickets. Then came centralization: platform teams introduced gateways, prompt registries, vector databases, policy controls, and reusable components. Finally came industrialization: business units wanted AI embedded into workflows, customer journeys, case management, and operational decisioning.

That third act changes everything.

A toy summarization demo can tolerate semantic fuzziness. A production underwriting workflow cannot. A retrieval-augmented chatbot for internal FAQs can survive stale data for a day. A copilot helping a call center issue refunds probably cannot. Once AI participates in operational systems, all the old enterprise concerns stop being optional.

The irony is that organizations often know how to solve these concerns. They solved them in data warehousing, service-oriented architecture, master data management, event-driven integration, and reporting estates. But because AI is sold as exceptional, teams frequently bypass hard-earned lessons and build one more shadow integration stack.

The result looks modern:

  • source systems feeding Kafka topics
  • CDC streams from operational databases
  • normalization services in microservices
  • enrichment jobs
  • embedding pipelines
  • feature stores
  • vector indexes
  • model gateways
  • agent orchestrators
  • response caches
  • feedback loops

And yes, that stack is useful. But it is still a pipeline architecture. The novelty is not the existence of data movement. The novelty is that one of the consumers is a probabilistic model and one of the transformations is semantic encoding.

That distinction matters, but not as much as vendors want you to believe.

Problem

The central problem is simple: enterprises are building AI platforms as if data semantics were downstream concerns.

They are not. They are upstream architecture.

If the meaning of customer, order, case, policy, account, product, entitlement, risk, or incident is unresolved across the enterprise, then the AI platform will not unify it. It will merely ingest the ambiguity faster. Models are excellent at compressing patterns. They are terrible substitutes for explicit business meaning.

This is why so many AI initiatives produce impressive prototypes and disappointing operations.

Common symptoms show up early:

  • vector indexes built from conflicting source documents
  • duplicated customer profiles from multiple CRMs
  • prompts relying on data fields with unclear lineage
  • model outputs that cannot be reconciled to business transactions
  • retrieval pipelines returning “true enough” answers that are operationally false
  • downstream teams unable to explain how a recommendation was formed
  • platform teams becoming accidental owners of business semantics they do not understand

Underneath each symptom is a familiar architectural smell: integration without bounded context discipline.

The platform turns into a semantic landfill. Everything goes in. Nothing is truly resolved. Embeddings become a kind of technical perfume sprayed over data quality problems.

That works until the AI moves from assistance to action.

The moment an AI workflow updates a claim reserve, generates a legal notice, recommends fraud intervention, prioritizes care management, or influences pricing, you need more than retrieval quality. You need business accountability. You need to know whether the data is current, complete, canonical enough for the use case, and reconcilable with systems of record.

If this sounds like ETL thinking, good. It is.

Forces

There are several forces pulling the architecture in different directions, and pretending otherwise is where bad designs are born.

Speed versus semantic integrity.

AI teams want rapid experimentation. Domain teams want correctness. Both are right. A platform that requires six months of data governance before any prototype appears will lose political support. A platform that ships in six weeks with no domain contracts will lose operational trust.

Central platform versus domain ownership.

Platform teams naturally want reusable infrastructure: model gateways, observability, document pipelines, vector stores, policy controls. But domain knowledge lives in business-aligned systems and teams. The platform should provide paved roads, not define the meaning of policy coverage or credit exposure.

Streaming versus reconciliation.

Kafka and event-driven microservices are excellent for propagating change and decoupling systems. They are not magic. Out-of-order events, duplicates, missed publications, schema evolution, and compensating actions happen in real enterprises every day. If your AI platform relies purely on streams without periodic reconciliation to authoritative sources, drift will creep in.

Probabilistic outputs versus deterministic workflows.

A model may produce useful suggestions with confidence scoring. Operational systems often need explicit yes/no state transitions with auditability. Somewhere in the architecture you must convert fuzzy recommendation into governed business action.
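One way to sketch that conversion — thresholds and names here are illustrative assumptions, not a prescribed design — is a small function that maps a model's confidence into one of three explicit, auditable states, with an abstain band routed to a human or a rules engine:

```python
from dataclasses import dataclass

# Illustrative thresholds; in practice these are governed artifacts,
# reviewed and versioned like any other business rule.
AUTO_APPROVE = 0.95
AUTO_REJECT = 0.20

@dataclass(frozen=True)
class Decision:
    state: str           # explicit state transition, not a raw score
    score: float         # retained for audit
    policy_version: str  # which thresholds produced this decision

def to_state(score: float, policy_version: str = "v1") -> Decision:
    """Convert a probabilistic score into a deterministic, auditable state."""
    if score >= AUTO_APPROVE:
        return Decision("approved", score, policy_version)
    if score <= AUTO_REJECT:
        return Decision("rejected", score, policy_version)
    # The abstain band is where governed human review lives.
    return Decision("needs_review", score, policy_version)
```

The point of the abstain band is that the workflow never consumes a raw probability; it consumes a named state that auditors and downstream services can reason about.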

Global knowledge layer versus bounded contexts.

Executives love the idea of a single enterprise knowledge graph or unified AI brain. Architects should be suspicious. Some shared reference data is useful. A universal semantic model for every domain usually becomes political fiction encoded in software.

Here is the heart of it: architecture is the art of preserving optionality while protecting meaning. AI platforms fail when they maximize optionality for experimentation and accidentally destroy meaning.

Solution

The strongest architecture is to treat the AI platform as an evolution of enterprise data integration, not a replacement for it.

That means a few opinionated decisions.

First, organize around domain-driven design. Each bounded context owns its business language, source truth, and event contracts. The AI platform does not invent a universal model of the enterprise. It integrates through explicit context mappings. “Customer” in support, billing, and risk may overlap, but they are not automatically the same thing. If they need alignment, do it intentionally.

Second, separate operational truth from AI-ready projections. Systems of record remain where business commitments are made. The AI platform creates derived views, embeddings, features, knowledge indexes, and prompts from domain-owned data products. That sounds obvious, but many teams quietly allow the AI layer to become an ungoverned shadow master.

Third, combine streaming ingestion with reconciliation loops. Kafka is useful for near-real-time propagation. It is not enough on its own. Every serious platform needs periodic re-sync, replay, drift detection, and completeness checks against authoritative stores. Real systems drop messages, miss CDC windows, and survive partial outages. Reconciliation is not bureaucracy. It is how you stop confidence theater.

Fourth, define an explicit decision boundary. Models can advise, rank, classify, summarize, and propose. Domain services should commit transactions. If an LLM suggests a refund amount, a billing or case-management service should own the actual state transition. This protects auditability and keeps business rules in the domain where they belong.
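The refund example can be sketched as a minimal ownership split — class and rule names are hypothetical, and the business rules stand in for whatever the real billing domain enforces. The model may only construct a proposal object; the domain service owns the commit:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RefundProposal:
    """What the model is allowed to produce: a suggestion, not a transaction."""
    case_id: str
    amount: float
    rationale: str

class BillingService:
    """Owns the state transition; the model never writes billing state."""
    def __init__(self, refund_limit: float = 500.0):
        self.refund_limit = refund_limit
        self.ledger: list[tuple[str, float]] = []
        self.audit: list[str] = []

    def apply(self, proposal: RefundProposal, approved_by: str) -> bool:
        # Business rules live here, in the domain, not in the prompt.
        if proposal.amount > self.refund_limit:
            self.audit.append(f"rejected {proposal.case_id}: over limit")
            return False
        self.ledger.append((proposal.case_id, proposal.amount))
        self.audit.append(f"refund {proposal.case_id} approved by {approved_by}")
        return True
```

Because the proposal type is frozen and carries a rationale, every accepted or rejected suggestion leaves an audit trail in the service that actually owns the money.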

Fifth, build AI services as consumers of domain events and publishers of recommendations, not as hidden workflow owners. This is the difference between augmentation and accidental platform overreach.

A useful mental model is this:

  • ETL moves and reshapes data for reporting or downstream use.
  • AI pipelines move and reshape data for probabilistic inference and contextual generation.
  • Enterprise architecture discipline applies to both.
  • The extra challenge in AI is semantic and operational uncertainty, not freedom from integration rules.

Architecture

A practical enterprise architecture has four layers:

  1. Systems of record and operational services
  2. Integration backbone
  3. AI preparation and inference services
  4. Experience and action channels

The backbone can be event-driven, batch, or hybrid. In most enterprises, it is hybrid because reality is hybrid.

Diagram 1
Architecture

This diagram matters because it shows a crucial separation: AI inference is not the destination of the architecture. Domain microservices still own the business action.

Domain semantics first

Suppose an insurance enterprise has claims, policy administration, billing, and customer support. Each bounded context has different meanings:

  • Claims cares about incident, claimant, reserve, adjuster, fraud indicators.
  • Policy administration cares about coverage, endorsements, insured assets, effective dates.
  • Billing cares about invoicing, delinquency, payment plans.
  • Support cares about contact history, complaint category, service obligations.

If the AI platform ingests all of this into a broad “customer knowledge base” without preserving context, it will produce polished nonsense. A claimant is not always the policyholder. A policy’s active state may differ from billing standing. A support promise may not alter claims liability. Those distinctions are not details. They are the business.

A domain-driven AI architecture therefore creates context-specific data products and context mappings rather than a universal bucket.

Diagram 2
Domain semantics first

Notice what is absent: no fake enterprise super-entity trying to flatten every concept into one canonical object. Canonical models are useful for narrow integration seams. They become dangerous when used as a substitute for domain language.

Event-driven where relevant

Kafka earns its place when:

  • source systems emit meaningful business events
  • consumers need near-real-time updates
  • multiple downstream services require decoupled consumption
  • replay is valuable for rebuilding projections or indexes

Good event contracts describe domain facts, not database trivia. ClaimRegistered, PaymentMissed, PolicyEndorsed, CaseAssigned are useful. RowUpdatedInTableX is an integration tax dressed up as architecture.
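A sketch of what such a contract might look like — the event names come from the list above, while the fields and versioning scheme are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ClaimRegistered:
    """A domain fact: something the business agrees happened."""
    claim_id: str
    policy_id: str
    incident_date: str
    registered_at: datetime
    schema_version: str = "1.0"

@dataclass(frozen=True)
class PaymentMissed:
    account_id: str
    invoice_id: str
    due_date: str
    observed_at: datetime
    schema_version: str = "1.0"

# Contrast: a "RowUpdatedInTableX" payload would force every consumer
# to reverse-engineer business meaning from storage details.

evt = ClaimRegistered("CLM-1", "POL-9", "2024-03-01",
                      datetime.now(timezone.utc))
```

The explicit `schema_version` field matters: consumers of a probabilistic pipeline need to know which contract revision produced the facts they indexed.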

That said, event streams should feed AI projections, not replace source accountability. An embedding index built from events can go stale or incomplete. So can a feature store. This is why event-driven pipelines need periodic rehydration from authoritative stores.

Reconciliation is non-negotiable

This is the part AI platform vendors rarely put on the first slide because it sounds less glamorous than “autonomous intelligence.”

But ask anyone who has operated a serious distributed platform: reconciliation is what keeps you out of front-page incidents.

You need at least three kinds:

  • Data completeness reconciliation: Did all expected records/events arrive?
  • Semantic reconciliation: Do projected AI views still align with current source truth?
  • Action reconciliation: Did AI recommendations that were accepted actually result in the intended business transaction?

Without these loops, your platform accumulates invisible entropy. Search indexes lag. Features drift. Summaries reference outdated facts. Recommended actions disagree with transactional state.

Here is the ugly truth: if you cannot answer “what was the authoritative business state when this model made that recommendation?” then you do not have enterprise AI. You have theater.
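The three reconciliation kinds above can each be reduced to a simple set comparison in the degenerate case. This sketch uses tiny in-memory dictionaries as stand-ins for the projection and the authoritative store; real implementations would page through databases and topics, but the shape of the checks is the same:

```python
def completeness_check(expected_ids, received_ids):
    """Data completeness: did all expected records/events arrive?"""
    return sorted(set(expected_ids) - set(received_ids))

def semantic_check(source, projection):
    """Semantic: do projected AI views still match current source truth?"""
    return sorted(k for k in source if projection.get(k) != source[k])

def action_check(accepted_recommendations, committed_transactions):
    """Action: did accepted recommendations become the intended transaction?"""
    return sorted(set(accepted_recommendations) - set(committed_transactions))

# Tiny worked example: one account has silently drifted in the projection.
source = {"acct-1": "active", "acct-2": "closed"}
projection = {"acct-1": "active", "acct-2": "active"}  # stale
drift = semantic_check(source, projection)
```

Each function returns the discrepancy set rather than a boolean, because an operable reconciliation loop needs to know *what* drifted, not merely that something did.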

Diagram 3
Reconciliation is non-negotiable

Migration Strategy

Most enterprises cannot stop the world and rebuild their integration landscape around AI. Nor should they. The winning move is a progressive strangler migration.

Start with one narrow, high-value use case inside a bounded context. Not “enterprise assistant.” Something concrete: claims document summarization, support case triage, invoice dispute classification, contract clause extraction. Build the AI flow as a projection off existing systems, with explicit domain ownership and reconciliation.

Then extend.

A practical migration usually follows these stages:

1. Wrap legacy sources, do not replace them

Use CDC, APIs, file drops, or integration adapters to expose legacy truth. The first objective is observability and usable domain events, not immediate modernization.

2. Create domain-owned AI-ready projections

For each target use case, build a projection optimized for retrieval or inference. Redact sensitive fields where needed. Preserve lineage. Tag source timestamps. Record transformation versions.
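A hedged sketch of that stage — the field names, redaction list, and version tag are hypothetical placeholders for whatever the real domain requires:

```python
from datetime import datetime, timezone

TRANSFORM_VERSION = "proj-claims-1.4"      # hypothetical version tag
REDACTED_FIELDS = {"ssn", "bank_account"}  # illustrative redaction policy

def build_projection(record: dict, source_system: str) -> dict:
    """Build an AI-ready view: redacted, timestamped, lineage-tagged."""
    body = {k: v for k, v in record.items() if k not in REDACTED_FIELDS}
    return {
        **body,
        "_lineage": {
            "source_system": source_system,
            "source_ts": record.get("updated_at"),
            "projected_at": datetime.now(timezone.utc).isoformat(),
            "transform_version": TRANSFORM_VERSION,
        },
    }

doc = build_projection(
    {"claim_id": "CLM-1", "ssn": "000-00-0000", "updated_at": "2024-03-01"},
    source_system="claims-core",
)
```

The `_lineage` envelope is the cheap insurance: when the index answers a question wrongly six months later, you can say which source version and which transformation produced the offending document.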

3. Introduce AI services beside existing workflows

Do not rip out deterministic workflow engines on day one. Let AI suggest, classify, rank, or summarize alongside existing manual or rules-driven paths. Measure before automating.

4. Add reconciliation before scaling

This is where impatient programs fail. Once the first use case works, everybody wants ten more. Pause and add replay, drift monitoring, completeness checks, and fallback paths. Scale after trust, not before.

5. Strangle old manual integration paths selectively

As confidence grows, retire brittle file-based handoffs, duplicate indexing jobs, or bespoke point-to-point enrichments. Replace them with domain events and reusable projection pipelines.

6. Move from recommendation to governed action

Only when outputs are measurable, explainable enough, and operationally reconciled should the architecture permit higher autonomy. Even then, keep domain services as transaction owners.

Migration is not merely technical. It is organizational. Domain teams must own semantics. Platform teams own common capabilities. Risk, compliance, and operations must be in the room early because retrospective governance is a fantasy.

Enterprise Example

Consider a global bank modernizing its customer operations platform.

The bank has:

  • a legacy core banking system
  • a separate CRM for relationship managers
  • a case management tool for service operations
  • Kafka already in place for event streaming
  • dozens of microservices around onboarding, fraud, payments, and communications

The AI ambition sounds familiar: create a “360-degree customer copilot” for call-center agents and relationship managers.

A naive approach would dump CRM notes, account data, product documents, policy files, and support cases into a giant vector index, layer an LLM on top, and declare victory. Many organizations do exactly this. It demos beautifully. It also fails in the first serious audit.

Why? Because “customer” is not one thing.

In retail banking, there are household relationships, legal entities, beneficial owners, product holders, authorized users, vulnerable customer markers, sanctions flags, marketing consent states, and active dispute records. Those are distinct concepts with different owners and regulatory implications. A generic semantic search over all of them is dangerous.

The better architecture starts with bounded contexts:

  • Customer onboarding owns KYC and identity verification semantics.
  • Accounts owns product holdings and balances.
  • Servicing owns case and interaction history.
  • Fraud owns investigation markers and intervention policies.
  • Communications owns approved outbound content and consent constraints.

The bank builds domain event streams such as:

  • CustomerVerified
  • AccountOpened
  • CaseCreated
  • DisputeRaised
  • ConsentUpdated

AI projections are then created per use case:

  • agent assist retrieval index for servicing
  • fraud summarization context for investigators
  • relationship-manager briefing summaries for commercial banking

Crucially, the assistant does not directly update customer records. It proposes next-best actions and drafts summaries. A servicing microservice commits the actual case status or outbound communication after policy checks.

Reconciliation jobs compare:

  • case summaries against current case state
  • account snapshots against core banking balances
  • consent state in the retrieval index against authoritative communications records

This architecture delivers something mundane and powerful: trust. Agents get useful assistance without the bank betting its control environment on a vector database pretending to be a system of record.

That is how enterprise AI should feel. Less magic. More reliability.

Operational Considerations

Production AI platforms live or die in operations, not architecture diagrams.

Observability

You need end-to-end lineage:

  • source record version
  • event offset or ingestion batch
  • transformation version
  • embedding model version
  • prompt template version
  • inference model version
  • recommendation ID
  • user acceptance or override outcome

Without this, incident response becomes folklore.
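The lineage list above can be captured as one record attached to every recommendation. This is a sketch, not a schema standard — the field values are invented identifiers showing the shape such a record might take:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class RecommendationLineage:
    """Everything needed to reconstruct how one recommendation was formed."""
    recommendation_id: str
    source_record_version: str
    event_offset: str
    transform_version: str
    embedding_model: str
    prompt_template: str
    inference_model: str
    outcome: str  # accepted / overridden / ignored

rec = RecommendationLineage(
    recommendation_id="rec-42",
    source_record_version="case-9:v17",
    event_offset="cases-topic:3:88120",
    transform_version="proj-2.1",
    embedding_model="embed-x-2024-01",
    prompt_template="triage-v5",
    inference_model="llm-y-2024-06",
    outcome="accepted",
)
# asdict(rec) is what you attach to logs and audit trails.
```

When an incident review asks "why did the system recommend this?", the answer becomes a lookup rather than an archaeology project.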

Security and privacy

AI pipelines often aggregate data across contexts that were never previously co-located. This creates fresh risk. Redaction, tokenization, policy enforcement, tenant isolation, and context-aware authorization matter. Retrieval should respect business permissions, not merely technical connectivity.
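Permission-aware retrieval can be as simple as tagging every indexed item with its bounded context and filtering candidates against the caller's entitlements before ranking. The context names below are illustrative:

```python
def retrieve(query_hits: list[dict], caller_permissions: set[str]) -> list[dict]:
    """Filter retrieval candidates by business permission, not merely
    technical reachability. Each hit carries the bounded context it came
    from; the caller carries the contexts they are entitled to see."""
    return [h for h in query_hits if h["context"] in caller_permissions]

hits = [
    {"doc": "case history", "context": "servicing"},
    {"doc": "fraud marker", "context": "fraud"},
]
# A servicing agent should never see fraud investigation markers.
visible = retrieve(hits, caller_permissions={"servicing"})
```

The filter runs before results reach the prompt, so an unauthorized document can never leak into generated text, regardless of how well it scored on similarity.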

Prompt and model governance

Treat prompts, routing rules, and model selection as governed artifacts. A prompt change can alter business behavior as surely as a rules-engine release. Version it accordingly.
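A minimal sketch of what "version it accordingly" might mean in practice — the registry design and prompt names are assumptions, not a reference implementation:

```python
import hashlib

class PromptRegistry:
    """Treat prompts as governed, versioned artifacts."""
    def __init__(self):
        self._prompts: dict[str, dict[str, str]] = {}

    def register(self, name: str, version: str, template: str) -> str:
        digest = hashlib.sha256(template.encode()).hexdigest()[:12]
        self._prompts.setdefault(name, {})[version] = template
        return digest  # pin this hash in release notes and incident logs

    def get(self, name: str, version: str) -> str:
        # Consumers must request an explicit version; no silent "latest".
        return self._prompts[name][version]

reg = PromptRegistry()
h = reg.register("refund-triage", "v3", "Summarize the dispute: {case}")
```

Requiring an explicit version on every `get` is the design choice that matters: it makes a prompt change a deliberate release, not an invisible behavior shift.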

Cost discipline

Embedding every document, retaining every conversation, and invoking large models for trivial tasks is architectural laziness with a cloud invoice attached. Use smaller models where possible. Cache stable outputs. Rebuild indexes selectively. Design retrieval for precision, not vanity scale.

Human override

Operational AI needs visible override paths. If agents or analysts cannot challenge or correct the recommendation, the organization will either mistrust the tool or over-trust it. Neither ends well.

Tradeoffs

No architecture escapes tradeoffs, and good architects say them out loud.

Centralized platform services increase reuse but can slow domain autonomy.

A common model gateway, prompt registry, and observability stack are good ideas. A centralized semantic model for all business concepts usually is not.

Event-driven architecture reduces coupling but increases temporal complexity.

Kafka is excellent for propagation and replay. It also introduces ordering, idempotency, and lag concerns that batch-minded teams often underestimate.

Reconciliation improves trust but adds cost and latency.

Periodic rebuilds, audits, and state comparisons consume money and engineering effort. Skip them and you will pay later in incidents.

Domain-specific projections preserve meaning but duplicate data.

That duplication is often healthy. Shared truth does not require a single physical representation for every use case.

Keeping AI out of transactional ownership reduces risk but can limit autonomy.

That is fine. Most enterprises need useful augmentation before they need autonomy.

Failure Modes

The most common failure modes are painfully consistent.

1. The semantic swamp

Everything gets indexed. Nothing gets clarified. Users receive plausible but conflicting answers because the platform collapsed multiple bounded contexts into one retrieval surface.

2. Streaming fundamentalism

Teams assume Kafka makes the platform “real time” and therefore correct. Meanwhile events are missed, replay logic is weak, and no reconciliation exists. Silent drift accumulates.

3. Shadow master data

The AI platform becomes the easiest place to look up “customer truth,” so teams quietly start relying on it operationally. Soon the projection outranks the source in practice, and governance collapses.

4. Automation theater

Recommendations are pushed directly into workflow updates without enough policy checks, explainability, or domain validation. Small errors become scaled errors.

5. Platform overreach

The central AI team starts defining business entities because no one else moved fast enough. This creates organizational resentment and semantic fragility at the same time.

6. No rollback story

A bad model release, corrupted embedding run, or prompt regression occurs, and nobody can reconstruct prior state. Enterprise systems need rollback and replay plans, not just hope.

When Not To Use

This approach is not always appropriate.

Do not build a full AI platform with domain projections, Kafka streams, and reconciliation loops if:

  • the use case is isolated and low-risk
  • the data is small, static, and local to one team
  • there is no meaningful operational consequence
  • a simpler search or rules solution solves the problem
  • the organization lacks basic data ownership and will not fund it

Sometimes a department tool with straightforward ETL and a narrow inference service is enough. Not every problem deserves a platform. In fact, “platform” is one of the most overused words in enterprise IT. If you have one use case, one team, and one data source, you probably have an application, not a platform.

Also, if your enterprise has not resolved foundational data ownership issues, an AI platform will not save you. It will expose them at scale.

Related Patterns

Several adjacent patterns fit naturally here.

CQRS helps separate write models from AI-ready read projections.

Event sourcing can support replayable histories, though it should be used carefully and not as a religion.

Data mesh contributes the idea of domain-owned data products, provided it is grounded in real governance rather than slideware.

Strangler fig migration is the right instinct for replacing brittle integration paths gradually.

Anti-corruption layers are especially valuable when legacy systems expose ugly semantics that should not leak into AI consumers.

Outbox pattern improves reliability when publishing domain events from transactional systems.
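The outbox idea can be sketched in a few lines. This uses an in-memory SQLite database as a stand-in for the transactional store, and a polling function as a stand-in for the relay that forwards rows to a broker — both are simplifying assumptions:

```python
import sqlite3

# The business write and the event write share one transaction, so an
# event row exists if and only if the state change actually committed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
             "event_type TEXT, payload TEXT, published INTEGER DEFAULT 0)")

def register_claim(claim_id: str) -> None:
    with conn:  # one atomic transaction for both writes
        conn.execute("INSERT INTO claims VALUES (?, 'registered')", (claim_id,))
        conn.execute("INSERT INTO outbox (event_type, payload) "
                     "VALUES ('ClaimRegistered', ?)", (claim_id,))

def drain_outbox() -> list[tuple]:
    """A relay process polls unpublished rows and forwards them to Kafka."""
    rows = conn.execute("SELECT seq, event_type, payload FROM outbox "
                        "WHERE published = 0 ORDER BY seq").fetchall()
    conn.execute("UPDATE outbox SET published = 1 WHERE published = 0")
    conn.commit()
    return rows

register_claim("CLM-7")
events = drain_outbox()
```

This closes the classic dual-write gap: without the outbox, a crash between "commit the claim" and "publish the event" leaves the AI projection silently out of sync with the system of record.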

Saga patterns matter when AI recommendations trigger multi-step business processes that require compensations.

These patterns are useful because they reinforce the same principle: preserve domain meaning, control coupling, and make failure survivable.

Summary

An AI platform is not beyond ETL. It is ETL plus semantics, inference, and uncertainty.

That is enough novelty to matter, but not enough novelty to ignore the last thirty years of enterprise architecture. If anything, AI makes old disciplines more valuable. Bounded contexts matter more because language gets blurrier. Reconciliation matters more because projections multiply. Event contracts matter more because downstream consumers now include probabilistic systems. Migration strategy matters more because you cannot pause the enterprise while you experiment.

So yes: your AI platform is just ETL with better marketing.

Say it without embarrassment. ETL built every serious reporting and integration estate most enterprises rely on today. The point is not that AI is unimpressive. The point is that enterprise reliability comes from respecting the boring machinery beneath the shiny layer.

Build AI as a domain-aware projection and decision-support system.

Keep transactional truth in domain services.

Use Kafka where it buys decoupling, not as theology.

Reconcile relentlessly.

Migrate progressively with a strangler approach.

And never let embeddings pretend they solved business semantics.

The best enterprise architectures are not the ones that sound futuristic. They are the ones that survive contact with accounting, operations, regulators, and Monday morning.

That is the bar.
