Most analytics platforms are built with the emotional posture of a tax auditor. They arrive after the fact, inspect what happened, and produce judgments in dashboards, cubes, extracts, and slide decks. That mindset is the original sin.
An analytics platform is not the center of the enterprise. It is not the source of truth. It is not where business meaning is born. It is, in architectural terms, a consumer layer.
That sounds obvious until you walk into a large company and discover the exact opposite: the data lake quietly redefining customer status, the warehouse inventing its own order lifecycle, the BI model “fixing” broken operational semantics with undocumented SQL, and half the organization trusting a revenue metric that exists nowhere in the transaction systems that actually collect money.
This is how architecture decays. Not with a dramatic outage, but with a second copy of reality.
The better mental model is blunt and useful: your operational domain creates facts; your analytics platform consumes them. It may aggregate, project, denormalize, enrich, and reconcile. But it should not become the hidden author of business truth. Once analytics starts owning operational semantics, you have built a shadow enterprise.
This article lays out that topology clearly: what problem it solves, how domain-driven design changes the shape of analytics, why Kafka and event streams matter, how to migrate there with a progressive strangler, where reconciliation fits, and where this pattern becomes the wrong tool. This is not an academic model. It is the architecture you need when dozens of systems, hundreds of services, and thousands of users all want answers from the same business without agreeing on what the business means.
Context
Every large enterprise eventually accumulates three worlds:
- Operational systems that run the business
Order capture, billing, fulfillment, pricing, identity, inventory, customer service.
- Analytical systems that explain the business
Reports, dashboards, experimentation, forecasting, planning, regulatory extracts, data science workbenches.
- Integration plumbing trying to connect the first two
Batch ETL, CDC, Kafka topics, APIs, object storage, reverse ETL, data contracts, lineage tools.
The trouble begins when these worlds drift in responsibility.
Operational systems usually evolve around transactions and workflow. They care about consistency, latency, permissions, and the next customer action. Analytics systems care about broad history, trend analysis, cross-domain correlation, and query flexibility. These are different forces and they create different shapes. That difference is healthy.
What is unhealthy is letting the analytical estate become the place where business semantics are repaired, overridden, or re-authored.
A customer is not “gold” because a BI engineer joined three tables and applied a filter last Tuesday. An order is not “completed” because a warehouse transformation says it is. Revenue is not “recognized” because a dashboard says so. Domain semantics must originate in bounded contexts that own them. Analytics should consume those semantics and make them legible at scale.
That is the architecture topology worth defending.
Problem
Most enterprise analytics programs start with a practical need: “bring all the data together.” Fair enough. Then a predictable sequence follows.
First, teams ingest operational tables in bulk. Next, they discover the sources disagree. Then they add transformation logic “temporarily” in the pipeline. Then business users begin trusting the warehouse more than the operational systems because it has the prettier definitions. Eventually, operational teams themselves start querying analytical datasets to understand business state. At that point the consumer has become a surrogate producer.
This breaks in several ways.
The first break is semantic drift. Different layers define “customer,” “active,” “fulfilled,” “churned,” and “net revenue” differently. Nobody notices for months because every number is plausible.
The second break is ownership inversion. The teams closest to the domain no longer control how their own facts are interpreted. A central data team becomes the accidental steward of every concept in the enterprise.
The third break is latency dishonesty. Analytical systems are often eventually consistent, but their users treat them as if they were current. A dashboard says inventory is available; the fulfillment service says otherwise. One of them is wrong, and usually the prettier one wins the argument until reality intervenes.
The fourth break is reconciliation debt. Once multiple semantic copies exist, every issue turns into a forensic investigation across ETL jobs, transformations, service logs, and business rules embedded in five places.
The root cause is almost always the same: analytics was designed as a place to own enterprise truth instead of a layer to consume and project domain truth.
Forces
Good architecture lives in the tension between legitimate forces. This pattern exists because several of them collide hard.
Domain autonomy versus enterprise visibility
Teams need bounded contexts with clear ownership. Billing should own invoices. Fulfillment should own shipment state. Identity should own customer credentials and verification status. But the enterprise also needs cross-cutting views: customer lifetime value, order-to-cash cycle time, regional demand trends, fraud patterns.
Those views are real. The mistake is assuming cross-domain visibility requires cross-domain semantic ownership.
Transactional correctness versus analytical flexibility
Operational systems favor normalized models, command workflows, and narrow queries with tight SLAs. Analytics favors denormalized models, long retention, broad scans, historical snapshots, and many different read patterns.
Trying to make one store serve both concerns usually creates a miserable compromise. Either the operational model becomes distorted for reporting, or the analytical model becomes hostage to transaction design.
Event time versus processing time
In enterprise systems, what happened, when it happened, and when you learned about it are often different. Orders are placed, payments are retried, shipments are delayed, returns are backdated, and corrections arrive days later. Analytics platforms must model this honestly or they will quietly lie.
Scale of consumption
One domain event may feed dashboards, machine learning features, finance reconciliation, operational monitoring, and regulatory reporting. That fan-out argues strongly for a consumer-oriented topology with stable publishing contracts rather than direct coupling to live transactional stores.
Change over time
Definitions evolve. “Active customer” changes after a policy revision. Product hierarchy changes after an acquisition. Region mappings change after reorganization. The architecture has to tolerate semantic evolution without making historical analysis unusable.
This is where domain-driven thinking matters. If you do not know which bounded context owns a concept, your analytics platform will decide for you. And it will do it badly.
Solution
The solution is simple to say and harder to enforce:
Treat the analytics platform as a downstream consumer layer that projects domain facts into analytical models without stealing semantic ownership from source domains.
That means a few concrete things.
1. Domain systems publish facts, not just tables
A billing context should expose bill-issued, payment-captured, invoice-adjusted facts in a governed way. A fulfillment context should expose shipment-created, packed, dispatched, delivered, returned facts. These may be events on Kafka, CDC streams with contracts, APIs for reference lookup, or periodic extracts where nothing better exists.
The exact mechanism matters less than the discipline: domains publish facts they own.
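The shape of such a published fact can be sketched as a minimal event envelope. This is an illustration, not a standard: the class, field names, and `billing.payment-captured` fact type are all assumptions for the example; the only real rule is that the owning domain decides what the fact means.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import json
import uuid

# Hypothetical envelope for a domain-owned fact. Field names are
# illustrative; the discipline is that Billing alone decides what
# "payment-captured" means and versions that contract.
@dataclass
class DomainFact:
    fact_type: str          # e.g. "billing.payment-captured"
    aggregate_id: str       # the entity the fact is about
    event_time: str         # when it happened in the business
    schema_version: int     # contract version for consumers
    payload: dict           # domain-owned attributes
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_message(self) -> bytes:
        """Serialize for a broker such as Kafka (JSON for illustration)."""
        return json.dumps(self.__dict__).encode("utf-8")

fact = DomainFact(
    fact_type="billing.payment-captured",
    aggregate_id="invoice-1042",
    event_time=datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc).isoformat(),
    schema_version=2,
    payload={"amount": "129.00", "currency": "EUR"},
)
```

The same envelope works whether the transport is a Kafka topic, a CDC stream, or a periodic extract; the contract, not the pipe, is the point.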
2. Analytics builds read models, not shadow command models
The analytical estate should create star schemas, wide tables, feature sets, time-series aggregates, and curated marts. That is its job. But it should not become the system where order completion logic, customer eligibility rules, or billing status transitions are primarily defined.
If analytics needs to derive metrics, derive them from published business facts and name them clearly as analytical projections.
3. Cross-domain meaning is assembled, not invented
Some analytical views are inherently composite. “Customer profitability” spans sales, service, finance, and fulfillment. No single bounded context owns that concept operationally. Fine. Build that as an analytical projection, but do not smuggle operational semantics into it. The underlying facts must remain attributable to their domains.
4. Reconciliation is a first-class capability
In large enterprises, upstream systems are late, wrong, duplicated, corrected, and out of order. If your analytics platform is a consumer layer, it must be excellent at reconciliation: comparing consumed facts with source-of-record snapshots, detecting drift, replaying streams, handling late-arriving events, and surfacing confidence levels.
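The simplest reconciliation primitive is a set comparison between what the consumer layer has seen and a source-of-record snapshot. A minimal sketch, with hypothetical ID lists standing in for real extracts:

```python
from collections import Counter

def reconcile_counts(consumed_ids, source_ids):
    """Compare fact IDs seen by the consumer layer against a
    source-of-record snapshot and classify the drift."""
    consumed, source = set(consumed_ids), set(source_ids)
    return {
        "missing_downstream": sorted(source - consumed),  # never arrived
        "unknown_downstream": sorted(consumed - source),  # ghost records
        "duplicates": sorted(
            k for k, v in Counter(consumed_ids).items() if v > 1
        ),
    }

report = reconcile_counts(
    consumed_ids=["o1", "o2", "o2", "o4"],
    source_ids=["o1", "o2", "o3"],
)
```

Real reconciliation goes further (balances, state transitions, temporal alignment), but it is built from primitives like this, run continuously rather than at quarter-end.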
5. Consumer contracts matter as much as producer contracts
Analytics is not “just downstream.” It is often the largest consumer in the company. It deserves explicit contracts for schema evolution, event versioning, data quality expectations, and semantic change notices.
The resulting topology is a one-way flow: domains publish facts into governed streams and feeds, and the analytics platform consumes them into layered projections. The important line in that topology is conceptual, not visual: facts flow downstream; semantic ownership does not flow with them.
Architecture
A sound implementation usually has four analytical layers. Call them what you like, but do not collapse them casually.
Raw ingestion layer
This is the landing zone for domain emissions: Kafka topics, CDC logs, object drops, and reference extracts. Keep them close to source fidelity. Preserve metadata such as source system, event time, ingestion time, schema version, partition keys, and correlation IDs.
This layer is not where business cleanup should happen. It is where provenance survives.
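The landing discipline can be sketched as a wrapper that adds provenance without touching the payload. The function and field names here are assumptions for illustration:

```python
from datetime import datetime, timezone

def land_raw(record: dict, source_system: str, schema_version: int) -> dict:
    """Wrap an incoming record with provenance metadata at ingestion
    time; the business payload itself is preserved untouched."""
    return {
        "payload": record,  # source data, byte-for-byte semantics
        "source_system": source_system,
        "schema_version": schema_version,
        "ingestion_time": datetime.now(timezone.utc).isoformat(),
    }

wrapped = land_raw({"order_id": "o-1042", "status": "ACCEPTED"}, "oms", 3)
```

Nothing in this layer interprets `status`; it only records who said it, under which contract version, and when it arrived.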
Standardized event and reference layer
Here you normalize transport differences without changing business meaning. You may convert Avro to Parquet, align naming conventions, decode CDC change types, enrich with source metadata, and map technical identifiers to stable enterprise identifiers where governance allows.
This is still not the place to redefine the business.
Curated analytical projection layer
This is where the real analytical work begins. Build consumer-friendly models:
- order lifecycle facts
- customer interaction timelines
- product demand aggregates
- finance adjustment ledgers
- inventory movement snapshots
These projections can join across domains, denormalize heavily, and expose metrics useful for analytics. But their lineage back to domain facts should remain explicit.
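An order lifecycle projection of this kind can be sketched as a fold over domain facts. The event structure below reuses an assumed envelope shape (`fact_type`, `event_time`, `event_id`); the key property is that every milestone keeps a pointer back to the event that set it:

```python
def project_order_lifecycle(events):
    """Fold domain facts from several bounded contexts into one
    analytical row. Each milestone records the source event ID,
    so lineage back to domain facts stays explicit."""
    row = {}
    for ev in sorted(events, key=lambda e: e["event_time"]):
        milestone = ev["fact_type"].split(".")[-1]  # e.g. "dispatched"
        row[milestone] = {
            "at": ev["event_time"],
            "source_event": ev["event_id"],
        }
    return row

row = project_order_lifecycle([
    {"fact_type": "fulfillment.dispatched",
     "event_time": "2024-03-02T08:00:00Z", "event_id": "e2"},
    {"fact_type": "ordering.accepted",
     "event_time": "2024-03-01T09:00:00Z", "event_id": "e1"},
])
```

The projection denormalizes heavily, but every cell remains attributable: a disputed "dispatched" timestamp resolves to one domain event, not to a transformation's opinion.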
Consumption layer
Semantic marts, dashboards, machine learning features, planning extracts, and regulatory reports live here. This is where user-specific shape matters most. It is also where metric sprawl can become dangerous unless governed carefully.
Domain semantics and bounded contexts
This pattern only works if you take domain semantics seriously.
Domain-driven design is not decoration here. It is the mechanism that stops analytics from becoming a swamp with SQL. Each bounded context defines the language of its facts. “Order accepted” in Ordering is not the same thing as “payment authorized” in Billing or “ready to ship” in Fulfillment. They may all contribute to an enterprise order journey, but they are distinct domain statements.
A common failure is forcing every domain into a single canonical enterprise model too early. That usually creates vague nouns and political arguments. Better to accept multiple bounded truths and compose them analytically.
A customer service context might define “active customer” as any customer with a non-closed account. Marketing might define “active customer” as anyone with digital engagement in 90 days. Finance might define “active customer” as anyone generating recognized revenue in the period. Do not pick one in the warehouse and pretend the debate is over. Name each metric in its domain language and build explicit analytical mappings where comparison is needed.
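Keeping those three definitions separate is straightforward once each carries its domain's name. A minimal sketch, with hypothetical customer fields:

```python
from datetime import date, timedelta

# Three domain-named metrics instead of one contested "active_customer".
def service_active(customer):
    """Customer Service definition: any non-closed account."""
    return customer["account_status"] != "closed"

def marketing_active(customer, today):
    """Marketing definition: digital engagement within 90 days."""
    last = customer.get("last_engagement")
    return last is not None and (today - last) <= timedelta(days=90)

def finance_active(customer, period):
    """Finance definition: recognized revenue in the period."""
    return customer.get("recognized_revenue", {}).get(period, 0) > 0

customer = {
    "account_status": "open",
    "last_engagement": date(2024, 5, 1),
    "recognized_revenue": {"2024Q1": 0},
}
```

One customer can legitimately be active to Service, active to Marketing, and inactive to Finance in the same week; the warehouse's job is to expose all three honestly, not to adjudicate.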
That is architecture doing its job: preserving meaning, not flattening it.
Kafka and streaming
Kafka is relevant here because it gives you durable, replayable, ordered-by-partition event streams that are ideal for a consumer-layer analytics platform. But Kafka is not magic. It helps when domains can publish stable event contracts and when consumers need replay, fan-out, and near-real-time projections.
Use Kafka for:
- domain events with broad consumption
- CDC streams that need durable propagation
- decoupled near-real-time analytical updates
- replayable correction and reprocessing workflows
Do not use Kafka as an excuse to publish garbage schemas or every internal state twitch. Event design still matters. “CustomerUpdated” with a blob payload is not domain-driven architecture; it is laziness with throughput.
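One way to hold that line is a contract check at publish time that rejects anonymous state dumps. The registry and its required attributes below are illustrative assumptions, not a real schema standard:

```python
# Hypothetical contract registry: each named domain fact declares the
# attributes it must carry. "CustomerUpdated" with a blob would not
# qualify for an entry here.
REQUIRED_ATTRIBUTES = {
    "customer.address-changed": {
        "customer_id", "old_address", "new_address", "effective_time",
    },
}

def validate_event(event_type: str, payload: dict) -> bool:
    """Reject events that are not registered, named domain facts
    with their full set of declared attributes."""
    if event_type not in REQUIRED_ATTRIBUTES:
        raise ValueError(f"unknown fact type: {event_type}")
    missing = REQUIRED_ATTRIBUTES[event_type] - payload.keys()
    if missing:
        raise ValueError(f"missing attributes: {sorted(missing)}")
    return True
```

The check is trivial; the value is cultural. A producer that cannot name its fact and enumerate its attributes has not yet done the domain work.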
A better topology combines event streams for domain facts with reference data lookups for slowly changing context, and it includes an explicit reconciliation stage. In real enterprises, streams alone are not enough.
Migration Strategy
You will rarely get to build this cleanly from scratch. Most companies already have a warehouse stuffed with copied tables and tribal SQL. So the migration path matters more than the target diagram.
This is a classic case for a progressive strangler.
Do not announce a glorious platform rewrite. You will create two failing systems instead of one improving one. Migrate domain by domain, metric by metric, with visible coexistence and explicit reconciliation.
Step 1: Identify semantic hotspots
Look for concepts where analytical and operational truth currently diverge:
- order status
- booked versus recognized revenue
- customer identity and householding
- returns and refunds
- inventory availability
These are your fault lines. Start there, because architecture earns trust by fixing painful disagreements, not by publishing elegant principles.
Step 2: Establish source-of-truth ownership
For each hotspot, define the bounded context that owns each atomic fact. Not the metric. The fact.
Example:
- Ordering owns order accepted and order cancelled.
- Billing owns payment captured and invoice posted.
- Fulfillment owns shipped and delivered.
- Finance owns recognition adjustments.
This sounds bureaucratic. It is not. It is the minimum needed to stop the warehouse from making things up.
Step 3: Introduce publish-consume contracts
Create event or CDC contracts for those owned facts. If Kafka is available, publish them there. If not, expose stable extracts and schema-controlled feeds. Add versioning and change management from day one.
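Versioning from day one can start as simply as an automated compatibility gate on contract changes. This sketch encodes one assumed rule (no field removed or retyped; new fields allowed), roughly analogous to backward compatibility in schema registries:

```python
def backward_compatible(old_fields: dict, new_fields: dict) -> bool:
    """Minimal compatibility rule for a publish contract: existing
    consumers keep working if no field is removed or retyped.
    Adding new fields is allowed."""
    for name, ftype in old_fields.items():
        if name not in new_fields or new_fields[name] != ftype:
            return False
    return True
```

Anything stricter (required-vs-optional semantics, default values, deprecation windows) builds on the same idea: contract changes are checked mechanically, not negotiated after consumers break.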
Step 4: Build parallel analytical projections
Do not rip out the old warehouse logic immediately. Build a new analytical projection beside it, sourced from domain facts. Then compare outputs with the legacy model. Expect mismatches. The mismatches are the migration.
Step 5: Reconcile continuously
Reconciliation is not a testing phase; it is an operating model. Compare counts, state transitions, balances, and temporal alignment between old and new views. Surface differences with severity and root-cause categories:
- late event
- duplicate event
- source correction
- mapping mismatch
- business rule divergence
- identity resolution issue
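The parallel run in steps 4 and 5 reduces to a field-by-field diff between the legacy view and the rebuilt projection. A minimal sketch, assuming both views key rows the same way:

```python
def compare_projections(legacy: dict, rebuilt: dict, tolerance=0.0):
    """Diff the legacy warehouse view against the domain-fact-backed
    projection, key by key. The mismatches ARE the migration backlog."""
    diffs = []
    for key in sorted(legacy.keys() | rebuilt.keys()):
        a, b = legacy.get(key), rebuilt.get(key)
        if a is None or b is None:
            diffs.append((key, "missing_row", a, b))
        elif abs(a - b) > tolerance:
            diffs.append((key, "value_mismatch", a, b))
    return diffs

diffs = compare_projections(
    legacy={"o1": 10.0, "o2": 5.0},
    rebuilt={"o1": 10.0, "o2": 6.0, "o3": 1.0},
)
```

Each diff then gets triaged into the root-cause categories above; a `value_mismatch` might turn out to be a late event, a source correction, or a genuine business rule divergence.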
Step 6: Cut over consumers incrementally
Move one dashboard, one finance report, one ML feature pipeline at a time. Avoid big-bang BI migration. Every consumer has hidden assumptions, and analytical consumers are notorious for undocumented dependencies.
Step 7: Retire semantic logic from the old platform
This is the hard discipline. Once a domain-backed projection is trusted, remove the corresponding “fix-up” logic from legacy transforms. If you leave both in place, the old platform will continue to spawn alternate truth.
In migration terms, the old transforms and the new domain-backed projections run side by side, with reconciliation between them.
The strangler pattern works here because analytics consumers are diverse and sticky. You need a path that allows old and new to coexist long enough to expose differences without freezing business delivery.
Enterprise Example
Consider a global retailer with ecommerce, stores, and a wholesale business. It has separate systems for order capture, payments, warehouse management, CRM, loyalty, and general ledger. Over a decade, the enterprise built a central warehouse that became the de facto place to define customer value, order completion, and revenue.
Everything looked fine until omnichannel fulfillment arrived.
A “completed order” in the warehouse meant:
- payment approved
- at least one shipment dispatched
- no full cancellation
But operations had become more complicated:
- split shipments across warehouses
- partial pickup in store
- backorders
- reshipments after carrier loss
- post-invoice price adjustments
- fraud reversals after dispatch
The warehouse logic kept simplifying this mess into a single status because BI needed a neat funnel chart. Finance used a different revenue model. Customer service used the CRM. Supply chain used warehouse scans. Executives saw four truths for the same week.
The fix was not “better SQL.” The fix was topology.
The retailer re-framed analytics as a consumer layer. Ordering published order accepted, amended, cancelled. Payments published authorized, captured, reversed. Fulfillment published allocated, packed, dispatched, delivered, returned. Finance published invoice posted and recognition adjusted. Customer and loyalty published identity and membership facts.
Kafka carried the domain events; CDC filled gaps for old systems. The analytics platform consumed them into an order journey projection with explicit milestones and state history instead of a fake single status. Reconciliation compared fulfillment scans, billing extracts, and analytical projections daily. Late events were marked and replayed. Dashboards shifted from “order completed” to more precise measures:
- order accepted rate
- dispatch lead time
- delivered within promise
- net recognized revenue
- return-adjusted margin
The result was not prettier. It was more honest. And honesty scales better than elegance.
More importantly, ownership changed. The fulfillment team now owned what “dispatched” meant. Billing owned “captured.” Finance owned recognition adjustments. Analytics assembled enterprise views but no longer invented domain semantics. That reduced argument time dramatically because every disputed number could be traced back to a domain fact and a projection rule.
This is what mature enterprise architecture looks like: less centralized control than people expect, and more semantic discipline than they usually want.
Operational Considerations
A consumer-layer analytics platform is not easier to run. It is easier to reason about. Those are not the same thing.
Reconciliation as an operational capability
You need formal reconciliation pipelines, not ad hoc spreadsheet checks. Reconcile:
- event counts versus source counts
- state transitions versus valid lifecycle rules
- financial balances
- inventory movement totals
- identity mappings
- late-arriving and corrected events
Publish reconciliation health as a product metric. If your projections are 96% aligned and drifting on one source, users should know.
Data quality and observability
Track freshness, completeness, schema drift, null-rate anomalies, duplication, and referential gaps. Attach lineage from analytical metrics back to source events and transformations. In a large enterprise, “where did this number come from?” is not a philosophical question. It is Tuesday.
Idempotency and replay
Consumers must be replay-safe. Duplicates happen. Reordered events happen. Corrections happen. Your projection logic should support idempotent application, version-aware merges, and replay from retained streams or raw storage.
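The core of replay safety is a version-aware, idempotent merge. A minimal sketch, assuming each event carries an aggregate ID and a monotonically increasing version from its producer:

```python
def apply_event(state: dict, event: dict) -> dict:
    """Idempotent, version-aware upsert: exact duplicates and replays
    are no-ops, and stale versions never overwrite newer state."""
    key = event["aggregate_id"]
    current = state.get(key)
    if current is None or event["version"] > current["version"]:
        state[key] = {"version": event["version"], "data": event["data"]}
    return state

state = {}
shipped = {"aggregate_id": "o1", "version": 2,
           "data": {"status": "shipped"}}
apply_event(state, shipped)
apply_event(state, shipped)  # duplicate delivery: no-op
apply_event(state, {"aggregate_id": "o1", "version": 1,
                    "data": {"status": "packed"}})  # stale replay: ignored
```

With this property, replaying a whole retained stream from offset zero converges on the same state as the original run, which is what makes reprocessing and corrections practical.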
Identity resolution
Cross-domain analytics often hinges on identity stitching: customer IDs across channels, product IDs across acquisitions, location hierarchies across reorganizations. Handle this as a governed capability, not scattered lookup logic in every transformation.
Temporal modeling
Store both business effective time and processing time where relevant. Without that, you cannot answer basic enterprise questions like “what did we believe on the close date?” versus “what do we know now after corrections?”
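The "what did we believe on the close date?" question becomes mechanical once every record carries a recorded time alongside its business value. A minimal sketch, using ISO date strings for comparability:

```python
def as_of(records, knowledge_time):
    """Answer 'what did we believe at knowledge_time?': keep, per key,
    the latest value whose recorded_time is <= knowledge_time.
    Later corrections are invisible to earlier views."""
    view = {}
    for r in sorted(records, key=lambda r: r["recorded_time"]):
        if r["recorded_time"] <= knowledge_time:
            view[r["key"]] = r["value"]
    return view

records = [
    {"key": "rev-Q1", "value": 100, "recorded_time": "2024-03-31"},
    # Correction that arrived after the close:
    {"key": "rev-Q1", "value": 97, "recorded_time": "2024-04-10"},
]
```

Querying as of the close date returns the uncorrected figure; querying today returns the corrected one. Both answers are true, about different moments of knowledge.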
Security and access
Consumer-layer does not mean free-for-all. Domain facts often include sensitive data. Mask or tokenize personal data, separate restricted marts, and preserve auditability. The analytics platform may be downstream, but it is still one of the biggest concentration points of enterprise risk.
Tradeoffs
This pattern is strong, but not free.
What you gain
You gain semantic clarity. Domain ownership remains visible. Cross-domain analytics becomes compositional instead of political. Migration gets safer because contracts can evolve. Reprocessing and replay become practical. Reconciliation becomes systematic rather than heroic.
You also gain a healthier operating model: data teams stop pretending to be the secret owners of the business.
What you pay
You pay in design discipline. Event contracts need stewardship. Domain teams must do more than dump tables. Reconciliation pipelines take real engineering. Consumers must tolerate eventual consistency and historical correction.
You also lose some short-term convenience. It is faster to slap a transformation in the warehouse than to align a bounded context, define a business fact, and publish a contract. But that speed is rented. The interest comes due during every audit, outage, and metric dispute.
The central tradeoff
The real tradeoff is between local semantic ownership and centralized interpretive convenience.
A warehouse that defines everything centrally feels efficient early on. Then the business changes and every concept becomes contested. A consumer-layer architecture feels slower at first because it insists that meaning has an owner. Then the enterprise scales and this restraint starts paying for itself every week.
Failure Modes
This pattern fails in familiar ways.
1. Event theater
Teams publish events, but the events are just thin wrappers around table mutations with no domain meaning. Consumers still have to reverse-engineer business state. You have streaming, but not semantics.
2. Canonical model overreach
An enterprise architecture group invents a universal business schema and forces all domains through it. The result is vague contracts, endless governance meetings, and producers lying to fit the model.
3. Analytics reclaims semantics anyway
Despite the principles, analysts keep adding “temporary” corrective logic because source systems are messy. Over time the consumer layer becomes a producer of alternate truth again.
4. No reconciliation loop
The platform ingests data but lacks a formal way to compare consumed facts with source truth and corrections. Drift accumulates silently until a financial close or regulatory inquiry exposes it.
5. Stream obsession
Everything is forced into Kafka even when a daily reference extract or API lookup is perfectly adequate. This creates operational noise and complexity without improving meaning.
6. Bounded contexts ignored
The organization says “domain-driven” but still has unclear ownership. If nobody owns the semantics, the analytics platform will absorb them by default.
The most common failure is not technical. It is social: unwillingness to decide who owns what a business term means.
When Not To Use
Do not use this pattern everywhere.
If you are a small company with one operational database and a handful of reports, a formal consumer-layer architecture is probably overkill. You do not need Kafka, reconciliation engines, and semantic marts to answer ten operational questions from one app.
Do not force this pattern when the analytical system is the primary system of record. Some planning, actuarial, or risk platforms are not downstream consumers; they are authoritative domains in their own right. Treat them as bounded contexts, not just analytics.
Avoid over-engineering if your primary need is simple batch regulatory reporting from a stable source with little semantic contention. A straightforward extract pipeline may be enough.
And be careful in domains where near-real-time analytical decisions loop directly into operations. If analytics outputs become operational commands, you have crossed a boundary. At that point you need to treat parts of the analytical estate as operational decision services, with all the governance and reliability that implies.
The line is simple: if the platform mainly explains the business, it is a consumer layer. If it actively runs the business, it is something more, and you should design it accordingly.
Related Patterns
Several adjacent patterns fit naturally here.
Event-driven architecture
Useful for publishing domain facts with loose coupling and replay. Good fit when many consumers need the same events.
CQRS read models
The analytical layer is essentially a large-scale read-model estate. It materializes views optimized for consumption rather than command processing.
Data mesh, used carefully
The good part of data mesh applies: domain-oriented ownership and data as a product. The bad implementation pattern to avoid is pretending every domain can publish analytically ready truth without central standards for contracts, lineage, and governance.
Strangler fig migration
Ideal for replacing legacy warehouse semantics incrementally while proving correctness through parallel runs and reconciliation.
Lakehouse or warehouse modernization
These are technology choices, not topology choices. You can implement this consumer-layer model in a warehouse, lakehouse, or hybrid platform. The key issue is ownership and semantics, not storage branding.
Summary
The most dangerous sentence in enterprise analytics is: “we’ll fix it in the warehouse.”
That sentence sounds pragmatic. It is often the start of a shadow enterprise.
A better architecture draws a harder line. Operational domains own business facts and semantics within their bounded contexts. The analytics platform consumes those facts, reconciles them, and projects them into forms useful for dashboards, finance, forecasting, machine learning, and enterprise insight. It is downstream, but not secondary. It is powerful, but not sovereign.
That distinction matters.
It protects domain meaning. It makes migration possible. It gives Kafka and event streams a real role instead of decorative infrastructure. It forces reconciliation into the design instead of leaving it for quarter-end panic. And it stops the analytical estate from quietly becoming the place where the business is redefined by SQL and hope.
Build your analytics platform as a consumer layer and it will tell you what your enterprise is doing.
Build it as a hidden producer of truth and eventually it will tell you a very convincing story about a business that does not exist.