Most data platforms are built like warehouses and expected to behave like orchestras.
That mismatch is the root of a lot of enterprise pain.
We inherited a mental model where data platforms were places to store things: tables, files, extracts, curated marts, reports. The language was static. Land the data. Clean the data. Publish the data. It all sounded reassuringly industrial, as if the organization’s mess could be solved by piling facts into one large, well-governed place.
But modern enterprises do not run on storage. They run on composition.
A pricing decision is composed from product policies, customer entitlements, supply constraints, promotions, tax rules, risk signals, and channel context. A fulfillment promise is composed from inventory positions, transport capacity, warehouse cutoffs, carrier performance, and order priority. A regulatory report is composed from transactions, legal entity hierarchies, accounting mappings, and effective-dated policy rules. The valuable thing is not merely the data. It is the arrangement of semantics across domains at a point in time for a business purpose.
That changes the architecture.
A serious data platform is not just a repository, nor merely a streaming backbone, nor a dashboard factory. It is a composition engine: a system that assembles domain facts, policies, state transitions, and derived interpretations into usable business outcomes. Once you see that, the topology of the platform changes. Domain boundaries matter more than schema standardization. Reconciliation matters more than “single source of truth” slogans. Event streams become useful, but only if they preserve business meaning. Microservices help, but only when they reflect domain seams rather than organizational fashion.
This is where enterprise architecture has to grow up a little. Data architecture, integration architecture, operational architecture, and domain-driven design are no longer separate conversations. They are the same conversation told from different angles.
The question is not “How do we centralize all enterprise data?” The question is “How do we compose domain semantics safely, incrementally, and repeatedly across a changing business?”
That is a better question. It leads to better systems.
Context
The average enterprise has accumulated layers of data infrastructure that mirror its own history of reorganizations and procurement decisions. There is usually a transactional core: ERP, CRM, billing, manufacturing, claims, policy administration, booking, order management, or some improvised combination of all of them. Around that core sit integration tools, message brokers, ETL jobs, operational data stores, data lakes, streaming platforms, warehouse platforms, API gateways, MDM hubs, and enough spreadsheets to keep the whole thing humble.
Every one of these tools came with a promise. Consolidate. Synchronize. Standardize. Govern. Democratize. And every one of them delivered some value. But together they often leave the enterprise with a peculiar condition: a lot of data movement and not enough business coherence.
The reason is simple. The platform is usually designed around technology capabilities rather than domain topology.
Domain-driven design gives us a more useful lens. Businesses are not undifferentiated pools of information. They are landscapes of bounded contexts. A customer in billing is not the same thing as a customer in marketing. Inventory in planning is not inventory in store operations. An order in e-commerce is not an order in finance. These are not data quality mistakes. They are semantic differences born from different purposes, constraints, and workflows.
A data platform that ignores this ends up doing semantic violence. It flattens distinctions too early, centralizes concepts that should remain contextual, and creates canonical models that satisfy no one. Then the organization compensates with endless transformation logic, exception handling, and governance committees.
There is another path. Treat the platform as the machinery for composing bounded contexts, not erasing them.
That does not mean fragmentation. It means controlled interoperability. It means understanding where semantics are authoritative, where they are projected, where they are reconciled, and where they are interpreted for a downstream use case. It means building topology, not just pipelines.
Problem
Most enterprises have at least one fantasy architecture diagram in circulation. In the middle there is a giant “enterprise data platform.” Arrows from all systems point inward. From there, arrows point outward to analytics, AI, APIs, operations, and executive bliss.
Reality is less photogenic.
The giant platform becomes a gravity well. Source systems dump their data into it in whatever shape they can manage. Central teams normalize fields, infer relationships, repair broken references, and create “gold” datasets. Downstream teams consume these artifacts and quickly discover that the data is late, ambiguous, inconsistent with operational state, or detached from the business events that gave it meaning.
So they build side logic.
Soon there are separate reconciliation jobs, CDC pipelines, Kafka topics, reverse ETL syncs, API mashups, and service-level caches. Everyone is composing data somewhere, but no one admits that composition is the main architectural concern. It is treated as glue code, not first-class design.
This creates several recurring pathologies:
- Canonical model paralysis: too much effort spent trying to define universal enterprise entities.
- Semantic drift: field names remain stable while business meaning changes underneath.
- Operational/analytical split-brain: the warehouse says one thing, the transaction system another.
- Broken lineage of intent: you can trace columns, but not business decisions.
- Latency mismatch: some use cases need event-time truth; others need reconciled, period-end truth.
- Ownership ambiguity: no one can answer who is authoritative for a concept under contention.
The usual response is more tooling. A catalog here, a mesh there, a streaming platform everywhere. Useful, certainly. Sufficient, no.
The real issue is architectural. The enterprise is trying to answer composition questions with storage answers.
Forces
Enterprise architecture is never a clean-sheet exercise. It is pressure management. In a data platform, the key forces pull in different directions, and pretending otherwise is how bad designs get approved.
1. Domain semantics versus enterprise consistency
Business domains need local language, local invariants, and local models. Finance, fulfillment, service, risk, and sales do not think in the same concepts, and they should not be forced to. Yet the enterprise also needs cross-domain outcomes: margin reporting, customer experience metrics, fraud detection, and working capital visibility.
This is the central tension. If you over-standardize, you destroy useful meaning. If you over-localize, you cannot compose across the business.
2. Event-time truth versus reconciled truth
Kafka and event streaming are powerful precisely because they preserve the sequence of business facts as they happen. But many enterprise use cases do not run on raw event-time truth alone. They run on reconciled interpretations: settled payments, adjusted inventory, closed accounting periods, policy-effective states.
Streaming gives immediacy. Reconciliation gives trust. You need both.
3. Autonomy versus governance
Domain teams want to ship quickly, publish their own events, expose their own APIs, and own their own schemas. Central platform teams want security, lineage, retention policies, quality controls, and compliance. Both are right. Neither is sufficient.
4. Progressive migration versus operational continuity
You are not replacing the estate in one move. Not in a bank. Not in a retailer. Not in a manufacturer. You will migrate progressively, often with a strangler pattern around a stubborn core. During that migration, old and new worlds must both function, and they will disagree in awkward ways.
5. Reuse versus coupling
A shared composition layer can reduce duplicated logic. It can also become a new monolith in disguise. Shared means efficient; shared also means politically contested and operationally fragile.
These are not problems to solve away. They are forces to balance.
Solution
The architecture I recommend is straightforward in principle and demanding in execution:
Design the data platform as a composition engine organized by domain topology.
That means four things.
First, treat domains as first-class architectural units
A domain is not just a folder in the catalog. It has semantic ownership, event vocabulary, quality obligations, and interfaces for composition. Bounded contexts matter because they tell you where language is stable and where translation is required.
This is pure domain-driven design territory. Use bounded contexts to define where data is authoritative, where it is merely referential, and where anti-corruption layers must exist. A customer identity domain should not be casually merged with CRM campaign notions of a customer. An inventory availability domain should not be flattened into an accounting stock balance just because both use the word “inventory.”
Second, separate source truth, derived truth, and composed truth
These are different species.
- Source truth is what the operational domain asserts.
- Derived truth is what that domain or another computes from source events or states.
- Composed truth is the cross-domain view assembled for a business capability.
Enterprises get into trouble when they collapse these categories. A “gold customer” table often mixes operational records, mastering rules, survivorship logic, and downstream segmentation. Useful? Yes. Honest? Not really.
Composition should be explicit.
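To make the three species concrete, here is a minimal Python sketch. All names (the inventory record, the safety buffer, the contract id) are hypothetical, invented for illustration; the point is only that each kind of truth is a named, visible step rather than logic buried in an ETL job.

```python
from dataclasses import dataclass

# Source truth: what the operational domain asserts, verbatim.
@dataclass(frozen=True)
class StockRecord:          # hypothetical inventory-domain record
    sku: str
    on_hand: int
    reserved: int

# Derived truth: computed from source facts by an explicit rule.
def available(rec: StockRecord) -> int:
    return max(rec.on_hand - rec.reserved, 0)

# Composed truth: a cross-domain view assembled for one business purpose.
@dataclass(frozen=True)
class PromiseView:
    sku: str
    promisable_units: int
    source_version: str     # provenance: which contract built this view

def compose_promise(rec: StockRecord, safety_buffer: int) -> PromiseView:
    # The composition rule is named and inspectable, not hidden.
    return PromiseView(
        sku=rec.sku,
        promisable_units=max(available(rec) - safety_buffer, 0),
        source_version="inventory-events@v1",  # assumed contract id
    )

view = compose_promise(StockRecord("SKU-1", on_hand=10, reserved=3),
                       safety_buffer=2)
print(view.promisable_units)  # 5
```

Collapsing these three functions into one "gold" table is exactly the honesty problem described above: the survivorship and buffering rules still exist, but nobody can point at them.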
Third, build reconciliation as a product, not an afterthought
Reconciliation is where architecture meets finance, risk, and operational reality. It is the discipline of proving that the composed view still corresponds to the business world, despite asynchronous events, partial failures, retries, duplicates, timing gaps, and upstream defects.
A composition engine that cannot reconcile is just a faster way to spread confusion.
Fourth, enable progressive strangler migration
Do not attempt a big-bang replacement of the central warehouse, the integration hub, or the transactional core. Wrap domains, publish events, capture change data where necessary, create composition services and analytical products incrementally, and move decision-making use cases one capability at a time. Let the new topology grow around the old estate until the old estate becomes less central.
That is architecture with a pulse.
Architecture
The platform has three broad layers, though I dislike overly neat stack diagrams because they encourage neat thinking about messy systems. Still, the shape matters.
- Domain planes where operational systems and domain services publish authoritative data and events.
- Composition planes where domain facts are joined, interpreted, reconciled, and exposed for business use cases.
- Consumption planes where applications, analytics, AI, and operational workflows consume fit-for-purpose views.
Here is the topology.
Domain planes
Each domain should publish one or more of the following:
- Events representing business facts and state transitions
- Operational APIs for current state lookup and command interactions
- Reference datasets with explicit ownership and refresh contracts
- Change feeds where legacy systems cannot natively emit meaningful events
Do not confuse CDC with domain events. CDC tells you what changed in a database. Domain events tell you what happened in the business. CDC is often necessary in migration; it is rarely sufficient as the long-term semantic contract.
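The difference can be shown with a small semantic adapter, sketched below under assumptions: the legacy table name, column names, and CDC payload shape are all invented. The adapter's job is to emit a business fact so that consumers never couple to the physical schema.

```python
from typing import Optional

def to_domain_event(cdc_change: dict) -> Optional[dict]:
    """Translate a hypothetical CDC payload into a domain-style event."""
    if cdc_change["table"] != "STK_BAL":      # assumed legacy table name
        return None                           # not this adapter's concern
    before, after = cdc_change["before"], cdc_change["after"]
    delta = after["QTY"] - before["QTY"]
    if delta == 0:
        return None                           # mutation, but no business fact
    return {
        "type": "StockAdjusted",              # what happened in the business
        "sku": after["ITEM_ID"],
        "location": after["LOC"],
        "delta": delta,
        "occurred_at": cdc_change["commit_ts"],
    }

event = to_domain_event({
    "table": "STK_BAL",
    "commit_ts": "2024-05-01T10:00:00Z",
    "before": {"ITEM_ID": "SKU-1", "LOC": "DC1", "QTY": 40},
    "after":  {"ITEM_ID": "SKU-1", "LOC": "DC1", "QTY": 37},
})
print(event["type"], event["delta"])  # StockAdjusted -3
```

Downstream consumers see "StockAdjusted by -3", never "row in STK_BAL changed" — which is what allows the legacy table to be retired later without breaking them.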
Composition plane
This is the heart of the architecture.
Composition services consume domain events, reference data, APIs, and state snapshots to build fit-for-purpose views. Some of these are operational compositions—say, order promise or customer servicing context. Others are analytical compositions—say, profitability by legal entity and channel. Still others are regulatory compositions requiring effective-dated mappings and strict auditability.
This plane usually includes:
- Semantic mapping services or anti-corruption layers between bounded contexts
- State stores for materialized composed views
- Rules or policy engines for business interpretation
- Reconciliation pipelines for completeness, balancing, and exception management
- Lineage metadata that captures not just data origin but semantic transformation intent
And yes, Kafka is relevant here. A stream-processing approach can maintain incremental composed state efficiently. But event streams alone do not absolve you from data modeling. They merely make your modeling mistakes happen in real time.
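The "incremental composed state" idea reduces to a fold over the event stream. The sketch below stands in for a real stream processor: event names are illustrative, and a production build would use a Kafka consumer and a durable state store rather than an in-memory dict.

```python
from collections import defaultdict

# Materialized composed view: (sku, location) -> available units.
state = defaultdict(int)

def apply(event: dict) -> None:
    """Fold one domain event into the composed state."""
    key = (event["sku"], event["location"])
    if event["type"] == "StockAdjusted":
        state[key] += event["delta"]
    elif event["type"] == "UnitsReserved":
        state[key] -= event["qty"]
    # Unknown event types are ignored rather than guessed at.

for e in [
    {"type": "StockAdjusted", "sku": "SKU-1", "location": "DC1", "delta": 10},
    {"type": "UnitsReserved", "sku": "SKU-1", "location": "DC1", "qty": 4},
]:
    apply(e)

print(state[("SKU-1", "DC1")])  # 6
```

Note that the modeling mistakes warned about above live in `apply`: if "StockAdjusted" means different things in different source contexts, this fold will confidently produce a wrong number in real time.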
Consumption plane
Consumers should receive products that match their needs:
- Low-latency operational read models
- Curated analytical datasets
- Feature sets for ML and decisioning
- Regulatory extracts with traceable rule application
One platform, many products. That is composition.
Domain semantics and topology
The phrase “domain topology” matters because architecture is not merely about components. It is about how meanings are arranged.
A useful topology usually distinguishes:
- Systems of record: authoritative for operational facts
- Systems of interpretation: apply business logic, policy, and temporal rules
- Systems of projection: publish read models or analytical products
- Systems of reconciliation: detect and resolve divergence
This sounds obvious until you see an enterprise warehouse attempting to be all four at once.
A better approach is to make semantic transitions explicit.
At each transition, ask hard questions:
- What is the bounded context here?
- Who is authoritative?
- What temporal assumptions are in play?
- What data defects are tolerable?
- What reconciliation must occur before this can be trusted?
- Is this a reusable composition or a local projection?
These are architecture questions, not implementation details.
Migration Strategy
If the platform is a composition engine, migration cannot be “move all data to the new platform and refactor later.” That is how enterprises get expensive replicas of existing confusion.
Use a progressive strangler migration.
Start with a business capability that suffers from semantic fragmentation and high coordination cost. Not a vanity use case. Not “let’s rebuild reporting.” Pick something with operational pain and visible business value: order promise, customer 360 for service, inventory visibility, claims status, revenue assurance.
Then proceed in stages.
Stage 1: Identify authoritative domain seams
Map bounded contexts and current system ownership. Do not start from tables; start from decisions and workflows. Determine where meaning is created, where it is changed, and where it is merely copied.
Stage 2: Establish event capture and semantic contracts
Where modern services exist, publish domain events. Where legacy systems dominate, use CDC initially but wrap it with semantic interpretation so downstream consumers do not couple directly to physical table changes.
Stage 3: Build the first composition product
Create one composed view for one business outcome. Keep it narrow enough to deliver, broad enough to matter. Include explicit reconciliation from the beginning.
Stage 4: Run dual operation and compare
This is where architects earn their keep. The legacy report or operational workflow continues. The new composition product runs alongside it. Differences are expected. In fact, the comparison is the migration asset. It reveals semantic mismatches, missing events, timing issues, and broken assumptions.
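A dual-run comparison can be as simple as diffing the two outputs key by key and treating every difference as a finding to investigate, not a bug to hide. A minimal sketch, with invented sample data:

```python
def variance_report(legacy: dict, composed: dict, tolerance: int = 0) -> list:
    """Diff legacy and composed outputs; each difference is a finding."""
    issues = []
    for key in legacy.keys() | composed.keys():
        old, new = legacy.get(key), composed.get(key)
        if old is None or new is None:
            issues.append((key, old, new, "missing on one side"))
        elif abs(old - new) > tolerance:
            issues.append((key, old, new, "value mismatch"))
    return issues

legacy_stock   = {"SKU-1": 40, "SKU-2": 12}   # from the old warehouse mart
composed_stock = {"SKU-1": 37, "SKU-3": 5}    # from the new composition product
for issue in variance_report(legacy_stock, composed_stock):
    print(issue)
```

Each finding then gets classified: a missing event, a timing gap, or — most often — a semantic mismatch neither side knew it had.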
Stage 5: Shift consumption gradually
Move one downstream process at a time to the new composed product. Resist the urge to migrate every dependent system at once. Progressive replacement beats synchronized heroics.
Stage 6: Retire old transforms and central artifacts
Only after the new topology is trusted should old ETL chains, warehouse marts, or point-to-point integrations be retired. Until then, they remain part of the operating estate.
That, in outline, is the migration pattern.
A strangler migration is not a trick for application modernization alone. It is essential for data platforms because semantics are discovered during use, not fully known in advance.
Reconciliation discussion
Reconciliation deserves its own section because enterprises routinely underestimate it.
In a composed platform, reconciliation is the mechanism that answers, “Can we trust this assembled view enough to act on it?”
There are several forms:
- Completeness reconciliation: did we receive all expected events or records?
- Balance reconciliation: do totals align across domains, periods, or ledgers?
- State reconciliation: does the composed state match authoritative operational state within acceptable lag?
- Temporal reconciliation: are effective dates, event times, and processing times aligned correctly?
- Policy reconciliation: was the correct business rule version applied?
Without these, your beautiful event-driven architecture turns into a high-speed rumor mill.
A practical pattern is to maintain both:
- a fast path for operational composition, accepting bounded inconsistency,
- and a trust path for reconciled outputs, especially for finance, compliance, and executive reporting.
One is for acting quickly. The other is for sleeping at night.
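Two of the reconciliation forms above — completeness and balance — are simple enough to sketch directly. The shapes here are assumptions: real implementations key off event manifests and ledger extracts rather than bare sets and dicts.

```python
def completeness(expected_ids: set, received_ids: set) -> set:
    """Completeness reconciliation: which expected events never arrived?"""
    return expected_ids - received_ids

def balances_match(domain_totals: dict, ledger_totals: dict,
                   tolerance: float = 0.0) -> bool:
    """Balance reconciliation: do per-period totals agree across domains?"""
    periods = domain_totals.keys() | ledger_totals.keys()
    return all(
        abs(domain_totals.get(p, 0.0) - ledger_totals.get(p, 0.0)) <= tolerance
        for p in periods
    )

print(completeness({"e1", "e2", "e3"}, {"e1", "e3"}))          # {'e2'}
print(balances_match({"2024-04": 100.0}, {"2024-04": 100.0}))  # True
```

The fast path serves composed views immediately; the trust path only releases outputs once checks like these come back clean.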
Enterprise Example
Consider a global retailer with stores, e-commerce, wholesale channels, and regional fulfillment networks. The company wants a reliable “available-to-promise” capability across channels. This sounds like an inventory problem. It is not. It is a composition problem.
The relevant domains include:
- Product: sellable units, substitutions, pack structures
- Inventory: on-hand, reserved, damaged, in-transit stock
- Order management: demand allocation, hold statuses, cancellations
- Fulfillment: pick-pack-ship constraints, warehouse cutoffs
- Transportation: carrier capacity, lane performance
- Pricing and promotions: commitment windows and customer expectations
- Customer: service level entitlements
- Finance: valuation and transfer pricing implications
The retailer’s legacy architecture used nightly batch feeds into a warehouse and several operational point integrations. Store systems reported stock every few hours. The website had its own cache. Call center agents saw a different promise date than the checkout flow. Finance had another inventory number entirely.
The first instinct was to build a central inventory lakehouse with standardized product and stock tables. That would have produced a cleaner mess.
Instead, the company treated the platform as a composition engine.
Inventory, order, fulfillment, and transport domains published events to Kafka. Legacy store systems used CDC into a semantic adapter that emitted domain-style stock adjustment events. Composition services built an available-to-promise view by combining event streams with domain APIs for current exceptions and policy rules for cutoff windows. A reconciliation service compared composed availability against warehouse management snapshots and order allocation outcomes.
The result was not one universal inventory truth. It was several deliberate truths:
- Operational ATP truth for checkout and customer service
- Reconciled stock truth for finance and replenishment
- Exception truth for supply chain operations
That distinction mattered. Operational ATP tolerated seconds of lag and occasional fallback to cached rules. Finance did not. Trying to use one model for both had been the original failure.
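The split between the deliberate truths is a difference of rule, not just freshness. A sketch of the distinction, with the buffer and field choices invented for illustration:

```python
def operational_atp(on_hand: int, reserved: int, safety: int) -> int:
    """Checkout-facing ATP: tolerates lag, applies a safety buffer."""
    return max(on_hand - reserved - safety, 0)

def reconciled_stock(on_hand: int, damaged: int, in_transit: int) -> int:
    """Finance-facing stock: no buffers, counts every owned unit."""
    return on_hand - damaged + in_transit

print(operational_atp(on_hand=50, reserved=8, safety=5))       # 37
print(reconciled_stock(on_hand=50, damaged=2, in_transit=10))  # 58
```

The same warehouse produces 37 and 58 at the same moment, and both numbers are correct for their purpose. Forcing one model to serve both is what the original architecture had been attempting.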
Migration happened progressively. One region, one warehouse network, one channel at a time. The old warehouse reports remained for six months as a benchmark. Variance reports were reviewed daily. The biggest surprises were not technical; they were semantic. “Available” meant one thing in stores, another in the warehouse, and a third in e-commerce. Architecture did not erase that difference. It made it explicit and composable.
That is what good enterprise architecture looks like in the real world: fewer slogans, more clarity.
Operational Considerations
A composition engine is more demanding operationally than a passive data repository. That is the price of relevance.
Observability
You need observability for:
- event lag
- schema drift
- composition latency
- reconciliation exceptions
- state-store health
- policy version usage
- downstream freshness
Traditional pipeline monitoring is not enough. You also need semantic observability: are the business outcomes still coherent?
Data contracts
Domain teams should publish contracts for events, APIs, and reference data. But contracts must include more than field names and types. They need business definitions, effective dating rules, null semantics, identifiers, and deprecation plans.
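One way to make such a contract concrete is to carry the business semantics alongside the types, so that null handling and deprecation are first-class fields rather than tribal knowledge. Everything below — the event, owner, and field names — is illustrative, not a real published contract.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class FieldContract:
    name: str
    dtype: str
    definition: str            # business meaning, in business language
    nullable: bool
    null_means: str = ""       # null semantics spelled out, not implied

@dataclass(frozen=True)
class EventContract:
    event: str
    version: str
    owner: str
    deprecated_after: Optional[str]   # deprecation plan is part of the contract
    fields: Tuple[FieldContract, ...]

stock_adjusted_v2 = EventContract(
    event="StockAdjusted",
    version="2.0",
    owner="inventory-domain",
    deprecated_after=None,
    fields=(
        FieldContract("sku", "string",
                      "Sellable unit id as defined by the product domain",
                      nullable=False),
        FieldContract("delta", "int",
                      "Signed change in physically available units",
                      nullable=False),
        FieldContract("reason_code", "string",
                      "Why the adjustment happened (cycle count, damage, ...)",
                      nullable=True,
                      null_means="Adjustment predates reason-code capture"),
    ),
)
print(stock_adjusted_v2.fields[2].null_means)
```

A consumer reading this knows not only that `reason_code` can be null, but what a null actually asserts about the business — which is the part field-and-type schemas routinely omit.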
Versioning
Composition services are highly sensitive to schema and meaning changes. Support versioned events and policy rule traceability. A field can stay technically compatible while becoming semantically dangerous.
Idempotency and duplicates
Kafka-based architectures must assume retries, duplicates, reordering at boundaries, and delayed consumers. Composition logic needs idempotent processing and explicit handling for late-arriving data.
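Both requirements fit in a few lines: duplicates are dropped by event id, and events beyond a lateness bound are routed to an explicit exception path instead of silently mutating settled state. The 24-hour bound and payload shape are assumptions for the sketch; a real consumer would also persist its seen-id set.

```python
from datetime import datetime, timedelta

seen_ids: set = set()
late_queue: list = []
LATENESS_BOUND = timedelta(hours=24)   # assumed tolerance for this view

def process(event: dict, now: datetime, state: dict) -> None:
    if event["event_id"] in seen_ids:
        return                          # duplicate delivery: a no-op
    seen_ids.add(event["event_id"])
    if now - event["occurred_at"] > LATENESS_BOUND:
        late_queue.append(event)        # reconcile explicitly, do not fold in
        return
    state[event["sku"]] = state.get(event["sku"], 0) + event["delta"]

now = datetime(2024, 5, 2, 12, 0)
state: dict = {}
e = {"event_id": "e1", "sku": "SKU-1", "delta": 3,
     "occurred_at": datetime(2024, 5, 2, 11, 0)}
process(e, now, state)
process(e, now, state)                  # duplicate: ignored
print(state["SKU-1"], len(late_queue))  # 3 0
```

The late queue is where the reconciliation machinery described earlier takes over; folding stale events straight into composed state is how period-end numbers quietly drift.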
Security and compliance
Composed views often mix personal, financial, and operational data. Access control must operate at the product level and, where needed, at row and column levels. Domain ownership does not remove enterprise compliance obligations.
Platform team model
A central platform team should provide shared capabilities: event infrastructure, metadata, quality tooling, state-store patterns, contract governance, lineage, and reconciliation frameworks. But the team should not become the owner of every business semantic. That way lies another integration empire.
Tradeoffs
There is no free architecture here. Let us be honest about the tradeoffs.
Benefit: better semantic integrity
Cost: more explicit modeling work
You will spend more time defining domain boundaries, contracts, and composition rules. Good. That work was always there; you were just hiding it in ETL and spreadsheets before.
Benefit: progressive modernization
Cost: temporary duplication
During migration, old and new pipelines coexist. This is expensive and occasionally maddening. It is still cheaper than a failed big-bang rewrite.
Benefit: domain autonomy
Cost: increased coordination at composition points
Autonomous domains are useful until their outputs need to be assembled for enterprise decisions. Coordination does not disappear; it moves to where it belongs.
Benefit: real-time potential
Cost: more operational complexity
Event-driven composition with Kafka can support low-latency decisioning. It also introduces offset management, replay semantics, stream versioning, and state consistency issues. You are trading batch opacity for operational explicitness.
Benefit: reusable composition products
Cost: risk of central overreach
Shared composition services can save effort. They can also become politically contested if they attempt to own too much logic. Keep them focused on stable cross-domain needs.
Failure Modes
Most failures in this style of architecture are not caused by Kafka, or microservices, or the warehouse technology. They are caused by conceptual shortcuts.
1. Treating CDC as domain truth
CDC is a migration aid, not a semantic strategy. If consumers bind directly to table mutations, you have simply moved database coupling into the platform.
2. Recreating the canonical data model under a new name
Some teams call it a mesh. Others call it a semantic layer. Others call it a data product model. If the goal is still one universal enterprise entity model, the old trap remains.
3. Ignoring reconciliation
Fast composition without trust controls creates operational theatre. The dashboards look modern right up until month-end close.
4. Over-fragmenting domains
Too many tiny microservices and hyper-local data products create composition chaos. Domain-driven design is about meaningful boundaries, not maximal decomposition.
5. Building a platform that only analysts can use
If operational use cases are excluded, the platform becomes another reporting estate. Composition belongs close to decisions, not just reports.
6. Underestimating temporal semantics
Effective dates, processing dates, local business calendars, and correction events will hurt you if ignored. Time is a domain concept, not a metadata footnote.
When Not To Use
This architecture is not mandatory for every organization.
Do not use a composition-engine approach if:
- your business is simple enough that a conventional warehouse and a few operational integrations genuinely suffice;
- domain boundaries are weak because the company is still in an early, highly centralized operating model;
- there is no appetite for domain ownership or semantic contract discipline;
- your use cases are almost entirely historical analytics with little need for operational composition;
- the estate is so small that introducing Kafka, stream processing, and multiple composition products would be architecture as cosplay.
Also, do not use this pattern as an excuse to rebuild everything in microservices. Microservices are relevant when domain seams and deployment independence justify them. They are not a moral virtue.
A composition engine is for enterprises where cross-domain decisions are frequent, semantics vary materially by context, and migration from legacy cores must be incremental. If those conditions do not hold, simpler is better.
Related Patterns
Several patterns sit naturally around this architecture.
Data mesh
Useful when interpreted carefully. The strongest part of data mesh is domain ownership. The weakest implementations are those that stop at ownership and neglect cross-domain composition.
CQRS and materialized views
Very relevant. Many composed products are effectively read models built from multiple domain sources.
Event sourcing
Helpful in specific domains where event history is the natural source of truth. Not necessary everywhere.
Anti-corruption layer
Essential when integrating legacy systems or crossing bounded contexts with incompatible semantics.
Master data management
Still useful, especially for identity and reference domains. But MDM should not be mistaken for enterprise semantic unification.
Strangler fig pattern
Central to migration. Build the new composition topology around the old estate and shrink old dependencies gradually.
Lakehouse or warehouse semantic layers
These remain valuable, especially for analytical composition. They just should not be asked to carry the entire burden of operational and domain semantic integration.
Summary
The old image of the data platform as a giant storage center is past its sell-by date.
A modern enterprise does not win by collecting data in one place and hoping meaning emerges. It wins by composing domain truths into business capabilities. That requires a platform shaped by domain topology, not just by infrastructure layers. It requires bounded contexts, semantic ownership, reconciliation, and progressive migration. It often benefits from Kafka and microservices, but only when they serve business seams rather than fashion trends.
The core idea is simple enough to remember:
Store less hope in the center. Compose more meaning at the edges and across them.
When you do that, the architecture becomes more honest. Source truth is explicit. Derived truth is visible. Composed truth is deliberate. Reconciliation is built in. Migration becomes possible without fantasy deadlines. And the enterprise gets something rarer than a new platform.
It gets a system that understands its business well enough to change with it.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralised data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.