Most data platforms are sold as plumbing.
A little ingestion here, a warehouse there, some streaming in the middle, and a dashboard on top to make everyone feel the money was well spent. The pitch is always the same: centralize the data, standardize the stack, unlock insights. It sounds clean. It sounds rational. It also misses the point.
A modern data platform is not primarily a storage system, nor an analytics engine, nor a fancy message bus with better branding. In an enterprise, its real job is much more political and much more structural: it is a consumer contract layer. It sits between producers that change for their own reasons and consumers that depend on stable meaning. If you design it like infrastructure alone, it will betray you. If you design it like a contract boundary, it starts to behave.
That distinction matters because enterprises do not fail from lack of databases. They fail from semantic drift. “Customer” means one thing in billing, another in CRM, a third in e-commerce, and a suspiciously optimistic fourth in marketing. Teams publish events with names that sound universal and payloads that are little more than persistence models wearing a JSON coat. Consumers build on top of these feeds, then wake up six months later to discover the producer has “improved” the schema and quietly changed the business meaning. This is how analytics turns into archaeology.
The uncomfortable truth is that data platforms inherit all the hard problems of distributed systems and all the hard problems of domain modeling. They are not a neutral substrate. They are where enterprise semantics get routed, translated, versioned, delayed, disputed, and eventually audited. Treating them as a contract layer is not philosophical decoration. It is the only way to build one that survives contact with a real organization.
Context
Most large organizations already have the ingredients:
- operational systems owned by product teams
- microservices publishing events to Kafka or similar
- a warehouse or lakehouse for reporting and machine learning
- ETL and ELT pipelines built by data engineering
- APIs consumed by partner teams, channels, and regulators
- a long tail of batch jobs nobody fully trusts
On paper, this looks modern. In practice, it often behaves like a maze of accidental dependencies.
A pricing service emits price updates. An order service emits order lifecycle events. A customer service emits profile changes. Downstream consumers—fulfillment, finance, fraud, marketing, support, BI—subscribe, transform, join, enrich, and cache. Soon every consumer has encoded assumptions about event timing, uniqueness, completeness, and meaning. The platform team then tries to impose “canonical models” after the fact, usually as a grand central schema no one truly owns.
This is where domain-driven design becomes useful, not as ceremony but as a survival tool. Bounded contexts exist because language breaks at the seams of responsibility. The platform cannot abolish those seams. It has to make them explicit and usable.
The data platform, then, should be understood as a place where producer models are not directly exposed as enterprise truth. Instead, the platform brokers contracts for consumers: stable, intentional, versioned representations aligned to domain semantics and consumption needs.
That sounds subtle. It is not. It changes everything.
Problem
The default architecture for enterprise data sharing is deceptively simple:
- source systems publish data
- platform ingests data
- consumers use data
The flaw sits in step 2. “Ingests data” usually means “copies producer structures downstream and hopes transformations sort it out later.” That creates four familiar pathologies.
1. Producer leakage
Operational schemas leak into every consumer-facing interface. Table columns, service DTOs, and internal event structures become de facto enterprise contracts. The producer changes an enum, splits a field, alters status behavior, or redefines a lifecycle transition. Downstream breaks are inevitable.
2. Semantic inconsistency
Two consumers subscribe to the same event and derive different business meaning. One treats order_confirmed as financially binding; another treats it as customer acknowledgement. Both can point to the same payload. Neither is entirely wrong. The contract was.
3. Integration by archaeology
Consumers reverse-engineer business semantics from historical data patterns, Slack threads, and tribal knowledge. They do not integrate with a domain model; they excavate one.
4. No clear place for reconciliation
Real enterprises do not have one source of truth. They have competing sources of authority depending on the question. Finance reconciles revenue. Operations reconciles inventory. Customer support reconciles identity. If the platform lacks an explicit contract layer, reconciliation becomes ad hoc, buried in every pipeline, impossible to audit.
In short: a platform that merely routes bytes will eventually route confusion.
Forces
Any serious architecture here has to respect a set of competing forces.
Stability versus autonomy
Producer teams need freedom to evolve. Consumer teams need stability. You cannot maximize both. A contract layer exists to create a managed asymmetry: producers evolve internally, while external contracts change deliberately.
Domain fidelity versus standardization
Executives love enterprise-wide canonical models. Architects should be suspicious. Standardization is useful for cross-cutting concepts, but it often crushes domain meaning. A “customer” object that tries to serve retail onboarding, healthcare eligibility, and B2B invoicing becomes a taxidermied animal: all the parts are there, but nothing moves.
Real time versus correctness
Kafka makes propagation fast. It does not make it correct. Event streams are late, duplicated, reordered, and occasionally wrong. The faster your consumers react, the more exposed they are to transient inconsistency. Some contracts should be low-latency. Others should be reconciled, periodic, and authoritative.
Local optimization versus enterprise operability
A team can always build one more transformation to suit a local need. At scale, this creates a brittle mesh of hidden semantics. The platform must decide which transformations belong in consumer teams and which deserve elevation into a shared contract.
Analytical flexibility versus governed meaning
Analysts want raw access. Regulators want lineage. Product teams want speed. Security wants control. A good platform gives different access modes without pretending they are the same thing.
These forces are why data platform architecture is never just a technology choice. It is an agreement about where meaning is stabilized.
Solution
The core idea is simple: design the data platform as a set of consumer-oriented contracts, not as a mirror of producer internals.
This means the platform exposes a small number of deliberate interfaces:
- raw ingestion zones for preservation and traceability
- domain-aligned data products for bounded-context usage
- consumer contract views or streams for stable downstream integration
- reconciliation layers for authoritative cross-context assertions
- routing and mediation policies to govern who gets what shape, with what guarantees
The platform becomes a translation and stabilization layer between operational change and consumer dependency.
Here is the shape of it.
The important move is between domain data products and the consumer contract layer.
A domain data product reflects the language and authority of a bounded context. It is owned close to the domain. An order domain product should speak in the language of order management, not in the language of every team that cares about orders.
A consumer contract, by contrast, is shaped around downstream needs while preserving explicit semantics. It may aggregate, redact, rename, version, or delay fields. It may include derived statuses that are meaningful to finance but not native to order management. It is not “raw truth.” It is governed truth for a use case.
That does not make it fake. It makes it usable.
This is where many platform teams go wrong. They hear “consumer-oriented” and build bespoke feeds for every downstream team. That is just a different path to chaos. The right model is layered:
- preserve raw facts
- model domain facts
- expose intentional contracts
- reconcile where authority spans domains
The contract layer is not a dumping ground for custom projections. It is a curated set of stable interfaces.
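The layered move can be sketched in a few lines. Everything here is illustrative (the field names, status codes, and mapping rules are hypothetical, not from any real system), but it shows the shape: a raw fact is preserved as received, normalized once into a domain fact, then projected into an intentional, redacted consumer contract.

```python
from dataclasses import dataclass

# Raw fact: preserved exactly as it arrived from the producer.
raw_event = {
    "evt": "order_completed",
    "oid": "A-1001",
    "amt_cents": 12950,
    "st": "C",                      # producer-internal status code
    "cust_email": "pat@example.com",
}

@dataclass(frozen=True)
class OrderFact:
    """Domain fact: bounded-context model with explicit names."""
    order_id: str
    amount_cents: int
    is_captured: bool

def to_domain_fact(raw: dict) -> OrderFact:
    # Normalization happens once, in the domain product,
    # not independently in every consumer.
    return OrderFact(
        order_id=raw["oid"],
        amount_cents=raw["amt_cents"],
        is_captured=(raw["st"] == "C"),
    )

def to_finance_contract(fact: OrderFact) -> dict:
    """Consumer contract: intentional, semantically named, redacted."""
    return {
        "order_id": fact.order_id,
        "financial_recognition_status": "recognized" if fact.is_captured else "pending",
        "amount_cents": fact.amount_cents,
        # No customer email: the finance contract minimizes personal data.
    }

contract = to_finance_contract(to_domain_fact(raw_event))
```

Note what the contract does not carry: the producer's cryptic status code and the customer email never leave the domain layer.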
Architecture
A practical architecture usually contains five layers.
1. Source and event emission layer
Operational systems publish domain events or expose CDC streams. Kafka is often the backbone here because it handles fan-out well and supports replay, but Kafka is not the architecture. It is just a very capable conveyor belt.
Events emitted here should be domain events where possible, not database mutation gossip. “OrderShipped” is useful. “RowUpdatedInShipmentTable” is a cry for help.
Still, many enterprises begin with CDC because their legacy systems cannot emit proper events. That is acceptable as a migration tactic if you are honest about the semantics: CDC is a change feed, not a business contract.
2. Raw preservation layer
This layer stores unmodified input with lineage and timestamps. Its purpose is not broad consumption. Its purpose is forensic traceability, backfill, replay, and evidence during disputes.
This is where you keep what actually arrived, when it arrived, and from whom.
3. Domain data product layer
Here, inputs are normalized into bounded-context views. Identity rules, event ordering logic, deduplication, enrichment within context, and domain-level quality checks happen here. Ownership matters. The team closest to the domain should own the semantics, even if the platform team operates the runtime.
These products should carry explicit metadata:
- source authority
- freshness expectations
- identity keys
- allowed nullability
- semantic definitions
- deprecation policy
This is the bridge between data engineering and DDD. Without bounded context ownership, all “data products” become glorified shared tables.
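That metadata can be made executable rather than documentary. A minimal sketch, with illustrative field names and an invented product spec, showing the listed attributes as a machine-checkable declaration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProductSpec:
    """Machine-readable metadata for a domain data product."""
    name: str
    source_authority: str          # which bounded context owns the semantics
    freshness_sla_minutes: int     # expected maximum staleness
    identity_keys: tuple[str, ...]
    nullable_fields: frozenset[str]
    semantic_definitions: dict     # field name -> business meaning
    deprecation_policy: str

order_capture_product = DataProductSpec(
    name="order_capture.orders",
    source_authority="order-management",
    freshness_sla_minutes=5,
    identity_keys=("order_id",),
    nullable_fields=frozenset({"promo_code"}),
    semantic_definitions={
        "order_id": "Identifier assigned at capture; stable for the order lifetime.",
    },
    deprecation_policy="90-day notice, parallel publication of old and new versions",
)

def validate_row(spec: DataProductSpec, row: dict) -> list[str]:
    """Return violations of the product's declared expectations."""
    problems = []
    for key in spec.identity_keys:
        if row.get(key) is None:
            problems.append(f"missing identity key: {key}")
    for name, value in row.items():
        if value is None and name not in spec.nullable_fields:
            problems.append(f"unexpected null: {name}")
    return problems
```

Once the spec is data, quality checks, lineage tooling, and deprecation warnings can all read from the same source instead of a wiki page.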
4. Reconciliation layer
This layer settles questions no single bounded context can answer alone.
For example:
- Is an order financially recognized?
- Is an item truly available to promise?
- Is this person the same regulated customer across channels?
- Has a claim been fully adjudicated?
Those answers often require combining signals from order management, inventory, payments, customer identity, and compliance. They should not be reinvented by every consumer. Reconciliation services or batch/stream hybrids establish authoritative assertions and exceptions.
5. Consumer contract layer
This is the public frontage of the platform. It can expose:
- versioned Kafka topics
- serving tables
- APIs
- materialized views
- outbound partner feeds
Each contract should state:
- intended consumers
- semantic meaning
- change policy
- delivery guarantees
- latency profile
- lineage back to domain products and sources
Think of this as the anti-corruption layer for the enterprise’s own data estate.
Domain semantics: the part people skip
The hardest problem is not routing. It is naming what the data means.
A contract is not stable because the schema stays the same. It is stable because the meaning stays legible. A field called status is not a contract. It is an argument waiting to happen.
Good consumer contracts lean into semantic precision:
- financial_recognition_status
- available_to_promise_quantity
- customer_contactability_preference
- fulfillment_ready_at
These names are longer because they are cheaper.
This is pure domain-driven design territory. Ubiquitous language should not stop at service boundaries. The platform should encode it in contracts, documentation, lineage, quality rules, and deprecation notices.
A useful test: can a downstream team explain a field’s business meaning without calling the producer team? If not, the contract layer has failed.
Migration Strategy
No enterprise gets to start fresh. You inherit nightly ETL, fragile CDC, duplicated marts, and a warehouse full of suspiciously similar “gold” tables. So the migration strategy matters as much as the destination.
This is a classic strangler move, but applied to data semantics rather than just application routing.
Step 1: map existing producer-consumer dependencies
Find who consumes what, at what latency, and for what decision. This sounds obvious and is usually missing. Most organizations know where pipelines run; they do not know what business commitments sit on top of them.
Classify consumers:
- exploratory analytics
- operational reporting
- decision automation
- external/regulatory
- downstream service orchestration
The last three are where contract discipline matters most.
Step 2: separate raw preservation from consumer access
If consumers are pulling directly from ingestion or CDC layers, stop adding more. Preserve raw for replay and audit, but begin steering new consumer use cases toward curated contracts.
Step 3: identify high-value contract domains
Pick domains where semantic instability is causing real pain. Orders, customers, payments, inventory, and claims are common starting points. Build domain products first, not enterprise-wide canonical models.
Step 4: create compatibility contracts
For existing consumers, introduce contract topics or views that emulate current behavior while cleaning semantics underneath. This reduces migration friction. A contract layer should absorb producer churn so consumers can migrate on business timelines, not producer sprint cycles.
Step 5: introduce reconciliation for disputed facts
Do not pretend all inconsistencies can be solved with better schemas. Some require explicit business arbitration. Build reconciliation logic as a first-class capability with exception reporting.
Step 6: progressively cut consumers over
Migrate consumers one segment at a time. Keep parallel runs where decisions matter. Compare outputs, investigate deltas, and only then retire legacy feeds.
Step 7: enforce governance at the edge
Once contracts exist, prevent direct dependency on unstable internal structures for critical consumers. Otherwise the old path will regrow like ivy.
Reconciliation during migration
This deserves emphasis because it is usually underdesigned.
When you migrate consumers from legacy feeds to platform contracts, you will uncover differences:
- duplicate events previously ignored
- late-arriving records changing historical aggregates
- identity mismatches across systems
- status transitions interpreted differently
- silently dropped records in old jobs
If you treat these as “data quality issues” alone, you will stall. Many are really semantic disagreements. Reconciliation must produce both:
- a resolved contract output
- an exception model for unresolved cases
In other words, the platform should not just publish “the answer.” It should publish where the answer is uncertain and who owns that uncertainty.
That is how trust is built.
Enterprise Example
Consider a global retailer with e-commerce, stores, marketplace sellers, and a finance organization that closes books across 40 countries.
They had Kafka topics from order, payment, and shipment services. They had a cloud warehouse fed by CDC from ERP and store systems. They also had at least nine different definitions of “net sales.”
The order service published order_completed. Marketing used it for campaign attribution. Fulfillment used it to begin picking. Finance used it in a revenue mart after subtracting returns, tax, fraud cancellations, and payment failures. Support used it to answer “where is my order?” Everyone thought they were using the same event. They were not.
The first mistake was assuming the order service event could be the enterprise contract. It could not. It represented one bounded context: order capture.
The architecture was refactored into:
- raw Kafka and CDC preservation
- domain products for order capture, payment settlement, shipment execution, and returns
- a reconciliation service for commercial order state
- versioned consumer contracts:
  - fulfillment_order_ready
  - finance_recognized_sale
  - customer_order_timeline
  - marketplace_partner_order_export
A few hard choices followed.
Fulfillment needed low latency and tolerated eventual consistency. Finance needed correctness and accepted delay. Support needed a customer-readable timeline, which required combining noisy events into a stable narrative. Marketplace partners needed a redacted external contract and stricter versioning than internal consumers.
This was not one “golden order table.” It was a set of contracts with different semantics, latency, and governance.
The migration ran in waves. First, finance consumed a reconciled revenue contract in parallel with its legacy ETL mart. Variance reports were produced daily. Some differences revealed defects in the new logic; many exposed long-accepted defects in the old jobs. Then support moved from joining six source tables to a contract API built off the customer order timeline product. Call handling time dropped because agents no longer interpreted internal statuses manually. Finally, partner feeds were switched from direct warehouse extracts to versioned outbound contracts, reducing breakages during internal service changes.
The result was not dramatic in the way conference talks like. No one said “we achieved data mesh enlightenment.” What changed was more important: teams stopped negotiating field meanings in every integration. The platform became a place where that negotiation happened once, visibly, with ownership.
That is architecture doing its job.
Operational Considerations
A contract layer is only credible if it behaves operationally.
Versioning
Version contracts explicitly. Backward compatibility rules should be published and enforced. Breaking changes should create new versions, not surprise payloads.
Lineage and observability
Every contract should trace back to source events, transformations, and reconciliation logic. Observability must cover:
- lag
- freshness
- schema drift
- volume anomalies
- reconciliation exceptions
- contract usage
If you cannot see who depends on a contract, you do not have governance; you have hope.
Data quality tied to business meaning
Null checks and row counts are not enough. Measure semantic quality:
- percentage of orders with unresolved payment state
- number of customer identities with conflicting regulatory IDs
- inventory availability confidence
- percentage of revenue recognized after SLA window
Security and policy
A consumer contract layer is a good place to enforce masking, minimization, residency, and purpose-bound access. It is far easier to secure intentional contracts than sprawling raw copies.
Runtime choices
Not everything needs to be streaming. Kafka is excellent for event propagation and near-real-time contracts. Warehouses and lakehouses are often better for reconciled, historical, and analytically rich contracts. The right platform uses both without romantic attachment.
A bad architecture debates stream versus batch as ideology. A good one asks which guarantee the consumer actually needs.
Tradeoffs
This approach is not free.
First, it adds layers. Some engineers will complain that you are “duplicating data” and “adding unnecessary transformations.” They are half right. You are adding structure because unmanaged reuse is more expensive than managed duplication.
Second, it requires stronger ownership. Domain teams must participate in semantics, not just payload emission. Platform teams must act as stewards of contracts, not ticket-driven pipeline operators.
Third, it slows down naive publishing. A producer can no longer toss an internal schema into Kafka and declare victory. Good. That was never a platform strategy.
Fourth, consumer-specific contracts can proliferate if poorly governed. The answer is not to avoid contracts; it is to tier them:
- enterprise shared contracts
- domain shared contracts
- edge-specific derived contracts with clear lifecycle rules
The tradeoff is deliberate friction in exchange for less accidental coupling.
I would take that deal every time.
Failure Modes
There are several predictable ways this goes wrong.
Canonical model mania
The enterprise creates one giant universal model and forces every domain through it. This produces endless governance meetings and impoverished semantics. It looks tidy in PowerPoint and collapses under real business change.
Contract sprawl
Every consumer gets a bespoke projection. The platform becomes a reporting factory with streaming lipstick. Soon nobody knows which contracts are strategic versus temporary.
Producer abdication
Domain teams publish low-quality events and expect the platform to “fix it later.” The platform can mediate semantics, but it cannot conjure missing business facts.
Reconciliation hidden in dashboards
Cross-domain truth gets recreated separately in finance SQL, support tooling, and ML features. The platform then exports “trusted data” that nobody actually trusts.
Governance by document
If semantic definitions live only in Confluence, they are already stale. Contracts need executable governance: schemas, tests, lineage, SLAs, and deprecation controls.
Over-promising real time
Teams use streaming contracts for decisions that actually require settled truth. Then they discover event-time disorder, retries, and reversals the hard way. Real time is a feature. Correctness is a requirement.
When Not To Use
There are cases where a heavy consumer contract layer is unnecessary.
If you are a small company with a handful of systems and one analytics team, raw-plus-warehouse may be enough. Do not build a semantic ministry before you have semantic conflict.
If consumers are exploratory analysts and data scientists working directly with curated domain products, forcing every use case through formal contracts may slow learning.
If the business domain is simple, stable, and tightly centralized, direct integration can be acceptable for a while.
And if your source systems are so poor that core identifiers and event integrity are absent, your first problem is operational system quality, not contract design. The platform cannot be a moral laundering machine for broken source data.
Use this pattern when:
- many consumers depend on shared business facts
- producer change is frequent
- semantics differ across bounded contexts
- regulatory, financial, or operational decisions require traceable meaning
- reconciliation is a recurring enterprise concern
In those conditions, not having a contract layer is the expensive choice.
Related Patterns
This architecture sits near several familiar patterns, but it is not identical to any one of them.
Anti-Corruption Layer
In DDD, an anti-corruption layer protects one model from another. The consumer contract layer is essentially an enterprise-scale anti-corruption layer between producer internals and downstream consumers.
Data Products
Useful, but insufficient alone. A data product aligned to a bounded context is necessary. It does not automatically become the right interface for every consumer.
Event-Carried State Transfer
Common in microservices and Kafka ecosystems. Useful for propagation, dangerous when mistaken for stable business contracts.
CQRS Read Models
Consumer contracts often look like read models. The difference is enterprise scope and governance. These are not just app-local projections; they are shared interfaces with lifecycle policy.
Strangler Fig Migration
Essential for adoption. Replace unstable direct dependencies progressively, with parallel runs and reconciliation rather than big-bang rewrites.
Summary
A data platform is not just where data lands. It is where meaning is negotiated into something stable enough to build on.
That is why I argue it should be designed as a consumer contract layer.
Raw data still matters. Domain-aligned products still matter. Kafka, microservices, CDC, warehouses, and APIs all still have a role. But the architectural center of gravity should be the contract boundary between producer autonomy and consumer dependence. That is where semantics need to be explicit, versioned, and operationally governed.
The key ideas are straightforward:
- do not expose producer internals as enterprise truth
- model domain products around bounded contexts
- create intentional consumer contracts with clear semantics
- make reconciliation a first-class capability
- migrate progressively using strangler patterns and parallel validation
- govern contracts through executable controls, not documentation alone
In real enterprises, the question is rarely whether data exists. The question is whether downstream teams can rely on what it means tomorrow.
That is the job.
Everything else is plumbing.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralised data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.