A central data platform often begins life as a rescue mission.
One team steps in to clean up reporting chaos, standardize pipelines, tame warehouse sprawl, and make analytics vaguely trustworthy. For a while, this feels like progress. The dashboards stop disagreeing quite so loudly. Finance gets one revenue number instead of five. A proper lakehouse appears on the architecture diagram, and everyone breathes out.
Then success curdles.
The platform team becomes the front door for every new data request, every schema change, every quality issue, every integration with Kafka, every machine learning feature feed, every retention policy, every argument about what a customer is. The team that was supposed to enable flow becomes a customs checkpoint. Work piles up. Context gets lost. Delivery slows. Domain meaning decays. The platform becomes efficient at moving data while becoming strangely bad at preserving what the data means.
This is not a tooling problem. It is not solved by buying a better catalog, a shinier streaming engine, or another layer of orchestration software. It is a topology problem. More precisely, it is a mismatch between the shape of the enterprise and the shape of the teams building its data systems.
When data platforms ignore domains, they create a dangerous illusion: that all enterprise data can be managed as one homogeneous technical estate. It cannot. Revenue recognition is not the same thing as cart activity. Claims adjudication is not the same thing as customer support interaction. Manufacturing yield, exposure risk, marketing attribution, and subscription billing each carry their own semantics, invariants, cadence, and failure consequences. The technical substrate may be shared. The meaning never is.
This is where flow topology matters. If teams are arranged so that domain knowledge lives far away from data production and data consumption, flow degrades. Every change becomes a handoff. Every handoff becomes a translation. Every translation loses meaning. And the platform team, despite working heroically, becomes the slowest part of the value stream.
The uncomfortable truth is simple: a data platform without domains becomes a bureaucracy with good intentions.
The better answer is not to abolish the platform team. That would be cargo-cult decentralization. The answer is to put the platform in its proper place: as an enabling capability for domain-owned data products and domain-aligned streams of change. In other words, use domain-driven design to shape the semantics, use flow-based team topology to shape the operating model, and use progressive migration patterns to get there without breaking the estate.
Context
Most enterprises arrive here honestly.
They did not wake up one morning and decide to build a centralized bottleneck. They inherited fragmented operational systems, duplicated reporting logic, and years of integration sediment. A central platform team was often the only rational response. Someone had to impose standards on ingestion, metadata, security, storage, lineage, and compute. Someone had to create reusable building blocks. Someone had to stop every project from inventing its own half-broken pipeline framework.
So the platform team was formed. It built ingestion services, event streams, warehouse models, governed access patterns, and often a self-service promise. It also became the place where the enterprise outsourced semantic judgment. That is the point where trouble starts.
In domain-driven design terms, the enterprise does not have “data” as a single cohesive domain. It has many business domains, each with bounded contexts and their own ubiquitous language. The definition of “order,” “customer,” “active,” “settled,” “exposed,” or “available” is not globally stable. It is contextual. Once a central team starts standardizing these concepts without living inside the business context that gives them meaning, the models become politically neat but operationally wrong.
This is why centralization often produces clean diagrams and dirty semantics.
A good architecture respects the fact that data is not merely a technical asset. It is a projection of business behavior. If the business is domain-shaped, the data architecture must be domain-shaped too.
Problem
The classic anti-pattern looks like this:
- Source systems emit data into a shared platform.
- A central team owns ingestion, transformation, quality rules, canonical models, and downstream publication.
- Domain teams request changes through tickets, committees, or backlog prioritization.
- Consumers depend on central curated datasets because that is where governance and trust supposedly live.
This scales badly for three reasons.
First, change queues accumulate. A product team changes checkout. Finance introduces a revised revenue policy. Risk wants a new fraud signal. Customer support adds a lifecycle state. All of this lands with the platform team. The team must understand the source change, infer the business implication, update transformation logic, negotiate backward compatibility, and publish revised datasets. The queue grows not because the team is incompetent, but because all semantic change in the enterprise has been funneled through one organizational chokepoint.
Second, semantic drift becomes inevitable. The platform team sees tables and topics. The domain sees commitments, obligations, exceptions, and edge cases. If those two views are separated by handoffs, the technical model drifts from the business model. This produces a familiar corporate disease: trusted pipelines that tell the wrong story.
Third, ownership gets diluted. When no domain team owns the meaning and quality of its published data, everyone complains and nobody fixes. Platform teams end up carrying incidents for issues they did not create and cannot properly judge. Domain teams assume the platform will “sort out the data.” Consumers blame the platform for inconsistencies rooted in source behavior. Accountability vanishes into the architecture.
That is how a platform team becomes a bottleneck and a scapegoat at the same time. It is an ugly combination.
Forces
Several competing forces make this problem hard.
Standardization versus semantic fidelity
Enterprises need common controls: security, lineage, observability, retention, schema evolution practices, and shared infrastructure. But they also need domain-specific meaning. Push too hard on standardization and you erase the context that makes the data useful. Push too hard on local freedom and you get chaos, duplication, and governance holes.
Autonomy versus interoperability
Domain teams should be able to publish and evolve their own data products without waiting on a central team. But consumers still need discoverability, contracts, and compatibility. Autonomy without interoperability is just federated confusion.
Event speed versus reconciliation reality
In Kafka-heavy estates, there is a temptation to treat streaming as truth and immediacy as quality. Real enterprises are messier. Messages arrive late. Events are duplicated. APIs fail. Batch corrections happen. Reference data changes after the fact. A serious architecture must discuss reconciliation, not merely ingestion. Fast pipelines that cannot reconcile are elegant until month-end.
Platform leverage versus platform overreach
A shared platform is essential. Every domain should not be building its own lineage engine or access-control framework. But platform teams are most valuable when they provide paved roads, not when they become air traffic control for every taxiing movement.
Legacy gravity versus target-state purity
Most enterprises are not greenfield. They have ERP systems, warehouses, ETL jobs, brittle report dependencies, and a long tail of “temporary” integrations old enough to vote. Migration cannot assume a clean break. It must account for coexistence, strangling, and deliberate reconciliation.
These forces do not disappear. Architecture is not the art of denying tradeoffs. It is the art of choosing them on purpose.
Solution
The useful pattern is this: organize data ownership around business domains, with a platform team providing enabling capabilities rather than central semantic control.
That sounds fashionable. It is also practical.
The domain owns the meaning, quality intent, publication contracts, and lifecycle of the data it exposes. The platform provides the machinery: secure storage, streaming backbone, schema registry, catalog, observability, policy enforcement, CI/CD templates, and common data product scaffolding. Stream-aligned teams in the business domains publish data products close to where operational truth is created. Downstream consumers subscribe to those products through explicit contracts, not by reverse-engineering database exhaust.
This is flow topology applied to data architecture. Team boundaries are aligned to value flow and bounded contexts. The result is fewer semantic translations, faster change, and clearer ownership.
A simple way to say it:
The platform should make good data behavior easy. It should not become the interpreter of the business.
Domain semantics first
Domain-driven design gives the architecture its spine.
Each domain should define:
- its core business entities and events
- the language used to describe them
- the invariants that matter
- the quality expectations and reconciliation rules
- the data products it publishes
- the contracts and versioning strategy for consumers
This is more than naming topics well. It means understanding bounded contexts. “Customer” in billing, CRM, and fraud may be related but not identical. Forcing one canonical enterprise customer model too early often creates fiction. Better to make context explicit and integrate through published mappings, reference models, or anti-corruption layers where needed.
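To make the idea concrete, here is a minimal sketch of context-explicit modeling with an anti-corruption translation between two "customer" contexts. All names here (`BillingCustomer`, `CrmCustomer`, the status values) are illustrative assumptions, not a canonical model:

```python
# Hypothetical sketch: "customer" means different things in billing and CRM.
# Each bounded context keeps its own language; a translation function maps
# between them instead of forcing one enterprise-wide model.
from dataclasses import dataclass

@dataclass(frozen=True)
class BillingCustomer:          # bounded context: billing
    account_id: str
    payment_status: str         # "current" | "delinquent"

@dataclass(frozen=True)
class CrmCustomer:              # bounded context: CRM
    crm_id: str
    lifecycle_stage: str        # "lead" | "active" | "churned"

def to_crm_view(billing: BillingCustomer, crm_id: str) -> CrmCustomer:
    """Anti-corruption translation: billing semantics are mapped explicitly,
    not leaked into the CRM context's language."""
    stage = "active" if billing.payment_status == "current" else "churned"
    return CrmCustomer(crm_id=crm_id, lifecycle_stage=stage)
```

The mapping rule itself ("delinquent means churned") is a deliberate, reviewable business decision owned by the integrating context, which is exactly the point.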
Platform as product, not process gate
The platform team should behave like an internal product organization. It offers:
- self-service provisioning
- reusable data pipeline templates
- managed Kafka topics and schema governance
- standardized monitoring and lineage
- policy-as-code for access and retention
- storage and compute abstractions
- quality and contract testing frameworks
The platform should not require a ticket for every transformation or semantic change. If it does, it is not self-service. It is concierge engineering with a backlog.
Data products, not central curated monoliths
A data product is not just a table with a proud name. It is a published, discoverable, governed, versioned interface carrying domain meaning. It has an owner. It has expectations. It has consumers. It has a support model. It can be event streams, operational APIs, warehouse views, feature feeds, or batch exports, but the semantics must be explicit.
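One way to picture what "explicit semantics" means in practice is a product descriptor that every published product must carry. The field names below are assumptions for illustration, not a standard:

```python
# Illustrative data product descriptor: ownership, contract version, interface
# type, and quality expectations are explicit and machine-readable.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str                    # e.g. "commerce.order_lifecycle"
    owner_team: str              # a named owning team, never "data platform"
    version: str                 # semantic version of the published contract
    interface: str               # "event_stream" | "warehouse_view" | "batch_export"
    freshness_slo_minutes: int   # a quality expectation consumers can rely on
    consumers: list = field(default_factory=list)

    def is_breaking(self, new_version: str) -> bool:
        """A major-version bump signals a breaking contract change."""
        return new_version.split(".")[0] != self.version.split(".")[0]
```

A descriptor like this is what lets a platform automate discovery, notifications, and deprecation policy instead of adjudicating each change by hand.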
Architecture
The target architecture usually combines operational microservices, event streaming, domain data products, and a federated governance layer.
A few points matter here.
First, Kafka is useful when the business benefits from events as first-class signals: orders placed, invoices issued, claims approved, shipments delayed, accounts suspended. But Kafka is not the architecture; it is a transport and retention mechanism. The architecture lives in the contracts, ownership, and semantics around those events.
Second, microservices can help if they align with bounded contexts and emit meaningful domain events. They hurt when they atomize a domain into technical shards and then force data consumers to reconstruct business truth from a hailstorm of low-level events.
Third, not every product must be real-time. Some domains are naturally event-centric; others need reconciled periodic outputs. A monthly finance close dataset can still be a data product. Architecture gets healthier when it stops pretending all truth is immediate.
Reconciliation is a first-class concern
This deserves emphasis because many glossy platform designs skip it.
Operational reality creates divergence:
- out-of-order events
- duplicate messages
- missing events
- late-arriving reference data
- source system corrections
- bulk backfills after outages
- competing truth between event logs and system-of-record extracts
A serious domain-owned data product defines how reconciliation works. For example:
- event stream as operational signal
- daily ledger extract as control dataset
- deterministic replay from Kafka for short retention windows
- periodic comparison against source-of-record
- exception topics and compensating corrections
That sounds less glamorous than “real-time insights.” It is also how grown-up enterprises survive audit and quarter-end.
The key idea is that a data product is not “good” because it streams. It is good because it can explain itself when reality gets messy.
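A minimal reconciliation sketch, assuming stream-derived aggregates and a daily ledger extract as the control dataset. The record shape and tolerance are illustrative choices:

```python
# Compare stream-derived totals against source-of-record control totals.
# Discrepancies beyond tolerance become exceptions, which a real system would
# route to an exception topic and a compensating-correction workflow.
def reconcile(event_totals: dict, ledger_totals: dict, tolerance: float = 0.01):
    """Return a list of exception records; an empty list means reconciled."""
    exceptions = []
    for key in sorted(set(event_totals) | set(ledger_totals)):
        ev = event_totals.get(key, 0.0)   # value derived from the event stream
        lg = ledger_totals.get(key, 0.0)  # value from the control extract
        if abs(ev - lg) > tolerance:
            exceptions.append({"key": key, "stream": ev, "ledger": lg,
                               "delta": round(ev - lg, 2)})
    return exceptions
```

The useful property is not the arithmetic; it is that divergence is detected, quantified, and owned, instead of being discovered at quarter-end.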
Migration Strategy
You do not migrate from centralized platform bottleneck to domain-aligned data ownership with a memo and a workshop. You migrate by progressively strangling centralized responsibilities while strengthening domain capabilities.
A sensible migration has stages.
1. Identify bounded contexts and value streams
Start with the business, not the warehouse. Where do core business changes originate? Which domains own those changes? Which datasets are really cross-domain views versus local truths? This exercise typically reveals that many so-called enterprise models are really unstable compromises.
Map domains, producers, consumers, critical decisions, and current pain. Especially note where the platform team is acting as translator, approver, or semantic arbitrator. Those are likely migration hotspots.
2. Classify data products
Not all datasets deserve equal treatment. Separate them into:
- authoritative domain products
- derived analytical products
- enterprise reference products
- temporary compatibility outputs
- legacy curated assets to be retired
This avoids trying to “productize” every historical table. Some assets should simply be strangled and buried.
3. Introduce domain ownership for new change first
The cleanest first move is not rewriting everything. It is saying: all net-new business capabilities must publish domain-owned data products through platform guardrails. New topics, schemas, and curated outputs are owned by the relevant domain team. The platform provides templates, policies, and tooling.
This changes the future before it fights the past.
4. Strangle legacy central transformations
Pick a high-value domain area where the platform team repeatedly mediates changes—orders, billing, claims, inventory, policy administration. Then carve out one product at a time.
A common progression:
- central pipeline still runs
- domain team publishes a parallel product
- consumers are migrated gradually
- outputs are compared for reconciliation
- old central transform is deprecated
- responsibility transfers fully
That parallel run matters. Enterprises should distrust any migration plan that assumes semantic equivalence without proving it.
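The cut-over decision in that parallel run can be made mechanical rather than political. A hedged sketch, where the clean-cycle count and tolerance are illustrative policy choices:

```python
# Gate the consumer migration on evidence: the new domain-owned product must
# match the legacy central output for N consecutive reporting cycles before
# consumers switch and the old transform is deprecated.
def ready_to_cut_over(cycle_discrepancies: list,
                      required_clean_cycles: int = 3,
                      max_discrepancies: int = 0) -> bool:
    """cycle_discrepancies: reconciliation exception count per cycle, oldest first."""
    recent = cycle_discrepancies[-required_clean_cycles:]
    return (len(recent) == required_clean_cycles
            and all(d <= max_discrepancies for d in recent))
```

Early noisy cycles are expected; what matters is a sustained run of clean comparisons before responsibility transfers fully.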
5. Move governance from approval to guardrails
The central data governance model must evolve. If governance is a committee that blesses changes one by one, decentralization will fail. Replace approval-centric governance with:
- standard product metadata requirements
- schema compatibility rules
- quality SLOs
- lineage publication
- policy enforcement automation
- consumer notification standards
- incident responsibilities
Governance should shape behavior by default. It should not become another queue.
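Schema compatibility is the clearest example of a guardrail that can run as automation instead of an approval meeting. Here is a deliberately simplified sketch of a backward-compatibility check in the spirit of registry compatibility rules; real schema registries handle far more cases:

```python
# Simplified backward-compatibility guardrail: consumers reading with the new
# schema must still be able to read data written with the old schema.
# Field maps are {name: (type, has_default)}; this shape is an assumption.
def backward_compatible(old_fields: dict, new_fields: dict) -> bool:
    for name, (ftype, _) in old_fields.items():
        if name in new_fields and new_fields[name][0] != ftype:
            return False   # changing a field's type breaks existing data
    for name, (_, has_default) in new_fields.items():
        if name not in old_fields and not has_default:
            return False   # a new required field cannot be filled from old data
    return True            # deletions and new optional fields are allowed
```

Run as a CI check on every proposed contract change, this replaces a queue of human approvals with a default behavior.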
6. Refactor the platform team itself
This is often overlooked. If the same platform team keeps its old charter, it will continue attracting semantic work. Change its mission.
The platform team should own:
- common infrastructure
- enablement and developer experience
- platform reliability
- reusable components
- federation support
- standards and controls
It should explicitly stop owning domain semantics, bespoke transformations, and local data issue triage except as temporary migration support.
Enterprise Example
Consider a multinational retailer with e-commerce, stores, loyalty, finance, and supply chain systems.
Initially, the retailer built a large central data platform team to unify reporting. Orders from commerce, point-of-sale transactions, stock movements, loyalty interactions, and finance postings all flowed into a shared lakehouse. The platform team created a canonical “customer,” “order,” and “product” model consumed by analytics, pricing, and forecasting teams.
This worked until the retailer accelerated digital change.
The commerce domain changed checkout flows every few weeks. The loyalty team introduced new membership rules. Finance changed revenue allocation for bundles. Supply chain revised inventory availability logic to support ship-from-store. Every one of these changes required central model updates. The platform team became overwhelmed. Analytics lagged releases by weeks. The “customer 360” model contained contradictory meanings depending on whether a person was anonymous, logged in, in-store identified, or loyalty-enrolled. Forecasting trusted inventory numbers that finance did not. Support dashboards disagreed with commerce KPIs. Nobody was lying. The architecture was.
The retailer changed course.
Commerce, loyalty, finance, and supply chain each became owners of domain data products. Commerce published checkout events, order lifecycle products, and cart conversion outputs. Loyalty published membership status, points ledger, and benefit entitlement products. Finance published invoice, payment, and reconciled revenue products. Supply chain published inventory position, reservation, and fulfillment promise products.
Kafka handled event distribution where needed, but not every product was streaming. Finance still produced reconciled daily and monthly products because financial truth required controls and adjustments. The platform team provided the event backbone, schema registry, access controls, lineage, and a standard product publishing framework.
During migration, the old central curated “enterprise order” dataset ran in parallel with the new commerce-owned order product. Reconciliation jobs compared gross sales, net sales, returns, payment status, and fulfillment states across both outputs. Exceptions were reviewed jointly by commerce and finance. Only after three reporting cycles and a quarterly close did major consumers switch over.
The outcome was not magical. It was better.
Lead time for domain data changes dropped because commerce no longer waited for platform backlog triage. Semantic quality improved because the teams changing the operational behavior also changed the published data contracts. The platform team became more effective because it stopped trying to be an expert in promotions, accounting policy, and stock reservation all at once. Governance also improved because owners were explicit.
The retailer still had shared enterprise views. But those views were assembled from domain-owned products, not invented centrally in isolation.
That is the distinction that matters.
Operational Considerations
A domain-aligned data architecture succeeds or fails in operations, not in strategy decks.
Ownership and support
Every data product needs:
- a named owning team
- support hours and escalation path
- quality objectives
- change notification rules
- deprecation policy
- consumer dependency visibility
If the owner field says “data platform,” you are probably back where you started.
Contracts and versioning
For Kafka topics, use schema compatibility rules and explicit version strategies. For tables and views, document stability expectations, refresh cadence, late-data behavior, and key semantics. Breaking changes should be rare and intentional. Consumers should not learn about changes from failing jobs in production.
Observability
At minimum, track:
- freshness
- volume anomalies
- schema changes
- null and distribution drift
- reconciliation discrepancy rates
- consumer access patterns
- incident ownership
Observability should expose semantic health, not just job runtime. A green pipeline that publishes nonsense is still an outage.
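Two of those signals, freshness and volume, can be checked mechanically once per-run metadata is collected. A sketch with illustrative thresholds; real systems tune these per product:

```python
# Semantic-health checks on published product metadata: freshness against the
# product's SLO, and volume against a rolling baseline.
from datetime import datetime, timedelta

def health_signals(last_published: datetime, now: datetime,
                   row_count: int, baseline_rows: int,
                   freshness_slo: timedelta,
                   volume_tolerance: float = 0.5) -> list:
    """Return a list of alert strings; an empty list means healthy."""
    alerts = []
    if now - last_published > freshness_slo:
        alerts.append("stale: freshness SLO breached")
    if baseline_rows and abs(row_count - baseline_rows) / baseline_rows > volume_tolerance:
        alerts.append("volume anomaly vs baseline")
    return alerts
```

Note what this does not check: job success. A pipeline can finish green and still trip both of these alarms, which is exactly the kind of outage job-runtime monitoring misses.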
Security and policy
Federation does not mean relaxing controls. In fact, it usually requires stronger automation:
- classification tags
- row and column-level access controls
- retention enforcement
- jurisdictional handling
- audit lineage
- consent and privacy policy propagation
These controls belong in the platform because domains should not each reinvent them poorly.
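To show what "stronger automation" can look like, here is an illustrative policy-as-code sketch where access decisions follow from classification tags rather than per-request approvals. The tags, roles, and policy table are assumptions, not any real policy engine's API:

```python
# Tag-driven access policy: a role may read a column only if every tag on
# that column permits it. Policies live in code, enforced by the platform.
POLICY = {
    "pii":       {"allowed_roles": {"privacy_officer", "domain_owner"}},
    "financial": {"allowed_roles": {"finance_analyst", "domain_owner"}},
    "internal":  {"allowed_roles": {"any"}},
}

def access_allowed(column_tags: set, role: str) -> bool:
    """Deny if any tag's policy excludes the role (most-restrictive wins)."""
    for tag in column_tags:
        allowed = POLICY.get(tag, {}).get("allowed_roles", set())
        if "any" not in allowed and role not in allowed:
            return False
    return True
```

Because the policy is data, domains can tag their products while the platform enforces the rules uniformly, which is the federation bargain in miniature.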
Data lifecycle and storage choices
Some products are best represented as immutable event streams. Others need snapshot tables, slowly changing dimensions, feature stores, or serving APIs. Do not force one storage pattern everywhere. Architecture should fit access and semantics.
Tradeoffs
There is no free lunch here.
A domain-oriented approach improves semantic fidelity and flow, but it introduces coordination costs. You may end up with more products, more visible overlap, and more negotiation across contexts. Some duplication is not failure; it is the natural price of bounded contexts.
You also need stronger engineering maturity in domain teams. If they cannot own contracts, quality, or operational support, pushing data ownership into domains simply spreads fragility around the enterprise.
Another tradeoff is consistency. A central team can impose superficial uniformity faster. Domain ownership produces healthier diversity, but it can feel messy. Leaders who crave one canonical everything often find this uncomfortable.
And there is a platform tradeoff too. Building true self-service platform capabilities is harder than running a ticket queue. It requires product thinking, internal UX, investment in automation, and relentless reduction of accidental complexity. Many firms claim to have a platform when they actually have a service desk with Terraform.
Failure Modes
Several failure modes are common.
Fake decentralization
The organization declares domain ownership, but all meaningful changes still require central approval, bespoke platform work, or architecture review boards. This is decentralization in PowerPoint only.
Domain teams without domain authority
If data products are assigned to teams that do not control source behavior or business definitions, ownership becomes ceremonial. The right team must be close enough to operational truth to fix the causes, not merely polish the outputs.
Canonical model obsession
Trying to force a single enterprise-wide canonical model too early is a reliable way to stall. Canonical models are sometimes useful at the edges, but they should emerge where they genuinely reduce complexity, not as a political desire for neatness.
Event theater
Teams emit endless Kafka events without clear product contracts, retention intent, or consumer semantics. Streams become another data swamp, only faster.
No reconciliation path
This one is lethal. If there is no mechanism to compare published products against source-of-record controls, incidents will become theological debates. Mature systems are designed for disagreement and correction.
Platform neglect
In reaction to central bottlenecks, some enterprises underinvest in the shared platform. Then every domain rebuilds tooling, governance weakens, and interoperability collapses. The answer to bad centralization is not anarchy.
When Not To Use
This approach is not universal.
Do not lean heavily into domain-aligned data ownership if your enterprise is very small, your data estate is simple, and one team genuinely understands the entire business context. In that case, heavy federation can be more ceremony than value.
Do not use it if your domain teams lack the engineering capability or operating mandate to own data products. You cannot decentralize responsibility into a vacuum.
Do not over-apply it for low-change reporting estates where the primary need is stable consolidated reporting from a handful of systems and semantic change is rare. A central team may be perfectly sufficient there.
And do not pretend microservices and Kafka are mandatory. If your source systems are packaged applications with batch exports and your consumers are mainly finance and regulatory reporting, a strong batch-oriented architecture with explicit domain ownership may be the better answer.
The principle is domain ownership of meaning and flow. The implementation can be event-driven or not.
Related Patterns
Several architecture patterns reinforce this approach.
Team Topologies
Stream-aligned teams should own change close to the business flow. Platform teams provide self-service capabilities. Complicated-subsystem teams may help with advanced data science or optimization engines. Enabling teams can coach domains on product publishing and governance.
Bounded Contexts
From domain-driven design, bounded contexts prevent false universal models. They help architects reason about where translation is necessary and where direct sharing is appropriate.
Strangler Fig Pattern
This is the practical migration pattern for replacing central curated assets with domain-owned products gradually, without a dangerous big-bang rewrite.
Anti-Corruption Layer
Useful when a domain must consume or publish against a legacy canonical model without polluting its own language.
Event Sourcing and CQRS
Relevant in some domains, but not required. They can help where business history and state transitions matter deeply, but they are easy to misuse. They should serve the domain, not architecture fashion.
Summary
Central data platform teams become bottlenecks when they are asked to do something no team can do well at enterprise scale: own the meaning of every domain’s data.
That burden accumulates slowly. First as helpful standardization. Then as central curation. Then as semantic mediation. Eventually the platform team sits in the middle of every change, every disagreement, every delay, and every incident. The flow of work slows because the architecture has ignored the shape of the business.
The fix is not to abandon the platform. It is to put the platform in the right role.
Use domain-driven design to anchor semantics in bounded contexts. Let stream-aligned domain teams own the data products closest to operational truth. Use the platform as an enabling product that provides guardrails, tooling, security, lineage, and interoperability. Migrate progressively with a strangler approach. Reconcile relentlessly. Trust contracts more than central intuition. And accept that healthy enterprises are not built from one universal model, but from well-governed relationships between many meaningful ones.
A good data platform should feel like a highway system.
It should make movement fast, safe, and boring.
It should not require every vehicle to stop at headquarters and explain where it is going.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.