Most so-called data platforms are not platforms at all. They are shared utility corridors: a tangle of pipelines, storage accounts, naming conventions, permissions, and heroic workarounds held together by tribal memory and a few overworked engineers. They are sold as neutral infrastructure, but they behave like centralized applications with no product model, no clear ownership, and no stable contract with the business.
That is the first uncomfortable truth.
The second is worse: many enterprises do not really have a data problem. They have a boundary problem. Data pipelines become the place where unresolved domain semantics go to die. Teams push events they do not understand, transform records they cannot define, and publish tables whose meaning changes every quarter. Then leadership wonders why the “platform” is slow, brittle, and politically radioactive.
A real platform enables others to move faster without negotiating every step. A fake platform becomes a coordination tax.
If you are running Kafka, warehouses, lakehouses, streaming jobs, CDC connectors, ETL tools, and a catalog, but every meaningful change still requires cross-team choreography, your issue is not tooling. It is topology. Specifically, it is the mismatch between pipeline ownership and domain boundary topology.
This is where domain-driven design earns its keep. Not as a fashionable set of sticky notes, but as a practical way to decide who owns meaning, where data contracts live, how reconciliation happens, and which pipelines belong inside a domain versus across domains. Once you see the platform through that lens, a lot of enterprise pain stops looking mysterious. It starts looking structural.
The design principle is simple and hard at the same time:
Data pipelines should be owned where domain meaning is created, not where integration convenience is highest.
That one sentence will upset central data teams, integration teams, and sometimes the domains themselves. Good. Architecture that never upsets anyone is usually just diagramming.
Context
Enterprises tend to arrive here honestly.
They begin with a handful of operational systems: ERP, CRM, billing, order management, supply chain, customer channels. Reporting is initially modest, so a central BI or integration team extracts data into a warehouse. This works for a while. Then digital channels arrive. Then mobile. Then APIs. Then event streaming. Then machine learning. Then “real-time.” Every new initiative leaves behind another path for moving data.
Soon the organization has three kinds of flow:
- Operational flow between systems of record and systems of engagement
- Analytical flow from source systems into reporting, forecasting, and data science
- Event flow for near-real-time propagation, integration, and automation
At small scale, one central team can govern this. At enterprise scale, it becomes fantasy.
The platform team responds the way platform teams always respond: standardize more, centralize more, add guardrails, templates, golden paths, and approval workflows. Some of that is necessary. Too much of it turns the platform into an airport security line.
Meanwhile, the business domains continue to change. Product introduces bundles. Finance redefines revenue recognition. Sales changes account hierarchy. Operations creates a new fulfillment path. Legal splits regional handling rules. None of these changes are merely technical. They alter domain semantics. They change the meaning of customer, order, shipment, invoice, entitlement, return, active subscriber, recognized revenue.
And that is the key point: data architecture fails when semantic ownership is separated from pipeline ownership.
The warehouse still loads. Kafka still streams. dbt still compiles. The jobs are green. But the meaning is wrong, inconsistent, delayed, or disputed. Green dashboards with red meetings.
Problem
The anti-pattern usually looks like this:
A central data platform team owns ingestion, transformation, storage, quality checks, canonical schemas, and downstream publication. Source teams provide “feeds.” Consumer teams submit requests. The central team becomes the translation bureau for the entire company.
It sounds efficient. It is not.
A central team can own infrastructure. It can own standards. It can own self-service capabilities. It cannot, at enterprise scale, own the semantics of every major business concept without becoming the bottleneck and the scapegoat at the same time.
This creates several predictable pathologies.
1. Pipeline ownership drifts away from domain ownership
The people building the pipeline are not the people accountable for what the data means. They rely on documentation, reverse engineering, and meetings. Every transformation becomes interpretive dance.
2. Canonical models become political artifacts
The organization invents a “single customer model” or “enterprise order schema” too early, often in the integration layer. This feels clean in PowerPoint and creates misery in delivery. Different domains need different projections and definitions. A universal canonical model often encodes compromise rather than truth.
3. Change becomes expensive
A source team changes an event. The platform team updates ingestion. Analytics updates downstream models. Integration consumers break. Governance raises review questions. Weeks pass for what should have been a local domain change with explicit contract management.
4. Reconciliation is treated as an exception
In real enterprises, data from operational systems will diverge. Timing differs. Keys mismatch. Late-arriving events happen. CDC duplicates appear. Manual corrections bypass process. If your architecture assumes consistency without designing reconciliation as a first-class capability, it will fail in production and lie in governance reviews.
5. Kafka becomes a distributed version of the same central bottleneck
A message broker does not decentralize ownership by itself. Plenty of organizations run Kafka with centrally owned topics, centrally defined event payloads, and centrally approved schemas. That is not event-driven architecture. It is integration middleware with better marketing.
The net effect is brutal: everyone says they want faster delivery, but the topology ensures slow learning.
Forces
Good architecture is usually a response to forces, not faith. Here the forces are strong and often in tension.
Domain semantics versus platform standardization
The platform wants consistency, shared tooling, and reduced cognitive load. Domains need language that reflects their business reality. Standardization is useful at the infrastructure layer; dangerous at the meaning layer.
Local autonomy versus enterprise interoperability
If every domain publishes whatever it likes, the enterprise becomes unreadable. If everything must be normalized centrally, the enterprise becomes immobile. The answer is not choosing one. It is placing contracts at the right boundaries.
Event speed versus data trust
Real-time propagation is seductive. But speed without reconciliation is just fast disagreement. Enterprises need both event-driven flow and periodic alignment against authoritative sources.
Product ownership versus shared services
A domain team can own a data product tied to its bounded context. The platform team should provide capabilities: schema registry, lineage, observability, storage primitives, streaming runtime, catalog, access control. The moment the platform starts owning business transformations by default, it stops being a platform.
Legacy gravity versus architectural intent
Most firms do not get to start clean. They have mainframes, packaged applications, nightly batches, warehouse logic nobody dares touch, and revenue-critical reports built on mysterious SQL. Migration must respect this gravity. Any architecture that assumes a greenfield rewrite is cosplay.
Solution
The solution is to treat data movement as part of domain architecture, not as a separate centralized concern. That means aligning pipeline ownership to bounded contexts and using the platform as an enabling substrate rather than a semantic control tower.
This is not merely “data mesh,” though it overlaps. The useful part is not the slogan. The useful part is the operating model:
- Domains own the meaning of the data they create
- Cross-domain publication happens through explicit contracts
- The platform owns paved roads, not business semantics
- Reconciliation is designed, automated, and visible
- Migration uses strangler patterns, not big-bang replacement
Think in terms of topology.
Inside a bounded context, teams are free to model events, operational stores, and derived datasets according to local needs. At the boundary, they publish stable interfaces: events, APIs, data products, or materialized views that are fit for external consumption. That publication is an act of product management, not a side effect of database access.
The important move is this: separate internal model freedom from external contract discipline.
A domain can have ugly internals. Most do. What matters is that what crosses the boundary is intentional, versioned, observable, and governed.
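What "external contract discipline" can look like in practice: the domain publishes an explicit, versioned contract, and validates every outgoing payload against it before anything crosses the boundary. A minimal sketch; the event name, field names, and contract shape here are illustrative, not a specific registry format.

```python
# Hypothetical boundary contract: the event name, version, and required
# fields are the published interface; the domain's internals behind it
# remain free to change.
CONTRACT_V1 = {
    "event": "OrderShipped",
    "version": 1,
    "required": ["order_id", "shipment_id", "shipped_at", "quantity"],
}

def validate_against_contract(payload: dict, contract: dict) -> list[str]:
    """Return the required fields missing from an outgoing payload."""
    return [f for f in contract["required"] if f not in payload]

good = {"order_id": "o-1", "shipment_id": "s-9",
        "shipped_at": "2024-05-01T10:00:00Z", "quantity": 3}
bad = {"order_id": "o-2"}

assert validate_against_contract(good, CONTRACT_V1) == []
assert validate_against_contract(bad, CONTRACT_V1) == ["shipment_id", "shipped_at", "quantity"]
```

In a real deployment this check belongs in a schema registry or contract-test suite, not inline code, but the principle is the same: nothing crosses the boundary that the contract does not describe.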
A practical ownership model
There are three layers of ownership.
- Platform ownership
  - Kafka clusters
  - schema registry
  - storage and compute foundations
  - CI/CD templates
  - observability
  - lineage and catalog
  - access control and policy enforcement
- Domain ownership
  - source-aligned events
  - operational data products
  - semantic transformations inside the bounded context
  - external contracts for consumer use
  - reconciliation rules for their authoritative entities
- Consumer ownership
  - consumer-specific projections
  - analytical models
  - local enrichment
  - SLA decisions for their use case
The central mistake is skipping the middle and forcing platform ownership to absorb domain semantics.
Architecture
A workable architecture usually combines event streams, domain-owned operational stores, and analytical serving patterns. Not every data flow belongs on Kafka. Not every analytical need deserves a real-time stream. But Kafka is highly relevant where domain events are meaningful, time-sensitive, and consumed by multiple downstream capabilities.
Here is the topology that matters more than the tool list: platform capabilities run horizontally beneath every domain, while semantic ownership runs vertically within each one. That topology says something many organizations avoid saying aloud: the platform is horizontal, but ownership is vertical.
Domain semantics discussion
A customer in sales is not the same thing as a customer in finance. In sales, a customer might be a prospect, account hierarchy, or opportunity-bearing party. In finance, a customer may be a bill-to legal entity with credit and tax attributes. In support, a customer might be a user, tenant, or subscriber identity. There is overlap, but no universal essence that survives every use case without distortion.
Domain-driven design gives us a language for this: bounded contexts, ubiquitous language, context maps. In practice, this means you stop trying to force semantic convergence in the transport layer. You let each domain define its terms internally, then publish contracts that are fit for agreed external purposes.
That might mean:
- Sales publishes `AccountCreated` and `OpportunityWon`
- Fulfillment publishes `ShipmentDispatched`
- Finance publishes `InvoiceIssued` and `PaymentAllocated`
It does not mean one enterprise committee invents `CanonicalBusinessPartyV12`.
Canonical forms are sometimes useful at the edges: regulatory reporting, MDM reference entities, or curated enterprise reporting. But they should be derived and governed as explicit downstream products, not imposed as the universal upstream language.
Reconciliation as first-class architecture
In a real enterprise, event flow and batch truth coexist. That is not failure. That is life.
You need reconciliation in at least three places:
- Source-to-event reconciliation: did every operational change produce the expected event?
- Event-to-read-model reconciliation: did downstream consumers materialize state correctly?
- Cross-domain business reconciliation: does fulfillment revenue align with finance recognition, allowing for timing and business rules?
Design this deliberately.
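The first of those checks, source-to-event reconciliation, can be sketched as a key-set comparison over a time window. A minimal sketch with illustrative key values; real implementations add watermarks, retries, and exception queues.

```python
# Source-to-event reconciliation sketch: compare the keys that changed in
# the operational store against the keys observed on the event stream for
# the same window. Both directions of drift matter.
def reconcile_keys(source_keys: set[str], event_keys: set[str]) -> dict:
    return {
        "missing_events": sorted(source_keys - event_keys),  # changed, never emitted
        "orphan_events": sorted(event_keys - source_keys),   # emitted, no source change
        "matched": len(source_keys & event_keys),
    }

report = reconcile_keys({"o-1", "o-2", "o-3"}, {"o-2", "o-3", "o-4"})
assert report == {"missing_events": ["o-1"], "orphan_events": ["o-4"], "matched": 2}
```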
This is where many “modern data platform” narratives become suspiciously silent. They talk about streaming but not about the day after streaming when records are missing, duplicated, reordered, or corrected by human intervention.
A strong enterprise architecture assumes drift and gives operations a way to detect, explain, and repair it.
Contracts and publication patterns
Not every consumer should read raw events directly. Good publication patterns include:
- Domain events for process reaction and automation
- Operational data products for trusted domain-owned consumption
- Consumer-aligned projections for specific analytical or operational needs
- Reference entities for shared master-like concepts with explicit stewardship
If your CFO dashboard depends on five raw event topics and a hope, you have built a demo, not an enterprise system.
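A consumer-aligned projection is essentially a small fold over domain events into a read model built for one purpose. A sketch under assumed event shapes; the event types and statuses are illustrative.

```python
# Fold domain events into a consumer-aligned read model rather than having
# dashboards read raw topics directly. Events are assumed ordered per order.
def project_order_status(events: list[dict]) -> dict[str, str]:
    transitions = {"OrderBooked": "booked",
                   "OrderShipped": "shipped",
                   "InvoiceIssued": "invoiced"}
    status: dict[str, str] = {}
    for e in events:
        status[e["order_id"]] = transitions[e["type"]]
    return status

events = [{"type": "OrderBooked", "order_id": "o-1"},
          {"type": "OrderShipped", "order_id": "o-1"},
          {"type": "OrderBooked", "order_id": "o-2"}]

assert project_order_status(events) == {"o-1": "shipped", "o-2": "booked"}
```

The CFO dashboard reads this projection, with an owner and an SLA, not the five raw topics.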
Migration Strategy
The migration path matters more than the target diagram. Most firms already have warehouse transformations, ETL jobs, shared integration hubs, and maybe Kafka streams feeding side systems. You cannot switch ownership overnight, because ownership is not a YAML file. It is capability, accountability, budget, skill, and organizational trust.
Use a progressive strangler migration.
Start by identifying high-value bounded contexts where semantics are strong and ownership is clear. Orders. Billing. Claims. Shipments. Policies. Subscriptions. Pick one with both pain and leadership support.
Then move in stages.
Stage 1: Make hidden semantics visible
Catalog existing pipelines by domain concept, not just technology. Ask:
- Which domain originated this data?
- Who can authoritatively define it?
- Who changes it most often?
- Who currently transforms it?
- Where do disputes about meaning occur?
This is usually a humbling exercise. You will discover that “customer master” is maintained by six teams and trusted by none.
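One lightweight way to run this inventory is a record per pipeline tagged by domain concept; any concept transformed by more than one team is a boundary-dispute signal. Pipeline and team names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory rows: (pipeline, domain_concept, transforming_team)
inventory = [
    ("sf_accounts_load", "customer", "sales-eng"),
    ("sap_billto_sync", "customer", "finance-data"),
    ("mdm_customer_merge", "customer", "platform"),
    ("wms_shipments_cdc", "shipment", "fulfillment"),
]

# Group by concept; multiple transforming teams means contested semantics.
teams_by_concept = defaultdict(set)
for _, concept, team in inventory:
    teams_by_concept[concept].add(team)

disputed = {c for c, teams in teams_by_concept.items() if len(teams) > 1}
assert disputed == {"customer"}
```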
Stage 2: Establish domain publication boundaries
For the chosen domain, define:
- internal model
- external event contracts
- operational data product interfaces
- quality rules
- reconciliation responsibilities
- versioning approach
Do not try to perfect enterprise taxonomy first. Publish one good bounded contract.
Stage 3: Dual run with reconciliation
Continue feeding legacy central pipelines while introducing domain-owned publication. Compare outputs. Reconcile counts, key coverage, state transitions, and business aggregates. Expect mismatches. The point is to expose them while the old path still exists.
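The dual-run comparison can start as something very plain: business aggregates from both paths, compared day by day with an explicit tolerance. A sketch with illustrative numbers; real runs also compare key coverage and state transitions, as noted above.

```python
# Dual-run comparison sketch: legacy pipeline vs domain-owned publication.
# Days whose aggregates diverge beyond the tolerance are surfaced for review
# while the old path still exists to arbitrate.
legacy_daily_units = {"2024-05-01": 1000, "2024-05-02": 950}
domain_daily_units = {"2024-05-01": 1000, "2024-05-02": 938}

def diff_days(legacy: dict, new: dict, tolerance: float = 0.01) -> list[str]:
    flagged = []
    for day in sorted(set(legacy) | set(new)):
        a, b = legacy.get(day, 0), new.get(day, 0)
        if abs(a - b) > tolerance * max(a, b, 1):
            flagged.append(day)
    return flagged

assert diff_days(legacy_daily_units, domain_daily_units) == ["2024-05-02"]
```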
Stage 4: Shift consumers incrementally
Move consumers one class at a time:
- low-risk analytics
- downstream operational read models
- external integrations
- critical financial or regulatory consumers last
A strangler pattern is successful when new consumers stop onboarding to the old path before old consumers are fully retired.
Stage 5: Retire central semantic transformation
The platform keeps the pipes, security, lineage, and observability. But the central team stops being the default owner of business logic for that domain. This is the real milestone. The code move is easy. The accountability move is hard.
Migration reasoning
Why strangler rather than rewrite?
Because rewrite assumes you already understand your domain semantics. Most enterprises do not. They discover them by comparing old and new paths under load, over time, with actual business corrections. Reconciliation is not just quality control. It is semantic discovery.
And because critical reporting has hidden dependencies. A nightly ledger extract may feed a treasury process nobody mentioned in architecture review. The old estate is full of these booby traps. Strangler migration lowers the blast radius.
Enterprise Example
Consider a global manufacturing company with three major systems:
- SAP for finance and order settlement
- Salesforce for sales and account management
- a bespoke fulfillment platform handling warehouse operations, shipping, and returns
The company introduced Kafka to enable real-time order visibility and customer notifications. A central data engineering team also maintained the warehouse and built “canonical order” tables for analytics and integration.
Everything looked sophisticated. Nothing was simple.
Sales said an order was created when a rep booked it in Salesforce. Fulfillment said an order was real only after allocation and warehouse release. Finance recognized revenue based on shipment and invoice conditions from SAP. Customer service built dashboards off the warehouse and routinely found orders that were “complete” in one system and “missing” in another. Executives demanded a single order truth.
The first instinct was a bigger canonical model. That would have made things worse.
The better move was to define three bounded contexts:
- Sales Order Capture
- Fulfillment Execution
- Financial Settlement
Each domain published its own contracts:
- Sales: `OrderBooked`
- Fulfillment: `OrderAllocated`, `OrderShipped`, `ReturnReceived`
- Finance: `InvoiceIssued`, `RevenueRecognized`, `CreditPosted`
A domain-owned operational data product was created for each context. The central platform provided Kafka, schema validation, lineage, and observability. The warehouse team stopped inventing universal order semantics upstream and instead built curated cross-domain views explicitly named for their purpose, such as:
- `customer_order_journey`
- `financial_order_recognition`
- `shipment_exception_dashboard`
Most importantly, they introduced reconciliation:
- shipment events reconciled to warehouse dispatch snapshots
- invoice events reconciled to SAP settlement extracts
- cross-domain matching rules aligned shipped units to invoiced units with timing tolerances and exception queues
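The cross-domain matching rule can be sketched as follows: a shipment matches an invoice line when the order agrees, the units agree, and the invoice lands within a timing window; everything else lands in an exception queue. Field names and the window are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical matching rule: order id must agree, units must agree, and
# the invoice must arrive within the timing tolerance of the shipment.
WINDOW = timedelta(days=5)

def match(shipments, invoices, window=WINDOW):
    matched, exceptions = [], []
    by_order = {i["order_id"]: i for i in invoices}
    for s in shipments:
        i = by_order.get(s["order_id"])
        if i and i["units"] == s["units"] and abs(i["at"] - s["at"]) <= window:
            matched.append(s["order_id"])
        else:
            exceptions.append(s["order_id"])  # goes to the exception queue
    return matched, exceptions

ship = [{"order_id": "o-1", "units": 5, "at": datetime(2024, 5, 1)},
        {"order_id": "o-2", "units": 3, "at": datetime(2024, 5, 1)}]
inv = [{"order_id": "o-1", "units": 5, "at": datetime(2024, 5, 4)},
       {"order_id": "o-2", "units": 2, "at": datetime(2024, 5, 2)}]   # unit mismatch

matched, exceptions = match(ship, inv)
assert matched == ["o-1"] and exceptions == ["o-2"]
```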
What happened?
Not magic. Better arguments.
Meetings improved because disagreement became explicit. Instead of debating the “real” order status, teams asked which bounded context and which business purpose were in play. Customer service used the journey view. Finance used settlement views. Operations reacted to fulfillment events. The enterprise got more truth by giving up the fantasy of one universal truth.
That is grown-up architecture.
Operational Considerations
A domain-aligned data architecture is not looser. In some ways it is stricter, because ownership can no longer hide behind central ambiguity.
Observability
You need telemetry at multiple layers:
- topic lag and throughput
- schema change rates
- consumer failure rates
- reconciliation exception volumes
- data freshness by product
- contract usage by consumer
If a domain publishes an event nobody can reliably consume, that is not “decentralization.” That is abandonment.
Data quality
Quality checks should be split:
- platform-level checks: format, schema, access, transport
- domain-level checks: business validity, state transitions, referential assumptions
- consumer-level checks: fitness for purpose
Trying to centralize all quality rules ensures that the important ones arrive last.
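A domain-level check on state transitions, the kind only the domain can write correctly, might look like this. The states and legal transitions below are illustrative.

```python
# Domain-level quality check sketch: flag order histories that contain a
# state transition the domain does not consider legal.
LEGAL = {
    ("booked", "allocated"), ("allocated", "shipped"),
    ("shipped", "invoiced"), ("shipped", "returned"),
}

def illegal_transitions(history: list[str]) -> list[tuple[str, str]]:
    pairs = list(zip(history, history[1:]))
    return [p for p in pairs if p not in LEGAL]

assert illegal_transitions(["booked", "allocated", "shipped", "invoiced"]) == []
assert illegal_transitions(["booked", "shipped"]) == [("booked", "shipped")]
```

The platform can run this check, but only the domain can author it, which is exactly the split the three layers describe.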
Security and policy
This is where centralization remains essential. Domains should not individually reinvent access models for PII, financial data, export controls, or retention. The platform should enforce policy as code and expose reusable controls. Decentralized semantics does not mean decentralized compliance.
Versioning
Schema evolution in Kafka and APIs must be disciplined. Backward compatibility where possible. Clear deprecation windows. Consumer notification. Contract test automation. Versioning is less about syntax than about semantic stability. Renaming a field is cheap. Changing what “active customer” means is not.
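One common backward-compatibility rule, the kind a schema registry enforces, is that a new version may add optional fields but may not drop or demote required ones. A simplified sketch; real registries such as Confluent's support several compatibility modes, and the schema shape here is illustrative.

```python
# Backward-compatibility sketch: consumers written against the old schema
# must still be satisfiable by the new one, so every previously required
# field must survive (as required or at least optional).
def backward_compatible(old: dict, new: dict) -> bool:
    surviving = set(new["required"]) | set(new.get("optional", []))
    return set(old["required"]) <= surviving

v1 = {"required": ["order_id", "status"], "optional": []}
v2 = {"required": ["order_id", "status"], "optional": ["channel"]}  # additive: fine
v3 = {"required": ["order_id"], "optional": []}                     # drops status: breaks

assert backward_compatible(v1, v2) is True
assert backward_compatible(v1, v3) is False
```

Note what this check cannot see: a field that keeps its name but changes its meaning passes every syntactic gate, which is why versioning is ultimately about semantic stability.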
Support model
You need a federated operating model:
- platform SRE and enablement
- domain data product owners
- consumer teams accountable for local projections
- an architecture forum for context mapping and boundary disputes
Without this, decentralization becomes “please ask in Slack.”
Tradeoffs
There is no free architecture. This one buys semantic clarity and scalable ownership by accepting some complexity.
What you gain
- faster change inside domains
- clearer accountability for data meaning
- less central bottlenecking
- better alignment between operational and analytical semantics
- more honest cross-domain integration
- easier migration from legacy central ETL over time
What you pay
- duplicated effort across domains
- need for stronger product ownership skills in domain teams
- more contracts to manage
- harder cross-domain reporting if you expected one canonical schema to do everything
- more emphasis on reconciliation and exception handling
This is a worthwhile trade if your enterprise is large enough that semantic disputes dominate technical concerns. If your organization is small and your domains are weak, the overhead may not pay for itself.
Failure Modes
This pattern fails in recognizable ways.
1. “Domain-owned” becomes code for “ungoverned”
Teams publish topics and tables with no discoverability, poor naming, no SLA, and no lifecycle management. That is not federated architecture. It is entropy with CI/CD.
2. The platform team retreats too far
A good platform team is not a ticket queue, but it is also not absent. If every domain has to solve schema governance, stream operations, lineage, and policy alone, you have decentralized toil.
3. Boundaries are drawn around systems, not domains
Owning “the SAP pipeline” is not a domain. Owning financial settlement is. If boundaries mirror application estates instead of business meaning, you preserve legacy confusion in a new technical wrapper.
4. Reconciliation is deferred
Teams say they will add reconciliation “later.” Later never comes. Then confidence collapses at the first audit issue or customer-impacting mismatch.
5. Enterprise reporting is neglected
Some advocates overreact to canonical models and underinvest in curated enterprise views. Executives still need coherent reporting across domains. The trick is to build those views explicitly, downstream, with known definitions and stewardship.
6. Kafka is overused
Not every integration needs event streaming. Sometimes a batch snapshot, CDC feed, or API is simpler and more reliable. Kafka is excellent for event propagation and decoupled consumers. It is not a universal solvent.
When Not To Use
Do not use this approach by default in every situation.
Small organizations with low semantic complexity
If you have a handful of systems, one data team, and stable reporting needs, domain-aligned ownership may introduce more governance ceremony than value.
Environments with weak domain accountability
If business domains do not own products, roadmaps, or operational quality, asking them to own data products is wishful thinking. You cannot architect accountability into existence.
Heavily regulated reporting pipelines with minimal change
For some regulatory flows, a centralized curated pipeline may be entirely appropriate, especially where semantics are externally defined and internal variation is low.
Purely technical telemetry platforms
Infrastructure logs, metrics, and traces usually align better to platform or capability ownership than business bounded contexts. Do not force domain semantics where none exist.
Organizations chasing fashion
If the real aim is to adopt “data mesh” language without changing incentives, staffing, and ownership, stop. New nouns will not save old behavior.
Related Patterns
Several adjacent patterns complement this architecture.
Bounded Contexts and Context Maps
From domain-driven design, these help identify semantic boundaries and integration relationships.
Strangler Fig Migration
Essential for progressively replacing centralized pipelines and hidden transformations without a big-bang cutover.
Event-Carried State Transfer
Useful when domains need to share state changes asynchronously, but only when event contracts are semantically clear.
CQRS Read Models
Helpful for consumer-specific projections derived from domain events or data products.
Change Data Capture
Valuable in migration and reconciliation, but dangerous when treated as a substitute for domain publication. CDC tells you that data changed, not what it meant.
Data Products
A useful framing if you take product obligations seriously: discoverability, trust, support, versioning, and consumer empathy.
Summary
Your data platform is not a platform if it owns everybody’s semantics. It is a centralized translation department with better infrastructure.
The architecture that scales in the enterprise is not one giant canonical pipeline. It is a topology where domain teams own the data they mean, the platform team owns the capabilities that make that safe and fast, and cross-domain use is managed through explicit contracts, curated views, and relentless reconciliation.
That is the heart of it.
Put ownership where meaning is created. Keep standards where they truly standardize. Accept that enterprise truth is plural before it is curated. Migrate with a strangler, not a revolution. Reconcile continuously, because systems drift and people improvise. Use Kafka where events matter, not because the logo looks modern on a slide.
Most of all, stop calling something a platform just because many teams are forced to use it.
A platform creates leverage.
If yours mostly creates meetings, it is time to redraw the boundaries.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.