Your Data Platform Has No Domains Only Pipelines


Most data platforms do not fail because the technology is weak. They fail because nobody can answer a simple question with confidence: who owns this fact?

That is the original sin.

A company starts with a warehouse, then a lake, then a lakehouse, then a streaming backbone, then some heroic team adds Kafka, dbt, Airflow, Flink, Spark, and a semantic layer. Every addition sounds sensible. Every team says they are improving access, agility, or scale. Yet six quarters later the platform is full of polished movement and thin meaning. Data flows everywhere. Ownership lives nowhere. The system is alive with pipelines but empty of domains.

This is the uncomfortable truth in many enterprises: you do not have a domain-oriented data platform just because your data is tagged by business area or stored in separate schemas. If the operating model is still “central team builds pipelines for everyone,” then the topology is pipeline-centric, not domain-centric. The shape of the org leaks into the shape of the platform. And when the shape is wrong, the platform becomes a conveyor belt for ambiguity.

That is why the phrase “ownership topology” matters. It names the real design problem. The issue is not whether your storage engine supports tables, streams, or object files. The issue is whether your architecture expresses business ownership, semantic authority, and operational accountability in a way the enterprise can actually run.

A good data platform is not just a substrate for movement. It is a map of responsibility.

And responsibility, in enterprise architecture, is where things become real.

Context

Most large organisations arrive here honestly. They do not set out to build a mess. They are trying to solve immediate pain.

The reporting team needs customer metrics. Finance needs reconciled revenue numbers. Operations wants daily inventory movement. Marketing wants event data in near real time. Product wants clickstream analytics. Compliance wants retention controls. The platform team sees repeated ingestion work and sensibly builds shared pipelines. Then the shared pipelines become shared transformations. Then shared curation. Then shared definitions. Before long, the platform team is not providing a platform; it is impersonating the business.

This usually shows up in a familiar architecture:

  • source systems emit data through batch jobs or Kafka topics
  • central ingestion lands it into a data lake or warehouse
  • central transformation teams standardise and join
  • downstream consumers build dashboards, machine learning features, and extracts
  • everyone debates definitions in steering meetings because the code has become the constitution

There is often a thin layer of “domain naming” on top. Schemas might be called sales, claims, customer, or billing. But naming is cheap. Ownership is expensive. If nobody in Sales can truly assert the semantics of booked_revenue, then sales.booked_revenue is decoration.

Domain-driven design has been teaching this lesson for years. A domain is not a folder. It is not a topic prefix. It is not a taxonomy label maintained by governance. A domain is a boundary around language, behaviour, invariants, and decision-making. In software, we learned that bounded contexts matter because the same word means different things in different parts of the business. In data, the same truth applies, but the consequences are often worse because ambiguity gets copied, cached, and institutionalised.

A pipeline can move bytes.

A domain can make promises.

Those are not the same thing.

Problem

The core problem is that many data platforms are optimised around flow topology rather than ownership topology.

Flow topology asks:

  • How does data get from A to B?
  • What tooling orchestrates transformation?
  • How do we scale movement, storage, and processing?
  • How do we expose datasets efficiently?

Ownership topology asks harder questions:

  • Which domain is authoritative for this business fact?
  • Where is semantic meaning defined and evolved?
  • Who is accountable when numbers drift?
  • Where do cross-domain contracts live?
  • How is reconciliation done when two systems are “correct” in different ways?

Without ownership topology, the enterprise gets these pathologies:

1. Semantic drift disguised as technical debt

Metrics slowly diverge across pipelines. “Active customer” means one thing in CRM analytics, another in billing, and a third in product telemetry. Nobody notices until executive reporting breaks. By then there are fifteen downstream consumers and no agreed canonical meaning.

2. Central platform teams become bottlenecks

Every meaningful change must pass through one overworked team because they own transformation logic. The business waits. Workarounds proliferate. Shadow pipelines appear in notebooks, BI tools, and local scripts. Governance becomes fiction.

3. Kafka becomes a transport without accountability

Streaming adoption often amplifies the issue. Teams publish events, but event contracts are weak and semantics are unstable. Topics multiply. Consumers infer business meaning from payloads that were never designed as enduring facts. The organisation ends up with real-time confusion.

4. Reconciliation is treated as an exception

In the real world, domains disagree. Orders say one thing, payments say another, shipments a third. A mature architecture expects disagreement and designs reconciliation explicitly. A pipeline-centric model usually assumes there is one correct upstream source and hides mismatches until month-end close or audit.

5. Data products are declared, not operated

Many enterprises adopt the language of “data products” but keep the old operating model. A team publishes a curated table and calls it a product. Yet there is no owner, no SLA, no contract, no lifecycle, no stewardship of semantics. It is branding without responsibility.

The result is not just inefficiency. It is organisational confusion rendered in infrastructure.

Forces

This design problem persists because the forces are real and they pull in different directions.

Need for consistency

Executives want one version of the truth. Regulators demand traceability. Finance needs controlled numbers. Shared curation feels like the shortest path to consistency.

Need for speed

Product, operations, and machine learning teams want direct access to changing business data. They cannot wait three months for a central backlog.

Need for interoperability

A customer journey crosses sales, fulfillment, support, and billing. The enterprise needs cross-domain analysis, not isolated silos.

Need for local semantics

Each domain has language that does not collapse cleanly into enterprise-wide abstractions. Sales opportunities, insurance claims, trade settlements, and patient encounters all carry local rules and temporal nuances.

Need for platform efficiency

No sensible organisation wants every domain team handcrafting ingestion, storage policy, access control, quality monitoring, or Kafka operations from scratch.

Need for survivable change

Source systems are replaced. Microservices split and merge. ERP migrations happen. Mergers create duplicate capabilities. The architecture must survive business change without rewriting the whole analytical estate every two years.

That is why this cannot be solved with a purity argument. Full centralisation is brittle. Full decentralisation is chaos. The interesting design lives in the seam.

Solution

The practical answer is to build the data platform around domain-owned semantic authority on top of a shared self-service platform.

In plain language:

  • the platform team owns capabilities
  • domain teams own meaning
  • cross-domain composition is a first-class concern, not an accidental BI activity
  • reconciliation is designed into the model
  • pipelines exist, but they are servants of domains, not substitutes for them

This is where domain-driven design helps. Not in a ceremonial way. In a working, enterprise way.

A domain-oriented data platform should distinguish at least four things:

  1. Source-aligned data
     - close to operational systems
     - useful for traceability and low-friction ingestion
     - not yet a business promise
  2. Domain data products
     - published by domain owners
     - semantically curated
     - versioned with contracts, quality rules, and lifecycle expectations
     - the place where business meaning is asserted
  3. Cross-domain compositional products
     - built from multiple domain products
     - owned by teams whose responsibility is enterprise-wide use cases such as finance reporting, risk, or customer 360
     - never confused with raw truth from a single domain
  4. Reconciliation products
     - explicit models that explain and manage mismatches between bounded contexts
     - essential in finance, supply chain, healthcare, telecom, insurance, and any serious enterprise
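The four product types above can be made concrete in a catalog entry. A minimal sketch of a hypothetical registry model (all names and fields illustrative, not a real catalog API):

```python
from dataclasses import dataclass, field
from enum import Enum

class ProductKind(Enum):
    SOURCE_ALIGNED = "source-aligned"    # fidelity, not yet a business promise
    DOMAIN_PRODUCT = "domain-product"    # semantically curated, contract-managed
    CROSS_DOMAIN = "cross-domain"        # composed from multiple domain products
    RECONCILIATION = "reconciliation"    # explains mismatches between contexts

@dataclass
class DataProduct:
    name: str
    kind: ProductKind
    owning_domain: str                   # the team accountable for semantics
    upstream: list = field(default_factory=list)

    def is_business_promise(self) -> bool:
        # Only products past the source-aligned stage carry semantic commitments.
        return self.kind is not ProductKind.SOURCE_ALIGNED

invoices = DataProduct("billing.invoice_facts", ProductKind.DOMAIN_PRODUCT, "Billing")
raw_sap = DataProduct("ingest.sap_billing_cdc", ProductKind.SOURCE_ALIGNED, "Platform")
print(invoices.is_business_promise(), raw_sap.is_business_promise())  # True False
```

The useful property is that the distinction between "landed data" and "business promise" becomes a queryable attribute rather than tribal knowledge.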

The key move is subtle but important: do not force all semantics upward into a central warehouse team, and do not pretend every upstream event is already domain truth.

Instead, let domains publish stable analytical facts in their own bounded context. Then compose and reconcile across contexts deliberately.

A simple ownership model

  • Platform team: ingestion framework, Kafka platform, catalog, data contracts tooling, storage abstractions, observability, access control, CI/CD for data, lineage, quality framework
  • Domain teams: domain events, semantic models, business definitions, data quality rules, stewardship, SLA negotiation, change management for consumers
  • Enterprise data office or governance function: policy, critical data element oversight, retention standards, federated governance, arbitration of shared terms
  • Cross-domain product teams: customer 360, regulatory reporting, revenue intelligence, fraud analytics, planning models

This is not merely mesh rhetoric. It is an ownership topology.

Architecture

A useful architecture has to show where semantics harden and where they remain fluid.

[Diagram: domain-oriented data platform architecture]

The architecture has three distinct layers of concern.

1. Source-aligned ingestion

This is where CDC, event capture, batch landing, schema evolution, and immutable retention belong. The goal is fidelity and recoverability. If Kafka is used, topics should be treated carefully. Some are operational integration events. Some may become analytical source feeds. But a Kafka topic is not automatically a domain data product. Too many firms confuse publish-subscribe with semantic stewardship.

A customer service microservice might emit CaseClosed. Useful, yes. But is that enough to define “resolved complaint” for regulatory reporting? Usually not. Operational event schemas are optimised for service collaboration, not enterprise analytics. They are one input into domain truth, not the whole thing.

2. Domain semantic publication

This is the heart of the architecture.

A Billing domain should publish concepts like invoice issued, invoice paid, credit note applied, delinquency state, billing account hierarchy. It owns definitions, quality thresholds, late-arriving-data rules, and temporal semantics. It does not merely forward tables from the billing package.

Likewise Sales owns bookings, pipeline stages, account assignment, and opportunity semantics. Fulfillment owns shipment state, partial fulfillment logic, return completion, and inventory allocation status.

These products should be discoverable, documented, observable, and contract-managed. They need explicit interfaces for analytical consumption: tables, views, streams, APIs, or semantic models depending on usage. The exact form matters less than the operational contract.

3. Cross-domain composition and reconciliation

This is where enterprises either become honest or become political.

Cross-domain products are necessary because the business runs across boundaries. But they should not erase those boundaries. A customer 360 model is not “the customer truth.” It is a composition. Revenue intelligence is not simply Billing plus Sales. It is a governed model with assumptions. Reconciliation products explain divergence between orders, invoices, payments, returns, and ledger entries.

This deserves its own diagram.

[Diagram: cross-domain composition and reconciliation]

That box in the middle is where maturity lives.

A serious enterprise does not pretend all numbers align naturally. It builds mechanisms to classify mismatches:

  • timing differences
  • key matching failures
  • duplicates
  • partial lifecycle completion
  • semantic disagreement between domains
  • operational defects in source systems
  • policy exceptions

This is not a side activity. It is architecture.

Migration Strategy

Most organisations cannot jump from central pipelines to domain ownership in one move. Nor should they try. The sensible path is a progressive strangler migration.

The old platform still has to run payroll, produce month-end numbers, and support operational analytics. You cannot freeze the estate for a noble redesign. You need a migration that preserves continuity while shifting accountability.

Step 1: Identify the highest-value semantic seams

Do not begin by trying to domain-align everything. Start where semantic confusion is expensive:

  • revenue
  • customer identity and lifecycle
  • order-to-cash
  • claims
  • policy servicing
  • inventory and fulfillment
  • regulatory exposure
  • reference data that drives pricing or risk

Look for areas with repeated disputes, spreadsheet reconciliations, and executive mistrust. Pain is an excellent prioritisation tool.

Step 2: Map bounded contexts and authority

For each key business concept, ask:

  • which domain creates it?
  • which domain can change it?
  • which domain is authoritative at which lifecycle stage?
  • where are handoffs and translations needed?
  • which concepts are shared language and which are false friends?

This exercise often reveals that “customer” alone has multiple bounded contexts: prospect, contract party, billing account, service subscriber, legal entity, household, and support contact. That is healthy clarity, not duplication.
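The multiple-contexts point can be expressed in types. A sketch: the same real-world person appears as different concepts with different invariants, related by a linkage record rather than collapsed into one record (all fields illustrative):

```python
from dataclasses import dataclass

# Same human, different bounded contexts, different invariants.
@dataclass
class BillingAccount:        # Billing context: who gets invoiced
    account_id: str
    payment_terms_days: int

@dataclass
class ServiceSubscriber:     # Subscription context: who is entitled to what
    subscriber_id: str
    entitlement: str

# A linkage record relates contexts without erasing them.
@dataclass
class PartyLink:
    party_id: str
    billing_account_id: str
    subscriber_id: str
    confidence: float        # identity resolution is probabilistic, not absolute

link = PartyLink("P-42", "BA-7", "SUB-19", confidence=0.93)
print(link.confidence >= 0.9)  # True
```

Keeping the confidence score explicit is deliberate: it admits that cross-context identity is a claim with evidence, not a given.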

Step 3: Introduce domain products beside existing pipelines

Do not rip out central curated models first. Stand up domain-owned products in parallel. Feed them from source-aligned data and, where necessary, from existing warehouse assets. Give them contracts, dashboards, and known consumers. This is the strangler move: new use cases are built on the new products, while old pipelines continue serving legacy consumers.

Step 4: Build reconciliation as a bridge, not an afterthought

During migration, old and new numbers will differ. They must. The mistake is to treat every mismatch as a defect in the new model. Many mismatches reveal hidden assumptions in legacy pipelines. Create formal reconciliation views that show:

  • expected alignment rules
  • unexplained differences
  • lifecycle lag
  • consumer impact
  • cutover readiness

This gives leadership confidence and prevents endless arguments by anecdote.
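A cutover-readiness view can be computed directly from classified deltas. A sketch, assuming mismatches have already been labelled with categories; the readiness threshold is an illustrative policy choice:

```python
from collections import Counter

def cutover_readiness(mismatches: list[str],
                      explained: frozenset = frozenset(
                          {"aligned", "timing_difference"})) -> dict:
    """Summarise classified old-vs-new deltas into a readiness report."""
    counts = Counter(mismatches)
    total = sum(counts.values())
    unexplained = sum(n for cat, n in counts.items() if cat not in explained)
    return {
        "total": total,
        "unexplained": unexplained,
        # Illustrative policy: ready when <1% of deltas lack an explanation.
        "ready": total > 0 and unexplained / total < 0.01,
    }

report = cutover_readiness(["aligned"] * 995 + ["timing_difference"] * 4
                           + ["semantic_disagreement"])
print(report["ready"])  # True
```

This turns the cutover conversation from "the new numbers look wrong" into "0.1% of deltas are unexplained, here they are."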

Step 5: Shift ownership before shifting tooling

A common anti-pattern is migrating from warehouse to lakehouse or batch to streaming and calling that domain modernisation. It is not. If the same central team still owns the semantics, you have moved platforms, not architecture.

Ownership changes should come first or at least move in lockstep with technology changes.

Step 6: Retire central transformations selectively

Once domain products are stable and adoption is real, progressively decommission redundant central transformations. Keep enterprise compositional products where they add value, but remove hidden semantic logic that rightly belongs in domains.

Here is the migration shape.

[Diagram: strangler migration from central pipelines to domain-owned products]

This is slower than a greenfield fantasy and faster than another three-year platform programme that changes every tool and no responsibility.

Enterprise Example

Consider a global manufacturer with e-commerce, wholesale distribution, field service, and subscription maintenance contracts. Typical large enterprise. Typical large mess.

They had:

  • SAP ERP for order management and finance
  • Salesforce for pipeline and account management
  • a homegrown fulfillment system
  • separate microservices for digital subscriptions
  • Kafka for near-real-time integration
  • a cloud data lake feeding a warehouse
  • one central data engineering team producing “trusted” models

The critical issue was revenue and customer lifecycle reporting. Sales reported bookings from Salesforce. Billing reported invoices from SAP. Digital subscriptions emitted usage and entitlement events from microservices over Kafka. Fulfillment tracked shipments separately. Finance had ledger truth, but only after close. Every leadership meeting involved debates over what counted as revenue, active customer, renewal, and churn.

The central team tried to solve this by building a giant customer and revenue pipeline. It got bigger every quarter. So did the arguments.

The turnaround came when they changed the ownership model.

What they did

They created domain-aligned publication teams:

  • Sales domain published account ownership, opportunity lifecycle, booking events, and pipeline state
  • Billing/Finance domain published invoice facts, payment facts, credit adjustments, and accounting mapping
  • Fulfillment domain published shipment completion, return completion, and installation milestones
  • Subscription domain published entitlement lifecycle, renewal state, and usage settlement facts
  • Customer domain did not attempt one giant golden customer record; instead it published party resolution and identity linkage services with confidence scoring

A platform team continued to run Kafka, CDC tooling, catalog, data quality framework, and access patterns.

Then they built two cross-domain products:

  1. Order-to-Revenue Reconciliation
     - linked bookings, shipment completion, invoicing, subscription activation, and ledger entry timing
     - explained why “booked,” “billed,” “recognized,” and “collected” differed
  2. Customer Lifecycle 360
     - composed domain perspectives rather than flattening them into one fake universal customer
     - showed transitions between prospect, sold-to, bill-to, service user, and subscriber roles

Why it worked

Because they stopped asking one central team to define the business for everybody.

Sales could own sales semantics.

Finance could own accounting semantics.

Subscription teams could own digital contract events.

Cross-domain reporting became a composition and reconciliation exercise, not a semantic land grab.

Technical shape

Kafka remained important, but its role changed. Operational events from microservices were treated as inputs. Domain publication layers transformed event streams into stable analytical facts. Contract governance improved because consumers no longer treated raw service events as permanent enterprise truth.

The result was not perfect consistency. That would be suspicious. The result was explainable consistency. Executives could see why bookings were ahead of invoices. Finance could trace recognition timing. Service teams could measure install lag. Audit had lineage and controls. Most importantly, change no longer required surgery on one giant pipeline.

That is what good architecture feels like: less heroism, more legibility.

Operational Considerations

A domain-oriented ownership topology still needs hard operational discipline.

Data contracts

Every domain product needs a contract covering:

  • semantic definition
  • schema and compatibility expectations
  • freshness SLA
  • quality thresholds
  • deprecation policy
  • consumer support model

Without this, “ownership” becomes sloganware.
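A contract of that shape can be kept as structured data and checked in CI rather than described on a wiki page. A sketch with hypothetical field names and thresholds:

```python
# A data contract as structured data: checkable in CI, not just prose.
CONTRACT = {
    "product": "billing.invoice_facts",
    "owner": "billing-domain-team",
    "semantic_definition": "One row per issued invoice, net of credit notes.",
    "schema": {"invoice_id": "string", "amount": "decimal", "issued_at": "timestamp"},
    "freshness_sla_hours": 6,
    "quality": {"max_null_rate": {"invoice_id": 0.0, "amount": 0.001}},
    "deprecation_notice_days": 90,
}

REQUIRED_KEYS = {"product", "owner", "semantic_definition", "schema",
                 "freshness_sla_hours", "quality", "deprecation_notice_days"}

def validate_contract(contract: dict) -> list[str]:
    """Return a list of contract violations; empty means the contract is complete."""
    missing = REQUIRED_KEYS - contract.keys()
    return [f"missing: {k}" for k in sorted(missing)]

print(validate_contract(CONTRACT))  # []
```

The check here is deliberately shallow; the value is that a product cannot be published without declaring an owner, an SLA, and a deprecation policy.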

Observability

You need technical and semantic observability.

Technical observability includes job failures, lag, volume anomalies, schema drift, Kafka consumer lag, storage failures.

Semantic observability includes null rates in key facts, sudden distribution changes, business rule breaches, reconciliation deltas, orphaned records, identity match deterioration.
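Semantic checks of this kind are cheap to express as assertions over a published product. A sketch of two of them, null rate and distribution shift, with illustrative thresholds:

```python
def null_rate(rows: list[dict], field: str) -> float:
    """Fraction of rows where a key fact is missing."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)

def distribution_shift(baseline_mean: float, current: list[float],
                       tolerance: float = 0.25) -> bool:
    """Flag a sudden change in a fact's central tendency (illustrative rule)."""
    mean = sum(current) / len(current)
    return abs(mean - baseline_mean) / baseline_mean > tolerance

rows = [{"amount": 100}, {"amount": None}, {"amount": 120}, {"amount": 110}]
print(round(null_rate(rows, "amount"), 2))               # 0.25
print(distribution_shift(100.0, [180.0, 190.0, 170.0]))  # True
```

Production systems would use proper statistical tests and windowing, but the ownership point holds at any sophistication level: these thresholds belong in the domain's contract, not in each consumer's dashboard.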

Versioning and compatibility

Domain products evolve. They always do. Mature teams support additive change first, version breaking changes carefully, and provide migration windows. Streaming contracts especially need discipline. Backward-compatible events do not guarantee semantically compatible facts.
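The "additive change first" rule can be enforced mechanically. A sketch of a backward-compatibility check between two schema versions; the schemas and field names are illustrative:

```python
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Additive-only evolution: every old field survives with the same type."""
    return all(new_schema.get(name) == typ for name, typ in old_schema.items())

v1 = {"invoice_id": "string", "amount": "decimal"}
v2_additive = {**v1, "currency": "string"}                  # adds a field: compatible
v2_breaking = {"invoice_id": "string", "amount": "string"}  # type change: breaking

print(is_backward_compatible(v1, v2_additive))  # True
print(is_backward_compatible(v1, v2_breaking))  # False
```

Note what this check cannot catch: a field can keep its name and type while its meaning changes, which is exactly why schema compatibility does not guarantee semantically compatible facts.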

Federated governance

This is one place where many data mesh conversations become naïve. Federated governance is not everybody doing what they like. It means local ownership inside common guardrails:

  • common metadata standards
  • lineage requirements
  • identity and access controls
  • retention and privacy policy
  • naming and discoverability conventions
  • critical data element controls
  • stewardship escalation routes

Identity and master data

Not every enterprise needs a giant MDM programme, but most need some strategy for entity resolution. Customer, supplier, product, and asset identities often cross domain boundaries. The trick is not to erase bounded contexts. It is to provide linkage where needed while respecting local semantics.

Cost discipline

Decentralised publication can increase duplication if you are careless. Platform guardrails should make the paved road cheap: shared transformation patterns, reusable quality rules, standard storage layouts, and common streaming templates.

Tradeoffs

There is no free lunch here.

Benefit: clearer accountability

Cost: more coordination at boundaries

You gain explicit owners for semantics. You also create more visible handoffs between teams. That can feel slower at first because ambiguity is no longer hidden inside central SQL.

Benefit: faster local change

Cost: less comfort from central control

Domain teams can evolve their products quickly. But central reporting teams lose the illusion that they can define everyone’s data in one place.

Benefit: better semantic integrity

Cost: duplicate representations

The same real-world entity may appear differently in multiple bounded contexts. That is not always waste. Sometimes it is the correct expression of business reality. But it does require discipline to explain.

Benefit: resilient migration path

Cost: temporary coexistence overhead

Progressive strangler migration means running old and new in parallel. Reconciliation work increases before it decreases. Leaders must tolerate a period where complexity is more visible.

Benefit: streaming becomes useful

Cost: extra publication layers

If you stop treating raw Kafka events as ready-made analytics, you add work. Good. That work is where semantics become trustworthy.

Failure Modes

This approach can go wrong in very predictable ways.

1. Domain teams are named but not empowered

If “domain owner” means someone attends meetings but cannot prioritise engineering work, you still have a centralised platform with distributed frustration.

2. Every team publishes everything

Without product discipline, domains flood the platform with low-quality outputs. Catalog noise increases and consumers bypass the official products.

3. Platform team abdicates standards

Decentralisation without paved roads turns into artisanal chaos. Every team chooses different storage formats, observability conventions, and contract styles. Governance collapses under variety.

4. Reconciliation is postponed

This is the big one. Teams publish products, but nobody funds the cross-domain reconciliation layer. Executive reports then continue to disagree and the organisation concludes that domain ownership “didn’t work.” In truth, they stopped halfway.

5. Customer 360 becomes an empire

A central “customer” team often re-centralises all semantics under the banner of shared truth. Sometimes that is justified for identity resolution. Often it becomes a domain vacuum cleaner, pulling in sales, billing, support, and digital semantics that should remain local.

6. Kafka contract sprawl

If event schemas proliferate without lifecycle control, the domain publication layer becomes brittle and expensive. Operational microservices are notorious for evolving events around local service needs. That is fine for service boundaries. It is dangerous for analytical dependencies unless buffered by domain-owned products.

When Not To Use

This pattern is not a religion.

Do not use a domain-oriented ownership topology if:

  • your company is genuinely small and one team can hold the semantics in its head
  • your data use is mostly straightforward reporting off one or two systems
  • there is no meaningful domain engineering capacity outside a central team
  • your main issue is basic platform reliability, not semantic ownership
  • your organisation is in the middle of a large ERP consolidation and cannot yet identify stable business boundaries
  • compliance demands such tight central control that federated ownership is politically impossible for now

In those situations, a well-run central platform may be the right temporary answer. Better a coherent centre than performative decentralisation.

Also, do not force domain publication everywhere. Some datasets are platform exhaust, transient telemetry, or low-value extracts. Not every table deserves the ceremony of a data product.

Architecture is the art of spending attention where meaning is expensive.

Related Patterns

This ownership topology sits alongside several related patterns.

Data mesh

Useful as a framing device, especially around domain ownership and self-serve platform capabilities. But many implementations stay at the slogan level. The missing piece is often reconciliation and bounded-context semantics.

Data products

Essential, if treated as operating commitments rather than catalog labels.

Event-driven architecture

Helpful for timely movement and decoupling, especially with Kafka. But events are not enough. You still need semantic publication and contract stewardship.

CQRS and read models

Often relevant in microservices landscapes. Analytical domain products can be thought of as durable read models for enterprise consumption, though with stronger governance and lineage requirements.

Master data management

Sometimes necessary for identity linkage and reference control. But it should complement bounded contexts, not erase them.

Strangler fig migration

The right migration pattern for moving from central pipelines to domain-owned semantics without breaking the enterprise.

Lakehouse and semantic layer

Both can be useful implementation choices. Neither solves ownership by itself.

That last point is worth underlining. Tooling can support the design. Tooling cannot substitute for it.

Summary

If your data platform is organised mainly around pipelines, then you do not really have domains. You have transportation.

And transportation, while useful, does not settle meaning.

A sustainable enterprise data platform needs an ownership topology that makes semantic authority explicit. Domain teams should own business facts in their bounded contexts. Platform teams should provide shared capabilities. Cross-domain composition should be deliberate. Reconciliation should be designed in from the start, especially where finance, operations, and customer lifecycle intersect. Kafka and microservices can accelerate this architecture, but only if you resist the temptation to mistake event flow for domain truth.

The migration path is progressive, not revolutionary. Use a strangler approach. Publish domain products beside legacy pipelines. Reconcile old and new. Shift accountability before you chase the next storage engine.

The memorable lesson is simple:

Pipelines move data. Domains carry meaning.

If your architecture cannot show who owns the meaning, the platform will eventually drown in motion.

Frequently Asked Questions

What is a data mesh?

A data mesh is a decentralised data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.

What is a data product in architecture terms?

A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.

How does data mesh relate to enterprise architecture?

Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.