Data Platform Migration Is Mostly Political
Most data platform migration stories are told as if they were infrastructure stories. They aren’t. They are stories about authority, incentives, and fear, disguised as stories about pipelines and storage engines.
That is the first hard truth.
The second is even less comfortable: most enterprises do not fail data platform migrations because they picked the wrong warehouse, the wrong lakehouse format, or the wrong streaming stack. They fail because nobody decided who owns meaning. A table can be moved in a weekend. A metric cannot. A Kafka topic can be mirrored across regions. A business concept like “active customer,” “fulfilled order,” or “net revenue” will start fights that last quarters.
A data platform is not merely a technical substrate. It is an ownership topology made visible through technology. Every dataset, event stream, dashboard, and machine learning feature carries a hidden signature: some team’s interpretation of the business. If that ownership is ambiguous, the platform becomes a swamp. If ownership is centralized beyond reason, it becomes a ticket queue with SQL.
That is why data platform migration is mostly political. Technology matters, certainly. But technology is often the easiest part. The real migration is from one model of responsibility to another.
Context
Most large organizations arrive at migration for familiar reasons. The current platform is expensive, brittle, slow, or embarrassingly manual. Perhaps a legacy enterprise data warehouse has become a monument to reporting from 2012. Perhaps a sprawl of ETL jobs in Informatica, SSIS, Airflow, Spark, and shell scripts has turned lineage into folklore. Perhaps a cloud migration has created a split-brain estate where the old warehouse is still considered “official,” while product teams build their own pipelines into Snowflake, BigQuery, Databricks, Redshift, or Kafka-driven operational stores.
The symptoms are visible everywhere:
- duplicate customer tables
- five versions of margin
- endless reconciliation meetings
- dashboards that disagree by single-digit percentages but trigger executive panic
- data engineering teams acting as interpreters between finance, product, operations, and technology
- domain teams shipping services fast but waiting months for trusted analytics
At that point, leadership says, “We need a modern data platform.”
Usually what they mean is one of three things:
- Move from legacy warehouse to cloud-native analytics.
- Introduce streaming and event-driven ingestion.
- Decentralize data ownership while preserving trust.
Only the first two are technology statements. The third is the real architecture problem.
Domain-driven design helps here, because it gives us vocabulary that most data programs lack. Bounded contexts matter in data as much as they matter in transactional systems. “Customer” in sales is not “customer” in billing. “Order” in e-commerce checkout is not “order” in fulfillment. The migration goes wrong the moment a central data team pretends these are all one thing and can be harmonized through naming conventions alone.
They cannot.
Problem
A traditional enterprise data platform often evolves around a hidden bargain: source systems emit records, and a central team turns those records into enterprise truth. That bargain works for a while. It creates consistency, economies of scale, and a single place to enforce controls. But it also creates deep structural coupling.
The central team becomes responsible for semantics it does not truly own.
That is the fracture line.
As the business grows, domains move faster than the central model. Product teams launch subscription plans, bundles, promotions, and partner channels. Finance adjusts revenue recognition logic. Operations introduces partial fulfillment and backorder exceptions. Marketing invents attribution rules that mutate quarterly. None of this fits neatly into a single canonical model maintained by a platform team three organizational layers away.
So the enterprise compensates in predictable ways:
- source-aligned raw ingestion grows without clear contracts
- semantic transformations proliferate in downstream marts
- domain teams export data and build side systems
- Kafka topics are produced with weak schemas and stronger opinions than anyone admits
- “gold” datasets become political settlements rather than technical outputs
The result is not just inconsistency. It is ownership collapse.
When ownership collapses, every migration becomes dangerous. Teams are asked to move pipelines before agreeing on business meaning. They are asked to replicate reports before identifying the domain authority behind them. They are asked to cut over to a new platform before reconciliation is designed. This is how migrations end with dual-running estates that last years.
The core problem can be phrased simply: the enterprise has data assets, but no clear map of semantic ownership and operational accountability across domains.
Forces
Several forces pull against one another in any serious migration.
1. Central consistency versus domain autonomy
Executives want one number for revenue. Domain teams need local control over their definitions and workflows. Centralization improves standardization. Decentralization improves responsiveness. You do not get both for free.
2. Platform scale versus semantic proximity
A platform team can run storage, orchestration, security, observability, and cost controls at scale. But it should not be the long-term owner of order lifecycle semantics or fraud classifications. Those belong near the domain.
3. Historical reconciliation versus future agility
Legacy estates contain years of embedded business logic. Some of it is obsolete, some accidental, some mission-critical. During migration, the organization wants to move fast while also proving parity with historical reporting. These desires often collide.
4. Event-driven freshness versus analytical stability
Kafka and microservices make real-time data propagation possible. They do not guarantee semantic completeness. Operational events are optimized for business process execution, not enterprise analytics. An event named OrderCreated is not necessarily enough to compute booked revenue, fulfillment performance, or customer lifetime value.
5. Ownership clarity versus organizational reality
The architecture may say “domain teams own their data products.” The org chart may say otherwise. Some domains have strong product and engineering capability. Others do not. Ownership is easy to declare and hard to sustain.
These forces are why migration is political. Architecture exposes tensions that management would prefer remain hidden.
Solution
The practical solution is not “build a new platform.” It is to redesign the ownership topology and let the platform follow.
I would state the target model this way:
- platform teams own capabilities
- domain teams own semantics
- governance teams own policy
- consuming teams own composition for their use cases
That sounds obvious. In practice, it is radical.
A modern data platform should be organized around bounded contexts, explicit data contracts, and progressive strangler migration. The old estate is not ripped out in one heroic release. It is surrounded, decomposed, and gradually starved. Domain by domain. Metric by metric. Consumer by consumer.
The key move is to stop treating data migration as a bulk replication exercise and instead treat it as a sequence of ownership decisions:
- Who is authoritative for customer identity?
- Who defines order state transitions?
- Which events are operational facts versus analytical convenience?
- Which datasets are source-aligned, which are domain-modeled, and which are consumption-specific?
- Who signs off on reconciliation at each semantic boundary?
This is domain-driven design applied to data. Not as a slogan. As a survival tactic.
Ownership layers
A healthy ownership topology usually has four layers:
- Source-aligned ingestion
  - raw and lightly standardized replicas of operational data
  - owned operationally by platform, with schema contracts from source teams
- Domain data products
  - business-meaningful datasets and event streams within bounded contexts
  - owned by domain teams
- Cross-domain semantic models
  - curated compositions for enterprise reporting or shared use cases
  - owned jointly, usually with a designated lead domain
- Consumption artifacts
  - dashboards, extracts, features, APIs
  - owned by consuming teams
Without this layering, every dataset becomes everyone’s problem.
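To make the four layers concrete, here is a minimal sketch of an ownership registry in Python. The layer names come straight from the list above; the dataset and team names are illustrative assumptions, not a real catalog API. The point it demonstrates is the rule the layering enforces: every dataset resolves to exactly one layer and one accountable owner, and lookups fail loudly when that is not true.

```python
from dataclasses import dataclass
from enum import Enum

class Layer(Enum):
    SOURCE_ALIGNED = "source-aligned ingestion"
    DOMAIN_PRODUCT = "domain data product"
    CROSS_DOMAIN = "cross-domain semantic model"
    CONSUMPTION = "consumption artifact"

@dataclass(frozen=True)
class DatasetOwnership:
    name: str
    layer: Layer
    owner: str  # team accountable for semantics and support

# Illustrative registry: each dataset has exactly one layer and one owner.
registry = [
    DatasetOwnership("orders_raw", Layer.SOURCE_ALIGNED, "platform"),
    DatasetOwnership("booked_orders", Layer.DOMAIN_PRODUCT, "order-domain"),
    DatasetOwnership("order_to_cash", Layer.CROSS_DOMAIN, "finance-lead"),
    DatasetOwnership("exec_sales_dashboard", Layer.CONSUMPTION, "bi-team"),
]

def owner_of(name: str) -> str:
    """Fail loudly if a dataset has no unambiguous owner."""
    matches = [d for d in registry if d.name == name]
    if len(matches) != 1:
        raise LookupError(f"no unambiguous owner for {name!r}")
    return matches[0].owner

assert owner_of("booked_orders") == "order-domain"
```

The registry is trivial on purpose: the hard part is the organizational decision each row records, not the code.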
Architecture
A workable target architecture for enterprise migration usually combines batch and streaming, with Kafka where latency and event choreography justify it, and a warehouse or lakehouse for durable analytical serving. The architecture should not force every domain into real-time. That is how teams end up building expensive theater.
This architecture is boring in the best way. It separates platform concerns from domain concerns. It also acknowledges a fact many programs deny: not all enterprise truth comes from events. Some comes from slowly changing dimensions, snapshots, late-arriving corrections, financial close processes, or reference data mastered elsewhere.
Domain semantics first
The most important architectural element is not Kafka, storage, or orchestration. It is the domain model.
Suppose the commerce domain emits OrderPlaced, PaymentAuthorized, and ShipmentConfirmed. Useful events, yes. But what does the enterprise need? It may need “booked order,” “gross merchandise value,” “net revenue eligible,” “fulfilled line,” or “returned within SLA.” Those are semantic constructs, not merely events. They often span multiple bounded contexts.
This is why event streams should not be mistaken for enterprise semantics. They are ingredients. Sometimes excellent ingredients. Still not the meal.
The domain data product layer is where events, tables, and reference data are interpreted into durable business meaning. And that layer must be owned by the people closest to the business process.
Cross-domain composition
Not everything belongs inside a single bounded context. There are enterprise questions that naturally cross domains. For example:
- customer 360 across sales, service, billing, and digital channels
- order-to-cash across commerce, fulfillment, invoicing, and collections
- workforce productivity across HR, scheduling, and operations
These require composed semantic models. The trap is to let a central team invent these models without clear domain participation. The better pattern is federated composition: each contributing domain publishes stable data products; a designated owner assembles the enterprise view.
That owner is not “the data team.” It is whichever domain has the strongest business accountability for the outcome.
Event and batch coexistence
Kafka is valuable when you need near-real-time propagation, decoupled integration, and replayable event logs. It is not a religion. Plenty of business facts arrive late, are corrected in bulk, or become official only after reconciliation and approval. Financial close is not improved by pretending it is a streaming problem.
A pragmatic platform supports:
- CDC for operational replication
- Kafka for domain event propagation
- scheduled and event-triggered transformations
- snapshotting and reconciliation stores
- immutable audit trails for restatement analysis
That blend is what enterprises actually need.
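The blend above can be made operational with a per-dataset decision rather than a platform-wide default. The sketch below, with illustrative decision rules, chooses a propagation style from a dataset's actual requirements: bulk corrections and audit criticality push toward batch with a reconciliation store, genuine latency needs push toward Kafka, and everything else stays on scheduled CDC.

```python
# Hedged sketch: pick a propagation style per dataset from its requirements
# instead of defaulting everything to streaming. Rules are illustrative.

def propagation_style(needs_realtime: bool, bulk_corrections: bool,
                      audit_critical: bool) -> str:
    if bulk_corrections or audit_critical:
        return "batch + reconciliation store"   # e.g. financial close
    if needs_realtime:
        return "kafka event stream"             # e.g. fulfillment SLAs
    return "scheduled cdc replication"          # the unglamorous default

assert propagation_style(True, False, False) == "kafka event stream"
assert propagation_style(True, True, True) == "batch + reconciliation store"
assert propagation_style(False, False, False) == "scheduled cdc replication"
```

Note the precedence: a dataset that is both real-time and close-critical lands on the batch path, because auditability wins.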
Migration Strategy
Big-bang migration is usually a management fantasy. The real world prefers progressive strangler migration because it respects uncertainty.
The migration should proceed in slices defined by business capability and ownership maturity, not just technology stack.
Step 1: Inventory by semantic dependency, not by pipeline
Many migrations start with technical inventory: jobs, tables, reports, interfaces. Necessary, but not sufficient. You also need a semantic inventory:
- key metrics and KPIs
- authoritative source domains
- downstream consumers
- business sign-off owners
- known exceptions and manual adjustments
- close-process dependencies
- compliance obligations
This shifts the conversation from “what runs nightly?” to “what decisions depend on this?”
Step 2: Identify bounded contexts and ownership gaps
Map major business domains and decide which team can reasonably own data products for each. Some gaps will be obvious. For example, finance semantics may depend on ERP data but no technology team in finance exists to own the model. In that case, you need an enabling team or temporary stewardship model. Do not hide the gap. Surface it.
Step 3: Establish source contracts
For operational systems and microservices, define contracts around schemas, keys, event meaning, change handling, and retention. With Kafka, use schema evolution rules and compatibility modes. Without explicit contracts, downstream migration becomes archaeology.
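A compatibility gate does not need to be elaborate to be useful. The sketch below applies two Avro-style backward-compatibility rules — removing a field breaks consumers, and so does adding a required field without a default — to a toy schema representation. The schema format and field names are assumptions for illustration; in practice this check lives in a schema registry, not hand-rolled code.

```python
# Sketch of a backward-compatibility gate for source contracts.
# Rules loosely follow Avro-style compatibility; the schema dict
# shape here is illustrative, not a real registry API.

def breaking_changes(old: dict, new: dict) -> list[str]:
    problems = []
    for field in old:
        if field not in new:
            problems.append(f"removed field: {field}")
    for field, spec in new.items():
        if field not in old and spec.get("required") and spec.get("default") is None:
            problems.append(f"new required field without default: {field}")
    return problems

old_schema = {"order_id": {"required": True},
              "amount": {"required": True}}
new_schema = {"order_id": {"required": True},
              "amount": {"required": True},
              "currency": {"required": True, "default": "EUR"}}

assert breaking_changes(old_schema, new_schema) == []  # safe evolution
```

Run in CI against every producer change, a gate like this turns contract violations into build failures instead of 2 a.m. pipeline incidents.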
Step 4: Build parallel domain products
Create the new domain data products while legacy marts still operate. Do not attempt immediate one-to-one replication of every historical artifact. Focus on high-value semantic slices. For each slice, make the new product explicit about:
- grain
- timeliness
- key definitions
- derivation rules
- known exclusions
- ownership and support path
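The six properties above are worth making machine-checkable before a slice goes live. A minimal sketch, with field names assumed for illustration: a product descriptor is a plain mapping, and publication is blocked until every required property is present.

```python
# Illustrative data product descriptor: the slice's explicit properties
# as a checkable structure. Key names are assumptions for this sketch.

REQUIRED_KEYS = {"grain", "timeliness", "key_definitions",
                 "derivation_rules", "known_exclusions", "ownership"}

booked_orders = {
    "grain": "one row per order line per version",
    "timeliness": "available by 06:00 local, T+1",
    "key_definitions": {"order_id": "commerce order identifier"},
    "derivation_rules": "booked = placed and payment authorized",
    "known_exclusions": ["test orders", "internal accounts"],
    "ownership": {"owner": "order-domain", "support": "#order-data"},
}

def publishable(descriptor: dict) -> bool:
    """A slice ships only when every required property is declared."""
    return not (REQUIRED_KEYS - descriptor.keys())

assert publishable(booked_orders)
```

The value is not the validation logic; it is that "grain" and "known exclusions" become things a team must write down rather than things a consumer discovers by surprise.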
Step 5: Reconcile aggressively
Reconciliation is where migrations either earn trust or lose it.
You need multiple kinds of reconciliation:
- record reconciliation: did all expected records arrive?
- metric reconciliation: do key aggregates match within agreed thresholds?
- semantic reconciliation: are differences understood and accepted?
- timing reconciliation: are variances caused by latency windows or late corrections?
- financial reconciliation: are close-critical numbers controlled and auditable?
Treat reconciliation as a first-class architectural capability, not an afterthought.
The difference between a good migration and a bad one is often this: in a good migration, every discrepancy has an owner and a path to resolution. In a bad migration, discrepancies are discussed in meetings until people become numb.
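A metric reconciliation check can encode both halves of that difference: an agreed tolerance and a named owner for every discrepancy. The sketch below is illustrative — the metric name, figures, and threshold are invented — but the shape is the useful part: no comparison is recorded without an owner attached.

```python
# Sketch of metric reconciliation: compare a legacy aggregate with the
# new product within an agreed relative tolerance, and force every
# discrepancy to carry a named owner. All values are illustrative.

def reconcile(metric: str, legacy: float, new: float,
              tolerance: float, owner: str) -> dict:
    diff = abs(new - legacy) / abs(legacy) if legacy else abs(new)
    return {
        "metric": metric,
        "relative_diff": diff,
        "within_tolerance": diff <= tolerance,
        "owner": owner,  # a named team, never "the data team in general"
    }

result = reconcile("net_revenue_emea", legacy=1_204_500.0,
                   new=1_203_900.0, tolerance=0.001,
                   owner="finance-domain")
assert result["within_tolerance"]
```

When the assertion fails, the report already says who resolves it, which is exactly the property bad migrations lack.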
Step 6: Cut over by consumer, not by platform
The enterprise rarely cuts over the entire platform at once. It cuts over reporting packs, APIs, analytics models, and operational feeds one consumer group at a time. This lets you isolate blast radius and learn.
Step 7: Retire legacy deliberately
Legacy pipelines do not die because someone announces a new platform. They die when:
- no active consumer depends on them
- reconciliation has been signed off
- control evidence is archived
- support ownership is removed
Without these conditions, retirement becomes mythical.
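The four conditions can be written as an explicit gate so retirement is a checklist, not a debate. A minimal sketch, with field names assumed for illustration:

```python
# The retirement conditions above as an explicit gate: a legacy pipeline
# is switched off only when every condition holds. Names are illustrative.

def may_retire(pipeline: dict) -> bool:
    return (pipeline["active_consumers"] == 0
            and pipeline["reconciliation_signed_off"]
            and pipeline["control_evidence_archived"]
            and not pipeline["has_support_owner"])

legacy_sales_mart = {
    "active_consumers": 0,
    "reconciliation_signed_off": True,
    "control_evidence_archived": True,
    "has_support_owner": False,  # support ownership already removed
}
assert may_retire(legacy_sales_mart)
```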
Enterprise Example
Consider a global retailer with three major estates:
- an on-prem enterprise data warehouse used by finance and merchandising
- a cloud data lake fed by e-commerce and mobile events
- a Kafka backbone used by newer microservices for checkout, pricing, and fulfillment
Leadership wanted “one modern platform.” The initial program aimed to replicate all legacy warehouse data into the cloud and rebuild reports there. It looked sensible on a slide. It stalled almost immediately.
Why? Because “sales” meant different things in different places.
Merchandising cared about gross demand by item and channel. Finance cared about recognized revenue after cancellations, returns, tax, and timing adjustments. E-commerce cared about submitted basket conversion. Fulfillment cared about shipped lines and substitutions. There was no single sales truth waiting to be copied. There were several valid truths serving different bounded contexts.
The migration was reset around ownership topology.
The retailer identified six major domains: customer, catalog, pricing, order, fulfillment, and finance. It then made a crucial decision: the central data platform team would stop owning metric definitions. Instead:
- the order domain owned booked order semantics
- fulfillment owned shipped and delivered semantics
- finance owned recognized revenue and period-close adjustments
- customer owned identity resolution rules for customer analytics
- the central platform owned ingestion, storage, lineage, policy enforcement, and observability
Kafka remained important, especially for order and fulfillment events. But those streams were no longer treated as complete enterprise truth. They fed domain products that combined events with master data, late corrections, ERP entries, and returns processing.
The migration then proceeded in slices:
- Replace daily e-commerce order reporting.
- Replace fulfillment SLA dashboards.
- Replace finance revenue bridge for one region.
- Expand domain by domain.
Reconciliation exposed several hidden failure modes:
- duplicate events during retry storms
- order amendments that changed line-level economics after initial placement
- returns posted in ERP days after warehouse receipt
- customer merges that rewrote historical identity linkage
Because ownership was explicit, those were not generic “data quality issues.” They were domain issues with named teams responsible.
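The first of those failure modes — duplicate events during retry storms — has a well-known remedy: an idempotent consumer keyed on event identity. A minimal sketch, with event shape and ids invented for illustration:

```python
# Illustrative idempotent consumer: track processed event ids and drop
# replays so retry storms cannot double-count orders. Event shape is assumed.

def consume(events):
    seen, accepted = set(), []
    for event in events:
        if event["event_id"] in seen:
            continue  # retry duplicate: drop it, do not double-count
        seen.add(event["event_id"])
        accepted.append(event)
    return accepted

events = [
    {"event_id": "e1", "type": "OrderPlaced", "amount": 40.0},
    {"event_id": "e1", "type": "OrderPlaced", "amount": 40.0},  # retry
    {"event_id": "e2", "type": "OrderPlaced", "amount": 25.0},
]
assert len(consume(events)) == 2
```

In production the `seen` set becomes durable state with a retention window, but the domain decision is the same: which field constitutes event identity, and which team owns that answer.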
The outcome was not a pristine universal model. It was better: a platform where major business semantics had clear authorities, and enterprise reporting was composed from governed domain products. Finance close stayed partly batch-oriented. Fulfillment analytics went near real-time through Kafka-fed pipelines. Customer analytics used identity snapshots with versioning. Different needs, one platform backbone.
That is what mature enterprise architecture looks like. Not uniformity. Coherent diversity.
Operational Considerations
A migration architecture that ignores operations is just a diagram collection.
Data contracts and schema evolution
With microservices and Kafka, producers will evolve schemas. Some changes are harmless. Some are lethal. You need compatibility rules, deprecation windows, and consumer impact analysis. Otherwise the platform becomes an endless negotiation.
Lineage and discoverability
Catalogs are useful, but only if they capture real lineage and real ownership. Every critical domain product should show:
- upstream systems and streams
- transformation logic references
- business owner
- technical owner
- freshness expectations
- quality assertions
- downstream critical consumers
Quality as executable policy
“Trusted data” is too vague. Quality checks should be executable and tiered:
- structural checks
- referential checks
- volume and drift checks
- business rule checks
- reconciliation controls
Not every dataset deserves the same rigor. Revenue does. Experimental feature telemetry probably does not.
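Tiering can itself be executable. The sketch below, with tier names and rule names assumed for illustration, gives every dataset the structural baseline and reserves business-rule and reconciliation checks for the critical tier:

```python
# Sketch of tiered, executable quality policy: every dataset gets the
# structural baseline; only "critical" datasets also get business-rule
# checks. Tier and rule names are assumptions for illustration.

def checks_for(tier: str) -> list[str]:
    structural = ["not_null_keys", "schema_matches_contract"]
    business = ["revenue_non_negative", "reconciles_to_ledger"]
    return structural + business if tier == "critical" else structural

assert "reconciles_to_ledger" in checks_for("critical")  # revenue-grade
assert checks_for("experimental") == ["not_null_keys",
                                      "schema_matches_contract"]
```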
Security and policy boundaries
Ownership topology does not remove central policy. It makes it more important. PII, retention, residency, and access controls need platform-enforced guardrails. Domain autonomy without security policy is just distributed irresponsibility.
Cost management
Cloud data platforms can become astonishingly expensive when every domain copies everything into its own models. Shared storage patterns, lifecycle management, and workload governance matter. Ownership should not mean unlimited duplication.
Support and incident management
When a KPI breaks at month-end, who wakes up? The answer should be obvious from the ownership model. If it isn’t, your migration is not done.
Tradeoffs
No architecture of this kind is free of compromise.
Benefit: semantic clarity
You gain clearer accountability and better alignment with business reality.
Cost: organizational complexity
Federated ownership requires stronger coordination. Some teams will be better at it than others.
Benefit: faster domain evolution
Domains can change their products without waiting for a central bottleneck.
Cost: harder enterprise harmonization
Cross-domain views need deliberate composition and governance. There is no magical canonical model.
Benefit: more resilient migration
Progressive strangler migration lowers risk and allows partial success.
Cost: longer transition
Dual-running, reconciliation, and phased cutover take time and money.
Benefit: better fit for event-driven systems
Kafka and microservices can feed domain-aligned products naturally.
Cost: event incompleteness
Operational events often omit context required for analytics and controls.
This is the pattern with enterprise architecture: every benefit has a bill attached. Good architects read the bill before ordering.
Failure Modes
There are several common ways this approach goes wrong.
“Data mesh” theater
Leadership announces domain ownership, but no domain teams receive funding, capability, or authority. The central team remains accountable for everything. The result is rebranded centralization with worse documentation.
Platform overreach
The platform team defines business semantics because domain teams are slow or unavailable. This feels efficient in the short term and creates semantic debt in the long term.
Domain fragmentation
Every team publishes its own version of common concepts with no cross-domain stewardship. The enterprise then rediscovers why central governance existed.
Event obsession
Teams assume Kafka topics are the definitive source for all analytics. Late corrections, ERP postings, and operational exceptions prove otherwise.
Reconciliation neglect
The migration plan allocates months to building pipelines and days to proving equivalence. That ratio should often be reversed.
Legacy immortality
New products launch, but old reports remain “just in case.” Costs rise, trust fragments, and the migration never finishes.
Canonical model fantasy
Someone tries to force one universal business schema across all domains. It becomes a diplomatic document rather than a usable model.
When Not To Use
This approach is not for every situation.
Do not use a federated ownership-topology migration if:
- the organization is small and one team genuinely understands the whole business
- the platform serves mostly straightforward reporting with stable semantics
- domain teams lack the capability or mandate to own data products
- regulatory constraints require extreme central control with limited variation
- the migration is primarily infrastructure relocation with no semantic redesign
In those cases, a simpler centralized warehouse migration may be perfectly sensible.
Likewise, do not introduce Kafka simply because “real-time” sounds modern. If the business process closes daily, receives bulk corrections, and values auditability over immediacy, a batch-first design is often better.
Architecture is not a contest in trend adoption.
Related Patterns
Several related patterns often complement this approach.
Data products
A useful concept when grounded in ownership, SLAs, and actual consumers. Useless when it means “a table with a nicer name.”
Data mesh
Helpful for framing federated ownership and self-serve platform capabilities. Dangerous when adopted as vocabulary without organizational change.
Strangler fig pattern
Excellent for migration. Build the new path around the old one, prove value incrementally, and retire legacy piece by piece.
Change data capture
A practical bridge from legacy systems into the new estate. Often more valuable than glamorous event redesign.
CQRS and event sourcing
Relevant in some operational domains, especially when domain event history matters. Not a blanket answer for analytics.
Semantic layer
Useful for stable business-facing metrics and definitions, especially across BI tools. But it cannot substitute for domain ownership underneath.
Summary
Data platform migration is mostly political because platforms encode decisions about who gets to define reality.
That sounds dramatic. It is also true.
If you migrate technology without migrating ownership, you will reproduce the old dysfunction on better hardware. If you centralize semantics too aggressively, the platform becomes a bottleneck. If you decentralize without guardrails, the estate fractures into local truths. The art is in designing the ownership topology so platform capabilities, domain semantics, and governance policy reinforce rather than undermine one another.
The practical path is a progressive strangler migration:
- map bounded contexts
- assign semantic ownership
- build domain data products
- reconcile relentlessly
- cut over consumer by consumer
- retire legacy with evidence
Kafka, microservices, cloud warehouses, and lakehouses all have their place. None of them solve the central question: who owns meaning?
That is the architecture question hiding inside the migration question. And in the enterprise, it is usually the only one that really matters.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.