Data Platform Migration Requires Dual Ownership


Most data platform migrations fail for a boring reason: they pretend the old world is already dead.

It isn’t.

The legacy warehouse still runs finance. The old ETL still feeds regulatory reports. The customer master in the aging ERP still decides who gets billed. Meanwhile, the new platform arrives with lakehouse promises, event streams, Kafka clusters, domain data products, and a team full of justified optimism. The migration plan says “cut over in phases.” Reality says “both systems now matter, and neither can be wrong.”

That gap between plan and reality is where coexistence topology becomes essential.

A data platform migration is not simply a technical relocation from one storage engine to another. It is a temporary but often prolonged redistribution of responsibility. During that period, old and new platforms both participate in the same business outcomes. They both influence decisions. They both produce data that somebody depends on. Which means they both need ownership.

This is the architectural point many organizations resist. They want a clean handoff. One team exits, another team enters, and the migration behaves like moving houses over a weekend. But enterprise platforms are more like moving a hospital during active surgery. You don’t get to turn the lights off in one building just because the new one has better equipment.

So the right pattern is dual ownership through coexistence: a deliberate topology in which legacy and target platforms are both treated as first-class operational participants during migration, with explicit boundaries, reconciliation rules, and migration sequencing tied to domain semantics rather than infrastructure milestones.

This pattern is especially relevant when organizations are moving from centralized batch warehouses to event-driven data platforms, from monolithic integration layers to Kafka-based streaming architectures, or from application-owned reporting extracts to domain-aligned data products. It is also vital when microservices and analytical platforms start to intersect. Once multiple bounded contexts publish, transform, and consume facts asynchronously, migration stops being a copy exercise. It becomes a semantics exercise.

And semantics, not plumbing, is where migrations live or die.

Context

Large enterprises rarely migrate data platforms from a blank slate. They migrate while the business continues to trade, settle, ship, insure, bill, and report.

A typical landscape looks like this:

  • a legacy data warehouse or MPP appliance
  • nightly ETL pipelines with years of embedded business logic
  • operational databases feeding downstream extracts
  • reporting marts built for specific functions
  • new cloud object storage and processing engines
  • Kafka or equivalent event backbone
  • microservices emitting domain events
  • data governance and compliance requirements that span old and new systems
  • multiple teams with partial knowledge of how data actually behaves

The official architecture often says the old platform is “source” and the new platform is “target.” That language is comforting and usually wrong. During migration, the target increasingly becomes a source for some consumers while the source remains authoritative for others. Some domains move earlier. Some stay back because of regulatory logic, embedded transformations, or upstream application constraints. Some consumers are migrated to curated products in the new platform while others continue to depend on old reports.

This creates a coexistence topology: an architecture in which legacy and modern data platforms operate in parallel, exchanging data, validating results, and sharing responsibility for continuity.

The key phrase here is sharing responsibility. Not mirroring. Not temporary technical overlap. Actual responsibility for business-critical outcomes.

If you are moving toward data mesh ideas, this gets even sharper. A domain-aligned sales data product on the new platform may coexist with finance’s dependence on a legacy general ledger feed. Customer events may stream through Kafka in near real time while compliance extracts still run in batch. The enterprise has not finished migrating just because one platform can technically hold the data.

A platform migration ends only when business semantics, operational controls, and consumer trust have migrated too.

Problem

The central problem is this: during migration, the enterprise has one business reality but two technical representations of it.

Those representations drift for predictable reasons:

  • different ingestion timing
  • different transformation logic
  • different schema evolution practices
  • inconsistent reference data
  • event duplication or loss
  • replay behavior in streaming systems
  • hidden logic buried in ETL or BI layers
  • domain concepts that were never formally defined

When organizations ignore this, they create a dangerous fiction: that the target platform is “just a copy” until cutover. In practice, it is already shaping decision-making, dashboards, machine learning features, customer operations, and downstream service behavior. Once even one business process begins using migrated outputs, the target platform has entered production reality. It has become part of the enterprise’s truth-making machinery.

That means the migration problem is not “how do we move data?” It is “how do we run one business through two platforms without losing semantic integrity, operational control, or accountability?”

This is why dual ownership matters.

Without dual ownership, every discrepancy becomes a governance argument:

  • Was the old team supposed to fix it?
  • Is the new team allowed to override it?
  • Who owns reconciliation?
  • Who signs off on cutover?
  • Who handles incidents when old and new disagree?
  • Which number goes to the board, regulator, or customer?

When nobody owns the overlap, the overlap owns you.

Forces

Several forces pull against a simplistic migration.

1. Business continuity beats architectural purity

No executive gets promoted because the migration diagram looked elegant. They care that invoicing worked, inventory was visible, and regulated reports stayed accurate. This always favors gradual coexistence over big-bang replacement.

2. Domain semantics are entangled in old systems

Legacy ETL is often dismissed as technical debt. Sometimes it is. But often it is also where years of domain decisions have accreted: how a customer household is defined, when an order is considered fulfilled, how net revenue is recognized, how claims are reopened, how policy endorsements are restated. You cannot modernize safely until you surface those semantics.

This is classic domain-driven design territory. The migration must identify bounded contexts, ubiquitous language, and aggregate-level meaning before moving pipelines. Otherwise you migrate data structures while breaking business concepts.

3. Event-driven architectures introduce new consistency patterns

Kafka helps decouple producers and consumers. It also exposes timing, ordering, idempotency, and replay issues that old batch systems often masked. During coexistence, the enterprise may compare a batch-curated result in the legacy warehouse against a stream-derived projection in the new platform. If you do not design for reconciliation, the mismatch looks like failure even when it is merely lateness or a different consistency model.

4. Consumers migrate unevenly

Some consumers are easy to move. A self-service dashboard can be re-pointed. A machine learning feature pipeline can be rebuilt. But statutory reporting, executive scorecards, ERP extracts, and operational alerts often have deep dependency chains. The migration therefore proceeds by consumer segment and domain value stream, not by technical layer alone.

5. Ownership boundaries are political as well as technical

Platform teams, domain teams, BI teams, and application teams all touch the migration. If ownership is fuzzy, incentives diverge. Legacy teams may resist because they fear abandonment without support. New teams may optimize for delivery speed and underinvest in controls. Coexistence topology works only when accountability is explicit.

6. Trust is earned by comparability

New platforms fail socially before they fail technically. The data may be correct, but if finance sees a revenue number that differs from the old system by 2.3%, trust collapses. Architects love target-state diagrams. Operators love reconciliation reports. The operators are right.

Solution

The solution is a coexistence topology with dual ownership, governed by domain semantics and executed through progressive strangler migration.

In plain language:

  • keep old and new platforms live during migration
  • define which domains or use cases each platform currently owns
  • create explicit reconciliation between overlapping outputs
  • migrate by bounded context, data product, or decision flow
  • move consumers progressively, not all at once
  • use events, CDC, and published interfaces rather than ad hoc extracts wherever possible
  • measure semantic equivalence before declaring cutover
  • only retire legacy components when both data behavior and operational behavior have stabilized

Dual ownership does not mean duplicated chaos. It means structured overlap.

There are three design ideas underneath this pattern.

Domain-first migration

Use domain-driven design to identify the business meaning you are migrating. A “customer” in sales is not necessarily the same as a “customer” in finance or risk. A “policy” in underwriting may not match the claims context. A migration plan organized solely around schemas, tables, or ingestion tools will miss these distinctions.

Instead, organize migration around bounded contexts:

  • customer engagement
  • order fulfillment
  • billing
  • claims
  • product catalog
  • finance close

For each bounded context, define:

  • source of authority during each migration phase
  • event or data contract semantics
  • acceptable consistency window
  • reconciliation rules
  • sign-off criteria
  • retirement triggers
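
These per-context definitions can live as code rather than slideware. Below is a minimal sketch of a coexistence contract with a retirement gate; all field names and the `ready_to_retire` criteria are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass, field
from enum import Enum


class Authority(Enum):
    LEGACY = "legacy"
    TARGET = "target"


@dataclass
class CoexistenceContract:
    """Per-bounded-context migration contract (hypothetical field names)."""
    context: str
    authority: Authority                   # source of authority for this phase
    consistency_window_hours: int          # acceptable divergence window
    reconciliation_rules: list = field(default_factory=list)
    signoff_roles: list = field(default_factory=list)
    retirement_trigger: str = ""

    def ready_to_retire(self, parity_days: int, required_days: int,
                        signed_off: bool) -> bool:
        """Retirement gate: authority already shifted, sustained parity,
        and explicit business sign-off. Row counts alone do not pass."""
        return (
            self.authority is Authority.TARGET
            and parity_days >= required_days
            and signed_off
        )


billing = CoexistenceContract(
    context="billing",
    authority=Authority.LEGACY,
    consistency_window_hours=24,
    reconciliation_rules=["invoice_totals", "open_balance"],
    signoff_roles=["finance-controller"],
    retirement_trigger="3 consecutive clean month-end closes",
)
```

The point of the sketch is that the retirement trigger is data the program tracks, not a sentence in a deck.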

Progressive strangler migration

The strangler pattern is usually described for applications, but it applies just as well to data platforms. You don’t replace the whole warehouse or integration estate. You wrap, intercept, redirect, and gradually absorb capability into the new platform.

The progression often looks like this:

  1. ingest legacy outputs into the new platform
  2. expose equivalent curated products or APIs
  3. compare outputs in parallel
  4. migrate selected consumers
  5. shift upstream derivation logic where appropriate
  6. decommission old pathways once confidence is proven
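
In code, step 4 often amounts to a read router: consumers call one interface while routing shifts per bounded context, with a legacy fallback that keeps each cutover reversible until step 6 retires it. A hedged sketch, where the routing table and reader functions are illustrative:

```python
# Strangler-style read router sketch. Context names and the routing
# table are assumptions for illustration, not a prescribed design.

LEGACY, TARGET = "legacy", "target"

# Per-context routing, updated as consumer segments migrate.
routing = {
    "claims_operational": TARGET,   # already migrated consumers
    "claims_statutory": LEGACY,     # still authoritative on the old platform
}


def read(context: str, query, legacy_reader, target_reader, fallback=True):
    """Route a read to the currently authoritative platform.

    If the target read fails and fallback is enabled, serve from
    legacy instead of failing the consumer.
    """
    platform = routing.get(context, LEGACY)
    if platform == TARGET:
        try:
            return target_reader(query)
        except Exception:
            if not fallback:
                raise
            return legacy_reader(query)
    return legacy_reader(query)


result = read(
    "claims_operational",
    "open_claims",
    legacy_reader=lambda q: ("legacy", q),
    target_reader=lambda q: ("target", q),
)
```

Flipping one entry in `routing` migrates a consumer segment; deleting the fallback is the real decommissioning act.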

This is not glamorous work. It is good work.

Reconciliation as a first-class architectural capability

Reconciliation is not a testing activity bolted on at the end. It is part of the topology. During coexistence, you need automated comparison of:

  • record counts
  • key distributions
  • aggregate balances
  • event completeness
  • late-arriving corrections
  • business rule outcomes
  • dimensional conformance
  • lineage and freshness

The purpose is not merely to catch defects. It is to explain differences. In real enterprises, some divergence is expected and acceptable within defined windows. The architecture must distinguish between semantic mismatch, processing lag, and operational incident.
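
At its core, a reconciliation job is metric-by-metric comparison plus classification. A minimal sketch, assuming a simple metric-to-value shape and a relative tolerance policy (both illustrative):

```python
def reconcile(legacy: dict, target: dict, tolerance: float = 0.0) -> dict:
    """Compare metrics across platforms and label each difference.

    `legacy` and `target` map metric name -> value; `tolerance` is the
    accepted relative variance within the defined window (a hypothetical
    policy knob, set per domain in practice).
    """
    report = {}
    for metric in legacy.keys() | target.keys():
        lv, tv = legacy.get(metric), target.get(metric)
        if lv is None or tv is None:
            report[metric] = "missing-on-one-side"
        elif lv == tv:
            report[metric] = "match"
        elif lv and abs(tv - lv) / abs(lv) <= tolerance:
            report[metric] = "within-tolerance"
        else:
            report[metric] = "investigate"
    return report


daily = reconcile(
    legacy={"row_count": 1_000_000, "gross_written_premium": 52_340_100.0},
    target={"row_count": 1_000_000, "gross_written_premium": 52_341_000.0},
    tolerance=0.001,  # 0.1% accepted intraday variance
)
```

The labels matter more than the numbers: "within-tolerance" is expected coexistence behavior, "investigate" is a candidate incident.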

Architecture

A practical coexistence topology typically includes four layers of concern:

  1. Operational source domains: ERP, CRM, billing, policy, claims, order management, microservices.
  2. Movement and event backbone: CDC, Kafka topics, file drops where unavoidable, integration services.
  3. Old and new data platforms in parallel: legacy warehouse and marts alongside the modern lakehouse and analytical platform.
  4. Consumer-facing semantic products: reports, APIs, domain data products, ML features, regulatory extracts.

The crucial rule is that consumers should increasingly depend on stable semantic interfaces rather than platform internals. That is how you preserve optionality while migrating underneath.

Diagram 1: Architecture

This topology says something important: the migration is not one pipe from left to right. It is a period of overlapping derivation and overlapping consumption.

For architecture governance, every overlapping flow needs answers to these questions:

  • which platform is authoritative for which decision?
  • what latency is expected?
  • what business rules are encoded where?
  • how is schema evolution controlled?
  • how are corrections and replays handled?
  • who owns incidents and sign-off?

If Kafka is in the picture, event design becomes central. Events should represent domain facts, not database gossip. “OrderPlaced” is useful. “OrderTableRowUpdated” is not. During coexistence, fact-style events let the new platform build durable projections while legacy systems continue their existing batch logic. Weak events tied to source tables simply replicate old confusion at higher speed.
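
The contrast is easy to make concrete. A sketch of the two event styles, with illustrative field names, plus a crude check for fact-style events:

```python
# Useful: a domain fact in business language, with identity, time,
# and a versioned schema. Field names are illustrative assumptions.
order_placed = {
    "type": "OrderPlaced",
    "event_id": "evt-7f3a",            # idempotency key for replay safety
    "occurred_at": "2024-05-01T10:15:00Z",
    "order_id": "ord-1001",
    "customer_id": "cust-42",
    "currency": "EUR",
    "total": "149.90",
    "schema_version": 2,
}

# Weak: database gossip. Consumers must reverse-engineer business
# intent from column diffs and magic status codes.
order_row_updated = {
    "type": "OrderTableRowUpdated",
    "table": "ORDERS",
    "pk": 1001,
    "changed_columns": {"STATUS": ("1", "2"), "UPD_TS": (None, "20240501101500")},
}


def is_domain_fact(event: dict) -> bool:
    """Crude heuristic: facts carry identity, time, and a versioned schema."""
    return {"event_id", "occurred_at", "schema_version"} <= event.keys()
```

A streaming projection built from the first style survives replays and schema evolution; one built from the second inherits every quirk of the source table.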

This is where microservices can help and hurt. They help when each service has a clear bounded context and emits well-defined events. They hurt when every service publishes under-specified payloads with incompatible customer identifiers and no lifecycle semantics. A data platform migration magnifies service design quality. Bad domain boundaries become expensive downstream.

Ownership model

Dual ownership usually means:

  • legacy platform team owns continued correctness of legacy outputs and known transformation behavior
  • modern platform team owns ingestion, curated products, new interfaces, observability, and migration mechanics
  • domain teams own semantic definitions, business rule interpretation, and acceptance criteria
  • governance or architecture function resolves authority boundaries and decommissioning gates

If this sounds heavy, that’s because ambiguity is heavier.

Diagram 2: Ownership model

That gate in the middle matters. It should be based on business acceptance, not just row counts.

Migration Strategy

The migration strategy should be progressive, evidence-based, and domain-scoped.

Phase 1: Discover semantics, not just pipelines

Start by inventorying:

  • critical data products and reports
  • upstream dependencies
  • embedded transformations
  • consumer usage and business criticality
  • reference/master data dependencies
  • quality rules and exception handling
  • timing expectations
  • audit and lineage requirements

Then map these to bounded contexts. This usually exposes the real shape of the problem. You discover that “customer” has five active definitions, “active policy” differs across functions, and one ancient ETL script is responsible for half the month-end reporting logic.

Good. Now you are doing architecture.

Phase 2: Establish coexistence contracts

For each domain slice, define:

  • current source of authority
  • target semantic contract
  • event schemas or interface contracts
  • reconciliation dimensions
  • materialization patterns
  • consumer migration order
  • fallback path

This phase often introduces a canonical or shared contract per bounded context, but be careful. Enterprise architects love enterprise-wide canonical models more than enterprises deserve. Keep contracts contextual. Shared only where the business meaning is truly shared.

Phase 3: Build parallel ingestion and target products

Bring data into the new platform through CDC, Kafka events, batch ingestion, or controlled extracts. Preserve lineage. Build target data products that reflect domain language, not just technical landing zones.

This is where many teams overbuild raw zones and underbuild usable semantics. Don’t stop at ingestion. A migrated platform with no trusted semantic layer is just an expensive attic.

Phase 4: Reconcile continuously

Run parallel outputs and compare them. Reconciliation should include:

  • counts by partition and business key
  • financial or operational balances
  • key business metrics
  • orphan detection
  • slowly changing dimension behavior
  • event lag and duplication
  • backfill consistency
  • late-arriving fact impact

Document acceptable variance windows. For example, a streaming customer activity projection may differ from legacy batch totals intraday but must converge by next morning. Finance aggregates may require exact parity by close. Not all consistency is equal.
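
Those variance windows can be encoded directly. A sketch, assuming a per-metric policy of relative tolerance plus a convergence deadline hour; both the policy shape and the thresholds are illustrative:

```python
from datetime import datetime


def within_variance_window(metric: str, legacy_value: float,
                           target_value: float, as_of: datetime,
                           policies: dict) -> bool:
    """Check a divergence against its domain-specific policy.

    `policies` maps metric -> (relative tolerance, convergence deadline
    hour). Before the deadline, divergence within tolerance is accepted;
    after it, only exact parity passes.
    """
    tolerance, deadline_hour = policies[metric]
    if legacy_value == target_value:
        return True
    gap = abs(target_value - legacy_value) / max(abs(legacy_value), 1e-9)
    if as_of.hour >= deadline_hour:
        return False          # past convergence deadline: must match exactly
    return gap <= tolerance


policies = {
    # may drift up to 5% intraday, must converge by 06:00 next morning
    "customer_activity_count": (0.05, 6),
    # finance close: exact parity always required
    "finance_close_balance": (0.0, 0),
}
```

Encoding the window turns "the numbers don't match" arguments into a pass/fail check per metric per time of day.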

Phase 5: Shift consumers incrementally

Move low-risk and high-value consumers first. Dashboards and exploratory analytics often go before regulatory reporting. Internal APIs may go before external commitments. Data science features may move once quality and timeliness are proven.

Each cutover should be reversible for a while. Not because rollback is pretty, but because confidence always trails design.

Phase 6: Retire derivations, not just infrastructure

A common mistake is to shut down old compute while retaining old semantics in hidden extracts or shadow reports. Real retirement means:

  • old business rules are either no longer needed or explicitly reimplemented
  • consumer dependencies are removed
  • support procedures and controls are updated
  • audit sign-off is complete
  • operational ownership is transferred
  • legacy reconciliation obligations are ended

Turning off a server does not complete a migration. Retiring a behavior does.

Enterprise Example

Consider a global insurer moving from a twenty-year-old enterprise data warehouse to a cloud lakehouse with Kafka-based event ingestion.

On paper, the program looked straightforward:

  • claims, policy, and billing systems feed Kafka or CDC
  • raw data lands in cloud storage
  • curated domain products are built for actuarial, operations, and finance
  • legacy warehouse is decommissioned in 18 months

In reality, the claims domain exposed the trap.

The old warehouse contained a deeply embedded set of transformations around claim lifecycle:

  • reopened claims were restated against prior accounting periods
  • fraud review statuses changed reserve visibility
  • subrogation recoveries were netted differently for operational vs statutory reporting
  • some claim events were corrected after settlement with backdated business dates
  • geography mappings depended on old branch structures long removed from operational systems

None of this was obvious from source schemas.

The new platform team initially ingested claim events from microservices into Kafka, built streaming projections, and produced a “modern” claims data product. It was technically elegant and semantically wrong. Finance saw reserve numbers diverge. Operations saw claim counts rise because reopened claims were represented differently. Trust evaporated in two steering meetings.

The fix was not more technology. It was dual ownership.

The insurer established:

  • a claims domain working group with business SMEs
  • continued legacy ownership for statutory claims outputs
  • modern platform ownership for new operational analytics products
  • explicit semantic contracts for “open claim,” “settled claim,” and “reopened claim”
  • reconciliation dashboards comparing reserve totals, claim states, and event lag
  • progressive consumer migration, starting with operational BI before finance close

Kafka still mattered. It became the backbone for domain facts and replayable history. But it was not treated as magic truth. The team introduced reconciliation jobs that compared stream-derived projections against legacy month-end positions and identified explainable differences due to timing versus true semantic mismatches.

After nine months, operational claims analytics moved fully to the new platform. Finance close remained on the legacy warehouse for another two quarters while the restatement logic was rebuilt and audited. That is dual ownership done properly: no false cutover, no ideological insistence on immediacy, no pretending one team can infer twenty years of domain meaning from a few Avro schemas.

The migration succeeded because the enterprise admitted that coexistence was not failure. It was the path.

Operational Considerations

Coexistence topology creates operational demands that many migration plans underestimate.

Observability

You need observability across both platforms:

  • ingestion lag
  • topic health and consumer offsets in Kafka
  • pipeline freshness
  • reconciliation status
  • schema drift
  • lineage completeness
  • failed replay counts
  • downstream consumer usage

A platform team that can show green Spark jobs but not semantic freshness is flying blind.
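
Semantic freshness is checkable: compare the business time of the latest fact reflected in each product against a staleness budget. A minimal sketch, with product names and thresholds as illustrative assumptions:

```python
def semantic_freshness(now_epoch: float, checks: list) -> dict:
    """Label each product fresh or stale by its staleness budget.

    Each check is (product, last_business_event_epoch, max_staleness_s):
    the age of the newest business fact the product reflects, not the
    timestamp of the last successful job run.
    """
    status = {}
    for product, last_event, max_staleness in checks:
        staleness = now_epoch - last_event
        status[product] = "fresh" if staleness <= max_staleness else "stale"
    return status


status = semantic_freshness(
    now_epoch=10_000.0,
    checks=[
        # 300 s behind with a 900 s budget: fine
        ("claims_operational_product", 9_700.0, 900),
        # 8000 s behind with a 3600 s budget: stale despite green jobs
        ("customer_activity_projection", 2_000.0, 3_600),
    ],
)
```

The second product is exactly the "green Spark jobs, stale semantics" case: pipelines succeed while the business-facing view falls behind.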

Incident management

Incidents during coexistence are trickier because there may be two plausible answers. Runbooks should define:

  • who declares the system of record for each use case
  • when consumers are held on legacy outputs
  • how reconciliation exceptions are triaged
  • replay and correction procedures
  • communication paths for business stakeholders

Data governance

Governance should focus less on abstract policy and more on active control:

  • ownership registers for data products
  • approved business definitions
  • schema versioning rules
  • retention and replay policy
  • PII handling across old and new platforms
  • audit evidence for migration equivalence

Cost management

Dual running costs money. Compute, storage, licenses, support staff, and duplicated controls all accumulate. But this is not an argument against coexistence. It is an argument against endless coexistence. Set retirement milestones by domain and consumer, and tie them to measurable acceptance.

Testing strategy

Test at multiple levels:

  • event contract tests
  • transformation rule tests
  • historical backfill tests
  • reconciliation against production snapshots
  • cutover rehearsal
  • consumer-level acceptance tests

In data migration, the most important tests are often not unit tests but comparative tests against lived business reality.
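
Of the test levels above, event contract tests are the cheapest to automate. A sketch against a hypothetical claim-event schema; the required fields and allowed statuses are assumptions for illustration:

```python
# Hypothetical contract for claim lifecycle events.
REQUIRED_FIELDS = {"event_id", "occurred_at", "claim_id", "status", "schema_version"}
ALLOWED_STATUSES = {"open", "settled", "reopened"}


def check_claim_event_contract(event: dict) -> list:
    """Return a list of contract violations (empty means the event passes)."""
    violations = []
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        violations.append(f"missing fields: {sorted(missing)}")
    if event.get("status") not in ALLOWED_STATUSES:
        violations.append(f"unknown status: {event.get('status')!r}")
    if event.get("schema_version", 0) < 1:
        violations.append("schema_version must be >= 1")
    return violations


good = {"event_id": "e1", "occurred_at": "2024-05-01T00:00:00Z",
        "claim_id": "c9", "status": "reopened", "schema_version": 2}
bad = {"claim_id": "c9", "status": "ROW_UPDATED"}
```

Run in the producer's build, such a check stops under-specified payloads before any consumer, legacy or modern, depends on them.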

Tradeoffs

This pattern is powerful, but let’s not romanticize it.

Benefits

  • lower business risk than big-bang migration
  • explicit handling of semantic ambiguity
  • safer consumer transition
  • measurable confidence through reconciliation
  • supports gradual modernization with Kafka and microservices
  • aligns well with domain-driven data product thinking

Costs

  • temporary duplication of effort and infrastructure
  • more governance overhead
  • more complex incident response
  • prolonged need for legacy expertise
  • temptation to linger indefinitely in hybrid mode

Tension points

The big tension is speed versus certainty. Product teams want to move. Control functions want proof. Architects have to broker that tension honestly.

Another tension is local domain freedom versus enterprise consistency. Domain-oriented migration is right, but some enterprise facts must remain aligned: customer identity, legal entity, chart of accounts, product hierarchy. Pretending every bounded context can define everything independently leads to fractured reporting. Pretending one canonical model can rule them all leads to paralysis. The balance is contextual contracts with selective enterprise reference alignment.

Failure Modes

Coexistence topology fails in recognizable ways.

1. Shadow ownership

Both teams think the other team owns reconciliation, incident triage, or semantic validation. Result: drift persists, trust collapses, migration stalls.

2. Technical parity mistaken for business parity

The new platform has all tables loaded and all pipelines green, but key business definitions differ. The architecture is “complete” and the migration is unusable.

3. Kafka as a dumping ground

Teams publish low-quality technical events with unstable schemas and no domain meaning. The new platform becomes a stream-shaped copy of old confusion.

4. No retirement discipline

The organization keeps both platforms alive “just in case.” Dual ownership becomes permanent cost without strategic intent. This is not coexistence topology anymore. It is institutional indecision.

5. Reconciliation without explanation

The team detects mismatches but cannot classify them into timing, duplication, correction, logic divergence, or source defects. Noise overwhelms signal.
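
The corrective is a classifier that labels every reconciliation exception before anyone argues about it. A sketch with illustrative heuristics and field names; real rules would be domain-specific:

```python
def classify_mismatch(record: dict) -> str:
    """Label one reconciliation exception (illustrative heuristics).

    Categories mirror the failure mode above: timing, duplication,
    correction, logic divergence, or a candidate source defect.
    """
    if record.get("duplicate_event_ids"):
        return "duplication"
    if (record.get("seen_in_target")
            and 0 < record.get("target_lag_s", 0) <= record.get("allowed_lag_s", 0)):
        return "timing"
    if record.get("correction_applied_after_extract"):
        return "late-correction"
    if record.get("rule_versions_differ"):
        return "logic-divergence"
    return "unexplained"  # escalate: possible source defect


cases = [
    {"duplicate_event_ids": ["e1", "e1"]},
    {"seen_in_target": True, "target_lag_s": 120, "allowed_lag_s": 900},
    {"correction_applied_after_extract": True},
    {"rule_versions_differ": True},
    {},
]
labels = [classify_mismatch(c) for c in cases]
```

Only the "unexplained" bucket should page anyone; the rest is coexistence behaving as designed.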

6. Ignoring consumer behavior

Teams migrate data but not the actual habits of analysts, report operators, and downstream service owners. People keep exporting from old tools because the new interfaces are unfamiliar or incomplete.

7. Domain boundaries drawn too late

If bounded contexts are not clarified early, the migration builds shared pipelines around ambiguous concepts. Later, when semantics are corrected, everything has to be reworked.

When Not To Use

Coexistence topology is not always the right answer.

Do not use it when:

  • the platform is low criticality and can tolerate a straightforward cutover
  • the legacy estate is so unstable that parallel operation adds risk rather than reducing it
  • the domain is narrow, well understood, and semantically simple
  • there are very few consumers and rollback is easy
  • regulatory or contractual constraints require an atomic switchover with pre-approved validation

Also avoid this pattern if the organization lacks the discipline to manage explicit ownership and retirement. Dual ownership without strong governance is just prolonged confusion.

For a small internal analytics stack with limited consumers, a simpler migration may be better. For a narrow SaaS reporting backend, full coexistence may be overkill. This pattern earns its keep in large enterprises where semantic complexity, operational continuity, and stakeholder trust dominate the risk profile.

Related Patterns

Several adjacent patterns often work with coexistence topology.

Strangler Fig Pattern

Use it to progressively intercept and replace legacy data flows and consumer dependencies.

Anti-Corruption Layer

Very useful when legacy semantics are awkward or polluted. Translate them before they infect the new domain model.

Event-Carried State Transfer

Helpful for propagating changes through Kafka, but only when event semantics are clear and versioned.

CQRS and Materialized Views

Useful in the modern platform for building projections optimized for specific consumer needs during migration.

Data Product Architecture

A strong fit when domains can publish curated, owned, discoverable outputs rather than exposing raw platform internals.

Master Data / Reference Data Alignment

Essential for enterprise-wide dimensions that must stay consistent across coexistence.

This is the broader ecosystem of patterns around the migration. Coexistence topology is the operating model that ties them together.

Summary

Data platform migration is not a move. It is a managed period of shared truth-making.

That is why dual ownership matters.

When legacy and modern platforms coexist, both participate in business outcomes. Both shape decisions. Both can fail the enterprise. Treating the overlap as incidental is the architectural mistake. Treating it as a deliberate topology is the correction.

The practical recipe is clear:

  • organize migration around bounded contexts and domain semantics
  • use progressive strangler migration rather than big-bang replacement
  • let Kafka and event-driven mechanisms carry domain facts where appropriate
  • build reconciliation into the architecture, not just into testing
  • define explicit ownership across legacy teams, platform teams, and domain teams
  • migrate consumers progressively
  • retire behaviors, not just boxes

The memorable version is simpler: during migration, truth has two addresses. Architect accordingly.

That is not an argument for indecision. It is an argument for realism. In enterprise architecture, realism wins.

Frequently Asked Questions

What is a data mesh?

A data mesh is a decentralised data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.

What is a data product in architecture terms?

A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.

How does data mesh relate to enterprise architecture?

Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.