Data Platform Migration Fails When You Move Tables, Not Meaning

⏱ 19 min read

Most data platform migrations fail for an embarrassingly simple reason: teams move storage before they move understanding.

They copy tables from one warehouse to another, replay CDC streams into a lakehouse, stand up Kafka topics, provision shiny new compute, and declare the future has arrived. But the business does not run on tables. It runs on meaning. “Customer” means something. “Active policy” means something. “Net revenue” means something, and in any serious enterprise it means something slightly different depending on who is talking, what decision is being made, and which system has the authority to decide.

That is the crack migrations fall into.

A legacy platform may be slow, expensive, politically radioactive, and technically stale. Still, it often contains decades of embedded semantic decisions: how returns are netted, when a claim becomes open, why an account is considered delinquent, which product hierarchy is official after quarter close, and which source wins when records disagree. When teams migrate data without migrating those semantics, they don’t modernize the platform. They create a second, more confusing reality.

This is why the most reliable migration pattern is not a table-by-table rewrite. It is a semantic strangler: progressively replacing legacy meaning with new bounded semantic services, derived products, and governed data contracts, while proving equivalence and surfacing intentional differences. You strangle not just old pipelines, but old interpretations.

A table copy is easy. A semantic copy is architecture.

And architecture, at enterprise scale, is mostly about deciding where meaning lives, who owns it, and how change is allowed to happen without breaking the business.

Context

Modern data platform migration usually arrives wrapped in urgency. The old warehouse is too expensive. The appliance is at end of life. MPP licensing is a board-level complaint. The nightly ETL window is collapsing. The data lake became a dumping ground. The reporting estate has multiplied into a tax nobody can explain. Meanwhile business teams want machine learning, self-service analytics, streaming use cases, customer 360, and near-real-time operational insight.

So the migration begins.

A common playbook appears:

  • replicate source tables into a new platform
  • rebuild ETL in Spark, dbt, or cloud-native services
  • expose a semantic layer
  • redirect BI reports
  • retire legacy jobs

That playbook sounds sensible. It is also dangerously incomplete.

In the old world, many semantics were hidden in ugly places:

  • COBOL copybooks and mainframe extract logic
  • stored procedures nobody wants to read
  • Informatica mappings with conditional branches
  • hand-maintained reference tables
  • report filters buried in BI tools
  • finance close adjustments applied outside the platform
  • service APIs that “correct” bad upstream data
  • operational teams reconciling exceptions in spreadsheets

These are not implementation details. They are part of the enterprise’s behavioral model. Domain-driven design teaches us a useful discipline here: don’t confuse the data representation with the domain model. A table called customer is not the Customer domain. It is merely one system’s memory of it.

If you migrate physical structures while ignoring bounded contexts and business language, you will faithfully reproduce none of what matters.

Problem

The core problem is semantic drift during migration.

Legacy and target platforms may contain rows with the same keys and columns, yet produce different answers to important business questions. That happens because data platforms are not neutral containers. They encode business rules, temporal assumptions, quality thresholds, survivorship logic, aggregation conventions, and authority boundaries.

A few examples make this plain:

  • In insurance, a policy admin system may say a policy is active if issued and not cancelled. Finance may treat it as active only after premium recognition rules are met. Risk may use bound exposure from an earlier event in the lifecycle.
  • In retail, “net sales” may exclude tax and shipping in finance, include recognized discounts in merchandising, and be event-based in e-commerce telemetry.
  • In banking, “customer” may mean a legal entity in KYC, a household in marketing, and a party-role relationship in lending.

If the migration team simply lands source tables and rebuilds transformations to “match the old outputs,” two bad outcomes tend to appear.

First, they accidentally preserve contradictions without understanding them. The new platform becomes a faster way to deliver the same confusion.

Second, they unintentionally standardize one interpretation and break dozens of downstream consumers who were relying on the old nuances.

This is the ugly truth: in large enterprises, inconsistency is often not a bug but a frozen treaty between bounded contexts. Migrations fail when teams bulldoze that treaty in the name of simplification.

Forces

Several forces pull against a clean migration.

1. Pressure for technical simplification

Executives fund platform migrations to reduce cost, complexity, and vendor lock-in. They want fewer tools, fewer jobs, fewer copies. Architects want this too. But semantic simplification is not always possible just because technical simplification is desirable.

2. Domain ambiguity

Business terms are overloaded. “Order,” “account,” “member,” “exposure,” “capacity,” “shipment”—these are often context-dependent. Teams assume agreement because they use the same nouns. Then reconciliation starts and everyone discovers they meant different things.

3. Legacy logic is distributed

There is rarely a single place where truth lives. Logic is smeared across ETL jobs, batch extracts, APIs, MDM rules, analyst SQL, and manual workarounds. Migrating it requires archaeology.

4. Demand for coexistence

Most enterprises cannot do a big-bang cutover. Old reports, operational processes, regulatory submissions, and machine-learning pipelines must continue during transition. So the migration must support dual running, comparison, and progressive switchover.

5. Event and batch worlds collide

Kafka, CDC, and microservices invite a real-time architecture. Finance close, regulatory reporting, and historical restatement still require batch discipline, temporal consistency, and replayable snapshots. A good migration architecture has to support both.

6. Organizational ownership is fragmented

Data teams often do not own the underlying domain decisions. Product, finance, operations, and compliance all have pieces of authority. You cannot solve a semantic problem with plumbing alone.

That is why a data platform migration is not merely a data engineering project. It is a domain model refactoring conducted under load.

Solution

Use a semantic strangler.

The classic strangler pattern replaces a legacy system incrementally by routing more and more behavior through a new implementation until the old system can be retired. For data platform migration, that idea needs a crucial extension: the migration should strangle meaning, not just pipelines.

A semantic strangler has a few defining characteristics:

  1. Bounded contexts are identified before pipelines are rebuilt. You decide where terms have distinct meanings and stop pretending there is one universal enterprise definition for everything.
  2. Canonical models are used sparingly. A global enterprise schema is usually where migrations go to die. Instead, define explicit semantic contracts per domain product, with translation where necessary.
  3. Legacy semantics are captured as executable behavior. Don’t document logic in PowerPoint and hope. Recreate old business meaning in tested transformations, decision services, or derived products.
  4. Progressive cutover happens by domain capability. Move “customer eligibility,” “policy exposure,” “invoice settlement,” or “product attribution” one semantic capability at a time, not all tables from source system X.
  5. Reconciliation is first-class. During coexistence, you need side-by-side comparison of old and new outputs, variance classification, and workflows for resolving intentional versus accidental differences.
  6. Data products expose semantics explicitly. Every published dataset or stream should declare grain, owner, source authority, freshness, quality rules, and business interpretation.

This is where domain-driven design matters. DDD is not only for transactional microservices. It is equally useful in data architecture because it disciplines language and boundaries. It tells us to model around business capability, respect bounded contexts, and make translations explicit. That is exactly what migration requires.

Architecture

At a high level, the semantic strangler architecture separates ingestion, semantic processing, publication, and reconciliation.

(Diagram: ingestion, semantic processing, publication, and reconciliation layers)

The raw zone is not the target state. It is evidence. It preserves source facts and timing so you can replay, compare, and debug. Many migrations make the raw layer too central and accidentally turn the lake into the architecture. It isn’t. The architecture lives in the semantic processing layer.

That semantic layer should be organized by domain, not by technical stage alone. A “silver” zone is not a bounded context. Customer identity resolution is. Claims lifecycle is. Revenue recognition is. Product hierarchy management is.

Within each domain, you define:

  • source systems of authority
  • event and state models
  • domain rules
  • survivorship and matching logic
  • temporal semantics
  • quality and completeness expectations
  • published outputs for analytical and operational use

Where Kafka and microservices fit is important. If a domain already emits operational events with stable contracts—say OrderPlaced, PaymentCaptured, PolicyCancelled—those are often better semantic building blocks than reverse-engineered table changes. But don’t become doctrinaire. CDC on a transactional database may still be the most practical way to bootstrap, especially in old enterprises. The architectural question is not “events or tables?” It is “where is authoritative business meaning best expressed?”
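
To make the “events or tables?” point concrete, here is a minimal sketch (all field names hypothetical) of normalizing a domain event and a CDC row into the same semantic fact, so downstream products depend on the meaning rather than the transport:

```python
from datetime import datetime
from typing import Optional

def fact_from_event(event: dict) -> dict:
    # A domain event (e.g. a hypothetical PolicyCancelled topic) already
    # carries business meaning and business time.
    return {
        "policy_id": event["policyId"],
        "fact": "policy_cancelled",
        "event_time": datetime.fromisoformat(event["occurredAt"]),
    }

def fact_from_cdc(before: dict, after: dict) -> Optional[dict]:
    # A CDC row only shows a state transition; the business meaning has to
    # be inferred, and business time is usually approximated by update time.
    if before.get("status") != "CANCELLED" and after.get("status") == "CANCELLED":
        return {
            "policy_id": after["policy_id"],
            "fact": "policy_cancelled",
            "event_time": datetime.fromisoformat(after["updated_at"]),
        }
    return None  # a state change with no recognized business meaning
```

The event path is a straight mapping; the CDC path embeds an inference rule that must be owned and tested, which is exactly where legacy meaning tends to hide.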

Here is the progression:

(Diagram: the semantic strangler progression)

This pattern lets you build shadow products in the new platform that mirror legacy outputs initially, then evolve them toward cleaner domain semantics with controlled consumer migration.

A practical architecture often includes these components:

Semantic transformation services

These may be implemented as dbt models, Spark jobs, Flink pipelines, SQL transformations, or domain-aligned services. The technology matters less than the ownership and tests around the business meaning.

Metadata and contract registry

Every data product needs machine-readable contract metadata: schema, business definition, allowed nullability, freshness, lineage, owner, deprecation state.
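
As a hedged sketch, such a contract entry might look like this (field names are illustrative, not a standard):

```python
from dataclasses import dataclass

@dataclass
class DataContract:
    name: str                 # e.g. "active_policy_underwriting_v1"
    owner: str                # accountable domain team
    grain: str                # one row per what?
    business_definition: str  # plain-language semantics
    freshness_sla_hours: int
    schema: dict              # column -> declared type
    non_nullable: list
    deprecated: bool = False

    def validate_row(self, row: dict) -> list:
        """Return contract violations for a single row (illustrative only)."""
        errors = [f"missing column: {c}" for c in self.schema if c not in row]
        errors += [f"{c} is null" for c in self.non_nullable
                   if c in row and row.get(c) is None]
        return errors
```

The point is not this particular shape; it is that the contract is machine-readable, so violations can gate publication instead of surfacing in a consumer’s dashboard.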

Reconciliation engine

This compares old versus new at the right level: row, aggregate, event count, balance total, period close, dimensional attribution, or KPI output. It classifies differences into:

  • exact match
  • expected difference
  • source latency difference
  • model defect
  • data quality defect
  • unresolved
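
A minimal sketch of that classification logic, with hypothetical inputs and thresholds:

```python
def classify_variance(legacy: float, new: float, *,
                      expected_delta: float = 0.0,
                      tolerance: float = 1e-9,
                      new_source_lagging: bool = False) -> str:
    """Classify one compared measure into a reconciliation category."""
    delta = new - legacy
    if abs(delta) <= tolerance:
        return "exact match"
    if expected_delta != 0.0 and abs(delta - expected_delta) <= tolerance:
        return "expected difference"  # an approved, intentional change
    if new_source_lagging:
        return "source latency difference"
    return "unresolved"  # needs triage into model defect or data quality defect
```

“Unresolved” is the important bucket: it is the queue that the exception workflow below has to drain before cutover.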

Exception workflow

Reconciliation without workflow becomes dashboard theater. Someone has to triage discrepancies, assign owners, document decisions, and approve cutover.

Consumer routing

BI tools, APIs, and downstream jobs need a controlled mechanism to switch from legacy outputs to semantic products. The switch should be reversible.

Migration Strategy

The migration should be organized as a sequence of semantic releases, not a monolithic platform replacement.

1. Inventory business decisions, not just assets

Start by cataloging critical decisions and metrics:

  • what business questions are answered?
  • what processes depend on them?
  • what is the tolerance for difference?
  • who signs off?

This changes the conversation. Instead of “move 800 tables,” the mission becomes “safely migrate policy exposure used in underwriting, pricing, and regulatory reporting.”

2. Discover bounded contexts

Map where terms diverge. A customer mastered for KYC is not the same model used for marketing segmentation. Capture those as distinct contexts with explicit translation.

This is where many enterprises resist. They want one enterprise customer definition. Sometimes that is possible at a narrow level, usually for identity keys and governance. But trying to collapse all domain semantics into one model often creates a brittle abstraction that satisfies nobody.

3. Build semantic shadows

For each high-value domain capability, create a shadow product in the new platform that reproduces legacy semantics as closely as practical. This is not surrender to the past. It is controlled compatibility.

Examples:

  • net_sales_finance_v1
  • active_policy_underwriting_v1
  • customer_household_marketing_v1

Version the products and state the semantics in plain language.

4. Run dual pipelines and reconcile

Operate legacy and new semantic products in parallel. Compare outputs over meaningful periods. Reconciliation should include:

  • record counts
  • key coverage
  • aggregate balances
  • temporal alignment
  • dimensional distribution
  • sampled deep dives
  • business KPI equivalence

The trick is to reconcile at multiple levels. A row-perfect comparison may be impossible if keys or timing differ. But period totals may still need to match exactly for finance.
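
That multi-level idea can be sketched as follows: row counts are compared with a tolerance, while period totals must match exactly (names and thresholds are illustrative):

```python
def reconcile_levels(legacy_rows, new_rows, period_key, amount_key,
                     count_tolerance_pct=0.5):
    """Compare at two levels: approximate row counts, exact period totals."""
    results = {}
    # Level 1: row counts may legitimately differ (dedup, late events)
    lc, nc = len(legacy_rows), len(new_rows)
    results["count_ok"] = abs(nc - lc) / max(lc, 1) * 100 <= count_tolerance_pct
    # Level 2: finance-grade period totals must match to the cent
    def totals(rows):
        agg = {}
        for r in rows:
            agg[r[period_key]] = agg.get(r[period_key], 0) + r[amount_key]
        return agg
    results["totals_ok"] = totals(legacy_rows) == totals(new_rows)
    return results
```

A cleaned-up duplicate can fail the count check while the period totals still balance, which is precisely the “improvement, not defect” case the next step classifies.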

5. Classify variances

Not every difference is a defect. Some are improvements:

  • duplicate handling cleaned up
  • late-arriving events correctly attributed
  • reference data updated
  • timezone normalized
  • business rule fixed

These need formal approval. Otherwise, every migration argument becomes political.

6. Cut over consumer by consumer

Don’t flip the entire reporting estate in one move. Migrate consumers incrementally:

  • one dashboard family
  • one regulatory filing feed
  • one ML feature group
  • one API endpoint

Track readiness by consumer criticality, dependency complexity, and variance tolerance.

7. Retire legacy logic only after semantic confidence

You can decommission a legacy pipeline when:

  • reconciliation is stable over agreed periods
  • exception volume is understood
  • downstream consumers have switched
  • operational support exists in the new environment
  • rollback is no longer needed
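
Those criteria work best as an explicit, executable gate rather than a slide; a sketch with hypothetical criterion names:

```python
RETIREMENT_CRITERIA = [
    "reconciliation_stable",   # stable over agreed periods
    "exceptions_understood",   # exception volume triaged and explained
    "consumers_switched",      # all downstream consumers on new products
    "support_in_place",        # operational support exists in new environment
    "rollback_not_needed",     # agreed rollback window has expired
]

def ready_to_retire(status: dict) -> tuple:
    """Return (ready, blocking criteria) for decommissioning a legacy pipeline."""
    blocking = [c for c in RETIREMENT_CRITERIA if not status.get(c, False)]
    return (len(blocking) == 0, blocking)
```

Making the gate explicit is what prevents the “permanent coexistence” failure mode later: nobody can claim the retirement criteria were never defined.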

This is slower than a brute-force migration. It is also far more likely to end with an actual retirement instead of a permanent dual estate.

Enterprise Example

Consider a global insurer migrating from a Teradata warehouse and assorted ETL tooling into a cloud lakehouse with Kafka-based event ingestion for newer platforms.

The executive goal sounded simple: consolidate data, reduce cost, and enable near-real-time underwriting analytics.

The first migration wave did what many first waves do. It replicated policy, claims, billing, and customer tables into the new platform. Teams rebuilt hundreds of transformations. The new platform delivered faster data. It also produced a nasty surprise: the underwriting dashboard showed active policy counts 4.7% lower than the legacy warehouse.

Engineering initially treated this as a data quality issue. It wasn’t.

After digging in, they discovered four separate meanings of “active policy”:

  • policy administration: issued and not cancelled
  • underwriting exposure: bound and effective within risk period
  • billing: first successful payment posted
  • finance: recognized according to accounting calendar and endorsements

The old warehouse had hidden these distinctions through a stack of ETL rules and reporting conventions. One mart used policy admin semantics; another quietly filtered on billing status; a regulatory extract used finance treatment with month-end adjustment files.
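
Made explicit, those four meanings become four separate, named predicates, one per bounded context (field names hypothetical):

```python
from datetime import date

def active_policy_admin(p: dict) -> bool:
    # Policy administration: issued and not cancelled
    return p["issued"] and not p["cancelled"]

def active_underwriting(p: dict, as_of: date) -> bool:
    # Underwriting exposure: bound and effective within the risk period
    return p["bound"] and p["effective_from"] <= as_of <= p["effective_to"]

def active_billing(p: dict) -> bool:
    # Billing: first successful payment posted
    return p["first_payment_posted"]

def active_finance(p: dict, accounting_period: str) -> bool:
    # Finance: recognized according to the accounting calendar
    return accounting_period in p["recognized_periods"]
```

The same policy can be “active” in two contexts and not in a third, by design; once the predicates are named, a 4.7% gap stops being a defect report and becomes a definitional question with an owner.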

The team stopped the table-first migration and restructured around bounded contexts:

  • Policy Lifecycle
  • Underwriting Exposure
  • Billing & Collections
  • Finance Reporting

They then built semantic products for each context, with named definitions and owners. Kafka events from the modern policy platform were used where event semantics were strong, especially for endorsements and cancellations. CDC remained for older billing and claims systems. A reconciliation service compared old warehouse outputs to new products across policy counts, premium measures, and monthly balances.

The migration then proceeded capability by capability:

  1. underwriting operational analytics moved first because event-based latency mattered and consumers could tolerate small explainable differences
  2. finance products moved later, only after month-end close reconciliations passed for three cycles
  3. customer 360 was postponed because identity semantics across regions were too inconsistent to standardize safely

The result was not a single universal insurance model. It was better: a set of explicit semantic products with translations where needed. The legacy warehouse was retired in phases. More importantly, the business stopped arguing about whether counts were “wrong” and started discussing which context they needed.

That is what good architecture does. It turns invisible ambiguity into visible choice.

Operational Considerations

A semantic strangler lives or dies in operations.

Observability

You need more than pipeline uptime. Track:

  • source freshness by domain
  • event lag and replay backlog
  • reconciliation pass rates
  • variance trends by product
  • schema contract violations
  • consumer cutover status
  • exception aging

If your monitoring only tells you a Spark job failed, you are still operating plumbing, not semantics.

Temporal discipline

Many migration errors are really time errors:

  • late-arriving records
  • event ordering issues
  • snapshot timing mismatches
  • different close calendars
  • timezone conversions

Every semantic product should make its temporal basis explicit: event time, processing time, as-of date, accounting period, effective date, or snapshot timestamp.
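
One way to enforce that discipline is to attach an explicit temporal basis to every published record; a sketch with hypothetical fields:

```python
from dataclasses import dataclass
from datetime import datetime, date

@dataclass(frozen=True)
class TemporalBasis:
    event_time: datetime       # when the business fact occurred
    processing_time: datetime  # when the platform observed it
    as_of_date: date           # the snapshot this record belongs to
    accounting_period: str     # e.g. "2024-01"; close calendar, not wall clock

    def is_late_arriving(self) -> bool:
        """Fact occurred on or before the snapshot date but was processed after it."""
        return (self.event_time.date() <= self.as_of_date
                and self.processing_time.date() > self.as_of_date)
```

With the basis explicit, “late-arriving” is a computable property instead of a reconciliation surprise at month-end.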

Data quality ownership

Quality checks should sit close to domain semantics. Null checks and duplicate counts are not enough. Domain checks matter:

  • can a claim be closed before opened?
  • can recognized revenue exceed booked premium?
  • can a household have multiple primary contacts under this context?
  • can settlement happen before invoice issuance?
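
Checks like these translate directly into executable domain rules; a minimal sketch (record shape hypothetical, with ISO date strings, which compare correctly as text):

```python
def domain_violations(rec: dict) -> list:
    """Domain-level quality rules, kept close to the semantics they protect."""
    v = []
    if rec.get("claim_closed_at") and rec["claim_closed_at"] < rec["claim_opened_at"]:
        v.append("claim closed before opened")
    if rec.get("settled_at") and rec["settled_at"] < rec["invoiced_at"]:
        v.append("settlement before invoice issuance")
    return v
```

Rules like these live with the domain product, not in a central null-check library, so the team that owns the meaning also owns the check.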

Change management

Schema evolution, rule changes, and reference data updates need governance. Kafka event contracts are powerful, but a poorly managed event field change can break semantic products just as easily as a source table alteration.

Security and compliance

Migrations often increase data accessibility. That is useful and dangerous. Domain products should carry policy metadata for PII, retention, cross-border restrictions, and access control. A customer semantic layer that mixes regulated identity and marketing attributes without careful boundaries is a compliance incident waiting to happen.

Tradeoffs

No worthwhile pattern is free.

More up-front analysis

A semantic strangler demands domain discovery and business participation early. That frustrates teams hoping for a fast technical migration.

Temporary duplication

You will run parallel logic, duplicate outputs, and maintain reconciliation machinery. It looks inefficient. In truth, it is the cost of safe replacement in complex systems.

Slower platform “completion”

If your success metric is “all data moved by Q4,” this pattern may look slow. If your metric is “legacy retired without business disruption,” it is usually faster in the only sense that matters.

Boundaries can become too rigid

DDD can be abused. If teams over-rotate into purity, every domain becomes its own little kingdom and integration gets harder. The point is explicit semantics, not theology.

Not every domain deserves equal investment

Some data is low-value, low-risk, and short-lived. You do not need a grand semantic product for every staging feed. Architecture needs proportion.

Failure Modes

This pattern can still go wrong. Common failure modes include:

1. Mistaking ingestion for migration

Landing raw data in the cloud is not migration. It is relocation.

2. Building an enterprise canonical model too early

This often creates endless debates, slow delivery, and a lowest-common-denominator schema that hides real semantic differences.

3. No executable reconciliation

If comparison is manual, sporadic, or only aggregate-level, defects escape and trust collapses.

4. Consumer cutover without semantic sign-off

Teams switch dashboards or APIs because the platform deadline says so, not because business meaning has been proven. This is how executives lose faith in migrations.

5. Treating Kafka topics as self-describing truth

Events help, but event names can lie. CustomerUpdated tells you very little without contract discipline and domain ownership.

6. Central data team owning semantics it does not understand

The platform team can facilitate. Domain teams must own domain meaning.

7. Permanent coexistence

Sometimes the strangler never strangles. Dual running becomes the destination because nobody made decommissioning criteria explicit.

A migration that ends with both old and new platforms still in use is not transformation. It is architectural debt with better branding.

When Not To Use

The semantic strangler is powerful, but not universal.

Do not use it when:

The legacy platform is genuinely simple

If semantics are shallow, business rules are minimal, and downstream usage is limited, a direct migration may be cheaper and entirely adequate.

You have a small greenfield domain

If a new capability has clean ownership and few dependencies, build the target semantic model directly rather than preserving compatibility with irrelevant legacy behavior.

The old outputs are not trusted and not worth reproducing

Sometimes the right move is to intentionally break from the past. But if you do that, make it explicit and manage it as business change, not accidental divergence.

Regulatory timelines demand a tightly scoped replication

In rare cases, exact reproduction for a fixed filing or data retention need may matter more than semantic redesign. Even then, be honest that you are replicating, not modernizing.

Organizational maturity is absent

If there are no domain owners, no tolerance for dual running, and no appetite for reconciliation, this pattern will stall. Better to narrow scope than pretend semantics can be resolved by architecture alone.

Related Patterns

Several adjacent patterns are useful here.

Strangler Fig Pattern

The core migration mechanism: incremental replacement around a live legacy estate.

Anti-Corruption Layer

Essential when translating old domain concepts into new bounded contexts. It protects the target model from legacy semantic pollution.

Data Mesh

Helpful insofar as it reinforces domain ownership and data products. Unhelpful if it is treated as decentralization theater without semantic contracts and governance.

Event Sourcing

Useful in domains where reconstructing lifecycle semantics matters. Not required everywhere, and often impractical for legacy migration.

Change Data Capture

A practical bridge for coexistence, especially with legacy systems. But CDC carries source state changes, not guaranteed business events.

CQRS

Can help separate operational and analytical projections, especially where Kafka-driven streams support multiple domain views.

The pattern language matters because enterprise architecture is rarely one idea. It is a composition of constraints, protections, and sequencing decisions.

Summary

Data platform migration fails when teams move tables and leave meaning behind.

The fix is not more pipelines, more cloud services, or a larger semantic layer purchased from a vendor slide deck. The fix is to treat migration as a semantic refactoring of the enterprise. Identify bounded contexts. Make domain meaning explicit. Build semantic products. Reconcile old and new behavior. Cut over progressively. Retire legacy only when confidence is earned.

This is why the semantic strangler is such a useful idea. It accepts the reality that legacy systems contain business behavior, not just technical debt. It gives you a way to preserve what matters, improve what should change, and prove the difference in between.

In other words: don’t migrate data first. Migrate interpretation.

Everything else is just forklifting confusion into a more expensive place.

Frequently Asked Questions

What is a data mesh?

A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.

What is a data product in architecture terms?

A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.

How does data mesh relate to enterprise architecture?

Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.