Data Platform Evolution Is Incremental

Most data platforms do not fail because the technology is weak. They fail because the organization tries to replace a living circulatory system in one operation.

That is the recurring fantasy: declare the old warehouse dead, stand up a lakehouse, stream everything through Kafka, sprinkle some microservices around the edges, and call it modernization. Then reality arrives. Revenue reports disagree. Customer counts drift. Finance refuses to sign off. Compliance asks a simple question — “which number is the official one?” — and the room goes quiet.

A data platform is not just plumbing. It is accumulated meaning. It contains years of domain assumptions, invisible contracts, reporting habits, reconciliation rules, and business compromises encoded in SQL, ETL jobs, dashboards, spreadsheets, and tribal memory. You are not replacing software. You are renegotiating truth.

That is why effective platform evolution is incremental. Not timid. Incremental. There is a difference. The right move is often a strangler migration topology: let the new platform grow around the old one, take one bounded slice of business meaning at a time, and move authority carefully, with reconciliation before replacement.

The central idea is simple and hard to practice. You do not migrate “the data platform” as one thing. You migrate domains, semantics, and decision rights in sequence. You preserve business continuity by running old and new in parallel where necessary, comparing outputs, narrowing gaps, and only then shifting traffic, consumers, and trust.

This is architecture in the real enterprise sense: not a diagramming exercise, but a disciplined way to let the business keep breathing while its spine is rebuilt.

Context

Most enterprises have inherited a layered estate of systems that grew by era rather than design. The transactional core lives in one set of applications. Analytical reporting grew in another. Then came a data warehouse, then a Hadoop detour, then cloud storage, then a stream platform, then machine learning demands, then regulatory pressure for lineage and retention.

The result is usually familiar:

  • operational systems optimized for transactions
  • batch ETL pipelines feeding a central warehouse
  • a growing Kafka estate capturing events from some systems but not others
  • domain teams wanting self-serve data products
  • reporting consumers dependent on old semantic models
  • duplicated logic spread across SQL jobs, APIs, notebooks, and dashboards

At first glance, this looks like a technical integration problem. It is not. It is a semantic fragmentation problem wearing a technical costume.

Sales thinks a customer is an account.

Support thinks a customer is a contract holder.

Billing thinks a customer is a legal entity.

Marketing thinks a customer is an addressable profile.

All are rational inside their own bounded context. Trouble begins when the data platform pretends there is one universal “customer” table and calls that consistency. It is not consistency. It is ambiguity concentrated into a shared asset.

Domain-driven design matters here because data platforms are where bad domain modeling goes to become institutional. If you flatten bounded contexts too early, your migration will not just move data badly; it will preserve confusion more efficiently.

Problem

The typical migration pitch sounds clean:

  1. build the new cloud-native platform
  2. ingest all source systems
  3. recreate existing reports
  4. migrate consumers
  5. decommission legacy

In practice, this sequence collapses under its own simplifications.

First, “recreate existing reports” assumes the old reports actually express stable business rules. Many do not. They express layered compensations for source defects, delayed updates, manual enrichments, and undocumented exceptions. That brittle pile of logic is often the true system of record for decisions.

Second, “ingest all source systems” treats data movement as the core problem. It rarely is. The hard part is determining where meaning is created, where it is transformed, and who gets to declare a value authoritative.

Third, “migrate consumers” ignores that many consumers are not applications but operating habits. A finance analyst trusting one monthly close report is a stronger dependency than a hundred API clients.

Finally, a big-bang replacement creates the worst possible risk profile. You switch technical foundations, semantics, controls, and operational responsibilities at once. When the numbers diverge, there is no stable baseline left.

The real problem is this: how do you evolve a platform’s technical architecture without breaking the organization’s ability to agree on facts?

That question forces a better approach.

Forces

Several forces shape any serious data platform migration.

1. Domain semantics are unevenly distributed

Not every data set carries equal business weight. Product clickstream can tolerate some drift during migration. Revenue recognition cannot. Master data, regulatory reporting, and financial aggregates have a different failure cost from exploratory analytics.

This means migration order should follow semantic criticality, not technical convenience.

2. Legacy platforms embody hidden policy

A twenty-year-old warehouse is full of unpleasant surprises, but it also contains decisions the business has already normalized around. Legacy logic often includes:

  • survivorship rules
  • late-arriving fact handling
  • slowly changing dimension interpretations
  • exclusion logic for bad source records
  • reconciliation thresholds
  • manual overrides

If you ignore these, you are not modernizing; you are deleting institutional memory.

3. Event streams help, but they do not solve semantics

Kafka is enormously useful in migration. It decouples producers and consumers, supports dual-running, and makes change propagation faster and more observable. But Kafka is a transport and log abstraction, not a semantic model. Publishing more events does not magically align bounded contexts.

A badly named event is just confusion with lower latency.

4. Consumers have different tolerance for change

Operational APIs may need continuity at the interface level. BI tools may tolerate back-end refactoring if schemas remain stable. Data science teams might welcome richer raw access but still require historical consistency. Regulators want lineage, evidence, and controls, not innovation theater.

5. The migration itself creates new complexity

Running old and new in parallel costs money and attention. Reconciliation pipelines are real systems. Canonical models can become semantic graveyards. Temporary adapters often outlive their intended lifespan. A strangler topology is safer than a big bang, but it is not free.

Architecture is choosing which pain to take in what order.

Solution

The right move is a progressive strangler migration topology for the data platform.

The word “strangler” can sound brutal, but the pattern is humane in practice. Instead of replacing the old platform wholesale, you put a new topology around it. New capabilities are built in the target platform. Existing capabilities are migrated one bounded domain at a time. Data products, APIs, reports, and event consumers are redirected gradually. Legacy components shrink until what remains is obvious, low-value, or impossible to move economically.

This is not just application strangling applied mechanically to data. A data platform strangler has to handle three extra concerns:

  • semantic partitioning: deciding what business meaning moves as a coherent unit
  • reconciliation: proving that old and new produce acceptably equivalent outcomes
  • authority transfer: explicitly shifting system-of-record responsibility

That middle concern is the one most programs underestimate. In real enterprises, migration trust is won through reconciliation. Not through architecture slides. Not through cloud adoption metrics. Through disciplined side-by-side comparison of metrics, entities, and decisions.

A good migration topology usually includes these elements:

  • source-aligned ingestion from operational systems
  • an event backbone where useful, often Kafka
  • domain-oriented data products or bounded-context stores
  • a semantic serving layer for reports, APIs, and analytics
  • reconciliation pipelines comparing legacy and target outputs
  • routing or consumption controls that allow selective cutover
  • lineage and observability across both worlds

The design principle is straightforward: move closest to domain boundaries, not around technical layers. If you migrate by infrastructure tier alone — first storage, then processing, then serving — you often preserve the old semantic mess in a shinier stack. If you migrate by domain capability, each cutover can be tested against business outcomes.
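To make selective cutover concrete, here is a minimal Python sketch of per-slice routing. The `CutoverRouter` class, the slice names, and the legacy/target labels are all hypothetical, not a real library. The point it illustrates is that authority flips per bounded-context slice, never platform-wide.

```python
# Hypothetical sketch: per-domain cutover routing. Slice names and the
# "legacy"/"target" labels are illustrative assumptions.

LEGACY, TARGET = "legacy", "target"

class CutoverRouter:
    """Routes each bounded-context slice to whichever platform is
    currently authoritative for it. Cutover is a per-slice decision,
    not a platform-wide switch."""

    def __init__(self):
        self._authority = {}  # slice name -> LEGACY or TARGET

    def register(self, slice_name, platform=LEGACY):
        # New slices start on the legacy platform until proven otherwise.
        self._authority[slice_name] = platform

    def cut_over(self, slice_name):
        # Only flip a slice that was explicitly registered; an unknown
        # slice is a modeling error, not a default.
        if slice_name not in self._authority:
            raise KeyError(f"unknown slice: {slice_name}")
        self._authority[slice_name] = TARGET

    def resolve(self, slice_name):
        return self._authority[slice_name]

router = CutoverRouter()
router.register("claims-visibility")
router.register("finance-close")
router.cut_over("claims-visibility")

assert router.resolve("claims-visibility") == TARGET
assert router.resolve("finance-close") == LEGACY  # untouched slices stay put
```

Routing by slice rather than by layer is what lets each cutover be tested against business outcomes in isolation.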

Architecture

At a high level, the strangler migration topology introduces a parallel path from sources to consumers while preserving the legacy path until trust is established.

[Diagram: strangler migration topology, showing a parallel path from sources through domain data products to consumers alongside the preserved legacy path]

This diagram is deceptively simple. The hard part lies in the boxes we are tempted to label “domain data products” and move on.

In a serious enterprise platform, domain data products should reflect bounded contexts, not an enterprise-wide canonical fantasy. That means the customer domain in billing can coexist with the customer domain in service, provided the platform makes those distinctions explicit and governs downstream composition carefully.

A useful architecture separates concerns into a few layers.

Source ingestion and event capture

Use CDC, integration feeds, or application events to capture operational change. Kafka is often the right tool when multiple downstream consumers need near-real-time propagation or replay. But use it with discipline. Distinguish between:

  • raw source change events
  • business domain events
  • platform control events

A row change in a source table is not automatically a business event. If you publish database mutations and call them domain truth, you are just moving coupling into the stream.
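The distinction between a row change and a business event can be sketched in a few lines. Everything here is illustrative: the `to_domain_event` function, the field names, and the `ClaimStatusChanged` event type are assumptions, not a real schema. The translation drops purely technical mutations and publishes only named, versioned domain events.

```python
# Hypothetical sketch: translating a raw CDC record into a versioned
# domain event before publication. Field and event names are assumed.

def to_domain_event(cdc_record):
    """Map a source-table mutation to a business event, or drop it.
    Not every row change carries business meaning."""
    if cdc_record["table"] != "claim" or cdc_record["op"] != "UPDATE":
        return None
    before, after = cdc_record["before"], cdc_record["after"]
    if before["status"] == after["status"]:
        return None  # a technical update, not a status transition
    return {
        "type": "ClaimStatusChanged",
        "version": 1,                        # explicit contract version
        "claim_id": after["claim_id"],
        "from_status": before["status"],
        "to_status": after["status"],
        "occurred_at": after["updated_at"],  # event time, not capture time
    }

cdc = {
    "table": "claim",
    "op": "UPDATE",
    "before": {"claim_id": "C-17", "status": "OPEN",
               "updated_at": "2024-03-01"},
    "after": {"claim_id": "C-17", "status": "SUSPENDED",
              "updated_at": "2024-03-02"},
}
event = to_domain_event(cdc)
assert event["type"] == "ClaimStatusChanged"
assert event["to_status"] == "SUSPENDED"
```

The filtering step is the whole point: publishing every mutation verbatim would move source coupling into the stream.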

Domain processing and ownership

Each migration slice should have clear ownership. A domain team or platform-aligned product team must define:

  • core entities and their meaning
  • quality rules
  • temporal handling
  • reference data dependencies
  • published contracts

This is where domain-driven design earns its keep. Bounded contexts let you preserve local truth without forcing premature unification. Context mapping then becomes a practical migration tool: where does translation occur, who owns it, and which downstream consumers need which view?

Semantic serving

Consumers should not all bind directly to raw tables or event topics. Expose stable serving interfaces:

  • curated marts
  • data product APIs
  • versioned event topics
  • semantic models for BI

This allows the internals of the migration to change while consumer-facing contracts stay controlled.
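A minimal Python sketch of that idea, with hypothetical field names: consumers bind to a published contract, and the serving layer projects whichever backing representation is current onto it, so legacy and target internals can differ without consumers noticing.

```python
# Hypothetical sketch: a stable serving contract over changing internals.
# The contract tuple and column names are illustrative assumptions.

CONTRACT_V1 = ("customer_id", "status", "active_since")

def serve_customer(row, contract=CONTRACT_V1):
    """Project any backing representation onto the published contract.
    Consumers bind to the contract, never to internal columns."""
    missing = [f for f in contract if f not in row]
    if missing:
        raise ValueError(f"backing store violates contract: {missing}")
    return {f: row[f] for f in contract}

# Legacy mart row, with internal ETL columns:
legacy_row = {"customer_id": "A1", "status": "ACTIVE",
              "active_since": "2019-06-01", "etl_batch_id": 9912}
# Target data-product row, differently shaped internally:
target_row = {"customer_id": "A1", "status": "ACTIVE",
              "active_since": "2019-06-01", "lineage_ref": "dp://customer/v3"}

# Both internals serve the identical consumer-facing shape.
assert serve_customer(legacy_row) == serve_customer(target_row)
```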

Reconciliation and observability

A migration without reconciliation is a leap of faith. A good platform includes dedicated comparison flows:

  • entity-level diffs
  • aggregate comparisons
  • temporal variance analysis
  • late-arrival impact checks
  • exception queues for triage

You do not reconcile once. You reconcile continuously during coexistence.

[Diagram: reconciliation flow, with legacy and target outputs compared continuously and mismatches routed to an exception queue for triage]

That exception workflow matters. If every mismatch becomes a platform incident, the migration grinds to a halt. Some variances are acceptable, some indicate hidden business logic, and some expose bad source data that the legacy platform has been masking for years.

Architecture must allow disagreement to be investigated rather than merely detected.
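One way to keep mismatches out of the incident queue is to classify variances at comparison time. This is a hypothetical sketch; the `reconcile` function, the tolerance value, and the metric names are illustrative assumptions, not a real reconciliation product.

```python
# Hypothetical sketch: continuous reconciliation that classifies
# variances instead of turning every mismatch into an incident.

def reconcile(metric, legacy_value, target_value, tolerance=0.001):
    """Return a triage outcome: exact match, tolerable drift, or an
    exception-queue entry for investigation."""
    if legacy_value == target_value:
        return {"metric": metric, "status": "match"}
    denom = max(abs(legacy_value), 1e-9)
    variance = abs(legacy_value - target_value) / denom
    if variance <= tolerance:
        return {"metric": metric, "status": "within_tolerance",
                "variance": variance}
    # Routed to the triage queue, not the paging system: the mismatch
    # may be late data, hidden legacy logic, or masked source defects.
    return {"metric": metric, "status": "exception",
            "legacy": legacy_value, "target": target_value,
            "variance": variance}

results = [
    reconcile("open_claims", 10412, 10412),
    reconcile("reserve_total", 8_250_000.0, 8_251_000.0),  # tiny drift
    reconcile("reopened_claims", 312, 287),                # investigate
]
exceptions = [r for r in results if r["status"] == "exception"]
assert len(exceptions) == 1
assert exceptions[0]["metric"] == "reopened_claims"
```

The tolerance is itself a governed decision per metric; a reserve total and a clickstream count do not deserve the same threshold.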

Migration Strategy

A strangler migration succeeds or fails on sequencing.

The wrong sequence is usually technical: move storage, then ETL, then reports, then jobs. The right sequence is semantic and economic: move bounded contexts where business value is clear, dependencies are manageable, and reconciliation can be made credible.

A practical strategy often looks like this.

1. Establish the domain map

Inventory the current platform not by technologies but by business capabilities:

  • customer onboarding
  • order lifecycle
  • billing and invoicing
  • fulfillment
  • claims
  • ledger and close
  • workforce analytics

For each capability, identify:

  • upstream sources
  • downstream consumers
  • official metrics
  • critical semantic disputes
  • latency requirements
  • regulatory implications

This is where you discover that “sales reporting” is not one thing, and “customer master” is not master of much at all.

2. Identify migration slices

Choose slices with a coherent business boundary. A slice should be big enough to carry meaning, small enough to reconcile, and valuable enough to justify effort.

Good early slices are often:

  • operationally important but not financially existential
  • fed by a limited set of sources
  • consumed by a manageable group
  • rich enough to exercise the target architecture

Bad early slices are cross-enterprise golden records with dozens of unresolved semantic conflicts. Those become sinkholes.

3. Build dual paths

For each slice, run both legacy and target paths. This can involve:

  • duplicate ingestion
  • event publication and replay
  • parallel transformations
  • mirrored semantic models
  • side-by-side report production

The temptation will be to rush cutover once the new path “looks right.” Resist it. Side-by-side operation is where hidden assumptions surface.

4. Reconcile by decision, not just by data

Comparing row counts is necessary and almost useless on its own. Reconcile at the level the business cares about:

  • invoice totals
  • active customer counts
  • churn metrics
  • claims reserves
  • stock valuation
  • regulatory submissions

If the old and new platforms disagree, ask which business decision would change. That frames priority correctly.
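That framing can be made executable. In this hypothetical sketch, `churn_action` stands in for whatever rule actually consumes the metric; reconciliation then asks whether the decision, not the number, differs between platforms. The threshold and figures are illustrative.

```python
# Hypothetical sketch: decision-level reconciliation. The decision rule
# and the 5% threshold are illustrative assumptions.

def churn_action(churn_rate):
    """The business rule that actually consumes the metric."""
    return "escalate_retention" if churn_rate > 0.05 else "steady_state"

def decision_equivalent(legacy_metric, target_metric, decision_rule):
    # Two platforms agree, for migration purposes, when they would
    # drive the same downstream decision.
    return decision_rule(legacy_metric) == decision_rule(target_metric)

# The raw numbers differ, but both fall on the same side of the rule:
assert decision_equivalent(0.031, 0.034, churn_action)
# Here the numbers are closer, yet the decision flips:
assert not decision_equivalent(0.049, 0.052, churn_action)
```

Notice the second case: a smaller numeric gap is the more urgent one, which is exactly why row-level matching alone misprioritizes investigation effort.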

5. Transfer authority explicitly

A migration is complete only when the organization agrees the new platform is authoritative for a given semantic area. That needs a visible decision:

  • official source designation
  • consumer redirection
  • deprecation timeline
  • support model update
  • controls and lineage signoff

Until authority transfers, you have a pilot, not a migration.
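Making the transfer visible can be as simple as a recorded decision. The `AuthorityRecord` structure and its fields below are illustrative assumptions, not a real governance tool; the point is that system-of-record status is an explicit, signed-off artifact rather than an emergent fact of consumption.

```python
# Hypothetical sketch: an explicit authority record per semantic area.

from dataclasses import dataclass

@dataclass
class AuthorityRecord:
    semantic_area: str
    system_of_record: str       # e.g. "legacy-edw" or "target-platform"
    signed_off_by: list         # who approved the transfer
    deprecation_deadline: str = ""  # timeline for the losing platform

registry = {}

def transfer_authority(area, new_sor, signoffs, deadline=""):
    # No signoff, no transfer: authority moves by visible decision,
    # not by gradual drift of consumption.
    if not signoffs:
        raise ValueError("authority transfer requires explicit signoff")
    registry[area] = AuthorityRecord(area, new_sor, list(signoffs), deadline)

transfer_authority("claims-operational-monitoring", "target-platform",
                   ["claims-ops-lead", "data-governance"],
                   deadline="2025-Q2")

assert registry["claims-operational-monitoring"].system_of_record \
    == "target-platform"
```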

6. Decommission aggressively, but not recklessly

Old assets should not be left undead. Once a slice is cut over and stable:

  • retire redundant jobs
  • shut down duplicate feeds
  • archive historical artifacts
  • remove hidden backdoor dependencies
  • update governance catalogues

Every legacy component left behind becomes a future ambiguity.

Here is the progression in topology form.

[Diagram: migration progression, with the legacy platform shrinking as successive domain slices cut over to the target]

Enterprise Example

Consider a global insurer migrating from a legacy enterprise data warehouse to a cloud-native platform with Kafka, object storage, domain-aligned processing, and self-service analytics.

On paper, the goal was modernization. In reality, the pressure came from three directions:

  • the warehouse could not support near-real-time claims monitoring
  • data science teams were bypassing governance with ad hoc extracts
  • finance and risk reporting logic had become unmaintainable

The first instinct from technology leadership was to migrate the warehouse wholesale. In six months they had a new storage stack, a new orchestration tool, and hundreds of replicated tables. They also had almost no trust. Claims counts did not line up. Policy status differed by region. Actuarial models produced inconsistent cohorts because the historical effective-dating rules had been interpreted differently.

The program reset. That was the wise move.

Instead of “migrating the warehouse,” they mapped the business into bounded contexts: policy, claims, billing, broker, finance, and risk. They chose claims intake and operational claims visibility as the first migration slice. Why? Because it had obvious business value, manageable downstream consumers, and did not require immediate replacement of financial close processes.

Kafka was introduced as an event backbone for claims lifecycle events, but with discipline. Raw policy admin CDC remained separate from curated claims domain events. The team learned quickly that if they exposed source mutation streams directly to downstream consumers, every source schema quirk became a platform contract. They corrected course by publishing versioned domain events with explicit semantics.

The strangler topology worked like this:

  • legacy operational systems continued feeding the warehouse
  • CDC and application events fed Kafka
  • the new claims domain pipeline built curated data products
  • operational dashboards and a fraud detection service consumed the new products
  • legacy claims reporting remained intact during coexistence
  • reconciliation compared claim counts, reserve movements, and status transitions daily

What mattered was not that every row matched instantly. What mattered was understanding why differences existed. Some variances came from late-arriving regional feeds. Some came from the legacy warehouse excluding claims in specific suspended states. One discrepancy exposed a decades-old workaround where reopened claims were counted differently in one geography because of regulatory reporting needs.

That is a beautiful example of why migration is a semantic exercise. The old platform was not merely old. It was full of business history.

After three months of reconciliation, the new claims visibility platform became authoritative for operational monitoring and fraud analytics. The warehouse remained authoritative for statutory finance. That split was intentional. The program did not force artificial uniformity. It moved authority where it was earned.

Later phases migrated billing and broker domains. Finance came much later, with stronger controls, a longer dual-run, and explicit signoff from controllers. The final architecture was not a monolith warehouse replacement. It was a federated but governed platform with domain data products, shared lineage, Kafka-backed propagation where needed, and stable semantic serving for enterprise reporting.

That is what mature modernization looks like. Less revolution. More surgery with good instruments.

Operational Considerations

Architecture diagrams are clean because they omit the operators’ pain. In migration, operations decides whether the design survives contact with the enterprise.

Data quality and observability

You need observability across both legacy and target paths:

  • freshness and lag
  • completeness
  • schema drift
  • null spikes
  • duplicate rates
  • reconciliation variance

This should be visible by domain, not just by pipeline. A claims operations manager does not care that topic throughput is healthy if reserve movement aggregates are delayed.

Temporal consistency

Many migration bugs are temporal, not structural. Late-arriving data, out-of-order events, backdated corrections, and changing reference data can make both old and new outputs “correct” from different time perspectives.

If you do not model event time, processing time, and effective time explicitly, your reconciliation will become an argument factory.
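A sketch of what explicit temporal modeling buys you, using hypothetical claim data: each fact carries both an effective date and a recorded date, so an "as of" query can explain why two platforms legitimately disagreed at a point in time. Field names and figures are illustrative.

```python
# Hypothetical sketch: bitemporal facts with effective and recorded
# dates, so reconciliation can replay what was known when.

from datetime import date

facts = [
    # (claim_id, reserve, effective_date, recorded_date)
    ("C-17", 1000, date(2024, 1, 10), date(2024, 1, 10)),
    # A backdated correction: effective in January, learned in March.
    ("C-17", 1500, date(2024, 1, 10), date(2024, 3, 5)),
]

def reserve_as_of(claim_id, effective, known_at):
    """What did we believe the reserve effective on `effective` was,
    given only facts recorded by `known_at`?"""
    visible = [f for f in facts
               if f[0] == claim_id and f[2] <= effective and f[3] <= known_at]
    if not visible:
        return None
    # Among visible facts, the latest recorded one wins.
    return max(visible, key=lambda f: f[3])[1]

# In February, both platforms legitimately report 1000; by April, 1500.
assert reserve_as_of("C-17", date(2024, 1, 31), date(2024, 2, 1)) == 1000
assert reserve_as_of("C-17", date(2024, 1, 31), date(2024, 4, 1)) == 1500
```

A legacy batch run and a streaming target that snapshot at different `known_at` points will disagree while both being internally correct; the bitemporal view turns that argument into a query.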

Metadata and lineage

Lineage is often treated as governance decoration. During strangler migration it is operationally essential. When numbers diverge, lineage tells you whether the issue came from source capture, transformation logic, semantic mapping, or serving-layer aggregation.

Security and access control

Parallel platforms multiply exposure. Temporary replication paths and dual storage locations can quietly break least privilege. Migrations often create more sensitive copies than steady state. Design for policy consistency from the start.

Cost management

Running two platforms in parallel is expensive. So is endless migration. Set coexistence windows deliberately. Not every low-value report deserves dual-running for six months.

Tradeoffs

Strangler migration is a strong pattern, but not a free lunch.

The obvious benefit is reduced transition risk. You preserve business continuity, gain learning through coexistence, and let authority move in stages. You also create room for domain-driven redesign instead of forklift replication.

The cost is complexity. For a while, you run more things:

  • two paths
  • more metadata
  • more controls
  • more monitoring
  • more decisions about which truth applies where

There is also an organizational tradeoff. Incremental migration demands patience and disciplined governance. Leaders who want one dramatic launch event often find this unsatisfying. But architecture is not stagecraft.

Another tradeoff concerns standardization. Domain-driven slices preserve semantic integrity, but they can frustrate central data teams that want one canonical enterprise model. The answer is not to abandon enterprise consistency. It is to achieve consistency through explicit context mapping, shared standards, and composed views rather than premature semantic flattening.

In short: strangler migration buys safety and learning by spending time and coordination.

That is usually a good deal.

Failure Modes

There are a few repeat offenders.

Canonical model obsession

Teams create a giant enterprise-wide schema intended to unify every source and use case. It becomes abstract, bloated, and detached from actual bounded contexts. Migration stalls because every domain disagreement must be resolved centrally before anything can move.

Transport mistaken for architecture

Kafka gets installed and suddenly every problem is described as an eventing problem. But the streams simply mirror source system confusion at high speed. Without domain contracts, topic governance, and versioning, the stream platform becomes a distributed legacy warehouse.

Reconciliation theater

The program says reconciliation exists, but it really means a few aggregate comparisons in a spreadsheet before go-live. Then downstream users discover real semantic differences after cutover. Trust collapses fast and recovers slowly.

No authority model

Both old and new platforms continue serving the same business question with no explicit designation of record. Teams cherry-pick whichever answer suits them. You have not modernized. You have created competitive truth.

Permanent coexistence

Temporary adapters, duplicate marts, and side-by-side reports linger because decommissioning is politically hard. The organization carries migration cost forever and never receives simplification benefits.

When Not To Use

Strangler migration is powerful, but there are cases where it is the wrong tool.

Do not use it when the platform is tiny, low criticality, and has limited semantic complexity. A small internal analytics stack with a few pipelines may be cheaper to replace directly.

Do not use it when the legacy estate is so broken that parallel comparison is impossible or misleading. If source capture is unreliable and old outputs are untrusted everywhere, reconciliation against the old world may not buy much. In that case, you may need a source-first rebuild with targeted validation against business processes rather than legacy outputs.

Do not use it when regulatory or contractual constraints prevent prolonged dual-running of sensitive data in separate environments unless those controls are designed in from day one.

And do not use it as an excuse to avoid making hard domain decisions. Incremental does not mean indefinite ambiguity.

Related Patterns

Several related patterns usually appear alongside a strangler migration topology.

  • Anti-corruption layer: useful when legacy semantic models must be translated before entering the new platform.
  • Change data capture: effective for source-aligned ingestion, especially during coexistence.
  • Event-carried state transfer: useful for distributing derived state, but dangerous if used without ownership clarity.
  • Data mesh-inspired domain products: valuable when domain ownership is real, not just relabeled central control.
  • CQRS-style serving separation: helpful when operational APIs and analytical consumers need different representations.
  • Materialized views and semantic marts: still necessary; self-serve does not remove the need for curated enterprise consumption.

These patterns are not a menu of trendy ideas. They are tools. Use them in service of domain clarity and migration safety.

Summary

Data platform evolution is incremental because enterprise truth is incremental.

That is the heart of it.

You are not replacing a database with a better database. You are moving an organization’s working semantics from one topology to another while the business continues to trade, bill, comply, and decide. A strangler migration topology works because it accepts this reality. It grows the new around the old, slices the move by bounded context, reconciles before replacing, and transfers authority deliberately.

Kafka can help. Microservices can help. Cloud-native storage and processing can help. But none of them rescue weak semantics. Domain-driven design does. Clear authority models do. Reconciliation does. Honest tradeoff decisions do.

The memorable line, if there must be one, is this: in data platform migration, the first system you must preserve is trust.

Everything else is implementation.

Frequently Asked Questions

What is a data mesh?

A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.

What is a data product in architecture terms?

A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.

How does data mesh relate to enterprise architecture?

Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.