Progressive Data Migration in Microservices

Big-bang migrations are the architectural equivalent of replacing an aircraft engine in mid-flight by first turning both engines off. It is dramatic. It is brave. It is usually a terrible idea.

Most enterprise data migrations fail for the same reason enterprise transformations fail: they are framed as technical moves when they are really changes in business meaning under operational load. The database is never just a database. It is accumulated policy, history, workaround, habit, and power structure. When organizations decompose a monolith into microservices, they often talk about APIs, containers, Kafka topics, and deployment pipelines. Then the hard truth arrives: the real gravity sits in the data. Not tables. Meaning.

That is why progressive data migration matters.

A progressive migration does not ask an enterprise to leap from one world into another. It changes the world in stages. It creates room for validation, reconciliation, rollback, and learning. It acknowledges that source and target systems will coexist longer than anyone wants. It accepts duplication as a temporary cost of control. And, done well, it turns migration from a cliff into a sequence of deliberate footholds.

This article lays out how to think about progressive data migration in a microservices landscape, especially where event streaming with Kafka is involved. I’ll take a strongly opinionated position: if your migration plan begins with “copy the data and switch over on a weekend,” you are probably ignoring domain boundaries, operational realities, and failure modes that will later become executive escalations.

The better path is usually a strangler-style migration for data as much as for functionality. Not glamorous. Not quick. But survivable.

Context

Microservices promise independently deployable services aligned to business capabilities. In practice, many organizations start from a large shared database, then discover that code decomposition is easy compared to data decomposition. A service can own an API in a sprint. Owning the truth for customer credit exposure, order commitments, policy endorsements, or inventory reservations is another matter entirely.

The legacy platform usually encodes years of cross-domain coupling. Customer data is mixed with billing assumptions. Order records contain fulfillment semantics. Product tables carry pricing exceptions and regional rules. Reporting jobs depend on transaction timing. Batch integrations assume overnight completeness. Compliance controls are buried in triggers and stored procedures. Nobody intended this architecture, but enterprises are excellent sedimentary systems.

So the migration challenge is not just “move rows from database A to database B.” It is “reallocate responsibility for business facts without breaking the surrounding ecosystem.”

That distinction changes everything.

A progressive migration approach is most relevant when:

  • a monolith is being decomposed into microservices
  • domains are being separated along bounded contexts
  • zero or near-zero downtime matters
  • both old and new systems must run in parallel for some time
  • regulatory, financial, or customer-impacting data cannot tolerate silent divergence
  • event streaming or asynchronous integration is part of the target architecture

In these settings, migration becomes a long-running architectural capability, not a one-off project task.

Problem

The core problem is deceptively simple: how do you move from a legacy data model to service-owned data stores without losing domain integrity or operational control?

The naive answer is to extract data per service, sync changes during transition, and cut over. The enterprise answer is harsher. Data is not neutral. The same field often means different things in different contexts. “Customer status” in CRM may indicate marketing eligibility; in billing it may indicate payment standing; in fraud it may indicate account restrictions. Migration exposes these semantic fractures immediately.

There are several intertwined problems:

  1. Data ownership is unclear. Shared databases blur responsibility; multiple teams update the same entities for different reasons.
  2. The source model is not aligned to domains. Legacy schemas optimize for application convenience, not bounded contexts.
  3. Dual writes are dangerous. Updating old and new systems in one business flow introduces race conditions and inconsistency.
  4. Historical data contains exceptions. Enterprises are full of “special cases” that are actually normal business.
  5. Consumers depend on side effects, not contracts. Reports, downstream jobs, and partner feeds often rely on incidental behavior from the old platform.
  6. Cutover risk is nonlinear. The last 10% of migration complexity contains 90% of the operational danger.

This is why progressive migration is not merely an implementation pattern. It is a risk management strategy anchored in domain-driven design.

Forces

Architecture is the art of balancing forces, not choosing ideals. Progressive migration exists because several forces pull in opposite directions.

Domain autonomy vs historical entanglement

You want each microservice to own its data and business rules. But the legacy system has years of entanglement. The desired future is clean bounded contexts; the actual present is mutual dependency by trigger, SQL join, and midnight batch.

Delivery speed vs semantic correctness

Leadership wants visible migration progress. Teams want to ship slices of functionality. But moving quickly without clarifying domain semantics creates expensive rework. A service that owns the wrong concept is just a smaller monolith with better marketing.

Availability vs consistency

Customers and operations need continuity. At the same time, progressive migration introduces temporary duplication and eventual consistency. You cannot wish this away. You must decide where lag is acceptable, where strong consistency is mandatory, and what reconciliation means in each case.

Simplicity vs observability

A migration architecture with CDC, Kafka, outbox patterns, reconciliation jobs, snapshots, and cutover controls can look complicated. It is complicated. But pretending it is simple usually means the complexity is hidden in manual operations and untracked divergence.

Reusability vs bounded context discipline

Shared canonical models are seductive. They promise consistency. In practice they often flatten domain differences and create central governance bottlenecks. Progressive migration works better when services publish domain events in their own language and translation happens at context boundaries.

A useful rule: when migration debates become heated, the issue is usually not technology. It is ownership of meaning.

Solution

The solution is a progressive strangler migration for both behavior and data.

You introduce new microservices around clear bounded contexts. Each service takes ownership of a subset of business capability and, over time, the corresponding source of truth. During transition, legacy and new systems coexist. Change propagation happens through controlled patterns such as change data capture, outbox events, Kafka streams, anti-corruption layers, and explicit reconciliation. Ownership shifts in phases, not all at once.

The architecture usually evolves through four broad states:

  1. Observe and mirror. New services consume legacy changes and build read models or shadow stores. They do not yet own writes.
  2. Introduce new behavior. New services handle selected business flows while still depending on legacy truth for some data.
  3. Shift write ownership. Writes for a bounded context move to the service-owned store. Legacy receives updates through events, adapters, or transitional synchronization.
  4. Retire legacy dependency. Legacy no longer acts as authority for that domain slice. Remaining consumers are redirected or replaced.

This is not a linear conveyor belt. Different domains move at different speeds. Customer profile may migrate early; financial ledger should migrate with extreme caution; reporting may remain downstream for years.

The key idea is that migration phases should be defined by business authority, not by infrastructure completion. The decisive question is not “is the new database live?” It is “which system is authoritative for this business fact?”

That is the line that matters.

Architecture

A practical migration architecture has a few recurring building blocks.

1. Bounded contexts and service-owned stores

Start with domain-driven design. Identify bounded contexts where language and business rules are coherent. A customer onboarding context is not the same as customer billing. Order capture is not the same as inventory allocation. The migration unit should usually be a bounded context or a subdomain slice, not a set of tables.

Each target microservice should have:

  • a clearly named domain
  • explicit ownership of business rules
  • its own persistence model
  • an API and/or event contract
  • a strategy for consuming historical and ongoing data during transition

2. Anti-corruption layer

Legacy systems rarely speak in the target domain language. Build translators. This is where old status codes, overloaded fields, and hidden invariants are interpreted into cleaner domain concepts. If you skip this, legacy semantics leak directly into new services and the migration reproduces the disease it was meant to cure.
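A minimal sketch of such a translator, assuming a hypothetical legacy field that overloads several business meanings into one code (the codes and category names here are illustrative, not from any real system):

```python
# Anti-corruption layer sketch: interpret an overloaded legacy status code
# into separate, explicitly named domain facts. All codes are hypothetical.
from dataclasses import dataclass

# Legacy "CUST_STS" packs marketing, billing, and fraud meaning into one field.
LEGACY_STATUS_MAP = {
    "A1": {"marketing_eligible": True,  "payment_standing": "good",    "restricted": False},
    "A2": {"marketing_eligible": True,  "payment_standing": "overdue", "restricted": False},
    "B9": {"marketing_eligible": False, "payment_standing": "good",    "restricted": True},
}

@dataclass
class CustomerStanding:
    marketing_eligible: bool
    payment_standing: str
    restricted: bool

def translate_customer_status(legacy_code: str) -> CustomerStanding:
    """Translate one legacy code into domain concepts.
    Unknown codes are quarantined for review rather than guessed at."""
    try:
        fields = LEGACY_STATUS_MAP[legacy_code]
    except KeyError:
        raise ValueError(f"unmapped legacy status {legacy_code!r}; quarantine for review")
    return CustomerStanding(**fields)
```

The important design choice is the failure path: an unmapped code is an exception to investigate, not a default to paper over.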

3. Event propagation with Kafka

Kafka is useful here because migration is not just request/response integration. It is state propagation over time. Topics let services consume change streams, replay history, and decouple rollout timing. Common patterns include:

  • CDC from legacy database into Kafka
  • outbox events from new microservices into Kafka
  • stream processing for enrichment or transformation
  • event-driven updates to downstream read models
  • audit and replay support during reconciliation

Kafka is not magic. It helps because migrations are temporal. You need durable, ordered-enough records of change. But if your teams cannot operate event-driven systems, introducing Kafka during a migration can multiply confusion.
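The mirroring half of this can be sketched without any Kafka machinery at all. Below, CDC change records are applied to a service-side shadow store; the record shape (operation, key, after-image, source sequence number) is an assumption, and real CDC tools emit richer envelopes:

```python
# Phase 1 mirroring sketch: apply legacy change records to a shadow store,
# tolerating duplicate and out-of-order redelivery. The record shape is an
# assumed simplification of a real CDC envelope.

def apply_change(shadow: dict, record: dict) -> None:
    """Apply one change record; ignore anything already applied."""
    key = record["key"]
    seq = record["seq"]  # monotonically increasing source log position
    current = shadow.get(key)
    if current is not None and current["seq"] >= seq:
        return  # duplicate or stale replay: drop it
    if record["op"] == "delete":
        shadow.pop(key, None)
    else:  # insert or update: keep the after-image plus its position
        shadow[key] = {"seq": seq, "row": record["after"]}

shadow_store: dict = {}
for rec in [
    {"op": "insert", "key": "cust-1", "seq": 1, "after": {"status": "A1"}},
    {"op": "update", "key": "cust-1", "seq": 2, "after": {"status": "A2"}},
    {"op": "update", "key": "cust-1", "seq": 2, "after": {"status": "A2"}},  # duplicate delivery
]:
    apply_change(shadow_store, rec)
```

Tracking the source sequence number per key is what makes replay safe; without it, every redelivery is a potential silent overwrite.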

4. Reconciliation capability

This is where mature migration architecture separates itself from hopeful integration diagrams. During coexistence, discrepancies will happen. Reconciliation is not an afterthought. It is a first-class subsystem with:

  • identity mapping across systems
  • deterministic comparison rules
  • tolerances for acceptable divergence
  • exception workflows
  • replay or repair mechanisms
  • business sign-off on material mismatches

If money, inventory, or regulatory status is involved, reconciliation is the migration.

5. Explicit authority model

At every phase, define:

  • system of record
  • write authority
  • publication responsibility
  • conflict resolution rule
  • consumer routing path

Ambiguity here creates silent corruption.
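One way to make the authority model explicit rather than tribal is to encode it as data that routing code consults. The context names and phase states below are illustrative:

```python
# Explicit authority model sketch: a per-context table stating which system
# holds write authority and where reads route during coexistence.
# Context names and assignments are hypothetical examples.

AUTHORITY = {
    # context:            (write authority, read routing)
    "customer-profile":   ("new-service", "new-service"),
    "policy-endorsement": ("new-service", "legacy"),  # writes shifted, reads not yet
    "billing":            ("legacy",      "legacy"),
}

def write_target(context: str) -> str:
    """Fail loudly on an unmapped context instead of defaulting silently."""
    if context not in AUTHORITY:
        raise KeyError(f"no authority defined for context {context!r}")
    return AUTHORITY[context][0]

def read_target(context: str) -> str:
    if context not in AUTHORITY:
        raise KeyError(f"no authority defined for context {context!r}")
    return AUTHORITY[context][1]
```

A missing entry raises instead of defaulting to legacy, because a silent default is exactly the ambiguity the authority model exists to remove.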

Migration Strategy

The migration should proceed in phases with explicit gates. Here is the shape I recommend.

Phase 1: Mirror and observe

Use CDC or batch export to feed the new service’s shadow store. The service builds read models, validates mappings, and exposes non-authoritative views or internal APIs. This phase is about learning domain truth, not proving technical cleverness.

What to verify:

  • record completeness
  • identity correlation
  • semantic mapping
  • event ordering assumptions
  • acceptable staleness
  • hidden dependencies

This is where the ugly facts appear. A “customer” might be duplicated across geographies. Orders may be mutated after shipment due to returns logic. Nulls turn out to mean three different business states. Good. Better now than during cutover week.

Phase 2: Partial functional extraction

Move a narrow business flow into the new service while legacy remains authoritative for most state. For example, a new Customer Profile service may own preference updates but still source legal identity details from legacy.

This phase should be small and concrete. A common mistake is extracting a generic CRUD service with no business center of gravity. That produces a data wrapper, not a domain service.

Phase 3: Shift write ownership

This is the real migration moment. For selected aggregates, the microservice becomes the write authority. New writes go to the service-owned store. Legacy is updated asynchronously or via a compatibility adapter as required.

Use one of these patterns carefully:

  • New system publishes events; legacy subscribes. Good when legacy can consume downstream updates safely.
  • Facade routes writes to new system and derives legacy projection. Useful when callers cannot be changed immediately.
  • Strangler API layer. A gateway decides whether requests are served by legacy or the new service based on business scope.

Avoid direct dual writes from application code whenever possible. It looks efficient and creates chaos. The outbox pattern is generally safer for publishing durable post-commit events from the new service.
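The essence of the outbox pattern is that the state change and the outgoing event commit in one local transaction, and a separate relay publishes pending rows. A minimal sketch using sqlite3 as a stand-in for the service database (table and event names are illustrative):

```python
# Outbox pattern sketch: aggregate write and event row commit atomically;
# a relay publishes unpublished rows afterwards. Names are illustrative.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE endorsement (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT, published INTEGER DEFAULT 0)")

def approve_endorsement(conn, endorsement_id: str) -> None:
    """Write the aggregate and its event in one transaction; no dual write."""
    with conn:  # both inserts commit together, or neither does
        conn.execute("INSERT INTO endorsement (id, status) VALUES (?, 'APPROVED')", (endorsement_id,))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("endorsement-events", json.dumps({"type": "EndorsementApproved", "id": endorsement_id})),
        )

def relay_outbox(conn, publish) -> int:
    """Publish unpublished outbox rows in insertion order, then mark them."""
    rows = conn.execute("SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id").fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # in production this would be a Kafka producer send
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)

sent = []
approve_endorsement(conn, "end-42")
relay_outbox(conn, lambda topic, payload: sent.append((topic, payload)))
```

Because the relay may crash between publishing and marking a row, consumers must still be idempotent; the outbox guarantees at-least-once publication, not exactly-once.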

Phase 4: Read retirement

As confidence grows, route read consumers away from legacy to service-owned APIs or read models. This phase is often neglected, but it is critical. If reports, operational tools, and downstream systems still read legacy directly, you have not really migrated. You have merely added another source of confusion.

Phase 5: Decommission the legacy domain slice

Retire tables, jobs, interfaces, and user behaviors tied to the old domain slice. Decommissioning is not housekeeping. It is how you remove ambiguity. If both systems remain “sort of active,” the migration never truly ends.

Data flow and reconciliation architecture

A common reference pattern for progressive migration combines CDC from the legacy database into Kafka, service-owned stores fed by those change streams, and a reconciliation subsystem that compares both sides and drives repair.

Reconciliation discussion

Reconciliation deserves blunt language. If two systems hold materially important facts during migration, they will diverge. Not might. Will.

Reasons include:

  • event delivery lag
  • duplicate events
  • mapping bugs
  • out-of-order updates
  • manual corrections in legacy
  • invalid historical records
  • failed replays
  • partial backfills
  • differences in derived logic

So build reconciliation as a product, not a script.

There are two broad reconciliation styles:

Transaction-level reconciliation

Compare each business transaction or aggregate instance between systems. This is appropriate for orders, payments, claims, shipments, and regulated records.

Pros:

  • precise
  • auditable
  • easier to explain to business stakeholders

Cons:

  • expensive at scale
  • requires strong identity mapping
  • can be noisy if semantics are not aligned

State-level reconciliation

Compare aggregate balances, counts, or snapshots over time. This is useful for reporting stores, analytics mirrors, and large-scale synchronization where per-record comparison is too costly.

Pros:

  • scalable
  • good for drift detection

Cons:

  • weaker diagnostic value
  • can hide localized corruption

In practice, enterprises use both.

A mature reconciliation process includes:

  • daily or near-real-time comparison
  • thresholds for warning vs incident
  • quarantine for unresolved mismatches
  • replay capability from Kafka
  • business-owner review for material discrepancies
  • evidence retained for audit

If the phrase “we’ll just spot check a few records” appears in the plan, the plan is unserious.
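A transaction-level comparator can be sketched in a few lines. The field names and the monetary tolerance below are assumptions for illustration:

```python
# Transaction-level reconciliation sketch: compare one aggregate instance
# across systems field by field, with an explicit tolerance for derived
# monetary values. Field names and tolerance are illustrative.
from decimal import Decimal

MONEY_TOLERANCE = Decimal("0.01")  # acceptable rounding divergence

def reconcile(legacy: dict, target: dict) -> list:
    """Return mismatch descriptions; an empty list means agreement."""
    mismatches = []
    for field in ("status", "effective_date"):
        if legacy.get(field) != target.get(field):
            mismatches.append(f"{field}: legacy={legacy.get(field)!r} target={target.get(field)!r}")
    diff = abs(Decimal(legacy["premium"]) - Decimal(target["premium"]))
    if diff > MONEY_TOLERANCE:
        mismatches.append(f"premium diverges by {diff}")
    return mismatches
```

The tolerance is a business decision, not an engineering one: it encodes which divergence is rounding noise and which is an incident, and it should be signed off by the owning domain.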

Enterprise Example

Consider a large insurer migrating from a policy administration monolith to domain-oriented microservices. The monolith manages policy quotes, underwriting decisions, issued policies, endorsements, billing references, and claims linkage in one Oracle schema. Everyone calls it “the policy system,” but that phrase hides several bounded contexts.

The insurer decides to extract a Policy Endorsement service first. Why not customer or billing? Because endorsements are a contained but high-value capability: changing coverage, named drivers, insured assets, or address details after policy issuance. The business wants faster release cycles for endorsement rules without destabilizing the whole platform.

Domain semantics matter

In the legacy schema, an endorsement is not a clean concept. It is represented by policy version rows, transaction codes, effective dates, and a tangle of approval flags. Some records represent genuine customer-requested changes. Others are back-office corrections. Others are billing-only technical adjustments.

If the team simply migrates tables, it will create nonsense in the new service. So they define the domain language first:

  • Policy: contractual coverage aggregate
  • Endorsement: a business-approved policy change with effective date and business reason
  • Technical correction: non-customer-facing amendment for internal data quality
  • Premium impact: financial effect derived from endorsement rules, not stored as arbitrary field delta

That is domain-driven design doing real work. It prevents the new service from inheriting every historical accident.

Migration approach

Phase 1 uses CDC from Oracle into Kafka. An anti-corruption layer interprets legacy transaction codes into endorsement events and populates a shadow store for the Endorsement service. For six weeks, the service is non-authoritative but exposes an internal dashboard to underwriters. This reveals several semantic issues: some “endorsements” are retroactively inserted; effective dates can precede policy issuance in certain correction scenarios; and one region uses free-text reasons that map poorly to global categories.

Phase 2 introduces a narrow new capability: digital address change endorsements for personal auto policies in one country. Customer requests go through a new microservice, which validates eligibility, creates the endorsement aggregate, and emits EndorsementRequested and EndorsementApproved events to Kafka. Legacy still receives a projected update because downstream billing and document generation remain there.

Phase 3 shifts write authority for selected endorsement types entirely to the new service. Legacy stops authoring these transactions. Instead, it consumes canonicalized updates for compatibility. Reconciliation compares policy versions, premium impacts, and document references daily.

Phase 4 migrates underwriting workbench reads and customer service screens away from direct legacy access to service APIs and materialized read models.

Phase 5 retires endorsement-related stored procedures and nightly extracts for the migrated product lines.

What went wrong

Several things, because this is enterprise architecture, not brochureware.

  • A legacy batch job continued to “correct” address-related policy records nightly, causing drift.
  • A Kafka consumer lag incident delayed updates to billing projections for three hours.
  • Historical policies with missing source identifiers could not be reliably correlated.
  • Business teams disagreed on whether a certain correction should count as an endorsement.

None of these were solved by better YAML. They were solved by explicit authority rules, reconciliation workflows, and domain governance with underwriting and operations.

The lesson is plain: migration succeeds when technology serves clarified business semantics. Otherwise, Kafka just distributes your confusion faster.

Operational Considerations

Progressive migration lives or dies in operations.

Observability

You need more than service metrics. Track migration health explicitly:

  • event lag by topic and consumer group
  • backfill progress
  • reconciliation mismatch rates
  • write routing volumes by system
  • API read distribution old vs new
  • data freshness by bounded context
  • cutover flag usage
  • replay counts and dead-letter trends

A migration without a dashboard is rumor-driven architecture.

Backfill strategy

Historical backfill is often more dangerous than real-time sync. It can overload downstream consumers, violate ordering assumptions, and create duplicate side effects.

Use:

  • idempotent consumers
  • partition-aware backfills
  • replay windows
  • write suppression for historical events where appropriate
  • clear distinction between historical load and live traffic
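Two of these ideas, idempotent consumption and write suppression for historical events, combine naturally in the consumer itself. A sketch, assuming each event carries a unique ID and a flag marking it as backfill (both are assumptions about the event contract):

```python
# Idempotent consumer sketch: every event is applied at most once, and
# side effects (notifications, invoices, alerts) are suppressed for
# historical backfill events. The event shape is an assumed contract.

def make_consumer(store: dict, seen: set, notifications: list):
    def handle(event: dict) -> None:
        if event["event_id"] in seen:
            return  # duplicate delivery: apply-at-most-once
        seen.add(event["event_id"])
        store[event["key"]] = event["state"]
        if not event.get("historical", False):
            notifications.append(event["key"])  # side effects for live events only
    return handle

store, seen, notifications = {}, set(), []
handle = make_consumer(store, seen, notifications)
handle({"event_id": "e1", "key": "pol-1", "state": "v1", "historical": True})
handle({"event_id": "e1", "key": "pol-1", "state": "v1", "historical": True})  # redelivered
handle({"event_id": "e2", "key": "pol-1", "state": "v2"})  # live update
```

In a real system the `seen` set would be durable (a keyed table or compacted topic), since an in-memory set resets on restart.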

Identity resolution

Almost every migration underestimates identifier complexity. Legacy keys may be reused, overloaded, or local to a region or product line. Build a mapping strategy early:

  • surrogate migration IDs
  • cross-reference tables
  • immutable business identifiers where possible
  • exception handling for ambiguous matches
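A cross-reference table with surrogate migration IDs can be sketched as follows; the key formats are hypothetical, and the important behavior is that conflicting claims go to an exception queue rather than being resolved by guesswork:

```python
# Identity resolution sketch: map legacy keys from multiple systems onto
# stable surrogate migration IDs, queueing conflicts for manual review.
# System names and key formats are illustrative.
import uuid

xref: dict = {}        # (system, legacy_key) -> stable migration ID
exceptions: list = []  # conflicting correlations awaiting manual review

def register(system: str, legacy_key: str, migration_id=None) -> str:
    """Map a legacy key to a surrogate migration ID, minting one if needed."""
    pair = (system, legacy_key)
    existing = xref.get(pair)
    if existing is not None:
        if migration_id is not None and migration_id != existing:
            # The same legacy key is claimed by two identities: do not guess.
            exceptions.append((pair, existing, migration_id))
        return existing
    xref[pair] = migration_id or f"mig-{uuid.uuid4().hex[:12]}"
    return xref[pair]

# Correlate a CRM record and a billing record to one business identity.
mid = register("crm", "C-1001")
register("billing", "B-778899", mid)    # same customer, different legacy key
register("crm", "C-1001", "mig-other")  # conflicting claim -> exception queue
```

The surrogate ID is immutable by construction; re-registering a known key always returns the original mapping, which is what downstream reconciliation depends on.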

Security and compliance

During coexistence, data can exist in more places than before. That increases exposure. Ensure:

  • consistent access controls across stores
  • masking in lower environments
  • lineage for regulated fields
  • retention and deletion policies spanning old and new systems
  • audit logs for repair and replay operations

Rollback and roll-forward

A mature migration plan defines both. Rollback is useful when write ownership has just shifted and confidence is low. But after some point, rollback is fantasy because too much new state has accumulated. Then you need roll-forward repair. Know the boundary.

Tradeoffs

Progressive migration is not free. It buys safety by adding temporary complexity.

Benefits

  • reduced cutover risk
  • incremental business value
  • better semantic understanding
  • ability to validate with production-like behavior
  • controlled retirement of legacy dependencies

Costs

  • coexistence overhead
  • duplicate data stores
  • reconciliation engineering
  • temporary latency and eventual consistency
  • prolonged need for dual operational knowledge

There is a deep tradeoff here: you move risk from a single dramatic event into a long period of managed complexity. For most enterprises, that is the right trade. But let’s not pretend it is elegant. During migration, the architecture is intentionally untidy. The point is not purity. The point is survivability.

Another important tradeoff concerns Kafka. Event streaming is excellent for decoupling and replay, but it adds operational burden and forces teams to think in asynchronous failure modes. If your organization has weak event governance, topic sprawl and inconsistent schemas can undermine the migration.

Similarly, strict bounded contexts are valuable, but overzealous decomposition can turn one difficult migration into twenty tiny unstable ones. Sometimes a larger service boundary is the more honest transition step.

Failure Modes

Migration architectures usually fail in familiar ways.

1. Hidden dual writes

A team updates both old and new stores in one application flow “just temporarily.” This creates inconsistent outcomes under timeout, retry, and partial failure. Temporary architecture has a way of becoming permanent.

2. Semantic drift

The new service models the domain differently from legacy without explicit translation rules. Both appear correct locally but disagree systematically. Reconciliation fills with “false positives” that are actually design failures.

3. Incomplete consumer migration

Write ownership shifts, but downstream reports and tools still read legacy tables. Users see conflicting data and lose trust in the migration.

4. CDC overconfidence

Teams assume CDC captures the business truth. It captures database changes. Those are not always the same thing. Some business events are implicit, derived, or spread across multiple updates.

5. Backfill side effects

Historical loads trigger downstream logic intended only for real-time transactions: notifications, invoices, compliance alerts, and partner messages. Enterprises have learned this one the hard way.

6. Reconciliation without authority

Data mismatches are detected but nobody owns resolution. The migration accumulates known inconsistency until cutover confidence collapses.

7. Domain slicing by table ownership

Services are split by schema shape rather than business capability. This is perhaps the most common strategic error. It produces chatty services, confused ownership, and endless cross-service transactions.

When Not To Use

Progressive migration is powerful, but not universal.

Do not use it when:

The domain is low value and low volatility

If the capability is peripheral, lightly used, and operationally simple, a direct replacement may be cheaper than a long coexistence strategy.

Data volume is small and downtime is acceptable

For internal tools or niche systems, a scheduled migration window can be perfectly reasonable.

There is no clear target domain model

If bounded contexts are still fuzzy, progressive migration can institutionalize confusion. First clarify the domain.

The organization cannot operate distributed systems

If teams lack event-driven operational maturity, introducing Kafka, CDC, and reconciliation may increase overall risk. Sometimes the right step is modularizing the monolith first.

Regulatory constraints forbid prolonged duplicate state without mature controls

In some highly sensitive domains, coexistence is possible only if governance, lineage, and reconciliation are robust. If they are not, progressive migration may be too dangerous.

In short: do not use a sophisticated migration pattern to compensate for weak domain understanding or weak operations.

Related Patterns

Progressive data migration sits alongside several important enterprise patterns.

Strangler Fig Pattern

The obvious companion. New capabilities gradually surround and replace legacy behavior. In data migration, the strangler idea applies to write authority and read routing as much as to APIs.

Anti-Corruption Layer

Essential when legacy concepts should not leak into the new bounded context.

Outbox Pattern

A strong choice for reliable event publication from a service-owned database.

Change Data Capture

Useful for bootstrapping and observing legacy changes, though not a substitute for domain events.

Saga

Relevant for coordinating cross-service workflows after ownership splits. Use carefully; not every migration problem is a saga problem.

Materialized View / CQRS

Helpful for consumer transition, especially when many old read use cases need stable projections.

Parallel Run

A business pattern as much as a technical one. Old and new systems operate together while outputs are compared.

Each pattern addresses a different wound. The trick is not to throw all of them at the problem, but to combine them with intent.

Summary

Progressive data migration in microservices is not about moving bytes. It is about moving authority.

That is the central idea worth remembering.

The path from monolith to service-owned data is usually best handled as a strangler journey: observe, mirror, extract narrow behavior, shift write ownership, reconcile relentlessly, retire old reads, and decommission legacy slices with discipline. Domain-driven design is the compass. Kafka and CDC are useful machinery. Reconciliation is the safety net. Clear authority models are the guardrails.

The hard parts are not glamorous:

  • defining business semantics
  • identifying bounded contexts
  • making temporary duplication safe
  • handling asynchronous failure
  • migrating consumers, not just producers
  • deciding what happens when old and new disagree

This is where architecture earns its keep.

If you remember one line, make it this: in enterprise migration, the source of truth is not where the data happens to live today; it is where the business decides authority belongs tomorrow, and how carefully you manage the journey in between.

That journey should be progressive. Not because it is fashionable. Because in real enterprises, survival is a feature.
