Migration by Domain Beats Migration by Layer


Big legacy migrations fail for boring reasons.

Not because teams are stupid. Not because the cloud was overhyped. Not because microservices are inherently flawed. They fail because enterprises choose a migration shape that looks tidy on a slide deck and turns poisonous in the real world. They peel away the UI first. Or the API layer. Or the database. Or they redraw the platform in fashionable boxes and call it transformation. For a few months, it feels like progress. Then the seams start leaking. Business rules are split across old and new stacks. Ownership gets fuzzy. Reconciliation becomes a permanent tax. Every release turns into archaeology.

This is why migration by domain beats migration by layer.

If you are using the strangler pattern, the unit of strangulation should usually be a business capability, not a technical tier. The old system does not die because you replaced Angular with React, or SOAP with REST, or Oracle with Kafka in the middle. It dies when meaningful slices of the business stop depending on it. Legacy systems survive on unresolved semantics. They keep breathing as long as “how claims are adjudicated” or “how policies are endorsed” or “how invoices are settled” still partly lives inside them.

That is the architectural truth many programs learn too late: software is not organized by the shape of the runtime. It is organized by the meaning of the business.

And so the migration question is not, “Which layer can we modernize first?” The real question is, “Which domain can we take end-to-end, with its language, rules, events, workflows, data, and operational responsibility?” Once you ask that question, a better migration path appears.

Context

Most enterprise estates were not designed. They accreted.

A bank has a core ledger, a customer platform, five generations of channel applications, and integration middleware that has outlived several CIOs. An insurer has policy administration on a mainframe, claims on a packaged platform, CRM in SaaS, underwriting workbenches built in .NET, and dozens of nightly batch jobs nobody wants to touch. A manufacturer has order management tangled with pricing, inventory, fulfillment, and customer-specific exceptions encoded in stored procedures.

These systems are not merely old. They are dense with semantics. They contain years of negotiated compromises between policy, regulation, operations, and customer expectation. That density is why they are hard to replace and why simplistic migration strategies usually fail.

In this environment, strangler architecture is attractive for good reason. Instead of replacing a monolith in one dangerous leap, you progressively route functionality to new services while the old system continues to serve what remains. It is a practical pattern. It respects production reality. It gives you room to learn.

But the strangler pattern is not one thing. The critical design decision is what you strangle by.

Too many programs choose layers because they are visible and easy to fund. “Let’s create a new API facade.” “Let’s move the front end first.” “Let’s introduce an event backbone.” “Let’s modernize the data layer.” Those moves are sometimes useful, but on their own they do not carve the business into coherent pieces. They create modern wrappers around old semantic knots.

A migration should leave you with better boundaries, not shinier plumbing.

Problem

Migration by layer sounds rational. It appeals to infrastructure teams, platform teams, and governance boards because it decomposes the work into familiar technical streams.

One stream modernizes the UI.

Another modernizes integration.

Another modernizes data.

Another modernizes runtime and deployment.

This creates a dangerous illusion of progress.

You can spend 18 months replacing a presentation layer and discover that all meaningful business decisions still live in the legacy backend. You can put Kafka in the middle and still have no clean domain ownership. You can create a REST API facade that merely republishes the same transactional coupling with prettier URLs. You can even move a database to the cloud and preserve every bad dependency that made change expensive in the first place.

Layer-first migration often fragments semantics across systems:

  • validation rules in the new API
  • pricing logic in the old monolith
  • customer state in a replicated cache
  • workflow decisions in BPM tooling
  • reconciliation logic in batch scripts
  • exception handling in operations runbooks

At that point, the architecture looks distributed, but the business capability is not actually owned anywhere. It is smeared.

That smear is the enemy.

When a domain capability is split across technical layers and generations of software, several things happen quickly:

  1. Change slows down because every enhancement crosses old and new boundaries.
  2. Testing becomes combinatorial because scenarios span multiple execution paths.
  3. Incidents become political because no single team owns the full failure.
  4. Data quality degrades because synchronization becomes probabilistic.
  5. Legacy retirement stalls because the old system still contains decisive rules.

You have not strangled the monolith. You have taught it to haunt your new platform.

Forces

A sensible migration strategy has to balance forces that are often in conflict.

Business continuity

The business rarely wants a pause while architecture catches up. Revenue, claims, order flow, customer service, and regulatory reporting must continue through the migration. This drives incremental approaches.

Semantic integrity

Domain rules need to move in coherent chunks. If “claim intake” or “pricing” or “returns authorization” is split between old and new implementations for too long, inconsistency creeps in. Domain-driven design matters here because bounded contexts are not theoretical niceties; they are migration units.

Data gravity

The hardest part of migration is usually not code. It is state. The old platform often owns authoritative records, hidden invariants, and transaction boundaries that were never documented. New services need autonomy, but autonomy without a clear data migration and reconciliation model is fantasy.

Team topology

Architecture follows ownership more than diagrams. If one team owns the old stack, another owns the new API, another owns the event platform, and nobody owns the end-to-end domain, migration by layer becomes the default. It is easier organizationally, and that is precisely why it is dangerous.

Operational risk

Distributed systems introduce partial failure, eventual consistency, duplicate events, stale reads, dead letter queues, and replay scenarios. A migration strategy that ignores operational semantics will simply relocate the fragility.

Delivery pressure

Executives want visible progress. Layer-based work often produces demos faster: a new portal, a new gateway, a new event stream. Domain migration can look slower at first because it requires untangling semantics before showing a polished interface.

That is a hard sell, but still the right move.

Solution

The better approach is to migrate by domain, using bounded contexts as the primary unit of extraction, and using the strangler pattern to progressively reroute business capability end-to-end.

This is plain domain-driven design applied under pressure.

Instead of asking, “How do we replace the service layer?” ask, “Which domain capability can become independently owned?” Instead of extracting generic technical services, identify a bounded context with clear language, rules, events, and data responsibility. Then move that slice through the strangler path: route incoming requests for that capability to the new implementation, publish and consume domain events, reconcile state during transition, and reduce dependency on the old system until the old implementation is dormant.

That sounds obvious. It rarely is.

A good domain extraction unit has several characteristics:

  • recognizable business meaning
  • stable or at least discussable boundaries
  • clear actors and workflows
  • explicit inputs and outputs
  • identifiable source-of-truth data
  • potential for team ownership
  • measurable business outcomes

Examples include claims intake, quote generation, payment allocation, order promising, returns processing, customer onboarding, policy endorsement, or account closure.

Not “API orchestration.”

Not “the rules engine.”

Not “the persistence layer.”

Those are implementation concerns. Migrations should be anchored in business semantics.

What the strangler looks like in practice

You place a routing seam in front of the legacy system. Requests for capabilities still in the old platform continue there. Requests for extracted domains are routed to the new service or service group. During migration, some workflows remain hybrid. Events and replicated data support coexistence. Reconciliation becomes a first-class concern, not an afterthought.


This is not a mere traffic-switching trick. The gateway is only the visible edge. The real work is behind it: semantic extraction, data ownership transition, event contracts, compatibility logic, and operational controls.

Architecture

A domain-first strangler architecture typically has five important elements.

1. An entry-point seam

You need a place to decide whether a request goes to legacy or new capabilities. This may be an API gateway, a channel backend, a reverse proxy, or application routing in the UI tier. The point is not technology. The point is controlled dispatch.
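The controlled dispatch described above can be sketched as a small routing table. This is a minimal illustration, not a real gateway API; the capability names, the request shape, and the rule that routes only simple endorsement types to the new service are assumptions made for the example.

```python
# Minimal routing seam: decide whether a request goes to the legacy
# platform or to an extracted domain service. Routing is a business
# decision here -- only certain endorsement types go to the new path.

EXTRACTED = {
    # capability -> predicate deciding whether THIS request is served
    # by the new implementation during progressive rollout
    "endorsement": lambda req: req.get("endorsement_type")
    in {"address_change", "named_driver"},
}

def route(capability: str, request: dict) -> str:
    """Return which backend should serve this request."""
    predicate = EXTRACTED.get(capability)
    if predicate and predicate(request):
        return "new-domain-service"
    return "legacy"  # everything not explicitly extracted stays put
```

The same shape works whether the seam lives in an API gateway, a reverse proxy rule set, or application routing; what matters is that the dispatch decision is explicit and centrally controlled.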

2. Bounded contexts with clear ownership

Each extracted domain should have an explicit model, ubiquitous language, service boundary, and owning team. If your “new microservices” all share a common enterprise data model and all depend on the same orchestration layer for decisions, you have not created bounded contexts. You have just exploded the monolith.

3. Independent state where possible

A migrated domain needs control over its own data if it is to evolve safely. During transition, there may be replicated or synchronized data, but the direction of authority must be explicit. Shared databases are the fastest way to preserve old coupling in new clothes.

4. Event-driven coexistence

Kafka and event streaming are useful here, especially in large enterprises. They allow the new domain to react to legacy changes and vice versa without synchronous entanglement everywhere. But Kafka is not a magic solvent. It carries facts, not understanding. If your event contracts are vague, overloaded, or secretly command-like, you will just distribute confusion faster.
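A precise event contract is what keeps coexistence from distributing confusion. A sketch of what that might look like follows; the event name, fields, and serialization are illustrative assumptions, not a prescribed schema. Note that the event states a fact, not an instruction to a consumer.

```python
from dataclasses import dataclass, asdict
import json

# A domain event as an explicit, versioned fact. It records WHAT
# happened; a command dressed up as an event would secretly couple
# the producer to consumer behavior.

@dataclass(frozen=True)
class EndorsementBound:
    event_type: str        # stable name consumers subscribe to
    schema_version: int    # explicit versioning for contract evolution
    event_id: str          # unique id, enables dedup and replay tracking
    policy_id: str         # business key, enables end-to-end correlation
    endorsement_id: str
    effective_date: str    # ISO 8601 date
    premium_delta: str     # decimal-as-string to avoid float drift

def to_message(event: EndorsementBound) -> bytes:
    """Serialize for the event backbone (e.g. a Kafka topic)."""
    return json.dumps(asdict(event)).encode("utf-8")
```

Everything a consumer needs to deduplicate, correlate, and version-check travels inside the event itself.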

5. Reconciliation and auditability

During coexistence, data will diverge unless you actively design for reconciliation. There is no serious migration without a strategy for mismatch detection, replay, correction, and audit.

A pragmatic target architecture puts the routing seam at the edge, extracted bounded contexts with their own state behind it, and an event backbone plus reconciliation services supporting coexistence with the legacy core.

Notice what is absent: no “shared canonical enterprise database,” no giant orchestration layer pretending to centralize business truth, no layer-by-layer decomposition diagram. The architecture is organized around business capabilities and their transition states.

Migration Strategy

Domain migration is not a slogan. It needs a sequence.

Step 1: Find the seams in the business, not just the code

Use event storming, capability mapping, process analysis, and production incident history. Look for places where the business already thinks in coherent units. Listen for language. The business says “claim registered,” “payment allocated,” “quote bound,” “shipment confirmed.” Those are clues. They are often better guides than package structures or service catalogs.

A useful heuristic: if a capability can be explained to a business leader as a thing they recognize and can measure, it may be a good migration candidate.

Step 2: Choose a domain with favorable extraction economics

Do not start with the most central, tangled, politically charged capability unless you enjoy public suffering. Start where you can gain a clean win: a domain with clear value, manageable dependencies, and enough business importance to matter.

Good first candidates often have:

  • high change frequency
  • pain visible to business users
  • contained transactional scope
  • identifiable source events
  • moderate rather than extreme dependency density

Step 3: Define target ownership and source of truth

For the chosen domain, be explicit:

  • Which team owns behavior?
  • Which system is authoritative for which data elements?
  • Which events signal state transitions?
  • Which old interfaces remain temporarily?
  • Which invariants must remain globally true?

Without this, you are not migrating. You are improvising.
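One way to make the source-of-truth question concrete is to write the authority map down as data rather than leave it implied. The system and field names below are invented for illustration; the point is that an unmapped field is treated as a design gap, not silently defaulted.

```python
# Explicit authority map for a domain in transition: which system is
# authoritative for which data elements, written down per field.

AUTHORITY = {
    "customer.email":       "onboarding-service",  # new service owns it
    "customer.risk_rating": "legacy-core",         # still authoritative in legacy
    "endorsement.state":    "endorsement-service",
    "billing.balance":      "legacy-core",
}

def authoritative_system(field: str) -> str:
    try:
        return AUTHORITY[field]
    except KeyError:
        # an unmapped field means nobody decided -- fail loudly
        raise KeyError(f"no declared source of truth for {field!r}")
```

A table like this also gives reconciliation something to check against: a mismatch is only an error relative to a declared authority.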

Step 4: Introduce anti-corruption around the legacy model

The new domain should not absorb the old model wholesale. Use anti-corruption layers to translate legacy concepts, codes, and workflows into the new bounded context. This is one of the most underrated pieces of migration architecture. Enterprises skip it because translation feels like overhead. Then they contaminate the new system with the same legacy semantics they hoped to escape.
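An anti-corruption layer in its simplest form is a translation function at the boundary. The sketch below assumes a legacy row with cryptic status codes and integer dates; those codes, column names, and the new model's vocabulary are all invented for the example.

```python
# Anti-corruption layer sketch: translate a legacy record into the new
# bounded context's model instead of absorbing legacy codes wholesale.

LEGACY_STATUS_CODES = {
    "07": "active",
    "13": "lapsed",
    "21": "cancelled",
}

def translate_policy(legacy_row: dict) -> dict:
    status = LEGACY_STATUS_CODES.get(legacy_row["STAT_CD"])
    if status is None:
        # surface unknown legacy codes instead of letting them leak in
        raise ValueError(f"unmapped legacy status code {legacy_row['STAT_CD']!r}")
    d = f"{legacy_row['EFF_DT']:08d}"  # legacy stores dates as YYYYMMDD integers
    return {
        "policy_id": legacy_row["POL_NO"].strip(),
        "status": status,
        "effective_date": f"{d[:4]}-{d[4:6]}-{d[6:8]}",  # ISO 8601 in the new model
    }
```

The translation is boring on purpose: every legacy quirk is confronted here, once, instead of being rediscovered throughout the new domain model.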

Step 5: Run coexistence with explicit reconciliation

During the transition, both old and new may process related facts. You need:

  • idempotent consumers
  • replay-safe events
  • duplicate detection
  • consistency checks
  • variance reports
  • manual repair paths for exceptions

This is where Kafka becomes operationally useful. Event streams provide traceability and decoupled propagation. They do not remove the need for reconciliation. They make disciplined reconciliation possible at scale.
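The first two items on that list, idempotent consumers and replay-safe events, reduce to one discipline: key every event and apply it at most once. A minimal sketch, assuming events carry a unique event_id; the in-memory set stands in for what would be a durable dedup store in production.

```python
# Idempotent, replay-safe consumer: duplicates are expected during
# coexistence, so processing is keyed by event_id and applied once.

class IdempotentConsumer:
    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # in production: a durable store, not memory

    def consume(self, event: dict) -> bool:
        """Apply the event at most once; return False for duplicates."""
        key = event["event_id"]
        if key in self._seen:
            return False       # duplicate or replay: safe no-op
        self._handler(event)
        self._seen.add(key)    # record only after the handler succeeds
        return True
```

With this shape, replaying an entire topic after an incident is an operational routine rather than a double-billing event.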

Step 6: Shift traffic and authority progressively

Do not switch the entire enterprise at once. Shift by customer segment, product line, region, channel, workflow type, or transaction class. Progressive rollout is not just risk management. It is learning infrastructure.

Step 7: Remove legacy dependencies ruthlessly

A migration only counts when the old dependency actually dies. Remove legacy writes. Stop reading shadow data “just in case.” Shut down old interfaces. Archive what must be retained. Retire operational runbooks tied to the old path.

Half-retired capabilities are expensive pets.

Reconciliation: the part executives forget and operators never do

Every serious strangler migration needs a reconciliation strategy.

Because during coexistence, truth becomes inconvenient.

Suppose customer onboarding moves to a new service, but downstream policy issuance still happens in the monolith. The onboarding service emits CustomerOnboarded. The legacy system consumes it, enriches records, and creates policy relationships. What if the event is delayed? Duplicated? Consumed but not committed? What if the legacy customer ID mapping fails? What if a call center agent updates customer details in the old UI while the new service is already authoritative for certain fields?

These are not edge cases. These are Tuesday.

A good reconciliation design distinguishes:

  • authoritative mismatches: where a target differs from the source of truth
  • timing mismatches: where eventual consistency has not yet converged
  • mapping mismatches: where identity or code translation failed
  • process mismatches: where one side advanced workflow and the other did not

You need controls for each class.
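The four mismatch classes above can be turned into a small triage function. This is a sketch under stated assumptions: it is called only for record pairs already flagged as differing, the record shape and workflow field are illustrative, and the convergence window is an invented parameter.

```python
from typing import Optional

# Triage a detected difference between the source of truth and a
# target system into one of the four mismatch classes.

CONVERGENCE_WINDOW_SECONDS = 300  # let eventual consistency settle first

def classify_mismatch(source: Optional[dict], target: Optional[dict],
                      age_seconds: float) -> str:
    if source is None or target is None:
        return "mapping"       # identity or code translation failed
    if source.get("workflow_state") != target.get("workflow_state"):
        return "process"       # one side advanced workflow, the other did not
    if age_seconds < CONVERGENCE_WINDOW_SECONDS:
        return "timing"        # may still converge; recheck later
    return "authoritative"     # target genuinely differs from the truth
```

Each class then gets its own control: mapping mismatches go to the identity-translation team, timing mismatches are rechecked, process and authoritative mismatches feed repair workflows and variance reports.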


This is architecture, not housekeeping. If you ignore reconciliation, your migration success will be measured by PowerPoint while your operations teams live inside spreadsheets and incident bridges.

Enterprise Example

Consider a large insurer migrating from a 25-year-old policy administration platform.

The original estate looked familiar: policy servicing, billing, claims notifications, endorsements, and document generation all sat inside one heavily customized core platform. Over the years, digital channels were added, a CRM was integrated, and a Kafka backbone introduced for enterprise events. Leadership wanted “microservices modernization.” The first proposal was layer-based: build a new API layer, move the web front end, and gradually decompose backend services later.

That would have been a mistake.

Why? Because the real pain was not in presentation. It was in policy endorsements: mid-term changes to policies involving coverage adjustments, pricing recalculation, regulatory validation, effective dates, and downstream billing changes. Endorsements were high-value, high-change, and deeply frustrating for both agents and customers. More importantly, they formed a business capability with recognizable semantics and measurable outcomes.

So the insurer migrated endorsements as a domain, not the policy service layer.

They established:

  • a bounded context for endorsement management
  • a new domain model independent of the core policy schema
  • an anti-corruption layer translating policy records and product codes
  • Kafka events for policy retrieved, endorsement requested, endorsement priced, endorsement bound, billing adjustment requested
  • a reconciliation process comparing bound endorsements in the new service against policy and billing outcomes in legacy systems

Initially, only simple endorsement types were routed to the new domain: address changes, named driver updates, low-risk coverage amendments. Complex endorsements stayed in the legacy path. This made routing a business decision, not a technical one. Over time, more endorsement categories moved.

What happened?

The new team could release endorsement changes weekly instead of quarterly. Agent handling times dropped. Product teams could introduce new endorsement rules without opening the core platform. Most importantly, the old policy system lost a meaningful chunk of business dependency. It was not just wrapped. It was diminished.

The migration was not painless. They hit classic failure modes:

  • duplicate events during replay caused double downstream adjustments until idempotency keys were enforced
  • policy identifiers were inconsistently mapped across regions
  • customer service agents sometimes used the old screens out of habit, creating split-brain updates
  • pricing reference data synchronization lagged during a release window

But these were survivable because the domain boundary was coherent. There was a team that owned the full endorsement capability end-to-end, including reconciliation and production support. That is the practical power of domain-first migration. You can solve real problems because someone actually owns the problem.

Operational Considerations

A strangler architecture succeeds or fails in operations long before it wins in strategy.

Observability must follow the domain flow

Tracing should show a business transaction across old and new components. Logging by host, service, or topic is not enough. You need correlation IDs, business keys, event lineage, and dashboards aligned to domain outcomes: claims registered, payments allocated, endorsements bound.
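Structured log lines that carry both a correlation ID and a business key are the cheapest way to get there. A minimal sketch; the field names and component labels are illustrative, and a real system would use its logging framework rather than print.

```python
import json
import time
import uuid

# Every log line carries a correlation id (follows one business
# transaction across legacy and new components) and a business key
# (e.g. a policy or endorsement number) so domain flows are traceable.

def log_event(correlation_id: str, business_key: str,
              component: str, message: str) -> str:
    line = json.dumps({
        "ts": time.time(),
        "correlation_id": correlation_id,
        "business_key": business_key,
        "component": component,   # e.g. legacy-core, endorsement-service
        "message": message,
    })
    print(line)
    return line

# One id is minted per business transaction and reused at every hop:
cid = str(uuid.uuid4())
```

Dashboards then aggregate on the business key, which is what lets you report "endorsements bound" rather than "requests per topic."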

Idempotency is not optional

Kafka consumers, REST endpoints, and command handlers will all be retried eventually. Design for duplicates from day one. A migration environment is a duplicate factory.

Identity mapping is a first-class concern

Legacy IDs, new IDs, reference codes, product variants, regional schemas — all of these create mismatch risk. Identity translation should be explicit, versioned, and observable.

Batch still matters

Many enterprises pretend they are event-driven while their financial truth is still settled overnight. Respect the batch world. Some migrations need hybrid designs where event-driven operational updates coexist with end-of-day reconciliation and reporting loads.

Rollback is rarely symmetrical

You can route requests back to legacy more easily than you can undo side effects already emitted into downstream systems. Rollback plans must distinguish traffic rerouting from business state repair.

Tradeoffs

Domain-first migration is better. It is not free.

Upsides

  • clearer ownership
  • faster retirement of meaningful legacy capability
  • better alignment between architecture and business semantics
  • less semantic fragmentation
  • more durable service boundaries
  • improved ability to evolve independently

Costs

  • harder upfront analysis
  • more investment in anti-corruption and translation
  • more politically challenging team realignment
  • deeper data migration and reconciliation work
  • slower cosmetic progress early in the program

That last point matters. Layer-based migration often looks better in the first six months. Domain-first migration looks better in year three, when something useful has actually been removed from legacy and new teams can move without asking permission from a 20-year-old codebase.

Architecture is often the art of disappointing the dashboard to improve the company.

Failure Modes

Even good strategies fail in familiar ways.

1. Fake domains

Teams label technical components as domains: “customer API domain,” “orchestration domain,” “reference data domain.” These are often just layer responsibilities wearing domain language.

2. Shared database backsliding

The new service reads and writes the legacy database “temporarily.” Temporary becomes structural. Autonomy dies quietly.

3. Event theater

Kafka is introduced, but events are poorly modeled, undocumented, and overloaded with integration noise. Teams say they are event-driven when they are merely asynchronous.

4. Split-brain operations

Users can still execute parts of the same business capability in old and new interfaces without clear authority rules. Inconsistency follows.

5. Legacy semantics leak into the new model

Instead of building a bounded context, the team mirrors the old schema, old states, old codes, and old workflow assumptions. The monolith is reborn in containers.

6. No reconciliation budget

The program assumes eventual consistency will sort itself out. It will not. Unreconciled migrations become trust failures.

When Not To Use

Domain-first strangler migration is not always the right answer.

Do not use it if the system is small enough to replace outright with acceptable risk. Incremental strangling introduces complexity; for a modest application with low business criticality, a full replacement may be cheaper and cleaner.

Do not use it when the domain boundaries are so unstable that any extraction would be churn. If the business model itself is in flux and nobody can define stable capability ownership, focus first on discovery and simplification.

Do not force microservices where a modular monolith would do. For some organizations, the real need is not distributed runtime but better modularity, clearer boundaries, and disciplined ownership. A modular monolith can be an excellent target if team scale and operational maturity do not justify many services.

And do not use event-driven coexistence if your organization cannot yet operate it. Kafka is useful, but it raises the bar for schema management, replay handling, support tooling, and data governance. If those disciplines are absent, synchronous integration with simpler compensations may be the wiser transitional design.

Related Patterns

This style of migration sits beside several patterns worth understanding.

  • Strangler Fig Pattern: progressive replacement of legacy functionality behind a routing seam.
  • Bounded Context: the DDD unit of semantic consistency and a strong candidate for migration scope.
  • Anti-Corruption Layer: protects the new model from legacy semantic contamination.
  • Event-Driven Architecture: useful for coexistence, propagation, and decoupled integration, especially with Kafka.
  • Change Data Capture: often a transitional mechanism for publishing legacy state changes, though not a substitute for good domain events.
  • Saga / Process Manager: useful when workflows span multiple services and require coordination without distributed transactions.
  • Modular Monolith: often a better destination than microservices for medium-scale domains or less mature organizations.

These patterns work together. But they should all remain subordinate to the one thing that matters most: preserving and reshaping business meaning as systems evolve.

Summary

Migration by layer is seductive because it is easy to explain. Migration by domain is harder because it forces you to confront what the business actually does.

That is exactly why it works.

A strangler architecture should progressively replace business capabilities, not merely technical tiers. The migration unit should be a bounded context with language, rules, data, events, ownership, and operational accountability. Kafka and microservices can help, especially in complex enterprises, but only when they serve domain clarity rather than distract from it. Reconciliation is essential. Anti-corruption is essential. Progressive traffic shifting is essential. Legacy retirement is the proof, not the promise.

If you remember one line, make it this:

You do not strangle a monolith by wrapping its edges. You strangle it by taking away its meaning.

That is the path that retires legacy for real.
