Your Data Platform Is a Routing Problem


Data platforms rarely fail because the database is too slow.

They fail because the business changes shape faster than the pipes can keep up.

A customer becomes an account-holder in one system, a subscriber in another, a policy owner in a third, and an “identity” in the event stream. The organization calls this integration. The engineers call it complexity. Finance calls it late. Everyone keeps buying more infrastructure, as if a routing mess can be cured with bigger machines.

It usually can’t.

The deeper problem is that most data platforms are designed like warehouses and operated like switchboards. We talk about storage engines, schemas, query layers, streaming backbones, and lakehouses. All important. None of them answer the real question: how does meaning move through the enterprise without being mangled?

That is a routing problem.

Not routing in the narrow network sense. Routing in the enterprise sense: which domain is authoritative for what, where transformations happen, how events and facts are reconciled, how old systems are strangled without freezing the business, and how a platform lets multiple bounded contexts evolve without turning every change into a committee meeting.

This is where many “modern data platform” programs get lost. They begin as a technical modernization and end as a semantic civil war. The migration stalls not because Kafka is hard or cloud storage is expensive, but because nobody can answer simple questions with confidence:

  • Who owns the customer lifecycle?
  • Which system decides order status?
  • Is an event a business fact or just a notification?
  • When data disagrees, who wins?
  • Can we migrate one flow at a time, or must we cut over the world in a weekend?

If you treat your data platform as a single destination, you will build a brittle monument. If you treat it as a routing topology, you can migrate progressively, preserve domain integrity, and keep shipping.

That is the heart of this pattern: progressive migration topology. It is a way to modernize a data estate by routing data according to domain semantics, introducing new platform capabilities incrementally, and reconciling old and new paths until confidence is earned rather than assumed.

Context

Large enterprises do not have one data problem. They have several, layered over time.

There is usually a transactional core—ERP, CRM, policy admin, order management, billing, claims, core banking, supply chain, reservation engines, something with a 20-year-old schema and terrifying stored procedures. Around it sit integration platforms, ETL jobs, MDM hubs, reporting marts, APIs, operational caches, and increasingly a streaming backbone such as Kafka.

Then someone launches a modernization program.

The stated goals are familiar:

  • decouple from legacy systems
  • enable real-time analytics
  • support microservices
  • reduce batch windows
  • improve data quality
  • create a governed self-service platform
  • move to cloud
  • standardize on events

All sensible goals. The trouble begins when these are pursued as a platform replacement instead of a controlled migration of business meaning.

In domain-driven design terms, the enterprise already contains bounded contexts whether people admit it or not. Sales, billing, customer onboarding, fulfillment, fraud, service assurance, finance reporting—each carries its own language and invariants. A platform that ignores those boundaries will centralize data while decentralizing confusion.

A modern data platform succeeds when it becomes a semantic traffic system. It knows where facts originate. It distinguishes authoritative records from derived projections. It allows multiple representations, but it does not pretend they are the same thing. And it supports gradual rerouting as capabilities move from old systems to new ones.

That last point matters. Migrations rarely fail at the final cutover. They fail months earlier when teams discover hidden dependencies, incompatible semantics, or impossible reconciliation gaps. Progressive topology exists to surface these realities early, route around them, and move one business flow at a time.

Problem

Most enterprise data platforms are built around one of two bad instincts.

The first is the big central repository instinct. Put everything in one lake, one warehouse, one mesh catalog, one golden model, and eventually the enterprise will become coherent. It won’t. You will get scale and access, but not clarity. A giant repository without clear authority rules becomes a museum of plausible lies.

The second is the event-everything instinct. Publish every table change to Kafka, let downstream services subscribe, and call it decoupling. This is integration by exhaust pipe. It moves data quickly, but often strips away the domain semantics needed to interpret it safely.

Both approaches avoid the difficult part: deciding how information should be routed through bounded contexts during migration.

Here is the common enterprise situation:

  • Legacy systems remain the systems of record for core transactions.
  • New microservices are introduced for selected capabilities.
  • Kafka or another event platform carries change notifications and business events.
  • A cloud data platform supports analytics, machine learning, and sometimes operational use cases.
  • APIs expose current state.
  • Multiple teams need data at different freshness levels.
  • The business cannot tolerate a “stop the world” cutover.

The result is an overlap period where old and new systems both matter. This overlap is not a temporary inconvenience. It is the architecture.

The real problem is not moving data from A to B. It is designing an explicit topology for:

  • source authority
  • semantic transformations
  • event routing
  • reconciliation
  • progressive cutover
  • rollback
  • observability
  • domain ownership

Without that topology, migration becomes improvisation.

Forces

Several forces pull in different directions, and any serious architecture has to acknowledge them rather than pretending they can be optimized away.

Domain autonomy versus enterprise consistency

Bounded contexts need freedom to model their world. Billing should not be forced into the customer service model. Fraud should not wait for analytics governance to publish a table. But the enterprise still needs consistent routing rules for shared concepts such as customer, product, policy, order, or account.

This is the old DDD tension: local models are healthy; semantic anarchy is not.

Real-time expectations versus authoritative truth

Kafka, CDC, and streaming platforms create a taste for immediacy. The business starts to expect all data to be current everywhere. But many facts become trustworthy only after validation, enrichment, or settlement. Some data is fast but provisional. Some is slow but authoritative. A routing architecture has to represent that distinction explicitly.

Progressive migration versus operational simplicity

The strangler pattern is usually the right migration strategy, but it creates overlap. Overlap means duplicate flows, reconciliation jobs, idempotency rules, and temporary complexity. Enterprises often underinvest here because they expect migration code to be short-lived. It rarely is.

Temporary architecture has a habit of becoming very permanent.

Platform standardization versus local fit

Central teams want common tooling: Kafka, schema registry, data contracts, lineage, cloud storage, orchestration, observability, and access control. Good. But forcing every domain into a single interaction style is a mistake. Some flows are event-driven. Some are API-based. Some are batch snapshots. Some require commands and workflow. Routing should allow different transport mechanisms under a coherent semantic model.

Historical data preservation versus behavioral change

A migration often preserves records but changes behavior. That is where projects get into trouble. Replaying historical orders into a new order service is not enough if order lifecycle rules changed. Data compatibility is not behavioral compatibility.

This is one reason reconciliation deserves architectural prominence. It is not just record matching. It is verifying that business outcomes remain correct while routes change underneath.

Solution

The pattern is to design the data platform as a progressive routing topology rather than a monolithic destination.

At a high level:

  1. Identify bounded contexts and classify data by domain authority.
  2. Separate business facts, integration events, and derived analytical projections.
  3. Introduce an explicit routing layer across APIs, event streams, CDC, and batch paths.
  4. Migrate capability by capability using a strangler approach.
  5. Reconcile old and new paths continuously until confidence allows authority transfer.
  6. Shift consumers incrementally rather than forcing synchronized cutovers.

This sounds obvious when written in six lines. In practice, it changes almost every architectural decision.

The key idea is simple: do not migrate platforms wholesale; migrate routes of meaning.

That means each important business flow gets a topology:

  • where the fact originates
  • how it is published
  • who enriches it
  • where state is materialized
  • how conflicts are resolved
  • how consumers are transitioned
  • how parity is measured
  • when authority transfers

The platform then becomes a controlled environment for these topologies.
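A route topology of this kind can be written down as data rather than held as tribal knowledge. A minimal Python sketch, with illustrative field names and an example route drawn from the claims scenario later in this article; nothing here is a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class RouteTopology:
    """Declarative description of one business flow's routing (illustrative fields)."""
    flow: str                                             # business flow this route covers
    fact_origin: str                                      # bounded context where the fact is created
    publication: str                                      # transport: event topic, API, or batch feed
    enrichers: list[str] = field(default_factory=list)    # contexts that add data along the way
    materialized_in: list[str] = field(default_factory=list)  # where state is queryable
    conflict_rule: str = "origin-wins"                    # how disagreements are resolved
    parity_metric: str = "entity-state-equality"          # how old/new parity is measured
    authority: str = "legacy"                             # current source of truth; flips at cutover

# Example route: intake is dual-run, legacy still adjudicates.
claims_intake = RouteTopology(
    flow="claims-intake",
    fact_origin="claims-intake-service",
    publication="events.claims.intake.v1",
    enrichers=["fraud-scoring"],
    materialized_in=["claims-timeline-view"],
    authority="legacy",
)
print(claims_intake.authority)  # legacy
```

Making the route explicit like this is what lets the later steps (reconciliation, consumer shifting, authority transfer) operate on something concrete instead of on tribal memory.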

Domain semantics first

In DDD terms, routing starts with the ubiquitous language of each bounded context. “Customer created” may mean one thing in onboarding, another in billing, and a third in marketing. A platform team that insists on one canonical event for all customer-related activity usually creates more coupling, not less.

The better approach is to define:

  • authoritative domain facts: emitted by the owning context
  • published integration events: stable enough for external consumption
  • derived enterprise views: assembled for analytics, reporting, or cross-domain use

That sounds like semantic hair-splitting. It isn’t. It is how you avoid turning every field change into an organizational incident.
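The three categories can also be made explicit in code, so that a consumer cannot quietly treat a derived view as truth. A hedged sketch; the event names, contexts, and helper are invented for illustration:

```python
from dataclasses import dataclass
from enum import Enum

class EventKind(Enum):
    DOMAIN_FACT = "domain_fact"        # authoritative; emitted by the owning context
    INTEGRATION_EVENT = "integration"  # stable enough for external consumption
    DERIVED_VIEW = "derived"           # assembled projection; never authoritative

@dataclass(frozen=True)
class PublishedEvent:
    name: str
    kind: EventKind
    owning_context: str

def is_authoritative(event: PublishedEvent) -> bool:
    """Only domain facts carry authority; everything else is a convenience
    and must not be used to overwrite local domain state."""
    return event.kind is EventKind.DOMAIN_FACT

fact = PublishedEvent("CustomerOnboarded", EventKind.DOMAIN_FACT, "onboarding")
view = PublishedEvent("CustomerGoldenRecord", EventKind.DERIVED_VIEW, "mdm-hub")
print(is_authoritative(fact))  # True
print(is_authoritative(view))  # False
```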

Reconciliation as a first-class capability

In a progressive migration, dual-running is normal. Legacy and new paths will coexist. Therefore reconciliation cannot be an afterthought. You need explicit mechanisms for:

  • entity matching
  • version and sequence tracking
  • late-arriving event handling
  • duplicate suppression
  • compensating updates
  • parity dashboards
  • business-level discrepancy detection

If a new service and a legacy platform disagree about invoice status, “eventual consistency” is not an explanation. It is a bug report waiting to happen.
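At its simplest, a reconciliation check compares entity state across both paths. A minimal sketch, assuming each path can be snapshotted into an id-to-state map; invoice ids and statuses are made up:

```python
def reconcile(legacy: dict[str, str], modern: dict[str, str]) -> dict[str, list[str]]:
    """Compare entity state between the legacy path and the new path.
    Keys are entity ids, values are business states (e.g. invoice status)."""
    report: dict[str, list[str]] = {
        "missing_in_modern": [], "missing_in_legacy": [], "state_mismatch": [],
    }
    for entity_id, state in legacy.items():
        if entity_id not in modern:
            report["missing_in_modern"].append(entity_id)
        elif modern[entity_id] != state:
            report["state_mismatch"].append(entity_id)
    report["missing_in_legacy"] = [e for e in modern if e not in legacy]
    return report

legacy_view = {"inv-1": "PAID", "inv-2": "OPEN", "inv-3": "OPEN"}
modern_view = {"inv-1": "PAID", "inv-2": "PAID"}  # inv-3 never arrived
print(reconcile(legacy_view, modern_view))
```

A real implementation also needs version tracking and late-arrival windows from the list above; the point of the sketch is that discrepancies become a report you can act on, not a surprise.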

Shift authority, not just traffic

Many migration designs reroute reads before writes, or analytics before operations. Sensible. But the important move is authority transfer. At some point one bounded context stops being a downstream copy and becomes the source of truth for a capability. That moment requires clear contracts, rollback strategy, and downstream impact analysis.

This is why progressive topology is more than integration plumbing. It is a governance model for changing system authority safely.

Architecture

A useful way to picture the pattern is as a layered but explicit routing model.

(Diagram: layered routing model)

There are several important choices hidden in this picture.

First, legacy systems are not erased on day one. They become routed participants. CDC can expose change data, but CDC is not the domain model. It is just one source of facts and hints. You still need transformations that translate technical changes into business semantics.

Second, Kafka or an event backbone is helpful, often essential, but it is not the architecture by itself. The routing logic sits in contracts, enrichment, sequencing, and reconciliation. A topic without meaning is just a queue with good marketing.

Third, the data platform serves multiple purposes. Raw storage preserves lineage and replayability. Curated domain data products capture governed, semantically meaningful outputs. Analytics and machine learning consume those products, not random source fragments.

Fourth, microservices should align to bounded contexts, not to UI screens or team charts. A service that owns shipment orchestration should publish shipment domain events. It should not expose internal table changes and expect the enterprise to reverse-engineer intent.

Core routing concepts

A progressive migration topology usually needs these constructs:

  • Ingress adapters for legacy systems, external feeds, and partner systems
  • Domain routers that map events and commands into bounded contexts
  • Transformation services that convert source semantics into target domain language
  • Materialized views for query use cases and consumer isolation
  • Reconciliation services for parity checks and conflict resolution
  • Contract registry for schemas, semantic versions, and ownership metadata
  • Lineage and observability across batch and streaming paths

And yes, some of this sounds like enterprise plumbing. It is. But plumbing is what keeps buildings habitable.

Migration Strategy

This pattern shines during migration, especially under a strangler approach.

The classic strangler fig wraps around the old tree until the new structure can stand alone. In data migration, however, the roots are tangled. You cannot just route requests around a monolith and declare victory. Data dependencies leak everywhere. So the strangler needs a topology.

A practical sequence looks like this:

1. Carve along business capability boundaries

Do not begin with technology domains like “customer data” or “analytics ingestion.” Begin with business capabilities where authority can eventually shift: onboarding, pricing, order capture, fulfillment, claims intake, invoice generation.

Ask one brutal question: can this capability become authoritative anywhere else?

If not, you are integrating, not migrating.

2. Establish route visibility before route changes

Before moving flows, instrument them. Map producers, consumers, latency, sequence behavior, transformation points, and business criticality. Most enterprises discover shadow consumers, brittle nightly dependencies, and undocumented Excel-based business processes at this stage. Better now than during cutover week.

3. Introduce side-by-side publication

Start publishing domain-relevant events and curated data products without changing authority. This creates observability and lets downstream consumers begin adapting. Think of it as opening new lanes before diverting traffic.

4. Reconcile in parallel

Run legacy and new computations together. Compare entity states and business outcomes, not just row counts. For example, compare:

  • invoice totals
  • order fulfillment completion
  • policy lapse determination
  • customer eligibility decisions

This is where many teams learn that “same data” does not mean “same result.”

5. Shift reads, then selected writes, then authority

Move low-risk consumers first: analytics, operational dashboards, notifications, search indexes. Then move bounded write flows with clear rollback options. Only after sustained parity and stable operations do you transfer authority.
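Shifting reads incrementally is often implemented as percentage-based routing per consumer, ratcheted up as parity holds. A simplified sketch with made-up consumer names and percentages:

```python
import random

# Per-consumer cutover percentages, raised over time as parity is sustained.
CUTOVER_PERCENT = {"analytics": 100, "dashboards": 50, "billing-api": 0}

def route_read(consumer: str, read_legacy, read_modern, rng=random.random):
    """Serve a read from the new path for a percentage of traffic per consumer,
    falling back to legacy otherwise."""
    pct = CUTOVER_PERCENT.get(consumer, 0)  # unknown consumers stay on legacy
    if rng() * 100 < pct:
        return read_modern()
    return read_legacy()

value = route_read("billing-api", lambda: "legacy-answer", lambda: "modern-answer")
print(value)  # billing-api is at 0%, so always "legacy-answer"
```

Writes need more than a percentage dial (rollback and compensation paths), which is why they move only after reads have proven the route.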

6. Decommission by route, not by application

A legacy platform may survive for years while individual routes are retired. That is normal. The goal is not heroic system shutdown. The goal is shrinking the blast radius and business dependence of the old stack until its retirement becomes boring.

That is the best kind of migration ending: boring.

Here is a typical progressive cutover flow:

(Diagram: progressive cutover flow)

Why reconciliation changes the game

Without reconciliation, progressive migration is wishful thinking. With it, you can make smaller bets.

The architecture should support both technical and business reconciliation:

  • technical: sequence gaps, duplicates, schema drift, missing keys, lag thresholds
  • business: financial totals, lifecycle state equivalence, SLA outcomes, exception rates

This distinction matters. A stream can be perfectly delivered and still semantically wrong.

Enterprise Example

Consider a global insurer modernizing claims and customer servicing.

The estate looks familiar. A core policy administration platform runs on-premises. Claims intake is partly in a legacy workflow tool, partly in regional portals. Customer data lives in CRM, policy admin, and an MDM hub that nobody fully trusts. Reporting is overnight batch into a warehouse. The company wants near-real-time claims visibility, digital servicing, and fraud analytics. Kafka is introduced as the enterprise event backbone. New microservices are planned for claims intake, correspondence, and customer interaction history.

If this program is approached as “replace the data warehouse and expose all changes through Kafka,” it will fail slowly and expensively.

The better move is to define route topologies around specific business capabilities.

For example, claims intake becomes its own bounded context. New digital channels submit claims to a claims intake service. That service owns intake validation and publishes a ClaimIntakeSubmitted domain event. The legacy claims processing system still owns adjudication and payment. A transformation service maps intake events into the older claims model and submits them to the legacy system. CDC from the legacy platform then emits processing milestones, which are translated into business events such as ClaimRegistered, ClaimAssigned, and ClaimSettled.

At the same time, a curated claims timeline data product is assembled for service agents and analytics. Reconciliation checks ensure that every submitted intake either results in a corresponding registered claim or a tracked exception. That is a route with semantics, authority, and parity.
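The intake-to-legacy handoff above is an anti-corruption translation: the new context speaks its own language, and a mapping converts it into the older claims model. A sketch of that mapping, with the legacy field names invented for illustration:

```python
def to_legacy_claim(intake_event: dict) -> dict:
    """Map a ClaimIntakeSubmitted event into the legacy claims model.
    The legacy column names and codes here are hypothetical."""
    return {
        "CLM_TYPE": {"motor": "M", "property": "P"}.get(
            intake_event["line_of_business"], "O"),     # "O" = other
        "CLM_DESC": intake_event["description"][:60],   # legacy column is fixed-width
        "EXT_REF": intake_event["intake_id"],           # keep the new id for reconciliation
    }

event = {"line_of_business": "motor",
         "description": "Rear-end collision",
         "intake_id": "INT-1001"}
print(to_legacy_claim(event))
```

Note the `EXT_REF` field: carrying the new-world identifier into the legacy record is what makes the intake-to-registration reconciliation check possible at all.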

Customer servicing is different. Here the company resists the urge to make one “golden customer” service authoritative for everything. Instead:

  • onboarding owns prospect-to-customer conversion
  • policy admin owns policy-holder status
  • CRM owns service interaction preferences
  • billing owns delinquency-related communication constraints

The platform routes domain facts into an enterprise customer interaction view, but authority remains distributed by bounded context. This is domain-driven design doing its proper job: acknowledging that “customer” is not one thing.

A simplified view:

(Diagram: simplified view of claims and customer servicing routes)

What changed here was not just technology. It was control. The insurer stopped pretending there was a single migration event called “go live.” Instead, it migrated claims intake, claims visibility, customer interaction routing, and fraud feeds as separate topologies. Some authorities moved. Some stayed put. The platform became a set of managed routes, not a single replacement box.

This is how large enterprises actually win: one meaningful capability at a time.

Operational Considerations

A routing architecture lives or dies operationally.

Observability

You need end-to-end tracing across APIs, streams, batch jobs, and materialized views. For each business entity, operators should be able to answer:

  • where did this fact originate?
  • what transformations did it undergo?
  • which consumers have received it?
  • is it pending reconciliation?
  • what is the current authoritative source?

If you cannot answer those questions quickly, incidents will become archaeology.

Contract governance

Schema registries help, but syntax is the easy part. You also need semantic ownership:

  • who owns this event?
  • what business invariant does it represent?
  • is it a domain fact or an integration convenience?
  • what is the deprecation path?

Teams are often very disciplined about Avro compatibility and strangely casual about meaning. The second one hurts more.
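Semantic ownership can be recorded alongside the schema. An illustrative contract entry in Python; the fields are assumptions, not the format of any actual registry product:

```python
# Illustrative contract metadata: the schema registry holds the syntax,
# but semantic ownership often has to live in a catalog entry like this.
CONTRACT = {
    "event": "ClaimSettled",
    "owner": "claims-processing",     # team accountable for the meaning
    "invariant": "claim has passed adjudication and payment is authorized",
    "kind": "domain_fact",            # not an integration convenience
    "deprecation": {"replaced_by": None, "sunset": None},
    "semantic_version": "2.1.0",      # bump major on meaning change, not just schema
}

def is_breaking(old_version: str, new_version: str) -> bool:
    """A major-version bump signals a semantic break, even when the
    serialized schema is technically backward compatible."""
    return old_version.split(".")[0] != new_version.split(".")[0]

print(is_breaking("2.1.0", "3.0.0"))  # True
print(is_breaking("2.1.0", "2.4.1"))  # False
```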

Idempotency and ordering

In progressive topologies, duplicates and replays are not edge cases. They are normal. Every critical consumer should be idempotent. Ordering guarantees should be explicit and narrow. If a business process requires total enterprise ordering, stop and rethink the design. You are probably smuggling a workflow engine into a topic.
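An idempotent consumer typically tracks a per-entity sequence number and ignores anything it has already seen. A minimal in-memory sketch; in production the seen-state would live in a durable store, not a dict:

```python
class IdempotentConsumer:
    """Suppress duplicates and stale replays using a per-entity sequence number."""

    def __init__(self) -> None:
        self.last_seq: dict[str, int] = {}
        self.applied: list[str] = []

    def handle(self, entity_id: str, seq: int, payload: str) -> bool:
        if self.last_seq.get(entity_id, -1) >= seq:
            return False                 # duplicate or out-of-order replay: ignore
        self.last_seq[entity_id] = seq
        self.applied.append(payload)     # apply the change exactly once
        return True

c = IdempotentConsumer()
print(c.handle("order-1", 1, "created"))   # True
print(c.handle("order-1", 1, "created"))   # False (duplicate delivery)
print(c.handle("order-1", 2, "shipped"))   # True
```

Note the scope: ordering is guaranteed only per entity, which is usually all a business process actually needs.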

Data quality and exception handling

Some discrepancies should block progression. Others should be quarantined and repaired. Define thresholds before migration begins. Otherwise every cutover meeting becomes an argument about whether 0.7% mismatch is acceptable.
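Thresholds agreed up front turn that cutover argument into a mechanical gate. A sketch with illustrative numbers:

```python
# Illustrative thresholds, agreed before migration starts.
BLOCK_AT = 0.01        # above 1% mismatch: halt the cutover step
QUARANTINE_AT = 0.001  # above 0.1%: continue, but quarantine mismatches for repair

def gate(mismatched: int, total: int) -> str:
    """Decide what a given mismatch rate means for the cutover."""
    rate = mismatched / total
    if rate > BLOCK_AT:
        return "block"
    if rate > QUARANTINE_AT:
        return "quarantine"
    return "proceed"

print(gate(7, 1000))   # 0.7% -> quarantine, not a meeting-long argument
print(gate(25, 1000))  # 2.5% -> block
```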

Cost discipline

Dual-running costs money. Streaming, storage, reconciliation jobs, lineage tooling, and duplicate views all add overhead. This is another reason to migrate by capability. If the overlap period has no expiration logic, the enterprise will quietly fund two platforms forever.

Tradeoffs

This pattern is strong, but not free.

The biggest tradeoff is obvious: you exchange a fantasy of simplicity for managed complexity.

A monolithic replacement looks cleaner on slides. Progressive topology looks messier because it admits reality: multiple authorities, overlapping routes, transitional states, and reconciliation loops. But the mess is real either way. This pattern simply puts it where you can see and govern it.

Other tradeoffs include:

  • More architecture upfront: You must think about domain boundaries and authority before implementation.
  • Longer overlap periods: Old and new systems coexist longer than executives expect.
  • Higher operational maturity needed: Streaming, observability, contract governance, and reconciliation are not optional.
  • Delayed “platform consolidation” optics: You may not retire legacy infrastructure as quickly as the finance deck hoped.
  • Better risk control: In return, failures are smaller, reversibility improves, and business continuity is far stronger.

That last tradeoff is usually worth it. Enterprises do not get points for dramatic cutovers. They get value from changing safely while revenue still depends on yesterday’s systems.

Failure Modes

A few failure modes appear again and again.

Treating CDC as domain truth

CDC is useful, sometimes indispensable, but raw table changes do not equal business events. If you publish low-level updates as enterprise facts, downstream systems will infer semantics differently and drift will spread.

Building a canonical model empire

The dream of one enterprise-wide canonical model is seductive and mostly wrong. Shared reference concepts matter, but forcing all bounded contexts through one model slows change and creates semantic compromises no team actually wants.

Ignoring reconciliation until cutover

By then it is too late. Reconciliation should start as soon as parallel routes exist. Otherwise the first real parity test arrives when business pressure is highest.

Migrating consumers before clarifying authority

Teams often move dashboards, APIs, and derived products onto a new platform while the source authority remains ambiguous. This creates a polished facade over semantic instability. Incidents then become political, because every team can claim the mismatch came from someone else.

Underestimating temporary code

Transformation and reconciliation services are often treated as scaffolding. In reality they may live for years. Build them like production assets.

When Not To Use

This pattern is not universal.

Do not use progressive migration topology if the domain is genuinely simple, the data estate is small, and one system can reasonably remain authoritative without elaborate routing. A startup with a handful of services does not need enterprise semantic traffic management. It needs clarity and speed.

Also avoid this pattern when the business process demands strict centralized transactional consistency across all participating domains and cannot tolerate asynchronous boundaries. In those cases you may need a more tightly integrated operational architecture, at least for the core transaction path.

And if the organization lacks basic operational discipline—no ownership model, weak observability, poor contract management, no tolerance for dual-run cost—then this pattern will expose those weaknesses brutally. Better to fix the operating model first than to install Kafka and hope maturity appears later.

Related Patterns

Several patterns complement this one:

  • Strangler Fig Pattern: for progressive replacement of legacy capabilities
  • Bounded Contexts: to define authority and semantic boundaries
  • CQRS: useful where write models and read projections need separation
  • Event Sourcing: sometimes helpful, but only where domain history truly matters
  • Data Products / Data Mesh thinking: valuable if ownership aligns to domains
  • Anti-Corruption Layer: essential when translating between old and new models
  • Change Data Capture: effective as a migration input, not as the whole design
  • Materialized Views: crucial for stabilizing downstream consumption during change

These patterns are not alternatives so much as ingredients. Progressive migration topology is the assembly logic that makes them work together.

Summary

Your data platform is not a place. It is a set of routes.

Once you see that, a lot of enterprise modernization advice starts to look incomplete. The question is not whether to use Kafka, microservices, a lakehouse, CDC, or domain-driven design. The question is how business meaning travels across them while authority shifts from old systems to new ones.

That is the work.

A good progressive migration topology does a few hard things well:

  • it respects bounded contexts
  • it makes source authority explicit
  • it separates domain facts from technical changes
  • it treats reconciliation as core architecture
  • it supports strangler-style migration by capability
  • it moves consumers and authority in steps, not leaps

Most of all, it accepts a truth many programs try to avoid: migration is an overlap problem before it is a replacement problem.

Design for the overlap, and the replacement becomes possible.

Ignore it, and your “modern data platform” will become just another place where meaning goes to get lost.

Frequently Asked Questions

What is a data mesh?

A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.

What is a data product in architecture terms?

A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.

How does data mesh relate to enterprise architecture?

Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.