The modern data platform is not a lake, not a warehouse, not a mesh, and certainly not a dumping ground with prettier dashboards. At scale, it becomes something much more operational and much more political: a routing engine.
That phrase matters.
Most enterprises still talk about data platforms as storage choices. Lake or warehouse. Batch or streaming. Medallion or mesh. But those labels are mostly the furniture. The real architecture problem is deciding how facts move, where they are allowed to land, which meaning they carry when they arrive, and who is trusted to interpret them. In other words: routing.
A routing engine does not merely transport bytes. It directs business events, reference data, derived facts, and analytical projections across an organization with rules, timing guarantees, lineage, ownership, and reconciliation. It decides whether a customer change should become a CRM update, a risk signal, a finance adjustment, a customer-360 enrichment, or all four. It decides whether the order event is authoritative, advisory, or suspect. It decides whether a data product should be pushed, pulled, materialized, replayed, or recomputed.
And this is where most modern data platform programs go wrong. They buy a stack before they understand the topology. They centralize too early, federate too vaguely, and stream data that still has no stable business meaning. They modernize plumbing while leaving semantics trapped in the old estate.
A modern data platform succeeds when it treats topology as progressive, not fixed. It starts from the flows the business actually needs, then evolves routing paths, ownership boundaries, and reconciliation mechanisms over time. That is what I mean by progressive topology: an architecture that can begin in the messy reality of ERP extracts, point-to-point integrations, and Kafka islands, then move deliberately toward governed, domain-aware, event-driven data exchange without requiring a big-bang rewrite.
This is not theory. It is the difference between a platform that becomes the enterprise nervous system and one that becomes an expensive swamp with lineage.
Context
Enterprise data architecture has spent the last twenty years swinging between extremes.
First came centralization. Put everything in one warehouse. Build enterprise canonical models. Force all consumers through one BI layer. It promised consistency and delivered bottlenecks.
Then came decentralization. Data lakes, self-service analytics, domain ownership, data mesh, event streaming. This promised agility and often delivered fragmentation with better branding.
Now most large organizations live in the in-between. They have cloud warehouses, object storage, Kafka or equivalent streaming, dozens or hundreds of SaaS systems, microservices emitting domain events, and a long tail of batch jobs nobody wants to touch. The platform team is asked to make it all coherent. Usually under the slogan of “real-time data.”
But real-time is rarely the core problem. Coherence is.
The hard part is not storing data cheaply or processing it quickly. The hard part is preserving domain semantics as information moves between operational systems, analytical platforms, and machine learning pipelines. The hard part is changing topology without breaking the business. The hard part is migration.
That is why the data platform should be viewed less like a repository and more like an adaptive routing fabric.
Problem
The typical enterprise data estate is a sedimentary rock of past decisions.
An ERP exports batches overnight. A CRM pushes webhook notifications. A billing service emits Kafka events. A compliance database is updated by file drop. Data engineers ingest all of it into a cloud platform, where transformation pipelines standardize fields and create marts. Somewhere else, product teams build microservices and expose APIs. Somewhere else again, analytics teams reverse-engineer business logic because the source systems never agreed on what “active customer” means.
This creates four recurring problems.
First, semantic drift. The same concept travels through the organization and changes meaning along the way. “Order” in e-commerce is not “order” in finance and not “order” in fulfillment. Yet pipelines often route them as though they were interchangeable.
Second, topology lock-in. Once a platform is organized around a central warehouse, or a lakehouse, or a single stream backbone, every integration begins to reflect that topology. Change later becomes expensive because routing assumptions are embedded everywhere.
Third, operational ambiguity. When reports disagree with operational systems, who is right? Is the warehouse delayed, the event lost, the source corrected retroactively, or the consumer applying stale business rules? Without explicit reconciliation patterns, every discrepancy becomes an incident and a political argument.
Fourth, migration paralysis. Enterprises know they need to move away from brittle ETL estates and nightly file transfers, but they cannot stop the business to rebuild semantics from scratch. So they add new pipelines beside old ones. The result is a dual-running platform without a migration doctrine.
The visible symptom is pipeline sprawl. The deeper disease is lack of routing architecture.
Forces
A useful architecture starts by respecting the forces at play, not pretending they can be wished away.
Domain semantics are stubborn
Domain-driven design matters here because data is not neutral. Facts belong to bounded contexts. A Customer in sales, a Party in master data, an Account Holder in banking, and a Subscriber in telecom may overlap but are not identical. If the platform routes all of them into a single “customer” table too early, it destroys information in the name of simplification.
The platform must carry meaning, not just payloads.
Operational and analytical worlds move differently
Operational systems optimize for transaction integrity and bounded responsibilities. Analytical systems optimize for broad queryability and historical perspective. Streaming platforms optimize for event propagation. These are different kinds of truth. Forcing them into a single access pattern is architectural vanity.
Enterprises migrate in layers, never in one move
No serious bank, retailer, insurer, or manufacturer replaces its integration and reporting landscape in one program. Migration happens incrementally: source by source, domain by domain, capability by capability. That means the target architecture must tolerate mixed modes for years.
Governance is necessary, central control is not
An enterprise needs lineage, quality, retention, access control, and policy enforcement. But that does not mean every transformation or schema decision should be owned by one central team. Good platforms centralize constraints and decentralize domain decisions.
Reconciliation is non-negotiable
In a distributed platform, facts arrive late, out of order, duplicated, corrected, or partially enriched. Reconciliation is not an afterthought for audit teams. It is a first-class design concern. Without it, streaming architecture turns into high-speed confusion.
Solution
The solution is to treat the data platform as a progressive routing engine.
Progressive because topology evolves.
Routing engine because the primary job is controlled movement of meaning.
This model has five core ideas.
1. Route by business meaning, not by technology tier
Do not start with “all data lands in the lake” or “all domain events go to Kafka.” Start with what is being routed:
- authoritative business events
- reference data
- state snapshots
- derived analytical facts
- policy decisions
- quality signals
- correction events
These move differently because they mean different things.
A shipment event from fulfillment should probably be routed through a stream backbone for downstream operational consumers and also materialized into analytical storage. A product hierarchy update may need slower, controlled propagation with versioning. A financial correction may need explicit compensation and reconciliation. One topology does not fit all.
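To make the idea concrete, here is a minimal Python sketch of routing keyed by meaning rather than technology tier. The kinds and rules below are illustrative assumptions, not features of any particular product:

```python
# Hypothetical routing policy: each kind of information carries its own
# movement rules, instead of one tier-based default like "everything lands
# in the lake" or "everything is a topic".
ROUTING_POLICY = {
    "business_event":   {"transport": "stream", "materialize": True,  "versioned": False},
    "reference_data":   {"transport": "batch",  "materialize": True,  "versioned": True},
    "state_snapshot":   {"transport": "batch",  "materialize": True,  "versioned": False},
    "correction_event": {"transport": "stream", "materialize": True,  "versioned": True},
}


def route(kind: str) -> dict:
    """Resolve the movement rules for a piece of information by its meaning."""
    if kind not in ROUTING_POLICY:
        # Unknown kinds are an architecture question, not a default.
        raise ValueError(f"no routing policy defined for kind: {kind}")
    return ROUTING_POLICY[kind]
```

The point is the lookup key: the kind of fact being moved, not the system it came from.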
2. Preserve bounded contexts before harmonizing
A good platform does not flatten semantics at ingress. It ingests data with context intact, then creates explicit downstream projections for cross-domain use.
This is deeply aligned with domain-driven design. Bounded contexts are not an inconvenience to be cleaned away. They are the only reason the enterprise can evolve systems independently. Harmonization should happen through published contracts, conformed dimensions where justified, and curated data products—not through semantic collapse at the edge.
3. Separate transport, contract, and projection
Many enterprises entangle these:
- transport: Kafka topic, file, API, CDC stream
- contract: schema, event definition, business meaning
- projection: warehouse table, search index, feature store, serving view
These should be architected separately. The same contract may be transported differently over time. The same event stream may feed multiple projections. The same warehouse table may be regenerated from a replay. Decoupling them is what allows progressive migration.
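A minimal sketch of that separation, with hypothetical names: the contract carries meaning, transport bindings carry movement, projections carry landing zones. Note that swapping a transport never touches the contract:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    # Business meaning: name, version, and shape, independent of movement.
    name: str
    version: int
    fields: tuple


@dataclass(frozen=True)
class TransportBinding:
    # How the contract moves today; replaceable without touching the contract.
    contract: Contract
    mechanism: str  # e.g. "kafka", "cdc", "file", "api"
    address: str


@dataclass(frozen=True)
class Projection:
    # Where the contract lands for a specific audience.
    contract: Contract
    target: str  # e.g. "warehouse_table", "search_index", "feature_store"


order_v2 = Contract("OrderPlaced", 2, ("order_id", "customer_id", "total"))

# Same contract, two transports: modern stream and the legacy nightly drop.
kafka_path = TransportBinding(order_v2, "kafka", "orders.placed.v2")
legacy_path = TransportBinding(order_v2, "file", "nightly-orders-extract")

# Same stream, two projections for different consumers.
mart = Projection(order_v2, "warehouse_table")
index = Projection(order_v2, "search_index")
```

Because the three concerns are separate objects, a migration can retire `legacy_path` without renegotiating the contract or rebuilding the projections.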
4. Make reconciliation part of the route
Every important route should define:
- expected latency
- ordering assumptions
- deduplication strategy
- correction behavior
- replay capability
- authoritative source
- discrepancy handling
If those are absent, you do not have a route. You have hope.
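The checklist can be made executable. A hedged sketch: a route specification in which any unanswered question disqualifies the route (field values are invented examples):

```python
from dataclasses import dataclass, fields


@dataclass
class RouteSpec:
    # A route is only a route when every one of these has an answer.
    expected_latency: str      # e.g. "p95 < 5 minutes"
    ordering: str              # e.g. "per order_id"
    deduplication: str         # e.g. "idempotency key = event_id"
    correction_behavior: str   # e.g. "compensating event"
    replay: str                # e.g. "30-day topic retention"
    authoritative_source: str  # e.g. "OMS"
    discrepancy_handling: str  # e.g. "variance workbench, finance sign-off"


def is_route(spec: RouteSpec) -> bool:
    """Empty answers mean you do not have a route. You have hope."""
    return all(getattr(spec, f.name) for f in fields(spec))


good = RouteSpec("p95 < 5 minutes", "per order_id", "event_id key",
                 "compensating event", "30-day retention", "OMS",
                 "variance workbench")
hopeful = RouteSpec("p95 < 5 minutes", "", "", "", "", "OMS", "")
```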
5. Design for strangler migration, not heroic replacement
The old platform will not vanish on command. The new routing engine must coexist with legacy ETL, MDM hubs, and reporting marts. Progressive topology means building a routing layer that can siphon capabilities away from the old estate over time, while dual-running and reconciling outputs until confidence is earned.
Architecture
At a high level, the architecture looks less like a pyramid and more like a transport map.
The crucial component is not a single product called “routing engine.” It is an architectural capability composed of several things:
- schema and contract management
- metadata and lineage
- policy enforcement
- event and batch orchestration
- quality rules
- routing rules
- replay and backfill support
- reconciliation workflows
In some enterprises this is built from Kafka, CDC tooling, workflow orchestration, a data catalog, and a warehouse/lakehouse. In others it includes integration platforms, stream processing engines, and observability layers. The exact products matter far less than the separation of responsibilities.
Domain-aligned routing
The platform should be organized around domain streams and data products, not generic ingestion pipelines.
For example:
- Order Management publishes OrderPlaced, OrderAllocated, OrderCancelled
- Fulfillment publishes ShipmentPacked, ShipmentDispatched, ShipmentDelivered
- Finance publishes InvoiceIssued, PaymentApplied, CreditNoteRaised
These events are not just technical messages. They are expressions of domain behavior inside bounded contexts. The routing engine then decides what to do with them:
- forward to operational subscribers
- materialize historical facts
- join with reference data for analytics
- produce customer notifications
- trigger compliance checks
- reconcile against downstream balances
That is a much healthier model than shoveling raw database changes into a lake and hoping the semantics emerge later.
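A sketch of that decision, with invented event and subscriber names: the routing engine fans one domain event out to deliberate destinations, and quarantines anything it does not recognize rather than silently dropping it:

```python
# Hypothetical subscription table: one domain event, several deliberate
# destinations, decided centrally rather than inferred by each consumer.
SUBSCRIPTIONS = {
    "ShipmentDispatched": ["notify_customer", "update_order_status", "materialize_fact"],
    "InvoiceIssued":      ["materialize_fact", "reconcile_balances", "compliance_check"],
    "OrderCancelled":     ["update_order_status", "materialize_fact"],
}


def fan_out(event_type: str) -> list:
    """Route a domain event to its subscribers; unknown events are quarantined."""
    return SUBSCRIPTIONS.get(event_type, ["quarantine"])
```

The quarantine default matters: a raw database change that was never designed as a business event ends up inspected, not interpreted.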
Data products as projections, not dumping zones
The phrase “data product” has been abused into near meaninglessness. In a progressive topology, a data product is a deliberately designed projection with:
- a defined audience
- explicit semantics
- quality SLOs
- ownership
- discoverability
- change policy
Some data products are operationally adjacent, updated continuously from event streams. Others are analytical, refreshed in micro-batches or daily. The platform should support both, but never confuse them.
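Those properties can be captured as a descriptor. A sketch with hypothetical field values; the point is that audience, semantics, SLO, ownership, and mode are declared rather than implied:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    audience: str       # defined audience
    semantics: str      # explicit meaning, in the producing domain's terms
    freshness_slo: str  # quality SLO the owner is accountable for
    owner: str          # a team, never "shared"
    change_policy: str  # how consumers learn about change
    mode: str           # "operational" or "analytical" -- never confused


household = DataProduct(
    name="household_projection",
    audience="marketing analytics",
    semantics="households inferred from Party identity matching",
    freshness_slo="refreshed daily by 06:00",
    owner="customer-data-domain",
    change_policy="additive only; breaking changes need 90-day notice",
    mode="analytical",
)
```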
The role of Kafka and microservices
Kafka fits well in this architecture when the enterprise needs durable, replayable event distribution across domains and services. It is particularly valuable where microservices publish business events that multiple consumers need independently.
But Kafka is not the platform. It is one transport backbone.
A common failure is to make Kafka the universal hammer: every dataset becomes a topic, every transformation a stream job, every consumer responsible for reconstructing state. That way lies replay debt and operational exhaustion. Some things belong in streams. Some belong in tables. Mature architecture uses both and routes between them intentionally.
Domain semantics discussion
This is where architects earn their keep.
The modern data platform fails when it becomes a machine for stripping nouns of their context. The data team says “customer,” but sales means prospect, service means contract owner, billing means invoiced party, risk means regulated legal entity, and marketing means profile. You can standardize names all day and still not have semantic consistency.
Domain-driven design offers a better lens. Ask:
- which bounded context owns this concept?
- what invariants apply there?
- what event expresses a meaningful state change?
- what can be safely published outside the context?
- what must be translated for other domains?
This changes the architecture.
Instead of one canonical enterprise customer model forced on everyone, you may have:
- a Party master for identity and legal hierarchy
- a Customer Account model for billing
- a Subscriber model for telecom products
- a Household projection for marketing analytics
The routing engine moves facts between these through explicit translations and projections. That sounds messier than canonical centralization, but it is actually more honest and more evolvable.
A platform that ignores bounded contexts turns every integration into a semantic negotiation conducted in SQL.
Migration Strategy
Progressive topology comes alive during migration.
You do not replace the old warehouse, ETL estate, MDM platform, and integrations in one move. You strangle them capability by capability. The migration strategy should be based on routes, not systems.
Step 1: Identify critical information routes
Map the routes that matter most:
- order-to-cash
- customer identity propagation
- product and pricing distribution
- finance close feeds
- compliance and audit reporting
Then classify them:
- system of record
- current transport
- latency need
- data quality pain
- reconciliation pain
- consumer criticality
This quickly reveals where progressive migration will generate value.
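One way to operationalize that classification, under the assumption of simple 1–3 scores for pain and criticality (the routes and scores below are invented):

```python
# Hypothetical route inventory: higher pain and criticality migrate first.
routes = [
    {"name": "order-to-cash",        "quality_pain": 3, "reconciliation_pain": 3, "criticality": 3},
    {"name": "product-distribution", "quality_pain": 2, "reconciliation_pain": 1, "criticality": 2},
    {"name": "compliance-reporting", "quality_pain": 1, "reconciliation_pain": 2, "criticality": 3},
]


def migration_priority(route: dict) -> int:
    """Crude additive score; a real assessment would weight these."""
    return route["quality_pain"] + route["reconciliation_pain"] + route["criticality"]


ordered = sorted(routes, key=migration_priority, reverse=True)
```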
Step 2: Create parallel routes with explicit contracts
Introduce the new routing path beside the old one. For example, continue nightly ETL from the order system into the warehouse, but also publish order domain events through Kafka and materialize a new analytical projection. Dual-run both, compare outputs, and instrument discrepancies.
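Instrumenting discrepancies can start very simply. A sketch that compares per-metric aggregates from the legacy and modern route and returns variances as data rather than arguments (names and tolerance are illustrative):

```python
def compare_dual_run(legacy: dict, modern: dict, tolerance: float = 0.0) -> dict:
    """Compare per-key aggregates produced by the legacy and modern route.

    Returns a dict of discrepancies so they can be instrumented and
    trended over dual-run cycles, not debated in meetings.
    """
    discrepancies = {}
    for key in set(legacy) | set(modern):
        old, new = legacy.get(key), modern.get(key)
        if old is None or new is None:
            discrepancies[key] = {"kind": "missing", "legacy": old, "modern": new}
        elif abs(old - new) > tolerance:
            discrepancies[key] = {"kind": "variance", "legacy": old, "modern": new}
    return discrepancies
```

Example use: feed it the daily order counts and net sales from both paths and alert only on entries it returns.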
Step 3: Reconcile before switching consumers
Do not cut consumers over because the new path looks cleaner. Cut them over when reconciliation proves semantic and numerical equivalence where equivalence is required—or explains intentional differences where it is not.
Step 4: Move consumers incrementally
First move low-risk analytical consumers.
Then downstream operational consumers with fallback.
Then decommission old transformations.
Last, remove old extracts and integration dependencies.
This is a strangler pattern applied to data topology.
Step 5: Keep historical replay and backfill in scope
Migration always hits the awkward question: what about history? If the new route starts today, can it support three years of analytics? Can it replay corrected events? Can consumers compare pre- and post-cutover metrics? If not, migration is only half-designed.
A routing engine worthy of the name supports backfill and replay as first-class capabilities.
Reconciliation discussion
Reconciliation is the quiet backbone of credible architecture.
Most platform decks mention lineage, quality, and observability. Few talk enough about the practical mechanics of proving that distributed routes still represent the business faithfully.
There are several kinds of reconciliation:
- record reconciliation: do source and target contain the same business objects?
- balance reconciliation: do aggregates match, such as premiums, invoices, payments, inventory units?
- state reconciliation: do lifecycle states align across systems?
- temporal reconciliation: are differences due to timing windows rather than errors?
- semantic reconciliation: are we comparing concepts that genuinely correspond?
This matters especially in Kafka and microservices environments. Events can be delayed, duplicated, out of order, or corrected after publication. CDC may capture technical changes not intended as business facts. Batch snapshots may “win” numerically while losing important event history.
A mature platform uses reconciliation services that:
- consume from both legacy and modern paths
- compare business keys and aggregates
- classify variances
- trigger repair workflows or compensations
- produce audit evidence
Reconciliation is how progressive migration becomes trustworthy instead of theatrical.
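Variance classification is the heart of that workflow. A deliberately small sketch: differences inside the timing window are treated as temporal and re-checked, everything else escalates to repair (the window size is an assumption):

```python
def classify_variance(legacy_value, modern_value,
                      modern_lag_seconds: float,
                      window_seconds: float = 300.0) -> str:
    """Classify a variance between two routes before raising an incident.

    Timing differences inside the reconciliation window are expected in
    event-driven paths; everything else needs a repair workflow.
    """
    if legacy_value == modern_value:
        return "match"
    if modern_lag_seconds <= window_seconds:
        return "temporal"  # likely a timing window: re-check after it closes
    return "genuine"       # escalate: repair workflow or compensation
```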
Enterprise Example
Consider a multinational retailer modernizing order-to-cash and customer analytics.
The estate is familiar. SAP for finance, a commercial OMS for online orders, store systems with delayed batch uploads, a CRM, and a cloud warehouse fed by nightly ETL. Meanwhile, digital teams built microservices around pricing, inventory visibility, and customer notifications, all using Kafka. Leadership wants “real-time customer 360” and “daily financial certainty.” Those two phrases usually foreshadow an expensive argument.
The first instinct was centralization: stream everything into the lakehouse, define a common customer and order model, then rebuild downstream marts. It looked elegant on slides. It would have failed in practice.
Why? Because order semantics differed sharply:
- OMS order means commercial intent
- fulfillment shipment means physical movement
- finance invoice means legal recognition
- returns system means reverse logistics
- CRM order history means customer-visible narrative
Forcing one model at ingestion would have hidden material business distinctions.
Instead, the retailer adopted a progressive topology.
The OMS, fulfillment services, and pricing services published business events into Kafka with explicit domain contracts. SAP and store systems continued to provide batch and CDC feeds. The routing layer preserved each bounded context, then built several projections:
- operational customer notification streams
- inventory and order status views for digital channels
- financial settlement feeds aligned to finance semantics
- analytical customer behavior marts for marketing
- reconciliation datasets for order, shipment, invoice, and return matching
Customer identity was especially tricky. Marketing wanted one customer view; finance needed billing party accuracy; stores often had anonymous transactions. So the platform did not pretend there was one universal customer truth. It established a Party identity product, a Customer Account billing product, and a Household marketing projection. Each had different matching logic, quality metrics, and change policy.
Migration happened over eighteen months. Nightly ETL remained in place while event-driven routes were introduced for digital orders. Reconciliation compared order counts, net sales, refunds, and shipment status across old and new paths. When discrepancies appeared, they were often semantic, not technical: cancelled orders after allocation, partial shipments, tax adjustments, delayed store uploads. Because the platform had preserved context, these were diagnosable.
Eventually, digital analytics consumers moved to the new projections first. Then customer service status screens. Finance stayed on the legacy route longest, switching only after month-end reconciliation had passed several cycles. The retailer did not “replace the warehouse” in one stroke. It rerouted the business, one route at a time.
That is what successful modernization looks like in real enterprises: less revolution, more controlled redirection.
Operational Considerations
A routing engine architecture raises the bar operationally.
Observability must follow routes, not just jobs
Pipeline health is not enough. You need route health:
- event lag by contract
- delivery success by consumer class
- schema drift
- reconciliation variance
- freshness by data product
- replay backlog
- policy violations
If a dashboard says the pipeline is green while finance numbers are wrong, you are measuring the wrong thing.
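Route health can be expressed directly against the freshness SLO of a contract or data product rather than against job completion. A sketch, with the degradation threshold as an assumption:

```python
import datetime as dt


def route_health(last_event_at: dt.datetime,
                 freshness_slo: dt.timedelta,
                 now: dt.datetime) -> str:
    """Green means the route met its freshness SLO, not that a job ran."""
    lag = now - last_event_at
    if lag <= freshness_slo:
        return "healthy"
    if lag <= 2 * freshness_slo:   # assumed threshold: 2x SLO before breach
        return "degraded"
    return "breached"
```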
Schema evolution needs discipline
Domain contracts evolve. They always do. Additive changes are manageable. Breaking semantic changes are dangerous. The platform should support versioning, compatibility checks, and consumer impact analysis. Schema registries help, but governance must include meaning, not just shape.
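A crude shape-only check illustrates the baseline; real registries also verify types, defaults, and ideally meaning, which this sketch does not:

```python
def change_kind(old_fields: set, new_fields: set) -> str:
    """Classify a contract change by field shape only.

    Removals break existing consumers; additions usually do not.
    Semantic changes (same field, new meaning) are invisible here,
    which is exactly why governance cannot stop at shape.
    """
    if old_fields - new_fields:
        return "breaking"   # a field consumers may rely on disappeared
    if new_fields - old_fields:
        return "additive"
    return "unchanged"
```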
Ownership must be explicit
For each route and product, know:
- source owner
- contract owner
- projection owner
- consumer owner
- reconciliation owner
Shared ownership is often architecture shorthand for “nobody can fix it quickly.”
Security and privacy are routing concerns
Sensitive data classification, tokenization, regional residency, retention controls, and purpose limitation all influence where data is allowed to flow. Security is not a wrapper around the platform. It is part of routing policy.
Tradeoffs
This architecture is powerful, but not free.
The first tradeoff is complexity versus honesty. Progressive topology exposes the real complexity of enterprise semantics. A simpler-looking centralized model may be easier to explain. It is also more likely to lie.
The second is speed versus governance. Teams can move quickly when they publish events and create projections with autonomy. But without contract discipline and routing policy, the platform devolves into distributed chaos.
The third is duplication versus decoupling. Multiple projections of similar business facts may exist. That offends people who grew up worshipping single sources of truth. But some duplication is the price of decoupling, performance, and bounded-context integrity.
The fourth is streaming ambition versus operational sanity. Real-time routing is seductive. Yet many business capabilities are perfectly served by hourly or daily propagation. Use streaming where timing changes outcomes, not where it merely flatters architecture.
The fifth is migration duration versus risk reduction. Progressive strangler migration takes time. Big-bang replacement promises speed and usually delivers outages, mistrust, and rollback plans nobody rehearsed.
Failure Modes
There are predictable ways this goes wrong.
Treating every change event as a business event
CDC is useful, but a row update is not automatically a meaningful domain fact. If you route technical mutations as though they were business semantics, downstream consumers will infer nonsense.
Inventing a canonical model too early
The enterprise canonical model is often a polite form of semantic imperialism. It smooths over differences that matter, then forces every team to live with the least accurate shared language.
Building a streaming platform without replay governance
Replay sounds wonderful until consumers recalculate side effects, duplicate notifications, or reload broken history. Replay requires contract, idempotency, and compensation strategy.
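The minimum defense is idempotent consumption. An in-memory sketch; a real consumer would persist seen event ids durably and pair them with compensation logic:

```python
class IdempotentConsumer:
    """Replay-safe consumer sketch: side effects fire at most once per event_id."""

    def __init__(self):
        self.seen = set()
        self.effects = []

    def handle(self, event: dict) -> bool:
        """Process an event; return False if it was a replayed duplicate."""
        event_id = event["event_id"]
        if event_id in self.seen:
            return False  # replayed duplicate: no second notification
        self.seen.add(event_id)
        self.effects.append(event["type"])  # stand-in for the real side effect
        return True
```

Usage: replaying the same stream through this consumer recomputes nothing and notifies nobody twice, which is the contract replay governance has to guarantee.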
Ignoring reconciliation because “the stream is the truth”
No distributed transport makes truth automatic. If the platform cannot prove alignment with source and downstream obligations, it is not enterprise-grade.
Over-federating ownership
Domain ownership is good. Giving every team its own naming conventions, quality standards, and access policies is not. Federation without platform constraints is just decentralized entropy.
When Not To Use
This pattern is not for every organization.
Do not use a progressive routing-engine approach if:
- your estate is small and a straightforward warehouse solves the problem
- your domains are not meaningfully separated
- your data movement is mostly analytical batch reporting with low change frequency
- you lack the organizational maturity for contract ownership and operational discipline
- the business does not need incremental migration and could genuinely replace a contained legacy stack cleanly
A routing-engine architecture shines in large enterprises with multiple operational systems, mixed batch and streaming needs, domain complexity, and migration constraints. In a mid-sized company with a handful of systems and one central analytics team, it may be unnecessary ceremony.
Architecture should solve the problem you have, not advertise your sophistication.
Related Patterns
Several adjacent patterns connect naturally to this one.
Strangler Fig Pattern
Essential for progressive migration. Replace old routes piece by piece, not all at once.
Event-Driven Architecture
Useful for propagating domain facts and enabling multiple subscribers. Best when paired with strong contract and reconciliation discipline.
CQRS and Materialized Views
A good fit for creating domain-specific projections from shared event streams.
Data Mesh
Helpful as an ownership model if interpreted pragmatically. Dangerous when treated as a license for every domain to invent its own platform.
Change Data Capture
Valuable as a migration bridge and integration mechanism, but should not be confused with domain event design.
Master Data Management
Still relevant where identity, reference data, and hierarchies need stewardship. But MDM should participate in routing, not monopolize semantics.
Summary
The modern data platform is a routing engine because the enterprise problem is not where data sits. It is how meaning moves.
That shift in perspective changes everything. It pulls architecture away from storage-centric thinking and toward domain semantics, contracts, reconciliation, and migration. It embraces bounded contexts instead of flattening them. It recognizes Kafka and microservices as useful components, not universal answers. It accepts that progressive strangler migration is the normal path for large organizations. And it treats reconciliation as a core operational capability, not an embarrassing afterthought.
Most of all, it gives architects a more honest mental model.
A platform is not modern because it has streaming, a lakehouse, or a catalog. It is modern when it can route the right facts, with the right meaning, to the right consumers, at the right time, with enough evidence to trust the result—and evolve that topology without putting the business on hold.
That is progressive topology.
And in the enterprise, progress is not about moving fast. It is about changing the map without losing the business along the way.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralised data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.