The modern data platform is not a lake, not a warehouse, not a mesh, and certainly not a dumping ground with prettier dashboards. At scale, it becomes something much more operational and much more political: a routing engine.
That phrase matters.
Most enterprises still talk about data platforms as storage choices. Lake or warehouse. Batch or streaming. Medallion or mesh. But those labels are mostly the furniture. The real architecture problem is deciding how facts move, where they are allowed to land, which meaning they carry when they arrive, and who is trusted to interpret them. In other words: routing.
A routing engine does not merely transport bytes. It directs business events, reference data, derived facts, and analytical projections across an organization with rules, timing guarantees, lineage, ownership, and reconciliation. It decides whether a customer change should become a CRM update, a risk signal, a finance adjustment, a customer-360 enrichment, or all four. It decides whether the order event is authoritative, advisory, or suspect. It decides whether a data product should be pushed, pulled, materialized, replayed, or recomputed.
And this is where most modern data platform programs go wrong. They buy a stack before they understand the topology. They centralize too early, federate too vaguely, and stream data that still has no stable business meaning. They modernize plumbing while leaving semantics trapped in the old estate.
A modern data platform succeeds when it treats topology as progressive, not fixed. It starts from the flows the business actually needs, then evolves routing paths, ownership boundaries, and reconciliation mechanisms over time. That is what I mean by progressive topology: an architecture that can begin in the messy reality of ERP extracts, point-to-point integrations, and Kafka islands, then move deliberately toward governed, domain-aware, event-driven data exchange without requiring a big-bang rewrite.
This is not theory. It is the difference between a platform that becomes the enterprise nervous system and one that becomes an expensive swamp with lineage.
Context
Enterprise data architecture has spent the last twenty years swinging between extremes.
First came centralization. Put everything in one warehouse. Build enterprise canonical models. Force all consumers through one BI layer. It promised consistency and delivered bottlenecks.
Then came decentralization. Data lakes, self-service analytics, domain ownership, data mesh, event streaming. This promised agility and often delivered fragmentation with better branding.
Now most large organizations live in the in-between. They have cloud warehouses, object storage, Kafka or equivalent streaming, dozens or hundreds of SaaS systems, microservices emitting domain events, and a long tail of batch jobs nobody wants to touch. The platform team is asked to make it all coherent. Usually under the slogan of “real-time data.”
But real-time is rarely the core problem. Coherence is.
The hard part is not storing data cheaply or processing it quickly. The hard part is preserving domain semantics as information moves between operational systems, analytical platforms, and machine learning pipelines. The hard part is changing topology without breaking the business. The hard part is migration.
That is why the data platform should be viewed less like a repository and more like an adaptive routing fabric.
Problem
The typical enterprise data estate is a sedimentary rock of past decisions.
An ERP exports batches overnight. A CRM pushes webhook notifications. A billing service emits Kafka events. A compliance database is updated by file drop. Data engineers ingest all of it into a cloud platform, where transformation pipelines standardize fields and create marts. Somewhere else, product teams build microservices and expose APIs. Somewhere else again, analytics teams reverse-engineer business logic because the source systems never agreed on what “active customer” means.
This creates four recurring problems.
First, semantic drift. The same concept travels through the organization and changes meaning along the way. “Order” in e-commerce is not “order” in finance and not “order” in fulfillment. Yet pipelines often route them as though they were interchangeable.
Second, topology lock-in. Once a platform is organized around a central warehouse, or a lakehouse, or a single stream backbone, every integration begins to reflect that topology. Change later becomes expensive because routing assumptions are embedded everywhere.
Third, operational ambiguity. When reports disagree with operational systems, who is right? Is the warehouse delayed, the event lost, the source corrected retroactively, or the consumer applying stale business rules? Without explicit reconciliation patterns, every discrepancy becomes an incident and a political argument.
Fourth, migration paralysis. Enterprises know they need to move away from brittle ETL estates and nightly file transfers, but they cannot stop the business to rebuild semantics from scratch. So they add new pipelines beside old ones. The result is a dual-running platform without a migration doctrine.
The visible symptom is pipeline sprawl. The deeper disease is lack of routing architecture.
Forces
A useful architecture starts by respecting the forces at play, not pretending they can be wished away.
Domain semantics are stubborn
Domain-driven design matters here because data is not neutral. Facts belong to bounded contexts. A Customer in sales, a Party in master data, an Account Holder in banking, and a Subscriber in telecom may overlap but are not identical. If the platform routes all of them into a single “customer” table too early, it destroys information in the name of simplification.
The platform must carry meaning, not just payloads.
Operational and analytical worlds move differently
Operational systems optimize for transaction integrity and bounded responsibilities. Analytical systems optimize for broad queryability and historical perspective. Streaming platforms optimize for event propagation. These are different kinds of truth. Forcing them into a single access pattern is architectural vanity.
Enterprises migrate in layers, never in one move
No serious bank, retailer, insurer, or manufacturer replaces its integration and reporting landscape in one program. Migration happens incrementally: source by source, domain by domain, capability by capability. That means the target architecture must tolerate mixed modes for years.
Governance is necessary, central control is not
An enterprise needs lineage, quality, retention, access control, and policy enforcement. But that does not mean every transformation or schema decision should be owned by one central team. Good platforms centralize constraints and decentralize domain decisions.
Reconciliation is non-negotiable
In a distributed platform, facts arrive late, out of order, duplicated, corrected, or partially enriched. Reconciliation is not an afterthought for audit teams. It is a first-class design concern. Without it, streaming architecture turns into high-speed confusion.
Solution
The solution is to treat the data platform as a progressive routing engine.
Progressive because topology evolves.
Routing engine because the primary job is controlled movement of meaning.
This model has five core ideas.
1. Route by business meaning, not by technology tier
Do not start with “all data lands in the lake” or “all domain events go to Kafka.” Start with what is being routed:
- authoritative business events
- reference data
- state snapshots
- derived analytical facts
- policy decisions
- quality signals
- correction events
These move differently because they mean different things.
A shipment event from fulfillment should probably be routed through a stream backbone for downstream operational consumers and also materialized into analytical storage. A product hierarchy update may need slower, controlled propagation with versioning. A financial correction may need explicit compensation and reconciliation. One topology does not fit all.
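To make the idea concrete, here is a minimal Python sketch of routing keyed by meaning rather than technology tier. The kinds and rules below are illustrative assumptions, not features of any particular product:

```python
# Hypothetical routing policy: each kind of information carries its own
# movement rules, instead of one tier-based default like "everything lands
# in the lake" or "everything is a topic".
ROUTING_POLICY = {
    "business_event":   {"transport": "stream", "materialize": True,  "versioned": False},
    "reference_data":   {"transport": "batch",  "materialize": True,  "versioned": True},
    "state_snapshot":   {"transport": "batch",  "materialize": True,  "versioned": False},
    "correction_event": {"transport": "stream", "materialize": True,  "versioned": True},
}


def route(kind: str) -> dict:
    """Resolve the movement rules for a piece of information by its meaning."""
    if kind not in ROUTING_POLICY:
        # Unknown kinds are an architecture question, not a default.
        raise ValueError(f"no routing policy defined for kind: {kind}")
    return ROUTING_POLICY[kind]
```

The point is the lookup key: the kind of fact being moved, not the system it came from.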
2. Preserve bounded contexts before harmonizing
A good platform does not flatten semantics at ingress. It ingests data with context intact, then creates explicit downstream projections for cross-domain use.
This is deeply aligned with domain-driven design. Bounded contexts are not an inconvenience to be cleaned away. They are the only reason the enterprise can evolve systems independently. Harmonization should happen through published contracts, conformed dimensions where justified, and curated data products—not through semantic collapse at the edge.
3. Separate transport, contract, and projection
Many enterprises entangle these:
- transport: Kafka topic, file, API, CDC stream
- contract: schema, event definition, business meaning
- projection: warehouse table, search index, feature store, serving view
These should be architected separately. The same contract may be transported differently over time. The same event stream may feed multiple projections. The same warehouse table may be regenerated from a replay. Decoupling them is what allows progressive migration.
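A minimal sketch of that separation, with hypothetical names: the contract carries meaning, transport bindings carry movement, projections carry landing zones. Note that swapping a transport never touches the contract:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    # Business meaning: name, version, and shape, independent of movement.
    name: str
    version: int
    fields: tuple


@dataclass(frozen=True)
class TransportBinding:
    # How the contract moves today; replaceable without touching the contract.
    contract: Contract
    mechanism: str  # e.g. "kafka", "cdc", "file", "api"
    address: str


@dataclass(frozen=True)
class Projection:
    # Where the contract lands for a specific audience.
    contract: Contract
    target: str  # e.g. "warehouse_table", "search_index", "feature_store"


order_v2 = Contract("OrderPlaced", 2, ("order_id", "customer_id", "total"))

# Same contract, two transports: modern stream and the legacy nightly drop.
kafka_path = TransportBinding(order_v2, "kafka", "orders.placed.v2")
legacy_path = TransportBinding(order_v2, "file", "nightly-orders-extract")

# Same stream, two projections for different consumers.
mart = Projection(order_v2, "warehouse_table")
index = Projection(order_v2, "search_index")
```

Because the three concerns are separate objects, a migration can retire `legacy_path` without renegotiating the contract or rebuilding the projections.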
4. Make reconciliation part of the route
Every important route should define:
- expected latency
- ordering assumptions
- deduplication strategy
- correction behavior
- replay capability
- authoritative source
- discrepancy handling
If those are absent, you do not have a route. You have hope.
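The checklist can be made executable. A hedged sketch: a route specification in which any unanswered question disqualifies the route (field values are invented examples):

```python
from dataclasses import dataclass, fields


@dataclass
class RouteSpec:
    # A route is only a route when every one of these has an answer.
    expected_latency: str      # e.g. "p95 < 5 minutes"
    ordering: str              # e.g. "per order_id"
    deduplication: str         # e.g. "idempotency key = event_id"
    correction_behavior: str   # e.g. "compensating event"
    replay: str                # e.g. "30-day topic retention"
    authoritative_source: str  # e.g. "OMS"
    discrepancy_handling: str  # e.g. "variance workbench, finance sign-off"


def is_route(spec: RouteSpec) -> bool:
    """Empty answers mean you do not have a route. You have hope."""
    return all(getattr(spec, f.name) for f in fields(spec))


good = RouteSpec("p95 < 5 minutes", "per order_id", "event_id key",
                 "compensating event", "30-day retention", "OMS",
                 "variance workbench")
hopeful = RouteSpec("p95 < 5 minutes", "", "", "", "", "OMS", "")
```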
5. Design for strangler migration, not heroic replacement
The old platform will not vanish on command. The new routing engine must coexist with legacy ETL, MDM hubs, and reporting marts. Progressive topology means building a routing layer that can siphon capabilities away from the old estate over time, while dual-running and reconciling outputs until confidence is earned.
Architecture
At a high level, the architecture looks less like a pyramid and more like a transport map.
The crucial component is not a single product called “routing engine.” It is an architectural capability composed of several things:
- schema and contract management
- metadata and lineage
- policy enforcement
- event and batch orchestration
- quality rules
- routing rules
- replay and backfill support
- reconciliation workflows
In some enterprises this is built from Kafka, CDC tooling, workflow orchestration, a data catalog, and a warehouse/lakehouse. In others it includes integration platforms, stream processing engines, and observability layers. The exact products matter far less than the separation of responsibilities.
Domain-aligned routing
The platform should be organized around domain streams and data products, not generic ingestion pipelines.
For example:
- Order Management publishes OrderPlaced, OrderAllocated, OrderCancelled
- Fulfillment publishes ShipmentPacked, ShipmentDispatched, ShipmentDelivered
- Finance publishes InvoiceIssued, PaymentApplied, CreditNoteRaised
These events are not just technical messages. They are expressions of domain behavior inside bounded contexts. The routing engine then decides what to do with them:
- forward to operational subscribers
- materialize historical facts
- join with reference data for analytics
- produce customer notifications
- trigger compliance checks
- reconcile against downstream balances
That is a much healthier model than shoveling raw database changes into a lake and hoping the semantics emerge later.
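A sketch of that decision, with invented event and subscriber names: the routing engine fans one domain event out to deliberate destinations, and quarantines anything it does not recognize rather than silently dropping it:

```python
# Hypothetical subscription table: one domain event, several deliberate
# destinations, decided centrally rather than inferred by each consumer.
SUBSCRIPTIONS = {
    "ShipmentDispatched": ["notify_customer", "update_order_status", "materialize_fact"],
    "InvoiceIssued":      ["materialize_fact", "reconcile_balances", "compliance_check"],
    "OrderCancelled":     ["update_order_status", "materialize_fact"],
}


def fan_out(event_type: str) -> list:
    """Route a domain event to its subscribers; unknown events are quarantined."""
    return SUBSCRIPTIONS.get(event_type, ["quarantine"])
```

The quarantine default matters: a raw database change that was never designed as a business event ends up inspected, not interpreted.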
Data products as projections, not dumping zones
The phrase “data product” has been abused into near meaninglessness. In a progressive topology, a data product is a deliberately designed projection with:
- a defined audience
- explicit semantics
- quality SLOs
- ownership
- discoverability
- change policy
Some data products are operationally adjacent, updated continuously from event streams. Others are analytical, refreshed in micro-batches or daily. The platform should support both, but never confuse them.
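Those properties can be captured as a descriptor. A sketch with hypothetical field values; the point is that audience, semantics, SLO, ownership, and mode are declared rather than implied:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    audience: str       # defined audience
    semantics: str      # explicit meaning, in the producing domain's terms
    freshness_slo: str  # quality SLO the owner is accountable for
    owner: str          # a team, never "shared"
    change_policy: str  # how consumers learn about change
    mode: str           # "operational" or "analytical" -- never confused


household = DataProduct(
    name="household_projection",
    audience="marketing analytics",
    semantics="households inferred from Party identity matching",
    freshness_slo="refreshed daily by 06:00",
    owner="customer-data-domain",
    change_policy="additive only; breaking changes need 90-day notice",
    mode="analytical",
)
```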
The role of Kafka and microservices
Kafka fits well in this architecture when the enterprise needs durable, replayable event distribution across domains and services. It is particularly valuable where microservices publish business events that multiple consumers need independently.
But Kafka is not the platform. It is one transport backbone.
A common failure is to make Kafka the universal hammer: every dataset becomes a topic, every transformation a stream job, every consumer responsible for reconstructing state. That way lies replay debt and operational exhaustion. Some things belong in streams. Some belong in tables. Mature architecture uses both and routes between them intentionally.
Domain semantics discussion
This is where architects earn their keep.
The modern data platform fails when it becomes a machine for stripping nouns of their context. The data team says “customer,” but sales means prospect, service means contract owner, billing means invoiced party, risk means regulated legal entity, and marketing means profile. You can standardize names all day and still not have semantic consistency.
Domain-driven design offers a better lens. Ask:
- which bounded context owns this concept?
- what invariants apply there?
- what event expresses a meaningful state change?
- what can be safely published outside the context?
- what must be translated for other domains?
This changes the architecture.
Instead of one canonical enterprise customer model forced on everyone, you may have:
- a Party master for identity and legal hierarchy
- a Customer Account model for billing
- a Subscriber model for telecom products
- a Household projection for marketing analytics
The routing engine moves facts between these through explicit translations and projections. That sounds messier than canonical centralization, but it is actually more honest and more evolvable.
A platform that ignores bounded contexts turns every integration into a semantic negotiation conducted in SQL.
Migration Strategy
Progressive topology comes alive during migration.
You do not replace the old warehouse, ETL estate, MDM platform, and integrations in one move. You strangle them capability by capability. The migration strategy should be based on routes, not systems.
Step 1: Identify critical information routes
Map the routes that matter most:
- order-to-cash
- customer identity propagation
- product and pricing distribution
- finance close feeds
- compliance and audit reporting
Then classify them:
- system of record
- current transport
- latency need
- data quality pain
- reconciliation pain
- consumer criticality
This quickly reveals where progressive migration will generate value.
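One way to operationalize that classification, under the assumption of simple 1–3 scores for pain and criticality (the routes and scores below are invented):

```python
# Hypothetical route inventory: higher pain and criticality migrate first.
routes = [
    {"name": "order-to-cash",        "quality_pain": 3, "reconciliation_pain": 3, "criticality": 3},
    {"name": "product-distribution", "quality_pain": 2, "reconciliation_pain": 1, "criticality": 2},
    {"name": "compliance-reporting", "quality_pain": 1, "reconciliation_pain": 2, "criticality": 3},
]


def migration_priority(route: dict) -> int:
    """Crude additive score; a real assessment would weight these."""
    return route["quality_pain"] + route["reconciliation_pain"] + route["criticality"]


ordered = sorted(routes, key=migration_priority, reverse=True)
```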
Step 2: Create parallel routes with explicit contracts
Introduce the new routing path beside the old one. For example, continue nightly ETL from the order system into the warehouse, but also publish order domain events through Kafka and materialize a new analytical projection. Dual-run both, compare outputs, and instrument discrepancies.
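Instrumenting discrepancies can start very simply. A sketch that compares per-metric aggregates from the legacy and modern route and returns variances as data rather than arguments (names and tolerance are illustrative):

```python
def compare_dual_run(legacy: dict, modern: dict, tolerance: float = 0.0) -> dict:
    """Compare per-key aggregates produced by the legacy and modern route.

    Returns a dict of discrepancies so they can be instrumented and
    trended over dual-run cycles, not debated in meetings.
    """
    discrepancies = {}
    for key in set(legacy) | set(modern):
        old, new = legacy.get(key), modern.get(key)
        if old is None or new is None:
            discrepancies[key] = {"kind": "missing", "legacy": old, "modern": new}
        elif abs(old - new) > tolerance:
            discrepancies[key] = {"kind": "variance", "legacy": old, "modern": new}
    return discrepancies
```

Example use: feed it the daily order counts and net sales from both paths and alert only on entries it returns.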
Step 3: Reconcile before switching consumers
Do not cut consumers over because the new path looks cleaner. Cut them over when reconciliation proves semantic and numerical equivalence where equivalence is required—or explains intentional differences where it is not.
Step 4: Move consumers incrementally
First move low-risk analytical consumers.
Then downstream operational consumers with fallback.
Then decommission old transformations.
Last, remove old extracts and integration dependencies.
This is a strangler pattern applied to data topology.
Step 5: Keep historical replay and backfill in scope
Migration always hits the awkward question: what about history? If the new route starts today, can it support three years of analytics? Can it replay corrected events? Can consumers compare pre- and post-cutover metrics? If not, migration is only half-designed.
A routing engine worthy of the name supports backfill and replay as first-class capabilities.
Reconciliation discussion
Reconciliation is the quiet backbone of credible architecture.
Most platform decks mention lineage, quality, and observability. Few talk enough about the practical mechanics of proving that distributed routes still represent the business faithfully.
There are several kinds of reconciliation:
- record reconciliation: do source and target contain the same business objects?
- balance reconciliation: do aggregates match, such as premiums, invoices, payments, inventory units?
- state reconciliation: do lifecycle states align across systems?
- temporal reconciliation: are differences due to timing windows rather than errors?
- semantic reconciliation: are we comparing concepts that genuinely correspond?
This matters especially in Kafka and microservices environments. Events can be delayed, duplicated, out of order, or corrected after publication. CDC may capture technical changes not intended as business facts. Batch snapshots may “win” numerically while losing important event history.
A mature platform uses reconciliation services that:
- consume from both legacy and modern paths
- compare business keys and aggregates
- classify variances
- trigger repair workflows or compensations
- produce audit evidence
Reconciliation is how progressive migration becomes trustworthy instead of theatrical.
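Variance classification is the heart of that workflow. A deliberately small sketch: differences inside the timing window are treated as temporal and re-checked, everything else escalates to repair (the window size is an assumption):

```python
def classify_variance(legacy_value, modern_value,
                      modern_lag_seconds: float,
                      window_seconds: float = 300.0) -> str:
    """Classify a variance between two routes before raising an incident.

    Timing differences inside the reconciliation window are expected in
    event-driven paths; everything else needs a repair workflow.
    """
    if legacy_value == modern_value:
        return "match"
    if modern_lag_seconds <= window_seconds:
        return "temporal"  # likely a timing window: re-check after it closes
    return "genuine"       # escalate: repair workflow or compensation
```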
Enterprise Example
Consider a multinational retailer modernizing order-to-cash and customer analytics.
The estate is familiar. SAP for finance, a commercial OMS for online orders, store systems with delayed batch uploads, a CRM, and a cloud warehouse fed by nightly ETL. Meanwhile, digital teams built microservices around pricing, inventory visibility, and customer notifications, all using Kafka. Leadership wants “real-time customer 360” and “daily financial certainty.” Those two phrases usually foreshadow an expensive argument.
The first instinct was centralization: stream everything into the lakehouse, define a common customer and order model, then rebuild downstream marts. It looked elegant on slides. It would have failed in practice.
Why? Because order semantics differed sharply:
- OMS order means commercial intent
- fulfillment shipment means physical movement
- finance invoice means legal recognition
- returns system means reverse logistics
- CRM order history means customer-visible narrative
Forcing one model at ingestion would have hidden material business distinctions.
Instead, the retailer adopted a progressive topology.
The OMS, fulfillment services, and pricing services published business events into Kafka with explicit domain contracts. SAP and store systems continued to provide batch and CDC feeds. The routing layer preserved each bounded context, then built several projections:
- operational customer notification streams
- inventory and order status views for digital channels
- financial settlement feeds aligned to finance semantics
- analytical customer behavior marts for marketing
- reconciliation datasets for order, shipment, invoice, and return matching
Customer identity was especially tricky. Marketing wanted one customer view; finance needed billing party accuracy; stores often had anonymous transactions. So the platform did not pretend there was one universal customer truth. It established a Party identity product, a Customer Account billing product, and a Household marketing projection. Each had different matching logic, quality metrics, and change policy.
Migration happened over eighteen months. Nightly ETL remained in place while event-driven routes were introduced for digital orders. Reconciliation compared order counts, net sales, refunds, and shipment status across old and new paths. When discrepancies appeared, they were often semantic, not technical: cancelled orders after allocation, partial shipments, tax adjustments, delayed store uploads. Because the platform had preserved context, these were diagnosable.
Eventually, digital analytics consumers moved to the new projections first. Then customer service status screens. Finance stayed on the legacy route longest, switching only after month-end reconciliation had passed several cycles. The retailer did not “replace the warehouse” in one stroke. It rerouted the business, one route at a time.
That is what successful modernization looks like in real enterprises: less revolution, more controlled redirection.
Operational Considerations
A routing engine architecture raises the bar operationally.
Observability must follow routes, not just jobs
Pipeline health is not enough. You need route health:
- event lag by contract
- delivery success by consumer class
- schema drift
- reconciliation variance
- freshness by data product
- replay backlog
- policy violations
If a dashboard says the pipeline is green while finance numbers are wrong, you are measuring the wrong thing.
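Route health can be expressed directly against the freshness SLO of a contract or data product rather than against job completion. A sketch, with the degradation threshold as an assumption:

```python
import datetime as dt


def route_health(last_event_at: dt.datetime,
                 freshness_slo: dt.timedelta,
                 now: dt.datetime) -> str:
    """Green means the route met its freshness SLO, not that a job ran."""
    lag = now - last_event_at
    if lag <= freshness_slo:
        return "healthy"
    if lag <= 2 * freshness_slo:   # assumed threshold: 2x SLO before breach
        return "degraded"
    return "breached"
```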
Schema evolution needs discipline
Domain contracts evolve. They always do. Additive changes are manageable. Breaking semantic changes are dangerous. The platform should support versioning, compatibility checks, and consumer impact analysis. Schema registries help, but governance must include meaning, not just shape.
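A crude shape-only check illustrates the baseline; real registries also verify types, defaults, and ideally meaning, which this sketch does not:

```python
def change_kind(old_fields: set, new_fields: set) -> str:
    """Classify a contract change by field shape only.

    Removals break existing consumers; additions usually do not.
    Semantic changes (same field, new meaning) are invisible here,
    which is exactly why governance cannot stop at shape.
    """
    if old_fields - new_fields:
        return "breaking"   # a field consumers may rely on disappeared
    if new_fields - old_fields:
        return "additive"
    return "unchanged"
```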
Ownership must be explicit
For each route and product, know:
- source owner
- contract owner
- projection owner
- consumer owner
- reconciliation owner
Shared ownership is often architecture shorthand for “nobody can fix it quickly.”
Security and privacy are routing concerns
Sensitive data classification, tokenization, regional residency, retention controls, and purpose limitation all influence where data is allowed to flow. Security is not a wrapper around the platform. It is part of routing policy.
Tradeoffs
This architecture is powerful, but not free.
The first tradeoff is complexity versus honesty. Progressive topology exposes the real complexity of enterprise semantics. A simpler-looking centralized model may be easier to explain. It is also more likely to lie.
The second is speed versus governance. Teams can move quickly when they publish events and create projections with autonomy. But without contract discipline and routing policy, the platform devolves into distributed chaos.
The third is duplication versus decoupling. Multiple projections of similar business facts may exist. That offends people who grew up worshipping single sources of truth. But some duplication is the price of decoupling, performance, and bounded-context integrity.
The fourth is streaming ambition versus operational sanity. Real-time routing is seductive. Yet many business capabilities are perfectly served by hourly or daily propagation. Use streaming where timing changes outcomes, not where it merely flatters architecture.
The fifth is migration duration versus risk reduction. Progressive strangler migration takes time. Big-bang replacement promises speed and usually delivers outages, mistrust, and rollback plans nobody rehearsed.
Failure Modes
There are predictable ways this goes wrong.
Treating every change event as a business event
CDC is useful, but a row update is not automatically a meaningful domain fact. If you route technical mutations as though they were business semantics, downstream consumers will infer nonsense.
Inventing a canonical model too early
The enterprise canonical model is often a polite form of semantic imperialism. It smooths over differences that matter, then forces every team to live with the least accurate shared language.
Building a streaming platform without replay governance
Replay sounds wonderful until consumers recalculate side effects, duplicate notifications, or reload broken history. Replay requires contract, idempotency, and compensation strategy.
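The minimum defense is idempotent consumption. An in-memory sketch; a real consumer would persist seen event ids durably and pair them with compensation logic:

```python
class IdempotentConsumer:
    """Replay-safe consumer sketch: side effects fire at most once per event_id."""

    def __init__(self):
        self.seen = set()
        self.effects = []

    def handle(self, event: dict) -> bool:
        """Process an event; return False if it was a replayed duplicate."""
        event_id = event["event_id"]
        if event_id in self.seen:
            return False  # replayed duplicate: no second notification
        self.seen.add(event_id)
        self.effects.append(event["type"])  # stand-in for the real side effect
        return True
```

Usage: replaying the same stream through this consumer recomputes nothing and notifies nobody twice, which is the contract replay governance has to guarantee.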
Ignoring reconciliation because “the stream is the truth”
No distributed transport makes truth automatic. If the platform cannot prove alignment with source and downstream obligations, it is not enterprise-grade.
Over-federating ownership
Domain ownership is good. Giving every team its own naming conventions, quality standards, and access policies is not. Federation without platform constraints is just decentralized entropy.
When Not To Use
This pattern is not for every organization.
Do not use a progressive routing-engine approach if:
- your estate is small and a straightforward warehouse solves the problem
- your domains are not meaningfully separated
- your data movement is mostly analytical batch reporting with low change frequency
- you lack the organizational maturity for contract ownership and operational discipline
- the business does not need incremental migration and could genuinely replace a contained legacy stack cleanly
A routing-engine architecture shines in large enterprises with multiple operational systems, mixed batch and streaming needs, domain complexity, and migration constraints. In a mid-sized company with a handful of systems and one central analytics team, it may be unnecessary ceremony.
Architecture should solve the problem you have, not advertise your sophistication.
Related Patterns
Several adjacent patterns connect naturally to this one.
Strangler Fig Pattern
Essential for progressive migration. Replace old routes piece by piece, not all at once.
Event-Driven Architecture
Useful for propagating domain facts and enabling multiple subscribers. Best when paired with strong contract and reconciliation discipline.
CQRS and Materialized Views
A good fit for creating domain-specific projections from shared event streams.
Data Mesh
Helpful as an ownership model if interpreted pragmatically. Dangerous when treated as a license for every domain to invent its own platform.
Change Data Capture
Valuable as a migration bridge and integration mechanism, but should not be confused with domain event design.
Master Data Management
Still relevant where identity, reference data, and hierarchies need stewardship. But MDM should participate in routing, not monopolize semantics.
Summary
The modern data platform is a routing engine because the enterprise problem is not where data sits. It is how meaning moves.
That shift in perspective changes everything. It pulls architecture away from storage-centric thinking and toward domain semantics, contracts, reconciliation, and migration. It embraces bounded contexts instead of flattening them. It recognizes Kafka and microservices as useful components, not universal answers. It accepts that progressive strangler migration is the normal path for large organizations. And it treats reconciliation as a core operational capability, not an embarrassing afterthought.
Most of all, it gives architects a more honest mental model.
A platform is not modern because it has streaming, a lakehouse, or a catalog. It is modern when it can route the right facts, with the right meaning, to the right consumers, at the right time, with enough evidence to trust the result—and evolve that topology without putting the business on hold.
That is progressive topology.
And in the enterprise, progress is not about moving fast. It is about changing the map without losing the business along the way.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralised data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.