Most integration architecture fails for a boring reason: we pretend data is a static asset when, in the enterprise, data is behavior wearing a database costume.
That sounds dramatic, but look around any large organization. “Customer” means one thing in billing, another in CRM, something more constrained in KYC, and something politically explosive in marketing. The raw tables are not the product. The semantic contract around them is. The routing of change, ownership, and meaning is. In practice, your so-called data product is an integration API, whether you admit it or not.
This is the architectural mistake behind many failed data mesh programs, many Kafka platform rollouts, and many service decomposition efforts. Teams are told to publish data products, so they export some tables, push events into a topic, or expose a GraphQL endpoint. Then the consumers arrive. They ask, reasonably, “What does this field mean?”, “When does it update?”, “Can it be replayed?”, “What happens when upstream corrects history?”, “Which identifier is authoritative?”, “Is deleted really deleted?” Those aren’t database questions. They’re integration questions.
That is the heart of the matter: a data product that is consumed across domain boundaries is an integration surface. Treat it as such, and the architecture gets clearer. Ignore it, and you end up with a very expensive pile of ambiguous records.
The pattern I want to argue for is domain-routed data product architecture: data products are published and consumed through explicit domain semantics, bounded ownership, and routing rules that reflect business meaning rather than technical topology. Kafka may be involved. Microservices may be involved. CDC may be involved. But the real design move is not the transport. It is admitting that enterprise data exchange is a domain problem first and an infrastructure problem second.
Context
Modern enterprises are trying to do several things at once:
- decouple monoliths
- stand up event-driven integration
- build reusable data products
- enable analytics and AI
- avoid brittle point-to-point interfaces
- preserve regulatory control and operational stability
These goals often collide.
The monolith knows too much and changes too slowly. The data warehouse knows everything too late. The microservices estate knows a lot, but only in fragments. Kafka promises real-time flow, but real-time ambiguity is still ambiguity. A lakehouse can centralize facts, but it does not settle ownership.
So organizations create data products. Good instinct. Bad execution, often.
A real data product is not “a curated table in Snowflake” or “a Kafka topic with Avro.” It is a published model of a domain fact with clear semantics, quality expectations, versioning discipline, and usage boundaries. Once another domain depends on it, that product behaves exactly like an API. It has consumers, compatibility constraints, outage blast radius, lifecycle management, and political consequences.
This is where domain-driven design matters. DDD is not a whiteboard hobby. It gives us the language to decide where meaning belongs.
- Bounded contexts tell us where a concept is valid.
- Ubiquitous language gives us names that survive implementation churn.
- Context mapping tells us how one domain’s truth is translated into another’s.
- Anti-corruption layers stop foreign semantics from poisoning local models.
Without those tools, enterprise data products become thinly disguised database integration. The interfaces may be modern. The coupling is still ancient.
Problem
The common failure pattern looks like this:
A domain team is asked to expose customer data. They publish a table or event stream called Customer. Consumers from risk, service, marketing, fulfillment, and finance subscribe. At first, this feels like progress. There is reuse. There is speed. There is one source of truth.
Then the cracks appear.
Marketing wants prospect records included. Finance wants only invoiceable legal entities. Service wants household grouping. Risk wants regulatory identity and beneficial ownership. CRM emits updates in near real time; ERP corrects addresses in overnight batches; compliance can retroactively freeze an account. Every consumer starts deriving its own version of “customer” from the same feed. The “single data product” becomes a semantic junk drawer.
Worse, teams start routing enterprise workflows through that feed because it is available. Not because it is authoritative for those decisions.
This creates three kinds of coupling:
- Structural coupling
Consumers depend on fields, schemas, identifiers, and event shapes.
- Temporal coupling
Consumers assume specific update timing, ordering, and replay behavior.
- Semantic coupling
Consumers assume that the producer’s meaning of a concept fits their own.
The third is the killer. Structural issues can be versioned. Temporal issues can be engineered around. Semantic mismatch turns every integration into a negotiation.
And once semantic mismatch is in the platform, Kafka or API management will not save you. They amplify whatever design you put into them. A bad model scales just as efficiently as a good one.
Forces
Any practical architecture here has to balance a set of hard forces.
Domain autonomy vs enterprise consistency
Teams need autonomy to move. The enterprise needs consistency in core concepts like customer, order, product, payment, policy, and account. Push too hard on autonomy and you get semantic drift. Push too hard on central consistency and you rebuild a slow-moving integration monarchy.
Real-time flow vs corrected truth
Event streams are attractive because they move fast. But enterprise truth is often corrected after the fact. Backdated changes, canceled transactions, merged identities, legal holds, and reconciliation adjustments are normal. If you optimize only for streaming freshness, you produce fast wrongness.
Local optimization vs downstream usability
A producer wants to publish what is easy from its source system. Consumers need something stable, comprehensible, and complete enough to use safely. Those are not the same thing.
Source authority vs derived products
Some products are authoritative records of domain state. Others are derived, aggregated, or conformed views. Confusing the two creates governance theater. Not every useful data product should be treated as a golden source.
Platform standardization vs domain-specific expression
Enterprises want common patterns: schemas, contracts, topic conventions, SLAs, lineage. Sensible. But if the standards erase domain nuance, teams route around them. A platform that cannot represent business meaning becomes a compliance checkbox, not an accelerator.
Solution
The solution is to model data products as domain integration APIs and route exchange through domain semantics, not system adjacency.
That means four concrete things.
1. Publish by bounded context, not by database or application
A data product should represent a domain-owned fact within a bounded context. Not a raw extraction from SAP. Not an “enterprise customer” fiction unless there is a real owning domain for that concept. The producer owns the meaning, lifecycle, and service levels of what it publishes.
Examples:
- Billing Customer Account is owned by Billing.
- Party Identity Verification Status is owned by Compliance.
- Sales Prospect is owned by CRM/Sales.
- Household Relationship View may be a derived product owned by a customer insights domain, not by any source system.
This seems obvious. In practice it is routinely violated.
2. Route through domain contracts and translation points
Cross-domain sharing requires explicit translation. A sales concept may feed finance, but not unchanged. A compliance freeze may affect fulfillment, but only through a policy-relevant projection. This is classic context mapping.
In architecture terms, the route is:
- producer emits domain event or state product
- domain router or integration layer applies semantic routing rules
- consumers receive products or events meaningful in their own context
- anti-corruption logic prevents upstream terms from becoming local truth by accident
This is not central ESB nostalgia. The difference is important. The old hub transformed messages because applications could not. The new domain-routed architecture exists to preserve bounded contexts while allowing federated flow. The transformations are domain-conscious, productized, and observable.
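As a sketch, the routing chain above might look like this in code. The event type, field names, and the `to_fulfillment_view` translation are all hypothetical, but they show how semantic routing plus anti-corruption translation differs from plain message forwarding: the consumer receives a projection phrased in its own language, never the producer's raw vocabulary.

```python
from dataclasses import dataclass

# Hypothetical upstream event from the Billing context.
@dataclass
class AccountStandingChanged:
    account_id: str
    standing: str        # Billing's own vocabulary: "CURRENT", "DELINQUENT", ...
    effective_at: str

# Anti-corruption translation: Fulfillment never sees Billing's raw terms,
# only a projection expressed in Fulfillment's ubiquitous language.
def to_fulfillment_view(event: AccountStandingChanged) -> dict:
    return {
        "account_id": event.account_id,
        "eligible_to_ship": event.standing == "CURRENT",
        "as_of": event.effective_at,
    }

# Semantic routing rules: which consumer contexts receive which events,
# and through which translation.
ROUTES = {
    "AccountStandingChanged": [
        ("fulfillment", to_fulfillment_view),
        # ("finance", to_finance_view), ...
    ],
}

def route(event: AccountStandingChanged) -> dict:
    """Fan an upstream event out to consumer-specific projections."""
    return {
        consumer: translate(event)
        for consumer, translate in ROUTES.get(type(event).__name__, [])
    }
```

In a real estate the routing table would live in configuration or a stream processor, but the design point survives: the rules are explicit, domain-conscious, and inspectable rather than buried in consumer code.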
3. Distinguish operational APIs, event products, and analytical products
A lot of mess comes from pretending these are interchangeable.
- Operational APIs support transactional interaction and command/query use cases.
- Event products propagate domain changes over time, often through Kafka.
- Analytical products provide queryable, reconciled, often denormalized views.
One domain may expose all three. They are not the same thing.
If you call a Kafka topic a data product but consumers need point-in-time correctness, replay semantics, and reconciliation after late-arriving changes, you have really offered an event integration API. That’s fine. Name it correctly and support it properly.
4. Build reconciliation into the model, not as an afterthought
Enterprises do not run on perfect streams. They run on correction.
So every serious data product architecture needs to answer:
- what is the key?
- what is the authoritative source?
- what is the event time and processing time?
- can history be revised?
- how are duplicates handled?
- how are out-of-order events handled?
- what closes the books for financial or regulatory reporting?
- how do consumers reconcile missing or inconsistent records?
If those questions are unanswered, the architecture is aspirational, not operational.
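A minimal sketch of what "answered" looks like for the temporal questions above: an updater that treats duplicate deliveries as no-ops and refuses to let out-of-order events overwrite newer state. The names and in-memory stores are illustrative; a real system would back them with a database or cache.

```python
# Current state keyed by business key, plus a dedupe store.
state: dict[str, dict] = {}
seen_event_ids: set[str] = set()

def apply(event_id: str, key: str, event_time: int, payload: dict) -> bool:
    """Apply an event; return True only if state actually changed."""
    if event_id in seen_event_ids:
        return False                        # duplicate delivery: no-op
    seen_event_ids.add(event_id)
    current = state.get(key)
    if current and current["event_time"] >= event_time:
        return False                        # late, out-of-order event: ignore
    state[key] = {"event_time": event_time, **payload}
    return True
```

Note that this deliberately distinguishes event time from arrival order, which is exactly the distinction consumers assume away until it breaks them.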
Architecture
Here is the high-level shape.
The crucial component is the domain routing layer. That may be implemented with Kafka streams, stream processing, integration microservices, event gateways, or a combination. The point is not a single technology. The point is explicit semantic mediation.
A reasonable implementation stack might look like this:
- microservices own transactional boundaries
- outbox pattern or CDC publishes domain events
- Kafka carries event products and state-change streams
- schema registry manages compatibility
- stream processors build derived products and routing projections
- API layer exposes queryable operational views where needed
- lakehouse or warehouse stores reconciled analytical products
- metadata/catalog captures ownership, SLAs, lineage, and usage constraints
But no component should obscure the domain ownership model.
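The outbox step in that stack can be sketched in a few lines. The schema and function names are assumptions, and SQLite stands in for the operational database, but the core guarantee holds: the domain change and the outgoing event commit in one transaction, so a crash can never publish an event for a write that did not happen, or lose an event for one that did. A relay (or CDC on the outbox table) then publishes in sequence order.

```python
import sqlite3
import json

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE account (id TEXT PRIMARY KEY, standing TEXT);
    CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                         event_type TEXT, payload TEXT,
                         published INTEGER DEFAULT 0);
""")

def change_standing(account_id: str, standing: str) -> None:
    with db:  # one transaction: state change and outbox row commit together
        db.execute("INSERT OR REPLACE INTO account VALUES (?, ?)",
                   (account_id, standing))
        db.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                   ("AccountStandingChanged",
                    json.dumps({"account_id": account_id,
                                "standing": standing})))

def drain_outbox() -> list:
    """Relay step: read unpublished events in order and mark them published."""
    rows = db.execute("SELECT seq, event_type, payload FROM outbox "
                      "WHERE published = 0 ORDER BY seq").fetchall()
    db.executemany("UPDATE outbox SET published = 1 WHERE seq = ?",
                   [(r[0],) for r in rows])
    db.commit()
    return rows
```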
Domain semantics and canonical traps
Many enterprises hear this and immediately reach for a canonical enterprise model. That is usually the wrong move.
A canonical model is appealing because it promises common language. In practice it often becomes a compromise language nobody truly owns. Every field is included because someone somewhere needs it. Meaning gets blurred. Change becomes bureaucratic. Teams fall back to side channels and custom mappings.
Use shared kernels sparingly for genuinely shared concepts. Use published language where a producer’s model is intentionally stable for consumers. Use translation where concepts differ. Do not force sameness where difference is real.
The enterprise does not need one customer model. It needs a clear map of customer-related models and the routes between them.
Event and state duality
One subtle but important point: consumers often need both the event trail and the current state.
For example:
- fraud analytics may need every customer status change event
- order fulfillment may only need the current “eligible to ship” state
- finance may need end-of-day reconciled state for reporting
Trying to satisfy all three with one artifact is how teams create overloaded topics and brittle consumers. Publish event products for change history. Publish state products for durable consumability. Keep the distinction visible.
This architecture supports one of the hardest enterprise truths: the event that happened and the state you are allowed to act on are related, but not identical.
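The duality is easy to see in code. A hypothetical event trail serves change-history consumers (like fraud analytics) as-is, while a current-state product is simply a fold over that trail in event-time order:

```python
from typing import Iterable

# Illustrative event product: a trail of customer status changes.
events = [
    {"customer_id": "C-1", "status": "ACTIVE",    "event_time": 1},
    {"customer_id": "C-1", "status": "SUSPENDED", "event_time": 3},
    {"customer_id": "C-2", "status": "ACTIVE",    "event_time": 2},
]

def current_state(trail: Iterable[dict]) -> dict[str, dict]:
    """Derive a state product: the latest event per key, in event-time order."""
    state: dict[str, dict] = {}
    for e in sorted(trail, key=lambda e: e["event_time"]):
        state[e["customer_id"]] = e
    return state
```

Publishing both artifacts keeps the distinction visible: consumers of `events` get every change; consumers of `current_state(events)` get only the state they are allowed to act on.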
Migration Strategy
No large organization gets to this architecture in one move. If you try, the migration becomes the architecture’s obituary.
Use a progressive strangler approach.
Start from a painful integration seam, not from a platform manifesto. Pick a high-value domain concept with many consumers and clear business friction: customer eligibility, order status, payment settlement, inventory availability, policy coverage, account standing.
Then work in stages.
Stage 1: Identify bounded contexts and current semantic breaks
Map where the concept is used and what it means in each domain. This is DDD context mapping work, and it is worth doing properly. You are looking for:
- authoritative source by sub-concept
- key identifiers and crosswalks
- latency expectations
- correction paths
- current reconciliation pain
- hidden spreadsheets and manual checks
- systems of record versus systems of action
Stage 2: Publish the first domain-owned product
Do not build the whole enterprise information model. Publish one trustworthy product with explicit contract and quality notes.
For example:
- Billing publishes AccountStandingChanged events and a queryable AccountStanding state product.
- Compliance publishes VerificationStatus.
- Sales does not pretend either of those is “Customer Master.”
Stage 3: Add domain routing and anti-corruption around the old estate
This is where strangling begins. Instead of every downstream system reading legacy tables directly, route through the new product and translation layer. Some consumers still need old interfaces; fine. The routing layer can fan out to legacy integration patterns while new consumers use events and APIs.
Stage 4: Introduce reconciliation services
As traffic grows, you will discover mismatches: dropped events, stale records, correction logic, source disagreement. Good. That means you are touching reality.
Build reconciliation explicitly:
- periodic source-to-product comparison
- replay tooling
- dead-letter triage
- idempotent consumers
- compensating update flows
- exception dashboards tied to business ownership
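The periodic source-to-product comparison can start as simply as this sketch, where each side is reduced to a map of business key to fingerprint (in practice a hash of the relevant columns; all names here are illustrative):

```python
def reconcile(source: dict[str, str], product: dict[str, str]) -> dict:
    """Compare source and product fingerprints keyed by business key."""
    src_keys, prod_keys = set(source), set(product)
    return {
        # keys in source but missing downstream: dropped or unpublished events
        "missing_in_product": sorted(src_keys - prod_keys),
        # keys downstream with no source counterpart: stale or ghost records
        "unknown_in_product": sorted(prod_keys - src_keys),
        # keys present on both sides but with divergent state
        "mismatched": sorted(k for k in src_keys & prod_keys
                             if source[k] != product[k]),
    }
```

Each bucket maps to a different remediation path: replay for missing keys, expiry for ghosts, and correction flows for mismatches, with the tolerated divergence window signed off by the business owner.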
Stage 5: Retire direct dependencies on underlying source schemas
This is the real strangler milestone. Once consumers rely on product contracts rather than source internals, the source can evolve. Until then, you have not decoupled anything.
Migration is less about replacing technology and more about replacing accidental semantics with intentional ones.
Enterprise Example
Consider a global insurer modernizing customer and policy servicing across 20 countries.
The legacy environment contains:
- a policy administration monolith
- regional CRMs
- a claims platform
- a finance ledger
- a data warehouse used for reporting and actuarial analysis
- several Kafka-backed microservices for new digital channels
Leadership decides to create a “Customer 360 data product.” This is the usual instinct. If done naively, it becomes a landfill for every customer-adjacent field from every system.
A better architecture starts by refusing the false unity.
The insurer identifies separate bounded contexts:
- Party Management: legal person/org identity, identifiers, contact points
- Policy Administration: policyholder role, covered parties, policy lifecycle
- Claims: claimant, incident relationships, claim status
- Billing: account, premium payment standing, delinquency
- Compliance: sanctions screening, KYC/AML status
- Customer Engagement: digital profile, preferences, consent
Now the architecture becomes useful.
Party Management publishes a PartyProfile product.
Billing publishes AccountStanding.
Compliance publishes ScreeningStatus.
Policy Administration publishes PolicyRoleAssignments.
A routing layer then produces consumer-specific projections:
- Claims receives a policy-and-party view relevant to claims intake.
- Digital channels receive a service-facing “eligible customer” view.
- Marketing receives only consented engagement attributes, not raw policy or compliance data.
- Finance receives reconciled account and policy linkage with end-of-day closure rules.
Kafka carries the event streams. Stream processors create current-state products. APIs provide on-demand lookup for operational workflows. Reconciliation jobs compare policy and billing relationships nightly because finance closes on controlled batches, not purely on streams.
What happened to “Customer 360”? It survives, but as a derived analytical product, not as the authoritative integration contract for every operational use case. That is the right answer.
This insurer avoids a common failure mode: leaking claims semantics into customer engagement, or marketing identifiers into compliance workflows. Teams can move independently because the routes are explicit.
And when regulations in one country require retroactive suppression of customer contact data, only the relevant products and projections are changed. The entire enterprise does not need to redefine “customer” overnight.
Operational Considerations
This style of architecture is not free. It asks for operational discipline.
Contract governance
If data products are integration APIs, treat them like APIs:
- version contracts deliberately
- define compatibility rules
- publish ownership and support model
- document semantic changes, not just schema changes
- test consumer compatibility continuously
Schema registry helps, but schema compatibility is the floor, not the ceiling.
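As an illustration of treating the contract like an API, here is the kind of backward-compatibility check a CI pipeline might run on a product contract before release. The contract shape is invented for the example, not any particular registry's format, but the rules mirror standard backward compatibility: no removed fields, no type changes, no new required fields.

```python
def backward_compatible(old: dict, new: dict) -> list[str]:
    """Return violations in `new` that would break existing consumers."""
    violations = []
    for field, spec in old["fields"].items():
        if field not in new["fields"]:
            violations.append(f"removed field: {field}")
        elif new["fields"][field]["type"] != spec["type"]:
            violations.append(f"type change: {field}")
    for field, spec in new["fields"].items():
        if field not in old["fields"] and spec.get("required"):
            violations.append(f"new required field: {field}")
    return violations
```

Even with a check like this in place, the ceiling is still semantic review: a field whose type is unchanged but whose meaning shifted passes every schema gate and breaks consumers anyway, which is why semantic changes need documenting separately.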
Observability
You need lineage and runtime observability:
- producer health
- topic lag
- projection freshness
- reconciliation drift
- replay outcomes
- consumer error rates
- contract adoption by version
The most useful dashboards combine technical and business views. “Verification status product delayed by 2 hours” matters more than “topic throughput down 18%.”
Reconciliation as first-class architecture
I’ll say it plainly: if your event-driven enterprise has no reconciliation architecture, it is a demo.
Reconciliation should include:
- record counts and key coverage checks
- state comparison against source snapshots
- audit trail for corrections
- replay and backfill capability
- business sign-off for tolerated divergence windows
Especially in finance, healthcare, insurance, and regulated commerce, reconciliation is not a side utility. It is how the enterprise trusts the architecture.
Identity and key management
Many domain routing failures are actually identifier failures:
- customer ID versus party ID versus account ID
- local keys versus global keys
- merged/split identities
- reused external identifiers
- survivorship rules
A key strategy should be explicit. Hidden crosswalks destroy confidence faster than almost anything else.
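An explicit key strategy can be surprisingly small. This sketch (union-find style, with invented names) keeps a crosswalk in which merged identities resolve to a surviving canonical party ID, so every consumer applies the same survivorship rule instead of maintaining a hidden mapping:

```python
# Crosswalk: each ID points at its parent; a canonical ID points at itself.
canonical: dict[str, str] = {}

def register(party_id: str) -> None:
    """Make an ID known; it is its own canonical ID until merged."""
    canonical.setdefault(party_id, party_id)

def resolve(party_id: str) -> str:
    """Follow the chain of merges to the surviving canonical ID."""
    while canonical[party_id] != party_id:
        party_id = canonical[party_id]
    return party_id

def merge(loser: str, survivor: str) -> None:
    """Survivorship rule: all references to `loser` now resolve to `survivor`."""
    canonical[resolve(loser)] = resolve(survivor)
```

The point is not this particular structure; it is that merge and survivorship behavior are published, testable product behavior rather than tribal knowledge.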
Tradeoffs
This pattern has sharp edges.
Pros
- clearer domain ownership
- less semantic leakage across teams
- safer reuse of data products
- better support for event-driven and analytical use cases together
- improved migration path out of monoliths
- lower long-term coupling than direct database or raw CDC sharing
Cons
- more design work up front
- need for skilled domain modeling, not just platform engineering
- more product management around contracts
- translation layers can proliferate if unmanaged
- possible latency added by routing and projection building
- harder to explain than “just publish the table”
It also creates a political tradeoff. Some teams will resist because explicit semantics expose where their data is weak, overloaded, or inconsistent. Architecture here is not just technical hygiene. It is organizational honesty.
Failure Modes
Several things go wrong repeatedly.
1. The “topic equals product” mistake
Publishing a Kafka topic and calling it a product is not enough. If ownership, semantics, keys, correction rules, and consumer expectations are unclear, you have simply industrialized ambiguity.
2. Recreating the ESB with better branding
If every route and transformation is centrally controlled by an integration team with weak domain participation, the architecture ossifies. The answer is federated ownership with strong platform guardrails, not integration priesthood.
3. Canonical enterprise model bloat
The model becomes huge, slow, and contested. Nobody can evolve it. Teams fork around it. This is how standards die: not by rebellion, by circumvention.
4. Ignoring temporal semantics
Consumers treat arrival time as business truth. Then late or corrected events appear, and downstream processes break or silently diverge.
5. No reconciliation path
Streams drift from sources. Batch corrections overwrite assumptions. Audit asks for traceability. The architecture responds with hand-waving.
6. Leaking domain internals as public contract
A source system’s table structure or workflow statuses escape into enterprise consumption. Now every producer change becomes an enterprise incident.
When Not To Use
This pattern is powerful, but it is not universal.
Do not use full domain-routed data product architecture when:
- the organization is small and a simpler integration style will do
- there are very few consumers and semantics are stable
- the exchange is purely analytical and no operational dependency exists
- the domain is immature and concepts change weekly
- the platform team lacks the ability to support contract governance and observability
- batch file exchange with explicit controls is actually the better fit for regulatory or operational reasons
And do not force Kafka into places where event streaming is incidental. If the real need is a stable operational query API over a well-owned domain, build that. Streaming is useful when change propagation matters. It is not a moral upgrade.
Related Patterns
This architecture sits alongside several related patterns.
- Data Mesh
Useful when interpreted correctly: domain-owned products with federated governance. Dangerous when reduced to “every team publishes data somehow.”
- Event-Driven Architecture
Excellent for propagating change. Insufficient by itself for semantic governance and reconciliation.
- CQRS
Helpful when separating write models from read-optimized state products.
- Outbox Pattern / CDC
Practical ways to publish domain changes reliably from operational systems.
- Anti-Corruption Layer
Essential for protecting bounded contexts from semantic pollution.
- Strangler Fig Pattern
The right migration shape for replacing direct integration with productized domain contracts over time.
- Master Data Management
Still relevant, but best used surgically for identity and survivorship where genuine shared mastery exists. Not every domain concept needs a universal master.
Summary
The useful provocation is this: your data products are integration APIs.
Once data leaves a domain and other domains depend on it, you are no longer merely publishing data. You are publishing meaning, timing, correction rules, ownership, and trust. That is integration architecture, whether the transport is Kafka, REST, CDC, or a warehouse table.
The winning move is not to centralize everything into one canonical model, nor to let every team emit whatever they happen to store. It is to route data exchange through domain semantics. Use bounded contexts to define ownership. Use product contracts to make meaning explicit. Use anti-corruption layers and translation where concepts differ. Use progressive strangler migration to retire direct coupling. And build reconciliation as a first-class concern, because enterprises live on corrected truth, not just streamed truth.
A table is not a product. A topic is not a product. A payload is not a product.
A product begins when another domain can depend on it safely.
That is the standard worth designing for.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.