Most integration architecture fails for a boring reason: we pretend data is a static asset when, in the enterprise, data is behavior wearing a database costume.
That sounds dramatic, but look around any large organization. “Customer” means one thing in billing, another in CRM, something more constrained in KYC, and something politically explosive in marketing. The raw tables are not the product. The semantic contract around them is. The routing of change, ownership, and meaning is. In practice, your so-called data product is an integration API, whether you admit it or not.
This is the architectural mistake behind many failed data mesh programs, many Kafka platform rollouts, and many service decomposition efforts. Teams are told to publish data products, so they export some tables, push events into a topic, or expose a GraphQL endpoint. Then the consumers arrive. They ask, reasonably, “What does this field mean?”, “When does it update?”, “Can it be replayed?”, “What happens when upstream corrects history?”, “Which identifier is authoritative?”, “Is deleted really deleted?” Those aren’t database questions. They’re integration questions.
That is the heart of the matter: a data product that is consumed across domain boundaries is an integration surface. Treat it as such, and the architecture gets clearer. Ignore it, and you end up with a very expensive pile of ambiguous records.
The pattern I want to argue for is domain-routed data product architecture: data products are published and consumed through explicit domain semantics, bounded ownership, and routing rules that reflect business meaning rather than technical topology. Kafka may be involved. Microservices may be involved. CDC may be involved. But the real design move is not the transport. It is admitting that enterprise data exchange is a domain problem first and an infrastructure problem second.
Context
Modern enterprises are trying to do several things at once:
- decouple monoliths
- stand up event-driven integration
- build reusable data products
- enable analytics and AI
- avoid brittle point-to-point interfaces
- preserve regulatory control and operational stability
These goals often collide.
The monolith knows too much and changes too slowly. The data warehouse knows everything too late. The microservices estate knows a lot, but only in fragments. Kafka promises real-time flow, but real-time ambiguity is still ambiguity. A lakehouse can centralize facts, but it does not settle ownership.
So organizations create data products. Good instinct. Bad execution, often.
A real data product is not “a curated table in Snowflake” or “a Kafka topic with Avro.” It is a published model of a domain fact with clear semantics, quality expectations, versioning discipline, and usage boundaries. Once another domain depends on it, that product behaves exactly like an API. It has consumers, compatibility constraints, outage blast radius, lifecycle management, and political consequences.
This is where domain-driven design matters. DDD is not a whiteboard hobby. It gives us the language to decide where meaning belongs.
- Bounded contexts tell us where a concept is valid.
- Ubiquitous language gives us names that survive implementation churn.
- Context mapping tells us how one domain’s truth is translated into another’s.
- Anti-corruption layers stop foreign semantics from poisoning local models.
Without those tools, enterprise data products become thinly disguised database integration. The interfaces may be modern. The coupling is still ancient.
Problem
The common failure pattern looks like this:
A domain team is asked to expose customer data. They publish a table or event stream called Customer. Consumers from risk, service, marketing, fulfillment, and finance subscribe. At first, this feels like progress. There is reuse. There is speed. There is one source of truth.
Then the cracks appear.
Marketing wants prospect records included. Finance wants only invoiceable legal entities. Service wants household grouping. Risk wants regulatory identity and beneficial ownership. CRM emits updates in near real time; ERP corrects addresses in overnight batches; compliance can retroactively freeze an account. Every consumer starts deriving its own version of “customer” from the same feed. The “single data product” becomes a semantic junk drawer.
Worse, teams start routing enterprise workflows through that feed because it is available. Not because it is authoritative for those decisions.
This creates three kinds of coupling:
- Structural coupling
Consumers depend on fields, schemas, identifiers, and event shapes.
- Temporal coupling
Consumers assume specific update timing, ordering, and replay behavior.
- Semantic coupling
Consumers assume that the producer’s meaning of a concept fits their own.
The third is the killer. Structural issues can be versioned. Temporal issues can be engineered around. Semantic mismatch turns every integration into a negotiation.
And once semantic mismatch is in the platform, Kafka or API management will not save you. They amplify whatever design you put into them. A bad model scales just as efficiently as a good one.
Forces
Any practical architecture here has to balance a set of hard forces.
Domain autonomy vs enterprise consistency
Teams need autonomy to move. The enterprise needs consistency in core concepts like customer, order, product, payment, policy, and account. Push too hard on autonomy and you get semantic drift. Push too hard on central consistency and you rebuild a slow-moving integration monarchy.
Real-time flow vs corrected truth
Event streams are attractive because they move fast. But enterprise truth is often corrected after the fact. Backdated changes, canceled transactions, merged identities, legal holds, and reconciliation adjustments are normal. If you optimize only for streaming freshness, you produce fast wrongness.
Local optimization vs downstream usability
A producer wants to publish what is easy from its source system. Consumers need something stable, comprehensible, and complete enough to use safely. Those are not the same thing.
Source authority vs derived products
Some products are authoritative records of domain state. Others are derived, aggregated, or conformed views. Confusing the two creates governance theater. Not every useful data product should be treated as a golden source.
Platform standardization vs domain-specific expression
Enterprises want common patterns: schemas, contracts, topic conventions, SLAs, lineage. Sensible. But if the standards erase domain nuance, teams route around them. A platform that cannot represent business meaning becomes a compliance checkbox, not an accelerator.
Solution
The solution is to model data products as domain integration APIs and route exchange through domain semantics, not system adjacency.
That means four concrete things.
1. Publish by bounded context, not by database or application
A data product should represent a domain-owned fact within a bounded context. Not a raw extraction from SAP. Not an “enterprise customer” fiction unless there is a real owning domain for that concept. The producer owns the meaning, lifecycle, and service levels of what it publishes.
Examples:
- Billing Customer Account is owned by Billing.
- Party Identity Verification Status is owned by Compliance.
- Sales Prospect is owned by CRM/Sales.
- Household Relationship View may be a derived product owned by a customer insights domain, not by any source system.
This seems obvious. In practice it is routinely violated.
2. Route through domain contracts and translation points
Cross-domain sharing requires explicit translation. A sales concept may feed finance, but not unchanged. A compliance freeze may affect fulfillment, but only through a policy-relevant projection. This is classic context mapping.
In architecture terms, the route is:
- producer emits domain event or state product
- domain router or integration layer applies semantic routing rules
- consumers receive products or events meaningful in their own context
- anti-corruption logic prevents upstream terms from becoming local truth by accident
This is not central ESB nostalgia. The difference is important. The old hub transformed messages because applications could not. The new domain-routed architecture exists to preserve bounded contexts while allowing federated flow. The transformations are domain-conscious, productized, and observable.
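As a sketch, the routing chain above might look like this in code. The event type, field names, and the `to_fulfillment_view` translation are all hypothetical, but they show how semantic routing plus anti-corruption translation differs from plain message forwarding: the consumer receives a projection phrased in its own language, never the producer's raw vocabulary.

```python
from dataclasses import dataclass

# Hypothetical upstream event from the Billing context.
@dataclass
class AccountStandingChanged:
    account_id: str
    standing: str        # Billing's own vocabulary: "CURRENT", "DELINQUENT", ...
    effective_at: str

# Anti-corruption translation: Fulfillment never sees Billing's raw terms,
# only a projection expressed in Fulfillment's ubiquitous language.
def to_fulfillment_view(event: AccountStandingChanged) -> dict:
    return {
        "account_id": event.account_id,
        "eligible_to_ship": event.standing == "CURRENT",
        "as_of": event.effective_at,
    }

# Semantic routing rules: which consumer contexts receive which events,
# and through which translation.
ROUTES = {
    "AccountStandingChanged": [
        ("fulfillment", to_fulfillment_view),
        # ("finance", to_finance_view), ...
    ],
}

def route(event: AccountStandingChanged) -> dict:
    """Fan an upstream event out to consumer-specific projections."""
    return {
        consumer: translate(event)
        for consumer, translate in ROUTES.get(type(event).__name__, [])
    }
```

In a real estate the routing table would live in configuration or a stream processor, but the design point survives: the rules are explicit, domain-conscious, and inspectable rather than buried in consumer code.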
3. Distinguish operational APIs, event products, and analytical products
A lot of mess comes from pretending these are interchangeable.
- Operational APIs support transactional interaction and command/query use cases.
- Event products propagate domain changes over time, often through Kafka.
- Analytical products provide queryable, reconciled, often denormalized views.
One domain may expose all three. They are not the same thing.
If you call a Kafka topic a data product but consumers need point-in-time correctness, replay semantics, and reconciliation after late-arriving changes, you have really offered an event integration API. That’s fine. Name it correctly and support it properly.
4. Build reconciliation into the model, not as an afterthought
Enterprises do not run on perfect streams. They run on correction.
So every serious data product architecture needs to answer:
- what is the key?
- what is the authoritative source?
- what is the event time and processing time?
- can history be revised?
- how are duplicates handled?
- how are out-of-order events handled?
- what closes the books for financial or regulatory reporting?
- how do consumers reconcile missing or inconsistent records?
If those questions are unanswered, the architecture is aspirational, not operational.
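A minimal sketch of what "answered" looks like for the temporal questions above: an updater that treats duplicate deliveries as no-ops and refuses to let out-of-order events overwrite newer state. The names and in-memory stores are illustrative; a real system would back them with a database or cache.

```python
# Current state keyed by business key, plus a dedupe store.
state: dict[str, dict] = {}
seen_event_ids: set[str] = set()

def apply(event_id: str, key: str, event_time: int, payload: dict) -> bool:
    """Apply an event; return True only if state actually changed."""
    if event_id in seen_event_ids:
        return False                        # duplicate delivery: no-op
    seen_event_ids.add(event_id)
    current = state.get(key)
    if current and current["event_time"] >= event_time:
        return False                        # late, out-of-order event: ignore
    state[key] = {"event_time": event_time, **payload}
    return True
```

Note that this deliberately distinguishes event time from arrival order, which is exactly the distinction consumers assume away until it breaks them.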
Architecture
Here is the high-level shape.
The crucial component is the domain routing layer. That may be implemented with Kafka streams, stream processing, integration microservices, event gateways, or a combination. The point is not a single technology. The point is explicit semantic mediation.
A reasonable implementation stack might look like this:
- microservices own transactional boundaries
- outbox pattern or CDC publishes domain events
- Kafka carries event products and state-change streams
- schema registry manages compatibility
- stream processors build derived products and routing projections
- API layer exposes queryable operational views where needed
- lakehouse or warehouse stores reconciled analytical products
- metadata/catalog captures ownership, SLAs, lineage, and usage constraints
But no component should obscure the domain ownership model.
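The outbox step in that stack can be sketched in a few lines. The schema and function names are assumptions, and SQLite stands in for the operational database, but the core guarantee holds: the domain change and the outgoing event commit in one transaction, so a crash can never publish an event for a write that did not happen, or lose an event for one that did. A relay (or CDC on the outbox table) then publishes in sequence order.

```python
import sqlite3
import json

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE account (id TEXT PRIMARY KEY, standing TEXT);
    CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                         event_type TEXT, payload TEXT,
                         published INTEGER DEFAULT 0);
""")

def change_standing(account_id: str, standing: str) -> None:
    with db:  # one transaction: state change and outbox row commit together
        db.execute("INSERT OR REPLACE INTO account VALUES (?, ?)",
                   (account_id, standing))
        db.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                   ("AccountStandingChanged",
                    json.dumps({"account_id": account_id,
                                "standing": standing})))

def drain_outbox() -> list:
    """Relay step: read unpublished events in order and mark them published."""
    rows = db.execute("SELECT seq, event_type, payload FROM outbox "
                      "WHERE published = 0 ORDER BY seq").fetchall()
    db.executemany("UPDATE outbox SET published = 1 WHERE seq = ?",
                   [(r[0],) for r in rows])
    db.commit()
    return rows
```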
Domain semantics and canonical traps
Many enterprises hear this and immediately reach for a canonical enterprise model. That is usually the wrong move.
A canonical model is appealing because it promises common language. In practice it often becomes a compromise language nobody truly owns. Every field is included because someone somewhere needs it. Meaning gets blurred. Change becomes bureaucratic. Teams fall back to side channels and custom mappings.
Use shared kernels sparingly for genuinely shared concepts. Use published language where a producer’s model is intentionally stable for consumers. Use translation where concepts differ. Do not force sameness where difference is real.
The enterprise does not need one customer model. It needs a clear map of customer-related models and the routes between them.
Event and state duality
One subtle but important point: consumers often need both the event trail and the current state.
For example:
- fraud analytics may need every customer status change event
- order fulfillment may only need the current “eligible to ship” state
- finance may need end-of-day reconciled state for reporting
Trying to satisfy all three with one artifact is how teams create overloaded topics and brittle consumers. Publish event products for change history. Publish state products for durable consumability. Keep the distinction visible.
This architecture supports one of the hardest enterprise truths: the event that happened and the state you are allowed to act on are related, but not identical.
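The duality is easy to see in code. A hypothetical event trail serves change-history consumers (like fraud analytics) as-is, while a current-state product is simply a fold over that trail in event-time order:

```python
from typing import Iterable

# Illustrative event product: a trail of customer status changes.
events = [
    {"customer_id": "C-1", "status": "ACTIVE",    "event_time": 1},
    {"customer_id": "C-1", "status": "SUSPENDED", "event_time": 3},
    {"customer_id": "C-2", "status": "ACTIVE",    "event_time": 2},
]

def current_state(trail: Iterable[dict]) -> dict[str, dict]:
    """Derive a state product: the latest event per key, in event-time order."""
    state: dict[str, dict] = {}
    for e in sorted(trail, key=lambda e: e["event_time"]):
        state[e["customer_id"]] = e
    return state
```

Publishing both artifacts keeps the distinction visible: consumers of `events` get every change; consumers of `current_state(events)` get only the state they are allowed to act on.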
Migration Strategy
No large organization gets to this architecture in one move. If you try, the migration becomes the architecture’s obituary.
Use a progressive strangler approach.
Start from a painful integration seam, not from a platform manifesto. Pick a high-value domain concept with many consumers and clear business friction: customer eligibility, order status, payment settlement, inventory availability, policy coverage, account standing.
Then work in stages.
Stage 1: Identify bounded contexts and current semantic breaks
Map where the concept is used and what it means in each domain. This is DDD context mapping work, and it is worth doing properly. You are looking for:
- authoritative source by sub-concept
- key identifiers and crosswalks
- latency expectations
- correction paths
- current reconciliation pain
- hidden spreadsheets and manual checks
- systems of record versus systems of action
Stage 2: Publish the first domain-owned product
Do not build the whole enterprise information model. Publish one trustworthy product with explicit contract and quality notes.
For example:
- Billing publishes AccountStandingChanged events and a queryable AccountStanding state product.
- Compliance publishes VerificationStatus.
- Sales does not pretend either of those is “Customer Master.”
Stage 3: Add domain routing and anti-corruption around the old estate
This is where strangling begins. Instead of every downstream system reading legacy tables directly, route through the new product and translation layer. Some consumers still need old interfaces; fine. The routing layer can fan out to legacy integration patterns while new consumers use events and APIs.
Stage 4: Introduce reconciliation services
As traffic grows, you will discover mismatches: dropped events, stale records, correction logic, source disagreement. Good. That means you are touching reality.
Build reconciliation explicitly:
- periodic source-to-product comparison
- replay tooling
- dead-letter triage
- idempotent consumers
- compensating update flows
- exception dashboards tied to business ownership
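The periodic source-to-product comparison can start as simply as this sketch, where each side is reduced to a map of business key to fingerprint (in practice a hash of the relevant columns; all names here are illustrative):

```python
def reconcile(source: dict[str, str], product: dict[str, str]) -> dict:
    """Compare source and product fingerprints keyed by business key."""
    src_keys, prod_keys = set(source), set(product)
    return {
        # keys in source but missing downstream: dropped or unpublished events
        "missing_in_product": sorted(src_keys - prod_keys),
        # keys downstream with no source counterpart: stale or ghost records
        "unknown_in_product": sorted(prod_keys - src_keys),
        # keys present on both sides but with divergent state
        "mismatched": sorted(k for k in src_keys & prod_keys
                             if source[k] != product[k]),
    }
```

Each bucket maps to a different remediation path: replay for missing keys, expiry for ghosts, and correction flows for mismatches, with the tolerated divergence window signed off by the business owner.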
Stage 5: Retire direct dependencies on underlying source schemas
This is the real strangler milestone. Once consumers rely on product contracts rather than source internals, the source can evolve. Until then, you have not decoupled anything.
Migration is less about replacing technology and more about replacing accidental semantics with intentional ones.
Enterprise Example
Consider a global insurer modernizing customer and policy servicing across 20 countries.
The legacy environment contains:
- a policy administration monolith
- regional CRMs
- a claims platform
- a finance ledger
- a data warehouse used for reporting and actuarial analysis
- several Kafka-backed microservices for new digital channels
Leadership decides to create a “Customer 360 data product.” This is the usual instinct. If done naively, it becomes a landfill for every customer-adjacent field from every system.
A better architecture starts by refusing the false unity.
The insurer identifies separate bounded contexts:
- Party Management: legal person/org identity, identifiers, contact points
- Policy Administration: policyholder role, covered parties, policy lifecycle
- Claims: claimant, incident relationships, claim status
- Billing: account, premium payment standing, delinquency
- Compliance: sanctions screening, KYC/AML status
- Customer Engagement: digital profile, preferences, consent
Now the architecture becomes useful.
Party Management publishes a PartyProfile product.
Billing publishes AccountStanding.
Compliance publishes ScreeningStatus.
Policy Administration publishes PolicyRoleAssignments.
A routing layer then produces consumer-specific projections:
- Claims receives a policy-and-party view relevant to claims intake.
- Digital channels receive a service-facing “eligible customer” view.
- Marketing receives only consented engagement attributes, not raw policy or compliance data.
- Finance receives reconciled account and policy linkage with end-of-day closure rules.
Kafka carries the event streams. Stream processors create current-state products. APIs provide on-demand lookup for operational workflows. Reconciliation jobs compare policy and billing relationships nightly because finance closes on controlled batches, not purely on streams.
What happened to “Customer 360”? It survives, but as a derived analytical product, not as the authoritative integration contract for every operational use case. That is the right answer.
This insurer avoids a common failure mode: leaking claims semantics into customer engagement, or marketing identifiers into compliance workflows. Teams can move independently because the routes are explicit.
And when regulations in one country require retroactive suppression of customer contact data, only the relevant products and projections are changed. The entire enterprise does not need to redefine “customer” overnight.
Operational Considerations
This style of architecture is not free. It asks for operational discipline.
Contract governance
If data products are integration APIs, treat them like APIs:
- version contracts deliberately
- define compatibility rules
- publish ownership and support model
- document semantic changes, not just schema changes
- test consumer compatibility continuously
Schema registry helps, but schema compatibility is the floor, not the ceiling.
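As an illustration of treating the contract like an API, here is the kind of backward-compatibility check a CI pipeline might run on a product contract before release. The contract shape is invented for the example, not any particular registry's format, but the rules mirror standard backward compatibility: no removed fields, no type changes, no new required fields.

```python
def backward_compatible(old: dict, new: dict) -> list[str]:
    """Return violations in `new` that would break existing consumers."""
    violations = []
    for field, spec in old["fields"].items():
        if field not in new["fields"]:
            violations.append(f"removed field: {field}")
        elif new["fields"][field]["type"] != spec["type"]:
            violations.append(f"type change: {field}")
    for field, spec in new["fields"].items():
        if field not in old["fields"] and spec.get("required"):
            violations.append(f"new required field: {field}")
    return violations
```

Even with a check like this in place, the ceiling is still semantic review: a field whose type is unchanged but whose meaning shifted passes every schema gate and breaks consumers anyway, which is why semantic changes need documenting separately.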
Observability
You need lineage and runtime observability:
- producer health
- topic lag
- projection freshness
- reconciliation drift
- replay outcomes
- consumer error rates
- contract adoption by version
The most useful dashboards combine technical and business views. “Verification status product delayed by 2 hours” matters more than “topic throughput down 18%.”
Reconciliation as first-class architecture
I’ll say it plainly: if your event-driven enterprise has no reconciliation architecture, it is a demo.
Reconciliation should include:
- record counts and key coverage checks
- state comparison against source snapshots
- audit trail for corrections
- replay and backfill capability
- business sign-off for tolerated divergence windows
Especially in finance, healthcare, insurance, and regulated commerce, reconciliation is not a side utility. It is how the enterprise trusts the architecture.
Identity and key management
Many domain routing failures are actually identifier failures:
- customer ID versus party ID versus account ID
- local keys versus global keys
- merged/split identities
- reused external identifiers
- survivorship rules
A key strategy should be explicit. Hidden crosswalks destroy confidence faster than almost anything else.
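An explicit key strategy can be surprisingly small. This sketch (union-find style, with invented names) keeps a crosswalk in which merged identities resolve to a surviving canonical party ID, so every consumer applies the same survivorship rule instead of maintaining a hidden mapping:

```python
# Crosswalk: each ID points at its parent; a canonical ID points at itself.
canonical: dict[str, str] = {}

def register(party_id: str) -> None:
    """Make an ID known; it is its own canonical ID until merged."""
    canonical.setdefault(party_id, party_id)

def resolve(party_id: str) -> str:
    """Follow the chain of merges to the surviving canonical ID."""
    while canonical[party_id] != party_id:
        party_id = canonical[party_id]
    return party_id

def merge(loser: str, survivor: str) -> None:
    """Survivorship rule: all references to `loser` now resolve to `survivor`."""
    canonical[resolve(loser)] = resolve(survivor)
```

The point is not this particular structure; it is that merge and survivorship behavior are published, testable product behavior rather than tribal knowledge.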
Tradeoffs
This pattern has sharp edges.
Pros
- clearer domain ownership
- less semantic leakage across teams
- safer reuse of data products
- better support for event-driven and analytical use cases together
- improved migration path out of monoliths
- lower long-term coupling than direct database or raw CDC sharing
Cons
- more design work up front
- need for skilled domain modeling, not just platform engineering
- more product management around contracts
- translation layers can proliferate if unmanaged
- possible latency added by routing and projection building
- harder to explain than “just publish the table”
It also creates a political tradeoff. Some teams will resist because explicit semantics expose where their data is weak, overloaded, or inconsistent. Architecture here is not just technical hygiene. It is organizational honesty.
Failure Modes
Several things go wrong repeatedly.
1. The “topic equals product” mistake
Publishing a Kafka topic and calling it a product is not enough. If ownership, semantics, keys, correction rules, and consumer expectations are unclear, you have simply industrialized ambiguity.
2. Recreating the ESB with better branding
If every route and transformation is centrally controlled by an integration team with weak domain participation, the architecture ossifies. The answer is federated ownership with strong platform guardrails, not integration priesthood.
3. Canonical enterprise model bloat
The model becomes huge, slow, and contested. Nobody can evolve it. Teams fork around it. This is how standards die: not by rebellion, by circumvention.
4. Ignoring temporal semantics
Consumers treat arrival time as business truth. Then late or corrected events appear, and downstream processes break or silently diverge.
5. No reconciliation path
Streams drift from sources. Batch corrections overwrite assumptions. Audit asks for traceability. The architecture responds with hand-waving.
6. Leaking domain internals as public contract
A source system’s table structure or workflow statuses escape into enterprise consumption. Now every producer change becomes an enterprise incident.
When Not To Use
This pattern is powerful, but it is not universal.
Do not use full domain-routed data product architecture when:
- the organization is small and a simpler integration style will do
- there are very few consumers and semantics are stable
- the exchange is purely analytical and no operational dependency exists
- the domain is immature and concepts change weekly
- the platform team lacks the ability to support contract governance and observability
- batch file exchange with explicit controls is actually the better fit for regulatory or operational reasons
And do not force Kafka into places where event streaming is incidental. If the real need is a stable operational query API over a well-owned domain, build that. Streaming is useful when change propagation matters. It is not a moral upgrade.
Related Patterns
This architecture sits alongside several related patterns.
- Data Mesh
Useful when interpreted correctly: domain-owned products with federated governance. Dangerous when reduced to “every team publishes data somehow.”
- Event-Driven Architecture
Excellent for propagating change. Insufficient by itself for semantic governance and reconciliation.
- CQRS
Helpful when separating write models from read-optimized state products.
- Outbox Pattern / CDC
Practical ways to publish domain changes reliably from operational systems.
- Anti-Corruption Layer
Essential for protecting bounded contexts from semantic pollution.
- Strangler Fig Pattern
The right migration shape for replacing direct integration with productized domain contracts over time.
- Master Data Management
Still relevant, but best used surgically for identity and survivorship where genuine shared mastery exists. Not every domain concept needs a universal master.
Summary
The useful provocation is this: your data products are integration APIs.
Once data leaves a domain and other domains depend on it, you are no longer merely publishing data. You are publishing meaning, timing, correction rules, ownership, and trust. That is integration architecture, whether the transport is Kafka, REST, CDC, or a warehouse table.
The winning move is not to centralize everything into one canonical model, nor to let every team emit whatever they happen to store. It is to route data exchange through domain semantics. Use bounded contexts to define ownership. Use product contracts to make meaning explicit. Use anti-corruption layers and translation where concepts differ. Use progressive strangler migration to retire direct coupling. And build reconciliation as a first-class concern, because enterprises live on corrected truth, not just streamed truth.
A table is not a product. A topic is not a product. A payload is not a product.
A product begins when another domain can depend on it safely.
That is the standard worth designing for.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.