Your Data Platform Is an Integration Layer in Disguise


Most data platforms are introduced with clean intentions and dirty consequences.

They begin life as a sensible investment: centralize analytics, standardize reporting, make data discoverable, help the business move faster. A warehouse here, a lake there, perhaps a stream processor if the architects are feeling modern. The language is reassuring. Single source of truth. Self-service. Federated governance. Data products. But then the platform grows teeth. Teams route operational events through it. Reference data gets copied into it and copied back out. Upstream services start depending on transformed outputs. Finance closes books from it. Customer operations resolve disputes from it. Machine learning pipelines enrich records that later leak into production workflows. What looked like a data estate quietly becomes an integration estate.

That is the trick. Your data platform is often not a passive repository at all. It is an integration layer in disguise.

And if you fail to see it that way, you inherit a hidden dependency graph topology that no one designed, few people understand, and everyone eventually depends on. The graph is hidden because the dependencies are not expressed in service contracts, API gateways, or event catalogs. They are buried in SQL jobs, schema evolution, notebook logic, reverse ETL tools, brittle Kafka topics, and “temporary” reconciliation scripts that somehow survive four reorganizations.

This is not merely a technology issue. It is a domain issue. A semantics issue. An ownership issue. In domain-driven design terms, the real danger is not data duplication. The danger is bounded contexts bleeding into each other through derived datasets with no explicit model of meaning, authority, or invariants. Once that happens, the platform stops being a helpful observer of the enterprise and starts becoming an accidental orchestrator of it.

That is where architecture matters. Not in drawing a prettier diagram, but in naming the thing honestly.

Context

A modern enterprise rarely has a single integration mechanism. It has many.

There are request-response APIs between operational systems. There are asynchronous events on Kafka. There are ETL and ELT pipelines into the warehouse or lakehouse. There are MDM hubs, batch exports, CDC streams, partner file drops, and SaaS connectors feeding every direction at once. Overlay that with data governance, retention rules, privacy controls, and departmental reporting requirements, and what you get is not a platform but a living network of dependency paths.

The popular story says these are separate concerns. Operational integration happens in microservices and event backbones. Analytical consumption happens in the data platform. In practice, large enterprises blur this line constantly.

Why? Because the data platform is easy to reach. It already has the data. It already normalizes identifiers. It already calculates metrics. It often has better historical truth than source systems. For a delivery team under pressure, consuming a curated table from the warehouse can be faster than negotiating a new API with three system owners and a security board. Reverse ETL can push those results into CRM, marketing, or service platforms with one procurement cycle and a few clicks. The platform becomes the shortest path between silos.

Shortest paths are seductive. They are also how accidental architecture happens.

A hidden dependency graph forms when downstream behaviors depend on upstream data transformations that are undocumented as operational contracts. The graph has topology: hubs, choke points, transitive dependencies, circular loops, fan-out explosions, and orphan branches. But unlike a designed application topology, this one is mostly invisible until it fails.

Problem

The problem is not that data platforms integrate systems. They absolutely do, and sometimes they should. The problem is pretending they do not.

Once the platform acts as an integration layer, several risks emerge at once.

First, semantics drift. A “customer” in billing is an accountable legal entity. A “customer” in support is a case-bearing account. A “customer” in marketing may be an individual lead, half-known and probabilistic. The data platform often collapses these into one conformed entity because reporting wants a universal dimension. That can be useful analytically. It becomes dangerous when operational processes consume that same conformed view as if it were authoritative domain truth.

Second, authority gets muddled. Which system owns credit status? Which one owns product eligibility? Which one owns effective address? In a warehouse, it is easy to derive a best answer. In an enterprise, “best” is not the same as “authoritative.” A derived answer may be perfect for analytics and unacceptable for transactions.

Third, change impact becomes opaque. A team modifies a transformation to improve revenue attribution. Unbeknownst to them, a downstream CRM sync uses that output to drive case prioritization, and a machine learning feature set uses the same table to score churn risk. A harmless reporting change mutates customer treatment in production. Nobody changed an API, but the enterprise behavior changed anyway.

Fourth, temporal guarantees degrade. Operational integrations need explicit expectations around freshness, ordering, idempotency, retries, and failure handling. Data platforms often tolerate eventual consistency in broad windows. But as soon as a sales process or fulfillment workflow depends on a curated table being “fresh enough,” architecture has crossed a line. Timeliness is now a business invariant, not a reporting preference.

Finally, accountability evaporates. Integration layers need operational ownership because they sit on the critical path of business behavior. Data platforms are often governed by platform teams optimized for enablement, not transaction stewardship. That mismatch becomes painful in incidents.

You can live with these issues for a while. Large enterprises often do. But eventually the bill arrives, and it arrives during an outage, a close cycle, a compliance event, or a merger.

Forces

Good architecture is mostly the art of respecting forces instead of denying them.

The first force is speed. Product teams and business functions need information quickly. They will use the easiest available path. If the data platform offers a cleaner, richer, or faster route than operational interfaces, it will be used as one.

The second force is heterogeneity. Enterprises do not have one customer system, one product model, or one process engine. They have layers of ERP, CRM, bespoke services, acquired platforms, and regional variations. A central data platform becomes the place where these differences are normalized because no single operational system can absorb them all.

The third force is analytical gravity. Historical data, cross-domain joins, and aggregate state naturally collect in the platform. Any decision process requiring history or broad enterprise context will be tempted to run there.

The fourth force is organizational mismatch. Domain ownership in software architecture sounds crisp on slides and messy in reality. Data often crosses domains before governance catches up. Business stakeholders ask for an outcome, not a bounded context map. Someone delivers the outcome through SQL and schedules.

The fifth force is economics. Building and governing explicit service APIs for every integration is expensive. So is event contract discipline on Kafka. Reusing the data platform can be the cheaper local optimization.

And then there is the force most architects underestimate: reconciliation. Enterprises are not run on pure transactions. They are run on finding and explaining differences between competing records of reality. Reconciliation is where hidden integration surfaces. If your finance ledger, order platform, and customer warehouse disagree, people do not care which stack is “correctly layered.” They care which one they can trust and how quickly they can explain variance.

These forces are why the pattern keeps appearing. Not because people are foolish, but because enterprises are untidy machines.

Solution

The answer is not to ban all operational use of the data platform. That is purity theater. The answer is to treat the platform explicitly as part of the enterprise integration architecture when it behaves that way.

That means four things.

1. Make domain semantics first-class

Domain-driven design is essential here. Not as decoration, but as control.

You need explicit bounded contexts, clear system-of-record decisions, and published semantic contracts for shared entities and events. A curated customer table is not “the customer truth” unless you can state which aspects of customer it owns, which are projections from other domains, and which are only analytical interpretations. The platform must expose whether a dataset is authoritative, derived, reconciled, probabilistic, or presentation-oriented.
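One lightweight way to make this explicit is to publish a machine-readable contract alongside each curated dataset. The sketch below is illustrative, not a standard: the `DatasetContract` and `Authority` names, and the idea of splitting owned from projected fields, are assumptions about how such a registry might look.

```python
from dataclasses import dataclass, field
from enum import Enum

class Authority(Enum):
    AUTHORITATIVE = "authoritative"   # owned fact from a system of record
    DERIVED = "derived"               # computed from other datasets
    RECONCILED = "reconciled"         # merged across competing sources
    PROBABILISTIC = "probabilistic"   # inferred, e.g. an ML output
    PRESENTATION = "presentation"     # shaped for reporting only

@dataclass(frozen=True)
class DatasetContract:
    name: str
    owning_domain: str                        # bounded context that owns the meaning
    authority: Authority
    owned_fields: frozenset = field(default_factory=frozenset)
    projected_fields: frozenset = field(default_factory=frozenset)  # foreign models

    def is_safe_for_transactions(self) -> bool:
        # Only authoritative data should feed transactional writes.
        return self.authority is Authority.AUTHORITATIVE

# A derived Customer360 view: useful analytically, not transactional truth.
customer_360 = DatasetContract(
    name="customer_360",
    owning_domain="analytics",
    authority=Authority.DERIVED,
    projected_fields=frozenset({"billing.overdue_balance", "crm.segment"}),
)
```

The point is not the class itself but the question it forces at publication time: is this dataset authoritative, derived, reconciled, probabilistic, or presentation-oriented, and who owns the answer?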

This sounds obvious. It is not common.

Most hidden dependency graphs are really hidden semantic graphs. Teams depend less on raw fields than on implied meaning: active customer, fulfilled order, collectible invoice, eligible policy holder. Those meanings must be named and anchored to domains.

2. Classify data flows by operational criticality

Not all data flows deserve the same architecture.

Some flows are analytical only. Delay is acceptable. Reprocessing is fine. Backfills are normal.

Some flows are decision-support near operations. Delay matters, but direct transaction execution does not depend on them.

Some flows are operationally critical. They trigger customer treatment, financial booking, entitlement, fraud action, shipment release, or regulatory reporting.

These categories should drive different controls, SLAs, lineage depth, testing standards, and ownership models. If a dataset sits on the critical path of business behavior, it must be designed and run like integration middleware, not just data plumbing.
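A minimal sketch of that classification, with control profiles attached per tier. The threshold values are examples only, not recommendations; tune them to your own SLAs and ownership model.

```python
from enum import Enum

class Criticality(Enum):
    ANALYTICAL = "analytical"              # delay and backfill acceptable
    DECISION_SUPPORT = "decision_support"  # delay matters, no direct writes
    OPERATIONAL = "operational"            # drives treatment, money, entitlement

# Illustrative control profiles per class -- the numbers are placeholders.
CONTROL_PROFILE = {
    Criticality.ANALYTICAL: {
        "max_staleness_hours": 24, "on_failure": "backfill", "paging": False,
    },
    Criticality.DECISION_SUPPORT: {
        "max_staleness_hours": 4, "on_failure": "alert", "paging": False,
    },
    Criticality.OPERATIONAL: {
        "max_staleness_hours": 1, "on_failure": "halt_consumers", "paging": True,
    },
}

def controls_for(criticality: Criticality) -> dict:
    """Look up the controls a dataset inherits from its criticality tier."""
    return CONTROL_PROFILE[criticality]
```

The asymmetry is deliberate: an operational feed pages someone and halts consumers on failure, while an analytical mart quietly backfills.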

3. Separate projection from authority

The platform should be a superb place to build projections: consolidated views, historical models, machine-learning features, cross-domain read models, and reconciled analytics. But projection is not authority.

Keep the write authority and transactional invariants inside the owning bounded context whenever possible. If reverse ETL or platform-driven updates are necessary, treat them as explicit integration patterns with auditability, idempotency, conflict handling, and rollback plans.

A good line to remember: the closer you get to writing back, the less you are doing analytics and the more you are doing integration.
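What "treating write-back as integration" means in practice can be sketched as follows. This is a hypothetical shape, not a specific tool's API: every outbound update carries an idempotency key and lands in an audit log, so a retried or replayed sync cannot apply twice.

```python
import hashlib
import json

class WriteBack:
    """Governed write-back sketch: deduplicated by idempotency key, audited."""

    def __init__(self):
        self.applied = {}   # idempotency_key -> result (stands in for a dedupe store)
        self.audit_log = []

    def idempotency_key(self, target: str, record: dict) -> str:
        # Canonical JSON so the same logical update always hashes the same.
        payload = json.dumps({"target": target, "record": record}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def apply(self, target: str, record: dict, apply_fn) -> str:
        key = self.idempotency_key(target, record)
        if key in self.applied:          # replay or retry: do not apply twice
            return self.applied[key]
        result = apply_fn(record)        # the real CRM/API call would go here
        self.applied[key] = result
        self.audit_log.append({"key": key, "target": target, "result": result})
        return result
```

Conflict handling and rollback would sit on top of this; the sketch only shows the minimum that separates an integration from a blind sync.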

4. Expose the dependency graph

You need an architectural view of hidden topology: data producers, transforms, consumption paths, reverse flows, semantic ownership, freshness expectations, and failure blast radius.

A catalog alone is not enough. Nor is lineage alone. You need a dependency graph that combines technical lineage with business semantics and operational criticality.
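Even a toy version of that graph makes blast radius computable. The dataset and system names below are invented for illustration; the useful part is that a breadth-first walk over producer-to-consumer edges answers "if this changes, what moves?"

```python
from collections import deque

# Edges point from producer to consumer; a toy slice of a hidden graph.
EDGES = {
    "billing.invoices": ["wh.raw_billing"],
    "crm.accounts": ["wh.customer_360"],
    "wh.raw_billing": ["wh.customer_360"],
    "wh.customer_360": ["wh.churn_features", "rev_etl.crm_priority"],
    "wh.churn_features": ["ml.churn_scores"],
    "rev_etl.crm_priority": [],   # operationally critical: drives call queues
    "ml.churn_scores": [],
}

def blast_radius(node: str) -> set:
    """Everything transitively downstream of a change to `node`."""
    seen, queue = set(), deque([node])
    while queue:
        for consumer in EDGES.get(queue.popleft(), []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen
```

Run it for `billing.invoices` and the reverse ETL feed into CRM shows up, which is exactly the dependency nobody wrote down.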

Here is the shape of the problem.

Diagram 1

Nothing in that diagram is unusual. That is precisely why it is dangerous. Enterprises normalize this shape and forget that every arrow is a dependency and every derived object carries semantic decisions.

Architecture

A sound architecture for this world does not pretend there is one universal pattern. It uses a layered model with explicit intent.

The layers

Operational systems of record own transactional invariants. They emit domain events or expose APIs. They are the source of authority for specific facts.

Event backbone and integration services handle near-real-time integration where ordering, routing, retries, and contract discipline matter. Kafka is excellent here when used deliberately, not as a dumping ground.

Data platform ingestion and storage collect raw and historized data from operational systems, often via CDC, events, and batch interfaces.

Semantic projection layer builds curated, domain-aware datasets: read models, reconciled views, feature tables, analytical marts.

Consumption and activation layer serves BI, data science, and selected downstream operational consumers. This is where reverse ETL, feeds, and domain-approved projections can live, under governance.

The key is not the layers themselves. It is the rules between them.

  • Authority flows from systems of record.
  • Projections can combine sources but must declare semantics.
  • Write-backs are explicit integrations with owning-domain approval.
  • Reconciliation is a named capability, not an ad hoc script.
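Those rules are simple enough to enforce mechanically. A sketch of such a check, run against flow declarations from a registry; the declaration keys here (`kind`, `owning_domain_approved`, and so on) are assumptions about what such a registry might record.

```python
def check_flow(flow: dict) -> list:
    """Flag flows that break the layer rules. `flow` is a hypothetical
    declaration record; keys are illustrative."""
    violations = []
    if flow.get("kind") == "write_back" and not flow.get("owning_domain_approved"):
        violations.append("write-back without owning-domain approval")
    if flow.get("kind") == "projection" and not flow.get("declared_semantics"):
        violations.append("projection without declared semantics")
    if flow.get("kind") == "reconciliation" and not flow.get("owner"):
        violations.append("reconciliation with no named owner")
    return violations

risky = {"kind": "write_back", "target": "crm.accounts",
         "owning_domain_approved": False}
```

A check like this belongs in the same CI path that deploys the pipelines, not in a quarterly governance review.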

A practical view looks like this:

Diagram 2

Domain semantics discussion

This is where many architecture articles become abstract. Let us keep it real.

Suppose you build a Customer360 projection. Fine. What is it?

Is it:

  • a conformed analytical dimension for reporting,
  • a master entity used to merge duplicate identities,
  • a support-facing read model,
  • a marketing audience model,
  • or a decisioning object used to drive eligibility?

Those are different things. They may share data, but they do not share semantics.

A DDD approach forces the platform to stop pretending one view can satisfy all bounded contexts without translation. It can still publish a broad projection, but each consuming context must know whether it is using a foreign model, a local read model, or a derived interpretation. This matters because domain language carries policy. “Delinquent” in collections is not “at risk” in marketing. “Active” in service operations may include suspended accounts that finance excludes.

Conformed data is useful. Universal meaning is a fantasy.

Kafka and microservices

Kafka often appears as the antidote to bad data integration. Sometimes it is. Sometimes it merely moves the problem earlier in the pipeline.

If services publish domain events with stable contracts and consumers build their own projections, you preserve domain ownership better than central SQL transformations can. But Kafka only helps if events reflect true domain semantics rather than leaked database changes wrapped in Avro. Event-driven architecture does not eliminate semantic ambiguity; it just distributes it faster.

Use Kafka for:

  • event propagation from authoritative domains,
  • replayable streams,
  • decoupled near-real-time projections,
  • outbox-backed reliability,
  • integration patterns where ordering and temporal flow matter.

Do not use Kafka as an excuse to publish every internal table and hope someone else will infer the meaning.
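The outbox-backed reliability mentioned above is worth pinning down, since it is the pattern that keeps domain events honest. A minimal sketch using SQLite as a stand-in for the service's own database; table names and the `PolicyStatusChanged` event are invented for illustration. The state change and the event record commit in one transaction, and a relay publishes from the outbox afterward.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE policies (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                         event_type TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def change_policy_status(policy_id: str, status: str) -> None:
    # One transaction: the domain write and the event succeed or fail together,
    # so the stream can never claim something the database does not.
    with db:
        db.execute("INSERT OR REPLACE INTO policies VALUES (?, ?)",
                   (policy_id, status))
        db.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                   ("PolicyStatusChanged",
                    json.dumps({"policy_id": policy_id, "status": status})))

def drain_outbox(publish) -> int:
    """Relay loop body: publish unsent events, then mark them published."""
    rows = db.execute(
        "SELECT seq, event_type, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # a real relay calls the Kafka producer
        db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    db.commit()
    return len(rows)
```

Notice what the event carries: a domain fact with a name, not a row image. That is the difference between publishing semantics and leaking tables.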

Migration Strategy

Most enterprises cannot replace a hidden dependency graph with a neat target state in one move. The graph is too entangled, the consumers too many, and the business too dependent.

This is a strangler migration problem.

You start by making the accidental integration layer visible. Inventory critical datasets, reverse ETL jobs, cross-domain transformations, manual reconciliation routines, and downstream consumers with operational significance. Then you classify them by criticality and semantic risk.

A practical migration sequence usually looks like this:

  1. Map the graph. Identify hidden operational dependencies in the data platform.
  2. Tag authority. Mark each important field or dataset as authoritative, derived, reconciled, or inferred.
  3. Stabilize contracts. For critical outputs, introduce explicit schemas, data quality gates, and ownership.
  4. Carve out domain projections. Move broad central transforms toward domain-aligned read models.
  5. Shift operationally critical integrations left. Rebuild the most sensitive write-backs and decision triggers onto APIs or event-driven services where transactional guarantees belong.
  6. Retain the platform for analytics and reconciled views. Do not throw away what it is good at.
  7. Continue strangling legacy flows. Replace brittle hidden dependencies one path at a time.

The migration should be progressive, not ideological.

A useful pattern is dual-running with reconciliation. For a time, you let both the old platform-derived integration and the new domain-owned integration produce outputs. Then you compare them, explain variances, and only cut over when the business and technical differences are understood.

That reconciliation phase is not overhead. It is the migration.
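Mechanically, dual-running reduces to a keyed comparison of the two outputs. A sketch, with invented customer keys and statuses; the variance categories are one reasonable taxonomy, not a standard.

```python
def reconcile(old: dict, new: dict) -> dict:
    """Compare the legacy platform-derived output with the new domain-owned
    output, keyed by business key. Every break must be explained before cutover."""
    breaks = {}
    for key in set(old) | set(new):
        if key not in old:
            breaks[key] = ("missing_in_old", None, new[key])
        elif key not in new:
            breaks[key] = ("missing_in_new", old[key], None)
        elif old[key] != new[key]:
            breaks[key] = ("value_mismatch", old[key], new[key])
    return breaks

legacy = {"C-1": "overdue", "C-2": "current", "C-3": "overdue"}
domain = {"C-1": "overdue", "C-2": "current", "C-3": "grace_period"}
```

Here the single break is the interesting one: the domain system applies a grace-period rule the legacy transform never knew about. That is a finding, not a bug, and it is exactly what the reconciliation phase exists to surface.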

Diagram 3

This is especially important in finance, supply chain, and customer servicing. If you migrate without a disciplined reconciliation model, you merely trade hidden dependencies for hidden discrepancies.

Enterprise Example

Consider a global insurer.

Like many insurers, it grew by acquisition. It had multiple policy administration systems, a CRM estate spread across regions, separate billing platforms, and a modern digital claims stack. The enterprise data platform began as a reporting program to unify policy, premium, claims, and customer analysis. Over five years it evolved into something more consequential.

A Customer Household model in the warehouse became the de facto key for cross-sell campaigns. Then service teams started consuming policy-risk flags produced by a data science pipeline. Collections teams pulled “payment distress” segments generated from billing and behavioral data. Claims triage teams received propensity scores joined with fraud indicators and prior service history. Reverse ETL pushed many of these outputs back into CRM and work queues.

Nobody had set out to build an integration layer. But that is what they had.

The trouble started during a regional platform migration. Billing changed the semantics of “overdue balance” to reflect local regulation and grace-period handling. Analytics updated the transform correctly for reporting. But a downstream service workflow in CRM used that transformed field to prioritize outbound collections calls. In one region, compliant grace-period customers began receiving early contact. The issue was caught quickly, but not before legal and operations escalated.

The root cause was not bad SQL. It was semantic authority confusion. Billing owned the legal notion of collectibility. The platform had projected an analytical interpretation that became operational without adequate contract, ownership, or policy review.

The remediation was architectural.

They did not ban the data platform from decision support. Instead, they split the landscape into three classes:

  • Authoritative domain facts stayed in billing, policy, claims, and customer domains, exposed through APIs and domain events.
  • Analytical projections remained in the warehouse for reporting and modeling.
  • Operational decision projections were rebuilt as domain-approved read models with explicit freshness, lineage, and policy owners.

Kafka became the backbone for policy, claims, and billing domain events, using an outbox pattern from key transactional services. The data platform still consumed those events, but so did a small set of integration services building operational read models for CRM and case management.

They also introduced a reconciliation capability between billing status in the domain systems and the operational segments exposed to CRM. Every variance had an owner. Every exception type was classified. During migration, both paths ran side by side for three close cycles.

The result was not “simpler.” Enterprises rarely become simple. But it was legible. And legibility is a kind of safety.

Operational Considerations

If your data platform acts as integration infrastructure, run it like production middleware.

That starts with observability. You need more than pipeline success/failure dashboards. Measure freshness by domain-critical datasets, end-to-end latency by consumer, schema drift, contract violations, reconciliation break rates, and business-level data quality indicators. “Job succeeded” is meaningless if customer priority scoring is six hours stale.
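Freshness monitoring along those lines can be sketched in a few lines. The tier names match the criticality classes discussed earlier; the SLA hours and dataset names are illustrative.

```python
from datetime import datetime, timedelta, timezone

SLA_HOURS = {"operational": 1, "decision_support": 4, "analytical": 24}  # examples

def stale_datasets(last_loaded: dict, tiers: dict, now: datetime) -> list:
    """Return datasets whose last successful load breaches their tier's SLA.
    Deliberately ignores 'job succeeded' -- staleness is what the consumer
    actually experiences."""
    breaches = []
    for name, loaded_at in last_loaded.items():
        if now - loaded_at > timedelta(hours=SLA_HOURS[tiers[name]]):
            breaches.append(name)
    return sorted(breaches)

now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
last = {
    "crm_priority_scores": now - timedelta(hours=6),   # operational, badly stale
    "churn_features": now - timedelta(hours=3),        # decision support, within SLA
    "finance_mart": now - timedelta(hours=30),         # analytical, stale
}
tiers = {"crm_priority_scores": "operational",
         "churn_features": "decision_support",
         "finance_mart": "analytical"}
```

The same six-hour delay that is a shrug for a finance mart is an incident for a priority-scoring feed, and the check encodes that difference.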

Data quality controls need to be domain-aware. Null checks and row counts are table stakes. Real controls check policy invariants, state transitions, monotonicity where expected, duplicate business keys, and semantic compatibility across sources.
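Two of those domain-aware checks, sketched concretely. The state machine for a collections status field is hypothetical; the shape of the checks is the point.

```python
# Legal transitions for a hypothetical collections status field.
ALLOWED = {
    "current": {"current", "grace_period", "overdue"},
    "grace_period": {"grace_period", "current", "overdue"},
    "overdue": {"overdue", "current", "written_off"},
    "written_off": {"written_off"},   # terminal: no resurrection without review
}

def invalid_transitions(history):
    """history: list of (business_key, old_status, new_status) observed in a load."""
    return [(k, a, b) for k, a, b in history if b not in ALLOWED.get(a, set())]

def duplicate_business_keys(keys):
    """Row counts pass while a duplicated business key silently double-counts."""
    seen, dups = set(), set()
    for key in keys:
        (dups if key in seen else seen).add(key)
    return sorted(dups)
```

A written-off account reappearing as current, or a business key landing twice, are exactly the defects that null checks and row counts will never catch.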

Lineage must include business semantics. A graph showing table-to-table transformations is useful. A graph showing that a “delinquency segment” influences collections outreach is better.

Security and privacy become more serious once the platform activates downstream processes. Fine-grained access, masking, retention enforcement, and audit trails must cover not only storage but movement and operational reuse.

And then there is incident management. When platform outputs drive operations, incident response must include domain teams, not just data engineers. The playbook should say what happens when a feed is delayed, a projection is wrong, or reconciliation variances spike. Do customers get held? Are cases paused? Is a fallback API available? Who has authority to disable activation?

The hidden graph becomes survivable when it is made operable.

Tradeoffs

There is no free architecture here.

Treating the data platform as part of the integration layer increases governance overhead. You will need more explicit contracts, more ownership clarity, more testing, and more operational discipline. Teams will complain that simple data sharing now feels bureaucratic. Sometimes they will be right.

Keeping authority in domain services can slow cross-domain delivery. Building proper APIs or event contracts is often slower than deriving a field in SQL. The platform can still be the fastest route to value for exploratory or analytical use cases.

A domain-aligned semantic model may also reduce the fantasy of one enterprise-wide canonical model. That can frustrate executives who prefer one customer definition to many contextual ones. But comforting simplifications are expensive in production.

Kafka and event-driven patterns improve decoupling but add complexity around ordering, replay, schema governance, and consumer responsibility. They are not a shortcut around organizational ambiguity.

Reconciliation itself has a cost. Dual-running systems and investigating variances consume time and attention. But that cost is usually lower than discovering semantic mismatches after cutover.

The architecture I am advocating is more honest, not more elegant. Honest architectures often feel heavier because they admit the enterprise’s actual complexity instead of burying it in transformation code.

Failure Modes

There are recurring ways this goes wrong.

The warehouse becomes the system of record by stealth. Teams start treating curated tables as authoritative because they are cleaner than source systems. Eventually nobody knows where truth originates.

Reverse ETL creates write-back loops. A projection derived from CRM and billing is pushed back into CRM, triggering workflows that generate more source data, which then alters the projection. Congratulations, you have built a circular dependency with no explicit control model.
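Loops like this are detectable with an ordinary depth-first search over the same producer-to-consumer graph used for blast radius. A sketch, with the CRM round-trip from the failure mode above as the invented example.

```python
def find_cycle(edges: dict):
    """DFS over a producer->consumers graph. Returns one cycle as a list of
    nodes (first == last), or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in edges}
    path = []

    def visit(node):
        color[node] = GREY          # GREY = currently on the DFS path
        path.append(node)
        for nxt in edges.get(node, []):
            if color.get(nxt, WHITE) == GREY:
                return path[path.index(nxt):] + [nxt]   # back-edge: cycle found
            if color.get(nxt, WHITE) == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(edges):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# The write-back loop: CRM feeds the warehouse, the projection is pushed
# back into CRM via reverse ETL. Names are illustrative.
LOOP = {
    "crm.accounts": ["wh.customer_360"],
    "wh.customer_360": ["rev_etl.crm_sync"],
    "rev_etl.crm_sync": ["crm.accounts"],
}
```

Running this over the real dependency inventory turns "congratulations, you have a loop" from a post-incident discovery into a pre-deployment gate.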

Conformed dimensions erase domain nuance. Reporting-friendly harmonization leaks into operational processes that depend on distinctions the conformed model flattened away.

Kafka topics become another swamp. Without event discipline, the enterprise simply migrates hidden semantics from SQL transforms to streams.

Freshness assumptions drift. Consumers quietly begin depending on hourly data as if it were real time. Then a late-arriving pipeline causes a business outage nobody architected for.

Reconciliation is treated as temporary. Temporary variance scripts become permanent control points with no ownership, testing, or resilience.

Lineage is technical but not semantic. You can trace a field through ten jobs but still not know what it means or who can change its policy interpretation.

These are not edge cases. They are the normal consequences of under-designed integration through data platforms.

When Not To Use

Do not use the data platform as an integration layer for hard transactional invariants.

If a process requires synchronous validation, strict consistency, reservation semantics, payment authorization, inventory commitment, entitlement enforcement, or legally binding state transitions, keep it in operational services and dedicated integration mechanisms. The platform can observe and analyze these flows, but it should not own them.

Do not use it where latency and ordering are core business requirements unless you are genuinely prepared to engineer and operate the platform to that standard. Most analytics-oriented stacks are not designed for it.

Do not centralize domain logic in SQL just because cross-domain joins are convenient. Convenience is not a design principle.

And do not force every enterprise onto a platform-mediated model when domains are still immature. If ownership is unclear and semantics are unresolved, centralization will preserve ambiguity, not solve it.

Related Patterns

Several patterns sit near this one.

Data mesh is useful when it pushes ownership of semantic data products toward domains. It fails when “data product” becomes a polite label for ungoverned extracts.

CQRS read models are relevant because many platform projections are really enterprise-scale read models. The trick is preserving the distinction between read optimization and write authority.

Event sourcing and event-driven architecture help when domains can publish reliable event histories. They do not absolve you from defining semantics.

MDM remains valuable where identity resolution and shared reference domains are central. But MDM should not become a blanket excuse for collapsing bounded contexts.

Strangler fig migration is the right mindset for evolving away from hidden dependencies without destabilizing the business.

Reconciliation services deserve more architectural respect than they usually get. In many enterprises, they are not side utilities. They are core trust mechanisms.

Summary

The uncomfortable truth is simple: many data platforms are already integration layers. They route meaning, not just data. They shape business behavior, not just reports. They sit in the dependency path of operational decisions whether architects acknowledge it or not.

Once you see that, the job changes.

You stop asking only how to ingest, transform, and serve data. You ask which domain owns meaning, which flows are operationally critical, where authority lives, how reconciliation works, and what hidden topology your platform has created. You stop pretending every curated dataset is harmless and start classifying which ones are effectively contracts.

This is where domain-driven design earns its keep. Bounded contexts, semantic clarity, and ownership are not ivory-tower concepts; they are the only sane defense against accidental enterprise coupling.

And this is where migration discipline matters. You do not rip out a hidden dependency graph in one heroic program. You expose it, classify it, reconcile it, and progressively strangle the most dangerous edges first.

A data platform can absolutely be a strategic advantage. But only if you stop treating it like a neutral storage plane when it is clearly acting as integration architecture.

The dependency graph is there whether you draw it or not.

Serious architects draw it.

Frequently Asked Questions

What is a data mesh?

A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.

What is a data product in architecture terms?

A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.

How does data mesh relate to enterprise architecture?

Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.