Most enterprise data platforms are introduced with the wrong story.
They’re pitched as a central nervous system, a single source of truth, a lake, a mesh, a fabric, a hub. Pick your metaphor. The trouble starts when the metaphor becomes the design. Teams then spend millions building an elegant storage system, a heroic ingestion framework, or a very expensive Kafka installation—only to discover they haven’t built a platform at all. They’ve built a queue with branding.
A real data platform is not defined by where data sits. It is defined by who depends on whom, for what meaning, at what speed, and with what consequences when things go wrong. In other words, your data platform is a dependency map.
That sounds obvious. It isn’t. Most architecture decks still draw data as if it were water flowing downhill through boxes. Source systems on the left. Pipelines in the middle. Dashboards and machine learning on the right. This is comforting, and wrong in the way comforting diagrams often are. Enterprises do not fail because bytes didn’t move. They fail because business meaning crossed boundaries without enough care. “Customer” means one thing in billing, another in sales, and something politically explosive in compliance. “Order” can be a promise, a shipment, an invoice trigger, or a legal record. Once you see the world that way, the platform stops being a storage problem and starts being an architectural dependency problem.
That shift matters. It changes how you decompose domains. It changes where Kafka belongs. It changes how microservices and analytics interact. It changes migration strategy from “lift the data into the cloud” to “progressively relocate dependencies and make reconciliation survivable.” It also tells you when not to use a data platform pattern at all.
This article takes the unfashionable position that successful data architecture begins with semantics and dependency management, not tooling. We’ll look at the problem, the forces that make it hard, a practical target architecture, a migration path that doesn’t require a revolution, and the operational realities that architects too often hide in the appendix.
Context
Enterprises inherit data landscapes the way old cities inherit streets: not from a grand plan, but from years of local optimization.
ERP systems become systems of record because procurement said so. CRM systems become “the customer master” because commercial operations had budget. A claims platform, a warehouse management package, a core banking platform, a policy engine, a product catalog, and thirty-seven spreadsheets each become authoritative for something. Then reporting arrives, then APIs, then event streaming, then machine learning, and every new capability promises to “unify” what the previous wave fragmented.
What emerges is not one platform. It is a web of obligations.
Finance depends on order data, but only after legal entity normalization. Customer support depends on shipment events, but needs them correlated with returns and SLA rules. Marketing wants “customer 360,” which sounds tidy until you ask whether a household, a contract holder, a device user, and a newsletter subscriber are the same person. Risk wants near-real-time signals. Compliance wants immutable lineage. Operations wants to know why yesterday’s revenue changed after a supposedly final close.
These are not storage concerns. They are dependency concerns between domains, decisions, and time horizons.
Domain-driven design gives us a much better lens here than classic enterprise data modeling. DDD asks: what is the bounded context, what language does it use, and what can it truly promise to others? That is exactly the right set of questions for a data platform. The platform exists to make cross-context dependencies explicit and governable. It should not flatten the language of the enterprise into one giant canonical fantasy.
Canonical models are often where good intentions go to die. They aim for universal consistency and end up delivering universal ambiguity. The platform should instead preserve domain semantics while making translations visible, versioned, and operationally safe.
Problem
The problem isn’t merely that data is siloed. Silos are sometimes healthy. The problem is hidden coupling.
A dashboard depends on a transformed customer table that depends on CRM extracts and billing adjustments and a deduplication rule written by a contractor two years ago. A fraud model depends on event streams that silently changed from business events to technical state changes. A product recommendation service depends on inventory feeds that arrive “near real time,” which in practice means every 27 minutes except on month-end.
No one sees the full dependency chain until an executive asks why the numbers don’t match.
Traditional data platforms make this worse in three predictable ways.
First, they centralize data without centralizing meaning. Teams load everything into a warehouse, lakehouse, or streaming backbone and declare victory. But raw colocation does not produce semantic coherence. It merely creates a larger blast radius for misunderstanding.
Second, they overcorrect with premature standardization. A central team invents a canonical customer schema, a universal event taxonomy, and enterprise-wide golden records before domains are stable enough to support such abstractions. Every team then maps reluctantly to this model, often losing nuance, inventing exceptions, or bypassing the platform entirely.
Third, they ignore time as a first-class architectural concern. Data consumers do not merely depend on facts. They depend on facts at a certain latency, with certain replay properties, with a certain reconciliation posture. Batch versus streaming is not a tooling debate; it is an agreement about the shape of dependency.
This is where Kafka and microservices often enter the conversation badly. Kafka is treated as the cure for all integration disease. Microservices are expected to publish every state change as an event, and analytics teams are told to “just subscribe.” That usually creates a stream of low-level noise rather than business truth. Event-driven architecture only helps when the emitted events reflect stable domain semantics and when consumers understand what guarantees they are buying.
If your platform is a dependency map, then every feed, event, table, API, and derived model should answer four questions:
- What business concept does this represent?
- Which bounded context owns that meaning?
- What dependency guarantees are provided?
- How will divergence be reconciled when—not if—it appears?
Many enterprises cannot answer those questions today.
Forces
Several forces pull against a coherent design.
1. Domains are real, but funding is centralized
The business lives in domains. Budgets often do not. A central platform team is asked to serve finance, supply chain, sales, digital, and compliance with one operating model. That pushes toward generic infrastructure and away from semantic stewardship.
2. Local optimization beats enterprise purity
A supply chain team with a quarterly target will build the shortest path to value. If the platform imposes too much modeling overhead, they will export CSVs, build side pipelines, or stuff logic into dbt, Spark, or application services. They are not being rebellious. They are meeting deadlines.
3. The same noun means different things
This is the old enterprise wound. “Customer,” “product,” “account,” “exposure,” “booking,” “contract”—these words do not travel cleanly across contexts. Domain semantics are not a data quality nuisance. They are architecture.
4. Latency requirements vary wildly
Some dependencies are operational and sub-second. Some are analytical and daily. Some must be replayable. Some must be final. Trying to force all of them into a single batch warehouse or a single streaming backbone is a category mistake.
5. Legacy systems do not migrate in one piece
Mainframes, packaged apps, and old integration hubs aren’t replaced atomically. They coexist with modern services for years. That means the platform must handle overlap, partial authority, and reconciliation between old and new truth sources.
6. Governance usually arrives too late
By the time lineage, ownership, data contracts, and quality controls are discussed, dozens of hidden dependencies already exist. Governance then feels punitive because it is trying to retrofit clarity onto accidental architecture.
These forces explain why “just build a lakehouse” rarely solves the real problem. The issue is not where to put data. The issue is how to manage semantic and operational dependencies across changing domains.
Solution
The solution is to design the data platform as a dependency management system for domain data products.
That phrase needs precision.
A domain data product is not just a curated table or an event topic. It is a named, owned, versioned representation of domain meaning, published with explicit guarantees for downstream use. It may be exposed as batch tables, streaming events, APIs, feature views, or all three. The platform’s job is to host, govern, and observe these products and their dependencies.
This is where domain-driven design helps cut through fashionable nonsense. The owner of “Order Accepted” should be the order management bounded context, not the central data team. The owner of “Invoice Settled” should be finance. The owner of an enterprise-wide “Customer Revenue Segment” might be a cross-domain analytical product, but it should declare its upstream dependencies explicitly rather than pretending to be the customer master.
So the architecture should do three things:
- Preserve domain ownership of meaning.
- Make cross-domain dependencies explicit and observable.
- Support multiple delivery modes—batch, streaming, API—without semantic drift.
The platform becomes less like a lake and more like an air traffic control system. It doesn’t fly the planes. It makes routes, dependencies, sequencing, and failures visible enough that the enterprise can operate safely.
A practical design has four layers:
- Operational domain sources: applications, microservices, packages, and legacy platforms where business transactions originate.
- Domain publishing layer: outbox/event publication, CDC, APIs, and curated extracts that expose business meaning from each bounded context.
- Data product layer: domain-aligned, versioned data products for analytical and operational reuse.
- Consumption layer: dashboards, ML, workflows, regulatory reporting, customer experiences.
The key rule is simple: the closer you are to the domain boundary, the more you preserve source semantics. The farther you move into cross-domain products, the more you acknowledge that you are composing, not discovering, meaning.
Target architecture
Notice what this target architecture does not include: one magical canonical layer where all disagreements disappear. That layer is usually fiction.
Instead, each data product carries a contract. Not merely schema, but semantics, freshness, keys, correction policy, lineage, and expected use. If a product is derived from three upstream domains with different close calendars and reconciliation rules, say so. Good platforms make dependency debt visible.
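Such a contract can live as structured metadata published alongside the product itself. Here is a minimal sketch in Python; the field names and example values are illustrative assumptions, not a standard:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductContract:
    """Declares what a data product means and what consumers may rely on."""
    name: str                 # e.g. "order_lifecycle"
    owner: str                # bounded context that owns the meaning
    business_concept: str     # what the product represents, in domain language
    schema_version: str       # technical compatibility axis
    semantic_version: str     # business-meaning compatibility axis
    freshness_slo: str        # e.g. "p95 < 15 min after source commit"
    keys: tuple = ()          # stable identity/join keys
    correction_policy: str = ""  # how late corrections reach consumers
    upstream: tuple = ()      # declared dependencies, not hidden ones


# An illustrative contract for a cross-channel order product.
contract = DataProductContract(
    name="order_lifecycle",
    owner="order-management",
    business_concept="Order state from acceptance through settlement",
    schema_version="2.1.0",
    semantic_version="2.0.0",
    freshness_slo="p95 < 15 min after source commit",
    keys=("order_id",),
    correction_policy="corrections published as versioned restatements",
    upstream=("commerce.order_accepted", "billing.invoice_issued"),
)
```

The point is not the dataclass; it is that freshness, keys, correction policy, and upstream dependencies become declared, reviewable facts rather than tribal knowledge.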
Architecture
Let’s get more concrete.
Domain semantics first
Every serious platform should begin with a semantic map, not a physical design. This is DDD applied to data architecture.
List the key business domains. Identify bounded contexts. For each important concept—customer, order, policy, account, inventory, payment—document where its meaning originates, where it is translated, and where it is merely consumed. This creates an architecture that people can argue about usefully.
You do not need a grand enterprise ontology. In fact, you probably should avoid one at first. What you need is a manageable set of explicit contracts between contexts.
For example:
- Order Management owns “Order Accepted,” “Order Cancelled,” “Order Line.”
- Warehouse owns “Item Picked,” “Shipment Dispatched.”
- Billing owns “Invoice Issued,” “Payment Received.”
- Finance Analytics may own “Recognized Revenue by Period,” but that is a derived analytical product, not a source-domain fact.
That distinction saves endless confusion.
Kafka where it actually helps
Kafka is useful when you have multiple consumers with different timing needs, when replay matters, when event ordering within a key matters enough, and when producers can publish stable business events. It is not useful as a dumping ground for every database change in the estate.
Publishing raw CDC from ten systems into Kafka and calling it event-driven architecture is one of those modern enterprise rituals that looks impressive and solves little. CDC is a migration and integration tool. Business events are a semantic contract. They are not the same thing.
Use Kafka for:
- domain events with clear business meaning
- stream processing where timeliness changes decisions
- replayable dependency chains
- decoupling many consumers from a stable publication model
Do not use Kafka as:
- a substitute for domain modeling
- a hidden ETL layer with no ownership
- the only interface for consumers that need query semantics
- a blanket mandate for every microservice
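The difference between raw CDC output and a business event is easiest to see side by side. A hedged sketch, with record shapes invented for illustration:

```python
# A raw CDC record: leaks table structure and implementation detail.
cdc_record = {
    "op": "u",
    "table": "ord_hdr_t",
    "before": {"stat_cd": "30"},
    "after": {"stat_cd": "40"},
    "ts_ms": 1717000000000,
}

# A business event: stable domain semantics a consumer can depend on.
business_event = {
    "event_type": "OrderAccepted",
    "event_version": "1.2",
    "order_id": "ORD-10042",
    "accepted_at": "2024-05-29T14:26:40Z",
    "channel": "e-commerce",
    "producer": "order-management",
}


def is_publishable(event: dict) -> bool:
    """A crude gate: only named, versioned, owned events leave the domain."""
    return all(k in event for k in ("event_type", "event_version", "producer"))


print(is_publishable(business_event))  # → True
print(is_publishable(cdc_record))      # → False
```

A real gate would validate against a schema registry, but even this crude check captures the principle: status code 30 becoming 40 is not a contract anyone should subscribe to.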
Dependency graph and contracts
The platform should maintain a dependency catalog: which products depend on which sources, with what transformations, SLAs, and semantic assumptions. This is not just lineage for auditors. It is how you understand blast radius.
When “Customer Revenue Segment” changes logic, the dashboard and retention model may shift while operational workflows remain stable. That is architecture, not metadata trivia.
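A dependency catalog does not need graph tooling to earn its keep. A minimal blast-radius query over a plain adjacency map, with product names invented for illustration:

```python
from collections import deque

# product -> direct downstream consumers (names are illustrative)
downstream = {
    "crm.customer_extract": ["customer_identity"],
    "customer_identity": ["customer_revenue_segment"],
    "customer_revenue_segment": ["exec_dashboard", "retention_model"],
    "billing.invoice_issued": ["customer_revenue_segment", "revenue_recognition"],
}


def blast_radius(product: str) -> set:
    """Everything transitively affected if `product` changes its semantics."""
    affected, queue = set(), deque([product])
    while queue:
        for consumer in downstream.get(queue.popleft(), []):
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


print(sorted(blast_radius("customer_identity")))
# → ['customer_revenue_segment', 'exec_dashboard', 'retention_model']
```

Run before any semantic change, this answers the only question consumers care about: am I in the blast radius?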
Reconciliation as a first-class design concern
In real enterprises, upstream systems disagree. They disagree because updates arrive late, because packaged apps use different keys, because corrections backdate financial facts, because duplicated identities remain unresolved, because old and new platforms overlap during migration.
The worst mistake is to treat reconciliation as an edge case. It is central.
A mature platform distinguishes between:
- event truth: what happened according to a source at a point in time
- state truth: what the current record says now
- decision truth: what downstream consumers should use for a particular purpose
These can differ legitimately.
A payment event may arrive before an invoice correction. Inventory state may show available units even though reservation events imply overcommit. Revenue recognition may revise historical numbers after a close adjustment. The platform must support correction, replay, and published reconciliation outcomes.
This is especially important during migration. For months or years, two systems may both claim authority over overlapping entities. Reconciliation products then become business-critical artifacts, not temporary plumbing.
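A reconciliation product can begin as nothing more than a keyed comparison that publishes exceptions instead of hiding them. A sketch assuming two overlapping order sources during migration; the record shapes are illustrative:

```python
def reconcile(legacy: dict, modern: dict) -> list:
    """Compare order state across two systems that both claim authority.

    Emits exception records for humans and workflows -- never silent fixes.
    """
    exceptions = []
    for order_id in sorted(legacy.keys() | modern.keys()):
        old, new = legacy.get(order_id), modern.get(order_id)
        if old is None or new is None:
            exceptions.append({"order_id": order_id,
                               "type": "missing_in_one_system"})
        elif old["status"] != new["status"]:
            exceptions.append({"order_id": order_id, "type": "state_mismatch",
                               "legacy": old["status"], "modern": new["status"]})
    return exceptions


legacy = {"A1": {"status": "shipped"}, "A2": {"status": "cancelled"}}
modern = {"A1": {"status": "shipped"}, "A3": {"status": "accepted"}}

for exc in reconcile(legacy, modern):
    print(exc)  # A2 and A3 each surface as exceptions; A1 agrees
```

In practice this grows timing tolerances, key mapping, and financial thresholds, but the shape stays the same: divergence is a published fact with an owner, not a surprise at month-end.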
Migration Strategy
Most platform transformations fail because they assume a clean break. Enterprises do not do clean breaks unless they have gone bankrupt and been reborn.
The sane path is a progressive strangler migration.
Start by mapping current dependencies, not current systems. Ask which decisions, reports, workflows, and regulatory outputs depend on which data products today. Then identify candidate bounded contexts where a modern publication model can be introduced without forcing the whole estate to change.
The migration usually moves through five stages.
1. Expose legacy truth safely
Use CDC, extracts, or wrapper APIs to expose existing systems as domain-aligned source publications. Do not pretend they are elegant. The goal is visibility and repeatability.
2. Introduce explicit data products
Create owned, versioned products for high-value domains. Customer identity, order lifecycle, inventory availability, payment status—pick areas with broad dependency but manageable semantics.
3. Build reconciliation between old and new
As modern microservices or SaaS platforms are introduced, maintain reconciliation products that compare records, detect divergence, and publish exceptions. Architects who skip this step usually end up in endless “which system is right?” meetings.
4. Shift consumers progressively
Move downstream consumers one dependency at a time from direct source coupling to data products. Dashboards first, then ML features, then operational consumers where confidence is higher.
5. Retire hidden dependencies
Only after consumers have moved should legacy point-to-point feeds and brittle ETL chains be decommissioned.
Strangler migration view
This is the important migration truth: you are not replacing technology first. You are relocating dependency with controlled semantics.
That is slower than a slide deck and faster than failure.
Enterprise Example
Consider a global retailer modernizing order-to-cash across e-commerce, stores, and wholesale channels.
The existing estate looked familiar: SAP for finance, a legacy order management platform, Salesforce for customer engagement, a warehouse system from an acquisition, and dozens of SQL-based reporting marts. Executives wanted “real-time customer 360” and “one version of inventory truth,” which is the sort of phrase that can burn a year of architecture effort before anyone asks useful questions.
The actual business pain was more specific.
Customer service could not answer where an order was if it crossed channels. Finance could not reconcile recognized revenue against returns quickly enough at month-end. Digital teams over-promised stock because e-commerce availability did not align with warehouse reservations. Meanwhile, a new microservices-based commerce stack was being introduced region by region.
A conventional approach would have centralized all source data into a cloud lakehouse and built a canonical customer and order model. The architecture team resisted that. Correctly.
Instead, they modeled bounded contexts:
- Commerce owned order acceptance and basket semantics.
- Warehouse owned physical fulfillment state.
- Billing/Finance owned invoice and settlement meaning.
- Customer Engagement owned consent and contactability.
- Identity Resolution was treated as a derived cross-domain capability, not the owner of customer truth.
They introduced Kafka, but only for business events that reflected stable domain milestones: OrderAccepted, OrderAllocated, ShipmentDispatched, InvoiceIssued, PaymentCaptured, ReturnReceived. CDC from legacy systems was used behind the scenes for migration and backfill, but not exposed as enterprise business events.
Then they built four key data products:
- Order Lifecycle Product
- Inventory Availability Product
- Customer Contactability Product
- Revenue Recognition Product
The ugly but essential part was reconciliation. For nearly eighteen months, the old order platform and the new commerce services overlapped. Orders originated in different channels on different stacks. Rather than hide this, the team created a Cross-Channel Order Reconciliation Product that flagged duplicate keys, timing drift, state mismatches, and financial exceptions. Support staff used it daily. Finance used it at close. It was not glamorous. It saved the program.
Results came in waves, not miracles. Customer service resolution time dropped because the Order Lifecycle Product gave a unified journey. Revenue reporting became auditable because the finance product declared correction windows and source dependencies explicitly. Inventory promises improved, though never perfectly, because warehouse and commerce semantics remained distinct rather than being force-fit into a fake universal stock table.
The biggest win was architectural clarity. New teams stopped wiring themselves directly to source databases. They consumed products with known contracts. That reduced accidental coupling and made future migration cheaper.
That is what a data platform should do.
Operational Considerations
A dependency-centric platform lives or dies in operations.
Ownership
Every data product needs a real owner. Not a committee, not “the platform,” not a mailbox. Ownership means semantic stewardship, quality response, versioning decisions, and communication with consumers.
SLOs and freshness
Freshness is not a generic metric. A fraud scoring input with 5-second delay has different business impact from a finance mart delayed by 4 hours. Product contracts should state expected latency, completeness windows, and correction behavior.
Observability
Observe pipelines, yes—but more importantly observe products. Track schema changes, event volume anomalies, key cardinality shifts, freshness breaches, reconciliation rates, and semantic drift indicators. A healthy dashboard is less useful than knowing 12% of orders no longer map to revenue periods correctly.
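Product-level checks can be expressed as simple rules over published metrics. The metric names and thresholds below are illustrative assumptions, not a monitoring standard:

```python
from datetime import datetime, timedelta, timezone


def check_product_health(metrics: dict) -> list:
    """Evaluate product-level signals rather than pipeline-level ones."""
    alerts = []
    age = datetime.now(timezone.utc) - metrics["last_published"]
    if age > metrics["freshness_slo"]:
        alerts.append("freshness_breach")
    # A jump in key cardinality often signals a broken join or dedup rule.
    if metrics["key_count"] > 1.5 * metrics["key_count_baseline"]:
        alerts.append("key_cardinality_shift")
    # e.g. orders that no longer map to a revenue period
    if metrics["unmapped_ratio"] > 0.05:
        alerts.append("semantic_drift")
    return alerts


# A deliberately unhealthy product: all three rules fire.
metrics = {
    "last_published": datetime.now(timezone.utc) - timedelta(hours=6),
    "freshness_slo": timedelta(hours=4),
    "key_count": 2_000_000,
    "key_count_baseline": 1_000_000,
    "unmapped_ratio": 0.12,
}
print(check_product_health(metrics))
# → ['freshness_breach', 'key_cardinality_shift', 'semantic_drift']
```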
Versioning
Version schemas and semantics separately where possible. You can preserve schema compatibility while changing business meaning, and that is often more dangerous than a breaking technical change.
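Separating the two axes makes the danger explicit: a schema-compatible release can still break meaning. A sketch of a dual-axis compatibility check, assuming major-version semantics on both (the versioning scheme is an assumption):

```python
def consumer_safe(published: dict, pinned: dict) -> bool:
    """A consumer pins both versions; either axis can break it independently."""
    schema_ok = published["schema"].split(".")[0] == pinned["schema"].split(".")[0]
    meaning_ok = published["semantics"].split(".")[0] == pinned["semantics"].split(".")[0]
    return schema_ok and meaning_ok


pinned = {"schema": "2.1", "semantics": "1.0"}

# Schema still compatible, but "active customer" was redefined:
# technically fine, semantically breaking -- the more dangerous case.
print(consumer_safe({"schema": "2.2", "semantics": "2.0"}, pinned))  # → False

# Both axes compatible: safe to consume without review.
print(consumer_safe({"schema": "2.3", "semantics": "1.4"}, pinned))  # → True
```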
Access and governance
Governance should follow product boundaries. Access control, retention, lineage, and policy tags belong in the publication model, not as an afterthought. Especially in regulated industries, domain ownership and policy enforcement must travel together.
Replay and recovery
If you use Kafka or stream processors, decide early how replay works, what can be recomputed, and what needs snapshot correction. Replaying technically valid but semantically superseded events can create downstream nonsense if correction policies are vague.
Tradeoffs
There is no free architecture. This style has costs.
More upfront semantic work
You must invest in domain discovery, bounded contexts, and product contracts. Teams that want immediate delivery will find this annoying. Sometimes they are right.
Distributed ownership is messy
Domain ownership improves meaning but complicates consistency. A central team can force standards faster, though often at the price of local resentment and semantic distortion.
Reconciliation adds visible complexity
Executives prefer diagrams where systems agree. Real systems do not. Publishing reconciliation products makes architecture look messier, because it is messier. That honesty is a virtue, but not always a popular one.
Not every consumer wants a product
Some analysts just need a table. Some applications just need an API. If your platform insists everything be a grand data product ceremony, teams will route around it.
Kafka can multiply operational burden
Streaming platforms bring partitioning, ordering, replay, schema governance, backpressure, consumer lag, and retention economics. Use them because the dependency pattern demands them, not because “real time” sounds modern.
Failure Modes
This approach fails in familiar ways.
1. Data products become renamed tables
If products lack semantic contracts, ownership, and consumer guarantees, you have not built products. You have renamed datasets.
2. Canonical model sneaks back in
Under pressure, central teams often reintroduce one universal model “just for consistency.” Soon every exception flows through it and the enterprise is back to semantic mush.
3. Event streams carry technical noise
If microservices publish CRUD chatter instead of business events, downstream consumers inherit implementation details rather than domain truth.
4. Reconciliation is treated as temporary
Temporary reconciliation has a way of becoming permanent without support. If overlap will last 18 months, design reconciliation like a product, not a script.
5. Ownership is nominal
A data product with no empowered owner will degrade into stale logic and confused consumers. Platforms fail less from bad technology than from absent stewardship.
6. Batch and streaming split semantics
One team defines “active customer” in the warehouse, another emits a different notion in events, and consumers quietly pick whichever is convenient. This is semantic fragmentation by interface.
When Not To Use
You should not use this style everywhere.
If you are a small organization with one operational system and a handful of reports, a lightweight warehouse and sensible modeling is enough. Do not invent data products, event contracts, and reconciliation machinery to solve problems you do not have.
If your domains are not yet stable—say, a startup radically changing business model every quarter—heavy semantic contracts may slow you down. Use looser conventions until the language hardens.
If your primary need is straightforward analytical consolidation with low change frequency, a conventional dimensional warehouse may be the better answer. Not every enterprise problem requires Kafka, microservices alignment, or a dependency graph.
And if the organization has no appetite for distributed ownership, stop pretending architecture can substitute for governance culture. A domain-centric platform without domain accountability becomes a central bottleneck wearing a federation costume.
Related Patterns
Several adjacent patterns are worth understanding.
- Data Mesh: useful for emphasizing domain ownership, but often too vague on operational mechanics. Good principle set, dangerous as a slogan.
- Event-Driven Architecture: excellent for time-sensitive dependency decoupling when events are semantically strong.
- CQRS: helpful when operational reads and writes need different models, especially if data products feed query-optimized views.
- Strangler Fig Pattern: essential for migration. Replace dependencies gradually, not systems theatrically.
- Change Data Capture: powerful migration and integration tool, but not the same as a business publication model.
- Master Data Management: useful when identity and reference coordination matter, but should not be mistaken for universal semantic ownership.
The trick is not choosing one religion. It is combining patterns with discipline.
Summary
A data platform is not a place where data goes. It is a map of what the enterprise depends on, what those dependencies mean, and how safely they can change.
That is the architecture worth building.
When you treat the platform as a dependency map, several good things happen. Domain semantics become explicit. Bounded contexts stop being abstract workshop artifacts and start guiding publication design. Kafka gets used for business events instead of technical exhaust. Microservices integrate with analytics without pretending every state change is valuable. Migration becomes a progressive strangler exercise in relocating dependencies, not a fantasy of instant replacement. Reconciliation stops being a shameful afterthought and becomes a core operational capability.
And perhaps most importantly, the enterprise gets an honest picture of itself. Not a neat left-to-right flow. A living web of obligations, meanings, delays, and tradeoffs.
That honesty is architecture. Everything else is plumbing.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.