Most data platform failures do not begin with technology. They begin with optimism.
A company buys a lakehouse, a streaming platform, a catalog, a policy engine, and three different dashboards that all promise “trust.” The architecture diagram looks modern. There are boxes for Kafka, Spark, object storage, a semantic layer, and machine learning. People congratulate each other for “building a foundation.”
Six months later, finance cannot reconcile revenue between systems. Compliance asks who approved access to customer data and gets five contradictory answers. Product teams publish events with names that sound right but mean different things in different contexts. Data scientists have five copies of “customer,” none of which agree. Every team insists the platform exists; nobody can say where authority lives.
That is the real issue. A data platform is not primarily a storage problem, nor a query problem, nor even an integration problem. It is a governance problem.
Governance is one of those words people use when they want to sound responsible. Usually they mean controls, committees, and a grim spreadsheet of approvals. That is not what matters here. In architecture, governance is the operational expression of meaning, decision rights, and policy. It is how an enterprise turns domain semantics into enforceable behavior. Without it, your platform becomes a very expensive rumor mill.
The hard truth is this: data platforms fail when they separate data from the business domains that give it meaning, and when they treat policy as documentation instead of execution. A table called customer_balance is not a fact. It is a claim. The question is: who is allowed to make that claim, under what rules, and how do downstream systems know whether to trust it?
That is why platform architecture has to start with domain-driven design thinking. If you do not understand bounded contexts, ownership, ubiquitous language, upstream/downstream relationships, and where truth is negotiated versus where truth is authoritative, then your governance tooling will merely automate confusion. Faster confusion is still confusion.
This article argues for a more grounded approach: design the data platform around domain semantics and policy enforcement, not just pipelines and storage tiers. Use progressive migration, not revolution. Accept reconciliation as a first-class capability, not a shameful afterthought. And be explicit about tradeoffs, because every governance model buys safety by constraining convenience.
Context
Enterprises now expect one platform to do everything.
It must support operational analytics, regulatory reporting, machine learning, self-service BI, event streaming, data sharing, lineage, retention, masking, and auditability. It must bridge legacy ERP systems, SaaS applications, custom microservices, and partner data feeds. It must serve both central platform teams and autonomous product teams. It must be cheap, fast, secure, discoverable, and compliant.
That expectation produces a familiar anti-pattern: the platform becomes a neutral bucket into which every system throws data, while governance is postponed to “later.” Later never arrives. Instead, the enterprise accumulates datasets detached from source accountability, policies buried in ETL scripts, and semantic drift spread across reports, topics, and APIs.
The modern stack worsens this if used carelessly. Kafka makes publication easy. Microservices decentralize behavior. Cloud warehouses make replication cheap. Lakehouses preserve every variant forever. Each choice is rational in isolation. Together, they can create a landscape where the same business event is reinterpreted ten times before breakfast.
In older architectures, centralization at least forced arguments into one room. In newer architectures, those arguments happen asynchronously through schemas, contracts, ACLs, code, and dashboards. The problem did not disappear. It fragmented.
So the platform architect’s job is not to “enable data.” That phrase is too vague to be useful. The job is to decide where meaning is established, how policy is enforced, how trust is signaled, and how divergence is detected and reconciled.
Problem
Most organizations build data platforms as transport and persistence layers. Governance is then bolted on in four weak forms:
- a catalog that documents assets but does not control behavior
- role-based access controls applied to storage but not to data semantics
- pipeline conventions enforced socially, not technically
- stewardship processes that discover problems after they have propagated
This creates a dangerous mismatch between how the enterprise talks about data and how the platform behaves.
The business talks in domain terms: customer, claim, order, invoice, consent, exposure, premium, limit, account, shipment. Those terms have context. A “customer” in billing is not always the same thing as a “customer” in CRM or risk. A “completed order” in e-commerce may differ from a “recognized sale” in finance. The platform often flattens these distinctions into generic schemas and shared tables, then acts surprised when reports disagree.
Domain semantics are not metadata garnish. They are the architecture.
When semantics are vague, policy becomes vague too. Who can see “customer”? Which customer definition? Which fields? Under what legal basis? Can it be joined with claims data? Can it leave a region? Can a machine learning feature store retain it longer than the source application? Can a downstream team republish it on Kafka? If those questions cannot be answered automatically, the platform is not governed. It is merely watched.
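To make “answered automatically” concrete, here is a minimal sketch of a policy decision function. The `AccessRequest` shape, the `decide` function, and the `billing.customer` policy are all illustrative assumptions, not a real policy engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessRequest:
    principal: str   # who is asking
    dataset: str     # which data product
    fields: tuple    # which fields
    purpose: str     # declared business purpose
    region: str      # where processing happens

def decide(request: AccessRequest, policy: dict) -> dict:
    """Return an allow/deny decision plus field-level masking."""
    rules = policy.get(request.dataset)
    if rules is None:
        return {"allow": False, "reason": "no policy declared for dataset"}
    if request.purpose not in rules["allowed_purposes"]:
        return {"allow": False, "reason": f"purpose '{request.purpose}' not permitted"}
    if request.region not in rules["allowed_regions"]:
        return {"allow": False, "reason": "regional residency violation"}
    masked = [f for f in request.fields if f in rules["masked_fields"]]
    return {"allow": True, "masked_fields": masked}

# Example policy for one data product (illustrative values only).
POLICY = {
    "billing.customer": {
        "allowed_purposes": {"invoicing", "regulatory_reporting"},
        "allowed_regions": {"eu-west"},
        "masked_fields": {"tax_id", "email"},
    }
}
```

The point is not this particular shape; it is that the question “which fields, for which purpose, in which region” returns a machine-enforceable answer rather than a wiki page.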
The cost shows up in the usual places:
- regulatory exposure because controls are not demonstrably enforced
- reconciliation work because datasets diverge silently
- low trust because metrics are debated more than used
- delivery friction because every data product requires bespoke approvals
- shadow platforms because business teams route around central bottlenecks
And beneath all of it sits the most expensive symptom: authority is unclear. Nobody knows which domain owns the meaning of the data and which platform capability is the actual policy decision point.
Forces
Architects like clean solutions. Enterprises produce compromises. A useful design has to respect the forces at work.
Domain autonomy versus enterprise consistency
Domains need local control. The claims team should not wait for a central committee to evolve its event model. The finance team should own its accounting semantics. But the enterprise still needs shared obligations around privacy, retention, audit, lineage, and interoperability.
Too much autonomy gives you semantic sprawl. Too much centralization gives you a platform nobody can move on.
Real-time distribution versus governed consumption
Kafka and event-driven microservices let teams publish state changes quickly. That is good for responsiveness. It is also an excellent way to spread bad assumptions at scale. Events are often treated as raw truth when they are really local observations from a bounded context.
Streams accelerate propagation. They do not create meaning.
Self-service versus policy enforcement
Everyone says they want self-service data. They usually mean instant access. But self-service without policy-aware controls becomes self-service liability. The platform must allow discovery and access while enforcing domain-specific restrictions automatically.
Historical preservation versus semantic drift
Data lakes and object stores preserve everything. This is useful for audits, reprocessing, and machine learning. It also means obsolete meanings survive forever. A field can be retained long after its business interpretation changed. Without versioning, lineage, and policy inheritance, history becomes a swamp rather than a record.
Migration urgency versus operational risk
Most firms do not get to build from scratch. They have warehouses, marts, ETL jobs, master data tools, and hand-crafted reports already in production. Governance architecture must support progressive strangler migration. If your design only works after a big-bang cutover, it will remain a slide deck.
Local truth versus enterprise reconciliation
In domain-driven design, each bounded context has its own model. Good. But executives still want enterprise totals. Regulators still expect reconciled reporting. You cannot wish that away with slogans about decentralization. Reconciliation is the bridge between local truths and enterprise obligations.
Solution
The core solution is simple to say and harder to do: organize the data platform around authoritative domain data products and make policy enforcement executable at every boundary.
That means five things.
First, define authoritative sources by bounded context. Not every dataset deserves equal status. Some are system-of-record facts, some are derived projections, some are cross-domain aggregates, and some are analytical conveniences. The platform should reflect those distinctions clearly.
Second, separate semantic ownership from platform operation. A central platform team should run common capabilities: storage, streaming, policy engine, lineage, catalog, observability, and developer tooling. Domain teams should own the meaning, quality expectations, and publication contracts of their data products.
Third, enforce policy in code and runtime controls, not in wiki pages. Access, masking, regional residency, retention, publication rights, and data sharing constraints should be applied consistently across Kafka topics, APIs, files, and warehouse objects.
Fourth, treat reconciliation as a product. Build explicit processes and services that compare authoritative domain outputs to downstream projections, financial ledgers, and regulatory reports. The point is not to eliminate inconsistency completely. The point is to detect, explain, and resolve it.
Fifth, migrate progressively using a strangler pattern. Wrap legacy pipelines with governance controls, expose canonical domain contracts gradually, and move consumers from unmanaged datasets to governed products over time.
This is not a plea for central bureaucracy. Quite the opposite. Good governance is what lets you decentralize safely. It narrows argument to the places where argument is justified and automates the rest.
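The distinctions in the first two points can be made machine-readable. A hypothetical sketch, with an `Authority` classification and a `DataProduct` record that separates semantic ownership from platform operation (all names invented for illustration):

```python
from enum import Enum
from dataclasses import dataclass

class Authority(Enum):
    SYSTEM_OF_RECORD = "system_of_record"    # authoritative facts, owned by one bounded context
    DERIVED = "derived"                      # projection of an authoritative product
    CROSS_DOMAIN = "cross_domain_aggregate"  # combines contexts; needs explicit mapping
    CONVENIENCE = "analytical_convenience"   # no authority claim at all

@dataclass(frozen=True)
class DataProduct:
    name: str
    bounded_context: str
    authority: Authority
    semantic_owner: str     # domain team accountable for meaning
    platform_operator: str  # central team running the machinery

ledger = DataProduct(
    name="finance.recognized_revenue",
    bounded_context="finance",
    authority=Authority.SYSTEM_OF_RECORD,
    semantic_owner="finance-domain-team",
    platform_operator="data-platform-team",
)
```

Note that the two ownership fields are deliberately different teams: meaning and machinery have separate owners.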
Architecture
The architecture has three layers of concern: domain publication, platform governance, and consumption with trust signals.
At the domain edge, operational systems and microservices publish events and snapshots. In a Kafka-centric environment, each bounded context emits domain events, not pretend-global facts. Contracts are versioned. Producers are accountable for semantics and quality indicators. Some contexts may also publish APIs for current-state lookups.
In the governance layer, a policy enforcement plane sits across all transport and storage mechanisms. This is where access policies, data classification rules, retention controls, schema compatibility, and publication constraints are executed. The policy plane is fed by metadata from the catalog, lineage systems, domain registries, and IAM.
Then comes the data product layer. Authoritative products are marked as such, with declared owners, service-level expectations, permitted uses, quality dimensions, and downstream obligations. Derived products inherit and refine policy. Analytical marts and feature stores are not exempt; they must carry policy lineage forward.
Finally, consumers—BI, regulatory reporting, data science, partner sharing, applications—consume not just data, but trust metadata: owner, freshness, reconciliation status, quality score, and policy posture.
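A consumer-side gate built on such trust metadata might look like the following sketch. The `trust` record shape and the `fit_for_use` thresholds are assumptions for illustration, not a standard API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical trust metadata a consumer receives alongside the data itself.
trust = {
    "owner": "claims-domain-team",
    "as_of": datetime.now(timezone.utc) - timedelta(hours=2),
    "reconciliation_status": "clean",   # clean | exceptions_open | unverified
    "quality_score": 0.97,
}

def fit_for_use(trust: dict, max_age: timedelta, min_quality: float) -> bool:
    """A consumer-side gate: refuse data whose trust signals are weak."""
    fresh = datetime.now(timezone.utc) - trust["as_of"] <= max_age
    reconciled = trust["reconciliation_status"] == "clean"
    return fresh and reconciled and trust["quality_score"] >= min_quality
```

A BI tool or model pipeline that checks this before reading is consuming trust, not just bytes.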
The important feature of this architecture is not the individual components. It is the placement of policy. Policy is not a sidecar attached only to the warehouse. It mediates publication, storage, transformation, and consumption.
A second pattern matters just as much: explicit authority and reconciliation paths.
This is the architecture many teams skip because they think reconciliation implies failure. It does imply failure. Routine, ordinary, expected failure. That is exactly why you design for it. Enterprises are full of timing differences, local definitions, delayed events, corrections, and legal overrides. Pretending all views line up naturally is a child’s idea of integration.
Domain semantics discussion
Domain-driven design gives us a disciplined way to avoid semantic collapse.
A bounded context defines where a term has a precise meaning. “Account” in treasury, retail banking, and identity management may share a label and almost nothing else. A sane platform does not force those meanings into one universal schema. Instead, it allows multiple authoritative products, each rooted in its context, and then creates explicit mapping and translation where cross-domain use is required.
That is the difference between integration and confusion.
A data product should therefore include semantic declarations:
- bounded context
- business owner and technical owner
- authoritative status
- glossary terms and definitions
- event or entity lifecycle definition
- quality and freshness commitments
- allowed transformations and prohibited uses
- join constraints where identity is probabilistic or regulated
These are not decorative metadata fields. They drive policy decisions. For example, a consent attribute from a privacy context may constrain whether customer profile data from another context may be used in a marketing feature store. If the platform cannot express and enforce that relationship, then the metadata is merely a brochure.
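One hedged sketch of how such a relationship could be expressed and enforced, with an invented `CONSENT` registry standing in for the privacy context's authoritative product:

```python
# Consent basis per customer, owned by the privacy context.
# All identifiers and purpose names are illustrative.
CONSENT = {
    "cust-001": {"service_delivery"},
    "cust-002": {"service_delivery", "marketing", "model_training"},
}

def may_ingest_to_feature_store(customer_id: str, intended_use: str) -> bool:
    """Feature-store ingestion is allowed only if the consent basis covers it.
    Unknown customers default to no permitted uses."""
    return intended_use in CONSENT.get(customer_id, set())

profiles = ["cust-001", "cust-002"]
trainable = [c for c in profiles if may_ingest_to_feature_store(c, "model_training")]
# Only cust-002 survives the filter.
```

The semantic declaration (consent basis) from one context mechanically constrains what another context's data may be used for.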
Migration Strategy
Big-bang data platform transformations are where good intentions go to die.
The right migration approach is progressive strangler migration. You do not replace the old world in one release. You establish a governed path beside it, redirect traffic gradually, and use reconciliation to prove equivalence or explain divergence.
The sequence usually looks like this:
1. Map bounded contexts and authority. Identify core business domains and the systems that currently act as de facto authorities. Do not chase perfection. Start with business-critical objects like customer, order, invoice, claim, policy, account.
2. Classify existing datasets. Label what is authoritative, derived, reference, duplicated, unmanaged, or deprecated. This exercise alone often reveals more than any lineage scan.
3. Introduce a central policy enforcement plane. Start where risk is highest: sensitive data access, retention, region controls, and publication permissions. Enforce these across warehouse objects, object storage, and Kafka topics.
4. Wrap legacy sources. Use CDC, APIs, or export jobs to expose governed domain products without rewriting every source system. Legacy systems can remain ugly and still participate in a disciplined architecture.
5. Publish domain contracts. Introduce schema/contract registries and ownership declarations. New consumers should prefer governed products over direct table access.
6. Migrate consumers gradually. Move reports, applications, and models one use case at a time. Use side-by-side outputs and reconciliation to build trust.
7. Decommission unmanaged paths. This is the step organizations avoid. If you never turn off old feeds and shadow marts, the migration never finishes.
The strangler pattern works because it respects enterprise gravity. Existing reporting cycles, month-end close, regulatory obligations, and contractual integrations cannot be paused while architects admire their principles.
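The decommissioning step can be given teeth with a simple guard: a legacy feed may only be switched off after the governed replacement has reconciled cleanly for several consecutive cycles. A sketch, with invented names and history shape:

```python
def ready_to_decommission(reconciliation_history: list, required_clean_cycles: int) -> bool:
    """Allow shutting off a legacy feed only after the governed replacement
    has reconciled with zero open exceptions for N consecutive cycles
    (e.g. month-end closes)."""
    if len(reconciliation_history) < required_clean_cycles:
        return False
    recent = reconciliation_history[-required_clean_cycles:]
    return all(cycle["open_exceptions"] == 0 for cycle in recent)

# Illustrative history: exceptions in January, clean since.
history = [
    {"cycle": "2024-01", "open_exceptions": 3},
    {"cycle": "2024-02", "open_exceptions": 0},
    {"cycle": "2024-03", "open_exceptions": 0},
]
```

Making the shutdown condition explicit turns “we will decommission it eventually” into a testable gate.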
Reconciliation in migration
Reconciliation deserves direct emphasis. During migration, it answers three uncomfortable but vital questions:
- Did the new governed product produce the same result as the old pipeline?
- If not, is the difference a bug, a timing issue, or a corrected business definition?
- Who gets to approve the new meaning?
This is where governance becomes real. You need exception workflows, domain steward decisions, documented semantic deltas, and audit trails. In finance and regulated industries, this is not optional. It is the mechanism by which migration becomes credible.
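As a rough illustration, a reconciliation service might classify each difference before routing it to a steward. The categories and the `written_premium` example below are illustrative, not a prescribed taxonomy:

```python
from decimal import Decimal

def categorize(diff: dict) -> str:
    """Classify a reconciliation difference so a steward can rule on it.
    Categories mirror the three questions: timing, changed meaning, or bug."""
    if diff["legacy_value"] == diff["governed_value"]:
        return "match"
    if diff.get("settles_next_cycle"):
        return "timing_difference"
    if diff.get("definition_changed"):
        return "semantic_delta"   # needs steward approval and documentation
    return "defect"               # route to the producing team

delta = {
    "metric": "written_premium",
    "legacy_value": Decimal("1200.00"),
    "governed_value": Decimal("1150.00"),
    "definition_changed": True,
    "settles_next_cycle": False,
}
```

Only the `semantic_delta` bucket requires a human decision about meaning; the rest is engineering.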
Enterprise Example
Consider a multinational insurer modernizing its data estate.
The company has policy administration systems in three regions, a central claims platform, a billing platform, dozens of microservices, and a long-established enterprise warehouse. It also has Kafka because every transformation program eventually acquires Kafka. The CIO’s initial instinct is familiar: “Let’s centralize everything into the lakehouse and govern it there.”
That would have been a mistake.
Why? Because “policy,” “claim,” “customer,” and “exposure” do not mean one thing across underwriting, billing, claims, and compliance. Claims may consider a party as involved in an incident long before billing recognizes them as a billable customer. Finance recognizes revenue on rules that differ from operational policy issuance. Privacy obligations vary by region. A global customer 360 assembled casually would be both semantically naive and legally dangerous.
So the architecture team took a different route.
They defined bounded contexts: Policy Administration, Claims, Billing, Customer Interaction, Finance, and Compliance. Each context published authoritative domain products. Kafka topics carried domain events such as policy-issued, premium-adjusted, claim-opened, and invoice-settled. The schema registry enforced compatibility rules. But more importantly, each product had explicit semantic ownership and policy metadata.
The central platform team implemented a policy engine that integrated IAM, catalog classifications, data residency rules, and retention schedules. Access to claims data for analytics required a lawful-use policy, not just a role membership. Kafka consumers could not subscribe to restricted topics unless their service account, purpose, and region matched policy. Warehouse views inherited masking rules automatically. Feature store ingestion was blocked for fields whose consent basis did not permit model training.
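The subscription check described above might be sketched like this. It is not a real Kafka authorizer plugin; the topic policy and function are purely illustrative of the decision being made at subscription time:

```python
# Hypothetical per-topic policy: who may consume, for what, and where.
TOPIC_POLICY = {
    "claims.claim-opened": {
        "allowed_accounts": {"svc-claims-analytics", "svc-fraud"},
        "allowed_purposes": {"fraud_detection", "claims_reporting"},
        "allowed_regions": {"eu-west", "eu-central"},
    }
}

def may_subscribe(topic: str, account: str, purpose: str, region: str) -> bool:
    """Deny unless service account, declared purpose, and region all match."""
    policy = TOPIC_POLICY.get(topic)
    if policy is None:
        return False  # unknown topics are closed by default
    return (account in policy["allowed_accounts"]
            and purpose in policy["allowed_purposes"]
            and region in policy["allowed_regions"])
```

The closed-by-default branch is the governance posture: absence of policy means absence of access.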
Notice the pattern: governance was enforcing business obligations, not merely permissions.
Migration was handled with a strangler strategy. The old warehouse remained in place. CDC wrapped the legacy regional policy systems and billing database. New governed data products were published side-by-side. Finance reports were run in parallel for two quarter-closes, with reconciliation services comparing premium and receivable totals between old marts and new domain-based views. Differences were categorized: timing lag, duplicate correction logic, changed semantic interpretation, or plain defect.
One memorable issue emerged around “written premium.” Underwriting operations considered it booked when a policy was issued. Finance considered it recognized under a more constrained accounting process. The old warehouse had blurred these meanings for years. The migration surfaced the disagreement. This delayed rollout by six weeks and saved the firm from institutionalizing a bad metric in the new platform.
That is what good governance does. It makes the real argument visible before the platform turns it into permanent machinery.
The result was not utopia. Self-service got slightly harder because the governed path asked more questions. But regulatory response times improved, lineage became defensible, and the number of contradictory executive dashboards dropped sharply. More importantly, teams knew which products were authoritative and which were convenience layers.
In enterprise architecture, clarity beats elegance.
Operational Considerations
A governed platform needs operations that match its ambition.
Policy lifecycle management
Policies change. Regulations evolve. New uses emerge. You need versioned policies, test environments, and staged rollout. A policy that accidentally blocks month-end close is not a theoretical concern; it is a career event.
Treat policy changes like code deployments:
- version them
- test them against representative data and access scenarios
- observe impact
- provide rollback
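A sketch of what “test them against representative scenarios” can mean in practice. The month-end-close rule and the service-account naming convention are invented for illustration:

```python
def make_policy(close_window_open: bool):
    """A new policy version tightens access -- but must never block month-end close."""
    def policy(principal: str, action: str) -> bool:
        if action == "read_gl_extract":
            # Finance batch accounts keep access during close, always.
            return principal.startswith("svc-finance-") or not close_window_open
        return principal.startswith("svc-")
    return policy

# Representative scenarios the new version must pass before staged rollout.
SCENARIOS = [
    ("svc-finance-close", "read_gl_extract", True),   # close must never break
    ("svc-bi", "read_gl_extract", False),             # access tightened in this version
]

def passes_regression(policy) -> bool:
    """Run the policy against every scenario; any mismatch blocks rollout."""
    return all(policy(p, a) == expected for p, a, expected in SCENARIOS)
```

If `passes_regression` fails in the test environment, the version never ships, and the previous version remains the rollback target.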
Metadata quality
Governance depends on metadata that is complete enough to drive decisions. Missing ownership, stale classifications, and broken lineage links undermine policy enforcement quickly. The platform team should measure metadata completeness as an operational metric, not a documentation aspiration.
Contract governance for Kafka and APIs
In event-driven architectures, schema compatibility alone is insufficient. A producer can make a semantically breaking change without altering field shape. For example, changing when an event is emitted or altering the business rules behind a status code can damage dozens of consumers.
That means review mechanisms for domain contracts must cover semantics, timing assumptions, idempotency, and retention expectations.
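One way to catch such breaks is to version semantic attributes alongside the schema and diff them on review. A sketch with invented attribute names:

```python
# Semantic attributes tracked per contract version, beyond field shape.
SEMANTIC_KEYS = ("emitted_when", "idempotent", "retention_days")

def semantic_breaks(old: dict, new: dict) -> list:
    """Return the semantic attributes that changed even if the schema did not."""
    return [k for k in SEMANTIC_KEYS if old.get(k) != new.get(k)]

v1 = {"schema": {"order_id": "string", "status": "string"},
      "emitted_when": "order placed", "idempotent": True, "retention_days": 30}
v2 = {"schema": {"order_id": "string", "status": "string"},  # identical shape
      "emitted_when": "payment captured", "idempotent": True, "retention_days": 30}
```

Here a schema-compatibility check would pass v2 unchanged, while the semantic diff correctly flags that the event now means something different.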
Exception handling and stewardship workflows
There will be violations, mismatches, and override requests. Build workflows for them. A platform that only handles the happy path will create back-channel workarounds by Friday.
Observability
Observe:
- policy decision rates and denials
- data product freshness
- contract violation rates
- reconciliation exceptions
- lineage coverage
- sensitive data movement across regions and domains
Governance without observability becomes faith.
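As a minimal example of the first signal, the policy denial rate can be computed from decision logs and alerted on. The log shape and threshold are assumptions:

```python
from collections import Counter

# Hypothetical stream of policy decisions from the enforcement plane.
decisions = ["allow", "allow", "deny", "allow", "deny", "allow", "allow", "allow"]

counts = Counter(decisions)
denial_rate = counts["deny"] / len(decisions)

def alert_on_denials(rate: float, threshold: float = 0.2) -> bool:
    """A spike in denials often means a policy change broke a legitimate path."""
    return rate > threshold
```

The interesting operational insight is usually the trend, not the absolute number: a sudden jump after a policy deployment is the signal to investigate.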
Tradeoffs
This architecture is worth doing, but it is not free.
The biggest tradeoff is speed versus discipline. Teams accustomed to dumping data into a lake and figuring it out later will feel constrained. They are right. They are being constrained. That is the point.
Another tradeoff is local optimization versus enterprise coherence. Domain teams may resent semantic reviews or publication controls. Central teams may overreach and try to standardize too much. The design only works if authority is explicit: central platform governs the mechanism of control; domains govern meaning within their bounded contexts.
There is also a cost in complexity. A policy plane, contract registry, lineage, reconciliation services, and metadata-driven enforcement are more complex than a pile of ETL jobs. But complexity does not vanish when ignored. It reappears later as fines, mistrust, duplicated marts, and impossible audits.
One more tradeoff is user experience. Strong governance can make self-service feel less magical. Good tooling softens this, but cannot erase it. “Request access” is less glamorous than “query anything.” It is also more compatible with reality.
Failure Modes
The most common failure mode is governance theater.
That is when an organization buys catalogs and policy tools, assigns data stewards, and still leaves access paths, data movement, and publication rights largely unmanaged. There are many dashboards about governance and very little actual enforcement. This usually happens because the enterprise wants the appearance of control without the friction of control.
Another failure mode is forcing a fake enterprise canonical model too early. Teams attempt to define one universal customer, one universal order, one universal product. They erase bounded contexts in the name of consistency and end up with abstractions so generic they are meaningless. Canonical models can be useful at integration seams, but they are terrible as a substitute for domain semantics.
A third failure mode is over-centralized approval processes. If every schema change, access request, or new data product needs committee review, teams will route around the platform. Shadow datasets are the market response to excessive governance friction.
There is also the event-stream trap: assuming Kafka equals authority. It does not. Streams are delivery mechanisms. A topic full of events can still be low-quality, semantically ambiguous, and non-compliant. Putting bad truth on a fast bus does not make it good truth.
Finally, many platforms fail because they ignore reconciliation. They declare the new product authoritative and shut down the old pipeline before proving equivalence, then spend months debugging discrepancies under executive pressure. Reconciliation is slower up front and dramatically cheaper overall.
When Not To Use
This approach is not for every situation.
Do not build a full governance-driven platform if you are a small company with a handful of systems, low regulatory burden, and a short path between data producers and consumers. You can drown in ceremony long before you have enough scale to justify it.
Do not impose heavy domain-product formality on exploratory analytics sandboxes where the purpose is temporary learning rather than enterprise distribution. Guard the perimeter, classify sensitive data, and move on.
Do not use this model if the organization lacks any willingness to assign actual domain ownership. Governance without accountable owners is just centralized paperwork.
And do not over-engineer for real-time if your core enterprise use cases are batch financial close, monthly planning, and standard BI. Kafka is useful where event distribution matters. It is not an architectural vitamin to be taken daily.
Related Patterns
Several related patterns complement this approach.
Data mesh, when interpreted sensibly, aligns well with domain-owned data products. But it needs stronger policy enforcement than many data mesh discussions admit. Without executable governance, data mesh becomes decentralized entropy.
CQRS and event sourcing can help where domain events are central and read models are intentionally derived. They make authority and projection clearer, though they also increase reconciliation and historical interpretation challenges.
Master data management still has a role, especially for cross-domain identifiers and shared reference data. But it should not be used to bulldoze bounded contexts into one model.
Strangler fig migration is the right migration pattern for most enterprises moving from legacy warehouses and point-to-point integrations toward governed domain products.
Policy as code is essential. If your controls cannot be versioned, tested, and deployed, they will not survive contact with enterprise change.
Summary
A data platform is a governance problem because data only becomes valuable when the enterprise can trust what it means, who owns it, how it may be used, and whether policy is enforced consistently.
That trust does not come from a lakehouse, a catalog, or Kafka alone. It comes from aligning the platform with domain semantics, bounded contexts, authoritative data products, executable policy, and explicit reconciliation. It comes from migration strategies that accept legacy reality instead of denying it. And it comes from architecture that is willing to say an unfashionable thing: not all data should be equally accessible, and not all meanings should be flattened into one enterprise schema.
The best enterprise data platforms are not giant storage systems. They are operating models, implemented in technology.
Build the policy enforcement plane. Make domain ownership real. Design for reconciliation. Migrate with a strangler pattern. And whenever somebody says, “We’ll sort out governance later,” hear what they are really saying:
“We are about to scale ambiguity.”
The key is not replacing everything at once, but progressively earning trust while moving meaning, ownership, and behavior into the new platform.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.