There’s a lie we tell ourselves when we split a system into microservices.
We say the monolith is gone because the code is gone. We point to separate repositories, separate pipelines, separate teams, separate Kubernetes deployments, and a Kafka cluster humming in the middle like a badge of modernity. We have a customer service, an order service, a billing service, a pricing service, a fulfillment service. We have bounded contexts on slides and event streams in production. It looks decomposed.
But then a customer changes their legal entity. An order gets repriced after shipment. A refund collides with tax finalization. A product hierarchy changes in one service and six others begin making wrong decisions without throwing a single technical error. That’s the moment the truth arrives: the code may be distributed, but the meaning is not.
Data still forms a monolith.
Not a physical monolith, not a database monolith, but something more stubborn: a semantic monolith. A shared dependency topology of meanings, invariants, interpretations, and timing assumptions. You can split tables. You can split services. You can even split teams. But if every critical workflow still depends on a tightly connected graph of shared domain semantics, you haven’t escaped monolithic gravity. You’ve merely stretched it over the network.
That is the real architecture problem.
This article is about that hidden topology: what it is, why it persists in microservices, how domain-driven design helps expose it, where Kafka helps and where it deceives, and how to migrate without replacing one big ball of mud with fifty smaller balls tied together by confusion.
Context
Most enterprises did not arrive at microservices through philosophy. They arrived through pain.
The original system usually made perfect sense in its first life. One application. One database. Transactions were easy. Reporting was close to operations. The same team understood enough of the domain to keep the whole thing moving. Then the business expanded. More channels. More products. More regions. More regulations. More teams. Suddenly one schema had become the parliament of the company. Every change was a negotiation. Every release carried collateral damage. So the natural response was decomposition.
And decomposition is often the right response. But many migrations focus on technical seams before semantic seams. Teams split by functional area, APIs appear, Kafka topics multiply, and databases are assigned per service. Yet the underlying domain language remains cross-cutting and unresolved.
A customer is still “the same customer” in sales, billing, onboarding, risk, support, and compliance—but not really.
An order is still “the same order” across pricing, tax, fulfillment, invoicing, and returns—but not really.
A product is still “the same product” across catalog, inventory, commerce, logistics, and finance—but not really.
Those last two words matter. Not really is where architecture lives.
Domain-driven design has always been blunt about this: a shared word does not imply a shared model. “Customer” in support is not “Customer” in receivables. “Order” in fulfillment is not “Order” in pricing. Bounded contexts exist because the business itself contains semantic fractures. But enterprises often keep pretending there is one canonical business object that everyone can depend on. That fantasy is what recreates the monolith in distributed form.
Problem
The problem is not simply shared data. Shared data is normal.
The problem is shared semantic dependency topology: a network of services that appear autonomous but still rely on each other’s interpretation of business concepts, state transitions, and temporal guarantees in order to be correct.
That dependency can take several forms:
- A service relies on another service’s status values to make business decisions.
- A downstream service assumes the meaning of an event is stable across time.
- Multiple services depend on the same identity and classification rules.
- Business invariants are enforced only when several services happen to agree.
- Teams publish “facts” that are actually local opinions presented as universal truth.
This is why distributed architectures often feel independent in deployment but entangled in operation. The coupling moved up the stack—from function calls and foreign keys to semantics and timing.
Here’s the cruel joke: network boundaries often make semantic coupling worse, not better. In a monolith, at least the coupling is visible. In microservices, it becomes polite. It hides behind contracts, event names, JSON schemas, topic conventions, and “platform standards.” It becomes harder to see and therefore harder to govern.
A service boundary that ignores semantic cohesion is not architecture. It is choreography for future incidents.
Forces
This problem persists because several forces pull in opposite directions.
Team autonomy vs semantic coherence
Microservices promise autonomous teams. That promise matters. Teams that can build, deploy, and evolve independently deliver faster. But autonomy in implementation does not remove interdependence in meaning. If teams share core concepts but evolve them without explicit context boundaries, autonomy becomes semantic drift.
Local optimization vs enterprise workflow correctness
Each team optimizes for its service: clean ownership, isolated schema, low change friction. The enterprise, however, runs on end-to-end processes: quote-to-cash, procure-to-pay, claim-to-settle, hire-to-retire. Those processes cut across services. The data required to complete them is almost always assembled from multiple bounded contexts. That creates hidden systemic coupling.
Event-driven decoupling vs event-driven illusion
Kafka is useful. Very useful. It can decouple producers from consumers in time, smooth load, preserve durable history, and support reconciliation. But Kafka does not erase semantic dependency. It often amplifies it. Once an event is published, many consumers may build assumptions on top of it. The event becomes a social contract, not merely a transport artifact.
Need for consistency vs cost of coordination
Some invariants truly require coordinated consistency. Others do not. The trouble comes when architects avoid this distinction. If everything is eventually consistent, you produce expensive business errors. If everything is strongly consistent, you reassemble the monolith through synchronous coupling and distributed transactions.
Reuse vs context protection
Enterprises love shared master data and canonical models because they appear efficient. Sometimes they are. More often they become semantic compromise layers that flatten useful distinctions. Reuse is cheap at first and expensive forever. Context protection is expensive at first and often cheaper later.
Solution
The solution is not “go back to the monolith,” though sometimes that is the right answer for a subsystem. The solution is to make the semantic topology explicit and design around it.
The key architectural move is this:
Treat data architecture in microservices as a map of bounded contexts and semantic dependencies, not a map of APIs and databases.
That means several things.
First, identify where domain semantics genuinely differ. Not every service should own a copy of “Customer.” Many should own a context-specific representation: billable party, service subscriber, legal account holder, support contact, fraud subject. Those are not implementation details. They are different concepts.
Second, classify relationships between contexts. Some data is authoritative upstream reference data. Some is derived. Some is replicated. Some is projected for read models. Some is provisional and must later be reconciled. Some should never be shared directly at all.
Third, design for semantic translation rather than semantic reuse. Anti-corruption layers are not old-fashioned; they are survival gear. When one context consumes another context’s events or APIs, it should translate them into local meaning. If you import another model raw, you import its future changes, ambiguities, and politics too.
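To make the translation idea concrete, here is a minimal anti-corruption layer sketch in Python. The upstream event shape, the field names, and the billing rule are all invented for illustration; the point is that the translation function is the single place where upstream semantics get interpreted into local meaning.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical upstream payload shape -- illustrative, not taken from
# any real customer service contract.
@dataclass
class UpstreamCustomerEvent:
    customer_id: str
    status: str                # upstream vocabulary: "active", "suspended", "archived"
    legal_name: str
    tax_id: Optional[str]

# Local concept owned by the billing context.
@dataclass
class BillableParty:
    party_id: str
    legal_name: str
    tax_id: Optional[str]
    is_billable: bool          # a local *policy decision*, not an upstream field

def to_billable_party(event: UpstreamCustomerEvent) -> BillableParty:
    """Anti-corruption layer: translate upstream facts into local meaning.

    Billing decides for itself what "billable" means. Here a suspended
    customer is still billable; an archived one is not. If upstream
    changes what "suspended" means, this is the one place billing
    has to revisit.
    """
    return BillableParty(
        party_id=event.customer_id,
        legal_name=event.legal_name,
        tax_id=event.tax_id,
        is_billable=event.status in ("active", "suspended"),
    )
```

The mapping is deliberately boring. Its value is not cleverness; it is that upstream politics are confined to one function instead of leaking through the whole billing codebase.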
Fourth, decide explicitly which invariants are local, which are cross-context, and how failures are resolved. This is where eventing, orchestration, sagas, reconciliation jobs, and compensations belong—not as default fashionable choices, but as specific answers to specific consistency problems.
Fifth, accept that some enterprise workflows are composite by nature. The goal is not to eliminate all cross-service dependency. The goal is to make the dependencies legible, limited, and operationally manageable.
A healthy microservice estate is not dependency-free. It is dependency-aware.
Architecture
A useful way to picture the estate is not as boxes and arrows of service calls, but as a layered semantic graph.
Such a graph is not bad by itself. Enterprises are interconnected. The problem begins when its edges are treated as simple data exchange while they actually carry business meaning and business timing assumptions.
Consider the common “customer” example. In many architectures, a central customer service publishes customer-created, customer-updated, customer-merged, and customer-deactivated events. Downstream teams subscribe and believe they are being “kept in sync.” On paper, elegant. In practice, dangerous.
Why? Because downstream systems often need different things:
- Billing cares about legal identity, tax profile, payment responsibility.
- Support cares about contactability and relationship history.
- Fulfillment cares about delivery destination and handoff constraints.
- Risk cares about trust signals and exposure.
- CRM cares about segmentation and campaign eligibility.
If one central service emits a single canonical customer event and everyone consumes it directly, every downstream team is now coupled to central semantics. What counts as a customer merge? Which address is primary? What does inactive mean? Is a suspended customer billable? Can an archived customer still receive refunds? These are not data modeling details. These are policy semantics.
This is where DDD matters. Bounded contexts let you say, with precision, that one service’s “Customer” is another service’s source material, not its truth.
A semantic topology view
In that view, the operative word is translate. That is architecture doing its job.
A context should not consume enterprise facts as if they are pre-interpreted forever. It should translate upstream facts into local concepts, preserving independence of meaning. This often means local storage, projections, and explicit mapping code. Some architects resist this because it looks like duplication. It is duplication. Useful duplication. Cheap compared to semantic lock-in.
Kafka’s role
Kafka fits well when:
- You want durable event history.
- Consumers need to replay and rebuild projections.
- Temporal decoupling is useful.
- A domain event is genuinely meaningful to multiple contexts.
- You need auditability and reconciliation support.
Kafka fits poorly when:
- Events are being used as a substitute for unresolved boundaries.
- Every CRUD change becomes an event stream.
- Services consume each other’s internal state transitions as enterprise truth.
- Teams believe eventual consistency eliminates the need for failure handling.
- The organization lacks discipline around event versioning and semantics.
An event should express a business fact meaningful beyond the producer’s internals. “InvoiceIssued” is often good. “CustomerTableRowUpdated” is not architecture; it is leakage.
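The contrast can be shown in a few lines. The two event shapes below are illustrative, not a real contract: the first states a business fact a consumer can act on without knowing the producer’s tables; the second merely echoes a row change and leaves every consumer to guess what claim to draw from it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# A business fact: meaningful beyond the producer's internals.
# Field names are illustrative.
@dataclass(frozen=True)
class InvoiceIssued:
    event_version: int        # explicit contract version for consumers
    invoice_id: str
    billed_party_id: str
    currency: str
    total_amount: str         # decimal-as-string to avoid float drift on the wire
    issued_on: date           # business time, not processing time

# Contrast: a row-level echo. It says *that* something changed,
# not what it means.
@dataclass(frozen=True)
class CustomerTableRowUpdated:
    table: str
    row_id: str
    changed_columns: List[str] = field(default_factory=list)
```

The first event can be versioned, documented, and committed to. The second forces every consumer to reverse-engineer the producer’s schema, which is exactly the leakage described above.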
Reconciliation is not a workaround
Reconciliation deserves a better reputation. In enterprise systems, reconciliation is not evidence of failure. It is evidence of maturity.
Distributed systems fail in slow, administrative, business-shaped ways. Messages arrive late. Consumers are down. Events are applied twice. Reference data changes after transactions start. Human corrections happen outside the happy path. Regulatory workflows force retroactive amendments. You do not solve this with optimism and retries alone.
A robust architecture uses:
- event-driven propagation for timeliness,
- local state for autonomy,
- reconciliation for correctness over time.
That trio matters.
This pattern is common because reality is common. Enterprises don’t run on perfect synchronization. They run on correction loops.
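A correction loop starts with detecting drift. Here is a minimal sketch of the comparison step, assuming both the authoritative view and the derived projection can be expressed as key-to-state mappings keyed by business identifier; real systems would page through data and apply tolerances, but the three categories of discrepancy are the same.

```python
def reconcile(authoritative: dict, derived: dict) -> dict:
    """Compare an authoritative view against a derived projection.

    Returns the discrepancies rather than silently fixing them, so an
    operator or an automated correction job can decide what to do.
    """
    # In the source of truth but never reached the projection.
    missing = {k: v for k, v in authoritative.items() if k not in derived}
    # Present in both, but the projection disagrees.
    stale = {
        k: {"expected": v, "actual": derived[k]}
        for k, v in authoritative.items()
        if k in derived and derived[k] != v
    }
    # In the projection but no longer (or never) authoritative.
    orphaned = {k: v for k, v in derived.items() if k not in authoritative}
    return {"missing": missing, "stale": stale, "orphaned": orphaned}
```

Separating detection from correction is the design choice that matters: it keeps the loop auditable, which is usually the point in regulated domains.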
Migration Strategy
Most organizations cannot redesign semantics from scratch. They have a working system, or at least a revenue-producing one. So migration has to be progressive.
The right migration pattern here is usually a strangler fig, but not only at the API layer. It must also be semantic.
Step 1: map the real domain, not the org chart
Before carving services out of a monolith, identify bounded contexts and shared semantics. Event storming, capability mapping, and workflow analysis help, but the real test is sharper: where does the meaning of the same business term change? Where are invariants local? Where do people argue about what a status means? Those fault lines are your architectural clues.
Step 2: separate systems of record from systems of decision
Not every service needs to own authoritative source data. Some need authoritative decision logic over imported facts. Distinguish:
- source of identity,
- source of reference classification,
- source of transaction,
- source of financial obligation,
- source of operational execution.
This reduces accidental overlap.
Step 3: peel off read models before write ownership
A practical migration move is to create context-specific read models fed from the monolith or initial services. This lets teams stop querying the shared schema directly while preserving business continuity. It also exposes where semantics diverge, because each read model chooses what local shape it needs.
Step 4: extract workflow edges, not core knots
Do not begin with the most semantically central concepts unless there is a compelling reason. “Customer” and “Order” are often loaded with enterprise dependency. Safer early extractions are bounded workflows with clearer autonomy: notification preferences, document generation, shipment tracking, returns intake, fraud scoring. These teach the organization how to operate services without detonating core semantics.
Step 5: introduce events as business contracts, not database echoes
When extracting a service, publish domain events that express stable business facts. Consumers should build local projections and anti-corruption layers. Resist the urge to publish every internal transition.
Step 6: add reconciliation from day one
Do not wait for incidents. Build comparison reports, replay tooling, drift detection, dead-letter workflows, and backfill jobs early. A migration without reconciliation is theater.
Step 7: move write responsibility only when invariants are clear
A service should take write ownership only when the business rules it must enforce are mostly local. If correctness still depends on synchronous agreement from several neighboring services, you are extracting too early or at the wrong seam.
Step 8: retire semantic dependencies deliberately
Killing old API calls is not enough. Remove old shared interpretations, old status dependencies, old reporting joins, old batch assumptions. Many “successful” migrations fail here: the old monolith remains the hidden arbiter of meaning.
Enterprise Example
Consider a global insurance carrier modernizing its policy administration platform.
The legacy system managed policy, customer, billing, claims linkage, product eligibility, regional regulation, and agent hierarchy in one large platform. Every function relied on the same customer and policy tables. It was a classic enterprise monolith: ugly in places, but semantically honest because everything happened in one place.
The modernization program split the system into microservices:
- Policy Service
- Customer Service
- Billing Service
- Claims Integration Service
- Product Rules Service
- Document Service
Kafka was introduced as the event backbone. The initial design assumed Customer Service would become the canonical owner of customer data and all other services would subscribe.
Within six months, strange production defects appeared.
A policy endorsement changed the insured party structure. Billing interpreted the customer update as a change in bill-to responsibility and regenerated payment schedules incorrectly. Claims integration treated the same event as a benign profile amendment and did nothing. Regulatory reporting consumed the event stream and updated legal entity mappings, but late-arriving corrections caused contradictory submissions across jurisdictions.
Nobody had a database problem. Nobody had a transport problem. They had a semantic topology problem.
The breakthrough came when the architects stopped asking “Who owns customer?” and started asking “Which contexts need which notion of party?”
They decomposed the concept:
- Customer Identity Context: person/organization identity, survivorship, deduplication.
- Policy Party Context within policy domain: insured, beneficiary, payer, covered member, broker association.
- Billing Party Context within billing domain: liable party, invoice recipient, delinquency subject.
- Claims Party Projection within claims integration: claimant, policyholder, legal contact.
Customer Service no longer published a universal “CustomerUpdated” event as enterprise truth. Instead:
- identity events remained narrow and factual,
- policy emitted policy-party changes,
- billing maintained its own party model,
- claims translated all incoming changes into its own role-based model.
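The last bullet can be sketched in a few lines. The role names and the mapping below are invented for illustration, not a real carrier’s rules; what matters is that claims decides locally which upstream changes mean anything at all in its context.

```python
# Hypothetical translation in the claims integration context: incoming
# policy-party changes are mapped into claims' own role-based model.
POLICY_ROLE_TO_CLAIMS_ROLE = {
    "insured": "policyholder",
    "covered_member": "claimant",
    "payer": None,             # payment responsibility is not a claims concern
    "broker": "legal_contact",
}

def translate_party_change(policy_role: str, party_id: str):
    """Return a claims-local role assignment, or None when the change
    carries no meaning in the claims context."""
    claims_role = POLICY_ROLE_TO_CLAIMS_ROLE.get(policy_role)
    if claims_role is None:
        return None
    return {"party_id": party_id, "claims_role": claims_role}
```

The explicit `None` branch is the important part: deciding that an upstream change is irrelevant is itself a semantic decision, and it is made here, visibly, rather than by accident downstream.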
They also introduced nightly and on-demand reconciliation between policy-party assignments and billing liabilities, because mid-term endorsements and backdated corrections were common. Kafka remained essential, but now as a propagation and replay mechanism, not a fantasy machine for universal truth.
The result was not less data duplication. In fact, there was more. But there was less semantic confusion, fewer emergency fixes, and better team autonomy. That is the trade worth making in enterprise architecture.
Operational Considerations
Once you accept semantic topology as a first-class concern, operations change.
Data observability must include meaning
Traditional monitoring watches CPU, latency, error rates, consumer lag. Necessary, not sufficient. You also need semantic observability:
- projection freshness by business object,
- drift between authoritative and derived models,
- count mismatches across contexts,
- event version adoption,
- business invariant breach rates,
- delayed reconciliation backlog.
A system can be technically green and business-red.
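Projection freshness, the first item on that list, is a good place to start because it is cheap to compute. A minimal sketch, assuming each business key records the timestamp of the last upstream event applied to its projection (thresholds per object type would be a natural extension):

```python
from datetime import datetime, timedelta

def freshness_breaches(projections: dict, now: datetime,
                       max_age: timedelta) -> list:
    """Semantic observability sketch: flag business objects whose local
    projection has not been refreshed within the allowed window.

    `projections` maps a business key to the timestamp of the last
    applied upstream event.
    """
    return sorted(
        key for key, last_applied in projections.items()
        if now - last_applied > max_age
    )
```

Unlike consumer lag, this measures staleness per business object, which is what an operator actually needs when a regulator asks why one account is wrong.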
Schema governance is not enough
Teams often add Avro schemas, compatibility checks, and topic standards and call it governance. Good start. But schemas govern structure, not meaning. You need event catalogs with semantic definitions, ownership, deprecation rules, and examples of intended interpretation. If your teams cannot answer “What business claim does this event make?” the schema is not helping enough.
Idempotency is table stakes
Every consumer should assume duplicate delivery, replay, reorder, and partial failure. Idempotent handlers, version-aware upserts, and deterministic projection logic are mandatory. In distributed data architecture, duplicate processing is normal weather.
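A version-aware upsert is the smallest useful expression of this discipline. The sketch below assumes the producer attaches a monotonically increasing version per business key, which is a contract you must design for, not something Kafka guarantees:

```python
def apply_event(store: dict, key: str, version: int, state: dict) -> bool:
    """Version-aware upsert: safe under duplicate delivery and replay.

    Applies the event only if its version is newer than what the
    projection already holds. Returns True if the store changed.
    """
    current = store.get(key)
    if current is not None and current["version"] >= version:
        return False  # duplicate or out-of-order older event: ignore
    store[key] = {"version": version, "state": state}
    return True
```

Because the handler is deterministic and order-tolerant, replaying the whole topic from offset zero converges on the same projection, which is what makes rebuilds and backfills boring instead of terrifying.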
Time matters more than people think
Events create historical truth. APIs often return current truth. Reconciliation may compare one against the other and find false differences unless effective dates, processing dates, correction dates, and publication dates are all treated properly. Temporal modeling is not optional in finance, insurance, retail pricing, logistics, or telecom. It is where many “clean” microservice architectures go to die.
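The core of temporal modeling fits in one function. This is a bitemporal lookup sketch with invented record shapes: each record carries an `effective_from` (business time) and a `recorded_on` (processing time), so a backdated correction appears as a new record with an older effective date but a newer recorded date, and reconciliation can ask "what did we believe on day X about the state effective on day Y" without producing false differences.

```python
from datetime import date

def as_of(history: list, effective: date, known: date):
    """Bitemporal lookup: the state effective on `effective`,
    as it was known on `known`.

    `history` is a list of dicts with `effective_from` and
    `recorded_on` date fields.
    """
    candidates = [
        r for r in history
        if r["effective_from"] <= effective and r["recorded_on"] <= known
    ]
    if not candidates:
        return None
    # Latest effective date wins; among ties, the most recently recorded.
    return max(candidates,
               key=lambda r: (r["effective_from"], r["recorded_on"]))
```

Two systems comparing snapshots must agree on both axes; comparing one system’s business time against another’s processing time is where most "false drift" in reconciliation reports comes from.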
Reporting will expose your lies
Operational services can tolerate context-specific models. Enterprise reporting cannot tolerate silent semantic inconsistency. This is why analytical platforms and data products need explicit business definitions and lineage back to source contexts. If the warehouse or lakehouse just combines “customer” from seven services by wishful joining, you have rebuilt the problem downstream.
Tradeoffs
No serious architecture choice comes free.
What you gain
- Better local autonomy.
- Safer team evolution.
- Clearer business ownership.
- More resilient integrations.
- Reduced semantic blast radius.
- Healthier migrations from legacy systems.
What you pay
- More data duplication.
- More translation logic.
- More reconciliation mechanisms.
- Harder enterprise reporting unless definitions are governed well.
- Greater need for disciplined event and contract management.
- More patience from stakeholders who expected microservices to simplify everything.
That last point matters. This style of architecture is not visually simple. It is operationally honest.
A single canonical model is simple to explain and expensive to live with. A bounded-context model is harder to explain and often cheaper to run.
Failure Modes
There are some classic ways this goes wrong.
Canonical data model empire
An enterprise architecture group creates a universal customer, order, product, party, account, and location model. Every service must use it. The intention is interoperability. The result is semantic compromise, slow change, and endless committee design. This is just a distributed monolith with governance theater.
Event soup
Teams publish everything. Topics multiply. Consumers subscribe “just in case.” No one knows which events are business commitments and which are internal mechanics. The organization becomes dependent on accidental signals. Eventually someone changes one, and a distant process quietly corrupts itself.
Fake autonomy
Each service has its own database, but every critical write requires synchronous calls to three others for validation. The system now has the coupling of a monolith and the failure modes of a distributed system. This is the worst of both worlds.
Reconciliation by spreadsheet
When drift appears, operations teams export reports, compare in Excel, and trigger manual corrections. This can survive for months in enterprises, which is exactly why it is dangerous. Manual reconciliation hides architectural debt until scale or audit makes it catastrophic.
Misplaced strong consistency
Architects use distributed transactions, lock-step orchestration, or hard synchronous checks to preserve invariants that are not actually worth that operational cost. Availability drops, change friction rises, and teams become afraid of releases.
Misplaced eventual consistency
The opposite error: declaring everything eventually consistent without defining how wrongness is detected and corrected. Eventual consistency without reconciliation is not architecture. It is hope with brokers.
When Not To Use
This approach is not religion. There are cases where you should not lean into semantic decomposition this far.
Do not do this when the domain is still small, cohesive, and handled by one team. A modular monolith may be better. If the semantics are genuinely shared and stable, splitting them into services creates overhead without payoff.
Do not do this when the organization lacks product-aligned teams and basic operational discipline. If you cannot manage contracts, observability, and incident response, a finely decomposed architecture will simply fragment responsibility.
Do not force local semantic models when consumers truly need the same thing with the same meaning and the same timing. Reference data such as country codes, currency definitions, or regulated classification schemes may justify a shared source and thin local handling.
Do not pretend Kafka is required. Event streams are powerful, but if your integration patterns are few and mostly request-response, introducing an event backbone may be needless complexity.
And sometimes—this is worth saying plainly—the right answer is a monolith. A well-structured monolith with explicit module boundaries, good domain design, and a coherent transactional model can outperform a poorly decomposed microservice landscape by a wide margin. The issue is not monolith versus microservices. The issue is whether your semantic dependencies are being managed consciously.
Related Patterns
Several patterns sit naturally beside this approach.
Bounded Contexts
The foundation. They define where models are allowed to differ because the business differs.
Context Mapping
Useful for identifying upstream/downstream relations, conformist integrations, anti-corruption layers, and partnership boundaries.
Anti-Corruption Layer
Critical whenever one context consumes another’s concepts but must preserve local meaning.
Strangler Fig Migration
The practical migration pattern for gradually extracting capabilities and semantics from a legacy monolith.
CQRS and Materialized Views
Helpful when local read models need to be shaped differently from write models, especially during progressive extraction.
Saga / Process Manager
Useful for coordinating long-running workflows across contexts, provided you are careful not to centralize all business semantics into orchestration code.
Data Mesh, Carefully Applied
Relevant on the analytical side, where domain-owned data products can preserve semantic accountability. But it does not eliminate operational semantic coupling.
Master Data Management
Sometimes necessary, especially for identity, reference data, and survivorship. But MDM should be narrow and explicit. Used carelessly, it becomes the canonical-model empire in nicer clothing.
Summary
Microservices do not dissolve the monolith. They relocate it.
If you decompose code and databases but leave business meaning entangled, your architecture will still behave like a monolith—just one with network latency, asynchronous bugs, and governance meetings. The hidden structure is a shared semantic dependency topology: the graph of meanings, invariants, and timing assumptions that services rely on to act correctly.
The remedy is not to eliminate dependency. That is fantasy. The remedy is to make semantic boundaries explicit through domain-driven design, bounded contexts, translation, local models, and deliberate reconciliation. Kafka can help as a propagation and replay backbone, but it cannot solve unresolved semantics. Progressive strangler migration works, but only when you migrate meaning as carefully as you migrate code.
This is the architect’s job: not merely to distribute software, but to decide where meaning lives, where it changes, and how the enterprise survives when those meanings inevitably diverge.
A monolith of code can be split with effort.
A monolith of semantics must be understood first.
That is the harder work.
And in enterprise architecture, it is the only work that really counts.
Frequently Asked Questions
How do you document microservices architecture for governance?
Use ArchiMate Application Cooperation diagrams for the service landscape, UML Component diagrams for internal structure, UML Sequence diagrams for key flows, and UML Deployment diagrams for Kubernetes topology. All views can coexist in Sparx EA with full traceability.
What is the difference between choreography and orchestration in microservices?
Choreography has services react to events independently — no central coordinator. Orchestration uses a central workflow engine that calls services in sequence. Choreography scales better but is harder to debug; orchestration is easier to reason about but creates a central coupling point.