CQRS Read Model Explosion in Event-Driven Systems

There is a moment in almost every event-driven transformation when the architecture team realizes they did not build “a clean CQRS platform.” They built a zoo.

It starts innocently. One team needs a denormalized customer dashboard. Another needs a fraud view. Finance wants a reconciliation projection. Operations wants a support console. Search needs a differently shaped index. Compliance asks for an immutable reporting model. Each one sounds reasonable in isolation. In fact, each one often is reasonable. But taken together, they breed. Fast.

This is the dark side of CQRS in event-driven systems: read model explosion.

And unlike the obvious failures in architecture—latency spikes, outages, broken deployments—this one often arrives dressed as success. Teams are shipping quickly. Consumers are autonomous. Kafka topics are flowing. New projections are easy to add. Product leaders are happy because every use case gets a tailored view. Architects congratulate themselves on flexibility.

Then the costs come due.

Suddenly you have dozens of projections, many partially overlapping, all with their own semantics, refresh behavior, storage technologies, replay mechanics, ownership confusion, and operational quirks. A business concept as simple as “active customer” has six definitions across the estate. A single upstream event schema change triggers a small corporate incident. Rebuilding one view takes hours, ten views takes days, and no one is entirely sure which read models are still used and which are ceremonial ruins from a previous reorganization.

This is not a tooling problem. It is not solved by “better Kafka governance” or by adding a projection framework. At its core, it is a modeling problem, a bounded-context problem, and a migration problem. Domain-driven design matters here because read models are not just technical caches. They encode language. They embody assumptions. They crystallize business meaning into operational software.

A read model is a promise about what the business thinks is true enough to read.

That is why read model proliferation deserves architectural attention. Not because many read models are automatically bad, but because unmanaged read models become semantic debt. They create invisible coupling while pretending to increase autonomy.

In this article, I’ll lay out where read model explosion comes from, why event-driven and Kafka-based microservice environments make it both attractive and dangerous, and how to govern it without killing the very flexibility CQRS is supposed to provide. We’ll talk about domain semantics, progressive strangler migration, reconciliation, operational realities, and the tradeoffs that rarely appear in conference talks. We’ll also look at a concrete enterprise example, because this problem is never theoretical once payroll, customer balances, or regulatory reporting are involved.

Context

CQRS is often introduced with a clean little picture: commands update the write model, events are emitted, and one or more read models are built for queries. The write side protects invariants. The read side optimizes access. Nice separation. Elegant flow.

In a small system, that picture largely holds.

In a large enterprise, it mutates. The event stream becomes shared substrate. Different teams subscribe for different reasons. Some need operational views with low latency. Others need analytic snapshots. Others build process-state projections for orchestration. Some need customer-facing APIs. Some need internal support screens. Some just need a convenience cache because hitting the source system is too expensive or politically impossible.

With Kafka in the middle, the mechanics encourage this. Publishing a topic feels cheap. Creating a new consumer group feels almost free. Materializing another read store seems pragmatic. Teams can move independently. This is why event-driven architecture is powerful. It lowers coordination cost.

But lower coordination cost is not the same as lower system complexity. It often means the complexity moves.

In domain-driven design terms, the danger appears when read models stop being expressions of bounded contexts and start becoming ad hoc shadows of upstream data. Once teams subscribe directly to domain events they only half understand, they begin reconstructing concepts outside the originating context. The result is not autonomy. It is semantic drift.

A customer event emitted by Sales is not automatically the right source for Billing, Service Operations, Risk, and Marketing to all compute “customer status.” They may each need distinct downstream interpretations. Sometimes that is correct. Sometimes it is a smell. The hard part is knowing which is which.

Problem

Read model explosion happens when the ease of creating projections outruns the discipline needed to govern domain meaning, lifecycle, and operational ownership.

The symptom is not simply “many read models.” Large enterprises will naturally have many. The real problem is uncontrolled proliferation: overlapping models, duplicate pipelines, inconsistent semantics, uncertain lineage, and fragile replay behavior.

A few common patterns signal the issue:

  • Multiple projections derived from the same event stream with only slight schema differences.
  • Read stores that exist because one team could not rely on another team’s API, SLA, or semantics.
  • Consumer-owned materializations that replicate half an upstream bounded context.
  • Reporting views, support views, search indexes, fraud views, and workflow state stores all independently defining similar business concepts.
  • No clear catalog of which read model is authoritative for which query shape.
  • Reconciliation becoming a permanent business process because projections disagree.
  • Rebuilds and backfills turning into major production events.

What makes this pernicious is that each local decision often makes sense.

A customer service team wants a sub-second support screen. They materialize customer profile plus recent orders plus shipment state. The fraud team wants event history organized by identity signals. Marketing wants segmentation attributes optimized for campaign targeting. Finance needs a settlement ledger view with accounting cutoffs. None of these are irrational. In fact, forcing them all through one generic query model would be worse.

The explosion happens when nobody asks the harder question: what domain question does this read model answer, and who owns the language behind it?

Without that question, teams build read models as convenience artifacts. Convenience scales badly.

Forces

Several architectural forces pull organizations toward read model proliferation.

1. Query diversity is real

Different use cases genuinely need different views. Search indexes, timeline views, customer dashboards, fraud investigation screens, and statutory reports have very different access patterns. CQRS exists for a reason.

Trying to collapse all of these into one canonical read model usually creates a bloated, compromised structure that serves no one well. So some level of read model multiplication is healthy.

2. Event streams make downstream change cheap

In Kafka-based systems, adding a consumer and materializing a projection is often easier than negotiating an upstream API change. Teams choose the path of least resistance. Architecture follows incentives.

This is one of those enterprise truths people avoid saying out loud: many read models exist not because they are the right design, but because cross-team collaboration is expensive.

3. Bounded contexts and enterprise ownership don’t line up neatly

The business may think in domains, but organizations are charted by departments, budgets, and vendor contracts. A clean bounded context map is often disrupted by reporting needs, regional variations, legacy platforms, and compliance controls.

So teams build local read models to bridge organizational fault lines.

4. Event semantics are frequently under-specified

An event named CustomerUpdated is nearly useless if every consumer must reverse-engineer what changed, what “customer” means, and whether the event reflects legal identity, CRM profile, service account holder, or billing party.

Poor event semantics amplify proliferation because consumers compensate by creating interpretation-specific projections and glue logic.

5. Replay and rebuild are theoretically simple, operationally expensive

Architects love to say that projections can always be rebuilt from the event log. They can. Sometimes. In production, with years of retention policies, schema evolution, tombstones, enrichment dependencies, and external reference data, rebuilds are often painful.

That causes teams to preserve old read models longer than they should, because retirement feels risky.

6. Reporting and reconciliation have different truth needs

Operational queries often tolerate eventual consistency. Financial and regulatory uses do not. This drives separate read models with different completeness, timing, and correction mechanics. Again, valid. But each such distinction creates more architecture.

Solution

The answer is not “fewer read models.” The answer is intentional read model architecture.

Treat read models as products with explicit purpose, domain semantics, ownership, lineage, and retirement rules. A read model should exist because it answers a distinct domain question under explicit non-functional constraints—not because it was easy to subscribe to a topic.

I use four practical categories:

  1. Experience projections — optimized for UI or API consumption.
  2. Operational projections — support workflows, case management, process state.
  3. Decision projections — fraud, pricing, eligibility, recommendation inputs.
  4. Control projections — reconciliation, audit, compliance, financial reporting.

This classification matters because these categories have different semantics and tolerances. Experience projections can often accept temporary staleness. Control projections usually cannot. Decision projections may need feature history and explainability. Operational projections often blend multiple bounded contexts into something useful but dangerous.

The governing rule is simple:

A read model must declare its purpose, semantic source, refresh expectation, and authority boundaries.

That declaration should be architecture policy, not optional documentation theater.

It is also wise to introduce a layered approach to projection design, because not every consumer should subscribe directly to raw domain events. In many enterprises, the healthiest pattern is to separate foundational domain projections from experience-specific compositions.

Diagram 1: CQRS Read Model Explosion in Event-Driven Systems

This does not mean creating a centralized projection monopoly. It means acknowledging that some projections form reusable semantic infrastructure, while others are application-facing derivatives.

In domain-driven design terms, foundational projections often sit at context boundaries and provide a stable interpretation layer. They can reduce repeated reconstruction of upstream concepts. But they must be used carefully; if over-centralized, they become the read-side equivalent of the dreaded enterprise canonical model.

That is the tradeoff. A little consolidation reduces waste. Too much creates a bottleneck and semantic flattening.

Architecture

A sustainable architecture for CQRS read models in event-driven systems usually has five ingredients.

1. Event contracts with domain meaning

Events need business semantics, not just field lists. “OrderShipped” should mean something precise: when did shipment become official, from whose perspective, with what business guarantees, and is it reversible?

This sounds obvious. It is rarely done well.

When events are semantically weak, every downstream projection embeds private interpretation logic. That is the seed of proliferation. Stronger contracts do not eliminate multiple read models, but they reduce accidental ones.
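A semantically strong event contract can be sketched as a type whose documentation and validation carry the business guarantees, not just field names. The event name, fields, and guarantees below are hypothetical illustrations, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical contract for an OrderShipped event. The point is that the
# business semantics live in the contract, not in each consumer's head.
@dataclass(frozen=True)
class OrderShipped:
    """Emitted when the carrier has accepted the parcel (Fulfilment's
    perspective). Irreversible: cancellations after this point are handled
    by a separate ReturnInitiated event, never by retracting this one."""
    order_id: str
    carrier: str
    accepted_at: datetime      # carrier acceptance time, UTC, authoritative
    schema_version: int = 2    # bump on semantic change, not just shape change

    def __post_init__(self):
        # Enforce the stated guarantee at the boundary: naive timestamps
        # are a classic source of "mysterious" downstream inconsistencies.
        if self.accepted_at.tzinfo is None:
            raise ValueError("accepted_at must be timezone-aware (UTC)")

evt = OrderShipped("ord-42", "DHL",
                   datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc))
```

Consumers that only read documented, validated fields have far less room to invent private interpretations.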

2. Projection taxonomy and ownership

Every read model should have:

  • an owning team
  • a domain purpose
  • upstream sources
  • expected freshness
  • rebuild strategy
  • retention policy
  • deprecation path
  • reconciliation requirement

If no one can answer those questions, the read model is already a liability.
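One way to make the checklist above enforceable is a machine-checkable manifest registered in a projection catalog; CI can then reject read models that leave any field blank. Everything here (field names, the example projection) is an illustrative sketch:

```python
from dataclasses import dataclass, field

# Hypothetical manifest capturing the questions every read model must answer.
@dataclass
class ProjectionManifest:
    name: str
    owning_team: str
    domain_purpose: str                  # the domain question it answers
    category: str                        # experience | operational | decision | control
    upstream_sources: list = field(default_factory=list)
    expected_freshness: str = ""         # e.g. "p99 < 5s behind topic head"
    rebuild_strategy: str = ""           # e.g. "snapshot + replay from offset"
    retention_policy: str = ""
    deprecation_path: str = ""
    reconciliation_required: bool = False

    def is_liability(self) -> bool:
        # "If no one can answer those questions, the read model is
        # already a liability."
        required = [self.owning_team, self.domain_purpose, self.category,
                    self.expected_freshness, self.rebuild_strategy]
        return any(not v for v in required) or not self.upstream_sources

support_view = ProjectionManifest(
    name="support-case-summary",
    owning_team="service-ops",
    domain_purpose="What does an agent need to resolve an open case?",
    category="operational",
    upstream_sources=["customer.profile.v2", "order.lifecycle.v3"],
    expected_freshness="p99 < 10s behind topic head",
    rebuild_strategy="full replay, ~40 min",
)
```

A catalog of such manifests also answers the inventory question from the migration section: which models exist, who owns them, and which are already liabilities.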

3. Distinguish domain projections from convenience caches

This is a vital distinction. Some read models are meaningful domain artifacts. Others are tactical performance structures. Conflating them causes governance chaos.

A support case summary that combines customer, order, payment, and shipment data for a call center may be a legitimate operational projection. A Redis cache storing the same data to shave 40 milliseconds off an endpoint is not the same thing. One deserves semantic review. The other deserves performance review.

4. Use composition sparingly

There is a temptation to solve proliferation by creating one “customer 360” or “order 360” mega-view. This usually becomes a swamp. The phrase “360” should put an architect on alert. It often means “we could not agree on bounded contexts, so we built a giant denormalized compromise.”

Some composed views are necessary, especially for service operations and support. But if every use case points at the same all-purpose read model, you are building a reporting warehouse with delusions of operational agility.

5. Make reconciliation a first-class capability

In event-driven systems, projections drift. They drift due to consumer lag, poison events, schema changes, external enrichment mismatches, duplicate handling bugs, clock assumptions, or missed corrections. If you do not design for reconciliation, you are merely hoping consistency will happen.

A robust architecture includes:

  • replay support
  • point-in-time rebuild options
  • comparison jobs between control and operational views
  • dead-letter handling with business visibility
  • correction events or compensating rebuild paths

Reconciliation is not a patch. It is part of the design.
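A comparison job between a control view and an operational view can be as simple as a field-by-field diff that emits variance records. The dictionary shapes below stand in for query results from two real stores; this is a minimal sketch, not a production reconciler:

```python
# Both views are modelled as {entity_id: {field: value}}.
def reconcile(control: dict, operational: dict, fields: list) -> list:
    """Return variance records: one per (entity, field) disagreement,
    plus entities present on only one side."""
    variances = []
    for entity_id in control.keys() | operational.keys():
        c, o = control.get(entity_id), operational.get(entity_id)
        if c is None or o is None:
            variances.append({"entity": entity_id, "kind": "missing",
                              "side": "operational" if o is None else "control"})
            continue
        for f in fields:
            if c.get(f) != o.get(f):
                variances.append({"entity": entity_id, "kind": "drift",
                                  "field": f, "control": c.get(f),
                                  "operational": o.get(f)})
    return variances

control_view = {"cust-1": {"status": "active"}, "cust-2": {"status": "closed"}}
ops_view     = {"cust-1": {"status": "active"}, "cust-2": {"status": "active"},
                "cust-3": {"status": "active"}}
report = reconcile(control_view, ops_view, fields=["status"])
```

Grouping the variance records by field and segment is what turns raw drift into the business-visible reports described in the enterprise example later.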

Diagram 2: Make reconciliation a first-class capability

Migration Strategy

Most enterprises do not start fresh. They inherit reporting databases, integration hubs, batch ETL pipelines, replicated operational stores, and a hundred “temporary” materialized views that survived three CIOs. So the practical question is not whether to prevent read model explosion on day one. It is how to migrate toward sanity without freezing delivery.

This is where a progressive strangler approach works.

Do not attempt a grand read-model rationalization program. Those usually become architecture PowerPoint with no production outcome. Instead, identify a domain where projection sprawl is already causing visible pain—customer support, order servicing, payment investigation, policy administration—and introduce structure there first.

The migration usually follows this sequence:

Step 1: Inventory and classify existing read models

Catalog what exists. You will find more than anyone expects. Group them by purpose: experience, operational, decision, control. Note overlaps and semantic conflicts.

Step 2: Identify semantic duplication

Look for repeated business concepts like customer status, active policy, settled payment, shippable order, available balance. If the same term is defined differently in multiple projections, decide whether the difference is intentional or accidental.

Accidental duplication is where rationalization pays off.
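Detecting semantic duplication is mechanical once the catalog records which business terms each projection defines. The catalog shape and definitions below are hypothetical, a sketch of the scan rather than a real inventory:

```python
catalog = {
    "mobile-customer-summary": {"defines": {"active_customer": "login < 90d"}},
    "billing-ledger-view":     {"defines": {"active_customer": "open invoice"}},
    "campaign-segments":       {"defines": {"active_customer": "login < 90d"}},
    "fraud-risk-profile":      {"defines": {"risk_tier": "model v7 output"}},
}

def semantic_duplicates(catalog: dict) -> dict:
    """Map each business term to the distinct definitions found for it;
    a term with more than one definition needs a deliberate decision."""
    seen = {}
    for projection, meta in catalog.items():
        for term, definition in meta["defines"].items():
            seen.setdefault(term, {}).setdefault(definition, []).append(projection)
    return {t: defs for t, defs in seen.items() if len(defs) > 1}

conflicts = semantic_duplicates(catalog)
```

Each conflict then gets classified by humans: intentional (different bounded-context meanings) or accidental (rationalization candidates).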

Step 3: Introduce foundational projections at the seams

Rather than forcing all consumers to subscribe directly to raw event streams, create a small set of stable, semantically explicit projections for the most reused concepts. These become migration stepping stones.

Step 4: Strangle consumers off brittle direct reconstructions

Move downstream applications from bespoke event interpretation toward approved foundational or context-owned projections. Do this gradually, one consumer at a time.

Step 5: Add reconciliation before retirement

Never retire old read models until you can compare outputs and understand variance. In enterprise migration, reconciliation is your bridge of trust.

Step 6: Decommission aggressively once confidence exists

Nothing creates read model explosion like keeping every old projection “just in case.” If a model has no active consumer, no regulatory requirement, and no rebuild purpose, retire it.

Here is the important bit: migration is not only technical. It is semantic. You are moving teams from private interpretations of events toward explicit domain language. That requires workshops, decision records, and some hard conversations. If architecture avoids those conversations, the platform will continue to grow sideways.

Enterprise Example

Consider a large retail bank modernizing its customer servicing platform.

The bank had adopted Kafka as an enterprise event backbone. Core banking, CRM, cards, payments, fraud, and digital channels all published events. Over time, more than 70 read models emerged around “customer” and “account” concepts alone.

At first, this looked like healthy decentralization. The mobile app had its own customer summary view. Branch servicing had another. Fraud built a risk-oriented profile. Compliance materialized KYC status. Collections created delinquency projections. The contact center had a support dashboard fed by several topics and a few direct database extracts because not all needed data was evented yet.

Then the cracks appeared.

A customer changed address. The digital app reflected it in seconds. The branch system lagged by minutes. Fraud still showed the old address because its enrichment pipeline depended on a nightly master data feed. Compliance had a legally effective address date different from operational systems. When a customer disputed a notification failure, the bank could not easily prove which address was considered active at the point of communication.

The problem was not “eventual consistency” in the abstract. The problem was semantic inconsistency in a regulated context.

An architecture review found that at least 11 read models carried some definition of customer contactability. Some differences were legitimate. Marketing cared about campaign opt-in. Compliance cared about legal contact preference. Service operations cared about best reachable channel. But many differences were accidental, caused by teams reconstructing customer meaning from CRM and profile events without a shared interpretation layer.

The bank did not solve this by creating a giant canonical customer 360. That would have become another political battlefield. Instead, it introduced three bounded projections:

  • Customer Profile Projection for operational identity and service profile
  • Customer Contactability Projection for communication eligibility semantics
  • Customer Control Projection for point-in-time auditable state and reconciliation

Consumer teams then migrated gradually. The mobile app continued to shape its own experience view, but sourced contactability from the dedicated projection. Compliance retained its control model for audit. Fraud kept a specialized risk projection but stopped re-deriving basic profile semantics.

During migration, the bank ran old and new projections in parallel for six weeks. A reconciliation service compared outputs on key fields and generated variance reports by customer segment and event type. This exposed not only code bugs but policy ambiguities the business had never formally resolved.

That is the hidden value of reconciliation: it surfaces business disagreement masquerading as technical inconsistency.

After cutover, the bank retired 19 redundant projections and reduced change impact from customer event schema modifications by roughly half. Not because there were fewer consumers overall, but because there were fewer independent semantic reconstructions.

This is what good enterprise architecture looks like. Not purity. Better fault lines.

Operational Considerations

Architects who discuss read models only as boxes on diagrams have usually not operated them at scale.

Read models are living operational assets. They lag, corrupt, stall, bloat, replay, and occasionally lie.

A few operational concerns matter disproportionately:

Freshness and lag visibility

Every projection should expose freshness: event offset, processing timestamp, and business staleness indicators where relevant. “Eventually consistent” is not an excuse to be blind. Operators need to know whether a support screen is 2 seconds behind or 27 minutes behind.
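The freshness triple can be tracked alongside the projection itself. The sketch below is deliberately broker-agnostic: offsets and timestamps are illustrative and not tied to any specific Kafka client API:

```python
import time

class FreshnessTracker:
    """Expose event offset, processing timestamp, and a staleness flag."""
    def __init__(self, stale_after_seconds: float):
        self.stale_after = stale_after_seconds
        self.last_offset = -1
        self.last_processed_at = None

    def record(self, offset: int, now: float = None):
        self.last_offset = offset
        self.last_processed_at = now if now is not None else time.time()

    def status(self, head_offset: int, now: float = None) -> dict:
        now = now if now is not None else time.time()
        lag_events = head_offset - self.last_offset
        age = (now - self.last_processed_at) if self.last_processed_at else float("inf")
        return {
            "offset": self.last_offset,
            "lag_events": lag_events,
            "seconds_since_last_event": age,
            # Quiet topics are not stale; lagging ones past the budget are.
            "stale": lag_events > 0 and age > self.stale_after,
        }

tracker = FreshnessTracker(stale_after_seconds=30)
tracker.record(offset=100, now=1000.0)
snapshot = tracker.status(head_offset=100, now=1005.0)  # caught up
behind   = tracker.status(head_offset=250, now=1090.0)  # lagging badly
```

Exporting this as a metric per projection is what lets operators distinguish 2 seconds of lag from 27 minutes.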

Replay economics

Rebuild capability is essential, but replaying months of Kafka history into dozens of projections can be punishing. Topic retention, partitioning, consumer throughput, and downstream database write patterns all matter. Some projections should support snapshotting to reduce rebuild windows.
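Snapshot-assisted rebuild is the usual answer: restore state at a checkpointed offset, then replay only the tail. The event and snapshot shapes here are hypothetical; real projections would read from the log and a snapshot store:

```python
def apply(state: dict, event: dict) -> dict:
    # Illustrative fold function: last-write-wins per key.
    state = dict(state)
    state[event["key"]] = event["value"]
    return state

def rebuild(log: list, snapshot=None) -> dict:
    """Full replay when no snapshot exists; otherwise replay from the
    snapshot's offset. The tail is typically far smaller than the log."""
    if snapshot is None:
        state, start = {}, 0
    else:
        state, start = dict(snapshot["state"]), snapshot["offset"] + 1
    replayed = 0
    for event in log[start:]:
        state = apply(state, event)
        replayed += 1
    return {"state": state, "events_replayed": replayed}

log = [{"key": "balance", "value": v} for v in (10, 20, 30, 40)]
full = rebuild(log)
fast = rebuild(log, snapshot={"offset": 2, "state": {"balance": 30}})
```

The invariant to test in production is the one shown here: snapshot-plus-tail must converge to the same state as full replay.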

Schema evolution

Read model explosion multiplies schema blast radius. A harmless-looking event change can break many consumers, especially if they depend on optional fields that were never really optional in practice. Contract testing and schema compatibility controls are not bureaucracy here. They are survival tools.
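A consumer-driven contract check makes the blast radius visible before release: each consumer declares the fields it actually reads, and a proposed schema is validated against every registered contract. Field and consumer names below are illustrative assumptions:

```python
def check_compatibility(schema_fields: dict, contracts: dict) -> list:
    """schema_fields: {field: {"required": bool}} for the proposed schema.
    contracts: {consumer: [fields it reads]}. Returns breakage descriptions."""
    breakages = []
    for consumer, needed in contracts.items():
        for f in needed:
            if f not in schema_fields:
                breakages.append(f"{consumer}: field '{f}' removed")
            elif not schema_fields[f]["required"]:
                # "Optional" fields a consumer depends on are the trap:
                # surface them so optionality is an explicit decision.
                breakages.append(
                    f"{consumer}: field '{f}' is optional but relied on")
    return breakages

proposed = {"customer_id": {"required": True}, "email": {"required": False}}
contracts = {"support-console": ["customer_id", "email"],
             "campaign-engine": ["customer_id", "segment"]}
issues = check_compatibility(proposed, contracts)
```

Run in CI, this turns a surprise production incident into a failed build and a conversation.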

Storage sprawl

One projection in PostgreSQL, another in Elasticsearch, another in Cassandra, another in Redis, another in S3 parquet files. This is how enterprises quietly build an accidental data platform nobody owns. Technology diversity should be earned by query needs, not by team preference.

Poison events and dead-letter queues

A dead-letter queue is not a solution. It is a symptom catcher. For critical read models, poison events need business-aware triage. If a control projection drops an event silently into a DLQ, your audit posture is already compromised.
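Business-aware triage can be sketched as a handler that never lets a control-projection event fail silently: the poison event is parked with business context, and an operational alert fires. All names and the failure mode are hypothetical:

```python
def handle(event: dict, project, dead_letters: list, alerts: list,
           projection_category: str = "control"):
    try:
        project(event)
    except Exception as exc:
        record = {"event_id": event.get("id"), "entity": event.get("entity"),
                  "error": str(exc)}
        dead_letters.append(record)
        if projection_category == "control":
            # A silently dropped event in a control projection compromises
            # the audit posture; page someone, do not just park it.
            alerts.append({"severity": "high", **record})

def project(event):
    # Illustrative projection that rejects malformed events.
    if "amount" not in event:
        raise ValueError("missing amount")

dlq, alerts = [], []
handle({"id": "e1", "entity": "acct-9", "amount": 100}, project, dlq, alerts)
handle({"id": "e2", "entity": "acct-9"}, project, dlq, alerts)  # poison event
```

For experience projections the same handler might only log; the category from the taxonomy decides the escalation policy.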

Backfills and late-arriving events

Historical corrections are common in financial, insurance, and logistics domains. Read models must define whether they are current-state only, bitemporal, or correction-aware. If not, late events will produce “mysterious” inconsistencies that are really design omissions.
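A correction-aware view can be sketched bitemporally: every fact carries an effective time (when it was true in the business) and a recorded time (when the system learned it), so late arrivals land in the right place and historical answers stay reproducible. The integer timestamps and address values are illustrative:

```python
from bisect import insort

class BitemporalView:
    def __init__(self):
        self._facts = []  # kept sorted by (effective, recorded)

    def apply(self, value, effective, recorded):
        insort(self._facts, (effective, recorded, value))

    def as_of(self, effective, known_by=None):
        """Answer at a business time, using only facts recorded by
        `known_by` (None = everything known today)."""
        best = None
        for eff, rec, value in self._facts:
            if eff <= effective and (known_by is None or rec <= known_by):
                best = value
        return best

addr = BitemporalView()
addr.apply("Old Street 1", effective=10, recorded=10)
addr.apply("New Road 5",   effective=20, recorded=40)  # late-arriving change

at_day_30_then = addr.as_of(effective=30, known_by=30)  # what we believed then
at_day_30_now  = addr.as_of(effective=30)               # corrected history
```

This is exactly the capability the bank in the earlier example lacked when it could not prove which address was active at the point of communication.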

Tradeoffs

Let’s be blunt: there is no magic number of acceptable read models.

Too few, and you force unrelated use cases into one shape. Too many, and you create semantic chaos and operational drag.

The central tradeoffs are these:

  • Autonomy vs consistency: independent teams can move faster with their own projections, but semantic divergence grows.
  • Optimization vs duplication: specialized read models perform better for specific queries, but often duplicate transformation logic.
  • Direct event consumption vs curated projection reuse: direct consumption maximizes freedom; curated reuse reduces accidental interpretation but can become a bottleneck.
  • Fast delivery vs governance overhead: lightweight projection creation speeds experiments; weak governance creates long-term debt.
  • Domain fidelity vs convenience composition: rich bounded-context semantics preserve meaning; broad convenience views help users but risk flattening domain nuance.

A mature enterprise does not pick one side universally. It chooses by context.

Failure Modes

When read model proliferation goes wrong, it tends to fail in familiar ways.

Semantic drift

The same business term means different things in different projections. This is the most common and most damaging failure.

Hidden coupling

Teams believe they are decoupled because they consume events asynchronously, but they are tightly coupled to event interpretation details and field quirks.

Replay disasters

A projection rebuild overloads Kafka consumers, saturates databases, or replays with new logic that produces materially different historical results.

Zombie models

Unused read models remain in production because nobody is sure whether they are safe to retire. They continue to consume events and complicate changes.

Reconciliation as permanent manual labor

If every month-end close depends on humans comparing conflicting projections, the architecture has failed even if all systems are “up.”

Platform calcification

In reaction to chaos, some organizations overcorrect by centralizing all read models under a platform team. Delivery slows, local needs are ignored, and teams create shadow projections anyway. Governance by monopoly does not work.

When Not To Use

CQRS with multiple event-driven read models is not the default answer for every system.

Do not use this approach when:

  • the domain is simple and CRUD-oriented
  • query patterns are stable and modest
  • strong immediate consistency is required for most reads
  • the organization lacks event contract discipline
  • operational maturity for replay, reconciliation, and observability is weak
  • team boundaries do not support ownership of projections

A monolithic application with a well-designed relational model is often a better answer than a distributed read-model circus. Likewise, if you have only a handful of straightforward query screens, introducing CQRS and Kafka-backed projections may just manufacture complexity.

Architecture should solve real forces, not imitate fashionable diagrams.

Related Patterns

Several related patterns often appear alongside this problem:

  • Materialized views — the technical basis for many read models, though not all materialized views are domain-significant.
  • Outbox pattern — helps publish reliable domain events from transactional systems.
  • Event sourcing — often paired with CQRS, but read model explosion can occur with or without full event sourcing.
  • Saga / process manager — may maintain operational state projections for long-running workflows.
  • Data mesh style data products — useful analogy, especially for ownership and discoverability, though operational CQRS read models are not the same as analytical data products.
  • Strangler fig migration — the right migration pattern for taming projection sprawl incrementally.
  • Anti-corruption layer — essential when downstream contexts need protection from upstream event semantics or legacy models.

The anti-corruption layer is especially important. Many “custom” read models are really ad hoc anti-corruption layers built without being named as such. Once you recognize that, you can design them deliberately.

Summary

Read model explosion in CQRS event-driven systems is not a sign that CQRS failed. It is a sign that success arrived faster than semantic discipline.

Large enterprises need multiple read models. Different users, processes, and controls demand different shapes of truth. The mistake is not multiplicity. The mistake is allowing projections to multiply without clear purpose, ownership, semantics, and retirement.

If you remember one thing, make it this:

Read models are not just query optimizations. They are domain commitments.

That means they need bounded-context thinking, not just streaming infrastructure. It means migration must be progressive, with strangler patterns and reconciliation to build trust. It means Kafka is an enabler, not an excuse for every team to reinvent upstream meaning. It means some consolidation is healthy, but centralization can be just as harmful as sprawl.

Good architecture here is not about drawing fewer boxes. It is about drawing the right fault lines, so the business can ask many questions without creating many incompatible answers.

That is the real challenge. And the real craft.

Frequently Asked Questions

What is CQRS?

Command Query Responsibility Segregation separates read and write models. Commands mutate state; queries read from a separate optimized read model. This enables independent scaling of reads and writes and allows different consistency models for each side.
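The whole pattern fits in a few lines. This is a deliberately minimal sketch with illustrative names: the command side enforces an invariant and appends events; a projector folds events into a query-optimized read model:

```python
events = []

def handle_deposit(account_id: str, amount: int):
    if amount <= 0:                      # write-side invariant
        raise ValueError("deposit must be positive")
    events.append({"type": "Deposited", "account": account_id, "amount": amount})

read_model = {}                          # account -> balance, denormalized

def project_events(log):
    # The read side can be rebuilt at any time by re-folding the log.
    for e in log:
        if e["type"] == "Deposited":
            read_model[e["account"]] = read_model.get(e["account"], 0) + e["amount"]

handle_deposit("acc-1", 50)
handle_deposit("acc-1", 25)
project_events(events)
```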

What is the Saga pattern?

A Saga manages long-running transactions across multiple services without distributed ACID transactions. Each step publishes an event; if a step fails, compensating transactions roll back previous steps. Choreography-based sagas use events; orchestration-based sagas use a central coordinator.
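An orchestration-based saga can be sketched as a runner that executes steps in order and, on failure, runs the compensations of completed steps in reverse. The step names and failure are illustrative:

```python
def run_saga(steps):
    """steps: list of (name, action, compensation) tuples. Returns a trace."""
    trace, completed = [], []
    for name, action, compensate in steps:
        try:
            action()
            trace.append(f"{name}:ok")
            completed.append((name, compensate))
        except Exception:
            trace.append(f"{name}:failed")
            # Compensate completed steps in reverse order.
            for done_name, comp in reversed(completed):
                comp()
                trace.append(f"{done_name}:compensated")
            break
    return trace

state = {"reserved": False}

def reserve():   state.update(reserved=True)
def unreserve(): state.update(reserved=False)
def charge():    raise RuntimeError("payment declined")
def refund():    pass  # never reached in this run

trace = run_saga([("reserve-stock", reserve, unreserve),
                  ("charge-card",   charge,  refund)])
```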

What is the outbox pattern?

The transactional outbox pattern solves dual-write problems — ensuring a database update and a message publication happen atomically. The service writes both to its database and an outbox table in one transaction; a relay process reads the outbox and publishes to the message broker.
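The key mechanics fit in a small sketch: business row and outbox row commit in one local transaction, and a relay marks rows published only after the broker accepts them, so a crash causes redelivery rather than loss. SQLite stands in for the service's database; table and topic names are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id: str):
    with db:  # one transaction: no dual-write window
        db.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("orders.v1", f'{{"order_id": "{order_id}"}}'))

def relay(publish):
    """Drain unpublished outbox rows in insertion order."""
    rows = db.execute("SELECT id, topic, payload FROM outbox "
                      "WHERE published = 0 ORDER BY id").fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # mark published only after broker accepts
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))

sent = []
place_order("ord-1")
relay(lambda topic, payload: sent.append((topic, payload)))
```

At-least-once delivery is the consequence of this design, which is why downstream projections must be idempotent.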