Most CQRS discussions go wrong in the first five minutes.
They start with commands and queries, draw a clean split down the middle, and pretend the hard part is naming the two boxes. It isn’t. The hard part is accepting that your read side is not a polite projection sitting beside the “real” system. In a serious enterprise system, the read side is a cache topology. A purposeful one. A designed one. A topology of copies, denormalizations, indexes, materialized views, search documents, aggregates, and precomputed answers shaped around how the business actually consumes information.
That idea matters because it changes the architecture conversation. You stop asking, “Should we have a read model?” and start asking, “What data copies do we need, where should they live, how fresh must they be, and what happens when they lie?”
That is the adult version of CQRS.
The write model exists to protect invariants. It speaks the language of intent: approve claim, allocate credit line, cancel shipment, apply pricing rule. The read model exists to answer questions fast enough, cheaply enough, and in forms that fit the work. It speaks the language of usage: open claims by region, customer exposure by product family, shipments delayed more than 48 hours, expected margin by channel. Those are not the same language. They should not be the same shape. And if you force them into one model, one of two things happens: either the write side becomes polluted with reporting concerns, or the read side becomes a swamp of joins, lock contention, and accidental complexity.
So yes, CQRS is about separating writes from reads. But the useful architectural insight is sharper: read models are a cache topology organized around business semantics.
That sentence sounds simple. It is not. It has consequences for domain-driven design, microservices, Kafka pipelines, migration strategy, operational safety, and the kinds of failures your users will actually see.
Let’s get into it.
Context
CQRS is often introduced as a pattern for separating command handling from query handling. That description is accurate and incomplete. In practice, CQRS becomes relevant when one model cannot serve two masters:
- transactional integrity on writes
- flexible, scalable, low-latency access on reads
In an enterprise landscape, this tension is everywhere. Policy administration systems, order management, lending platforms, telecom provisioning, claims processing, subscription billing, and ERP-adjacent workflows all suffer from the same structural mismatch. The write path wants consistency, transaction boundaries, and domain invariants. The read path wants denormalized views, broad filters, sortable datasets, search, pagination, analytics-like summaries, and low tail latency under heavy load.
The mistake is thinking these are merely technical optimization issues. They are usually domain issues wearing technical clothes.
A write model is built around the domain’s decisions. A read model is built around the domain’s questions.
That distinction is very close to domain-driven design. Aggregates enforce business rules. Bounded contexts define language boundaries. Read models sit downstream of those decisions and recast domain facts into shapes suitable for specific consumers: customer service screens, risk dashboards, mobile APIs, partner feeds, pricing workbenches, audit trails. Different consumers ask different questions. Therefore they need different caches.
If you run a large enterprise platform on Kafka and microservices, this becomes even more pronounced. Services publish events or change notifications. Multiple downstream consumers materialize different views from the same stream. One service builds a customer timeline. Another builds operational metrics. A third creates a fraud scoring feature store. A fourth maintains a search index. Same underlying business facts. Different cache topologies.
That is not duplication by accident. It is duplication as design.
Problem
The classic enterprise problem looks innocent enough.
You begin with a normalized transactional model. It serves the line-of-business application well. Then reporting requirements grow. Then APIs multiply. Then support teams need a “single customer view.” Then product managers ask for filtering, sorting, search, and aggregated widgets on every screen. Then someone adds Kafka, a few microservices, and a data lake. Now the old transactional schema is carrying workloads it was never designed to carry.
Symptoms arrive in predictable order:
- write transactions slow down because read queries hold locks or consume resources
- query logic drifts into application services and ORMs
- developers add ad hoc indexes for one screen and break another workload
- the “same” business concept is read differently by different teams
- joins span domains that should have been separate bounded contexts
- dashboards and operational UIs hit production databases directly
- read latency becomes spiky under peak traffic
- teams cannot evolve schema safely because consumers depend on internal tables
This is where teams often say, “We need CQRS.” Usually what they really mean is: “Our reads have escaped the shape of our transactional model.”
And they’re right. But introducing CQRS without a clear idea of read-side topology just replaces one mess with a distributed mess.
The central problem is not simply read/write asymmetry. It is this:
**The write model is optimized for deciding what is true.
The read model is optimized for answering what people need to know.**
Those are different jobs.
Forces
Several forces pull the architecture in different directions.
Domain semantics
In domain-driven design, the write model is where semantics are expensive and precise. Commands carry intent. Aggregates enforce invariants. The model reflects the business’s decision boundaries. For example, “approve loan application” belongs with credit policy, not with a giant customer record query.
Read models, by contrast, are semantic composites. They often stitch together facts from multiple aggregates or even multiple bounded contexts. That is exactly why they should not be mistaken for transactional truth. A “Customer 360” screen is useful, but it is not a domain aggregate. It is a convenience view.
Confuse those two, and your architecture starts lying.
Performance and scale
Read traffic is often an order of magnitude higher than write traffic. It also has different access patterns: fan-out reads, wide scans, search, faceting, dashboards, mobile summaries. The transactional store is good at some of these, bad at many, and catastrophic at a few.
Autonomy
In microservice environments, teams need to evolve independently. If every consumer queries another service’s internal database, autonomy is theatre. Read models let services publish facts and let consumers shape those facts into their own local query stores.
Freshness
Not every read needs strong consistency. But some do. Users tolerate delayed analytics. They do not tolerate confirming a payment and then seeing “payment pending” for ten minutes. The architecture must classify freshness expectations explicitly.
Cost and complexity
Every read model is another copy of data to build, test, monitor, backfill, secure, and reconcile. A cache topology buys speed and flexibility by introducing duplication and staleness. There is no free lunch here. There is only a better trade.
Solution
Treat the read side as a deliberate cache topology.
That means three things.
First, each read model is designed for a specific query or family of queries. Not “all reads.” Specific reads. Account summary. Open orders by warehouse. Product catalog search. Case timeline. Risk exposure by customer group. If the query shape is known and important, build a model around it.
Second, each read model has explicit semantics:
- source of truth
- projection logic
- acceptable staleness
- rebuild strategy
- ownership
- reconciliation mechanism
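These semantics are most useful when written down as a reviewable contract rather than tribal knowledge. A minimal sketch of such a contract, as a frozen dataclass, might look like this — all field names and values are illustrative, not a standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReadModelContract:
    """Explicit semantics for one read model; fields are illustrative."""
    name: str                   # business name of the view
    source_of_truth: str        # upstream system or stream that feeds it
    projection: str             # code that materializes the view
    max_staleness_seconds: int  # acceptable lag before alerting
    rebuild_strategy: str       # e.g. "full replay", "snapshot + catch-up"
    owner: str                  # team accountable for correctness
    reconciliation: str         # how drift is detected

adjuster_case_view = ReadModelContract(
    name="Adjuster Case View",
    source_of_truth="claims change stream (CDC from the claims core)",
    projection="projections.adjuster_case",
    max_staleness_seconds=30,
    rebuild_strategy="snapshot + incremental catch-up",
    owner="claims-workbench-team",
    reconciliation="hourly count parity + key-level sampling",
)
```

Checking a record like this into the owning team's repository forces the questions that matter: who owns this view, how stale may it get, and how is it rebuilt.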
Third, the flow between write model and read model is evented or change-driven, not coupled through direct transactional joins.
A write model handles commands, persists state in a transactional store, and emits domain events or durable change notifications. Those events feed one or more projection pipelines that materialize read models in fit-for-purpose stores: relational tables, Elasticsearch, Redis, Cassandra, MongoDB, graph stores, or simply denormalized Postgres tables. The technology is secondary. The semantics are not.
Here is the high-level shape: commands enter the write model, which commits state to its transactional store and emits change events; projection pipelines consume those events and materialize one or more read models, each fronted by its own query API.
The point of this shape is not that Kafka is mandatory. It isn’t. The point is that write flow and read flow are different flows with different responsibilities.
The write flow decides.
The read flow distributes.
That difference sounds small. It changes everything.
Architecture
A good CQRS architecture begins with bounded contexts, not with infrastructure. If you cannot say where decisions live, you are not ready to split reads from writes.
Write model flow
The write model should be narrow, intentional, and stubborn about invariants. Commands enter through an application service or command handler, are validated against domain rules, and commit through aggregates or transaction scripts appropriate to the context. If event sourcing is used, events become the write persistence model. If not, state is stored conventionally and events are published through an outbox or CDC mechanism.
The key is reliability of change propagation. In enterprises, “we publish events after commit” is often where systems become folklore. If publication is not atomic with state change, you will eventually lose events. Use an outbox pattern or database log-based CDC. Hope is not a delivery guarantee.
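The outbox idea fits in a few lines: the state change and the event record commit in one local transaction, and a separate relay drains the outbox to the broker. A minimal sketch using SQLite as a stand-in for the service's transactional store (table and event names are hypothetical):

```python
import json
import sqlite3

# In-memory SQLite stands in for the service's transactional database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")

def cancel_order(order_id: str) -> None:
    """State change and event record commit in ONE transaction."""
    with db:  # commits both statements atomically, or neither
        db.execute("UPDATE orders SET status = 'cancelled' WHERE id = ?",
                   (order_id,))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps({"type": "OrderCancelled",
                                "order_id": order_id}),))

def relay(publish) -> int:
    """Relay process: drain unpublished rows to the broker, then mark them."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # e.g. a Kafka producer send
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                       (row_id,))
    return len(rows)

with db:
    db.execute("INSERT INTO orders VALUES ('o-1', 'open')")

sent = []
cancel_order("o-1")
relay(sent.append)
```

The crash windows are now safe: if the process dies before commit, neither the state nor the event exists; if it dies after commit but before the relay runs, the event is still in the outbox and will be delivered later. At-least-once delivery remains, which is why projections must be idempotent.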
Read model flow
Projection services subscribe to change events and transform them into query-optimized data structures. A projection can be as simple as “copy order and shipment status into a denormalized table,” or as complex as “maintain rolling risk metrics over a customer portfolio.”
Projection logic should be:
- idempotent
- replayable
- versioned
- measurable
If you cannot replay your read model from source events or source state, you have built a fragile cache with no repair path.
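Those four properties can be demonstrated in miniature. The sketch below (event shapes are invented for illustration) rebuilds an "open orders by warehouse" view from scratch on every run — which is exactly what makes it replayable — and deduplicates by event id, which is what makes redelivery harmless:

```python
def project(events):
    """Rebuild the 'open orders by warehouse' view from its source events."""
    view, seen = {}, set()
    for e in events:
        if e["event_id"] in seen:  # idempotent: duplicates are no-ops
            continue
        seen.add(e["event_id"])
        wh = e["warehouse"]
        if e["type"] == "OrderOpened":
            view[wh] = view.get(wh, 0) + 1
        elif e["type"] == "OrderClosed":
            view[wh] = view.get(wh, 0) - 1
    return view

events = [
    {"event_id": 1, "type": "OrderOpened", "warehouse": "AMS"},
    {"event_id": 2, "type": "OrderOpened", "warehouse": "AMS"},
    {"event_id": 2, "type": "OrderOpened", "warehouse": "AMS"},  # redelivered
    {"event_id": 3, "type": "OrderClosed", "warehouse": "AMS"},
]
view = project(events)
```

In production the dedupe set would live in the read store itself (for example, a processed-offset watermark per partition), but the contract is the same: running the projection twice over the same input must produce the same view.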
Domain semantics in read models
Read models should retain business language, but not aggregate boundaries. That distinction matters.
Suppose the write side has separate aggregates for Customer, Order, and Credit Exposure. A customer service read model may legitimately join these into a single operational screen. That does not mean you now have a “CustomerDashboardAggregate.” It means you have a denormalized query representation.
This is where architects need some discipline. Read models can cross write boundaries, but they should do so consciously and with the semantics of a view, not an authority.
Topology choices
There isn’t one read model. There are several common topologies:
- Local read store per service
A service materializes its own query tables for its own API. This is often the cleanest option.
- Consumer-owned projections
Downstream teams subscribe to events and build their own read stores. This supports autonomy well.
- Shared enterprise views
A central platform builds broad operational views like Customer 360 or Asset 360. Useful, but dangerous when they become pseudo-master systems.
- Specialized stores
Search indexes, graph projections, cache grids, OLAP cubes, and feature stores for ML. These are legitimate read models too.
A common enterprise pattern mixes all four.
This is where the phrase “cache topology” earns its keep. These views are copies with purpose. Some are narrow, some are composite, some are temporary, some are strategic. But they are all caches in the architectural sense: materialized representations of facts sourced elsewhere.
Strong vs eventual consistency
Not all reads belong on an eventual-consistency path.
There are usually three categories:
- Read-your-write critical: after changing data, the same user must see the updated result immediately. Keep this on the write side or use synchronous read repair.
- Operationally fresh: seconds of delay are acceptable, but not minutes.
- Analytical / observational: delays are acceptable as long as they are visible and bounded.
A mature CQRS system does not pretend every read can tolerate lag. It classifies them.
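Classification works best when it is explicit in code, not in a wiki. One possible shape is a routing function that sends read-your-write queries to the write side and degrades operational queries back to the source when projection lag exceeds its budget — query names and the lag threshold below are illustrative:

```python
from enum import Enum

class Freshness(Enum):
    READ_YOUR_WRITE = "read-your-write"  # must reflect the user's own change
    OPERATIONAL = "operational"          # seconds of lag tolerated
    ANALYTICAL = "analytical"            # bounded, visible delay is fine

# Every query flow gets an explicit freshness class (hypothetical names).
QUERY_FRESHNESS = {
    "payment-status-after-submit": Freshness.READ_YOUR_WRITE,
    "open-claims-by-region": Freshness.OPERATIONAL,
    "margin-by-channel-dashboard": Freshness.ANALYTICAL,
}

def route(query: str, projection_lag_s: float,
          max_operational_lag_s: float = 10.0) -> str:
    """Decide which side answers, given the current projection lag."""
    cls = QUERY_FRESHNESS[query]
    if cls is Freshness.READ_YOUR_WRITE:
        return "write-side"
    if cls is Freshness.OPERATIONAL and projection_lag_s > max_operational_lag_s:
        return "write-side"  # degrade to the source rather than serve a lie
    return "read-model"
```

The important property is the forced decision: a query with no entry in the map fails loudly instead of silently inheriting someone's consistency assumption.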
API design
Do not expose raw read stores directly. Put query APIs in front of them. The API should name the view in business terms and hide the storage and projection details. Otherwise every consumer becomes coupled to the cache layout, and your “optimized read side” hardens into another legacy schema.
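The facade can be thin. A sketch of the idea — the class name and contract are invented, and a list stands in for whatever store backs the view:

```python
class AdjusterCaseViewAPI:
    """Names the view in business terms; the storage layout stays private."""

    def __init__(self, store):
        self._store = store  # Postgres, Elasticsearch, ... — hidden from callers

    def open_cases_for(self, adjuster_id: str) -> list:
        # Consumers depend on this contract, never on the table shape.
        return [c for c in self._store
                if c["adjuster_id"] == adjuster_id and c["open"]]

store = [
    {"case_id": "c-1", "adjuster_id": "a-7", "open": True},
    {"case_id": "c-2", "adjuster_id": "a-7", "open": False},
]
api = AdjusterCaseViewAPI(store)
```

When the projection later moves from denormalized tables to a search index, `open_cases_for` keeps its signature and no consumer notices.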
Migration Strategy
This is not a big-bang pattern. It is a strangler pattern.
If you try to redesign the whole read/write split in one program increment, you will spend a year building plumbing while the business invents new exceptions. The better move is progressive extraction.
Start with pain, not ideology.
Pick one query flow that is expensive, unstable, or semantically awkward in the current transactional model. Build one read model for it. Feed it from either:
- database CDC into Kafka
- transactional outbox events
- carefully scoped synchronous update logic, if the system is still monolithic and you need a stepping stone
Then route only that query to the new model. Leave the rest alone.
Over time, repeat.
A practical migration sequence often looks like this:
- Observe current query workloads and identify hotspots.
- Define query contracts for one or two business-critical read scenarios.
- Publish reliable changes from the write system using outbox or CDC.
- Build projections into a denormalized read store.
- Run dual reads and compare old vs new responses.
- Cut over selected consumers.
- Backfill and replay until rebuilds are routine.
- Retire direct reads from transactional tables where possible.
The important phrase there is dual reads and compare. Reconciliation is not optional during migration.
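The dual-read step can be sketched as a wrapper that always answers from the legacy path while shadow-reading the new projection and recording drift — the dictionaries below stand in for the two real stores:

```python
def dual_read(query_key, read_old, read_new, record_mismatch):
    """Serve from the legacy path, shadow-read the new model, log drift."""
    old = read_old(query_key)
    try:
        new = read_new(query_key)
        if new != old:
            record_mismatch(query_key, old, new)
    except Exception as exc:  # the new path must never break consumers
        record_mismatch(query_key, old, f"error: {exc}")
    return old  # the legacy answer stays authoritative during migration

mismatches = []
legacy = {"claim-1": {"status": "open"}}
projection = {"claim-1": {"status": "open"}}
result = dual_read("claim-1", legacy.__getitem__, projection.__getitem__,
                   lambda k, o, n: mismatches.append((k, o, n)))
```

Cutover is then an evidence-based decision: when the mismatch log stays empty for a representative window of traffic, route the query to the new model and keep the comparison running in the other direction for a while.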
Reconciliation
A read model is only useful if you can prove it is close enough to truth for its purpose.
That means establishing reconciliation mechanisms:
- count parity checks
- key-level sampling
- hash comparisons over canonical fields
- lag measurement by topic offset or event timestamp
- business exception reports for mismatched totals or statuses
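The first three checks combine naturally into one batch job: hash the canonical fields of each row (ignoring columns the projection derives), compare by key, and report count deltas, missing keys, extra keys, and drifted keys. A minimal sketch, with invented row shapes:

```python
import hashlib

def row_hash(row: dict, fields: tuple) -> str:
    """Hash canonical fields only — the projection may add derived columns."""
    canonical = "|".join(str(row[f]) for f in fields)
    return hashlib.sha256(canonical.encode()).hexdigest()

def reconcile(source_rows, view_rows, key, fields):
    """Count parity plus key-level hash comparison; names drifted keys."""
    src = {r[key]: row_hash(r, fields) for r in source_rows}
    vw = {r[key]: row_hash(r, fields) for r in view_rows}
    return {
        "count_delta": len(src) - len(vw),
        "missing": src.keys() - vw.keys(),   # in source, not in view
        "extra": vw.keys() - src.keys(),     # in view, not in source
        "drifted": {k for k in src.keys() & vw.keys() if src[k] != vw[k]},
    }

source = [{"id": "c-1", "status": "open", "amount": 100},
          {"id": "c-2", "status": "closed", "amount": 50}]
view = [{"id": "c-1", "status": "open", "amount": 100, "derived_score": 7},
        {"id": "c-2", "status": "open", "amount": 50}]
report = reconcile(source, view, key="id", fields=("status", "amount"))
```

At enterprise scale the same structure runs over samples or per-partition aggregates rather than full tables, but the output contract is the same: a named set of keys that drifted, not a vague suspicion.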
Reconciliation is where architectural seriousness shows up. Teams love event-driven diagrams. Fewer teams love admitting that projections drift, events can be replayed out of order, schema changes break mappings, and backfills often produce surprises. But these are the real concerns.
A mature migration includes both online and offline reconciliation:
- online, to detect live drift
- offline, to validate rebuilds and release changes safely
Progressive strangler migration
The strangler approach works especially well in enterprises where the “legacy system” is not one thing but a stack of shared schemas, batch jobs, ETL, and screen-specific SQL.
A sensible path is:
- first, externalize change events
- second, create one or two consumer-owned read stores
- third, move UI/API query paths to those stores
- fourth, stop allowing new direct reads from write schemas
- fifth, decompose bounded contexts only after you have query independence
A lot of microservice migrations fail because they split services before splitting reads. That leaves every new service still reaching back into the old database to answer queries. You get distributed writes and centralized reads. The worst of both worlds.
Enterprise Example
Consider a global insurance carrier modernizing claims operations.
The core claims platform is a monolithic policy and claims system backed by Oracle. It handles first notice of loss, claim adjudication, reserve changes, payments, recoveries, and compliance workflows. Over twenty years, the system accumulated:
- custom reports
- agent portals
- adjuster workbenches
- fraud analytics feeds
- finance extracts
- call-center screens
Every one of those consumers reads from the same transactional schema. The write model is strong. The read story is chaos.
A claims adjuster opening a case needs:
- claim summary
- claimant contact details
- policy coverage snapshot
- payment history
- reserve movements
- tasks and notes
- fraud flags
- document status
No single aggregate owns all that, and it shouldn’t. Yet the old system serves it via monstrous SQL and brittle middle-tier joins. During catastrophe events, read load degrades write throughput. Claim intake suffers exactly when the business needs it most.
The carrier introduces Kafka with CDC from Oracle, not because Kafka is fashionable, but because it provides a durable change stream without rewriting the claims core on day one.
They create several read models:
- Adjuster Case View
Denormalized relational store optimized for workbench screens.
- Claim Search Index
Elasticsearch projection for flexible search by claim number, VIN, address, claimant name, catastrophe code.
- Finance Reconciliation View
Ledger-oriented projection for payment and reserve balancing.
- Fraud Feature Projection
Event-derived features for anomaly scoring.
The write model remains in the claims core initially. Commands still go through the legacy application boundary. But reads progressively move to the new projections.
This changes the economics of the whole system:
- adjuster screens become faster and more resilient
- search no longer hammers Oracle
- finance gets a purpose-built balancing view
- fraud gets near-real-time features
- the core transactional engine can later be decomposed bounded context by bounded context
The interesting part is not the technology. It is the domain language.
The team is careful not to crown the Adjuster Case View as the “master claim record.” It is explicitly a read model. The authoritative semantics remain in the bounded contexts that own adjudication, payments, reserves, and policy coverage. That discipline prevents a common enterprise failure: turning a convenience view into a shadow system of record.
Operational Considerations
Read models are operational systems. Treat them that way.
Lag visibility
Users need freshness visibility. If a dashboard is ten minutes behind, say so. If a workbench view is seconds behind, track and alarm on lag. Hidden staleness erodes trust faster than visible delay.
Replay and rebuild
Every projection must have a rebuild story:
- full replay from Kafka
- selective replay by key or date range
- snapshot plus incremental catch-up
- schema-versioned projection code
If rebuild takes three days and needs heroics, your cache topology is brittle.
Idempotency and ordering
Kafka gives durable streams, not magic semantics. Projections should handle:
- duplicate events
- out-of-order delivery across partitions
- late-arriving corrections
- tombstones and deletions
Design projection keys and version checks accordingly.
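One common shape for that design is a version-checked upsert: each change carries a monotonically increasing version per key, stale or duplicate deliveries become no-ops, and deletions keep a versioned tombstone so a late update cannot resurrect the row. A sketch under those assumptions:

```python
def apply_change(view: dict, key: str, version: int, payload) -> bool:
    """Last-writer-wins by version; returns True if the view changed."""
    current = view.get(key)
    if current is not None and current["version"] >= version:
        return False  # out-of-order or duplicate delivery: ignore
    if payload is None:  # tombstone: delete, but remember the version
        view[key] = {"version": version, "deleted": True, "data": None}
    else:
        view[key] = {"version": version, "deleted": False, "data": payload}
    return True
```

The version can come from an aggregate sequence number, a CDC log position, or an event timestamp with tie-breaking — what matters is that every change to a key is comparable, so the projection converges to the same state regardless of delivery order.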
Data governance
Copies multiply compliance scope. PII, retention rules, masking, legal hold, right-to-erasure obligations, and auditability all now apply across multiple read stores. Architects who celebrate denormalization without governance are building tomorrow’s regulatory incident.
Ownership
Every read model should have an owning team. Shared views with no owner become stale dumping grounds. This happens all the time in enterprises. “Customer 360” sounds strategic until nobody can answer who validates its correctness.
SLOs
Set service level objectives separately for:
- command success and latency
- read query latency
- projection lag
- rebuild duration
- reconciliation drift
If you roll them all into one availability number, you will miss the real failure.
Tradeoffs
Let’s be plain about it. CQRS with dedicated read models is a trade, not a triumph.
What you gain
- query performance shaped to usage
- write model purity around invariants
- reduced coupling between transactional schemas and consumers
- autonomy for downstream services
- scalability across heterogeneous workloads
- freedom to use fit-for-purpose storage
What you pay
- more moving parts
- eventual consistency
- duplicate data
- projection code and schema versioning
- replay and backfill complexity
- reconciliation overhead
- governance spread across more stores
The biggest tradeoff is cognitive, not technical. Teams must learn that there is no single representation of the domain suitable for every purpose. Some leaders find that unsettling. They want one truth. Enterprises do need one truth of record for decisions. They do not need one shape of data for all access.
That distinction is the whole game.
Failure Modes
Most CQRS failures are not caused by the pattern. They are caused by sloppy semantics and weak operational discipline.
1. The read model becomes a fake source of truth
A broad operational view starts receiving updates “because it’s convenient.” Soon business processes depend on it. Now you have split-brain authority.
2. Event publication is unreliable
Without outbox or CDC, events are lost during failures between transaction commit and publish. The read side silently drifts.
3. Projection logic embeds business decisions
Read projections start calculating eligibility, approval logic, or pricing outcomes. That logic belongs on the write side. A read model can summarize decisions, not invent them.
4. No replay strategy
A schema bug corrupts six months of projections. There is no replay capability. The team manually patches data for weeks.
5. Composite views hide bounded context violations
A “single view” starts joining everything to everything. The organization mistakes convenience for cohesion and loses domain boundaries.
6. Inconsistent freshness expectations
Product teams assume all views are current. The architecture team assumes eventual consistency is understood. Nobody made it explicit. Users see contradictory states and file incidents.
7. Kafka is used as an excuse for vague contracts
Events are poorly versioned, too low-level, or tied to table changes. Downstream projections become brittle. Messaging middleware cannot rescue bad domain contracts.
When Not To Use
CQRS read models are not a moral upgrade. Sometimes they are exactly the wrong move.
Do not use them when:
- the domain is simple and CRUD-oriented
- read and write patterns are modest and stable
- one relational model serves both workloads adequately
- the team lacks operational maturity for eventing and replay
- consistency requirements are immediate for nearly every read
- the problem is bad indexing, not model mismatch
If your application is an internal admin tool with low traffic and straightforward queries, splitting into command and read flows is probably architecture cosplay.
Likewise, if your biggest problem is that the database is underindexed and the ORM generates terrible SQL, fix that first. Not every performance issue deserves a projection pipeline.
And if your organization cannot yet handle schema evolution, versioned events, or owning distributed data copies, adding CQRS will amplify confusion. There is no shame in a well-designed modular monolith with a few denormalized reporting tables. In fact, that is often the right first step.
Related Patterns
Read models as cache topology sits near several related patterns.
Event Sourcing
Often paired with CQRS, but not required. Event sourcing makes replay natural because events are the source of truth. It also raises complexity. Use it when auditability, temporal modeling, and domain event history are central, not because a conference talk made it sound elegant.
Outbox Pattern
Essential when write state and event publication must be reliable without distributed transactions. In enterprise systems, this is often the hinge that makes CQRS workable.
Change Data Capture
A strong migration tool for legacy systems. CDC is especially useful in strangler migrations where the write model remains in an existing database while new read models are externalized.
Materialized Views
The database-native cousin of read models. Sometimes entirely sufficient, especially inside a modular monolith. Don’t dismiss them just because they aren’t fashionable.
API Composition
Useful for some query scenarios, but dangerous as a substitute for proper read models when latency, fan-out, or cross-service joins become significant. Runtime composition is not free.
Data Mesh and Analytical Projections
Read models overlap with analytical data products, but the purpose differs. Operational read models support application queries. Analytical products support broader analysis. Sometimes the same events feed both, but they should not be conflated.
Summary
The cleanest way to think about CQRS is not “two models.” It is decision model and cache topology.
The write model protects the meaning of the business. It guards invariants, records decisions, and defines what is true. The read side spreads those truths into forms that people and systems can use efficiently. Those forms are copies, and because they are copies, they must be designed with freshness, ownership, replay, reconciliation, and domain semantics in mind.
That is why read models are not an implementation detail. They are an architectural choice about where information lives, how it flows, and how much inconsistency the business can tolerate in exchange for speed and autonomy.
Done well, this approach gives you:
- cleaner bounded contexts
- faster and more useful queries
- safer migration from legacy systems
- better fit with Kafka and microservice ecosystems
- explicit tradeoffs instead of accidental database abuse
Done badly, it gives you:
- duplicate truth
- broken projections
- hidden drift
- unreadable event contracts
- one more distributed system nobody really understands
So be opinionated.
Keep the write model small and semantic.
Make the read side explicit and purpose-built.
Use strangler migration, not revolution.
Reconcile relentlessly.
And never let a convenient dashboard view pretend it owns the business.
That way lies architecture that lasts.
Frequently Asked Questions
What is CQRS?
Command Query Responsibility Segregation separates read and write models. Commands mutate state; queries read from a separate optimised read model. This enables independent scaling of reads and writes and allows different consistency models for each side.
What is the Saga pattern?
A Saga manages long-running transactions across multiple services without distributed ACID transactions. Each step publishes an event; if a step fails, compensating transactions roll back previous steps. Choreography-based sagas use events; orchestration-based sagas use a central coordinator.
What is the outbox pattern?
The transactional outbox pattern solves dual-write problems — ensuring a database update and a message publication happen atomically. The service writes both to its database and an outbox table in one transaction; a relay process reads the outbox and publishes to the message broker.