Most command routing discussions start in the wrong place. They start with transports, topics, brokers, frameworks, handlers, and a parade of boxes connected by arrows. That’s backwards.
Command routing is not, at heart, a messaging problem. It’s a decision problem. More specifically, it’s the question of which part of the business is allowed to make a change.
That distinction matters. A system can have Kafka, REST, gRPC, event streams, sagas, an API gateway, and still be architecturally confused if it routes commands according to technical convenience instead of domain meaning. In those systems, “UpdateCustomer,” “ApproveOrder,” and “ApplyCreditLimit” all bounce around like luggage with the tags torn off. Eventually they arrive somewhere. Nobody feels safe about how.
In a CQRS architecture, command routing by domain is the discipline of sending every write request to the domain boundary that owns the decision. Not the service that happens to have the data. Not the team that built the first endpoint. Not the application that receives the HTTP call. The owner of the business rule gets the command.
That sounds obvious. In practice, it is one of the hardest habits for enterprises to build, because most estates have grown around databases and channels, not bounded contexts. There are ERP platforms with decades of sediment. CRM systems that became accidental masters. Kafka topics that mirror tables. Microservices named after nouns but coupled by shared assumptions. Somewhere in the middle, CQRS gets introduced as a modernization strategy, and people discover a hard truth: if the domains are muddled, command routing just amplifies the muddle at scale.
Done well, though, routing by domain becomes one of the sharpest tools in enterprise architecture. It gives you clearer ownership, safer migrations, better auditability, cleaner microservice boundaries, and far less accidental duplication of business rules. It also gives you a practical way to strangler modernize large systems without pretending you can redraw the whole enterprise in a quarter.
This article looks at command routing by domain in CQRS architecture from the angle that matters in the real world: domain semantics, migration reasoning, operational reality, and where it falls apart if you use it carelessly.
Context
CQRS is often explained as “separate reads from writes.” True, but too shallow to be useful. The more important move is this: treat state changes as explicit business decisions, not as generic CRUD operations.
A command is not just a write. It is an intent with meaning. “PlaceOrder.” “SuspendPolicy.” “ApproveClaim.” “IssueRefund.” “RegisterDevice.” The shape of a good command tells you something about the language of the business. It carries purpose, preconditions, authorization expectations, and side effects that matter.
That leads directly to routing. If a command expresses a business decision, it must be evaluated by the domain boundary that understands its invariants. In domain-driven design terms, that means routing into the bounded context that owns the aggregate or process responsible for making the decision.
This is where many enterprises get tangled. They model routing by entry point:
- web channel routes one way
- partner API another
- back-office batch a third
- mobile app through a gateway
- internal jobs directly to the database
The result is fractured command semantics. Different channels trigger the same business outcome in different ways, often bypassing the same validations in different places. Audit trails splinter. Reconciliation becomes a standing cost rather than an exception. Teams quietly re-implement rules to keep the lights on.
Routing by domain cuts through that. It says: regardless of source, all commands that alter a business capability should arrive through a single semantic ownership boundary.
This does not mean one giant service. Quite the opposite. It means many boundaries, each with crisp responsibility.
Problem
The problem emerges when systems route commands according to infrastructure or data layout rather than domain ownership.
A common anti-pattern looks like this:
- Customer profile lives in CRM
- Credit status lives in finance
- Order placement starts in ecommerce
- Fulfillment status lives in warehouse
- Customer service can override nearly anything through a back-office tool
Now imagine a command such as ApproveOrder. Which service should handle it?
If you answer “the order service,” you may still be wrong. Approval might depend on credit exposure, regulatory checks, customer segment exceptions, fraud posture, and allocation constraints. If those rules are scattered, then “approve” is not owned anywhere. It is merely assembled.
And assembled decisions are dangerous decisions.
The symptoms are familiar:
- duplicate business validation in multiple services
- direct database updates that bypass domain rules
- command handlers coupled to several downstream systems synchronously
- different outcomes for the same command depending on source channel
- brittle compensations and constant reconciliation jobs
- Kafka topics becoming de facto integration contracts for operational writes
- teams arguing over “system of record” because no domain owner is clear
At that point, CQRS is often blamed. But CQRS did not create the confusion. It exposed it.
Forces
There are several competing forces shaping command routing in enterprise CQRS systems.
1. Domain integrity vs delivery speed
The business wants change now. Routing through the correct domain boundary can feel slower than letting whichever system has the screen perform the update. But shortcuts here are expensive. Every bypass creates another place where truth can drift.
2. Local autonomy vs end-to-end consistency
Microservices encourage local ownership. Good. But an enterprise capability like order approval or claims adjudication often crosses contexts. Routing must preserve domain autonomy while still ensuring that key decisions are made once, in the right place.
3. Transactional certainty vs asynchronous resilience
Synchronous routing gives immediate feedback. Asynchronous routing via Kafka scales and decouples, but introduces delayed consistency, retries, duplicates, and temporal ambiguity. Both are useful. Neither is free.
4. Legacy preservation vs modernization
A progressive strangler migration rarely starts with clean bounded contexts. Often the legacy system still owns the authoritative write path. New command routing has to coexist with old pathways without causing double processing or contradictory state.
5. Auditability vs operational simplicity
Enterprises need to answer hard questions: who issued this command, under what authority, against which version of state, and why did the outcome differ from expectations? Proper routing improves this, but only if command metadata, correlation IDs, and decision logs are treated as first-class concerns.
6. Data ownership vs decision ownership
This is subtle and often missed. The system that stores the data is not always the system that owns the decision. A customer balance may be stored in one place, but credit authorization belongs elsewhere. Routing by storage is a trap. Routing by domain semantics is the point.
Solution
The core solution is simple to say and demanding to implement:
Route commands to the bounded context that owns the business decision and its invariants.
That means a few concrete rules.
Commands are semantic, not CRUD-shaped
A command should express intent in domain language. Prefer ApproveInvoice, ReserveInventory, SuspendAccount, RegisterClaim over UpdateInvoiceStatus or SaveAccount.
CRUD-shaped commands usually indicate one of two problems: an anemic domain model, or a UI workflow leaking directly into the write model.
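To make the contrast concrete, here is a minimal sketch of the two shapes side by side (the class and field names are illustrative, not from any particular framework). The semantic command names the decision, carries the authority context the domain needs, and can assert a precondition before it is ever routed:

```python
from dataclasses import dataclass
from decimal import Decimal

# CRUD-shaped: says which table changes, not what decision is being made.
@dataclass(frozen=True)
class UpdateInvoiceStatus:
    invoice_id: str
    status: str  # any string -- no invariant is visible here

# Semantic: names the business decision and carries the context it needs.
@dataclass(frozen=True)
class ApproveInvoice:
    invoice_id: str
    approved_by: str         # authority matters to the decision
    approval_limit: Decimal  # precondition the domain can check

    def __post_init__(self):
        # An invariant the command itself can assert before routing.
        if self.approval_limit <= 0:
            raise ValueError("approval limit must be positive")
```

The point is not the dataclass; it is that the semantic version gives the router, the audit log, and the domain handler something meaningful to work with.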
Routing keys should reflect domain identity
Commands should be routed using identifiers that map to domain ownership: order ID, account ID, policy ID, claim ID, merchant ID. This supports both aggregate-level consistency and partitioning strategies in distributed infrastructure like Kafka, where all commands sharing a key land on the same partition and are therefore processed in order.
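A sketch of how a domain identifier doubles as a partition key. The command-type-to-key mapping and the hash are illustrative (Kafka's default partitioner uses murmur2, not SHA-256); what matters is that the same aggregate always lands on the same partition:

```python
import hashlib

PARTITIONS = 12  # assumed fixed partition count for the sketch

def routing_key(command_type: str, payload: dict) -> str:
    """Map a command to the domain identity that owns its consistency."""
    key_field = {
        "ApproveOrder": "order_id",
        "AdjustCreditLimit": "account_id",
        "RegisterClaim": "claim_id",
    }[command_type]
    return payload[key_field]

def partition_for(key: str, partitions: int = PARTITIONS) -> int:
    """Stable hash -> partition, so one aggregate's commands stay ordered."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % partitions
```

Because the key is the aggregate identity, two retries of the same `ApproveOrder` can never race each other across partitions.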
The write path enters through the domain owner
Every channel—web, mobile, partner API, batch, internal operations—should converge on the same command contract and route into the same domain boundary. Different channels may have different validation at the edge, but the core decision logic belongs in one place.
Cross-domain dependencies should be policy-driven, not hidden joins
If approving an order requires credit and fraud information, the approval domain should rely on published facts, policy services, or explicit orchestration—not ad hoc synchronous joins buried deep inside handlers, which create hidden coupling.
Events are outcomes, not backdoor commands
In Kafka-heavy architectures, teams often sneak commands through event topics. A topic named customer-updated becomes a de facto instruction for downstream systems to mutate state. That is sloppy. Commands ask for a decision. Events report that a decision or fact already happened. Mixing the two creates some of the nastiest failure modes in distributed systems.
Architecture
A practical architecture typically has these pieces:
- channel-specific ingress: API gateway, BFF, partner endpoint, batch adapter
- command normalization and authorization
- a command router that resolves target bounded context
- domain services / aggregates handling commands
- transactional persistence in the owning service
- domain events published after successful state change
- read models updated asynchronously
- reconciliation and observability around the edges
The basic routing shape: channel ingress normalizes and authorizes the command, a router resolves the owning bounded context, and that context's handler makes the decision and persists the result.
The command router itself does not contain business rules. It contains routing rules based on domain ownership. That distinction is crucial. Once the router starts deciding whether an order can be approved or whether a claim is suspicious, you have built a god component with better branding.
A robust router usually answers questions like:
- what bounded context owns this command type?
- what identity or tenant key determines partitioning?
- should the route be synchronous or asynchronous?
- what version or migration path applies for this command?
- what correlation metadata must travel with it?
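Those questions can usually be answered by data rather than code branches. A minimal sketch of a declarative routing table (context names and fields are hypothetical); note that the router resolves ownership and transport but never evaluates a business rule:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    context: str        # owning bounded context
    key_field: str      # domain identity used for partitioning
    asynchronous: bool  # transport decision, not a business decision

ROUTES = {
    "PlaceOrder":        Route("order-management", "order_id", asynchronous=False),
    "ApproveOrder":      Route("order-management", "order_id", asynchronous=True),
    "AdjustCreditLimit": Route("credit-risk", "account_id", asynchronous=True),
}

def route(command_type: str, payload: dict) -> dict:
    """Resolve the owner and routing key; reject commands nobody owns."""
    try:
        r = ROUTES[command_type]
    except KeyError:
        raise ValueError(f"no domain owner registered for {command_type}")
    return {
        "context": r.context,
        "key": payload[r.key_field],
        "asynchronous": r.asynchronous,
    }
```

Rejecting unowned command types loudly is deliberate: an unrouteable command is a missing ownership decision, not a transport problem.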
Synchronous and asynchronous routing
Not every command should travel the same way.
Use synchronous routing where the caller genuinely needs an immediate accept/reject decision and the owning domain can respond safely in-line. For example, PlaceOrder may need immediate validation and acceptance.
Use asynchronous routing where the command starts a long-running process, crosses domain boundaries, or needs buffering under load. For example, InitiatePolicyCancellation or RepricePortfolio.
Kafka is often relevant here, but it must be used with discipline. Kafka is excellent for durable transport, partitioned processing, and event distribution. It is not a substitute for domain design. A Kafka topic full of vaguely named messages does not make the architecture event-driven; it makes it harder to reason about.
A more complete picture adds transactional persistence, an outbox, and asynchronous event publication behind each domain handler.
Notice the outbox. In enterprise systems, if command handling writes to the database and publishes events separately, eventually you will lose one side of that bargain. Maybe not in test. In production, at 2 a.m., during a deployment, under partial network failure, certainly. The transactional outbox remains one of the least glamorous and most valuable patterns in this space.
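The pattern itself fits in a few lines. A sketch with SQLite standing in for the service database (table and column names are illustrative): the state change and the outbox row commit in one local transaction, and a separate relay drains the outbox toward the broker:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " topic TEXT, payload TEXT, published INTEGER DEFAULT 0)"
)

def approve_order(order_id: str) -> None:
    """State change and event record commit atomically, or not at all."""
    with db:  # one transaction around both writes
        db.execute("INSERT OR REPLACE INTO orders VALUES (?, 'APPROVED')", (order_id,))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order-events", json.dumps({"type": "OrderApproved", "order_id": order_id})),
        )

def relay_once(publish) -> int:
    """Drain unpublished outbox rows to the broker; mark them on success."""
    rows = db.execute(
        "SELECT seq, topic, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, topic, payload in rows:
        publish(topic, json.loads(payload))
        db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    db.commit()
    return len(rows)
```

If the process dies between the commit and the relay, the event is still sitting in the outbox; the relay delivers it on the next pass, which is why consumers must tolerate duplicates.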
Domain semantics and aggregate boundaries
Routing by domain only works if the command target has authority to enforce invariants. This often means commands resolve to an aggregate boundary or to a process manager in the owning context.
For example:
- ApproveOrder belongs to Order Management if order approval is its invariant and it consumes credit/fraud outcomes as facts.
- AdjustCreditLimit belongs to Credit Risk, even if customer service initiates it.
- IssueRefund may belong to Payments rather than Orders, because money movement invariants live there.
- SuspendCustomer may actually split into SuspendAccountAccess, BlockPayments, and FlagForReview, each in different bounded contexts.
That last point matters. Sometimes bad routing is a symptom of a bad command. If a command spans several domains’ decisions, it may be too broad.
Migration Strategy
This is where architecture earns its keep. Greenfield advice is cheap. Most enterprises are not greenfield. They are archaeological sites with APIs.
A sensible migration strategy uses a progressive strangler approach. You do not replace all command paths at once. You place a routing layer in front of existing write paths, then gradually redirect commands into new domain-owned services.
The migration usually goes through stages.
Stage 1: Observe and classify
Start by inventorying write operations, not services. What commands exist in practice? Which are real business decisions, and which are just disguised table updates? Who currently handles them? Which channels bypass the main path?
This step often uncovers unpleasant truths. A single business action may have six write paths. That is normal. It is also why migration fails when teams begin with service decomposition diagrams instead of command analysis.
Stage 2: Introduce canonical commands
Define a canonical command model aligned to domain language. Do not boil the ocean. Start with high-value flows: order placement, claim submission, payment authorization, policy endorsement.
Canonical here does not mean one enterprise-wide mega-schema. It means a clear command contract for a domain capability.
Stage 3: Put a router in front of legacy and new handlers
The router initially sends some commands to legacy handlers and some to new domain services. This allows migration without a channel-by-channel rewrite.
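This stage can be as simple as a per-command-type dial in the router (a sketch; the command names and percentages are hypothetical). Deterministic bucketing by aggregate ID keeps all of one aggregate's traffic on one side during rollout, so legacy and new handlers never interleave writes to the same entity:

```python
import hashlib

# Per command type: what share of traffic goes to the new domain service.
MIGRATION = {
    "SubmitClaim": 25,             # 25% of claims to the new handler
    "AdjudicateClaim": 0,          # still fully legacy
    "UpdateContactPreference": 100,  # fully migrated
}

def target(command_type: str, aggregate_id: str) -> str:
    """Deterministic per-aggregate bucket: the same claim always routes the same way."""
    pct = MIGRATION.get(command_type, 0)  # unknown commands stay on legacy
    bucket = int.from_bytes(hashlib.sha256(aggregate_id.encode()).digest()[:2], "big") % 100
    return "new" if bucket < pct else "legacy"
```

Turning the dial is then a configuration change, not a channel-by-channel rewrite, and rollback is turning it back.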
Stage 4: Publish events and reconcile
As new handlers take ownership, emit domain events and compare resulting state with legacy outcomes. Reconciliation is not optional in migration; it is how you discover semantic mismatches before they become financial losses.
Stage 5: Strangle legacy write paths
Gradually disable direct writes, old APIs, and batch jobs that mutate the same state outside the new routing model. This is usually the hardest political step, because many hidden dependencies emerge here.
Each stage narrows the legacy write surface: the router fronts everything first, shadow traffic and reconciliation build confidence, and only then do legacy paths close.
Reconciliation is the price of honesty
In migration, reconciliation is often treated as temporary plumbing. It deserves more respect than that.
When commands move from legacy to domain-owned services, subtle differences appear:
- legacy rounds money differently
- legacy allows transitions the new model rejects
- old batch jobs apply updates in a different order
- duplicate suppression rules differ
- timezone handling changes outcomes
- “optional” fields turn out not to be optional
A reconciliation process compares expected and actual state across systems, raises drift, and supports repair. In financial services, insurance, telecom, and logistics, this is not luxury architecture. It is survival.
You do not need to reconcile everything forever. But during strangler migration, you absolutely need enough reconciliation to prove that the new route preserves business truth.
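A reconciliation pass can start as a field-by-field comparison of the same business entity in both systems, with normalization for known representational differences like rounding and casing (a sketch; the field names are illustrative):

```python
from decimal import Decimal

def normalize(record: dict) -> dict:
    """Flatten known representational differences before comparing."""
    out = dict(record)
    out["amount"] = Decimal(str(record["amount"])).quantize(Decimal("0.01"))
    out["status"] = record["status"].strip().upper()
    return out

def drift(legacy: dict, modern: dict, fields=("amount", "status")) -> list:
    """Return the fields where the two systems disagree after normalization."""
    a, b = normalize(legacy), normalize(modern)
    return [f for f in fields if a[f] != b[f]]
```

The normalization list grows as mismatches are triaged: each entry is a documented, deliberate statement that a difference is representational rather than semantic. Anything left in the drift report is a real disagreement.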
Enterprise Example
Consider a global insurer modernizing claims processing.
The insurer has:
- a core policy administration platform
- a claims mainframe
- a CRM used by agents and call centers
- a fraud platform
- Kafka used as the enterprise event backbone
- several new microservices for digital channels
Historically, claim-related updates happen everywhere. The portal writes through a claims API. Call center tooling can alter status directly through CRM integration. Batch feeds from partners create and amend claims overnight. Fraud flags are written back into claims records by integration jobs.
The business asks for faster digital claims handling, better straight-through processing, and cleaner auditability.
At first glance, the team proposes a “Claim Service” microservice. Sensible. But the real work is not building a service. It is deciding what commands belong there.
Through domain analysis, they identify bounded contexts:
- Claim Intake: receives submissions, validates completeness
- Claim Adjudication: decides eligibility and settlement rules
- Fraud Assessment: scores and flags suspicious cases
- Policy Coverage: answers coverage facts
- Payments: issues settlement payments
- Customer Interaction: tracks communications and tasks
Now routing becomes clear.
- SubmitClaim routes to Claim Intake
- AdjudicateClaim routes to Claim Adjudication
- FlagClaimForInvestigation routes to Fraud Assessment
- AuthorizeSettlementPayment routes to Payments
- UpdateContactPreference routes to Customer Interaction
Crucially, ApproveClaim does not route to the portal, the CRM, or the payment service, even though all of them have screens and data involved. It routes to Claim Adjudication, because that context owns the decision.
Kafka is used for events:
- ClaimSubmitted
- CoverageConfirmed
- FraudFlagRaised
- ClaimApproved
- SettlementAuthorized
But commands are not broadcast as generic events. They are routed explicitly to owning services.
During migration, the router initially forwards SubmitClaim to both the legacy claims stack and the new Claim Intake service in shadow mode. Outcomes are compared. Reconciliation finds several mismatches:
- legacy accepts claim dates in local office timezone; new service normalizes to UTC and changes date boundaries
- some partner submissions omit a field that the new service treated as mandatory
- duplicate claim detection in legacy is fuzzy and household-based; the new model used policy-and-loss-date only
Without reconciliation, these would have become production defects with legal and financial impact.
Over time, command ownership moves:
- digital portal claims -> new Claim Intake
- partner batch claims -> new Intake adapters
- call center manual updates -> CRM now issues commands instead of direct writes
- approval and payment authorization progressively shift to adjudication and payments domains
- direct mainframe updates are retired
The result is not just cleaner microservices. The insurer gains:
- consistent decision ownership
- traceable command audit from channel to outcome
- reduced duplicate rules
- safer rollout of fraud policy changes
- better resilience because adjudication and payment can scale independently
- simpler reasoning about who may change what
That is what routing by domain buys you when done well.
Operational Considerations
This pattern lives or dies in operations.
Idempotency
Commands will be retried. Network failures, client retries, consumer restarts, and Kafka rebalances guarantee it. Every command handler needs an idempotency strategy keyed by command ID or business identity plus version.
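A minimal idempotent handler keyed by command ID (a sketch with an in-memory store; in production the processed-set lives in the same transactional store as the state change, so the check and the write commit together):

```python
processed: set[str] = set()
balances: dict[str, int] = {"ACC-1": 100}

def handle_debit(command_id: str, account_id: str, amount: int) -> str:
    """Retries with the same command_id are acknowledged, not re-applied."""
    if command_id in processed:
        return "duplicate-ignored"
    balances[account_id] -= amount
    processed.add(command_id)
    return "applied"
```

The caller gets the same outward answer either way; the invariant is that the debit happens exactly once no matter how many times the command arrives.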
Ordering
Some commands require strict order per aggregate or account. Kafka partitions can help if the routing key maps to the aggregate identity. But ordering across aggregates or across topics is a fantasy many teams accidentally depend on. Don’t.
Backpressure
If one domain becomes a hotspot, routing must support buffering or throttling without taking down unrelated capabilities. Domain-based partitioning helps isolate blast radius.
Observability
Track command acceptance, rejection, processing latency, retries, dead letters, emitted events, and reconciliation drift. Correlation IDs must flow from ingress through command handling and event publication.
Authorization
Authentication belongs at the edge; authorization often belongs in the domain. The command should carry actor and authority context, but the bounded context decides whether that actor may perform the business action.
Versioning
Commands evolve. During migration, you may route different versions to different handlers. Be explicit. Hidden version drift is one of those problems that only becomes visible after a quarter-end failure.
Dead-letter and repair
In asynchronous routing, dead-letter queues are not a garbage bin. They are an operational workbench. You need triage, replay, duplicate protection, and a clear policy for commands that can no longer be safely re-run.
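Triage works best as an explicit policy per failure class rather than ad hoc judgment calls at 2 a.m. A sketch of such a policy (the failure classifications are hypothetical):

```python
def triage(dead_letter: dict) -> str:
    """Decide what happens to a dead-lettered command, by failure class."""
    error = dead_letter["error"]
    attempts = dead_letter["attempts"]
    if error == "duplicate":
        return "discard"        # idempotency already protected state
    if error == "transient" and attempts < 5:
        return "replay"         # safe to re-run through the normal route
    if error == "validation":
        return "manual-review"  # business input was wrong; a human decides
    return "quarantine"         # unknown failure: never auto-replay
```

The default branch matters most: a command whose failure you cannot classify should never be replayed automatically, because replaying it may repeat a half-applied business decision.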
Tradeoffs
There is no magic here. Routing by domain solves a class of problems by accepting other costs.
Benefits
- clear ownership of business decisions
- stronger domain invariants
- channel-independent write behavior
- better auditability and traceability
- easier strangler migration path
- less duplicate business logic
- improved scalability through domain partitioning
- cleaner event publication from authoritative sources
Costs
- more upfront domain modeling
- tension with existing org structures
- possible latency from indirect routing
- higher complexity in integration and observability
- need for idempotency, outbox, and reconciliation mechanisms
- harder local optimizations by channel teams
- more care required around cross-domain workflows
The right question is not whether this pattern is “worth it” in abstract. It is whether your business suffers enough from ambiguous write ownership to justify the investment. In large enterprises, the answer is often yes.
Failure Modes
There are several classic ways this goes wrong.
1. The router becomes a god service
If the router starts owning validation, orchestration, enrichment, transformation, and policy decisions, it stops being a router. It becomes a monolith in a nicer suit.
2. Bounded contexts are named, not real
Teams create services called Order, Customer, or Billing, but commands still require hidden joins and downstream sync calls for basic decisions. That means the domain boundaries are cosmetic.
3. Events are used as commands
Publishing “RequestedCustomerAddressUpdate” on a public topic and hoping the right service treats it like a command is weak design. Commands need explicit ownership and handling guarantees.
4. Legacy backdoors remain
The new router is introduced, but old batch jobs and admin tools still write directly to the database. Eventually the “official” path and the real path diverge.
5. Reconciliation is skipped
Teams trust tests and dashboards, then discover months later that edge cases drifted state between legacy and new systems. By then the repair cost is ugly.
6. Aggregate boundaries are too coarse or too fine
Too coarse, and every command contends on a giant hotspot aggregate. Too fine, and invariants leak into distributed workflows that are hard to reason about.
7. Kafka is treated as architecture deodorant
When domain confusion smells bad, some organizations spray Kafka on it. The smell remains. It just becomes asynchronous.
When Not To Use
This pattern is powerful, but it is not mandatory everywhere.
Do not use command routing by domain if:
- your application is small, single-team, and has trivial business rules
- CRUD is genuinely sufficient and decision logic is minimal
- the cost of eventual consistency and operational overhead outweighs the business value
- your biggest problem is reporting, not write-path integrity
- your domain boundaries are too immature to support stable routing
- you lack the discipline to retire direct write backdoors
Also, do not force this pattern onto every internal admin workflow. Some enterprise back-office functions are operational data maintenance rather than domain decisions. Be honest about the difference. Not every field edit deserves a command bus and Kafka topic.
If you cannot identify meaningful domain semantics, stop. Naming database updates as commands does not make the design better. It just makes the PowerPoint look more modern.
Related Patterns
Several adjacent patterns pair well with domain-based command routing.
Aggregates
Aggregates define the consistency boundary for handling commands. They are the place where invariants live.
Saga / Process Manager
Useful when one business process coordinates several domain-owned commands over time. Important caveat: a saga coordinates; it should not steal decision ownership from bounded contexts.
Transactional Outbox
Essential where state changes and event publication must remain consistent without distributed transactions.
Anti-Corruption Layer
Vital in strangler migrations. It translates between legacy semantics and the new domain model so the new service does not inherit old conceptual debt wholesale.
Event Sourcing
Sometimes useful, especially where decision history matters deeply. But command routing by domain does not require event sourcing. Do not couple the two by reflex.
Reconciliation Engine
Often neglected in pattern catalogs, but central in enterprise modernization. It compares outcomes across systems, detects drift, and supports repair during migration and after partial failures.
Summary
Command routing by domain in CQRS architecture is not about moving messages to handlers. It is about putting business decisions in the only place they can be made safely: the domain boundary that owns them.
That one move has far-reaching consequences. It clarifies ownership. It reduces duplicate rules. It makes Kafka useful instead of decorative. It gives microservices sharper edges. It enables progressive strangler migration without pretending the legacy estate will vanish overnight. And it forces an honest conversation about domain semantics—who is allowed to decide what, and based on which invariants.
It also demands discipline. You need canonical commands, explicit ownership, outbox-based event publication, idempotent handlers, and a reconciliation strategy during migration. You need to shut down backdoors. You need to resist the temptation to let routers become brains. You need bounded contexts that are real, not merely labeled.
The memorable version is this: route commands to where the business meaning lives, not where the data happens to sit.
In enterprise architecture, that is the difference between a system that changes state and a system that makes decisions. CQRS is at its best when it helps you tell those apart.
Frequently Asked Questions
What is CQRS?
Command Query Responsibility Segregation separates read and write models. Commands mutate state; queries read from a separate optimized read model. This enables independent scaling of reads and writes and allows different consistency models for each side.
What is the Saga pattern?
A Saga manages long-running transactions across multiple services without distributed ACID transactions. Each step publishes an event; if a step fails, compensating transactions roll back previous steps. Choreography-based sagas use events; orchestration-based sagas use a central coordinator.
What is the outbox pattern?
The transactional outbox pattern solves dual-write problems — ensuring a database update and a message publication happen atomically. The service writes both to its database and an outbox table in one transaction; a relay process reads the outbox and publishes to the message broker.