Most command routing discussions start in the wrong place. They start with transports, topics, brokers, frameworks, handlers, and a parade of boxes connected by arrows. That’s backwards.
Command routing is not, at heart, a messaging problem. It’s a decision problem. More specifically, it’s the question of which part of the business is allowed to make a change.
That distinction matters. A system can have Kafka, REST, gRPC, event streams, sagas, an API gateway, and still be architecturally confused if it routes commands according to technical convenience instead of domain meaning. In those systems, “UpdateCustomer,” “ApproveOrder,” and “ApplyCreditLimit” all bounce around like luggage with the tags torn off. Eventually they arrive somewhere. Nobody feels safe about how.
In a CQRS architecture, command routing by domain is the discipline of sending every write request to the domain boundary that owns the decision. Not the service that happens to have the data. Not the team that built the first endpoint. Not the application that receives the HTTP call. The owner of the business rule gets the command.
That sounds obvious. In practice, it is one of the hardest habits for enterprises to build, because most estates have grown around databases and channels, not bounded contexts. There are ERP platforms with decades of sediment. CRM systems that became accidental masters. Kafka topics that mirror tables. Microservices named after nouns but coupled by shared assumptions. Somewhere in the middle, CQRS gets introduced as a modernization strategy, and people discover a hard truth: if the domains are muddled, command routing just amplifies the muddle at scale.
Done well, though, routing by domain becomes one of the sharpest tools in enterprise architecture. It gives you clearer ownership, safer migrations, better auditability, cleaner microservice boundaries, and far less accidental duplication of business rules. It also gives you a practical way to strangler modernize large systems without pretending you can redraw the whole enterprise in a quarter.
This article looks at command routing by domain in CQRS architecture from the angle that matters in the real world: domain semantics, migration reasoning, operational reality, and where it falls apart if you use it carelessly.
Context
CQRS is often explained as “separate reads from writes.” True, but too shallow to be useful. The more important move is this: treat state changes as explicit business decisions, not as generic CRUD operations.
A command is not just a write. It is an intent with meaning. “PlaceOrder.” “SuspendPolicy.” “ApproveClaim.” “IssueRefund.” “RegisterDevice.” The shape of a good command tells you something about the language of the business. It carries purpose, preconditions, authorization expectations, and side effects that matter.
That leads directly to routing. If a command expresses a business decision, it must be evaluated by the domain boundary that understands its invariants. In domain-driven design terms, that means routing into the bounded context that owns the aggregate or process responsible for making the decision.
This is where many enterprises get tangled. They model routing by entry point:
- web channel routes one way
- partner API another
- back-office batch a third
- mobile app through a gateway
- internal jobs directly to the database
The result is fractured command semantics. Different channels trigger the same business outcome in different ways, often bypassing the same validations in different places. Audit trails splinter. Reconciliation becomes a standing cost rather than an exception. Teams quietly re-implement rules to keep the lights on.
Routing by domain cuts through that. It says: regardless of source, all commands that alter a business capability should arrive through a single semantic ownership boundary.
This does not mean one giant service. Quite the opposite. It means many boundaries, each with crisp responsibility.
Problem
The problem emerges when systems route commands according to infrastructure or data layout rather than domain ownership.
A common anti-pattern looks like this:
- Customer profile lives in CRM
- Credit status lives in finance
- Order placement starts in ecommerce
- Fulfillment status lives in warehouse
- Customer service can override nearly anything through a back-office tool
Now imagine a command such as ApproveOrder. Which service should handle it?
If you answer “the order service,” you may still be wrong. Approval might depend on credit exposure, regulatory checks, customer segment exceptions, fraud posture, and allocation constraints. If those rules are scattered, then “approve” is not owned anywhere. It is merely assembled.
And assembled decisions are dangerous decisions.
The symptoms are familiar:
- duplicate business validation in multiple services
- direct database updates that bypass domain rules
- command handlers coupled to several downstream systems synchronously
- different outcomes for the same command depending on source channel
- brittle compensations and constant reconciliation jobs
- Kafka topics becoming de facto integration contracts for operational writes
- teams arguing over “system of record” because no domain owner is clear
At that point, CQRS is often blamed. But CQRS did not create the confusion. It exposed it.
Forces
There are several competing forces shaping command routing in enterprise CQRS systems.
1. Domain integrity vs delivery speed
The business wants change now. Routing through the correct domain boundary can feel slower than letting whichever system has the screen perform the update. But shortcuts here are expensive. Every bypass creates another place where truth can drift.
2. Local autonomy vs end-to-end consistency
Microservices encourage local ownership. Good. But an enterprise capability like order approval or claims adjudication often crosses contexts. Routing must preserve domain autonomy while still ensuring that key decisions are made once, in the right place.
3. Transactional certainty vs asynchronous resilience
Synchronous routing gives immediate feedback. Asynchronous routing via Kafka scales and decouples, but introduces delayed consistency, retries, duplicates, and temporal ambiguity. Both are useful. Neither is free.
4. Legacy preservation vs modernization
A progressive strangler migration rarely starts with clean bounded contexts. Often the legacy system still owns the authoritative write path. New command routing has to coexist with old pathways without causing double processing or contradictory state.
5. Auditability vs operational simplicity
Enterprises need to answer hard questions: who issued this command, under what authority, against which version of state, and why did the outcome differ from expectations? Proper routing improves this, but only if command metadata, correlation IDs, and decision logs are treated as first-class concerns.
6. Data ownership vs decision ownership
This is subtle and often missed. The system that stores the data is not always the system that owns the decision. A customer balance may be stored in one place, but credit authorization belongs elsewhere. Routing by storage is a trap. Routing by domain semantics is the point.
Solution
The core solution is simple to say and demanding to implement:
Route commands to the bounded context that owns the business decision and its invariants.
That means a few concrete rules.
Commands are semantic, not CRUD-shaped
A command should express intent in domain language. Prefer ApproveInvoice, ReserveInventory, SuspendAccount, RegisterClaim over UpdateInvoiceStatus or SaveAccount.
CRUD-shaped commands usually indicate one of two problems: an anemic domain model, or a UI workflow leaking directly into the write model.
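To make the contrast concrete, here is a minimal sketch of the two shapes side by side (the class and field names are illustrative, not from any particular framework). The semantic command names the decision, carries the authority context the domain needs, and can assert a precondition before it is ever routed:

```python
from dataclasses import dataclass
from decimal import Decimal

# CRUD-shaped: says which table changes, not what decision is being made.
@dataclass(frozen=True)
class UpdateInvoiceStatus:
    invoice_id: str
    status: str  # any string -- no invariant is visible here

# Semantic: names the business decision and carries the context it needs.
@dataclass(frozen=True)
class ApproveInvoice:
    invoice_id: str
    approved_by: str         # authority matters to the decision
    approval_limit: Decimal  # precondition the domain can check

    def __post_init__(self):
        # An invariant the command itself can assert before routing.
        if self.approval_limit <= 0:
            raise ValueError("approval limit must be positive")
```

The point is not the dataclass; it is that the semantic version gives the router, the audit log, and the domain handler something meaningful to work with.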
Routing keys should reflect domain identity
Commands should be routed using identifiers that map to domain ownership: order ID, account ID, policy ID, claim ID, merchant ID. This supports both aggregate-level consistency and partitioning strategies in distributed infrastructure like Kafka, where all commands sharing a key land on the same partition and are therefore processed in order.
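A sketch of how a domain identifier doubles as a partition key. The command-type-to-key mapping and the hash are illustrative (Kafka's default partitioner uses murmur2, not SHA-256); what matters is that the same aggregate always lands on the same partition:

```python
import hashlib

PARTITIONS = 12  # assumed fixed partition count for the sketch

def routing_key(command_type: str, payload: dict) -> str:
    """Map a command to the domain identity that owns its consistency."""
    key_field = {
        "ApproveOrder": "order_id",
        "AdjustCreditLimit": "account_id",
        "RegisterClaim": "claim_id",
    }[command_type]
    return payload[key_field]

def partition_for(key: str, partitions: int = PARTITIONS) -> int:
    """Stable hash -> partition, so one aggregate's commands stay ordered."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % partitions
```

Because the key is the aggregate identity, two retries of the same `ApproveOrder` can never race each other across partitions.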
The write path enters through the domain owner
Every channel—web, mobile, partner API, batch, internal operations—should converge on the same command contract and route into the same domain boundary. Different channels may have different validation at the edge, but the core decision logic belongs in one place.
Cross-domain dependencies should be policy-driven, not hidden joins
If approving an order requires credit and fraud information, the approval domain should rely on published facts, policy services, or explicit orchestration—not ad hoc synchronous joins buried deep inside handlers, which create hidden coupling.
Events are outcomes, not backdoor commands
In Kafka-heavy architectures, teams often sneak commands through event topics. A topic named customer-updated becomes a de facto instruction for downstream systems to mutate state. That is sloppy. Commands ask for a decision. Events report that a decision or fact already happened. Mixing the two creates some of the nastiest failure modes in distributed systems.
Architecture
A practical architecture typically has these pieces:
- channel-specific ingress: API gateway, BFF, partner endpoint, batch adapter
- command normalization and authorization
- a command router that resolves target bounded context
- domain services / aggregates handling commands
- transactional persistence in the owning service
- domain events published after successful state change
- read models updated asynchronously
- reconciliation and observability around the edges
The basic routing shape: channel ingress normalizes and authorizes the command, a router resolves the owning bounded context, and that context's handler makes the decision and persists the result.
The command router itself does not contain business rules. It contains routing rules based on domain ownership. That distinction is crucial. Once the router starts deciding whether an order can be approved or whether a claim is suspicious, you have built a god component with better branding.
A robust router usually answers questions like:
- what bounded context owns this command type?
- what identity or tenant key determines partitioning?
- should the route be synchronous or asynchronous?
- what version or migration path applies for this command?
- what correlation metadata must travel with it?
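Those questions can usually be answered by data rather than code branches. A minimal sketch of a declarative routing table (context names and fields are hypothetical); note that the router resolves ownership and transport but never evaluates a business rule:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    context: str        # owning bounded context
    key_field: str      # domain identity used for partitioning
    asynchronous: bool  # transport decision, not a business decision

ROUTES = {
    "PlaceOrder":        Route("order-management", "order_id", asynchronous=False),
    "ApproveOrder":      Route("order-management", "order_id", asynchronous=True),
    "AdjustCreditLimit": Route("credit-risk", "account_id", asynchronous=True),
}

def route(command_type: str, payload: dict) -> dict:
    """Resolve the owner and routing key; reject commands nobody owns."""
    try:
        r = ROUTES[command_type]
    except KeyError:
        raise ValueError(f"no domain owner registered for {command_type}")
    return {
        "context": r.context,
        "key": payload[r.key_field],
        "asynchronous": r.asynchronous,
    }
```

Rejecting unowned command types loudly is deliberate: an unrouteable command is a missing ownership decision, not a transport problem.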
Synchronous and asynchronous routing
Not every command should travel the same way.
Use synchronous routing where the caller genuinely needs an immediate accept/reject decision and the owning domain can respond safely in-line. For example, PlaceOrder may need immediate validation and acceptance.
Use asynchronous routing where the command starts a long-running process, crosses domain boundaries, or needs buffering under load. For example, InitiatePolicyCancellation or RepricePortfolio.
Kafka is often relevant here, but it must be used with discipline. Kafka is excellent for durable transport, partitioned processing, and event distribution. It is not a substitute for domain design. A Kafka topic full of vaguely named messages does not make the architecture event-driven; it makes it harder to reason about.
A more complete picture adds transactional persistence, an outbox, and asynchronous event publication behind each domain handler.
Notice the outbox. In enterprise systems, if command handling writes to the database and publishes events separately, eventually you will lose one side of that bargain. Maybe not in test. In production, at 2 a.m., during a deployment, under partial network failure, certainly. The transactional outbox remains one of the least glamorous and most valuable patterns in this space.
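The pattern itself fits in a few lines. A sketch with SQLite standing in for the service database (table and column names are illustrative): the state change and the outbox row commit in one local transaction, and a separate relay drains the outbox toward the broker:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " topic TEXT, payload TEXT, published INTEGER DEFAULT 0)"
)

def approve_order(order_id: str) -> None:
    """State change and event record commit atomically, or not at all."""
    with db:  # one transaction around both writes
        db.execute("INSERT OR REPLACE INTO orders VALUES (?, 'APPROVED')", (order_id,))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order-events", json.dumps({"type": "OrderApproved", "order_id": order_id})),
        )

def relay_once(publish) -> int:
    """Drain unpublished outbox rows to the broker; mark them on success."""
    rows = db.execute(
        "SELECT seq, topic, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, topic, payload in rows:
        publish(topic, json.loads(payload))
        db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    db.commit()
    return len(rows)
```

If the process dies between the commit and the relay, the event is still sitting in the outbox; the relay delivers it on the next pass, which is why consumers must tolerate duplicates.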
Domain semantics and aggregate boundaries
Routing by domain only works if the command target has authority to enforce invariants. This often means commands resolve to an aggregate boundary or to a process manager in the owning context.
For example:
- ApproveOrder belongs to Order Management if order approval is its invariant and it consumes credit/fraud outcomes as facts.
- AdjustCreditLimit belongs to Credit Risk, even if customer service initiates it.
- IssueRefund may belong to Payments rather than Orders, because money movement invariants live there.
- SuspendCustomer may actually split into SuspendAccountAccess, BlockPayments, and FlagForReview, each in different bounded contexts.
That last point matters. Sometimes bad routing is a symptom of a bad command. If a command spans several domains’ decisions, it may be too broad.
Migration Strategy
This is where architecture earns its keep. Greenfield advice is cheap. Most enterprises are not greenfield. They are archaeological sites with APIs.
A sensible migration strategy uses a progressive strangler approach. You do not replace all command paths at once. You place a routing layer in front of existing write paths, then gradually redirect commands into new domain-owned services.
The migration usually goes through stages.
Stage 1: Observe and classify
Start by inventorying write operations, not services. What commands exist in practice? Which are real business decisions, and which are just disguised table updates? Who currently handles them? Which channels bypass the main path?
This step often uncovers unpleasant truths. A single business action may have six write paths. That is normal. It is also why migration fails when teams begin with service decomposition diagrams instead of command analysis.
Stage 2: Introduce canonical commands
Define a canonical command model aligned to domain language. Do not boil the ocean. Start with high-value flows: order placement, claim submission, payment authorization, policy endorsement.
Canonical here does not mean one enterprise-wide mega-schema. It means a clear command contract for a domain capability.
Stage 3: Put a router in front of legacy and new handlers
The router initially sends some commands to legacy handlers and some to new domain services. This allows migration without a channel-by-channel rewrite.
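This stage can be as simple as a per-command-type dial in the router (a sketch; the command names and percentages are hypothetical). Deterministic bucketing by aggregate ID keeps all of one aggregate's traffic on one side during rollout, so legacy and new handlers never interleave writes to the same entity:

```python
import hashlib

# Per command type: what share of traffic goes to the new domain service.
MIGRATION = {
    "SubmitClaim": 25,             # 25% of claims to the new handler
    "AdjudicateClaim": 0,          # still fully legacy
    "UpdateContactPreference": 100,  # fully migrated
}

def target(command_type: str, aggregate_id: str) -> str:
    """Deterministic per-aggregate bucket: the same claim always routes the same way."""
    pct = MIGRATION.get(command_type, 0)  # unknown commands stay on legacy
    bucket = int.from_bytes(hashlib.sha256(aggregate_id.encode()).digest()[:2], "big") % 100
    return "new" if bucket < pct else "legacy"
```

Turning the dial is then a configuration change, not a channel-by-channel rewrite, and rollback is turning it back.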
Stage 4: Publish events and reconcile
As new handlers take ownership, emit domain events and compare resulting state with legacy outcomes. Reconciliation is not optional in migration; it is how you discover semantic mismatches before they become financial losses.
Stage 5: Strangle legacy write paths
Gradually disable direct writes, old APIs, and batch jobs that mutate the same state outside the new routing model. This is usually the hardest political step, because many hidden dependencies emerge here.
Each stage narrows the legacy write surface: the router fronts everything first, shadow traffic and reconciliation build confidence, and only then do legacy paths close.
Reconciliation is the price of honesty
In migration, reconciliation is often treated as temporary plumbing. It deserves more respect than that.
When commands move from legacy to domain-owned services, subtle differences appear:
- legacy rounds money differently
- legacy allows transitions the new model rejects
- old batch jobs apply updates in a different order
- duplicate suppression rules differ
- timezone handling changes outcomes
- “optional” fields turn out not to be optional
A reconciliation process compares expected and actual state across systems, raises drift, and supports repair. In financial services, insurance, telecom, and logistics, this is not luxury architecture. It is survival.
You do not need to reconcile everything forever. But during strangler migration, you absolutely need enough reconciliation to prove that the new route preserves business truth.
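A reconciliation pass can start as a field-by-field comparison of the same business entity in both systems, with normalization for known representational differences like rounding and casing (a sketch; the field names are illustrative):

```python
from decimal import Decimal

def normalize(record: dict) -> dict:
    """Flatten known representational differences before comparing."""
    out = dict(record)
    out["amount"] = Decimal(str(record["amount"])).quantize(Decimal("0.01"))
    out["status"] = record["status"].strip().upper()
    return out

def drift(legacy: dict, modern: dict, fields=("amount", "status")) -> list:
    """Return the fields where the two systems disagree after normalization."""
    a, b = normalize(legacy), normalize(modern)
    return [f for f in fields if a[f] != b[f]]
```

The normalization list grows as mismatches are triaged: each entry is a documented, deliberate statement that a difference is representational rather than semantic. Anything left in the drift report is a real disagreement.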
Enterprise Example
Consider a global insurer modernizing claims processing.
The insurer has:
- a core policy administration platform
- a claims mainframe
- a CRM used by agents and call centers
- a fraud platform
- Kafka used as the enterprise event backbone
- several new microservices for digital channels
Historically, claim-related updates happen everywhere. The portal writes through a claims API. Call center tooling can alter status directly through CRM integration. Batch feeds from partners create and amend claims overnight. Fraud flags are written back into claims records by integration jobs.
The business asks for faster digital claims handling, better straight-through processing, and cleaner auditability.
At first glance, the team proposes a “Claim Service” microservice. Sensible. But the real work is not building a service. It is deciding what commands belong there.
Through domain analysis, they identify bounded contexts:
- Claim Intake: receives submissions, validates completeness
- Claim Adjudication: decides eligibility and settlement rules
- Fraud Assessment: scores and flags suspicious cases
- Policy Coverage: answers coverage facts
- Payments: issues settlement payments
- Customer Interaction: tracks communications and tasks
Now routing becomes clear.
- SubmitClaim routes to Claim Intake
- AdjudicateClaim routes to Claim Adjudication
- FlagClaimForInvestigation routes to Fraud Assessment
- AuthorizeSettlementPayment routes to Payments
- UpdateContactPreference routes to Customer Interaction
Crucially, ApproveClaim does not route to the portal, the CRM, or the payment service, even though all of them have screens and data involved. It routes to Claim Adjudication, because that context owns the decision.
Kafka is used for events:
- ClaimSubmitted
- CoverageConfirmed
- FraudFlagRaised
- ClaimApproved
- SettlementAuthorized
But commands are not broadcast as generic events. They are routed explicitly to owning services.
During migration, the router initially forwards SubmitClaim to both the legacy claims stack and the new Claim Intake service in shadow mode. Outcomes are compared. Reconciliation finds several mismatches:
- legacy accepts claim dates in local office timezone; new service normalizes to UTC and changes date boundaries
- some partner submissions omit a field that the new service treated as mandatory
- duplicate claim detection in legacy is fuzzy and household-based; the new model used policy-and-loss-date only
Without reconciliation, these would have become production defects with legal and financial impact.
Over time, command ownership moves:
- digital portal claims -> new Claim Intake
- partner batch claims -> new Intake adapters
- call center manual updates -> CRM now issues commands instead of direct writes
- approval and payment authorization progressively shift to adjudication and payments domains
- direct mainframe updates are retired
The result is not just cleaner microservices. The insurer gains:
- consistent decision ownership
- traceable command audit from channel to outcome
- reduced duplicate rules
- safer rollout of fraud policy changes
- better resilience because adjudication and payment can scale independently
- simpler reasoning about who may change what
That is what routing by domain buys you when done well.
Operational Considerations
This pattern lives or dies in operations.
Idempotency
Commands will be retried. Network failures, client retries, consumer restarts, and Kafka rebalances guarantee it. Every command handler needs an idempotency strategy keyed by command ID or business identity plus version.
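A minimal idempotent handler keyed by command ID (a sketch with an in-memory store; in production the processed-set lives in the same transactional store as the state change, so the check and the write commit together):

```python
processed: set[str] = set()
balances: dict[str, int] = {"ACC-1": 100}

def handle_debit(command_id: str, account_id: str, amount: int) -> str:
    """Retries with the same command_id are acknowledged, not re-applied."""
    if command_id in processed:
        return "duplicate-ignored"
    balances[account_id] -= amount
    processed.add(command_id)
    return "applied"
```

The caller gets the same outward answer either way; the invariant is that the debit happens exactly once no matter how many times the command arrives.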
Ordering
Some commands require strict order per aggregate or account. Kafka partitions can help if the routing key maps to the aggregate identity. But ordering across aggregates or across topics is a fantasy many teams accidentally depend on. Don’t.
Backpressure
If one domain becomes a hotspot, routing must support buffering or throttling without taking down unrelated capabilities. Domain-based partitioning helps isolate blast radius.
Observability
Track command acceptance, rejection, processing latency, retries, dead letters, emitted events, and reconciliation drift. Correlation IDs must flow from ingress through command handling and event publication.
Authorization
Authentication belongs at the edge; authorization often belongs in the domain. The command should carry actor and authority context, but the bounded context decides whether that actor may perform the business action.
Versioning
Commands evolve. During migration, you may route different versions to different handlers. Be explicit. Hidden version drift is one of those problems that only becomes visible after a quarter-end failure.
Dead-letter and repair
In asynchronous routing, dead-letter queues are not a garbage bin. They are an operational workbench. You need triage, replay, duplicate protection, and a clear policy for commands that can no longer be safely re-run.
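Triage works best as an explicit policy per failure class rather than ad hoc judgment calls at 2 a.m. A sketch of such a policy (the failure classifications are hypothetical):

```python
def triage(dead_letter: dict) -> str:
    """Decide what happens to a dead-lettered command, by failure class."""
    error = dead_letter["error"]
    attempts = dead_letter["attempts"]
    if error == "duplicate":
        return "discard"        # idempotency already protected state
    if error == "transient" and attempts < 5:
        return "replay"         # safe to re-run through the normal route
    if error == "validation":
        return "manual-review"  # business input was wrong; a human decides
    return "quarantine"         # unknown failure: never auto-replay
```

The default branch matters most: a command whose failure you cannot classify should never be replayed automatically, because replaying it may repeat a half-applied business decision.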
Tradeoffs
There is no magic here. Routing by domain solves a class of problems by accepting other costs.
Benefits
- clear ownership of business decisions
- stronger domain invariants
- channel-independent write behavior
- better auditability and traceability
- easier strangler migration path
- less duplicate business logic
- improved scalability through domain partitioning
- cleaner event publication from authoritative sources
Costs
- more upfront domain modeling
- tension with existing org structures
- possible latency from indirect routing
- higher complexity in integration and observability
- need for idempotency, outbox, and reconciliation mechanisms
- harder local optimizations by channel teams
- more care required around cross-domain workflows
The right question is not whether this pattern is “worth it” in abstract. It is whether your business suffers enough from ambiguous write ownership to justify the investment. In large enterprises, the answer is often yes.
Failure Modes
There are several classic ways this goes wrong.
1. The router becomes a god service
If the router starts owning validation, orchestration, enrichment, transformation, and policy decisions, it stops being a router. It becomes a monolith in a nicer suit.
2. Bounded contexts are named, not real
Teams create services called Order, Customer, or Billing, but commands still require hidden joins and downstream sync calls for basic decisions. That means the domain boundaries are cosmetic.
3. Events are used as commands
Publishing “RequestedCustomerAddressUpdate” on a public topic and hoping the right service treats it like a command is weak design. Commands need explicit ownership and handling guarantees.
4. Legacy backdoors remain
The new router is introduced, but old batch jobs and admin tools still write directly to the database. Eventually the “official” path and the real path diverge.
5. Reconciliation is skipped
Teams trust tests and dashboards, then discover months later that edge cases drifted state between legacy and new systems. By then the repair cost is ugly.
6. Aggregate boundaries are too coarse or too fine
Too coarse, and every command contends on a giant hotspot aggregate. Too fine, and invariants leak into distributed workflows that are hard to reason about.
7. Kafka is treated as architecture deodorant
When domain confusion smells bad, some organizations spray Kafka on it. The smell remains. It just becomes asynchronous.
When Not To Use
This pattern is powerful, but it is not mandatory everywhere.
Do not use command routing by domain if:
- your application is small, single-team, and has trivial business rules
- CRUD is genuinely sufficient and decision logic is minimal
- the cost of eventual consistency and operational overhead outweighs the business value
- your biggest problem is reporting, not write-path integrity
- your domain boundaries are too immature to support stable routing
- you lack the discipline to retire direct write backdoors
Also, do not force this pattern onto every internal admin workflow. Some enterprise back-office functions are operational data maintenance rather than domain decisions. Be honest about the difference. Not every field edit deserves a command bus and Kafka topic.
If you cannot identify meaningful domain semantics, stop. Naming database updates as commands does not make the design better. It just makes the PowerPoint look more modern.
Related Patterns
Several adjacent patterns pair well with domain-based command routing.
Aggregates
Aggregates define the consistency boundary for handling commands. They are the place where invariants live.
Saga / Process Manager
Useful when one business process coordinates several domain-owned commands over time. Important caveat: a saga coordinates; it should not steal decision ownership from bounded contexts.
Transactional Outbox
Essential where state changes and event publication must remain consistent without distributed transactions.
Anti-Corruption Layer
Vital in strangler migrations. It translates between legacy semantics and the new domain model so the new service does not inherit old conceptual debt wholesale.
Event Sourcing
Sometimes useful, especially where decision history matters deeply. But command routing by domain does not require event sourcing. Do not couple the two by reflex.
Reconciliation Engine
Often neglected in pattern catalogs, but central in enterprise modernization. It compares outcomes across systems, detects drift, and supports repair during migration and after partial failures.
Summary
Command routing by domain in CQRS architecture is not about moving messages to handlers. It is about putting business decisions in the only place they can be made safely: the domain boundary that owns them.
That one move has far-reaching consequences. It clarifies ownership. It reduces duplicate rules. It makes Kafka useful instead of decorative. It gives microservices sharper edges. It enables progressive strangler migration without pretending the legacy estate will vanish overnight. And it forces an honest conversation about domain semantics—who is allowed to decide what, and based on which invariants.
It also demands discipline. You need canonical commands, explicit ownership, outbox-based event publication, idempotent handlers, and a reconciliation strategy during migration. You need to shut down backdoors. You need to resist the temptation to let routers become brains. You need bounded contexts that are real, not merely labeled.
The memorable version is this: route commands to where the business meaning lives, not where the data happens to sit.
In enterprise architecture, that is the difference between a system that changes state and a system that makes decisions. CQRS is at its best when it helps you tell those apart.
Frequently Asked Questions
What is CQRS?
Command Query Responsibility Segregation separates read and write models. Commands mutate state; queries read from a separate optimized read model. This enables independent scaling of reads and writes and allows different consistency models for each side.
What is the Saga pattern?
A Saga manages long-running transactions across multiple services without distributed ACID transactions. Each step publishes an event; if a step fails, compensating transactions roll back previous steps. Choreography-based sagas use events; orchestration-based sagas use a central coordinator.
What is the outbox pattern?
The transactional outbox pattern solves dual-write problems — ensuring a database update and a message publication happen atomically. The service writes both to its database and an outbox table in one transaction; a relay process reads the outbox and publishes to the message broker.