Most event streaming problems do not begin with technology. They begin with an argument nobody realized they were having.
One team says, “We need a central Kafka cluster so every service can publish and subscribe.” Another says, “No, each domain should own its own broker, otherwise we’ll create a platform bottleneck.” A third team, usually carrying scars from an earlier integration program, mutters that all roads lead to a giant shared bus and years of regret.
They are all partly right.
Broker topology is one of those architectural choices that looks operational on the surface but is actually about power, ownership, failure, and language. If you get it wrong, the system still runs—until the organization grows, domains diverge, data residency rules appear, and one noisy workload turns your carefully designed event backbone into a public motorway at rush hour. If you get it right, event streaming becomes an accelerator for domain autonomy, integration resilience, and sensible evolution.
This is the real issue: choosing a broker topology is not just choosing where topics live. It is choosing how bounded contexts collaborate, how operational responsibility is assigned, how replay and reconciliation happen, and how much coupling you are willing to hide under the friendly label of “asynchronous.”
In enterprise systems, topology is destiny.
Context
Event streaming has moved from specialist infrastructure to mainstream enterprise architecture. Kafka, Pulsar, Kinesis, Event Hubs, and similar platforms are now used for domain events, change propagation, analytical pipelines, integration with SaaS platforms, and operational telemetry. Microservices made this more visible, but the underlying problem is older than microservices. We have always needed a way for systems to react to business change without building point-to-point spaghetti.
What changed is scale and expectation.
Modern enterprises want:
- real-time propagation of business events
- independent deployability of services
- replay for recovery and audit
- stream processing close to the source
- decoupled consumers
- cross-region and cross-country architectures
- governance without becoming a central committee
These goals pull in different directions. A single centralized broker topology simplifies discovery, platform engineering, and policy enforcement. A distributed topology aligned to business domains improves autonomy, local optimization, and blast-radius control. Hybrid and hierarchical models try to keep the peace.
This is where domain-driven design matters. If you treat events as generic integration messages, topology becomes a purely infrastructure discussion. If you treat events as expressions of domain semantics—OrderPlaced, PaymentAuthorized, ShipmentDispatched, PolicyBound—then topology must reflect bounded contexts and ownership. A broker is not just a pipe. It becomes part of the language of the enterprise.
And language is where architectures either become coherent or collapse into accidental complexity.
Problem
How should an enterprise organize event brokers and topics across systems, teams, and domains?
The common topology choices are usually some variation of these:
- Centralized broker topology
One shared event streaming platform, often one Kafka estate or a logical shared cluster setup, used by many domains and teams.
- Domain-aligned broker topology
Each major bounded context or business platform owns its own broker or cluster, exposing selected event streams outward.
- Hybrid or federated topology
Domain-local brokers coexist with a central integration or streaming backbone; events are bridged, replicated, or promoted between layers.
- Environment- or geography-segmented topology
Brokers are separated by region, regulatory perimeter, tenant, or operational isolation needs, with selective replication.
At first glance, this seems like a capacity planning or platform engineering question. It is not. It touches:
- team boundaries
- governance
- event semantics
- security models
- data sovereignty
- recovery patterns
- change management
- platform economics
The wrong answer often manifests slowly. Teams start with a shared cluster because it is easy. Then they put analytics streams, domain events, CDC topics, integration payloads, and dead-letter traffic all in one place. Naming conventions become folklore. Topic ownership becomes ambiguous. One “temporary” schema change breaks five downstream consumers. Platform teams become traffic police.
Or the reverse happens. Every domain gets its own broker. Autonomy looks wonderful for six months. Then cross-domain consumers need ten credentials, twelve client libraries, and a wiki page that reads like a Cold War railway map. Nobody can answer a basic question like, “Where does customer state really come from?” Reconciliation becomes archaeology.
The architecture problem is not centralization versus decentralization. The architecture problem is choosing the right topology for your domain shape, operational maturity, and rate of change.
Forces
A good topology decision balances several forces. Ignore any one of them and the architecture will punish you later.
Domain ownership and bounded contexts
In domain-driven design, bounded contexts define where language is consistent and where models are owned. Event streams should follow that ownership. If multiple teams publish competing truths about the same concept into the same shared space, consumers cannot tell whether they are reading a domain event or a rumor.
A Customer domain should own customer lifecycle events. A Billing domain should own invoice and payment obligation events. A shared broker does not remove this need; it merely makes violations easier.
Consumer discoverability and ease of use
Centralized topologies are attractive because consumers know where to go. There is one platform, one set of security patterns, one standard client stack. This matters in large enterprises where friction kills adoption.
Distributed topologies improve autonomy but increase discovery complexity. Without strong cataloging, schema governance, and clear ownership, teams spend more time locating streams than using them.
Operational blast radius
A shared broker creates efficiency and concentration risk. One cluster incident can affect dozens of critical business capabilities. One runaway consumer group, one partition imbalance, one misconfigured retention policy, and suddenly half the enterprise is having a bad afternoon.
Domain-local brokers reduce blast radius. They also create more surfaces to patch, observe, scale, and support.
Throughput, latency, and locality
Some domains generate huge volumes—clickstreams, IoT telemetry, fraud scoring features, supply chain tracking. Others generate low-volume but business-critical events. Putting everything in one place may be operationally neat but foolish for performance.
Data locality matters too. Cross-region writes, egress costs, and sovereignty regulations often make a single global topology unrealistic.
Governance and schema evolution
Centralized platforms make it easier to standardize schemas, topic naming, retention classes, encryption, and access control. But there is a dark side: central standards can drift into central ownership of business meaning. That is where architecture starts to suffocate delivery.
Governance should constrain the shape of interaction, not confiscate domain responsibility.
Replay, reconciliation, and recovery
Replaying a stream from retention is one thing. Reconciling business truth across bounded contexts is another. Topology influences both.
If all events are centralized, replay is operationally simpler but semantic recovery may be messy because consumers often lean on events beyond their original domain intent. In federated models, local replay is easier to reason about, but cross-domain reconciliation requires deliberate design: snapshots, compaction topics, canonical references, idempotent consumers, and sometimes batch repair processes.
Platform team maturity
A shared enterprise Kafka platform requires serious operational discipline. Multi-tenancy, quotas, schema registry governance, ACL management, client onboarding, SLOs, and cost attribution are not side jobs. If your platform team is thin, a grand shared backbone becomes a heroic fantasy.
Likewise, domain-owned brokers only work if domains can genuinely operate them or if the platform provides self-service automation with strong paved roads.
Solution
My recommendation for most large enterprises is simple: prefer a federated topology, organized around bounded contexts, with a thin central integration layer rather than a giant universal event bus.
That sentence matters because many organizations instinctively choose one extreme.
A purely centralized topology is seductive because it looks efficient. One broker estate, one operations team, one control plane. It is the corporate answer. It also tends to become a semantic landfill unless domain ownership is enforced with uncommon discipline.
A purely decentralized topology fits DDD rhetoric nicely. Every bounded context owns its events and infrastructure. Fine in theory. Expensive and messy in a real enterprise where integration consumers, analytics teams, and operational support need consistency.
The federated model is the practical middle.
- Domains own their primary event streams close to the source.
- A central platform provides standards, tooling, schema controls, observability, and replication services.
- Only selected events are promoted to cross-domain or enterprise integration streams.
- Internal domain topics stay private unless there is a good reason to expose them.
- Cross-domain contracts are explicit, versioned, and curated.
That is not a compromise in the weak sense. It is a deliberate separation of concerns.
The domain broker is where operational truth is emitted and consumed inside the bounded context. The central integration layer is where business-significant, externally useful events are published for wider use. One is for local cohesion. The other is for enterprise collaboration.
A useful rule of thumb: not every event deserves citizenship outside its home domain.
Architecture
Let’s make the topology options concrete.
1. Centralized topology
This is the classic shared event platform. All major services publish and consume via one central broker estate.
This works well when:
- the organization is early in event streaming adoption
- the platform team is strong
- domains are not yet mature enough to own infrastructure
- regulatory boundaries are modest
- most consumers benefit from a common access point
But the hidden cost is semantic crowding. A central cluster too easily becomes a place where internal state transitions, CDC topics, integration contracts, and half-baked “events” all coexist without a clear distinction.
2. Domain-aligned topology
Each bounded context owns its own broker or logical cluster boundary. Cross-domain sharing happens through explicit interfaces or replicated topics.
This model is excellent for strong domain ownership and operational isolation. It is weaker for discoverability and enterprise-wide simplicity unless backed by cataloging, standards, and replication automation.
3. Federated topology
This is the pattern I see working best at scale.
In this model:
- local events stay local by default
- externally relevant events are promoted
- integration consumers depend on promoted contracts, not internal noise
- platform engineering focuses on policy, tooling, and movement between tiers
This topology aligns better with DDD because it preserves bounded-context ownership while still giving the enterprise a coherent event backbone.
Topic design and semantics
Topology only works if event semantics are clear.
A topic should have a recognizable meaning and owner. If you name topics after technical emitters—service-a-output-v2-final—you have already lost the plot. Better names come from the domain: orders.placed, payments.authorized, shipment.dispatched.
More importantly, distinguish event kinds:
- Domain events: meaningful business facts owned by a bounded context
- Integration events: curated events intended for external consumption
- CDC streams: database change feeds, useful but not automatically business events
- Process events: workflow or orchestration milestones
- Telemetry: operational signals, not business contracts
A central anti-pattern is mixing these categories with no semantic boundary. That creates accidental consumers and brittle dependencies.
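This classification can be made machine-checkable rather than left as folklore. The sketch below records an owner and an event kind per topic and enforces a domain-style naming pattern; the categories, naming rule, and field names are illustrative, not a standard.

```python
from dataclasses import dataclass
from enum import Enum
import re

class EventKind(Enum):
    DOMAIN = "domain"            # business facts owned by a bounded context
    INTEGRATION = "integration"  # curated events for external consumption
    CDC = "cdc"                  # database change feeds
    PROCESS = "process"          # workflow / orchestration milestones
    TELEMETRY = "telemetry"      # operational signals, not business contracts

# Domain-style names: "<context>.<fact>", e.g. "orders.placed".
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*\.[a-z][a-z0-9_]*$")

@dataclass
class TopicRecord:
    name: str
    owner: str        # owning bounded context, e.g. "commerce"
    kind: EventKind

    def validate(self) -> list:
        """Return a list of problems; empty means the record passes."""
        problems = []
        if not NAME_PATTERN.match(self.name):
            problems.append(f"{self.name}: name should follow <context>.<fact>")
        if not self.owner:
            problems.append(f"{self.name}: no owning context recorded")
        return problems

ok = TopicRecord("orders.placed", "commerce", EventKind.DOMAIN)
bad = TopicRecord("service-a-output-v2-final", "", EventKind.DOMAIN)
print(ok.validate())   # []
print(bad.validate())  # bad name, plus missing owner
```

Running a check like this in CI for topic-creation requests is one cheap way to stop a shared cluster from drifting back into folklore naming.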
Migration Strategy
No serious enterprise begins with a clean federated topology. They inherit one.
Usually the estate starts in one of two ways:
- a shared broker with ad hoc topics and weak ownership
- scattered brokers created by independent teams with little standardization
The migration path should be progressive, not revolutionary. This is classic strangler migration thinking.
Start by identifying event ownership
Map key business capabilities and bounded contexts. For each stream, ask:
- who owns the business meaning?
- who may publish?
- who may consume?
- is this an internal event or an external contract?
- does it represent a fact, a command, or a data export disguised as an event?
This work is not busywork. It is how you stop infrastructure choices from laundering domain confusion.
Introduce a promotion model
A practical migration pattern is to keep existing brokers and introduce the distinction between:
- local domain topics
- published enterprise topics
Not every local topic gets promoted. Promotion requires meeting standards: stable schema, documented ownership, quality-of-service expectations, and consumer guidance.
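The promotion gate can be expressed as a simple checklist evaluation. A sketch, assuming hypothetical metadata fields; the specific standards and field names are illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TopicMetadata:
    name: str
    owner: Optional[str] = None               # documented owning context
    schema_id: Optional[str] = None           # registered, versioned schema
    sla: Optional[str] = None                 # quality-of-service statement
    consumer_guide_url: Optional[str] = None  # consumer guidance doc

def promotion_blockers(meta: TopicMetadata) -> list:
    """Return the standards a candidate topic still fails to meet.
    An empty list means the topic is eligible for promotion."""
    checks = {
        "documented ownership": meta.owner,
        "registered stable schema": meta.schema_id,
        "quality-of-service expectations": meta.sla,
        "consumer guidance": meta.consumer_guide_url,
    }
    return [standard for standard, value in checks.items() if not value]

candidate = TopicMetadata("orders.placed", owner="commerce",
                          schema_id="orders.placed-v3")
print(promotion_blockers(candidate))
# still missing an SLA and consumer guidance, so promotion is blocked
```

The value of encoding the gate is less the automation than the explicitness: a topic is either promoted, with obligations, or it is private.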
Use replication and bridges selectively
Kafka MirrorMaker 2, Cluster Linking, event gateways, or custom bridge services can move streams between broker tiers. Use them for explicit publishing, not as magical synchronization of everything.
If you replicate all topics blindly, you have not designed a topology. You have copied your mess into more places.
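For illustration, a MirrorMaker 2 configuration that promotes an explicit allowlist rather than mirroring everything might look like the fragment below. Cluster aliases, addresses, and topic names are placeholders.

```properties
# Cluster aliases and bootstrap servers are placeholders.
clusters = commerce, backbone
commerce.bootstrap.servers = commerce-kafka:9092
backbone.bootstrap.servers = backbone-kafka:9092

# One-way flow: promote selected commerce topics to the backbone.
commerce->backbone.enabled = true
# Explicit allowlist of promoted topics; everything else stays local.
commerce->backbone.topics = orders\.placed, payments\.authorized
# Keep internal and CDC streams out even if someone widens the allowlist.
commerce->backbone.topics.exclude = .*\.internal\..*, cdc\..*
backbone->commerce.enabled = false
```

The direction of the arrow is the design decision: promotion is an explicit, curated act, not bidirectional synchronization.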
Build reconciliation into migration
This is where many event streaming migrations become naive. During topology changes, some consumers will temporarily read from old topics, some from new promoted topics, and some from both. You need reconciliation patterns:
- idempotent consumer behavior
- event keys and stable business identifiers
- duplicate detection windows
- periodic snapshot comparison
- compensating repair jobs
- compacted “current state” topics for reference alignment
Event streaming is not self-healing simply because it is asynchronous. In migration, inconsistency is not a bug; it is a phase. Your architecture must acknowledge it.
Progressive strangler example
A sensible sequence looks like this:
- classify existing topics by owner and semantic type
- establish schema and ownership metadata
- create domain brokers for priority bounded contexts
- publish curated integration events to a new backbone
- migrate consumers from legacy shared topics to curated contracts
- retire ambiguous or duplicate streams
- add reconciliation checks until parity is trusted
The point is not to move everything fast. The point is to move meaning into the right place.
Enterprise Example
Consider a global retailer with e-commerce, store operations, supply chain, finance, and customer loyalty platforms.
They started with one large Kafka estate. At first, it was a success. Teams could publish quickly. Data engineering loved the access. New microservices subscribed to order, inventory, and customer topics without asking permission from central integration teams.
Then success became entropy.
The customer-updated topic had four producers: CRM, loyalty, online account management, and an MDM feed. Each used a slightly different schema. Some events represented actual customer changes; others were denormalized snapshots. Inventory topics mixed warehouse stock deltas with website availability projections. Finance consumers subscribed to order events and quietly built revenue logic outside the Finance bounded context.
The topology was centralized, but the real failure was semantic ownership.
The retailer changed course with a federated model:
- Commerce, Supply Chain, Finance, and Customer each got domain-aligned broker boundaries.
- A central event backbone remained for curated enterprise events.
- Domain architecture boards identified authoritative publishers by bounded context.
- CDC topics were reclassified as internal technical streams unless explicitly promoted.
- Enterprise event catalog entries became mandatory for promoted topics.
- Reconciliation jobs compared promoted order and payment events against finance ledger postings daily during transition.
The result was not fewer events. It was fewer ambiguous events.
OrderPlaced remained owned by Commerce. PaymentCaptured belonged to Finance. InventoryReserved was internal to Fulfillment and only InventoryAvailabilityChanged was promoted outward. Customer profile changes were split into domain-specific events rather than one mythical universal customer topic.
Operations improved too. A supply chain traffic spike during the holiday season no longer threatened all consumer workloads. Blast radius shrank because high-throughput local processing stayed local. The shared backbone carried business-significant contracts, not every heartbeat of the machine.
That is the real enterprise lesson: topology works when it mirrors organizational responsibility and business language, not just network design.
Operational Considerations
A topology choice is only credible if it can be run in anger.
Multi-tenancy and quotas
Shared environments need hard quotas, retention tiers, partition standards, and client controls. Without them, one team’s “temporary replay” becomes everyone else’s outage.
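As one concrete lever, Kafka's client quota mechanism can cap an individual client's throughput before it starves neighbors. A sketch using the standard kafka-configs tool; the entity name and broker address are placeholders.

```shell
# Cap a known-heavy client's produce/fetch rates and request time.
# Entity name and broker address are placeholders; the quota keys are
# standard Kafka client quota configs.
bin/kafka-configs.sh --bootstrap-server kafka:9092 --alter \
  --entity-type clients --entity-name analytics-replayer \
  --add-config 'producer_byte_rate=1048576,consumer_byte_rate=5242880,request_percentage=50'

# Inspect what is currently enforced for that client.
bin/kafka-configs.sh --bootstrap-server kafka:9092 --describe \
  --entity-type clients --entity-name analytics-replayer
```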
Observability
You need visibility at three levels:
- platform health: brokers, partitions, lag, replication, disk, throughput
- contract health: schema compatibility, consumer group behaviors, topic ownership
- business health: event freshness, missing business milestones, reconciliation drift
Most teams stop at the first level and then wonder why the business does not trust eventing.
Security and access control
Centralized topologies simplify policy distribution but broaden the risk surface. Domain-aligned topologies support least-privilege access more naturally, especially when events include regulated data.
For enterprise-grade streaming, access should reflect domain contracts, not just topic patterns.
Data retention and compaction
Retention policies should match event purpose. Domain event streams used for replay need different policies than transient integration notifications or large raw CDC logs. Compacted topics are useful for reference state and reconciliation but should not become a lazy substitute for proper event modeling.
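As a concrete illustration, these purposes map to different standard topic-level settings. Topic names below are placeholders.

```properties
# orders.placed — replayable domain event stream (placeholder name):
# time-based cleanup with a 30-day replay window.
cleanup.policy=delete
retention.ms=2592000000

# customers.current-state — compacted reference topic (placeholder name):
# compaction keeps the latest record per key instead of a time window.
cleanup.policy=compact
# compact more eagerly so reconciliation reads fresher state
min.cleanable.dirty.ratio=0.1
```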
Schema management
Use schema registries, compatibility rules, and publication standards. More importantly, ensure schema review checks business semantics, not just Avro syntax or JSON shape. A technically compatible schema can still be semantically destructive.
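The structural half of that review is easy to sketch; the harder, human half is judging whether an unchanged field quietly changed meaning. Illustrative only, with schemas reduced to flat field-to-type maps rather than real Avro:

```python
def breaking_changes(old, new):
    """Compare two flat field->type maps (a stand-in for a registry
    compatibility check) and report changes that break existing readers.
    Added fields are tolerated; removed fields and type changes are not."""
    problems = []
    for field, ftype in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != ftype:
            problems.append(f"type change on {field}: {ftype} -> {new[field]}")
    return problems

old = {"order_id": "string", "total": "decimal", "placed_at": "timestamp"}
new = {"order_id": "string", "total": "string",
       "placed_at": "timestamp", "channel": "string"}
print(breaking_changes(old, new))
# ['type change on total: decimal -> string']
```

No check like this will catch a "total" that silently switched from net to gross while keeping its type, which is exactly why schema review must include a human who owns the business meaning.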
Tradeoffs
Every topology pays somewhere.
Centralized topology tradeoffs
Benefits
- simple onboarding
- unified tooling
- easier enterprise governance
- efficient platform team model
Costs
- larger blast radius
- semantic crowding
- shared-cluster contention
- risk of hidden coupling
- harder domain isolation
Domain-aligned topology tradeoffs
Benefits
- strong ownership
- bounded-context alignment
- smaller operational failures
- local optimization
Costs
- higher operational overhead
- harder discovery
- cross-domain consumer complexity
- more replication and policy plumbing
Federated topology tradeoffs
Benefits
- balances autonomy and consistency
- keeps local streams local
- enables curated enterprise contracts
- supports progressive migration
Costs
- more architectural discipline required
- bridge and replication design adds complexity
- ownership disputes become explicit
- tooling investment is non-trivial
This last point matters. Federated topology is not a cheap default. It is the best option for many enterprises because it makes the difficult things visible instead of burying them inside a shared cluster.
Failure Modes
Architectures usually fail in familiar ways.
The universal bus failure
Everything goes onto one shared backbone. Topic count explodes. Ownership fades. Consumers bind to internal implementation events. A schema change becomes a political incident. The platform team becomes a central bottleneck.
The broker-per-team failure
Every team gets autonomy and promptly creates a local kingdom. Cross-domain integration becomes painful. Duplicate events proliferate. Nobody can trace end-to-end business flow without custom glue.
The replication-everything failure
Organizations attempt federation by mirroring all topics across all clusters. Costs rise, semantics blur, and incident diagnosis becomes miserable because nobody knows which copy is authoritative.
The CDC-is-an-event-model failure
Database change streams are useful. They are not a substitute for domain events. When CDC becomes the integration contract, consumers inherit storage semantics instead of business meaning.
The no-reconciliation failure
During migration or recovery, teams assume replay is enough. It is not. Ordering gaps, duplicate deliveries, missed promotions, and consumer bugs leave divergent states. Without reconciliation, inconsistency just becomes institutionalized.
When Not To Use
Not every organization needs a sophisticated broker topology strategy.
Do not invest heavily in federated event broker design when:
- you have a small system with a handful of services
- event volume and organizational scale are low
- you do not yet have clear bounded contexts
- your platform engineering capability is immature
- your primary need is simple asynchronous messaging, not streaming
- governance discipline is weak and unlikely to improve soon
In these cases, a well-run centralized broker is often the right starting point. Better a simple shared topology with clear ownership than an ambitious federation nobody can operate.
Likewise, if your use case is mainly command messaging, transactional workflows, or low-volume integration, a message broker or workflow engine may be a better fit than Kafka-style event streaming. Not every asynchronous problem deserves a log.
Related Patterns
Several patterns sit naturally alongside broker topology decisions.
- Bounded Context: defines who owns event meaning
- Published Language: promoted integration events should use a stable shared language
- Anti-Corruption Layer: protects a domain from external semantic leakage
- Outbox Pattern: reliable publication from transactional systems
- CQRS: consumer projections built from event streams
- Event Carried State Transfer: useful, but dangerous if overused across contexts
- Strangler Fig Pattern: progressive migration from shared legacy topology
- Saga / Process Manager: orchestration across domains when events alone are insufficient
- Reconciliation Pipeline: compares and repairs state across systems after asynchronous divergence
These patterns matter because topology alone does not solve coupling. It merely shapes where coupling hides.
Summary
Broker topology is one of the quiet decisions that eventually becomes a loud problem.
A centralized event streaming platform gives you speed, convenience, and governance leverage. It also invites semantic chaos if ownership is weak. A domain-aligned topology gives you autonomy and bounded-context integrity, but can become fragmented and hard to consume at enterprise scale. A federated topology, in my view, is the best fit for most large organizations: domain-local brokers for local truth, a curated integration backbone for cross-domain collaboration, and explicit promotion of events that deserve broader use.
The key is to think in domain terms, not just infrastructure terms.
Ask who owns the fact. Ask whether an event is local or published. Ask how consumers will reconcile when reality drifts. Ask what fails together. Ask what should never have been coupled in the first place.
Because event streaming architecture is not really about brokers.
It is about deciding how truth moves through the enterprise without losing its meaning on the way.
Frequently Asked Questions
What is event-driven architecture?
Event-driven architecture (EDA) decouples services by having producers publish events to a broker like Kafka, while consumers subscribe independently. This reduces direct coupling, improves resilience, and allows new consumers to be added without modifying producers.
When should you use Kafka vs a message queue?
Use Kafka when you need event replay, high throughput, long retention, or multiple independent consumers reading the same stream. Use a traditional message queue (RabbitMQ, SQS) when you need simple point-to-point delivery, low latency, or complex routing logic per message.
How do you model event-driven architecture in ArchiMate?
In ArchiMate, the Kafka broker is a Technology Service or Application Component. Topics are Data Objects or Application Services. Producer/consumer services are Application Components connected via Flow relationships. This makes the event topology explicit and queryable.