Most event streaming problems do not begin with technology. They begin with an argument nobody realized they were having.
One team says, “We need a central Kafka cluster so every service can publish and subscribe.” Another says, “No, each domain should own its own broker, otherwise we’ll create a platform bottleneck.” A third team, usually carrying scars from an earlier integration program, mutters that all roads lead to a giant shared bus and years of regret.
They are all partly right.
Broker topology is one of those architectural choices that looks operational on the surface but is actually about power, ownership, failure, and language. If you get it wrong, the system still runs—until the organization grows, domains diverge, data residency rules appear, and one noisy workload turns your carefully designed event backbone into a public motorway at rush hour. If you get it right, event streaming becomes an accelerator for domain autonomy, integration resilience, and sensible evolution.
This is the real issue: choosing a broker topology is not just choosing where topics live. It is choosing how bounded contexts collaborate, how operational responsibility is assigned, how replay and reconciliation happen, and how much coupling you are willing to hide under the friendly label of “asynchronous.”
In enterprise systems, topology is destiny.
Context
Event streaming has moved from specialist infrastructure to mainstream enterprise architecture. Kafka, Pulsar, Kinesis, Event Hubs, and similar platforms are now used for domain events, change propagation, analytical pipelines, integration with SaaS platforms, and operational telemetry. Microservices made this more visible, but the underlying problem is older than microservices. We have always needed a way for systems to react to business change without building point-to-point spaghetti.
What changed is scale and expectation.
Modern enterprises want:
- real-time propagation of business events
- independent deployability of services
- replay for recovery and audit
- stream processing close to the source
- decoupled consumers
- cross-region and cross-country architectures
- governance without becoming a central committee
These goals pull in different directions. A single centralized broker topology simplifies discovery, platform engineering, and policy enforcement. A distributed topology aligned to business domains improves autonomy, local optimization, and blast-radius control. Hybrid and hierarchical models try to keep the peace.
This is where domain-driven design matters. If you treat events as generic integration messages, topology becomes a purely infrastructure discussion. If you treat events as expressions of domain semantics—OrderPlaced, PaymentAuthorized, ShipmentDispatched, PolicyBound—then topology must reflect bounded contexts and ownership. A broker is not just a pipe. It becomes part of the language of the enterprise.
And language is where architectures either become coherent or collapse into accidental complexity.
Problem
How should an enterprise organize event brokers and topics across systems, teams, and domains?
The common topology choices are usually some variation of these:
- Centralized broker topology
One shared event streaming platform, often one Kafka estate or a logical shared cluster setup, used by many domains and teams.
- Domain-aligned broker topology
Each major bounded context or business platform owns its own broker or cluster, exposing selected event streams outward.
- Hybrid or federated topology
Domain-local brokers coexist with a central integration or streaming backbone; events are bridged, replicated, or promoted between layers.
- Environment- or geography-segmented topology
Brokers are separated by region, regulatory perimeter, tenant, or operational isolation needs, with selective replication.
At first glance, this seems like a capacity planning or platform engineering question. It is not. It touches:
- team boundaries
- governance
- event semantics
- security models
- data sovereignty
- recovery patterns
- change management
- platform economics
The wrong answer often manifests slowly. Teams start with a shared cluster because it is easy. Then they put analytics streams, domain events, CDC topics, integration payloads, and dead-letter traffic all in one place. Naming conventions become folklore. Topic ownership becomes ambiguous. One “temporary” schema change breaks five downstream consumers. Platform teams become traffic police.
Or the reverse happens. Every domain gets its own broker. Autonomy looks wonderful for six months. Then cross-domain consumers need ten credentials, twelve client libraries, and a wiki page that reads like a Cold War railway map. Nobody can answer a basic question like, “Where does customer state really come from?” Reconciliation becomes archaeology.
The architecture problem is not centralization versus decentralization. The architecture problem is choosing the right topology for your domain shape, operational maturity, and rate of change.
Forces
A good topology decision balances several forces. Ignore any one of them and the architecture will punish you later.
Domain ownership and bounded contexts
In domain-driven design, bounded contexts define where language is consistent and where models are owned. Event streams should follow that ownership. If multiple teams publish competing truths about the same concept into the same shared space, consumers cannot tell whether they are reading a domain event or a rumor.
A Customer domain should own customer lifecycle events. A Billing domain should own invoice and payment obligation events. A shared broker does not remove this need; it merely makes violations easier.
Consumer discoverability and ease of use
Centralized topologies are attractive because consumers know where to go. There is one platform, one set of security patterns, one standard client stack. This matters in large enterprises where friction kills adoption.
Distributed topologies improve autonomy but increase discovery complexity. Without strong cataloging, schema governance, and clear ownership, teams spend more time locating streams than using them.
Operational blast radius
A shared broker creates efficiency and concentration risk. One cluster incident can affect dozens of critical business capabilities. One runaway consumer group, one partition imbalance, one misconfigured retention policy, and suddenly half the enterprise is having a bad afternoon.
Domain-local brokers reduce blast radius. They also create more surfaces to patch, observe, scale, and support.
Throughput, latency, and locality
Some domains generate huge volumes—clickstreams, IoT telemetry, fraud scoring features, supply chain tracking. Others generate low-volume but business-critical events. Putting everything in one place may be operationally neat but foolish for performance.
Data locality matters too. Cross-region writes, egress costs, and sovereignty regulations often make a single global topology unrealistic.
Governance and schema evolution
Centralized platforms make it easier to standardize schemas, topic naming, retention classes, encryption, and access control. But there is a dark side: central standards can drift into central ownership of business meaning. That is where architecture starts to suffocate delivery.
Governance should constrain the shape of interaction, not confiscate domain responsibility.
Replay, reconciliation, and recovery
Replaying a stream from retention is one thing. Reconciling business truth across bounded contexts is another. Topology influences both.
If all events are centralized, replay is operationally simpler but semantic recovery may be messy because consumers often lean on events beyond their original domain intent. In federated models, local replay is easier to reason about, but cross-domain reconciliation requires deliberate design: snapshots, compaction topics, canonical references, idempotent consumers, and sometimes batch repair processes.
Platform team maturity
A shared enterprise Kafka platform requires serious operational discipline. Multi-tenancy, quotas, schema registry governance, ACL management, client onboarding, SLOs, and cost attribution are not side jobs. If your platform team is thin, a grand shared backbone becomes a heroic fantasy.
Likewise, domain-owned brokers only work if domains can genuinely operate them or if the platform provides self-service automation with strong paved roads.
Solution
My recommendation for most large enterprises is simple: prefer a federated topology, organized around bounded contexts, with a thin central integration layer rather than a giant universal event bus.
That sentence matters because many organizations instinctively choose one extreme.
A purely centralized topology is seductive because it looks efficient. One broker estate, one operations team, one control plane. It is the corporate answer. It also tends to become a semantic landfill unless domain ownership is enforced with uncommon discipline.
A purely decentralized topology fits DDD rhetoric nicely. Every bounded context owns its events and infrastructure. Fine in theory. Expensive and messy in a real enterprise where integration consumers, analytics teams, and operational support need consistency.
The federated model is the practical middle.
- Domains own their primary event streams close to the source.
- A central platform provides standards, tooling, schema controls, observability, and replication services.
- Only selected events are promoted to cross-domain or enterprise integration streams.
- Internal domain topics stay private unless there is a good reason to expose them.
- Cross-domain contracts are explicit, versioned, and curated.
That is not a compromise in the weak sense. It is a deliberate separation of concerns.
The domain broker is where operational truth is emitted and consumed inside the bounded context. The central integration layer is where business-significant, externally useful events are published for wider use. One is for local cohesion. The other is for enterprise collaboration.
A useful rule of thumb: not every event deserves citizenship outside its home domain.
Architecture
Let’s make the topology options concrete.
1. Centralized topology
This is the classic shared event platform. All major services publish and consume via one central broker estate.
This works well when:
- the organization is early in event streaming adoption
- the platform team is strong
- domains are not yet mature enough to own infrastructure
- regulatory boundaries are modest
- most consumers benefit from a common access point
But the hidden cost is semantic crowding. A central cluster too easily becomes a place where internal state transitions, CDC topics, integration contracts, and half-baked “events” all coexist without a clear distinction.
2. Domain-aligned topology
Each bounded context owns its own broker or logical cluster boundary. Cross-domain sharing happens through explicit interfaces or replicated topics.
This model is excellent for strong domain ownership and operational isolation. It is weaker for discoverability and enterprise-wide simplicity unless backed by cataloging, standards, and replication automation.
3. Federated topology
This is the pattern I see working best at scale.
In this model:
- local events stay local by default
- externally relevant events are promoted
- integration consumers depend on promoted contracts, not internal noise
- platform engineering focuses on policy, tooling, and movement between tiers
This topology aligns better with DDD because it preserves bounded-context ownership while still giving the enterprise a coherent event backbone.
Topic design and semantics
Topology only works if event semantics are clear.
A topic should have a recognizable meaning and owner. If you name topics after technical emitters—service-a-output-v2-final—you have already lost the plot. Better names come from the domain: orders.placed, payments.authorized, shipment.dispatched.
More importantly, distinguish event kinds:
- Domain events: meaningful business facts owned by a bounded context
- Integration events: curated events intended for external consumption
- CDC streams: database change feeds, useful but not automatically business events
- Process events: workflow or orchestration milestones
- Telemetry: operational signals, not business contracts
A central anti-pattern is mixing these categories with no semantic boundary. That creates accidental consumers and brittle dependencies.
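This classification can be made machine-checkable rather than left as folklore. The sketch below records an owner and an event kind per topic and enforces a domain-style naming pattern; the categories, naming rule, and field names are illustrative, not a standard.

```python
from dataclasses import dataclass
from enum import Enum
import re

class EventKind(Enum):
    DOMAIN = "domain"            # business facts owned by a bounded context
    INTEGRATION = "integration"  # curated events for external consumption
    CDC = "cdc"                  # database change feeds
    PROCESS = "process"          # workflow / orchestration milestones
    TELEMETRY = "telemetry"      # operational signals, not business contracts

# Domain-style names: "<context>.<fact>", e.g. "orders.placed".
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*\.[a-z][a-z0-9_]*$")

@dataclass
class TopicRecord:
    name: str
    owner: str        # owning bounded context, e.g. "commerce"
    kind: EventKind

    def validate(self) -> list:
        """Return a list of problems; empty means the record passes."""
        problems = []
        if not NAME_PATTERN.match(self.name):
            problems.append(f"{self.name}: name should follow <context>.<fact>")
        if not self.owner:
            problems.append(f"{self.name}: no owning context recorded")
        return problems

ok = TopicRecord("orders.placed", "commerce", EventKind.DOMAIN)
bad = TopicRecord("service-a-output-v2-final", "", EventKind.DOMAIN)
print(ok.validate())   # []
print(bad.validate())  # bad name, plus missing owner
```

Running a check like this in CI for topic-creation requests is one cheap way to stop a shared cluster from drifting back into folklore naming.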
Migration Strategy
No serious enterprise begins with a clean federated topology. They inherit one.
Usually the estate starts in one of two ways:
- a shared broker with ad hoc topics and weak ownership
- scattered brokers created by independent teams with little standardization
The migration path should be progressive, not revolutionary. This is classic strangler migration thinking.
Start by identifying event ownership
Map key business capabilities and bounded contexts. For each stream, ask:
- who owns the business meaning?
- who may publish?
- who may consume?
- is this an internal event or an external contract?
- does it represent a fact, a command, or a data export disguised as an event?
This work is not busywork. It is how you stop infrastructure choices from laundering domain confusion.
Introduce a promotion model
A practical migration pattern is to keep existing brokers and introduce the distinction between:
- local domain topics
- published enterprise topics
Not every local topic gets promoted. Promotion requires meeting standards: stable schema, documented ownership, quality-of-service expectations, and consumer guidance.
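The promotion gate can be expressed as a simple checklist evaluation. A sketch, assuming hypothetical metadata fields; the specific standards and field names are illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TopicMetadata:
    name: str
    owner: Optional[str] = None               # documented owning context
    schema_id: Optional[str] = None           # registered, versioned schema
    sla: Optional[str] = None                 # quality-of-service statement
    consumer_guide_url: Optional[str] = None  # consumer guidance doc

def promotion_blockers(meta: TopicMetadata) -> list:
    """Return the standards a candidate topic still fails to meet.
    An empty list means the topic is eligible for promotion."""
    checks = {
        "documented ownership": meta.owner,
        "registered stable schema": meta.schema_id,
        "quality-of-service expectations": meta.sla,
        "consumer guidance": meta.consumer_guide_url,
    }
    return [standard for standard, value in checks.items() if not value]

candidate = TopicMetadata("orders.placed", owner="commerce",
                          schema_id="orders.placed-v3")
print(promotion_blockers(candidate))
# still missing an SLA and consumer guidance, so promotion is blocked
```

The value of encoding the gate is less the automation than the explicitness: a topic is either promoted, with obligations, or it is private.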
Use replication and bridges selectively
Kafka MirrorMaker 2, Cluster Linking, event gateways, or custom bridge services can move streams between broker tiers. Use them for explicit publishing, not as magical synchronization of everything.
If you replicate all topics blindly, you have not designed a topology. You have copied your mess into more places.
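For illustration, a MirrorMaker 2 configuration that promotes an explicit allowlist rather than mirroring everything might look like the fragment below. Cluster aliases, addresses, and topic names are placeholders.

```properties
# Cluster aliases and bootstrap servers are placeholders.
clusters = commerce, backbone
commerce.bootstrap.servers = commerce-kafka:9092
backbone.bootstrap.servers = backbone-kafka:9092

# One-way flow: promote selected commerce topics to the backbone.
commerce->backbone.enabled = true
# Explicit allowlist of promoted topics; everything else stays local.
commerce->backbone.topics = orders\.placed, payments\.authorized
# Keep internal and CDC streams out even if someone widens the allowlist.
commerce->backbone.topics.exclude = .*\.internal\..*, cdc\..*
backbone->commerce.enabled = false
```

The direction of the arrow is the design decision: promotion is an explicit, curated act, not bidirectional synchronization.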
Build reconciliation into migration
This is where many event streaming migrations become naive. During topology changes, some consumers will temporarily read from old topics, some from new promoted topics, and some from both. You need reconciliation patterns:
- idempotent consumer behavior
- event keys and stable business identifiers
- duplicate detection windows
- periodic snapshot comparison
- compensating repair jobs
- compacted “current state” topics for reference alignment
Event streaming is not self-healing simply because it is asynchronous. In migration, inconsistency is not a bug; it is a phase. Your architecture must acknowledge it.
Progressive strangler example
A sensible sequence looks like this:
- classify existing topics by owner and semantic type
- establish schema and ownership metadata
- create domain brokers for priority bounded contexts
- publish curated integration events to a new backbone
- migrate consumers from legacy shared topics to curated contracts
- retire ambiguous or duplicate streams
- add reconciliation checks until parity is trusted
The point is not to move everything fast. The point is to move meaning into the right place.
Enterprise Example
Consider a global retailer with e-commerce, store operations, supply chain, finance, and customer loyalty platforms.
They started with one large Kafka estate. At first, it was a success. Teams could publish quickly. Data engineering loved the access. New microservices subscribed to order, inventory, and customer topics without asking permission from central integration teams.
Then success became entropy.
The customer-updated topic had four producers: CRM, loyalty, online account management, and an MDM feed. Each used a slightly different schema. Some events represented actual customer changes; others were denormalized snapshots. Inventory topics mixed warehouse stock deltas with website availability projections. Finance consumers subscribed to order events and quietly built revenue logic outside the Finance bounded context.
The topology was centralized, but the real failure was semantic ownership.
The retailer changed course with a federated model:
- Commerce, Supply Chain, Finance, and Customer each got domain-aligned broker boundaries.
- A central event backbone remained for curated enterprise events.
- Domain architecture boards identified authoritative publishers by bounded context.
- CDC topics were reclassified as internal technical streams unless explicitly promoted.
- Enterprise event catalog entries became mandatory for promoted topics.
- Reconciliation jobs compared promoted order and payment events against finance ledger postings daily during transition.
The result was not fewer events. It was fewer ambiguous events.
OrderPlaced remained owned by Commerce. PaymentCaptured belonged to Finance. InventoryReserved was internal to Fulfillment and only InventoryAvailabilityChanged was promoted outward. Customer profile changes were split into domain-specific events rather than one mythical universal customer topic.
Operations improved too. A supply chain traffic spike during the holiday season no longer threatened all consumer workloads. Blast radius shrank because high-throughput local processing stayed local. The shared backbone carried business-significant contracts, not every heartbeat of the machine.
That is the real enterprise lesson: topology works when it mirrors organizational responsibility and business language, not just network design.
Operational Considerations
A topology choice is only credible if it can be run in anger.
Multi-tenancy and quotas
Shared environments need hard quotas, retention tiers, partition standards, and client controls. Without them, one team’s “temporary replay” becomes everyone else’s outage.
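As one concrete lever, Kafka's client quota mechanism can cap an individual client's throughput before it starves neighbors. A sketch using the standard kafka-configs tool; the entity name and broker address are placeholders.

```shell
# Cap a known-heavy client's produce/fetch rates and request time.
# Entity name and broker address are placeholders; the quota keys are
# standard Kafka client quota configs.
bin/kafka-configs.sh --bootstrap-server kafka:9092 --alter \
  --entity-type clients --entity-name analytics-replayer \
  --add-config 'producer_byte_rate=1048576,consumer_byte_rate=5242880,request_percentage=50'

# Inspect what is currently enforced for that client.
bin/kafka-configs.sh --bootstrap-server kafka:9092 --describe \
  --entity-type clients --entity-name analytics-replayer
```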
Observability
You need visibility at three levels:
- platform health: brokers, partitions, lag, replication, disk, throughput
- contract health: schema compatibility, consumer group behaviors, topic ownership
- business health: event freshness, missing business milestones, reconciliation drift
Most teams stop at the first level and then wonder why the business does not trust eventing.
Security and access control
Centralized topologies simplify policy distribution but broaden the risk surface. Domain-aligned topologies support least-privilege access more naturally, especially when events include regulated data.
For enterprise-grade streaming, access should reflect domain contracts, not just topic patterns.
Data retention and compaction
Retention policies should match event purpose. Domain event streams used for replay need different policies than transient integration notifications or large raw CDC logs. Compacted topics are useful for reference state and reconciliation but should not become a lazy substitute for proper event modeling.
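As a concrete illustration, these purposes map to different standard topic-level settings. Topic names below are placeholders.

```properties
# orders.placed — replayable domain event stream (placeholder name):
# time-based cleanup with a 30-day replay window.
cleanup.policy=delete
retention.ms=2592000000

# customers.current-state — compacted reference topic (placeholder name):
# compaction keeps the latest record per key instead of a time window.
cleanup.policy=compact
# compact more eagerly so reconciliation reads fresher state
min.cleanable.dirty.ratio=0.1
```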
Schema management
Use schema registries, compatibility rules, and publication standards. More importantly, ensure schema review checks business semantics, not just Avro syntax or JSON shape. A technically compatible schema can still be semantically destructive.
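The structural half of that review is easy to sketch; the harder, human half is judging whether an unchanged field quietly changed meaning. Illustrative only, with schemas reduced to flat field-to-type maps rather than real Avro:

```python
def breaking_changes(old, new):
    """Compare two flat field->type maps (a stand-in for a registry
    compatibility check) and report changes that break existing readers.
    Added fields are tolerated; removed fields and type changes are not."""
    problems = []
    for field, ftype in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != ftype:
            problems.append(f"type change on {field}: {ftype} -> {new[field]}")
    return problems

old = {"order_id": "string", "total": "decimal", "placed_at": "timestamp"}
new = {"order_id": "string", "total": "string",
       "placed_at": "timestamp", "channel": "string"}
print(breaking_changes(old, new))
# ['type change on total: decimal -> string']
```

No check like this will catch a "total" that silently switched from net to gross while keeping its type, which is exactly why schema review must include a human who owns the business meaning.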
Tradeoffs
Every topology pays somewhere.
Centralized topology tradeoffs
Benefits
- simple onboarding
- unified tooling
- easier enterprise governance
- efficient platform team model
Costs
- larger blast radius
- semantic crowding
- shared-cluster contention
- risk of hidden coupling
- harder domain isolation
Domain-aligned topology tradeoffs
Benefits
- strong ownership
- bounded-context alignment
- smaller operational failures
- local optimization
Costs
- higher operational overhead
- harder discovery
- cross-domain consumer complexity
- more replication and policy plumbing
Federated topology tradeoffs
Benefits
- balances autonomy and consistency
- keeps local streams local
- enables curated enterprise contracts
- supports progressive migration
Costs
- more architectural discipline required
- bridge and replication design adds complexity
- ownership disputes become explicit
- tooling investment is non-trivial
This last point matters. Federated topology is not a cheap default. It is the best option for many enterprises because it makes the difficult things visible instead of burying them inside a shared cluster.
Failure Modes
Architectures usually fail in familiar ways.
The universal bus failure
Everything goes onto one shared backbone. Topic count explodes. Ownership fades. Consumers bind to internal implementation events. A schema change becomes a political incident. The platform team becomes a central bottleneck.
The broker-per-team failure
Every team gets autonomy and promptly creates a local kingdom. Cross-domain integration becomes painful. Duplicate events proliferate. Nobody can trace end-to-end business flow without custom glue.
The replication-everything failure
Organizations attempt federation by mirroring all topics across all clusters. Costs rise, semantics blur, and incident diagnosis becomes miserable because nobody knows which copy is authoritative.
The CDC-is-an-event-model failure
Database change streams are useful. They are not a substitute for domain events. When CDC becomes the integration contract, consumers inherit storage semantics instead of business meaning.
The no-reconciliation failure
During migration or recovery, teams assume replay is enough. It is not. Ordering gaps, duplicate deliveries, missed promotions, and consumer bugs leave divergent states. Without reconciliation, inconsistency just becomes institutionalized.
When Not To Use
Not every organization needs a sophisticated broker topology strategy.
Do not invest heavily in federated event broker design when:
- you have a small system with a handful of services
- event volume and organizational scale are low
- you do not yet have clear bounded contexts
- your platform engineering capability is immature
- your primary need is simple asynchronous messaging, not streaming
- governance discipline is weak and unlikely to improve soon
In these cases, a well-run centralized broker is often the right starting point. Better a simple shared topology with clear ownership than an ambitious federation nobody can operate.
Likewise, if your use case is mainly command messaging, transactional workflows, or low-volume integration, a message broker or workflow engine may be a better fit than Kafka-style event streaming. Not every asynchronous problem deserves a log.
Related Patterns
Several patterns sit naturally alongside broker topology decisions.
- Bounded Context: defines who owns event meaning
- Published Language: promoted integration events should use a stable shared language
- Anti-Corruption Layer: protects a domain from external semantic leakage
- Outbox Pattern: reliable publication from transactional systems
- CQRS: consumer projections built from event streams
- Event Carried State Transfer: useful, but dangerous if overused across contexts
- Strangler Fig Pattern: progressive migration from shared legacy topology
- Saga / Process Manager: orchestration across domains when events alone are insufficient
- Reconciliation Pipeline: compares and repairs state across systems after asynchronous divergence
These patterns matter because topology alone does not solve coupling. It merely shapes where coupling hides.
Summary
Broker topology is one of the quiet decisions that eventually becomes a loud problem.
A centralized event streaming platform gives you speed, convenience, and governance leverage. It also invites semantic chaos if ownership is weak. A domain-aligned topology gives you autonomy and bounded-context integrity, but can become fragmented and hard to consume at enterprise scale. A federated topology, in my view, is the best fit for most large organizations: domain-local brokers for local truth, a curated integration backbone for cross-domain collaboration, and explicit promotion of events that deserve broader use.
The key is to think in domain terms, not just infrastructure terms.
Ask who owns the fact. Ask whether an event is local or published. Ask how consumers will reconcile when reality drifts. Ask what fails together. Ask what should never have been coupled in the first place.
Because event streaming architecture is not really about brokers.
It is about deciding how truth moves through the enterprise without losing its meaning on the way.
Frequently Asked Questions
What is event-driven architecture?
Event-driven architecture (EDA) decouples services by having producers publish events to a broker like Kafka, while consumers subscribe independently. This reduces direct coupling, improves resilience, and allows new consumers to be added without modifying producers.
When should you use Kafka vs a message queue?
Use Kafka when you need event replay, high throughput, long retention, or multiple independent consumers reading the same stream. Use a traditional message queue (RabbitMQ, SQS) when you need simple point-to-point delivery, low latency, or complex routing logic per message.
How do you model event-driven architecture in ArchiMate?
In ArchiMate, the Kafka broker is a Technology Service or Application Component. Topics are Data Objects or Application Services. Producer/consumer services are Application Components connected via Flow relationships. This makes the event topology explicit and queryable.