Your Event Bus Became Your Monolith


There is a particular kind of architecture failure that looks modern right up until the day it doesn’t.

It starts with good intentions. Teams want autonomy. They want decoupling. They want to stop making direct service-to-service calls that turn every change into a negotiation. So they introduce an event bus. Kafka, perhaps. Or a cloud pub/sub platform. They publish domain events. They subscribe independently. They celebrate the disappearance of synchronous dependencies.

Then, a year later, the event bus is no longer plumbing. It is government.

Every meaningful business change now passes through shared topics, shared schemas, shared retention policies, shared routing conventions, shared dead-letter handling, shared enrichment steps, shared platform teams, and shared operational rituals. A payment change cannot ship until six consumers agree. A customer event cannot evolve without a data council. A retry storm in one part of the landscape ripples into others. The bus has become the place where business meaning goes to be flattened into infrastructure concerns.

The old monolith used to centralize code. The new one centralizes dependency.

That is the heart of central dependency architecture: a system that appears distributed at runtime, but is organizationally and semantically centralized through a shared integration backbone. You can have fifty microservices and still have one monolith. It just happens to be made of topics, contracts, and coordination meetings.

This article is about that failure mode. More importantly, it is about what to do instead.

The short version is simple: an event bus is an excellent integration mechanism, but a poor center of gravity. If your business model, domain semantics, and service responsibilities are being pulled into the bus, you have not eliminated coupling. You have displaced it into a harder-to-see place.

Context

The rise of event-driven architecture came from a real problem. Enterprises built tightly coupled systems around request-response interactions and centralized databases. Everything knew too much about everything else. Change propagated in ugly ways. Release trains became brittle. Scaling one capability meant scaling five.

Event streaming platforms promised a cleaner model. Services would own their own data. Changes in state would be emitted as events. Downstream consumers would react asynchronously. Teams could move independently. Analytics, operational systems, search, notifications, fraud, and compliance could all subscribe without burdening the source system.

That promise is still valid.

But a lot of organizations confuse using events with designing around domains. They replace chatty APIs with chatty topics. They move from direct coordination to indirect coordination. And because event infrastructure feels neutral, they start treating it as a universal integration surface. Every cross-cutting concern gets implemented there. Every new initiative gets solved with “publish another event.”

It is the same old enterprise instinct. We centralize where we do not yet trust local ownership.

This is where domain-driven design matters. DDD is not a naming exercise. It is a discipline for deciding where meaning lives. If bounded contexts are weak, then your event bus becomes the only place left where systems can discover one another’s truths. And once the bus becomes the place where truths are negotiated, it becomes the architecture.

Problem

The core problem is not Kafka. Kafka is usually doing exactly what it was built to do.

The problem is this: the organization uses the event bus as a shared semantic dependency hub, not merely as a transport. Services become producers and consumers of a common enterprise event model. Business workflows spread across too many subscribers. Ownership dissolves. Versioning becomes political. Infrastructure teams become de facto domain arbiters. Runtime decoupling masks design-time coupling.

A few symptoms are reliably present:

  • Core topics are consumed by dozens of services.
  • Event schemas are broad, generic, and enterprise-wide.
  • Teams ask “who else consumes this?” before changing local behavior.
  • The platform team becomes a change approval board.
  • Retry and replay semantics are not domain-specific; they are infrastructure-default.
  • Consumers depend on event ordering and delivery assumptions they do not control.
  • Reconciliation jobs become the hidden backbone of correctness.
  • End-to-end business flows cannot be understood without tracing many unrelated subscriptions.

At that point, the bus is not connecting bounded contexts. It is replacing them.

A real monolith has one advantage: at least its coupling is visible in the codebase. Event-bus monoliths hide coupling in topology, timing, schemas, and operator knowledge. They are harder to reason about, harder to test, and much harder to migrate.

Forces

Architects get into this situation because the forces are real and often contradictory.

Team autonomy vs enterprise consistency

Teams want the freedom to evolve independently. Meanwhile, enterprises want a standard integration model, a common audit trail, shared security controls, and lower connection costs. An event bus appears to satisfy both. But the more consistency you push into the bus, the less autonomy remains in the services.

Domain semantics vs canonical models

There is always pressure to create a canonical enterprise event format: customer, order, product, payment. It sounds efficient. In practice, canonical models usually erase context.

A CustomerUpdated event means one thing in CRM, another in billing, and another in identity management. A bounded context needs language that fits its own invariants. Canonical events often become vague enough to be widely shared and precise enough for no one.

Flexibility vs correctness

Events allow consumers to build their own projections. That is useful. But when every capability depends on consuming and interpreting another context’s events correctly, correctness becomes distributed. You no longer have one transaction boundary or one place to enforce invariants. You have a chain of assumptions and lag.

Platform efficiency vs local responsibility

A central platform team can provide brokers, schema registries, security, and observability. Good. But when it also defines topic strategy, event taxonomy, retention defaults, replay policy, and consumer conventions for everyone, enterprise architecture leaks into platform governance. The platform starts making domain decisions accidentally.

Temporal decoupling vs operational uncertainty

Asynchronous systems absorb spikes and isolate failures. They also create delay, duplication, reordering, replay hazards, poison messages, and ambiguous states. If business processes cannot tolerate those conditions, an evented integration style may not be the right center of the design.

Solution

The solution is not “stop using an event bus.” That would be lazy advice.

The solution is to demote the bus.

Make the event platform a transport and integration capability, not the primary home of enterprise semantics. Put business meaning back into bounded contexts. Let services publish events that reflect their own domain language, not a watered-down enterprise abstraction. Use events for notification and state propagation, but not as a substitute for explicit ownership.

A useful rule is this:

> Services should own decisions. Events should report facts. The bus should not own the workflow.

That means several architectural moves.

1. Design bounded contexts before topic hierarchies

If your first design artifact is the Kafka topic map, you are already in trouble. Start with domain boundaries, aggregates, upstream/downstream relationships, and business invariants. Ask which context owns which decision. Then choose integration patterns.

Not every relationship should be event-driven. Some are commands. Some are synchronous queries. Some are batch extracts. Some should not be integrated at all.

2. Prefer context-specific events over enterprise-wide canonical events

A shipping context may publish ShipmentDispatched. A billing context may publish InvoiceIssued. A customer support context may publish CaseEscalated. These are meaningful because they arise from a specific model and a specific owner.

Do not force all of them into some canonical BusinessObjectChanged family. Generic events spread ignorance.
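To make the contrast concrete, here is a minimal sketch in Python. The event names come from the article; the field names and the `BusinessObjectChanged` envelope are illustrative assumptions, not any real schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Context-specific events: each carries the producing context's own
# language and a specific, bounded meaning.

@dataclass(frozen=True)
class ShipmentDispatched:      # owned by the Fulfillment context
    shipment_id: str
    carrier: str
    dispatched_at: datetime

@dataclass(frozen=True)
class InvoiceIssued:           # owned by the Billing context
    invoice_id: str
    order_id: str
    amount_cents: int
    currency: str

# The anti-pattern: one generic envelope every consumer must interpret.
@dataclass(frozen=True)
class BusinessObjectChanged:   # meaning lives nowhere in particular
    object_type: str           # "order"? "shipment"? "invoice"?
    object_id: str
    status: str                # every consumer guesses what this means
    payload: dict              # drifts toward a shadow database record
```

A consumer of `ShipmentDispatched` knows exactly what happened and who decided it. A consumer of `BusinessObjectChanged` has to infer both.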

3. Keep orchestration close to domain ownership

If a business process spans contexts, someone must be responsible for its state and outcomes. That may be a process manager, a workflow component, or a domain service. But it should exist in a bounded context, not in the emergent behavior of twelve independent subscribers.

Choreography is elegant in small doses. In large enterprises, it often becomes gossip with side effects.
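A process manager can be very small and still pay for itself by making state transitions explicit. The sketch below is illustrative: the class name, states, and event names are assumptions, and a real implementation would persist state and emit follow-up commands.

```python
# Minimal process-manager sketch: one owner for a cross-context flow,
# with every valid transition written down and every invalid one rejected.

class OrderProcess:
    """Owns the state of one order's progression across contexts."""

    TRANSITIONS = {
        ("placed", "PaymentAuthorized"): "paid",
        ("paid", "ShipmentDispatched"): "shipped",
        ("shipped", "DeliveryConfirmed"): "completed",
    }

    def __init__(self, order_id: str):
        self.order_id = order_id
        self.state = "placed"

    def handle(self, event_type: str) -> str:
        """Apply an incoming fact; reject transitions that make no sense."""
        key = (self.state, event_type)
        if key not in self.TRANSITIONS:
            raise ValueError(f"{event_type} not valid in state {self.state}")
        self.state = self.TRANSITIONS[key]
        return self.state
```

Contrast this with twelve independent subscribers: here the legal order of milestones is testable in one place, instead of being an emergent property of the topology.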

4. Use published language, not shared internals

An event should expose what the producing context is willing to publish, not leak all internal state because “someone might need it.” Fat events create accidental coupling. Thin but meaningful events, plus well-defined APIs or data products where needed, create healthier relationships.

5. Build reconciliation as a first-class capability

Distributed event processing will drift. Messages will fail, arrive late, be replayed, or be interpreted against newer rules. So stop pretending eventual consistency is free. Design reconciliation explicitly: authoritative sources, replay windows, mismatch detection, compensations, and operational ownership.

This is not a side utility. It is part of the architecture.
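Mismatch detection, at its core, is a comparison between an authoritative source and a downstream projection. A minimal sketch, assuming both sides can be snapshotted as id-to-record mappings:

```python
# Reconciliation sketch: compare an authoritative source against a
# downstream projection and report drift explicitly.

def drift_report(authoritative: dict, projection: dict) -> dict:
    """Return ids missing from the projection, stale extras, and mismatches."""
    missing = sorted(set(authoritative) - set(projection))
    extra = sorted(set(projection) - set(authoritative))
    mismatched = sorted(
        k for k in set(authoritative) & set(projection)
        if authoritative[k] != projection[k]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}
```

The hard part in practice is not this loop; it is deciding which side is authoritative, how often to run it, and who owns the compensations it triggers.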

Architecture

Here is the contrast in simple form.

Diagram 1: Central dependency architecture. Every service connects through a single shared enterprise bus.

This diagram looks decoupled. It is often the opposite. The center is overloaded with policy, semantics, and coordination. Every edge is cheap to add, so dependencies accumulate. The bus becomes a dependency magnet.

A healthier architecture looks more like this:

Diagram 2: Domain-centered architecture. Bounded contexts own their semantics and processes; the bus only carries published facts.

Notice the difference. The cross-context process has an owner. The bus still exists, but it is not the brain. Domain responsibilities remain anchored in contexts.

Domain semantics and event design

This is the part many teams skip because infrastructure feels more concrete than semantics.

An event is a statement about something that happened in a context. It should carry meaning that is stable enough for other contexts to react to, but not so much internal detail that the producer is trapped forever.

For example:

  • OrderPlaced in Sales means the customer committed to purchase intent according to Sales rules.
  • PaymentAuthorized in Billing means funds were reserved according to Billing rules.
  • ShipmentDispatched in Fulfillment means handoff occurred to a carrier under Fulfillment rules.

Those events are not interchangeable. They do not represent “status updates” on a shared enterprise object. They are milestones in different models.

This matters because consumers should react to the published language, not infer hidden state transitions. If another context needs richer information, give it an explicit integration path rather than bloating the event until it becomes a shadow database record.

Kafka in the right role

Kafka is excellent when you need durable logs, replay, partitioned scaling, and independent consumers. It is particularly strong for:

  • immutable event streams
  • operational and analytical fan-out
  • CQRS projections
  • integration with stream processors
  • buffering high-throughput change propagation

Kafka is much weaker as a place to centralize business workflow logic. Once teams rely on topic ordering across entities, cross-topic transaction assumptions, or shared interpretation of broad schemas, they are fighting the platform.

Kafka gives you an append-only log. It does not give you domain boundaries. Architects must supply those.

Migration Strategy

Most enterprises do not get to redesign from scratch. They already have the event bus monolith. The migration question is how to reduce central dependency without breaking the estate.

This is where a progressive strangler migration is the practical answer. Not a revolution. A sequence of ownership corrections.

Step 1: Map semantic hotspots

Start by identifying topics and events with the highest coordination cost:

  • most consumers
  • most schema version disputes
  • highest incident correlation
  • greatest replay sensitivity
  • business-critical workflows spanning many subscribers

These are your semantic hotspots. They are usually where enterprise-wide “customer,” “order,” or “account” topics have become universal dependencies.

Step 2: Identify true bounded context owners

For each hotspot, ask uncomfortable questions:

  • Which team truly owns this decision?
  • Which invariants belong together?
  • Which consumers are using the event as a convenient data feed rather than a genuine business trigger?
  • Which downstream dependencies should become APIs, replicated read models, or separate data products?

This is DDD in anger, not in workshop form.

Step 3: Introduce anti-corruption layers

You can rarely cut all consumers over at once. So create anti-corruption layers around legacy central topics. Let new or refactored services consume context-specific events and translate as needed for old consumers. Over time, move consumers off the enterprise-wide topics.

Step 4: Pull orchestration out of emergent choreography

Where a business process is currently spread across many listeners, establish an explicit process owner. This might be a saga orchestrator, process manager, or workflow service within a domain context. It can still use events, but it now makes the state transitions visible and testable.

Step 5: Add reconciliation before aggressive decoupling

Migration creates temporary duplication and ambiguity. Introduce reconciliation early:

  • source of truth comparisons
  • idempotency keys
  • drift reports
  • replay procedures
  • compensation workflows

Without reconciliation, migration will look successful until financial or compliance mismatches surface three months later.

Step 6: Retire canonical topics gradually

Do not try to switch off major topics in one move. Freeze them first. Stop adding new consumers. Publish replacement context-specific events. Provide migration adapters. Measure consumer reduction. Then decommission.

Here is a practical migration shape:

Diagram 3: Progressive retirement of a canonical topic. Freeze, publish replacements, adapt legacy consumers, measure consumer reduction, decommission.

This is not glamorous. Good migration rarely is. It is a campaign of reducing ambiguity.

Enterprise Example

Consider a global retailer with online commerce, stores, loyalty, finance, warehouse management, and customer support. Over several years, they standardized on Kafka. A central architecture team created enterprise topics: Customer, Order, Product, Inventory, Payment, each with broad Avro schemas and organization-wide governance.

Initially, this looked like success. Teams integrated faster than before. Search indexed customer and order changes. Marketing subscribed to customer updates. Fraud subscribed to payment and order events. Stores consumed inventory. Finance built reporting streams. Support built timelines. Everyone was happy because everyone could subscribe.

Then scale and change hit.

The loyalty team wanted a new rule: points should only accrue after shipment, not after payment. But loyalty was listening to OrderUpdated and PaymentUpdated, inferring business milestones from generic status fields. The fulfillment domain had no explicit published event for dispatch, so loyalty asked for one more field on the enterprise order event.

At the same time, finance required stricter invoice semantics due to a regulatory change in two countries. They needed to distinguish payment authorization, capture, and settlement. The enterprise Payment topic had one broad “status” field used differently across regions. Updating it triggered review across fraud, reporting, customer support, and mobile apps.

Then a replay incident happened. A consumer bug in the notification system required replaying order events. Several downstream consumers had not implemented idempotency correctly. Customers received duplicate shipment messages. Loyalty double-counted points for a subset of orders. A compensation batch corrected balances later, but support volumes spiked and the CFO lost confidence in “eventual consistency.”

The retailer’s real problem was not Kafka throughput. It was semantic centralization.

The architecture correction looked like this:

  • Sales owned order placement and customer purchase intent.
  • Billing owned payment lifecycle and invoice issuance.
  • Fulfillment owned pick-pack-ship milestones.
  • Loyalty stopped inferring from generic order updates and subscribed to specific published events relevant to accrual rules.
  • A process manager in Sales coordinated order progression across payment and fulfillment, while still using asynchronous communication where appropriate.
  • Canonical topics remained for analytics and legacy consumers during transition, but new operational consumers were prohibited from attaching to them.
  • Reconciliation compared loyalty balances, shipped orders, and invoice records daily and after replay operations.

The result was not fewer events. In fact, there were more streams. But there was less ambiguity. Teams stopped negotiating one giant shared meaning and returned to owning local truths.

That is the kind of trade the enterprise should make: more explicitness, less accidental centralization.

Operational Considerations

Once you reduce central dependency, operations improve—but only if you respect the realities of event-driven systems.

Observability must follow business flows

Tracing infrastructure metrics is not enough. You need visibility into domain milestones:

  • orders placed but not paid within threshold
  • payments authorized but not fulfilled
  • shipments dispatched but notifications unsent
  • invoices issued with no ledger posting

A healthy event architecture measures business lag, not just consumer lag.
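Business lag is measurable with nothing more than milestone timestamps. A minimal sketch for the first bullet, assuming each context's milestones can be queried as order-id-to-timestamp maps:

```python
from datetime import datetime, timedelta

# Sketch: measure business lag, not just consumer lag. Flag orders
# placed more than `threshold` ago that still have no payment milestone.

def overdue_payments(placed: dict, paid: dict, now: datetime,
                     threshold: timedelta) -> list[str]:
    """Order ids placed but not paid within the threshold."""
    return sorted(
        order_id for order_id, placed_at in placed.items()
        if order_id not in paid and now - placed_at > threshold
    )
```

The same shape works for the other milestone gaps: authorized-but-not-fulfilled, dispatched-but-not-notified, issued-but-not-posted.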

Idempotency is not optional

Every meaningful consumer should be designed for duplicate delivery and replay. This means stable identifiers, deduplication strategy, and side-effect control. “Exactly once” is a dangerous phrase in enterprise architecture because business side effects often extend beyond the broker.
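The core of an idempotent consumer is a dedup check on a stable event id before any side effect fires. A minimal in-memory sketch; a real system would persist the processed-id set durably and atomically with the side effect.

```python
# Idempotent consumer sketch: duplicates and replays become no-ops.

class IdempotentConsumer:
    def __init__(self, side_effect):
        self.processed = set()        # stand-in for a durable dedup store
        self.side_effect = side_effect

    def handle(self, event: dict) -> bool:
        """Run the side effect once per event id; return False on duplicates."""
        event_id = event["event_id"]  # stable identifier from the producer
        if event_id in self.processed:
            return False
        self.side_effect(event)
        self.processed.add(event_id)
        return True
```

Note the ordering problem hiding in even this toy: if the process crashes between the side effect and recording the id, the event runs twice on replay. That is why "exactly once" claims at the broker do not cover business side effects.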

Partitioning strategy should respect domain boundaries

Kafka partitioning by a stable aggregate identifier can preserve local ordering where needed. But architects should be clear: ordering is a scoped tool, not a universal guarantee. If your process requires total ordering across many entities, you may be designing the wrong interaction model.
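The mechanism behind per-aggregate ordering is simple: hash a stable key to a partition, so all events for one aggregate land on one partition in order. A sketch of the principle (Kafka's default partitioner uses murmur2 rather than SHA-256, but the idea is the same):

```python
import hashlib

# Key-based partitioning sketch: a stable aggregate id always maps to
# the same partition, preserving ordering per aggregate only.

def partition_for(aggregate_id: str, num_partitions: int) -> int:
    digest = hashlib.sha256(aggregate_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

The limitation is visible in the signature: ordering holds within one key's partition, and changing `num_partitions` remaps keys, so nothing here gives you total ordering across aggregates.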

Schema evolution needs ownership

Schema registries help, but they do not solve semantic drift. Producers own meaning. Consumers own tolerance. Backward compatibility rules should exist, but they should not become an excuse for never revisiting poor event design.
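One concrete form of consumer tolerance is the tolerant-reader style: extract only the fields you depend on and ignore everything else, so producers can add fields without breaking you. A sketch with illustrative field names:

```python
# Tolerant-reader sketch: the consumer names its required fields and
# silently tolerates anything extra the producer adds later.

REQUIRED = ("invoice_id", "amount_cents")

def read_invoice_issued(payload: dict) -> dict:
    missing = [f for f in REQUIRED if f not in payload]
    if missing:
        raise ValueError(f"payload missing required fields: {missing}")
    # Take only what this context needs; ignore unknown additions.
    return {f: payload[f] for f in REQUIRED}
```

This keeps the consumer's real dependency surface explicit, which is also what makes it honest in a schema review: the registry enforces compatibility, but this list documents meaning.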

Reconciliation needs a runbook

Reconciliation is often treated as an ugly back-office concern. That is a mistake. You need explicit procedures for:

  • partial outages
  • poison messages
  • replay windows
  • compensating actions
  • legal or financial restatement

If the only answer to data drift is “we’ll replay Kafka,” you have not finished the design.

Tradeoffs

Let’s be blunt: moving away from central dependency architecture is not free.

What you gain

  • clearer ownership
  • stronger domain semantics
  • lower design-time coupling
  • more understandable workflows
  • safer schema evolution
  • fewer invisible dependencies
  • more credible autonomy for teams

What you pay

  • more explicit integration design
  • less convenience for ad hoc consumers
  • more events with narrower meaning
  • occasional need for APIs alongside events
  • more work in process ownership
  • more investment in reconciliation and observability

This is a classic architecture trade. Central dependency feels efficient at first because it lowers the local cost of connecting things. It raises the global cost of changing them. Domain-centered design raises the local thought required. It lowers the global coordination tax.

I know which bill I’d rather pay.

Failure Modes

Even after you see the problem, there are several ways to fix it badly.

Replacing one central model with another

Some organizations retire a canonical topic only to introduce a “domain event council” that standardizes every event across all contexts. Same disease, different committee.

Over-orchestrating everything

Once teams discover emergent choreography is messy, they sometimes push every flow into a giant orchestration engine. That can recreate a monolith in workflow form. Not every interaction needs a central process manager. Only long-running, business-critical flows with real coordination responsibility do.

Confusing data replication with domain integration

Subscribing to another context’s events to build a read model is fine. Using those events to make decisions about invariants you do not own is where trouble starts.

Ignoring temporal failure modes

Architects often draw event-driven systems as if messages move instantly and reliably. Reality includes lag, duplicates, backlog, poison payloads, rebalances, partition skew, and human error in replay. Designs that ignore time become incident factories.

Treating reconciliation as a cleanup script

Reconciliation is not a hack for poor architecture. It is an essential safeguard in distributed systems. If it has no owner, no SLA, and no audit trail, it will fail at the worst moment.

When Not To Use

There are cases where event-driven integration should not be the primary architectural style.

Do not center the design on an event bus when:

  • the business process requires strong synchronous validation across multiple capabilities
  • users cannot tolerate ambiguity or delay in critical outcomes
  • the domain is small enough that a modular monolith would be simpler and safer
  • the organization lacks mature operational discipline for asynchronous systems
  • there is no clear bounded context ownership and no appetite to create it
  • the real need is simple request-response collaboration, not state propagation

And here is the uncomfortable one: if your teams are not capable of owning their domains, microservices with Kafka will not rescue you. They will just distribute your confusion.

Sometimes a modular monolith is the right answer. Strong module boundaries, a single deployable, and clear domain ownership can outperform a fleet of poorly bounded services hanging off a central bus. Architecture is not a morality play where distributed automatically means better.

Related Patterns

Several patterns work well alongside this approach.

Bounded Contexts

The foundation. Without clear context boundaries, event design becomes generic and integration becomes political.

Anti-Corruption Layer

Essential in migration. It protects new models from legacy canonical event semantics.

Saga / Process Manager

Useful when a long-running business process needs explicit ownership across contexts. Use carefully. It should clarify responsibility, not centralize all behavior.

Outbox Pattern

Helpful for reliably publishing domain events alongside local state change. It reduces dual-write hazards.
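The dual-write hazard and its fix fit in a few lines. A minimal in-memory sketch: the class and method names are illustrative, and the single in-memory commit stands in for a real database transaction spanning the state table and the outbox table.

```python
# Outbox pattern sketch: the state change and the outgoing event are
# committed together, then a relay drains the outbox to the broker.

class OrderStore:
    def __init__(self):
        self.orders = {}
        self.outbox = []   # would be a table in the same database

    def place_order(self, order_id: str, total_cents: int):
        # In a real DB, both writes sit in one transaction: they
        # succeed or fail together, so no event is ever orphaned.
        self.orders[order_id] = {"total_cents": total_cents}
        self.outbox.append({"type": "OrderPlaced", "order_id": order_id})

    def relay(self, publish):
        """Drain the outbox to the broker; safe to retry."""
        while self.outbox:
            publish(self.outbox.pop(0))
```

Because the relay may retry after a crash, the broker can see duplicates, which is one more reason downstream consumers must be idempotent.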

CQRS

A good fit for building projections from published events. But remember: read model replication is not the same as domain ownership.

Data Mesh and Data Products

Relevant on the analytical side. Operational domain events can feed analytical data products, but analytical convenience should not dictate operational semantics.

Summary

An event bus is a powerful thing. That is exactly why it so often becomes dangerous.

When an enterprise uses the bus as the easiest place to connect systems, coordinate change, expose shared data, and negotiate semantics, it creates a hidden monolith. Not one of code, but one of dependency. The topology looks distributed; the organization does not. Teams still wait on one another. Meaning still centralizes. Failures still propagate in surprising ways.

The cure is not to reject events, Kafka, or microservices. The cure is to restore architectural gravity to the domain.

Start with bounded contexts. Publish context-specific events in a clear published language. Keep workflow ownership explicit. Use the event platform as transport, not government. Build reconciliation because reality is messy. Migrate progressively with anti-corruption layers and strangler moves, not with a heroic rewrite.

If you remember one line, remember this:

A shared bus should carry facts, not carry your architecture.

The moment it becomes the place where every important business change must be negotiated, your event bus has become your monolith. And monoliths, whatever shape they take, always send the same bill in the end.
