Streaming Architecture Without Semantics Breaks Slowly


There is a particular kind of architecture failure that does not arrive with sirens.

Nothing catches fire. No database falls over. CPU stays respectable. Dashboards glow green. Teams even congratulate themselves: the event backbone is live, Kafka topics are flowing, microservices are decoupled, and the organization has finally escaped the tyranny of point-to-point integration.

Then, six months later, finance reports numbers no one can reconcile. Customer support sees orders in one system that do not exist in another. Compliance asks a brutally simple question — “what exactly does this event mean?” — and the room goes quiet.

This is how streaming architectures usually fail in large enterprises: not as an outage, but as a slow semantic leak.

The machinery works. The meaning does not.

That is the central mistake in many event-driven programs. Teams spend enormous effort on throughput, partitioning, retention, schemas, connectors, topics, and platform automation. Those things matter. But topology without semantics is just fast confusion. A message broker can move facts around the company at enormous speed; it cannot tell you what a fact is, when it becomes true, which bounded context owns it, or whether another team is allowed to reinterpret it.

In other words: event streaming solves movement. It does not solve meaning.

And in enterprise architecture, meaning is the expensive part.

This article is about that fault line: why streaming architecture without explicit domain semantics breaks slowly, how event meaning and topology must be designed together, and how to migrate toward a semantically coherent event-driven architecture without detonating your current estate. We will look at domain-driven design, Kafka, microservices, progressive strangler migration, reconciliation, failure modes, and the kind of tradeoffs that appear only when real business units and legacy systems are involved.

Context

The modern enterprise has embraced streaming for good reasons.

Batch integration is too slow for digital products. Shared databases are brittle. Synchronous APIs create runtime coupling and turn one system’s bad day into everybody’s bad day. Event streaming platforms such as Kafka offer a compelling alternative: durable logs, replay, fan-out, decoupled consumers, near-real-time propagation, and the seductive possibility of turning integration into a platform capability rather than a bespoke project.

That promise is real.

But many organizations adopt event streaming as an infrastructure pattern before they understand it as a domain pattern. The platform team stands up Kafka. Every application is encouraged — or forced — to “publish events.” Soon the estate is full of topics named after systems, tables, and implementation details:

  • crm_customer_update
  • order_db_changes
  • sap_material_sync
  • billing_status_events
  • shipment_v2_final_final

This is not event-driven architecture. It is distributed data exhaust.

A proper streaming architecture begins with a harder question: what business fact occurred, in whose language, under whose authority, and for what downstream use? That question pushes us straight into domain-driven design. Not because DDD is fashionable, but because enterprises contain multiple models of the same thing, and streaming makes those differences visible.

A customer is not one thing. Sales has a prospect. Billing has an account holder. Risk has a legal entity. Identity has a subject. Support has a contact. If you put CustomerUpdated on a topic without stating which bounded context owns that meaning, you have not simplified the enterprise. You have merely broadcast ambiguity at scale.

Problem

Most streaming programs begin with technical decomposition and postpone semantic decomposition. That is backwards.

The common path goes like this:

  1. Introduce Kafka or another streaming platform.
  2. Connect legacy systems with CDC, connectors, or event wrappers.
  3. Encourage microservices to consume and publish events.
  4. Standardize serialization, often Avro, Protobuf, or JSON Schema.
  5. Declare victory because systems are now “loosely coupled.”

The trouble is that schemas are not semantics.

A schema can tell you a field exists. It cannot tell you whether a business fact is final, provisional, corrected, duplicated, superseded, or inferred. It cannot tell you whether an event represents a domain event, an integration event, a workflow signal, or a database mutation. It cannot tell you whether the source system is authoritative or merely convenient.

This is where slow breakage begins.

One team publishes OrderCreated when the shopping cart is submitted. Another assumes it means credit approval succeeded. A third team treats it as permission to start fulfillment. Later someone discovers that 8% of “created” orders are abandoned during fraud review. The event was technically valid. The architecture was semantically wrong.

The same issue appears with state transfer. CDC streams often become accidental enterprise contracts. A table update in one service is consumed by ten downstream services. They infer meaning from columns and transition logic they do not own. Eventually the producer refactors its model, and the enterprise discovers that what looked like decoupling was really shared implementation by other means.

The core problem is simple:

Streaming architectures often treat events as transport artifacts instead of domain statements.

Once that happens, the topology calcifies around accidental meanings. Consumers encode assumptions. Topics become de facto APIs without lifecycle discipline. New teams cargo-cult existing event names. Reconciliation becomes expensive because no one can explain why two systems disagree in business terms.

A bad service contract hurts one integration. A bad event contract poisons a whole ecosystem.

Forces

This problem persists because powerful forces pull architecture in opposite directions.

1. Platform standardization versus domain specificity

Platform teams want consistency: a small number of topic conventions, schema standards, security patterns, and operational templates. They are right to want this. Enterprises need repeatability.

But domains need nuance. An insurance claim is not a payment authorization. A shipment is not an invoice. The event model should reflect those differences, not flatten them into generic nouns and verbs.

A good platform standardizes mechanics.

A good domain model preserves meaning.

When one consumes the other, trouble starts.

2. Producer autonomy versus enterprise truth

Microservice rhetoric tells us each team owns its service and its data. Again, mostly true. But enterprise processes cross service boundaries. Financial close, compliance, customer servicing, and risk management all require a coherent view of business facts.

So a producer may own publication, but not unilateral interpretation. If one service emits PolicyBound, downstream teams need confidence that this fact has the same business force every time. Otherwise autonomy becomes semantic drift.

3. Event immediacy versus eventual consistency

Streaming shines when facts can be propagated quickly and acted on independently. Yet many business facts are not final at the moment they are emitted. Payment accepted may later be reversed. Product availability may be tentative. Customer onboarding may be pending KYC.

This does not invalidate event-driven design. It means events must express lifecycle and certainty honestly. Architects get into trouble when they optimize for immediacy and hide business provisionality.

4. Legacy constraints versus target-state purity

No enterprise starts clean. Core systems remain on mainframes, ERP suites, aging relational applications, and vendor packages. Migration has to work around what exists.

That means teams often begin with CDC, integration events derived from legacy transactions, or anti-corruption layers. Purists complain this is not ideal. They are right, and also irrelevant. Migration architecture lives in the realm of tradeoffs.

The real question is not “is CDC pure?”

It is “under what conditions can CDC be used without letting database structure become enterprise meaning?”

5. Reuse versus bounded context integrity

Everyone wants a canonical model until they have to use one.

A giant shared enterprise event model promises interoperability, but usually produces sterile abstractions that satisfy nobody. The opposite extreme — every team invents its own event language with no mapping discipline — yields chaos.

DDD offers the saner route: bounded contexts with explicit translation. Shared understanding where it is earned, not imposed.

Solution

The solution is not “more governance,” at least not in the usual committee-heavy sense.

The solution is to design streaming around semantic contracts, using domain-driven design as the backbone and topology as the expression of those contracts.

That means several concrete choices.

Define events as business facts, not data changes

A business event says something meaningful happened in the domain:

  • PaymentAuthorized
  • InvoiceIssued
  • ClaimRejected
  • ShipmentDispatched

A data change says some record changed somewhere:

  • row updated
  • status column changed
  • entity replicated

Both may be useful. But they are not the same thing. Treating them as interchangeable is one of the most expensive mistakes in event architecture.

Where data-change streams are necessary, label and isolate them as implementation-level integration mechanisms, not as enterprise business truth.

Tie every event to a bounded context

An event name is incomplete without context. CustomerRegistered in Identity means something different from CustomerOnboarded in Retail Banking. The source bounded context defines the language, invariants, and authority behind the event.

This is the DDD move that saves streaming from becoming semantic soup.

Events should answer:

  • Which domain or subdomain owns this fact?
  • What aggregate or business capability produced it?
  • What invariant became true or changed?
  • Is the event final, provisional, compensatable, or corrective?
  • Who is allowed to consume it directly, and who needs translation?
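The checklist above can be made machine-readable rather than left as tribal knowledge. Here is a minimal sketch in Python of an event envelope that carries semantic metadata alongside the payload; the field names and the `Finality` states are illustrative assumptions, not a standard:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any
import uuid

class Finality(Enum):
    PROVISIONAL = "provisional"   # fact may still be revised or reversed
    FINAL = "final"               # fact has full business force
    CORRECTIVE = "corrective"     # supersedes an earlier event

@dataclass(frozen=True)
class EventEnvelope:
    """Semantic metadata travels with the payload, not in consumers' heads."""
    event_type: str          # e.g. "PaymentAuthorized"
    bounded_context: str     # owning context, e.g. "payments"
    aggregate_id: str        # the aggregate that produced the fact
    finality: Finality       # lifecycle honesty: provisional vs final
    schema_version: int
    payload: dict[str, Any]
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def qualified_name(self) -> str:
        # "payments.PaymentAuthorized.v1" — the context owns the meaning
        return f"{self.bounded_context}.{self.event_type}.v{self.schema_version}"

evt = EventEnvelope(
    event_type="PaymentAuthorized",
    bounded_context="payments",
    aggregate_id="pay-123",
    finality=Finality.PROVISIONAL,  # an authorization can still be reversed
    schema_version=1,
    payload={"amount": "49.90", "currency": "EUR"},
)
print(evt.qualified_name())  # payments.PaymentAuthorized.v1
```

The point is not this particular structure; it is that ownership, lifecycle, and version live in the contract itself, where a consumer can inspect them.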

Separate domain events from integration events

This distinction matters enormously in large systems.

A domain event is internal to a bounded context and reflects its own language and model. An integration event is what you choose to publish for external consumers. Sometimes they can be close. Often they should not be identical.

Why? Because internal models evolve for local reasons. External contracts need greater stability and clearer semantics. Conflating the two makes internal change expensive and external coupling invisible.
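One way to keep the two contracts apart is to make the translation an explicit, reviewed function rather than letting the internal type leak onto the wire. A sketch under assumed names (the event and field names here are invented for illustration):

```python
from dataclasses import dataclass, asdict

# Internal domain event: free to change with the local model.
@dataclass
class OrderAcceptedForFulfillment:
    order_ref: str
    warehouse_code: str        # internal routing detail
    picking_priority: int      # internal optimization knob

# External integration event: smaller, stabler, business-facing.
@dataclass
class OrderConfirmed:
    order_id: str
    confirmed: bool

def to_integration_event(domain_evt: OrderAcceptedForFulfillment) -> OrderConfirmed:
    """The one place where the internal model becomes a public statement.
    Internal fields (warehouse, priority) deliberately do not cross."""
    return OrderConfirmed(order_id=domain_evt.order_ref, confirmed=True)

internal = OrderAcceptedForFulfillment("ord-42", "WH-EAST-2", 7)
public = to_integration_event(internal)
print(asdict(public))  # {'order_id': 'ord-42', 'confirmed': True}
```

Because the mapping is a single function, a change to the internal model either compiles through unchanged or forces a visible decision about the external contract.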

Design topology around ownership and consumption patterns

Topic design is not just naming. It is architectural shape.

The topology should reflect event ownership, lifecycle, and major consumer communities. Topics named after tables or source systems are nearly always a warning sign. Topics named after stable domain concepts, with versioning and consumption rules, tend to age better.

This does not mean one topic per event type in all cases. It means topology should express meaningful boundaries rather than accidental implementation details.

Build reconciliation into the architecture

Event-driven systems drift. Messages are delayed, duplicated, reordered, or replayed. External systems fail. Consumers deploy bugs. Human correction happens offline. If your architecture assumes the stream is the only truth and never needs verification, you are building a fantasy.

Reconciliation is not a cleanup task. It is a first-class architectural concern.

Every important event flow should answer:

  • How do we detect divergence?
  • What system is authoritative for final truth?
  • How do we repair downstream state?
  • What is the business process for disputed facts?

Streaming gives you propagation. Reconciliation gives you trust.
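The divergence-detection question has a simple operational core: compare the authoritative store against each materialized downstream view. A minimal sketch, with invoice statuses as a stand-in for real state:

```python
def detect_divergence(authoritative: dict, materialized: dict):
    """Compare an authoritative store against a downstream view.
    Returns (missing_downstream, unexpected_downstream, mismatched)."""
    missing = {k: v for k, v in authoritative.items() if k not in materialized}
    unexpected = {k: v for k, v in materialized.items() if k not in authoritative}
    mismatched = {
        k: (authoritative[k], materialized[k])
        for k in authoritative.keys() & materialized.keys()
        if authoritative[k] != materialized[k]
    }
    return missing, unexpected, mismatched

# Billing is authoritative for invoice state; the portal view has drifted.
billing = {"inv-1": "ISSUED", "inv-2": "PAID", "inv-3": "ISSUED"}
portal  = {"inv-1": "ISSUED", "inv-2": "ISSUED", "inv-9": "PAID"}

missing, unexpected, mismatched = detect_divergence(billing, portal)
print(missing)     # {'inv-3': 'ISSUED'}
print(unexpected)  # {'inv-9': 'PAID'}
print(mismatched)  # {'inv-2': ('PAID', 'ISSUED')}
```

The hard parts in production are snapshotting both sides consistently and deciding the repair policy for each bucket; the comparison itself should stay this boring.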

Architecture

A semantically sound streaming architecture usually has four layers of responsibility:

  1. Systems of record / domain services where business invariants are enforced
  2. Event publication boundary where internal changes become durable, intentional events
  3. Streaming backbone such as Kafka for transport, replay, and fan-out
  4. Consumer contexts that materialize views, run workflows, or translate into local models

Here is the shape in simple terms:

Diagram 1: Architecture

The crucial element is the publication boundary. This is where a local transaction becomes an externally visible statement. Patterns like transactional outbox matter because they reduce the gap between state change and event emission. But the outbox is not just a reliability device. It is also a semantic checkpoint: are we publishing a real business fact, or simply mirroring persistence noise?
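The outbox mechanics can be sketched in a few lines. This is an illustrative in-memory version using SQLite; the table names and the relay's publish callback are assumptions, not a prescribed schema:

```python
import sqlite3
import json

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT, payload TEXT, published INTEGER DEFAULT 0)""")

def accept_order(order_id: str) -> None:
    """State change and event record commit atomically, or not at all."""
    with conn:  # one local transaction covers both writes
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "ACCEPTED"))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderAccepted", json.dumps({"order_id": order_id})),
        )

def relay_outbox(publish) -> int:
    """A separate relay hands unpublished rows to the broker (e.g. Kafka)."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()
    return len(rows)

sent = []
accept_order("ord-7")
relay_outbox(lambda t, p: sent.append((t, p)))
print(sent)  # [('OrderAccepted', {'order_id': 'ord-7'})]
```

Note that the semantic checkpoint is in `accept_order`: only an intentional business fact, not every row mutation, earns an outbox entry.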

Event meaning topology

I use the phrase event meaning topology deliberately. Enterprises often design physical topology — brokers, clusters, partitions, regions — and logical topology — topics, streams, consumers — while leaving meaning implicit. That is backwards. Meaning should shape topology.

For example:

  • Events with clear domain ownership and broad enterprise utility may live on stable, governed integration topics.
  • Internal workflow events may stay private within a context.
  • CDC streams may be exposed in restricted technical namespaces for migration use only.
  • Cross-domain translations may publish derived events in a target context’s language through anti-corruption layers.

That gives you a topology where not everything is equally authoritative.

Diagram 2: Event meaning topology

This is a pragmatic design. During migration, a legacy ERP may only expose CDC. Fine. Use it. But do not let downstream consumers anchor themselves directly to erp.cdc.invoice if the target semantics are “invoice issued” in the billing domain. Insert a translation boundary. Make meaning explicit.

Choreography, orchestration, and the semantics trap

Streaming architectures often degenerate into accidental choreography. Service A emits an event, service B reacts and emits another, service C reacts, and eventually the enterprise process is spread across twenty consumers and no one owns the business flow.

This is not always wrong. Sometimes choreography is elegant. But if the process has clear end-to-end policy, deadlines, compensation rules, and customer-visible commitments, then pretending it is merely a sequence of independent reactions is dishonest.

For such flows, use orchestration or at least explicit process state. Event-driven does not mean process-anarchic.

A useful rule: if auditors, regulators, or customers care about the outcome as a named business process, model that process explicitly.
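Explicit process state need not mean a heavyweight BPM suite. A tiny process-manager sketch, reusing the lending event names from the example later in this article; the stage names and allowed transitions are illustrative:

```python
from enum import Enum, auto

class LoanStage(Enum):
    SUBMITTED = auto()
    VERIFIED = auto()
    ASSESSED = auto()
    DECIDED = auto()

# Legal transitions are the process policy, owned in one place.
ALLOWED = {
    (LoanStage.SUBMITTED, "ApplicantVerified"): LoanStage.VERIFIED,
    (LoanStage.VERIFIED, "ApplicationAcceptedForAssessment"): LoanStage.ASSESSED,
    (LoanStage.ASSESSED, "OfferApproved"): LoanStage.DECIDED,
    (LoanStage.ASSESSED, "ApplicationDeclined"): LoanStage.DECIDED,
}

class LoanProcess:
    """A minimal process manager: reacts to events, owns end-to-end state."""
    def __init__(self):
        self.stage = LoanStage.SUBMITTED

    def handle(self, event_type: str) -> bool:
        nxt = ALLOWED.get((self.stage, event_type))
        if nxt is None:
            return False  # out-of-order or unknown event: reject, don't guess
        self.stage = nxt
        return True

p = LoanProcess()
print(p.handle("OfferApproved"))      # False — cannot approve before verification
print(p.handle("ApplicantVerified"))  # True
print(p.stage)                        # LoanStage.VERIFIED
```

The value is the rejection path: an out-of-order event is an explicit, loggable fact rather than a silent downstream side effect.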

Migration Strategy

This is where architecture becomes real. Nobody gets to discard their legacy landscape and redraw the estate in bounded contexts over a weekend. Migration must be progressive, survivable, and reversible in parts. The right strategy is usually a semantic strangler, not just a technical one.

A technical strangler routes calls from old systems to new components. A semantic strangler goes further: it progressively shifts the meaning of enterprise events away from legacy implementation structures and toward domain-owned contracts.

Step 1: Classify existing streams

Start by inventorying current topics and events into categories:

  • domain events
  • integration events
  • workflow/process signals
  • CDC/data replication streams
  • telemetry masquerading as business events

Most organizations discover a mess. Good. Better to know.

Step 2: Identify authoritative domains

For each major business capability, decide where business truth should live. Not every system with relevant data is authoritative. In fact, many systems are merely participants in a broader process.

You need clear answers for domains such as order, payment, invoice, customer identity, product, shipment, claim, policy, account, and entitlement. Without this, event ownership remains political rather than architectural.

Step 3: Introduce translation boundaries

Do not ask downstream consumers to clean up semantic ambiguity individually. That creates twenty inconsistent interpretations. Instead, create explicit anti-corruption layers or translation services that convert legacy streams into meaningful integration events.

This is the bridge between migration pragmatism and target-state sanity.

Step 4: Dual run and reconcile

For critical flows, publish new semantically clear events while legacy integrations still operate. Compare outputs. Reconcile discrepancies. Track where semantics diverge, not just payloads.

This is hard work and absolutely necessary. You are not just validating software. You are validating business interpretation.

Step 5: Move consumers to domain-aligned topics

Migrate consumers off technical or legacy topics in waves. Prioritize high-value consumers first, especially where analytics, customer service, finance, and operations currently disagree.

Step 6: Retire technical topics from enterprise use

CDC topics may remain for migration, internal sync, or low-level replication. But they should no longer be treated as strategic contracts. If they become permanent public interfaces, the strangler has failed.


A reconciliation service in this migration topology is not decorative. During migration, it is often the single most important control point.

Enterprise Example

Consider a large retail bank modernizing its lending platform.

The bank had a mainframe core, a CRM platform, a document management system, a workflow engine, and a newer set of microservices built around Kafka. The modernization goal was straightforward on paper: emit loan application events and let downstream services handle onboarding, underwriting, document requests, pricing, and customer notifications.

In practice, the first implementation was a semantic disaster.

The core event was named ApplicationCreated. It was published when a customer clicked submit in the digital channel. Downstream services assumed wildly different things:

  • Document services treated it as a complete application and immediately requested proof artifacts.
  • Underwriting assumed the applicant had passed identity checks.
  • CRM created sales tasks for branch staff.
  • Analytics counted it as pipeline intake.
  • Compliance expected an immutable audit trail to begin there.

But in the actual lending domain, “created” only meant a draft had become visible to the workflow engine. It was not complete, not validated, not underwritten, and often not even legally attributable to a fully identified applicant.

The event spread faster than the bank’s understanding of it.

Worse, the event contract had been derived from a workflow table update via CDC. So its semantics were inherited from a process engine’s internal persistence model, not from the lending domain. Classic mistake.

The bank corrected course by introducing bounded-context thinking:

  • Digital Sales owned events like LoanApplicationSubmitted
  • Identity & KYC owned ApplicantVerified
  • Lending Origination owned ApplicationAcceptedForAssessment
  • Credit Decisioning owned OfferApproved or ApplicationDeclined
  • Document Management consumed translated integration events rather than workflow table changes

They also inserted a semantic translation layer between mainframe/workflow changes and Kafka integration topics. The translation logic encoded business rules that had previously lived as tribal knowledge in consuming teams.

This was not glamorous work. It involved difficult workshops, argument over terminology, and uncomfortable discoveries that some business units had conflicting interpretations of “application complete.” But once the semantics were cleaned up, the topology became simpler, not more complex.

The operational result was substantial:

  • reduced false downstream processing
  • cleaner customer notifications
  • auditable stage transitions
  • fewer reconciliation exceptions between lending operations and finance
  • easier migration of new microservices because event meaning was stable

The lesson was plain: Kafka was not the breakthrough. Shared domain language was.

Operational Considerations

Semantically sound architecture still has to survive production.

Schema evolution is necessary but insufficient

Use schema registries, compatibility rules, and versioning discipline. Of course. But remember that backward-compatible schemas can still introduce semantic incompatibility. Adding an optional field called statusReason may be harmless structurally and devastating semantically if it changes how consumers interpret the event.

Review semantic changes like contract changes, not like serialization trivia.

Partitioning can encode hidden semantics

Kafka partitioning strategy is often treated as a throughput concern. It is also a business concern. If consumers assume per-aggregate ordering, your keying strategy is now part of semantic integrity. Change it casually and you may break process correctness without any infrastructure alarm.

Idempotency is non-negotiable

At-least-once delivery, retries, and replays are normal. Consumers processing business events must be idempotent or explicitly deduplicating. If duplicate PaymentCaptured events can trigger duplicate settlement, you do not have a streaming architecture. You have a machine for manufacturing incidents.
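The minimum viable shape of an idempotent consumer is a dedup check before any side effect. A sketch with an in-memory processed-set; a real system would persist that set transactionally with the side effect:

```python
processed: set[str] = set()
settled: list[str] = []

def settle_payment(payment_id: str) -> None:
    settled.append(payment_id)  # stands in for the real settlement side effect

def handle_payment_captured(event: dict) -> bool:
    """Idempotent handler: the event_id is checked before any side effect.
    Returns True when the event caused work, False for a duplicate."""
    if event["event_id"] in processed:
        return False  # replay or redelivery: acknowledge and do nothing
    settle_payment(event["payment_id"])
    processed.add(event["event_id"])
    return True

evt = {"event_id": "e-1", "payment_id": "pay-9"}
handle_payment_captured(evt)
handle_payment_captured(evt)   # at-least-once redelivery of the same event
print(settled)  # ['pay-9'] — settlement happened exactly once
```

Note the remaining gap even here: if the process crashes between `settle_payment` and recording the event id, the duplicate is still possible, which is why the dedup record and the side effect should share one transaction in production.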

Replay needs boundaries

Replay is one of Kafka’s superpowers and one of its great dangers. Replaying historical events into consumers that have changed logic, reference data, or external side effects can produce corrupt state. Architect for replay with clear rules:

  • what can be replayed
  • into which environments
  • with which side effects disabled
  • from which semantic version boundaries
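Those rules are easiest to enforce when they exist as data a tool can check before any events move. A sketch with an invented policy structure:

```python
def replay_allowed(topic: str, env: str, from_version: int, policy: dict) -> bool:
    """Check a replay request against declared rules.
    The policy structure here is an illustrative assumption."""
    rules = policy.get(topic)
    if rules is None:
        return False  # no declared rules means no replay, not "probably fine"
    return (env in rules["environments"]
            and from_version >= rules["min_semantic_version"])

policy = {
    "billing.InvoiceIssued": {
        "environments": {"staging"},   # never replayed into production consumers
        "min_semantic_version": 2,     # v1 events predate a meaning change
    }
}
print(replay_allowed("billing.InvoiceIssued", "staging", 2, policy))  # True
print(replay_allowed("billing.InvoiceIssued", "prod", 2, policy))     # False
```

The default-deny branch is the important design choice: undeclared topics are not replayable by accident.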

Observability must include business lineage

Technical tracing is not enough. You need to answer business questions:

  • Why did this invoice exist in billing but not customer portal?
  • Which upstream events contributed to this account state?
  • Which corrections were applied?
  • Was this event translated, enriched, or compensated?

That means lineage metadata, correlation IDs, causation IDs, and meaningful audit trails.

Reconciliation as routine operations

Every important event-driven enterprise system should run regular reconciliation jobs or streams. Not because eventing is bad, but because reality is messy. Data stores are rebuilt. Manual overrides happen. Partner systems lag. Legal adjustments are backdated.

Mature operations teams treat reconciliation as a routine control loop, not an embarrassing exception.

Tradeoffs

No honest architecture article should pretend there is a free lunch here.

More semantic design upfront

You will spend more time defining events, contexts, ownership, and lifecycle states. This slows early delivery. It also prevents months of downstream confusion. In enterprises, that is usually a bargain.

Translation layers add complexity

Anti-corruption layers and semantic translators are extra components to build and operate. Purists dislike them. But in migration-heavy estates they are often the difference between controlled evolution and enterprise-wide coupling to legacy internals.

Not every event can be perfectly pure

Some flows will still rely on CDC, package integration quirks, or vendor-imposed data structures. The goal is not semantic perfection. The goal is controlled semantics with explicit boundaries.

Strong domain ownership can frustrate broad reuse

A team may want a handy event for analytics or a downstream automation. If the owning domain refuses because the meaning is unstable or not externally appropriate, people complain about governance. Good. That is often governance doing its job.

Reconciliation means accepting imperfection

Architects raised on synchronous consistency sometimes resist this. But in distributed event-driven enterprises, reconciliation is the adult answer. Not because consistency does not matter, but because local consistency and global eventual truth are different design concerns.

Failure Modes

There are predictable ways this architecture goes wrong.

The canonical model trap

The enterprise creates one grand event taxonomy for everything. It is abstract, over-governed, politically negotiated, and semantically weak. Teams bypass it or misuse it.

CDC becomes the architecture

CDC starts as a migration tactic and ends as the company’s primary integration model. Business meaning is now anchored to database structures. Refactoring becomes terrifying.

Topic sprawl with no ownership

Hundreds of topics, unclear producers, unknown consumers, no retirement policy, no semantic documentation. A haunted forest of contracts.

Event names that lie

Events sound final but are provisional. Or sound business-like but are really process artifacts. These are the worst because they invite confident misuse.

Choreography without process ownership

Critical business outcomes emerge from chains of consumer reactions with no explicit owner. Recovery and accountability become impossible.

Ignoring correction events

Teams model only happy-path forward events and forget cancellations, reversals, supersessions, merges, and manual adjustments. Real business is full of correction.

When Not To Use

Streaming with strong semantic modeling is powerful, but it is not universal medicine.

Do not use it when the business interaction is fundamentally request-response and requires immediate coordinated validation across a small set of systems. A synchronous API may be simpler and more honest.

Do not force event streaming onto small, tightly scoped applications with minimal integration needs. You will buy complexity without meaningful decoupling.

Do not use public business events as a substitute for transactional integrity within a bounded context. If two changes must commit atomically to preserve a core invariant, solve that locally before you fantasize about downstream elegance.

Do not pretend event-driven architecture removes the need for a workflow engine or process manager when the process truly has central policy, timeout handling, SLA commitments, and human intervention. Sometimes orchestration is the right answer.

And do not adopt Kafka because the platform team already bought Kafka. Technology procurement is not architecture.

Related Patterns

Several patterns fit naturally around this style of architecture:

  • Transactional Outbox for reliable publication from local transactions
  • Anti-Corruption Layer to translate legacy or foreign models into bounded-context language
  • Strangler Fig Pattern for incremental replacement of legacy systems
  • CQRS where read models are materialized from event streams for different consumer needs
  • Saga / Process Manager for long-running business processes with compensation
  • Event Sourcing in selected domains where event history is the native model, though not as a default for everything
  • Data Mesh style domain ownership where analytical data products consume semantically governed domain events rather than raw operational leakage

Worth noting: event sourcing and event-driven architecture are not the same thing. You can have semantically sound streaming without event sourcing, and you can have event-sourced internals with terrible external event contracts. Keep the distinction sharp.

Summary

Streaming architecture is often sold as a wiring problem. It is not. It is a meaning problem with wiring consequences.

Kafka, microservices, schemas, partitions, and replays are all useful. They are also secondary. The primary design question is always semantic: what happened in the business, who has the authority to say so, what exactly does the event mean, and how should that meaning travel across bounded contexts without dissolving into ambiguity?

This is why domain-driven design matters so much in event-driven enterprise architecture. Bounded contexts stop teams from pretending one word means the same thing everywhere. Integration events force intentional publication. Anti-corruption layers protect new models from old confusion. Reconciliation acknowledges the reality of distributed truth. Progressive strangler migration lets you move from legacy structure to domain meaning without betting the company on a rewrite.

The memorable line, if you want one, is this:

A streaming platform can transport events forever. It cannot rescue an event that never meant anything clear in the first place.

Design event meaning first. Then design topology to preserve it. That is how streaming architecture ages well. That is how it remains useful after the launch decks are forgotten. And that is how enterprises avoid the slow break — the dangerous kind, the quiet kind — where everything still runs, but nobody agrees on what happened.

Frequently Asked Questions

What is event-driven architecture?

Event-driven architecture (EDA) decouples services by having producers publish events to a broker like Kafka, while consumers subscribe independently. This reduces direct coupling, improves resilience, and allows new consumers to be added without modifying producers.

When should you use Kafka vs a message queue?

Use Kafka when you need event replay, high throughput, long retention, or multiple independent consumers reading the same stream. Use a traditional message queue (RabbitMQ, SQS) when you need simple point-to-point delivery, low latency, or complex routing logic per message.

How do you model event-driven architecture in ArchiMate?

In ArchiMate, the Kafka broker is a Technology Service or Application Component. Topics are Data Objects or Application Services. Producer/consumer services are Application Components connected via Flow relationships. This makes the event topology explicit and queryable.