Streaming Everything Is Architecture Cargo Cult

There’s a familiar smell in enterprise architecture. It’s the smell of a good idea pushed too far.

A team discovers event streaming, usually through Kafka, and suddenly every problem starts looking like a topic, a consumer group, and a lag dashboard. File loads become streams. Monthly close becomes streams. Master data synchronization becomes streams. Someone says “real-time” often enough that nobody asks the old but necessary question: real-time for whom, and for what decision?

That’s how architecture cargo cults begin.

Streaming is a powerful architectural style. It is not a moral virtue. Batch is not legacy cowardice. It is often the correct answer, sometimes the best answer, and occasionally the only sane answer. The job of the architect is not to spread the latest pattern across the estate like peanut butter. The job is to shape systems around domain semantics, operational reality, and the economics of change.

This is where the batch-versus-stream debate usually goes wrong. People frame it as a technology decision. It isn’t. It’s a decision-topology problem. You are deciding how information should move through the enterprise based on the shape of business decisions: immediate, deferred, accumulative, reversible, auditable, high-volume, low-value, high-value, or legally constrained.

If the business needs to approve a card transaction before a customer walks away from the till, then waiting six hours for a nightly load is absurd. If Finance needs a reconciled ledger at the end of the day, pretending every posting must traverse a low-latency event pipeline is equally absurd. The important thing is not whether data moves in micro-batches, streams, or files. The important thing is whether the architecture matches the decision that data supports.

That’s the heart of the matter: design the movement of data around the meaning of the business, not the fashion of the platform.

Context

Most large enterprises now live with a mixed estate. They have packaged systems, mainframes, SaaS products, APIs, operational databases, data lakes, and at least one streaming platform—usually Kafka or a cloud equivalent. They also have integration scar tissue: ETL jobs that nobody wants to touch, brittle point-to-point APIs, message queues with unclear ownership, and “temporary” data extracts that have survived three CIOs.

In that environment, streaming arrives as both promise and temptation.

The promise is real. Event streams can reduce latency, decouple producers from consumers, enable reactive processes, support event-driven microservices, and create a durable log of business facts. For some domains—payments, fraud detection, logistics telemetry, inventory visibility—that is transformative.

The temptation is to conclude that because streams are useful somewhere, they should be used everywhere.

That conclusion is expensive.

It ignores a basic domain-driven design principle: software boundaries should reflect business boundaries. Not every domain speaks in events. Some domains speak in snapshots, statements, cycles, settlements, or approvals. Some business processes are not flows at all; they are periodic consolidations. If you force a stream-shaped technical model onto a batch-shaped business model, you don’t modernize. You distort.

And once you distort the domain model, the operational problems begin. Consumers infer state from incomplete event histories. Replays produce duplicate side effects. Reconciliation drifts because nobody owns the “truth.” Latency improves in the dashboard while data trust collapses in the business.

Architects should know better.

Problem

The core problem is simple: enterprises choose data movement patterns based on infrastructure enthusiasm instead of business decision topology.

That sounds abstract, so let’s make it plain.

A business decision topology is the pattern of decisions a domain makes over time:

  • Immediate decisions: authorize, reject, alert, reserve.
  • Near-real-time decisions: reroute, restock, reprioritize.
  • Periodic decisions: settle, bill, report, close.
  • Accumulated decisions: detect trend, optimize portfolio, review performance.
  • Reconciled decisions: match, verify, correct, attest.

These decision shapes matter more than whether your platform team likes Kafka.

When teams ignore that, they usually make one of two mistakes:

  1. They stream batch-shaped problems.
     Example: sending every general ledger posting event-by-event to dozens of consumers who still only act at end-of-day.

  2. They batch stream-shaped problems.
     Example: pushing fraud indicators every four hours because “the warehouse updates on schedule.”

Both create dysfunction. The first inflates complexity without business gain. The second misses the moment where information has value.

The architecture question, then, is not “batch or stream?” It is “where in this domain do we need facts, derived state, and decisions to move at what pace, with what guarantees?”

That is an architectural question in the old-fashioned sense. It is about forces, boundaries, and consequences.

Forces

There are several competing forces here, and good architecture is mostly about managing such tensions rather than pretending they disappear.

1. Domain semantics

A stream is not just a transport. It implies a way of thinking: business facts happen over time, and consumers react to them. That works beautifully when the domain itself is event-rich.

Order placed. Payment authorized. Parcel scanned. Temperature exceeded. Seat reserved.

But some domains care more about a certified state than about the event trail. A monthly solvency report, a tax filing, a payroll run, or a legal statement is not simply a chain of events. It is a governed, reconciled, often human-reviewed construct. In such domains, the snapshot is first-class.

DDD helps here. Ask what the ubiquitous language values:

  • Facts in time?
  • Current state?
  • Official statement?
  • Reconciled balance?
  • Human approval milestone?

If the domain language centers on “close,” “settle,” “publish,” “restate,” and “attest,” you are not looking at a pure streaming domain.

2. Latency economics

Low latency has a cost. More moving parts. More stateful consumers. More pressure on idempotency. More need for schema governance. More operational overhead.

The right question is not “can we make this real-time?” The right question is “what business value appears if latency drops from 12 hours to 5 minutes, and does that justify the complexity?”

Sometimes the answer is obviously yes. Fraud prevention is worth milliseconds. Inventory oversell prevention is worth seconds. Dynamic dispatch is worth minutes.

Sometimes the answer is no. Executive scorecards do not become more useful because they refresh every three seconds. Quarterly planning does not improve because your budgeting system emits a topic.

3. Truth and reconciliation

Streams are good at publishing facts. Enterprises are bad at living without reconciliation.

Any architecture that spans multiple systems of record eventually needs a strategy for:

  • late or missing events
  • duplicate messages
  • out-of-order processing
  • downstream side effects
  • correction and restatement
  • legal or financial attestation

A mature architecture treats reconciliation as a first-class capability, not a grudging afterthought. If you stream operational events but still need end-of-day balancing, then you haven’t failed. You’ve discovered reality.

4. Organizational maturity

Streaming systems demand disciplined ownership. Someone must own contracts, schema evolution, retention, replay semantics, dead-letter handling, and consumer expectations. If your organization still struggles to version APIs, flooding the enterprise with event topics will not produce elegance. It will produce distributed confusion.

5. Legacy constraints and migration cost

Batch systems are often deeply embedded in the enterprise for a reason: they support critical periodic business processes. Replacing them with streaming-native microservices can be sensible, but not if it interrupts payroll, billing, settlement, or regulatory reporting.

A proper migration is not a clean switch. It is usually a progressive strangler, with coexistence, dual-running, reconciliation, and carefully chosen cut lines.

Solution

The practical solution is to use a decision topology for architecture selection.

Instead of asking “should we use batch or stream?”, classify each business capability along four dimensions:

  1. Decision urgency
     How quickly must a decision happen before value is lost?

  2. State semantics
     Is the domain driven by events, current state, or certified snapshots?

  3. Correction frequency
     How often are facts revised, restated, or reconciled later?

  4. Consumer diversity
     Are many consumers reacting independently, or is there a single downstream process?

This gives you a topology, not a binary choice. And from that topology, patterns emerge:

  • Pure streaming for immediate operational reactions.
  • Stream + materialized views for near-real-time domain state.
  • Batch over streaming facts for periodic consolidation and reporting.
  • Batch snapshots where official state matters more than event granularity.
  • Hybrid architectures where streams carry operational facts and batch performs financial or regulatory reconciliation.

The hybrid answer is often the right one. Not because it is fashionable, but because enterprises contain multiple time scales.

A supply chain can need second-by-second telemetry for warehouse operations and a nightly cost allocation for finance. The architecture should support both without pretending they are the same problem.
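The classification above can be turned into a rough selection aid. The sketch below is purely illustrative: the `Capability` fields mirror the four dimensions, while the thresholds and pattern names are invented for demonstration, not a prescriptive framework.

```python
from dataclasses import dataclass

@dataclass
class Capability:
    name: str
    decision_urgency_s: int      # seconds before the decision loses value
    state_semantics: str         # "events" | "current_state" | "certified_snapshot"
    corrections_per_cycle: int   # how often facts are revised or restated later
    consumer_count: int          # independent downstream consumers

def recommend(c: Capability) -> str:
    # Certified or heavily corrected domains lean toward batch plus reconciliation.
    if c.state_semantics == "certified_snapshot" or c.corrections_per_cycle > 10:
        return "batch snapshot + reconciliation"
    # Sub-minute urgency calls for streaming; event-rich domains can stay pure.
    if c.decision_urgency_s <= 60:
        return "pure streaming" if c.state_semantics == "events" else "stream + materialized view"
    # Minutes-to-an-hour urgency suits a stream feeding a read model.
    if c.decision_urgency_s <= 3600:
        return "stream + materialized view"
    # Slow decisions consolidate over streaming facts on a schedule.
    return "batch over streaming facts"

print(recommend(Capability("fraud-detection", 1, "events", 0, 5)))
print(recommend(Capability("gl-close", 86400, "certified_snapshot", 50, 2)))
```

The value is not in the exact thresholds but in forcing each capability through the same four questions before any platform choice is made.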

A good line to remember is this: operational truth moves fast; institutional truth settles slowly.

Architecture

A sensible enterprise architecture separates three concerns:

  • Operational event flow
  • Domain state serving
  • Periodic reconciliation and attestation

That means using streams where facts happen and reaction matters, while preserving batch where consolidation, certification, or broad replay is the real need.

Core architecture pattern

This is not “batch versus stream.” It is stream and batch, each serving a different kind of business meaning.

Streams carry operational facts and support responsive processes. Materialized views produce current state for applications that need low-latency reads. Raw events are retained for replay, audit, or derivation. Batch reconciliation then combines events, extracts, and correction logic into certified outputs.

That split matters because event-driven microservices are not good at everything. Consistency, history repair, and official close do not become easy just because the facts arrive as events.

Domain boundaries

In DDD terms, each bounded context should choose its own publication model based on domain semantics.

  • A Fraud Detection context naturally consumes streams.
  • A Fulfillment context may use both streams and queryable state.
  • A General Ledger context may consume event feeds but still publish authoritative daily balances as snapshots or statements.
  • A Regulatory Reporting context may be almost entirely batch-oriented, even when fed from streaming sources.

Do not make “everything publishes events” into a dogma. Publishing domain events can be valuable, but some contexts should expose stable state, not internal chatter. There is a difference between a meaningful business fact and every little persistence twitch.

State is not the enemy

One of the stranger mistakes in event-driven architecture is the suspicion of stateful views, as though materialized state were morally compromised. In reality, business users live in state:

  • current inventory
  • current credit exposure
  • current account balance
  • current claim status

Events are a means. State is often the thing people actually need.

A good architecture therefore treats event streams as the source of temporal facts and state projections as deliberate products. You should know which one is authoritative for which use case.
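A projection is easy to state in code: fold temporal facts into current state. The sketch below is a minimal illustration; the event shape and field names are assumptions, and a real projection would also track consumer offsets and handle correction events.

```python
def project_inventory(events):
    """Fold a stream of stock-movement events into current stock per SKU."""
    stock = {}
    for e in events:
        # "received" adds stock; any other movement type is treated as a decrement
        delta = e["qty"] if e["type"] == "received" else -e["qty"]
        stock[e["sku"]] = stock.get(e["sku"], 0) + delta
    return stock

events = [
    {"sku": "A1", "type": "received", "qty": 10},
    {"sku": "A1", "type": "sold", "qty": 3},
    {"sku": "B2", "type": "received", "qty": 5},
]
print(project_inventory(events))  # {'A1': 7, 'B2': 5}
```

The events remain the source of temporal facts; the dictionary is the deliberate state product that applications actually query.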

Decision topology map

The point of the map is not precision. It is discipline. It forces the architecture conversation back to business meaning.

Migration Strategy

Nobody gets to redesign the enterprise on a blank sheet. Real migration is negotiated with history.

The most reliable approach here is a progressive strangler migration. You do not replace a batch landscape with a streaming platform in one leap. You carve out high-value decisions, introduce event publication around them, run parallel reconciliation, and gradually shift authority.

Migration stages

  1. Identify decision hotspots
     Find where lower latency changes business outcomes. Not all domains qualify.

  2. Instrument existing systems
     Publish business events from legacy applications or CDC pipelines where appropriate. Start with facts, not with derived noise.

  3. Build projections and narrow consumers
     Introduce one or two high-value consumers first. Avoid broad enterprise fan-out until contracts stabilize.

  4. Run batch and stream in parallel
     This is where most teams get impatient. Don’t. Parallel run is how you earn trust.

  5. Introduce reconciliation loops
     Compare stream-derived outcomes with existing batch outputs. Measure drift. Investigate semantics mismatch.

  6. Cut over bounded contexts gradually
     Move decision ownership context by context. Leave periodic close processes in place until confidence is demonstrably high.

  7. Retire obsolete batch jobs selectively
     Not all of them. Some remain useful, especially for attestation or fallback.

Progressive strangler sketch

The key word here is selective. Architects who announce “we are eliminating batch” are usually replacing one slogan with another. Better to ask: which batch jobs are compensating for weak operational models, and which are genuinely aligned to periodic business needs?

Reconciliation is part of migration, not cleanup

If your new streaming path produces a customer balance that disagrees with the old nightly process, that is not a nuisance. That is architecture speaking back.

Usually the issue is one of:

  • event timing assumptions
  • omitted correction events
  • inconsistent business keys
  • dual writes
  • timezone or effective-date semantics
  • misunderstanding of what the old batch actually computes

In migration, reconciliation reveals the hidden domain model of the legacy estate. Ignore it and you will cut over into a lie.
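The parallel-run comparison itself can be mechanically simple. This is a hedged sketch of a drift check between stream-derived and batch-derived balances; the keys, tolerance, and report shape are illustrative assumptions, and the hard work remains interpreting why each entry drifted.

```python
def reconcile(stream_balances, batch_balances, tolerance=0.0):
    """Compare two balance maps and report entries that disagree beyond tolerance."""
    drift = {}
    # Union of keys catches entries missing from either path entirely.
    for key in stream_balances.keys() | batch_balances.keys():
        s = stream_balances.get(key, 0.0)
        b = batch_balances.get(key, 0.0)
        if abs(s - b) > tolerance:
            drift[key] = {"stream": s, "batch": b, "delta": s - b}
    return drift

stream = {"acct-1": 100.0, "acct-2": 250.0}
batch = {"acct-1": 100.0, "acct-2": 245.0, "acct-3": 10.0}
print(reconcile(stream, batch))
# acct-2 drifts by 5.0; acct-3 is missing from the stream-derived path entirely
```

Each drift entry is a question about the legacy domain model: a timing assumption, a missing correction event, or a misunderstood batch computation.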

Enterprise Example

Consider a global retailer with e-commerce, stores, warehouses, and a central finance platform.

Historically, inventory updates were batched every few hours from store systems and warehouse systems into a central stock platform. That was good enough when online ordering was a smaller channel. Then click-and-collect grew, same-day fulfillment appeared, and the business started overselling inventory. Customers ordered items that the website believed were available, while stores had already sold them.

The platform team’s first instinct was predictable: “Put everything on Kafka.”

That would have been reckless.

Why? Because the retailer actually had three distinct domains hiding inside one “inventory” conversation:

  1. Operational reservation
     Needs second-to-minute latency. If an item is reserved online, stores and fulfillment systems must know quickly.

  2. Store stock visibility
     Needs near-real-time but tolerates occasional correction. Scans, shrinkage, and delayed updates happen.

  3. Financial inventory valuation
     Needs certified end-of-day balances with reconciliation against ERP and warehouse systems.

A stream-first architecture made sense for the first two. It did not remove the need for the third.

The retailer therefore implemented:

  • Kafka topics for stock movement and reservation events
  • microservices for reservation and availability projection
  • materialized views serving current sellable inventory
  • nightly reconciliation jobs comparing event-derived stock with ERP balances
  • exception workflows for drift, shrinkage, and correction postings

This solved the oversell problem without corrupting finance.

The most valuable architectural move was not Kafka itself. It was separating sellable inventory from accounted inventory as distinct domain concepts. That is classic domain-driven design: language first, technology second.

The failure mode they avoided was subtle but common. Had they reused one “inventory” model for both online reservations and financial valuation, every late scan and correction would have become a battle over truth. Instead, they accepted multiple truths for different contexts, tied together through reconciliation.

That is how grown-up enterprises work.

Operational Considerations

Streaming systems are operational systems first and conceptual systems second. They look elegant in diagrams and unforgiving at 2 a.m.

A few concerns matter disproportionately.

Schema evolution

Events are contracts. If teams publish without governance, topics become folklore. Use versioned schemas, compatibility rules, and clear ownership. Backward compatibility is not bureaucracy; it is the price of decoupling.
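The core compatibility rule can be shown in miniature. This toy check is in the spirit of registry rules such as Avro's backward compatibility: a new version may add fields only if they carry defaults, so consumers on the new schema can still read old events. The field representation is an illustrative assumption, not a real registry API.

```python
def backward_compatible(old_fields, new_fields):
    """old_fields/new_fields: dicts mapping field name -> has_default (bool)."""
    for name in new_fields:
        # A new field without a default cannot be filled in when reading
        # old events that lack it, so the change breaks consumers.
        if name not in old_fields and not new_fields[name]:
            return False
    return True

v1 = {"order_id": False, "amount": False}
v2_ok = {"order_id": False, "amount": False, "currency": True}    # optional, has default
v2_bad = {"order_id": False, "amount": False, "currency": False}  # new required field

print(backward_compatible(v1, v2_ok))   # True
print(backward_compatible(v1, v2_bad))  # False
```

A real schema registry enforces this automatically at publish time, which is exactly the governance control point the paragraph argues for.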

Idempotency and replay

Replays happen. Consumers restart. Topics are reprocessed. Side effects duplicate unless handlers are idempotent or protected by deduplication keys. This is not optional engineering polish. It is table stakes.
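A minimal idempotent-handler sketch makes the point concrete. The in-memory set stands in for a durable deduplication store (a processed-ids table, a Redis set); the event shape and key choice are illustrative assumptions.

```python
processed = set()  # stand-in for a durable store of handled deduplication keys
applied = []       # stand-in for the real side effect (a DB write, an email)

def handle(event):
    key = (event["id"], event["type"])  # deduplication key
    if key in processed:
        return  # replayed or duplicated delivery: side effect suppressed
    applied.append(event)               # perform the side effect exactly once
    processed.add(key)

# The same event delivered twice, as happens on replay or redelivery:
for e in [{"id": 1, "type": "paid"}, {"id": 1, "type": "paid"}, {"id": 2, "type": "paid"}]:
    handle(e)

print(len(applied))  # 2: the duplicate delivery produced no second side effect
```

In production the dedup store and the side effect must be updated atomically, or the duplicate window simply moves rather than closes.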

Ordering assumptions

Many designs quietly assume global order. Most platforms provide partition order, not universal order. If your business semantics depend on sequence across entities, you need a stronger design than “Kafka will handle it.”
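Partition order versus global order is easy to demonstrate. In the sketch below, events keyed by entity land on the same partition and stay in sequence per entity, but nothing orders events across entities; the CRC-based hash is a deterministic stand-in for a broker's key partitioner, not Kafka's actual algorithm.

```python
import zlib

def partition_for(key, num_partitions=3):
    # Deterministic stand-in for a broker's key-based partitioner.
    return zlib.crc32(key.encode()) % num_partitions

events = [("order-7", "created"), ("order-9", "created"),
          ("order-7", "paid"), ("order-9", "paid")]

partitions = {}
for key, etype in events:
    partitions.setdefault(partition_for(key), []).append((key, etype))

# Each order's events stay in sequence on its own partition; the relative
# order of order-7 versus order-9 across partitions is not guaranteed.
for p, evs in sorted(partitions.items()):
    print(p, evs)
```

If a business rule genuinely depends on sequence across entities, the design needs an explicit ordering mechanism (sequence numbers, a single writer, versioned state), not faith in the broker.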

Retention and audit

Retention settings are architecture decisions. If topics are kept for a week but finance needs 90-day replay during an incident, someone has designed a trap. Raw event persistence in a lake or immutable store often complements the streaming backbone.

Observability

Batch has completion monitoring. Streams need lag monitoring, consumer health, throughput metrics, poison-message handling, and semantic alerts. The most useful alerts are not just technical—consumer down, lag high—but business-aware: reservations not confirmed, expected event rates dropped, reconciliation drift rising.
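The difference between technical and business-aware alerting can be sketched in a few lines. The metric names, baselines, and thresholds below are assumptions for illustration, not a real monitoring API.

```python
def alerts(metrics, baselines, lag_threshold=10_000):
    """Raise technical alerts on lag, and business alerts on event-rate collapse."""
    out = []
    for topic, m in metrics.items():
        # Technical alert: consumers are falling behind.
        if m["consumer_lag"] > lag_threshold:
            out.append(f"{topic}: lag high ({m['consumer_lag']})")
        # Business-aware alert: the expected flow of facts has dried up,
        # even though nothing is technically "down".
        expected = baselines.get(topic, 0)
        if expected and m["events_per_min"] < 0.5 * expected:
            out.append(f"{topic}: event rate dropped below half of baseline")
    return out

metrics = {"reservations": {"consumer_lag": 120, "events_per_min": 40}}
baselines = {"reservations": 200}
print(alerts(metrics, baselines))
```

The second alert fires here despite healthy lag: reservations have quietly stopped arriving, which is precisely the kind of semantic failure that pure infrastructure dashboards miss.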

Human operations

Exception queues and operational workbenches matter. Reconciliation failures, poison records, and correction workflows require human handling. Enterprises do not become “fully autonomous” because they bought a streaming platform.

Tradeoffs

Let’s be blunt. Streaming introduces real advantages and real pain.

Benefits of streaming

  • lower latency for operational decisions
  • temporal record of business facts
  • decoupled consumers
  • support for reactive workflows
  • easier fan-out for multiple downstream uses
  • stronger fit for event-rich domains

Costs of streaming

  • harder operational debugging
  • eventual consistency and state drift
  • schema governance burden
  • consumer replay complexity
  • duplicate and out-of-order handling
  • temptation to overpublish low-value events

Benefits of batch

  • simpler control points
  • strong suitability for periodic consolidation
  • easier certification and attestation
  • straightforward reprocessing of full data sets
  • often lower operational complexity per use case

Costs of batch

  • higher latency
  • coarse-grained failure windows
  • expensive full reruns at scale
  • weaker support for responsive operational action
  • tendency to hide issues until the cycle boundary

Neither style is superior in the abstract. The right answer depends on what kind of business truth you are moving.

A useful rule of thumb:

  • If the value of information decays rapidly, stream it.
  • If the value depends on completeness and attestation, batch it.
  • If both are true, do both deliberately.

Failure Modes

This topic is full of traps. Here are the big ones.

1. Event fetishism

Teams publish every internal state transition as an enterprise event. Consumers then depend on implementation details, and the producer can no longer evolve safely.

Publish meaningful domain facts, not your ORM’s emotional diary.

2. Dual-write inconsistency

A service writes to its database and publishes an event separately. One succeeds, the other fails. Now your estate is split-brain. Use transactional outbox or equivalent patterns where consistency matters.
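The outbox idea fits in a short sketch: the state change and the event row commit in one local transaction, and a separate relay publishes from the outbox afterwards. Table names and the fake publisher are illustrative assumptions; SQLite stands in for the service's own database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "payload TEXT, published INTEGER DEFAULT 0)")

def place_order(order_id):
    # One local transaction: the state change and the event row
    # both commit, or neither does. No split-brain window.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (f'{{"order_id": {order_id}, "event": "OrderPlaced"}}',))

def relay(publish):
    # Polling relay: publish unsent rows, then mark them published.
    # Delivery is at-least-once, so consumers must deduplicate.
    rows = conn.execute("SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, payload in rows:
        publish(payload)
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()

sent = []
place_order(42)
relay(sent.append)
print(sent)  # ['{"order_id": 42, "event": "OrderPlaced"}']
```

The broker write moves outside the transaction on purpose; consistency is guaranteed by the shared commit, and delivery is guaranteed by the relay retrying until each row is marked published.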

3. No reconciliation strategy

The architecture assumes events are enough. Then late data, corrections, and missing records appear, and nobody knows how to produce an authoritative answer.

Reconciliation is not an admission of weakness. It is a design capability.

4. Misplaced low-latency ambition

A reporting or finance process is rebuilt on streams even though business users only consume daily outputs. The result is more cost, more fragility, and no meaningful gain.

5. Treating Kafka as enterprise truth

Kafka is a backbone, not a metaphysical guarantee. Topic retention, replay boundaries, schema shifts, and consumer logic all affect what truth you can reconstruct. For many domains, the stream is a source of facts, not the official record.

6. Ignoring bounded contexts

One shared enterprise topic model tries to standardize everything. The result is semantic mush. Different domains need different meanings, keys, correction rules, and timeliness expectations.

When Not To Use

There are clear cases where a stream-first architecture is the wrong move.

  • Periodic governed processes such as payroll close, statutory reporting, or regulatory submissions, where completeness and certification matter more than immediacy.
  • Low-change domains where updates are infrequent and there is little value in reactive fan-out.
  • Simple one-consumer integrations where a file or API is cheaper and clearer.
  • Organizations without integration discipline, where event governance, ownership, and operations are immature.
  • Domains with unstable semantics, where the business language is still in flux and publishing broad event contracts would harden confusion too early.
  • Migration situations with high business risk, where replacing a trusted batch close process would jeopardize compliance or financial integrity.

Sometimes the old nightly job is not technical debt. Sometimes it is the architecture expressing the rhythm of the business.

That’s worth saying because too many modernization programs confuse visible age with architectural incorrectness.

Related Patterns

Several patterns sit naturally around this decision.

Event-carried state transfer

Useful when consumers need state snapshots in events rather than reconstructing from a long event history. Good for practical integration, though it increases payload coupling.

CQRS and materialized views

A natural fit where streams feed read models for current operational state. Particularly useful in customer, order, and inventory scenarios.

Transactional outbox

Essential when domain state changes and event publication must remain consistent.

Sagas / process managers

Helpful for long-running cross-service workflows, though often overused. If the business process is fundamentally periodic and reconciled later, a saga may add noise rather than clarity.

Data mesh and domain data products

Relevant when bounded contexts publish well-governed analytical outputs. But again, many analytical products remain batch-shaped, even if sourced from streams.

Lambda and Kappa styles

Useful reference points, but in practice many enterprises end up with a pragmatic hybrid: streaming for operations, batch for settlement, and a lakehouse for analytics and replay.

Summary

Streaming everything is not modern architecture. It is often just modern-looking confusion.

The real design question is not whether batch is old and stream is new. It is how decisions in the business unfold over time, and what kind of truth each decision requires. Some truths are urgent. Some are approximate until corrected. Some must be reconciled before they are trusted. Some must be certified before they are acted upon.

That gives us a better lens: batch vs stream is a decision topology problem.

Use streams for event-rich, time-sensitive operational reactions. Use projections for state people actually need. Use batch for consolidation, attestation, and periodic business rhythms. Use reconciliation where truths from different contexts must meet. And migrate progressively, with strangler patterns and parallel run, because the enterprise is not a whiteboard.

A good architecture does not worship motion. It respects meaning.

And in enterprise systems, meaning always wins in the end.

Frequently Asked Questions

What is event-driven architecture?

Event-driven architecture (EDA) decouples services by having producers publish events to a broker like Kafka, while consumers subscribe independently. This reduces direct coupling, improves resilience, and allows new consumers to be added without modifying producers.

When should you use Kafka vs a message queue?

Use Kafka when you need event replay, high throughput, long retention, or multiple independent consumers reading the same stream. Use a traditional message queue (RabbitMQ, SQS) when you need simple point-to-point delivery, low latency, or complex routing logic per message.

How do you model event-driven architecture in ArchiMate?

In ArchiMate, the Kafka broker is a Technology Service or Application Component. Topics are Data Objects or Application Services. Producer/consumer services are Application Components connected via Flow relationships. This makes the event topology explicit and queryable.