Real systems do not fail because the data was late by 400 milliseconds. They fail because nobody agreed what “late” meant.
That is the dirty secret behind a lot of modern streaming architecture. We built event-driven platforms, wired Kafka into every corner of the enterprise, split processing into fleets of microservices, and congratulated ourselves for being real-time. Then finance asked for the month-end numbers. Operations asked why yesterday’s shipment count changed overnight. Customer support asked why an order was shown as “paid” in one screen and “pending” in another. Suddenly the proud streaming platform looked less like a nervous system and more like a rumor mill.
The industry oversold purity. “Everything streaming” sounds elegant on a conference slide. In an enterprise, it often means pushing uncertainty downstream at machine speed.
The better pattern is humbler and more durable: a hybrid pipeline topology where streaming handles immediacy and batch windows restore semantic certainty. Not because streaming is weak. Because business truth has shape, and that shape is rarely fully known in the first few seconds of an event’s life.
This is not a retreat to old-school nightly ETL. It is a deliberate architecture choice. Stream for reaction. Batch for correction. Use events to move quickly, and windows to decide what you actually mean.
That distinction matters.
Context
Most enterprises now sit in an awkward middle ground. They have outgrown central batch-only data processing, but they have not escaped the economics and semantics that made batch useful in the first place. Orders arrive continuously. Inventory changes in bursts. Payments settle asynchronously. Customer records drift across channels. Compliance and finance still need reconciled, auditable answers.
So the platform evolves in layers:
- Kafka or another event backbone for transport
- microservices owning operational capabilities
- stream processors for low-latency transformations
- analytical stores or lakehouse platforms for historical views
- operational databases still acting as local systems of record
- a growing need for reconciliation, replay, and end-of-period certainty
In many organizations, the first streaming wave is driven by customer-facing needs: fraud detection, personalization, alerts, order tracking, pricing, and workflow orchestration. These are good use cases. They reward low latency.
The second wave is where the architecture gets serious. Finance wants a trusted ledger. Supply chain wants exact inventory positions. Risk wants reproducibility. Audit wants lineage. Data governance wants consistent definitions. Product teams discover that an event saying OrderShipped is not the same thing as revenue recognition, and a PaymentReceived event is not the same thing as settled cash.
This is where domain-driven design becomes useful, not fashionable.
A platform is not “real-time” because messages move quickly. It is real-time when the domain semantics of a decision tolerate immediate interpretation. If your domain cannot confidently interpret an event until more context arrives, then low latency is a transport property, not a business truth.
That is why batch windows remain essential.
Problem
Pure streaming topologies have a recurring flaw: they confuse event arrival with business finality.
An event appears on Kafka. A service consumes it. Another service enriches it. A stream processor aggregates it. A dashboard updates. Everybody feels fast. But the event may be incomplete, out of order, duplicated, subsequently corrected, or semantically provisional.
A few common examples:
- A retail order is placed, but payment authorization later expires.
- A shipment event arrives before inventory decrement due to integration lag.
- A refund event is posted after the revenue dashboard already counted the sale.
- A healthcare claim enters as approved, then is adjusted after adjudication.
- A telecom usage event is emitted promptly, but rating rules are back-applied.
- A bank transfer event appears accepted, but settlement later fails.
Streaming handles all of this mechanically. It can reorder, buffer, watermark, compact, and reprocess. But enterprises do not struggle because the mechanics are impossible. They struggle because there are multiple valid meanings of “current state”:
- current observed state
- current inferred state
- current legally recognized state
- current financially closed state
- current customer-visible state
Those are not the same thing.
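Those different meanings can be made concrete as distinct read models. A minimal sketch, assuming hypothetical names (`StateLens`, `OrderView` are invented for illustration, not a known API):

```python
from dataclasses import dataclass
from enum import Enum

class StateLens(Enum):
    OBSERVED = "observed"              # what events have arrived
    INFERRED = "inferred"              # what we derive from them
    LEGALLY_RECOGNIZED = "legal"       # what contracts or regulation recognize
    FINANCIALLY_CLOSED = "financial"   # what a closed period reports
    CUSTOMER_VISIBLE = "customer"      # what the customer is shown

@dataclass
class OrderView:
    order_id: str
    lens: StateLens
    status: str

# The same order can honestly carry different answers per lens:
views = [
    OrderView("ord-1", StateLens.OBSERVED, "payment_event_received"),
    OrderView("ord-1", StateLens.INFERRED, "paid"),
    OrderView("ord-1", StateLens.FINANCIALLY_CLOSED, "pending_settlement"),
]
assert len({v.status for v in views}) == 3  # three truths, none wrong
```

The point of naming the lens explicitly is that a consumer can no longer pretend it is reading "the" state of the order.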
If you force a single streaming pipeline to represent all of them at once, the architecture starts lying. Teams invent hidden side rules. Exceptions pile up. Replays become dangerous because they overwrite operational assumptions. “Eventually consistent” turns into “politically negotiated.”
Batch windows solve a practical problem: they create explicit moments when the organization agrees to stabilize, reconcile, and publish a version of truth fit for a particular purpose.
Not one truth for all use cases. That fantasy should be retired. But a truth with a defined semantic contract.
Forces
Several forces push architects toward hybrid designs.
1. Immediacy versus correctness
The business often needs fast reactions:
- fraud blocks in seconds
- customer notifications in near real time
- order routing immediately after submission
- operational alerts as conditions emerge
But those same domains later require corrected, auditable records. The faster stream often sees only the first draft of reality.
2. Domain semantics are not uniform
A shipment domain can tolerate transient uncertainty differently from a general ledger domain. Inventory reservation, fulfillment confirmation, invoicing, and cash settlement live on different clocks.
This is classic domain-driven design territory. Different bounded contexts speak different truths at different times. The architecture should reflect that instead of flattening everything into one event soup.
3. Late and changed data are normal
Not edge cases. Normal.
External partners resend files. Mobile clients reconnect. Upstream systems replay messages. CDC feeds reorder changes. Manual corrections arrive after business cutoffs. Reference data changes after transactions were evaluated.
If your platform has no designed place to absorb correction, it will absorb it chaotically.
4. Enterprises need closure points
Month-end close, daily settlement, intraday risk snapshots, SLA reports, commission calculations, tax calculations, compliance submissions—these all need windows, versions, and reproducibility.
A streaming topology without closure points is like an accounting department that never closes the books.
5. Cost and operability
Low-latency compute everywhere is expensive. Continuous stateful processing is operationally sensitive. Some calculations simply do not need to run continuously. Many heavy joins, historical corrections, and broad reconciliations are far cheaper and safer in bounded windows.
6. Consumer diversity
Some consumers need fresh but approximate data. Others need slower but certified data. Architectures improve when they stop pretending those consumers are the same.
Solution
Use a hybrid pipeline topology.
In plain terms:
- Streaming path for immediate reactions, provisional views, local decisions, and operational workflows
- Batch window path for reconciliation, correction, semantic normalization, and publication of trusted domain outputs
The key is not to treat batch as a fallback. Treat it as a first-class semantic stage.
A healthy hybrid topology usually has three layers of meaning:
- Raw event flow
Facts as observed by producing systems. Fast, messy, valuable.
- Operational streaming projections
Low-latency derived views used for responsive experiences and local automation.
- Windowed reconciled outputs
Domain-certified datasets, ledgers, aggregates, and snapshots produced in explicit intervals.
That interval might be:
- every 5 minutes for SLA reporting
- hourly for inventory balancing
- daily for financial statements
- event-time windows with watermark tolerance
- business-day close aligned to a market or geography
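The event-time variant with watermark tolerance can be sketched in plain Python. This is a didactic illustration, not a Flink or Kafka Streams API; the event shape `(event_time, value)` and the watermark rule are assumptions of the sketch:

```python
from collections import defaultdict

def windowed_sums(events, window_s, allowed_lateness_s):
    """Assign events to fixed event-time windows and emit a window's total
    only once the watermark (max event time seen, minus allowed lateness)
    has passed the window's end. Events arriving after closure are returned
    separately instead of being silently dropped."""
    open_windows = defaultdict(float)   # window_start -> running sum
    emitted, late = {}, []
    watermark = float("-inf")
    for event_time, value in events:
        watermark = max(watermark, event_time - allowed_lateness_s)
        start = (event_time // window_s) * window_s
        if start + window_s <= watermark:
            late.append((event_time, value))   # window already closed
            continue
        open_windows[start] += value
        # close every window whose end the watermark has now passed
        for s in [s for s in open_windows if s + window_s <= watermark]:
            emitted[s] = open_windows.pop(s)
    # end of stream: flush whatever is still open
    for s, total in open_windows.items():
        emitted[s] = total
    return emitted, late
```

Note where the late list goes: in a hybrid topology it feeds the reconciliation path, not a dead-letter void.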
The design discipline is to make semantics explicit:
- what is provisional?
- what is final for now?
- what can be corrected?
- who owns correction?
- where is replay safe?
- which publication becomes authoritative for which bounded context?
That is the architecture pattern. Not “stream plus batch” as a technology stack. Hybrid topology as semantic governance.
Architecture
A typical implementation uses Kafka as the event backbone, microservices as capability owners, stream processors for immediate transformations, and a batch or micro-batch layer for reconciled outputs.
The streaming side is where you respond now. Fraud scoring, order status updates, notifications, operational dashboards—things where speed matters and provisionality is acceptable.
The batch window side is where you answer responsibly. It reconciles against local state stores, reference data, corrections, and missing events. It applies domain rules that require more complete context. It publishes outputs that carry stronger guarantees.
The important move here is not technical but semantic: stop letting every downstream consumer read directly from the stream and infer meaning independently. That produces fragmentation. Instead, publish named products with clear contracts.
For example:
- order_status_realtime for customer experience
- shipment_exception_stream for operations
- daily_order_ledger for finance
- inventory_position_hourly_certified for supply chain planning
These are different products because they represent different truths.
Domain semantics and bounded contexts
Domain-driven design helps determine where windows belong.
Take an order-to-cash domain:
- Ordering knows submission, cancellation, customer intent
- Payments knows authorization, capture, settlement, chargeback
- Fulfillment knows pick, pack, ship, return
- Billing knows invoice issuance and adjustments
- Finance knows revenue recognition, tax, and accounting periods
A naive platform streams all events and tries to build one “master order state.” That sounds neat and goes bad fast. Each context has different invariants and time expectations. Better to let each bounded context publish its operational events, then use reconciliation windows to create cross-context outputs where the enterprise actually needs integrated meaning.
The classic integrated output is an order-to-cash ledger. That ledger is not simply “all events in one place.” It is a domain artifact with policy encoded into it:
- event precedence rules
- duplicate handling
- late-arrival cutoffs
- adjustment logic
- period attribution rules
- exception queues for unresolved records
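A hedged sketch of what "policy encoded into a ledger" means in code. Event shapes, precedence values, and field names are assumptions for illustration, not a real schema:

```python
# Precedence: later lifecycle stages win when multiple events describe one order.
PRECEDENCE = {"order_placed": 0, "payment_captured": 1, "shipped": 2, "refunded": 3}

def reconcile_window(events, cutoff_ts):
    """Apply precedence, duplicate handling, a late-arrival cutoff, and an
    exception queue to one window's worth of events."""
    ledger, exceptions, seen = {}, [], set()
    ordered = sorted(events, key=lambda e: (e["order_id"], PRECEDENCE.get(e["type"], 99)))
    for e in ordered:
        if e["event_id"] in seen:            # duplicate handling
            continue
        seen.add(e["event_id"])
        if e["ts"] > cutoff_ts:              # late-arrival cutoff
            exceptions.append({"reason": "after_cutoff", **e})
            continue
        if e["type"] not in PRECEDENCE:      # unresolvable record
            exceptions.append({"reason": "unknown_type", **e})
            continue
        ledger[e["order_id"]] = e["type"]    # precedence decides final state
    return ledger, exceptions
```

The exception list is a first-class output, not a log line: someone owns it, and the next window can pick it up.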
That is architecture doing business work, which is what good enterprise architecture always was.
Reconciliation is not optional plumbing
In many teams, reconciliation is treated as a support process after the “real” architecture. That is backward. Reconciliation is the mechanism by which a distributed system admits it is distributed.
There are several types:
- event-to-state reconciliation: does the projection reflect source-of-record state?
- cross-context reconciliation: does fulfillment align with billing and payments?
- period reconciliation: are all in-window records complete enough for closure?
- control total reconciliation: do counts, sums, and balances match expected thresholds?
- exception reconciliation: what cannot yet be resolved and who owns it?
Without these, replay is dangerous. With them, replay becomes routine.
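Control total reconciliation, the simplest of these, can be sketched in a few lines. The row shape and the single `amount` field are assumptions; a real check would also compare per-key balances against thresholds:

```python
def control_totals_match(source_rows, certified_rows, tolerance=0.0):
    """Compare counts and sums between the source of record and a certified
    output before publishing it."""
    src_sum = sum(r["amount"] for r in source_rows)
    cert_sum = sum(r["amount"] for r in certified_rows)
    variances = {
        "count_diff": len(certified_rows) - len(source_rows),
        "sum_diff": round(cert_sum - src_sum, 2),
    }
    ok = variances["count_diff"] == 0 and abs(variances["sum_diff"]) <= tolerance
    return ok, variances
```

A failed check should block publication and open an exception, not merely emit a warning metric.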
Migration Strategy
Most enterprises cannot replace an existing batch estate with a perfect event-driven platform in one move. Nor should they. The path is usually a progressive strangler migration.
Begin by respecting what the current batch windows already do. They often carry hidden business semantics:
- fiscal period cutoffs
- correction handling
- partner-file lateness allowances
- legal reporting conventions
- exception routing
- manual review checkpoints
If you bulldoze those semantics in the name of streaming modernization, the business will simply recreate them outside the platform, usually in spreadsheets and side databases.
A sensible migration sequence looks like this:
Step 1: expose current events without changing closure semantics
Introduce Kafka or an event bus to publish operational facts. Use CDC where necessary. Do not initially replace the trusted batch outputs. Let teams build real-time features on top of the event stream while finance and reporting continue using existing reconciled processes.
This buys learning without semantic risk.
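One trustworthy way to expose those operational facts is the outbox pattern mentioned under related patterns: write the business row and the event row in one transaction, and let a relay publish later. A minimal sketch using SQLite in place of a production database (table and column names are assumptions):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id):
    with conn:  # single atomic transaction: no dual-write inconsistency
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "placed"))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("orders", json.dumps({"type": "order_placed", "id": order_id})))

def relay_pending():
    # a real relay would produce these rows to Kafka, then mark them published
    rows = conn.execute("SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
    conn.executemany("UPDATE outbox SET published = 1 WHERE id = ?", [(r[0],) for r in rows])
    return rows

place_order("ord-1")
assert len(relay_pending()) == 1
assert relay_pending() == []   # already published
```

Because the event is committed with the state change, downstream consumers see facts the system of record actually recorded, which is exactly the property Step 1 needs.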
Step 2: create provisional streaming projections
Build low-latency projections for use cases that benefit from speed:
- customer order tracking
- fraud/risk alerts
- operational exception detection
- warehouse workload balancing
Mark these clearly as provisional where needed. The architecture should not hide this.
Step 3: externalize reconciliation logic into explicit windowed pipelines
Take the most important batch jobs and reframe them as domain reconciliation services. Move rules out of opaque scripts and into governed pipelines with lineage, versioning, and observable controls.
Now you are not merely “keeping batch.” You are modernizing it.
Step 4: publish certified outputs as products
Create stable, versioned outputs for enterprise consumers:
- daily cash position
- hourly inventory position
- order-to-cash ledger
- adjusted claims summary
Downstream consumers should shift from scraping raw streams and bespoke tables to consuming these certified products.
Step 5: gradually shorten windows where the domain allows
Some windows can move from nightly to hourly, hourly to fifteen-minute, or fixed to event-time with lateness tolerance. Others should remain daily or period-based. The point is not to eliminate windows. It is to make them as small as the domain safely permits.
That is strangler migration done properly. Not by replacing one platform with another, but by progressively moving authority from old mechanisms to explicit domain products.
Enterprise Example
Consider a global retailer with e-commerce, stores, regional warehouses, and multiple payment providers. This is where hybrid pipeline topology proves its worth.
They began with nightly ETL feeding inventory and finance reports. Then they introduced Kafka and microservices to modernize order management. Soon they had:
- order events from web and stores
- warehouse events from fulfillment systems
- payment events from PSP integrations
- return events from customer service tools
- inventory deltas from WMS and POS systems
The first wave looked successful. Customer-facing order tracking improved. Operations got near-real-time alerts. Fraud blocks happened quickly.
Then the cracks opened.
Inventory showed negative availability in some regions because reservations and decrements arrived out of order. Finance saw order totals shift after daily close because refunds and tax adjustments were processed late. Customer support saw one status in the CRM and another in the self-service portal. Teams argued whether Kafka topics or service databases represented “truth.”
The fix was not more streaming. It was better semantics.
The retailer introduced a hybrid topology with three explicit products:
- Real-time order journey view
Built from Kafka streams, used by customer channels and support. Fast, approximate, correction-friendly.
- Hourly certified inventory position
Reconciled warehouse movements, store sales, returns, reservations, and cycle-count adjustments. Used for replenishment and planning. Included exception counts where confidence thresholds were not met.
- Daily order-to-cash ledger
Reconciled order placement, tax, discount allocation, payment capture, shipment, returns, refunds, and settlement. Used by finance and audit.
This changed behavior. Product teams stopped overloading the real-time order topic for financial use. Supply chain stopped inventing local correction tables. Finance stopped distrusting the platform because the daily ledger had explicit controls and versioned reruns.
Most importantly, the architecture mirrored the business:
- customer experience optimized for freshness
- planning optimized for balanced correctness
- finance optimized for closure and auditability
That is what mature enterprise architecture looks like. Different truths, deliberately governed.
Operational Considerations
Hybrid architectures are not simply two pipelines. They need operational discipline.
Data contracts
Event contracts should capture not just schema but semantics:
- event time versus processing time
- correction indicators
- source identity
- deduplication keys
- business identifiers
- causality where available
Without this, downstream reconciliation becomes guesswork.
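A contract that captures semantics might look like the envelope below. This is a hypothetical shape; the field names are assumptions for illustration, not a standard:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EventEnvelope:
    """Hypothetical event envelope carrying semantics, not just schema."""
    event_id: str                   # deduplication key
    business_key: str               # e.g. order number
    event_time: str                 # when the fact occurred (producer clock)
    processing_time: str            # when the platform first saw it
    source_system: str              # source identity
    is_correction: bool = False     # correction indicator
    corrects_event_id: Optional[str] = None  # causality, where available
```

With an envelope like this, reconciliation can distinguish a late original from a correction, which raw payloads rarely allow.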
Observability
You need more than service latency dashboards. Track:
- event lag
- late arrival rates
- duplicate rates
- watermark delay
- reconciliation match percentages
- control total variance
- exception backlog
- replay duration
- certified output publication SLA
If the platform cannot tell you how uncertain it currently is, it is not under control.
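A few of those uncertainty signals can be computed directly from an event feed. A sketch, assuming events arrive as `(event_time, arrival_time, event_id)` tuples in epoch seconds:

```python
def uncertainty_metrics(events, now, lateness_threshold_s=60):
    """Quantify how uncertain the stream currently is: late-arrival rate,
    duplicate rate, and watermark delay relative to the wall clock."""
    if not events:
        return {"late_rate": 0.0, "duplicate_rate": 0.0, "watermark_delay_s": 0.0}
    late = sum(1 for et, at, _ in events if at - et > lateness_threshold_s)
    ids = [eid for _, _, eid in events]
    duplicates = len(ids) - len(set(ids))
    max_event_time = max(et for et, _, _ in events)
    return {
        "late_rate": late / len(events),
        "duplicate_rate": duplicates / len(events),
        "watermark_delay_s": now - max_event_time,  # how far truth lags the clock
    }
```

Publishing these alongside the certified outputs turns "eventually consistent" from a shrug into a number.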
Idempotency and replay
Every stage must be replayable. Streaming projections should tolerate duplicate events. Batch reconciliation should be versioned and deterministic within the rule set used. Certified outputs need traceability:
- which inputs were included
- which cutoffs applied
- which rule version was used
- what exceptions were unresolved
Replay without lineage is just a more expensive form of panic.
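The duplicate-tolerance requirement for streaming projections reduces to a simple invariant: applying the same event twice leaves state unchanged. A minimal sketch (class and field names are invented for illustration):

```python
class IdempotentProjection:
    """A projection that can be replayed safely: duplicates are no-ops,
    so reprocessing the full log reproduces the same state."""
    def __init__(self):
        self.state = {}       # order_id -> status
        self.applied = set()  # event_ids already applied

    def apply(self, event_id, order_id, status):
        if event_id in self.applied:
            return False      # duplicate or replayed event: ignore
        self.applied.add(event_id)
        self.state[order_id] = status
        return True

p = IdempotentProjection()
log = [("e1", "A", "placed"), ("e2", "A", "paid")]
for e in log + log:           # replay the whole log a second time
    p.apply(*e)
assert p.state == {"A": "paid"}
```

Production versions persist the applied-id set (or use offsets plus deterministic keys), but the invariant is the same.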
Reference data management
Many streaming errors are really reference-data errors: product hierarchy changed, store calendar updated, tax rule revised, supplier mapping corrected. Batch windows are often where these shifts can be applied safely and consistently.
Governance and ownership
Someone must own each semantic product. Not the platform team alone. Domain owners need to own:
- definitions
- cutoff policies
- exception handling
- acceptable lateness
- quality thresholds
A data product with no business owner is a technical artifact waiting to be disputed.
Tradeoffs
Hybrid topology is a grown-up design. Which means it comes with tradeoffs.
Pros
- supports low-latency operational reactions
- provides closure points for finance, audit, and compliance
- handles late and corrected data more safely
- reduces semantic confusion across consumers
- allows progressive modernization from legacy batch estates
- often lowers compute cost for heavy reconciliation workloads
Cons
- more moving parts
- multiple versions of truth must be explained
- requires careful product naming and contracts
- reconciliation logic can become a new complexity hotspot
- window boundaries create policy debates
- downstream teams may resent not reading raw events directly
The right response is not to deny these tradeoffs. It is to make them explicit. Simplicity is not having one pipeline. Simplicity is making the semantics understandable.
Failure Modes
There are a few classic ways teams get this wrong.
1. Treating the batch layer as a dump
If the windowed pipeline becomes an ungoverned landing zone for “fixing things later,” it turns into a swamp. The batch layer must have stronger semantics, not weaker ones.
2. Publishing only technical stages
Topics like orders_enriched_v7 and payments_joined_compacted are not business products. They are processing debris. Consumers need named outputs tied to domain meaning.
3. Using one window policy for every domain
A universal hourly close is as naive as universal real-time. Domains differ. Market close, store close, settlement day, and tax period are distinct clocks.
4. Ignoring exception pathways
Some records cannot be automatically reconciled. Good systems route them explicitly. Bad systems silently coerce them into false certainty.
5. Replacing old batch jobs without understanding hidden business rules
Legacy systems are ugly, but ugliness often conceals policy. If migration teams translate only data movement and not business semantics, the new platform will be faster and less correct.
6. Letting streaming projections become de facto systems of record
This happens all the time. A dashboard becomes trusted. Then a workflow depends on it. Then an audit asks for reproducibility, and nobody can explain the state because the projection was built for immediacy, not durability.
When Not To Use
Hybrid pipeline topology is not universal.
Do not use it when the domain is naturally simple and tolerant of eventual convergence without formal closure. For example:
- lightweight clickstream personalization
- transient IoT telemetry monitoring
- social feed ranking
- ephemeral notification workflows
In these spaces, adding batch windows may just create drag.
Also avoid overengineering if:
- there is no downstream need for certified outputs
- late or corrected data is truly rare and low impact
- a single operational database can still meet the workload
- the domain has not yet stabilized enough to define meaningful windows
And be careful in ultra-low-latency domains where windowed correction is irrelevant to the primary value, such as certain algorithmic decision loops. There, the architecture may still emit historical correction datasets, but the operational path is the product.
The litmus test is simple: does the business need explicit closure, reconciliation, or auditable correction? If not, keep it simpler.
Related Patterns
A hybrid pipeline topology sits well with several established patterns:
- Lambda architecture, though in practice this article argues for a more semantically grounded version rather than technology duplication for its own sake
- Kappa-style streaming with replay, useful on the operational side but often incomplete without explicit closure semantics
- CQRS, where read models can be provisional and domain outputs can be certified separately
- Event sourcing, especially when paired with snapshotting and policy-aware reconciliation
- Data mesh, if you take product ownership seriously and do not confuse raw events with products
- Strangler fig migration, the right way to evolve from monolithic batch estates
- Outbox pattern and CDC, practical ways to capture trustworthy operational events
- Sagas, for process coordination, though sagas do not eliminate the need for later reconciliation
These patterns are complementary. The missing ingredient in many implementations is semantic honesty.
Summary
Streaming platforms are excellent at movement. Enterprises are judged on meaning.
That is why your streaming platform needs batch windows.
Not as nostalgia. Not as a concession. As an architectural acknowledgement that business facts ripen over time. Some are useful immediately. Some are only trustworthy after context, correction, and closure. A mature platform serves both realities.
Use streaming for responsiveness. Use windowed reconciliation for certainty. Design data products around bounded contexts. Name truths according to their purpose. Migrate progressively with a strangler approach. Treat reconciliation as first-class architecture, not after-hours repair work.
A good hybrid pipeline topology does something subtle but powerful: it stops the platform from pretending that every event is final, and stops the enterprise from pretending that waiting for finality is always acceptable.
Fast and right are not enemies. But they do need different rooms in the house.
Frequently Asked Questions
What is event-driven architecture?
Event-driven architecture (EDA) decouples services by having producers publish events to a broker like Kafka, while consumers subscribe independently. This reduces direct coupling, improves resilience, and allows new consumers to be added without modifying producers.
When should you use Kafka vs a message queue?
Use Kafka when you need event replay, high throughput, long retention, or multiple independent consumers reading the same stream. Use a traditional message queue (RabbitMQ, SQS) when you need simple point-to-point delivery, low latency, or complex routing logic per message.
How do you model event-driven architecture in ArchiMate?
In ArchiMate, the Kafka broker is a Technology Service or Application Component. Topics are Data Objects or Application Services. Producer/consumer services are Application Components connected via Flow relationships. This makes the event topology explicit and queryable.