Real systems do not fail because the data was late by 400 milliseconds. They fail because nobody agreed what “late” meant.
That is the dirty secret behind a lot of modern streaming architecture. We built event-driven platforms, wired Kafka into every corner of the enterprise, split processing into fleets of microservices, and congratulated ourselves for being real-time. Then finance asked for the month-end numbers. Operations asked why yesterday’s shipment count changed overnight. Customer support asked why an order was shown as “paid” in one screen and “pending” in another. Suddenly the proud streaming platform looked less like a nervous system and more like a rumor mill.
The industry oversold purity. “Everything streaming” sounds elegant on a conference slide. In an enterprise, it often means pushing uncertainty downstream at machine speed.
The better pattern is humbler and more durable: a hybrid pipeline topology where streaming handles immediacy and batch windows restore semantic certainty. Not because streaming is weak. Because business truth has shape, and that shape is rarely fully known in the first few seconds of an event’s life.
This is not a retreat to old-school nightly ETL. It is a deliberate architecture choice. Stream for reaction. Batch for correction. Use events to move quickly, and windows to decide what you actually mean.
That distinction matters.
Context
Most enterprises now sit in an awkward middle ground. They have outgrown central batch-only data processing, but they have not escaped the economics and semantics that made batch useful in the first place. Orders arrive continuously. Inventory changes in bursts. Payments settle asynchronously. Customer records drift across channels. Compliance and finance still need reconciled, auditable answers.
So the platform evolves in layers:
- Kafka or another event backbone for transport
- microservices owning operational capabilities
- stream processors for low-latency transformations
- analytical stores or lakehouse platforms for historical views
- operational databases still acting as local systems of record
- a growing need for reconciliation, replay, and end-of-period certainty
In many organizations, the first streaming wave is driven by customer-facing needs: fraud detection, personalization, alerts, order tracking, pricing, and workflow orchestration. These are good use cases. They reward low latency.
The second wave is where the architecture gets serious. Finance wants a trusted ledger. Supply chain wants exact inventory positions. Risk wants reproducibility. Audit wants lineage. Data governance wants consistent definitions. Product teams discover that an event saying OrderShipped is not the same thing as revenue recognition, and a PaymentReceived event is not the same thing as settled cash.
This is where domain-driven design becomes useful, not fashionable.
A platform is not “real-time” because messages move quickly. It is real-time when the domain semantics of a decision tolerate immediate interpretation. If your domain cannot confidently interpret an event until more context arrives, then low latency is a transport property, not a business truth.
That is why batch windows remain essential.
Problem
Pure streaming topologies have a recurring flaw: they confuse event arrival with business finality.
An event appears on Kafka. A service consumes it. Another service enriches it. A stream processor aggregates it. A dashboard updates. Everybody feels fast. But the event may be incomplete, out of order, duplicated, subsequently corrected, or semantically provisional.
A few common examples:
- A retail order is placed, but payment authorization later expires.
- A shipment event arrives before inventory decrement due to integration lag.
- A refund event is posted after the revenue dashboard already counted the sale.
- A healthcare claim enters as approved, then is adjusted after adjudication.
- A telecom usage event is emitted promptly, but rating rules are back-applied.
- A bank transfer event appears accepted, but settlement later fails.
Streaming handles all of this mechanically. It can reorder, buffer, watermark, compact, and reprocess. But enterprises do not struggle because the mechanics are impossible. They struggle because there are multiple valid meanings of “current state”:
- current observed state
- current inferred state
- current legally recognized state
- current financially closed state
- current customer-visible state
Those are not the same thing.
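Those different meanings can be made concrete as distinct read models. A minimal sketch, assuming hypothetical names (`StateLens`, `OrderView` are invented for illustration, not a known API):

```python
from dataclasses import dataclass
from enum import Enum

class StateLens(Enum):
    OBSERVED = "observed"              # what events have arrived
    INFERRED = "inferred"              # what we derive from them
    LEGALLY_RECOGNIZED = "legal"       # what contracts or regulation recognize
    FINANCIALLY_CLOSED = "financial"   # what a closed period reports
    CUSTOMER_VISIBLE = "customer"      # what the customer is shown

@dataclass
class OrderView:
    order_id: str
    lens: StateLens
    status: str

# The same order can honestly carry different answers per lens:
views = [
    OrderView("ord-1", StateLens.OBSERVED, "payment_event_received"),
    OrderView("ord-1", StateLens.INFERRED, "paid"),
    OrderView("ord-1", StateLens.FINANCIALLY_CLOSED, "pending_settlement"),
]
assert len({v.status for v in views}) == 3  # three truths, none wrong
```

The point of naming the lens explicitly is that a consumer can no longer pretend it is reading "the" state of the order.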
If you force a single streaming pipeline to represent all of them at once, the architecture starts lying. Teams invent hidden side rules. Exceptions pile up. Replays become dangerous because they overwrite operational assumptions. “Eventually consistent” turns into “politically negotiated.”
Batch windows solve a practical problem: they create explicit moments when the organization agrees to stabilize, reconcile, and publish a version of truth fit for a particular purpose.
Not one truth for all use cases. That fantasy should be retired. But a truth with a defined semantic contract.
Forces
Several forces push architects toward hybrid designs.
1. Immediacy versus correctness
The business often needs fast reactions:
- fraud blocks in seconds
- customer notifications in near real time
- order routing immediately after submission
- operational alerts as conditions emerge
But those same domains later require corrected, auditable records. The faster stream often sees only the first draft of reality.
2. Domain semantics are not uniform
A shipment domain can tolerate transient uncertainty differently from a general ledger domain. Inventory reservation, fulfillment confirmation, invoicing, and cash settlement live on different clocks.
This is classic domain-driven design territory. Different bounded contexts speak different truths at different times. The architecture should reflect that instead of flattening everything into one event soup.
3. Late and changed data are normal
Not edge cases. Normal.
External partners resend files. Mobile clients reconnect. Upstream systems replay messages. CDC feeds reorder changes. Manual corrections arrive after business cutoffs. Reference data changes after transactions were evaluated.
If your platform has no designed place to absorb correction, it will absorb it chaotically.
4. Enterprises need closure points
Month-end close, daily settlement, intraday risk snapshots, SLA reports, commission calculations, tax calculations, compliance submissions—these all need windows, versions, and reproducibility.
A streaming topology without closure points is like an accounting department that never closes the books.
5. Cost and operability
Low-latency compute everywhere is expensive. Continuous stateful processing is operationally sensitive. Some calculations simply do not need to run continuously. Many heavy joins, historical corrections, and broad reconciliations are far cheaper and safer in bounded windows.
6. Consumer diversity
Some consumers need fresh but approximate data. Others need slower but certified data. Architectures improve when they stop pretending those consumers are the same.
Solution
Use a hybrid pipeline topology.
In plain terms:
- Streaming path for immediate reactions, provisional views, local decisions, and operational workflows
- Batch window path for reconciliation, correction, semantic normalization, and publication of trusted domain outputs
The key is not to treat batch as a fallback. Treat it as a first-class semantic stage.
A healthy hybrid topology usually has three layers of meaning:
- Raw event flow
Facts as observed by producing systems. Fast, messy, valuable.
- Operational streaming projections
Low-latency derived views used for responsive experiences and local automation.
- Windowed reconciled outputs
Domain-certified datasets, ledgers, aggregates, and snapshots produced in explicit intervals.
That interval might be:
- every 5 minutes for SLA reporting
- hourly for inventory balancing
- daily for financial statements
- event-time windows with watermark tolerance
- business-day close aligned to a market or geography
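The event-time variant with watermark tolerance can be sketched in plain Python. This is a didactic illustration, not a Flink or Kafka Streams API; the event shape `(event_time, value)` and the watermark rule are assumptions of the sketch:

```python
from collections import defaultdict

def windowed_sums(events, window_s, allowed_lateness_s):
    """Assign events to fixed event-time windows and emit a window's total
    only once the watermark (max event time seen, minus allowed lateness)
    has passed the window's end. Events arriving after closure are returned
    separately instead of being silently dropped."""
    open_windows = defaultdict(float)   # window_start -> running sum
    emitted, late = {}, []
    watermark = float("-inf")
    for event_time, value in events:
        watermark = max(watermark, event_time - allowed_lateness_s)
        start = (event_time // window_s) * window_s
        if start + window_s <= watermark:
            late.append((event_time, value))   # window already closed
            continue
        open_windows[start] += value
        # close every window whose end the watermark has now passed
        for s in [s for s in open_windows if s + window_s <= watermark]:
            emitted[s] = open_windows.pop(s)
    # end of stream: flush whatever is still open
    for s, total in open_windows.items():
        emitted[s] = total
    return emitted, late
```

Note where the late list goes: in a hybrid topology it feeds the reconciliation path, not a dead-letter void.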
The design discipline is to make semantics explicit:
- what is provisional?
- what is final for now?
- what can be corrected?
- who owns correction?
- where is replay safe?
- which publication becomes authoritative for which bounded context?
That is the architecture pattern. Not “stream plus batch” as a technology stack. Hybrid topology as semantic governance.
Architecture
A typical implementation uses Kafka as the event backbone, microservices as capability owners, stream processors for immediate transformations, and a batch or micro-batch layer for reconciled outputs.
The streaming side is where you respond now. Fraud scoring, order status updates, notifications, operational dashboards—things where speed matters and provisionality is acceptable.
The batch window side is where you answer responsibly. It reconciles against local state stores, reference data, corrections, and missing events. It applies domain rules that require more complete context. It publishes outputs that carry stronger guarantees.
The important move here is not technical but semantic: stop letting every downstream consumer read directly from the stream and infer meaning independently. That produces fragmentation. Instead, publish named products with clear contracts.
For example:
- order_status_realtime for customer experience
- shipment_exception_stream for operations
- daily_order_ledger for finance
- inventory_position_hourly_certified for supply chain planning
These are different products because they represent different truths.
Domain semantics and bounded contexts
Domain-driven design helps determine where windows belong.
Take an order-to-cash domain:
- Ordering knows submission, cancellation, customer intent
- Payments knows authorization, capture, settlement, chargeback
- Fulfillment knows pick, pack, ship, return
- Billing knows invoice issuance and adjustments
- Finance knows revenue recognition, tax, and accounting periods
A naive platform streams all events and tries to build one “master order state.” That sounds neat and goes bad fast. Each context has different invariants and time expectations. Better to let each bounded context publish its operational events, then use reconciliation windows to create cross-context outputs where the enterprise actually needs integrated meaning.
The classic integrated output is an order-to-cash ledger. That ledger is not simply “all events in one place.” It is a domain artifact with policy encoded into it:
- event precedence rules
- duplicate handling
- late-arrival cutoffs
- adjustment logic
- period attribution rules
- exception queues for unresolved records
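A hedged sketch of what "policy encoded into a ledger" means in code. Event shapes, precedence values, and field names are assumptions for illustration, not a real schema:

```python
# Precedence: later lifecycle stages win when multiple events describe one order.
PRECEDENCE = {"order_placed": 0, "payment_captured": 1, "shipped": 2, "refunded": 3}

def reconcile_window(events, cutoff_ts):
    """Apply precedence, duplicate handling, a late-arrival cutoff, and an
    exception queue to one window's worth of events."""
    ledger, exceptions, seen = {}, [], set()
    ordered = sorted(events, key=lambda e: (e["order_id"], PRECEDENCE.get(e["type"], 99)))
    for e in ordered:
        if e["event_id"] in seen:            # duplicate handling
            continue
        seen.add(e["event_id"])
        if e["ts"] > cutoff_ts:              # late-arrival cutoff
            exceptions.append({"reason": "after_cutoff", **e})
            continue
        if e["type"] not in PRECEDENCE:      # unresolvable record
            exceptions.append({"reason": "unknown_type", **e})
            continue
        ledger[e["order_id"]] = e["type"]    # precedence decides final state
    return ledger, exceptions
```

The exception list is a first-class output, not a log line: someone owns it, and the next window can pick it up.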
That is architecture doing business work, which is what good enterprise architecture always was.
Reconciliation is not optional plumbing
In many teams, reconciliation is treated as a support process after the “real” architecture. That is backward. Reconciliation is the mechanism by which a distributed system admits it is distributed.
There are several types:
- event-to-state reconciliation: does the projection reflect source-of-record state?
- cross-context reconciliation: does fulfillment align with billing and payments?
- period reconciliation: are all in-window records complete enough for closure?
- control total reconciliation: do counts, sums, and balances match expected thresholds?
- exception reconciliation: what cannot yet be resolved and who owns it?
Without these, replay is dangerous. With them, replay becomes routine.
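Control total reconciliation, the simplest of these, can be sketched in a few lines. The row shape and the single `amount` field are assumptions; a real check would also compare per-key balances against thresholds:

```python
def control_totals_match(source_rows, certified_rows, tolerance=0.0):
    """Compare counts and sums between the source of record and a certified
    output before publishing it."""
    src_sum = sum(r["amount"] for r in source_rows)
    cert_sum = sum(r["amount"] for r in certified_rows)
    variances = {
        "count_diff": len(certified_rows) - len(source_rows),
        "sum_diff": round(cert_sum - src_sum, 2),
    }
    ok = variances["count_diff"] == 0 and abs(variances["sum_diff"]) <= tolerance
    return ok, variances
```

A failed check should block publication and open an exception, not merely emit a warning metric.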
Migration Strategy
Most enterprises cannot replace an existing batch estate with a perfect event-driven platform in one move. Nor should they. The path is usually a progressive strangler migration.
Begin by respecting what the current batch windows already do. They often carry hidden business semantics:
- fiscal period cutoffs
- correction handling
- partner-file lateness allowances
- legal reporting conventions
- exception routing
- manual review checkpoints
If you bulldoze those semantics in the name of streaming modernization, the business will simply recreate them outside the platform, usually in spreadsheets and side databases.
A sensible migration sequence looks like this:
Step 1: expose current events without changing closure semantics
Introduce Kafka or an event bus to publish operational facts. Use CDC where necessary. Do not initially replace the trusted batch outputs. Let teams build real-time features on top of the event stream while finance and reporting continue using existing reconciled processes.
This buys learning without semantic risk.
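One trustworthy way to expose those operational facts is the outbox pattern mentioned under related patterns: write the business row and the event row in one transaction, and let a relay publish later. A minimal sketch using SQLite in place of a production database (table and column names are assumptions):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id):
    with conn:  # single atomic transaction: no dual-write inconsistency
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "placed"))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("orders", json.dumps({"type": "order_placed", "id": order_id})))

def relay_pending():
    # a real relay would produce these rows to Kafka, then mark them published
    rows = conn.execute("SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
    conn.executemany("UPDATE outbox SET published = 1 WHERE id = ?", [(r[0],) for r in rows])
    return rows

place_order("ord-1")
assert len(relay_pending()) == 1
assert relay_pending() == []   # already published
```

Because the event is committed with the state change, downstream consumers see facts the system of record actually recorded, which is exactly the property Step 1 needs.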
Step 2: create provisional streaming projections
Build low-latency projections for use cases that benefit from speed:
- customer order tracking
- fraud/risk alerts
- operational exception detection
- warehouse workload balancing
Mark these clearly as provisional where needed. The architecture should not hide this.
Step 3: externalize reconciliation logic into explicit windowed pipelines
Take the most important batch jobs and reframe them as domain reconciliation services. Move rules out of opaque scripts and into governed pipelines with lineage, versioning, and observable controls.
Now you are not merely “keeping batch.” You are modernizing it.
Step 4: publish certified outputs as products
Create stable, versioned outputs for enterprise consumers:
- daily cash position
- hourly inventory position
- order-to-cash ledger
- adjusted claims summary
Downstream consumers should shift from scraping raw streams and bespoke tables to consuming these certified products.
Step 5: gradually shorten windows where the domain allows
Some windows can move from nightly to hourly, hourly to fifteen-minute, or fixed to event-time with lateness tolerance. Others should remain daily or period-based. The point is not to eliminate windows. It is to make them as small as the domain safely permits.
That is strangler migration done properly. Not by replacing one platform with another, but by progressively moving authority from old mechanisms to explicit domain products.
Enterprise Example
Consider a global retailer with e-commerce, stores, regional warehouses, and multiple payment providers. This is where hybrid pipeline topology proves its worth.
They began with nightly ETL feeding inventory and finance reports. Then they introduced Kafka and microservices to modernize order management. Soon they had:
- order events from web and stores
- warehouse events from fulfillment systems
- payment events from PSP integrations
- return events from customer service tools
- inventory deltas from WMS and POS systems
The first wave looked successful. Customer-facing order tracking improved. Operations got near-real-time alerts. Fraud blocks happened quickly.
Then the cracks opened.
Inventory showed negative availability in some regions because reservations and decrements arrived out of order. Finance saw order totals shift after daily close because refunds and tax adjustments were processed late. Customer support saw one status in the CRM and another in the self-service portal. Teams argued whether Kafka topics or service databases represented “truth.”
The fix was not more streaming. It was better semantics.
The retailer introduced a hybrid topology with three explicit products:
- Real-time order journey view
Built from Kafka streams, used by customer channels and support. Fast, approximate, correction-friendly.
- Hourly certified inventory position
Reconciled warehouse movements, store sales, returns, reservations, and cycle-count adjustments. Used for replenishment and planning. Included exception counts where confidence thresholds were not met.
- Daily order-to-cash ledger
Reconciled order placement, tax, discount allocation, payment capture, shipment, returns, refunds, and settlement. Used by finance and audit.
This changed behavior. Product teams stopped overloading the real-time order topic for financial use. Supply chain stopped inventing local correction tables. Finance stopped distrusting the platform because the daily ledger had explicit controls and versioned reruns.
Most importantly, the architecture mirrored the business:
- customer experience optimized for freshness
- planning optimized for balanced correctness
- finance optimized for closure and auditability
That is what mature enterprise architecture looks like. Different truths, deliberately governed.
Operational Considerations
Hybrid architectures are not simply two pipelines. They need operational discipline.
Data contracts
Event contracts should capture not just schema but semantics:
- event time versus processing time
- correction indicators
- source identity
- deduplication keys
- business identifiers
- causality where available
Without this, downstream reconciliation becomes guesswork.
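A contract that captures semantics might look like the envelope below. This is a hypothetical shape; the field names are assumptions for illustration, not a standard:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EventEnvelope:
    """Hypothetical event envelope carrying semantics, not just schema."""
    event_id: str                   # deduplication key
    business_key: str               # e.g. order number
    event_time: str                 # when the fact occurred (producer clock)
    processing_time: str            # when the platform first saw it
    source_system: str              # source identity
    is_correction: bool = False     # correction indicator
    corrects_event_id: Optional[str] = None  # causality, where available
```

With an envelope like this, reconciliation can distinguish a late original from a correction, which raw payloads rarely allow.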
Observability
You need more than service latency dashboards. Track:
- event lag
- late arrival rates
- duplicate rates
- watermark delay
- reconciliation match percentages
- control total variance
- exception backlog
- replay duration
- certified output publication SLA
If the platform cannot tell you how uncertain it currently is, it is not under control.
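A few of those uncertainty signals can be computed directly from an event feed. A sketch, assuming events arrive as `(event_time, arrival_time, event_id)` tuples in epoch seconds:

```python
def uncertainty_metrics(events, now, lateness_threshold_s=60):
    """Quantify how uncertain the stream currently is: late-arrival rate,
    duplicate rate, and watermark delay relative to the wall clock."""
    if not events:
        return {"late_rate": 0.0, "duplicate_rate": 0.0, "watermark_delay_s": 0.0}
    late = sum(1 for et, at, _ in events if at - et > lateness_threshold_s)
    ids = [eid for _, _, eid in events]
    duplicates = len(ids) - len(set(ids))
    max_event_time = max(et for et, _, _ in events)
    return {
        "late_rate": late / len(events),
        "duplicate_rate": duplicates / len(events),
        "watermark_delay_s": now - max_event_time,  # how far truth lags the clock
    }
```

Publishing these alongside the certified outputs turns "eventually consistent" from a shrug into a number.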
Idempotency and replay
Every stage must be replayable. Streaming projections should tolerate duplicate events. Batch reconciliation should be versioned and deterministic within the rule set used. Certified outputs need traceability:
- which inputs were included
- which cutoffs applied
- which rule version was used
- what exceptions were unresolved
Replay without lineage is just a more expensive form of panic.
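The duplicate-tolerance requirement for streaming projections reduces to a simple invariant: applying the same event twice leaves state unchanged. A minimal sketch (class and field names are invented for illustration):

```python
class IdempotentProjection:
    """A projection that can be replayed safely: duplicates are no-ops,
    so reprocessing the full log reproduces the same state."""
    def __init__(self):
        self.state = {}       # order_id -> status
        self.applied = set()  # event_ids already applied

    def apply(self, event_id, order_id, status):
        if event_id in self.applied:
            return False      # duplicate or replayed event: ignore
        self.applied.add(event_id)
        self.state[order_id] = status
        return True

p = IdempotentProjection()
log = [("e1", "A", "placed"), ("e2", "A", "paid")]
for e in log + log:           # replay the whole log a second time
    p.apply(*e)
assert p.state == {"A": "paid"}
```

Production versions persist the applied-id set (or use offsets plus deterministic keys), but the invariant is the same.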
Reference data management
Many streaming errors are really reference-data errors: product hierarchy changed, store calendar updated, tax rule revised, supplier mapping corrected. Batch windows are often where these shifts can be applied safely and consistently.
Governance and ownership
Someone must own each semantic product. Not the platform team alone. Domain owners need to own:
- definitions
- cutoff policies
- exception handling
- acceptable lateness
- quality thresholds
A data product with no business owner is a technical artifact waiting to be disputed.
Tradeoffs
Hybrid topology is a grown-up design. Which means it comes with tradeoffs.
Pros
- supports low-latency operational reactions
- provides closure points for finance, audit, and compliance
- handles late and corrected data more safely
- reduces semantic confusion across consumers
- allows progressive modernization from legacy batch estates
- often lowers compute cost for heavy reconciliation workloads
Cons
- more moving parts
- multiple versions of truth must be explained
- requires careful product naming and contracts
- reconciliation logic can become a new complexity hotspot
- window boundaries create policy debates
- downstream teams may resent not reading raw events directly
The right response is not to deny these tradeoffs. It is to make them explicit. Simplicity is not having one pipeline. Simplicity is making the semantics understandable.
Failure Modes
There are a few classic ways teams get this wrong.
1. Treating the batch layer as a dump
If the windowed pipeline becomes an ungoverned landing zone for “fixing things later,” it turns into a swamp. The batch layer must have stronger semantics, not weaker ones.
2. Publishing only technical stages
Topics like orders_enriched_v7 and payments_joined_compacted are not business products. They are processing debris. Consumers need named outputs tied to domain meaning.
3. Using one window policy for every domain
A universal hourly close is as naive as universal real-time. Domains differ. Market close, store close, settlement day, and tax period are distinct clocks.
4. Ignoring exception pathways
Some records cannot be automatically reconciled. Good systems route them explicitly. Bad systems silently coerce them into false certainty.
5. Replacing old batch jobs without understanding hidden business rules
Legacy systems are ugly, but ugliness often conceals policy. If migration teams translate only data movement and not business semantics, the new platform will be faster and less correct.
6. Letting streaming projections become de facto systems of record
This happens all the time. A dashboard becomes trusted. Then a workflow depends on it. Then an audit asks for reproducibility, and nobody can explain the state because the projection was built for immediacy, not durability.
When Not To Use
Hybrid pipeline topology is not universal.
Do not use it when the domain is naturally simple and tolerant of eventual convergence without formal closure. For example:
- lightweight clickstream personalization
- transient IoT telemetry monitoring
- social feed ranking
- ephemeral notification workflows
In these spaces, adding batch windows may just create drag.
Also avoid overengineering if:
- there is no downstream need for certified outputs
- late or corrected data is truly rare and low impact
- a single operational database can still meet the workload
- the domain has not yet stabilized enough to define meaningful windows
And be careful in ultra-low-latency domains where windowed correction is irrelevant to the primary value, such as certain algorithmic decision loops. There, the architecture may still emit historical correction datasets, but the operational path is the product.
The litmus test is simple: does the business need explicit closure, reconciliation, or auditable correction? If not, keep it simpler.
Related Patterns
A hybrid pipeline topology sits well with several established patterns:
- Lambda architecture, though in practice this article argues for a more semantically grounded version rather than technology duplication for its own sake
- Kappa-style streaming with replay, useful on the operational side but often incomplete without explicit closure semantics
- CQRS, where read models can be provisional and domain outputs can be certified separately
- Event sourcing, especially when paired with snapshotting and policy-aware reconciliation
- Data mesh, if you take product ownership seriously and do not confuse raw events with products
- Strangler fig migration, the right way to evolve from monolithic batch estates
- Outbox pattern and CDC, practical ways to capture trustworthy operational events
- Sagas, for process coordination, though sagas do not eliminate the need for later reconciliation
These patterns are complementary. The missing ingredient in many implementations is semantic honesty.
Summary
Streaming platforms are excellent at movement. Enterprises are judged on meaning.
That is why your streaming platform needs batch windows.
Not as nostalgia. Not as a concession. As an architectural acknowledgement that business facts ripen over time. Some are useful immediately. Some are only trustworthy after context, correction, and closure. A mature platform serves both realities.
Use streaming for responsiveness. Use windowed reconciliation for certainty. Design data products around bounded contexts. Name truths according to their purpose. Migrate progressively with a strangler approach. Treat reconciliation as first-class architecture, not after-hours repair work.
A good hybrid pipeline topology does something subtle but powerful: it stops the platform from pretending that every event is final, and stops the enterprise from pretending that waiting for finality is always acceptable.
Fast and right are not enemies. But they do need different rooms in the house.
Frequently Asked Questions
What is event-driven architecture?
Event-driven architecture (EDA) decouples services by having producers publish events to a broker like Kafka, while consumers subscribe independently. This reduces direct coupling, improves resilience, and allows new consumers to be added without modifying producers.
When should you use Kafka vs a message queue?
Use Kafka when you need event replay, high throughput, long retention, or multiple independent consumers reading the same stream. Use a traditional message queue (RabbitMQ, SQS) when you need simple point-to-point delivery, low latency, or complex routing logic per message.
How do you model event-driven architecture in ArchiMate?
In ArchiMate, the Kafka broker is a Technology Service or Application Component. Topics are Data Objects or Application Services. Producer/consumer services are Application Components connected via Flow relationships. This makes the event topology explicit and queryable.