Data Flow vs Control Flow in Architecture

Architecture gets into trouble when we confuse movement with meaning.

Teams draw boxes and arrows, put Kafka in the middle, add a workflow engine on the side, and call it “event-driven.” Six months later they discover the ugly truth: they never agreed whether the system was supposed to move information or tell components what to do. That distinction sounds small. It isn’t. It’s the difference between a city’s road network and a traffic cop shouting orders at every junction.

Data flow and control flow are not rivals in some tidy theoretical contest. They are two very different ways of shaping behavior in software systems. One treats the architecture as a stream of facts moving across bounded contexts. The other treats it as a set of decisions, commands, and coordinated steps. Good systems usually need both. Bad systems blur them until nobody can tell where business meaning lives, where state changes are authoritative, or why a retry caused three invoices and no shipment.

This is where enterprise architecture earns its keep. Not by adding more diagrams, but by deciding what kind of arrow each arrow really is.

The practical question is not “which one is better?” The practical question is: what should flow, what should decide, and where should the business semantics live?

Context

Most modern enterprise systems are a patchwork of styles. There is a transactional core, some APIs, a message broker, a reporting pipeline, maybe a workflow engine, and usually a handful of microservices that were meant to improve agility but mainly increased the number of dashboards. Inside that patchwork, data flow and control flow are both present, whether anyone names them or not.

Data flow is about the movement of information through the system. Records, events, documents, state changes, telemetry, facts. It answers questions like:

  • What happened?
  • What data changed?
  • Who needs to know?
  • How does information propagate across domains?

Control flow is about directing behavior. Commands, orchestration, workflow decisions, retries, compensations, approvals, sequencing. It answers questions like:

  • What should happen next?
  • Who is responsible for deciding?
  • What sequence must be enforced?
  • How do we handle timeouts and failure recovery?

This distinction matters most once systems stop being monoliths. In a monolith, the same codebase often carries both concerns without much ceremony. A method call both moves data and controls execution. Once systems are split across services, teams, databases, and queues, blurring the line becomes expensive. If data flow is mistaken for control flow, services become passive databases with topics attached. If control flow is mistaken for data flow, every event turns into a remote procedure call wearing a fake moustache.

Domain-driven design gives us a better lens. Business systems are not merely pipelines. They are collections of bounded contexts, each with its own language, invariants, and responsibilities. Data flow should carry domain facts between contexts. Control flow should coordinate processes where the business truly requires sequencing and explicit decision-making. If those semantics are not clear, the architecture will lie.

Problem

The common failure is architectural ambiguity.

A team says they are “using events.” But are those events domain facts, integration messages, or just commands with another name? Another team introduces an orchestrator. But is it coordinating a long-running business process, or is it centralizing logic that belongs inside domain services? Kafka gets installed and suddenly every problem looks like a topic. A BPM engine arrives and suddenly every problem looks like a workflow. The tools start defining the architecture.

That leads to three recurring pathologies.

First, event-driven systems that are secretly command-driven. Services publish events like CustomerCreditChecked when what they really mean is ShippingServicePleaseProceed. The message claims to be a fact, but behaves like an instruction. Consumers become coupled to implied order and hidden expectations.
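The pathology is easy to spot once it is written down. A minimal sketch (all type and field names are illustrative, not from any real system): the first message claims to be a fact but smuggles an instruction to one specific consumer; the honest version separates the fact from the explicit command a process owner would issue.

```python
from dataclasses import dataclass

# Disguised command: named like a fact, but addressed to one consumer
# and meaningless unless Shipping reacts in a specific way.
@dataclass(frozen=True)
class CustomerCreditChecked:      # really: "ShippingServicePleaseProceed"
    order_id: str
    proceed_to_shipping: bool     # an instruction hiding inside a "fact"

# Honest split: a fact anyone may consume, plus an explicit command
# issued by whoever owns the fulfillment decision.
@dataclass(frozen=True)
class CreditCheckPassed:          # past tense, pure business fact
    order_id: str
    approved_limit: int

@dataclass(frozen=True)
class CreateShipment:             # imperative, deliberate control flow
    order_id: str
    issued_by: str                # the process context owning the decision

fact = CreditCheckPassed(order_id="o-42", approved_limit=5000)
cmd = CreateShipment(order_id=fact.order_id, issued_by="fulfillment-process")
```

The split costs one extra message type, but it makes the coupling visible: consumers of the fact stay autonomous, and exactly one context is accountable for the shipping decision.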

Second, workflow-heavy systems that suffocate domain autonomy. A central engine tells every service exactly what to do and when. The architecture becomes operationally neat but semantically brittle. Every policy change becomes a workflow change. Every domain service becomes a glorified adapter.

Third, integration models that leak across bounded contexts. One team’s internal lifecycle becomes everybody else’s external dependency. A service emits low-level state transitions that force other services to understand implementation details they should never have seen.

The result is familiar: accidental choreography, fragile orchestration, duplicate state, hard-to-explain incidents, and endless debates over whether an event should be replayed or ignored.

Forces

There is no clean universal answer because the forces pull in different directions.

Business semantics vs technical convenience

The architecture should mirror the business, not the transport. Some domains naturally emit facts that others consume asynchronously. Others depend on explicit approvals, ordered steps, and visible process state. The easiest implementation is often not the right semantic model.

Autonomy vs coordination

Microservices promise team and runtime autonomy. But many business processes cross domain boundaries: order management, claims handling, underwriting, loan approval, patient discharge. Those processes require coordination. Too much control flow centralizes everything. Too much data flow leaves critical outcomes to optimistic interpretation.

Throughput vs determinism

Data flow systems, especially with Kafka and append-only logs, are excellent for scale, replay, fan-out, and decoupling. Control flow systems are excellent when sequence, deadlines, and compensations matter. One optimizes movement. The other optimizes intent.

Domain truth vs read-model convenience

Data moving between contexts is often transformed, denormalized, and cached. That is useful. It is also dangerous. If those projections start masquerading as authoritative state, reconciliation becomes inevitable and ugly.

Changeability vs observability

Distributed data flow can be resilient and evolvable, but difficult to reason about in real time. Explicit control flow is easier to trace because the process model is visible, but often harder to evolve without ripple effects.

Local invariants vs end-to-end outcomes

A bounded context can enforce its own invariants. Cross-context business outcomes are another matter. “Order accepted” is local. “Order shipped and invoiced correctly” is end-to-end. The architecture must choose where those guarantees live.

Solution

The sensible answer is not to pick a winner. It is to use each style for what it is good at, and to be ruthless about semantics.

A useful rule of thumb:

  • Use data flow to propagate facts.
  • Use control flow to coordinate decisions.

That sounds obvious, but the consequences are sharp.

If a bounded context completes a business action and other contexts may care, publish a domain event or integration event that says what happened in business terms. That is data flow. It should not imply that the publisher is now responsible for the consumer’s next action.

If a business process requires explicit sequencing, deadlines, compensations, human approval, or policy-driven branching, introduce control flow. That may be an orchestrator, workflow engine, or process manager. But keep it at the level of business coordination, not internal service logic.

This leads to a layered stance:

  1. Inside a bounded context, use whatever control structures the domain needs.
  2. Across bounded contexts, prefer data flow for state propagation and loose coupling.
  3. Across long-running business processes, use explicit control flow where the process itself is a first-class domain concept.

In DDD terms, the process may deserve its own bounded context: Order Fulfillment, Claims Adjudication, Loan Origination. That context owns the process language and coordination policy. It does not own the internal rules of Payment, Inventory, Shipping, or Billing.

That distinction keeps the architecture honest.

Architecture

A simple comparison makes the point.

Diagram 1

In this model, services publish facts and react to facts. Nobody centrally tells Billing or Shipping what to do. Each bounded context interprets events based on its own rules. This works well when downstream behavior is autonomous and eventual consistency is acceptable.

Now compare that with explicit control.

Diagram 2

Here a process owner decides sequence and next steps. This is the right shape when the process itself carries business meaning: reserve before payment, hold if fraud review is pending, cancel if SLA expires, compensate if shipment creation fails after capture.
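A process owner of this kind is, at heart, an explicit state machine over business events. A sketch under stated assumptions (states, event names, and transition policy are illustrative, not a workflow-engine API):

```python
from enum import Enum, auto

class OrderState(Enum):
    PLACED = auto()
    RESERVED = auto()
    FRAUD_HOLD = auto()
    PAID = auto()
    SHIPPED = auto()
    CANCELLED = auto()

# Allowed transitions encode business policy: reserve before payment,
# hold if fraud review is pending, cancel if the SLA expires.
TRANSITIONS = {
    (OrderState.PLACED, "InventoryReserved"): OrderState.RESERVED,
    (OrderState.RESERVED, "FraudReviewOpened"): OrderState.FRAUD_HOLD,
    (OrderState.FRAUD_HOLD, "FraudReviewCleared"): OrderState.RESERVED,
    (OrderState.RESERVED, "PaymentCaptured"): OrderState.PAID,
    (OrderState.PAID, "ShipmentCreated"): OrderState.SHIPPED,
    (OrderState.PLACED, "SlaExpired"): OrderState.CANCELLED,
    (OrderState.RESERVED, "SlaExpired"): OrderState.CANCELLED,
}

def advance(state: OrderState, event: str) -> OrderState:
    """Apply a result event; unknown combinations fail loudly instead of
    being silently interpreted by each consumer."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} + {event}")

s = OrderState.PLACED
for evt in ["InventoryReserved", "PaymentCaptured", "ShipmentCreated"]:
    s = advance(s, evt)
```

The point is not the ten lines of Python; it is that the sequencing policy lives in one visible table instead of being implied by topic subscriptions.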

Most enterprises need a hybrid.

Diagram 3

This hybrid pattern is usually the mature answer. Domain facts move over the event backbone. A dedicated process context applies control flow only where business coordination is necessary. Kafka is used as an event log and integration spine, not as a magical replacement for process semantics.

Domain semantics discussion

This is the part many teams skip, and then regret.

An event is not “something happened in the code.” An event is “something happened in the business.” OrderPlaced, PaymentCaptured, ShipmentDispatched are domain semantics. RowUpdated, StatusChangedTo3, WorkflowStepCompleted are usually technical leakage.

Likewise, commands should carry intent, not implementation mechanics. ReserveInventory is legitimate because it expresses a business action. SetInventoryStatusReserved often signals an anemic design and misplaced responsibility.

Data flow should preserve business facts across contexts. Control flow should preserve business intent across steps.

The sharper the language, the healthier the system.

Kafka and microservices

Kafka shines in data flow architectures because it provides durable event streams, replay, fan-out, partitioned scalability, and decoupling in time. It is excellent for propagating facts, building read models, integrating bounded contexts, and capturing an auditable history of business events.

Kafka is much less good at being an invisible workflow engine. Teams can absolutely implement process coordination on top of Kafka, but they should do so explicitly, with visible state machines, correlation IDs, timeout handling, and compensation logic. Otherwise they end up with “workflow by topic subscription,” which is one of those ideas that looks elegant in slides and turns nasty in production.
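What “explicitly” means in practice: in-flight process state lives in a visible store keyed by correlation ID, and timeouts are scanned for rather than hoped against. A broker-agnostic sketch (no Kafka client involved; class and method names are assumptions for illustration):

```python
import time

class SagaStore:
    """Visible process state keyed by correlation ID. In production this
    would be a durable table, not an in-memory dict."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.open: dict[str, dict] = {}

    def start(self, correlation_id: str, step: str) -> None:
        self.open[correlation_id] = {"step": step, "started": time.monotonic()}

    def on_result(self, correlation_id: str, step: str) -> None:
        saga = self.open.get(correlation_id)
        if saga and saga["step"] == step:
            del self.open[correlation_id]   # step completed in time

    def expired(self) -> list[str]:
        """Correlation IDs whose step overran its deadline. These get
        explicit compensation or escalation, not silent retries."""
        now = time.monotonic()
        return [cid for cid, s in self.open.items()
                if now - s["started"] > self.timeout_s]

store = SagaStore(timeout_s=0.01)
store.start("order-1", "ReserveInventory")
store.start("order-2", "ReserveInventory")
store.on_result("order-1", "ReserveInventory")   # result arrived in time
time.sleep(0.02)                                 # order-2's result never came
```

With this shape, “where is this order and why is it stuck?” is a query against the store, not an archaeology exercise across topic offsets.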

Microservices make this worse because they multiply boundaries. Every boundary introduces translation, latency, failure, and ownership questions. This is why domain-driven design matters so much here. Service boundaries should follow bounded contexts, not database tables or org charts. Data flow between contexts should use published language. Control flow should reflect real business processes, not technical dependencies.

Migration Strategy

The hardest part is not designing the target. It is getting there without freezing the company.

Most enterprises start with a monolith or a tightly coupled service estate. The migration should be progressive, semantic, and boring. Boring is underrated.

1. Identify domain seams

Start by mapping bounded contexts and key business events. Do not begin with Kafka topics or workflow products. Begin with domain language: order accepted, policy quoted, claim registered, payment settled, stock reserved. Ask where authority lives for each concept.

2. Externalize facts before extracting control

A strong migration pattern is to first publish reliable business events from the incumbent system. Use the monolith as the initial system of record while exposing a stream of facts for downstream consumers. This establishes data flow without forcing immediate process decomposition.

This is classic strangler thinking. You do not rip out the old city overnight. You divert traffic one road at a time.

3. Build downstream capabilities as consumers

Create new services or contexts that subscribe to business events and build independent capabilities: notifications, search, reporting, customer communication, risk scoring, recommendation engines, partner integration. These are often easier wins because they consume facts without immediately owning the transaction.
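Such consumers are simple to sketch: they fold business facts into a local read model they own, and never write back to the source. (Event shapes and field names below are illustrative.)

```python
def project(events: list[dict]) -> dict[str, dict]:
    """Fold a stream of business facts into a customer-service read model.
    The projection is derived state: rebuildable by replay, never
    authoritative for the order itself."""
    orders: dict[str, dict] = {}
    for e in events:
        o = orders.setdefault(e["order_id"], {"status": "unknown", "paid": False})
        if e["type"] == "OrderAccepted":
            o["status"] = "accepted"
        elif e["type"] == "PaymentCaptured":
            o["paid"] = True
        elif e["type"] == "ShipmentCreated":
            o["status"] = "shipped"
    return orders

stream = [
    {"type": "OrderAccepted", "order_id": "o-1"},
    {"type": "PaymentCaptured", "order_id": "o-1"},
    {"type": "ShipmentCreated", "order_id": "o-1"},
    {"type": "OrderAccepted", "order_id": "o-2"},
]
view = project(stream)
```

Because the function is a pure fold over the event stream, rebuilding the read model after a schema change is just a replay from the beginning of the topic.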

4. Introduce process ownership where needed

Once the event backbone is stable, identify cross-context processes that truly require coordination. Extract them into a dedicated process context or orchestrator. Do not centralize everything. Only centralize the process logic that is genuinely cross-domain and long-running.

5. Reconcile during coexistence

During migration, you will have duplicate state and competing timelines. Pretending otherwise is childish. Build reconciliation explicitly.

Reconciliation means:

  • comparing source-of-truth state with downstream projections
  • detecting missing or duplicate events
  • correcting drift across old and new systems
  • defining what happens when one side says “paid” and the other says “pending”

This usually requires:

  • immutable event IDs
  • correlation and causation IDs
  • idempotent consumers
  • replay support
  • dead letter handling
  • operational dashboards for semantic mismatches
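A reconciliation pass can be as plain as a keyed comparison between source-of-truth state and a downstream projection, with every mismatch named rather than ignored. A sketch (state values and category names are illustrative):

```python
def reconcile(source: dict[str, str], projection: dict[str, str]) -> dict[str, list[str]]:
    """Compare authoritative state with a downstream projection and
    classify drift so each category can be handled deliberately."""
    report: dict[str, list[str]] = {
        "missing_downstream": [],   # event lost or not yet consumed
        "missing_upstream": [],     # projection has ghosts the source never saw
        "state_mismatch": [],       # both sides exist but disagree
    }
    for key, state in source.items():
        if key not in projection:
            report["missing_downstream"].append(key)
        elif projection[key] != state:
            report["state_mismatch"].append(key)
    for key in projection:
        if key not in source:
            report["missing_upstream"].append(key)
    return report

source = {"o-1": "paid", "o-2": "paid", "o-3": "pending"}
projection = {"o-1": "paid", "o-3": "paid", "o-4": "paid"}
report = reconcile(source, projection)
```

The categories matter more than the loop: “missing downstream” triggers a replay, “state mismatch” triggers an investigation, and “missing upstream” usually means a bug in the projection itself.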

6. Shift authority gradually

A bounded context should only become authoritative after its data contracts, failure handling, and operational ownership are mature. Until then, keep clear transitional rules: who owns truth, who mirrors it, how disputes are resolved.

Progressive strangler migration in practice

The best migrations I have seen move in this order:

  1. Publish domain events from the legacy core.
  2. Build read models and peripheral capabilities from those events.
  3. Extract one bounded context with clear authority.
  4. Add anti-corruption layers to prevent legacy semantics leaking outward.
  5. Introduce process coordination only for a specific business journey.
  6. Retire legacy responsibilities incrementally, not theatrically.

The anti-pattern is trying to introduce microservices, Kafka, and orchestration all at once. That is not transformation. That is a distributed outage with PowerPoint support.

Enterprise Example

Consider a global retailer modernizing order fulfillment across e-commerce, stores, and third-party logistics.

The legacy estate had one central order management platform. It owned checkout, stock checks, payment capture, shipment creation, invoicing, and customer notifications. Every integration was synchronous. Peak season was a war.

The first modernization attempt embraced “events everywhere.” The team put Kafka in place and split the platform into Order, Inventory, Payment, Shipping, and Notification services. Orders were placed, and each service reacted to events. On paper it looked elegantly decoupled.

Then reality arrived.

Some orders required fraud screening before payment capture. Some products shipped from store inventory, others from distribution centers. Split shipments were common. Invoices had country-specific rules. Customer promises depended on stock reservation timing. The process was not a loose collection of reactions; it was a business capability with policy, timing, and exception handling.

The data flow model alone was not enough.

Inventory interpreted OrderPlaced as a signal to reserve. Payment interpreted it as a signal to authorize. Shipping waited for inventory and payment events, but race conditions created gaps. When retries happened, duplicate shipment requests appeared. Some orders were paid but not released. Others were shipped before invoice generation in countries where that was non-compliant. The event backbone was fine. The semantics were not.

The corrective architecture was hybrid.

  • Order Management remained the authority for accepted orders.
  • Kafka became the event backbone for domain facts and downstream projections.
  • A new Fulfillment Process bounded context was introduced to coordinate long-running order journeys.
  • Inventory, Payment, Shipping, and Billing remained autonomous bounded contexts with their own models and invariants.
  • The fulfillment process issued business commands, consumed result events, tracked deadlines, and handled compensations.
  • Read models for customer service, partner visibility, and operational monitoring were built from Kafka topics.

This changed the conversation. Instead of treating every event as an instruction, teams agreed on which messages were facts and which interactions represented explicit process decisions.

The migration used strangler steps. The monolith first emitted OrderAccepted, PaymentCaptured, and ShipmentCreated events. New consumer services were introduced for notifications and analytics. Inventory was then extracted with a clear reservation API and event contract. Only after those contracts stabilized did the company introduce the fulfillment process context. The final step was moving shipment initiation authority away from the monolith.

Reconciliation was essential. During coexistence, the retailer ran daily and intraday checks between:

  • order states in the monolith
  • fulfillment process state
  • inventory reservation records
  • payment ledger entries
  • shipping milestones

Discrepancies were not treated as random technical noise. They were treated as business incidents with named categories: “authorized not reserved,” “reserved not released,” “shipped not invoiced,” “duplicate shipment intent.” That naming mattered. It turned distributed ambiguity into operational language.

By the second peak season, the architecture was not simpler. It was clearer. And clarity scales better than cleverness.

Operational Considerations

Architectures fail in production for operational reasons long before they fail in diagrams.

Observability

Data flow systems need end-to-end tracing with correlation IDs that survive hops across topics and services. Control flow systems need visible process state, not just logs. If operators cannot answer “where is this order now and why?” the architecture is unfinished.

Idempotency

Retries are inevitable. Consumers in data flow architectures must be idempotent. Commands in control flow architectures must tolerate reissue or have deduplication keys. Otherwise resilience mechanisms become duplication machines.
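Idempotency usually comes down to a deduplication key checked before side effects. A sketch (the key scheme is illustrative; in production the seen-set lives in durable storage alongside the side effect, not in memory):

```python
class IdempotentConsumer:
    """Processes each message at most once, keyed by a stable event ID
    assigned by the producer, never by the broker or the consumer."""

    def __init__(self):
        self.seen: set[str] = set()    # durable store in real systems
        self.shipments_created = 0

    def handle(self, message: dict) -> bool:
        key = message["event_id"]
        if key in self.seen:
            return False               # duplicate delivery: acknowledge, skip
        self.seen.add(key)
        self.shipments_created += 1    # the actual side effect
        return True

c = IdempotentConsumer()
msg = {"event_id": "evt-7", "order_id": "o-9"}
c.handle(msg)
c.handle(msg)    # redelivered by a retry
```

The invariant to protect is that recording the key and performing the side effect happen atomically; otherwise a crash between the two recreates the duplication it was meant to prevent.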

Time

Control flow exposes time as a first-class concern: deadlines, waits, reminders, escalation, expiry. Data flow often hides time until it bites. If the business cares about “within 15 minutes” or “before end-of-day settlement,” model that explicitly.

Reconciliation

Reconciliation is not a last-resort batch script. In distributed enterprise systems it is part of the design. Every architecture with asynchronous propagation and separate authorities needs processes for drift detection and correction.

Versioning

Event schemas evolve. Commands evolve. Domain language evolves. Use compatibility strategies, schema governance, and anti-corruption layers. The cheapest way to destroy autonomy is to let every consumer bind directly to the publisher’s internal model.

Security and compliance

Data flow often spreads data widely. That can be excellent for integration and terrible for privacy if uncontrolled. Control flow can centralize sensitive decisions but also create concentration risk. Enterprises need data classification, topic access controls, retention policies, and auditability by design.

Tradeoffs

There are no free lunches here, only different bills.

Data flow strengths

  • Loose coupling across bounded contexts
  • High scalability and fan-out
  • Durable audit trails
  • Natural support for analytics and projections
  • Better resilience to temporal decoupling

Data flow costs

  • Harder end-to-end reasoning
  • Eventual consistency and drift
  • Hidden process logic spread across consumers
  • Complex debugging and replay semantics
  • Easy to abuse as accidental command routing

Control flow strengths

  • Explicit business process visibility
  • Easier sequencing and timeout handling
  • Clear ownership of end-to-end outcomes
  • Better support for compensation and SLA management
  • More straightforward operational tracing

Control flow costs

  • Risk of centralizing domain logic
  • Tighter coupling to process definitions
  • Reduced service autonomy
  • Workflow changes can become bottlenecks
  • Temptation to turn services into passive endpoints

The best enterprise architectures accept the tension instead of pretending to resolve it entirely. Facts should flow freely. Decisions should live where the business would expect to find them.

Failure Modes

A few failure modes show up repeatedly.

Event soup

Everything is emitted, nothing is curated, and topic catalogs become archaeological sites. Consumers guess which event matters. Semantics decay.

Central brain syndrome

An orchestrator grows until it contains business rules from every domain. Services become dumb executors. Every change queues behind the workflow team.

Hidden sequencing assumptions

Teams claim a data flow architecture, but consumers quietly rely on event order, timing, or the presence of intermediate states. One partition change later, the truth emerges.

Projection-as-truth

A downstream read model becomes operationally convenient and starts being treated as authoritative. Eventually it diverges from the source system and nobody knows which one wins.

Compensation theater

Architects proudly say “we use sagas” without designing real compensation semantics. Not every action is reversible. Not every compensation restores prior business truth. Some failures require human resolution.

Legacy leakage

During migration, old status codes and transaction assumptions bleed into new services. The new estate looks modern but speaks in legacy dialect. The future arrives wearing the old system’s clothes.

When Not To Use

Do not force this distinction into places where it buys little.

If you have a small monolith with a coherent domain, modest scale, and one team, keep it simple. In-process calls often beat distributed purity. You do not need Kafka to move information between classes. You do not need orchestration to model a straightforward transaction already protected by ACID semantics.

Do not introduce event backbones just to look modern. If no bounded context needs asynchronous propagation or independent replayable consumption, a direct API may be better.

Do not introduce workflow engines when the process is trivial, static, and contained within one domain. Every orchestrator comes with cognitive and operational weight.

And do not split services before you understand the domain language. Premature decomposition turns domain confusion into network traffic.

Related Patterns

Several patterns sit close to this discussion.

  • Event-Driven Architecture: best understood as a family of data flow patterns, not a complete replacement for process coordination.
  • Saga: useful for long-running cross-service transactions, but only if compensations are meaningful and state is explicit.
  • Process Manager / Orchestrator: appropriate when the process itself is a business concept needing explicit control.
  • CQRS: often pairs naturally with data flow for read-model generation, though many teams overcomplicate it.
  • Event Sourcing: powerful when the event history is the domain truth, but not necessary for every event-driven design.
  • Strangler Fig Pattern: essential for progressive migration away from monoliths and tightly coupled cores.
  • Anti-Corruption Layer: vital when introducing new bounded contexts beside legacy systems with polluted models.
  • Outbox Pattern: often necessary to publish reliable business events from transactional systems without dual-write disasters.

These patterns are not a menu of fashionable options. They are tools. Their value comes from semantic fit.
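The outbox idea in particular is small enough to sketch: the state change and the event record commit in one local transaction, and a separate relay publishes from the outbox table, eliminating the dual-write gap between database and broker. (SQLite in memory stands in for the transactional store; table and column names are assumptions.)

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (event_id INTEGER PRIMARY KEY AUTOINCREMENT, "
           "payload TEXT, published INTEGER DEFAULT 0)")

def accept_order(order_id: str) -> None:
    # One local transaction: the state change and the event record commit
    # together, so no event is emitted for a rolled-back write and no
    # committed write silently loses its event.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'accepted')", (order_id,))
        db.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "OrderAccepted", "order_id": order_id}),))

def relay(publish) -> int:
    """Separate process: drain unpublished rows to the broker, then mark
    them. Crashing between publish and mark yields at-least-once delivery,
    which is why consumers must be idempotent."""
    rows = db.execute(
        "SELECT event_id, payload FROM outbox WHERE published = 0").fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))
        db.execute("UPDATE outbox SET published = 1 WHERE event_id = ?",
                   (event_id,))
    db.commit()
    return len(rows)

sent = []
accept_order("o-77")
relay(sent.append)
```

The relay here is a stand-in for a polling publisher or a change-data-capture connector; either way, the guarantee comes from the shared transaction, not from the relay.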

Summary

Data flow and control flow are not two notations for the same thing. They describe different architectural responsibilities.

Data flow moves business facts across bounded contexts. It is the architecture of propagation, integration, history, and decoupling. Kafka and event streams are natural tools here.

Control flow directs business decisions across time and dependency boundaries. It is the architecture of coordination, sequencing, exception handling, and outcome ownership. Process managers and orchestrators belong here.

Enterprise systems need both, but not mixed indiscriminately. Domain-driven design gives the right discipline: let bounded contexts own their models and invariants, let business facts travel in clear language, and make process coordination explicit when the business process is itself a first-class concept.

During migration, start with facts, then extract capabilities, then introduce process ownership selectively. Use strangler steps. Build reconciliation as a normal mechanism, not a confession of failure. Be honest about eventual consistency, duplicate messages, and semantic drift.

The memorable rule is this:

If the message says what happened, favor data flow. If it says what must happen next, treat it as control flow.

That one distinction saves a surprising amount of architecture.
