Most workflow architecture arguments begin in the wrong place.
They begin with syntax. YAML or Java. BPMN or code. JSON documents in a database or methods in a service. That is like debating whether a constitution should be handwritten or typed while ignoring the country it governs. In enterprise systems, workflow representation is not a formatting choice. It is a decision about where business meaning lives, who is allowed to change it, how safely it can evolve, and what kind of failure the organization is willing to tolerate.
That is why the “workflow as data vs workflow as code” debate never really dies. It keeps resurfacing because it sits at the fault line between business change and technical control.
If you get it wrong, you do not just build an awkward engine. You create a company that cannot adapt without a release train, or worse, a company that can adapt too easily and no longer knows what rules were in force when money moved, contracts changed, or clinical decisions were made. One extreme turns process into compiled stone. The other turns it into a swamp of mutable configuration.
This is not a theoretical concern. Banks wrestle with it in loan origination. Insurers hit it in claims handling. Telecoms hit it in order fallout and provisioning. Healthcare hits it in case management. Anywhere the business says, “The steps depend on the case,” architecture is about to become political.
My view is blunt: workflow should be treated as a first-class domain concern, not merely as implementation plumbing. And whether it is represented as data or code should follow the semantics of the domain, the volatility of the process, the operational model, and the organization’s ability to govern change. There is no universal winner. There are only good fits, bad fits, and expensive mistakes.
Context
Workflow architectures sit in an awkward middle ground. They are not quite pure domain models, and not quite integration infrastructure either. They coordinate work across services, people, time, policy, and uncertainty. That alone makes them dangerous.
A useful way to frame the space is to separate three things that organizations often collapse into one:
- Domain policy: what must happen, under what conditions, with what outcomes
- Process flow: the ordering and routing of activities over time
- Execution mechanics: retries, timeouts, idempotency, correlation, compensation, event handling
These are related, but they are not the same. When teams fail to separate them, they either hard-code mutable business policies deep inside orchestrators, or they over-externalize stable decision logic into sprawling process definitions nobody can reason about.
Domain-driven design helps here. A workflow is not just “a sequence of tasks.” In many enterprises it is a domain object with state transitions, invariants, and language that matter to the business. “Claim referred to special investigation” is not a technical step. It is a business state with legal and operational implications. “Order awaiting credit release” is part of the ubiquitous language. If your workflow representation cannot express and protect those semantics, you are not modeling the business. You are drawing arrows.
This is where the representation question becomes architectural.
Problem
Enterprises need workflows that can evolve. Regulations change. Product rules change. New channels appear. SLAs tighten. Human approvals are inserted. Fraud checks become dynamic. Legacy systems force ugly detours.
The simple answer is often, “Make workflows data-driven.” Store process definitions in a database. Let a runtime interpret them. Allow configuration without redeploying services. This feels modern and flexible. And sometimes it is exactly right.
But data-driven workflow has a dark side. Once process logic becomes mutable data, you inherit the burden of versioning, validation, testability, auditability, compatibility, and runtime safety. You are effectively building a language, even if you pretend you are only storing configuration.
The opposite answer is, “Keep workflows in code.” Use application logic, typed models, tests, pull requests, and deployment pipelines. This gives discipline, refactoring support, and operational predictability. It also makes every business flow change a software release, which is tolerable for stable processes and miserable for high-variance domains.
So the real problem is this:
How should an enterprise represent workflow so that domain semantics remain clear, changes remain governable, distributed execution remains reliable, and migration from legacy process estates remains feasible?
That problem gets harder in microservices and Kafka-heavy environments. Once a workflow spans services, asynchronous messaging, retries, out-of-order events, and partial failure, representation choices start driving runtime behavior in ways that are very hard to reverse.
Forces
There are several forces pulling in different directions.
Business volatility vs engineering control
Some workflows change monthly because policy changes monthly. Others are stable for years. The more volatile the workflow, the stronger the pressure to represent it as data or declarative configuration. The more safety-critical or technically nuanced the workflow, the stronger the pressure to keep it in code.
This is the first tradeoff and the most abused one. Teams often hear “the business wants flexibility” and conclude “store everything in tables.” Flexibility without guardrails is just another word for production incidents.
Domain semantics vs generic engines
Generic workflow engines are attractive because they promise reuse. But generic abstractions flatten meaning. A “task” in an engine is not the same as a “clinical review,” “coverage determination,” or “trade settlement instruction.” Once domain semantics disappear under generic states and transitions, the model becomes harder for business and engineering to share.
DDD pushes us to preserve the language of the domain. That does not mean rejecting engines. It means refusing to let the engine become the domain model.
Auditability and historical truth
In enterprise environments, you often need to answer a nasty question months later: What process definition was in force for this case at the time this decision was made? If workflow is mutable data without strict versioning and immutable execution snapshots, that answer becomes unreliable.
Code has an advantage here because release artifacts and source control give a natural lineage. Data-driven workflows can match that, but only if designed with immutable versioning, migration rules, and execution binding.
Distributed systems reality
A workflow representation is only useful if the runtime can survive distributed failure. Messages arrive twice. Kafka partitions rebalance. Services time out. Humans disappear for three days. Legacy systems acknowledge before committing. External providers answer eventually, and sometimes never.
This means workflow representation must align with execution semantics:
- correlation identifiers
- idempotent command handling
- timeout handling
- compensation or forward recovery
- replay and reconciliation
- event versioning
If your workflow model ignores these, operations will reintroduce them badly, one incident at a time.
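To make one of those execution semantics concrete, here is a minimal sketch of idempotent command handling keyed by a correlation identifier. The class and method names are illustrative, not a real engine API; in production the dedup record would live in a durable store, not an in-memory map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentHandler {
    private final Map<String, String> processed = new ConcurrentHashMap<>();
    public int executions = 0;

    // Safe under redelivery: the side effect runs once per commandId,
    // and a duplicate delivery gets the originally recorded result.
    public String handle(String commandId, String payload) {
        return processed.computeIfAbsent(commandId, id -> execute(payload));
    }

    private String execute(String payload) {
        executions++; // stands in for a real side effect (payment instruction, task creation)
        return "processed:" + payload;
    }
}
```

The point is that idempotency is a property of the handler, not of the broker: whatever delivery guarantees the messaging layer offers, the workflow step itself must be safe to invoke twice.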
Change governance
Who is allowed to change a workflow? Product managers? Operations analysts? Platform teams? Architects? Compliance? The answer matters.
Workflow as data is not just a technical pattern. It is an organizational operating model. If the enterprise cannot govern process changes, then giving runtime mutability to workflow definitions is like leaving power tools in a kindergarten.
Solution
My recommendation is opinionated:
Represent workflow in the form that best matches the volatility of sequencing, but keep domain decisions explicit and bounded.
In practice, that usually means:
- Workflow as code for stable, technically complex, or reliability-sensitive orchestration
- Workflow as data for highly variable routing, policy-driven case management, or business-managed flow changes
- Hybrid models for most large enterprises, with code owning execution semantics and data owning constrained business variation
That last point matters most. The sensible answer in real enterprises is rarely pure.
A good architecture distinguishes:
- Invariant domain rules that belong in code or strongly typed rule models
- Variable flow structures that can be externalized as versioned definitions
- Execution policies like retries, timeouts, and compensations that belong in the runtime platform
Think of workflow data as a score, not jazz improvisation. The musicians still need instruments, timing, discipline, and a conductor who knows what happens when the trumpets miss their cue.
Workflow as code
Workflow as code means the process is represented in application code or a typed DSL that compiles into code-level artifacts. This gives strong testing, refactoring, code review, static analysis, and deployment discipline.
It works especially well when:
- process logic is stable
- orchestration involves complex technical behavior
- consistency requirements are strict
- workflow changes require engineering anyway
- domain semantics can be represented in rich types and methods
The trap is rigidity. Teams often end up with “if/else process graphs” hidden inside orchestrator services. They technically have workflow as code, but no real domain model. Just spaghetti with unit tests.
Workflow as data
Workflow as data means the process definition is persisted and interpreted at runtime. States, transitions, conditions, human steps, SLAs, and routing rules may be stored as JSON, YAML, BPMN, or custom metadata.
It works well when:
- process variants are numerous
- business-managed changes are frequent
- flow differs by product, region, partner, or regulation
- long-running cases need explicit persisted state
- analysis and audit of process definitions are important
The trap is that teams build accidental programming languages. Soon they need expression evaluators, compatibility checks, migration scripts, simulation tools, diffing, policy validation, and rollback mechanisms. The workflow engine becomes a platform whether they planned for one or not.
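A minimal sketch of what "interpreted at runtime" means: the flow below is plain data (state, event, next state) that could equally have been loaded from JSON or a database row. The class is illustrative; real engines add versioning, validation, and expression evaluation, which is exactly the accidental language the paragraph above warns about.

```java
import java.util.Map;

public class FlowInterpreter {
    // The definition is data, not code: a table of state -> event -> next state.
    private final Map<String, Map<String, String>> transitions;

    public FlowInterpreter(Map<String, Map<String, String>> transitions) {
        this.transitions = transitions;
    }

    // Advance a case by looking the transition up, rather than executing compiled logic.
    public String next(String currentState, String event) {
        Map<String, String> outgoing = transitions.get(currentState);
        if (outgoing == null || !outgoing.containsKey(event)) {
            throw new IllegalStateException("No transition from " + currentState + " on " + event);
        }
        return outgoing.get(event);
    }
}
```

Changing the flow means changing the map, with no redeployment, which is both the appeal and the governance risk.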
The hybrid pattern
The strongest pattern in enterprise architecture is to keep domain commands and event handling in code, while allowing controlled flow definitions to vary as data.
For example:
- code defines AssessCredit, ApproveLoan, RequestDocuments
- workflow data defines under what conditions these activities occur and in what sequence
- runtime enforces retries, deadlines, and correlation
- domain services still own business invariants
This preserves domain integrity while allowing business variability.
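The hybrid split can be sketched in a few lines: activities are code (typed handlers registered by name), while the order in which they run arrives as data. The Activity interface and runner below are illustrative; the activity names follow the loan example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class HybridRunner {
    public interface Activity { String run(String caseId); }

    // The behavior behind each name is compiled, tested code.
    private final Map<String, Activity> activities;

    public HybridRunner(Map<String, Activity> activities) { this.activities = activities; }

    // The sequence is data: it can vary by product, region, or policy version.
    public List<String> execute(String caseId, List<String> sequence) {
        List<String> results = new ArrayList<>();
        for (String name : sequence) {
            Activity a = activities.get(name);
            if (a == null) throw new IllegalArgumentException("Unknown activity: " + name);
            results.add(a.run(caseId));
        }
        return results;
    }
}
```

Note that an unknown activity name fails loudly: the data can rearrange known building blocks, but it cannot invent new behavior.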
Architecture
A practical architecture usually has four layers:
- Domain services and aggregates
- Workflow definition model
- Workflow runtime / orchestration layer
- Event backbone and external system integration
In both approaches, the runtime state of a case must be persisted. Long-running workflows cannot live in memory and hope for the best.
The major architectural difference is where control logic lives:
- in compiled artifacts
- or in interpreted definitions
In a Kafka-based microservices environment, the workflow engine should rarely own all business state. That belongs in bounded contexts. The workflow layer should coordinate, correlate, and track progression, not become a giant god-database of enterprise truth.
That separation is vital. Otherwise every workflow engine becomes a shadow ERP.
Domain semantics
The right design starts by naming business states and transitions properly. Instead of generic steps like TaskCompleted, model:
- ClaimRegistered
- CoverageConfirmed
- FraudReviewRequested
- SettlementApproved
These names belong to bounded contexts. The workflow should reference them, not obscure them.
One of the best tests is simple: can a business expert and a developer stand in front of the workflow and use the same words? If not, the architecture is already leaking abstraction.
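One way to protect those semantics in code is to model the lifecycle as an explicit state machine whose transitions are part of the domain model. A minimal sketch, using the claim states named above (the class and `Closed` state are illustrative additions):

```java
import java.util.EnumSet;
import java.util.Set;

public class ClaimLifecycle {
    public enum State { ClaimRegistered, CoverageConfirmed, FraudReviewRequested, SettlementApproved, Closed }

    // Each business state declares where it may legally go next.
    static Set<State> allowedFrom(State s) {
        switch (s) {
            case ClaimRegistered:      return EnumSet.of(State.CoverageConfirmed, State.Closed);
            case CoverageConfirmed:    return EnumSet.of(State.FraudReviewRequested, State.SettlementApproved);
            case FraudReviewRequested: return EnumSet.of(State.SettlementApproved, State.Closed);
            case SettlementApproved:   return EnumSet.of(State.Closed);
            default:                   return EnumSet.noneOf(State.class);
        }
    }

    // The invariant lives in the domain model: an illegal transition is a domain error,
    // not a routing configuration bug discovered in production.
    public static State transition(State from, State to) {
        if (!allowedFrom(from).contains(to)) {
            throw new IllegalStateException("Illegal transition " + from + " -> " + to);
        }
        return to;
    }
}
```

A workflow definition, whether code or data, can then reference these states without being able to corrupt them.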
Reconciliation as a first-class concern
In distributed workflow, reconciliation is not an exception process. It is part of the normal architecture.
When a workflow says a payment request was sent but the payment service never committed, you need mechanisms to detect drift and restore truth. This matters even more in event-driven systems using Kafka, where event delivery guarantees do not eliminate application-level inconsistency.
A mature workflow architecture includes:
- state snapshots
- durable event logs
- correlation IDs
- replay or re-drive capability
- periodic reconciliation jobs
- manual exception lanes
That list is not edge-case theater. It is Tuesday morning in any serious enterprise.
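The core of a reconciliation job is simple to sketch: compare the milestone the workflow believes a case has reached against what the downstream system reports, and flag the drift. Both views are reduced to maps here for illustration; in practice they come from the workflow store and the owning service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class Reconciler {
    // Returns a human-readable drift report: one entry per case whose workflow
    // milestone disagrees with the downstream system's view.
    public static List<String> drift(Map<String, String> workflowView, Map<String, String> downstreamView) {
        List<String> drifted = new ArrayList<>();
        for (Map.Entry<String, String> e : workflowView.entrySet()) {
            String actual = downstreamView.get(e.getKey());
            if (!e.getValue().equals(actual)) {
                drifted.add(e.getKey() + ": workflow=" + e.getValue() + " downstream=" + actual);
            }
        }
        return drifted;
    }
}
```

The hard part in real systems is not this comparison; it is deciding which side is authoritative for each milestone and what repair action each kind of drift triggers.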
Migration Strategy
No large organization starts from a blank sheet. They have legacy workflow buried in mainframes, BPM suites, custom order managers, CRM scripts, and human workarounds disguised as procedure manuals.
So the migration question is not “Which model is better?” It is “How do we move from what we have to what we need without losing control?”
The answer is usually progressive strangler migration.
Do not replace the whole workflow estate at once. Carve out a bounded slice where process semantics are clear and value is measurable. Introduce a new workflow representation there. Keep coexistence explicit. Build translation layers. Reconcile relentlessly.
A practical migration sequence looks like this:
- Map current process variants
  - identify business states, triggers, outcomes, and exceptions
  - separate true domain rules from legacy system quirks
- Choose a pilot bounded context
  - not the simplest one, but one with visible pain and manageable blast radius
- Introduce canonical workflow state
  - define a durable workflow instance model independent of legacy internals
- Run in parallel
  - legacy and new workflow paths may coexist for a time
  - compare outcomes and timing
- Add reconciliation
  - use periodic comparison to detect divergence between old and new process states
- Cut over by case type or product line
  - strangler by entry point, channel, region, or policy cohort
- Retire legacy process fragments incrementally
  - not before observability proves stability
The key migration idea is that the workflow state becomes the bridge. If you cannot define a canonical notion of where a case is, migration will become guesswork wrapped in project governance.
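A sketch of what that canonical bridge might look like. The field and class names are illustrative; the point is that both the legacy estate and the new runtime translate their progress into one shared shape so the same case can be compared across systems.

```java
public class CanonicalState {
    public final String caseId;
    public final String milestone;     // business milestone, e.g. "CoverageConfirmed"
    public final String sourceSystem;  // e.g. "legacy-bpm" or "new-runtime"

    public CanonicalState(String caseId, String milestone, String sourceSystem) {
        this.caseId = caseId;
        this.milestone = milestone;
        this.sourceSystem = sourceSystem;
    }

    // Two records agree when the same case reports the same milestone,
    // regardless of which system produced the record.
    public boolean agreesWith(CanonicalState other) {
        return caseId.equals(other.caseId) && milestone.equals(other.milestone);
    }
}
```

During parallel running, disagreement between a legacy record and a new-runtime record for the same case is exactly the signal the reconciliation step consumes.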
Data-first migration caveat
Teams often migrate by externalizing workflow into data immediately because it looks like a shortcut. Be careful. If your current process is poorly understood, putting it into metadata does not simplify it. It fossilizes confusion.
A better strategy is often:
- first re-express workflow clearly in code
- then externalize only the parts proven to vary
That sequence gives you testability before flexibility.
Enterprise Example
Consider a multinational insurer modernizing claims handling across auto, property, and travel lines.
The legacy estate looks familiar:
- a central BPM suite for claim routing
- policy checks in a mainframe
- fraud scoring in a separate analytics platform
- human tasks in a case management tool
- downstream payments over Kafka-backed services
- regional exceptions managed through undocumented operator procedures
The insurer’s first instinct is to standardize all claims workflows into one configurable engine. This is the kind of idea that looks fantastic in steering committees and dreadful in production. Auto glass claims, complex bodily injury claims, and travel delay reimbursements do not share enough semantics to justify a single generic mega-flow.
So the architecture team applies DDD and breaks the space into bounded contexts:
- Claim Intake
- Coverage Verification
- Fraud Assessment
- Settlement
- Recovery/Subrogation
Within those contexts, they make a deliberate representation split.
Settlement orchestration is implemented as workflow as code. It touches payment services, reserve adjustments, approvals, and ledger interactions. The sequencing is stable, compensation is critical, and failures are expensive. Engineering discipline matters more than runtime configurability.
Fraud triage routing is represented as workflow as data. It changes frequently by region, product, and fraud model output. Analysts need to adjust thresholds and review paths without waiting for monthly releases. But the allowed action set is constrained and versioned.
Coverage decisions remain inside domain services, not in workflow metadata. This is crucial. Eligibility and contractual interpretation are domain rules, not process arrows. By leaving them in bounded context services, the insurer avoids turning workflow definitions into legal logic spreadsheets.
Migration happens progressively:
- new travel claims use the new workflow first
- auto glass follows
- complex bodily injury remains on the old BPM platform until reconciliation quality is proven
During coexistence, every claim gets a canonical workflow state record. Kafka events carry claim IDs and workflow instance IDs. A reconciliation service compares expected milestones with actual downstream states every few hours. This catches cases where payment completed but the workflow still shows “awaiting disbursement,” or where fraud review was resolved but human task status remained stale.
The result is not purity. It is control.
That is what mature architecture looks like: not a perfect model, but a system that can change without lying to itself.
Operational Considerations
Workflow architecture lives or dies in operations.
Versioning
Never allow mutable in-place edits of active workflow definitions without strict versioning. Running instances must be bound to a definition version, or to explicit migration rules. Otherwise you will never be able to explain historical behavior.
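A minimal sketch of that binding rule: definitions are published as immutable versions, and a running instance holds the version number it started on. The registry below is illustrative (an in-memory stand-in for a definition store).

```java
import java.util.HashMap;
import java.util.Map;

public class DefinitionRegistry {
    private final Map<Integer, String> versions = new HashMap<>();
    private int latest = 0;

    // Publishing never mutates an existing version; it only adds a new one.
    public int publish(String definition) {
        versions.put(++latest, definition);
        return latest;
    }

    // An instance asks for the version it was bound to at start, not "the current one",
    // so historical behavior stays explainable even after later edits.
    public String definitionFor(int boundVersion) {
        return versions.get(boundVersion);
    }
}
```

The remaining design decision is whether long-running instances ever migrate to newer versions, and if so, under which explicit migration rules.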
Observability
A workflow runtime needs more than infrastructure monitoring. You need:
- per-instance state visibility
- business milestone tracking
- stuck workflow detection
- timeout and SLA dashboards
- event correlation traces
- replay and recovery tooling
Technical logs are not enough. Operations teams need to answer business questions, not just JVM questions.
Idempotency
Any step triggered by Kafka or asynchronous messaging must be idempotent. Workflow runtimes often amplify retries. If downstream actions are not safe to repeat, duplicate processing becomes inevitable.
Human tasks
Human-in-the-loop workflows are where elegant diagrams go to die. Approvals are delegated. Context is missing. Tasks sit abandoned. People complete steps offline. Your workflow model must handle reassignment, timeout escalation, cancellation, and manual correction.
Security and governance
If workflow is data, treat definition changes like code changes with policy checks, approvals, simulation, and audit logs. A runtime-editable process definition that can move money or alter entitlements is a governance risk, not a convenience feature.
Tradeoffs
Let us be plain about the tradeoffs.
Workflow as code strengths
- strong type safety
- easier testing and refactoring
- better deployment discipline
- clear integration with code-based domain models
- safer for technically complex orchestration
Workflow as code weaknesses
- slower business-facing change
- harder non-engineer participation
- risk of process logic being buried in imperative code
- poor fit for many process variants
Workflow as data strengths
- flexible process variation
- clear persistence of long-running state
- better business visibility into flows
- easier runtime adaptation when governed well
Workflow as data weaknesses
- weak validation unless heavily engineered
- accidental complexity in definition languages
- difficult refactoring across versions
- risk of encoding domain rules as opaque expressions
- operational danger from mutable runtime behavior
The central tradeoff is not flexibility versus rigidity. It is adaptability versus control. Enterprises need both. That is why hybrid architectures dominate.
Failure Modes
The failure modes are remarkably consistent.
1. Configuration masquerading as design
Teams externalize process definitions but do not build tooling, versioning, validation, or simulation. Soon they have fragile metadata that nobody trusts. The system is “dynamic” in the same way a loose wheel is dynamic.
2. God workflow engine
The workflow platform starts coordinating everything and slowly absorbs domain logic, integration logic, and reporting logic. Every team depends on it. It becomes the new monolith, only harder to debug.
3. Domain semantics lost in generic models
Business states become STEP_17 and ROUTE_B. Domain experts disengage. Engineers overfit the engine’s abstraction. The workflow is executable but no longer meaningful.
4. No reconciliation path
The architecture assumes happy-path event delivery and eventual consistency. Then drift appears between workflow state and service state, and there is no systematic way to detect or repair it.
5. Unmanaged version sprawl
Different products, regions, and exceptions lead to dozens or hundreds of workflow variants. Without inheritance, composition, and governance, process definition estates become unmaintainable.
6. Runtime flexibility without organizational readiness
The business is told it can change workflows without deployments. In practice, nobody owns testing, impact analysis, or rollback. So changes happen slowly anyway, but now with worse controls.
When Not To Use
Workflow as data is not always wise.
Do not use it when:
- the process is stable and mostly technical
- correctness matters more than runtime configurability
- business users are not realistically going to manage definitions
- the organization lacks governance for process changes
- domain rules are complex enough that expression languages will become code in disguise
Likewise, do not use workflow as code exclusively when:
- process variants are numerous and policy-driven
- analysts truly need controlled change without engineering bottlenecks
- long-running cases require explicit persisted state and visible progression
- process audit and explainability are central
And sometimes, do not use an explicit workflow engine at all. Some domains are better expressed through aggregates, state machines inside bounded contexts, and event choreography. If there is no meaningful cross-service process to coordinate, introducing a workflow layer may just add ceremony.
A workflow engine should solve a problem. It should not become one.
Related Patterns
Several related patterns shape this decision.
Saga orchestration and choreography
For distributed transactions, workflow as code often aligns well with saga orchestration, while event-driven workflow as data can drift toward choreography. Neither is inherently superior. Orchestration gives visibility and control. Choreography gives autonomy and looser coupling. The enterprise usually needs some of both.
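The orchestration side can be sketched compactly: each completed step registers an undo action, and on failure the completed steps are compensated in reverse order. The SagaRunner below is illustrative, not a real saga framework.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SagaRunner {
    public interface Step { boolean run(); }

    public final List<String> log = new ArrayList<>();

    // Runs steps in order; on the first failure, compensates completed steps in reverse.
    public boolean run(List<String> names, List<Step> steps, List<Runnable> compensations) {
        Deque<Runnable> done = new ArrayDeque<>();
        for (int i = 0; i < steps.size(); i++) {
            if (steps.get(i).run()) {
                log.add("ok:" + names.get(i));
                done.push(compensations.get(i));
            } else {
                log.add("fail:" + names.get(i));
                while (!done.isEmpty()) done.pop().run(); // undo in reverse order
                return false;
            }
        }
        return true;
    }
}
```

The centralized log is the visibility benefit the text mentions: in choreography, the same history is scattered across event streams and must be reassembled.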
State machine modeling
Many workflows are really state machines with rich domain semantics. Explicit state machine modeling can be cleaner than task-flow thinking, especially where legal or lifecycle status matters.
Rules engines
A rules engine can complement workflow as data by separating decisions from flow. This is often healthier than embedding conditions in workflow expressions. But it also creates another runtime dependency and governance surface.
Outbox and event sourcing
For Kafka-based workflows, the transactional outbox pattern is often essential to publish reliable domain events. Event sourcing can help with replay and audit, though it is not required and is often overused.
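The outbox idea in miniature: the state change and the outgoing event are recorded in one atomic step, and a separate relay publishes pending events to the broker. The lists below are in-memory stand-ins for a database transaction and a Kafka topic; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class Outbox {
    private final List<String> state = new ArrayList<>();     // stand-in for the business table
    private final List<String> outbox = new ArrayList<>();    // stand-in for the outbox table
    private final List<String> published = new ArrayList<>(); // stand-in for the Kafka topic

    // In a real system, one database transaction covers both writes, so the event
    // cannot be lost after the state change commits.
    public synchronized void commit(String change, String event) {
        state.add(change);
        outbox.add(event);
    }

    // The relay drains pending events to the broker; running it again is harmless.
    public synchronized List<String> relay() {
        published.addAll(outbox);
        outbox.clear();
        return List.copyOf(published);
    }
}
```

Deduplication on the consumer side is still required, since the relay may publish an event more than once after a crash between publish and mark-as-sent.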
Case management
Not all workflows are linear processes. In case management domains, activities emerge based on context, evidence, and human judgment. These are often better suited to data-driven models than rigid code workflows.
Summary
The question is not whether workflow should be data or code. The question is where change belongs, where meaning belongs, and where failure can be tolerated.
Use workflow as code when the orchestration is stable, technically demanding, and safety matters more than runtime agility.
Use workflow as data when process variation is real, domain users need controlled influence, and you are prepared to invest in versioning, validation, governance, and reconciliation.
Use hybrid architecture most of the time:
- domain rules in bounded contexts
- execution mechanics in the platform
- variable flow in constrained, versioned definitions
That is the enterprise answer. Not ideological purity. Bounded flexibility.
If you remember one line, remember this: workflow representation is a decision about business truth under change.
Treat it with that level of seriousness, and your architecture will age gracefully. Treat it as a serialization format argument, and it will age like milk.