Most enterprises don’t have a data architecture problem. They have a meaning problem wearing a data platform badge.
Someone buys a lake, or a mesh, or a streaming platform, or all three in a single ambitious quarter. A few teams stand up Kafka. Another team provisions object storage and calls it the foundation. There is a heroic slide with arrows, boxes, and a blue cylinder at the center. Executives nod because it looks modern. The vendors are delighted. Six months later, the company has more data than ever and less shared understanding than before.
This is not unusual. It is almost the default.
A lake is a place to put things. Architecture is the set of decisions that determines how the business changes safely. Those are not the same thing. One is storage. The other is responsibility, meaning, control, and flow. Confusing the two is one of the most expensive category errors in enterprise technology.
The practical fault line usually appears here: ingestion pipelines are mistaken for domain pipelines.
Ingestion is about capture. It moves bytes from where they are produced into some centralized substrate, often quickly and with minimal judgment. Domain pipelines are different. They express business semantics. They apply bounded context rules. They maintain identity, lineage, quality, and state transitions that the business actually cares about. They answer questions such as: what does “customer” mean here? when is an order committed? which system is allowed to declare a payment settled? how do we reconcile disagreements between systems of record?
If you build only ingestion, you get a bigger attic. If you build domain pipelines, you get an operating model.
That distinction matters even more in enterprises with microservices, event-driven systems, and Kafka-heavy integration. The temptation is to think that because events are flowing, architecture is happening. But raw motion is not design. A fire hose is not a supply chain.
This article makes a blunt argument: the lake is not the architecture. The architecture lives in the domain semantics layered over ingestion, in the contracts between bounded contexts, in the reconciliation logic, in the progressive migration strategy, and in the operational discipline that keeps data and events trustworthy under failure.
Context
Most large organizations got to their current state honestly. They accumulated systems over years: ERP, CRM, warehouse management, billing, e-commerce, claims, policy administration, manufacturing execution, customer support, and a scattering of bespoke applications built during urgent moments that somehow became permanent.
Then came three converging pressures.
First, business leaders wanted integrated reporting and machine learning. They needed a single view of customer, product, supplier, order, policy, or patient. The existing estate offered only fragments.
Second, product teams wanted autonomy. Microservices promised independent delivery, local ownership, and faster change. Kafka promised decoupled integration, event streams, and near-real-time reaction.
Third, regulators, auditors, and operations teams demanded more traceability, not less. “Where did this number come from?” became a first-class architecture question.
So enterprises did what enterprises do: they added a lake to collect data, then added stream processing to move faster, then added governance to recover from the first two.
None of those decisions are wrong in isolation. But they become dangerous when the lake is cast as the center of architecture rather than a component in a larger set of domain decisions.
A good architecture starts with business capabilities and bounded contexts. It identifies where domain truth is created, where it is merely copied, where translation is necessary, and where reconciliation is unavoidable. It understands that ingestion is a technical concern in service of a semantic model. Not the other way around.
That sounds obvious. It rarely survives budgeting season.
Problem
The common anti-pattern looks like this:
- Every source system publishes extracts or CDC streams.
- Everything lands in a lake or streaming backbone.
- A central team normalizes records into broad enterprise schemas.
- Downstream consumers are expected to derive business meaning from those centralized feeds.
This feels efficient. It is also where architecture quietly leaks away.
Why? Because enterprise entities are not universal facts. They are context-bound concepts.
A “customer” in billing is the party responsible for payment. In CRM, it may be a prospect, household, or account hierarchy. In shipping, it may be a delivery recipient. In compliance, it may be the legally accountable person or organization. These are related concepts, not the same concept with slightly different columns.
When a central ingestion pipeline flattens them into one enterprise customer schema too early, it creates a semantic fiction. That fiction is then embedded into dashboards, APIs, ML features, downstream marts, and operational logic. Soon everyone argues over the definition of customer, but the real issue is architectural: the pipeline collapsed bounded contexts before the business was ready to make those distinctions explicit.
The same happens with orders, products, inventory, claims, payments, and policies. The central platform becomes a semantic battlefield.
Worse, ingestion-oriented thinking optimizes the wrong things. It celebrates throughput, freshness, and schema standardization, while neglecting:
- source-of-truth boundaries
- lifecycle state transitions
- idempotency across domains
- reconciliation of conflicting records
- temporal correctness
- legal and audit obligations
- ownership of domain contracts
The result is a platform full of copied data and no trustworthy business narrative.
You can spot this failure mode in language. Teams say “the data is in the lake” as if location implies readiness. It does not. Data in a lake is often just data waiting for a real model.
Forces
Good architecture is forged by opposing forces, not slogans.
Speed versus meaning
Ingestion wants to move fast. Domain modeling wants to move carefully. Raw feeds can often be onboarded in days. Agreeing what a “fulfilled order” means across channels may take months. The enterprise must do both without pretending they are the same activity.
Centralization versus autonomy
A central platform team can standardize ingestion, security, observability, and tooling. That is useful. But domain semantics belong with domain teams. If central teams define the business meaning of every dataset, they become a bottleneck and usually get it wrong.
Reuse versus bounded context integrity
Executives love reuse. Architects should be more suspicious. Shared models reduce duplication, but they also spread coupling. The trick is to reuse infrastructure and platform patterns while preserving semantic boundaries.
Event-driven flow versus historical correctness
Kafka and microservices are excellent for propagating change. They are not magic. Late events, duplicates, out-of-order delivery, schema drift, and compensating business actions are facts of life. Enterprises need replay, temporal queries, and reconciliation, not just “real-time.”
Analytical convenience versus operational truth
Analytical consumers often want broad denormalized datasets. Operational systems require precise state machines and authority boundaries. Trying to serve both from one generic enterprise feed usually satisfies neither.
Migration urgency versus business continuity
No one gets to rebuild a global enterprise from scratch. The architecture must coexist with legacy systems, overlapping truths, contractual integrations, and reporting deadlines. This is where progressive strangler migration matters.
Solution
The answer is not “don’t build a lake.” The answer is to put the lake in its proper place.
Use a two-layer pipeline model:
- Ingestion pipelines capture and preserve source data with minimal semantic transformation.
- Domain pipelines transform, validate, reconcile, and publish business-meaningful data products inside bounded contexts.
That distinction sounds tidy. In practice, it changes everything.
Ingestion pipelines should answer:
- How do we capture data reliably from source systems?
- How do we preserve lineage, timestamps, source identifiers, and raw payloads?
- How do we detect schema changes and ingestion failures?
- How do we replay data safely?
Domain pipelines should answer:
- What business concept does this represent in this bounded context?
- Which system is authoritative for which attributes and state transitions?
- How do we handle duplicates, late arrivals, and conflicting updates?
- What is the identity model?
- What reconciliation process closes the gap between systems?
- What contracts do we publish to other domains?
That means the lake or streaming backbone becomes a substrate, not the architecture itself. The architecture emerges through domain-aligned processing and publication.
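The two-layer split can be sketched in miniature. This is an illustration, not a framework: the envelope fields, the bounded-context rule in `to_order_domain`, and the source-system names are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone

# Layer 1: ingestion preserves evidence -- raw payload, source, lineage.
@dataclass(frozen=True)
class RawRecord:
    source_system: str   # where the bytes came from
    captured_at: str     # ingestion timestamp (processing time, not business time)
    payload: str         # untouched source payload, replay-safe

def ingest(source_system: str, payload: dict) -> RawRecord:
    """Capture with minimal judgment: no renaming, no merging, no 'enterprise truth'."""
    return RawRecord(
        source_system=source_system,
        captured_at=datetime.now(timezone.utc).isoformat(),
        payload=json.dumps(payload),
    )

# Layer 2: a domain pipeline interprets raw records inside ONE bounded context.
def to_order_domain(raw: RawRecord) -> dict:
    """Apply Order-context semantics: identity, state, and authority rules."""
    data = json.loads(raw.payload)
    # Bounded-context rule (assumed for illustration): only the OMS may
    # declare an order committed; other sources yield an 'observed' state.
    state = "committed" if raw.source_system == "oms" and data.get("confirmed") else "observed"
    return {
        "order_id": data["order_id"],   # stable business identity
        "state": state,
        "source": raw.source_system,    # lineage survives interpretation
    }

raw = ingest("oms", {"order_id": "O-42", "confirmed": True})
order = to_order_domain(raw)
```

The point of the sketch is the asymmetry: the first layer has no opinion about meaning, the second layer is nothing but opinion, scoped to one context.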
A strong enterprise design usually has these characteristics:
- raw immutable landing zones or append-only event streams
- clear separation between source capture and semantic transformation
- bounded-context domain datasets or topics
- explicit ownership per domain product
- canonical models used sparingly and only where there is stable shared language
- reconciliation services for cross-system consistency
- auditability and replay as first-class concerns
- a migration path that gradually shifts consumers from legacy extracts to domain products
This is domain-driven design applied to data and integration, not just to application code.
The bounded context remains the unit of semantic integrity. If that line blurs, so does everything downstream.
Architecture
Let’s make the distinction concrete.
Ingestion versus domain pipeline
The key point is that ingestion preserves. Domain pipelines interpret.
Raw ingestion should not prematurely collapse records into an “enterprise truth.” It should retain source fidelity so domain teams can reason from evidence. This also makes replay and forensic analysis possible when business rules change.
Then each domain pipeline takes responsibility for turning raw records into business-meaningful products. For example:
- Customer domain resolves identity, survivorship, consent, segmentation, and lifecycle state.
- Order domain models order placement, amendment, fulfillment, cancellation, and channel-specific semantics.
- Payment domain tracks authorization, capture, settlement, reversal, chargeback, and ledger implications.
- Inventory domain distinguishes on-hand, reserved, available-to-promise, in-transit, and quarantined states.
Notice how these are not generic data transformations. They are business semantics.
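These lifecycles are precisely what a domain pipeline guards. As a sketch, the order lifecycle named above can be made explicit as a state machine; the allowed transitions here are assumptions for illustration, and a real Order context would define its own.

```python
# Allowed order-state transitions, made explicit instead of implied by data.
ORDER_TRANSITIONS = {
    "placed": {"amended", "partially_fulfilled", "cancelled"},
    "amended": {"partially_fulfilled", "cancelled"},
    "partially_fulfilled": {"fulfilled", "cancelled"},
    "fulfilled": set(),    # terminal
    "cancelled": set(),    # terminal
}

def transition(current: str, target: str) -> str:
    """Reject transitions the business lifecycle does not allow."""
    if target not in ORDER_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target

state = "placed"
state = transition(state, "amended")
state = transition(state, "partially_fulfilled")
# transition("fulfilled", "amended") would raise: the lifecycle forbids it
```

A generic ETL job would happily write any status string; the domain pipeline refuses states the business cannot actually be in.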
Domain contracts and bounded contexts
A healthy enterprise architecture does not force every team to consume raw topics or lake tables. Domain teams publish curated contracts.
This arrangement matters because domain events and data products are contracts. They should represent stable business facts meaningful to consumers, not raw implementation leaks from source applications.
A raw SAP table extract is not a domain contract.
A PaymentSettled event with business identifiers, settlement amount, settlement date, and authority source may be.
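What makes such an event a contract is that every field carries business meaning a consumer can rely on. A hedged sketch of what that might look like; the field names, minor-unit convention, and versioning scheme are assumptions, not a standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PaymentSettled:
    """Domain event contract: stable business facts, no source-table leakage."""
    event_version: int            # compatibility-managed, never casually broken
    payment_id: str               # business identifier, not a source row key
    order_id: str                 # cross-context correlation key
    settlement_amount_minor: int  # amount in minor units, avoiding float drift
    currency: str
    settlement_date: str          # ISO date: business time, not processing time
    authority_source: str         # which system may declare settlement

event = PaymentSettled(
    event_version=1,
    payment_id="PAY-9001",
    order_id="O-42",
    settlement_amount_minor=12999,
    currency="EUR",
    settlement_date="2024-03-01",
    authority_source="ledger",
)
```

Contrast each field with a raw SAP extract column: nothing here names a table, a technical key, or an internal status code.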
Reconciliation is architecture, not cleanup
Many programs treat reconciliation as a downstream data quality activity. That is a mistake. Reconciliation is how enterprises survive multiple systems asserting partial truth.
When one system says an order is shipped, another says invoiced, and a third says returned, the gap is not just dirty data. It reflects asynchronous processes, authority boundaries, and business lag. A serious architecture makes reconciliation explicit.
Reconciliation should define:
- comparison keys and identity strategy
- timing windows
- source authority by attribute and state
- tolerance thresholds
- exception handling workflows
- replay and restatement procedures
If you skip this, your architecture will lie politely until quarter-end, when finance and operations discover they have different numbers and no common explanation.
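The core of a reconciliation service built on those definitions can be surprisingly small. A sketch, assuming an order identifier as the comparison key and amounts in minor units with a configurable tolerance (all assumptions for the example):

```python
def reconcile(fulfillment: dict, invoicing: dict, tolerance_minor: int = 0) -> list:
    """Compare two systems' views by business key; emit exceptions, not silence."""
    exceptions = []
    for order_id in fulfillment.keys() | invoicing.keys():
        f = fulfillment.get(order_id)
        i = invoicing.get(order_id)
        if f is None or i is None:
            exceptions.append({"order_id": order_id, "kind": "missing_in_one_system"})
        elif abs(f - i) > tolerance_minor:
            exceptions.append({"order_id": order_id, "kind": "amount_mismatch",
                               "fulfilled": f, "invoiced": i})
    return exceptions

# Fulfillment and invoicing amounts in minor units, keyed by order.
exceptions = reconcile(
    fulfillment={"O-1": 10000, "O-2": 5000},
    invoicing={"O-1": 10000, "O-2": 5200, "O-3": 700},
)
# Two exceptions: O-2 disagrees on amount, O-3 is missing from fulfillment.
```

The hard part is not the comparison loop; it is everything around it in the list above, especially deciding which source is authoritative for each exception.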
Migration Strategy
This is where most architecture articles become dreamy. Real enterprises do not replace sprawling estates with one elegant domain model in a fiscal year. They migrate under pressure, with conflicting priorities and live business risk.
The right strategy is usually a progressive strangler migration.
Start by separating concerns:
- establish standardized ingestion from legacy systems
- preserve raw history and identifiers
- build one domain pipeline where business pain is highest
- publish domain contracts
- move selected consumers from raw or legacy feeds to those contracts
- run reconciliation in parallel
- gradually retire old extracts and point-to-point interfaces
The migration is not from old technology to new technology. It is from opaque integration to explicit domain semantics.
A practical sequence often looks like this:
Phase 1: Stabilize ingestion
Introduce CDC, API capture, or batch ingestion into raw topics and object storage. Add lineage, metadata, schema compatibility checks, and replay capability. Do not over-model yet. The goal is reliable evidence.
Phase 2: Pick one bounded context
Choose a domain with visible business value and manageable boundaries: customer identity, order lifecycle, inventory availability, or payment status are common candidates. Avoid starting with a concept everyone fights over politically unless executive sponsorship is unusually strong.
Phase 3: Build a domain pipeline and contract
Model the business lifecycle. Define authoritative sources. Implement identity and state transitions. Publish domain events or datasets with clear ownership and SLA. This is the moment where architecture becomes legible.
Phase 4: Reconcile against legacy truth
Do not cut over immediately. Compare outputs to existing reports and operational systems. Quantify gaps. Many migration failures happen because teams assume semantic equivalence where none exists.
Phase 5: Strangle consumers
Move downstream systems one by one:
- reporting marts
- customer service dashboards
- digital channels
- compliance extracts
- ML feature pipelines
Each move should reduce dependency on raw ingestion or old bespoke integrations.
Phase 6: Retire old pathways deliberately
Turn off only what you can observe. Keep replay and backfill plans ready. Legacy interfaces often support hidden users no one documented.
This migration path respects business continuity. It also forces clarity. You cannot publish stable domain contracts without deciding what the domain means.
Enterprise Example
Consider a global retailer with e-commerce, stores, and wholesale channels. It has SAP for finance, a cloud CRM, a separate order management system for online sales, store systems acquired through mergers, and a warehouse platform that emits events into Kafka.
Leadership funds a “unified retail data lake” to create one view of customer and order.
The first program wave does what many programs do:
- ingests SAP, CRM, OMS, WMS, and point-of-sale data
- lands everything in a lake
- creates broad customer_master and order_master tables
- publishes these tables as enterprise assets
Initially this looks successful. Data volumes are high, and many teams can query a central repository. But the cracks appear quickly.
The CRM customer includes prospects with no transactions.
The finance customer is an account structure tied to invoicing.
The store systems identify customers inconsistently.
E-commerce has guest checkouts and household profiles.
Marketing wants households.
Fraud wants cardholder patterns.
Compliance wants legally identifiable subjects.
The “customer_master” table turns into a compromise document encoded as SQL. Everyone uses it. No one trusts it.
Orders are worse. The e-commerce OMS treats order amendments as versioned changes. Store systems treat returns as separate transactions. Finance posts invoice facts later. Fulfillment has partial shipment events. The central order_master table cannot model the actual lifecycle, so it invents statuses that satisfy reports but not operations.
At this point the retailer has a lake, but not an architecture.
A better move is to redraw around domains.
- Customer domain pipeline resolves party identity and consent using explicit survivorship rules, not universal semantics. It publishes a domain contract for customer interaction and service use cases.
- Order domain pipeline models order lifecycle independently from finance posting. It emits meaningful events such as OrderPlaced, OrderAmended, OrderPartiallyFulfilled, and OrderCancelled.
- Payment domain pipeline tracks authorizations, captures, settlements, refunds, and chargebacks.
- Reconciliation service compares fulfillment, invoicing, and settlement to produce reconciled commercial state and exceptions.
Kafka remains relevant, but in the right place. Raw operational events flow through it. Domain services consume and publish curated events. The lake still stores raw and conformed history for analytics and replay. But the business no longer pretends that one giant central table defines reality.
Over time, customer support applications switch from querying stitched lake tables to using customer and order domain APIs. Finance reporting consumes reconciled order-payment facts rather than joining raw operational feeds. Legacy nightly extracts are retired gradually. Mismatch exceptions become visible operational work, not hidden report drift.
This is the difference between integration plumbing and architecture.
Operational Considerations
Architects who ignore operations are just drawing expensive aspirations.
A domain-oriented pipeline architecture needs disciplined runtime behavior.
Data and event observability
Track freshness, completeness, schema evolution, duplication rates, lag, reconciliation exceptions, and consumer SLA breaches. The question is not merely “did the job run?” but “is the business signal trustworthy?”
Idempotency and replay
Kafka and distributed pipelines deliver the same lesson repeatedly: duplicates happen, retries happen, and reprocessing happens. Domain pipelines must be idempotent by design. Use stable business keys, event versioning, and replay-safe handlers.
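A minimal sketch of a replay-safe handler, keyed on a stable business key plus event version. The in-memory set is a stand-in for a durable deduplication store; a production pipeline would persist it.

```python
processed: set = set()   # stand-in for a durable dedup store
balance = 0

def handle_settlement(payment_id: str, version: int, amount_minor: int) -> bool:
    """Apply each (payment_id, version) at most once; replays become no-ops."""
    global balance
    key = (payment_id, version)
    if key in processed:
        return False          # duplicate or replay: safely ignored
    processed.add(key)
    balance += amount_minor
    return True

handle_settlement("PAY-1", 1, 500)
handle_settlement("PAY-1", 1, 500)   # redelivery: no double-count
handle_settlement("PAY-2", 1, 300)
# balance is 800, not 1300
```

Because the handler is a no-op on replay, reprocessing an entire topic after a rule change becomes an operational routine instead of a crisis.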
Temporal modeling
Many enterprise disputes are really time disputes. Which state was true at 10:03 when the invoice posted? Architecture needs effective dates, processing dates, and event times. A “current snapshot” is useful, but insufficient.
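The "which state was true at 10:03" question is answerable only if records carry business-effective time alongside processing time. A lookup sketch over such a history; the record layout and timestamps are assumptions for the example:

```python
from bisect import bisect_right

# State history with effective time (when it became true in the business)
# and recorded time (when the pipeline learned it). Sorted by effective time.
history = [
    {"effective": "2024-03-01T09:55", "recorded": "2024-03-01T09:56", "state": "authorized"},
    {"effective": "2024-03-01T10:01", "recorded": "2024-03-01T10:05", "state": "captured"},
    {"effective": "2024-03-01T10:10", "recorded": "2024-03-01T10:11", "state": "settled"},
]

def state_as_of(effective_time: str) -> str:
    """Return the state that was effective at a given business time."""
    idx = bisect_right([h["effective"] for h in history], effective_time)
    if idx == 0:
        raise LookupError("no state recorded before that time")
    return history[idx - 1]["state"]

# At 10:03 the business state was 'captured', even though the pipeline only
# recorded that fact at 10:05 -- a current snapshot cannot answer this.
```

Keeping both time axes is what lets restatements explain themselves: the business fact did not change, only when the platform learned it.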
Schema governance
Raw ingestion can tolerate drift better than domain contracts can. Consumer-facing events and datasets need compatibility rules, versioning, and change review. Breaking domain contracts casually is one of the fastest ways to lose platform trust.
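Compatibility rules for consumer-facing contracts can be checked mechanically before publication. A sketch of one common rule, backward compatibility: a new version may add fields but must not remove or retype existing ones. The flat name-to-type schema shape is an assumption for illustration; schema registries formalize this far more thoroughly.

```python
def backward_violations(old: dict, new: dict) -> list:
    """Return violations if 'new' would break consumers written against 'old'.

    Rule sketched here: every existing field must survive with the same type.
    """
    violations = []
    for field, ftype in old.items():
        if field not in new:
            violations.append(f"removed field: {field}")
        elif new[field] != ftype:
            violations.append(f"retyped field: {field} ({ftype} -> {new[field]})")
    return violations

old_contract = {"payment_id": "string", "amount_minor": "int", "currency": "string"}
ok_change = {**old_contract, "settlement_date": "string"}        # additive: allowed
bad_change = {"payment_id": "string", "amount_minor": "string"}  # retype + removal

violations = backward_violations(old_contract, bad_change)
```

A change-review gate that runs a check like this turns "don't break contracts" from an appeal to discipline into an enforced property.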
Security and policy boundaries
Raw zones often contain sensitive data far broader than most consumers need. Domain products should publish the minimum viable semantic contract and enforce policy segregation. This is especially important for customer, healthcare, HR, and financial domains.
Exception operations
Reconciliation exceptions need owners, queues, escalation rules, and root-cause categorization. If exceptions disappear into a dashboard no one watches, the architecture becomes ceremonial.
Tradeoffs
There is no free architecture. There is only informed compromise.
What you gain
- clearer domain ownership
- more trustworthy business semantics
- better support for microservices and event-driven integration
- easier migration from legacy systems
- improved auditability and replay
- reduced coupling between source system structure and consumer usage
What you pay
- more design effort up front
- domain modeling work that cannot be delegated to generic ETL teams alone
- longer time before enterprise-wide semantic convergence
- duplicate-looking models across bounded contexts
- operational complexity around reconciliation and contract management
That last point bothers some leaders. “Why do we have several customer-like models?” Because the business has several customer-like realities. Pretending otherwise only moves complexity into hidden joins, undocumented logic, and political meetings.
A little duplication at the semantic edges is often cheaper than false unification in the center.
Failure Modes
Let’s be honest about how this goes wrong.
1. The central team becomes the semantic bottleneck
A platform or data office decides it will define enterprise meaning for all domains. It cannot. The queue fills, domain experts disengage, and the central model becomes a graveyard of compromises.
2. Raw ingestion is exposed as a product
Teams publish raw CDC topics or raw lake tables and call them strategic assets. Consumers build directly on them. Later, when source systems change, every downstream team breaks together. This is accidental tight coupling dressed as openness.
3. Canonical data model overreach
The enterprise tries to define a single canonical model for customer, order, product, invoice, shipment, and payment before stabilizing bounded contexts. Progress slows to a crawl. The model either becomes abstract and useless or precise and politically impossible.
4. Reconciliation is postponed
Programs go live on the assumption that mismatches will be “cleaned up later.” Later arrives as audit findings, billing disputes, and executive escalations.
5. Event enthusiasm outruns business state modeling
Teams emit lots of Kafka events but do not define lifecycle state, authority, or compensating behavior. They have streams without narrative.
6. Migration is treated as big bang
A new platform is declared the future. Legacy feeds are switched off too early. Hidden dependencies surface. Confidence collapses. People retreat to spreadsheets and side databases.
None of these are exotic failures. They are the mainstream ones.
When Not To Use
This approach is not mandatory for every situation.
Do not build a full domain-pipeline architecture when:
- the problem is purely analytical and not tied to operational semantics
- data volumes and business criticality are low
- there is one genuinely authoritative source and little need for cross-domain integration
- the organization lacks stable domain ownership entirely
- the cost of reconciliation and contract governance outweighs business value
For a small departmental reporting need, a simple ingestion-to-warehouse pattern may be enough. Not every dataset deserves a bounded context ceremony.
Likewise, if an enterprise is very early in platform maturity, it may need to first establish basic ingestion reliability, metadata, and security before it can sensibly pursue domain-oriented pipelines. You cannot do semantic elegance atop operational chaos.
The pattern is most valuable where multiple systems create overlapping business truth and where downstream decisions depend on trustworthy state, not just convenient access.
Related Patterns
Several adjacent patterns support this approach.
Data products
A domain pipeline often publishes data products with explicit ownership, documentation, SLA, quality metrics, and access policy. This aligns well with data mesh ideas, provided teams do not confuse mesh with semantic anarchy.
Event-carried state transfer
Useful when microservices and domains need timely propagation. Dangerous when teams publish internal implementation details instead of stable business events.
Change Data Capture
Excellent for ingestion. Not a domain model. CDC gets facts out of systems; it does not decide what they mean.
CQRS and materialized views
Helpful for presenting different read models to different consumers while preserving domain semantics behind them.
Master data management
Sometimes useful, especially for identity-heavy domains. But MDM should not become a universal flattening machine. It works best when scoped and aligned to bounded contexts.
Strangler fig pattern
Essential for migration. Replace consumer dependencies progressively rather than attempting one heroic cutover.
Summary
The lake is valuable. It is just not the architecture.
Architecture begins where semantics begin: in bounded contexts, in domain ownership, in business state transitions, in contracts that consumers can trust, and in reconciliation logic that explains disagreement instead of hiding it. Ingestion pipelines move data into reach. Domain pipelines make it meaningful.
That is the core distinction.
If you remember one line, make it this: store once if you like, but model where the business meaning lives.
Enterprises that miss this build enormous platforms that remain strangely hollow. They collect everything and decide very little. Enterprises that get it right treat raw capture as evidence, domain pipelines as interpretation, and reconciliation as a first-class control loop. They migrate progressively, strangling old interfaces without betting the company on a weekend cutover. They use Kafka where streaming helps, microservices where boundaries are real, and lakes where preservation and scale matter.
But they do not confuse infrastructure with architecture.
And that confusion, more than any tool choice, is what separates a modern-looking estate from a modern enterprise.