Most data platform failures do not begin with scale. They begin with doubt.
Someone in finance asks why yesterday’s revenue report doesn’t match the billing ledger. Operations sees an order marked “shipped” in one system and “pending” in another. A data science team quietly maintains its own extracts because the warehouse “usually” lags and “sometimes” lies. Nobody says the platform is broken at first. They say something worse: I don’t trust it.
That is the real architecture problem.
We like to talk about data platforms in the language of pipelines, lakehouses, event streams, CDC, Kafka topics, medallion layers, and semantic models. All useful. None sufficient. A serious enterprise data platform is not a storage strategy with some integration plumbing attached. It is a machine for maintaining institutional trust under continuous change. And trust is earned through reconciliation: the disciplined ability to explain why two views differ, when they should converge, and what source of truth wins when they don’t.
That is where reconciliation topology matters. Not as a fashionable label, but as a design choice. Every enterprise already has one, even if it is accidental: a web of systems, ledgers, event streams, snapshots, reports, and correction processes that determine how disagreement is discovered and resolved. Good architecture makes that topology explicit. Bad architecture leaves it buried in SQL scripts, tribal knowledge, and monthly war rooms.
This article takes a firm position: if your data platform architecture does not model domain semantics and reconciliation as first-class concerns, then it is not architecture. It is plumbing with nice slides.
Context
In most enterprises, the data platform sits in the blast radius of competing truths.
The operational estate evolved over years. ERP handles invoices and accounting periods. CRM owns opportunity and customer hierarchy. Order management tracks line-level fulfillment. A payments service emits transaction events into Kafka. A warehouse ingests CDC feeds. A lake stores semi-structured exhaust. Then analytics teams build “gold” datasets that flatten all of this into business-friendly views. Somewhere above that stack sits a dashboard with a single number called revenue.
The number looks innocent. It never is.
Because revenue in finance means recognized revenue under accounting rules. In sales it may mean booked contract value. In product analytics it may mean successful payment capture. In a marketplace it may exclude taxes and pass-through fees. In a subscription business it may be amortized over service periods. If the platform architecture does not preserve these domain meanings, the reconciliation problem becomes political. Teams are not arguing about data quality. They are arguing about language.
This is why domain-driven design belongs in data platform conversations, not just microservice design. Bounded contexts are not an application-only concern. They are the only sane way to stop your analytics layer from becoming a giant semantic landfill.
A healthy platform makes room for multiple valid truths inside clearly defined contexts, then provides explicit reconciliation paths across them.
Problem
The common enterprise anti-pattern is to centralize data before understanding meaning.
A central team builds ingestion pipelines from every source system into a warehouse or lakehouse. It standardizes IDs where possible, creates conformed dimensions, deduplicates records, and publishes enterprise datasets. On paper this sounds efficient. In practice it often creates three problems at once.
First, semantic collapse. Distinct business concepts get flattened into one “canonical” model too early. The platform claims to define Customer, Order, Payment, and Revenue as if one model could satisfy all upstream and downstream concerns. It can’t. Canonical models are often diplomatic documents: broad enough to gain approval, vague enough to hide disagreement, and brittle enough to fail in production.
Second, opaque inconsistency. Data moves through batch loads, CDC pipelines, event streams, and enrichment jobs on different clocks. One dataset reflects operational state at 10:00, another at 10:07, another after a failed retry at 09:52. Because architects did not design for temporal reconciliation, consumers see mismatch and assume poor quality.
Third, responsibility without ownership. The central platform team becomes accountable for business correctness without controlling source behavior. That is a miserable place to stand. If a CRM user merges accounts incorrectly, or an ERP back-posts an accounting adjustment, or a microservice emits duplicate events, the data platform inherits the blame.
This is not merely a tooling issue. It is an architectural issue rooted in topology: where truth originates, how truth propagates, and how disagreement is resolved.
Forces
Several forces pull the architecture in different directions.
Operational autonomy versus analytical consistency. Microservices and business systems need local control. Analytics needs cross-domain coherence. If you over-optimize for autonomy, every downstream consumer becomes an integrator. If you over-optimize for consistency, you create a central bottleneck that slows change.
Event immediacy versus ledger finality. Kafka and event-driven systems provide timely signals. Ledgers provide legal and financial finality. A payment authorized event may be operationally useful in milliseconds, but finance cannot close the books on an event stream alone. Architectures that confuse event truth with accounting truth eventually embarrass someone in a quarterly review.
Schema evolution versus consumer stability. Source teams must evolve models. Consumers want stable contracts. CDC and streaming reduce latency but increase exposure to source change. Reconciliation design must absorb this drift.
Domain fidelity versus enterprise comparability. Preserve domain semantics too faithfully and enterprise reporting becomes fragmented. Normalize too aggressively and you erase business nuance. This is the perennial tradeoff.
Cost versus traceability. Full lineage, immutable history, and replayable event logs are expensive. But without them, forensic analysis is guesswork. Enterprises often discover the value of traceability only after a regulator asks for evidence.
Human process versus machine process. Some reconciliations can be automated. Others require operational judgments, policy exceptions, or accounting interpretation. A platform that pretends everything can be solved in code becomes dangerous.
These forces do not disappear. Architecture earns its keep by making the tensions explicit and survivable.
Solution
The right answer is a reconciliation-oriented data platform organized around bounded contexts, authoritative records, and explicit convergence paths.
The heart of the design is simple:
- Model data by domain semantics, not just by source tables.
- Preserve multiple authoritative views where the business genuinely has more than one truth.
- Define reconciliation flows that compare, explain, and, where appropriate, converge those views.
- Make timing, completeness, and correction status visible as metadata, not hidden implementation detail.
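The last principle can be made concrete with a thin envelope around each published record. The following is a minimal sketch; the type name, fields, and status vocabulary are assumptions for illustration, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class DataProductEnvelope:
    """Hypothetical wrapper: a payload plus the trust metadata kept visible to consumers."""
    payload: dict
    event_time: datetime         # when the business fact occurred
    ingestion_time: datetime     # when the platform observed it
    complete_through: datetime   # horizon up to which the product is known complete
    correction_status: str = "original"  # original | corrected | superseded

    def is_provisional(self) -> bool:
        # A fact newer than the completeness horizon is provisional, not wrong.
        return self.event_time > self.complete_through

rec = DataProductEnvelope(
    payload={"order_id": "O-1", "amount": 100.0},
    event_time=datetime(2024, 3, 1, 9, 58, tzinfo=timezone.utc),
    ingestion_time=datetime(2024, 3, 1, 10, 0, tzinfo=timezone.utc),
    complete_through=datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc),
)
provisional = rec.is_provisional()
```

The point is that provisionality is computed and exposed, not inferred by the consumer from gut feel.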
This leads to a topology with three broad layers.
Operational truth layer. Source-aligned data products preserve what happened in the producing context: order events, invoice postings, payment captures, shipment confirmations, customer master changes. This is not raw dumping ground; it is semantically curated at the boundary of each domain.
Reconciliation layer. A dedicated set of services and data products compare related facts across domains. They identify expected differences, unexpected breaks, lag windows, duplicate keys, orphan records, and unresolved exceptions. This layer does not merely aggregate. It reasons.
Consumption layer. Business-facing models expose metrics and entities with explicit semantic contracts: “booked revenue,” “recognized revenue,” “cash collected,” “fulfilled orders,” “active customers.” Consumers choose the view that matches their use case.
That middle layer is where trust is built. Most platforms skip it. They leap from ingestion to dashboards, hoping enough SQL glue will bridge the gap. It won’t.
A practical topology
A few opinions here.
First, reconciliation is not just a dashboard check. It is an architectural capability. It deserves its own services, data contracts, and operational ownership.
Second, authoritative sources should be explicit per fact type. Payment authorization belongs to payments. Revenue recognition belongs to finance. Fulfillment status belongs to warehouse or order fulfillment. A platform that tries to override these facts centrally becomes a shadow system.
Third, “single source of truth” is usually a slogan. In real enterprises, you need a system of record per bounded context plus a mechanism for enterprise-level reconciliation. That is mature architecture. Anything simpler is often fantasy.
Architecture
Let’s get concrete.
1. Domain-driven data products
Each bounded context publishes a data product that captures its language and invariants. If the payments service emits PaymentAuthorized, PaymentCaptured, and RefundSettled, the payments product should preserve those event semantics and map them to stable analytical facts. If ERP posts journal entries and revenue schedules, the finance product should preserve accounting semantics rather than flatten them into “transaction amount.”
This is where many lakehouse programs go wrong. They preserve source structure but not source meaning. The data product boundary should be semantic, not just technical.
A good litmus test: can a domain expert read the contract and say, “yes, that is what we mean”?
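As a sketch of what “preserving event semantics” means for the payments example above, consider a mapping table from domain events to analytical facts. The event names follow the article; the field names and sign convention are assumptions:

```python
# Hypothetical mapping from payments-domain events to stable analytical facts.
# Preserves meaning (authorization vs capture vs refund) instead of flattening
# everything into a generic "transaction amount".
FACT_RULES = {
    "PaymentAuthorized": {"fact": "payment_authorization", "affects_cash": False},
    "PaymentCaptured":   {"fact": "payment_capture",       "affects_cash": True},
    "RefundSettled":     {"fact": "refund_settlement",     "affects_cash": True},
}

def to_fact(event: dict) -> dict:
    """Map a domain event to an analytical fact without erasing its meaning."""
    rule = FACT_RULES[event["type"]]
    sign = -1 if event["type"] == "RefundSettled" else 1
    return {
        "fact_type": rule["fact"],
        "affects_cash": rule["affects_cash"],
        "amount": sign * event["amount"],
        "source_event_id": event["event_id"],  # lineage back to the producer
    }

fact = to_fact({"type": "RefundSettled", "amount": 25.0, "event_id": "e-9"})
```

A domain expert can read `FACT_RULES` and say whether it matches what payments actually means — which is exactly the litmus test.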
2. Reconciliation services
Reconciliation services consume domain data products and apply explicit matching logic.
For example:
- Match order line items against payment captures within a tolerance window.
- Compare shipped quantities against invoiced quantities.
- Reconcile booked subscriptions against recognized revenue schedules.
- Match customer hierarchies across CRM and ERP with survivorship rules.
These services should produce more than pass/fail counts. They need classifications:
- matched
- pending due to expected latency
- unmatched
- duplicate
- partially matched
- superseded by correction
- excluded by policy
That classification language matters because it turns vague distrust into actionable states.
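A minimal sketch of such a classifier, covering a subset of the states above. The matching rules (exact amount, a two-hour settlement window) are illustrative assumptions, not recommended thresholds:

```python
from datetime import datetime, timedelta
from enum import Enum

class MatchState(Enum):
    MATCHED = "matched"
    PENDING_LATENCY = "pending due to expected latency"
    UNMATCHED = "unmatched"
    DUPLICATE = "duplicate"

def classify(order: dict, captures: list, now: datetime,
             latency_window: timedelta = timedelta(hours=2)) -> MatchState:
    """Classify one order line against observed payment captures."""
    hits = [c for c in captures
            if c["order_id"] == order["order_id"] and c["amount"] == order["amount"]]
    if len(hits) == 1:
        return MatchState.MATCHED
    if len(hits) > 1:
        return MatchState.DUPLICATE
    # No capture yet: is the order still inside its expected settlement latency?
    if now - order["placed_at"] <= latency_window:
        return MatchState.PENDING_LATENCY
    return MatchState.UNMATCHED

order = {"order_id": "O-1", "amount": 100.0, "placed_at": datetime(2024, 3, 1, 9, 0)}
state = classify(order, captures=[], now=datetime(2024, 3, 1, 9, 30))
```

Note that “pending due to expected latency” is a first-class outcome, not a failure — that distinction is what keeps operations teams from chasing ghosts.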
3. Temporal semantics
Reconciliation is often less about values than about time.
Two systems may both be correct but not yet converged. That means every serious platform needs a notion of data freshness, completeness horizon, and effective time. A report should know whether it is showing order state as-of event time, ingestion time, or accounting close time. If that sounds fussy, good. Enterprises lose credibility when they blur these distinctions.
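To see why the choice of clock changes the answer, here is a small bitemporal sketch. The row shape and field names are assumptions; the point is that the caller must say which clock it is asking about:

```python
from datetime import datetime

# Hypothetical bitemporal rows: each carries event time and ingestion time.
rows = [
    {"order_id": "O-1", "status": "pending",
     "event_time": datetime(2024, 3, 1, 9, 0), "ingested_at": datetime(2024, 3, 1, 9, 5)},
    {"order_id": "O-1", "status": "shipped",
     "event_time": datetime(2024, 3, 1, 10, 0), "ingested_at": datetime(2024, 3, 1, 10, 7)},
]

def state_as_of(rows, order_id, *, event_time=None, ingestion_time=None):
    """Latest state as-of a chosen clock; the caller must pick the clock explicitly."""
    candidates = [r for r in rows if r["order_id"] == order_id]
    if event_time is not None:
        candidates = [r for r in candidates if r["event_time"] <= event_time]
        key = "event_time"
    else:
        candidates = [r for r in candidates if r["ingested_at"] <= ingestion_time]
        key = "ingested_at"
    return max(candidates, key=lambda r: r[key])["status"] if candidates else None
```

Asked “what was the order state at 10:00?”, event time says shipped while ingestion time still says pending — both systems are correct, they just have not converged.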
4. Exception workflows
Some mismatches require human intervention. Build for that.
An exception workbench should allow operations, finance, or data stewardship teams to inspect unresolved breaks, annotate causes, trigger replays, or route source corrections. This is where architecture meets reality. Without an exception path, reconciliation devolves into nightly emails and spreadsheet heroics.
5. Kafka where it helps
Kafka is valuable here, but not magical.
Use Kafka for event distribution, decoupling, and near-real-time propagation of operational changes. It is especially useful when multiple reconciliation services need the same event stream, or when you want replayable processing for late fixes. But do not mistake Kafka for a source of business finality. Streams are excellent at expressing what was observed. They are not, by themselves, a substitute for domain authority or financial closure.
A practical pattern is event-carried state from microservices into Kafka, then durable domain data products with versioned schemas, then reconciliation processors that join events with ledger snapshots or CDC-fed master data.
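The join step of that pattern can be sketched with in-memory stand-ins: event-carried state (as if consumed from a Kafka topic) joined against a CDC-fed ledger snapshot. Topic contents, keys, and field names are assumptions for illustration:

```python
# Event-carried state from the payments service (stand-in for a Kafka topic).
payment_events = [
    {"payment_id": "P-1", "order_id": "O-1", "amount": 100.0, "type": "PaymentCaptured"},
    {"payment_id": "P-2", "order_id": "O-2", "amount": 40.0,  "type": "PaymentCaptured"},
]
# CDC-fed finance postings, keyed by order (stand-in for a ledger snapshot).
ledger_snapshot = {
    "O-1": {"posted_amount": 100.0},
}

def reconcile(events, ledger):
    """Join observed captures against posted ledger amounts; emit breaks, not just counts."""
    breaks = []
    for e in events:
        posting = ledger.get(e["order_id"])
        if posting is None:
            breaks.append({"order_id": e["order_id"], "reason": "no ledger posting yet"})
        elif posting["posted_amount"] != e["amount"]:
            breaks.append({"order_id": e["order_id"], "reason": "amount mismatch"})
    return breaks

breaks = reconcile(payment_events, ledger_snapshot)
```

In a real deployment the inputs would come from a consumer group and a snapshot table, but the reconciliation logic itself stays this explicit: the stream says what was observed, the ledger says what is final, and the breaks say where they disagree.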
6. Lineage and auditability
Trust requires explainability.
Every reconciled fact should trace back to source identifiers, source timestamps, transformation versions, and reconciliation rule versions. If revenue changed between Tuesday and Wednesday, someone should be able to answer why without archaeology.
That is not a luxury feature. In regulated industries, it is table stakes.
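A sketch of what a reconciled fact with full lineage might carry. The record shape and identifier formats are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReconciledFact:
    """Every reconciled fact keeps enough lineage to answer 'why did this change?'."""
    metric: str
    value: float
    source_ids: tuple          # identifiers in the producing systems
    source_timestamps: tuple   # when each source observed its part
    transform_version: str     # version of the pipeline that produced the fact
    rule_version: str          # version of the reconciliation rule applied

fact = ReconciledFact(
    metric="recognized_revenue",
    value=100.0,
    source_ids=("SAP:JE-991", "OMS:O-1"),
    source_timestamps=("2024-03-01T10:00Z", "2024-03-01T09:58Z"),
    transform_version="finance-product@1.4.2",
    rule_version="cash-to-revenue@7",
)
```

If Wednesday's number differs from Tuesday's, the diff of these fields says whether the data changed, the transform changed, or the rule changed — no archaeology required.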
Migration Strategy
No enterprise gets to this architecture in one grand rewrite. Nor should it try.
The migration pattern that works is progressive strangler migration. Start where mistrust is most expensive. Build new trust-bearing paths around the old estate. Replace by proof, not by proclamation.
Step 1: Pick a high-value reconciliation seam
Choose a business seam where mismatches hurt: order-to-cash, shipment-to-invoice, quote-to-contract, claim-to-payment. Do not start with “enterprise customer 360.” That is usually too broad and too political.
Step 2: Establish authoritative facts
For the chosen seam, identify which system owns which fact. Be ruthless. Ambiguity here will poison everything downstream.
Step 3: Publish domain data products alongside legacy extracts
Do not shut off the old warehouse feeds on day one. Stand up domain-aligned products in parallel. Consumers need comparison periods.
Step 4: Introduce a reconciliation product
Build one reconciliation service that classifies differences and exposes exception states. Measure mismatch rates, latency windows, and causes.
Step 5: Move priority consumers to trusted semantic views
Migrate finance reporting, operational dashboards, or executive KPIs only after the reconciliation layer proves itself. Trust is won with boring consistency over time.
Step 6: Strangle legacy conformed models gradually
As trusted semantic views cover more ground, retire brittle central transformations and duplicated business rules.
This is more patient than most platform roadmaps. Good. Patience is cheaper than a failed transformation.
Migration reasoning that matters
The strangler approach works because reconciliation is inherently comparative. During migration, you want old and new views running side by side. That exposes hidden business rules embedded in legacy jobs, user workarounds, and monthly adjustments. A big-bang migration tends to miss these details until after go-live, at which point every defect becomes a political incident.
Also, progressive migration respects bounded contexts. You can modernize order-to-cash without first solving every customer identity problem in the enterprise. That sounds obvious, yet many programs collapse because they pursue universal data unification before delivering any trusted slice.
Enterprise Example
Consider a global manufacturer with e-commerce channels, distributor sales, and after-sales service. It has SAP for ERP, Salesforce for CRM, a custom order management platform, regional warehouse systems, and a payments microservice stack for direct online sales. It also has three revenue numbers in active use.
Sales reports booked orders from CRM opportunities and confirmed orders. Operations reports shipped and delivered line items. Finance reports recognized revenue after delivery terms, returns reserves, and accounting adjustments. Monthly close turns into a ritual of explanation.
The company launches a data platform modernization initiative. The first instinct is a central customer and order model in the lakehouse. That stalls. Every domain argues over definitions. Meanwhile, leadership still cannot explain why direct-to-consumer gross sales differ from cash receipts and recognized revenue.
The better move is to treat order-to-cash as a reconciliation topology.
- Orders product: authoritative from order management, line-level status and commercial terms.
- Payments product: authoritative from payments microservices via Kafka, including auth, capture, chargeback, refund.
- Shipment product: authoritative from warehouse systems, event-based with effective fulfillment timestamps.
- Finance ledger product: authoritative from SAP journal and revenue posting views.
- Customer product: contextual identity mappings, not a forced universal golden record.
The reconciliation layer implements three key services:
- Order-payment reconciliation for e-commerce channels.
- Shipment-invoice reconciliation for fulfillment and billing.
- Cash-to-revenue reconciliation for finance close support.
Within four months, the company has a trusted semantic layer exposing:
- booked sales
- captured cash
- shipped revenue basis
- recognized revenue
Each metric includes freshness and completeness metadata. Dashboards display pending windows and exception counts. Finance still owns accounting truth. The platform does not replace SAP. It explains the path from commerce activity to finance postings.
That is the point.
The project succeeds not because it unified all data, but because it reduced ambiguity where the business felt pain. Two years later, the same architecture pattern expands into returns, rebates, and service contracts. Trust compounds.
Operational Considerations
A reconciliation-oriented platform lives or dies by operations.
Data contracts
Domain producers need versioned contracts with explicit change policies. If a payments service changes event shape or semantics, downstream reconciliation logic must detect and adapt. Contract testing is not just for APIs. It belongs in event schemas and CDC-fed products too.
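A minimal sketch of what a contract test for an event payload looks like. Real setups would use a schema registry; the contract shape and field names here are assumptions, and the point is only that violations are detected mechanically, not discovered in production:

```python
# Minimal contract check for an event payload against a versioned contract.
CONTRACT_V2 = {
    "required": {"event_id": str, "order_id": str, "amount": float, "currency": str},
}

def violations(payload: dict, contract: dict) -> list:
    """Return contract violations: missing fields or wrong types."""
    out = []
    for name, typ in contract["required"].items():
        if name not in payload:
            out.append(f"missing field: {name}")
        elif not isinstance(payload[name], typ):
            out.append(f"wrong type for {name}: expected {typ.__name__}")
    return out

# A producer that silently dropped `currency` should fail the contract test:
bad = {"event_id": "e-1", "order_id": "O-1", "amount": 10.0}
```

Run this in the producer's CI against every published contract version, and schema drift becomes a failed build instead of a broken reconciliation run.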
Service level indicators
Measure more than pipeline uptime. Useful SLIs include:
- freshness lag by data product
- completeness horizon
- reconciliation match rate
- unresolved exception age
- duplicate event rate
- replay success rate
- semantic contract violations
These are trust indicators, not mere platform vanity metrics.
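A few of those indicators can be computed directly from reconciliation output. This sketch assumes each result row carries a `state` and an `opened_at`, and that freshness is tracked per product; all names are illustrative:

```python
from datetime import datetime, timedelta

def trust_slis(results: list, freshness: dict, now: datetime) -> dict:
    """Compute match rate, oldest unresolved exception, and freshness lag per product."""
    total = len(results)
    matched = sum(1 for r in results if r["state"] == "matched")
    open_exceptions = [r for r in results if r["state"] not in ("matched", "excluded")]
    oldest = max((now - r["opened_at"] for r in open_exceptions), default=timedelta(0))
    return {
        "match_rate": matched / total if total else 1.0,
        "unresolved_exception_age_hours": oldest.total_seconds() / 3600,
        "freshness_lag_minutes": {
            product: (now - ts).total_seconds() / 60 for product, ts in freshness.items()
        },
    }

now = datetime(2024, 3, 1, 12, 0)
slis = trust_slis(
    results=[
        {"state": "matched", "opened_at": now},
        {"state": "matched", "opened_at": now},
        {"state": "unmatched", "opened_at": now - timedelta(hours=5)},
    ],
    freshness={"orders": now - timedelta(minutes=10)},
    now=now,
)
```

An exception that has been open for five hours is a very different signal from a pipeline that is ten minutes behind; reporting them as one "health" metric is how trust indicators decay into vanity metrics.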
Backfills and replays
You will need them. Late events, corrected journal entries, retroactive master data changes, and rule updates are normal in enterprise systems. Design idempotent processing and partitioned replay strategies from the start.
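Idempotent processing usually reduces to keying every record by source identifier and version, so replays converge instead of double-counting. A minimal sketch, with hypothetical record shapes:

```python
# Idempotent apply: keyed by source_id with a monotonically increasing version,
# so backfills and replays converge to the same state however often they run.
def apply(store: dict, record: dict) -> dict:
    key = record["source_id"]
    current = store.get(key)
    # Only apply if this is a newer version of the same source record.
    if current is None or record["version"] > current["version"]:
        store[key] = record
    return store

store = {}
batch = [
    {"source_id": "JE-1", "version": 1, "amount": 100.0},
    {"source_id": "JE-1", "version": 2, "amount": 90.0},   # back-posted correction
    {"source_id": "JE-1", "version": 1, "amount": 100.0},  # duplicate delivery
]
for rec in batch:
    apply(store, rec)
# Replaying the entire batch leaves the store unchanged:
for rec in batch:
    apply(store, rec)
```

The back-posted correction wins, the duplicate is absorbed, and replaying is safe — which is exactly what a quarter-close correction flow needs from the platform.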
Security and governance
Trust also has a governance dimension. Reconciliation data often exposes sensitive financial and customer information across domains. Access control must reflect context. Finance-grade data needs stricter controls than general operational telemetry.
Metadata visibility
Expose confidence signals to consumers. A BI model should say when figures are provisional, complete through a cutoff, or impacted by unresolved exceptions. Hidden caveats destroy trust faster than visible imperfections.
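As a sketch of what "visible caveats" means in practice, a BI-facing metric can carry its confidence signals in the payload itself. The payload shape is an assumption:

```python
from datetime import datetime

def metric_payload(name: str, value: float, complete_through: datetime,
                   open_exceptions: int) -> dict:
    """Attach confidence signals to a BI-facing metric instead of hiding them."""
    return {
        "metric": name,
        "value": value,
        "complete_through": complete_through.isoformat(),
        "provisional": open_exceptions > 0,
        "open_exceptions": open_exceptions,
        "caveat": (f"{open_exceptions} unresolved reconciliation exceptions"
                   if open_exceptions else None),
    }

p = metric_payload("recognized_revenue", 1_204_500.0,
                   datetime(2024, 3, 1, 6, 0), open_exceptions=3)
```

A dashboard that renders the caveat alongside the number trades a little polish for a lot of credibility.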
Tradeoffs
This architecture is not free.
The first tradeoff is complexity. You are adding a reconciliation layer instead of pretending one giant semantic model will solve everything. That means more products, more contracts, more operations. In return, you get honesty.
The second tradeoff is duplication of concepts across bounded contexts. There may be multiple customer or order representations. Purists may object. Ignore them. Some duplication is the price of preserving meaning.
The third tradeoff is latency. Reconciled truth is often slower than raw event visibility. That is acceptable if clearly communicated. Users can work with fast provisional views and slower final views, provided the distinction is explicit.
The fourth tradeoff is organizational. This architecture forces ownership conversations. Domain teams must stand behind their facts. Central platform teams must stop making silent semantic decisions on behalf of the business. Some organizations find this uncomfortable because it reveals where governance is weak.
That discomfort is useful.
Failure Modes
There are predictable ways this can go wrong.
1. Canonical model relapse
The team says it is doing domain-oriented design, then quietly reintroduces a single enterprise model for everything. The result is semantic compromise and brittle mapping logic.
2. Reconciliation as reporting only
Mismatches are counted in dashboards but not operationalized. No exception workflow, no remediation path, no ownership. This creates visible distrust rather than solving it.
3. Kafka worship
The platform assumes event streams are automatically trustworthy. Duplicate, out-of-order, missing, and semantically ambiguous events then propagate confusion at speed.
4. No temporal model
Teams compare snapshots from different times and call the differences defects. Without event time, processing time, and business effective time, reconciliation becomes theater.
5. Ignoring correction flows
Enterprise systems post reversals, adjustments, and backdated changes. If your design only handles straight-line inserts, trust collapses during the first quarter close.
6. Central team owns business meaning alone
A platform team without domain engagement cannot define finance, commerce, and fulfillment semantics correctly. The result is technically elegant nonsense.
The control flow for reconciliation is plain because the reality is plain: trust comes from seeing disagreement, classifying it, and closing the loop.
When Not To Use
This approach is not universal.
Do not build a heavyweight reconciliation topology for a small internal analytics platform with a handful of stable sources and low business consequence for mismatch. A simpler warehouse with clear source lineage may be enough.
Do not lead with this pattern if the real problem is basic operational data discipline. If source systems lack stable identifiers, timestamps, ownership, or minimally sane process controls, reconciliation architecture will expose chaos but not fix it. You may need operational remediation first.
Do not over-engineer bounded contexts where the domain is trivial. Some datasets really are just reference data or append-only telemetry. Not every topic deserves a semantic cathedral.
And do not use reconciliation as an excuse to avoid source modernization forever. A platform can mediate inconsistency, but it should not become a permanent dumping ground for broken upstream behavior.
Related Patterns
Several related architectural patterns fit naturally around this one.
Data mesh, if used carefully. The good part of data mesh is domain ownership and data as a product. The bad implementation pattern is dumping governance onto domain teams without shared semantic and operational standards. Reconciliation topology provides the connective tissue data mesh often lacks.
Event sourcing. Useful in some microservices because it preserves a replayable history. But event sourcing does not remove the need for cross-domain reconciliation. If anything, it makes it more important.
CQRS. Helpful where operational reads and analytical projections differ. Reconciliation services often resemble specialized read models, but they should remain explicit about business meaning rather than masquerading as generic projections.
Ledger-centric architecture. Essential in finance-heavy domains. A ledger provides durable truth for monetary facts, but operational workflows still need reconciliation against orders, shipments, entitlements, or claims.
Master data management. Sometimes necessary, especially for customer or product identity. But MDM is not a substitute for reconciliation. Matching records is not the same as reconciling business facts.
Summary
A data platform is not trustworthy because it is centralized, modern, real-time, or built on fashionable tooling. It is trustworthy because it can explain itself.
That is why reconciliation topology matters.
Model the platform around bounded contexts. Preserve authoritative truths where they belong. Introduce explicit reconciliation services that compare and classify differences across domains. Make timing, completeness, and correction visible. Migrate progressively with a strangler strategy, proving trust in one business seam at a time.
If that sounds less glamorous than promising a single source of truth, good. Enterprise architecture is not a talent show. It is the craft of making large systems believable.
And in data platforms, believability is everything. When people trust the numbers, the platform becomes infrastructure. When they don’t, it becomes folklore.
Frequently Asked Questions
What is a data mesh?
A data mesh is a decentralized data architecture where domain teams own and serve their data as products. Instead of a central data team, each domain is responsible for data quality, contracts, and discoverability.
What is a data product in architecture terms?
A data product is a self-contained, discoverable, trustworthy dataset exposed by a domain team. It has defined ownership, SLAs, documentation, and versioning — treated like a software product rather than an ETL output.
How does data mesh relate to enterprise architecture?
Data mesh aligns data ownership with business domain boundaries — the same boundaries used in domain-driven design and ArchiMate capability maps. Enterprise architects play a key role in defining the federated governance model that prevents data mesh from becoming data chaos.