Your Feature Store Depends on Domain Semantics

A feature store can look wonderfully modern while quietly encoding the oldest failure in enterprise architecture: nobody knows who really owns the meaning of the data.

That is the seduction. Teams stand up Kafka topics, stream processors, online stores, offline stores, backfills, point-in-time joins, and a polished model training pipeline. The diagrams look impressive. The vendor demos sparkle. Data scientists can finally discover and reuse “customer features” and “merchant features” and “risk features” at scale.

Then the arguments begin.

What exactly is an “active customer”? Does “available balance” include pending transactions? Is “merchant tenure” measured from legal onboarding, first successful payment, or first settled payout? Which team is allowed to redefine “chargeback risk”? Why did the batch feature used for training not match the online feature used for inference? Why did the fraud model degrade after the payments domain changed event semantics but never changed the schema?

This is where architecture stops being plumbing and becomes language. A feature store is not merely a serving layer for machine learning. It is a semantic system. If it lacks domain semantics, it becomes a high-speed amplifier for ambiguity.

That is the central point: your feature store only works when feature ownership follows domain ownership, and domain ownership follows bounded contexts, not infrastructure boundaries.

A good feature platform is not built from Redis, S3, Kafka, and Spark alone. It is built from decisions about meaning, authority, reconciliation, and change. If you miss that, you get reuse without trust, scale without coherence, and models that fail in production for reasons that no observability dashboard can explain.

Let’s get concrete.

Context

Most enterprises arrive at feature stores through pain, not strategy.

The first pain is duplication. Different machine learning teams compute nearly identical features in notebooks, pipelines, and services. “Customer spend in 30 days” appears in eight places with six definitions. Offline training data is expensive to recreate. Online serving is inconsistent. Teams want one place to register, discover, compute, and serve reusable features.

The second pain is productionization. A model that looked good in training must now run in a real business process: fraud checks during authorization, cross-sell recommendations in a mobile app, credit scoring during loan origination, supply chain risk scoring during procurement. This forces the organization to solve low-latency access, historical correctness, lineage, and governance.

The third pain is organizational. As soon as a feature becomes valuable, multiple domains want it. Customer service wants the same customer-health indicators used by churn models. Finance wants merchant risk indicators created by fraud teams. Marketing wants lifecycle features created in product analytics. Features begin to act like shared enterprise assets.

That is where many companies make the wrong move. They centralize the computation and storage, but not the meaning. They create a data platform team that owns the feature store as if it were a neutral utility. But features are not neutral. A feature is a statement about the business. “Customer delinquency count over 90 days” is not just a number. It encodes policy decisions, event inclusion rules, time windows, and business exceptions.

In domain-driven design terms, a feature belongs to a bounded context before it belongs to a platform.

This matters more in event-driven microservices environments. Kafka makes data movement easy, sometimes dangerously easy. Topics flow everywhere. Teams subscribe to everything. Derived features multiply. Soon, the same raw event is interpreted differently by five consumers, each convinced they are being practical. This creates what looks like autonomy but behaves like semantic drift.

Feature stores do not remove this problem. They crystallize it.

Problem

The usual feature store architecture assumes the hard problem is technical consistency: low-latency retrieval, point-in-time correctness, offline/online parity, metadata management, and compute orchestration.

Those are real problems. They are not the first problems.

The first problem is semantic authority.

Who defines the source-of-truth feature? Who decides whether “approved application” means underwriting approval or final account activation? Which team can deprecate a feature? If two teams produce similar features, which one becomes canonical? If a domain event changes business meaning without changing structure, who catches it before models go sideways?

Without clear answers, the feature store becomes a warehouse of plausible numbers. It may be internally well-engineered and externally untrustworthy.

A second problem follows quickly: ownership inversion. Platform teams often end up owning business-derived features because they operate the pipelines. That is backwards. The team closest to the domain meaning should own the feature definition; the platform team should own the enabling machinery. When these are mixed, every feature change becomes either a ticket queue or a governance battle.

A third problem is temporal mismatch. Training wants historical correctness. Serving wants current state. Reconciliation wants to explain discrepancies across systems, windows, and late-arriving events. If the organization cannot explain how a feature evolves over time, it cannot safely retrain models, audit decisions, or debug incidents.

A fourth problem is migration. Enterprises rarely start fresh. They already have SQL transformations in warehouses, hand-built scoring services, batch aggregates in Hadoop or Spark, streaming enrichments in Flink, maybe a vendor feature platform, maybe three. Features live in many places. The right architecture must support progressive strangler migration rather than a single grand rewrite.

And there is a fifth problem, the one executives discover only after spending heavily: feature reuse across domains is not free. It introduces coupling. Reuse is only healthy when consumers understand whether they are using a feature as a domain fact, a derived interpretation, or a model-specific convenience.

That distinction is often the difference between architecture and chaos.

Forces

Several forces pull this architecture in different directions.

Data scientists want speed. They need discoverability, reproducibility, and easy access to historical features. They do not want to negotiate with six service teams every time they need a new variable.

Domain teams want autonomy. They own the business process, the source systems, and the language of the domain. They do not want a central platform team redefining terms they are accountable for.

Platform teams want standardization. They are measured on reliability, cost, governance, and operational consistency. They know that bespoke pipelines in every domain lead to fragility.

Operational systems want low latency and resilience. Fraud scoring at payment authorization cannot wait for a warehouse query. Recommendation systems need fresh signals. Credit decisions require deterministic explanations.

Governance wants lineage and auditability. Especially in regulated environments, the enterprise must prove where a feature came from, what logic generated it, and what changed over time.

History fights back. Late events arrive. Source systems backfill data. Event contracts evolve. Reference data changes retroactively. Reconciliation is not a nice-to-have; it is the price of honest systems.

And then there is Conway’s Law, always sitting in the corner of the room, smirking. The feature store will reflect the organization that builds it. If business meaning is fragmented across teams with unclear boundaries, the feature store will mirror that confusion at machine speed.

Solution

The solution is simple to state and annoyingly hard to execute:

Treat features as domain products with explicit semantic ownership, implemented on a shared feature platform.

That means three things.

First, features are classified by semantic level.

  • Domain facts: canonical business facts derived directly from a bounded context. Example: merchant_first_settlement_date, account_current_balance, loan_days_past_due.
  • Domain interpretations: features that apply context-specific business logic. Example: merchant_operational_risk_band, customer_financial_stress_indicator.
  • Model-serving features: convenience transformations tailored for a model or use case. Example: log-scaled counts, bucketized ratios, rolling windows designed for a fraud model.

These levels should not be mixed casually. Domain facts can often be reused broadly. Domain interpretations are reusable inside adjacent contexts with care. Model-serving features are usually the least stable and should rarely be treated as enterprise-wide assets.
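
The three levels can be made explicit in registry metadata rather than left as tribal knowledge. A minimal sketch, assuming a hypothetical `Feature` record and `SemanticLevel` enum (not any particular product's API):

```python
from dataclasses import dataclass
from enum import Enum

class SemanticLevel(Enum):
    DOMAIN_FACT = "domain_fact"                       # canonical, broadly reusable
    DOMAIN_INTERPRETATION = "domain_interpretation"   # context-specific business logic
    MODEL_SERVING = "model_serving"                   # model-specific convenience

@dataclass(frozen=True)
class Feature:
    name: str
    owner_context: str      # bounded context that owns the meaning
    level: SemanticLevel

    def reusable_outside_context(self) -> bool:
        # Domain facts travel well; model-serving features should not.
        return self.level is SemanticLevel.DOMAIN_FACT

features = [
    Feature("merchant_first_settlement_date", "merchant", SemanticLevel.DOMAIN_FACT),
    Feature("merchant_operational_risk_band", "fraud", SemanticLevel.DOMAIN_INTERPRETATION),
    Feature("txn_count_log_scaled_7d", "fraud-model", SemanticLevel.MODEL_SERVING),
]

for f in features:
    print(f.name, f.reusable_outside_context())
```

Encoding the level as data lets the platform enforce the reuse rules below mechanically instead of by review-time folklore.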

Second, ownership follows bounded contexts.

The team that owns the domain should own the semantics, contracts, quality rules, and lifecycle of the feature definitions closest to that domain. The platform team owns registration, storage abstractions, lineage tooling, access control, materialization frameworks, and serving infrastructure.

A platform team should make feature publication easy. It should not become the author of business truth.

Third, reconciliation is designed in, not added later.

Every nontrivial feature architecture must account for divergence between online and offline stores, stream and batch computation, source systems and derived views, current state and historical truth. Reconciliation pipelines, drift reports, and exception handling need to be first-class architectural elements.

That is the heart of the architecture: semantic ownership at the edge, technical standardization in the middle, reconciliation everywhere.

Architecture

A feature architecture that respects domain semantics usually has four layers.

  1. Domain event and state sources
  2. Feature definition and publication owned by domains
  3. Shared feature platform for materialization and serving
  4. Consumption by training, batch scoring, and online inference

The key is that feature definitions are attached to domain contracts, not hidden inside generic transformations.

[Diagram: four-layer feature architecture — domain sources, domain-owned definitions, shared platform, consumption]

This diagram looks ordinary. The real architectural move is hidden in the labels.

Each domain publishes feature definitions as versioned artifacts with metadata: owner, semantic type, source contracts, freshness expectations, quality rules, point-in-time logic, and deprecation policy. The feature platform materializes these definitions into offline and online representations. It does not invent them.
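
The versioned artifact described above can be sketched as a plain record; all field names here are illustrative, not a standard schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureDefinition:
    """A versioned, domain-published feature definition artifact (sketch)."""
    name: str
    version: int
    owner: str                    # owning bounded context / team
    semantic_type: str            # "fact" | "interpretation" | "model_serving"
    source_contracts: tuple       # event/table contracts this feature reads
    freshness_sla_seconds: int
    quality_rules: tuple = ()
    point_in_time_logic: str = "event_time"
    deprecated: bool = False

defn = FeatureDefinition(
    name="merchant_first_settlement_at",
    version=1,
    owner="merchant-domain",
    semantic_type="fact",
    source_contracts=("merchant.settlement.v2",),
    freshness_sla_seconds=86_400,
    quality_rules=("not_null", "monotonic_per_merchant"),
)
print(defn.name, defn.version)
```

The point is that the platform consumes this artifact and materializes it; it never authors the fields.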

A practical pattern is to let domains expose features in one of two ways:

  • Push model: the domain team computes and publishes features into the platform.
  • Declarative model: the domain team declares feature logic in a governed definition framework, and the platform materializes it.

The declarative model is usually better at scale. It creates consistency in lineage, testing, and serving behavior. But there is a tradeoff: too much abstraction and domain teams start bypassing the platform because it cannot express their logic. That is how shadow pipelines are born.
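
The two publication modes can be contrasted with a toy platform interface. This is a hypothetical sketch, not a real product API:

```python
from typing import Callable, Dict

class FeaturePlatform:
    def __init__(self) -> None:
        self.online: Dict[tuple, float] = {}
        self.declared: Dict[str, Callable[[dict], float]] = {}

    # Push model: the domain computes the value itself and hands it over.
    def push(self, feature: str, entity: str, value: float) -> None:
        self.online[(feature, entity)] = value

    # Declarative model: the domain declares logic; the platform materializes it.
    def declare(self, feature: str, logic: Callable[[dict], float]) -> None:
        self.declared[feature] = logic

    def materialize(self, feature: str, entity: str, source_row: dict) -> None:
        self.online[(feature, entity)] = self.declared[feature](source_row)

platform = FeaturePlatform()
platform.push("merchant_tenure_days", "m-42", 180.0)
platform.declare("spend_30d", lambda row: sum(row["amounts"]))
platform.materialize("spend_30d", "c-7", {"amounts": [10.0, 25.0]})
print(platform.online[("spend_30d", "c-7")])  # 35.0
```

In the declarative path the platform sees the logic, so lineage and testing come for free; in the push path it sees only values, which is exactly how shadow semantics slip in.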

Ownership model

Here is the ownership model that tends to work in large enterprises.

[Diagram: ownership model — domain teams own semantics and definitions, the platform team owns machinery]

This is opinionated because ambiguity here causes years of damage. If the fraud team authors a canonical merchant risk feature based on merchant lifecycle events, then the merchant domain must be either explicitly delegating that semantic authority or formally consuming and extending upstream merchant facts. Otherwise the fraud team becomes an accidental owner of merchant truth.

That seems harmless at first. It rarely is.

Kafka and microservices

Kafka is useful here, but only when used with restraint. Domain events should carry business facts, not pre-chewed analytics interpretations for every downstream use. The feature platform can consume events, state snapshots, or CDC streams to compute features. But event streams are not a substitute for bounded contexts. Shipping more events does not create more clarity.

In microservices estates, a healthy pattern is:

  • Domain services emit business events and expose authoritative state where necessary.
  • Feature definitions reference domain events and state using approved contracts.
  • Stream processing computes freshness-sensitive features.
  • Batch recomputation handles historical rebuilds and late-arriving corrections.
  • Reconciliation compares the two.

The temptation is to compute everything in streaming and declare victory. Don’t. Backfills, corrections, and historical replay are the unpaid debt of event-driven systems. A feature architecture must support both incremental computation and full historical reconstruction.
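
The incremental/rebuild split can be shown in a few lines. A minimal sketch with hypothetical helpers `apply_event` and `rebuild`; the stream state deliberately misses a late event that only the full rebuild absorbs:

```python
from collections import defaultdict

# Incremental path: update a running aggregate as events arrive (stream-style).
def apply_event(state: dict, event: dict) -> None:
    state[event["merchant"]] += event["amount"]

# Rebuild path: recompute from the full log, including late-arriving corrections.
def rebuild(events: list) -> dict:
    state = defaultdict(float)
    for e in sorted(events, key=lambda e: e["event_time"]):
        apply_event(state, e)
    return state

live = defaultdict(float)
log = []
for e in [{"merchant": "m1", "amount": 100.0, "event_time": 1},
          {"merchant": "m1", "amount": 50.0, "event_time": 3}]:
    apply_event(live, e)
    log.append(e)

# A late correction (event_time=2) arrives after the stream window closed,
# so it lands in the log but never reaches the live state.
log.append({"merchant": "m1", "amount": -25.0, "event_time": 2})

print(live["m1"])          # 150.0 — stream state is now stale
print(rebuild(log)["m1"])  # 125.0 — the rebuild absorbs the correction
```

The gap between the two numbers is exactly what the reconciliation layer below exists to detect and explain.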

Reconciliation architecture

This is the part many teams postpone. They pay for that later.

[Diagram: reconciliation architecture — online/offline comparison, drift reports, exception routing]

Reconciliation must answer practical questions:

  • Does the online feature match the offline feature for the same entity and timestamp?
  • If not, is the difference due to lateness, correction, event duplication, windowing logic, or code drift?
  • Which discrepancy is acceptable, and which should page a team?
  • How are corrections propagated to training data, to audit records, and to downstream consumers?

Enterprises that skip this create a false sense of reliability. Their model performance declines gradually, and everyone argues about whether the model drifted or the data changed. Often both happened. Without reconciliation, you do not know.
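
A reconciliation job is, at its core, a keyed comparison with a discrepancy taxonomy. A deliberately simplified sketch (real pipelines would classify by lateness, duplication, and code version rather than a single label):

```python
def reconcile(online: dict, offline: dict, tolerance: float = 1e-6) -> list:
    """Compare online vs offline values per (entity, feature) key."""
    report = []
    for key in online.keys() | offline.keys():
        on, off = online.get(key), offline.get(key)
        if on is None or off is None:
            report.append((key, "missing_side"))
        elif abs(on - off) > tolerance:
            report.append((key, "value_mismatch"))
    return report

online = {("m1", "spend_30d"): 150.0, ("m2", "spend_30d"): 80.0}
offline = {("m1", "spend_30d"): 125.0, ("m2", "spend_30d"): 80.0,
           ("m3", "spend_30d"): 10.0}
print(reconcile(online, offline))
```

The interesting engineering is not the diff itself but the routing: which discrepancy classes are logged, which open an exception, and which page the owning domain team.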

Migration Strategy

No large enterprise installs this architecture on a clean field. You inherit SQL jobs, warehouse marts, stream processors, vendor tools, and “temporary” code that has survived three reorganizations.

So migration must be progressive. This is a strangler pattern, not a moon landing.

Start by classifying existing features.

  1. Canonical domain facts
  2. Shared derived features
  3. Model-specific transforms
  4. Obsolete or duplicate features

This inventory is not glamorous work. It is necessary. Most organizations discover that their biggest feature store problem is not missing infrastructure but duplicate semantics.

Then migrate in slices by business value and semantic clarity, not by technical neatness.

A sensible sequence is:

Phase 1: Registry without disruption

Create a central feature catalog and metadata model before forcing computation changes. Register existing features, owners, consumers, freshness requirements, and known duplicates. This creates visibility. It also reveals where no owner exists, which is a warning sign.

Phase 2: Establish domain ownership

For high-value features, assign semantic owners in bounded contexts. Do not start with every feature. Start with the ones used by critical models and shared across teams.

Phase 3: Dual materialization

Materialize selected features through the new platform while preserving legacy pipelines. Run both in parallel. Compare outputs, freshness, and cost. This is where reconciliation earns its keep.

Phase 4: Shift consumers

Move training pipelines and online inference services to consume platform-served features. Preserve compatibility contracts where possible. Avoid forcing every consumer to rewrite at once.

Phase 5: Strangle legacy pipelines

Retire legacy feature computations only after parity thresholds, governance controls, and operational runbooks are proven.

This progression matters because feature migration has a hidden blast radius. A feature is often consumed by models no one documented properly. Deleting or changing it can break production in subtle ways.

A practical migration rule: migrate semantics before storage. If you centralize the store but leave ownership unresolved, you have simply moved confusion into a more expensive system.

Another rule: version feature definitions explicitly. When a business rule changes, do not silently mutate history. Introduce a new feature version, preserve lineage, and define the transition plan. Models need stable training semantics.
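
The append-only versioning rule can be enforced structurally. A minimal registry sketch (hypothetical class, using the merchant-tenure example from later in this article):

```python
class FeatureRegistry:
    """Versioned definitions: a rule change creates a new version, never mutates one."""
    def __init__(self) -> None:
        self._versions: dict = {}   # name -> list of (version, logic_description)

    def publish(self, name: str, logic: str) -> int:
        versions = self._versions.setdefault(name, [])
        versions.append((len(versions) + 1, logic))
        return versions[-1][0]

    def latest(self, name: str) -> tuple:
        return self._versions[name][-1]

    def history(self, name: str) -> list:
        # Full history survives, so training sets remain reproducible.
        return list(self._versions[name])

reg = FeatureRegistry()
reg.publish("merchant_tenure_days", "days since first authorization")
v2 = reg.publish("merchant_tenure_days", "days since onboarding approval")
print(v2, reg.history("merchant_tenure_days"))
```

Because `publish` can only append, a consumer pinned to version 1 keeps its training semantics while new models adopt version 2 on their own schedule.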

Enterprise Example

Consider a global payments company operating merchant acquiring, card issuing, fraud detection, and lending services. It has dozens of microservices, Kafka as the event backbone, a cloud data lake, and separate data science teams for fraud, growth, and underwriting.

The company created a feature store to serve fraud and credit models. It began well enough. Teams loaded transaction aggregates, merchant attributes, cardholder behavior, device fingerprints, dispute counts, and account balances into the platform. Reuse improved quickly.

Then trouble appeared.

The underwriting team adopted a feature called merchant_tenure_days. Fraud had authored it. Their definition started the clock at first observed payment authorization. Underwriting assumed tenure began at merchant onboarding approval. Those dates diverged significantly for merchants that onboarded long before processing their first payment.

This was not a technical bug. The pipeline ran perfectly.

The business bug was deeper: two bounded contexts had different meanings for the same phrase. Fraud cared about observed processing history. Underwriting cared about contractual relationship age. Both were reasonable. Neither was universal.

The platform had accidentally promoted a context-specific interpretation into an enterprise feature.

The fix was architectural. The merchant domain introduced canonical facts:

  • merchant_onboarding_approved_at
  • merchant_first_live_processing_at
  • merchant_first_settlement_at

Fraud then defined its own interpretation:

  • merchant_processing_tenure_days

Underwriting defined another:

  • merchant_relationship_tenure_days

Both were registered in the feature platform with clear ownership, lineage, and consumer guidance. Existing models were migrated through versioned updates. Reconciliation jobs validated historical parity for backfilled training sets.

The same company later faced a batch/stream mismatch in account_available_balance. The online feature excluded transactions still pending final posting because the issuing service computed available balance operationally for authorization decisions. The offline training pipeline used posted ledger state from the warehouse and effectively ignored some pending holds. Fraud models trained on one behavior and scored on another.

Again, the technology was not the root cause. The semantic boundary was.

They solved it by separating:

  • account_operational_available_balance
  • account_ledger_posted_balance
  • account_pending_hold_amount

Then they encoded approved combinations for specific use cases. Fraud online scoring consumed the operational form. Financial reporting and some training pipelines used the ledger-aligned form. Historical replay logic included late-arriving hold-release corrections. Reconciliation reports quantified divergence between online and offline values by account type and event lag.
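
Encoding "approved combinations for specific use cases" can be as simple as an allowlist checked at feature-retrieval time. A sketch using the balance feature names from this example; the `APPROVED` mapping and `validate_request` helper are illustrative:

```python
# Approved feature combinations per use case (illustrative policy data).
APPROVED = {
    "fraud_online_scoring": {"account_operational_available_balance",
                             "account_pending_hold_amount"},
    "financial_reporting":  {"account_ledger_posted_balance"},
}

def validate_request(use_case: str, requested: set) -> set:
    """Return the requested features this use case is NOT approved to consume."""
    allowed = APPROVED.get(use_case, set())
    return requested - allowed   # non-empty means a policy violation

violations = validate_request("financial_reporting",
                              {"account_operational_available_balance"})
print(violations)
```

The check is trivial; the value is that the semantic boundary (operational vs ledger meaning) now lives in enforceable policy instead of tribal memory.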

That is what enterprise architecture looks like in practice. It is not “implement a feature store.” It is “make meaning explicit so the store deserves trust.”

Operational Considerations

A feature store that respects domain semantics still lives or dies operationally.

Metadata must be first-class. Every feature should include owner, domain, semantic classification, source contracts, computation logic, freshness SLA, retention, PII classification, and deprecation status. A feature without metadata is a rumor.

Lineage must be executable, not decorative. It should be possible to answer: which events, tables, and code versions produced this feature value? Which models consumed it? Which change caused the shift last Tuesday?

Freshness and staleness policies matter. Not every feature needs sub-second updates. Over-engineering freshness is expensive. A useful architecture distinguishes between operationally critical real-time features and slower-moving features suitable for batch materialization.

Quality assertions belong with features. Null rates, distribution bounds, key uniqueness, temporal constraints, and cross-source consistency checks should be attached to the feature definition. Domain teams define what “wrong” looks like; the platform enforces and reports it.

Access control should follow domain sensitivity. A feature store often becomes a back door to sensitive customer attributes. Fine-grained authorization, purpose-based access, and masking rules are necessary, especially for regulated industries.

Cost can spiral. Online serving, repeated backfills, and duplicate storage in batch and online systems are not free. Reuse should reduce total cost, but many organizations discover the opposite because they materialize everything at every cadence. Be selective.

Observability must include semantic drift. Infrastructure telemetry alone is not enough. You also need monitors for definition changes, distribution changes, parity gaps, and upstream contract shifts.

Tradeoffs

There is no free lunch here, only better debts.

The big tradeoff is local autonomy versus enterprise consistency. If domains own semantics, they may define similar concepts differently. That is healthy up to a point. Over-standardize, and you erase real business distinctions. Under-standardize, and consumers drown in near-duplicates.

Another tradeoff is declarative governance versus expressive flexibility. A strict feature definition framework improves consistency, discoverability, and platform automation. But if it cannot model complex temporal logic or domain nuance, teams will build side pipelines. The architecture then fragments.

There is also reuse versus coupling. Reusing domain facts is usually good. Reusing higher-level interpretations can accidentally couple unrelated consumers to one team’s business logic and release cadence.

And then streaming versus batch. Streaming gives freshness. Batch gives rebuildability and historical control. Serious feature architectures need both, which means accepting duplication in implementation paths and investing in reconciliation.

Finally, central platform efficiency versus domain capability uplift. If domain teams lack data engineering maturity, giving them semantic ownership may initially slow progress. But taking ownership away creates a bigger long-term problem: a central bottleneck with weak business understanding.

Failure Modes

The common failure modes are surprisingly predictable.

The platform team becomes semantic owner by accident. They run the pipelines, so everyone assumes they own the features. Definitions drift away from the business.

The feature catalog becomes a junk drawer. Everything gets registered, little gets curated, duplicate features proliferate, and trust collapses.

Offline/online parity is assumed, not verified. Models train on one reality and score on another.

Kafka topics become de facto domain models. Consumers infer business meaning from event streams without domain stewardship. This is especially dangerous when events change behavior but keep the same schema.

Versioning is ignored. Feature logic is changed in place, historical meaning mutates, and reproducibility disappears.

Migration stops halfway. The enterprise ends up with a shiny feature store plus all the old pipelines still running because no one had the authority to retire them.

Model-specific features are promoted to enterprise canon. Convenient at first, toxic later.

These failures are not exotic edge cases. They are the default outcomes when semantics are treated as documentation instead of architecture.

When Not To Use

Not every organization needs a sophisticated feature store architecture.

If you have a small number of models, a single analytical domain, low online inference needs, and a tightly coupled team, a warehouse-centric approach with disciplined transformation practices may be enough. The overhead of feature registries, online stores, domain governance, and reconciliation may outweigh the value.

Likewise, if your organization has not yet established bounded contexts or domain ownership in its operational systems, introducing a feature store as a central data product can be premature. You will simply centralize existing ambiguity.

And if your use cases are mostly experimentation with low production criticality, keep things simpler. A lightweight metadata catalog and reproducible data pipelines may serve better than a full enterprise feature platform.

A feature store is not a maturity shortcut. It magnifies whatever semantic discipline you already have—or lack.

Related Patterns

Several patterns sit naturally around this architecture.

Bounded Contexts from domain-driven design are the foundation. Features must be interpreted inside domain language before being reused outside it.

Data Products from data mesh thinking are relevant, but with a caution: a feature is not just any data product. It has temporal and serving semantics that need tighter contracts.

Strangler Fig Migration is the right migration pattern. Wrap legacy feature pipelines, introduce the new platform incrementally, then retire old paths safely.

CQRS can help separate operational state concerns from analytical and model-serving read models, especially when online serving requires denormalized access patterns.

Event Sourcing can support high-fidelity historical reconstruction, but it is not required. And in many enterprises it is overkill compared to disciplined event logs plus snapshot state.

Anti-Corruption Layers are useful when consuming features across domains. They help translate one bounded context’s meaning into another without pretending the two are identical.

Summary

A feature store is not primarily a storage problem. It is a semantics problem dressed in infrastructure clothing.

If you remember only one thing, make it this: the right owner of a feature is the team that owns its business meaning, not the team that runs the pipeline.

Everything else follows from that.

Build a shared platform, yes. Standardize metadata, lineage, serving, governance, and materialization. Use Kafka where event streams add freshness and decoupling. Support microservices, online inference, historical training, and progressive strangler migration. Invest in reconciliation because reality arrives late and often in conflict.

But do not confuse movement with meaning.

A fast feature store with weak domain semantics is a rumor mill for machines. A slower, well-owned feature platform becomes something far more valuable: a trustworthy operational language for intelligent systems.

That is the difference between architecture that scales and architecture that merely grows.
