AI Feature Pipelines Are Domain Models

Most feature pipeline discussions start in the wrong place.

They start with tools. A feature store. A streaming bus. A DAG engine. Maybe a lakehouse, maybe Kafka, maybe some heroic SQL that nobody wants to admit is business logic. The result is familiar: an impressive machine that can move data very quickly while steadily erasing the meaning of the data it moves.

That is the central mistake.

An AI feature pipeline is not merely plumbing. It is not an ETL flow with a nicer name. It is not “data prep for machine learning.” In an enterprise, a feature pipeline is a domain model under operational stress. It encodes definitions like active customer, delinquent account, household risk exposure, intent to churn, and available inventory. Those are not technical constructs. They are business semantics with money attached.

Once you see feature pipelines this way, architecture decisions become less decorative and more serious. Ownership matters. Bounded contexts matter. Reconciliation matters. The topology of your pipelines matters because topology determines who gets to define reality.

And in large organizations, reality is contested.

Sales has one notion of customer. Billing has another. Risk has a third, usually in a spreadsheet blessed by auditors. Data science may create a fourth because the model needs a stable training signal. If these definitions are allowed to float freely through feature engineering pipelines, the enterprise gets what it deserves: inconsistent predictions, endless “why did the model do that?” meetings, and a brittle stack where every schema change feels like a small fire.

So this article takes a harder line. AI feature pipelines should be designed as domain-owned architectures. They need explicit semantic boundaries, published contracts, and migration paths that respect how legacy systems actually work. The point is not ideological purity. The point is operational sanity.

If that sounds like domain-driven design invading the data platform, good. It should.

Context

Enterprises are trying to industrialize AI while standing on top of a landscape built for reporting, not for semantics. The typical estate looks something like this:

  • core systems of record in ERPs, CRMs, policy admin systems, trading platforms, or custom line-of-business applications
  • operational microservices, often event-driven, usually incomplete in their domain coverage
  • Kafka or another streaming platform carrying business events of uneven quality
  • a data lake or lakehouse collecting everything “for future analytics”
  • teams building online and offline features separately because latency and tooling differ
  • a feature store introduced to reduce duplication, but often becoming yet another semantic battleground

The technical stack is modern enough to look impressive in architecture review decks. But the organization beneath it is still fragmented. Data teams centralize ingestion. Platform teams centralize tooling. Domain teams own source systems but not always the downstream interpretation. Data scientists operate under pressure to ship models and are rewarded for predictive lift, not semantic hygiene.

That arrangement works for dashboards. It fails for AI.

Why? Because feature pipelines aren’t passive. They derive facts. They aggregate over time. They collapse multiple bounded contexts into one training row. They carry hidden assumptions about state, time, causality, and truth. As soon as a pipeline computes “customer lifetime value in the last 12 months” or “fraud velocity score over rolling seven days,” it is creating a business concept. If nobody owns that concept as part of a domain, the pipeline will.
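To make that concrete, here is a minimal sketch of a rolling seven-day count in Python. The event fields and boundary rules are illustrative assumptions, but every line encodes a semantic decision: which timestamp counts as event time, and whether the window is open or closed at each end.

```python
from datetime import datetime, timedelta

def rolling_count(events, key, as_of, window_days=7):
    """Count events for `key` whose event time falls in the window
    ending at `as_of` (exclusive of the window start, inclusive of as_of)."""
    start = as_of - timedelta(days=window_days)
    return sum(1 for e in events
               if e["key"] == key and start < e["ts"] <= as_of)

events = [
    {"key": "acct-1", "ts": datetime(2024, 3, 1)},
    {"key": "acct-1", "ts": datetime(2024, 3, 5)},
    {"key": "acct-1", "ts": datetime(2024, 3, 9)},
]
# As of March 10, only the March 5 and March 9 events fall in the window.
print(rolling_count(events, "acct-1", datetime(2024, 3, 10)))  # 2
```

Change the window boundaries or the timestamp field, and the "same" feature means something else. That is a business concept hiding in code.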

And pipelines are terrible governors.

The proper question, then, is not “How do we compute features efficiently?” The proper question is “Which domain owns the meaning of this feature, and what topology lets us preserve that meaning from source to model to production inference?”

That is architecture.

Problem

Feature engineering in many enterprises drifts into a semantic free-for-all. The symptoms are easy to spot.

One model uses customer keyed by CRM contact ID. Another uses billing account ID. A third uses a household surrogate assembled in the data lake. Teams say they are using the same feature because the column name matches, but they are not talking about the same thing.

Offline features are computed with full historical snapshots; online features are computed from Kafka streams with different late-arrival rules. Training and serving skew follows. The model performs beautifully in validation and oddly in production. Everyone blames “data quality” as if that were a force of nature rather than a design decision.

A central data team builds a canonical feature layer to help. This often makes things worse. The layer starts as a convenience and ends as a shadow domain model owned by people far from the business decisions. Definitions become generic to satisfy everyone and precise enough for no one. The enterprise gets reusable mush.

There is also a topology problem. Pipelines are frequently organized around technology layers rather than domain semantics:

  • source-aligned ingestion
  • standardized bronze/silver/gold transformations
  • enterprise feature layer
  • model-specific feature subsets
  • serving adapters

That sequence is clean on a whiteboard. In practice it allows semantics to be repeatedly reinterpreted at every hop. By the time a feature reaches a model, it may be technically lineage-traceable but conceptually untrustworthy.

In AI systems, trust is not a dashboard metric. Trust is whether a decision can be defended. If a bank declines a credit application, or an insurer flags a claim, or a retailer suppresses a customer offer, the organization needs to explain not just the model but the business meaning of the inputs. That explanation cannot be reconstructed from anonymous transformations and hopeful naming conventions.

The problem, in short, is this: enterprises have treated feature pipelines as data processing assets when they are really domain assets.

Forces

Good architecture emerges by respecting forces, not by pretending they are absent.

1. Domain semantics are local, but AI consumption is cross-domain

A churn model may need subscription data, billing events, support interactions, and marketing engagement. These belong to different bounded contexts. The model wants a single row. The enterprise has many truths.

This tension never goes away. We need a way to preserve local ownership while assembling cross-domain features responsibly.

2. Online and offline paths diverge unless forced together

Historical training pipelines prefer completeness and replayability. Serving pipelines prefer low latency and resilience. If built independently, they drift. If forced into one path without thought, they become unusable for one side or the other.

The answer is not “one pipeline to rule them all.” The answer is shared semantic contracts with explicit computation strategies.

3. Legacy systems are immovable objects

No enterprise starts with a clean event model. The source of truth may be a mainframe, a batch export, an ERP table, or a hand-built application with no proper business events. Migration must deal with this world, not the one architects wish they had inherited.

4. Ownership and incentives are misaligned

Platform teams optimize standardization. Domain teams optimize business outcomes. Data science teams optimize model performance. Risk and compliance optimize auditability. These are not bad goals. But left unmanaged, they produce architectures where every team creates a local workaround and no one owns end-to-end semantics.

5. Reconciliation is unavoidable

When multiple systems contribute to a feature, there will be disagreement: duplicate identifiers, missing events, out-of-order updates, conflicting statuses. Enterprises that pretend reconciliation can be hidden inside “data quality rules” eventually discover that reconciliation is a business policy.

6. Scale punishes ambiguity

At low scale, ad hoc joins and bespoke features are survivable. At enterprise scale, ambiguity multiplies. Hundreds of models, thousands of features, dozens of source systems, and years of schema evolution expose every shortcut.

This is where topology matters. A messy topology amplifies semantic drift. A domain-aware topology contains it.

Solution

The solution is to treat feature pipelines as domain products built within bounded contexts, then composed through published semantic contracts.

That sentence sounds tidy. The implementation is not. But it is worth the effort.

Start with a simple principle: a feature should be owned by the domain that can explain its meaning, not by the team that happens to compute it.

If the feature is days past due, the lending or billing domain should own it. If it is inventory available to promise, the supply chain domain should own it. If it is customer sentiment score from support interactions, the customer service domain should own the semantics, even if a centralized ML platform supplies the runtime.

This does not mean every domain team builds a full data engineering stack. It means semantic authority sits with the domain, and the platform provides common mechanisms: event transport, storage, cataloging, computation frameworks, lineage, and serving infrastructure.

This is classic domain-driven design adapted for AI:

  • Bounded contexts define where terms are meaningful.
  • Ubiquitous language prevents accidental synonym abuse.
  • Context mapping clarifies translation between domains.
  • Aggregates and events shape how state changes are represented.
  • Anti-corruption layers protect domain semantics from legacy ugliness.

A feature pipeline should therefore be modeled in layers of meaning, not just layers of processing:

  1. Source-aligned facts: raw business events or source extracts, minimally interpreted.

  2. Domain facts: semantically curated facts inside a bounded context. This is where identity resolution, event normalization, and business rule application happen under domain ownership.

  3. Domain features: reusable features published by a domain with explicit definitions, time semantics, quality guarantees, and serving characteristics.

  4. Composite decision features: cross-domain compositions for a specific decision or model family. These are not “enterprise canonical” by default. They exist because a decision needs them.

That last point matters. Enterprises often overreach and try to make all features globally canonical. Most should not be. A feature useful for fraud scoring may be inappropriate for marketing propensity, even if both mention “customer activity.” Canonicalize at the domain level where semantics are stable. Compose at the decision level where purpose is specific.

Here is the topology in a picture.

Diagram 1: AI Feature Pipelines Are Domain Models

This topology does a few important things.

It localizes semantic ownership. It allows domains to evolve independently. It makes cross-domain composition explicit instead of accidental. And it forces conversations about translation and reconciliation into named architectural elements instead of burying them in SQL.

A feature store can fit into this architecture, but it should not define it. The store is a distribution mechanism and registry, not the source of semantic truth. If your feature store becomes the place where business concepts are invented, you have built a very expensive ambiguity engine.

Architecture

Let’s get concrete.

A robust ownership topology for AI feature pipelines typically has five architectural zones.

1. Operational event and data capture

Where possible, capture business events at the point where the domain changes state. Kafka is useful here because it preserves event flow and enables near-real-time propagation. But not all systems can publish meaningful events. Some expose only tables or nightly files. Fine. Capture what exists.

The mistake is to promote every change data capture event into a business event. A row update saying status = 3 is not a domain event. It is a storage mutation. You need an anti-corruption step before it becomes domain-relevant.
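A sketch of that anti-corruption step, assuming a hypothetical CDC payload shape and a hypothetical status-code mapping owned by the claims domain:

```python
# Hypothetical mapping from a source system's status codes to named
# domain events. Ownership of this table belongs to the domain team.
STATUS_TO_EVENT = {3: "ClaimReopened", 5: "ClaimClosed"}

def to_domain_event(cdc_row):
    """Translate a raw CDC mutation into a domain event,
    or None when the mutation has no business meaning."""
    event_name = STATUS_TO_EVENT.get(cdc_row["after"]["status"])
    if event_name is None:
        return None  # storage noise, not a business fact
    return {
        "event": event_name,
        "claim_id": cdc_row["after"]["claim_id"],
        "occurred_at": cdc_row["source_ts"],  # event time, not ingest time
    }

print(to_domain_event({"after": {"status": 3, "claim_id": "C-17"},
                       "source_ts": "2024-03-01T09:00:00Z"}))
```

The translation table is small, but it is the boundary where `status = 3` stops being a storage mutation and becomes a named business fact.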

2. Domain semantic processing

This is the heart of the design.

Within each bounded context, translate raw events or extracts into domain facts. Resolve identities as the domain understands them. Apply effective dating. Handle late arrivals. Define state transitions. Publish domain-level contracts.

If you skip this step, your “features” are just aggregates over operational residue.

3. Feature product publication

Domains publish reusable features as products:

  • definition and business meaning
  • owner and stewardship contacts
  • keys and identity assumptions
  • freshness expectations
  • historical reproducibility rules
  • null handling and default semantics
  • online/offline availability
  • quality SLOs and known caveats

In other words, a feature is a contract, not a column.
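What such a contract might look like as a record, sketched in Python. The field names are assumptions for illustration, not a specific feature-store schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureContract:
    """Illustrative feature contract; fields mirror the checklist above."""
    name: str
    version: str        # semantic version of the definition
    owner: str          # stewarding domain team
    entity_key: str     # identity the feature is keyed on
    definition: str     # business meaning in plain language
    freshness_slo: str
    null_means: str     # null semantics, stated explicitly
    online: bool
    offline: bool

days_past_due = FeatureContract(
    name="days_past_due",
    version="2.1.0",
    owner="billing-domain",
    entity_key="billing_account_id",
    definition="Calendar days since the oldest unpaid invoice's due date",
    freshness_slo="daily by 06:00 UTC",
    null_means="no open invoices (not 'unknown')",
    online=True,
    offline=True,
)
print(days_past_due.name, days_past_due.version)
```

Notice that null semantics are part of the contract. “No open invoices” and “unknown” are different business facts, and a model should not be left to guess which one a null encodes.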

4. Decision-specific composition

A fraud decision, pricing decision, recommendation decision, or claims triage decision may combine multiple domain features. This composition layer owns the policy of combination, including eligibility windows, conflict handling, and fallback logic.

This is where reconciliation across bounded contexts becomes visible.

5. Model training and serving

Training and serving should consume the same semantic definitions, even if the physical computation differs. A batch training job may replay six months of domain events, while online serving may maintain a materialized state from Kafka streams. That is acceptable so long as the semantics are equivalent and tested.
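One way to test that equivalence, sketched minimally: express the semantic definition once as a pure function, let the offline path replay history through it, let the online path fold events incrementally, and assert parity. The event names are hypothetical.

```python
def claim_reopen_count(events):
    """Shared semantic definition: number of 'reopened' events."""
    return sum(1 for e in events if e == "reopened")

class OnlineState:
    """Incremental materialization of the same definition."""
    def __init__(self):
        self.count = 0
    def apply(self, event):
        if event == "reopened":
            self.count += 1

history = ["opened", "reopened", "closed", "reopened"]

# Offline: replay the full event history in batch.
offline_value = claim_reopen_count(history)

# Online: fold events one at a time, as a stream consumer would.
state = OnlineState()
for e in history:
    state.apply(e)

assert offline_value == state.count == 2  # parity check
```

The physical computations differ; the assertion is what keeps them honest. In practice this parity check belongs in CI, run against replayed production events.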

Here is a more detailed view.

Diagram 2: Model training and serving

A few opinions, because architecture without opinions is just wallpaper:

  • Use Kafka when event ordering, replay, and multi-consumer propagation matter. Don’t use it as a universal religion. If your source emits one nightly eligibility file, Kafka may simply add choreography around a batch truth.
  • Keep domain feature logic close to domain teams. If a central data team writes all business derivations, they will become a surrogate business function without authority.
  • Prefer event-time semantics over processing-time semantics wherever decisions care about reality rather than pipeline speed.
  • Treat feature backfills as domain replays, not ad hoc data science projects. Backfills are where hidden semantic disagreements surface.

Migration Strategy

No enterprise gets to reboot. So the right migration pattern is a progressive strangler.

You do not replace the old reporting-driven data estate in one move. You carve out decision domains, build semantically owned feature products around them, and gradually route model development and inference to the new topology.

The sequence usually looks like this:

Step 1: Identify one decision with real business pressure

Pick a decision that matters enough to force clarity: fraud detection, churn prevention, credit risk refresh, claims triage, inventory allocation. Avoid generic “enterprise feature platform” programs at the start. They produce architecture before they produce learning.

Step 2: Map bounded contexts and current semantic conflicts

Document the source systems, key entities, identities, and contradictory definitions. If three teams define “active account” differently, write that down before anyone writes code. This is not bureaucracy. This is discovering where your enterprise disagrees with itself.

Step 3: Build anti-corruption layers around the ugliest sources

Translate CDC, files, or tables into domain facts. This is the first semantic shield. It lets downstream consumers deal in business language rather than source quirks.

Step 4: Publish a small number of domain features with strong contracts

Do not publish 500 features because a feature store vendor demo showed a beautiful catalog. Publish 10 that matter and make them defensible.

Step 5: Create a decision composition layer

This layer assembles model-ready features from domain-published products. It should contain explicit reconciliation rules. If billing says an account is closed but customer says active, what wins for the churn model? That is not a data engineering detail. That is policy.
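Writing that policy down as code makes it reviewable. A minimal sketch, assuming a hypothetical rule that billing is authoritative unless its status is stale:

```python
def resolve_account_status(billing_status, customer_status,
                           billing_age_days, max_staleness_days=30):
    """Resolve conflicting account statuses under an explicit,
    reviewable precedence policy. Returns (status, discrepancy_flag)."""
    if billing_status == customer_status:
        return billing_status, None
    if billing_age_days > max_staleness_days:
        return customer_status, "billing_stale"   # billing too old to trust
    return billing_status, "conflict_billing_wins"

status, flag = resolve_account_status("closed", "active", billing_age_days=45)
print(status, flag)  # active billing_stale
```

The discrepancy flag matters as much as the resolved value: it feeds the reporting that tells stewards how often the enterprise disagrees with itself.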

Step 6: Run parallel and reconcile

Train and score using both legacy pipelines and the new domain-oriented pipeline. Compare outputs. Explain differences. Some differences will be bugs. Some will expose old assumptions that nobody knew they were carrying.

Step 7: Strangle legacy dependencies gradually

Move one model, one consumer, one serving path at a time. Keep the old path until operational confidence is real, not merely declared.

The migration topology often looks like this.

Diagram 3: Strangle legacy dependencies gradually

Reconciliation deserves its own paragraph

In most migrations, reconciliation is where the real work is.

You will find duplicate entities. Events arriving days late. Status updates that overwrite history. One source using UTC, another using local branch time, and a third pretending time does not matter. You will find attributes whose null values mean “unknown” in one system and “not applicable” in another. These are not edge cases. These are the enterprise.

So build reconciliation as an explicit capability:

  • key resolution policies
  • source precedence rules
  • event deduplication
  • late-arrival handling
  • temporal consistency windows
  • discrepancy reporting and stewardship workflows
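Two of those capabilities, deduplication and late-arrival handling, can be sketched together. The watermark and lateness bound are illustrative assumptions; the point is that dropped events are counted for stewardship, never silently discarded.

```python
from datetime import datetime, timedelta

def dedupe_and_filter(events, watermark, allowed_lateness=timedelta(days=2)):
    """Keep one event per (key, event_id); drop events older than the
    lateness bound, counting them for discrepancy reporting."""
    seen, kept, too_late = set(), [], 0
    for e in sorted(events, key=lambda e: e["ts"]):
        if e["ts"] < watermark - allowed_lateness:
            too_late += 1              # report, don't silently drop
            continue
        ident = (e["key"], e["event_id"])
        if ident in seen:
            continue                   # duplicate delivery
        seen.add(ident)
        kept.append(e)
    return kept, too_late

events = [
    {"key": "a", "event_id": 1, "ts": datetime(2024, 3, 5)},  # too late
    {"key": "a", "event_id": 2, "ts": datetime(2024, 3, 9)},
    {"key": "a", "event_id": 2, "ts": datetime(2024, 3, 9)},  # duplicate
]
kept, late = dedupe_and_filter(events, watermark=datetime(2024, 3, 10))
print(len(kept), late)  # 1 1
```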

If reconciliation is hidden in notebook code or embedded in random SQL transforms, your migration will become a superstition engine. It may still produce predictions. It will not produce confidence.

Enterprise Example

Consider a large insurer building AI for claims triage and fraud detection.

The organization has:

  • a policy administration system
  • a claims management platform
  • a customer CRM
  • a payments platform
  • call center systems
  • a Kafka backbone introduced during a microservices modernization program
  • a lakehouse where analytics teams have spent years building “gold” claims tables

On paper, they already have most of what they need.

In practice, they have at least four different definitions of claimant identity, multiple versions of claim status, and a deeply unreliable notion of “time claim opened” because some systems use first notice of loss while others use internal case creation time.

The legacy fraud model is trained from lakehouse tables maintained by a central data engineering team. It performs reasonably but is hard to explain. The online scoring path uses a separate microservice assembling features in real time from APIs and Kafka topics. Training-serving skew is constant. Investigators distrust the scores when they most need them.

A better architecture starts by accepting that claims, policy, customer, and payments are bounded contexts.

The insurer creates:

  • a Claims domain pipeline that normalizes claim lifecycle events and publishes features like claim age, reopening count, adjuster reassignment velocity, and incident-to-report delay
  • a Policy domain pipeline that publishes policy tenure, coverage complexity indicators, prior endorsement frequency, and lapse history
  • a Payments domain pipeline that publishes payment anomaly features, reimbursement timing patterns, and recovery offsets
  • a Customer domain pipeline that publishes prior contact intensity, household linkage confidence, and channel-switching behavior

Each domain team owns the semantic definitions. The platform team provides Kafka topics, stream processing templates, historical replay support, feature registry, lineage, and serving infrastructure.

Then a Fraud Decision Composition product combines those domain features with explicit reconciliation rules:

  • if claimant identity linkage confidence is below threshold, suppress household-derived features
  • if claim and policy effective dates conflict, use claims event-time with a discrepancy flag
  • if payment events are late, score with fallback features and mark reduced-confidence mode
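The three composition rules above could be sketched like this. Feature names, the 0.8 confidence threshold, and the fallback value are illustrative assumptions:

```python
def compose_fraud_features(domain_features):
    """Assemble the fraud model's input row from domain-published
    features, applying explicit composition and fallback policies."""
    out = dict(domain_features)
    flags = []
    # Rule 1: suppress household features under low linkage confidence.
    if out.get("identity_linkage_confidence", 0.0) < 0.8:
        out.pop("household_claim_count", None)
        flags.append("household_suppressed")
    # Rule 2: on date conflicts, keep claims event-time but flag it.
    if out.get("claim_policy_date_conflict", False):
        flags.append("date_discrepancy")
    # Rule 3: late payment events trigger fallback and reduced confidence.
    if out.get("payment_events_late", False):
        out["payment_anomaly_score"] = 0.0  # fallback value
        flags.append("reduced_confidence")
    out["flags"] = flags
    return out

row = compose_fraud_features({
    "identity_linkage_confidence": 0.55,
    "household_claim_count": 4,
    "payment_events_late": True,
    "payment_anomaly_score": 0.92,
})
print(row["flags"])  # ['household_suppressed', 'reduced_confidence']
```

The flags travel with the score, so an investigator sees not just a number but which policies fired while producing it.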

This changes the conversation.

Instead of arguing whether the model is “wrong,” investigators can examine which domain feature fired and whether its semantics are trusted. Compliance can trace a decision back to domain-owned contracts. Data science can improve model logic without silently redefining core business terms.

Did this architecture remove complexity? Of course not. It put complexity where it belongs.

That is often the best available win in enterprise architecture.

Operational Considerations

Architects who stop at boxes and arrows deserve the outages they get. Feature pipelines live or die operationally.

Data contracts and semantic versioning

A breaking change to a feature definition is not the same as adding a column. If the billing domain changes the rule for days past due, existing models may become invalid overnight. Use semantic versioning for feature contracts. Require declared deprecation windows and impact analysis.
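A compatibility check under those conventions might look like this, assuming the usual semantic-versioning reading: a MAJOR bump is a breaking redefinition, MINOR and PATCH are additive.

```python
def parse(version):
    return tuple(int(p) for p in version.split("."))

def is_compatible(trained_with, published):
    """A model trained against MAJOR.x stays compatible with any later
    MINOR/PATCH of the same MAJOR; a MAJOR bump invalidates it."""
    t, p = parse(trained_with), parse(published)
    return p[0] == t[0] and p >= t

print(is_compatible("2.1.0", "2.3.1"))  # True: additive change
print(is_compatible("2.1.0", "3.0.0"))  # False: breaking redefinition
```

A registry can run this check at deployment time and refuse to serve a model against a feature definition it was never trained on.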

Freshness and staleness policies

Not every feature needs real-time updates. Pretending otherwise creates expensive infrastructure and fragile systems. Define classes of freshness:

  • sub-second for transaction decisions
  • minutes for interaction personalization
  • hourly for operational triage
  • daily for planning or low-sensitivity decisions

Use the cheapest reliable mechanism that matches decision need.
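Those tiers can be made machine-checkable. A sketch, with class names and bounds chosen to mirror the list above:

```python
from datetime import timedelta

# Hypothetical freshness classes mirroring the tiers above.
FRESHNESS_CLASSES = {
    "transactional": timedelta(seconds=1),
    "personalization": timedelta(minutes=5),
    "operational": timedelta(hours=1),
    "planning": timedelta(days=1),
}

def is_stale(feature_class, age):
    """True when a feature's age exceeds its declared freshness bound."""
    return age > FRESHNESS_CLASSES[feature_class]

print(is_stale("operational", timedelta(minutes=90)))  # True
print(is_stale("planning", timedelta(hours=6)))        # False
```

Declaring the class in the feature contract lets serving infrastructure enforce staleness instead of every consumer guessing.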

Observability beyond pipeline health

A green pipeline can still produce semantically broken features. Monitor:

  • distribution drift
  • null-rate shifts
  • lateness profiles
  • reconciliation discrepancy counts
  • online/offline feature parity
  • point-in-time join correctness
  • serving fallback frequency
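A null-rate shift check, the simplest item on that list, can be sketched in a few lines. The 5% tolerance is an illustrative assumption:

```python
def null_rate(values):
    """Fraction of values that are null."""
    return sum(v is None for v in values) / len(values)

def null_rate_alert(baseline, current, tolerance=0.05):
    """Flag a feature whose null rate drifted beyond tolerance,
    even though the pipeline itself ran green."""
    return abs(null_rate(current) - null_rate(baseline)) > tolerance

baseline = [1, 2, None, 4, 5, 6, 7, 8, 9, 10]        # 10% nulls
current = [1, None, None, None, 5, 6, 7, 8, 9, 10]   # 30% nulls
print(null_rate_alert(baseline, current))  # True
```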

Point-in-time correctness

This is where many AI architectures quietly fail. Training data must reflect what was knowable at decision time, not what became known later. Domain facts should carry valid-time and ingestion-time where possible. If they do not, do not pretend you can do causally sound training.
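The core mechanic here is the as-of lookup: given a feature's valid-time history, return the latest value knowable at decision time. A minimal sketch:

```python
from datetime import datetime

def as_of_value(feature_history, decision_time):
    """Return the latest feature value whose valid_time is at or before
    decision_time -- what was knowable when the decision was made."""
    eligible = [r for r in feature_history if r["valid_time"] <= decision_time]
    if not eligible:
        return None  # nothing was knowable yet
    return max(eligible, key=lambda r: r["valid_time"])["value"]

history = [
    {"valid_time": datetime(2024, 1, 1), "value": 10},
    {"valid_time": datetime(2024, 2, 1), "value": 25},
    {"valid_time": datetime(2024, 3, 1), "value": 40},
]
# A decision taken mid-February must see 25, never the later 40.
print(as_of_value(history, datetime(2024, 2, 15)))  # 25
```

Training sets built with a naive latest-value join instead of this as-of join leak the future into the features, which is exactly the quiet failure described above.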

Access control and policy boundaries

Some domain features carry sensitive information or proxy attributes. A central feature platform should enforce policy tags, not just table permissions. Reuse is not a virtue if it violates regulatory or ethical boundaries.

Cost discipline

Feature platforms become cost furnaces when teams materialize everything at every granularity “just in case.” Domain ownership helps because owners are more likely to challenge wasteful derivations. Platform teams should still enforce lifecycle controls, retention rules, and materialization policies.

Tradeoffs

This architecture is not free.

More upfront modeling

You need real domain discovery. Teams must agree on terms, events, and ownership. That takes time and strong facilitation. People who want to move fast will complain. Some of them are right.

More coordination at boundaries

Cross-domain decisions require context mapping and reconciliation. There is no avoiding the conversation. You are surfacing complexity that was previously hidden in code.

Possible duplication

Two decision products may compose similar domain features differently. Purists dislike this. They should relax. Some duplication is a healthy price for explicit purpose and clear semantics.

Platform complexity does not disappear

Kafka, stream processors, backfill pipelines, offline/online stores, schema registries, and model infrastructure still need serious engineering. Domain orientation does not simplify the runtime. It simplifies accountability.

Central teams lose unilateral control

That is a feature, not a bug, but many enterprises are not culturally ready for it. If the organization cannot tolerate federated semantic ownership, it will quietly recentralize everything and call it standardization.

Failure Modes

There are several predictable ways this goes wrong.

1. “Domain-owned” becomes an excuse for chaos

If every team publishes features with no standards, you get semantic anarchy with better branding. Domain ownership must sit on top of a platform contract: metadata, lineage, versioning, quality metrics, and policy enforcement.

2. The feature store becomes the new monolith

A central feature team starts approving every feature and redefining domain terms for consistency. Soon all roads lead through one bottleneck. Delivery slows and semantics drift away from the business. If a central team is deciding what claim reopened means, the design has already failed.

3. Reconciliation is postponed forever

Teams keep saying they will “align definitions later.” They never do. The model goes live with hidden conflicts and brittle assumptions. Six months later there is a major incident and nobody can explain the score. Reconciliation delayed is reconciliation weaponized.

4. Batch and streaming semantics fork

The offline pipeline uses snapshot history; the online path uses event streams; both claim to compute the same feature but differ on windows, late arrivals, or null defaults. This creates silent training-serving skew. It is one of the costliest failure modes because everything appears to work until decisions matter.

5. Microservices are mistaken for bounded contexts

This is a classic enterprise confusion. A microservice boundary is a deployment choice. A bounded context is a semantic boundary. Sometimes they align. Often they do not. If you derive feature ownership from service topology alone, you will inherit accidental boundaries.

When Not To Use

This architecture is not always the right answer.

Do not use this approach if:

  • you are building a small experimental model with limited business impact and short lifespan
  • your data estate is narrow, from one coherent application with little semantic conflict
  • the organization lacks any credible domain ownership and cannot support federated stewardship
  • the decision does not require explainability, auditability, or durable feature reuse
  • latency requirements are so trivial that a simple batch scoring pipeline is enough

In these cases, a lighter-weight feature engineering stack may be entirely appropriate. Not every ML use case deserves enterprise-grade semantic machinery.

Also, if your source systems are profoundly unreliable and no domain team can commit to definitions, do not pretend architecture can rescue organizational ambiguity. Sometimes the honest first move is governance and source remediation, not another pipeline layer.

Related Patterns

Several patterns sit naturally beside this approach.

Data Mesh, used carefully

Data mesh contributes the idea of domain-owned data products. Useful. But feature products need stronger temporal and operational guarantees than many analytical data products. Don’t assume mesh language alone solves AI semantics.

Event Sourcing

Helpful where domain state is naturally derived from event history and replay matters. Less helpful when sources only provide coarse snapshots or where event granularity is insufficient.

CQRS

Useful for separating write models from read or feature projections. Particularly effective when online feature materialization needs optimized views derived from authoritative domain events.

Anti-Corruption Layer

Essential in migration. Legacy ERPs and monoliths rarely emit clean domain concepts. Protect new feature products from old data structures.

Strangler Fig Pattern

The right migration posture for most enterprises. Replace decision flows progressively, validating semantics through parallel runs and reconciliation.

Feature Store

A supporting component, not a governing philosophy. Use it to distribute, catalog, and serve. Do not ask it to invent your domain model.

Summary

The big idea is simple, even if the implementation isn’t: AI feature pipelines are domain models expressed as operational data flows.

Treat them like plumbing and you will get fast, scalable semantic confusion. Treat them as domain-owned products, and you have a chance of building AI systems that the business can trust, operate, and evolve.

The architecture that follows from this is opinionated:

  • define bounded contexts before defining reusable features
  • place semantic authority with domains, not central pipeline teams
  • publish feature contracts, not just feature tables
  • make reconciliation explicit
  • use Kafka and microservices where they fit, not as decorative defaults
  • migrate progressively with a strangler approach
  • validate by parallel run and difference explanation, not by architecture optimism

The enterprise lesson is blunt. Models do not fail only because of bad algorithms. They fail because organizations let semantics leak across ownership boundaries until nobody knows what a feature really means.

A feature pipeline is where the enterprise teaches a machine what the business believes. That is too important to leave to anonymous transformations.

Design accordingly.
