TOGAF in Data-Driven Enterprises

โฑ 7 min read

Executive summary

Data-driven enterprises must govern data flow, quality, and accountability while enabling speed. TOGAF provides a governance framework with compliance reviews and contracts; to make it data-native, you extend governance to cover lineage and provenance. W3C PROV defines provenance as information about the entities, activities, and people involved in producing data, enabling trust assessments.

OpenLineage defines a generic model for datasets, jobs, and runs, enabling lineage metadata collection and extensibility via facets. This can populate governance dashboards and control evidence.
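
The shape of such a run event can be sketched as plain JSON. A minimal sketch: the top-level fields (eventType, eventTime, run, job, inputs, outputs) follow the public OpenLineage spec, while the namespaces, job name, dataset names, and producer URI are illustrative.

```python
import json
from datetime import datetime, timezone
from uuid import uuid4

def make_run_event(event_type: str, job_name: str,
                   inputs: list, outputs: list) -> dict:
    """Build a minimal OpenLineage-style RunEvent as a plain dict.

    Field names follow the OpenLineage spec; the namespace values and
    producer URI below are illustrative, not real endpoints.
    """
    return {
        "eventType": event_type,  # START | COMPLETE | FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid4())},
        "job": {"namespace": "analytics", "name": job_name},
        "inputs": [{"namespace": "warehouse", "name": n} for n in inputs],
        "outputs": [{"namespace": "warehouse", "name": n} for n in outputs],
        "producer": "https://example.com/ea-governance",  # hypothetical
    }

event = make_run_event("COMPLETE", "daily_orders_rollup",
                       inputs=["raw.orders"], outputs=["mart.orders_daily"])
print(json.dumps(event, indent=2))
```

Each completed pipeline run emits one such event; a governance dashboard can then aggregate them into control evidence per dataset.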

  • Governance mapping: TOGAF compliance + data contracts
  • Streaming and real-time: event governance linkage
  • TOGAF compliance review

Figure 1: TOGAF for data-driven enterprises - data strategy through architecture to platform

Data architecture within TOGAF ADM

Figure 2: Data in TOGAF - strategy through architecture to platform and governance

Data-driven enterprises treat data as a strategic asset. TOGAF supports this through explicit data architecture, but most organizations underinvest in Phase C's data dimension.

Data Strategy (Phase A): Define what data capabilities the organization needs: real-time analytics, AI/ML training data, regulatory reporting, data monetization. This drives data architecture requirements.

Data Architecture (Phase C): Model the data landscape: Data Objects (logical), Data Components (databases, lakes, warehouses), Data Flows (how data moves). Tag each with Classification (Public/Internal/Confidential/Restricted), Owner, Quality Score, Retention Period.
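
The tagging scheme above can be sketched as a small catalog record. The field names mirror the tags listed in the text; the example values and the validation rule are illustrative.

```python
from dataclasses import dataclass

# Classification levels from the text: Public/Internal/Confidential/Restricted
CLASSIFICATIONS = {"Public", "Internal", "Confidential", "Restricted"}

@dataclass
class DataObject:
    """A Phase C data object with its governance tags."""
    name: str
    classification: str
    owner: str
    quality_score: float   # e.g. 0.0-1.0 from profiling checks
    retention_days: int

    def __post_init__(self):
        if self.classification not in CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {self.classification}")

customer = DataObject("Customer Profile", "Confidential",
                      owner="customer-domain", quality_score=0.97,
                      retention_days=2555)  # roughly seven years
```

Rejecting unknown classification levels at construction time keeps the catalog consistent with the four-level scheme.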

Data Platform (Phase D): Design technology infrastructure: data lakes, warehouses, streaming (Kafka), compute engines (Spark). Model as Technology Services with capacity and cost tagging.

Data Governance (Phase G): Enforce quality, lineage, access control, and retention. Connect governance to the repository so that every data asset is traceable from business definition through physical location to access controls.
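
That end-to-end traceability can be sketched as a repository lookup. The repository structure, storage path, and role names here are hypothetical; the point is that one query walks from business definition to physical location to access controls.

```python
# Hypothetical repository entries linking a business term to its
# physical home and access policy.
repository = {
    "Customer": {
        "definition": "A party with at least one active account",
        "data_object": "customer_profile",
        "physical": "s3://lake/curated/customer_profile/",  # illustrative
        "access": {"roles": ["customer-domain", "risk-readonly"]},
    },
}

def trace(term: str) -> dict:
    """Return the governance trace for a business term (KeyError if absent)."""
    entry = repository[term]
    return {
        "term": term,
        "definition": entry["definition"],
        "stored_at": entry["physical"],
        "who_may_read": entry["access"]["roles"],
    }

print(trace("Customer"))
```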

Data mesh and TOGAF

Data mesh principles (domain ownership, data as product, self-serve platform, federated governance) align well with TOGAF. Each domain owns its data products in domain packages. The central team provides self-serve infrastructure. Federated governance ensures interoperability. TOGAF's repository becomes the federated catalog where data products are discoverable across domains.

The enterprise data architecture stack

Figure 3: Data architecture stack - consumers, platform, and sources with component detail

A data-driven enterprise requires a comprehensive data architecture stack that spans from raw data sources through processing and storage to consumption. TOGAF Phase C (Data Architecture) defines the logical model; Phase D (Technology Architecture) defines the physical platform.

Data sources span the full diversity of enterprise data: transactional databases (the system of record for orders, customers, accounts), SaaS applications (CRM, HRIS, marketing platforms that generate data outside the organization's control), IoT streams (sensor data, telemetry, device state), partner feeds (supply chain data, market data, regulatory feeds), and legacy mainframes (core banking, insurance policy management, often the most valuable and hardest-to-access data).

The data platform provides five capabilities. Ingestion moves data from sources to the platform: Kafka for streaming, CDC (Debezium) for database change capture, batch ETL for periodic loads. Storage provides both the data lake (cheap, schema-on-read, preserves raw data) and the data warehouse (structured, optimized for analytics). Processing transforms raw data into business-ready datasets: Spark for batch, Flink for streaming, dbt for SQL-based transformation. Serving delivers data to consumers through APIs, caches, and materialized views. The data catalog enables discovery: every dataset is documented with schema, ownership, quality score, and lineage.
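
The lake/warehouse split can be illustrated with a toy ingestion routine: the lake keeps every raw record untouched (schema-on-read), while the warehouse accepts only rows that pass schema validation (schema-on-write). The schema and sample records are illustrative.

```python
import json

# Illustrative warehouse schema: required, typed columns.
WAREHOUSE_SCHEMA = {"order_id": int, "amount": float}

lake, warehouse = [], []

def ingest(record: dict) -> None:
    """Route one source record to lake (always) and warehouse (if valid)."""
    lake.append(json.dumps(record))  # raw data always preserved
    try:
        row = {col: typ(record[col]) for col, typ in WAREHOUSE_SCHEMA.items()}
        warehouse.append(row)
    except (KeyError, TypeError, ValueError):
        pass  # malformed for analytics, but still queryable from the lake

ingest({"order_id": 1, "amount": "19.99", "note": "gift"})
ingest({"amount": "oops"})  # missing order_id: lands in the lake only
```

This is why the lake "preserves raw data": a record rejected by the warehouse today can still be reprocessed once the schema or the cleansing logic improves.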

Data consumers are the reason the platform exists: BI dashboards for business intelligence, ML/AI pipelines for predictive analytics and automation, regulatory reports for compliance, real-time analytics for operational monitoring, and self-service queries for ad-hoc analysis. Each consumer type has different latency, completeness, and format requirements; the data architecture must serve them all.

Data domains and federated ownership

The data mesh approach distributes data ownership to domain teams rather than centralizing it in a data engineering team. In TOGAF terms, this means each business domain (Payments, Customer, Risk) owns its data products: the curated, documented, quality-assured datasets that other domains consume. The central data platform team provides the self-serve infrastructure (storage, compute, catalog) but does not own the data.

Model this in ArchiMate: each data domain is a Business Function that owns Data Objects (its data products). Data products are realized by Application Components (the pipelines and databases that produce and serve them). The Technology Layer shows the shared platform services that all domains consume. Cross-domain data access is governed by data contracts: formalized agreements specifying schema, SLA, quality guarantees, and access permissions.
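
Such a contract can be sketched as a checkable artifact. The product name, schema, thresholds, and consumer list below are illustrative; the structure mirrors the elements named in the text (schema, SLA, quality guarantee, access).

```python
# An illustrative data contract as a plain dict.
contract = {
    "product": "payments.settlements_daily",
    "schema": {"settlement_id": str, "amount": float, "settled_at": str},
    "sla_freshness_hours": 24,
    "min_quality_score": 0.95,
    "consumers": ["risk", "finance"],
}

def validate_batch(rows: list, quality_score: float) -> list:
    """Return a list of contract violations for a delivered batch."""
    violations = []
    for i, row in enumerate(rows):
        for col, typ in contract["schema"].items():
            if not isinstance(row.get(col), typ):
                violations.append(f"row {i}: {col} missing or not {typ.__name__}")
    if quality_score < contract["min_quality_score"]:
        violations.append(
            f"quality {quality_score} below {contract['min_quality_score']}")
    return violations

ok = validate_batch(
    [{"settlement_id": "S1", "amount": 10.0, "settled_at": "2024-01-01"}], 0.99)
bad = validate_batch([{"settlement_id": "S2"}], 0.90)
```

An empty violation list means the producing domain met its contract; a non-empty one is exactly the kind of evidence federated governance needs.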

Governing AI and ML within TOGAF

Data-driven enterprises increasingly rely on AI and ML models that consume architecture-managed data. TOGAF governance must extend to cover model lifecycle management.

Model-data dependency tracking. Every ML model depends on specific data inputs. Model these dependencies in the architecture repository: the ML Model (Application Component) accesses specific Data Objects through defined interfaces. When the data architecture changes (schema evolution, source system replacement, data quality degradation), the model dependency graph reveals which ML models are affected, preventing the silent model degradation that occurs when training data changes without notification.
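
The dependency-graph query can be sketched in a few lines; the model and dataset names are hypothetical.

```python
# Hypothetical edges from the repository: model -> data objects it reads.
model_inputs = {
    "churn_model": {"customer_profile", "payment_history"},
    "fraud_model": {"payment_history", "device_events"},
    "ltv_model": {"customer_profile"},
}

def affected_models(changed_data_object: str) -> set:
    """Which ML models must be reviewed when a data object changes?"""
    return {m for m, deps in model_inputs.items()
            if changed_data_object in deps}

print(affected_models("payment_history"))
```

A schema change on `payment_history` would flag both the churn and fraud models for review before the change is approved.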

Feature store governance. The feature store, a shared repository of curated features for ML, is an architecture component that requires governance. Feature definitions, ownership, quality standards, and access controls are managed in the EA repository alongside other data architecture elements. Feature lineage (which raw data produces which features) is tracked in the lineage system.

Model deployment as architecture change. Deploying an ML model into production is an architecture change: it introduces a new Application Component that consumes data, produces predictions, and affects business processes. Significant model deployments (customer-facing, financially material, or regulatory-impacting) should pass through the architecture review board with the same rigor as any other technology deployment.

Common data architecture anti-patterns in TOGAF implementations

Several anti-patterns undermine data architecture within TOGAF programs. Recognizing them early prevents costly remediation later.

The data afterthought. Phase C prioritizes application architecture while treating data architecture as a checkbox. The result: application components are well-modeled but nobody knows which data objects flow between them, who owns the data, or where it is stored. Fix: allocate equal time to data architecture in Phase C. Every Application Component should have explicit Access relationships to the Data Objects it creates, reads, updates, and deletes.
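
The fix is mechanically checkable. A minimal sketch, assuming a repository export that maps each Application Component to its Data Object access relationships (all names here are hypothetical):

```python
# Hypothetical export: component -> {data object: CRUD operations}.
access = {
    "OrderService": {"orders": "CRUD", "customer_profile": "R"},
    "ReportingJob": {},  # modeled, but with no data access recorded
}

def components_missing_data_access(model: dict) -> list:
    """Flag Application Components with no Access relationship to any Data Object."""
    return sorted(c for c, rels in model.items() if not rels)

print(components_missing_data_access(access))
```

Running such a check in each Phase C review turns "data as a checkbox" into a concrete, enforceable completeness rule.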

The centralized data team bottleneck. A single data engineering team owns all data pipelines, creating a bottleneck where every domain waits for the data team to build their pipeline. Fix: adopt domain-oriented data ownership (data mesh principles) within the TOGAF governance framework. Each domain team owns its data products; the central team owns the platform.

If you'd like hands-on training tailored to your team (Sparx Enterprise Architect, ArchiMate, TOGAF, BPMN, SysML, Apache Kafka, or the Archi tool), you can reach us via our contact page.

Frequently Asked Questions

What is TOGAF used for?

TOGAF (The Open Group Architecture Framework) is used to structure and manage enterprise architecture programmes. It provides the Architecture Development Method (ADM) for creating architecture, a content framework for deliverables, and an enterprise continuum for reuse.

How does ArchiMate relate to TOGAF?

ArchiMate and TOGAF are complementary. TOGAF provides the process framework (ADM phases, governance, deliverables) while ArchiMate provides the notation for creating the architecture content. Many organisations use TOGAF as their EA method and ArchiMate as the modeling language within each ADM phase.

What is the TOGAF Architecture Development Method (ADM)?

The ADM is a step-by-step process for developing enterprise architecture. It consists of a Preliminary phase and phases A through H: Architecture Vision, Business Architecture, Information Systems Architecture, Technology Architecture, Opportunities and Solutions, Migration Planning, Implementation Governance, and Architecture Change Management.