Most companies do not have a data problem. They have an architecture discipline problem.
That sounds harsh, but after enough enterprise programs, enough “data transformations,” enough cloud migrations with six steering committees and zero clarity, you start to see the pattern. The issue is rarely that the company lacks data. It is drowning in it. The issue is that nobody made hard decisions about how data should be structured, governed, shared, secured, and evolved.
That is where the data architect comes in.
And no, a data architect is not just the person who draws database diagrams, picks a warehouse, or lectures teams about naming conventions. That view is outdated and honestly a little dangerous. In modern enterprises, especially in banking, insurance, healthcare, telecom, and any business with regulatory pressure, a data architect sits right in the middle of business change, platform design, integration, security, and operating model.
So let’s answer the question simply first.
What does a data architect do?
A data architect designs how data is structured, moved, governed, secured, and used across an organization so that systems, teams, and business processes can work consistently at scale.
That is the short SEO-friendly answer.
Here is the real one.
A data architect decides what the enterprise should treat as important data, where that data should live, how it should flow between systems, who should be allowed to use it, how it should be modeled, how long it should be retained, what quality standards matter, and how all of that can change over time without breaking the company.
It is part technical design, part policy, part negotiation, part future-proofing, and part saying “no” when everyone else wants a shortcut.
That last part is underrated.
Because real architecture work is not about making pretty target-state diagrams. It is about reducing long-term chaos while still letting delivery teams ship something this quarter.
The role is broader than most job descriptions admit
A lot of job descriptions for data architects are nonsense. They list every database product on the market, ask for ten years of experience in tools that have existed for four, and then quietly expect the person to fix data governance, cloud architecture, metadata strategy, and reporting inconsistency at the same time.
In practice, a competent data architect usually works across five concerns:
- Data modeling
- Data integration
- Data governance
- Data security and access
- Data platform alignment
If one of these is missing, the architecture starts to wobble.
For example, you can have a beautiful cloud data lake, but if IAM is weak and nobody knows which data products are authoritative, you have built a very expensive confusion engine.
Or you can have strict governance, but if your integration pattern is still nightly batch files pretending to be modern architecture, the business will route around you.
Good data architecture is not just about storage. It is about decisions.
What a data architect actually does in real enterprise work
Let’s get practical. In a real enterprise, the data architect is involved in work like this:
- Defining canonical business entities such as customer, account, transaction, policy, product, device, employee
- Deciding when canonical models help and when they become bureaucratic fiction
- Designing event schemas for Kafka topics
- Setting standards for data quality, lineage, metadata, and ownership
- Aligning operational systems, analytics platforms, and regulatory reporting
- Working with IAM teams on access models for sensitive data
- Defining how master and reference data should be managed
- Choosing between replication, event streaming, APIs, batch, CDC, or file-based exchange
- Designing retention and archival patterns in cloud environments
- Handling data residency and classification constraints
- Supporting program teams during mergers, core platform replacements, cloud migrations, or digital channel launches
That is the real job. It is messy. It crosses boundaries. It involves politics. It often requires telling a delivery team that their “quick fix” creates six years of integration debt.
A data architect who only focuses on schema design is useful, but limited. A strong enterprise data architect understands the operating model around the data, not just the tables.
The biggest misconception: data architecture is not BI architecture
This one needs saying clearly.
Many organizations still confuse data architecture with reporting architecture. They think if they have a warehouse, a lakehouse, a BI tool, and some dashboards, then they have solved data architecture.
They have not.
Reporting is downstream. Data architecture starts upstream.
If customer identifiers are inconsistent across channels, if transaction events are published differently by every domain, if reference data changes without controls, if IAM policies are manually managed in twenty places, then your dashboard layer is just polishing disorder.
This is why experienced architects spend so much time on foundational questions:
- What is the system of record for this data?
- What is the source of truth versus a derived copy?
- What is the lifecycle of this data?
- What is the contract for sharing it?
- What level of consistency is actually required?
- Which teams own which data domains?
- How is access granted and audited?
- What happens when the schema changes?
Those are architecture questions. Not dashboard questions.
A useful way to think about the role
Here is a practical breakdown. The role spans structure (how data is modeled), movement (how it flows between systems), governance (who owns and controls it), security (who may access it), platform alignment (where it lives and runs), and finally the link back to business capabilities.
That last concern matters more than many architects admit. If you cannot explain why the data design supports business capabilities, you are doing theory, not architecture.
The levels of data architecture: conceptual, logical, physical
This is old-school, but still useful if applied well.
1. Conceptual data architecture
This is the business-level view. What are the core entities? Customer. Account. Payment. Loan. Identity. Product. Merchant. Employee.
At this level, the architect helps the organization agree on meaning.
Simple? In theory. In reality, “customer” in a bank can mean prospect, applicant, account holder, authorized signatory, card user, or legal entity. If you skip that ambiguity and move straight to implementation, you build fragmentation into the enterprise.
2. Logical data architecture
This defines relationships, attributes, data domains, ownership, and information flows without being tied to one specific technology.
This is where the architect decides things like:
- How customer identity relates to party, household, account, and consent
- Which fields are mandatory
- Which identifiers are enterprise-wide versus local
- What events should exist and what they mean
- Which domains own which records
3. Physical data architecture
This is where the design gets real. Tables, topics, buckets, partitions, indexes, object stores, warehouse schemas, retention policies, replication rules.
Contrarian view: many enterprise architects spend too much time at the conceptual level because it feels strategic. But if you never pressure-test the physical implications, your architecture is fantasy. A Kafka topic with a bad event contract is still a bad architecture, even if the PowerPoint was elegant.
Data architect vs data engineer vs enterprise architect
These roles overlap, and that causes confusion.
Data architect
Owns the structure, principles, standards, and target-state decisions for enterprise data.
Data engineer
Builds and operates pipelines, transformations, storage patterns, and data processing jobs.
Enterprise architect
Connects data architecture to broader business, application, technology, and capability architecture.
Solution architect
Designs the end-to-end architecture for a specific initiative or platform change.
In healthy organizations, these roles collaborate. In unhealthy ones, they fight over territory or leave gaps.
Here is my blunt opinion: if your data architect cannot talk credibly with engineers about Kafka partitions, CDC trade-offs, cloud storage tiers, or IAM patterns, they are too detached. If your data engineer thinks data architecture is “just governance,” they are too narrow. If your enterprise architect ignores data because “the platform team has that,” they are asleep at the wheel.
Where data architects spend their time in modern enterprises
Not enough people say this honestly: the role is less about creating artifacts and more about making decisions stick.
A large part of the week goes into:
- Reviewing solution designs
- Resolving ownership disputes
- Defining standards that teams will actually adopt
- Challenging inconsistent data definitions
- Sitting with security and IAM teams on access patterns
- Working with cloud platform teams on storage and movement constraints
- Reviewing API and event contracts
- Helping delivery teams avoid bad shortcuts
- Explaining why two systems should not both become “golden sources”
- Translating business language into data domains and architectural consequences
The glamorous part is drawing the target state. The valuable part is preventing five incompatible local optimizations from becoming permanent.
Real architecture example: banking, Kafka, IAM, and cloud
Let’s use a realistic enterprise example.
A retail bank is modernizing its customer and payments landscape. It has:
- A legacy core banking platform
- A CRM used by branch and contact center teams
- Mobile and web channels in the cloud
- A Kafka platform for event streaming
- A cloud data platform for analytics and regulatory reporting
- IAM based on enterprise identity, role-based access, and fine-grained data access controls
The business goal sounds simple: create a single customer view and near-real-time transaction insight across channels.
This is exactly the kind of goal executives love because it sounds clean and modern. In practice, it is architectural dynamite.
What the data architect has to solve
1. Define what “customer” means
The bank already has multiple customer representations:
- Core banking customer ID
- CRM party ID
- Digital channel identity profile
- AML/KYC identity record
- Marketing segmentation profile
If nobody defines the canonical relationships, every platform invents its own “single view.”
A good data architect does not pretend one model will erase all differences. Instead, they define:
- Enterprise identifiers
- Cross-reference and survivorship rules
- Domain ownership
- Data quality expectations
- Acceptable lag and synchronization patterns
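These cross-reference and survivorship rules can be made concrete. Here is a minimal sketch, where the source system names, attributes, and precedence order are all illustrative assumptions, not a prescribed model:

```python
# Per-attribute survivorship: which source system wins when records
# disagree. System and attribute names are hypothetical examples.
SURVIVORSHIP = {
    "legal_name":     ["kyc", "core_banking", "crm"],
    "email":          ["digital_channel", "crm", "core_banking"],
    "postal_address": ["crm", "core_banking", "digital_channel"],
}

def merge_customer(records: dict) -> dict:
    """Build a consolidated customer view from per-system records.

    `records` maps a source system name to its local customer record.
    For each attribute, the first system in the precedence list that
    holds a non-empty value survives into the merged view.
    """
    merged = {}
    for attr, precedence in SURVIVORSHIP.items():
        for source in precedence:
            value = records.get(source, {}).get(attr)
            if value:
                merged[attr] = value
                break
    return merged

merged = merge_customer({
    "core_banking":    {"legal_name": "A. N. Other", "postal_address": ""},
    "kyc":             {"legal_name": "Alice N. Other"},
    "digital_channel": {"email": "alice@example.com"},
})
# KYC wins legal_name; the digital channel wins email.
```

The point is not the code, it is that precedence and ownership are written down and testable, instead of living in someone's head.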
2. Design event contracts in Kafka
The bank wants customer and transaction events in Kafka.
This is where immature teams make a classic mistake: they dump database changes into Kafka and call it event-driven architecture.
That is not architecture. That is leakage.
The data architect should push for clear event semantics:
- CustomerCreated
- CustomerContactDetailsUpdated
- AccountOpened
- PaymentInitiated
- PaymentSettled
Each event needs:
- Stable schema
- Versioning rules
- Ownership
- Classification
- Retention policy
- Consumer expectations
If the event model is weak, downstream systems become tightly coupled to source internals. Then every schema change becomes a political incident.
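What an event contract looks like in practice can be sketched simply. The field names, versioning scheme, and classification label below are assumptions for illustration, not a bank's real contract:

```python
# Hypothetical contract for a CustomerContactDetailsUpdated event.
# In a real platform this would live in a schema registry, not code.
CONTRACT = {
    "type": "CustomerContactDetailsUpdated",
    "version": 2,                      # consumers pin a version
    "owner": "customer-domain",        # who approves schema changes
    "classification": "confidential",  # drives IAM and retention
    "required": ["event_id", "customer_id", "occurred_at", "contact"],
}

def validate(event: dict, contract: dict = CONTRACT) -> list:
    """Return a list of contract violations; an empty list means the event passes."""
    errors = []
    if event.get("type") != contract["type"]:
        errors.append("wrong event type")
    if event.get("version") != contract["version"]:
        errors.append("version mismatch")
    errors += [f"missing field: {f}" for f in contract["required"] if f not in event]
    return errors

ok = validate({
    "type": "CustomerContactDetailsUpdated",
    "version": 2,
    "event_id": "e-123",
    "customer_id": "c-9",
    "occurred_at": "2024-05-01T10:00:00Z",
    "contact": {"email": "alice@example.com"},
})
# ok == []  -> event satisfies the contract
```

Notice that the contract carries ownership and classification alongside the schema. That is the difference between a topic and a governed data product.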
3. Align IAM with data sensitivity
Not every user, service, analyst, or application should see full customer data.
The data architect works with IAM and security teams to define:
- Data classification levels
- Attribute-level masking
- Role- and policy-based access
- Separation between operational access and analytical access
- Auditability for sensitive fields like national ID, account balance, transaction details
This is where many architectures fail quietly. They design beautiful data flows, then bolt on access control later. In banking, that is reckless.
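To show what "IAM aligned to classification" can mean concretely, here is a minimal sketch of attribute-level masking driven by classification and caller role. The classification levels, roles, and clearances are assumptions for the example, not a reference IAM design:

```python
# Hypothetical attribute classifications and role clearances.
CLASSIFICATION = {
    "customer_id": "internal",
    "legal_name":  "confidential",
    "national_id": "restricted",
    "balance":     "restricted",
}

# Which classification levels each role may see unmasked.
ROLE_CLEARANCE = {
    "analyst":      {"internal"},
    "branch_staff": {"internal", "confidential"},
    "fraud_ops":    {"internal", "confidential", "restricted"},
}

def mask_record(record: dict, role: str) -> dict:
    """Mask every attribute the role is not cleared to see.

    Unknown attributes default to the strictest level, so a new field
    is masked until someone classifies it deliberately.
    """
    allowed = ROLE_CLEARANCE.get(role, set())
    return {
        attr: value if CLASSIFICATION.get(attr, "restricted") in allowed else "***"
        for attr, value in record.items()
    }

row = {"customer_id": "c-9", "legal_name": "Alice", "national_id": "1234", "balance": 100}
mask_record(row, "analyst")
# -> {'customer_id': 'c-9', 'legal_name': '***', 'national_id': '***', 'balance': '***'}
```

The deliberate design choice is the default: unclassified data is treated as restricted, so forgetting to classify a field fails safe rather than leaking it.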
4. Decide cloud data patterns
The cloud platform team wants all data landed in object storage, then transformed into curated layers.
Fine, maybe. But the data architect has to challenge lazy assumptions:
- Does every domain need raw retention?
- Which datasets require immutable retention?
- Which data should remain in operational stores?
- What latency matters for fraud versus reporting?
- Which workloads belong in warehouse structures versus stream processing?
- What residency and encryption controls apply?
Cloud gives options. It does not remove architectural responsibility.
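One way to keep those answers from living in tribal knowledge is to capture them as explicit per-dataset policy. The dataset names, stores, tiers, and retention periods below are illustrative assumptions:

```python
# Hypothetical per-dataset placement and retention decisions,
# written down as data so they can be reviewed and enforced.
DATA_POLICIES = {
    "payments.raw": {
        "store": "object_storage", "tier_after_days": 30,
        "retain_days": 3650, "immutable": True,   # regulatory hold
    },
    "customer.curated": {
        "store": "warehouse", "tier_after_days": None,
        "retain_days": 1825, "immutable": False,
    },
    "fraud.features": {
        "store": "stream", "tier_after_days": None,
        "retain_days": 7, "immutable": False,     # low latency, short-lived
    },
}

def requires_legal_hold_review(dataset: str) -> bool:
    """Immutable datasets need review before any deletion or rewrite."""
    return DATA_POLICIES[dataset]["immutable"]
```

A policy table like this will not survive unchanged, and it should not. The value is that changing it becomes a visible architectural decision instead of a quiet platform default.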
What success looks like
A successful data architecture in this bank would produce:
- Clear customer domain ownership
- Reusable enterprise identifiers
- Event contracts with governance and schema discipline
- IAM policies aligned to data classification
- Stream and batch patterns chosen intentionally
- Trusted curated datasets for analytics and compliance
- Reduced reconciliation across channels
What failure looks like
And failure? That is easy to recognize:
- Every team publishes “customer” differently
- Kafka becomes a dumping ground for internal records
- IAM is manually configured per application
- The cloud data lake becomes a graveyard of unclear copies
- Regulatory reporting still needs spreadsheet reconciliation
- Executives keep asking why the “single customer view” is inconsistent
That is not a tooling failure. That is a data architecture failure.
Common mistakes data architects make
Architects are not innocent in this story. We create our own problems too.
1. Being too abstract
Some data architects live in principle-land. They produce conceptual models, standards, and governance decks, but never engage with actual implementation constraints.
If your architecture cannot survive contact with Kafka topic design, cloud storage economics, IAM enforcement models, or legacy integration realities, it is not architecture. It is commentary.
2. Confusing standardization with value
Not everything needs an enterprise canonical model. Sometimes a local domain model with a clean contract is better.
This is a contrarian point because enterprises love canonical everything. But over-standardization can slow delivery, create fake alignment, and hide domain nuance.
You do need shared meaning. You do not need one giant universal schema for all time.
3. Ignoring ownership
A lot of governance efforts fail because nobody made ownership explicit.
Who owns customer contact data? Who owns transaction status? Who approves schema changes? Who signs off on quality rules?
If the answer is “the enterprise” or “the data team,” the answer is probably “nobody.”
4. Treating governance as paperwork
Governance that only exists in policy documents is dead on arrival.
Governance must show up in:
- schema registries
- data catalogs
- access workflows
- lineage tooling
- deployment controls
- quality gates
- release processes
Otherwise teams bypass it. And honestly, they should.
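What "governance embedded in delivery" can look like: a compatibility gate that runs before a schema change ships. The rule set here (no removed fields, no type changes, additions allowed) is a simplified assumption; real registries enforce richer compatibility modes:

```python
# Sketch of a backward-compatibility gate for flat field->type schemas.
# Could run in CI as a quality gate before a schema change deploys.
def breaking_changes(old: dict, new: dict) -> list:
    """List changes that would break existing consumers."""
    problems = []
    for field, ftype in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != ftype:
            problems.append(f"type change on {field}: {ftype} -> {new[field]}")
    return problems  # adding new optional fields is allowed

old = {"customer_id": "string", "email": "string"}
new = {"customer_id": "string", "email": "string", "phone": "string"}
breaking_changes(old, new)  # -> [] : additive change, the gate passes

bad = {"customer_id": "int"}
# breaking_changes(old, bad) flags both the removed email field
# and the type change on customer_id.
```

When a check like this fails the build, governance stops being a policy document and becomes part of how software ships.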
5. Forgetting security architecture
Data architects sometimes assume security is another team’s problem. Big mistake.
If you design data movement, storage, and sharing without understanding IAM, classification, tokenization, masking, and audit requirements, you are leaving a dangerous gap.
6. Designing for an ideal future state only
Every architect likes a clean target state. Real enterprises have legacy cores, duplicate systems, messy reference data, and political constraints.
The role is not to describe perfection. The role is to create a credible path from current mess to better structure.
7. Letting the platform dictate the architecture
Just because the company bought a lakehouse, event bus, or master data tool does not mean every problem should be forced into that shape.
Tools should support the architecture. Too often the reverse happens.
What good data architects do differently
The good ones I have seen tend to do a few things consistently.
They anchor on business capabilities
They ask:
- What business capability is changing?
- What data is essential to that capability?
- What consistency and timeliness matter?
- What regulation applies?
- Which domains are impacted?
That keeps the work grounded.
They think in operating models, not just models
They care about stewardship, ownership, release governance, exception handling, and adoption.
Because a perfect model nobody uses is worthless.
They design for change
Schema evolution, acquisitions, divestments, regulation changes, cloud shifts, new channels, AI use cases. Change is the norm.
Rigid architectures break. Sloppy architectures drift. Good architectures absorb change with controlled friction.
They know where to be strict
This is critical.
Be strict on:
- identifiers
- data ownership
- security classification
- access controls
- event contract quality
- lineage for regulated data
Be more flexible on:
- local domain implementation details
- internal optimization within bounded ownership
- analytics structures for specific use cases
Not everything deserves enterprise control. But some things absolutely do.
How this applies in day-to-day architecture work
People often ask what this means outside theory. Here is the day-to-day reality.
If you are reviewing a new mobile banking feature, the data architect should ask:
- Does this create a new customer identifier?
- Which system owns the updated preference data?
- Is consent data replicated or referenced?
- Should updates be evented over Kafka or retrieved by API?
- Who can access this data in the cloud analytics platform?
- What retention applies?
If you are replacing an IAM platform, the data architect should ask:
- Which attributes are used for authorization decisions?
- Where is identity data mastered?
- How will entitlements map to data domains?
- Can masking and row-level controls be enforced consistently?
- How will audit data be retained and queried?
If you are launching a fraud analytics platform in the cloud, the data architect should ask:
- Which transaction events must be near-real-time?
- What is the trusted account and customer reference source?
- What latency is acceptable for model scoring?
- How will PII be protected in feature engineering datasets?
- What lineage is required for model explainability and investigations?
This is not side work. This is core architecture work.
The uncomfortable truth about “single source of truth”
Strong opinion: the phrase “single source of truth” is overused and often misleading.
In enterprises, there is rarely one truth in the simplistic sense people want. There are authoritative sources for specific domains, attributes, and processes. There are also derived, optimized, and context-specific representations.
A customer’s legal name may be authoritative in one system. Their digital profile preferences may be authoritative elsewhere. Their risk classification may belong to another domain entirely.
The job of the data architect is not to force all truth into one database. It is to make authority explicit, relationships clear, and synchronization controlled.
That is much more useful than repeating slogans.
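Making authority explicit can be as unglamorous as a per-attribute registry. The system and attribute names here are hypothetical:

```python
# Hypothetical registry: who we believe, per attribute, instead of
# pretending one system holds all truth.
AUTHORITY = {
    "legal_name":      "kyc_platform",
    "contact_prefs":   "digital_channel",
    "risk_class":      "risk_engine",
    "account_balance": "core_banking",
}

def authoritative_source(attribute: str) -> str:
    """Answer 'who do we believe for this attribute?' explicitly."""
    try:
        return AUTHORITY[attribute]
    except KeyError:
        raise KeyError(f"no declared authority for '{attribute}' - extend the registry")
```

An unmapped attribute raises an error on purpose: "nobody has decided" should be loud, not silent.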
What skills make a strong data architect?
Not just SQL. Not just modeling. Not just governance language.
A strong data architect usually combines:
- Business understanding: especially domain semantics
- Data modeling skill: conceptual through physical
- Integration knowledge: APIs, messaging, Kafka, CDC, batch
- Cloud awareness: storage, compute, cost, resilience, platform constraints
- Security understanding: IAM, privacy, masking, encryption, audit
- Governance discipline: ownership, metadata, quality, lineage
- Communication skill: translating between executives, engineers, and risk teams
- Judgment: knowing when to standardize and when to leave room for local design
Judgment is the hardest part to hire for. Many people know tools. Fewer know how to make durable trade-offs.
A practical checklist: signs your organization needs stronger data architecture
If several of these are true, you do not need another dashboard project. You need architecture attention.
- Different systems define customer differently
- Kafka topics have inconsistent schemas and ownership
- IAM access decisions are manual and application-specific
- Cloud data stores contain many copies with unclear authority
- Regulatory reporting requires reconciliation across teams
- Data lineage is partial or unreliable
- Teams argue over “golden source” every quarter
- Batch and real-time patterns are chosen for convenience, not need
- Metadata catalog exists but is not trusted
- Data quality issues are discovered by business users, not controls
That is the smell of under-architected data.
Final thought
A data architect is not there to make data neat for its own sake.
The role exists because enterprises need consistent, secure, evolvable data foundations to operate, comply, and change. That means making decisions that span business meaning, technical integration, cloud platforms, IAM, governance, and delivery reality.
Good data architecture is rarely flashy. It often looks like discipline. Clear ownership. Better contracts. Less duplication. Fewer surprises. Faster integration. Safer access. More trustworthy reporting.
And yes, that can feel less exciting than a new platform purchase.
But in enterprise work, boring clarity beats fashionable chaos every single time.
FAQ
1. What is the difference between a data architect and a database architect?
A database architect focuses mainly on database structures, performance, and implementation patterns within database technologies. A data architect works more broadly across enterprise data models, integration, governance, IAM alignment, cloud data patterns, ownership, and cross-system consistency.
2. Does a data architect need to know cloud and Kafka?
Yes. Maybe not as a deep hands-on operator in every case, but absolutely enough to make sound architectural decisions. Modern data architecture depends on understanding event streaming, cloud storage and processing patterns, and platform trade-offs. Without that, the role becomes too theoretical.
3. Is data architecture still relevant in a data mesh or domain-oriented model?
Yes, maybe more than ever. Domain ownership does not remove the need for architecture. It changes the focus. Instead of centralizing every model, the architect defines guardrails for interoperability, identity, governance, security, and shared standards so domains can operate without creating chaos.
4. What are the most common failures in enterprise data architecture?
The big ones are unclear ownership, over-abstract modeling, weak event contracts, governance that is not embedded in delivery, ignoring IAM and security, and believing one tool or one platform will magically solve semantic inconsistency.
5. How do you know if a data architect is effective?
Look for outcomes, not just artifacts. Are data domains clearer? Are integrations cleaner? Are Kafka schemas governed? Is IAM aligned to data classification? Is cloud data duplication reduced? Are reporting and regulatory issues decreasing? Effective architects leave the enterprise easier to change, not just better documented.