Most companies do not have a data problem. They have an architecture discipline problem.
That sounds harsh, but after enough enterprise programs, enough “data transformations,” enough cloud migrations with six steering committees and zero clarity, you start to see the pattern. The issue is rarely that the company lacks data. It is drowning in it. The issue is that nobody made hard decisions about how data should be structured, governed, shared, secured, and evolved.
That is where the data architect comes in.
And no, a data architect is not just the person who draws database diagrams, picks a warehouse, or lectures teams about naming conventions. That view is outdated and honestly a little dangerous. In modern enterprises, especially in banking, insurance, healthcare, telecom, and any business with regulatory pressure, a data architect sits right in the middle of business change, platform design, integration, security, and operating model.
So let’s answer the question simply first.
What does a data architect do?
A data architect designs how data is structured, moved, governed, secured, and used across an organization so that systems, teams, and business processes can work consistently at scale.
That is the short SEO-friendly answer.
Here is the real one.
A data architect decides what the enterprise should treat as important data, where that data should live, how it should flow between systems, who should be allowed to use it, how it should be modeled, how long it should be retained, what quality standards matter, and how all of that can change over time without breaking the company.
It is part technical design, part policy, part negotiation, part future-proofing, and part saying “no” when everyone else wants a shortcut.
That last part is underrated.
Because real architecture work is not about making pretty target-state diagrams. It is about reducing long-term chaos while still letting delivery teams ship something this quarter.
The role is broader than most job descriptions admit
A lot of job descriptions for data architects are nonsense. They list every database product on the market, ask for ten years of experience in tools that have existed for four, and then quietly expect the person to fix data governance, cloud architecture, metadata strategy, and reporting inconsistency at the same time.
In practice, a competent data architect usually works across five concerns:
- Data modeling
- Data integration
- Data governance
- Data security and access
- Data platform alignment
If one of these is missing, the architecture starts to wobble.
For example, you can have a beautiful cloud data lake, but if IAM is weak and nobody knows which data products are authoritative, you have built a very expensive confusion engine.
Or you can have strict governance, but if your integration pattern is still nightly batch files pretending to be modern architecture, the business will route around you.
Good data architecture is not just about storage. It is about decisions.
What a data architect actually does in real enterprise work
Let’s get practical. In a real enterprise, the data architect is involved in work like this:
- Defining canonical business entities such as customer, account, transaction, policy, product, device, employee
- Deciding when canonical models help and when they become bureaucratic fiction
- Designing event schemas for Kafka topics
- Setting standards for data quality, lineage, metadata, and ownership
- Aligning operational systems, analytics platforms, and regulatory reporting
- Working with IAM teams on access models for sensitive data
- Defining how master and reference data should be managed
- Choosing between replication, event streaming, APIs, batch, CDC, or file-based exchange
- Designing retention and archival patterns in cloud environments
- Handling data residency and classification constraints
- Supporting program teams during mergers, core platform replacements, cloud migrations, or digital channel launches
That is the real job. It is messy. It crosses boundaries. It involves politics. It often requires telling a delivery team that their “quick fix” creates six years of integration debt.
A data architect who only focuses on schema design is useful, but limited. A strong enterprise data architect understands the operating model around the data, not just the tables.
The biggest misconception: data architecture is not BI architecture
This one needs saying clearly.
Many organizations still confuse data architecture with reporting architecture. They think if they have a warehouse, a lakehouse, a BI tool, and some dashboards, then they have solved data architecture.
They have not.
Reporting is downstream. Data architecture starts upstream.
If customer identifiers are inconsistent across channels, if transaction events are published differently by every domain, if reference data changes without controls, if IAM policies are manually managed in twenty places, then your dashboard layer is just polishing disorder.
This is why experienced architects spend so much time on foundational questions:
- What is the system of record for this data?
- What is the source of truth versus a derived copy?
- What is the lifecycle of this data?
- What is the contract for sharing it?
- What level of consistency is actually required?
- Which teams own which data domains?
- How is access granted and audited?
- What happens when the schema changes?
Those are architecture questions. Not dashboard questions.
A useful way to think about the role
Here is a practical breakdown. The role spans structure (how data is modeled), movement (how it flows between systems), governance (who owns and controls it), security (who may access it), platform alignment (where it lives and runs), and finally the link back to business capabilities.
That last concern matters more than many architects admit. If you cannot explain why the data design supports business capabilities, you are doing theory, not architecture.
The levels of data architecture: conceptual, logical, physical
This is old-school, but still useful if applied well.
1. Conceptual data architecture
This is the business-level view. What are the core entities? Customer. Account. Payment. Loan. Identity. Product. Merchant. Employee.
At this level, the architect helps the organization agree on meaning.
Simple? In theory. In reality, “customer” in a bank can mean prospect, applicant, account holder, authorized signatory, card user, or legal entity. If you skip that ambiguity and move straight to implementation, you build fragmentation into the enterprise.
2. Logical data architecture
This defines relationships, attributes, data domains, ownership, and information flows without being tied to one specific technology.
This is where the architect decides things like:
- How customer identity relates to party, household, account, and consent
- Which fields are mandatory
- Which identifiers are enterprise-wide versus local
- What events should exist and what they mean
- Which domains own which records
3. Physical data architecture
This is where the design gets real. Tables, topics, buckets, partitions, indexes, object stores, warehouse schemas, retention policies, replication rules.
Contrarian view: many enterprise architects spend too much time at the conceptual level because it feels strategic. But if you never pressure-test the physical implications, your architecture is fantasy. A Kafka topic with a bad event contract is still a bad architecture, even if the PowerPoint was elegant.
Data architect vs data engineer vs enterprise architect
These roles overlap, and that causes confusion.
Data architect
Owns the structure, principles, standards, and target-state decisions for enterprise data.
Data engineer
Builds and operates pipelines, transformations, storage patterns, and data processing jobs.
Enterprise architect
Connects data architecture to broader business, application, technology, and capability architecture.
Solution architect
Designs the end-to-end architecture for a specific initiative or platform change.
In healthy organizations, these roles collaborate. In unhealthy ones, they fight over territory or leave gaps.
Here is my blunt opinion: if your data architect cannot talk credibly with engineers about Kafka partitions, CDC trade-offs, cloud storage tiers, or IAM patterns, they are too detached. If your data engineer thinks data architecture is “just governance,” they are too narrow. If your enterprise architect ignores data because “the platform team has that,” they are asleep at the wheel.
Where data architects spend their time in modern enterprises
Not enough people say this honestly: the role is less about creating artifacts and more about making decisions stick.
A large part of the week goes into:
- Reviewing solution designs
- Resolving ownership disputes
- Defining standards that teams will actually adopt
- Challenging inconsistent data definitions
- Sitting with security and IAM teams on access patterns
- Working with cloud platform teams on storage and movement constraints
- Reviewing API and event contracts
- Helping delivery teams avoid bad shortcuts
- Explaining why two systems should not both become “golden sources”
- Translating business language into data domains and architectural consequences
The glamorous part is drawing the target state. The valuable part is preventing five incompatible local optimizations from becoming permanent.
Real architecture example: banking, Kafka, IAM, and cloud
Let’s use a realistic enterprise example.
A retail bank is modernizing its customer and payments landscape. It has:
- A legacy core banking platform
- A CRM used by branch and contact center teams
- Mobile and web channels in the cloud
- A Kafka platform for event streaming
- A cloud data platform for analytics and regulatory reporting
- IAM based on enterprise identity, role-based access, and fine-grained data access controls
The business goal sounds simple: create a single customer view and near-real-time transaction insight across channels.
This is exactly the kind of goal executives love because it sounds clean and modern. In practice, it is architectural dynamite.
What the data architect has to solve
1. Define what “customer” means
The bank already has multiple customer representations:
- Core banking customer ID
- CRM party ID
- Digital channel identity profile
- AML/KYC identity record
- Marketing segmentation profile
If nobody defines the canonical relationships, every platform invents its own “single view.”
A good data architect does not pretend one model will erase all differences. Instead, they define:
- Enterprise identifiers
- Cross-reference and survivorship rules
- Domain ownership
- Data quality expectations
- Acceptable lag and synchronization patterns
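These cross-reference and survivorship rules can be made concrete. Here is a minimal sketch, where the source system names, attributes, and precedence order are all illustrative assumptions, not a prescribed model:

```python
# Per-attribute survivorship: which source system wins when records
# disagree. System and attribute names are hypothetical examples.
SURVIVORSHIP = {
    "legal_name":     ["kyc", "core_banking", "crm"],
    "email":          ["digital_channel", "crm", "core_banking"],
    "postal_address": ["crm", "core_banking", "digital_channel"],
}

def merge_customer(records: dict) -> dict:
    """Build a consolidated customer view from per-system records.

    `records` maps a source system name to its local customer record.
    For each attribute, the first system in the precedence list that
    holds a non-empty value survives into the merged view.
    """
    merged = {}
    for attr, precedence in SURVIVORSHIP.items():
        for source in precedence:
            value = records.get(source, {}).get(attr)
            if value:
                merged[attr] = value
                break
    return merged

merged = merge_customer({
    "core_banking":    {"legal_name": "A. N. Other", "postal_address": ""},
    "kyc":             {"legal_name": "Alice N. Other"},
    "digital_channel": {"email": "alice@example.com"},
})
# KYC wins legal_name; the digital channel wins email.
```

The point is not the code, it is that precedence and ownership are written down and testable, instead of living in someone's head.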
2. Design event contracts in Kafka
The bank wants customer and transaction events in Kafka.
This is where immature teams make a classic mistake: they dump database changes into Kafka and call it event-driven architecture.
That is not architecture. That is leakage.
The data architect should push for clear event semantics:
- CustomerCreated
- CustomerContactDetailsUpdated
- AccountOpened
- PaymentInitiated
- PaymentSettled
Each event needs:
- Stable schema
- Versioning rules
- Ownership
- Classification
- Retention policy
- Consumer expectations
If the event model is weak, downstream systems become tightly coupled to source internals. Then every schema change becomes a political incident.
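What an event contract looks like in practice can be sketched simply. The field names, versioning scheme, and classification label below are assumptions for illustration, not a bank's real contract:

```python
# Hypothetical contract for a CustomerContactDetailsUpdated event.
# In a real platform this would live in a schema registry, not code.
CONTRACT = {
    "type": "CustomerContactDetailsUpdated",
    "version": 2,                      # consumers pin a version
    "owner": "customer-domain",        # who approves schema changes
    "classification": "confidential",  # drives IAM and retention
    "required": ["event_id", "customer_id", "occurred_at", "contact"],
}

def validate(event: dict, contract: dict = CONTRACT) -> list:
    """Return a list of contract violations; an empty list means the event passes."""
    errors = []
    if event.get("type") != contract["type"]:
        errors.append("wrong event type")
    if event.get("version") != contract["version"]:
        errors.append("version mismatch")
    errors += [f"missing field: {f}" for f in contract["required"] if f not in event]
    return errors

ok = validate({
    "type": "CustomerContactDetailsUpdated",
    "version": 2,
    "event_id": "e-123",
    "customer_id": "c-9",
    "occurred_at": "2024-05-01T10:00:00Z",
    "contact": {"email": "alice@example.com"},
})
# ok == []  -> event satisfies the contract
```

Notice that the contract carries ownership and classification alongside the schema. That is the difference between a topic and a governed data product.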
3. Align IAM with data sensitivity
Not every user, service, analyst, or application should see full customer data.
The data architect works with IAM and security teams to define:
- Data classification levels
- Attribute-level masking
- Role- and policy-based access
- Separation between operational access and analytical access
- Auditability for sensitive fields like national ID, account balance, transaction details
This is where many architectures fail quietly. They design beautiful data flows, then bolt on access control later. In banking, that is reckless.
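To show what "IAM aligned to classification" can mean concretely, here is a minimal sketch of attribute-level masking driven by classification and caller role. The classification levels, roles, and clearances are assumptions for the example, not a reference IAM design:

```python
# Hypothetical attribute classifications and role clearances.
CLASSIFICATION = {
    "customer_id": "internal",
    "legal_name":  "confidential",
    "national_id": "restricted",
    "balance":     "restricted",
}

# Which classification levels each role may see unmasked.
ROLE_CLEARANCE = {
    "analyst":      {"internal"},
    "branch_staff": {"internal", "confidential"},
    "fraud_ops":    {"internal", "confidential", "restricted"},
}

def mask_record(record: dict, role: str) -> dict:
    """Mask every attribute the role is not cleared to see.

    Unknown attributes default to the strictest level, so a new field
    is masked until someone classifies it deliberately.
    """
    allowed = ROLE_CLEARANCE.get(role, set())
    return {
        attr: value if CLASSIFICATION.get(attr, "restricted") in allowed else "***"
        for attr, value in record.items()
    }

row = {"customer_id": "c-9", "legal_name": "Alice", "national_id": "1234", "balance": 100}
mask_record(row, "analyst")
# -> {'customer_id': 'c-9', 'legal_name': '***', 'national_id': '***', 'balance': '***'}
```

The deliberate design choice is the default: unclassified data is treated as restricted, so forgetting to classify a field fails safe rather than leaking it.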
4. Decide cloud data patterns
The cloud platform team wants all data landed in object storage, then transformed into curated layers.
Fine, maybe. But the data architect has to challenge lazy assumptions:
- Does every domain need raw retention?
- Which datasets require immutable retention?
- Which data should remain in operational stores?
- What latency matters for fraud versus reporting?
- Which workloads belong in warehouse structures versus stream processing?
- What residency and encryption controls apply?
Cloud gives options. It does not remove architectural responsibility.
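One way to keep those answers from living in tribal knowledge is to capture them as explicit per-dataset policy. The dataset names, stores, tiers, and retention periods below are illustrative assumptions:

```python
# Hypothetical per-dataset placement and retention decisions,
# written down as data so they can be reviewed and enforced.
DATA_POLICIES = {
    "payments.raw": {
        "store": "object_storage", "tier_after_days": 30,
        "retain_days": 3650, "immutable": True,   # regulatory hold
    },
    "customer.curated": {
        "store": "warehouse", "tier_after_days": None,
        "retain_days": 1825, "immutable": False,
    },
    "fraud.features": {
        "store": "stream", "tier_after_days": None,
        "retain_days": 7, "immutable": False,     # low latency, short-lived
    },
}

def requires_legal_hold_review(dataset: str) -> bool:
    """Immutable datasets need review before any deletion or rewrite."""
    return DATA_POLICIES[dataset]["immutable"]
```

A policy table like this will not survive unchanged, and it should not. The value is that changing it becomes a visible architectural decision instead of a quiet platform default.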
What success looks like
A successful data architecture in this bank would produce:
- Clear customer domain ownership
- Reusable enterprise identifiers
- Event contracts with governance and schema discipline
- IAM policies aligned to data classification
- Stream and batch patterns chosen intentionally
- Trusted curated datasets for analytics and compliance
- Reduced reconciliation across channels
What failure looks like
And failure? That is easy to recognize:
- Every team publishes “customer” differently
- Kafka becomes a dumping ground for internal records
- IAM is manually configured per application
- The cloud data lake becomes a graveyard of unclear copies
- Regulatory reporting still needs spreadsheet reconciliation
- Executives keep asking why the “single customer view” is inconsistent
That is not a tooling failure. That is a data architecture failure.
Common mistakes data architects make
Architects are not innocent in this story. We create our own problems too.
1. Being too abstract
Some data architects live in principle-land. They produce conceptual models, standards, and governance decks, but never engage with actual implementation constraints.
If your architecture cannot survive contact with Kafka topic design, cloud storage economics, IAM enforcement models, or legacy integration realities, it is not architecture. It is commentary.
2. Confusing standardization with value
Not everything needs an enterprise canonical model. Sometimes a local domain model with a clean contract is better.
This is a contrarian point because enterprises love canonical everything. But over-standardization can slow delivery, create fake alignment, and hide domain nuance.
You do need shared meaning. You do not need one giant universal schema for all time.
3. Ignoring ownership
A lot of governance efforts fail because nobody made ownership explicit.
Who owns customer contact data? Who owns transaction status? Who approves schema changes? Who signs off on quality rules?
If the answer is “the enterprise” or “the data team,” the answer is probably “nobody.”
4. Treating governance as paperwork
Governance that only exists in policy documents is dead on arrival.
Governance must show up in:
- schema registries
- data catalogs
- access workflows
- lineage tooling
- deployment controls
- quality gates
- release processes
Otherwise teams bypass it. And honestly, they should.
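What "governance embedded in delivery" can look like: a compatibility gate that runs before a schema change ships. The rule set here (no removed fields, no type changes, additions allowed) is a simplified assumption; real registries enforce richer compatibility modes:

```python
# Sketch of a backward-compatibility gate for flat field->type schemas.
# Could run in CI as a quality gate before a schema change deploys.
def breaking_changes(old: dict, new: dict) -> list:
    """List changes that would break existing consumers."""
    problems = []
    for field, ftype in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != ftype:
            problems.append(f"type change on {field}: {ftype} -> {new[field]}")
    return problems  # adding new optional fields is allowed

old = {"customer_id": "string", "email": "string"}
new = {"customer_id": "string", "email": "string", "phone": "string"}
breaking_changes(old, new)  # -> [] : additive change, the gate passes

bad = {"customer_id": "int"}
# breaking_changes(old, bad) flags both the removed email field
# and the type change on customer_id.
```

When a check like this fails the build, governance stops being a policy document and becomes part of how software ships.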
5. Forgetting security architecture
Data architects sometimes assume security is another team’s problem. Big mistake.
If you design data movement, storage, and sharing without understanding IAM, classification, tokenization, masking, and audit requirements, you are leaving a dangerous gap.
6. Designing for an ideal future state only
Every architect likes a clean target state. Real enterprises have legacy cores, duplicate systems, messy reference data, and political constraints.
The role is not to describe perfection. The role is to create a credible path from current mess to better structure.
7. Letting the platform dictate the architecture
Just because the company bought a lakehouse, event bus, or master data tool does not mean every problem should be forced into that shape.
Tools should support the architecture. Too often the reverse happens.
What good data architects do differently
The good ones I have seen tend to do a few things consistently.
They anchor on business capabilities
They ask:
- What business capability is changing?
- What data is essential to that capability?
- What consistency and timeliness matter?
- What regulation applies?
- Which domains are impacted?
That keeps the work grounded.
They think in operating models, not just models
They care about stewardship, ownership, release governance, exception handling, and adoption.
Because a perfect model nobody uses is worthless.
They design for change
Schema evolution, acquisitions, divestments, regulation changes, cloud shifts, new channels, AI use cases. Change is the norm.
Rigid architectures break. Sloppy architectures drift. Good architectures absorb change with controlled friction.
They know where to be strict
This is critical.
Be strict on:
- identifiers
- data ownership
- security classification
- access controls
- event contract quality
- lineage for regulated data
Be more flexible on:
- local domain implementation details
- internal optimization within bounded ownership
- analytics structures for specific use cases
Not everything deserves enterprise control. But some things absolutely do.
How this applies in day-to-day architecture work
People often ask what this means outside theory. Here is the day-to-day reality.
If you are reviewing a new mobile banking feature, the data architect should ask:
- Does this create a new customer identifier?
- Which system owns the updated preference data?
- Is consent data replicated or referenced?
- Should updates be evented over Kafka or retrieved by API?
- Who can access this data in the cloud analytics platform?
- What retention applies?
If you are replacing an IAM platform, the data architect should ask:
- Which attributes are used for authorization decisions?
- Where is identity data mastered?
- How will entitlements map to data domains?
- Can masking and row-level controls be enforced consistently?
- How will audit data be retained and queried?
If you are launching a fraud analytics platform in the cloud, the data architect should ask:
- Which transaction events must be near-real-time?
- What is the trusted account and customer reference source?
- What latency is acceptable for model scoring?
- How will PII be protected in feature engineering datasets?
- What lineage is required for model explainability and investigations?
This is not side work. This is core architecture work.
The uncomfortable truth about “single source of truth”
Strong opinion: the phrase “single source of truth” is overused and often misleading.
In enterprises, there is rarely one truth in the simplistic sense people want. There are authoritative sources for specific domains, attributes, and processes. There are also derived, optimized, and context-specific representations.
A customer’s legal name may be authoritative in one system. Their digital profile preferences may be authoritative elsewhere. Their risk classification may belong to another domain entirely.
The job of the data architect is not to force all truth into one database. It is to make authority explicit, relationships clear, and synchronization controlled.
That is much more useful than repeating slogans.
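Making authority explicit can be as unglamorous as a per-attribute registry. The system and attribute names here are hypothetical:

```python
# Hypothetical registry: who we believe, per attribute, instead of
# pretending one system holds all truth.
AUTHORITY = {
    "legal_name":      "kyc_platform",
    "contact_prefs":   "digital_channel",
    "risk_class":      "risk_engine",
    "account_balance": "core_banking",
}

def authoritative_source(attribute: str) -> str:
    """Answer 'who do we believe for this attribute?' explicitly."""
    try:
        return AUTHORITY[attribute]
    except KeyError:
        raise KeyError(f"no declared authority for '{attribute}' - extend the registry")
```

An unmapped attribute raises an error on purpose: "nobody has decided" should be loud, not silent.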
What skills make a strong data architect?
Not just SQL. Not just modeling. Not just governance language.
A strong data architect usually combines:
- Business understanding: especially domain semantics
- Data modeling skill: conceptual through physical
- Integration knowledge: APIs, messaging, Kafka, CDC, batch
- Cloud awareness: storage, compute, cost, resilience, platform constraints
- Security understanding: IAM, privacy, masking, encryption, audit
- Governance discipline: ownership, metadata, quality, lineage
- Communication skill: translating between executives, engineers, and risk teams
- Judgment: knowing when to standardize and when to leave room for local design
Judgment is the hardest part to hire for. Many people know tools. Fewer know how to make durable trade-offs.
A practical checklist: signs your organization needs stronger data architecture
If several of these are true, you do not need another dashboard project. You need architecture attention.
- Different systems define customer differently
- Kafka topics have inconsistent schemas and ownership
- IAM access decisions are manual and application-specific
- Cloud data stores contain many copies with unclear authority
- Regulatory reporting requires reconciliation across teams
- Data lineage is partial or unreliable
- Teams argue over “golden source” every quarter
- Batch and real-time patterns are chosen for convenience, not need
- Metadata catalog exists but is not trusted
- Data quality issues are discovered by business users, not controls
That is the smell of under-architected data.
Final thought
A data architect is not there to make data neat for its own sake.
The role exists because enterprises need consistent, secure, evolvable data foundations to operate, comply, and change. That means making decisions that span business meaning, technical integration, cloud platforms, IAM, governance, and delivery reality.
Good data architecture is rarely flashy. It often looks like discipline. Clear ownership. Better contracts. Less duplication. Fewer surprises. Faster integration. Safer access. More trustworthy reporting.
And yes, that can feel less exciting than a new platform purchase.
But in enterprise work, boring clarity beats fashionable chaos every single time.
FAQ
1. What is the difference between a data architect and a database architect?
A database architect focuses mainly on database structures, performance, and implementation patterns within database technologies. A data architect works more broadly across enterprise data models, integration, governance, IAM alignment, cloud data patterns, ownership, and cross-system consistency.
2. Does a data architect need to know cloud and Kafka?
Yes. Maybe not as a deep hands-on operator in every case, but absolutely enough to make sound architectural decisions. Modern data architecture depends on understanding event streaming, cloud storage and processing patterns, and platform trade-offs. Without that, the role becomes too theoretical.
3. Is data architecture still relevant in a data mesh or domain-oriented model?
Yes, maybe more than ever. Domain ownership does not remove the need for architecture. It changes the focus. Instead of centralizing every model, the architect defines guardrails for interoperability, identity, governance, security, and shared standards so domains can operate without creating chaos.
4. What are the most common failures in enterprise data architecture?
The big ones are unclear ownership, over-abstract modeling, weak event contracts, governance that is not embedded in delivery, ignoring IAM and security, and believing one tool or one platform will magically solve semantic inconsistency.
5. How do you know if a data architect is effective?
Look for outcomes, not just artifacts. Are data domains clearer? Are integrations cleaner? Are Kafka schemas governed? Is IAM aligned to data classification? Is cloud data duplication reduced? Are reporting and regulatory issues decreasing? Effective architects leave the enterprise easier to change, not just better documented.