What Is a Data Architect?

Most companies do not have a data problem. They have a decision problem disguised as a data problem.

That sounds harsh, but it’s true. I’ve watched enterprises spend millions on cloud platforms, Kafka estates, lakehouses, governance councils, and “AI readiness” programs, only to discover they still can’t answer basic questions like: Which customer record is the right one? Who is allowed to see this data? What event actually means an account was opened? The technology is rarely the first failure. The architecture is.

And that’s where the data architect comes in.

A data architect is not just the person who draws boxes around databases. Not anymore. In a modern enterprise, a data architect defines how data is structured, moved, governed, secured, and made useful across business and technology boundaries. They make sure data works as an enterprise asset rather than a pile of disconnected project outputs.

That’s the simple version. Good for SEO, probably. But it’s incomplete.

Because in real architecture work, a data architect is part translator, part systems thinker, part governance enforcer, and part pragmatist. They sit in the uncomfortable space between business ambition and technical reality. They decide what should be canonical, what should be event-driven, what should remain local, what belongs in cloud analytics platforms, what must stay under tighter control, and where identity and access management has to shape the data design itself.

A good data architect helps the enterprise move faster. A bad one creates elegant diagrams no one can implement.

Let’s get into the real thing.

The simple definition first

If someone asks you in a corridor, “What does a data architect do?”, the clean answer is this:

A data architect designs the structure, flow, integration, governance, and security of data across systems so the enterprise can use data reliably and safely.

That includes things like:

  • data models
  • integration patterns
  • event and message design
  • master and reference data
  • data quality rules
  • metadata and lineage
  • retention and compliance
  • access controls
  • platform alignment across on-prem and cloud

That definition is fine. But if you stop there, you’ll end up treating data architecture like a technical specialty sitting under one platform team. That’s a mistake.

Data architecture is enterprise architecture, whether people admit it or not.

The real explanation: data architecture is about control, meaning, and movement

The reason data architecture matters is not because data is valuable. Everyone says that. It’s become corporate wallpaper.

Data architecture matters because enterprises are messy. They grow through product launches, mergers, regulations, vendor packages, siloed delivery teams, and political compromises. Every one of those leaves data scars behind.

So the data architect has to answer hard questions like:

  • What does “customer” actually mean in this bank?
  • Which system is authoritative for legal identity, and which one is only operationally convenient?
  • Should account lifecycle updates be shared as Kafka events, APIs, batch feeds, or all three?
  • How do IAM policies map to data domains and sensitive attributes?
  • What data can move to cloud analytics, and what must remain tokenized or restricted?
  • How do we avoid five teams inventing five incompatible event schemas for the same business fact?

This is why I push back when people describe data architects as “data model experts.” Modeling is part of the job. It is not the job.

The real job is designing for meaning, trust, and operational fit.

A data architect has to care about semantics and plumbing at the same time. That’s annoying, but unavoidable.

What a data architect actually does in enterprise work

The role changes by organization, but in serious enterprise environments the work usually lands in a few areas.

Diagram 1 — What Is a Data Architect

1. Defining core data domains

A data architect helps identify major business data domains such as:

  • customer
  • account
  • payment
  • product
  • employee
  • identity and access
  • risk
  • transaction
  • reference data

This is not just taxonomy theatre. Domain boundaries affect ownership, integration, data quality accountability, and security controls.

If your retail banking team, lending team, and wealth team all define customer differently, your architecture is already broken, no matter how modern your tooling is.

2. Designing data models that survive beyond one project

Project teams often optimize for immediate delivery. Fair enough. But somebody has to think about whether a model will still make sense when three more systems integrate with it next year.

That means conceptual, logical, and sometimes physical modeling. It also means deciding when to standardize and when to tolerate variation.

Contrarian point: not everything needs an enterprise canonical model. In fact, forcing a giant canonical model onto every integration is one of the classic architecture mistakes. Sometimes a stable domain event contract is better than a heavyweight canonical structure no one likes and everyone bypasses.

3. Designing data movement patterns

This is where architecture gets real.

A data architect decides, with integration and solution architects, how data should move:

  • batch ETL
  • APIs
  • CDC
  • event streaming via Kafka
  • file exchange
  • replication
  • virtualization
  • data products in cloud platforms

Each pattern has trade-offs. People love to say “everything should be event-driven.” No, it shouldn’t. Event streaming is powerful, but it is not a religion.

Kafka is fantastic for distributing business events at scale, decoupling producers and consumers, and enabling near-real-time data products. It is also a very efficient way to spread bad semantics quickly if your event design is sloppy.

A data architect must care not just about how data moves, but what business meaning survives the journey.

4. Setting governance that is usable, not ceremonial

Governance is where many architecture efforts go to die.

The data architect often helps define:

  • ownership and stewardship
  • business glossaries
  • classification rules
  • quality controls
  • retention rules
  • lineage requirements
  • policy enforcement patterns

The trick is making governance operational. If governance exists only in PowerPoint or committee minutes, it is not architecture. It is theatre.
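To make "operational" concrete, here is a minimal sketch of classification rules expressed as code that a pipeline gate can actually check. The levels and field mappings are my own illustrative assumptions, not a standard scheme.

```python
# Illustrative only: the classification levels and field mappings below
# are assumptions for the sketch, not an enterprise standard.
CLASSIFICATION_LEVELS = ["PUBLIC", "INTERNAL", "CONFIDENTIAL", "RESTRICTED"]

# Field-level classification for a hypothetical customer dataset
FIELD_CLASSIFICATION = {
    "customer_id": "INTERNAL",
    "email": "CONFIDENTIAL",
    "national_id": "RESTRICTED",
    "segment": "INTERNAL",
}

def dataset_classification(fields):
    """A dataset inherits the highest classification among its fields."""
    rank = {level: i for i, level in enumerate(CLASSIFICATION_LEVELS)}
    return max((FIELD_CLASSIFICATION[f] for f in fields), key=rank.__getitem__)
```

A pipeline that calls this before data lands anywhere is governance doing work. The same rules in a committee minute are governance doing theatre.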

5. Aligning security and IAM with data architecture

This part is routinely underestimated.

Data architecture and IAM should be tightly connected. Who can access what data, under what context, through which platform, with what masking or tokenization, is not just a security question. It is an architectural one.

In banking especially, this matters a lot. Customer PII, transaction history, KYC data, sanctions screening data, and internal risk signals all have different access requirements.

A mature data architect works with IAM and security architects on things like:

  • attribute-based access control
  • role mapping to data domains
  • privileged access boundaries
  • tokenization and masking patterns
  • service identity design for Kafka producers/consumers
  • data access controls in cloud warehouses and data lakes

If those concerns are bolted on later, the architecture usually becomes brittle and expensive.
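As a sketch of what "built in" rather than "bolted on" can look like, here is an attribute-based access decision in miniature. The attribute names, clearance levels, and policy shape are invented for illustration; real platforms express this in their own policy languages.

```python
# Hypothetical ABAC sketch: attribute names and levels are assumptions.
def is_access_allowed(subject: dict, resource: dict, policy: dict) -> bool:
    """Allow access only if the subject's role and purpose match the policy
    and the resource classification is within the subject's clearance."""
    clearance_order = ["PUBLIC", "INTERNAL", "CONFIDENTIAL", "RESTRICTED"]
    if subject["role"] not in policy["allowed_roles"]:
        return False
    if subject["purpose"] not in policy["allowed_purposes"]:
        return False
    return (clearance_order.index(resource["classification"])
            <= clearance_order.index(subject["clearance"]))

# Invented example: a fraud analyst cleared up to CONFIDENTIAL
policy = {
    "allowed_roles": {"fraud-analyst", "kyc-officer"},
    "allowed_purposes": {"fraud-investigation"},
}
analyst = {"role": "fraud-analyst", "purpose": "fraud-investigation",
           "clearance": "CONFIDENTIAL"}
kyc_profile = {"classification": "RESTRICTED"}
```

The point is not this particular function. It is that the decision has a shape, and that shape should be part of the data design from day one.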

6. Creating a roadmap, not just a target state

This is where real architects separate themselves from diagram enthusiasts.

It’s easy to draw a target-state data platform with cloud-native ingestion, domain data products, centralized metadata, policy-as-code, and streaming integration. Lovely.

The hard part is sequencing from today’s reality to that future without blowing up delivery.

A data architect must build transition states:

  • what gets standardized first
  • where Kafka adds value now
  • which data domains need stewardship before migration
  • what stays on-prem for regulatory or latency reasons
  • how IAM controls evolve alongside cloud adoption

Architecture is not just the destination. It’s the route that people can actually travel.

What a data architect is not

It helps to be blunt here.

A data architect is not:

  • just a DBA with a broader title
  • just a reporting specialist
  • just the owner of a data warehouse
  • just a governance admin
  • just someone who reviews schemas
  • just a cloud data platform engineer
  • just an enterprise architect who occasionally mentions data

Those can all overlap with the role. But the data architect’s value is in connecting them.

And no, the role is not obsolete because “the product teams own their data now.” That’s a fashionable misunderstanding. Domain ownership is useful. Enterprise coherence is still necessary. Without it, you don’t get autonomy. You get fragmentation with better branding.

The difference between a good data architect and a weak one

Here’s the split that, frankly, captures more reality than most role descriptions:

  • A good data architect designs for adoption; a weak one designs for elegance.
  • A good one sequences transition states; a weak one only draws target states.
  • A good one builds IAM and governance into the design; a weak one bolts them on later.
  • A good one can discuss actual datasets, event contracts, and system constraints; a weak one stays abstract.

That’s the real split. Not certifications. Not tools. Judgment.

How this applies in real architecture work

Let’s make this practical, because this is where many articles become fluffy.

Diagram 2 — What Is a Data Architect

In day-to-day enterprise work, a data architect is involved in decisions like:

During a new digital banking initiative

The bank wants a new mobile onboarding journey. Product wants instant account creation. Compliance needs KYC checks. Risk wants fraud signals. Marketing wants customer events in real time. Analytics wants all onboarding data in cloud for funnel analysis.

A data architect helps answer:

  • Which system becomes the source of truth for customer identity?
  • What event is published to Kafka when onboarding reaches a meaningful state?
  • What fields are allowed into cloud analytics without violating privacy controls?
  • How is IAM enforced so only approved services and roles can access KYC attributes?
  • What customer identifier is shared across channels and downstream systems?

Without data architecture, every team invents its own answer.

During a merger or acquisition

Two banks merge. Both have customer master data. Both have product hierarchies. Both have IAM structures. Both claim their system is the “golden source.”

This is exactly where data architecture earns its keep.

The data architect has to define:

  • survivorship rules
  • data domain ownership
  • reference data harmonization
  • event contract convergence
  • access policy alignment
  • migration and coexistence patterns

This is ugly work. Political, messy, full of compromise. It is also core architecture work.

During cloud modernization

An enterprise moves analytics and selected operational data services to cloud. Suddenly everyone wants to ingest everything into the new platform.

A strong data architect slows the chaos down just enough to ask:

  • What data classifications apply?
  • Which datasets require masking, tokenization, or regional residency controls?
  • How will IAM roles map across cloud services, data platforms, and consuming teams?
  • What lineage is mandatory?
  • Which Kafka topics should feed cloud data products directly, and which need curation first?

Cloud doesn’t remove architecture. It punishes the lack of it faster.

A real enterprise example: retail banking, Kafka, IAM, and cloud

Let’s take a realistic example.

A mid-sized retail bank is modernizing customer and account data flows. Historically, it has:

  • a core banking platform on-prem
  • a CRM platform
  • a digital banking channel stack
  • separate KYC and AML systems
  • nightly batch feeds into an enterprise warehouse
  • fragmented IAM, with inconsistent service accounts and broad human access
  • a new cloud data platform for analytics and machine learning
  • Kafka introduced as the strategic event backbone

The bank’s business goals are sensible enough:

  • near-real-time customer updates across channels
  • faster onboarding decisions
  • reduced reconciliation effort
  • better fraud and risk analytics
  • controlled movement of data into cloud
  • stronger auditability and access control

Now here’s where people usually go wrong. They think the answer is simply:

  1. publish customer events into Kafka
  2. land everything in cloud
  3. let downstream teams consume what they need

That is not architecture. That is hopeful plumbing.

A real data architecture approach would look more like this.

Step 1: Define the domain boundaries

The architect separates:

  • party/customer identity
  • account
  • onboarding case
  • KYC profile
  • transaction
  • channel interaction
  • IAM identity and entitlements

These are related, but not the same thing. Keeping them distinct matters.

Step 2: Identify authoritative sources

For example:

  • legal customer identity: mastered in customer domain service with validated KYC inputs
  • account status: authoritative in core banking
  • digital profile preferences: authoritative in channel platform
  • access entitlements: authoritative in IAM platform

This avoids the common disaster where every downstream copy starts behaving like a source system.
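The list above can be made explicit as a small "system of record" registry. The system names mirror Step 2; the lookup helper is an assumption about how teams might check authority before wiring a new integration.

```python
# Illustrative registry: attribute and system names are from the example
# above; the helper is a sketch, not a product feature.
SYSTEM_OF_RECORD = {
    "legal_identity": "customer-domain-service",
    "account_status": "core-banking",
    "digital_preferences": "channel-platform",
    "access_entitlements": "iam-platform",
}

def authoritative_source(attribute: str) -> str:
    """Fail loudly when nobody has claimed authority for an attribute.
    An unclaimed attribute is an architecture gap, not a coding detail."""
    try:
        return SYSTEM_OF_RECORD[attribute]
    except KeyError:
        raise LookupError(f"No authoritative source defined for '{attribute}'")
```

The value is the explicitness: every integration either names its authority or surfaces the gap immediately.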

Step 3: Define Kafka event contracts carefully

Instead of publishing giant “customer changed” blobs, the bank defines meaningful business events such as:

  • CustomerIdentityVerified
  • CustomerContactDetailsUpdated
  • AccountOpened
  • AccountStatusChanged
  • KYCReviewCompleted

Each event has:

  • clear business meaning
  • stable identifiers
  • versioning rules
  • classification tags
  • producer ownership
  • consumer expectations documented

This is boring work. It is also the difference between a usable event backbone and topic sprawl.
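One of those contracts might be sketched like this. The field names, envelope shape, and versioning comment are illustrative assumptions, not a published schema.

```python
# Sketch of one event contract from the list above. Field names and the
# envelope shape are invented for illustration.
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AccountOpened:
    """Business event: an account reached the 'opened' state."""
    account_id: str                       # stable identifier, never reused
    party_id: str                         # stable customer identifier
    product_code: str
    opened_at: str                        # ISO 8601 timestamp
    event_version: str = "1.0"            # breaking change bumps the major version
    classification: str = "CONFIDENTIAL"  # feeds topic-level access controls

def envelope(event: AccountOpened, producer: str) -> dict:
    """Wrap the payload with type, version, and producer ownership."""
    return {
        "type": type(event).__name__,
        "version": event.event_version,
        "producer": producer,
        "payload": asdict(event),
    }
```

Everything the contract promises — meaning, identifiers, version, classification, ownership — is visible in one place, which is what lets dozens of consumers depend on it safely.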

Step 4: Align IAM to data sensitivity

The architect works with IAM and security teams so that:

  • only approved producer services can publish to regulated topics
  • consumer groups are authorized by domain and purpose
  • sensitive fields are masked or omitted from broadly consumed topics
  • cloud ingestion pipelines use managed service identities, not shared secrets
  • human access to customer-level data in cloud is role-based and tightly audited

This is where many enterprises fail. They modernize data movement but leave access patterns basically medieval.
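The producer and consumer rules above amount to a policy table the platform must enforce. This is not a real Kafka ACL API — just a sketch of the decision, with invented topic and service names.

```python
# Not a real Kafka ACL API: a sketch of the authorization decision the
# platform should enforce. Topic and service names are invented.
TOPIC_POLICY = {
    "customer.identity-verified.v1": {
        "producers": {"svc-customer-domain"},
        "consumers": {"svc-fraud-scoring", "svc-crm-sync"},
    },
}

def may_publish(service: str, topic: str) -> bool:
    """Only the approved producer identity may publish to a regulated topic."""
    return service in TOPIC_POLICY.get(topic, {}).get("producers", set())

def may_consume(service: str, topic: str) -> bool:
    """Consumers are authorized per topic, by domain and purpose."""
    return service in TOPIC_POLICY.get(topic, {}).get("consumers", set())
```

Whether this lives in Kafka ACLs, a policy engine, or platform tooling is an implementation choice. That it exists, and defaults to deny, is the architectural one.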

Step 5: Design cloud landing and curation patterns

Not all Kafka topics go straight into analytics as-is.

The architect defines layers:

  • raw ingestion for controlled technical replay
  • curated domain datasets for analytics
  • restricted views for sensitive PII
  • feature datasets for fraud models under stricter access control

Again, not glamorous. Necessary.
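The raw-to-curated boundary is where masking usually happens. Here is a minimal sketch, assuming a fixed list of sensitive fields and a simple masking style; real rules would come from the classification scheme.

```python
# Hedged sketch: the sensitive-field list and masking style are assumptions.
SENSITIVE_FIELDS = {"national_id", "email", "phone"}

def mask_value(value: str) -> str:
    """Keep the last two characters for coarse matching, mask the rest."""
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]

def curate(record: dict) -> dict:
    """Produce the curated-layer view of a raw record."""
    return {k: mask_value(v) if k in SENSITIVE_FIELDS else v
            for k, v in record.items()}
```

The curated layer is what most analytics consumers see; the raw layer stays behind stricter controls for replay and audit.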

Step 6: Establish stewardship and quality controls

The customer domain has named ownership. Quality rules are explicit:

  • no account without valid party reference
  • no verified customer without minimum KYC attributes
  • status transitions must follow allowed lifecycle rules
  • duplicate resolution rules are monitored

This turns architecture into operating discipline.
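The rules above are simple enough to run as code, which is exactly what makes them operating discipline rather than aspiration. The lifecycle graph and field names here are illustrative assumptions.

```python
# Sketch of the quality rules above as executable checks. The lifecycle
# graph and field names are illustrative assumptions.
ALLOWED_TRANSITIONS = {
    "pending": {"active", "rejected"},
    "active": {"suspended", "closed"},
    "suspended": {"active", "closed"},
}

def account_violations(account: dict, known_parties: set) -> list:
    """Return rule violations for one account record; empty means it passes."""
    violations = []
    if account.get("party_id") not in known_parties:
        violations.append("no valid party reference")            # rule: party must exist
    old, new = account.get("old_status"), account.get("new_status")
    if old is not None and new not in ALLOWED_TRANSITIONS.get(old, set()):
        violations.append(f"illegal transition {old} -> {new}")  # rule: lifecycle only
    return violations
```

Checks like these run in pipelines and produce monitored metrics, which is how "named ownership" becomes accountable rather than nominal.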

What changed?

Within a year, the bank reduces batch dependency for several customer and account use cases, improves onboarding visibility, and gives analytics teams fresher data in cloud. More importantly, it does this without turning Kafka into an ungoverned rumor mill or cloud into a regulated data swamp.

That’s what data architecture looks like when it’s doing its job.

Common mistakes data architects make

Let’s be honest. Architects create plenty of the mess themselves.

1. Designing for elegance instead of adoption

This is the classic sin.

A beautifully normalized enterprise model, a pristine canonical schema, a layered governance framework — all useless if delivery teams can’t or won’t implement them. Architecture has to survive contact with budgets, deadlines, and platform limitations.

2. Treating every data issue as a technology issue

A lot of data failures are ownership failures, meaning failures, or policy failures. Buying a metadata tool does not fix undefined stewardship. Deploying Kafka does not fix bad event semantics. Moving to cloud does not fix duplicate customer records.

3. Ignoring IAM until late in the design

This one is expensive.

If data access rules are not built into domain design, integration design, and platform patterns early, the enterprise ends up retrofitting controls after the fact. That usually means over-restriction, workarounds, and audit pain.

4. Forcing canonical models where they do not belong

I said this earlier and I’ll say it again: canonical models are often overused.

They can be useful, especially for stable enterprise concepts. But forcing every interaction through an enterprise-wide canonical layer often creates translation overhead, weak ownership, and endless arguments. Sometimes domain-oriented contracts are cleaner.

5. Confusing data lake accumulation with architecture

Just because data lands in cloud storage does not mean it is architected. Enterprises love to celebrate ingestion metrics. “We onboarded 2,000 datasets.” Great. Are they understood, governed, trusted, secured, and actually used? If not, you’ve built digital landfill.

6. Neglecting operational reality

Data architects sometimes design flows without understanding support models, replay strategies, schema evolution, retention costs, and failure handling. Kafka topics don’t manage themselves. Cloud pipelines don’t magically stay compliant. Operational architecture matters.

7. Staying too abstract

The role can drift into vague enterprise language very quickly. If an architect cannot discuss actual datasets, actual event contracts, actual IAM patterns, and actual system constraints, they are not doing enough real architecture.

Contrarian thoughts that need saying

A few opinions that may annoy people.

“Single source of truth” is often oversimplified

In enterprise reality, truth is contextual. Legal identity, marketing preference, operational status, and analytical enrichment may each have different authoritative origins. Pretending one platform is the source of truth for everything is lazy architecture language.

More data democratization is not always better

The modern slogan is “make data available to everyone.” No. Make the right data available to the right people and systems under the right controls. Especially in banking. Uncontrolled access is not empowerment. It’s future incident reporting.

Event-driven architecture does not eliminate data architecture

If anything, it increases the need. Once many consumers depend on events, semantics, identifiers, versioning, and lineage become even more important.

Cloud-native does not mean architecture-native

Some teams think using cloud-managed services automatically implies good architecture. It doesn’t. You can build fragmented, insecure, expensive nonsense in the cloud very efficiently.

Governance should be opinionated

Architects sometimes try to keep everyone happy with vague standards. That just creates loopholes. Good governance says no when needed. Not constantly, but clearly.

What skills make a strong data architect?

The role needs a mix that is broader than many people expect.

Technical depth

They should understand:

  • data modeling
  • integration patterns
  • event streaming and Kafka fundamentals
  • cloud data platforms
  • database and storage patterns
  • metadata and lineage concepts
  • IAM and access control basics
  • privacy and regulatory implications

Not necessarily hands-on engineer level in every area, but enough to make credible decisions.

Business understanding

A data architect who doesn’t understand the business language of customer, account, risk, settlement, claims, or product is flying blind.

Communication

They need to explain trade-offs to:

  • engineers
  • product teams
  • governance leads
  • security teams
  • executives
  • auditors, sometimes unfortunately

Pragmatism

This is underrated. The best architects know when “good enough and governable” is better than “perfect and never delivered.”

Courage

Real architecture means making calls that some teams won’t love:

  • no, that system is not the master
  • no, that topic cannot expose those fields
  • no, you cannot replicate unrestricted customer data into every cloud workspace
  • no, this event name is too vague to be useful

That’s part of the job.

Where the role fits with other architecture roles

In larger enterprises, data architects overlap with enterprise, solution, integration, security, and platform architects.

The healthiest model is usually this:

  • Enterprise architect sets broader capability, operating model, and strategic alignment.
  • Data architect defines data domain structure, information flows, governance, and policy-aligned patterns.
  • Solution architect applies those principles to a specific initiative.
  • Integration architect focuses on APIs, events, messaging, and system interaction patterns.
  • Security architect defines security controls, including IAM and data protection requirements.
  • Platform architect shapes cloud or data platform capabilities.

In weak organizations, these roles fight over territory. In strong ones, they collaborate through clear decision rights.

A data architect should not try to own everything. But they absolutely should influence anything that changes the meaning, movement, control, or lifecycle of data.

So, what is a data architect really?

If I strip away the job-description language, here’s my answer:

A data architect is the person responsible for making enterprise data make sense across systems, teams, and controls.

They decide what data means, where it lives, how it moves, who owns it, who can access it, and how it stays usable as the enterprise changes.

That’s why the role matters.

Not because data is an asset. Every brochure says that.

It matters because without data architecture, enterprises build systems that function locally and fail collectively. The CRM works, the core system works, Kafka works, the cloud platform works, IAM works — and the enterprise still cannot trust or coordinate its own data.

A real data architect prevents that.

Not perfectly. Not elegantly all the time. Sometimes with compromise, sometimes with blunt standards, sometimes by saying “no” more than people want.

But that’s architecture. It’s supposed to shape reality, not decorate it.

FAQ

1. What is the difference between a data architect and a data engineer?

A data engineer builds and operates data pipelines, transformations, and platform components. A data architect defines the structure, standards, integration patterns, governance, and security model those implementations should follow. In strong teams, they work closely together.

2. Does a data architect need to know Kafka and event-driven architecture?

Yes, in most modern enterprises, at least at a practical level. A data architect should understand when Kafka is appropriate, how event contracts should be designed, how schema evolution works, and how streaming affects governance, lineage, and access control.

3. Is a data architect still needed if the company has a cloud data platform?

Absolutely. A cloud platform gives you capability, not coherence. Someone still has to define data domains, quality rules, access patterns, IAM alignment, and what should or should not move into cloud environments.

4. How does IAM relate to data architecture?

Directly. IAM determines who or what can access data, under what conditions, and with what restrictions. Data architecture should account for role-based or attribute-based access, service identities, masking, tokenization, and auditability from the beginning, not as an afterthought.

5. What is the biggest mistake organizations make with data architecture?

Treating it as documentation instead of decision-making. The biggest failure is not lack of diagrams. It’s lack of clear choices about ownership, meaning, integration, and control. When those choices are missing, the enterprise gets fragmentation no matter how modern the tools are.
