Your Kafka Topics Are Just Tables with Better Names


There’s a lie we tell ourselves when we adopt event streaming.

We say we’ve escaped the database. We say we’ve moved beyond CRUD. We say topics are fundamentally different from tables, that streams are somehow more alive, more modern, more fit for the age of microservices.

Then you look at a real enterprise Kafka estate six months later.

You see a topic called customer_master_v2. Another called order_status_current. A “reference data” topic compacted into the shape of a key-value table. A CDC stream mirroring invoice_header. A materialized view in a stream processor. A cache in a consumer. A search index built from events. Three reconciliation jobs. Five “temporary” compatibility topics. Suddenly the magic starts to wear off.

Kafka topics are not databases. But in enterprise architecture, they often behave like tables with better names.

That’s not an insult. It’s a clue.

If you don’t understand the table-like forces acting on topics—schema stability, key design, update semantics, retention, joins, ownership, reconciliation—you end up with a streaming platform that is expensive, fragile, and semantically muddy. If you do understand them, topics become one of the sharpest tools available for domain boundaries, integration autonomy, and evolutionary architecture.

The point is not to pretend topics are tables. The point is to admit that enterprise data has gravity. And gravity always wins.

This article is about topic schema architecture: how to design Kafka topics as domain artifacts instead of accidental transport pipes, how to migrate from table-centric integration without creating event-shaped chaos, and where the approach breaks down.

Context

In most large organizations, data integration starts in one of two ways.

Either everything is anchored on a shared operational database and its surrounding reporting extracts, or everything is anchored on APIs with a side-channel of nightly batch files. Kafka enters the picture later, usually when people need lower latency, better decoupling, or an answer to the phrase “real-time data platform.”

The first implementation is almost always tactical. Put Debezium on the ERP database. Publish customer changes. Stream orders to downstream services. Feed an analytics platform. Build a notification engine. Start a fraud detection use case. Before long Kafka becomes the spinal cord of the estate.

That’s when architecture matters.

Because Kafka is easy to start and surprisingly easy to misuse. Teams create topics that mirror source tables, topics that represent commands, topics that hold business facts, topics that carry integration envelopes, and topics that are really just RPC responses with extra steps. The same cluster ends up carrying domain events, CDC payloads, snapshots, caches, retries, dead-letter records, audit logs, and half-designed state synchronization streams.

This isn’t failure. It’s normal enterprise behavior. Organizations don’t migrate into purity. They migrate under pressure.

The important question is not whether a topic “should” look like a table. The important question is whether the topic schema reflects stable domain semantics or whether it merely leaks somebody else’s storage model.

That distinction is where good architecture starts.

Problem

Teams often approach Kafka with a false dichotomy:

  • Tables are old, rigid, and centralized.
  • Topics are modern, decoupled, and event-driven.

Reality is messier.

A topic has a schema. It has a key. It has a lifecycle. It has retention rules, evolution constraints, consumers with compatibility expectations, and operational meanings attached to records over time. That sounds suspiciously familiar because it is. These are the same architectural pressures that shaped table design, just expressed through logs and consumers rather than rows and queries.

The problem appears in several recurring forms:

1. Storage model leakage masquerading as event design

A team takes a source table, emits every insert/update/delete into Kafka, and calls it an event contract. But CUSTOMER_TBL becoming customer_tbl_cdc is not domain-driven design. It is plumbing with a schema registry.

Consumers are then forced to learn source-system quirks: nullable fields that mean three different things, surrogate keys with no business meaning, update storms triggered by technical columns, and entity models no bounded context actually wants.

2. Event design without state design

The opposite failure also happens. Teams obsess over naming “business events” but ignore the fact that downstream consumers need current state, replay semantics, identity rules, and reconciliation paths. They produce elegant event names like CustomerAddressCorrected, then discover every consumer separately reconstructs customer state and each one does it differently.

A log without a coherent state model is just distributed confusion.

3. Topic sprawl without ownership

Once many teams publish to Kafka, topics multiply faster than governance can keep up. There are multiple variants of customer, product, account, and order. Some are authoritative. Some are denormalized. Some are snapshots. Some are commands. Nobody is quite sure which should be used for a new consumer, so one more gets added.

This is how platforms become swamps.

4. Migration through duplication rather than semantics

Organizations moving from batch ETL or shared databases often stream the old world into Kafka without changing the integration model. The result is lower latency but not better architecture. Every service still depends on centralized master data structures. Every schema change still causes broad coordination. All we’ve done is replace file transfers with topic subscriptions.

Faster coupling is still coupling.

Forces

Topic schema architecture lives in the tension between several competing forces. Ignore any one of them and the design falls apart under scale or change.

Domain semantics

A topic should mean something in the business, not just in middleware. If a topic is called customer-profile, what is a customer in that bounded context? Is it a legal party, a marketing identity, a billing account holder, or a household representative? Enterprises routinely collapse these into one “customer” concept and spend years paying for that shortcut.

Domain-driven design matters here because topics are integration contracts, and contracts need language. If the language is muddy, every consumer invents its own translation.

Autonomy of bounded contexts

Each service or domain wants to own its data model and evolve independently. But integration requires some stable shared meaning. Kafka sits exactly on this fault line. Publish too close to internal storage and you leak implementation. Publish too abstractly and consumers can’t use the data.

Good topic architecture is not about maximizing decoupling in the abstract. It is about choosing the right coupling: coupling on domain meaning rather than internal structure.

State versus facts

Some consumers need immutable facts: order placed, payment captured, shipment delivered. Others need current state: latest customer profile, current product availability, active policy terms. Kafka supports both patterns, but they are not the same.

A fact stream is append-only history.

A state topic is often a compacted representation of latest truth.

Many enterprise flows need both.

Pretending one can do the job of the other usually leads to excessive joins, replay complexity, or incorrect downstream models.
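The difference can be sketched in a few lines of plain Python, with no Kafka client involved. Topic names and record shapes here are illustrative, not from any real system:

```python
# A minimal sketch of the difference between an append-only fact stream
# and a compacted state topic. Records are (key, value) pairs.

def compact(log):
    """Simulate log compaction: keep only the latest record per key.
    A value of None is a tombstone and removes the key entirely."""
    latest = {}
    for key, value in log:  # later records win
        if value is None:
            latest.pop(key, None)  # tombstone: drop the key
        else:
            latest[key] = value
    return latest

# Fact stream: every event is retained as history.
fact_log = [
    ("order-1", {"event": "order-placed"}),
    ("order-1", {"event": "payment-settled"}),
]

# State topic: upserts and tombstones, compacted to latest truth.
state_log = [
    ("cust-1", {"tier": "bronze"}),
    ("cust-1", {"tier": "gold"}),    # replaces earlier state
    ("cust-2", {"tier": "silver"}),
    ("cust-2", None),                # tombstone: customer deleted
]

assert len(fact_log) == 2                                  # history preserved
assert compact(state_log) == {"cust-1": {"tier": "gold"}}  # only latest truth
```

Replaying the fact log always yields the full history; replaying the compacted log yields only current state, which is exactly why the two cannot substitute for each other.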

Evolution over time

Schema evolution is not optional. Businesses change, regulations change, mergers happen, product lines split, identifiers are replaced, and legal reporting appears out of nowhere. Topic schemas must absorb these changes without forcing synchronized rewrites across dozens of consumers.

Backward compatibility rules, versioning discipline, and semantic stability matter far more than whether you used Avro, Protobuf, or JSON.
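As a toy illustration of what a backward-compatibility rule actually enforces, here is a deliberately simplified checker. Real Avro and Protobuf rules are richer; the dict-based schema shape below is an assumption for illustration:

```python
# A simplified sketch of a backward-compatibility check, in the spirit of
# what a schema registry enforces. Schemas are plain dicts:
# field name -> {"required": bool}. Not the full Avro/Protobuf rules.

def backward_compatible(old_schema, new_schema):
    """New readers must cope with data written under the old schema:
    any field added in the new schema must be optional."""
    added = set(new_schema) - set(old_schema)
    return all(not new_schema[f]["required"] for f in added)

v1 = {"customerId": {"required": True}, "name": {"required": True}}
v2_ok = dict(v1, email={"required": False})   # optional addition: compatible
v2_bad = dict(v1, email={"required": True})   # mandatory addition: old data breaks

assert backward_compatible(v1, v2_ok)
assert not backward_compatible(v1, v2_bad)
```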

Operational gravity

Theoretical elegance evaporates the first time a replay floods downstream systems, a compaction policy fails to preserve enough history, or a CDC stream reorders records under partition key changes. Architectures that work in slideware but fail under retention, backfill, and reconciliation are not architectures. They are decorative optimism.

Enterprise coexistence

Most organizations cannot stop the world and redesign all integrations around events. They must live with databases, ESBs, APIs, batch transfers, vendor packages, and reporting stores for years. Kafka architecture succeeds when it can coexist with legacy while steadily reducing dependence on it.

That means migration strategy is part of the design, not a later concern.

Solution

The practical answer is simple, though not easy:

Design Kafka topics as domain data products with explicit state semantics, not just transport channels.

That means treating a topic a little like a table, a little like an API, and a lot like a bounded-context contract.

A good topic schema architecture usually distinguishes among four kinds of streams:

  1. Domain event topics: immutable business facts meaningful in the ubiquitous language.

     Example: order-placed, payment-settled, claim-submitted.

  2. Entity state topics: compacted or upsert-style current-state representations of key business entities.

     Example: customer-profile, product-catalog-item, policy-summary.

  3. Integration or CDC topics: source-system change streams, explicitly acknowledged as technical integration artifacts.

     Example: sap.customer.cdc, oracle.invoice-header.cdc.

  4. Derived or analytical topics: published projections, aggregates, or enriched models optimized for specific use cases.

     Example: customer-360-materialized, inventory-risk-score.

The architectural move is to stop confusing these with one another.

A CDC topic is not automatically a domain topic.

A domain event is not automatically a good representation of current state.

A derived topic is not an authoritative system of record.

Once you classify topics this way, schema design becomes much clearer.

Topic design principles

1. Name for business meaning, not source origin

customer-profile is better than crm_customer_tbl_v4 if the topic represents a domain contract. If it really is raw CDC, name it honestly as raw CDC.

2. Put identity first

The key is architecture. It determines partitioning, ordering scope, compaction, replay behavior, deduplication, and consumer join patterns. A topic without deliberate key design is a future incident report.
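To see why the key is architecture, consider how partition assignment works: it is a pure function of the key. Kafka's default partitioner hashes the key bytes with murmur2; the sketch below substitutes a stable stdlib hash to stay self-contained:

```python
# A sketch of key-based partition assignment. The hash function is a
# stand-in for Kafka's murmur2; the principle is the same.

import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All updates for one key land on one partition, so they stay ordered
# relative to each other.
p = partition_for("cust-42", 12)
assert all(partition_for("cust-42", 12) == p for _ in range(5))

# Re-keying later (say, from a source system ID to an enterprise party ID)
# routes the same entity to a different partition, silently changing the
# ordering guarantees every consumer had relied on.
```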

3. Make state semantics explicit

Consumers need to know whether a record means:

  • a fact that happened,
  • a replacement of current state,
  • a correction of a previous record,
  • or a tombstone indicating deletion.

Ambiguity here creates subtle, expensive bugs.
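One way to remove that ambiguity is an explicit operation marker on every record. The `op` field below is an assumption for illustration, not a Kafka convention; the point is that consumers never have to guess:

```python
# A sketch of explicit record semantics applied to a consumer's local store.

def apply(store, record):
    key, op, payload = record["key"], record["op"], record.get("payload")
    if op == "upsert":        # replacement of current state
        store[key] = payload
    elif op == "correct":     # correction merged into a previous record
        store[key] = {**store.get(key, {}), **payload}
    elif op == "delete":      # tombstone indicating deletion
        store.pop(key, None)
    else:
        raise ValueError(f"unknown op: {op}")

store = {}
apply(store, {"key": "c1", "op": "upsert", "payload": {"name": "Ada", "tier": "gold"}})
apply(store, {"key": "c1", "op": "correct", "payload": {"tier": "platinum"}})
assert store["c1"] == {"name": "Ada", "tier": "platinum"}
apply(store, {"key": "c1", "op": "delete"})
assert "c1" not in store
```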

4. Separate authoritative from derived

Every important topic should answer: who owns this truth? Can downstream systems rely on it as the source of record for this concept, or is it a convenience projection?

5. Design for reconciliation

No matter how clean the stream architecture is, enterprise systems drift. Messages are replayed, source systems are corrected, identifiers merge, consumers fall behind, and bad code is deployed. If a topic cannot be reconciled against an authoritative source or regenerated from upstream truth, it will eventually become untrustworthy.

6. Prefer semantic stability over model completeness

A thinner stable contract beats a rich unstable one. Do not publish every internal field because a few consumers might want them. That is how implementation details become institutional debt.

Architecture

A healthy enterprise Kafka landscape often has a layered shape. Not centralized ownership, but layered semantics.


The raw integration layer exists because reality exists. Source systems emit changes in their own shape. Fine. Capture them. But don’t pretend they are already good enterprise contracts.

Then introduce a translation layer—sometimes a dedicated service, sometimes stream processing, sometimes code embedded in domain services—that maps technical source changes into domain-aligned topics. This is where anti-corruption lives.

Raw topics versus canonical fantasies

Many organizations chase a “canonical event model” too early. They define one universal Customer schema that every system must publish and consume. It sounds tidy. It usually becomes bureaucracy in XML’s clothing.

A better approach is bounded context contracts. Sales can publish a sales customer representation. Billing can publish an account holder representation. Risk can publish a party screening representation. These should be related and mapped, not collapsed into a single enterprise abstraction unless there is genuine shared meaning.

That is classic domain-driven design: respect context boundaries and translate where they meet.

Entity state topics

This is where the “topics are tables with better names” line becomes useful.

Compacted entity topics are often the most valuable thing in a Kafka architecture because they provide a durable integration surface for current state. A customer-profile topic keyed by customerId behaves a lot like a distributed table of latest customer state, but with replay, subscriptions, and decoupled materialization.

That is enormously powerful.

It allows microservices to build local read models without synchronous calls.

It allows new consumers to bootstrap from retained state.

It allows stream processors to join reference data efficiently.

It creates a stable surface between event history and operational read needs.

But this only works if the topic schema is disciplined.

Diagram 2: Entity state topics

Schema architecture and semantics

A topic schema should answer several questions immediately:

  • What is the business entity or event?
  • What is the key?
  • Is it immutable or upserted?
  • What does deletion mean?
  • What fields are mandatory and why?
  • What timestamps matter: event time, processing time, source commit time?
  • Is this authoritative or derived?
  • What compatibility guarantees exist?

For example, an entity state topic might carry:

  • business key,
  • version or sequence if available,
  • source system metadata,
  • effective timestamps,
  • status,
  • attributes required for known integration scenarios,
  • tombstone semantics.

That starts to look table-like because stable integration contracts often do. The difference is not shape; the difference is usage and ownership. A table is queried in place. A topic is distributed state over time.
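The field list above might look like this as a record type. All names are illustrative assumptions, not a prescribed enterprise schema:

```python
# A sketch of an entity state record for a hypothetical customer-profile
# topic, following the field list above.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CustomerProfileRecord:
    customer_id: str                  # business key (also the topic key)
    sequence: Optional[int]           # source version/sequence if available
    source_system: str                # source metadata for lineage
    effective_at: str                 # ISO-8601 effective timestamp
    status: str                       # e.g. "active", "closed"
    attributes: dict = field(default_factory=dict)  # only fields consumers need
    deleted: bool = False             # explicit tombstone semantics

rec = CustomerProfileRecord(
    customer_id="cust-42", sequence=17, source_system="crm",
    effective_at="2024-05-01T09:30:00Z", status="active",
    attributes={"tier": "gold"},
)
assert rec.customer_id == "cust-42" and not rec.deleted
```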

Migration Strategy

Most enterprises cannot leap from shared tables to idealized event-driven domains. They need a strangler approach.

And this is where architecture earns its keep.

Step 1: Capture reality without glorifying it

Start with CDC or source extracts if necessary. Publish raw technical topics from core systems. Make them clearly technical. Keep them out of domain diagrams except as upstream dependencies.

This provides immediate value: lower-latency integration, auditability, and a replayable source feed.

But stop there and you’ve just built faster coupling.

Step 2: Introduce translation at domain seams

Create translation services that consume raw topics and publish bounded-context topics. The translation logic handles:

  • source-to-domain mapping,
  • identifier normalization,
  • deletion and merge semantics,
  • enrichment from reference data,
  • filtering of technical noise,
  • schema stabilization.

This is the strangler move. New consumers attach to translated topics, not raw source topics.
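A translation step of this kind might look like the following sketch. The raw CRM column names and the target shape are invented for illustration:

```python
# A sketch of an anti-corruption translation: a raw CDC record shaped like
# the source table, mapped into a bounded-context record.

def translate(cdc_record):
    """Map a raw CRM change into a customer-engagement-profile record,
    dropping technical noise and normalizing identifiers."""
    row = cdc_record["after"]
    if row is None:  # source delete becomes an explicit tombstone
        return {"key": f"party-{cdc_record['before']['CUST_ID']}", "payload": None}
    return {
        "key": f"party-{row['CUST_ID']}",           # identifier normalization
        "payload": {
            "displayName": row["CUST_NM"].strip(),  # source-to-domain mapping
            "optedIn": row["MKT_FLG"] == "Y",       # decode source quirks
            # technical columns (LAST_UPD_TS, ROW_VER, ...) deliberately dropped
        },
    }

raw = {"before": None,
       "after": {"CUST_ID": "000123", "CUST_NM": " Ada Lovelace ", "MKT_FLG": "Y"}}
assert translate(raw) == {"key": "party-000123",
                          "payload": {"displayName": "Ada Lovelace", "optedIn": True}}
```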

Step 3: Build entity state topics for high-value domains

For domains like customer, account, policy, product, and order, publish stable compacted state topics. These serve as enterprise integration backbones.

Do not wait for perfect event sourcing. State is often what consumers need most.

Step 4: Move consuming systems off direct database dependencies

Replace point-to-point queries and batch extracts with subscriptions or local materialized views sourced from state topics. This reduces load on core platforms and weakens shared-database coupling.

Step 5: Add reconciliation loops

During migration, there will be dual-running systems and unavoidable drift. You need explicit reconciliation:

  • compare source-of-truth snapshots to topic-derived state,
  • detect missing or duplicate records,
  • handle out-of-order updates,
  • repair consumer stores,
  • support replay from checkpoints.

A migration without reconciliation is faith-based engineering.
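The snapshot-comparison step can be sketched as a key-level diff between an authoritative source snapshot and topic-derived state. The record shapes are illustrative:

```python
# A sketch of snapshot reconciliation: report drift per key between
# source-of-truth state and state derived from the topic.

def reconcile(source_snapshot: dict, derived_state: dict):
    missing = set(source_snapshot) - set(derived_state)   # never arrived, or lost
    extra = set(derived_state) - set(source_snapshot)     # stale or duplicate
    mismatched = {k for k in source_snapshot.keys() & derived_state.keys()
                  if source_snapshot[k] != derived_state[k]}
    return {"missing": missing, "extra": extra, "mismatched": mismatched}

source = {"c1": {"tier": "gold"}, "c2": {"tier": "silver"}, "c3": {"tier": "bronze"}}
derived = {"c1": {"tier": "gold"}, "c2": {"tier": "gold"}, "c4": {"tier": "gold"}}

report = reconcile(source, derived)
assert report == {"missing": {"c3"}, "extra": {"c4"}, "mismatched": {"c2"}}
# Keys in "missing" and "mismatched" become candidates for key-level replay.
```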

Step 6: Retire raw dependencies selectively

Once consumers are stable on translated topics, reduce direct usage of CDC topics. Some will remain for audit, lineage, or technical integrations. That’s fine. The point is to stop exposing them as primary enterprise contracts.


Reconciliation is not optional

Streaming advocates sometimes talk as though logs abolish data repair. They do not. In large enterprises, reconciliation is one of the central architectural concerns.

Why? Because there are many ways drift emerges:

  • source connectors miss changes during failover,
  • bad deploys publish malformed events,
  • partition key changes reorder updates,
  • consumer bugs create incorrect local projections,
  • upstream systems perform late corrections,
  • duplicate messages slip through retry paths,
  • merged customer identities invalidate historical assumptions.

A robust architecture provides:

  • periodic full snapshot comparisons,
  • key-level replay capability,
  • idempotent consumers,
  • dead-letter review with replay paths,
  • traceable lineage from source mutation to published contract.

Without this, trust in the platform decays. And once trust decays, teams start going back to direct database reads “just to be safe.”

That is how modern architecture quietly dies.
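Of the safeguards listed above, idempotent consumers are usually the cheapest to build. A minimal sketch, assuming records carry a monotonic per-key sequence (any per-key version field would do):

```python
# A sketch of an idempotent consumer: duplicates and stale replays are
# absorbed by tracking the highest sequence applied per key.

class IdempotentStore:
    def __init__(self):
        self.state = {}
        self.applied_seq = {}

    def apply(self, key, seq, payload):
        if seq <= self.applied_seq.get(key, -1):
            return False              # duplicate or out-of-order replay: skip
        self.state[key] = payload
        self.applied_seq[key] = seq
        return True

store = IdempotentStore()
assert store.apply("c1", 1, {"tier": "bronze"})
assert store.apply("c1", 2, {"tier": "gold"})
assert not store.apply("c1", 1, {"tier": "bronze"})  # replayed message ignored
assert store.state["c1"] == {"tier": "gold"}
```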

Enterprise Example

Consider a large insurance group with three policy administration systems acquired through merger, a central CRM, a claims platform, and a digital servicing layer built as microservices.

Every system has a concept of “customer.” None mean the same thing.

  • CRM customer = marketing contact identity.
  • Policy admin customer = legal policyholder party.
  • Claims customer = claimant, witness, or representative.
  • Billing customer = account owner.
  • Digital customer = authenticated online profile.

Initially the enterprise streams raw CDC from all platforms into Kafka. Teams are excited. Data is moving in near real time. But within months the problems begin.

One mobile team consumes CRM changes and assumes they represent the policyholder. Wrong.

A claims microservice consumes policy admin updates and misses claimant relationships entirely.

Another team builds a “customer-360” by joining five raw topics and can’t explain why household merges create duplicate identities.

Every source schema change breaks something downstream.

The architecture team intervenes with a domain-first topic strategy.

What they do

  1. Keep raw CDC topics

     crm.contact.cdc, policy.party.cdc, claims.participant.cdc.

  2. Define bounded-context topics

     customer-engagement-profile, policyholder-profile, claims-participant-profile.

  3. Create a party identity resolution service

     This maps multiple source identifiers to enterprise party identities with explicit confidence and survivorship rules.

  4. Publish a compacted party-summary topic

     This topic is authoritative for cross-context identity references, not for all attributes.

  5. Publish domain events

     policy-issued, claim-opened, customer-consent-updated.

  6. Build reconciliation jobs

     Daily snapshot comparison between source systems, identity resolution outputs, and downstream read models.

What changes

The digital servicing platform no longer consumes raw CRM CDC. It consumes party-summary for identity and customer-engagement-profile for communication preferences. Claims systems consume claims-specific participant topics. Policy services consume policyholder state and policy events.

The organization does not create one universal customer topic. That restraint matters. It avoids false harmonization and keeps each bounded context coherent.

This is a real enterprise pattern: not purity, but controlled semantics over messy reality.

Operational Considerations

Topic schema architecture is as much operational discipline as design.

Partitioning and key strategy

Keys define ordering guarantees. If you key customer updates by source system ID but consumers reason in enterprise party ID, you will eventually hit ordering and join anomalies. Re-keying later is painful. Think hard up front.

Where aggregate consistency matters, align partition keys with aggregate identity. Where fan-out matters, accept the tradeoff.

Retention and compaction

Compacted topics are excellent for entity state, but they are not magic. Tombstones may disappear after compaction windows. Historical reconstruction may be incomplete. If regulatory or audit use cases require full change history, keep a separate immutable event or CDC stream.

Do not ask one topic to serve every retention need.

Schema governance

A schema registry helps, but governance is not the same as tooling. You need conventions around:

  • required metadata,
  • compatibility modes,
  • semantic versioning expectations,
  • deprecation periods,
  • ownership and contactability.

If nobody knows who owns customer-profile, the schema may be valid and still be useless.

Consumer bootstrap

One underappreciated strength of state topics is consumer bootstrap. New services can rebuild local stores from a compacted topic instead of making bulk API calls against operational systems. This reduces startup friction and dependency on fragile backfill jobs.

But test replay times realistically. Reprocessing millions of records under pressure is where architecture meets infrastructure limits.

Security and privacy

Topics are easy to subscribe to, which makes them easy to overshare. Customer and employee data in Kafka demands the same rigor as databases:

  • field-level minimization,
  • access controls,
  • masking,
  • encryption where appropriate,
  • retention aligned with policy,
  • lineage for regulatory reporting.

A streaming platform can become a privacy incident at enterprise scale.

Tradeoffs

This approach is powerful, but not free.

Benefit: better semantic decoupling

Consumers depend on domain contracts instead of shared tables or direct APIs.

Cost: more deliberate modeling

You need teams who understand bounded contexts, not just serializers and connectors.

Benefit: replayable integration and local materialization

New consumers can bootstrap and recover more easily.

Cost: operational complexity

Kafka, schema evolution, stream processors, compaction, and replay all require mature platform engineering.

Benefit: migration path from legacy

Raw CDC can coexist with domain topics while the estate evolves.

Cost: duplication

You will have multiple representations of similar business concepts. This is not always bad, but it must be intentional.

Benefit: reduced load on core systems

Consumers subscribe instead of repeatedly querying transactional systems.

Cost: eventual consistency

Downstream state is not synchronized instantly or perfectly. Teams must design for lag, correction, and reconciliation.

The biggest tradeoff is philosophical: you are choosing distributed semantic autonomy over centralized simplicity. Done well, this scales change. Done poorly, it scales ambiguity.

Failure Modes

There are a few classic ways this architecture goes wrong.

Treating CDC as a final product

You expose raw database mutations as enterprise contracts. This works fast, then ages badly. Source internals leak everywhere.

Inventing a universal enterprise schema

You force every domain into one canonical model and create endless committees, brittle mappings, and lowest-common-denominator semantics.

Ignoring state topics

You publish only business events and assume consumers can all derive state themselves. They can, but inconsistently and expensively.

Overproducing derived topics

Every use case gets its own topic until nobody knows which is authoritative. Topic count rises; trust falls.

No reconciliation path

Consumer stores drift, replays produce surprises, and eventually teams revert to direct source reads.

Poor key design

Ordering breaks, deduplication fails, compaction becomes misleading, and cross-topic joins become guesswork.

Ownership without stewardship

A team publishes a topic, then moves on. No semantic maintenance, no deprecation plan, no response to consumer pain. The topic becomes legacy the day it launches.

When Not To Use

This pattern is not a universal answer.

Do not use Kafka-centered topic schema architecture when:

The problem is simple CRUD with low integration demand

If two systems just need a few synchronous operations with strong transactional consistency, an API and a database may be simpler and safer.

The domain is too volatile to stabilize contracts

In very early product discovery, domain semantics may change weekly. Publishing broad durable topic contracts too early can freeze the wrong model.

Consumers need rich ad hoc querying over current state

Kafka is not a general-purpose query engine. If the primary need is flexible querying, reporting, or transactional joins, use databases or analytical stores where they fit.

The organization lacks operational maturity

If you cannot manage schema governance, replay, dead letters, monitoring, and reconciliation, don’t pretend a streaming backbone will somehow organize itself.

You are trying to avoid domain modeling

Kafka does not rescue teams from unclear domain boundaries. It punishes them.

In short: don’t use topics as a fashionable substitute for architectural thinking.

Related Patterns

Several patterns sit naturally beside this approach.

Anti-Corruption Layer

Essential when translating source-system schemas into bounded-context topics.

Event-Carried State Transfer

Useful for distributing current state to downstream consumers, especially via compacted topics.

Event Sourcing

Related but distinct. Event sourcing stores domain events as the system of record. Many Kafka estates are not event-sourced and should not pretend to be.

CQRS

State topics often serve command/query separation by feeding read models optimized for consumption.

Data Mesh

Topic ownership as domain data products fits well, provided governance and interoperability are real rather than slogan-based.

Strangler Fig Migration

Ideal for moving from shared databases and batch integration toward domain-aligned event and state contracts incrementally.

Summary

Kafka topics are not tables. But if you squint at an enterprise landscape, many of the good ones behave like tables with better names: keyed, governed, replayable, semantically stable, and deeply tied to how the business understands its data.

That’s not a failure of event-driven architecture. It’s the sign that integration contracts are carrying real business load.

The mistake is not making topics table-like. The mistake is doing it unconsciously—letting raw storage models, accidental keys, and technical noise define your contracts. The better move is to embrace topic schema architecture deliberately:

  • use raw CDC where reality demands it,
  • translate into bounded-context contracts,
  • publish both facts and state where needed,
  • design keys and schemas around domain semantics,
  • migrate progressively with a strangler strategy,
  • and build reconciliation in from the start.

In other words, respect the gravity of enterprise data without surrendering to it.

Because in the end, the best Kafka architectures are not the ones that shout “events” the loudest. They are the ones that give data a clear meaning, a clear owner, and a survivable path through change.

That is what architecture is for.

Frequently Asked Questions

What is event-driven architecture?

Event-driven architecture (EDA) decouples services by having producers publish events to a broker like Kafka, while consumers subscribe independently. This reduces direct coupling, improves resilience, and allows new consumers to be added without modifying producers.

When should you use Kafka vs a message queue?

Use Kafka when you need event replay, high throughput, long retention, or multiple independent consumers reading the same stream. Use a traditional message queue (RabbitMQ, SQS) when you need simple point-to-point delivery, low latency, or complex routing logic per message.

How do you model event-driven architecture in ArchiMate?

In ArchiMate, the Kafka broker is a Technology Service or Application Component. Topics are Data Objects or Application Services. Producer/consumer services are Application Components connected via Flow relationships. This makes the event topology explicit and queryable.