Executive summary
Kafka topic design is where organizational structure becomes technical structure. Kafka’s documentation positions topics as partitioned and distributed across brokers, making topic/partition strategy a core scaling and reliability concern.
Enterprise topic design therefore must align ownership, naming, retention, and schema evolution. Confluent's schema evolution documentation defines the compatibility modes (backward, forward, full, and their transitive variants) that should be integrated into topic design — because a topic without compatibility governance becomes a breaking-change surface.
EA governance checklist
- Domain-aligned naming and ownership model
- Retention classes and replay expectations
- Partitioning strategy considerations
- Governance: compatibility modes and change review
- Pitfalls and anti-patterns
Topic taxonomy: three categories every enterprise needs
A well-designed topic taxonomy is the foundation of a governable Kafka platform. Without it, topics proliferate chaotically — developers create topics ad-hoc, naming conventions diverge, and nobody knows what data flows where. The taxonomy organizes topics into three categories with distinct governance rules.
Domain event topics carry business-meaningful events: payments.transaction.authorized, orders.item.shipped, customers.profile.updated. Naming convention: domain.entity.action. These are the most valuable topics — they represent the real-time pulse of the business. Governance: owned by the domain team, schema-registered, backward-compatible evolution required.
Integration topics bridge Kafka with external systems: legacy.erp.sync-daily (batch sync from legacy ERP), partner.feed.inbound (partner data feeds), cdc.database.changes (Change Data Capture from operational databases). Naming convention: source.system.purpose. Governance: owned by the integration team, often carrying raw data that requires transformation before consumption.
Operational topics serve platform-internal purposes: audit.log.all-events (compliance audit trail), retry.dlq.payments (dead letter queue for failed payment processing), internal.metrics.pipeline (platform telemetry). Naming convention: purpose.scope.detail. Governance: owned by the platform team, not exposed to domain consumers.
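The three-category taxonomy can be enforced mechanically rather than by convention alone. A minimal sketch in Python — the regexes, segment character set, and reserved prefixes are illustrative assumptions, not a standard:

```python
import re

# Illustrative patterns for the three naming conventions described above;
# the [a-z0-9-] segment charset and the reserved prefixes are assumptions.
TOPIC_PATTERNS = {
    "domain":      re.compile(r"^[a-z0-9-]+\.[a-z0-9-]+\.[a-z0-9-]+$"),
    "integration": re.compile(r"^(legacy|partner|cdc)\.[a-z0-9-]+\.[a-z0-9-]+$"),
    "operational": re.compile(r"^(audit|retry|internal)\.[a-z0-9-]+\.[a-z0-9-]+$"),
}

def classify_topic(name: str) -> str:
    """Return the taxonomy category for a topic name, or 'invalid'."""
    # Check the prefix-reserved categories before the generic domain pattern.
    for category in ("integration", "operational", "domain"):
        if TOPIC_PATTERNS[category].match(name):
            return category
    return "invalid"
```

A pre-creation hook can call such a classifier and reject anything that comes back "invalid", which makes the naming convention a hard gate rather than a wiki page.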
Topic lifecycle management
Topics have a lifecycle: Proposed → Active → Deprecated → Draining → Archived. New topics require a design review that checks naming compliance, schema registration, partition count justification, retention policy, and the consumer access list. This prevents the "wild west" of uncontrolled topic creation that plagues unmanaged Kafka deployments.
Partition strategy by throughput tier
Partition count is the most consequential Kafka topic design decision because it can only be increased after creation, never reduced. Too few partitions limit consumer parallelism and throughput; too many create overhead in metadata management, leader elections, and end-to-end latency. The right number depends on the throughput tier.
Low volume (under 1K messages/sec): Three partitions provide sufficient parallelism for a single consumer group. Standard retention (7 days) is appropriate. Most enterprise topics fall into this tier — not every event stream is high-throughput. Resist the temptation to over-partition: 50 partitions for a 10-message-per-second topic wastes broker resources.
Medium volume (1K–10K messages/sec): Six to twelve partitions enable multiple consumer instances to process in parallel. Consider log compaction for topics that represent current state (customer profiles, account balances) rather than event history. Multiple consumer groups are common at this tier — the same events feed analytics, search indexing, and downstream services.
High volume (10K–100K messages/sec): Twelve to thirty partitions, potentially on dedicated brokers to prevent resource contention with lower-volume topics. Tiered storage moves older data to object storage (S3) to reduce broker disk pressure. At this tier, partition key design is critical — a poorly chosen key creates hot partitions that bottleneck the entire topic.
Ultra volume (over 100K messages/sec): Thirty to a hundred partitions on a dedicated Kafka cluster. Custom tuning of batch size, linger time, buffer memory, and compression is required. At this scale, the platform team should be involved in topic design — a misconfiguration can cascade into cluster-wide performance degradation.
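A common starting-point heuristic — an assumption distilled from the tiers above, not a formula from the Kafka documentation — sizes partitions from the target message rate divided by the benchmarked per-partition rate, rounded up, with a floor for the tier minimum:

```python
import math

def estimate_partitions(target_msgs_per_sec: float,
                        per_partition_msgs_per_sec: float,
                        min_partitions: int = 3) -> int:
    """Rough partition-count starting point: enough partitions to carry the
    target rate at the measured per-partition rate, never below the floor.
    The per-partition rate must be benchmarked on your own hardware; a
    default of 3 partitions matches the low-volume tier above."""
    needed = math.ceil(target_msgs_per_sec / per_partition_msgs_per_sec)
    return max(needed, min_partitions)
```

Treat the result as an input to the design review, not a final answer: consumer-side parallelism needs, key cardinality, and future growth can all push the number up before creation.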
Topic naming governance
Topic names are the primary discovery mechanism in Kafka. Without naming governance, developers create topics like myservice-events, test_payments, and data-feed-v2 — names that tell nothing about domain ownership, data content, or intended use.
Enforce a naming convention through pre-creation hooks: {domain}.{entity}.{action} for domain events (e.g., payments.transaction.authorized), {source}.{system}.{purpose} for integration topics (e.g., legacy.erp.sync-daily), and {function}.{scope}.{detail} for operational topics (e.g., audit.log.all-events). Every topic name should be self-documenting — an architect looking at the topic list should understand the domain landscape without reading documentation.
# Topic creation with governance validation: reject names that don't match {domain}.{entity}.{action}
TOPIC="payments.transaction.authorized"
[[ "$TOPIC" =~ ^[a-z0-9-]+\.[a-z0-9-]+\.[a-z0-9-]+$ ]] || { echo "rejected: $TOPIC" >&2; exit 1; }
kafka-topics --create --topic "$TOPIC" --partitions 6 --replication-factor 3 --config retention.ms=604800000 --config cleanup.policy=delete --bootstrap-server kafka:9092
Topic lifecycle and deprecation
Topics, like applications, have a lifecycle. Without explicit lifecycle management, deprecated topics linger indefinitely — consuming storage, appearing in monitoring dashboards, and confusing new team members who discover them.
Implement a topic lifecycle with five states: Proposed (design review pending), Active (in production with active producers and consumers), Deprecated (still active but no new consumers should subscribe — consumers should migrate to the replacement topic), Draining (retention period running out, consumers should have migrated), and Archived (topic deleted, metadata preserved in the architecture repository for audit).
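The five states and their legal moves can be expressed as a small state machine, which makes illegal jumps (say, Active straight to Archived) mechanically rejectable. A sketch — the transition table is inferred from the description above, not a formal specification:

```python
# Allowed lifecycle transitions, inferred from the five-state model above.
ALLOWED_TRANSITIONS = {
    "Proposed":   {"Active"},      # design review passed
    "Active":     {"Deprecated"},  # replacement topic announced
    "Deprecated": {"Draining"},    # no new subscribers permitted
    "Draining":   {"Archived"},    # retention window elapsing
    "Archived":   set(),           # terminal: metadata kept for audit
}

def transition(current: str, target: str) -> str:
    """Validate a lifecycle transition, raising on illegal moves."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```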
Governance automation: scan consumer groups weekly. If a topic has had zero active consumers for 90 days, flag it for deprecation review. If a deprecated topic has been draining for longer than its retention period, archive it. This prevents the "ghost topic" problem that plagues unmanaged Kafka deployments — hundreds of topics where nobody knows if they are still needed.
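The weekly scan reduces to a pure function over topic metadata. In this sketch the input shape is an assumption — in practice you would derive the last-consumed timestamp from consumer-group offset commits via the Kafka admin API:

```python
from datetime import datetime, timedelta

def flag_for_deprecation(topics: dict, now: datetime,
                         idle_days: int = 90) -> list:
    """Return topic names with no active consumer for `idle_days` or more.
    `topics` maps topic name -> datetime of the last consumer-group offset
    commit (None if never consumed); this data shape is an assumption made
    for the sketch, not a Kafka API."""
    cutoff = now - timedelta(days=idle_days)
    return sorted(name for name, last in topics.items()
                  if last is None or last < cutoff)
```

Flagged topics go to a human review, not straight to deletion — a topic consumed only by a quarterly batch job would otherwise be a false positive.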
Compaction vs deletion: choosing the right cleanup policy
Every topic must have an explicit cleanup policy. The choice between deletion and compaction fundamentally changes the topic's semantics and use cases.
Deletion (cleanup.policy=delete): Messages are removed after the retention period expires. Use for event streams where historical sequence matters: transaction events, user activity events, system logs. Consumers that fall behind and miss the retention window lose data permanently. This is appropriate for time-series data where events older than the retention period have no analytical or operational value.
Compaction (cleanup.policy=compact): Kafka retains only the latest value for each key, discarding older entries. Use for current-state topics: customer profiles, account balances, configuration settings. This creates an effective key-value store where consumers can always reconstruct the latest state by reading the entire compacted topic. Compaction is essential for GDPR compliance: producing a tombstone record (null value) for a customer's key ensures their data is permanently removed after the next compaction cycle.
Compact + delete (cleanup.policy=compact,delete): The hybrid approach retains the latest value per key AND deletes records older than the retention period. Use when you need current state but also want to limit storage growth. Common for entity-change topics where the current state matters but historical changes older than 30 days are unnecessary.
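Compaction semantics — latest value per key, tombstones deleting keys — can be illustrated with a small in-memory simulation. This models the log's eventual compacted state only; real compaction works segment by segment and retains tombstones for delete.retention.ms before removing them:

```python
def compacted_state(log):
    """Simulate the eventual result of cleanup.policy=compact: keep only the
    latest value per key, dropping keys whose latest record is a tombstone
    (value None). `log` is an ordered list of (key, value) records."""
    state = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)   # tombstone: key gone after compaction
        else:
            state[key] = value     # later record supersedes earlier ones
    return state
```

The tombstone path is exactly the GDPR erasure mechanism described above: produce (customer_key, None) and, once compaction runs, no record for that key survives in the topic.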
If you'd like hands-on training tailored to your team (Sparx Enterprise Architect, ArchiMate, TOGAF, BPMN, SysML, Apache Kafka, or the Archi tool), you can reach us via our contact page.
Frequently Asked Questions
What is enterprise architecture?
Enterprise architecture is a discipline that aligns an organisation's strategy, business operations, information systems, and technology infrastructure. It provides a structured framework for understanding how an enterprise works today, where it needs to go, and how to manage the transition.
How is ArchiMate used in enterprise architecture practice?
ArchiMate is used as the standard modeling language in enterprise architecture practice. It enables architects to create consistent, layered models covering business capabilities, application services, data flows, and technology infrastructure — all traceable from strategic goals to implementation.
What tools are used for enterprise architecture modeling?
Common enterprise architecture modeling tools include Sparx Enterprise Architect (Sparx EA), Archi, BiZZdesign Enterprise Studio, LeanIX, and Orbus iServer. Sparx EA is widely used for its ArchiMate, UML, BPMN and SysML support combined with powerful automation and scripting capabilities.