Caching Layer Topologies in Microservices

Caching is one of those architectural decisions that looks innocent on a whiteboard and turns vicious in production.

A team starts with a simple goal: reduce latency, protect a downstream dependency, and avoid paying the tax of repeated reads. So they add a cache. Maybe in the client. Maybe at the API gateway. Maybe inside the service itself. The first graph looks wonderful. Response times fall off a cliff in the right direction. Infrastructure costs flatten. Everyone congratulates themselves for being practical.

Then the business changes one discount rule. A customer sees the wrong price. A doctor portal shows an outdated allergy note. A sales dashboard lags by fifteen minutes while finance insists the number is now wrong enough to matter. Suddenly the cache is no longer an optimization. It is part of the domain. And once a cache becomes part of the domain, architecture matters.

That is the real subject here. Not “how to cache,” but where the cache lives in a microservice landscape, what semantics it accidentally acquires, and how to evolve topology without turning consistency into folklore.

In enterprise systems, cache placement is rarely a pure technical choice. It reflects bounded contexts, ownership boundaries, failure tolerance, traffic shape, data volatility, and operational maturity. A client cache says one thing about who is allowed to remember. A gateway cache says another about shared experience and policy. A service cache says something stronger: the domain service itself is curating truth for performance.

This article walks through those topologies—client cache, gateway cache, and service cache—with an opinionated lens. We will look at the forces that drive each choice, the migration path between them, tradeoffs, failure modes, and where Kafka and event-driven reconciliation fit. Along the way, I’ll make one argument clearly: a cache is not just a faster database read; it is a decision about where stale truth is acceptable.

Context

Microservices promise autonomy. Each service owns its data, exposes a contract, and evolves independently. In practice, autonomy collides with one stubborn fact: users and upstream systems do not care about service boundaries. They care about fast responses and coherent answers.

If a product detail page needs catalog data, price data, availability data, and promotion data, the user does not want a lecture on bounded contexts. They want the page in 200 milliseconds. That means architects end up composing multiple data sources while trying to keep ownership clear. Caching enters because the network is expensive, the database is finite, and repeated reads are wasteful.

But caches behave differently depending on placement:

  • Client cache stores responses or fragments near the caller.
  • Gateway cache stores shared responses at the edge or API aggregation layer.
  • Service cache stores data inside or adjacent to the microservice that owns the API.

These are not interchangeable. They have different blast radii, invalidation mechanisms, semantic guarantees, and operational burdens.

A lot of teams treat this as a ladder of sophistication. Start at the client, move to the gateway, then mature into service-side caching. That is too simplistic. In reality, these are different topologies for different domains. Some organizations end up using all three. The trick is knowing what each one is allowed to lie about.

Problem

The problem is not merely “reads are slow.” The deeper problem is balancing four things that pull against each other:

  1. Latency
  2. Freshness
  3. Scalability
  4. Semantic correctness

You can improve three. The fourth will send you an invoice.

Suppose your Order History service is queried ten times more often than the underlying order records change. Caching feels obvious. But what exactly are you caching?

  • A serialized HTTP response?
  • A projection of domain objects?
  • A computed aggregate such as loyalty eligibility?
  • A user-specific view that includes entitlements and masking?

That distinction matters because stale order history might be harmless, stale inventory might cause oversell, and stale patient medication data might be dangerous. Different data has different business half-lives.

This is where domain-driven design is useful, not fashionable. DDD asks us to model the meaning of data in its bounded context. Caching should follow that meaning. A customer profile in the CRM context may be relatively slow-changing. Credit exposure in a lending context may be recalculated after every event and is not something to “just cache” casually. Domain semantics determine whether staleness is a performance compromise or a business defect.

So the real problem is this:

> How do we place caches in a microservices architecture so that we improve latency and resilience without violating domain meaning, ownership boundaries, or consistency expectations?

Forces

Several forces shape cache topology. Ignore them and you will end up with accidental architecture.

1. Read/write ratio

High read, low write workloads are the natural habitat of caches. Product catalogs, reference data, branch metadata, tax tables, shipping zones—these are easy wins. High write or high volatility domains are less forgiving.

2. Data volatility and business half-life

Technical TTL is not the same as business tolerance.

A product image can be stale for hours. A seat inventory count cannot. Customer preferences may tolerate eventual consistency. Fraud holds may not. Good architects don’t ask “what TTL should we set?” first. They ask “how wrong can we be, for how long, and who pays if we are?”

3. Ownership and bounded context

A gateway cache may accelerate data across contexts, but it can also blur ownership by storing composite responses that no single domain owns. This is often fine for read models. It is dangerous for transactional decisions.

4. Personalization and cardinality

The more user-specific the response, the less attractive shared edge caching becomes. A highly personalized dashboard cached at the gateway can explode storage and reduce hit rate. Client-side caching may be better, or perhaps no cache at all.

5. Invalidation mechanisms

A cache without a plausible invalidation strategy is a future incident report.

Invalidation options include:

  • TTL/expiration
  • explicit purge APIs
  • versioned keys
  • event-driven invalidation through Kafka
  • write-through/write-behind patterns
  • reconciliation jobs

Each comes with different correctness and complexity tradeoffs.
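
Of the options above, versioned keys are worth a sketch because they sidestep explicit purges entirely. The idea: a writer bumps a per-entity version counter, readers compose keys from the current version, and old entries become unreachable and simply age out. This is a minimal illustration with hypothetical key names, not a production design:

```python
# Sketch of versioned-key invalidation (hypothetical key layout).
# Instead of deleting entries on change, the writer bumps a per-entity
# version; readers build keys from the current version, so stale
# entries become unreachable and can age out via TTL or eviction.

class VersionedCache:
    def __init__(self):
        self._store = {}     # entries keyed by "entity:vN:field"
        self._versions = {}  # current version per entity id

    def _key(self, entity_id, field):
        version = self._versions.get(entity_id, 0)
        return f"{entity_id}:v{version}:{field}"

    def get(self, entity_id, field):
        return self._store.get(self._key(entity_id, field))

    def put(self, entity_id, field, value):
        self._store[self._key(entity_id, field)] = value

    def invalidate(self, entity_id):
        # One counter bump invalidates every cached field of the entity.
        self._versions[entity_id] = self._versions.get(entity_id, 0) + 1

cache = VersionedCache()
cache.put("product-42", "price", "19.99")
cache.invalidate("product-42")           # e.g. a price rule changed
print(cache.get("product-42", "price"))  # None: old version unreachable
```

The tradeoff is visible in the sketch: invalidation is a cheap metadata write, but dead entries occupy space until eviction reclaims them.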

6. Failure isolation

A local in-memory cache inside a service can protect that service from downstream slowness. A shared Redis cluster can become a high-performance dependency and a single point of pain. Centralizing cache can simplify operations and enlarge blast radius at the same time.

7. Compliance and security

Client and gateway caches are often poor places for sensitive data. PII, regulated financial data, clinical records, and entitlement-sensitive information demand careful encryption, token scoping, and eviction discipline. Sometimes the right cache topology is “none.”

8. Team structure

Conway’s Law always gets a vote. If the platform team owns the gateway, the domain teams own services, and mobile teams own clients, then cache placement becomes an organizational decision as much as a technical one. A topology that requires cross-team choreography for invalidation will age badly.

Solution

The best way to think about cache topology is not as a single choice, but as a layered decision model.

  • Use client caching when staleness is user-local, invalidation can be coarse, and reducing repeated fetches from the same consumer matters more than shared consistency.
  • Use gateway caching when many consumers request the same response shape, policy can be centralized, and the response is safe to share.
  • Use service caching when the owning domain can define cache semantics, invalidation follows domain events, and the service needs to shield a database or downstream dependency.

My bias is simple: put semantic caches as close as possible to the domain that understands the data. That usually means service-side. Put convenience caches at the client. Put broad, shared acceleration at the gateway only when the response is genuinely a shared read model and not a disguised domain decision.

Here is the landscape at a glance.

Diagram 1
Solution

This topology is common in large enterprises: multiple cache layers, each serving a different purpose. The trick is preventing them from drifting into contradictory truths.

Architecture

Let’s take the three topologies one by one.

Client cache

Client cache lives in the browser, mobile app, desktop app, BFF, or consuming service. It is the cheapest cache to add because it often requires no central infrastructure. HTTP caching headers, ETags, local storage, memory caches, and consumer-side memoization are the usual tools.

The upside is obvious:

  • reduced repeat calls from the same caller
  • lower perceived latency
  • lower backend load
  • resilience during transient network issues

The downside is subtler: the service loses control over what callers continue to believe.

This matters in domains with user-specific semantics. A mobile banking app may cache statement summaries for snappy navigation. Fine. But if it caches available balance too aggressively, customers may make decisions on stale money. The issue is not technical freshness. It is domain meaning.

Client caches are strong when:

  • data is user-local or session-scoped
  • stale reads are acceptable for a short period
  • responses are expensive to fetch repeatedly
  • offline or low-connectivity behavior is valuable

They are weak when:

  • invalidation must be immediate
  • data is shared and rapidly changing
  • security risk from local persistence is high
  • multiple clients must see the same update at roughly the same time

In DDD terms, client caching is usually best for published read models, not core decision state.
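
To make the mechanics concrete, here is a hedged sketch of client-side revalidation with ETags. The `fetch_fn` is a hypothetical stand-in for a real HTTP client; on a conditional hit the origin answers 304 Not Modified and the client reuses its stored body, paying a round trip but not a re-transfer:

```python
# Sketch of client-side caching with ETag revalidation. fetch_fn is a
# hypothetical HTTP client: fetch_fn(url, etag) -> (status, etag, body).
# A 304 answer means the cached body is still current.

class ETagClientCache:
    def __init__(self, fetch_fn):
        self._fetch = fetch_fn
        self._entries = {}           # url -> (etag, body)

    def get(self, url):
        cached = self._entries.get(url)
        etag = cached[0] if cached else None
        status, new_etag, body = self._fetch(url, etag)
        if status == 304:            # Not Modified: keep cached body
            return cached[1]
        self._entries[url] = (new_etag, body)
        return body

# Fake origin for illustration: one resource whose ETag changes on update.
state = {"etag": "v1", "body": "hello"}
def fake_fetch(url, etag):
    if etag == state["etag"]:
        return 304, etag, None
    return 200, state["etag"], state["body"]

client = ETagClientCache(fake_fetch)
print(client.get("/profile"))  # "hello" (full fetch)
print(client.get("/profile"))  # "hello" (304, served from local cache)
```

Note that revalidation still costs a round trip on every read; it trades bandwidth and backend work for a freshness check, which suits the "stale reads acceptable briefly" cases above.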

Gateway cache

Gateway caching lives at the API edge, CDN, reverse proxy, or API gateway. It is attractive because it centralizes acceleration. If 50 clients all request the same product description, why should 50 requests hit the inner services?

It works beautifully for:

  • public or semi-public content
  • reference data
  • cacheable GET endpoints
  • aggregate read endpoints with broad reuse
  • response normalization and policy enforcement

It gets dangerous when the gateway starts caching responses that encode domain decisions or user-specific state.

This is the architectural smell: the gateway becomes a secret read model platform without domain ownership. Teams start depending on it for consistency, but no domain team truly owns the invalidation semantics. The platform team ends up holding a business rule they should never have inherited.

Use gateway caching for shared representations, not for business truth.
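
One way to enforce that rule in code is to make shareability an explicit precondition rather than a default. The sketch below (hypothetical handler and origin signatures) only stores responses for unauthenticated safe GETs that the origin explicitly marks shareable, so per-user and domain-decision responses always fall through:

```python
# Sketch of a gateway-style shared response cache (hypothetical
# signatures). Only safe GETs that the origin marks shareable are
# stored; authorized or user-specific requests bypass the cache.

import time

class GatewayCache:
    def __init__(self, origin, ttl_seconds=60):
        self._origin = origin       # origin(method, path) -> (body, shareable)
        self._ttl = ttl_seconds
        self._entries = {}          # path -> (expires_at, body)

    def handle(self, method, path, authorized_user=None):
        shareable_request = method == "GET" and authorized_user is None
        if shareable_request:
            entry = self._entries.get(path)
            if entry and entry[0] > time.monotonic():
                return entry[1]                      # shared hit
        body, shareable_response = self._origin(method, path)
        if shareable_request and shareable_response:
            self._entries[path] = (time.monotonic() + self._ttl, body)
        return body

calls = []
def origin(method, path):
    calls.append(path)
    # Branch directory is shared reference data; balances are not.
    return (f"data:{path}", path.startswith("/branches"))

gw = GatewayCache(origin)
gw.handle("GET", "/branches/nyc")
gw.handle("GET", "/branches/nyc")                      # served from cache
gw.handle("GET", "/balance", authorized_user="alice")
gw.handle("GET", "/balance", authorized_user="alice")  # never cached
print(calls)  # ['/branches/nyc', '/balance', '/balance']
```

The design choice worth copying is the double opt-in: both the request shape and the origin must agree the response is shareable before the gateway remembers anything.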

Service cache

Service-side caching lives where the domain team can reason about it. It may be in-memory, distributed, embedded near the service, or implemented as a sidecar/cache store. This is usually the most powerful topology because the service understands the lifecycle of the data.

Examples:

  • Pricing service caches computed price books and discount matrices
  • Customer profile service caches merged reference data from slow systems of record
  • Inventory service caches availability snapshots for query speed while updates stream in via Kafka

This lets you align cache invalidation to domain events:

  • ProductUpdated
  • PriceRuleChanged
  • CustomerTierRecalculated
  • WarehouseAllocationAdjusted

That is the right shape. Events from the bounded context drive invalidation or refresh. The cache becomes part of the service’s internal optimization strategy rather than a shared external assumption.
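
A minimal sketch of that shape, using the hypothetical event names above: in production a Kafka consumer would deliver the events, but the core is just a mapping from each event type to the cache keys it makes stale.

```python
# Sketch of event-driven invalidation inside the owning service.
# Event names (ProductUpdated, PriceRuleChanged) are illustrative;
# a Kafka consumer would deliver these events in practice.

class ServiceCache:
    def __init__(self):
        self._entries = {}
        # Which cache keys does each domain event invalidate?
        self._rules = {
            "ProductUpdated":   lambda e: [f"product:{e['product_id']}"],
            "PriceRuleChanged": lambda e: [f"price:{e['product_id']}"],
        }

    def put(self, key, value):
        self._entries[key] = value

    def get(self, key):
        return self._entries.get(key)

    def on_event(self, event):
        derive_keys = self._rules.get(event["type"])
        if derive_keys is None:
            return                      # unknown events are ignored
        for key in derive_keys(event):
            self._entries.pop(key, None)

cache = ServiceCache()
cache.put("price:42", "19.99")
cache.put("product:42", {"name": "Widget"})
cache.on_event({"type": "PriceRuleChanged", "product_id": 42})
print(cache.get("price:42"))    # None: invalidated by the event
print(cache.get("product:42"))  # untouched by an unrelated event
```

The useful property is locality: the invalidation rules live next to the domain model that emits the events, so freshness policy stays inside the bounded context.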

But service caches also have costs:

  • more code and operational logic
  • warm-up concerns
  • duplicate cached data across services
  • harder cross-service visibility
  • risk of cache/database inconsistency if write paths are messy

When teams are mature enough, service-side caching is where serious domain-aware performance architecture usually lands.

Domain semantics and cache design

This is the part many articles skip. They talk about TTLs and Redis clusters and never ask what the data means. That is how people end up caching the wrong thing perfectly.

In domain-driven design, the question is not only “what entity is this?” but also “what promises does this context make about it?”

Consider a retail enterprise:

  • Catalog context owns descriptions, images, categories.
  • Pricing context owns sellable price, discount rules, effective dates.
  • Inventory context owns stock position and allocation.
  • Checkout context owns order confirmation and payment state.

These look like data points on the same product page. They are not the same kind of truth.

  • Catalog can be stale for minutes or even hours.
  • Pricing may be stale for seconds depending on promotion windows.
  • Inventory may be stale enough to inform browsing but not enough to authorize checkout.
  • Checkout should not trust gateway-cached aggregates for payment decisions.

This naturally leads to multiple cache semantics in the same user journey. That is not inconsistency. That is good modeling.

A memorable rule: cache by business tolerance, not by technical convenience.
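
That rule can be made executable. The sketch below encodes the retail contexts above as an explicit freshness policy table; the TTL values are illustrative, not recommendations:

```python
# Sketch of "cache by business tolerance": freshness policy keyed by
# bounded context rather than by infrastructure defaults. The numbers
# are illustrative placeholders, not advice.

BUSINESS_TOLERANCE = {
    "catalog":   {"ttl_seconds": 3600, "stale_ok": True},
    "pricing":   {"ttl_seconds": 30,   "stale_ok": True},
    "inventory": {"ttl_seconds": 5,    "stale_ok": False},  # browse only
    "checkout":  {"ttl_seconds": 0,    "stale_ok": False},  # never cache
}

def cache_policy(context):
    policy = BUSINESS_TOLERANCE[context]
    if policy["ttl_seconds"] == 0:
        return "read-through, no cache"
    return f"cache up to {policy['ttl_seconds']}s"

print(cache_policy("catalog"))   # cache up to 3600s
print(cache_policy("checkout"))  # read-through, no cache
```

Writing the table down has a side benefit: it forces the "who pays if we are wrong" conversation per context, instead of letting one global TTL answer it by accident.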

Migration Strategy

Most enterprises do not get to redesign cache topology from scratch. They inherit a monolith, a CDN rule set, a few heroic Redis clusters, and a lot of unwritten tribal knowledge. Migration needs to be progressive. This is a strangler fig problem.

You do not rip out all caching and replace it with an immaculate service-side strategy. You evolve.

Stage 1: Stabilize with client or edge caching

During early decomposition, client or gateway caching can reduce pressure on the monolith and hide some latency from newly introduced service calls. This buys time. It is tactical, not final.

Stage 2: Identify domain-sensitive reads

As bounded contexts emerge, classify read paths:

  • safe shared reference reads
  • user-specific reads
  • transactional decision reads
  • derived projections

This classification will tell you which caches can remain at the edge and which must move inward.

Stage 3: Introduce service-owned caches for hot domains

For domains with clear ownership and stable event models, move caching into the service. Publish events to Kafka for invalidation or refresh. Let the domain team own freshness policy.

Stage 4: Reconcile and retire accidental caches

Once service caches are reliable, edge caches that used to compensate for monolith slowness often become redundant or harmful. Remove them deliberately. Redundant caches are not harmless. They preserve stale assumptions.

Stage 5: Add reconciliation loops

In event-driven systems, invalidation messages can be delayed, duplicated, or missed. Production architecture needs reconciliation, not just optimism. Periodic rebuilds, compacted topics, snapshot comparison, or anti-entropy jobs are how you keep cache state honest over time.
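
An anti-entropy job can be almost embarrassingly simple. The sketch below (hypothetical cache and source interfaces) compares cached entries against the source of truth, repairs drift, and returns what it fixed so invalidation lag becomes a metric rather than a surprise:

```python
# Sketch of an anti-entropy / reconciliation job. The cache is a plain
# dict and read_source a callable here; in production these would be
# the cache store and a source-of-truth projection.

def reconcile(cache, read_source, keys):
    drift = []
    for key in keys:
        truth = read_source(key)
        if cache.get(key) != truth:
            drift.append(key)
            cache[key] = truth          # repair, don't just alert
    return drift                        # feed this into drift dashboards

source = {"price:1": "9.99", "price:2": "4.50"}
cache = {"price:1": "9.99", "price:2": "4.00"}  # a missed invalidation
drifted = reconcile(cache, source.get, source.keys())
print(drifted)            # ['price:2']
print(cache["price:2"])   # '4.50', repaired
```

Real jobs add batching, sampling of hot keys, and rate limits so reconciliation itself does not become the thundering herd, but the repair-and-report shape is the essential part.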

A migration view looks like this:

[Diagram 2: the staged migration path, from tactical edge caching to service-owned caches with reconciliation]

This is how grown-up migration works: first reduce pain, then move semantics to the right owner, then add controls that assume the real world is messy.

Reconciliation and Kafka

Kafka fits naturally into cache topology when you stop treating cache invalidation as a side effect and start treating it as a first-class event flow.

A common pattern is:

  1. Domain state changes in the owning service.
  2. The service persists its source of truth.
  3. It emits a domain event or change event.
  4. Cache consumers invalidate or refresh affected keys.
  5. Reconciliation jobs periodically verify cache correctness.

This can support:

  • service-local cache refresh
  • distributed cache invalidation
  • materialized read model updates
  • cross-region propagation

But there are traps.

If your cache invalidation relies solely on event delivery and there is no replay or reconciliation, one missed event can leave stale data indefinitely. If ordering matters and topics are partitioned poorly, price update events may race with promotion update events and build impossible combinations. If consumers update cache non-idempotently, duplicates can corrupt state.
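
The standard defense against the ordering and duplication traps is to make updates idempotent and version-guarded. This sketch assumes each event carries a monotonically increasing version per key (for example a source sequence number), which is an assumption about your event schema, not a Kafka feature:

```python
# Sketch of idempotent, order-tolerant cache updates. Assumes events
# carry a per-key monotonically increasing version; stale or duplicate
# deliveries are dropped instead of overwriting newer state.

class VersionGuardedCache:
    def __init__(self):
        self._entries = {}   # key -> (version, value)

    def apply(self, key, version, value):
        current = self._entries.get(key)
        if current and current[0] >= version:
            return False                 # duplicate or out-of-order: drop
        self._entries[key] = (version, value)
        return True

    def get(self, key):
        entry = self._entries.get(key)
        return entry[1] if entry else None

cache = VersionGuardedCache()
cache.apply("price:42", 2, "18.99")   # newer event arrives first
cache.apply("price:42", 1, "19.99")   # older event arrives late
print(cache.get("price:42"))          # 18.99: the late event did not win
```

With this guard, at-least-once delivery and modest reordering become harmless, which is exactly the tolerance an event-driven invalidation path needs.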

A safer event-driven topology looks like this:

[Diagram 3: event-driven invalidation with Kafka, backed by a reconciliation and repair loop]

The important word here is repair. Real systems require repair. Event-driven invalidation is fast, but reconciliation is what makes it trustworthy over long periods.

Enterprise Example

Consider a global retail bank modernizing its customer servicing platform.

The legacy estate had a central customer information file, a product processor, a card platform, and a CRM package. Response times were poor because every “customer 360” query fanned out across aging systems. The first modernization step introduced an API gateway with aggressive response caching for customer summary screens. It worked—until it didn’t.

Relationship managers began seeing stale product holdings after same-day account openings. Support agents saw cached phone numbers after profile changes. Most painfully, regulatory preference flags for marketing consent lagged behind updates. The gateway cache had become a shared memory of customer state without any serious ownership model.

The architecture team stepped back and reframed the problem with bounded contexts:

  • Customer Profile context owned identity and contact data.
  • Product Holdings context owned account and card relationships.
  • Consent context owned marketing and privacy permissions.
  • Interaction context owned case and servicing history.

They then redesigned cache placement:

  • Client-side cache was kept for non-sensitive UI fragments and short-lived navigation state.
  • Gateway cache remained for static reference data such as branch directories, product metadata, and rate-card pages.
  • Service-side caches were introduced for Profile and Holdings, each driven by Kafka events from their owning services.
  • Consent was deliberately not broadly cached; reads came directly from the owning service or from tightly controlled service-local cache with short TTL and event invalidation.
  • Reconciliation jobs compared event-driven cache state against source projections every 30 minutes.

The result was not “everything is cached now.” It was better: different data had different memory policies according to domain risk.

Median latency on customer summary views fell from 2.8 seconds to under 400 milliseconds. Mainframe query load dropped materially. More importantly, compliance incidents caused by stale consent views disappeared because that domain stopped pretending it was cache-friendly.

That is enterprise architecture at its best: not chasing generic best practice, but placing performance mechanisms where the business can live with the consequences.

Operational Considerations

Caching is cheap to add and expensive to operate badly.

Observability

You need more than hit rate. Hit rate is a vanity metric if stale responses are hurting the business.

Track:

  • cache hit/miss ratio
  • stale read rate where measurable
  • key cardinality and eviction churn
  • latency by hit vs miss path
  • invalidation lag
  • event-to-cache propagation delay
  • reconciliation drift
  • fallback frequency during cache outage

Capacity and eviction

Shared caches fail in boring ways: memory fills, eviction spikes, hot keys dominate, and the database gets slammed by a thundering herd. Plan for cache warm-up, controlled TTL jitter, request coalescing, and backpressure.
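
Three of those controls fit in one sketch: jittered TTLs desynchronize expiry, a soft-stale serve lets readers use a slightly old value while one refresh is in flight, and a per-key lock coalesces concurrent misses. This is a single-process, thread-based illustration, not a distributed implementation:

```python
# Sketch of herd controls: TTL jitter, soft-stale serving, and per-key
# request coalescing. Single-process illustration only; a distributed
# cache needs the same ideas expressed with its own locking primitives.

import random, threading, time

class HerdSafeCache:
    def __init__(self, loader, ttl=30.0, jitter=0.2):
        self._loader = loader
        self._ttl, self._jitter = ttl, jitter
        self._entries = {}              # key -> (expires_at, value)
        self._locks = {}                # key -> per-key refresh lock
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]             # fresh hit
        lock = self._lock_for(key)
        if not lock.acquire(blocking=False):
            if entry:                   # someone else is refreshing:
                return entry[1]         # serve the soft-stale value
            lock.acquire()              # cold key: must wait for the load
        try:
            entry = self._entries.get(key)
            if entry and entry[0] > time.monotonic():
                return entry[1]         # refreshed while we waited
            value = self._loader(key)
            ttl = self._ttl * (1 + random.uniform(-self._jitter, self._jitter))
            self._entries[key] = (time.monotonic() + ttl, value)
            return value
        finally:
            lock.release()

loads = []
def loader(key):
    loads.append(key)
    return f"value-for-{key}"

cache = HerdSafeCache(loader, ttl=30.0)
print(cache.get("hot"))   # value-for-hot (one backing load)
print(cache.get("hot"))   # value-for-hot (hit, no extra load)
print(loads)              # ['hot']
```

The soft-stale branch is the one teams forget: without it, every expiry of a hot key still lines up a queue of waiters behind the refresh lock.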

Security

Do not let convenience outrun data classification. Encrypt sensitive cache entries where necessary. Scope keys carefully. Never let shared cache layers ignore entitlements or tenant boundaries. Multi-tenant cache leakage is the kind of bug that gets executives involved.

Multi-region behavior

If you run active-active regions, cache invalidation becomes distributed systems engineering. Replication lag, split-brain edge nodes, and region-local refresh logic can yield different truths by geography. For some domains that is acceptable. For others it is not.

Cost

Gateway and CDN cache can be economically attractive for broad, reusable traffic. Service-local caches can increase duplication but reduce central bottlenecks. Distributed cache clusters often look cheap until high availability, security, observability, and multi-region support are added.

Tradeoffs

There is no universally correct cache topology. There are only tradeoffs made explicit or hidden.

Client cache tradeoffs

  • Pros: simplest, low infrastructure burden, good for UX and offline behavior
  • Cons: weak central control, difficult invalidation, security concerns, inconsistent user experiences

Gateway cache tradeoffs

  • Pros: shared acceleration, centralized policy, shields backend from repetitive reads
  • Cons: blurred domain ownership, dangerous for personalized or fast-changing data, can become accidental system of record for read behavior

Service cache tradeoffs

  • Pros: domain-aware semantics, strong ownership, event-driven invalidation possible, protects databases and dependencies
  • Cons: higher implementation complexity, more operational logic, potential duplication, more moving parts

My practical heuristic:

  • If the cache needs domain events to stay correct, it probably belongs with the service.
  • If the cache is mostly a convenience for one caller, it probably belongs in the client.
  • If the cache serves the same public or shared representation to many callers, gateway caching is a good fit.

Failure Modes

Caches fail in patterns. Learn them before they become your postmortem template.

1. Stale forever

An invalidation event is missed. TTL is too long or absent. A key remains wrong indefinitely. This is why reconciliation matters.

2. Thundering herd

A popular key expires and thousands of requests stampede the backing store. Use request collapsing, soft TTLs, and jittered expiration.

3. Split semantics

Different layers cache different representations with different TTLs. The mobile app sees one price, the web app sees another, and the service has already updated a third. This is usually a topology smell, not just a tuning issue.

4. Cache dependency outage

Teams call the cache “just an optimization” until Redis is down and the database melts in five minutes. If the system cannot survive a cache miss storm, the cache is a dependency. Treat it that way.

5. Security leakage

Shared gateway cache forgets that authorization varies by user or tenant. One customer sees another customer’s data. This is not a bug. It is a career event.

6. Event ordering corruption

Kafka consumers process invalidation out of order or rebuild derived cache state from partially ordered events. The cache becomes internally inconsistent even though every message was delivered.

7. Rebuild gap

After failover or deployment, caches are cold and rebuild logic hammers upstream systems. Warm-up and rate controls matter more than teams think.

When Not To Use

Caching is not a moral good. Sometimes the right answer is to make the source read path faster and stop being clever.

Do not use caching when:

  • correctness requirements are near-transactional and stale reads are unacceptable
  • write frequency is so high that hit value is minimal
  • data is deeply personalized and shared cache efficiency is poor
  • authorization rules are complex and easy to violate in cached layers
  • the team lacks observability and operational discipline
  • the source can be optimized with better indexing, read replicas, CQRS read models, or denormalized projections instead

A common anti-pattern is using cache to hide poor domain boundaries. If every request requires stitching together six services because the business capability was sliced wrong, cache may reduce pain but preserve the real problem. Sometimes you do not need a better cache. You need a better service model.

Related Patterns

Several patterns sit naturally beside cache topology.

  • CQRS: separate read models often reduce the need for awkward edge caches and give explicit ownership to query-optimized views.
  • Materialized views: especially useful when aggregates span contexts and can be built asynchronously.
  • Strangler Fig migration: ideal for evolving cache strategy during monolith decomposition.
  • Event-driven architecture with Kafka: supports invalidation, refresh, and asynchronous projection maintenance.
  • API Composition: often drives gateway caching, but should not become a substitute for proper read model design.
  • Bulkheads and circuit breakers: useful when cache misses can overload dependencies.
  • Outbox pattern: improves reliability of event publication for cache invalidation flows.
  • Reconciliation / anti-entropy jobs: essential in long-lived distributed cache ecosystems.
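
Of these, the outbox pattern is worth sketching because it addresses the weakest link in event-driven invalidation: the gap between committing a state change and publishing the event. Here it is illustrated with sqlite3 purely for self-containment; in production the outbox table lives in the service database and a relay publishes to Kafka:

```python
# Sketch of the outbox pattern for reliable invalidation events.
# sqlite3 stands in for the service database; a relay process would
# publish pending rows to Kafka. State change and event are committed
# in one transaction, so an invalidation event cannot be silently lost.

import json, sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (product_id INTEGER PRIMARY KEY, price TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, "
             "published INTEGER DEFAULT 0)")

def change_price(product_id, price):
    with conn:  # one transaction covers both writes
        conn.execute(
            "INSERT OR REPLACE INTO prices VALUES (?, ?)", (product_id, price))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "PriceRuleChanged",
                         "product_id": product_id}),))

def relay(publish):
    # A separate poller publishes pending events, then marks them sent.
    for row_id, payload in list(conn.execute(
            "SELECT id, payload FROM outbox WHERE published = 0")):
        publish(json.loads(payload))
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))

published = []
change_price(42, "18.99")
relay(published.append)
print(published)  # [{'type': 'PriceRuleChanged', 'product_id': 42}]
```

Because the relay retries until a row is marked published, consumers must tolerate duplicates, which is another reason idempotent cache updates matter.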

One hard-earned lesson: CQRS read models are often a cleaner answer than ever-more-intricate caching rules. If a query matters enough, own it explicitly.

Summary

Cache topology in microservices is not a tuning exercise. It is an architectural decision about who is allowed to remember, for how long, and with what business consequences.

  • Client cache is best for caller-local convenience, session reuse, and low-risk stale data.
  • Gateway cache is best for shared, reusable representations and broad acceleration at the edge.
  • Service cache is best when the owning domain can define semantics, invalidation, and correctness.

The decisive factor is not technology. It is domain meaning. A stale branch address is an inconvenience. A stale consent flag is a compliance breach. A stale inventory count is a sales problem. A stale payment state is a support nightmare. Different bounded contexts deserve different memory policies.

In migration, start tactically, then move semantics inward. Use a strangler approach. Let Kafka help with event-driven invalidation, but never trust events alone—add reconciliation. Measure drift, not just hit rate. Assume failures will happen. Design for repair.

If there is one line worth carrying into your next architecture review, it is this:

A cache is not just faster data. It is delayed truth with a lease.

Write those leases carefully.
