Rolling deployments are designed to let multiple versions of a service run at the same time without downtime. Most teams think about compatibility at the API and database layers, but there’s another place where versions quietly interact:
the cache.
We ran into this during a production rollout, and the fix turned out to be simpler than the incident itself.
The incident
A production deployment started failing partway through the rollout.
- Deserialization errors appeared in the logs
- Request latency increased
- Error rates climbed
The change that triggered it was small: we renamed a single field in a model class.
This was a backward-incompatible change. In general, such changes should be avoided, but sometimes they’re unavoidable — especially when consuming schemas owned by other teams or external systems.
We made the required changes to ensure downstream services would not break, and we updated our integration tests so they would pass. There were no API contract changes and no database migrations. From the perspective of our external dependencies, the change appeared safe.
What we didn’t anticipate was the cache.
The service was a data aggregator. It consumed upstream data, enriched it, and published results downstream. It didn’t own persistent storage. But during the rolling deployment, with two versions of the service running at the same time, new instances couldn’t deserialize cached data written by old instances — and old instances failed on data written by the new ones.
This wasn’t something our integration tests caught. The issue only emerged during the rollout, when multiple service versions were alive simultaneously.
The failure wasn’t in our APIs, our database layer, or even our downstream dependencies.
It was in the shared cache.
When this problem exists (and when it doesn’t)
This issue doesn’t apply to every architecture.
If your cache is:
- Local
- In-memory
- Scoped per instance or per host
Then each service version only sees data it wrote itself. Schema changes are naturally isolated.
The problem appears only when all of the following are true:
- The cache is shared or distributed (Redis, Memcached, etc.)
- Multiple service versions run simultaneously
- Those versions read and write the same cache entries
In other words: rolling deployments + shared cache.
If that’s your setup, this failure mode isn’t rare — it’s inevitable.
What actually went wrong
Most teams treat cache as an internal optimization. But a shared cache is shared state, and shared state between independently deployed versions is effectively a contract.
During a rolling deployment, the cache outlives any single service version. Old code and new code both interact with it at the same time.
Here’s a simplified version of what happened:
14:23:01 - Deployment starts, new instances come up
14:23:15 - Old instance writes: {"userId": 123, "userName": "alice"}
14:23:18 - New instance reads same key, expects: {"userId": 123, "fullName": "alice"}
14:23:18 - Deserialization fails
14:23:19 - New instance writes: {"userId": 456, "fullName": "bob"}
14:23:20 - Old instance reads same key, expects: {"userId": 456, "userName": "bob"}
14:23:20 - Deserialization fails
Renaming a field introduced a breaking change — not at the API layer, but at the cache layer.
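To make the failure concrete, here is a minimal, self-contained sketch. The class names are hypothetical and Jackson (jackson-databind) is assumed as the serializer purely for illustration; any serializer that binds by field name fails the same way.

import com.fasterxml.jackson.databind.ObjectMapper;

public class CacheCompatDemo {

    // Shape written to the shared cache by the old service version
    public static class OldUser {
        public long userId;
        public String userName;
    }

    // Shape expected by the new service version (field renamed)
    public static class NewUser {
        public long userId;
        public String fullName;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        OldUser cached = new OldUser();
        cached.userId = 123;
        cached.userName = "alice";

        // What the old instance stored in the shared cache
        String json = mapper.writeValueAsString(cached);

        // The new instance reads the same entry and fails to bind it:
        // UnrecognizedPropertyException: Unrecognized field "userName"
        mapper.readValue(json, NewUser.class);
    }
}

The reverse direction fails the same way: the old instance cannot bind the new instance's "fullName" field.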
Why common alternatives don’t scale well
When this happens, teams usually consider one of the following:
- Pausing or draining traffic during deploys
- Flushing the entire cache
- Coordinating tightly timed releases across teams
All of these can work, but they add operational overhead and reduce deployment flexibility. They also don’t scale well as systems and teams grow.
We wanted a solution that made rolling deployments boring again.
The solution: version your cache keys
Instead of using cache keys like this:
user:123
We added an explicit version prefix:
v1:user:123
v2:user:123
Each service version reads and writes only the keys that match its own version.
That’s it.
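A minimal sketch of what this looks like in code. The CacheClient interface here is a stand-in for whatever Redis or Memcached client you actually use; the only point is that every read and write goes through the versioned key.

public class VersionedCache {

    // Bumped whenever the cached model changes shape
    // (in practice we derive this automatically; see below)
    private static final String CACHE_VERSION = "v2";

    private final CacheClient cache;   // hypothetical thin wrapper around Redis/Memcached

    public VersionedCache(CacheClient cache) {
        this.cache = cache;
    }

    private String key(String logicalKey) {
        return CACHE_VERSION + ":" + logicalKey;   // e.g. "v2:user:123"
    }

    public String get(String logicalKey) {
        return cache.get(key(logicalKey));
    }

    public void put(String logicalKey, String value) {
        cache.put(key(logicalKey), value);
    }

    public interface CacheClient {
        String get(String key);
        void put(String key, String value);
    }
}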
How we calculate the cache key version
One open question with cache key versioning is how to manage the version itself.
Hard-coding version numbers or manually bumping them works, but it’s easy to forget and adds process overhead. We wanted versioning to be automatic and transparent.
Instead of maintaining versions manually, we derive them directly from the structure of the model class being cached.
At a high level, this works as follows:
- We inspect the model class using reflection
- We extract its structural shape:
- Field names
- Field types
- Nested objects, recursively
- From that canonical representation, we compute a hash
- That hash becomes the version prefix for the cache key
Because the version is derived from the model structure:
- Any structural change (renamed field, type change, added or removed field) produces a new version
- If nothing changes in the model, the version remains the same
- Version bumps happen automatically and only when needed
The version is per model class, not global. Each cached model evolves independently.
Conceptually, the cache key looks like this:
<model-structure-hash>:<logical-cache-key>
This gives us transparent version bumps on any model structure change, without requiring developers to remember to update cache versions during refactors.
Here is a simplified example in Java showing the concept of deriving a stable hash from a class structure using reflection:
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class RecursiveCacheVersioner {

    public static String getVersion(Class<?> rootClass) {
        // We use a Set to prevent infinite recursion on circular dependencies
        String schemaBuffer = buildSchemaString(rootClass, new HashSet<>());
        return hashString(schemaBuffer);
    }

    private static String buildSchemaString(Class<?> clazz, Set<Class<?>> visited) {
        // Base case: if we've seen this class or it's a basic type, just return its name
        if (isSimpleType(clazz) || visited.contains(clazz)) {
            return clazz.getSimpleName();
        }
        visited.add(clazz);

        StringBuilder sb = new StringBuilder();
        sb.append(clazz.getSimpleName()).append("{");

        // Sort fields (and skip static/synthetic ones) to ensure the hash is deterministic
        List<Field> fields = Arrays.stream(clazz.getDeclaredFields())
                .filter(f -> !f.isSynthetic() && !Modifier.isStatic(f.getModifiers()))
                .sorted(Comparator.comparing(Field::getName))
                .collect(Collectors.toList());

        for (Field field : fields) {
            sb.append(field.getName()).append(":");
            // RECURSIVE STEP:
            // If the field is another model, we get its structural string too
            sb.append(buildSchemaString(field.getType(), visited));
            sb.append(";");
        }
        sb.append("}");
        return sb.toString();
    }

    private static boolean isSimpleType(Class<?> clazz) {
        return clazz.isPrimitive() ||
                clazz.getName().startsWith("java.lang") ||
                clazz.getName().startsWith("java.util");
    }

    private static String hashString(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            // Java 17+ native hex formatting; a short prefix is enough for a version tag
            return HexFormat.of().formatHex(hash).substring(0, 8);
        } catch (Exception e) {
            return "default";
        }
    }
}
What the generator sees: For a User class with an Address object, the builder generates a canonical string like this:
User{address:Address{city:String;zip:String;};id:int;name:String;}
This string is then hashed to create the version prefix.
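Tying it together, the derived version becomes the key prefix. A small sketch, reusing RecursiveCacheVersioner and the User class from the example above; the CacheKeys helper and its method name are illustrative, not part of any library.

public final class CacheKeys {

    // Computed once at startup; see the performance note below
    private static final String USER_VERSION = RecursiveCacheVersioner.getVersion(User.class);

    public static String userKey(long userId) {
        // e.g. "a1b2c3d4:user:123"
        return USER_VERSION + ":user:" + userId;
    }
}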
Note on language support
This approach works best in strongly typed languages where type information is available at runtime (for example, Java, Kotlin, or C#). In these environments, reflection makes it possible to reliably inspect model structure and derive a stable version from it.
In more dynamic languages like JavaScript, where runtime type information is limited or implicit, the same technique may require a different approach — such as explicitly defined schemas, schema versioning, or build-time code generation. The underlying idea still applies, but the implementation details will differ.
Note on performance
Cache key versions are computed once per model class at service startup, not on every cache read or write. This keeps the runtime overhead negligible and ensures that cache operations remain as fast as before. Version calculation is part of initialization, while steady-state request handling stays unaffected.
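A simple way to get that behavior is to memoize the result per model class, computed eagerly at startup or lazily on first use. A sketch, again reusing RecursiveCacheVersioner from above; the registry class itself is hypothetical.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class CacheVersionRegistry {

    private static final Map<Class<?>, String> VERSIONS = new ConcurrentHashMap<>();

    // Reflection runs at most once per model class; later calls are a plain map lookup
    public static String versionOf(Class<?> modelClass) {
        return VERSIONS.computeIfAbsent(modelClass, RecursiveCacheVersioner::getVersion);
    }
}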
Why this approach works well
Versioning cache keys allows old and new service versions to safely coexist during a rolling deployment. Each version reads and writes only the data it understands, so incompatible representations never collide.
This has a few important consequences:
- No cache flushes are required during deployments
- No coordination with other teams is needed
- Breaking schema changes are isolated by version
- Old and new versions can run side by side without errors
Because versions are derived automatically from the model structure, there’s also no manual version management. Refactors that change the model naturally invalidate incompatible cache entries, while compatible changes don’t cause unnecessary cache churn.
Once the deployment completes, older versions stop accessing the cache and their entries expire naturally according to TTL. The cache no longer forces compatibility between versions that were never meant to be compatible.
Most importantly, cache compatibility stays aligned with what actually matters: the shape of the data being serialized and deserialized.
When this approach is especially useful
Cache key versioning is particularly effective when:
- You use shared caches across multiple service instances
- You deploy frequently using rolling updates
- You don’t fully control upstream schemas
- You operate aggregator or middleware services
In our case, it allowed upstream provider teams to make changes without coordinating cache behavior with us, while keeping deployments safe on our side.
Trade-offs and limitations
Like most architectural decisions, cache key versioning comes with trade-offs that are worth understanding upfront.
Multiple versions in the cache
During a rolling deployment, multiple versions of the same logical object can exist in the cache at the same time. In our case, deployments take around 15 minutes. With a 1-hour TTL, this resulted in roughly double the cache entries for a short period.
Higher temporary cache usage
This approach trades memory for safety. For us, the impact was small: cache utilization increased from ~45% to ~52% during deployments and returned to normal once older entries expired.
Not suitable for every environment
If cache memory is extremely constrained, or if your system requires strict cross-version consistency during deployments, this approach may not be a good fit.
Intentional cache misses during transitions
New versions will miss the cache on first access and recompute values. This is expected and intentional — it’s safer than attempting to deserialize incompatible data. Cache TTLs and the cache-miss path should be designed accordingly.
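In code, that cache-miss path is just the usual read-through pattern applied to versioned keys. A sketch under stated assumptions: the VersionedStore interface is a placeholder for your cache client, and the 1-hour TTL mirrors the value mentioned above.

import java.time.Duration;
import java.util.function.Supplier;

public class ReadThroughCache {

    private final VersionedStore store;   // hypothetical typed cache wrapper with TTL support

    public ReadThroughCache(VersionedStore store) {
        this.store = store;
    }

    public <T> T getOrLoad(String versionedKey, Class<T> type, Supplier<T> loader) {
        T cached = store.get(versionedKey, type);
        if (cached != null) {
            return cached;                      // hit: entry written by the same model version
        }
        T fresh = loader.get();                 // miss: e.g. first access after a version bump
        store.put(versionedKey, fresh, Duration.ofHours(1));   // TTL also bounds how long old-version entries linger
        return fresh;
    }

    public interface VersionedStore {
        <T> T get(String key, Class<T> type);
        <T> void put(String key, T value, Duration ttl);
    }
}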
What you gain is safer rolling deployments, simpler operational behavior, and reduced risk during schema changes.
What you give up is some cache efficiency during deployments.
For us, that trade-off was well worth it.
A better mental model for caches
Caches are often treated as implementation details. In reality, a shared cache is part of your system’s runtime contract.
If multiple versions of a service can read and write the same cached data, then cache compatibility matters just as much as API or storage compatibility.
Versioning cache keys makes that contract explicit.
Final takeaway
If you’re doing rolling deployments and using a shared cache:
Version your cache keys by default.
It’s a small change with a large impact on reliability, and it keeps deployments predictable as systems evolve.