Rolling deployments are designed to let multiple versions of a service run at the same time without downtime. Most teams think about compatibility at the API and database layers, but there’s another place where versions quietly interact:
the cache.
We ran into this during a production rollout, and the fix turned out to be simpler than the incident itself.
The incident
A production deployment started failing partway through the rollout.
- Deserialization errors appeared in the logs
- Request latency increased
- Error rates climbed
The change that triggered it was small: we renamed a single field in a model class.
This was a backward-incompatible change. In general, such changes should be avoided, but sometimes they’re unavoidable — especially when consuming schemas owned by other teams or external systems.
We made the required changes to ensure downstream services would not break, and we updated our integration tests so they would pass. There were no API contract changes and no database migrations. From the perspective of our external dependencies, the change appeared safe.
What we didn’t anticipate was the cache.
The service was a data aggregator. It consumed upstream data, enriched it, and published results downstream. It didn’t own persistent storage. But during the rolling deployment, with two versions of the service running at the same time, new instances couldn’t deserialize cached data written by old instances — and old instances failed on data written by the new ones.
This wasn’t something our integration tests caught. The issue only emerged during the rollout, when multiple service versions were alive simultaneously.
The failure wasn’t in our APIs, our database layer, or even our downstream dependencies.
It was in the shared cache.
When this problem exists (and when it doesn’t)
This issue doesn’t apply to every architecture.
If your cache is:
- Local
- In-memory
- Scoped per instance or per host
Then each service version only sees data it wrote itself. Schema changes are naturally isolated.
The problem appears only when all of the following are true:
- The cache is shared or distributed (Redis, Memcached, etc.)
- Multiple service versions run simultaneously
- Those versions read and write the same cache entries
In other words: rolling deployments + shared cache.
If that’s your setup, this failure mode isn’t rare — it’s inevitable.
What actually went wrong
Most teams treat cache as an internal optimization. But a shared cache is shared state, and shared state between independently deployed versions is effectively a contract.
During a rolling deployment, the cache outlives any single service version. Old code and new code both interact with it at the same time.
Here’s a simplified version of what happened:
14:23:01 - Deployment starts, new instances come up
14:23:15 - Old instance writes: {"userId": 123, "userName": "alice"}
14:23:18 - New instance reads same key, expects: {"userId": 123, "fullName": "alice"}
14:23:18 - Deserialization fails
14:23:19 - New instance writes: {"userId": 456, "fullName": "bob"}
14:23:20 - Old instance reads same key, expects: {"userId": 456, "userName": "bob"}
14:23:20 - Deserialization fails
Renaming a field introduced a breaking change — not at the API layer, but at the cache layer.
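To make the failure concrete, here is a minimal, self-contained sketch. The class names are hypothetical and Jackson (jackson-databind) is assumed as the serializer purely for illustration; any serializer that binds by field name fails the same way.

import com.fasterxml.jackson.databind.ObjectMapper;

public class CacheCompatDemo {

    // Shape written to the shared cache by the old service version
    public static class OldUser {
        public long userId;
        public String userName;
    }

    // Shape expected by the new service version (field renamed)
    public static class NewUser {
        public long userId;
        public String fullName;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        OldUser cached = new OldUser();
        cached.userId = 123;
        cached.userName = "alice";

        // What the old instance stored in the shared cache
        String json = mapper.writeValueAsString(cached);

        // The new instance reads the same entry and fails to bind it:
        // UnrecognizedPropertyException: Unrecognized field "userName"
        mapper.readValue(json, NewUser.class);
    }
}

The reverse direction fails the same way: the old instance cannot bind the new instance's "fullName" field.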
Why common alternatives don’t scale well
When this happens, teams usually consider one of the following:
- Pausing or draining traffic during deploys
- Flushing the entire cache
- Coordinating tightly timed releases across teams
All of these can work, but they add operational overhead and reduce deployment flexibility. They also don’t scale well as systems and teams grow.
We wanted a solution that made rolling deployments boring again.
The solution: version your cache keys
Instead of using cache keys like this:
user:123
We added an explicit version prefix:
v1:user:123
v2:user:123
Each service version reads and writes only the keys that match its own version.
That’s it.
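A minimal sketch of what this looks like in code. The CacheClient interface here is a stand-in for whatever Redis or Memcached client you actually use; the only point is that every read and write goes through the versioned key.

public class VersionedCache {

    // Bumped whenever the cached model changes shape
    // (in practice we derive this automatically; see below)
    private static final String CACHE_VERSION = "v2";

    private final CacheClient cache;   // hypothetical thin wrapper around Redis/Memcached

    public VersionedCache(CacheClient cache) {
        this.cache = cache;
    }

    private String key(String logicalKey) {
        return CACHE_VERSION + ":" + logicalKey;   // e.g. "v2:user:123"
    }

    public String get(String logicalKey) {
        return cache.get(key(logicalKey));
    }

    public void put(String logicalKey, String value) {
        cache.put(key(logicalKey), value);
    }

    public interface CacheClient {
        String get(String key);
        void put(String key, String value);
    }
}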
How we calculate the cache key version
One open question with cache key versioning is how to manage the version itself.
Hard-coding version numbers or manually bumping them works, but it’s easy to forget and adds process overhead. We wanted versioning to be automatic and transparent.
Instead of maintaining versions manually, we derive them directly from the structure of the model class being cached.
At a high level, this works as follows:
- We inspect the model class using reflection
- We extract its structural shape:
- Field names
- Field types
- Nested objects, recursively
- From that canonical representation, we compute a hash
- That hash becomes the version prefix for the cache key
Because the version is derived from the model structure:
- Any structural change (renamed field, type change, added or removed field) produces a new version
- If nothing changes in the model, the version remains the same
- Version bumps happen automatically and only when needed
The version is per model class, not global. Each cached model evolves independently.
Conceptually, the cache key looks like this:
<model-structure-hash>:<logical-cache-key>
This gives us transparent version bumps on any model structure change, without requiring developers to remember to update cache versions during refactors.
Here is a simplified example in Java showing the concept of deriving a stable hash from a class structure using reflection:
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class RecursiveCacheVersioner {

    public static String getVersion(Class<?> rootClass) {
        // We use a Set to prevent infinite recursion on circular dependencies
        String schemaBuffer = buildSchemaString(rootClass, new HashSet<>());
        return hashString(schemaBuffer);
    }

    private static String buildSchemaString(Class<?> clazz, Set<Class<?>> visited) {
        // Base case: if we've seen this class or it's a basic type, just return its name
        if (isSimpleType(clazz) || visited.contains(clazz)) {
            return clazz.getSimpleName();
        }
        visited.add(clazz);

        StringBuilder sb = new StringBuilder();
        sb.append(clazz.getSimpleName()).append("{");

        // Sort fields (and skip static/synthetic ones) to ensure the hash is deterministic
        List<Field> fields = Arrays.stream(clazz.getDeclaredFields())
                .filter(f -> !f.isSynthetic() && !Modifier.isStatic(f.getModifiers()))
                .sorted(Comparator.comparing(Field::getName))
                .collect(Collectors.toList());

        for (Field field : fields) {
            sb.append(field.getName()).append(":");
            // RECURSIVE STEP:
            // If the field is another model, we get its structural string too
            sb.append(buildSchemaString(field.getType(), visited));
            sb.append(";");
        }
        sb.append("}");
        return sb.toString();
    }

    private static boolean isSimpleType(Class<?> clazz) {
        return clazz.isPrimitive() ||
                clazz.getName().startsWith("java.lang") ||
                clazz.getName().startsWith("java.util");
    }

    private static String hashString(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            // Java 17+ native hex formatting; a short prefix is enough for a version tag
            return HexFormat.of().formatHex(hash).substring(0, 8);
        } catch (Exception e) {
            return "default";
        }
    }
}
What the generator sees: For a User class with an Address object, the builder generates a canonical string like this:
User{address:Address{city:String;zip:String;};id:int;name:String;}
This string is then hashed to create the version prefix.
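Tying it together, the derived version becomes the key prefix. A small sketch, reusing RecursiveCacheVersioner and the User class from the example above; the CacheKeys helper and its method name are illustrative, not part of any library.

public final class CacheKeys {

    // Computed once at startup; see the performance note below
    private static final String USER_VERSION = RecursiveCacheVersioner.getVersion(User.class);

    public static String userKey(long userId) {
        // e.g. "a1b2c3d4:user:123"
        return USER_VERSION + ":user:" + userId;
    }
}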
Note on language support
This approach works best in strongly typed languages where type information is available at runtime (for example, Java, Kotlin, or C#). In these environments, reflection makes it possible to reliably inspect model structure and derive a stable version from it.
In more dynamic languages like JavaScript, where runtime type information is limited or implicit, the same technique may require a different approach — such as explicitly defined schemas, schema versioning, or build-time code generation. The underlying idea still applies, but the implementation details will differ.
Note on performance
Cache key versions are computed once per model class at service startup, not on every cache read or write. This keeps the runtime overhead negligible and ensures that cache operations remain as fast as before. Version calculation is part of initialization, while steady-state request handling stays unaffected.
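A simple way to get that behavior is to memoize the result per model class, computed eagerly at startup or lazily on first use. A sketch, again reusing RecursiveCacheVersioner from above; the registry class itself is hypothetical.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class CacheVersionRegistry {

    private static final Map<Class<?>, String> VERSIONS = new ConcurrentHashMap<>();

    // Reflection runs at most once per model class; later calls are a plain map lookup
    public static String versionOf(Class<?> modelClass) {
        return VERSIONS.computeIfAbsent(modelClass, RecursiveCacheVersioner::getVersion);
    }
}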
Why this approach works well
Versioning cache keys allows old and new service versions to safely coexist during a rolling deployment. Each version reads and writes only the data it understands, so incompatible representations never collide.
This has a few important consequences:
- No cache flushes are required during deployments
- No coordination with other teams is needed
- Breaking schema changes are isolated by version
- Old and new versions can run side by side without errors
Because versions are derived automatically from the model structure, there’s also no manual version management. Refactors that change the model naturally invalidate incompatible cache entries, while compatible changes don’t cause unnecessary cache churn.
Once the deployment completes, older versions stop accessing the cache and their entries expire naturally according to TTL. The cache no longer forces compatibility between versions that were never meant to be compatible.
Most importantly, cache compatibility stays aligned with what actually matters: the shape of the data being serialized and deserialized.
When this approach is especially useful
Cache key versioning is particularly effective when:
- You use shared caches across multiple service instances
- You deploy frequently using rolling updates
- You don’t fully control upstream schemas
- You operate aggregator or middleware services
In our case, it allowed upstream provider teams to make changes without coordinating cache behavior with us, while keeping deployments safe on our side.
Trade-offs and limitations
Like most architectural decisions, cache key versioning comes with trade-offs that are worth understanding upfront.
Multiple versions in the cache
During a rolling deployment, multiple versions of the same logical object can exist in the cache at the same time. In our case, deployments take around 15 minutes. With a 1-hour TTL, this resulted in roughly double the cache entries for a short period.
Higher temporary cache usage
This approach trades memory for safety. For us, the impact was small: cache utilization increased from ~45% to ~52% during deployments and returned to normal once older entries expired.
Not suitable for every environment
If cache memory is extremely constrained, or if your system requires strict cross-version consistency during deployments, this approach may not be a good fit.
Intentional cache misses during transitions
New versions will miss the cache on first access and recompute values. This is expected and intentional — it’s safer than attempting to deserialize incompatible data. Cache TTLs and the cache-miss path should be designed accordingly.
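In code, that cache-miss path is just the usual read-through pattern applied to versioned keys. A sketch under stated assumptions: the VersionedStore interface is a placeholder for your cache client, and the 1-hour TTL mirrors the value mentioned above.

import java.time.Duration;
import java.util.function.Supplier;

public class ReadThroughCache {

    private final VersionedStore store;   // hypothetical typed cache wrapper with TTL support

    public ReadThroughCache(VersionedStore store) {
        this.store = store;
    }

    public <T> T getOrLoad(String versionedKey, Class<T> type, Supplier<T> loader) {
        T cached = store.get(versionedKey, type);
        if (cached != null) {
            return cached;                      // hit: entry written by the same model version
        }
        T fresh = loader.get();                 // miss: e.g. first access after a version bump
        store.put(versionedKey, fresh, Duration.ofHours(1));   // TTL also bounds how long old-version entries linger
        return fresh;
    }

    public interface VersionedStore {
        <T> T get(String key, Class<T> type);
        <T> void put(String key, T value, Duration ttl);
    }
}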
What you gain is safer rolling deployments, simpler operational behavior, and reduced risk during schema changes.
What you give up is some cache efficiency during deployments.
For us, that trade-off was well worth it.
A better mental model for caches
Caches are often treated as implementation details. In reality, a shared cache is part of your system’s runtime contract.
If multiple versions of a service can read and write the same cached data, then cache compatibility matters just as much as API or storage compatibility.
Versioning cache keys makes that contract explicit.
Final takeaway
If you’re doing rolling deployments and using a shared cache:
Version your cache keys by default.
It’s a small change with a large impact on reliability, and it keeps deployments predictable as systems evolve.