Kader Khan

Event Sourcing - System Design Pattern

“Imagine every action in your system writes to a timeline. This timeline can be read later to rebuild any version of the system — like time travel.”

The Problem with Traditional CRUD Systems

In traditional systems (like most apps we’ve built):

  • We update the database to change state (e.g., set status = “processed”)
  • We overwrite old values
  • We lose history — we only store the latest state

📌 This leads to real problems such as:

  1. No audit trail
    We often can’t answer questions like: “What exactly happened to this order between 10:01 and 10:03?”

  2. Inconsistencies due to partial failures
    If part of a workflow fails (e.g., processing succeeds, but updating state fails), the system goes into an inconsistent state with no clear way to fix it.

  3. Hard to debug or replay history
    We cannot rewind to a point in time and reconstruct what the state should have been.

👉 As systems scale and workloads grow, these problems get worse. We need a better way to track changes than just “update this value now.”


Event Sourcing — The Core Idea (and How It Solves This)

Event Sourcing says:
👉 Instead of saving only the current state in the database, save every change as an event in order.

These events are:

✔ Immutable (never changed after they’re written)
✔ Ordered (every event has a timestamp or sequence)
✔ Replayable (we rebuild the current state by replaying them in order)

So instead of doing:

| Product Price |
| ------------- |
| 100           |

We store events like:

  • PriceChanged from 90 ➝ 100 at 10:01AM
  • PriceChanged from 100 ➝ 110 at 10:10AM

To compute the current state, we simply replay those events.
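
Here’s a tiny sketch of that replay in Python (the event shape and field names are just assumptions for illustration):

```python
# Each event records a change; we never overwrite the final value itself.
events = [
    {"type": "PriceChanged", "from": 90, "to": 100, "at": "10:01AM"},
    {"type": "PriceChanged", "from": 100, "to": 110, "at": "10:10AM"},
]

def current_price(events, initial=90):
    """Fold the ordered event list into the latest state."""
    price = initial
    for event in events:
        if event["type"] == "PriceChanged":
            price = event["to"]
    return price

print(current_price(events))  # 110
```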


💡 What Event Sourcing Solves (In Simple Terms)

| Traditional CRUD | Event Sourcing |
| --- | --- |
| Only current state | Full history of all changes |
| Hard to track why something happened | We can replay to see why something happened |
| Race conditions can corrupt data | We always record events in a safe log |
| Hard to debug | We’ve got an audit trail |

So the problem being solved is not just scaling — it’s:

“How do we store every change in a way we can trace, debug, and rebuild the system state reliably?”


📦 Event Sourcing Architecture (AWS)


🧱 AWS Architecture Example — Ride Booking (From AWS Guidance)

AWS provides a real architecture pattern for event sourcing:

1. User Action — Client Calls API Gateway

A user does something, e.g., Book a Ride.
This request first hits Amazon API Gateway, which exposes a public API endpoint.
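
As a rough sketch (the endpoint URL and request body here are completely made up), the client call is just a normal HTTP request:

```python
import requests  # third-party: pip install requests

# Hypothetical API Gateway endpoint for the ride-booking command.
API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/rides"

response = requests.post(API_URL, json={"riderId": "user-42", "pickup": "Downtown"})
print(response.status_code, response.text)
```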


2. Lambda Writes an Event to Amazon Kinesis (AWS’s Managed Alternative to Kafka)

The Lambda function acts as a command handler:

✔ It checks business logic
✔ It creates an event like RideBooked
✔ It sends this event to Amazon Kinesis Data Streams — an append-only event storage and streaming service

📌 Why Kinesis?
Because it can handle very high write throughput and acts as an event log we can replay.
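
A minimal sketch of what such a command handler Lambda might look like, assuming an API Gateway proxy integration, a stream named ride-events, and a simple JSON body (all hypothetical, not part of the AWS guidance):

```python
import json
import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "ride-events"  # hypothetical stream name

def handler(event, context):
    """API Gateway -> Lambda command handler: validate, then append a RideBooked event."""
    body = json.loads(event["body"])
    if not body.get("riderId"):
        return {"statusCode": 400, "body": "riderId is required"}

    ride_event = {"type": "RideBooked", "riderId": body["riderId"], "pickup": body.get("pickup")}

    # Append-only write; the partition key keeps one rider's events in order.
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(ride_event).encode("utf-8"),
        PartitionKey=body["riderId"],
    )
    return {"statusCode": 202, "body": json.dumps({"status": "accepted"})}
```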


3. Events Are Stored & Archived

Kinesis doesn’t just stream — we can also:

✔ Archive events in Amazon S3 for long-term retention (for compliance & audits)
✔ Retain events for replay or future analysis

This means our system keeps a complete history of every change, backed up in S3 for as long as we choose to retain it (for compliance and audits).


4. Event Processor Lambda Builds Materialized Views

Another Lambda function consumes events from Kinesis to build read models (optimized tables that are easy to query). Typical read stores are:

✔ Amazon Aurora (MySQL/PostgreSQL)
✔ Amazon DynamoDB

This process creates current state views for read-heavy use cases.
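
A sketch of such an event processor using DynamoDB as the read store (the table name and attribute names are assumptions):

```python
import base64
import json
import boto3

table = boto3.resource("dynamodb").Table("rides-read-model")  # hypothetical table

def handler(event, context):
    """Kinesis -> Lambda: project each event into a query-friendly row."""
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload["type"] == "RideBooked":
            table.put_item(Item={
                "riderId": payload["riderId"],
                "status": "booked",
                "pickup": payload.get("pickup", "unknown"),
            })
```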


5. Replay to Rebuild State (Hydration Model)

If something goes wrong, or we want to compute the state at any point in time, we simply replay the events stored in Kinesis (and archived in S3) through the hydration model.

This is called Hydration: re-deriving the current or historical state of the system from the event log.
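
For example, if the archive lands in S3 as JSON-lines objects (that format, plus the bucket and prefix names, are assumptions), a replay job could stream the history back like this:

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET, PREFIX = "ride-events-archive", "events/2024/"  # hypothetical archive location

def replay_archived_events():
    """Yield every archived event. S3 lists keys in lexicographic order,
    so timestamp-prefixed keys come back oldest first."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            for line in body.decode("utf-8").splitlines():
                if line.strip():
                    yield json.loads(line)

# These events can then be fed through the same hydration logic used for live processing.
```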


🧠 Hydration Model Explained (Simple)

Think of hydration as:

🎬 Re-running the entire timeline of events
so that our system always ends up in the correct state.

For example, in a video streaming platform:

  • Event 1: VideoUploaded
  • Event 2: VideoProcessingStarted
  • Event 3: VideoProcessingSucceeded

To know the current state:

state = "initial"
apply VideoUploaded → state="uploaded"
apply VideoProcessingStarted → state="processing"
apply VideoProcessingSucceeded → state="success"

That’s Hydration — it rebuilds state by replaying events in order, not by reading a single “status” value.
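
In code, hydration is just a fold over the event list. A minimal sketch, assuming the event names from the example above plus a simple transition table:

```python
# Maps each event type to the status it produces (an assumption for this example).
TRANSITIONS = {
    "VideoUploaded": "uploaded",
    "VideoProcessingStarted": "processing",
    "VideoProcessingSucceeded": "success",
    "VideoProcessingFailed": "failed",
}

def hydrate(events, initial="initial"):
    """Replay events in order to re-derive the current state."""
    state = {"status": initial, "error": None}
    for event in events:
        state["status"] = TRANSITIONS.get(event["type"], state["status"])
        if event["type"] == "VideoProcessingFailed":
            state["error"] = event.get("error")  # keep *why* it failed
    return state

events = [
    {"type": "VideoUploaded"},
    {"type": "VideoProcessingStarted"},
    {"type": "VideoProcessingSucceeded"},
]
print(hydrate(events))  # {'status': 'success', 'error': None}
```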


🐘 Why Kafka or Kinesis Are Used

Both Kafka (used in the video streaming example below) and Kinesis (the AWS alternative) are event streaming platforms: essentially massive, durable, ordered logs of events. On top of that, Kafka’s consumer group and topic partition concepts ensure that each processor receives its events in sequence and applies them sequentially.

Why this matters

✔ We can replay events — essential for event sourcing
✔ We can scale horizontally (many consumers)
✔ We guarantee event order within partitions — crucial for replay and consistent state reconstruction


📌 Consumer Groups & Topic Partitions (Why They Matter)

When the event volume is large, we cannot have one server read everything.

So we use:

🔹 Kafka Consumer Group

Multiple workers form a group and share the work.
Each worker is assigned its own set of partitions, so no event is processed twice within the group.

🔹 Topic Partitions

A topic (event category) is split into partitions — think of partitions as divided lanes of the event log. This allows:

✔ Parallel processing
✔ Ordered event consumption per partition
✔ Scale without losing order for each entity

For example, in the video streaming pipeline, video A’s events always land in partition 0 and video B’s in partition 1, so each video’s events are processed in order even across many workers.
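
Here’s a sketch using the kafka-python client (the library choice, topic name, and broker address are assumptions) showing how keying by video ID keeps each video’s events in one partition, while a consumer group spreads partitions across workers:

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python

# Producer: events with the same key always hash to the same partition,
# so each video's events stay strictly ordered.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("video-events", key="video-A", value={"type": "VideoUploaded"})
producer.send("video-events", key="video-A", value={"type": "VideoProcessingStarted"})
producer.flush()

# Consumer: every worker that joins the "video-processors" group is assigned
# a disjoint subset of partitions, so the group scales without duplicate processing.
consumer = KafkaConsumer(
    "video-events",
    bootstrap_servers="localhost:9092",
    group_id="video-processors",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.partition, message.key, message.value)
```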


Problem Being Solved

Traditional system:

Database:
video_id | status
------------------
123      | "processing"

Problems:
✔ What if the update failed?
✔ What do you show to the user?
✔ What if you need to know the exact steps the video went through?

Event Sourcing Pattern solves it:

Event Log:
1. VideoUploaded(videoID=123)
2. VideoProcessingStarted(videoID=123)
3. VideoProcessingProgress(videoID=123, percent=50)
4. VideoProcessingFailed(videoID=123, error="timeout")

To get state:

Hydration Model reads:

apply VideoUploaded → status="uploaded"
apply VideoProcessingStarted → status="processing"
apply VideoProcessingProgress → status="processing:50%"
apply VideoProcessingFailed → status="failed"

We can even show why the failure happened, something a single “status” column simply can’t tell us.

🧩 AWS Services we can use

| Role | AWS Service |
| --- | --- |
| API entrypoint | API Gateway |
| Command processor | AWS Lambda |
| Event storage | Kinesis Data Streams |
| Archive & audit log | Amazon S3 |
| Event distribution | EventBridge / DynamoDB Streams |
| Read-optimized views | Aurora / DynamoDB |
| Async processing | Lambda consumers |
