Ajinkya Singh

Understanding Kafka Architecture - The Complete Mental Model 🧠

How all the pieces fit together to create a powerful streaming platform


The Goal

Understand the "Big Picture" - How events, topics, partitions, producers, consumers, brokers, and consumer groups all work together as one cohesive system.

Think of this as getting a bird's eye view of the entire Kafka ecosystem! 🦅


Building Block #1: The Event (Foundation)

What It Is

The fundamental unit - an immutable fact representing something that happened.

┌─────────────────────────────────────┐
│           EVENT/RECORD              │
├─────────────────────────────────────┤
│ Key: user_456                       │
│ Value: {"action": "purchase"}       │
│ Timestamp: 2025-11-18 14:30:00      │
└─────────────────────────────────────┘

Everything in Kafka revolves around these!
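
To make this concrete, here's a minimal sketch of an event as you might model it in application code. This is plain Python with no Kafka client yet; the field names simply mirror the diagram above.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)          # frozen = immutable, just like a Kafka event
class Event:
    key: str                     # e.g. "user_456" - used later for partitioning
    value: dict                  # the fact that happened
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

purchase = Event(key="user_456", value={"action": "purchase"})
print(purchase)                  # immutable: purchase.key = "x" would raise an error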

Building Block #2: The Kafka Cluster (Infrastructure)

What It Is

A collection of servers working together - NOT just one server!

        KAFKA CLUSTER
┌─────────────────────────────────┐
│                                 │
│  ┌────────┐  ┌────────┐        │
│  │Broker 1│  │Broker 2│  ...   │
│  │Server 1│  │Server 2│        │
│  └────────┘  └────────┘        │
│                                 │
│  ┌────────┐  ┌────────┐        │
│  │Broker 3│  │Broker 4│  ...   │
│  │Server 3│  │Server 4│        │
│  └────────┘  └────────┘        │
│                                 │
└─────────────────────────────────┘

Network of powerful servers!

What Brokers Do

  • Store your events
  • Handle requests from applications
  • Ensure the system stays available even if one fails

Why Multiple Brokers?

  1. Scalability → Handle massive amounts of data
  2. Fault Tolerance → Keep running even if servers fail

Modern Kafka (4.0+)

  • The cluster manages its own metadata with the KRaft protocol (a built-in Raft-based controller quorum)
  • Brokers coordinate with each other internally
  • No external ZooKeeper needed (it was removed entirely in Kafka 4.0)! 🎉

Visualize: A resilient network of powerful servers ready to handle your data streams.
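
To a client application, "a cluster" just means a list of bootstrap addresses: hand the client any reachable broker and it discovers the rest on its own. A minimal sketch with the kafka-python library; the broker addresses and the "orders" topic are assumptions for illustration.

from kafka import KafkaConsumer

# Any one reachable broker is enough; the client discovers the whole cluster from it.
client = KafkaConsumer(
    bootstrap_servers=["broker1:9092", "broker2:9092", "broker3:9092"],  # hypothetical addresses
)

print(client.topics())                        # all topics the cluster knows about
print(client.partitions_for_topic("orders"))  # e.g. {0, 1, 2}, if the topic exists
client.close()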


Building Block #3: Topics (Organization)

What It Is

A logical name/category for a stream of related events.

KAFKA CLUSTER
├── Topic: "user-signups" 👤
├── Topic: "payment-transactions" 💰
├── Topic: "sensor-readings" 🌡️
└── Topic: "order-events" 📦

Key Characteristics

1. Distributed Across Brokers

A single topic doesn't live on just ONE broker:

Topic: "orders"
├── Partition 0 → Broker 1
├── Partition 1 → Broker 2
└── Partition 2 → Broker 3

This distribution = SCALE! 🚀
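
Here's a hedged sketch of creating a topic like this with kafka-python's admin client; the topic name, partition count, and replication factor are illustrative choices, and the broker address is assumed.

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers=["localhost:9092"])   # assumed broker

# 3 partitions spread the topic across brokers for scale;
# replication_factor=3 keeps 3 copies of each partition (needs at least 3 brokers).
orders = NewTopic(name="orders", num_partitions=3, replication_factor=3)

admin.create_topics(new_topics=[orders])
admin.close()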

2. Durable Storage

  • Events are stored for a configurable retention period
  • They can be re-read multiple times
  • They are not deleted just because a consumer has read them

Building Block #4: Partitions (Parallelism)

What It Is

Each topic is divided into ordered lanes called partitions.

The Multi-Lane Highway Analogy 🛣️

Topic: "orders" (3 partitions)

┌────────────────────────────────────────────────────┐
│               MULTI-LANE HIGHWAY                   │
├────────────────────────────────────────────────────┤
│                                                    │
│  Lane 0 (Partition 0): Order1 → Order2 → Order3  │
│  ═════════════════════════════════════════════►   │
│                                                    │
│  Lane 1 (Partition 1): Order4 → Order5 → Order6  │
│  ═════════════════════════════════════════════►   │
│                                                    │
│  Lane 2 (Partition 2): Order7 → Order8 → Order9  │
│  ═════════════════════════════════════════════►   │
│                                                    │
└────────────────────────────────────────────────────┘

Each lane (partition) processes traffic (events) 
independently but IN ORDER within that lane!

Key Properties

1. Ordered Within Partition ✅

Partition 0:
Event A (offset 0) → Event B (offset 1) → Event C (offset 2)

Consumer always sees: A, then B, then C
ORDER GUARANTEED within the partition!

2. NO Order Across Partitions ❌

Partition 0: Event A (time: 10:00)
Partition 1: Event B (time: 09:59)

Consumer might see B before A
NO ORDER GUARANTEE across different partitions!

3. Each Partition Lives on a Broker

Topic: "payments" (3 partitions)

Partition 0 → Broker 1 (Server 1)
Partition 1 → Broker 2 (Server 2)
Partition 2 → Broker 3 (Server 3)

Load is DISTRIBUTED across servers! ⚖️

Why Partitions?

  • Enable parallelism → Multiple producers/consumers work simultaneously
  • Distribute load → Spread data across multiple servers
  • Scale horizontally → Add more partitions = more throughput
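
You can see the "ordered within a partition" guarantee directly from code by pinning a consumer to a single lane and watching the offsets climb. A sketch with kafka-python; the topic name and broker address are assumptions.

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers=["localhost:9092"],    # assumed broker
    auto_offset_reset="earliest",
)

# Read exactly one lane: partition 0 of the "orders" topic.
consumer.assign([TopicPartition("orders", 0)])

for record in consumer:
    # Offsets within this one partition arrive strictly in order: 0, 1, 2, ...
    print(f"partition={record.partition} offset={record.offset} value={record.value}")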

Building Block #5: Producers (Data Writers)

What It Is

Your application code that sends/publishes events to Kafka topics.

         PRODUCERS (Entry Ramps)

Mobile App 📱 ──┐
                │
Web Server 🌐 ──┼──► Kafka Topic: "events"
                │      ├─► Partition 0
IoT Device 🌡️ ──┘      ├─► Partition 1
                       └─► Partition 2

How Producers Work

Option 1: Automatic Partition Selection (No Key)

Producer sends events WITHOUT a key:

Event 1 → Partition 0 (round-robin)
Event 2 → Partition 1 (round-robin)
Event 3 → Partition 2 (round-robin)
Event 4 → Partition 0 (round-robin)
...

Result: EVEN DISTRIBUTION across partitions
(Note: newer Kafka clients use a "sticky" partitioner that fills one batch before switching partitions, but over time the spread is still roughly even.)

Option 2: Key-Based Routing (With Key)

Producer sends events WITH key:

Event (key: user_123) → Partition 1
Event (key: user_123) → Partition 1 (SAME!)
Event (key: user_456) → Partition 2
Event (key: user_456) → Partition 2 (SAME!)
Event (key: user_123) → Partition 1 (SAME!)

Result: ALL events with SAME KEY go to SAME PARTITION
        This maintains ORDER for related events! 🎯
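
Here's what both options look like from the producer side, sketched with kafka-python; the broker address and topic name are assumptions, and the serializers turn Python objects into the bytes Kafka actually stores.

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],                       # assumed broker
    key_serializer=lambda k: k.encode("utf-8") if k else None,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Option 1: no key -> the client spreads events evenly across partitions
producer.send("events", value={"action": "page_view"})

# Option 2: key -> the key is hashed, so user_123's events always land on the same partition
producer.send("events", key="user_123", value={"action": "add_to_cart"})
producer.send("events", key="user_123", value={"action": "checkout"})   # same partition, in order

producer.flush()   # block until everything queued so far has actually been sent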

Visual Example: Key-Based Routing

Producer: E-commerce Website

Order from user_123:
┌──────────────────────┐
│ Key: user_123        │
│ Value: Order details │
└──────────────────────┘
         ↓
    Kafka hashes key
         ↓
    Always → Partition 1

Another order from user_123:
┌──────────────────────┐
│ Key: user_123        │
│ Value: Order details │
└──────────────────────┘
         ↓
    Kafka hashes key
         ↓
    Always → Partition 1 (SAME!)

✅ All user_123 orders processed IN ORDER!
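
Under the hood, "Kafka hashes the key" boils down to something like the sketch below. This is a simplified illustration using Python's built-in hash, not Kafka's real algorithm (the default partitioner uses murmur2 over the key bytes), but the principle is identical: same key, same hash, same partition.

def pick_partition(key: str, num_partitions: int) -> int:
    """Simplified illustration of key-based routing (NOT Kafka's real murmur2 hash)."""
    return hash(key) % num_partitions

# Same key -> same partition, every time (within one Python process):
print(pick_partition("user_123", 3))   # e.g. 1
print(pick_partition("user_123", 3))   # the same number again
print(pick_partition("user_456", 3))   # possibly a different partition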

Producer Behavior

  • Asynchronous → send() returns immediately; producers never wait for consumers
  • High throughput → Can send thousands of events per second, helped by batching
  • Fire and forget (optional) → For maximum speed you can skip waiting for broker acknowledgments, though most production apps wait for acks for safety

Visualize: Entry ramps onto a highway, directing traffic into specific lanes.
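
Because send() returns a future immediately, a producer can stay fast and still find out what happened to each event. A sketch of the two usual patterns with kafka-python; names and addresses are assumptions.

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],    # assumed broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def on_success(metadata):
    print(f"delivered to {metadata.topic}[{metadata.partition}] at offset {metadata.offset}")

def on_error(exc):
    print(f"delivery failed: {exc}")

# Pattern 1: non-blocking - callbacks fire later, on the producer's background I/O thread
future = producer.send("events", value={"action": "login"})
future.add_callback(on_success)
future.add_errback(on_error)

# Pattern 2: block on the future when you need confirmation before moving on
metadata = producer.send("events", value={"action": "logout"}).get(timeout=10)

producer.flush()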


Building Block #6: Consumers (Data Readers)

What It Is

Your application code that reads/subscribes to events from topics.

         Kafka Topic: "orders"
                 ↓
         ┌───────┴───────┐
         │               │
    Consumer A      Consumer B
         ↓               ↓
   Analytics App    Email Service

Each reads INDEPENDENTLY with its own position (offset)

Key Properties

1. Pull-Based Model

Traditional Systems:        Kafka:
Server → PUSHES → Client   Client ← PULLS ← Server

Benefits of Pull:
✅ Consumer controls pace
✅ Can process at own speed
✅ Can pause/resume

2. Independent Reading

Multiple consumers can read SAME topic:

Topic: "transactions"
     ↓
     ├──► Consumer A (reads everything)
     ├──► Consumer B (reads everything)
     └──► Consumer C (reads everything)

Each maintains its OWN offset (reading position)
Nobody affects anyone else! 🎭

3. Offset Tracking

Partition 0:
┌────┬────┬────┬────┬────┬────┐
│ 0  │ 1  │ 2  │ 3  │ 4  │ 5  │ ...
└────┴────┴────┴────┴────┴────┘
              ↑
         Consumer's
         current offset
         (remembers position)

If the consumer stops and restarts:
✅ Resumes from its last committed offset (position 2 here)
✅ No messages skipped
✅ Little or no duplicate processing (exactly-once needs extra care, e.g. idempotent handling or transactions)
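
Here's a sketch of a consumer that pulls at its own pace and commits its offset only after it has finished processing, using kafka-python; the topic, group id, and broker address are assumptions.

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",                                  # topic to subscribe to
    bootstrap_servers=["localhost:9092"],      # assumed broker
    group_id="analytics",                      # the consumer group this instance belongs to
    auto_offset_reset="earliest",              # where to start if no offset is saved yet
    enable_auto_commit=False,                  # we commit manually, after processing
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

while True:
    # Pull-based: WE ask the broker for records, at our own pace.
    batch = consumer.poll(timeout_ms=1000, max_records=100)
    for tp, records in batch.items():
        for record in records:
            print(f"{tp.topic}[{tp.partition}] offset={record.offset}: {record.value}")
    consumer.commit()   # save our position so a restart resumes from here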

Building Block #7: Consumer Groups (Teamwork)

What It Is

A collection of consumer instances working together as a team to process events.

The Team Analogy 👥

Team A (Consumer Group "analytics"):
Worker 1, Worker 2, Worker 3

Team B (Consumer Group "email"):
Worker 4, Worker 5

Team C (Consumer Group "archiving"):
Worker 6, Worker 7, Worker 8

Each TEAM gets its own FULL COPY of the event stream!

How Consumer Groups Work

Rule: One Partition = One Consumer (within group)

Topic: "orders" (3 partitions)

Consumer Group "order-processors" (3 consumers):

Partition 0 ──► Consumer A ┐
Partition 1 ──► Consumer B ├─ Group "order-processors"
Partition 2 ──► Consumer C ┘

✅ Each partition assigned to EXACTLY ONE consumer
✅ Work is DIVIDED among team members
✅ Parallel processing! ⚡
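
In code, "joining the team" is nothing more than sharing a group_id. Run the sketch below three times against a 3-partition topic and Kafka hands each instance one partition; stop one and the survivors take over its partition automatically. (kafka-python; topic, group, and broker names are assumptions.)

from kafka import KafkaConsumer

# Start this same script 3 times: each process joins the "order-processors"
# group and is assigned its share of the topic's partitions.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers=["localhost:9092"],   # assumed broker
    group_id="order-processors",            # SAME group id = share the work
    auto_offset_reset="earliest",
)

consumer.poll(timeout_ms=5000)              # first poll joins the group and triggers assignment
print("assigned partitions:", consumer.assignment())

for record in consumer:
    print(f"partition {record.partition}, offset {record.offset}")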

Example: Load Distribution

Scenario 1: More partitions than consumers

Topic: 4 partitions
Group: 2 consumers

Partition 0 ──┐
Partition 1 ──┴──► Consumer A

Partition 2 ──┐
Partition 3 ──┴──► Consumer B

Each consumer handles 2 partitions

Scenario 2: More consumers than partitions

Topic: 2 partitions
Group: 3 consumers

Partition 0 ──► Consumer A
Partition 1 ──► Consumer B
                Consumer C (IDLE - no partition assigned)

Extra consumers sit idle (but ready for failover!)

Scenario 3: Perfect match

Topic: 3 partitions
Group: 3 consumers

Partition 0 ──► Consumer A
Partition 1 ──► Consumer B
Partition 2 ──► Consumer C

Perfectly balanced! ⚖️

Multiple Consumer Groups (Independent Processing)

Topic: "news-feed"
     │
     ├──► Group A "website-updates"
     │    ├─ Consumer 1 → Partition 0
     │    ├─ Consumer 2 → Partition 1
     │    └─ Consumer 3 → Partition 2
     │
     ├──► Group B "archiving"
     │    ├─ Consumer 1 → Partition 0
     │    ├─ Consumer 2 → Partition 1
     │    └─ Consumer 3 → Partition 2
     │
     └──► Group C "sentiment-analysis"
          └─ Consumer 1 → All partitions

✅ Each group processes SAME data INDEPENDENTLY
✅ Each group maintains its OWN offsets
✅ Groups don't affect each other
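
The only thing separating "sharing the work" from "getting your own full copy" is the group_id. A tiny sketch: two consumers on the same topic with different group ids each receive every event, and each tracks its own offsets. (Names are assumptions.)

from kafka import KafkaConsumer

def make_consumer(group_id: str) -> KafkaConsumer:
    return KafkaConsumer(
        "news-feed",
        bootstrap_servers=["localhost:9092"],   # assumed broker
        group_id=group_id,
        auto_offset_reset="earliest",
    )

archiver  = make_consumer("archiving")            # gets the FULL stream
sentiment = make_consumer("sentiment-analysis")   # also gets the FULL stream, independently

# Each group commits its own offsets, so one can lag, pause, or replay
# without affecting the other in any way.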

Automatic Failover (Self-Healing)

Before failure:
Partition 0 ──► Consumer A ✅
Partition 1 ──► Consumer B ✅
Partition 2 ──► Consumer C ✅

Consumer B fails! 💥

After automatic rebalancing (seconds):
Partition 0 ──► Consumer A ✅
Partition 1 ──► Consumer A ✅ (took over!)
Partition 2 ──► Consumer C ✅

Or:
Partition 0 ──► Consumer A ✅
Partition 1 ──► Consumer C ✅ (took over!)
Partition 2 ──► Consumer C ✅

✅ No data loss!
✅ Processing continues!

Visualize: Teams of workers where each team processes the full stream, but within each team, workers divide up the lanes (partitions) to work in parallel.


THE GRAND PICTURE: How Everything Works Together 🎯

Complete Data Flow

STEP 1: PRODUCERS CREATE EVENTS
┌────────────────────────────────────────┐
│ Mobile App, Website, IoT Devices, etc. │
└────────────────┬───────────────────────┘
                 ↓
          Generate Events

STEP 2: EVENTS SENT TO TOPICS
┌─────────────────────────────────────────┐
│  Event with key "user_123"              │
│  → Kafka hashes key                     │
│  → Routes to specific partition         │
└────────────────┬────────────────────────┘
                 ↓
           Topic: "orders"

STEP 3: PARTITIONS STORE EVENTS
┌─────────────────────────────────────────┐
│ Partition 0 (Broker 1): [E1, E2, E3]   │
│ Partition 1 (Broker 2): [E4, E5, E6]   │
│ Partition 2 (Broker 3): [E7, E8, E9]   │
└────────────────┬────────────────────────┘
                 ↓
        Ordered, Immutable Log

STEP 4: CONSUMER GROUPS PULL EVENTS
┌─────────────────────────────────────────┐
│ Group "analytics":                      │
│   Consumer A reads Partition 0          │
│   Consumer B reads Partition 1          │
│   Consumer C reads Partition 2          │
│                                         │
│ Group "email":                          │
│   Consumer D reads Partition 0          │
│   Consumer E reads Partition 1, 2       │
└────────────────┬────────────────────────┘
                 ↓
         Process in parallel
         at their own pace

Visual: Complete System Architecture

┌─────────────────────────────────────────────────────────────┐
│                    KAFKA CLUSTER                            │
│                                                             │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐     │
│   │Broker 1 │  │Broker 2 │  │Broker 3 │  │Broker 4 │     │
│   ├─────────┤  ├─────────┤  ├─────────┤  ├─────────┤     │
│   │ P0 (L)  │  │ P1 (L)  │  │ P2 (L)  │  │ P3 (L)  │     │
│   │ P1 (F)  │  │ P2 (F)  │  │ P3 (F)  │  │ P0 (F)  │     │
│   │ P2 (F)  │  │ P3 (F)  │  │ P0 (F)  │  │ P1 (F)  │     │
│   └─────────┘  └─────────┘  └─────────┘  └─────────┘     │
│        ↑                           ↓                       │
│    WRITE                         READ                      │
└────────┼───────────────────────────┼──────────────────────┘
         │                           │
    ┌────┴────┐               ┌─────┴──────┐
    │PRODUCERS│               │CONSUMER    │
    │         │               │GROUPS      │
    │📱 App   │               │            │
    │🌐 Web   │               │Group A:    │
    │🌡️ IoT   │               │ C1, C2, C3 │
    │         │               │            │
    └─────────┘               │Group B:    │
                              │ C4, C5     │
                              └────────────┘

Legend:
P0 = Partition 0
(L) = Leader
(F) = Follower (replica)

Real-World Example: E-Commerce Order System

The Complete Flow

SCENARIO: Customer places an order on website

1️⃣ PRODUCER (Website) creates event:
┌──────────────────────────────────┐
│ Key: customer_789                │
│ Value: {                         │
│   order_id: "ORD-456",          │
│   items: ["laptop", "mouse"],   │
│   total: 1200                    │
│ }                                │
└──────────────────────────────────┘
         ↓

2️⃣ Kafka routes to TOPIC and PARTITION:
Topic: "orders"
Key "customer_789" → Partition 1 (always same partition!)
         ↓

3️⃣ BROKERS store in partition:
Broker 2 (Leader for Partition 1):
┌────────────────────────────────┐
│ Partition 1:                   │
│ Offset 100: ORD-453           │
│ Offset 101: ORD-454           │
│ Offset 102: ORD-456 ← NEW!    │
└────────────────────────────────┘

Broker 3 (Follower):           Broker 4 (Follower):
┌──────────────────────┐       ┌──────────────────────┐
│ Partition 1 (copy):  │       │ Partition 1 (copy):  │
│ Offset 102: ORD-456  │       │ Offset 102: ORD-456  │
└──────────────────────┘       └──────────────────────┘
         ↓                              ↓
    REPLICATED for durability!

4️⃣ MULTIPLE CONSUMER GROUPS process independently:

Group "payment-processing":
  Consumer A reads Partition 1 → Charges credit card

Group "inventory":
  Consumer B reads Partition 1 → Updates stock

Group "email":
  Consumer C reads Partition 1 → Sends confirmation

Group "analytics":
  Consumer D reads Partition 1 → Updates dashboard

✅ All process SAME order
✅ All work INDEPENDENTLY
✅ Each at their own pace
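
To tie the whole flow back to code, here's a compressed sketch of steps 1 and 4: the website produces the keyed order event, and each downstream service consumes it in its own group. (kafka-python; every name and address here is illustrative.)

import json
from kafka import KafkaProducer, KafkaConsumer

# 1) The website (producer) emits the order, keyed by customer so all of
#    customer_789's orders stay in order on one partition.
producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", key="customer_789",
              value={"order_id": "ORD-456", "items": ["laptop", "mouse"], "total": 1200})
producer.flush()

# 4) Each downstream service joins its own consumer group and gets its own full copy.
payments = KafkaConsumer("orders", bootstrap_servers=["localhost:9092"],
                         group_id="payment-processing", auto_offset_reset="earliest",
                         value_deserializer=lambda b: json.loads(b.decode("utf-8")))

for record in payments:   # email / inventory / analytics run the same loop in their own groups
    print("charge card for order", record.value["order_id"])
    break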

Key Principles That Make It All Work

1. Distribution

┌──────────────────────────────────┐
│ Work spread across many servers  │
│ ✅ Scalability                   │
│ ✅ Parallel processing           │
└──────────────────────────────────┘

2. Immutability

┌──────────────────────────────────┐
│ Events never change or deleted   │
│ ✅ Can be replayed               │
│ ✅ Multiple consumers can read   │
└──────────────────────────────────┘

3. Parallelism

┌──────────────────────────────────┐
│ Multiple partitions processed    │
│ simultaneously                    │
│ ✅ High throughput               │
│ ✅ Efficient resource use        │
└──────────────────────────────────┘

Fault Tolerance in Action

When Broker Fails

Before:
Broker 1 (P0-Leader) ✅
Broker 2 (P0-Follower) ✅
Broker 3 (P0-Follower) ✅

Broker 1 fails! 💥

After automatic leader election (typically a few seconds):
Broker 1 (P0-Leader) 💀
Broker 2 (P0-Leader) ⭐ Promoted!
Broker 3 (P0-Follower) ✅

✅ System keeps running
✅ No data loss

When Consumer Fails

Before:
Partition 0 → Consumer A ✅
Partition 1 → Consumer B ✅
Partition 2 → Consumer C ✅

Consumer B fails! 💥

After (seconds):
Partition 0 → Consumer A ✅
Partition 1 → Consumer A ✅ Took over!
Partition 2 → Consumer C ✅

✅ Processing continues
✅ No events missed

Summary: The Mental Model Checklist

The 7 Components

Events - The data (immutable facts)
Cluster - The network of servers (brokers) that store and serve events
Topics - Named categories for events
Partitions - Ordered lanes within topics
Producers - Apps that write events
Consumers - Apps that read events
Consumer Groups - Teams of consumers that divide the work

The Flow

Producers → Topics → Partitions → Brokers
                                    ↓
                            Consumer Groups

The Guarantees

  • Order within a partition
  • Scalability through distribution
  • Durability through replication
  • Fault tolerance through automatic failover
  • Parallel processing through partitions and consumer groups

Your Mental Model

Think of Kafka as:

🏭 A highly organized factory where:
   • Multiple assembly lines (partitions) run in parallel
   • Workers (producers) add items to lines
   • Quality checkers (consumers) inspect items
   • Teams (consumer groups) divide the work
   • Multiple facilities (brokers) ensure continuity
   • Everything is tracked and never lost

You now have a complete bird's eye view of Apache Kafka! 🦅

This mental model will be invaluable as you build applications and dive deeper into Kafka's capabilities. Every detail you learn will fit into this bigger picture! 🎯
