Splunk Architecture
In Splunk, logs are collected, processed, stored, searched, and analyzed through 3 main components:
Data Source → Forwarder → Indexer → Search Head → User
I] Data Input Stage
1}Splunk Forwarder (Data Collection Stage) :
It is a lightweight agent installed on machines where logs are generated.
Examples of machines :
Linux servers, Windows servers, Web servers, Firewalls, Routers, Databases, Applications.
Its job is to collect logs and send them to the Splunk Indexer.
We need forwarders because :
Imagine 5,000 servers generating logs. We cannot manually copy logs from each server, so the Splunk Forwarder does this job for us.
Advantages :
Very Low CPU usage.
Runs in background.
Can scale up to tens of thousands of machines.
Secure (SSL/TLS) real-time log forwarding.
Types of Splunk Forwarders :
Universal Forwarder : In short, it is non-intelligent; it only forwards raw logs and does nothing else.
Heavy Forwarder : It can parse and filter logs based on our requirements before forwarding them.
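As a hedged sketch (the group name, hostname, and port below are illustrative placeholders, not values from this post), a Universal Forwarder is typically pointed at an indexer through outputs.conf:

```
# outputs.conf on the forwarder (illustrative sketch)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# 9997 is the conventional Splunk receiving port; replace with your indexer's address
server = indexer.example.com:9997
```

TLS for this forwarder-to-indexer link is also configured in outputs.conf, which is what enables the secure (SSL/TLS) forwarding mentioned above.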
At this point the data is not yet events; it is just raw text.
EX : Jul <date> server2 sshd[1234]: Failed password for root from <source IP>
Key Points
->Splunk doesn't process the entire file at once; it cuts incoming data into 64 KB chunks. These are called data blocks.
->Splunk adds labels (metadata) to each 64 KB block.
Metadata : Information about the data, like :
host
source
sourcetype
After the data/logs reach Splunk, they must be stored somewhere, right? They are stored in indexes.
2}Indexes : This step is crucial, as it decides which index the data goes into. You can create your own indexes, and Splunk also provides default indexes (such as main).
Example :
index = security, index = firewall, index = authentication.
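A hedged sketch of how a monitored log file gets assigned to one of these indexes in inputs.conf (the path, index name, and sourcetype are illustrative assumptions):

```
# inputs.conf on the forwarder (illustrative sketch)
[monitor:///var/log/auth.log]
disabled = false
# which index this data goes into
index = security
# metadata label describing the data format
sourcetype = linux_secure
```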
End of Data Input Stage :
Data is still raw.
Data is tagged with metadata.
Data is not searchable yet.
II] Data Storage Stage (raw data becomes useful)
This stage has 2 phases
i)Parsing
ii)Indexing
Parsing :
Parsing can be automated or manual. If the logs are in a non-standard format, manual parsing through regex is recommended; for standard log formats, Splunk automates the parsing phase.
Raw Text --> Individual Searchable Events.
Sub-Phases of Parsing :
-->Breaking the stream into individual lines, identifying where one event ends and where the next begins.
-->Identifying, Parsing, Setting Timestamps.
-->Annotating individual events with metadata (Splunk copies the metadata generated during the input stage).
-->Transforming event data using regex. This uses
props.conf and transforms.conf.
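As a hedged sketch, line breaking for syslog-style data like the SSH example above might be configured in props.conf as follows (the stanza name and regex are illustrative assumptions; the regex transformations referenced from props.conf live in transforms.conf, shown later under index-time processing):

```
# props.conf (illustrative sketch)
[linux_secure]
# syslog-style data is one event per line, so don't merge lines
SHOULD_LINEMERGE = false
# break into events at newlines
LINE_BREAKER = ([\r\n]+)
```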
Indexing Process
Splunk decides which parts are the timestamp and the message, and what fields exist.
This process is also known as Event Processing.
Indexing = Event Processing
Splunk does the following:
Step 1: Event Breaking
Splits continuous data stream into individual events
Example:
One log file → 1,000 separate events
Step 2: Timestamp Identification
Finds time for each event
Uses:
The timestamp inside the event
The system time (if no timestamp is found)
This allows:
Time-based searches
Timeline analysis
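Timestamp recognition is usually automatic, but it can be pinned down in the same props.conf stanza; a hedged sketch for a syslog-style timestamp (the format string and lookahead are illustrative assumptions):

```
# props.conf (same illustrative [linux_secure] stanza as above)
[linux_secure]
# timestamp starts at the beginning of each event
TIME_PREFIX = ^
# matches timestamps like "Jul 15 10:23:45"
TIME_FORMAT = %b %d %H:%M:%S
# only scan the first 20 characters for the timestamp
MAX_TIMESTAMP_LOOKAHEAD = 20
```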
Step 3: Field Extraction
Splunk extracts default fields like:
host
source
sourcetype
Example:
host=webserver1
source=/var/log/auth.log
sourcetype=linux_secure
Step 4: User-Defined Processing
At index time, Splunk can:
Mask sensitive data (passwords, credit cards)
Create custom fields
Drop unwanted events
Route logs to specific indexes
Apply multi-line rules
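A hedged sketch of what such index-time rules can look like in props.conf and transforms.conf (stanza names, regexes, and the target index are illustrative assumptions, not a definitive recipe):

```
# props.conf (illustrative sketch)
[linux_secure]
# mask anything that looks like password=... before it is written to disk
SEDCMD-mask_passwords = s/password=\S+/password=########/g
# apply the index-time transforms defined below
TRANSFORMS-routing = drop_debug_events, route_ssh_to_security

# transforms.conf (illustrative sketch)
[drop_debug_events]
# events matching DEBUG go to the nullQueue, i.e. they are dropped
REGEX = DEBUG
DEST_KEY = queue
FORMAT = nullQueue

[route_ssh_to_security]
# sshd events are routed to the "security" index
REGEX = sshd
DEST_KEY = _MetaData:Index
FORMAT = security
```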
Step 5: Store Data in Indexes
Indexer stores data in Buckets
Each bucket contains:
Raw data (compressed)
Index files (tsidx)
Metadata
WHAT ARE BUCKETS IN SPLUNK?
Splunk does NOT store all data in one folder.
Instead, it stores data in multiple folders called BUCKETS.
What is a Bucket : A bucket is a directory on disk that contains :
1}Raw log data
2}Index files
3}Metadata
1}Hot Bucket :
Contains newly arriving data
Currently being written
Always open
2}Warm Bucket
A Hot bucket becomes warm when
Size limit is reached OR
Time limit is reached OR
Splunk is restarted
Contains recent but stable data
No longer writable
Still searched very often
3}Cold Bucket
When does data move to Cold?
When:
Warm bucket count exceeds configured limit
What is a COLD bucket?
Contains older data
Rarely searched
Stored on slower & cheaper storage
Think:
Cold = old logs, used occasionally
4}Frozen Bucket
What is a Frozen bucket :
Data is removed from Splunk indexes
IMPORTANT:
Frozen data is NOT searchable
It may be:
Deleted permanently
Archived to external storage
Why Frozen exists?
Because:
Storage is expensive
Compliance rules define retention
Old logs are useless after some time
Think:
Frozen = dead / archived data
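Bucket rollover and retention are controlled per index in indexes.conf; as a hedged sketch (the paths, warm-bucket count, retention period, and archive directory are illustrative assumptions, not recommendations):

```
# indexes.conf (illustrative sketch)
[security]
# hot and warm buckets live here (fast storage)
homePath = $SPLUNK_DB/security/db
# cold buckets move here (can be slower, cheaper storage)
coldPath = $SPLUNK_DB/security/colddb
thawedPath = $SPLUNK_DB/security/thaweddb
# roll warm -> cold once this many warm buckets exist
maxWarmDBCount = 300
# freeze data older than ~180 days (value is in seconds)
frozenTimePeriodInSecs = 15552000
# if set, frozen buckets are archived here instead of being deleted
coldToFrozenDir = /archive/splunk/security
```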
III] Data Searching Stage
The data searching stage is the phase where users query, analyze and visualize indexed data to extract insights.
This stage happens after data is already indexed.
During the data searching stage, Splunk :
->Searches events stored in indexes.
->Filters relevant data.
->Transforms and analyzes data.
->Visualizes results.
SPL (Search Processing Language) : It is the core search language of Splunk, used to search, filter, analyze, and visualize indexed data, and it primarily runs on the Search Head.
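As a small taste of SPL (a hedged example; the index, sourcetype, and search string assume the failed-login data used throughout this post):

```
index=security sourcetype=linux_secure "Failed password"
| stats count AS failed_attempts BY host
| sort - failed_attempts
```

This counts failed SSH login attempts per host and lists the noisiest hosts first.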
In the next blog, I'll discuss the top 25 SPL commands.
Quick Summary Table
| Component | Role |
| ------------------ | ------------------------ |
| Forwarder | Collects & sends data |
| Indexer | Stores & indexes data |
| Index | Logical storage location |
| Search Head | Search & visualization |
| Deployment Server  | Manages forwarders       |
| License Master | License tracking |
| Cluster Manager | Indexer clustering |
| Monitoring Console | Health monitoring |
