DEV Community

# dataengineering

Posts

Amazon Kinesis vs Amazon MSK: The Complete Guide for Stream Processing on AWS

29 min read
Mastering Serverless Data Pipelines: AWS Step Functions Best Practices for 2026

5 min read
2025 Year in Review: Apache Iceberg, Polaris, Parquet, and Arrow

6 min read
💀 RIP Copy-Paste: Google NotebookLM Just Killed Manual Data Entry

3 min read
Unified Data Fabric: Serverless Spark on ROSA Integrating with AWS Glue Catalog

18 min read
Day 27: Building Exactly-Once Streaming Pipelines with Spark & Delta Lake

1 min read
Day 28: Spark Streaming Performance Tuning

1 min read
Day 29: Building a Production-Grade Real-Time ETL Pipeline with Spark & Delta

1 min read
Production AI: Monitoring, Cost Optimization, and Operations

9 min read
Building a Realistic Banking Dummy Data Generator with Bad-Data Simulation

1 min read
The Day Our Pipeline Went From 10 Minutes to 6 Seconds (Part 2 of 3)

6 min read
Day 26: Spark Streaming Joins

1 min read
DataOps 101: What It Is and Why Enterprises Can’t Ignore It in 2026

2 min read
Day 25: Streaming Aggregations in Spark

1 min read
What Is Data Fabric Architecture? A Beginner’s Guide (Explained Simply)

2 min read
Data Processing Does Not Belong in the Message Broker

3 min read
Day 24: Spark Structured Streaming

1 min read
Day 23: Spark Shuffle Optimization

1 min read
Day 22: Spark Shuffle Deep Dive

1 min read
Day 20: Handling Bad Records & Data Quality in Spark

1 min read
Data-Architect-Master-Professional-Workbook

1 min read
Day 18: Spark Performance Tuning

1 min read
Day 19: Spark Broadcasting & Caching

1 min read
Designing a YouTube Digest for Signal Over Noise

4 min read
Day 21: Building a Production-Grade Data Quality Pipeline with Spark & Delta

1 min read