DEV Community

Jeeva

🚀 Terraform Day 23: Production-Grade Monitoring & Observability on AWS (Serverless)

🧱 Services Used
AWS Lambda – serverless compute
Amazon S3 – source & destination buckets
Amazon CloudWatch – logs, metrics, dashboards & alarms
Amazon SNS – email notifications
AWS CloudTrail – API activity auditing
Terraform (custom modules) – full automation

🧩 Terraform Architecture (Modular)
The project is split into multiple custom Terraform modules:
s3 – secure buckets with encryption & versioning
lambda – function, IAM role, permissions
cloudwatch_logs – log groups & filters
cloudwatch_metrics – custom metrics
cloudwatch_alarms – alert definitions
sns – notification channels
This mirrors how Terraform code is typically organized in enterprise projects, with one reusable module per concern.
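To make the module wiring concrete, here is a minimal sketch that emits a root module in Terraform's JSON syntax (main.tf.json) from Python. The ./modules/... paths and every module input and output name (source_bucket_arn, log_group_name, topic_arn and so on) are hypothetical. The post names the modules but does not describe their interfaces.

```python
import json

def root_module(function_name: str, alert_email: str) -> dict:
    """Root module in Terraform JSON syntax, wiring the six modules together.

    All module input/output names below are hypothetical placeholders.
    """
    m = "./modules"  # hypothetical directory layout
    return {
        "module": {
            "s3": {"source": f"{m}/s3"},
            "lambda": {
                "source": f"{m}/lambda",
                "function_name": function_name,
                # Bucket ARNs flow from the s3 module into the Lambda IAM policy.
                "source_bucket_arn": "${module.s3.source_bucket_arn}",
                "destination_bucket_arn": "${module.s3.destination_bucket_arn}",
            },
            "sns": {"source": f"{m}/sns", "alert_email": alert_email},
            "cloudwatch_logs": {
                "source": f"{m}/cloudwatch_logs",
                "log_group_name": "${module.lambda.log_group_name}",
            },
            "cloudwatch_metrics": {
                "source": f"{m}/cloudwatch_metrics",
                "function_name": "${module.lambda.function_name}",
            },
            "cloudwatch_alarms": {
                "source": f"{m}/cloudwatch_alarms",
                "function_name": "${module.lambda.function_name}",
                # Alarms notify through the topic created by the sns module.
                "alarm_topic_arn": "${module.sns.topic_arn}",
            },
        }
    }

if __name__ == "__main__":
    print(json.dumps(root_module("image-processor", "ops@example.com"), indent=2))
```

Because every cross-module reference goes through an output, Terraform can infer the dependency order (buckets, then function, then logs and alarms) without explicit depends_on.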

πŸ“Š Observability Implementation
1️⃣ Logs (CloudWatch Logs)
Lambda logs are captured and analyzed using log metric filters.
Tracked patterns include:
Processing errors
Invalid file uploads
Large file sizes
Access denied events
Successful executions

Regex-based metric filters turn each matching log line into a data point on a CloudWatch metric.
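The idea behind a metric filter can be modeled locally in a few lines. This sketch assumes hypothetical log messages, because the post lists the tracked categories but not the exact strings the function logs. Python's re module stands in for CloudWatch's own filter-pattern syntax, and each match adds 1 to its metric, as a filter whose metric value is "1" would. In Terraform, each entry would become an aws_cloudwatch_log_metric_filter resource with a metric_transformation block.

```python
import re

# Hypothetical patterns; the real log messages are not shown in the post.
# Python regex is used only to model the behaviour of CloudWatch metric filters.
FILTERS = {
    "ProcessingErrors": re.compile(r"\bERROR\b"),
    "InvalidFileUploads": re.compile(r"Invalid file type"),
    "LargeFileSize": re.compile(r"File too large"),
    "AccessDenied": re.compile(r"AccessDenied"),
    "SuccessfulExecutions": re.compile(r"Successfully processed"),
}

def apply_metric_filters(log_lines, filters=FILTERS):
    """Count matches per metric: every matching line contributes a value of 1."""
    counts = {name: 0 for name in filters}
    for line in log_lines:
        for name, pattern in filters.items():
            if pattern.search(line):
                counts[name] += 1
    return counts

sample = [
    "INFO Successfully processed photo.jpg",
    "ERROR Invalid file type: report.pdf",
    "ERROR File too large: huge.png (12 MB)",
    "INFO Successfully processed cat.png",
]
print(apply_metric_filters(sample))
```

Note that one line can feed several metrics: the invalid-file line counts both as a processing error and as an invalid upload.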

2️⃣ Custom Metrics
Beyond default Lambda metrics, custom metrics include:
Images processed successfully
Image processing failures
Invalid file types
Processing duration thresholds
File size violations

These metrics provide application-level visibility, not just infrastructure stats.
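The post does not say how the custom metrics are published. One common option is the CloudWatch Embedded Metric Format (EMF): the function prints a structured JSON log line, and CloudWatch extracts metrics from it with no PutMetricData call. The sketch below uses EMF as an assumption, not as the author's confirmed method. The ImageProcessor namespace and the metric names are hypothetical.

```python
import json
import time

NAMESPACE = "ImageProcessor"  # hypothetical custom namespace

def emf_record(function_name, metrics, units=None, timestamp_ms=None):
    """Build one Embedded Metric Format log line.

    metrics maps metric name -> value. Values sit at the top level of the
    record, and the _aws block tells CloudWatch which keys are metrics.
    """
    units = units or {}
    record = {
        "_aws": {
            "Timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": NAMESPACE,
                "Dimensions": [["FunctionName"]],
                "Metrics": [{"Name": n, "Unit": units.get(n, "Count")} for n in metrics],
            }],
        },
        "FunctionName": function_name,  # dimension value
    }
    record.update(metrics)
    return json.dumps(record)

# Inside a handler this would simply be printed to stdout, which Lambda ships to CloudWatch Logs.
print(emf_record("image-processor",
                 {"ImagesProcessed": 1, "ProcessingDurationMs": 842},
                 units={"ProcessingDurationMs": "Milliseconds"}))
```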

3️⃣ Dashboards
A CloudWatch dashboard is defined entirely in Terraform, with the dashboard body written as JSON.

Widgets include:
Invocation count
Error rate
Duration (Avg + P99 latency)
Concurrent executions
Custom error metrics
Log-based error trends

Together, these widgets cover the core signals needed to operate the function in production.
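Here is a sketch of a dashboard body covering the widgets listed above, built in Python for illustration. In Terraform, the same structure would go into the dashboard_body argument of an aws_cloudwatch_dashboard resource via jsonencode(). The layout, the region default, the ImageProcessor namespace, the custom metric names and the "ERROR" log text are all assumptions.

```python
import json

def lambda_dashboard_body(function_name, region="us-east-1", namespace="ImageProcessor"):
    """CloudWatch dashboard body (JSON string) for one Lambda function."""
    dim = ["FunctionName", function_name]

    def metric_widget(title, metrics, x, y, stat="Sum"):
        return {
            "type": "metric", "x": x, "y": y, "width": 12, "height": 6,
            "properties": {"title": title, "metrics": metrics, "stat": stat,
                           "period": 300, "region": region, "view": "timeSeries"},
        }

    widgets = [
        metric_widget("Invocations", [["AWS/Lambda", "Invocations", *dim]], 0, 0),
        # Error rate via metric math: hide the raw series, plot only the ratio.
        metric_widget("Error rate (%)", [
            ["AWS/Lambda", "Errors", *dim, {"id": "errors", "visible": False}],
            ["AWS/Lambda", "Invocations", *dim, {"id": "invocations", "visible": False}],
            [{"expression": "100 * errors / invocations", "label": "Error rate", "id": "rate"}],
        ], 12, 0),
        # A per-series "stat" overrides the widget default, so avg and p99 share one graph.
        metric_widget("Duration (avg + p99)", [
            ["AWS/Lambda", "Duration", *dim, {"stat": "Average"}],
            ["AWS/Lambda", "Duration", *dim, {"stat": "p99"}],
        ], 0, 6, stat="Average"),
        metric_widget("Concurrent executions",
                      [["AWS/Lambda", "ConcurrentExecutions", *dim]], 12, 6, stat="Maximum"),
        # Hypothetical custom metric names.
        metric_widget("Custom error metrics", [
            [namespace, "ImageProcessingFailures", *dim],
            [namespace, "InvalidFileTypes", *dim],
        ], 0, 12),
        {   # Log-based trend driven by a CloudWatch Logs Insights query.
            "type": "log", "x": 12, "y": 12, "width": 12, "height": 6,
            "properties": {
                "title": "Log-based error trend", "region": region, "view": "timeSeries",
                "query": f"SOURCE '/aws/lambda/{function_name}' | filter @message like /ERROR/ "
                         "| stats count(*) as errors by bin(5m)",
            },
        },
    ]
    return json.dumps({"widgets": widgets})

if __name__ == "__main__":
    print(lambda_dashboard_body("image-processor"))
```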

4️⃣ Alerts & Alarms
Multiple alarm categories are implemented:
❌ Lambda errors
⏱️ High execution duration
πŸ”₯ Concurrency limit breaches
πŸ“‰ Invalid file uploads
🚫 Log-based failures

All alarms are parameterized via Terraform variables.
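The sketch below shows one way to parameterize alarms. It generates aws_cloudwatch_metric_alarm resources in Terraform JSON syntax, and each alarm's threshold is a Terraform variable rather than a hard-coded number. The alarm catalogue, the variable names and the custom-namespace metric names are hypothetical.

```python
import json

# Hypothetical catalogue: alarm key -> (namespace, metric, statistic, comparison operator).
ALARMS = {
    "lambda_errors": ("AWS/Lambda", "Errors", "Sum", "GreaterThanOrEqualToThreshold"),
    "high_duration": ("AWS/Lambda", "Duration", "Average", "GreaterThanThreshold"),
    "concurrency": ("AWS/Lambda", "ConcurrentExecutions", "Maximum", "GreaterThanThreshold"),
    "invalid_uploads": ("ImageProcessor", "InvalidFileTypes", "Sum", "GreaterThanOrEqualToThreshold"),
    "log_failures": ("ImageProcessor", "ProcessingErrors", "Sum", "GreaterThanOrEqualToThreshold"),
}

def alarms_tf_json() -> dict:
    """Terraform JSON (alarms.tf.json) with one variable-driven threshold per alarm."""
    variables = {
        "function_name": {"type": "string"},
        "alarm_topic_arn": {"type": "string"},
        "evaluation_periods": {"type": "number", "default": 1},
        "period_seconds": {"type": "number", "default": 300},
    }
    alarms = {}
    for key, (namespace, metric, stat, operator) in ALARMS.items():
        variables[f"{key}_threshold"] = {"type": "number"}
        alarms[key] = {
            "alarm_name": f"${{var.function_name}}-{key}",
            "namespace": namespace,
            "metric_name": metric,
            "statistic": stat,
            "comparison_operator": operator,
            "threshold": f"${{var.{key}_threshold}}",  # e.g. ${var.lambda_errors_threshold}
            "period": "${var.period_seconds}",
            "evaluation_periods": "${var.evaluation_periods}",
            "dimensions": {"FunctionName": "${var.function_name}"},
            "alarm_actions": ["${var.alarm_topic_arn}"],
            "treat_missing_data": "notBreaching",
        }
    return {"variable": variables, "resource": {"aws_cloudwatch_metric_alarm": alarms}}

if __name__ == "__main__":
    print(json.dumps(alarms_tf_json(), indent=2))
```

Setting treat_missing_data to notBreaching keeps a quiet function, one with no invocations and so no datapoints, from sitting in INSUFFICIENT_DATA or firing on gaps.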

5️⃣ Notifications (SNS)
Separate SNS topics per alert type
Email subscriptions (each recipient must confirm manually)
Real-time alert delivery when alarms trigger

This completes the incident response loop.
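As a sketch of the topic-per-alert-type layout, the function below generates one aws_sns_topic per alert type plus an email aws_sns_topic_subscription for each recipient, again in Terraform JSON syntax. The alert-type names and the project variable are hypothetical. Terraform can create email subscriptions, but each one stays pending until the recipient clicks the confirmation link AWS sends.

```python
import json

def sns_tf_json(alert_types, emails):
    """One SNS topic per alert type, each with an email subscription per recipient."""
    topics, subs = {}, {}
    for alert in alert_types:
        topics[alert] = {"name": f"${{var.project}}-{alert}-alerts"}
        for i, email in enumerate(emails):
            subs[f"{alert}_{i}"] = {
                "topic_arn": f"${{aws_sns_topic.{alert}.arn}}",
                "protocol": "email",  # stays pending until the recipient confirms
                "endpoint": email,
            }
    return {
        "variable": {"project": {"type": "string"}},
        "resource": {"aws_sns_topic": topics, "aws_sns_topic_subscription": subs},
    }

if __name__ == "__main__":
    # Hypothetical alert categories and recipient.
    print(json.dumps(sns_tf_json(["errors", "performance", "uploads"], ["ops@example.com"]), indent=2))
```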

πŸ‹ Dockerized Lambda Layer Build

To avoid the classic “works on my machine” problem:
Lambda dependencies (Pillow) are built using Docker
Pillow ships compiled native extensions, so building inside a Linux container ensures the binaries match the Amazon Linux-based Lambda runtime
Terraform deploys the generated layer artifact

This keeps the layer build reproducible, whatever OS the developer is on.
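Here is a minimal sketch of the build step as a Python helper. It assumes an x86_64 function on the Python 3.12 runtime and the AWS SAM build image public.ecr.aws/sam/build-python3.12. The post says only that Docker is used, not which image or runtime. Python layers must keep their packages under a top-level python/ directory, so pip installs there, and the zip is created with python/ at its root.

```python
import shlex
import shutil

# Assumptions: x86_64 architecture, Python 3.12 runtime, SAM build image.
BUILD_IMAGE = "public.ecr.aws/sam/build-python3.12"
PLATFORM = "linux/amd64"

def docker_build_cmd(workdir: str, requirements=("Pillow",)) -> list:
    """argv that pip-installs layer dependencies inside a Lambda-like Linux container.

    workdir (an absolute host path) is mounted at /var/task, and packages
    land in workdir/python/, the directory layout Lambda expects for Python layers.
    """
    pip = "pip install " + " ".join(shlex.quote(r) for r in requirements) + " -t python/"
    return [
        "docker", "run", "--rm",
        "--platform", PLATFORM,  # force x86_64 builds, e.g. on Apple Silicon hosts
        "-v", f"{workdir}:/var/task",
        "-w", "/var/task",
        BUILD_IMAGE,
        "/bin/sh", "-c", pip,
    ]

def package_layer(workdir: str, out_base: str) -> str:
    """Zip workdir/python into out_base.zip so the archive root contains python/."""
    return shutil.make_archive(out_base, "zip", root_dir=workdir, base_dir="python")

# Usage (requires Docker, so not executed here):
#   subprocess.run(docker_build_cmd("/abs/path/to/layer"), check=True)
#   package_layer("/abs/path/to/layer", "build/pillow_layer")
```

Terraform can then pick up the resulting zip through the filename argument of an aws_lambda_layer_version resource.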

🧪 Testing & Failure Simulation
The setup is tested by deliberately triggering each alarm path:
Uploading valid images → metrics increase
Uploading invalid files (PDF/GIF) → error alarms trigger
Uploading large files → size alarms trigger
Uploading multiple files → concurrency alarms trigger

Alerts are received via email in real time.

This confirms the pipeline works end to end, from a log line in Lambda to an alert in the inbox.
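The upload scenarios can also be captured as a small, repeatable test plan. The sketch assumes validation rules the post does not spell out: only .jpg, .jpeg and .png are accepted, consistent with PDF and GIF being rejected, and there is a hypothetical 5 MiB size limit. The metric names are hypothetical too. The concurrency scenario depends on real parallel invocations and is not modeled here.

```python
from pathlib import PurePosixPath

# Assumed rules: the post rejects PDF/GIF and flags large files but gives no exact limits.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
MAX_SIZE_BYTES = 5 * 1024 * 1024  # hypothetical 5 MiB limit

def classify_upload(key: str, size_bytes: int) -> str:
    """Return the (hypothetical) custom metric an S3 upload should increment."""
    ext = PurePosixPath(key).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return "InvalidFileTypes"
    if size_bytes > MAX_SIZE_BYTES:
        return "FileSizeViolations"
    return "ImagesProcessed"

# Test plan mirroring the upload scenarios: (object key, size in bytes, expected metric).
TEST_CASES = [
    ("uploads/cat.jpg", 200_000, "ImagesProcessed"),
    ("uploads/report.pdf", 50_000, "InvalidFileTypes"),
    ("uploads/anim.GIF", 80_000, "InvalidFileTypes"),
    ("uploads/huge.png", 12 * 1024 * 1024, "FileSizeViolations"),
]

for key, size, expected in TEST_CASES:
    assert classify_upload(key, size) == expected, key
print("all upload scenarios map to the expected metric")
```

Keeping the expected outcome next to each scenario makes it easy to check, after a real upload, that the matching metric and alarm actually moved.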
