Amaan Ul Haq Siddiqui

Why We Didn’t Move to EKS (Yet): Choosing ECS Over Kubernetes in Production

In the cloud-native world, Kubernetes (EKS) is often treated as the default destination for container orchestration. It’s powerful, flexible, and industry-standard. But for many engineering teams, it’s also overkill.

We recently faced the classic "build vs. buy" decision for our infrastructure. The pressure to adopt EKS was there, but after evaluating our actual needs, we made a conscious choice to stick with Amazon ECS.

The result? We saved a handsome amount of money and months of engineering time, and we avoided the operational tax that comes with managing Kubernetes clusters. Here is how we architected a robust, scalable production environment on ECS without the K8s complexity.

The "Kubernetes Tax" We Wanted to Avoid

Kubernetes is amazing, but it requires a significant investment in tooling and maintenance. To run EKS properly in production, you aren't just managing containers; you're managing a platform. You need:

  • GitOps tools: ArgoCD or FluxCD for deployments.
  • Observability: Fluentd or similar for log shipping.
  • Ingress Controllers: NGINX or ALB controllers.
  • Security: Keeping up with cluster version upgrades and constant patching of worker nodes.

Our team wanted to focus 100% on shipping application code, not on managing infrastructure plumbing.

Our Hybrid ECS Architecture

We designed a hybrid ECS strategy that leverages the best of both serverless and provisioned compute.

1. Fargate for Stateless Workloads

For our main application servers and Sidekiq background workers, we used ECS Fargate (a minimal sketch follows the list below).

  • No Servers to Manage: We don't worry about OS patching or scaling instances.
  • Right-Sizing: We pay only for the vCPU and memory each task requests, for as long as it runs.
  • Scalability: Fargate handles the heavy lifting of launching thousands of containers if needed.
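
For a sense of what this looks like in practice, here is a minimal boto3 sketch of a Fargate task definition and service. It is illustrative only; the family, cluster, image, subnet, security group, and role identifiers are placeholders, not our real configuration.

```python
import boto3

ecs = boto3.client("ecs")

# Register a task definition sized for the workload. Fargate bills per
# second for the vCPU/memory a task requests, so sizing lives here.
task_def = ecs.register_task_definition(
    family="web-app",  # placeholder family name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",  # required for Fargate
    cpu="512",      # 0.5 vCPU
    memory="1024",  # 1 GiB
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    containerDefinitions=[{
        "name": "web",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-app:latest",
        "portMappings": [{"containerPort": 3000}],
        "essential": True,
    }],
)

# Run it as a long-lived service: ECS keeps the desired count of tasks
# healthy and replaces failures, with no instances for us to patch.
ecs.create_service(
    cluster="production",  # placeholder cluster name
    serviceName="web-app",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    desiredCount=2,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-aaaa1111", "subnet-bbbb2222"],
            "securityGroups": ["sg-0123456789abcdef0"],
            "assignPublicIp": "DISABLED",
        }
    },
)
```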

2. EC2 Launch Type for Cron Jobs

Interestingly, we didn't go 100% Fargate. For our scheduled Cron jobs, we stuck with the EC2 Launch Type.

  • Why? Cron jobs run frequently and often use the same base images.
  • The Cost Hack: By running these on EC2 instances, we can cache Docker layers locally on the host. This drastically reduces data transfer costs from ECR (Elastic Container Registry) and speeds up start times, something Fargate can't match for frequent, short-lived tasks, since each Fargate task pulls its image fresh (a scheduling sketch follows this list).
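
Here is a hedged sketch of how a cron job like this can be scheduled with EventBridge targeting the ECS cluster on the EC2 launch type. The rule name, cluster, role, and task-definition ARNs are made up for illustration.

```python
import boto3

events = boto3.client("events")

# Cron-style EventBridge rule; the name and schedule are illustrative.
events.put_rule(
    Name="nightly-report",
    ScheduleExpression="cron(0 2 * * ? *)",  # every day at 02:00 UTC
)

# Target the ECS cluster with LaunchType EC2: the task lands on a container
# instance we manage, where previously pulled image layers are already
# cached by the Docker daemon, so starts are fast and ECR pulls are rare.
events.put_targets(
    Rule="nightly-report",
    Targets=[{
        "Id": "nightly-report-task",
        "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/cron-cluster",
        "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
        "EcsParameters": {
            "TaskDefinitionArn": (
                "arn:aws:ecs:us-east-1:123456789012:task-definition/nightly-report:7"
            ),
            "TaskCount": 1,
            "LaunchType": "EC2",
        },
    }],
)
```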

The Stack: Simple and Managed

We offloaded state management to AWS managed services to keep the compute layer purely ephemeral (a sketch of how tasks pick up these endpoints follows the list):

  • Database: Amazon RDS for PostgreSQL.
  • Caching: Amazon ElastiCache (Redis).
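
To keep those tasks stateless, connection details are injected at launch rather than baked into the image. A minimal sketch of a container definition doing this; the endpoint, account ID, and SSM parameter ARN are illustrative, not our real values.

```python
# A containerDefinitions entry: connection details for RDS and ElastiCache
# arrive at task launch, so the containers themselves hold no state.
container_definition = {
    "name": "web",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-app:latest",
    "essential": True,
    "environment": [
        # Non-sensitive endpoints can travel as plain environment variables.
        {"name": "REDIS_URL",
         "value": "redis://example-cache.abc123.0001.use1.cache.amazonaws.com:6379"},
    ],
    "secrets": [
        # Sensitive values come from SSM Parameter Store (or Secrets Manager)
        # via the task execution role, never baked into the image.
        {"name": "DATABASE_URL",
         "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/database-url"},
    ],
}
```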

CI/CD: Skipping the Complexity

One of the biggest wins was avoiding the "GitOps" complexity of ArgoCD or Flux. Our pipeline is a straightforward GitHub Actions workflow:

  1. Build: Create the Docker image.
  2. Scan: Run security vulnerability scans.
  3. Push: Upload to ECR.
  4. Deploy: Update the ECS Task Definition and force a new deployment.

That’s it. No separate synchronization server, no complex CRDs (Custom Resource Definitions), and no Helm charts to manage. The pipeline is robust, easy to debug, and requires zero maintenance.
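
Under the hood, that deploy step boils down to two ECS API calls: register a new task definition revision pointing at the freshly pushed image, then roll the service onto it. Here is a minimal boto3 sketch of roughly the same calls our GitHub Actions workflow makes; the cluster, service, family, and image names are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

def deploy(cluster: str, service: str, family: str, image: str) -> None:
    """Register a new task definition revision with the freshly pushed image
    and roll the ECS service onto it."""
    # Start from the currently active revision so sizing, roles, and
    # networking settings carry over unchanged.
    current = ecs.describe_task_definition(taskDefinition=family)["taskDefinition"]

    containers = current["containerDefinitions"]
    for container in containers:
        container["image"] = image  # swap in the newly pushed ECR tag

    kwargs = {"family": family, "containerDefinitions": containers}
    for key in ("requiresCompatibilities", "networkMode", "cpu", "memory",
                "executionRoleArn", "taskRoleArn"):
        if key in current:
            kwargs[key] = current[key]

    new_arn = ecs.register_task_definition(**kwargs)["taskDefinition"]["taskDefinitionArn"]

    # Point the service at the new revision; ECS rolls tasks over gradually.
    ecs.update_service(
        cluster=cluster,
        service=service,
        taskDefinition=new_arn,
        forceNewDeployment=True,
    )

deploy("production", "web-app", "web-app",
       "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-app:sha-abc1234")
```

In a GitHub Actions workflow, a step like this would run last, after the image has passed its scan and been pushed to ECR.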

The Verdict: Time is Money

By choosing ECS, we:

  • Skipped the Learning Curve: No need to train the team on kubectl, manifests, or cluster networking.
  • Reduced Operational Overhead: No node patching, no control plane upgrades.
  • Lowered the Bill: We aren't paying the EKS control plane fee (about $73/month per cluster) or carrying the overhead of system pods running on worker nodes.

We might move to EKS one day if our requirements around custom networking or a service mesh become complex enough to warrant it. But for now, ECS allows us to run a stable, high-performance production environment where the only thing we have to take care of is our application code.

Sometimes, the best engineering decision is the boring one.
