Running Containers on AWS

Iā€™ve spent the last week exploring how to run containers on AWS, since I have more experience with GCP, particularly with GKE.

Hereā€™s what Iā€™ve learned.

Your Options

There are basically four options. Weā€™re excluding AWS Lambda and other PaaS offerings, because those arenā€™t container-based.

  • Elastic Container Service (ECS) where you manage the EC2 instances
  • Elastic Container Service (ECS) with AWS Fargate, where AWS manages the instances
  • Elastic Kubernetes Service (EKS) where you manage the node groups (node pools in GKE)
  • Elastic Kubernetes Service (EKS) with AWS Fargate, where AWS manages the instances

The most interesting of these are the two Fargate options; infrastructure management is clearly moving farther and farther behind the scenes. We probably wonā€™t be upgrading node pools ourselves in two years.

The ECS API

Letā€™s go through the ECS API and see how it compares to the Kubernetes API.

A Container Instance is a node: a VM thatā€™s registered into your ECS cluster and runs your containers. Iā€™m not sure why they didnā€™t just call these nodes.

Launch Types

I only named two launch types above, but there are three available.

  • Fargate - AWS provisions and manages the VMs behind the scenes. No version upgrades, nada. A little more expensive, but much less to deal with.
  • EC2 - you have to manage the VMs. This was the original mode of ECS. You do this with an abstraction called a capacity provider.
  • External - you have on-prem VMs that are registered with your cluster. See AWS ECS Anywhere.

If you donā€™t have some weird compliance requirement thatā€™s stopping you, Iā€™d recommend using Fargate.

Itā€™s less a question of ā€œwhich is better,ā€ because thatā€™s clear. The real question is: ā€œIs Fargate mature enough to replace the EC2 launch type for most use cases?ā€ And I believe thatā€™s a yes.

TaskDefinition

  • This is similar to a Kubernetes Pod
  • Can have multiple containers, which can communicate with each other via localhost
  • Can share volumes
  • You want them to scale up and down together
  • You deploy them together
  • ā€You should create task definitions that group the containers that are used for a common purpose.ā€ - the docs.

I would go further and say no more than one main container plus any supporting containers, i.e. the sidecar pattern.

  • portMappings, a nested field, is very similar to a Service of type NodePort in Kubernetes. It maps container ports to ports on the host container instance.
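To make this concrete, hereā€™s a minimal sketch of a task definition with one main container and a logging sidecar. It uses the lower-level @pulumi/aws provider rather than awsx, and the sidecar image and names are just illustrative, not something from the example later in this post.

import * as aws from "@pulumi/aws";

// A hypothetical task definition: one main container plus a logging sidecar.
// Both containers share localhost; only the main container maps a port.
const webTask = new aws.ecs.TaskDefinition("web", {
    family: "web",
    requiresCompatibilities: ["EC2"],
    networkMode: "bridge",
    containerDefinitions: JSON.stringify([
        {
            name: "app",
            image: "ealen/echo-server",
            essential: true,
            memory: 512,
            // hostPort 0 asks for a dynamic host port in bridge mode
            portMappings: [{ containerPort: 80, hostPort: 0 }],
        },
        {
            name: "log-forwarder", // hypothetical sidecar
            image: "fluent/fluent-bit",
            essential: false,
            memory: 128,
        },
    ]),
});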

Service

Not to be confused with a Kubernetes Service (which provides a stable IP, among other things).

An ECS Service maintains the availability of your tasks (closer to a Kubernetes ReplicaSet or Deployment). You give it a task definition, a desired count, and a launch type.

  • placementConstraints is similar to node affinity / anti-affinity or taints and tolerations.
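As a sketch (again with @pulumi/aws, and assuming the cluster and task definition from earlier exist), a Service that keeps two copies of a task running on the EC2 launch type, constrained to particular availability zones, might look like this:

import * as aws from "@pulumi/aws";

declare const cluster: aws.ecs.Cluster;        // assumed to exist elsewhere in the program
declare const webTask: aws.ecs.TaskDefinition; // the task definition sketched above

const webService = new aws.ecs.Service("web", {
    cluster: cluster.arn,
    taskDefinition: webTask.arn,
    desiredCount: 2,
    launchType: "EC2",
    // Roughly analogous to node affinity in Kubernetes: only place tasks
    // on container instances in these availability zones.
    placementConstraints: [{
        type: "memberOf",
        expression: "attribute:ecs.availability-zone in [us-east-1a, us-east-1b]",
    }],
});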

Networking

There are a few different networkConfigurations available.

  • In awsvpc, tasks receive their own elastic network interface and a private IPv4 address. This puts it on par with an EC2 instance, from a networking perspective.
  • In bridge, tasks use Dockerā€™s virtual network on the host.
  • In host, the task maps container ports to the ENI of the host EC2 instance. Keep in mind ports on host nodes are finite resources in your cluster.

If youā€™re using the Fargate launch type, you have to use awsvpc.
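That means a Fargate service needs to be told which subnets and security groups its elastic network interfaces should use. A rough sketch, assuming a Fargate-compatible task definition and hypothetical subnet and security group IDs:

import * as aws from "@pulumi/aws";

declare const cluster: aws.ecs.Cluster;
declare const fargateTask: aws.ecs.TaskDefinition; // assumed: awsvpc network mode, FARGATE-compatible
declare const privateSubnetIds: string[];
declare const webSecurityGroupId: string;

const fargateService = new aws.ecs.Service("web-fargate", {
    cluster: cluster.arn,
    taskDefinition: fargateTask.arn,
    desiredCount: 2,
    launchType: "FARGATE",
    networkConfiguration: {
        subnets: privateSubnetIds, // each task gets its own ENI in one of these subnets
        securityGroups: [webSecurityGroupId],
        assignPublicIp: false,
    },
});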

This is interesting to compare to Kubernetes, because Kubernetes is like a combination of awsvpc and bridge: Pods are given their own IPs, but theyā€™re virtual (kube-proxy edits the nodeā€™s iptables).

In Kubernetes it can also be implemented many different ways; you have to choose a network plugin. In managed Kubernetes they have good default choices and you usually donā€™t think about this.

Service Discovery

Itā€™s very common for one microservice to want to call another. You donā€™t want to call public endpoints, because thatā€™s additional load on your networking infrastructure (e.g. a NAT Gateway, an API Gateway), and itā€™s also going over the public internet.

In ECS, to accomplish this, you use service discovery, which is integrated with Amazon Route 53.

You register the service into a private DNS namespace, and DNS records referencing the tasksā€™ private IPs are created for it. You can then hit your service at <service discovery service name>.<service discovery namespace>. Good thing weā€™re not overloading the word ā€œserviceā€. šŸ˜…

A typical workflow creates one ā€œservice discovery serviceā€ per ECS Service, with an A record for each taskā€™s IP address.
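In Pulumi terms, a sketch of that wiring might look like this (using @pulumi/aws; the namespace name, service name, and referenced resources are all made up for illustration):

import * as aws from "@pulumi/aws";

declare const cluster: aws.ecs.Cluster;
declare const avocadoTask: aws.ecs.TaskDefinition; // hypothetical awsvpc task definition
declare const vpcId: string;
declare const privateSubnetIds: string[];

// A private DNS namespace for the cluster's VPC.
const namespace = new aws.servicediscovery.PrivateDnsNamespace("internal", {
    name: "internal.local",
    vpc: vpcId,
});

// One "service discovery service" per ECS Service, with A records per task.
const avocadoDiscovery = new aws.servicediscovery.Service("avocado", {
    name: "avocado",
    dnsConfig: {
        namespaceId: namespace.id,
        dnsRecords: [{ type: "A", ttl: 10 }],
        routingPolicy: "MULTIVALUE",
    },
    healthCheckCustomConfig: { failureThreshold: 1 },
});

// The ECS Service registers its tasks by referencing the registry ARN.
new aws.ecs.Service("avocado", {
    cluster: cluster.arn,
    taskDefinition: avocadoTask.arn,
    desiredCount: 2,
    launchType: "FARGATE",
    networkConfiguration: { subnets: privateSubnetIds },
    serviceRegistries: { registryArn: avocadoDiscovery.arn },
});

Other tasks in the VPC can then reach it at avocado.internal.local.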

This was added in 2018, and itā€™s a good example of ECS starting out overly simple and gradually growing more complicated, in the direction of Kubernetes.

Relationship To Load Balancing

To understand this, we need to go over some of the load balancing abstractions in AWS.

  • TargetGroup - a set of endpoints, or Targets. We will have one of these per ECS service.
  • Listener - Listens for requests from clients on a given protocol and port. Weā€™ll have one of these per (protocol, port) combination that we support. In the example below, just one, for HTTP on port 80.
  • ListenerRule - This is what connects the Listener and the TargetGroup.

e.g. if the path is /hello, go to this TargetGroup; if itā€™s /foo, redirect to /bar.

So, we will have

  • 1 load balancer
  • 1 listener, for HTTP and port 80
  • 1 target group per ECS Service
  • 1 listener rule per ECS Service

Hereā€™s an example, in Pulumi.

index.ts
import * as pulumi from "@pulumi/pulumi";
import * as awsx from "@pulumi/awsx";

const vpc = awsx.ec2.Vpc.getDefault();
const cluster = new awsx.ecs.Cluster("main", { vpc });

// Notice we're using the EC2 launch type
const asg = cluster.createAutoScalingGroup("main", {
    /* Why define this field? See this issue - https://github.com/pulumi/pulumi-awsx/issues/289 */
    subnetIds: vpc.publicSubnetIds,
    launchConfigurationArgs: { instanceType: "t2.medium" },
});

const loadBalancer = new awsx.lb.ApplicationLoadBalancer("main", {
    external: true,
});
const httpListener = loadBalancer.createListener("http-listener", { port: 80 });

// Avocado Service
const avocadoServiceTG = loadBalancer.createTargetGroup("avocado-service", {
    port: 80,
});

httpListener.addListenerRule("avocado-service-lr", {
    actions: [
        {
            type: "forward",
            targetGroupArn: avocadoServiceTG.targetGroup.arn,
        },
    ],
    conditions: [
        {
            pathPattern: {
                values: ["/avocado"],
            },
        },
    ],
});

new awsx.ecs.EC2Service("avocado-service", {
    cluster: cluster,
    taskDefinitionArgs: {
        vpc: vpc,
        container: {
            image: "ealen/echo-server",
            memory: 512,
            portMappings: [avocadoServiceTG],
            environment: [
                {
                    name: "MESSAGE",
                    value: "Avocado Service šŸ„‘",
                },
            ],
        },
    },
});

// Pretzel Service
const pretzelServiceTG = loadBalancer.createTargetGroup("pretzel-service", {
    port: 80,
});
httpListener.addListenerRule("pretzel-service-lr", {
    actions: [
        {
            type: "forward",
            targetGroupArn: pretzelServiceTG.targetGroup.arn,
        },
    ],
    conditions: [
        {
            pathPattern: {
                values: ["/pretzel"],
            },
        },
    ],
});

new awsx.ecs.EC2Service("pretzel-service", {
    cluster: cluster,
    taskDefinitionArgs: {
        vpc: vpc,
        container: {
            image: "ealen/echo-server",
            memory: 512,
            portMappings: [pretzelServiceTG],
            environment: [
                {
                    name: "MESSAGE",
                    value: "Pretzel Service šŸ„Ø",
                },
            ],
        },
    },
});

export const frontendURL = pulumi.interpolate`http://${httpListener.endpoint.hostname}`;

Terminal window
$ curl -s $(pulumi stack output frontendURL)/avocado | jq .environment.MESSAGE
"Avocado Service šŸ„‘"
$ curl -s $(pulumi stack output frontendURL)/pretzel | jq .environment.MESSAGE
"Pretzel Service šŸ„Ø"

Autoscaling?

Yeah, ECS autoscales well. You do this by adding a ā€œscaling policyā€. Youā€™ve got a few options there.

  • Target tracking scaling - scale to keep some metric (e.g. average CPU utilization) at a target value
  • Step scaling - when some alarm goes off, scale up to the next step. When the next alarm goes off, scale to the next step.
  • Scheduled scaling - scale based on date and time.

These are really good options. Many companies know their system is going to have a lot of traffic at some given time, e.g. 09:00 on Monday morning, and scheduled scaling is simple.

The other two take a bit more tuning, but theyā€™re solid options as well.
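Hereā€™s a sketch of target tracking scaling for an ECS service, again using @pulumi/aws; the cluster and service are assumed to exist, and the capacity bounds and 50% CPU target are arbitrary placeholders.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

declare const cluster: aws.ecs.Cluster;
declare const webService: aws.ecs.Service;

// Register the service's desired count as a scalable target.
const scalableTarget = new aws.appautoscaling.Target("web-scaling-target", {
    minCapacity: 2,
    maxCapacity: 10,
    resourceId: pulumi.interpolate`service/${cluster.name}/${webService.name}`,
    scalableDimension: "ecs:service:DesiredCount",
    serviceNamespace: "ecs",
});

// Keep average CPU utilization around 50% by adding or removing tasks.
new aws.appautoscaling.Policy("web-cpu-target-tracking", {
    policyType: "TargetTrackingScaling",
    resourceId: scalableTarget.resourceId,
    scalableDimension: scalableTarget.scalableDimension,
    serviceNamespace: scalableTarget.serviceNamespace,
    targetTrackingScalingPolicyConfiguration: {
        predefinedMetricSpecification: {
            predefinedMetricType: "ECSServiceAverageCPUUtilization",
        },
        targetValue: 50,
    },
});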

Additional Notes

  • ECS does rolling deployments by default, has an option for blue/green (CODE_DEPLOY), and a way to have even finer-grained control (EXTERNAL).
  • Workloads are a bit slower to start than in Kubernetes. I changed two environment variables across two tasks, and rolling that out took 8 minutes for me.
  • Fargate is especially slow to start, because it can involve scaling up. GKE Autopilot has the same problem.

Fargate on EKS

Fargate on EKS is similar to Fargate on ECS in name and capability, but certainly not in implementation or in how you use it.

In order to use Fargate on EKS, you have to create a Fargate profile.

Pods are then matched against the profileā€™s namespace and label selectors to determine which Fargate profile, if any, applies. Matching pods are scheduled by Fargateā€™s own scheduler onto what is basically its own managed node pool, and AWS handles scaling and upgrading for you.

You just have to think through your memory and vCPU requests, which you should be doing anyway.
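A sketch of a Fargate profile in Pulumi, using @pulumi/aws; the EKS cluster, IAM role, subnets, namespace, and label are all hypothetical.

import * as aws from "@pulumi/aws";

declare const eksCluster: aws.eks.Cluster;
declare const podExecutionRole: aws.iam.Role; // IAM role the Fargate pods run as
declare const privateSubnetIds: string[];     // Fargate profiles only accept private subnets

// Pods in the "orders" namespace with this label get scheduled onto Fargate.
new aws.eks.FargateProfile("orders", {
    clusterName: eksCluster.name,
    podExecutionRoleArn: podExecutionRole.arn,
    subnetIds: privateSubnetIds,
    selectors: [{
        namespace: "orders",
        labels: { "compute-type": "fargate" },
    }],
});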

Fargate on EKS is very similar to GKE Autopilot. Itā€™s clear that these Containers as a Service tools are the future for container orchestration. Few people really want to deal with version upgrades and manually scaling.

Wow! You read the whole thing. People who make it this far sometimes want to receive emails when I post something new.

I also have an RSS feed.