I tried to understand Amazon EKS internals and here’s what happened


EKS is a service to run Kubernetes, so you don’t have to install, manage, or patch the software yourself. Just like GKE on Google, Kubernetes as a service is really the way to go if you want to build Kubernetes apps on AWS.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

So where do we get started? AWS docs are still coming together, so it’s not easy. I would start with Jerry Hargrove’s amazing EKS diagram. If a picture is worth a thousand words, this one is worth 10,000!

1. Build your EKS cluster

I already did this in Terraform. There aren’t a lot of howtos, so I wrote one.

Basically you setup the service role, the cluster, then the worker nodes. Once you’ve done that you’re ready to run the demo app.

Related: When you have to take the fall

2. Build your app spec

These are very similar to ECS task definitions. You’ll need to make slight changes: mountPoints become volumeMounts, links get removed, workingDirectory becomes workingDir, and so on. Most of these changes are obvious; the JSON syntax is the biggest bear you’ll wrestle with.
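
For a concrete sense of the mapping, here’s a sketch of what a single container ends up looking like on the Kubernetes side. The app name, image and paths are placeholders, and the volume referenced by volumeMounts still has to be declared in the pod spec’s volumes section:

{
  "name": "my-app",
  "image": "12345678901.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
  "workingDir": "/usr/src/app",
  "volumeMounts": [{ "name": "app-data", "mountPath": "/data" }]
}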

When done do this:

$ kubectl apply -f my-controller.json

Related: When clients don’t pay

3. Build the service spec

The service spec is quite a bit different from an ECS service. I suggest starting from the guestbook service. Find it here
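
If it helps to see the shape of it first, here’s a minimal sketch of a LoadBalancer service for a hypothetical app called my-app listening on port 3000. The guestbook service follows the same pattern:

{
  "apiVersion": "v1",
  "kind": "Service",
  "metadata": {
    "name": "my-app",
    "labels": { "app": "my-app" }
  },
  "spec": {
    "type": "LoadBalancer",
    "selector": { "app": "my-app" },
    "ports": [{ "port": 3000, "targetPort": 3000 }]
  }
}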

Edit that and add your own app name & details. Then apply:

$ kubectl apply -f my-service.json

Related: Why I ask for a deposit

4. Get the endpoint and go!

$ kubectl get service -o wide

You should see the EXTERNAL-IP column display a load balancer endpoint. Copy that into your browser and you should see your app running.

Related: Why I ask for a deposit

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to setup an Amazon EKS demo with Terraform


Since EKS is pretty new, there aren’t a lot of howtos on it yet.

I wanted to follow along with Amazon’s Getting started with EKS & Kubernetes Guide.

However I didn’t want to use cloudformation. We all know Terraform is far superior!

Join 38,000 others and follow Sean Hull on twitter @hullsean.

With that I went to work getting it going. And I learned a few lessons along the way.

My steps follow pretty closely with the Amazon guide above, and setting up the guestbook app. The only big difference is I’m using Terraform.

1. create the EKS service role

Create a file called eks-iam-role.tf and add the following:

resource "aws_iam_role" "demo-cluster" {
  name = "terraform-eks-demo-cluster"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "demo-cluster-AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = "${aws_iam_role.demo-cluster.name}"
}

resource "aws_iam_role_policy_attachment" "demo-cluster-AmazonEKSServicePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
  role       = "${aws_iam_role.demo-cluster.name}"
}

Note the demo-cluster role we define here; the EKS cluster resource in step #3 below references it.

Related: How to setup Amazon ECS with Terraform

2. Create the EKS vpc

Here’s the code to create the VPC. I’m using the Terraform community module to do this.

There are two things to notice here. One is I reference the eks-region variable. Add that to your vars.tf, set to “us-east-1” or whatever region you like. Also add cluster-name to your vars.tf.
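
For reference, here’s a minimal vars.tf sketch covering the variables these snippets use. The defaults are only examples, so adjust them for your own account and region:

variable "eks-region" {
  default = "us-east-1"
}

variable "cluster-name" {
  default = "sean-eks"
}

variable "environment_name" {
  default = "demo"
}

variable "eks-azs" {
  type    = "list"
  default = ["us-east-1a", "us-east-1b"]
}

variable "eks-private-cidrs" {
  type    = "list"
  default = ["10.0.1.0/24", "10.0.2.0/24"]
}

variable "eks-public-cidrs" {
  type    = "list"
  default = ["10.0.101.0/24", "10.0.102.0/24"]
}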

Also notice the special tags. Those are super important. If you don’t tag your resources properly, Kubernetes won’t be able to do its thing. Or rather EKS won’t. I had this problem early on and it is very hard to diagnose. The tags in this VPC module will propagate to the subnets and security groups, which is also crucial.

#
provider "aws" {
  region = "${var.eks-region}"
}

#
module "eks-vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "eks-vpc"
  cidr = "10.0.0.0/16"

  azs             = "${var.eks-azs}"
  private_subnets = "${var.eks-private-cidrs}"
  public_subnets  = "${var.eks-public-cidrs}"

  enable_nat_gateway = false
  single_nat_gateway = true

  #  reuse_nat_ips        = "${var.eks-reuse-eip}"
  enable_vpn_gateway = false

  #  external_nat_ip_ids  = ["${var.eks-nat-fixed-eip}"]
  enable_dns_hostnames = true

  tags = {
    Terraform                                   = "true"
    Environment                                 = "${var.environment_name}"
    "kubernetes.io/cluster/${var.cluster-name}" = "shared"
  }
}

resource "aws_security_group_rule" "allow_http" {
  type              = "ingress"
  from_port         = 80
  to_port           = 80
  protocol          = "TCP"
  security_group_id = "${module.eks-vpc.default_security_group_id}"
  cidr_blocks       = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "allow_guestbook" {
  type              = "ingress"
  from_port         = 3000
  to_port           = 3000
  protocol          = "TCP"
  security_group_id = "${module.eks-vpc.default_security_group_id}"
  cidr_blocks       = ["0.0.0.0/0"]
}

Related: How I resolved some tough Docker problems when I was troubleshooting Amazon ECS

3. Create the EKS Cluster

Creating the cluster takes just the short bit of Terraform code below, using the aws_eks_cluster resource.

#
# main EKS terraform resource definition
#
resource "aws_eks_cluster" "eks-cluster" {
  name = "${var.cluster-name}"

  role_arn = "${aws_iam_role.demo-cluster.arn}"

  vpc_config {
    subnet_ids = ["${module.eks-vpc.public_subnets}"]
  }
}

output "endpoint" {
  value = "${aws_eks_cluster.eks-cluster.endpoint}"
}

output "kubeconfig-certificate-authority-data" {
  value = "${aws_eks_cluster.eks-cluster.certificate_authority.0.data}"
}

Related: Is Amazon too big to fail?

4. Install & configure kubectl

The AWS docs are pretty good on this point.

First you need to install the client on your local desktop. For me that was brew install, the macOS package manager. You’ll also need the heptio-authenticator-aws binary. Again refer to the AWS docs for help on this.
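
On a mac that install looks roughly like the following. The authenticator download URL is left as a placeholder on purpose, since you should grab the current one from the AWS docs:

# install the kubectl client via homebrew
$ brew install kubectl

# fetch the heptio-authenticator-aws binary for your platform
$ curl -o heptio-authenticator-aws <url-from-the-aws-docs>
$ chmod +x heptio-authenticator-aws
$ mv heptio-authenticator-aws /usr/local/bin/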

The main piece is to create a directory (~/.kube) and edit the file ~/.kube/config as follows:

apiVersion: v1
clusters:
- cluster:
    server: https://3A3C22EEF7477792E917CB0118DD3X22.yl4.us-east-1.eks.amazonaws.com
    certificate-authority-data: "a-really-really-long-string-of-characters"
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: aws
  name: aws
current-context: aws
kind: Config
preferences: {}
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: heptio-authenticator-aws
      args:
        - "token"
        - "-i"
        - "sean-eks"
      #  - "-r"
      #  - "arn:aws:iam::12345678901:role/sean-eks-role"
      #env:
      #  - name: AWS_PROFILE
      #    value: "seancli"

Related: Is AWS too complex for small dev teams?

5. Spinup the worker nodes

This is definitely the largest file in your terraform EKS code. Let me walk you through it a bit.

First we attach some policies to our role. These are all essential to EKS. They’re predefined but you need to group them together.

Then you need to create a security group for your worker nodes. Notice this also has the special kubernetes tag added. Be sure it’s there or you’ll have problems.

Then we add some additional ingress rules, which allow the workers & the kubernetes control plane to communicate with each other.

Next you’ll see some serious user-data code. This handles all the startup action, on the worker node instances. Notice we reference some variables here, so be sure those are defined.

Lastly we create a launch configuration and autoscaling group. Notice we give it the AMI as defined in the AWS docs. These are EKS-optimized images, with all the supporting software. Notice also they are currently only available in us-east-1 and us-west-2.

Notice also that the autoscaling group has the special kubernetes tag. As I’ve been saying over and over, that’s super important.

#
# EKS Worker Nodes Resources
#  * IAM role allowing Kubernetes actions to access other AWS services
#  * EC2 Security Group to allow networking traffic
#  * Data source to fetch latest EKS worker AMI
#  * AutoScaling Launch Configuration to configure worker instances
#  * AutoScaling Group to launch worker instances
#

resource "aws_iam_role" "demo-node" {
  name = "terraform-eks-demo-node"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "demo-node-AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = "${aws_iam_role.demo-node.name}"
}

resource "aws_iam_role_policy_attachment" "demo-node-AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = "${aws_iam_role.demo-node.name}"
}

resource "aws_iam_role_policy_attachment" "demo-node-AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = "${aws_iam_role.demo-node.name}"
}

resource "aws_iam_role_policy_attachment" "demo-node-lb" {
  policy_arn = "arn:aws:iam::12345678901:policy/eks-lb-policy"
  role       = "${aws_iam_role.demo-node.name}"
}

resource "aws_iam_instance_profile" "demo-node" {
  name = "terraform-eks-demo"
  role = "${aws_iam_role.demo-node.name}"
}

resource "aws_security_group" "demo-node" {
  name        = "terraform-eks-demo-node"
  description = "Security group for all nodes in the cluster"

  #  vpc_id      = "${aws_vpc.demo.id}"
  vpc_id = "${module.eks-vpc.vpc_id}"

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = "${
    map(
     "Name", "terraform-eks-demo-node",
     "kubernetes.io/cluster/${var.cluster-name}", "owned",
    )
  }"
}

resource "aws_security_group_rule" "demo-node-ingress-self" {
  description              = "Allow node to communicate with each other"
  from_port                = 0
  protocol                 = "-1"
  security_group_id        = "${aws_security_group.demo-node.id}"
  source_security_group_id = "${aws_security_group.demo-node.id}"
  to_port                  = 65535
  type                     = "ingress"
}

resource "aws_security_group_rule" "demo-node-ingress-cluster" {
  description              = "Allow worker Kubelets and pods to receive communication from the cluster control plane"
  from_port                = 1025
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.demo-node.id}"
  source_security_group_id = "${module.eks-vpc.default_security_group_id}"
  to_port                  = 65535
  type                     = "ingress"
}

data "aws_ami" "eks-worker" {
  filter {
    name   = "name"
    values = ["eks-worker-*"]
  }

  most_recent = true
  owners      = ["602401143452"] # Amazon
}

# EKS currently documents this required userdata for EKS worker nodes to
# properly configure Kubernetes applications on the EC2 instance.
# We utilize a Terraform local here to simplify Base64 encoding this
# information into the AutoScaling Launch Configuration.
# More information: https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-06-05/amazon-eks-nodegroup.yaml
locals {
  demo-node-userdata = <<USERDATA
#!/bin/bash -xe

CA_CERTIFICATE_DIRECTORY=/etc/kubernetes/pki
CA_CERTIFICATE_FILE_PATH=$CA_CERTIFICATE_DIRECTORY/ca.crt
mkdir -p $CA_CERTIFICATE_DIRECTORY
echo "${aws_eks_cluster.eks-cluster.certificate_authority.0.data}" | base64 -d >  $CA_CERTIFICATE_FILE_PATH
INTERNAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
sed -i s,MASTER_ENDPOINT,${aws_eks_cluster.eks-cluster.endpoint},g /var/lib/kubelet/kubeconfig
sed -i s,CLUSTER_NAME,${var.cluster-name},g /var/lib/kubelet/kubeconfig
sed -i s,REGION,${var.eks-region},g /etc/systemd/system/kubelet.service
sed -i s,MAX_PODS,20,g /etc/systemd/system/kubelet.service
sed -i s,MASTER_ENDPOINT,${aws_eks_cluster.eks-cluster.endpoint},g /etc/systemd/system/kubelet.service
sed -i s,INTERNAL_IP,$INTERNAL_IP,g /etc/systemd/system/kubelet.service
DNS_CLUSTER_IP=10.100.0.10
if [[ $INTERNAL_IP == 10.* ]] ; then DNS_CLUSTER_IP=172.20.0.10; fi
sed -i s,DNS_CLUSTER_IP,$DNS_CLUSTER_IP,g /etc/systemd/system/kubelet.service
sed -i s,CERTIFICATE_AUTHORITY_FILE,$CA_CERTIFICATE_FILE_PATH,g /var/lib/kubelet/kubeconfig
sed -i s,CLIENT_CA_FILE,$CA_CERTIFICATE_FILE_PATH,g  /etc/systemd/system/kubelet.service
systemctl daemon-reload
systemctl restart kubelet
USERDATA
}

resource "aws_launch_configuration" "demo" {
  associate_public_ip_address = true
  iam_instance_profile        = "${aws_iam_instance_profile.demo-node.name}"
  image_id                    = "${data.aws_ami.eks-worker.id}"
  instance_type               = "m4.large"
  name_prefix                 = "terraform-eks-demo"
  security_groups             = ["${aws_security_group.demo-node.id}"]
  user_data_base64            = "${base64encode(local.demo-node-userdata)}"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "demo" {
  desired_capacity     = 2
  launch_configuration = "${aws_launch_configuration.demo.id}"
  max_size             = 2
  min_size             = 1
  name                 = "terraform-eks-demo"

  #  vpc_zone_identifier  = ["${aws_subnet.demo.*.id}"]
  vpc_zone_identifier = ["${module.eks-vpc.public_subnets}"]

  tag {
    key                 = "Name"
    value               = "eks-worker-node"
    propagate_at_launch = true
  }

  tag {
    key                 = "kubernetes.io/cluster/${var.cluster-name}"
    value               = "owned"
    propagate_at_launch = true
  }
}

Related: How to hire a developer that doesn’t suck

6. Enable & Test worker nodes

If you haven’t already done so, apply all your above terraform:

$ terraform init
$ terraform plan
$ terraform apply

After that all runs, all your resources are created. Now edit the file “aws-auth-cm.yaml” with the following contents:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::12345678901:role/terraform-eks-demo-node
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
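
The rolearn above is the worker node role from step 5. One easy way to grab it is to add a Terraform output for it, a small sketch assuming the demo-node role defined earlier:

output "demo-node-role-arn" {
  value = "${aws_iam_role.demo-node.arn}"
}

Then “terraform output demo-node-role-arn” prints the ARN you need to paste in.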

Then apply it to your cluster:

$ kubectl apply -f aws-auth-cm.yaml

You should be able to use kubectl to view node status:

$ kubectl get nodes
NAME                           STATUS    ROLES     AGE       VERSION
ip-10-0-101-189.ec2.internal   Ready     <none>    10d       v1.10.3
ip-10-0-102-182.ec2.internal   Ready     <none>    10d       v1.10.3
$ 

Related: Why would I help a customer that’s not paying?

7. Setup guestbook app

Finally you can follow the exact steps in the AWS docs to create the app. Here they are again:

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/redis-master-controller.json
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/redis-master-service.json
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/redis-slave-controller.json
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/redis-slave-service.json
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/guestbook-controller.json
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/guestbook-service.json

Then you can get the endpoint with kubectl:

$ kubectl get services        
NAME           TYPE           CLUSTER-IP       EXTERNAL-IP        PORT(S)          AGE
guestbook      LoadBalancer   172.20.177.126   aaaaa555ee87c...   3000:31710/TCP   4d
kubernetes     ClusterIP      172.20.0.1       <none>             443/TCP          10d
redis-master   ClusterIP      172.20.242.65    <none>             6379/TCP         4d
redis-slave    ClusterIP      172.20.163.1     <none>             6379/TCP         4d
$ 

Use “kubectl get services -o wide” to see the entire EXTERNAL-IP. If that stays stuck on <pending>, you likely have an issue with your node IAM role, or you’re missing the special kubernetes tags. So check on those. It shouldn’t show <pending> for more than a minute, really.
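
When it does get stuck, the service’s events usually say why the load balancer couldn’t be created. A quick check, using the guestbook service name from above:

$ kubectl describe service guestbook
$ kubectl get events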

Hope you got everything working.

Good luck and if you have questions, post them in the comments & I’ll try to help out!

Related: How to migrate my skills to the cloud?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to setup an Amazon ECS cluster with Terraform


ECS is Amazon’s Elastic Container Service. That’s greek for how you get docker containers running in the cloud. It’s sort of like Kubernetes without all the bells and whistles.

It takes a bit of getting used to, but this Terraform howto should get you moving. You need an EC2 host to run your containers on, a task that defines your container image & resources, and lastly a service which tells ECS which cluster to run on and registers with the ALB if you have one.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

For each of these sections, create files: roles.tf, instance.tf, task.tf, service.tf, alb.tf. What I would recommend is to create the first file, roles.tf, then do:


$ terraform init
$ terraform plan
$ terraform apply

Then move on to instance.tf and do another terraform apply. One by one: next the task, then the service, then finally the alb. This way if you encounter errors, you can troubleshoot minimally, rather than digging through five files for the culprit.

This howto also requires a vpc. Terraform has a very good community vpc which will get you going in no time.

I recommend deploying in the public subnets for your first run, to avoid complexity of jump box, and private IPs for ecs instance etc.
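Here’s a rough sketch of what that community VPC module can look like for this howto. The module name new-vpc matches what the later files reference; the CIDRs and AZs are just examples:

module "new-vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "ecs-vpc"
  cidr = "10.10.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  public_subnets  = ["10.10.101.0/24", "10.10.102.0/24"]
  private_subnets = ["10.10.1.0/24", "10.10.2.0/24"]

  enable_nat_gateway   = true
  enable_dns_hostnames = true
}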

Good luck!

May the terraform force be with you!

First setup roles

Roles are a really brilliant part of the aws stack. Inside of IAM or identity access and management, you can create roles. These are collections of privileges. I’m allowed to use this S3 bucket, but not others. I can use EC2, but not Athena. And so forth. There are some special policies already created just for ECS and you’ll need roles to use them.

These roles will be applied at the instance level, so your ecs host doesn’t have to pass credentials around. Clean. Secure. Smart!


resource "aws_iam_role" "ecs-instance-role" {
name = "ecs-instance-role"
path = "/"
assume_role_policy = "${data.aws_iam_policy_document.ecs-instance-policy.json}"
}

data "aws_iam_policy_document" "ecs-instance-policy" {
statement {
actions = ["sts:AssumeRole"]

principals {
type = "Service"
identifiers = ["ec2.amazonaws.com"]
}
}
}

resource "aws_iam_role_policy_attachment" "ecs-instance-role-attachment" {
role = "${aws_iam_role.ecs-instance-role.name}"
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

resource "aws_iam_instance_profile" "ecs-instance-profile" {
name = "ecs-instance-profile"
path = "/"
role = "${aws_iam_role.ecs-instance-role.id}"
provisioner "local-exec" {
command = "sleep 60"
}
}

resource "aws_iam_role" "ecs-service-role" {
name = "ecs-service-role"
path = "/"
assume_role_policy = "${data.aws_iam_policy_document.ecs-service-policy.json}"
}

resource "aws_iam_role_policy_attachment" "ecs-service-role-attachment" {
role = "${aws_iam_role.ecs-service-role.name}"
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceRole"
}

data "aws_iam_policy_document" "ecs-service-policy" {
statement {
actions = ["sts:AssumeRole"]

principals {
type = "Service"
identifiers = ["ecs.amazonaws.com"]
}
}
}

Related: 30 questions to ask a serverless fanboy

Setup your ecs host instance

Next you need EC2 instances on which to run your docker containers. Turns out AWS has already built AMIs just for this purpose. They call them ECS Optimized Images. There is one unique AMI id for each region. So be sure you’re using the right one for your setup.

The other thing that your instance needs to do is echo the cluster name to /etc/ecs/ecs.config. You can see us doing that in the user_data script section.

Lastly we’re configuring our instance inside of an auto-scaling group. That’s so we can easily add more instances dynamically to scale up or down as necessary.


#
# the ECS optimized AMI's change by region. You can lookup the AMI here:
# https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html
#
# us-east-1 ami-aff65ad2
# us-east-2 ami-64300001
# us-west-1 ami-69677709
# us-west-2 ami-40ddb938
#

#
# need to add security group config
# so that we can ssh into an ecs host from bastion box
#

resource "aws_launch_configuration" "ecs-launch-configuration" {
name = "ecs-launch-configuration"
image_id = "ami-aff65ad2"
instance_type = "t2.medium"
iam_instance_profile = "${aws_iam_instance_profile.ecs-instance-profile.id}"

root_block_device {
volume_type = "standard"
volume_size = 100
delete_on_termination = true
}

lifecycle {
create_before_destroy = true
}

associate_public_ip_address = "false"
key_name = "testone"

#
# register the cluster name with ecs-agent which will in turn coord
# with the AWS api about the cluster
#
user_data = <> /etc/ecs/ecs.config
EOF
}

#
# need an ASG so we can easily add more ecs host nodes as necessary
#
resource "aws_autoscaling_group" "ecs-autoscaling-group" {
name = "ecs-autoscaling-group"
max_size = "2"
min_size = "1"
desired_capacity = "1"

# vpc_zone_identifier = ["subnet-41395d29"]
vpc_zone_identifier = ["${module.new-vpc.private_subnets}"]
launch_configuration = "${aws_launch_configuration.ecs-launch-configuration.name}"
health_check_type = "ELB"

tag {
key = "Name"
value = "ECS-myecscluster"
propagate_at_launch = true
}
}

resource "aws_ecs_cluster" "test-ecs-cluster" {
name = "myecscluster"
}

Related: Is there a serious skills shortage in the devops space?

Setup your task definition

The third thing you need is a task. This one will spinup a generic nginx container. It’s a nice way to demonstrate things. For your real world usage, you’ll replace the image line with a docker image that you’ve pushed to ECR. I’ll leave that as an exercise. Once you have the cluster working, you should get the hang of things.
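
If you do go the ECR route, the push flow looks roughly like this. The repository name is a placeholder and the account id mirrors the 12345678901 used elsewhere in this post:

$ $(aws ecr get-login --no-include-email --region us-east-1)
$ docker build -t my-app .
$ docker tag my-app:latest 12345678901.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
$ docker push 12345678901.dkr.ecr.us-east-1.amazonaws.com/my-app:latest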

Note the portmappings, memory and CPU. All things you might expect to see in a docker-compose.yml file. So these tasks should look somewhat familiar.


data "aws_ecs_task_definition" "test" {
task_definition = "${aws_ecs_task_definition.test.family}"
depends_on = ["aws_ecs_task_definition.test"]
}

resource "aws_ecs_task_definition" "test" {
family = "test-family"

container_definitions = <

Related: Is AWS too complex for small dev teams?

Setup your service definition

The fourth thing you need to do is set up a service. The task above is a manifest describing your container's needs. It is now registered, but nothing is running.

When you apply the service your container will startup. What I like to do is, ssh into the ecs host box. Get comfortable. Then issue $ watch "docker ps". This will repeatedly run "docker ps" every two seconds. Once you have that running, do your terraform apply for this service piece.

As you watch, you'll see ECS start your container, and it will suddenly appear in your watch terminal. It will first show "starting". Once it is started, it should say "healthy".


resource "aws_ecs_service" "test-ecs-service" {
name = "test-vz-service"
cluster = "${aws_ecs_cluster.test-ecs-cluster.id}"
task_definition = "${aws_ecs_task_definition.test.family}:${max("${aws_ecs_task_definition.test.revision}", "${data.aws_ecs_task_definition.test.revision}")}"
desired_count = 1
iam_role = "${aws_iam_role.ecs-service-role.name}"

load_balancer {
target_group_arn = "${aws_alb_target_group.test.id}"
container_name = "nginx"
container_port = "80"
}

depends_on = [
# "aws_iam_role_policy.ecs-service",
"aws_alb_listener.front_end",
]
}

Related: Does AWS have a dirty secret?

Setup your application load balancer

The above will all work by itself. However for a real-world use case, you'll want to have an ALB. This one has only a simple HTTP port 80 listener. These are much simpler than setting up 443 for SSL, so baby steps first.

Once you have the ALB going, new containers will register with the target group, to let the alb know about them. In "docker ps" you'll notice they are running on a lot of high numbered ports. These are the hostPorts which are dynamically assigned. The container ports are all 80.


#
#
resource "aws_alb_target_group" "test" {
name = "my-alb-group"
port = 80
protocol = "HTTP"
vpc_id = "${module.new-vpc.vpc_id}"
}

resource "aws_alb" "main" {
name = "my-alb-ecs"
subnets = ["${module.new-vpc.public_subnets}"]
security_groups = ["${module.new-vpc.default_security_group_id}"]
}

resource "aws_alb_listener" "front_end" {
load_balancer_arn = "${aws_alb.main.id}"
port = "80"
protocol = "HTTP"

default_action {
target_group_arn = "${aws_alb_target_group.test.id}"
type = "forward"
}
}

You will also want to add a domain name, so that as your infra changes, and if you rebuild your ALB, the name of your application doesn't vary. Route53 will adjust as terraform changes are applied. Pretty cool.


resource "aws_route53_record" "myapp" {
zone_id = "${aws_route53_zone.primary.zone_id}"
name = "myapp.mydomain.com"
type = "CNAME"
ttl = "60"
records = ["${aws_alb.main.dns_name}"]

depends_on = ["aws_alb.main"]
}

Related: How to deploy on EC2 with vagrant

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don't work with recruiters

How I resolved some tough docker problems on Amazon ECS


ECS is Amazon’s elastic container service. If you have a dockerized app, this is one way to get it deployed in the cloud. It is basically an Amazon bootleg Kubernetes clone. And not nearly as feature rich! :)

Join 38,000 others and follow Sean Hull on twitter @hullsean.

That said, ECS does work, and it will allow you to get your application going on Amazon. Soon enough EKS (Amazon’s Kubernetes service) will be production, and we’ll all happily switch.

Meantime, if you’re struggling with weird errors, or it is silently failing on you, I have some help here for you. Hopefully these various error cases are ones you’ve run into, and this helps you solve them.

Why is my container in a stopped state?

Containers can fail for a lot of different reasons. The litany of causes I found were:

o port mismatches
o missing links in the task definition
o shortage of resources (see #2 below)

When ecs repeatedly fails, it leaves around stopped containers. These eat up system resources, without much visible feedback. “df -k” or “df -m” doesn’t show you volumes filled up. *BUT* there are logical volumes which can fill.

Do this to see the status:


[root@ip-10-111-40-30 ~]# lvdisplay
--- Logical volume ---
LV Name docker-pool
VG Name docker
LV UUID aSSS-fEEE-d333-V999-e999-a000-t11111
LV Write Access read/write
LV Creation host, time ip-10-111-40-30, 2018-04-21 18:16:19 +0000
LV Pool metadata docker-pool_tmeta
LV Pool data docker-pool_tdata
LV Status available
# open 3
LV Size 21.73 GiB
Allocated pool data 18.81%
Allocated metadata 6.10%
Current LE 5562
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:2

[root@ip-10-111-40-30 ~]#

Related: 30 questions to ask a serverless fanboy

Why am I getting this error “Couldn’t run containers – reason=RESOURCE:PORTS”?

I was seeing errors like this. Your first thought might be that I have multiple containers on the same port. But no I didn’t have a port conflict.

What was happening was containers were failing, but in inconsistent ways. So docker had old copies still sitting around.

On the ecs host, use “docker ps -a” to list *ALL* containers. Then use “docker system prune” to cleanup old resources.
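
In practice that cleanup looks something like this:

# list everything, including stopped containers
$ docker ps -a

# remove stopped containers, dangling images and unused networks
$ docker system prune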


INFO[0000] Using ECS task definition TaskDefinition="docker:5"
INFO[0000] Couldn't run containers reason="RESOURCE:PORTS"
INFO[0000] Couldn't run containers reason="RESOURCE:PORTS"
INFO[0000] Starting container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-redis
INFO[0000] Starting container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-main
INFO[0000] Starting container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-postgres
INFO[0000] Describe ECS container status container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-postgres desiredStatus=RUNNING lastStatus=PENDING taskDefinition="docker:5"
INFO[0000] Describe ECS container status container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-redis desiredStatus=RUNNING lastStatus=PENDING taskDefinition="docker:5"
INFO[0000] Describe ECS container status container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-main desiredStatus=RUNNING lastStatus=PENDING taskDefinition="docker:5"

INFO[0007] Stopped container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-postgres desiredStatus=STOPPED lastStatus=STOPPED taskDefinition="docker:5"
INFO[0007] Stopped container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-redis desiredStatus=STOPPED lastStatus=STOPPED taskDefinition="docker:5"
INFO[0007] Stopped container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-main desiredStatus=STOPPED lastStatus=STOPPED taskDefinition="docker:5"

Related: What’s the luckiest thing that’s happened in your career?

3. My container gets killed before fully started

When a service is run, ECS wants to have *all* of the containers running together. Just like when you use docker-compose. If one container fails, ecs-agent may decide to kill the entire service, and restart. So you may see weird things happening in “docker logs” for one container, simply because another failed. What to do?

First look at your task definition, and set “essential = false”. That way if one fails, the other will still run. So you can eliminate the working container as a cause.

Next thing, remember some containers may start up almost instantly, nginx for example. Because it has a very small footprint, it can start in a second or two. So if *it* depends on another container that is slow, nginx will fail. That’s because in the strange world of docker discovery, that other container doesn’t even exist yet. When nginx references it, it says: hey, I don’t see the upstream server you are pointing to.

Solution? Be sure you have a “links” section in your task definition. This tells ecs-agent, that one container depends on another (think of the depends_on flag in docker-compose).
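
In the task definition JSON, that combination is just a couple of fields on the dependent container. A sketch, with placeholder names:

{
  "name": "nginx",
  "essential": false,
  "links": ["my-app"],
  "portMappings": [{ "containerPort": 80, "hostPort": 0 }]
}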

Related: Curve ball interview questions and answers

4. Understanding container ordering

As you are building your ecs manifest aka task definition, you want to run through your docker-compose file carefully. Review the links, essential flags and depends_on settings. Then be sure to mirror those in your ECS task.

When in doubt, reduce the scope of your problem. That is define *only one* container, then start the service. Once that container works, add a second. When you get that working as well, add a third or other container.

This approach allows you to eliminate interconnecting dependencies, and related problems.

Related: Are generalists better at scaling the web?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What makes a highly valued docker expert?


What exactly do we need to know about to manage docker effectively? What are the main pain points?

Join 38,000 others and follow Sean Hull on twitter @hullsean.

The basics aren’t tough. You need to know the anatomy of a Dockerfile, and how to set up a docker-compose.yml to ease the headache of docker run. You also should know how to manage docker images, and use docker ps to find out what’s currently running. And get an interactive shell (docker exec -it containerid bash). You’ll also make friends with inspect. But what else?

1. Manage image bloat

Docker images can get quite large. Even as you try to pare them down they can grow. Why is this?

Turns out the architecture of docker means as you add more stuff, it creates more “layers”. So even as you delete files, the lower or earlier layers still contain your files.
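
A quick way to see which layers are eating the space, where my-app is a placeholder image name:

$ docker history my-app:latest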

One option, during a package install you can do this:

RUN apt-get update && apt-get install -y mypkg && rm -rf /var/lib/apt/lists/*

This will immediately clean up the crap apt-get left behind, without it ever becoming permanent in that layer. Cool! As long as you use “&&” it is part of that same RUN command, and thus part of that same layer.

Another option is you can flatten a big image. Something like this should work:

$ docker export 0453814a47b3 | docker import - newimage

Related: 30 questions to ask a serverless fanboy

2. Orchestrate

Running docker containers on dev is great, and it can be a fast and easy way to get things running. Plus it can work across dev environments well, so it solves a lot of problems.

But what about when you want to get those containers up into the cloud? That’s where orchestration comes in. At the moment you can use docker’s own swarm or choose fleet or mesos.

But the biggest players seem to be Kubernetes & ECS. The former of course is what all the cool kids in town are using, and coupled with the Helm package manager it becomes a very manageable system. Get your pods, services, volumes, replicasets & deployments ready to go!

On the other hand Amazon is pushing ahead with its Elastic Container Service, which is native to AWS and not open source. It works well, allowing you to apply a JSON manifest to create a task. Then just as with Kubernetes you create a “service” to run one or more copies of that. Think of the task as a docker-compose file. It’s in JSON, but it basically specifies the same types of things: entrypoint, ports, base image, environment and so on.

For those wanting to go multi-cloud, Kubernetes certainly has an appeal. But Amazon is on the attack. They have announced a service to further ease container deployments, dubbed Amazon Fargate. Remember how Lambda allowed you to deploy just your *code* into the cloud, and let Amazon worry about the rest? Imagine doing that with containers, and that’s what Fargate is.

Check out what Krish has to say – Why Kubernetes should be scared of AWS

Related: What’s the luckiest thing that’s happened in your career?

3. Registries & Deployment

There are a few different options for where to store those docker images.

One choice is dockerhub. It’s not feature rich, but it does the job. There is also Quay.io. Alternatively you can run your own registry. It’s as easy as:

$ docker run -d -p 5000:5000 registry:2

Of course if you’re running your own registry, now you need to manage that, and think about its uptime and its dependability for your deployment pipeline.

If you’re using ECS, you’ll be able to use ECR which is a private docker registry that comes with your AWS account. I think you can use this, even if you’re not on ECS. The login process is a little weird.
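
At the time of writing, that login dance looks roughly like this. The command prints a docker login with a temporary token, which the $( ) wrapper runs for you:

$ $(aws ecr get-login --no-include-email --region us-east-1)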

Once you have those pieces in place, you can do some fun things. Your jenkins deploy pipeline can use docker containers for testing, to spinup a copy of your app just to run some unittests, or it can build your images, and push them to your registry, for later use in ECS tasks or Kubernetes manifests. Awesome sauce!

Related: Is Amazon Web Services too complex for small dev teams?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters