I have a new appreciation for Agile and not because it worked

via GIPHY

A couple of years ago I worked at a startup in the online publishing space. But this story isn’t about online publishing. This story is about deadlines, and not meeting them.

For those who don’t know me, I’m a professional services consultant. I’ve worked on a project basis for 90% of two plus decades of my professional career. I mentioned this to give my opinions and perspectives some context. Although I’m not a manager, I’ve worked under more managers than most. Because I work on 5-10 projects per year, that comes to close to two hundred in my career.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

My career path has given me a unique perspective, on teams, leadership and motivation. Of course like anyone who’s worked in tech, I’ve see a lot of agile.

Daily standups are de rigueur of course. As are breaking up your work into two week sprints, and assigning story points to those tasks.

I have to admit there are times when Agile seemed the most fashionable way I saw teams working. And I guess I believe it was more trendy than functional.

Doesn’t everybody already work off tight todo lists, break tasks down into manageable pieces, and always deliver what they promised? It may be because I’ve spent a lot of time managing myself, and surviving as a freelancer that I’ve picked up some of these habits. But I digress…

1. Crunch time

While we as a team had been working on two week sprints, something happened to derail us. Suddenly a major customer was in jeopardy, because we had not delivered on sales promises.

Although what was being promised, we *could* build, we were stumbling over the details.

As an emergency plan, we dropped our current sprint tasks, and marshaled our forces towards this new goal. Everyone on the team knew the customer was hanging by a thread.

Related: When you have to take the fall

2. We need to deliver production by Friday

While this edict looks great on paper, it didn’t work out so well. Developers and ops weren’t clear which systems were included in “production”, and how they needed to be connected together.

Each engineer ended up interpreting the goal in their own way, often assuming that others were shouldering ultimate responsibility of delivery.

“Well I did my part, this other part of the team is responsible for those other pieces…” was the refrain I heard. Sadly the deadline was missed, and everything was a mess. Only after management later dug in and sifted through the rubble, did they actually break up the tasks, assign story points, and give each engineer actionable items to work on.

Related: When clients don’t pay

3. Please work together to make that happen

This doesn’t work because “production” is not an actionable item.

Actionable is much more specific
o deploy this container
o setup these five servers
o fix these three bugs and push code
o setup these new environment variables

Why is “production” not specific?

Which application? Which layer or API must work? Are there intervening steps before production will work? Are their individual tasks for each engineer?

To me you need to “break things down” further if you have tasks that span multiple people, and multiple sessions. I think of a session as a 2-3 hour bucket of productive work. For me it is also the length of time my laptop battery takes to drain from 100% to 0%. When that happens I know I need a break.

And I know that’s how I get chunks of work done.

So to take this to a more specific level, if Friday is 5 work days away, I figure I have 12 increments of work I can do in that time, and if I have 3 engineers, then I need 36 chunks of work.

If you assign 36 chunks of work I believe you are much more likely to get 25-30 of those chunks done.

If you stick to the one macro goal of “get production to work”, engineers may secretly drop the ball figuring, well there are dependent tasks that are not done, so we’re not gonna get there. And also since the goal points at everyone, nobody saddles the failure.

Whereas if you have 36 chunks of work, individual engineers are less likely to drag their feet because it’ll be clear that the hold up was three of john’s tasks…

Related: Why i ask for a deposit

All of this gave me a new appreciation for Agile. It truly does keep teams on track, and focuses everyone on actionable work. This allows management more transparency on whats working, and engineers the feedback they need to get to the finish line.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to avoid legal problems in consulting

via GIPHY

I posted a newsletter recently entitled “When Clients Don’t Pay”.

I got a lot of responses in email, which is always encouraging. I’m happy to know that folks are reading and getting something out of my ideas.

One colleague suggested that I modify my last point about going to court. He suggested that legal action does make sense after other avenues are exhausted.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

My feeling about avoiding court, has only grown stronger over the years.

There are usually only a few reasons a customer won’t pay. In my experience each of them are avoidable without going to court.

Here are my thoughts on those…

1. Misaligned on tasks, deliverables or deadlines

I find weekly progress reports and endless notes go a long way towards avoiding this problem. If it does arise, there is usually something specific in those notes that can be remedied.

One also needs to be willing to compromise. Putting yourself in the other’s shoes will help to understand their perspective.

Communicate, communicate, and communicate more!

Related: When you have to take the fall

2. Budget problems

Here there isn’t a lot to do anyway. Although companies are obligated to meet payroll by law, they are not so with vendors. If they are out of cash, will court really resolve that?

My way of heading off this problem is, billing/invoicing in smaller increments, getting a deposit, and keep on top of things, so larger debts don’t build up.

Related: The fine art of resistance

3. Shady customers

These I usually suss out well before becoming engaged. I’ve had a few incidents where a prospect was meeting me to get “free advice”. They ask a lot of architectural questions, and take careful notes. Then don’t engage, or use their own people to implement.

One situation in particular I remember was around scalability. The product was a website & app for teachers. From the beginning they built it to sync data instantly. As they got bigger and more customers used the platform, their servers became heavily loaded.

I suggested, instead of looking for a technical solution, why not offer your customers, silver, bronze & gold service levels. For the gold customers, yeah they get their own servers, and can sync all the time. But for the silver ones, once-a-day would probably suffice. Much less load on the servers, because 75% of customers would go silver, 20% bronze and 5% gold.

They actually ran with the idea and implemented it, but never hired me even for an hour of work. I knew they implemented it because I had a friend in the company. It is experiences like that which teach you quite a lot about business and about how you conduct yourself.

This has happened a few times, and I guess it’s part of doing business. But usually that comes out before we go much further, so in a sense it’s a blessing in disguise. 🙂

Related: How to hire a developer that doesn’t suck

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

30 questions to ask as you plan for multicloud

via GIPHY

I enjoy reading Corey Quinn’s Last week in AWS newsletter. And also really like his podcast Screaming in the cloud. This week Corey talks with Jay Gordon from Mongodb. While Jay seems somewhat of an advocate of multicloud, Corey is decidedly critical. Which makes for a great interview!

Corey also wrote a piece The Myth of cloud agnosticism. I like any writing that dubs a popular trope as a “myth” because it’s an opportunity to poke holes in optimism.

It is through this process that we become realistic, which is crucial to being reliable in operations and engineering.

Corey argues that multicloud, with respect to multiple infrastructure providers is usually a crappy idea. That’s because the cloud providers are evolving, your application is evolving, and it costs you in terms of feature velocity. What’s more it provides dubious instant uptime in the DR realm.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

The topic reminds me of similar myths in computer science… the myth of cross platform development or
the myth of the cross platform databases the myth of object relational modeling.

As always your mileage may vary. Here are my questions. Hope they can help provide perspective, and critical thinking around this.

1. Do you plan to use multiple cloud providers for infrastructure? And deploy your application twice?
2. Do you plan to use multiple SaaS providers?
3. Does hybrid cloud make sense? That’s an option where you deploy a data link to a public cloud, keeping some assets in your own datacenter.
4. Are their feature parallels across your chosen clouds? Or are there feature mismatches?
5. Your cloud providers have independent service level agreements. Are they consistent or not?

Related: When you have to take the fall

6. What does the outage history look like for each of your providers?
7. What is the potential for fatfinger outage on each platform? For example one may be unduly complicated and prone to mistakes, based on API or dashboard interface.
8. Is one cloud more complicated to implement? For example Amazon Web Services while being more feature rich, is also much more complicated to deploy than a Digital Ocean setup.
9. You can see your backups on both platforms. Have you done restores on both? Regularly? Recently?
10. Do you have time to automate everything twice? For example you may need to rewrite your ansible playbooks for each platform

Related: When clients don’t pay

11. What is driving your business to embrace the idea of multicloud?
12. Do you have time to rewrite scripts twice? one-off and user-data scripts alike?
13. Do you have time to firedrill twice? Smoketest twice?
14. Will different clouds fail in different ways? For example one might be weak around it’s network. Another might be weak around it’s database service, and a third might encounter multi-tenant traffic congestion (disk or network).
15. When one cloud doesn’t support a feature, Ex: lifecycle policies on S3 buckets, do you need to build it for the other cloud?

Related: Is Amazon too big to fail?

16. Will deploying multicloud encourage abstraction layers Ex: object relational modelers (ORM) which heavily slow down performance?
17. Have you tested performance on both clouds?
18. Is cloud #2 a temporary disaster recovery solution or an on-going load balancing solution via geo-dns?
19. If you go hybrid cloud, how does that impact security, firewalls, and access controls?
20. How do you monitor your object stores (S3), scanning for open buckets? Do you rewrite this code for the other API?

Related: Why i ask for a deposit

21. What are the disaster types you’re planning for?
22. What is the cost of maintaining your application on multiple platforms?
23. What is the cost of building infra for multiple platforms?
24. What is the cost of debugging & troubleshooting on multiple platforms?
25. How does multcloud complicate deployments?

Related: How to hire a developer that doesn’t suck

26. Does multicloud complicate GDPR and other compliance questions?
27. How does multicloud complicate your billing and budget management?
28. What about microservices? How will these multiple platforms play across two clouds?
29. Is the community around each cloud equally active?
30. If you deploy with an infrastructure as code language like Terraform, is there an active community there for both of your chosen clouds?

Related: Why generalists are better at scaling the web

31. Does each provider support customers well? What are their respective reputations?
32. Is each cloud provider equally solvent & invested in the business? Will they be around in a year? five years? ten years?
33. What complications arise when migrating to or from this provider?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

I tried to understand Amazon EKS internals and here’s what happened

via GIPHY

EKS is a service to run kubernetes, so you don’t have to install the software, or manage or patch it. Just like GKS on Google, kubernetes as a service is really the way to go if you want to build kubernetes apps on AWS.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

So where do we get started? AWS docs are still coming together, so it’s not easy. I would start with Jerry Hargrove’s amazing EKS diagram. If a picture is worth a thousand words, this one is work 10,000!

1. Build your EKS cluster

I already did this in Terraform. There aren’t a lot of howtos, so I wrote one.

Basically you setup the service role, the cluster, then the worker nodes. Once you’ve done that you’re ready to run the demo app.

Related: When you have to take the fall

2. Build your app spec

These are very similar to ECS tasks. You’ll need to make slight changes. mountPoints become VolumeMounts, links get removed, and workingDirectory becomes workingDir and so on. Most of these changes are obvious, but the json syntax is obviously the biggest bear you’ll wrestle with.

When done do this:

$ kubectl apply -f my-controller.json

Related: When clients don’t pay

3. Build the service spec

The service is quite a bit different than an ECS service. I suggest starting from the guestbook service. Find it here

Edit that and add your own app name & details. Then apply:

$ kubectl apply -f my-service.json

Related: Why i ask for a deposit

4. Get the endpoint and go!

$ kubectl get service -o wide

You should see the EXTERNAL-IP display a loadbalancer endpoint. Copy that into your browser and you should see your app running.

Related: Why i ask for a deposit

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How do we migrate our business to the public cloud?

via GIPHY

The public cloud is no longer a bleeding edge technology for the trailblazers. It’s mainstream now. As you think about it, you consider your customers and the SLAs they’ve come to expect.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

It’s not if, but when to move to the cloud, how to get there, and how fast will be the transition?

Here are my thoughts on what to start thinking about.

1. Ramp up team, skills & paradigm thinking

Teams with experience in traditional datacenters have certain ways of architecting solutions, and thinking about problems. For example they may choose NFS servers to host objects, where in the cloud you will use object storage such as S3.

S3 has all sorts of new features, like lifecycle policies, and super super redundant eleven 9’s of durability. But your applications may need to be retrofitted to work with it, and your devs may need to learn about new features and functionality.

What about networking? This changes a lot in the cloud, with VPCs, and virtual appliances like NATs and Gateways. And what about security groups?

Interacting with this new world of cloud resources, requires new skillsets and new ways of thinking. So priority one will be getting your engineering teams learning, and upgrading skills. I wrote a piece about this how do I migrate my skills to the cloud?

Related: When you have to take the fall

2. Adapt to a new security model

With the old style datacenter, you typically have a firewall, and everything gets blocked & controlled. The new world of cloud computing uses security groups. These can be applied at the network level, across your VPC, or at the server level. And of course you can have many security groups with overlapping jurisdictions. Here’s how you setup a VPC with Terraform

So understanding how things work in the public cloud is quite new and challenging. There are ingress and egress rules, ways to audit with network flow logs, and more.

However again, it’s one thing to have the features available, it’s quite another to put them to proper use.

Related: When clients don’t pay

3. Adapt to fragile components & networks

While the public cloud collectively is extremely resilient, the individual components such as EC2 instances are decidedly not reliable. It’s expected that they can and will die frequently. It’s your job as the customer to build things in a self-healing way.

That means VPCs with multiple subnets, across availability zones (multi-az). And that means redundant instances for everything. What’s more you front your servers with load balancers (classic or application). These themselves are redundant.

Whether you are building a containerized application and deploying on ECS or a traditional auto-scaling webserver with database backend, you’ll need to plan for failure. And that means code that detects, and reacts to such failures without downtime to the end user.

Related: Why i ask for a deposit

4. Build infrastructure as code

You’ve heard about devops, now it’s time to put it into practice. Building your complete stack in code, is very possible with tools like Terraform. But you may have trouble along the way. I wrote I tried to write infra as code with Terraform and AWS and it didn’t go as expected

So there’s a learning curve. Both for your operations teams who have previously called Rackspace to get a new server provisioned. And also for your business, learning what incurs an outage, and the tricky finicky sides to managing your public cloud through code.

Related: Why i ask for a deposit

5. Audit, log & monitor

As you automate more and more pieces, you may have less confidence in the overall scope of your deployments. How many servers am I using right now? How many S3 buckets? What about elastic IPs?

As your automation can itself spinup new temporary environments, those resource counts will change from moment to moment. Even a spike in user engagement or a sudden flash sale, can change your cloud footprint in an instant.

That’s where heavy use of logging such as ELK (elasticsearch, logstash and kibana) can really help. Sure AWS offers CloudWatch and CloudTrail, but again you must put it all to good use.

Related: Why i ask for a deposit

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to setup an Amazon EKS demo with Terraform

via GIPHY

Since EKS is pretty new, there aren’t a lot of howtos on it yet.

I wanted to follow along with Amazon’s Getting started with EKS & Kubernetes Guide.

However I didn’t want to use cloudformation. We all know Terraform is far superior!

Join 38,000 others and follow Sean Hull on twitter @hullsean.

With that I went to work getting it going. And a learned a few lessons along the way.

My steps follow pretty closely with the Amazon guide above, and setting up the guestbook app. The only big difference is I’m using Terraform.

1. create the EKS service role

Create a file called eks-iam-role.tf and add the following:

resource "aws_iam_role" "demo-cluster" {
  name = "terraform-eks-demo-cluster"

  assume_role_policy = --POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "demo-cluster-AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = "${aws_iam_role.demo-cluster.name}"
}

resource "aws_iam_role_policy_attachment" "demo-cluster-AmazonEKSServicePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
  role       = "${aws_iam_role.demo-cluster.name}"
}

Note we reference demo-cluster resource. We define that in step #3 below.

Related: How to setup Amazon ECS with Terraform

2. Create the EKS vpc

Here’s the code to create the VPC. I’m using the Terraform community module to do this.

There are two things to notice here. One is I reference eks-region variable. Add this in your vars.tf. “us-east-1” or whatever you like. Also add cluster-name to your vars.tf.

Also notice the special tags. Those are super important. If you don’t tag your resources properly, kubernetes won’t be able to do it’s thing. Or rather EKS won’t. I had this problem early on and it is very hard to diagnose. The tags in this vpc module, with propagate to subnets, and security groups which is also crucial.

#
provider "aws" {
  region = "${var.eks-region}"
}

#
module "eks-vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "eks-vpc"
  cidr = "10.0.0.0/16"

  azs             = "${var.eks-azs}"
  private_subnets = "${var.eks-private-cidrs}"
  public_subnets  = "${var.eks-public-cidrs}"

  enable_nat_gateway = false
  single_nat_gateway = true

  #  reuse_nat_ips        = "${var.eks-reuse-eip}"
  enable_vpn_gateway = false

  #  external_nat_ip_ids  = ["${var.eks-nat-fixed-eip}"]
  enable_dns_hostnames = true

  tags = {
    Terraform                                   = "true"
    Environment                                 = "${var.environment_name}"
    "kubernetes.io/cluster/${var.cluster-name}" = "shared"
  }
}

resource "aws_security_group_rule" "allow_http" {
  type              = "ingress"
  from_port         = 80
  to_port           = 80
  protocol          = "TCP"
  security_group_id = "${module.eks-vpc.default_security_group_id}"
  cidr_blocks       = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "allow_guestbook" {
  type              = "ingress"
  from_port         = 3000
  to_port           = 3000
  protocol          = "TCP"
  security_group_id = "${module.eks-vpc.default_security_group_id}"
  cidr_blocks       = ["0.0.0.0/0"]
}

Related: How I resolved some tough Docker problems when i was troubleshooting amazon ECS

3. Create the EKS Cluster

Creating the cluster is a short bit of terraform code below. The aws_eks_cluster resource.

#
# main EKS terraform resource definition
#
resource "aws_eks_cluster" "eks-cluster" {
  name = "${var.cluster-name}"

  role_arn = "${aws_iam_role.demo-cluster.arn}"

  vpc_config {
    subnet_ids = ["${module.eks-vpc.public_subnets}"]
  }
}

output "endpoint" {
  value = "${aws_eks_cluster.eks-cluster.endpoint}"
}

output "kubeconfig-certificate-authority-data" {
  value = "${aws_eks_cluster.eks-cluster.certificate_authority.0.data}"
}

Related: Is Amazon too big to fail?

4. Install & configure kubectl

The AWS docs are pretty good on this point.

First you need to install the client on your local desktop. For me i used brew install, the mac osx package manager. You’ll also need the heptio-authenticator-aws binary. Again refer to the aws docs for help on this.

The main piece you will add is a directory (~/.kube) and edit this file ~/.kube/config as follows:

apiVersion: v1
clusters:
- cluster:
    server: https://3A3C22EEF7477792E917CB0118DD3X22.yl4.us-east-1.eks.amazonaws.com
    certificate-authority-data: "a-really-really-long-string-of-characters"
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: aws
  name: aws
current-context: aws
kind: Config
preferences: {}
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: heptio-authenticator-aws
      args:
        - "token"
        - "-i"
        - "sean-eks"
      #  - "-r"
      #  - "arn:aws:iam::12345678901:role/sean-eks-role"
      #env:
      #  - name: AWS_PROFILE
      #    value: "seancli"%  

Related: Is AWS too complex for small dev teams?

5. Spinup the worker nodes

This is definitely the largest file in your terraform EKS code. Let me walk you through it a bit.

First we attach some policies to our role. These are all essential to EKS. They’re predefined but you need to group them together.

Then you need to create a security group for your worker nodes. Notice this also has the special kubernetes tag added. Be sure that it there or you’ll have problems.

Then we add some additional ingress rules, which allow workers & the control plane of kubernetes all to communicate with eachother.

Next you’ll see some serious user-data code. This handles all the startup action, on the worker node instances. Notice we reference some variables here, so be sure those are defined.

Lastly we create a launch configuration, and autoscaling group. Notice we give it the AMI as defined in the aws docs. These are EKS optimized images, with all the supporting software. Notice also they are only available currently in us-east-1 and us-west-1.

Notice also that the autoscaling group also has the special kubernetes tag. As I’ve been saying over and over, that super important.

#
# EKS Worker Nodes Resources
#  * IAM role allowing Kubernetes actions to access other AWS services
#  * EC2 Security Group to allow networking traffic
#  * Data source to fetch latest EKS worker AMI
#  * AutoScaling Launch Configuration to configure worker instances
#  * AutoScaling Group to launch worker instances
#

resource "aws_iam_role" "demo-node" {
  name = "terraform-eks-demo-node"

  assume_role_policy = --POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "demo-node-AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = "${aws_iam_role.demo-node.name}"
}

resource "aws_iam_role_policy_attachment" "demo-node-AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = "${aws_iam_role.demo-node.name}"
}

resource "aws_iam_role_policy_attachment" "demo-node-AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = "${aws_iam_role.demo-node.name}"
}

resource "aws_iam_role_policy_attachment" "demo-node-lb" {
  policy_arn = "arn:aws:iam::12345678901:policy/eks-lb-policy"
  role       = "${aws_iam_role.demo-node.name}"
}

resource "aws_iam_instance_profile" "demo-node" {
  name = "terraform-eks-demo"
  role = "${aws_iam_role.demo-node.name}"
}

resource "aws_security_group" "demo-node" {
  name        = "terraform-eks-demo-node"
  description = "Security group for all nodes in the cluster"

  #  vpc_id      = "${aws_vpc.demo.id}"
  vpc_id = "${module.eks-vpc.vpc_id}"

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = "${
    map(
     "Name", "terraform-eks-demo-node",
     "kubernetes.io/cluster/${var.cluster-name}", "owned",
    )
  }"
}

resource "aws_security_group_rule" "demo-node-ingress-self" {
  description              = "Allow node to communicate with each other"
  from_port                = 0
  protocol                 = "-1"
  security_group_id        = "${aws_security_group.demo-node.id}"
  source_security_group_id = "${aws_security_group.demo-node.id}"
  to_port                  = 65535
  type                     = "ingress"
}

resource "aws_security_group_rule" "demo-node-ingress-cluster" {
  description              = "Allow worker Kubelets and pods to receive communication from the cluster control plane"
  from_port                = 1025
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.demo-node.id}"
  source_security_group_id = "${module.eks-vpc.default_security_group_id}"
  to_port                  = 65535
  type                     = "ingress"
}

data "aws_ami" "eks-worker" {
  filter {
    name   = "name"
    values = ["eks-worker-*"]
  }

  most_recent = true
  owners      = ["602401143452"] # Amazon
}

# EKS currently documents this required userdata for EKS worker nodes to
# properly configure Kubernetes applications on the EC2 instance.
# We utilize a Terraform local here to simplify Base64 encoding this
# information into the AutoScaling Launch Configuration.
# More information: https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-06-05/amazon-eks-nodegroup.yaml
locals {
  demo-node-userdata = --USERDATA
#!/bin/bash -xe

CA_CERTIFICATE_DIRECTORY=/etc/kubernetes/pki
CA_CERTIFICATE_FILE_PATH=$CA_CERTIFICATE_DIRECTORY/ca.crt
mkdir -p $CA_CERTIFICATE_DIRECTORY
echo "${aws_eks_cluster.eks-cluster.certificate_authority.0.data}" | base64 -d >  $CA_CERTIFICATE_FILE_PATH
INTERNAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
sed -i s,MASTER_ENDPOINT,${aws_eks_cluster.eks-cluster.endpoint},g /var/lib/kubelet/kubeconfig
sed -i s,CLUSTER_NAME,${var.cluster-name},g /var/lib/kubelet/kubeconfig
sed -i s,REGION,${var.eks-region},g /etc/systemd/system/kubelet.service
sed -i s,MAX_PODS,20,g /etc/systemd/system/kubelet.service
sed -i s,MASTER_ENDPOINT,${aws_eks_cluster.eks-cluster.endpoint},g /etc/systemd/system/kubelet.service
sed -i s,INTERNAL_IP,$INTERNAL_IP,g /etc/systemd/system/kubelet.service
DNS_CLUSTER_IP=10.100.0.10
if [[ $INTERNAL_IP == 10.* ]] ; then DNS_CLUSTER_IP=172.20.0.10; fi
sed -i s,DNS_CLUSTER_IP,$DNS_CLUSTER_IP,g /etc/systemd/system/kubelet.service
sed -i s,CERTIFICATE_AUTHORITY_FILE,$CA_CERTIFICATE_FILE_PATH,g /var/lib/kubelet/kubeconfig
sed -i s,CLIENT_CA_FILE,$CA_CERTIFICATE_FILE_PATH,g  /etc/systemd/system/kubelet.service
systemctl daemon-reload
systemctl restart kubelet
USERDATA
}

resource "aws_launch_configuration" "demo" {
  associate_public_ip_address = true
  iam_instance_profile        = "${aws_iam_instance_profile.demo-node.name}"
  image_id                    = "${data.aws_ami.eks-worker.id}"
  instance_type               = "m4.large"
  name_prefix                 = "terraform-eks-demo"
  security_groups             = ["${aws_security_group.demo-node.id}"]
  user_data_base64            = "${base64encode(local.demo-node-userdata)}"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "demo" {
  desired_capacity     = 2
  launch_configuration = "${aws_launch_configuration.demo.id}"
  max_size             = 2
  min_size             = 1
  name                 = "terraform-eks-demo"

  #  vpc_zone_identifier  = ["${aws_subnet.demo.*.id}"]
  vpc_zone_identifier = ["${module.eks-vpc.public_subnets}"]

  tag {
    key                 = "Name"
    value               = "eks-worker-node"
    propagate_at_launch = true
  }

  tag {
    key                 = "kubernetes.io/cluster/${var.cluster-name}"
    value               = "owned"
    propagate_at_launch = true
  }
}

Related: How to hire a developer that doesn’t suck

6. Enable & Test worker nodes

If you haven’t already done so, apply all your above terraform:

$ terraform init
$ terraform plan
$ terraform apply

After that all runs, and all your resources are created. Now edit the file “aws-auth-cm.yaml” with the following contents:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::12345678901:role/terraform-eks-demo-node
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes% 

Then apply it to your cluster:

$ kubectl apply -f aws-auth-cm.yaml

you should be able to use kubectl to view node status:

$ kubectl get nodes
NAME                           STATUS    ROLES     AGE       VERSION
ip-10-0-101-189.ec2.internal   Ready         10d       v1.10.3
ip-10-0-102-182.ec2.internal   Ready         10d       v1.10.3
$ 

Related: Why would I help a customer that’s not paying?

7. Setup guestbook app

Finally you can follow the exact steps in the AWS docs to create the app. Here they are again:

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/redis-master-controller.json
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/redis-master-service.json
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/redis-slave-controller.json
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/redis-slave-service.json
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/guestbook-controller.json
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.3/examples/guestbook-go/guestbook-service.json

Then you can get the endpoint with kubectl:

$ kubectl get services        
NAME           TYPE           CLUSTER-IP       EXTERNAL-IP        PORT(S)          AGE
guestbook      LoadBalancer   172.20.177.126   aaaaa555ee87c...   3000:31710/TCP   4d
kubernetes     ClusterIP      172.20.0.1                    443/TCP          10d
redis-master   ClusterIP      172.20.242.65                 6379/TCP         4d
redis-slave    ClusterIP      172.20.163.1                  6379/TCP         4d
$ 

Use “kubectl get services -o wide” to see the entire EXTERNAL-IP. If that is saying you likely have an issue with your node iam role, or missing special kubernetes tags. So check on those. It shouldn’t show for more than a minute really.

Hope you got everything working.

Good luck and if you have questions, post them in the comments & I’ll try to help out!

Related: How to migrate my skills to the cloud?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What are the key aws skills and how do you interview for them?

via GIPHY

Whether you’re striving for a new role as a Devops engineer, or a startup looking to hire one, you’ll need to be on the lookout for specific skills.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

I’ve been on both sides of the fence, at times interviewing candidates, and other times the candidate looking to impress to win a new role.

Here are my suggestions…

Devops Pipeline

Jenkins isn’t the only build server, but it’s been around a long time, so it’s everywhere. You can also do well with CircleCI or Travis. Or even Amazon’s own CodeBuild & CodePipeline.

You should also be comfortable with a configuration management system. Ansible is my personal favorite but obviously there is lots of Puppet & Chef out there too. Talk about a playbook you wrote, how it configures the server, installs packages, edits configs and restarts services.

Bonus points if you can talk about handling deployments with autoscaling groups. Those dynamic environments can’t easily be captured in static host manifests, so talk about how you handle that.

Of course you should also be strong with Git, bitbucket or codecommit. Talk about how you create a branch, what’s gitflow and when/how do you tag a release.

Also be ready to talk about how a code checkin can trigger a post commit hook, which then can go and build your application, or new infra to test your code.

Related: How to avoid insane AWS bills

CloudFormation or Terraform

I’m partial to Terraform. Terraform is MacOSX or iPhone to CloudFormation as Android or Windows. Why do I say that? Well it’s more polished and a nicer language to write in. CloudFormation is downright ugly. But hey both get the job done.

Talk about some code you wrote, how you configured IAM roles and instance profiles, how you spinup an ECS cluster with Terraform for example.

Related: How best to do discovery in cloud and devops engagements?

AWS Services

There are lots of them. But the core services, are what you should be ready to talk about. CloudWatch for centralized logging. How does it integrate with ECS or EKS?

Route53, how do you create a zone? How do you do geo load balancing? How does it integrate with CertificateManager? Can Terraform build these things?

EC2 is the basic compute service. Tell me what happens when an instance dies? When it boots? What is a user-data script? How would you use one? What’s an AMI? How do you build them?

What about virtual networking? What is a VPC? And a private subnet? What’s a public subnet? How do you deploy a NAT? WHat’s it for? How do security groups work?

What are S3 buckets? Talk about infraquently accessed? How about glacier? What are lifecycle policies? How do you do cross region replication? How do you setup cloudfront? What’s a distribution?

What types of load balancers are there? Classic & Application are the main ones. How do they differ? ALB is smarter, it can integrate with ECS for example. What are some settings I should be concerned with? What about healthchecks?

What is Autoscaling? How do I setup EC2 instances to do this? What’s an autoscaling group? Target? How does it work with ECS? What about EKS?

Devops isn’t about writing application code, but you’re surely going to be writing jobs. What language do you like? Python and shell scripting  are a start. What about Lambda? Talk about frameworks to deploy applications.

Related: Are you getting good at Terraform or wrestling with a bear?

Databases

You should have some strong database skills even if you’re not the day-to-day DBA. Amazon RDS certainly makes administering a bit easier most of the time. But upgrade often require downtime, and unfortunately that’s wired into the service. I see mostly Postgresql, MySQL & Aurora. Get comfortable tuning SQL queries and optimizing. Analyze your slow query log and provide an output.

Amazon’s analytics offering is getting stronger. The purpose built Redshift is everywhere these days. It may use a postgresql driver, but there’s a lot more under the hood. You also may want to look at SPectrum, which provides a EXTERNAL TABLE type interface, to query data directly from S3.

Not on Redshift yet? Well you can use Athena as an interface directly onto your data sitting in S3. Even quicker.

For larger data analysis or folks that have systems built around the technology, Hadoop deployments or EMR may be good to know as well. At least be able to talk intelligently about it.

Related: Is zero downtime even possible on RDS?

Questions

Have you written any CloudFormation templates or Terraform code? For example how do you create a VPC with private & public subnets, plus bastion box with Terraform? What gotches do you run into?

If you are given a design document, how do you proceed from there? How do you build infra around those requirements? What is your first step? What questions would you ask about the doc?

What do you know about Nodejs? Or Python? Why do you prefer that language?

If you were asked to store 500 terrabytes of data on AWS and were going to do analysis of the data what would be your first choice? Why? Let’s say you evaluated S3 and Athena, and found the performance wasn’t there, what would you move to? Redshift? How would you load the data?

Describe a multi-az VPC setup that you recommend. How do you deploy multiple subnets in a high availability arragement?

Related: Why generalists are better at scaling the web

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to avoid insane AWS bills

via GIPHY

I was flipping through the aws news recently and ran into this article by Juan Ramallo – I was billed 14k on AWS!

Scary stuff!

Join 38,000 others and follow Sean Hull on twitter @hullsean.

When you see headlines like this, your first instinct as a CTO is probably, “Am I at risk?” And then “What are the chances of this happening to me?”

Truth can be stranger than fiction. Our efforts as devops should be towards mitigating risk, and reducing potential for these kinds of things to happen.

1. Use aws instance profiles instead

Those credentials that aws provides, are great for enabling the awscli. That’s because you control your desktop tightly. Don’t you?

But passing them around in your application code is prone to trouble. Eventually they’ll end up in a git repo. Not good!

The solution is applying aws IAM permissions at the instance level. That’s right, you can grant an instance permissions to read or write an s3 bucket, describe instances, create & write to dynamodb, or anything else in aws. The entire cloud is api configurable. You create a custom policy for your instance, and attach it to a named instance profile.

When you spinup your EC2 instance, or later modify it, you attach that instance profile, and voila! The instance has those permissions! No messy credentials required!

Related: Is Amazon too big to fail?

2. Enable 2 factor authentication

If you haven’t already, you should force 2 factor authentication on all of your IAM users. It’s an extra step, but well well worth it. Here’s how to set it up

Mobile phones support all sorts of 2FA apps now, from Duo, to Authenticator, and many more.

Related: Is AWS too complex for small dev teams?

3. blah blah

Encourage developers to use tools like Pivotal’s Credentials Scan.

Hey, while you’re at it, why not add a post commit hook to your code repo in git. Have it run the credentials scan each time code is committed. And when it finds trouble, it should email out the whole team.

This will get everybody on board quick!

Related: Are we fast approaching cloud-mageddon?

4. Scan your S3 Buckets

Open S3 buckets can be a real disaster, offering up your private assets & business data to the world. What to do about it?

Scan your S3 buckets regularly.

Also you can tie in this scanning process to a monitoring alert. That way as soon as an errant bucket is found, you’re notified of the problem. Better safe than sorry!

Related: Which tech do startups use most?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to find freelance work

via GIPHY

I’ve decided to take the plunge, and begin a career as a freelancer. What do you think of services like UpWork? Can I build a business around that?

Join 38,000 others and follow Sean Hull on twitter @hullsean.

There are lots of services that promise the same thing. Headshops too are businesses built around reselling you to customers.

1. Whose relationship?

On those platforms you are a commodity. And further you don’t control the relationship. Upwork becomes your customer.

This is a crucial point. You can’t negotiate additional services or fees, or build on the relationship. Because your customer is UpWork. They control the business they bring to you.

Just remember, your boss/client/customer is the one who writes you a check.

Related: When you have to take the fall

2. Learn sales

If you think you’re not so great at sales, join the club. It’s a real talent, and one everybody is not born with.

But if you want to work for yourself, it’s absolutely crucial. So get practicing!

Related: When clients don’t pay

3. Go to events

The ways i have found, network, meetups, blog weekly and have a newsletter that you send out monthly. Add everyone you ever meet to your newsletter. Write interesting things & appeal to a broad audience. Some receiving your newsletter will not read it but they will see your name pop up in their inbox once a month.

Related: Why i ask for a deposit

4. Expand

As you network, ask others for recommendations. Events, private email lists, single day conferences, forums etc.

Related: Can progress reports help consulting engagementss succeed?

5. Craft an origin story

And don’t forget to tell your story. And tell it well. Craft a memorable origin narrative. Practice & and add or remove things that resonate with people you meet. Even ask people, what do you think about my presentation? Any suggestions? Is it confusing, enticing, exciting?

Related: Why do people leave consulting?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to succeed with fixed price projects

via GIPHY

Bidding on projects is an art as much as a science. Exciting a customer, around skills and past successes is as important as being able to see details that haven’t yet materialized.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

So how does one approach this challenge. One way is to steer towards time and materials, and let things evolve in their own way. But that may not always work.

Here are my thoughts on how to navigate a fixed-fee project.

Overhead costs

When thinking about costing of projects, there are a lot of hidden costs. For fulltime folks, there is the cost of overhead around office space, supplies, training, liability & health insurance, retirement, time off and even severance in some cases.

There is also the cost of time, to hire the right team, manage them, and bring all the pieces together to success product out the door.

Lots of intangibles.

Related: Can progress reports help you achieve successful engagements?

Evolving scope

When looking at a project, to come up with a realistic fixed bid, the scope must be carefully considered. If the bridge has two spans at either end and you decide to add one in the middle, does that mean a project of twice the size?

Both the vendor and manager must together attempt to break down the full scope into smaller pieces. Inevitably there will be some amount of emergent tasks and the scope will change and evolve.

Both consultant and customer must be realistic about this. You can call them product features or in the agile universe stories, but at the end of the day when you have many pieces surprises will happen.

The devil is surely in the details!

Related: How best to do discovery in cloud and devops engagements?

Horse Trading Skills

Given that we know things will change, the customer and vendor should plan for change.

If both parties have a realistic perspective, there is the possibility of exchanging original scoped items for emergent or evolving scoped surprises.

That is both need to be comfortable doing some sort of horse trading, to keep the levels balanced. The client then gets some leeway, as does the consultant in deliverables.

It’s not easily, but truly necessary in a fixed priced project. Because a scope never really sits still.

Related: Why do people leave consulting?

Underbidding

Another approach that may work is underbidding to win the project. Here your scope is expected to change, and it becomes a painful process each time. If you are strong on sales, this may work, but you’re sure to get an endless stream of change orders, and many many scrapes and bruises.

Related: Why I ask for a deposit

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters