Before you do infrastructure as code, consider your workflow carefully

What happens with infrastructure as code when you want to make a change on prod?

I’ve been working on automation for a few years now. When you build your cloud infrastructure with code, you really take everything to a whole new level.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

You might be wondering, what does the day-to-day workflow look like? It’s not terribly different from regular software development. You create a branch, write some code, test it, and commit it. But there are some differences.

1. Make a change on production

In this scenario, my dev environment references the repo directly on my laptop. There is a module, but its source is a local path, like this:

source = "../"

So here are the steps:

1. Make your change to terraform in main repo

This happens in the root source directory. Make your changes to .tf files, and save them.

2. Apply change on dev environment

You haven’t committed any changes to the git repo yet. You want to test them. Make sure there are no syntax errors, and they actually build the cloud resources you expect.

$ terraform plan
$ terraform apply

Fix any errors and re-run until the plan and apply come back clean.

3. Redeploy containers

If you’re using ECS, your code above may have changed a task definition, or other resources. You may need to update the service. This will force the containers to redeploy fresh with any updates from ECR etc.
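
One way to force that redeploy from the command line is the AWS CLI’s update-service call with --force-new-deployment. This is a minimal sketch, assuming the AWS CLI is installed and configured; the function name and the cluster and service names are placeholders, not anything from the setup described here.

```shell
# Force ECS to replace the running tasks of a service.
# The service keeps its current task definition; new tasks are started from it.
# Cluster and service names are hypothetical placeholders.
redeploy_service() {
  local cluster="$1"
  local service="$2"
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --force-new-deployment
}

# Example: redeploy_service dev-cluster app-service
```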

4. Eyeball test

You’ll need to ssh to the ecs-host and attach to containers. There you can review env, or verify that docker ps shows your new containers are running.
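
If you do this check often, it can be wrapped in a couple of small helpers. This is only a sketch: it assumes you can ssh to the host and that docker is on its PATH, and the host and container names are made up for illustration.

```shell
# List running containers on an ECS host over ssh.
# The whole docker command is passed as one quoted string so the remote
# shell does not split the --format template into separate words.
list_containers() {
  local host="$1"
  ssh "$host" "docker ps --format '{{.Names}} {{.Image}} {{.Status}}'"
}

# Print the environment of one container on that host.
container_env() {
  local host="$1"
  local container="$2"
  ssh "$host" "docker exec $container env"
}

# Example (hypothetical names):
#   list_containers ecs-host
#   container_env ecs-host app
```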

5. Commit changes to version control

Ok, now you’re happy with the changes on dev. Things seem to work, what next? You’ll want to commit your changes to the git repo:

$ git commit -am "added some variables to the application task definition"

6. Now tag your code – we’ll use v1.5

$ git tag -a v1.5 -m "added variables to app task definition"
$ git push origin v1.5

Be sure to push the tag (the second command above). A plain git push does not send tags to the remote.

7. Update stage terraform module to use v1.5

In your stage main.tf where your stage module definition is, change the source line:

source = "git::https://github.com/hullsean/infra-repo.git?ref=v1.5"

8. Apply changes to stage

$ terraform init
$ terraform plan
$ terraform apply

Note that you have to run terraform init this time. That’s because you are using a new version of your code, so Terraform has to go and fetch the whole thing and cache it in the .terraform directory.

9. Apply change on prod

Redo steps 7 & 8 for your prod-module main.tf.
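
Steps 7 through 9 are the same edit repeated per environment, so the ref bump can be scripted. This is a rough sketch, assuming the source line has the ?ref= form shown in step 7; the function name and the directory layout in the example are hypothetical.

```shell
# Point a module's git source at a new tag by rewriting the ?ref= value.
# Writes to a temp file first instead of using sed -i, which differs
# between GNU and BSD sed.
bump_module_ref() {
  local tag="$1"
  local file="$2"
  local tmp
  tmp="$(mktemp)" || return 1
  sed "s|?ref=[^\"]*\"|?ref=${tag}\"|" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example (paths are hypothetical):
#   bump_module_ref v1.5 prod/main.tf
#   (cd prod && terraform init && terraform plan && terraform apply)
```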

2. Pros of infrastructure code

o very professional pipeline
o pipeline can be further automated with tests
o very safe changes on prod
o infra changes managed carefully in version control
o you can back out changes, or see how you got here
o you can audit what has happened historically

3. Cons of over automating

o no easy way to sidestep
o manual changes will break everything
o you have to have a strong knowledge of Terraform
o you need a strong in-depth knowledge of AWS
o the whole team has to be on-board with automation
o you can’t just go in and tweak things

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

I tried to build infrastructure as code with Terraform and Amazon. It didn’t go as I expected.

As I was building infrastructure code, I stumbled quite a few times. You hit a wall and you have to work through those confusing and frustrating moments.

Here are a few of the lessons I learned in the process of building code for AWS. It’s not easy but when you get there you can enjoy the vistas. They’re pretty amazing.

Don’t pass credentials

As you build your applications, there are moments where components need to use AWS in some way. Your webserver needs to use S3 or your ELK box needs to use CloudWatch. Maybe you want to do an RDS backup, or list EC2 instances.

However it’s not safe to pass your access_key and secret_access_key around. Those should be for your desktop only. So how best to handle this in the cloud?

IAM roles to the rescue. These are collections of privileges. The cool thing is they can be assigned at the INSTANCE LEVEL. Meaning your whole server has permissions to use said resources.

Do this by first creating a role with the privileges you want. Create a JSON policy document which outlines the specific rules as you see fit. Then create an instance profile for that role.

When you create your EC2 instance in Terraform, you’ll specify that instance profile. The aws_instance resource takes the profile’s name in its iam_instance_profile argument; if Terraform created the profile, reference that resource’s name attribute rather than hardcoding it.
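
To make the role, policy and instance profile sequence concrete, here are the same steps expressed with the AWS CLI rather than Terraform resources. Treat it as a sketch: the function name, role name and JSON file paths are hypothetical, and it assumes the trust document lets ec2.amazonaws.com assume the role.

```shell
# Create a role, attach an inline policy, and wrap the role in an
# instance profile that an EC2 instance can be launched with.
# All names and file paths are hypothetical placeholders.
create_instance_profile() {
  local name="$1"
  local trust_doc="$2"
  local policy_doc="$3"
  aws iam create-role --role-name "$name" \
    --assume-role-policy-document "file://$trust_doc" &&
  aws iam put-role-policy --role-name "$name" \
    --policy-name "${name}-policy" \
    --policy-document "file://$policy_doc" &&
  aws iam create-instance-profile --instance-profile-name "$name" &&
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$name" --role-name "$name"
}

# Example: create_instance_profile web-s3-reader trust.json s3-read.json
```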

Keep passwords out of code

Even though we know it should not happen, sometimes it does. We need to be vigilant to stay on top of this problem. There are projects like Pivotal’s credential scan. This can be used to check your source files for passwords.

What about something like RDS? You’re going to need to specify a password in your Terraform code right? Wrong! You can define a variable with no default as follows:

variable "my_rds_pass" {
  description = "password for rds database"
}

When Terraform comes upon this variable in your code and sees there is no “default” value, it will prompt you for one when you run “$ terraform apply”.
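
The prompt is awkward for unattended runs. Terraform also reads input variables from environment variables named TF_VAR_ followed by the variable name, so a wrapper can refuse to run unless the password has been supplied that way. This is a sketch; the function name is made up, and how the value gets into the environment (a secrets store, a CI setting) is left as an assumption.

```shell
# Run terraform apply with the RDS password supplied through the environment.
# Terraform maps TF_VAR_my_rds_pass onto the input variable my_rds_pass.
apply_with_rds_pass() {
  if [ -z "${TF_VAR_my_rds_pass:-}" ]; then
    echo "TF_VAR_my_rds_pass is not set" >&2
    return 1
  fi
  export TF_VAR_my_rds_pass
  terraform apply "$@"
}
```

In an interactive bash session, read -r -s TF_VAR_my_rds_pass sets the value without echoing it or leaving it in your shell history.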

Versioning your code

When you first start building terraform code, chances are you create a directory, and some tf files, then do your “$ terraform apply”. When you watch that infra build for the first time, it’s exciting!

After you add more components, your code gets more complex. Hopefully you’ve created a git repo to house your code. You can check & commit the files, so you have them in a safe place. But of course there’s more to the equation than this.

How do you handle multiple environments, dev, stage & production all using the same code?

That’s where modules come in. Now at the beginning you may well have a module that looks like this:

module "all-proj" {

  source = "../"

  myvar = "true"
  myregion = "us-east-1"
  myami = "ami-64300001"
}

Etc and so on. That’s the first step in the right direction, however if you change your source code, all of your environments will now be using that code. They will get it as soon as you do “$ terraform apply” for each. That’s fine, but it doesn’t scale well.

Ultimately you want to manage your code like other software projects. So as you make changes, you’ll want to tag it.

So go ahead and check in your latest changes:

# push your latest changes
$ git push origin master
# now tag it
$ git tag -a v0.1 -m "my latest coolest infra"
# now push the tags
$ git push origin v0.1

Great, now you want to modify your module slightly, as follows:

module "all-proj" {

  source = "git::https://github.com/hullsean/myproj-infra.git?ref=v0.1"

  myvar = "true"
  myregion = "us-east-1"
  myami = "ami-64300001"
}

Cool! Now dev, stage and prod can each reference a different version. So you are free to work on the infra without interrupting stage or prod. When you’re ready to promote that code, check in, tag and update stage.

You could go a step further to be more agile, and have a post-commit hook that triggers the stage terraform apply. This, though, requires you to build solid infra tests. Check out testinfra and terratest.
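
A hook along those lines might look roughly like this. It is a sketch under assumptions: the function would be called from a hypothetical .git/hooks/post-commit script, the stage directory path is yours to supply, and the stage module source is assumed to already point at the tag you want applied.

```shell
# Re-apply the stage environment non-interactively.
# Runs in a subshell so the caller's working directory is untouched.
apply_stage() {
  local stage_dir="$1"
  (
    cd "$stage_dir" || exit 1
    terraform init -input=false &&
    terraform plan -input=false -out=stage.tfplan &&
    terraform apply -input=false stage.tfplan
    # Infra tests would run here before promoting anything further.
  )
}

# Example (hypothetical path): apply_stage "$HOME/infra/stage"
```

Applying the saved plan file means what gets applied is exactly what the plan step showed.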

Managing RDS backups

Amazon’s RDS service is a bit weird. I wrote in the past asking “Is upgrading RDS like a shit-storm that will not end?” Yes, I’ve had my grievances.

My recent discovery is even more serious! Terraform wants to build infra. And it wants to be able to later destroy that infra. In the case of databases, obviously the previous state is one you want to keep. You want that to be perpetual, beyond the infra build. Obvious, no?

Apparently not to the folks at Amazon. When you destroy an RDS instance, by default it destroys the automated backups that went with it. I have no idea why anyone would want this. Certainly not as a default behavior. What’s worse, you can’t simply copy those backups out as files. Why not? They’re probably sitting in S3 anyway!

You can take a final snapshot when you destroy an RDS instance. That’s wonderful and I recommend it. However, that’s not enough. I highly suggest you take matters into your own hands. Build a script that calls pg_dump yourself, and copy those .sql or .dump files to S3 for safekeeping.
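
Such a script can be quite small. Here is a sketch, assuming pg_dump and the AWS CLI are installed and that the database password comes from PGPASSWORD or a .pgpass file; the function name, connection string and bucket name are placeholders.

```shell
# Dump a Postgres database in custom format and copy the file to S3.
# The connection string and bucket name are hypothetical placeholders.
backup_to_s3() {
  local db_url="$1"
  local bucket="$2"
  local stamp dump_file
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  dump_file="$(mktemp)" || return 1
  pg_dump --format=custom --file="$dump_file" "$db_url" &&
    aws s3 cp "$dump_file" "s3://${bucket}/rds-dumps/${stamp}.dump"
  local rc=$?
  # Remove the local copy whether or not the upload worked.
  rm -f "$dump_file"
  return "$rc"
}

# Example: backup_to_s3 "postgres://user@db.example.com/app" my-archive
```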

When to use force_destroy on S3 buckets

As with RDS, when you create S3 buckets with your infra, you want to be able to clean up later. But the trouble is that once you create a bucket, you’ll likely fill it with objects and files.

What happens then is that when you run “$ terraform destroy” it will fail with an error. This makes sense as a default behavior. We don’t want data disappearing without our knowledge.

However, you do want to be able to clean up. So what to do? Two things.

Firstly, create a process, perhaps a lambda job or other bucket replication to regularly sync your s3 bucket to your permanent bucket archive location. Run that every fifteen minutes or as often as you need.
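
If a lambda or replication rule is more than you need, the simplest form of that sync is a scheduled aws s3 sync. This is a sketch only; the function name, bucket names and script path in the example are hypothetical.

```shell
# Copy new and changed objects from a working bucket to an archive bucket.
# Both bucket names are hypothetical placeholders.
archive_bucket() {
  local src="$1"
  local dest="$2"
  aws s3 sync "s3://${src}" "s3://${dest}"
}

# Example crontab entry, every fifteen minutes (script path is hypothetical):
#   */15 * * * * /usr/local/bin/archive-bucket.sh
```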

Then add a force_destroy line to your s3 bucket resource. Here’s an example s3 bucket for storing load balancer logs:

data "aws_elb_service_account" "main" {}

resource "aws_s3_bucket" "lb_logs" {
  count         = "${var.create-logs-bucket ? 1 : 0}"
  force_destroy = "${var.force-destroy-logs-bucket}"
  bucket        = "${var.lb-logs-bucket}"
  acl           = "private"

  policy = <<POLICY
{
  "Id": "Policy",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::${var.lb-logs-bucket}/*",
      "Principal": {
        "AWS": [
          "${data.aws_elb_service_account.main.arn}"
        ]
      }
    }
  ]
}
POLICY

  tags = {
    Environment = "${var.environment_name}"
  }
}

Is there a serious skills shortage around devops space?

As devops adoption picks up pace, the signs are everywhere. Infrastructure as code, once a backwater concept and a hoped-for ideal, has become essential to many startups.

Why might that be?

My theory is that devops enables the business in a lot of profound ways. Sure it means one sysadmin can do much more, manage a fleet of servers, and support a large user base. But it goes much deeper than that.

Being able to stand up your entire dev, qa, or production environment at the click of a button transforms software delivery dramatically. It means it can happen more often, more easily, and with less risk to the business. It means you can do things like blue/green deployments, rolling out features without risk to the production environment running in parallel.

What kind of chops does it take?

Strong generalist skills

For starters you’ll need a pragmatist mindset. Not fanatical about one technology, but open to the many choices available. And as a generalist, you start with a familiarity with a broad spectrum of skills, from coding, troubleshooting & debugging, to performance tuning & integration testing.

Stir into the mix good operating system fundamentals, top to bottom knowledge of Unix & Linux, networking, configuration and more. Maybe you’ve built kernels, compiled packages by hand, or better yet contributed to a few open source projects yourself.

You’ll be comfortable with databases, frontend frameworks, backend technologies & APIs. But that’s not all. You’ll need a broad understanding of cloud technologies, from GCP to AWS. S3, EC2, VPCs, EBS, webservers, caching servers, load balancing, Route53 DNS, serverless lambda. Add to all of that programmable infrastructure through CloudFormation or Terraform.

Competent programmer

Although as a devop you probably won’t be doing frontend dev, you’ll need some cursory understanding of it. You should be competent at Python and perhaps Node.js. Maybe Ruby & bash scripts. You’ll need to understand JSON & YAML, CloudFormation & Terraform if you want to deliver IaC.

Strong sysadmin with ops mindset

These are fundamental. But what does that mean? Ops mindset is born out of necessity. Having seen failures & outages, you prioritize around uptime. A simpler stack means fewer moving parts & less to manage. Do as Martin Weiner would suggest & use boring tech.

But you’ll also need to reason about all these components. That’ll come from dozens of debug & troubleshooting sessions you’ll do through years of practice.

Understand build systems & deployment models

Build systems like CircleCI, Jenkins or Gitlab offer a way to automate code delivery. And as their use becomes more widespread knowing them becomes de rigueur. But it doesn’t end there.

With deployments you’ll have a lot to choose from: from the very simplest, a single-target deploy, to all-at-once, minimum-in-service and rolling upgrades. But if you have completely automated your dev, qa & prod infra buildout, you can dive into blue/green deployments, where you build completely new infra for each deploy, test, then tear down the old.

Personality to communicate across organization

I think if you’ve made it this far you will agree that the technical know-how is a broad spectrum of modern computing expertise. But you’ll also need excellent people skills to put all this into practice.

That’s because devops is also about organizational transformation. Yes devs & ops have to get up to speed on the tech, but the organization has to get on board too. Many entrenched orgs pay lip service to devops, but still do a lot of things manually. This is out of fear as much as it stands as technical debt.

But getting past that requires evangelizing, and advocating. For that a leader in the devops department will need superb people skills. They’ll communicate concepts broadly across the organization to win hearts and minds.

Review – Test Driven Infrastructure with Chef – Stephen Nelson-Smith

In search of a good book on Chef itself, I picked up this new title on O’Reilly.  It’s one of their new format books, small in size, only 75 pages.

There was some very good material in this book.  Mr. Nelson-Smith’s writing style is good, readable, and informative.  The discussion of risks of infrastructure as code was instructive.  With the advent of APIs to build out virtual data centers, the idea of automating every aspect of systems administration, and building infrastructure itself as code is a new one.  So an honest discussion of the risks of such an approach is bold and much needed.  I also liked the introduction to Chef itself, and the discussion of installation.

Chef isn’t really the main focus of this book, unfortunately. The book spends a lot of time introducing us to Agile Development, and specifically test driven development. While these are lofty goals, and the first time I’ve seen treatment of the topic in relation to provisioning cloud infrastructure, I did feel too much time was spent on that.