Why maintenance sometimes a forgotten art?

via GIPHY

Just finished reading the excellent
Why do people neglect maintenance?.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

With a wide ranging discussion, from cultural myths to cognitive biases, there are a lot of reasons why your organization may not be giving maintenance the attention it deserves.

Here are some thoughts…

1. Weighing short & long term tradeoffs

When looking at maintenance problems, we sometimes must weigh easy to implement quick fixes, versus the better though weightier longer term fix.

Often the longer term fix requires more downtime, and so is a harder pill to swallow. So sometimes taking that medicine is put off till later. Weighing how much that could cost you is not easy. But that is often the reality of maintenance and ops teams.

Read: How to hire a developer that doesn’t suck

2. Keeping it sexy

One interesting point they make in the article is around the culture of innovation. Since building new products and changing the world is sexy, some how the day-to-day realities of managing and maintaining that after delivery can take a back seat.

Just as we prioritize creating new features, and building a changed product, we must always weigh the costs of such change. As I wrote in the four letter word dividing dev and ops, different team members have different mandates. And that’s important.

While developers are mandated with bringing new features to life, ops are mandated with keeping things running at 4am. Software updates, maintenance, outages are what the ops team is worried about.

Related: Does Amazon have a dirty little secret?

3. Status symbols – dev versus ops

There is some interesting discussion by Andy Jess & Lee about how these different job roles can be viewed as low or high status in an organization.

In my article why we need techops I mention some of this narrative. Once at a keynote, I heard a sales guy advocate a product that would obviate your needing to hire ops. Yep, more money to hire Devs!

On the flip side at ops and DBA conferences, I’ve heard over and over the story of “some idiot developer that took down our production systems…”

You get it on both sides. When will teams ever work together?

Read: The art of resistance – when you have to be the bad guy

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Bye bye Kubernetes

via GIPHY

James Sanders at TechReplublic interviews Corey Quinn,
Is Kubernetes better for building a resume than a platform?

Corey is the snarky, observant, and talented author behind Last Week in AWS. If you haven’t subscribed you’re definitely missing out!

Join 35,000 others and follow Sean Hull on twitter @hullsean.

The observation about Kubernetes is not new. My friend & colleague over at Smash Company, Lawrence Krubner, wrote an exhaustive piece commenting on the flaws of docker, kubernetes and their eco-system.

It’s October 7th, 2019. I will revisit the question of Kubernetes demise again in six months. First thing in April 2020 we can see where my prediction stands!

1. Senior engineers are asking hard questions

The new generation of engineers, are more open to new solutions. They’re not bogged down by history, and legacy limitations. They’ll try new things, and embrace new technologies if they can solve a problem.

But senior engineers have been bitten. They’ve had the sorry job of untangling over engineered solutions. They’ve seen small firms live and die by wrong choices, and tech stack decisions. They’ve hit the wall on architectures, that were designed without enough perspective and foresight.

Read: What did Matt Ranney discover scaling Uber to 1000 microservices?

2. Trends come come and go

Does anyone out there remember Xen? Probably not. That’s because AWS squashed it like a little ant.

Kubernetes could go the same way. Not least because AWS has their own ECS.

Related: Can humility help you in your career?

3. Complexity is getting out of control

Kubernetes is sure great for doing blue-green deployments. And dockerizing all the things, helps developers and ops teams work together. After all just handing over the container asset, means ops can concentrate on deploying that. And kubernetes is great right up to this point.

But what about when kubernetes itself introduces application problems?

What about when kubernetes introduces performance and scalability problems?

And don’t even get me started about the crazy problems of running a database on Kubernetes.

Read: What happened when I offered advice outside my pay grade?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

When can I count time as project time?

via GIPHY

I was on the train looking at a Monday.com ad. I realized that the project management space is not dead and buried, but continuing to evolve everyday. While many of us spent years using tools like Jira, along comes an upstate to make project tracking simpler.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

The trouble is, I never felt it was the project management software that made tracking difficult.

For me there was always endless nuance, around tasks and tracking.

1. Time spent commuting or thinking off the chair

For me, when I get deep into the weeds of a customer and their technology stack, I’m thinking about problems as soon as I wake up. How can I tag those resources so they are region independent? How can I create the right S3 bucket policy so the application can write, but the world can’t?

What’s more if you’re like a lot of people, there’s some slack going on after hours, and emails too. Some of this may get tracked but inevitably there are hours not tracked.

An amount of estimating will probably happen. And creating a team policy on how to handle this ambiguity, is probably a good plan. Each project and company will be different, in terms of where to draw the line.

Read: What did Matt Ranney discover scaling Uber to 1000 microservices?

2. Crossover between projects or even customers

Sometimes there is a task which requires a bit of research, for example what’s the exact syntax in Terraform to create an IAM role, and attach it to an instance? You may spend an hour digging in, and then experimenting, to make sure you have the code right.

Then you may use that same snippet on another stack that you’re building, for department B and the same company, agile team C, or another customer entirely.

When you think about it, you carry a decade of learning into each new customer you work at. And they get all that learning, which translates into efficiency. And that’s effectively free.

So there is all sorts of ambiguity. And in each case you have to make a judgement call, when you’re tracking time.

Related: Can humility help you in your career?

3. Tasks grow and change

As you continue to work on a project, some tasks have a tendency to themselves unwind. As you dig deeper, you find vast caves yet to be explored. More excavation that needs to be done.

From there subtasks may hatch from the parent, and on it goes. But this will surely blowup initial estimates, and makes tracking progress often more of an art than a science.

Read: What happened when I offered advice outside my pay grade?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

I’m interviewed in a new book Devops Paradox

I had the good fortune to be interviewed earlier in the year by Viktor Farcic. I’m excited to see the book finally released. You can pickup your own copy of Devops Paradox here.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

Our interview is pretty wide ranging, touching on aspects of database administration, cloud operations, automation and devops.

1. Defining Devops

Viktor Farcic: Moving on to a more general subject, how would you define DevOps? I’ve gotten a different answer from every single person I’ve asked.

Sean Hull: I have a lot of opinions about it actually. I wrote an article on my blog a few years ago called The Four-Letter Word Dividing Dev and Ops, with the implication being that the four-letter word might be a swear word, akin to the development team swearing at the operations team, and the operations team swearing at the development team. But the four-letter word I was referring to was “risk.”
To summarize my article, in my view the development and the operations teams of old were separate silos in business, and they had very different mandates. Developers are tasked with writing code to build a product and to answer the needs of the customers, while directly building change into and facilitating a more sophisticated product. So, their thinking from day to day is about change and answering the requirements of the products team.
On the other hand, the operations team’s mandate is stability. It’s, “I don’t want these systems going down at 2:00 a.m.” So, over the long term, the operations teams are thinking about being as conservative as possible and having fewer moving parts, less code, and less new technologies. The simpler your stack is, the more reliable it is and the more robust and less likely it is to fail. I think the traditional reason why developers and operations teams were separated into silos was because of those two very different mandates.
They’re two different ways of prioritizing your work and your priorities when you think about the business and the technology. However, the downside was that those teams didn’t really communicate very well, and they were often at each other’s throats, pushing each other in opposite directions. But to answer your question, “what is DevOps?” I think of DevOps as a cultural movement that has made efforts to allow those teams to communicate better, and that’s a really good thing.

Read: What happened when I offered advice outside my pay grade?

2. How to adapt to change

Viktor Farcic: I have the impression that the speed with which new things are coming is only increasing. How do you keep up with it, and how do companies you work with keep up with all that?

Sean Hull: I don’t think they do keep up. I’ve gone to a lot of companies where they’ve never used serverless. None of their engineers know serverless at all. Lambda, web tasks, and Google Cloud functions have been out for a while, but I think there are very few companies that are able to really take advantage of them. I wrote another article blog post called Is Amazon Web Services Too Complex for Small Dev Teams? where I sort of implied that it is.
I do find a lot of companies want the advantage of on-demand computing, but they really don’t have the in-house expertise yet to really take advantage of all the things that Amazon can do and offer. That’s exactly why people aren’t up to speed on the technology, as it’s just changing so quickly. I’m not sure what the answer is. For me personally, there’s definitely a lot of stuff that I don’t know. I know I’m stronger in Python than I am with Node.js. Some companies have Node.js, and you can write Lambda functions in Java, Node.js, Python, and Go. So, I think Amazon’s investment in new technology allows the platform to evolve faster than a lot of companies are able to really take advantage of it.

Read: What did Matt Ranney discover scaling Uber to 1000 microservices?

3. What does the future hold for Devops?

Viktor Farcic: I’m going to ask you a question now that I hate being asked, so you’re allowed not to answer. Where do you see the future, let’s say a year from now?

Sean Hull:
I see more fragmentation happening across the technology landscape, and I think that that is ultimately making things more fragile because, for example, with microservices, companies don’t think twice about having Ruby, Python, Node.js, and Java. They have 10 different stacks, so when you hire new people, either you have to ask them to learn all those stacks or you have to hire people with each of those individual areas of expertise. The same is true with all these different clouds with their own sets of features: there’s a fragmentation happening.
Let’s look at the iPhone as an example. Think about how complex application testing is for Android versus the iPhone. I mean, you have hundreds of different smartphones that run Android, all with different screen sizes, different hardware, different amounts of memory, and the underlying stuff. Some may even have some extra chips that others don’t have, so how do you test your application across all those different platforms?
When you have fragmentation like that, it means the applications end up not working as well. I think the same thing is happening across the technology spectrum today that happened 10 to 15 years ago, where for your database backend there was Oracle, SQL Server, MySQL, and Postgres. Maybe somebody who’s a DB2 enterprise customer uses DB2, but now there are hundreds of open source databases, graph databases, and DynamoDB versus Cassandra, and so on and so on. There’s no real deep expertise in any of those databases.
What ends up happening is you have cases like what happened with customers who were using MongoDB. They found out the hard way about all of the weird behaviors and performance problems it had, because there just weren’t people around with deep knowledge of what was happening behind the scenes, whereas in Oracle’s space, for example, there are career DBAs that are performance experts that specialize in Oracle internals, so you can hire somebody to solve particular problems in that space.
There aren’t, as far as I know, a lot of people with MongoDB internals expertise. You’d have to call MongoDB themselves; maybe they have a few engineers that they can send out, so what’s the future? I see a lot of fragmentation and complexity, and that makes the internet and internet applications more fragile, more brittle, and more prone to failure.

Related: Can humility help you in your career?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Viktor Farcic Interview excerpts

via GIPHY

I recently did an interview with Viktor Farcic all about operations, DBA & Devops. Here are some excerpts: What does Dev-Ops mean?

Join 35,000 others and follow Sean Hull on twitter @hullsean.

Continuing where I left off, I’ve included a few more highlights below. Enjoy!

1. Can I use a tool to migrate to the public cloud?

Viktor Farcic: I’ve seen quite a few of these tools that tell you if you buy our tool, we’re going to transfer whatever you have to the cloud. For example, Docker announced in the last DockerCon that they’re going to put in containers without a single change and everything will work. What do you think about that?

Sean Hull Salespeople often simplify things quite a bit in order to sell a product; in my experience, the devil is in the detail. It’s not to say that an automation tool like that might not be valuable and useful. It might be a good first step to getting your application in the cloud, and it might be an easier way than to rebuild everything one by one. But I doubt that it’s going to work magically just by one script.
EC2 instances, for example, have different performance characteristics, not only in terms of the disk I/O, memory, and CPU, but in smaller instances, they actually throttle the network access so you might spin up an instance and it just might not behave well. It might take time. In fact, all sorts of things could happen. You might have written MySQL scripts that assume you have root access to the server and then you rebuild that in an RDS and you get errors because you don’t have access to those resources on RDS. There’s a lot of things to consider.

Read: What happened when I offered advice outside my pay grade?

2. How do you adapt to change?

Viktor Farcic: I have the impression that the speed with which new things are coming is only increasing. How do you keep up with it, and how do companies you work with keep up with all that?

Sean Hull: I don’t think they do keep up. I’ve gone to a lot of companies where they’ve never used serverless. None of their engineers know serverless at all. Lambda, web tasks, and Google Cloud functions have been out for a while, but I think there are very few companies that are able to really take advantage of them. I wrote another article blog post called Is Amazon Web Services Too Complex for Small Dev Teams? where I sort of implied that it is.
I do find a lot of companies want the advantage of on-demand computing, but they really don’t have the in-house expertise yet to really take advantage of all the things that Amazon can do and offer. That’s exactly why people aren’t up to speed on the technology, as it’s just changing so quickly. I’m not sure what the answer is. For me personally, there’s definitely a lot of stuff that I don’t know. I know I’m stronger in Python than I am with Node.js. Some companies have Node.js, and you can write Lambda functions in Java, Node.js, Python, and Go. So, I think Amazon’s investment in new technology allows the platform to evolve faster than a lot of companies are able to really take advantage of it.
Read: What did Matt Ranney discover scaling Uber to 1000 microservices?

3. What is the future of Devops?

Viktor Farcic: I’m going to ask you a question now that I hate being asked, so you’re allowed not to answer. Where do you see the future, let’s say a year from now?

Sean Hull:
I see more fragmentation happening across the technology landscape, and I think that that is ultimately making things more fragile because, for example, with microservices, companies don’t think twice about having Ruby, Python, Node.js, and Java. They have 10 different stacks, so when you hire new people, either you have to ask them to learn all those stacks or you have to hire people with each of those individual areas of expertise. The same is true with all these different clouds with their own sets of features: there’s a fragmentation happening.
Let’s look at the iPhone as an example. Think about how complex application testing is for Android versus the iPhone. I mean, you have hundreds of different smartphones that run Android, all with different screen sizes, different hardware, different amounts of memory, and the underlying stuff. Some may even have some extra chips that others don’t have, so how do you test your application across all those different platforms?
When you have fragmentation like that, it means the applications end up not working as well. I think the same thing is happening across the technology spectrum today that happened 10 to 15 years ago, where for your database backend there was Oracle, SQL Server, MySQL, and Postgres. Maybe somebody who’s a DB2 enterprise customer uses DB2, but now there are hundreds of open source databases, graph databases, and DynamoDB versus Cassandra, and so on and so on. There’s no real deep expertise in any of those databases.
What ends up happening is you have cases like what happened with customers who were using MongoDB. They found out the hard way about all of the weird behaviors and performance problems it had, because there just weren’t people around with deep knowledge of what was happening behind the scenes, whereas in Oracle’s space, for example, there are career DBAs that are performance experts that specialize in Oracle internals, so you can hire somebody to solve particular problems in that space.
There aren’t, as far as I know, a lot of people with MongoDB internals expertise. You’d have to call MongoDB themselves; maybe they have a few engineers that they can send out, so what’s the future? I see a lot of fragmentation and complexity, and that makes the internet and internet applications more fragile, more brittle, and more prone to failure.

Related: Can humility help you in your career?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What does DEV OPS mean?

via GIPHY

I was recently interviewed by Victor Farcic. He asked me a lot of interesting questions.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

One really struck me that I thought was important, about the whole devops movement.

How do you define devops

Viktor Farcic: Moving on to a more general subject, how would you define DevOps? I’ve gotten a different answer from every single person I’ve asked.

Sean Hull: I have a lot of opinions about it actually. I wrote an article on my blog a few years ago called The Four-Letter Word Dividing Dev and Ops, with the implication being that the four-letter word might be a swear word, akin to the development team swearing at the operations team, and the operations team swearing at the development team. But the four-letter word I was referring to was “risk.”

Related: Why generalists are better at scaling the web

To summarize my article, in my view the development and the operations teams of old were separate silos in business, and they had very different mandates. Developers are tasked with writing code to build a product and to answer the needs of the customers, while directly building change into and facilitating a more sophisticated product. So, their thinking from day to day is about change and answering the requirements of the products team.
On the other hand, the operations team’s mandate is stability. It’s, “I don’t want these systems going down at 2:00 a.m.” So, over the long term, the operations teams are thinking about being as conservative as possible and having fewer moving parts, less code, and less new technologies. The simpler your stack is, the more reliable it is and the more robust and less likely it is to fail. I think the traditional reason why developers and operations teams were separated into silos was because of those two very different mandates.

Also: Walking the delicate balance of transparency

They’re two different ways of prioritizing your work and your priorities when you think about the business and the technology. However, the downside was that those teams didn’t really communicate very well, and they were often at each other’s throats, pushing each other in opposite directions. But to answer your question, “what is DevOps?” I think of DevOps as a cultural movement that has made efforts to allow those teams to communicate better, and that’s a really good thing.

Read: One time in 2013 I had to take the fall

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Should we right size instances in the public cloud?

via GIPHY

I’m a big fan of Corey Quinn’s Last Week in AWS newsletter.

Recently he wrote a piece titled Right Sizing your instances is nonesense

Not to be outdone, a blogger Joe at Sunshower.io wrote a counterpoint piece Why Right Sizing instances is not nonsense.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

So what’s the verdict here? Is Corey wrong and Joe right? Or is Joe right and Corey wrong?

I would argue it depends. Corey’s piece emphasizes the big picture, essentially that technical changes can buy more trouble than the money they save. While

1. Corey emphasizes the 300 foot view

Corey’s article uses a broad brush, and in doing so some of his specifics are incorrect. For example the point about older versions of Operating Systems and hypervisor. In most cases your OS won’t know it’s running on a hypbervisor at all, so this seems a very rare edge case indeed.

That said his big picture conclusion seems spot on. Changing instance sizes can be a huge risk if you don’t do so regularly. With legacy apps, who knows how they might behave. In this you carry the same burden as changing instances at all. Your AMI may not be ready, or you may have had some manual steps to build the box again.

Related: How can we keep cloud architectures simple

2. Joe gets down in the weeds of technical specifics

The first thing about Joe’s post that caught my attention was to use the console to change the instance size. You shouldn’t be manually changing instances through the console to start with. What, everything is code, all the way down? Hehe…

Also his points about cost savings seemed cherry picked. If you can save that much money by changing instance sizes, it is well worth the cost to do regression, integration & disaster recovery tests to make sure it will all work. Get on it!

Also: What hidden things does a deposit reveal?

3. Use infrastructure as code, test & retest it

IAC is really the way to go. But that still doesn’t mean you’re out of the woods. Be cautious when changing instance sizes!

With code, there is testing. So change your Terraform code and then verify it works. Now just because you have a variable indicating the instance size, doesn’t mean changing it won’t break something.

Read: Can communication mixups sour an engagement?

4. Weigh the cost savings with the risk of breaking things

Changing an instance size and redeploying could break all manner of things. It’s possible you used a variable for the instance size in some places and hard coded it in others. Or made some weird reference in an autoscaling group.

It may be the AMI you’ve built works on one type of instance but not another. Or that your AMI is deployed in one region but not another. Or that your old instance size is available in us-east-2, but your new instance size is not yet available there. Yes the console wouldn’t have offered it, but your Terraform code didn’t know.

Check out: How I use 5 daily habits to stay on track

5. Put up some guiderails

As you’ll see further down in the comments, Joe suggests to put limits around instance size changes. That makes sense. After you’ve done testing, you’ll have an idea of what these limits should be. No instance size less than 30GB memory? No instance size greater than X. None in region Y. Etc.

It’ll require tweaking your terraform infrastructure code, so it’s not a free change. But it’ll pay dividends if your costs savings are in the thousands.

Also: Can daily notes help you work better with clients?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Are you as good as the public cloud?

via GIPHY

According to Lyft’s recent public filing, they plan to spend 300 million buckaroos in the next 2.5 years on AWS.

Did I hear that right?

Join 38,000 others and follow Sean Hull on twitter @hullsean.

Perhaps that is their estimate, or the maximum amount they want to budget for. Regardless that’s a lot of money any way you slice it. A lot of folks are commenting about how crazy that is, and how much datacenter you could build yourself with that much money.

What do you think? Is it foolhardy? Or is there a hidden wisdom here?

Here’s my take.

1. Do you have one million customers testing your datacenter?

If you’re comparing the cost of the cloud to the raw numbers of running your own datacenter, the hardware costs are not enough. You’ll need to include the ops teams & other engineers. Right, you probably guessed that.

But did you factor in the costs of a legion of testers. This is the hidden cost that commercial software carries, even while open source software gets this benefit for free.

With a public cloud like AWS you have millions of customers testing the product everyday, and running into edge cases long before you do. So you get a better service, that’s more reliable, all invisibly for free.

Related: How can we keep cloud architectures simple

2. Do you have 66 datacenters spread across 21 regions and a free network between them?

Anybody who was building web applications in the year 2000 will remember how websites didn’t load the same for different customers. Depending on where in the world they were located, they could experience a very different user experience.

These days we assume that we can be global from day one. But how exactly do we achieve this? Remember with a public cloud, you’re getting tons of things for free, without knowing it. Moving data between AZs or regions? That’s all going across a private interconnect.

And that’s not even including the 180 nodes inside cloudfront that give you a global CDN footprint too!

Also: What hidden things does a deposit reveal?

3. Do you have an engineering team automating away job roles?

I remember the days of DBA job role, do you? Probably not. I specialized in this for years, and there were tons of companies hiring me to help them with it. First Oracle, then MySQL, then Postgres.

Then along came Amazon RDS. Guess what, companies don’t really hire for that role anymore. They do need help with it from time to time, but not as a primary specialization.

What do I mean? Well by hosting your application on AWS, you’re benefiting from the work of teams of engineers in different departments, all expanding on APIs and automating things that those one million customers are asking for.

You’re not going to be able to innovate that well and that quickly in your own datacenter. So you’ll pay more!

Read: Can communication mixups sour an engagement?

4. Do you have APIs that tons of engineers have already written code for?

A quick peek at Terraform’s community modules on Github and you’ll probably blush. From VPCs to bastion boxes, key management to load balancers, lots of code has been written and open sourced.

By deploying on a platform that a lot of other devs are using, you’ll benefit from all this open source code. That means you won’t have to write that stuff yourself.

Sure you’ll have integration work to do, but the hidden benefit of being on a popular platform saves you money.

Check out: How I use 5 daily habits to stay on track

5. Can you do disaster recovery for free?

If you build your own datacenter, you have to buy all your capacity. So there are no spare servers sitting around waiting for your use. In the public cloud there is always spare capacity.

What that means is you can write automation code to spinup copies of your application stack in alternate regions, at the push of a button. Thus you effectively get disaster recovery for free!

Also: Can daily notes help you work better with clients?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How can we keep cloud architectures simple?

via GIPHY

I was reading hacker news, as I often do. And I found David Futcher’s post You Don’t Need all That Complex/Expensive/Distracting Infrastructure..

Of course it caught my attention. You may be surprised by the reasons

Join 38,000 others and follow Sean Hull on twitter @hullsean.

One quote that should raise your eyebrows…

I’ve seen the idea that every minute spent on infrastructure is a minute less spent shipping features

Here’s what I think…

1. Performance tuning is often about removing things

That sounds strange right? How can performance tuning be about removing things?

Here are a few examples:

o removing results: When you add an index you remove data, returning just the pieces you need.
o removing lag time: When you remove time, you get faster response. This cascades through your entire application, allowing more requests to get handled in a fixed amount of time. On AWS you get allocated a faster NIC when you use a larger instance size. It’s automatic, though somewhat invisible.
o removing data: By trimming tables, access speeds go up. Reads are faster when you hit the whole table, because there’s fewer records to sift through. Writes are faster because you are maintaining smaller associated indexes.
o removing codepaths: By having fewer libraries, and layers between your application, and the data it retrieves, you have less overhead. And that translates to quicker response time too.
o removing databases: If you’re fully microservices, you have a database behind every service. This means your service sometimes proxies just to get at data that has been decoupled. By consolidating databases to a shared db model, you reduce this cross-traffic dramatically.

Related: When you have to take the fall

2. Are we just building what everyone else does?

In technology as with any other industry, following the big trends is safe. If you’re building an architecture that is used by Facebook, Amazon, Apple, Netflix & Google are using, you’re on the best path, right? Certainly few would criticise their success. So yes it is safe. Even if it fails.

Going with a much simpler architecture, that has even a whiff of so-called legacy, may seem like bucking the trend. But fewer moving parts means less to break, less to manage, and less to tune.

Related: Why generalists are better at scaling the web

3. Customers don’t care

Remember, customers aren’t devops gurus nor do they care about Rust versus Swift versus Elixir. What they care about is they can comment on their social media app or order your widget. They want your product to work.

They don’t care if it is hosted in the cloud, or at a managed datacenter. They probably don’t even care about tiny short outages either. What they do care about is that it works, and works well. And fast.

If your infrastructure allows you to be responsive to customers, roll out new product features & updates, you’re going to have some happy customers. The end!

Related: Why i ask for a deposit

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Before you do infrastructure as code, consider your workflow carefully

via GIPHY

What happens with infrastructure as code when you want to make a change on prod?

I’ve been working on automation for a few years now. When you build your cloud infrastructure with code, you really take everything to a whole new level.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

You might be wondering, what’s the day to day workflow look like? It’s not terribly different from regular software development. You create a branch, write some code, test it, and commit it. But there are some differences.

1. Make a change on production

In this scenario, I have development branch reference the repo directly on my laptop. There is a module, but it references locally like this:

source = “../”

So here are the steps:

1. Make your change to terraform in main repo

This happens in the root source directory. Make your changes to .tf files, and save them.

2. Apply change on dev environment

You haven’t committed any changes to the git repo yet. You want to test them. Make sure there are no syntax errors, and they actually build the cloud resources you expect.

$ terraform plan
$ terraform apply

fix errors, etc.

3. Redeploy containers

If you’re using ECS, your code above may have changed a task definition, or other resources. You may need to update the service. This will force the containers to redeploy fresh with any updates from ECR etc.

4. Eyeball test

You’ll need to ssh to the ecs-host and attach to containers. There you can review env, or verify that docker ps shows your new containers are running.

5. commit changes to version control

Ok, now you’re happy with the changes on dev. Things seem to work, what next? You’ll want to commit your changes to the git repo:

$ git commit -am “added some variables to the application task definition”

6. Now tag your code – we’ll use v1.5

$ git tag -a v1.5 -m “added variables to app task definition”
$ git push origin v1.5

Be sure to push the tag (step 2 above)

7. Update stage terraform module to use v1.5

In your stage main.tf where your stage module definition is, change the source line:

source = “git::https://github.com/hullsean/infra-repo.git?ref=v1.5”

8. Apply changes to stage

$ terraform init
$ terraform plan
$ terraform apply

Note that you have to do terraform init this time. That’s because you are using a new version of your code. So terraform has to go and fetch the whole thing, and cache it in .terraform directory.

9. apply change on prod

Redo steps 7 & 8 for your prod-module main.tf.

Related: When you have to take the fall

2. Pros of infrastructure code

o very professional pipeline
o pipeline can be further automated with tests
o very safe changes on prod
o infra changes managed carefully in version control
o you can back out changes, or see how you got here
o you can audit what has happened historically

Related: When clients don’t pay

3. Cons of over automating

o no easy way to sidestep
o manual changes will break everything
o you have to have a strong knowledge of Terraform
o you need a strong in-depth knowledge of AWS
o the whole team has to be on-board with automation
o you can’t just go in and tweak things

Related: Why i ask for a deposit

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters