Are you as good as the public cloud?

via GIPHY

According to Lyft’s recent public filing, they plan to spend 300 million buckaroos in the next 2.5 years on AWS.

Did I hear that right?

Join 38,000 others and follow Sean Hull on twitter @hullsean.

Perhaps that is their estimate, or the maximum amount they want to budget for. Regardless that’s a lot of money any way you slice it. A lot of folks are commenting about how crazy that is, and how much datacenter you could build yourself with that much money.

What do you think? Is it foolhardy? Or is there a hidden wisdom here?

Here’s my take.

1. Do you have one million customers testing your datacenter?

If you’re comparing the cost of the cloud to the raw numbers of running your own datacenter, the hardware costs are not enough. You’ll need to include the ops teams & other engineers. Right, you probably guessed that.

But did you factor in the costs of a legion of testers. This is the hidden cost that commercial software carries, even while open source software gets this benefit for free.

With a public cloud like AWS you have millions of customers testing the product everyday, and running into edge cases long before you do. So you get a better service, that’s more reliable, all invisibly for free.

Related: How can we keep cloud architectures simple

2. Do you have 66 datacenters spread across 21 regions and a free network between them?

Anybody who was building web applications in the year 2000 will remember how websites didn’t load the same for different customers. Depending on where in the world they were located, they could experience a very different user experience.

These days we assume that we can be global from day one. But how exactly do we achieve this? Remember with a public cloud, you’re getting tons of things for free, without knowing it. Moving data between AZs or regions? That’s all going across a private interconnect.

And that’s not even including the 180 nodes inside cloudfront that give you a global CDN footprint too!

Also: What hidden things does a deposit reveal?

3. Do you have an engineering team automating away job roles?

I remember the days of DBA job role, do you? Probably not. I specialized in this for years, and there were tons of companies hiring me to help them with it. First Oracle, then MySQL, then Postgres.

Then along came Amazon RDS. Guess what, companies don’t really hire for that role anymore. They do need help with it from time to time, but not as a primary specialization.

What do I mean? Well by hosting your application on AWS, you’re benefiting from the work of teams of engineers in different departments, all expanding on APIs and automating things that those one million customers are asking for.

You’re not going to be able to innovate that well and that quickly in your own datacenter. So you’ll pay more!

Read: Can communication mixups sour an engagement?

4. Do you have APIs that tons of engineers have already written code for?

A quick peek at Terraform’s community modules on Github and you’ll probably blush. From VPCs to bastion boxes, key management to load balancers, lots of code has been written and open sourced.

By deploying on a platform that a lot of other devs are using, you’ll benefit from all this open source code. That means you won’t have to write that stuff yourself.

Sure you’ll have integration work to do, but the hidden benefit of being on a popular platform saves you money.

Check out: How I use 5 daily habits to stay on track

5. Can you do disaster recovery for free?

If you build your own datacenter, you have to buy all your capacity. So there are no spare servers sitting around waiting for your use. In the public cloud there is always spare capacity.

What that means is you can write automation code to spinup copies of your application stack in alternate regions, at the push of a button. Thus you effectively get disaster recovery for free!

Also: Can daily notes help you work better with clients?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How can communication mixups sour an enagement?

via GIPHY

I recently had some communications mixups with a customer. It reminded me how delicate, communications are between customers & vendors. What’s more they can be challenging between developers & managers. It highlighted for me these challenges, and the strategies I’ve learned over the years.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

While I didn’t lose the project, the initial misunderstandings continued to eclipse the project, long after they were cleared up.

1. First a missed conference call

Early on, we setup a call to discuss the challenges. The time of the conference call had been agreed to, but somehow it didn’t make it into my calendar. So when the appointed day & time came, i missed the call. This was before any contract was signed, or even the engagement had gotten started.

Needless to say this is a very delicate moment, as everything we do sets precedents about our personality and working style.

While we were able to reschedule, it added some initial strain to the relationship. As you’ll see that compounded more later.

Related: Walking the delicate balance of transparency

2. Next arriving late to the kickoff meeting

I always pride myself on timeliness. I think it communicates all sorts of things to customers. First it shows you’re serious and will manage the project carefully. Next it shows you respect for others time.

As usual, I left plenty of extra time, so I would arrive well before the meeting. Arriving at the building 20 minutes early, I searched but could not find the entrance. Neither could google as it turns out. Strange I thought, what could be wrong? I walked into the building where the address should be, and asked the doorman. He explained that the company didn’t reside there. Perhaps they’re not located at Park Avenue, but rather Park Avenue South, he suggested. And then the lightbulb goes off. Of course!

Realizing I now have 5 minutes to arrive on time, I’m going to be late. So I attempt to call the manager leading the meeting. I get his voicemail, and leave a message. I then jump in a taxi, and head to the Park Avenue South address. Arriving 10 minutes late, I quickly head upstairs. I’m greeted by some grumbling, and frustrated looks.

Despite this being an understandable mistake, it comes on the heels of another mixup. So now I’ve set a precedent of lateness. Despite being a timely person, it’s hard to erase the stamp that is there now.

We continued to have strained relations through the engagement. While it did finish to completion, I believe it would have gotten extended were I not to have stumbled early on.

Also: When you have to take the fall

3. What can a mixup indicate?

There are many questions it may raise. Possible ones include:

o Is candidate too busy with other tasks?
o Is the person forgetful?
o Is one party bullying on their perspective?
o Is there finger pointing & blame game in the org?
o What is the culture of the organization?
o Is it one of understanding & working together or blame game?
o Is the person uninterested?
o Is the project not a priority?
o Is the company disorganized
o Is miscommunication endemic?

Some of these thoughts may bubble up consciously, and some may linger as a bad taste in your mouth. Regardless, they should be faced head on, with understanding and humility on both sides.

Read: Why i ask for a deposit

4. The weight of first impressions

Inevitably, when there is a mixup, of lateness or missed meeting, there is a technical explanation. In my story above, the *reason* is Park Avenue and Park Avenue South are completely different addresses.

o First impressions are KEY

Even with a reasonable explanation, there is a reaction that is felt.

o There is a visceral emotional reaction we all have anyway

Such a reaction is easy to cause, but hard to patch up. It will take time, and multiple interactions to set a new impression to people.

o Reactions can be incorrect & irrational sometimes
o They can color further interactions

With time impressions can be adjusted, but it takes much more work after an initial mistake.

Check out: How to hire a developer that doesn’t suck

5. Possible solutions

While there is no sure fire way to avoid mixups like these, there are some things that can work in your favor.

o maintain flexibility

That means accepting blame, and mutual responsibility in reaching the goal posts.

o maintain a sense of I *can* be wrong

Everyone can be wrong, and everyone makes mistakes. So don’t try to avoid blame. That said emphasize that everyone must work together. On communicating engagement details, on mutual agreed times, and time zones.

o look for a sense of we *can* be wrong

I think these types of mixups can also be beneficial. For they underscore the customers management style. Do they point fingers, or acknowledge reasonable mistakes. Both parties will make mistakes eventually, and understanding of this builds good faith down the road.

o “let’s work together to improve communication”

Framing the mixup as a shared problem is important. Although the address mixup above is technically my fault, it’s probably a common one. Park Avenue South confuses everyone in New York. So an understanding customer might offer to share a bit in this with you.

o hold frame of mutual responsibility and working together using the word “we”

The frame is key. It’s not *all* your fault, nor is it the customers if they mixup. We all need to be understanding, to a point.

Also: Can daily notes help you work better with clients?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What hidden things does a deposit reveal?

via GIPHY

I like this idea of how integration tests in software development show you that everything is working and connected together properly.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

I think it’s interesting to consider how a deposit may serve a similar function across the financial space & contractual space.

1. Alignment across business units

In really small organizations, everyone is in tight communication. Finance knows what engineering is doing. In medium to large organizations, there can be a disconnect. Engineering may be 100% ready to start today, but finance is not ready. In some cases finance may not even know a consultant is being hired. Each case is different.

Some CTOs get this right away, and are already ahead of the request. While others might ask, “Well we’re ready to get going today, do you really need the deposit first? Because that might take some time.”

My thinking is, yes the engineering department is ready, but the organization is *not* completely ready. And it’s better that there be alignment across the organization. Ironing out that alignment, helps avoid other problems later on.

Related: When you have to take the fall

2. Organization or disorganization

Sometimes there is complete alignment, the contract is already ready, and the whole org really is ready to go. In other cases there can be some disfunction. For instance the lawyers have a lot of hoops that want us to jump through, in terms of a contract.

In other cases finance may only cut checks on a certain day of the month, or only pay 30 days after receiving an invoice. There are a lot of different policies. By insisting that we receive a deposit, however small, we iron out these things early.

If the engineering manager or CTO hiring you promises one thing, but finance has a policy against that, you’ll want to know early to avoid misunderstandings.

Related: Why generalists are better at scaling the web

3. Trust

The amount of a deposit is really irrelevant. It’s all about getting ducks in a row. Both in terms of what may be required of you the vendor, and what the company’s policies may be when onboarding consultants.

By ironing out these issues early, the customer is showing some faith in you as a vendor. They want you in particular, and will do what they need to, to make it work.

Related: Is AGILE right for fixing performance issues?

4. We want you to rush, but we don’t

I’ve encountered many cases where engineering was “ready” but finance was not. It’s tough. From the perspective of the CTO it may be a moot point to get stuck on.

My thought is to hold the frame of two organizations working together. When the organization has alignment that hiring this engineering resource is a priority, it will get things done that it needs to.

Related: How to hire a developer that doesn’t suck

5. Stress tests or organizational integration tests

In software testing, we have something called an integration test. It might be confirming that a login works, or a certain page can load. Behind the scenes that test requires the database to be running, the queuing system to work, an API call to return successfully, and so on. A lot of moving parts all have to be working for that test to succeed.

In a very real way, a deposit is the financial equivalent of an integration test. It confirms that we’re all aligned in the ways we need to, and are ready to get started.

Related: How do I migrate my skills to the cloud?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Walking the delicate balance of transparency

via GIPHY

I’ve written before about How I use progress reports to stay on track.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

I think it’s an interesting topic, and an important one.

While I do believe transparency is important when working with clients, that doesn’t mean it’s easy.

1. I start with daily notes

As I mentioned above I think they’re important. They provide visibility, improve trust, and keep me on track. They also help me remember what was happening on particular days. They’re like breadcrumbs on the path to building solutions.

Related: How to hire a developer that doesn’t suck

2. Notes can highlight organizational dysfunction

Often in my notes, there are details of who I coordinate to get what done. Perhaps I need credentials to reach a particular server. But to get those, I need an email address. And to get that, someone in department X must set that up. And there are delays with that process.

Those delays can cascade through the onboarding process, frustrating everyone. Although the operations team is read and raring to go, the finance or legal team is not quite ready, and there are delays there. Or there are hiccups in some other frequent business process.

Related: Why generalists are better at scaling the web

3. Notes can highlight task complexity

Sometimes I hear the phrase “That should be simple to do”. Only to find the devil buried in the details. As we put boots on the ground, we find there are many dependent tasks that are not finished. So those must be completed first.

In this case I think complexity of notes is a real triumph. For CTOs that are more management oriented, they may not have day-to-day understanding of coding complexity. And that’s ok. But when that complexity is laid out in all it’s gory detail it can be a real educational experience.

Related: How do I migrate my skills to the cloud?

4. For some CTOs high level is better

For some CTOs, they don’t want to slog through endless notes about setting up credentials, or problems with permissions of keys on server X or Y.

While in these cases I still collect the detail, I may also add some high level bullet points, that focus on what all these underlying parts are in service of.

Related: When you have to take the fall

5. Be prepared for archeological surprises

Inevitably there will be surprises. Whether department X does not know what department Y is doing. Or whether setting up an aws account takes two days, instead of two hours. Be prepared.

Inevitably I find these all help communication. And since I’ve been keeping them, I’ve never had a customer balk at an invoice. Notes don’t lie!

Related: Why i ask for a deposit

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How do you handle the onboarding at a new engagement?

via GIPHY

Jumping into the fray at a new firm is never easy. You’ll have new people’s names to remember, new web dashboards to login to, to bookmark, etc. New passwords to remember, new workflows to learn.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

While fulltime folks typically onboard logins in a week, and don’t contribute code for a month or more, consultant engagements mean hitting the ground running.

Here’s what I try to manage, when first diving in.

1. Deposit & agreement

When I start at a new engagement, I require a deposit. There are a lot of moving parts to that happening. In engineering speak, it acts like an integration test across your entire organization. All the departments must be aligned. Legal with the agreement language. Finance with the banking details, and invoice. CTO or manager with a clear picture of scope of work.

In getting past that first hurdle, both parties, will express their working style. And usually there are compromises that must be made on both sides. But the effort each one makes is essential to a strong and equitable relationship that you’re both working to build.

Related: How to hire a developer that doesn’t suck

2. Over communicate

Sometimes your teammate doesn’t know you’re also working to get things over to legal. And legal doesn’t know you’re working with finance. And finance doesn’t know you’re trying to tune a database. And the network admin doesn’t know your email address isn’t setup.

When in down over communicate. Don’t be afraid to repeat in an email what you thought you’d communicated clearly on slack. Sometimes slack messages are missed, as there are so many that get thrown around. It’s easy to miss a notification.

When in down, communicate again. Ask for clarification. Ask if there is anything someone may be waiting on.

Related: Why generalists are better at scaling the web

3. Keep daily notes

I’m a big fan of providing daily progress reports. There is a hell of a lot of detail buried in most tasks, and much of that gets lost in the shuffle.

Putting together your own notes of what your day looked like can help management understand that complexity. It can also help communicate where the organization is getting stuck. Sometimes surprises here can help unblock the org in other ways.

Related: Why i ask for a deposit

4. Beware the Slack rabbit hole

Slack can at times be a blessing, allowing you to reach someone immediately, but also sometimes be a curse. Have I seen every notification? Does the person who posted a note *assume* that I saw it? Which thread was that detail posted in anyway?

I personally like to repeat a lot of communications in email. From a consulting perspective this is also essential as it provides me a paper trail of what conversations we had. Remember once an engagement is completed, you lose the entire Slack message thread. That’s not true of email.

Related: When you have to take the fall

5. Anticipate login issues

Typically at the start of an engagement there is an email setup, and other authentication hangs off of that one. AWS confirms via email, or perhaps there is an SSO solution like OKTA. Inevitably, these interconnected pieces take time to setup. And one will hit a snag slowing down your over all onboarding.

Expect hiccups and challenges in this process. It’s normal for it to take some days. Imagine that FT hires typically onboard in a week, and don’t contribute code for a month or more. So keep everything in perspective on these points.

Related: How do I migrate my skills to the cloud?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How I use 5 daily habits to help me stay on track

via GIPHY

Join 38,000 others and follow Sean Hull on twitter @hullsean.

1. Keep a tight todo list

I shoot for five tasks on my todo list. Be sure they are small, 15-30 minute tasks because things have a way of ballooning. If what you’re doing takes longer, break it down into smaller pieces. This keeps you moving, and always making progress.

You might be tempted to have more items. But chances are you’ll spend an hour on emails and time on phone calls, and other distractions. And there will be preemptive tasks that suddenly require your attention. So keeping this list small, allows you to hit close to 100% success.

Sure there will be days when you’re *more* productive. It doesn’t hurt to pull some items off the long term list. 🙂

Related: When you have to take the fall

2. Zero inbox

I’m relentless about this. Terse replies, stay focused, and remember the reward you’ll give yourself when you finish your day.

Related: Why generalists are better at scaling the web

3. Take a break every hour or two

Smokers have an easy time with this. And perhaps coffee drinkers. If you’re anyone else, you may get into the habit of staying in your chair. Don’t. Regular breaks promote creative thinking, and physically moving helps get the mind in motion too.

Sometimes when I work in a coffeeshop I don’t bring my charger. That way I’m forced to take a break when the battery runs low.

Related: Why i ask for a deposit

4. Reward yourself

Pat yourself on the back when you complete all your tasks. If it’s 4pm, so be it. Jet a bit early. You know there will be other days when you’re working until 8pm too. Promise yourself something when you finish. A treat, or a stroll through the park, or an extra ten minutes to walk your dog, or a frosty IPA. Whatever it is, rewards help remind is we’ve done well.

Related: How to hire a developer that doesn’t suck

5. Always be networking

If you’re in a FT role, you may do most of your socializing with coworkers. That’s fine, but be sure to go to some regular meetups too. And followup with people. Maybe even give a few talks now and then. Networking is the most surefire way to build your career and always be growing. And it’s a little bit each day that it takes to build lasting momentum.

Related: How do I migrate my skills to the cloud?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How can we keep cloud architectures simple?

via GIPHY

I was reading hacker news, as I often do. And I found David Futcher’s post You Don’t Need all That Complex/Expensive/Distracting Infrastructure..

Of course it caught my attention. You may be surprised by the reasons

Join 38,000 others and follow Sean Hull on twitter @hullsean.

One quote that should raise your eyebrows…

I’ve seen the idea that every minute spent on infrastructure is a minute less spent shipping features

Here’s what I think…

1. Performance tuning is often about removing things

That sounds strange right? How can performance tuning be about removing things?

Here are a few examples:

o removing results: When you add an index you remove data, returning just the pieces you need.
o removing lag time: When you remove time, you get faster response. This cascades through your entire application, allowing more requests to get handled in a fixed amount of time. On AWS you get allocated a faster NIC when you use a larger instance size. It’s automatic, though somewhat invisible.
o removing data: By trimming tables, access speeds go up. Reads are faster when you hit the whole table, because there’s fewer records to sift through. Writes are faster because you are maintaining smaller associated indexes.
o removing codepaths: By having fewer libraries, and layers between your application, and the data it retrieves, you have less overhead. And that translates to quicker response time too.
o removing databases: If you’re fully microservices, you have a database behind every service. This means your service sometimes proxies just to get at data that has been decoupled. By consolidating databases to a shared db model, you reduce this cross-traffic dramatically.

Related: When you have to take the fall

2. Are we just building what everyone else does?

In technology as with any other industry, following the big trends is safe. If you’re building an architecture that is used by Facebook, Amazon, Apple, Netflix & Google are using, you’re on the best path, right? Certainly few would criticise their success. So yes it is safe. Even if it fails.

Going with a much simpler architecture, that has even a whiff of so-called legacy, may seem like bucking the trend. But fewer moving parts means less to break, less to manage, and less to tune.

Related: Why generalists are better at scaling the web

3. Customers don’t care

Remember, customers aren’t devops gurus nor do they care about Rust versus Swift versus Elixir. What they care about is they can comment on their social media app or order your widget. They want your product to work.

They don’t care if it is hosted in the cloud, or at a managed datacenter. They probably don’t even care about tiny short outages either. What they do care about is that it works, and works well. And fast.

If your infrastructure allows you to be responsive to customers, roll out new product features & updates, you’re going to have some happy customers. The end!

Related: Why i ask for a deposit

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Are shared databases back in vogue?

via GIPHY

I just stumbled upon this article by Roman Krivtsov on YC News Is shared database in microservices actually anti-pattern?. As a seasoned DBA in another life, I was intrigued by the title.

I devoured the piece.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

One quote at the end really sums things up…

In the very beginning you probably don’t need microservices. Start with monolith and see, if you really need them in future.

Here’s my takeaway from it.

1. DB access by proxy

As the database itself is a service, does it make sense to use a microservice to essentially front the database? By doing that we simply add a layer of abstraction, need to keep the API up to date, and eat the network and compute expense of interacting with that data by proxy.

Related: When you have to take the fall

2. Changing db schema or service API

In a traditional database, when we update a schema, we can keep the old columns around, and simply add new. Thus we are backward compatible.

With the one db per microservice model, we must update the API everytime we add and change schema. This requires a lot more coding, and maintenance. It also means more nuance to remain backward compatible.

Related: Why generalists are better at scaling the web

3. Consistency of data on restore

This is one factor that is often forgotten. Suppose you have an orders service and users service. The Users db behind the users service fails, and must be restored. When the db is restored, a new user that happened just before failure, is lost. However, the Orders service still has a record which references that lost user. What then?

From there we would need a cleanup routine that would go around and remove inconsistent child records after failure. Alternatively we would need a way to backup all dbs from all microservices in a consistent point in time manner. *NOT* an easy task.

Shared database solves these problems in an elegant way.

Related: Why i ask for a deposit

4. Improving performance

Allowing access to orders & users tables in the same db call means eliminating all those slow API calls, associated network congestion and more. It centralizes that, allowing you to do SQL joins. Here the database does the heavy lifting of slicing and dicing, and returning only the packets of data you need.

Related: How progress reports can help engagements succeed

5. Should we bring back db admin job role?

When we centralize our database, we also centralize responsibility. There too, we return to the old debate of ops versus devs. I wrote an article years back titled the four-letter word dividing dev and ops.

When we have a job role for the database management, they have a mandate. Ensure backup & recovery, consistency & performance. Watch things. Monitor. Provide care & feeding. Put fences around applications, and constraints on data going in and out.

All these things are good things. And just like you want a building inspector to be different from the building developer, so too you want those separation of job roles in the software arena.

Related: How to hire a developer that doesn’t suck

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What was the best decision you made in your career?

via GIPHY

I was recently asked this question by a colleague. I thought a little bit about it for a moment. The answer was quite clear.

For me the answer is easy. Going indedepent has been the best decision of my career.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

Starting at the birth of the internet explosion, mid-nineties when mozilla became real. The dot-com era took off, and so did the demand for engineering talent.

1. Going independent

For me I had just moved to New York. So timing was right. I had experience running my own business, in my teen years. That streak of independence drove me to do the same with my technology skills. Call it a hunger. A need to go it alone, make my own way in the world.

Related: When you have to take the fall

2. Self directed career

The advantages of going it alone are a double edge sword. On the one hand you can steer towards projects you find interesting. And upgrade your skills in those directions. The downside is you’re taking on all the risk. If you’re wrong about the direction of the industry, you’ll have wasted your time, money, and resources.

I wrote previously about that in Why do people leave consulting. It’s one reason among many.

Related: When clients don’t pay

3. Wide ranging exposure

For many in the traditional FT career track, you may work for 5-10 companies in the course of 20 years. In my case I’ve worked for close to 200 firms in that time.

In that process, you get exposure. To human problems & challenges, to product design & development problems, and architectural issues. And at that scale, patterns begin to emerge, as you see certain types of issues repeat themselves. This becomes valuable insight.

Related: Why i ask for a deposit

4. Build survival skills

As I mentioned previously, independence is a double edged sword. You build survival skills. But you need them. There’s no net beneath you, protecting you from falling. So you’re forced to make hard decisions about how you spend your time, finding projects, networking, learning new skills, and delivering in a real way to your customers.

The dividend is that now you have survival skills. And those indeed are very valuable.

Related: Why i ask for a deposit

5. Good money

There is a myth that consultants make more money. But then i hear stories of someone getting laid off, and getting a 4 or 5 month severance. That’s shocking to me. What’s more people often forget about the value of days off, health care & other benefits, and the huge one being upgrading skills. If a firm is offering you this, take advantage!

Remember that you’ll get none of these benefits working for yourself, unless you’re successful enough to reward yourself in this way. That means having a good pipeline of projects, and a trail of happy successful customers behind you. They will tell your story, and sell you to colleagues.

Related: Why i ask for a deposit

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Thinking deeply about Amazon cloud & infrastructure code

via GIPHY

If you’re building anything in the public cloud these days, you’re probably using some automation. There are a lot of ways to reach the goal posts, and a lot of tools to choose from.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

In my case I’ve put Terraform to use over and over again. I’m built vpcs, public & private subnets, and bastion boxes for mobile apps for mental health & fitness, building security, and two factor authentication apps.

I’ve chosen Terraform because it has a vibrant & growing community, the usability is miles ahead of CloudFormation, and it can work in a multi-cloud environment.

But this article isn’t about choice of tools. I’m curious about this one question:

“What architectural considerations should I keep in mind as I build my infrastructure code?”

Here are my thoughts on that one…

The VPC is your fundamental container

Everything you build sits inside of a VPC. Your entire stack references back to those variables, including vpc-id, private and public subnet IDs.

Here’s an example where you can get into trouble. Digging through some infra code, reviewing with a new devops hire, we were going through everything with a fine toothed comb. We found that the RDS instance was being deployed in PUBLIC subnet, instead of private.

Alerted to the problem, we first checked to see whether it was accessible from the internet at large. It wasn’t, because we had not exposed a public facing IP. That said it wasn’t the most secure setup and I wanted to fix it.

I made some changes to the Terraform code, to update the subnet to private, and tried “$ terraform apply”. Then I got all sorts of errors. Try as I might, this update would not work.

Sadly the long term solution was to destroy the entire stack, and rebuild with RDS in the right place. Lesson learned.

Related: When you have to take the fall

Why I discovered a shared or utility vpc was so useful

o story of placing ELK inside an application vpc

Related: Before you do infrastructure as code, consider your workflow

Think carefully about domains

As you build your application, you’ll likely need a route 53 zone associated with it. And you’ll want a CNAME in front of your load balancer, so it’s easy for customers to hit your endpoint.

1. rebuilding stack means new zone & new nameservers

If your registrar is elsewhere, you’ll need to update nameservers each time you destory & build the zone. This happens even if you host the domain at AWS. And it can’t be automated at the moment.

You could also have the zone created *outside* terraform. Then your terraform code would reference and add CNAMEs to that zone by using it’s ARN as reference. This is another possible pattern.

The pattern I prefer is to have each vpc & stack have their own unique top level domain. That way terraform can cleanly create and destroy the whole stack and nothing is comingled.

Related: I tried to do infrastructure as code, it didn’t go as I expected

Enable easy create & destroy

Each time you tear up your work and rebuild, you test the whole process. This is good. Iron out those hiccups before they cause you trouble. After some time, you’ll be able to move your entire application stack, db, ec2 instances, vpc & network resources from one region to another easily & quickly.

After doing this a few times, you’ll start to learn what resources in AWS are region specific. And which ones are global.

Remember, don’t allow any manually created objects or resouces inside your automated ones. If you aren’t strict here, you’ll hit errors when you try to destroy, and then have to troubleshoot those one-by-one.

Related: How to setup an Amazon ECS cluster with Terraform

Automate first

I was building an ELK server setup to centralize our logging infra. Everything worked pretty well. After a time, I added some more S3 buckets for load balancer log ingestion.

Later we hit a problem, where the root volume was filling up. This stopped new logs from appearing. So we rebuild the ELK server with a 10x larger root volume. As we had used terraform and ansible, the rebuild was easy. And quickly we had are logging system back online.

A week or so later though we had trouble again. It seemed that some of the load balancer log data wasn’t showing up. We spent a day troubleshooting, and eventually found out why. Those S3 buckets weren’t being ingested.

Turns out when we added those, we added them to the config file *directly* on the server, but not in the configuration scripts.

Moral of the story…

“Always update the automation scripts first, and apply those to the server. Don’t work on the server directly.”

Related: Are you getting good at Terraform or wrestling with a bear

Beware of account limits

As you’re building your stack in us-east-1, you may later go and try to create another copy of it. Suddenly AWS complains that there are no VPCs left. Or you’ve hit a maximum of 20 EC2 instances. While these errors may be irritating, you should be glad to have them. With them in place, and errant piece of infra code or application cannot accidentally run up your account and receive a surprise bill.

That said you should be mindful of those limits, and increase them before your application hits a wall.

A few that I’ve run into:

o 20 ec2 instances per region
o 5 Elastic IPs per region
o 5 VPCs per region

If your application requires more, prepare to switch regions, or up those service limits. You can open a support ticket to do that.

Related: How do you interview for key AWS skills?

What resources live on past a stack build/destroy cycle

As you build your stack with infrastructure code, you’ll tear it down again often. Each time you do this, you’ll be reminded of one thing. Any data inside there will be gone.

That means for starters don’t store things in the filesystem. Store them in a database. RDS is great for this purpose. Then the question becomes, when I destroy my stack, how can I backup and restore my database. RDS does support this, but if you have more nuanced requirements, you may have to build your own backup & restore.

What about cloudwatch logs? As long as your stack doesn’t destroy those resources, they’ll be kept in perpetuity for you. You may want to further back them up.

What about your load balancer logs? Here you can either create the S3 log bucket *outside* of your infra code. In that case it won’t get cleaned up during a destroy. Alternatively you can create a meta bucket for load balancer logs, and copy those over regularly. Then when you cleanup your infra, you can do a bucket destroy with –force option.

Related: Is Amazon too big to fail?

Some things remain manual but can be made easier

One thing that remains manual in AWS is the SSL certificate creation. You can “request” a new certificate and select DNS validation.

When you do this, incorporate the certificate control cname and certificate control record into the infra code as variables. Then copy/paste these two values from your certificate dashboard.

Assuming your nameservers are pointing to aws, the certificate validation code should spot the above secret control record in DNS. When it does it will conclude that you control the zone, and therefore validate your SSL certificate.

Once it shows VALID on the dashboard, copy the SSL certificate ARN, and pop that into your terraform code. You will add it as part of an SSL listener to your ALB (application load balancer) configuration.

Related: Does AWS have a dirty little secret?

Don’t just monitor metrics

Monitoring is of course important. You’ll want to setup a prometheus server that can do discovery. This allows it to dynamically configure and learn about new servers, so it can monitor those too. It does this by using the AWS API to find out what is there to monitor.

All of this monitoring is crucial, but it applies mainly to server metrics. What is the load average, CPU utilization, memory or disk usage.

What about your log counts? As I describe above, a logstash misconfiguration meant that log records didn’t show up. However this was only noticed through manual discovery. We want that to be automatically discovered!

Do that by creating checks that count records, and alert to numbers that are

You can also validate other data with checksums, by creating your own custom methods. You’ll need to think intelligently about your application, and what type of checks make sense.

Related: How I resolved some tough Docker problems on Amazon ECS

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters