All Devops

When I found gold in my customer archives


I’m good at keeping notes. I’ve blogged about Can progress reports & daily notes help engagements succeed. I would give that an emphatic YES!

From helping with communication, to sharing arcane details about blocking issues, struggles & hurdles, notes can illuminate things that a CTO or manager may not otherwise be aware of.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

I was digging through mine archives recently, and found a bunch of notes on how to think about Terraform. In particular, how do you think about infrastructure as code? How do you architect to make it all work together?

1. A dead end started me backtracking

You’re going to dig your heels in by getting your application working. To do that you’ll spinup a vpn, private public subnet, bastion boxes, ECS hosts to deploy containers to, and an application load balancer endpoint. Getting that all working wasn’t terrible. We even included a prometheus node, to give us some monitoring visibility. We even added our jenkins server into the mix. Do you see where this is going?

At a certain point we of course needed to destroy the whole setup, but didn’t want to destroy the CI pipeline. Duh! And what about monitoring? Lose all that data each time no way!

Read: Infrastructure provisioning – what is it and why is it important?

2. Organize around VPCs

After dragging yourself through that, you see a bit better. It’s like standing at 20,000 feet.

Your vpn is a logical collection of instances. A box that holds your application, provides security, and gets created and destroyed with it. You can even see in your Terraform code, a subnet requires a VPN id within which to create it. And an instance requires a subnet within which to create it. For security reasons the application instances will sit within PRIVATE subnets, and only bastion box & load balancers will sit in PUBLIC subnets.

TO my mind that means each environment DEV, STAGE, PROD all get their own vpn. This also allows you to control who can access stage & production, as they have their own bastion access points.

Read: How to hire a developer that doesn’t suck

3. Build a utility VPC

What you’ll also see from the above story is that you need a place to have business wide, non-application services sit inside. Welcome the UTILITY VPC!

This can contain prometheus, ELK or other log collection service, your jenkins or other CI pipeline, and any other services that don’t logically fit within the application VPC.

Related: Why generalists are better at scaling the web

4. A VPC should be ok to destroy and rebuild in another region – in one-click

When you use infrastructure code, you want to test, create & destroy often. That shouldn’t disrupt anything. That means all state data should sit outside of those instances. Logging data, send it to logstash or cloudwatch. Application state, keep that inside of an RDS instance. And you’ve tested those backups right?

Speaking of RDS, I encountered problems with Amazon’s own backup & restore. For my money, I had a lot of problems and ended up writing a custom db dump script. That may require a custom restore to, so buyer beware. Here’s my story though… I tried to build infrastructure as code with Terraform and Amazon and it didn’t go as i expected.

You also may encounter issues when you move across regions, such as elastic IPs and so forth. And you’ll need to check and verify the code which creates and destroy S3 buckets and domain name certs. These areas gave me some hiccups, but you can work through it with diligence!

Read: Is zero downtime even possible on RDS?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

All Cloud Computing CTO/CIO Data Database Operations Devops

How can we keep cloud architectures simple?


I was reading hacker news, as I often do. And I found David Futcher’s post You Don’t Need all That Complex/Expensive/Distracting Infrastructure..

Of course it caught my attention. You may be surprised by the reasons

Join 38,000 others and follow Sean Hull on twitter @hullsean.

One quote that should raise your eyebrows…

I’ve seen the idea that every minute spent on infrastructure is a minute less spent shipping features

Here’s what I think…

1. Performance tuning is often about removing things

That sounds strange right? How can performance tuning be about removing things?

Here are a few examples:

o removing results: When you add an index you remove data, returning just the pieces you need.
o removing lag time: When you remove time, you get faster response. This cascades through your entire application, allowing more requests to get handled in a fixed amount of time. On AWS you get allocated a faster NIC when you use a larger instance size. It’s automatic, though somewhat invisible.
o removing data: By trimming tables, access speeds go up. Reads are faster when you hit the whole table, because there’s fewer records to sift through. Writes are faster because you are maintaining smaller associated indexes.
o removing codepaths: By having fewer libraries, and layers between your application, and the data it retrieves, you have less overhead. And that translates to quicker response time too.
o removing databases: If you’re fully microservices, you have a database behind every service. This means your service sometimes proxies just to get at data that has been decoupled. By consolidating databases to a shared db model, you reduce this cross-traffic dramatically.

Related: When you have to take the fall

2. Are we just building what everyone else does?

In technology as with any other industry, following the big trends is safe. If you’re building an architecture that is used by Facebook, Amazon, Apple, Netflix & Google are using, you’re on the best path, right? Certainly few would criticise their success. So yes it is safe. Even if it fails.

Going with a much simpler architecture, that has even a whiff of so-called legacy, may seem like bucking the trend. But fewer moving parts means less to break, less to manage, and less to tune.

Related: Why generalists are better at scaling the web

3. Customers don’t care

Remember, customers aren’t devops gurus nor do they care about Rust versus Swift versus Elixir. What they care about is they can comment on their social media app or order your widget. They want your product to work.

They don’t care if it is hosted in the cloud, or at a managed datacenter. They probably don’t even care about tiny short outages either. What they do care about is that it works, and works well. And fast.

If your infrastructure allows you to be responsive to customers, roll out new product features & updates, you’re going to have some happy customers. The end!

Related: Why i ask for a deposit

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

All CTO/CIO Devops

How do we test performance in a microservices world?


I recently ran across this interesting question on a technology forum.

“I’m an engineering team lead at a startup in NYC. Our app is written in Ruby on Rails and hosted on Heroku. We use metrics such as the built-in metrics on Heroku, as well as New Relic for performance monitoring. This summer, we’re expecting a large influx of traffic from a new partnership and would like to have confidence that our system can handle the load.”

“I’ve tried to wrap my head around different types of performance/load testing tools like JMeter, Blazemeter, and others. Additionally, I’ve experimented with scripts which have grown more complex and I’m following rabbit holes of functionality within JMeter (such as loading a CSV file for dynamic user login, and using response data in subsequent requests, etc.). Ultimately, I feel this might be best left to consultants or experts who could be far more experienced and also provide our organization an opportunity to learn from them on key concepts and best practices.”

Join 38,000 others and follow Sean Hull on twitter @hullsean.

Here’s my point by point response.

I’ve been doing performance tuning since the old dot-com days.

It used to be you point a loadrunner type tool at your webpage and let it run. Then watch the load, memory & disk on your webserver or database. Before long you’d find some bottlenecks. Shortage of resources (memory, cpu, disk I/O) or slow queries were often the culprit. Optimizing queries, and ripping out those pesky ORMs usually did the trick.

Related: Why generalists are better at scaling the web

Today things are quite a bit more complicated. Yes jmeter & blazemeter are great tools. You might also get newrelic installed on your web nodes. This will give you instrumentation on where your app spends time. However it may still not be easy. With microservices, you have the docker container & orchestration layer to consider. In the AWS environment you can have bottlenecks on disk I/O where provisioned IOPS can help. But instance size also impacts network interfaces in the weird world of multi-tenant. So there’s that too!

Related: 5 things toxic to scalability

What’s more a lot of frameworks are starting to steer back towards ORMs again. Sadly this is not a good trend. On the flip side if you’re using RDS, your default MySQL or postgres settings may be decent. And newer versions of MySQL are getting some damn fancy & performant indexes. So there’s lots of improvement there.

Related: Anatomy of a performance review

There is also the question of simulating real users. What is a real user? What is an ACTIVE user? These are questions that may seem obvious, although I’ve worked at firms where engineering, product, sales & biz-dev all had different answers. But lets say you’ve answered that. Does are load test simply login the user? Or do they use a popular section of the site? Or how about an unpopular section of the site? Often we are guessing what “real world” users do and how they use our app.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters