Are you as good as the public cloud?

According to Lyft’s recent public filing, they plan to spend 300 million buckaroos in the next 2.5 years on AWS.

Did I hear that right?

Join 38,000 others and follow Sean Hull on twitter @hullsean.

Perhaps that is their estimate, or the maximum amount they want to budget for. Regardless, that’s a lot of money any way you slice it. A lot of folks are commenting on how crazy that is, and how much datacenter you could build yourself with that money.

What do you think? Is it foolhardy? Or is there a hidden wisdom here?

Here’s my take.

1. Do you have one million customers testing your datacenter?

If you’re comparing the cost of the cloud to the raw numbers of running your own datacenter, hardware costs alone don’t tell the story. You’ll need to include the ops teams & other engineers. Right, you probably guessed that.

But did you factor in the cost of a legion of testers? This is the hidden cost that commercial software carries, while open source software gets this benefit for free.

With a public cloud like AWS you have millions of customers testing the product every day, and running into edge cases long before you do. So you get a better, more reliable service, invisibly and for free.
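To put rough numbers on this, here’s a back-of-envelope sketch. The only figure taken from the filing is $300 million over about 2.5 years (30 months). `monthly_run_rate` and `self_hosted_monthly_cost` are hypothetical helpers, and every self-hosted input in the usage lines is a placeholder you’d swap for your own quotes and salaries.

```python
# Hypothetical helpers for a back-of-envelope cloud vs. self-hosted comparison.

def monthly_run_rate(total_usd: float, months: float) -> float:
    """Average monthly spend for a commitment spread evenly over `months`."""
    return total_usd / months


def self_hosted_monthly_cost(
    hardware_capex_usd: float,
    amortize_months: float,
    facility_monthly_usd: float,
    staff_count: int,
    loaded_salary_per_year_usd: float,
) -> float:
    """Rough monthly cost of running your own datacenter.

    Hardware is amortized over its useful life, facility costs (space, power,
    bandwidth) are a flat monthly figure, and the ops team is counted too.
    It deliberately leaves out the hard-to-price items: testing at scale,
    a global network, and the automation a cloud provider keeps shipping.
    """
    hardware = hardware_capex_usd / amortize_months
    staff = staff_count * loaded_salary_per_year_usd / 12
    return hardware + facility_monthly_usd + staff


aws_per_month = monthly_run_rate(300_000_000, 30)  # 2.5 years = 30 months
print(f"${aws_per_month:,.0f} per month")  # $10,000,000 per month

# Placeholder inputs only: $3.6M of servers over 3 years, $50k/month facility, 10 engineers.
diy_per_month = self_hosted_monthly_cost(3_600_000, 36, 50_000, 10, 240_000)
print(f"${diy_per_month:,.0f} per month")  # $350,000 per month
```

Whatever inputs you plug in, notice what the function can’t capture: the legion of testers has no line item.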

Related: How can we keep cloud architectures simple

2. Do you have 66 datacenters spread across 21 regions and a free network between them?

Anybody who was building web applications in the year 2000 will remember how websites didn’t load the same for every customer. Depending on where in the world they were located, users could have a very different experience.

These days we assume that we can be global from day one. But how exactly do we achieve this? Remember, with a public cloud you’re getting tons of things for free, without knowing it. Moving data between AZs or regions? That’s all going across a private interconnect.

And that’s not even counting the 180 nodes in CloudFront that give you a global CDN footprint too!

Also: What hidden things does a deposit reveal?

3. Do you have an engineering team automating away job roles?

I remember the days of the DBA job role. Do you? Probably not. I specialized in this for years, and tons of companies hired me to help them with it. First Oracle, then MySQL, then Postgres.

Then along came Amazon RDS. Guess what? Companies don’t really hire for that role anymore. They do need help with it from time to time, but not as a primary specialization.

What do I mean? Well, by hosting your application on AWS, you’re benefiting from the work of engineering teams across many departments, all expanding APIs and automating the things those million customers are asking for.

You’re not going to innovate that well or that quickly in your own datacenter. So building it yourself will cost you more!

Read: Can communication mixups sour an engagement?

4. Do you have APIs that tons of engineers have already written code for?

A quick peek at Terraform’s community modules on GitHub and you’ll probably blush. From VPCs to bastion boxes, key management to load balancers, lots of code has been written and open sourced.

By deploying on a platform that a lot of other devs are using, you’ll benefit from all this open source code. That means you won’t have to write that stuff yourself.

Sure, you’ll have integration work to do, but the hidden benefit of being on a popular platform saves you money.

Check out: How I use 5 daily habits to stay on track

5. Can you do disaster recovery for free?

If you build your own datacenter, you have to buy all your capacity up front, so there are no spare servers sitting around waiting for you. In the public cloud, there is always spare capacity.

What that means is you can write automation code to spin up copies of your application stack in alternate regions, at the push of a button. Thus you effectively get disaster recovery for free!
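Here’s a minimal sketch of that idea in plain Python, assuming your stack is described by parameters rather than hand-built servers. `StackSpec` and `dr_copy` are hypothetical names and the AMI IDs are made up; in practice the resulting values would feed your Terraform or CloudFormation. The one real-world detail it encodes is that AMI IDs are region-specific, so the standby region needs its own image.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StackSpec:
    """Hypothetical description of one copy of an application stack."""
    name: str
    region: str
    ami_id: str          # AMI IDs are region-specific, so each region needs its own
    instance_type: str
    instance_count: int


def dr_copy(primary: StackSpec, target_region: str, ami_by_region: dict) -> StackSpec:
    """Return a spec for the same stack in another region (hypothetical helper)."""
    if target_region == primary.region:
        raise ValueError("a DR copy must live in a different region")
    if target_region not in ami_by_region:
        raise ValueError(f"no AMI available in {target_region}; copy the image there first")
    # Everything else (instance type, count) carries over unchanged.
    return replace(
        primary,
        name=f"{primary.name}-{target_region}",
        region=target_region,
        ami_id=ami_by_region[target_region],
    )


# Made-up values for illustration only.
primary = StackSpec("web", "us-east-1", "ami-0aaa-example", "m5.large", 4)
standby = dr_copy(primary, "us-west-2", {"us-west-2": "ami-0bbb-example"})
print(standby)
```

This only covers the compute side. Your data still has to be replicated to the target region for the copy to be useful.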

Also: Can daily notes help you work better with clients?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How can communication mixups sour an engagement?

I recently had some communication mixups with a customer. It reminded me how delicate communication is between customers & vendors. What’s more, it can be challenging between developers & managers too. It highlighted these challenges for me, and the strategies I’ve learned over the years.

While I didn’t lose the project, the initial misunderstandings continued to overshadow it long after they were cleared up.

1. First a missed conference call

Early on, we set up a call to discuss the challenges. The time of the conference call had been agreed to, but somehow it didn’t make it into my calendar. So when the appointed day & time came, I missed the call. This was before any contract was signed, before the engagement had even started.

Needless to say, this is a very delicate moment, as everything we do sets precedents about our personality and working style.

While we were able to reschedule, it added some initial strain to the relationship. As you’ll see, that strain compounded later.

Related: Walking the delicate balance of transparency

2. Next arriving late to the kickoff meeting

I always pride myself on timeliness. I think it communicates all sorts of things to customers. First, it shows you’re serious and will manage the project carefully. Next, it shows you respect other people’s time.

As usual, I left plenty of extra time, so I would arrive well before the meeting. Arriving at the building 20 minutes early, I searched but could not find the entrance. Neither could Google, as it turns out. Strange, I thought, what could be wrong? I walked into the building where the address should be, and asked the doorman. He explained that the company didn’t reside there. Perhaps they’re not located on Park Avenue, but rather Park Avenue South, he suggested. And then the lightbulb goes off. Of course!

With only 5 minutes left to arrive on time, I realize I’m going to be late. So I try to call the manager leading the meeting. I get his voicemail, and leave a message. I then jump in a taxi, and head to the Park Avenue South address. Arriving 10 minutes late, I quickly head upstairs. I’m greeted by some grumbling, and frustrated looks.

Although this was an understandable mistake, it came on the heels of another mixup. So now I’ve set a precedent of lateness. Even though I’m a timely person, it’s hard to erase that stamp now.

Relations stayed strained throughout the engagement. While the project was completed, I believe it would have been extended had I not stumbled early on.

Also: When you have to take the fall

3. What can a mixup indicate?

There are many questions it may raise. Possible ones include:

o Is the candidate too busy with other tasks?
o Is the person forgetful?
o Is one party bullying others with their perspective?
o Is there finger pointing & a blame game in the org?
o What is the culture of the organization?
o Is it one of understanding & working together, or of blame?
o Is the person uninterested?
o Is the project not a priority?
o Is the company disorganized?
o Is miscommunication endemic?

Some of these thoughts may bubble up consciously, and some may linger as a bad taste in your mouth. Regardless, they should be faced head on, with understanding and humility on both sides.

Read: Why i ask for a deposit

4. The weight of first impressions

Inevitably, when there is a mixup, like lateness or a missed meeting, there is a technical explanation. In my story above, the *reason* is that Park Avenue and Park Avenue South are completely different addresses.

o First impressions are KEY

Even with a reasonable explanation, there is a reaction that is felt.

o There is a visceral emotional reaction we all have anyway

Such a reaction is easy to cause, but hard to patch up. It will take time and multiple interactions to leave people with a new impression.

o Reactions can be incorrect & irrational sometimes
o They can color further interactions

With time, impressions can be adjusted, but it takes much more work after an initial mistake.

Check out: How to hire a developer that doesn’t suck

5. Possible solutions

While there is no surefire way to avoid mixups like these, there are some things that can work in your favor.

o maintain flexibility

That means accepting blame, and sharing responsibility for reaching the goalposts.

o maintain a sense of I *can* be wrong

Everyone can be wrong, and everyone makes mistakes. So don’t try to avoid blame. That said, emphasize that everyone must work together: on communicating engagement details, on mutually agreed times, and on time zones.

o look for a sense of we *can* be wrong

I think these types of mixups can also be beneficial, because they reveal the customer’s management style. Do they point fingers, or acknowledge reasonable mistakes? Both parties will make mistakes eventually, and understanding this builds good faith down the road.

o “let’s work together to improve communication”

Framing the mixup as a shared problem is important. Although the address mixup above is technically my fault, it’s probably a common one. Park Avenue South confuses everyone in New York. So an understanding customer might offer to share a bit of the responsibility with you.

o hold frame of mutual responsibility and working together using the word “we”

The frame is key. It’s not *all* your fault, nor is it all the customer’s fault when they mix up. We all need to be understanding, to a point.

Also: Can daily notes help you work better with clients?

What hidden things does a deposit reveal?

I like the way integration tests in software development show you that everything is working and connected together properly.

I think it’s interesting to consider how a deposit may serve a similar function in the financial & contractual space.

1. Alignment across business units

In really small organizations, everyone is in tight communication. Finance knows what engineering is doing. In medium to large organizations, there can be a disconnect. Engineering may be 100% ready to start today, but finance is not ready. In some cases finance may not even know a consultant is being hired. Each case is different.

Some CTOs get this right away, and are already ahead of the request. While others might ask, “Well we’re ready to get going today, do you really need the deposit first? Because that might take some time.”

My thinking is, yes, the engineering department is ready, but the organization is *not* completely ready. And it’s better that there be alignment across the organization. Ironing out that alignment helps avoid other problems later on.

Related: When you have to take the fall

2. Organization or disorganization

Sometimes there is complete alignment, the contract is already drafted, and the whole org really is ready to go. In other cases there can be some dysfunction. For instance, the lawyers may have a lot of hoops they want us to jump through on the contract.

In other cases finance may only cut checks on a certain day of the month, or only pay 30 days after receiving an invoice. There are a lot of different policies. By insisting that we receive a deposit, however small, we iron out these things early.

If the engineering manager or CTO hiring you promises one thing, but finance has a policy against that, you’ll want to know early to avoid misunderstandings.

Related: Why generalists are better at scaling the web

3. Trust

The amount of a deposit is really irrelevant. It’s all about getting your ducks in a row, both in terms of what may be required of you, the vendor, and what the company’s policies may be when onboarding consultants.

By ironing out these issues early, the customer is showing some faith in you as a vendor. They want you in particular, and will do what they need to, to make it work.

Related: Is AGILE right for fixing performance issues?

4. We want you to rush, but we don’t

I’ve encountered many cases where engineering was “ready” but finance was not. It’s tough. From the CTO’s perspective, it may seem like a minor point to get stuck on.

My thought is to hold the frame of two organizations working together. When the organization agrees that hiring this engineering resource is a priority, it will get done what it needs to.

Related: How to hire a developer that doesn’t suck

5. Stress tests or organizational integration tests

In software testing, we have something called an integration test. It might be confirming that a login works, or a certain page can load. Behind the scenes that test requires the database to be running, the queuing system to work, an API call to return successfully, and so on. A lot of moving parts all have to be working for that test to succeed.

In a very real way, a deposit is the financial equivalent of an integration test. It confirms that we’re all aligned in the ways we need to, and are ready to get started.
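For anyone who hasn’t written one, here’s a toy version of the login test described above. It’s a sketch, not a framework: the database is a real in-memory SQLite instance and the queue is Python’s `queue.Queue`, while `issue_token` is a hypothetical stand-in for an external API call. The test only passes if all three pieces work together, which is exactly the property the deposit analogy leans on.

```python
import hashlib
import hmac
import queue
import sqlite3


def hash_password(password: str) -> str:
    # Toy hashing to keep the example short; real systems should use a salted,
    # deliberately slow KDF such as hashlib.scrypt.
    return hashlib.sha256(password.encode()).hexdigest()


def issue_token(email: str) -> str:
    # Hypothetical stand-in for a call to an external auth/token API.
    return f"token-for-{email}"


def login(db: sqlite3.Connection, events: queue.Queue, email: str, password: str):
    """Check credentials in the database, publish a login event, and return a token."""
    row = db.execute(
        "SELECT password_hash FROM users WHERE email = ?", (email,)
    ).fetchone()
    if row is None or not hmac.compare_digest(row[0], hash_password(password)):
        return None
    events.put({"event": "login", "email": email})
    return issue_token(email)


def test_login_end_to_end():
    # Wire up real components: an in-memory SQLite database and a stdlib queue.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (email TEXT PRIMARY KEY, password_hash TEXT NOT NULL)")
    db.execute(
        "INSERT INTO users VALUES (?, ?)", ("alice@example.com", hash_password("s3cret"))
    )
    events = queue.Queue()

    # Happy path: every component has to cooperate for this to pass.
    assert login(db, events, "alice@example.com", "s3cret") == "token-for-alice@example.com"
    assert events.get_nowait() == {"event": "login", "email": "alice@example.com"}

    # Wrong password: no token, and nothing published downstream.
    assert login(db, events, "alice@example.com", "wrong") is None
    assert events.empty()


test_login_end_to_end()
print("login integration test passed")
```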

Related: How do I migrate my skills to the cloud?

Walking the delicate balance of transparency

I’ve written before about How I use progress reports to stay on track.

I think it’s an interesting topic, and an important one.

While I do believe transparency is important when working with clients, that doesn’t mean it’s easy.

1. I start with daily notes

As I mentioned above, I think they’re important. They provide visibility, improve trust, and keep me on track. They also help me remember what was happening on particular days. They’re like breadcrumbs on the path to building solutions.

Related: How to hire a developer that doesn’t suck

2. Notes can highlight organizational dysfunction

Often in my notes, there are details of whom I coordinate with to get what done. Perhaps I need credentials to reach a particular server. But to get those, I need an email address. And to get that, someone in department X must set it up. And there are delays with that process.

Those delays can cascade through the onboarding process, frustrating everyone. Although the operations team is ready and raring to go, the finance or legal team is not quite ready, and there are delays there. Or there are hiccups in some other frequent business process.

Related: Why generalists are better at scaling the web

3. Notes can highlight task complexity

Sometimes I hear the phrase “That should be simple to do,” only to find the devil buried in the details. As we put boots on the ground, we find there are many dependent tasks that are not finished. So those must be completed first.

In this case I think detailed notes are a real triumph. CTOs who are more management oriented may not have a day-to-day understanding of coding complexity. And that’s ok. But when that complexity is laid out in all its gory detail, it can be a real educational experience.

Related: How do I migrate my skills to the cloud?

4. For some CTOs high level is better

Some CTOs don’t want to slog through endless notes about setting up credentials, or problems with permissions of keys on server X or Y.

In these cases I still collect the detail, but I may also add some high-level bullet points that focus on what all these underlying parts are in service of.

Related: When you have to take the fall

5. Be prepared for archeological surprises

Inevitably there will be surprises. Maybe department X does not know what department Y is doing. Or setting up an AWS account takes two days, instead of two hours. Be prepared.

I find that notes help communication in all of these cases. And since I’ve been keeping them, I’ve never had a customer balk at an invoice. Notes don’t lie!

Related: Why i ask for a deposit

How do you handle the onboarding at a new engagement?

Jumping into the fray at a new firm is never easy. You’ll have new people’s names to remember, new web dashboards to log in to and bookmark, new passwords to remember, and new workflows to learn.

While full-time folks typically onboard in a week, and don’t contribute code for a month or more, consultant engagements mean hitting the ground running.

Here’s what I try to manage when first diving in.

1. Deposit & agreement

When I start at a new engagement, I require a deposit. There are a lot of moving parts to that happening. In engineering speak, it acts like an integration test across your entire organization. All the departments must be aligned. Legal with the agreement language. Finance with the banking details, and invoice. CTO or manager with a clear picture of scope of work.

In getting past that first hurdle, both parties will express their working styles. And usually there are compromises that must be made on both sides. But the effort each one makes is essential to the strong and equitable relationship you’re both working to build.

Related: How to hire a developer that doesn’t suck

2. Over communicate

Sometimes your teammate doesn’t know you’re also working to get things over to legal. And legal doesn’t know you’re working with finance. And finance doesn’t know you’re trying to tune a database. And the network admin doesn’t know your email address isn’t set up.

When in doubt, over-communicate. Don’t be afraid to repeat in an email what you thought you’d communicated clearly on Slack. Sometimes Slack messages are missed, as there are so many thrown around. It’s easy to miss a notification.

When in doubt, communicate again. Ask for clarification. Ask if there is anything someone may be waiting on.

Related: Why generalists are better at scaling the web

3. Keep daily notes

I’m a big fan of providing daily progress reports. There is a hell of a lot of detail buried in most tasks, and much of that gets lost in the shuffle.

Putting together your own notes of what your day looked like can help management understand that complexity. It can also help communicate where the organization is getting stuck. Sometimes surprises here can help unblock the org in other ways.

Related: Why i ask for a deposit

4. Beware the Slack rabbit hole

Slack can be a blessing, allowing you to reach someone immediately, but it can also be a curse. Have I seen every notification? Does the person who posted a note *assume* that I saw it? Which thread was that detail posted in anyway?

I personally like to repeat a lot of communications in email. From a consulting perspective this is also essential, as it provides a paper trail of the conversations we had. Remember, once an engagement is completed, you lose access to the entire Slack history. That’s not true of email.

Related: When you have to take the fall

5. Anticipate login issues

Typically at the start of an engagement an email account gets set up, and other authentication hangs off of that one. AWS confirms via email, or perhaps there is an SSO solution like Okta. Inevitably, these interconnected pieces take time to set up, and one will hit a snag, slowing down your overall onboarding.

Expect hiccups and challenges in this process. It’s normal for it to take some days. Remember that full-time hires typically onboard in a week, and don’t contribute code for a month or more. So keep everything in perspective on these points.

Related: How do I migrate my skills to the cloud?

How I use 5 daily habits to help me stay on track

1. Keep a tight todo list

I shoot for five tasks on my todo list. Be sure they are small, 15-30 minute tasks because things have a way of ballooning. If what you’re doing takes longer, break it down into smaller pieces. This keeps you moving, and always making progress.

You might be tempted to have more items. But chances are you’ll spend an hour on emails, plus time on phone calls and other distractions. And there will be urgent tasks that suddenly require your attention. So keeping this list small allows you to hit close to 100% success.

Sure there will be days when you’re *more* productive. It doesn’t hurt to pull some items off the long term list. 🙂

Related: When you have to take the fall

2. Zero inbox

I’m relentless about this. Terse replies, stay focused, and remember the reward you’ll give yourself when you finish your day.

Related: Why generalists are better at scaling the web

3. Take a break every hour or two

Smokers have an easy time with this. And perhaps coffee drinkers. If you’re anyone else, you may get into the habit of staying in your chair. Don’t. Regular breaks promote creative thinking, and physically moving helps get the mind in motion too.

Sometimes when I work in a coffeeshop I don’t bring my charger. That way I’m forced to take a break when the battery runs low.

Related: Why i ask for a deposit

4. Reward yourself

Pat yourself on the back when you complete all your tasks. If it’s 4pm, so be it. Jet a bit early. You know there will be other days when you’re working until 8pm too. Promise yourself something when you finish: a treat, a stroll through the park, an extra ten minutes to walk your dog, or a frosty IPA. Whatever it is, rewards help remind us we’ve done well.

Related: How to hire a developer that doesn’t suck

5. Always be networking

If you’re in a FT role, you may do most of your socializing with coworkers. That’s fine, but be sure to go to some regular meetups too. And follow up with people. Maybe even give a few talks now and then. Networking is the most surefire way to build your career and keep growing. And it takes a little bit each day to build lasting momentum.

Related: How do I migrate my skills to the cloud?

How can we keep cloud architectures simple?

I was reading Hacker News, as I often do, and I found David Futcher’s post You Don’t Need all That Complex/Expensive/Distracting Infrastructure.

Of course it caught my attention. You may be surprised by the reasons.

One quote that should raise your eyebrows…

I’ve seen the idea that every minute spent on infrastructure is a minute less spent shipping features

Here’s what I think…

1. Performance tuning is often about removing things

That sounds strange right? How can performance tuning be about removing things?

Here are a few examples:

o removing results: When you add an index, you remove data from the search, so the database reads just the pieces you need.
o removing lag time: When you cut latency, you get faster responses. This cascades through your entire application, allowing more requests to be handled in a fixed amount of time. On AWS, larger instance sizes generally come with more network bandwidth. It’s automatic, though somewhat invisible.
o removing data: By trimming tables, access speeds go up. Full table scans are faster because there are fewer records to sift through. Writes are faster because you are maintaining smaller associated indexes.
o removing codepaths: By having fewer libraries and layers between your application and the data it retrieves, you have less overhead. And that translates to quicker response time too.
o removing databases: If you’ve gone fully microservices, you have a database behind every service. This means a service sometimes proxies requests just to get at data that has been decoupled. By consolidating to a shared db model, you reduce this cross-traffic dramatically.
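The first bullet is easy to see with SQLite, which ships with Python. This sketch (table and index names are made up for illustration) asks the query planner how it would run the same lookup before and after adding an index. Without the index it has to scan every row; with it, it searches the index and touches only the matching rows. The exact wording of the plan varies between SQLite versions.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return the human-readable steps from SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the description of that step.
    return " | ".join(str(row[-1]) for row in rows)


# Made-up table for illustration: 10,000 users, no index on email yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)

lookup = "SELECT name FROM users WHERE email = ?"
args = ("user42@example.com",)

before = query_plan(conn, lookup, args)  # a full SCAN of users
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, lookup, args)   # a SEARCH via idx_users_email

print("before:", before)
print("after: ", after)
print(conn.execute(lookup, args).fetchone())  # ('User 42',)
```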

Related: When you have to take the fall

2. Are we just building what everyone else does?

In technology, as with any other industry, following the big trends is safe. If you’re building an architecture like the ones Facebook, Amazon, Apple, Netflix & Google are using, you’re on the best path, right? Certainly few would criticize their success. So yes, it is safe, even if it fails.

Going with a much simpler architecture, that has even a whiff of so-called legacy, may seem like bucking the trend. But fewer moving parts means less to break, less to manage, and less to tune.

Related: Why generalists are better at scaling the web

3. Customers don’t care

Remember, customers aren’t devops gurus, nor do they care about Rust versus Swift versus Elixir. What they care about is that they can comment on their social media app or order your widget. They want your product to work.

They don’t care if it is hosted in the cloud, or at a managed datacenter. They probably don’t even care about brief outages. What they do care about is that it works, and works well. And fast.

If your infrastructure allows you to be responsive to customers, roll out new product features & updates, you’re going to have some happy customers. The end!

Related: Why i ask for a deposit

Are shared databases back in vogue?

I just stumbled upon this article by Roman Krivtsov on YC News, “Is shared database in microservices actually anti-pattern?” As a seasoned DBA in another life, I was intrigued by the title.

I devoured the piece.

One quote at the end really sums things up…

In the very beginning you probably don’t need microservices. Start with monolith and see, if you really need them in future.

Here’s my takeaway from it.

1. DB access by proxy

As the database itself is a service, does it make sense to use a microservice to essentially front the database? By doing that we simply add a layer of abstraction, need to keep the API up to date, and eat the network and compute expense of interacting with that data by proxy.

Related: When you have to take the fall

2. Changing db schema or service API

In a traditional database, when we update a schema, we can keep the old columns around, and simply add new. Thus we are backward compatible.

With the one-db-per-microservice model, we must update the API every time we add or change the schema. This requires a lot more coding and maintenance. It also means more nuance to remain backward compatible.
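Here’s the additive pattern from the first paragraph, sketched in SQLite with a made-up schema. Old queries keep working after a column is added. With a service in front of the data, you’d also have to extend and version the API before any client could see the new column, which is the extra work described above.

```python
import sqlite3

# Made-up schema for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")

# A query written by existing ("v1") code before the schema change.
V1_QUERY = "SELECT id, name FROM users"
v1_before = conn.execute(V1_QUERY).fetchall()

# Additive change: add a column with a default, leave existing columns alone.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT NOT NULL DEFAULT ''")

# New ("v2") code can start using the column...
conn.execute("UPDATE users SET email = 'ada@example.com' WHERE id = 1")
v2_rows = conn.execute("SELECT id, name, email FROM users").fetchall()

# ...while the v1 query keeps returning exactly what it did before.
v1_after = conn.execute(V1_QUERY).fetchall()
print(v1_before, v1_after, v2_rows)  # [(1, 'Ada')] [(1, 'Ada')] [(1, 'Ada', 'ada@example.com')]
```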

Related: Why generalists are better at scaling the web

3. Consistency of data on restore

This is one factor that is often forgotten. Suppose you have an orders service and a users service. The users db behind the users service fails, and must be restored. When the db is restored, a new user created just before the failure is lost. However, the orders service still has a record that references that lost user. What then?

From there we would need a cleanup routine that goes around and removes inconsistent child records after a failure. Alternatively, we would need a way to back up all dbs from all microservices at a consistent point in time. *NOT* an easy task.
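Here’s roughly what that cleanup routine looks like, sketched with two in-memory SQLite databases standing in for the users and orders services’ separate stores (all names hypothetical). Because the data lives in different databases, neither one can check the relationship; the application has to pull ids from one side and compare. Loading every user id into memory is fine for a sketch, but wouldn’t scale to a large table.

```python
import sqlite3

# Two separate databases, as in the one-database-per-service model (names hypothetical).
users_db = sqlite3.connect(":memory:")   # behind the users service
orders_db = sqlite3.connect(":memory:")  # behind the orders service
users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
orders_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT)")

# State after restoring the users db: user 3 signed up just before the failure,
# so they're missing from the backup, but the orders db still has their order.
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "book"), (11, 2, "lamp"), (12, 3, "desk")],
)


def find_orphaned_orders(users_conn, orders_conn):
    """Return ids of orders whose user_id no longer exists in the users database."""
    # No cross-database join is possible, so the application compares ids itself.
    known_users = {user_id for (user_id,) in users_conn.execute("SELECT id FROM users")}
    return [
        order_id
        for order_id, user_id in orders_conn.execute("SELECT id, user_id FROM orders ORDER BY id")
        if user_id not in known_users
    ]


def delete_orders(orders_conn, order_ids):
    """Remove the given orders; whether to delete or quarantine them is a business call."""
    orders_conn.executemany("DELETE FROM orders WHERE id = ?", [(i,) for i in order_ids])
    orders_conn.commit()


orphans = find_orphaned_orders(users_db, orders_db)
print(orphans)  # [12]
delete_orders(orders_db, orphans)
```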

Shared database solves these problems in an elegant way.

Related: Why i ask for a deposit

4. Improving performance

Allowing access to the orders & users tables in the same db call means eliminating all those slow API calls, the associated network congestion, and more. It centralizes that work, allowing you to do SQL joins. Here the database does the heavy lifting of slicing and dicing, returning only the rows you need.
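With both tables in one database, the question that took two service calls becomes a single query. This SQLite sketch (schema invented for illustration) joins and aggregates in one round trip. It also turns on a foreign key, so the database itself rejects an order for a user that doesn’t exist; and since a single backup covers both tables, a restore brings them back to the same point in time.

```python
import sqlite3

# Schema invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users(id),
                         total_cents INTEGER NOT NULL);
    INSERT INTO users  VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 2500), (11, 1, 1200), (12, 2, 999);
""")

# One round trip: the database joins and aggregates, returning only the rows we need.
rows = conn.execute("""
    SELECT u.name, COUNT(o.id) AS order_count, SUM(o.total_cents) AS spent_cents
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY u.name
""").fetchall()
print(rows)  # [('Ada', 2, 3700), ('Grace', 1, 999)]

# The database itself refuses an order for a user that doesn't exist.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 100)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```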

Related: How progress reports can help engagements succeed

5. Should we bring back the DBA job role?

When we centralize our database, we also centralize responsibility. There too, we return to the old debate of ops versus devs. I wrote an article years back titled the four-letter word dividing dev and ops.

When we have a job role for the database management, they have a mandate. Ensure backup & recovery, consistency & performance. Watch things. Monitor. Provide care & feeding. Put fences around applications, and constraints on data going in and out.

All these things are good things. And just as you want a building inspector to be different from the building developer, you want that separation of job roles in software too.

Related: How to hire a developer that doesn’t suck

What was the best decision you made in your career?

I was recently asked this question by a colleague. I thought about it for a moment, and the answer was clear.

Going independent has been the best decision of my career.

I started at the birth of the internet explosion, in the mid-nineties when Mozilla became real. The dot-com era took off, and so did the demand for engineering talent.

1. Going independent

I had just moved to New York, so the timing was right. I had experience running my own business in my teen years. That streak of independence drove me to do the same with my technology skills. Call it a hunger. A need to go it alone, make my own way in the world.

Related: When you have to take the fall

2. Self directed career

The advantages of going it alone are a double-edged sword. On the one hand you can steer towards projects you find interesting, and upgrade your skills in those directions. The downside is you’re taking on all the risk. If you’re wrong about the direction of the industry, you’ll have wasted your time, money, and resources.

I wrote previously about that in Why do people leave consulting. It’s one reason among many.

Related: When clients don’t pay

3. Wide ranging exposure

For many in the traditional FT career track, you may work for 5-10 companies in the course of 20 years. In my case I’ve worked for close to 200 firms in that time.

In that process, you get exposure to human problems & challenges, to product design & development problems, and to architectural issues. And at that scale, patterns begin to emerge, as you see certain types of issues repeat themselves. This becomes valuable insight.

Related: Why i ask for a deposit

4. Build survival skills

As I mentioned previously, independence is a double-edged sword. You build survival skills, but you need them. There’s no net beneath you, protecting you from falling. So you’re forced to make hard decisions about how you spend your time: finding projects, networking, learning new skills, and delivering in a real way to your customers.

The dividend is that now you have survival skills. And those indeed are very valuable.

Related: Why i ask for a deposit

5. Good money

There is a myth that consultants make more money. But then I hear stories of someone getting laid off, and getting a 4 or 5 month severance. That’s shocking to me. What’s more, people often forget about the value of days off, health care & other benefits, and, a huge one, the chance to upgrade your skills. If a firm is offering you this, take advantage!

Remember that you’ll get none of these benefits working for yourself, unless you’re successful enough to reward yourself in this way. That means having a good pipeline of projects, and a trail of happy successful customers behind you. They will tell your story, and sell you to colleagues.


Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Does migrating to the cloud require a mindset change of your team?


We’ve all heard the success stories at firms that have grappled with automation. The dividends are legendary.

Take Amazon itself, for example. By decoupling their teams and allowing each to grow independently at its own pace, they’ve been able to scale massively.


One look at the AWS dashboard these days, or at its Wikipedia page, reveals over 90 services on offer, and each of those is growing and expanding by the day.

I’ve worked with a lot of startups trying to get there. They’ve heard the gospel and want to gain the benefits themselves.

Here are the challenges I’ve found.

1. Building ain’t easy

One example was building an ELK box. ELK is Elasticsearch, Logstash, and Kibana. It gives you a centralized place to send all your application & service logs and view them together on one dashboard. It’s the business intelligence of devops & software development. Super valuable tool.

In building our solution, we took a marketplace AMI off the shelf and customized it. After writing the Terraform code to spin up the server, we added Ansible scripts to customize it further. This allowed us to add a cronjob for backups, set a password, add additional Logstash configs, and handle a few other important housekeeping tasks.

All was great until we hit a snag: some CloudWatch logs were not making their way into ELK. Digging through the log messages, we eventually uncovered an error caused by a conflicting port configuration. So we removed the unused configuration from logstash.conf, and problem solved.
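A conflict like this can be caught before it ever reaches a server. Below is a minimal Python sketch, assuming the input section of logstash.conf has already been reduced to (plugin, port) pairs; the function name `find_port_conflicts` and the sample data are hypothetical, not part of Logstash itself.

```python
from collections import defaultdict


def find_port_conflicts(inputs):
    """Return {port: [plugins]} for every port claimed by more than one input.

    `inputs` is a list of (plugin_name, port) tuples, a hypothetical
    pre-parsed view of the input section of logstash.conf.
    """
    owners = defaultdict(list)
    for plugin, port in inputs:
        owners[port].append(plugin)
    # Keep only the ports that more than one input is trying to bind.
    return {port: plugins for port, plugins in owners.items() if len(plugins) > 1}


if __name__ == "__main__":
    # Hypothetical example: two inputs both configured to listen on 5044.
    sample = [("beats", 5044), ("tcp", 5000), ("http", 5044)]
    for port, plugins in find_port_conflicts(sample).items():
        print(f"port {port} is claimed by: {', '.join(plugins)}")
```

Running a check like this as part of the automation, rather than reading error logs after the fact, turns a silent log gap into a failed deploy.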

Later, we rebuilt the server, and that was pretty quick. Having all the scripts in place meant we could rebuild quickly. In this case we just needed to resize the root volume by 25x to make room for future logs. That was 3 lines of Terraform code, and done!

A couple of weeks later, however, we found missing logs again. Digging, digging, digging, we finally discovered it was a repeat of our old problem! Turns out the change to logstash.conf never got rolled into the automation scripts. It was done manually! Bad bad!

Moral of the story: with automation, your workflow needs to change. You should *always be working on the scripts* and then reapplying them. Never work on the server directly!
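One way to enforce that rule is to check regularly for drift: compare each managed file on the server against the copy your automation scripts would deploy, and flag any difference. Here is a minimal Python sketch; the file paths are hypothetical, and Ansible’s `--check --diff` mode may fill a similar role if your configs are deployed by Ansible.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_drift(pairs):
    """Return the live files whose contents differ from the repo copy.

    `pairs` maps a live path on the server to the path of the same file
    in the automation repo (both are hypothetical examples).
    """
    drifted = []
    for live, managed in pairs.items():
        # Any mismatch means someone edited the server by hand.
        if sha256_of(live) != sha256_of(managed):
            drifted.append(live)
    return drifted
```

Run from cron or CI against something like `{"/etc/logstash/conf.d/logstash.conf": "repo/files/logstash.conf"}` (paths assumed), a non-empty result would have flagged the hand-edited config long before the logs went missing a second time.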

Time to eat my own dogfood!

Related: Is AGILE right for fixing performance issues?

2. Troubleshooting is tough

In the automation universe, as I wrote above, you really want to avoid logging into servers and doing things manually. But that may be easier said than done.

Take another example: I had an SSH key distribution script that I repurposed from the Terraform Community Modules. It works great when it works. Terraform injects it onto the server at boot time, inside the user-data script.

The code gets added to cron and relies on awscli. As it turns out, awscli is *not* on all of the AWS Linux images. Who knows why?!? But that’s where we are.

It should be easy to install: use yum to install pip (the Python package manager), then use pip to install awscli. The script even has *both* yum and apt-get commands, to attempt to install pip on either Ubuntu or Amazon Linux. Problem is, sometimes it doesn’t. Sometimes, you ask? Yes indeed.

Digging further, it seems that the newer pip package installs into /usr/local/bin, while it used to install into /usr/bin/. Seems simple: add a symlink. Yeah, did that. Sometimes the package has a different name, such as python-pip3. Great!
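Rather than hard-coding one location, a script can probe every place pip might live. This minimal Python sketch uses the standard library’s `shutil.which` with an explicit search path; the directory list and the binary names `pip` and `pip3` are assumptions based on the variations described above, not an exhaustive list.

```python
import os
import shutil

# Assumed install locations and binary names; adjust for your images.
CANDIDATE_DIRS = ("/usr/local/bin", "/usr/bin")
CANDIDATE_NAMES = ("pip", "pip3")


def locate_pip(dirs=CANDIDATE_DIRS, names=CANDIDATE_NAMES):
    """Return the full path of the first pip executable found, or None.

    Only the given directories are searched, in order, so a pip under
    /usr/local/bin is found even when that directory is missing from the
    PATH a cron job or boot script runs with.
    """
    search_path = os.pathsep.join(dirs)
    for name in names:
        found = shutil.which(name, path=search_path)
        if found:
            return found
    return None
```

Passing an explicit path matters because cron jobs often run with a minimal PATH (commonly just /usr/bin:/bin), which is one way a script can work from an interactive shell and fail under cron.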

Now all this is magnified because you can’t just log in to the box and go through the steps. Why? Because in the primordial Cro-Magnon universe that is Linux server boot time, sometimes things happen in weird orders, or more slowly. So something may be missing during boot that becomes available later. Log in after boot and you see no errors.

Yes, it’s complicated. And yes, you need to build, destroy, build, destroy the server in endless cycles.
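Because boot-time ordering is unpredictable, one common defense is to have the boot script wait for a dependency instead of assuming it is already there. Here is a minimal Python sketch of retry with exponential backoff; the attempt count and delays are arbitrary assumptions, and `wait_for` is a hypothetical helper name.

```python
import time


def wait_for(check, attempts=6, first_delay=1.0, factor=2.0, sleep=time.sleep):
    """Call `check()` until it returns a truthy value or attempts run out.

    Sleeps first_delay, then first_delay * factor, and so on between
    tries, so a dependency that appears late in the boot sequence is
    still picked up. Returns the truthy result, or None if every try failed.
    """
    delay = first_delay
    for attempt in range(attempts):
        result = check()
        if result:
            return result
        # No point sleeping after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
            delay *= factor
    return None
```

For example, `wait_for(lambda: shutil.which("aws"))` would poll for the `aws` command that awscli installs, and the script could log a clear error when it returns None instead of failing silently.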

At the next level of automation, we will implement an infrastructure testing pipeline. This will automatically build the server for you. The infrastructure unit testing framework seems pretty darn cool. And there is also Gruntwork’s Terratest.
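At its core, such a pipeline is a loop that builds a throwaway server, checks it, and destroys it no matter what. Below is a minimal Python sketch of that build, verify, destroy cycle. The three hooks are hypothetical stand-ins for whatever runs Terraform in your setup; Terratest itself is a Go library with its own API.

```python
from typing import Callable


def run_infra_test(build: Callable[[], dict],
                   verify: Callable[[dict], None],
                   destroy: Callable[[], None]) -> None:
    """Build a throwaway environment, verify it, and always tear it down.

    `build` returns details of what was created (e.g. a server address),
    `verify` raises AssertionError on failure, and `destroy` removes
    everything, the way a state-based `terraform destroy` would.
    """
    try:
        env = build()
        verify(env)
    finally:
        # Runs even if build or verify fails, so a broken test run does
        # not leave orphaned servers (and bills) behind.
        destroy()
```

Putting teardown in `finally` covers a build that fails halfway, which is exactly the kind of boot-time failure described above.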

Related: Is automation killing old-school operations?

3. The dividend is agility

What have I seen in terms of agility?

Well, moving our application to a new region takes 20 minutes. Crazy as that sounds, everything from the VPC, 3 private subnets, 3 public subnets, bastion boxes, load balancers, RDS & Redis instances, security groups, ingress rules, IAM roles, users, S3 buckets, the ECS cluster, and various EC2 instances, to Route 53 zones & CNAMEs, and even EIPs, can be moved with a few simple code changes. Wow!
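Much of that speed comes from treating the region as a single input and deriving everything else from it. As an illustration only (the setup described here used Terraform, and the 10.0.0.0/16 block, /20 prefix, and a/b/c zone suffixes below are assumptions), this Python sketch computes three public and three private subnets for whichever region you pass in.

```python
import ipaddress


def plan_subnets(region, vpc_cidr="10.0.0.0/16", az_suffixes="abc", new_prefix=20):
    """Lay out one public and one private subnet per availability zone.

    Only `region` changes when moving to a new region; the CIDR math stays
    the same. AZ names are assumed to be the region name plus a letter
    suffix (for example us-east-1a).
    """
    # Carve the VPC block into equal-sized, non-overlapping subnets.
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = []
    for tier in ("public", "private"):
        for suffix in az_suffixes:
            plan.append((tier, f"{region}{suffix}", str(next(blocks))))
    return plan


if __name__ == "__main__":
    for tier, az, cidr in plan_subnets("eu-west-1"):
        print(f"{tier:8} {az:12} {cidr}")
```

Switching `"eu-west-1"` to another region changes every zone name while the address plan stays identical, which is the kind of small code change that makes a 20-minute region move possible.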

What else? We can resize our ELK box root volume by deploying a brand new setup, all in about ten minutes.

This kind of speed is so exciting. It brings repeatability to your engineering processes. It brings confidence to all of those components.

And best of all it allows the business to experiment with new product ideas, and accelerate in the marketplace.

And we all know what that means!

Related: I have a new appreciation for AGILE
