Categories
Uncategorized

Did Disney have to fail?

Well it was a big day for Disney a couple of weeks ago when they launched the much anticipated Disney+ streaming service.

And of course as widely reported, they had a really bad outage.

We’ve heard the old refrain before. We never saw traffic like this before. We were absolutely buried.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

Indeed, Disney, in their own microcosm never saw traffic like that before. But does cloud computing know how to solve this problem? Has the software, techops and devops profession solved these problems?

Answer: Yes. Just look at Facebook, Google, Netflix or a million other high traffic sites.

Interestingly, I got a letter from a recruiter over at Disney just the other day!

Hi Sean,

I wanted to reach out again to see if you’ve had an opportunity to think about exploring a future role with Disney Streaming Services?

The company is in the midst of a massive engineering expansion, as Disney+ launched today in their US, Canadian and Dutch markets. While phased with natural issues, given the pure scope of their launch, paired with unforeseen customer engagement, the team is continuing to actively to build out their engineering team in order to ensure seamless launches in Western Europe and the Asian Pacific countries in late 2019/ early 2020.

While you may not actively be on the market, I would be interested in having a preliminary chat to see if joining and impacting one of the largest scale technical projects the international video streaming industry has ever seen, could align with future career goals?

Looking forward to your response!

**Name redacted** 

Related: Why generalists are better at scaling the web

1. Management maturity

When I was reading the article above I stopped at this line:

“This is a new ballgame for Disney”

And that tells the whole story doesn’t it? Just because the industry has solved a problem, just because there are technologists out there who know how to do this, doesn’t mean they work at Disney! And further doesn’t mean Disney’s management is ready for their new reality.

But they learn quickly!

Read: Infrastructure provisioning – what is it and why is it important?

2. Streaming is a solved problem

The challenge of streaming content on the internet is not new. The pipes are there, and the cloud can scale seamlessly. But yes, there are a lot of moving parts. That’s what testing is for. For a launch like this, one could easily launch a million test clients, using AWS regions and zones around the world, and point them all at your new streaming service. Don’t want to spend the money? Scale down your stack, and send a proportional amount of traffic.

If you’re not seeing the autoscaling happen quickly enough, spin up *lots* of spare compute before launch time. That’s another option. You can always scale back after the flash hits.
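
Here’s a minimal sketch of the “scale down your stack, send a proportional amount of traffic” idea. The function name and the numbers are hypothetical, and it assumes load scales roughly linearly with instance count, which shared pieces like a database may not honor.

```python
def proportional_test_clients(expected_peak_clients: int,
                              prod_instances: int,
                              test_instances: int) -> int:
    """Return how many simulated clients to aim at a scaled-down stack.

    Assumes (hypothetically) that capacity grows linearly with instances.
    """
    if prod_instances <= 0 or test_instances <= 0:
        raise ValueError("instance counts must be positive")
    if expected_peak_clients < 0:
        raise ValueError("client count cannot be negative")
    # Integer ceiling division, so we never round the test load down.
    return (expected_peak_clients * test_instances + prod_instances - 1) // prod_instances


# Hypothetical launch: 1,000,000 clients expected against 200 instances,
# rehearsed on a 10-instance copy of the stack.
clients = proportional_test_clients(1_000_000, 200, 10)
print(f"Send {clients} simulated clients at the test stack")
```

If the small stack falls over at its proportional load, the full stack probably will too. Passing the small test is weaker evidence, so a final full-size run is still worth the money.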

And yes, flash sales, daily deals or deal-a-day sites are another example. Think ideeli and the first unicorn, Gilt Groupe.

Sites like these deal with an explosion of traffic in a short period each day. Typically 90% of their traffic arrives within a half hour to one hour of the day. So it’s a real herd that pummels the site.
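
To put a number on that herd, here’s a quick back-of-the-envelope calculation. The function name is my own, and it assumes traffic is spread evenly inside the busy window, which real traffic never quite is.

```python
def peak_to_average_ratio(share_of_traffic: float, window_hours: float) -> float:
    """Ratio of the request rate inside the busy window to the all-day average.

    share_of_traffic: fraction of daily traffic that lands in the window (0-1].
    window_hours: length of the busy window in hours (0-24].
    """
    if not 0 < share_of_traffic <= 1:
        raise ValueError("share_of_traffic must be in (0, 1]")
    if not 0 < window_hours <= 24:
        raise ValueError("window_hours must be in (0, 24]")
    # Average rate is 1/24 of daily traffic per hour; the window sees
    # share_of_traffic / window_hours of daily traffic per hour.
    return share_of_traffic * 24 / window_hours


for hours in (1.0, 0.5):
    ratio = peak_to_average_ratio(0.9, hours)
    print(f"90% of traffic in {hours} hour(s): {ratio:.1f}x the daily average rate")
```

So a stack sized for the daily average is off by a factor of roughly 20 to 40 when the sale opens.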

Related: 6 Devops interview questions

3. Have doubts? Ask experts

To my mind, if a problem is solved, there’s no excuse to fail. But then it happens again at Airbnb and again at Dropbox and again at many others.

Yes we can monitor. Yes we can test. Yes we can automate. Yes we can react. But still systems fail.

As a small pitch, I’ve helped companies like Hollywood Reporter (100m uniques per month), AppNexus, ideeli and SoulCycle scale their systems for hypergrowth. It’s not easy but it can be done!

Read: Is zero downtime even possible on RDS?

4. Good problem to have?

Oscar Wilde said…

The only thing worse than being talked about is not being talked about.

And if today’s political climate is any indication, there is some prescient irony buried there.

So even though Disney+ and many other big names have failed…

Perhaps it’s a good problem to have?

Read: How to hire a developer that doesn’t suck

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

 

Categories
All Devops

When I found gold in my customer archives

I’m good at keeping notes. I’ve blogged about Can progress reports & daily notes help engagements succeed. I would give that an emphatic YES!

From helping with communication, to sharing arcane details about blocking issues, struggles & hurdles, notes can illuminate things that a CTO or manager may not otherwise be aware of.

I was digging through my archives recently, and found a bunch of notes on how to think about Terraform. In particular, how do you think about infrastructure as code? How do you architect to make it all work together?

1. A dead end started me backtracking

You’re going to dig your heels in by getting your application working. To do that you’ll spin up a VPC, private and public subnets, bastion boxes, ECS hosts to deploy containers to, and an application load balancer endpoint. Getting that all working wasn’t terrible. We even included a prometheus node, to give us some monitoring visibility. We even added our jenkins server into the mix. Do you see where this is going?

At a certain point we of course needed to destroy the whole setup, but didn’t want to destroy the CI pipeline. Duh! And what about monitoring? Lose all that data each time? No way!

Read: Infrastructure provisioning – what is it and why is it important?

2. Organize around VPCs

After dragging yourself through that, you see a bit better. It’s like standing at 20,000 feet.

Your VPC is a logical collection of instances. It’s a box that holds your application, provides security, and gets created and destroyed with it. You can even see in your Terraform code that a subnet requires a VPC id within which to create it. And an instance requires a subnet within which to create it. For security reasons the application instances will sit within PRIVATE subnets, and only bastion boxes & load balancers will sit in PUBLIC subnets.

To my mind that means each environment DEV, STAGE, PROD all get their own VPC. This also allows you to control who can access stage & production, as they have their own bastion access points.
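
The containment described above (VPC holds subnets, subnets hold instances, one VPC per environment) can be sketched as a small data model. This is only an illustration in Python, not Terraform, and every class, role label and resource name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Roles allowed in public subnets; everything else stays private.
PUBLIC_ROLES = {"bastion", "load_balancer"}


@dataclass
class Instance:
    name: str
    role: str  # hypothetical labels: "app", "bastion", "load_balancer"


@dataclass
class Subnet:
    name: str
    public: bool
    instances: List[Instance] = field(default_factory=list)


@dataclass
class Vpc:
    name: str
    subnets: List[Subnet] = field(default_factory=list)


def build_environment(env: str) -> Vpc:
    """Build one VPC for an environment, with its own bastion access point."""
    public = Subnet(f"{env}-public", public=True, instances=[
        Instance(f"{env}-bastion", "bastion"),
        Instance(f"{env}-alb", "load_balancer"),
    ])
    private = Subnet(f"{env}-private", public=False, instances=[
        Instance(f"{env}-ecs-1", "app"),
    ])
    return Vpc(f"{env}-vpc", [public, private])


def placement_errors(vpc: Vpc) -> List[str]:
    """List instances that sit in a public subnet but should not."""
    return [
        f"{vpc.name}/{subnet.name}: {inst.name} belongs in a private subnet"
        for subnet in vpc.subnets if subnet.public
        for inst in subnet.instances if inst.role not in PUBLIC_ROLES
    ]


environments = {env: build_environment(env) for env in ("dev", "stage", "prod")}
for env, vpc in environments.items():
    print(env, placement_errors(vpc) or "ok")
```

Because each environment is its own self-contained object, you can throw one away and rebuild it without touching the others.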

Read: How to hire a developer that doesn’t suck

3. Build a utility VPC

What you’ll also see from the above story is that you need a place to have business wide, non-application services sit inside. Welcome the UTILITY VPC!

This can contain prometheus, ELK or other log collection service, your jenkins or other CI pipeline, and any other services that don’t logically fit within the application VPC.
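
As a rough sketch of why the split matters, here is a toy planner that puts shared services in a utility VPC and then checks what survives when an application VPC is destroyed. The service list, the VPC names and both functions are assumptions made up for illustration.

```python
from typing import Dict, Iterable, List

# Business-wide services that should outlive any single application VPC.
SHARED_SERVICES = {"prometheus", "elk", "jenkins"}


def plan_layout(services: Iterable[str], environment: str) -> Dict[str, str]:
    """Map each service to the VPC it should live in."""
    return {
        service: "utility" if service in SHARED_SERVICES else f"app-{environment}"
        for service in services
    }


def services_surviving_destroy(layout: Dict[str, str], destroyed_vpc: str) -> List[str]:
    """Return the services still running after one VPC is torn down."""
    return sorted(s for s, vpc in layout.items() if vpc != destroyed_vpc)


layout = plan_layout(["web", "api", "prometheus", "jenkins"], "dev")
print(layout)
print("Still up after destroying app-dev:", services_surviving_destroy(layout, "app-dev"))
```

Tear down the dev application VPC as often as you like, and the CI pipeline and monitoring history stay put.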

Related: Why generalists are better at scaling the web

4. A VPC should be ok to destroy and rebuild in another region – in one-click

When you use infrastructure code, you want to test, create & destroy often. That shouldn’t disrupt anything. That means all state data should sit outside of those instances. Logging data? Send it to logstash or CloudWatch. Application state? Keep that inside of an RDS instance. And you’ve tested those backups, right?

Speaking of RDS, I encountered problems with Amazon’s own backup & restore. For my money, I had a lot of problems and ended up writing a custom db dump script. That may require a custom restore too, so buyer beware. Here’s my story though… I tried to build infrastructure as code with Terraform and Amazon and it didn’t go as I expected.

You also may encounter issues when you move across regions, such as elastic IPs and so forth. And you’ll need to check and verify the code which creates and destroys S3 buckets and domain name certs. These areas gave me some hiccups, but you can work through it with diligence!

Read: Is zero downtime even possible on RDS?

Categories
All Devops Scalability Startups

How crazy can Kubernetes get?

I was flipping through Reddit and found this hilarious post referencing a Scott Adams Dilbert strip on Kubernetes.

What I found even funnier were the comments on the Reddit thread. Read on for fun!

1. Watch your memory

One engineer said he had a dockerized app running with 3GB of memory serving just 7 customers.

Not to be outdone, another ops guy pipes in that he has one using 180GB of memory serving just a few hundred customers.

Of course this is the internet, and along comes a guy who has an app using 1TB of memory, with only one user!

Optimization be damned!

Read: Infrastructure provisioning – what is it and why is it important?

2. Beware growing application complexity

As you dockerize your application, you can support multiple versions of software and packages. This can keep you flexible but also enable engineers to kick the can down the road. Technical debt is real!

What’s more each microservice *can* be on a different stack, using a different language and framework. But just because you *can* do something doesn’t mean you should.

Though docker & kubernetes will enable the above, keep in mind your team has to support it. Using some cool new language that hasn’t really achieved critical mass? Remember the engineer who championed that, and built your business crown jewels on top of that, will eventually leave. And when he or she does, you will be faced with the challenge of finding someone who knows the stuff!

Related: 6 Devops interview questions

3. What is a microserved monolith?

Well it’s not really a thing, except it sounds fun. And a bit absurd. If all those docker containers never get optimized, they probably have layer upon layer of useless stuff. Start with a smaller base image, and don’t include debug stuff or extra layers. And clean up after installing packages.

Here’s a more detailed howto on optimizing Docker image sizes.

Read: Is zero downtime even possible on RDS?

Categories
All CTO/CIO Devops Software Development Startups

How to think like a senior engineer

I just read this story, the Art of interrupting engineers. While much of what I read was pretty obvious from the engineer’s perspective, the product & project manager perspective was illustrative.

There are definitely things that senior or more experienced engineers do differently. And those things can be learned.

Here are my thoughts…

1. Document all the things

By documenting all the little pieces you are working on you gain in a few ways. You communicate to management complexity they may not see. You buy yourself more time.

Finally you may even help yourself see implicit tasks more clearly. Whenever I hear myself or someone else utter the phrase “that should be easy”, I know I’m onto one of those mysterious tasks that seems simple but never is.

Be relentless. Break big tasks into smaller ones, and ticket each and every darn one!

Also: How can 1% of something equal nothing?

2. Communicate more and better

If you’re doing agile, chances are you’re joining a standup every day.

Those are opportunities to share what is blocking.

o What tradeoffs are you struggling with?
o What technical debt is slowing you down?
o What new discoveries may require a rework of the timeline?

There is surely an art buried in communication. You want to be descriptive. You don’t want to come off complaining. You want to educate stakeholders. Beware of coming off as dismissive.

Related: Is maintenance sometimes a forgotten art?

3. Anticipate. Under promise & over deliver.

If you’ve gotten in the weeds with a particular API before, you’re likely to have a gut feeling about how new features and changes may go well or poorly. Or you may have dug through the comments. Maybe you didn’t find any!?

Or maybe the particular codebase sits on top of an interface or library you haven’t worked with before. So there will be a bit more of a learning curve. Whatever the case, try to promise less than you think you can really deliver on.

You can always finish extra tickets and over deliver later!

Read: What tools and technologies are devops engineers using today
