Did Disney have to fail?

Well, it was a big day for Disney a couple of weeks ago when they launched the much-anticipated Disney+ streaming service.

And of course, as widely reported, they had a really bad outage.

We’ve heard the old refrain before. We never saw traffic like this before. We were absolutely buried.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

Indeed, Disney, in their own microcosm, never saw traffic like that before. But does cloud computing know how to solve this problem? Have the software, techops and devops professions solved these problems?

Answer: Yes. Just look at Facebook, Google, Netflix or a million other high traffic sites.

Interestingly, I got a letter from a recruiter over at Disney just the other day!

Hi Sean,

I wanted to reach out again to see if you’ve had an opportunity to think about exploring a future role with Disney Streaming Services?

The company is in the midst of a massive engineering expansion, as Disney+ launched today in their US, Canadian and Dutch markets. While phased with natural issues, given the pure scope of their launch, paired with unforeseen customer engagement, the team is continuing to actively to build out their engineering team in order to ensure seamless launches in Western Europe and the Asian Pacific countries in late 2019/ early 2020.

While you may not actively be on the market, I would be interested in having a preliminary chat to see if joining and impacting one of the largest scale technical projects the international video streaming industry has ever seen, could align with future career goals?

Looking forward to your response!

**Name redacted** 

Related: Why generalists are better at scaling the web

1. Management maturity

When I was reading the article above I stopped at this line:

“This is a new ballgame for Disney”

And that tells the whole story, doesn’t it? Just because the industry has solved a problem, just because there are technologists out there who know how to do this, doesn’t mean they work at Disney! And further, it doesn’t mean Disney’s management is ready for their new reality.

But they learn quickly!

Read: Infrastructure provisioning – what is it and why is it important?

2. Streaming is a solved problem

The challenge of streaming content on the internet is not new. The pipes are there, and the cloud can scale seamlessly. But yes, there are a lot of moving parts. That’s what testing is for. For a launch like this, one could easily launch a million test clients, using AWS regions and zones around the world, and point them all at your new streaming service. Don’t want to spend the money? Scale down your stack, and send a proportional amount of traffic.

If you’re not seeing the autoscaling happen quickly enough, spin up *lots* of spare compute before launch time. That’s another option. You can always scale back after the flash hits.
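
The "scale down your stack, and send a proportional amount of traffic" idea is simple arithmetic. Here is a minimal sketch; the client and server counts are made-up numbers for illustration, not anything Disney has published, and the function name is hypothetical.

```python
def scaled_test_clients(expected_peak_clients: int,
                        production_servers: int,
                        test_servers: int) -> int:
    """Return how many simulated clients to aim at a scaled-down stack.

    Assumes load spreads roughly evenly across servers. Real systems only
    approximate this: shared databases and caches do not shrink linearly.
    """
    if production_servers <= 0 or test_servers <= 0:
        raise ValueError("server counts must be positive")
    # Integer ceiling division, so we never under-test by rounding down.
    total = expected_peak_clients * test_servers
    return (total + production_servers - 1) // production_servers


# Hypothetical launch: 1,000,000 clients expected against 200 servers,
# rehearsed on a 10-server copy of the stack.
print(scaled_test_clients(1_000_000, 200, 10))  # 50000
```

The even-spread assumption is the weak point, so treat the result as a starting figure for the rehearsal and not a guarantee about the full-size launch.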

And yes, flash sales, daily deals or deal-a-day sites are another example. Think ideeli and the first unicorn, Gilt Groupe.

Sites like these deal with an explosion of traffic in a short period each day. Typically 90% of their traffic occurs in half to one hour of the day. So it’s a real herd that pummels the site.
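
To see why that herd hurts, compare the busy window with the same traffic spread evenly over 24 hours. This is a rough sketch using the 90% figure above; the function name is hypothetical.

```python
def peak_to_average_ratio(peak_share: float, peak_hours: float) -> float:
    """How many times busier the peak window is than an even 24-hour spread.

    peak_share: fraction of daily traffic landing in the window (0 to 1).
    peak_hours: length of that window in hours.
    """
    if not 0 < peak_share <= 1:
        raise ValueError("peak_share must be in (0, 1]")
    if not 0 < peak_hours <= 24:
        raise ValueError("peak_hours must be in (0, 24]")
    return peak_share * 24 / peak_hours


# 90% of the day's traffic in one hour, then in half an hour.
print(f"{peak_to_average_ratio(0.9, 1.0):.1f}x")  # 21.6x
print(f"{peak_to_average_ratio(0.9, 0.5):.1f}x")  # 43.2x
```

So capacity sized for the daily average would be short by a factor of twenty to forty during the sale.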

Related: 6 Devops interview questions

3. Have doubts? Ask experts

To my mind, if a problem is solved, there’s no excuse to fail. But then it happens again at Airbnb and again at Dropbox and again at many others.

Yes we can monitor. Yes we can test. Yes we can automate. Yes we can react. But still systems fail.

As a small pitch, I’ve helped companies like Hollywood Reporter (100m uniques per month), AppNexus, ideeli and SoulCycle scale their systems for hypergrowth. It’s not easy but it can be done!

Read: Is zero downtime even possible on RDS?

4. Good problem to have?

Oscar Wilde said…

The only thing worse than being talked about is not being talked about.

And if today’s political climate is any indication, there is some prescient irony buried there.

So even though Disney+ and many other big names have failed…

Perhaps it’s a good problem to have?

Read: How to hire a developer that doesn’t suck

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What can NYFW teach Chad Dickerson about net neutrality?

Here we are again discussing Net Neutrality… Chad Dickerson, CEO of the well-renowned Etsy.com, has come out strongly in favor and wants everyone to take action.

Honestly, when I read his Wired piece, Etsy CEO to businesses: If Net Neutrality Perishes, We Will Too, I was struck by one statement:

The FCC proposal will threaten *ANY* business that uses the internet to reach its customers.

Any business? Quite a sweeping statement. Strikes fear into me that’s for sure… And if you read through the comments, the debate is equally fierce. One side says net neutrality is socialism! The other side says anyone against net neutrality is a shill for Comcast or Verizon! Battle lines drawn!

1. Are all businesses at risk?

Isn’t the idea that Etsy will perish overstated? Are they a high bandwidth company? Are they trying to stream video?
Is the entire Etsy community alarmed? Isn’t that a rather broad statement?

To be sure, ending net neutrality will impact some businesses. Perhaps the reason VCs like Fred Wilson are so concerned about Net Neutrality isn’t the freedom of millions of internet users, but the threat to disruptive businesses, the startups that VCs directly invest in.

Read: Which tech do startups use most?

2. Will all internet users be impacted?

Here again some of this debate seems overstated. I remember using the internet on a dialup modem. 300 baud was about the speed at which you can type. Then along came 14.4, 28k and upward speeds climbed. All the while the internet was usable. Could I do all the things I can today? Nope.

Even if these horrible Comcasts & Verizons reduce speeds by 100 times, they will still be plenty fast for most internet users. Sure, streaming video would be impacted, and yes, streaming music would be impacted. But I would argue most end users would not be impacted. It is rather the disruptive startups & businesses that would be most impacted.

Also: Is automation killing old-school operations?

3. Are there anti-EDU parallels?

In the mid-nineties, before the dot-com bubble, there was a huge raging debate about even having commercial entities on the internet at all. Enlightened internet cognoscenti considered it an abomination.

But the real world pushed its nose in, and today we take it as a given.

Check this: Is Hunter Walk right about operations & startups?

4. Is Google right about millisecond delays?

“Research from Google & Microsoft shows that delays of milliseconds result in fewer page views and fewer sales in both the short & long term”. Yep, that’s a fact. The research shows this. But what do we take away from that?

As a performance and scalability consultant I see a *TON* of websites that have huge delays, well over the tiny millisecond ones that Google frets over. Internet startups struggle with performance every day.

What’s the irony? Slowdowns that Comcast or Verizon might introduce to end users pale in comparison with these larger systemic problems.

Also: 5 Ways startups misstep on scalability

5. Any lessons from sites of New York Fashion Week?

I like the Pingdom speed test tool. I used it to track the speed of some of the websites & blogs that are big for NYFW. Here’s what I found:

[Image: NYFW speed test results]

What do you see? Take a look at the SIZE column. Notice something strange? The LARGEST sites, in terms of images, css & assets aren’t necessarily the SLOWEST! That’s a funny result if you consider net neutrality. If you think the network speed is the same for all websites, shouldn’t the smallest pages load fastest?

Not true at all. It’s a very simplistic way of viewing things. Fashionista.com for example is doing a ton of tuning behind the scenes. As you can see it is making their site far and away the fastest! Network bandwidth and net neutrality be damned!

Related: Are SQL Databases Dead?

5 startup & scalability blogs I never miss – week 2

Hunter Walk – Startups

If you want to have your finger on the pulse of startup land, there aren’t many better places to start than Hunter Walk’s 99% humble writings. Google finds his top posts on topics like AngelList, Advisors, and reinventing the movie theatre. Good writing, insider’s view.

Read: NYC technology startups are hiring

Arnold Waldstein – Marketing

I first found Arnold’s blog using my trusty Disqus discovery hack. He had written an interesting piece about new mobile shopping at popup stores like Kate Spade.

Follow him on Disqus, follow the blog, get the newsletter. All good stuff.

Read This: Why hiring is a numbers game

Claire Diaz Ortiz – Social Media

Claire writes a lot about social media, twitter & blogging. She wrote an excellent guide to increasing your pagerank, another on 30 important people to follow on twitter and more. She can even help you find a job.

Check out: Top MySQL DBA Interview questions for candidates, managers & recruiters

Bruce Schneier – Security

Bruce Schneier is one of the original bad boys of computer security. He writes about broad topics that affect us all every day, from common sense about airport security to the impacts of cryptography for you and me. Very worth looking at regularly, just to see what he’s paying attention to.

Also: Why operations & MySQL DBA talent is hard to find

Eric Hammond – Amazon Cloud

Eric Hammond has been writing about Amazon Web Services, EC2 & Ubuntu for years now. He maintains and releases some excellent AMIs, the machine images for spinning up new servers in Amazon’s cloud.

Even if you’re not big on the command line, you can get a lot of critical insight about the Amazon cloud by keeping up with his blog. Jeff Barr’s AWS blog is also good, but not nearly as critical and boots on the ground as Eric’s.

Also: 8 Questions to ask an AWS expert

Get some in your inbox: Exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

Anatomy of a Performance Review

A lot of firms come to us with a specific scalability problem. “Our user base is growing rapidly and the website is falling over!” Or they’re selling more widgets, “Our shopping cart is slowing down and we’re seeing users abandon their purchases”. These are real startup growing pains, so what to do?

We like to take a measured approach with these types of challenges, so we thought it would be helpful to run through a hypothetical scenario and see how we work.

Related: Why website speed is crucial to business

Having trouble with scalability? Check out our 5 things toxic to scalability piece.

1. Contract outline

First we talk on the phone, or meet face to face and discuss what’s happening. Do you have one page that’s problematic? Is the website slow during certain hours? Or are you seeing erratic behavior and can’t point to a single source?

From there we outline a course of action, based on:

  • talking with the team, devs & architects
  • reviewing systems first hand
  • identifying bottlenecks and trouble spots

With this outline we’ll include an estimate of the number of work days it’ll take to complete. We’ll then send that back to you for review, exchange a deposit and set a start date.

2. Meet team & discuss architecture

Next we’ll meet the team and review the problems in more technical detail. If you’re in NYC we’ll probably make a stop into your offices and have a warm meet & greet. If you’re located further afield we can either meet over a Skype call, or arrange for us to travel to your location for the start of the engagement.

3. Measure current throughput

In order to get a sense of the current state of the systems we’ll measure some system metrics. This could be load average or queries per second or other MySQL internal metrics. We’ll also look at some business metrics such as speed of an ecommerce checkout, or a speed test on a particularly slow page.

These metrics are designed to create a baseline of where things are before any changes are made.
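
As a rough illustration of the queries-per-second baseline: MySQL exposes a `Questions` status counter that only ever increases, so two readings taken some seconds apart give a rate. The sketch below assumes you have already fetched those two readings yourself; the `CounterSample` type and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of a counter that only ever increases."""
    value: int       # e.g. MySQL's Questions status counter
    taken_at: float  # seconds, e.g. from time.time()


def queries_per_second(first: CounterSample, second: CounterSample) -> float:
    """Average rate between two readings of the same counter."""
    elapsed = second.taken_at - first.taken_at
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    return (second.value - first.value) / elapsed


# Hypothetical readings taken one minute apart.
baseline = queries_per_second(CounterSample(1_000, 0.0),
                              CounterSample(4_000, 60.0))
print(baseline)  # 50.0
```

Record the number alongside the business metrics, so the same measurement can be repeated after changes are made.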

Measuring both business and system metrics before and after changes allows a rough ROI measurement to be done. This goes a long way towards justifying the expense of a performance review, current and future.

4. Review systems, configurations & setups

Next we’ll jump on the various systems and review configurations. This includes webservers, caching servers and the database servers as necessary. We’ll review memory settings, important configurations, all the dials and switches.

Along with this we’ll also review development and architecture. Are you using Java with Hibernate, a popular ORM? Or perhaps CakePHP? Are you writing custom SQL code? Are developers up to speed with EXPLAIN and query profiling? For that matter, is code in version control?
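
For readers who haven’t used EXPLAIN, here is a small self-contained taste. To keep it runnable without a database server it uses SQLite’s `EXPLAIN QUERY PLAN` from the Python standard library as a stand-in for MySQL’s `EXPLAIN`; the output format differs between the two, and the `orders` table and index name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)


def plan_for(connection: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


query = "SELECT total FROM orders WHERE customer_id = 42"

before = plan_for(conn, query)   # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan_for(conn, query)    # with the index: expect an index search

print("before:", before)
print("after: ", after)
```

The habit is the same on any database: ask for the plan, look for full scans on large tables, add or adjust an index, and check the plan again.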

Just looking for a DBA? Check out our MySQL Hiring Guide.

5. Report on actionable advice & findings

Perhaps the most essential and useful part of an initial engagement is our overall findings and review report. We’ve found these are very valuable to firms as they speak to a lot of folks up and down the business hierarchy. They speak to management about high level architectural problems and structural or process related challenges. And they can speak well to developers and operations teams as they provide a third party bird’s-eye view of day-to-day activities.

Take a look at a sample report we’ve prepared for Acme StartUp, Inc.

6. Discuss which steps to move on

From here we’ll meet again. In particular we’ll review the actionable advice. Some changes will be low cost, requiring no downtime, while others might require a downtime window. Further medium term changes might require refactoring some code and deploying. Typically the larger longer term architecture changes will also be outlined.

Based on time & costs, we’ll decide together which changes are a priority. Obviously we’ll want to move on low hanging fruit first, and move forward from there.

Want to learn more about us? Check out our testimonials and our about page.

7. Take action on agreed changes

Once we’ve decided which changes we’ll make, we’ll schedule downtime windows as needed and make the changes to systems. From there we’ll carefully observe everything for stability and for any adverse effects.

8. Measure throughput again

Based on the throughput measurements in #3 above, we’ll perform those same benchmarks again. We’ll check low level system metrics, along with higher level business & user based throughput. Both of these are important as they can provide different perspectives on changes made.

For example, if the system metrics improve markedly but the business or user metrics do not, we know our change had some effect on overall performance, but we likely did not identify the bottleneck that is directly causing the business slowdown.
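
A before-and-after comparison can be as simple as a percent change per metric. In this sketch the metric names and numbers are invented to illustrate the case just described, where a system metric moves a lot and a business metric barely moves.

```python
def percent_change(before: float, after: float) -> float:
    """Percent change from the baseline value to the new value."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


def compare(baseline: dict, current: dict) -> dict:
    """Percent change for every metric present in both measurements."""
    return {
        name: percent_change(baseline[name], current[name])
        for name in baseline
        if name in current
    }


# Hypothetical numbers: throughput is a system metric, checkout time a business one.
baseline = {"queries_per_second": 400.0, "checkout_seconds": 8.0}
current = {"queries_per_second": 600.0, "checkout_seconds": 7.8}

for name, change in compare(baseline, current).items():
    print(f"{name}: {change:+.1f}%")
```

Here throughput rose by half while checkout time hardly moved, which is the signal to keep looking for the bottleneck that matters to the business.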

9. Summarize findings & performance gain

In the most likely case they both improve markedly, and we can measure the improvements from our entire process of performance review.

This can be helpful in measuring overall return on investment for the engagement. ROI is obviously an important exercise, as we want to know that the money is well spent.

10. Document solutions & recommendations

The last step is to document what we did and what we learned. This allows us to carry forward that knowledge and keep applying it to the development and operations process. This allows the business to continue adding value from the engagement even after it’s completed.

Read this far? Grab our newsletter.

31 Essential Blogs for Startups & Scalability

So many blogs, so little time! Here’s our list of the best we’ve found. Currently our favorite reader is Pulse. Starting to play around with Flipboard too.

Nuts & Bolts Technical

Slashdot
One of the original tech blogs, that still covers lots of breaking news, and difficult topics. Very technical, with probing commentary. Beware the actual comments though, as they’re often full of immature and childish rants.

Planet MySQL
An aggregator of many MySQL blogs, it hits on topics from benchmarking and advanced tuning to new technologies on the horizon. Drupal and LAMP topics are often also covered.

MySQL Performance Blog
Percona’s technical blog never disappoints. There are endless posts about a myriad of topics related to deploying, tuning and optimizing MySQL and all its variants.

Hacker News
You may not like Paul Graham, he’s easy not to. But his YCombinator News site is an awesome collection of always surprising technical topics that are sure to keep you busy.

Netflix Tech Blog
You might have read about Chaos Monkey before, that

Our very own Scalable Startups
You’re already reading us regularly of course! Why not grab our newsletter?

Programmable Web
Mashups & APIs. What more do you want? Very cool stuff here.

Also take a look at our best of compilation.

Business & Economics

HBR Blog
If you’ve ever read Harvard Business Review, you know how in depth and on point the material is. More thorough discussions than many other blogs, and excellent discussions in the comments.

Marginal Revolution
Tyler Cowen’s endlessly interesting and provocative take on the world through the eyes of economics. Like using science to analyze and solve the world’s ills, this blog always has a reasoned take on things.

NPR Planet Money
I’ve been listening to this podcast religiously since the financial crisis of 2008. It continues to intrigue and educate me in ways that college finance never did. You’ll learn a lot.

Bloomberg Businessweek
BBW, despite its name, is like Wired back in the 90’s before it got taken over by Conde, and the cutting edge writers and risk takers left. That’s right, this magazine is full of analysis, creativity, and color. It’s what you’re looking for in a print magazine. One of my favorites.

Inc. Magazine
Real articles for your small business needs today. Thoughtful and topical.

Forbes Magazine
Banking, finance, politics, news.

You might also check out our Scalable Startups newsletter archives.

Venture

A VC
Fred Wilson’s iconic blog is always on the cusp, with a thoughtful and participating audience of readers.

Infochachkie
John Greathouse is a VC with a very readable blog on startups and investing.

Chris Dixon
This guy invested in tons of great startups that are household names now. With a very readable blog to match, he’s a man with ideas that we all benefit from.

Springwise
As they call it, your “Essential Fix of Entrepreneurial Ideas”.

Feld Thoughts
Brad Feld is another big VC with an excellent blog on topics relevant to Venture Capital & Startups.

Social

Andrew Chen
Consumer internet, metrics, and user growth. Brilliant idea guy. I learn from this guy’s blog every time I check it.

Problogger
The smarties behind the book of the same name, this is essential reading for bloggers who wanna make a dent in the world.

Blog Tyrant
How to build successful blogs that make real money. Learn from Ramsay Taplin who’s done it already. Whether your blog sells products, widgets or services, there’s stuff for you here.

Kissmetrics Marketing Blog
Very good stuff on marketing, twitter, facebook and all the other good social topics.

Mixergy Blog
Business tips & startup advice with a bent towards marketing and social.

Mark Schaefer Marketing
Mark’s the brains behind the great book Return on Influence which we reviewed. His Businesses Grow blog is full of helpful ideas and insights.

Figaro Speech
You may have read my review of Word Hero and seen the earlier review of Thank You For Arguing. His blog is a real gem, extending the wonders and lessons of Word Hero. You’ll be writing witty and memorable one-liners and titles that will go viral tomorrow!

Industry

Gigaom
Om Malik started out writing about the bandwidth boom and bust of the 2000’s. His blog has grown wildly to cover the industry as a whole, and contrary to the stuff you get on Business Insider, this is quality journalism.

AllThingsD
Another industry site with a great selection of journalists writing on the internet & startup industries.

ReadWriteWeb
Another excellent industry blog with slightly overlapping coverage to Gigaom and AllThingsD, but worth scanning each of them for different perspectives.

Venturebeat
Possibly a bit more venture and investment oriented than the others, but still mainly an industry coverage blog site.

Entrepreneur
Slightly more focus on business, and entrepreneurs, but also internet & startup industry topics.

Adweek
Trying to broaden my horizons by adding this one into the mix. Some very interesting topics, and plenty of overlap with internet industry and startups.

If you read this far, grab our newsletter!

Best of Guide – Highlights of Our Popular Content

We cherry pick the top 5 most popular posts of various topics we’ve covered in recent months.

Scalability Rules for managers and startups

Abbott and Fisher’s previous book, The Art of Scalability, received good reviews for shifting the way we think about scalability from merely splitting databases and adding servers, to include the human factors that weigh heavily on its success. Given that and the authors’ distinguished pedigree (PayPal, Amazon, and eBay between them), I picked up a copy of their second book, Scalability Rules – 50 Principles for Scaling Web Sites, without a second thought.

If Art was about laying a strong foundation for a scalable organization, then Rules is the reference point for when you actually tackle the growth challenges. It acts as a reminder, when you come to a crossroads of decision-making, to keep with the principles of scaling. Each guiding principle is clearly explained and illustrated with examples. It also prescribes how and when to apply the rules.

iHeavy Newsletter 84 – Restaurant Scalability

Restaurant Scalability

Could pro-waitering serve up some lessons on web scalability? Observing peak hour dining at a New York restaurant gave us some insight.

I was dining at a restaurant the other day with friends. It was a warm and cozy place, nicely decorated with a long, narrow dining room.  The food was scrumptious, yet we were getting increasingly frustrated by the service as the night went along.

With some waiting experience behind me, I could immediately see the problem. The waiters, probably through lack of experience, were making the mistake of doing one thing at a time. They would go to a table, respond to one customer’s request, and go and fetch that item. Back and forth, back and forth they would dart, but always dealing with one request at a time.

5 things toxic to scalability

The.Rohit - Flickr

Check out our followup post 5 More Things Deadly to Scalability

If you’re using MySQL, check out 5 ways to boost MySQL scalability.

1. Object Relational Mappers

ORMs are popular among developers but not among performance experts. Why is that? Primarily because these two kinds of engineers experience a web application from entirely different perspectives. The developer is building functionality and delivering features, and results are measured on fitting business requirements. Performance and scalability are often low priorities at this stage. ORMs allow developers to be much more productive, abstracting away the SQL difficulties of interacting with the backend datastore, and allowing them to concentrate on building the features and functionality.


Scalability is about application, architecture and infrastructure design, and careful management of server components.

On the performance side the picture is a bit different. By leaving SQL query writing to an ORM, you are faced with complex queries that the database cannot optimize well. What’s more, ORMs don’t allow easy tweaking of queries, slowing down the tuning process further.

Also: Is the difference between dev & ops a four-letter word?

2. Synchronous, Serial, Coupled or Locking Processes

Locking in a web application operates something like traffic lights in the real world. Replacing a traffic light with a traffic circle often speeds up traffic dramatically. That’s because when you’re out somewhere in the country where there’s very little traffic, no one is waiting idly at a traffic light for no reason. What’s more, even when there’s a lot of traffic, a traffic circle keeps things flowing. If you need locking, better to use InnoDB tables, as they offer granular row-level locking rather than the table-level locking of MyISAM tables.

Avoid things like semi-synchronous replication that will wait for a message from another node before allowing the code to continue.  Such waits can add up in a highly transactional web application with many thousands of concurrent sessions.

Avoid any type of two-phase commit mechanism, which we see quite often in clustered databases. Multi-phase commit provides a serialization point so that multiple nodes can agree on what the data looks like, but it is toxic to scalability. Better to use technologies that employ an eventually consistent algorithm.

Related: Is automation killing old-school operations?

3. One Copy of Your Database

Without replication, you rely on only one copy of your database.  In this configuration, you limit all of your webservers to using a single backend datastore, which becomes a funnel or bottleneck.  It’s like a highway that is under construction, forcing all the cars to squeeze into one lane.  It’s sure to slow things down.  Better to build parallel roads to start with, and allow the application aka the drivers to choose alternate routes as their schedule and itinerary dictate.
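
One common way to give the drivers those alternate routes is to send reads to replicas and writes to the primary. The sketch below is a toy router, not a real database driver: the server names are placeholders and the `ReplicaRouter` class is hypothetical. It also ignores replication lag, so a read that must see a just-written row should still go to the primary.

```python
import itertools


class ReplicaRouter:
    """Pick a server for each statement: replicas for reads, primary for writes."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # Cycle through replicas round-robin; fall back to primary if none exist.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicaRouter("primary-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM users"))         # replica-1
print(router.route("SELECT * FROM orders"))        # replica-2
print(router.route("UPDATE users SET name = 'x'")) # primary-db
```

With a single copy of the database every one of those statements would land on the same box.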

Using MySQL? Check out our howto Easy Replication Setup with Hotbackups.

Read: Do managers underestimate operational cost?

4. Having No Metrics

Having no metrics in place is toxic to scalability because you can’t visualize what is happening on your systems. Without this visual cue, it is hard to get business units, developers and operations teams all on the same page about scalability issues. If teams are having trouble grokking this, realize that these tools simply provide analytics for infrastructure.

There are tons of solutions too, many of which use SNMP and are non-invasive. Consider Cacti, Munin, OpenNMS, Ganglia and Zabbix, to name a few. Metrics collection can involve business metrics like user registrations, accounts or widgets sold. And of course it should also include low-level system CPU, memory, disk & network usage, as well as database-level activity like buffer pool, transaction log, locking, sorting, temp table and queries-per-second activity.
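
To make "low level system metrics" concrete, here is a tiny sketch of the kind of sample those tools collect on a schedule. It uses only the Python standard library and is in no way a replacement for Cacti, Munin or the others; the function name and dictionary keys are made up.

```python
import os
import shutil
import time


def system_snapshot(path: str = "/") -> dict:
    """Take one point-in-time reading of a few basic system metrics."""
    usage = shutil.disk_usage(path)
    snapshot = {
        "timestamp": time.time(),
        "disk_used_fraction": usage.used / usage.total if usage.total else 0.0,
    }
    # Load average is only available on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_1min"] = os.getloadavg()[0]
        except OSError:
            pass
    return snapshot


print(system_snapshot())
```

Store a reading like this every minute next to your business numbers and you have the beginnings of a graph that everyone on the team can look at.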

Also: Are SQL Databases dead?

5. Lack of Feature Flags

Applications built without feature flags make it much more difficult to degrade gracefully.  If your site gets bombarded by a spike in web traffic and you aren’t magically able to scale and expand capacity, having inbuilt feature flags gives the operations team a way to dial down the load on the servers without the site going down.   This can buy you time while you scale your webservers and/or database tier or even retrofit your application to allow multiple read and write databases.

Without these switches in place, you limit scalability and availability.
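
A feature flag can be as small as a named on/off switch checked at request time. This is a minimal in-memory sketch; the class, the flag name and the page function are all hypothetical, and a real setup would keep the flags somewhere operations can change them without a deploy.

```python
class FeatureFlags:
    """A tiny in-memory set of on/off switches."""

    def __init__(self, defaults: dict):
        self._flags = dict(defaults)

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)  # unknown flags count as off

    def disable(self, name: str) -> None:
        self._flags[name] = False

    def enable(self, name: str) -> None:
        self._flags[name] = True


def render_product_page(flags: FeatureFlags, product_name: str) -> list:
    """Build the page, skipping the expensive section when its flag is off."""
    parts = [f"Product: {product_name}"]
    if flags.is_enabled("recommendations"):
        # Stand-in for an expensive query against the database.
        parts.append("Recommended for you (expensive query)")
    return parts


flags = FeatureFlags({"recommendations": True})
print(render_product_page(flags, "widget"))  # full page

flags.disable("recommendations")             # traffic spike: dial down load
print(render_product_page(flags, "widget"))  # degraded but still up
```

The page loses a section, but the site stays up while capacity is added.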

Also: Is high availability overrated? The myth of five nines…

Amazon Web Services – What is it and why is it important?

Amazon Web Services is a division of Amazon the bookseller, but this part of the business is devoted solely to infrastructure and internet servers. These are the building blocks of data centers, the workhorses of the internet. AWS’s offering of cloud computing solutions allows a business to set up, or “spin up” in the jargon of cloud computing, new compute resources at will. Need a small single-CPU 32-bit Ubuntu server with two 20G disks attached? One command and 30 seconds later, you can have that!
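
For a sense of what that one command carries, here is a sketch that builds the request for a small server with two 20G disks. It only assembles the parameters and does not talk to AWS. The instance type and device names are my assumptions for illustration (current small instance types are 64-bit), and the AMI ID is a placeholder you would replace with a real Ubuntu image.

```python
def small_server_request(image_id: str) -> dict:
    """Parameters describing one small instance with two 20 GB EBS volumes."""
    return {
        "ImageId": image_id,         # placeholder AMI ID supplied by the caller
        "InstanceType": "t2.micro",  # assumed small, single-vCPU instance type
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/sdf",
             "Ebs": {"VolumeSize": 20, "DeleteOnTermination": True}},
            {"DeviceName": "/dev/sdg",
             "Ebs": {"VolumeSize": 20, "DeleteOnTermination": True}},
        ],
    }


request = small_server_request("ami-EXAMPLE")
print(request["InstanceType"], len(request["BlockDeviceMappings"]), "extra disks")
```

With the boto3 library installed and credentials configured, a dictionary like this could be passed to the EC2 client as `boto3.client("ec2").run_instances(**request)`. That call launches a real, billable server, so try it with care.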

As we discussed previously, infrastructure provisioning has evolved dramatically over the past fifteen years, from something that took time and cost a lot to the fast, automatic process that it is today with cloud computing. This has also brought with it a dramatic culture shift in the way that systems administration is done: from a fairly manual process of physical machines and software configuration, one that took weeks to set up new services, to a scriptable and automatable process that can take seconds.

This new realm of cloud computing infrastructure and provisioning is called Infrastructure as a Service or IaaS, and Amazon Web Services is one of the largest providers of such compute resources.  They’re not the only ones of course.  Others include:

  • Rackspace Cloud
  • Joyent
  • GoGrid
  • Terremark
  • 3Tera
  • IBM
  • Microsoft
  • Enomaly
  • AT&T

Cloud Computing is still in its infancy, but is growing quickly. Amazon themselves had a major data center outage in April that we discussed in detail. It sent some hot internet startups into a tailspin!

More discussion of Amazon Web Services on Quora – Sean Hull