5 ways startups misstep on scalability

Join 27,000 others and follow Sean Hull on Twitter @hullsean.

1. Ignoring the database

Yes, your website sits on top of a database. Have you forgotten to take care of it?

Like a garden, it must be watered & tended, and any gardener will scold you for leaving plants to wither. As a database administrator by background, I’ve seen a lot of neglected databases, and truth be told, they are a large part of the cause of slow websites.

Queries

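Are you writing them yourself? If you’re using ORM middleware, you may be leaving this heavy lifting to a library, and the queries it generates are often inefficient, for example fetching related rows one query at a time instead of with a single join. Avoid the Vietnam of computer science, as object-relational mapping has been called.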

Testing

Now that you’re writing SQL, are you testing & tuning each query? Think of it like turning off the apps on your phone that you’re not using: it saves memory, battery, and general headache.
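One way to test a query is to ask the database how it plans to run it. MySQL and PostgreSQL both provide EXPLAIN; the sketch below uses SQLite’s EXPLAIN QUERY PLAN through Python’s sqlite3 module so it runs anywhere. The orders table and the idx_orders_customer index are hypothetical. Before the index exists, the plan is a full table scan; afterward, SQLite searches via the index.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    # Each plan row has four columns; the last one is the description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
query = "SELECT * FROM orders WHERE customer_id = ?"

before = query_plan(conn, query, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, query, (42,))

print(before)  # a full scan of orders
print(after)   # a search using idx_orders_customer
```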

Monitoring

Now that you’ve gotten in the habit of caring for your database, keep at it and monitor its health regularly. Use a service such as New Relic, or Cacti, Ganglia or collectd if you’d like to roll your own. Real data can reap real benefits.

2. Shortage of caching

You’ve heard it before, and we’ll say it again: make sure you’re caching. But where?

Content delivery network

Amazon has its CloudFront, and Rackspace uses Akamai. There are many choices, but the results are the same: static assets that are part of the page you serve, such as HTML & CSS files, images, and video, get dished out from servers closer to your users. It’s like asking them to walk to the corner deli for a soda rather than drive to a supermarket across town.

Webserver tier

There are many things you can do to cache at the webserver. In particular, you can configure it to tell browsers to cache objects, for example with the Cache-Control header. A long max-age gives objects a longer time-to-live, so browsers don’t re-fetch them on every visit. If content changes before it expires, you can still force a refresh, for example by giving the file a new, versioned name. You can also compress objects, with gzip for example. See How to cache websites & boost speed.
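Normally you’d set caching and compression headers in your webserver configuration (nginx’s expires and gzip directives, or Apache’s mod_expires and mod_deflate). To show what ends up on the wire, here is a minimal Python sketch. The response_headers helper, the one-year max-age for static assets, and no-cache for HTML are illustrative assumptions; a one-year lifetime is only safe if asset file names change whenever their content does.

```python
import gzip

ONE_YEAR = 365 * 24 * 60 * 60  # seconds
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".mp4")
TEXT_EXTENSIONS = (".html", ".css", ".js")

def response_headers(path, body, accepts_gzip):
    """Return (headers, body) with caching and, where useful, gzip compression applied."""
    headers = {}
    if path.endswith(STATIC_EXTENSIONS):
        # Assumes versioned file names (e.g. app.v42.css), so a long TTL is safe.
        headers["Cache-Control"] = f"public, max-age={ONE_YEAR}"
    else:
        # HTML: browsers may store it but must revalidate before reusing it.
        headers["Cache-Control"] = "no-cache"
    # Text compresses well; images and video are already compressed.
    if accepts_gzip and path.endswith(TEXT_EXTENSIONS):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
        headers["Vary"] = "Accept-Encoding"
    headers["Content-Length"] = str(len(body))
    return headers, body

original = b"body { color: #333; }\n" * 200
headers, body = response_headers("/static/app.v42.css", original, True)
print(headers)
```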

Between webserver & database

Are you using Memcached or Redis? Caching here can reduce load on your database by as much as 10x. That’s like getting ten servers’ worth of capacity for the price of one, or a box ten times as large without paying ten times the price!

Most languages, PHP included, provide libraries to talk to memcached. Whenever you make a call out to your database, first check memcached. If you find your key, fetch the value and you’re done. Otherwise, grab the answer from the database and pop it into memcached for next time.
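This read path is usually called cache-aside. Here is a minimal Python sketch; to keep it runnable without a server, TTLCache is a hypothetical in-memory stand-in for a memcached or Redis client (a real client would replace it), and the users table is illustrative. The time-to-live bounds how stale a cached value can get.

```python
import sqlite3
import time

class TTLCache:
    """In-memory stand-in for a memcached/Redis client; get() returns None on a miss."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)

def get_user_name(cache, conn, user_id, ttl=300):
    """Cache-aside read: try the cache, fall back to the database, then populate the cache."""
    key = f"user:{user_id}:name"
    name = cache.get(key)
    if name is not None:
        return name  # hit: no database work at all
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    name = row[0] if row else None
    if name is not None:
        cache.set(key, name, ttl)  # miss: store the answer for the next caller
    return name

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement sent to SQLite

cache = TTLCache()
print(get_user_name(cache, conn, 1), get_user_name(cache, conn, 1))
print(statements)  # only the first call reached the database
```

When the underlying row changes, delete or overwrite the cached key in the same code path that writes to the database; otherwise readers may see stale data until the TTL runs out.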

At the database

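Some databases offer a cache for query results, such as Oracle’s result cache or MySQL’s query cache (removed in MySQL 8.0; PostgreSQL has no built-in equivalent). Where yours has one, be sure you’ve enabled & tuned it. Also check that your buffer cache (the InnoDB buffer pool in MySQL, shared_buffers in PostgreSQL) is large enough to hold your most frequently accessed data. The buffer cache hit ratio gives you a cheap guesstimate of this.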
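The hit ratio is simply the share of reads served from memory. A sketch of the arithmetic, assuming you’ve already fetched the counters: for MySQL/InnoDB, Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads from SHOW GLOBAL STATUS; for PostgreSQL, blks_hit and blks_read from the pg_stat_database view. The function names are hypothetical, and the result is a rough signal, not a verdict.

```python
def innodb_hit_ratio(read_requests, disk_reads):
    """MySQL/InnoDB: Innodb_buffer_pool_read_requests counts logical reads,
    Innodb_buffer_pool_reads counts those that had to go to disk."""
    if read_requests == 0:
        return None  # no traffic yet, nothing to judge
    return 1 - disk_reads / read_requests

def postgres_hit_ratio(blks_hit, blks_read):
    """PostgreSQL: pg_stat_database.blks_hit (found in shared buffers) vs blks_read.
    Some blks_read may still be served by the OS page cache, so this understates hits."""
    total = blks_hit + blks_read
    if total == 0:
        return None
    return blks_hit / total

print(innodb_hit_ratio(1_000_000, 20_000))
print(postgres_hit_ratio(980_000, 20_000))
```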

3. Missing metrics collection

In a recent article, Why Scalability is big business, I talked about collecting metrics. These are invaluable.

Whether you own your home or rent, if you want to know what you spent on energy in the past year, what do you do? You look at your heating bills for the winter months. Similarly, collecting real data on all your servers, with Cacti or a service like New Relic, lets you do the same thing for your servers & infrastructure.

Real hindsight and real visibility help everyone, from operations teams to business units evaluating past problems.
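Under the hood, Cacti and Ganglia store their data with RRDtool, which consolidates raw samples into coarser averages and maxima so months of history stay small. A minimal Python sketch of that idea, assuming hypothetical (unix_timestamp, value) samples such as load average polled every few minutes; it illustrates the concept and is not how RRDtool itself is implemented.

```python
from collections import defaultdict
from statistics import mean

def hourly_rollup(samples):
    """Group (unix_timestamp, value) samples by hour; keep the average and the peak."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour_start = ts - ts % 3600
        buckets[hour_start].append(value)
    return {
        hour: {"avg": mean(values), "max": max(values)}
        for hour, values in sorted(buckets.items())
    }

samples = [(0, 0.5), (1800, 1.5), (3600, 4.0)]
print(hourly_rollup(samples))
# {0: {'avg': 1.0, 'max': 1.5}, 3600: {'avg': 4.0, 'max': 4.0}}
```

Keeping the maximum alongside the average matters: an hourly average can hide a five-minute spike that took the site down.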

4. Not building feature flags

Tractor trailers run dual tires on their axles: if one fails, you are still on the road. Planes use redundant engines. Building switches into your application to turn off non-essential features may seem abstract when your feature deadlines are looming.

But operational switches for your devops team should be seen as a solid foundation to build on. They mean you can do the maintenance you will inevitably need to do without interrupting customers. They also mean that when your site gets hammered (and we hope that day will come), you can adjust the dials instead of going down.
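A feature flag can be as simple as a named switch checked at the point where a non-essential feature runs. The minimal Python sketch below reads flags from environment variables purely for simplicity; the FEATURE_ naming scheme, the FeatureFlags class, and the recommendations feature are hypothetical. In production you might keep flags in a config file, database table, or key-value store so ops can flip them without a deploy.

```python
import os

class FeatureFlags:
    """Operational on/off switches, e.g. FEATURE_RECOMMENDATIONS=off."""

    OFF_VALUES = ("0", "off", "false", "no")

    def __init__(self, environ=None):
        # Default to the real process environment; tests can pass a dict instead.
        self._env = os.environ if environ is None else environ

    def enabled(self, name, default=True):
        raw = self._env.get(f"FEATURE_{name.upper()}")
        if raw is None:
            return default
        return raw.strip().lower() not in self.OFF_VALUES

def expensive_recommendations(product_id):
    # Hypothetical stand-in for a costly query or third-party call.
    return [product_id + 1, product_id + 2]

def render_product_page(flags, product_id):
    page = {"product": product_id}
    # Non-essential extra: skipped when ops turns it off under heavy load.
    if flags.enabled("recommendations"):
        page["recommendations"] = expensive_recommendations(product_id)
    return page

print(render_product_page(FeatureFlags({}), 7))
print(render_product_page(FeatureFlags({"FEATURE_RECOMMENDATIONS": "off"}), 7))
```

Defaulting to enabled means a missing flag never silently hides a feature; the switch exists only to shed load or make room for maintenance.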

5. Building on a single database

Various NoSQL databases such as MongoDB, Cassandra & HBase are distributed out of the box. Keep in mind, though, that they make various tradeoffs to achieve this, such as relaxed consistency or limited support for joins and transactions.

Meanwhile, the vast majority of web applications are still built on reliable relational databases, and those don’t scale seamlessly. Build a read-only mode into your application and you’ll thank yourself for years to come. It means users can keep browsing even while the master database is offline. What’s more, it makes scaling easier, since reads can be spread across replicas.
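A read-only mode is easiest when every query already goes through one place that knows reads from writes. Here is a minimal Python sketch of such a router; the DatabaseRouter and ReadOnlyModeError names are hypothetical, and SQLite stands in for a MySQL or PostgreSQL primary and its replicas. The same split that makes read-only mode possible also lets you add replicas to scale reads.

```python
import itertools
import sqlite3

class ReadOnlyModeError(RuntimeError):
    """Raised when a write is attempted while the application is in read-only mode."""

class DatabaseRouter:
    """Route writes to the primary and spread reads round-robin across replicas."""

    def __init__(self, primary, replicas, read_only=False):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])
        self.read_only = read_only

    def read(self, sql, params=()):
        return next(self._replicas).execute(sql, params).fetchall()

    def write(self, sql, params=()):
        if self.read_only:
            raise ReadOnlyModeError("Read-only mode: please try again shortly")
        with self.primary:  # commits on success, rolls back on error
            self.primary.execute(sql, params)

# Demo: one SQLite connection plays both primary and replica for brevity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
router = DatabaseRouter(conn, [conn])
router.write("INSERT INTO posts (title) VALUES (?)", ("hello",))

router.read_only = True  # e.g. flipped while the primary is down for maintenance
print(router.read("SELECT title FROM posts"))  # browsing still works
try:
    router.write("INSERT INTO posts (title) VALUES (?)", ("blocked",))
except ReadOnlyModeError as err:
    print(err)
```

The read_only attribute is exactly the kind of operational dial described under feature flags.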

Avoid solutions that try to scale writes across multiple servers. Partitioning, aka sharding, is terribly complex to get right, in both planning & layout. And let’s not forget backups: how do you piece together a consistent snapshot from eight shards, each with its own backup? That’s a recipe for trouble. There are some new cluster options for MySQL, such as Galera, and Oracle has its own take. But in the end, you’ll do better to get a bigger box for your central datastore and keep it central.
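To see why layout planning is hard, consider the simplest sharding scheme: place each user on shard user_id % N. A minimal Python sketch with hypothetical integer user IDs shows what happens when you outgrow eight shards and add a ninth: most rows have to move. Schemes such as consistent hashing reduce that churn, but they add their own moving parts.

```python
def shard_for(user_id, shard_count):
    """Naive modulo placement: the shard that holds this user's rows."""
    return user_id % shard_count

def fraction_moved(user_ids, old_count, new_count):
    """Share of users whose rows must migrate when the shard count changes."""
    user_ids = list(user_ids)
    moved = sum(
        1 for uid in user_ids
        if shard_for(uid, old_count) != shard_for(uid, new_count)
    )
    return moved / len(user_ids)

# Growing from 8 to 9 shards: only IDs whose remainders agree stay put.
print(fraction_moved(range(7200), 8, 9))  # about 0.889, i.e. 8 of every 9 users move
```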
