Category Archives: Scalability

Which tech do startups use most?

MySQL on Amazon Cloud AWS

Leo Polovets of Susa Ventures publishes an excellent blog called Coding VC. There you can find some excellent posts, such as pitches by analogy, and an algorithm for seed round valuations and analyzing product hunt data.

He recently wrote a blog post about a topic near and dear to my heart, Which Technologies do Startups Use. It’s worth a look.

One thing to keep in mind looking over the data, is that these are AngelList startups. So that’s not a cross section of all startups, nor does it cover more mature companies either.

In my experience startups can get it right by starting fresh, evaluating the spectrum of new technologies out there, balancing sheer solution power with a bit of prudence and long term thinking.

I like to ask these questions:

o Which technologies are fast & high performance?
o Which technologies have a big, vibrant & robust community?
o Which technologies can I find plenty of engineers to support?
o Which technologies have low operational overhead?
o Which technologies have low development overhead?

1. Database: MySQL

MySQL holds a slight lead according to the AngelList data. In my experience its not overly complex to setup and there are some experienced DBAs out there. That said database expertise can still be hard to find .

We hear a lot about MongoDB these days, and it is surely growing in popularity. Although it doesn’t support joins and arbitrary slicing and dicing of data, it is a very powerful database engine. If your application needs more straightforward data access, it can bring you amazing speed improvements.

Postgres is a close third. It’s a very sophisticated database engine. Although it may have a smaller community than MySQL, overall it’s a more full featured database. I’d have no reservations recommending it.

Also: Top MySQL DBA Interview questions

2. Hosting: Amazon

Amazon Web Services is obviously the giant in the room. They’re big, they’re cheap, they’re nimble. You have a lot of options for server types, they’ve fixed many of the problems around disk I/O and so forth. Although you may still experience latency around multi-tenant related problems, you’ll benefit from a truly global reach, and huge cost savings from the volume of customers they support.

Heroku is included although they’re a different type of service. In some sense their offering is one part operations team & one part automation. Yes ultimately you are getting hosting & virtualization, but some things are tied down. Amazon RDS provides some parallels here. I wrote Is Amazon RDS hard to manage?. Long term you’re likely going to switch to an AWS, Joyent or Rackspace for real scale.

I was surprised to see Azure on the list at all here, as I rarely see startups build on microsoft technologies. It may work for the desktop & office, but it’s not the right choice for the datacenter.

Read: Are generalists better at scaling the web?

3. Languages: Javascript

Javascript & Node.js are clearly very popular. They are also highly scalable.

In my experience I see a lot of PHP & of course Ruby too. Java although there is a lot out there, can tend to be a bear as a web dev language, and provide some additional complication, weight and overhead.

Related: Is Hunter Walk right about operations & startups?

4. Search: Elastic Search

I like that they broke apart search technology as a separate category. It is a key component of most web applications, and I do see a lot of Elastic Search & Solr.

That said I think this may be a bit skewed. I think by far the number one solution would be NO SPECIFIC SEARCH technology. That’s right, many times devs choose a database centric approach, like FULLTEXT or others that perform painfully bad.

If this is you, consider these search solutions. They will bring you huge performance gains.

Check this: Are SQL Databases Dead?

5. Automation: Chef

As with search above, I’d argue there is a far more prevalent trend, that is #1 to use none of these automation technologies.

Although I do think chef, docker & puppet can bring you real benefits, it’s a matter of having them in the right hands. Do you have an operations team that is comfortable with using them? When they leave in a years time, will your new devops also know the technology you’re using? Can you find a good balance between automation & manual configuration, and document accordingly?

Read: Why are database & operations experts so hard to find?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 ways startups misstep on scalability

Russian_Dolls

Join 27,000 others and follow Sean Hull on twitter @hullsean.

1. Ignoring the database

Yes, your internet site sits on top of a database. Have you forgotten to take care of it?

Like a garden, it must be watered & tended. And a gardener will always scold you for leaving plants to wither. I guess as a database administrator by background, I’ve seen a lot of this. But truth be told it is in large part the cause of slowness.

Queries

Are you writing them? If you’re using an ORM middleware, you may be leaving this heavy lifting to a library. These are inefficient. Avoid the Vietnam of computer science.

Testing

Now that you’re writing SQL, are you testing & tuning each one? Think of it like turning apps off on your phone that you’re not using. Saves memory, battery, and general headache.

Monitoring

Now that you’ve gotten in the habit of caring for your database, keep at it. Monitor its health regularly. NewRelic as a service or Cacti, Ganglia or Collectd if you’d like to roll your own. Real data can reap real benefits.

Read: Are SQL Databases Dead

2. Shortage of caching

You’ve heard it before, we’ll say it again, make sure you’re caching. But where?

Content Network
Amazon has it’s CloudFront, Rackspace uses Akamai. There are many choices but the results are the same. Static assets such as html & css files, images & video content all part all part of the page you are serving, get dished out closer to your users. It’s like only asking them to go to a corner deli for a soda, rather than the closest supermarket.

Webserver tier

There are many things you can do to cache at the webserver. In particular you can configure to tell browsers to cache objects. One example is Cache-Control. That means longer time-to-live, so objects don’t expire by default. You can always expire them manually. There are also ways to compress objects as well. See How to cache websites & boost speed.

Between webserver & database

Are you using Memcache or redis? Caching here can reduce load on your database by as much as 10x. That’s like buying you 10x free servers, or one large one that costs 10x the price!

Most languages such as PHP provide libraries to interact with memcache. Whenever you make a call out to your database, first check memcache. If you find your key, fetch the value & done. Otherwise grab the answer from the database, and pop it into memcache.

At the database

Databases of all kinds, be they postgres, Oracle, or MySQL have a query cache. Be sure you’ve enabled & tuned yours. Also check that your buffer cache is sizeable enough to fit most frequently hit data. A hit ratio may provide you a cheap guestimate on this.

Related: Why a four letter word divides dev and ops

3. Missing metrics collection

In a recent article Why Scalability is big business I talked about collecting metrics. These are invaluable.

If you’re a home owner or renting, and want to know what you spent on energy in the past year, what do you do? You look at your heating bills for the winter months. Similarly, collecting real data on all your servers, like with cacti, or a service like NewRelic allows you to do the same thing with your servers & infrastructure.

Real hindsight, and real visibility helps everyone from operations teams, to business units evaluating past problems.

Also: Why a killer title can make or break your content efforts

4. Not building feature flags

Tractor trailers use two tires on every axil. If one fails, you are still on the road. Planes use redundant engines. Having switches built into your application to turn off non-essential features may seem abstract when your deadlines for features are looming.

But operational switches for your devops team should be seen as good foundation, and solid bedrock to build on. It means you can do the maintenance that you will need to do, and do it without interrupting customers. It also means when your site gets hammered, and we hope that day will come, you can adjust the dials, and not go down.

Related: Is Amazon RDS Difficult to Manage

5. Building on a single database

Various NoSQL databases like MongoDB, Cassandra & Hbase are distributed out of the box. Keep in mind though they make various tradeoffs to achieve this.

Meanwhile the vast majority of web applications are still built on reliable relational databases. But they don’t scale seamlessly. Build a read-only mode into your application and you’ll thank yourself for years to come. This means you can browse, even while the master database is offline. What’s more it means you can scale more easily.

Avoid solutions that try to scale writes across multiple servers. Partitioning aka sharding is terribly complex to get right, both in planning & layout. Lets not forget how do we piece together a puzzle of 8 shards with 8 pieces to a backup. Recipe for trouble. There are some new cluster options for MySQL, such as Galera. Oracle has it’s own take. But in the end you’ll do better to get a bigger box for your central datastore and keep it central.

Related: How to Deploy on Amazon with Vagrant

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Are startup CEO’s hiding their scalability problems?

Russian_Dolls

Join 27,000 others and follow Sean Hull on twitter @hullsean.

Your site is running fine right? You have 1000 customers, and it usually runs smoothly. Just this one lingering question, why does it take five high performance EC2 instances to run the database, all on flash drives? Goood question!

The truth is one of the highest trafficed sites I managed, pulled in 100 million uniques a month, and only used three backend databases. That site was one of these wildly popular celebrity gossip sites, the ultimate guilty pleasure when you’re at the office and can’t watch reality tv!

Snickers aside, this is huge traffic. And all of the above was built on Drupal, with no ORM in the mix. It could even run, albeit noticeably slower, while memcache was disabled.

1. Servers with solid state drives

I’m very excited to see Amazon introduce servers with SSD drives. They can bring you 100x improvement of disk I/O, and that my friends is the end all and be all for databases. So why complain?

If you deploy on these boxes right out of the gates, it may be like using a crutch. You become dependent on it, and ignore real performance tuning. Solid state drives still won’t obviate that ORM middleware you’re using.

Also: Do managers & CEO’s underestimate operational costs?

2. Memcache saving your bad queries

Memcache is also a powerful tool. It sits between the database and your webservers, reducing load on the database by as much as 10x. That’s a great way to get better response time, and reduce drag on your db tier. But it’s still worthwhile performance tuning without it.

Why? If you can get your site to run without caching, it will run blazingly fast *with* it. Don’t use it as a crutch, use it as rocket fuel for your well tuned site.

Read this: Do startups need techops?

3. A legion of read slaves

I’ve seen smaller sites, using a ton of read slaves. All of it deployed to cover up slow & redundant queries pouring out of an ORM middleware layer, in this case Cake PHP.

Again, read slaves are great, but tune & test with less hardware, and get the performance up the hard way. With elbow grease!

Related: Howto automate MySQL query analysis with Amazon RDS

4. Really really big memory

64G, 128G, 256G of main memory? If I wax on about the days when you’d get excited by 64k, I’ll sound like an old timer. But with those extreme limitations, you had to write tight code. Otherwise it just wouldn’t do anything.

Really really big memory of today’s servers allows us to get lazy. I hear developers say “Hey, the database is 10G of data, and we have 64G main memory, so the whole thing will fit in memory. Problem solved!”

Duhhh… No. Why not? Because you still have to slice and dice that data. You still have to scan through for bits & pieces that aren’t indexed, then sort, and organize that into temporary memory space. In DBA speak, you’re still doing a ton of logical IOs.

Picture it another way, imagine the days when you’re on horseback, riding across the west. You travel light cause frankly your horse can carry only so much. Then along come cars, and you start loading up the trunk. You add the kitchen sign, and the rear tires are hanging on the ground. All seems fine until you hit a steep mountain, and you’re car is almost stalling at 20mph. If you had only carried the same load as you did on horseback, you’d be speeding across the country at lightning pace.

Read: Is Amazon RDS hard to manage?

5. Deploying poor code

Deadlines are looming, and new features must be deployed. So performance testing can wait until later. The code works after all.

Been there, done that. Code gets deployed and all of a sudden there are spikes on server load in the evening. Ops & DBA teams are screaming, “Who wrote this code?”.

Load testing should be a part of everyday QA & test. It’s the only way to avoid growing scalability problems.

Check this: Are SQL databases dead?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Howto automate MySQL slow query analysis with amazon RDS

iRobot1

If you’ve used relational databases for more than ten minutes, I hope you’ve heard of slow queries. Those are those pesky little gremlins that are slowing down your startup, and preventing scalability you so desperately need.

Luckily there’s a solution. What I’ve found is if I send a report to developers every week, it keeps these issues front and center, for folks that are very busy indeed.

The script below is for RDS, but you can surely modify it if you have a physical server or roll-your-own MySQL box on Amazon. Take a look & enjoy!

Join 26,000 others and follow Sean Hull on twitter @hullsean.

1. install percona tools

Percona as many probably already know, are a wildly successful services firm that support MySQL and related technologies. They also have a very popular & scalable MySQL distribution by the same name.

Even if you’re not using Percona MySQL, you definitely want to get ahold of the percona toolkit. It provides all sorts of useful tools, including the one this article is based on, query-digest.

This tool takes your stock MySQL slow query logfile as input, and summarizes it into a very useful and readable report. Formerly mk-query-digest, it’s not called pt-query-digest. See below.

You can install the percona tools easily by grabbing the repository file and installing that with rpm. From there you can just use yum or apt-get depending on your distribution.

Related: Why a killer title can make or break your content efforts

2. install aws command line tool

Amazon has consolidated all it’s command line tools into a single one called just “aws”. The options can be a little arcane, and the error messages misleading besides. What’s good though is it is slightly easier to install & configure.

Do you already use Python? Install it this way:


$ pip install awscli

If not, you’ll need to dig into the aws cli installation instructions further.

Also: Do managers underestimate operational costs?

3. edit .aws/config

After you get the tool installed, you need to setup your environment. I edited a file named /home/shull/.aws/config as follows:


[default]
region = us-east-1
aws_access_key_id = BLIBJZMKLWIL5UTNRBMQ
aws_secret_access_key = MF5J/2z7HmN92lQUrV12ZO/FBXNjDVjL52TNRWsG

Those access_key_id and secret_access_key you can find on your amazon dashboard. Click upper right hand corner under your name, select the menu item “Security Credentials”.

Check out: Are SQL Databases Dead?

4. edit send_query_report.sh

I wrote the script below so you can fairly easily edit it.


#!/bin/bash
#

# get the rds db instanceID from command line (or crontab) entry
#
AWS_INSTANCE=$1

# here's where we'll store the latest slowquery.log
#
SLOWLOG=/tmp/rds_slow.log
#SLOWLOG=`/bin/ls -tr /home/shull/*.log | /usr/bin/tail -1`

# fetch slow query log from rds box
# here I always grab the latest one.
#
/usr/local/bin/aws rds download-db-log-file-portion --db-instance-identifier $AWS_INSTANCE --output text --log-file-name slowquery/mysql-slowquery.log > $SLOWLOG

# query report output
SLOWREPORT=/tmp/reportoutput.txt

# pt-query-digest location
MKQD=/usr/local/bin/pt-query-digest

# run the tool to get analysis report
$MKQD $SLOWLOG > $SLOWREPORT

# today's date in a variable
TODAY=`/bin/date +\%m/\%d/\%Y-\%H:\%S`
#YESTERDAY=`/bin/date -d "1 day ago" +\%m/\%d/\%Y-\%H:\%S`

# report subject
SUBJECT="Sean Query Report -- $TODAY "

# recipient
EMAIL="hullsean@gmail.com"

# send an email using /bin/mail
/usr/bin/mailx -s "$SUBJECT" "$EMAIL" < $SLOWREPORT

Note, if you don't have mailx installed, it should be available in your repository. Use apt-get or yum as necessary to get it installed.

Also: Is high availability overrated & near impossible to deliver?

5. Add to crontab

After you've tested the above script from command line, you will want to add it to a weekly cron job. Voila, automation! Don't forget to chmod +x to make it executable. :)


00 09 * * 5 /home/shull/send_query_report.sh seandb

Read: Are MySQL DBA's impossible to find?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don't work with recruiters

If you use MySQL in the Amazon cloud, you need to ask yourself this question

Join 25,000 others and follow Sean Hull on twitter @hullsean.

Are you serious about backups?

If you’re just using Amazon EBS snapshots, that may not be sufficient. There’s a good chance it won’t protect you against your next data loss.

That’s why I like to have a few different types of backups

Also: 5 more things deadly to scalability

Protect against operator error

mysqldump is a tool every DBA is familiar with. Same as a hotbackup or snapshot you say? Just more labor? Not true.

A dump allows you to restore one table, or one schema. That’s why they’re also known as logical backups. What’s more you can edit the file, remove indexes, change object names, or datatypes. All these can be essential in the screwy and unpredictable event of a real world outage.

Expect the unexpected!

Read: Why devops talent is in short supply

Test those backups regularly

If you haven’t actually tried to restore, you really don’t know if you have everything. Did you backup stored procedures & database code? How about grants? Database events? How about cronjobs? What about the my.cnf file? And your replication configuration?

Yes there are a lot of little pieces, and testing your backups by rebuilding everything is an attempt to poke holes in your plan, and hit issues before d-day!

Related: MySQL interview guide for managers and candidates alike

Replication isn’t a backup

Replication is getting better and better in MySQL. It used to fail regularly. MyiSAM was very unpredictable. But even in the comfortable realm of Innodb, there can still be data drift. If you’re on MySQL 5.0 or 5.1, you should consider performing regular checksums. These test the integrity of data and compare what’s actually in master & slave. Bulletproofing MySQL replication with checksums.

Read: Why high availability is so very hard to deliver

Have you considered security around your backup files?

While you’re thinking about backups, make sure the files themselves are secure. Remember they contain your crown jewels. Hopefully individual data that’s sensitive is encrypted, but still you should secure their final resting place as well.

If you’re using S3, consider encrypting the file before shipping it up to the bucket.

Read this: Why a four letter word divides dev and ops

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 cloud ideas that aren’t actually true

storm coming

Join 20,000 others and follow Sean Hull’s scalability, startup & innovation content on twitter @hullsean.

Cloud computing is heralding us into a wonderful era where computing can be bought in small increments, like a utility. This changes the whole way we plan, manage budgets, and accelerates startups making them more agile.

But it’s not all wine & roses up there. I’ve heard a few refrains from clients over the years, and thought I’d share some of the most common.

1. Scaling is automatic

Rather recently I was working with a client on building some sophisticated reports. They needed to slice & dice customer data, over various time series, and summarize with invoices & tracking data. Unfortunately their dataset was large, in the half terabyte range.


Client: Can we just load all this data into the cloud?
Me: Yes we can do that. Build a system in Amazon public cloud, can support large datasets.
Client: I want it to scale easily. So we won’t have these slow reports. And as we add data, it’ll just manage it easily for us.
Me: Well it’s a little bit more complicated than that, unfortunately.

Unfortunately this is a rather familiar conversation that I have quite often. A lot of the press around cloud scalability, centers around auto-scaling, Amazon’s renowned & superb virtualization feature. Yes it’s true you can roll out webservers to scale out this way, but that’s not the end of the story. Typically web applications have a lot of components, from caching servers, to search servers, and of course their backend datastore.

But can we scrap our relational database, such as MySQL and go with one that scales out of the box like Riak, Cassandra or Dynamodb?

Those NoSQL solutions are built to be distributed from the start, it’s true. And they lend themselves to that type of architecture. However, if you’ve built up a dataset in MySQL or Oracle, and more so an application around that, you’ll have to migrate data into the NoSQL solution. That process will take some time.

Like teaching a fish to fly, it make take some time. They do well in water, but evolution takes a bit longer.

Related: RDS or MySQL 10 use cases

2. Disaster recovery is free

In the traditional datacenter, when you want DR, you setup a parallel environment. Hopefully not in the same room, same city or same coast even. Preferrably you do so in a different region. What you can’t get around is dishing out cash for that second datacenter. You need the servers, just in case.

In the cloud, things are different. That’s why we’re here, right? In amazon you have regions already setup & available for plugin-n-play use. Setup your various components, servers, software & configure. Once you’ve verified you can failover to the parallel environment you can just turn off all those instances. Great, no big charges for all that iron that you’d pay for to keep the rooms warm in an old-school datacenter. Or do you?

As it turns out, since you don’t have this environment running all the time, you’ll want to test it more often, run fire drills to bring the servers back online. That’ll incur some costs in terms of manpower. You’ll also want to include in there some scripts to start those servers up, and/or some detailed documentation on how to do that. And don’t lose that documentation, either will you?

You may also want to build some infrastructure as code unit tests. Things change, code checkouts evolve, especially in the agile & continuous integration world. Devops beware!

Read this: Why a killer title can make or break your content efforts

3. Machines are fast

Fast, fast, fast. That’s what we expect, things keep getting faster, right? Hard to believe then that the world of computing took a big step backward when it jumped into the cloud. Something similar happened when we jumped to commodity Linux a decade ago.

In amazon, it’s a multi-tenant world. And just like apartment buildings, popular restaurants, or busy highways you must share. When things are quiet you may have the road to yourself, but it’ll never be as quiet as a dirt road in the country!

Amazon is making big strides though. They now offer memory optimized & storage optimized instances. And an even bigger development is the addition of the most important feature for performance & scalability. That said the network & EBS can still be a real bottleneck.

Also: What is a relational database & why is it important?

4. Backups aren’t necessary

I’ve experienced a few horror stories over the years. I wrote about one noteworthy one When fat fingers take down your business.

True EBS snapshots make backing up your whole server, well a snap! That said a few extra steps have to happen (flush the filesystem & lock tables) to make this work for a relational database like MySQL or Oracle. And suddenly you have a verification step that you also need to perform. You see no backups are valid until they’ve been restored, remember?

But even with these wonderful disk snapshots, you’ll still want to do database dumps, and perhaps table dumps. Operator error, deleting the wrong data, or dropping the wrong tables, will always be a risk. Ignore backups at your own peril!

Check this: Why CTOs underestimate operational costs

5. Outages won’t happen

In an ideal world, everything is redundant, and outages will be a thing of the past. We’ll finally reach five nines uptime and devops everywhere will be out of work. :)

It’s true that Amazon provides all the components to build redundancy into your architecture, and very cutting edge firms that have taken netflix’s approach with chaos monkey are seeing big improvements here. But AirBNB did fail and at root it was an Amazon outage that shouldn’t ever happen.

Read: Why Oracle won’t kill MySQL

Get more. Monthly insights about scalability, startups & innovation.. Our latest Are SQL Databases Dead?

Why I can’t raise the bar at every firm

Screen-shot-2012-08-02-at-1.28.35-PM

Join 17,000 others and follow Sean Hull on twitter @hullsean. Also check out: Who is Sean Hull?

It may seem counterintuitive. If I am not the best solution provider, why on earth would I highlight it?

I believe by pointing those cases out, I also underline the clients and problems that I’m particularly well suited too, and for which I can really provide value. Read on!

1. People Problems

Sometimes, you’re hired to solve a particular problem which is framed as a technical one. Some process needs to be reworked, recoded or retooled. It’s framed as a technical problem, yet as things unfold the client already has the expertise in-house to solve & write the code. What then?

It may be that the right people aren’t communicating, project managers aren’t seeing the issues, or part of the human systems are gummed up. We can’t raise the technical bar, but we can help getting those folks talking.

I wrote about this before in When You’re Hired to Solve a People Problem.

Also: Why are oil spills & financial instability related to datacenter outages?

2. ORM Usage & Technical Debt

If you’ve read my blog you know I am not very fond of Object Relational Modelers.

I would also argue as Ward Cunningham does so elloquently that technical debt can be a real and pressing problem.

Here we would help identify and frame the problem, though the work of raising the bar technically involves the longer process of retooling & refactoring your code base.

Related: Why database choices are tricky

3. Where Commodity & Offshore Works

Some firms are already making use of odesk or offshoring resources, where you might pay as little as $150/day. If you have a very technical manager or CTO, such a solution may work well for you.

At the other end of the spectrum are the high priced senior consultants from firms like Oracle, Percona or Pythian. Yes they may set you back as much as $3500/day.

In those cases a scalability & performance review may make sense. Here’s how.. Although specialists are necessary, remember to ask yourself Why generalists are better at scaling the web.

We sit in the sweet spot between the two options. With low overhead, our prices are more affordable. At the same time you’re getting a whole lot more than a commodity solution. We’ll communicate in plain language with folks at every technical level. And for many firms that in itself is a value add.

Check this: Does Oracle Aim to Kill MySQL?

4. Existing team did their homework

Believe it or not, I’ve gone into consulting engagements where the existing team has really really done their homework.

In those cases it becomes much harder to raise the bar technically. In those cases I can help when existing team missed something. But more importantly, I can validate a correct setup, or identify technical debt.

Having an outside perspective then, can provide reassurance. As I see ten to fifteen new environments per year, I’ve seen hundreds in the past decade & a half. That’s helpful perspective in itself.

Read: Does Oracle Aim to Kill MySQL?

5. Availability & Uptime Are Already High

I wrote in depth about high availability in the Myth of Five Nines

At the end of the day, availability can only approach perfection, not actually reach it. That’s a property of complex systems. If your uptime is already extremely high, again we can validate your environment, review and provide & summarize findings. But we may not be able to raise the bar.

If that’s you, it’s a good problem to have!

Also: Why AirBNB Didn’t Have to Fail

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why Scalability Is Big Business

Russian_Dolls

Join 16,500 others and follow Sean Hull on twitter @hullsean.

1. Complexity Is Growing

Despite automation & the mass migration to the cloud, or perhaps because of it, complexity continues to grow. Back in the dot com era a typical infrastructure included a load balancer, a couple web servers, one oracle database, and that was pretty much it.

Now that has multiplied. Pile on top of that three to five more webservers, a search server, a page cache, an object cache, one or more slave databases and more. You may have a utility server with jenkins for continuous automation, monitoring applications like nagios and cacti, your source code repository and perhaps configuration management like Puppet or Chef.

That’s not only more moving parts, it’s a wider swath of skills and technologies to understand. That’s one reason Generalists Are Better At Scaling The Web.

Also: Are SQL Databases Dead?

2. Developer Mandate: Features

The pressure to build features that can directly be monetized is obvious. Startups especially have the pressure to grow fast and grow now. So security, technical debt, and scalability often take a back seat. What’s more in small scrappy and lean startups, ops sometimes falls on the shoulders of one competent but overworked developer.

Related: Why Oracle Won’t Kill MySQL

3. Startups Growing Pains

With hyper growth, startups can go from 100 customers to millions overnight. That kind of popularity is a good problem to have. But if your app hits a wall and suddenly falls over, everyone is scrambling. The pressure builds, as fear of losing that traction mounts, and heads are put on the chopping block.

Read: AirBNB Didn’t Have To Fail

4. Missing Browse-only Mode & Feature Flags

Ever been browsing for airline tickets, then go to order and get an error? Try again later? If so you’re familiar with a browse-only mode. This is a very powerful addition to any web application but is very often left out. Some mistakenly believe it won’t work for their application, as users will always be changing data.

Ever visited a website that has star ratings, only to find them missing? Or temporarily unable to edit your rating for a piece of content? This amounts to what’s called a feature flag. These powerful switches give operations teams the ability to disable heavy features, while the side is under tremendous load. They can take a huge burden off the shoulders of your servers when you hit that scalability cliff.

Check this: Why I Don’t Work With Recruiters

5. Operations as an afterthought

I outlined some of the top reasons Why Startups Desperately Need Techops. It is a repeating refrain. Priorities of a growing startup often involve taking on technical debt. But if that isn’t managed carefully you’ll run into some of the problems that Ward Cunningham Warns Us About.

Also: 5 Things Are Toxic To Scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Are SQL Databases Dead?

mesa verde city

I like the image of this city of Mesa Verde. It’s fascinating to see how ancient cities were built, especially as an inhabitant of one of the worlds largest cities today, New York.

I’m a long time relational database guy. I worked at scores of dot-coms in the 90′s as an old-guard Oracle DBA, and pivoted to MySQL into the new century. Would a guy like me who’s seen 20 years of relational database dominance really believe they could be dying?

There’s a lot to be excited about in this new realm of db, and some interesting bigger trends that are pushing things in a new way.

Join 15,100 others and follow Sean Hull on twitter @hullsean.

1. Growing use of ORMs

ORM probably sounds like some strange fossil archeologists just dug up in the ancient city of Mesa Verde. But they’re important. You may know them by their real-life names, Hibernate, Active Record, SQL Alchemy and Cake. There are many others. Object Relational Modelers provide a middleware between developers and the SQL of your chosen relational database. They abstract away the nitty gritty, and encapsulate it into a library.

In a way they’re like code generators. Mark Winand talks about them in SQL Performance Explained warning of the “eager fetching” problem. This is DBA speak for specifying all columns (SELECT *) or fetching all rows, when you don’t need them all. It’s inefficient in terms of asking the database to read & cache all that data, but also to send it across the network and then discard it on the webserver side. Like a lazy housekeeper the clutter & dust will grow to overwhelm you.

Martin Fowler is the author of the great book NoSQL Distilled. He tries to walk the fence in his post ORM Hate, trying to balance developers love of ORMs, and the obvious need for scalability. Ted Neward calls ORMs the Vietnam of Computer Science.

Mattias Geniar points out that BAD ORMs are infinitely worse than bad SQL and another on High Scalability by Drewsky The Case Against ORM Frameworks.

If you agree the ORM conversation is still a huge mess, you’ll be excited to know that NoSQL sidesteps it completely. They’re built out of the box to interface more like data structures, than reading rows and columns. So you eliminate the scalability problems they introduce when you go NoSQL. That makes developers happy, and pleases DBAs and techops too. Win!

Read: Why Oracle won’t kill MySQL

2. Widening field of options

NoSQL databases are not simply key value stores, though some like Memcache and Riak do fit that mold.

Mongodb offers configurable consistency & durability & the advantages of document storage, no need for an ORM here. You also have a mix of indexing options, that go a little deeper than other NoSQL solutions. A sort of middle ground solution that offers the best of both worlds.

Cassandra, a powerful db that is clustered out of the box. All nodes are writeable, and there are various ways to handle conflict resolution to suit your needs. Cassandra can grow big, and naturally takes advantage of cloud nodes. It also has a nice feature to naturally age out data, based on settings you control. No more monumental archiving jobs.

Hbase is the database part of Hadoop, based on Google’s seminal Bigtable paper.

Redis is another option with growing popularity. It’s a key-value store, but allowing more complex data in it’s buckets, such as hashes, lists, sets and sorted sets. Developers should be salivating at this one.

Also: 5 Great Things about Markus Winand’s Book SQL Performance Explained

3. Lowering bar

The old world of relational databases treat data as sacrosanct. DBAs are tasked with protecting it’s integrity & consistency. They manage backups to protect against disaster. In this world, every bit of data written is as sacred as any other, whether it’s your bank account balance, or a comment added to a facebook discussion.

But modern non-relational databases introduce the idea of eventually consistent. DBAs and architects would say we are relaxing our durability requirements. What they mean is data can get slightly out of sync and we’re ok with that. We’ll build our web applications to plan for that, or even in the case of Riak expose the levers of durability directly to the developers, allowing them to make some changes instant, while others more lax and lazy.

Check this: Why high availability is so very hard to deliver

4. Cloud demands

Virtualized environments like Amazon EC2, give easy access to legions of servers. Availability zones & regions only widen the deployment options. So deploying a single writeable master, the way traditional relational databases work best, is not natural.

Databases like Cassandra, Mongo & Redis are clustered right out of the box. They grew up in this virtual datacenter environment and feel comfortable there.

Related: Why I wrote the book on Oracle & Open Source

5. Only DBAs understand them

Devs may whine at this statement, and to be fair it’s a generalization. The popularity of ORMs speaks volumes here. Anything to eliminate the dreaded SQL writing. Meanwhile DBAs bemoan the use of ORMs for they represent everything they’re trying to fix.

SQL is hard enough, but the ugly truth is each database vendor has their own implementation, their own optimizations, their own optimal tweaks. Even between database versions, SQL code may not perform consistently.

Identifying slow SQL and tweaking it remains one of the primary tasks of performance tuning, for this reason. It hasn’t changed much in my two decades on the job.

Also: Why bemoaning AWS performance sounds like Linux detractors circa 1999

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 great things about Markus Winand’s book SQL Performance Explained

markus winand sql performance explained book

Join 12,100 others and follow Sean Hull on twitter @hullsean.

1. Covers databases broadly

You may not have noticed, but there’s a whole spectrum of relational databases on offer. Of course in the database world, most get infatuated with one, and that becomes their bread & butter before long. Their life, their passion, their devotion.

That’s fine as far as it goes, but Winand really stands out, offering a spectrum of ideas and optimization techniques for different platforms. If you’re an Oracle-only or MySQL-only dba you’ll gain a lot from this book but even more importantly if you work in professional services, and need to communicate with DBAs brought up on one of these platforms, it becomes like a rosetta stone for SQL query tuning.

Read: Why devops talent is in short supply

2. Shows you how to fish

I find database books and methods fall into two sort of broad categories. There’s the call Oracle support method, where you’ll be handed one very specific set of steps, commands, and a path to solve each specific problem. It’s more about memorization, it’s like they actually hand you the fish.

Then there is the investigative method, where you learn how to use a magnifying glass to look at fingerprints, and check for DNA samples, and interrogate suspects. You know learn the tools of the trade.

That’s what Markus brings you, in all it’s delicious glory.

Read: Why high availability is so very hard to deliver

3. It’s concise

Another gripe I have of technical books is that the publishing model is, this is a textbook and since the price tag is large, let’s make the physical book large! Of course no one wants to carry those books around. I even recently bought a kindle to solve this problem.

SQL Performance Explained is more a paperback book form factor, and that means you can tote it around with you easily, and keep it with you at work. Read it on the train, commuting to work.

200 pages packed cover to cover with all sorts of good chapters, including a primer on indexes & types, scalability & performance, joins, clustering, Top-N queries, DML, and more.

Read this: Why a four letter word divides dev and ops

4. It’s technical but accessible

If you’re a real rock bottom beginner, you might want to dig a bit more on your SQL syntax, and some of the basics. You could also keep a 101 book side-by-side, while you’re reading this book.

For the intermediate & advanced DBAs out there, this book will sit comfortably in your paws as you flip the pages and learn something new. For instance just today I learned that Postgres can index NULLs while MySQL, Oracle and SQL*Server cannot. Learn something new everyday.

Related: MySQL interview guide for managers and candidates alike

5. Gives you answers you can use today

After twenty years of consulting, I’ve seen a few patterns emerge. Besides the spectrum of team & communication challenges, firms hitting the performance wall often have issues with their relational databases.

Yes those databases are sometimes on the wrong hardware, or their are other obscure problems with setup or configuration. But the bulk of issues center on badly written SQL.

SQL is a much reviled language and often misunderstood. And it doesn’t seem like developers have gotten that much better at it over the years. It would explain the rise of NoSQL databases, as they often speak REST or xml, no need for pesky sequel.

One parting note. For all the devs and architects out there, who want to sing the virtues of ORMs, this book hits that squarely in the nose. By showing how differently each relational database implements SQL, performs work, and optimizes, Winand also illustrates the naivete behind trying to write database independent application code.

If you’re a developer and don’t know how to profile a query or run explain plan, don’t walk, run to your closest Amazon.com store and get this book!

Also: 5 more things deadly to scalability

Criticisms

If I were to offer two slight criticisms, it would be these. First, the index is a bit wonky. When I look under “P” for example, there’s no Postgres, while one quarter of the book is obviously devoted to that platform. Further, looking up NULL which are covered in depth, in various places in the book, only has one entry in the index, p54 on Oracle. So the index could be a bit more robust to be useful.

The other criticism is more perhaps my bias. On page 96, when he discusses ORMs I thought he was rather… shall we say gentle. Although he clearly states that “eager fetching” is problematic, I don’t think he goes far enough to condemn it. In my experience ORMs are always trouble.

Then again why am I complaining, their use keeps me forever employed.

Want a copy? Markus Winand’s book site has all the goods!.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters