Are SQL Databases Dead?

mesa verde city

I like the image of this city of Mesa Verde. It’s fascinating to see how ancient cities were built, especially as an inhabitant of one of the worlds largest cities today, New York.

I’m a long time relational database guy. I worked at scores of dot-coms in the 90’s as an old-guard Oracle DBA, and pivoted to MySQL into the new century. Would a guy like me who’s seen 20 years of relational database dominance really believe they could be dying?

There’s a lot to be excited about in this new realm of db, and some interesting bigger trends that are pushing things in a new way.

Join 15,100 others and follow Sean Hull on twitter @hullsean.

1. Growing use of ORMs

ORM probably sounds like some strange fossil archeologists just dug up in the ancient city of Mesa Verde. But they’re important. You may know them by their real-life names, Hibernate, Active Record, SQL Alchemy and Cake. There are many others. Object Relational Modelers provide a middleware between developers and the SQL of your chosen relational database. They abstract away the nitty gritty, and encapsulate it into a library.

In a way they’re like code generators. Mark Winand talks about them in SQL Performance Explained warning of the “eager fetching” problem. This is DBA speak for specifying all columns (SELECT *) or fetching all rows, when you don’t need them all. It’s inefficient in terms of asking the database to read & cache all that data, but also to send it across the network and then discard it on the webserver side. Like a lazy housekeeper the clutter & dust will grow to overwhelm you.

Martin Fowler is the author of the great book NoSQL Distilled. He tries to walk the fence in his post ORM Hate, trying to balance developers love of ORMs, and the obvious need for scalability. Ted Neward calls ORMs the Vietnam of Computer Science.

Mattias Geniar points out that BAD ORMs are infinitely worse than bad SQL and another on High Scalability by Drewsky The Case Against ORM Frameworks.

If you agree the ORM conversation is still a huge mess, you’ll be excited to know that NoSQL sidesteps it completely. They’re built out of the box to interface more like data structures, than reading rows and columns. So you eliminate the scalability problems they introduce when you go NoSQL. That makes developers happy, and pleases DBAs and techops too. Win!

Read: Why Oracle won’t kill MySQL

2. Widening field of options

NoSQL databases are not simply key value stores, though some like Memcache and Riak do fit that mold.

Mongodb offers configurable consistency & durability & the advantages of document storage, no need for an ORM here. You also have a mix of indexing options, that go a little deeper than other NoSQL solutions. A sort of middle ground solution that offers the best of both worlds.

Cassandra, a powerful db that is clustered out of the box. All nodes are writeable, and there are various ways to handle conflict resolution to suit your needs. Cassandra can grow big, and naturally takes advantage of cloud nodes. It also has a nice feature to naturally age out data, based on settings you control. No more monumental archiving jobs.

Hbase is the database part of Hadoop, based on Google’s seminal Bigtable paper.

Redis is another option with growing popularity. It’s a key-value store, but allowing more complex data in it’s buckets, such as hashes, lists, sets and sorted sets. Developers should be salivating at this one.

Also: 5 Great Things about Markus Winand’s Book SQL Performance Explained

3. Lowering bar

The old world of relational databases treat data as sacrosanct. DBAs are tasked with protecting it’s integrity & consistency. They manage backups to protect against disaster. In this world, every bit of data written is as sacred as any other, whether it’s your bank account balance, or a comment added to a facebook discussion.

But modern non-relational databases introduce the idea of eventually consistent. DBAs and architects would say we are relaxing our durability requirements. What they mean is data can get slightly out of sync and we’re ok with that. We’ll build our web applications to plan for that, or even in the case of Riak expose the levers of durability directly to the developers, allowing them to make some changes instant, while others more lax and lazy.

Check this: Why high availability is so very hard to deliver

4. Cloud demands

Virtualized environments like Amazon EC2, give easy access to legions of servers. Availability zones & regions only widen the deployment options. So deploying a single writeable master, the way traditional relational databases work best, is not natural.

Databases like Cassandra, Mongo & Redis are clustered right out of the box. They grew up in this virtual datacenter environment and feel comfortable there.

Related: Why I wrote the book on Oracle & Open Source

5. Only DBAs understand them

Devs may whine at this statement, and to be fair it’s a generalization. The popularity of ORMs speaks volumes here. Anything to eliminate the dreaded SQL writing. Meanwhile DBAs bemoan the use of ORMs for they represent everything they’re trying to fix.

SQL is hard enough, but the ugly truth is each database vendor has their own implementation, their own optimizations, their own optimal tweaks. Even between database versions, SQL code may not perform consistently.

Identifying slow SQL and tweaking it remains one of the primary tasks of performance tuning, for this reason. It hasn’t changed much in my two decades on the job.

Also: Why bemoaning AWS performance sounds like Linux detractors circa 1999

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Scalable Startups is back – happy holidays!

CC license via barrettward
CC license via barrettward

Join 15,100 others and follow Sean Hull on twitter @hullsean.

After some intermittent outages over the past weeks, we managed to get the site rebuilt on a brand new server. Yes we’re still hosted at a traditional datacenter, 1and1.com to be exact. They were very helpful through the recovery process.

Here are five lessons learned.

1. Keep various types of backups

While the site was down, I was scrambling to consider what I had lost and what would be the fastest way to get the site back online. First there is the content, images and database backup. But wordpress also has an export function, so an additional XML backup of all site content proved invaluable. Further you have the software, the content management system of wordpress itself. Yes you can download a new copy, but what about the plugins, and header modifications to support Google Analytics?

Read: Why Oracle won’t kill MySQL

2. Be patient with the support staff

Eventually you’ll need to get on a call with the support techs. They may try to help, but speak of worse case scenarious, and ask if you have a backup, like you’re really going to need it! Remain calm, and ask for clarification.

In my case, the mirrored root drives looked unsalvageable. Turns out that although the md volumes – Linux’s software RAID – could not be rebuilt, the good drive contained most of the data I needed to restore. In that case a filesystem check fixed the journals and brought back a copy of data.

Related: Why Affordable Care Act desperately needs Techops

3. Keep detailed notes of all your components

When you’re scrambling, and frustrated, notes save you. There are way more moving parts than you can keep all in your head, so why not keep good notes too?

For my site, there is wordpress, the version, MySQL and it’s version, config files for Apache & MySQL, dump files of individual schemas, existing installed plugins, themes, mods, google analytics configuration, passwords for the server, and wordpress itself. And don’t forget the logins to your hosting dashboard, PINs and secret identifiers. Write it all down!

Also: 5 Great Things about Markus Winand’s Book SQL Performance Explained

4. Monitor and test more

As it turns out I wasn’t hacked this time around, but was having a disk failure. This was manifesting in a strange way. Software RAID needs to be monitored, as does the syslog & mysql & apache logs.

Also, always be testing. I found that although I was using pingdom to notify me of outages, it would not *repeat* that notification. Since I had an extended outage of many days off and on, I had no idea of this. Pingdom had sent me one notification which I missed, and none to follow.

Check this: Why high availability is so very hard to deliver

5. Don’t forget the holidays!

Yes an outage is serious, but keep a sense of perspective. Have a splash page sitting at the ready, while you’re fiddling with all the components.

Although your digital billboard is down, your audience and customers are probably more patient than you realize. If you provide valued content, they’ll be back!

Read: How to do an architecture assessment for a merger

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to do technical assessment for a merger acquisition

tsmitty11 big eats small fish

Join 15,000 others and follow Sean Hull on twitter @hullsean.

My newsletter goes out the first few days of every month. I invite everyone I’ve ever worked with, so many of my present and former colleagues & clients receive it. I often receive a few emails in the days following, with requests for help, advice & expertise.

Recently I was referred on a merger acquisition project. The project involved evaluating the technology of the target company, to understand how it would fit in with existing infrastructure, and what challenges they might face.

1. What’s your tech stack?

The scope of the project included evaluating existing application code. We looking closely at:

o What programming languages & versions were in use?
o What volume of code was there, when was it built, and how was it maintained?
o Was the code well commented & organized?
o Are those languages or technologies popular in the marketplace today?
o Are the foundation technologies still supported, either commercially or by open source communities?

Related: When you’re hired to solve a people problem

2. What’s your team stack?

One of the points that came up during the discovery phase was subject matter expertise. We found that the acquiring company’s DNA was around Windows, SQL Server & IIS. Their applications had been developed mostly on C#, so their team had skills around that stack.

The target company, on the other hand was Oracle on Unix. Here the team & expertise had a different heritage.

These two stacks have very different people & cultures behind them. Those would likely introduce unique challenges were you to merge those two firms, as the former would not have skills and expertise to manage & maintain the latter. Trimming teams down, or consolidating hardware & components would likely prove challenging.

Also: A CTO must never do this

3. Where are your pain points

We also evaluated current problem areas in the target application.

o Where were team members struggling most?
o What type of performance problems existed?
o Was there data mismatch or redundancy?
o Were business units struggling in some area to report on the data they needed?

Read this: When you have to take the fall

4. Style of software development

In software development, the traditional building model is called waterfall. It involves specing out requirements, spending a long period writing code, and then releasing it all at once to be tested & deployed. Typically during development period, there isn’t a working version of the system, as it’s undergoing change.

The risk with waterfall is that what comes out the other end won’t be what feature specification teams envisioned. Worse still is when the resulting product is full of bugs, or has major performance problems. The ACA used this method to develop healthcare.gov website, resulting in a whole host of problems on launch. It’s why I’ve advocated for real techops at Healthcare.gov.

These days, the modern and arguably superior model is called agile software development. This involves writing code in much much smaller chunks, and for each releasing a small test to verify that it performs it’s function. These unit tests allow for software to be continuously deployed, perhaps multiple times per day.

Check this: How we do a performance review

5. Legacy or Open Source

A last point of evaluation is the use of legacy software components such as Oracle, versus the more nimble and open source components many internet firms use, such as Linux, Apache, MySQL & PHP or Python. These later components make the stack of many modern web facing applications, are supported by the internet community, and provide flexibility & configurability in the cloud.

Fred Wilson has advocated for Open Source in various postings on his A VC blog. Although these technologies offer great opportunity & strength, your team may not have experience bushwacking through the DIY world of open source and that would be a major consideration for a merger.

DNA and culture of the acquiring company, and the target company have a huge impact on whether those technologies will all play well together as the firm grows up.

Also: A sample executive summary we did for Acme Startup, Inc.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 great things about Markus Winand’s book SQL Performance Explained

markus winand sql performance explained book

Join 12,100 others and follow Sean Hull on twitter @hullsean.

1. Covers databases broadly

You may not have noticed, but there’s a whole spectrum of relational databases on offer. Of course in the database world, most get infatuated with one, and that becomes their bread & butter before long. Their life, their passion, their devotion.

That’s fine as far as it goes, but Winand really stands out, offering a spectrum of ideas and optimization techniques for different platforms. If you’re an Oracle-only or MySQL-only dba you’ll gain a lot from this book but even more importantly if you work in professional services, and need to communicate with DBAs brought up on one of these platforms, it becomes like a rosetta stone for SQL query tuning.

Read: Why devops talent is in short supply

2. Shows you how to fish

I find database books and methods fall into two sort of broad categories. There’s the call Oracle support method, where you’ll be handed one very specific set of steps, commands, and a path to solve each specific problem. It’s more about memorization, it’s like they actually hand you the fish.

Then there is the investigative method, where you learn how to use a magnifying glass to look at fingerprints, and check for DNA samples, and interrogate suspects. You know learn the tools of the trade.

That’s what Markus brings you, in all it’s delicious glory.

Read: Why high availability is so very hard to deliver

3. It’s concise

Another gripe I have of technical books is that the publishing model is, this is a textbook and since the price tag is large, let’s make the physical book large! Of course no one wants to carry those books around. I even recently bought a kindle to solve this problem.

SQL Performance Explained is more a paperback book form factor, and that means you can tote it around with you easily, and keep it with you at work. Read it on the train, commuting to work.

200 pages packed cover to cover with all sorts of good chapters, including a primer on indexes & types, scalability & performance, joins, clustering, Top-N queries, DML, and more.

Read this: Why a four letter word divides dev and ops

4. It’s technical but accessible

If you’re a real rock bottom beginner, you might want to dig a bit more on your SQL syntax, and some of the basics. You could also keep a 101 book side-by-side, while you’re reading this book.

For the intermediate & advanced DBAs out there, this book will sit comfortably in your paws as you flip the pages and learn something new. For instance just today I learned that Postgres can index NULLs while MySQL, Oracle and SQL*Server cannot. Learn something new everyday.

Related: MySQL interview guide for managers and candidates alike

5. Gives you answers you can use today

After twenty years of consulting, I’ve seen a few patterns emerge. Besides the spectrum of team & communication challenges, firms hitting the performance wall often have issues with their relational databases.

Yes those databases are sometimes on the wrong hardware, or their are other obscure problems with setup or configuration. But the bulk of issues center on badly written SQL.

SQL is a much reviled language and often misunderstood. And it doesn’t seem like developers have gotten that much better at it over the years. It would explain the rise of NoSQL databases, as they often speak REST or xml, no need for pesky sequel.

One parting note. For all the devs and architects out there, who want to sing the virtues of ORMs, this book hits that squarely in the nose. By showing how differently each relational database implements SQL, performs work, and optimizes, Winand also illustrates the naivete behind trying to write database independent application code.

If you’re a developer and don’t know how to profile a query or run explain plan, don’t walk, run to your closest Amazon.com store and get this book!

Also: 5 more things deadly to scalability

Criticisms

If I were to offer two slight criticisms, it would be these. First, the index is a bit wonky. When I look under “P” for example, there’s no Postgres, while one quarter of the book is obviously devoted to that platform. Further, looking up NULL which are covered in depth, in various places in the book, only has one entry in the index, p54 on Oracle. So the index could be a bit more robust to be useful.

The other criticism is more perhaps my bias. On page 96, when he discusses ORMs I thought he was rather… shall we say gentle. Although he clearly states that “eager fetching” is problematic, I don’t think he goes far enough to condemn it. In my experience ORMs are always trouble.

Then again why am I complaining, their use keeps me forever employed.

Want a copy? Markus Winand’s book site has all the goods!.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why Healthcare.gov desperately needs techops

healthcare.gov logo

Join 15,000 others and follow Sean Hull on twitter @hullsean.

1. Tech-what? A quick education

Techops is operational excellence. It’s the handoff when code is complete. These are the folks who are up at 2am when a website is down. They manage servers, keep the pipes clean, and the hackers out. They also help plan for capacity needs, and may help with load testing too.

If you’re new to technology, imagine a movie set. Story writers (programmers) have already done their part. The producers (venture capital folks) have financed the project. The director (architect) is there trying to put the vision together. But the folks who manage everything on set, from sound guys, camera guys, lighting people, and all the coordination, this is operations. In web application deployments it is devops, sysops or techops.

Also: Why the Twitter IPO is afraid of scalability

2. In contrast with Obama election campaign

Notice how phenomenally well Obama for America project was run. Like a finely tuned machine. Harper Reed and team pulled off one of the most data backed election campaigns in history.

That project used AWS cloud technologies to the fullest, from devops tools like Puppet and Asgard, collaboration tools like Campfire & Github, and superb monitoring & instrumentation tools NewRelic and Chartbeat.

Clearly Obama knows how to run an election. Something is drastically different with the healthcare.gov project. Too many cooks in the kitchen, perhaps?

Read: Why your startup needs professional techops

3. A failure in capacity planning

Many popular news outlets covered the outage, but most pointed to “bugs”, which caused the outage. But when a site dies under load, while it’s working in test & Q/A, that’s a failure of load testing, and capacity planning.

I would wager a good bet, database tuning would definitely help as it’s the most common and prevalent cause of

Read this: What four letter word divides dev and ops?

4. More testing & more Agility needed

Modern software projects take advantage of continuous integration & agile methods. That is they make small incremental changes. Developers build unit tests, and the code is always in a working state. There is no multi-month dev cycle, where your current software is in doubt.

Reports indicate that the healthcare.gov software was being designed & developed using this old and most agree inferior method of software development, the waterfall method. New Yorker criticises it in Don’t go chasing waterfalls.

Read: Why devops talent is in short supply

5. Caching is desperately needed

All high performance, high scale websites need to take advantage of various types of caching as I’ve discussed in detail before. From browser caching, to page & object caching on the server side.

Hayden James investigated in depth, and found healthcare.gov severely lacking. Again this is a huge failure in techops, sysops or devops. It’s not a bug, and not something the developers are responsible to deliver.

Read: Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters