I like the image of this city of Mesa Verde. It’s fascinating to see how ancient cities were built, especially as an inhabitant of one of the worlds largest cities today, New York.
I’m a long time relational database guy. I worked at scores of dot-coms in the 90′s as an old-guard Oracle DBA, and pivoted to MySQL into the new century. Would a guy like me who’s seen 20 years of relational database dominance really believe they could be dying?
There’s a lot to be excited about in this new realm of db, and some interesting bigger trends that are pushing things in a new way.
Join 15,100 others and follow Sean Hull on twitter @hullsean.
1. Growing use of ORMs
ORM probably sounds like some strange fossil archeologists just dug up in the ancient city of Mesa Verde. But they’re important. You may know them by their real-life names, Hibernate, Active Record, SQL Alchemy and Cake. There are many others. Object Relational Modelers provide a middleware between developers and the SQL of your chosen relational database. They abstract away the nitty gritty, and encapsulate it into a library.
In a way they’re like code generators. Mark Winand talks about them in SQL Performance Explained warning of the “eager fetching” problem. This is DBA speak for specifying all columns (SELECT *) or fetching all rows, when you don’t need them all. It’s inefficient in terms of asking the database to read & cache all that data, but also to send it across the network and then discard it on the webserver side. Like a lazy housekeeper the clutter & dust will grow to overwhelm you.
Martin Fowler is the author of the great book NoSQL Distilled. He tries to walk the fence in his post ORM Hate, trying to balance developers love of ORMs, and the obvious need for scalability. Ted Neward calls ORMs the Vietnam of Computer Science. Mattias Geniar points out that BAD ORMs are infinitely worse than bad SQL and another on High Scalability by Drewsky ”>The Case Against ORM Frameworks.
If you agree the ORM conversation is still a huge mess, you’ll be excited to know that NoSQL sidesteps it completely. They’re built out of the box to interface more like data structures, than reading rows and columns. So you eliminate the scalability problems they introduce when you go NoSQL. That makes developers happy, and pleases DBAs and techops too. Win!
2. Widening field of options
NoSQL databases are not simply key value stores, though some like Memcache and Riak do fit that mold.
Mongodb offers configurable consistency & durability & the advantages of document storage, no need for an ORM here. You also have a mix of indexing options, that go a little deeper than other NoSQL solutions. A sort of middle ground solution that offers the best of both worlds.
Cassandra, a powerful db that is clustered out of the box. All nodes are writeable, and there are various ways to handle conflict resolution to suit your needs. Cassandra can grow big, and naturally takes advantage of cloud nodes. It also has a nice feature to naturally age out data, based on settings you control. No more monumental archiving jobs.
Hbase is the database part of Hadoop, based on Google’s seminal Bigtable paper.
Redis is another option with growing popularity. It’s a key-value store, but allowing more complex data in it’s buckets, such as hashes, lists, sets and sorted sets. Developers should be salivating at this one.
3. Lowering bar
The old world of relational databases treat data as sacrosanct. DBAs are tasked with protecting it’s integrity & consistency. They manage backups to protect against disaster. In this world, every bit of data written is as sacred as any other, whether it’s your bank account balance, or a comment added to a facebook discussion.
But modern non-relational databases introduce the idea of eventually consistent. DBAs and architects would say we are relaxing our durability requirements. What they mean is data can get slightly out of sync and we’re ok with that. We’ll build our web applications to plan for that, or even in the case of Riak expose the levers of durability directly to the developers, allowing them to make some changes instant, while others more lax and lazy.
Check this: Why high availability is so very hard to deliver
4. Cloud demands
Virtualized environments like Amazon EC2, give easy access to legions of servers. Availability zones & regions only widen the deployment options. So deploying a single writeable master, the way traditional relational databases work best, is not natural.
Databases like Cassandra, Mongo & Redis are clustered right out of the box. They grew up in this virtual datacenter environment and feel comfortable there.
5. Only DBAs understand them
Devs may whine at this statement, and to be fair it’s a generalization. The popularity of ORMs speaks volumes here. Anything to eliminate the dreaded SQL writing. Meanwhile DBAs bemoan the use of ORMs for they represent everything they’re trying to fix.
SQL is hard enough, but the ugly truth is each database vendor has their own implementation, their own optimizations, their own optimal tweaks. Even between database versions, SQL code may not perform consistently.
Identifying slow SQL and tweaking it remains one of the primary tasks of performance tuning, for this reason. It hasn’t changed much in my two decades on the job.