Category Archives: MySQL

Connect to MySQL in the Amazon Public Cloud

MySQL on Amazon Cloud AWS

Troubleshooting MySQL on Amazon can be a real test of patience. There are quite a few different things to watch out for in terms of connectivity & networking. Sometimes a checklist can help.

Join 16,000 others and follow Sean Hull on twitter @hullsean.

Here’s my exhaustive list of things that can block you.

1. Be sure to create users & grants

Chances are you did something like this to create your user:


mysql> CREATE USER ‘sean’@‘localhost’ IDENTIFIED BY ‘password’;
mysql> GRANT ALL PRIVILEGES ON sean_schema.* TO ‘sean’@‘localhost' WITH GRANT OPTION;

But that won’t help you when connecting from a remote Amazon box. So what to do? Here’s an example:


mysql> CREATE USER ‘sean’@’10.10.%’ IDENTIFIED BY ‘password’;
mysql> GRANT ALL PRIVILEGES ON sean_schema.* TO ‘sean’@‘%’ WITH GRANT OPTION;

You may need to make your source IP wildcard *more* aggressive. For example consider ’10.%’. You *may* even with with ‘%’ which allows *all* source IPs. This may sound dangerous, but if you use a tight security group (see item #3 below), you can still be safe.

Related: Why Oracle Won’t Kill MySQL

2. Make sure iptables is not a problem

IPTables is a Linux service that acts like a private firewall for each server. Some AMIs will have it enabled by default. If you’re having trouble like I did, this can definitely trip you up. That’s because your connection will fail silently without telling you, hey the OS won’t let me into that port!

If you are a networking pro you’ve probably already fiddled with iptables. Feel free to add specific rules, and keep it turned on. However I’d recommend just disabling it completely, and using your Amazon security groups to protect your ports.


$ /etc/init.d/iptables stop
$ chkconfig --list iptables
iptables 0:off 1:off 2:on 3:on 4:on 5:on 6:off
$ chkconfig --del iptables
$ chkconfig --list iptables
service iptables supports chkconfig, but is not referenced in any runlevel (run 'chkconfig --add iptables')

Also: Are SQL Databases Dying Out?

3. Test & verify amazon security group settings

Security groups in Amazon can be tricky. I recommend the following:

o create a security group webserver_group
- allow port 80 from 0.0.0.0/0
- allow port 443 from 0.0.0.0/0
- allow port 22 from

o create a security group db_group
- allow port 22 from
- allow 3306 from

What’s happening here? We can’t specify a fixed set of IP addresses because they can change in Amazon. So essentially what we’ve done is say *any* requests from servers in our Amazon package, which are in the webserver_group security group, can connect to port 3306. Pretty cool right?

This means we’re pretty locked down. No internet connections to 3306, so we can be a little looser (see item #1 above) about our grants and source IPs.

What about if you want to use your GUI tools to hit your Amazon hosted MySQL boxes? Say you like to use the Oracle Workbench, Navicat or Toad to connect to MySQL. One way you could do this is configure your db_group to allow 3306 from your office subnet. Then anyone VPN’d into your office will be able to use the tools they like.

Another option is to use Amazon VPC for your servers. You’ll setup an Amazon Virtual Private Gateway, which is a direct VPN connection between Amazon’s datacenter and your datacenter. This can be a messy process, and you’ll want to contact your network admin to help. Once it’s setup, amazon boxes appear to sit on your office or datacenter network. Cool stuff!


$ mysql -h xxx.xxx.xxx.xxx -u admin -p
Enter password:
ERROR 2003 (HY000): Can't connect to MySQL server on 'xxx.xxx.xxx.xxx'

Read this: Why are MySQL experts in such short supply?

4. MySQL network settings

If MySQL is bound to the wrong IP address you can have real problems. First be sure skip_networking is OFF. If it is ON change it in /etc/my.cnf & restart MySQL.


mysql> show variables like 'skip_net%';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| skip_networking | OFF |
+-----------------+-------+
1 row in set (0.00 sec)

The other MySQL setting that can be problematic is bind-address. First check what it is set to:


$ cat /etc/my.cnf | grep bind
bind-address=127.0.0.1

This isn’t going to allow remote connections. In amazon however, your IP address may change upon reboot. So there is a special setting to allow binding to any IP:


bind-address=0.0.0.0

Related: Bulletproofing MySQL Replication with Checksums

5. installing mysql client & telnet for troubleshooting

You have two options for troubleshooting on the webserver side. If you’re simply trying to check by mysql command line, you may get blocked up if the network settings & security groups aren’t configured right. So use telnet first.


$ yum install -y telnet

$ telnet 10.10.10.1 3306
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
4
5.1.71??gu9Y6B'/y9Oay`QV Connection closed by foreign host.
$

If you don't get a responce, it's not an issue with users or grants, but rather that the port isn't opened. Check iptables, check bind-address and check security groups.

Check this: Top MySQL DBA Interview Questions

6. SE Linux related issues

SE Linux will do a lot of good, if managed properly. However if you're not aware of it's existence, it can be very very frustrating. Symptoms can be as abstract as allergies, a cold or flu. It can monitor files, and prevent MySQL from being able to write where it needs to,

Read this: Migrating MySQL to Oracle

7. RPM & later centos yum repo install conflicts

I had real problems doing a custom install for a customer. They didn't want to use a repository for various settings, but preferred downloading RPMs. There were a few other customizations which were tripping things up.

Based on all the connectivity issues I was having, I backed out of the RPM based install, and then ran through a stock yum install. After doing that, I started seeing these weird errors in the mysqld.log

120328 21:32:40 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
120328 21:32:40 [ERROR] Do you already have another mysqld server running on port: 3306 ?
120328 21:32:40 [ERROR] Aborting
If I run "netstat -nat | grep 3306" in my terminal, I get the following:
tcp4 0 0 *.3306 . LISTEN

I spent hours spinning my wheels and not able to figure out what was happening here. At first it seemed a leftover pid file was the culprit. In the end it appeared the *old* /etc/init.d/mysql script was still in place, and the new yum packages wouldn't work with that.

I ended up just scrapping the whole box, and starting from scratch. Sometimes you have to do that. After a clean build, all was fine.

Related: RDS or MySQL 10 Use Cases

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don't work with recruiters

Are SQL Databases Dead?

mesa verde city

I like the image of this city of Mesa Verde. It’s fascinating to see how ancient cities were built, especially as an inhabitant of one of the worlds largest cities today, New York.

I’m a long time relational database guy. I worked at scores of dot-coms in the 90′s as an old-guard Oracle DBA, and pivoted to MySQL into the new century. Would a guy like me who’s seen 20 years of relational database dominance really believe they could be dying?

There’s a lot to be excited about in this new realm of db, and some interesting bigger trends that are pushing things in a new way.

Join 15,100 others and follow Sean Hull on twitter @hullsean.

1. Growing use of ORMs

ORM probably sounds like some strange fossil archeologists just dug up in the ancient city of Mesa Verde. But they’re important. You may know them by their real-life names, Hibernate, Active Record, SQL Alchemy and Cake. There are many others. Object Relational Modelers provide a middleware between developers and the SQL of your chosen relational database. They abstract away the nitty gritty, and encapsulate it into a library.

In a way they’re like code generators. Mark Winand talks about them in SQL Performance Explained warning of the “eager fetching” problem. This is DBA speak for specifying all columns (SELECT *) or fetching all rows, when you don’t need them all. It’s inefficient in terms of asking the database to read & cache all that data, but also to send it across the network and then discard it on the webserver side. Like a lazy housekeeper the clutter & dust will grow to overwhelm you.

Martin Fowler is the author of the great book NoSQL Distilled. He tries to walk the fence in his post ORM Hate, trying to balance developers love of ORMs, and the obvious need for scalability. Ted Neward calls ORMs the Vietnam of Computer Science. Mattias Geniar points out that BAD ORMs are infinitely worse than bad SQL and another on High Scalability by Drewsky ”>The Case Against ORM Frameworks.

If you agree the ORM conversation is still a huge mess, you’ll be excited to know that NoSQL sidesteps it completely. They’re built out of the box to interface more like data structures, than reading rows and columns. So you eliminate the scalability problems they introduce when you go NoSQL. That makes developers happy, and pleases DBAs and techops too. Win!

Read: Why Oracle won’t kill MySQL

2. Widening field of options

NoSQL databases are not simply key value stores, though some like Memcache and Riak do fit that mold.

Mongodb offers configurable consistency & durability & the advantages of document storage, no need for an ORM here. You also have a mix of indexing options, that go a little deeper than other NoSQL solutions. A sort of middle ground solution that offers the best of both worlds.

Cassandra, a powerful db that is clustered out of the box. All nodes are writeable, and there are various ways to handle conflict resolution to suit your needs. Cassandra can grow big, and naturally takes advantage of cloud nodes. It also has a nice feature to naturally age out data, based on settings you control. No more monumental archiving jobs.

Hbase is the database part of Hadoop, based on Google’s seminal Bigtable paper.

Redis is another option with growing popularity. It’s a key-value store, but allowing more complex data in it’s buckets, such as hashes, lists, sets and sorted sets. Developers should be salivating at this one.

Also: 5 Great Things about Markus Winand’s Book SQL Performance Explained

3. Lowering bar

The old world of relational databases treat data as sacrosanct. DBAs are tasked with protecting it’s integrity & consistency. They manage backups to protect against disaster. In this world, every bit of data written is as sacred as any other, whether it’s your bank account balance, or a comment added to a facebook discussion.

But modern non-relational databases introduce the idea of eventually consistent. DBAs and architects would say we are relaxing our durability requirements. What they mean is data can get slightly out of sync and we’re ok with that. We’ll build our web applications to plan for that, or even in the case of Riak expose the levers of durability directly to the developers, allowing them to make some changes instant, while others more lax and lazy.

Check this: Why high availability is so very hard to deliver

4. Cloud demands

Virtualized environments like Amazon EC2, give easy access to legions of servers. Availability zones & regions only widen the deployment options. So deploying a single writeable master, the way traditional relational databases work best, is not natural.

Databases like Cassandra, Mongo & Redis are clustered right out of the box. They grew up in this virtual datacenter environment and feel comfortable there.

Related: Why I wrote the book on Oracle & Open Source

5. Only DBAs understand them

Devs may whine at this statement, and to be fair it’s a generalization. The popularity of ORMs speaks volumes here. Anything to eliminate the dreaded SQL writing. Meanwhile DBAs bemoan the use of ORMs for they represent everything they’re trying to fix.

SQL is hard enough, but the ugly truth is each database vendor has their own implementation, their own optimizations, their own optimal tweaks. Even between database versions, SQL code may not perform consistently.

Identifying slow SQL and tweaking it remains one of the primary tasks of performance tuning, for this reason. It hasn’t changed much in my two decades on the job.

Also: Why bemoaning AWS performance sounds like Linux detractors circa 1999

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 great things about Markus Winand’s book SQL Performance Explained

markus winand sql performance explained book

Join 12,100 others and follow Sean Hull on twitter @hullsean.

1. Covers databases broadly

You may not have noticed, but there’s a whole spectrum of relational databases on offer. Of course in the database world, most get infatuated with one, and that becomes their bread & butter before long. Their life, their passion, their devotion.

That’s fine as far as it goes, but Winand really stands out, offering a spectrum of ideas and optimization techniques for different platforms. If you’re an Oracle-only or MySQL-only dba you’ll gain a lot from this book but even more importantly if you work in professional services, and need to communicate with DBAs brought up on one of these platforms, it becomes like a rosetta stone for SQL query tuning.

Read: Why devops talent is in short supply

2. Shows you how to fish

I find database books and methods fall into two sort of broad categories. There’s the call Oracle support method, where you’ll be handed one very specific set of steps, commands, and a path to solve each specific problem. It’s more about memorization, it’s like they actually hand you the fish.

Then there is the investigative method, where you learn how to use a magnifying glass to look at fingerprints, and check for DNA samples, and interrogate suspects. You know learn the tools of the trade.

That’s what Markus brings you, in all it’s delicious glory.

Read: Why high availability is so very hard to deliver

3. It’s concise

Another gripe I have of technical books is that the publishing model is, this is a textbook and since the price tag is large, let’s make the physical book large! Of course no one wants to carry those books around. I even recently bought a kindle to solve this problem.

SQL Performance Explained is more a paperback book form factor, and that means you can tote it around with you easily, and keep it with you at work. Read it on the train, commuting to work.

200 pages packed cover to cover with all sorts of good chapters, including a primer on indexes & types, scalability & performance, joins, clustering, Top-N queries, DML, and more.

Read this: Why a four letter word divides dev and ops

4. It’s technical but accessible

If you’re a real rock bottom beginner, you might want to dig a bit more on your SQL syntax, and some of the basics. You could also keep a 101 book side-by-side, while you’re reading this book.

For the intermediate & advanced DBAs out there, this book will sit comfortably in your paws as you flip the pages and learn something new. For instance just today I learned that Postgres can index NULLs while MySQL, Oracle and SQL*Server cannot. Learn something new everyday.

Related: MySQL interview guide for managers and candidates alike

5. Gives you answers you can use today

After twenty years of consulting, I’ve seen a few patterns emerge. Besides the spectrum of team & communication challenges, firms hitting the performance wall often have issues with their relational databases.

Yes those databases are sometimes on the wrong hardware, or their are other obscure problems with setup or configuration. But the bulk of issues center on badly written SQL.

SQL is a much reviled language and often misunderstood. And it doesn’t seem like developers have gotten that much better at it over the years. It would explain the rise of NoSQL databases, as they often speak REST or xml, no need for pesky sequel.

One parting note. For all the devs and architects out there, who want to sing the virtues of ORMs, this book hits that squarely in the nose. By showing how differently each relational database implements SQL, performs work, and optimizes, Winand also illustrates the naivete behind trying to write database independent application code.

If you’re a developer and don’t know how to profile a query or run explain plan, don’t walk, run to your closest Amazon.com store and get this book!

Also: 5 more things deadly to scalability

Criticisms

If I were to offer two slight criticisms, it would be these. First, the index is a bit wonky. When I look under “P” for example, there’s no Postgres, while one quarter of the book is obviously devoted to that platform. Further, looking up NULL which are covered in depth, in various places in the book, only has one entry in the index, p54 on Oracle. So the index could be a bit more robust to be useful.

The other criticism is more perhaps my bias. On page 96, when he discusses ORMs I thought he was rather… shall we say gentle. Although he clearly states that “eager fetching” is problematic, I don’t think he goes far enough to condemn it. In my experience ORMs are always trouble.

Then again why am I complaining, their use keeps me forever employed.

Want a copy? Markus Winand’s book site has all the goods!.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why Oracle won’t kill MySQL

oracle mysql database

Join 15,000 others and follow Sean Hull on twitter @hullsean.

1. MySQL does not compete with Oracle

It’s a myth that MySQL somehow poses a threat to Oracle. Oracle’s customers tend to be large enterprises running apps like e-business suite. These are certified to run on Oracle, and further they sit close to finance.

MySQL tends to be a choice of scrappy but nimble startups for their web-facing applications. They want to deploy in the cloud, and don’t want to deal with licenses. Plus they have the techops chops to handle the bushwacking of open source.

Related: Why I wrote the book on Oracle & Open Source

2. Oracle bought Sun for the hardware business

Remember when Oracle acquired Sun? A lot of folks assumed Larry was after MySQL. Grab it & slowly smother it. But actually it was more frosting on the cake. Larry had for years expressed interest in cubes and clusters, and building an Oracle appliance. Whether this ever came to profitable fruition in the form of Exadata remains to be seen. But buying Sun for a song helped him do this.

Also: Why bemoaning AWS performance sounds like Linux detractors circa 1999

3. Larry blows with the wind on open source

He’s money minded, so you’ll see in his decisions that comes first.

In the late 90’s when a customer might spend $100k on Sun and $100k on Oracle licenses, Larry realized porting to Linux and pushing commodity hardware would be a win. So he pushed Linux, and customers could now spend $20k on commodity hardware and $180k on Oracle licenses for them. Imagine the 10million dollar budget if you’re having trouble with the math here.

He also eventually moved the middle tier to Apache for similar reasons. I would argue Oracle corp overall pays lip service to contributing to open source, but they do that to some degree.

Read: Why MySQL dbas are so hard to find

4. MySQL support business is real

What’s more, just as adopting Linux, and then offering their “unbreakable Linux” distro, and pricey support along with it, they’re doing similar things with MySQL. For enterprise customers, and those already comfortable with making the call to Redwood Shores, sales folks will happily direct them support contracts and enterprise add-ons. Naturally.

Read: Why your startup needs real techops

5. There are real viable alternatives to keep balance

And let’s not forget folks, there are already a bunch of forks. There’s the popular and every growing Mariadb which Google has put their muscle behind.

Of course let’s not forget the very popular, very capable, and very bulletproof Percona distribution, along with the Percona toolkit and xtrabackup for real hotbackups.

And for those looking to experiment, there’s Drizzle a work in progress, complete rewrite, and one that’s unfortunately not a drop-in replacement.

Read this: What’s the four letter word dividing dev and ops?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

3 Simple Patterns for Tighter MySQL Code

Join 8000 others and follow Sean Hull on twitter @hullsean.

SQL is derided by many and for good reason. It’s key to scalability yet terribly difficult to write good code.

Here’s a few quick tips to write tighter queries in MySQL

1. Get rid of those Subqueries!

Subqueries are a standard part of SQL, unfortunately MySQL doesn’t handle them very well. Luckily there’s a sweet rewrite that can put you in the fast lane. Here’s how to speedup a MySQL subquery by rewriting as a join.

Note that another compelling reason to upgrade to MySQL 5.6 is that this tweak has been rolled into the optimizer. Hoorah!

Also: 5 Things Deadly to Scalability.

2. Repair those UNIONs

If your code uses the UNION construct in SQL, there are a few different ways to tune those queries. You can use UNION ALL or pushdown conditions can help you optimize UNION in MySQL.

Read this: MySQL DBA Hiring Guide for candidates, managers & recruiters

4. Better PAGING through datasets

Does your web application display pages of users, pages of orders or pages of items? If you’re using
LIMIT and OFFSET there are 3 good ways to optimize these in MySQL.

Check out: Scalability Happiness – A Quiet Query Log

Get some in your inbox: Exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

MySQL Subquery Optimization

Join 7000 others and follow Sean Hull on twitter @hullsean.

MySQL’s optimizer can do a lot of things, but subqueries are not always handled well. Take a look at the IN subquery below. If you see the DEPENDENT SUBQUERY in your explain plan, you may want to take a second look. This will run slow as a dog, when the tables get large.

[code]
SELECT * FROM bucket
WHERE bucket_id IN (
SELECT bucket_id
FROM bucket_items
WHERE item_id = 1);
[/code]

Here’s what the EXPLAIN looks like.

[code]
(sean@localhost:mysql.sock) [test]> explain select * from bucket where bucket_id in (select bucket_id from bucket_items where item_id = 1);
+----+--------------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+------+---------------+------+---------+------+------+-------------+
| 1 | PRIMARY | bucket | ALL | NULL | NULL | NULL | NULL | 3 | Using where |
| 2 | DEPENDENT SUBQUERY | bucket_items | ALL | NULL | NULL | NULL | NULL | 2 | Using where |
+----+--------------------+--------------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.00 sec)
[/code]

Also: 3 Ways to Optimize Paging queries with LIMIT and OFFSET

[quote]
Subqueries are not handled well in MySQL by default. Luckily an INNER JOIN rewrite can help in some cases.
[/quote]

Rewrite as an INNER JOIN

Fortunately there is an optimization for this type of query. You can rewrite it as an inner join.

[code]
SELECT bucket.*
FROM bucket
INNER JOIN bucket_items
USING (bucket_id)
WHERE item_id = 1;
[/code]

Here’s what the new explain looks like:

[code]
(sean@localhost:mysql.sock) [test]> explain select bucket.* from bucket inner join bucket_items using (bucket_id) where item_id = 1;
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | bucket_items | ALL | NULL | NULL | NULL | NULL | 2 | Using where |
| 1 | SIMPLE | bucket | ALL | NULL | NULL | NULL | NULL | 3 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.00 sec)
[/code]

Also: How to optimize MySQL UNION for high speed

Note that you’re still getting out the same bucket items as the previous query, you’re just showing MySQL a much more efficient way to fetch and return the rows to you.

**Last point. You should index the bucket_id & item_id columns. I simply wanted to illustrate the DEPENDENT SUBQUERY above.

Get some in your inbox: Exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

3 Ways to Optimize for Paging in MySQL

Join 6100 others and follow Sean Hull on twitter @hullsean.

Lots and lots of web applications need to page through information. From customer records, to the albums in your itunes collection. So as web developers and architects, it’s important that we do all this efficiently.

Start by looking at how you’re fetching information from your MySQL database. We’ve outlined three ways to do just that.

Also check out: Five more things Deadly to Scalability.

1. Paging without discarding records

Ultimately we’re trying to avoid discarding records. After all if the server doesn’t fetch them, we save big. How else can we avoid this extra work.

How about remember the last name. For example:

[code]
select id, name, address, phone
FROM customers
WHERE id > 990
ORDER BY id LIMIT 10;
[/code]

Also: The Mythical MySQL DBA.

Of course such a solution would only work if you were paging by ID. If you page by name, it might get messier as there may be more than one person with the same name. If ID doesn’t work for your application, perhaps returning paged users by USERNAME might work. Those would be unique:

[code]
SELECT id, username
FROM customers
WHERE username > 'shull@iheavy.com'
ORDER BY username LIMIT 10;
[/code]

Read this: Myth of Five Nines.

[quote]
Paging queries can be slow with SQL as they often involve the OFFSET keyword which instructs the server you only want a subset. However it typically scans collects and then discards those rows first. With deferred join or by maintaining a place or position column you can avoid this, and speedup your database dramatically.
[/quote]

2. Try using a Deferred Join

This is an interesting trick. Suppose you have pages of customers. Each page displays ten customers. The query will use LIMIT to get ten records, and OFFSET to skip all the previous page results. When you get to the 100th page, it’s doing LIMIT 10 OFFSET 990. So the server has to go and read all those records, then discard them.

Also: AirBNB didn’t have to fail during an AWS outage.

[code]
SELECT id, name, address, phone FROM customers ORDER BY name LIMIT 10 OFFSET 990;
[/code]

MySQL is first scanning an index then retrieving rows in the table by primary key id. So it’s doing double lookups and so forth. Turns out you can make this faster with a tricky thing called a deferred join.

The inside piece just uses the primary key. An explain plan shows us “using index” which we love!

[code]
SELECT id
FROM customers
ORDER BY name
LIMIT 10 OFFSET 990;
[/code]

Now combine this using an INNER JOIN to get the ten rows and data you want:

[code]
SELECT id, name, address, phone
FROM customers
INNER JOIN (
SELECT id
FROM customers
ORDER BY name
LIMIT 10 OFFSET 990)
AS my_results USING(id);
[/code]

That’s pretty cool!

3. Maintain a Page or Place column

Another way to trick the optimizer from retrieving rows it doesn’t need is to maintain a column for the page, place or position. Yes you need to update that column whenever you (a) INSERT a row (b) DELETE a row ( c) move a row with UPDATE. This could get messy with page, but a straight place or position might work easier.

[code]
SELECT id, name, address, phone
FROM customers
WHERE page = 100
ORDER BY name;
[/code]

Hiring? MySQL DBA Interview Guide for Candidates & Managers.

Or with place column something like this:

[code]
SELECT id, name, address, phone
FROM customers
WHERE place BETWEEN 990 AND 999
ORDER BY name;
[/code]

Get some in your inbox: Exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

MySQL for Devs, DBAs and Debutantes

Join 6500 others and follow Sean Hull on twitter @hullsean.

I just received my copy of the 5th Edition of Paul DuBois’ MySQL tomb. Weighing in at 1153 pages, it’s a solid text, with a very thorough introduction to the topic of administering MySQL databases.

Buy the book here: MySQL 5th Edition by Paul Dubois

A book for a broad audience

When I say debutantes, it’s a nod to beginners, for this book forges a very solid and complete introduction to the topic of MySQL. Start with installing the software & setting up your environment, and then move on to really understanding the SQL language, from commands to create objects, to ones for adding & modifying data, and then writing code around it.

See also: 5 more things deadly to Scalability

There’s a thorough discussion of datatypes, stored procedures, functions and views.

[quote]
Paul Dubois’ definitive reference makes a excellent compliment to High Performance MySQL. They should sit alongside eachother on your database bookshelf.
[/quote]

For developers there are chapters on writing applications in C, another for Perl and a third for PHP.

For DBAs there are chapters on security, backups, replication, understanding the data directory and general server administration. There is also good coverage of both 5.5 and the newly released 5.6 of MySQL.

What I like about this book

You can think of this book as a definitive reference to MySQL. It includes much of the online documentation that you would find at Oracle’s site, such as command & variable reference, and detailed explanation of how to use the client tools.

Dubois also goes beyond the online documentation though, giving you a bit of a background around concepts, a broader more complete discussion.

Read this: Two Part DBA Interview Guide for Managers & Candidates alike

He also lays out the material in a very logical stepwise way, so for someone new to the MySQL world and the time on their hands, the 1153 pages could be read straight through.

Why No Mention of Percona Toolkit?

I have to admit I was a bit surprised there was no mention of Percona Toolkit. Perhaps it was buried in some dark corner of the text I missed, but it made no mention in the index at all.

Percona Toolkit of course is a tool that every DBA should be familiar with. It is really an essential toolkit and fills the gaps that the prepackaged tools can’t help you with.

Want to checksum your tables to compare data on master & slave? pt-table-checksum does the trick.

Check this: AirBNB didn’t have to fail during the Amazon AWS Outage

Want to find out how far your slaves *really* are behind? pt-heartbeat is your friend.

Want to analyze your slow query log to produce a useful summary report? pt-query-digest to the rescue.

I also see no mention of innotop, which I would also say is an essential tool. These aren’t really advanced topics, so It’s unclear why they are missing. In the real world you need these tools to do your job.

Other Criticisms

My more general criticism is where the book lacks real-world advice from a seasoned DBA. At times the writing feels a bit more of the official line on how things work. But in day-to-day devops and operations, things can be quite different.

Also: Bulletproofing MySQL Replication with Checksums

For example, stored procedures. In MySQL they are there, however using them brings real performance challenges. They’re not always compatible with replication. Given all of that, why include a whole chapter with endless discussion of them without strong reservations. It would lead a novice user or developer to incorporate them into an application only to be shocked and surprised at the problems they bring.

Another example, looking through the system variables reference, I see the sync_binlog option. There is a short caution “…lower values provide greater safety in the event of a crash, but also affect performance more adversely”. Now reading this as a novice DBA I might think great, crash protection. But having tried this parameter in production, I found a huge impact on performance and had to disable it. What’s the advice here? It’s a bit confusing.

Conclusions

This is a really great book as an introduction to MySQL, and delving into intermediate topics. I would sit it on your bookshelf along side High Performance MySQL. What this book lacks in advice, you can turn to the latter book, and what High Performance MySQL lacks in terms of introductory material this book covers in spades. They make a great compliment to each other.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

How to Optimize MySQL UNION For High Speed

obama innauguration big data sets

Join 6100 others and follow Sean Hull on twitter @hullsean.

There are two ways to speedup UNIONs in a MySQL database. First use UNION ALL if at all possible, and second try to push down your conditions.

[mytweetlinks]

1. UNION ALL is much faster than UNION

How does a UNION work? Imagine you have two tables for shirts. The short_sleeve table looks like this:

[code]
blue
green
gray
black
[/code]

And long_sleeve another that looks like this:

[code]
red
green
yellow
blue
[/code]

Related: Why Generalists are Better at Scaling the Web

If you UNION those two tables, first MySQL will sort the combined set into a temp table like this:

[code]
black
blue
blue
gray
green
green
red
yellow
[/code]

Once it’s done this sort, it can easily remove the duplicate blue & duplicate green for this resulting set:

[code]
black
blue
gray
green
red
yellow
[/code]

See also: Mythical MySQL DBA – the talent drought.

Why does it do this? UNION is defined that way in SQL. Duplicates must be removed and this is an efficient way for the MySQL engine to remove them. Combine results, sort, remove duplicates and return the set.

[quote]
Queries with UNION can be accelerated in two ways. Switch to UNION ALL or try to push ORDER BY, LIMIT and WHERE conditions inside each subquery. You’ll be glad you did!
[/quote]

What if we did UNION ALL? The result would look like this:

[code]
blue
green
gray
black
red
green
yellow
blue
[/code]

Read this: MySQL DBA Interview & Hiring Guide.

It doesn’t have to sort, and doesn’t have to remove duplicates. If you imagine combining two 10 million row tables, and don’t have to sort, this speedup can be HUGE.

2. Use Push-down Conditions to speedup UNION in MySQL

Imagine with our example above the shirts have a design date, the year they were released. Yes we’re keeping this example very simple to illustrate the concept.

Here is the short_sleeve table:
[code]
blue 2013
green 2013
green 2012
gray 2011
black 2009
black 2011
[/code]

And long_sleeve table looks like this:

[code]
red 2012
red 2013
green 2011
yellow 2010
blue 2011
[/code]

For 2013 designs could combine them like this:

[code]
(SELECT type, release FROM short_sleeve)
UNION
(SELECT type, release FROM long_sleeve);
WHERE release >=2013;
[/code]

See also: 5 More Things Deadly to Scalability and the original 5 Things Toxic to Scalability..

Here the WHERE clause works on this 11 record temp table:

[code]
black 2009
black 2011
blue 2011
blue 2013
gray 2011
green 2013
green 2012
green 2011
red 2012
red 2013
yellow 2010
[/code]

But it would be much faster to move the WHERE inside each subquery like this:

[code]
(SELECT type, release FROM short_sleeve WHERE release >=2013)
UNION
(SELECT type, release FROM long_sleeve WHERE release >=2013);
[/code]

That would be operating on a combined 3 record table. Faster to sort & remove duplicates. Smaller result sets cache better too, providing a pay forward dividend. That’s what performance optimization is all about!

Read this: RDS or MySQL – 10 Use Cases.

Remember multi-million row sets in each part of this query will quickly illustrate the optimization. We’re using very small results to make visualizing easier.

You can also use this optimization for ORDER BY and for LIMIT conditions. By reducing the number of records returned by EACH PART of the UNION, you reduce the work that happens at the stage where they are all combined.

If you’re seeing some UNION queries in your slow query log, I suggest you try this optimization out and see if you can tweak

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

The Most Important AWS Feature for Performance and Scalability

Join 6100 others and follow Sean Hull on twitter @hullsean.

The Foundation of Speed

All servers use disk to store files. Operating system libraries, webserver & application code, and most importantly databases all use disk constantly.

So disk speed is crucial to server speed.

[mytweetlinks]

[quote]
Disk speed is crucial for MySQL databases. It has been a real challenge in multi-tenant environments like Amazon’s EBS. The provisioned IOPS feature addresses this head on, allowing customers to lock in great MySQL database performance!
[/quote]

Also check out: Five more things Deadly to Scalability.

Disk Performance on Multi-tenant EBS

Amazon’s EBS or elastic block storage, is a virtualized network storage solution. You can think of it as RAIDed disks but accessed & provisioned over a high speed network.

Related: Why Generalists are Better at Scaling the Web

Since Amazon is a multi-tenant environment, other customers are using that same network, and hitting those same disks. So if your neighbors are seeing a lot of traffic to disk, your web application can slow down. Not good!

What is Provisioned IOPS

We’ll agree that it’s one of the worst branded features ever, but you should know about it and use it, especially for your MySQL databases.

Provisioned means that you’ll lock in performance in advance, and IOPS stands for I/O operations. Think of it as google juice for your cloud database servers!

Also: How I increased my blog pagerank to 5

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample