Category Archives: Database Migrations

How to interview an amazon database expert

via GIPHY

Amazon releases a new database offering every other day. It sure isn’t easy to keep up.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

Let’s say you’re hiring a devops & you want to suss out their database knowledge? Or you’re hiring a professional services firm or freelance consultant. Whatever the case you’ll need to sift through for the best people. Here’s how.

Also: How to interview an AWS expert

What database does Amazon support for caching?

Caching is a popular way to speed up access to your backend database. Put Amazon’s elasticache behind your webserver, and you can reduce load on your database by 90%. Nice!

The two types that amazon supports are Memcache & Redis. Memcache is historically more popular. These days Redis seems a clear winner. It’s faster, and can maintain your cached data between restarts. That will save you I promise!

Also: Is AWS too complex for small dev teams?

How can I store big data in AWS?

Amazon’s data warehouse offering is called Redshift. I wrote Why is everyone suddenly talking about Redshift?. Why indeed!

When you’re doing large reports for your business intelligence team, you don’t want to bog down your backend relational database. Redshift is purpose built for this use case.

I’ve see a report that took over 8 hours in MySQL return in under 60 seconds in Redshift!

A new offering is Amazon Spectrum. This tech is super cool. Load up all your data into S3, in standard CSV format. Then without even loading it into Redshift, you can query the S3 data directly. This is super useful. Firstly because S3 is 1/10th the price. But also because it allows you to stage your data before loading into Redshift itself. Goodbye Google Big Query! I talked about spectrum here.

Related: Which engineering roles are in greatest demand?

What relational database options are there on Amazon?

Amazon supports a number of options through it’s Relational Database Service or RDS. This is managed databases, which means less work on your DBAs shoulders. It also may make upgrades slower and harder with more downtime, but you get what you pay for.

There are a lot of platforms available. As you might guess MySQL & Postgres are there. Great! Even better you can use MariaDB if that’s your favorite. You can also go with Aurora which is Amazon’s own home-brew drop in replacement for MySQL that promises greater durability and some speedups.

If you’re a glutton for punishment, you can even get Oracle & SQL Server working on RDS. Very nice!

Read: Can on-demand consulting save startups time & money?

Does AWS have a NoSQL database solution?

If NoSQL is to your taste, Amazon has DynamoDB. According to . I haven’t seen a lot of large production applications using it, but what he describes makes a lot of sense. The way Amazon scales nodes & data I/O is bound to run into real performance problems.

That said it can be a great way to get you up and running quickly.

Read: Can on-demand consulting save startups time & money?

How do I do ETL & migrate data to AWS?

Let’s be honest, Amazon wants to make this really easy. The quicker & simpler it is to get your data there, that more you’ll buy!

Amazon’s Database Migration Service or DMS allows you to configure your old database as a data source, then choose a Amazon db solution as destination, then just turn on the spigot and pump your data in!

ETL is extract transform and load, data warehouse terminology for slicing and dicing data before you load it into your warehouse. Many of todays warehouses are being built with the data lake model, because databases like Redshift have gotten so damn fast. That model means you stage all your source data as-is in your warehouse, then build views & summary tables as needed to speed up queries & reports. Even better you might look a tool like xplenty.

Amazon’s new offering is called Glue. Five ways to get data into Amazon Redshift. This solution is purpose build for creating a powerful data pipeline, complete with python code to do transformations.

Read: Is data your dirty little secret?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to build an operational datastore on AWS with S3 & Redshift

via GIPHY

You’re building your data warehouse, and getting data into Redshift. You’ve got your ETL pipeline running, and presentation layer talking to the warehouse. Great.

But how to get access to that source data? Wouldn’t it be nice if that was close by too?

Join 35,000 others and follow Sean Hull on twitter @hullsean.

It may be you have 10-zillion rows of source data and don’t want or need to get all of that into Redshift and keep it there. But it would be nice to have access to it when you do.

Enter EXTERNAL tables, aka Spectrum. Now you can keep all your raw data in S3, an in place operational datastore of data before it’s been reworked and transformed. Use SQL to access it right where it sits.

Get all the advantages of lifecycle management in S3, and don’t pay all the redshift costs for data you don’t need all the time. Cool!

Let’s see how it works.

What is an EXTERNAL table?

Spectrum is Amazon’s rebranding of an old database technology called EXTERNAL TABLES. Back in the 90’s Oracle pioneered this work, allowing you to essentially map a CSV file, that sits outside the database proper. This means you can query all that juicy data sitting in flat files. Cool!

Athena allows you to query this stuff as a service, native to AWS. Spectrum allows you to create those external tables inside of Redshift.

Also: Top serverless interview questions for hiring aws lambda experts

Give Redshift permissions

Go into IAM and create a new role called “SeanSpectrumRole”. Assign the policy AmazonS3ReadOnlyPolicy. It looks like this:


{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:Get*",
"s3:List*"
],
"Resource": "*"
}
]
}

If you’re using the dashboard you just pick the policy from the named list. However if you’re using CloudFormation, you’ll use the code above.

Now navigate your aws console to the Redshift dashboard, click clusters, and click the checkbox for your cluster. Probably there’s only one.

Now click the “Manage IAM Roles” button, and a dialog should popup.. Select the role you created earlier, SeanSpectrumRole. Then click “Apply Changes”.

The beauty of the AWS world is that servers themselves can have API permissions. In this case we gave the redshift cluster or server itself, access to S3 for our use below!

Related: Which engineering roles are in greatest demand?

Create your spectrum schema

First you must create a spectrum schema. Here’s the syntax:


create external schema spectrum
from data catalog
database 'sean'
region 'us-east-1'
iam_role 'arn:aws:iam::9999999999999:role/SeanSpectrumRole';

Read: Can on-demand consulting save startups time & money?

Upload your data to S3 bucket

Here we create an s3 bucket called sean_spectrum, then upload one csv file named sean_numbers.txt.


$ aws s3api create-bucket --bucket sean_spectrum --region us-east-1
{
"Location": "/sean_spectrum"
}
$ cd spectrum/
$ cat sean_numbers.txt
21,Dr.,Who,44-22-55-77-88
35,Bat,Man,317-222-4777
15,Wonder,Woman,999-324-7878
99,Storm,Cloud,367-399-6767
75,Marvel,Girl,222-333-9595
32,Quick,Silver,22-33-77-99
12,Scarlet,Witch,23-35-47-555
$ aws s3 cp sean_numbers.txt s3://sean_spectrum/
upload: ./sean_numbers.txt to s3://sean_spectrum/sean_numbers.txt
$ aws s3 ls s3://sean_spectrum/
2017-05-18 20:28:41 193 sean_numbers.txt
$

Note the names. The table name won’t turn out to be sean_numbers. It will be called sean_spectrum, and all files inside that directory will be queried. So make sure they have consistent formats!

Also: 30 questions to ask a serverless fanboy

Create & query your external table

Here’s how you create your external table. Note this is just a map to data. The data is still stored in S3. it is not brought into Redshift except to slice, dice & present.


mydb=# create external table spectrum_schema.sean_numbers(
id int,
fname string,
lname string,
phone string)
row format delimited
fields terminated by ','
stored as textfile
location 's3://sean_spectrum/';

Here’s how you query it:


mydb=# select * from spectrum_schema.sean_numbers order by id;
id | fname | lname | phone
----------------+---------------
12 | Scarlet | Witch | 23-35-47-555
15 | Wonder | Woman | 999-324-7878
21 | Dr. | Who | 44-22-55-77-88
32 | Quick | Silver | 22-33-77-99
35 | Bat | Man | 317-222-4777
75 | Marvel | Girl | 222-333-9595
99 | Storm | Cloud | 367-399-6767

Cool. We reordered data read from an S3 file!!!

Although you can’t create a view over a redshift table *AND* an S3 external table, you can query them together.

So for example if I have a table in redshift with addresses, I can join them together:

mydb=# select a.id, a.fname, a.lname, b.address from spectrum_schema.sean_numbers a, sean_addresses b
where a.id = b.id order by id;

id | fname | lname | phone | address
----------------+----------------------------
12 | Scarlet | Witch | 23-35-47-555 | 10 main st
15 | Wonder | Woman | 999-324-7878 | 25 center st
21 | Dr. | Who | 44-22-55-77-88 | 32 broadway
32 | Quick | Silver | 22-33-77-99 | 1 first st
35 | Bat | Man | 317-222-4777 | 99 west st
75 | Marvel | Girl | 222-333-9595 | 66 East Ave
99 | Storm | Cloud | 367-399-6767 | 50 North st

Also: What can startups learn from the DYN DNS outage?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Deploy wordpress on aws by first decoupling assets

too much inventory

You want to make your wordpress site bulletproof? No server outage worries? Want to make it faster & more reliable. And also host on cheaper components?

I was after all these gains & also wanted to kick the tires on some of Amazon’s latest devops offerings. So I plotted a way forward to completely automate the deployment of my blog, hosted on wordpress.

Here’s how!

Join 28,000 others and follow Sean Hull on twitter @hullsean.

The article is divided into two parts…

Deploy a wordpress site on aws – decouple assets (part 1)

In this one I decouple the assets from the website. What do I mean by this? By moving the db to it’s own server or RDS of even simpler management, it means my server can be stopped & started or terminated at will, without losing all my content. Cool.

You’ll also need to decouple your assets. Those are all the files in the uploads directory. Amazon’s S3 offering is purpose built for this use case. It also comes with easy cloudfront integration for object caching, and lifecycle management to give your files backups over time. Cool !

Terraform a wordpress site on aws – automate deploy (part 2)

The second part we move into all the automation pieces. We’ll use PHP’s Composer to manage dependencies. That’s fancy talk for fetching wordpress itself, and all of our plugins.

1. get your content into S3

How to do it?

A. move your content

$ cd html/wp-content/
$ aws s3 cp uploads s3://iheavy/wp-content/

Don’t have the aws command line tools installed?

$ yum install aws-cli -y
$ aws configure

B. Edit your .htaccess file with these lines:

These above steps handle all *existing* content. However you also want new content to go to S3. For that wordpress needs to understand how to put files there. Luckily there’s a plugin to help!


RewriteEngine On
RewriteRule ^wp-content/uploads/(.*)$ http://s3.amazonaws.com/your-bucket/wp-content/uploads/$1 [P]

C. Fetch WP Offload S3 Lite

You’ll see the plugin below in our composer.json file as “amazon-s3-and-cloudfront”

Theoretically you need to specify your aws credentials inside the wp-config.php. However this is insecure. You don’t ever want stuff like that in S3 or any code repository. What to do?

The best way is to use AWS ROLES for your instance. These give the whole instance access to API calls without credentials. Cool! Read more about AWS roles for instances.

Related: Is there a devops talent gap?

2. Move to your database to RDS

You may also use a roll-your-own MySQL instance. The point here is to make it a different EC2 instance. That way you can kill & rebuild the webserver at will. This offers us some cool advantages.

A. Create an RDS instance in a private subnet.

o Be sure it has no access to the outside world.
o note the root user, password
o note the endpoint or hostname

I recommend changing the password from your old instance. That way you can’t accidentally login to your old db. Well it’s still possible, but it’s one step harder.

B. mysqldump your current wp db from another server

$ cd /tmp
$ mysqldump –opts wp_database > wp_database.mysql

C. copy that dump to an instance in the same VPC & subnet as the rds instance

$ scp -i .ssh/mykey.pem ec2-user@oldbox.com:/tmp/wp_database.mysql /tmp/

D. import the data into your new db

$ cd /tmp
$ echo “create database wp_database” | mysql
$ mysql < wp_database.mysql E. Edit your wp-config.php

define(‘DB_PASSWORD’, ‘abc123’);
define(‘DB_HOST’, ‘my-rds.ccccjjjjuuuu.us-east-1.rds.amazonaws.com’);

Also: When hosting data on Amazon turns bloodsport?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

When hosting data on Amazon turns bloodsport

reddit aws outage

There’s a strong trend to automation across the cloud. That’s a great thing for startups because it reduces operational headaches & lets them focus on building products.

Join 31,000 others and follow Sean Hull on twitter @hullsean.

But as that trend begins to touch the database tier, all sorts of complications emerge. Let’s take a look at some of the tradeoffs.

1. Database as a service trend

I was recently reading Baron Schwartz’s article on the trend to database as a service.

I work with a lot of venture backed startups & pay close attention to what’s happening in New York & SF. From where I’m standing I see a similar trend. As automation simplifies management across the application stack, from load balancers to web & search servers, the same advantages are moving to database management.

Also: How to automate MySQL analysis on Amazon RDS

2. How Amazon RDS helps

Amazon’s RDS offers firms a data solution for Oracle & SQL Server as well as MySQL. For those just starting, it offers a long list of advantages.

o quick push-button deployment in minutes
o standardized parameters settings that just work
o ability to scale up or down from the dashboard
o automated backups
o multi-az so you can sleep at night

This brings a huge advantage to startups. Many have a team of developers but aren’t large enough to need an operations team and can’t afford a dedicated database administrator.

Amazon is obviously helping these firms raise the bar. And that’s a good thing.

Related: RDS or MySQL 10 use cases

3. How Amazon RDS hurts

As you get bigger, your needs will grow too. You’ll have tens of millions of customers, and with more customers comes an even higher bar. Zero downtime becomes critical. It’s then that Amazon’s solution starts to become frustrating.

Unpredictable upgrades

MySQL upgrades on RDS are a messy activity. Amazon will restart the instance, backup the instance, perform the upgrade then restart again. Each of these restarts takes a few minutes. The whole operation may have you down for ten minutes. This becomes more frustrating when your hands are completely tied. You don’t know when or what will happen!

When you roll-your-own instance, an upgrade can be performed in a matter of seconds. No instance restarts are necessary and you can monitor the process to know exactly where you are. This is the kind of control you’re going to want if you have millions of customers relying on your site & uptime.

Unnecessary slow restarts

When you apply parameter changes on RDS, some require a MySQL restart. Amazon forces the whole server to restart, increasing this downtime from a few seconds (when you roll your own) to many minutes. And while some parameters can be changed online, Amazon can provoke some strange behavior that is not always predictable.

With the frequency of these types of changes, you’ll quickly grow tired and frustrated with RDS.

EBS Snapshots are not portable

As mentioned above Amazon uses it’s standard filesystem snapshot technology to perform backups. While this works well, it can be slow & unpredictable in a multi-tenant environment.

When you roll your own, you can take advantage of xtrabackup, and perform hot backups against your database with zero downtime. This is a real godsend. What’s more they are portable, and can be moved to any other server even ones not hosted in Amazon’s cloud!

Promoting a read-replica is slow too!

One feature that Amazon touts is creating copies or “read replicas” of your data. These are great and can facilitate easy copying of data. However promoting these again brings unnecessary restarts which are slow.

When you roll your own, you can promote a read-replica or read-only slave in seconds. A few seconds can seem invisible to end users, while minutes will be perceived as a real outage or downtime.

Read: Is zero downtime even possible with RDS?

4. Is migration an option?

So what to do? As I mentioned above, there are real advantages to startups deploying their first database. It really does help. I would argue for many it can be a good place to start.

If you’re starting to outgrow RDS and frustrated with the limitations, performance tuning headaches & unneeded downtime, luckily you have options.

Migrating off of RDS onto a physical server can be done in a number of ways.

o slave off of the master

Here you build a MySQL slave on a standard EC2 instance, with your RDS instance as the master. When you’re caught up, bring your site down temporarily. Reset the slave & set to read-write mode. Then point your webservers at your new EC2 instance and bring the site back up. If done carefully 10 to 20 seconds of downtime should be plenty.

Don’t forget to run through the process with a firedrill first!

o dump & import

Another way to move your data may be MySQLdump. This option would be slower & bring a lot more downtime, but possibly necessary in some cases.

Also: 5 Reasons to move data to Amazon Redshift

5. Speed: It’s the database

Fred Wilson says speed is the number one feature of a web application. If customers are frustrated & waiting, they may leave & not come back. On the web it can be everything.

Many firms are rushing to database as a service to simplify administration. While that’s wonderful at the beginning, as you grow performance will become more of a day-to-day concern. And when it does, the database is going to be big on your list of headaches.

Web application performance inevitably involves the database and while it does, your decision to choose database as a service may come into question. Don’t be afraid to bite the bullet and manage things yourself when that time comes.

Also: Is upgrading RDS like a shit-storm that will not end?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Oracle DBAs… You Know You Want MySQL!

If you’re an Oracle DBA or developer and considering migrating an application over to MySQL you probably have a lot of questions. Here’s a five minute summary of what you should know.

Looking to hire a top flight MySQL DBA? Check out our MySQL interview guide.

What is truly delicious

o a command line that’s decades ahead of SQL*Plus
o fast & easy install via yum or apt-get package managers
– saves hassles, headaches, and standarded on linux distribution
o a very active open-source community with tons of roll-your-own solutions
o myriad of replication topologies with endless use cases

Flashback technology

Although MySQL doesn’t provide this by default, a slave set to lag behind by an hour or more can serve this function quite nicely. Perfect DIY solution.

Auditing

Out of the box, MySQL doesn’t have great auditing facilities. However the binary log can be of great assistance here. Use the mysqlbinlog tool to get human readable data and filter for the information you need with your favorite Unix tools such as sed, awk, sort and so forth.

MySQL also includes a general query log which may also be useful for such auditing jobs.

Users & Security

Logins are authenticated by hostname & username. So user@localhost has a distinct set of permissions from user@remoteserver. This can be very confusing so make every effort to create users & grants consistently. Also use –skip-name-resolve option to avoid DNS lookups. These can easily cause scalability problems.

Stored procedures

They’re there in MySQL, but not robust. Use sparingly. They can easily break replication, and cause serious performance and scalability problems. They also make your deployments more complicated – for example mysqldump requires the –routines flag to capture them.

SQL Queries & the cost based optimizer

MySQL uses a cost based optimizer like Oracle’s however it is much less mature. Don’t expect the same level of smarts. Some queries will need special attention.

o there’s no hash join, only nested loops
o you can use one index per table – no index merging
o correlated subqueries with IN() lists don’t optimize well
o not much control over stats collection & freshness

Partitioning

You can partition large tables by range, key or hash. However managing to get your queries just right to do partition pruning may be a trick. What’s more all the added operational benefits you get in Oracle may be limited. You can drop partitions quickly, however reorganizing them will be painfully slow.

Materialized views

MySQL doesn’t offer them per se, but check out Justin Swanhart’s Flexviews for a solution that may give you what you need.

Replication

Oracle standby database is a change based technology. Although MySQL is beginning to experiment with this using row-based replication, the vast majority of deployments are statement based.

Since slaves are hot or read-only, they can be used to horizontally scale your applications read capacity, serve as live backup, makeshift flashback or a logging apply server. Here’s how you can setup replication without downtime using hotbackups.

Be vigilant about performing checksums against your slave instances. This will alert you to any drift or other inconsistencies across your tables or data. We wrote a howto bulletproof replication with checksums.

Clustering

Don’t expect to find something like Oracle RAC. You won’t. For any typical web-facing database needs, MySQL NDB Cluster is most likely not what you’re looking for.

Consider MySQL master-master replication where one is active and one is passive. Configured as circular replication, all changes propagate to all nodes. It’s crucial you only ever write to one node, however. Use the inactive node to do online schema changes, test upgrades or other availability sensitive operations.

Lastly you may try DRBD which is a Linux filesystem level solution. Keep in mind your inactive database will be cold. You won’t experience any replication consistency issues, but upon promotion to master, you may experience slow performance until the caches are warmed up.

A few miscellaneous surprises

o many ALTER functions are blocking operations, so can’t be done online
o replication can be troublesome to manage & drift out of sync silently
o default MyISAM tables can corrupt easily and lose data – use Innodb instead
o complex many table joins & certain subqueries do not perform well
o there is a lot less instrumentation around the optimizer & db engine

Read this far, grab our newsletter!

Oracle Announces Paid MySQL Add-ons

 Oracle starts charging for MySQL Add-ons

Exciting news, Oracle just announced commercial MySQL extensions that they’ll be offering paid extensions to the core MySQL free product.

To be sure, this has raised waves of concern among the community, but on the whole I suspect it will be a good thing for MySQL.  This brings more commercial addons to the table, which only increases the options for customers.  Many will continue to use the core database product only, and avoid license hassles while others will surely embark on a hybrid approach if it solves their everyday business problems. Continue reading Oracle Announces Paid MySQL Add-ons

3 Biggest MySQL Migration Surprises

3 ways your MySQL migration project can shake you up

Once a development or operations team gets over the hurdle of open-source, and start to feel comfortable with the way software works outside of the enterprise world, they will likely start to settle in and feel comfortable.  Best not to get too cushy though for there are more surprises hiding around the corner.  Here are a few of the biggest ones. Continue reading 3 Biggest MySQL Migration Surprises

The New Commodity Hardware Craze aka Cloud Computing

Does anyone remember 15 years ago when the dot-com boom was just starting?  A lot of companies were running on Sun.  Sun was the best hardware you could buy for the price.  It was reliable and a lot of engineers had experience with the operating system, SunOS a flavor of Unix.

Yet suddenly companies were switching to cheap crappy hardware.  The stuff failed more often, had lower quality control, and cheaper and slower buses.  Despite all of that, cutting edge firms and startups were moving to commodity hardware in droves.  Why was it so? Continue reading The New Commodity Hardware Craze aka Cloud Computing

7 Ways to Troubleshoot MySQL

MySQL databases are great work horses of the internet.  They back tons of modern websites, from blogs and checkout carts, to huge sites like Facebook.  But these technologies don’t run themselves.  When you’re faced with a system that is slowing down, you’ll need the right tools to diagnose and troubleshoot the problem.  MySQL has a huge community following and that means scores of great tools for your toolbox. Here are 7 ways to troubleshoot MySQL. Continue reading 7 Ways to Troubleshoot MySQL

Migration to MySQL – What is it and why is it important?

MySQL is a relational database that backs many internet websites and enterprise applications.  Like all enterprise software, it has a whole complement of features which are well documented, such as data types, storage engines, transactional behaviors and so forth.  It also has a set of processes, many of which involve how software operates on Linux servers, such as how it gets installed, where binaries and libraries will get placed, where to find logfiles, and how to move directories and set permissions.  Thirdly it is important to understand the culture, in this case Unix-based, forum discussions and community contributions as an open-source project.

MySQL can do much of the workhorse kind of stuff you see in databases like Oracle or SQL Server, but sometimes it achieves those goals in very different ways.  For instance there are many open-source projects that support and surround the database, such as mysqltuner an analysis script, innotop a unix top-like utility for monitoring on-going activity in the database, and maatkit a whole suite of tools that build on and expand the features already present in the MySQL database.

Some Limitations in MySQL

  1. Complex queries and subqueries specifically can be problematic in MySQL.  If you’re used to writing huge queries in Oracle, and having the CBO figure everything out for you, you’ll be in for a surprise with MySQL.  Keep your queries simple, proper columns indexed and avoid complex joins where possible.  The EXPLAIN facility is available to you and at your disposal.  Use it!
  2. Vertical Scalability problems – primarily addressed in 5.5, the latest version of MySQL, previously the database did not scale well on greater than four processor boxes.  SMP or Symmetric Multiprocessing servers were less common 10-15 years ago when MySQL was in it’s infancy, and development is slowly catching up with the big iron of today.
  3. There is no flashback table, tablespace or database that you might find in other databases such as Oracle.  You can achieve the same thing with point-in-time recovery, so keep regular backups of your database, and also backup the transaction logs.
  4. MySQL can do JOINs, but only with the nested loops algorithm.  It can’t do sort merge join or hash join.
  5. MyISAM is the default table type and storage engine.  It is not crash safe and not transactional.  On new installations it’s recommended that you change this to InnoDB and use InnoDB for most if not all of your tables.  It’s very reliable and very fast!
  6. There is a query cache, but it caches result sets not query plans!  It also has some performance issues and shows some erratic behavior on larger SMP boxes.  Query plans are cached on a session basis, but when a session is closed and reopened, MySQL must reparse and reexecute that query.
  7. MySQL does not have a facility like Oracle’s Real Application Clusters.  It does have NDB Cluster which is an all-in-memory clustering solution.  Despite it’s promise, it tends to have very serious performance problems with any type of join, and is mainly good for single table index-based lookups.  If managed well it can increase availability but will probably reduce performance.
  8. MySQL’s default replication solution is statement based.  Although it is easy to setup, it breaks almost as easily, sometimes with resolvable errors, and sometimes silently.  Consider row-based replication, and definitely make use of Maatkit’s mk-table-checksum and mk-table-sync tools.  Also be sure to do thorough and regular monitoring of your replication setup.
  9. There are no in-built materialized views or snapshots in MySQL.  There is an open-source project called Flexviews by Justin Swanhart that provides this facility to the MySQL community.
  10. MySQL provides stored procedures, triggers and functions as a regular feature to the database.  However I would use them with caution.  They are very difficult to edit, troubleshoot and diagnose when they are causing troubles.  Also as with the query plan caching, stored procedures are cached at the session level, so they can be expensive to execute over and over again in different areas of your application.  They can cause real performance problems.
  11. There is no in-built mechanism for auditing that you find in relational databases such as Oracle 11g.
  12. Only b-tree indexes are supported, no bitmap indexes, index-organized tables, clustered indexes or other more exotic index types.
  13. ALTER TABLE is generally a locking and blocking operation.  For example if you add a new column or change a columns data type, the entire table will be locked for the duration of the operation.  This will be a surprise coming from the Oracle world where these type of operations can routinely be done online.

MySQL’s Strengths Are Numerous

  1. Install with an RPM using Yum or Aptget.  Fast & simple!
  2. Works great in the cloud, using MySQL Community distro, Percona distro, or Amazon’s own RDS solution.
  3. Comes out-of-the-box with an excellent command line shell providing all sorts of features and power that are constant frustrations on the Oracle side.  Command history, standard input/output redirection support, a full compliment of features and options, and easy autologin with a user level my.cnf file which fits in nicely with the global settings as well.
  4. A simpler mechanism to serve unique id columns with the auto-increment data type.  Although Oracle’s sequence method is extremely scalable, for many many developers it is troublesome and confusing.
  5. Good support of the LIMIT clause allowing an easier method for developers to fetch a subset of data.
  6. A huge community of users, forums, and support in third party applications such as monitoring (Nagios etc…) as well as metrics collection (Munin, Cacti, OpenNMS, Ganglia etc.)
  7. Great visibility of system variables with SHOW VARIABLES.  Many can be changed dynamically as well, just like Oracle.
  8. Great visibility of internal system state with SHOW PROCESSLIST.
  9. System counters for all sorts of internal instrumentation data using SHOW STATUS and SHOW INNODB STATUS.  Ultimately it is not as comprehensive as Oracle’s own data dictionary and millions of instrumentation counts.  However Oracle could take a huge page out of the MySQL book in terms of usability.  The obfuscation of Oracle’s internal kernel state makes it all but unusable by most.
  10. innotop, the utility much like the unix TOP facility that all Unix & Linux folks love, it provides instant visibility into what queries are running, what work is being done, and what is blocking.  Oracle could really take a page from this playbook, as this tool is so invaluable.
  11. The incredible Maatkit, a veritable goldmine of great community contributed powertools.  Query analyzers, profilers, log tools, replication tools, data archiver, a find facility, and a whole lot more!

Sean Hull discusses further on Quora – What considerations are important when migrating to MySQL?