Tag Archives: aurora

How to interview an amazon database expert

via GIPHY

Amazon releases a new database offering every other day. It sure isn’t easy to keep up.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

Let’s say you’re hiring a devops & you want to suss out their database knowledge? Or you’re hiring a professional services firm or freelance consultant. Whatever the case you’ll need to sift through for the best people. Here’s how.

Also: How to interview an AWS expert

What database does Amazon support for caching?

Caching is a popular way to speed up access to your backend database. Put Amazon’s elasticache behind your webserver, and you can reduce load on your database by 90%. Nice!

The two types that amazon supports are Memcache & Redis. Memcache is historically more popular. These days Redis seems a clear winner. It’s faster, and can maintain your cached data between restarts. That will save you I promise!

Also: Is AWS too complex for small dev teams?

How can I store big data in AWS?

Amazon’s data warehouse offering is called Redshift. I wrote Why is everyone suddenly talking about Redshift?. Why indeed!

When you’re doing large reports for your business intelligence team, you don’t want to bog down your backend relational database. Redshift is purpose built for this use case.

I’ve see a report that took over 8 hours in MySQL return in under 60 seconds in Redshift!

A new offering is Amazon Spectrum. This tech is super cool. Load up all your data into S3, in standard CSV format. Then without even loading it into Redshift, you can query the S3 data directly. This is super useful. Firstly because S3 is 1/10th the price. But also because it allows you to stage your data before loading into Redshift itself. Goodbye Google Big Query! I talked about spectrum here.

Related: Which engineering roles are in greatest demand?

What relational database options are there on Amazon?

Amazon supports a number of options through it’s Relational Database Service or RDS. This is managed databases, which means less work on your DBAs shoulders. It also may make upgrades slower and harder with more downtime, but you get what you pay for.

There are a lot of platforms available. As you might guess MySQL & Postgres are there. Great! Even better you can use MariaDB if that’s your favorite. You can also go with Aurora which is Amazon’s own home-brew drop in replacement for MySQL that promises greater durability and some speedups.

If you’re a glutton for punishment, you can even get Oracle & SQL Server working on RDS. Very nice!

Read: Can on-demand consulting save startups time & money?

Does AWS have a NoSQL database solution?

If NoSQL is to your taste, Amazon has DynamoDB. According to . I haven’t seen a lot of large production applications using it, but what he describes makes a lot of sense. The way Amazon scales nodes & data I/O is bound to run into real performance problems.

That said it can be a great way to get you up and running quickly.

Read: Can on-demand consulting save startups time & money?

How do I do ETL & migrate data to AWS?

Let’s be honest, Amazon wants to make this really easy. The quicker & simpler it is to get your data there, that more you’ll buy!

Amazon’s Database Migration Service or DMS allows you to configure your old database as a data source, then choose a Amazon db solution as destination, then just turn on the spigot and pump your data in!

ETL is extract transform and load, data warehouse terminology for slicing and dicing data before you load it into your warehouse. Many of todays warehouses are being built with the data lake model, because databases like Redshift have gotten so damn fast. That model means you stage all your source data as-is in your warehouse, then build views & summary tables as needed to speed up queries & reports. Even better you might look a tool like xplenty.

Amazon’s new offering is called Glue. Five ways to get data into Amazon Redshift. This solution is purpose build for creating a powerful data pipeline, complete with python code to do transformations.

Read: Is data your dirty little secret?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Did MySQL & Mongo have a beautiful baby called Aurora?

amazon aurora slide

Amazon recently announced RDS Aurora a new addition to their database as a service offerings.

Here’s Mark Callaghan’s take on what’s happening under the hood and thoughts from Fusheng Han.

Amazon is uniquely positioned with RDS to take on offerings like Clustrix. So it’s definitely worth reading Dave Anselmi’s take on Aurora.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

1. Big availability gains

One of the big improvements that Aurora seems to offer is around availability. You can replicate with aurora, or alternatively with MySQL binlog type replication as well. They’re also duplicating data two times in three different availability zones for six copies of data.

All this is done over their SSD storage network which means it’ll be very fast indeed.

Read: What’s best RDS or MySQL? 10 Use Cases

2. SSD means 5x faster

The Amazon RDS Aurora FAQ claims it’ll be 5x faster than equivalent hardware, but making use of it’s proprietary SSD storage network. This will be a welcome feature to anyone already running on MySQL or MySQL for RDS.

Also: Is MySQL talent in short supply?

3. Failover automation

Unplanned failover takes just a few minutes. Here customers will really be benefiting from the automation that Amazon has built around this process. Existing customers can do all of this of course, but typically require operations teams to anticipate & script the necessary steps.

Related: Will Oracle Kill MySQL?

4. Incremental backups & recovery

The new Aurora supports incremental backups & point-in-time recovery. This is traditionally a fairly manual process. In my experience MySQL customers are either unaware of the feature, or not interested in using it due to complexity. Restore last nights backup and we avoid the hassle.

I predict automation around this will be a big win for customers.

Check out: Are SQL Databases dead?

5. Warm restarts

RDS Aurora separates the buffer cache from the MySQL process. Amazon has probably accomplished this by some recoding of the stock MySQL kernel. What that means is this cache can survive a restart. Your database will then start with a warm cache, avoiding any service brownout.

I would expect this is a feature that looks great on paper, but one customers will rarely benefit from.

See also: The Myth of Five Nines – Is high availability overrated?

Unanswered questions

The FAQ says point-in-time recovery up to the last five minutes. What happens to data in those five minutes?

Presumably aurora duplication & read-replicas provide this additional protection.

If Amazon implemented Aurora as a new storage engine, doesn’t that mean new code?

As with anything your mileage may vary, but Innodb has been in the wild for many years. It is widely deployed, and thus tested in a variety of environments. Aurora may be a very new experiment.

Will real-world customers actually see 500% speedup?

Again your mileage may vary. Lets wait & see!

Related: 5 Things toxic to scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters