Amazon releases a new database offering every other day. It sure isn’t easy to keep up.
Join 35,000 others and follow Sean Hull on twitter @hullsean.
Let’s say you’re hiring a devops & you want to suss out their database knowledge? Or you’re hiring a professional services firm or freelance consultant. Whatever the case you’ll need to sift through for the best people. Here’s how.
What database does Amazon support for caching?
Caching is a popular way to speed up access to your backend database. Put Amazon’s elasticache behind your webserver, and you can reduce load on your database by 90%. Nice!
The two types that amazon supports are Memcache & Redis. Memcache is historically more popular. These days Redis seems a clear winner. It’s faster, and can maintain your cached data between restarts. That will save you I promise!
How can I store big data in AWS?
Amazon’s data warehouse offering is called Redshift. I wrote Why is everyone suddenly talking about Redshift?. Why indeed!
When you’re doing large reports for your business intelligence team, you don’t want to bog down your backend relational database. Redshift is purpose built for this use case.
I’ve see a report that took over 8 hours in MySQL return in under 60 seconds in Redshift!
A new offering is Amazon Spectrum. This tech is super cool. Load up all your data into S3, in standard CSV format. Then without even loading it into Redshift, you can query the S3 data directly. This is super useful. Firstly because S3 is 1/10th the price. But also because it allows you to stage your data before loading into Redshift itself. Goodbye Google Big Query! I talked about spectrum here.
What relational database options are there on Amazon?
Amazon supports a number of options through it’s Relational Database Service or RDS. This is managed databases, which means less work on your DBAs shoulders. It also may make upgrades slower and harder with more downtime, but you get what you pay for.
There are a lot of platforms available. As you might guess MySQL & Postgres are there. Great! Even better you can use MariaDB if that’s your favorite. You can also go with Aurora which is Amazon’s own home-brew drop in replacement for MySQL that promises greater durability and some speedups.
If you’re a glutton for punishment, you can even get Oracle & SQL Server working on RDS. Very nice!
Does AWS have a NoSQL database solution?
If NoSQL is to your taste, Amazon has DynamoDB. According to . I haven’t seen a lot of large production applications using it, but what he describes makes a lot of sense. The way Amazon scales nodes & data I/O is bound to run into real performance problems.
That said it can be a great way to get you up and running quickly.
How do I do ETL & migrate data to AWS?
Let’s be honest, Amazon wants to make this really easy. The quicker & simpler it is to get your data there, that more you’ll buy!
Amazon’s Database Migration Service or DMS allows you to configure your old database as a data source, then choose a Amazon db solution as destination, then just turn on the spigot and pump your data in!
ETL is extract transform and load, data warehouse terminology for slicing and dicing data before you load it into your warehouse. Many of todays warehouses are being built with the data lake model, because databases like Redshift have gotten so damn fast. That model means you stage all your source data as-is in your warehouse, then build views & summary tables as needed to speed up queries & reports. Even better you might look a tool like xplenty.
Amazon’s new offering is called Glue. Five ways to get data into Amazon Redshift. This solution is purpose build for creating a powerful data pipeline, complete with python code to do transformations.