If you’re like a lot of folks you’re building an application in AWS & using a NoSQL database for persistent data. Dynamodb fits the bill nicely. Little or no ops to worry about, at least in the traditional sense.
Join 32,000 others and follow Sean Hull on twitter @hullsean.
However there are knobs to turn & dials to set. Here are a few you should be thinking about.
1. You can replicate across regions
Dynamodb introduced a feature in 2015 called streams. If you come from the relational database world, you can think of streams like a transaction log. It captures before & after image of your data. Couple those with useful lambda functions, and you have triggers that can do anything you want.
Turns out Amazon have been all over this, and already build a library to do cross-region replication with streams. Pretty cool!
2. You can manage retrieval costs
Dynamodb automatically creates and manages an index on the primary key. But chances are that your application will read data based on other columns too. You can create secondary indexes on these other columns, reducing your data access patterns. Without an index Dynamodb would have to scan every row to find your data, but the index can dramatically reduce this, and making data retrieval faster too!
Related: Does Amazon eat it’s own dogfood?
3. You can do SQL Like queries
That’s right, if you thought NoSQL meant no SQL you were only half right. By loading your Dynamodb data into HDFS, you can allow elastic map reduce to have at it. And thus open the door to use HiveQL to query the data the way you wanted to in the first place.
Convoluted? Yes. But this is the brave new world of the cloud!
4. Partitions are handy & useful
By default dynamo is partitioning your data behind the scenes. Because that’s what good distributed databases are supposed to do. It does so using the primary key to figure out where the data should go. And just like with Redshift you have option of also using sort key to help the optimizer figure out how to distribute the data. This is important. Going across those different instances brings a lot of latency costs that will surprise you.
5. Metrics are your partner in performance
CloudWatch provides all sorts of instrumentation for Dynamodb. Read & write activity, throttling, errors & latency are just a few of the things you can see.