Like every other startup, you’ve built everything on AWS or are moving there quickly.
So it makes sense & is an easy win to build your analytics & bigdata engine on AWS too. Enter Redshift to the rescue.
Unlike the old days of Oracle where you had career DBAs to handle your data & hold the keys to the data kingdom, these days nobody is managing the data. What makes matters worse is the documentation is weak, and it hasn’t been out long enough to warrant good books on the topic.
But there’s hope!
Here are five great tools to help you in your day-to-day management of Redshift
Join 28,000 others and follow Sean Hull on twitter @hullsean.
The Github Redshift Utils Page has the whole package, while I link to individual scripts below.
1. Slow queries
Just like MySQL, Postgres & Oracle, your database performance & responsiveness lives & dies by the SQL that you write. And just like all those databases, you want to look at the slowest queries, to know where to tune. Spend your time on the worst offenders first.
2. Fragmented Tables
You add data, you delete data. And just like all the other relational databases we know & love, this process leaves gaps. New data is still added at the high water mark, and full table scans still read those empty blocks.
The solution is the vacuum command, but which tables should we run it on?
Run this script & find out which tables need maintenance.
3. Analyze Schema Compression
Getting your schema compression right on your data in Redshift turns out to be essential to performance too. Each datatype has a recommended compression type that works best for it. You can let Redshift decide when it first loads data, but it doesn’t always guess correctly.
This tool will help you analyze your data, and set compression types properly.
4. Audit Users
Once you have your Redshift cluster up & running, you’ll setup users for applications to connect as. Each will have different privileges for objects & schemas in your database. You’re auditing those right? 🙂
This script will output who can read & write what. Useful to ensure your application isn’t overly loose with permissions. Shoot for least privileges where possible.
5. Lambda Data Loader
Want to automatically load data into Redshift? Download this handy Lambda based tool to respond to S3 events. Whenever a new file appears, it’s data will get loaded into Redshift. Complete with metadata configuration control in Dynamodb.