Amazon is about to launch a product called Glue. As you can see below, this is the last piece in the data warehousing puzzle. With that in place, Amazon will own you! Or at least it will have push-button products to meet all of an enterprise’s varying needs.
Even if you’re a small startup, you can do big-shot, big-enterprise data warehousing. That means everyone can use cutting-edge, data-driven techniques for product & business decisions.
Join 33,000 others and follow Sean Hull on Twitter @hullsean.
What is Redshift?
Redshift is like the OLAP databases of years past, the Oracles of the world, purpose-built for warehousing data. Obviously without the crazy licensing model Oracle was famous for. With Amazon you can get an enterprise-class data warehouse for modest hourly prices.
If my recent conversations with recruiters about Redshift demand are any indication, there’s been a sudden uptick in startups looking for Redshift expertise.
What is Spectrum?
Spectrum is a very new extension of Redshift that lets you query file data sitting in S3 directly. This means you can have petabytes of data that you can access before it is ever loaded. You will still ETL and load portions of it, but with Spectrum you can query the data that stays behind in S3 too.
In the old Oracle days this was called an EXTERNAL TABLE. I mention this only to say that Amazon isn’t doing anything that hasn’t been done before. Rather they’re bringing these advanced features within reach of everyday startups. That’s cool.
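To make the external table idea concrete, here is a minimal sketch that builds the kind of DDL Spectrum accepts for delimited text files in S3. The schema, table, column names and bucket are all made up for illustration, and the sketch assumes an external schema has already been created in Redshift.

```python
def external_table_ddl(schema, table, columns, s3_location, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement for delimited text files in S3."""
    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"  {column_sql}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical names: none of these come from a real account.
ddl = external_table_ddl(
    schema="spectrum",
    table="clickstream",
    columns=[
        ("event_time", "timestamp"),
        ("user_id", "integer"),
        ("url", "varchar(2048)"),
    ],
    s3_location="s3://example-bucket/clickstream/",
)
print(ddl)
```

Run the resulting statement from any SQL client connected to Redshift, and the files under that S3 prefix can be queried like a regular table without loading them first.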
What is Glue?
Glue is still in beta, but if the re:Invent talk above is any indication, it’s set to disrupt an entire industry. Wow!
Glue first catalogs your data sources. What does this mean? It scans them and models their schemas.
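As a rough illustration of what "scanning and modeling a schema" involves, here is a toy sketch that reads CSV text and guesses a type for each column. This is not how Glue is implemented; the function names and the type labels are assumptions picked for the example.

```python
import csv
import io


def _fits(values, cast):
    """Return True if every value can be converted with cast."""
    try:
        for value in values:
            cast(value)
    except ValueError:
        return False
    return True


def infer_schema(csv_text):
    """Scan CSV text with a header row and guess a type for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    schema = {}
    for column in reader.fieldnames or []:
        # Ignore empty cells so a missing value does not force "string".
        values = [row[column] for row in rows if row[column] != ""]
        if values and _fits(values, int):
            schema[column] = "bigint"
        elif values and _fits(values, float):
            schema[column] = "double"
        else:
            schema[column] = "string"
    return schema


sample = "user_id,amount,country\n1,9.99,US\n2,15,DE\n"
print(infer_schema(sample))
```

A real catalog also has to deal with nested formats, partitions and schema changes over time, which is exactly the tedious work a managed service takes off your plate.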
It then generates sample Python ETL code. Modify it, or write your own. Share your code on Git. Or borrow other open source pieces that already address your specific ETL use case!
Lastly, it includes a job scheduler which handles dependencies. Job A must complete before job B can run, and so forth. Error handling & logging are also all included.
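The dependency idea can be sketched in a few lines using Python's standard library `graphlib` module (Python 3.9+). This is only an illustration of ordering jobs by their prerequisites, not Glue's scheduler; the job names and the `run_jobs` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_jobs(dependencies, jobs):
    """Run each job after all of its prerequisites; return the order used."""
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        try:
            jobs[name]()
        except Exception as exc:
            # Minimal error handling: log the failure, then stop the run.
            print(f"job {name} failed: {exc}")
            raise
        order.append(name)
    return order


log = []
jobs = {
    "extract": lambda: log.append("extracted"),
    "transform": lambda: log.append("transformed"),
    "load": lambda: log.append("loaded"),
}
# Each key maps a job to the set of jobs that must finish before it.
dependencies = {"transform": {"extract"}, "load": {"transform"}}

order = run_jobs(dependencies, jobs)
print(order)
```

Because each job here depends on the previous one, the only valid order is extract, transform, load. A cycle in the dependencies would make `TopologicalSorter` raise an error rather than run anything out of order.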
Since these are native Amazon services, of course they’re going to integrate with their dangerously fast Redshift warehouse.
What is serverless?
Serverless means deploying functions directly into the cloud. No servers to manage, no configuration. All the systems administration & automation is hidden. No more devops to argue with! Amazon’s own offering is called Lambda.
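For a sense of how small a serverless function can be, here is a minimal sketch of a Python Lambda handler. The `(event, context)` signature is what Lambda passes to a Python handler; the `name` field in the event and the response shape are assumptions made for this example.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda calls with the event payload and a context object."""
    # "name" is a hypothetical field; real events depend on the trigger.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local smoke test. In Lambda, the service supplies event and context.
result = lambda_handler({"name": "Redshift"}, None)
print(result)
```

That function is the whole deployable unit. There is no web server, process manager or machine image to maintain around it.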
What is QuickSight?
QuickSight is Amazon’s business intelligence and visualization service. With it, you can stay completely within the cozy Amazon ecosystem even for business insight and analytics.