How I use Terraform & Composer to automate WordPress on AWS


How I set up WordPress to deploy automatically on AWS

Want to make your WordPress site bulletproof? No more worrying about server outages? Want it faster & more reliable, and hosted on cheaper components?

I was after all these gains & also wanted to kick the tires on some of Amazon’s latest devops offerings. So I plotted a way forward to completely automate the deployment of my blog, hosted on wordpress.

Here’s how!

Join 28,000 others and follow Sean Hull on twitter @hullsean.

The article is divided into two parts…

Deploy a WordPress site on aws – decouple assets (part 1)

In this one I decouple the assets from the website. What do I mean by this? By moving the db to its own server, or to RDS for even simpler management, my server can be stopped & started or terminated at will, without losing all my content. Cool.

You’ll also need to decouple your assets. Those are all the files in the uploads directory. Amazon’s S3 offering is purpose-built for this use case. It also comes with easy CloudFront integration for object caching, and lifecycle management to give your files backups over time. Cool!

Deploy a WordPress site on aws – automate (part 2)

In the second part we move on to all the automation pieces. We’ll use PHP’s Composer to manage dependencies. That’s fancy talk for fetching WordPress itself, and all of our plugins.

1. Isolate your config files

Create a directory & put your config files in it.

$ mkdir iheavy
$ cd iheavy
$ touch htaccess
$ touch httpd.conf
$ touch wp-config.php
$ touch a_simple_pingdom_test.php
$ touch composer.json
$ zip -r iheavy-config.zip *
$ aws s3 cp iheavy-config.zip s3://my-config-bucket/

In a future post we’re going to put all these files in version control. Amazon’s CodeCommit is feature-compatible with GitHub, but integrated right into your account. Once you have your files there, you can use CodeDeploy to automatically place files on your server.

We chose to leave that step out to simplify the instance role you need for your new EC2 webserver. In our case it only needs S3 permissions!
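Those S3 permissions can be quite narrow. The Terraform config below creates the role & policy for you, but expressed with the aws CLI the policy amounts to roughly this (the policy name is arbitrary, and read-only access to the config bucket is an assumption about what the webserver needs):

$ aws iam put-role-policy \
    --role-name web_iam_role \
    --policy-name s3-config-read \
    --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["s3:GetObject","s3:ListBucket"],"Resource":["arn:aws:s3:::my-config-bucket","arn:aws:s3:::my-config-bucket/*"]}]}'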

Also: When devops means resistance to change

2. Build your Terraform script

Terraform is a lot like Vagrant. I wrote a Howto deploy on EC2 with Vagrant article a couple years ago.

The Terraform configuration formalizes what you are asking of Amazon’s API. What size instance? Which AMI? What VPC should I launch in? Which role should my instance assume to get the S3 access it needs? And lastly, how do we make sure it gets the same Elastic IP each time it boots?

All the magic is inside the terraform config.

Here’s what I see:

levanter:~ sean$ cat iheavy.tf

resource "aws_iam_role" "web_iam_role" {
    name = "web_iam_role"
    assume_role_policy = <

And here's what it looks like when I ask terraform to build my infrastructure:

levanter:~ sean$ terraform apply
aws_iam_instance_profile.web_instance_profile: Refreshing state... (ID: web_instance_profile)
aws_iam_role.web_iam_role: Refreshing state... (ID: web_iam_role)
aws_s3_bucket.apps_bucket: Refreshing state... (ID: iheavy)
aws_iam_role_policy.web_iam_role_policy: Refreshing state... (ID: web_iam_role:web_iam_role_policy)
aws_instance.iheavy: Refreshing state... (ID: i-14e92e24)
aws_eip.bar: Refreshing state... (ID: eipalloc-78732a47)
aws_instance.iheavy: Creating...
  ami:                               "" => "ami-1a249873"
  availability_zone:                 "" => ""
  ebs_block_device.#:                "" => ""
  ephemeral_block_device.#:          "" => ""
  iam_instance_profile:              "" => "web_instance_profile"
  instance_state:                    "" => ""
  instance_type:                     "" => "t1.micro"
  key_name:                          "" => "iheavy"
  network_interface_id:              "" => ""
  placement_group:                   "" => ""
  private_dns:                       "" => ""
  private_ip:                        "" => ""
  public_dns:                        "" => ""
  public_ip:                         "" => ""
  root_block_device.#:               "" => ""
  security_groups.#:                 "" => ""
  source_dest_check:                 "" => "true"
  subnet_id:                         "" => "subnet-1f866434"
  tenancy:                           "" => ""
  user_data:                         "" => "ca8a661fffe09e4392b6813fbac68e62e9fd28b4"
  vpc_security_group_ids.#:          "" => "1"
  vpc_security_group_ids.2457389707: "" => "sg-46f0f223"
aws_instance.iheavy: Still creating... (10s elapsed)
aws_instance.iheavy: Still creating... (20s elapsed)
aws_instance.iheavy: Creation complete
aws_eip.bar: Modifying...
  instance: "" => "i-6af3345a"
aws_eip_association.eip_assoc: Creating...
  allocation_id:        "" => "eipalloc-78732a47"
  instance_id:          "" => "i-6af3345a"
  network_interface_id: "" => ""
  private_ip_address:   "" => ""
  public_ip:            "" => ""
aws_eip.bar: Modifications complete
aws_eip_association.eip_assoc: Creation complete

Apply complete! Resources: 2 added, 1 changed, 0 destroyed.

The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.

State path: terraform.tfstate
levanter:~ sean$ 

Also: Is Amazon too big to fail?

3. Use Composer to automate the WordPress install

There’s a PHP package manager called Composer. It manages dependencies, and we depend on a few things: first WordPress itself, and second the various plugins we have installed.

The composer.json file itself is pretty vanilla. Have a look:

{
    "name": "acme/brilliant-wordpress-site",
    "description": "My brilliant WordPress site",
    "repositories": [
        {
            "type": "composer",
            "url": "https://wpackagist.org"
        }
    ],
    "require": {
        "aws/aws-sdk-php": "*",
        "wpackagist-plugin/medium": "1.4.0",
        "wpackagist-plugin/google-sitemap-generator": "3.2.9",
        "wpackagist-plugin/amp": "0.3.1",
        "wpackagist-plugin/w3-total-cache": "0.9.3",
        "wpackagist-plugin/wordpress-importer": "0.6.1",
        "wpackagist-plugin/yet-another-related-posts-plugin": "4.0.7",
        "wpackagist-plugin/better-wp-security": "5.3.7",
        "wpackagist-plugin/disqus-comment-system": "2.74",
        "wpackagist-plugin/amazon-s3-and-cloudfront": "1.1",
        "wpackagist-plugin/amazon-web-services": "1.0",
        "wpackagist-plugin/feedburner-plugin": "1.48",
        "wpackagist-theme/hueman": "*",
        "php": ">=5.3",
        "johnpbloch/wordpress": "4.6.1"
    },
    "autoload": {
        "psr-0": {
            "Acme": "src/"
        }
    }
}
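If you want to sanity-check this file before baking it into the server build, you can run Composer locally first. By default johnpbloch/wordpress drops core into a wordpress/ directory and the wpackagist plugins land under wp-content/plugins/, which is what the user-data script below expects. A quick, hedged example:

$ composer install
$ ls wordpress/              # WordPress core from johnpbloch/wordpress
$ ls wp-content/plugins/     # plugins pulled from wpackagist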

Read: Is aws a patient that needs constant medication?

4. Build your user-data script

This captures all the commands you run once the instance starts. Update packages, install your own, move & configure files. You name it!

#!/bin/sh

yum update -y
yum install emacs -y
yum install mysql -y
yum install php -y
yum install git -y
yum install aws-cli -y
yum install gd -y
yum install php-gd -y
yum install ImageMagick -y
yum install php-mysql -y


yum install -y httpd24 
service httpd start
chkconfig httpd on

# configure mysql password file
echo "[client]" >> /root/.my.cnf
echo "host=my-rds.ccccjjjjuuuu.us-east-1.rds.amazonaws.com" >> /root/.my.cnf
echo "user=root" >> /root/.my.cnf
echo "password=abc123" >> /root/.my.cnf


# install PHP composer
export COMPOSER_HOME=/root
echo "installing composer..."
php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"
php -r "if (hash_file('SHA384', 'composer-setup.php') === 'e115a8dc7871f15d853148a7fbac7da27d6c0030b848d9b3dc09e2a0388afed865e6a3d6b3c0fad45c48e2b5fc1196ae') { echo 'Installer verified'; } else { echo 'Installer corrupt'; unlink('composer-setup.php'); } echo PHP_EOL;"
php composer-setup.php
php -r "unlink('composer-setup.php');"
mv composer.phar /usr/local/bin/composer


# fetch config files from private S3 folder
aws s3 cp s3://my-config-bucket/iheavy-config.zip .

# unzip files
unzip iheavy-config.zip

# use composer to get wordpress & plugins
composer update

# move wordpress software
mv wordpress/* /var/www/html/

# move plugins
mv wp-content/plugins/* /var/www/html/wp-content/plugins/

# move pingdom test
mv a_simple_pingdom_test.php /var/www/html

# move htaccess
mv htaccess /var/www/html/.htaccess

# move httpd.conf
mv httpd.conf /etc/httpd/conf.d/

# move our wp-config into place
mv wp-config.php /var/www/html

# restart apache
service httpd restart

# allow apache to create uploads & any files inside wp-content
chown apache /var/www/html/wp-content

You can monitor things as they're being installed. Use ssh to reach your new instance. Then as root:

$ tail -f /var/log/cloud-init.log

Related: Does Amazon eat its own dogfood?

5. Time to test

Visit the domain name you specified inside your httpd.conf (now sitting in /etc/httpd/conf.d/).

You have full automation now. Don't believe me? Go ahead & TERMINATE the instance in your AWS console. Now drop back to your terminal and do:

$ terraform apply

Terraform will figure out that the resources that *should* be there are missing, and go ahead and build them for you. AGAIN. Fully automated style!

Don't forget your analytics beacon code

Hopefully you remember how your analytics is configured. The beacon code makes an API call every time a page is loaded. This tells Google Analytics or other monitoring systems what your users are doing, how much time they're spending & where.

This typically goes in the header.php file. We'll leave it as an exercise to automate this piece yourself!

Also: Is AWS too complex for small dev teams?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don't work with recruiters

Deploy WordPress on AWS by first decoupling assets


Want to make your WordPress site bulletproof? No more worrying about server outages? Want it faster & more reliable, and hosted on cheaper components?

I was after all these gains & also wanted to kick the tires on some of Amazon’s latest devops offerings. So I plotted a way forward to completely automate the deployment of my blog, hosted on wordpress.

Here’s how!

Join 28,000 others and follow Sean Hull on twitter @hullsean.

The article is divided into two parts…

Deploy a WordPress site on aws – decouple assets (part 1)

In this one I decouple the assets from the website. What do I mean by this? By moving the db to its own server, or to RDS for even simpler management, my server can be stopped & started or terminated at will, without losing all my content. Cool.

You’ll also need to decouple your assets. Those are all the files in the uploads directory. Amazon’s S3 offering is purpose-built for this use case. It also comes with easy CloudFront integration for object caching, and lifecycle management to give your files backups over time. Cool!

Terraform a WordPress site on aws – automate deploy (part 2)

In the second part we move on to all the automation pieces. We’ll use PHP’s Composer to manage dependencies. That’s fancy talk for fetching WordPress itself, and all of our plugins.

1. Get your content into S3

How to do it?

A. Move your content

$ cd html/wp-content/
$ aws s3 cp --recursive uploads s3://iheavy/wp-content/uploads/

Don’t have the aws command line tools installed?

$ yum install aws-cli -y
$ aws configure

B. Edit your .htaccess file with these lines:

RewriteEngine On
RewriteRule ^wp-content/uploads/(.*)$ http://s3.amazonaws.com/your-bucket/wp-content/uploads/$1 [P]

The above steps handle all *existing* content. However you also want new content to go to S3. For that WordPress needs to understand how to put files there. Luckily there’s a plugin to help!

C. Fetch WP Offload S3 Lite

You’ll see the plugin below in our composer.json file as “amazon-s3-and-cloudfront”

Theoretically you need to specify your aws credentials inside the wp-config.php. However this is insecure. You don’t ever want stuff like that in S3 or any code repository. What to do?

The best way is to use AWS ROLES for your instance. These give the whole instance access to API calls without credentials. Cool! Read more about AWS roles for instances.
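You can see the role in action from the instance itself. The temporary credentials come from the instance metadata service, which is why nothing needs to be hard-coded into wp-config.php. A quick check (web_iam_role matches the role name used in the Terraform config in part 2):

$ curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
$ curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/web_iam_role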

Related: Is there a devops talent gap?

2. Move your database to RDS

You may also use a roll-your-own MySQL instance. The point here is to make it a different EC2 instance. That way you can kill & rebuild the webserver at will. This offers us some cool advantages.

A. Create an RDS instance in a private subnet.

o Be sure it has no access to the outside world.
o Note the root user & password.
o Note the endpoint or hostname.

I recommend changing the password from your old instance. That way you can’t accidentally log in to your old db. Well, it’s still possible, but it’s one step harder.
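If you'd rather script this step too, the aws CLI can create the instance. Here's a rough sketch; the instance class, storage size & subnet group name are assumptions you'll want to adjust:

$ aws rds create-db-instance \
    --db-instance-identifier my-rds \
    --engine mysql \
    --db-instance-class db.t2.micro \
    --allocated-storage 20 \
    --master-username root \
    --master-user-password abc123 \
    --db-subnet-group-name my-private-subnets \
    --no-publicly-accessible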

B. mysqldump your current wp db from another server

$ cd /tmp
$ mysqldump --opt wp_database > wp_database.mysql

C. Copy that dump to an instance in the same VPC & subnet as the RDS instance

$ scp -i .ssh/mykey.pem ec2-user@oldbox.com:/tmp/wp_database.mysql /tmp/

D. Import the data into your new db

$ cd /tmp
$ echo "create database wp_database" | mysql
$ mysql wp_database < wp_database.mysql

E. Edit your wp-config.php

define('DB_PASSWORD', 'abc123');
define('DB_HOST', 'my-rds.ccccjjjjuuuu.us-east-1.rds.amazonaws.com');

Also: When hosting data on Amazon turns bloodsport?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 things you didn’t know about Dynamodb that are hurting you bad


If you’re like a lot of folks you’re building an application in AWS & using a NoSQL database for persistent data. Dynamodb fits the bill nicely. Little or no ops to worry about, at least in the traditional sense.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

However there are knobs to turn & dials to set. Here are a few you should be thinking about.

1. You can replicate across regions

Dynamodb introduced a feature in 2015 called streams. If you come from the relational database world, you can think of streams like a transaction log. They capture the before & after image of your data. Couple streams with Lambda functions, and you have triggers that can do anything you want.

Turns out Amazon has been all over this, and already built a library to do cross-region replication with streams. Pretty cool!

Also: Is aws too complex for small dev teams?

2. You can manage retrieval costs

Dynamodb automatically creates and manages an index on the primary key. But chances are your application reads data based on other columns too. You can create secondary indexes on those columns to match your data access patterns. Without an index, Dynamodb would have to scan every row to find your data; an index dramatically reduces that work and makes retrieval faster too!
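As a rough sketch, once a global secondary index exists on the column you filter by, a query against the index replaces the full table scan. The table & index names here are hypothetical:

$ aws dynamodb query \
    --table-name orders \
    --index-name customer_id-index \
    --key-condition-expression "customer_id = :c" \
    --expression-attribute-values '{":c":{"S":"12345"}}'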

Related: Does Amazon eat its own dogfood?

3. You can do SQL Like queries

That’s right, if you thought NoSQL meant no SQL you were only half right. By loading your Dynamodb data into HDFS, you can let Elastic MapReduce have at it. And thus open the door to using HiveQL to query the data the way you wanted to in the first place.

Convoluted? Yes. But this is the brave new world of the cloud!

Read: Is Amazon too big to fail?

4. Partitions are handy & useful

By default Dynamodb partitions your data behind the scenes, because that’s what good distributed databases are supposed to do. It uses the partition key to figure out where each item should go. And much like Redshift’s distribution & sort keys, you also have the option of a sort key, which orders items within each partition. This is important: queries that fan out across many partitions bring latency costs that will surprise you.

Also: When hosting data on Amazon turned bloodsport

5. Metrics are your partner in performance

CloudWatch provides all sorts of instrumentation for Dynamodb. Read & write activity, throttling, errors & latency are just a few of the things you can see.
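You can pull those same numbers from the command line too. This sketch sums throttled requests for one table over a day; the table name & time window are placeholders:

$ aws cloudwatch get-metric-statistics \
    --namespace AWS/DynamoDB \
    --metric-name ThrottledRequests \
    --dimensions Name=TableName,Value=orders \
    --start-time 2016-10-01T00:00:00Z \
    --end-time 2016-10-02T00:00:00Z \
    --period 3600 \
    --statistics Sum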

Also: Is aws the patient that needs constant medication?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How do we lock down cloud systems from disgruntled engineers?


I worked with a customer last year, on a short-term assignment. A brilliant engineer had built their infrastructure, automated deployments, and managed all the systems. Sadly, despite all the sleepless nights and dedication, they hadn’t managed to build up a good rapport with management.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I’ve seen this happen so many times, and I do find it a bit sad. Here’s an engineer who’s working his butt off, really wants the company to succeed, really cares about the systems. But he doesn’t connect well with people, is often dismissive or disrespectful, or talks down to people like they’re stupid. All of this burns bridges, and there are a lot of bad feelings between all parties.

How do you manage the exit process? Here’s a battery of recommendations for changing credentials & logins so that systems can’t be accessed anymore.

1. Lock out API access

You can do this by removing the administrator role or any other role their IAM user might have. That way you keep the account around *just in case*. This will also prevent them from doing anything on the console, but you can see if they attempt any logins.
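Here's roughly what that looks like with the aws CLI. The username is a placeholder, and the access key id is Amazon's documentation sample:

$ aws iam detach-user-policy --user-name departing.engineer \
    --policy-arn arn:aws:iam::aws:policy/AdministratorAccess
$ aws iam list-access-keys --user-name departing.engineer
$ aws iam update-access-key --user-name departing.engineer \
    --access-key-id AKIAIOSFODNN7EXAMPLE --status Inactive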

Also: Is AWS too complex for small dev teams?

2. Lock out of servers

They may have the private keys to various servers in your environment. So to lock them out, scan through all the security groups, and make sure their whitelisted IPs are gone.

Are you using a bastion box for access? That’s ideal, because then you only have one access point. Eliminate their login and audit access there. Then you’ve covered your bases.
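Not sure where their IP is whitelisted? You can hunt it down & revoke it from the command line. The IP & security group id below are placeholders:

$ aws ec2 describe-security-groups \
    --filters Name=ip-permission.cidr,Values=203.0.113.25/32 \
    --query 'SecurityGroups[].GroupId'
$ aws ec2 revoke-security-group-ingress --group-id sg-0a1b2c3d \
    --protocol tcp --port 22 --cidr 203.0.113.25/32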

Related: Does Amazon eat its own dogfood?

3. Update deployment keys

At one of my customers the outgoing op had set up many moving parts & automated & orchestrated all the deployment processes beautifully. However he had also used his personal GitHub key inside Jenkins. So when it went to deploy, it used those credentials to get the code from GitHub. Oops.

We ended up creating a company GitHub account, then updating Jenkins with those credentials. There were of course other places in the Capistrano bits that also needed to be reviewed.
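One approach that's worked for me: generate a key that belongs to the CI system rather than to any person, add the public half as a read-only deploy key on the GitHub repo, then point Jenkins at the private half. The path & comment below are just examples:

$ ssh-keygen -t rsa -b 4096 -f ~/.ssh/jenkins_deploy_key -C "jenkins@example.com" -N ""
$ cat ~/.ssh/jenkins_deploy_key.pub    # add this as a deploy key on the GitHub repo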

Read: Is aws a patient that needs constant medication?

4. Update dashboard logins

Monitoring with New Relic or Nagios? Perhaps you have a centralized dashboard for your internal apps? Or you’re using Slack? Audit those logins too: remove the outgoing engineer’s accounts, or rotate any shared passwords.

Also: Is Amazon too big to fail?

5. Audit Non-key based logins

Have some servers outside of AWS in a traditional datacenter? Or even servers in AWS that are using usernames & passwords? Be sure to audit the full list of systems, and change passwords or disable accounts for the outgoing sysop.

Also: When hosting data on Amazon turns bloodsport?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How do we measure devotion?


I was talking recently over email with a hiring manager. Jamie (not his real name) wanted to hire me, but was set against consulting. While that by itself is understandable, he seemed to equate it with devotion. This troubled me. Here’s the quote below.

Join 32,000 others and follow Sean Hull on twitter @hullsean.


While I am sure your skills are excellent, I guess what I am trying to gauge is your desire to quit consulting and join us full time.  I am looking for you to share my vision of changing publishing through data.   Let me be clear: I am not looking for a contractor.  Acme is a fabulous company and I need a person devoted to Acme and to our data assets.

1. Devotion on vacation

Here’s my response. All names have been changed.


I understand Jamie.

I hear you about devotion, I think it’s very important too.  In 2010, I was working at MGC.  After 3 months, they hired a large remote DBA firm out of Canada, to manage the database systems & my contract concluded.  

A few weeks later and a few hours before a plane flight,  I got a harried call.  Can you help us? Database replication is broken & our site is offline.   I jumped on skype to chat with the team, even as I was packing my bags.  I went to the airport, and got on WIFI again.  In-flight on my way to California I remained online to help repair the systems & bring everything back.  It took a few more days and half of my vacation to get things working again, but I wanted to help.

My boss at MGC kept me on for 1 ½ year after that.  He felt I was devoted & gave them the very best service.  

If you change your mind, or would like to discuss further, don’t hesitate to reach out.

Also: What happens when clients don’t pay?

2. Devotion to a manager

I had another experience years back with company Media Inc. Working under a very good CTO, I was surrounded by a team who was also very loyal to him. After about a year, he decided to leave. He had gotten a very enticing offer from another firm. Although he made a great effort to leave the ship in good condition, the crew felt the ship rocking a bit. A temporary CTO was brought on who had a very different style.

As the ship continued to rock at sea, finally a new CTO was found. He however was not popular at all. He had a swagger & tended to throw his weight around, irritating the team, and making them fear they might be thrown from the ship. Slowly they began to leave. After three months, six out of eight on the team had left. There was one old-school Oracle guy still left, and me.

Although he certainly had a different style than the previous boss, it didn’t bother me much. I told him I’d stay as long as he needed me. I was also working remote so I didn’t deal with some of the day-to-day politics.

My devotion was to the business, databases & systems. I accomplished this by being devoted to my own business.

Related: Why I ask customers for a deposit?

3. Devotion to vesting

I worked at another firm about three years ago. Let’s call them Growing Fast Inc. While the firm itself was gaining ground & getting customers like Nike & Walmart, it still had an engineering team of only ten. You could say it was boxing way above its weight.

While it tried to grow, it hired an outside CTO to help. His style was primarily management-facing, while the team’s problems were rooted in technology. With tons of technical debt & a lack of real leadership, the engineering team was floundering. Lots of infighting was making things worse.

Suddenly a key team member decided to quit. The following week another, and after that two more. All told four left. When you consider how small the team was, and further that the remaining members were basically founders, a different picture emerges. Four out of six non-founders had left in two weeks, roughly 66% of the engineering team. The only other guy who stayed had his visa sponsored by Growing Fast Inc.

The founders who stayed were all vested. Everyone else quit because of mismanagement.

Read: 5 conversational ways to evaluate great consultants

4. Devotion to code & data

In an industry as competitive as software & technology, it’s often devotion to building things that wins the day. Using the latest & greatest languages, databases & tech stack can carry a lot of weight.

Managing technical debt can make a difference too. Developers don’t want to be asked to constantly walk a minefield of other developers’ mistakes. A minefield needs to be cleaned up for the business to flourish.

Also: 5 things I learned about trust & advising clients?

5. Devotion through & through

Running a startup isn’t easy. Many fail after 3 or 5 years. I’m devoted to my business. I’ve been an entrepreneur for 20 years, and built it into a success.

The year after 9/11 & again after 2008 were the most difficult periods to tough it out. It’s been hard fought & I wouldn’t shutter the doors of my own business easily. It affords me the opportunity to attend lectures at the AWS pop-up loft, go to conferences & meetups, blog about technology topics, & pivot as the technological winds change.

I’ve found all of this makes me extremely valuable to firms looking for expertise.  I have independence & perspective that’s hard to find.  I’m also there for firms that have been looking to fill a role, and need help sooner rather than later.

Also: A CTO must never do this

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Marching towards continuous deployment


If you’re like a lot of small dev teams & startups, you’ve dreamed of jumping on the continuous deployment train, but still aren’t quite there.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

You’ve got your code in some sort of repository. Now what? As it turns out the concepts aren’t terribly complicated. The hardest part is figuring out the process that works for your team.

1. Make a single script for deployment

Can you build easily? You want to take steps to simplify the build process & work towards everything being done from a single script. This might be an ant or maven script. It might be rake if you’re using ruby. Or it might be a makefile. Either way it organizes dependencies, checks your system & compiles things if necessary.
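As a sketch, that single entry point can be as simple as a shell script that fails loudly on the first error. The individual step scripts here are made up; substitute whatever your stack uses:

#!/bin/sh
set -e                 # stop at the first failing step
composer install       # resolve dependencies
./run_tests.sh         # unit & smoke tests
./package.sh           # produce the build artifact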

Also: Do startups need techops?

2. Do nightly builds

If you’re currently doing manual builds, work towards doing them nightly. Once you have that under your belt you can actually schedule these to happen automatically every night. But you’re not there yet. You want to work to improve the build process first. Work on the performance of this process. Quality is also important. Is the build quality poor?

Related: Is there a devops talent gap?

3. Is your build process slow?

If it takes a long time to do the build, it takes a long time to get to the point where you can smoke test. You want to shorten this time, so you can iterate faster. Look at ways to improve the overall performance of the whole chain. What’s it getting stuck on?

Read: 3 things devops can learn from aviation

4. Is your build quality poor?

Your tests are going to verify application functionality, do security checks, performance or even compliance checks. If these are often failing, you need to dig in to find the source of your trouble. Are your tests not specific enough? Or are you passing your tests, but finding a lot of bugs in QA?

It may take your team some time to get better at building the right tests, and reducing the bugs that show up in QA. But this will be crucial to raising your confidence level to the point where you’re ready to automate the whole pipeline. As you become more confident in your tests, you’ll be confident enough to deploy to production automatically.

Also: How to deploy on Amazon EC2 with Vagrant

5. Evaluate tools to help

Continuous deployment is a lot about process. Once you’ve gotten a handle on that, you’ll have a much better idea of what you want out of the tools. Amazon’s own CodePipeline is one possible build server you can use. Because it’s a service, it’s one less server you have to manage. And of course Jenkins is a popular option. Even with Jenkins there is a service-based offering from CloudBees. You might also take a look at CircleCI & Travis, newer service-based offerings which, although they don’t have all the plugins & integrations of Jenkins, have learned from bumps in the road and improved the formula.

We like CircleCI because it has a smaller footprint than Jenkins, integrates with Slack & HipChat, and has Docker support as well.

Also: 5 Tips for better database change management

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 essential tools for Redshift DBAs


Like every other startup, you’ve built everything on AWS or are moving there quickly.

So it makes sense & is an easy win to build your analytics & bigdata engine on AWS too. Enter Redshift to the rescue.

Unlike the old days of Oracle, where you had career DBAs to handle your data & hold the keys to the data kingdom, these days nobody is managing the data. What makes matters worse is that the documentation is weak, and Redshift hasn’t been out long enough to warrant good books on the topic.

But there’s hope!

Here are five great tools to help you in your day-to-day management of Redshift

Join 28,000 others and follow Sean Hull on twitter @hullsean.

The Github Redshift Utils Page has the whole package, while I link to individual scripts below.

1. Slow queries

Just like MySQL, Postgres & Oracle, your database performance & responsiveness lives & dies by the SQL that you write. And just like all those databases, you want to look at the slowest queries, to know where to tune. Spend your time on the worst offenders first.

Redshift Slow Queries Report.

Also: 5 Ways to get data into Redshift

2. Fragmented Tables

You add data, you delete data. And just like all the other relational databases we know & love, this process leaves gaps. New data is still added at the high water mark, and full table scans still read those empty blocks.

The solution is the vacuum command, but which tables should we run it on?

Run this script & find out which tables need maintenance.

Redshift Display tables needing vacuum
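Once you know which tables need it, the fix is a couple of one-liners from psql. The cluster endpoint, database & table names here are placeholders:

$ psql -h my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d analytics -c "VACUUM FULL page_views;"
$ psql -h my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d analytics -c "ANALYZE page_views;"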

Related: Is Redshift outpacing Hadoop for Startups & bigdata

3. Analyze Schema Compression

Getting your schema compression right on your data in Redshift turns out to be essential to performance too. Each datatype has a recommended compression type that works best for it. You can let Redshift decide when it first loads data, but it doesn’t always guess correctly.

This tool will help you analyze your data, and set compression types properly.

Redshift analyze schema compression tool

Read: 5 Reasons to move data to Amazon Redshift

4. Audit Users

Once you have your Redshift cluster up & running, you’ll set up users for applications to connect as. Each will have different privileges for objects & schemas in your database. You’re auditing those, right? 🙂

This script will output who can read & write what. Useful to ensure your application isn’t overly loose with permissions. Shoot for least privileges where possible.

Redshift Audit User Privileges.

Redshift Audit Table Privileges

Also: Is data your dirty little secret

5. Lambda Data Loader

Want to automatically load data into Redshift? Download this handy Lambda-based tool to respond to S3 events. Whenever a new file appears, its data will get loaded into Redshift. Complete with metadata configuration control in Dynamodb.

Redshift Lambda Data Loader

Also: Why is everybody suddenly talking about Amazon Redshift?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Does AWS have a dirty little secret?


I was recently talking with a colleague of mine about where AWS is today. Obviously companies are migrating to EC2 & the cloud rapidly. The growth rates are staggering.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

The question was…

“What’s good and bad with Amazon today?”

It’s an interesting question. I think there are some dirty little secrets here, but also some very surprising bright spots. This is my take.

1. VPC is not well understood  (FAIL)

This is the biggest one in my mind.  Amazon’s security model is all new to traditional ops folks.  Many customers I see deploy in “classic EC2”.  Others deploy haphazardly in their own VPC, without a clear plan.

The best practice is to have one or more VPCs, with private & public subnets.  Put databases in private, webservers in public.  Then create a jump box in the public subnet, and funnel all ssh connections through there: allow any source IP, use individual user accounts for authentication & auditing (only on this box), then use google-authenticator for 2-factor auth at the command line.  It also provides an easy way to decommission accounts, and lock out users who leave the company.

However most customers have done little of this, or a mixture but not all of it.  So GETTING TO BEST PRACTICES around VPCs would mean deploying a VPC as described, then moving each and every one of your boxes & services over there.  Imagine the risk to production services.  Imagine the chances of error, even if you’re using Chef or your own standardized AMIs.
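One small piece of that puzzle: once the jump box exists, a local ssh config entry can force every connection to hop through it. The hostnames & private address range below are assumptions about your VPC layout:

$ cat >> ~/.ssh/config <<'EOF'
Host 10.0.*
    ProxyCommand ssh -W %h:%p sean@bastion.example.com
EOF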

Also: Are we fast approaching cloud-mageddon?

2. Feature fatigue (FAIL)

Another problem is a sort of “paradox of choice”.  That is, Amazon is releasing so many new offerings so quickly that few engineers know them all.  So you find a lot of shops implementing things wrong because they didn’t understand a feature.  In other words, AWS had already solved the problem.

OpenRoad comes to mind.  They’ve got media files on the filesystem, when S3 is plainly Amazon’s purpose-built service for this.  

Is AWS too complex for small dev teams & startups?

Related: Does Amazon eat its own dogfood? Apparently yes!

3. Required redundancy & automation  (FAIL)

The model here is what Netflix has done with ChaosMonkey.  They literally knock machines offline to test their setup.  The problem is detected, and new hardware brought online automatically.  Deploying across AZs is another example.  As Amazon says, we give you the tools, it’s up to you to implement the resiliency.

But few firms do this.  They’re deployed on Amazon as if it’s a traditional hosting platform.  So they’re at risk in various ways.  Of Amazon outages.  Of hardware problems under the VMs.  Of EBS network issues, of localized outages, etc.

Read: Is Amazon too big to fail?

4. Lambda  (WIN)

I went to the serverless conference a week ago.  It was exciting to see what is happening.  It is truly the *bleeding edge* of cloud.  IBM & Azure & Google all have a serverless offering now.

The potential here is huge.  Eliminating *ALL* of the server management headaches, from packages to config management & scaling, and hiding all of that behind the platform could have a huge upside.  What’s more, it takes the on-demand model even further.  You have no compute running idle; nothing spins up until you hit an endpoint.  Cost savings could be huge.  Wonder if it has the potential to cannibalize Amazon’s own EC2 …  we’ll see.

Charity Majors wrote a very good critical piece – WTF is Operations? #serverless

Also: Is the difference between dev & ops a four-letter word?

5. Redshift  (WIN)

Seems like *everybody* is deploying a data warehouse on Redshift these days.  It’s no wonder, because they already have their transactional database, their web backend on RDS of some kind.  So it makes sense that Amazon would build an offering for reporting.

I’ve heard customers rave about reports that took 10 hours on MySQL run in under a minute on Redshift.  It’s not surprising because MySQL wasn’t built for the size servers it’s being deployed on today.  So it doesn’t make good use of all that memory.  Even with SSD drives, query plans can execute badly.

Also: Is there a better way to build a warehouse in 2016?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is there a new better way to build a data warehouse in 2016?


In the old days… the bygone days of 2005 🙂 That was when you’d pony up for an Oracle license, get the hardware, and build your warehouse. Somewhere along the way you crossed your fingers.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Today everybody wants to treat data as a product. And for good reason. Knowing how to better serve your customers & iterate more quickly is essential in today’s hypercompetitive startup world.

1. Amazon Redshift enters the fray

Recently I’ve been wondering: why is everyone suddenly talking about Amazon Redshift? I ask not because recruiters are experts at database technology & predicting industry trends, but rather because they have their finger on the pulse of what firms are doing.

Amazon launched Redshift in early 2013 using ParAccel technology. Adoption has been quick. Customers who already have their data in the AWS ecosystem find the offering a perfect match for their data analytics needs. And with stories swirling around of 10 hour MySQL reports running in under 60 seconds on Redshift, it’s no wonder.

Also: Is AWS too complex for small dev teams?

2. Old method – select carefully

With Ralph Kimball’s opus fully digested, you set out to meet with stakeholders and figure out what you were building.

Of course no one understood your questions, and business units & engineering teams spoke English & French. Months went by, and things devolved. Morale got squashed. Eventually out the other end something would be built, nobody would be happy, and eyeballs would roll over the dollars spent.

This model was known in the data warehousing world by the wonderful acronym ETL which is short for extract, transform & load. The transform part happens before you load it. So that your warehouse is a shining, trimmed & manicured copy of your data, ready for reporting.

Also: Is Amazon too big to fail?

3. Today – mirror everything & then build views

Today you’re more likely to see the ELT model employed. That is Extract, Load & Transform. A subtle change, with big differences. When you load first, you mirror all of your transactional data into your warehouse, then build views or new summary tables to fit your ongoing needs.

Customers are using tools like Looker & Tableau to layer on top of these ELT warehouses, which also have some intelligence around the transform piece. This makes the process more self-serve for business units, and requires less back & forth between engineering & product teams. No more waiting a few days for a report to be built, because these non-technical teams can build for themselves.

Also: When hosting data on Amazon turns bloodsport?

Is Data your dirty little secret?

4. Pipeline services

So you’re going down the ELT path, but how do you get your data into Redshift? I wrote Five ways to get data into Redshift to answer that question.

There are a number of service-based offerings, from the point & click Fivetran to the more full-featured Alooma. And then RJ Metrics & Flydata also fit the bill. You may also want to build your own with Xplenty, which also has a lot of ELT/ETL logic you can build without code. Pretty spiffy.

Read: Is aws a patient that needs constant medication?

5. Reporting databases

We’ll be covering a lot more in this space, so check back.

Related: Does Amazon eat its own dogfood?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Are career promotions like marriage… appealing until your first divorce?


I was recently flipping through an interesting email list. It’s focused for tech leaders, managers & startup entrepreneurs. An HR team lead posted asking about “promotion paths” for engineers.

While I have an intuitive grasp of what engineers at those different levels look like, I’m having trouble making those concrete.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

It struck me how antiquated the whole “career ladder” concept is. Work one job for 20-30 years. It feels like the fairytale of dating that leads safely to marriage. It all seems like a wonderful plan until it fizzles out, employees get jaded, they start seeing the real money being paid elsewhere, and begin looking around.

1. Talent in short supply

I’m not a CTO.  I should preface with that bit.  I’m a consultant.  That said I’ve worked in the tech industry for 20 years, so I have a bit of an opinion here.

Going to meetups, startup industry & pitch events. They’re all like a feeding frenzy. There are more companies hiring now than I remember back in 1998 & 1999. It’s just crazy.

Angel List says 18,000 companies are hiring right now. What about Made In NYC? That shows 735 jobs. And of course there’s the Y Combinator “Who is hiring?” thread, posted every month; the April 2016 one has 720 comments as of this writing.

Also: Why I don’t work with recruiters

2. Are salary jumps always larger through external promotion?

I’ve seen a pattern repeated over & over.  An outside firm offers more money & grabs the talent, or the talent gets restless, starts looking & finds they get a bigger bump in salary by leaving than by internal promotion.

I don’t know why this is, but it seems almost universal that salary jumps are larger from outside firms, than internally through promotion.  

Also: Why devops talent is so hard to find

3. Building a better ladder

There are great posts on engineering ladders like this one from Neo and also this one from RTR. Also take a look at this one at Artsy. And of course somebody has to go and put theirs up on github. 🙂

All the titles & internal shuffling in the world aren’t going to hide industry pay for long.  When an employee gets wise to their career & the skills marketplace, they’ll eventually learn that title does not equal compensation.

Related: How to hire a developer that doesn’t suck?

4. Building a better culture

In a pricey city like New York, the only thing that seems to be a counterweight to this is phenomenal culture, a chance to build something cool & be surrounded by coworkers you love.  To be sure, bouncing around you get less of this. Companies like Etsy come to mind. According to Glassdoor, companies like Airbnb, HubSpot & Facebook also fit the bill.

Read: 8 questions to ask an aws expert

5. Surge pricing for engineers?

As an alternative to better ladders & promotions, perhaps what Uber did for taxi driving would make sense for hiring engineers too. Let the freelancing phenomenon grow even bigger!

Perhaps we need surge pricing for engineers. That way the very best really do get rewarded the most. Let the marketplace work its magic.

Also: When you have to take the fall

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters