Category Archives: Cloud Computing

Is Amazon about to disrupt your data warehouse?


Amazon is about to launch a product called Glue. As you can see below, this is the last piece in the data warehousing puzzle. With that in place, Amazon will own you! Or at least have push-button products to meet all of an enterprise's varying needs.

Even if you’re a small startup, you can do big-shot enterprise data warehousing. That means everyone can use cutting-edge, data-driven techniques for product & business decisions.

Join 33,000 others and follow Sean Hull on twitter @hullsean.

What is Redshift?

Redshift is like the OLAP databases of years past, the Oracles of the world purpose-built for warehousing data. Obviously without the crazy licensing model Oracle was famous for. With Amazon you can get an enterprise-class data warehouse at modest hourly prices.

If my recent conversations with recruiters about Redshift demand are any indication, there’s been a sudden uptick in startups looking for Redshift expertise.

Also: Top serverless interview questions for hiring aws lambda experts

What is Spectrum?

Spectrum is a very new extension of Redshift allowing you to access & query S3 file data directly. This means you can have petabytes of data that you can query before it’s ever loaded. You’ll still ETL and load portions of it into Redshift, but with Spectrum you can reach the data still sitting in S3 too.

In the old Oracle days this was called an EXTERNAL TABLE. I mention this only to say that Amazon isn’t doing anything that hasn’t been done before. Rather they’re bringing these advanced features within reach of everyday startups. That’s cool.
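
To make that concrete, here’s a minimal sketch of the Spectrum flow, assuming a Redshift cluster with an IAM role that can read your bucket (the schema, table & bucket names are invented):

import psycopg2

# connect to the Redshift cluster (endpoint & credentials are placeholders)
conn = psycopg2.connect(host='my-cluster.abc123.us-east-1.redshift.amazonaws.com',
                        port=5439, dbname='analytics', user='admin', password='...')
conn.autocommit = True   # external DDL can't run inside a transaction block
cur = conn.cursor()

# register an external schema backed by the data catalog
cur.execute("""
CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG DATABASE 'spectrumdb'
IAM_ROLE 'arn:aws:iam::123456789012:role/mySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
""")

# define an external table that points straight at files sitting in S3
cur.execute("""
CREATE EXTERNAL TABLE spectrum.clicks (
  user_id    bigint,
  url        varchar(2048),
  clicked_at timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://my-clickstream-bucket/2017/';
""")

# query it like any other table -- no load step required
cur.execute("SELECT count(*) FROM spectrum.clicks WHERE clicked_at > '2017-01-01';")
print(cur.fetchone())

The external table lives in the catalog; the data never moves until you query it.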

Related: Which engineering roles are in greatest demand?

What is Glue?

Glue is still in beta, but if the re:Invent talk above is any indication, it’s set to disrupt an entire industry. Wow!

Glue first catalogs your data sources. What does that mean? It scans them & models their schemas.

It then generates sample Python ETL code. Modify it, or write your own. Share your code on Git. Or borrow other open-source pieces that already address your specific ETL use case!

Lastly it includes a job scheduler that handles dependencies: Job A must complete before B can run, and so forth. Error handling & logging are also included.

Since these are native Amazon services, of course they’re going to integrate with their dangerously fast Redshift warehouse.
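
Here’s a rough sketch of what driving those pieces from boto3 might look like. Hedged, since the service is still in beta, and the role, database, bucket & job names below are invented for illustration:

import boto3

glue = boto3.client('glue', region_name='us-east-1')

# step 1 - catalog: point a crawler at an S3 path so Glue can infer the schema
glue.create_crawler(
    Name='orders-crawler',
    Role='arn:aws:iam::123456789012:role/myGlueRole',   # needs S3 read + Glue permissions
    DatabaseName='orders_catalog',
    Targets={'S3Targets': [{'Path': 's3://my-data-bucket/orders/'}]},
)
glue.start_crawler(Name='orders-crawler')

# step 2 - transform & load: kick off an ETL job wrapping the generated (or hand-written) script
run = glue.start_job_run(JobName='orders-to-redshift')
print(run['JobRunId'])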

Read: Can on-demand consulting save startups time & money?

What is serverless?

I’ve written about how to throw fastballs at a serverless fanboy and even how to hire a serverless expert. But really what is it?

Serverless means deploying functions directly into the cloud. No servers, no configuration. All the systems administration & automation is hidden. No more devops to argue with! Amazon’s own offering is called Lambda.
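
A Lambda function really is just a function. Here’s about the smallest Python handler you could deploy; the event shape depends on whatever triggers it (an S3 upload, an API Gateway request, a schedule):

# handler.py -- Lambda invokes this once per event
def handler(event, context):
    name = event.get('name', 'world')
    return {'message': 'Hello, %s!' % name}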

Also: 30 questions to ask a serverless fanboy

What is Quicksight?

Amazon’s even jumped into the fray at the presentation layer. Quicksight is a BI tool along the lines of Mode, Domo, Looker or Tableau.

Now it’s possible to stay completely within the cozy Amazon ecosystem even for business insight and analytics.

Also: What can startups learn from the DYN DNS outage?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

30 questions to ask a serverless fanboy

Everyone is hot under the collar again. So-called serverless or no-ops services are popping up everywhere, allowing you to deploy “just code” into the cloud. Not only won’t you have to log in to a server, you won’t even have to know the servers are there.

As your code is called by cloud events, such as a file upload or a hit on an HTTP endpoint, it runs. Behind the scenes, through the magic of containers & autoscaling, Amazon & others are able to provision in milliseconds.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Pretty cool. Yes, even as it outsources the operations role to invisible teams behind Amazon Lambda, Google Cloud Functions or Webtask, it’s also making companies more agile, and allowing startup innovation to happen even faster.

Believe it or not I’m a fan too.

That said, I thought it would be fun to poke a hole in the bubble and throw some criticisms at the technology. I mean, going serverless today is still bleeding edge, and not everyone is cut out to be a pioneer!

With that, here are 30 questions to throw at the serverless fanboys (and ladies!)…

1. Security

o Are you comfortable removing the barrier around your database?
o With more services, there is more surface area. How do you prevent malicious code?
o How do you know your vendor is doing security right?
o How transparent is your vendor about vulnerabilities?

Also: Myth of five nines – Why high availability is overrated

2. Testing

o How do you do integration testing with multiple vendor service components?
o How do you test your API Gateway configurations?
o Is there a way to version control changes to API Gateway configs?
o Can Terraform or CloudFormation help with this?
o How do you do load testing with a third party db backend?
o Are your QA tests hitting the prod backend db?
o Can you easily create & destroy test dbs?

Related: 5 ways to move data to amazon redshift

3. Management

o How do you do zero downtime deployments with Lambda?
o Is there a way to deploy functions in groups, all at once?
o How do you manage vendor lock-in at the monitoring & tools level but also code & services?
o How do you mitigate your vendor’s maintenance? Downtime? Upgrades?
o How do you plan for a move to an alternate vendor? Database import & export may not be ideal, plus code & infrastructure would need to be duplicated.
o How do you manage a third-party service for authentication? What are the pros & cons there?
o What are the pros & cons of using a service-based backend database?
o How do you manage redundancy of code when every client needs to talk to the backend db?

Read: Why were dev & ops siloed job roles?

4. Monitoring & debugging

o How do you build a third-party monitoring tool? Where are the APIs?
o When you’re down, is it your app or a system-wide problem?
o Where is the New Relic for Lambda?
o How do you degrade gracefully when using multiple vendors?
o How do you monitor execution duration so your function doesn’t fail unexpectedly?
o How do you monitor your account-wide limits so a dev deploy doesn’t take down production?

Also: Are SQL databases dead?

5. Performance

o How do you handle startup latency?
o How do you optimize code for mobile?
o Does battery life preclude a large codebase on the client?
o How do you do caching on the server when each invocation resets everything?
o How do you do database connection pooling? (one common workaround is sketched below)
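
On those last two points, the usual trick is to create expensive resources like a database connection outside the handler, so warm invocations reuse them instead of paying the setup cost on every call. A rough sketch, with pymysql and the connection details standing in for whatever backend you actually use:

import pymysql

# created once per container, then reused across warm invocations
conn = pymysql.connect(host='my-db.example.com', user='app',
                       password='secret', db='orders', autocommit=True)

def handler(event, context):
    # the connection above survives between calls while the container stays warm
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM orders")
        (total,) = cur.fetchone()
    return {'orders': total}

You still have to cope with the connection going stale, but it sidesteps a full reconnect on every invocation.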

Also: Is Amazon too big to fail?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Some irresistible reading for March – outages, code, databases, legacy & hiring


I decided this week to write a different type of blog post. Because some of my favorite newsletters are lists of articles on topics of the day.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Here’s what I’m reading right now.

1. On Outages

While everyone is scrambling to figure out why part of the internet went down … wait, is S3 part of the internet, really? While I’m figuring out if it is a service of Amazon, or if Amazon is so big that Amazon *is* the internet now…

Let’s look at S3’s architectural flaws in depth.

Meanwhile Gitlab had an outage too, in which they *gasp* lost data. Seriously? An outage is one thing; losing data, though… Hmmm…

And this article is brilliant on so many levels. Not least because Matthew knows that “post truth” is a trending topic now, and uses it in his title. So here we go, AWS Service status truth in a post truth world. Wow!

And meanwhile the Atlantic tries to track down where exactly are those Amazon datacenters?

Also: Is Amazon too big to fail?

2. On Code

Project wise I’m fiddling around with a few fun things.

Take a look at Jeff Geerling’s Ansible on a Mac playbooks. Nice!

And meanwhile a very nice deep dive on Amazon Lambda serverless best practices.

Brandur Leach explains how to build awesome APIs aka ones that are robust & idempotent

Meanwhile Frans Rosen explains how to 0wn Slack. And no, you don’t want this. 🙂

Related: 5 surprising features in Amazon’s serverless Lambda offering

3. On Hiring & Talent

Are you a rock star dev or a digital nomad? Take a look at the 12 best international cities to live in for software devs.

And if you’re wondering who’s hiring? Well just about everyone!

Devs are you blogging? You should be.

Looking to learn or teach… check out codementor.

Also: Why did dev & ops use to be separate job roles?

4. On Legacy Systems

I loved Drew Bell’s story of stumbling into home ownership, attempting to fix a doorbell, and falling down a familiar rabbit hole. With parallels to legacy software systems… aka anything older than, oh say, five years?

Ian Bogost ruminates why nothing works anymore… and I don’t think an hour goes by where I don’t ask myself the same question!

Also: Are we fast approaching cloud-mageddon?

5. On Databases

If you grew up on the virtual world of the cloud, you may have never touched hardware besides your own laptop. Developing in this world may completely remove us from understanding those pesky underlying physical layers. Yes indeed folks containers do run in “virtual” machines, but those themselves are running on metal, somewhere down the stack.

With that let’s not forget that No, databases are not for containers… but a healthy reminder ain’t bad..

Meanwhile Larry’s mothership is sinking…(hint: Oracle) Does anybody really care? Now’s the time to revisit Mike Wilson’s classic The difference between god and Larry Ellison.

Read: Are SQL Databases Dead?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

As cloud expands, does legacy grow too?

I was recently reading Drew Bell’s post Legacy systems are everywhere. It struck a deep chord for me.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Drew first touches on a story of upgrading an application with legacy components, taking pieces offline, and rebuilding to eliminate technical debt.

He then tells a parallel story of renovations in his new home. Well new for him, but an old building, with old building problems.

I’ve gone through some similar experiences so I thought I’d share some of those.

o A publishing company on AWS

I worked with one company in publishing. They had built a complex automation pipeline to deploy code. As the lead engineer planned his exit, I was brought in to provide support during the transition. As with most large, complex websites, there was a lot that was done right, and some things done in old ways. Documenting all the pieces and digging up the dead bodies was a big part of the job.

Also: Is a dangerous anti-ops movement gaining momentum?

o Renovating a kitchen

In parallel to the above project, I was renovating my kitchen, in a new home in Brooklyn. Taking on this project myself, I dutifully assembled IKEA cabinets, and laid them out to spec. As I began the painstaking process of leveling for the countertop, I ran into trouble. Measurement after measurement didn’t add up. It seemed one section was shorter than another, where the counter should go.

Since I needed to fit a dishwasher, the measurements had to be right. Yet the level told a different story than the yardstick. Finally, after thinking about it for a few hours, I put the level on the floor itself. Turns out the floor wasn’t level! That explained why cabinets were shorter in one area than another.

Also: How do we lock down systems from disgruntled engineers?

o Legacy in 5-7 years?

Complex systems like software exhibit a lot of the same surprises as old buildings. That was one surprise I wasn’t expecting. Where houses are renovated on a 15-30 year timeframe, software seems to experience a five to seven year cycle.

Whether it’s a consequence of shifting sands in the underlying stack, databases, frameworks or cloud components, or the changing needs of product & customers, software turns into legacy much sooner than we expect.

Also: Is AWS a patient that needs constant medication?

o Opportunity everywhere

As companies large & small migrate pieces of their systems to the cloud, move to microservices or rebuild on serverless, the opportunities are endless. It seems every firm is renovating their kitchen these days, putting on a new roof or upgrading their data pipeline.

Also: Is AWS too big to fail?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What products & improvements are new on AWS?

Amazon is releasing new products & services to its global cloud compute network at a rate that has all of our heads spinning.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Here’s new stuff worth mentioning around databases & data.

1. For ETL – AWS GLUE

Moving data from your transactional MySQL or Aurora database to your reporting database isn’t always easy.

In the past you could use a service like xplenty or Alooma.

Now Amazon themselves are getting into the ETL game, providing a new service called Glue.

Also: RDS or Mysql? 10 use cases

2. Query S3 with Athena

Chances are if you’re using AWS for anything, you’ve got data in S3. And wouldn’t it be nice to pick that apart and dig through it, where it sits?

Oracle had a feature called “external tables” and MySQL had something similar. Now Amazon is offering that natively within its own cloud universe. Thanks to some tricky Lambda code, now you can do that. Don’t worry how they did it, because it’s been packaged into a nice easy service for your use!
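
Under the covers it’s just an API call away. Here’s a hedged sketch using boto3’s Athena client; the database, table & results-bucket names are invented:

import boto3, time

athena = boto3.client('athena', region_name='us-east-1')

# kick off a query that runs directly against files in S3
q = athena.start_query_execution(
    QueryString="SELECT status, count(*) FROM weblogs GROUP BY status",
    QueryExecutionContext={'Database': 'mylogs'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results/'},
)
qid = q['QueryExecutionId']

# poll until the query finishes, then print the rows
while athena.get_query_execution(QueryExecutionId=qid)['QueryExecution']['Status']['State'] in ('QUEUED', 'RUNNING'):
    time.sleep(1)

for row in athena.get_query_results(QueryExecutionId=qid)['ResultSet']['Rows']:
    print([col.get('VarCharValue') for col in row['Data']])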

Related: When you have to take the fall – consulting war stories

3. Business Intelligence with QuickSight

If you’re a data driven startup, and who isn’t these days, you’re going to have a business unit building reports. Tableau or Looker may be in your wheelhouse.

Amazon is obviously seeing the opportunity here, and competing with their own partners. Check out Amazon Quicksight for details.

Read: Is upgrading RDS like a sh*t storm that will not end?

4. Expanded RDS

RDS is obviously a very popular offering. And even though zero downtime is very hard to achieve with RDS, you’ll save plenty on DBAs and admins you don’t have to hire!

If you hadn’t heard, there is now MariaDB support. And with it, there’s a migration path from MySQL to MariaDB as well.

Using MariaDB may bring you performance advantages & improvements. But RDS may mitigate this by productizing & standardizing things.

You can also now move encrypted snapshots across regions. In my view this isn’t really a new feature, but rather fixing something that was broken before. The previous limitation was really more a symptom of their global network of data centers, than any built feature per se.

Also: Is the difference between dev & ops a four-letter word?

5. Expanded Redshift

As I’ve blogged before, everybody is excited about Redshift these days.

Amazon has introduced some new features.

o better loading of sorted data

This is done behind the scenes to load data quickly, and keep it stored efficiently. No more vacuuming after a big load!

o user & database rate limiting

Limit connections on a per user or per database level. Useful!

o storage estimates on analyze

When you run the ANALYZE command, you can get storage information, making it easier to choose datatypes & compression. Nifty!
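
Here’s a quick sketch of what a couple of these look like from a SQL session, using psycopg2; the cluster endpoint, user, database & table names are placeholders:

import psycopg2

conn = psycopg2.connect(host='my-cluster.abc123.us-east-1.redshift.amazonaws.com',
                        port=5439, dbname='analytics', user='admin', password='...')
conn.autocommit = True
cur = conn.cursor()

# cap connections for a noisy reporting user, and for a whole database
cur.execute("ALTER USER reporting_user CONNECTION LIMIT 10;")
cur.execute("ALTER DATABASE analytics CONNECTION LIMIT 50;")

# refresh planner statistics; newer releases also surface storage details during analyze
cur.execute("ANALYZE orders;")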

Also: Is Redshift outpacing Hadoop as the big data warehouse for startups?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How can startups learn from the Dyn DNS outage?


As most have heard by now, last Friday saw a serious DDoS attack against one of the major US DNS providers, Dyn.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

DNS being such a critical dependency, this affected many businesses across the board. We’re talking Twitter, Etsy, GitHub, Airbnb & Reddit, to name just a few. In fact Amazon Web Services itself was severely affected. And with so many companies hosting on the Amazon cloud, it’s no wonder this took down so much of the internet.

1. What happened?

According to Brian Krebs, a Mirai botnet was responsible for the attack. What’s even scarier, those requests originated from IoT devices. You know, baby monitors, webcams & DVRs. You’ve secured those right? 🙂

Brian has posted a list of IoT device makers that have backdoors & default passwords and are involved. Interesting indeed.

Also: Is a dangerous anti-ops movement gaining momentum?

2. What can be done?

Companies like Dyn & Cloudflare among others spend plenty of energy & engineering resources studying attacks like this one, and figuring out how to reduce risk exposure.

But what about your startup in particular? How can we learn from these types of outages? There are a number of ways that I outline below.

Also: How do we lock down systems from disgruntled engineers?

3. What are your dependencies?

After an outage like the Dyn one, it’s an opportunity to survey your systems. Take stock of what technologies, software & services you rely on. This is something your ops team can & likely wants to do.

What components does your stack rely on? Which versions are hardest to upgrade? What hardware or services do you rely on? Which APIs do you call out to? Which steps or processes are still manual?

Related: The myth of five nines

4. Put your eggs in many baskets

Awareness around your dependencies helps you see where you may need to build in redundancy. Can you set up a second cloud provider for DR? Can you use an alternate API to get data when your primary is out? For which dependencies are your hands tied? Where are your weaknesses?

Read: Is AWS too complex for small dev teams?

5. Don’t assume five nines

The gold standard in technology & startup land has been 5 nines availability. This is the SLA we’re expected to shoot for. I’ve argued before (see: myth of five nines) that it’s rarely ever achieved. Outages like this one, bringing hours-long downtime, kill your 5 nines promise for years. That’s because 5 nines means only 5½ minutes of downtime per year!

Better to accept that outages can & will happen, manage & mitigate them, and be realistic with your team & your customers.

Also: Is AWS a patient that needs constant medication?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How I use Terraform & Composer to automate WordPress on AWS


How I set up WordPress to deploy automatically on AWS

You want to make your WordPress site bulletproof? No server outage worries? Want to make it faster & more reliable? And also host it on cheaper components?

I was after all of these gains & also wanted to kick the tires on some of Amazon’s latest devops offerings. So I plotted a way forward to completely automate the deployment of my blog, hosted on WordPress.

Here’s how!

Join 28,000 others and follow Sean Hull on twitter @hullsean.

The article is divided into two parts…

Deploy a wordpress site on aws – decouple assets (part 1)

In this one I decouple the assets from the website. What do I mean by this? By moving the db to its own server, or to RDS for even simpler management, my webserver can be stopped & started or terminated at will, without losing all my content. Cool.

You’ll also need to decouple your assets. Those are all the files in the uploads directory. Amazon’s S3 offering is purpose-built for this use case. It also comes with easy CloudFront integration for object caching, and lifecycle management to give your files backups over time. Cool!

Deploy a wordpress site on aws – automate (part 2)

In the second part we move into all the automation pieces. We’ll use PHP’s Composer to manage dependencies. That’s fancy talk for fetching WordPress itself, and all of our plugins.

1. Isolate your config files

Create a directory & put your config files in it.

$ mkdir iheavy
$ cd iheavy
$ touch htaccess
$ touch httpd.conf
$ touch wp-config.php
$ touch a_simple_pingdom_test.php
$ touch composer.json
$ zip -r iheavy-config.zip *
$ aws s3 cp iheavy-config.zip s3://my-config-bucket/

In a future post we’re going to put all these files in version control. Amazon’s CodeCommit is feature compatible with Github, but integrated right into your account. Once you have your files there, you can use CodeDeploy to automatically place files on your server.

We chose to leave this step out, to simplify the role you need for your new EC2 webserver instance. In our case it only needs S3 permissions!

Also: When devops means resistance to change

2. Build your terraform script

Terraform is a lot like Vagrant. I wrote a Howto deploy on EC2 with Vagrant article a couple years ago.

The terraform configuration formalizes what you are asking of Amazon’s API. What size instance? Which AMI? What VPC should I launch in? Which role should my instance assume to get S3 access it needs? And lastly how do we make sure it gets the same Elastic IP each time it boots?

All the magic is inside the terraform config.

Here’s what I see:

levanter:~ sean$ cat iheavy.tf

resource "aws_iam_role" "web_iam_role" {
    name = "web_iam_role"
    assume_role_policy = <

And here's what it looks like when I ask terraform to build my infrastructure:

levanter:~ sean$ terraform apply
aws_iam_instance_profile.web_instance_profile: Refreshing state... (ID: web_instance_profile)
aws_iam_role.web_iam_role: Refreshing state... (ID: web_iam_role)
aws_s3_bucket.apps_bucket: Refreshing state... (ID: iheavy)
aws_iam_role_policy.web_iam_role_policy: Refreshing state... (ID: web_iam_role:web_iam_role_policy)
aws_instance.iheavy: Refreshing state... (ID: i-14e92e24)
aws_eip.bar: Refreshing state... (ID: eipalloc-78732a47)
aws_instance.iheavy: Creating...
  ami:                               "" => "ami-1a249873"
  availability_zone:                 "" => ""
  ebs_block_device.#:                "" => ""
  ephemeral_block_device.#:          "" => ""
  iam_instance_profile:              "" => "web_instance_profile"
  instance_state:                    "" => ""
  instance_type:                     "" => "t1.micro"
  key_name:                          "" => "iheavy"
  network_interface_id:              "" => ""
  placement_group:                   "" => ""
  private_dns:                       "" => ""
  private_ip:                        "" => ""
  public_dns:                        "" => ""
  public_ip:                         "" => ""
  root_block_device.#:               "" => ""
  security_groups.#:                 "" => ""
  source_dest_check:                 "" => "true"
  subnet_id:                         "" => "subnet-1f866434"
  tenancy:                           "" => ""
  user_data:                         "" => "ca8a661fffe09e4392b6813fbac68e62e9fd28b4"
  vpc_security_group_ids.#:          "" => "1"
  vpc_security_group_ids.2457389707: "" => "sg-46f0f223"
aws_instance.iheavy: Still creating... (10s elapsed)
aws_instance.iheavy: Still creating... (20s elapsed)
aws_instance.iheavy: Creation complete
aws_eip.bar: Modifying...
  instance: "" => "i-6af3345a"
aws_eip_association.eip_assoc: Creating...
  allocation_id:        "" => "eipalloc-78732a47"
  instance_id:          "" => "i-6af3345a"
  network_interface_id: "" => ""
  private_ip_address:   "" => ""
  public_ip:            "" => ""
aws_eip.bar: Modifications complete
aws_eip_association.eip_assoc: Creation complete

Apply complete! Resources: 2 added, 1 changed, 0 destroyed.

The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.

State path: terraform.tfstate
levanter:~ sean$ 

Also: Is Amazon too big to fail?

3. Use Composer to automate wordpress install

There is a PHP package manager called Composer. It manages dependencies, and we depend on a few things. First WordPress itself, and second the various plugins we have installed.

The file is a JSON file. Pretty vanilla. Have a look:

{
    "name": "acme/brilliant-wordpress-site",
    "description": "My brilliant WordPress site",
    "repositories":[
        {
            "type":"composer",
            "url":"https://wpackagist.org"
        }
    ],
    "require": {
       "aws/aws-sdk-php":"*",
	    "wpackagist-plugin/medium":"1.4.0",
       "wpackagist-plugin/google-sitemap-generator":"3.2.9",
       "wpackagist-plugin/amp":"0.3.1",
	    "wpackagist-plugin/w3-total-cache":"0.9.3",
	    "wpackagist-plugin/wordpress-importer":"0.6.1",
	    "wpackagist-plugin/yet-another-related-posts-plugin":"4.0.7",
	    "wpackagist-plugin/better-wp-security":"5.3.7",
	    "wpackagist-plugin/disqus-comment-system":"2.74",
	    "wpackagist-plugin/amazon-s3-and-cloudfront":"1.1",
	    "wpackagist-plugin/amazon-web-services":"1.0",
	    "wpackagist-plugin/feedburner-plugin":"1.48",
       "wpackagist-theme/hueman":"*",
	    "php": ">=5.3",
	    "johnpbloch/wordpress": "4.6.1"
    },
    "autoload": {
        "psr-0": {
            "Acme": "src/"
        }
    }

}

Read: Is aws a patient that needs constant medication?

4. Build your user-data script

This captures all the commands you run once the instance starts. Update packages, install your own, move & configure files. You name it!

#!/bin/sh

yum update -y
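# install the packages wordpress needs: php, the mysql client, git, the aws cli & image libraries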
yum install emacs -y
yum install mysql -y
yum install php -y
yum install git -y
yum install aws-cli -y
yum install gd -y
yum install php-gd -y
yum install ImageMagick -y
yum install php-mysql -y


yum install -y httpd24 
service httpd start
chkconfig httpd on

# configure mysql password file
echo "[client]" >> /root/.my.cnf
echo "host=my-rds.ccccjjjjuuuu.us-east-1.rds.amazonaws.com" >> /root/.my.cnf
echo "user=root" >> /root/.my.cnf
echo "password=abc123" >> /root/.my.cnf


# install PHP composer
export COMPOSE_HOME=/root
echo "installing composer..."
php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"
php -r "if (hash_file('SHA384', 'composer-setup.php') === 'e115a8dc7871f15d853148a7fbac7da27d6c0030b848d9b3dc09e2a0388afed865e6a3d6b3c0fad45c48e2b5fc1196ae') { echo 'Installer verified'; } else { echo 'Installer corrupt'; unlink('composer-setup.php'); } echo PHP_EOL;"
php composer-setup.php
php -r "unlink('composer-setup.php');"
mv composer.phar /usr/local/bin/composer


# fetch config files from private S3 folder
aws s3 cp s3://iheavy-config/iheavy_files.zip .

# unzip files
unzip iheavy_files.zip 

# use composer to get wordpress & plugins
composer update

# move wordpress software
mv wordpress/* /var/www/html/

# move plugins
mv wp-content/plugins/* /var/www/html/wp-content/plugins/

# move pingdom test
mv a_simple_pingdom_test.php /var/www/html

# move htaccess
mv htaccess /var/www/html/.htaccess

# move httpd.conf
mv iheavy_httpd.conf /etc/httpd/conf.d

# move our wp-config into place
mv wp-config.php /var/www/html

# restart apache
service httpd restart

# allow apache to create uploads & any files inside wp-content
chown apache /var/www/html/wp-content

You can monitor things as they're being installed. Use ssh to reach your new instance. Then as root:

$ tail -f /var/log/cloud-init.log

Related: Does Amazon eat it's own dogfood?

5. Time to test

Visit the domain name you specified inside your /etc/httpd/conf.d/mysite.conf

You have full automation now. Don't believe me? Go ahead & TERMINATE the instance in your aws console. Now drop back to your terminal and do:

$ terraform apply

Terraform will figure out that the resources that *should* be there are missing, and go ahead and build them for you. AGAIN. Fully automated style!

Don't forget your analytics beacon code

Hopefully you remember how your analytics is configured. The beacon code makes an API call every time a page is loaded. This tells Google Analytics or other monitoring systems what your users are doing, how much time they're spending & where.

This typically goes in the header.php file. We'll leave it as an exercise to automate this piece yourself!

Also: Is AWS too complex for small dev teams?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don't work with recruiters

5 things you didn’t know about Dynamodb that are hurting you bad


If you’re like a lot of folks you’re building an application in AWS & using a NoSQL database for persistent data. Dynamodb fits the bill nicely. Little or no ops to worry about, at least in the traditional sense.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

However there are knobs to turn & dials to set. Here are a few you should be thinking about.

1. You can replicate across regions

Dynamodb introduced a feature in 2015 called streams. If you come from the relational database world, you can think of streams like a transaction log. They capture the before & after image of your data. Couple streams with some useful Lambda functions, and you have triggers that can do anything you want.

Turns out Amazon has been all over this, and already built a library to do cross-region replication with streams. Pretty cool!
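
To get a feel for the moving parts, here’s a bare-bones sketch of a stream-triggered Lambda that copies writes to a table in another region. It assumes the stream is configured to include new & old images, and the table & region names are invented:

import boto3

# low-level client for the destination region; stream records arrive in the same
# typed attribute-value format this client expects
replica = boto3.client('dynamodb', region_name='us-west-2')

def handler(event, context):
    # Lambda hands us a batch of stream records, each carrying before & after images
    for record in event['Records']:
        if record['eventName'] in ('INSERT', 'MODIFY'):
            replica.put_item(TableName='orders_replica',
                             Item=record['dynamodb']['NewImage'])
        elif record['eventName'] == 'REMOVE':
            replica.delete_item(TableName='orders_replica',
                                Key=record['dynamodb']['Keys'])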

Also: Is aws too complex for small dev teams?

2. You can manage retrieval costs

Dynamodb automatically creates and manages an index on the primary key. But chances are that your application will read data based on other columns too. You can create secondary indexes on those other columns to support those access patterns. Without an index Dynamodb would have to scan every row to find your data; the index can dramatically reduce that work, making data retrieval faster too!
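
For example, with a hypothetical global secondary index on an email attribute, a lookup becomes a cheap Query against the index instead of a full table Scan:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb', region_name='us-east-1').Table('users')

# hits the 'email-index' GSI rather than scanning the whole table
resp = table.query(
    IndexName='email-index',
    KeyConditionExpression=Key('email').eq('jane@example.com'),
)
print(resp['Items'])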

Related: Does Amazon eat it’s own dogfood?

3. You can do SQL Like queries

That’s right, if you thought NoSQL meant no SQL you were only half right. By loading your Dynamodb data into HDFS, you can let Elastic MapReduce have at it. And thus open the door to using HiveQL to query the data the way you wanted to in the first place.

Convoluted? Yes. But this is the brave new world of the cloud!

Read: Is Amazon too big to fail?

4. Partitions are handy & useful

By default Dynamo partitions your data behind the scenes, because that’s what good distributed databases are supposed to do. It uses the primary key to figure out where the data should go. And just like with Redshift, you have the option of also using a sort key to help organize the data. This is important. Going across those different instances brings a lot of latency costs that will surprise you.
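
You pick those keys up front when you create the table. A short sketch, where the table name, attributes & throughput numbers are made up:

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')

dynamodb.create_table(
    TableName='orders',
    # HASH = partition key (decides which partition an item lands on),
    # RANGE = sort key (orders items within that partition)
    KeySchema=[
        {'AttributeName': 'customer_id', 'KeyType': 'HASH'},
        {'AttributeName': 'order_date',  'KeyType': 'RANGE'},
    ],
    AttributeDefinitions=[
        {'AttributeName': 'customer_id', 'AttributeType': 'S'},
        {'AttributeName': 'order_date',  'AttributeType': 'S'},
    ],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
)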

Also: When hosting data on Amazon turned bloodsport

5. Metrics are your partner in performance

CloudWatch provides all sorts of instrumentation for Dynamodb. Read & write activity, throttling, errors & latency are just a few of the things you can see.

Also: Is aws the patient that needs constant medication?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How do we lock down cloud systems from disgruntled engineers?


I worked at a customer last year, on a short-term assignment. A brilliant engineer had built their infrastructure, automated deployments, and managed all the systems. Sadly, despite all the sleepless nights and dedication, they hadn’t managed to build up a good rapport with management.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I’ve seen this happen so many times, and I do find it a bit sad. Here’s an engineer who’s working his butt off and really wants the company to succeed. Really cares about the systems. But he doesn’t connect well with people, is often dismissive or disrespectful, or talks down to people like they’re stupid. All of this burns bridges, and there are a lot of bad feelings between all parties.

How do you manage the exit process? Here’s a battery of recommendations for changing credentials & logins so that systems can’t be accessed anymore.

1. Lock out API access

You can do this by removing the administrator role or any other role their IAM user might have. That way you keep the account around *just in case*. This will also prevent them from doing anything on the console, but you can see if they attempt any logins.
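
Here’s a hedged boto3 sketch of that first pass; the username is an example, and you’d want to double-check inline policies & group memberships too:

import boto3

iam = boto3.client('iam')
user = 'departing-engineer'

# detach every managed policy (administrator or otherwise) from the user
for p in iam.list_attached_user_policies(UserName=user)['AttachedPolicies']:
    iam.detach_user_policy(UserName=user, PolicyArn=p['PolicyArn'])

# deactivate (not delete) their API keys, so the account stays around *just in case*
for key in iam.list_access_keys(UserName=user)['AccessKeyMetadata']:
    iam.update_access_key(UserName=user, AccessKeyId=key['AccessKeyId'], Status='Inactive')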

Also: Is AWS too complex for small dev teams?

2. Lock out of servers

They may have the private keys for various servers in your environment. So to lock them out, scan through all the security groups, and make sure their whitelisted IPs are gone.

Are you using a bastion box for access? That’s ideal, because then you only have one access point. Eliminate their login and audit access there. Then you’ve covered your bases.
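
That security group sweep can be scripted; here’s a rough sketch with boto3, where the IP is a stand-in for the departing engineer’s whitelisted address:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')
old_ip = '203.0.113.42/32'   # the departing engineer's whitelisted address

for sg in ec2.describe_security_groups()['SecurityGroups']:
    for perm in sg['IpPermissions']:
        if any(r.get('CidrIp') == old_ip for r in perm.get('IpRanges', [])):
            # revoke just that CIDR from the offending rule
            revoke = {'IpProtocol': perm['IpProtocol'],
                      'IpRanges': [{'CidrIp': old_ip}]}
            if 'FromPort' in perm:
                revoke['FromPort'] = perm['FromPort']
                revoke['ToPort'] = perm['ToPort']
            ec2.revoke_security_group_ingress(GroupId=sg['GroupId'], IpPermissions=[revoke])
            print('revoked %s from %s' % (old_ip, sg['GroupId']))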

Related: Does Amazon eat it’s own dogfood?

3. Update deployment keys

At one of my customers, the outgoing op had set up many moving parts & automated & orchestrated all the deployment processes beautifully. However he had also used his personal GitHub key inside Jenkins. So when it went to deploy, it used those credentials to get the code from GitHub. Oops.

We ended up creating a company GitHub account, then updating Jenkins with those credentials. There were of course other places in the Capistrano bits that also needed to be reviewed.

Read: Is aws a patient that needs constant medication?

4. Update dashboard logins

Monitoring with New Relic or Nagios? Perhaps you have a centralized dashboard for your internal apps? Or you’re using Slack? Rotate those credentials & remove the departing engineer’s logins too.

Also: Is Amazon too big to fail?

5. Audit Non-key based logins

Have some servers outside of AWS in a traditional datacenter? Or even servers in AWS that are using usernames & passwords? Be sure to audit the full list of systems, and change passwords or disable accounts for the outgoing sysop.

Also: When hosting data on Amazon turns bloodsport?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Does AWS have a dirty little secret?


I was recently talking with a colleague of mine about where AWS is today. Obviously companies are migrating to EC2 & the cloud rapidly. The growth rates are staggering.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

The question was…

“What’s good and bad with Amazon today?”

It’s an interesting question. I think there are some dirty little secrets here, but also some very surprising bright spots. This is my take.

1. VPC is not well understood  (FAIL)

This is the biggest one in my mind. Amazon’s security model is all new to traditional ops folks. Many customers I see deploy in “classic EC2”. Others deploy haphazardly in their own VPC, without a clear plan.

The best practice is to have one or more VPCs, with private & public subnets. Put databases in private, webservers in public. Then create a jump box in the public subnet, and funnel all ssh connections through there: allow any source IP, use individual users for authentication & auditing (only on this box), then use google-authenticator for 2-factor at the command line. It also provides an easy way to decommission accounts, and lock out users who leave the company.

However most customers have done little of this, or a mixture but not all of it. So GETTING TO BEST PRACTICES around VPC would mean deploying a VPC as described, then moving each and every one of your boxes & services over there. Imagine the risk to production services. Imagine the chances of error, even if you’re using Chef or your own standardized AMIs.
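
For a sense of the first few steps, here’s a minimal boto3 sketch that lays out the VPC with one public & one private subnet; the CIDR ranges are arbitrary, and the route tables, NAT & jump box are left as an exercise:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# one VPC, with a public subnet (webservers, jump box) and a private one (databases)
vpc = ec2.create_vpc(CidrBlock='10.0.0.0/16')['Vpc']
public = ec2.create_subnet(VpcId=vpc['VpcId'], CidrBlock='10.0.1.0/24')['Subnet']
private = ec2.create_subnet(VpcId=vpc['VpcId'], CidrBlock='10.0.2.0/24')['Subnet']

# only the public subnet gets an internet gateway route; the private one stays unreachable from outside
igw = ec2.create_internet_gateway()['InternetGateway']
ec2.attach_internet_gateway(InternetGatewayId=igw['InternetGatewayId'], VpcId=vpc['VpcId'])

print(vpc['VpcId'], public['SubnetId'], private['SubnetId'])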

Also: Are we fast approaching cloud-mageddon?

2. Feature fatigue (FAIL)

Another problem is a sort of “paradox of choice”. That is, Amazon is releasing so many new offerings so quickly that few engineers know them all. So you find a lot of shops implementing things wrong because they didn’t understand a feature. In other words, AWS had already solved the problem.

OpenRoad comes to mind. They’ve got media files on the filesystem, when S3 is plainly Amazon’s purpose-built service for this.

Is AWS too complex for small dev teams & startups?

Related: Does Amazon eat it’s own dogfood? Apparently yes!

3. Required redundancy & automation  (FAIL)

The model here is what Netflix has done with ChaosMonkey. They literally knock machines offline to test their setup. The problem is detected, and new hardware brought online automatically. Deploying across AZs is another example. As Amazon says, we give you the tools, it’s up to you to implement the resiliency.

But few firms do this. They’re deployed on Amazon as if it’s a traditional hosting platform. So they’re at risk in various ways. Of Amazon outages. Of hardware problems under the VMs. Of EBS network issues, of localized outages, etc.

Read: Is Amazon too big to fail?

4. Lambda  (WIN)

I went to the serverless conference a week ago. It was exciting to see what is happening. It is truly the *bleeding edge* of cloud. IBM & Azure & Google all have a serverless offering now.

The potential here is huge. Eliminating *ALL* of the server management headaches, from packages to config management & scaling, and hiding all of that could have a huge upside. What’s more, it takes the on-demand model even further. You have no compute running idle until you hit an endpoint. Cost savings could be huge. Wonder if it has the potential to cannibalize Amazon’s own EC2 … we’ll see.

Charity Majors wrote a very good critical piece – WTF is Operations? #serverless

Also: Is the difference between dev & ops a four-letter word?

5. Redshift  (WIN)

Seems like *everybody* is deploying a data warehouse on Redshift these days. It’s no wonder, because they already have their transactional database, their web backend on RDS of some kind. So it makes sense that Amazon would build an offering for reporting.

I’ve heard customers rave about reports that took 10 hours on MySQL running in under a minute on Redshift. It’s not surprising, because MySQL wasn’t built for the size of servers it’s being deployed on today. So it doesn’t make good use of all that memory. Even with SSD drives, query plans can execute badly.

Also: Is there a better way to build a warehouse in 2016?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters