Tag Archives: cloud computing

5 startup & scalability blogs I never miss – week 2


Join 11,500 others and follow Sean Hull on twitter @hullsean.

Hunter Walk – Startups

If you want to have your finger on the pulse of startup land, there aren't many better places to start than Hunter Walk's 99% humble writings. Google finds his top posts on topics like AngelList, advisors, and reinventing the movie theatre. Good writing, with an insider's view.

Read: NYC technology startups are hiring

Arnold Waldstein – Marketing

I first found Arnold's blog using my trusty Disqus discovery hack. He had written an interesting piece about mobile shopping at pop-up stores like Kate Spade.

Follow him on Disqus, follow the blog, get the newsletter. All good stuff.

Read This: Why hiring is a numbers game

Claire Diaz Ortiz – Social Media

Claire writes a lot about social media, twitter & blogging. She wrote an excellent guide to increasing your pagerank, another on 30 important people to follow on twitter and more. She can even help you find a job.

Check out: Top MySQL DBA Interview questions for candidates, managers & recruiters

Bruce Schneier – Security

Bruce Schneier is one of the original bad boys of computer security. He writes about broad topics that affect us all every day, from common sense about airport security to the impacts of cryptography on you and me. Very worth looking at regularly, just to see what he's paying attention to.

Also: Why operations & MySQL DBA talent is hard to find

Eric Hammond – Amazon Cloud

Eric Hammond has been writing about Amazon Web Services, EC2 & Ubuntu for years now. He maintains and releases some excellent AMIs, the machine images used to spin up new servers in Amazon's cloud.

Even if you're not big on the command line, you can get a lot of critical insight about the Amazon cloud by keeping up with his blog. Jeff Barr's AWS blog is also good, but not nearly as critical or boots-on-the-ground as Eric's.

Also: 8 Questions to ask an AWS expert

Get some in your inbox: Exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

Autoscaling MySQL on Amazon EC2

Also find Sean Hull’s ramblings on twitter @hullsean.

Autoscaling your webserver tier is typically straightforward. Image your Apache server with the source code baked in, or without it and sync files down from S3 upon spinup. Roll that image into the autoscale configuration and you're all set.
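As a minimal sketch of the sync-on-spinup approach, a boot-time script along these lines does the job, assuming the AWS CLI is installed on the image. The bucket name and document root are hypothetical placeholders:

#!/bin/sh
# run at boot, e.g. from rc.local or instance user-data
# pull the latest application code from S3 (bucket name is a placeholder)
aws s3 sync s3://my-app-bucket/current /var/www/html
# restart apache so it serves the fresh code
/etc/init.d/httpd restart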
With the database tier, though, things can be a bit tricky. The typical configuration we see is a single master database where your application writes. But scaling out horizontally on Amazon EC2 should be as easy as adding more slaves, right? Why not automate that process?

Below we’ve set out to answer some of the questions you’re likely to face when setting up slaves against your master. We’ve included instructions on building an AMI that automatically spins up as a slave. Fancy!

  1. How can I autoscale my database tier?
    1. Build an auto-starting MySQL slave against your master.
    2. Configure those to spin up automatically. Amazon's autoscaling and load balancing is one option; another is a roll-your-own solution that monitors thresholds on your servers and spins up or drops off slaves as necessary. A sketch of such a check follows.
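
       A roll-your-own check, run from cron, might look something like the sketch below. The load threshold, instance type, and AMI ID are hypothetical placeholders; the AMI is assumed to be the auto-building slave image described later in this post:

       #!/bin/sh
       # crude roll-your-own autoscaling check - run periodically from cron
       # spin up an additional slave when the 1-minute load average crosses a threshold
       LOAD=`uptime | sed 's/.*load average: //' | cut -d, -f1 | cut -d. -f1`
       if [ "$LOAD" -gt 4 ]; then
           # ami-12345678 is a placeholder for your auto-building slave AMI
           ec2-run-instances ami-12345678 -t m1.large
       fi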
  2. Does an AWS snapshot capture subvolume data or just the SIZE of the attached volume?

     If you have an attached EBS volume and you create a new AMI off of that instance, you will capture the entire root volume plus your attached volume data. In fact we find this a great way to create an auto-building slave in the cloud.

  3. How do I freeze MySQL during an AWS snapshot?

     mysql> flush tables with read lock;
     mysql> system xfs_freeze -f /data

    At this point you can create the image using the Amazon web console, ylastic, or the ec2-create-image API call from the command line. When the server you are imaging restarts, as it will do by default, it will come back up with the /data partition unfrozen and MySQL's tables unlocked again. Voila!

    If you’re not using xfs for your /data filesystem, you should be. It’s fast! The xfsprogs docs seem to indicate this may also work with foreign filesystems. Check the docs for details.
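
    For reference, the command-line call mentioned above looks roughly like this; the image name and instance ID are hypothetical placeholders:

    $ ec2-create-image -n mysql-slave-ami i-12345678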

  4. How do I build an AMI MySQL slave that autoconnects to the master?

     Install the mysql_serverid script described below, then:

    1. Configure mysql to use your /data EBS mount.
    2. Set all your my.cnf settings including server_id
    3. Configure the instance as a slave in the normal way.
    4. When using GRANT to create the ‘rep’ user on master, specify the host with a subnet wildcard. For example ‘10.20.%’. That will subsequently allow any 10.20.x.y servers to connect and replicate.
    5. Point the slave at the master.
    6. When all is running properly, edit the my.cnf file and remove server_id. Don’t restart mysql.
    7. Freeze the filesystem as described above.
    8. Use the Amazon console, ylastic or API call to create your new image.
    9. Test it of course, to make sure it spins up, sets server_id and connects to master.
    10. Make a change in the test schema, and verify that it propagates to all slaves.
  5. How do I set server_id uniquely?

     As you hopefully already know, in a MySQL replication environment each node requires a unique server_id setting. In my Amazon Machine Images, I want the server to start up and, if it doesn't find a server_id in /etc/my.cnf, add it there correctly! Is that so much to ask?

    Here’s what I did. Fire up your editor of choice and drop in this bit of code:

     #!/bin/sh
     if grep -q "server_id" /etc/my.cnf
     then
         : # do nothing - it's already set
     else
         # extract numeric component from hostname - in the Amazon
         # environment this encodes the server's internal IP
         export server_id=`echo $HOSTNAME | sed 's/[^0-9]*//g'`
         echo "server_id=$server_id" >> /etc/my.cnf

         # restart mysql so it picks up the new server_id
         /etc/init.d/mysql restart
     fi


    Save that snippet at /root/mysql_serverid. Also be sure to make it executable:

    $ chmod +x /root/mysql_serverid

    Then just append it to your /etc/rc.local file with an editor or echo:

    $ echo "/root/mysql_serverid" >> /etc/rc.local

    Assuming your my.cnf file does *NOT* contain the server_id setting when you re-image, it'll set this automagically each time you spin up a new server off of that AMI. Nice!

  6. Can you easily slave off of a slave? How?

     It's not terribly different from slaving off of a normal master.

     1. First enable slave updates. The setting is not dynamic, so if you don't already have it set, you'll have to restart your slave:

        log_slave_updates=true

     2. Get an initial snapshot of your slave data. You can do that the locking way:

        mysql> flush tables with read lock;
        mysql> show master status\G
        mysql> system mysqldump -A > full_slave_dump.mysql
        mysql> unlock tables;

      You may also choose to use Percona’s excellent xtrabackup utility to create hotbackups without locking any tables. We are very lucky to have an open-source tool like this at our disposal. MySQL Enterprise Backup from Oracle Corp can also do this.

     3. On the slave, seed the database with the dump created above:

        $ mysql < full_slave_dump.mysql

     4. Now point your slave to the original slave:

        mysql> change master to master_user='rep', master_password='rep', master_host='', master_log_file='server-bin-log.000004', master_log_pos=399;
        mysql> start slave;
        mysql> show slave status\G

  7. The slave's master is set as an IP address. Is there another way?

     It's possible to use hostnames in MySQL replication, however it's not recommended. Why? Because of the wacky world of DNS. Suffice it to say, MySQL has to do a lot of work to resolve those names into IP addresses, and a hiccup in DNS can potentially interrupt all MySQL services, since sessions will fail to authenticate. To avoid this problem, do two things:

     1. Set this parameter in my.cnf:

        skip_name_resolve = true

     2. Remove entries in the mysql.user table where the hostname is not an IP address. Those entries will be invalid for authentication after setting the above parameter.
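
     A quick way to spot the offending entries is a query along these lines (a sketch; the pattern assumes your IP-based entries contain only digits, dots and the % wildcard):

        mysql> select user, host from mysql.user where host not regexp '^[0-9.%]+$';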
  8. Doesn't RDS take care of all of this for me?

     RDS is Amazon's Relational Database Service, which is built on MySQL. It presents MySQL as a service, which brings certain benefits to administrators and startups:

     • Simpler administration. Nuts and bolts are handled for you.
     • Push-button replication. No more struggling with the nuances and issues of MySQL's replication management.

     Simplicity of administration of course has its downsides. Depending on your environment, these may or may not be dealbreakers:

     • No access to the slow query log. This is huge. The single best tool for troubleshooting slow database response is this log file. Queries are a large part of keeping a relational database server healthy and happy, and without this facility you are severely limited.

     • Locked-in downtime window. When you sign up for RDS, you must define a thirty-minute weekly maintenance window during which your instance *COULD* be unavailable. When you host yourself, you may not require any scheduled downtime at all, especially if you're using master-master MySQL and a zero-downtime configuration.

     • Can't use Percona Server to host your MySQL data. Percona Server is a high-performance distribution of MySQL which typically rolls in serious performance tweaks and updates before they make it to the community edition. Well worth the effort to consider it.

     • No access to the filesystem, server metrics & command line. Again, for troubleshooting problems these are crucial. Gathering data about what's really happening on the server is how you begin to diagnose and troubleshoot a server stall or pileup.

     • You are beholden to Amazon's support services if things go awry. That's because you won't have access to the raw iron to diagnose and troubleshoot things yourself. Want to call in an outside consultant to help you debug or troubleshoot? You'll have your hands tied without access to the underlying server.

     • You can't replicate to a non-RDS database. Have your own datacenter connected to Amazon via VPC? Want to replicate to a cloud server? RDS won't fit the bill; you'll have to roll your own, as we've described above. And if you want to replicate to an alternate cloud provider, again RDS won't work for you.

To the ‘Microsoft Azure’ Cloud

To The Cloud: Powering An Enterprise introduces the concepts of cloud computing from a high-level and strategic standpoint. I’ve read quite a few tomes on cloud computing and I was interested to see how this one would stack up against the others.

The book goes light on technical language, so it is neither overwhelming nor intimidating. However, at ninety-five pages, one might argue it is a bit sparse for a $30 book if you purchase it at full price.

It is organized nicely around initiatives to get you moving with the cloud.

Chapter 1, Explore, takes you through the process of understanding what the cloud is and what it has to offer.

Chapter 2, Envision, puts you in the driver's seat, looking at the opportunities the cloud can offer in terms of solutions to current business problems.

Chapter 3, Enable, discusses the specifics of getting there, such as selecting a vendor or provider, training your team, and establishing new processes in your organization.

Finally, in Chapter 4 we hit on the real details of adopting the cloud in your organization. Will you move applications wholesale, or will you adopt a hybrid model? How will you redesign your applications to take advantage of automated scaling? What new security practices and processes will you put in place? The authors offer practical answers to these questions. At the end there is also an epilogue discussing emerging market opportunities for cloud computing, such as those in India.

One of the problems I had with the book is that although it doesn't position itself as a Microsoft cloud book per se, that is really what it is.

For example, Microsoft Azure is the default platform throughout the book, whereas in reality most folks consider Amazon Web Services the default when talking about cloud computing. Strictly speaking, Azure is a platform, while AWS is infrastructure, raw iron that can run Linux-based operating systems or Windows.

Of course, having a trio of Microsoft executives as authors gives readers a strong hint to expect some plugging, but a rewrite of the title would probably manage readers' expectations better.

The other missing piece in this book is a chapter on tackling new challenges in the cloud. Cloud computing, Azure or otherwise, brings hardware challenges, since using the cloud means deploying across shared resources. For example, it's hard to deploy a high-performance RAID array or SAN dedicated to a single server in the cloud. This is a challenge on AWS as well, and continues to be a major adoption hurdle. It's part of the commoditization puzzle, and it's not yet completely solved. A chapter discussing how to mitigate virtual server failures with redundancy and cloud components to increase availability would have been useful.

Lastly, I found it a bit disconcerting that all of the testimonials were from fellow CTOs and CIOs of big firms, not independents or other industry experts. For example, I would have liked to see George Reese of Enstratus, Thorsten von Eicken of RightScale or John Engates of Rackspace provide a comment or two on the book.

Overall the book is a decent primer if you're looking for guidance on the Microsoft Azure cloud. It is not a comprehensive introduction to cloud computing, and you'd definitely need other resources to get the full picture. At such a hefty sticker price, my advice is to pick this one up in the bargain bin.

A History lesson for Cloud Detractors


We've all seen cloud computing discussed ad nauseam on blogs, on Twitter, Quora, Stack Exchange, your mom's Facebook page… you get the idea. The tech bloggers and performance experts often chime in with graphs and statistics showing clearly that, dollar for dollar, cloud-hosted virtual servers can't compete with physical servers on performance. So why is everyone pushing them? It's just foolhardy, they say.

On the other side, management and their bean counters simply roll their eyes, saying this is why the tech guys aren't running the business.

Seriously, why the disconnect? Open source has always involved a lot of bushwhacking…

Continue reading A History lesson for Cloud Detractors

$1000 per hour Servers, Anyone?

Amazon's spot market for computing power is set up as an open market for surplus servers. The price is dynamic and depends on demand, so when demand is low you can get computing instances for rock-bottom prices. When you bid, you normally set a range of prices you're willing to pay; if the going rate rises over your top end, your instances get killed and re-provisioned for someone else. Obviously this wouldn't work for all applications, like a website that has to be up all the time, but for raw computing power, say to run some huge hedge fund analytics, it might fit perfectly.

Continue reading $1000 per hour Servers, Anyone?
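As a rough illustration of the bidding mechanics described above, a spot request with the newer AWS CLI looks something like this; the price, AMI ID, and instance type are hypothetical placeholders:

$ aws ec2 request-spot-instances --spot-price "0.50" --instance-count 4 \
    --launch-specification '{"ImageId":"ami-12345678","InstanceType":"m1.xlarge"}'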

Service Monitoring – What is it and why is it important?

Data centers are complex beasts, and no amount of operator monitoring by itself can keep track of everything.  That’s why automated monitoring is so important.

So what should you monitor?  You can divide your monitoring into a couple of strategic areas.  Just as with metrics collection, there is business & application-level monitoring, and then there is lower-level systems monitoring, which is equally important.

Business & Application Monitoring

  • If a user is getting an error page or cannot connect
  • If an e-commerce  transaction is failing
  • General service outages
  • If a business goal is met – or not
  • Page timeouts or slowness

Systems Level Monitoring

  • Backup completion and success
  • Error logs from database, webserver & other major services like email
  • Database replication is running
  • Webserver timeouts
  • Database timeouts
  • Replication failures – via error logs & checksum checks
  • Memory, CPU, Disk I/O, Server load average
  • Network latency
  • Network security

Tools that can perform this type of monitoring include Nagios, among others.
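
For a flavor of what such a check looks like, here's a minimal sketch of a replication health check that could be wired into Nagios as a plugin. The monitoring credentials are assumptions for illustration:

#!/bin/sh
# minimal replication check - exits 0 (OK) or 2 (CRITICAL), per Nagios convention
# counts the two replication threads reporting "Yes" in show slave status
RUNNING=`mysql -u monitor -psecret -e 'show slave status\G' | egrep -c 'Slave_(IO|SQL)_Running: Yes'`
if [ "$RUNNING" -eq 2 ]; then
    echo "OK - replication threads running"
    exit 0
else
    echo "CRITICAL - replication stopped"
    exit 2
fi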

Quora discussion – Web Operations Monitoring

Migrating to the Cloud – Why and why not?

A lot of technical forums and discussions have highlighted the limitations of EC2, noting how it loses on performance when compared to physical servers of equal cost.  They argue that you can get much more hardware and bigger iron for the same money, so turning to the cloud seems foolhardy.  Why the mad rush to the cloud, then?  Of course, if all you're looking at is performance, it might seem odd indeed.  But another way of looking at it is this: if performance is not as good, then performance is clearly not the driving factor behind cloud adoption.

CIOs and CTOs are often asking questions more along the lines of, "Can we deploy in the cloud and live with the performance limitations, and if so, how do we get there?"

Another question, “Is it a good idea to deploy your database in the cloud?”  It depends!  Let’s take a look at some of the strengths and weaknesses, then you decide.

8 big strengths of the cloud

  1. Flexibility in disaster recovery – it becomes a script, no need to buy additional hardware
  2. Easier roll out of patches and upgrades
  3. Reduced operational headache – scripting and automation becomes central
  4. Uniquely suited to seasonal traffic patterns – keep online only the capacity you’re using
  5. Low initial investment
  6. Auto-scaling – set thresholds and deploy new capacity automatically
  7. Easy compromise response – take the server offline and spin up a new one
  8. Easy setup of dev, qa & test environments

Some challenges with deploying in the cloud

  1. Big cultural shift in how operations is done
  2. Lower SLAs and less reliable virtual servers – mitigate with automation
  3. No perimeter security – new model for managing & locking down servers
  4. Where is my data?  — concerns over compliance and privacy
  5. Variable disk performance – can be problematic for MySQL databases
  6. New procurement process can be a hurdle

Many of these challenges can be mitigated.  The promise of infrastructure deployed in the cloud is huge, so digging in with gradual adoption is perhaps the best option for many firms.  Mitigate the weaknesses of the cloud by:

  • Using encrypted filesystems and backups where necessary
  • Keeping offsite backups in-house or at an alternate cloud provider
  • Caching at every layer of your application stack to offset variable EBS performance
  • Employing configuration management & automation tools such as Puppet & Chef
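
For example, the first two bullets can be as simple as the pipeline below; the bucket name, passphrase file, and credentials are assumptions for illustration:

$ mysqldump -A | gzip | openssl enc -aes-256-cbc -pass file:/root/.backup_pass \
    | aws s3 cp - s3://my-backup-bucket/full_dump.sql.gz.enc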

Quora discussion – Why or why not to migrate to the cloud?

Root Cause Analysis – What is it and why is it important?

Root Cause Analysis is the means of identifying the ultimate source and cause of an outage.  When an outage causes serious downtime of a website, organizations are typically in crisis mode.  The urgency of resolution sometimes pushes aside due process, change management and general caution.  Root Cause Analysis attempts, as much as possible, to preserve logfiles, configurations, and the current state of systems for later analysis.

With traditional physical servers, outages can be caused by hardware failure, operator error, or a security breach.  Since you're dealing with one physical machine, resolving the issue necessarily means disturbing the very components that broke.  So caution and later analysis must be balanced against immediate problem resolution.

Another silver lining of cloud-hosted solutions is around root cause analysis.  If a server is breached, for example, it can immediately be shut down, while its current state is preserved as a disk or EBS snapshot.  A new server can then be fired up from an AMI image, rebuilt from scripts or a template, and you're back up and running.  Save the snapshot for later analysis.

This approach can be used for analysis of operator-error outages as well.  Hardware failures are more expected and common in cloud-hosted environments, so this should, and really must, push adoption of best practices around infrastructure, that is, having scripts at hand that rebuild everything from bare metal.
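
A rough sketch of that snapshot-and-rebuild sequence with the AWS command line; the instance, volume, and AMI IDs are hypothetical placeholders:

$ aws ec2 stop-instances --instance-ids i-0abc1234          # take the breached box out of service
$ aws ec2 create-snapshot --volume-id vol-0def5678 --description "post-incident forensics"
$ aws ec2 run-instances --image-id ami-12345678 --count 1   # rebuild from your known-good AMI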

More discussion of root cause analysis by Sean Hull on Quora.

Capacity Planning – What is it and why is it important?

Look at your website's current traffic patterns, pageviews or visits per day, and compare that to your server infrastructure. In a nutshell, your current capacity measures the ceiling your traffic could grow to while still being supported by your current servers. Think of it as the horsepower of your application stack – load balancer, caching server, webserver and database.

Capacity planning seeks to estimate when you will reach capacity with your current infrastructure by doing load testing and stress testing. With traditional servers, you estimate how many months you will be comfortable with currently provisioned servers, and plan to bring new ones online and into rotation before you reach that traffic ceiling.
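For a flavor of basic load testing, Apache's bundled ab utility can hammer a single URL and report requests per second; the hostname and numbers here are placeholders:

$ ab -n 10000 -c 100 http://www.example.com/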

Your reaction to capacity and seasonal traffic variations becomes much more nimble with cloud computing solutions, as you can script server spinups to match capacity and growth needs. In fact you can implement auto-scaling as well, setting rules and thresholds to bring additional capacity online – or offline – automatically as traffic dictates.

In order to do proper capacity planning, you need good data. Pageviews and visits per day can come from your analytics package, but you'll also need more complex metrics on what your servers are doing over time. Packages like Cacti, Munin, Ganglia, OpenNMS or Zenoss can provide very useful data collection with very little overhead on the server. With these in place, you can view load average, memory & disk usage, and database or webserver threads, and correlate all that data back to your application. What's more, with time-based data and graphs, you can compare changes against application change management and deployment data, to determine how new code rollouts affect capacity requirements.

Sean Hull asks about Capacity Planning on Quora.