Real Disaster Recovery Lessons from Sandy

Also find Sean Hull’s ramblings on twitter @hullsean.

I just spent the last 24 hours in Lower Manhattan while Hurricane Sandy rolled through, and it offered some first-hand lessons on disaster recovery. Watching city and state officials, Con Edison, first responders and hospitals deal with the disaster brought some salient insights.

1. What are your essentials?

Planning for disaster isn’t easy. Thinking about essentials is a good first question. For a real-life disaster scenario it might mean food, water, heat and power. What about backup power? Are your foods non-perishable? Do you have a hands-free flashlight or lamp? Have you thought about communication & coordination with your loved ones? Do you have an alternate cellular provider if your main one goes out?

With business continuity, coordinating between business units, operations teams, and datacenter admins is crucial. Running through essential services, understanding how they interoperate, who needs to be involved in what decisions and so forth is key.

Here’s a real-world story where we lost a database, what caused it and how we recovered.

2. What can you turn off?

While power is being restored, or some redundant services are offline, how can you still provide limited or degraded service? In the case of Sandy, can we move people to unaffected areas? Can we reroute power to population centers? Can we provide cellular service even while regular power is out?

[quote]Hurricane Sandy has brought devastation to the East Coast. But strong coordinated efforts between NYC, State & Federal agencies have reduced the impact dramatically. We can learn a lot about disaster recovery in web operations from their model.
[/quote]

For web applications and datacenters, this can mean applications built with feature flags, which we’ve mentioned before on this blog.

Also very important: architect your application to have a browse-only mode. This allows you to serve customers from multiple webservers in various zones or regions, using lots of read replicas or read-only MySQL slave databases. It’s easy to build lots of read-only copies of your data while there are no changes or transactions taking place.
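Here’s a minimal sketch of what a browse-only switch could look like. The `BrowseOnlyStore` class and the in-memory dict standing in for the database are hypothetical, made up for illustration and not taken from any real framework. Reads keep working while writes are refused:

```python
class ReadOnlyModeError(Exception):
    """Raised when a write is attempted while the site is browse-only."""


class BrowseOnlyStore:
    """Hypothetical data-access wrapper: reads always work, writes obey a switch."""

    def __init__(self, data=None):
        self._data = dict(data or {})
        self.browse_only = False  # flip to True during an outage or maintenance

    def read(self, key):
        # In a real deployment this could be served by any read replica.
        return self._data.get(key)

    def write(self, key, value):
        # Writes need the master, so refuse them while in browse-only mode.
        if self.browse_only:
            raise ReadOnlyModeError("Site is in browse-only mode; try again later.")
        self._data[key] = value


store = BrowseOnlyStore({"listing:1": "Loft in SoHo"})
store.browse_only = True

print(store.read("listing:1"))  # browsing still works
try:
    store.write("booking:1", "listing:1")
except ReadOnlyModeError as err:
    print(err)  # only the write path shows an error
```

The point of the design is that the switch lives in one place, so operations can flip it without a code deploy.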

More redundancy equals more uptime.

Like this topic? Grab our newsletter

3. Did we test the plan?

A disaster is never predictable, but watching the city’s emergency services was a lesson in very good response. They outlined mandatory evacuation zones, where flooding was expected to be worst.

In a datacenter, fire drills can make a big difference. Running through them gives you a sense of the time it takes to restore service, what type of hurdles you’ll face, and a checklist to summarize things. In real life, expect things to take longer than you planned.

Probably the hardest part of testing is to devise scenarios. What happens if this server dies? What happens if this service fails? Be conservative with your estimates, to provide more time as things tend to unravel in an actual disaster.

Here are 5 ways to avoid EC2 outages.

4. Redundancy

In a disaster, redundancy is everything. Since you don’t know what the future will hold, better to be prepared. Have more water than you think you’ll need. Have additional power sources, bathrooms, or a plan B for shelter if you become flooded.

With Amazon’s recent outage, quite a number of internet firms failed. In our view AirBNB, FourSquare and Reddit Didn’t Have to Fail. Spreading your virtual components and services across zones and regions would help. Going further and spreading them across multiple cloud providers (not just Amazon Web Services, but also Joyent, Rackspace or other third-party providers) would give you further insurance against a failure at any single provider.

Redundancy also means providing multiple paths through the system. From load balancers, to webservers and database servers, object caches and search servers, do you have any single points of failure? Single network path? Single place where some piece of data resides?
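One rough way to start answering those questions is to walk an inventory of your tiers and flag any with fewer than two nodes. This is only a sketch: the inventory dict and host names below are invented for illustration, and a real audit would also cover network paths and where data lives.

```python
def single_points_of_failure(inventory):
    """Return the tiers that have fewer than two nodes, sorted by name."""
    return sorted(tier for tier, nodes in inventory.items() if len(nodes) < 2)


# Hypothetical inventory: tier name -> hosts serving that tier.
inventory = {
    "load_balancer": ["lb1"],
    "webserver": ["web1", "web2", "web3"],
    "database": ["db-master", "db-replica"],
    "object_cache": ["cache1", "cache2"],
    "search": ["search1"],
}

print(single_points_of_failure(inventory))
```

In this made-up inventory the load balancer and search tiers are the ones that would get flagged.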

5. Remember the big picture

While chaos is swirling, and everyone is screaming, it’s important that everyone keep sight of the big picture. Having a central authority projecting a sense of calm and togetherness doesn’t hurt. It’s also important that multiple departments, agencies, or parts of the organization continue to coordinate towards a common goal. This coordinated effort could be seen clearly during Sandy, while Federal, State and City authorities worked together.

In the datacenter, it’s easy to obsess over details and lose sight of the big picture. Technical solutions and decisions need to be aligned with ultimate business needs. This also goes for business units. If a decision is unilaterally made that publishing cannot be offline for even five minutes, such a tight constraint might cause errors and lead to larger outages.

Coordinate, and make sure everyone keeps sight of the big picture – keeping the business running.

Speaking of the big picture, here’s Why generalists are better at scaling the web.


Cloud Operations Interview

What does a cloud computing expert need to know? How do you hire a cloud computing expert? Competition for operations & DBAs is fierce, so you’ll want to know how to find the best.

If you’re a systems administrator or ops guy, you may want to prepare for an interview for such a position. Meanwhile, if you’re a director of IT or operations, a recruiter or manager in HR, you’ll want to have some idea how to find the right candidate.

Here’s my guide to do just that. You may also jump to part two Cloud Deployment Interview or the last part three Cloud DBA, Architecture and Management Interview.

1. Solid unix systems administrator

At the top of the list, a cloud operations expert needs to understand Unix and more importantly Linux. Here are some sample questions to get the conversation moving:

o What is web operations and what have you done day-to-day?

Prepare some stories.

o What’s your favorite feature of the Linux kernel?

This is an open ended question, but a systems administrator should have some knowledge here. The kernel is the most basic piece of software that runs when a computer boots up, whether it is a desktop or a server. This piece of software coordinates everything, manages resources, and directs traffic.

o Name some distributions of Linux. What is a distro?

Linux is built by a collaborative team of thousands on the internet. That’s what makes it open source. A distribution includes the operating system, along with a collection of software to go along with it. All the supporting utilities, libraries and servers must be compiled and held in a repository. That’s what makes up a distribution. Debian, Red Hat and Ubuntu are a few popular ones.

[quote]
A cloud operations expert needs to have a wide ranging skillset, from unix administration, architecture, scalability, database & webserver administration, troubleshooting & performance, load & stress testing. You’ll also want someone who has learned hard lessons from some failures, has some war stories to tell and has a hard nose for stability.
[/quote]

o What’s the difference between apache and nginx?

These two pieces of software are both webservers, that is they respond to the HTTP protocol, and can serve HTML pages. They also have a myriad of plugins to support different languages and features. The difference? Nginx (pronounced engine-X) is a newer incarnation. It’s been rearchitected from the ground up, building on all the things learned from Apache over the years. Its code is tighter and more efficient, and it’s easier to configure.

You might also enjoy our Intro to EC2 Cloud Deployments Guide.

o What is a key value store? examples?

There are lots of examples of these types of databases. They are a very simple memory cache that can interface with most applications. Memcache is a popular example of a key value store. Redis, CouchDB and Voldemort can also do this.
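To make the idea concrete, here’s a toy sketch of a key value store used as a cache in front of a database. The `KeyValueStore` class, `load_profile` and `slow_db_lookup` are all hypothetical names for illustration. A real store such as Memcache runs as a separate network service, but the get/set pattern is the same:

```python
class KeyValueStore:
    """Toy in-memory key value store; real ones run as a network service."""

    def __init__(self):
        self._items = {}

    def set(self, key, value):
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        self._items.pop(key, None)


def load_profile(cache, user_id, fetch_from_db):
    """Try the cache first, and fall back to the database on a miss."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = fetch_from_db(user_id)
        cache.set(key, profile)
    return profile


calls = []  # records every trip to the (pretend) database


def slow_db_lookup(user_id):
    calls.append(user_id)
    return {"id": user_id, "name": "example"}


cache = KeyValueStore()
load_profile(cache, 42, slow_db_lookup)
load_profile(cache, 42, slow_db_lookup)
print(len(calls))  # the second lookup is served from the cache
```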

o What is a page cache? Reverse proxy cache? examples?

These are all the same thing. They are basically a very minimal webserver without all the plugins or bells and whistles. You put one of these in front of your webserver to handle all the easy stuff, and speed up overall throughput. Varnish is a popular example.

o What filesystem do you prefer?

This is a bit arcane, but one should have some opinions here. xfs is a popular filesystem, though ext3 and ext4 are also common. Emphasize the journaling aspect here. Journaling means that if you pull the cord or your server crashes, the filesystem can recover upon reboot. It does this by journaling changes, much as a database keeps a redo log of recent changes to database tables.

o Command line tools

There are lots of commands in the day-to-day toolbox of a web ops expert. Here are some examples:
rsync (pronounced our-sync) – sync files between servers & do checksums to allow easy restarts
scp (pronounced s-c-p) – secure copy, similar to rsync but no checksums, so less reliable
curl (pronounced kurl) – diagnose & test urls and HTTP from the command line
cron (pronounced cron) – run commands at scheduled times
ssh (pronounced s-s-h) – secure shell, the most basic tool to reach a cloud server
ifconfig (pronounced if-config) – check the network interfaces on the server
vi/emacs (pronounced v-i and e-macks) – terminal editors, to modify config files
uptime (pronounced up-time) – display the current load average of the server
top (pronounced top) – interactive display of system metrics like memory, load, swap & processes
ps (pronounced p-s) – shows running processes on the server
/var/log/messages – essential system logfile

o What are application servers? How are they different from webservers?

Tomcat & Glassfish are two examples of application servers. These handle heavier weight languages & applications like Java. An application server is, on some level, just a more heavy-duty webserver, and these days Apache can be thought of as an application server also.


2. Cloud concepts

o What is virtualization? What is a hypervisor?

Virtualization allows you to run one or more computers within a computer. You can do virtualization on a desktop, sharing network, memory, cpu and disk resources among a number of virtual servers. But more importantly in cloud computing or IaaS offerings you can do virtualization at the datacenter level. The hypervisor layer is a datacenter virtualization technology that provisions server resources, and balances shared network and disk resources.

o What is an image?

In the Amazon world, the AMI or Amazon Machine Image is a snapshot of a server’s state at one moment in time. This image is taken at the block level, and includes the master boot record, the first block on disk that a server boots from. The entire state of a server, when it is shut down, is what is stored on disk or in this image: all config files, logfiles, and anything else written to disk.

o What is multi-tenant?

This means that there are multiple servers sharing resources. The tenants are the customers who each want to get the server, cpu, memory, network and disk that they paid for.

o What is the downside to shared resources?

Contention for resources is always the challenge. If your fellow tenants are not very thirsty, this can work to your advantage. But if they’re also heavy users, the hypervisor layer has to manage the balancing act. You may get a spike of disk I/O at one point, but later get a dearth. This can cause a relational database like MySQL or Oracle to suddenly look stalled.

o What is instance-store? What is ebs?

Instance store servers were Amazon’s original offering, where servers had their own local (and slow) storage. This storage was ephemeral, so all machine state was lost when the instance was terminated or failed. These servers also boot slowly. EBS, also known as Elastic Block Store, is a virtualized storage option, similar to NAS or NFS. You can create arbitrary chunks of storage, and attach them to servers, all from command line APIs. Cool!

o What is virtual private cloud?

With the VPC offering, Amazon drops a router into your existing datacenter. You can then provision virtual servers to your heart’s content, and they all appear to be servers in your existing datacenter. Elastically scale, within the network and security model you’re already using.

o What is a hybrid approach to cloud adoption?

Keeping your investments in hardware and datacenter is obviously an appealing option for firms that have a large existing environment. A hybrid approach with a VPC allows you to get your feet wet, but still keep essential applications on physical servers.

o What is Amazon EC2?

Elastic Compute Cloud refers to the virtual servers you spin up in Amazon Web Services.

o What is Amazon RDS, Oracle RDS, MySQL RDS?

Amazon has various relational and non-relational database offerings. RDS stands for relational database service.

RDS or roll your own – which is better? Here are some use cases to help you decide.

o What is multi-az?

Amazon’s infrastructure offering isn’t just a single datacenter with servers. The beauty of what they’ve built is that they offer a number of datacenters (called availability zones) in each of many regions such as Northern Virginia, Oregon and Singapore.

Incidentally multi-az is a key feature to how businesses can protect themselves from failure. Amazon recently had an outage, but AirBNB, Reddit & Foursquare didn’t have to fail.

o What does a CDN do? How does it work? examples?

A CDN is a content delivery network. Remember all those files that make up a webpage? Images, video, css files? Turns out serving these components from servers *closer* to your customer makes their webpages load much faster. CDNs are networks of servers that hold the content of your pages, and serve them faster.

It works by replacing content paths with a special one from your provider. A simple change in your code will allow content to dynamically load from across the web. Cool!
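As a sketch of that “simple change in your code”, here’s one way a template helper might swap local asset paths for CDN ones. The `cdn.example.com` hostname and the `asset_url` helper are assumptions for illustration; your provider will give you the real hostname.

```python
from urllib.parse import urljoin

CDN_BASE = "https://cdn.example.com/"  # hypothetical hostname from your CDN provider


def asset_url(path, use_cdn=True):
    """Return the URL a template should emit for a static asset."""
    if not use_cdn:
        return path  # fall back to serving from your own webserver
    return urljoin(CDN_BASE, path.lstrip("/"))


print(asset_url("/images/logo.png"))
print(asset_url("/images/logo.png", use_cdn=False))
```

Keeping the `use_cdn` switch around means you can fall back to your own servers if the CDN itself has trouble.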

CloudFront is Amazon’s offering coupled with S3 for file storage. Akamai is another big provider.

We’re not done yet. In part two on deployments and part three of this series, we’ll hit on other important skills a cloud ops expert should have including scripting, database administration (Our MySQL Interview Guide), scalability, performance, configuration management, metrics, monitoring, and some all-important war stories!

Here are some questions to pique your interest:

o Why does the API battle between Amazon & Eucalyptus (FOSS) matter?
o Do you use command line tools? why?
o What can go wrong with backups? how do we test them?
o Should we encrypt filesystems in the cloud? what are the risks?
o Should we use offsite backups?
o What is DRBD?
o Why is auditing important? access control?
o What is load balancing? why is it difficult with databases?
o How do you perform a benchmark? perform load testing?
o Why use a package manager? can we install from source?

Our Deploying MySQL on Amazon EC2 Guide is also related to this interview process.



AirBNB didn't have to fail

Today part of Amazon Web Services failed, taking down with it a slew of startups that all run on Amazon’s Cloud infrastructure. AirBNB was one of the biggest, but Heroku, Reddit, Minecraft, Flipboard & Coursera also went down with it. It’s not the first time. What the heck happened, and why should we care?

1. Root Cause

The AWS service allows companies like AirBNB to build web applications, and host them on servers owned and managed by Amazon. The so-called raw iron of this army of compute power sits in datacenters. Each datacenter is a zone, and there are many in each of their service regions including US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), South America (Sao Paulo), and AWS GovCloud.

Today one of those datacenters in the Northern Virginia region had a failure. What does this mean? Essentially firms like AirBNB that hosted their applications ONLY in Northern Virginia experienced outages.

As it turns out, Amazon has a service level agreement of 99.95% availability. We’ve long since said goodbye to the five nines. HA is overrated.
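To put those percentages in perspective, here’s a quick back-of-the-envelope calculation of how much downtime per year each availability level allows. It assumes a 365-day year; `allowed_downtime_hours` is just a helper name for this sketch.

```python
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_percent):
    """Hours per year a service can be down and still meet the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for sla in (99.95, 99.999):
    hours = allowed_downtime_hours(sla)
    print(f"{sla}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

That works out to roughly 4.4 hours a year at 99.95%, versus a little over five minutes a year at five nines.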

2. Use Redundancy

Although there are lots of pieces and components to a web infrastructure, two big ones are webservers and database servers. Turns out AirBNB could make both of these tiers redundant. How do we do it?

On the database side, you can use Amazon’s multi-az or alternately read-replicas. Each has different service characteristics so you’ll have to evaluate your application to figure out what will work for you.

Then there is the option to host MySQL or Percona directly on Amazon servers yourself and use replication.

[quote]Using redundant components like placing webservers and databases in multiple regions, AirBNB could avoid an Amazon outage like Monday’s that affected only Northern Virginia.[/quote]
When do I want RDS versus MySQL? Here are some use cases for RDS versus roll your own MySQL.

Now that you’re using multiple zones and regions for your database, the hard work is completed. Webservers can be hosted in different regions easily, and don’t require complicated replication to do it.

3. Have a browsing only mode

Another step AirBNB can take to be resilient is to build a browsing only mode into their application. Often we hear about this option for performing maintenance without downtime. But it’s even more valuable during a situation like this. In a real outage you don’t have control over how long it lasts or WHEN it happens. So a browsing only mode can provide real insurance.

For a site like AirBNB this would mean the entire website was up and operating. Customers could browse and view listings. Only when they went to book a room would they encounter an error. This would be a very small segment of their customers, and a much less painful PR problem.

Facebook has experienced intermittent outages of its service. People hardly notice because they’ll often only see a message when they are trying to comment on someone’s wall post, send a message or upload a photo. The site is still operating, but not allowing changes. That’s what a browsing only mode affords you.

[quote]A browsing only mode can make a big difference, keeping most of the site up even when transactions or publish are blocked.
[/quote]

Drupal, an open source CMS that powers sites like Adweek.com, TheHollywoodReporter.com, and Economist.com, uses this technology. It supports a browsing only mode out of the box. An Amazon outage like this one would only stop editors from publishing new stories temporarily. A huge win for sites that get 50 to 100 million (with an m) pageviews per month.

4. Web Applications need Feature Flags

Feature flags give you an on/off switch. Build them into heavy duty parts of your site, and you can disable those in an emergency. Host components in multiple availability zones for extra peace of mind.
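Here’s a minimal sketch of that on/off switch. The `FeatureFlags` class and the flag names are hypothetical. In practice the flags would live in a config file or database so operations can flip them without a deploy.

```python
class FeatureFlags:
    """Hypothetical in-process flag registry; unknown flags default to off."""

    def __init__(self, **flags):
        self._flags = dict(flags)

    def enabled(self, name):
        return self._flags.get(name, False)

    def enable(self, name):
        self._flags[name] = True

    def disable(self, name):
        self._flags[name] = False


def render_listing_page(flags):
    """Build the page, skipping any heavy-duty part whose flag is off."""
    parts = ["listing details"]
    if flags.enabled("recommendations"):
        parts.append("recommendations")  # the expensive part of the page
    return parts


flags = FeatureFlags(search=True, recommendations=True)
flags.disable("recommendations")  # the emergency switch
print(render_listing_page(flags))
```

With the flag off, the page still renders, just without the expensive component.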

One of our all time most popular posts, 5 Things Toxic to Scalability, included some in-depth discussion of feature flags.

5. Consider Netflix’s Simian Army

Netflix takes a very progressive approach to availability. They bake redundancy and automation right into all of their infrastructure. Then they run an app called the Chaos Monkey which essentially causes outages, randomly. If resilience from constantly falling and getting back up can’t make you stronger, I don’t know what can!
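To illustrate the idea only (this is not Netflix’s code, and the names here are made up), a toy version of the monkey picks a random instance from a pool, kills it, and then checks whether the service is still standing:

```python
import random


def unleash_monkey(instances, rng=random):
    """Remove one randomly chosen instance from the pool and return its name."""
    victim = rng.choice(sorted(instances))
    instances.remove(victim)
    return victim


def service_is_up(instances, minimum=1):
    """The service survives as long as enough instances remain."""
    return len(instances) >= minimum


pool = {"web1", "web2", "web3"}  # hypothetical webserver pool
victim = unleash_monkey(pool, random.Random(0))
print(f"terminated {victim}; service up: {service_is_up(pool)}")
```

If the check ever comes back false, you’ve found a weakness on your own schedule rather than during a real outage.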

Take a look at the Netflix blog for details on intentional load & stress testing.

6. Use multiple cloud providers

If all of the above isn’t enough for you, taking it further you’d do as George Reese of enstratus recommends and use multiple cloud providers. Not being beholden to one company could help in more situations than just these types of service disruptions too.

Basic EC2 Best Practices mean building redundancy into your infrastructure. Multiple cloud providers simply take that one step further.


Going Solo for Fun or Profit?

Sara Horowitz has serious chops. Independent herself, she started Freelancer’s Union way back in 1995. Back then it was tougher as a freelancer. Through her great efforts, we’ve all benefited.

So when I saw she’d published a guide called “Freelancer’s Bible”, I was quick to grab a copy. And the book doesn’t disappoint. I wish this book had existed when I got my start way back when I first moved to NYC in 1996.

A budding freelancer

As a budding freelancer, you’ve got a ton of new skills to pick up. Where to start? Flip to Part 1 and you’ll get a quick hitchhiker’s guide, with advice on everything from setting up your office and organizing your time to pitching to prospects, networking, and building a portfolio of different types of clients to keep your workflow steady. You also learn how to package your services, and the myriad ways to set fees, from hourly to project-based and day rates to packages.

[quote]Network & market yourself, package & price services, communicate well and manage timelines, deliver, bill and finally get paid. Each step is outlined here in easy to read bullets, and helpful “Ask Sara” sections. Easy layout, and a pleasure to read.[/quote]

As I was reading these early chapters, I thought it would be nice to have a chapter on social media. Turns out I spoke too soon: as I flipped through the pages, I found that chapter 9 is all about marketing and social media, and online tools to build your reputation and influence. I also like that she takes a practical approach. For example, Sara emphasizes that you let go of strategy, and experiment with different options and methods. This is exactly what I’ve done over the years, and it’s the best way to find out what works for your personal style, as well as your industry. Trial and error!

I’ve written a guide on this topic myself. Take a look at my three part Guide to independent consulting 101.

Advice for Seasoned & Growing Operations

I’ve been working as a freelancer for 17 years now, and I’ve certainly learned a lot. So when I flip through the book, it confirms many of those lessons. But I also found material that I could use. For example, in chapter 7, Troubleshooting, she has examples of situations where you and your client are out of sync, and offers “triple-a communication” solutions to those problems. This is the type of advice you’ll definitely need, as these scenarios are inevitable in freelance work. I also found chapter 10, Ways to Grow, very helpful. Her list of “How do you know it’s time to grow” outlines some surprising and helpful thoughts on what to do if you have too much work. I’ve started dabbling with subcontracting and hiring additional help, so I’m finding these chapters very helpful.

Criticisms

There are a few things that I’d differ with Sara slightly on. Here are my thoughts.

Avoiding Contracts and Lawyers

I don’t get too heavy with lawyers and contracts. I know, I know, people say this is crazy, but over the years my method has served me well. It starts with a simple premise – I never intend to go to court. What do I mean? It costs too much in real dollars and time spent, but most of all in stress. If you’ve ever been on jury duty you know what I mean.

With that, you pull the perceived safety net completely out from under yourself. So I am careful and cautious as a result. My *contracts* are simple emails, in which I outline what I’ll do, what the client will do, and who will do what when. I do all this in plain language, without any lawyer-ese. What I do get though is a confirmed *yes* in an email. This email thread is above and beyond verbal conversations and phone calls. It allows clarification down the line if you and the client have differences.

I also insist on a deposit of some kind. It doesn’t have to be a lot, but it is a hoop that you ask your client to jump through. This is very important. So-called dead beat clients will fail this test. If they are very very hesitant to provide a deposit, they are either uncomfortable with you, or are short on budget. Either case should be a red flag. Managing this relationship is very very important, when you plan never to rely on legal recourse for differences.

Who’s Played? Get paid!

On page 195 she talks about the Freelancer’s Union Client Scorecard, and “outing” clients who don’t pay. I personally think this is a bad road to go down. Why? Well there are a few reasons.

1. There are two sides to every coin

When a company hires an outside resource, they don’t have control over day-to-day operations, and overseeing what the person is doing. And yes, sadly there are many levels of work quality. So there can be differences. In my experience all those differences can and should be worked out. Communication is key and I think if you follow all of Sara’s advice on triple-A communication, you’ll avoid these situations. I do feel though that these ***

2. You can ask for an insurance deposit

Asking for a deposit from a new prospect is an important step. Without a past history of paying, and paying timely, this is a hoop you’re asking them to jump through. It proves that the budget exists, it proves that the team or director that hired you has communicated that to AP, and simply that you’re in the system. In my experience after the first check, things tend to go smoothly. If you’re experiencing trouble with this step, ask yourself – Are we on the same page? Where is the disconnect? Is the client confident you’ll deliver, and complete? Where is the hesitation?

3. It could hurt you in the end

Lastly, these types of “outing” boards might hurt you in the end. If you gain a reputation for creating bad publicity or press for one firm, others may not want to work with you. I also think they are a distraction from communicating and resolving issues, and/or finding other work.

[quote]Apply all of Sara’s advice, especially that around Triple-A Communication, and you’ll likely do very well as a solopreneur. Let’s avoid becoming part of the 44% of freelancers who’ve reportedly had trouble getting paid![/quote]

Don’t undercharge for Services

Another point I’ll underline is charging for services. There is some talk in the book of wage wars, and 44% of freelancers not getting paid. In my experience being a freelancer is more like being another corporation. Corps fight with each other all the time. They have differences, and duke it out. It’s a bit dog eat dog out there. If you’re not prepared for that, you may be in for an uphill battle. Over the years I’ve certainly had differences with clients, but I’ve never not gotten paid. I *have* however turned away work, if I got a bad feeling about the client.

I wrote a critique of John Greathouse’s Beware the Consultant that might interest readers here. Take a look at my article Beware the Client.

That said, you should be charging more than your fulltime brothers and sisters. Let’s give an example. Say your fulltime job would pay $75k/year. This theoretically is about $37.50/hr (40 hours x 50 weeks). However as a freelancer you must also pay for benefits like health insurance, retirement funds, downtime when you’re not billing, and the overhead of networking and meetings. You also have some additional taxes to pay. In my experience at minimum you should be charging roughly double this amount just to break even. If you’re not, it simply won’t make financial sense to stay freelancing. More likely you should be charging roughly 3x this base hourly amount. If you’re not, you may over time drift back towards fulltime employment.
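The arithmetic is easy to rerun with your own numbers. Here’s a small sketch: `hourly_equivalent` is just a helper name for this example, and the 2x and 3x multipliers are the rules of thumb from the paragraph above, not hard figures.

```python
def hourly_equivalent(salary, hours_per_week=40, weeks_per_year=50):
    """Convert an annual fulltime salary into its notional hourly rate."""
    return salary / (hours_per_week * weeks_per_year)


base = hourly_equivalent(75_000)
print(f"fulltime equivalent: ${base:.2f}/hr")
print(f"break-even (2x):     ${2 * base:.2f}/hr")
print(f"target (3x):         ${3 * base:.2f}/hr")
```

For a $75k salary that comes to $37.50/hr as the base, $75/hr to break even and $112.50/hr as the target.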

I wrote another article on this topic Why do people leave consulting.

All of this should be part of educating the client. It’s often forgotten when firms look at outsourcing to get projects completed. So you should explain all of these costs clearly, and compare yourself to larger firms and agencies. These folks tend to be a *LOT* more expensive than a solopreneur.

All together now…

Sara’s bible is one every freelancer should have a copy of. It is the most complete book for a solo operator I’ve seen. Besides a few criticisms I have, it is a superb book and sure to be a reference you’ll turn to again and again.


Why do people leave consulting?


As a long time freelancer, it’s a question that’s intrigued me for some time. I do have some theories…

First, definitions… I’m not talking about working for a large consulting firm. Although this role may be called “consultant”, my meaning is consultant as sole proprietor, entrepreneur, gun for hire or lone wolf.

1. Make more money in a fulltime role

I’ve met a lot of people who fall into this trap. They take a fulltime role simply because it pays better. That raises a lot of questions…

o Are you pricing right?

You could be pricing too high to get *enough* work. You may also be pricing too low to cover benefits, health insurance and so forth. Or perhaps you can’t sell to your rate. You can be smart skills-wise, but do you feel your client’s pain? Are you good at being a businessman? Consistent?

o Can you sell, and put together an appealing proposal?

o Can you execute to the client’s satisfaction?

o Can you follow up consistently while accounts payable gets tied up in knots?

o Can you follow up if your client executes past their spend?

Running a business is complicated, and a lot of expenses can be hard to juggle. You will find times when a client may have spent a little faster than their revenue, and have trouble finding money when the invoice arrives. Follow-up, patience and persistence are key.

Read: Why high availability is so very hard to deliver

Want more? We wrote an in depth 3 part guide to consulting.

2. Make a consistent paycheck in a fulltime position

o Are you networking enough?

If you take a longterm gig and get comfortable, your pipeline can dry up. And your pipeline is the key to your longterm strength, and regular business. You must get out there, and let people know about you, your services, and your availability.

If you don’t network regularly, post across the web, engage on social media channels, blog regularly and so forth, you’ll likely just land a series of 6-12 month fulltimeish gigs through recruiters or headshops.

Related: 5 ways to evaluate independent consultants

[quote]Being a freelancer or entrepreneur involves wearing many hats. Finding business involves networking & marketing. Delivering to their needs involves emotional intelligence. And actually getting paid on time is a whole artform in itself. Leave a good taste in their mouth and your reputation will spread quickly by word of mouth.[/quote]

o Do you really *LIKE* being an entrepreneur?

Are you consistent? Consulting is like running a marathon, if you burn out you may give up!

Have a large web property or application which is experiencing some growing pains? Take a look at how we do performance reviews. It may be just what you’re looking for.

Related: MySQL interview guide for managers and candidates alike

3. Do you like the lifestyle of larger corporate environments?

o Fulltime roles allow for much more jedi sword play. Maneuvering up the ranks involves relationship building as much as consulting, but with a more well defined ladder to climb.

o Sometimes you’ll find pass the buck and pointing fingers quite common.

o There are roles involving managing people and processes. These less often lend themselves to short term or situational consulting arrangements. If you lean towards those roles, a fulltime position may suit you better.

Trying to hire top tech talent? Here’s our MySQL DBA hiring guide & interview questions

[quote]Working as a sole proprietor for a couple of decades has taught me to be very entrepreneurial. It is every bit about building a real-world startup[/quote]

4. Want to do more cutting edge & at the keyboard work

Consulting can and often does allow you to bump into the latest technologies, and get your feet wet with what cutting edge firms are doing. However in a fulltime role you can more completely immerse yourself in the technology, and those long term solutions.

Also: Why devops talent is in short supply

o You can take part in R&D – Google’s 20% projects, for example

o You can build hypothetical projects

o You can work in more idealistic environments, operations and even lectures & training

Though you can certainly do all of this as a freelancer, you have to build enough capital, and so forth to make it work.

Juggling job roles as a consultant isn’t easy. What a CTO must never do.

5. Don’t like running a small business

Consulting as a sole proprietor and staying in business for almost twenty years, I’ve learned that it is every bit about running a small business or startup.

A. Acquiring customers, networking, marketing
B. Understanding their needs and delivering to improve their position
C. Pricing in a way your customers understand
D. Offering value to your customers, at a competitive price
E. Managing relationships so your brand or reputation precedes you
F. Making sure payments and invoicing aren’t a hurdle, and following up
G. Pacing yourself like a marathon runner – keep doing what you’re doing right

Read this far? Get our scalable startups monthly newsletter. We cover these topics in detail, year in and year out.

Thoughts on Upcoming MySQL 5.6 Defaults

During Oracle Open World 2012 and the parallel MySQL Connect conference, the new 5.6 version was announced. It’s only a release candidate right now, but that means the GA release is just around the corner.

With that, James Day has posted the changes to a number of parameter defaults. Many of them you may not run into on day-to-day production systems. However, there are a few which I do see a lot.

Welcome changes to defaults

innodb_file_per_table = 1 (formerly 0 or off)

This is a parameter which really needs to be set on most systems. It tells InnoDB not to use just one large tablespace for all tables, but rather to create an individual tablespace for each InnoDB table. If you come from the Oracle world you might wonder why this is necessary, but due to the evolution of InnoDB itself, having lots of individual tablespaces helps a lot with concurrency. And that’s something you want for scalability of those web applications.

What made the default really troublesome before is that if you forgot to set it, all your existing tables would be created in the single large tablespace. That would leave you only one option: set the parameter, and then REBUILD all those previously created tables. The syntax to do the rebuild wasn’t difficult, but the load it would generate on your systems surely was.
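
As a rough sketch of that rebuild step, the snippet below generates one ALTER TABLE ... ENGINE=InnoDB statement per table, which rebuilds the table and, with innodb_file_per_table on, moves it into its own tablespace. The schema and table names are hypothetical, and the helper names are mine, not part of MySQL. In practice you would feed in your own table list and run the statements one at a time during a quiet period.

```python
def quote_ident(name):
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def rebuild_statements(tables):
    """Return one rebuild statement per (schema, table) pair."""
    return [
        "ALTER TABLE {}.{} ENGINE=InnoDB;".format(quote_ident(schema), quote_ident(table))
        for schema, table in tables
    ]


# Hypothetical tables that were created before the parameter was set.
tables = [("shop", "orders"), ("shop", "customers")]
for statement in rebuild_statements(tables):
    print(statement)
```

Generating the statements first, rather than running them blindly, lets you review the list and spread the load over several maintenance windows.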

innodb_log_file_size = 48m (formerly 5m)

Here’s another parameter that didn’t have a great default previously. The more transactional activity you have in your database, the quicker these files fill up. As one fills up the database must switch to the next. You want to keep this switching to a minimum just like switching redologs in Oracle. By resizing these files larger, you make your database faster.

With the previous default of 5m, one had to jump through a number of hoops to remedy the default. First you would stop the database, then delete or move the files, then change the parameter in the my.cnf file and start MySQL again. If you started just by changing the value in my.cnf, you’d get various errors and the database wouldn’t start. All in all, a larger default avoids all these problems. Bravo!
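
The startup errors come from the on-disk log files no longer matching the configured size. A minimal sketch of a pre-flight check, assuming the default file names ib_logfile0, ib_logfile1 and so on, and a data directory path that you supply yourself (the function name is hypothetical):

```python
import glob
import os


def mismatched_log_files(datadir, expected_bytes):
    """Return the InnoDB log files in datadir whose size differs from expected_bytes."""
    paths = sorted(glob.glob(os.path.join(datadir, "ib_logfile*")))
    return [path for path in paths if os.path.getsize(path) != expected_bytes]


# Example: check for a 48m setting before restarting (the path is an assumption).
# stale = mismatched_log_files("/var/lib/mysql", 48 * 1024 * 1024)
# if stale:
#     print("Move these aside after a clean shutdown:", stale)
```

If the list comes back non-empty, the old files still need to be moved out of the way after a clean shutdown, as described above.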

Possibly unwelcome changes to defaults

sync_master_info=10000 (formerly 0)
sync_relay_log=10000 (formerly 0)
sync_relay_log_info=10000 (formerly 0)

I first discovered these three parameters reading the most recent printing of High Performance MySQL by Baron Schwartz, Peter Zaitsev & Vadim Tkachenko. I highly recommend the book if you haven’t picked up a copy. It is an in-depth technical tome of MySQL information, a real tour-de-force that every database administrator should own.

I mention these parameters elsewhere in my article on bulletproofing MySQL replication with integrity checking.

The master.info and relay-log.info files on MySQL slaves are not by themselves crash safe. So if your slave crashes, it can fail to update these files with the server’s current position. So upon restart the slave can fail.

I decided to put those parameters into use on a production slave server. I set each of them to a value of 1. This tells the server to sync after every 1 event or 1 transaction respectively. It didn’t occur to me to do some performance testing with the parameters on. Mistake, big mistake.

Although the change made the slave crash safe, it also made the slave much slower. It began to lag behind the master significantly until it was showing old data. Since our Drupal setup served most of the site off of the read-only slaves, we began to see old pages.

Although these new defaults set the value to 10,000 I would not recommend setting these parameters on by default. If you do use them on production boxes I would recommend doing performance tests to make sure they don’t cause unnecessary lag to your replication setup. Perhaps the new multi-threaded replication will help somewhat with this in 5.6.
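
One simple way to watch for that kind of lag during a test is to check the Seconds_Behind_Master column from SHOW SLAVE STATUS. The sketch below assumes you have already fetched that row as a dictionary with whatever client library you use. The function name and the 30 second threshold are illustrative assumptions, not recommendations.

```python
def replication_lag_ok(slave_status, max_lag_seconds=30):
    """Return True if the slave reports a lag at or below max_lag_seconds.

    slave_status is one row of SHOW SLAVE STATUS as a dict.
    A missing or NULL Seconds_Behind_Master is treated as a failure,
    since the lag is unknown when replication is not running.
    """
    lag = slave_status.get("Seconds_Behind_Master")
    if lag is None:
        return False
    return int(lag) <= max_lag_seconds


# Hypothetical rows, as they might look before and after changing the sync settings.
print(replication_lag_ok({"Seconds_Behind_Master": 2}))
print(replication_lag_ok({"Seconds_Behind_Master": 900}))
```

Running a check like this at intervals while the slave is under realistic load, with the sync parameters on and off, gives a rough before and after comparison.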

Read this far? Grab our newsletter scalable startups!

Lessons from Locksmiths and the 99%

Just finished reading Dan Ariely’s new book The (Honest) Truth About Dishonesty. What a great title for a book on cheating & lying.

Dishonesty is pretty easy to understand, isn’t it? We know when someone is being honest or not, and we ourselves are of course never dishonest? Or so we think.

Well your preexisting ideas about honesty are about to be turned upside down.
Ariely has an amazing storytelling ability that will leave you scratching your head and saying – I hadn’t thought of that.

Conventional economic theory has it that people will be dishonest when there is low risk and it serves their economic interest. But it’s actually much, much more complex.

He tells one story of how a cab driver lies about the fare in favor of himself, while at other times in favor of the passenger! Or for example the story of how soda & some cash are left in an office refrigerator. The sodas disappear, but the money remains.

Perhaps the most interesting discussion is that of law firms & billable hours. There is of course the question of what counts as a billable hour, and where the rounding happens. But what’s more, accountability can and does become a measure of how much work gets done. So those who round down because they are more honest may be perceived to be doing the least amount of work! Conflicts of interest indeed.

[quote]Sometimes conflicts of interest cloud our judgement and steer our thinking. In professions where we make recommendations and then also provide service based on those recommendations those may be difficult to eliminate. Consumers or businesses should make every effort to find service providers with the least conflicts.[/quote]

We’re a service provider ourselves. Wondering how we work? Take a peek at our Anatomy of a Performance Review to get insight.

Interestingly, based on the soda story among others, it turns out people are less likely to be dishonest when cash is involved. So he wonders whether, as we become more of a cashless society, our moral compass may slip. Still hot on the heels of our housing financial crisis, it does make one wonder.

Perhaps my favorite anecdote was one where he told the story of locking himself out of his apartment. The locksmith picked the lock very quickly, which surprised him. The locksmith explained that doors are very easy to pick and open for a professional. You wouldn’t need locks for the 1% of people who are honest. Nor would you need them for the 1% of people who are thieves; they can pick your lock easily. Locks are for the 98% of people who are mostly honest, but might be tempted to be dishonest if the conditions are right. What he was also saying was that 99% of people are not completely, perfectly honest.

Great read and excellent food for thought.

Read this far? Grab our newsletter.