Business Agility at AWS re:Invent

Also find Sean Hull’s ramblings on twitter @hullsean.

Although I couldn’t be in Vegas to attend re:Invent, there is so much online it’s almost better than being at the conference, from an ongoing live stream of keynotes and sessions to an archived collection on YouTube.

The big wins

You may have heard of all the great things that Amazon or cloud computing can do, but I thought Andy Jassy summarized these nicely in these six points.

1. Replace capex with opex
2. Lower total cost of ownership
3. No guessing about capacity
4. Encourage agility & innovation
5. Differentiation
6. Global from the start

Redshift

By far the biggest announcement at the show is Amazon’s new Redshift product. It is a fully managed data warehouse solution that scales to petabytes in its cloud. Currently two business intelligence tools are supported, namely Jaspersoft and MicroStrategy.

[quote]
In 2003 Amazon was a 5 billion dollar company. Today AWS adds that much infrastructure capacity to its availability zones every day!
[/quote]

Reduced prices by 25% for S3

As a lot of folks know, Amazon has always been about cheaper prices. That model has been disruptive in the book selling industry, and in a huge way in the infrastructure and datacenter industry. As more customers sign up, economies of scale mean they can offer the same hardware & services for lower prices.

With that they’re announcing lower prices for S3 by a whopping 25%. To me this speaks to their continuing push to dominate the market by driving prices downward.

Amazon’s Channel on YouTube

If you weren’t able to attend the conference, or want to recap some highlights you might have missed, they have put up a great AWS Channel on YouTube.

Some of the speakers include Sharon Chiarella, VP, Mechanical Turk; Glenn Hazard, CEO, Xceedium; Todd Barr, CMO of Alfresco; Bright Fulton, Operations for Swipely; Colin Percival, FreeBSD Developer; Ted Dunning, Chief Application Architect of MapR Technologies; James Broberg, CTO & Founder of MetaCDN; Mitchell Garnaat, Sr. Engineer; David Etue, Vice President, SafeNet; and Mike Culver, Sr. Consultant, to name just a few.

Read this far? Grab our Scalable Startups for more tips and special content.

Hacking Job Search – Three Meaty Ideas

Also find the author on twitter @hullsean.

Demand for talented engineers has never been higher. It is in fact the dirty little secret of the startup industry, that there are simply not enough qualified folks to fill the positions.

What this means for you is that you have a lot of options. What it means for a hiring manager is that you will have to work even harder to find the right candidate. Just going to a recruiter isn’t enough. Use your network, go to meetups, follow Gary’s Guide daily.

Also check out our Mythical MySQL DBA piece where we talk about the shortage of DBAs and operations folks.

Further, if you’ve dabbled in freelance or independent consulting, I wrote an in-depth look at Why do people leave consulting. Understanding this can help you avoid it in your own career, or keep your resources from leaving for better shores.

Find us on twitter @hullsean and LinkedIn where we share content and ideas every day!

1. Build your reputation

As they say, your reputation precedes you. So start building it now. Fulltime or freelance, you want to be known.

Speaking, yes you can do it. Start with some small meetups, volunteer to speak on a topic. A ten person room is easier than 30, 50 or 100. Once you have a couple under your belt, fill out a CFP for Velocity, OSCon or some software developers conference. There are many.

Blog – if you’re not already doing so you should. Start with once a week. Comment on industry topics, controversial ideas, or engineering know-how. Prospects can look at this and learn a lot more than from a business card.

Write a book, yes you can. It may sound impossible, but the truth is that publishers are always looking for technical writers. Pick a topic near and dear to you. It’ll also give you endless material for your blog.

Go to meetups, you really need to be getting out there and networking. Get some Moo Business Cards and start working on your elevator pitch!

Social media – being active here helps your blog, and helps people find you. Twitter is a great place to do this. Interact with colleagues and startup founders, VCs and more. If you’re a hiring manager or CTO, you may find great programmers and devops this way.

We also wrote a more in-depth article Consulting and Freelance 101. It’s a three-part guide with a lot of useful nuggets.

Also take a look at our MySQL DBA Interview Guide which is as helpful to devops and DBAs as it is to managers hiring them!

[quote]
Above all else, build your network & your reputation. It will put you in front of more people as a person, not a commodity or a resume in a pile of hundreds.
[/quote]

2. Qualify prospects

You definitely don’t want to take the first offer you get, and managers don’t want to hire the first candidate that comes along. You want two or three to choose from. Best way to do this is to have options.

If you’re a candidate, network or work through your colleagues. When you do get a lead, be sure you’re speaking to an economic buyer. If you’re not you’ll need to try to find that person who actually signs the checks. They are the ones who ultimately make the decision, so you want to sell yourself to them.

Get a Deposit – I know I know, if it’s your first freelance job, you don’t want to scare them off. Or maybe you do? The only prospects that would be scared off by this are ones who may not pay down the line. Dragging their feet with a deposit can also mean bureaucratic red tape, so be patient too.

Sara Horowitz has an excellent book, The Freelancer’s Bible. We recommend you grab a copy right now!

Commodity You Are Not so don’t sell yourself as one. What do I mean? You are not an interchangeable part. You have special skills, you have personality, you have things that you’re particularly good at. These traits are what you need to focus on. The dime-a-dozen skills should sit more in the background.

You’ll also need to price and package your services. We talked about this in-depth in Consulting Essentials – Getting the Business.

We also think there is a reason Why Generalists are better at scaling the web.

3. Play the numbers game

For hiring managers this doesn’t mean working through recruiters that might be bringing subpar talent, it means networking through industry events, meetups, startup pitch and venture capital events. There are a few every single day in NYC and there’s no reason not to go to some of them.

For candidates, be eyeing a few different companies, and following up on more than one prospect. You should really think of this process as an integral and enjoyable part of your career, not a temporary in between stage. Networking doesn’t happen overnight, but from a regular process of meeting and engaging with colleagues over years and years in an industry.

At the end of the day hiring is a numbers game so you should play it as such. Keep searching, and always be watching the horizon.


No iPhones Were Harmed in the Creation of this Outage

Apple’s recent iMessage outage had some users confused. What do you mean I can’t text my favorite cat photos?? How can Apple do this to me!?!?

What happened?

Apple provides services to everyone who uses its platform. iCloud, for example, stores your contacts, calendar, photos, apps and documents in the cloud. No more syncing to iTunes to make sure all your stuff is backed up. It’s automatic in the cloud. Unless, of course, iCloud is down.

Same goes for iMessage. Apple has quietly introduced this, as a more feature rich version of text messaging. It’s great until the service isn’t available. What gives?

All these services are backed magically or not so magically by computer servers. These computers sit in datacenters, managed by operations teams, and to some degree with automation. All the things that brought down AWS & AirBNB & Reddit with it could also take out Apple. A serious storm like Sandy also presents real risks.

[quote]
iMessage is a text and SMS replacement service for iPhones & iPads. It is more feature rich, offering device synchronization, group texting & return receipt. But in a very big way it is also an attempt for Apple to muscle into the market and further extend its platform reach.
[/quote]

100% uptime ain’t easy

Even for firms that promise insanely good uptime, five nines remains very very hard to achieve in practice.
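To put numbers on those nines, here is a quick back-of-the-envelope sketch. The function name is my own, and it assumes a 365-day year with downtime spread over the whole year.

```python
# Downtime allowed per year for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: ignore leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% -> {minutes:.1f} minutes/year")
```

Five nines works out to a little over five minutes a year, which is less time than it takes most teams to notice a page and log in.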

For starters all the components behind your service, need to be redundant. Multiple load balancers, webservers, caching servers, and of course databases that hold all your business assets.

But as the repeated AWS outages attest, even redundancy here isn’t enough. You also need to use multiple cloud providers. Here you can mirror across clouds so even an outage in one won’t bring down your business.
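A rough way to see why redundancy helps, and where it stops helping: the sketch below assumes each copy fails independently of the others. That is a big assumption. Copies inside one provider often fail together, which is the whole case for mirroring across clouds.

```python
def combined_availability(availability: float, copies: int) -> float:
    """Availability of `copies` redundant components.

    Assumes failures are independent, so the service is down only
    when every copy is down at the same time.
    """
    return 1 - (1 - availability) ** copies


if __name__ == "__main__":
    # Two copies that are each up 99% of the time.
    print(f"{combined_availability(0.99, 2):.4%}")
```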

What about in the world of messaging? Well you can bet your customers don’t likely know or care about high availability, uptime, or any of these other web operations buzzwords. But they sure understand when they can’t use their service. It may give companies like Apple pause as they try to stretch themselves into areas outside their core business of iPhones, iPads, and the iOS platform itself.

iMessage – messaging standards power play

When I first upgraded to an iPhone 4S, the first thing I noticed was the light blue bubbles when texting certain people. Why was that, I wondered? I quickly found out about iMessage, which was conveniently configured to replace my old and trusty text messaging.

Texts or SMS work across all phones, smartphone or not, and Apple or not. But open standards don’t lend themselves well to market muscle and dominance. So it makes sense that Apple would be pushing into this space. I met more than one BlackBerry owner who loved using BBM to keep in touch with colleagues. It’s like your own private club. And that muscle further strengthens Apple’s platform overall. Just take a look at how the Android Ecosystem is broken if you need an example of what not to do.

The flip side is it means you have more to manage. More servers, more services, more dimensions to your business. More frequent outages that can tarnish your reputation.

[quote]
A lot of complaining and publicity, like the iMessage outage received, may just be an indication that you’re big enough for people to care.
[/quote]

Alternatives abound…

There is huge competition in the messaging space. The outage and its publicity further underline this fact.

For example on the iPhone for messaging there is ChatON, WhatsApp, LINE, Skype & WeChat just to name a few.

Interestingly, while researching this article, I downloaded WhatsApp to give it a try. Only 99 cents, why not. Turns out that they had not one, but two outages, just a week ago. Seems Apple isn’t the only one experiencing growing pains.

A lot of complaining and publicity could be a sign that you’re big enough for people to care!

Read this far? Grab our Scalable Startups monthly.

Cloud DBA and Management Interview

What does a cloud computing expert need to know? This is the last of a three part guide to interviewing for a cloud operations position. You can find them here – part one Operations Interview and part two Deployment Interview.

Here’s my guide to do just that.

1. Database administration experience

Although in some shops the DBA role is a completely separate one, there are many others where the Linux and Operations teams manage these services as well. We do have some other material: Oracle DBA Interview questions and MySQL DBA Interview Guide. Here’s a taste of what to expect.

o What is RAID? Which type is best?

RAID is a way to share a whole bunch of disks on one server. Databases like Oracle or MySQL do a lot of writing and reading from disk. If there are more disks sharing this work, it’s like you have more waiters in your restaurant. Faster service.

Although some folks still hang onto RAID 5 as an option, it’s generally a very bad one. It has a serious write penalty because of the parity calculations it must perform. Most databases do a lot of writing, even when user transactions are not doing INSERT or UPDATE. What’s more if a disk fails, RAID 5 although technically online, will be so slow as to be effectively unusable while the long slow rebuild happens.

What’s the answer then? RAID 10! It mirrors each volume, and then stripes across those mirrored sets. Fast I/O, fast recovery. Done & done.
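To make the write penalty concrete, here is a rough sketch using the commonly cited rule of thumb: a small random write costs about four disk operations on RAID 5 (read data, read parity, write data, write parity) and two on RAID 10 (one per mirror). The disk count and per-disk IOPS below are made-up example values, and real controllers with caches will behave differently.

```python
# Rule-of-thumb write penalties: disk I/Os consumed per logical write.
WRITE_PENALTY = {"raid5": 4, "raid10": 2}


def effective_write_iops(level: str, disks: int, iops_per_disk: int) -> float:
    """Estimate the random-write IOPS an array can sustain."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical array: 8 disks at 150 IOPS each.
    for level in ("raid5", "raid10"):
        print(level, effective_write_iops(level, disks=8, iops_per_disk=150))
```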

o What are the tradeoffs with more indexes versus fewer?

In all relational databases, you build indexes on data. Indexes are just like the ones you think of in the yellow pages, phonebooks of yore. An index on first name means you can look up Obama by Barack as well. An index on street addresses means you can look up the White House. So the more indexes you have, the more different ways you can search for & fetch what you want.

On the other hand the penalty here, is that whenever you add new data & records to this database, all those indexes must be updated. That’s overhead, which slows down writes.

So the tradeoff is more indexes – faster fetching, slower writing. Fewer indexes slower fetching, faster writing.
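You can watch this happen with SQLite, which ships with Python. This is only a sketch: the table, column and index names are invented for illustration, and other databases word their query plans differently.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT, street TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Barack", "Obama", "1600 Pennsylvania Ave"), ("Jane", "Doe", "12 Main St")],
)

lookup = "SELECT last_name FROM people WHERE first_name = ?"
plan_without_index = query_plan(conn, lookup, ("Barack",))

# The index gives the lookup a faster path, but every later INSERT
# now has to update the index as well as the table.
conn.execute("CREATE INDEX idx_first_name ON people (first_name)")
plan_with_index = query_plan(conn, lookup, ("Barack",))

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The first plan scans the whole table, while the second searches using idx_first_name.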

o What do NoSQL databases eliminate? How do they achieve great speed?

There are quite a few different types of NoSQL databases. So I’m generalizing quite a lot here. One thing NoSQL databases eliminate is the ability to JOIN data across different tables. By removing this great feature of relational databases, they dramatically simplify the underlying implementation. No free lunch!

What else? Many of these databases cut corners on what’s called durability. What is durability? Imagine you are taking notes in a lecture hall, or waiting tables and taking orders. It might be quicker to do so without writing things down. You keep it all in your head. Great, but what if you forget something? You have to go ask for the order again! It may be faster, but it is more prone to error. Losing data is not something to be taken lightly. NoSQL databases don’t always flush data to permanent storage.
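In code, the difference between keeping it in your head and writing it down looks roughly like this. It is a minimal sketch, not how any particular database does it, and the function and file names are made up.

```python
import os


def record_order(path: str, order: str, durable: bool = True) -> None:
    """Append one order to a log file, optionally forcing it to disk."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(order + "\n")
        if durable:
            log.flush()             # push Python's buffer to the operating system
            os.fsync(log.fileno())  # ask the OS to write it to the storage device
```

With durable=False the call returns sooner, but a crash or power cut shortly afterwards can lose the order. That is the corner being cut.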

[quote]
Whether or not a web operations candidate uses command line may seem like a small issue. But it speaks to what their DNA is, and the strength of their foundation. Strength and comfort on the command line is key.
[/quote]

o What is Amazon RDS? When should I use it?

Amazon has a managed relational database solution called RDS. It’s basically MySQL, Oracle or SQL Server, but modified so you can’t shoot yourself in the foot. Administrative tasks are simplified, but so are your configuration options.

I wrote an in-depth Amazon RDS use cases article. It mostly covers MySQL, but the general rules apply to Oracle & SQL Server. At the end of the day RDS is a lot less configurable and flexible. But if you don’t have a regular DBA on staff, it will probably simplify your administration of these servers.

o What are read-replicas? What about Multi-AZ?

Read-replicas are read-only copies of your data. Using MySQL these are fairly stock master-slave configurations. Note since they’re the standard technology, they’re still asynchronous. So yes the read-replica can lag behind.

Multi-AZ is a proprietary technology, and Amazon doesn’t disclose what’s under the hood. However it’s likely running on top of something like DRBD, which is a replicated block device. This allows the underlying disk I/O to be mirrored across the network to another availability zone. You’ll enjoy synchronous copies of your data, and no data consistency problems. Keep in mind though that the alternate server is offline or cold and can take time to come online.
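Because read-replicas can lag, an application that reads from them usually needs some rule for when a replica is too stale to trust. Here is one possible sketch. The Replica class and the threshold are hypothetical. In MySQL the lag figure would typically come from Seconds_Behind_Master in SHOW SLAVE STATUS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    lag_seconds: float  # how far this replica is behind the master


def pick_read_server(replicas: List[Replica], max_lag_seconds: float) -> Optional[Replica]:
    """Return the least-lagged replica within the limit.

    Returns None when every replica is too stale, which signals the
    caller to read from the master instead.
    """
    fresh = [r for r in replicas if r.lag_seconds <= max_lag_seconds]
    return min(fresh, key=lambda r: r.lag_seconds) if fresh else None
```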

o What is the primary bottleneck of hosting databases in the cloud? How has Amazon recently addressed this?

As I explained above disk I/O remains the largest bottleneck for relational databases, even if the entire dataset fits in memory. Why? Because sorting, joining, and rearranging data can take orders of magnitude more memory than the dataset itself, so the work spills to disk. And that’s not even talking about durability guarantees.

The cloud has traditionally lagged quite a lot behind physical servers in terms of disk I/O so some internet firms have shied away from moving to the cloud. EBS volumes were typically limited to a few hundred IOPS.

Amazon recently announced Provisioned IOPS. It’s a mouthful of a name for a very big development. It means you can provision how fast you want those virtual disks to be. For individual volumes the limit seems to be 2000 IOPS but you can also software RAID across many of those virtual disks. For Amazon RDS the limit is reportedly 10,000 IOPS. This new feature will make a huge difference for hosting large high I/O databases in Amazon’s cloud.
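As a rough sizing sketch, here is how many volumes you would need to stripe together to reach a target. It takes the 2000 IOPS per-volume figure mentioned above as the default and assumes striping scales linearly, which is an upper bound rather than a promise.

```python
import math


def volumes_needed(target_iops: int, iops_per_volume: int = 2000) -> int:
    """Number of striped volumes needed to reach a target IOPS figure."""
    return math.ceil(target_iops / iops_per_volume)


if __name__ == "__main__":
    print(volumes_needed(10000))  # volumes to stripe for 10,000 IOPS
```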

2. Architecture & Management Questions

o Why does the API battle between Amazon & Eucalyptus (FOSS) matter?

As large applications are architected to build hardware components, and resources in the cloud, the API they work through becomes key. Sticking to an open standard for this API means you can change cloud vendors and/or build on multiple ones. We talked about this multi-cloud solution as a key way to avoid outages like AirBNB and Reddit experienced when AWS had an outage.

Following on the heels of that article, we were quoted about multi-cloud by Brandon Butler in his Network World piece.

o Do you use command line tools? Why?

A good web operations candidate should be very comfortable with command line tools. Everything in Linux is command line. It’s like Broadway acting compared to movie acting, or literature compared to books. It’s the original source and much more powerful. What’s more, it indicates and requires much stronger theoretical knowledge of the underlying systems being managed.

o What can go wrong with backups? How do we test them?

Everything can go wrong with them. They can fail to complete. Be backups of the wrong service or resource. Even the backup software itself can have bugs. The only way to sleep well at night is if you run firedrills and restore your application and data top to bottom.
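A small-scale version of a firedrill can be sketched in a few lines: archive a directory, restore it somewhere else, and compare checksums. This is only an illustration with made-up function names. A real drill restores the whole application stack, not just files.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict:
    """Map each file's path relative to root to the SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup_restores_cleanly(source: Path) -> bool:
    """Archive source, restore it into a scratch directory, compare checksums."""
    with tempfile.TemporaryDirectory() as scratch:
        archive = Path(scratch) / "backup.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(source, arcname="data")
        restore_dir = Path(scratch) / "restore"
        restore_dir.mkdir()
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(restore_dir)
        return checksums(source) == checksums(restore_dir / "data")
```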

o Should we encrypt filesystems in the cloud? What are the risks?

This depends on your environment and how sensitive your data is. If you’re collecting credit card data for instance, it may be key. However some surprising blips may push other applications to encrypt as well. Bugs in the hypervisor could potentially make your data vulnerable. What’s more if the cloud provider gets subpoenaed, it may well capture your server and data into the net. Better safe than sorry. Remember you don’t know where your data actually resides, but you do control who has access if you’re encrypted.

We wrote a very in-depth piece on Deploying on Amazon EC2 where we discuss questions such as encryption in more depth.

o Should we use offsite backups?

It’s definitely worth doing this. One more layer of insurance.

o What is load balancing? Why is it difficult with databases?


Load balancing puts a digital traffic circle into your infrastructure, giving you two roads or paths to resources. However those resources have to be exactly the same. With databases you are constantly writing to tables, and updating records. When you scale those horizontally, it becomes impossible to keep track of changes.

[quote]
Relational databases are inherently difficult to scale. Most environments scale a single authoritative master vertically, and add multiple read-only slaves horizontally to allow the application to serve more customers.
[/quote]


o Why use a package manager? Can we install from source?

Package managers simplify the installation of software components. A team such as Red Hat, Ubuntu or Debian builds a distribution, and compiles all components storing them in a repository. Installing packages this way allows your setup to be standard across servers. This allows more automation, and makes it simpler for another admin to figure out what you have, down the line when it passes to someone else’s shoulders.

Installing from source is generally a bad idea. Although it allows you to tweak and configure each piece of software the way you want, tightly and efficiently, it also means everything is custom. No commoditization advantages.

o What is horizontal scalability?

This involves adding more hardware, more individual servers to service the same application and users.

o What is vertical scalability?

This means scaling up or growing your existing single server, so it is larger, has more memory, cpu or faster disk.

o What can go wrong with automatic failover?

Just about everything. Applications and services can stall, disks can fail, servers can hang. What’s more networks can exhibit latency. Automatic failover is ultimately a piece of software or algorithm trying to diagnose and handle situations. And it does so based on a very small list of rules or heuristics. The real world is messy, so this can often lead to false failure detection, and potentially loss of data.
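One common heuristic, sketched below, is to require several missed health checks in a row before declaring a server dead. The class name and the threshold of three are my own choices for illustration. It cuts down on false alarms from a single slow response, but it is still just a rule, and the real world can fool it.

```python
class FailureDetector:
    """Declare a server failed only after several consecutive missed health checks."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health check; return True when failover should trigger."""
        if check_passed:
            self.consecutive_failures = 0  # any success resets the count
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold
```

Notice the tradeoff: a higher threshold means fewer false failovers, but a longer outage before a real failure is acted on.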

o How do cloud vendors implement vertical scalability?

This may vary dramatically between cloud providers. Ultimately, however since virtualization allows you to boot a disk image onto any hardware, you can snapshot your current root volume or disk and then boot it on another server, one that is larger, smaller and so forth. About the only thing you need to watch out for is 32 versus 64 bit questions.

If you haven’t already, don’t forget to check out the rest of this series – part one Operations Interview and part two Deployment Interview.

Read this far? Grab our newsletter – startup scalability.

Crisis Management in the Crosshairs – Sandy

Crisis Management During Sandy

The news this past week has brought endless images of devastation. All across the metropolitan region, the damage is apparent.

More than once in conversation I’ve commented “That’s similar to what I do.” The response is often one of confusion. So I go on to clarify. Web operations is every bit about disaster recovery and crisis management in the datacenter. If you saw Con Edison down in the trenches you might not know how that power gets to your building, or what all those pipes down there do, but you know when it’s out! You know when something is out of order.

That’s why datacenter operations can learn so much about crisis management from the handling of Hurricane Sandy.

This is a followup to our popular article last week Real Disaster Recovery Lessons from Sandy.

1. Run Fire Drills

Nothing can substitute for real world testing. Run your application through its paces, pull the plugs, pull the power. You need to know what’s going to go wrong before it happens. Put your application on life support, and see how it handles. Failover to backup servers, restore the entire application stack and components from backups.

2. Let the Pros Handle Cleanup

This week Fred Wilson blogged about a small data room his family managed, for their personal photos, videos, music and so forth. He ruminated on what would have happened to that home datacenter, were he living there today when Sandy struck.

It’s a story many of us can relate to, and points to obvious advantages of moving to the cloud. Handing things over to the pros means basic best practices will be followed. EBS storage, for example, is redundant, so a single hard drive failure won’t take you out. What’s more S3 offers geographically distributed redundant copies of your data.

After last week’s AWS outage I wrote that AirBNB & Reddit didn’t have to fail. What’s more in the cloud, disaster recovery is also left to the professionals.

[quote]
Web Operations teams do what Con Edison does, but for the interwebs. We drill down into the bowels of our digital city, find the wires that are crossed, and repair them. Crisis management rules the day. I can admire how quickly they’ve brought NYC back up and running after the wrath of storm Sandy.
[/quote]

3. Have a few different backup plans

Watching New Yorkers find alternate means of transportation into the city has been nothing short of inspirational. Trains not running? A bus service takes its place. L trains not crossing the river? A huge stream of bikes takes to the Williamsburg Bridge to get workers to where they need to go.

Deploying on Amazon can be a great cloud option, but consider using multiple cloud providers to give you even more redundancy. Don’t put all your eggs in one basket.

Some very important things to remember about MySQL backups.

4. Keep Open Lines of Communication

While recovery continued apace, city dwellers below 34th street looked to text messages, and old school radios to get news and updates. When would power be restored? Does my building use gas or steam to heat? Why are certain streets coming back online, while others remain dark?

During an emergency like this one, it becomes obvious how important lines of communication are. So too in datacenter crisis management, key people from business units, operations teams, and dev all must coordinate. Orchestrating that is an art all by itself. A great CTO knows how to do this.

Read this far? Grab our monthly scalable startups.

Cloud Deployment Interview

What does a cloud computing expert need to know? In part one of the cloud interview guide we covered some basic unix & Linux systems administration skills, and cloud computing and infrastructure concepts. Those are key starting points. You might also want to jump to part 3 cloud dba, architecture and management interview questions.

In this second part, let’s dig into deploying applications in the cloud, and day to day operations skills. There’s a lot of material here. We recommend picking a few questions out of the bunch and focusing on those questions, rather than trying to cover all of them.

Also while on the topic of hiring, keep in mind that Hiring is a Numbers Game.

1. Deploying in the Cloud

Deploying applications into virtual or cloud datacenters involves understanding and evaluating providers. Many just deploy on Amazon EC2 as it is far and away the largest cloud hosting solution, with the most robust offering.

You might also like our MySQL DBA Interview Guide as well.

o What sets Amazon apart from the other cloud providers?

There are probably two things that set Amazon apart from other cloud infrastructure solutions. EBS or elastic block storage being one. Although the others have storage solutions, and Rackspace is working on their own virtualized storage, Amazon seems to be the furthest ahead with their offering. It is fully virtual, allows arbitrary chunks of storage to be attached to instances, and allows instances to boot off EBS volumes.

The other major point is that since Amazon has grown so large, so quickly, it has more datacenters, in more geographically dispersed areas than other providers. Since these are organized into logical resources, and can be accessed through API, it makes your application infrastructure truly virtual.

o What are some other large cloud providers?

Joyent, Rackspace cloud, Storm on Demand, GoGrid and VoxCloud. There are certainly many others. Take a look at this Quora post: Most Reliable Cloud Providers.

o Tell one vendor management story.

Everyone who has managed operations, has worked with vendors at one point or another. For example if you’ve worked with Rackspace you know that it’s pretty easy to get a human on the line. Amazon on the other hand allows you to do-it-yourself for everything, and only later added on a support service option. So their service pattern and history are different.

Also check out 3 Things CEOs should know about the cloud.

o How do you troubleshoot a problem?

There isn’t really a right or wrong answer to this question, but it’s a nice starting point to discussion. It can also help illustrate a candidate’s communication skills, and how specifically they walk through solving a problem. What problem they choose as an illustration, and how they work through to a resolution is an important indicator of operations experience.

[quote]
Pros and cons of Amazon versus Rackspace, configuration management & automation and cloud management solutions like Scalr and RightScale… these and other skills are important for a cloud deployment expert.
[/quote]

o What are Puppet and Chef?

Puppet is a configuration management system which allows ops teams to build templates for servers, and deploy many servers based on those templates. It further allows centralized control of configuration, to automate the management of a large number of servers.

Chef grew out of frustrations with Puppet, and is a sort of next generation configuration management system.

The term infrastructure as code may be thrown around. Since all cloud resources can be provisioned through API calls, everything in server deployment can be *theoretically* done via code, from spinup of servers, to installing packages, to configuring, code checkout, seeding databases and more.
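The core idea can be sketched without any real cloud API: describe the state you want, compare it with what is running, and let code work out the difference. Everything below is hypothetical, including the role names and the action tuples. It is not how Puppet, Chef or any provider's API actually works.

```python
def plan_changes(desired: dict, actual: dict) -> list:
    """Compare desired and actual server counts per role and list actions needed."""
    actions = []
    for role in sorted(set(desired) | set(actual)):
        want, have = desired.get(role, 0), actual.get(role, 0)
        if want > have:
            actions.append(("launch", role, want - have))
        elif want < have:
            actions.append(("terminate", role, have - want))
    return actions


desired_state = {"web": 3, "db": 1}      # what the code says should exist
actual_state = {"web": 1, "cache": 2}    # what is running right now
print(plan_changes(desired_state, actual_state))
# [('terminate', 'cache', 2), ('launch', 'db', 1), ('launch', 'web', 2)]
```

Running the same plan twice against an environment that already matches produces no actions, which is the property that makes this style safe to automate.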

Also see our article What is Infrastructure provisioning and why is it important.

o What are some of the pros and cons of configuration management for operations?

Pros include allowing a smaller team to automate the deployment of a large fleet of servers, standardization, and consistency. Cons include complexity when needing to do surgical, urgent changes, and complexity when coming into an existing environment that you’ve inherited.

o How is RightScale different? What does it provide?

RightScale is a layer on top of your cloud provider. They provide a common interface and dashboard from which to deploy servers. Templating, automation, and multi-cloud support make it a great solution for teams that have less technical expertise on staff or fewer hands to manage things.

o How about Scalr?

They’re another management solution, that supports multiple cloud providers. They offer templating, and auto-scaling too.

While you’re here, take a look at our Myth of Five Nines – Why HA is Overrated.

2. Day to day skills

o What type of programming experience do you have?

The answer is that every ops guy or girl should be able to code, just as every developer should have some basic operational experience. Should and does are often two different things, so ask for some examples.

o shell scripts

Bash, csh, Perl and Python are all part of the Linux administrator’s toolbox. Writing backup scripts, log rotation, automating routine tasks and so forth are all common needs of an operations expert.

Regular expressions are a part of Unix and used in scripting to search files, cronjobs, and ETL jobs. Ask for some basic examples.
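Here is the sort of basic example I would hope a candidate could write: pull failed jobs out of a log. The log format below is a made-up syslog-style line, so treat the pattern as an illustration rather than something to paste into production.

```python
import re

# Matches lines such as: "Nov 30 02:15:01 web01 CRON[4242]: backup failed"
LOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ [\d:]{8}) (?P<host>\S+) "
    r"(?P<proc>[\w-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def failed_jobs(lines):
    """Return (host, message) pairs for log lines that mention a failure."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and re.search(r"\bfail(ed|ure)?\b", m.group("msg"), re.IGNORECASE):
            hits.append((m.group("host"), m.group("msg")))
    return hits
```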

o What is continuous integration?

The old model of code deployment was called waterfall, and allowed long careful planning, coding of new features, testing, and finally deployment. The cycle could take weeks or months and iterative change took a lot of time. Continuous integration, also known as agile deployment, allows much more frequent deployment of changes, in some cases many times per day.

o What are metrics good for?

Just like in website visitor tracking, and business analytics, server level analytics and tracking is possible. Collecting server metrics such as load averages, memory, disk and cpu usage over time can be invaluable. When an application slows or server stalls, checking historical metrics can often quickly reveal problems or causes.
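Collecting a sample is the easy part, as this sketch shows. The function name and the fields are my own choices. The tools listed below do the hard part of storing and graphing samples over time.

```python
import os
import shutil
import time


def sample_metrics(path: str = "/") -> dict:
    """Take one timestamped snapshot of load average and disk usage."""
    usage = shutil.disk_usage(path)
    # os.getloadavg() exists on Unix-like systems only.
    load_1min = os.getloadavg()[0] if hasattr(os, "getloadavg") else None
    used_pct = 100 * usage.used / usage.total if usage.total else 0.0
    return {
        "timestamp": time.time(),
        "load_1min": load_1min,
        "disk_used_pct": round(used_pct, 1),
    }


if __name__ == "__main__":
    print(sample_metrics())
```

Run something like this from cron every minute, append the result to a file, and you have the raw material for spotting when a slowdown started.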

What are some examples? Nagios, Ganglia, Cacti, Munin, OpenNMS

o What is unit testing?

This allows for software to be built in small testable components. When the components are coded, tests are also written that test whether they are operating properly, and whether dependencies are also installed and working.
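A tiny example using Python's built-in unittest module. The parse_port function is invented for illustration. The point is the shape: one small component, and tests that pin down what it should do.

```python
import unittest


def parse_port(value: str) -> int:
    """Convert a string to a TCP port number, rejecting out-of-range values."""
    port = int(value)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortTest(unittest.TestCase):
    def test_valid_port(self):
        self.assertEqual(parse_port("3306"), 3306)

    def test_out_of_range_port(self):
        with self.assertRaises(ValueError):
            parse_port("70000")
```

Saved to a file, these tests can be run with python -m unittest followed by the file name.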

[quote]
Metrics, monitoring, load testing, firewalls, security & patching, SaaS, PaaS and IaaS: there is a wide swath of skills needed to be competent as a web operations engineer. You’ve got your work cut out for you!
[/quote]

o What is load testing?

By performing some benchmarks, load testing can make estimates about how the application and code will perform when more users are hitting it.
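In miniature, a load test looks something like the sketch below. The handle_request function is a stand-in that just sleeps. You would swap in a real call to your application, and real tools add ramp-up, think time and much more.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(_):
    """Stand-in for the real request; replace with a call to your application."""
    time.sleep(0.01)


def timed_call(i):
    """Time a single request and return its latency in seconds."""
    start = time.perf_counter()
    handle_request(i)
    return time.perf_counter() - start


def run_load_test(concurrent_users: int, requests: int) -> dict:
    """Issue `requests` calls from `concurrent_users` threads and summarize latency."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = sorted(pool.map(timed_call, range(requests)))
    return {
        "requests": len(latencies),
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }


if __name__ == "__main__":
    print(run_load_test(concurrent_users=10, requests=100))
```

Rerun it with more concurrent users and watch how the median and worst-case latency move.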

o Security & networking

Sometimes a systems administrator is a generalized admin and sometimes there is a networking specialist on staff who doesn’t allow anyone else to touch that domain.

o What are firewall rules?

Unix services use port numbers to expose those services to the world. Since all servers on the internet are identified by IP addresses, firewall rules are defined around IP addresses or groups of them, and the ports they’re allowed to access.
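The matching logic behind a rule set can be sketched with Python's ipaddress module. The networks and ports here are example values only, and a real firewall such as iptables also considers protocol, direction and rule order.

```python
import ipaddress

# Each rule: (allowed source network, destination port). Example values only.
RULES = [
    (ipaddress.ip_network("10.0.0.0/8"), 3306),  # internal hosts may reach MySQL
    (ipaddress.ip_network("0.0.0.0/0"), 443),    # anyone may reach HTTPS
]


def is_allowed(source_ip: str, port: int) -> bool:
    """Default-deny: allow only traffic that matches some rule."""
    address = ipaddress.ip_address(source_ip)
    return any(
        address in network and port == rule_port
        for network, rule_port in RULES
    )
```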

o What is DNS?

DNS stands for Domain Name System. This is the sort of yellow pages of the internet. DNS allows a server name to be converted to its underlying IP address. It’s a very important service for any network, and generally includes many backup servers for when the primaries experience problems.
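From a program's point of view, that lookup is a single call. This minimal sketch uses the system resolver through Python's socket module, with localhost as the example name so it works without a network connection.

```python
import socket


def resolve(hostname: str) -> str:
    """Return one IPv4 address for a hostname using the system resolver."""
    return socket.gethostbyname(hostname)


print(resolve("localhost"))
```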

o What is a virtual private network?

A VPN provides a network link between a physical datacenter or your office’s network, and your cloud provider. Amazon offers this as part of its Virtual Private Cloud (VPC). It allows you to elastically grow your existing datacenter using virtual resources, while treating those new boxes more like servers in your existing datacenter. IP addresses and subnets are controlled by your existing network rules and admins.

o Why is security important in web operations?

Since your business assets are primarily stored in digital form, the security of those assets depends on the security of your computer systems. Passwords, firewalls and encryption are all relevant.

o Why is patching software important?

Since security is a moving target, and vulnerabilities are constantly being discovered in software, patching and updates are important. Staying fairly current in applying patches means your network and systems will be more secure.

o What is intrusion detection?

Bugs in software open up vulnerabilities and ways into systems. Intrusion detection attempts to detect such intrusions and avoid further damage.


o What is SaaS – Software as a Service?

Dropbox is one example. Other so-called hold-my-data type solutions fall into this category too.

o What is IaaS – Infrastructure as a Service?

This is raw iron, the virtualized datacenters, hosting providers such as Amazon, GoGrid, Joyent, and Rackspace.

o What is PaaS – Platform as a Service?

Solutions such as Heroku, Squarespace, WP Engine and Engine Yard fall into this category. Some provide a platform such as the WordPress CMS, with arbitrary scaling options. Others like Heroku and Engine Yard allow Ruby applications to be deployed without the need for a lot of fuss at the operational level.

We’re not done yet. In part three of this series, we’ll hit on DBA skills, and a series of general questions that cut across the spectrum of web operations. Or jump back to part one of the cloud interview guide.
