How to hire a developer that doesn’t suck

xkcd_goodcode
Strip by Randall Munroe; xkcd.com

First things first. This is not meant to be a beef against developers. But let’s not ignore the elephant in the living room that is the divide between brilliant code writers and the risk averse operations team.

By the way we also have a MySQL DBA Interview Questions article which is quite popular.

Also take a look at our AWS & EC2 Interview questions piece.

Lastly we have a great Oracle DBA Hiring Guide.

It is almost by default that developers are disruptive with their creative coding while the guys in operations, those who deploy the code, constantly cross their fingers in the hope that application changes won’t tilt the machine. And when you’re woken up at 4am to deal with an outage or your sluggish site is costing millions in losses, the blame game and finger-pointing starts.

Also: Does Amazon have a dirty little secret?

If you manage a startup you may be faced with this problem all the time. You know your business, you know what you’re trying to build but how do you find people who can help you build and execute your ideas with minimal risk?

Join 38,000 others and follow Sean Hull on twitter @hullsean.

Ideally, you want people who can bridge the mentality divide between the programmers eager to see feature changes, the business units pushing for them, and the operations team resistant to changes for the sake of stability.

DevOps – Why can’t we all live together?

The DevOps movement is an attempt to bring all these folks together. For instance by providing insight to developers about the implications of their work on performance and availability, they can better balance the onslaught of feature demands from users with the business’ need for up-time.

Also: The art of resistence or when you have to be the bad guy

Operations teams can work to expose operational data to the development teams.  Metrics collection and analytics aren’t just for the business units anymore. Employing tools like Cacti, OpenNMS or Ganglia allow you to communicate with developers and other business units alike about up-time, and the impact of deployments on site availability, and ultimately the bottom-line.

Above all, business goals and customer needs should underscore everything the engineering team is doing. Bringing all three to the table makes for a more cohesive approach that will carry everyone forward.

How to spot a DevOps person – Finding the sweet spot

The DevOps person is someone with the right combination of skill, knowledge and experience that places him or her in the sweet spot where quality assurance, programming skills and operations overlap.
There are also a few distinguishing characteristics that will help identify such an ideal candidate.

We also wrote a more general piece – What is Devops and why is it important.

Look for good writers and communicators

Imagine the beads of sweat forming when a developer tells you: “We’ve made the changes. Nothing is broken yet.”
This is like stepping on glass because it implies something will actually break. The point is savvy developers should be aware that the majority of people do not think along the same lines as they do.
Assuming your candidate has all the required technical skills, a programmer with writing skills tends to be better at articulating ideas and methods coherently. He or she would also be less resistant to documentation and be able to step back somewhat from the itty-bitty details. Communication, afterall is at the core of the DevOps culture where different sides attempt to understand each other.

Related: What’s the 4 letter word dividing dev & ops?

Pick good listeners

Even rarer than good writers are good listeners. Being able to hear what someone else is saying, and reiterate it in their own terms is a key important quality. In our example, the good listener would probably have translated ‘nothing is broken yet’ into “the app is running smoothly. We didn’t encounter any interruptions but we’ll keep watch on things.”

Lean towards pragmatists and avoid the fanatics

We all want people who are passionate about something but when that passion morphs into fanaticism it can be unpleasant. Fanaticism suggests a lower propensity to compromise. Such characters are very difficult to negotiate with. Analogously in tech, we see people latch on to a certain standard with unquestioning loyalty that’s bafflingly irrational. Someone who has had their hand in many different technologies is more likely to be technology agnostic, or rather, pragmatic. They’ll also have a broader perspective, and are able to anticipate how those technologies will play together.  Furthermore a good sense of where things will run smoothly and where there will be friction is vital.

Related: When you have to take the fall

Pay attention to extra-curricular activities

Look at technology interests, areas of study, or even outside interests. Does the person have varying interests and can converse about different topics?  Do they tell stories, and make analogies from other disciplines to make a point?  Do they communicate in jargon-free language you can understand?

Sniff out those hungry for success

As with any role, finding someone who is passionate and driven is important.  Are they on-time for appointments? Did they email you the information you requested? Are they prepared and communicative?  Are they eager to get started?

Hiring usually focuses on skills and very well-crafted resumés but why do you still find some duds now and then? By emphasizing personality, work ethics, and the ability to work with others, you can sift through the deluge of candidates and separate the wheat from the chaff for qualities that will surely serve your business better in the long run.

 
Related: When clients don’t pay

 

Read this far? Grab my newsletter!

 

Why generalists are better at scaling the web

Join 12,100 others and follow Sean Hull on twitter @hullsean.

Recently at Surge 2011, the annual  conference on scalability  and performance, Google’s CIO Ben Fried gave an illuminating keynote address. His main insight was that generalists are the people that will lead engineering teams in successfully scaling the web.

Read: Why devops talent is in short supply

In a world where the badge of Specialist or Expert is prized, this was refreshing perspective from an industry bigwig. As tech professionals, or any professional for that matter, we don’t welcome the label of generalist. The word suggests a jack-of-all-trades and master of none. But the generalist is no less an expert than the specialist. Generalists can get their hands greasy with the tools to fix bugs in the machine but they are especially good at mobilizing the machine itself; with their talents of broad vision, and perspective they can direct an entire team to accomplish tasks efficiently. This ability to see big-picture can not be underestimated especially during times of crisis or pressure to meet targets. For a team to scale the web effectively, you’re going to need a good mix of both types of personalities.

Also: Why a killer title can make or break your content efforts

Continue reading “Why generalists are better at scaling the web”

iHeavy Insights 82 – Better Practices

Best Practices, the term we hear thrown around a lot.  But like going on that new years diet, too often ends up more talk than action.

Manage Processes

Operator error ie typing the wrong command is always a risk.  Logging into the wrong server to drop a database or typing the dump command such that you dump data into the database, these are risks that operations folks face everyday.

Accountability is important, be sure all of your systems folks login to their own accounts.  Apply the least privileges model, give permissions on an as needed basis.

Set prompts with big bold names that indicate production servers and their purpose.  Automate repetitive commands that are prone to typos.

Don’t be afraid to give developers read-only accounts on production servers.

Communicate Clearly

Regular team meetings, a la the Agile stand ups are a great way to encourage folks to communicate.  Bring the developers and operations folks together.   Ask everyone in turn to voice their current todos, their concerns and risks they see.  Encourage everyone to listen with an open mind.  Consider different perspectives.

Communication is a cultural attribute.  So it comes from the top.  Encourage this as a CTO or CIO by asking questions, communicating your concerns, repeat your own requests in different words and paraphrase.  Listen to what your team is saying, repeat and rephrase those concerns, and how and when they will be addressed.

Document Processes

A culture of documenting services, and processes is healthy.  It provides a central location and knowledge base for the team.  It also prevents sliding into the situation where only one team member understands how to administer critical business components.  Were that person to be unavailable or to leave the company, you’re stuck reverse engineering your infrastructure and guessing at architectural decisions.

Better Practices

Rather than think of best practices as something you need to achieve today, think of it as an ongoing day-to-day quest for improvement.

  • repetitive manual processes – employ automation & script those processes where possible.
  • where steps require investigation and research – document it
  • where production changes are involved – communicate with business units, qa & operations
  • always be improving – striving for better practices

Review – Test Driven Infrastructure with Chef – Stephen Nelson-Smith

In search of a good book on Chef itself, I picked up this new title on O’Reilly.  It’s one of their new format books, small in size, only 75 pages.

There was some very good material in this book.  Mr. Nelson-Smith’s writing style is good, readable, and informative.  The discussion of risks of infrastructure as code was instructive.  With the advent of APIs to build out virtual data centers, the idea of automating every aspect of systems administration, and building infrastructure itself as code is a new one.  So an honest discussion of the risks of such an approach is bold and much needed.  I also liked the introduction to Chef itself, and the discussion of installation.

Chef isn’t really the main focus of this book, unfortunately.  The book spends a lot of time introducing us to Agile Development, and specifically test driven development.  While these are lofty goals, and the first time I’ve seen treatment of the topic in relation to provisioning cloud infrastructure, I did feel too much time was spent on that.  Continue reading “Review – Test Driven Infrastructure with Chef – Stephen Nelson-Smith”

Service Monitoring – What is it and why is it important?

Data centers are complex beasts, and no amount of operator monitoring by itself can keep track of everything.  That’s why automated monitoring is so important.

So what should you monitor?  You can divide up your monitoring into a couple of strategic areas.  Just as with metrics collection, there is business & application level monitoring and then there is lower level system monitoring which is also important.

Business & Application Monitoring

  • If a user is getting an error page or cannot connect
  • If an e-commerce  transaction is failing
  • General service outages
  • If a business goal is met – or not
  • Page timeouts or slowness

Systems Level Monitoring

  • Backups completed and success
  • Error logs from database, webserver & other major services like email
  • Database replication is running
  • Webserver timeouts
  • Database timeouts
  • Replication failures – via error logs & checksum checks
  • Memory, CPU, Disk I/O, Server load average
  • Network latency
  • Network security

Tools that can perform this type of monitoring include Nagios,

Quora discussion – Web Operations Monitoring

Infrastructure Provisioning – What is it and why is it important?

In the old days…

You would have a closet in your startup company with a rack of computers.  Provisioning involved:

  1. Deciding on your architectural direction, what, where & how
  2. Ordering the new hardware
  3. Waiting weeks for the packages to arrive
  4. Setup the hardware, wire things together, power up
  5. Discover some component is missing, or failed and order replacement
  6. Wait longer…
  7. Finally get all the pieces setup
  8. Configure software components and go

Along came some industrious folks who realized power and data to your physical location wasn’t reliable.  So datacenters sprang up.  With data centers, most of the above steps didn’t change except between steps 3 & 4 you would send your engineers out to the datacenter location.  Trips back and forth ate up time and energy.

Then along came managed hosting.  Managed hosting saved companies a lot of headache, wasted man hours, and other resources.  They allowed your company to do more of what it does well, run the business, and less on managing hardware and infrastructure.  Provisioning now became:

  1. Decide on architecture direction
  2. Call hosting provider and talk to sales person
  3. Wait a day or two
  4. Setup & configure software components and go

Obviously this new state of affairs improved infrastructure provisioning dramatically.  It simplified the process and sped it up as well.  What’s more a managed hosting provider could keep spare parts and standard components on hand in much greater volume than a small firm.  That’s a big plus.  This evolution continued because it was a win-win for everyone.  The only downside was when engineers made mistakes, and finger pointing began.  But despite all of that, a managed hosting provider which does only that, can do it better, and more reliably than you can yourself.

So where are we in present day?  We are all either doing, or looking out cloud provisioning of infrastructure.  What’s cloud provisioning?  It is a complete paradigm shift, but along the same trajectory as what we’ve described above.  Now you removed all the waiting.  No waiting for sales team, or the ordering process.  That’s automatic.  No waiting for engineers to setup the servers, they’re already setup.  They are allocated by your software and scripts.  Even the setup and configuration of software components, Operating System and services to run on that server – all automatic.

This is such a dramatic shift, that we are still feeling the affects of it.  Traditional operations teams have little experience with this arrangement, and perhaps little trust in virtual servers.  Business units are also not used to handing the trigger to infrastructure spending over to ops teams or to scripts and software.

However the huge economic pressures continue to push firms to this new model, as well as new operational flexibility.  Gartner predicts this trend will only continue. The advantages of cloud infrastructure provisioning include:

  1. Metered payment – no huge outlay of cash for new infrastructure
  2. Infrastructure as a service – scripted components automate & reduced manual processes
  3. Devops – Manage infrastructure like code with version control and reproduceability
  4. Take unused capacity offline easily & save on those costs
  5. Disaster Recovery is free – reuse scripts to build standard components
  6. Easily meet seasonal traffic requirements – spinup additional servers instantly

On Quora Sean Hull asks – What is infrastructure provisioning and why is it important?

Object Relational Mapper – What is it and why is it important?

Object Relational Mappers or ORMs are a layer of software that sits between web developers and the database backend.  For instance if you’re using Ruby as your web development language, you’ll interact with MySQL through an ORM layer called ActiveRecord.  If you’re using Java, you may be fond of the ORM called Hibernate.

ORMs have been controversial because they expose two very different perspectives to software development.  On the one hand we have developers who are tasked with building applications, fulfilling business requirements, and satisfying functional requirements in a finite amount of time.  On the other hand we have operations teams which are tasked with managing resources, supporting applications, and maintaining uptime and availability.

Often these goals are opposing.  As many in the devops movement have pointed out, these teams don’t always work together keeping common goals in mind.  How does this play into the discussion of ORMs?

Relational databases are a technology developed in the 70’s that use an arcane language called SQL to move data in and out of them.  Advocates of ORMs would argue rightly so, that SQL is cumbersome and difficult to write, and that having a layer of software which helps you in this task is a great benefit.  To be sure it definitely helps the development effort, as software designers, architects and coders can focus more of their efforts on functional requirements and less on arcane minutiae of SQL.

Problems come when you bump up against scalability challenges.  The operations team is often tasked with supporting performance requirements.  Although this can often mean providing sufficient servers, disk, memory & cpu resources to support an application, it also means tuning the application.  Adding hardware can bring you 2x or 5x improvement.  Tuning an application can bring 10x or 100x improvement.  Inevitably this involves query tuning.

That’s where ORMs become problematic, as they don’t promote tweaking of queries.  They are a layer or buffer to keep query writing out of sight.

In our experience as performance and scalability experts for the past fifteen years, query tuning is the single biggest thing you can do to improve your web application.  Furthermore some of the most challenging and troublesome applications we’ve been asked to tune have been built on top of ORMs like Hibernate.

Sean Hull asks on Quora – What is an ORM and why is it important?

Capacity Planning – What is it and why is it important?

Look at your website’s current traffic patterns, pageviews or visits per day, and compare that to your server infrastructure. In a nutshell your current capacity would measure the ceiling your traffic could grow to, and still be supported by your current servers. Think of it as the horsepower of you application stack – load balancer, caching server, webserver and database.

Capacity planning seeks to estimate when you will reach capacity with your current infrastructure by doing load testing, and stress testing. With traditional servers, you estimate how many months you will be comfortable with currently provisioned servers, and plan to bring new ones online and into rotation before you reach that traffic ceiling.

Your reaction to capacity and seasonal traffic variations becomes much more nimble with cloud computing solutions, as you can script server spinups to match capacity and growth needs. In fact you can implement auto-scaling as well, setting rules and thresholds to bring additional capacity online – or offline – automatically as traffic dictates.

In order to be able to do proper capacity planning, you need good data. Pageviews and visits per day can come from your analytics package, but you’ll also need more complex metrics on what your servers are doing over time. Packages like Cacti, Munin, Ganglia, OpenNMS or Zenoss can provide you with very useful data collection with very little overhead to the server. With these in place, you can view load average, memory & disk usage, database or webserver threads and correlate all that data back to your application. What’s more with time-based data and graphs, you can compare changes to application change management and deployment data, to determine how new code rollouts affect capacity requirements.

Sean Hull asks about Capacity Planning on Quora.

Zero Downtime – What is it and why is it important?

For most large web applications, uptime is of foremost importants.  Any outage can be seen by customers as a frustration, or opportunity to move to a competitor.  What’s more for a site that also includes e-commerce, it can mean real lost sales.

Zero Downtime describes a site without service interruption.  To achieve such lofty goals, redundancy becomes a critical requirement at every level of your infrastructure.  If you’re using cloud hosting, are you redundant to alternate availability zones and regions?  Are you using geographically distributed load balancing?  Do you have multiple clustered databases on the backend, and multiple webservers load balanced.

All of these requirements will increase uptime, but may not bring you close to zero downtime.  For that you’ll need thorough testing.  The solution is to pull the trigger on sections of your infrastructure, and prove that it fails over quickly without noticeable outage.  The ultimate test is the outage itself.

Sean Hull on Quora: What is zero downtime and why is it important?

Feature Flags – What are they and why are they important?

Feature flags are switches that developers architect into their web applications to allow a feature to be turned on or off.  It is simple sounding in description, but harder to implement or enable after the fact.

These switches allow the systems team to operationalize new application functionality.  It allows the ability to turn hot button features on or off as needed.  This can be bring a tremendous power and flexibility to the operations team for deployments where traffic patterns and site usage patterns cannot be known in advance.   It can increase uptime and availability of the overall site, by minimizing the impact any new feature might have.

Feature flags can also be implemented as feature dials, allowing the feature to be exposed to a percentage of users, select users, or some other meaningful way to turn it up or down gradually.

Sean Hull asks on Quora: What are feature flags and why are they important?