Tag Archives: operations

How to hire a developer that doesn't suck

Strip by Randall Munroe; xkcd.com

First things first: this is not meant to be a beef with developers. But let’s not ignore the elephant in the living room, which is the divide between brilliant code writers and the risk-averse operations team.

By the way, we also have a MySQL DBA Interview Questions article, which is quite popular.

Also take a look at our AWS & EC2 Interview Questions piece.

Lastly, we have a great Oracle DBA Hiring Guide.

It is almost a given that developers are disruptive with their creative coding, while the guys in operations, those who deploy the code, constantly cross their fingers in the hope that application changes won’t tilt the machine. And when you’re woken up at 4am to deal with an outage, or your sluggish site is costing millions in losses, the blame game and finger-pointing start.

If you manage a startup, you may face this problem all the time. You know your business and what you’re trying to build, but how do you find people who can help you build and execute your ideas with minimal risk?

Ideally, you want people who can bridge the mentality divide between the programmers eager to see feature changes, the business units pushing for them, and the operations team resistant to changes for the sake of stability.

Service Monitoring – What is it and why is it important?

Data centers are complex beasts, and no amount of manual monitoring by operators can keep track of everything. That’s why automated monitoring is so important.

So what should you monitor? You can divide your monitoring into two strategic areas. Just as with metrics collection, there is business and application-level monitoring, and then there is lower-level system monitoring, which is also important.

Business & Application Monitoring

  • If a user is getting an error page or cannot connect
  • If an e-commerce transaction is failing
  • General service outages
  • If a business goal is met – or not
  • Page timeouts or slowness
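The business and application checks above largely boil down to fetching a page the way a customer would, then flagging failed connections, error pages, and slowness. Here is a minimal Python sketch of that idea using only the standard library; the function names and the 2-second slowness threshold are hypothetical choices, not taken from any particular monitoring tool. Checking e-commerce transactions or business goals would need a scripted, multi-step flow that this sketch does not attempt.

```python
import time
import urllib.error
import urllib.request

# Hypothetical threshold: responses slower than this count as "slow".
SLOW_THRESHOLD_S = 2.0


def evaluate_page_check(status_code, elapsed_s, slow_threshold_s=SLOW_THRESHOLD_S):
    """Classify one page fetch as OK, SLOW, or ERROR."""
    if status_code is None:
        return "ERROR", "could not connect"
    if status_code >= 400:
        return "ERROR", f"HTTP {status_code}"
    if elapsed_s > slow_threshold_s:
        return "SLOW", f"{elapsed_s:.2f}s exceeds {slow_threshold_s:.2f}s"
    return "OK", f"HTTP {status_code} in {elapsed_s:.2f}s"


def check_url(url, timeout_s=10.0):
    """Fetch a URL the way a visitor would and classify the result."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # include the time it takes to download the body
            status = resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError; the status code is on the exception.
        status = exc.code
    except OSError:
        # DNS failures, refused connections, and timeouts (URLError and socket
        # timeouts are both OSError subclasses): the visitor gets nothing.
        status = None
    elapsed = time.monotonic() - start
    return evaluate_page_check(status, elapsed)
```

A scheduler such as cron could call `check_url("https://www.example.com/")` every minute and raise an alert on anything other than `OK`.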

Systems Level Monitoring

  • Backups completed successfully
  • Error logs from database, webserver & other major services like email
  • Database replication is running
  • Webserver timeouts
  • Database timeouts
  • Replication failures – via error logs & checksum checks
  • Memory, CPU, Disk I/O, Server load average
  • Network latency
  • Network security

Tools that can perform this type of monitoring include Nagios.
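Nagios runs small check programs (plugins) and reads their exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN, along with a one-line status message on standard output. As a sketch under those conventions, here is a system-level load-average check in Python; the thresholds of 4.0 and 8.0 are hypothetical and would normally be scaled to the number of CPU cores on the server.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify_load(load1, warn, crit):
    """Map a 1-minute load average onto a Nagios state."""
    if load1 >= crit:
        return CRITICAL
    if load1 >= warn:
        return WARNING
    return OK


def main(warn=4.0, crit=8.0):
    """Print a one-line status for Nagios and return the matching exit code."""
    try:
        load1, _load5, _load15 = os.getloadavg()
    except (AttributeError, OSError):
        # os.getloadavg is Unix-only, and raises OSError if the value is unobtainable.
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    state = classify_load(load1, warn, crit)
    print(f"LOAD {LABELS[state]} - load1={load1:.2f}")
    return state
```

To run it as a plugin, end the script with `sys.exit(main())` (after `import sys`) so that Nagios sees the exit code.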

Quora discussion – Web Operations Monitoring

Devops – What is it and why is it important?

Devops is one of those fancy contractions that tech folks just love: one part development or developer, and another part operations. It imagines a blissful marriage where the team that develops software and builds features that fit the business works closely and in concert with an operations and datacenter team that thinks more like developers themselves.

In the long tradition of technology companies, these two roles have been filled by two separate cultures. Developers, focused on the languages, libraries, and functionality that match the business requirements, keep their gaze firmly in that direction. The servers, network, and resources that those software components consume are left for the ops teams to think about.

So too, ops teams are squarely focused on uptime, resource consumption, performance, availability, and always-on. They will be the ones woken up at 4am if something goes down, and are thus sensitive to version changes, unplanned or unmanaged deployments, and resource-heavy or resource-wasteful code and technologies.

Lastly, there are the QA teams, tasked with quality assurance, testing, and making sure the steady stream of new features doesn’t break anything that previously worked or introduce new show stoppers.

Devops is a new and, I think, growing area where the three teams work more closely together. But devops also speaks to the emerging area of cloud deployments, where servers can be provisioned with command-line API calls and completely scripted. In this new world, infrastructure components all become components in software. Infrastructure itself, long the domain of manual processes and labor-intensive tasks, thus becomes repeatable and amenable to the techniques of good software development. Suddenly version control, configuration management, and agile development methodologies can be applied to operations, bringing a whole new level of professionalism to deployments.
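One way to see how infrastructure becomes “components in software” is the desired-state approach: you describe the servers you want in a file kept under version control, and a tool compares that description with what is actually running. The Python sketch below shows only that comparison step. The `Server` type, the size labels, and the `plan` function are hypothetical, and actually creating, resizing, or destroying machines would go through your cloud provider’s API, which is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    size: str  # hypothetical size label, e.g. "small" or "large"


def plan(desired, actual):
    """Compare the desired servers with what is running and return the
    actions needed to converge, in a stable (sorted) order."""
    want = {s.name: s for s in desired}
    have = {s.name: s for s in actual}
    actions = []
    for name in sorted(want.keys() - have.keys()):
        actions.append(("create", want[name]))
    for name in sorted(have.keys() - want.keys()):
        actions.append(("destroy", have[name]))
    for name in sorted(want.keys() & have.keys()):
        if want[name] != have[name]:
            actions.append(("resize", want[name]))
    return actions
```

Because running `plan` against an already-converged environment returns an empty list, the same description can be applied again and again without side effects, which is what makes the process repeatable.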

Sean Hull asks on Quora – What is devops and why is it important?

iHeavy Insights 77 – What Consultants Do

What Do Consultants Do?

Consultants bring a whole host of tools and experience to bear on solving your business problems. They can fill a need quickly, look in the right places, reframe the problem, communicate and get teams working together, and bring to light problems on the horizon. And they tell stories of challenges they faced at other businesses, and how they solved them.

Frame or Reframe The Problem

Oftentimes businesses see the symptoms of a larger problem, but not the cause.  Perhaps their website is sluggish at key times, causing them to lose customers.  Or perhaps it is locking up inexplicably.  Framing the problem may involve identifying the bottleneck and pointing to a particular misconfigured option in the database or webserver.  Or it may mean looking at the technical problem you’ve chosen to solve and asking if it meets or exceeds what the business needs.

Tell Business Stories

Clients often have a collection of technologies and components in place to meet their business needs. But the day-to-day running of a business is ultimately about bringing a product or service to your customers. Telling stories of the challenges and solutions of past customers helps illustrate, educate, and communicate the problems you’re facing today.

Fill A Need Quickly

If you have an urgent problem and your current staff is overextended, bringing in a consultant to solve that specific problem can be a net gain for everyone. Consultants get up to speed quickly, bring fresh perspectives, and review your current processes and operations. What’s more, they can be used surgically, augmenting your team for a short stint.

Get Teams Communicating

I’ve worked at quite a number of firms over the years, tasked with solving a specific technical problem, only to find it was a people problem to begin with. In some cases the firm already has the knowledge and expertise to solve the problem, but some members are blocking the solution. This can be because some folks feel threatened by a new solution that will take away responsibilities they formerly held. Or it can be because they feel the solution will create new problems that they will then be responsible for cleaning up. In either case, bridging the gap between business needs and the operations teams who must meet those needs means communicating with each team in ways that make sense to them. A technical, detail-oriented focus makes the most sense with the engineering teams; a business and bottom-line focus works best with the management team.

Highlight Problems On The Horizon

Is our infrastructure a ticking timebomb?  Perhaps our backups haven’t been tested and are missing some crucial component?  Or we’ve missed some security consideration, left some password unset, left the proverbial gate open to the castle.  When you deal with your operations on a day-to-day basis, little details can be easy to miss.  A fresh perspective can bring needed insight.

BOOK REVIEW – Jaron Lanier – You Are Not a Gadget

Lanier is a programmer, a musician, the father of VR way back in the 1980s, and a wide-ranging thinker on topics in computing and the internet.

His new book is a great, if at times meandering, read on technology, programming, schizophrenia, inflexible design decisions, Marxism, finance transformed by the cloud, obscurity and security, logical positivism, strange loops, and more.

He opposes the thinking du jour among computer scientists, leaning in a more humanist direction summed up here: “I believe humans are the result of billions of years of implicit, evolutionary study in the school of hard knocks.” The book is worth a look.

iHeavy Insights 69 – Fewer Moving Parts

Many kinds of systems have moving parts: electronics, automobiles, bridges, and even living systems. As it turns out, in many if not most of these systems, simpler designs tend to have various advantages over more complex ones. These benefits ring true in the business world as well.

Rock Climbing

Take the extreme sport of rock climbing as an example. I’ve been rock climbing off and on for about five years, though mostly indoors at climbing gyms. One thing you learn a lot about in rock climbing is safety. There is the harness and how to double back the waist cinch, using multiple carabiners to lock into the rope, and how to tie the rope so that the knot tightens as it bears weight. Both the person climbing and the person belaying – gathering the rope below – have to take care of these things. So generally they each check their own rope, harness, and carabiners, and then check the other person’s.

With indoor climbing this is all rather simple, and with just six checks for each climber to make, generally quite safe. Plus there are monitors in the room watching people climb and further checking for mistakes or oversights. So over the years I’ve heard of practically *no* injuries in the gym. This is so-called top-roping, and there are few moving parts.

With outdoor climbing you can also top-rope, but more advanced climbers prefer lead climbing. It is much more challenging, and there are many more moving parts. The lead climber has to place “protection” into the rock every few meters. These are special camming devices that grip the rock. None of these components is foolproof, so you want to place as many as possible. But there are limits to endurance, statistical averages at play, and, more importantly, many more moving parts. So although outdoor lead climbing can be done safely, it tends to be much more prone to accidents. More moving parts increase the statistical chance of a system breakdown.
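The statistics behind that last sentence can be made concrete with a simple series model: if a system works only when every one of its n parts works, and each part independently works with probability r, the whole system works with probability r^n. The Python sketch below assumes independent failures and that every part is required; redundant parts that back each other up follow a different model. The 0.99 per-part figure is purely illustrative.

```python
import math


def series_reliability(part_reliabilities):
    """Probability that a system works when every part must work,
    assuming the parts fail independently of each other."""
    return math.prod(part_reliabilities)


# Each part works 99% of the time; watch the whole system degrade as parts are added.
for n in (5, 10, 20):
    print(n, f"{series_reliability([0.99] * n):.3f}")
# prints: 5 0.951, then 10 0.904, then 20 0.818
```

Even with very reliable parts, going from 5 to 20 required parts roughly quadruples the chance of a breakdown in this model (about 5% versus about 18%).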

iPhone

Something similar is at play when it comes to interface design.  With user interface or UI design, there is often a discussion of how many steps it takes to perform a function.  The more steps, the deeper the function is hidden.  Fewer steps means simplicity of design.

The iPhone is a great example of this. By simplifying the user interface, the machine works better. At the Mobile World Congress last year, Google announced that they get 50 times more searches from the iPhone than *any* other mobile device. Fifty times! Think about that statistic. This is more than flashy glitz and a pretty package. This is a device that has fewer moving parts, not only in terms of buttons, but in the virtual interface components that a user navigates on the touch screen.

Internet & Engineering

Many of the same truisms that apply to rock climbing and smartphones also apply to internet systems and the operations side of the business. Can we use a web-services solution such as mailchimp.com to handle our email newsletter? That means less to manage in-house, so our IT staff can focus on more important tasks. Or how about outsourcing all email handling to a service like Google’s Gmail for Business, or CRM to salesforce.com?

Simplifying your operations can also mean going with a managed hosting solution, or better yet, embracing the cloud with Amazon Web Services or Rackspace Cloud. For that matter, what database platform are you running on, or what computing platform? Does it embrace the more-complexity, more-features philosophy? Or does it strive for simplicity and fewer moving parts? And how many of those endless features are you actually using for your application?

Conclusion

As it turns out, engineers as much as business folks are wowed by endless features and the glitz and shine of a fancy new car. But often in business what you need is reliability, simplicity, and fewer moving parts to get the job done, and get it done well.

5 Tips for Scalability

Your website is slow but you’re not sure why. You do know that it’s impacting your business. Are you losing customers to the competition? Here are five quick tips to achieve scalability.

1. Gather Intelligence

With any detective work you need information.  That’s where intelligence comes in.  If you don’t have the right data already, install monitoring and trending systems such as Cacti and Collectd.  That way you can look at where your systems have been and where they’re going.
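Cacti and Collectd handle collecting and graphing the data. For the “where they’re going” part, one rough approach is to fit a straight line to recent samples and project when a resource runs out. The Python below does that for hypothetical daily disk-usage figures using ordinary least squares; assuming linear growth is a simplification, and the function name and inputs are illustrative only.

```python
def days_until_full(daily_used_gb, capacity_gb):
    """Fit a least-squares line to daily usage samples and estimate how many
    days after the last sample the line reaches capacity.
    Returns None if usage is flat or shrinking."""
    n = len(daily_used_gb)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)  # day index of each sample
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Day index where the fitted line hits capacity, relative to the last sample.
    return (capacity_gb - intercept) / slope - (n - 1)


# Five days of usage growing about 10 GB/day on a 200 GB volume.
print(days_until_full([100, 110, 120, 130, 140], 200))  # 6.0
```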

2. Identify Bottlenecks

Put all that information to use in your investigation. Use stress testing tools to hit areas of the application, and identify which ones are most troublesome. Some pages get hit A LOT, such as the login page, so slowness there is more serious than in a small report used by only a few users. Work on the biggest culprits first to get the best bang for your buck.
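One way to decide which culprit is biggest is to weight each page’s average response time by how often it is hit, so a moderately slow login page can outrank a very slow report that few people run. The sketch below assumes you already have per-page hit counts and average latencies (for example, from access logs or a stress-test run); the page names and numbers are made up for illustration.

```python
def rank_by_total_time(stats):
    """stats maps page -> (hits, avg_seconds). Return (page, total_seconds)
    pairs sorted by total time spent serving each page, worst first."""
    totals = {page: hits * avg for page, (hits, avg) in stats.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures for one day of traffic.
stats = {
    "/login": (50000, 0.5),          # fast-ish, but hit constantly
    "/reports/monthly": (20, 8.0),   # very slow, rarely used
    "/search": (10000, 1.0),
}
for page, total in rank_by_total_time(stats):
    print(page, total)
# prints: /login 25000.0, then /search 10000.0, then /reports/monthly 160.0
```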

3. Smooth Out the Wrinkles

Reconfigure your webservers to make more connections to your database, or spin up more servers. On the database tier, make sure you have fast RAIDed disks and lots of memory. Tune queries coming from your application, and look at possible upgrades to servers.
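Before raising connection limits, it helps to estimate how many database connections you actually need. Little’s law (average concurrency equals arrival rate times average time in the system) gives a back-of-the-envelope figure. The sketch below assumes steady traffic, and the 1.5x headroom factor and function name are hypothetical; the result is only a starting point for settings such as the webserver’s connection pool size or MySQL’s `max_connections`.

```python
import math


def db_connections_needed(requests_per_s, db_seconds_per_request, headroom=1.5):
    """Estimate concurrent database connections with Little's law
    (in-flight work = arrival rate x average time per request),
    then add a safety margin and round up."""
    in_flight = requests_per_s * db_seconds_per_request
    return math.ceil(in_flight * headroom)


# 160 requests/s, each spending 125 ms in the database on average.
print(db_connections_needed(160, 0.125))  # 30
```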

4. Be Agile But Plan for the Future

Can your webserver tier scale horizontally? It’s pretty easy to add more servers under a load balancer. How about your database? Chances are that with a little work and some HA magic, your database can scale out with more servers too, moving the bulk of select operations to read-only copies of your primary server while letting it focus on transactions and data updates. Be ready and tested so you know exactly how to add servers without impacting the customers or application. Don’t know how? Look at the big guys like Facebook, and investigate how they’re doing it.
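At the application level, moving selects to read-only copies often comes down to a small routing decision made per query. Here is a minimal Python sketch of such a router; the class, the server names, and the first-word test for `SELECT` are simplifying assumptions, and many applications instead rely on their framework or a database proxy for this.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread plain reads across read-only
    replicas in round-robin order. Servers are plain name strings here;
    a real application would hold connection pools instead."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split(None, 1)
        first = words[0].upper() if words else ""
        is_plain_read = first == "SELECT" and "FOR UPDATE" not in sql.upper()
        if is_plain_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, DDL, locking reads, and transaction control stay on the primary.
        return self.primary


router = ReadWriteRouter("db-primary", ["db-replica1", "db-replica2"])
print(router.route("SELECT * FROM users"))        # db-replica1
print(router.route("UPDATE users SET active=1"))  # db-primary
```

One caveat: replicas lag slightly behind the primary, so a read issued right after a write may not see it yet; such reads are usually sent to the primary as well.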

5. A Going Concern

Most importantly, just like your business, your technology infrastructure is an ongoing work in progress. Stay proactive with monitoring, analysis, trending, and vigilance. Watch application changes, and filter for slow queries. Have new or additional hardware ready to bring online when you need it.