iHeavy Insights 70 – Ultimate Causes

Considering proximate causes versus ultimate causes can help give us a sense of perspective.  When it comes time to assign blame, it often isn’t as cut and dried as we’d like.  That perspective can help us formulate more constructive solutions rather than pointing fingers.

Oil Spills

With the oil spill all over the news in recent weeks, it’s easy to see this finger-pointing.  What’s the proximate cause?  Perhaps a faulty valve is to blame.  If we look at the larger causes behind that, we see that regulation, oversight, and checks and balances were missing.  But why drill in such a hazardous, extreme environment a mile down?

Ultimately the cause is demand for oil.  As the supply of easy oil dwindles, we search ever more remote locations for it.  So in essence each of us plays a part in this demand.  Since complex technology and systems are inherently unstable, accidents are inevitable.  They can be mitigated, but there always remains a degree of error.  One might even argue that the ultimate cause is an ever-growing human population that needs energy to make food, which in turn is an inherent quality of living creatures.  So says Jared Diamond in his book Collapse, which discusses the rise and fall of some past societies.

Diamond’s book is also a great example of constructive thinking on this topic, as he provides examples of some societies which pulled back from that edge, and how they did it.

Financial Instability

With the recent financial meltdown, a lot of fingers have been pointed too.  Everyone is looking for the bad guy, someone on whom to pin the blame.  But individuals and large financial firms have always taken risks.  Their whole raison d’être is to find more creative ways of making money, whether from fees, leverage, or so-called arbitrage.  So that impulse is obviously nothing new.  Is lack of regulation the problem, and so perhaps the government is to blame?  True, regulatory changes did usher in a new wave of gambling.  Many would argue that not enforcing capital requirements allowed firms like Lehman to get so overleveraged in the first place.

But economists argue that ultimately capitalism is prone to cycles of boom and bust.  It also seems that those cycles are shortening.  And ultimately the internet and the computerization of trading have played a huge role in speeding up the global economic machine in ways that we have yet to understand or unravel.

Tech Outages

Just as with the other complex systems we’ve described, computing systems have inherent complexity built into them.  Many web-facing applications have a fleet of database servers, webservers, load balancers, and many interlocking software components interacting in a myriad of ways.

Site outages typically coincide with a huge spike in traffic, which means a spike in popularity or attention for your services and business.  So an outage often happens at the worst possible time, and the pressure mounts to find a root cause.  Now obviously root cause analysis is important to help mitigate future problems, but we must be careful not to assign blame too quickly along with it.  Blame is a very human knee-jerk reaction, but it may miss the bigger picture.

Any amount and variety of QA and testing can be performed and still not model what will happen in the real world.  Perhaps your users all started coming to your site because they wanted some specific piece of content that was suddenly indexed by Google, making your site the best result.  Or maybe a new feature was rolled out, but not enough testing was done to find one particular bug.  Or a developer added a new table without the proper index, making a heavily used part of the site an even heavier burden for the database.
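The missing-index case is easy to see in miniature.  The sketch below is only an illustration, not a diagnosis of any particular site: it uses SQLite (chosen because it ships with Python; the point applies to other databases too), and the `comments` table, the `idx_comments_user` index, and the `query_plan` helper are all hypothetical names made up for the example.  It asks the database how it plans to run the same lookup before and after the index exists.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's query plan for `sql` as a single string (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


# Hypothetical table standing in for "a new table without the proper index".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
)

lookup = "SELECT * FROM comments WHERE user_id = 42"

# Without an index on user_id, the plan is a scan of the whole table.
plan_before = query_plan(conn, lookup)

# After adding the index, the same query becomes an index search.
conn.execute("CREATE INDEX idx_comments_user ON comments (user_id)")
plan_after = query_plan(conn, lookup)

print("before:", plan_before)
print("after: ", plan_after)
```

On a small test database both plans return instantly, which is exactly why this kind of problem tends to surface only under real traffic.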

Again, as with the other scenarios we’ve described, there are indeed root causes, but we must also consider the ultimate causes: the complexity of such systems and the failure inherently built into them.

Conclusions

Proximate causes are in the details.  They point to the specific event that triggered an outage, avalanche or other disaster.  But they do not provide the entire picture.  Only by considering the ultimate causes and the complexity of the system as a whole, along with the proximate causes, can we get perspective.  It is from that vantage point that we may build more constructive solutions, not to eliminate all risk, but at least to mitigate it.

BOOK REVIEW – Seth Godin – The Big Moo

Seth Godin edits this little book of insights, including essays and new ideas from such luminaries as Malcolm Gladwell, Mark Cuban, Tom Peters, and Amit Gupta.

iHeavy Insights 69 – Fewer Moving Parts

Many different kinds of systems have moving parts: electronics, automobiles, bridges and even living systems.  As it turns out, in many if not most of these systems the simpler designs tend to have advantages over the more complex ones.  These benefits ring true in the business world as well.

Rock Climbing

Take the extreme sport of rock climbing as an example.  I’ve been rock climbing off and on for about five years, though mostly indoors at rock climbing gyms.  One thing that you learn a lot about in rock climbing is safety.  There is a discussion of the harness and how to double-back the waist cinch, of using multiple carabiners to lock into the rope, and of how to tie the rope in such a way that it tightens as it bears weight.  Both the person climbing and the person belaying (gathering the rope below) have to take care of these things.  So generally they both check their own rope, harness, and carabiners, and then check the other person’s.

With indoor climbing this is all rather simple, and with just six checks for each climber to make, generally quite safe.  Plus there are monitors in the room watching people climb, and further checking for mistakes or oversights.  So over the years I’ve heard of practically *no* injuries in the gym.  This is so-called top-roping, and there are few moving parts.

With outdoor climbing you can do top-roping, however more advanced climbers prefer lead climbing.  It is much more challenging, and there are many more moving parts.  The lead climber has to place “protection” into the rock every few meters.  These are special camming devices that grip into the rock.  Obviously none of these components is fool-proof, hence you want to place as many as possible.  But there are limits to endurance, there are statistical averages at play, and more importantly there are many more moving parts.  So unfortunately lead climbing outdoors, although it can be done safely, tends to be much more prone to accidents.  More moving parts increase the statistical chance of a system breakdown.
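To put a rough number on that last point, here is a minimal sketch.  It rests on two simplifying assumptions that are mine, not facts about climbing gear: every part fails independently of the others, and every part has the same chance of failing.  The 1% figure and the part counts are purely illustrative, and `failure_probability` is a made-up helper name.

```python
def failure_probability(parts: int, p_fail: float) -> float:
    """Chance that at least one of `parts` independent components fails.

    Assumes each component fails independently with the same probability
    `p_fail`, so the chance that all of them hold is (1 - p_fail) ** parts.
    """
    return 1 - (1 - p_fail) ** parts


# Illustrative numbers only: a 1% chance of failure per part.
for parts in (6, 20, 60):
    chance = failure_probability(parts, 0.01)
    print(f"{parts:>2} parts -> {chance:.1%} chance that something fails")
```

Even when each individual part is quite reliable, the chance that at least one of them fails climbs steadily as parts are added.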

iPhone

Something similar is at play when it comes to interface design.  With user interface or UI design, there is often a discussion of how many steps it takes to perform a function.  The more steps, the deeper the function is hidden.  Fewer steps means simplicity of design.

The iPhone is a great example of this.  By simplifying the user interface, the machine works better.  At the Mobile World Congress last year Google announced that they get 50 times more searches from the iPhone than *any* other mobile device.  Fifty times!  Think about that statistic.  This is more than flashy glitz and a pretty package.  This is a device that has fewer moving parts, not only in terms of buttons, but in the virtual interface components that a user navigates on the touch screen.

Internet & Engineering

Many of the same truisms that apply in the examples of rock climbing or smartphones also apply to internet systems, and the operations side of the business.  Can we use a web-services solution such as mailchimp.com to handle our email newsletter?  That means less to manage in-house, so our IT staff can focus on more important tasks.  Or how about outsourcing all email handling through a service like Google’s Gmail for Business, or using salesforce.com for CRM?

Simplifying your operations can also mean going with a managed hosting solution, or better yet embracing the cloud with Amazon Web Services or Rackspace Cloud.  For that matter, what database platform are you running on, or what computing platform?  Does it embrace a philosophy of complexity and more features?  Or does it strive for simplicity, and fewer moving parts?  And how many of those endless features are you actually using for your application?

Conclusion

As it turns out, engineers as much as business folks are wowed by endless features, much like the glitz and shine of a fancy new car.  But often in business what you need is reliability, simplicity, and fewer moving parts to get the job done, and get it done well.