What’s the luckiest thing that’s happened in your career?


I was browsing through Career Dean recently, a site where professionals share knowledge & experience about the work world with more junior folks & recent college grads. It’s a great site. I saw the question What’s the luckiest thing that’s ever happened for your career?

I read in John Adam’s AMA about his “million dollar piss” (www.careerdean.com/q/howd-you-get-the-job-twitter), in which he basically sowed the seeds of his success during a piss. That’s a 1 in a million kind of story, I know. I’d like to hear if anyone else has ever experienced anything remotely lucky in that way? =) Something fun to come back and read if anyone answers.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

Here’s how I responded…

I moved to NYC & worked at a tiny startup in the mid-nineties. Got to do Mac stuff, Windows & Sun Solaris Unix as well. I jumped on an Oracle project where I was a bit underwater. The firm hired a consultant to assist me for a few days. I watched what he did and learned like a sponge. Within a few months I dove into Oracle consulting and never looked back.

I felt this was an amazingly lucky opportunity for a few reasons.

1. DIY

I’ve been consulting for almost twenty years now. And I get asked all the time how to get into freelance or independent consulting. For me the jumping off point was working for a really small ten person startup.

An environment like this is very different from a large corporation where you do one thing. At a tiny shop, everything is do-it-yourself. You have to be self-serve & lean. It’s a constant challenge to teach yourself what you don’t already know. It’s a very vibrant environment in which to start your career.

Also: 5 Things toxic to scalability

2. Generalist

I also found that I had the chance to really apply everything I learned in computer science. Is it a hardware problem? A software problem? The kind of silos you experience at university don’t apply. One day you can be doing Windows, Mac, or Unix operating system configuration, the next you can be writing code. And on the third day you can be doing DBA work.

In today’s terminology, this role was site reliability engineer or SRE, fullstack developer, tech support, evangelist, CTO, DBA, scalability & performance lead and more.

Related: Are generalists better at scaling the web?

3. Cutting edge

Startups, to be sure, are on the bleeding edge. Constrained by budgets, they cut their teeth on the newest technologies out there through sheer will & experimentation.

These days that might be Cassandra & Kafka, Docker, MongoDB, HDFS, Redshift and so on.

Read: Do managers underestimate operational cost?

4. Ok to Fail

In larger enterprises, a lot of politics weigh on decisions, and exotic technologies are risky. When you’re at a startup, entering uncharted waters by design, it’s sort of a given that it’s ok to fail. This encourages learning, since the stakes of failure are lower.

Also: Is the difference between dev & ops a four-letter word?

5. Iterative & Agile

We talk about being agile & lean at startups. At a very small place like this, you have one or two developers, and you deploy code constantly. It’s agile by default. And that’s a good thing.

Also: Is high availability overrated? The myth of five nines.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest: Why I don’t work with recruiters.

Is AWS the patient that needs constant medication?


I was just reading on High Scalability about why Swiftype moved off Amazon EC2 to Softlayer and saw great wins!

We’ve all heard by now how awesome the cloud is. Spin up infrastructure instantly. Just add water! No up-front costs! Autoscale to meet seasonal application demands!

But less well known or even understood by most engineering teams are the seasonal weather patterns of the cloud environment itself!

Join 28,000 others and follow Sean Hull on twitter @hullsean.

Sure there are firms like Netflix who have turned the fickle cloud into a model of virtue & reliability. But most of the firms I work with every day have moved to Amazon as though it were regular bare metal. And encountered some real problems in the process.

1. Everyday hardware outages

Many of the firms I’ve seen hosted on AWS don’t realize that servers fail so often. Amazon actually chooses cheap commodity components as a cost-savings measure. The assumption is that resilience should be built into your infrastructure using devops practices & automation tools like Chef & Puppet.

The sad reality is most firms provision the usual way, through the dashboard, with no safety net.
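Here’s a rough sketch of what provisioning through code, rather than the dashboard, might look like using boto3. The AMI, key pair & security group IDs are placeholders, not recommendations.

```python
# Minimal, illustrative boto3 provisioning sketch -- values are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # placeholder AMI
    InstanceType="t3.medium",
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",                       # placeholder key pair
    SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder security group
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Role", "Value": "webserver"}],
    }],
)
print(response["Instances"][0]["InstanceId"])
```

Because the server is described in code, it can be rebuilt automatically when the hardware underneath it dies.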

Also: Is your cloud speeding for a scalability cliff

2. Ongoing network problems

Network latency is a big problem on Amazon. And it will affect you more than you might expect. One reason is that you’re most likely sitting on EBS for your storage. EBS? That’s Elastic Block Storage, Amazon’s NAS solution. Your little cheapo instance has to cross the network to get to storage. That *WILL* affect your performance.

If you’re not already doing so, please start using their most important & easily missed performance feature – provisioned IOPS.
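Here’s what that might look like with boto3. A hedged sketch only: the size, IOPS & availability zone are illustrative values.

```python
# Create an EBS volume with provisioned IOPS (io1) instead of default storage.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=100,          # GiB, illustrative
    VolumeType="io1",  # provisioned IOPS SSD
    Iops=3000,         # guaranteed IOPS rather than best-effort
)
print(volume["VolumeId"])
```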

Related: The chaos theory of cloud scalability

3. Hard to be as resilient as netflix

We’ve by now heard of firms such as Netflix building their Chaos Monkey to actively knock out servers, in an effort to test their self-healing infrastructure.

From what I’m seeing at startups, most have a bit of devops in place, a bit of automation, such as autoscaling around the webservers. But little in terms of cross-region deployments. What’s more, their database tier is protected only by multi-AZ or just a read replica or two. These are fine for what they are, but will require real intervention when (not if) a server fails.

I recommend building a browse-only mode for your application, to eliminate downtime in these cases.
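What might a browse-only mode look like? Here’s a very rough sketch, assuming you already have an object cache such as Redis sitting next to MySQL. Hostnames, credentials & the products table are all placeholders.

```python
# Browse-only fallback sketch: if the primary database is unreachable,
# serve stale content from the cache instead of an error page.
import redis
import pymysql

cache = redis.Redis(host="localhost", port=6379)

def get_product(product_id):
    try:
        conn = pymysql.connect(host="db.example.com", user="app",
                               password="secret", database="shop",
                               connect_timeout=2)
        with conn.cursor() as cur:
            cur.execute("SELECT name, price FROM products WHERE id = %s",
                        (product_id,))
            row = cur.fetchone()
        conn.close()
        cache.set("product:%s" % product_id, repr(row))     # refresh the cache
        return row, False            # full read/write mode
    except pymysql.err.OperationalError:
        # Database is down: serve whatever the cache has, and signal the
        # app to disable writes (checkout, comments) until it recovers.
        return cache.get("product:%s" % product_id), True   # browse-only mode
```

The second value in the return tuple is the browse-only flag the rest of the application would check before allowing writes.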

Read: 8 questions to ask an aws expert

4. Provisioning isn’t your only problem

But the cloud gives me instant infrastructure. I can spin up servers & configure components through an API! Yes, this is a major benefit of the cloud, compared to the 1-2 hours it takes in traditional environments like Softlayer or Rackspace. But weigh that against the failure rates: those environments might see a hardware outage every couple of years, while Amazon’s hardware may fail a couple of times a year, more if you’re unlucky.

Meanwhile you’re going to deal with seasonal weather problems *INSIDE* your datacenter. Think of these as swarms of customers invading your servers, like a DDOS attack, but self-inflicted.

Amazon is like a weak immune system attacking itself all the time, requiring constant medication to keep the host alive!

Also: 5 Things toxic to scalability

5. RDS is going to bite you

Besides all these other problems, I’m seeing more customers build their applications on Amazon’s managed MySQL solution, RDS. I’ve found RDS terribly hard to manage. It introduces downtime at every turn, where standard MySQL would incur none.

In my experience, upgrading RDS is a shit-storm that will not end!

Also: Does open source enable the cloud?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest: Why I don’t work with recruiters.

Can entrepreneurs learn from how science beat Cholera?

[Cover image: The Ghost Map, by Steven Johnson]

When I picked up Johnson’s book, I knew nothing about Cholera. Sure, I’d heard the name, but I didn’t know what a plague it was during the 19th century.

Johnson’s book is at once a thriller, tracing the deadly progress of the disease. But within that story, we learn of the squalor the inhabitants of Victorian England endured before public works & sanitation. We learn of architecture & city planning, statistics & how epidemiology was born. The evidence is woven together with mapmaking, the study of pandemics, information design, environmentalism & modern crisis management.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

“It is a great testimony to the connectedness of life on earth that the fates of the largest and the tiniest life should be so closely dependent on each other. In a city like Victorian London, unchallenged by military threats and bursting with new forms of capital and energy, microbes were the primary force reigning in the city’s otherwise runaway growth, precisely because London had offered Vibrio cholerae (not to mention countless other species of bacterium) precisely what it had offered stock-brokers and coffee-house proprietors and sewer-hunters: a whole new way of making a living.”

1. Scientific pollination

John Snow was the investigator who solved the riddle. He didn’t believe that putrid smells carried disease, the prevailing miasma theory of the day.

“Part of what made Snow’s map groundbreaking was the fact that it wedded state-of-the-art information design to a scientifically valid theory of cholera’s transmission.”

Also: Did Airbnb have to fail?

2. Public health by another name

“The first defining act of a modern, centralized public health authority was to poison an entire urban population”

Although they didn’t know it at the time, the dumping of waste water directly into the Thames was in fact poisoning people & wildlife in the surrounding areas.

In large part the establishment was blinded by its belief in miasma, the theory that disease originated from bad smells & thus traveled through the air.

Related: 5 reasons to move data to amazon redshift

3. A Generalist saves the day

The interesting thing about John Snow was how much of a generalist he really was. Because of this he was able to see things across disciplines that others of the time could not.

“Snow himself was a kind of one-man coffeehouse: one of the primary reasons he was able to cut through the fog of miasma was his multi-disciplinary approach, as a practicing physician, mapmaker, inventor, chemist, demographer & medical detective”

Read: Are generalists better at scaling the web?

4. Enabling the growth of modern cities

The discovery of the cause of Cholera prompted the city of London to undertake a huge public works project, building sewers that would flush waste water out to sea. Truly, it was this discovery and its solution that enabled much of the population density we see in the 21st century. Without modern sanitation, cities of 20 million plus would certainly not be possible.

Also: Is automation killing old-school operations?

5. Bird Flu & modern crisis management

In recent years we have seen public health officials around the world push for Thai poultry workers to get their flu shots. Why? Because although avian flu only spreads from animals to humans now, were it to encounter a person who had our run-of-the-mill flu, it could quickly learn to move human to human.

All these disciplines of science, epidemiology and so forth direct decisions we make today. And much of that is informed by what Snow pioneered with the study of cholera.

Also: 5 Things toxic to scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest: Why I don’t work with recruiters.

Is Agile right for fixing performance issues?


I was sifting through the CTO School email list recently, and the discussion of performance tuning came up. One manager had posted asking how to organize sprints and break down stories for the process.

Join 29,000 others and follow Sean Hull on twitter @hullsean.

Another CTO chimed in with a response…

“Agile is not right for fixing performance issues.”

I agree with him & here’s why.

1. Agile roadblocks

At a very high level, agile seeks to organize work around sprints of a few weeks, with sets of stories within those sprints. The assumption here is that you have a set of identified issues. With software development, you have features you’re building. With performance tuning, it’s all about investigation.

How long will it take to solve the crime? Very good question!

Also: 5 things toxic to scalability

2. Reproduce problem

Are you seeing general site slowness? Is there a particular feature that loads extremely slowly? Or is there a report that runs forever? Whatever it is, you must first be able to reproduce it. If it’s general site slowness, identify when it is happening.
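If it is general slowness, a quick script over your access logs can narrow down the when. Here’s a rough sketch; it assumes a combined-style log where the last field is the request time in seconds, so adjust the parsing to your own format.

```python
# Count slow requests per hour from an access log (format assumptions above).
from collections import defaultdict

def slow_requests_by_hour(logfile, threshold=1.0):
    counts = defaultdict(int)
    with open(logfile) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 4:
                continue
            try:
                elapsed = float(fields[-1])    # request time in seconds
            except ValueError:
                continue
            if elapsed >= threshold:
                # timestamp like [12/Mar/2015:14:05:01 +0000] -> hour "14"
                parts = fields[3].split(":")
                hour = parts[1] if len(parts) > 1 else "?"
                counts[hour] += 1
    return dict(counts)

print(slow_requests_by_hour("/var/log/nginx/access.log"))
```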

Related: Are SQL Databases dead?

3. Search for bottlenecks

Once you’ve reproduced your problem, next you want to start digging. Looking at logfiles can help you find errors, such as timeouts. The database has a slow query log, which you’ll definitely want to review. Slow queries can be surfaced by new code deploys, or by middleware in front of your database, such as an ORM.

If you find your logfiles aren’t enabled, a good first step is to turn them on. Also look at how you’re caching. The browser should be directed to cache, assets should be on a CDN, a page cache should protect your application server, and an object cache should sit in front of your database.
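If the slow query log is off, you can flip it on at runtime without a restart. A minimal sketch, with placeholder connection details; note the setting won’t survive a restart unless you also add it to my.cnf.

```python
# Enable the MySQL slow query log on the fly (requires SUPER privileges).
import pymysql

conn = pymysql.connect(host="db.example.com", user="admin", password="secret")
with conn.cursor() as cur:
    cur.execute("SET GLOBAL slow_query_log = 'ON'")
    cur.execute("SET GLOBAL long_query_time = 1")  # log anything over 1 second
    cur.execute("SHOW VARIABLES LIKE 'slow_query_log_file'")
    print(cur.fetchone())                          # where the log is written
conn.close()
```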

Read: Is five nines a myth that just won’t die?

4. Find the root cause

As you dig deeper, you’ll likely uncover the root of your scalability problem. Likely causes include synchronous, serial or locking processes & requests, object relational modelers, lack of caching, or new code that has not been tuned well.
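Once a suspect query surfaces in the slow query log, EXPLAIN will usually point at the root cause, such as a missing index forcing a full table scan. A sketch with placeholder connection details & an illustrative query:

```python
# Run EXPLAIN on a suspect query and eyeball the access plan.
import pymysql

conn = pymysql.connect(host="db.example.com", user="app",
                       password="secret", database="shop")
with conn.cursor() as cur:
    cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = 42")
    for row in cur.fetchall():
        print(row)   # look for type=ALL (full scan) and a large rows estimate
conn.close()
```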

Also: Did Airbnb, reddit, heroku & flipboard have to fail?

5. Optimize

This is what I think of as the fun part. You’ve measured the issues, found the problem. Now it’s time to fix it. This is an exciting moment, to bring real benefit to the business. Eliminating a performance problem can feel like springtime at the end of a long cold winter!

Also: Is zero downtime even possible on RDS?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest: Why I don’t work with recruiters.