Category Archives: War Stories

Are engineering orgs like Google so different from sales driven ones like Oracle?

Over the years I’ve worked with more than 100 different organizations. After two decades in the industry, you see a lot of things. Some businesses are more engineering heavy, while others are more sales driven.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

So this past week, I was somewhat surprised because I met with two very different organizations, and the contrast stood out dramatically to me. Pando Daily called it the Clash of Cultures.

I wonder, will we ever learn from each other?

1. On Monday I met with CloudOne

I’m choosing a fictional name here, but the meeting was real. We met over lunch to discuss how we might work together. Their org has been around for years, has a phenomenal track record, and they are strongly sales oriented.

Some observations:

o They’re hungry. They pushed for client lists & sniffed for leads.
o They’re margin oriented. They had a clear idea of where their strong suit was and what types of customers they wanted to work with, because they had a clear idea of their margins.
o They understand the industry well, much better than I did.
o They could certainly talk circles around me in terms of industry categories & verticals.
o They glossed over technical details
o They made broad generalizations & mixed up facts at times

2. On Thursday I met with DataOne

Here again I’m choosing a fictional name. We met over dinner to discuss my opinions of the market and also if I might have any venture leads or could make introductions.

Some Observations I came away with:

o Their company is all engineering.
o They’re intimately focused on coding & building the product.
o They downplayed product limitations & seemed somewhat out of touch with the customer.
o They seemed to be feeling around in the dark for investors
o They seemed to have a weak network

3. Org experience: LearnOne

One of my past customers, also given a fictional name here, was an incredibly sales heavy organization.

Some Observations:

o Their monthly standups felt like a sporting huddle.
o Lots of rah rah rah & high fives
o They were extremely sales driven, growing rapidly
o They had tremendous problems around engineering.
o They seemed to be boxing wayyy above their weight class.

4. Cross-cultural studies

As a consultant I find this all fascinating. It often seems like this cultural style is driven from the top. The big movers are the ones who shape the organization.

I think of Google as an incredible example of an engineering driven organization. Their hiring is all about math & problem solving, with little emphasis on personality. Meanwhile their products lack UI polish, but are functionally accurate & always fast.

Contrast that with Oracle, which sends in a heavy armament of perfect suits to close a deal, negotiates softly until your firm is locked in, then jacks up the license fees until you bleed. Meanwhile, although the product is a sturdy technical construction, it’s the marketing that is truly smooth & polished.

5. The takeaway

A winning team needs both. I’m obviously born of the engineering camp, but I agree with Ben Horowitz that the new enterprise customer is much like the old enterprise customer. And yes sales matters more than ever before.

At the same time the engineering team needs to carry equal weight, and decisions for both teams need to be framed as tradeoffs for the other.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

When You Have to Take the Fall

One of the biggest jobs in operations is monitoring. There are so many servers, databases, webservers, search servers, backup servers. Each has lots of moving parts, lots that can go wrong. Typically if you have monitoring, and react to that monitoring, you’ll head off bigger problems later.

A problem is brewing

The operations team & I started receiving alerts for one server. Space was filling up. Anyone can relate to this problem. You fill up your Dropbox, or the drive on your laptop, and all sorts of problems will quickly bubble to the surface.

As we investigated over the coming days, we found that a complicated chain of processes and backups was using space on this server, space that didn’t belong to them.

Dinner boils over

What happened next was inevitable. The weekly batch jobs kicked off and failed for lack of space. Those processes were not being monitored. Business units then discovered missing data in their reports and a firestorm of emails ensued.
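This is the kind of failure a pre-flight check can catch. Here is a minimal sketch in Python, assuming a hypothetical batch job that knows roughly how many bytes it is about to write. The function names and the 10% reserve are illustrative assumptions, not what this team actually ran.

```python
import shutil


def enough_space(total, free, needed, reserve_fraction=0.10):
    """Return True if `needed` bytes fit while keeping a reserve free.

    The 10% default reserve is an arbitrary assumption for illustration.
    """
    reserve = total * reserve_fraction
    return free - needed >= reserve


def preflight(path, needed, reserve_fraction=0.10):
    """Check the filesystem holding `path` before a batch job writes to it."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return enough_space(usage.total, usage.free, needed, reserve_fraction)
```

A weekly job could call `preflight` on its output directory first and refuse to start, and send an alert, when it returns False. That way the missing space shows up as a monitored event rather than as missing data in a report.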

Why weren’t these services being monitored, they wanted to know.

Time to shoot the messenger

The company had recently seen a changing of the guard, with a couple of key positions left vacant, and it was clear that the root problem was communication.

I followed up the group emails, explaining in a polite tone that we did in fact have monitoring in place, but that a clear chain of command seemed to be missing, and this process had fallen through the cracks.

I quickly received a response from the CTO requesting that I not send “these types of emails” to the team and to direct issues directly to him.

A consultant’s job

As the sands continued to shift, a lead architect did emerge, one who took ownership of the products overall. He acted as a sort of lifeguard with a higher perch from which to watch. We were able to escalate important issues to him & he would then prioritize the team accordingly.

Sometimes things have to break a little first.

What’s more, a consultant’s job isn’t necessarily to lead the pack, nor to force management to act. A consultant’s job is to provide the best advice possible & to raise issues to the decision makers. And yes, sometimes it means being a bit of a fall guy.

Those are the breaks of the game.

Upcoming for Scalable Startups

Just back from the Labor Day holiday, and ready to dive back in.

I thought this would be a great time to outline some of our upcoming topics so here goes…

1. Why Oracle usability sucks

– a rant about Oracle’s weak points

In the meantime take a peek at our piece on why we wrote the book on Oracle & Open Source. We ruminate on trends in the datacenter and take a stab at Oracle’s future.

2. Why relational databases don’t scale

– Is there any such thing as automatic scalability?
– What blocks scalability?
– Are NoSQL databases magic?

Also one of our articles that went viral – 5 things toxic to scalability

3. Eternal tension between dev & operations

– origin in different job roles & priorities
– balance found in each appreciating the other’s point of view
– hiring the best or building the right culture

You might enjoy a wildly popular piece we wrote a few months back How to hire a developer that doesn’t suck.

4. MySQL Query Tuning Cheatsheet

– SQL queries are hellish to tune, that we know.
– An outline of some of the common patterns that don’t work will help you identify and avoid them.

5. Differentiation in professional services

– A commodity does not stand out in the services business
– Differentiation is about personality, relationships & how you solve problems for the business

Also in the meantime take a look at our professional services 101 guide.

Beware the sales wolf in sheep suits

Recently a colleague called me up to get my opinion.

[quote]We’re in the process of standardizing our systems on Red Hat Linux, but management and higher ups are convinced we should deploy Oracle on Oracle’s own Linux distribution. Which is better?[/quote]

Therein lies the eternal drama in organizations, the push & pull between dollars and technology best practices.

We had a similar experience with a MySQL deployment and a solution framed by Oracle sales.

Battle lines are drawn

Clearly the battle lines are drawn now. They run between the director of operations & team on one side and management & business stakeholders on the other, between the high level and the trenches, or between the systems that support your business and the day-to-day running of them.

Business units & management are tasked with budgets, cost management, and long term thinking about trajectory and what’s best for the business. Operations teams are tasked with the day-to-day stability, the command line perspective.

What is the sales team’s position?

Sales guys at Oracle have a job to sell licenses. This isn’t good or bad, it’s their driver. Understanding all the drivers will help us align them.

Sales guys sell to management, so they will likely frame all their stories around management concerns. Oracle’s history here is also fairly clear: get customers locked into Oracle up and down the stack, and they become more and more beholden to Oracle as their primary provider. As customers become more dependent, Oracle will squeeze more and more out of them.

Nothing personal, this is how money is made. But understand the goal.

How do OS choices affect the business bottom line?

Standardizing across the enterprise reduces costs & reduces operational complexity. This can reduce the risk of operator error & other downtime, which increases in a more heterogeneous environment.

On the Oracle distribution side, you likely have tweaks to make Oracle run better. However don’t forget the profit motive. Some tweaks may be conveniently “overlooked” in favor of profit. For example for many years the Oracle installer would not complete without error on many Linux systems. Imagine all the professional services that are sold around running through a complex install. Streamlining such an install would *reduce* profits. Don’t laugh.

What happens on the front lines?

On the front lines of course are the ops teams & DBAs, actually installing and supporting enterprise software. Let’s not forget these guys are at the command line. They know inordinately more about what’s really happening down in the trenches. You may find them repeatedly rolling their eyes at salesmen’s claims.

However they are not the colorful storytellers or communicators that salesmen are, so they may

Align each division’s interests

Despite cultural differences, business management & operations teams should work hard to connect, and align with one another.

Operations should make an effort to better understand the business bottom line. Money doesn’t grow on trees as they say, and choices have to be based on budget, and real-world needs. We’d all like to sit in a university and program or build things just to create something new, but in a business there are market pressures. All teams should reflect on those.

Management should also make an effort to understand the ops teams’ needs. Why are my ops teams telling me a different story than the Oracle sales guys? Fight the urge to bond with the sales folks, despite their smooth delivery, great suits and peer positioning.

Weigh short and long term tradeoffs

List out advantages & tradeoffs on all sides. These should be technical and business bullet points. Brainstorming a full list like this, and having the whole team discuss the list openly will help the team together come up with a more realistic outcome. Some questions to ask…

1. What are the advantages & disadvantages of having multiple providers for your technology stack?
2. Which solutions are open and which are proprietary? What are the tradeoffs there?
3. What does your team have subject matter expertise in?
4. Are there real technical advantages to one solution or the other?
5. Are there real cost advantages to one solution or the other?
6. Are there expertise advantages & training savings to go one direction?
7. Is the technology widely used in your industry? Will additional or replacement operations experts be easy or hard to find?

Juggling apples & oranges in the datacenter


In which a few choice words become one serious accident…

The Backstory

More than five years ago now, I worked for a shop in the business of news & information around the legal and real estate sectors. It was a fairly large organization with a number of Oracle and MySQL backed applications. The whole place ran on Sun servers, with a team of systems administrators, developers, and of course editors & content folks.

I was the primary database administrator for almost an entire year back then. I reported directly to the CTO. She was bright, competent and great to work for.

Although she had a technical background, she often spoke about products and gave very high level directives when making requests. This was made more confusing as the environment lacked naming conventions. So often product names didn’t match server or database names.

I tended to take the very paranoid approach. I’d ask over and over for clarification, and let some time pass before actually executing on a request.

A Changing of the Guard

After many months as a contractor DBA, the firm finally located a full-time guy to replace me. It’s no easy task finding a DBA these days, especially for MySQL.

He was a very bright guy with a lot of technical knowledge. A bit green, but fully capable of managing an enterprise database shop.

Nuking the database

After two weeks on the job, something unpleasant happened.

Imagine a chef working with cooks & confusing dishes with vegetables.

[quote]
Chef says, “Toss the avocado”
Cook throws the avocado salad in the trash thinking it’s rotten.
Chef comes back later asking quizzically, “I wanted you to mix it up!”.
[/quote]

In the datacenter the conversation went something like this…

[quote]
CTO: Drop the journal database & rebuild.
DBA: Ok. Give me a few minutes
CTO: What did you do? The whole application is offline now!
[/quote]

From there scrambling ensued. After nearly six hours of screaming and firefighting, everything was finally restored from backups and the application brought back online.

Naming – product or components?

Semantics is very important. Those in the trenches tend to take requests word-for-word while those managing the troops tend to make requests in terms of products, divisions & the vantage point of the business.

That’s why naming conventions can be so important. You don’t want to be talking about apples when you really mean oranges.

Living with dysfunction

As environments grow over years and years, they tend to evolve into a spaghetti of confusing names & relationships. It’s the nature of enterprise environments.

– big confusion can mean big mistakes
– check & recheck – be risk averse and a bit paranoid
– check yourself, your shell, your hostname, your login
– ask questions & clarify repeatedly
– let some time pass before executing a destructive command
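
The last three habits can be partly built into tooling. Below is a rough sketch in Python of a guard that shows the login and hostname, waits a little, and then insists the operator type the exact target name and host before anything destructive runs. Everything here is hypothetical: the function names, the `journal_db` and `db-prod-01` examples, and the ten second pause are assumptions for illustration, not the shop’s actual setup.

```python
import getpass
import socket
import time


def expected_phrase(target, host):
    """Phrase the operator must type, e.g. 'journal_db@db-prod-01'."""
    return f"{target}@{host}"


def is_confirmed(typed, target, host):
    """True only if the operator typed the exact target and host."""
    return typed.strip() == expected_phrase(target, host)


def confirm_destructive(action, target, cooldown_seconds=10):
    """Show where we are, pause, then demand an exact typed confirmation."""
    host = socket.gethostname()
    user = getpass.getuser()
    print(f"About to {action} '{target}' as {user} on {host}")
    time.sleep(cooldown_seconds)  # let some time pass before acting
    typed = input(f"Type {expected_phrase(target, host)} to proceed: ")
    return is_confirmed(typed, target, host)
```

Typing the full name forces the word-for-word question: is this the database, or the product that happens to share its name? A guard like this does not replace asking for clarification, it only slows the hand down.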

Sometimes… let things break a little

Have you ever started a new project, and just as you get into it realized that maybe there aren’t technical problems to solve? It starts to dawn on you that the real crux of the problem boils down to people & processes.

It’s happened to me on a number of occasions, but once in particular really stands out for me.

I was working for a firm in the education space, in particular around test preparations.

Asked to automate a publish process

The environment had a mix of relational databases, from SQL Server to MySQL for some applications. The web facing database however used Oracle on the backend.

Their career DBA was real old guard Oracle, he had his ways of doing things, and didn’t want to rock the boat. In particular he managed the process for publishing changes to the website. Publishing amounted to running a few hand rolled scripts and each step was a manual one.

With the process set up this way, the editors had to work closely with engineering each time they wanted to move content to the website. Slow, cumbersome, and not very workable.

The real problem, siloed departments & infighting

As I worked closely with the DBA, quite a few things became clearer. For one, he was sometimes a grumpy fellow & he had a strong accent which was sometimes hard to cut through. Knowing the other team members, I knew this all contributed to the trouble. But he maintained quite a bit of resistance to automation of the process no matter what. His view was, if he handed over the reins to editors who didn’t understand the technology, they’d screw something up, make a mess, and that would ultimately create more work for him. After all that he’d be doing it manually anyway!

Further attempts to communicate between teams or even between the managers and this guy went nowhere.

We weren’t trained for this in engineering school

When you’re looking for a technical solution and you realize the bigger issue is a people problem, what do you do? You can gently bring it up with the higher ups, but they may have a different style, or prefer the shout-and-bark-orders mode of management.

Things continued to go around in circles, and attempts to get further information from this DBA didn’t prove fruitful. He was protective of his domain, and fought tooth & nail against opening up.

Things come to a head

The bigger boss, the one above the manager I reported to, one day called me into his office. Actually it was an off day; I was just stopping by to check in on progress.

He pulls me into his office. Since I rarely interacted with the guy, my guard was very much down. I thought we’d have a chat about the weather or perhaps who was going to win the world cup.

He proceeds to tear into me without warning. Practically screaming, he’s giving me a piece of his mind and not stopping to hear what I have to say. Where is this project going, why is there no progress, we’ve got serious deadlines, you’re pushing us right up to the wall … that kind of thing.

As I listened to him fire away at me, I realized some of this had gotten filtered through some sources, who didn’t completely understand the blocking issues either. That it wasn’t technical challenges, but rather people and processes that weren’t working. As I began to explain this, he stopped me and said:

[quote]Sean, your job is to push, push, push and push us more. Rock the boat if necessary. When I was a consultant I was constantly running around making sure everyone was talking to each other.[/quote]

A few things ran through my mind at that point. One was well he’s not a consultant, so either he couldn’t last in it, or the life didn’t appeal to him. Or perhaps his skill sets ran truer to management in a large firm, a different albeit tougher role to master.

But it also occurred to me that different folks have very different styles. Some like to push, prefer confrontation & believe that leads to resolution, while others are more listeners and find their way around a problem by giving everyone a chance to voice their positions.

My style is the latter, while his was clearly the former.

The fallout

What ended up happening is a project manager also got assigned to the project, as well as another manager. The PM liked working with me very much, and as things unfolded, much of the departmental siloing began to dissolve, and the bigger communication problems began to surface. From there solutions followed.

Lessons learned

– beware the status quo – some don’t really want to rock the boat
– communicate your position, but beware that others may have a different style
– you may be asked to support a path you see as the wrong one
– let things “break a little bit” so everyone learns the hard lessons
– getting burned can be a lesson for the whole team, and lead to new solutions

Where’s my 80 million dollars?

Way back in the heyday of the dot-com boom, the year is 1999.

I worked for a medium size internet startup called Method Five. When I came on board they were having a terrible time with their site performance.

Website crashing

When I first met the team, I was tasked with performance problems. After all their flagship web property kept crashing, and it didn’t look good to investors. As with most web properties in those days it was a home-grown datacenter in the back of the office, running on Sun Microsystems hardware, with Oracle on the backend and Apache serving webpages.

Negotiating an acquisition

As became clear after day one, the project was particularly sensitive. They were negotiating a huge acquisition by a firm called Xceed Corp. The sticking point? Their crashing website did not show their technology prowess in a particularly positive light. To say the least!

Investigation

As it turns out the site had all the right players, from systems administrators to a DBA who sat watch over the Oracle systems.

As I dug into the systems, I found a serious smoking gun. It seems the Oracle software was configured to use just 5M of memory out of about 256M free. Just like MySQL, the server must be configured to use available memory upon startup. There are myriad caches and buffers which need to be attended to. By today’s standards these numbers probably sound absurd. Nevertheless the DBA wasn’t familiar with the basic memory settings, and so the system was terribly bottlenecked.
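
To put numbers on that, 5M out of 256M is roughly 2% of the free memory. Here is a rough sketch of the sanity check in Python. The function names and the 25% threshold are assumptions made up for illustration, and the two byte counts would have to come from the database configuration and the operating system, which this sketch does not read.

```python
MB = 1024 * 1024


def cache_fraction(configured_bytes, available_bytes):
    """Fraction of available memory the database is configured to use."""
    if available_bytes <= 0:
        raise ValueError("available_bytes must be positive")
    return configured_bytes / available_bytes


def is_underconfigured(configured_bytes, available_bytes, min_fraction=0.25):
    """Flag a server whose caches use less than `min_fraction` of free memory.

    The 25% default is an arbitrary illustrative threshold, not a tuning rule.
    """
    return cache_fraction(configured_bytes, available_bytes) < min_fraction
```

With the figures from this story, `cache_fraction(5 * MB, 256 * MB)` comes out just under 0.02, so the check would have flagged the server long before anyone blamed the hardware.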

Problem Solved

We then ordered some urgent changes to the system, configuring all of Oracle’s caches to use up the precious memory available.

Immediately the website unlocks, transactions begin flowing, and webpages are returning quickly. End users pull their noggins off their keyboards, and the executives begin breathing a sigh of relief. The site was literally 1000x faster during peak.

Acquisition

Shortly thereafter the acquisition goes through for a cool 5 million in cash and 80 million in stock.

Where’s my cut?! You might be asking that question. But my policy is almost always to defer to something concrete and tangible, aka fees and real compensation. I did not negotiate any stock in the deal.

Lessons Learned

o Don’t believe received wisdom. Check and double check what’s really happening.
o Use the memory and resources you have available.
o Measure capacity, and isolate bottlenecks in the system
o Decouple services wherever possible
o Problems are as often about people and process as they are about technology

You're Too Young To Be My Boss

About a year ago I engaged with a firm to do some operations work on their site. They provided services to colleges and universities.

When they first reached out to me, they were rather quick to respond to my proposal. They seemed to think the quote was very reasonable. I also did some due diligence of my own, checking the guy’s profile on the about page. I noticed he was 25, rather young, but I didn’t think much else of it.

We discussed whether they wanted fixed hours. Since those would limit my availability we both agreed a more flexible approach made sense. This worked well for me as I tend to shift and schedule time liberally, so I can be efficient & flexible with clients, but still have a life too.

Trouble Brewing

As we began to interact the first week, I sensed something amiss. My thought was that the first week you work with a client, they feel you out. They see how you work, when you work, how much gets done and so forth. This provides a benchmark with which to measure you. If either party is unhappy with how things are going, they discuss and make adjustments accordingly.

What was happening in this case was the guy started pestering me. I began to get incessant messages on instant messenger asking for updates. I had none. I explained that I would contact him as things were completed, or if I had questions.

This was only two days into the project. I’d barely gained access to the servers!

The Fever Pitch

After discussing my concerns on the phone, the gentleman kind of glossed them over. From there the pestering continued. I explained that I could not be available to him any hour of the day, while the engagement only provided for one half of a week. This began to interrupt me from other client work, so I had to sign off of instant messenger. Not good.

The Pot Boils Over

We spoke again on Monday briefly, and decided to connect the following day. From there the pestering began anew, and I began to lose my patience. I insisted that we speak on the phone before work would continue. I felt the problem was deteriorating and discussing over text would only make things worse.

He emailed me back as I was then offline. In his email he ordered me to come online. While he sat in a meeting, he explained, he could not take a call! Nevertheless he insisted we resolve it during the meeting. Distracted no less.

[quote]It was then that I started receiving text messages on my personal mobile phone from the guy, pestering me to get online so we could resolve our communication problem! You can’t make this stuff up![/quote]

The Fallout

Eventually we did both get on the phone, and I explained I had reached my wit’s end. After only ten short days of working together, we had both set strong precedents and they were obviously not compatible. He asked if I would stay on longer, and reconsider working together, and I said I would think about it.

I chose not to dig a deeper hole, and let him know I wouldn’t be invoicing for the previous week’s work.

The Lessons

o beware age differences – in our case an 18 year gap
o pay attention to management styles – self-starters don’t need micromanaging
o be patient & keep communicating
o allow for an exit strategy that is amenable to both parties

A CTO Must Never Do This…

A couple years back I was contacted to look at a very strange problem.

The firm ran flash sales. An email goes out at noon, the website traffic explodes for a couple of hours, then settles back down to a trickle.

Of course you might imagine where this is going. During that peak, the MySQL database was brought to its knees. I was asked to do analysis during this peak load, and identify and fix problems. Make it go faster, please!

On my first day on the job I was working with a team of outsourced DBAs. I was also working with a sort of SWAT team chatting on Skype, while monitoring the systems closely.

Then up popped one comment from a gentleman I hadn’t worked with. He insisted there was contention for a little known MySQL resource called the AUTO_INC lock. Since I wanted to know more, I asked who the guy was and to my surprise he turned out to be the CTO.

[quote]The CTO was tuning and troubleshooting the database![/quote]

Wow, that’s a first. I thought I’d seen it all. A CTO is normally overseeing technology & the team rather than crawling around in the trenches on the front line.

This all raised some important points

1. The app was having major growing pains
2. Current architecture was not scaling
3. Amazon elasticity was not helping at the database layer
4. People & process were also failing, hence the CTO’s hands-on approach

It was shocking to see a problem deteriorate to this point, but when you consider the context it’s understandable. A company like this is struggling with hypergrowth to such a degree that each day seems like a hurricane. With emergency meetings, followed by hardware & application emergencies, trouble seems constant. It can be very difficult to step back and see the larger picture.

The takeaway from this experience…

o Amazon EC2 can’t do it all – consider physical servers for disk intensive apps
o MySQL still has some real scalability limitations
o use technology for its intended purpose – MySQL isn’t great for queueing
o A CTO tuning the database means problems have deteriorated too far
