Are top candidates evaluating your startup?

I work for a lot of startups. Many ask me for referrals. I play matchmaker when I can. But as the market continues to heat up, the demand for top talent is reaching a boiling point.

Join 29,000 others and follow Sean Hull on twitter @hullsean.

1. It’s a seller’s market

That means folks with technical skills across the spectrum are very much in demand. How in demand? Check AngelList, Made In NY or Indeed.com. From SREs to full stack developers, devops & automation experts to DBAs. Java, Ruby, Python, PHP, Node.js, and of course design skills too.

I was speaking with a recruiter just today, and heard the same refrain…

Top candidates are evaluating us just as we are evaluating them.

That means firms must go the extra mile to stand out, and draw in the best talent.

Also: 5 Things toxic to scalability

2. Open the glassdoor

That’s right, manage your social media presence. Sites like Glassdoor provide forums where employees past & present can discuss the day-to-day work environment. This gives prospects a chance to peer behind the curtain.

Other social media can be avenues too, from Facebook to Twitter. Having someone on staff who monitors online reputation can be crucial.

Related: Are SQL Databases dead?

3. Host a tech blog & meetups

A lot of top firms have great tech blogs. Truth be told, many are dormant, as demands of the day trump these outward-facing initiatives. But a blog also puts a face on the technical side of working for a firm. What problems are they solving? How cutting edge is their team?

Meetups are also a limitless forum. Smart minds will be mixing, your company brand will be spreading. Hosting technical discussions brings your firm front & center in multiple ways. It also brings possible new hires to your living room.

Read: Is high availability a myth?

4. Show warmth & transparency

I know everybody loves to grill candidates at interviews. But interviewers should be schooled on politeness & how to give a pleasant interview.

I remember one interview where I faced off with four other engineers at a round table. As the discussion unfolded, each aimed shots in succession, almost rapid fire at me. It was not only intimidating, but frustrating. Needless to say it made me a stronger, more resilient interviewee, but it’s not a great way to welcome great talent. Buyer beware!

Also: The chaos theory of cloud scalability

5. Show me the money

I know, I know, for engineers it’s not all about the money. Or is it? Truth be told, compensation is always something prospects will weigh. Equity is fine, for what it is. But it’s a promise into the future.

More senior talent, who have been through a few startups or even dot-com 1.0, may be a bit more dubious of abstract compensation. In the end, competitive real dollars will speak volumes.

Also: Is upgrading Amazon RDS like a shit-storm that will not end?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

The chaos theory of cloud scalability

The.Rohit - Flickr

Reading Benedict Evans’ weekly newsletter, you’re bound to bump into something new & useful. His newsletter covers mobile, but that also means it touches on a lot of other areas of tech, innovation & startups.

This week he pointed me to A Weissman’s The Chaos Theory of Startups. He argues a VC’s job is to help a startup identify the right framework. It’s about finding the signal in the noise.

I think you can carry this idea over to technical operations today. There are a few key maxims I follow to keep you on the scalable track.

1. Degrade gracefully

You’ve heard it before, but have you done anything about it?

Build a read-only or browse-only mode into your application. Do it now. You will thank me. When your database goes down unexpectedly (with RDS this might happen sooner than you think), you want to be able to use your lovely read-only slave database. Browse-only mode forces developers to add read-only support to most application functions, keeping the site up and running without a full, visible and ugly outage.

Which brings me to a related point: be sure to have copies of your production database. Real, live, read-only copies. In Amazon speak this is a read replica; in MySQL it is a slave database. Most startups I see these days have this, but if you’re one of the ones dragging your feet, do this now.
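To make the idea concrete, here is a minimal sketch of a data layer that drops into browse-only mode when the primary fails. Everything in it is an assumption for illustration: the `Database` class, the `PrimaryDown` and `ReadOnlyMode` exceptions, and the two stand-in connection functions are hypothetical, not part of any real driver.

```python
class PrimaryDown(Exception):
    """Raised by the (hypothetical) primary connection when it is unreachable."""


class ReadOnlyMode(Exception):
    """Raised when a write is attempted while the site is browse-only."""


class Database:
    """Sends traffic to the primary, falling back to a read-only replica."""

    def __init__(self, primary, replica):
        self.primary = primary    # callable(sql) -> rows, may raise PrimaryDown
        self.replica = replica    # callable(sql) -> rows, read-only copy
        self.read_only = False    # flipped on the first primary failure

    def read(self, sql):
        if not self.read_only:
            try:
                return self.primary(sql)
            except PrimaryDown:
                self.read_only = True   # degrade instead of failing the page
        return self.replica(sql)

    def write(self, sql):
        if self.read_only:
            raise ReadOnlyMode("site is in browse-only mode")
        try:
            return self.primary(sql)
        except PrimaryDown:
            self.read_only = True
            raise ReadOnlyMode("site is in browse-only mode")


# Stand-ins for real connections, used only to exercise the sketch.
def broken_primary(sql):
    raise PrimaryDown()


def healthy_replica(sql):
    return [("widget", 9.99)]


db = Database(broken_primary, healthy_replica)
rows = db.read("SELECT name, price FROM products")  # served by the replica
```

The application catches `ReadOnlyMode` in its write paths and shows a friendly "try again later" message, while every read path keeps working off the replica.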

Also: Is the difference between dev & ops a four-letter word?

2. Monitor & measure

Amazon’s CloudWatch is fine for what it is, and so is New Relic. But employing a dedicated tool just for monitoring, such as Nagios & Cacti, can give you much more granular intelligence about what’s happening with your infrastructure. Nagios gives you the monitoring & alerting, Cacti gives you the history. It’s like a BI reporting tool for infrastructure.
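Nagios checks are small programs that report a status through their exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. A minimal sketch of the threshold logic such a check might use follows; the function name, the thresholds and the replication-lag example are assumptions for illustration.

```python
# Exit codes used by Nagios plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_threshold(value, warn, crit):
    """Map a metric to a Nagios-style (exit_code, message) pair.

    A real plugin would print the message and call sys.exit(exit_code).
    """
    if value is None:
        return UNKNOWN, "UNKNOWN - no data"
    if value >= crit:
        return CRITICAL, f"CRITICAL - value={value}"
    if value >= warn:
        return WARNING, f"WARNING - value={value}"
    return OK, f"OK - value={value}"


# Hypothetical example: replication lag in seconds, warn at 30, page at 120.
code, message = check_threshold(45, warn=30, crit=120)
```

Feeding the same metric to a grapher like Cacti gives you the history, so you can tell whether a warning is a blip or a trend.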

Related: Is automation killing old-school operations?

3. Keep components simple

Keep it simple stupid? Don’t adopt new technologies, languages, or versions of software, without first vetting them. Ask questions:

o Is there an existing piece of software or service that can overlap this new one, killing two birds with one stone?
o Does everybody know this new technology?
o Does this choice of technology solve any other broad problems we have?
o Is there a large community around the project?
o Are there a lot of engineers with experience in this chosen technology?

Tellingly, many startups don’t have an operations person to start with. In those, the danger is developers choose new solutions, with no push back.

I asked… Does a four letter word divide dev and ops?

Read: Do managers underestimate operational cost?

4. Don’t force database abstraction

Object Relational Mappers, aka database middleware, are great in theory. We want a library that takes database & SQL drudgery away from developers. Why reinvent the wheel?

The trouble is database independent code doesn’t work, and never has. ORMs are painfully inefficient, selecting all columns, or repeatedly reading rows from tables. This causes serious traffic jams inside your database.

They come in various guises: CakePHP, Active Record for Ruby, Hibernate for Java, SQLAlchemy for Python.
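The two habits described above, selecting every column and re-reading rows one at a time, are easy to see side by side. This is a sketch using Python’s built-in sqlite3 and hand-written SQL that imitates the pattern. It is not the output of any particular ORM, and the tables and helper names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, bio TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER,
                        title TEXT, body TEXT);
    INSERT INTO authors VALUES (1, 'ann', 'long bio'), (2, 'bob', 'long bio');
    INSERT INTO posts VALUES (1, 1, 'first', 'long body'),
                             (2, 2, 'second', 'long body'),
                             (3, 1, 'third', 'long body');
""")

query_count = 0


def run(sql, params=()):
    """Execute a statement and count it, so the two styles can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def titles_lazy_style():
    """SELECT * on the parent, then one more SELECT * per row (N+1 queries)."""
    result = []
    for post in run("SELECT * FROM posts ORDER BY id"):
        author = run("SELECT * FROM authors WHERE id = ?", (post[1],))[0]
        result.append((post[2], author[1]))
    return result


def titles_hand_written():
    """One query, only the two columns the page actually needs."""
    return run(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    )


lazy_rows = titles_lazy_style()
lazy_queries = query_count        # 1 query for posts + 1 per post

query_count = 0
joined_rows = titles_hand_written()
joined_queries = query_count      # a single round trip
```

Both functions return the same rows. With three posts the lazy version issues four queries, and the gap grows with every row added.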

5. Be asynchronous

This means don’t make your application code wait. Make asynchronous calls to APIs & check back later. Use software queues so traffic backups don’t clog your components & communication.

Avoid any type of two-phase or multi-phase commit. These are common in clustered databases, forcing a serialization point so nodes can agree on what data looks like. However they’re toxic to scalability. Look for technologies that use eventually consistent algorithms.
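A software queue can be sketched in a few lines with Python’s standard library. This is a toy, in-process version, assuming a hypothetical signup handler that needs to send a welcome email. In production the queue would usually be a separate service, but the shape is the same: the request handler enqueues and returns, a worker does the slow part.

```python
import queue
import threading

jobs = queue.Queue(maxsize=100)   # bounded, so a backlog cannot grow forever
sent = []                         # stands in for the side effects of slow work


def worker():
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        sent.append(f"emailed {job}")   # stand-in for a slow API call
        jobs.task_done()


def handle_signup(user):
    """Request handler: enqueue the slow work and answer immediately."""
    jobs.put(user)
    return "202 Accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

responses = [handle_signup(user) for user in ("ann", "bob")]

jobs.put(None)   # tell the worker to stop
jobs.join()      # wait here only so the example finishes deterministically
```

The handler never waits on the email call. If the downstream API slows down, the queue absorbs the backlog instead of tying up web requests.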

Is upgrading RDS like a shit-storm that will not end?

Can RDS worsen an outage? That’s another way to think about this question. In my experience, it very clearly increases outages, by tying one or both hands behind your back. Believe me when I say, that is terribly frustrating when you’re putting out fires!

1. Changing Parameters

An everyday occurrence is the need to change database parameters. Want to enable a log? Great, no problem. Except in RDS it becomes a problem! Ok, you’re thinking, why is that?

In regular MySQL, you log in with the shell & issue SET GLOBAL parameter = value; Nice, easy, straightforward. No servers restarting, no nonsense. If the parameter requires a reboot, MySQL will tell you.

In RDS, the process is waay more complex. First you edit a parameter group. You can copy an existing one, or change the one you’re using. If that parameter group applies to many servers, be careful!

Ok, what next? Now you APPLY that new parameter group. You can do so immediately, or during the next maintenance window. Here’s the tricky part. Is Amazon going to restart my instance? That’s something your boss or manager will surely ask you. Well, you might think it would only do so if the parameter in question required it. But I tried to enable the general log recently and Amazon showed a status of “pending-reboot”. This change shouldn’t require that! I was sitting there scared that Amazon might suddenly decide to reboot a production server for no reason!

This is where you feel you’ve lost control. You can dig through docs all you want, but you can’t ever say for sure if a managed service will behave predictably. There are already more layers of software between you and your relational database. Not what you want.
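One way to take some of the guesswork out before you apply a group is to look at how each parameter is classified. This is a sketch only. It assumes the parameter list has the shape that boto3’s `describe_db_parameters` call returns (dicts carrying `ParameterName` and an `ApplyType` of `dynamic` or `static`). The sample rows are made up for illustration and no AWS call is made. It says nothing about what RDS will actually do, which is the whole complaint above.

```python
def needs_reboot(parameters, changes):
    """Return the changed parameter names that are not marked dynamic.

    Anything missing from the listing is treated as needing a reboot,
    which is the conservative assumption.
    """
    apply_type = {p["ParameterName"]: p.get("ApplyType") for p in parameters}
    return sorted(name for name in changes if apply_type.get(name) != "dynamic")


# Illustrative data only, shaped like the API response.
sample_parameters = [
    {"ParameterName": "general_log", "ApplyType": "dynamic"},
    {"ParameterName": "innodb_log_file_size", "ApplyType": "static"},
]

risky = needs_reboot(sample_parameters, ["general_log", "innodb_log_file_size"])
```

If the list comes back non-empty, plan for a restart and schedule the change accordingly.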

Also: Did MySQL & Mongo have a beautiful baby called Aurora?

2. How much longer?

Another question you’ll ask yourself is, how long will this maintenance take? With MySQL at the command line, you can run through test after test & time the process. When you go to perform tasks off-hours, you already have a clear picture.

With RDS, things can’t be predicted. Servers are restarted when they needn’t be. Rebuilds take forever, and you have no progress bar. EBS performance has a hiccup and your snapshot time doubles. The troubles go on and on.

Related: Is automation killing old-school operations?

3. Why did Amazon just force an OS upgrade?

Here’s another surprise I ran into. Again we have a managed solution, so Amazon must take opportunities when they can. But you pay for it in unpredictability.

I was going to perform a MySQL 5.1 to 5.5 upgrade, and I’d run through test after test in advance. I timed the process at about 45 minutes. Then I went to do it in production. Amazon decided to throw in the OS upgrade too, adding 40 minutes of surprise time. What’s worse? No progress bar on that either.

Upgrades are nerve wracking enough, without this kind of stuff scaring the daylights out of you.

Read: Do managers underestimate operational cost?

4. What’s happening on my server?

All of the questions about progress are opaque on RDS because you lack command-line access. You can’t watch processes, disk I/O or any of the granular stuff. In my surgery analogy below, it’s as though you can’t touch the patient, find their pulse or gauge if their skin is cold, clammy or pale.

Also: Is the difference between dev & ops a four-letter word?

5. Surgery with blunt instruments

At the end of the day, RDS feels like surgery with blunt instruments. If command line were your scalpel, windows & GUI tools may be your remote video surgery. And worse still, RDS would be like doing surgery on the Opportunity Mars rover, after it’s landed & stuck in a valley. Everything is delayed, it’s hard to tell what’s going on, and the worst environment to work in when you have an emergency with your database.

If you have any operations experience, deploy your own MySQL on an EC2 instance. You’ll thank yourself later.

Also: Is zero downtime even possible on RDS?

Upside to RDS

Is there any upside? Why do people use it? Push-button replication, check. Push-button Multi-AZ, check. Those are great if you have no DBA. Automated backups so you don’t shoot yourself in the foot, check.

I guess there is *something* to love.

Did Dropbox, Samsung, BestBuy & TimeWarner have to fail in 2014?

It was a blockbuster year for web outages in 2014. From retailers to cable providers, it seems everyone had their 15 minutes of fame.

How can we protect our web property from the same fate? Here’s a short list of suggestions.

1. How much traffic can my site handle?

If you’re not already asking this question, you should be.

Answer: Load test well in advance

Load testing simulates what happens when the internet hordes descend on your website like a plague of locusts. That rush can be a good thing if you’re having a special sale or promotion. Or possibly a bad thing if you’re the victim of a denial of service attack.
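Dedicated load testing tools do far more, but the core loop is simple enough to sketch. The harness below is a hypothetical one written for this post: `load_test` and `fake_page` are made-up names, and `fake_page` just sleeps where a real test would issue an HTTP request against a staging copy of your site.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(request_fn, total_requests, concurrency):
    """Call request_fn total_requests times from `concurrency` threads."""

    def timed_call(_):
        start = time.perf_counter()
        try:
            request_fn()
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    errors = sum(1 for ok, _ in results if not ok)
    # Simple percentile pick; fine for a sketch, not for a formal report.
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return {"requests": len(results), "errors": errors, "p95_seconds": p95}


def fake_page():
    """Stand-in for fetching a page; replace with a real request."""
    time.sleep(0.001)


report = load_test(fake_page, total_requests=50, concurrency=10)
```

Raise `concurrency` step by step and watch where errors appear or the 95th percentile latency climbs. That knee is roughly how much traffic the site can handle.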

Also: Is there a devops talent gap?

2. How quickly can I bring online more capacity?

A more nuanced version of number one: once you know what your site can handle, you need to know whether you’re ready for more.

Answer: Auto-scale & test.
Answer: Scale your database tier well in advance

Autoscaling is a feature popularized by Amazon Web Services, which couples monitoring your traffic with thresholds that will kick off more capacity. This capacity may be new webservers deployed to sit behind your load balancer. Whatever your method, be sure to test carefully.
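The threshold logic itself is simple. Here is a sketch of one common rule: size the pool so that average CPU lands near a target. The function name, the 60% target and the size limits are assumptions picked for illustration, and this is not a description of how Amazon’s own service computes capacity.

```python
import math


def desired_capacity(current, avg_cpu, target_cpu=60.0, min_size=2, max_size=20):
    """Return how many webservers to run so average CPU lands near the target."""
    wanted = math.ceil(current * avg_cpu / target_cpu)
    # Clamp so a bad metric can neither empty the pool nor run up the bill.
    return max(min_size, min(max_size, wanted))


busy = desired_capacity(current=4, avg_cpu=90)      # scale out
quiet = desired_capacity(current=4, avg_cpu=15)     # scale in, but keep the floor
runaway = desired_capacity(current=10, avg_cpu=300) # capped at max_size
```

Note what this does not cover: the database tier. Adding webservers only helps if the database behind them can take the extra connections, which is why it has to be scaled well in advance.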

Related: Is zero downtime even possible on RDS?

3. Cache, cache, cache!!

Optimizing your site means caching wherever you can.

Answer: Use an object & page cache.

If you’re not already using memcached, Redis or ElastiCache, you should be. These object caches sit in between your application & your database, holding frequently accessed data & reducing load on your central database server.

Consider a page cache too, like Nginx or Varnish. These act as sort of tiny webservers that are very low overhead, low memory & fast response. They provide a buffer in front of your main application server, and can service simpler but repetitive web requests.

A third caching layer sits in the db itself, called a query cache. This is implemented differently on Oracle, SQL Server, MySQL and Postgres, but the concept is similar. Cache frequent queries, so they don’t need to be rerun each time.
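All three layers follow the same pattern: look in the cache first, and only go to the slower source on a miss. A minimal in-process sketch of that pattern, with a time-to-live, is below. The `TTLCache` class and `load_user` function are made up for the example. A real deployment would talk to memcached or Redis through a client library instead of a Python dict.

```python
import time


class TTLCache:
    """Cache-aside helper: return a fresh cached value or load and store it."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so the example can control time
        self.store = {}       # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        entry = self.store.get(key)
        now = self.clock()
        if entry and entry[0] > now:
            return entry[1]               # hit: the database is not touched
        value = loader(key)               # miss or expired: go to the source
        self.store[key] = (now + self.ttl, value)
        return value


db_calls = []


def load_user(user_id):
    """Stand-in for a database query; records each call."""
    db_calls.append(user_id)
    return {"id": user_id, "name": "ann"}


now = [0.0]                                # fake clock for the example
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])

first = cache.get_or_load(7, load_user)    # miss, hits the database
second = cache.get_or_load(7, load_user)   # hit, served from memory
now[0] = 31.0
third = cache.get_or_load(7, load_user)    # expired, reloads
```

Three lookups cost two database calls here. On a busy page the ratio is far more lopsided, which is the whole point.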

Read: Do managers underestimate operational cost?

4. Degrade gracefully

Answer: Build a browse only mode

If you’ve ever logged into an airline website & selected your flight, only to run into trouble during checkout, you know what a browse-only mode is! Many sites build this, along with other feature flags & toggles to allow them to degrade gracefully, that is keep the site up and running, while essential services may be inoperable.

Every production site should prioritize for this. Sooner or later you’re gonna need it!
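Feature flags do not need a framework to get started. Here is a sketch with a plain dictionary. The flag names and the `render_product_page` function are hypothetical, and a real site would keep the flags somewhere operators can flip them without a deploy.

```python
# Hypothetical flags; True means the feature is available.
FLAGS = {"browse": True, "checkout": True, "reviews": True}


def feature_enabled(name):
    """Unknown flags default to off, the safe choice during an incident."""
    return FLAGS.get(name, False)


def render_product_page(product):
    parts = [f"<h1>{product}</h1>"]
    if feature_enabled("checkout"):
        parts.append("<button>Buy now</button>")
    else:
        parts.append("<p>Checkout is temporarily unavailable.</p>")
    return "".join(parts)


normal = render_product_page("widget")

FLAGS["checkout"] = False   # e.g. the payment service is down
degraded = render_product_page("widget")
```

The page still renders with checkout switched off. Visitors can keep browsing while the broken piece is repaired.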

Also: Why you need a performance dashboard like StackExchange

5. Try CloudFlare

CloudFlare is a service for protection from denial of service attacks. Integrate their servers into your infrastructure & a monitoring process begins on each packet you receive. If a denial of service attack occurs, your site can be throttled down to reduce the impact & keep things online. This protection can double for heavy legitimate traffic, keeping your website online in a worst-case scenario.
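To show what "throttled down" can mean in practice, here is a sketch of a token bucket, a common rate-limiting technique. It is offered as a general illustration only. It is not a description of how CloudFlare works internally, and the class and its parameters are made up for the example.

```python
class TokenBucket:
    """Allow short bursts, then hold callers to a steady rate."""

    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = float(burst)   # start full
        self.last = 0.0              # timestamp of the previous call

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.last
        self.last = now
        # Refill for the time that passed, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_second=1, burst=3)
burst = [bucket.allow(0.0) for _ in range(4)]   # the fourth request is refused
later = bucket.allow(2.0)                       # tokens have refilled by now
```

Requests over the limit get a fast refusal instead of piling onto the application, which is what keeps the rest of the site responsive under a flood.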

Also: Big data scientist interview questions
