Thinking deeply about Amazon cloud & infrastructure code

If you’re building anything in the public cloud these days, you’re probably using some automation. There are a lot of ways to reach the goal posts, and a lot of tools to choose from.

Join 38,000 others and follow Sean Hull on twitter @hullsean.

In my case I’ve put Terraform to use over and over again. I’ve built VPCs, public & private subnets, and bastion boxes for mobile apps covering mental health & fitness, building security, and two-factor authentication.

I’ve chosen Terraform because it has a vibrant & growing community, the usability is miles ahead of CloudFormation, and it can work in a multi-cloud environment.

But this article isn’t about choice of tools. I’m curious about this one question:

“What architectural considerations should I keep in mind as I build my infrastructure code?”

Here are my thoughts on that one…

The VPC is your fundamental container

Everything you build sits inside of a VPC. Your entire stack references back to those variables, including vpc-id, private and public subnet IDs.

Here’s an example where you can get into trouble. Digging through some infra code, reviewing with a new devops hire, we were going through everything with a fine-toothed comb. We found that the RDS instance was being deployed in a PUBLIC subnet, instead of a private one.

Alerted to the problem, we first checked to see whether it was accessible from the internet at large. It wasn’t, because we had not exposed a public facing IP. That said it wasn’t the most secure setup and I wanted to fix it.

I made some changes to the Terraform code, to update the subnet to private, and tried “$ terraform apply”. Then I got all sorts of errors. Try as I might, this update would not work.

Sadly the long term solution was to destroy the entire stack, and rebuild with RDS in the right place. Lesson learned.
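To make the private placement explicit from day one, here’s a minimal sketch of an RDS instance pinned to private subnets through a subnet group. It assumes Terraform 0.12 or later. The variable names, identifiers, engine, and instance size are placeholders I picked for illustration, not values from the stack in this story.

```hcl
# Hypothetical inputs: the private subnet IDs of your VPC and DB credentials.
variable "private_subnet_ids" {
  type = list(string)
}

variable "db_username" {
  type = string
}

variable "db_password" {
  type = string
}

# A subnet group that contains ONLY private subnets.
resource "aws_db_subnet_group" "private" {
  name       = "app-private-db-subnets"
  subnet_ids = var.private_subnet_ids
}

# The database is placed in that group and never given a public address.
resource "aws_db_instance" "app" {
  identifier           = "app-db"
  engine               = "mysql"
  instance_class       = "db.t3.micro"
  allocated_storage    = 20
  username             = var.db_username
  password             = var.db_password
  db_subnet_group_name = aws_db_subnet_group.private.name
  publicly_accessible  = false
  skip_final_snapshot  = true
}
```

Getting the subnet group right before the first apply matters, because as the story above shows, moving the database afterwards may not be a simple update.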

Related: When you have to take the fall

Why I discovered a shared or utility vpc was so useful

o story of placing ELK inside an application vpc

Related: Before you do infrastructure as code, consider your workflow

Think carefully about domains

As you build your application, you’ll likely need a route 53 zone associated with it. And you’ll want a CNAME in front of your load balancer, so it’s easy for customers to hit your endpoint.

1. rebuilding stack means new zone & new nameservers

If your registrar is elsewhere, you’ll need to update nameservers each time you destroy & build the zone. This happens even if you host the domain at AWS. And it can’t be automated at the moment.

You could also have the zone created *outside* terraform. Then your terraform code would reference that zone by its zone ID and add CNAMEs to it. This is another possible pattern.

The pattern I prefer is to have each vpc & stack have their own unique top level domain. That way terraform can cleanly create and destroy the whole stack and nothing is comingled.
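For the pattern where the zone lives outside Terraform, a rough sketch might look like the following. It assumes Terraform 0.12 or later, and the variable names and the "app" record name are hypothetical. The zone is looked up with a data source, so a destroy removes only the record and leaves the zone and its nameservers alone.

```hcl
# Hypothetical inputs: the existing zone name (e.g. "example.com")
# and the DNS name of your load balancer.
variable "zone_name" {
  type = string
}

variable "lb_dns_name" {
  type = string
}

# Look up the zone that was created outside this Terraform code.
data "aws_route53_zone" "shared" {
  name = var.zone_name
}

# Add a CNAME in front of the load balancer inside that zone.
resource "aws_route53_record" "app" {
  zone_id = data.aws_route53_zone.shared.zone_id
  name    = "app.${var.zone_name}"
  type    = "CNAME"
  ttl     = 300
  records = [var.lb_dns_name]
}
```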

Related: I tried to do infrastructure as code, it didn’t go as I expected

Enable easy create & destroy

Each time you tear up your work and rebuild, you test the whole process. This is good. Iron out those hiccups before they cause you trouble. After some time, you’ll be able to move your entire application stack, db, ec2 instances, vpc & network resources from one region to another easily & quickly.

After doing this a few times, you’ll start to learn what resources in AWS are region specific. And which ones are global.
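One small habit that helps with region moves is keeping the region in a variable instead of hardcoding it. This is only a sketch, assuming Terraform 0.12 or later; the variable name is my own choice.

```hcl
# Hypothetical variable: the region the whole stack is built in.
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

# Every region-specific resource inherits the region from the provider.
provider "aws" {
  region = var.aws_region
}
```

Rebuilding elsewhere then starts with something like `terraform apply -var 'aws_region=us-west-2'`, and whatever fails first tells you which resources were quietly tied to the old region.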

Remember, don’t allow any manually created objects or resources inside your automated ones. If you aren’t strict here, you’ll hit errors when you try to destroy, and then have to troubleshoot those one-by-one.

Related: How to setup an Amazon ECS cluster with Terraform

Automate first

I was building an ELK server setup to centralize our logging infra. Everything worked pretty well. After a time, I added some more S3 buckets for load balancer log ingestion.

Later we hit a problem, where the root volume was filling up. This stopped new logs from appearing. So we rebuilt the ELK server with a 10x larger root volume. As we had used terraform and ansible, the rebuild was easy. And quickly we had our logging system back online.

A week or so later though we had trouble again. It seemed that some of the load balancer log data wasn’t showing up. We spent a day troubleshooting, and eventually found out why. Those S3 buckets weren’t being ingested.

Turns out when we added those, we added them to the config file *directly* on the server, but not in the configuration scripts.

Moral of the story…

“Always update the automation scripts first, and apply those to the server. Don’t work on the server directly.”

Related: Are you getting good at Terraform or wrestling with a bear

Beware of account limits

As you’re building your stack in us-east-1, you may later go and try to create another copy of it. Suddenly AWS complains that there are no VPCs left. Or you’ve hit a maximum of 20 EC2 instances. While these errors may be irritating, you should be glad to have them. With them in place, an errant piece of infra code or application cannot accidentally run up your account and leave you with a surprise bill.

That said you should be mindful of those limits, and increase them before your application hits a wall.

A few that I’ve run into:

o 20 ec2 instances per region
o 5 Elastic IPs per region
o 5 VPCs per region

If your application requires more, prepare to switch regions, or up those service limits. You can open a support ticket to do that.

Related: How do you interview for key AWS skills?

What resources live on past a stack build/destroy cycle

As you build your stack with infrastructure code, you’ll tear it down again often. Each time you do this, you’ll be reminded of one thing. Any data inside there will be gone.

That means for starters don’t store things in the filesystem. Store them in a database. RDS is great for this purpose. Then the question becomes, when I destroy my stack, how can I backup and restore my database. RDS does support this, but if you have more nuanced requirements, you may have to build your own backup & restore.

What about cloudwatch logs? As long as your stack doesn’t destroy those resources, they’ll be kept in perpetuity for you. You may want to further back them up.

What about your load balancer logs? Here you can create the S3 log bucket *outside* of your infra code, in which case it won’t get cleaned up during a destroy. Alternatively you can create a meta bucket for load balancer logs, and copy those over regularly. Then when you clean up your infra, you can do a bucket destroy with the --force option.
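If the per-stack log bucket is managed by Terraform, one way to get the same forced cleanup from Terraform itself is the bucket’s force_destroy argument. This is a sketch assuming Terraform 0.12 or later, and the variable name is hypothetical. Only do this once the logs you care about have been copied to the longer-lived bucket.

```hcl
# Hypothetical variable: name of the per-stack load balancer log bucket.
variable "lb_log_bucket_name" {
  type = string
}

# force_destroy lets "terraform destroy" delete the bucket even when it
# still contains objects. Without it, the destroy fails on a non-empty bucket.
resource "aws_s3_bucket" "lb_logs" {
  bucket        = var.lb_log_bucket_name
  force_destroy = true
}
```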

Related: Is Amazon too big to fail?

Some things remain manual but can be made easier

One thing that remains manual in AWS is the SSL certificate creation. You can “request” a new certificate and select DNS validation.

When you do this, incorporate the certificate control cname and certificate control record into the infra code as variables. Then copy/paste these two values from your certificate dashboard.

Assuming your nameservers are pointing to aws, the certificate validation code should spot the above secret control record in DNS. When it does it will conclude that you control the zone, and therefore validate your SSL certificate.

Once it shows VALID on the dashboard, copy the SSL certificate ARN, and pop that into your terraform code. You will add it as part of an SSL listener to your ALB (application load balancer) configuration.
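Here’s a minimal sketch of that last step, wiring the validated certificate into an HTTPS listener. It assumes Terraform 0.12 or later, and all three variable names are hypothetical. The certificate ARN is the value you copied from the dashboard.

```hcl
# Hypothetical inputs: the ARN copied from the certificate dashboard,
# plus the ARNs of an existing ALB and target group.
variable "certificate_arn" {
  type = string
}

variable "load_balancer_arn" {
  type = string
}

variable "target_group_arn" {
  type = string
}

# HTTPS listener that terminates SSL on the ALB with that certificate.
resource "aws_lb_listener" "https" {
  load_balancer_arn = var.load_balancer_arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = var.target_group_arn
  }
}
```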

Related: Does AWS have a dirty little secret?

Don’t just monitor metrics

Monitoring is of course important. You’ll want to setup a prometheus server that can do discovery. This allows it to dynamically configure and learn about new servers, so it can monitor those too. It does this by using the AWS API to find out what is there to monitor.

All of this monitoring is crucial, but it applies mainly to server metrics. What is the load average, CPU utilization, memory or disk usage?

What about your log counts? As I describe above, a logstash misconfiguration meant that log records didn’t show up. However this was only noticed through manual discovery. We want that to be automatically discovered!

Do that by creating checks that count records, and alert to numbers that are

You can also validate other data with checksums, by creating your own custom methods. You’ll need to think intelligently about your application, and what type of checks make sense.

Related: How I resolved some tough Docker problems on Amazon ECS

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Has Apple jumped the shark or is app complexity out of control?

I was recently using my mac. I say that in jest. Because like all digital creatures, moderns, everyone with a pulse these days I suppose, like all those people I use my mac everyday.

So when I see something strange, it’s rather jarring. Because when you use a device everyday, you become quite familiar with it. And with that familiarity comes comfort.

So imagine my surprise when I was doing a task I do very often, simply trying to add a contact from email into my address book, and I got the lovely and strange dialog you see above.

Now to most users it might not scare them much. I mean you know, just click one or other. Simple right? Wrong!

1. Click DELETE

As a long time database admin, I know that before you ever delete *ANYTHING* you make a backup first. Now in the above scenario, the backup menu item is blanked out. You cannot do a backup first. Unless you go to system backups.

Of course I did that, and that took me on a multi-hour wild goose chase, that is still not resolved.

Related: Why Fred Wilson was wrong about Apple

2. Click KEEP

Why not choose that option? Well for me my contacts database is sort of sacred. It’s where I keep track of everyone I’ve ever worked with. Now I never gave Linkedin permission to write anything to my contacts database, so who knows if they got in there, or how. But I certainly don’t want them in there. If I click keep, how the *heck* will I ever get thousands of contacts out of my database again?

Related: Why the Android ecosystem *was* and maybe still *is* broken

3. Where’s my shark?

Sadly, this is the state of computing these days. I don’t know if I was accidentally tricked into *allowing* linkedin to dump its contacts into my database, or if an upgrade changed some default, sneaking the data in there through a side channel. Whatever the case, I’m stuck now.

And Apple was supposed to protect us from this craziness. Apple where’ve you gone?

Related: Do Linux & Apple tell the Gilgamesh story of hacker culture?

4. Complexity is a hard beast to wrestle

This may be the ultimate culprit. As more apps hit the app store, and more codepaths lie dormant, more bugs, or call them surprise features, lie undiscovered.

With all this power we have in our hands, it seems it is devolving by the day into a bigger and bigger mess.

Related: Is Apple betting against big data?

Are you getting good at Terraform or wrestling with a bear?

Terraform can do some amazing things, but it can be a real headache sometimes. It can remind you that it’s a fledgling child in some ways.

I ran into a number of errors and frequent problems, so I thought I’d summarize those solutions.

Hope this helps you guys & girls out there wrestling with bears!

1. Problems with module source syntax

module "test-iheavy" {
  source = "https://[email protected]/iheavy_automation.git"
}

Simple and innocuous looking right? Terraform doesn’t know how to even argue!

✦ terraform init   
Initializing modules...
- module.test-iheavy
  Getting source "https://[email protected]/iheavy_automation.git"
Error downloading modules: Error loading modules: error downloading 'https://[email protected]/iheavy_automation.git': Get /account/signin/?next=/account/signin/%3Fnext%3D/account/signin/%253Fnext%253D/account/signin/%25253Fnext%25253D/account/signin/%2525253Fnext%2525253D/account/signin/%252525253Fnext%252525253D/account/signin/%25252525253Fnext%25252525253D/account/signin/%2525252525253Fnext%2525252525253D/account/signin/%252525252525253Fnext%252525252525253D/account/signin/%25252525252525253Fnext%25252525252525253D/iheavy/iheavy_automation.git%2525252525252525253Fterraform-get%2525252525252525253D1: stopped after 10 redirects

Well it's just terraform speaking in its friendly way!

Change the source line and add "git::" and you're all set:

module "test-iheavy" {
  source = "git::https://[email protected]/iheavy_automation.git"
}

Related: How do I migrate my skills to the cloud?

2. Trouble with S3 buckets?

S3 buckets are a real pain with infrastructure code. First time around you create them, and you're happy to move on. But later you try to destroy that infrastructure and rebuild, and inevitably your bucket has files in it.

Other scenarios include where separate infra code has created a shared bucket that you want to access.

The nature of S3 buckets means they are shared across infra, but terraform doesn't like to play in others' sandboxes.

One solution I've found that works well is to add an enable/disable flag.

resource "aws_s3_bucket" "sean-bucket" {
    # Create the bucket only when the enable flag is true
    count = "${var.enable-sean-bucket ? 1 : 0}"
    bucket = "${var.sean-bucket-name}"
}

You'll also need to add an entry to your vars.tf file:

variable "enable-sean-bucket" {
  default = "false"
}

Then inside your main.tf you can either enable it, or disable or leave at default without setting it at all.

module "test-iheavy" {
  source = "git::https://[email protected]/iheavy_automation.git"

  enable-sean-bucket = true
}

Related: How to use terraform to setup vpc and bastion box

3. Play nice with git

Your .gitignore file will help you if only you put it to use.

.terraform*
terraform.tfstate*
*~

Notice it's not just ".terraform". Sometimes terraform creates other .terraform-xyz directories, so if you just ignore .terraform you'll later get junk committed to your git repo. Ugh.

Same for the state files, it creates other backup ones, and weird versioned ones.

The "*~" is because emacs writes backup files ending in ~

Related: How to setup an amazon ecs cluster with terraform

My answers to some CS career questions

I flip through Reddit from time to time. One sub that I enjoy is CS Career Questions.

I guess I enjoy posting and sharing knowledge there, because many of the questions seem so familiar, as ones that I pondered at some point or other along the way.

Here are a few questions, and the answers I shared.

1. I’m lacking direction

“I’m a 20 year old college student about to switch from an unrelated major into CS this upcoming fall. I’ve decided that I want to become a software developer because of my interest in game and web development. Also because I love mathematics and problem solving. I know the basics of Lua, HTML, CSS, Javascript, and PHP and try to get some programming in every day (at least 1 hour)”

My response:

“Paths will change. Just finish. Which degree you end up with does not matter. It will be how you apply it. Gates, Dell, Zuckerberg & Jobs all quit school right where you are now to pursue real world business.”

Related: What’s the luckiest thing that’s happened in your career?

2. Overwhelmed with amount of work, I want to quit my job. Advice?

“Long story short. My current manager is completely unrealistic in terms of the amount of work to be done in a short amount of time. I keep working overtime every day and weekends to fix issues and I’m tired of it. I want to go to some training and take vacation soon and while my company approved my training, I’m afraid to ask for vacation while this project needs to be done ASAP.
I’m just feeling so overworked that I really need vacation ASAP.
I’m so tempted to quit my job or start another job hunt”

My response:

“I found myself in a similar situation a few years ago. And a colleague advised me “Sean, sometimes you have to let things break a little”. This seemed incredibly odd advice. However when I tried it I was very surprised. Management didn’t “blame it all on me” as I expected they would. In fact they didn’t blame anything on me, merely adjusted their timelines.
Lesson learned, we cannot carry the entire org’s problems on our own shoulders. And no one is expecting us to.”

Related: Is Amazon Web Services too complex for small dev teams?

3. Possible red flags in startup? How can I know for sure?

“Basically my question is this: what questions should I be asking to know what I need to know? The main thing I’m afraid of is the engineering manager treating the engineers like dogshit, where we work insane hours and don’t really have control over what we work on. How can I coax that information out of them?”

My Answer:

“I would trust your gut. I have worked in companies that were all over the place organizationally but there were no weird tells on Glassdoor.
Also as far as hours you are expected to work, keep the emails for documentation. Remember there are also labor laws protecting W-2 employees so you’re fine. Just leave at 5 :)”

Related: What makes a highly valued Docker expert?

4. I accepted a job, then got a better offer from another company

“So…I accepted and started a new job…but 1.5 weeks later I hear back with an even better offer from a larger company I applied to 4 months ago.”

My Answer:

“It is a tough position to be in, but also a “good problem to have”.
Don’t burn bridges. But business is business as they say. You could ask if they want to counteroffer. But there may be bad blood now.”

Related: Curve ball technology questions and solutions

5. Am I negotiating, or just being greedy?


“I landed a job with a big name company (non-Big 4). The offered pay is a solid 25% jump from my old job, the team is what I’ve been looking for, 10% annual bonus, etc. Should I ask for more money? Every dime will obviously make my life easier and I certainly don’t want to fall behind on my career pay as a whole.”

One reader’s response:

“If it makes you uncomfortable to ask for more money, you might also have a think about what you really value and negotiate for that instead. Maybe that’s two extra weeks of vacation?”

My Answer:

“Two extra weeks vacation is money. Negotiate the best you can. Only you can advocate for yourself”

Related: 6 Devops interview questions

Is on-demand consulting the answer to your hiring woes?

A consultant costs more per hour than a developer you can hire right? That depends!

A big firm may cost a few thousand per day. But a smaller firm or one-man shop can bring you savings in line with a team hire.

1. You’re still looking

Have you been looking for 3 months? 6 months? You might find someone. But maybe never? I

Also: Is the difference between dev & ops a four-letter word?

2. On-boarding takes forever

***

Related: Is automation killing old-school operations?

3. Fulltime hires quit

I’ve worked at a few firms where the fulltime hires quit within a few months. Why? One was a very mismanaged team. They were juggling a lot of technical debt & lacked leadership direction. Devs were frustrated and morale was suffering.

At another firm the CTO left. A new one replaced him who started throwing his weight around. Many of the old team members got fed up & left.

In all these cases a consultant will still be there, working day-by-day, getting things done. I wrote about this in How do we measure devotion.

Read: Do managers underestimate operational cost?

4. Halftime need

Smaller demand? Perhaps your capacity isn’t a full 40-hour week. Then an on-demand hire is really ideal.

Also: Is the difference between dev & ops a four-letter word?

5. Hit the ground running

Of course the biggest advantage is quicker on-boarding. You can expect productive work right away. That’s because a solo consultant has a lot of experience jumping right into the fray, and making an impact right away.

Also: Is the difference between dev & ops a four-letter word?

Essential links this week

Here’s some links & interesting stuff I’ve stumbled on this week. Enjoy!

1. Start coding

Looking to start coding? Take a look at Open source for beginners. It’s a graphical list of projects on github, great for beginners!

Also: 30 questions to ask a serverless fanboy

2. DIY Serverless

Interested in serverless & wanna dig past the hype? Take a look at this Functions as a Service howto which shows how to build a Lambda-type offering in Kubernetes or Docker Swarm. Cool yo!

Related: Learning from the Dyn DNS outage

3. Serverless Use Cases

Curious when & where Amazon Lambda might make sense? Any and all microservices? Here’s a newstack article on viable use cases for serverless computing.

Read: Does Amazon Redshift have a dirty little secret?

4. Origami design software

Random, weird, and kinda cool! Robert Lang has designed some Origami software called TreeMaker. It replaces the pencil & paper method of designing new origami figures. Use the software to push the limits of paper folding further!

Also: My DIY Disqus.com hack for blog discovery

5. A distributed relational database that works?

Bloomberg LP has designed a relational database called Comdb2. Unlike many of its NoSQL peers, this distributed database is relational, speaks SQL, and is also highly available. Amazing!

Also: Are SQL databases dead?

Some irresistible reading for March – outages, code, databases, legacy & hiring

I decided this week to write a different type of blog post. Because some of my favorite newsletters are lists of articles on topics of the day.

Here’s what I’m reading right now.

1. On Outages

While everyone is scrambling to figure out why part of the internet went down … wait, is S3 part of the internet, really? While I’m figuring out if it is a service of Amazon, or if Amazon is so big that Amazon *is* the internet now…

Let’s look at s3 architectural flaws in depth.

Meanwhile Gitlab had an outage too in which they *gasp* lost data. Seriously? An outage is one thing, losing data though. Hmmm…

And this article is brilliant on so many levels. Not least because Matthew knows that “post truth” is a trending topic now, and uses it in his title. So here we go, AWS Service status truth in a post truth world. Wow!

And meanwhile the Atlantic tries to track down where exactly are those Amazon datacenters?

Also: Is Amazon too big to fail?

2. On Code

Project wise I’m fiddling around with a few fun things.

Take a look at Jeff Geerling’s Ansible on a Mac playbooks. Nice!

And meanwhile a very nice deep dive on Amazon Lambda serverless best practices.

Brandur Leach explains how to build awesome APIs aka ones that are robust & idempotent

Meanwhile Frans Rosen explains how to 0wn slack. And no you don’t want this. 🙂

Related: 5 surprising features in Amazon’s serverless Lambda offering

3. On Hiring & Talent

Are you a rock star dev or a digital nomad? Take a look at the 12 best international cities to live in for software devs.

And if you’re wondering who’s hiring? Well just about everyone!

Devs are you blogging? You should be.

Looking to learn or teach… check out codementor.

Also: why did dev & ops used to be separate job roles?

4. On Legacy Systems

I loved Drew Bell’s story of stumbling into home ownership, attempting to fix a doorbell, and falling down a familiar rabbit hole. With parallels to legacy software systems… aka any older than oh say five years?

Ian Bogost ruminates why nothing works anymore… and I don’t think an hour goes by where I don’t ask myself the same question!

Also: Are we fast approaching cloud-mageddon?

5. On Databases

If you grew up on the virtual world of the cloud, you may have never touched hardware besides your own laptop. Developing in this world may completely remove us from understanding those pesky underlying physical layers. Yes indeed folks containers do run in “virtual” machines, but those themselves are running on metal, somewhere down the stack.

With that let’s not forget that No, databases are not for containers… but a healthy reminder ain’t bad.

Meanwhile Larry’s mothership is sinking…(hint: Oracle) Does anybody really care? Now’s the time to revisit Mike Wilson’s classic The difference between god and Larry Ellison.

Read: Are SQL Databases Dead?

Launch Festival 2016 Tickets for San Francisco event

One of the biggest startup festivals of the year LAUNCH is coming up next week in San Francisco.

Are you feeling lucky? If so enter to win tickets to Launch 2016. The winner will get a pass to the event, a one-on-one with Jason Calacanis, a one-year RushTix pass which seems pretty damn cool, and lastly a Dining on Reserve pass.

Nice!

1. Launch Festival 2016

The Launch Festival is a creation of Jason Calacanis. Formerly one of New York’s own, he started Silicon Alley Reporter way back in the dot-com era v1.0. Remember that? After some huge successes here, he moved on to become a huge figure in the Silicon Valley scene & the bay area.

Past events have included folks like Paul Graham & Mark Cuban & this year’s event is shaping up to fill Fort Mason Center to capacity again.

Also: Why is everyone suddenly talking about Amazon Redshift?

2. RushTix Membership

RushTix is a membership based way to discover local artists, concerts & events in the bay area. As a member you get comped tickets to all sorts of cool events. Check it out!

Related: Which tech do startups use most?

3. Dining on Reserve

Reserve consolidates restaurant discovery, reservations, and payment all in one smartphone app. What’s more you can use it at restaurants in a few big cities, like our own New York, LA, SF, Philadelphia, Boston & DC. Not bad!

Related: Is data your dirty little secret?

Best of Scalability, Speed & Performance posts

Twitter IPO

Why did the Twitter IPO filing mention scalability?

It’s been a while since the twitter IPO, and they’ve had their ups and downs. An interesting little side note in the IPO filing mentioned speed, performance & scalability.

5 Things toxic to scalability

Still one of our all time most popular articles, this post garnered 20,000 views alone. Covering the five biggest problems web applications face around scalability.

Pitfalls

5 Scalability pitfalls to avoid

Another twist on a popular theme, some of the common pitfalls startups stumble over on scalability.

Hire generalists

Are generalists better at scaling the web?

If you’re hiring to scale the web, think twice before hiring specialists. It may be the generalists that provide the most comprehensive help.

Scalability happiness

What one change promotes scalability happiness?

If there’s one thing that can help most websites with speed & performance, this has got to be it!

Is scalability big business?

Why is scalability such big business?

Scalability remains a challenge for many web startups. What’s the reason and does that make it big business?

Are ceos hiding scalability problems?

Are Startup CEOs hiding scalability problems?

Are there technology choices that amount to sweeping problems under the rug?

5 Ways startups misstep on scalability

Missteps abound, here are some of the biggest for startups.

Best of Scalable Startups Devops Content

I wonder if I can blog about devops without first level setting on what the term means. Yes I’ll agree it’s used broadly, sometimes as a buzzword, sometimes as a catch-all phrase. Luckily I already wrote a post like that… What is devops and why is it important?.

Fear of automation

There’s a lot of automation happening in the cloud. A lot more configuration management (chef, puppet, ansible) is in use. I’ve seen some platform as a service companies (Heroku & EngineYard are examples of these) argue that you can now spend more on devs. You won’t need an operations staff. This raises the question Is automation killing old-school ops?.

NoSQL taking over…

If you look left some startup is building on Mongodb, and look right and another is building on Cassandra. It makes you wonder, Are sql databases dead.

Death of MySQL?

While we’re on the topic of relational databases, it’s been six years since Oracle’s purchase of Sun Microsystems. Some are still worried, Will Oracle kill MySQL?

Big mistakes!

Mistakes happen in the datacenter. Sometimes *big* mistakes. You’ll cringe at When fat fingers take down your database but hopefully learn a few things about what not to do!

Hurricane lessons

Two years after a hurricane devastated lower manhattan we can still learn a lot. Real Disaster Recovery lessons from Sandy.

Db operations

Every startup has a database. You ignore that management at your own peril. I wrote 10 ways to avoid trouble in database operations.

On resistance

Another week, another war story. Sometimes the job of an op, systems administrator or DBA is actually to say “no”. In this story the CTO was shouting, and tons of money was being lost every minute. Supposedly. So I wrote Does a devop need to practice the art of resistance?

Perspectives & mandates

Ops & devs look at the world in different ways. I argue that’s because the business asks them to do very different things. Devs are tasked with bringing change, through new code & product features. Ops are tasked with continuity, stability, uptime & performance. That often means resistance to change. So I wonder Does a four letter word divide dev & ops?

Database as a service?

You’re looking at Amazon Web Services, and wondering, should I use their RDS database service or build my own MySQL? Here are 10 use cases for RDS or MySQL.

On High availability

99.999% uptime you say? Is there a myth of five nines that we’re still struggling with?

Open Source

Many custom Oracle applications could just as easily run on MySQL. But if you’re going to migrate from Oracle to MySQL, prepare to bushwhack. Open source is a jungle!

What you don’t know can hurt you…

If you’re a manager or CTO, beware what ops doesn’t tell you about MySQL.
