Tag Archives: mysql

What engineering roles are most in demand at startups?


I was just reading over StackOverflow’s 2017 Developer Survey. As it turns out, there were some surprising findings.

Join 33,000 others and follow Sean Hull on twitter @hullsean.

One that stood out was databases. In the media, one hears more and more about NoSQL databases like Cassandra, Dynamo & Firebase. Despite all that, MySQL seems to remain the most popular database by a large margin. Legacy indeed!

1. Databases

MySQL is still the most popular database by a large margin, at 56%. It’s followed by SQL Server at 39%, SQLite at 27% and Postgres at 27%.

Related: Is Amazon too big to fail?

2. Most popular language

Javascript sits at number one for web developers, sysadmins & data scientists alike, followed by SQL.

Read: Are SQL Databases dead?

3. Most popular framework

Node.js leads at 47%, followed by AngularJS at 44%.

Also: 5 ways to move data to Amazon Redshift

4. Most loved database

Redis sits at number one here at 65%, followed by Postgres & Mongo.

Also: Myth of five nines – why HA is overrated

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 core pieces of the Amazon Cloud puzzle to get your project off the ground

amazon cloud automation

One of the most common engagements I do is working with firms in and around the NYC startup sector. I evaluate AWS infrastructures & applications built in the Amazon cloud.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I’ve seen some patterns in customers’ usage of Amazon. Below is a laundry list of the most important ones.

On our products & pricing page you can see more detail including how we perform a performance review and a sample executive summary.

1. Use automation

When you first start using Amazon Web Services to host your application, you, like many before you, may think of it like your old-school hosting. Set up a machine, configure it, get your code running. That’s the traditional model of systems administration. It’s fine for a single server, but if you’re managing a more complex deploy with continuous integration, or want to be resilient to regular server failures, you need to do more.

Enter the various automation tools on offer. The simplest of the three is Elastic Beanstalk. If you’re using a very standard stack & don’t need a lot of customizations, this may well work for you.

With more complex deployments you’ll likely want to look at OpsWorks. Sounds familiar? That’s because it *is* Opscode Chef. Everything you can do with Chef & all the templates out there will work with Amazon’s offering. Let AWS manage your templates & make sure your servers are in the right state, just like hosted Chef.

If you want to get down to the assembly language layer of infrastructure in Amazon, you’ll eventually be dealing with CloudFormation. This is JSON code which defines everything, from a server with an attached EBS volume, to a VPC with security rules, IAM users & everything in between. It is ultimately what these other services utilize under the hood.
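To make that concrete, here’s a minimal sketch of driving CloudFormation from Python with boto3. The template, AMI id, key name and stack name are placeholders for illustration, not a production setup:

```python
import json
import boto3  # pip install boto3

# A tiny CloudFormation template: one EC2 instance with an attached EBS volume.
# The AMI id and key name below are placeholders.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": "t2.small",
                "ImageId": "ami-12345678",
                "KeyName": "my-keypair",
                "BlockDeviceMappings": [
                    {"DeviceName": "/dev/sdf", "Ebs": {"VolumeSize": 100}}
                ],
            },
        }
    },
}

cfn = boto3.client("cloudformation")
cfn.create_stack(StackName="demo-stack", TemplateBody=json.dumps(template))
```

The same JSON can live in version control and be fed to the console or CLI instead, which is really the point of treating infrastructure as code.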

Also: Is Amazon too big to fail?

2. Use Advisor & Alerts

Amazon has a few cool tools to help you manage your infrastructure better. One is called Trusted Advisor. This helps you by looking at your AWS usage for best practices. Cost, performance, security & high availability are the big focal points.

In order to make best use of alerts, you’ll want to do a few things. First, define an autoscaling group. Even if you don’t want to use autoscaling, putting your instance into one allows Amazon to do the monitoring you’ll want.

Next you’ll want to analyze your CloudWatch metrics for usage patterns. Notice a spike? It could be a job that is running, or it could be a seasonal traffic spike that you need to manage. Once you have some ideas here, you can set alerts around normal & problematic usage patterns.
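As a rough sketch, here’s what wiring up one such alert can look like with boto3. The autoscaling group name, threshold and SNS topic ARN are assumptions you’d replace with your own:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when average CPU across the autoscaling group stays above 80%
# for two consecutive 5-minute periods. Names & ARN are placeholders.
cloudwatch.put_metric_alarm(
    AlarmName="web-asg-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-asg"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```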

Related: Are we fast approaching cloud-mageddon?

3. Use Multi-factor at Login

If you haven’t already done so, you’ll want to enable multi-factor authentication on your AWS account. This provides much more security than a password (even a sufficiently long one) can ever do. You can use Google Authenticator to generate the MFA codes and associate it with your smartphone.

While you’re at it, you’ll want to create at least one alternate IAM account so you’re not logging in through the root AWS account. This adds a layer of security to your infrastructure. Consider creating an account for your command line tools to spin up components in the cloud.

You can also use MFA for your command line SSH logins. This is also recommended & not terribly hard to set up.
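If you’re scripting account setup anyway, something along these lines creates that alternate IAM user and attaches a virtual MFA device to it. The user name is made up, and the two authentication codes come from Google Authenticator after you load the returned seed:

```python
import boto3

iam = boto3.client("iam")

# Create an alternate user so you stop logging in as root (name is an example).
iam.create_user(UserName="ops-admin")

# Create a virtual MFA device; load its seed/QR code into Google Authenticator.
mfa = iam.create_virtual_mfa_device(VirtualMFADeviceName="ops-admin-mfa")
serial = mfa["VirtualMFADevice"]["SerialNumber"]

# Two consecutive codes from the authenticator app prove the device works.
iam.enable_mfa_device(
    UserName="ops-admin",
    SerialNumber=serial,
    AuthenticationCode1="123456",
    AuthenticationCode2="654321",
)
```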

Read: When hosting data on Amazon turns bloodsport

4. Use virtual networking

Amazon offers Virtual Private Cloud which allows you to create virtual networks within the Amazon cloud. Set your own IP address range, create route tables, gateways, subnets & control security settings.

There is another interesting offering called VPC peering. Previously, if you wanted to route between two VPCs or across the internet to your office network, you’d have to run a box within your VPC to do the networking. This became a single point of failure, and also had to be administered.

With VPC peering, Amazon can do this at the virtualization layer, without extra cost, without a single point of failure & without overhead. You can even use VPC peering to network between two AWS accounts. Cool stuff!
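Here’s a rough boto3 sketch of requesting, accepting and routing over a peering connection. The VPC ids, peer account id, CIDR block and route table id are all placeholder values:

```python
import boto3

ec2 = boto3.client("ec2")

# Request a peering connection from our VPC to a VPC in another account.
pcx = ec2.create_vpc_peering_connection(
    VpcId="vpc-11111111",
    PeerVpcId="vpc-22222222",
    PeerOwnerId="123456789012",
)
pcx_id = pcx["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# The peer side (run with the other account's credentials) accepts it.
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# Then each side routes the other's CIDR block over the peering connection.
ec2.create_route(
    RouteTableId="rtb-33333333",
    DestinationCidrBlock="10.1.0.0/16",
    VpcPeeringConnectionId=pcx_id,
)
```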

Also: Are SQL databases dead?

5. Size instances & I/O

I worked with one startup that had been founded in 2010. They had initially built their infrastructure on AWS, so they chose instances based on what was available at the time. Those were m1.large & m1.xlarge. A smart choice at the time, but oh how things evolve in the Amazon world.

Now those instance types are “previous generation”. Newer instances offer SSD, more CPU & better I/O for roughly the same price. If you’re in this position, be sure to evaluate upgrading your instances.

If you’re on Amazon RDS, you may not be able to get to the newer instance sizes until you upgrade your database. Does upgrading MySQL involve much more downtime on Amazon RDS? In my experience it surely does.

Along with instance sizes, you’ll also want to evaluate disk I/O options. By default, instances in Amazon are multi-tenant and use disk as a shared resource, so they’ll see I/O performance go up & down dramatically. This can kill database performance & can be painful. There are solutions, though they’re not cheap. Consider looking at provisioned IOPS and additional SSD storage.
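For an RDS instance, upgrading the instance class and adding provisioned IOPS can look roughly like this with boto3. The identifier, class and IOPS figure are examples, and ApplyImmediately=False defers the change to the next maintenance window:

```python
import boto3

rds = boto3.client("rds")

# Move an older instance to a newer class & provisioned IOPS storage.
# Values are illustrative; check what the current generation offers first.
rds.modify_db_instance(
    DBInstanceIdentifier="prod-mysql",
    DBInstanceClass="db.m3.xlarge",
    Iops=2000,
    ApplyImmediately=False,  # wait for the maintenance window
)
```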

Also: Is the difference between dev & ops a four-letter word?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Did MySQL & Mongo have a beautiful baby called Aurora?

amazon aurora slide

Amazon recently announced RDS Aurora, a new addition to their database-as-a-service offerings.

Here’s Mark Callaghan’s take on what’s happening under the hood and thoughts from Fusheng Han.

Amazon is uniquely positioned with RDS to take on offerings like Clustrix. So it’s definitely worth reading Dave Anselmi’s take on Aurora.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

1. Big availability gains

One of the big improvements that Aurora seems to offer is around availability. You can replicate with Aurora’s own replication, or alternatively with MySQL binlog-type replication as well. They’re also duplicating data twice in each of three different availability zones, for six copies of your data in total.

All this is done over their SSD storage network which means it’ll be very fast indeed.

Read: What’s best RDS or MySQL? 10 Use Cases

2. SSD means 5x faster

The Amazon RDS Aurora FAQ claims it’ll be 5x faster than equivalent hardware, by making use of its proprietary SSD storage network. This will be a welcome feature to anyone already running on MySQL or RDS for MySQL.

Also: Is MySQL talent in short supply?

3. Failover automation

Unplanned failover takes just a few minutes. Here customers will really be benefiting from the automation that Amazon has built around this process. Existing customers can do all of this of course, but typically require operations teams to anticipate & script the necessary steps.

Related: Will Oracle Kill MySQL?

4. Incremental backups & recovery

The new Aurora supports incremental backups & point-in-time recovery. This is traditionally a fairly manual process. In my experience MySQL customers are either unaware of the feature, or not interested in using it due to complexity. Restore last night’s backup and we avoid the hassle.

I predict automation around this will be a big win for customers.
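With RDS the restore itself is a single API call. A hedged sketch with made-up instance names, restoring to the latest restorable time:

```python
import boto3

rds = boto3.client("rds")

# Spin up a *new* instance restored to the most recent recoverable moment.
# The original keeps running; you cut over once you've verified the copy.
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="prod-mysql",
    TargetDBInstanceIdentifier="prod-mysql-restore",
    UseLatestRestorableTime=True,
)
```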

Check out: Are SQL Databases dead?

5. Warm restarts

RDS Aurora separates the buffer cache from the MySQL process. Amazon has probably accomplished this by some recoding of the stock MySQL kernel. What that means is this cache can survive a restart. Your database will then start with a warm cache, avoiding any service brownout.

I would expect this is a feature that looks great on paper, but one customers will rarely benefit from.

See also: The Myth of Five Nines – Is high availability overrated?

Unanswered questions

The FAQ says point-in-time recovery up to the last five minutes. What happens to data in those five minutes?

Presumably Aurora duplication & read replicas provide this additional protection.

If Amazon implemented Aurora as a new storage engine, doesn’t that mean new code?

As with anything, your mileage may vary, but InnoDB has been in the wild for many years. It is widely deployed, and thus tested in a variety of environments. Aurora may be a very new experiment.

Will real-world customers actually see 500% speedup?

Again, your mileage may vary. Let’s wait & see!

Related: 5 Things toxic to scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is automation killing old-school operations?

puppet logo

Join 27,000 others and follow Sean Hull on twitter @hullsean.

I was shocked to find this article on ReadWrite: The Truth About DevOps: IT Isn’t Dead; It’s not even Dying. Wait a second, do people really think this?

Truth is I have heard whispers of this before. I was at a meetup recently where the speaker claimed “With more automation you can eliminate ops. You can then spend more on devs”. To an audience of mostly developers & startup founders, I can imagine the appeal.

1. Does less ops mean more devs?

If you’re listening to a platform service sales person or a developer who needs more resources to get his or her job done, no one would be surprised to hear this. If we can automate away managing the stack, we’ll be able to clear the way for the real work that needs to be done!

This is a very seductive perspective. But it may be akin to taking on technical debt, ignoring the complexity of operations and the perspective that can inform a longer view.

chef logo

Puppet Labs’ Luke Kanies says “Become uniquely valuable. Become great at something the market finds useful.” I couldn’t agree more.

Read: Are SQL Databases Dead?

2. What happens when developers leave?

I would argue that ops have a longer view of product lifecycle. I for one have been brought in to many projects after the first round of developers have left, and teams are trying to support that software five years after the first version was built.

That sort of long term view, of how to refresh performance, and revitalize code is a unique one. It isn’t the “building the future” mindset, the sexy products, and disruptive first mover “we’re changing the world” mentality.

It’s a more stodgy & conservative one. The mindset is of reliability, simplicity, and long term support.

Also: How to hire a developer that doesn’t suck

3. What’s your mandate?

From what I’ve seen, devs & ops are divided by a four letter word.

That word I believe is “risk”. Devs have a mandate from the business to build features & directly answer to customer requests today. Ops have a mandate for reliability, working against change and thinking in terms of making all that change manageable.

Different mandates mean different perspectives.

Related: What is Devops & why is it important?

4. Can infrastructure live as code?

Puppet, along with infrastructure automation & configuration management tools like Chef, offers the promise of fully automated infrastructure. But the truth is much, much more complex. As typical technology stacks expand from load balancer, webserver & database to multiple databases, caching servers, search servers, puppet masters, package repositories, monitoring & metrics collection & jump boxes, we’re all reaching a saturation point.

Yes, automation helps with that saturation, but ultimately you need people with those wide-ranging skills to manage the complex web of dependencies when things fail.

And fail they will.

Check out: Why are MySQL DBA’s and ops so hard to find?

5. ORM’s and architecture

If you aren’t familiar, ORM is a rather dry-sounding name for a component that is regularly overlooked. It’s middleware sitting between application & database, and it drastically simplifies developers’ lives. It helps them write better code and get on with the work of delivering to the business. It’s no wonder ORMs are popular.

But as Ward Cunningham eloquently explains, they are surely technical debt that eventually must get paid. Indeed.

There is broad agreement among professional DBAs. Each query should be written, each one tuned, and each one deployed. Just like any other bit of code. Handing that process to a library is doomed to failure. Yet ORMs are still evolving, and the dream still lives on.
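To see why DBAs grumble, compare what a lazy-loading ORM typically does under the hood with the single hand-written query a DBA would tune and deploy. This is only a sketch against a hypothetical users/orders schema, using pymysql to make the query traffic explicit:

```python
import pymysql  # pip install pymysql

conn = pymysql.connect(host="127.0.0.1", user="app", password="secret",
                       database="shop")

# What a lazy-loading ORM often does behind your back: one query for users,
# then one query per user for their orders (the classic N+1 pattern).
with conn.cursor() as cur:
    cur.execute("SELECT id, name FROM users")
    for user_id, name in cur.fetchall():
        cur.execute("SELECT total FROM orders WHERE user_id = %s", (user_id,))
        orders = cur.fetchall()

# The hand-written alternative: one tuned query, one round trip.
with conn.cursor() as cur:
    cur.execute(
        "SELECT u.id, u.name, o.total "
        "FROM users u JOIN orders o ON o.user_id = u.id"
    )
    rows = cur.fetchall()
```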

And all that because devs & ops have a completely different perspective. We need both of them to run modern internet applications. Let’s not forget, folks. 🙂

Read this: Do managers and CTO’s underestimate operational costs?

Want more? Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Which tech do startups use most?

MySQL on Amazon Cloud AWS

Leo Polovets of Susa Ventures publishes an excellent blog called Coding VC. There you can find some great posts, such as pitches by analogy, an algorithm for seed round valuations, and analyzing Product Hunt data.

He recently wrote a blog post about a topic near and dear to my heart, Which Technologies do Startups Use. It’s worth a look.

One thing to keep in mind looking over the data is that these are AngelList startups. So that’s not a cross section of all startups, nor does it cover more mature companies.

In my experience startups can get it right by starting fresh, evaluating the spectrum of new technologies out there, balancing sheer solution power with a bit of prudence and long term thinking.

I like to ask these questions:

o Which technologies are fast & high performance?
o Which technologies have a big, vibrant & robust community?
o Which technologies can I find plenty of engineers to support?
o Which technologies have low operational overhead?
o Which technologies have low development overhead?

1. Database: MySQL

MySQL holds a slight lead according to the AngelList data. In my experience it’s not overly complex to set up and there are some experienced DBAs out there. That said, database expertise can still be hard to find.

We hear a lot about MongoDB these days, and it is surely growing in popularity. Although it doesn’t support joins and arbitrary slicing and dicing of data, it is a very powerful database engine. If your application needs more straightforward data access, it can bring you amazing speed improvements.

Postgres is a close third. It’s a very sophisticated database engine. Although it may have a smaller community than MySQL, overall it’s a more full featured database. I’d have no reservations recommending it.

Also: Top MySQL DBA Interview questions

2. Hosting: Amazon

Amazon Web Services is obviously the giant in the room. They’re big, they’re cheap, they’re nimble. You have a lot of options for server types, they’ve fixed many of the problems around disk I/O and so forth. Although you may still experience latency around multi-tenant related problems, you’ll benefit from a truly global reach, and huge cost savings from the volume of customers they support.

Heroku is included although they’re a different type of service. In some sense their offering is one part operations team & one part automation. Yes, ultimately you are getting hosting & virtualization, but some things are tied down. Amazon RDS provides some parallels here. I wrote Is Amazon RDS hard to manage?. Long term, you’re likely going to switch to AWS, Joyent or Rackspace for real scale.

I was surprised to see Azure on the list at all here, as I rarely see startups build on Microsoft technologies. It may work for the desktop & office, but it’s not the right choice for the datacenter.

Read: Are generalists better at scaling the web?

3. Languages: Javascript

Javascript & Node.js are clearly very popular. They are also highly scalable.

In my experience I see a lot of PHP & of course Ruby too. Java, although there is a lot of it out there, can tend to be a bear as a web dev language, bringing additional complication, weight and overhead.

Related: Is Hunter Walk right about operations & startups?

4. Search: Elastic Search

I like that they broke apart search technology as a separate category. It is a key component of most web applications, and I do see a lot of Elastic Search & Solr.

That said, I think this may be a bit skewed. I think by far the number one solution would be NO SPECIFIC SEARCH technology. That’s right, many times devs choose a database-centric approach, like FULLTEXT or other features that perform painfully badly.

If this is you, consider these search solutions. They will bring you huge performance gains.

Check this: Are SQL Databases Dead?

5. Automation: Chef

As with search above, I’d argue there is a far more prevalent trend, and the real number one: using none of these automation technologies.

Although I do think Chef, Docker & Puppet can bring you real benefits, it’s a matter of having them in the right hands. Do you have an operations team that is comfortable with using them? When they leave in a year’s time, will your new devops also know the technology you’re using? Can you find a good balance between automation & manual configuration, and document accordingly?

Read: Why are database & operations experts so hard to find?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

If you use MySQL in the Amazon cloud, you need to ask yourself this question

Join 25,000 others and follow Sean Hull on twitter @hullsean.

Are you serious about backups?

If you’re just using Amazon EBS snapshots, that may not be sufficient. There’s a good chance it won’t protect you against your next data loss.

That’s why I like to have a few different types of backups.

Also: 5 more things deadly to scalability

Protect against operator error

mysqldump is a tool every DBA is familiar with. Same as a hotbackup or snapshot you say? Just more labor? Not true.

A dump allows you to restore one table, or one schema. That’s why they’re also known as logical backups. What’s more, you can edit the file, remove indexes, change object names, or datatypes. All these can be essential in the screwy and unpredictable event of a real-world outage.

Expect the unexpected!
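For example, here’s a sketch of dumping a single table and loading it into a scratch schema, something a snapshot alone can’t give you. Host, credentials, table and schema names are placeholders, and the scratch schema is assumed to already exist:

```python
import subprocess

# Dump just the orders table so it can be restored (or edited) on its own.
with open("orders.sql", "wb") as out:
    subprocess.run(
        ["mysqldump", "--single-transaction", "--host=127.0.0.1",
         "--user=backup", "--password=secret", "shop", "orders"],
        stdout=out, check=True)

# Load that one table into a scratch schema for inspection or repair.
with open("orders.sql", "rb") as dump:
    subprocess.run(
        ["mysql", "--host=127.0.0.1", "--user=backup", "--password=secret",
         "shop_restore"],
        stdin=dump, check=True)
```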

Read: Why devops talent is in short supply

Test those backups regularly

If you haven’t actually tried to restore, you really don’t know if you have everything. Did you back up stored procedures & database code? How about grants? Database events? How about cronjobs? What about the my.cnf file? And your replication configuration?

Yes, there are a lot of little pieces. Testing your backups by rebuilding everything is an attempt to poke holes in your plan, and hit issues before D-day!
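A dump that actually covers those pieces needs a few extra flags, and grants are easiest to capture separately. A rough sketch; credentials are placeholders, binary logging must be on for --master-data, and pt-show-grants assumes Percona Toolkit is installed:

```python
import subprocess

# Routines and events are NOT included by default, so ask for them explicitly.
# --master-data=2 records binlog coordinates as a comment in the dump.
subprocess.run(
    ["mysqldump", "--host=127.0.0.1", "--user=backup", "--password=secret",
     "--all-databases", "--single-transaction",
     "--routines", "--triggers", "--events", "--master-data=2",
     "--result-file=full_backup.sql"],
    check=True)

# Grants live in the mysql schema; dump them separately so a rebuild
# restores users too.
subprocess.run(
    ["pt-show-grants", "--host=127.0.0.1", "--user=root", "--password=secret"],
    check=True)
```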

Related: MySQL interview guide for managers and candidates alike

Replication isn’t a backup

Replication is getting better and better in MySQL. It used to fail regularly. MyISAM was very unpredictable. But even in the comfortable realm of InnoDB, there can still be data drift. If you’re on MySQL 5.0 or 5.1, you should consider performing regular checksums. These test the integrity of data and compare what’s actually in master & slave. See Bulletproofing MySQL replication with checksums.
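The standard tool for this is pt-table-checksum from Percona Toolkit. A minimal sketch, with a made-up master host, checksum user and database:

```python
import subprocess

# pt-table-checksum checksums each table chunk on the master and replays the
# checksum statements through replication, so drift shows up on the slaves.
subprocess.run(
    ["pt-table-checksum",
     "--replicate=percona.checksums",
     "--databases=shop",
     "h=master.example.com,u=checksum,p=secret"],
    check=True)
```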

Read: Why high availability is so very hard to deliver

Have you considered security around your backup files?

While you’re thinking about backups, make sure the files themselves are secure. Remember, they contain your crown jewels. Hopefully any individual data that’s sensitive is already encrypted, but you should still secure the files’ final resting place as well.

If you’re using S3, consider encrypting the file before shipping it up to the bucket.
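One way to do that is to encrypt with GPG locally and only ship the ciphertext. A sketch, assuming the gpg binary is installed and the bucket name is your own:

```python
import subprocess
import boto3

# Encrypt the dump before it ever leaves the box (symmetric GPG here;
# encrypting to a public-key recipient works just as well).
subprocess.run(
    ["gpg", "--symmetric", "--cipher-algo", "AES256",
     "--output", "full_backup.sql.gpg", "full_backup.sql"],
    check=True)

# Ship only the encrypted copy to S3; the bucket name is a placeholder.
boto3.client("s3").upload_file(
    "full_backup.sql.gpg", "my-backup-bucket", "mysql/full_backup.sql.gpg")
```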

Read this: Why a four letter word divides dev and ops

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Scalability Tips & Greatest Hits

autoscaling MySQL

Join 8000 others and follow Sean Hull on twitter @hullsean.

In the past two years we’ve written a ton of material on scalability. Here’s the greatest hits…

Why Generalists Are Better at Scaling the Web

The internet stack is a complex infrastructure of interlocking components. A scalability engineer must be adept at Linux, plus webservers, caching servers, search servers, automation services, and relational databases on the backend. We think a generalist with a broad base of experience is most suited to the job of scalability engineer.

5 Things Toxic to Scalability

ORMs should keep you up at night, but so should coupled and locking processes, a single copy of your database, missing metrics and no deployment feature flags.

5 More Things Deadly to Scalability

A followup to the original, we touch on Disk I/O, RAID, queuing in the database (a no-no), full-text searching, insufficient or missing caching and lastly the dreaded technical debt.

Scalability Happiness

A Zen monk might ask, what is the sound of one hand clapping? That’s the sound your servers will be making when you apply this one simple principle.

5 Ways to Boost MySQL Scalability

Deploying MySQL as your web-facing database? Here are a few key tips to boost speed & performance.

3 Ways To Boost Cloud Scalability

Building your startup in the Amazon Web Services cloud? There are 3 things you absolutely must do.

Why Your Cloud Is Speeding for a Scalability Cliff

The cloud may seem like the obvious place to build new applications & infrastructure, but there is a precipice hidden from sight…

Get some in your inbox: Exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

Scalability Happiness – A Quiet Query Log

Peter Van Allen - Pin Drop

Join 7500 others and follow Sean Hull on twitter @hullsean.

There’s a lot of talk on the web about scalability. Making web applications scale is not easy. The modern web architecture has so many moving parts. How can we grapple with the underlying problem?

Also: Why Are MySQL DBAs So Hard to Find?

The LAMP stack scales well

That’s a truth that is half right. True, there are a lot of moving parts, and a lot to set up. The internet stack is made up of Linux, Apache, MySQL & PHP. LAMP, as it’s called, was built to be resilient, dynamic, and scalable. It’s essentially why Amazon works, why what they’re doing is possible. Windows & .NET, for example, don’t scale well. Strange to see Oracle mating with them, but I digress…

[quote]
Linux, and the LAMP stack built on top of it, are highly scalable and dynamic to begin with.
[/quote]

Also: AirBNB Didn’t Have to Fail During an AWS Outage

Ok, so what’s this got to do with MySQL? Well a LOT.

The webserver tier, the caching layers like memcache & varnish, and the search tier like Solr all scale fairly easily because their assets are fixed. Or almost so.

The database tier is different. So what affects the performance of a database server? Server size? Main memory? Disk speed? The truth is all of those. But…

Also check out: The Sexiest New Feature of AWS Speeds Up EBS

After you set up the server – set memory settings and so forth – it’s a fairly fixed object. True, there are parameters to tweak, but on the whole there isn’t a ton of day-to-day tuning to do.

Well, if that’s true, why does performance take a hit?! As applications grow and the db server slows down, don’t we need to tweak server settings? Do we need new hardware?

Read this: A CTO Must Never Do This

The answer is possibly, but 9 times out of 10 what really needs to happen is that queries must be tuned.

[quote]
In 17 years of consulting that is the single largest cause of scalability problems. Fix those queries and your problems are over.
[/quote]

The Elephant in the Room – Query Tuning

I was talking with a colleague today at AppNexus. He asked, should we do some of that work inside the application, instead of doing a huge UNION or a large JOIN? I said yes, you can move work into the application, but it makes the application more complex. On the flip side, the webserver tier is easier to scale. So there are tradeoffs.

I said this:

[quote]
By and large, if scalability is our goal, we should work to quiet the activity in the slow query log. This is an active project for developers & DBAs. Keep it quiet and your server will run well.
[/quote]

Also: Top MySQL DBA Interview Questions for Candidates, Hiring Managers & Recruiters

Yet I still talk to teams where this is mysterious. It’s unclear. There’s no conviction there. And that’s where I think DBAs are failing. Because this is our subject matter expertise, and if we haven’t convinced developer teams of this, we’re not working together enough. API teams aren’t separate from DBA and operations. Siloing technology departments is a killer…


As you roll out new code, if some queries show up in the slow query log, then those need attention. Tweak the code until the queries drop out. This is the primary project of scalability.
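If you haven’t got the slow query log running yet, turning it on is a one-minute job. A sketch using pymysql; the threshold and path are examples, the settings need SUPER privilege, and they don’t survive a restart unless you also put them in my.cnf:

```python
import pymysql  # pip install pymysql

conn = pymysql.connect(host="127.0.0.1", user="admin", password="secret")

with conn.cursor() as cur:
    # Log any statement slower than one second.
    cur.execute("SET GLOBAL slow_query_log = 1")
    cur.execute("SET GLOBAL long_query_time = 1")
    cur.execute("SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log'")

# mysqldumpslow ships with MySQL and aggregates the log by query fingerprint:
# subprocess.run(["mysqldumpslow", "-s", "t", "/var/log/mysql/slow.log"])
```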

When should I think about upgrading hardware?

If your code is stable, but you’re seeing a steadily rising load average on the server, *THEN* go up in hardware. A rising load average means CPU & disk are being taxed. The server can’t keep up.

Related: Should I use RDS or build a MySQL server on AWS?

Devops means work together!

I close with a final point. Devops means bringing dev & ops together! Don’t silo them off in different wings. Communicate. DBAs, it’s your job to educate developers about scalability and help with query tuning. Devs, profile new SQL code, test with large datasets & for god’s sake don’t use an ORM – it’s one of 5 things toxic to scalability. Run EXPLAIN and be sure to index all the right columns.
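And running EXPLAIN is as easy as prefixing the query. A quick sketch against a hypothetical orders table:

```python
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="app", password="secret",
                       database="shop")

with conn.cursor() as cur:
    # type=ALL with a large row estimate means a full table scan.
    cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = %s", (42,))
    for row in cur.fetchall():
        print(row)

    # If EXPLAIN shows a scan, index the column the query filters on.
    cur.execute("ALTER TABLE orders ADD INDEX idx_customer_id (customer_id)")
```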

Together we can tackle this scalability thing!

Get some in your inbox: Exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

MySQL requires an authoritative master to build slaves

In MySQL database operations, you often need to rebuild slaves. They fail for a lot of different reasons, fall out of sync or crash. When this happens, you may find you need to reclone and start fresh. This is normally done by finding your authoritative master database and doing a hotbackup.

Click through to the end for multi-master solutions that work with MySQL.

Reason 4 – You cannot reclone without a single master

If you use active-active multi-master replication, you no longer have a single authoritative master copy of all your data. That means you have no reliable master to clone from if a replica breaks.

If you’re an operational DBA, you probably know that MySQL replication breaks regularly. You always need to be able to go back to a fresh copy when things get mixed up on the slaves.
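The reclone itself usually boils down to three steps: dump the master with its binlog coordinates, load the slave, and point replication back at the master. A hedged sketch with placeholder hosts, credentials and coordinates; in practice you’d read the coordinates out of the dump file:

```python
import subprocess
import pymysql

# 1. Consistent dump from the authoritative master, with binlog coordinates
#    recorded as a comment near the top of the file (--master-data=2).
subprocess.run(
    ["mysqldump", "--host=master.example.com", "--user=repl",
     "--password=secret", "--all-databases", "--single-transaction",
     "--master-data=2", "--result-file=master_snapshot.sql"],
    check=True)

# 2. Load the dump onto the freshly wiped slave.
with open("master_snapshot.sql", "rb") as dump:
    subprocess.run(["mysql", "--host=slave.example.com", "--user=repl",
                    "--password=secret"], stdin=dump, check=True)

# 3. Point the slave at the master using the coordinates from the dump file.
slave = pymysql.connect(host="slave.example.com", user="repl", password="secret")
with slave.cursor() as cur:
    cur.execute(
        "CHANGE MASTER TO MASTER_HOST='master.example.com', "
        "MASTER_USER='repl', MASTER_PASSWORD='secret', "
        "MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=4567")
    cur.execute("START SLAVE")
```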

NEXT: Reason 5 – MySQL row-based replication has limitations

PREV: Reason 3 – Limitations of MySQL row-based replication

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

Limitations of MySQL row-based replication

MySQL offers a few different options for how you perform replication. Statement-based replication has been around a lot longer, and though it has some troublesome characteristics, they’re well known and can be managed. What’s more, it supports online schema changes with a multi-master active-passive setup. We recommend this solution.

Row-based replication is newer. It attempts to address problems like those introduced by non-deterministic functions and stored procedure replication. But it introduces its own challenges.
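Which flavor you’re running is controlled by a single server variable. A quick sketch for checking and switching it; the change needs SUPER, and sessions already open keep their old format:

```python
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="admin", password="secret")

with conn.cursor() as cur:
    # See which format the server currently replicates with.
    cur.execute("SHOW VARIABLES LIKE 'binlog_format'")
    print(cur.fetchone())  # ('binlog_format', 'STATEMENT'), 'ROW' or 'MIXED'

    # Switch the global default to row-based replication.
    cur.execute("SET GLOBAL binlog_format = 'ROW'")
```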

Click through to the end for multi-master solutions that work with MySQL.

3. Row-based replication limitations

Row-based replication addresses some of the limitations of statement based replication.

o works better with stored procedures
o reduces problems associated with non-deterministic functions

But it creates a few of its own, and some are show-stoppers:

o won’t work if the target table’s storage engine, column order, data types or the rows themselves are different or missing
o doesn’t write SQL to the binlogs, so you lose a useful troubleshooting aid
o harder to do point-in-time recovery without SQL in the binlogs
o harder to do online schema changes by switching masters

NEXT: Reason 4 – Cannot reclone without a single master

PREV: Reason 2 – MySQL Replication is prone to failure

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample