Tag Archives: postgres

How to interview an amazon database expert

via GIPHY

Amazon releases a new database offering every other day. It sure isn’t easy to keep up.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

Let’s say you’re hiring a devops & you want to suss out their database knowledge? Or you’re hiring a professional services firm or freelance consultant. Whatever the case you’ll need to sift through for the best people. Here’s how.

Also: How to interview an AWS expert

What database does Amazon support for caching?

Caching is a popular way to speed up access to your backend database. Put Amazon’s elasticache behind your webserver, and you can reduce load on your database by 90%. Nice!

The two types that amazon supports are Memcache & Redis. Memcache is historically more popular. These days Redis seems a clear winner. It’s faster, and can maintain your cached data between restarts. That will save you I promise!

Also: Is AWS too complex for small dev teams?

How can I store big data in AWS?

Amazon’s data warehouse offering is called Redshift. I wrote Why is everyone suddenly talking about Redshift?. Why indeed!

When you’re doing large reports for your business intelligence team, you don’t want to bog down your backend relational database. Redshift is purpose built for this use case.

I’ve see a report that took over 8 hours in MySQL return in under 60 seconds in Redshift!

A new offering is Amazon Spectrum. This tech is super cool. Load up all your data into S3, in standard CSV format. Then without even loading it into Redshift, you can query the S3 data directly. This is super useful. Firstly because S3 is 1/10th the price. But also because it allows you to stage your data before loading into Redshift itself. Goodbye Google Big Query! I talked about spectrum here.

Related: Which engineering roles are in greatest demand?

What relational database options are there on Amazon?

Amazon supports a number of options through it’s Relational Database Service or RDS. This is managed databases, which means less work on your DBAs shoulders. It also may make upgrades slower and harder with more downtime, but you get what you pay for.

There are a lot of platforms available. As you might guess MySQL & Postgres are there. Great! Even better you can use MariaDB if that’s your favorite. You can also go with Aurora which is Amazon’s own home-brew drop in replacement for MySQL that promises greater durability and some speedups.

If you’re a glutton for punishment, you can even get Oracle & SQL Server working on RDS. Very nice!

Read: Can on-demand consulting save startups time & money?

Does AWS have a NoSQL database solution?

If NoSQL is to your taste, Amazon has DynamoDB. According to . I haven’t seen a lot of large production applications using it, but what he describes makes a lot of sense. The way Amazon scales nodes & data I/O is bound to run into real performance problems.

That said it can be a great way to get you up and running quickly.

Read: Can on-demand consulting save startups time & money?

How do I do ETL & migrate data to AWS?

Let’s be honest, Amazon wants to make this really easy. The quicker & simpler it is to get your data there, that more you’ll buy!

Amazon’s Database Migration Service or DMS allows you to configure your old database as a data source, then choose a Amazon db solution as destination, then just turn on the spigot and pump your data in!

ETL is extract transform and load, data warehouse terminology for slicing and dicing data before you load it into your warehouse. Many of todays warehouses are being built with the data lake model, because databases like Redshift have gotten so damn fast. That model means you stage all your source data as-is in your warehouse, then build views & summary tables as needed to speed up queries & reports. Even better you might look a tool like xplenty.

Amazon’s new offering is called Glue. Five ways to get data into Amazon Redshift. This solution is purpose build for creating a powerful data pipeline, complete with python code to do transformations.

Read: Is data your dirty little secret?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Will SQL just die already?

With tons of new No-SQL database offerings everyday, developers & architects have a lot of options. Cassandra, Mongodb, Couchdb, Dynamodb & Firebase to name a few.

Join 33,000 others and follow Sean Hull on twitter @hullsean.

What’s more in the data warehouse space, you have Hadoop, which can churn through terabytes of data and get you results back before lunchtime!

So when I stumbled on this article SQL is 43 years old, I was intrigued.

Answer the questions you haven’t thought of

No-SQL databases are great if you know how you want to access the data. Users come from the users table, and that’s that!

But if later on you want to ask questions like, which users watched this video, which users are active, which users spent $100 in January? These questions may not be possible because NoSQL can’t join those other tables.

Relational databases shine when you need to aggregate your data, reorganize it, or ask unanticipated questions. And aren’t those most of the interesting questions?

Also: Top serverless interview questions for hiring aws lambda experts

Big Query, Redshift & even Hive speak SQL

I wrote that despite recent popularity in Hadoop, Redshift seems to be eating their lunch. And what would you know, surprise surprise, Amazon’s newish data warehousing solution, speaks SQL! What’s more there’s Apache Hive, which allows you to query Hadoop with, drumroll please… SQL!

Bigquery is the other major bigdata offering from none other than Google. And it too uses SQL!

Related: Which engineering roles are in greatest demand?

Still dominant

If you look at Stackoverflow’s developer survey, you’ll see that SQL is the second most popular language. Why might that be? For one thing it’s simple to learn. Enough that even business users can write simple requests, join & aggregate data.

Read: Can on-demand consulting save startups time & money?

Rugged, Proven & Open

SQL having been around so long is a fairly open standard. Sure there are extensions of it, but most of the basic stuff is there in all the products. That means you learn it once, and can interact with databases across the spectrum. That’s a win for everybody.

Also: 30 questions to ask a serverless fanboy

Business users can write it

Another under appreciated feature though is that basic queries are easy to write. They don’t require complex syntax like a hadoop job, or your favorite imperative programming language. The queries are readable, almost english-like sentences.

Given all that, it seems SQL is likely to be around for a long time to come!

Also: What can startups learn from the DYN DNS outage?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Which tech do startups use most?

MySQL on Amazon Cloud AWS

Leo Polovets of Susa Ventures publishes an excellent blog called Coding VC. There you can find some excellent posts, such as pitches by analogy, and an algorithm for seed round valuations and analyzing product hunt data.

He recently wrote a blog post about a topic near and dear to my heart, Which Technologies do Startups Use. It’s worth a look.

One thing to keep in mind looking over the data, is that these are AngelList startups. So that’s not a cross section of all startups, nor does it cover more mature companies either.

In my experience startups can get it right by starting fresh, evaluating the spectrum of new technologies out there, balancing sheer solution power with a bit of prudence and long term thinking.

I like to ask these questions:

o Which technologies are fast & high performance?
o Which technologies have a big, vibrant & robust community?
o Which technologies can I find plenty of engineers to support?
o Which technologies have low operational overhead?
o Which technologies have low development overhead?

1. Database: MySQL

MySQL holds a slight lead according to the AngelList data. In my experience its not overly complex to setup and there are some experienced DBAs out there. That said database expertise can still be hard to find .

We hear a lot about MongoDB these days, and it is surely growing in popularity. Although it doesn’t support joins and arbitrary slicing and dicing of data, it is a very powerful database engine. If your application needs more straightforward data access, it can bring you amazing speed improvements.

Postgres is a close third. It’s a very sophisticated database engine. Although it may have a smaller community than MySQL, overall it’s a more full featured database. I’d have no reservations recommending it.

Also: Top MySQL DBA Interview questions

2. Hosting: Amazon

Amazon Web Services is obviously the giant in the room. They’re big, they’re cheap, they’re nimble. You have a lot of options for server types, they’ve fixed many of the problems around disk I/O and so forth. Although you may still experience latency around multi-tenant related problems, you’ll benefit from a truly global reach, and huge cost savings from the volume of customers they support.

Heroku is included although they’re a different type of service. In some sense their offering is one part operations team & one part automation. Yes ultimately you are getting hosting & virtualization, but some things are tied down. Amazon RDS provides some parallels here. I wrote Is Amazon RDS hard to manage?. Long term you’re likely going to switch to an AWS, Joyent or Rackspace for real scale.

I was surprised to see Azure on the list at all here, as I rarely see startups build on microsoft technologies. It may work for the desktop & office, but it’s not the right choice for the datacenter.

Read: Are generalists better at scaling the web?

3. Languages: Javascript

Javascript & Node.js are clearly very popular. They are also highly scalable.

In my experience I see a lot of PHP & of course Ruby too. Java although there is a lot out there, can tend to be a bear as a web dev language, and provide some additional complication, weight and overhead.

Related: Is Hunter Walk right about operations & startups?

4. Search: Elastic Search

I like that they broke apart search technology as a separate category. It is a key component of most web applications, and I do see a lot of Elastic Search & Solr.

That said I think this may be a bit skewed. I think by far the number one solution would be NO SPECIFIC SEARCH technology. That’s right, many times devs choose a database centric approach, like FULLTEXT or others that perform painfully bad.

If this is you, consider these search solutions. They will bring you huge performance gains.

Check this: Are SQL Databases Dead?

5. Automation: Chef

As with search above, I’d argue there is a far more prevalent trend, that is #1 to use none of these automation technologies.

Although I do think chef, docker & puppet can bring you real benefits, it’s a matter of having them in the right hands. Do you have an operations team that is comfortable with using them? When they leave in a years time, will your new devops also know the technology you’re using? Can you find a good balance between automation & manual configuration, and document accordingly?

Read: Why are database & operations experts so hard to find?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters