Tag Archives: dynamodb

Top Amazon Lambda questions for hiring a serverless expert

via GIPHY

If you’re looking to fill a job roll that says microservices or find an expert that knows all about serverless computing, you’ll want to have a battery of questions to ask them.

Join 33,000 others and follow Sean Hull on twitter @hullsean.

For technical interviews, I like to focus on concepts & the big picture. Which rules out coding exercises or other puzzles which I think are distracting from the process. I really like what what the guys at 37 Signals say

“Hire for attitude. Train for skill.”

So let’s get started.

1. How do you automate deployment?

Programming lambda functions is much like programming in other areas, with some particular challenges. When you first dive in, you’ll use the Amazon dashboard to upload a zipfile with your code. But as you become more proficient, you’ll want to create a deployment pipeline.

o What features in Amazon facilitate automatic deployments?

AWS Lambda supports environment variables. Use these for credentials & other data you don’t want in your deployment package.

Amazon’s serverless offering, also supports aliases. You can have a dev, stage & production alias. That way you can deploy functions for testing, without interrupting production code. What’s more when you are ready to push to production, the endpoint doesn’t change.

o What frameworks are available for serverless?

Serverless Framework is the most full featured option. It fully supports Amazon Lambda & as of 1.0 provides support for other platforms such as IBM Openwhisk, Google Cloud Functions & Azure functions. There is also something called SAM or Serverless Application Model which extends CloudFormation. With this, you can script changes to API Gateway, Dynamo DB & Cognos authentication stuff.

If you’re using Auth0 instead of Cognito or Firebase instead of Dynamodb, you’ll have to come up with your own way to automate changes there.

Also: Is the difference between dev & ops a four-letter word?

2. What are the pros of serverless?

Why are we moving to a serverless computing model? What are the advantages & benefits of it?

o easier operations means faster time to market
o large application components become managed
o reduced costs, only pay while code is running
o faster deploy means more experimentation, more agile
o no more worry about which servers will this code run on?
o reduced people costs & less infrastructure
o no chef playbooks to manage, no deploy keys or IAM roles

Related: Is automation killing old-school operations?

3. What are the cons of serverless?

There are a lot of fanboys of serverless, because of the promise & hope of this new paradigm. But what about healthy criticism? A little dose of reality can identify a critical & active mind.

o With Lambda you have less vendor control which could mean… more downtime, system limits, sudden cost changes, loss of functionality or features and possible forced API upgrades. Remember that Amazon will choose the needs of the many over your specific application idiosyncracies.

o There’s no dedicated hardware option with serverless. So you have the multi-tenant challenges of security & performance problems of other customers code. You may even bump into problems because of other customers errors!

o Vendor lock-in is a real obvious issue. Changing to Google Cloud Functions or Azure Functions would mean new deployment & monitoring tools, a code rewrite & rearchitect, and new infrastructure too. You would also have to export & import your data. How easy does Amazon make this process?

o You can no longer store application & state data in local server memory. Because each instantiation of a function will effectively be a new “server”. So everything must be stored in the database. This may affect performance.

o Testing is more complicated. With multiple vendors, integration testing becomes more crucial. Also how do you create dev db instance? How do you fully test offline on a laptop?

o You could hit system wide limits. For example a big dev deploy could take out production functions by hitting an AWS account limit. You would thus have DDoS yourself! You can also hit the 5 minute execution time limit. And code will get aborted!

o How do you do zero downtime deployments? Since Amazon currently deploys function-by-function, if you have a group of 10 or 20 that act as a unit, they will get deployed in pieces. So your app would need to be taken offline during that period or it would be executing some from old version & some from new version together. With unpredictable results.

Read: Do managers underestimate operational cost?

4. How does security change?

o In serverless you may use multiple vendors, such as Auth0 for authentication, and perhaps Firebase for your data. With Lambda as your serverless platform you now have three vendors to work with. More vendors means a larger area across which hackers may attack your application.

o With the function as a service application model, you lose the protective wall around your database. It is no longer safely deployed & hidden behind a private subnet. Is this sufficient protection of your key data assets?

Also: Is the difference between dev & ops a four-letter word?

5. How do you troubleshoot & debug microservices?

o Monitoring & debugging is still very limited. This becomes a more complex process in the serverless world. You can log error & warning messages to CloudWatch.

o Currently Lambda doesn’t have any open API for third party tooling. This will probably come with time, but again it’s hard to see & examine a serverless function “server” while it is running.

o For example there is no New Relic for serverless.

o Performance tuning may be a bit of a guessing game in the serverless space right now. Amazon will surely be expanding it’s offering, and this is one area that will need attention.

Also: Is the difference between dev & ops a four-letter word?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 things you didn’t know about Dynamodb that are hurting you bad

amazon-dynamo-db

If you’re like a lot of folks you’re building an application in AWS & using a NoSQL database for persistent data. Dynamodb fits the bill nicely. Little or no ops to worry about, at least in the traditional sense.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

However there are knobs to turn & dials to set. Here are a few you should be thinking about.

1. You can replicate across regions

Dynamodb introduced a feature in 2015 called streams. If you come from the relational database world, you can think of streams like a transaction log. It captures before & after image of your data. Couple those with useful lambda functions, and you have triggers that can do anything you want.

Turns out Amazon have been all over this, and already build a library to do cross-region replication with streams. Pretty cool!

Also: Is aws too complex for small dev teams?

2. You can manage retrieval costs

Dynamodb automatically creates and manages an index on the primary key. But chances are that your application will read data based on other columns too. You can create secondary indexes on these other columns, reducing your data access patterns. Without an index Dynamodb would have to scan every row to find your data, but the index can dramatically reduce this, and making data retrieval faster too!

Related: Does Amazon eat it’s own dogfood?

3. You can do SQL Like queries

That’s right, if you thought NoSQL meant no SQL you were only half right. By loading your Dynamodb data into HDFS, you can allow elastic map reduce to have at it. And thus open the door to use HiveQL to query the data the way you wanted to in the first place.

Convoluted? Yes. But this is the brave new world of the cloud!

Read: Is AMazon too big to fail?

4. Partitions are handy & useful

By default dynamo is partitioning your data behind the scenes. Because that’s what good distributed databases are supposed to do. It does so using the primary key to figure out where the data should go. And just like with Redshift you have option of also using sort key to help the optimizer figure out how to distribute the data. This is important. Going across those different instances brings a lot of latency costs that will surprise you.

Also: When hosting data on Amazon turned bloodsport

5. Metrics are your partner in performance

CloudWatch provides all sorts of instrumentation for Dynamodb. Read & write activity, throttling, errors & latency are just a few of the things you can see.

Also: Is aws the patient that needs constant medication?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What is eventually consistent and will it work for you?

aws dynamodb

In the tech world we’re fond of inventing all sorts of technical terms, that are admittedly kind of confusing.

Join 13,500 others and follow Sean Hull on twitter @hullsean.

I was attending an excellent talk recently called Data at Scale, part of the Database Month series that Eric Benari hosts. In it Mark Uhrmacher presented some phenomenal solutions which worked for flash site ideeli. It allowed them to support their incredible business model, where 15% of traffic would happen in 15 minutes everyday. As he called it a “self-imposed denial of service attack”. Interesting analogy.

What occurred to me though, is that a lot of companies and startups struggle to understand which database solutions will work for them, and what the strengths and weaknesses of each are, and further what tradeoffs they’ll grapple with.

One concept that we hear a lot is “eventually consistent”. Many of the new NoSQL databases achieve their speed & availability this way. But what’s it all about?

Let’s change a smartphone contact

I’m sure you have a smartphone in your pocket, and for demonstration sake I’ll use the iphone configured with iCloud.

Let’s go ahead and dial up your *OWN* contact card. Click “Edit” and go ahead and change something. Let’s change your title to “rock star”. Now click “Done”. We’ll wait a minute. Now go to your desktop and open up Contacts. Scroll through to your contact and verify that the Title field now shows “rock star”.

How does all this happen? When you click the “Done” button, the iphone sends changes up to iCloud. iCloud then lets your laptop know a change has happened and those then sync up.

Now let’s run through the same exercise, but change it in two places. We’ll change the smartphone contact to “Founder” and the desktop Contacts record title to “Consultant”. Wait a little bit and you’ll notice they will both eventually show “Consultant”.

Also: Why a killer title can make or break your content efforts

How long were laptop & phone out of sync?

As you probably noticed, the iCloud seems to lean in favor of the desktop client. It’s not clear to me what rules it uses here, nor does it seem to be configurable. Nevertheless eventually both the desktop and smartphone with have the same contact card for you. Quite a feat of magic!

Read: Why high availability is so very hard to deliver

Handling collisions

There is only one *YOU* and presumably your digital rolodex reflects that too. You have one and only one contact card. Or do you? As far as these digital tools are concerned there are actually THREE! One on your desktop, one in iCloud and one on your phone. Each time you change in any of those places, it syncs *UP* to iCloud and then down to the other devices.

Collisions happen if you make changes in two places. Imagine if you’re a road warrior and your laptop was offline for some days, or your smartphone for that matter. In those cases that syncing would happen much later, and collisions more likely.

Related: Why the twitter IPO made a shocking admission on scalability

In the high frequency world of online databases

With online databases, all of this becomes vastly more complex. Web based applications may have 100,000 simultaneous users. Some may be coming from IMEA while others the Americas. It gets pretty darn complex when you have databases in each of those regions.

We deploy applications this way, so one datacenter, say the East Coast region one version, can fail, but all the others still operate. They can still change data, read and write, without being impacted by the New York outage.

Once that datacenter is restored, the databases will then sync up and reconcile missing data.

Also: Why a killer title can make or break your content efforts

MariaDB and Amazon RDS read replicas

MySQL and it’s variants of MariaDB, Percona and Amazon RDS can do something like this with read-replicas. The read-only copies of the database are asynchronous and take time to catch up to changes. You can have the read-only copies in different regions.

This improves availability for browsing your application, but not for making changes. In other words MySQL can use this method to scale reads but not writes. That’s why I recommend your applications also support a browse only mode which means availability won’t be impacted if your authoritative master dies.

Although you can try to do the same for writes by sharding your MySQL instances, this starts to get very messy very fast. Imagine backing up 10 shards, 10x the complexity, and even more when you want to go and do a restore.

Read: Why devops talent is in short supply

Amazon’s Dynamo DB

Amazon’s DynamoDB is a technology based around the original Dynamo whitepaper which attempts to solve a whole class of problems by easing eventually consistent constraints.

What you get is more availability, it’s hard for the whole cluster to go down. That’s great for applications because they can continue to operate if one or more nodes fails. It also scales writes, which is a sort of holy grail in the database world as it’s typically hard to do.

But remember all this comes at a cost. Traditionally scaling writes is hard to do because all changes are kept in one place. You maintain a single authoritative master. If you want to imagine why this matters, think back to our smartphone example. We changed our contact card on our phone and our desktop at the same time. One of those two changes won the battle. But that’s a case where we’re not overly concerned.

If you imagine a bank doing the same thing, and you wire $1000 via phone and desktop, you can quickly see that there is a whole class of applications that won’t be happy with eventually consistent. Your web application may be one of those. Or it may not. Consider carefully before you go with Amazon RDS or DynamoDB as your datastore.

Read: Why startups need more than great developers to achieve scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters