As always on Hacker News, there’s a feisty debate about what such a characterization includes.
Here’s my take.
1. Tackle learning quickly
Whether it means getting up to speed on a new service AWS has launched, building a new API for an application that has never existed before, or learning a whole new platform, learning is ongoing.
Top engineers make this a seamless part of their daily routine. Getting going quickly with new concepts and technologies means wading into the water at first, to gain the general lay of the land. Now you can talk intelligently about the features, limitations and challenges.
From there they can dive quickly into the specific area the project requires, and move forward with that technology comfortably.
When building code, it’s easy to get mired in libraries, sorting algorithms, and API minutiae. And all of that is very important. But what are you building, and why are you building it?
Understanding your customers and what they do day-to-day is not always easy. It means using the product yourself, and also talking with sales teams regularly to hear what they are hearing.
Then pour all of this into your user stories. For top engineers it will inform their decisions, and help them communicate to product & project managers about the issues they’re encountering. Tradeoffs about features, coding, performance and technical debt can be better evaluated with more information.
Does your code run slowly? Have you tried to figure out why?
Is it related to:
o latency in production that doesn’t appear on your laptop
o untuned production database queries
o untuned connection pooling
o slow API calls
o weird kubernetes or orchestration issues
o web host issue with memory shortage
o web host issue with slow unoptimized code
o issues on the client side
Top engineers have seen applications slow down or fail in a myriad of ways. This allows them to imagine how a new application might be failing, and investigate accordingly. A rough first-pass triage might look like the sketch below.
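Here’s what that kind of first-pass triage can look like, assuming a shell with curl and access to a MySQL backend; the endpoint and query are placeholders:

```bash
# Where does the time go: DNS, connection, server think-time, or transfer?
curl -s -o /dev/null \
  -w "dns:%{time_namelookup}s connect:%{time_connect}s ttfb:%{time_starttransfer}s total:%{time_total}s\n" \
  https://example.com/slow/endpoint

# If the server is the slow part, check whether the database is the culprit.
mysql -e "SHOW VARIABLES LIKE 'slow_query_log%';"   # is the slow query log even on?
mysql -e "EXPLAIN SELECT ...;"                      # paste the suspect query and look for full table scans
```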
In startups, your engineers need to communicate to many folks who don’t have an engineering background. Product & UI/UX folks probably are quite technical on their own. But what about sales teams who are dealing directly with customers? Or C-suite folks who watch the business bottom lines, but may not have the same low level technical understanding?
Great communicators can find the right metaphor to explain hurdles and holdups, technical debt, or the latest performance challenges. And explaining those in terms that resonate for others is incredibly valuable to the team and business velocity.
Meetups are always an excellent way to find great people. Tech networking can be one part selling your ideas, one part projecting your prowess, and one part career building for your own people. Meanwhile, career-minded, self-motivated engineers are scoping you out too.
Connecting through your own people is always a good way to find talent. These referrals are also great because you already have a handle on the hire’s real-world experience and personality. Good for everybody.
As your startup grows, you get some press in TechCrunch or Recode. This generates some name recognition in the tech community, or posts on Hacker News. From there you get inbound requests, because smart people are on the prowl for a great company. Done & done!
Devops is in serious demand these days. At every meetup or tech event I attend, I hear a recruiter or startup founder talking about it. It seems everyone wants to see benefits of talented operations brought to their business.
That said, the skill set is very broad, which explains why there aren’t more devs picking up the baton.
I thought it would be helpful to put together a list of interview questions. There are certainly others, but here’s what I came up with.
1. Explain the gitflow release process
As a devops engineer you should have a good foundation in software delivery. With that you should understand git very well, especially the standard workflow.
Although there are other methods to manage code, one solid & proven method is gitflow. In a nutshell you have two main branches, development & master. Developers check out a new branch to add a feature, then merge it back into the development branch. Your stage server can be built automatically off of this branch.
Periodically you will want to release a new version of the software. For this you merge development into master. UAT is then built automatically off of the master branch. When acceptance testing is done, you deploy off of master to production. Hence the saying, “always ship trunk.”
Bonus points if you know that hotfixes are done directly off the master branch & pushed straight out that way.
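In practice the flow looks something like this minimal sketch; the branch and tag names are placeholders:

```bash
# Feature work branches off development and merges back into it
git checkout development
git checkout -b feature/new-widget
# ...commit work, open a pull request back into development...

# Release: merge development into master, tag it, deploy UAT & production from master
git checkout master
git merge --no-ff development
git tag -a v1.4.0 -m "release 1.4.0"
git push origin master --tags

# Hotfix: branch straight off master, fix, then merge back to master and development
git checkout -b hotfix/payment-bug master
# ...commit the fix...
git checkout master && git merge --no-ff hotfix/payment-bug
git checkout development && git merge --no-ff hotfix/payment-bug
```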
There are a lot of tools in the devops toolbox these days. One that is great at provisioning resources is Terraform. With it you can specify, in declarative code, everything your application will need to run in the cloud: IAM users, roles & groups, DynamoDB tables, RDS instances, VPCs & subnets, security groups, EC2 instances, EBS volumes, S3 buckets and more.
You may also choose to use CloudFormation of course, but in my experience Terraform is more polished. What’s more, it supports multi-cloud. Want to deploy in GCP or Azure? The same workflow and language carry over, though you’ll adapt your templates to the new provider’s resources.
It takes some time to get used to the new workflow of building things in Terraform rather than at the AWS CLI or dashboard, but once you do you’ll see benefits right away. You gain all the advantages of versioning code that we see with other software development. Want to roll back? No problem. Want to do unit tests against your infrastructure? You can do that too!
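A minimal sketch of that workflow, assuming AWS credentials are already configured; the bucket name and region are placeholders:

```bash
# Declare an S3 bucket in HCL
cat > main.tf <<'EOF'
provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "assets" {
  bucket = "example-app-assets"
}
EOF

terraform init    # download the AWS provider plugin
terraform plan    # preview exactly what will be created
terraform apply   # build it
# main.tf now lives in git like any other code: review it, version it, test it
```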
The four big choices for configuration management these days are Ansible, Salt, Chef & Puppet. For my money Ansible has some nice advantages.
First, it doesn’t require an agent. As long as you have SSH access to your box, you can manage it with Ansible. Plus your existing shell scripts are pretty easy to port to playbooks. Ansible also does not require a server to house your playbooks. Simply keep them in your git repository, check them out to your desktop, and run ansible-playbook against the YAML file. Voila, server configuration!
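A tiny playbook run over SSH illustrates the idea; the host and package names here are placeholders:

```bash
# A one-task playbook kept in the repo
cat > webserver.yml <<'EOF'
- hosts: webservers
  become: yes
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present
    - name: Ensure nginx is running
      service:
        name: nginx
        state: started
EOF

# No agent, no central server: just SSH to the box and apply the playbook
ansible-playbook -i "web1.example.com," webserver.yml
```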
Unit testing & integration testing are super important parts of continuous integration. As you automate your tests, you formalize how your site & code should behave. That way when you automate the deployment, you can also automate the test process. Let the software do the drudgework of making sure a new feature hasn’t broken anything on the site.
As you automate more tests, you accelerate the software development process, because you’re doing less and less manually. That means being more agile, and makes the business more nimble.
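In a pipeline, that boils down to a gate like this hedged sketch, where the test targets and deploy script are stand-ins for whatever your project already uses:

```bash
#!/usr/bin/env bash
# Fail fast: any failing test stops the pipeline before deploy
set -euo pipefail

make unit-test            # fast, isolated tests first
make integration-test     # then tests against a throwaway stage environment

# Only reached when every test above exited 0
./scripts/deploy_production.sh
```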
Docker is a low-overhead way to run something like virtual machines on your local box or in the cloud. Although containers aren’t strictly distinct machines, and don’t need to boot an OS, they give you many of those benefits.
Docker can encapsulate legacy applications, allowing you to deploy them to servers that might not otherwise be easy to set up with older packages & software versions.
Docker can be used to build test boxes during your deploy process, to facilitate continuous integration testing.
Docker can be used to provision boxes in the cloud, and with swarm you can orchestrate clusters too. Pretty cool!
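As a quick sketch, containerizing a small service is just a Dockerfile and two commands; the base image, app entrypoint and port are assumptions:

```bash
cat > Dockerfile <<'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
EOF

docker build -t example-service:latest .              # bake the image once
docker run -d -p 8080:8080 example-service:latest     # run it anywhere Docker runs
```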
Since devops brings a new process of continuous delivery to the organization, it involves some risk. Actually, doing things the old way involves more risk in the long term, because things can and will break. With automation, you can recover from failure more quickly.
But this new world requires a leap of faith. It’s not right for every organization or in every case, and you’ll likely strike a balance between what the devops holy book says and what your org can tolerate. Inevitably, communication becomes very important as you advocate for new ways of doing things.
She went on to say how much has changed in the last decade. We talked about how the database administrator, as a career role, wasn’t really being hired for much these days. Things had changed. Evolved a lot.
How do you keep up with all the new technology, she asked?
I went on to talk about Amazon RDS, EC2, Lambda & serverless as really exciting stuff. And let’s not forget Terraform (I wrote a howto on Terraform), Ansible, Jenkins and all the other deployment automation technologies.
We talked about Redshift too. It seems to be everywhere these days, and is starting to supplant Hadoop as the warehouse of choice for analytics.
It was a great conversation, and afterward I decided to summarize my thoughts. Here’s how I think automation and the cloud are impacting the dba role.
My career pivots
Over the years I’ve poured all those computer science algorithms, coding & hardware skills into a lot of areas. Tools & popular languages change. Frameworks change. But solid deductive reasoning remains priceless.
o C++ Developer
Fresh out of college I was doing Object Oriented Programming on the Macintosh with CodeWarrior & PowerPlant. C++ development is no joke, and daily coding builds strength in a lot of areas. It turns out the application was a database application, so I was already getting my feet wet with databases.
o Jack of all trades developer & Unix admin
One type of job role that I highly recommend early on is as a generalist. At a small startup with less than ten employees, you become the primary technology solutions architect. So any projects that come along you get your hands dirty with. I was able to land one of these roles. I got to work on Windows one day, Mac programming another & Unix administration & Oracle yet another day.
o Oracle DBA
The third pivot was to work primarily on Oracle. I attended Oracle conferences & my peers were Oracle admins. Interestingly, many of the Oracle “experts” came from more of a business background, not computer science. So to have a more technical foundation really made you stand out.
For the startups I worked with, I was a performance guru and scalability expert. Managers may know they have Oracle in the mix, but ultimately the end goal is to speed up the website & make the business run. The technical nuts & bolts of the Oracle DBA role were almost incidental.
o MySQL & Postgres
As Linux matured, so did a lot of other open source projects. In particular the two big Open Source databases, MySQL & Postgres became viable.
Suddenly startups were willing to put their businesses on these technologies. They could avoid huge fees in Oracle licenses. Still there were not a lot of career database experts around, so this proved a good niche to focus on.
o RDS & Redshift on Amazon Cloud
Fast forward a few more years and it’s my fifth career pivot. Amazon Web Services bursts on the scene. Every startup is deploying their applications in the cloud. And they’re using Amazon RDS, their managed database service, to do it. That meant the traditional DBA role was less crucial. Sure, the business still needed data expertise, but usually not as a dedicated role.
Time to shift gears and pour all of that Linux & server building experience into cloud deployments & migrating to the cloud.
o Devops, data, scalability & performance
Now of course the big sysadmin type role is usually called an SRE or Devops role. SRE being site reliability engineer. New name but many of the same responsibilities.
Now though infrastructure as code becomes front & center. Tools like CloudFormation & Terraform, plus Ansible, Chef & Jenkins are all quite mature, and being used everywhere.
Check out your infrastructure code from git, and run terraform apply. Minutes later you have rebuilt your entire stack, from bare metal to fully functioning & autoscaling application. Cool!
However, these days cloud, automation & microservices have brought a lot of madness too! Don’t believe me? Check out this piece on microservice madness.
With microservices you have more databases across the enterprise, on more platforms. How do you restore all at the same time? How do you do point-in-time recovery? What if your managed service goes down?
Migration scripts have become popular for making DDL changes in the database. Going forward (adding columns or tables) is great. But should we be letting our deployment automation roll *BACK* DDL changes? Remember, that deletes data, right? 🙂
What about database drop & rebuild? Or throwing databases in a docker container? No bueno. But we’re seeing this more and more. New performance problems are cropping up because of that.
What about when your database upgrades automatically? Remember, when you use a managed service, it is built for 1000 users, not one. So if your use case is different you may struggle.
In my experience upgrading RDS was a nightmare. Database as a service upgrades lack visibility. You don’t have OS or SSH access so you can’t keep track of things. You just simply wait.
No longer do we have “zero downtime”. With Amazon RDS you have guaranteed-downtime upgrades. No, seriously.
As the field of databases fragments, we are wearing many more hats. If you like this challenge & enjoy being a generalist, you may feel at home here. But it is a long way from one platform one skill set career path.
Fragmented db platforms also mean more complex recovery. I can’t stress this enough. It would become practically impossible to restore all microservices, all their underlying databases & all systems to one single point in time, if you needed to.
Let’s say you’re hiring a devops engineer and you want to suss out their database knowledge. Or you’re hiring a professional services firm or freelance consultant. Whatever the case, you’ll need to sift through for the best people. Here’s how.
Caching is a popular way to speed up access to your backend database. Put Amazon’s ElastiCache behind your webserver, and you can reduce load on your database by 90%. Nice!
The two types that Amazon supports are Memcached & Redis. Memcached is historically more popular, but these days Redis seems a clear winner. It’s faster, and can persist your cached data between restarts. That will save you, I promise!
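The classic cache-aside pattern is what buys you that kind of reduction. Here’s a bare-bones sketch with redis-cli; the key, query and TTL are made up for illustration:

```bash
KEY="user:42:profile"
VALUE=$(redis-cli GET "$KEY")

if [ -z "$VALUE" ]; then
  # Cache miss: hit the database once, then cache the result for 5 minutes
  VALUE=$(mysql -N -e "SELECT profile_json FROM users WHERE id = 42;")
  redis-cli SET "$KEY" "$VALUE" EX 300 > /dev/null
fi

echo "$VALUE"
```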
When you’re doing large reports for your business intelligence team, you don’t want to bog down your backend relational database. Redshift is purpose built for this use case.
I’ve seen a report that took over 8 hours in MySQL return in under 60 seconds in Redshift!
A newer offering is Amazon Redshift Spectrum. This tech is super cool. Load up all your data into S3, in standard CSV format. Then, without even loading it into Redshift, you can query the S3 data directly. This is super useful. Firstly because S3 is a tenth of the price, but also because it allows you to stage your data before loading it into Redshift itself. Goodbye Google BigQuery! I talked about Spectrum here.
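Here’s roughly what that looks like from the Redshift side, as a hedged sketch; the cluster endpoint, IAM role, bucket and columns are all placeholders:

```bash
psql -h my-cluster.example.redshift.amazonaws.com -p 5439 -U admin -d analytics <<'EOF'
-- Point Redshift at an external catalog backed by S3
CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG DATABASE 'spectrumdb'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Describe the CSVs sitting in S3; no load step required
CREATE EXTERNAL TABLE spectrum.sales (
  sale_id   INT,
  sale_date DATE,
  amount    DECIMAL(10,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://example-bucket/sales/';

-- Query S3 data as if it were a local table
SELECT sale_date, SUM(amount) FROM spectrum.sales GROUP BY sale_date;
EOF
```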
What relational database options are there on Amazon?
Amazon supports a number of options through its Relational Database Service, or RDS. These are managed databases, which means less work on your DBA’s shoulders. It also may make upgrades slower and harder, with more downtime, but you get what you pay for.
There are a lot of platforms available. As you might guess, MySQL & Postgres are there. Great! Even better, you can use MariaDB if that’s your favorite. You can also go with Aurora, Amazon’s own home-brewed, drop-in replacement for MySQL that promises greater durability and some speedups.
If you’re a glutton for punishment, you can even get Oracle & SQL Server working on RDS. Very nice!
Let’s be honest, Amazon wants to make this really easy. The quicker & simpler it is to get your data there, the more you’ll buy!
Amazon’s Database Migration Service, or DMS, allows you to configure your old database as a data source, then choose an Amazon db solution as the destination, then just turn on the spigot and pump your data in!
ETL is extract, transform and load: data warehouse terminology for slicing and dicing data before you load it into your warehouse. Many of today’s warehouses are being built with the data lake model, because databases like Redshift have gotten so damn fast. That model means you stage all your source data as-is in your warehouse, then build views & summary tables as needed to speed up queries & reports. Even better, you might look at a tool like Xplenty.
Whether you’re a hiring manager, head of HR or recruiter, you are probably looking for a devops expert. These days good ones are not easy to find. The spectrum of tools & technologies is broad. To manage today’s cloud you need a generalist.
If you’re a devops expert looking for a job, these are also some essential questions you should have in your pocket. Be able to elaborate on these high-level concepts, as they’re crucial in today’s agile startups.
Believe it or not there are small 1 person teams that haven’t done this. But even with those, there’s real benefit. Get on it!
B. Evolve to one script push-button deploy (script)
If deploying new code involves a lot of manual steps (move a file here, set a config there, set a variable, set up an S3 bucket), then start scripting. That midnight deploy process should be one master script which includes all the logic.
It’s a process to get there, but keep the goal in sight. A rough sketch of what such a script might look like follows.
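This is a sketch only; every path, bucket and helper name here is a stand-in for your own steps:

```bash
#!/usr/bin/env bash
# deploy.sh <version> : the entire midnight deploy in one place
set -euo pipefail
VERSION=${1:?usage: deploy.sh <version>}

git fetch --tags && git checkout "v${VERSION}"                  # 1. pull the tagged release

make build                                                      # 2. build artifacts
aws s3 sync ./public "s3://example-app-assets/${VERSION}/"      #    push static assets

./scripts/render_config.sh production                           # 3. config & migrations
./scripts/run_migrations.sh

# 4. roll the new code onto the app servers
ansible-playbook -i inventory/production deploy.yml --extra-vars "version=${VERSION}"
```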
C. Build confidence over many iterations (team process & agile)
As you continue to deploy manually with a master script, you’ll iron out more details, contingencies, and problems. Over time you’ll gain confidence that the script does the job.
D. Employ continuous integration Tools to formalize process (CircleCI, Jenkins)
Now that you’ve formalized your deploy in code, putting these CI tools to use becomes easier. They’re purpose-built for exactly this stage!
E. 10 deploys per day (long term goal)
Your longer term goal is 10 deploys a day. After you’ve automated tests, team confidence will grow around developers being able to deploy to production. On smaller teams of 1-5 people this may still be only 10 deploys per week, but still a useful benchmark.
Microservices is about two-pizza teams. Small enough that there’s little bureaucracy. Able to be agile, focused on one business function. Iterating quickly without logjams with other business teams & functions.
Microservices interact with each other through APIs, deploy their own components, and use their own isolated data stores.
Function as a service, Amazon Lambda, or serverless computing enables microservices in a huge way.
Serverless computing is a model where servers & infrastructure do not need to be formalized. Only the code is deployed, and the platform, AWS Lambda for example, takes care of instant provisioning of containers & VMs when the code gets called.
Events within the cloud environment, such as a file added to an S3 bucket, trigger the serverless functions. API Gateway endpoints can also trigger the functions to run.
Authentication services are used for user login & identity management such as Auth0 or Amazon Cognito. The backend data store could be Dynamodb or Google’s Firebase for example.
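To make the event-trigger flow concrete, here’s a hedged AWS CLI sketch that deploys a function and wires it to S3 object-created events; every name and ARN is a placeholder:

```bash
# Deploy the function code
aws lambda create-function \
  --function-name thumbnailer \
  --runtime python3.11 \
  --handler handler.main \
  --role arn:aws:iam::123456789012:role/lambda-exec-role \
  --zip-file fileb://function.zip

# Let S3 invoke it
aws lambda add-permission \
  --function-name thumbnailer \
  --statement-id s3invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::example-uploads

# Fire the function whenever a new object lands in the bucket
aws s3api put-bucket-notification-configuration \
  --bucket example-uploads \
  --notification-configuration '{"LambdaFunctionConfigurations":[{"LambdaFunctionArn":"arn:aws:lambda:us-east-1:123456789012:function:thumbnailer","Events":["s3:ObjectCreated:*"]}]}'
```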
Containers are like faster-deploying VMs. They have all the advantages of an image or snapshot of a server. Why is this useful? Because you can containerize your microservices, so each one does one thing. One has a webserver, with a specific version of xyz.
Containers can also help with legacy applications, as you isolate older versions & dependencies that those applications still rely on.
Containers enable developers to setup environments quickly, and be more agile.
CloudFormation formalizes all of your cloud infrastructure into JSON files. Want to add an IAM user, S3 bucket, RDS database, or EC2 server? Want to configure a VPC, subnet or access control list? All these things can be formalized into CloudFormation files.
Once you’ve started down this road, you can checkin your infrastructure definitions into version control, and manage them just like you manage all your other code. Want to do unit tests? Have at it. Now you can test & deploy with more confidence.
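A tiny template and one CLI call show the idea; the stack and bucket names are placeholders:

```bash
cat > stack.json <<'EOF'
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "AssetsBucket": {
      "Type": "AWS::S3::Bucket",
      "Properties": { "BucketName": "example-app-assets" }
    }
  }
}
EOF

aws cloudformation create-stack --stack-name example-assets --template-body file://stack.json
# stack.json goes into git; preview future edits with change sets before updating
```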
Terraform is an alternative to CloudFormation, with even more power built in.
For technical interviews, I like to focus on concepts & the big picture. That rules out coding exercises or other puzzles, which I think distract from the process. I really like what the folks at 37signals say…
“Hire for attitude. Train for skill.”
So let’s get started.
1. How do you automate deployment?
Programming lambda functions is much like programming in other areas, with some particular challenges. When you first dive in, you’ll use the Amazon dashboard to upload a zipfile with your code. But as you become more proficient, you’ll want to create a deployment pipeline.
o What features in Amazon facilitate automatic deployments?
AWS Lambda supports environment variables. Use these for credentials & other data you don’t want in your deployment package.
Amazon’s serverless offering also supports aliases. You can have dev, stage & production aliases. That way you can deploy functions for testing without interrupting production code. What’s more, when you are ready to push to production, the endpoint doesn’t change.
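Putting environment variables and aliases together, a pipeline step might look like this sketch; the function name, variables and alias are assumptions, and the alias is assumed to already exist:

```bash
# Keep secrets & config out of the zipfile
aws lambda update-function-configuration \
  --function-name my-service \
  --environment "Variables={DB_HOST=db.internal.example.com,STAGE=production}"

# Ship new code, freeze it as an immutable version, then repoint the production alias
aws lambda update-function-code --function-name my-service --zip-file fileb://build/my-service.zip
VERSION=$(aws lambda publish-version --function-name my-service --query Version --output text)
aws lambda update-alias --function-name my-service --name production --function-version "$VERSION"
```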
o What frameworks are available for serverless?
Serverless Framework is the most full-featured option. It fully supports Amazon Lambda, and as of 1.0 provides support for other platforms such as IBM OpenWhisk, Google Cloud Functions & Azure Functions. There is also SAM, the Serverless Application Model, which extends CloudFormation. With this, you can script changes to API Gateway, DynamoDB & Cognito authentication resources.
If you’re using Auth0 instead of Cognito or Firebase instead of Dynamodb, you’ll have to come up with your own way to automate changes there.
Why are we moving to a serverless computing model? What are the advantages & benefits of it?
o easier operations means faster time to market
o large application components become managed
o reduced costs, only pay while code is running
o faster deploy means more experimentation, more agile
o no more worrying about which servers this code will run on
o reduced people costs & less infrastructure
o no chef playbooks to manage, no deploy keys or IAM roles
There are a lot of fanboys of serverless, because of the promise & hope of this new paradigm. But what about healthy criticism? A little dose of reality can identify a critical & active mind.
o With Lambda you have less vendor control, which could mean… more downtime, system limits, sudden cost changes, loss of functionality or features, and possible forced API upgrades. Remember that Amazon will choose the needs of the many over your specific application idiosyncrasies.
o There’s no dedicated hardware option with serverless. So you have the multi-tenant challenges of security & performance problems of other customers code. You may even bump into problems because of other customers errors!
o Vendor lock-in is a real obvious issue. Changing to Google Cloud Functions or Azure Functions would mean new deployment & monitoring tools, a code rewrite & rearchitect, and new infrastructure too. You would also have to export & import your data. How easy does Amazon make this process?
o You can no longer store application & state data in local server memory. Because each instantiation of a function will effectively be a new “server”. So everything must be stored in the database. This may affect performance.
o Testing is more complicated. With multiple vendors, integration testing becomes more crucial. Also, how do you create a dev db instance? How do you fully test offline on a laptop?
o You could hit system-wide limits. For example, a big dev deploy could take out production functions by hitting an AWS account limit. You would thus have DDoS’d yourself! You can also hit the 5-minute execution time limit, and your code will get aborted!
o How do you do zero downtime deployments? Since Amazon currently deploys function-by-function, if you have a group of 10 or 20 that act as a unit, they will get deployed in pieces. So your app would need to be taken offline during that period or it would be executing some from old version & some from new version together. With unpredictable results.
o In serverless you may use multiple vendors, such as Auth0 for authentication, and perhaps Firebase for your data. With Lambda as your serverless platform you now have three vendors to work with. More vendors means a larger area across which hackers may attack your application.
o With the function as a service application model, you lose the protective wall around your database. It is no longer safely deployed & hidden behind a private subnet. Is this sufficient protection of your key data assets?
While everyone is scrambling to figure out why part of the internet went down… wait, is S3 part of the internet, really? While I’m figuring out if it is a service of Amazon, or if Amazon is so big that Amazon *is* the internet now…
I loved Drew Bell’s story of stumbling into home ownership, attempting to fix a doorbell, and falling down a familiar rabbit hole. With parallels to legacy software systems… aka anything older than, oh say, five years?
If you grew up on the virtual world of the cloud, you may have never touched hardware besides your own laptop. Developing in this world may completely remove us from understanding those pesky underlying physical layers. Yes indeed folks containers do run in “virtual” machines, but those themselves are running on metal, somewhere down the stack.
I worked with a customer last year, on a short-term assignment. A brilliant engineer had built their infrastructure, automated deployments, and managed all the systems. Sadly, despite all the sleepless nights and dedication, they hadn’t managed to build a good rapport with management.
I’ve seen this happen so many times, and I do find it a bit sad. Here’s an engineer who’s working his butt off, really wants the company to succeed, and really cares about the systems. But he doesn’t connect well with people, and is often dismissive, disrespectful, or talks down to people like they’re stupid. All of this burns bridges, and there’s a lot of bad feeling between all parties.
So how do you manage the exit process? Here’s a battery of recommendations for changing credentials & logins so that systems can’t be accessed anymore.
1. Lock out API access
You can do this by removing the administrator role or any other roles their IAM user might have. That way you keep the account around *just in case*. This will also prevent them from doing anything in the console, but you can see if they attempt any logins.
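With the AWS CLI, that lockout is a handful of commands; here’s a sketch where the user and policy names are placeholders:

```bash
USER=departing-sysop

# Strip admin rights but keep the user, so any login attempts still show up in CloudTrail
aws iam detach-user-policy --user-name "$USER" \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

# Deactivate every API access key
for key in $(aws iam list-access-keys --user-name "$USER" \
               --query 'AccessKeyMetadata[].AccessKeyId' --output text); do
  aws iam update-access-key --user-name "$USER" --access-key-id "$key" --status Inactive
done

# Remove the console password
aws iam delete-login-profile --user-name "$USER"
```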
At one of my customers, the outgoing op had set up many moving parts & automated & orchestrated all the deployment processes beautifully. However, he had also used his personal GitHub key inside Jenkins. So when it went to deploy, it used those credentials to get the code from GitHub. Oops.
We ended up creating a company GitHub account, then updating Jenkins with those credentials. There were of course other places, in the Capistrano bits, that also needed to be reviewed.
Have some servers outside of AWS in a traditional datacenter? Or even servers in AWS that are using usernames & passwords? Be sure to audit the full list of systems, and change passwords or disable accounts for the outgoing sysop.