How do we become a high performing organization, to move faster and build more secure and resilient systems? That’s the $64,000 question!
A16Z strikes again! Andreeson Horowitz’s epic podcast hosts world class guests around all sorts of startup & new technology topics. This week they interview Jez Humble and Nicole Forsgren. They run Dora which is DevOps Research and Assessment, which shows organizations just how to get the advantages of devops in the real world.
Technology does not drive organizational performance
Check out section 16:04 in the podcast…
“the point of distinction comes from how you tie process and culture together technology through devops”
It’s the classic Amazon model. They’re running hundreds of experiments in production at any one time!
What about tooling? Is that important? Here’s what Jez has to say. Jump to 29:30…
“Implementing those technologies does *not* give you those outcomes. You can achieve those results with Mainframes. Equally you can use Kubernetes, Docker and microservices and not achieve those outcomes.”
I flipped through page after page, and chapter after chapter, and found the bits that I thought were particularly useful. And I have summarized those here.
o docker-compose organizes docker runs with a yaml config
o multiple services in one container is an antipattern
o deleting files don’t reduce container size, because they still exist in previous layer
o export followed by import can be a quick way to reduce image size
o docker-machine allows you to provision containers on virtual hosts locally or in the cloud
o build a private registry node, then push & pull images through it with deploy pipeline
o unit tests are key and provide tests for individual functions in your code
o component tests are also important to test api endpoints for example
o integration tests can be useful, verifying an auth service or external API is working with app
o end-to-end tests verify that the entire application is working
o by default containers can talk, consider –icc=false & –iptables=true
o passing secrets with env variables or better yet use a file, vault or kms
o SkyDNS on top of etcd can provide a powerful service discovery solution
o use registrator project to automatically register containers when they start
o orchestration with swarm (native), fleet, mesos or Kubernetes
o don’t run as root – because a breakout would have root on host
o use limits on memory, cpu, restarts & filesystem to avoid DoS
o defang setuid root binaries with a find +6000 & chmod a-s
o use gpg keys & verify checksums when downloading software
o selinux & AppArmor may help, but buyer beware
o you can use logsprout to send docker image logs to logstash
o add elasticsearch on top with kibana as frontend to give a great searchable logging UI
o Jason Wilder’s docker-gen can streamline config file creation from templates
o we can modularize compose files with the extends keyword (like library import)
o audit containers & use docker diff to find issues
Devops is in serious demand these days. At every meetup or tech event I attend, I hear a recruiter or startup founder talking about it. It seems everyone wants to see benefits of talented operations brought to their business.
That said the skill set is very broad, which explains why there aren’t more devs picking up the batton.
I thought it would be helpful to put together a list of interview questions. There are certainly others, but here’s what I came up with.
1. Explain the gitflow release process
As a devops engineer you should have a good foundation about software delivery. With that you should understand git very well, especially the standard workflow.
Although there are other methods to manage code, one solid & proven method is gitflow. In a nutshell you have two main branches, development & master. Developers checkout a new branch to add a feature, and push it back to development branch. Your stage server can be built automatically off of this branch.
Periodically you will want to release a new version of the software. For this you merge development to master. UAT is then built automatically off of the master branch. When acceptance testing is done, you deploy off of master to production. Hence the saying always ship trunk.
Bonus points if you know that hotfixes are done directly off the master branch & pushed straight out that way.
There are a lot of tools in the devops toolbox these days. One that is great at provisioning resources is Terraform. With it you can specify in declarative code everything your application will need to run in the cloud. From IAM users, roles & groups, dynamodb tables, rds instances, VPCs & subnets, security groups, ec2 instances, ebs volumes, S3 buckets and more.
You may also choose to use CloudFormation of course, but in my experience terraform is more polished. What’s more it supports multi-cloud. Want to deploy in GCP or Azure, just port your templates & you’re up and running in no time.
It takes some time to get used to the new workflow of building things in terraform rather than at the AWS cli or dashboard, but once you do you’ll see benefits right away. You gain all the advantages of versioning code we see with other software development. Want to rollback, no problem. Want to do unit tests against your infrastructure? You can do that too!
The four big choices for configuration management these days are Ansible, Salt, Chef & Puppet. For my money Ansible has some nice advantages.
First it doesn’t require an agent. As long as you have SSH access to your box, you can manage it with Ansible. Plus your existing shell scripts are pretty easy to port to playbooks. Ansible also does not require a server to house your playbooks. Simply keep them in your git repository, and checkout to your desktop. Then run ansible-playbook on the yaml file. Voila, server configuration!
Unit testing & integration testing are super import parts of continuous integration. As you automate your tests, you formalize how your site & code should behave. That way when you automate the deployment, you can also automate the test process. Let the software do the drudgework of making sure a new feature hasn’t broken anything on the site.
As you automate more tests, you accelerate the software development process, because you’re doing less and less manually. That means being more agile, and makes the business more nimble.
Docker a low overhead way to run virtual machines on your local box or in the cloud. Although they’re not strictly distinct machines, nor do they need to boot an OS, they give you many of those benefits.
Docker can encapsulate legacy applications, allowing you to deploy them to servers that might not otherwise be easy to setup with older packages & software versions.
Docker can be used to build test boxes, during your deploy process to facilitate continuous integration testing.
Docker can be used to provision boxes in the cloud, and with swarm you can orchestrate clusters too. Pretty cool!
Since devops brings a new process of continuous delivery to the organization, it involves some risk. Actually doing things the old way involves more risk in the long term, because things can and will break. With automation, you can recovery quicker from failure.
But this new world, requires a leap of faith. It’s not right for every organization or in every case, and you’ll likely strike a balance from what the devops holy book says, and what your org can tolerate. However inevitably communication becomes very important as you advocate for new ways of doing things.
I was talking to a customer recently and they asked about deployments. They wanted to do things the real way. Here’s a snippet…
I’m helping out a company called Blue Marble and they are getting ready to deploy a new POS system. The app has been built using a Node.js back-end and Google Cloud Datastore for storage. The current dev build is hosted on AWS and connects to Google for the data bits.
For prod launch, they are interested in migrating to the “real” way of deployment on Google for everything.
They are pressed on time and looking for someone who can jump in quickly. Are you available? Do you have Google Cloud expertise?
Here’s what I said.
Yep, I’ve have used Bigquery & GCE.
What are they looking for specifically? Full deployment automation? Multiple deploys per day?
I’ve found that sometimes the biggest hurdle to fully automated deploys can be cultural issues.
In other words yes you can automate your deployment so it is push button, get all the artifacts & moving parts automated. Then deploy without much intervention. But to go from that to the team having *faith* in the system, that is a challenge.
Once the process has been streamlined, a lot often still needs to happen around unit & smoke tests.
If the team isn’t already in the habit of building tests for each bit of code, this may take some time. Also building tests can be an art in itself. What are the edge cases? What values are out of bounds?
Consider for example odd vulnerabilities that show up when hackers type SQL code into fields that devs were expecting. Sanity checking anyone?
What makes this all even more complicated is integration testing. Today many application use various third party APIs, service-based authentication, and even web-based databases like Firebase. So these things can complicate testing.
Although your project, startup or business may be pressed for time, that may not change the realities of development. Your team has to become culturally ready to be completely agile. Many teams choose a middle ground of automating much of the deployment process, but still having a person in the loop just in case.
Same with testing. Sure automating can make you more agile & more efficient. But you’ll never automate out creative thinking, problem solving & ownership of the product.
The trend to a #NOops movement I think is a dangerous one.
At first glance this might seem reflexive on my part. After all I’ve specialized in operations & databases for years. But I think there’s something more insidious here.
Devs are often presiding over the first wave of software. That’s the initial period of perhaps five years, where frenetic product development is happening. After those years have passed, early innovators are long gone, and an OPS team is trying to keep things running, and patch where necessary. This is when more conservative thinking, and the perspective of fewer moving parts & a simpler infrastructure seems so obvious. All the technical debt is piled up & it’s hard to find the front door.
I’ve sat in on teams talking about getting rid of ops & how it’ll mean more money to spend on devs etc. It’s always a surprising sentiment to hear.
I would argue that developers have a mandate to build production & functionality that can directly help customers. This is in essence a mandate for change. Faster, more agile & responsive means quicker to market & more responsive to changes there.
How I setup wordpress to deploy automatically on aws
You want to make your wordpress site bulletproof? No server outage worries? Want to make it faster & more reliable. And also host on cheaper components?
I was after all these gains & also wanted to kick the tires on some of Amazon’s latest devops offerings. So I plotted a way forward to completely automate the deployment of my blog, hosted on wordpress.
In this one I decouple the assets from the website. What do I mean by this? By moving the db to it’s own server or RDS of even simpler management, it means my server can be stopped & started or terminated at will, without losing all my content. Cool.
You’ll also need to decouple your assets. Those are all the files in the uploads directory. Amazon’s S3 offering is purpose built for this use case. It also comes with easy cloudfront integration for object caching, and lifecycle management to give your files backups over time. Cool !
In a future post we’re going to put all these files in version control. Amazon’s CodeCommit is feature compatible with Github, but integrated right into your account. Once you have your files there, you can use CodeDeploy to automatically place files on your server.
We chose to leave this step out, to simplify the server role you need, for your new EC2 webserver instance. In our case it only needs S3 permissions!
The terraform configuration formalizes what you are asking of Amazon’s API. What size instance? Which AMI? What VPC should I launch in? Which role should my instance assume to get S3 access it needs? And lastly how do we make sure it gets the same Elastic IP each time it boots?
Visit the domain name you specified inside your /etc/httpd/conf.d/mysite.conf
You have full automation now. Don't believe me? Go ahead & TERMINATE the instance in your aws console. Now drop back to your terminal and do:
$ terraform apply
Terraform will figure out that the resources that *should* be there are missing, and go ahead and build them for you. AGAIN. Fully automated style!
Don't forget your analytics beacon code
Hopefully you remember how your analytics is configured. The beacon code makes an API call everytime a page is loaded. This tells google analytics or other monitoring systems what your users are doing, and how much time they're spending & where.
This typically goes in the header.php file. We'll leave it as an exercise to automate this piece yourself!
I was sifting through the CTO school email list recently, and the discussion of performance tuning came up. One manager had posted asking how to organize sprints, and break down stories for the process.
“Agile is not right for fixing performance issues.”
I agree with him & here’s why.
1. Agile roadblocks
At a very high level, agile seeks to organize work around sprints of a few weeks, and sets of stories within those sprints. The assumption here is that you have a set of identified issues. With software development, you have features you’re building. With performance tuning, it’s all about investigation.
How long will it take to solve the crime? Very good question!
Are you seeing general site slowness? Is there a particular feature that loads extremely slowly? Or is there a report that runs forever? Whatever it is, you must first be able to reproduce it. If it’s general site slowness, identify when it is happening.
Once you’ve reproduced your problem, next you want to start digging. Looking at logfiles can help you find errors, such as timeouts. The database has a slow query log, which you’ll definitely want to review. Slow queries can be surfaced by new code deploys, or middleware in front of your database, such as an ORM.
If you find your logfiles aren’t enabled, it’s a good first step to turn them on. Also look at how you’re caching. The browser should be directed to cache, assets should be on CDN, a page cache should protect your application server, and an object cache in front of your database.
As you dig deeper into your problem, you’ll likely uncover the root of your scalability problem. Likely causes include synchronous, serial or locking processes & requests, object relational modelers, lack of caching or new code that has not been tuned well.
This is what I think of as the fun part. You’ve measured the issues, found the problem. Now it’s time to fix it. This is an exciting moment, to bring real benefit to the business. Eliminating a performance problem can feel like springtime at the end of a long cold winter!
I work at a lot of startups, and these days more and more are building tech blogs. With titles like labs or engineering at acme inc, these can be great ways to build your brand, and bring in strong talent.
So how do we make them succeed? It turns out many of the techniques that work for other blogs apply here, and regular attention can yield big gains.
Click bait asside, you *do* still need to write headlines that will click. What works often is for your title to be a little sound bite, encapsulating the gist of your post, but leaving enough hook that people need to click. Don’t be afraid to push the envelope a bit.
Of course you want to make the posts easy as hell to share. Cross posting on twitter, linkedin, facebook and whereever else your audience hangs out is a must. Use tools like hootsuite & buffer to line up a pipeline of content, and try different titles to see which are working.
You’ll also want to enable feedburner. Some folks will add your blog to feedly. Subscriber counts there can be a good indication of how it is growing in popularity too.
You’re going to keep an eye on traffic by installing a beacon into your page header. There are lots of solutions, GA being the obvious one because it’s free. But how to use it?
Ask yourself questions. Who are my readers? Where are they coming from? How long do they spend on average? Do some pages spur readers to read more? Is there copy that works better for readers? Are my readers converting?
It’ll take time if you’re new to the tool, but start with questions like those.
Although you don’t want to go overboard here, you do want to pay some attention. Using keyword rich titles, and < h2 > tags, along with wordpress SEO plugins that support other meta html tags means you’ll be speaking the language search engines understand. Add tags & categories that are relevant to your content.
Don’t overdo it though. Stick to a handful of tags per post. If you add zillions with lots of word order combinations & so forth, this kind of stuff may tip of the search engines in ways that work against you.
When I first started getting serious about blogging, I had an intern helping me with SEO. She did some searching with the moz keyword research tools and found some gems. These are searches that internet users are doing, but for which there still is not great content for.
For example if results showed “cool tech startups in gowanus brooklyn” had no strong results, then writing an article that covered this topic would be a winner right away.
These are big opportunities, because it means if you write directly for that search, you’ll rank highly for all those readers, and quickly grow traffic.
One of the main takeaways from that work is the idea of “getting out of the building”. It means essentially that before you get to far along with your idea, building your product, and too heavily vested and invested in one direction, go do real research with real potential customers.
Right from the beginning test your ideas, and talk to customers. It’s not easy, but if done right will be very revealing.
The book can be had for free at Talking To Humans as an e-book, or send it straight to your kindle for $0.99 cents! With a forward by Steve Blank & Tom Fishburne’s funny cartoons and at only 98 pages, it’s well worth an hour or two of your time.
We all have biases. We think are customers are soccer moms, or 20-somethings who like lattes. By calculating metrics, we find out which market segments actually want our product and why. Keep calculating metrics, and make conclusions from real data.
At the same time beware the dynamic of mistaking statistics for facts. Remain skeptical!
Interviewing real customers in the field requires a lot of balance. Here are a few things you should avoid:
o let speculation equal confirmation
o lead the witness to your conclusion
o talk over them
o selective hearing
o weigh one conversation heavily
o let fear of rejection win
o talking to anyone
o be unprepared for the interview
o try to learn everything at once
o only do customer dev first week
o ask customer to design the product