How do we test performance in a microservices world?


I recently ran across this interesting question on a technology forum.

“I’m an engineering team lead at a startup in NYC. Our app is written in Ruby on Rails and hosted on Heroku. We use metrics such as the built-in metrics on Heroku, as well as New Relic for performance monitoring. This summer, we’re expecting a large influx of traffic from a new partnership and would like to have confidence that our system can handle the load.”

“I’ve tried to wrap my head around different types of performance/load testing tools like JMeter, Blazemeter, and others. Additionally, I’ve experimented with scripts which have grown more complex and I’m following rabbit holes of functionality within JMeter (such as loading a CSV file for dynamic user login, and using response data in subsequent requests, etc.). Ultimately, I feel this might be best left to consultants or experts who could be far more experienced and also provide our organization an opportunity to learn from them on key concepts and best practices.”

Join 38,000 others and follow Sean Hull on twitter @hullsean.

Here’s my point by point response.

I’ve been doing performance tuning since the old dot-com days.

It used to be you point a loadrunner type tool at your webpage and let it run. Then watch the load, memory & disk on your webserver or database. Before long you’d find some bottlenecks. Shortage of resources (memory, cpu, disk I/O) or slow queries were often the culprit. Optimizing queries, and ripping out those pesky ORMs usually did the trick.

Related: Why generalists are better at scaling the web

Today things are quite a bit more complicated. Yes jmeter & blazemeter are great tools. You might also get newrelic installed on your web nodes. This will give you instrumentation on where your app spends time. However it may still not be easy. With microservices, you have the docker container & orchestration layer to consider. In the AWS environment you can have bottlenecks on disk I/O where provisioned IOPS can help. But instance size also impacts network interfaces in the weird world of multi-tenant. So there’s that too!

Related: 5 things toxic to scalability

What’s more a lot of frameworks are starting to steer back towards ORMs again. Sadly this is not a good trend. On the flip side if you’re using RDS, your default MySQL or postgres settings may be decent. And newer versions of MySQL are getting some damn fancy & performant indexes. So there’s lots of improvement there.

Related: Anatomy of a performance review

There is also the question of simulating real users. What is a real user? What is an ACTIVE user? These are questions that may seem obvious, although I’ve worked at firms where engineering, product, sales & biz-dev all had different answers. But lets say you’ve answered that. Does are load test simply login the user? Or do they use a popular section of the site? Or how about an unpopular section of the site? Often we are guessing what “real world” users do and how they use our app.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Also published on Medium.