A lot of firms come to us with a specific scalability problem. “Our user base is growing rapidly and the website is falling over!” Or they’re selling more widgets: “Our shopping cart is slowing down and we’re seeing users abandon their purchases.” These are real startup growing pains, so what should you do?
We like to take a measured approach with these types of challenges, so we thought it would be helpful to run through a hypothetical scenario to show how we work.
1. Contract outline
First we talk on the phone or meet face to face to discuss what’s happening. Do you have one page that’s problematic? Is the website slow during certain hours? Or are you seeing erratic behavior that you can’t trace to a single source?
From there we outline a course of action, based on:
o talking with your team, devs & architects
o reviewing systems first hand
o identifying bottlenecks and trouble spots
With this outline we’ll include an estimate of the number of work days the engagement will take. We’ll then send it to you for review, arrange a deposit and set a start date.
2. Meet team & discuss architecture
Next we’ll meet the team and review the problems in more technical detail. If you’re in NYC we’ll probably stop by your offices for a warm meet & greet. If you’re located further afield we can either meet over a Skype call or travel to your location for the start of the engagement.
3. Measure current throughput
To get a sense of the current state of your systems, we’ll measure some system metrics, such as load average, queries per second or other MySQL internal metrics. We’ll also look at business metrics such as the time to complete an ecommerce checkout, or a speed test of a particularly slow page.
These metrics establish a baseline of where things stand before any changes are made.
[quote]Measuring both business and system metrics before and after changes allows a rough ROI measurement. This goes a long way toward justifying the expense of current and future performance reviews.[/quote]
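To make the baseline concrete, here is a minimal sketch of two such measurements, assuming you can already fetch `SHOW GLOBAL STATUS` output as a name-to-value mapping. Queries per second is derived from two samples of MySQL’s `Questions` counter, and a business action (for example a hypothetical scripted `run_checkout()`) is simply timed end to end. The function names and sample numbers are illustrative, not part of any particular tool.

```python
import os
import time


def queries_per_second(before: dict, after: dict, interval_s: float) -> float:
    """Average queries per second between two SHOW GLOBAL STATUS snapshots.

    `before` and `after` map status variable names to values (strings or
    ints), sampled `interval_s` seconds apart. Uses the `Questions` counter.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (int(after["Questions"]) - int(before["Questions"])) / interval_s


def timed(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds).

    Wrap a scripted business action, e.g. a hypothetical run_checkout(),
    to record how long it takes end to end.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical snapshots of the Questions counter, 60 seconds apart.
    before = {"Questions": "1200000"}
    after = {"Questions": "1290000"}
    print(f"queries/sec: {queries_per_second(before, after, 60):.1f}")  # queries/sec: 1500.0
    if hasattr(os, "getloadavg"):  # 1, 5 and 15 minute load averages (Unix only)
        print("load average:", os.getloadavg())
```

Recording the same numbers again after changes is what makes the before/after comparison possible.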
4. Review systems, configurations & setups
Next we’ll log into the various systems and review their configurations. This includes web servers, caching servers and database servers as necessary. We’ll review memory settings, important configurations, all the dials and switches.
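As one example of a memory-settings check, the MySQL documentation notes that on a dedicated database server the InnoDB buffer pool is often set to as much as 80% of physical memory. The sketch below flags a buffer pool that looks too small or too large relative to RAM. It assumes a dedicated database host and a `variables` mapping from `SHOW GLOBAL VARIABLES`; the 50% lower bound is an illustrative threshold of our own, not a hard rule.

```python
from __future__ import annotations

GIB = 1024 ** 3


def review_buffer_pool(variables: dict, total_ram_bytes: int,
                       low: float = 0.5, high: float = 0.8) -> list[str]:
    """Flag an InnoDB buffer pool that looks too small or too large for the host.

    Assumes a dedicated database server; `low` and `high` are illustrative
    fractions of RAM, not hard rules.
    """
    pool = int(variables["innodb_buffer_pool_size"])
    ratio = pool / total_ram_bytes
    findings = []
    if ratio < low:
        findings.append(f"innodb_buffer_pool_size is {ratio:.0%} of RAM; likely too small")
    elif ratio > high:
        findings.append(f"innodb_buffer_pool_size is {ratio:.0%} of RAM; risk of swapping")
    return findings


if __name__ == "__main__":
    # 128 MiB (the MySQL default) on a hypothetical 16 GiB server.
    print(review_buffer_pool({"innodb_buffer_pool_size": "134217728"}, 16 * GIB))
```

In practice a review looks at many such settings together; this only shows the shape of one check.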
We’ll also review development practices and architecture. Are you using Java with Hibernate, a popular ORM? Or perhaps CakePHP? Are you writing custom SQL code? Are developers up to speed with EXPLAIN and query profiling? For that matter, is your code in version control?
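To illustrate what reading EXPLAIN output involves, here is a minimal sketch that scans EXPLAIN rows for common warning signs: an access `type` of `ALL` (a full table scan) and `Using filesort` or `Using temporary` in the `Extra` column. It assumes the rows have already been fetched as dicts keyed by EXPLAIN’s column names (for example via a dictionary cursor); the sample query data is hypothetical.

```python
from __future__ import annotations


def flag_explain_rows(rows: list[dict]) -> list[str]:
    """Return warnings for EXPLAIN output rows that often indicate slow queries.

    Each row is assumed to be a dict keyed by EXPLAIN's column names
    (table, type, key, rows, Extra), e.g. as returned by a dict cursor.
    """
    warnings = []
    for row in rows:
        table = row.get("table")
        if row.get("type") == "ALL":  # access type ALL means a full table scan
            warnings.append(f"{table}: full table scan, ~{row.get('rows')} rows estimated")
        extra = row.get("Extra") or ""
        if "Using filesort" in extra:
            warnings.append(f"{table}: sort not satisfied by an index (filesort)")
        if "Using temporary" in extra:
            warnings.append(f"{table}: temporary table created")
    return warnings


if __name__ == "__main__":
    # Hypothetical EXPLAIN output for a slow order-history query.
    sample = [
        {"table": "orders", "type": "ALL", "key": None, "rows": 250000,
         "Extra": "Using where; Using filesort"},
        {"table": "customers", "type": "eq_ref", "key": "PRIMARY", "rows": 1,
         "Extra": None},
    ]
    for warning in flag_explain_rows(sample):
        print(warning)
```

A flagged row is a prompt for a closer look, such as adding an index, rather than proof that the query is the bottleneck.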
5. Report on actionable advice & findings
Perhaps the most useful part of an initial engagement is our overall findings and review report. We’ve found these reports are very valuable to firms because they speak to people up and down the business hierarchy. They address management on high-level architectural problems and structural or process-related challenges. They also speak to developers and operations teams, giving them a third-party bird’s-eye view of day-to-day activities.
6. Discuss which steps to move on
From here we’ll meet again, in particular to review the actionable advice. Some changes will be low cost and require no downtime, while others might need a downtime window. Medium-term changes might require refactoring and redeploying some code. Larger, longer-term architectural changes are typically outlined as well.
Based on time and cost, we’ll decide together which changes take priority. Naturally we’ll tackle the low-hanging fruit first and move forward from there.
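One simple way to express “low-hanging fruit first” is to put changes that need no downtime ahead of those that do, then rank by estimated gain per work day. The sketch below does exactly that. The `Change` fields, the change names and all the numbers are hypothetical, and in a real engagement the ordering is agreed in discussion rather than computed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Change:
    name: str
    effort_days: float     # estimated work days
    needs_downtime: bool   # requires a maintenance window?
    expected_gain: float   # rough estimated improvement, e.g. % faster checkout


def prioritize(changes: list[Change]) -> list[Change]:
    """Order changes: no-downtime first, then by estimated gain per work day."""
    for c in changes:
        if c.effort_days <= 0:
            raise ValueError(f"{c.name}: effort_days must be positive")
    # False sorts before True, so no-downtime changes come first.
    return sorted(changes, key=lambda c: (c.needs_downtime, -c.expected_gain / c.effort_days))


if __name__ == "__main__":
    plan = prioritize([
        Change("migrate to a larger database server", 5, True, 25),
        Change("add missing index on orders", 0.5, False, 15),
        Change("rewrite slow report query", 2, False, 10),
    ])
    for c in plan:
        print(c.name)
```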
7. Take action on agreed changes
Once we’ve agreed on which changes to make, we’ll schedule downtime windows as needed and make the changes to your systems. From there we’ll carefully monitor everything for stability and any adverse effects.
8. Measure throughput again
Using the throughput measurements from step 3 above, we’ll run those same benchmarks again. We’ll check low-level system metrics along with higher-level business and user-facing throughput. Both are important because they give different perspectives on the changes made.
For example, if the system metrics improve markedly but the business or user metrics do not, we know our changes had some effect on overall performance, but we likely haven’t yet found the bottleneck directly causing the business slowdown.
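The comparison described above can be sketched as follows: compute a percentage improvement for each metric (flipping the sign for metrics where lower is better, such as checkout time), then classify the outcome. The 10% threshold for a “marked” improvement and the sample numbers are assumptions chosen for illustration.

```python
def improvement_pct(before: float, after: float, higher_is_better: bool = True) -> float:
    """Percentage improvement from baseline; positive always means 'better'."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    change = (after - before) / before * 100
    return change if higher_is_better else -change


def interpret(system_gain: float, business_gain: float, threshold: float = 10.0) -> str:
    """Classify a before/after comparison; the percent threshold is an assumption."""
    system_ok = system_gain >= threshold
    business_ok = business_gain >= threshold
    if system_ok and business_ok:
        return "both improved"
    if system_ok:
        return "system metrics improved but business metrics did not; keep looking for the bottleneck"
    if business_ok:
        return "business metrics improved but system metrics did not"
    return "no significant improvement"


if __name__ == "__main__":
    # Hypothetical numbers: queries/sec (higher is better), checkout seconds (lower is better).
    system = improvement_pct(1500, 2100)
    business = improvement_pct(8.0, 7.6, higher_is_better=False)
    print(interpret(system, business))
```

Here queries per second rise by about 40% while checkout time improves by only about 5%, which is exactly the “keep looking” case.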
9. Summarize findings & performance gain
In the most likely case both improve markedly, and we can quantify the gains from the entire performance review.
This is helpful in measuring the overall return on investment for the engagement. ROI is an important exercise because we want to know that the money was well spent.
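As a rough illustration, simple ROI is the gain minus the cost, divided by the cost. The sketch below assumes you have already estimated the gain, for example extra revenue attributed to a faster checkout over some period. That estimate is the hard part and is not modeled here, and the dollar figures are hypothetical.

```python
def roi(gain: float, cost: float) -> float:
    """Simple return on investment: (gain - cost) / cost, as a fraction."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


if __name__ == "__main__":
    # Hypothetical figures: engagement cost vs. extra revenue attributed to the
    # faster checkout over the same period.
    print(f"ROI: {roi(gain=50_000, cost=20_000):.0%}")  # ROI: 150%
```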
10. Document solutions & recommendations
The last step is to document what we did and what we learned. This lets us carry that knowledge forward and keep applying it to development and operations, so the business continues to get value from the engagement even after it’s completed.