Scaling high-traffic websites is an ongoing challenge. Web caching is one key technology that can help you achieve huge scale. Here’s a primer on the various types of caching.
An object cache is very fast and efficient at storing name/value pairs. Imagine you had a notebook that kept track of your contacts. You could have attributes centered around a person’s name: Sean + consultant, Sean + database, Sean + New York, Sean + performance. You might have attributes for birthday, hair color, colleagues, or alternate phone numbers. But all of them would be organized as a name and a value. This is what an object cache such as memcache provides.
Web content management systems like Drupal, Joomla and WordPress already support memcache. It merely needs to be installed as a service, and your admin interface configured to use it. Then enjoy the huge performance boost! With custom applications, whether they are written in PHP or Ruby on Rails, you’ll want to build your application using a memcache library. Whenever objects are needed from your database or backend data store, you check the object cache first. Only if you don’t find them there do you go to the database, and then you store the newly fetched items in the object cache for later.
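The check-the-cache-first flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a real memcache integration: `InMemoryCache`, `FakeDatabase`, and `get_object` are hypothetical names invented for this example, and the cache is a plain dictionary that only imitates the get/set style of a memcache client.

```python
class InMemoryCache:
    """Hypothetical stand-in for a memcache client; only get/set are modeled."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


class FakeDatabase:
    """Hypothetical backing store that counts how often it is read."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def fetch(self, key):
        self.reads += 1
        return self._rows.get(key)


def get_object(cache, db, key):
    """Check the object cache first; fall back to the database on a miss."""
    value = cache.get(key)
    if value is not None:
        return value              # cache hit: the database is never touched
    value = db.fetch(key)         # cache miss: go to the database
    if value is not None:
        cache.set(key, value)     # keep it in the cache for later requests
    return value
```

Calling `get_object` twice with the same key reads the database only once; the second call is answered from the cache.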
Object caches themselves can be distributed so that all of the web servers have access to the same cached objects. In more advanced architectures, those object caches can be placed on their own tier. For instance, Facebook has at one point reported using 800 memcache servers serving up a whopping 28 terabytes of memory. Now that’s scalability!
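One way every web server can agree on where a cached object lives is to derive the cache server from the key itself. The sketch below assumes simple hash-modulo placement; the `pick_server` function and the server addresses are hypothetical, and real client libraries may use other schemes (consistent hashing, for example) so that adding a server moves fewer keys.

```python
import hashlib


def pick_server(key, servers):
    """Map a cache key to one server so every web server picks the same one."""
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, so different web servers could disagree on placement.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


# Hypothetical cache tier addresses, for illustration only.
CACHE_SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
```

Because the choice depends only on the key and the server list, two different web servers asking for `"user:1"` will contact the same cache node.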
A page cache sits in front of the web server itself. You can think of a page cache as a sort of streamlined, hyperfast web server. Varnish is a good example of a page cache, sometimes known as a reverse proxy cache. These can be much more efficient than a full-blown web server like Apache, and can be employed successfully to bring further caching and scalability gains.
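To make the idea concrete, here is a toy page cache in Python. It is only a sketch of the concept and does not reflect how Varnish is implemented or configured: the `PageCache` class, its `render` callback (standing in for the real web server), and the time-to-live parameter are all assumptions made for illustration.

```python
import time


class PageCache:
    """Toy page cache: serves stored pages and calls the backend only on a miss."""

    def __init__(self, render, ttl_seconds=60, clock=time.monotonic):
        self._render = render      # hypothetical backend, e.g. the real web server
        self._ttl = ttl_seconds    # how long a stored page stays fresh
        self._clock = clock        # injectable clock, which makes testing easy
        self._pages = {}           # path -> (expires_at, body)

    def get(self, path):
        now = self._clock()
        entry = self._pages.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: skip the backend entirely
        body = self._render(path)  # missing or expired: ask the backend
        self._pages[path] = (now + self._ttl, body)
        return body
```

Repeated requests for the same path within the time-to-live never reach the backend, which is where the efficiency gain comes from.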
On the database server itself, there are various memory areas which cache blocks read from disk. Those buffer caches should all be configured, but more importantly the query cache should be employed where it is available. The MySQL query cache (present through MySQL 5.7 and removed in 8.0) caches queries and their result sets. It can be configured quite large; however, in typical environments 500 MB may well provide sufficient space, as larger queries tend to be executed rarely and so aren’t worth caching. Better to concentrate on your small, frequent queries and result sets!
Content Delivery Network or CDN
Let’s not forget the all-important browser caching. But browser caching is not all controlled by how users have set up IE, Firefox, Chrome or Safari. It has a lot to do with the expirations set in the pages you send. Expiration headers let the application control when or how those pages expire, if at all, and longer expirations mean repeat visits to your site will be lightning fast.
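As a rough illustration of setting expirations, the Python sketch below builds the two standard HTTP response headers browsers use for this, `Cache-Control` and `Expires`. The `caching_headers` helper and the one-hour lifetime are hypothetical; how you attach the headers depends on your web framework or server.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(max_age_seconds, now=None):
    """Build response headers telling browsers how long to keep a page."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        # Modern browsers prefer Cache-Control; Expires is the older equivalent.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(expires, usegmt=True),
    }


# Example: let browsers reuse the page for one hour.
headers = caching_headers(3600)
```

A longer `max_age_seconds` means fewer repeat downloads, at the cost of visitors possibly seeing a stale page until it expires.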