Highly available systems build redundancy into the application and architecture layers to guard against disasters. Because computing systems are built from commodity hardware and components that are prone to failure, redundancy at every layer is key.
Redundancy of switches, network interfaces, and load-balanced webservers is fairly straightforward and run-of-the-mill. Clustering your database tier is another matter entirely. With MySQL, master-master active-passive can work quite well, running circular replication to send all changes to both nodes. Both nodes are able to handle production traffic, and you pick the active one simply by pointing your application at it. Use a technology like MMM or Pacemaker to front your database cluster with a virtual IP (VIP), so no application or webserver changes are required to switch which node takes on the master role.
Redundant components are important in a single datacenter, but what if that datacenter goes out or gets hit by a natural disaster? Is your whole business out? That’s where geographic redundancy and geo load balanced DNS come in. Having redundant copies of your whole site on both the east and west coast with geo-DNS provides the next level of high availability.
Sean Hull discusses on Quora – What is High Availability and why is it important?
Websites and web applications have traffic patterns that are often unpredictable. After all, growth in traffic is really what we’re after. However, even with the best stress testing, it’s sometimes difficult to predict which areas of the site will get inundated, or how the site will scale.
Degrade gracefully describes an architecture built specially to unwind in a smooth manner without any real site-wide outage. What do we mean by that? We mean build in operational switches to turn off components in the site. Have a star rating on pages? Build an on/off switch for your operations team to disable it if necessary. Have site-wide comments, or robust search? Allow those features to be disabled. If possible, architect in a read-only mode for your site that you can turn on in a really difficult situation. By operationalizing these components, you give more flexibility to the operations team, and reduce the likelihood of having a complete outage.
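The operational switches described above might look something like the following. This is only a minimal sketch: the FeatureSwitches class, the render_page function and the feature names are hypothetical, and a real site would more likely keep its switches in a shared config store than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureSwitches:
    """In-memory on/off switches an operations team could flip at runtime."""
    # Feature names are illustrative only, taken from the examples in the text.
    enabled: dict = field(default_factory=lambda: {
        "star_ratings": True,
        "comments": True,
        "search": True,
    })
    read_only: bool = False  # site-wide read-only mode

    def disable(self, feature: str) -> None:
        self.enabled[feature] = False

    def is_on(self, feature: str) -> bool:
        # Unknown features default to off, so a typo fails safe.
        return self.enabled.get(feature, False)


def render_page(switches: FeatureSwitches) -> list:
    """Return the names of the page components that should be rendered."""
    parts = ["article_body"]  # core content is always served
    for feature in ("star_ratings", "comments", "search"):
        if switches.is_on(feature):
            parts.append(feature)
    # In read-only mode, leave out anything that accepts writes.
    if not switches.read_only and switches.is_on("comments"):
        parts.append("comment_form")
    return parts


switches = FeatureSwitches()
full_page = render_page(switches)

# Under heavy load, operations turns off ratings and goes read-only.
switches.disable("star_ratings")
switches.read_only = True
degraded_page = render_page(switches)
```

The point of the design is that the page still renders its core content whichever switches are off, so the site sheds load feature by feature instead of failing all at once.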
Sean Hull asks on Quora: What does degrade gracefully mean, and why is it important?
With cloud-based hosting solutions, new servers can be provisioned and “spun up” with a few options on the command line. This opens a whole new dimension for infrastructure, allowing software scripts to bring new computing power into your web infrastructure.
Internet-based applications often exhibit seasonal traffic patterns, where traffic stays steady or grows slowly over a period and then spikes sharply, requiring much greater computing resources to meet customer demand.
Enter auto-scaling, an even more powerful feature of cloud-based offerings. Define roles for your webservers and database servers, set capacity rules that control how much traffic will trigger new servers to be rolled out, and watch your infrastructure scale automatically to meet the needs of your internet application.
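A capacity rule of the kind described above can be sketched in a few lines. This is an illustration under stated assumptions, not any cloud provider's API: the desired_servers function, the 100 requests per second per server figure and the minimum and maximum limits are all made up for the example.

```python
import math


def desired_servers(requests_per_sec: float,
                    capacity_per_server: float = 100.0,
                    min_servers: int = 2,
                    max_servers: int = 20) -> int:
    """Return how many webservers a simple capacity rule would ask for.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    # Round up: a partially loaded server is still a whole server.
    needed = math.ceil(requests_per_sec / capacity_per_server)
    # Clamp between a floor (for redundancy) and a ceiling (for cost control).
    return max(min_servers, min(max_servers, needed))


quiet_period = desired_servers(150)     # 2 servers
seasonal_spike = desired_servers(950)   # 10 servers
runaway_load = desired_servers(5000)    # capped at 20 servers
```

The floor keeps some redundancy in place even when traffic is low, and the ceiling keeps a traffic spike or a runaway script from provisioning an unbounded number of servers.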
We talked about Scalability previously. So what is mobile scalability? Mobile devices and smartphones run applications just like your laptop or home computer. However, these applications have some special requirements, such as location-based search. They are also typically not as weighty as their desktop counterparts, as memory and computing cycles are limited on a mobile device. What’s more, they should have a reduced network requirement and make fewer roundtrips between the device and the server.
Just like for other web applications, mobile scalability involves a few key areas:
- load balancing the webserver tier
- load balancing the database tier
It also introduces these new requirements:
- fast location based search and lookups
- minimal network usage
If your operations team is tasked with optimizing a mobile application, pay particular attention to the measured amount of data that moves back and forth on each page. Optimize images and/or remove them if possible. Adjust layout for the most popular devices, and spend extra time testing for those.
To address the location-based search requirements, look at what’s happening on your database tier. If you’re running MySQL, enable the slow query log, and watch for heavy queries doing location-based lookups. If you’re not already using radius indexes, definitely consider those as a way to speed up such lookups.
Also consider creative ways of looking at the business requirements to reduce the computational power actually needed. What do we mean by this? Chances are the business team discusses what the application needs to do, and the developers then go about figuring out how to do it. But what happens when, as in the case of a location search, the how is very expensive and slow? It may be that customers think about “near me” in a much looser way than the technical folks do. They may not care at all that the businesses you offer them are exactly within a one-mile radius. They may be happy enough with a search that places them inside a single zipcode, for example, or one that breaks a city area into adjacent boxes instead of circles. A search over a square or rectangular area will be at least an order of magnitude (10x) faster, and possibly as much as 100x faster, because you can index directly on latitude and longitude and then simply compare those values against the lat and long of nearby businesses.
This creative solution may fit business requirements and will bring huge speedups to your database tier, and thus your overall mobile application. Scalability indeed!
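To make the box-instead-of-circle idea concrete, here is a rough sketch. The function names and sample businesses are hypothetical, the 69 miles per degree of latitude figure is an approximation, and the sketch ignores edge cases near the poles and the 180-degree meridian.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate; good enough for a "near me" search


def bounding_box(lat: float, lon: float, half_side_miles: float):
    """Return (min_lat, max_lat, min_lon, max_lon) for a square around a point."""
    dlat = half_side_miles / MILES_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = half_side_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)


def businesses_in_box(businesses, box):
    """Keep businesses whose coordinates fall inside the box.

    In a database this is a plain range filter, for example:
      WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?
    which ordinary indexes on lat and lon can serve.
    """
    min_lat, max_lat, min_lon, max_lon = box
    return [name for name, lat, lon in businesses
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon]


# Made-up sample data: (name, latitude, longitude)
sample = [
    ("cafe", 40.752, -73.985),
    ("bakery", 40.760, -74.000),
    ("far_diner", 40.800, -73.950),
]
box = bounding_box(40.75, -73.99, half_side_miles=1.0)
nearby = businesses_in_box(sample, box)
```

Nothing here computes a distance per row. The query is reduced to four range comparisons, which is why the box search can lean on simple indexes.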
Sean Hull asks on Quora: What is mobile scalability and why is it important?
Disaster recovery involves anticipating a major business outage, and planning contingencies to avoid losing revenue, customers or sales.
All of the technology components that make up your enterprise applications should be carefully considered against loss. What happens if this database server disappears? Do we have all the data backed up somewhere? Have we tested that backup to restore it? How long does it take to restore? Can we reconnect the application to said database? What if the network goes down? How about if the whole datacenter goes out?
Planning for disaster recovery is important whether you’re hosted in-house or with a hosting provider. Consider Amazon’s EC2 outage in April, when various availability zones went out. Affected customers who had their databases backed up properly, with tested offsite copies, and who also had copies of other components such as webserver document roots and software configurations, would have been able to rebuild their entire infrastructure in an alternate availability zone or region. Remember that only a small component of Amazon Web Services was out.
Sean Hull asks on Quora: Disaster Recovery – What is it and why is it important?
Scaling comes in a few different flavors. Vertical scaling involves growing the computing power of a single server, adding memory, faster or more CPUs and/or faster disk I/O.
Horizontal scaling involves adding computing resources or servers in parallel and then load balancing across them.
Scalability refers to how well an application facilitates scaling. With web applications, the middle tier, aka the webservers, is fairly easy to scale horizontally, and most enterprise class applications already do this with commercial load balancers, either hardware or software.
Doing the same with the database tier, however, can be trickier. Enter MySQL replication, which facilitates fairly painless horizontal scalability. Build your application architecture so that read-only transactions are segmented apart from write/update transactions, and you can send the latter to one master database and the former to a handful of replicated slaves. With a typical web application that is less than 10% writes and more than 90% reads, there is the potential to add as many as 5-10 servers horizontally to increase application throughput by as much as 500-1000%.
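The read/write split described above could be sketched as a small router in front of your database connections. This is an assumption-laden illustration: the ReadWriteRouter class and the server names are hypothetical, and deciding by the first SQL keyword is a simplification. Reads that must see their own fresh writes, or statements like SELECT ... FOR UPDATE, would still need to go to the master.

```python
import itertools


class ReadWriteRouter:
    """Send reads to replicas in round-robin order and everything else to the master."""

    def __init__(self, master: str, replicas: list):
        self.master = master
        # With no replicas configured, every statement falls back to the master.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        """Return the name of the server that should run this statement."""
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # INSERT, UPDATE, DELETE, DDL and anything unrecognized go to the master.
        return self.master


# Hypothetical server names for illustration.
router = ReadWriteRouter("master-db", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM products WHERE id = 7"),
    router.route("select name FROM users"),
    router.route("UPDATE users SET name = 'x' WHERE id = 1"),
    router.route("SELECT 1"),
]
```

Because replication is asynchronous, a replica can lag slightly behind the master, so an application built this way has to tolerate reads that are a moment out of date.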
Sean Hull asks on Quora: What is scalability and why is it important?
Cloud Computing has a few varied meanings, from API services such as Twitter to web-based (read cloud-based) email services such as Gmail and Yahoo.
An even bigger tectonic shift is happening though, in the area of infrastructure and hosting, to cloud-based solutions. No longer is provisioning a slow ordering process, followed by a multi-year contract and commitment with an associated high price tag. Now computing resources can be provisioned and “spun up” in seconds, even allowing for auto-scaling, bringing new computing resources online dynamically as seasonal traffic patterns demand.
- uniquely suited to applications with seasonal traffic requirements
- supports disaster recovery effectively for free
- allows temporary provisioning of test environments
- facilitates auto-scaling of servers
- no huge budgetary outlay, pay for only what you use
- bring up resources in seconds – supports true agile development
What’s more, since cloud resources are all provisioned in software through an API, it encourages treating infrastructure as a whole as software. The scripts to completely rebuild all of your systems, from spin-up to package configuration to application configuration, can now all be written in software and managed in version control.
Sean Hull asks the question on Quora: What is Cloud Computing?
When you enter a website name in your browser or click on a Google result, you start a cascade of events. Your browser requests the various pieces and components that make up the webpage from a remote server which hosts that website. Those pieces are sent back to you, and your browser assembles them.
There are many moving parts in that process. Anywhere along the way you can hit a snag, slowing down the overall process of that page displaying. Website Optimization attempts to identify all of those processes and components, and organize them by slowest to fastest. This allows us to focus our attention on the slowest part of the process. Like a physician looking at your vascular system, it allows the performance expert to identify and then fix those pipes that are slowing you down.
Since website performance has been shown to directly influence customer retention, conversion, and user experience, overall website performance and optimization are key to your business success.
Sean Hull asks on Quora: What is website optimization and why is it important?
SQL is a difficult acronym for a difficult language, but what it does is shuttle information into and out of your database in an organized manner. Your web applications and developers have to speak it, and your database, whether Oracle, MySQL, Postgres or some other, will return information using this computing dialect.
Since every movement on your website, from page to page (sessions) and purchase to purchase, involves interaction using these queries, writing them well can have a huge impact on your website performance. How big? We’ve fixed queries by adding indexes or rewriting them and seen improvements of as much as 100x. That’s converting pages that take ten seconds to ones that take 1/10 of a second. Be especially vigilant about those queries generated by Object Relational Mappers like Active Record, the ORM layer in Ruby on Rails.
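To see what adding an index does to a query, here is a small sketch using SQLite from Python's standard library as a stand-in for whichever database you run. The orders table, its columns and the index name are made up for illustration, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
# 10,000 made-up rows spread across 1,000 customers.
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10000)],
)


def query_plan(connection, sql):
    """Return SQLite's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


query = "SELECT * FROM orders WHERE customer_id = 42"

plan_before = query_plan(conn, query)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, query)   # with the index: a search using the index

matching = conn.execute(query).fetchall()
```

The query text and its results are identical before and after. Only the plan changes, which is why checking the plan (EXPLAIN in MySQL and Postgres) is a sensible first step when a page is slow.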
What is SQL on Quora
Open Source is a term understood well by the technology set, but not well enough by everyone else.
Open Source for the software industry is like generic drugs for the pharmaceutical industry. It enables more players to come to the table, and it is a huge driving force behind internet infrastructures, which are built on Linux, Apache and many other technologies. It is the backbone of companies like Google, and facilitates cloud services from the likes of Amazon EC2, Joyent, Rackspace and many others.
It is the rising tide that lifts all boats, if you will.
Sean Hull’s writing on Quora.