Category Archives: All

Relational Database – What is it and why is it important?

A relational database is the warehouse of your data.  Your crown jewels.  It’s your excel spreadsheet or filing cabinet writ large.  You use them everyday and may not know it.  Your smartphone stores it’s contact database in a relational database, most likely sqlite – the ever present but ever invisible embedded database platform.  Your online bank at Citibank or Chase stores all your financial history, statements, contact info, personal data and so forth, all in a relational database.

  • organized around records
  • data points are columns in a table
  • relationships are enforced with constraints
  • indexing data brings hi-speed access
  • SQL is used to get data in and out of the db
  • triggers, views, stored procs & materialized views may also be supported

Like excel, relational databases are organized around records.  A record is like a 3×5 card with a number of different data points on it.  Say you have 3×5 cards for your addressbook.  Each card holds one address, phone number, email, picture, notes and so forth.  By organizing things nicely on cards, and for each card predictable fields such as first name, last name, birthday etc, you can then search on those data points.  Want all the people in your addressbook with birthday of July 5th, no problem.

While relational databases have great advantages, they require a lot of work to get all of your information into neatly organized files.  What’s more the method for getting things into and out of them – SQL is a quirky and not very friendly language.  What’s more relational databases have trouble clustering, and scaling horizontally.  NOSql database have made some headway in these departments, but at costs to consistency and reliability of data.

As servers continue to get larger, it becomes rarer that a single web-facing database really needs more than one single server.  If it’s tuned right, that is.  Going forward and looking to the future, the landscape will probably continue to be populated by a mix of traditional relational databases, new nosql type databases, key-value stores, and other new technologies yet to be dreamed up.

Sean Hull asks on Quora – What is an rdbms and why are they important?

Database Change Management – What is it and why is it important?

During the software development process, whether you’re cutting edge Agile developers, or traditional waterfall method folks, your code changes are periodically accompanied by database changes.  For instance tables have particular rows and columns.  When developers add new columns (ie fields on an index card), or create new tables, relationships, indexes or other database objects, all of these are lumped together as database changes.

Version control systems have brought great manageability gains to software projects, even ones involving only  a single developer.  That’s because they allow you to rewind to any savepoints, just like you can undo and redo changes in a word document.  With database changes, however the picture because more muddied.

Database Change Management Best Practices

  1. Require developers to include a roll forward and a roll back script with each set of schema changes.
  2. Check in those scripts into the version control system just like other software code.
  3. Be sure developers test the roll forward and roll backward operation on dev.
  4. Ensure that changes are documented, as well as possible side affects.

During deployment, operations folks and/or a DBA must still have their fingers on the trigger.  Some frameworks such as Ruby on Rails include Migration scripts.  Do not for example allow rollback scripts to run automatically.  This is a recipe for disaster.

  • Be sure to take a backup of the database before running any schema change scripts.
  • Consider application downtime if ALTERs or other operations may lock large tables.
  • Perform another backup following the schema changes
  • If database is particularly large you may isolate your backup to the schema or tables being altered.

Above all use common sense, and always second and/or third guess yourself.  Better to be safe than sorry when juggling your crown jewels.

Quora discussion by Sean Hull – What is database change management and why is it important?

Business Continuity Planning – What is it and why is it important?

BCP or BCRP if you want to also include “resiliency” in the acronym, basically outlines planning for the worst.  In the old days you had a filing cabinet with documents, for example there might be a central government office which houses birth certificates or titles and deeds.  Perhaps a copy of documents is regularly created, and kept offsite in case of fire.

In the digital era, more and more companies have most of their assets in digital form.  So the lifespan of those digital copies, and the contents thereof must be protected.  Since running computing systems and data centers always involves operational risk, planning for the worst is the essence of business continuity planning.

For example you may have redundant switches, routers and networking cables, but what if the whole data center goes down?  Do you have offsite backups that you’re creating regularly?  When was the last time those backups were tested with a full application restore and fire drill?

Discussed by Sean Hull on Quora – What is BCP and why is it important?

Query Profiling – What is it and why is it important?

Queries are so-named because of the lovely language we call SQL – simplified query language.  That’s a bit of sarcasm on my part, I’ve never found it particularly simple or elegant.  Profiling them involves finding how they are spending their time, and what they are asking the server to do.   In this way you can make them faster, and improve performance of the whole server, and thus your website.

At any rate queries ask the database for information.  If they are on the simpler side, something like give me all the users whose name starts with “hu” for example, and last name is indexed, that will run very fast.  The database will lookup in the index the last name field, and find the subset of ones starting with those letters, then go lookup the records by id and return them to you.  Cache the index blocks, cache the data blocks.  Great!  However, say those are customers, and you want their calling cellphone calling history.  Ok, now you have to join on another table, matching by some key, hold all those records in memory, shuffle them around, and so on.

So queries are effectively little jobs or bits of work you ask your database server to perform on your behalf.  With websites you typically have hundreds of concurrently running sessions all sending their own little SQL jobs to the server to get processed, and records returned.  And blip in the radar slows everyone down, so you want them to all run quickly.

That’s where profiling comes in.  MySQL, Oracle, and SQL Server alike all have EXPLAIN type facilities for showing the plan with which the database will use to fetch your data.  It shows indexes, sorting, joins, order, and so forth.  All of this contributes to the overall execution time and resources used on the database server.

Quora discussion by Sean Hull – What is query profiling and why is it important?

NOSQL Database – What is it and why is it important?

NOSQL is a sort of all-encompassing term which includes very simple key/value databases like Memcache, along with more sophisticated non-relational databases such as Mongodb and Cassandra.

Relational databases have been around since the 70′s so they’re a very mature technology.  In general they support transactions allowing you to make changes to your data in discrete, controlled manner, they support constraints such as uniqueness, primary and foreign keys, and check constraints.  And furthermore they use SQL or so-called Simplified Query Language to access ie fetch data, and also modify data by inserting, updating or deleting records.

SQL though is by no means simple, and developers over the years have taken a disliking to it like the plague.  For good reason.  Furthermore RDBMS’ aka relational database management systems, don’t horizontally scale well at all.  To some degree you can get read-only scalability with replication, but with a lot of challenges.  But write-based scaling has been much tougher a problem to solve.  Even Oracle’s RAC (formerly Parallel Server) also known as Real Application Clusters, faces a lot of challenges keeping it’s internal caches in sync over special data interconnects.  The fact is changes to your data – whether it’s on your iphone, desktop addressbook or office directory, those changes take time to propagate to various systems.  Until that data is propagated, you’re looking at stale data.

Enter NOSQL databases like MongoDB which attempt to address some of these concerns.  For starters data is not read/written to the database using the old SQL language, but rather using an object-oriented method which is developers find very convenient and intuitive.  What’s more it supports a lot of different type of indexing for fast lookups of specific data later.

But NOSQL databases don’t just win fans among the development side of the house, but with Operations too, as it scales very well.  MongoDB for instance has clustering built-in, and promises an “eventually consistent” model to work against.

To be sure a lot of high-profile companies are using NOSQL databases, but in general they are in use for very specific needs.  What’s more it remains to be seen whether or not many of those databases as they grow in size, and the needs for which they are put stretch across more general applications, if they won’t need to be migrated to more traditional relational datastores later.

Sean Hull asks on Quora – What is NOSQL and why is it important?

Macrowikinomics book review by Tapscott & Williams

Macrowikinomics follows on the success of the best selling Wikinomics.  It hits on a lot of phenomenal success stories, such as the Linux project, which has over a roughly twenty year history, produced 2.7 million lines of code per year and would have cost an estimated 10.8 billion that billion with a b, dollars to create by conventional means.  What’s more it’s estimated the Linux economy is roughly 50 billion.  With huge companies like Google, and Amazon Web Services built on datacenters driven principally by Linux it’s no wonder.

They also draw on the successes of companies like Local Motors who use collaboration and the internet in new and innovative ways.

In total this book speaks to the disruptive power of the internet and new technologies, and offers a lot of hopeful stories and optimism about where they are taking us.  Food for thought.

iHeavy Insights 81 – Web Performance Metrics

Metrics are those pesky little numbers we like to keep an eye on to see how we’re doing.  Website performance may be full of lots of jargon and fancy terminology but in the end the purpose is the same.  Watch the numbers to know how we’re doing.

In Economics

If you follow the economy you’re probably familiar with GDP or Gross Domestic Product tells you the total amount of goods and services that a country produced.  What about the CPI or Consumer Price Index, well that measures the price of a so-called basket of goods.  Those are intended to be goods everyone must have, such as food & beverages, housing, apparel, transportation, medical care and so forth.  By measuring the CPI we get a sense of consumers buying power or how far the dollar goes.

In Baseball

If you follow sports, you’ve probably heard of a players batting average which is hits divided by at bats.  A simple ratio, gives a picture of the players past performance.  Another statistic is the RBI or runs batted in, which tells you how many times the player caused runs to be scored.

Web Performance

Taken generally metrics give us a quick view of a more complicate picture.  Performance metrics for websites are no different.  For instance if we’re looking at the business or application level, we might keep track of things like:

  • user registrations
  • subscriptions sold
  • widgets sold
  • new accounts sold
  • user & social interactions
  • ratings and other gamification stats

So too at a lower level we can capture metrics of the systems our web application runs on top of with tools like Cacti, Munin, Ganglia, Zabbix, or OpenNMS.  The basics include:

  • cpu utilization
  • network throughput
  • disk throughput
  • memory usage
  • load average

And further down the stack we can keep metrics of are database activity such as:

  • buffer pool usage
  • files & table I/O
  • sorting activity
  • locking and lock waits
  • queries per second
  • transaction log activity

By tracking these metrics over time, we can view graphs at-a-glance and see trends.  What’s more folks from different sides of the business, can get visibility into what others needs are.  Business teams can see server loads and operations people can see real revenue and income.  That brings teams together to a common goal.

Performance Metrics – What are they and why are they important?

In order to understand how fast your website is, we need some numbers.  We call these fancy numbers performance metrics, objective measures that we can track over time.  We can track them for seasonality as well as website traffic and growth.  But we can also track them for feature and application changes based on deployments to see if new code has caused perfermance to improve or degrade.

Some useful business or application performance metrics include:

  • user registrations
  • accounts sold
  • widgets sold
  • user interactions & social metrics
  • so-called gamification, ratings & related

We also want to capture lower-level system  metrics with a tool like Cacti, Ganglia, Munin, OpenNMS, Zabbix or similar:

  • cpu
  • memory
  • disk
  • network
  • load average

Along with the basic system level metrics you’ll want to collect some at the database level such as:

  • InnoDB Buffer Pool activity
  • Files & Tables
  • Binary log activity
  • Locking
  • Sorting
  • Temporary objects
  • Queries/second

Sean Hull asks on Quora – What are web performance metrics and why are they important?

Agile – What is it and why is it important?

Agile software development seeks a more lightweight methodology of making changes and releases to software.  In the traditional, incremental approach, large pieces of software are written at once, and releases happen less frequently.  Once features are complete, the testing phase happens, and then deployment to production.  These releases can happen over many weeks in time, so turnaround for new features tends to be slow.   Advocates would argue that this forces discipline in the process, and prevents haphazard releases and buggy software.

Agile methodologies, seek to accelerate releases of much smaller pieces of code. These releases can happen daily or even many times a day, as developers themselves are given the levers to push code.  Agile tends to be more reactive to business needs, with less planning and requirements gathering up front.

While Agile remains the buzzword of the day, it may not work for every software development project.  Web development & applications where small failures can easily be tolerated and where small teams are at work on the effort, make most sense.

Sean Hull asks on Quora – What is Agile software development and why is it important?

Sharding – What is it and why is it important?

Sharding is a way of partitioning your datastore to benefit from the computing power of more than one server.  For instance many web-facing databases get sharded on user_id, the unique serial number your application assigns to each user on the website.

Sharding can bring you the advantages of horizontal scalability by dividing up data into multiple backend databases.  This can bring tremendous speedups and performance improvements.

Sharding, however has a number of important costs.

  • reduced availability
  • higher administrative complexity
  • greater application complexity

High Availability is a goal of most web applications as they aim for always-on or 24×7 by 365 availability.  By introducing more servers, you have more components that have to work flawlessly.  If the expected downtime of any one backend database is 1/2 hour per month and you shard across five servers, your downtime has now increased by a factor of five to 2.5 hours per month.

Administrative complexity is an important consideration as well.  More databases means more servers to backup, more complex recovery, more complex testing, more complex replication and more complex data integrity checking.

Since Sharding keeps a chunk of your data on various different servers, your application must accept the burden of deciding where the data is, and fetching it there.  In some cases the application must make alternate decisions if it cannot find the data where it expects.  All of this increases application complexity and is important to keep in mind.

Sean Hull asks on Quora – What is Sharding and why is it important?