In the old days…
You would have a closet in your startup company with a rack of computers. Provisioning involved:
- Decide on your architectural direction: what, where & how
- Order the new hardware
- Wait weeks for the packages to arrive
- Set up the hardware, wire things together, power up
- Discover some component is missing or failed, and order a replacement
- Wait longer…
- Finally get all the pieces set up
- Configure software components and go
Along came some industrious folks who realized that power and data connectivity to your physical location weren’t reliable. So datacenters sprang up. With datacenters, most of the above steps didn’t change, except that between steps 3 and 4 you would send your engineers out to the datacenter location. Trips back and forth ate up time and energy.
Then along came managed hosting. Managed hosting saved companies a lot of headache, wasted man-hours, and other resources. It allowed your company to spend more time on what it does well, running the business, and less on managing hardware and infrastructure. Provisioning now became:
- Decide on architecture direction
- Call hosting provider and talk to sales person
- Wait a day or two
- Set up & configure software components and go
Obviously this new state of affairs improved infrastructure provisioning dramatically. It simplified the process and sped it up as well. What’s more, a managed hosting provider could keep spare parts and standard components on hand in much greater volume than a small firm. That’s a big plus. This evolution continued because it was a win-win for everyone. The only downside was when engineers made mistakes and the finger pointing began. But despite all of that, a managed hosting provider, which does only that, can do it better and more reliably than you can yourself.
So where are we in the present day? We are all either doing, or looking at, cloud provisioning of infrastructure. What’s cloud provisioning? It is a complete paradigm shift, but along the same trajectory as what we’ve described above. Now you have removed all the waiting. No waiting for a sales team or the ordering process; that’s automatic. No waiting for engineers to set up the servers; they’re already set up, allocated by your software and scripts. Even the setup and configuration of software components, the operating system, and the services to run on that server are all automatic.
This is such a dramatic shift that we are still feeling the effects of it. Traditional operations teams have little experience with this arrangement, and perhaps little trust in virtual servers. Business units are also not used to handing the trigger on infrastructure spending over to ops teams or to scripts and software.
However, huge economic pressures, along with new operational flexibility, continue to push firms to this new model. Gartner predicts this trend will only continue. The advantages of cloud infrastructure provisioning include:
- Metered payment – no huge outlay of cash for new infrastructure
- Infrastructure as a service – scripted components automate & reduce manual processes
- Devops – Manage infrastructure like code with version control and reproducibility
- Take unused capacity offline easily & save on those costs
- Disaster recovery comes nearly free – reuse the same scripts to rebuild standard components
- Easily meet seasonal traffic requirements – spin up additional servers instantly
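The "allocated by your software and scripts" idea can be sketched briefly. The following is a hypothetical Python sketch modeled loosely on the AWS EC2 API; the AMI ID, instance type, and tag values are invented placeholders, not a definitive implementation:

```python
# Hypothetical sketch of scripted provisioning, modeled on the AWS EC2 API.
# The AMI ID, instance type, and tag values are invented placeholders.

def build_launch_request(ami_id, instance_type, count, env):
    """Build the parameter set for a scripted server launch."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "Environment", "Value": env}],
        }],
    }

request = build_launch_request("ami-12345678", "t3.small", 2, "staging")
print(request["InstanceType"], request["MaxCount"])  # t3.small 2

# With boto3 installed and AWS credentials configured, a request like this
# could be handed off as: boto3.client("ec2").run_instances(**request)
```

The point is that the whole ordering-and-waiting cycle collapses into a function call your deployment scripts can make on demand.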
Loading data into a database, whether it is Oracle or MySQL, involves passing a lot of hurdles. If you’re working with CSV files, special characters may cause trouble, as may end-of-line characters, very long lines, and so forth.
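A small sketch of those hurdles, using Python's standard csv module: quoted fields can legally contain commas, quote characters, and even embedded end-of-line characters, which trips up naive line-by-line parsers. The sample data here is invented:

```python
import csv
import io

# Quoted CSV fields may contain commas, apostrophes, and embedded newlines.
raw = 'id,name,notes\n1,"O\'Brien, Sean","line one\nline two"\n'

# csv.DictReader handles the quoting rules a naive str.split(",") would not.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]["name"])   # O'Brien, Sean  (comma preserved inside the field)
print(rows[0]["notes"])  # contains an embedded newline
```

A real loader would read from a file and feed batches into the database, but the parsing pitfalls are the same.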
Once your data load is complete, spot checks involve basic checks to determine that all went well. Counting the number of records that should have been loaded and comparing the table’s count before and after the load would be one simple spot check.
These types of checks are a basic once-over, as opposed to a more thorough and rigorous analysis of the data, for example calculating checksums, which can guarantee that a data load completed and all data is in good shape.
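Both levels of checking can be sketched in a few lines. This minimal example uses SQLite and an invented customers table; the same idea applies to Oracle or MySQL:

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

count_before = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# Simulate a data load of three records.
rows = [(1, "alice"), (2, "bob"), (3, "carol")]
conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)

# Spot check: did the row count grow by the number of records loaded?
count_after = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
assert count_after - count_before == len(rows)

# More rigorous check: checksum the loaded rows in a canonical order, and
# compare against the same checksum computed over the source file.
digest = hashlib.sha256()
for row in conn.execute("SELECT id, name FROM customers ORDER BY id"):
    digest.update(repr(row).encode())
print(count_after, digest.hexdigest()[:12])
```

The count check catches gross failures; the checksum catches silent corruption or truncation inside individual records.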
A relational database is the warehouse of your data. Your crown jewels. It’s your Excel spreadsheet or filing cabinet writ large. You use them every day and may not know it. Your smartphone stores its contact database in a relational database, most likely SQLite, the ever-present but ever-invisible embedded database platform. Your online bank at Citibank or Chase stores all your financial history, statements, contact info, personal data and so forth, all in a relational database.
- organized around records
- data points are columns in a table
- relationships are enforced with constraints
- indexing data brings high-speed access
- SQL is used to get data in and out of the db
- triggers, views, stored procs & materialized views may also be supported
Like Excel, relational databases are organized around records. A record is like a 3×5 card with a number of different data points on it. Say you have 3×5 cards for your address book. Each card holds one address, phone number, email, picture, notes and so forth. By organizing things nicely on cards, with predictable fields on each card such as first name, last name, birthday, etc., you can then search on those data points. Want all the people in your address book with a birthday of July 5th? No problem.
While relational databases have great advantages, they require a lot of work to get all of your information into neatly organized tables. What’s more, the method for getting things into and out of them, SQL, is a quirky and not very friendly language. Relational databases also have trouble clustering and scaling horizontally. NoSQL databases have made some headway in these departments, but at a cost to consistency and reliability of data.
As servers continue to get larger, it becomes rarer that a single web-facing database really needs more than one server, provided it’s tuned right. Going forward, the landscape will probably continue to be populated by a mix of traditional relational databases, new NoSQL-type databases, key-value stores, and other new technologies yet to be dreamed up.
During the software development process, whether you’re cutting-edge Agile developers or traditional waterfall-method folks, your code changes are periodically accompanied by database changes. For instance, tables have particular rows and columns. When developers add new columns (i.e. fields on an index card), or create new tables, relationships, indexes, or other database objects, all of these are lumped together as database changes.
Version control systems have brought great manageability gains to software projects, even ones involving only a single developer. That’s because they allow you to rewind to any savepoint, just like you can undo and redo changes in a word document. With database changes, however, the picture becomes more muddied.
Database Change Management Best Practices
- Require developers to include a roll forward and a roll back script with each set of schema changes.
- Check those scripts into the version control system just like other software code.
- Be sure developers test the roll forward and roll back operations on dev.
- Ensure that changes are documented, as well as possible side effects.
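A minimal sketch of the roll forward / roll back pair from the first practice above, using SQLite and an invented customers table. The table and column names are illustrative, not from any real schema:

```python
import sqlite3

# Hypothetical schema change: add a "phone" column to a customers table.

def roll_forward(conn):
    conn.execute("ALTER TABLE customers ADD COLUMN phone TEXT")

def roll_back(conn):
    # Older SQLite lacks DROP COLUMN, so rebuild the table without it; on
    # MySQL or Oracle this would be a plain ALTER TABLE ... DROP COLUMN.
    conn.executescript("""
        CREATE TABLE customers_old AS SELECT id, name FROM customers;
        DROP TABLE customers;
        ALTER TABLE customers_old RENAME TO customers;
    """)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

roll_forward(conn)
cols = [r[1] for r in conn.execute("PRAGMA table_info(customers)")]
print(cols)  # ['id', 'name', 'phone']

roll_back(conn)
cols = [r[1] for r in conn.execute("PRAGMA table_info(customers)")]
print(cols)  # ['id', 'name']
```

Both scripts get checked in together, and both get exercised on dev before the change ever reaches production.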
During deployment, operations folks and/or a DBA must still have their fingers on the trigger. Some frameworks, such as Ruby on Rails, include migration scripts, but do not, for example, allow rollback scripts to run automatically. That is a recipe for disaster.
- Be sure to take a backup of the database before running any schema change scripts.
- Consider application downtime if ALTERs or other operations may lock large tables.
- Perform another backup following the schema changes.
- If the database is particularly large, you may isolate your backup to the schema or tables being altered.
Above all, use common sense, and always second- and third-guess yourself. Better to be safe than sorry when juggling your crown jewels.
BCP, or business continuity planning (BCRP if you want to also include “resiliency” in the acronym), basically outlines planning for the worst. In the old days you had a filing cabinet with documents; for example, there might be a central government office which houses birth certificates or titles and deeds. Perhaps a copy of those documents is regularly created and kept offsite in case of fire.
In the digital era, more and more companies have most of their assets in digital form. So the lifespan of those digital copies, and the contents thereof must be protected. Since running computing systems and data centers always involves operational risk, planning for the worst is the essence of business continuity planning.
For example you may have redundant switches, routers and networking cables, but what if the whole data center goes down? Do you have offsite backups that you’re creating regularly? When was the last time those backups were tested with a full application restore and fire drill?
Queries are so-named because of the lovely language we call SQL, which I like to gloss as the “simplified query language” (its real name is Structured Query Language). That’s a bit of sarcasm on my part; I’ve never found it particularly simple or elegant. Profiling queries involves finding how they are spending their time and what they are asking the server to do. In this way you can make them faster, and improve performance of the whole server, and thus your website.
At any rate, queries ask the database for information. If they are on the simpler side, something like “give me all the users whose last name starts with ‘hu’”, and last name is indexed, that will run very fast. The database will look up the last name field in the index, find the subset starting with those letters, then look up the records by id and return them to you. Cache the index blocks, cache the data blocks. Great! However, say those are customers, and you want their cellphone calling history. Now you have to join on another table, matching by some key, hold all those records in memory, shuffle them around, and so on.
So queries are effectively little jobs or bits of work you ask your database server to perform on your behalf. With websites you typically have hundreds of concurrently running sessions, all sending their own little SQL jobs to the server to get processed and records returned. Any blip on the radar slows everyone down, so you want them all to run quickly.
That’s where profiling comes in. MySQL, Oracle, and SQL Server alike all have EXPLAIN-type facilities for showing the plan the database will use to fetch your data. It shows indexes, sorting, joins, order, and so forth. All of this contributes to the overall execution time and resources used on the database server.
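The “last name starts with hu” example above can be profiled this way. A minimal sketch using SQLite's EXPLAIN QUERY PLAN (MySQL and Oracle have their own EXPLAIN syntax); the table and index names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute("CREATE INDEX idx_last_name ON users (last_name)")

# Ask the database for its plan rather than for the data itself. The range
# predicate below is the index-friendly form of "starts with 'hu'".
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, last_name FROM users "
    "WHERE last_name >= 'hu' AND last_name < 'hv'"
).fetchall()
for row in plan:
    print(row[-1])  # plan detail, mentioning idx_last_name
```

If the plan shows a full table scan instead of the index, you know exactly where the query is spending its time.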
NOSQL is a sort of all-encompassing term which includes very simple key/value stores like Memcache, along with more sophisticated non-relational databases such as MongoDB and Cassandra.
Relational databases have been around since the ’70s, so they’re a very mature technology. In general they support transactions, allowing you to make changes to your data in a discrete, controlled manner; they support constraints such as uniqueness, primary and foreign keys, and check constraints. And furthermore, they use SQL, the Structured Query Language, to fetch data, and also to modify data by inserting, updating, or deleting records.
SQL, though, is by no means simple, and developers over the years have come to avoid it like the plague. For good reason. Furthermore, RDBMSes, aka relational database management systems, don’t horizontally scale well at all. To some degree you can get read-only scalability with replication, but with a lot of challenges. Write-based scaling has been a much tougher problem to solve. Even Oracle’s RAC (formerly Parallel Server), also known as Real Application Clusters, faces a lot of challenges keeping its internal caches in sync over special data interconnects. The fact is, changes to your data, whether on your iPhone, desktop address book, or office directory, take time to propagate to various systems. Until that data is propagated, you’re looking at stale data.
Enter NoSQL databases like MongoDB, which attempt to address some of these concerns. For starters, data is not read from or written to the database using the old SQL language, but rather using an object-oriented method which developers find very convenient and intuitive. What’s more, it supports a lot of different types of indexing for fast lookups of specific data later.
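To give a feel for that object-oriented style, here is a rough illustration of document-oriented querying. This is not MongoDB's actual driver API; it mimics the idea over plain Python dicts so it runs standalone, and the field names are invented:

```python
# Documents are free-form records; unlike rows in a table, they need not
# all share the same fields.
users = [
    {"name": "ann", "city": "NYC", "tags": ["admin"]},
    {"name": "bob", "city": "SF"},
    {"name": "cal", "city": "NYC", "age": 30},
]

def find(collection, query):
    """Return documents whose fields match every key/value in the query,
    in the spirit of a document database's find() call."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in query.items())]

print([d["name"] for d in find(users, {"city": "NYC"})])  # ['ann', 'cal']
```

The query is itself a data structure in the host language, which is a big part of why developers find the model so convenient compared to assembling SQL strings.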
But NoSQL databases don’t just win fans on the development side of the house; Operations likes them too, as they scale very well. MongoDB, for instance, has clustering built in and offers an “eventually consistent” model to work against.
To be sure, a lot of high-profile companies are using NoSQL databases, but in general they are in use for very specific needs. What’s more, it remains to be seen whether many of those databases, as they grow in size and are stretched across more general applications, won’t need to be migrated to more traditional relational datastores later.
Macrowikinomics follows on the success of the best-selling Wikinomics. It hits on a lot of phenomenal success stories, such as the Linux project, which over a roughly twenty-year history has produced 2.7 million lines of code per year and would have cost an estimated 10.8 billion (that’s billion with a b) dollars to create by conventional means. What’s more, it’s estimated the Linux economy is roughly 50 billion dollars. With huge companies like Google and Amazon Web Services built on datacenters driven principally by Linux, it’s no wonder.
They also draw on the successes of companies like Local Motors who use collaboration and the internet in new and innovative ways.
In total this book speaks to the disruptive power of the internet and new technologies, and offers a lot of hopeful stories and optimism about where they are taking us. Food for thought.
Metrics are those pesky little numbers we like to keep an eye on to see how we’re doing. Website performance may be full of lots of jargon and fancy terminology but in the end the purpose is the same. Watch the numbers to know how we’re doing.
If you follow the economy, you’re probably familiar with GDP, or Gross Domestic Product, which tells you the total amount of goods and services that a country produced. What about the CPI, or Consumer Price Index? That measures the price of a so-called basket of goods: goods everyone must have, such as food & beverages, housing, apparel, transportation, medical care, and so forth. By measuring the CPI we get a sense of consumers’ buying power, or how far the dollar goes.
If you follow sports, you’ve probably heard of a player’s batting average, which is hits divided by at-bats. A simple ratio that gives a picture of the player’s past performance. Another statistic is the RBI, or runs batted in, which tells you how many times the player caused runs to be scored.
Taken generally, metrics give us a quick view of a more complicated picture. Performance metrics for websites are no different. For instance, if we’re looking at the business or application level, we might keep track of things like:
- user registrations
- subscriptions sold
- widgets sold
- new accounts sold
- user & social interactions
- ratings and other gamification stats
So too at a lower level we can capture metrics of the systems our web application runs on top of with tools like Cacti, Munin, Ganglia, Zabbix, or OpenNMS. The basics include:
- cpu utilization
- network throughput
- disk throughput
- memory usage
- load average
And further down the stack we can keep metrics of our database activity, such as:
- buffer pool usage
- files & table I/O
- sorting activity
- locking and lock waits
- queries per second
- transaction log activity
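Most of these database metrics come from cumulative counters, so graphing tools turn them into rates. A hedged sketch of how a queries-per-second figure is derived from two samples; the counter values below are hard-coded stand-ins for numbers you might read from, e.g., SHOW GLOBAL STATUS in MySQL:

```python
# Derive a rate from two samples of a monotonically increasing counter,
# the way graphing tools like Cacti or Munin do. Sample values are
# invented placeholders for a real server's cumulative query counter.

def queries_per_second(sample1, sample2):
    """Each sample is a (unix_timestamp, cumulative_query_count) pair."""
    (t1, q1), (t2, q2) = sample1, sample2
    return (q2 - q1) / (t2 - t1)

# 180,000 queries over a 60-second sampling interval.
qps = queries_per_second((0.0, 1_000_000), (60.0, 1_180_000))
print(qps)  # 3000.0
```

The same delta-over-time trick applies to network throughput, disk I/O, and most of the other counters in the lists above.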
By tracking these metrics over time, we can view graphs at a glance and see trends. What’s more, folks from different sides of the business can get visibility into one another’s needs. Business teams can see server loads, and operations people can see real revenue and income. That brings teams together toward a common goal.