Category Archives: All

5 Ways to fortify MySQL replication


Also find Sean Hull’s ramblings on twitter @hullsean.

MySQL replication technology is powerful and flexible. But it doesn’t do everything perfectly all the time. You may experience trouble with the slaves falling behind the master, or want to scale horizontally by building new slaves automatically. Or you might need to build a slave without blocking the master database.

All of these goals can be achieved using some powerful tools. Here’s a quick guide to those tools and how to use them.

    1. Build new Replicas without Downtime

Something we’re sure you need to do quite often is build new slaves. You can snapshot from another slave by bringing that slave down, copying its datadir to an alternate machine, updating the server_id and then starting up. However, sometimes you have no slave, or your current slaves are busy serving data.
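
If you do have a spare slave, that procedure is only a handful of commands. A minimal sketch, assuming a datadir of /var/lib/mysql and ssh access between the boxes (both are placeholders, adjust to your setup):

    slave1$ /etc/init.d/mysql stop
    slave1$ rsync -a /var/lib/mysql/ newslave:/var/lib/mysql/
    slave1$ /etc/init.d/mysql start
    newslave$ vi /etc/my.cnf              # set a unique server_id
    newslave$ /etc/init.d/mysql start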

In those cases you’ll need a way to snapshot off the master. With the great xtrabackup tool, you can create a snapshot of your production database without the locking that mysqldump requires. What’s more, the resulting snapshot is a full datadir, so you won’t need to import all the data as you would with mysqldump. Save yourself a lot of time!

Take a look at our how-to for building replication slaves using hotbackups.

    2. Autoscale in the Cloud

We wrote an extensive how-to diving into the nuts and bolts of MySQL Autoscaling.

    3. Use semisynchronous replication

In MySQL 5.5, the Google code contributions got rolled into the Generally Available version. That brings some great new features and functionality to make your replicas more resilient and improve overall redundancy.

We have an upcoming article planned for May that digs into configuring semisynchronous replication.
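
In the meantime, here’s a minimal sketch of enabling it on 5.5, using the semisync plugins that ship with the server (the settings can also be made permanent in my.cnf):

    master> install plugin rpl_semi_sync_master soname 'semisync_master.so';
    master> set global rpl_semi_sync_master_enabled = 1;

    slave> install plugin rpl_semi_sync_slave soname 'semisync_slave.so';
    slave> set global rpl_semi_sync_slave_enabled = 1;
    slave> stop slave io_thread; start slave io_thread;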

    4. Add integrity checking

Statement based replication, as powerful as it is, has some serious limitations. As it turns out, many slave databases drift silently out of sync with the master. What’s more they don’t flag errors when they do so. The reason why this can happen has to do with the nature of statement based replication. If you combine transactional & non-transactional tables, for example, and a transaction rolls back, the statements on MyISAM tables will still get replicated, resulting in potentially different rows on the slave. Other cases include various non-deterministic functions such as sysdate which may return different results on the slave.

Row-based replication begins to address these problems by offering an alternative which replicates the actual data changes over the wire, instead of the statement instructions. Still, this technology is new, and various situations limit its use in some environments.

The best way to handle this situation is with the Percona Toolkit checksum tool. It calculates checksums just as you might to compare files at the filesystem level, and just as rsync does to sync data over ssh.

We have an upcoming article planned for May, covering table checksumming in depth. We’ll discuss how to collect the checksums via a cronjob, and then how to write a check which you can roll into Nagios, to monitor your data regularly.
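
Until then, here’s the basic shape of it. A sketch, assuming Percona Toolkit is installed and the tool is run on the master (percona.checksums is the tool’s conventional table for storing results):

    master$ pt-table-checksum --replicate=percona.checksums

Then on each slave, look for chunks that differ from the master:

    slave> select db, tbl, chunk from percona.checksums
        -> where master_crc <> this_crc or master_cnt <> this_cnt;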

    5. Watch out for the Potholes

        1. Use stored procedures & triggers sparingly or not at all

Stored procedures can behave strangely with statement based replication, and can easily break things. That’s because stored procedures are fairly non-deterministic by nature. If you want your replication to be bulletproof, avoid them.

        2. Don’t write to both masters

Sounds straightforward enough, but MySQL does not prevent you from doing so. We recommend you set the read-only flag on inactive masters, to protect you. You could also monitor certain key tables to be further cautious.
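
Setting the flag is a one-liner. A quick sketch (note that read_only does not restrict users with the SUPER privilege):

    mysql> set global read_only = 1;

    # and in /etc/my.cnf, so it survives a restart
    read_only = 1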

        3. Be sure to set unique server_id

MySQL’s binary logging uses the server_id setting to uniquely identify servers in the replication topology. So it’s crucial that they be set on all servers, and be unique.
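
For example, in /etc/my.cnf (any scheme works, so long as the values never collide):

    # on the master
    server_id = 1

    # on each slave, a different value
    server_id = 2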

        4. Be very wary using temporary tables

If a slave server dies, its temporary tables will be missing upon restart. If you have subsequent queries that rely on them, they’ll fail. Better to avoid them, or use them very sparingly.

        5. Avoid MyISAM tables altogether

InnoDB should really be used for all MySQL environments. The exceptions are so few as to be not worth mentioning.

MyISAM tables are not crash safe, can and will lose data, and can be very slow due to locking problems.

If you need to do full text searching, Sphinx comes to mind. It’s more scalable, takes load off the database server, and is lightning quick!

Lastly MyISAM tables can break replication, and we don’t want that! They’re not transaction safe, so if you mix them with InnoDB, bad things can happen!

        6. Avoid non-deterministic functions

Some functions such as UUID() and RAND() will behave differently on different calls. That means they may behave differently when the slave calls the same statement it fetches from the binlog. These types of problems may break things silently, allowing your slave to drift out of sync with your master without error. See integrity checking above for ways to fix this.

        7. UPDATE with LIMIT clause

This interesting SQL combination relies heavily on the current ordering of rows in the table. Suffice it to say it can behave differently on the slave, so be cautious if you have this syntax in your code. It may break replication.
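
If you can’t eliminate the pattern, you can at least make it deterministic by ordering on a unique key. A sketch (the table and column names are hypothetical):

    mysql> update users set status = 'expired'
        -> order by user_id limit 100;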

        8. Row-based replication (still fairly new)

We are not ready to recommend row-based replication yet, as there have been reports of some troubles and surprises with it. It is a fairly new code path and though it’s included in GA release, we don’t see it widely in production yet. It does take a stab at addressing many of the issues with statement based replication though, so the potential is there. We expect to see it used more widely in the future.

Easy MySQL replication with hotbackups


Also find Sean Hull’s ramblings on twitter @hullsean.

Setting up replication in MySQL is something we need to do quite often. Slaves die, replication fails, or tables and data get out of sync. Whenever we build a slave, we must start with a snapshot of all the data from the master database.

mysqldump is the tried and true method of doing this, however it requires that we lock all the tables in the database. If we’re dumping a large database, that could mean a significant period during which no writing can happen to our database. For many environments read-only is still an outage.

Enter hotbackups to the rescue. Percona provides a tool that allows you to perform hotbackups of a running MySQL database, with no blocking. It’s able to do this because of InnoDB & multi-version concurrency control (MVCC). Luckily we don’t need to dig into the guts to enjoy the benefits of this great technology.

Here’s a quick step-by-step guide to using xtrabackup to create a slave.

  1. Install xtrabackup

    If you don’t have any Percona software already on your server, don’t worry. You don’t need to use the Percona distribution to use xtrabackup, but you will need their repository installed. Here’s how:

    $ rpm -Uhv http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm

    From there simply install xtrabackup:

    $ yum install -y xtrabackup
  2. Snapshot the master datadir

    The innobackupex utility comes with xtrabackup, and does the heavy lifting. So we can just use that to perform the backup.

    $ innobackupex /data/backup/

    Now we’ll see a new directory created inside /data/backup which looks something like this:

    /data/backup/2012-04-08_04-36-15/
  3. Apply binary logs

    The backup which xtrabackup created above is of the current state of the database. However there are transactions which are incomplete, and others which have not been flushed to their datafiles. In other words the backup as-is would be similar to a datadir if your database crashed. Some additional transactions must still be applied.

    To apply those changes, use the following command on the backup directory you created above:

    $ innobackupex --apply-log /data/backup/2012-04-08_04-36-15/
  4. Copy to slave

    $ scp -r /data/backup/2012-04-08_04-36-15 root@newslave:/data/
  5. Stop MySQL

    $ /etc/init.d/mysql stop
  6. Swap datadir

    $ cd /data
    $ mv mysql mysql_old
    $ mv 2012-04-08_04-36-15 mysql

  7. Adjust my.cnf parameters

    At minimum you need to set the server_id to a unique value. The IP address with the periods removed can make a good server_id.
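
    For example, if the slave’s internal IP were 10.20.30.40 (a made-up address), you’d add to my.cnf:

        server_id = 10203040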

  8. Start MySQL

    $ /etc/init.d/mysql start
  9. Point to master & start the slave

    One very nice thing about xtrabackup is that it automatically captures the master info, so we can easily find out the current log file & log position.

    Find out where the slave should start from:

    $ cat /data/mysql/xtrabackup_binlog_info
    log_bin.000027    2973624

    Now tell MySQL where the new master is:

    mysql> change master to
        -> master_user='rep',
        -> master_password='rep',
        -> master_host='10.20.30.40',
        -> master_log_file='log_bin.000027',
        -> master_log_pos=2973624;

    Now start the slave:

    mysql> start slave;

    Lastly verify that it is running properly:

    mysql> show slave status\G

    You should see the following:

    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
  10. Test Replication

    Once you have replication up and running, you should test it as well. I like to keep a test table installed in the test schema for this purpose. Then you can test as follows:

    master> insert into sean_test values ('xtrabackup is a great way to create a slave with MySQL');

    Then verify that you see that row on your new slave:

    slave> select * from sean_test;

    Once you’ve used xtrabackup a few times, I’m sure you’ll be converted. It makes building a slave much simpler in MySQL. It captures the file & position for you, and what’s more, there is no dump file to apply – which typically takes a lot of time too. All in all the tool makes you more efficient, and allows you to snapshot slaves anytime you like.

    Now that you have replication working, you should add the icing to the cake. MySQL’s statement based replication is powerful, but even when it’s not throwing errors, the databases can get silently out of sync. In a future article we’ll discuss how to bulletproof your replication setup with a tool that performs checksums on your tables. That will give you professional enterprise class data protection in MySQL.

Thank You for Arguing – Persuasion for fun and profit


Join 10,000 others and follow Sean Hull on twitter @hullsean.

I first read about Heinrichs in a Bloomberg Businessweek piece on him. He’s quite a character, with high profile clients like Ogilvy & Mather and the Pentagon. Struck by some of his ideas, I decided to pick up Thank You for Arguing.

Related: AirBNB Didn’t Have to Fail – With AWS Outage

48 laws of soft power

In 25 very readable chapters, Heinrichs illustrates how to win trust by managing your voice with volume control for positive affect, by verbal jousting and calling fouls, and by mastering timing. Sure, in the real world this is all going to require a lot of trial and error, and practice in the trenches. But his book serves as a very good guide along the way.

Also: 5 Conversational Ways to Evaluate Great Consultants

Don’t worry too much about Aristotle, Cicero or the classics you never learned in school. If anything they serve as a colorful highlight to his useful everyday illustrations.

Some examples worth recalling…

1. Have a disagreement at a meeting? Defuse it with “let’s tweak it”.

2. Pay attention to your tenses:

• using past tense, the conversation is trying to place blame
• using present tense, you’re talking about values
• using future tense, you’re considering choices and solutions

3. Pay attention to commonplaces – your audience’s beliefs and values

4. Effective argument works by:

• appealing to character (ethos) – understand your audience’s personality
• using logic (logos)
• appealing to emotion (pathos)

Read this: RDS or MySQL – 10 Use Cases


He has one whole chapter on Bushisms, which I found intriguing. Bush used code grooming to very strong effect. When speaking to different groups, he emphasized these code words in his sentences. With women, words like “I understand”, “peace”, “security” and “protecting”. With a military group words such as “never relent”, “we must not waver” and “not on my watch” were common. For religious audiences, “I believe” resonated strongly. He quotes a superb Bushism which in this light suddenly begins to sound powerful:

Check out: A CTO Should Never Do This

“I know what I believe. I will continue to articulate what I believe and what I believe–I believe what I believe is right.”

Rhetoric indeed. I’ll be studying this book for months to come!

Get some in your inbox: Exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

Autoscaling MySQL on Amazon EC2

Also find Sean Hull’s ramblings on twitter @hullsean.

Autoscaling your webserver tier is typically straightforward. Image your apache server with source code or without, then sync down files from S3 upon spinup. Roll that image into the autoscale configuration and you’re all set.
With the database tier though, things can be a bit tricky. The typical configuration we see is to have a single master database where your application writes. But scaling out or horizontally on Amazon EC2 should be as easy as adding more slaves, right? Why not automate that process?

Below we’ve set out to answer some of the questions you’re likely to face when setting up slaves against your master. We’ve included instructions on building an AMI that automatically spins up as a slave. Fancy!

  1. How can I autoscale my database tier?

    1. Build an auto-starting MySQL slave against your master.
    2. Configure those to spin up. Amazon’s autoscaling loadbalancer is one option; another is to roll your own solution, monitoring thresholds on your servers and spinning up or dropping off slaves as necessary. See the sketch below.
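
    A roll-your-own spinup can be as simple as a cron job that watches load and calls the EC2 API tools. A rough sketch only – the AMI id, instance type, keypair and threshold are all placeholders, and this assumes the ec2-api-tools are installed and configured with your credentials:

        #!/bin/sh
        # spin up one more slave when the 1-minute load average hits the threshold
        LOAD=`uptime | sed 's/.*load average: //' | cut -d, -f1 | cut -d. -f1`
        if [ "$LOAD" -ge 4 ]; then
            # ami-xxxxxxxx is the auto-building slave image from question 4 below
            ec2-run-instances ami-xxxxxxxx -t m1.large -k mykey
        fi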
  2. Does an AWS snapshot capture subvolume data or just the SIZE of the attached volume?

    If you have an attached EBS volume and you create a new AMI off of that, you will capture the entire root volume plus your attached volume data. In fact we find this a great way to create an auto-building slave in the cloud.

  3. How do I freeze MySQL during AWS snapshot?

    mysql> flush tables with read lock;
    mysql> system xfs_freeze -f /data

    At this point you can use the Amazon web console, ylastic, or the ec2-create-image API call to create the image from the command line. When the server you are imaging off of restarts – as it will do by default – it will start with the /data partition unfrozen and mysql’s tables unlocked again. Voila!
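
    If you image without rebooting, remember to thaw the filesystem and release the lock yourself (a sketch, using the same /data mount as above):

        mysql> system xfs_freeze -u /data
        mysql> unlock tables;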

    If you’re not using xfs for your /data filesystem, you should be. It’s fast! The xfsprogs docs seem to indicate this may also work with foreign filesystems. Check the docs for details.

  4. How do I build an AMI mysql slave that autoconnects to master?

    Install the mysql_serverid script below, then:

    1. Configure mysql to use your /data EBS mount.
    2. Set all your my.cnf settings including server_id
    3. Configure the instance as a slave in the normal way.
    4. When using GRANT to create the ‘rep’ user on the master, specify the host with a subnet wildcard, for example ‘10.20.%’. That will subsequently allow any 10.20.x.y servers to connect and replicate. (See the example grant after this list.)
    5. Point the slave at the master.
    6. When all is running properly, edit the my.cnf file and remove server_id. Don’t restart mysql.
    7. Freeze the filesystem as described above.
    8. Use the Amazon console, ylastic or API call to create your new image.
    9. Test it of course, to make sure it spins up, sets server_id and connects to master.
    10. Make a change in the test schema, and verify that it propagates to all slaves.
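
    For step 4, the grant looks something like this – using the same ‘rep’ user and password that appear elsewhere in this article, with the subnet wildcard adjusted to your network:

        mysql> grant replication slave on *.* to 'rep'@'10.20.%' identified by 'rep';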
  5. How do I set server_id uniquely?

    As you hopefully already know, in a MySQL replication environment each node requires a unique server_id setting. In my Amazon Machine Images, I want the server to start up and, if it doesn’t find the server_id in the /etc/my.cnf file, to add it there, correctly! Is that so much to ask?

    Here’s what I did. Fire up your editor of choice and drop in this bit of code:

    #!/bin/sh
    if grep -q "server_id" /etc/my.cnf
    then
        : # do nothing - it's already set
    else
        # extract numeric component from hostname - should be internet IP in Amazon environment
        export server_id=`echo $HOSTNAME | sed 's/[^0-9]*//g'`
        echo "server_id=$server_id" >> /etc/my.cnf
        # restart mysql
        /etc/init.d/mysql restart
    fi

    Save that snippet at /root/mysql_serverid. Also be sure to make it executable:

    $ chmod +x /root/mysql_serverid

    Then just append it to your /etc/rc.local file with an editor or echo:

    $ echo "/root/mysql_serverid" >> /etc/rc.local

    Assuming your my.cnf file does *NOT* contain the server_id setting when you re-image, then it’ll set this automagically each time you spinup a new server off of that AMI. Nice!

  6. Can you easily slave off of a slave? How?

    It’s not terribly different from slaving off of a normal master:

    1. First enable slave updates. The setting is not dynamic, so if you don’t already have it set, you’ll have to restart your slave:

        log_slave_updates = true

    2. Get an initial snapshot of your slave data. You can do that the locking way:

        mysql> flush tables with read lock;
        mysql> show master status\G

      mysql> system mysqldump -A > full_slave_dump.mysql

      mysql> unlock tables;

      You may also choose to use Percona’s excellent xtrabackup utility to create hotbackups without locking any tables. We are very lucky to have an open-source tool like this at our disposal. MySQL Enterprise Backup from Oracle Corp can also do this.

    3. On the slave, seed the database with the dump created above:

        $ mysql < full_slave_dump.mysql

    4. Now point your slave to the original slave:

        mysql> change master to master_user='rep', master_password='rep',
            -> master_host='192.168.0.1', master_log_file='server-bin-log.000004',
            -> master_log_pos=399;
        mysql> start slave;

        mysql> show slave status\G

  7. The slave’s master is set as an IP address. Is there another way?

    It’s possible to use hostnames in MySQL replication, however it’s not recommended. Why? Because of the wacky world of DNS. Suffice it to say MySQL has to do a lot of work to resolve those names into IP addresses. A hiccup in DNS can potentially interrupt all MySQL services, as sessions will fail to authenticate. To avoid this problem, do two things:

    1. Set this parameter in my.cnf:

        skip_name_resolve = true

    2. Remove entries in the mysql.user table where the hostname is not an IP address. Those entries will be invalid for authentication after setting the above parameter; see the sketch below.
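
    A sketch of that cleanup – review the matching rows before deleting anything, and leave the localhost entries alone:

        mysql> select user, host from mysql.user where host not regexp '^[0-9.%]+$' and host <> 'localhost';
        mysql> delete from mysql.user where host not regexp '^[0-9.%]+$' and host <> 'localhost';
        mysql> flush privileges;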
  8. Doesn’t RDS take care of all of this for me?

    RDS is Amazon’s Relational Database Service, which is built on MySQL. It presents MySQL as a service, which brings certain benefits to administrators and startups:

    • Simpler administration. Nuts and bolts are handled for you.
    • Push-button replication. No more struggling with the nuances and issues of MySQL’s replication management.

    Simplicity of administration of course has its downsides. Depending on your environment, these may or may not be dealbreakers:

    • No access to the slow query log. This is huge. The single best tool for troubleshooting slow database response is this log file. Queries are a large part of keeping a relational database server healthy and happy, and without this facility, you are severely limited.

    • Locked-in downtime window. When you sign up for RDS, you must define a thirty minute maintenance window. This is a weekly window during which your instance *COULD* be unavailable. When you host yourself, you may not require any scheduled downtime at all, especially if you’re using master-master mysql and a zero-downtime configuration.

    • Can’t use Percona Server to host your MySQL data. You won’t be able to do this in RDS. Percona Server is a high performance distribution of MySQL which typically rolls in serious performance tweaks and updates before they make it to the community edition. Well worth the effort to consider it.

    • No access to the filesystem, server metrics & command line. Again, for troubleshooting problems these are crucial. Gathering data about what’s really happening on the server is how you begin to diagnose and troubleshoot a server stall or pileup.

    • You are beholden to Amazon’s support services if things go awry. That’s because you won’t have access to the raw iron to diagnose and troubleshoot things yourself. Want to call in an outside consultant to help you debug or troubleshoot? You’ll have your hands tied without access to the underlying server.

    • You can’t replicate to a non-RDS database. Have your own datacenter connected to Amazon via VPC? Want to replicate to a cloud server? RDS won’t fit the bill. You’ll have to roll your own – as we’ve described above. And if you want to replicate to an alternate cloud provider, again RDS won’t work for you.

The myth of five nines – Why high availability is overrated


Join 12,000 others and follow Sean Hull on Twitter @hullsean.

In the Internet world 24×7 has become the de facto standard. Websites must be always on, available 24 hours a day, 365 days a year. In our pursuit of perfection, performance is measured down to three decimal places; that is, being up 99.999% of the time – in short, five-nines.

Just like a mantra, when repeated enough it becomes second nature and we don’t give the idea a second thought. We don’t stop to consider that while it may be generally a good thing to have, is five-nines necessary and is it realistic for the business?

Also: How to hire a developer that doesn’t suck

In my dealings with small businesses, I’ve found that the ones that have been around longer, with more seasoned managers, tend to take a more flexible and pragmatic view of the five-nines standard. Some even view periods of outage during off hours as – *gasp* – no problem at all! On the other hand it is a universal truth held by the next-big-idea startups that 24×7 is do or die. To them, a slight interruption in service will send the wrong signal to customers.

The sense I get is that businesses that have been around longer have more faith in their customers and are confident about what their customers want and how to deliver it.  Meanwhile startups who are building a customer base feel the need to make an impression and are thus more sensitive to perceived limitations in their service.

Of course the type of business you run might well inform your policy here. Short outages in payments and e-commerce sites could translate into lost revenue while perhaps a mobile game company might have a little more room to breathe.

Related: Why generalists are better at scaling the web

Sustaining five nines is too expensive for some

The truth is sustaining high availability at the standard of five-nines costs a lot of money. These costs are incurred from buying more servers, whether as physical infrastructure or in the cloud. In addition you’ll likely involve more software components and configuration complexity. And here’s a hard truth, with all that complexity also comes more risk.  More moving parts means more components that can fail. Those additional components can fail from bugs, misconfiguration, or interoperability issues.

What’s more, pushing for that marginal 0.009% increase in high availability means you’ll require more people and create more processes.

Read this: Why reddit didn’t have to fail

Complex architecture downtime

In a client engagement back in 2011, I worked with a firm in the online education space.  Their architecture was quite complex.  Although they had web servers and database servers—the standard internet stack—they did not have standardized operations.  So they had the Apache web server on some boxes, and Nginx on others.  What’s more they had different versions of each as well as different distributions of Linux, from Ubuntu to RedHat Enterprise Edition.  On the database side they had instances on various boxes, and since they weren’t all centralized they were not all being backed up.  During one simple maintenance operation, a couple of configurations were rearranged, bringing the site down and blocking e-commerce transactions for over an hour.  It wasn’t a failure of technology but a failure of people and processes made worse by the hazard of an overly complex infrastructure.

In another engagement at a financial media firm, I worked closely with the CTO outlining how we could architect an absolutely zero downtime infrastructure.  When he warned that “We have no room for *ANY* downtime,” alarm bells were ringing in my head already.

Also: Why RDS doesn’t support Maria DB or Percona

When I hear talk of five-nines, I hear marketing rhetoric, not real-world risk reduction.   Take for example the power grid outage that hit the Northeast in 2003.  That took out power from large swaths of the country for over 24 hours.  In real terms that means anyone hosted in the Northeast failed five-nines miserably because downtime for 24 hours would be almost 300 years of downtime at the five-nines standard!
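
The arithmetic is easy enough to check: five-nines allows 0.001% downtime, which works out to about 365 × 24 × 60 × 0.00001 ≈ 5.3 minutes per year. A 24 hour outage is 1440 minutes – roughly 270 years’ worth of five-nines downtime allowance.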

For true high availability look at better management of processes

So what can we do in the real-world to improve availability?  Some of the biggest impacts will come from reducing so-called operator error, and mistakes of people and processes.

Before you think of aiming for five-nines,  first ask some of these questions:

• Do you test servers?
• Do you monitor logfiles?
• Do you have network wide monitoring in place?
• Do you verify backups?
• Do you monitor disk partitions?
• Do you watch load average?
• Do you monitor your server system logs for disk errors and warnings?
• Do you watch disk subsystem logs for errors? (the most likely component in hardware to fail is a disk)
• Do you have server analytics? Do you collect server system metrics?
• Do you perform fire drills?
• Have you considered managed hosting?

If you’re thinking about and answering these questions you’re well on your way to improving availability and uptime.

Read this: Top MySQL interview questions for DBAs, hiring managers & recruiters

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

The risk of living in a filter bubble

I’ve been looking for this book for a long time. Or maybe I should rephrase that: I was pondering the topic, and had been looking for a book that covered it for a while, so I was pleased to come across The Filter Bubble.

Digging into search engine optimization and analytics while building my own website, I was often confused by inconsistent Google search results. Realizing I was on a different computer, or logged into Google services, I would log out to see the untainted results, the results everyone else was seeing. Or was I?

As Google+ personalization launched, the topic of search really piqued my interest. Why had I been given different results at different times? The coverage on Gigaom, AllThingsD, TechCrunch and ReadWriteWeb cautioned that this could be the turning point for Google, in a bad way.

The filter

It’s true Google takes signals from many different sources. With the launch of Google+ they now incorporate additional social signals. As Facebook becomes the default dashboard for more and more internet users, the means of finding content is shifting from Search to Social. So Google is responding to this by making their overall service more social as well.

The impact though for users of the service could be confusion. Many users I’ve spoken to, working in tech or otherwise think the results they see on Google are unbiased and the same for each user. Google’s secret sauce has always been its algorithm that returned the best results. Now that social signals are mixed into the page rank brew, will users continue to value Google results?

A cause for concern

Pariser illustrates the difference in Google search today with great examples. After the gulf oil spill, he asked two friends to search for “BP”. One saw breaking news on the topic, the other got investment information about BP. Filter bubble, indeed.

Behavior Targeting, as it’s termed in the industry, is all about figuring out what you want before you ask. But sociologist Danah Boyd argued in a Web 2.0 Expo speech in 2009 that with all this personalization giving us exactly what we *want*,
“If we’re not careful, we’re going to develop the psychological equivalent of obesity”.
Even the founders, Sergey Brin and Larry Page, apparently thought in the early days that this bias might turn out to be a problem:

“We expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.” – Sergey Brin & Larry Page, Google

Firms like Recorded Future promise to “unlock the predictive power of the web”, while the lesser known but formidable Acxiom specializes in marketing and personalization, combing through mountains of data to figure out what coast you live on, how you vote, what you eat and drink, and almost what you’re going to do before you do it.

Pariser touches on everything in this book from present bias to meaningful threats, the priming effect, accessibility bias and warns of getting trapped in what he terms a “you loop” where you continue to see things framed and personalized by what you’ve viewed and reacted to before, ultimately narrowing your view, and limiting your exposure to new information.

Perhaps the biggest problem with these opaque transformations applied to your data is that they play judge and jury with no appeal; sometimes without your knowledge that you were in a courtroom being judged. Programmers write algorithms and code to perform these transformations, sorting people into groups. If they put people in a group that doesn’t match them they call it “overfitting”. In society we might call this some sort of stereotyping.

One chapter titled the Adderall Society, asks if this filter bubble isn’t part of a larger transformation that’s pushing our thinking towards the conservative and calculative, limiting creativity or foreign ideas that can break categories, and encouraging us to ignore or steer around serendipity.

The book bumps into a great spectrum of thinkers on this topic, from Danny Sullivan of SearchEngineLand, to Amit Singhal an engineer on Google’s team, and John Battelle’s SearchBlog. He speaks to Chris Coyne – okcupid.com, David Shields on what he calls truthiness and former CIA consultant John Rendon who says “filter bubbles provide new ways of managing perceptions” and suggests the garden variety Thesaurus as a powerful tool that nudges debates with new language.

Be aware but don’t be paranoid

Although I think the book gives valuable insight, I was a little dismayed by the mood of paranoia in its title. With a subtitle like What the Internet is Hiding from You, it suggests a conspiracy or hidden agenda. Now obviously these large corporations have a motive to make money, but I don’t think anyone is surprised at that. To some, Pariser’s views may appear somewhat left-leaning, but the issues raised in his book transcend political boundaries. They are matters that concern society at large.

In the end I think I’m probably more optimistic about these things. With a long view, society tends to work out these issues, through public pressure or simply buying differently. As Google is quick to remind us, we can easily choose an alternate search engine. In the future perhaps public pressure will push firms to provide more transparency about these filtering mechanisms allowing end users to manage their own filter settings.

I’ll leave you with a few ideas to chew on. Can code and algorithms curate properly? Should there be another button alongside the Like button such as “Important”?

Pariser quoted the folks at the New York Times: “We don’t let metrics dictate our assignments and play, because we believe readers come to us for our judgement, not the judgement of the crowd”. Indeed. But in the internet age, is that what they *buy* or *click*?

“My startup is too cool for your business school”

An article I read on TechCrunch recently got me thinking about startup culture. In Are You Building A Company, Or Just Your Credentials?, Geoff Lewis expressed his distaste for a friend’s plan to get on Y Combinator’s ‘no idea’ startup incubation program. In this experimental approach, groups or individuals with a desire to be part of a startup, but who have no product or business idea to begin with, can apply.

The thinking, I believe, is that since brilliant ideas aren’t the only factor in startup success (many other factors like organisational skills, business savvy and tenacity matter too), YC will dig into its vault of ideas and match one that’s most suitable to these idea-starved groups.

Firstly, reading this I could sympathize with the discontent.  Venture capitalists exist for people who have an idea and want to realise it. There are already programs for people who don’t have an idea but want to achieve some success. They are called “careers”.

However, if you look at it from an incubator’s perspective, it’s a pretty clever and measured approach. YC knows what sort of group dynamics in startups have a higher chance of success, and it is casting its net wide to find them.

But Lewis took issue with the fact that his MBA-qualified, credential-seeking buddy was signing up with YC presumably just to add to his blue ribbon collection.

With such articles, what’s usually more interesting is the comments they elicit. Many who reacted responded with the same feelings of contempt, calling paper-chasers ‘hucksters’ and scoffing at their lack of passion and sincerity.

The tone among some carried this notion that startups were a special breed of entrepreneurs burdened with some sort of higher calling to liberate the world; money and honour being an afterthought.

Not being directly part of the startup circle, so to speak, I found the reaction amusing and, frankly, rather foolish. Are people involved in startups turning into a sort of in-group? Do they really think of themselves as some kind of mutant-strain of businesses that are different from ‘regular’ enterprises?

If we look at the richest Internet companies today, once startups themselves, they are no less motivated by avarice and the bottomline, so why be so judgmental of Mr MBA treating the YC program as a way to gain credentials? I bet many talented individuals are going to have a go at it for the same reason if not a variety of reasons. As the program draws out, I suspect the number of participants will peter out by attrition anyway.

And who’s to say not coming up with a disruptive idea makes you less enterprising? Think of startups as team sports. There are some teams that innovate with the most creative gameplay that catch their opponents off-guard. There are teams that win by consistency and endurance, doing the same thing well over many years to pull ahead of the pack.

To cast aspersions on the motivations of others and to let silly prejudices limit participation seems discordant with the spirit of the Internet itself, where challenging convention is the order of the day and everyone is entitled to a shot at success.

To the ‘Microsoft Azure’ Cloud

To The Cloud: Powering An Enterprise introduces the concepts of cloud computing from a high-level and strategic standpoint. I’ve read quite a few tomes on cloud computing and I was interested to see how this one would stack up against the others.

The book is not too weighty in technical language, so as not to be overwhelming and intimidating. However, at ninety-five pages, one might argue it is a bit sparse for a $30 book, if you purchase it at full price.

It is organized nicely around initiatives to get you moving with the cloud.

Chapter 1, Explore takes you through the process of understanding what the cloud is and what it has to offer.

Chapter 2, Envision puts you in the drivers seat, looking at the opportunities the cloud can offer in terms of solutions to current business problems.

Chapter 3, Enable discusses specifics of getting there, such as selecting a vendor or provider, training your team, and establishing new processes in your organization.

Finally in Chapter 4, we hit on the real details of adopting the cloud in your organization. Will you move applications wholesale, or will you adopt a hybrid model? How will you redesign your applications to take care of automated scaling? What new security practices and processes will you put in place? The authors offer practical answers to these questions. At the end there is also an epilogue discussing emerging market opportunities for cloud computing, such as those in India.

One of the problems I had with the book is that although it doesn’t really position itself as a Microsoft Cloud book per se, that is really what the book aims at.

For example, Microsoft Azure is sort of the default platform throughout the book, whereas in reality most folks consider Amazon Web Services the default when talking about cloud computing. Strictly speaking, Azure is a platform, while AWS is infrastructure – raw iron that can run Linux-based operating systems or Microsoft’s Windows stack.

Of course having a trio of Microsoft executives as authors gives a strong hint to readers to expect some plugging but a rewrite of the title would probably manage readers’ expectations better.

The other missing piece in this book is a chapter on tackling new challenges in the cloud. Cloud computing – Azure or otherwise – brings challenges with respect to hardware, as using the cloud means deploying across shared resources. For example it’s hard to deploy a high-performance RAID array or SAN solution devoted to one server in the cloud. This is a challenge on AWS as well, and continues to be a major adoption hurdle. It’s part of the commoditization puzzle, but it’s as yet not completely solved. A chapter discussing how to mitigate against virtual server failures, using redundancy and cloud components to increase availability, would be useful.

Lastly, I found it a bit disconcerting that all of the testimonials were from fellow CTOs and CIOs of big firms, not independents or other industry experts. For example I would have liked to see George Reese of Enstratus, Thorsten von Eicken from Rightscale or John Engates from Rackspace provide a comment or two on the book.

Overall the book is a decent primer if you’re looking for some guidance on Microsoft Azure Cloud. It is not a comprehensive introduction to cloud computing and you’d definitely need other resources to get the full picture. At such a hefty sticker price, my advice is to pick this one up at the bargain bin.

Oracle to MySQL – prepare to bushwhack through the open source jungle


I was recently approached by a healthcare company for advice on suitable database solutions capable of executing its new initiative. The company was primarily an Oracle shop so naturally, they began by shopping for possible Oracle solutions.

The CTO relayed his conversation with the Oracle sales rep, who at first recommended an Oracle solution that, expensive as it may have been, ultimately aligned with the company’s existing technology and experience. Unfortunately this didn’t match their budget and so predictably, the Oracle sales rep whipped out a MySQL-based solution as an alternative.

Having worked as an Oracle DBA throughout the dot-com years, I know the technology well. I also know the cultural differences between enterprises that choose Oracle solutions and those that choose open-source ones.

This encounter with the healthcare firm struck me as a classic conundrum for today’s companies who are under pressure to meet business targets under a tight budget, and in a very short time.

Can an open-source solution like MySQL be the answer to such huge demands?

The Oracle sales rep will likely nod excitedly and say no sweat. But as a consultant I could only manage an equivocal yes.

As the healthcare CTO rattled off the list of products he wanted to use, with specific RTOs and RPOs (recovery time objective and recovery point objective), all I could think to do was react with concern.

In my experience with startup after startup I’ve seen plenty of different MySQL installations but I’d never heard of one with the technology stack he described. What’s more I’d never heard of these solutions described with the Oracle Corp titles.

On one hand I wanted to discuss the merits of the solution he was keen to implement, while on the other, I was expressing concern over possible directions and paths we might take.

An Oracle cluster is not a MySQL cluster

The solution Oracle suggested was a MySQL Cluster. The term cluster unfortunately means different things to different people. Such loose usage of the word dilutes its meaning. In particular a lot of Oracle technologists expect that this solution might be similar to Oracle’s Real Application Cluster technology. It’s not. There are a lot of limitations, and frankly it’s really just a different beast.

The list also included various management dashboards which Oracle likes to push, but which I rarely see in my consulting assignments. What’s more I heard nothing about replication integrity considering that replication problems are an ongoing concern for real-world MySQL installations due to the particular technology used under the hood. There are reliable solutions to this problem but none yet available from Oracle. In fact, this is a big problem but one that may be completely off the sales guys’ radar.

Don’t let sales frame your architecture

Honestly, I don’t have a particularly large axe to grind with the sales guys. They have a job to do, and providing solutions which bring revenue to their firm and commissions for themselves is what puts food on their tables. Each party is motivated in different ways. But as a company shopping for solutions, this should be kept clearly in mind when starting down that road.

Beware prescribed architectural frameworks that appear too easy, because they almost always don’t do what they say on the tin. Unfortunately sales folks don’t have experience designing architectures in the real world, so they can’t really know how the technologies work beyond the data sheet with its feature bullet points.

As we all know in the technology space, all software comes with bugs, and real-world experience does not match the feature lists in the brochures. In law they have de jure and de facto: the former describes what is written, the latter what’s practiced. With technology solutions, it’s never a matter of just adding water for something to work.

Do your homework

Before you embark on a new trip through the open source technology jungle, do some due diligence. Read up on real-world solutions, and how other large firms are using the technology. What configurations are they having success with? Which are causing trouble for a lot of people?

One of the great advantages of open source is its very vibrant communities, forums and discussion groups, where people are glad to share their experiences and offer advice.

Allow sufficient time to test and bring your team up to speed

This is a very important one. Shifting from an enterprise that relies primarily on Oracle for its relational database solution over to one that relies on open source technologies is a very big step indeed. Open-source technologies tend to be much more do-it-yourself and roll-your-own, while Oracle solutions tend much more toward predefined paths, solutions and prescriptions for customers.

There are merits to each of these paths, with attendant pros and cons. But they are decidedly different. It’s likely that your team will also require time to get up to speed, not just with the particular software components, but with the new process by which things happen in the open-source space. Allow sufficient time for this shift to take place, lest you create more problems than solutions.

A handy guide for PHP and MongoDB Web Development

What makes a beginner’s guide handy is when it speaks to your intuition. It anticipates the burning questions of a newbie trying to grasp new concepts, and it quickly answers them. PHP and MongoDB Web Development – Beginner’s Guide is one such guide.

I hadn’t heard of Packt Publishing or Rubayeet Islam before picking up this title and I must say I’m impressed. Based in Birmingham, with offices in Mumbai, part of Packt’s business model is to give part of the royalties earned from its books to the open source projects they cover.

I already had a working knowledge of MongoDB, mostly from an operational perspective. If you are new to MongoDB you’ll certainly appreciate how this book is structured. It cuts to the chase, diving right into the nuts and bolts of installing the pieces you’ll need, such as the database and drivers, and getting your first application running.

From there they take you through a basic web application step-by-step, with chapters on session management, MapReduce and GridFS. Every time I flip through the pages of a technical book, I find I always have questions in the background: ‘what about performance?’ or ‘how do I troubleshoot these pieces as I’m building them?’

What I liked about this book is that almost as quickly as I’d formulate some question about performance, I’d happen upon answers in the book, as if it knew what would come to my mind at each point of the journey.

I was thinking about tuning and application performance and then found chapter 9 which discusses MongoDB’s explain facility, similar to that of MySQL. From there they cover index creation, hints, and finally profiling. These are all important topics for a developer, ones that he or she should have in mind while building their applications. So I was happy to see good coverage of that even in a self-avowed beginners’ guide.

Building apps that talk to both MySQL and MongoDB

Another interesting chapter was one introducing the idea of building an application that can talk to both MySQL and MongoDB and using those two datastores for different purposes. Again while I’m reading it I start thinking about operational concerns, and I start asking how one would support such an architecture. And then just like clockwork, Islam answers that very question.

He explains the challenges around data consistency and operational support in detail. It’s a great way to introduce a topic without necessarily pushing that adoption per se. Islam is clearly an experienced programmer, with much reasoned advice to share.

The book had great utility but I do have a few complaints.

First off, the font is a little funky, and hard to read after a while. In the same vein, some of the screenshots are very wide and so were scaled down, making them tiny and not very readable. The screenshots also aren’t consistent: some are black on white, others white text on a black terminal, which ended up being impossible to read.

Lastly I would have liked to see more use case discussions. In particular, when should I consider a NoSQL database like MongoDB over a relational database? Which types of applications are really well suited? Which aren’t? What about versus other NoSQL databases? The same goes for GridFS: there was some caution there after the material was introduced, but more discussion of which applications it is well suited for would be useful.

Those few complaints aside, the book is overall very good, and perhaps the publishers will consider improving the type and diagrams in the next edition. It definitely sticks to its cover page motto: “Learn by doing: less theory, more results”.