Category Archives: All

Sometimes… let things break a little

Have you ever started a new project, just into it you realize that maybe there aren’t technical problems to solve? It starts to dawn on you the real crux of the problem boils down to people & processes?

It’s happened to me on a number of occasions, but once in particular really stands out for me.

I was working for a firm in the education space, in particular around test preparations.

Asked to automate a publish process

The environment had a mix of relational databases, from SQL*Server to MySQL for some applications. The web facing database however used Oracle on the backend.

Their career DBA was real old guard Oracle, he had his ways of doing things, and didn’t want to rock the boat. In particular he managed the process for publishing changes to the website. Publishing amounted to running a few hand rolled scripts and each step was a manual one.

With the process setup this way, the editors had to work closely with engineering each time they wanted to move content to the website. Slow, cumbersome, and not very workable.

The real problem, siloed departments & infighting

As I worked closely with the DBA, quite a few things became clearer. For one he was sometimes a grumpy fellow & he had a strong accent which was sometimes hard to cut through. Knowing the other team members, I knew this all contributed to the trouble. But he maintained quite a bit of resistance to automation of the process no matter what. His view was, if he hands over the reigns to editors who don’t understand the technology, they’ll screw something up, make a mess, and that would ultimately create more work for him. After all that he’d be doing it manually anyway!

Further attempts to communicate between teams or even between the managers and this guy went nowhere.

It’s not easy to find a good DBA. We wrote a MySQL interview guide to help and one for an Oracle interview too. The mythical dbas remain in rather short supply.

We weren’t trained for this in engineering school

When you’re looking for a technical solution and you realize the bigger issue is a people problem, what do you do? You can gently bring it up with the higher ups, but they may have a different style, or prefer the shout orders and bark mode of management.

Things continued to go around in circles, and attempts to get further information from this DBA didn’t prove fruitful. He was protective of his domain, and fought tooth & nail to open up.

Things come to a head

The bigger boss, the one above my direct report, one day called me into his office. Actually it was an off day I was just stopping by to check in on progress.

He pulls me into his office. Since I rarely interacted with the guy, my guard was very much down. I thought we’d have a chat about the weather or perhaps who was going to win the world cup.

He proceeds to tear into me without warning. Practically screaming, he’s giving me a piece of his mind and not stopping to hear what I have to say. Where is this project going, why is there no progress, we’ve got serious deadlines, you’re pushing us right up to the wall … that kind of thing.

As I listened to him fire away at me, I realized some of this had gotten filtered through some sources, who didn’t completely understand the blocking issues either. That it wasn’t technical challenges, but rather people and processes that weren’t working. As I began to explain this, he stopped me and said:

[quote]Sean, your job is to push, push, push and push us more. Rock the boat if necessary. When I was a constultant I was constantly running around making sure everyone was talking to eachother[/quote].

A few things ran through my mind at that point. One was well he’s not a consultant, so either he couldn’t last in it, or the life didn’t appeal to him. Or perhaps his skill sets ran truer to management in a large firm, a different albeit tougher role to master.

But it also occurred to me that different folks have very different styles, some like to push, and prefer confrontation & believe that leads to resolution. While others are more listeners and find there way around a problem by giving everyone a chance to voice their positions.

My style is the latter, while his was clearly the former.

The fallout

What ended up happening is a project manager also got assigned to the project, as well as another manager. The PM liked working with me very much, and as things unfolded, much of the departmental siloing began to dissolve, and the bigger communication problems began to surface. From there solutions followed.

Lessons learned

– beware the status quo – some don’t really want to rock the boat
– communicate your position, but beware that others may have a different style
– you may be asked to support a path you see as the wrong one
– let things “break a little bit” so everyone learns the hard lessons
– getting burned can be a lesson for the whole team, and lead to new solutions

Made it this far huh? Grab our newsletter.

Oracle DBAs… You Know You Want MySQL!

If you’re an Oracle DBA or developer and considering migrating an application over to MySQL you probably have a lot of questions. Here’s a five minute summary of what you should know.

Looking to hire a top flight MySQL DBA? Check out our MySQL interview guide.

What is truly delicious

o a command line that’s decades ahead of SQL*Plus
o fast & easy install via yum or apt-get package managers
– saves hassles, headaches, and standarded on linux distribution
o a very active open-source community with tons of roll-your-own solutions
o myriad of replication topologies with endless use cases

Flashback technology

Although MySQL doesn’t provide this by default, a slave set to lag behind by an hour or more can serve this function quite nicely. Perfect DIY solution.

Auditing

Out of the box, MySQL doesn’t have great auditing facilities. However the binary log can be of great assistance here. Use the mysqlbinlog tool to get human readable data and filter for the information you need with your favorite Unix tools such as sed, awk, sort and so forth.

MySQL also includes a general query log which may also be useful for such auditing jobs.

Users & Security

Logins are authenticated by hostname & username. So user@localhost has a distinct set of permissions from user@remoteserver. This can be very confusing so make every effort to create users & grants consistently. Also use –skip-name-resolve option to avoid DNS lookups. These can easily cause scalability problems.

Stored procedures

They’re there in MySQL, but not robust. Use sparingly. They can easily break replication, and cause serious performance and scalability problems. They also make your deployments more complicated – for example mysqldump requires the –routines flag to capture them.

SQL Queries & the cost based optimizer

MySQL uses a cost based optimizer like Oracle’s however it is much less mature. Don’t expect the same level of smarts. Some queries will need special attention.

o there’s no hash join, only nested loops
o you can use one index per table – no index merging
o correlated subqueries with IN() lists don’t optimize well
o not much control over stats collection & freshness

Partitioning

You can partition large tables by range, key or hash. However managing to get your queries just right to do partition pruning may be a trick. What’s more all the added operational benefits you get in Oracle may be limited. You can drop partitions quickly, however reorganizing them will be painfully slow.

Materialized views

MySQL doesn’t offer them per se, but check out Justin Swanhart’s Flexviews for a solution that may give you what you need.

Replication

Oracle standby database is a change based technology. Although MySQL is beginning to experiment with this using row-based replication, the vast majority of deployments are statement based.

Since slaves are hot or read-only, they can be used to horizontally scale your applications read capacity, serve as live backup, makeshift flashback or a logging apply server. Here’s how you can setup replication without downtime using hotbackups.

Be vigilant about performing checksums against your slave instances. This will alert you to any drift or other inconsistencies across your tables or data. We wrote a howto bulletproof replication with checksums.

Clustering

Don’t expect to find something like Oracle RAC. You won’t. For any typical web-facing database needs, MySQL NDB Cluster is most likely not what you’re looking for.

Consider MySQL master-master replication where one is active and one is passive. Configured as circular replication, all changes propagate to all nodes. It’s crucial you only ever write to one node, however. Use the inactive node to do online schema changes, test upgrades or other availability sensitive operations.

Lastly you may try DRBD which is a Linux filesystem level solution. Keep in mind your inactive database will be cold. You won’t experience any replication consistency issues, but upon promotion to master, you may experience slow performance until the caches are warmed up.

A few miscellaneous surprises

o many ALTER functions are blocking operations, so can’t be done online
o replication can be troublesome to manage & drift out of sync silently
o default MyISAM tables can corrupt easily and lose data – use Innodb instead
o complex many table joins & certain subqueries do not perform well
o there is a lot less instrumentation around the optimizer & db engine

Read this far, grab our newsletter!

Where’s my 80 million dollars?

Way back in the heydays of the dot-com boom, the year is 1999.

Join 12,100 others and follow Sean Hull on twitter @hullsean.

I worked for a medium size internet startup called Method Five. When I came on board they were having a terrible time with their site performance.

Website crashing

When I first met the team, I was tasked with performance problems. After all their flagship web property kept crashing, and it didn’t look good to investors. As with most web properties in those days it was a home-grown datacenter in the back of the office, running on Sun Microsystems hardware, with Oracle on the backend and Apache serving webpages.

Also: Why a killer title can make or break your content efforts

Negotiating an acquisition

As it became clearer after day one, the project was particularly sensitive. They were negotiating a huge acquisition by a firm called Xceed Corp. The sticking point? Their crashing website did not sell their technology prowess in a particularly positive light. To say the least!

Read: Why high availability is so very hard to deliver

Investigation

As it turns out the site had all the right players, from systems administrators to a DBA who sat watch over the Oracle systems.

As I dug into the systems, I found a serious smoking gun. It seems the Oracle software was configured to use just 5M of memory out of about 256M free. Just like MySQL, the server must be configured to use available memory upon startup. There are myriad caches and buffers which need to be attended to. By today’s standards these numbers probably sound absurd. Nevertheless the DBA wasn’t familiar with the basic memory settings, and so the system was terribly bottlenecked.

Read this: Why a four letter word divides dev and ops

Problem Solved

We then ordered some urgent changes to the system, configuring all of Oracle’s caches to use up the precious memory available.

Immediate the website unlocks, transactions begin flowing, and webpages are returning quickly. End users pull their noggins off their keyboards, and the executives begin breathing a sigh of relief. The site was literally 1000x faster during peak.

Related: MySQL interview guide for managers and candidates alike

Acquisition

Shortly thereafter the acquisition goes through for a cool 5 million in cash and 80 million in stock.

Where’s my cut?! You might be asking that question. But my policy is almost always defer to something concrete and tangible, aka fees and real compensation. I did not negotiate any stock in the deal.

Another popular war story we wrote A CTO Must Never Do This….

Read: Why devops talent is in short supply

Lesson’s Learned

o Don’t believe received wisdom. Check and double check what’s really happening.
o Use the memory and resources you have available.
o Measure capacity, and isolate bottlenecks in the system
o Decouple services wherever possible
o Problems are as often people and process as they are with technology

Also: 5 more things deadly to scalability

Make it this far? Grab our newsletter!

You're Too Young To Be My Boss

About a year ago I engaged with a firm to do some operations work on their site. They provided services to colleges and universities.

When they first reached out to me, they were rather quick to respond to my proposal. They seemed to think the quote was very reasonable. I also did some due diligence of my own, checking the guy’s profile on the about page. I noticed he was 25, rather young, but I didn’t think much else of it.

We discussed whether they wanted fixed hours. Since those would limit my availability we both agreed a more flexible approach made sense. This worked well for me as I tend to shift and schedule time liberally, so I can be efficient & flexible with clients, but still have a life too.

Trouble Brewing

As we began to interact the first week, I sensed something amiss. My thought was that the first week you work with a client, they feel you out. They see how you work, when you work, how much gets done and so forth. This provides a benchmark with which to measure you. If either party is unhappy with how things are going, they discuss and make adjustments accordingly.

What was happening in this case was the guy started pestering me. I began to get incessant messages on instant messenger asking for updates. I had none. I explained that I would contact him as things were completed, or if I had questions.

This was only two days into the project. I’d barely gained access to the servers!

The Fever Pitch

After discussing my concerns on the phone, the gentleman kind of glossed them over. From there the pestering continued. I explained that I could not be available to him any hour of the day, while the engagement only provided for one half of a week. This began to interrupt me from other client work, so I had to signoff of instant messenger. Not good.

The Pot Boils Over

We spoke again on Monday briefly, and decided to connect the following day. From there the pestering began anew, and I began to lose my patience. I insisted that we speak on the phone before work would continue. I felt the problem was deteriorating and discussing over text would only make things worse.

He emailed me back as I was then offline. In his email he ordered me to come online. While he sat in a meeting, he explained, he could not take a call! Nevertheless he insisted we resolve it during the meeting. Distracted no less.

[quote]It was then that I started receiving text messages on my personal mobile phone from the guy, pestering me to get online so we could resolve our communication problem! You can’t make this stuff up![/quote]

The Fallout

Eventually we did both get on the phone, and I explained I had reached wits end. After only ten short days of working together, we had both set strong precedents and they were obviously not compatible. He asked if I would stay on longer, and reconsider working together, and I said I would think about it.

I chose not to dig a deeper hole, and let him know I wouldn’t be invoicing for previous the weeks work.

The Lessons

o beware age differences – in our case an 18 year gap
o pay attention to management styles – self-starters don’t need micromanaging
o be patient & keep communicating
o allow for an exit strategy that is amenable to both parties

Read this far? You’ll love our newsletter. Get Scalable Startups. No Spam. No Selling..

Why I Wrote the Book – Oracle and Open Source

Back in the late 90’s New York City was deep in the dot-com boom. Silicon Alley was being born, and a thousand internet startups were sprouting. Everyone was hiring, it was an exciting time to work in technology!

Join 11,500 others and follow Sean Hull on twitter @hullsean.

Trend Spotting Circa 2000

As an independent consultant, I had the opportunity to work at quite a few startups. The technology stack was identical at almost all of them. Sun Microsystems hardware, Apache webservers, and Oracle on the backend. The database was always the sticking point, and developers struggled to get their queries right.

It was an interesting role to hold. Most career DBAs worked at large fortune 500 firms, the old stodgy kind where nothing ever changes. Few of the Oracle old guard, the kind you’d meet at User Groups or conferences, had much exposure to Linux, and they certainly didn’t trust it.

Also: Here’s how to do a scalability performance review

Meanwhile in the startup scene in NYC I was seeing the cutting edge uses of the technology, with more and more shops switching to Linux and commodity hardware. There was even talk of *gasp* Oracle porting to Linux. There was a real rumor mill around all of this.

Oracle and Open Source Published – 2001

Seeing this shift towards commodity hardware, and the tremendous demand for Oracle married with open source technologies, I pitched O’Reilly and Associates with a book idea. Let’s talk about what’s happening in the trenches. How and when does Oracle – the most commercial of relational databases, work with Open Source technologies? What is in the mix? What are real firms using it for? What tools and technologies can help firms grow faster?

Related: Oracle DBA Interview questions for managers, candidates & recruiters alike

These were the questions my co-author and I sought to answer, and to judge from the response I think we did a very good job. As that push continued, Oracle eventually ported it’s enterprise database to Linux. This was a seismic shift that meant existing Oracle customers would spend a lot less on hardware, and thus have more to spend on Oracle licenses. Win-win except for Sun. The trend continued with Oracle pushing Apache into the mix as well.

Fast Forward a Decade

Now a decade later, Oracle has bought it’s former partner Sun, and in so doing owns MySQL too.

Read this: Top MySQL Interview questions for Devops, managers & recruiters

What new trends are happening? We hear an incessant drum of hype around cloud computing. In many ways the trend parallels what happened a decade ago. See our related piece a history lesson for cloud detractors. How so?

[quote]Commoditization: push towards new platforms, driven by cost. [/quote]

But this is slowed by an equally large stumbling block.

[quote]Performance: new cloud servers can’t compete with their big iron cousins. Not yet at least.[/quote]

Interested in Amazon EC2? We wrote an Intro to EC2 Cloud Deployments article which digs in deeper.

What’s Next for Datacenters

Commiditization will continue, driving costs downward. This will provide more gravity to cloud migrations for firms big and small.

Performance will improve. Cloud services like Amazon EC2 will get bigger & better, as will the all important network & disk subsystems.

Also: 5 things toxic to scalability

Big enterprises are already dipping their feet in the water with VPC technology, tying their existing datacenter to a cloud. They can grow elastically while still having feet firmly planted on the ground.

As large enterprises begin to get experience behind the wheel, it’ll chip away at the stranglehold of Oracle and the huge taxation type licensing that firms struggle with today. Where salesforce.com had a huge impact, workday.com will be even bigger.

[quote]The cloud will finally disrupt the last old guard industry – enterprise software.[/quote]

Read this far? Get us monthly in your inbox. Grab our scalable startups newsletter!

A CTO Must Never Do This…

A couple years back I was contacted to look at a very strange problem.

The firm ran flash sales. An email goes out at noon, the website traffic explodes for a couple of hours, then settles back down to a trickle.

Of course you might imagine where this is going. During that peak, the MySQL database was brought to its knees. I was asked to do analysis during this peak load, and identify and fix problems. Make it go faster, please!

First day on the job I’m working with a team of outsourced DBAs. I was also working with a sort of swat team chatting on SKYPE, while monitoring the systems closely.

Then up popped one comment from a gentlemen I hadn’t worked with. He insisted there was contention for a little known MySQL resource called the AUTO_INC lock. Since I wanted to know more, I asked who the guy was and to my surprise he turned out to be the CTO.

[quote]The CTO was tuning and troubleshooting the database![/quote]

Wow, that’s a first. I thought I’d seen it all. A CTO is normally overseeing technology & the team rather than crawling around in the trenches on the front line.

This all raised some important points

1. The app was having major growing pains
2. Current architecture was not scaling
3. Amazon elasticity was not helping at the database layer
4. People & process were also failing, hence the CTOs hands on approach

It was shocking to see a problem deteriorate to this point, but when you consider the context its understandable. A company like this is struggling with hypergrowth to such a degree, that each day seems like a hurricane storm. With emergency meetings, followed by hardware & application emergencies, trouble seems constant. It can be very difficult to step back and see the larger picture.

The takeaway from this experience…

o Amazon EC2 can’t do it all – consider physical servers for disk intensive apps
o MySQL still has some real scalability limitations
o use technology for its intended purpose – MySQL isn’t great for queueing
o A CTO tuning the database means problems have deteriorated too far

Read all the way to the end? Grab our newsletter – scalable startups.

Half of NYC Fastest Growing Firms In Tech


If there was ever any question about the exploding tech scene in New York City circa 2012, check out Crains recent 50 fastest growing firms.

#3 Thrillist

City guide & lifestyle site for men.

#5 Usablenet

Provides a powerful platform for content delivery to mobile & tablets as well as laptops.

#8 Admarketplace

A leader in search syndication.

#9 Yodle

From the bloodbath of newspaper classifieds brought on by disrupters like Craigslist & eBay, comes a new breed of online advertiser catering to customers and the digital space.

#15 Telx

Provides datacenter services such as high speed connections, private clouds, servers & networking equipment.

#16 Complex Media

An online style and lifestyle magazine catering to young men.

#18 Wavsys

Finding talented mobile engineers isn’t easy. This firm provides contractors for the big wireless firms.

#19 Return Path

Helps firms with email delivery, by routing around and through spam filters.

#20 M5 Networks

Cloud Based phone systems.

#22 Mitchell Martin

Tech staffing and consulting solutions

#25 Artech Information Systems

Staffing for technology and project management requirements.

#27 Infinity Consulting Solutions

Recruiters for the technology industry as well as finance and accounting firms.

#28 Vicom Computer Services Inc.

Providing high end IBM enterprise solutions to datacenters.

#35 Direct Agents Inc.

A fast growing digital marketing firm.

#37 TTI of USA

Another staffing & consulting solutions firm for technology firms.

#41 Net@work

Bringing better customer service to technology consulting & services industry.

#44 Insys Group

Technology consulting and services.

#46 Travelclick

Provides technology solutions that help hotels increase revenues.

#48 Tekserve

Apple products repair & servicing.

#49 Axispoint Inc.

Business technology services and solutions.

#50 Paradysz Inc.

Email & social media marketing company

Social Scoring with Klout, Kred & PeerIndex

Klout, Kred or PeerIndex… What’s with all these new social scoring sites? I wanted to know the answer, so I picked up a copy of Return on Influence.

Schaefer’s book is not intimidating, it’s a short 180 pages plus a short appendix with a primer on all things social media.

Upon first opening the book, I wanted to get right to the meaty topic first, so I flipped straight to chapter 10 – how to increase your Klout score. I mean that’s really what we want to know right?

It lists three steps:

1. build your network of relevant & related fans

2. build content, create, curate or conversation

3. find influencers and engage

Of course the devil’s in the details.

When I was first exposed to twitter, I would crosspost blog posts or newsletters, then sit back and wonder why no one was clicking on my links! Is it only valuable for brands to send out press releases and speak to loyal customers?

The answer takes time, with trial and error to get the hang of. How do you write punchy 140 tweets that grab people, and make them click? That’s a talent all by itself. Being on point can only get you so far if you can’t shake people up with those pithy one liners.

Mark Schaefer has done a laudable job introducing the world of social media with great page after page of good stories. What’s more the title itself is what grabbed me, flipped Return On Investment on it’s head to the more nebulous social media influence measured by scores like Klout.

Still all those chapters discussing Cialdini’s concepts of social proof and reciprocity are all fine and good, but I learned most of what I needed to know about the topic from the first page.

After big time interactive marketing exec Sam Fiorella learned about Klout the hard way, he set out to improve his score that sat around a lowly 45.

[quote]Fiorella went on a tweeting rampage to increase his Klout score by any means. He was determined that his experience at the ad agency interview would enver be repeated. He carefully studied and tried to reproduce the online behaviors of top-rated influencers. When he spoke at conferences, he made sure that every slide had a tweetable quote aimed at the Klout algorithm and asked attendees to tweet his name throughout the presentation. He engineered his online engagement to attract attention of high Klout influencers who could bend his score upward and filtered his followers by their levels of influence so that he knew which contacts to nurture to affect his score.[/quote]

And what was Fiorella able to push his score to? The Elite level of 70! Wow not bad at all. You might call it gaming the algorithm, or you could call it doing all the things that social media marketing and networking demand. I’d personally be happy to hit the 45 myself!

Marketing isn’t just for the big guys anymore, and as the ranks of freelancers and independent consultants swells, more and more will be looking to improve their influence and reach. Social scoring, like website page ranking is another one of those measures and one of growing importance.

If you read this far, you’ll definitely want to grab our newsletter!

And just for posterity we thought we’d include our Klout badge below. Follow us on twitter we don’t mind the attention!

One more thing… if you love the newsletter, check out our archives. They go back 7+ years!

Accidental DBA's Guide to MySQL Management

problem solvingSo you’ve been tasked with managing the MySQL databases in your environment, but you’re not sure where to start.  Here’s the quick & dirty guide. Oh yeah, and for those who love our stuff, take a look to your right. See that subscribe button? Grab our newsletter!

1. Installation

The “yum” tool is your friend.  If you’re using debian, you’ll use apt-get but it’s very similar. You can do a “yum list” to see what packages are available. We prefer to use the Percona distribution of MySQL.  It’s fully compatible with stock MySQL distribution, but usually a bit ahead in terms of tweak and fixes.  Also if you’re not sure, go with MySQL 5.5 for new installations.

$ rpm -Uhv http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm
$ yum install Percona-Server-client-55
$ yum install Percona-Server-shared-55
$ yum install Percona-Server-shared-compat
$ yum install Percona-Server-server-55

The last command will create a fresh database for you as well.

Already have data in an existing database? Then you can migrate between MySQL and Oracle.

2. Setup replication

MySQL replication is a process you’ll need to setup over and over again. It’s statement based in MySQL. A lot of INSERT, UPDATE, DELETE & CREATE statements are transferred to the slave database, and applied by a thread running on that box.

The steps to setup are as follows:

A. lock the primary with FLUSH TABLES WITH READ LOCK;

B. issue SHOW MASTER STATUS and note the current file & position

C. make a copy of the data. You can dump the data:

$ mysqldump -A --single-transaction > full_primary.mysql

Alternatively you can use xtrabackup to take setup replication without locking!

D. copy the dump to the slave database (scp works, but rsync is even better as it can restart if the connection dies).

E. import the dump on the slave box (overwrites everything so make sure you got your boxes straight!)

$ mysql < full_primary.mysql

F. point to the master

mysql> change master to
> master_user='rep',
> master_password='rep',
> master_host='10.20.30.40',
> master_log_file='bin-log.001122',
> master_log_pos=11995533;

G. start replication & check

mysql> start slave;
mysql> show slave statusG;

You should see something like this:

Slave_IO_Running: Yes
Slave_SQL_Running: Yes

3. Analyze slow query & tune

If you’re managing an existing MySQL database and you hit a performance blip, it’s likely due to something that has changed. You may be getting a spike in user traffic, that’s new! Or you may have some new application code that has been recently deployed, that’s new SQL that’s running in your database. What to do?

If you haven’t already, enable the slow query log:

mysql> set global slow_query_log=1;
mysql> set global long_query_time=0.50;

Now wait a while. A few hours perhaps, or a few days. The file should default to

/var/lib/mysql/server-slow.log

Now analyze it. You’ll use a tool from the percona toolkit to do that. If you haven’t already done so, install the percona toolkit as well.

$ yum install percona-toolkit
$ pt-query-digest /var/lib/mysql/server-slow.log > /tmp/server-report.txt

Once you’ve done that “less” the file, and review. You’ll likely see the top five queries account for 75% of the output. That’s good news because it means less query tuning. Concentrate on those five and you’ll get the most bang for your buck.

Bounce your opinions about the queries off of the developers who build application code. Ask them where the code originates. What are those pages doing?  Check the tables, are there missing indexes? Look at the EXPLAIN output. Consider tuning the table data structures, multi-column or covering indexes. There is typically a lot that can improve these troublesome queries.

4. Monitoring command line tools

You’ll want to have a battery of day-to-day tools at your disposal for interactive monitoring of the database.  Don’t go overboard. Obsessive tuning means obsessively turning knobs and dials. If there are no problems, you’re likely to create some.  So keep that in mind.

innotop is a “top” like utility for monitoring what’s happening inside your little database universe.  It’s probably already available through yum and the “epel” repository:

$ yum install innotop

First edit the .my.cnf file and add:
[client]
user=root
password=mypw

From there you should be able to just fire up innotop without problems.

mysqltuner is a catch all tool that does a once over of your server, and gives you some nice feedback.  Get a copy as follows:

$ wget mysqltuner.pl

Then run it:
$ chmod +x mysqltuner.pl
$ ./mysqltuner.pl

Here are a couple of useful mysql shell commands to get database information:

mysql> show processlist;
mysql> show innodb status;
mysql> show status;

There is also one last tool which can come in handy for reviewing a new MySQL server. Also from percona toolkit, the summary tool. Run it as follows:

$ pt-summary

5. Backups

You absolutely need to know about backups if you want to sleep at night. Hardware and database servers fail, software has bugs that bite. And if all that doesn’t get you, people make mistakes. So-called operator error will surely get you at some point. There are three main types:

A. cold backups

With the database shutdown, make a complete copy of the /var/lib/mysql directory, along with perhaps the /etc/my.cnf file. That together amounts to a cold backup of your database.

B. hot backups

There has been an enterprise tool for MySQL that provides this for some time. But we’re all very lucky to also have the open source Percona xtrabackup at our disposal. Here’s a howto using it for replication setup.

C. logical backups

These will generate a file containing all the CREATE statements to recreate all your objects, and then INSERT statements to add data.

$ mysqldump -A > my_database_dump.mysql

6. Review existing servers

The percona toolkit summary tool is a great place to start.

$ pt-summary

Want to compare the my.cnf files of two different servers?

$ pt-config-diff h=localhost h=10.20.30.40

Of course you’ll want to review the my.cnf file overall. Be sure you have looked at these variables:

tmp_table_size
max_head_table_size
default_storage_engine
read_buffer_size
read_rnd_buffer_size
sort_buffer_size
join_buffer_size
log_slow_queries
log_bin
innodb_log_buffer_size
innodb_log_file_size
innodb_buffer_pool_size
key_buffer_size (for MyISAM)
query_cache_size
max_packet_size
max_connections
table_cache
thread_cache_size
thread_concurrency

7. Security essentials

The output of the pt-summary and mysqltuner.pl scripts should give you some useful information here. Be sure to have passwords set on all accounts. Use fewer privileges by default, and only add additional ones to accounts as necessary.

You can use wildcards for the IP address but try to be as specific as possible. Allow for a subnet, not the whole internet ‘10.20.30.%’ for example instead of just ‘%’.

Also keep in mind that at the operating system or command line level, anyone with root access can really mess up your database. Writing to the wrong datafile or changing permissions can hose a running database very quickly.

8. Monitoring

Use a monitoring system such as Nagios to keep eye on things.  At minimum check for:

A. connect to db
B. server load average
C. disk partitions have free space
D. replication running – see above IO & SQL running status messages
E. no swapping – plenty of free memory

9. Ongoing maintenance

Periodically it’s a good idea to review your systems even when they’re running smoothly. Don’t go overboard with this however. As they say if it ain’t broke, don’t fix it.

A. check for unused & duplicate indexes
B. check for table fragmentation
C. perform table checks (if using MyISAM)

10. Manage the surprises

MySQL is full of surprises. In the Oracle world you might be surprised at how arcane some things are to setup, or how much babysitting they require. Or you might be surprised at how obscure some tuning & troubleshooting techniques are. In the MySQL world there are big surprises too. Albeit sometimes of a different sort.

A. replication checksums

One that continues to defy my expectations are those surrounding replication. Even if it is running without error, you still have more checking today. Unfortunately many DBAs don’t even know this!  That’s because MySQL replication can drift out of sync without error. We go into specific details of what things can cause this, but more importantly how to check and prevent it, by bulletproofing MySQL with table checksums.

B. test & confirm restores of backups

Spinup a cloud server in Amazon EC2, and restore your logical dump or hotbackup onto that box. Point a test application at that database and verify that all is well. It may seem obvious that a backup will do all this. But besides the trouble when a filesystem fills up, or some command had the wrong flag or option included. There can be even bigger problems if some piece or section of the database was simply overlooked.  It’s surprising how easy it is to run into this trouble. Testing also gives you a sense of what restore time looks like in the real world. A bit of information your boss is sure to appreciate.

If you made it this far, you know you want to grab the newsletter.

The Art of Resistance

Sometimes, you have to be the bad guy. Be resistant to change. Here’s a story about how stubbornness pays off. As we’ve written about before A 4 letter word divides Dev & Ops.

I had one experience working as the primary MySQL DBA for an internet startup. Turns out they had Oracle for some applications too. And another DBA just to handle the Oracle stuff.

So it came time for Oracle guy to go on vacation. Suddenly these Oracle systems landed on my shoulders. We reviewed everything in advance, then he bid his goodbyes.

Almost as soon as he was out the door I started getting requests to change things.
“Oh we have to add this field”, or “oh we’d like to make this table change”.

I resisted enough to hold off development for a week.

“We’re now getting more heat from the CTO. Apparently certain pages on the site don’t save some very important content properly. It’s costing the business a lot of money. We need to make this change.”

My response –

[quote]No I can’t sanction this. If you want to do it, understand that it could well break Oracle’s replication.[/quote]

Oracle’s multi-master replication notoriously requires a lot of baby sitting. We held off for a few more days, and the Oracle DBA returned from his much needed vacation.

When discussing it afterward he said…

[quote]Am very glad you didn’t change those fields in the database. It would indeed have broken replication and caused problems with the backups![/quote]

Moral of the story…

  • apply the brakes around tight turns
  • don’t be afraid to say, we’re not going to do this before testing
  • you may have to say – we’re not going to do this at all…