What's That Noise?! [Ian Kallen's Weblog]


Monday, September 20, 2004

The Optimal Time to Optimize in Operations

One of my recent hassles was a hardware migration that needed to proceed quickly. The clock was ticking down on the disk capacity of some key database hosts. Now suppose one of the sysadmins wanted to perform "preventive fsck's" and "table consistency checks" -- when you're dealing with over 100 GB (closer to 200 GB, actually) of data, these are not quick propositions. In fact, they might take days. Ergo, just not feasible. Given the time, would it be optimal to sanity check every subsystem's functionality? Perhaps. But when struggling to beat the clock, you just gotta say, "Not now, Poncho!" Sometimes the only effective action is fast action.

First of all, the only times I've ever needed to run reiserfsck have been after a cold power loss (and ReiserFS is usually fine even after one of those). So the fact that this sysadmin wanted to run reiserfsck "preemptively" made even less sense. As for a table consistency check, with InnoDB this is never needed on an anticipatory basis. In my experience, InnoDB is either able to keep itself consistent with its own journaling or it's just hosed... not a lot of grey in between. Again, the only exception has been in cases of a cold power loss. Sure, sometimes other hardware problems, such as low-level disk defects, will manifest themselves as problems with the filesystem or a database's data file. But usually there are other indicators as well (kernel complaints in syslog, etc). And even with the whole dependency stack accounted for and checked, there's no guarantee against failure.

Sometimes the optimal course is just the fastest one between where you are and where you need to be. Choosing the deliberate and cautious route, dwelling on unnecessary optimizations, may in fact be the slow and steady road to... failure! In this case, if we'd followed the course of doing every unnecessary system check possible, we'd have run out of disk space and crashed these particular databases.
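For the back-of-the-envelope version of that trade-off, here's a minimal Python sketch. The function names and every number in it are made up for illustration; none of them are measurements from this migration. It just asks whether a full check of the data would finish before the disk fills, assuming steady growth and a steady check throughput.

```python
def hours_until_full(free_gb: float, growth_gb_per_hour: float) -> float:
    """Hours left before the disk fills, assuming a steady growth rate."""
    if growth_gb_per_hour <= 0:
        return float("inf")  # not growing, so the clock isn't ticking
    return free_gb / growth_gb_per_hour


def check_fits(free_gb: float, growth_gb_per_hour: float,
               data_gb: float, check_gb_per_hour: float) -> bool:
    """True if a full check of data_gb would finish before the disk fills."""
    check_hours = data_gb / check_gb_per_hour
    return check_hours < hours_until_full(free_gb, growth_gb_per_hour)


# Hypothetical numbers: 10 GB free, growing 0.25 GB/hour (40 hours left),
# 200 GB of data checked at 4 GB/hour (50 hours of checking).
print(check_fits(free_gb=10, growth_gb_per_hour=0.25,
                 data_gb=200, check_gb_per_hour=4))
```

If that comes back false, the check is a luxury you can't afford until after the migration.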

Stop optimizing. Just shut up and get it done already.

( Sep 20 2004, 11:51:27 PM PDT )