Leaving a trail of bread crumbs

A few weeks ago I upgraded the hardware that my most critical SQL cluster lives on. Each node now has 12 cores and 72GB of RAM, and I did this without an OS rebuild and with little notice to my customers. I planned for just enough downtime to reboot each node 3 times.

Sounds a little bit foolhardy, doesn’t it?

Normally I would not be so brazen, and I would never put a production system at risk with untested hardware upgrades. Certainly not in the first week of December, which happens to be the busiest month of the year for the business. So what on earth would possess me to risk stability just to get a bit more processing power and memory? I went with it because my recovery plan was simple, well tested, and quick.

You see, this cluster happens to run on an Egenera Bladeframe system. I knew that, because the cluster was on the Bladeframe, migrating it from one set of blades to another truly is as simple as a reboot and, in some cases (like this one), a driver installation. This is not going to be a post that sings the praises of Egenera, although if you are interested I can wax poetic about their awesomeness all day long. Rather, this is about your recovery plans, and why spending a bit of time thinking about how to revert to your ‘pre-modification’ state is a good thing.

As I’ve mentioned previously, we make frequent changes to our ERP system, and that system is highly integrated with other business systems. This makes recovery planning difficult, and at times the only way I can see to get back to our ‘pre-change’ state is to perform a full restore of the impacted databases. This isn’t a happy thought: my best restore time to date is 3 hours, and that’s 3 hours during which we can’t take orders from our customers. At other times my recovery plan is as simple as creating a copy of a table and saving it in a staging DB to be put back in a hurry if needed. Regardless of the complexity or simplicity of the plan, I make sure it is written down and tested, and I communicate it to the business so that they know how long it will take to revert should we need to. I also make sure that I know where my point of no return is in any upgrade, and that my recovery process is tested, tested and tested again.
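
For the simplest plan mentioned above (copy a table to a staging DB, put it back if needed), here is a minimal sketch of the idea. It is not the process I actually run: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the `orders` table, the `staging` database alias, the `_backup` suffix, and both helper functions are hypothetical names made up for illustration. On SQL Server the copy step would more likely be a `SELECT ... INTO` against a staging database.

```python
import sqlite3

def snapshot_table(conn, table, staging="staging"):
    """Copy `table` into the attached staging database before a change.

    Note: CREATE TABLE ... AS SELECT copies rows only, not indexes or
    constraints, so this protects the data rather than the schema.
    """
    backup = f"{staging}.{table}_backup"
    conn.execute(f"DROP TABLE IF EXISTS {backup}")
    conn.execute(f"CREATE TABLE {backup} AS SELECT * FROM main.{table}")
    conn.commit()

def restore_table(conn, table, staging="staging"):
    """Put the saved rows back, replacing whatever the change left behind."""
    backup = f"{staging}.{table}_backup"
    with conn:  # one transaction: both statements apply, or neither does
        conn.execute(f"DELETE FROM main.{table}")
        conn.execute(f"INSERT INTO main.{table} SELECT * FROM {backup}")

# Hypothetical demo data: an in-memory "production" DB plus a staging DB.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS staging")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "open"), (2, "shipped")])
conn.commit()

snapshot_table(conn, "orders")                       # before the change
conn.execute("UPDATE orders SET status = 'broken'")  # the change goes wrong
conn.commit()
restore_table(conn, "orders")                        # back to 'pre-change'

rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
print(rows)
```

The restore runs inside a single transaction so a failure halfway through does not leave the table empty. Whatever form your version takes, rehearse the restore step before the change, not during the outage.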

Categories: SQL
  1. January 4, 2011 at 9:15 am

    I would actually love to read more about how you use Egenera, if you’re interested in writing it! Great post.

  2. January 4, 2011 at 9:16 am

    That’s a good point! We certainly need to work on our back-out/recovery plans. They almost always come down to “restore from backup” which, while simple, is slow and agonizing. I’m a process person by temperament and find that process to be less than ideal. Here’s hoping to better process in 2011!

  3. January 4, 2011 at 9:23 am

    I recall @Kendra_Little making a point of having a tried and true recovery strategy during her SQL Sat Iowa presentation on Agile development.
    Seriously folks, you’re only as good as your recovery process!
