Misunderstanding Disaster Recovery Can Be Disastrous

A friend of mine sent me an article from Lifehacker about the JournalSpace website being wiped out at the beginning of January 2009.  Jeff Fitzpatrick’s blurb, “Hard Lessons in the Importance of Backups: JournalSpace Wiped Out”, talks briefly about what happened, and reminds us all to back up our data.

But what happened to JournalSpace is more than just a failure to back up data – it’s a perfect example of misunderstanding disaster recovery, or DR for short. Good IT people will tell you it’s not just about doing backups.  Although that’s a big part, it’s really about knowing how you will recover from a catastrophic event – the disaster – and what tools you will use to help you.  In this article, I want to talk about a few of the ways you can protect your data, and some scenarios where they are good and bad.

Backups, and Mirrors, and Syncs … OH MY!

Deciding how to back up your data – or as we say in the biz, Disaster Recovery Planning – is critical to business IT planning.  But, as JournalSpace recently found out – the hard way – there are lots of options, and understanding them and how they work together can make a huge difference in whether or not your information is protected… or a disaster waiting to happen.

To sum up what happened to JournalSpace: they had a set of mirrored drives holding their key company database – but no backup.  When someone accidentally deleted that database, they assumed the mirror would still have it… they were wrong.  So why didn’t the mirror have the data?  Isn’t that what it was for?

Unfortunately, that’s not how mirrors work. Obviously, as just about everyone has said, they should have had a backup – in addition to the mirror, not instead of it. Again, why?

The Disasters We Plan For

Before we get into why certain DR tools work the way they do, it’s important to understand the kinds of disasters – what we IT folk, and your insurance guy, call “risks” – that can affect your business. If we start to make a list of all the things – both big and small – that can cause your business to lose money, we come up with a pretty big list. Even just having your people unable to work for a few hours can be VERY expensive – lost sales, wages paid while employees sat around yakking with each other, etc.  A buddy of mine always liked to add, “A meteor strikes the data-center.”  It’s funny, but you might just as easily have said, “The building burns down” – a much more likely risk.

So, the important question is: what kinds of things can happen that would cause you to either (a) lose money, or (b) go out of business?  I’m going to give you the answer to this test, because there are only a few “umbrella” categories – what I think of as Levels – for the types of risks we normally face in DR planning.  They are:

  1. Someone loses a critical piece of information (à la JournalSpace).
  2. A key component – like a hard drive – fails in a server or system.
  3. An entire system/server fails or becomes inaccessible.
  4. A catastrophic failure, or disaster, affects an entire location… “A meteor strikes the data-center.”

It’s important to note that these aren’t ALL the risks that could affect your business data.  Most other data risks come under the heading of “Security” rather than DR – things like hacking and theft. Plus, the specific ways in which each of these can happen would be too many to list.  But covering these bases will get you to a pretty solid DR plan.

DR Building Blocks

I’m going to explain – in non-techie terms – what three key disaster recovery methods are, and what they’re good for.  I’m also going to try to explain what some of the weaknesses are in each method, how you can overcome them, and how they can work together to protect your business.  This isn’t meant to be an in-depth, or even comprehensive, technical lecture on DR, but an introductory explanation of the key tools available. Let’s start with backups.

Backups

I can’t emphasize this enough: BACK UP YOUR DATA! In fact, if you can only use one of the three methods I describe in this post, make it a backup.

The concept here is simple: A backup is just a copy of your data that you keep somewhere else.  It’s no different from making a photocopy of an important document and keeping it in a file vault.  With data, we do the same thing – we copy it off of our servers or workstations, and keep it somewhere else for safe-keeping.

Backup is really the backbone of any DR plan, and can be critical in every risk category.  Backups are also the first place to go when somebody deletes something important – a Level 1 risk.
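For the curious, here is a minimal Python sketch of the "copy kept somewhere else" idea.  The function name back_up, the folder names, and the file contents are all my own assumptions for illustration, not how any particular backup product works; a real backup would go to a separate disk or an off-site location rather than a neighboring folder.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def back_up(source: Path, backup_root: Path) -> Path:
    """Copy the whole `source` folder into a new timestamped folder under `backup_root`."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    destination = backup_root / f"{source.name}-{stamp}"
    shutil.copytree(source, destination)
    return destination


def demo() -> str:
    """Back up a file, delete the original, then read it back from the backup."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        vault = Path(tmp) / "vault"  # stands in for "somewhere else"
        work.mkdir()
        vault.mkdir()
        proposal = work / "proposal.txt"
        proposal.write_text("the huge proposal")

        copy = back_up(work, vault)
        proposal.unlink()  # the Level 1 risk: someone deletes it

        # The original is gone, but the backup copy is untouched.
        return (copy / "proposal.txt").read_text()
```

The point of the timestamped folder is that each run makes a fresh copy instead of overwriting the last one, so an accidental delete today does not erase yesterday's good copy.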

In server environments – like your office – backup has usually meant Tape Backup.  Just like it sounds, you copy your data off the server’s hard disk, and onto some type of digital tape.  Then, if you’re smart, you take the tape somewhere safe – away from the servers it copies.  Why use tape?  Well, for starters, tapes are relatively cheap.  In fact, for most of the history of computer networks, they have been the CHEAPEST storage media.  The problem with tape is that it’s S L O W… I mean REALLY SLOW by any modern standard.  And with the amount of data we keep in the average business environment right now, tape is becoming less and less the go-to backup solution.

The good news is, disk-based solutions are getting cheaper and cheaper.  They’re fast, reliable, and now competitive with tape in terms of cost to implement.  The speed isn’t just for backup – it translates directly to recovery speed, as well (i.e. how quickly you can get the data you need FROM the backup).  Plus, with removable hard drives, you can back up to disk, and still take your data off-site.  Most modern data-centers use disk-based backup in their DR operations, and I recommend it for most small businesses, too.

So why would I ever use tape?  Cost. Tape is still king when it comes to cheap, long-term, data storage.  While disk-based backup drives are often cheaper than tape drives, the media – the disks themselves – are usually at least 5x more expensive.  Modern data-centers often use disk-based backup for speed, and then copy the disk-based backup to tape for long-term, off-site storage – a strategy called Disk-to-Disk-to-Tape.

No matter how simple or elaborate your backup strategy is, just do it.  Back up, and back up often!  You’ll thank me when that huge proposal you finished yesterday gets deleted today… but your automated backup caught it overnight.

Mirrors & RAID

No cockroaches here – RAID stands for “Redundant Array of Inexpensive Disks”.  It’s a complex technology that allows multiple cheap disks to be used together.  There are lots of RAID strategies, but using two disks that are exact copies of each other in the same system – what we call a Mirror – is one of the simplest and cheapest ways to protect your data.  I know you’re thinking, “JournalSpace had a Mirror, and it didn’t help them!”  But remember our risk list?  JournalSpace had a Level 1 risk, and as I mentioned earlier, they misunderstood how mirrors work.  That’s because all forms of RAID – mirrors included – are strictly for protecting against Level 2 risks!

A Mirror works like this:

  1. Data is written to or deleted from a disk.
  2. The same action is performed on the same piece of data – in fact, the same location on disk – for its mirror.
  3. If either disk stops working – fails – the other drive can work by itself, and the system keeps running.
  4. When the failed drive is replaced, the data from the good drive is mirrored to the new drive to rebuild the mirror set.

As an added bonus, any data needed can be read from either disk – so reading information, like program files, can be faster.  Notice that deletions are mirrored, too!  This is what JournalSpace didn’t understand: when the data was deleted, they thought it would still be on one of the drives in the mirror.  Instead, the data was deleted from both.
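If it helps to see the mirror steps in action, here is a toy Python model.  It is only a sketch: the Mirror class and its method names are made up for this post, and each "disk" is just a dictionary, which is nothing like how a real RAID controller works internally.  What it does show is the behavior that matters: a failed disk is survivable, but a delete hits both copies.

```python
class Mirror:
    """Toy two-disk mirror; each disk is a dict of name -> data (hypothetical model)."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, name, data):
        # Every write goes to every working disk.
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def delete(self, name):
        # Every delete ALSO goes to every working disk.
        for disk in self.disks:
            if disk is not None:
                disk.pop(name, None)

    def fail(self, index):
        # Simulate a dead drive.
        self.disks[index] = None

    def read(self, name):
        # Any working disk can answer a read.
        for disk in self.disks:
            if disk is not None:
                return disk.get(name)
        raise RuntimeError("both disks have failed")

    def replace(self, index):
        # Rebuild the new drive from the surviving one.
        survivor = next(d for d in self.disks if d is not None)
        self.disks[index] = dict(survivor)


mirror = Mirror()
mirror.write("database", "customer records")

mirror.fail(0)                      # Level 2 risk: a drive dies
survived = mirror.read("database")  # still readable from the other drive
mirror.replace(0)                   # new drive is rebuilt from the good one

mirror.delete("database")           # Level 1 risk: someone deletes it
after_delete = [disk.get("database") for disk in mirror.disks]
```

After the delete, neither disk has the database any more, which is exactly the JournalSpace situation.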

So why use RAID?  Insurance. It’s simple, really.  If you’ve got good backups, you can recover from a failed hard drive.  But what if you don’t have a new one handy to put in place of the bad one?  And, even if you do, it can take HOURS to fully recover a downed system.  Now, if you’re a small business – that’s under 500 people, per U.S. Government standards – and it’s your only server, or your only server for that particular need (think email, or order processing), start adding up how much it will cost to pay your employees to sit around doing nothing while you recover your failed server.  For the cost of a RAID card, and an extra hard drive or two, if a hard drive crashes, the server keeps running! No lost time, no lost productivity.
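To put a rough number on "sitting around doing nothing", here is a back-of-the-envelope calculation in Python.  Every figure in it (20 employees, $25 an hour, 4 hours of downtime) is a hypothetical I picked for illustration, and it only counts idle wages, not lost sales.

```python
def idle_wage_cost(employees: int, hourly_wage: float, hours_down: float) -> float:
    """Wages paid to people who cannot work while the server is down."""
    return employees * hourly_wage * hours_down


# Hypothetical example: 20 people at $25/hour, server down for 4 hours.
example = idle_wage_cost(employees=20, hourly_wage=25.0, hours_down=4)
```

With those made-up numbers, one afternoon of downtime costs 20 x 25 x 4 = $2,000 in wages alone.  Plug in your own figures and compare the result to the price of a RAID card and a spare drive.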

Use RAID in your server(s), and a good automated backup, and you’ve got great protection for DR Risk Levels 1 through 3.

Replication & Synchronization

OK, so you’ve got a good backup, and you’re running a Mirror RAID set (or better), but you’ve heard about some more high-end stuff you can do to protect your data, right?  You’ve heard some cool terms like “Hot Site” and “Remote Backup”.  There are lots of other things you can do to help protect your data, but the more advanced things boil down to some variation of Replication or Synchronization.  Replication (one-way) and Synchronization (two-way) are advanced ideas, but when it comes to DR, they’re important to understand.  The basic idea is really pretty simple, and it answers the question: How do we get our data someplace else but keep it exactly like it is here?

Replication is a one-way transfer of information.  Basically, it’s just like a Mirror – only you’re not copying to another disk, you’re copying to a completely different system.  When you save data, that data is copied over to another system or storage location as quickly as digitally possible.  Remote Backup solutions replicate your data over to a server that’s somewhere other than your office or data-center – a remote location.  A “Hot Site” replicates your entire environment to a remote location that can be brought online quickly if a meteor hits your main office.  Synchronization is just like Replication, only it works in both directions – so any change made to either the local data or the remote data is applied to the other location.
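Here is a small Python sketch of the difference between one-way and two-way.  The Site class, the peers list, and the file names are all invented for this example, and real replication products deal with networks, conflicts, and timing that this toy ignores completely.

```python
class Site:
    """Toy data location; `peers` are the sites that receive this site's changes."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []

    def write(self, filename, contents):
        # Apply the change here, then push it to every peer.
        self.data[filename] = contents
        for peer in self.peers:
            peer.data[filename] = contents


office = Site("office")
remote = Site("remote")

# Replication: changes flow one way, office -> remote.
office.peers.append(remote)
office.write("orders.db", "v1")
remote.write("notes.txt", "remote only")  # never reaches the office
office_after_replication = dict(office.data)

# Synchronization: add the other direction, remote -> office.
remote.peers.append(office)
remote.write("notes.txt", "edited remotely")
office_after_sync = dict(office.data)
```

With only the one-way link, the office never sees notes.txt.  Once the second link is added, a change made at the remote site shows up at the office too.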

How can this help?  Well, Replication & Synchronization have wide and varied uses.  Like backups, if configured properly, they can assist with all risk levels.  Replicating databases can improve performance for certain systems, and help keep critical systems running.  Replicating only certain kinds of changes – like file creation and edits, but not file deletes – can help in recovering lost information.  And, in many cases, use of Replication or Synchronization as a DR tool allows for faster recovery than Backups do.  It’s less important that you know all the different types of Replication and Synchronization tools out there than that you get the basic concept behind what they’re doing with your data.

The downside, of course, is that just like with a mirrored drive, if bad data gets written (or good data erased), that change can get replicated – unless you’ve taken special steps to prevent it.  These are advanced, and very powerful, tools, but VERY easy to mess up.
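As a sketch of what a "special step" might look like, the Python below replays a list of changes onto a replica, with a switch that skips deletes.  The function name replicate_changes and the propagate_deletes switch are hypothetical names of mine, not settings from any real replication tool.

```python
def replicate_changes(changes, replica, propagate_deletes=True):
    """Replay (operation, name, data) changes onto `replica`, a dict of name -> data."""
    for operation, name, data in changes:
        if operation == "write":
            replica[name] = data
        elif operation == "delete" and propagate_deletes:
            replica.pop(name, None)
    return replica


accident = [
    ("write", "database", "customer records"),
    ("delete", "database", None),  # someone deletes the database
]
plain = replicate_changes(accident, {})
guarded = replicate_changes(accident, {}, propagate_deletes=False)

bad_write = [
    ("write", "database", "customer records"),
    ("write", "database", "garbage"),  # bad data overwrites good data
]
still_exposed = replicate_changes(bad_write, {}, propagate_deletes=False)
```

The plain replica ends up empty, while the guarded one keeps the customer records.  The last case is the catch: skipping deletes does nothing about bad data being written over good data, which is one more reason to keep real backups alongside replication.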

Parting Thoughts

The information here should give you a good foundation for basic DR.  If your budget allows, you should take advantage of all of these tools, but remember what I said at the beginning: If you can only use one of the three methods I describe in this post, make it a backup!
