The Myth of RAID Failure

It is often believed that the chance of two disks failing together is very small, perhaps one in a million. Such a failure, the thinking goes, could only be caused by physical environmental factors such as fire or external impact. On that basis, engineers often rely heavily on a RAID 5 server to protect mission-critical data, in the belief that such a system will never fail completely, since RAID 5 can withstand a single disk failure without losing data. Ironically, things may not turn out this way.

The fact of the matter is that the disks in an array are subject to the same working conditions and operating environment, and probably come from the same manufacturing batch. When one disk fails, others are likely to follow suit quickly, often within a few hours or days. Before the engineers can replace the faulty disk and attempt a rebuild, other disks may suddenly fail, frequently one after the other, resulting in complete disaster.

Another point to consider is that during a RAID rebuild, the remaining disks are subject to very intensive read operations. This will hasten the failure of any remaining disks that are nearing the end of their life.
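To see why a rebuild is so read-intensive, recall that RAID 5 protects each stripe with XOR parity, so a lost block can only be recomputed by reading the matching block from every surviving disk. The following is a minimal sketch of one stripe, not a model of any particular controller; the 4-disk layout, the block contents and the helper name xor_blocks are assumptions made for illustration, and real arrays also rotate the parity block across disks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-disk RAID 5 array: three data blocks plus parity.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)
stripe = data_blocks + [parity]

# Simulate losing disk 1, then rebuild its block from ALL surviving blocks.
failed = 1
survivors = [block for i, block in enumerate(stripe) if i != failed]
rebuilt = xor_blocks(survivors)

# Every surviving disk had to be read to recover this single block.
blocks_read = len(survivors)
print(rebuilt == stripe[failed], blocks_read)
```

The same work is repeated for every stripe on the volume, which means each surviving disk is read from end to end during the rebuild.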

Very often, the engineer may not perform the rebuild process correctly. This happens when the wrong disk is replaced with a new disk. The failed disk, still in semi-working condition, may then take part in the rebuild process, resulting in irreversible data corruption: the infamous botched rebuild.

This may also happen when a number of disks are taken out of the RAID box for inspection and are not put back in their original positions before a rebuild is started.

It is common for a RAID 5 server to be configured with a high number of disks (more than 5). This is not really advisable: the greater the number of disks, the greater the chance of multiple disk failures happening one after the other, resulting in loss of the volume.
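A rough calculation shows how the risk grows with the number of disks. The sketch below assumes each surviving disk fails independently with the same probability during the rebuild window; the function name p_second_failure and the 2% figure are purely illustrative assumptions, not measured values. Since disks in one array share conditions and often a manufacturing batch, independence is an optimistic assumption and the real risk is likely to be higher.

```python
def p_second_failure(n_disks, p_disk):
    """Chance that at least one of the n_disks - 1 survivors fails before
    the rebuild completes, assuming independent and identical disks."""
    survivors = n_disks - 1
    return 1.0 - (1.0 - p_disk) ** survivors

# Assumed, illustrative per-disk failure probability during one rebuild window.
P_DISK = 0.02

for n in (3, 5, 8, 12):
    print(f"{n:2d} disks: {p_second_failure(n, P_DISK):.1%} chance of losing the volume")
```

Whatever per-disk figure is plugged in, the result rises with every disk added to the array, because a RAID 5 volume is lost as soon as any one of the survivors fails.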

So you need to know that there is a high chance that your RAID volume will fail! If you do not have a sound backup plan because you depend on RAID as the "data storage cum backup", then when disaster eventually strikes, you can only hope to recover the data from the failed RAID volume.

A failed RAID server must be prepared for data recovery correctly, to avoid further damage to the data.
