StableBit DrivePool – Data Consistency

More info / download: See the wiki

Continuing on from my last post about StableBit DrivePool BETA M2, a disk pooling application for the Windows Home Server 2011, let’s talk about data consistency.

We all know that hard disks and SSDs go bad; it’s just a matter of time. DrivePool BETA M2 has built-in mechanisms to detect read errors, write errors and data duplication errors, and it offers a fix in the form of a wizard for each condition.

In addition to fault detection and resolution, DrivePool’s folder duplication engine is designed to keep files in duplicated folders safe from a single hard drive failure, in real-time. In many cases you can continue reading or writing to a file even in the face of drive failure.

Read Errors

(pictured above)

Whenever a read error occurs on one of the drives in the pool, DrivePool remembers which file was being read and immediately marks that drive as Unhealthy. If the file that had the read error is a duplicated file, then DrivePool will attempt to read from the duplicated part, in order to satisfy the original read request.

If the file is not duplicated, or if both parts of a duplicated file fail the read, then and only then is the error returned to the caller.

This means that files in duplicated shares are resilient to read errors.
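
To make the fallback order concrete, here is a minimal sketch in Python. It is only an illustration of the behavior described above, not DrivePool's actual code: the Drive class, pooled_read and the error_log list are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    """Hypothetical stand-in for a pooled disk (not a DrivePool type)."""
    name: str
    files: dict = field(default_factory=dict)    # path -> file contents
    bad_paths: set = field(default_factory=set)  # paths that fail to read
    unhealthy: bool = False

    def read(self, path):
        if path in self.bad_paths:
            raise IOError(f"read error on {self.name}: {path}")
        return self.files[path]


def pooled_read(path, parts, error_log):
    """Read `path` from the first part that works.

    `parts` lists the drives holding a copy of the file: one drive for a
    non-duplicated file, two for a duplicated one.
    """
    if not parts:
        raise FileNotFoundError(path)
    last_error = None
    for drive in parts:
        try:
            return drive.read(path)
        except IOError as exc:
            drive.unhealthy = True                # flag the drive right away
            error_log.append((drive.name, path))  # remember the file
            last_error = exc
    # Every part failed, so only now does the caller see the error.
    raise last_error
```

With two parts, a failure on the first drive is invisible to the caller as long as the second part reads cleanly. With a single part, the same failure is returned to the caller.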

Wizard

To get rid of the Unhealthy status you will need to complete the read error wizard pictured above. It will try to read the damaged file in its entirety. If it can’t, then it won’t remove the Unhealthy status of the drive. In that case, you can try to recover the file (perhaps using the StableBit Scanner for Windows Home Server 2011 – in development) or delete the file.
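
The "read the file in its entirety" step could look something like the sketch below. This is an assumption about the general approach, not the wizard's real implementation; can_read_entirely and the chunk size are made up for the example.

```python
def can_read_entirely(stream, chunk_size=64 * 1024):
    """Return True if `stream` can be read from its current position to the end.

    `stream` is any binary file-like object. A read failure surfaces as
    OSError, which is treated here as "the file is still damaged".
    """
    try:
        while stream.read(chunk_size):
            pass  # the data is discarded; only readability matters
    except OSError:
        return False
    return True
```

In this sketch, the Unhealthy status would only be cleared when the function returns True.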

Write Errors

When DrivePool detects a write error on one of the drives in the pool, it will remember the file and mark that drive Unhealthy, just like when it encounters a read error. If the file is in a duplicated share, DrivePool will try to continue writing to the other duplicated file part and not interrupt the write process. The bad file part that couldn’t be written to is taken out of service and given a .bad extension. The .bad file will show up in normal directory listings, and it’s up to you to decide what to do with it.

Write errors on non-duplicated folders mark the drive Unhealthy and return an error to the caller of the write.
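
Here is a rough Python sketch of the two write paths described above. It is a toy in-memory model, not how DrivePool is implemented; Drive, pooled_write and take_out_of_service are hypothetical names chosen for the example.

```python
class Drive:
    """Hypothetical in-memory disk (not a DrivePool type)."""

    def __init__(self, name, fail_writes=False):
        self.name = name
        self.fail_writes = fail_writes
        self.files = {}          # path -> file contents
        self.unhealthy = False

    def write(self, path, data):
        if self.fail_writes:
            raise OSError(f"write error on {self.name}: {path}")
        self.files[path] = self.files.get(path, b"") + data

    def take_out_of_service(self, path):
        # Rename the damaged part so it is no longer used as a file part.
        if path in self.files:
            self.files[path + ".bad"] = self.files.pop(path)


def pooled_write(path, data, parts, error_log):
    """Append `data` to every part of `path`; return the parts that succeeded."""
    succeeded, failed = [], []
    for drive in parts:
        try:
            drive.write(path, data)
            succeeded.append(drive)
        except OSError:
            drive.unhealthy = True                # flag the drive
            error_log.append((drive.name, path))  # remember the file
            failed.append(drive)
    if not succeeded:
        # Non-duplicated file, or every part failed: the caller gets the error.
        raise OSError(f"write failed on every part of {path}")
    for drive in failed:
        drive.take_out_of_service(path)           # the part becomes path + ".bad"
    return succeeded
```

The key design point is that the .bad rename only happens when another part accepted the write, so the caller never notices the failure on a duplicated file.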

Limitations

Under some circumstances, a write error on a duplicated folder will not be able to take the damaged file part out of service (for example, the file may be locked or the MFT may be damaged). In this case, the write will error out to the caller. This should be rare, and under most circumstances DrivePool will be able to recover the write. If this does happen, the write error wizard will restore proper folder duplication once completed.

A file in a duplicated folder that is missing a duplicated part, like when there’s a write error, can still be written to and read from. You can delete it, but you can’t rename it. This limitation will be lifted once you complete the write error wizard.

Wizard

Same as for read errors, to clear the Unhealthy status of a disk that has experienced a write error, you will need to complete a wizard. In this case, the wizard verifies the file, makes sure that it’s duplicated properly and that the bad part is taken out of service.

Duplication Errors

Duplication errors should not happen. DrivePool always maintains two copies of every file in a duplicated share. However, if something (or someone) has manually modified files on the pool part, then duplication status can be compromised. That’s not a problem, though: DrivePool will detect this condition and offer a fix.

Whenever you open any file on a duplicated share, DrivePool checks to make sure that its duplication status is good. If there is a problem with one of the file parts, then it marks the drive Unhealthy and offers you a wizard to re-duplicate the problem file. This check does not take extra time, because DrivePool has to locate and open both file parts anyway, so it’s not expensive in terms of overhead.

Wizard

Run the duplication errors wizard to check all the files on the problem drive. Even though DrivePool might have detected a duplication error on one or more files, it will re-check the entire drive just to make sure that all the other files on it are duplicated properly.
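
As a simplified illustration of the full-drive re-check, the sketch below treats a file as "duplicated properly" when exactly two drives hold a part of it. That definition is an assumption made for this example (the real check may look at more than the part count), and part_count and recheck_drive are hypothetical names.

```python
def part_count(path, drives):
    """Number of drives that hold a part of `path`."""
    return sum(1 for files in drives.values() if path in files)


def recheck_drive(problem_drive, drives, copies=2):
    """List every file on `problem_drive` that does not have `copies` parts.

    `drives` maps a drive name to the set of file paths stored on it.
    """
    return sorted(
        path for path in drives[problem_drive]
        if part_count(path, drives) != copies
    )


# Toy pool: "z" exists only on drive A, so it needs to be re-duplicated.
pool = {"A": {"x", "y", "z"}, "B": {"x"}, "C": {"y"}}
problems = recheck_drive("A", pool)
```

Note that the scan covers every file on the flagged drive, not just the file that triggered the error.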

Summary

StableBit DrivePool is resilient to read and write errors on duplicated folders. It tries very hard to satisfy the request without returning an error. It flags the appropriate drive and file on each error, so that you are aware of the problem as soon as it happens. DrivePool has wizards designed to recover from each type of error, even duplication errors.

As we’ve seen in the last post, DrivePool also keeps track of known and foreign disks. It makes sure that each file is duplicated on each foreign disk before lifting the Unhealthy status from it.

It is a clearly defined system, working to get your pool back on its feet when things go wrong.