StableBit DrivePool – Pool Consistency

Download: See the wiki

Managing a pool of disks that holds all of your data raises some interesting questions regarding pool consistency. Namely, what happens if one of the disks goes missing? What happens if you need to re-install the OS? What happens to your data if one of the disks starts going bad, and how does folder duplication protect your files in that case? These are all very good questions, and I think the answers have to be very clear, because this is a different system and it’s not obvious how it would behave in case of failures. After all, it’s not enough to say that there are 2 copies of every file for redundancy if you can’t read or write that file when a failure occurs. Then the redundancy would be kind of useless.

In this post I will talk about what DrivePool BETA M2 brings to the table to ensure pool consistency.

Design mantra

Before going into specifics, I’d just like to say that you don’t need to know any of this to use DrivePool properly. DrivePool will detect special conditions that can compromise your data (missing disks, damaged disks, etc.), issue a Windows Home Server alert, and flag the disk as Unhealthy or Missing in the Dashboard. It will also offer you a way to fix the problem by running a Wizard.

The idea is, if everything is ok then all the pooled disks will be tagged as Healthy. But if one of the disks has some sort of problem, its status will change. For example, when you accidentally unplug a pooled disk from the system, it will be flagged as Missing.

Generally, DrivePool will not automatically correct problems in the background without informing you, unless it is absolutely necessary to do it in real-time.

Missing disks

So you’ve dropped a disk by accident while it was running. After the initial shock wears off (pun), the thought occurs: what if you now can’t access your other data on the pool because one of the disks is dead?

Rest assured, not with DrivePool. Your pool remains accessible for read and write access across all folders, duplicated and not duplicated, even in the face of missing disks. However, there are some limitations in place to ensure data consistency.

  1. If you don’t have enough physical disks / disk space left to store duplicated files, write access will be denied to duplicated folders.
  2. Files with duplicated parts that were left on the missing disk cannot be moved or renamed until the missing disk is either re-connected or removed permanently.
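
To make these two rules concrete, here is a minimal Python sketch. It is not DrivePool’s actual code. The Disk class, its fields and both function names are hypothetical, and the free-space check is a simplifying assumption about what "enough disk space" means.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    # Hypothetical model of a pooled disk; not a DrivePool data structure.
    name: str
    present: bool      # False when the disk is flagged as Missing
    free_bytes: int


def can_write_duplicated(disks, file_size, copies=2):
    """Rule 1: a duplicated write needs `copies` present disks with room."""
    usable = [d for d in disks if d.present and d.free_bytes >= file_size]
    return len(usable) >= copies


def can_move_or_rename(file_disks, disks):
    """Rule 2: block move/rename while any copy sits on a missing disk."""
    present = {d.name for d in disks if d.present}
    return all(name in present for name in file_disks)
```

With three pooled disks where one is missing and one is nearly full, a large duplicated write would be refused, while files whose copies all live on present disks could still be moved or renamed.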

In addition, DrivePool BETA M2 introduces missing disk management.

If it ever becomes impossible for you to reconnect a missing disk, you have the option of removing it from the pool permanently. This does 2 things. First, DrivePool re-duplicates any duplicated files that were on that disk. Second, DrivePool forgets it has ever seen this disk.

Un-duplicated files that were on that disk are of course lost.

The time it takes to complete this wizard depends on the number of duplicated files that were on the missing disk. It’s worth noting that DrivePool doesn’t keep a database (using SQLite or some such engine) to track the duplicated files on each disk. It’s one less thing to go wrong, and it means that you can’t lose duplication status because of database corruption.
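
One way to picture working without a database is to derive duplication status from the disks themselves. The Python sketch below is only an illustration under stated assumptions, not how DrivePool is implemented. It assumes that copies of a file share the same relative path on each pooled disk and that every file listed belongs to a duplicated folder. All function names are hypothetical.

```python
import os
from collections import defaultdict


def list_files(disk_root):
    """Return the set of pool-relative file paths found under one disk."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(disk_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, disk_root))
    return found


def copies_per_file(listings):
    """Map each relative path to the sorted names of disks holding a copy.

    `listings` maps a disk name to the set of relative paths on that disk,
    for example the result of calling list_files() on each disk.
    """
    locations = defaultdict(list)
    for disk_name in sorted(listings):
        for rel_path in listings[disk_name]:
            locations[rel_path].append(disk_name)
    return dict(locations)


def needs_reduplication(listings, wanted_copies=2):
    """Files with fewer copies than wanted, e.g. after a disk was removed."""
    counts = copies_per_file(listings)
    return sorted(p for p, disks in counts.items() if len(disks) < wanted_copies)
```

Because the answer is recomputed from what is actually on the remaining disks, there is no separate record that could fall out of sync with them. The cost is that the work grows with the number of files that have to be examined.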

Accessing the pool after an OS re-install or on a new system

First, let’s consider the best case scenario, a normal hard drive with a basic NTFS volume on it that is not part of any pool.

What happens to your data files on this hard drive if you ever re-install the OS? Obviously, they are never touched, so you would have no trouble accessing them after an OS re-install or from a different system.

I think this is a worthy goal for DrivePool to try to attain. The key here is of course the fact that the OS does not need to know anything extra about the hard drive in order to access the files on it. Specifically, no external information in the registry or anywhere else is required for your OS to read a standard NTFS basic volume.

From the beginning DrivePool was designed with this idea in mind. No required external metadata in order to access the pool. This is still true with BETA M2.

This means that you can take all of your pooled drives, disconnect them from one server with DrivePool, connect them to another server running DrivePool and voila. DrivePool will automatically detect those drives as pooled drives and include them in the pool. Instant access to your files, it just works. This is of course in addition to the fact that your files are stored as standard NTFS files anyway, so even if you didn’t have another server with DrivePool installed, you could just access them from each pooled drive individually using any Windows machine.

What about shared folders? Any shared folders that were part of the old pool and that are not part of the new pool will be re-created for you automatically with default permissions. You can change the permissions on those folders from the Dashboard (this may change with BETA M3 because the permissions will be part of the pool).

I’ve said that there is no required metadata, but that doesn’t mean that there is no metadata at all. One of the things that DrivePool BETA M2 keeps track of about your pool is which disks are part of the current pool and which ones are new or foreign. When DrivePool sees a foreign disk, it flags it as Unhealthy in the Dashboard and gives you the option of running a wizard to verify the disk. This is because we don’t know the duplication status of the files on that new disk. For all we know, some files on it might be un-duplicated.

The time it takes to complete the foreign disk wizard depends on the number of un-duplicated files on that disk. If all the files are duplicated then it will take seconds. One interesting scenario is if you’ve added a single foreign disk to a system and the foreign disk has duplicated folders on it. In this case you will need to add another disk to the pool before the foreign disk can be validated. There are other edge cases, such as when there is not enough disk space left to duplicate the new files, and those are handled appropriately when encountered.
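
The foreign disk check can be sketched as a small planning step. This is a hedged Python illustration and not the wizard’s real logic. The function, the exception classes and the "pick the disk with the most free space" choice are all assumptions, and raising an error when space runs out is just one possible way of handling that edge case.

```python
class NeedsAnotherDisk(Exception):
    """Hypothetical: no second disk exists to hold the duplicate copies."""


class NotEnoughSpace(Exception):
    """Hypothetical: no other disk has room for a required copy."""


def plan_verification(foreign_files, other_disks, free_bytes):
    """Plan the copies needed to validate a foreign disk.

    foreign_files: {relative path: size} for files in duplicated folders
                   on the foreign disk.
    other_disks:   {disk name: set of relative paths already on that disk}.
    free_bytes:    {disk name: free space in bytes}.
    Returns a list of (relative path, target disk) copy operations.
    """
    if not other_disks:
        if foreign_files:
            raise NeedsAnotherDisk("add another disk to the pool first")
        return []
    free = dict(free_bytes)  # work on a copy so the caller's data is untouched
    plan = []
    for path, size in sorted(foreign_files.items()):
        if any(path in files for files in other_disks.values()):
            continue  # a second copy already exists, nothing to do
        target = max(other_disks, key=lambda d: free.get(d, 0))
        if free.get(target, 0) < size:
            raise NotEnoughSpace(path)
        free[target] -= size
        plan.append((path, target))
    return plan
```

If every file already has a copy on another disk, the plan comes back empty, which matches the case where the wizard finishes in seconds. A lone foreign disk with duplicated folders triggers the "add another disk" case.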

The bottom line is, just plug in a pooled disk to any machine with DrivePool installed and it will be automatically made part of the pool. You can even do this while DrivePool is running with USB disks and such. At the same time, DrivePool will ensure proper file duplication consistency.

Which brings us to the next point of interest. What happens if the actual physical disk is going bad? How does file duplication protect you in that case?

I’ll tackle that one in the next post.