StableBit DrivePool 1.2 BETA is now out.
Get it here: http://wiki.covecube.com/DrivePool_Development_BETA
- New file balancing system.
- Supports multiple configurable file balancers, each controlling file placement and organization on the pool.
- Balancers can be 3rd party plug-ins.
First, I should mention that the new balancing infrastructure is designed to operate completely automatically with no user setup required. In fact, the default settings will operate almost exactly like the DrivePool 1.1 file balancer, with some additional balancing features.
The bottom line is, if you don’t really care about tweaking balancing, you can completely ignore the new balancing system and DrivePool will continue to work just as it should.
The new balancing UI can be accessed from the DrivePool -> Disks tab by clicking Balancing settings…
What is a Balancer?
In the Balancing window, a balancer is represented by a single item in the list on the left.
In short, a balancer is responsible for moving pooled files around the individual pooled drives. Keep in mind that this is only happening on the pooled physical hard disks and you will not notice any difference in file placement on the pool drive itself.
Why do we Balance?
There are many reasons why you might want to reorganize pooled file distribution. We can re-balance to optimize the pool for new duplicated file creation, existing file growth or new non-pooled file creation.
If a drive is showing signs of wear, you might want to immediately move any un-duplicated files off of it. Perhaps you would like to optimize for heat management by always placing new files on the coldest hard drive. We can optimize for performance by always placing new files on the fastest disk and moving them off at night to other disks. These are just some of the things that are possible with this new balancing framework.
The built-in balancers that come with DrivePool are designed to keep the pool working at optimal efficiency. In addition, anyone will be able to write a balancer plug-in to do whatever they’d like (more information on this later).
Duplication Space Optimizer Example
Suppose that you have 1 hard drive in the pool, a 1 TB drive:
You fill it up to capacity. Then you decide to add another hard drive to the pool, a 2 TB drive:
Now that you have a 3 TB pool with 2 drives you decide to store some duplicated files on it. But in fact, with the current file distribution, you can only store 10 GB worth of duplicated files on the pool. That’s because every duplicated file needs to be on 2 separate physical hard drives, but there is only 10 GB free on the first drive.
The solution is to move all the pooled files from the first drive onto the second drive, which would end up looking like this:
Now you can store 1 TB of duplicated files on the pool.
This particular file placement optimization is the job of the Duplication Space Optimizer.
It has no settings to tweak because it’s fully automatic. It calculates if your current file distribution is not the optimal layout for duplicated files, and figures out which files need to get moved where in order to optimize the pool for duplicated file placement.
The duplication space optimizer handles the above scenario and many more that are much more complicated. It uses some proprietary algorithms to examine any number of hard drives with any combination of duplicated, un-duplicated and un-pooled files on them in order to compute an optimal balancing model. It balances files only when needed and only what’s needed.
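To make the arithmetic in the example above concrete, here is a rough Python sketch that estimates how much 2x duplicated data fits on a pool given each drive's free space. It is a simplified model, not DrivePool's proprietary algorithm: the function name is made up for illustration, drive sizes are assumed to be decimal (1 TB = 1000 GB), and files are treated as if they could be split freely.

```python
def duplicated_capacity_gb(free_gb):
    """Upper bound on 2x duplicated data (GB) given free space per drive (GB).

    Each duplicated file needs a copy on 2 different drives, so the data
    can use at most half of the total free space, and no more than what
    is free outside the emptiest (most free) drive.
    """
    if len(free_gb) < 2:
        return 0
    total = sum(free_gb)
    return min(total / 2, total - max(free_gb))


# Before balancing: 1 TB drive with 10 GB free, empty 2 TB drive.
before = duplicated_capacity_gb([10, 2000])

# After moving the 990 GB of pooled files onto the second drive.
after = duplicated_capacity_gb([1000, 1010])

print(before, after)
```

With these numbers the estimate goes from 10 GB before the move to 1000 GB after it, which matches the example.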
Order of Priority
DrivePool’s balancers are organized in terms of priority. Some balancers are more important than others, and what they say will override any lower balancer whenever there is a conflict of intent.
If you’d like, you can re-order the balancers in the Balancing window and a new balancing model will be computed after you click save.
The Balancing Model
Simply put, the balancing model is a map of where your files are vs. where your files need to go. You can see in the screen above the little arrows on top of each bar representing where DrivePool wants to move your files in relation to where they are now.
The balancing model is constructed by asking each balancer what it wants to do, starting with the lowest priority balancer first.
The pool condition bar towards the bottom shows you how much data is in the right place (i.e. doesn’t need to be moved).
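The priority rule and the pool condition figure can be sketched in a few lines of Python. This is only an illustration under my own assumptions: each balancer is modeled as a function returning the most data (in GB) it wants kept on a drive, and the balancer names and numbers are hypothetical. The real balancing model is more involved.

```python
def build_model(balancers_by_priority):
    """Merge per-drive caps; the list is ordered highest priority first."""
    model = {}
    # Ask the lowest priority balancer first so that every higher
    # priority balancer overwrites it wherever the two disagree.
    for balancer in reversed(balancers_by_priority):
        model.update(balancer())
    return model


def pool_condition(current_gb, model):
    """Percentage of pooled data that does not need to be moved."""
    total = sum(current_gb.values())
    if total == 0:
        return 100.0
    # Anything above a drive's cap has to move somewhere else.
    to_move = sum(max(0, used - model.get(drive, used))
                  for drive, used in current_gb.items())
    return 100.0 * (total - to_move) / total


def limiter():      # hypothetical high priority balancer
    return {"D1": 600}


def equalizer():    # hypothetical low priority balancer
    return {"D1": 500, "D2": 500}


model = build_model([limiter, equalizer])
condition = pool_condition({"D1": 800, "D2": 200}, model)
print(model, condition)
```

Here the limiter's cap for D1 wins over the equalizer's, so 200 GB of the 1000 GB on the pool has to move and the pool condition comes out at 80%.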
If you click Re-balance, DrivePool goes to work moving your files around.
DrivePool never locks any files while moving them around and doesn’t change time stamps, so you will not notice any difference on the pool drive itself even when DrivePool is re-balancing.
You will see what DrivePool is currently doing represented by little blinking arrows next to each bar indicating which drives we’re re-balancing now.
After the re-balancing process completes, our files are reorganized to best fit the balancing model.
Sometimes, if you lack any small files on the pool, DrivePool will not be able to satisfy the balancing model entirely, or “fill in the gaps” so to speak, but it will come as close as possible given the available files. As you can see in the screen above, it re-balanced to 99.9% accuracy.
File Placement Limits
In addition to moving existing files a balancer can request to limit new file placement to certain drives. The built-in File Placement Limiter is designed to do just that.
We’ve decided that we don’t want to store any duplicated files on drives H:\ and J:\, but only if there are other drives available to store these duplicated files, and we don’t want to fill the other drives beyond 90% of their capacity in order to satisfy our request.
Notice that J:\ and H:\ are shown as one drive. That is because these volumes are stored on the same physical hard drive. Here DrivePool deals with storage units instead of volumes.
A storage unit consists of one or more volumes that are part of the pool, on the same physical hard drive.
Given these new settings, this is what the new balancing model looks like:
Now we see little red arrows below the bars indicating real-time file placement limits that will be respected by the file system when selecting the destination for new files.
Notice that the balancer has not requested to move all the duplicated files off of H:\ and J:\ because doing so would overfill the other drives beyond 90 % capacity, so it only requested to move as much data as possible. Adding another drive to this pool will instantly change this model and the rest of those duplicated files will be moved off to the new disk.
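The "move as much as possible" behavior boils down to a headroom calculation. The Python sketch below shows the idea with hypothetical numbers. The function name is mine, and it ignores the extra rule that both copies of a duplicated file must end up on different drives, so treat it as an upper bound rather than what DrivePool actually computes.

```python
def movable_gb(to_move_gb, other_drives, fill_limit_percent=90):
    """How much data can leave the restricted drive without overfilling others.

    other_drives is a list of (size_gb, used_gb) tuples.
    """
    headroom = sum(max(0, size * fill_limit_percent / 100 - used)
                   for size, used in other_drives)
    return min(to_move_gb, headroom)


# Hypothetical pool: 500 GB of duplicated files sit on the restricted drive,
# and the two other 1000 GB drives are 70% and 85% full.
moved = movable_gb(500, [(1000, 700), (1000, 850)])
print(moved)
```

Only about 250 GB fits under the 90% limit here, so the rest stays put until another drive is added to the pool.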
Splitting up Duplicated vs. Un-duplicated Files
We can set up the File Placement Limiter such that it doesn’t allow duplicated files to be stored on the same physical disks as un-duplicated files.
After re-balancing our pool would look like this:
A few things to note:
- Any new files created on the pool from now on will automatically be placed on the correct drive, depending on the type (duplicated vs. un-duplicated), reducing the need for future re-balancing.
- The files will be split up in their respective groups unless all the drives in those groups are 90 % full. In that case, instead of the user getting an out of disk space message DrivePool will utilize any available disk with free space regardless of the limits. If you later add another disk to the pool, DrivePool will recognize that once again we can respect the limits and will set up a new balancing model moving all the files to their respective disks.
- Notice that there is some disk space that is Unusable for duplication. This is because the limits that we’ve set up have violated the optimal use of free space for duplicated files (see example earlier). We can resolve this by moving the Duplication Space Optimizer balancer above the File Placement Limiter balancer in terms of priority. But if we do this the limits that we have set up might be broken in order to optimize the pool.
The Built-in Balancers
Now that I’ve shown you everything that the new balancing system is capable of, let’s go through all of the built-in balancers. You’ve already seen some of them in the examples above.
Duplication Space Optimizer
This is actually a combined balancer that is made up of 2 balancers that are designed to optimize the available disk space for duplicated files.
It does this by using some proprietary algorithms to determine if the current file placement is obstructing free space that is available to store duplicated files. It then resolves the situation by moving as few files as possible to make sure that duplicated files can utilize 100% of the available free space.
It has no user configurable settings.
Prevent Drive Overfill
This balancer is responsible for making sure that no single drive gets filled to capacity if we have free space on other drives.
It can work in either percent or GB (which can be useful for larger drives).
Why in GB?
For example, on a 4 TB disk, filling it to 90% would leave 400 GB free. You might not want to keep that much free space on all your 4 TB drives.
With the default setting pictured above, DrivePool will either fill a drive to 90% or keep 100 GB free, whichever leaves less free space. So on a 4 TB drive it will keep 100 GB free.
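The "whichever leaves less free space" rule is easy to check with a small Python sketch. The function and parameter names are made up for illustration, and sizes are decimal (4 TB = 4000 GB).

```python
def free_space_to_keep_gb(size_gb, max_fill_percent=90, min_free_gb=100):
    """Free space kept on a drive under the two overfill settings."""
    # Free space left if the drive is filled up to the percent limit.
    percent_rule = size_gb * (100 - max_fill_percent) / 100
    # Use whichever of the two rules leaves less free space.
    return min(percent_rule, min_free_gb)


keep_4tb = free_space_to_keep_gb(4000)    # 90% would leave 400 GB, so 100 GB applies
keep_500gb = free_space_to_keep_gb(500)   # 90% leaves 50 GB, which is less than 100 GB
print(keep_4tb, keep_500gb)
```

On small drives the percentage ends up being the effective limit, while on large drives the GB setting takes over.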
File Placement Limiter
This balancer is designed to control real-time file placement, but it also features a re-balancing component.
File placement limits are respected by the file system when selecting a disk or disks to store new files.
The limit that this balancer sets up is called a soft-limit. This means that if all the drives for that file type are “full” it will not issue an out of disk space message to the user but instead choose any disk with sufficient free space.
How full is “full”? That’s defined in the balancer’s settings as a percentage of disk size.
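Here is a minimal Python sketch of how a soft limit behaves when a new file is created. It is not DrivePool's file system logic: the function name, the data layout, and the tie-break of picking the drive with the most free space are all assumptions of mine.

```python
def pick_drive(drives, allowed, file_gb, full_percent=90):
    """Pick a drive for a new file while respecting a soft limit.

    drives:  {name: (size_gb, used_gb)} for every drive in the pool
    allowed: names of the drives the limit permits for this file type
    """
    def free(name):
        size, used = drives[name]
        return size - used

    def is_full(name):
        size, used = drives[name]
        return used * 100 >= size * full_percent

    preferred = [n for n in allowed if not is_full(n) and free(n) >= file_gb]
    if preferred:
        return max(preferred, key=free)
    # Soft limit: all allowed drives are "full", so instead of failing
    # we fall back to any drive that still has room for the file.
    fallback = [n for n in drives if free(n) >= file_gb]
    if fallback:
        return max(fallback, key=free)
    return None  # the pool really is out of space


pool = {"D1": (1000, 950), "D2": (1000, 400), "D3": (2000, 500)}
within_limit = pick_drive(pool, allowed=["D2"], file_gb=10)
limit_broken = pick_drive(pool, allowed=["D1"], file_gb=10)
print(within_limit, limit_broken)
```

In the second call D1 is already 95% full, so the limit is set aside and the file lands on D3 rather than producing an out of disk space error.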
Another built-in balancer is responsible for equalizing the disk space used across volumes that are located on the same physical disk.
DrivePool treats any volumes that are part of the pool and on the same physical disk as one storage unit. This balancer goes through any such volumes and makes sure that they are consuming disk space equally on the disk that they’re on.
This balancer has no user configurable settings.
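The storage unit idea can be sketched in Python as a simple grouping step followed by an even split. The disk numbers and the K: volume are hypothetical, and I am assuming "equally" means the same number of GB used on each volume, which may not be exactly how the balancer measures it.

```python
from collections import defaultdict


def storage_units(volumes):
    """Group (volume, physical_disk_id) pairs into {disk_id: [volumes]}."""
    units = defaultdict(list)
    for volume, disk in volumes:
        units[disk].append(volume)
    return dict(units)


def equalized_targets(used_gb):
    """Give every volume in one storage unit an equal share of the used space."""
    share = sum(used_gb.values()) / len(used_gb)
    return {volume: share for volume in used_gb}


# H: and J: live on the same physical disk, K: is on a different one.
units = storage_units([("H:", 0), ("J:", 0), ("K:", 1)])
targets = equalized_targets({"H:": 300, "J:": 100})
print(units, targets)
```

With 300 GB on H: and 100 GB on J:, the target becomes 200 GB on each, so 100 GB would be moved from H: to J:.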
As I’ve mentioned, the DrivePool balancing framework allows for 3rd party developers to write their own balancers. I’ll talk about how this will work in a future post.
The current build does not yet accept 3rd party plug-ins but you can look at the DrivePool.Integration.dll (in C:\Program Files\StableBit\DrivePool). This is a .NET 4.0 class library, so you can reference it in Visual Studio.
Writing your own balancer is extremely easy. All the balancers are written in .NET 4.0 and all you have to do is inherit from the DrivePool.Integration.Balancing.BalancerBase class. DrivePool will then call your Balance method, and in that method you call MoveFiles one or more times on the passed in BalanceStateInfos.
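To give a feel for the shape of a plug-in, here is a toy model of that flow written in Python rather than .NET. Only the names BalancerBase, Balance, MoveFiles and BalanceStateInfo come from the description above. Every field, parameter and behavior in this sketch is a hypothetical stand-in, so check the actual signatures in DrivePool.Integration.dll before writing a real balancer.

```python
class BalanceStateInfo:
    """Stand-in for one drive's state; the fields here are hypothetical."""

    def __init__(self, name, used_gb):
        self.name = name
        self.used_gb = used_gb
        self.requested_moves = []  # (destination name, GB) pairs

    def MoveFiles(self, destination, gb):
        # Hypothetical signature: this only records the request.
        self.requested_moves.append((destination.name, gb))


class BalancerBase:
    """Stand-in for the base class that a balancer inherits from."""

    def Balance(self, state_infos):
        raise NotImplementedError


class EvenOutBalancer(BalancerBase):
    """Toy balancer: drives above the average shed their excess to the emptiest drive."""

    def Balance(self, state_infos):
        average = sum(s.used_gb for s in state_infos) / len(state_infos)
        emptiest = min(state_infos, key=lambda s: s.used_gb)
        for state in state_infos:
            if state is not emptiest and state.used_gb > average:
                state.MoveFiles(emptiest, state.used_gb - average)


drives = [BalanceStateInfo("D1", 800), BalanceStateInfo("D2", 200)]
EvenOutBalancer().Balance(drives)
print(drives[0].requested_moves)
```

The point is that a balancer only states what it wants moved. In this toy run D1 asks to move 300 GB to D2, and actually carrying out the move is left to the host.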
The new balancers give you a lot of control over how your pool is organized, and with 3rd party plug-ins the possibilities will be far reaching.
For example, one of the goals for the next StableBit Scanner BETA is to be able to integrate with DrivePool using the new plug-in system.
But balancers can’t do everything. For example, they can’t designate which folders go onto which disks. For this we will need multiple pool support.
With the new balancing system and multiple pool support you will be able to control file placement on a very fine grained level. But at the same time we absolutely do not want to increase the complexity of DrivePool as a whole. Just like the new balancing system, multiple pool support will be implemented in a completely unobtrusive way.
If you don’t want to use it, you won’t even know it’s there.
Multiple pool support is coming in DrivePool 1.3 later this year.