Storage Spaces Direct: Parallel rebuild

Posted by: Romain Serre in Storage, June 22, 2018

Parallel rebuild is a Storage Spaces feature that repairs a storage pool even if the failed disk has not been replaced. This feature is not new in Storage Spaces Direct: it has existed since Windows Server 2012 with Storage Spaces. It is an automatic process that runs only if there is enough free space in the storage pool, which is why Microsoft recommends leaving some free space in the pool. This reserve is often forgotten when designing a Storage Spaces Direct solution, which is why I wanted to write this theoretical topic.

How parallel rebuild works

Parallel rebuild needs some free space to work. This free space plays the role of a spare. With a RAID6 volume, you typically keep a spare disk in case of failure. In Storage Spaces (Direct), instead of a spare disk, we have spare free space. Parallel rebuild occurs when a disk fails. If enough capacity is available, it starts automatically and immediately to restore the resiliency of the volumes. In practice, Storage Spaces Direct creates a new copy of the data that was hosted on the failed disk, using the free space on the remaining disks.

When you receive the new disk (4h later because you took the 4-hour support contract :p), you can replace the failed disk. The new disk is automatically added to the storage pool if the auto pool option is enabled. Once the disk is in the storage pool, an automatic rebalance process runs to spread data across all disks again for the best efficiency.

How to calculate the amount of free space

Microsoft recommends leaving free space equal to one capacity drive per node, up to a maximum of four drives:

2-node configuration: reserve the equivalent of 2 capacity drives
3-node configuration: reserve the equivalent of 3 capacity drives
4-node or larger configuration: reserve the equivalent of 4 capacity drives
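
The rule above can be written as a small function. This is only a sketch of the recommendation as stated in this post; the function names and the `max_reserved` parameter are my own and do not come from any Microsoft tool.

```python
def reserved_drives(node_count: int, max_reserved: int = 4) -> int:
    """Number of capacity drives to keep free: one per node, capped at max_reserved."""
    if node_count < 2:
        # The recommendation in this post starts at 2-node clusters.
        raise ValueError("expected a cluster of at least 2 nodes")
    return min(node_count, max_reserved)


def reserve_capacity_tb(node_count: int, drive_size_tb: float) -> float:
    """Free space (in TB) to leave in the pool for parallel rebuild."""
    return reserved_drives(node_count) * drive_size_tb
```

For example, `reserve_capacity_tb(4, 2.0)` gives the reserve for a 4-node cluster built with 2TB capacity drives.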

Consider a 4-node S2D cluster where each node has the following storage configuration. I plan to deploy 3-way mirroring:

3x SSD of 800GB (cache)
6x HDD of 2TB (capacity). Total for the cluster: 4 nodes x 6 x 2TB = 48TB of raw storage.

Because I deploy a 4-node configuration, I should leave free space equivalent to four capacity drives. In this example, that means 4 x 2TB = 8TB reserved for parallel rebuild, so 40TB are available. Because I want to implement 3-way mirroring, I divide the available capacity by 3. That gives about 13.3TB of usable storage.

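Now suppose I add a fifth node to this cluster. According to the Microsoft recommendation, the reserve is capped at four capacity drives, so I don't need to reserve additional space for parallel rebuild. The full 12TB of the new node (6x HDD of 2TB) is added to the available capacity, for a total of 52TB, which is about 17.3TB of usable storage with 3-way mirroring.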
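
The same arithmetic can be checked with a few lines of Python. This is a minimal sketch under the assumptions of this example (identical capacity drives on every node, a single 3-way mirror resiliency for the whole pool); the function name and its parameters are hypothetical.

```python
def usable_capacity_tb(nodes, drives_per_node, drive_size_tb,
                       copies=3, max_reserved_drives=4):
    """Return (raw, reserve, available, usable) in TB for a mirrored pool."""
    raw = nodes * drives_per_node * drive_size_tb
    # One capacity drive per node, capped at four drives (recommendation above).
    reserve = min(nodes, max_reserved_drives) * drive_size_tb
    available = raw - reserve
    # With N-way mirroring, every piece of data is stored `copies` times.
    usable = available / copies
    return raw, reserve, available, usable


for nodes in (4, 5):
    raw, reserve, available, usable = usable_capacity_tb(nodes, 6, 2)
    print(f"{nodes} nodes: raw={raw}TB reserve={reserve}TB "
          f"available={available}TB usable={usable:.1f}TB")
# 4 nodes: raw=48TB reserve=8TB available=40TB usable=13.3TB
# 5 nodes: raw=60TB reserve=8TB available=52TB usable=17.3TB
```

Because the reserve stops growing after four nodes, every node added beyond that contributes its whole capacity to the available space.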

Conclusion

Parallel rebuild is an interesting feature because it restores resiliency even if the failed disk has not yet been replaced. But parallel rebuild has a cost in terms of storage usage. Don’t forget the reserved capacity when you are planning your capacity.

2 comments

Sergio Porter
August 3, 2018 at 6:36 pm

Romain,

First let me thank you for sharing your knowledge and experience.

I wonder if you have an idea of the rationale behind this recommendation?

I generally leave the equivalent of the largest performance drive and the largest capacity drive, so that in the case of any single-drive failure, after rebalancing I can remove the drive and replace it.

That is, of course, much less than the MS recommendation. The only downside I see is for multiple-drive failures.

Or am I missing something else?

Thanks again,

Sergio

- Romain Serre
  August 8, 2018 at 6:06 am
  
  Hey,
  
  First, this is a recommendation and it is not mandatory. We do that in case of failure. When a disk crashes, if you have free space, the data that should be on the disk is replicated in the free space. So you don’t have to wait for the new disk to arrive to restore resiliency. It’s like a “pro-active” repair process and I find it really genius 🙂
