When a customer calls me to design or validate the hardware configuration for a hyperconverged infrastructure with Storage Spaces Direct, there is often a misunderstanding about the remaining usable capacity, the required cache capacity and ratio, and the different resilience modes. In this topic, I'll try to help you plan the storage for a hyperconverged deployment and clarify some points.
Hardware considerations
Before sizing the storage devices, you should be aware of some limitations. First, you can't exceed 26 storage devices per node: Windows Server 2016 can't handle more than 26 storage devices per node, so if you deploy your operating system on two storage devices, 24 remain available for Storage Spaces Direct. However, storage devices keep getting bigger, so 24 devices per node is usually enough (I have never seen a deployment with more than 16 storage devices for Storage Spaces Direct).
Secondly, you have to pay attention to your HBA (Host Bus Adapter). With Storage Spaces Direct, the operating system is in charge of resilience and caching; this is a software-defined solution, after all. So there is no reason for the HBA to manage RAID or cache. In the Storage Spaces Direct case, the HBA is mainly used to add more SAS ports. So don't buy an HBA with RAID and cache features, because you will not use them: Storage Spaces Direct storage devices will be configured in JBOD (pass-through) mode. If you choose to buy Lenovo servers, you can buy the N2215 HBA. If you choose Dell, you can select the HBA330. The HBA must provide the following features:
- Simple pass-through SAS HBA for both SAS and SATA drives
- SCSI Enclosure Services (SES) for SAS and SATA drives
- Any direct-attached storage enclosures must present a Unique ID
- Not supported: RAID HBA controllers or SAN (Fibre Channel, iSCSI, FCoE) devices
Thirdly, there are requirements regarding the storage devices themselves. Only NVMe, SAS and SATA devices are supported; if you have old SCSI storage devices, you can drop them :). These storage devices must be physically attached to only one server (locally-attached devices). If you choose to implement SSDs, they must be enterprise-grade with power-loss protection. So please, don't build a hyperconverged solution with a Samsung 850 Pro. If you plan to install cache storage devices, these SSDs must provide at least 3 DWPD (Drive Writes Per Day), meaning the device can be entirely written at least three times per day over its warranty period.
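To put DWPD into perspective, here is a minimal sketch; the 800GB capacity and five-year warranty are illustrative assumptions, not values from this article:

```python
def endurance_tbw(capacity_gb, dwpd, warranty_years):
    """Return (TB written per day, total TB written over the warranty) for a DWPD rating."""
    daily_tb = capacity_gb * dwpd / 1000            # full-drive writes per day, in TB
    total_tbw = daily_tb * 365 * warranty_years     # endurance over the warranty period
    return daily_tb, total_tbw

# Illustrative example: an 800GB cache SSD rated at 3 DWPD with a 5-year warranty
daily, tbw = endurance_tbw(800, 3, 5)
print(f"{daily:.1f} TB/day, {tbw:.0f} TBW")         # 2.4 TB/day, 4380 TBW
```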
Finally, you have to respect a minimum number of storage devices. You must implement at least four capacity storage devices per node, and if you plan to install cache storage devices, at least two of them per node. Each node in the cluster must have the same kinds of storage devices: if you choose to deploy NVMe in one server, all servers must have NVMe. As much as possible, keep the same configuration across all nodes. The table below gives the minimum number of storage devices per node for each configuration (a small validation sketch follows the table):
Drive types present | Minimum number required |
All NVMe (same model) | 4 NVMe |
All SSD (same model) | 4 SSD |
NVMe + SSD | 2 NVMe + 4 SSD |
NVMe + HDD | 2 NVMe + 4 HDD |
SSD + HDD | 2 SSD + 4 HDD |
NVMe + SSD + HDD | 2 NVMe + 4 Others |
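As a minimal sketch, the per-node minimums from the table can be expressed as a small check; the function name and structure are mine and not part of any Microsoft tooling:

```python
def check_minimum_drives(nvme: int, ssd: int, hdd: int) -> bool:
    """Return True if a node's drive counts satisfy the per-node minimums from the table."""
    if nvme and not ssd and not hdd:          # all NVMe
        return nvme >= 4
    if ssd and not nvme and not hdd:          # all SSD
        return ssd >= 4
    if nvme and ssd and not hdd:              # NVMe cache + SSD capacity
        return nvme >= 2 and ssd >= 4
    if nvme and hdd and not ssd:              # NVMe cache + HDD capacity
        return nvme >= 2 and hdd >= 4
    if ssd and hdd and not nvme:              # SSD cache + HDD capacity
        return ssd >= 2 and hdd >= 4
    if nvme and ssd and hdd:                  # NVMe cache + SSD/HDD capacity
        return nvme >= 2 and (ssd + hdd) >= 4
    return False

print(check_minimum_drives(nvme=0, ssd=3, hdd=9))   # True: 3 SSD cache + 9 HDD capacity
```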
Cache ratio and capacity
The cache ratio and capacity are an important part of the design when you choose to deploy a cache. I have seen a lot of wrong designs because of the cache mechanism. The first thing to know is that the cache is not mandatory: as shown in the table above, you can implement an all-flash configuration without a cache. However, if you choose to deploy a solution based on HDDs, you must implement a cache. When the capacity devices behind the cache are HDDs, the cache works in read/write mode; otherwise, it works in write-only mode.
The cache capacity must be at least 10% of the raw capacity: if each node has 10TB of raw capacity, you need at least 1TB of cache in that node. Moreover, if you deploy a cache, you need at least two cache storage devices per node to ensure the high availability of the cache. When Storage Spaces Direct is enabled, capacity devices are bound to cache devices in a round-robin manner. If a cache storage device fails, its capacity devices are rebound to another cache storage device.
Finally, you must respect a ratio between the number of cache devices and capacity devices: the number of capacity devices must be a multiple of the number of cache devices. This ensures that each cache device serves the same number of capacity devices.
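Here is a minimal sketch of these two checks; sizes are per node, and the 10% figure and the multiple rule come straight from the rules above:

```python
def check_cache_design(raw_capacity_tb, cache_devices, cache_device_tb, capacity_devices):
    """Per-node sanity checks for the cache design rules described above."""
    cache_tb = cache_devices * cache_device_tb

    assert cache_devices >= 2, "at least two cache devices per node"
    assert cache_tb >= 0.10 * raw_capacity_tb, "cache must be >= 10% of raw capacity"
    assert capacity_devices % cache_devices == 0, \
        "capacity devices must be a multiple of cache devices"

    print(f"OK: ratio 1:{capacity_devices // cache_devices}, "
          f"cache {cache_tb:.2f} TB for {raw_capacity_tb} TB raw")

# Example: 9x 2TB HDD (18TB raw) behind 3x 800GB SSD in one node
check_cache_design(raw_capacity_tb=18, cache_devices=3,
                   cache_device_tb=0.8, capacity_devices=9)   # OK: ratio 1:3, cache 2.40 TB
```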
Reserved capacity
When you design the storage pool capacity and choose the number of storage devices, keep in mind that you need some unused capacity in the storage pool. This is the reserved capacity used by the repair process: if a capacity device fails, the storage pool rebuilds the blocks that were written to this device elsewhere to respect the resilience mode, and this process requires free space. Microsoft recommends leaving unallocated the equivalent of one capacity device per node, up to four drives.
For example, with 6 nodes and 4x 4TB HDD per node, I leave 4x 4TB = 16TB unallocated in the storage pool for reserved capacity (one drive per node, capped at four drives).
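A minimal sketch of this rule, with the four-drive cap applied cluster-wide as in the example above:

```python
def reserved_capacity_tb(nodes, capacity_device_tb):
    """Reserve one capacity device per node, capped at four drives cluster-wide."""
    return min(nodes, 4) * capacity_device_tb

print(reserved_capacity_tb(nodes=6, capacity_device_tb=4))   # 16.0 TB reserved
print(reserved_capacity_tb(nodes=4, capacity_device_tb=2))   # 8.0 TB reserved
```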
Example of storage design
You should know that in a hyperconverged infrastructure, storage and compute are related because both reside in the same box. So before calculating the required raw capacity, you should have evaluated two things: the number of nodes you plan to deploy and the usable storage capacity required. For this example, let's say we need four nodes and 20TB of usable capacity.
First, you have to choose a resilience mode. In hyperconverged deployments, 2-way mirroring and 3-way mirroring are usually implemented. If you choose 2-way mirroring (tolerates one failure), you get 50% of the raw capacity as usable capacity. If you choose 3-way mirroring (recommended, tolerates two failures), you get only 33%.
PS: At the time of writing, Microsoft has announced deduplication for ReFS volumes in the next Windows Server release.
So, if you need 20TB of usable capacity and you choose 3-way mirroring, you need at least 60TB (20 x 3) of raw storage capacity. That means each of the four nodes needs 15TB of raw capacity.
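As a minimal sketch of this step (the number of mirror copies is the only extra input; the figures match the example above):

```python
def raw_capacity_needed(usable_tb, mirror_copies, nodes):
    """Return (total raw TB, raw TB per node) for a mirror resiliency setting."""
    total_raw = usable_tb * mirror_copies        # 2 for 2-way mirror, 3 for 3-way mirror
    return total_raw, total_raw / nodes

total, per_node = raw_capacity_needed(usable_tb=20, mirror_copies=3, nodes=4)
print(total, per_node)   # 60.0 TB total, 15.0 TB per node
```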
Now that you know you need 15TB of raw storage per node, you have to define the number of capacity storage devices. If you need maximum performance, you can choose NVMe devices only, but this solution will be very expensive. For this example, I choose SSDs for the cache and HDDs for the capacity.
Next, I need to define which kind of HDD to select. If I choose 4x 4TB HDD per node, I have 16TB of raw capacity per node, and I need to add an additional 4TB HDD for the reserved capacity. But this solution is not good regarding the cache ratio: five capacity devices cannot be divided evenly between two or three cache devices. In this case I have to add another 4TB HDD to reach 6x 4TB HDD per node (24TB of raw capacity), which allows a 1:2 or 1:3 cache ratio.
The other solution is to select 2TB HDDs: I need 8x 2TB HDD to reach the required raw capacity, plus an additional 2TB HDD for the reserved capacity. That gives 9x 2TB HDD per node, which respects a 1:3 cache ratio. I prefer this solution because it is closest to the specifications.
Now we need to design the cache devices. For this solution, we need three cache devices (to keep a 1:3 ratio with nine capacity devices) with a total capacity of at least 1.8TB (10% of the 18TB raw capacity per node). So I choose to buy 800GB SSDs (because my favorite cache SSD, the Intel S3710, exists in 400GB and 800GB :)). 800GB x 3 = 2.4TB of cache capacity per node.
So, each node will be installed with 3x 800GB SSD and 9x 2TB HDD, with a cache ratio of 1:3. The total raw capacity is 72TB and the reserved capacity is 8TB. The usable capacity will be 21.12TB ((72 - 8) x 0.33).
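Putting the whole design together, here is a minimal sketch that reproduces the figures above; the 0.33 efficiency for 3-way mirroring and the four-drive reserve cap are taken from the preceding sections:

```python
# Per-node hardware of the example design
NODES             = 4
HDD_PER_NODE      = 9      # capacity devices
HDD_TB            = 2
SSD_PER_NODE      = 3      # cache devices
SSD_TB            = 0.8
MIRROR_EFFICIENCY = 0.33   # 3-way mirroring

raw_tb      = NODES * HDD_PER_NODE * HDD_TB               # 72 TB
reserved_tb = min(NODES, 4) * HDD_TB                      # 8 TB (one drive per node, max 4)
usable_tb   = (raw_tb - reserved_tb) * MIRROR_EFFICIENCY  # ~21.12 TB
cache_ratio = HDD_PER_NODE // SSD_PER_NODE                # 1:3
cache_tb    = SSD_PER_NODE * SSD_TB                       # 2.4 TB per node (>= 10% of 18 TB)

print(f"raw={raw_tb} TB, reserved={reserved_tb} TB, usable={usable_tb:.2f} TB, "
      f"cache ratio=1:{cache_ratio}, cache per node={cache_tb:.1f} TB")
```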
About Fault Domain Awareness
I have made this demonstration with Fault Domain Awareness at the node level. If you choose to configure Fault Domain Awareness at the chassis or rack level, the calculation is different. For example, at the rack level, you divide the total raw capacity across the number of racks, and you also need exactly the same number of nodes per rack. With the above case and four racks, you need 15TB of raw capacity per rack.
Hello
Where did you get this number of a maximum of 26 drives supported per server? I've never heard or read it anywhere, and can only find it referenced on your blog.
Best wishes
Hi,
This is what the Microsoft product group told me. Currently, 416 disks are supported in a cluster. 416 / 16 = 26 drives. I think this is the reason…
Hello
I am just studying the cache.
Could you let me know why this is the case?
“You must implement at least 4 capacity storage devices per node. If you plan to install cache storage devices, you have to deploy two of them at least per node.”
Hi,
Microsoft requires at least four capacity disks per node. If you plan to install a cache, you need at least two cache devices to ensure high availability: if a cache device fails, the other one can handle the IO for all capacity disks during the failure.