This week I was in Stockholm to build a Storage Spaces Direct cluster (hyperconverged model). While implementing the cluster, I noticed that a physical disk was failing. I've written this topic to show you how I replaced this disk.
Identify the failed physical disk
I was deploying VMFleet when I saw that both virtual disks were in a degraded state. So I checked the running storage jobs:

Get-StorageSubSystem *Cluster* | Get-StorageJob

Then I opened the Storage Pool and saw the following:
So it seemed that this physical disk was not healthy, and I decided to replace it. First, I ran the following cmdlet, because my trust in Failover Cluster Manager is limited:
Get-StoragePool *S2D* | Get-PhysicalDisk
Then I stored the physical disk object in a PowerShell variable (called $Disk) so I could manipulate the disk. You can swap the OperationalStatus filter for another condition, as long as it returns the right disk.
$Disk = Get-PhysicalDisk |? OperationalStatus -Notlike ok
Retire and physically identify storage device
Next, I set the usage of this disk to Retired to stop writes to it and avoid data loss.
Set-PhysicalDisk -InputObject $Disk -Usage Retired
Next, I tried to remove the physical disk from the Storage Pool, but the disk seemed to be in such a bad state that I couldn't remove it from the pool. So I decided to replace it anyway.
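For reference, removing a retired disk from the pool would normally look like the following (an illustration assuming the pool name matches *S2D* as above; in my case this step failed because of the disk's state):

Get-StoragePool *S2D* | Remove-PhysicalDisk -PhysicalDisks $Disk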
I ran the following cmdlet to turn on the storage device LED to identify it easily in the datacenter:
Get-PhysicalDisk |? OperationalStatus -Notlike OK | Enable-PhysicalDiskIdentification
Next, I moved to the server room and, as you can see in the photo below, the LED is turned on. So I replaced this disk.
Once the disk is replaced, you can turn off the LED:
Get-PhysicalDisk |? OperationalStatus -like OK | Disable-PhysicalDiskIdentification
Add physical disk to storage pool
Before the server is rebooted, the physical disk can't identify its enclosure. The disk joins the Storage Pool automatically, but without enclosure information. So you have to reboot the server to get the right information.
Storage Spaces Direct automatically spreads the data across the new disk. This process took almost 30 minutes.
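You can follow the progress of this rebuild with the same job cmdlet used earlier:

Get-StorageSubSystem *Cluster* | Get-StorageJob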
Sometimes the physical disk doesn't join the Storage Pool automatically. In that case, you can add the physical disk to the Storage Pool manually.
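A typical cmdlet for this (an illustration assuming the pool name matches *S2D* as above and the new disk is the only poolable disk) is:

Get-StoragePool *S2D* | Add-PhysicalDisk -PhysicalDisks (Get-PhysicalDisk -CanPool $True)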
Conclusion
With storage solutions, you can be sure that a physical disk, whether SSD or HDD, will fail someday. With Storage Spaces Direct, Microsoft provides all the tools required to replace a failed disk properly and easily. Just set the physical disk to Retired, then remove it from the Storage Pool (if you can). Finally, you can replace the disk.
Why was the drive not detected without rebooting the server? Is it only in this case, or is it normal for S2D? Or for Windows?
I rebooted because the disk was not detected in the enclosure. It worked without a reboot, but I rebooted the server to see the drive in the right enclosure.
I think this is related to the HBA.
Regards,
Romain
If you run “Update-StorageProviderCache -DiscoveryLevel Full” on the ownernode the info will be updated.
Thank you very much for this information !
Hi Romain, I am thinking about using s2d in our prod vmware environment, is that fully supported?
Regards Johan
No, you can't. S2D doesn't support iSCSI, NFS, and so on. If you plan to use S2D, you need Hyper-V. But I can work with you to plan a VMware to Hyper-V migration :p
I paused one node and rebooted it after draining the roles, and I'm now in a situation where the two SSD cache drives report lost communication in Failover Cluster Manager on the node I rebooted. I can see the disks fine in the BIOS, but not in the OS. Is it worth treating them like a failure: killing them off, reseating them, and adding them back in? Open to suggestions.
Hi,
What are your HBA, server, and disks?