This week I was in Stockholm to build a Storage Spaces Direct cluster (hyperconverged model). While implementing the cluster, I noticed that a physical disk was failing. I've written this topic to show you how I replaced this disk.
Identify the failed physical disk
I was deploying VMFleet when I saw that both virtual disks were in a degraded state. So I checked the running storage jobs:

Get-StorageSubSystem *Cluster* | Get-StorageJob

Then I opened the Storage Pool and saw the following:
So it seemed that this physical disk was not healthy, and I decided to replace it. First, I ran the following cmdlet, because my trust in Failover Cluster Manager is limited:
Get-StoragePool *S2D* | Get-PhysicalDisk
Then I stored the physical disk object in a PowerShell variable (called $Disk) so I could manipulate the disk. You can swap the OperationalStatus filter for another condition, as long as it returns the right disk.
$Disk = Get-PhysicalDisk |? OperationalStatus -Notlike ok
Retire and physically identify storage device
Next, I set the usage of this disk to Retired to stop writes to it and avoid data loss.
Set-PhysicalDisk -InputObject $Disk -Usage Retired
Next, I tried to remove the physical disk from the Storage Pool, but the disk seemed to be in such a bad state that I couldn't remove it from the pool. So I decided to replace it anyway.
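For reference, removing a retired disk from the pool would normally look like the following (an illustration assuming the pool name matches *S2D* as above; in my case this step failed because of the disk's state):

Get-StoragePool *S2D* | Remove-PhysicalDisk -PhysicalDisks $Disk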
I ran the following cmdlet to turn on the storage device LED to identify it easily in the datacenter:
Get-PhysicalDisk |? OperationalStatus -Notlike OK | Enable-PhysicalDiskIdentification
Next, I moved to the server room and, as you can see in the photo below, the LED is turned on. So I replaced this disk.
Once the disk is replaced, you can turn off the LED:
Get-PhysicalDisk |? OperationalStatus -like OK | Disable-PhysicalDiskIdentification
Add physical disk to storage pool
Before the server is rebooted, the physical disk can't identify its enclosure. The disk joins the Storage Pool automatically, but without enclosure information. So you have to reboot the server to get the right information.
Storage Spaces Direct automatically spreads the data across the new disk. This process took almost 30 minutes.
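You can follow the progress of this rebuild with the same job cmdlet used earlier:

Get-StorageSubSystem *Cluster* | Get-StorageJob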
Sometimes the physical disk doesn't join the Storage Pool automatically. In that case, you can add the physical disk to the Storage Pool manually.
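A typical cmdlet for this (an illustration assuming the pool name matches *S2D* as above and the new disk is the only poolable disk) is:

Get-StoragePool *S2D* | Add-PhysicalDisk -PhysicalDisks (Get-PhysicalDisk -CanPool $True)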
Conclusion
With storage solutions, you can be sure that a physical disk, whether SSD or HDD, will fail someday. With Storage Spaces Direct, Microsoft provides all the tools required to replace a failed disk properly and easily. Just set the physical disk to Retired, then remove it from the Storage Pool (if you can). Finally, you can replace the disk.
Why was the drive not detected without rebooting the server? Is it only in this case, or is it normal for S2D? Or for Windows?
I rebooted because the disk was not detected in the enclosure. It worked without a reboot, but I rebooted the server to see the drive in the right enclosure.
I think this is related to the HBA.
Regards,
Romain
If you run “Update-StorageProviderCache -DiscoveryLevel Full” on the ownernode the info will be updated.
Thank you very much for this information !
Hi Romain, I am thinking about using s2d in our prod vmware environment, is that fully supported?
Regards Johan
No, you can't. S2D doesn't support iSCSI, NFS, and so on. If you plan to use S2D, you need Hyper-V. But I can work with you to plan a VMware to Hyper-V migration :p
I paused one node and rebooted it after draining the roles, and I'm now in a situation where the two SSD cache drives report lost communication in Failover Cluster Manager on the node I rebooted. I can see the disks fine in the BIOS, but not in the OS. Is it worth treating them like a failure: killing them off, reseating them, and adding them back in? Open to suggestions.
Hi,
What are your HBA, server, and disks?