Storage Spaces Direct: performance tests between 2-Way Mirroring and Nested Resiliency

Microsoft has released Windows Server 2019 with a new resiliency mode called nested resiliency. This mode enables a two-node S2D cluster to survive two simultaneous failures. Nested resiliency comes in two flavors: nested two-way mirroring and nested mirror-accelerated parity. Nested two-way mirroring is certainly faster than nested mirror-accelerated parity, but it provides only 25% of usable capacity (four copies of each piece of data, two per node), while nested mirror-accelerated parity provides around 40%. After discussing with several customers, most of them prefer to gain usable capacity rather than performance. Therefore, I expect to deploy nested mirror-accelerated parity more often than nested two-way mirroring.

Before Windows Server 2019, two-way mirroring (which provides 50% of usable capacity) was the only option in a two-node S2D cluster. Now, with Windows Server 2019, we have a choice. So I wanted to compare performance between two-way mirroring and nested mirror-accelerated parity. Moreover, I wanted to know whether compression and deduplication have an impact on performance and CPU usage.

N.B.: I ran these tests in my lab, which is built from do-it-yourself servers. What I want to show is a “trend”: where the bottleneck may be in some cases and whether nested resiliency has an impact on performance. So please, don’t blame me in the comment section 🙂

Test platform

I ran my tests on the following platform, composed of two nodes:

  • CPU: 1x Intel Xeon E5-2620 v2
  • Memory: 64GB of DDR3 ECC Registered
  • Storage:
    • OS: 1x Intel SSD 530 128GB
    • S2D HBA: Lenovo N2215
    • S2D storage: 6x SSD Intel S3610 200GB
  • NIC: Mellanox ConnectX-3 Pro (Firmware 5.50)
  • OS: Windows Server 2019 GA build

Both servers are connected to two Ubiquiti ES-16-XG switches. Even though these switches do not support PFC/ETS and so on, RDMA is working (I validated it with the Test-RDMA script). I don’t have enough traffic in my lab to disturb RDMA without a proper configuration. Even though I implemented it this way in my lab, it is not supported and you should not configure things this way for production usage. On the Windows Server side, I added both Mellanox network adapters to a SET (Switch Embedded Teaming) switch and created three virtual network adapters (a sketch of the configuration follows the list):

  • 1x Management vNIC for RDP, AD and so on (routed)
  • 2x SMB vNICs for live migration and SMB traffic (not routed); each vNIC is mapped to a pNIC
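Below is a minimal PowerShell sketch of that network layout. The switch, physical adapter, and vNIC names are assumptions for illustration; adapt them to your own hardware.

# Create the SET switch on top of both Mellanox ports (names are illustrative)
New-VMSwitch -Name "SW-SET" -NetAdapterName "Mellanox1","Mellanox2" -EnableEmbeddedTeaming $true -AllowManagementOS $false

# Management vNIC (routed)
Add-VMNetworkAdapter -ManagementOS -Name "Management" -SwitchName "SW-SET"

# Two SMB vNICs (not routed), each mapped to one physical NIC
Add-VMNetworkAdapter -ManagementOS -Name "SMB01" -SwitchName "SW-SET"
Add-VMNetworkAdapter -ManagementOS -Name "SMB02" -SwitchName "SW-SET"
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "SMB01" -PhysicalNetAdapterName "Mellanox1"
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "SMB02" -PhysicalNetAdapterName "Mellanox2"

# Enable RDMA on the SMB vNICs
Enable-NetAdapterRdma -Name "vEthernet (SMB01)","vEthernet (SMB02)"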

To test the solution I used VMFleet. First, I created volumes in two-way mirroring without deduplication and ran the tests, then I enabled deduplication and ran them again. Next, I deleted the volumes and recreated them in nested mirror-accelerated parity without deduplication. Finally, I enabled compression and deduplication on those volumes. A sketch of the nested volume creation is shown below.
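For reference, here is how nested mirror-accelerated parity volumes can be created, following the Microsoft documentation. The pool name wildcard, tier names, and tier sizes are illustrative, not the exact values I used.

# Create the nested resiliency tier templates (once per cluster)
New-StorageTier -StoragePoolFriendlyName "S2D*" -FriendlyName "NestedMirror" -ResiliencySettingName Mirror -MediaType SSD -NumberOfDataCopies 4
New-StorageTier -StoragePoolFriendlyName "S2D*" -FriendlyName "NestedParity" -ResiliencySettingName Parity -MediaType SSD -NumberOfDataCopies 2 -PhysicalDiskRedundancy 1 -NumberOfGroups 1 -FaultDomainAwareness StorageScaleUnit -ColumnIsolation PhysicalDisk

# Nested mirror-accelerated parity volume: a nested mirror tier in front of a nested parity tier
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "CSV01" -StorageTierFriendlyNames "NestedMirror","NestedParity" -StorageTierSizes 100GB,300GB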

I ran VMFleet with a block size of 4 KB, an outstanding I/O count of 30, and 2 threads per VM (see the sweep sketch below).
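If you want to reproduce the runs, this is roughly how the sweep was launched. The duration is an assumption; only the block size, thread count, outstanding I/O, and write percentages come from the tests described in this article.

# 100% read run
.\start-sweep.ps1 -b 4 -t 2 -o 30 -w 0 -d 300
# 70/30 read/write run
.\start-sweep.ps1 -b 4 -t 2 -o 30 -w 30 -d 300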

Two-Way Mirroring without deduplication results

First, I ran the test without write workloads to see the “maximum” performance I could get. My cluster is able to deliver 140K IOPS at 82% CPU usage.

In the following test, I added 30% write workloads. The total is almost 97K IOPS at 87% CPU usage.

As you can see, RSS and VMMQ are properly configured because all cores are used (a quick way to check this is shown below).
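If you want to verify the same thing on your side, a quick check of RSS and VMQ on the physical adapters looks like this (adapter names are illustrative):

Get-NetAdapterRss -Name "Mellanox1","Mellanox2"
Get-NetAdapterVmq -Name "Mellanox1","Mellanox2"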

Two-Way Mirroring with deduplication

First, you can see that deduplication is efficient: I saved 70% of total storage. Below is a sketch of how deduplication was enabled on the volumes.
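Enabling deduplication is straightforward; this sketch uses an illustrative CSV path.

# Install the feature, enable deduplication with the Hyper-V profile and run a first optimization job
Install-WindowsFeature -Name FS-Data-Deduplication
Enable-DedupVolume -Volume "C:\ClusterStorage\CSV01" -UsageType HyperV
Start-DedupJob -Volume "C:\ClusterStorage\CSV01" -Type Optimization

# Saved space and savings rate are reported once the job has completed
Get-DedupVolume -Volume "C:\ClusterStorage\CSV01"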

Then I ran a VMFleet test and, as you can see, there is a huge drop in performance. By looking closely at the screenshot below, you can see it is because my CPU reaches almost 97%. I’m sure that with a better CPU I could get better performance. So, first trend: deduplication has an impact on CPU usage, and if you plan to use this feature, don’t choose a low-end CPU.

With 30% writes added, I can’t expect better performance: the CPU still limits the overall cluster performance.

Nested Mirror-Accelerated Parity without deduplication

After recreating the volumes, I ran a test with 100% read. Compared to two-way mirroring, there is only a slight drop: I lost “only” 17K IOPS, reaching 123K IOPS. The CPU usage is 82%. You can also see that the latency is great (2 ms).

Then I added 30% writes and we can see the performance drop compared to two-way mirroring. My CPU usage reached 95%, which limits performance (but the latency stays at 6 ms on average). So nested mirror-accelerated parity requires more CPU than two-way mirroring, which makes sense since writes to the parity tier require parity computation.

Nested Mirror-Accelerated Parity with deduplication

First, deduplication also works great on a nested mirror-accelerated parity volume: I saved 75% of storage.

As with two-way mirroring with deduplication, I get poor performance because of my CPU (97% usage).

Conclusion

First, deduplication works great if you need to save space, at the cost of higher CPU usage. Secondly, nested mirror-accelerated parity requires more CPU than two-way mirroring, especially when there are write workloads. The following charts illustrate the CPU bottleneck. With deduplication enabled, the latency always increases, and I think it is because of the CPU bottleneck. This is why I recommend being careful about the CPU choice.

Another interesting point is that nested mirror-accelerated parity produces only a slight performance drop compared to two-way mirroring but brings the ability to survive two failures in the cluster. With deduplication enabled, we can save space and increase the usable capacity. In a two-node configuration, I will recommend nested mirror-accelerated parity to customers, while paying attention to the CPU.

About Romain Serre

Romain Serre works in Lyon as a Senior Consultant. He is focused on Microsoft technologies, especially Hyper-V, System Center, storage, networking, and Cloud OS technologies such as Microsoft Azure and Azure Stack. He is an MVP and a Microsoft Certified Solutions Expert (MCSE Server Infrastructure & Private Cloud), certified on Hyper-V and on Microsoft Azure (Implementing a Microsoft Azure Solution).

2 comments

  1. Hi, thank you for sharing that.
    I’m deep into testing my new S2D W2K19 2-Way-Nested-Resiliency-Mirror-Cluster (even hard to spell) 😉 and would like to compare the results given by my VMFleet tests.

    Can you provide the parameters you chose for the sweep tests, please? Like:
    “.\start-sweep.ps1 -b 4 -t 4 -o 4 -w 10 -d 180”

    Thank you.
