VMware released the white paper for the new VSAN 6.2 on Feb. 10, 2016. The technical white paper can be found here: https://www.vmware.com/files/pdf/products/vsan/vmware-virtual-san-6-2-technical-white-paper.pdf
VSAN is pretty new to me. I was trying to understand the space requirements for RAID-1 shown in the table in the white paper (page 7). This post explains how to understand and calculate the minimum number of hosts/FDs and the total capacity for a given FTT with RAID-1.
First, I assume you already know the basics of RAID-1. If not, please check out the wiki: https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_1
Second, FTT stands for NumberOfFailuresToTolerate, which is a configuration parameter set by the customer.
The total capacity in the white paper's table means the total space required across all the disks to store all of the exact copies (here exact means each copy is the same size as the original object).
Now, let’s take a look at FTT=0. With the default RAID-1 setting, the total capacity would be 2x the object size, because a mirror keeps two copies. Because here we explicitly specify FTT=0, which means no disk failure needs to be tolerated, there is no need to store a second copy on a second disk. Therefore, the total capacity reduces to 1x the object size. However, the minimum number of hosts should be 2 (the same as the minimum number for RAID-1), instead of the 3 given in the table.
For the cases of FTT>0, we need to understand how this failure tolerance works. In VSAN 5.5, for a VM to remain accessible, “one full copy of the data and more than 50% of components must be available”. Let’s take an example to explain this statement. Suppose FTT=1 and there are three hosts, each with 1 magnetic disk (component):
– host1, disk1: 1st copy of VM
– host2, disk1: 2nd copy of VM
– host3, disk1: witness (2MB metadata)
Host1 and host2 store the full copies of the VM, and host3 (the witness for quorum) stores only metadata. Each disk (component) has one vote, and the surviving disks need a majority of the votes to keep serving the VM. So:
If host1 is down, one full copy is on host2 and two votes are from host2 and host3.
If host2 is down, one full copy is on host1 and two votes are from host1 and host3.
If host3 is down, two full copies are on host1 and host2. Two votes are from host1 and host2.
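The three cases above can be checked mechanically. The following is only a minimal sketch of the VSAN 5.5 rule as quoted in this post, not VSAN's actual implementation; the `LAYOUT` dictionary and the function names `is_accessible` and `tolerates` are hypothetical names made up for illustration, and each host is assumed to hold exactly one component with one vote.

```python
from itertools import combinations

# Hypothetical layout from the example: one component per host.
# "replica" holds a full copy of the VM; "witness" holds only metadata.
LAYOUT = {"host1": "replica", "host2": "replica", "host3": "witness"}


def is_accessible(layout, failed_hosts):
    """Apply the quoted VSAN 5.5 rule: at least one full copy and
    more than 50% of components must be available."""
    alive = {h: kind for h, kind in layout.items() if h not in failed_hosts}
    has_full_copy = "replica" in alive.values()
    # Strict majority: surviving components must exceed half of all components.
    has_quorum = 2 * len(alive) > len(layout)
    return has_full_copy and has_quorum


def tolerates(layout, n):
    """True if the VM stays accessible for every possible set of n failed hosts."""
    return all(
        is_accessible(layout, set(failed))
        for failed in combinations(layout, n)
    )


if __name__ == "__main__":
    for host in LAYOUT:
        print(host, "down -> accessible:", is_accessible(LAYOUT, {host}))
    print("tolerates 1 failure:", tolerates(LAYOUT, 1))
    print("tolerates 2 failures:", tolerates(LAYOUT, 2))
```

With this layout any single host can fail, but no pair of hosts can, which matches FTT=1.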
Therefore, if FTT=n (n>0), the minimum required total capacity is (n+1)x the object size, which guarantees that at least one full copy remains when n hosts/FDs are down. When n hosts are down, at least n+1 components need to be up and vote in order to continue the service, so that they make up more than 50% of the components. These n+1 components need to span at least n+1 hosts/FDs. So the minimum number of hosts for FTT=n is n+(n+1) = 2n+1. Take FTT=2 as an example: the total required capacity is (2+1)x = 3x, and the minimum number of required hosts/FDs is 2*2+1 = 5.
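The arithmetic above is simple enough to put into a small helper. This is just a sketch of the formulas derived in this post for RAID-1 with FTT=n (n>0); the function name `raid1_requirements` and the use of GB as the unit are my own assumptions for illustration, not anything taken from the white paper.

```python
def raid1_requirements(ftt, object_size_gb):
    """Return the minimums derived in this post for RAID-1 with FTT = n > 0.

    full copies      = n + 1
    minimum hosts    = n + (n + 1) = 2n + 1
    total capacity   = (n + 1) x object size
    """
    if ftt < 1:
        raise ValueError("these formulas only cover FTT > 0")
    copies = ftt + 1
    min_hosts = 2 * ftt + 1
    return {
        "full_copies": copies,
        # Hosts/FDs that do not need to hold a full copy.
        "other_hosts": min_hosts - copies,
        "min_hosts": min_hosts,
        "total_capacity_gb": copies * object_size_gb,
    }


if __name__ == "__main__":
    # Hypothetical 100 GB object, used only to show the numbers.
    for n in (1, 2, 3):
        print("FTT =", n, raid1_requirements(n, 100))
```

For FTT=2 and a 100 GB object this gives 3 full copies, 5 hosts/FDs and 300 GB in total, the same 3x and 5 as in the example above.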
Please note three things here:
1. We assume one disk (component) per host in the example to keep it easy to follow. If there is more than 1 disk per host, VSAN is usually configured at the RAID-01 level. The calculation applies at the level of hosts/FDs, since a full copy still exists at that level, but not at the disk level.
2. Of the 2n+1 hosts required for FTT=n, at least n+1 must hold full copies and the remaining n hosts can be witnesses.
3. Starting from VSAN 6.0, the statement “more than 50 percent of components” has changed to “more than 50 percent of votes”. With this change, the witness could be eliminated, since one component can have more than 1 vote.