Oracle Cloud VMware Solution vSAN sizing and scaling

July 13, 2022 | 11 minute read
Thomas Thyen
Cloud VMware Solutions Specialist

Oracle Cloud VMware Solution is a VMware software-defined data center (SDDC) solution on Oracle Cloud Infrastructure (OCI) that runs on Intel or AMD bare metal hosts and is based on the following VMware product stack:

  • VMware vSphere Hypervisor (ESXi)

  • VMware vCenter Server

  • VMware vSAN

  • VMware NSX-T

  • VMware HCX Advanced Edition (Enterprise Edition billed separately)

VMware vSAN is a software-defined, enterprise storage solution that supports hyper-converged infrastructure (HCI) systems. vSAN is fully integrated with VMware vSphere, as a distributed layer of software within the ESXi hypervisor. vSAN aggregates local or direct-attached data storage devices to create a single storage pool shared across all hosts in a vSAN cluster.

vSAN eliminates the need for external shared storage and simplifies storage configuration through storage policy-based management. Using virtual machine (VM) storage policies, you can define storage requirements and capabilities.

VMware Solution bare metal shapes

The following shapes are supported for Oracle Cloud VMware Solution ESXi hosts:

A table listing the bare metal shapes supported for Oracle Cloud VMware Solution ESXi hosts.

For more details, see Dense I/O Shapes.

VMware Solution cluster minimum sizing

The minimum supported sizing for a VMware Solution cluster is always three hosts—the minimum required host count for a vSAN standard cluster to support a RAID-1 configuration that can tolerate one host failure.

An Intel-based three-host cluster has the following specifications:

  • 156 cores per cluster

  • 2,304-GB memory per cluster

  • 153 TB NVMe SSD per cluster (Bare metal shape RAW capacity)

  • 122.22 TB vSAN storage

An AMD-based three-host cluster has the following specifications:

  • 96, 192, or 384 cores per cluster

  • 6,144-GB memory per cluster

  • 163 TB NVMe SSD per cluster (Bare metal shape RAW capacity)

  • 129.9 TB vSAN storage

Cluster types

Oracle Cloud VMware Solution clusters are either single-availability domain clusters, which reside in one availability domain and are stretched across its three fault domains, or multi-availability domain clusters, which are stretched across availability domains, with each availability domain acting as a fault domain.

Single availability domain cluster

Single-availability domain clusters span three fault domains within an availability domain and aggregate the ESXi hosts' local storage into a vSAN datastore. Each OCI fault domain acts as a vSAN fault domain.

A graphic depicting the architecture for a single-availability domain cluster in Oracle Cloud VMware Solution.

Multi-availability domain cluster

Multi-availability domain clusters can span three availability domains within an OCI region and aggregate the ESXi hosts' local storage into a vSAN datastore. Each availability domain acts as a fault domain, so the Oracle Cloud VMware Solution cluster is built across availability domains. This setup provides the maximum availability for a VMware Solution cluster and its VMs.

A graphic depicting the architecture for a multi-availability domain cluster in Oracle Cloud VMware Solution.

Oracle Cloud VMware Solution vSAN sizing

Each ESXi host participating in a vSAN cluster on Oracle Cloud VMware Solution is equipped with eight NVMe SSD devices that form a vSAN disk group. The disk group is the local aggregation of the storage resources and consists of one cache device and seven capacity devices. Oracle Cloud VMware Solution uses a single disk group because all cache and capacity devices are NVMe-based and don’t require other storage controllers. This configuration eliminates a single point of failure for the vSAN disk group design.

The capacity devices represent the total amount of vSAN storage that an ESXi host presents to the vSAN cluster.

A graphic depicting a vSAN disk group.

Intel-based ESXi host vSAN raw capacity

VMware Solution Intel-based ESXi hosts are configured with eight 5.82-TB NVMe SSDs: One NVMe SSD for the vSAN cache and seven NVMe SSDs for vSAN capacity. This configuration results in an ESXi host vSAN raw capacity of 40.74 TB.

Capacity device size * number of capacity devices = ESXi vSAN host raw capacity

Intel-based cluster vSAN raw capacity

The minimum supported configuration for an Intel-based cluster is three ESXi hosts, which results in a total vSAN raw capacity of 122.22 TB for a three-host vSAN cluster.

ESXi host vSAN raw capacity * number of ESXi hosts = cluster vSAN raw capacity

AMD-based ESXi host vSAN raw capacity

AMD-based ESXi hosts are configured with eight 6.18-TB NVMe SSDs: One NVMe SSD for the vSAN cache and seven NVMe SSDs for vSAN capacity. This configuration results in an ESXi host vSAN raw capacity of approximately 43.3 TB.

Capacity device size * number of capacity devices = ESXi vSAN host raw capacity

AMD-based cluster vSAN raw capacity

The minimum supported configuration for an AMD-based cluster is three ESXi hosts, which results in a total vSAN raw capacity of 129.9 TB for a three-host vSAN cluster.

ESXi host vSAN raw capacity * number of ESXi hosts = cluster vSAN raw capacity
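The two formulas above can be checked with a few lines of Python. This is a minimal sketch that uses only the drive sizes and device counts quoted in this post; the function and constant names are illustrative. Note that the AMD cluster figure of 129.9 TB quoted above comes from rounding the per-host value to 43.3 TB before multiplying, so the unrounded product is slightly lower.

```python
# Device sizes (TB) and counts are taken from the text above; names are illustrative.
CAPACITY_DEVICES_PER_HOST = 7  # eight NVMe SSDs per host, one of which is the cache device
DEVICE_SIZE_TB = {"intel": 5.82, "amd": 6.18}


def host_raw_capacity_tb(shape: str) -> float:
    """Capacity device size * number of capacity devices."""
    return DEVICE_SIZE_TB[shape] * CAPACITY_DEVICES_PER_HOST


def cluster_raw_capacity_tb(shape: str, hosts: int) -> float:
    """ESXi host vSAN raw capacity * number of ESXi hosts."""
    if hosts < 3:
        raise ValueError("a VMware Solution cluster needs at least three hosts")
    return host_raw_capacity_tb(shape) * hosts


for shape in ("intel", "amd"):
    host_tb = host_raw_capacity_tb(shape)
    cluster_tb = cluster_raw_capacity_tb(shape, 3)
    print(f"{shape}: {host_tb:.2f} TB per host, {cluster_tb:.2f} TB for three hosts")
```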

Capacity dependencies

The usable capacity of an Oracle Cloud VMware Solution vSAN cluster depends on more than the number of ESXi hosts provisioned to scale out the cluster’s vSAN raw capacity. The following factors also have an impact on the sizing and scaling of a cluster:

  • Failure tolerance method (FTM)

  • Failures to tolerate (FTT)

  • Host reservation

  • Deduplication and compression

  • Operations reserve and host rebuild reserve

  • VM sizing (vCPU and vMEM)

  • vSphere high-availability resources (vCPU and vMEM)

FTM and FTT

The FTM and FTT have an important role when you plan and size storage capacity for vSAN. Based on the availability requirements of a VM, the setting might result in doubled vSAN capacity consumption or more.

  • FTM: RAID level of a vSAN object

  • FTT: Host failures a vSAN cluster must withstand without degradation

RAID-1 (Mirroring): Performance

  • If the FTT is set to 0, the consumption is 1x. (100GB VMDK = 100GB vSAN)

  • If the FTT is set to 1, the consumption is 2x. (100GB VMDK = 200GB vSAN)

  • If the FTT is set to 2, the consumption is 3x. (100GB VMDK = 300GB vSAN)

  • If the FTT is set to 3, the consumption is 4x. (100GB VMDK = 400GB vSAN)

RAID-5/6 (Erasure coding): Capacity

  • If the FTT is set to 1, the consumption is 1.33x. (100GB VMDK = 133GB vSAN)

  • If the FTT is set to 2, the consumption is 1.50x. (100GB VMDK = 150GB vSAN)
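The multipliers listed above lend themselves to a simple lookup. The following sketch, with illustrative names, turns a VMDK size into the vSAN capacity it consumes for a given FTM and FTT combination; it models nothing beyond the factors quoted in the lists.

```python
# (FTM, FTT) -> consumption factor, exactly as listed above.
CONSUMPTION_FACTOR = {
    ("RAID-1", 0): 1.0,
    ("RAID-1", 1): 2.0,
    ("RAID-1", 2): 3.0,
    ("RAID-1", 3): 4.0,
    ("RAID-5/6", 1): 1.33,
    ("RAID-5/6", 2): 1.5,
}


def vsan_consumption_gb(vmdk_gb: float, ftm: str, ftt: int) -> float:
    """Return the vSAN capacity consumed by a VMDK under the given policy."""
    try:
        factor = CONSUMPTION_FACTOR[(ftm, ftt)]
    except KeyError:
        raise ValueError(f"unsupported combination: {ftm} with FTT={ftt}") from None
    return vmdk_gb * factor


for (ftm, ftt) in CONSUMPTION_FACTOR:
    print(f"{ftm}, FTT={ftt}: 100 GB VMDK -> {vsan_consumption_gb(100, ftm, ftt):.0f} GB vSAN")
```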

A graphic depicting the vSAN RAID configurations.

RAID-1 (Mirroring) versus RAID-5/6 (Erasure coding)

  • RAID-1 (Mirroring) in vSAN employs a 2n+1 host or fault domain algorithm, where n is the number of failures to tolerate.

  • RAID-5/6 (Erasure coding) in vSAN employs a 3+1 (RAID-5) or 4+2 (RAID-6) host or fault domain requirement, depending on one or two failures to tolerate respectively.

  • RAID-5/6 (Erasure coding) does not support three failures to tolerate. This setting is only available for RAID-1 (Mirroring).
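The host or fault domain requirements in the list above can be expressed as a small helper. This is a sketch of the rules as stated in this post (2n+1 for mirroring, 3+1 or 4+2 for erasure coding); the function name is illustrative.

```python
def min_fault_domains(ftm: str, ftt: int) -> int:
    """Minimum number of hosts or fault domains for a given FTM and FTT."""
    if ftm == "RAID-1":
        if not 0 <= ftt <= 3:
            raise ValueError("RAID-1 supports FTT values from 0 to 3")
        return 2 * ftt + 1  # 2n+1 rule
    if ftm == "RAID-5/6":
        if ftt == 1:
            return 4  # RAID-5: 3+1
        if ftt == 2:
            return 6  # RAID-6: 4+2
        raise ValueError("RAID-5/6 supports only FTT=1 or FTT=2")
    raise ValueError(f"unknown failure tolerance method: {ftm}")


for ftm, ftt in [("RAID-1", 1), ("RAID-1", 2), ("RAID-1", 3), ("RAID-5/6", 1), ("RAID-5/6", 2)]:
    print(f"{ftm}, FTT={ftt}: at least {min_fault_domains(ftm, ftt)} hosts or fault domains")
```

With the three-host minimum cluster, only RAID-1 with FTT=1 (or FTT=0) satisfies these requirements, which matches the minimum sizing described earlier.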

Host reservation

Host reservation in vSAN sets aside ESXi hosts in a vSAN cluster. These hosts participate in the vSAN cluster and act as dedicated failover resources during a failure or maintenance operation, according to the availability requirements of the cluster. This optional setting brings higher availability and resilience.

n+1 guarantees an extra host to satisfy the vSAN cluster availability requirement in the event of a host failure or maintenance operation.

n+2 guarantees two extra hosts, so the vSAN cluster maintains an n+1 reservation at any point in time. No degradation in performance or availability occurs when a host fails or needs maintenance.

Deduplication and compression

Deduplication and compression are most suitable for highly dedupable workloads, such as VDI full clones. On-disk format version 3.0 and later adds an extra overhead, typically no more than 1–2 percent capacity per device. Deduplication and compression with software checksum enabled require extra overhead of approximately 6.2 percent capacity per device.

Reduction ratios are based on VMware measurements and indicate how much space deduplication and compression can save on a vSAN cluster per workload type. Always adjust these ratios to the workload reduction ratios measured in your on-premises environment, and use that measurement as the reference value.

Reduction ratio indications:

  • General purpose VMs: 1.5

  • Databases: 2.0

  • File services: 1.5

  • VDI full clone: 8

  • VDI instant clone: 2

  • VDI linked clone: 2.5

  • Tanzu: 1.5
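As a rough illustration of how these ratios feed into sizing, the sketch below estimates the physical footprint of a workload mix. It assumes that each ratio is read as logical data to stored data (for example, 2.0 means 2 TB of data occupies about 1 TB after reduction), which is an interpretation rather than something this post states explicitly. The function name and the sample mix are hypothetical.

```python
# Indicative reduction ratios from the list above.
REDUCTION_RATIO = {
    "General purpose VMs": 1.5,
    "Databases": 2.0,
    "File services": 1.5,
    "VDI full clone": 8.0,
    "VDI instant clone": 2.0,
    "VDI linked clone": 2.5,
    "Tanzu": 1.5,
}


def reduced_footprint_tb(logical_tb_by_workload: dict[str, float]) -> float:
    """Estimate capacity after deduplication and compression (assumed logical:stored ratio)."""
    return sum(
        logical_tb / REDUCTION_RATIO[workload]
        for workload, logical_tb in logical_tb_by_workload.items()
    )


# Hypothetical mix: 10 TB of databases and 8 TB of VDI full clones.
mix = {"Databases": 10.0, "VDI full clone": 8.0}
print(f"{sum(mix.values()):.1f} TB logical -> {reduced_footprint_tb(mix):.1f} TB after reduction")
```

The result is the footprint before any FTM and FTT multiplier is applied, and it does not include the per-device overhead mentioned above.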

Operations reserve and host rebuild reserve

Operations reserve defines the capacity needed for vSAN to perform internal operations, such as policy changes, rebalancing, and data movement.

vSAN provides the option to reserve this capacity in advance so that it has enough free space to perform internal operations and to repair data back to compliance following a single host failure. When reserved capacity is enabled, vSAN prevents you from using that space to create workloads, which protects the capacity available in the cluster. By default, the reserved capacity is disabled.

If the vSAN cluster has enough free space, you can enable the operations reserve and the host rebuild reserve. Operations reserve is reserved space in the cluster for vSAN internal operations. Host rebuild reserve is the reserved space for vSAN to be able to repair in case of a single host failure.

Enabling host rebuild reserve sets aside one host's worth of capacity in the cluster. Host rebuild reserve works on the principle of n+1. For example, in a four-node cluster of identical hardware configuration, host rebuild reserve requires 25 percent reserve capacity to ensure sufficient rebuild capacity. As a vSAN cluster grows, the host rebuild reserve percentage decreases. For example, in an eight-node cluster of identical hardware configuration, host rebuild reserve requires 12.5 percent.
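The n+1 principle means the reserve is one host out of the total host count. The sketch below reproduces the 25 percent and 12.5 percent examples, assuming identical hosts as in the text. The function names are illustrative, and the 40.74 TB figure is the Intel per-host raw capacity from earlier in this post.

```python
def host_rebuild_reserve_fraction(hosts: int) -> float:
    """Fraction of cluster capacity equal to one host's worth (identical hosts assumed)."""
    if hosts < 3:
        raise ValueError("a VMware Solution cluster needs at least three hosts")
    return 1 / hosts


def host_rebuild_reserve_tb(hosts: int, host_raw_tb: float) -> float:
    """Reserved raw capacity in TB for a cluster of identical hosts."""
    cluster_raw_tb = hosts * host_raw_tb
    return cluster_raw_tb * host_rebuild_reserve_fraction(hosts)


for hosts in (3, 4, 8, 16):
    fraction = host_rebuild_reserve_fraction(hosts)
    reserve_tb = host_rebuild_reserve_tb(hosts, 40.74)
    print(f"{hosts} hosts: {fraction:.1%} reserve, {reserve_tb:.2f} TB of raw capacity")
```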

VM sizing (vCPU and vMEM)

As a vSAN cluster scales horizontally, the vSAN capacity scales linearly with the vCPU and vMEM resources, which can leave vCPU and vMEM resources unused relative to the provided vSAN storage.

VMware vSphere high-availability resources (vCPU and vMEM)

VMware vSphere high availability is an integral part of every vSphere deployment. It provides the ability to reserve cluster capacity for vCPU and vMEM resources in case a host must go into maintenance mode or fails because of a hardware error. It guarantees that all VMs in the cluster are provided with sufficient vCPU and vMEM resources. Typically, this high-availability reserve is calculated as a percentage based on the number of hosts in a cluster. In a three-node cluster of identical hardware configuration, high-availability reserve requires 33 percent reserve capacity to ensure sufficient high-availability capacity. In an eight-node cluster of identical hardware configuration, high-availability reserve requires 12.5 percent reserve capacity to ensure sufficient high-availability capacity. The examples show the high-availability reservation for an n+1 scenario. For n+2, the reservation doubles in size.
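The percentages above follow from dividing the number of failover hosts by the total host count. A minimal sketch with illustrative names, assuming identical hosts and a percentage-based reservation as described in the paragraph above:

```python
def ha_reserve_percent(hosts: int, failover_hosts: int = 1) -> float:
    """High-availability reserve as a percentage of cluster vCPU and vMEM resources."""
    if hosts < 3:
        raise ValueError("a VMware Solution cluster needs at least three hosts")
    if not 0 < failover_hosts < hosts:
        raise ValueError("failover_hosts must be at least 1 and below the host count")
    return 100 * failover_hosts / hosts


def resources_after_ha(total: float, hosts: int, failover_hosts: int = 1) -> float:
    """Cores or memory left for workloads once the reserve is set aside."""
    return total * (100 - ha_reserve_percent(hosts, failover_hosts)) / 100


print(f"3 hosts, n+1: {ha_reserve_percent(3):.1f} percent")
print(f"8 hosts, n+1: {ha_reserve_percent(8):.1f} percent")
print(f"8 hosts, n+2: {ha_reserve_percent(8, failover_hosts=2):.1f} percent")
```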

Oracle Cloud VMware Solution vSAN scaling

Unlike many on-premises vSAN clusters, Oracle Cloud VMware Solution is a scale-out only solution, meaning that you can’t scale up an ESXi host by adding disks to expand disk groups or create more disk groups. Oracle Cloud VMware Solution can only scale on the vSAN storage level by adding hosts to the cluster—also known as horizontal scaling. A cluster can scale from three hosts to a maximum of 64 hosts in a single SDDC.

Oracle recommends that VMware SDDCs deployed across availability domains within a region not exceed 16 ESXi hosts.

A graphic depicting the process of vSAN scaling in Oracle Cloud VMware Solution.

Depending on the chosen bare metal shape (Intel or AMD), the scaling of a cluster is dictated by the required vCPU, vMEM, storage requirements (FTM and FTT), and potential storage saving techniques like deduplication and compression, depending on the workload type.

The following table shows the raw capacity of an eight-node Oracle Cloud VMware Solution vSAN cluster with a CPU to vCPU ratio of 1:2 without applying any storage saving techniques like deduplication and compression and without the operations and host rebuild reserve. The actual usable vSAN capacity is dictated by the FTM and FTT settings per VM object.
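Because the cluster only scales out, the host count is driven by whichever resource runs out first. The sketch below is a simplified estimate under stated assumptions: the per-host Intel figures (52 cores, 768 GB of memory) are derived by dividing the three-host cluster numbers in this post by three, one FTM and FTT consumption factor applies to all storage, and reserves, high availability, and deduplication are ignored. All names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    cores: int
    memory_gb: int
    vsan_raw_tb: float


# Derived from the Intel three-host figures in this post (156 cores, 2,304 GB, 122.22 TB).
INTEL_HOST = HostSpec(cores=52, memory_gb=768, vsan_raw_tb=40.74)

MIN_HOSTS, MAX_HOSTS = 3, 64


def hosts_required(vcpus: int, memory_gb: float, vmdk_tb: float,
                   consumption_factor: float, vcpus_per_core: int,
                   host: HostSpec = INTEL_HOST) -> int:
    """Smallest host count that covers vCPU, vMEM, and vSAN capacity demand."""
    by_cpu = math.ceil(vcpus / (host.cores * vcpus_per_core))
    by_memory = math.ceil(memory_gb / host.memory_gb)
    by_storage = math.ceil(vmdk_tb * consumption_factor / host.vsan_raw_tb)
    hosts = max(MIN_HOSTS, by_cpu, by_memory, by_storage)
    if hosts > MAX_HOSTS:
        raise ValueError(f"demand needs {hosts} hosts, above the {MAX_HOSTS}-host maximum")
    return hosts


# Hypothetical demand: 400 vCPUs, 3,000 GB of memory, 100 TB of VMDKs on RAID-1 with FTT=1.
print(hosts_required(vcpus=400, memory_gb=3000, vmdk_tb=100,
                     consumption_factor=2.0, vcpus_per_core=2))
```

In this hypothetical case storage is the limiting resource, which leaves some vCPU and vMEM capacity unused.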

A table showing an example sizing for an eight-node Oracle Cloud VMware Solution vSAN cluster.

See Dense I/O Shapes for more detail.
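The eight-node figures can be approximated from the per-host numbers given earlier in this post. The sketch below is not a reproduction of the original table: it derives per-host Intel values (52 cores, 768 GB of memory, 40.74 TB raw) from the three-host cluster figures, applies the 1:2 CPU to vCPU ratio, and divides the raw capacity by the FTM and FTT multipliers. Names are illustrative.

```python
# FTM and FTT multipliers as given in the FTM and FTT section.
CONSUMPTION_FACTOR = {
    "RAID-1, FTT=1": 2.0,
    "RAID-1, FTT=2": 3.0,
    "RAID-5, FTT=1": 1.33,
    "RAID-6, FTT=2": 1.5,
}


def cluster_summary(hosts: int, cores_per_host: int, memory_gb_per_host: int,
                    raw_tb_per_host: float, vcpus_per_core: int) -> dict:
    """Aggregate cores, vCPUs, memory, raw capacity, and usable capacity per policy."""
    cores = hosts * cores_per_host
    raw_tb = hosts * raw_tb_per_host
    return {
        "cores": cores,
        "vcpus": cores * vcpus_per_core,
        "memory_gb": hosts * memory_gb_per_host,
        "raw_tb": raw_tb,
        "usable_tb": {policy: raw_tb / factor
                      for policy, factor in CONSUMPTION_FACTOR.items()},
    }


summary = cluster_summary(hosts=8, cores_per_host=52, memory_gb_per_host=768,
                          raw_tb_per_host=40.74, vcpus_per_core=2)
print(f"cores={summary['cores']}, vCPUs={summary['vcpus']}, "
      f"memory={summary['memory_gb']} GB, raw={summary['raw_tb']:.2f} TB")
for policy, usable_tb in summary["usable_tb"].items():
    print(f"{policy}: {usable_tb:.2f} TB usable")
```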

VMware vSAN sizer

The VMware vSAN Sizer is a useful tool for verifying the sizing and scaling of VMware vSAN clusters. It can calculate a sizing for mixed-workload clusters with different FTM and FTT levels, taking into account VM CPU and memory requirements, n+1 or n+2 configurations, deduplication and compression, CPU to vCPU ratios, operations reserve, and host rebuild reserve.

Conclusion

This blog summarizes the general sizing and scaling principles of VMware vSAN on Oracle Cloud VMware Solution for both Intel- and AMD-based clusters. It demonstrates the sizing and scaling options for customers who want to migrate workloads to Oracle Cloud VMware Solution, extend their local VMware SDDC to an Oracle Cloud VMware Solution SDDC, use Oracle Cloud VMware Solution for disaster recovery, or start with a greenfield deployment to host legacy workloads, VMware Tanzu, or VDI.

If you want to read more on this topic, the following links give you more insight about VMware on Oracle Cloud Infrastructure:

Thomas Thyen is a Cloud VMware Solutions Specialist within the Technical Cloud Engineering Organization at Oracle mainly working on Oracle Cloud VMware Solution and integration with the VMware product stack and 3rd party products. He is specialised in technologies like VMware vSphere, VMware vSAN, VMware NSX, VMware HCX, VMware vSphere Replication, VMware Site Recovery Manager, VMware vRealize Suite, VEEAM Backup & Replication and Citrix Virtual Apps & Desktops.
