Oracle Cloud VMware Solution is a VMware software-defined data center (SDDC) solution on Oracle Cloud Infrastructure (OCI). It runs on Intel or AMD bare metal hosts and is based on the following VMware product stack:
VMware vSphere Hypervisor (ESXi)
VMware vCenter Server
VMware HCX Advanced Edition (Enterprise Edition billed separately)
VMware vSAN is a software-defined, enterprise storage solution that supports hyper-converged infrastructure (HCI) systems. vSAN is fully integrated with VMware vSphere, as a distributed layer of software within the ESXi hypervisor. vSAN aggregates local or direct-attached data storage devices to create a single storage pool shared across all hosts in a vSAN cluster.
vSAN eliminates the need for external shared storage and simplifies storage configuration through storage policy-based management. Using virtual machine (VM) storage policies, you can define storage requirements and capabilities.
The following shapes are supported for Oracle Cloud VMware Solution ESXi hosts:
For more details, see Dense I/O Shapes.
The minimum supported sizing for a VMware Solution cluster is always three hosts—the minimum required host count for a vSAN standard cluster to support a RAID-1 configuration that can tolerate one host failure.
An Intel-based three-host cluster has the following specifications:
156 cores per cluster
2,304-GB memory per cluster
153 TB NVMe SSD per cluster (Bare metal shape RAW capacity)
122.22 TB vSAN storage
An AMD-based three-host cluster has the following specifications:
96, 192, or 384 cores per cluster
6,144-GB memory per cluster
163 TB NVMe SSD per cluster (Bare metal shape RAW capacity)
129.9 TB vSAN storage
Oracle Cloud VMware Solution clusters can be either single-availability domain clusters, which are stretched across three fault domains within a single availability domain, or multi-availability domain clusters, which are stretched across availability domains, with each availability domain acting as a fault domain.
Single-availability domain clusters span three fault domains within an availability domain and aggregate the ESXi hosts’ local storage into a vSAN datastore. Each OCI fault domain acts as a vSAN fault domain.
Multi-availability domain clusters can span three availability domains within an OCI region and aggregate the ESXi hosts’ local storage into a vSAN datastore. Each availability domain acts as a fault domain, so the Oracle Cloud VMware Solution cluster is built across availability domains. This setup provides the maximum availability for a VMware Solution cluster and its VMs.
Each ESXi host participating in a vSAN cluster on Oracle Cloud VMware Solution is equipped with eight NVMe SSD devices that form a single vSAN disk group. The disk group is the local aggregation of storage resources and consists of one cache device and seven capacity devices. Oracle Cloud VMware Solution uses a single disk group because all cache and capacity devices are NVMe-based and don’t require other storage controllers. This configuration eliminates a single point of failure for the vSAN disk group design.
The capacity devices represent the total amount of vSAN storage that an ESXi host presents to the vSAN cluster.
VMware Solution Intel-based ESXi hosts are configured with eight 5.82-TB NVMe SSDs: One NVMe SSD for the vSAN cache and seven NVMe SSDs for vSAN capacity. This configuration results in an ESXi host vSAN raw capacity of 40.74 TB.
Capacity device size * number of capacity devices = ESXi vSAN host raw capacity
The minimum supported configuration for an Intel-based cluster is three ESXi hosts, which results in a total vSAN raw capacity of 122.22 TB for a three-host vSAN cluster.
ESXi host vSAN raw capacity * number of ESXi hosts = cluster vSAN raw capacity
AMD-based ESXi hosts are configured with eight 6.18-TB NVMe SSDs: One NVMe SSD for the vSAN cache and seven NVMe SSDs for vSAN capacity. This configuration results in an ESXi host vSAN raw capacity of 43.3 TB.
Capacity device size * number of capacity devices = ESXi vSAN host raw capacity
The minimum supported configuration for an AMD-based cluster is three ESXi hosts, which results in a total vSAN raw capacity of 129.9 TB for a three-host vSAN cluster.
ESXi host vSAN raw capacity * number of ESXi hosts = cluster vSAN raw capacity
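The two formulas above can be checked with a short Python sketch. The function names are illustrative and not part of any Oracle or VMware tool; the device sizes and counts are the ones stated above. Seven 6.18-TB devices give 43.26 TB, so the sketch assumes the 43.3-TB figure is that value rounded to one decimal before it is multiplied by the host count.

```python
def host_raw_capacity_tb(capacity_device_tb: float, capacity_devices: int = 7) -> float:
    """Raw vSAN capacity of one ESXi host; the cache device is not counted."""
    return capacity_device_tb * capacity_devices


def cluster_raw_capacity_tb(host_raw_tb: float, hosts: int = 3) -> float:
    """Raw vSAN capacity of a cluster of identical hosts."""
    return host_raw_tb * hosts


# Intel-based host: seven 5.82-TB capacity devices.
intel_host_tb = host_raw_capacity_tb(5.82)
intel_cluster_tb = cluster_raw_capacity_tb(intel_host_tb)

# AMD-based host: seven 6.18-TB capacity devices, rounded to one decimal
# per host (an assumption about how the 43.3-TB figure was derived).
amd_host_tb = round(host_raw_capacity_tb(6.18), 1)
amd_cluster_tb = cluster_raw_capacity_tb(amd_host_tb)

print(f"Intel: {intel_host_tb:.2f} TB per host, {intel_cluster_tb:.2f} TB per cluster")
print(f"AMD:   {amd_host_tb:.1f} TB per host, {amd_cluster_tb:.1f} TB per cluster")
```

Passing a larger host count to cluster_raw_capacity_tb gives the raw capacity of a scaled-out cluster of the same shape.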
The usable capacity of an Oracle Cloud VMware Solution vSAN cluster depends not only on the number of ESXi hosts provisioned to scale out the cluster’s vSAN raw capacity. The following factors also have an impact on the sizing and scaling of a cluster:
Failure tolerance method (FTM)
Failures to tolerate (FTT)
Deduplication and compression
Operations reserve and host rebuild reserve
VM sizing (vCPU and vMEM)
vSphere high-availability resources (vCPU and vMEM)
The FTM and FTT have an important role when you plan and size storage capacity for vSAN. Based on the availability requirements of a VM, the setting might result in doubled vSAN capacity consumption or more.
FTM: RAID level of a vSAN object
FTT: Host failures a vSAN cluster must withstand without degradation
RAID-1 (Mirroring): Performance
If the FTT is set to 0, the consumption is 1x. (100GB VMDK = 100GB vSAN)
If the FTT is set to 1, the consumption is 2x. (100GB VMDK = 200GB vSAN)
If the FTT is set to 2, the consumption is 3x. (100GB VMDK = 300GB vSAN)
If the FTT is set to 3, the consumption is 4x. (100GB VMDK = 400GB vSAN)
RAID-5/6 (Erasure coding): Capacity
If the FTT is set to 1, the consumption is 1.33x. (100GB VMDK = 133GB vSAN)
If the FTT is set to 2, the consumption is 1.50x. (100GB VMDK = 150GB vSAN)
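The consumption factors listed above can be captured in a small lookup table. This is a minimal sketch: the table and function names are made up for illustration, and the multipliers are the rounded values quoted above, not values read from vSAN.

```python
# Consumption multipliers per (failure tolerance method, failures to tolerate),
# taken from the list above.
CONSUMPTION_MULTIPLIER = {
    ("RAID-1", 0): 1.0,
    ("RAID-1", 1): 2.0,
    ("RAID-1", 2): 3.0,
    ("RAID-1", 3): 4.0,
    ("RAID-5/6", 1): 1.33,
    ("RAID-5/6", 2): 1.5,
}


def vsan_consumption_gb(vmdk_gb: float, ftm: str, ftt: int) -> float:
    """Return the vSAN capacity consumed by a VMDK under a given storage policy."""
    try:
        multiplier = CONSUMPTION_MULTIPLIER[(ftm, ftt)]
    except KeyError:
        raise ValueError(f"unsupported combination: {ftm} with FTT={ftt}") from None
    return vmdk_gb * multiplier


# A 100-GB VMDK under each supported policy.
for (ftm, ftt) in CONSUMPTION_MULTIPLIER:
    print(f"{ftm}, FTT={ftt}: {vsan_consumption_gb(100, ftm, ftt):.0f} GB")
```

Raising an error for unlisted combinations, such as RAID-5/6 with FTT=3, keeps the sketch from silently sizing a policy that vSAN does not offer.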
RAID-1 (Mirroring) versus RAID-5/6 (Erasure coding)
RAID-1 (Mirroring) in vSAN employs a 2n+1 host or fault domain algorithm, where n is the number of failures to tolerate (FTT).
RAID-5/6 (Erasure coding) in vSAN employs a 3+1 (RAID-5) or 4+2 (RAID-6) host or fault domain requirement, depending on one or two failures to tolerate respectively.
RAID-5/6 (Erasure coding) does not support three failures to tolerate. This setting is only available for RAID-1 (Mirroring).
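The host or fault domain requirements above translate into a simple rule. The following sketch uses a hypothetical function name and encodes only what is stated here: 2n+1 for RAID-1, four fault domains for RAID-5 (3+1), and six for RAID-6 (4+2).

```python
def min_fault_domains(ftm: str, ftt: int) -> int:
    """Minimum number of hosts or fault domains needed for a storage policy."""
    if ftm == "RAID-1":
        if ftt not in (0, 1, 2, 3):
            raise ValueError("RAID-1 supports FTT values from 0 to 3")
        return 2 * ftt + 1
    if ftm == "RAID-5/6":
        if ftt == 1:
            return 3 + 1  # RAID-5: three data components plus one parity
        if ftt == 2:
            return 4 + 2  # RAID-6: four data components plus two parity
        raise ValueError("RAID-5/6 supports only FTT=1 or FTT=2")
    raise ValueError(f"unknown failure tolerance method: {ftm}")


for ftm, ftt in [("RAID-1", 1), ("RAID-1", 2), ("RAID-1", 3),
                 ("RAID-5/6", 1), ("RAID-5/6", 2)]:
    print(f"{ftm}, FTT={ftt}: at least {min_fault_domains(ftm, ftt)} hosts")
```

For RAID-1 with FTT=1 the result is three, which matches the three-host minimum cluster size. Every other policy needs a larger cluster.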
Host reservation in vSAN reserves ESXi hosts in a vSAN cluster: the hosts participate in the cluster and act as dedicated failover resources during a failure or maintenance operation, according to the cluster’s availability requirements. This optional setting provides higher availability and resilience.
n+1 guarantees an extra host to satisfy the vSAN cluster availability requirement in terms of a host failure or maintenance operations.
n+2 guarantees two extra hosts to satisfy the vSAN cluster availability requirement to maintain an n+1 reservation at any point in time. No degradation in performance or availability occurs when a host fails or needs maintenance.
Deduplication and compression are most suitable for highly dedupable workloads, such as VDI full clones. On-disk format version 3.0 and later adds an extra overhead, typically no more than 1–2 percent capacity per device. Deduplication and compression with software checksum enabled require extra overhead of approximately 6.2 percent capacity per device.
Reduction ratios are based on VMware measurements that indicate how much space deduplication and compression can save on a vSAN cluster per workload type. Always adjust these ratios to the reduction ratios measured for your workloads in an on-premises environment as a reference value.
Reduction ratio indications:
General purpose VMs: 1.5
File services: 1.5
VDI full clone: 8
VDI instant clone: 2
VDI linked clone: 2.5
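One way to apply these indications in a sizing estimate is to divide the logical data size by the ratio. The sketch below assumes that a ratio of r means the data occupies 1/r of its logical size, and it ignores the per-device overhead mentioned earlier. The names are illustrative only.

```python
# Reduction ratio indications per workload type, from the list above.
REDUCTION_RATIO = {
    "General purpose VMs": 1.5,
    "File services": 1.5,
    "VDI full clone": 8.0,
    "VDI instant clone": 2.0,
    "VDI linked clone": 2.5,
}


def stored_size_gb(logical_gb: float, workload: str) -> float:
    """Estimate the on-disk size after deduplication and compression.

    Assumes a ratio of r reduces the data to 1/r of its logical size.
    """
    return logical_gb / REDUCTION_RATIO[workload]


for workload in REDUCTION_RATIO:
    print(f"{workload}: 1,000 GB logical -> {stored_size_gb(1000, workload):.0f} GB stored")
```

Replacing the table entries with ratios measured in your own environment follows the recommendation above to treat these values as indications only.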
Operations reserve defines the capacity needed for vSAN to perform internal operations, such as policy changes, rebalancing, and data movement.
vSAN provides the option to reserve the capacity in advance so that it has enough free space available to perform internal operations and repair data back to compliance following a single host failure. When you enable the reserved capacity, vSAN prevents you from using that space to create workloads, which preserves the capacity available in the cluster. By default, the reserved capacity is disabled.
If the vSAN cluster has enough free space, you can enable the operations reserve and the host rebuild reserve. Operations reserve is reserved space in the cluster for vSAN internal operations. Host rebuild reserve is the reserved space for vSAN to be able to repair in case of a single host failure.
Enabling host rebuild reserve demarcates one host’s worth of capacity in the cluster. Host rebuild reserve works on the principle of n+1. For example, in a four-node cluster of identical hardware configuration, host rebuild reserve would require 25 percent reserve capacity to ensure sufficient rebuild capacity. As a vSAN cluster increases in size, the host rebuild reserve decreases. For example, in an eight-node cluster of identical hardware configuration, host rebuild reserve requires 12.5 percent.
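The n+1 principle means the host rebuild reserve is one host out of the total host count. The following sketch, with hypothetical function names, computes that fraction and the raw capacity left after the reserve. It assumes identical hosts and does not model the operations reserve.

```python
def host_rebuild_reserve_fraction(hosts: int) -> float:
    """Fraction of cluster capacity set aside to rebuild one failed host."""
    if hosts < 3:
        raise ValueError("a vSAN standard cluster needs at least three hosts")
    return 1 / hosts


def capacity_after_rebuild_reserve_tb(host_raw_tb: float, hosts: int) -> float:
    """Raw cluster capacity minus one host's worth of rebuild reserve."""
    cluster_raw_tb = host_raw_tb * hosts
    return cluster_raw_tb * (1 - host_rebuild_reserve_fraction(hosts))


for hosts in (4, 8, 16):
    fraction = host_rebuild_reserve_fraction(hosts)
    remaining = capacity_after_rebuild_reserve_tb(40.74, hosts)  # Intel-based host
    print(f"{hosts} hosts: {fraction:.1%} reserved, {remaining:.2f} TB raw remaining")
```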
As a vSAN cluster scales horizontally, the vSAN capacity scales linearly with the vCPU and vMEM resources, which can leave vCPU and vMEM resources unused relative to the provided vSAN storage.
VMware vSphere high availability is an integral part of every vSphere deployment. It provides the ability to reserve cluster capacity for vCPU and vMEM resources if a host must go into maintenance mode or fails because of a hardware error, and it guarantees that all VMs in the cluster are provided with sufficient vCPU and vMEM resources. Typically, this high-availability reserve is calculated as a percentage based on the number of hosts in a cluster. In a three-node cluster of identical hardware configuration, the high-availability reserve requires 33 percent reserve capacity to ensure sufficient high-availability capacity. In an eight-node cluster of identical hardware configuration, it requires 12.5 percent. These examples show the high-availability reservation for an n+1 scenario. For n+2, the reservation doubles in size.
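The percentages above follow from dividing the number of spare hosts by the cluster size. The sketch below illustrates that arithmetic for n+1 and n+2. The function name and the spare_hosts parameter are assumptions made for this example, not vSphere settings.

```python
def ha_reserve_percent(hosts: int, spare_hosts: int = 1) -> float:
    """Percentage of vCPU and vMEM capacity reserved for high availability.

    spare_hosts is 1 for an n+1 scenario and 2 for an n+2 scenario.
    """
    if hosts < 3:
        raise ValueError("the minimum cluster size is three hosts")
    if not 0 <= spare_hosts < hosts:
        raise ValueError("spare_hosts must be smaller than the host count")
    return 100 * spare_hosts / hosts


for hosts in (3, 8):
    print(f"{hosts} hosts: n+1 = {ha_reserve_percent(hosts, 1):.1f} percent, "
          f"n+2 = {ha_reserve_percent(hosts, 2):.1f} percent")
```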
Unlike many on-premises vSAN clusters, Oracle Cloud VMware Solution is a scale-out only solution, meaning that you can’t scale up an ESXi host by adding disks to expand disk groups or create more disk groups. Oracle Cloud VMware Solution can only scale on the vSAN storage level by adding hosts to the cluster—also known as horizontal scaling. A cluster can scale from three hosts to a maximum of 64 hosts in a single SDDC.
Oracle recommends that VMware SDDCs deployed across availability domains within a region not exceed a maximum of 16 ESXi hosts.
Depending on the chosen bare metal shape (Intel or AMD), the scaling of a cluster is dictated by the required vCPU, vMEM, storage requirements (FTM and FTT), and potential storage saving techniques like deduplication and compression, depending on the workload type.
The following table shows the raw capacity of an eight-node Oracle Cloud VMware Solution vSAN cluster with a CPU to vCPU ratio of 1:2 without applying any storage saving techniques like deduplication and compression and without the operations and host rebuild reserve. The actual usable vSAN capacity is dictated by the FTM and FTT settings per VM object.
See Dense I/O Shapes for more detail.
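The raw totals of an eight-node cluster can be estimated from the three-host specifications given earlier. In this sketch, the per-host values are derived by dividing the three-host figures by three, the AMD entry uses the smallest core option, and the 1:2 CPU to vCPU ratio is read as two vCPUs per core. The class and function names are illustrative, and the output is a derived estimate, not published figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    cores: int
    memory_gb: int
    vsan_raw_tb: float


# Per-host values derived from the three-host cluster specifications.
INTEL_HOST = HostSpec(cores=52, memory_gb=768, vsan_raw_tb=40.74)
AMD_HOST_32_CORE = HostSpec(cores=32, memory_gb=2048, vsan_raw_tb=43.3)


def cluster_totals(host: HostSpec, hosts: int, vcpus_per_core: int = 2) -> dict:
    """Raw cluster totals without reserves or storage saving techniques."""
    cores = host.cores * hosts
    return {
        "cores": cores,
        "vcpus": cores * vcpus_per_core,
        "memory_gb": host.memory_gb * hosts,
        "vsan_raw_tb": round(host.vsan_raw_tb * hosts, 2),
    }


print("Intel, 8 hosts:", cluster_totals(INTEL_HOST, 8))
print("AMD (32-core), 8 hosts:", cluster_totals(AMD_HOST_32_CORE, 8))
```

Calling cluster_totals with three hosts reproduces the three-host figures, which is a quick sanity check on the derived per-host values.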
A useful tool to verify the sizing and scaling of VMware vSAN clusters is the VMware vSAN Sizer. It calculates a sizing for mixed-workload clusters with different FTM and FTT levels, taking into account VM CPU and memory requirements, n+1 or n+2 configurations, deduplication and compression, CPU to vCPU ratios, operations reserve, and host rebuild reserve.
This blog summarizes the general sizing and scaling principles of VMware vSAN on Oracle Cloud VMware Solution for both Intel- and AMD-based clusters. It demonstrates sizing and scaling options for customers who want to migrate workloads to Oracle Cloud VMware Solution, extend their local VMware SDDC to an Oracle Cloud VMware Solution SDDC, use Oracle Cloud VMware Solution for disaster recovery, or start with a greenfield Oracle Cloud VMware Solution deployment to host legacy workloads, VMware Tanzu, or VDI.
If you want to read more on this topic, the following links give you more insight about VMware on Oracle Cloud Infrastructure:
Thomas Thyen is a Cloud VMware Solutions Specialist within the Technical Cloud Engineering Organization at Oracle, mainly working on Oracle Cloud VMware Solution and its integration with the VMware product stack and third-party products. He is specialised in technologies like VMware vSphere, VMware vSAN, VMware NSX, VMware HCX, VMware vSphere Replication, VMware Site Recovery Manager, VMware vRealize Suite, VEEAM Backup & Replication, and Citrix Virtual Apps & Desktops.