How to reach the maximum disk I/O throughput with Windows OS instances on OCI

February 1, 2023 | 10 minute read
Marco Santucci
EMEA Enterprise Cloud Solution Architect

When maximum I/O disk performance is essential for your Windows-based production service, choosing the right configuration that can deliver it is critical. This post covers two options to achieve the max I/O throughput with a Windows OS-based instance on Oracle Cloud Infrastructure (OCI). Windows OS currently has several limitations for the shapes available and eligible with the OCI Block Volume service.

The final I/O performance of a server is a sum of several factors, not only the Compute shape. Choosing a bare metal shape doesn’t mean you automatically obtain the best I/O. Considering the cost of these instances and understanding how to reach max I/O with minimal cost are important for customers and companies of all kinds.

You have two main Block Volume attachment options to consider when you need reliable disk performance for your service in the OCI environment: iSCSI and paravirtualized. Local NVMe storage is also available but isn’t considered persistent. Block Volume offers more long-term reliability and scalability.

The max I/O throughput that an instance can reach depends not only on the block volume itself, but also on the following components:

  • Network: The total network bandwidth provided by the network interface cards (NICs) available on the instance. Block volumes are attached to the Compute instance over the network.

  • Attachment type: iSCSI and paravirtualized attachments deliver different levels of performance.

  • Block volume performance: Volume performance units (VPUs) affect the performance that the block volume can deliver.

  • oCPU: The compute capacity of the instance can affect the performance too.

The following table shows the storage services available in OCI with some use cases to understand which storage is more suitable for your service.

|  | Local NVMe | Block Volume | File Storage | Object Storage | Archive Storage |
| --- | --- | --- | --- | --- | --- |
| Type | NVMe SSD-based temporary storage | NVMe SSD-based block storage | NFSv3 compatible file system | Highly durable object storage | Long-term archival and backup |
| Access | Block | Block | File | Object | Object |
| Structure | Block level structured | Block level structured | Hierarchical | Unstructured | Unstructured |
| Durability | Nonpersistent and survives reboots | Durable (Multiple copies in an availability domain) | Durable (Multiple copies in an availability domain) | Multiple across availability domains | Multiple across availability domains |
| Capacity | Terabytes+ | Petabytes+ | Exabytes+ | Petabytes+ | Petabytes+ |
| Unit Size | 51.2 TB for bare metal, 6.4-25.6 TB for virtual machines (VMs) | 50 GB to 32 TB/vol with 32 vols/instance | Up to 8 Exabyte | 10 TB/object | 10 TB/object |
| Use cases | OLTP, NoSQL, Data warehousing | Database, VMFS, NTFS, boot, and data disks for instances | Oracle apps (E-Business Suite), high-performance computing, general purpose file systems | Unstructured data, including logs, images, and videos | Backups and long-term archival (Database backups) |

So, if you want to ensure that, for example, your SQL Server has all the performance, scalability, and reliability that production requires, choosing the right infrastructure is important.

Option 1: Ultra high-performance block volume paravirtualized attachment

Currently in OCI, iSCSI-attached UHP block volumes aren’t supported by Windows OS images. The only way to take advantage of UHP Block Volume is by using the paravirtualized attachment. This option has the following limitations:

  • Multipath-enabled attachments are required to optimize performance with volumes configured for ultra high performance, and only one volume at a time can be attached to an instance with a paravirtualized multipath-enabled attachment.

  • Multipath-enabled attachments have several prerequisites and requirements to meet.

  • The maximum block volume size is 32 TB.

The following table is an example of the performance reachable with a paravirtualized UHP 120 VPU Block Volume attachment on a 16 oCPU VM.Standard.E4.Flex instance. Sixteen oCPUs is the minimum required to support UHP block volumes.

| Workload (5 GB test file on D:) | MiB/s | I/O per second |
| --- | --- | --- |
| 64K_Random/READ | 1,891.02 | 30,256.24 |
| 64K_Random/WRITE | 1,879.58 | 30,073.23 |
| 64K_Sequential/READ | 1,889.94 | 30,238.97 |
| 64K_Sequential/WRITE | 1,879.01 | 30,064.17 |
| 64K_Random/R40%_W60% | 2,612.98 | 41,807.65 |
| 64K_Sequential/R40%_W60% | 2,617.78 | 41,884.44 |

A UHP volume can deliver up to 2,680 MB/s, and the mixed read/write results show that the paravirtualized attachment can get close to this per-volume limit.
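As a quick sanity check on the table above, throughput and IOPS are tied together by the block size: with 64K blocks, each MiB/s corresponds to 16 I/Os per second. The following Python sketch assumes that "64K" means 65,536 bytes and that a MiB is 1,048,576 bytes. The function name is illustrative and not part of any tool.

```python
# Relationship between throughput and IOPS at a fixed block size.
# Assumptions: "64K" is 64 KiB (65,536 bytes), 1 MiB is 1,048,576 bytes.
BLOCK_SIZE_BYTES = 64 * 1024
MIB = 1024 * 1024


def iops_from_throughput(mib_per_s: float, block_size: int = BLOCK_SIZE_BYTES) -> float:
    """Return the I/O operations per second implied by a throughput in MiB/s."""
    return mib_per_s * MIB / block_size


# Rows copied from the Option 1 table: (workload, MiB/s, reported IOPS).
OPTION_1_ROWS = [
    ("64K_Random/READ", 1891.02, 30256.24),
    ("64K_Random/WRITE", 1879.58, 30073.23),
    ("64K_Sequential/READ", 1889.94, 30238.97),
    ("64K_Sequential/WRITE", 1879.01, 30064.17),
    ("64K_Random/R40%_W60%", 2612.98, 41807.65),
    ("64K_Sequential/R40%_W60%", 2617.78, 41884.44),
]

if __name__ == "__main__":
    for name, mib_s, reported in OPTION_1_ROWS:
        derived = iops_from_throughput(mib_s)
        print(f"{name}: derived {derived:,.0f} IOPS, reported {reported:,.0f} IOPS")
```

The derived values agree with the reported IOPS to within rounding, which is a useful check when you record your own test results.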

Option 2: Multiple high-performance block volumes iSCSI attachment

If a paravirtualized attachment with a UHP block volume can’t fulfill your needs for performance or cost, another option is to create a storage pool of multiple iSCSI-attached block volumes (VPU 20) to further increase the performance reachable on a Windows OS instance. This option has the following limitations:

  • In Windows, you can’t extend a disk that’s part of a storage pool, unlike with the paravirtualized attachment, so carefully consider your yearly growth rate for storage and size the block volumes according to your needs.

  • You can add more block volumes to the pool, but you can’t increase or extend your previous logical partition with these block volumes. You must create a new virtual disk.

Use the following steps to achieve the absolute max iSCSI I/O throughput on a Windows OS instance:

  1. Choose a shape that balances cost and performance and offers high network bandwidth. For example, the VM.Optimized3.Flex provides 4 Gbps per oCPU (up to 40 Gbps). Standard Compute shapes provide 1 Gbps per oCPU.

  2. Create a set of high-performance VPU20 block volumes, according to the max performance provided by the NIC bandwidth.

    Set each block volume’s size to at least the minimum required to reach its best performance, according to the total storage size that you need. For example, for a high-performance VPU20 block volume to achieve 50,000 IOPS with a throughput of 680 MB/s, the minimum size per volume is 1,134 GB. The OCI reference on shapes and block volume performance can assist in setting the correct size.

    Deploy a set of 8–10 block volumes to achieve the max performance provided by the Compute shape.

  3. Attach all the VPU20 block volumes that you created to your instance with iSCSI.

  4. In Windows OS, create a storage pool and virtual disk with all the block volumes previously attached.

  5. Test your configuration with IoMeter or DISKSPD to check the instance performance. The I/O throughput can change in relation to the type of workload running on the configuration. Simulate and test a workload that matches the production service that you intend to run on this configuration, such as its read/write percentage and whether its access pattern is random or sequential.
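The six workloads in this post's tables (64K blocks, random or sequential, read-only, write-only, or 40% read and 60% write) can be expressed as DISKSPD command lines. The following Python sketch only builds the argument lists and doesn't run anything. The flags used (-c, -b, -d, -t, -o, -w, -r, -Sh, -L) are standard DISKSPD options, but the duration, thread count, queue depth, and test file path are assumptions for illustration and not the settings used to produce the results in this post.

```python
from typing import Dict, List, Tuple

# Workload name -> (write percentage, random access?), matching the table rows.
WORKLOADS: Dict[str, Tuple[int, bool]] = {
    "64K_Random/READ": (0, True),
    "64K_Random/WRITE": (100, True),
    "64K_Sequential/READ": (0, False),
    "64K_Sequential/WRITE": (100, False),
    "64K_Random/R40%_W60%": (60, True),
    "64K_Sequential/R40%_W60%": (60, False),
}


def diskspd_args(
    write_pct: int,
    random_io: bool,
    target: str = r"D:\testfile.dat",  # hypothetical test file path
    file_size: str = "5G",             # 5 GB file, as in the tables
    block: str = "64K",
    duration_s: int = 60,              # assumed test duration
    threads: int = 4,                  # assumed threads per target
    outstanding: int = 32,             # assumed outstanding I/Os per thread
) -> List[str]:
    """Build a DISKSPD argument list for one workload (nothing is executed)."""
    if not 0 <= write_pct <= 100:
        raise ValueError("write_pct must be between 0 and 100")
    args = [
        "diskspd.exe",
        f"-c{file_size}",    # create the test file with this size
        f"-b{block}",        # block size
        f"-d{duration_s}",   # duration in seconds
        f"-t{threads}",      # threads per target
        f"-o{outstanding}",  # outstanding I/Os per target per thread
        f"-w{write_pct}",    # percentage of writes (0 means read-only)
        "-Sh",               # disable software caching and hardware write caching
        "-L",                # collect latency statistics
    ]
    if random_io:
        args.append("-r")    # random I/O; DISKSPD is sequential without it
    args.append(target)
    return args


if __name__ == "__main__":
    for name, (write_pct, random_io) in WORKLOADS.items():
        print(name, "->", " ".join(diskspd_args(write_pct, random_io)))
```

Run each generated command on the instance against the virtual disk created from the storage pool, and adjust the thread count and queue depth until throughput stops improving.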

The following table provides a performance reference for a 10-oCPU VM.Optimized3.Flex instance with ten VPU20 block volumes of 1,200 GB each, attached with iSCSI:

| Workload (5 GB test file on D:) | MiB/s | I/O per second |
| --- | --- | --- |
| 64K_Random/READ | 4,070 | 65,121 |
| 64K_Random/WRITE | 3,416 | 54,656 |
| 64K_Sequential/READ | 4,592 | 73,476 |
| 64K_Sequential/WRITE | 3,850 | 61,606 |
| 64K_Random/R40%_W60% | 3,621 | 55,254 |
| 64K_Sequential/R40%_W60% | 4,044 | 61,715 |

With this method, depending on the type of workload, you can reach approximately 4.5 GB/s, which is close to the maximum I/O throughput that the network bandwidth can provide (40 Gbit/s).
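The sizing logic behind steps 1 and 2 can be sketched as a short calculation. It assumes the per-GB scaling for VPU20 volumes of 75 IOPS and 600 KB/s per GB, capped at 50,000 IOPS and 680 MB/s per volume. Only the caps and the resulting 1,134 GB figure appear in this post, so verify the per-GB rates against the current OCI Block Volume documentation. It also treats 1 Gbps as 125 MB/s and ignores protocol overhead, so the network figure is an upper bound.

```python
import math

# Assumed VPU20 (higher performance) scaling; check against current OCI docs.
VPU20_IOPS_PER_GB = 75
VPU20_KBPS_PER_GB = 600
VPU20_MAX_IOPS = 50_000   # per-volume cap quoted in this post
VPU20_MAX_MBPS = 680      # per-volume cap quoted in this post


def min_volume_size_gb() -> int:
    """Smallest VPU20 volume size that reaches both per-volume caps."""
    size_for_iops = math.ceil(VPU20_MAX_IOPS / VPU20_IOPS_PER_GB)
    size_for_throughput = math.ceil(VPU20_MAX_MBPS * 1000 / VPU20_KBPS_PER_GB)
    return max(size_for_iops, size_for_throughput)


def nic_limit_mbps(bandwidth_gbps: float) -> float:
    """Theoretical NIC throughput in MB/s (no protocol overhead)."""
    return bandwidth_gbps * 1000 / 8


def volumes_to_saturate(bandwidth_gbps: float) -> int:
    """Number of full-speed VPU20 volumes needed to fill the NIC bandwidth."""
    return math.ceil(nic_limit_mbps(bandwidth_gbps) / VPU20_MAX_MBPS)


if __name__ == "__main__":
    size = min_volume_size_gb()
    count = volumes_to_saturate(40)  # 10-oCPU VM.Optimized3.Flex: 40 Gbps
    print(f"Minimum size per volume: {size} GB")
    print(f"Volumes needed for 40 Gbps: {count} ({count * size} GB total)")
```

Under these assumptions, throughput is the binding constraint on volume size, and a 40 Gbps instance needs eight volumes, which matches the lower end of the 8–10 range recommended above.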

Considerations

What type of instance or storage attachment should you choose for a Windows OS workload that needs high I/O throughput? To make the right choice for your workload, consider the following factors:

Cost

The UHP block volume (VPU120 in the example) is more expensive than an HP VPU20 block volume, but if you need it to reach a high I/O throughput and the size of your data is small, the cost of UHP can be quite affordable. Consider using performance-based autotuning for OCI Block Volume to keep the costs as low as possible.

Remember that attaching a UHP volume requires a shape with at least 16 oCPUs, and the Windows OS license is calculated per oCPU. On the other hand, to obtain a high I/O throughput with a set of VPU20 HP block volumes, you have to allocate the minimum block volume size that reaches max performance (1,134 GB) and deploy a set of 8–10 block volumes to achieve the top performance of the instance. In this case, the minimum storage allocated for this configuration is approximately 9–12 TB.

Max performance reachable

If you need to reach the max performance possible with Windows OS, excluding the dense shapes with local NVMe storage, the only way to achieve it is with a set of iSCSI VPU20 block volumes used together in a Windows storage pool.

Windows’ volume expandability

If you want the flexibility to expand your virtual volume using Disk Management in Windows OS, the only option is to use the single UHP block volume with a paravirtualized attachment (Option 1).

Max storage dimension attachable

The maximum size of a block volume is 32 TB. With a UHP paravirtualized block volume, you can attach only one block volume. With iSCSI attachment, you can attach up to 32 block volumes of 32 TB each, approximately 1 PB in total.

Deployment difficulty

The UHP paravirtualized block volume is much simpler to set up and deploy than the configuration with multiple iSCSI block volumes attached.

Figure: A table comparing an instance with a paravirtualized block volume against an instance with multiple iSCSI block volumes.

Examples of price-performance configurations

In the following examples, the costs are indicative and based on the price list of December 2022. The I/O throughput can change in relation to the type of workload running on the instance.

  • The cost of the minimum configuration with one paravirtualized UHP VPU90 block volume that can achieve around 2,000 MB/s of throughput is approximately €1,600/month.

    • 16 oCPU VM.Standard.E4.Flex

    • 64 GB RAM

    • One UHP VPU90 block volume of 1,430 GB

  • The cost of the configuration with one paravirtualized UHP VPU120 block volume that can achieve the max throughput (2,700 MB/s) is approximately €2,328/month.

    • 24 oCPU VM.Standard.E4.Flex

    • 64 GB RAM

    • One UHP VPU120 block volume of 1,490 GB

  • The cost of the minimum configuration with five iSCSI HP VPU20 block volumes that can achieve around 2,000 MB/s of throughput is approximately €534/month.

    • 4 oCPU VM.Optimized3.Flex

    • 64 GB RAM

    • Five HP VPU20 block volumes of 1,134 GB each (5,670 GB total)

  • The cost of the configuration with eight iSCSI HP VPU20 block volumes that can achieve the max throughput (4,500 MB/s) is approximately €1,579/month.

    • 10 oCPU VM.Optimized3.Flex

    • 64 GB RAM

    • Eight HP VPU20 block volumes of 1,134 GB each (9,072 GB total)
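To compare the four configurations above on a common basis, you can divide the monthly cost by the throughput that each one reaches. The following Python sketch uses only the indicative December 2022 figures listed above. The cost-per-MB/s metric and the labels are illustrative choices, not an official pricing measure.

```python
from typing import List, Tuple

# (label, indicative monthly cost in EUR, approximate throughput in MB/s),
# copied from the four example configurations in this post.
CONFIGS: List[Tuple[str, float, float]] = [
    ("1 x UHP VPU90, paravirtualized, 16 oCPU E4.Flex", 1600, 2000),
    ("1 x UHP VPU120, paravirtualized, 24 oCPU E4.Flex", 2328, 2700),
    ("5 x HP VPU20, iSCSI, 4 oCPU Optimized3.Flex", 534, 2000),
    ("8 x HP VPU20, iSCSI, 10 oCPU Optimized3.Flex", 1579, 4500),
]


def eur_per_mbps(monthly_cost_eur: float, throughput_mbps: float) -> float:
    """Monthly cost in EUR for each MB/s of throughput."""
    return monthly_cost_eur / throughput_mbps


def cheapest_per_mbps(configs: List[Tuple[str, float, float]]) -> str:
    """Label of the configuration with the lowest cost per MB/s."""
    return min(configs, key=lambda c: eur_per_mbps(c[1], c[2]))[0]


if __name__ == "__main__":
    for label, cost, throughput in CONFIGS:
        print(f"{label}: {eur_per_mbps(cost, throughput):.2f} EUR per MB/s")
    print("Lowest cost per MB/s:", cheapest_per_mbps(CONFIGS))
```

On these figures, the iSCSI configurations cost less per MB/s than the UHP ones, but the metric ignores the extra storage capacity they include and the deployment and expandability trade-offs discussed earlier.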

Conclusion

Deciding which configuration is best for your workload isn’t always easy. Option 2 shows the absolute max I/O achievable with the Oracle Cloud Infrastructure Block Volume service in Windows OS, even if you choose a bare metal instance. If you need this kind of performance, replicate it in your tenancy. If its reduced flexibility is a problem, use the information in this post to choose a configuration that better fulfills the requirements of your service.

 

