Thursday Jul 01, 2010

Partition Alignment Guidelines for Unified Storage

If you create and access logical disks (aka LUNs) from your Sun Unified Storage appliance, whether over iSCSI or Fibre Channel, you should be aware that client-side partition alignment can have a big impact on performance. This is a generic issue that applies to any virtual disk interface, not just Unified Storage, and relates to how client-generated virtual disk I/O maps to actual I/O in the appliance. The good news is that it can be quite easy to properly align partitions.

Background

The reason we care about alignment is that most of the storage industry is based on the historical abstraction of 512 byte sectors. However, the consumers of those sectors (filesystems and applications managing raw storage) and the sophisticated storage arrays that provide block storage generally organize their data internally in larger units, typically 4KB or multiples thereof. This includes LUNs in a Sun Unified Storage appliance, which use a default volume block size of 8KB. Without proper care, filesystem blocks can end up misaligned with the natural block size of the storage, impacting performance.

With proper alignment, a single client block that is the same size as or smaller than the volume block size of a LUN will be contained entirely within a single volume block in the LUN. Without proper alignment, that same client block may span two volume blocks in the LUN. That could result in 2 appliance reads for a single client read, and 2 appliance reads plus 2 appliance writes for a single client write. If ignored, this can multiply the I/O performed by the appliance and significantly reduce performance.

The graphic below illustrates partition misalignment:

What we see in the graphic above is that the LUN is divided into fixed-size volume blocks, which are 8KB by default in the Unified Storage appliance. A given volume block in the LUN will always be read or written as a whole. When a LUN is imported by the client, it is presented as if it were a physical disk drive, with its own virtual sector size of 512 bytes. The client generally allocates sectors to one or more partitions or slices, which are then made available for file systems or raw application data. The block addresses of application or file system I/O are relative to the first sector in the partition, and the partition may be allowed to start on an arbitrary 512 byte sector.

In the example above, partition block P0 starts on an arbitrary sector, and spans LUN volume blocks L0 and L1. If we issue an 8KB read to P0, which matches the default 8KB volume block size of the LUN, we will have to read both L0 and L1 to get the data. If we issue an 8KB write to P0, we will have to read both L0 and L1 to get the data that is not being changed, then write back both L0 and L1 to store the combined new and old data.

The graphic below illustrates a properly aligned partition:

We now see that partition block P0 starts on a 512 byte sector that coincides with the start of LUN volume block L1. If we issue an 8KB read to P0, which matches the default 8KB volume block size of the LUN, we only have to read L1 to get the data. If we issue an 8KB write to P0, we simply replace LUN volume block L1 and do not need to do any reads at all. We have skipped a small amount of space in the LUN, but the result is a potentially large reduction in I/O.
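
The effect described above can be checked with a little arithmetic. The following is a minimal Python sketch, not appliance code: the function name `volume_blocks_touched` and the example starting sectors (5 and 16) are assumptions chosen to mirror the L0/L1 example, using 512 byte sectors and the default 8KB volume block size.

```python
SECTOR_BYTES = 512  # virtual sector size presented by the LUN


def volume_blocks_touched(partition_start_sector, io_offset_bytes, io_size_bytes,
                          volume_block_bytes=8192):
    """Return the indices of the LUN volume blocks covered by one client I/O.

    io_offset_bytes is relative to the start of the partition, as it would be
    for file system or application I/O.
    """
    # Translate the partition-relative offset into an absolute byte offset on the LUN.
    start = partition_start_sector * SECTOR_BYTES + io_offset_bytes
    end = start + io_size_bytes - 1  # last byte touched by the I/O
    first_block = start // volume_block_bytes
    last_block = end // volume_block_bytes
    return list(range(first_block, last_block + 1))


# Hypothetical misaligned partition starting on sector 5:
# an 8KB I/O to partition block P0 spans volume blocks 0 and 1.
print(volume_blocks_touched(5, 0, 8192))   # [0, 1]

# Hypothetical aligned partition starting on sector 16:
# the same I/O is contained entirely in volume block 1.
print(volume_blocks_touched(16, 0, 8192))  # [1]
```

In the misaligned case every 8KB client I/O touches two volume blocks, which is where the extra appliance reads and writes come from.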

There are three primary issues that lead to partition misalignment:

  • Most platforms consume some number of sectors at the beginning of a disk for a disk label, and actual data storage must skip these sectors to avoid overwriting the label.
  • Most partition management software, such as format, parted, and diskpart, was written to manage physical disks with 512 byte sectors. If there are constraints on how sectors are grouped into partitions, they typically relate to disk characteristics like heads, tracks, and cylinders, not virtual characteristics like user selected volume block sizes.
  • Most platforms allow you to create multiple partitions or slices within a disk, so even if the first partition is aligned, subsequent partitions may not be aligned.

Taken together, these factors mean that block zero of a given partition may map to an arbitrary 512 byte sector on a virtual disk, and for most platforms your partitions will not be aligned on LUN volume block size boundaries by default.

General Recommendations

If possible, use a disk label that allows sector addressing rather than cylinder addressing for partition/slice locations. This allows for simple math when calculating alignment.

If possible, create a single data partition/slice on the LUN, especially if you must use cylinder addressing. This avoids having to calculate alignment at multiple points within the LUN.

If you are offered an explicit alignment option by your disk partitioning software, use it. This currently only applies to Windows 2003 SP1 or later, where the diskpart.exe utility allows an "align=X" option on the create partition command, where X is the desired alignment in kilobytes. You should specify an alignment that either matches the volume block size in the LUN, or is a power of two and is larger than the volume block size.

Aligning by Sector

To manually calculate partition alignment by sector, make sure that the starting sector number of each partition is a multiple of the number of sectors in a LUN volume block. For example, with 512 byte sectors, there are 16 sectors in a default 8KB LUN volume block. In that case, the starting sector of each partition/slice should be a multiple of 16. The maximum volume block size for LUNs in the Sun Unified Storage appliance is currently 128KB, and there are 256 sectors in a 128KB volume block. For a 128KB volume block size, the starting sector of each partition/slice should be a multiple of 256.

If you are aligned for a power of two volume block size, you are also aligned for any smaller power of two volume block size. All supported volume block sizes in the Sun Unified Storage appliance are powers of two, so aligning for the maximum 128KB volume block size (i.e., starting partitions on multiples of 256 sectors) ensures alignment for all currently supported LUN volume block sizes.
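
The sector arithmetic above can be expressed as a short Python sketch. The helper names (`sectors_per_volume_block`, `is_sector_aligned`, `next_aligned_sector`) are hypothetical and exist only for illustration; the code assumes 512 byte sectors and a volume block size given in kilobytes.

```python
SECTOR_BYTES = 512


def sectors_per_volume_block(volume_block_kb):
    """Number of 512 byte sectors in one LUN volume block."""
    return volume_block_kb * 1024 // SECTOR_BYTES


def is_sector_aligned(start_sector, volume_block_kb=8):
    """True if a partition starting on start_sector is volume block aligned."""
    return start_sector % sectors_per_volume_block(volume_block_kb) == 0


def next_aligned_sector(min_sector, volume_block_kb=128):
    """Smallest aligned sector that is greater than or equal to min_sector."""
    n = sectors_per_volume_block(volume_block_kb)
    return -(-min_sector // n) * n  # ceiling division, then scale back up


print(sectors_per_volume_block(8))     # 16
print(sectors_per_volume_block(128))   # 256
print(is_sector_aligned(63, 8))        # False
print(is_sector_aligned(256, 8))       # True: aligned for 128KB implies aligned for 8KB
print(next_aligned_sector(34, 128))    # 256
```

The last call illustrates one way to pick a starting sector that clears a label occupying sectors 0 through 33 while staying aligned for every currently supported volume block size.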

Aligning by Cylinder

If you use a disk label that requires partitions/slices to begin on a cylinder boundary (for example, Solaris SMI labels), make sure that the starting cylinder number multiplied by the number of sectors per cylinder is a multiple of the number of sectors per LUN volume block.

The following Least Common Multiple (LCM) method can simplify the process:

  • Determine sectors per cylinder. In Solaris format, this is nhead * nsect. In Linux fdisk, this is heads * sectors/track.
  • Determine sectors per LUN volume block. There are two 512 byte sectors per kilobyte, so an 8KB volume block is 16 sectors, and a 128KB volume block is 256 sectors.
  • Find the LCM of the number of sectors per cylinder and the number of sectors per LUN volume block, for example by using a tool like http://www.mathsisfun.com/least-common-multiple-tool.html
  • Divide the LCM by the number of sectors per cylinder.
  • The result is the first non-zero cylinder that is aligned for your volume block size. Any cylinder that is a multiple of this number is also aligned.

For example, with 255 heads and 63 sectors per track, we have 16065 sectors per cylinder. With an 8KB LUN volume block size, we have 16 sectors per volume block. The LCM of 16065 and 16 is 257040. Dividing the LCM by 16065 (sectors per cylinder) gives us 16. Cylinder 16 is the first non-zero cylinder that is aligned for an 8KB LUN volume block, and any cylinder that is a multiple of 16 is also aligned.
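
The LCM method can also be scripted instead of using an online tool. This is a minimal Python sketch under the same assumptions as the steps above (512 byte sectors, volume block size in kilobytes); the function name `first_aligned_cylinder` is hypothetical, and the 64 head, 32 sector geometry in the last call is an illustrative value, not something reported by a particular system.

```python
from math import gcd


def first_aligned_cylinder(heads, sectors_per_track, volume_block_kb=8):
    """First non-zero cylinder that starts on a LUN volume block boundary."""
    sectors_per_cylinder = heads * sectors_per_track
    sectors_per_block = volume_block_kb * 2  # two 512 byte sectors per kilobyte
    # LCM(a, b) = a * b / GCD(a, b)
    lcm = (sectors_per_cylinder * sectors_per_block
           // gcd(sectors_per_cylinder, sectors_per_block))
    return lcm // sectors_per_cylinder


cyl = first_aligned_cylinder(255, 63, 8)
print(cyl)                 # 16
print(cyl * 255 * 63)      # 257040, the starting sector of that cylinder

print(first_aligned_cylinder(255, 63, 128))  # 256
print(first_aligned_cylinder(64, 32, 8))     # 1, every cylinder is aligned
```

Multiplying the resulting cylinder number by the number of sectors per cylinder gives the corresponding starting sector, which is useful for tools that accept sector units.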

Caveats

  • Do not use sector 0 of an MBR/msdos labeled LUN, or sectors 0 through 33 of an EFI/gpt labeled LUN, to avoid overwriting the label.
  • Do not trust cylinder numbers reported by Linux fdisk or parted, because both may be rounded to the nearest cylinder. As described in the Linux specific section below, set units to sectors in both tools to verify alignment.
  • Do not trust KB offsets reported by Windows diskpart.exe, because they may be rounded to the nearest KB. As described in the Windows specific section below, you can use the wmic.exe utility to display actual byte offsets.
  • Do not trust cylinder numbers reported by Solaris fdisk on x86/amd64/x86_64 in interactive mode, because they may be rounded to the nearest cylinder. As described in the Solaris on x86/amd64/x86_64 section below, you can run "fdisk -W - {raw_device}" and use the reported Rsect (relative starting sector) to verify alignment. Note that Solaris fdisk will only create cylinder aligned partitions, so this issue relates primarily to reporting the location of partitions created by another mechanism.
  • If you use an SMI label with Solaris on x86/amd64/x86_64, keep in mind that the SMI label subdivides a partition within an MBR/msdos labeled LUN, so there are two levels of alignment to consider. See the Solaris on x86/amd64/x86_64 section for details.

Platform Specific Recommendations

Solaris on SPARC

If possible, use an EFI label (requires "format -e") which allows sector addressing. Configure data slices with a starting sector that is a multiple of the number of 512 byte sectors per LUN volume block. With a default volume block size of 8KB, the starting sector of each slice should be a multiple of 16. With any currently supported volume block size up to 128KB, a slice can begin on a sector that is a multiple of 256.

If you use an EFI label, ensure that sectors 0 through 33 are not assigned to any slice, to avoid overwriting the label.

If you use an SMI label, you will be constrained to begin all slices on a cylinder boundary. To determine whether a cylinder is aligned on a LUN volume block boundary, multiply the cylinder number by the number of 512 byte sectors per cylinder. The result should be a multiple of the number of sectors per LUN volume block.

Refer to the Aligning by Cylinder section above for a Least Common Multiple method you can use to determine cylinder alignment.

Solaris on x86/amd64/x86_64

If possible, use an EFI label (requires "format -e") which allows sector addressing. However, be aware that unlike an SMI label, which subdivides a partition within an MBR/msdos labeled LUN when used on x86/amd64/x86_64, an EFI label replaces any existing MBR/msdos label, destroying any existing non-Solaris partitions.

If using an EFI label, use the same EFI guidelines as those described above in the Solaris on SPARC section.

If you use an SMI label with Solaris on x86/amd64/x86_64, keep in mind that the SMI label subdivides a Solaris2 partition within an MBR/msdos labeled LUN, so there are two levels of alignment to consider. The Solaris fdisk utility will report partitions relative to the beginning of the disk/LUN, and the Solaris format utility will report slices relative to the beginning of the Solaris2 partition.

One caveat with fdisk is that in interactive mode it will only create cylinder aligned partitions, but will also report partition starting points rounded to the nearest cylinder if they were created by another mechanism and are not actually cylinder aligned.

To confirm that a Solaris2 fdisk partition starts on a cylinder boundary, run "fdisk -W - {raw_device}" and verify that the reported Rsect (relative starting sector) is a multiple of the number of sectors per cylinder.

The simplest alignment method for SMI on x86/amd64/x86_64 is to ensure that the Solaris2 partition created/reported by fdisk is on a non-zero cylinder boundary that is aligned for your LUN volume block size. You can then use the same guidelines as those described above in the Solaris on SPARC section to align slices within the Solaris2 partition using the format utility.
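
Because slices are reported relative to the Solaris2 partition, what ultimately matters is the absolute starting sector of the slice on the LUN. The following Python sketch illustrates that two-level calculation. It is an assumption-laden illustration, not a Solaris tool: the function name `slice_is_aligned` is hypothetical, the geometry values are example numbers, and it assumes the same sectors-per-cylinder value applies at both the fdisk and format levels.

```python
def slice_is_aligned(rsect, slice_start_cylinder, sectors_per_cylinder,
                     volume_block_kb=8):
    """Check alignment of an SMI slice inside a Solaris2 fdisk partition.

    rsect: relative starting sector of the Solaris2 partition on the LUN
    slice_start_cylinder: starting cylinder of the slice, relative to the
        beginning of the Solaris2 partition
    """
    sectors_per_block = volume_block_kb * 2  # two 512 byte sectors per kilobyte
    # Absolute start of the slice = partition start + offset within the partition.
    absolute_start = rsect + slice_start_cylinder * sectors_per_cylinder
    return absolute_start % sectors_per_block == 0


SECTORS_PER_CYL = 255 * 63  # 16065, example geometry

# Partition on aligned cylinder 16, slice on cylinder 16 within it: aligned.
print(slice_is_aligned(16 * SECTORS_PER_CYL, 16, SECTORS_PER_CYL))  # True

# Partition on cylinder 1 (not aligned), slice on cylinder 16 within it: not aligned.
print(slice_is_aligned(1 * SECTORS_PER_CYL, 16, SECTORS_PER_CYL))   # False
```

Starting the Solaris2 partition on an aligned cylinder, as recommended above, means the slice offsets can be checked on their own without having to add the partition offset each time.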

Refer to the Aligning by Cylinder section above for a Least Common Multiple method you can use to determine cylinder alignment.

Linux

Make sure that units is set to sectors when creating or displaying partitions in fdisk and/or parted. If using fdisk in interactive mode, the "u" command toggles units back and forth between sectors and cylinders. If using parted in interactive mode, the "units s" command sets units to sectors.

If you use either tool with units set to cylinders, the reported cylinder numbers may be rounded. Even if you do the math to determine a cylinder that should be aligned, you cannot be sure that you are actually aligned unless you set units to sectors.

To ensure alignment, configure data partitions with a starting sector that is a multiple of the number of 512 byte sectors per LUN volume block. With a default volume block size of 8KB, the starting sector of each partition should be a multiple of 16. With any currently supported volume block size up to 128KB, a partition can begin on a sector that is a multiple of 256.

If you would like to choose a sector that is aligned for your LUN volume block and is also on a cylinder boundary, refer to the Aligning by Cylinder section above for a Least Common Multiple method you can use to determine cylinder alignment. After determining an aligned cylinder, multiply the cylinder number by the number of sectors per cylinder, and use the result as your starting sector number.

If you use a gpt label (equivalent to an EFI label in Solaris), ensure that sectors 0 through 33 are not assigned to any partition, to avoid overwriting the label.

If you use an MBR (aka msdos) label, ensure that sector 0 is not assigned to any partition, to avoid overwriting the label.

Windows

For Windows 2003 SP1 and later, the diskpart.exe utility can be used to create aligned partitions by including the align=X option on the create partition command, where X is the desired alignment in kilobytes. To create an aligned partition, simply specify a power of two alignment that is greater than or equal to the LUN volume block size. For example, use align=128 to align for any LUN volume block size up to 128 KB. The default in Windows Vista and Windows 2008 is align=1024, which is correctly aligned for any power of two LUN volume block size up to 1MB, and does not need to be changed.

A caveat with the diskpart.exe utility is that it displays the offset in KB, but this is a rounded value. For example, a default Windows 2003 partition offset of 63 sectors is actually 31.5 KB, but will be displayed by diskpart.exe as 32 KB.

To determine the actual byte offset of partitions in Windows, you can use the wmic.exe utility, with a command like:

wmic partition get StartingOffset, Name, Index

This will show partition information for all of the basic disks/LUNs in the system, with StartingOffset specified in bytes. For proper alignment, StartingOffset should be a multiple of the number of bytes (not sectors) in the LUN volume block. For example, with a default 8KB LUN volume block size, StartingOffset should be a multiple of 8192.
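
The byte offset check is simple enough to script. This is a minimal Python sketch, assuming you copy the StartingOffset value by hand; the function name `offset_is_aligned` is hypothetical and the code does not parse wmic.exe output.

```python
def offset_is_aligned(starting_offset_bytes, volume_block_kb=8):
    """True if a partition byte offset is a multiple of the LUN volume block size."""
    return starting_offset_bytes % (volume_block_kb * 1024) == 0


# The default Windows 2003 offset of 63 sectors, expressed in bytes.
legacy_offset = 63 * 512
print(legacy_offset)                      # 32256
print(legacy_offset / 1024)               # 31.5, which diskpart.exe displays as 32 KB
print(offset_is_aligned(legacy_offset))   # False

# A 1024 KB offset (align=1024) is aligned even for a 128KB volume block.
print(offset_is_aligned(1024 * 1024, 128))  # True
```

The first case shows why the rounded KB value is misleading: 32 KB would be a multiple of 8KB, but the actual offset of 32256 bytes is not.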
