Inquiring Minds Want To Know: UFS

UFS in Solaris 10
I spent the better part of the last year getting to know UFS. I think we are on a first name basis now :-). Thus, I begin my blog debut with some interesting UFS bugs and how they were fixed.

UFS had many improvements integrated into Solaris 10 and Solaris 9 9/04: bug fixes, logging on by default, and general robustness improvements. In this post I will talk about three specific bug fixes that affect the UFS tunable maxcontig and therefore aspects of UFS performance.

4639871 Logging ufs fails to boot from ATA drive on Ultra-10 if maxphys is too large
4638166 Ultra 5/10 panics with simba and pci errors if logging enabled and maxphys > 1MB
4349828 Inconsiderate tuning of maxcontig causes scsi bus to hang

As a result of these bugs, UFS in Solaris 10 and Solaris 9 9/04 was modified to change the values that could be used to set maxcontig and subsequently the value used for the maximum transfer size when I/O was issued.

Previously, an inconsiderate value set for either maxcontig or maxphys (in /etc/system) could hang the system. This was because the filesystem I/O request size was calculated from the value set for maxcontig. The maximum transfer size of the underlying device was never considered when calculating the size of the I/O transfer in UFS.

In UFS, the filesystem cluster size, for both reads and writes, is set to the value set for maxcontig. The filesystem cluster size is used to determine:

  • The maximum number of logical blocks contiguously laid out on disk for a UFS filesystem before inserting a rotational delay.
  • When to read ahead and/or write behind, and by how much, if the sequential I/O case is found. The algorithm that determines sequential read ahead in UFS is broken, so system administrators use the maxcontig value to tune their filesystems to achieve better random I/O performance.
  • How many pages to attempt to push out to disk at a time. It also determines how often pages are pushed, because in UFS pages are clustered for writes based on the filesystem cluster size.
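To make the units concrete, here is a rough sketch of how a maxcontig value translates into a cluster size in bytes and in pages. The 8 KB logical block size and 8 KB page size are assumptions for illustration, and the function names are hypothetical helpers, not UFS kernel symbols.

```python
# Illustrative only: hypothetical helper names, not UFS kernel code.
UFS_BLOCK_SIZE = 8 * 1024   # assumed logical block size in bytes
PAGE_SIZE = 8 * 1024        # assumed page size in bytes


def cluster_bytes(maxcontig, block_size=UFS_BLOCK_SIZE):
    """Cluster size in bytes: maxcontig logical blocks."""
    if maxcontig < 1:
        raise ValueError("maxcontig must be a positive integer")
    return maxcontig * block_size


def cluster_pages(maxcontig, block_size=UFS_BLOCK_SIZE, page_size=PAGE_SIZE):
    """Number of pages covered by one cluster, rounded up."""
    return -(-cluster_bytes(maxcontig, block_size) // page_size)


print(cluster_bytes(16))   # 131072 bytes, i.e. 128 KB
print(cluster_pages(16))   # 16 pages
```

Under these assumptions, a maxcontig of 16 means UFS tries to lay out, read ahead, and push pages in 128 KB units.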

How These Bugs Were Fixed:
1) The UFS filesystem cluster size (maxcontig) and the I/O transfer size were separated, removing the dependency that was causing systems to hang. UFS will no longer allow a setting of maxcontig to interrupt or hang any I/O requests to the device. UFS will always issue I/O requests that are <= the maximum transfer size of the device hosting the filesystem.

The UFS filesystem cluster size is still set using the value indicated for maxcontig. The I/O transfer size will be set in UFS as shown below.

2) The value for rotational delay (gap in mkfs(1M), -d in tunefs(1M)) no longer makes sense. Devices today are very sophisticated and do not need a delay artificially built in via software. As noted above, the value of maxcontig determines the number of contiguous blocks placed on disk before space is inserted to account for rotational delay. The value for rotational delay has been obsoleted in Solaris 10 and Solaris 9 9/04 and now defaults to 0, ensuring contiguous allocation.

Transfer size of I/O requests in UFS:
The device that hosts the filesystem is queried for the maximum transfer size it can handle, and the UFS I/O transfer size defaults to this value if it is obtainable. If the device does not support reporting its maximum transfer size, the maximum transfer size is set using:

  • min(maxphys, ufs_maxmaxphys).

  • ufs_maxmaxphys is currently set to 1MB.

If, however, the user sets maxcontig to less than the maximum device transfer size, UFS will honor the value of maxcontig as the maximum size for data transfers on this device.
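The selection rules above can be sketched as a small function. This is only an illustration of the logic described in this post, not the actual kernel code: the function name, the device_max parameter (None when the device cannot report a limit), and the conversion of maxcontig from blocks to bytes using an 8 KB block size are all assumptions.

```python
# Illustrative sketch of the rules described above; not UFS kernel code.
UFS_MAXMAXPHYS = 1024 * 1024  # 1 MB, the value quoted in this post


def ufs_io_transfer_size(maxphys, maxcontig, block_size=8192, device_max=None):
    """Return the largest I/O request size (bytes) under the rules above.

    device_max is the device-reported maximum transfer size in bytes,
    or None if the device cannot report one (hypothetical parameter).
    """
    if device_max is not None:
        limit = device_max                       # use what the device reports
    else:
        limit = min(maxphys, UFS_MAXMAXPHYS)     # fallback from the post
    cluster = maxcontig * block_size             # assumed blocks-to-bytes conversion
    return min(limit, cluster)                   # a smaller maxcontig is honored


# Device reports 256 KB; a 1 MB cluster no longer produces oversized requests.
print(ufs_io_transfer_size(maxphys=8 * 1024 * 1024, maxcontig=128,
                           device_max=256 * 1024))   # 262144
# Device reports nothing; maxphys of 8 MB is capped at ufs_maxmaxphys.
print(ufs_io_transfer_size(maxphys=8 * 1024 * 1024, maxcontig=1024))  # 1048576
```

The point of the sketch is that maxcontig can only lower the transfer size, never raise it beyond what the device (or the fallback limit) allows.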

The default value of maxcontig is determined from the disk drive's maximum transfer size, as noted above. Any positive integer value is acceptable when setting this parameter via tunefs(1M) or mkfs(1M).


Thank you very much for the info I was looking for, and Greetings from Malaga-Spain Antonio

Posted by Malaga on April 12, 2005 at 07:22 PM MST #

Thanks for the post. I was trying to determine whether the >128 bug was fixed in Solaris 10 and this is the only place that I could find it!

Posted by Michael Crozier on October 14, 2005 at 07:19 AM MST #

Hi, a quick question: why are we advised to set maxcontig to 1 for random access? Is this to keep the system from detecting a sequential pattern in randomly sequenced reads, which would cause wasteful read-ahead? And am I correct to say that, for purely random access, this value has no effect? The reason for this question is that I am currently tuning my application, which does both random and sequential reads; however, I don't like the idea of losing read-ahead benefits on sequential reads. Thanks. Ravi Nallappan

Posted by Ravi Nallappan on July 03, 2006 at 02:01 AM MST #
