Which would you prefer: a faulty program that runs slowly, or one that dies immediately?
By seongbae on Jun 28, 2005
Whenever we change a default compilation flag, we often have to make interesting tradeoffs. A relatively recent one concerned the default value of -xmemalign (see the compiler documentation for the exact description of the flag). Starting with Studio 9, our compiler uses -xmemalign=8i as the default in 32-bit mode (versus -xmemalign=4s before).
With -xmemalign=4s, a program that performs an unaligned memory access dies right away, telling the developer what went wrong and where. With -xmemalign=8i, such a program simply runs slowly, and that kind of performance degradation is somewhat difficult to track down (well, if you know where to look, DTrace is again your friend here, but you pretty much have to know the answer beforehand).
Sounds bad, so why did we do it? The answer is, again, performance (what else?). When you compile code with -xmemalign=8i as opposed to 4s, the compiler can safely use 8-byte loads and stores for appropriately sized data (such as double-precision floating point values). Most programs are alignment-safe; that is, most code doesn't do funky typecasting (like casting a char pointer to an integer pointer and accessing memory through it). So this change doesn't cause any performance degradation in those correct programs, and can actually improve them somewhat. You may ask, why not -xmemalign=8s? Unfortunately, some constructs in Fortran and an artifact of the 32-bit SPARC ABI make it impossible to use -xmemalign=8s even for a completely correct program. However, those occurrences are rare enough not to affect overall performance in most programs, hence the decision to switch to -xmemalign=8i.
But when you're writing code, you may actually want to use -xmemalign=4s instead of -xmemalign=8i, to make it easier to find any alignment trouble in your code. Anyway, if I have some time, I'll write about how all this unaligned access works: the dance between the user code and the kernel. In the meantime, if you're in a hurry, the kernel code that emulates unaligned access for 32-bit apps is an interesting place to start looking.
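To make the "funky typecasting" mentioned above a bit more concrete, here is a minimal C sketch. The helper names (is_aligned, load_u32, misaligned_round_trip) are made up for illustration and don't come from the compiler or any library. The sketch only shows the problematic cast in a comment and uses memcpy for the actual read, which is one common alignment-safe alternative, not necessarily the only or best one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is a multiple of align bytes. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Alignment-safe load: copy the bytes instead of dereferencing a cast
   pointer, so nothing ever assumes that p is 4-byte aligned. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Stores a value at an odd offset in a buffer and reads it back safely.
   Returns 1 if the round trip preserved the value. */
int misaligned_round_trip(uint32_t value)
{
    /* The w and d members force the buffer itself to be aligned for
       uint32_t and double, so on the usual ABIs (where that alignment
       is a multiple of 4) bytes + 1 is NOT 4-byte aligned. */
    union { uint32_t w; double d; unsigned char bytes[16]; } buf;
    unsigned char *p = buf.bytes + 1;

    assert(!is_aligned(p, sizeof(uint32_t)));
    memcpy(p, &value, sizeof value);

    /* The "funky" version would be:
           return *(uint32_t *)p == value;
       That is the kind of access that dies right away under
       -xmemalign=4s and merely runs slowly under -xmemalign=8i. */
    return load_u32(p) == value;
}
```

Calling misaligned_round_trip(0x12345678u) should return 1 with either setting of the flag, since the memcpy version never issues a misaligned 4-byte load on its own. Swapping in the commented-out cast is what would let -xmemalign=4s catch the problem during development.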