By rchrd on Jun 27, 2007
Prefetching Pragmas and Intrinsics
Diane Meirowitz, Senior Staff Engineer, and Spiros Kalogeropulos, Staff Engineer, June 2007
Explicit data prefetching pragmas and intrinsics for the x86 platform, and additional pragmas and intrinsics for the SPARC platform, are now available in the Sun Studio 12 compilers, released June 2007.
Prefetch instructions can increase the speed of an application substantially by bringing data into cache so that it is available when the processor needs it. This benefits performance because today's processors are so fast that it is difficult to bring data into them quickly enough to keep them busy, even with hardware prefetching and multiple levels of data cache.
The compilers have several options that enable them to generate prefetch instructions automatically:
-xprefetch, -xprefetch_level, and
(described below). The compilers generally do an excellent job of
inserting prefetch instructions, and this is the most portable and best
way to use prefetch. If finer control of prefetching is desired,
prefetch pragmas or intrinsics can be used. Note that the performance
benefit of prefetch instructions is hardware-dependent: prefetches
that improve performance on one chip may not have the same effect on
a different chip. It is a good idea to study the instruction
reference manual for the target hardware before inserting prefetch
pragmas or intrinsics. Furthermore, the Sun Studio Performance Analyzer
can be used to identify the cache misses of an application.
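
To make the idea of explicit prefetching concrete, here is a minimal sketch of a loop that asks for data a fixed number of iterations before it is used. It does not use the Sun Studio pragma or intrinsic syntax. As a stand-in it assumes the GCC/Clang builtin __builtin_prefetch, and both the function name sum_with_prefetch and the PREFETCH_DISTANCE value are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch.
   A suitable value depends on the target chip and its memory latency. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that upcoming elements will be read soon.
   __builtin_prefetch is a GCC/Clang builtin used here as a stand-in;
   it is NOT the Sun Studio pragma or intrinsic interface. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = read access,
               0 = no temporal locality (each element is used once). */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
        }
#endif
        total += a[i];
    }
    return total;
}
```

A prefetch is only a hint, so the function returns the same result with or without it. Whether it runs faster depends on the hardware, and for a simple sequential loop like this one, automatic compiler or hardware prefetching may already cover the access pattern.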