By Darryl Gove-Oracle on Mar 17, 2008
A while back I was reading up on the Intel compiler's -auto_ilp32 flag. This flag produces a binary that enjoys the benefits of the EM64T instruction set extensions, but uses a 32-bit memory model. The idea of the flag is to get the performance from the instruction set while avoiding the increase in memory footprint that comes from 64-bit pointers. I'm totally in favour of the idea; after all, that's the idea behind the v8plus/sparcvis architecture.
However, I was a bit distressed by the details of the implementation. The flag tells the compiler to make two assumptions: firstly, that longs can be constrained to 4 bytes (rather than 8); secondly, that pointers can also be held in 4 bytes (rather than 8).
The assumption for longs can be justified like this: if the application works when compiled for IA32, then any longs in the program already fit into 32 bits, so using only 32 bits for longs is therefore ok.
The docs place this restriction on the use of the flag for pointers:
Specifies that the application cannot exceed a 32-bit address space, which allows the compiler to use 32-bit pointers whenever possible.
The idea being that if the application only uses a 32-bit memory range, then the upper bits of the 64-bit pointer are going to be zero anyway - so why store them?
The problem with both of these is that the code ends up being run as if it were a 64-bit application. The OS thinks the app is 64-bit, so it will quite happily pass 8-byte longs to the app, or hand it memory that is not in the low 32-bit addressable range.
To go into a few situations where this could be 'bad':
- Imagine a program which relies on a zero return value from malloc to tell it to stop allocating memory (perhaps it caches data, or maybe the algorithm it chooses depends on how much memory is available). Under -auto_ilp32, the OS keeps returning new memory, so the application thinks it has not run out, but in fact it is getting memory that is no longer addressable using just 32 bits.
- Consider an application which allocates a chunk of memory to handle a problem. The larger the problem, the more memory is required. At some point the size of the problem will require a chunk of memory that is not entirely 32-bit addressable. Because malloc does not return zero, the application has no way of knowing that the problem is too large.
- The application takes a pointer to memory that is not 32-bit addressable and drops the upper 32 bits. Now it has a pointer into 32-bit addressable memory that is already in use for other data, and it starts storing new data on top of the old.
- Imagine another situation where the application is profiled, and it's realised that too much time is spent in the memory management library. So the developer produces a new library and gets the application to use this in preference to the system-provided library. The only side-effect of this library is that it starts allocating memory at a higher address, which doesn't matter if you have 64-bit addresses, but it eats up a sizable amount of the 32-bit addressable space.
In many of these cases the memory corruption problem would just leap out, and the user would know to remove the flag, but there will be some cases where the flag could cause silent data corruption.
But hang on a second ... didn't I just say that SPARC has the same hybrid? Well, not quite. v8plus is, as its name suggests, built on SPARC V8 - so the OS thinks that it is a 32-bit application. Hence longs are 32 bits in size, as are pointers. Some code is necessary to store the additional state of a V9 processor, but basically the application is a 32-bit app which happens to use a few more instructions.
The thing that's frustrating is that it's quite possible to make the OS aware of the 32-bit/64-bit hybrid, or to engineer a layer to be able to safely run a 32-bit hybrid app with the OS believing it to be a 64-bit app, but that was not the approach taken. A bit of a missed opportunity, IMO.