The Ksplice Pointer Challenge

Back when Ksplice was just a research project at MIT, we all spent a lot of time around the student computing group, SIPB. While there, several precocious undergrads kept talking about how excited they were to take 6.828, MIT's operating systems class.

"You really need to understand pointers for this class," we cautioned them. "Reread K&R Chapter 5, again." Of course, they insisted that they understood pointers and didn't need to. So we devised a test.

Ladies and gentlemen, I hereby do officially present the Ksplice Pointer Challenge, to be answered without the use of a computer:

What does this program print?

#include <stdio.h>
int main() {
  int x[5];
  printf("%p\n", x);
  printf("%p\n", x+1);
  printf("%p\n", &x);
  printf("%p\n", &x+1);
  return 0;
}

This looks simple, but it captures a surprising amount of complexity. Let's break it down.

To make this concrete, let's assume that x is stored at address 0x7fffdfbf7f00 (this is a 64-bit system). We've hidden each entry so that you have to click to make it appear -- I'd encourage you to think about what the line should output, before revealing the answer.

printf("%p\n", x);
What will this print?

Well, x is an array, right? You're no stranger to array syntax: x[i] accesses the ith element of x.

If we search back in the depths of our memory, we remember that x[i] can be rewritten as *(x+i). For that to work, x must be the memory location of the start of the array.

Result: printf("%p\n", x) prints 0x7fffdfbf7f00. Alright.

printf("%p\n", x+1);
What will this print?

So, x is 0x7fffdfbf7f00, and therefore x+1 should be 0x7fffdfbf7f01, right?

You're not fooled. You remember that  in C, pointer arithmetic is special and magical. If you have a pointer p to an int, p+1 actually adds sizeof(int)to p. It turns out that we need this behavior if *(x+i) is properly going to end up pointing us at the right place -- we need to move over enough to pass one entry in the array. In this case, sizeof(int) is 4.

Result: printf("%p\n", x) prints 0x7fffdfbf7f04. So far so good.

printf("%p\n", &x);
What will this print?

Well, let's see. & basically means "the address of", so this is like asking "Where does x live in memory?" We answered that earlier, didn't we? x lives at 0x7fffdfbf7f00, so that's what this should print.

But hang on a second... if &x is 0x7fffdfbf7f00, that means that it lives at 0x7fffdfbf7f00. But when we print x, we also get 0x7fffdfbf7f00. So x == &x.

How can that possibly work? If x is a pointer that lives at 0x7fffdfbf7f00, and also points to 0x7fffdfbf7f00, where is the actual array stored?

Thinking about that, I draw a picture like this:


That can't be right.

So what's really going on here? Well, first off, anyone who ever told you that a pointer and an array were the same thing was lying to you. That's our fallacy here. If x were a pointer, and x == &x, then yes, we would have something like the picture above. But x isn't a pointer -- x is an array!

And it turns out that in certain situations, an array can automatically "decay" into a pointer. Into &x[0], to be precise. That's what's going on in examples 1 and 2. But not here. So &x does indeed print the address of x.

Result: printf("%p\n", &x) prints 0x7fffdfbf7f00.

Aside: what is the type of &x[0]? Well, x[0] is an int, so &x[0] is "pointer to int". That feels right.

printf("%p\n", &x+1);
What will this print?

Ok, now for the coup de grace. x may be an array, but &x is definitely a pointer. So what's &x+1?

First, another aside: what is the type of &x? Well... &x is a pointer to an array of 5 ints. How would you declare something like that?

Let's fire up cdecl and find out:

cdecl> declare y as array 5 of int;
int y[5]
cdecl> declare y as pointer to array 5 of int;
int (*y)[5]

Confusing syntax, but it works:
int (*y)[5] = &x; compiles without error and works the way you'd expect.

But back to the question at hand. Pointer arithmetic tells us that &x+1 is going to be the address of x + sizeof(x). What's sizeof(x)? Well, it's an array of 5 ints. On this system, each int is 4 bytes, so it should be 20 bytes, or 0x14.

Result &x+1 prints 0x7fffdfbf7f14.

And thus concludes the Ksplice pointer challenge.

What's the takeaway? Arrays are not pointers (though they sometimes pretend to be!). More generally, C is subtle. Oh, and 6.828 students, if you're having trouble with Lab 5, it's probably because of a bug in your Lab 2.


P.S. If you're interested in hacking on low-level systems at a place where your backwards-and-forwards knowledge of C semantics will be more than just an awesome party trick, we're looking to hire kernel hackers for the Ksplice team.

We're based in beautiful Cambridge, Mass., though working remotely is definitely an option. Send me an email at waseem.daher@oracle.com with a resume and/or a github link if you're interested!

Comments:

Should probably specify that sizeof(int)==4.

Posted by guest on October 18, 2011 at 08:20 AM EDT #

In a 64 bit architecture, an int would not be 8 bytes instead of 4 bytes in 32bits arch machine?

If so sizeof(int) is NOT 4 but 8.

Thanks

Posted by guest on October 18, 2011 at 08:20 AM EDT #

printf("%p\n", x+1);
What will this print? 0x7fffdfbf7f08
Not exactly...

sizeof(int) should be 8, not 4, on a 64-bit system.

Posted by guest on October 18, 2011 at 08:32 AM EDT #

You really should have specified sizeof (int) here.

Sure, on most 32 & 64 bit systems today it'll be 4, but, especially since you make a point of specifying a 64-bit system, it's certainly not a sure thing.

Posted by Sindisil on October 18, 2011 at 08:45 AM EDT #

I've lost track of how many times I've had to answer that "arrays are not pointers" on StackOverflow (http://www.google.com/search?q="arrays+are+not+pointers"+site%3Astackoverflow.com).

Posted by Adam Rosenfield on October 18, 2011 at 08:46 AM EDT #

For #2 you need to specify whether this is an ILP64 system or an LP64 system.

Posted by James R on October 18, 2011 at 08:49 AM EDT #

The right answer to all four questions is: it is undefined behavior. p conversion specifier expects a pointer to void argument. As it is undefined behavior, everything could be printed, even the books of Shakespeare;)

Now if we correctly cast the arguments to a pointer to void, the right answer for the four questions is then: it depends as it is implementation defined. Not only the size of an int is implementation defined, and this even in 64-bit systems (some are reported to have 64-bit int). But also the p conversion specifier is specified by the C Standard to print in an "implementation-defined manner".

Posted by ouah on October 18, 2011 at 09:10 AM EDT #

Post 1, 2, 3: on x86-64, an int is generally 4 bits wide, just like on a regular x86 system. However, the author didn't specify *which* 64-bit architecture we're running on. As far as I can see, there exists no 64-bit architecture which treats int as 8 bits wide (on Linux, source: http://makelinux.com/ldd3/chp-11-sect-1), however.

Posted by Purgox on October 18, 2011 at 09:15 AM EDT #

Regarding sizeof(int):
On the one hand, yes, technically, it's under-specified. On the other hand, explicitly stating it reminds you more than I'd like about how pointer arithmetic works, and gives you a clue to #2.

(In practice, though, sizeof(int) on almost all 64-bit systems is, in fact, 4:
http://en.wikipedia.org/wiki/64-bit#64-bit_data_models)

The other thing to remember is when this was designed, its objective was to stump know-it-all undergrads :). (In its initial incarnation, it was also interactive, so you could have asked "What is sizeof(int)?" and we would have answered.) So I guess that aspect doesn't translate over super-well.

Posted by Waseem on October 18, 2011 at 09:15 AM EDT #

One can, but shouldn't, think "&x" as "x" being an instance of some struct. That way it makes sense to sizeof(x) == 5*sizeof(int)

Posted by guest on October 18, 2011 at 09:26 AM EDT #

> As far as I can see, there exists no 64-bit architecture which treats int as 8 bits wide

Some Cray machine on UNICOS were know to have 64-bit int.

See for example Table 3. in "9.1.2.2 Types" in this Cray reference manual:
docs.cray.com/books/004-2179-003/004-2179-003-manual.pdf

Posted by guest on October 18, 2011 at 09:40 AM EDT #

> explicitly stating it reminds you more than I'd like about how pointer arithmetic works

That's a very good reason! Besides, after answering, the quiz-taker can tell whether they had the answer essentially correct.

Posted by Nolan Capehart on October 18, 2011 at 09:52 AM EDT #

I remember being horribly confused by pointers and arrays when first learning C, because of the unfortunate "pun" of the "decay to pointer". For that reason, I avoid using the pun. For years I have written such code as

printf("%p\n", &x[0]);
printf("%p\n", &x[1]);
printf("%p\n", &x);
printf("%p\n", &x+1); // see note below

And I assume nobody would get the wrong answers if in C, arrays were a first-class data type, and if the syntax for array declarations weren't suggestively confusing.

Finally, even in C, if my intent were to treat an array as an object that I want to get to the end of, I would write &x[ARRAY_SIZE], where "#define ARRAY_SIZE 5" or something, rather than &x+1. Furthermore, if what I really want is the address of something that actually follows the array, then I might write

struct {
int x[5];
int y;
} s;
&s.y

(assuming it is possible to define the struct to map to memory without padding issues).

So basically, I would like to know in what situations clearer code than the example code cannot be written.

Posted by guest on October 18, 2011 at 09:59 AM EDT #

> on x86-64, an int is generally 4 bits wide, just like on a regular x86 system

Note that the size of an int is technically dependent on the COMPILER, not on the target machine. Of course, a compiler generally uses the target machine as one of the factors that influences the size.

Posted by Nolan Capehart on October 18, 2011 at 10:01 AM EDT #

Regarding the question of int size, you could keep from explicitly declaring the size of int by mentioning, just in passing, that the total array size was 20 bytes. That would not even impact the last question since you would have to know exactly how to calculate &x + 1 before it became any kind of prompt.

Posted by callenish on October 18, 2011 at 11:04 AM EDT #

yup, sizeof(int) must be specifically specified to be 4 and compile specified to have packed arrays to support the explicit given answers.
Any optimization for speed at all will align the ints at word boundaries, so ints can be spaced 8 bytes apart, and &x+1 can will move forward 5*8 bytes...

Posted by guest on October 18, 2011 at 11:52 AM EDT #

printf("0x%p\n", x); will print 0x7fffdfbf7f00, your code will print 7fffdfbf7f00. You're no stranger to memory notation. alright.

Posted by guest on October 18, 2011 at 12:35 PM EDT #

All arrays are always packed. The ISO C standard requires this absolutely. Structs are an entirely different matter, but arrays are required to be stored as a contiguous series of items. This is why even the most hardened language lawyers over at comp.lang.c will happily divide the sizeof an array by the sizeof an element of that array in order to obtain the number of elements in that array (which makes it easy to write code that makes no assumptions about the array, so you only have one place to change the number of elements, and don't even need to rely on macros).

But that still leaves sizeof(int) == 4 as an assumption. It may be a reasonable one, but it's still an assumption, and is by no means guaranteed.

Posted by Micah Cowan on October 18, 2011 at 01:07 PM EDT #

I think a better way to phrase something like this isn't necessarily always to say arrays are not pointers, as you can at certain times treat them to be the same thing. However, when you start doing arithmetics on them, then you have to be a lot more careful.

For example

printf("%x", x);
printf("%p", x);
printf("%p", &x);
etc will all yield the same result, and very often in code for things like string and memory operations will x be an easy replacement for &x[0] (while not necessarily the cleanest).

As to another guest above that asked when clearer code than the above code can not be written.. a lot of real time DSP code I've written in the past involved creating temporary pointers and marching through an array with the pointer, though it may be messier, it's a good 50-70% faster. Something like..

int * pX = x;
for(i = 5; i--;)
*pX++ = i;

Posted by guest on October 18, 2011 at 01:27 PM EDT #

"ouah" found the correct answer in

http://blogs.oracle.com/ksplice/entry/the_ksplice_pointer_challenge#comment-1318961442719

The pointers get passed as (int *) or (int (*)[5]). See C99 6.2.5p26: their representation need not equal the representation of (void *), which is what %p expects.

Posted by lacos on October 18, 2011 at 01:47 PM EDT #

guest: Arrays are *always* packed in C. If the compiler wants to align 32-bit ints in arrays at 8-byte boundaries, it must return 8 for sizeof(int) - that is, the four padding bytes must be internal to the int.

Posted by caf on October 18, 2011 at 02:11 PM EDT #

When I got my first DEC Alpha with OSF/1 UNIX, the compiler treated an int as 64 bits. Later fixed under the renamed Digital UNIX when they moved to conform with the X/Open single UNIX Specification back to 32 bits.

It only caught me once, but that was enough.

Posted by guest-from-oracle on October 18, 2011 at 02:44 PM EDT #

This doesn't seem to be testing understanding of pointers so much as understanding obscure C pointer/array syntax semantics.
Basically, trying apply the & operator to a value that is already an address constant results in the address constant. It could just as easily default to 0, NULL, or whatever.

Posted by guest on October 18, 2011 at 04:11 PM EDT #

As many people stated, the answer to the second question is wrong. 64-bit system, sizeof(int) would return 8 not 4. the answer would be 0x7fffdfbf7f08

Posted by Felix on October 18, 2011 at 04:33 PM EDT #

Do not be fooled.... The Size of data types is defined by the Model the installed Compiler uses and not the Machine Architecture..
Its not CORRECT to say that on 32 bit Arch... size of int is 4 bytes
and for 64 bit Arch it will be 8 bytes.
The 64 bit compiler models are available as follows:
LP64 - Long and Pointers are 64 bit (8 bytes)
ILP64 - Integer, Long and Pointers are 64 bit (8 bytes)

Posted by guest on October 18, 2011 at 09:45 PM EDT #

This question was asked on StackOverflow.

Posted by guest on October 19, 2011 at 02:54 AM EDT #

How is 0x7FFFDFBF7F00 different from 0x7fffdfbf7f00 other than one's in uppercase and the other in lower? You should correct your javascript, gentlemen ...

Posted by erKURITA on October 19, 2011 at 06:12 AM EDT #

@erKURITA: You're right :) Consider yourself having gotten full credit.

@Felix: I'd encourage you to actually try this out on any reasonable system you have access to. On such a system (which is probably going to be an LP64 system), sizeof(int) is going to be 4.

Posted by Waseem on October 19, 2011 at 06:46 AM EDT #

All of your answers are wrong. The correct answers are:

00007FFFDFBF7F00
00007FFFDFBF7F04
00007FFFDFBF7F00
00007FFFDFBF7F14

printf with "%p" does not print the leading 0x. You would need "%#p" for that. Even then, all the letters would still be capitalized. To get the output your test considers correct you would need to use "%#x". This would make the letters lowercase and truncate the leading zeros to match your answers.

Posted by Pausing Reality on October 19, 2011 at 10:35 AM EDT #

You missed one detail

different rules applies here:

void foo(int y[5]) // same as if 'void foo(int* y)'
{
printf("%p\n", &y);
printf("%p\n", &y+1);
}

Posted by Jens on October 20, 2011 at 12:18 AM EDT #

I passed the test - but I consider this a tougher one:

#include <stdio.h>

typedef int mystery[5];

void foo(mystery a)
{
printf("%p\n", &a);
printf("%p\n", &a+1);
}

int main() {
mystery b;
foo(b);
return 0;
}

Posted by Thanassis on October 20, 2011 at 12:56 AM EDT #

@Pausing Reality:
Well, my answers aren't "wrong" any more than your answers are wrong -- that's what my 64-bit Ubuntu machine produces!

The real issue here is that the behavior of %p isn't specified, and is implementation-defined, so yes, a more lenient checker would accept a number of forms.

Posted by Waseem on October 20, 2011 at 05:17 AM EDT #

Warning: Do not use &x+1 to reference memory or objects. This will probably cause stack corruption, object corruption, heap corruption, memory buffer overflow, or other bad behavior.

Posted by guest on October 20, 2011 at 06:52 AM EDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

Tired of rebooting to update systems? So are we -- which is why we invented Ksplice, technology that lets you update the Linux kernel without rebooting. It's currently available as part of Oracle Linux Premier Support, Fedora, and Ubuntu desktop. This blog is our place to ramble about technical topics that we (and hopefully you) think are interesting.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today