X

News, tips, partners, and perspectives for the Oracle Linux operating system and upstream Linux kernel work

The Ksplice Pointer Challenge

Back when Ksplice was just a research project at MIT, we all spent a lot of time around the student computing group, SIPB. While there, several precocious undergrads kept talking about how excited they were to take 6.828, MIT's operating systems class.

"You really need to understand pointers for this class," we cautioned them. "Reread K&R Chapter 5, again." Of course, they insisted that they understood pointers and didn't need to. So we devised a test.

Ladies and gentlemen, I hereby do officially present the Ksplice Pointer Challenge, to be answered without the use of a computer:

What does this program print?

#include <stdio.h>
int main() {
  int x[5];
  printf("%p\n", x);
  printf("%p\n", x+1);
  printf("%p\n", &x);
  printf("%p\n", &x+1);
  return 0;
}

This looks simple, but it captures a surprising amount of complexity. Let's break it down.

To make this concrete, let's assume that x is stored at address 0x7fffdfbf7f00 (this is a 64-bit system). We've hidden each entry so that you have to click to make it appear -- I'd encourage you to think about what the line should output, before revealing the answer.

printf("%p\n", x);
What will this print?

Well, x is an array, right? You're no stranger to array syntax: x[i] accesses the ith element of x.

If we search back in the depths of our memory, we remember that x[i] can be rewritten as *(x+i). For that to work, x must be the memory location of the start of the array.

Result: printf("%p\n", x) prints 0x7fffdfbf7f00. Alright.

printf("%p\n", x+1);
What will this print?

So, x is 0x7fffdfbf7f00, and therefore x+1 should be 0x7fffdfbf7f01, right?

You're not fooled. You remember that  in C, pointer arithmetic is special and magical. If you have a pointer p to an int, p+1 actually adds sizeof(int)to p. It turns out that we need this behavior if *(x+i) is properly going to end up pointing us at the right place -- we need to move over enough to pass one entry in the array. In this case, sizeof(int) is 4.

Result: printf("%p\n", x) prints 0x7fffdfbf7f04. So far so good.

printf("%p\n", &x);
What will this print?

Well, let's see. & basically means "the address of", so this is like asking "Where does x live in memory?" We answered that earlier, didn't we? x lives at 0x7fffdfbf7f00, so that's what this should print.

But hang on a second... if &x is 0x7fffdfbf7f00, that means that it lives at 0x7fffdfbf7f00. But when we print x, we also get 0x7fffdfbf7f00. So x == &x.

How can that possibly work? If x is a pointer that lives at 0x7fffdfbf7f00, and also points to 0x7fffdfbf7f00, where is the actual array stored?

Thinking about that, I draw a picture like this:


That can't be right.




 

 

So what's really going on here? Well, first off, anyone who ever told you that a pointer and an array were the same thing was lying to you. That's our fallacy here. If x were a pointer, and x == &x, then yes, we would have something like the picture above. But x isn't a pointer -- x is an array!

And it turns out that in certain situations, an array can automatically "decay" into a pointer. Into &x[0], to be precise. That's what's going on in examples 1 and 2. But not here. So &x does indeed print the address of x.

Result: printf("%p\n", &x) prints 0x7fffdfbf7f00.

Aside: what is the type of &x[0]? Well, x[0] is an int, so &x[0] is "pointer to int". That feels right.

printf("%p\n", &x+1);
What will this print?

Ok, now for the coup de grace. x may be an array, but &x is definitely a pointer. So what's &x+1?

First, another aside: what is the type of &x? Well... &x is a pointer to an array of 5 ints. How would you declare something like that?

Let's fire up cdecl and find out:

cdecl> declare y as array 5 of int;
int y[5]
cdecl> declare y as pointer to array 5 of int;
int (*y)[5]

 

Confusing syntax, but it works:
int (*y)[5] = &x; compiles without error and works the way you'd expect.

But back to the question at hand. Pointer arithmetic tells us that &x+1 is going to be the address of x + sizeof(x). What's sizeof(x)? Well, it's an array of 5 ints. On this system, each int is 4 bytes, so it should be 20 bytes, or 0x14.

Result &x+1 prints 0x7fffdfbf7f14.

And thus concludes the Ksplice pointer challenge.

What's the takeaway? Arrays are not pointers (though they sometimes pretend to be!). More generally, C is subtle. Oh, and 6.828 students, if you're having trouble with Lab 5, it's probably because of a bug in your Lab 2.


P.S. If you're interested in hacking on low-level systems at a place where your backwards-and-forwards knowledge of C semantics will be more than just an awesome party trick, we're looking to hire kernel hackers for the Ksplice team.  Send me an email at jamie.iles@oracle.com with a resume and/or a github link if you're interested!

Join the discussion

Comments ( 28 )
  • guest Tuesday, October 18, 2011

    Should probably specify that sizeof(int)==4.


  • guest Tuesday, October 18, 2011

    In a 64 bit architecture, an int would not be 8 bytes instead of 4 bytes in 32bits arch machine?

    If so sizeof(int) is NOT 4 but 8.

    Thanks


  • guest Tuesday, October 18, 2011

    printf("%p\n", x+1);

    What will this print? 0x7fffdfbf7f08

    Not exactly...

    sizeof(int) should be 8, not 4, on a 64-bit system.


  • Sindisil Tuesday, October 18, 2011

    You really should have specified sizeof (int) here.

    Sure, on most 32 & 64 bit systems today it'll be 4, but, especially since you make a point of specifying a 64-bit system, it's certainly not a sure thing.


  • James R Tuesday, October 18, 2011

    For #2 you need to specify whether this is an ILP64 system or an LP64 system.


  • ouah Tuesday, October 18, 2011

    The right answer to all four questions is: it is undefined behavior. p conversion specifier expects a pointer to void argument. As it is undefined behavior, everything could be printed, even the books of Shakespeare;)

    Now if we correctly cast the arguments to a pointer to void, the right answer for the four questions is then: it depends as it is implementation defined. Not only the size of an int is implementation defined, and this even in 64-bit systems (some are reported to have 64-bit int). But also the p conversion specifier is specified by the C Standard to print in an "implementation-defined manner".


  • guest Tuesday, October 18, 2011

    One can, but shouldn't, think "&x" as "x" being an instance of some struct. That way it makes sense to sizeof(x) == 5*sizeof(int)


  • guest Tuesday, October 18, 2011

    > As far as I can see, there exists no 64-bit architecture which treats int as 8 bits wide

    Some Cray machine on UNICOS were know to have 64-bit int.

    See for example Table 3. in "9.1.2.2 Types" in this Cray reference manual:

    docs.cray.com/books/004-2179-003/004-2179-003-manual.pdf


  • Nolan Capehart Tuesday, October 18, 2011

    > explicitly stating it reminds you more than I'd like about how pointer arithmetic works

    That's a very good reason! Besides, after answering, the quiz-taker can tell whether they had the answer essentially correct.


  • guest Tuesday, October 18, 2011

    I remember being horribly confused by pointers and arrays when first learning C, because of the unfortunate "pun" of the "decay to pointer". For that reason, I avoid using the pun. For years I have written such code as

    printf("%p\n", &x[0]);

    printf("%p\n", &x[1]);

    printf("%p\n", &x);

    printf("%p\n", &x+1); // see note below

    And I assume nobody would get the wrong answers if in C, arrays were a first-class data type, and if the syntax for array declarations weren't suggestively confusing.

    Finally, even in C, if my intent were to treat an array as an object that I want to get to the end of, I would write &x[ARRAY_SIZE], where "#define ARRAY_SIZE 5" or something, rather than &x+1. Furthermore, if what I really want is the address of something that actually follows the array, then I might write

    struct {

    int x[5];

    int y;

    } s;

    &s.y

    (assuming it is possible to define the struct to map to memory without padding issues).

    So basically, I would like to know in what situations clearer code than the example code cannot be written.


  • Nolan Capehart Tuesday, October 18, 2011

    > on x86-64, an int is generally 4 bits wide, just like on a regular x86 system

    Note that the size of an int is technically dependent on the COMPILER, not on the target machine. Of course, a compiler generally uses the target machine as one of the factors that influences the size.


  • callenish Tuesday, October 18, 2011

    Regarding the question of int size, you could keep from explicitly declaring the size of int by mentioning, just in passing, that the total array size was 20 bytes. That would not even impact the last question since you would have to know exactly how to calculate &x + 1 before it became any kind of prompt.


  • guest Tuesday, October 18, 2011

    yup, sizeof(int) must be specifically specified to be 4 and compile specified to have packed arrays to support the explicit given answers.

    Any optimization for speed at all will align the ints at word boundaries, so ints can be spaced 8 bytes apart, and &x+1 can will move forward 5*8 bytes...


  • guest Tuesday, October 18, 2011

    printf("0x%p\n", x); will print 0x7fffdfbf7f00, your code will print 7fffdfbf7f00. You're no stranger to memory notation. alright.


  • Micah Cowan Tuesday, October 18, 2011

    All arrays are always packed. The ISO C standard requires this absolutely. Structs are an entirely different matter, but arrays are required to be stored as a contiguous series of items. This is why even the most hardened language lawyers over at comp.lang.c will happily divide the sizeof an array by the sizeof an element of that array in order to obtain the number of elements in that array (which makes it easy to write code that makes no assumptions about the array, so you only have one place to change the number of elements, and don't even need to rely on macros).

    But that still leaves sizeof(int) == 4 as an assumption. It may be a reasonable one, but it's still an assumption, and is by no means guaranteed.


  • caf Tuesday, October 18, 2011

    guest: Arrays are *always* packed in C. If the compiler wants to align 32-bit ints in arrays at 8-byte boundaries, it must return 8 for sizeof(int) - that is, the four padding bytes must be internal to the int.


  • guest-from-oracle Tuesday, October 18, 2011

    When I got my first DEC Alpha with OSF/1 UNIX, the compiler treated an int as 64 bits. Later fixed under the renamed Digital UNIX when they moved to conform with the X/Open single UNIX Specification back to 32 bits.

    It only caught me once, but that was enough.


  • guest Tuesday, October 18, 2011

    This doesn't seem to be testing understanding of pointers so much as understanding obscure C pointer/array syntax semantics.

    Basically, trying apply the & operator to a value that is already an address constant results in the address constant. It could just as easily default to 0, NULL, or whatever.


  • Felix Tuesday, October 18, 2011

    As many people stated, the answer to the second question is wrong. 64-bit system, sizeof(int) would return 8 not 4. the answer would be 0x7fffdfbf7f08


  • guest Wednesday, October 19, 2011

    Do not be fooled.... The Size of data types is defined by the Model the installed Compiler uses and not the Machine Architecture..

    Its not CORRECT to say that on 32 bit Arch... size of int is 4 bytes

    and for 64 bit Arch it will be 8 bytes.

    The 64 bit compiler models are available as follows:

    LP64 - Long and Pointers are 64 bit (8 bytes)

    ILP64 - Integer, Long and Pointers are 64 bit (8 bytes)


  • guest Wednesday, October 19, 2011

    This question was asked on StackOverflow.


  • erKURITA Wednesday, October 19, 2011

    How is 0x7FFFDFBF7F00 different from 0x7fffdfbf7f00 other than one's in uppercase and the other in lower? You should correct your javascript, gentlemen ...


  • Waseem Wednesday, October 19, 2011

    @erKURITA: You're right :) Consider yourself having gotten full credit.

    @Felix: I'd encourage you to actually try this out on any reasonable system you have access to. On such a system (which is probably going to be an LP64 system), sizeof(int) is going to be 4.


  • Pausing Reality Wednesday, October 19, 2011

    All of your answers are wrong. The correct answers are:

    00007FFFDFBF7F00

    00007FFFDFBF7F04

    00007FFFDFBF7F00

    00007FFFDFBF7F14

    printf with "%p" does not print the leading 0x. You would need "%#p" for that. Even then, all the letters would still be capitalized. To get the output your test considers correct you would need to use "%#x". This would make the letters lowercase and truncate the leading zeros to match your answers.


  • Jens Thursday, October 20, 2011

    You missed one detail

    different rules applies here:

    void foo(int y[5]) // same as if 'void foo(int* y)'

    {

    printf("%p\n", &y);

    printf("%p\n", &y+1);

    }


  • Thanassis Thursday, October 20, 2011

    I passed the test - but I consider this a tougher one:

    #include <stdio.h>

    typedef int mystery[5];

    void foo(mystery a)

    {

    printf("%p\n", &a);

    printf("%p\n", &a+1);

    }

    int main() {

    mystery b;

    foo(b);

    return 0;

    }


  • Waseem Thursday, October 20, 2011

    @Pausing Reality:

    Well, my answers aren't "wrong" any more than your answers are wrong -- that's what my 64-bit Ubuntu machine produces!

    The real issue here is that the behavior of %p isn't specified, and is implementation-defined, so yes, a more lenient checker would accept a number of forms.


  • guest Thursday, October 20, 2011

    Warning: Do not use &x+1 to reference memory or objects. This will probably cause stack corruption, object corruption, heap corruption, memory buffer overflow, or other bad behavior.


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha
Oracle

Integrated Cloud Applications & Platform Services