Success with the hunt for the segv.
By timatworkhomeandinbetween on May 26, 2005
Yes! Finally, after three whole days, I have worked out the answer to the segv in the malloc code. This program is a graphics application and it only segvs when run on a jfb framebuffer (big hint). I sat down and read the malloc code - very clever; I should write it up to save others the learning curve. We segv'd while carving up a big buffer area into two smaller ones and, importantly, the buffer being carved up was the one called Bottom. This buffer contains all the unallocated space; when it runs out we call _morecore(), which uses sbrk() to get more heap pages.
In this case we have a 19Mbyte malloc early on, so the next call to _morecore() allocates a wee bit more than 0x01000000 bytes. This is important, as it means the size has a bit set in its top byte. The size is kept in the 32 bits 8 bytes below the malloc/free pointer. Using dbx's watchi function I noticed that just as we segv the size of Bottom drops from 0x01001238 to 0x00001238 - maybe something is overwriting the top byte of the size.
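To see why a single stray byte is enough to produce exactly that change, here is a minimal sketch. It assumes the 32-bit size is stored most significant byte first (big-endian), so the 0x01 is the first byte of the header in memory; the helper names are made up for illustration and are not taken from the real malloc source.

```c
#include <stdint.h>

/* Store a 32-bit value most-significant byte first (big-endian order),
 * independent of the byte order of the machine running this code. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Read a 32-bit value stored most-significant byte first. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: the size a header would hold after one stray
 * NUL byte is written over its first (lowest-address) byte. */
uint32_t size_after_stray_nul(uint32_t size)
{
    unsigned char hdr[4];

    put_be32(hdr, size);
    hdr[0] = '\0';          /* the one-byte overrun lands here */
    return get_be32(hdr);
}
```

Under that assumption, size_after_stray_nul(0x01001238) gives 0x00001238, the same drop seen in Bottom. Any size below 0x01000000 already has a zero top byte and comes back unchanged, which would explain why the overrun only bites once a big allocation has happened.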
Luckily I have the source to the standard malloc, so I turned it into a library and preloaded it - it still failed. So I used procfs's watchpoint functionality to put a write watchpoint across the size once the code recognised the address. Yes, it took a SIGTRAP, and it was whilst writing the null byte at the end of a strcpy. The classic sign of someone doing...
foo = malloc(strlen(string)); strcpy(foo, string);
This of course is wrong, as it has not allocated room for the trailing NUL (and it should have some error checking). This is a bit better...
foo = malloc(1 + strlen(string)); strcpy(foo, string);
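For the error checking mentioned above, something along these lines might do. This is only a sketch: dup_string is a made-up name, and returning NULL on failure is one possible policy, not what the driver in question does or should do.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly malloc'd copy of string,
 * or NULL if string is NULL or the allocation fails.
 * The caller is responsible for free()ing the result. */
char *dup_string(const char *string)
{
    size_t len;
    char *foo;

    if (string == NULL)
        return NULL;

    len = strlen(string);
    foo = malloc(len + 1);          /* +1 for the trailing NUL */
    if (foo == NULL)
        return NULL;

    memcpy(foo, string, len + 1);   /* copies the NUL as well */
    return foo;
}
```

Where it is available, strdup() does the same job in one call.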
So who was the culprit? /usr/openwin/lib/GL/devhandlers/oglSUNWjfb.so.6 - time for a bug.
More worryingly, why did no debugging malloc library catch it? A combination of things; the space was not a multiple of 16 bytes, so there was some padding at the end. I will get a simple testcase and log an RFE against libumem to get it to catch these issues.
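The padding effect can be sketched with a little arithmetic. This assumes an allocator that simply rounds each request up to a multiple of 16 bytes and only checks for damage beyond the rounded size; the 16 comes from the observation above, but the function names and the checking model are illustrative assumptions, not a description of how libumem or any other debugging malloc really works.

```c
#include <stddef.h>

#define ROUND_TO 16u    /* assumed rounding granularity */

/* Size actually reserved for a request, rounded up to a multiple of 16. */
size_t rounded_size(size_t request)
{
    return (request + ROUND_TO - 1) / ROUND_TO * ROUND_TO;
}

/* Padding bytes between the end of the request and the end of the buffer. */
size_t slack_bytes(size_t request)
{
    return rounded_size(request) - request;
}

/* Non-zero if a one-byte overrun would land in padding, and so go
 * unnoticed by a checker that only looks beyond the rounded size. */
int one_byte_overrun_is_hidden(size_t request)
{
    return slack_bytes(request) >= 1;
}
```

With these assumptions a 13-byte request leaves 3 bytes of slack, so the stray NUL is swallowed by the padding, whereas a 16-byte request leaves none and the same overrun would spill into whatever comes next.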
Back to core dump running code