I had my first brush with the new M7 chip’s Realtime Application Data Integrity (ADI) feature today. A new tool that I integrated a few weeks back was found to have a problem when running with ADI enabled. From the resulting core dump, the issue in this code fragment was immediately obvious (the fragment bears more than a passing resemblance to EXAMPLE 1 in the scf_simple_prop_get(3SCF) man page):

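#include <libscf.h>

#define MEGABYTE        (1024 * 1024)   /* assumed definition */

void
foo(scf_simple_prop_t *prop, int64_t *sz)
{
        int64_t *size = NULL;

        if (scf_simple_prop_numvalues(prop) > 0)
                size = scf_simple_prop_next_integer(prop);

        /* Frees the storage that size points into. */
        scf_simple_prop_free(prop);

        if (size)
                *sz = *size * MEGABYTE;  /* stale read: ADI traps here */
}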


I had somehow transposed two blocks: the pointer returned by scf_simple_prop_next_integer() points to storage that is freed by scf_simple_prop_free(), so by the time *size is read it is a stale reference. Oops.

However, ADI found this immediately and raised a SEGV at the exact point where the stale memory reference was used, i.e. at the “*size” access in the last line. The fix is obvious: read *size before calling scf_simple_prop_free().
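The fix amounts to swapping the two blocks back. So that it builds outside Solaris, this sketch replaces the libscf calls with a hypothetical mock_prop_t and mock_prop_* helpers (invented for illustration, not part of libscf). Like scf_simple_prop_next_integer(), the mock hands out a pointer into storage that the property owns and frees. MEGABYTE is again assumed to mean 1024 * 1024.

```c
#include <stdint.h>
#include <stdlib.h>

#define MEGABYTE        (1024 * 1024)   /* assumed definition */

/*
 * Hypothetical stand-in for scf_simple_prop_t: values[] is owned by the
 * property and released by mock_prop_free(), mirroring how the pointer from
 * scf_simple_prop_next_integer() is invalidated by scf_simple_prop_free().
 */
typedef struct mock_prop {
        size_t  nvalues;
        int64_t *values;
} mock_prop_t;

static mock_prop_t *
mock_prop_create(int64_t v)
{
        mock_prop_t *p = malloc(sizeof (*p));

        if (p == NULL)
                return (NULL);
        p->values = malloc(sizeof (int64_t));
        if (p->values == NULL) {
                free(p);
                return (NULL);
        }
        p->values[0] = v;
        p->nvalues = 1;
        return (p);
}

static size_t
mock_prop_numvalues(const mock_prop_t *p)
{
        return (p->nvalues);
}

/* Returns a pointer INTO the property's storage, not a copy. */
static int64_t *
mock_prop_next_integer(mock_prop_t *p)
{
        return (&p->values[0]);
}

static void
mock_prop_free(mock_prop_t *p)
{
        free(p->values);
        free(p);
}

/* Corrected ordering: consume *size while the storage is still live. */
void
foo(mock_prop_t *prop, int64_t *sz)
{
        int64_t *size = NULL;

        if (mock_prop_numvalues(prop) > 0)
                size = mock_prop_next_integer(prop);

        if (size != NULL)
                *sz = *size * MEGABYTE;

        mock_prop_free(prop);   /* size is stale from here on; not used again */
}
```

With a property holding 64, foo() stores 64 MiB in *sz, and nothing dereferences size after the free.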

This got me thinking: could I have caught this before integration without using ADI? How did some of the alternative memory allocators fare at detecting this problem?

  1. Standard malloc
    This implementation did not notice the problem at all and, worse still from a debugging point of view, returned the correct, expected result. This is why the bug went unnoticed.
  2. libmapmalloc
    This implementation did not notice the error either (although I haven’t looked into why).
  3. libumem with “UMEM_DEBUG=default”
    This sort of found it by making *size a very large number, but that caused an error much later in the program, at a point that would have been difficult to trace back to the original fault.
  4. watchmalloc.so.1
    This also kind of found it by making *size always zero, but again the error surfaced much further down the line.

So all in all, the ADI experience was a very positive one: it let me get to the root cause of a stale memory pointer very quickly.

You can read more about ADI here: https://community.oracle.com/docs/DOC-912448