JVM TI: How VM Agents Work

Chapter 1: What is a VM Agent?

Traditionally, a Java VM agent is loaded into a VM at initialization with the option -XrunNAME, where NAME is the name of the native shared library or DLL, e.g. "libNAME.so" or "NAME.dll". For example, with HPROF you would say "java -Xrunhprof"; the VM would find "libhprof.so" or "hprof.dll" in the JDK, load it, and make a call into that agent library to get it started. In JDK 5.0 the new option spelling is "-agentlib", e.g. "java -agentlib:hprof", but JDK 5.0 will accept either spelling. There is a set of sample JVM TI agents in the demo directory of the JDK 5.0 download or, if you are brave, the latest JDK 6.0 download. Source and binaries are included for anyone interested in creating their own custom agent library.

Now I must warn you: this agent library will be operating in the same process and address space as the VM itself, so anything nasty you do inside your agent code is done to the VM process too, and it's easy to crash a VM with a bad agent. Be very careful, and don't accept any agents from strangers. ;^) Having written many agents, I can attest that they are very difficult to "get right". They must be re-entrant and MT-safe, and you need to make sure you obey all the JVM TI and JNI rules. If your agent leaks memory by calling malloc() and not doing the free(), then the VM will appear to have a leak. If you allocate too much memory, the VM process will fail with an 'out of memory'. So be very, very careful.

Of course the VM must be able to locate the native library via the platform specific search rules, either by having the library copied into your JDK with the other shared libraries, or using some kind of platform specific mechanism to help a process find it, e.g. using LD_LIBRARY_PATH on Solaris/Linux, or PATH on Windows. In addition, the agent library itself must be able to find all the external symbols it needs. On Solaris and Linux, the utility 'ldd' can be used to verify that a native library knows how to find all the externals it needs.

Once a VM has successfully loaded an agent library, it looks for a symbol in it to call to establish the agent-to-VM connection. The native library should expose an extern (exported symbol) named "JVM_OnLoad" for use of JVM DI or JVM PI, or "Agent_OnLoad" for use of JVM TI in JDK 5.0 and newer JDKs. I won't spend much time on JVM PI or JVM DI. The basic principle is the same, but JVM TI is the newer, combined interface in JDK 5.0, and it seems more valuable to discuss it than the old world.

Chapter 2: Agent Interfaces

The documentation on the latest JDK 5.0 agent API and JNI is at:
java.sun.com/j2se/1.5.0/docs/guide/jvmti
java.sun.com/j2se/1.5.0/docs/guide/jni

The older agent API documentation can be found in JDK 1.4.2:
java.sun.com/j2se/1.4.2/docs/guide/jvmpi
java.sun.com/j2se/1.4.2/docs/guide/jpda/jvmdi-spec.html

An additional warning here. These older interfaces, in particular JVMPI (which has always been labeled "experimental"), are being discontinued, so my advice would be to limit your time doing any development work on agents based on these older APIs. In addition, JVMPI is a very touchy API, not only in its implementation, but also in the stability of your own agent code. It's true that you need to be very careful in all agent code, but the JVMPI interface has proven to demand even more care. So be careful, and if you plan on exposing your agent to anyone other than yourself, you might want to figure out your testing strategy early. If you aren't familiar with highly recursive and highly re-entrant coding, you'll be in for an education if you develop any large amount of agent code.

An article on the transition from JVMPI to JVM TI can be found here:
java.sun.com/developer/technicalArticles/Programming/jvmpitransition

So I won't go into the older agent APIs, but will concentrate on JVM TI. The older JVMDI is probably only of interest to debugger developers, and the older JVMPI, which would interest mostly profiler developers, has been a very problematic interface. The two older interfaces did not work well together, and JVMPI is limited and difficult to use with JNI. The newer JVM TI, on the other hand, works very well with JNI, has all the functionality of JVMDI and almost all the functionality of JVMPI. Note I said "almost". Some of the functionality of JVMPI was not put directly into JVM TI, but for a very good reason: many of the limitations of JVMPI were due to these features, because they required custom code for every garbage collector and/or 'just in time' compiler. With JVM TI it was decided that bytecode instrumentation (BCI) was more effective at getting some of this functionality, so JVM TI made sure BCI was easy to do. I'll do a separate chapter on BCI, but just consider it a way to inject code into the classfile methods, either before the VM ever sees the classfile (ClassFileLoadHook), or by redefining the classfiles on the fly (RedefineClasses). See http://weblogs.java.net/blog/kellyohair/archive/2005/05/bytecode_instru.html for more information about BCI.

So depending on what you want to do, it's a good idea to get a basic understanding of what these interfaces can and cannot do.

Chapter 3: Agent Initialization

So once the VM has located your shared library and successfully loaded it into the VM process, it goes looking for either JVM_OnLoad or Agent_OnLoad in your shared library. I'll only cover Agent_OnLoad here, but the basic principle is the same. In JVM TI there has been an attempt at making the interface as robust as possible, so unlike the older interfaces, JVM TI has "capabilities" which need to be requested during the Agent_OnLoad time. This better informs the VM what the agent will need to do and allows for optimal performance whenever possible. Agents in general should only request the capabilities they really need to avoid unnecessary VM logic.

So what does agent initialization look like? Well the best thing to do here is just show you some code. I'm going to shorten this code to make it easier to follow, so don't consider this code complete, go get the complete copy of heapTracker.c for all the details and comments. It can be found in the demo/jvmti/heapTracker directory of any JDK 5.0 or Mustang JDK binary download.


#include <string.h>   /* memset */

#include "jvmti.h"
#include "jni.h"

static jrawMonitorID agent_lock;

JNIEXPORT jint JNICALL
Agent_OnLoad(JavaVM *vm, char *options, void *reserved) {
    jvmtiEnv              *jvmti;
    jvmtiError             error;
    jint                   res;
    jvmtiCapabilities      capabilities;
    jvmtiEventCallbacks    callbacks;

    res = (*vm)->GetEnv(vm, (void **)&jvmti, JVMTI_VERSION_1);

    parse_agent_options(options);

    (void)memset(&capabilities,0, sizeof(capabilities));
    capabilities.can_generate_all_class_hook_events  = 1;
    capabilities.can_tag_objects                     = 1;
    capabilities.can_generate_object_free_events     = 1;
    capabilities.can_get_source_file_name            = 1;
    capabilities.can_get_line_numbers                = 1;
    capabilities.can_generate_vm_object_alloc_events = 1;
    error = (*jvmti)->AddCapabilities(jvmti, &capabilities);

    (void)memset(&callbacks,0, sizeof(callbacks));
    callbacks.VMStart           = &cbVMStart;
    callbacks.VMInit            = &cbVMInit;
    callbacks.VMDeath           = &cbVMDeath;
    callbacks.ObjectFree        = &cbObjectFree;
    callbacks.VMObjectAlloc     = &cbVMObjectAlloc;
    callbacks.ClassFileLoadHook = &cbClassFileLoadHook;
    error = (*jvmti)->SetEventCallbacks(jvmti, &callbacks,
                      (jint)sizeof(callbacks));

    error = (*jvmti)->SetEventNotificationMode(jvmti, JVMTI_ENABLE,
                      JVMTI_EVENT_VM_START, (jthread)NULL);
    error = (*jvmti)->SetEventNotificationMode(jvmti, JVMTI_ENABLE,
                      JVMTI_EVENT_VM_INIT, (jthread)NULL);
    error = (*jvmti)->SetEventNotificationMode(jvmti, JVMTI_ENABLE,
                      JVMTI_EVENT_VM_DEATH, (jthread)NULL);
    error = (*jvmti)->SetEventNotificationMode(jvmti, JVMTI_ENABLE,
                      JVMTI_EVENT_OBJECT_FREE, (jthread)NULL);
    error = (*jvmti)->SetEventNotificationMode(jvmti, JVMTI_ENABLE,
                      JVMTI_EVENT_VM_OBJECT_ALLOC, (jthread)NULL);
    error = (*jvmti)->SetEventNotificationMode(jvmti, JVMTI_ENABLE,
                      JVMTI_EVENT_CLASS_FILE_LOAD_HOOK, (jthread)NULL);

    error = (*jvmti)->CreateRawMonitor(jvmti, "agent data", &(agent_lock));

    return JNI_OK;
}

Obviously I haven't shown you all the code, but you can see the entire agent source by downloading the latest JDK 5.0, installing it, and looking in the directory demo/jvmti/heapTracker. In particular, you will notice that I have completely ignored the error returns. This is NOT a good practice, and if you copy the above code without adding checks on the error returns, you will regret it at some point. JVM TI is a much more robust VM interface than the previous interfaces (JVMPI and JVMDI), and this is a very good thing. But since agents run inside the VM process itself, it's critical that the agent does extensive error checking, so don't skimp on it. This example code is basically from the heapTracker demo JVM TI agent that is shipped with all the JDK 5.0 downloads.

In the demo agents, any error indicates a problem in our implementation, so often the demo agents will exit the process, but these are demo agents. You need to decide what your agent wants to do in the face of an error condition.

So the basic things that happened in the above agent initialization were to:

  • Get a jvmtiEnv (JVM TI Environment)
  • Parse any options supplied to this agent from the command line
  • Ask for the capabilities the agent will need
  • Provide the function pointers to the agent's event callback functions
  • Enable the initial events we want active at the start
  • Create any raw monitor locks you might need in the agent
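
The option-parsing step can be sketched in plain C. This is a minimal sketch and not the demo's parse_agent_options(): the helper name parse_max_dump, the comma-separated "key=value" format, and the fallback behavior are all assumptions made for illustration. The string being parsed is the options argument of Agent_OnLoad, which is the text following the '=' in -agentlib:NAME=options.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of heapTracker.c): scan a comma-separated
 * option string such as "maxDump=20" and return the positive integer
 * given for maxDump, or fallback when the key is absent or malformed. */
static int
parse_max_dump(const char *options, int fallback)
{
    static const char key[]  = "maxDump=";
    const size_t      keylen = sizeof(key) - 1;
    const char       *p;

    assert(fallback > 0);
    if ( options == NULL ) {
        return fallback;   /* no options were given on the command line */
    }
    for ( p = options ; *p != '\0' ; ) {
        const char *comma = strchr(p, ',');
        size_t      len   = (comma != NULL) ? (size_t)(comma - p) : strlen(p);

        if ( len > keylen && strncmp(p, key, keylen) == 0 ) {
            char *stop  = NULL;
            long  value = strtol(p + keylen, &stop, 10);

            /* Accept only if the number runs to the end of this token */
            if ( stop == p + len && value > 0 && value <= INT_MAX ) {
                return (int)value;
            }
            return fallback;
        }
        if ( comma == NULL ) {
            break;         /* that was the last token */
        }
        p = comma + 1;
    }
    return fallback;
}
```

Falling back to a default instead of exiting is just one choice. As with any error condition in an agent, you need to decide what a bad option string should do to the VM you are loaded into.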

See how simple it is? Well wait until you start handling events, it gets a bit more hairy.

Chapter 3: Event Callbacks

So if you have gotten past Agent_OnLoad and set up the proper capabilities and event requests, now what? Well, once the VM gets going, you should start seeing calls to the functions you supplied to SetEventCallbacks. In this particular case I've happened to use a naming convention of a "cb" prefix on these functions: cbVMStart, cbVMInit, cbObjectFree, cbVMObjectAlloc, cbClassFileLoadHook, and cbVMDeath. Here it's cbClassFileLoadHook we would expect to see called first, at least until the first few basic system classes are loaded, then cbVMStart and cbVMInit, which are called only once. In any case, after cbVMInit expect multiple threads to be calling cbObjectFree, cbVMObjectAlloc, cbClassFileLoadHook, and cbVMDeath. There will only be one call to cbVMDeath, but don't expect the VM to be completely calmed down at VM death; other threads could still be triggering events.

Let's look at what some of this callback code looks like. This again is taken from the JDK demo file demo/jvmti/heapTracker/heapTracker.c, available inside the Mustang or Tiger binary images. It's also available in the Mustang and Tiger source downloads, in the src/share/demo/jvmti/heapTracker directory.

JVMTI_EVENT_CLASS_FILE_LOAD_HOOK

Let's first look at cbClassFileLoadHook since it will probably be called first:

static void JNICALL
cbClassFileLoadHook(jvmtiEnv *jvmti, JNIEnv* env,
                jclass class_being_redefined, jobject loader,
                const char* name, jobject protection_domain,
                jint class_data_len, const unsigned char* class_data,
                jint* new_class_data_len, unsigned char** new_class_data) {
    enterCriticalSection(jvmti); {
        if ( !gdata->vmDead ) {
            const char * classname;
            if ( name == NULL ) {
                classname = java_crw_demo_classname(class_data, class_data_len,
                                NULL);
            } else {
                classname = strdup(name);
            }
            *new_class_data_len = 0;
            *new_class_data     = NULL;
            if ( strcmp(classname, STRING(HEAP_TRACKER_class)) != 0 ) {
                jint           cnum;
                int            systemClass;
                unsigned char *newImage;
                long           newLength;

                cnum = gdata->ccount++;
                systemClass = 0;
                if ( !gdata->vmStarted ) {
                    systemClass = 1;
                }
                newImage = NULL;
                newLength = 0;

                java_crw_demo(cnum,
                    classname,
                    class_data,
                    class_data_len,
                    systemClass,
                    STRING(HEAP_TRACKER_class),
                    "L" STRING(HEAP_TRACKER_class) ";",
                    NULL, NULL,
                    NULL, NULL,
                    STRING(HEAP_TRACKER_newobj), "(Ljava/lang/Object;)V",
                    STRING(HEAP_TRACKER_newarr), "(Ljava/lang/Object;)V",
                    &newImage,
                    &newLength,
                    NULL,
                    NULL);
                if ( newLength > 0 ) {
                    unsigned char *jvmti_space;

                    jvmti_space = (unsigned char *)allocate(jvmti, (jint)newLength);
                    (void)memcpy((void*)jvmti_space, (void*)newImage, (int)newLength);
                    *new_class_data_len = (jint)newLength;
                    *new_class_data     = jvmti_space; /* VM will deallocate */
                }
                if ( newImage != NULL ) {
                    (void)free((void*)newImage);
                }
            }
            (void)free((void*)classname);
        }
    } exitCriticalSection(jvmti);
}

Let's ignore what is happening in java_crw_demo for now; we can discuss that later in the BCI chapter. Let's just say that java_crw_demo accepts a class image and returns a class image, in memory, which is what is important here. First note the use of a critical section: we are making sure that nothing else in the agent can mess with the global static data (gdata) while we are processing this classload. Well, actually, the classload really hasn't happened yet. This event represents the time when the VM has located the classfile and read it into memory, but hasn't actually processed the image yet. With this event we can replace the bytes that represent the class image and give the VM a different class image to actually load. Now I know some of you are thinking some pretty dangerous thoughts right about now, but what you can change here is limited. You can't add methods or fields or arguments to methods, or lots of other things. The intent here is to instrument the method bytecodes, that is all, sorry to let you down. Since classloading is mostly done in a sequential manner, you could argue that the critical section was unnecessary, but it makes me feel better.

So what is special to note here? The test on gdata->vmDead is protecting us from doing anything if some other thread is trying to terminate the VM (it would be a waste of time to process classfiles). The classname being NULL is a very rare occurrence and only happens when the ClassLoader.defineClass() method is being used with a NULL name; when that happens we use a java_crw_demo library function to dig the name out of the classfile. The strcmp() on the HEAP_TRACKER_class is very important: we don't want to inject calls to the HEAP_TRACKER_class inside the HEAP_TRACKER_class itself, we wouldn't get anywhere that way. The gdata->ccount is just a way to have a unique numeric ID for every class loaded, and it is passed into the main java_crw_demo function. The last special thing to note here is the use of gdata->vmStarted. There may be a better solution to this some day, but currently the first classes loaded between Agent_OnLoad and the VM_START event are considered (for lack of a better word) "system classes". There are usually fewer than 12 of these classes, and java_crw_demo treats them specially when instrumenting them, due to the primordial nature of these classes and the state of the VM prior to the VM start event. You'll need to look into the details of java_crw_demo for more information on this.

Note that the memory allocated by the java_crw_demo library is malloc() memory, not JVM TI allocated memory. The VM gets the new class file image through the arguments new_class_data_len and new_class_data, and it's important that the memory returned back to the VM be allocated via JVM TI Allocate, which is why the malloc() allocated java_crw_demo memory is copied. The java_crw_demo code is neutral code and does not have any dependence on JVM TI or the VM, it's just a C library with normal C library dependencies.

JVMTI_EVENT_VM_START

After a few dozen system classes are loaded and the VM has been started but not fully initialized, the VM_START event is posted. At this time it's considered safe to call many JNI functions, but keep in mind that the VM has not been fully initialized and Java threads may not exist. At the VM start event, the VM is considered to be out of its primordial phase of creeping up from the muck; it now has feet. ;^)

static void JNICALL
cbVMStart(jvmtiEnv *jvmti, JNIEnv *env) {
    enterCriticalSection(jvmti); {
        jclass klass;
        jfieldID field;
        jint rc;
        static JNINativeMethod registry[2] = {
            {STRING(HEAP_TRACKER_native_newobj),
                "(Ljava/lang/Object;Ljava/lang/Object;)V",
                (void*)&HEAP_TRACKER_native_newobj},
            {STRING(HEAP_TRACKER_native_newarr),
                "(Ljava/lang/Object;Ljava/lang/Object;)V",
                (void*)&HEAP_TRACKER_native_newarr}
        };

        klass = (*env)->FindClass(env, STRING(HEAP_TRACKER_class));
        rc = (*env)->RegisterNatives(env, klass, registry, 2);
        field = (*env)->GetStaticFieldID(env, klass,
                STRING(HEAP_TRACKER_engaged), "I");
        (*env)->SetStaticIntField(env, klass, field, 1);
        gdata->vmStarted = JNI_TRUE;
    } exitCriticalSection(jvmti);
}

Besides saving the fact that the VM start event happened (gdata->vmStarted), the above code sets up the class being used for BCI, the Tracker class as it is called. First we call the JNI FindClass to get the jclass handle. Note that this could trigger a ClassFileLoadHook event, which will be ignored, or this class might already be loaded due to our earlier BCI operations on the system classes; it depends. Second, we register the native methods for the Tracker class with JNI RegisterNatives. Third, we get the jfieldID using JNI GetStaticFieldID, and lastly we assign the value 1 to that field with SetStaticIntField. All the Tracker methods which will be called by the BCI classes modified by java_crw_demo are effectively turned off by default until this field value changes to 1, so this triggers the injected Tracker method calls to actually call the native methods we registered a moment earlier. We'll talk about what happens in those native methods next, but it's important to know that we can't turn these native calls on until the VM has started, because we need the ability to call JNI functions in our native code.

TraceInfo and Tracker methods

The whole point of this agent is to find out who is allocating the most space, so at each object allocation we want to find out what the stack trace is, and then tag the object with a reference to that trace information. Inside the agent itself we have a TraceInfo structure, and a pointer to it serves as the tag for the objects (a tag is any 64-bit value). Along with the TraceInfo struct is some support code to create a hash table for quick lookups. You will probably find that the creation or lookup of the TraceInfo takes up a considerable amount of the application time when this agent is activated, so it's important that this be as fast and as efficient as possible. The basic actions here boil down to findTraceInfo:

static TraceInfo *
findTraceInfo(jvmtiEnv *jvmti, jthread thread, TraceFlavor flavor) {
    TraceInfo *tinfo;
    jvmtiError error;

    tinfo = NULL;
    if ( thread != NULL ) {
        static Trace  empty;
        Trace         trace;

        /* Before VM_INIT thread could be NULL, watch out */
        trace = empty;
        error = (*jvmti)->GetStackTrace(jvmti, thread, 0, MAX_FRAMES+2,
                            trace.frames, &(trace.nframes));
        /* If we get a PHASE error, the VM isn't ready, or it died */
        if ( error == JVMTI_ERROR_WRONG_PHASE ) {
            /* It is assumed this is before VM_INIT */
            if ( flavor == TRACE_USER ) {
                tinfo = emptyTrace(TRACE_BEFORE_VM_INIT);
            } else {
                tinfo = emptyTrace(flavor);
            }
        } else {
            check_jvmti_error(jvmti, error, "Cannot get stack trace");
            /* Lookup this entry */
            tinfo = lookupOrEnter(jvmti, &trace, flavor);
        }
    } else {
        /* If thread==NULL, it's assumed this is before VM_START */
        if ( flavor == TRACE_USER ) {
            tinfo = emptyTrace(TRACE_BEFORE_VM_START);
        } else {
            tinfo = emptyTrace(flavor);
        }
    }
    return tinfo;
}

First, if thread==NULL that usually means that VM initialization has not happened, and we won't be able to get an accurate stack trace. Calling GetStackTrace could also return an error, telling us that the VM is not in the 'live' phase, but if it returns JVMTI_ERROR_NONE, then we have a stack trace. With this stack trace we do a lookupOrEnter of this stack trace into the hash table and return a reference to a TraceInfo structure. This pointer to a TraceInfo struct will then be used as the tag on this object. All objects allocated from this stack trace will have the same tag.
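
The lookupOrEnter step is not shown above, so here is a minimal standalone sketch of the idea, assuming a chained hash table with a fixed bucket count. The names and layout are simplified stand-ins and not the demo's actual definitions: frames are plain long values instead of jvmtiFrameInfo entries, MAX_FRAMES and HASH_BUCKET_COUNT are given assumed values, the flavor argument is dropped, and the critical section the real function needs around the table is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FRAMES        6     /* assumed depth limit for this sketch */
#define HASH_BUCKET_COUNT 64    /* fixed table size, assumed value */

typedef struct Trace {
    int  nframes;
    long frames[MAX_FRAMES];    /* stand-in for jvmtiFrameInfo entries */
} Trace;

typedef struct TraceInfo {
    Trace             trace;
    long              useCount;
    struct TraceInfo *next;     /* next entry in the same bucket */
} TraceInfo;

static TraceInfo *hashBuckets[HASH_BUCKET_COUNT];

/* Simple multiplicative hash over the frames that are in use */
static unsigned
hashTrace(const Trace *trace)
{
    unsigned h = (unsigned)trace->nframes;
    int      i;

    for ( i = 0 ; i < trace->nframes ; i++ ) {
        h = h * 31u + (unsigned)trace->frames[i];
    }
    return h % HASH_BUCKET_COUNT;
}

/* Return the one TraceInfo for this stack trace, creating it if needed */
static TraceInfo *
lookup_or_enter(const Trace *trace)
{
    unsigned   h;
    TraceInfo *tinfo;

    assert(trace->nframes >= 0 && trace->nframes <= MAX_FRAMES);
    h = hashTrace(trace);
    for ( tinfo = hashBuckets[h] ; tinfo != NULL ; tinfo = tinfo->next ) {
        if ( tinfo->trace.nframes == trace->nframes &&
             memcmp(tinfo->trace.frames, trace->frames,
                    (size_t)trace->nframes * sizeof(trace->frames[0])) == 0 ) {
            return tinfo;       /* seen this trace before */
        }
    }
    tinfo = (TraceInfo*)calloc(1, sizeof(TraceInfo));
    if ( tinfo == NULL ) {
        return NULL;            /* caller decides how to handle this */
    }
    tinfo->trace   = *trace;
    tinfo->next    = hashBuckets[h];
    hashBuckets[h] = tinfo;
    return tinfo;
}
```

Two calls with the same frames return the same pointer, which is what lets every object allocated from one stack trace share a single tag value.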

A special note on scaling here. The thing to watch on memory scaling with this agent is the TraceInfo structure. There should be only one per stack trace of an allocation bytecode ('new' or newarray bytecode), but how many will that be? It depends on the application. There is also no cleanup going on here, so a very long running application could experience some problems, but only if the total number of allocation traces is very high. The hash table is also a fixed size, which could also be a problem. On thread scaling, it's the critical sections you have to watch out for. There is a critical section in the lookupOrEnter() function, but other than that, object allocations and object free events are fairly critical section free. In general, this agent has few serious scaling problems, but there is always room for improvements. There could be a separate hash table per thread, avoiding the need for the critical section, but that would increase the stress on the memory scaling issue. I'll leave this as an exercise for the reader.

JVMTI_EVENT_VM_INIT

After the VM_START event and probably a few hundred class load events, the VM will get to the fully initialized event.

static void JNICALL
cbVMInit(jvmtiEnv *jvmti, JNIEnv *env, jthread thread) {
    jvmtiError error;

    error = (*jvmti)->IterateOverHeap(jvmti, JVMTI_HEAP_OBJECT_UNTAGGED,
                                      &cbObjectTagger, NULL);
    enterCriticalSection(jvmti); {
        gdata->vmInitialized = JNI_TRUE;
    } exitCriticalSection(jvmti);
}

Besides saving the fact that we got this event (gdata->vmInitialized), there are many Java level objects that have been allocated at this point which we haven't tracked because we had to wait until the VMStart event before we turned on our Tracker classes. So we use the JVM TI IterateOverHeap to traverse the heap and tag all these objects now. We'll discuss object tagging in another chapter, but keep in mind that you can't track the objects unless you tag them. Also note that it's important for these heap iterations to be done without holding any locks.

JVMTI_EVENT_OBJECT_FREE

static void JNICALL
cbObjectFree(jvmtiEnv *jvmti, jlong tag)
{
    TraceInfo *tinfo;

    if ( gdata->vmDead ) {
        return;
    }

    /* The object tag is actually a pointer to a TraceInfo structure */
    tinfo = (TraceInfo*)(void*)(ptrdiff_t)tag;

    /* Decrement the use count */
    tinfo->useCount--;
}
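
The cast from tag to pointer in cbObjectFree is the second half of a round trip. A minimal standalone sketch of both halves follows, assuming a 64-bit signed integer (int64_t) as a stand-in for jlong and a cut-down TraceInfo. The helper names are hypothetical, and this sketch uses intptr_t where the demo code uses ptrdiff_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t tag_t;          /* stand-in for jlong in this sketch */

typedef struct TraceInfo {
    long useCount;              /* objects currently carrying this tag */
} TraceInfo;

/* Pack a TraceInfo pointer into a tag value */
static tag_t
tagFromTraceInfo(TraceInfo *tinfo)
{
    /* The pointer has to fit in the 64-bit tag */
    assert(sizeof(void*) <= sizeof(tag_t));
    return (tag_t)(intptr_t)tinfo;
}

/* Recover the TraceInfo pointer from a tag value */
static TraceInfo *
traceInfoFromTag(tag_t tag)
{
    return (TraceInfo*)(void*)(intptr_t)tag;
}

/* What an object-free handler does with the tag it is handed */
static void
onObjectFree(tag_t tag)
{
    TraceInfo *tinfo = traceInfoFromTag(tag);

    if ( tinfo == NULL ) {
        return;                 /* a tag of 0 means "untagged" */
    }
    tinfo->useCount--;
}
```

Going through an integer type that is the same width as a pointer keeps the conversion clean on both 32-bit and 64-bit builds. The NULL check is purely defensive here.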


JVMTI_EVENT_VM_OBJECT_ALLOC

static void JNICALL
cbVMObjectAlloc(jvmtiEnv *jvmti, JNIEnv *env, jthread thread,
                jobject object, jclass object_klass, jlong size)
{
    TraceInfo *tinfo;

    if ( gdata->vmDead ) {
        return;
    }
    tinfo = findTraceInfo(jvmti, thread, TRACE_VM_OBJECT);
    tagObjectWithTraceInfo(jvmti, object, tinfo);
}


JVMTI_EVENT_VM_DEATH

This is the last major VM event, but don't assume that you won't see other events once you get it. Many threads could still be triggering events during and slightly after this event.

static void JNICALL
cbVMDeath(jvmtiEnv *jvmti, JNIEnv *env) {
    jvmtiError error;

    error = (*jvmti)->ForceGarbageCollection(jvmti);
    error = (*jvmti)->IterateOverHeap(jvmti, JVMTI_HEAP_OBJECT_EITHER,
                                      &cbObjectSpaceCounter, NULL);

    enterCriticalSection(jvmti); {
        jclass              klass;
        jfieldID            field;
        jvmtiEventCallbacks callbacks;

        klass = (*env)->FindClass(env, STRING(HEAP_TRACKER_class));
        field = (*env)->GetStaticFieldID(env, klass,
                                         STRING(HEAP_TRACKER_engaged), "I");
        (*env)->SetStaticIntField(env, klass, field, 0);
        (void)memset(&callbacks,0, sizeof(callbacks));
        error = (*jvmti)->SetEventCallbacks(jvmti, &callbacks,
                                            (jint)sizeof(callbacks));
        gdata->vmDead = JNI_TRUE;
        if ( gdata->traceInfoCount > 0 ) {
            TraceInfo **list;
            int         count;
            int         i;

            stdout_message("Dumping heap trace information\n");
            list = (TraceInfo**)calloc(gdata->traceInfoCount,
                                              sizeof(TraceInfo*));
            count = 0;
            for ( i = 0 ; i < HASH_BUCKET_COUNT ; i++ ) {
                TraceInfo *tinfo;

                tinfo = gdata->hashBuckets[i];
                while ( tinfo != NULL ) {
                    if ( count < gdata->traceInfoCount ) {
                        list[count++] = tinfo;
                    }
                    tinfo = tinfo->next;
                }
            }
            qsort(list, count, sizeof(TraceInfo*), &compareInfo);
            for ( i = 0 ; i < count ; i++ ) {
                if ( i >= gdata->maxDump ) {
                    break;
                }
                printTraceInfo(jvmti, i+1, list[i]);
            }
            (void)free(list);
        }
    } exitCriticalSection(jvmti);
}

So it is time to summarize the data. First we use JVM TI to force a garbage collection, then we iterate over the heap and get a count, per stack trace, of the objects currently allocated. Then we turn the Tracker class off and disconnect all the JVM TI callbacks. Now keep in mind that some callbacks may still be active; we have only turned off any future callbacks by removing their addresses from this JVM TI environment. We could instead have disabled the events, which would do the same thing. Lastly we construct a one-dimensional list of all the TraceInfo structures, sort it by allocation amount, and then print out up to gdata->maxDump of the stack traces that allocated the most memory.
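
The sort step relies on a qsort comparator over an array of TraceInfo pointers. A minimal standalone sketch of such a comparator is below. The totalSpace field and the sortForDump wrapper are assumptions for illustration, and the demo's own compareInfo may order on different fields.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct TraceInfo {
    long totalSpace;    /* bytes attributed to this stack trace (assumed) */
} TraceInfo;

/* qsort comparator for an array of TraceInfo pointers, largest first */
static int
compareInfo(const void *p1, const void *p2)
{
    const TraceInfo *t1 = *(TraceInfo * const *)p1;
    const TraceInfo *t2 = *(TraceInfo * const *)p2;

    /* Compare instead of subtracting so large values cannot overflow */
    if ( t1->totalSpace > t2->totalSpace ) {
        return -1;
    }
    if ( t1->totalSpace < t2->totalSpace ) {
        return 1;
    }
    return 0;
}

/* Sort the list in place and return how many entries would be printed */
static int
sortForDump(TraceInfo **list, int count, int maxDump)
{
    assert(count >= 0 && maxDump >= 0);
    qsort(list, (size_t)count, sizeof(TraceInfo*), &compareInfo);
    return (count < maxDump) ? count : maxDump;
}
```

Note that qsort hands the comparator pointers to the array elements, so with an array of TraceInfo pointers each argument is really a pointer to a pointer. Forgetting that extra level of indirection is an easy mistake to make.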

Chapter 4: Object Tagging

OK, are you completely and hopelessly confused? It's easy to get that way with VM agent code. Let's get a higher level view of what is happening here. This example code is called heapTracker, and its intent is to track all Java object allocations in the heap, saving away the stack trace of where each object was allocated. Using BCI, this agent has slipped in extra bytecodes around the object allocations so that it can capture the stack trace at that time, and also tag the objects that were allocated with that stack trace. So as the VM executes your bytecodes, it is also executing the bytecodes we added, calling the Tracker methods, which will then call the native methods we have registered for the Tracker class. Those native methods create a TraceInfo struct and tag the object with that struct address.

Objects that have non-zero tags are treated differently, and we need the tag on any objects we are interested in. In this case, all objects. Only objects with tags will be seen with the OBJECT_FREE event, so the only way to know that an object really is still allocated is by way of this event. If we wanted to have a unique identification per object, we would need a unique tag value for every object. We could for example, tag an object with an integer counter, but all we would have would be that counter, representing when in allocation time an object was allocated. But if in addition that counter was used to index into some additional data about an object, then we'd have more specific data. You could envision using the counter technique, and then only track details about every 10th object, or maybe the last 1000 objects allocated. Lots of variations possible here.
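
The counter variation can be sketched in a few lines of standalone C. Everything here is hypothetical and not taken from heapTracker: the names, the sampling interval of 10, and the fixed-size details table are assumptions, and int64_t stands in for jlong. The sketch is also not MT-safe, so a real agent would need a lock or an atomic counter around it.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLE_INTERVAL 10      /* keep details for every 10th object */
#define MAX_SAMPLES     1000    /* capacity of the details table */

typedef struct ObjectDetail {
    int64_t tag;                /* counter value assigned to the object */
    long    size;               /* whatever extra data you care about */
} ObjectDetail;

static int64_t      nextTag = 1;    /* 0 is reserved, it means "untagged" */
static ObjectDetail details[MAX_SAMPLES];
static int          detailCount;

/* Return a fresh tag for a new object, recording details for samples */
static int64_t
newCounterTag(long size)
{
    int64_t tag;

    assert(size >= 0);
    tag = nextTag++;
    if ( tag % SAMPLE_INTERVAL == 0 && detailCount < MAX_SAMPLES ) {
        details[detailCount].tag  = tag;
        details[detailCount].size = size;
        detailCount++;
    }
    return tag;
}
```

Every object still gets a unique, ordered tag, but only the sampled ones cost any memory beyond the tag itself.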

The Garbage Collector of the VM has the basic task of managing allocations by compacting, re-arranging or doing whatever is necessary to reclaim space and provide the space needed for new allocations. In the process, objects get moved, which is why a particular address in the process memory isn't very helpful, and why we use tags. If you want access to a tagged object, you can get the JNI jobject handle to the object, and use any of the JNI or JVM TI calls to access that object through that JNI handle. Now someone might ask 'Why not tag an object with the jobject handle'? Sure, that's possible, hopefully with a weak reference, because a global reference would prevent all garbage collection of the objects. So far, using a jobject handle has not proven very helpful to me, but I'm sure there is a case where that might be handy.

So how is an object tagged? There is an explicit SetTag interface and you can also just assign the tag during the callback from interfaces like IterateOverHeap. Both these are used in the heapTracker.c example.

Chapter 5: BCI and BCI Events

So what does this particular Tracker class look like for heapTracker?

public class HeapTracker {
    private static int engaged = 0;
    private static native void _newobj(Object thread, Object o);
    public static void newobj(Object o) {
        if ( engaged != 0 ) {
            _newobj(Thread.currentThread(), o);
        }
    }
    private static native void _newarr(Object thread, Object a);
    public static void newarr(Object a) {
        if ( engaged != 0 ) {
            _newarr(Thread.currentThread(), a);
        }
    }
}

As you can see, there isn't much to it. It's the methods newobj() and newarr() that are called from the bytecode of the application, which in turn call the native methods, which have been registered as the functions HEAP_TRACKER_native_newobj() and HEAP_TRACKER_native_newarr().

The newobj() call only needs to be placed at the entry to java.lang.Object.<init>, and in fact that turns out to be one of the easier places to put it. We need to adjust the stack trace we are seeing to compensate for the few extra frames we get, but overall this is easier than the alternative mechanism or turning off verification completely. The injection is just a 'dup' and an 'invokestatic' bytecode insertion (along with the necessary constant pool entries for the Tracker classname and newobj() method name). The VM specification does not allow an object to be passed anywhere before it is initialized, so injecting bytecodes after every 'new' bytecode will trigger verification errors when the object is passed into newobj(). The alternative mechanism would be to find the 'new' bytecodes, add the 'dup' after each one, and then insert the 'invokestatic' bytecode after the matching <init> method call for the specific class.

If you are interested in the details of how classfiles are modified, you should look at the actual source code of the java_crw_demo library, which is available with the download of either JDK 5.0 or the new Mustang binary downloads at http://mustang.dev.java.net. In addition, there are other BCI libraries that can be used, and the JDK itself provides a way for pure Java agents to be written via the java.lang.instrument classes and the -javaagent option of JDK 5.0. (Perhaps the basis of a completely different article?)

There are probably more efficient ways to get stack traces, and I'm sure better ways to do the BCI to obtain a similar heapTracker type of functionality. It might be more efficient to also use BCI to inject bytecodes at method entry and exit, and let the Java code itself track the current stack trace (per thread), thus avoiding the overhead of having to drop into native code and call JVM TI GetStackTrace. Consider that an exercise left to the reader. :^)

Summary

As you can see, this was a particular application of JVM TI for a particular purpose. The different kinds of agents that can be written are nearly countless, as are the different ways to approach the problem. In most cases it takes experimentation and time to come up with a good solution, and sometimes that solution may not fit all applications. It's easy to create a bottleneck with too many critical sections, and it's also easy to forget how things need to scale. Figuring out where the time is being spent in an agent requires the use of native tools, like DTrace on Solaris 10 or the Analyzer.

I hope this has been helpful. Please post comments on anything you think should be added to this.
