MVM in HotSpot - Bright, Shiny Objects
By Dave on Sep 08, 2006
MVM -- the Multi-Tasking Virtual Machine technology developed by SunLabs -- is extremely useful in some environments. There's considerable debate, however, about why MVM is not part of J2SE. For balanced background reading I'd recommend starting with the MVM article on java.sun.com, Chet Haase's blog entry on MVM and Greg Czajkowski's blog. (Greg was one of the creators of MVM while he was at SunLabs.)
Briefly, the JVM can be thought of as providing a virtual machine. (That's a tautology, but bear with me.) Similarly, Java threads are vaguely analogous to virtual processors. There's no concept, however, of a process or address-space isolation in the JVM. MVM adds that missing construct under the name of task. MVM allows independent Java tasks to coexist and run safely in a single JVM without any risk that the tasks will interfere with each other. One of the key mechanisms MVM uses to provide isolation is the replication of class static data; each task binds to a private instance of class static data. MVM also leverages Java's existing type-safety properties: a thread in one task, for instance, can't forge a reference to objects used in other tasks. Most operating systems provide address-space isolation via the hardware memory-management unit (MMU), but MVM provides isolation via software type-safety. (This is actually an old idea. The Burroughs B5000 series provided full application isolation but without MMU isolation and without the concept of hardware privilege. In fact the JVM borrows a number of key concepts from the B5000, but that's a story for another day.) AppDomains in Microsoft's CLR are similar to MVM tasks. MVM offers the possibility of a number of direct benefits such as fast inter-task communication, decreased startup latency, fine-grained resource management, and improved memory footprint.
MVM is extremely useful - you might even say required - in environments where the platform doesn't provide a process-like construct. This could be because the operating system is too small or because the MMU doesn't provide address-space isolation. MVM can step in and make up for the missing platform features, enabling the user to run multiple independent Java applications. This makes MVM a good choice for small embedded environments often inhabited by J2ME. J2SE, however, typically runs on more capable systems, such as Solaris, Linux, or Windows, that already provide mature and efficient process-based isolation. At that point, we have to ask ourselves whether we want to reinvent the wheel in J2SE and try to do in user-mode what the operating system and processor already do quite well.
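To make the static-data point concrete, here is a minimal sketch of what happens in a stock JVM, where there is no task construct. The class names `SharedStaticsDemo` and `Config` are invented for illustration, and the two "tasks" are just threads. Because both bind to the same copy of the class static data, a write by one is visible to the other; this is the kind of interference that per-task replication of statics is meant to rule out.

```java
import java.util.ArrayList;
import java.util.List;

class SharedStaticsDemo {
    /** Hypothetical stand-in for any class with mutable static state. */
    static class Config {
        static String owner = "unset";
    }

    /** Runs two "tasks" back to back on separate threads and records what the second one sees. */
    static List<String> runTwoTasks() {
        List<String> seen = new ArrayList<>();
        Thread taskA = new Thread(() -> Config.owner = "task A"); // task A writes the static
        Thread taskB = new Thread(() -> seen.add(Config.owner));  // task B reads the same static
        try {
            taskA.start();
            taskA.join(); // join() makes A's write visible before B starts
            taskB.start();
            taskB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Prints "task B sees owner = task A": one copy of the static, shared by both.
        System.out.println("task B sees owner = " + runTwoTasks().get(0));
    }
}
```

Under a design that replicates statics per task, as described above, task B would instead see its own untouched copy.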
The Isolate JSR (JSR121) is often conflated with MVM. If a JVM employs MVM technology, then isolates would be the logical way to express a task. But isolates can also be implemented without MVM, using the existing task-per-process model.
Properties of MVM:
Startup: MVM provides exceptionally fast startup for the 2nd and subsequent tasks. In a sense, the JVM is already on hot standby, running some tasks, but ready to accept and start a new task without the need to create a new process and run through JVM initialization. It's worth noting, however, that class data sharing (CDS), which was developed after MVM, has improved startup performance in J2SE.
A simple alternative to MVM is to use class loader-based isolation. Under loader-based isolation each "task" runs under its own class loader instance. Application classes will have private per-task class static data. It's not perfect isolation, however, as the static variables in the system classes loaded by the boot class loader will not be replicated. That means that one task might take actions that modify class static variables in the system classes and another task could see those changes. Depending on your environment, of course, this is unlikely to be a practical concern. If you're interested in this approach you might find NailGun of use.
It's worth mentioning that MVM is not multi-user. That single process cannot normally be safely shared by multiple users, so each user would need at least one MVM process. In addition, MVM provides data isolation at the Java level, but not for native code that might be called by Java. Application-specific state could inadvertently be captured and stored by native JNI code. MVM can't and doesn't prevent that from happening.
Footprint: It's commonly thought that MVM would offer considerable footprint savings. That is, running applications A, B and C in a single MVM process will consume much less memory than running A, B and C as distinct traditional JVM processes. This sounds like a compelling benefit and so the point deserves more attention.
Let's consider the memory consumption of application A running in a traditional JVM. First, we'll have all the code associated with the JVM itself (the .text segment of jvm.dll or libjvm.so). That's already shared by virtue of the operating system, however. Next, there's mutable native data; that's unshared and largely unsharable. The same applies to thread stacks -- unshared and entirely unsharable. Next, we have plain vanilla mutable live heap data. As you'd expect that's unshared and unsharable. Critically, for a large class of enterprise applications the live heap data dominates footprint. Next we have the class file image data. The class files for rt.jar are shared to the extent possible by CDS. The in-memory class file images for application A are not currently shared in J2SE 6. Likewise, the code emitted by the just-in-time compiler (JIT) is unshared. So if we run A, B and C under MVM we won't actually enjoy much savings as compared to running three processes. Paraphrasing James Carville, "It's the heap!". MVM could share some in-memory class file data images and metadata, and potentially some emitted code, but that's it. Those components of footprint are typically just a small fraction of the heap. Very small applications (consider A, B and C as ls, cat and ps written in Java), however, would see more relative footprint benefit under MVM as they would typically have small heaps.
I should point out that we're at risk of engaging in circular reasoning. Small applications (e.g., ls or cat) might be less common in Java because of startup and footprint issues. Because the class of small applications is itself small, J2SE might continue to focus on larger enterprise-class applications, perpetuating the cycle. MVM would certainly provide excellent startup time for such small applications. In addition, for such small applications the heap wouldn't dominate footprint and the sharing fraction under MVM would be much higher. We have to beware that we don't preclude an entirely new class of applications.
Resource Control: MVM holds the promise of very fine-grained resource control. But again, this might be better handled by the host operating system or virtual machine monitor (VMM). You can achieve much the same result with existing JVM command line switches, Solaris zones or virtualization mechanisms such as VMware or Xen. Virtual machines make an excellent "container" for resource management in addition to providing checkpoint-restart capabilities that can be used to circumvent startup latency.
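Loader-based isolation can be sketched in a few lines of plain Java. This is only an illustration, not how NailGun or MVM is implemented: `LoaderIsolationDemo`, `Counter` and `TaskLoader` are made-up names, and the sketch assumes the compiled `Counter` class file can be read back as a resource from the parent loader (true for an ordinary classpath launch). Each `TaskLoader` defines its own copy of `Counter`, so each "task" gets private static data, while everything else is delegated and stays shared.

```java
import java.io.IOException;
import java.io.InputStream;

class LoaderIsolationDemo {
    /** Application class with static state; each TaskLoader gets its own copy. */
    public static class Counter {
        private static int count = 0;
        public static int increment() { return ++count; }
    }

    /** Defines Counter itself instead of delegating, so its statics are private to this loader. */
    static class TaskLoader extends ClassLoader {
        private static final String ISOLATED = Counter.class.getName();

        TaskLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(ISOLATED)) {
                return super.loadClass(name, resolve); // system classes remain shared
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) resolveClass(c);
                return c;
            }
        }

        /** Reads the class file through the parent loader (assumes it is available as a resource). */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) throw new ClassNotFoundException(name);
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Calls Counter.increment() reflectively on the copy owned by the given loader. */
    static int incrementIn(ClassLoader loader) {
        try {
            Class<?> c = Class.forName(Counter.class.getName(), true, loader);
            return (Integer) c.getMethod("increment").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader base = LoaderIsolationDemo.class.getClassLoader();
        ClassLoader taskA = new TaskLoader(base);
        ClassLoader taskB = new TaskLoader(base);
        System.out.println(incrementIn(taskA)); // 1
        System.out.println(incrementIn(taskA)); // 2
        System.out.println(incrementIn(taskB)); // 1, because task B has its own count
    }
}
```

The leak described above is easy to reproduce with the same setup: state held by classes that are not redefined per loader, for example a value set through `System.setProperty`, stays visible to every "task".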
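The argument is easier to see with numbers. The sketch below is a back-of-envelope model only: every figure is invented for illustration, the class name `FootprintEstimate` is hypothetical, and the "ideal MVM" case generously assumes the class images, metadata and JIT code of all tasks overlap completely. The JVM's own .text segment is left out because the operating system already shares it.

```java
class FootprintEstimate {
    /** One application's unshared footprint in MB. All values passed in below are made up. */
    static final class App {
        final int heapMb;        // live Java heap
        final int nativeMb;      // mutable native data plus thread stacks
        final int classAndJitMb; // in-memory class images, metadata, JIT-emitted code

        App(int heapMb, int nativeMb, int classAndJitMb) {
            this.heapMb = heapMb;
            this.nativeMb = nativeMb;
            this.classAndJitMb = classAndJitMb;
        }
    }

    /** Separate JVM processes: nothing in these three buckets is shared. */
    static int separateProcessesMb(App... apps) {
        int total = 0;
        for (App a : apps) total += a.heapMb + a.nativeMb + a.classAndJitMb;
        return total;
    }

    /** Best case for MVM: class and JIT data overlap fully, so only the largest copy is kept. */
    static int idealMvmMb(App... apps) {
        int perTask = 0;
        int shared = 0;
        for (App a : apps) {
            perTask += a.heapMb + a.nativeMb;            // heap and native data stay per task
            shared = Math.max(shared, a.classAndJitMb);  // one shared copy, sized by the largest
        }
        return perTask + shared;
    }

    static String report(String label, App... apps) {
        int separate = separateProcessesMb(apps);
        int saved = separate - idealMvmMb(apps);
        return label + ": MVM saves " + saved + " of " + separate + " MB";
    }

    public static void main(String[] args) {
        // Heap-dominated applications: prints "enterprise: MVM saves 90 of 2050 MB"
        System.out.println(report("enterprise",
                new App(800, 40, 60), new App(600, 30, 50), new App(400, 30, 40)));
        // Tiny tools such as ls, cat, ps: prints "tiny tools: MVM saves 40 of 96 MB"
        System.out.println(report("tiny tools",
                new App(4, 8, 20), new App(4, 8, 20), new App(4, 8, 20)));
    }
}
```

With these assumed figures the heap-dominated case saves under 5 percent, while the tiny-tools case saves roughly 40 percent, which is the shape of the trade-off described above.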
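As a rough illustration of the command-line-switch route, the sketch below launches each "task" as its own JVM process with a heap cap set by `-Xmx`. The class name `PerTaskJvmLauncher` and the 64 MB limit are arbitrary choices for the example, and `-version` stands in for a real main class. It assumes the `java` launcher lives under `java.home/bin`, which holds for standard JDK layouts.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class PerTaskJvmLauncher {
    /** Builds the command line for a child JVM whose heap is capped at maxHeapMb. */
    static List<String> command(int maxHeapMb, List<String> appArgs) {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(java);
        cmd.add("-Xmx" + maxHeapMb + "m"); // per-task heap limit enforced by the child JVM
        cmd.addAll(appArgs);               // main class and arguments would go here
        return cmd;
    }

    /** Starts the child process, waits for it to finish, and returns its exit code. */
    static int run(List<String> cmd) {
        try {
            Process child = new ProcessBuilder(cmd).inheritIO().start();
            return child.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "-version" is a placeholder workload; a real task would name its main class instead.
        int exit = run(command(64, List.of("-version")));
        System.out.println("child JVM exited with " + exit);
    }
}
```

Isolation and memory accounting here come from the operating system's process model rather than from anything inside the JVM, which is the trade the paragraph above is pointing at.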
MVM is great technology, and in fact I'd claim it's mandatory for J2ME on smaller platforms. It's arguably superfluous for J2SE, however, as the underlying operating systems already provide safe and efficient page-level sharing for the major sharable components in the JVM's address space. If you're interested in MVM-like capabilities -- particularly startup -- I'd recommend giving NailGun a try. Chris Oliver's JApplication takes an even leaner approach to the problem. Also take a look at Steve Bohne's blog entry, which helps dispel some of the myths around the JVM's memory footprint. For an alternate viewpoint see MVM Fact and Fiction.