Wednesday May 09, 2007

Moving Dynamic RMI-IIOP to codegen

The GlassFish ORB's features include dynamic RMI-IIOP, which generates RMI-IIOP stubs at runtime without using rmic -iiop. I recently changed how the ORB generates dynamic stubs. The code previously used BCEL for code generation. The old code is fairly complicated, reflecting the level of detail needed to generate a class using BCEL (or any other low-level bytecode generation framework).

Due to its length, I'll only include a small fragment of the current code that generates stubs. Here is the code that generates the writeReplace method on each stub for RMI-IIOP (BCEL version):

private void createWriteReplace() {
    InstructionList il = new InstructionList();
    MethodGen method = new MethodGen(ACC_PRIVATE, Type.OBJECT, Type.NO_ARGS,
        new String[] {}, "writeReplace", className, il, constantPoolGen);
    il.append( instructionFactory.createLoad( Type.OBJECT, 0 ) ) ;
    il.append( instructionFactory.createInvoke( className, "selfAsBaseClass",
        Type.OBJECT, Type.NO_ARGS, Constants.INVOKEVIRTUAL ) ) ;
    il.append( instructionFactory.createReturn( Type.OBJECT ) ) ;

    finalizeMethod( classGen, il, method ) ;
}

In the new version, this is replaced with:

_method( PRIVATE, _Object(), "writeReplace" ) ;
_body() ;
    _return(_call(_this(), "selfAsBaseClass" )) ;
_end() ;

which is not much longer than the equivalent Java code:

private Object writeReplace() {
    return selfAsBaseClass() ;
}

Here is the main part of the new code for generating dynamic RMI-IIOP stubs:
(className, superClass, interfaces, and methods are private data members set up
by the constructor)

public Class<?> create( ProtectionDomain pd, ClassLoader cl,
    boolean debug, PrintStream ps ) {

    Pair<String,String> nm = splitClassName( className ) ;

    _clear() ;
    _package( nm.first() ) ;
    _class( PUBLIC, nm.second(), superClass, interfaces ) ;

    _constructor( PUBLIC ) ;
    _body() ;
    _end() ;

    _method( PRIVATE, _Object(), "writeReplace" ) ;
    _body() ;
        _return(_call(_this(), "selfAsBaseClass" )) ;
    _end() ;

    int ctr=0 ;
    for (MethodInfo method : methods)
        createMethod( ctr++, method ) ;

    _end() ; // of _class

    return _generate( cl, pd, debug ? debugProps : emptyProps, ps ) ;
}

private static final Type objectArrayType = Type._array(_Object()) ;

private static void createMethod( int mnum, MethodInfo method ) {
    Type rtype = method.returnType() ;
    _method( method.modifiers() & ~ABSTRACT, rtype, ;   
    List<Expression> args = new ArrayList<Expression>() ;
    for (Variable var : method.arguments() )
        args.add( _arg( var.type(), var.ident() ) ) ;
    _body() ;
        List<Expression> wrappedArgs = new ArrayList<Expression>() ;
        for (Expression arg : args) {
            wrappedArgs.add( Primitives.wrap( arg ) ) ;
        }

        Expression invokeArgs = _define( objectArrayType, "args",
            _new_array_init( _Object(), wrappedArgs ) ) ;

        // create expression to call the invoke method
        Expression invokeExpression = _call(
            _this(), "invoke", _const(mnum), invokeArgs ) ;

        // return result if non-void
        if (rtype == _void()) {
            _expr( invokeExpression ) ;
            _return() ;
        } else {
            Expression resultExpr = _define( _Object(), "result", invokeExpression ) ;
            if (rtype != _Object()) {
                // Must cast resultExpr to expected type, or unwrap won't work!
                Type ctype = Primitives.getWrapperTypeForPrimitive( rtype ) ;
                Expression cexpr = _cast( ctype, resultExpr ) ;
                _return( Primitives.unwrap( cexpr ) ) ;
            } else {
                _return( resultExpr ) ;
            }
        }
    _end() ; // of method
}
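
To make the generator above more concrete, here is a hand-written sketch of the kind of Java methods it describes: box the arguments into an Object[], call invoke with the method number, then either discard the result (void), cast and unwrap it (primitive), or return it as-is (Object). The names SketchStubBase, Counter, and HandWrittenCounterStub are hypothetical, and the invoke( int, Object[] ) signature is only inferred from the _call above; the real stub base class may well differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the real stub base class; only invoke() matters here.
abstract class SketchStubBase {
    // Dispatches the call identified by methodNumber with already-boxed arguments.
    protected abstract Object invoke(int methodNumber, Object[] args);
}

// Hypothetical remote interface, used only for illustration.
interface Counter {
    int add(int a, int b);   // primitive return: result must be cast and unwrapped
    void reset();            // void return: result is discarded
    Object describe();       // Object return: result is returned unchanged
}

// What the three branches of createMethod would correspond to in plain Java.
class HandWrittenCounterStub extends SketchStubBase implements Counter {
    final List<String> log = new ArrayList<>();

    @Override
    protected Object invoke(int methodNumber, Object[] args) {
        // Fake local dispatcher so the sketch runs without an ORB.
        log.add(methodNumber + ":" + Arrays.toString(args));
        switch (methodNumber) {
            case 0:
                int a = (Integer) args[0];
                int b = (Integer) args[1];
                return a + b;          // autoboxed to Integer
            case 2:
                return "counter";
            default:
                return null;
        }
    }

    public int add(int a, int b) {
        Object[] args = new Object[] { Integer.valueOf(a), Integer.valueOf(b) };
        Object result = invoke(0, args);
        return ((Integer) result).intValue();   // cast, then unwrap
    }

    public void reset() {
        Object[] args = new Object[] {};
        invoke(1, args);
    }

    public Object describe() {
        Object[] args = new Object[] {};
        Object result = invoke(2, args);
        return result;
    }
}
```

Calling add(2, 3) on this stub goes through invoke( 0, ... ) with two Integer arguments and unwraps the Integer result back to an int.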

Each method that starts with an underscore is a static method in a single class. All of these methods are imported using the Java 5 static import feature (which I normally avoid, but it's really useful here to reduce the amount of code needed). This example does not use it, but an _import( String ) method is also provided, which returns a Type that can be used as needed for generating code. I've also written another class (Primitives) that provides useful methods for generating the wrap/unwrap code needed to move primitive data types into and out of objects. Wrapping/unwrapping is extremely common in all sorts of proxy generation. Of course, that's really what the code generation for dynamic RMI-IIOP is: a particular kind of proxy.
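
The same wrap/unwrap pattern shows up in the JDK's own java.lang.reflect.Proxy, which is a handy way to see it without any code generation. The sketch below is not codegen or ORB code; Scaler and BoxingProxyDemo are hypothetical names. The handler receives primitive arguments already boxed in an Object[], and the proxy unboxes the returned Long back to a long.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical interface with primitive parameters and a primitive return type.
interface Scaler {
    long scale(int value, long factor);
    String argTypes(int a, boolean b);
}

final class BoxingProxyDemo {
    static Scaler newScaler() {
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "scale":
                    // args[0] arrives as an Integer and args[1] as a Long.
                    int value = (Integer) args[0];
                    long factor = (Long) args[1];
                    return value * factor;   // boxed to Long, unboxed by the proxy
                case "argTypes":
                    // Report the runtime classes of the boxed arguments.
                    return args[0].getClass().getSimpleName() + ","
                        + args[1].getClass().getSimpleName();
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (Scaler) Proxy.newProxyInstance(
            Scaler.class.getClassLoader(), new Class<?>[] { Scaler.class }, handler);
    }

    public static void main(String[] unused) {
        Scaler s = newScaler();
        System.out.println(s.scale(3, 4L));
        System.out.println(s.argTypes(1, true));
    }
}
```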

The basic structure of a class defined by codegen is:

_package( String ) (or no String for the global package)
_import( String ) as desired
_class( int, String, Type, Type... )

_data to introduce class data

_constructor( int, Type... ) to start a constructor
_method( int, Type, String, Type... ) to start a method

Either _constructor or _method can be followed by _arg( Type, String ) to add arguments.

_body() introduces the main body of the constructor or method. if, while, try/catch/finally, switch, and most Java expressions are supported.

_end() ends the enclosing statement, method, or class.

This allows seamless mixing of code that generates code with the computations necessary to generate the code. An expression is just an instance of an ordinary Java type (Expression), so it can be stored in variables, collected in lists, and passed to other methods like any other object.

Internally the _xxx method calls build a stack of state machines representing the context used to generate an AST.  The AST is fairly conventional, and uses Visitors to generate source code, decorate the AST, generate byte code, and perform diagnostic and utility functions such as dumping the AST. 
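
As a rough illustration of that design (and not the actual codegen implementation), here is a much-reduced sketch: static _xxx methods push and pop contexts on a stack while building a tiny AST, and a Visitor walks the finished AST to emit source. Every name here (MiniGen, ClassNode, SourceVisitor, and so on) is hypothetical, and simple instanceof checks on the top of the stack stand in for the real state machines.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical visitor over a three-node AST.
interface NodeVisitor {
    void visitClass(ClassNode c);
    void visitMethod(MethodNode m);
    void visitReturn(ReturnNode r);
}

abstract class Node {
    abstract void accept(NodeVisitor v);
}

class ClassNode extends Node {
    final String name;
    final List<MethodNode> methods = new ArrayList<>();
    ClassNode(String name) { this.name = name; }
    void accept(NodeVisitor v) { v.visitClass(this); }
}

class MethodNode extends Node {
    final String returnType;
    final String name;
    final List<ReturnNode> body = new ArrayList<>();
    MethodNode(String returnType, String name) { this.returnType = returnType; this.name = name; }
    void accept(NodeVisitor v) { v.visitMethod(this); }
}

class ReturnNode extends Node {
    final String expression;
    ReturnNode(String expression) { this.expression = expression; }
    void accept(NodeVisitor v) { v.visitReturn(this); }
}

// Static builder methods; the stack records which construct is currently open.
final class MiniGen {
    private static final Deque<Node> contexts = new ArrayDeque<>();

    static void _class(String name) {
        if (!contexts.isEmpty()) throw new IllegalStateException("_class must be outermost");
        contexts.push(new ClassNode(name));
    }

    static void _method(String returnType, String name) {
        if (!(contexts.peek() instanceof ClassNode)) throw new IllegalStateException("_method outside _class");
        contexts.push(new MethodNode(returnType, name));
    }

    static void _return(String expression) {
        if (!(contexts.peek() instanceof MethodNode)) throw new IllegalStateException("_return outside _method");
        ((MethodNode) contexts.peek()).body.add(new ReturnNode(expression));
    }

    // Closes the innermost open construct; a finished method is attached to its class.
    static Node _end() {
        Node done = contexts.pop();
        if (done instanceof MethodNode) ((ClassNode) contexts.peek()).methods.add((MethodNode) done);
        return done;
    }
}

// One visitor among several possible ones; this one emits Java source text.
class SourceVisitor implements NodeVisitor {
    private final StringBuilder out = new StringBuilder();

    public void visitClass(ClassNode c) {
        out.append("class ").append(c.name).append(" {\n");
        for (MethodNode m : c.methods) m.accept(this);
        out.append("}\n");
    }

    public void visitMethod(MethodNode m) {
        out.append("    ").append(m.returnType).append(' ').append(m.name).append("() {\n");
        for (ReturnNode r : m.body) r.accept(this);
        out.append("    }\n");
    }

    public void visitReturn(ReturnNode r) {
        out.append("        return ").append(r.expression).append(";\n");
    }

    String source() { return out.toString(); }
}
```

A bytecode-generating visitor would implement the same NodeVisitor interface and walk the same tree, which is the main attraction of separating AST construction from output.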

I have not compared performance of the two approaches, as it is not very relevant for most applications.  A class is generated once in codegen, then re-used many times (typically until an EJB is undeployed in the app server case).  This one-time cost is insignificant in most cases.

The visitor that generates bytecode uses the ASM package for low-level bytecode generation. I used ASM for a number of reasons:

  1. After using BCEL and examining ASM, I found ASM to be easier to use in general, and also for use in codegen.
  2. ASM is considerably smaller than BCEL.
  3. ASM seems to be more actively maintained, and is actively tracking new developments in the Java VM.

Currently the ORB maintains local copies of both BCEL and ASM, renamed using the ORB build rename mechanism to avoid collision with other copies elsewhere in the codebase (there are at least 3 copies of BCEL in various places, all under different names!).  The renaming is done to avoid versioning problems.  For example, the ORB and the EJB persistence code at one point used different versions of ASM. Removing BCEL from the ORB will reduce the ORB's footprint (in the optorbcomp.jar file) by around 600KB.  I plan to integrate these changes into GlassFish v2 in the near future. Build 44 or 45 should include dynamic RMI-IIOP using codegen by default; a later build will remove BCEL completely from the ORB.

I'll discuss codegen further in a future blog entry.



