    March 18, 2014

the isthmus in the VM

John Rose

This is a good time to consider new options for a “native interconnect”
between code managed by the JVM and APIs for libraries not managed by
the JVM.

Notably, Charles Nutter has followed up on his
JVM Language Summit talk (video on this page)
by proposing JEP 191,
to provide a new foreign function interface for Java.

To access native data formats (and/or native-like ones inside the JVM),
there are several projects under way
including David Chase’s data layout package,
Marcel Mitran’s packed object proposal,
and Gil Tene’s object layout project.

This article describes some of the many questions related to native
interconnect, along with some approaches for solving them.
We will start Project Panama in OpenJDK to air out these questions
thoroughly, and do some serious engineering to address them, for the JDK.

Let us use the term native interconnect for connections between
the JVM and “native” libraries and their APIs.
By “native” libraries I simply mean those routinely used by
programmers of statically compiled languages outside the JVM.

the big goal

I think the general, basic, idealistic goal is something like this:

If non-Java programmers find some library useful and easy to
access, it should be similarly accessible to Java programmers.

That ideal is easy to state but hard to carry out.

The fundamental reason is simple—the languages are different.
C++ programmers use the #include statement for pulling in APIs,
but it would be deeply misguided to try to add #includes to
the Java language.
For more details on how language differences affect
native interconnect, see the discussion below.
Happily, this is not completely new ground, since managed languages
(including Lisp, Smalltalk, Haskell, Python, Lua, and more)
have a rich history of support for native interconnect.

Most subtly, even if all the superficial differences could be
adjusted, the rules for safe and secure usage of Java differ
from those of the native, statically-compiled languages.
There is a range of choices for ensuring that a native
library gets safely used. The main two requirements are to
make VM-damaging errors very rare,
and (as a corollary) to make intentional attacks very difficult.
We will get into more details below.

Besides safety, Java has a distinctive constellation of
“cultural” values and practices,
notably the features which provide safety and error management.
So, the access to C APIs must be adapted to the client language
(Java) by means of numerous delicate compromises and engineering
choices to preserve not only the “look and feel” of Java expressions
but also their deeper cultural norms.
By using the metaphor of culture, I don’t imagine a “Java way of life”,
but I observe that there are “Java ways” of coding, which differ
interestingly from other ways of coding.
Cultural awareness becomes salient when cultures meet and mix.

Anyway, to get this done, we need to build a number of different
artifacts, including Java libraries, JVM support
mechanisms, tools, and format specifications.
A number of possibilities are enumerated below.

why this is difficult

First, let’s survey some of the main challenges to full native interconnect.

  1. syntax: Since the languages differ, Java user code for a native
    API will differ in syntax from the corresponding native user code,
    sometimes surprisingly.
    For example, Java 8 lambdas are very different in detail from C
    function pointers, although they sometimes have corresponding uses.
    Java has no general notions corresponding to C macros or C++ templates.

  2. naming: Different languages have different rules for identifier
    formation, API scoping (packages vs. namespaces), and API element naming.
    Languages even have differing kinds of names: Java has distinct name
    spaces for fields and methods, while C++ has just members.

  3. data types: Basic data types differ.
    Booleans, characters, strings, and arrays differ between the languages.
    C++ uses pointers, sometimes for information hiding, sometimes for
    structurally transparent data. Java uses managed references, which
    always have some hidden structure (the object header). And so on.
    A user-friendly Java interconnect to a native API needs to adjust
    the types of API arguments and return values to reduce surprises.

  4. storage management: Many native libraries operate through
    pointers to memory, and they provide rules for managing that memory’s
    lifetime. Java and native languages have very distinct tactics for this.
    Java uses garbage collection and C++ libraries usually require manual
    storage management.
    Even if C++ were to add garbage collection, the details would
    probably be difficult to reconcile.
    A safe Java interconnect to a native API needs to manage native storage
    in a way that cannot crash the JVM.

  5. exceptions: As with storage management, languages differ in
    how they handle error conditions. C++ and Java both have exceptions,
    but they are used (and behave) in very different ways.
    For example, C++ does not mandate null pointer exceptions.
    C APIs sometimes require ad hoc polling for errors.
    A user-friendly Java interconnect to a native API needs a clear story
    for producing exceptions, which is somehow derived from the
    native library’s notion of error reporting.

  6. other semantics: Java’s strings are persistent (used to be called
    “immutable”) while C’s strings are directly addressable character arrays
    which can sometimes change. (And C++ strings are yet another thing.)

  7. performance: Code which uses Java primitives performs on a par
    with corresponding C code, but if an API exchanges information using other
    types, including strings, boxing or copying can cause performance
    “potholes”. I expect that value types will narrow the gap eventually
    for other C types, but they are not here yet.

  8. safety: I’m putting this last, but it is the most difficult and
    important thing to get right. It deserves its own list of issues,
    but the gist of it is that the JVM as a whole must continue to operate
    correctly even in the face of errors or abuse of any single API.
    The next section examines this requirement in detail.
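
To make items 6 and 7 of the list above a little more concrete, here is a minimal sketch of the copying that a binding typically performs when a Java String crosses into an API that expects a NUL-terminated C string. The names CStrings, toCString, and fromCString are hypothetical, UTF-8 is an assumed encoding, and a real binding would copy into native memory rather than a Java array.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper; not part of any existing binding library. */
final class CStrings {
    private CStrings() {}

    /** Copies a Java String into a fresh NUL-terminated UTF-8 byte array. */
    static byte[] toCString(String s) {
        if (s.indexOf('\0') >= 0) {
            // An embedded NUL would silently truncate the string on the C side.
            throw new IllegalArgumentException("embedded NUL character");
        }
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(utf8, utf8.length + 1); // the extra trailing byte is 0
    }

    /** Copies the bytes before the first NUL into a new Java String. */
    static String fromCString(byte[] buf) {
        int len = 0;
        while (len < buf.length && buf[len] != 0) {
            len++;
        }
        if (len == buf.length) {
            throw new IllegalArgumentException("missing NUL terminator");
        }
        return new String(buf, 0, len, StandardCharsets.UTF_8);
    }
}

class CStringsDemo {
    public static void main(String[] args) {
        byte[] c = CStrings.toCString("h\u00e9llo");
        System.out.println(c.length);                 // 7: six UTF-8 bytes plus the NUL
        String before = CStrings.fromCString(c);
        c[0] = 'j';                                   // the C side may mutate its copy
        System.out.println(before.equals(CStrings.fromCString(c))); // false
    }
}
```

Each crossing allocates and copies, which is exactly the kind of performance "pothole" mentioned in item 7.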

safety first

The JVM as a whole must continue to operate correctly when
native APIs are in use by various kinds of users.

  1. no attacks from untrusted code: Untrusted code
    must not be allowed to subvert the correct operation of the JVM,
    even if it makes very unusual requests of native APIs available to it.
    This implies that many native APIs must be made inaccessible to untrusted code.

  2. no privilege escalation from untrusted code: Untrusted users
    should not be able to access files, resources, or Java APIs via
    native APIs, if they would not already have access to them via Java code.

  3. no crashes: It must be difficult for ordinary user code, and
    impossible for untrusted code, to crash the JVM using a native
    API. Native API calls which might lead to unpredictable behavior
    must be detected and prevented in Java code, preferably by throwing
    exceptions. Pointers to native memory must be checked for null
    before all use, and discarded (e.g., set to null) when freed.

  4. no leaks: It must be difficult or impossible for ordinary user
    code to use a native API to use memory or other system resources in a
    way that they cannot be recovered when the user code exits.
    Native resources must be used in a manner that is scoped.

  5. no hangs: It must be difficult or impossible for ordinary user
    code to cause deadlocks or long pauses in system execution.
    Pauses for JVM housekeeping, like garbage collection, must not be
    noticeably lengthened because of waits for threads running native code.

  6. rare outages: Even if code is partially or fully trusted, errors
    that might lead to crashes, leaks, or hangs must be detected before
    they cause the outage, almost always.

  7. no unguarded casts: If privileged Java code must use cast-like
    operators to adjust its view of native data or functions, the casting
    must be done only after some kind of check has proven that the cast
    will be valid.
    This implies that native data and functions must be accessed through
    Java APIs that fully describe the native APIs and can mechanically check
    their use.
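
The "no unguarded casts" rule (item 7 above) can be sketched as follows. This is only an illustration under stated assumptions: PointView is a hypothetical class, the C type is assumed to be struct point { int x; int y; } with two 4-byte ints and no padding, and a direct ByteBuffer stands in for native memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical checked view of a C "struct point { int x; int y; }". */
final class PointView {
    static final int SIZE = 8;      // assumed layout: two 4-byte ints, no padding
    private final ByteBuffer mem;   // private slice; callers never see raw memory

    private PointView(ByteBuffer mem) { this.mem = mem; }

    /** The "cast": succeeds only after the region is proven large enough and aligned. */
    static PointView at(ByteBuffer region, int offset) {
        if (offset < 0 || offset % 4 != 0 || offset > region.capacity() - SIZE) {
            throw new IllegalArgumentException("region cannot hold a point at offset " + offset);
        }
        ByteBuffer dup = region.duplicate();
        dup.clear();                       // reset position and limit on the duplicate only
        dup.limit(offset + SIZE);
        dup.position(offset);
        return new PointView(dup.slice().order(ByteOrder.nativeOrder()));
    }

    int x() { return mem.getInt(0); }
    int y() { return mem.getInt(4); }
    void set(int x, int y) { mem.putInt(0, x).putInt(4, y); }
}

class PointViewDemo {
    public static void main(String[] args) {
        ByteBuffer region = ByteBuffer.allocateDirect(16);  // stands in for native memory
        PointView p = PointView.at(region, 8);
        p.set(3, 4);
        System.out.println(p.x() + "," + p.y());            // prints 3,4
        try {
            PointView.at(region, 12);                       // would run past the end
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Once the check in at() has passed, the accessors cannot reach outside the eight bytes of the slice, so no further checks are needed at each use.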

From these observations, it is evident that there are at least three trust
levels that are relevant to native interconnect: untrusted, normal,
and privileged.

Java enforces configurable security policies on untrusted code, using
APIs like the security manager. This ensures that untrusted code
cannot break the system (or elevate privileges) even if APIs are

Normal code is the sort of code which can run in a JVM without a
security manager set. Such code might be able to damage the JVM,
using APIs like sun.misc.Unsafe, but will not do so by accident.
As a practical way to reduce risk, we can search normal code
for risky operations, which should be isolated, and review their use
for safety.

I think many of the tricky details of native interconnect are related
to this concept of privileged code. Any system like the JVM that
enforces safety invariants or access restrictions has trusted,
privileged code that performs unsafe or all-access operations,
such as file system access, on behalf of other kinds of code.

Put another way, privileged code is expected to be in the risky business.
It is engineered with great care to conform to safety and security policies.
It supports requests from non-privileged code—even untrusted code—after
access checks on behalf of the requester.
Privileged code needs maximum access to native APIs of the
underlying system, and must use them in a way that does not propagate
that access to other requesters.

engineering privileged wrapper code

In the present discussion, we can identify at least two levels of binding
from Java code to native APIs: a privileged “raw access” to most or
all API features, and a wrapped access that provides safety guarantees
that match the cultural expectation of Java programmers.

So let’s examine the process of engineering the wrapper code
that stands between normal Java users and native APIs.

In current implementations of the JDK, native APIs are wrapped
in hand-written JNI wrapper code, written in C.
In particular, all C function calls are initiated from JNI wrappers.

(There is plenty of other privileged code written both in Java and C++.
Much Java code in packages under java.lang and sun is privileged
in some way. Most of it is not relevant to the present subject.)

Ideally, wrapper code should be constructed or checked mechanically when possible.
In the present system, the javah tool assists, slightly, in bridging between
Java APIs and JNI code. JNI wrapper code is checked by the native C compiler.
And that is about all. Surely Java-centered tools could do more.

On the other hand, as we saw above, bringing the languages together
is hard.
No tool can erase the cultural differences between Java and
native languages. There will always be ad hoc adjustment to reduce or
remove hazards from native APIs. Such adjustments will usually be
engineered by hand in privileged code, as they are today in JNI
wrapper code.

We must ask ourselves, why bother to build new mechanisms for
native interconnect when JNI wrappers already do the job?
If manual coding will always be required, perhaps it is better to do
the coding in the native language, where (obviously) the native APIs
are most handy. In that case, there would be no need for Java
code ever to perform unsafe operations. Isn’t this desirable?

I think the general answer is that we can improve on the trade-offs provided
by the present set of tools and procedures. Specifically, by using more
Java-centered tools and procedures, we can improve performance.
Independently of performance, we can also decrease the engineering
costs of safety.

better performance without compromising safety

Safety will always trade against performance, but—as Java has proven
over its lifetime—it is possible with care to formulate and optimize
safety checks that do not interfere unacceptably with performance.

Classic JNI performance is relatively poor, and some of the reasons
are inherent in its design. JNI wrappers are created and maintained
by hand, which means that the JVM cannot “see into” them for
optimizing them.

If the JNI wrappers were recoded in Java (or some other transparent representation)
then the JVM could much better optimize the enforcement of safety checks.
For example, a program containing many JNI calls could be reorganized as
one which grouped the required safety checks (and other housekeeping)
into a smaller number of common blocks of code.
These blocks could then be optimized, amortizing the cost of safety
checks across many JNI calls.
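
Here is a hand-written sketch of the kind of regrouping described above, which a JIT could in principle perform automatically once the checks are visible to it. GuardedArray is a hypothetical class; a Java array stands in for native memory, and the code assumes single-threaded use (so the block cannot be freed in the middle of the loop).

```java
/** Hypothetical stand-in for native memory guarded by a liveness flag. */
final class GuardedArray {
    private final int[] cells;      // stands in for a block of native memory
    private boolean freed = false;

    GuardedArray(int length) { cells = new int[length]; }

    void free() { freed = true; }

    void set(int i, int v) { check(i); cells[i] = v; }

    /** Per-call checking: every access pays for a liveness and a bounds check. */
    int get(int i) { check(i); return rawGet(i); }

    /** Amortized checking: one liveness check and one range check cover the whole loop. */
    long sum(int from, int to) {
        if (freed) throw new IllegalStateException("use after free");
        if (from < 0 || to > cells.length || from > to) {
            throw new IndexOutOfBoundsException(from + ".." + to);
        }
        long total = 0;
        for (int i = from; i < to; i++) {
            total += rawGet(i);     // no explicit check here; covered by the checks above
        }
        return total;
    }

    private void check(int i) {
        if (freed) throw new IllegalStateException("use after free");
        if (i < 0 || i >= cells.length) throw new IndexOutOfBoundsException(String.valueOf(i));
    }

    private int rawGet(int i) { return cells[i]; }   // imagine a raw native load here
}

class GuardedArrayDemo {
    public static void main(String[] args) {
        GuardedArray a = new GuardedArray(4);
        for (int i = 0; i < 4; i++) a.set(i, i + 1);
        System.out.println(a.sum(0, 4));             // prints 10
    }
}
```

Both get() in a loop and sum() give the same answers and reject the same misuse; the difference is only in how often the checks run.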

Analogous optimizations of lock coarsening or boxing elimination are
possible because all the operations are fully transparent to the JVM.
By comparison, there is much unnecessary overhead around native calls today.

This sort of optimization is routine when the thing being called can be
broken down into analyzable parts by the JIT compiler.
But C-coded JNI wrappers are totally opaque to it.
The same is currently true of the wrappers created by JNR,
but they are regular enough in structure that the JIT can
begin to optimize them.

In my opinion, a good goal is to continue opening up the
representation of native API calls until the optimized
JIT code for a native API call is, well, optimal.
That is, it can and should consist of a direct call
to the native API, surrounded by a modest amount of housekeeping,
and all inlined and optimized with the client Java code.

Making this happen in the compiler will require certain design
adjustments. Specifically, the metadata for the native API
must be provided in a form suitable for both the JVM interpreter
and compiler.
More precisely, it must support both execution by the JVM interpreter
and/or first-level JIT, and also optimizing compilation by the full JIT.
This implies that the native API metadata must contain some of the
same kind of information about function and data shape that a C compiler
uses to compile calls within C code.
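
As one small example of such "data shape" information, the sketch below computes field offsets for a C struct the way a C compiler would on a typical 64-bit ABI. StructLayout is a hypothetical class, and the rule that each scalar is aligned to its own size is an assumption that holds on common platforms but is not universal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical layout metadata for a C struct; sizes assume a typical 64-bit ABI. */
final class StructLayout {
    private final Map<String, Integer> offsets = new LinkedHashMap<>();
    private int size = 0;
    private int maxAlign = 1;

    /** Appends a scalar field whose alignment is assumed to equal its size. */
    StructLayout field(String name, int sizeAndAlign) {
        size = roundUp(size, sizeAndAlign);   // insert padding before the field
        offsets.put(name, size);
        size += sizeAndAlign;
        maxAlign = Math.max(maxAlign, sizeAndAlign);
        return this;
    }

    int offsetOf(String name) {
        Integer off = offsets.get(name);
        if (off == null) throw new IllegalArgumentException("no such field: " + name);
        return off;
    }

    /** Total size, including tail padding so that arrays of the struct stay aligned. */
    int sizeOf() { return roundUp(size, maxAlign); }

    private static int roundUp(int n, int align) { return (n + align - 1) / align * align; }
}

class StructLayoutDemo {
    public static void main(String[] args) {
        // struct { char tag; int count; double value; }
        StructLayout s = new StructLayout().field("tag", 1).field("count", 4).field("value", 8);
        System.out.println(s.offsetOf("tag") + " " + s.offsetOf("count") + " "
                + s.offsetOf("value") + " size=" + s.sizeOf());  // prints 0 4 8 size=16
    }
}
```

Metadata of this kind is plain data, so both an interpreter and an optimizing JIT could consume it.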

lower engineering costs for safety

I also think that coding more wrapper logic in Java instead of C will
provide more correctness at a lower engineering cost.
Although wrapper code in C has the advantage of direct access
to native APIs, the code itself is difficult to write and to review for
safety. C programmers can create errors such as unsafe casts in a few
benign-looking keystrokes.
C-oriented tools can flag potential errors, but they are not designed
to enforce Java safety norms.

If direct access to C APIs were available to Java code, all other
aspects of wrapper engineering would be simpler and easier to
verify as correct.
Java code is safer and more verifiable than C code.
If written by hand, it is often more compact and simple than
corresponding C code.
Routine aspects of wrapper engineering could be specified declaratively,
using specialized tools to generate Java code or bytecode automatically.
Whether Java wrapper code is created manually or automatically, it is
subject to layers of safety checking (verifying and dynamic linking)
that C code does not enjoy.
And Java code (both source files and class files) can be easily inspected
by tools such as FindBugs.

The strength of such an automated approach can be seen in the
work noted by JEP 191, the excellent JNR project.
For a quick look at a “hello world” type example from JNR,
see Getpid.java.
Although the emphasis of JNR is on function calling,
integrated native interconnect to functions, data, and types
is also possible.

Side note:
My personal favorite example of automated language integration
is an old project that integrated C++ and Scheme
on Solaris.
The native interconnect was strong enough in that system to
allow full interactive exploration of C++ APIs using the Scheme
interpreter. That was fun.

One way we can improve on the safe use of these prior technologies is
to provide more mechanical infrastructure for reasoning about the
safety of Java application components.
It should be possible to create wrapper libraries that internally use
unsafe native APIs but reliably block their users from accessing those APIs.
To me this feels like a module system design problem.
In any case, it must be possible to correctly label, track, review,
and control both unsafe code and the wrapper code that secures it.

wrapper tactics

A likely advantage of Java-based wrappers is easier access to good
engineering tactics for wrapping native APIs.
Here are a few examples of such tactics:

  • exception conversion: Error reporting conventions specific
    to native languages or APIs can be converted to Java exceptions.
  • pointer handles: Native pointers which can or must be freed
    can be stored in Java wrapper objects which nullify the saved
    pointer when it is freed, and check for this state as needed.
  • wrapper objects: Native data can be encapsulated inside Java
    objects to mediate access by providing a safe view.
    The object can use an internal handle field to manage native lifetime.
  • (Future wrapper values: In cases where stateless wrappers can
    do the job, value types are likely to provide cheaper
    encapsulation in the future. This would be the case with primitive
    types not in Java, such as unsigned long or platform specific vectors.
    When native lifetime is not an issue, value types could also
    provide encapsulating views of native pointers, structs, and arrays.)
  • resource scoping: APIs which require critical sections or paired
    primitives can be mapped to the Java try-with-resources syntax
    or refactored into a callback driven style (using lambdas).
  • language feature mapping: Corresponding types and operations
    can usually be mapped according to simple conventional rules.
    For example, a C char* can usually be represented by a Java
    String object at an API boundary.
    (But, these mappings must be tunable on a case-by-case basis.)
  • static typing: The Java type system can represent a wide
    variety of type shapes.
  • design rule checking:
    Ad hoc usage rules for native APIs can be enforced as executable
    assertions in code wrapped around the unchecked native API.
  • interfaces: Every transfer of control or data into or out of
    a native API can (and should) be mediated through a Java interface.
    In this way fully abstract API shapes can be presented directly to the
    (unprivileged) end user without exposing sensitive implementations.
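
The "pointer handles" and "resource scoping" tactics from the list above might be combined roughly as follows. Everything here is hypothetical: FakeNativeHeap is a toy stand-in for malloc and free, and NativeBlock is an illustrative wrapper, not an existing API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy allocator standing in for malloc/free (hypothetical). */
final class FakeNativeHeap {
    private static final Map<Long, byte[]> BLOCKS = new HashMap<>();
    private static long next = 0x1000;

    static synchronized long malloc(int size) {
        long addr = next;
        next += Math.max(1, size);           // keep addresses unique
        BLOCKS.put(addr, new byte[size]);
        return addr;
    }

    static synchronized void free(long addr) {
        if (BLOCKS.remove(addr) == null) throw new IllegalStateException("bad free");
    }

    static synchronized byte[] block(long addr) { return BLOCKS.get(addr); }

    static synchronized int liveBlocks() { return BLOCKS.size(); }
}

/** Pointer handle: owns one address, zeroes it on close, and checks it on every use. */
final class NativeBlock implements AutoCloseable {
    private long address;                    // 0 means "already freed"

    NativeBlock(int size) { address = FakeNativeHeap.malloc(size); }

    byte get(int index) { return FakeNativeHeap.block(live())[index]; }

    void put(int index, byte value) { FakeNativeHeap.block(live())[index] = value; }

    private long live() {
        if (address == 0) throw new IllegalStateException("block already freed");
        return address;
    }

    @Override
    public void close() {
        if (address != 0) {                  // idempotent: a second close is a no-op
            FakeNativeHeap.free(address);
            address = 0;
        }
    }
}

class NativeBlockDemo {
    public static void main(String[] args) {
        try (NativeBlock b = new NativeBlock(4)) {   // freed on exit, even on exceptions
            b.put(0, (byte) 42);
            System.out.println(b.get(0));            // prints 42
        }
        System.out.println(FakeNativeHeap.liveBlocks()); // prints 0
    }
}
```

A stale handle fails with a Java exception instead of touching freed memory, and try-with-resources keeps the native lifetime tied to a lexical scope.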

Most of these tactics can be made automatic or semi-automatic
within a code generation tool, and apply routinely unless manually disabled.
This will further reduce the need for tricky hand-maintained code.
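
As an illustration of the "exception conversion" tactic, which is one of the easier ones to generate mechanically, here is a minimal sketch. RawApi simulates a C-style function that returns -1 and sets an errno-like variable; RawApi, SafeApi, and NativeCallException are all hypothetical names, and the value 2 for ENOENT is what Linux and most Unix systems use.

```java
/** Simulated C-style API: returns -1 on failure and records a code in "errno". */
final class RawApi {
    static final int ENOENT = 2;    // "no such file or directory" on most Unix systems
    static int errno = 0;

    static int open(String path) {
        if (path.isEmpty()) {
            errno = ENOENT;
            return -1;
        }
        return 3;                   // pretend file descriptor
    }
}

/** Unchecked exception carrying the native error code (hypothetical type). */
final class NativeCallException extends RuntimeException {
    final int code;

    NativeCallException(String call, int code) {
        super(call + " failed with error code " + code);
        this.code = code;
    }
}

/** Wrapper that turns the C error convention into a Java exception. */
final class SafeApi {
    static int open(String path) {
        int fd = RawApi.open(path);
        if (fd < 0) {
            throw new NativeCallException("open", RawApi.errno);
        }
        return fd;
    }
}

class SafeApiDemo {
    public static void main(String[] args) {
        System.out.println(SafeApi.open("/tmp/x"));     // prints 3
        try {
            SafeApi.open("");
        } catch (NativeCallException e) {
            System.out.println(e.getMessage());          // open failed with error code 2
        }
    }
}
```

The caller can no longer forget to poll for the error, which is the point of the conversion.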

Interfaces are particularly useful for expressing groups of methods,
since they express (mostly) pure behavior rather than Java object
state. Also, interfaces are easy to compose and adapt, allowing flexible
application of many of the above tactics.

As used to represent an extracted native API, an interface
would be unique to that API. Uses of such interfaces would tend
to be in one-to-one correspondence with their implementations.
In that case JVMs are routinely able to remove the overhead of
method selection and invocation by inlining the only relevant
implementation.
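
A rough sketch of this use of interfaces follows. LibM and LibMs are hypothetical names, and java.lang.Math stands in for real native calls; the point is only the shape: one interface per extracted API, a single hidden implementation, and adapters composed on top.

```java
/** Hypothetical extracted API shape for a tiny slice of the C math library. */
interface LibM {
    double sqrt(double x);
    double hypot(double x, double y);
}

final class LibMs {
    private LibMs() {}

    /** The only implementation; java.lang.Math stands in for real native calls. */
    private static final class Impl implements LibM {
        @Override public double sqrt(double x) { return Math.sqrt(x); }
        @Override public double hypot(double x, double y) { return Math.hypot(x, y); }
    }

    /** Users receive the interface and never see the implementation class. */
    static LibM load() { return new Impl(); }

    /** Composition: wrap any LibM with an extra design-rule check. */
    static LibM rejectingNaN(final LibM raw) {
        return new LibM() {
            @Override public double sqrt(double x) { return raw.sqrt(check(x)); }
            @Override public double hypot(double x, double y) {
                return raw.hypot(check(x), check(y));
            }
            private double check(double v) {
                if (Double.isNaN(v)) throw new IllegalArgumentException("NaN argument");
                return v;
            }
        };
    }
}

class LibMDemo {
    public static void main(String[] args) {
        LibM m = LibMs.rejectingNaN(LibMs.load());
        System.out.println(m.sqrt(16.0));    // prints 4.0
    }
}
```

When only one implementation of an interface is ever loaded, a JIT can usually treat calls through it as direct calls and inline them.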

questions to answer, artifacts to build

A native interconnect story will supply answers to a number of related questions:

  • How do we simplify the user experience for Java programmers who use C and C++ APIs?
    (The benchmark is the corresponding experiences of C and C++ programmers,
    as well as the experiences of today’s JNI programmers.)

  • What appropriate tools, APIs, and data formats support these experiences?
    Specifically, how is API metadata produced, stored, loaded, and used?
    How are native libraries named and loaded?

  • What appropriate JVM and JDK infrastructure works with native API elements
    (layouts, functions, etc.) from Java code (interpreter and JIT)?

  • How performant are calls and data access to native libraries?
    (Again, the benchmark is the corresponding experiences of C and C++ programmers,
    as well as the experiences of today’s JNI programmers.
    The target is the performance of these libraries as
    enjoyed by their primary users: programmers of C, C++, Fortran, etc.)

  • What are the definite, reliable safety levels available for using
    native libraries from Java?
    This includes the question: What is the range of options between
    automatic, perhaps unsafe import, and engineered hand-adjustments?

  • What are the options for managing portability?
    This includes the use of platform-specific libraries,
    and a story for switching between platform-specific bindings
    and portable backup implementations.

Answering these questions affirmatively will require us to build some
interesting technology, including discrete and separable projects
to enable these functions:

  • native function calling from JVM (C, C++)
  • native data access from JVM or inside JVM heap
  • new data layouts in JVM heap
  • native metadata definition for JVM
  • header file API extraction tools (see below)
  • native library management APIs
  • native-oriented interpreter and runtime “hooks”
  • class and method resolution “hooks”
  • native-oriented JIT optimizations
  • tooling or wrapper interposition for safety
  • exploratory work with difficult-to-integrate native libraries

Project Panama in OpenJDK will provide a venue for exploring
these projects.
Some of them will be closely aligned with OpenJDK JEPs,
notably JEP 191,
allowing the Project to incubate early work on them.

Other inspiration and/or implementation starting points include:

  • the Java Native Runtime package and the libffi native call binder
  • Java data layout packages
  • JVM support for new layouts (IBM packed objects, Sun Labs Maxine hybrids, Arrays 2.0)
  • metadata-based native API extractors (WinRT metadata)
  • existing JVM infrastructure (class files, SA, JNI, sun.misc.Unsafe)

A native header file import tool scans C or C++ header files
and provides raw native bindings for privileged Java code.
Such tools exist already for other languages, and can get colorful
names like SWIG or Groveller.

For the present purposes, I suggest a simpler name like jextract.
A high-quality implementation for Java could start with an
off-the-shelf front end like libclang.
It would apply Java-oriented rules (with hand-tunable defaults)
and produce some form of metadata, such as loadable class files.

A toolchain that embodies many of these ideas could look something like this:

 /------------|       /------------|
 | stdio.h    |       | stdio.java |
 |------------|       |------------|
       |                    |
       v                    |
 |------------|             |
 | jextract   | <-----------/
 | stdio.jar  |       /-------------|
 |------------|       | userapp.jar |
       |              |-------------|
       v                    |
 |------------|             |
 |    jvm     | <-----------/        /----------|
 |            | <--------------------| libc.dll |
 |------------|                      |----------|

The stdio.java file would contain hand-written adjustments to the raw API from the header file.
The stdio.jar file would contain automatically gathered metadata from the header file,
plus the results of compiling stdio.java.
The contents of stdio.java could be straight Java code for the user-level API,
but could also be annotations to be expanded by a code generation step in the extraction process.

The code in userapp.jar would access the features it needs from stdio.jar.
The implementations of these interfaces would avoid C code as much as possible,
so that the JVM’s JIT can optimize them suitably.
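
To give a feel for the split between extracted and hand-written parts, here is a speculative sketch. RawStdio imagines what an extraction step might emit for a single function from stdio.h, and Stdio imagines a hand-written adjustment from stdio.java; neither name nor shape comes from an existing tool, and EOF is assumed to be -1 as on common C libraries.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** What an extraction step might emit for stdio.h (hypothetical shape). */
interface RawStdio {
    int EOF = -1;                        // assumed value; C only requires it to be negative

    /** Mirrors C's "int puts(const char *s)"; the array must be NUL-terminated. */
    int puts(byte[] s);
}

/** What a hand-written stdio.java might layer on top (hypothetical). */
final class Stdio {
    private final RawStdio raw;

    Stdio(RawStdio raw) { this.raw = raw; }

    /** Java-friendly signature: takes a String and reports failure by exception. */
    void puts(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] cString = Arrays.copyOf(utf8, utf8.length + 1);   // append the NUL
        if (raw.puts(cString) == RawStdio.EOF) {
            throw new UncheckedIOException(new IOException("puts failed"));
        }
    }
}

class StdioDemo {
    public static void main(String[] args) {
        // A fake raw binding that prints from Java, standing in for the real libc call.
        RawStdio fake = s -> {
            System.out.println(new String(s, 0, s.length - 1, StandardCharsets.UTF_8));
            return 0;
        };
        new Stdio(fake).puts("hello from a wrapped API");
    }
}
```

Only the Stdio class would be visible to the code in userapp.jar; the raw interface stays behind it.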

Side note:
The familiar header file I am picking on is actually unlikely to need this full treatment.
In a more typical case, a whole suite of header files would be extracted and wrapped.

For bootstrapping or pure interpretation, a minimal set of trusted
primitives is required in the JVM to perform data access and function calls.
These would be coded in C and also known to the JIT as intrinsics.
They can be made general enough to implement once in the JVM, rather than
loaded (as JNI wrappers are loaded today) separately for each native API.
For example, JNR uses a set of fewer than 100 specially
designed JNI methods to perform all native calls; these methods are
collectively called jffi.
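
The idea of a few general primitives serving every native API can be sketched with a dynamic proxy. This is not how JNR or jffi are implemented; CallPrimitives, Binder, and TinyLibC are hypothetical, and the single invoke primitive is simulated in Java, where a real one would jump to a native function address.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

/** Stand-in for a small, fixed set of trusted call primitives (hypothetical). */
final class CallPrimitives {
    /** One generic entry point, shared by every bound API. */
    static long invoke(String symbol, long... args) {
        switch (symbol) {
            case "abs":
            case "labs":
                return Math.abs(args[0]);
            default:
                throw new UnsatisfiedLinkError("unknown symbol: " + symbol);
        }
    }
}

/** Binds any interface of integer-valued functions to the generic primitive. */
final class Binder {
    @SuppressWarnings("unchecked")
    static <T> T bind(Class<T> api) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                throw new UnsupportedOperationException(method.getName());
            }
            long[] raw = new long[args == null ? 0 : args.length];
            for (int i = 0; i < raw.length; i++) {
                raw[i] = ((Number) args[i]).longValue();   // widen every argument to long
            }
            long result = CallPrimitives.invoke(method.getName(), raw);
            // Narrow the raw result to the declared Java return type.
            if (method.getReturnType() == int.class) {
                return (int) result;
            }
            return result;
        };
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] { api }, handler);
    }
}

/** A per-API description; no per-API native code is needed. */
interface TinyLibC {
    int abs(int x);
    long labs(long x);
}

class BinderDemo {
    public static void main(String[] args) {
        TinyLibC libc = Binder.bind(TinyLibC.class);
        System.out.println(libc.abs(-7));                 // prints 7
        System.out.println(libc.labs(-5000000000L));      // prints 5000000000
    }
}
```

Adding a new API here means adding an interface, not new trusted code.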

Building such toolchains will allow cheaper, faster commerce between Java
applications and native APIs, much as the famous Panama Canal cuts through
the rocky isthmus that separates the Atlantic and Pacific Oceans.

Let’s keep digging.

Appendix: preserving Java culture

Let’s go back to the metaphor of culture as it applies to the world
of Java programming.

Here is a list of benefits about Java that programmers rely on, which
any design for native interconnect must preserve.
As a group, these features support a set of basic programming practices
and styles which allow programmers great freedom to create good code.
They can be viewed as the basis of a programming “culture”, peculiar
to Java, which fosters safe, useful, performant, maintainable code.

Side note:
This list contains many truisms and will be unsurprising to Java users.
Remember that culture is often overlooked until two cultures meet.
I am writing this list in hopes it will prove useful as a checklist to
help analyze design problems with native interconnect, and to evaluate
proposed solutions.
Also, I am claiming that the sum total of these items underlies a unique
programming culture or ecosystem to Java, but not that they are individually
unique to Java.

  • basic type safety: Pointers, integers, and floats must not be confused;
    conversions must be explicit and must preserve VM integrity.
    This applies to values of all kinds, in memory and elsewhere.
  • basic operation safety: Any basic VM operation either completes
    according to its specification, or produces a catchable exception.
    It cannot corrupt memory or any other VM state.
  • class safety: Pointer conversions must be explicit and checked.
    There are exceptions for conversion to a Java superclass (which is always safe),
    to a Java interface (which is always checked later at any use point),
    and to an erased generic type (which is checked implicitly).
  • storage lifetime safety: No block of memory can be accessed after it has been
    deallocated. This is why we have automatic storage management.
  • variable domain safety: There is no way to obtain “garbage” or
    indeterminately initialized values of any type (especially pointers, of course).
  • API type checking: Every use of an API, such as a method call, is fully
    type-consistent with its definition (such as a method definition).
    This requirement serves the earlier ones, of course; it shows up
    in detail in the operation of Java’s dynamic linkage rules.
  • late linking: All uses of names, including class, method, and
    field names, are resolved and access-checked not only at compile
    time but also at run time. Separately compiled modules (classes)
    cannot observe the implementation details of other modules.
  • concurrency safety: Race conditions between threads can be
    prevented, or their effects can be predicted usefully,
    or (at worst) they cannot violate the other safety invariants.
  • error manifestation: Exceptional or erroneous conditions
    are not discarded. They are manifested as thrown exceptions,
    which will be caught and/or displayed.
  • access control: Non-public or otherwise restricted API points
    cannot be accessed except by their specified users.
    Access is enforced at all phases of compilation and execution.
    System internals cannot be touched except by highly trusted code.
  • appropriately concise: Typically, Java code does not pay for
    any of Java’s built-in safety features by unnecessary verbosity.
    Safe and sane practices are encouraged by simpler notations.
    The “semantic payload” of a bit of code is not obscured by
    any necessary ceremony. (But note next points.)
  • predictably explicit: Typically, complex or potentially
    expensive features of Java are made explicit by a visible
    syntax, such as a method call. (This point is in tension
    with the previous point, and reasonable people differ on
    the proper resolution.)
  • explicit types: Java code has reasonably strong static typing,
    with many types explicitly written in the source code.
    (Notably, declaration types are explicit on the left, despite type
    inference elsewhere.)
    This feature catches errors early and gives IDEs helpful context
    for each name.
  • transparent code: Programs are represented using bytecode,
    which automated tools can inspect, verify, and transform.
    User-written annotations can help guide these tasks.
    There are easy to use, open source implementations of
    offline processors for both source code and bytecode,
    as well as the VM itself. Multiple good IDEs exist.
  • transparent data: Data can be inspected using reflection
    and other ubiquitous self-description machinery such as
    toString and debuggers.
    (Transparency of data is balanced with access control, of course.)
  • robust performance: With moderate programmer care and experience,
    simple single-threaded programs tend to not show surprising
    performance “potholes”, not even when they are composed together.
    Multi-threaded programs preserve and scale up throughput with
    additional CPUs, in the absence of algorithmic bottlenecks.

All of these benefits are familiar to Java programmers, perhaps
even taken for granted.
The corresponding benefits for a native language like C++ are often
more complex, and require more work and care from the native
programmer to achieve.

A good native interconnect story will provide ways to reliably dispose
of this work and care before it gets to the end user coding Java to a
native API.

This requires native APIs to be acculturated to Java by the
artful creation of wrapper code, as noted above.


Comments ( 4 )
  • guest Wednesday, March 19, 2014

    It would be really great, if Project Panama would also be designed to include easy access to Objective-C APIs, beside C and C++ APIs.

    Objective-C is in widespread use these days.

  • Nick Evgeniev Sunday, March 23, 2014

    speaking of ease of use there are



    they both focus on ease of use and bridj fixes some awful performance issues of jna ... but speaking of the performance (I mean real performance)

    1. SLOW callbacks from native to java

    2. lack of structs&layout control in java (it kills!!!)

    check structs, pointers to structs, fixed arrays in .NET problem was solved years ago... and in java we have to go off-heap and do all the nasty tricks :(... it's a shame :(

    even go-lang has better integration with C :)

  • guest Tuesday, June 17, 2014

    This reminds me distinctly of a certain project that Sun did back in the day, addressing all the same concerns:


    Are you leveraging that excellent piece of work in Panama?

    Good luck John!

  • davidcl Tuesday, June 24, 2014

    Do you know the gluegen2 tool from JOGL2? This seems to be really similar to the jextract tool and has already been used to map a complex API (OpenGL, all versions). Even if this tool does not focus on safety, it seems to be quite easy to add protection on the shared runtime code.
