Friday Apr 17, 2009

IO, NIO, It's Off to JavaOne 2009 We Go

This year I'm co-presenting on two technical sessions:

TS-4222: Asynchronous I/O Tricks and Tips with Jean-Francois Arcand. I'm going to talk about the API, implementation details, and cover some usage guidelines. Jean-Francois is going to talk about the tricks he has learned using the API with ports of the Grizzly framework and Tomcat.

TS-5052: Hacking the File System with JDK™ Release 7 with Carl Quinn (Netflix). This talk will cover various topics with the new file system API. We hope to spend most of the time looking at code examples.

In addition to the sessions, there is: BOF-5087: All Things I/O with JDK™ Release 7. So far it is Chris and I, but I'm sure others will join us at the table.

Thursday Apr 09, 2009

JSR-203/NIO2 update

The implementation of JSR-203/NIO2 has been brewing as the OpenJDK New I/O project over the last year or so. The bulk of the socket-channel changes were pushed to the jdk7 repository back in build 37 and most of the rest went into jdk7 build 50. I've had way too much on my plate recently so I haven't had cycles to write about it. That needs to change as there is so much to talk about!

For those who haven't been tracking this project, the one-line summary is that it brings a new API to the file system, a new API for asynchronous I/O to both sockets and files, and completes the socket-channel API that was originally defined by JSR-51 back in Java SE 1.4.

The java.nio.file package is where the new API to the file system resides. The API is provider-based so the adventurous can deploy their own file system implementations. Out of the box, there is a default file system that provides access to the regular file system that both Java and native applications see.

Most developers will likely use the Path class and little else. Think of Path as the equivalent of java.io.File in the new API. Interoperability with java.io.File and existing code is achieved using the toPath method, so existing code can be retrofitted to use the new API without too many changes. There is a long list of major shortcomings and behavioral issues with java.io.File that can never be fixed for compatibility reasons, so the toPath method gives you an object in the new API that accesses the same file as the File object.
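
As a rough illustration of that interoperability, here is a minimal sketch that converts a File to a Path and back. It assumes the method names of the API as it later shipped in Java SE 7 (File.toPath and Path.toFile); the class name and the file name are made up for the example.

```java
import java.io.File;
import java.nio.file.Path;

final class ToPathSketch {
    // Converts a legacy File into a Path that locates the same file.
    static Path convert(File legacy) {
        return legacy.toPath();
    }

    public static void main(String[] args) {
        File legacy = new File("example.txt");   // hypothetical file name
        Path path = convert(legacy);
        // Converting back gives a File with the same pathname.
        System.out.println(path.toFile().equals(legacy));
    }
}
```

Existing code that hands out File objects can stay as it is; only the code that wants the new operations needs to call toPath.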

A Path is created from a path-string or URI. It defines various syntactic operations to access its components, supports comparison, testing if a Path starts or ends with another Path, allows Paths to be combined by resolving one Path against another, supports relativization, etc.
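
The syntactic operations are easiest to see in code. The following is a small sketch, assuming the Java SE 7 form of the API (Paths.get, resolve, relativize, startsWith, endsWith); the directory and file names are hypothetical and nothing here touches the file system.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathOpsSketch {
    // Resolves 'other' against 'base', then relativizes the result against 'base' again.
    static Path roundTrip(Path base, String other) {
        Path combined = base.resolve(other);
        return base.relativize(combined);
    }

    public static void main(String[] args) {
        Path base = Paths.get("projects", "nio2");       // hypothetical directories
        Path file = base.resolve("src/Main.java");       // combine two paths

        System.out.println(file.getFileName());          // last name element
        System.out.println(file.getParent());            // everything but the last element
        System.out.println(file.startsWith(base));       // prefix test on name elements
        System.out.println(file.endsWith("Main.java"));  // suffix test on name elements
        System.out.println(roundTrip(base, "src/Main.java"));
    }
}
```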

A Path may also be used to access the file that it locates. Both stream and channel I/O are supported. Using InputStream and OutputStream allows for good interoperability with the java.io package and existing code. Channel I/O is via the new SeekableByteChannel or any of its super-types. A SeekableByteChannel is a ByteChannel that maintains a file position so it can be used for single-thread random access as well as sequential access. In the case of the default provider, the channel can be cast to a FileChannel for more advanced operations like file locking, positional read/write to support concurrent threads doing I/O to different parts of the file, and memory-mapped I/O. When opening files, a set of options allows applications to indicate how the file should be opened or created. There are quite a few and these can be extended further with implementation-specific options where required. Speaking of extensibility, much of the API allows for implementation-specific extensions where needed.
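
Here is a minimal sketch of channel I/O with open options, assuming the Java SE 7 form of the API in which channels are opened with Files.newByteChannel and the standard options live in StandardOpenOption. The helper name writeThenReadFrom is made up for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelSketch {
    // Writes 'text' to the file, then moves the position to 'offset' and reads the rest back.
    static String writeThenReadFrom(Path file, String text, long offset) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (SeekableByteChannel ch = Files.newByteChannel(file,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ByteBuffer out = ByteBuffer.wrap(bytes);
            while (out.hasRemaining()) {
                ch.write(out);                       // sequential write
            }
            ch.position(offset);                     // random access via the file position
            ByteBuffer in = ByteBuffer.allocate((int) (ch.size() - offset));
            while (in.hasRemaining() && ch.read(in) != -1) {
                // keep reading until the buffer is full or end-of-file is reached
            }
            return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nio2-sketch", ".txt");
        try {
            System.out.println(writeThenReadFrom(tmp, "hello, channel", 7));
        } finally {
            Files.delete(tmp);
        }
    }
}
```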

In addition to I/O, there are methods to do all the usual things like create directories, delete files, check access to files, copy and move, etc. The copyTo/moveTo methods do the right thing and know how to copy file meta-data, named streams, special files, etc. The important thing is that they work in a cross-platform manner and also throw useful exceptions when they fail. In this API all methods that access the file system throw the checked IOException. There are specific IOExceptions defined for cases where recovery action from specific errors is useful. All other exceptions in this API are unchecked.
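
To show what recovering from a specific error might look like, here is a sketch that assumes the API as it later shipped in Java SE 7, where copying is done with the static Files.copy method rather than a copyTo method on Path. The helper name and its string results are made up for the example.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class CopySketch {
    // Copies source to target and reports the outcome as a short string.
    static String copy(Path source, Path target, boolean replace) throws IOException {
        try {
            if (replace) {
                Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            } else {
                Files.copy(source, target);
            }
            return "copied";
        } catch (NoSuchFileException e) {
            return "missing source";    // specific failure: nothing to copy
        } catch (FileAlreadyExistsException e) {
            return "target exists";     // specific failure: caller may retry with replace = true
        }
        // Any other IOException propagates to the caller.
    }
}
```

Both exceptions caught here are subclasses of IOException, so code that doesn't care about the distinction can keep catching IOException alone.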

Accessing directories is a bit different to java.io.File. Instead of returning an array or List of the entries, a DirectoryStream is used to iterate over the entries in the directory. This scales to large directories, uses fewer resources, and can smooth out the response time when accessing remote file systems. Another difference is that the platform representation of the file names in the directory is preserved so that the files can be accessed again (this is very important where the file names are stored as sequences of bytes, for example). A further advantage of having a handle to an open directory is that it is possible to do operations relative to the directory, something that is important for security-sensitive applications. When iterating over a directory the entries can be filtered. The API has built-in support for glob and regex patterns to filter by name, or arbitrary filters can be developed.
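
A minimal sketch of iterating a directory with a glob filter follows. It assumes the Java SE 7 form of the API, where the stream is opened with Files.newDirectoryStream; the helper name is made up, and the names are sorted only to make the result predictable, since the iteration order is not specified.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DirSketch {
    // Returns the sorted names of the entries in 'dir' whose names match the glob pattern.
    static List<String> list(Path dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        // The stream holds the directory open, so it is closed with try-with-resources.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```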

The API has full support for symbolic links based on the long-standing semantics of Unix symbolic links. This works on Windows Vista and newer versions of Windows as well. By default symbolic links are followed, with a couple of exceptions such as move and delete. There are also a few cases where the application can specify an option to follow or not follow links. This is important when reading file attributes or walking file trees, for example.

Speaking of walking file trees, the Files.walkFileTree utility method is invoked with a starting directory and a FileVisitor that is invoked for each of the directories and files in the file tree. Think of walkFileTree as a simple-to-use internal iterator for file trees. File tree traversal is depth-first with pre-visit and post-visit for directories. By default symbolic links are not followed, which is exactly what you want when developing find, chmod -R, rm -r, and other operations. An option can be used to cause symbolic links to be followed (say find -L or cp -r). When symbolic links are followed, cycles are detected and reported to avoid infinite loops or stack overflow issues. Another thing to mention about FileVisitor is that each of its methods returns a result to control whether the iteration should continue, terminate, or skip parts of the file tree. Many of the samples in the samples directory use walkFileTree.
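
As an illustration, here is a sketch of an rm -r style delete built on walkFileTree. It assumes the Java SE 7 form of the API, including the SimpleFileVisitor convenience class; the class and method names are made up for the example. Files are deleted as they are visited and each directory is deleted in its post-visit, once it is empty.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

final class DeleteTreeSketch {
    // Deletes the file tree rooted at 'start' and returns the number of entries removed.
    static int deleteTree(Path start) throws IOException {
        final int[] removed = {0};
        Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);                 // symbolic links are deleted, not followed
                removed[0]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;                      // iteration of the directory failed
                }
                Files.delete(dir);                  // the directory is empty by now
                removed[0]++;
                return FileVisitResult.CONTINUE;
            }
        });
        return removed[0];
    }
}
```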

The java.nio.file package also has a WatchService to support file change notification. This maps to inotify or an equivalent facility to detect changes caused by non-communicating entities. The main goal here is to help with the performance issues of applications that are forced to poll the file system today. It's a relatively low-level/advanced API, arguably niche, but it is straightforward to build on. The WatchDir example in the samples directory demonstrates using it to snoop on a directory or file tree.
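
A much smaller sketch than WatchDir is shown below. It assumes the Java SE 7 form of the API (StandardWatchEventKinds, Path.register) and a made-up helper that creates a file itself so there is something to observe. How quickly the event arrives depends on the platform, since implementations without a native facility fall back to polling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

final class WatchSketch {
    // Registers 'dir' for create events, creates the file 'name' in it, and returns the
    // name reported by the first create event, or null if none arrives before the timeout.
    static String firstCreated(Path dir, String name, long timeoutSeconds)
            throws IOException, InterruptedException {
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            Files.createFile(dir.resolve(name));
            WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
            if (key == null) {
                return null;                         // timed out
            }
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                    // The context is the path of the entry relative to the watched directory.
                    return event.context().toString();
                }
            }
            return null;
        }
    }
}
```

A long-running watcher would loop, calling reset on the key after processing its events so that further events are queued.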

The java.nio.file.attribute package provides access to file attributes. File attributes or meta-data is an awkward area because it is very file system specific. The approach we've taken in this API is to group related attributes and define a “view” of the commonly used attributes. An implementation is required to support a “basic” view that defines attributes that are common to most file systems (file type, size, timestamps, etc.). An implementation may support additional views. The package defines a number of views for other common groups of attributes, including a subset of the POSIX attributes. An implementation isn't required to support these of course and may instead support implementation-specific views. In most cases, developers won't need to be concerned with all this but instead will use the static methods defined by the Attributes class for the common cases.

A couple of other things to say about file attributes. Where possible, attributes are read in bulk (think stat or equivalent). This will help with the performance of applications that need several attributes of the same file. Dynamic access is also supported. This allows attributes to be treated as name/value pairs, which is useful to avoid compile-time dependencies on implementation-specific classes at the expense of type-safety. The API also allows initial file attributes to be set when creating files. This is very important to eliminate the attack window that would otherwise exist if file permissions or ACLs are set after the file is created.
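
The difference between bulk and dynamic access can be sketched as follows. This assumes the API as it later shipped in Java SE 7, where the static methods ended up on the Files class (Files.readAttributes and Files.getAttribute) rather than on the Attributes class mentioned above; the helper names are made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

final class AttrSketch {
    // Bulk read: one call fetches the whole "basic" view, type-safely.
    static long sizeViaBulkRead(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        return attrs.size();
    }

    // Dynamic access: the attribute is named as "view:attribute" and comes back as an Object.
    static long sizeViaName(Path file) throws IOException {
        return (Long) Files.getAttribute(file, "basic:size");
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("attr-sketch", ".bin");
        try {
            Files.write(tmp, new byte[] {1, 2, 3, 4, 5});
            System.out.println(sizeViaBulkRead(tmp) == sizeViaName(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```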

The other new package is java.nio.file.spi. This is for the provider mechanism mentioned above and is really only interesting to the few that are developing their own file system implementations. One thing to mention is that in addition to custom providers, it is possible to replace or interpose on the default provider. If someone wants to extend the default provider then they install their own provider that delegates to the otherwise default provider. This is useful to those interested in developing virtual file systems, for example. One thing that isn't complete yet is that the existing classes (File/FileInputStream, etc.) don't yet redirect when the default provider is replaced. I hope to get this fixed soon.

Moving on from the file system API, the second major topic is a new set of AsynchronousChannels for asynchronous I/O. The new channels provide an asynchronous programming model for both sockets and files. The API is designed to map well to operating systems with a highly scalable mechanism to demultiplex events to pools of threads (think Solaris 10 event ports or Windows completion ports for example).

An asynchronous I/O operation takes one of two forms. In the first form, I/O operations return a java.util.concurrent.Future that represents the pending result. The Future interface defines methods to poll the status or wait for the I/O operation to complete. The second form is where the I/O operation is initiated specifying a CompletionHandler that is invoked when the I/O operation completes (think callbacks).
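
Both forms are sketched below against a file, assuming the Java SE 7 form of the API (AsynchronousFileChannel and CompletionHandler). The helper names are made up, and the latch in the second method exists only so that the example can return the result to its caller; a real application would carry on from inside the handler.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

final class AsyncReadSketch {
    // Form 1: the read returns a Future; get() waits for the operation to complete.
    static int readWithFuture(Path file, ByteBuffer dst) throws Exception {
        try (AsynchronousFileChannel ch =
                AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            Future<Integer> pending = ch.read(dst, 0L);
            return pending.get();
        }
    }

    // Form 2: the read is given a CompletionHandler that is invoked on completion.
    static int readWithHandler(Path file, ByteBuffer dst) throws Exception {
        final CountDownLatch done = new CountDownLatch(1);
        final AtomicInteger bytes = new AtomicInteger(-1);
        final AtomicReference<Throwable> failure = new AtomicReference<>();
        try (AsynchronousFileChannel ch =
                AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ch.read(dst, 0L, null, new CompletionHandler<Integer, Void>() {
                @Override
                public void completed(Integer bytesRead, Void attachment) {
                    bytes.set(bytesRead);
                    done.countDown();
                }

                @Override
                public void failed(Throwable exc, Void attachment) {
                    failure.set(exc);
                    done.countDown();
                }
            });
            done.await();   // wait before closing the channel so the handler can run
        }
        if (failure.get() != null) {
            throw new IOException(failure.get());
        }
        return bytes.get();
    }
}
```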

Completion handlers immediately raise the question as to what threads invoke the handlers. In this API, asynchronous channels are bound to a group that encapsulates a thread pool and the other resources shared between channels in the group. Groups are constructed with a thread pool so the application gets to control the number of threads, thread identity, and other policy matters. Combining thread pools and I/O event demultiplexing is a complex area. The project page has further information on this topic.
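
To make the point about thread identity concrete, here is a sketch that creates a group with a two-thread pool and a custom thread factory, then reports which thread ran an accept handler. It assumes the Java SE 7 form of the API (AsynchronousChannelGroup.withFixedThreadPool); the thread name "io-worker" and the class name are made up, and the example connects to itself over the loopback interface.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.AsynchronousChannelGroup;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

final class GroupSketch {
    // Returns the name of the thread that invoked the accept completion handler.
    static String handlerThreadName() throws Exception {
        ThreadFactory factory = new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "io-worker");   // the application picks the identity
                t.setDaemon(true);
                return t;
            }
        };
        AsynchronousChannelGroup group =
                AsynchronousChannelGroup.withFixedThreadPool(2, factory);
        try (AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open(group)) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            final BlockingQueue<String> names = new ArrayBlockingQueue<>(1);
            server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
                @Override
                public void completed(AsynchronousSocketChannel peer, Void attachment) {
                    names.offer(Thread.currentThread().getName());
                    try {
                        peer.close();
                    } catch (IOException ignored) {
                        // nothing useful to do in a sketch
                    }
                }

                @Override
                public void failed(Throwable exc, Void attachment) {
                    names.offer("failed: " + exc);
                }
            });
            try (AsynchronousSocketChannel client = AsynchronousSocketChannel.open(group)) {
                client.connect(server.getLocalAddress()).get();
                return names.poll(10, TimeUnit.SECONDS);
            }
        } finally {
            group.shutdownNow();   // closes any remaining channels and stops the pool
        }
    }
}
```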

The other big update in the channels package is the completion of the socket channel API. Each of the network-oriented channels now implements NetworkChannel and so defines methods to bind the channel's socket, set and query socket options, etc. This should eliminate the need to use the troublesome socket adapter. Furthermore, the support for socket options is extensible to allow for operating system specific options, important for fine-tuning some high-performance servers. Multicast support has also been added.
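
A minimal sketch of binding and configuring a channel directly, without going through the socket adapter, is shown below. It assumes the Java SE 7 form of the API, where the standard options are defined by StandardSocketOptions; the class name is made up, and the channel is bound to an ephemeral port on the loopback interface.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

final class NetworkChannelSketch {
    // Opens a server channel, sets an option, and binds it, all on the channel itself.
    static ServerSocketChannel openBound() throws IOException {
        ServerSocketChannel ch = ServerSocketChannel.open();
        ch.setOption(StandardSocketOptions.SO_REUSEADDR, true);
        ch.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return ch;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel ch = openBound()) {
            System.out.println(ch.getLocalAddress());                        // where it is bound
            System.out.println(ch.getOption(StandardSocketOptions.SO_REUSEADDR));
        }
    }
}
```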

The original early draft proposed a new set of buffer classes that were indexed by long and so could support more than 2^31 elements. In the I/O area the main use-case is contiguous mapping of file regions larger than 2GB. On further examination, this was an ugly solution so it has been buried for now. If support for big arrays is added to the collections API in the future then, with appropriate factories, it should be possible to have such arrays backed by a large mapped region. In the meantime, applications needing to map massive files need to do so in chunks of 2GB or less.
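
Mapping in chunks might look something like the sketch below. It uses FileChannel.map, which takes a long position but limits the size of each mapping to Integer.MAX_VALUE bytes. The class name is made up, and the chunk size is a parameter only so that the idea can be tried on small files.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChunkedMapSketch {
    // Sums the unsigned value of every byte in the file, mapping at most
    // 'chunkSize' bytes at a time.
    static long sumBytes(Path file, int chunkSize) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += chunkSize) {
                long len = Math.min(chunkSize, size - pos);   // the last chunk may be shorter
                MappedByteBuffer region = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (region.hasRemaining()) {
                    sum += region.get() & 0xFF;
                }
            }
        }
        return sum;
    }
}
```

For a genuinely massive file the chunk size would be something like Integer.MAX_VALUE, and any record that straddles a chunk boundary has to be handled by the application.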

One other update to mention is BufferPoolMXBean. This is a new management interface for pools of buffers so that tools such as VisualVM, jconsole, or other JMX clients can monitor the resources associated with direct or mapped buffers.
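
The same information can be read in-process. The sketch below assumes the Java SE 7 form of the API, where the beans are obtained with ManagementFactory.getPlatformMXBeans, and it assumes the pool names "direct" and "mapped", which are what the HotSpot implementation uses but are not guaranteed elsewhere. The class and method names are made up.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class BufferPoolSketch {
    // Returns the memory used by the named buffer pool, or -1 if there is no such pool.
    static long memoryUsed(String poolName) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (poolName.equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer held = ByteBuffer.allocateDirect(1 << 20);   // 1 MiB of direct memory
        System.out.println("capacity held: " + held.capacity());
        System.out.println("direct pool:   " + memoryUsed("direct"));   // assumed pool name
        System.out.println("mapped pool:   " + memoryUsed("mapped"));   // assumed pool name
    }
}
```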

So, that's a brief run through the project. In the introductory paragraph I mentioned that the bulk of the implementation was pushed to jdk7 a few builds back. Overall, I'm relatively happy with it but there are a few areas that need to be revisited. For now, development will continue at the New I/O project and we will hopefully sync up again with jdk7 in a couple of builds' time.

Saturday Apr 28, 2007

Multicasting with NIO

A long standing issue for many developers is that the java.nio.channels package has lacked support for Internet Protocol (IP) multicasting. In the NIO.2 early review draft specification you will see that we have addressed this issue by adding multicast support to DatagramChannel. Here's a small example that opens a DatagramChannel, binds the channel's socket to a local port, sets the network interface for multicast datagrams sent via the channel, and then joins a multicast group on the same interface:

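// Interface used for multicast datagrams, and the address of the group to join
NetworkInterface interf = NetworkInterface.getByName("eth0");
InetAddress group = InetAddress.getByName("");

// Open an IPv4 channel, bind its socket, and select the outgoing multicast interface
DatagramChannel dc = DatagramChannel.open(ProtocolFamily.INET)
    .setOption(SocketOption.SO_REUSEADDR, true)
    .bind(new InetSocketAddress(5000))
    .setOption(SocketOption.IP_MULTICAST_IF, interf);

// Join the group on the same interface
MembershipKey key = dc.join(group, interf);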

Once the channel has joined the group then it reads or receives multicast datagrams in the same manner that it reads or receives unicast datagrams.

The most significant thing in this example is that the static factory method to open the channel specifies a protocol family. In this example, the channel is to an IPv4 socket. In the existing APIs, the protocol family is transparent and all sockets created by the java.net or java.nio.channels packages are either all IPv4 or all IPv6. For IP multicasting it is important that the protocol family corresponds to the address type of the multicast groups that the socket joins; otherwise it is highly operating system specific whether the socket can join the group, configure options, or receive multicast datagrams. Legacy code has suffered greatly from problems in this area. Another interesting thing to point out is that the channel's socket is bound and socket options are configured directly. The awkward and counterintuitive socket adaptor isn't required, so it isn't necessary to mix socket APIs when configuring the channel's socket.

In the example, the MembershipKey that is returned by the join method is a token to represent membership of the group. It defines methods to query information about the membership and defines the "drop" method to drop membership of the group.

Developers tracking multicast standards will know that the RFCs in this area have been updated in recent years to add source filtering and this is now supported by almost all modern operating systems. In the NIO.2 draft specification we have included basic support for source filtering. The following code fragment shows a channel joining a multicast group to only receive multicast datagrams sent by a specific IP source address (otherwise known as "include-mode" filtering):

MembershipKey key = dc.join(group, interf, source);

"Exclusive-mode" filtering is where a group is joined to receive all multicast datagrams except those from specific IP source addresses:

MembershipKey key = dc.join(group, interf).block(source1).block(source2);

So that's a brief introduction to the multicast support that we propose to add to the java.nio.channels package. There's a lot more detail in the draft specification for those interested in this topic.



