JSR-203/NIO2 update

The implementation of JSR-203/NIO2 has been brewing as the OpenJDK New I/O project over the last year or so. The bulk of the socket-channel changes were pushed to the jdk7 repository back in build 37 and most of the rest went into jdk7 build 50. I've have way too much on my plate recently so I haven't had cycles to write about it. That needs to change as there is so much to talk about!

For those that haven't been tracking this project then the one-line summary is that it brings a new API to the file system, a new API for asynchronous I/O to both sockets and files, and completes the socket-channel API that was originally defined by JSR-51 back in Java SE 1.4.

The java.nio.file package is where the new API to the file system resides. The API is provider-based so the adventurous can deploy their own file system implementations. Out of the box, there is a default file system that provides access to the regular file system that both java and native applications see.

Most developers will likely use the Path class and little else. Think of Path as the equivalent of java.io.File in the new API. Interoperability with java.io.File and existing code is achieved using the toPath method so existing code can be retrofitted to use the new API without too many changes. There is long list of major short-comings and behavioral issues with java.io.File that can never be fixed for compatibility reasons so using the toPath method gets you an object in the new API that accesses the same file as the File object.

A Path is created from a path-string or URI. It defines various syntactic operations to access its components, supports comparison, testing if a Path starts or ends with another Path, allows Paths to be combined by resolving one Path against another, support relativization, etc.

A Path may also be used to access the file that it locates. Both stream and channel I/O are supported. Using InputStream and OutputStream allows for good interoperability with the java.io package and existing code. Channel I/O is via the new SeekableByteChannel or any of its super-types. A SeekableByteChannel is a ByteChannel that maintains a file position so it can be used for single-thread random access as well as sequential access. In the case of the default provider, the channel can be cast to a FileChannel for more advanced operations like file locking, positional read/write to support concurrent threads doing I/O to different parts of the file, and memory-mapped I/O. When opening files a set of options allows applications to indicate how the file should be opened or created. There are quite a few and these can be extended further with implementation specific options where required. Speaking of extensibility, much of the API allows for implementation specific extensions where needed.

In addition to I/O, there are methods to do all the usual things like create directories, delete files, checking access to files, copy and move, etc. The copyTo/moveTo methods do the right thing and know how to copy file meta-data, named streams, special files etc. The important thing is that they work in a cross-platform manner and also throw useful exceptions when they fail. In this API all methods that access the file system throw the checked IOException. There are specific IOExceptions defined for cases where recovery action from specific errors is useful. All other exceptions in this API are unchecked.

Accessing directories is a bit different to java.io.File. Instead of returning an array or List of the entries, a DirectoryStream is used to iterate over the entries in the directories. This scales to large directories, uses less resources, and can smooth out the response time when accessing remote file systems. Another difference is that the platform representation of the file names in the directory is preserved so that the files can be accessed again (this is very important where the file names are stored as sequences of bytes for example). A further advantage to having a handle to an open directory is that it is possible to do operations relative to the directory, something that is important for security-sensitive applications. When iterating over a directory the entries can be filtered. The API has built-in support for glob and regex patterns to filter by name or arbitrary filters can be developed.

The API has full support for symbolic-links based on the long-standing semantics of Unix symbolic links. This works on Windows Vista and newer Windows aswell. By default symbolic links are followed with a couple exceptions, such as move and delete. There are also a few cases where the application can specify an option to follow or not follow links. This is important when reading file attributes or walking file trees for example.

Speaking of walking file trees, the Files.walkFileTree utility method is invoked with a starting directory and a FileVisitor that is invoked for each of the directories and files in the file tree. Think of walkFileTree as a simple-to-use internal iterator for file trees. File tree traversal is depth-first with pre-visit and post-visit for directories. By default symbolic links are not followed which is exactly what you want when developing find, chmod -R, rm -r, and other operations. An option can be used to cause symbolic links to be followed (say find -L or cp -r). When following symbolic links then cycles are detected and reported to avoid infinite loops or stack overflow issues. Another thing to mention about FileVisitor is that each of the methods return a result to control if the iteration should continue, terminate or if parts of the file tree should be skipped. Many of the samples in the samples directory use walkFileTree.

The java.nio.file package also has a WatchService to support file change notification. This maps to the inotify or equivalent facility to detect changes caused by non-communicating entities. The main goal here is to help with the performance issues of applications that are forced to poll the file system today. It's a relatively low-level/advanced API, arguable niche, but is straight-forward to build on. The WatchDir example in the samples directory demonstrates using it to snoop a directory or file tree.

The java.nio.file.attribute package provides access to file attributes. File attributes or meta-data is an awkward area because it is very file system specific. The approach we've taken in this API is to group related attributes and define a “view” of the commonly used attributes. An implementation is required a support a “basic” view that defines attributes that are common to most file systems (file type, size, timestamps, etc.). An implementation may support additional views. The package defines a number of views for other common groups of attributes, including a subset of the POSIX attributes. An implementation isn't required to support these of course and may instead support implementation-specific views. In most cases, developers won't need to be concerned with all this but instead will use the static methods defined by the Attributes class for the common cases.

A couple of other things to say about file attributes is that, where possible, attributes are read in bulk (think stat or equivalent). This will help with the performance of applications that need several attributes of the same file. Dynamic access is also supported. This allows attributes to be treated as name/value pairs. This is useful to avoid compile time dependencies on implementation specific classes at the expense of type-safety. The API also allows initial file attributes to be set when creating files. This is very important to eliminate the attack window that would otherwise exist if file permissions or ACL are set after the file is created.

The other new package is java.nio.file.spi This is for the provider mechanism mentioned above and is really only interesting to the few that are developing their own file system implementations. One thing to mention is that in addition to custom providers, it is possible to replace or interpose on the default provider. If someone wants to extend the default provider then they install their own provider that delegates to the otherwise default provider. This is useful to those interested in developing virtual file systems for example. One thing that isn't complete yet is that the existing java.io classes (File/FileInputStream, etc.) don't yet redirect when the default provider is replaced. I hope to get this fixed soon.

Moving on from the file system API, the second major topic is a new set of AsynchronousChannels for asynchronous I/O. The new channels provide an asynchronous programming model for both sockets and files. The API is designed to map well to operating systems with a highly scalable mechanism to demultiplex events to pools of threads (think Solaris 10 event ports or Windows completion ports for example).

Asynchronous I/O operation takes one of two forms. In the first form, I/O operations return a java.util.concurrent.Future that represents the pending result. The Future interface defines methods to poll the status or wait for the I/O operation to complete. The second form is where the I/O operation is initiated specifying a CompletionHandler that is invoked when the I/O operation completes (think callbacks).

Completion handlers immediately raise the question as to what threads invoke the handlers. In this API, asynchronous channels are bound to a group that encapsulates a thread pool and the other resources shared between channels in the group. Groups are constructed with a thread pool so the application gets to control the number of threads, thread identity, and other policy matters. Combining thread pools and I/O event demultiplexing is a complex area. The project page has further information on this topic.

The other big update in the channels package is the completion of the socket channel API. Each of the network oriented channels now implement NetworkChannel and so define methods to bind the channel's socket, set and query socket options, etc. This should eliminate the need to use the troublesome socket adapter. Furthermore, the support for socket options is extensible to allow for operating system specific options, important for fine tuning some high performance servers. Multicast support has also been added.

The original early draft proposed a new set of buffer classes that were indexed by long and so could support more than 231 elements. In the I/O area the main use-case is contiguous mapping of files regions larger than 2GB. On further examination, this was an ugly solution so this has been buried for now. If support for big arrays is added to the collections API in the future then with appropriate factories then it should be possible to have such arrays backed by a large mapped region. In the mean-time, applications needing to map massive files need to do so in chunks of 2GB or less.

One other update to mention is BufferPoolMXBean. This is new management interface for pools of buffers so that tools such as VisualVM, jconsole, or other JMX clients, can monitor the resources associated with direct or mapped buffers.

So, that's a brief run through the project. In the introduction paragraph I mentioned that the bulk of the implementation was pushed to jdk7 a few builds back. Overall, I'm relatively happy with it but there are a few areas that need to be re-visited. For now, development will continue at the New I/O project and we will hopefully sync up again with jdk7 in a couple of builds time.

Comments:

To get the content of a file newInputStream() is a good start but I'm (still) missing a simple getContent()/getText() or something similar.

Posted by Christian on April 09, 2009 at 10:53 PM PDT #

Christian, it is important that the common things are easy to do. One thing we have in the nio repository that isn't in jdk7 yet is java.io.Inputs class that defines the readAllBytes and readAllLines methods. I think what is what you are asking for. In any case, I should have mentioned in the blog that it would be best to follow up comments on the nio-discuss mailing list rather than here.

Posted by Alan on April 10, 2009 at 12:42 AM PDT #

Hi Alan,

It seems like a lot of the long-standing problems with the Java I/O API are being fixed, so great work.

A bit sad that the 2GB limit is going to remain for some time still, but I guess one can't have everything.

Best,
Ismael

Posted by Ismael Juma on April 10, 2009 at 07:02 AM PDT #

Hi,

All looks great!

One question on Files.walkFileTree(); is there any attempt to parallelise the implementation for decent performance, especially on remote filesystems? Over the last 15 years (ish) I've usually found it worthwhile to throw 4--8 threads to overcome latency and/or give the disc-scheduling algorithms more to chew on at once.

It \*might\* be useful to have a variant that takes a ThreadPool argument and the maximum number of threads that it is allowed to use from the pool, and the visitors would have to allow for concurrent calls. Calls to the visitor in the current version could be serialised to avoid unhappy surprises but still improve I/O throughput IMHO.

Or did I miss that? B\^>

Rgds

Damon

Posted by Damon Hart-Davis on April 11, 2009 at 10:01 PM PDT #

how much i would like to have this time ago!!! it sounds great!

Posted by Dani on April 13, 2009 at 10:16 AM PDT #

@Ismael

I actually see the 2GB limit as a good thing? It promotes stream safety :)

I usually Chunk in 32-256mb segments. (the smaller the better :) )

Posted by Richard on April 16, 2009 at 12:34 AM PDT #

Hi Richard,

Memory mapping files is useful in certain situations and 2GB is not particularly large these days (and even less in 2010 when JDK7 will be released).

If streaming is the way forward, why does Java even support memory mapping of files?

Best,
Ismael

Posted by Ismael Juma on April 16, 2009 at 12:47 AM PDT #

Post a Comment:
Comments are closed for this entry.
About

user12820862

Search

Top Tags
Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today
News
Blogroll

No bookmarks in folder