file paths and file URIs

The ability to convert between file paths and file URIs is important in many applications, but it can be a problematic area. For starters, it's not uncommon to see code doing the wrong thing, for example invoking URI's getPath method and using the URI path component as a file path. Another example is code that continues to use File's toURL method even though that method has long been deprecated in favor of the toURI method, which knows how to escape characters that wouldn't be legal in a URI.
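To make the difference concrete, here is a minimal sketch of the conversions mentioned above. The class name, the helper names and the "Docs and stuff/file.txt" path are made up for illustration; only File.toURI, File.toURL and the File(URI) constructor come from the JDK.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;

class FileUriBasics {
    // Path -> URI: toURI() percent-encodes characters that are not legal in a URI.
    public static URI toUri(File file) {
        return file.toURI();
    }

    // URI -> path: hand the whole URI back to the API instead of
    // treating uri.getPath() as a file path.
    public static File toFile(URI uri) {
        return new File(uri);
    }

    // The deprecated route, shown only for comparison: toURL() does not escape.
    @SuppressWarnings("deprecation")
    public static String legacyUrlString(File file) throws MalformedURLException {
        return file.toURL().toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        File file = new File("Docs and stuff", "file.txt"); // hypothetical path
        URI uri = toUri(file);
        System.out.println("toURI : " + uri);                   // space becomes %20
        System.out.println("toURL : " + legacyUrlString(file)); // space left as-is
        System.out.println("back  : " + toFile(uri));
    }
}
```

The round trip through toFile should give back the absolute form of the original file, which is the reason to prefer it over picking the URI apart by hand.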

Even if the APIs are used correctly, there are a couple of issues that still arise, mostly when integrating with applications that come with the operating system or with other native applications. On Windows, a long-standing gripe has been the handling of UNC names. The toURI method is specified to return a URI with an undefined authority component (in URI speak this means that it is null), and so the UNC server name is encoded into the URI's path component. There have been many calls over the years to change the toURI method, but that would be an incompatible change likely to break existing applications. In JDK 7 the issue is addressed by means of the new file system API. Existing code can be changed to use toPath().toUri(), which will encode the server name into the authority component as expected. For example, a file path of "\\\\server\\Docs and stuff\\file" will result in a URI of "file://server/Docs%20and%20stuff/file".
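A small sketch for comparing the two conversions side by side. The class and helper names are made up, and the sample paths are hypothetical; on Windows the UNC path from the text is used, elsewhere a plain absolute path stands in so the program still runs.

```java
import java.io.File;
import java.net.URI;

class UriAuthorityDemo {
    // The long-standing conversion: the authority component is always undefined (null).
    public static URI legacyUri(File file) {
        return file.toURI();
    }

    // The JDK 7 route through the new file system API.
    public static URI nioUri(File file) {
        return file.toPath().toUri();
    }

    public static void main(String[] args) {
        boolean windows = File.separatorChar == '\\';
        // Hypothetical sample paths; "server" is not expected to exist.
        File file = new File(windows ? "\\\\server\\Docs and stuff\\file"
                                     : "/Docs and stuff/file");
        for (URI uri : new URI[] { legacyUri(file), nioUri(file) }) {
            System.out.println(uri
                    + "  authority=" + uri.getAuthority()
                    + "  path=" + uri.getRawPath());
        }
    }
}
```

On Windows the second line printed should show the server name as the authority, as described above; on other platforms both authorities are null and only the number of slashes after "file:" differs.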

Recent editions of Windows added support for literal IPv6 addresses in UNC path names, using a transcription of the address into an ipv6-literal.net host name, and this transcription scheme is understood by the URI support in the new file system API. For example, invoking Paths.get with the URI "file://[fe80::203:baff:fe5a:749d%1]/Documents/file" will convert to the Path "\\\\fe80--203-baff-fe5a-749ds1.ipv6-literal.net\\Documents\\file".

There are also a few improvements on Unix. The first one is easy to spot, as the toUri method constructs URIs of the form "file:///path" instead of "file:/path". The second difference is more subtle and stems from the ability to preserve the platform representation of the path, something that is important on Unix where file names are stored as bytes. For example, suppose you share a directory with a Russian colleague and he creates the file русский in that directory. The resulting URI, "file:///dir/%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9", obtained from a native application or by invoking the toUri() method, is transmitted to you. You use the Paths.get(URI) method and it results in a Path that allows you to locate and access the file, irrespective of your locale.
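The scenario can be tried out with a sketch along these lines. The class and method names are made up, and a temporary directory stands in for the shared one; the encoded name is copied from the URI above. The Path is built from the URI rather than from a Java string, so the sketch does not depend on the locale being able to represent the name.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class UriRoundTrip {
    // "русский" as percent-encoded UTF-8 bytes, as in the URI above.
    static final String ENCODED_NAME = "%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9";

    // URI of a file with that name inside the given directory.
    public static URI uriIn(Path dir) {
        String base = dir.toUri().toString();
        if (!base.endsWith("/")) {
            base += "/";
        }
        return URI.create(base + ENCODED_NAME);
    }

    // Plays both roles: the colleague creates the file and sends its URI,
    // the receiver locates the file from nothing but that URI.
    public static boolean createAndLocate(Path dir) throws IOException {
        URI sent = uriIn(dir);
        Path created = Paths.get(sent);
        Files.createFile(created);
        try {
            Path located = Paths.get(created.toUri());
            return Files.isRegularFile(located) && located.toUri().equals(sent);
        } finally {
            Files.delete(created);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("file-uri-demo");
        try {
            System.out.println("URI      : " + uriIn(dir));
            System.out.println("located? : " + createAndLocate(dir));
        } finally {
            Files.delete(dir);
        }
    }
}
```

The comparison uses URI.equals rather than string equality, so it does not depend on the case of the hex digits in the escaped octets.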

Comments:

Finally! Well done.

Posted by Edward Wang on December 02, 2010 at 06:00 PM PST #
