LimitedInputStream: how to secure against OutOfMemoryErrors


When reading data off the web one always has to be ready for the worst. One situation that is perhaps too often ignored is that the file one wants to read may be huge, if not infinite. When parsing XML with SAX this could just lead to a method that never returns, but when building a DOM it could also lead to a huge data structure being created, and the fatal java.lang.OutOfMemoryError being thrown. To guard against this situation I wrote a special LimitedInputStream class yesterday. You just use it like this:
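      // builder: a javax.xml.parsers.DocumentBuilder; rsd: the java.net.URL of the RSD file
      // Refuse to read more than 100 KB of the response.
      Document document = builder.parse(
                           new LimitedInputStream(
                               rsd.openConnection().getInputStream(),
                               100 * 1024)
                         );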
     
If the file you download exceeds the size in bytes passed to the constructor, an IOException is thrown. In my case I am trying to parse Really Simple Discovery (RSD) XML files, which ought to be very small (much less than the 100k I allow for).
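The class itself is not reproduced in this post. A minimal sketch of how such a stream could work, assuming it is built as a java.io.FilterInputStream subclass that counts bytes as they pass through and throws an IOException once the count goes over the limit (an illustration, not the original source):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Sketch: passes bytes through from the wrapped stream and throws an
 * IOException as soon as more than maxBytes bytes have been read.
 */
class LimitedInputStream extends FilterInputStream {
    private final long maxBytes;
    private long count = 0;

    LimitedInputStream(InputStream in, long maxBytes) {
        super(in);
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count(n);
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Skipped bytes were still downloaded, so they count too.
        long skipped = super.skip(n);
        if (skipped > 0) {
            count(skipped);
        }
        return skipped;
    }

    // Re-reading bytes via mark/reset would make the count meaningless,
    // so this sketch does not support it.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public synchronized void mark(int readlimit) {
        // no-op: marking is not supported
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    private void count(long n) throws IOException {
        count += n;
        if (count > maxBytes) {
            throw new IOException("stream exceeded limit of " + maxBytes + " bytes");
        }
    }
}
```

Only the two read methods need overriding for the byte-array case: FilterInputStream.read(byte[]) delegates to read(byte[], int, int), so the counting override catches both.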

Very simple but, I am sure, very useful. This has probably been done before, but anyway, it's released under a BSD licence, so anyone can use it. Feedback appreciated.

-----

Update: I just realised that I could have used the URLConnection.getContentLength() method to do this! Duh.
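For comparison, the header check could be sketched as below. The class and method names are made up for illustration, and the sketch uses getContentLengthLong(), the long variant added in Java 7, so that very large declared lengths do not overflow an int. URLConnection reports -1 when the length is not known, so this check can only reject a response that declares itself too large.

```java
import java.net.URLConnection;

// Hypothetical helper, not part of any library.
class ContentLengthCheck {

    /**
     * Returns true only if the response declares a Content-Length larger
     * than maxBytes. An unknown length is reported as -1, which is never
     * larger than a non-negative limit, so a missing header is not
     * treated as "too long".
     */
    static boolean declaredTooLong(URLConnection conn, long maxBytes) {
        long declared = conn.getContentLengthLong();
        return declared > maxBytes;
    }
}
```

A response with no header at all passes this check, so on its own it is at best an early exit before reading.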

----

Update: Tim Bray tells me that the Content-Length header is not a reliable indicator of the length of the message, and is not always present. So it is still a good precautionary step to use the LimitedInputStream, or the LimitInputStream developed by the JXTA group (thanks to Mike Duigou).
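The JXTA class mentioned in the comments takes the other route: instead of throwing, it pretends EOF was reached once the limit is hit. A minimal sketch of that behaviour follows; the name TruncatingInputStream is hypothetical and this is not the JXTA source.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Sketch of the "pretend EOF" approach: at most `limit` bytes are passed
 * through, after which every read reports end of stream (-1).
 */
class TruncatingInputStream extends FilterInputStream {
    private long remaining;

    TruncatingInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // limit reached: report EOF
        }
        int b = super.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        // Never ask the wrapped stream for more than is left.
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false;
    }
}
```

With this variant an oversized XML document reaches the parser cut off mid-document, so the failure typically surfaces as a parse error rather than a size error; throwing at the limit makes the cause explicit.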

Comments:

JXTA has a similar class as part of its source, but it is primarily intended as a mechanism to split up a stream for sub-components. It works well for situations like processing chunked HTTP transfers.

So instead of throwing an exception when the limit is reached, it pretends that EOF was reached.

LimitInputStream.java: http://platform.jxta.org/source/browse/platform/binding/java/api/src/net/jxta/util/LimitInputStream.java?rev=1.7&view=auto&content-type=text/vnd.viewcvs-markup

Posted by Mike Duigou on September 15, 2005 at 06:18 PM CEST #

Thanks for the link. I could have used a class like that too, I suppose. After all, if the XML parser received an end of file before finishing the parse, it would throw an error. Mind you, it just occurred to me that as far as HTTP requests go, one should just be able to look at the Content-Length header of the response and decide from that whether the content is of an acceptable length. Ah well. There is even a method for that in the URLConnection class: getContentLength... Mhh, looks like my class is not that useful for the case I was looking at. But it is interesting to learn how it is used in JXTA, on the other hand :-)

Posted by Henry Story on September 17, 2005 at 04:05 AM CEST #

About

bblfish