Breaking the Large File Barrier


"Large file" is actually a technical term for files larger than 2 GB (to be precise, 2^31, or 2,147,483,648, bytes). In the past, we were not able to distribute files over 2 GB on Sun Download Center (SDLC). Our engineers tell me that's because 32-bit systems cannot handle signed integers greater than 2 GB. Way back when we built our "old" download system, that didn't seem to matter, as we would never try to offer files that large for download (a bit like not worrying about Y2K back in the 1980s!). But times have changed, and with the proliferation of large, single-file DVD ISO images like those many Linux distros use, such files are no longer uncommon. (Of course, this goes hand in hand with the proliferation of broadband access.)
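The overflow is easy to demonstrate. Here's a quick sketch (Python here purely for illustration; the affected code wasn't Python) of what happens when a size beyond 2^31 - 1 bytes is forced into a signed 32-bit integer:

```python
import ctypes

LIMIT = 2**31 - 1            # 2,147,483,647 -- largest signed 32-bit value

size_ok = 2 * 10**9          # a 2.0 GB file: fits
size_big = 3_200_000_000     # a 3.2 GB file: does not fit

# ctypes.c_int32 wraps the way a signed 32-bit integer would
print(ctypes.c_int32(size_ok).value)   # unchanged: 2000000000
print(ctypes.c_int32(size_big).value)  # negative -- the size has wrapped around
```

Any code that then compares that wrapped, negative number against bytes received is never going to decide the download finished correctly.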

As we built our new download application, large file support was a requirement. But we were still stuck with a large file limit in some of the older code in Sun Download Manager (SDM). Since a download manager is really, really helpful for files this large, we appeared to be unable to proceed. We were well aware of this limit and had a high-priority mandate to fix it, but we hadn't had enough engineering resources to take it on yet. The pressure was building, however, as the Solaris OS team really wanted to be able to release single large-file DVD images (and I don't blame them).

That's when things got interesting. Internally, we were testing our new download system, and we put a few large files on it. One of the testers sent in some test results saying, "Successfully downloaded the 3.2 GB test file using SDM." Impossible, I thought, there must be a mistake. But no, the tester insisted it worked. So I tried it myself -- it worked! They say "ignorance is bliss," and thankfully this tester was unaware of what we all knew "would not work" and simply went for it. It was quite a surprise.

Now, these types of bugs really don't have a habit of fixing themselves, but we figured out what was going on.

For files on SDLC, we generate what we call "Verification Property Files" (VPFs), which contain the checksums SDM uses to verify downloads in real time as they are received. Another piece of data in a VPF is the file size, and that's how we get around the limit in SDM. It turns out that as long as there is a VPF for a large file (and we create them automatically for all files released on SDLC), SDM can get the file size from the VPF, and it all works! When there is no VPF, the file size comes from the header info sent by the web server, and that's when things break. (Some older web servers can't handle the large numbers either.)
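As a sketch of that workaround (the property name here is hypothetical -- this is not SDM's actual code), the client prefers the size recorded in the VPF and only falls back to the server's Content-Length header when no VPF exists:

```python
def expected_size(vpf, content_length_header):
    """Return the expected download size in bytes.

    Prefer the Verification Property File (VPF), which carries the size
    as a decimal string and so sidesteps any 32-bit limit on the client.
    Fall back to the HTTP Content-Length header only when no VPF is
    available -- the path where older 32-bit code (or servers) can break.
    """
    if vpf is not None and "filesize" in vpf:   # "filesize" is a made-up key
        return int(vpf["filesize"])             # Python ints are arbitrary precision
    return int(content_length_header)

# A 3.7 GB DVD image is safe via the VPF route:
print(expected_size({"filesize": "3700000000"}, "0"))
```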

So, bottom line, after a bunch more testing, we've just released the first ever single large file on SDLC -- the latest version of Solaris Express Developer Edition (~ 3.7 GB DVD ISO image). This is a small but significant milestone after years of butting up against the 2 GB limit.

Now the larger the file, the more that can go wrong, so if you give this a try, please do use SDM. And here are some notes and "best practices" we gleaned from rolling this out:

  • The "32-bit" limit isn't unique to our systems; it can affect servers, routers, operating systems, and clients throughout the network. For example, if a Windows XP system uses a FAT32 file system rather than NTFS, it simply won't work -- the OS can't handle a file that large. (Thanks to openSUSE.org, where I found that tip.)
    • As a result, we strongly recommend that our product teams not rely solely on a single large file for distribution, as it won't work for all customers. Offer options such as a "chunked" version of the DVD that users download in smaller pieces and concatenate afterward. Or offer multiple CD images instead of the DVD (as we do for Solaris). And finally, offer a hard-media version (DVD) that users can order inexpensively (or, better yet, free) to be shipped to them.
  • Use an up-to-date browser and fully patched, modern operating system to be sure large files are adequately supported on the client end.
  • Absolutely do not attempt this with a slow line, like a dial-up modem. You can expect it to take about 40 hours per GB on a 56K modem.
  • Use a download manager so you can resume where you left off in case anything goes wrong (you do not want to have to start over from the beginning). You can also pause and resume, if you're running out of time.
  • Make sure you have at least twice the size of the file in free disk space -- with these large files, that's actually quite a bit of disk space. Operating systems typically make a temporary copy of the file while downloading, then copy it to its final location, so you must have the extra space.
  • And a couple of notes specific to SDM:
    • As noted previously, SDM supports large files only on SDLC, so don't try this on other sites (until we can get that fixed).
    • When SDM finishes downloading the large file, there is internal processing that must take place before the download is actually complete. Due to the huge file size, this processing can take several minutes. As a result, the SDM progress bar will say "100%" while the Status still says "Downloading data..." Be patient and do not close SDM. After a few minutes, the Status changes to "Downloaded", and the download is complete.
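For the "chunked" option mentioned above, the reassembly step is simple. Here's a minimal sketch (the chunk and image file names are made up; the real chunk list and checksum are published with each release on the download page):

```python
import hashlib

def concatenate(chunks, output_path):
    """Join downloaded chunks, in order, into the final ISO image and
    return its MD5 hex digest for comparison with the published checksum."""
    md5 = hashlib.md5()
    with open(output_path, "wb") as out:
        for chunk in chunks:
            with open(chunk, "rb") as f:
                while True:
                    block = f.read(1 << 20)  # 1 MB at a time; never hold 3.7 GB in RAM
                    if not block:
                        break
                    out.write(block)
                    md5.update(block)
    return md5.hexdigest()

# Usage (hypothetical names):
#   digest = concatenate(["sol-dvd.iso.a", "sol-dvd.iso.b"], "sol-dvd.iso")
# then compare digest against the checksum published for the full image.
```

Note that this, too, needs the "twice the file size" of free disk space mentioned above: the chunks and the assembled image coexist until you delete the chunks.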

Hopefully this first large file release goes well and is the first of many. If you give it a try and have any problems or questions, please let us know -- the feedback is very helpful as we learn the ins and outs of large file distribution over the Internet.


Comments:

rsync is my download manager.

Posted by Mikael Gueck on February 05, 2008 at 07:48 AM PST #

I don't get this. None of the download managers I have used in the last 5 or 10 years have had any problems with files much larger than 2 gigs. I really don't get this!

Posted by Anonymous Retard on February 05, 2008 at 07:58 AM PST #

There's really not that much to get. There's simply a bug in the code that does not allow downloading of large files from sites other than SDLC. Please understand that Sun Download Manager is not and has never been an "official" Sun supported product and so it is always a big battle to get engineers to work on it. We have too many other pressing priorities and not enough people. I personally have been fighting for this fix for several years now and will continue to do so, but there's no one that can work on it right now. It's frustrating, believe me, but a fact of life in the current environment.

Posted by Gary Zellerbach on February 05, 2008 at 09:20 AM PST #

I'm all for pushing past this barrier. Most of the apps I work with handle 2GB files except for one I can think of that choked somewhere after 4GB.

According to http://kernel.org/faq/#largefiles and https://bugzilla.mozilla.org/show_bug.cgi?id=184452 , Firefox still has this problem. Unfortunately, I'd never trust a browser to download such a large file. Browsers' internal download managers have a long way to go.

I don't understand, if a web server is serving the correct file size in the header, does SDM still have problems?

Posted by Anthony Bryan on February 05, 2008 at 05:41 PM PST #

This seems to mainly be a server-side problem of being unable to report the file size accurately. Still, I see no reason why the download manager shouldn't be able to download a file of unknown size, though resume might be problematic.

Posted by Hayden Legendre on February 05, 2008 at 07:48 PM PST #

Anthony, thanks for the links! That is interesting and helpful content and good to know.

As to the question about what happens when the web server sends the correct size to SDM: I suspect SDM would still not work, but I don't have the full technical analysis of the bug from our engineers, so that's speculation. Or, of course, we could try it on a known web server that sends the right large file size and see what happens. Feel free to give it a try if you have a good example, or let me know a URL and I'll give it a try. Either way, we still have to fix it in SDM.
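For anyone who wants to check what size a given server actually reports, here's a small sketch (the URL is a placeholder) that issues a HEAD request and returns the Content-Length header; a server with a 32-bit problem may report a wrong or missing value for files over 2 GB:

```python
import urllib.request

def reported_size(url):
    """Issue a HEAD request and return the Content-Length the server
    reports (as a string), or None if the header is absent."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Content-Length")

# Usage (placeholder URL):
#   print(reported_size("http://example.com/some-large-file.iso"))
```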

Posted by Gary Zellerbach on February 06, 2008 at 03:54 AM PST #

My pleasure. As always, another interesting topic.

FYI, Hayden is the author of Retriever, a very capable Java download manager that handles large files well. (You can reach it in his sig). I don't know how much a priority fixing this is, but you could contact him about consulting if you were interested.

Posted by Anthony Bryan on February 06, 2008 at 04:01 AM PST #

About

I helped design, build, and manage download systems at Sun for many years. Recently I've focused on web eMarketing systems. Occasionally, I write about other interests, such as holography and jazz guitar. Follow me on Twitter: http://twitter.com/garyzel
