Tuesday Feb 05, 2008

Breaking the Large File Barrier


"Large file" is actually a technical term for files larger than 2 GB (and to be really precise, that's 231 or 2,147,483,648 bytes.) In the past, we've not been able to distribute files over 2 GB on Sun Download Center (SDLC). Our engineers tell me that's because 32 bit systems cannot handle signed integers greater than 2 GB. Way back when we built our "old" download system, it didn't really seem to matter, as we would never try to offer files that large for downloading (kind of like not worrying about Y2K back in the 1980's!). But times have changed, and with the proliferation of large, single file DVD ISO images that many Linux distros use, it's no longer uncommon. (Of course, this goes hand-in-hand with the proliferation of broadband access.)

As we built our new download application, large file support was a requirement. But, we were still stuck with a large file limit in some of the older code in Sun Download Manager (SDM). As using a download manager is really, really helpful for files this large, we appeared to be unable to proceed. We were very aware of this limit and had a high priority mandate to fix it, but we haven't had enough engineering resources to take that on yet. The pressure was building, however, as the Solaris OS team really wanted to be able to release single large file DVD images (and I don't blame them).

That's when things got interesting. Internally, we were testing our new download system, and we put a few large files on it. One of the testers sent in some test results saying, "Successfully downloaded the 3.2 GB test file using SDM." Impossible, I thought, there must be a mistake. But no, the tester insisted it worked. So I tried it myself -- it worked! They say "ignorance is bliss," and thankfully this tester was unaware of what we all knew "would not work" and simply went for it. It was quite a surprise.

Now these types of bugs really don't have a habit of fixing themselves, but we figured out what's going on.

For files on SDLC, we generate what we call "Verification Property Files" (VPFs) that contain the checksums SDM uses to checksum downloads real-time as they are received. Another piece of data in VPFs is the file size, and that's how we get around this limit in SDM. It turns out that as long as there is a VPF for a large file (and we create them automatically for all files released on SDLC), SDM can get the file size from the VPF and it all works! When there is no VPF, the file size is part of the header info sent from the web server, and this is when things break. (Some older web servers can't handle the large numbers either.)

So, bottom line, after a bunch more testing, we've just released the first ever single large file on SDLC -- the latest version of Solaris Express Developer Edition (~ 3.7 GB DVD ISO image). This is a small but significant milestone after years of butting up against the 2 GB limit.

Now the larger the file, the more that can go wrong, so if you give this a try, please do use SDM. And here are some notes and "best practices" we gleaned from rolling this out:

  • The "32 bit" limit isn't unique to our systems but can affect servers, routers, operating systems, and clients throughout the network. For example, if a Windows XP system uses a FAT32 file system rather than NTFS, there's no way it's going to work -- the OS simply can't handle the file. (Thanks to openSUSE.org where I found that tip.)
    • As a result, we highly recommend to our product teams that they not rely solely on a large file for distribution, as it's not going to work for all customers. Offer options such as a "chunked" version of the DVD that users concatenate after downloading in smaller pieces. Or offer multiple CD images instead of the DVD (as we do for Solaris). And finally, offer a hard media version (DVD) that users can order inexpensively (or better yet, free) and is then shipped to them.
  • Use an up-to-date browser and fully patched, modern operating system to be sure large files are adequately supported on the client end.
  • Absolutely do not attempt this with a slow line, like a dial-up modem. You can expect it to take about 40 hours per GB on a 56K modem.
  • Use a download manager so you can resume where you left off in case anything goes wrong (you do not want to have to start over from the beginning). You can also pause and resume, if you're running out of time.
  • Make sure you have at least twice the size of the file in free disk space -- with these large files, that's actually quite a bit of disk space. Operating systems typically make a temporary copy of the file while downloading, then copy it to its final location, so you must have the extra space.
  • And a couple of notes specific to SDM:
    • As noted previously, SDM will not support large files except on SDLC, so don't try it on other sites (until we can get this fixed).
    • When SDM finishes downloading the large file, there is internal processing that must take place before the download is actually complete. Due to the huge file size, this processing can take several minutes. As a result, the SDM progress bar will say "100%" while the Status still says "Downloading data..." Be patient and do not close SDM. After a few minutes, the Status changes to "Downloaded", and the download is complete.

Hopefully this first large file release goes well and is the first of many. If you give it a try and have any problems or questions, please let us know -- the feedback is very helpful as we learn the ins and outs of large file distribution over the Internet.


About

I helped design, build, and manage download systems at Sun for many years. Recently I've focused on web eMarketing systems. Occasionally, I write about other interests, such as holography and jazz guitar. Follow me on Twitter: http://twitter.com/garyzel

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today
Bookmarks
News

No bookmarks in folder

Blogroll
ESD