Tuesday Feb 05, 2008

Breaking the Large File Barrier

"Large file" is actually a technical term for files larger than 2 GB (and to be really precise, that's 231 or 2,147,483,648 bytes.) In the past, we've not been able to distribute files over 2 GB on Sun Download Center (SDLC). Our engineers tell me that's because 32 bit systems cannot handle signed integers greater than 2 GB. Way back when we built our "old" download system, it didn't really seem to matter, as we would never try to offer files that large for downloading (kind of like not worrying about Y2K back in the 1980's!). But times have changed, and with the proliferation of large, single file DVD ISO images that many Linux distros use, it's no longer uncommon. (Of course, this goes hand-in-hand with the proliferation of broadband access.)

As we built our new download application, large file support was a requirement. But we were still stuck with a large file limit in some of the older code in Sun Download Manager (SDM). Since a download manager is really, really helpful for files this large, we appeared to be unable to proceed. We were very aware of this limit and had a high-priority mandate to fix it, but we hadn't had enough engineering resources to take that on yet. The pressure was building, however, as the Solaris OS team really wanted to be able to release single large file DVD images (and I don't blame them).

That's when things got interesting. Internally, we were testing our new download system, and we put a few large files on it. One of the testers sent in some test results saying, "Successfully downloaded the 3.2 GB test file using SDM." Impossible, I thought, there must be a mistake. But no, the tester insisted it worked. So I tried it myself -- it worked! They say "ignorance is bliss," and thankfully this tester was unaware of what we all knew "would not work" and simply went for it. It was quite a surprise.

Now, these types of bugs really don't have a habit of fixing themselves, but we figured out what was going on.

For files on SDLC, we generate what we call "Verification Property Files" (VPFs) that contain the checksums SDM uses to verify downloads in real time as they are received. Another piece of data in VPFs is the file size, and that's how we get around this limit in SDM. It turns out that as long as there is a VPF for a large file (and we create them automatically for all files released on SDLC), SDM can get the file size from the VPF and it all works! When there is no VPF, SDM has to rely on the file size in the header info sent from the web server, and this is when things break. (Some older web servers can't handle the large numbers either.)
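
The idea can be sketched roughly as follows. To be clear, this is not SDM's code and not the real VPF format: the key=value layout, the `size` and `sha256` keys, the choice of SHA-256, and all the function names are assumptions made up for illustration, and `content_length` stands in for whatever size the web server reports in its headers.

```python
import hashlib
from typing import Iterable, Optional


def parse_vpf(text: str) -> dict:
    """Parse a hypothetical key=value sidecar file (not the real VPF format)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def expected_size(vpf: Optional[dict], content_length: Optional[str]) -> Optional[int]:
    """Prefer the size recorded in the sidecar file; fall back to the server header."""
    if vpf and "size" in vpf:
        return int(vpf["size"])
    if content_length is not None:
        return int(content_length)
    return None


def verify_stream(chunks: Iterable[bytes], vpf: dict) -> bool:
    """Hash chunks as they arrive, then compare size and digest with the sidecar."""
    digest = hashlib.sha256()  # assumed algorithm; the post does not name one
    received = 0
    for chunk in chunks:
        digest.update(chunk)
        received += len(chunk)
    return received == int(vpf["size"]) and digest.hexdigest() == vpf["sha256"]
```

The point of the design is that the size arrives as text in a separate file, so it never has to pass through whatever piece of code chokes on a large number in the server's header.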

So, bottom line, after a bunch more testing, we've just released the first ever single large file on SDLC -- the latest version of Solaris Express Developer Edition (~ 3.7 GB DVD ISO image). This is a small but significant milestone after years of butting up against the 2 GB limit.

Now the larger the file, the more that can go wrong, so if you give this a try, please do use SDM. And here are some notes and "best practices" we gleaned from rolling this out:

  • The "32 bit" limit isn't unique to our systems but can affect servers, routers, operating systems, and clients throughout the network. File systems have limits of their own, too. For example, if a Windows XP system uses a FAT32 file system rather than NTFS, a single file can't be larger than 4 GB minus one byte, so a bigger image simply isn't going to work -- the file system can't hold it. (Thanks to openSUSE.org where I found that tip.)
    • As a result, we highly recommend to our product teams that they not rely solely on a large file for distribution, as it's not going to work for all customers. Offer options such as a "chunked" version of the DVD that users download in smaller pieces and then concatenate. Or offer multiple CD images instead of the DVD (as we do for Solaris). And finally, offer a hard media version (DVD) that users can order inexpensively (or better yet, free) and have shipped to them.
  • Use an up-to-date browser and fully patched, modern operating system to be sure large files are adequately supported on the client end.
  • Absolutely do not attempt this with a slow line, like a dial-up modem. You can expect it to take about 40 hours per GB on a 56K modem.
  • Use a download manager so you can resume where you left off in case anything goes wrong (you do not want to have to start over from the beginning). You can also pause and resume, if you're running out of time.
  • Make sure you have at least twice the size of the file in free disk space -- with these large files, that's actually quite a bit of disk space. Operating systems typically make a temporary copy of the file while downloading, then copy it to its final location, so you must have the extra space.
  • And a couple of notes specific to SDM:
    • As noted previously, SDM will not support large files except on SDLC, so don't try it on other sites (until we can get this fixed).
    • When SDM finishes downloading the large file, there is internal processing that must take place before the download is actually complete. Due to the huge file size, this processing can take several minutes. As a result, the SDM progress bar will say "100%" while the Status still says "Downloading data..." Be patient and do not close SDM. After a few minutes, the Status changes to "Downloaded", and the download is complete.
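
A few of the notes above come down to simple arithmetic, so here is a small Python sketch you could adapt as a pre-download check. The function names and the 3.7 GB figure are hypothetical. The FAT32 constant is that file system's per-file limit of 2^32 - 1 bytes, and the free-space check simply applies the "twice the file size" rule of thumb from the list.

```python
import shutil

GIB = 2**30                 # bytes in one binary gigabyte
FAT32_MAX_FILE = 2**32 - 1  # largest single file FAT32 can store, in bytes


def download_hours(size_bytes: int, bits_per_second: float) -> float:
    """Hours needed to move size_bytes at a sustained line rate."""
    return size_bytes * 8 / bits_per_second / 3600


def fits_on_fat32(size_bytes: int) -> bool:
    """True if a single file of this size can exist on a FAT32 volume."""
    return size_bytes <= FAT32_MAX_FILE


def has_room_for(size_bytes: int, path: str = ".") -> bool:
    """Apply the 'twice the file size' rule to the volume that holds path."""
    return shutil.disk_usage(path).free >= 2 * size_bytes


if __name__ == "__main__":
    size = 3_700_000_000  # hypothetical size for a roughly 3.7 GB image
    print(f"56K modem: {download_hours(size, 56_000):.0f} hours")
    print(f"Fits on FAT32: {fits_on_fat32(size)}")
    print(f"Enough free space here: {has_room_for(size)}")
```

At a full 56,000 bits per second, a decimal gigabyte (10^9 bytes) takes just under 40 hours and a binary one (2^30 bytes) closer to 43. Real modems rarely sustain the full rate, so treat these as best-case numbers.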

Hopefully this first large file release goes well and is the first of many. If you give it a try and have any problems or questions, please let us know -- the feedback is very helpful as we learn the ins and outs of large file distribution over the Internet.

Tuesday Nov 27, 2007

Metalink takes off!

In a world of unintended consequences, one I often think about is not realizing how many new friends I would make because of my kids. Through numerous events, car-pooling, baby sitting, play dates, parties, and years of schooling with the same kids in their classes, I gained a whole new set of good friends (the parents that is, not the kids!).

Similarly, I hadn't realized when starting my blog that you can make some very interesting connections and virtually meet people who share your interests. It's a great benefit, and here's a great example. Soon after starting my blog, I "met" Anthony Bryan via some thoughtful, intelligent comments he left on subjects I was discussing. He must've found me due to my interest in ESD and download managers -- if you look around, there's simply not that much written about those subjects. And it's always great to find others who share your interests and passions.

I first mentioned Anthony's project, Metalink, almost two years ago, when he was just starting to gain traction. We've kept in touch, and it's really amazing to see how it's taken off since then. It's no accident of course -- it took perseverance, in combination with his clever, well-implemented, open technology. Metalink filled a gap in download managers and systems, providing for much needed enhanced redundancy, load sharing, and fault tolerance for large file downloads.

There's a long list of products now that incorporate Metalink, a sure sign of growing acceptance and success. I was going to mention a few, but I see his home page is up-to-date and says it much better than I can, so take a look. Also, here's an informative interview with Anthony about his project and its benefits.

So, what about Sun Download Manager (SDM), does it use Metalink? Well, no, not yet at least. The main reason is that SDM's primary audience is customers downloading Sun software from Sun Download Center (SDLC). Access to this software is carefully controlled for security and export control reasons. We use load management to distribute the load on multiple servers in our own data centers. As there aren't mirrors out there for this class of software (i.e., mostly not Sun's open source software), we lose one of the main advantages of Metalink. That said, we do know a lot of people use SDM on other sites because it's a good, simple, free, cross-platform download manager. So that sounds like a good argument to build in Metalink support in the future! (I'll say it before Anthony does.) I'll certainly keep it on the radar, but must admit all our engineers are tied up finishing our new download system at the moment.

In the meantime, I see that Metalink is used for OpenOffice distribution and was further pleased to see it mentioned in a number of other Sun blogs.

Congratulations Anthony, and I hope Metalink is just the first of many successes for you. 


I helped design, build, and manage download systems at Sun for many years. Recently I've focused on web eMarketing systems. Occasionally, I write about other interests, such as holography and jazz guitar. Follow me on Twitter: http://twitter.com/garyzel

