Saturday Jun 13, 2009

How to compact your Virtual Disks

In the previous article, we discussed the reasons why your virtual disk images become quite large over time. The situation might seem hopeless, but let me assure you: there is a way out. This article shows you how to reclaim that disk space. It presents a few steps, and while each of them makes sense on its own, you should execute them in exactly the order given if you want to achieve the best results.

The first step (don't shoot me for this) is to delete the files in your VM that you don't really need: empty the Recycle Bin, clean out the temp directories and old downloads, uninstall applications you never use, and so on. There are plenty of tools for removing unneeded files from Windows, so I'm sure you will find one that suits your needs.

Once only the files you need remain on the disk, it's time to defragment it. After defragmentation, the files are laid out contiguously and there is little free space between them. Windows comes with (not particularly good) defragmentation software: you will find it in the properties of your drive in My Computer, under the "Tools" tab. Just give it a run and hope it reduces fragmentation. Better tools exist, and some of them cost money.
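To picture what "nicely aligned" means, here is a toy model. It is a minimal sketch, not how the Windows defragmenter works. The disk is a block map in which each entry names the file that owns the block, or is `None` for a free block. The hypothetical `defragment` function packs every file contiguously at the start and leaves one free region at the end. A real defragmenter also has to move the data itself safely, which this sketch ignores.

```python
from typing import Dict, List, Optional


def defragment(blocks: List[Optional[str]]) -> List[Optional[str]]:
    """Toy defragmenter over a block map.

    Each entry names the file that owns that block, or is None if the block
    is free. Files are packed contiguously from the start of the disk, in
    the order in which they first appear; all free blocks end up at the end.
    """
    order: List[str] = []
    counts: Dict[str, int] = {}
    for owner in blocks:
        if owner is None:
            continue
        if owner not in counts:
            order.append(owner)
            counts[owner] = 0
        counts[owner] += 1

    packed: List[Optional[str]] = []
    for name in order:
        packed.extend([name] * counts[name])
    # Whatever is left over becomes one contiguous free region at the end.
    packed.extend([None] * (len(blocks) - len(packed)))
    return packed


disk = ["a", None, "b", "a", None, None, "b", "c", None, "a"]
print(defragment(disk))
# ['a', 'a', 'a', 'b', 'b', 'c', None, None, None, None]
```

After this step, all free space sits in a single region, which is exactly the part the next step will overwrite.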

The data is now nicely packed, but we still have all those unused blocks that contain garbage (the contents of the files that used to live there). We therefore need a tool that finds these blocks and overwrites them with zeros. Windows does not ship with such a tool, but SDelete (part of Microsoft's Sysinternals utilities) is available for download from Microsoft.

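Close all your programs and let SDelete zero out the free space on your drive. With the SDelete version available when this article was written, the command is:

sdelete.exe -c

Be aware that SDelete 1.6 and later swapped the meanings of -c and -z. With those versions, run sdelete.exe -z c: instead, replacing c: with the drive you want to clean.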

This will take a while, so let it do its job without interfering. SDelete fills all the free space on the drive with zeros, overwriting the leftover contents of deleted files. It only writes to free space and leaves your existing files alone, so the process is safe.
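The effect of zeroing free space can be sketched with a toy disk. This is a minimal sketch, assuming the disk is a list of small `bytearray` blocks and that the guest filesystem tells us which block indices are free. The function name `zero_free_space` is hypothetical. SDelete itself works through the filesystem rather than writing raw blocks, and this sketch only models the end result: free blocks become all zeros, and blocks owned by live files are untouched.

```python
from typing import List, Set


def zero_free_space(disk: List[bytearray], free: Set[int]) -> int:
    """Overwrite every block the guest filesystem considers free with zeros.

    Returns how many of those blocks still held leftover (non-zero) data.
    Blocks that belong to live files are never touched.
    """
    dirty = 0
    for index in free:
        block = disk[index]
        if any(block):
            dirty += 1
        block[:] = bytes(len(block))  # same length, all zero bytes
    return dirty


# Block 0 belongs to a live file; blocks 1 and 3 hold leftovers of deleted files.
disk = [bytearray(b"DATA"), bytearray(b"OLD!"), bytearray(4), bytearray(b"JUNK")]
free = {1, 2, 3}
print(zero_free_space(disk, free))  # 2
```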

Next, shut down your virtual machine completely (a real power-off, not a saved state) and let VirtualBox optimize the disk image by cutting out all the parts that SDelete zeroed. There are two ways to do this: you can compact the image in place (this operates on the existing image and makes it smaller), or you can clone it to a new image. Cloning needs more disk space, since you temporarily have two images. In return, it is safer (you still have the original bloated file if anything goes wrong), and it even lets you switch from one virtual disk format (e.g. VDI) to another (e.g. VMDK). Let's look at both options.

VBoxManage modifyhd XP.vdi --compact

This compacts your disk image in place, which takes some time. Note that VirtualBox can only compact VDI images this way. Cloning from VDI to VMDK works as follows:

VBoxManage clonehd XP.vdi NewXP.vmdk --format VMDK 

clonehd and modifyhd have many more options, so have a look at the VirtualBox user manual. Newer VirtualBox releases call these commands clonemedium and modifymedium.
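Here is a toy model of what the compact step achieves. The image is modelled as a mapping from grain index to contents, where a grain is the allocation chunk described in the background section (1MB for VDI). Compaction drops every grain that contains only zeros. This is safe because an unallocated grain reads back as zeros anyway. The names `GRAIN_SIZE`, `read_grain` and `compact` are hypothetical, and this is not VirtualBox's implementation. A real implementation also has to move the remaining grains within the file and update the image's allocation table, which this sketch ignores.

```python
from typing import Dict

GRAIN_SIZE = 1024 * 1024  # 1MB, the standard VDI grain size


def is_zero(data: bytes) -> bool:
    return data.count(0) == len(data)


def read_grain(image: Dict[int, bytes], index: int) -> bytes:
    # In a dynamically allocated image, an unallocated grain reads back as zeros.
    return image.get(index, bytes(GRAIN_SIZE))


def compact(image: Dict[int, bytes]) -> int:
    """Drop every allocated grain that holds only zeros; return bytes reclaimed.

    Because unallocated grains read back as zeros anyway, removing an
    all-zero grain does not change what the guest sees.
    """
    zero_grains = [index for index, data in image.items() if is_zero(data)]
    for index in zero_grains:
        del image[index]
    return len(zero_grains) * GRAIN_SIZE


image = {
    0: b"boot" + bytes(GRAIN_SIZE - 4),  # grain with live data
    1: bytes(GRAIN_SIZE),                # zeroed by SDelete
    2: bytes(GRAIN_SIZE),                # zeroed by SDelete
}
print(compact(image) // GRAIN_SIZE)  # 2
print(sorted(image))                 # [0]
print(read_grain(image, 1) == bytes(GRAIN_SIZE))  # True
```

This also shows why the SDelete step must come first. A grain that still holds leftover garbage is not all zeros, so compaction has to keep it.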

This concludes the article, and I hope it has given you some useful tips for reclaiming disk space. I'm about to go on an 11-hour intercontinental flight, and the virtual machine I want to work with was too big for my small notebook, so I used these techniques to shrink its image to less than half of its previous size.

Some background on Virtual Disks

Virtual disk images have a life of their own: what appears to be just a big file on your host system is actually a complete file system from the guest's point of view. What does that mean? Most users choose dynamically expanding images in VirtualBox. They do not want to limit themselves to a small virtual disk, yet they also do not want to waste space on the host that the guest doesn't actually need. It's a great concept in theory, but there are some caveats. Why would the disk image grow when there is still plenty of free space in the guest? And why does it keep getting bigger, even when you delete files in the guest?

The answer lies in the way modern filesystems work. In this article, I will focus on Windows guests and the NTFS filesystem, which is both the most common combination and the most problematic one. A disk is a (quite large) collection of bits that can be addressed in arbitrary order (random access). A filesystem manages such a disk and turns it into more useful things such as directories and files. The filesystem governs the disk: it blindly assumes that it owns the whole disk and can address all of its bits. Any part of the disk it doesn't use is, from its point of view, wasted. It also makes other assumptions. For example, it assumes it's better not to keep touching the same bits, because they might degrade over time and make the disk more likely to fail. Some filesystems even try to exploit the fact that the outer tracks of a spinning disk pass under the head faster than the inner ones, so reads and writes there are noticeably faster. What does all this mean for VirtualBox? It explains why a filesystem is so wasteful with disk space from the image's point of view. It tends to scatter its data all over the disk, and every newly touched area makes the virtual disk image grow.
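The effect of spreading writes across the disk can be shown with a toy simulation. It repeatedly creates and deletes a temporary file on an otherwise empty disk and counts how many distinct blocks were ever written. In a dynamically allocated image, each of those blocks ends up allocated on the host. The two allocation policies are hypothetical illustrations, not NTFS's actual algorithm. "first" always reuses the lowest free blocks. "next" continues where the last allocation ended, the way a filesystem that avoids touching the same bits might behave. The function name `blocks_touched` is also made up for this sketch.

```python
from typing import List, Set


def blocks_touched(policy: str, disk_blocks: int, file_blocks: int, cycles: int) -> int:
    """Create and delete a temporary file `cycles` times on an empty toy disk.

    Returns how many distinct blocks were ever written.

    policy "first": always start searching at block 0 (reuse freed blocks).
    policy "next":  continue where the previous allocation ended, wrapping
                    around, so writes are spread across the whole disk.
    """
    if policy not in ("first", "next"):
        raise ValueError("policy must be 'first' or 'next'")
    if file_blocks > disk_blocks:
        raise ValueError("file does not fit on the disk")
    free = [True] * disk_blocks
    cursor = 0
    touched: Set[int] = set()
    for _ in range(cycles):
        i = 0 if policy == "first" else cursor
        got: List[int] = []
        while len(got) < file_blocks:
            if free[i]:
                free[i] = False
                got.append(i)
            i = (i + 1) % disk_blocks
        cursor = i
        touched.update(got)
        for block in got:  # deleting the file frees its blocks again
            free[block] = True
    return len(touched)


print(blocks_touched("first", 1000, 10, 50))  # 10
print(blocks_touched("next", 1000, 10, 50))   # 500
```

The guest never holds more than 10 blocks of data at a time. Under the "next" policy, however, the image would still grow to cover 500 blocks.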

Why does the virtual disk image grow in large chunks, even if only a small file was written in the guest? That's actually an optimization. Every allocated chunk needs an entry in the image's allocation table, so dynamically growing images grow in chunks to keep the number of chunks, and with it the bookkeeping overhead, small. If the image grew by, say, 1 byte at a time, we'd waste more than one byte of overhead for each byte written! For standard VDI files, the chunk size is 1MB. If one or more bytes are written to a 1MB chunk (we call these chunks grains), the whole grain gets allocated. For VHD (originally from Microsoft Virtual PC), the grain size is even 2MB. For VMDK (originally from VMware), it is just 64kB. This makes VMDK the most storage-efficient of the three formats, but it also has somewhat more metadata overhead and slightly lower write performance.
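The grain arithmetic can be checked with a few lines of code. This is a minimal sketch using the grain sizes given above (64kB for VMDK, 1MB for VDI, 2MB for VHD). The helper names `grains_touched` and `allocated_bytes` are hypothetical. Each write is an `(offset, length)` pair in bytes, and every grain a write touches is counted as fully allocated.

```python
from typing import Iterable, Set, Tuple

KB = 1024
MB = 1024 * KB


def grains_touched(writes: Iterable[Tuple[int, int]], grain_size: int) -> Set[int]:
    """Return the indices of all grains hit by (offset, length) writes."""
    touched: Set[int] = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // grain_size
        last = (offset + length - 1) // grain_size
        touched.update(range(first, last + 1))
    return touched


def allocated_bytes(writes: Iterable[Tuple[int, int]], grain_size: int) -> int:
    # Every touched grain is allocated in full, however few bytes hit it.
    return len(grains_touched(writes, grain_size)) * grain_size


# 100 small 4kB writes, scattered 10MB apart across the virtual disk (400kB of data).
scattered = [(i * 10 * MB, 4 * KB) for i in range(100)]
for name, grain in [("VMDK", 64 * KB), ("VDI", 1 * MB), ("VHD", 2 * MB)]:
    print(name, allocated_bytes(scattered, grain) // KB, "kB")
# VMDK 6400 kB
# VDI 102400 kB
# VHD 204800 kB
```

Written contiguously, the same 400kB would fit into a single 1MB VDI grain. That is why scattered writes in the guest hurt so much.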

Another important factor is fragmentation. If you delete a 1MB file on your NTFS disk, there will be 1MB of free space somewhere on the disk (assuming the file was not fragmented). Now you want to copy a 2MB file to the disk. What should the filesystem do? Should it look for a place with 2MB of contiguous free space? Should it cut the file into pieces, put the first 1MB where the deleted file used to be and squeeze the rest in somewhere else? The filesystem has to make this decision every time, and NTFS is known to tend towards fragmentation. Fragmentation uses free space efficiently, but if your files are scattered all over the disk, reading them takes a lot of time and performance degrades. Which user hasn't noticed Windows getting slower and slower over time? Disk fragmentation is one explanation for that phenomenon.
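The two choices described above can be written down as toy allocation policies. This is a minimal sketch. Neither function is NTFS's real allocator, and both names are hypothetical. Free space is a list of `(start, length)` extents in MB. `split_first_fit` fills holes front to back and splits the file when a hole is too small. `contiguous_first` looks for a single hole that fits the whole file and only falls back to splitting when none exists.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (start, length), in MB to keep the numbers small


def split_first_fit(free: List[Extent], size: int) -> Tuple[List[Extent], List[Extent]]:
    """Fill free extents front to back, splitting the file wherever a hole is too small."""
    placed: List[Extent] = []
    remaining: List[Extent] = []
    need = size
    for start, length in sorted(free):
        if need == 0:
            remaining.append((start, length))
            continue
        take = min(length, need)
        placed.append((start, take))
        need -= take
        if take < length:
            remaining.append((start + take, length - take))
    if need:
        raise OSError("disk full")
    return placed, remaining


def contiguous_first(free: List[Extent], size: int) -> Tuple[List[Extent], List[Extent]]:
    """Prefer one hole large enough for the whole file; split only if none exists."""
    ordered = sorted(free)
    for i, (start, length) in enumerate(ordered):
        if length >= size:
            rest = ordered[:i] + ordered[i + 1:]
            if length > size:
                rest.append((start + size, length - size))
            return [(start, size)], sorted(rest)
    return split_first_fit(free, size)


# A 1MB hole left by a deleted file at 5MB, and a large free region at 100MB.
free = [(5, 1), (100, 50)]
print(split_first_fit(free, 2))   # ([(5, 1), (100, 1)], [(101, 49)])
print(contiguous_first(free, 2))  # ([(100, 2)], [(5, 1), (102, 48)])
```

The first policy leaves no gap behind but splits the file into two fragments. The second keeps the file in one piece but leaves the 1MB hole unused.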

Let's look again at the 1MB file we just deleted. What happens when you delete a file? Not much, actually: the filesystem just marks the file as deleted in a global file structure (in NTFS, the MFT or Master File Table). That's very quick, and in many cases it allows undelete programs to do their job. However, it also means that the space the file used to occupy still holds the contents of the deleted file. That data stays there until the filesystem allocates those blocks again. For dynamically growing disk images, this has a major consequence. VirtualBox cannot see the guest's file table, so blocks that contain data appear to it as being in use, and they must remain in the virtual disk image.
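The gap between what the guest thinks is used and what the image has to keep can be shown with a small toy class. It is a minimal sketch, and `ToyVolume` and its methods are hypothetical. The `image` dict stands in for the virtual disk, where a block appears once it has been written. The `mft` dict stands in for NTFS's Master File Table and only maps file names to block numbers.

```python
from typing import Dict, List


class ToyVolume:
    """Toy guest filesystem on top of a dynamically allocated image.

    `image` stands in for the virtual disk: a block appears in it once it has
    been written. `mft` stands in for NTFS's Master File Table and only maps
    file names to block numbers. Freed blocks are never reused here, to keep
    the sketch short.
    """

    def __init__(self) -> None:
        self.image: Dict[int, bytes] = {}
        self.mft: Dict[str, List[int]] = {}
        self.next_block = 0

    def write_file(self, name: str, chunks: List[bytes]) -> None:
        blocks: List[int] = []
        for chunk in chunks:
            self.image[self.next_block] = chunk
            blocks.append(self.next_block)
            self.next_block += 1
        self.mft[name] = blocks

    def delete_file(self, name: str) -> None:
        # Only the file table entry goes away; the data blocks are left as-is,
        # which is also what lets undelete tools recover the file.
        del self.mft[name]

    def blocks_in_use(self) -> int:
        # What the guest thinks is used, based on its file table.
        return sum(len(blocks) for blocks in self.mft.values())


vol = ToyVolume()
vol.write_file("report.doc", [b"Q1", b"Q2", b"Q3"])
vol.delete_file("report.doc")
print(vol.blocks_in_use(), len(vol.image))  # 0 3
print(vol.image[1])                         # b'Q2'
```

After the delete, the guest considers the disk empty, yet all three blocks, still holding the old data, remain in the image.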

If you've made it this far, you've seen answers to the following questions:

  • How are virtual disk images organized?
  • Why do virtual disk images grow so fast?
  • Why do virtual disk images never shrink?
  • What is fragmentation and how does it affect virtual disk images?
In the next article, I'll show you the weapons you need to fight excess disk use. 


This blog concerns all things VirtualBox.

