Saturday Jun 13, 2009

Some background on Virtual Disks

Virtual disk images have a life of their own: what appears to be just a big file on your host system actually contains a complete file system belonging to the guest. What does that mean? Most users go for the dynamically expanding images in VirtualBox, as they do not want to limit themselves to a small virtual disk size and at the same time do not want to waste disk space on their host while the guest doesn't actually need it. A great concept in theory, but there are some caveats. Why would the disk image grow when there is still a lot of free space in the guest? Why does it only keep getting bigger, even if you delete files from the guest?

The answer lies in the way modern filesystems work. In this article, I will focus on Windows guests and the NTFS filesystem, which is both the most common combination and the most problematic one. A disk is a (quite large) collection of bits that can be addressed in arbitrary order (random access). A filesystem is designed to manage such a disk and turn it into more useful things such as directories and files. The filesystem governs the disk: it blindly assumes that it owns the whole disk and that it can address all the bits on it. If it doesn't make use of some part of the disk, it considers that space to be wasted. It also makes some other assumptions, for example that it's better not to always touch the same bits, because they might degrade after some time and make the disk more likely to fail. Some filesystems even try to benefit from the fact that the outer tracks of a spinning disk pass under the head faster than the inner "circles", so reads and writes there are noticeably faster. What does all this mean for VirtualBox? Well, all these things explain why a filesystem is so wasteful with disk space: it tries to scatter its data all over the disk, and this makes the virtual disk image grow.

Why does the virtual disk image grow in large chunks, even if just a small file was written in the guest? That's actually an optimization. Dynamically growing disk images grow in chunks to keep the bookkeeping small (if an image grew by e.g. 1 byte at a time, we'd have to spend more than one byte of overhead for each byte written!). We call these chunks grains. For the standard VDI files, the grain size is 1MB: if one or more bytes get written to a 1MB grain, the whole grain gets allocated. For VHD (originally from Microsoft Virtual PC), the grain size is even larger at 2MB. For VMDK (originally from VMware), the grain size is just 64kB. This means VMDK is the most storage-efficient file format, but its small grains also cost it a bit more in bookkeeping overhead and write performance.
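To get a feel for the numbers, here is a minimal sketch in Python. It is a toy model, not VirtualBox code: the function name `allocated_bytes` and the write pattern are made up for illustration, the grain sizes are the ones quoted above, and the per-grain bookkeeping overhead is ignored.

```python
# Toy model: any write allocates every grain it touches, in full.
# Grain sizes are the ones quoted in the text; metadata overhead is ignored.
GRAIN_SIZES = {
    "VDI": 1024 * 1024,
    "VHD": 2 * 1024 * 1024,
    "VMDK": 64 * 1024,
}


def allocated_bytes(writes, grain_size):
    """Return the bytes an image would allocate for the given guest writes.

    writes is an iterable of (offset, length) pairs, in bytes, on the
    virtual disk as the guest sees it.
    """
    grains = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // grain_size
        last = (offset + length - 1) // grain_size
        grains.update(range(first, last + 1))
    return len(grains) * grain_size


# Hypothetical workload: 100 writes of 4 kB each, scattered 10 MB apart,
# i.e. only 400 kB of real data.
scattered = [(i * 10 * 1024 * 1024, 4096) for i in range(100)]

for name, grain_size in GRAIN_SIZES.items():
    megabytes = allocated_bytes(scattered, grain_size) / (1024 * 1024)
    print(f"{name}: {megabytes:.2f} MB allocated")
```

With this scattered pattern every write lands in its own grain, so the allocated size is simply the number of writes times the grain size, no matter how little data each write carries.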

Another important factor is fragmentation. If you delete a 1MB file on your NTFS disk, there will be 1MB of free space somewhere on the disk (assuming the file was not fragmented). Now you want to copy a 2MB file to your disk. What should the filesystem do? Should it look for a place on the disk where there is 2MB free? Should it cut the file into chunks, put the first 1MB in the place where you just deleted a file and try to squeeze in the rest somewhere else? That's a decision the filesystem has to make each time, and NTFS is known for tending towards fragmentation. Fragmentation is an efficient way of using free space, but if your files are distributed all over the disk, it will take a lot of time to read them and performance will degrade. Which user hasn't observed that Windows keeps getting slower and slower? Disk fragmentation is one explanation for that phenomenon.
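The second option, cutting the file into pieces, can be sketched in a few lines of Python. This is a deliberately naive allocator written for illustration only; it is not how NTFS actually decides, and the function name `allocate` and the extent layout are assumptions.

```python
def allocate(free_extents, size):
    """Naive allocator: fill free extents in disk order, splitting the file.

    free_extents is a list of (start, length) pairs. Returns a tuple
    (fragments, remaining_free), both lists of (start, length) pairs.
    The input list is not modified.
    """
    fragments = []
    remaining = []
    need = size
    for start, length in sorted(free_extents):
        if need == 0:
            # File is fully placed; the rest of the free space is untouched.
            remaining.append((start, length))
            continue
        take = min(length, need)
        fragments.append((start, take))
        need -= take
        if take < length:
            remaining.append((start + take, length - take))
    if need:
        raise ValueError("not enough free space")
    return fragments, remaining


# Units are MB. A 1 MB hole at position 5 (the deleted file) and
# 8 MB of free space starting at position 40.
free = [(5, 1), (40, 8)]
fragments, free_after = allocate(free, 2)
print(fragments)   # the 2 MB file ends up in two pieces
print(free_after)
```

The 2MB file fills the 1MB hole first and puts its second half 35MB further along, which is exactly the kind of layout that costs seek time later.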

Let's look again at that 1MB file we just deleted. What happens when you delete a file? Not much, actually: the filesystem just marks that file as deleted in some global file structure (for NTFS, the MFT or master file table). That's very quick and allows undelete programs to do their job in many cases. However, this also means that the space the file used to occupy will still contain the contents of the file we just deleted. Until the filesystem allocates these blocks again, the data will remain as it was. For dynamically growing disk images, this has a major consequence: as the blocks still contain data, they appear to VirtualBox as being in use, so they need to remain in the virtual disk image.
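Here is a small Python sketch of that effect. `ToyImage` is a hypothetical stand-in for a dynamically expanding image, and the one-byte "in use" flag at offset 0 is a made-up simplification of a file table record; neither reflects the real VDI or NTFS on-disk formats.

```python
GRAIN = 1024 * 1024  # VDI grain size quoted earlier


class ToyImage:
    """Hypothetical dynamically expanding image: grains appear on first write."""

    def __init__(self):
        self.grains = {}  # grain index -> bytearray of GRAIN bytes

    def write(self, offset, data):
        for i, byte in enumerate(data):
            pos = offset + i
            grain = self.grains.setdefault(pos // GRAIN, bytearray(GRAIN))
            grain[pos % GRAIN] = byte

    def read(self, offset, length):
        out = bytearray()
        for pos in range(offset, offset + length):
            grain = self.grains.get(pos // GRAIN)
            out.append(grain[pos % GRAIN] if grain is not None else 0)
        return bytes(out)

    def size(self):
        return len(self.grains) * GRAIN


image = ToyImage()
payload = b"secret report"

# The guest creates a file: one metadata flag plus the file contents.
image.write(0, b"\x01")           # made-up "file in use" flag
image.write(5 * GRAIN, payload)   # file data, 5 MB into the disk
size_before = image.size()

# The guest deletes the file: only the metadata flag changes.
image.write(0, b"\x00")
size_after = image.size()

print(size_before, size_after)
print(image.read(5 * GRAIN, len(payload)))
```

The delete touches a single byte of metadata. The grain holding the file contents is still allocated and still holds the old bytes, so from the image's point of view nothing has been freed.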

If you've made it this far, you've seen answers to the following questions:

  • How are virtual disk images organized?
  • Why do virtual disk images grow so fast?
  • Why do virtual disk images never shrink?
  • What is fragmentation and how does it affect virtual disk images?
In the next article, I'll show you the weapons you need to fight excess disk use. 

Thursday Jan 22, 2009

Sun xVM VirtualBox 2.1.2 is released!

Just a quick note to say that version 2.1.2 is released.

Our friend the Fat Bloke has more details.

Wednesday Dec 17, 2008

VirtualBox 2.1 now released

Just a quick note to say that version 2.1 is released.

Our friend the Fat Bloke has more details.

Thursday Sep 04, 2008

Sun xVM VirtualBox 2.0 is released

We're pleased to announce that VirtualBox 2.0 is now available.

Read all about it in the Press release, or just go download it yourself.

Headline Features:

• 64-bit guest support (64-bit hosts only)
• New native Leopard user interface on Mac OS X hosts
• The GUI was converted from Qt3 to Qt4 with many visual improvements
• New-version notifier
• Guest property information interface
• Host Interface Networking on Mac OS X hosts
• Host Interface Networking on Solaris 10 hosts
• Support for Nested Paging on modern AMD-V CPUs (major performance gain)
• Framework for collecting performance and resource usage data (metrics)
• Clipboard integration for OS/2 Guests
• Support for VHD images
• Created separate SDK component featuring a new Python programming interface on Linux and Solaris hosts

In addition, the following items were fixed and/or added:
• VMM: VT-x fixes
• AHCI: improved performance
• GUI: keyboard fixes
• Linux installer: properly uninstall the package even if unregistering the DKMS module fails
• Linux additions: the guest screen resolution is properly restored
• Network: added support for jumbo frames (> 1536 bytes)

Monday Aug 04, 2008

Sun xVM VirtualBox 1.6.4 now available!

A wise Superhero once said something along the lines of: "With great power comes great responsibility" and we, in the VirtualBox team, never disagree with Superheroes.

So when we were informed by the guys at CoreLabs of a security vulnerability on the Windows platform we took it very seriously indeed. And the result is a new maintenance release which fixes the security problem and several other niggly bugs. 

This new version (Sun xVM VirtualBox 1.6.4) is available from the usual place, and the ChangeLog contains fuller details of the bugs fixed in this release.




Wednesday Jun 25, 2008

Using VRDP to view VirtualBox virtual machines

Here's a nice blog post which covers one of the unique features of VirtualBox, the built-in RDP server: Using VRDP to view VirtualBox virtual machines. -FB

Tuesday Jun 24, 2008

Secretary or data center?

VirtualBox is highly popular among end users and is found on over five million desktops already. Recently, I observed a discussion about VirtualBox and what it can be used for. Eventually it turned into a dispute over whether it's suitable for server deployments. "VirtualBox is for the secretary whereas *** is for the data center because ..... well, because." Is it true that VirtualBox does not target server deployments and that other products such as Sun xVM Server or VMware ESX should be used for running virtual machines on server hardware?

Well, yes and no. VirtualBox is in reality much more than a simple-to-use end user product. In the form you download it, it's a distribution containing (among other things) a hypervisor, virtual device modules, an RDP server and an application programming interface (API), on top of which we've developed a nice and simple-to-use graphical interface. Our distribution is perfect for end users, but because it is just a distribution, anyone can take the pieces, assemble them in a different way and create a new distribution.

We have lots of customers that benefit from VirtualBox as the only truly modular virtualization software and build their own products on top of it. Some of them implement their own virtual PCI cards, others integrate VirtualBox with web interfaces they've developed. With its high performance, integrated RDP server and API, VirtualBox is ideally suited to be the heart of solutions for VDI (virtual desktop infrastructure), and we have customers marketing VirtualBox-based VDI software that can run hundreds of desktops on a single server.

Now, when looking at high-end server workloads, I have to admit that VirtualBox doesn't have all the features imaginable. We do not support 64-bit guests, we only present one CPU to each guest (but we do make use of all cores) and we can't transfer a running virtual machine from one server to another (live migration). That might not be enough for some scenarios, but you will be surprised how little time will pass until those features become available...

Monday Jun 23, 2008

VirtualBox Dark Matter

There are some great VirtualBox technology bloggers out there, too many to mention by name or pseudonym.

But experiments have shown the existence of VirtualBox Dark Matter. This is expressed by the equation:

  Σ (VirtualBox Knowledge) > Blogs + Wikis + Forums + YouTube

In order to prise this Dark Matter out of people's heads and get it into the public domain we've created this group blog. This is for anyone to contribute anything they want, no matter how large or how small, about the World's most popular Open Source virtualization platform for Windows, Mac OS X, OpenSolaris, Solaris and Linux.

You may feel you don't have enough content to fill your own blog, or you may be a compulsive blogger. Whatever your persuasion, we now offer this new VirtualBox soapbox up to you.

Come along, stand up and Blog!


This blog concerns all things VirtualBox.

