By Wcoekaer-Oracle on May 29, 2011
I wrote a blog entry earlier about how all the Linux kernel bits needed to run as a complete Dom0 and DomU kernel on top of Xen have been committed to the mainline Linux tree. Shortly after Linus pulled that set of changes in, he also merged something called cleancache.
Cleancache is actually very cool: it has huge potential to make running VMs far more optimized, performant and efficient, and it is the result of quite a bit of research and experimentation.
At the end of this blog entry there are a few links that point to more information, as the topic gets quite complex in detail.
Cleancache is a way for the kernel to put away pages that can disappear at any point in time, just as they normally would when the kernel discards cache pages. With the cleancache method, however, if such a page turns out to be useful again later, it may still exist, and in that case it doesn't have to come back from disk.
So in other words, the kernel has a cache of pages; it has a specific, fixed amount of memory and manages the cached pages within those known boundaries. When there is memory pressure and memory is needed, the memory manager goes through the cache and tosses out pages, which at that point are literally gone; if that data (file data) is needed again later, it has to be read back in from disk, and disk access is of course extremely slow compared to memory access. Dan came up with a feature he called transcendent memory, which is in the Xen hypervisor (4.0 / Oracle VM etc.). It allows the hypervisor to take memory that is not allocated to any particular guest and, through an interface/API, let guest VMs make use of those pages, with the understanding that the memory can literally disappear instantly if it is needed elsewhere, for instance when a new VM starts. So it's a way to give a VM more temporary memory pages without any guarantee.
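That "no guarantee" contract is the heart of tmem's ephemeral pools: a put always succeeds while spare memory exists, but a later get can miss because the hypervisor may have silently reclaimed the page. Here is a toy Python sketch of that contract; the class and method names (`EphemeralPool`, `put_page`, `get_page`, `hypervisor_reclaim`) are hypothetical and just model the semantics, not the real Xen interface.

```python
class EphemeralPool:
    """Toy model of a tmem ephemeral pool: puts succeed while memory is
    spare, but the hypervisor may drop any page at any time, so a get
    can miss even for a page that was put earlier."""

    def __init__(self):
        # (pool_id, object_id, page_index) -> page data
        self.pages = {}

    def put_page(self, key, data):
        # Copy the page into hypervisor-owned spare memory.
        self.pages[key] = data

    def get_page(self, key):
        # Return the page if it still exists, else None (cache miss).
        return self.pages.get(key)

    def hypervisor_reclaim(self):
        # A new VM starts and the hypervisor takes its memory back:
        # the ephemeral pages simply vanish, with no notification.
        self.pages.clear()

pool = EphemeralPool()
pool.put_page(("fs", 42, 0), b"file data")
hit = pool.get_page(("fs", 42, 0))    # still there: fast hit, no disk I/O
pool.hypervisor_reclaim()
miss = pool.get_page(("fs", 42, 0))   # gone: the guest falls back to disk
```

The key point the sketch makes is that the guest must always be prepared for `get_page` to return nothing, which is exactly why only clean (re-readable) pages are safe to store this way.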
Cleancache is the Linux kernel side of being able to make use of that tmem/Xen API. The changes to the filesystems (ext3, ext4, btrfs and OCFS2) give them an extra set of free pages to use for cache; until the hypervisor reclaims that memory for other purposes, the Linux VM simply has more "RAM". And anything that can read a page from a memory cache instead of from disk speeds up, a lot. So: page -> page cache -> evict (old style: remove; new style: put in cleancache) -> need access again: check cleancache; if the page is there, great, if not, back to disk.
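The evict-then-lookup flow above can be sketched end to end with a toy read path; everything here (the dictionaries standing in for the page cache, cleancache and the backing store, and the `evict`/`read` helpers) is a hypothetical simulation of the flow, not the real kernel hooks.

```python
# Toy read path: page cache -> cleancache -> disk.
DISK = {0: b"block0", 1: b"block1"}   # pretend backing store
page_cache = {}                        # kernel page cache
cleancache = {}                        # ephemeral, tmem-backed store

def evict(index):
    # Old style: the clean page is simply dropped. New style: offer it
    # to cleancache on its way out; the hypervisor may drop it later.
    page = page_cache.pop(index)
    cleancache[index] = page

def read(index):
    if index in page_cache:            # 1. normal page cache hit
        return page_cache[index], "pagecache"
    if index in cleancache:            # 2. cleancache hit: fast, no I/O
        page_cache[index] = cleancache.pop(index)
        return page_cache[index], "cleancache"
    page_cache[index] = DISK[index]    # 3. slow path: read from disk
    return page_cache[index], "disk"

read(0)                 # first read has to come from disk
evict(0)                # memory pressure: page leaves the page cache
data, src = read(0)     # cleancache still has it: no disk I/O needed
```

Note that the cleancache lookup removes the page from the ephemeral store as it pulls it back into the page cache, so a page lives in one place at a time; if the hypervisor had already reclaimed it, step 2 would simply miss and step 3 would read from disk as before.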
Cleancache relies on an external vehicle to provide the functionality and has implementations for ext3, ext4, Btrfs and OCFS2 at this point. It can also work with zcache (compressed cache). There's still a lot of discussion going on around frontswap, and around the more recent work on sharing memory across multiple systems using the same mechanism: Ramster. Ramster is going to be another really cool enhancement, and it shows that virtualization is more than just running a VM and doing things behind the scenes; cooperative virtualization is critical for sharing resources wisely and with high performance. Trying to do everything from the outside may be a good way to do generic virtualization, but it has serious limitations and in many cases works against the workloads in the guest. If the virtual machine thinks memory is in RAM, but the hypervisor decides it isn't widely used and writes it out to disk behind the VM's back, you have a performance problem that is very difficult to predict and causes nasty side effects. Things like tmem, Ramster and cleancache are much more interesting methods.
There is a lot of research and development going on in these areas in our groups. Good times...
A good detailed explanation of cleancache can be found in the Linux source tree: Documentation/vm/cleancache.txt
More on tmem : http://oss.oracle.com/projects/tmem/
More detail in an older article on LWN: http://lwn.net/Articles/423540/
More on ramster : http://permalink.gmane.org/gmane.linux.kernel.mm/59876
Happy Memorial Day.