Sunday Oct 18, 2009

SECR talk

On 28 Oct I will be giving a talk on VirtualBox at the SECR conference: "VirtualBox: Struggle for Performance in Type 2 Hypervisors". If you are in Moscow around that time, come by; I will try to make it both technical and fun.

Thursday Mar 12, 2009

Bug investigation (part 1 - tale of sir VIF)

The last year at VirtualBox was fun, but the load was so heavy that I had almost no time to blog. Let's fix that a bit. There are several almost-detective stories of recent bug hunting in the VBox core that I'd like to share. They show the level of complexity of modern virtualizers, and how bugs in system software may manifest in rather unexpected ways to applications (and the guest kernel is an application for us :)). Let's start with this little bugger. It is almost 2 years old, which is a lot for a VBox bug. The bug report described it as follows:
I'm receiving the following message when certain commands are run in FreeBSD 6.2 and VirtualBox 1.4.0:

   sigreturn: eflags 0x80247

This error was one of the nasty blockers to running FreeBSD reliably as a guest, and since I think supporting it is a good idea, I looked into the bug. As pointed out in the report, the message is triggered by:

                if (!EFL_SECURE(eflags & ~PSL_RF, regs->tf_eflags & ~PSL_RF)) {
                       printf("sigreturn: eflags = 0x%x\n", eflags);
                       return (EINVAL);
                }
in the FreeBSD kernel's sys/i386/i386/machdep.c. This check essentially means that some bit in the EFLAGS CPU register that FreeBSD considers insecure was toggled. The bit of interest is VIF (0x80000 in the value reported above), which has a rather convoluted story behind it.

When Intel, in the early 90s, wanted to keep compatibility with legacy 16-bit DOS code while running protected mode OSes, they introduced the so-called VM86 mode. This mode was Intel's first take on virtualization, and calling it clumsy is something of a compliment. The VIF (and VIP) flags are part of this extension. VIF is a virtualized version of the IF flag (the interrupts-enabled flag). If a DOS application were allowed to modify the real IF, it could disable interrupts and render the whole system unusable. So instead, the cli instruction in VM86 mode affects only the VIF flag. At the same time, the pushf and popf instructions, which are (almost) the only way to access EFLAGS, were modified so that the value of the VIF bit is placed in the IF bit. And since VIF is bit 19 of EFLAGS, it is not visible in the 16-bit version of pushf.

So now the cause of the bug: FreeBSD sometimes executes BIOS calls in VM86 mode, which may modify the VIF flag on the CPU. When a protected mode interrupt (such as the timer used for task scheduling) arrived at the wrong moment (when the VIF flag value had been toggled), our dynamic recompiler did not clear the VIF flag (and according to the Intel/AMD instruction manuals it shouldn't). All subsequent EFLAGS accesses had the VIF flag setting masked, so to the OS it looked as if the VIF bit toggled at random. The fix was not that hard: just mask out the VIF and VIP bits in EFLAGS when taking interrupts in VM86 mode, as those bits make no sense outside of VM86 mode.
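To make the flag arithmetic concrete, here is a small Python sketch that decodes the value from the bug report and applies the same kind of mask as the fix. It is not VirtualBox code; the function names are made up for illustration. Only the bit positions are architectural: IF is bit 9, VIF is bit 19, VIP is bit 20.

```python
# Architectural EFLAGS bit positions (x86).
EFLAGS_IF = 1 << 9     # interrupts enabled
EFLAGS_VIF = 1 << 19   # virtual interrupt flag (VM86 extension)
EFLAGS_VIP = 1 << 20   # virtual interrupt pending (VM86 extension)


def describe(eflags):
    """Return the names of the interrupt-related flags set in eflags."""
    names = []
    for name, bit in (("IF", EFLAGS_IF), ("VIF", EFLAGS_VIF), ("VIP", EFLAGS_VIP)):
        if eflags & bit:
            names.append(name)
    return names


def strip_vm86_virtual_flags(eflags):
    """Hypothetical helper: clear VIF and VIP, leaving all other bits alone."""
    return eflags & ~(EFLAGS_VIF | EFLAGS_VIP)


reported = 0x80247  # value printed by the FreeBSD guest
print(describe(reported))                       # ['IF', 'VIF']
print(hex(strip_vm86_virtual_flags(reported)))  # 0x247
```

With VIF masked out, the guest sees 0x247, which contains no bit that only makes sense in VM86 mode.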

The next post will cover the story of the most time-consuming bug I ever worked on (about 80 hours of continuous hacking).

Thursday Oct 23, 2008

Long absolute jumps on AMD64

Sometimes you need to call or jump to an absolute address on 64-bit AMD. Unfortunately, the x86_64 instruction set only allows 32-bit displacements, so the traditional approach is to move the desired address into a register and call or jump through it. That requires a scratch register, or a push/pop of a register. For a jump it is also a problem if we do not want to touch any registers. Here I suggest an alternative approach that uses the ret instruction for long jumps. While not too complicated, this trick can help compiler/JIT writers handle very long jumps.
DECLINLINE(void) tcg_out_pushq(TCGContext *s, tcg_target_long val)
{
     tcg_out8(s, 0x68); /* push imm32 (sign-extended to 64 bits), subs 8 from rsp */
     tcg_out32(s, val); /* imm32: low 32 bits of val */
     /* push sign-extends imm32, so patch the high dword unless that already yields val */
     if (val != (tcg_target_long)(int32_t)val)
     {
         tcg_out8(s, 0xc7); /* mov imm32, 4(%rsp) */
         tcg_out8(s, 0x44);
         tcg_out8(s, 0x24);
         tcg_out8(s, 0x04);
         tcg_out32(s, ((uint64_t)val) >> 32); /* imm32: high 32 bits of val */
     }
}

DECLINLINE(void) tcg_out_long_jmp(TCGContext *s, tcg_target_long dst)
{
    tcg_out_pushq(s, dst);
    tcg_out8(s, 0xc3); /* ret: pops dst from the stack and jumps to it */
}
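
To see which bytes this trick produces, the following Python sketch builds the same push / mov / ret sequence as a byte string. The function names are hypothetical and it only encodes bytes; it does not execute anything. One detail it handles explicitly: push imm32 sign-extends its operand to 64 bits, so the sketch writes the high dword whenever sign extension of the low dword alone would not reproduce the target address.

```python
import struct


def fits_sign_extended_imm32(addr):
    """True if sign-extending the low 32 bits of addr gives addr back."""
    low = addr & 0xFFFFFFFF
    high = 0xFFFFFFFF00000000 if low & 0x80000000 else 0
    return (high | low) == addr


def encode_long_jmp(addr):
    """Encode 'push addr; ret' for a 64-bit absolute address, register-free."""
    if not 0 <= addr < 1 << 64:
        raise ValueError("address must fit in 64 bits")
    code = bytearray()
    # push imm32: pushes the sign-extended low dword, rsp -= 8
    code += b"\x68" + struct.pack("<I", addr & 0xFFFFFFFF)
    if not fits_sign_extended_imm32(addr):
        # movl $imm32, 4(%rsp): overwrite the high dword of the pushed value
        code += b"\xc7\x44\x24\x04" + struct.pack("<I", addr >> 32)
    code += b"\xc3"  # ret: pops the address and jumps to it
    return bytes(code)


print(encode_long_jmp(0x1000).hex())         # 6800100000c3
print(encode_long_jmp(0x100002000).hex())    # 6800200000c744240401000000c3
```

The short form is 6 bytes and the full form is 14 bytes, and neither touches a general-purpose register.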

Friday Sep 05, 2008

Python API to the VirtualBox VM

One of the important advantages of the VirtualBox virtualization solution is its powerful public API, which lets you control every aspect of virtual machine configuration and execution. Last month I was working on Python and Java bindings to that API. Those bindings are shipped with the VirtualBox 2.0 SDK.

There are two families of API bindings: SOAP and XPCOM.

SOAP lets you control remote VMs over HTTP, while XPCOM performs much better and exposes certain functionality not available with SOAP. They use very different technologies (SOAP is procedural, while XPCOM is OOP), but since both are ultimately an API to the same VirtualBox functionality, we kept the original semantics in the bindings. Other than connection establishment, code can be written so that it does not care which communication channel to the VirtualBox instance is used. As an example of how flexible and powerful these APIs are, I developed an extensible Python command line shell for VirtualBox, usable as a simpler CLI alternative to the GUI. The same shell code can work with either a SOAP or an XPCOM connection to VirtualBox. To start the XPCOM version of the shell:
  • download VirtualBox 2.0 for your platform (Linux and Solaris Python bindings officially supported)
  • download SDK
  • unpack SDK
  • cd sdk/bindings/xpcom/python/sample
  • export VBOX_PROGRAM_PATH=/opt/VirtualBox-2.0.0/ PYTHONPATH=..:$VBOX_PROGRAM_PATH
  • ./ to start the shell
Currently the shell can start/suspend/resume/power down VMs and persistently set any VM variable. You can easily extend the shell with your own commands; for example, let's implement a command that shows information on the machine's hard disk:
def showvdiCmd(ctx, args):
    mach = argsToMach(ctx, args)
    if mach is None:
        return 0
    hdd = mach.getHardDisk(ctx['ifaces'].StorageBus.IDE, 0, 0)
    print 'HDD0 info: id=%s desc="%s" size=%dM location=%s' %(,hdd.description,hdd.size,hdd.location)
    return 0

and add the following line to the commands map:
'vdiinfo':['Show VDI info', showvdiCmd],

Then you can run it like this: vdiinfo Win32 (or whatever your VM of interest is named). Easy, isn't it? Moreover, this command will work not only with the XPCOM bindings but with SOAP too. This example also shows how to access VirtualBox constants in a toolkit-neutral manner: the 'ifaces' field of the context contains reflection information usable to get the values of constants.
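
For readers curious how such a commands map can drive a shell, here is a minimal, self-contained Python sketch of the dispatch idea. It is an assumption about the general pattern, not the actual shell source: `run_command`, `echo_cmd` and the `log` field are hypothetical, and no VirtualBox API is used. The only things taken from the text above are the shape of a map entry (a help string plus a handler) and the handler signature `(ctx, args)`.

```python
def echo_cmd(ctx, args):
    """Toy handler: record the arguments that followed the command name."""
    ctx["log"].append("echo: " + " ".join(args[1:]))
    return 0


# Same shape as the commands map entry shown above: name -> [help, handler].
commands = {
    "echo": ["Print the arguments back", echo_cmd],
}


def run_command(ctx, line):
    """Split the input line, look up the first word and call its handler."""
    args = line.split()
    if not args:
        return 0
    entry = commands.get(args[0])
    if entry is None:
        ctx["log"].append("unknown command: " + args[0])
        return 1
    _help_text, handler = entry
    # In this sketch args[0] is the command name itself (an assumption).
    return handler(ctx, args)


ctx = {"log": []}
run_command(ctx, "echo Win32")
print(ctx["log"])  # ['echo: Win32']
```

Adding a command then amounts to writing one function and one map entry, which is why the same handler can stay unaware of the transport underneath.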

There are also bindings for other languages shipped with the SDK, including Java and C++, but I personally find Python the easiest to start with. You can ask questions about VirtualBox language bindings here (not only Python), and I will try to help.

Thursday Aug 07, 2008

Informative paper on memory

Ulrich Drepper wrote a pretty interesting, if somewhat longish and probably too detailed, paper on memory management. I wouldn't say I recommend it to everybody, but software people who want to look cool by knowing that an SRAM cell needs 6 transistors, while a DRAM cell gets by with 1 transistor and 1 capacitor, should read it for sure.

Seriously speaking, this paper could be of interest if you want to understand what really goes on when you do MOV EAX,[ECX].

Wednesday Aug 06, 2008

Back at Sun

Now I'm back at Sun again, and I definitely love this place! This time I'm working on the VirtualBox project, OS virtualization software. The project looks very interesting, so my future postings will cover what I encounter in my journeys deep into the kernel and back :). If you have questions I'm able to answer, feel free to ask in the comments.

Update: thanks everybody who welcomed me back!

Thursday Jul 19, 2007

Explicit template instantiation in shared libraries

When explicit template instantiation saves the day.

Saturday Jul 14, 2007

Double mapping of memory regions on Unix

Mapping the same physical memory pages onto several different virtual address locations at the same time from userland.
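
As a rough illustration of the idea (not necessarily the technique the full post uses), the Python sketch below maps one temporary file twice with the standard mmap module. On Unix the default mapping is shared, so both views are backed by the same pages while living at different virtual addresses. The helper name `double_map` is made up for this example.

```python
import mmap
import tempfile


def double_map(size=mmap.PAGESIZE):
    """Create one backing file and map it twice; returns (file, view_a, view_b)."""
    backing = tempfile.TemporaryFile()
    backing.truncate(size)  # give the file one page of zeroed content
    view_a = mmap.mmap(backing.fileno(), size)  # first virtual mapping
    view_b = mmap.mmap(backing.fileno(), size)  # second mapping of the same pages
    return backing, view_a, view_b


backing, view_a, view_b = double_map()
view_a[0:5] = b"hello"   # write through the first mapping
print(view_b[0:5])       # b'hello', read through the second mapping
```

A write through one view is immediately visible through the other, with no copying in between.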

Sunday Jul 08, 2007

Hotspot internals Q&A

If you have questions on Hotspot VM internals, feel free to ask here.

ILP64, LP64, LLP64

What do LP64, LLP64 and ILP64 stand for?
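
As a quick reminder of the conventions: the names say which C types are 64 bits wide. In LP64, long and pointers are 64-bit while int stays 32-bit; in LLP64, only long long and pointers are 64-bit; in ILP64, int, long and pointers are all 64-bit. The Python sketch below (the function names are made up) classifies a set of sizes and uses ctypes to check the machine it runs on.

```python
import ctypes

# (sizeof(int), sizeof(long), sizeof(void *)) in bytes -> data model name
DATA_MODELS = {
    (4, 4, 4): "ILP32",
    (4, 8, 8): "LP64",
    (4, 4, 8): "LLP64",
    (8, 8, 8): "ILP64",
}


def classify(int_bytes, long_bytes, pointer_bytes):
    """Name the data model for the given type sizes, or 'other' if unknown."""
    return DATA_MODELS.get((int_bytes, long_bytes, pointer_bytes), "other")


def host_data_model():
    """Classify the C data model of the platform this interpreter runs on."""
    return classify(
        ctypes.sizeof(ctypes.c_int),
        ctypes.sizeof(ctypes.c_long),
        ctypes.sizeof(ctypes.c_void_p),
    )


print(host_data_model())
```

Typical 64-bit Unix systems report LP64 and 64-bit Windows reports LLP64.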

Thursday Jul 05, 2007

Raw page table access

Solaris x86 code demonstrating raw access to the CPU's page table. As usual, don't try this on sensitive machines (although this code is pretty safe).

Wednesday Jul 04, 2007

Debugger for Win32 (v2)

A mini-debugger for Win32 that can trace even statically linked binaries, not only imported symbols.

Tuesday Jul 03, 2007

Neat book

Frank Hoffman of the Solaris team wrote this book, which is a neat summary of x86/amd64 low-level programming. I like it.

Friday Jun 29, 2007


Using SPARC address space identifiers (ASIs) in application programming.

Wednesday Jun 27, 2007

VTBL games

Virtual function tables in C++: an easy target for accidental or intentional override.


