Eeek: Time isn't accurate in a VM!

I noticed that time was all over the place on my Solaris guest, and it made me wonder how to measure and quantify time on a virtualized guest. The problem is that if you use gettimeofday() in the guest as a reference, it may not be accurate either. So I used an external time reference to measure the guest, and lo and behold, time was indeed off!

On my most recent VMware test configuration, with snv as a guest under VMware on Ubuntu, Solaris time was jumping forward by several seconds, or even minutes, at random times.

The test host was Ubuntu 6.10 on a two-socket, dual-core (2x2-core) Opteron rev. E system.

Basically, the problem is that the Ubuntu host power-manages the Opteron cores, and some virtualization layers (in this case VMware) don't seem to take into account that the time stamp counters (TSCs) are not in sync across cores. When this happens, time jumps forward at random intervals, sometimes by as much as an hour. This particular problem only occurs on NUMA systems whose TSCs are not synchronized.
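To make the mechanism concrete, here is a deliberately simplified Python model; it is an illustration only, not how VMware or Solaris actually compute time. Each core's TSC ticks at the same nominal rate but starts from its own offset, and an interval is computed by differencing two TSC reads. If the thread migrates between reads to a core whose counter is ahead, the interval jumps forward; migrating back yields a negative interval. The 2.2 GHz rate, the 30 s offset, and the helper names are all assumptions chosen for illustration, and the model ignores any change in TSC rate.

```python
HZ = 2_200_000_000  # assumed nominal TSC rate (2.2 GHz); illustrative only


def read_tsc(offsets, core, real_seconds, hz=HZ):
    """Model a TSC read: a per-core starting offset plus ticks since real time zero."""
    return offsets[core] + round(real_seconds * hz)


def observed_intervals(schedule, offsets, hz=HZ):
    """Difference successive TSC reads and convert each difference to seconds.

    schedule -- list of (real_time_seconds, core) pairs, one per read
    offsets  -- dict mapping core number to that core's TSC offset in ticks
    """
    intervals = []
    prev = None
    for real_time, core in schedule:
        now = read_tsc(offsets, core, real_time, hz)
        if prev is not None:
            intervals.append((now - prev) / hz)
        prev = now
    return intervals


# Hypothetical: core 3's counter runs 30 s ahead of core 0's.
offsets = {0: 0, 3: 30 * HZ}
# One read per real second; the thread migrates core 0 -> 3 -> 0.
schedule = [(0, 0), (1, 0), (2, 3), (3, 3), (4, 0)]
print(observed_intervals(schedule, offsets))  # → [1.0, 31.0, 1.0, -29.0]
```

Every real interval is exactly one second, yet the reads spanning a migration report a 31 s jump forward and then a 29 s jump backward.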

To solve the problem, I bound my snv guest to a single core and told VMware not to adjust the TSC. Here's the description I have for the TSC settings: "The host.noTSC and ptsc.noTSC lines enable a mechanism that tries to keep the guest clock accurate even when the time stamp counter (TSC) is slow."

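# Keep the VM off host CPUs 0-2, so it runs only on CPU 3
processor0.use = FALSE
processor1.use = FALSE
processor2.use = FALSE

# Host CPU clock in kHz (2.2 GHz), then the TSC options
host.cpukHz = 2200000
host.noTSC = TRUE
ptsc.noTSC = TRUE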
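On a host with a different core count or clock speed, the same pattern can be generated rather than typed by hand. A minimal Python sketch, assuming the pattern above (disable every host CPU except one, then set the clock and TSC options in the same unquoted style); `vmx_time_settings` is a hypothetical helper, not a VMware tool:

```python
def vmx_time_settings(host_cpus, keep_cpu, cpu_khz):
    """Build .vmx lines that keep the VM on one host CPU and set the TSC options.

    host_cpus -- number of host CPUs (4 on a 2x2-core Opteron)
    keep_cpu  -- index of the one host CPU the VM may run on
    cpu_khz   -- host CPU clock in kHz (2200000 for 2.2 GHz)
    """
    if not 0 <= keep_cpu < host_cpus:
        raise ValueError("keep_cpu must be a valid host CPU index")
    # Disable every host CPU except the one we keep.
    lines = [f"processor{n}.use = FALSE" for n in range(host_cpus) if n != keep_cpu]
    lines += [f"host.cpukHz = {cpu_khz}", "host.noTSC = TRUE", "ptsc.noTSC = TRUE"]
    return "\n".join(lines)


print(vmx_time_settings(host_cpus=4, keep_cpu=3, cpu_khz=2_200_000))
```

With four host CPUs, keeping CPU 3 and a 2.2 GHz clock, this reproduces the six settings listed above.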

Here's what I did to quantify the issue: I ran a timing program under external control on two systems, an unvirtualized reference host and the virtualized snv guest. That way, I knew the real elapsed time rather than having to trust what the guest told me. Interestingly, the guest was indeed lying about its notion of wall-clock time.
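The timing program itself isn't shown here, so the following is only a minimal Python sketch of the same idea. It assumes time.time() as a stand-in for gettimeofday() and time.monotonic_ns() as a stand-in for the high-resolution timer (Solaris gethrtime()); `measure` is a hypothetical name. Each row records the whole seconds elapsed since the start by the wall clock, plus how long each sleep looked to each clock, in microseconds. On its own it still trusts the machine's clocks; the external reference comes from timing the whole run from outside the guest.

```python
import time


def measure(samples, sleep_s=1.0):
    """Sleep `samples` times, recording how long each sleep looked to two clocks.

    Returns a list of (sec, tod_us, hrtime_us) tuples:
      sec       -- whole seconds since the start, by the wall clock
      tod_us    -- wall-clock delta across the sleep, in microseconds
      hrtime_us -- monotonic-clock delta across the sleep, in microseconds
    """
    start_wall = prev_wall = time.time()  # stand-in for gettimeofday()
    prev_mono = time.monotonic_ns()       # stand-in for gethrtime()
    rows = []
    for _ in range(samples):
        time.sleep(sleep_s)
        wall, mono = time.time(), time.monotonic_ns()
        rows.append((
            int(prev_wall - start_wall),
            round((wall - prev_wall) * 1_000_000),
            (mono - prev_mono) // 1_000,
        ))
        prev_wall, prev_mono = wall, mono
    return rows
```

For example, `for row in measure(18): print(*row)` prints one Sec, tod, hrtime row per one-second sleep, while a stopwatch or an unvirtualized host times the whole run.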

Here's what 18 seconds looks like on (a) a reference machine (an old SPARC) and (b) a VMware Solaris guest; both tests ran for exactly 18 real seconds. Sec is the number of seconds since the start, according to gettimeofday(); tod is the gettimeofday() delta across each 1 s sleep, and hrtime is the same delta measured with the high-resolution timer. Both deltas are in microseconds.

Reference: 
Sec     tod   hrtime 
 0  1009354  1009476 
 1  1009801  1009817 
 2  1010484  1010505 
 3  1009315  1009333 
 4  1009905  1009923 
 5  1009909  1009930 
 6  1009905  1009924 
 7  1009911  1009945 
 8  1009842  1009869 
 9  1009897  1009918 
10  1009887  1009906 
11  1009918  1009933 
12  1009903  1009920 
13  1009913  1009931 
14  1009892  1009910 
15  1009908  1009928 
16  1009895  1009921 
17  1009900  1009929 
18  1009876  1009896 

snv Guest:
Sec        tod     hrtime
  0    1000665    1000702
  1    1007738    1007756
  2  177169798  177169813  <= Argh!
179    1011251    1011275
180    1008404    1008432
181    1009989    1010016
182    1009618    1009644
183    1009896    1009924
184    1009747    1009766
185    1000265    1000291
186    1009336    1009360
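To spot a jump like this mechanically rather than by eye, each tod delta can be compared against the per-sleep delta the reference machine reports. A minimal Python sketch, assuming a typical reference delta of 1,009,900 microseconds (read off the reference table) and an arbitrary 5% tolerance; `flag_jumps` is a hypothetical helper, not part of the original test program:

```python
# (sec, tod_us, hrtime_us) rows copied from the snv guest table above
GUEST = [
    (0, 1000665, 1000702),
    (1, 1007738, 1007756),
    (2, 177169798, 177169813),
    (179, 1011251, 1011275),
    (180, 1008404, 1008432),
    (181, 1009989, 1010016),
    (182, 1009618, 1009644),
    (183, 1009896, 1009924),
    (184, 1009747, 1009766),
    (185, 1000265, 1000291),
    (186, 1009336, 1009360),
]

# Assumed typical per-sleep delta, read off the reference table (~1.0099 s)
EXPECTED_US = 1_009_900


def flag_jumps(rows, expected_us=EXPECTED_US, tolerance=0.05):
    """Return the Sec values whose tod delta is off by more than `tolerance`."""
    return [sec for sec, tod, _ in rows
            if abs(tod - expected_us) / expected_us > tolerance]


print(flag_jumps(GUEST))  # → [2]
```

Only the Sec 2 sample is flagged; every other sleep in the guest run lands within about 1% of the reference value.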

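In the Solaris VMware guest with numanode = "1" set (which binds the guest onto a NUMA- and time-coherent set of cores), things get better, but now time runs slow: several hrtime deltas fall short of a full second, and gettimeofday() periodically jumps forward by about four seconds: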

Sec     tod   hrtime 
 0  1000122  1000139 
 1  1004468   989472 
 2  4973883   939326 
 6  1009682  1009689 
 7  1005019   991275 
 8  4975168   939355 
13  1009630  1009638 
14  1003097   989955 
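Summing each column makes the disagreement between the two clocks explicit. A small Python sketch over the rows above; `totals` is a hypothetical helper:

```python
# (sec, tod_us, hrtime_us) rows copied from the numanode = "1" table above
NUMANODE = [
    (0, 1000122, 1000139),
    (1, 1004468, 989472),
    (2, 4973883, 939326),
    (6, 1009682, 1009689),
    (7, 1005019, 991275),
    (8, 4975168, 939355),
    (13, 1009630, 1009638),
    (14, 1003097, 989955),
]


def totals(rows):
    """Sum the per-sleep deltas of each clock, in microseconds."""
    tod = sum(row[1] for row in rows)
    hrtime = sum(row[2] for row in rows)
    return tod, hrtime


tod_us, hr_us = totals(NUMANODE)
print(f"{len(NUMANODE)} sleeps: tod {tod_us / 1e6:.2f} s, hrtime {hr_us / 1e6:.2f} s")
# → 8 sleeps: tod 15.98 s, hrtime 7.87 s
```

Over the same eight sleeps, gettimeofday() accumulated roughly twice as much time as the high-resolution timer.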

With the following params set:

processor0.use = FALSE 
processor1.use = FALSE 
processor2.use = FALSE 
host.cpukHz = 2200000 
host.noTSC = TRUE 
ptsc.noTSC = TRUE 

Sec     tod   hrtime 
 0  1004787  1004911 
 1  1009783  1009802 
 2  1009914  1009935 
 3  1009894  1009913 
 4  1009895  1009913 
 5  1009900  1009918 
 6  1037644  1037680 
 7  1002091  1002117 
 8  1009910  1009929 
 9  1009897  1009920 
10  1009904  1009923 
11  1009893  1009913 
12  1009916  1009934 
13  1009876  1009894 
14  1009893  1009918 
15  1009873  1009891 
16  1009901  1009921 
17  1009883  1009911 
18  1009903  1009922 

Success!
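One way to check the result numerically is to compare the typical per-sleep tod delta of the tuned guest with the reference machine. This Python sketch uses the median so that the single slow sample at Sec 6 doesn't skew the comparison; the choice of statistic is an assumption of the sketch, not part of the original test.

```python
from statistics import median

# tod deltas (microseconds) copied from the reference table above
REFERENCE = [
    1009354, 1009801, 1010484, 1009315, 1009905, 1009909, 1009905,
    1009911, 1009842, 1009897, 1009887, 1009918, 1009903, 1009913,
    1009892, 1009908, 1009895, 1009900, 1009876,
]

# tod deltas (microseconds) copied from the tuned guest table above
TUNED = [
    1004787, 1009783, 1009914, 1009894, 1009895, 1009900, 1037644,
    1002091, 1009910, 1009897, 1009904, 1009893, 1009916, 1009876,
    1009893, 1009873, 1009901, 1009883, 1009903,
]

ref, tuned = median(REFERENCE), median(TUNED)
print(f"median per-sleep delta: reference {ref} us, tuned guest {tuned} us")
# → median per-sleep delta: reference 1009900 us, tuned guest 1009895 us
```

The medians differ by only 5 microseconds per one-second sleep.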

Comments:

Have you looked at the vmware-tools option for time syncing? After installing the tools and running vmware-toolbox I had an option: tools.syncTime = "TRUE" in my VM config file.

Posted by William Hathaway on March 28, 2007 at 09:42 PM PDT #

Hi William, indeed, syncTime is useful for adjusting small drifts, but the underlying time counters in VMware don't work well on MP Opteron systems, since the TSCs aren't in sync. Even with syncTime=true in the config, time jumps forward and then backward by large amounts, and timer-based sleeps don't work as expected. I'd like to get better out-of-the-box defaults for Solaris; I'll post back after further discussion with VMware.

Posted by Richard McDougall on March 30, 2007 at 03:10 AM PDT #
