Thursday Jul 03, 2008

Welcome to the year 2038!

An illustration of the year 2038 problem. Image taken from Wikipedia, see license there.A lot of interesting things are going to happen in the next 30 years. One of them will be a big push to fix the so called "Year 2038 Problem" on Solaris and other Unix and C-based OSes (assuming there'll be any left), which will be similar to the "Year 2000 Problem".

The Year 2038 Problem

To understand the Year 2038 Problem, check out the definition of time_t in sys/time.h:

typedef long time_t; /\* time of day in seconds \*/

To represent a date/time combination, most Unix OSes store the number of seconds since January 1st, 1970, 00:00:00 (UTC) in such a time_t variable. On 32-Bit systems, "long" is a signed integer between -2147483648 and 2147483647 (see types.h). This covers the range between December 13th, 1901, 20:45:52 (UTC) and January 19th, 2038, 03:14:07, which the fathers of C and Unix thought to be sufficient back then in the seventies.

On 64-Bit systems, time_t can be much bigger (or smaller), covering a range of several hundred thousands of years, but if you're 32-Bit in 2038 you'll be in trouble: A second after January 19th, 2038, 03:14:07 you'll travel back in time and immediately find yourself in the middle of December 13th, 1901, 20:45:52 with a major headache called "overflow".

More details about this problem can be found on its Wikipedia page.

2038 could be today...

Well, you might say, I'll most probably be retired in 2038 anyway and of course, there won't be any 32-Bit systems that far in the future, so who cares?

A customer of mine cared. They run a very big file server infrastructure, based on Solaris, ZFS and a number of Sun Fire X4500 machines. A big infrastructure like this also has a large number of clients in many variations. And some of their clients have a huge problem with time: They create files with a date after 2040.

Now, the NFS standard will happily accept dates outside the 32-Bit time_t range and so will ZFS. But any program compiled in 32-Bit mode (and there are many) will run into an overflow error as soon as it wants to handle such a file. Incidentally, most of the Solaris file utilities (you know, rm, cp, find, etc.) are still shipped in 32-Bit, so having files 30+ years in the future is a big problem if you can't administer them.

The 64-Bit solution

One simple solution is to recompile your favourite file utilities, say, from GNU coreutils in 64-Bit mode, then put them into your path and hello future! You can do this by saying something like:

CC=/opt/SUNWspro/bin/cc CFLAGS=-m64 ./configure --prefix=/opt/local; make

(Use /opt/SunStudioExpress if you're using Sun Studio Express).

Now, while trying to reproduce the problem and sending some of my own files into the future, I found out thanks to Chris and his short "what happes if I try" DTrace script, that OpenSolaris already has a way to deal with these problems: ufs and ZFS just won't accept any dates outside the 32-Bit range any more (check out lines 2416-2428 in zfs_vnops.c). Tmpfs will, so at least I could test there on my OpenSolaris 2008.05 laptop.

That's one way to deal with it, but shutting the doors doesn't help our poor disoriented client of the future. And it's also only available in OpenSolaris, not Solaris 10 (yet).

The DTrace solution

So, I followed Ulrich's helpful suggestions and Chris' example and started to hack together a DTrace script of my own that would print out who is trying to assign a date outside of 32-Bit-time_t to what file, and another one that would fix those dates so files can still be accepted and dealt with the way sysadmins expect.

The first script is called "showbigtimes" and it does just that:

constant@foeni:~/file/projects/futurefile$ pfexec ./showbigtimes_v1.1.d 
dtrace: script './showbigtimes_v1.1.d' matched 7 probes
CPU     ID                    FUNCTION:NAME
  0  18406                  futimesat:entry 
UID: 101, PID: 2826, program: touch, file: blah
  atime: 2071 Jun 23 12:00:00, mtime: 2071 Jun 23 12:00:00


constant@foeni:~/file/projects/futurefile$ /usr/bin/amd64/ls -al /tmp/blah
-rw-r--r--   1 constant staff          0 Jun 23  2071 /tmp/blah

Of course, I ran "/opt/local/bin/touch -t 207106231200 /tmp/blah" in another terminal to trigger the probe in the script above (that was a 64-Bit touch compiled from GNU coreutils).

A couple of non-obvious hoops needed to be dealt with:

  • To make the script thread-safe, all variables need to be prepended with self->.
  • There are two system calls that can change a file's date: utimes(2) and futimesat(2) (let me know if you know of more). The former is very handy, because we can steal the filename from it's second argument, but the latter also allows to just give it a file descriptor. If we want to see the file names for futimesat(2) calls, then we may need to figure them out from the file descriptor. I decided to create my own fd-to-filename table by tapping into open(2) and close(2) calls because chasing down the thread data structures or calling pfiles from within a D script would have been more tedious/ugly.
  • Depending on whether we are fired from utimes(2) or futimesat(2), the arguments we're interested in change place, i.e. the filename will come from arg0 in the case of utimes(2) or arg1 if futimesat(2) was used. To always get the right argument, we can use something like probefunc=="utimes"?argo:arg1.
  • We can't directly access arguments to system calls and manipulate them, we have to use copyin().

I hope the comments inside the script are helpful. Be sure to check out the DTrace Documentation, which was very useful to me.

The second script is called correctbigtimes.d and it not only alerts us of files being sent into the future, it automatically corrects the dates to the current date/time in order to prevent any time-travel outside the bounds of 32-Bit time_t at all:

constant@foeni:~/file/projects/futurefile$ pfexec ./correctbigtimes_v1.1.d 
dtrace: script './correctbigtimes_v1.1.d' matched 2 probes
dtrace: allowing destructive actions
CPU     ID                    FUNCTION:NAME
  0  18406                  futimesat:entry 
UID: 101, PID: 2844, program: touch, fd: 0, file: 
  atime: 2071 Jun 23 12:00:00, mtime: 2071 Jun 23 12:00:00
Corrected atime and mtime to: 2008 Jul  3 16:23:25


constant@foeni:~/file/projects/futurefile$ ls -al /tmp/blah
-rw-r--r-- 1 constant staff 0 2008-07-03 16:23 /tmp/blah

As you can see, we enabled DTrace's destructive mode (of course only for constructive purposes) which allows us to change the time values on the fly and ensure a stable time continuum.

This time, I left out the code that created the file descriptor-to-filename table, because this script may potentially be running for a long time and I didn't want to consume preciuous memory for just a convenience feature (Otherwise we'd kept an extra table of all open files for all running threads in the syste,!). If we get a filename string, we print it, otherwise a file descriptor needs to suffice, we can always look it up through pfiles(1).

The actual time modification takes place inside our local variables, which then get copied back into the system call through copyout().

I hope you liked this little excursion into the year 2038, which can happen sooner than we think for some. To me, this was a great opportunity to dig a little deeper into DTrace, a powerful tool that shows us what's going on while enabling us to fix stuff on the fly.

Update: Ulrich had some suggestions and found a bug, so I updated both scripts to version 1.2:

  • It's better to use two individual probes for utimes(2) and futimesat(2) than placing if-then structures based on probefunc. Improves readability, less error-prone, more efficient.
  • The predicates are now much simpler due to using struct timeval arrays there already.
  • Introduced constants for LONG_MIN and LONG_MAX to improve readability of the predicates.
  • The filename table doesn't account for one process opening a file in one thread, then pass the fd to another thread. Therefore, it's better to have a 2-dimensional array with fd and pid as index that is local.
  • The +8 in the predicate to fetch was incorrect, +16 or sizeof(struct timeval) would have been correct. That's fixed by using the original structures right from the beginning at predicate time.

The new versions are already linked from above or available here: showbigtimes_v1.2.d, correctbigtimes_v1.2.d.


Tune in and find out useful stuff about Sun Solaris, CPU and System Technology, Web 2.0 - and have a little fun, too!


« April 2014