Monday Nov 16, 2009

Fun With DTrace: The Windows-Key Prank

The current episode of the German HELDENFunk podcast features an interview with Chris Gerhard about one of his favourite subjects: DTrace (in English, beginning at 14:58):

After the interview, we hear a guy called "Konteener Kalle" express his love (in German) for DTrace by playing a prank on his boss: Whenever he presses the Windows key (on an OpenSolaris system, mind you), he's punished by watching the XScreensaver BSOD hack (of course not knowing that it's just a screensaver).

That little joke challenged me to actually implement this prank. Here's how to do it.

The Idea

The idea of this prank is to start the XScreensaver Blue-Screen-of-Death screensaver (which simulates a Windows crash experience) on an OpenSolaris system whenever the user presses a certain key a certain number of times. This could be the Windows-Key (which doesn't have any real use on an OpenSolaris machine) or any other key. We count the number of key presses and only execute the BSOD after a certain number of key presses in order to make the prank less obvious.

Step 1: Identify the Windows (or any other) Key

If you have a Windows-Keyboard, this is easy: Run xev and press the Windows-Key. Take note of the keycode displayed in the xev output. Of course you can use any other key as well to play this prank. In this case, I'm using the left Control-Key, because I don't have a Windows-Key on the system I'm working on. The Control key has the keycode 37.

Step 2: Configure XScreensaver for BSOD

XScreensaver comes with a great collection of "hacks" that do interesting stuff on the screen when the screensaver activates. Check out the /usr/lib/xscreensaver/hacks directory. Each hack can be run individually, but then it will only execute inside a new window. For the BSOD illusion to be realistic, we want to execute the BSOD hack in full-screen.

This can be achieved by telling XScreensaver to demo the BSOD hack for us. It will then create a full-screen window and execute the BSOD hack inside the new window. The following command will tell XScreensaver to run a hack for us:

xscreensaver-command -demo <number>

The <number> part is a little complicated: XScreensaver looks at its config file ~/.xscreensaver where it stores a list of programs and arguments after the keyword "programs:". <number> simply refers to the number of the hack on that list. Therefore, we must create an entry in our admin user's .xscreensaver file that starts bsod(6) with the right parameters and that gives us a known number to call xscreensaver-command with.

Let's put our entry at the top of the list so we can simply use the number "1" to execute the BSOD screensaver. Somewhere in our .xscreensaver, the programs section should look like this:

  textFile:       /etc/motd
  textProgram:    date

  programs:                                                                     \\
  -               "BSOD Windoze"  bsod -root -only nt         \\n\\
  -                "Qix (solid)"  qix -root -solid -segments 100              \\n\\
  -          "Qix (transparent)"  qix -root -count 4 -solid -transparent      \\n\\

You can test this by running xscreensaver-command -demo 1.

Step 3: Write a DTrace Script That Sets Up the Trap

Now it gets more interesting. How do we use DTrace to find out when a user presses a certain key? All we know is that the Xorg server processes the keystrokes for us. So let's start by watching Xorg in action. The following DTrace command will trace all function calls within Xorg:

pfexec dtrace -n pid`pgrep Xorg`:::entry'{ @func[probefunc] = count(); }'

Let's start it, press the desired key 10 times, then stop it with CTRL-C. You'll see a long list of Xorg functions, sorted by the number of times they've been called. Since we pressed the key 10 times, it's a good idea to look for functions that have been called ca. 10 times. And here, we seem to be lucky:

  miUnionO                                                          8
  DeviceFocusInEvents                                               9
  CommonAncestor                                                   10
  ComputeFreezes                                                   10
  CoreLeaveNotifies                                                10
  key_is_down                                                      11
  FreeScratchPixmapHeader                                          12
  GetScratchPixmapHeader                                           12
  LookupIDByType                                                   12
  ProcShmDispatch                                                  12
  ProcShmPutImage                                                  12

The key_is_down function looks like exactly the function we're looking for! In fact, some googling tells us that this function's 2nd argument is the keycode of the key that is down when the function is called.

Why do we see "11" and not "10" function calls to key_is_down? Because it also counted my pressing of the Ctrl-Key when I stopped the DTrace script through Ctrl-C :).

This gives us enough knowledge to create the following DTrace script:

  #!/usr/sbin/dtrace -s

   \* BSODKey.d

   \* This D script will monitor a certain key in the system. When this key is
   \* pressed, a shell script will be executed that simulates a BSOD.
   \* The script needs the process id of the Xorg server to tap into as its
   \* first argument.
   \* One example of using this script is to punish a user pressing the
   \* Windows key on an OpenSolaris system by launching the BSOD screen saver.

  #pragma D option quiet
  #pragma D option destructive

          ctrlcount = 0;

  /arg1 == keycode/
          ctrlcount ++;

  /ctrlcount == 10/
          ctrlcount = 0;
          system("/usr/bin/xscreensaver-command -demo 1");

First, we need to enable DTrace's destructive mode (ever heard of a "constructive prank"?) otherwise we can't call the system-command at the end. The script uses the pid provider to tap into Xorg. Therefore, we need to give it the PID of the Xorg server as an argument:

pfexec ./BSODKey.d `pgrep Xorg`

It then sets up a probe that fires whenever key_is_down is called with our keycode and counts the key presses. At the end of the key_is_down function call, it checks whether we reached 10 keypresses, then executes the BSOD screen saver and resets the counter. You may need to make sure that the DISPLAY variable is set correctly for the BSOD program to show up on the victim's screen when starting this script.

After hitting the Control-Key 10 times, we're rewarded with our beloved BSOD:


That wasn't too difficult, was it? Yes, one could have done the same thing by writing a regular script that taps into /dev/kbd or something similar. But the beauty of DTrace lies in the simplicity of this script (Tap into the right function while it's running) and in the fact that it now can be modified very easily to fire BSODs at any kind of event, including the user hitting a certain area of the screen with his mouse or selecting a particular text or whatever you choose it to be.

So, have fun with this script and let me know in the comments what kind of pranks (or helpful actions) you can imagine with DTrace!

Monday Jun 15, 2009

OpenSolaris meets Mac OS X in Munich

Last Wednesday, Wolfgang and I had the honor to present at "Mac Treff München", Munich's local Mac User Group. There are quite a few touching points between OpenSolaris and Mac OS X, such as ZFS, DTrace and VirtualBox, we thought it would be a good idea to contact them out of our Munich OpenSolaris User Group and talk a little bit about OpenSolaris.

Breaking the Ice

We were a little bit nervous about what would happen. Do Mac people care about the innards of a different, seemingls non-GUIsh OS? Are they just fanboys or are they open to other people's technologies? Will talking about redundancy, BFU, probes and virtualization bore them to death?

Fortunately, the 30-40 people that attended the event proved to be a very nice, open and tolerant group. They let us talk about OpenSolaris in General including some of the nitty-grittyness of the development process, before we started talking about the features that are more interesting to Mac users. We then talked about ZFS, DTrace and VirtualBox:

ZFS for Mac OS X (or not (yet)?)

Explaining the principles behind ZFS to people who are only used to draging'n'dropping icons, shooting photos or video and using computers to get work done, without having to care about what happens inside, is not easy. We concentrated on getting the basics of the tree structure, copy-on-write, check-summing and using redundancy to self-heal while using real world examples and metaphors to illustrate the principles. Here's the deal: If you have lots of important data (photos, recording, videos, anyone?) and care about it (content creators...), then you need to be concerned about data availability and integrity. ZFS solves that, it's that simple. A little animation in the slides were quite helpful in explaining that, too :).

The bad news is that ZFS seems to have vanished from all of Apple's communication about the upcoming Mac OS X Snow Leopard release. That's really bad, because many developers and end-users were looking forward to take advantage of it.

The good news is that there are still ways to take advantage of ZFS as a Mac User: Run an OpenSolaris file server for archiving your data or using it as a TimeMachine store, or even run a small OpenSolaris ZFS Server inside your Mac through VirtualBox.

DTrace: A Mac Developer/Admin's Heaven, Albeit in Jails

Next, we dove a little bit into DTrace and how it makes the OS really transparent for admins, developers and users. In addition to the dtrace(1) command, Apple created a nice GUI called "Instruments" as part of their XCode development environment that leverages the DTrace infrastructure to collect useful data about your application in realtime.

Alas, as with ZFS, there's another downer, and this time it's more subtle: While you can enjoy the power of DTrace in Mac OS X now, it's still kinda crippled, as Adam Leventhal pointed out: Processes can escape the eyes of DTrace at will, which counters the absolute observability idea of DTrace quite massively. Yes, there are valid reasons for both sides of the debate, but IMHO, legal things should be enforced using legal means, and software should be treated as software, meaning it is not a reliable way of enforcing any license contracts - with or without powerful tools such as DTrace.

OpenSolaris for all: VirtualBox

Finally, a free present to the Mac OS X community: VirtualBox. I still get emails asking me to spend 80+ dollars on some virtualization software for my Mac. There are at least two choices in that price range: VMware Workstation and Parallels. Well, the good news is that you can save your 80 bucks and use VirtualBox instead.

This may not be new to you, since as a reader of my blog you've likely heard of VirtualBox before, but it's always amazing for me to see how slowly these things spread. So, after reading this article, do your Mac friends a favour and tell them they can save precious money buy just downloading VirtualBox instead of spending money on other virtualization solutions for the Mac. It's really that simple.

Indeed, this was the part where the attendees took most of their notes, and asked a lot of questions about (ZFS being a close first in terms of discussion/questions).


After our presentations, a lot of users came up and asked questions about how to install OpenSolaris on their hardware and on VirtualBox. Some even asked where to buy professional services for installing them an OpenSolaris ZFS fileserver in their company. The capabilities of ZFS clearly struck some chords inside the Mac OS X community, which is no wonder: If you have lots of Audio/Video/Photo data and care about quality and availability, then there's no way around FS.

I used this event as an excuse to try out keynote, which worked quite well for me, especially because it helped me create some easy to understand animations about the mechanics of ZFS. I also liked the automatic guides a lot which help you position elements on your slides very easily and seem to guess very well what your layout intentions were. I'd love the OpenOffice folks to check out Keynote's guides and see if they can come up with something similar. So, here's a Keynote version of my "OpenSolaris for Mac Users" slides as well as a PDF version (both in German) for you to check out and re-use if you like.

Update: Wolfgang's introductory slides are now available for download as well and Klaus, the organizer of the event, posted a review in the Mac Treff München Blog with some pictures, too.

Monday Mar 02, 2009

The Inner Life of ZFS: Cool ZFS On-Disk Block Structure Movies

Pascal Gienger of Konstanz University published a nifty DTrace script that captures ZFS' on-disk block activity and published it on his Southbrain blog.

The cool thing: He animated the data. That's right. Using a Perl script, he draws greener or redder dots depending on whether a particular range of blocks on disk sees more reads or writes. By aggregating data over many hours while doing interesting tasks such as backup, he created a series of very cool animations.

In his first post, he shows us the inner life of a Postfix mail queue as an animated GIF:

ZFS on-disk block animation

Then, he compared the write patterns of UFS vs. ZFS using a MySQL workload to produce a cool MPEG-4 movie.

In his latest ZFS animation work, he shows us 18 hours of a mirrored file server including some backup, night rest and user action (Download MPEG-4 Movie here).

Congratulations, Pascal, this is way cool stuff. You really should upload these to YouTube so people can embed them in their blogs :).

Update: Meanwhile, pascal told me that he uploaded his videos on YouTube already. He has a full playlist full of them. Enjoy!

Thursday Jul 03, 2008

Welcome to the year 2038!

An illustration of the year 2038 problem. Image taken from Wikipedia, see license there.A lot of interesting things are going to happen in the next 30 years. One of them will be a big push to fix the so called "Year 2038 Problem" on Solaris and other Unix and C-based OSes (assuming there'll be any left), which will be similar to the "Year 2000 Problem".

The Year 2038 Problem

To understand the Year 2038 Problem, check out the definition of time_t in sys/time.h:

typedef long time_t; /\* time of day in seconds \*/

To represent a date/time combination, most Unix OSes store the number of seconds since January 1st, 1970, 00:00:00 (UTC) in such a time_t variable. On 32-Bit systems, "long" is a signed integer between -2147483648 and 2147483647 (see types.h). This covers the range between December 13th, 1901, 20:45:52 (UTC) and January 19th, 2038, 03:14:07, which the fathers of C and Unix thought to be sufficient back then in the seventies.

On 64-Bit systems, time_t can be much bigger (or smaller), covering a range of several hundred thousands of years, but if you're 32-Bit in 2038 you'll be in trouble: A second after January 19th, 2038, 03:14:07 you'll travel back in time and immediately find yourself in the middle of December 13th, 1901, 20:45:52 with a major headache called "overflow".

More details about this problem can be found on its Wikipedia page.

2038 could be today...

Well, you might say, I'll most probably be retired in 2038 anyway and of course, there won't be any 32-Bit systems that far in the future, so who cares?

A customer of mine cared. They run a very big file server infrastructure, based on Solaris, ZFS and a number of Sun Fire X4500 machines. A big infrastructure like this also has a large number of clients in many variations. And some of their clients have a huge problem with time: They create files with a date after 2040.

Now, the NFS standard will happily accept dates outside the 32-Bit time_t range and so will ZFS. But any program compiled in 32-Bit mode (and there are many) will run into an overflow error as soon as it wants to handle such a file. Incidentally, most of the Solaris file utilities (you know, rm, cp, find, etc.) are still shipped in 32-Bit, so having files 30+ years in the future is a big problem if you can't administer them.

The 64-Bit solution

One simple solution is to recompile your favourite file utilities, say, from GNU coreutils in 64-Bit mode, then put them into your path and hello future! You can do this by saying something like:

CC=/opt/SUNWspro/bin/cc CFLAGS=-m64 ./configure --prefix=/opt/local; make

(Use /opt/SunStudioExpress if you're using Sun Studio Express).

Now, while trying to reproduce the problem and sending some of my own files into the future, I found out thanks to Chris and his short "what happes if I try" DTrace script, that OpenSolaris already has a way to deal with these problems: ufs and ZFS just won't accept any dates outside the 32-Bit range any more (check out lines 2416-2428 in zfs_vnops.c). Tmpfs will, so at least I could test there on my OpenSolaris 2008.05 laptop.

That's one way to deal with it, but shutting the doors doesn't help our poor disoriented client of the future. And it's also only available in OpenSolaris, not Solaris 10 (yet).

The DTrace solution

So, I followed Ulrich's helpful suggestions and Chris' example and started to hack together a DTrace script of my own that would print out who is trying to assign a date outside of 32-Bit-time_t to what file, and another one that would fix those dates so files can still be accepted and dealt with the way sysadmins expect.

The first script is called "showbigtimes" and it does just that:

constant@foeni:~/file/projects/futurefile$ pfexec ./showbigtimes_v1.1.d 
dtrace: script './showbigtimes_v1.1.d' matched 7 probes
CPU     ID                    FUNCTION:NAME
  0  18406                  futimesat:entry 
UID: 101, PID: 2826, program: touch, file: blah
  atime: 2071 Jun 23 12:00:00, mtime: 2071 Jun 23 12:00:00


constant@foeni:~/file/projects/futurefile$ /usr/bin/amd64/ls -al /tmp/blah
-rw-r--r--   1 constant staff          0 Jun 23  2071 /tmp/blah

Of course, I ran "/opt/local/bin/touch -t 207106231200 /tmp/blah" in another terminal to trigger the probe in the script above (that was a 64-Bit touch compiled from GNU coreutils).

A couple of non-obvious hoops needed to be dealt with:

  • To make the script thread-safe, all variables need to be prepended with self->.
  • There are two system calls that can change a file's date: utimes(2) and futimesat(2) (let me know if you know of more). The former is very handy, because we can steal the filename from it's second argument, but the latter also allows to just give it a file descriptor. If we want to see the file names for futimesat(2) calls, then we may need to figure them out from the file descriptor. I decided to create my own fd-to-filename table by tapping into open(2) and close(2) calls because chasing down the thread data structures or calling pfiles from within a D script would have been more tedious/ugly.
  • Depending on whether we are fired from utimes(2) or futimesat(2), the arguments we're interested in change place, i.e. the filename will come from arg0 in the case of utimes(2) or arg1 if futimesat(2) was used. To always get the right argument, we can use something like probefunc=="utimes"?argo:arg1.
  • We can't directly access arguments to system calls and manipulate them, we have to use copyin().

I hope the comments inside the script are helpful. Be sure to check out the DTrace Documentation, which was very useful to me.

The second script is called correctbigtimes.d and it not only alerts us of files being sent into the future, it automatically corrects the dates to the current date/time in order to prevent any time-travel outside the bounds of 32-Bit time_t at all:

constant@foeni:~/file/projects/futurefile$ pfexec ./correctbigtimes_v1.1.d 
dtrace: script './correctbigtimes_v1.1.d' matched 2 probes
dtrace: allowing destructive actions
CPU     ID                    FUNCTION:NAME
  0  18406                  futimesat:entry 
UID: 101, PID: 2844, program: touch, fd: 0, file: 
  atime: 2071 Jun 23 12:00:00, mtime: 2071 Jun 23 12:00:00
Corrected atime and mtime to: 2008 Jul  3 16:23:25


constant@foeni:~/file/projects/futurefile$ ls -al /tmp/blah
-rw-r--r-- 1 constant staff 0 2008-07-03 16:23 /tmp/blah

As you can see, we enabled DTrace's destructive mode (of course only for constructive purposes) which allows us to change the time values on the fly and ensure a stable time continuum.

This time, I left out the code that created the file descriptor-to-filename table, because this script may potentially be running for a long time and I didn't want to consume preciuous memory for just a convenience feature (Otherwise we'd kept an extra table of all open files for all running threads in the syste,!). If we get a filename string, we print it, otherwise a file descriptor needs to suffice, we can always look it up through pfiles(1).

The actual time modification takes place inside our local variables, which then get copied back into the system call through copyout().

I hope you liked this little excursion into the year 2038, which can happen sooner than we think for some. To me, this was a great opportunity to dig a little deeper into DTrace, a powerful tool that shows us what's going on while enabling us to fix stuff on the fly.

Update: Ulrich had some suggestions and found a bug, so I updated both scripts to version 1.2:

  • It's better to use two individual probes for utimes(2) and futimesat(2) than placing if-then structures based on probefunc. Improves readability, less error-prone, more efficient.
  • The predicates are now much simpler due to using struct timeval arrays there already.
  • Introduced constants for LONG_MIN and LONG_MAX to improve readability of the predicates.
  • The filename table doesn't account for one process opening a file in one thread, then pass the fd to another thread. Therefore, it's better to have a 2-dimensional array with fd and pid as index that is local.
  • The +8 in the predicate to fetch was incorrect, +16 or sizeof(struct timeval) would have been correct. That's fixed by using the original structures right from the beginning at predicate time.

The new versions are already linked from above or available here: showbigtimes_v1.2.d, correctbigtimes_v1.2.d.


Tune in and find out useful stuff about Sun Solaris, CPU and System Technology, Web 2.0 - and have a little fun, too!


« July 2016