Door API details

I keep answering this question (or variations) in email, so I thought it might have wider interest. Plus this way I can point to the blog entry rather than repeating myself endlessly. One of the things I've worked on in the past is Solaris Doors. Doors are an inter-process communication mechanism with an RPC-like client/server interface. They differ from "standard" RPC by being (a) fast, (b) relatively simple, and (c) restricted to a single system. In addition, there are some features (particularly the ability to pass door references, and the unreferenced notification) that lend themselves well to implementing complicated distributed system semantics (in fact, the Sun Cluster 3.x product uses a CORBA-style ORB for inter-process and inter-node communication, part of which is implemented using doors). Doors are used fairly extensively within Solaris daemons and other system-level software that is shipped as part of the OS.

A door is created when a process (known as the door server) calls door_create(3DOOR) with a server function and gets a file descriptor back. That descriptor then can be passed to other processes or attached to the file system using fattach(3C). Once another process (the door client) has the descriptor, it can "invoke" the door by calling door_call(3DOOR). The client can also pass data and descriptors (including other door descriptors). As a result of the call to door_call, the client thread blocks and a thread in the door server wakes up and starts running the server function. When the server function is complete, it calls door_return(3DOOR) to pass (optional) data and descriptors back to the client. door_return also switches control back to the client; the server thread blocks in the kernel and never returns from the door_return call.
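As a rough illustration of that flow, here is a minimal sketch of a server that squares a long and a client that calls it. The names DOOR_PATH, square_server, serve, call_square and square are made up for this example, and the path is just an assumption. The door-specific part is guarded by `__sun` because the door calls only exist on Solaris and its derivatives; depending on the release, linking may also need -ldoor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Portable part of the example: the work the server does per call. */
static long
square(long x)
{
    return (x * x);
}

#if defined(__sun)
#include <door.h>
#include <fcntl.h>
#include <stropts.h>    /* fattach() */
#include <unistd.h>

#define DOOR_PATH "/tmp/square_door"    /* hypothetical rendezvous path */

/* Server function: runs on a server thread for every door_call. */
static void
square_server(void *cookie, char *argp, size_t arg_size,
    door_desc_t *dp, uint_t n_desc)
{
    long value;

    (void) cookie; (void) dp; (void) n_desc;
    if (argp == NULL || arg_size != sizeof (value)) {
        (void) door_return(NULL, 0, NULL, 0);
        return;    /* only reached if door_return itself failed */
    }
    (void) memcpy(&value, argp, sizeof (value));
    value = square(value);
    /* The reply lives on the stack, so there is nothing to free. */
    (void) door_return((char *)&value, sizeof (value), NULL, 0);
}

/* Server side: create the door and attach it to the file system. */
static int
serve(void)
{
    int fd, did = door_create(square_server, NULL, 0);

    if (did < 0)
        return (-1);
    (void) unlink(DOOR_PATH);
    /* fattach() needs an existing file to attach the door to. */
    if ((fd = open(DOOR_PATH, O_CREAT | O_RDWR, 0644)) < 0)
        return (-1);
    (void) close(fd);
    if (fattach(did, DOOR_PATH) < 0)
        return (-1);
    for (;;)
        (void) pause();    /* server threads handle the calls */
}

/* Client side: open the door and invoke it with one long. */
static int
call_square(long x, long *out)
{
    door_arg_t arg;
    long result = 0;
    int rc, fd = open(DOOR_PATH, O_RDWR);

    if (fd < 0)
        return (-1);
    arg.data_ptr = (char *)&x;
    arg.data_size = sizeof (x);
    arg.desc_ptr = NULL;
    arg.desc_num = 0;
    arg.rbuf = (char *)&result;
    arg.rsize = sizeof (result);
    rc = door_call(fd, &arg);    /* blocks until the server replies */
    (void) close(fd);
    if (rc != 0 || arg.data_size != sizeof (*out))
        return (-1);
    /* On return, data_ptr and data_size describe the reply. */
    (void) memcpy(out, arg.data_ptr, sizeof (*out));
    return (0);
}
#endif /* __sun */
```

The server process would call serve() from its main(), and any other process on the same system could then call call_square(). This server sidesteps the memory question entirely because its reply is a local variable.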

This leads to a problem: if I allocate data to return to the client via door_return, how do I free it? I can't free it before calling door_return, obviously, and control never returns to me after calling door_return (unless there's an error), so I can't just free it after the call. There are a few ways to handle this (in increasing order of complexity):

  • Copy the data to the stack. On each door call, the server thread's stack is "rewound" to the base, which implicitly frees anything on it. So data that needs to be returned to the client can first be copied onto the stack (using a local variable or alloca(3C)); the original allocation is then freed before calling door_return with a pointer to the stack copy.

  • Use thread-specific data. When a server thread starts running the server function, we know that any data previously returned by a call to door_return from the same thread has already been copied into the client's address space. This means you can use thread-specific data to track previously returned data; for example, the server function could check for and free any per-thread data stored by prior door calls before continuing to execute. Note that, if the server thread is never re-used, the data will stay allocated. Other threads can't free this data since there's no way to make sure the data has been copied back to the client.

  • Use a door reference. When returning data that needs to be freed, create a door with the DOOR_UNREF flag and associate the door's unique ID (see door_info(3DOOR)) with the data in a hash table. Then, pass the door back to the client. The client should call close(2) on the door as soon as it receives it; this will send an unreferenced notification to the server (as long as the server still has one reference, since the notification happens when the reference count goes from 2 to 1). An unreferenced notification is just like a normal door call from the server's point of view, except that the data pointer is set to a special value (see the door_create(3DOOR) man page for details). When the unreferenced notification happens, the server can look up the unique ID in the hash table and free the referenced data.
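The thread-specific data approach from the list above might look roughly like the following sketch. It uses a C11 `_Thread_local` variable as the per-thread slot rather than the pthread_setspecific(3C) interface, which is a substitution made here for brevity. The names alloc_reply, handle_request, prev_reply and SEND_REPLY are invented for the example, and outside Solaris the macro is only a stand-in that records the reply instead of calling door_return.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#if defined(__sun)
#include <door.h>
/* Real call: on success it never returns to the caller. */
#define SEND_REPLY(buf, len) ((void) door_return((buf), (len), NULL, 0))
#else
/* Hypothetical stand-in so the sketch builds elsewhere: records the reply. */
static const char *last_reply;
static size_t last_reply_len;
#define SEND_REPLY(buf, len) \
    ((void) (last_reply = (buf), last_reply_len = (len)))
#endif

/* Buffer this thread passed to its previous door_return, if any. */
static _Thread_local char *prev_reply;

/*
 * Allocate a reply buffer. Being back in the server function means the
 * previous reply from this thread has already been copied to the client,
 * so that buffer is freed first.
 */
static char *
alloc_reply(size_t len)
{
    free(prev_reply);    /* free(NULL) is a no-op */
    prev_reply = malloc(len);
    return (prev_reply);
}

/* Body of a door server function: replies with an upper-cased copy. */
static void
handle_request(const char *argp, size_t arg_size)
{
    char *reply;
    size_t i;

    assert(argp != NULL || arg_size == 0);
    reply = alloc_reply(arg_size);
    if (reply == NULL) {
        SEND_REPLY(NULL, 0);
        return;
    }
    for (i = 0; i < arg_size; i++)
        reply[i] = (char)toupper((unsigned char)argp[i]);
    SEND_REPLY(reply, arg_size);
}
```

A real server function would call handle_request(argp, arg_size) as its last step. As noted in the list, the final buffer handed out by each thread stays allocated until that thread runs the server function again.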

I've also considered extending the doors API to include something like a door_reply() function, which could be used (optionally) to specify reply data without losing execution control. On return from door_reply, the reply data will have been copied back to the client (or into the kernel), and the server can free the data from its address space. The control transfer back to the client would happen with a subsequent door_return() call (the arguments of which would be ignored). This is a bit slower than the standard door_return semantics (since two trips into the kernel are required), but makes freeing reply data and other server-side cleanup much simpler. Unfortunately, I haven't had time to actually implement this, or convince someone else to do it.

For those wishing more information on doors (particularly if the above didn't make any sense), there's a good introductory chapter in the second edition of Unix Network Programming, Volume 2: Interprocess Communication by the late Richard Stevens. The original idea for doors came from Spring OS, a research operating system developed in Sun Labs. The details were changed significantly in the transition to Solaris. There is also a Linux implementation based on the Solaris API, though it isn't part of the standard kernel.

Comments:

So, how do we debug doors?? If it's over the network, we can use snoop/Ethereal to see what traffic is transferred to the server and what is transferred back. Is there a similar feature for doors?? (truss(1) is not that useful either in this case...) Second: I know of nscd(1M) and ldap_cachemgr(1M) using the doors concept in Solaris. What other applications are using doors? Just curious... thanks, Anonymous (for the time being :-))

Posted by Anonymous on August 04, 2004 at 09:50 AM PDT #

Up until Solaris 10, there hasn't been a good way to monitor data being passed using door calls without adding debugging code to the client or server. With Solaris 10, though, you can use DTrace - it's fairly easy to write a script that dumps out the data being passed around. You'll obviously get better results if you know (or can guess) the structure of the data, though. See http://www.sun.com/bigadmin/content/dtrace/ for details on DTrace.

Other internal Solaris apps using doors: kcfd (a new crypto framework daemon in 10), IKE (the IPsec key management app), the DHCP server, devfsadmd, syslogd, picld, syseventd, and (again in 10) zoneadmd, which manages the zone/container lifecycle. There are more; this is just off the top of my head.

Posted by Andy Tucker on August 04, 2004 at 10:55 AM PDT #

Thanks for the info and I appreciate the fast response. And that's a long list of apps using doors in S10, thanks once again.

Posted by Anonymous on August 04, 2004 at 01:37 PM PDT #

It's really nice that Solaris 10 has DTrace. Previously, in Solaris 8 and 9, we had a very hard time trying to make peace between door server threads and TNF probes. It took around a year to get the problem fixed. And then I was teased by our Sun support engineer about the forthcoming tracing mechanism in Solaris 10. The software we developed relies heavily on both doors and TNF. Also, there is a small issue with door_return() when you are writing C++ code. You cannot use automatic objects in a door server function: since the end of the function is never reached, their destructors won't be called. If your classes require proper destructors, you may need a workaround. The solution I ended up with was an overloaded door_return() which copies the parameters aside and returns, thus letting the function reach its end. Then the real door_return is called from the static wrapper function.

Posted by Cyril Plisko on August 05, 2004 at 02:06 AM PDT #
