Saturday Oct 03, 2009

New in 124: physical eject button

OpenSolaris build 124 is out and includes a new feature: now you can use the eject button on your CD/DVD drive to eject the disc, even if it is mounted. To quote the heads-up message:

Pressing Eject button on CD/DVD drive's front panel will have the same effect as typing 'eject' in the terminal or clicking the corresponding GUI icon in the File Browser. Most modern drives support this feature, though some older ones don't.

You may see a harmless pop-up message "Unable to unmount volume" due to GNOME bug 9805. It can be safely ignored, but if it bothers you, the workaround is:

svccfg -s rmvolmgr setprop rmvolmgr/eject_button=boolean: false
svcadm restart rmvolmgr

The side effect of the workaround is that the physical eject button works only while a GNOME session is running, not when you are logged in on the text console.

Aside from the issues listed in the build notes, be aware of the "Console User" problems. If you are curious about how the eject button works in OpenSolaris, read ARC case 2009/058.

It is quite remarkable that Solaris managed to go without this capability for so long: other OSes have had it for years. I should have fixed it as part of the Tamarack project, but had to sacrifice some features to meet the schedule. I then switched to networking projects, and the feeling of unfinished business has been nagging me ever since.

Walking around with a solution to a problem in your head is, I imagine, not unlike being pregnant: you cannot carry it inside forever. I knew how, I had the code visualized mentally. It was a great relief when one evening I finally initiated the mind->computer transfer. The hard part was to properly test the code: too much weird or buggy hardware out there. All of the PCs, x86 laptops and recent Sun products worked just fine. I did encounter some old Sun gear that did not support the eject button - for those, the eject(1M) command continues to be the only option.

I'd like to thank Neal, Phi and Larry for their help with the last mile effort.

Monday Jul 14, 2008

New in snv_93: dladm link property persistence

I once joined a pickup basketball game and one of my teammates only noticed me after three or four possessions. Another teammate tried to be supportive and used the word "stealth" to characterize my role in the game (my own choice was "zero impact"). Story of my life.

But sometimes that's a good thing. A lot of OS improvements are like that: incremental in nature, not visible to most users, but contributing to the overall movement towards a better future. Not quite shouting-from-the-rooftops material, but noteworthy to geeks and search engines.

Such is my recent putback, a followup to the earlier Brussels framework putback. Originally, properties set via dladm set-linkprop were not guaranteed to persist when a link was unplumbed and later replumbed. One would have to reapply the properties using the undocumented init-linkprop subcommand:

ifconfig link0 plumb
dladm init-linkprop

My putback makes it automatic. No need to reapply the properties manually anymore.

What happens under the hood is a bit more complicated. dladm shows you all links the system knows about:

# dladm show-link
bge0        phys     1500   up       --
iprb0       phys     1500   unknown  --
iprb1       phys     1500   unknown  --

This information, along with link properties, is stored in /etc/dladm/datalink.conf. The kernel, however, only needs to know about links that are currently in use. What the kernel knows can be extracted from mdb:

# mdb -k
> *i_dls_devnet_id_hash::modhash -e | ::modent -v | ::print dls_devnet_t dd_spa
dd_spa = [ "bge0/0" ]

Generally, a link becomes in use when its corresponding device file under /dev/net is first opened. That's what happens, for instance, when you plumb the link. This is the moment the kernel learns about the link. This is also the moment a new thread is launched, which communicates with the datalink management daemon, dlmgmtd(1M). It requests the daemon to perform the equivalent of dladm init-linkprop for this new link. The daemon extracts property data from datalink.conf and pushes it into the kernel.
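The first-open sequence can be mimicked in a few lines of userland C. This is only a toy model: the static table stands in for datalink.conf, the property names and values are made up for illustration, and the "upcall" is a plain function call rather than a kernel thread talking to dlmgmtd. All function and type names here are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef struct {
	const char *link;	/* link name, e.g. "bge0" */
	const char *name;	/* property name */
	const char *value;	/* property value */
} linkprop_t;

/* Hypothetical stand-in for the contents of /etc/dladm/datalink.conf. */
static const linkprop_t persistent_store[] = {
	{ "bge0", "mtu", "9000" },
	{ "bge0", "flowctrl", "no" },
	{ "iprb0", "mtu", "1500" }
};
#define	NPERSIST (sizeof (persistent_store) / sizeof (persistent_store[0]))

#define	MAX_ENTRIES	8
static linkprop_t kernel_props[MAX_ENTRIES];	/* what the "kernel" knows */
static size_t kernel_nprops;
static const char *known_links[MAX_ENTRIES];
static size_t nknown;

/* "Daemon" side: push every stored property of one link into the kernel. */
static int
daemon_init_linkprop(const char *link)
{
	int pushed = 0;
	size_t i;

	for (i = 0; i < NPERSIST; i++) {
		if (strcmp(persistent_store[i].link, link) != 0)
			continue;
		assert(kernel_nprops < MAX_ENTRIES);
		kernel_props[kernel_nprops++] = persistent_store[i];
		pushed++;
	}
	return (pushed);
}

/*
 * "Kernel" side: the first open of a link makes it known and triggers
 * the upcall. Returns the number of properties applied by this open.
 */
static int
link_open(const char *link)
{
	size_t i;

	for (i = 0; i < nknown; i++) {
		if (strcmp(known_links[i], link) == 0)
			return (0);	/* already in use, nothing to do */
	}
	assert(nknown < MAX_ENTRIES);
	known_links[nknown++] = link;
	return (daemon_init_linkprop(link));
}
```

Opening "bge0" the first time applies its two stored properties; opening it again applies nothing, because the link is already known.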

There are still two cases when persistence is not guaranteed: wifi links and private properties. The following CRs have been filed and are being worked on:

6691666 link property persistence for wifi drivers
6688428 dladm init-linkprop ignores private properties

Monday Mar 24, 2008

Extended Self-ID for USB devices

Most USB devices already have some sort of non-volatile memory that contains firmware and some self-identification data, presented to the host OS via USB descriptors - the kind of stuff you'd see as device properties in 'prtconf -v' output. This non-volatile memory is often in the form of tiny EEPROM arrays connected to the internal microcontroller via I2C, or something similar. Now that USB flash memory is becoming dirt cheap, why not go further and put a decent amount of it in every USB device, as an additional logical device ("interface" in USB speak, or "logical unit")?

It would be cool, for instance, if instead of lame, wasteful "installation CDs", devices could carry their own drivers (though it would be even cooler if all devices were class compliant). Just connect your shiny new webcam or a phone and the OS can install drivers right away. Or at least pick up a URI of a Web Service for finding the drivers. It's just a matter of standardizing on file formats and Web APIs. Kinda like a more generic version of U3.

Wednesday Oct 03, 2007

Kicking it Brussels-style

After 7 years of pretending to know something about I/O, I decided to see if I could pull the same trick with networking. The first project I chose to sabotage is Brussels. The project's mission can be described with a thousand words, but, the creative type that I am, I drew you a cool picture:

The first bit of code I've just contributed (here's the webrev) is mainly for the MAC services module. I added per-link property handles, which the network drivers can use like so:

    err = mac_prop_init("driver", instance, &handle);
    val = mac_prop_get_uint64(handle, "property");

For each plumbed link, MAC keeps a list of properties that ever entered the kernel. Pointers to these lists are stored in a hash table, using link name as a key. I also added MDB support for these data structures. The mac_proplist walker walks the hash table entries, and the mac_prop walker walks the property list, so you can say stuff like:

> ::walk mac_proplist | ::walk mac_prop | ::print mac_prop_t
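For those who prefer C to prose, here is a rough userland sketch of the arrangement described above: a hash table keyed by link name, where each entry heads a list of properties. It is not the MAC module code; the type names, function names, bucket count and hash function are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	NBUCKETS	16

typedef struct prop {
	char		name[32];
	uint64_t	val;
	struct prop	*next;
} prop_t;

typedef struct proplist {
	char		link[32];	/* hash key, e.g. "bge0" */
	prop_t		*props;		/* properties seen for this link */
	struct proplist	*next;		/* bucket chain */
} proplist_t;

static proplist_t *buckets[NBUCKETS];

static unsigned
hash_link(const char *link)
{
	unsigned h = 0;

	while (*link != '\0')
		h = h * 31 + (unsigned char)*link++;
	return (h % NBUCKETS);
}

/* Find the property list for a link, optionally creating it. */
static proplist_t *
proplist_lookup(const char *link, int create)
{
	unsigned	h = hash_link(link);
	proplist_t	*pl;

	for (pl = buckets[h]; pl != NULL; pl = pl->next) {
		if (strcmp(pl->link, link) == 0)
			return (pl);
	}
	if (!create)
		return (NULL);
	pl = calloc(1, sizeof (*pl));
	assert(pl != NULL);
	strncpy(pl->link, link, sizeof (pl->link) - 1);
	pl->next = buckets[h];
	buckets[h] = pl;
	return (pl);
}

/* Set (or overwrite) a property on a link. */
static void
prop_set(const char *link, const char *name, uint64_t val)
{
	proplist_t	*pl = proplist_lookup(link, 1);
	prop_t		*p;

	for (p = pl->props; p != NULL; p = p->next) {
		if (strcmp(p->name, name) == 0) {
			p->val = val;
			return;
		}
	}
	p = calloc(1, sizeof (*p));
	assert(p != NULL);
	strncpy(p->name, name, sizeof (p->name) - 1);
	p->val = val;
	p->next = pl->props;
	pl->props = p;
}

/* Look up a property; returns 0 on success, -1 if not found. */
static int
prop_get(const char *link, const char *name, uint64_t *valp)
{
	proplist_t	*pl = proplist_lookup(link, 0);
	prop_t		*p;

	if (pl == NULL)
		return (-1);
	for (p = pl->props; p != NULL; p = p->next) {
		if (strcmp(p->name, name) == 0) {
			*valp = p->val;
			return (0);
		}
	}
	return (-1);
}
```

Walking the buckets and then each list is the userland analogue of piping one walker into the other.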

Even more convenient is the new ::mac_prop dcmd (which internally uses the above walkers):

> ::help mac_prop

   mac_prop - display MAC properties of a link or all links

   ::mac_prop [link]


   Target: kvm
   Module: mac
   Interface Stability: Unstable

 > ::mac_prop
            ADDR LINK             PROPERTY         SIZE VALUE
fffffffec7d80480 bge0             default_mtu      8    1500
fffffffec7d80a80 bge0             adv_autoneg_cap  1    1

There wasn't an existing MDB module for MAC, so I created it too. Here's a good opportunity for other contributors to the MAC layer, hint hint, to add more MAC data structures to MDB.

Wednesday Jul 18, 2007

Bug Clicker 0.1

Firefox+Thunderbird extension to open bug pages and ARC cases by clicking on them.

Friday Feb 03, 2006

My little Solaris security cheat sheet

This returned me to sanity a few times while learning about Solaris security. Like many others, I'm not a security expert and I often need a short version to fit in my head.

authorization: A right assigned to users that is checked by privileged programs to determine whether users can execute restricted functionality. More in auth_attr(4).

privilege: An attribute that provides fine-grained control over the actions of processes, as opposed to the traditional UNIX all-or-nothing, super-user vs. user model. More in privileges(5).

profile: A logical grouping of authorizations and commands. Profile shells, pf[ck]sh, interpret profiles to form a secure execution environment. More in prof_attr(4), exec_attr(4).

role: A type of user account, with associated authorizations and profiles. Roles cannot be logged into directly - users assume roles using su(1M).

how to get | CLI | API

Authorizations | Privileges
Per-user: all user processes have the same authorizations. | Per-process: each process has separate privilege sets.
Static: once assigned to a user, remains the same. | Dynamic: privilege sets can change during the process lifecycle.
A simple token. In theory can be easily added to other OSes. | Integrated deep into Solaris.
Userland | Userland and kernel.
Introduced in Solaris 8 [1] | Introduced in Solaris 10 [1]

[1] Was also available much earlier in Trusted Solaris.


Sunday Dec 18, 2005

USB serial drivers, Part 4

[Parts 1, 2, 3]

Here are a few DSD coding tips (all code herein is under CDDL).

Naming convention

Module names for STREAMS drivers are limited to 8 characters. By convention a USB serial driver name should start with "usbs", which leaves only 4 characters to identify device vendor. For example, our Keyspan driver will be called "usbsksp". (The Edgeport driver is called "usbser_edge", but that's a bug).

I will be using "usbsxx" for driver name in this blog.

Common driver code

Most Solaris drivers share the same structure: define cb_ops, dev_ops, modldrv and modlinkage structures; STREAMS drivers additionally define module_info and streamtab; then define the soft state opaque pointer; and finally the entry point functions. So do USB serial drivers except they also need to define ds_ops. STREAMS structures look like this:

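struct module_info usbsxx_modinfo = {
	0,			/* module id */
	"usbsxx",		/* module name */
	USBSER_MIN_PKTSZ,	/* min pkt size */
	USBSER_MAX_PKTSZ,	/* max pkt size */
	USBSER_HIWAT,		/* hi watermark */
	USBSER_LOWAT		/* low watermark */
};

/* read side: open is ours, the rest comes from GSD */
static struct qinit usbsxx_rinit = {
	putq, usbser_rsrv, usbsxx_open, usbser_close, NULL,
	&usbsxx_modinfo, NULL
};

/* write side: put and service routines come from GSD */
static struct qinit usbsxx_winit = {
	usbser_wput, usbser_wsrv, NULL, NULL, NULL,
	&usbsxx_modinfo, NULL
};

struct streamtab usbsxx_str_info = {
	&usbsxx_rinit, &usbsxx_winit, NULL, NULL
};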

The other driver structures are no different from those of any other driver. Don't forget to put the streamtab address in cb_ops.

_init(9E) is special in that it should use usbser_soft_state_size() for soft state allocation:

static void	*usbsxx_statep;	/* opaque state pointer */

int
_init(void)
{
	int    error;

	if ((error = mod_install(&modlinkage)) == 0) {
		error = ddi_soft_state_init(&usbsxx_statep,
		    usbser_soft_state_size(), 1);
	}

	return (error);
}

GSD provides standard implementations of driver entry points, see for instance the qinit structures above. Functions that require the opaque state pointer need to be called explicitly:

static int
usbsxx_getinfo(dev_info_t *dip, ddi_info_cmd_t infocmd, void *arg,
		void **result)
{
	return (usbser_getinfo(dip, infocmd, arg, result,
	    usbsxx_statep));
}

static int
usbsxx_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
	return (usbser_attach(dip, cmd, usbsxx_statep, &ds_ops));
}

static int
usbsxx_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
	return (usbser_detach(dip, cmd, usbsxx_statep));
}

static int
usbsxx_open(queue_t *rq, dev_t *dev, int flag, int sflag, cred_t *cr)
{
	return (usbser_open(rq, dev, flag, sflag, cr, usbsxx_statep));
}


ds_attach() should allocate and initialize the soft state, configure the device, figure out number of ports and register itself with the USB framework.

static int
usbsxx_ds_attach(ds_attach_info_t *aip)
{
	usbsxx_state_t	*xxp;

	xxp = (usbsxx_state_t *)kmem_zalloc(sizeof (usbsxx_state_t), KM_SLEEP);
	xxp->xx_dip = aip->ai_dip;
	xxp->xx_usb_events = aip->ai_usb_events;
	*aip->ai_hdl = (ds_hdl_t)xxp;

	if (usb_client_attach(xxp->xx_dip, USBDRV_VERSION, 0) != USB_SUCCESS) {
		usbsxx_cleanup(xxp, 1);
		return (USB_FAILURE);
	}

	if (usb_get_dev_data(xxp->xx_dip, &xxp->xx_dev_data, USB_PARSE_LVL_IF,
	    0) != USB_SUCCESS) {
		usbsxx_cleanup(xxp, 2);
		return (USB_FAILURE);
	}

	xxp->xx_def_pipe_handle = xxp->xx_dev_data->dev_default_ph;
	mutex_init(&xxp->xx_mutex, NULL, MUTEX_DRIVER,
	    xxp->xx_dev_data->dev_iblock_cookie);

	/* ... device specific code ... */

	xxp->xx_dev_state = USB_DEV_ONLINE;

	/* ... register USB events .. */

	*aip->ai_port_cnt = 1;

	return (USB_SUCCESS);
}

Open and close

Typically ds_open_port() notifies the device of a port open with a special command, does per-port state initialization, opens USB pipes and kicks off data receipt (by submitting a Bulk In request or starting interrupt endpoint polling).

ds_close_port() should dismiss any leftover data characters (GSD is expected to drain and flush before closing, but we want to be on the safe side) and in general reverse what ds_open_port() has done.

static int
usbsxx_open_port(ds_hdl_t hdl, uint_t port_num)
{
	usbsxx_state_t	*xxp = (usbsxx_state_t *)hdl;
	usbsxx_port_t	*pp = &xxp->xx_ports[port_num];

	if (usbsxx_open_pipes(pp) != USB_SUCCESS) {
		return (USB_FAILURE);
	}
	if (usbsxx_send_cmd(pp, USBSXX_HW_OPEN) != USB_SUCCESS) {
		return (USB_FAILURE);
	}
	pp->port_state = USBSXX_PORT_OPEN;
	usbsxx_rx_start(pp); /* start data receipt */

	return (USB_SUCCESS);
}

static int
usbsxx_close_port(ds_hdl_t hdl, uint_t port_num)
{
	usbsxx_state_t	*xxp = (usbsxx_state_t *)hdl;
	usbsxx_port_t	*pp = &xxp->xx_ports[port_num];

	(void) usbsxx_fifo_flush(hdl, port_num, DS_TX | DS_RX);
	(void) usbsxx_send_cmd(pp, USBSXX_HW_CLOSE);
	pp->port_state = USBSXX_PORT_CLOSED;

	return (USB_SUCCESS);
}

Here is an example of a synchronous command request using Control pipe:

static int
usbsxx_send_cmd(usbsxx_port_t *pp, uint16_t value, int16_t index)
{
	usb_ctrl_setup_t setup = { USBSXX_HW_WRITE_REQ_TYPE,
	    /* ... device-specific bRequest; remaining fields zeroed ... */ };
	usb_cb_flags_t	cb_flags;
	usb_cr_t	cr;

	setup.wValue = value;
	setup.wIndex = index;

	return (usb_pipe_ctrl_xfer_wait(pp->ctrl_ph, &setup, NULL,
	    &cr, &cb_flags, 0));
}

Port parameters

Port parameters need to be parsed and turned into device-specific actions. Baud rate may require additional conversion of the baud constants (B9600, etc) into device-specific values, like absolute rate values or UART divisors.

/* zero means unsupported rate */
static int usbsxx_speedtab[] = {
	0,	/* B0 */
	0,	/* B50 */
	75,	/* B75 */
	0,	/* B110 */
	0,	/* B134 */
	150,	/* B150 */
	0,	/* B200 */
	300,	/* B300 */
	600,	/* B600 */
	1200,	/* B1200 */
	1800,	/* B1800 */
	2400,	/* B2400 */
	4800,	/* B4800 */
	9600,	/* B9600 */
	19200,	/* B19200 */
	38400,	/* B38400 */
	57600,	/* B57600 */
	0,	/* B76800 */
	115200,	/* B115200 */
	0,	/* B153600 */
	230400	/* B230400 */
};

#define	NELEM(a)	(sizeof (a) / sizeof (*(a)))

static int
usbsxx_set_port_params(ds_hdl_t hdl, uint_t port_num, ds_port_params_t *tp)
{
	usbsxx_state_t	*xxp = (usbsxx_state_t *)hdl;
	usbsxx_port_t	*pp = &xxp->xx_ports[port_num];
	int		i;
	uint_t		ui;
	ds_port_param_entry_t *pe;

	pe = tp->tp_entries;
	for (i = 0; i < tp->tp_cnt; i++, pe++) {
		switch (pe->param) {
		case DS_PARAM_BAUD:
			ui = pe->val.ui;

			/* for unsupported speeds return failure */
			if ((ui >= NELEM(usbsxx_speedtab)) ||
			    ((ui > 0) && (usbsxx_speedtab[ui] == 0))) {
				return (USB_FAILURE);
			}

			/* set baud rate */
			break;

		case DS_PARAM_PARITY:
			if (pe->val.ui & PARENB) {
				if (pe->val.ui & PARODD) {
					/* set odd parity */
				} else {
					/* set even parity */
				}
			} else {
				/* disable parity */
			}
			break;

		case DS_PARAM_STOPB:
			if (pe->val.ui & CSTOPB) {
				/* set 2 stop bits */
			} else {
				/* set 1 stop bit */
			}
			break;

		case DS_PARAM_CHARSZ:
			switch (pe->val.ui) {
			case CS5:
				/* set 5 bits */
				break;
			case CS6:
				/* set 6 bits */
				break;
			case CS7:
				/* set 7 bits */
				break;
			case CS8:
				/* set 8 bits */
				break;
			}
			break;

		case DS_PARAM_XON_XOFF:
			if (pe->val.ui & IXON || pe->val.ui & IXOFF) {
				uint8_t	xon_char, xoff_char;

				xon_char = pe->val.uc[0];
				xoff_char = pe->val.uc[1];

				/* set XON/XOFF chars */
			}
			break;

		case DS_PARAM_FLOW_CTL:
			if (pe->val.ui & CTSXON) {
				/* enable hardware flow control */
			}
			break;
		}
	}

	return (USB_SUCCESS);
}
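As an illustration of the "UART divisors" kind of conversion mentioned earlier, here is a small standalone sketch. It assumes a made-up device with a 1.8432 MHz UART clock and 16x oversampling, so the divisor is clock / (16 * rate); real devices will differ, and the function name and clock constant are hypothetical. The table mirrors the one above, indexed by the Solaris baud constants.

```c
#include <assert.h>
#include <stdint.h>

#define	UART_CLOCK_HZ	1843200u	/* assumed UART input clock */
#define	NELEM(a)	(sizeof (a) / sizeof (*(a)))

/* indexed by baud constant (B0 = 0 ... B230400 = 20); 0 = unsupported */
static const uint32_t speedtab[] = {
	0, 0, 75, 0, 0, 150, 0, 300, 600, 1200, 1800, 2400, 4800,
	9600, 19200, 38400, 57600, 0, 115200, 0, 230400
};

/*
 * Convert a baud constant into a 16-bit divisor.
 * Returns 0 if the rate is unsupported or cannot be produced exactly
 * from the assumed clock.
 */
static uint16_t
baud_to_divisor(unsigned baud_idx)
{
	uint32_t rate, div;

	if (baud_idx >= NELEM(speedtab))
		return (0);
	rate = speedtab[baud_idx];
	if (rate == 0)
		return (0);
	if (UART_CLOCK_HZ % (16 * rate) != 0)
		return (0);	/* no exact divisor for this clock */

	div = UART_CLOCK_HZ / (16 * rate);
	assert(div >= 1);
	return ((div <= 0xFFFF) ? (uint16_t)div : 0);
}
```

With this clock, B9600 (index 13) maps to a divisor of 12 and B115200 (index 18) to 1, while B230400 is rejected because it would need a divisor of one half.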

Data transmission

A typical ds_tx() queues up the data block and kicks off the transmission, if one is not already in progress.

static void
usbsxx_put_tail(mblk_t **mpp, mblk_t *bp)
{
	if (*mpp) {
		linkb(*mpp, bp);
	} else {
		*mpp = bp;
	}
}

static int
usbsxx_tx(ds_hdl_t hdl, uint_t port_num, mblk_t *mp)
{
	usbsxx_state_t	*xxp = (usbsxx_state_t *)hdl;
	usbsxx_port_t	*pp = &xxp->xx_ports[port_num];
	int		xferd;

	/* sanity checks */
	if (mp == NULL) {
		return (USB_SUCCESS);
	}
	if (MBLKL(mp) <= 0) {
		freemsg(mp);
		return (USB_SUCCESS);
	}

	usbsxx_put_tail(&pp->tx_mp, mp);
	usbsxx_tx_start(pp, &xferd);

	return (USB_SUCCESS);
}

usbsxx_tx_start() is device specific, but typically the algorithm is to take as much data off the queue as the device and the controller (see usb_pipe_get_max_bulk_transfer_size()) allow, submit a bulk request, and, when the request completion callback fires, repeat until no data is left on the queue. Transmission ends by calling GSD's transmit callback, cb_tx(). It might also be necessary to wake up the data draining code by signalling the respective condition variable.
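The chunk-and-resubmit loop is easier to see without the USB plumbing. Below is a toy userland model, not driver code: the "transfer" completes immediately by calling the completion routine directly, MAX_XFER stands in for the real maximum bulk transfer size, and every name is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define	MAX_XFER	4	/* stand-in for the max bulk transfer size */

typedef struct {
	const unsigned char *buf;	/* queued data */
	size_t	len;			/* total bytes queued */
	size_t	off;			/* bytes already sent */
	int	busy;			/* a transfer is in flight */
	int	xfers;			/* transfers submitted so far */
	int	tx_cb_calls;		/* times GSD was notified */
} port_t;

static void tx_start(port_t *pp);

/* Completion callback: account for the chunk, then continue or finish. */
static void
bulkout_cb(port_t *pp, size_t sent)
{
	pp->off += sent;
	pp->busy = 0;
	if (pp->off < pp->len)
		tx_start(pp);		/* more data queued: next chunk */
	else
		pp->tx_cb_calls++;	/* queue empty: notify GSD */
}

/* Submit the next chunk, unless one is in flight or nothing is queued. */
static void
tx_start(port_t *pp)
{
	size_t n;

	assert(pp != NULL);
	if (pp->busy || pp->off >= pp->len)
		return;

	n = pp->len - pp->off;
	if (n > MAX_XFER)
		n = MAX_XFER;

	pp->busy = 1;
	pp->xfers++;
	/* a real driver submits an async request; we complete at once */
	bulkout_cb(pp, n);
}
```

Queueing 10 bytes with a 4-byte limit results in three transfers (4, 4 and 2 bytes) and a single notification at the end.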

Here's an example of a Bulk Out request:

static int
usbsxx_send_data(usbsxx_port_t *pp, mblk_t *data)
{
	usb_bulk_req_t	*br;
	int		rval;

	br = usb_alloc_bulk_req(pp->dip, 0, USB_FLAGS_SLEEP);
	br->bulk_data = data;
	br->bulk_len = MBLKL(data);
	br->bulk_timeout = USBSXX_BULKOUT_TIMEOUT;
	br->bulk_cb = usbsxx_bulkout_cb;
	br->bulk_exc_cb = usbsxx_bulkout_cb;
	br->bulk_client_private = (usb_opaque_t)pp;
	br->bulk_attributes = USB_ATTRS_AUTOCLEARING;

	rval = usb_pipe_bulk_xfer(pp->bulkout_ph, br, 0);

	if (rval != USB_SUCCESS) {
		/* request was not queued; this also frees br->bulk_data */
		usb_free_bulk_req(br);
	}

	return (rval);
}

Data receipt

The driver is usually notified of the received data by a Bulk In or an Interrupt callback. The data is added to the list of received data, the GSD callback is invoked and the next request for receive is submitted.

static void
usbsxx_put_head(mblk_t **mpp, mblk_t *bp)
{
	if (*mpp) {
		linkb(bp, *mpp);
	}
	*mpp = bp;
}

static void
usbsxx_bulkin_cb(usb_pipe_handle_t pipe, usb_bulk_req_t *req)
{
	usbsxx_port_t	*pp = (usbsxx_port_t *)req->bulk_client_private;
	usbsxx_state_t	*xxp = pp->soft_state;
	mblk_t		*data;
	int		data_len;

	data = req->bulk_data;
	data_len = (data) ? MBLKL(data) : 0;

	if ((pp->port_state == USBSXX_PORT_OPEN) && (data_len) &&
	    (req->bulk_completion_reason == USB_CR_OK)) {
		/* prevent USBA from freeing data along with the request */
		req->bulk_data = NULL;

		/* save data on the receive list */
		usbsxx_put_tail(&pp->rx_mp, data);

		/* invoke GSD receive callback */
		if (pp->cb.cb_rx) {
			pp->cb.cb_rx(pp->cb.cb_arg);
		}
	}

	usb_free_bulk_req(req);

	usbsxx_rx_start(pp); /* receive more */
}

The only thing left for ds_rx() to do is simply return pp->rx_mp.

Flush and drain

static int
usbsxx_fifo_flush(ds_hdl_t hdl, uint_t port_num, int dir)
{
	usbsxx_state_t	*xxp = (usbsxx_state_t *)hdl;
	usbsxx_port_t	*pp = &xxp->xx_ports[port_num];

	if ((dir & DS_TX) && pp->tx_mp) {
		freemsg(pp->tx_mp);
		pp->tx_mp = NULL;
	}
	if ((dir & DS_RX) && pp->rx_mp) {
		freemsg(pp->rx_mp);
		pp->rx_mp = NULL;
	}

	return (USB_SUCCESS);
}

Notice that freemsg() is used, not freeb(), because we want to free all b_cont-linked messages.

Data drain can occur at two levels: first draining the DSD's internal buffer by waiting on a condition variable, and then draining the device's buffer by sending a special command.

Compile and install

USB serial driver modules should be linked with the following parameters:

ld -r -dy -Nmisc/usba -Nmisc/usbser -o usbsxx usbsxx.o

This is to ensure that 'usba' (USB architecture) and 'usbser' (GSD) misc modules are loaded into the kernel memory before DSD is loaded.

Drivers should be installed using the standard add_drv(1M) command. In addition to that, an autopush entry should be added to /etc/iu.ap:

	usbsxx	-1	0	ldterm ttcompat

This is to ensure that ldterm(7M) and ttcompat(7M) are automatically pushed on top of the DSD. Verifying that the entry works is easy:

# strconf < /dev/cua/0

That's it, folks. I hope this blog proves useful to someone either writing a new USB serial driver for Solaris or debugging an existing driver. As always, email your comments and questions to artem dot kachitchkin at sun dot com.


Friday Dec 16, 2005

USB serial drivers, Part 3

[Parts 1, 2]

Here I discuss some aspects of DSDI, the Device Specific Driver Interface - the interface between GSD and DSD. All DSDI definitions are in usbser_dsdi.h header file. There are plenty of comments there, so I'll skip the least interesting parts.


DSDI provides simple versioning via ds_version. DSD should always set it to DS_OPS_VERSION, which is then resolved to the right value during compilation:

enum {
        DS_OPS_VERSION_V0       = 0,
        DS_OPS_VERSION          = DS_OPS_VERSION_V0
};

The version number is passed to the GSD inside the ds_ops structure, as an argument to usbser_attach(), which is the very first DSD->GSD call. The GSD can then provide the DSD with the right version of the interfaces, or fail the attach if it doesn't support this version.
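What such a check might look like on the GSD side is sketched below. This is an assumption about the shape of the logic, not the actual usbser code: ds_ops is cut down to the version field, and gsd_check_version() and the return codes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum {
	DS_OPS_VERSION_V0	= 0,
	DS_OPS_VERSION		= DS_OPS_VERSION_V0
};

/* Cut-down ds_ops: only the version field matters here. */
typedef struct ds_ops {
	int	ds_version;
} ds_ops_t;

#define	GSD_ATTACH_OK	0
#define	GSD_ATTACH_FAIL	(-1)

/* Hypothetical GSD-side check, done before any other use of ds_ops. */
static int
gsd_check_version(const ds_ops_t *ops)
{
	assert(ops != NULL);

	switch (ops->ds_version) {
	case DS_OPS_VERSION_V0:
		return (GSD_ATTACH_OK);		/* a version we understand */
	default:
		return (GSD_ATTACH_FAIL);	/* unknown: fail the attach */
	}
}
```

When a new interface version appears, the switch grows a case and older DSD binaries keep working against the old one.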

Initial configuration

ds_attach() is called during driver attach(9E) phase:

        int     (*ds_attach)(ds_attach_info_t *aip);

The only argument is a pointer to the structure:

typedef struct ds_attach_info {
        /*
         * passed to DSD:
         */
        dev_info_t      *ai_dip;        /* devinfo */
        /*
         * these event callbacks should be registered by DSD
         * using usb_register_event_cbs()
         */
        usb_event_t     *ai_usb_events;
        /*
         * returned by DSD:
         */
        ds_hdl_t        *ai_hdl; /* handle to be used by GSD in other calls */
        uint_t          *ai_port_cnt;   /* number of ports */
} ds_attach_info_t;

Pretty self-explanatory: ai_dip and ai_usb_events are input parameters, ai_hdl and ai_port_cnt are output parameters. Attach is the right place to allocate and initialize per-device and per-port resources, download firmware, and reset the device into a known state. Some drivers may also open USB pipes at attach time, although doing it at port open time is preferable.

After attach, the driver would typically register callbacks using ds_register_cb():

typedef struct ds_cb {
        void            (*cb_tx)(caddr_t);      /* transmit callback */
        void            (*cb_rx)(caddr_t);      /* receive callback */
        void            (*cb_status)(caddr_t);  /* status change callback */
        caddr_t         cb_arg;                 /* callback argument */
} ds_cb_t;

        int     (*ds_register_cb)(ds_hdl_t, uint_t port_num, ds_cb_t *cb);

Note that callback registration is per port. Typically the function pointers will be the same, but cb_arg is different to uniquely identify a port.
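To make the per-port point concrete, here is a small standalone model of registration and dispatch. It is a sketch under assumptions: void * replaces caddr_t for portability, the DSD state is reduced to an array of callback records, and dsd_register_cb(), dsd_data_arrived() and gsd_rx_cb() are invented names.

```c
#include <assert.h>
#include <stddef.h>

#define	NPORTS	2

typedef struct ds_cb {
	void	(*cb_tx)(void *);	/* transmit callback */
	void	(*cb_rx)(void *);	/* receive callback */
	void	(*cb_status)(void *);	/* status change callback */
	void	*cb_arg;		/* callback argument */
} ds_cb_t;

/* DSD side: one callback record per port. */
typedef struct {
	ds_cb_t	cb[NPORTS];
} dsd_state_t;

/* GSD side: per-port state that cb_arg points to. */
typedef struct {
	unsigned	port_num;
	int		rx_events;
} gsd_port_t;

static int
dsd_register_cb(dsd_state_t *sp, unsigned port_num, const ds_cb_t *cb)
{
	if (port_num >= NPORTS)
		return (-1);
	sp->cb[port_num] = *cb;
	return (0);
}

/* The same function serves every port; cb_arg tells them apart. */
static void
gsd_rx_cb(void *arg)
{
	((gsd_port_t *)arg)->rx_events++;
}

/* DSD notifies GSD that data arrived on one port. */
static void
dsd_data_arrived(dsd_state_t *sp, unsigned port_num)
{
	assert(port_num < NPORTS);
	if (sp->cb[port_num].cb_rx != NULL)
		sp->cb[port_num].cb_rx(sp->cb[port_num].cb_arg);
}
```

Registering the same gsd_rx_cb for both ports with different cb_arg pointers is enough for each port to count only its own events.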

Working with ports

Before GSD uses a port, it opens it using ds_open_port(). This is usually done when application uses open(2) system call on the serial device.

        int     (*ds_open_port)(ds_hdl_t, uint_t port_num);

Port open initializes the port, opens per-port USB pipes. It is also a good idea to ensure clean software and hardware state - do not assume that the preceding close performed all necessary cleanup.

Right after opening, port settings can be in an unknown state, so GSD sets port parameters, such as baud rate and parity, using ds_set_port_params():

	int     (*ds_set_port_params)(ds_hdl_t, uint_t port_num,
			ds_port_params_t *tp);

The ds_port_params_t structure contains a variable-length array of parameters. This function can be called at any time.

In order to transmit one or more characters, the GSD calls ds_tx():

	int     (*ds_tx)(ds_hdl_t, uint_t port_num, mblk_t *mp);

The data is passed in a STREAMS message block. Currently this operation must always succeed: if DSD cannot transmit immediately, it should buffer the data. When data transfer is completed, DSD should notify GSD by calling the previously registered cb_tx() callback.

Data receipt is backwards: when data arrives, DSD buffers it, calls cb_rx() callback. GSD then calls ds_rx(), which returns all available data:

	mblk_t  *(*ds_rx)(ds_hdl_t, uint_t port_num);

The returned mblk_t can be a linked list of blocks (through the b_cont field). See also Part 2 for a description of how error bytes are represented.

Other operations are pretty self-explanatory and/or described in the header file.

In the fourth and the last part I will provide C code that can be used as a starting point for writing a new DSD.


Sunday Nov 20, 2005

USB serial drivers, Part 2

[Part 1]

The generic serial driver (GSD) hides a great deal of termio(7I) complexity from the USB serial driver writers. Another major benefit is that it ensures compliance with UNIX standards, such as Single UNIX specification (see chapter 11, General Terminal Interfaces). VSX-PCTS test suite includes terminal interface tests that all USB serial drivers should pass. Today I am going to discuss some aspects of GSD implementation.


The open(2) implementation for serial devices is quite complicated; there are many rules to follow.

Recall that there are two types of device nodes per serial port: /dev/term (aka tty lines) and /dev/cua (aka dial-out lines). When an application attempts to open a tty line, the open(2) system call should block until Carrier Detect signal is asserted. Dial-out opens do not block and succeed immediately.

Now recall that there can be multiple applications attempting to open the same device simultaneously. For example, while one application is blocked in open(2) waiting for Carrier Detect, another opens the corresponding dial-out line; in this case, the dial-out open should succeed and the first application's open(2) should unblock and fail.

Open behavior also varies depending on the O_NONBLOCK/O_NDELAY flags, soft carrier setting and "ignore-cd" device properties.

The usbser_open_setup() function accounts for all possible scenarios through the use of a state machine.
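The rules above boil down to a decision that can be written out as a small function. The sketch below is a heavy simplification and not the usbser_open_setup() logic: it only decides what a single open attempt should do given a snapshot of port state, it ignores the transitions a real state machine has to track, and all type and function names are made up.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LINE_TTY, LINE_DIALOUT } line_t;
typedef enum { OPEN_OK, OPEN_WAIT_CD, OPEN_FAIL } open_result_t;

typedef struct {
	int	carrier;	/* Carrier Detect is asserted */
	int	soft_carrier;	/* soft carrier is enabled */
	int	ignore_cd;	/* "ignore-cd" property is set */
	int	dialout_open;	/* the dial-out line is open */
} port_state_t;

/* Decide the outcome of one open attempt; nonblock = O_NONBLOCK/O_NDELAY. */
static open_result_t
open_decide(const port_state_t *ps, line_t line, int nonblock)
{
	assert(ps != NULL);

	if (line == LINE_DIALOUT)
		return (OPEN_OK);	/* dial-out never waits */

	if (ps->dialout_open)
		return (OPEN_FAIL);	/* dial-out wins over tty opens */

	if (nonblock || ps->soft_carrier || ps->ignore_cd || ps->carrier)
		return (OPEN_OK);	/* no reason to wait for carrier */

	return (OPEN_WAIT_CD);		/* block until Carrier Detect */
}
```

A blocked tty open would re-run the decision whenever the port state changes, which is how a dial-out open turns OPEN_WAIT_CD into OPEN_FAIL.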


Quite a few things should happen before a serial device can be closed:

  • Remaining data must be drained, first from the local software buffer, then from the hardware FIFO.
  • Any outstanding break and delay requests must be cancelled.
  • The line must be hung up by dropping RTS and DTR lines.

This is done in usbser_close().


The driver uses two threads, one each for read and write message processing. Strictly speaking, separate threads are not needed in Solaris 9 and up since the STREAMS scheduler was improved with dynamic task queues. However, GSD was first written for Solaris 8, and at that time it was problematic to use blocking USB requests in the STREAMS context.

So instead of doing processing in the STREAMS service routine, this is done in usbser_wq_thread and usbser_rq_thread, which sleep until woken for new requests. Look for calls to usbser_thr_wake().

Due to the asynchronous nature of USB requests, operations that usually happen atomically take several steps. To prevent multiple operations from stepping on each other, some of them are serialized using usbser_serialize_port_act() and usbser_release_port_act().

The core of write-side processing is usbser_wmsg: it dequeues one message at a time and dispatches it depending on type, such as M_STOP, M_DATA or M_IOCTL.


usbser_ioctl() handles various terminal ioctls, passed via M_IOCTL messages. Setting port parameters, sending break, draining and flushing data, enabling internal loopback mode (great for testing), getting and setting port signals - all this is done here by calling down to DSD. Not all ioctls are handled by GSD, however: some of them are passed to the ttycommon_ioctl() function, which is a generic terminal ioctl implementation used by all Solaris serial drivers.

Status changes

When the DSD detects modem status line changes, such as CTS or CD, it notifies GSD with the status callback. usbser_status_cb(), the callback handler, does not process this request immediately. Instead, it sets a special flag, USBSER_FL_STATUS_CB, and wakes up the write thread. usbser_wq_thread() checks the flag and if it's set, calls the actual handler, usbser_status_proc_cb().

Because the status callback is called from DSD, and the callback handler may need to call other DSD functions, we avoid recursive calls into DSD by delegating status handling to the write thread. Back in Solaris 8, the DSD callback was called in the interrupt context, so a recursive call into DSD could lead to a deadlock.
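The deferral pattern itself is tiny. Here is a single-threaded model of it, with a flag and an explicit "one pass of the write thread" function instead of real threads and condition variables. The in_dsd and reentered fields exist only so the example can show that the handler never runs while DSD code is on the stack; every name is a stand-in.

```c
#include <assert.h>
#include <stddef.h>

#define	FL_STATUS_CB	0x01u	/* stand-in for USBSER_FL_STATUS_CB */

typedef struct {
	unsigned flags;
	int	in_dsd;		/* nonzero while DSD code is running */
	int	wakeups;	/* times the write thread was woken */
	int	handled;	/* status changes actually processed */
	int	reentered;	/* handler ran inside a DSD call (bad) */
} port_t;

/* Status callback invoked by DSD: only record the event and wake up. */
static void
status_cb(port_t *pp)
{
	pp->flags |= FL_STATUS_CB;
	pp->wakeups++;
}

/* The real handler; it may call back into DSD, so it must run later. */
static void
status_proc(port_t *pp)
{
	if (pp->in_dsd)
		pp->reentered++;
	pp->handled++;
}

/* One pass of the write thread after it has been woken. */
static void
wq_thread_pass(port_t *pp)
{
	assert(pp != NULL);
	if (pp->flags & FL_STATUS_CB) {
		pp->flags &= ~FL_STATUS_CB;
		status_proc(pp);
	}
}

/* DSD detects a modem line change and notifies GSD. */
static void
dsd_line_change(port_t *pp)
{
	pp->in_dsd = 1;
	status_cb(pp);
	pp->in_dsd = 0;
}
```

The callback returns immediately with the flag set; the work happens on the next write thread pass, outside of any DSD call.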

Outgoing data

usbser_data() simply takes an M_DATA message and passes it on to the DSD by calling USBSER_DS_TX() for transmission. The DSD can't refuse the message; if it needs to postpone transmission, it has to maintain its own queue.

When DSD is done transmitting the data, it calls back into GSD. The usbser_tx_cb() callback handler simply wakes up the write thread to check for any new data to transfer.

Incoming data

When DSD receives new data, it lets GSD know via the RX callback. See usbser_rx_cb() for how it retrieves the data from DSD's queue by calling USBSER_DS_RX(), processes it and sends it upstream.

Data processing is the interesting part. If data was received without errors, then it's just a linked list of message blocks (mblk_t), which will be put on the stream's read queue after being processed by usbser_rx_massage_data(). This extra step is required for standards compliance: under certain conditions, a received character '\377' (0xFF) should be read by the application as two such characters, i.e. '\377' followed by another '\377'.
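The doubling step on its own is simple enough to show in isolation. The function below works on a flat byte buffer rather than on mblk_t chains and does not check the termio flags that decide whether doubling applies at all (as far as I recall from termio(7I), that is PARMRK set with ISTRIP clear). The function name is invented; treat it as a sketch of the transformation only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy n received bytes from in to out, writing every 0xFF twice.
 * out must have room for 2 * n bytes. Returns the number of bytes written.
 */
static size_t
rx_escape_ff(const unsigned char *in, size_t n, unsigned char *out)
{
	size_t i, o = 0;

	assert(in != NULL && out != NULL);

	for (i = 0; i < n; i++) {
		out[o++] = in[i];
		if (in[i] == 0xFF)
			out[o++] = 0xFF;	/* second copy marks it as data */
	}

	return (o);
}
```

The output can be up to twice as long as the input, which is why the data may need a new, larger buffer rather than an in-place edit.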

If a character was received with a framing or parity error, the DSD must pass it to GSD as an M_BREAK message. The first byte should contain the error code, with the character in the second byte. The GSD then does the right thing for termio in usbser_rx_massage_mbreak().

Flow control

Flow control is needed when a receive buffer on either end of communication becomes full and it signals the transmitting end to suspend transmission temporarily, until signalled to resume.

There are two types of signalling:

  • Hardware flow control is enabled by the receiving end by deasserting, i.e. setting to logical false, the RTS (Request to Send) line. At the transmitter's end, RTS is seen as CTS (Clear to Send). When the transmitter sees CTS deasserted, it should stop sending any new data. When CTS is asserted again, it can start sending data.
  • Software flow control is enabled by sending an XOFF character, which can be simulated by pressing Ctrl-S on the terminal. To resume data flow, an XON character (Ctrl-Q) is sent.

Both types of flow control can be implemented in hardware. However, not all devices support flow control, and because GSD should work with any device, it implements both types in software.

Inbound flow control happens when our local queue becomes full (reaches the high watermark) and we want to ask the device to stop transmission. Both hardware and software types of flow control are done in usbser_inbound_flow_ctl().

Outbound flow control happens when the other end asks us to stop transmission. For the hardware type, the GSD detects CTS transitions through the status callback, see usbser_status_proc_cb(). When CTS turns 0, an M_STOP message is put on GSD's local write queue; when CTS turns 1, an M_START message is put. usbser_wmsg() then calls usbser_stop() and usbser_start() respectively, to suspend or resume data transmission.
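Here is a stripped-down model of that CTS handling, using a small array in place of the STREAMS write queue and plain enums in place of M_STOP and M_START. It is a sketch of the idea rather than the GSD code: the queue never wraps, and the names are placeholders.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MSG_STOP, MSG_START } msg_t;	/* for M_STOP, M_START */

#define	QLEN	8

typedef struct {
	int	cts;		/* last CTS state seen */
	int	tx_suspended;	/* transmission currently suspended */
	msg_t	q[QLEN];	/* local write queue, control messages only */
	size_t	head, tail;
} port_t;

/* Status processing: queue a message only when CTS actually changes. */
static void
status_proc(port_t *pp, int new_cts)
{
	if (new_cts == pp->cts)
		return;
	pp->cts = new_cts;
	assert(pp->tail < QLEN);
	pp->q[pp->tail++] = new_cts ? MSG_START : MSG_STOP;
}

/* Write-side processing: dequeue and dispatch one message at a time. */
static void
wmsg(port_t *pp)
{
	while (pp->head < pp->tail) {
		switch (pp->q[pp->head++]) {
		case MSG_STOP:
			pp->tx_suspended = 1;
			break;
		case MSG_START:
			pp->tx_suspended = 0;
			break;
		}
	}
}
```

Routing the transition through the queue keeps it ordered with respect to data messages already waiting to be processed.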

Next time - DSDI.


Friday Nov 18, 2005

USB serial drivers, Part 1


A USB serial adapter is a USB device that tunnels asynchronous serial protocols, such as RS-232 and RS-485, through USB. It lets you add serial lines to computers with an insufficient number of built-in serial ports. The most common need for these devices is on laptops, to connect to other machines' serial consoles, GPS devices, PDAs, etc.

Traditionally Solaris supported class devices only, such as printer class or mass storage class. A class driver can support devices from various vendors as long as they follow the class specification. Unfortunately, USB serial devices do not belong to any class: every device vendor has to invent a unique, often proprietary protocol, and therefore requires a separate driver.

Eventually a project was funded to support one type of USB serial adapters. We selected the Edgeport series from Inside-Out Networks. After a long break, drivers for Keyspan devices and those based on the Prolific PL2303 chip are coming soon to the next Solaris Express build. Due to the proprietary nature of device protocols, it may take some time for the driver sources to propagate to OpenSolaris.

This is the first in a series of articles about writing USB serial drivers for Solaris.


A typical USB serial device consists of a USB interface IC, firmware executed by an embedded CPU, and one or more UARTs (Universal Asynchronous Receiver-Transmitter). Firmware can be downloadable or upgradable by the driver. Firmware is responsible for bridging the two hardware interfaces, encoding outgoing and decoding incoming USB packets. At the other end, the driver similarly decodes/encodes USB packets and provides applications with the standard serial port API.


While each protocol is unique, functionality is largely the same: set serial parameters, such as baud rate and flow control, get modem status, set control lines, transmit and receive data. All these functions are available to applications via the standard UNIX termio(7I) programming interface.

The Solaris terminal subsystem, including the termio interface, is based on the STREAMS framework. STREAMS code is not easy to write, especially serial drivers, due to extreme asynchronicity and everything that it entails. It took some time for se(7D) and su(7D) to stabilize and we're still finding bugs in them once in a while.

It comes as no surprise that we decided to put USB termio implementation into a common module used by all USB serial drivers. We call it generic serial driver, GSD.

The bottom part of the driver that implements the vendor-specific USB protocol is called the device-specific driver, DSD. The interface between these two parts is well-defined; we call it DSDI.

DSD in turn talks to the kernel USB framework, also known as USBA.

These layers are illustrated by the following diagram:

Note that GSD does not interact with USBA directly. In fact, GSD turned out generic enough to write any kind of serial driver, not just USB: it is possible, for example, to reimplement se(7D) and su(7D) using GSD. The only USB-specific features utilized in GSD are logging and hotplug events.


Next I'll discuss GSD, then take a closer look at DSDI and show examples of DSD code.


Thursday Nov 17, 2005

ZFS on the go

UPDATE 19-Jul-2006:

It looks like this blog entry is still getting hits from search engines and the USB FAQ, so here's an update. Since I wrote the entry, the project I said was under way has now been integrated in Nevada Build 36, Solaris Express 4/06 and Solaris 10 6/06 Release. It is mentioned in Dan's what's new blog as:

* Hotpluggable drives are now better able to accomodate EFI-labels and device IDs, both of which are very important to supporting ZFS on USB and Firewire disk drives. [6348407]

If you use one of these releases or later, the hack described below is not necessary (and as Brian noted, may cause some error messages).


ZFS is awesome. I'd bust some zeerleading moves for y'all, but I misplaced my pom poms and it wasn't going to be pretty anyway. Instead I will tell you how to try ZFS on your laptop (or any computer without a spare fixed disk). There is no supported way to create ZFS pools on USB disks yet, although a project is under way to rectify this (you can have limited ZFS functionality using lofi(7D)). What follows is a hack, use it at your own risk.

Before we proceed, you should get vold(1M) out of the way:

# svcadm disable volfs

An important thing to know in this context is that you can't create a pool on removable media devices, i.e. the ones whose storage media can be removed. DVD drives and flash readers are removable media, hard disks are not. So you'll need one or more USB hard disks. Most USB hard disks and IDE-to-USB enclosures should work. Thumb drives are not likely to work. In this experiment I'm using two USB 2.0 disks: a 20GB LaCie and a 40GB IOGEAR.

For the reasons that are beyond this blog entry, the Solaris USB driver presents any USB storage device as removable media. A simple command to list removable and non-removable disks is format(1M). Quoting the man page:

     Removable media devices are listed only when  users  execute
     format  in expert mode (option -e).

Run format with and without -e and notice the difference:

# format
Searching for disks...done

       0. c1d0 
Specify disk (enter its number): ^C
# format -e
Searching for disks...done

       0. c1d0 
       1. c4t0d0 
       2. c5t0d0 
Specify disk (enter its number): ^C

c4t0d0 and c5t0d0 are USB disks.

What we're going to do is tell the USB driver not to treat hard disks as removable media. This can be done by appending the following line to /kernel/drv/scsa2usb.conf file:

attribute-override-list =
 "vid=* removable=false";

Reboot for these changes to take effect. Now all your USB hard disks are going to be treated as fixed (but you can still hotplug them). You can verify that by running format without -e option - if it still doesn't list your USB disk, most likely it's one of the rare samples that pretends to be removable media and that's a hack for another day.

Now you can use these disks just like any fixed disk:

# zpool create usbpool mirror c4t0d0 c5t0d0
# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
usbpool                27.8G   33.0K   27.7G     0%  ONLINE     -
# zpool status
  pool: usbpool
 state: ONLINE
 scrub: none requested

        NAME        STATE     READ WRITE CKSUM
        usbpool     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
# zfs list
usbpool                 32K  27.5G     8K  /usbpool


Thursday Sep 08, 2005

Graphing the Solaris kernel

The Free Code Graphing Project was an attempt to visualize Linux kernel source code. Now that Solaris is free code, too, why not try to graph it? This blog entry describes my little effort of debugging the thing into existence. I can't afford to spend any more time on this, so hopefully someone picks up the baton.

The end result is a huge PostScript file that looks like this:

Sample close-up, common/vm/seg_kmem.c:

Roughly, the algorithm used:

  • Function: sequential code is a line, loop is an arc, branch splits the line.
  • Color: normal functions are blue, static functions are green.
  • File: functions are put in boxes, sorted by size and arranged in a spiral.
  • Ring: Files are put in boxes, sorted by size and arranged in a circle.
  • Rings can be broken into sectors.

Here's how rings are defined for Linux:

RING1:=init lib mm kernel ipc
RING2:=net fs
RING3:=$(subst $(KERNEL_DIR)/,,$(wildcard $(KERNEL_DIR)/arch/*))
RING4:=drivers crypto sound security

Solaris has a bit different directory structure. Here's what I used:

RING1:=common/os common/vm common/c2 common/cpr common/disp common/dtrace common/exec common/krtld common/syscall
RING2:=common/fs common/inet common/ipp common/ktli common/rpc
RING3:=i86pc sparc sun sun4 sun4u sun4v

Steps to make it work in Solaris

1. Prerequisites:

2. Set your KERNEL_DIR environment variable to the usr/src/uts subdirectory of the OpenSolaris source, e.g.:

KERNEL_DIR=$HOME/solgraph/usr/src/uts; export KERNEL_DIR

3. Preprocess OpenSolaris code, since the fcgp parser does not like Solaris coding style (ignore warnings):

chmod -R +w $KERNEL_DIR
find $KERNEL_DIR -name '*.c' -exec indent -npsl {} \;

4. Modify your path to look in /usr/xpg4/bin first. Alternatively, you can replace all 'tail -n number' occurrences in the fcgp scripts with 'tail -number', but it's easier to just use /usr/xpg4/bin/tail:

PATH=/usr/xpg4/bin:$PATH; export PATH

5. If using X, turn off the bell. Fcgp spits out some escape sequences that would drive you nuts (they might not look pretty on some terminals either):

xset -b

6. Kick off the build process. Takes about 9 minutes on my Ferrari 3400.

cd lgp-2.6.0a-solaris
gmake CC=gcc

7. Generate viewable/printable postscript files. Examples for a single file and for a 6x6 poster:

./posterize a4 1
./posterize a4 6

See README for more information on printing and assembling a poster.

To do list

  • Fix the parser to handle Solaris coding style.
  • Some files, e.g. common/c2/audit_syscalls.c and common/disp/ts.c, are not rendered correctly, resulting in very large bounding boxes. The rest of the files are scaled down too much. This problem might take care of itself when parser is fixed.
  • Replace the Tux in the center of the circle with OpenSolaris logo. See
  • Figure out what's in 'codemap' and 'zoom' directories and make it work.
  • Come up with an easy way to locate files and functions on the map.

I won't have time for these in the next two months, so feel free to pick up where I left off. I can be reached at artem dot kachitchkin at sun dot com.


Thursday Jul 14, 2005

A minor instance of driver confusion

Most driver writers are familiar with device minor numbers. But the notion of an instance number is specific to Solaris, and it is often incorrectly assumed that minor == instance. Let's see if we can clear that up.

Instance number

A driver handles devices of a certain type. There can be multiple instances of the same device type in the system. For example, there are two asynchronous serial devices (UARTs) on my system, handled by the asy(7D) driver and represented by two nodes in the device tree:

    $ prtconf -D | grep asy
            asy, instance #0 (driver name: asy)
            asy, instance #1 (driver name: asy)

These are the two instances of the same driver: sharing binary code in the kernel address space, but maintaining separate data 1. Instances are numbered, in this case 0 and 1. Instance numbers are assigned permanently to physical devices in the system; the mapping is maintained in the /etc/path_to_inst file:

    $ grep asy /etc/path_to_inst 
    "/isa/asy@1,3f8" 0 "asy"
    "/isa/asy@1,2f8" 1 "asy"

There are also drivers without respective hardware, called pseudo drivers. An example of a pseudo driver is random(7D), the random number generator driver. Pseudo drivers have no physical device nodes to attach to. We have to synthesize artificial instances via driver.conf file:

    $ cat /kernel/drv/random.conf | egrep -v '^#|^$'
    name="random" parent="pseudo" instance=0;

The system obeys and creates a node for instance 0:

    $ prtconf -DP | grep random
            random, instance #0 (driver name: random)

Drivers can retrieve their instance number with ddi_get_instance().

Minor number

Outside the kernel, devices are identified by their {major, minor} pairs, also known as dev_t. The major number range is owned by the system. It assigns major numbers when drivers are installed and keeps the list in /etc/name_to_major:

    $ grep asy /etc/name_to_major
    asy 106

The minor number range is owned by the driver. A minor number becomes visible to the userland when the driver creates a minor node using ddi_create_minor_node() 2, which takes minor number as one of its arguments:

    int ddi_create_minor_node(dev_info_t *dip, char *name, int
        spec_type, minor_t minor_num, char *node_type, int flag);

Problem is, the minor number range is shared among all driver instances, so each instance can only use a subrange. If each instance needs exactly one minor node, then the node's minor number can equal instance number, i.e. the 4th argument in ddi_create_minor_node() is set to instance number.

Things get a bit fancier in the case of multiple minor nodes per instance. Serial drivers typically create two nodes per instance: one in /dev/term and one in /dev/cua:

$ ls -lL /dev/{term,cua}/[ab]
crw-------   1 uucp     uucp     106, 131072 Jun 29 19:04 /dev/cua/a
crw-------   1 uucp     uucp     106, 131073 Jun 29 19:04 /dev/cua/b
crw-rw-rw-   1 root     sys      106,  0 Jun 29 19:04 /dev/term/a
crw-rw-rw-   1 root     sys      106,  1 Jun 29 19:04 /dev/term/b

106, as we saw earlier, is the asy major number. Minor numbers 0 and 131072 (0x20000) belong to instance 0, and 1 and 131073 (0x20001) to instance 1. The instance/minor mapping is not 1:1 here.


The driver holds the algorithm for mapping instances into minors and vice versa. When the kernel needs to find out instance number from a minor number, it has to ask the driver by calling getinfo(9E) entry point with the DDI_INFO_DEVT2INSTANCE command. One example could be found in spec_open() which calls e_ddi_hold_devi_by_dev().
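As an illustration of such an algorithm, here is a small sketch of a mapping that would produce the minor numbers shown in the listing above. It assumes, based only on that listing, that bit 17 (0x20000) marks the /dev/cua node and the remaining bits carry the instance number. The EX_* macros and ex_* functions are hypothetical names, not taken from asy(7D).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, inferred from the ls output: bit 17 marks the cua node. */
#define	EX_OUTLINE	0x20000u
#define	EX_MAXINST	(EX_OUTLINE - 1)

/* Instance -> minor: what would be passed to ddi_create_minor_node(). */
static uint32_t
ex_inst2minor(uint32_t instance, bool dialout)
{
	assert(instance <= EX_MAXINST);
	return (instance | (dialout ? EX_OUTLINE : 0));
}

/* Minor -> instance: the kind of answer DDI_INFO_DEVT2INSTANCE asks for. */
static uint32_t
ex_minor2inst(uint32_t minor)
{
	return (minor & ~EX_OUTLINE);
}

/* True if the minor number belongs to the /dev/cua node. */
static bool
ex_minor_is_dialout(uint32_t minor)
{
	return ((minor & EX_OUTLINE) != 0);
}
```

With this layout, instance 1 maps to minors 1 (/dev/term/b) and 131073 (/dev/cua/b), and both map back to instance 1.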

1 Per-instance data is called soft state and explicitly supported in DDI, see ddi_soft_state(9F) man page.

2 A whole lot of action can be triggered by ddi_create_minor_node(). See dacfc_match_create_minor() and /etc/dacf.conf file on your nearest Solaris system.


Tuesday Jun 14, 2005

scsa1394 "symbios" workaround

From a driver writer's perspective, CPUs are amazing devices: they manage to stay extremely reliable while implementing extremely complex logic. Neither is true of most peripheral devices. It is usually not practical to put huge effort into design and testing of cheap, short-lived devices. Which to some degree explains device drivers' reputation for being too obscure, overcomplicated and continuously changing: they reflect imperfections of their hardware... and sometimes of their programmers, but that is an altogether different subject - this blog is about a bug that exists in some IEEE 1394 (FireWire) mass storage devices and how it's been worked around in the scsa1394(7D) driver.

The 1394 mass storage protocol is based on the Serial Bus Protocol 2. SBP-2 allows arbitrary command sets to be encapsulated on buses compliant with IEEE 1212. In the case of scsa1394, the IEEE 1212 compliant bus is IEEE 1394 and it transports SCSI commands submitted by SCSI target drivers, such as sd(7D) and st(7D), or via the uscsi(7I) interface.

For SCSI commands with a data phase, scsa1394 has to map data buffers for DMA transfer. When a buffer cannot be mapped into contiguous I/O memory, the kernel attempts to map it into non-contiguous chunks described by the ddi_dma_cookie(9S) structure. Each cookie contains a DMA address of the memory chunk and its size. Cookies are then programmed into the device's DMA engine. This type of DMA transfer is known as scatter-gather.

SBP-2 provides a facility for DMA scatter-gather called page tables. A page table is an array of entries of the following format 1:

 31                                                                0
|         segment_length          |        segment_base_hi          |
|                           segment_base_lo                         |

Page table itself must be located in contiguous I/O space. Once the driver prepares an ORB 2 and attaches a valid page table to it, it signals the device to read them in and perform data transfer. SBP-2 segments fit neatly into Solaris DMA cookie concept: segment length corresponds to cookie size and segment base to cookie address.
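Here is a rough sketch of how one cookie would be packed into one page table entry. It assumes the 16/16/32-bit split suggested by the diagram above (length in the upper half of the first word, the high 16 bits of the base in the lower half, the low 32 bits of the base in the second word) and leaves out conversion to bus byte order. ex_pt_entry_t and ex_pt_pack() are made-up names, not the ones used in scsa1394.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory form of one page table entry: two 32-bit words. */
typedef struct {
	uint32_t pt_hi;	/* segment_length (31-16) | segment_base_hi (15-0) */
	uint32_t pt_lo;	/* segment_base_lo */
} ex_pt_entry_t;

/*
 * Pack a segment given its 48-bit base address and 16-bit length.
 * Byte-swapping to bus byte order is deliberately left out.
 */
static ex_pt_entry_t
ex_pt_pack(uint64_t base, uint16_t len)
{
	ex_pt_entry_t e;

	assert((base >> 48) == 0);	/* base must fit in 16 + 32 bits */
	e.pt_hi = ((uint32_t)len << 16) | (uint32_t)(base >> 32);
	e.pt_lo = (uint32_t)(base & 0xFFFFFFFFu);
	return (e);
}
```

For example, a 4K segment at address 0x000123456000 packs into the words 0x10000001 and 0x23456000.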

SBP-2 does not put any restrictions, other than bit width, on either the segment length or the total length of segments in a page table. However, some devices do not function correctly unless page tables meet two requirements:

  • segment length == 4K;
  • total length < 128K;

Evidently, not all vendors use dedicated test software for their devices; some limit themselves to compliance testing on a few target OS/CPU combinations. As far as we know, this bug only exists in devices based on the SYM13FW* chip series, hence it is known as the "symbios bug".

Solaris driver had to provide a workaround. Satisfying symbios page table limits requires that we divide cookies into 4K segments, which breaks the neat 1:1 mapping between DMA cookies and SBP-2 segments. So I had to introduce an additional mapping layer described by struct scsa1394_cmd_seg:

typedef struct scsa1394_cmd_seg {
	size_t			ss_len;
	uint64_t		ss_daddr;
	uint64_t		ss_baddr;
	t1394_addr_handle_t	ss_addr_hdl;
} scsa1394_cmd_seg_t;

ss_daddr here refers to segment's DMA address and ss_baddr to its 1394 bus address. scsa1394_cmd_dmac2seg() function is responsible for turning cookies into segments. Page table is then created from the segment array in scsa1394_sbp2_seg2pt().
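The cookie-to-segment step can be sketched roughly as follows. This is a simplified userland model, not the actual scsa1394_cmd_dmac2seg(): the ex_* types and names are hypothetical, the 1394 bus address mapping is omitted, and I assume each cookie is simply cut into chunks of at most 4K while keeping the total below 128K.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	EX_SEG_MAX	4096u		/* per-segment limit from the text */
#define	EX_TOTAL_MAX	(128u * 1024u)	/* total must stay below this */

typedef struct { uint64_t addr; size_t size; } ex_cookie_t;	/* cookie model */
typedef struct { uint64_t daddr; size_t len; } ex_seg_t;	/* segment model */

/*
 * Split ncookies cookies into segments of at most EX_SEG_MAX bytes.
 * Returns the number of segments written, or -1 if segs[] is too small
 * or the total length would reach EX_TOTAL_MAX.
 */
static int
ex_dmac2seg(const ex_cookie_t *c, size_t ncookies, ex_seg_t *segs,
    size_t maxsegs)
{
	size_t n = 0, total = 0;

	assert(c != NULL && segs != NULL);
	for (size_t i = 0; i < ncookies; i++) {
		uint64_t addr = c[i].addr;
		size_t left = c[i].size;

		while (left > 0) {
			size_t len = (left < EX_SEG_MAX) ? left : EX_SEG_MAX;

			if (n == maxsegs || total + len >= EX_TOTAL_MAX)
				return (-1);
			segs[n].daddr = addr;
			segs[n].len = len;
			n++;
			total += len;
			addr += len;
			left -= len;
		}
	}
	return ((int)n);
}
```

A single 10000-byte cookie, for instance, turns into three segments of 4096, 4096 and 1808 bytes, which is exactly why the 1:1 cookie/segment mapping no longer holds.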

If you ever used Solaris DDI DMA routines, you might think: why not just use the ddi_dma_attr(9S) structure to limit cookie size by setting dma_attr_seg to 0xFFF? This is because dma_attr_seg cannot be less than the base page size, so 4K would not work on SPARC (or on x86, should the page size ever increase).

As if that wasn't enough, there seems to be no way to distinguish between devices with the symbios bug and devices without it. No combination of configuration ROM 3 keys provides a reliable indicator of a buggy device. This led me to a difficult decision to enable the symbios workaround by default, sacrificing performance for data integrity. Then I thought: we can't blacklist bad devices, but maybe we could whitelist the good ones. That is how the white list was invented:

scsa1394_bw_list_t scsa1394_sbp2_symbios_whitelist[] = {
        { SCSA1394_BW_ONE, 0x0a27 },		/* Apple */
        { SCSA1394_BW_ONE, 0xd04b }		/* LaCie */
};

It contains vendor IDs of those companies that have never, to the extent of our knowledge, produced devices based on the buggy chip. We'll be adding more to the list in the future. In order to test a device for the symbios bug, the workaround can be disabled by setting scsa1394_wrka_symbios variable to 0:

temporarily in mdb:

	# echo scsa1394_wrka_symbios/W 0 | mdb -kw

or permanently in /etc/system:

	set scsa1394:scsa1394_wrka_symbios = 0
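
Put together, the decision of whether to apply the workaround to a given device could be modeled like this. It is a sketch under assumptions: the flat ex_whitelist[] table and ex_need_symbios_wrka() are hypothetical simplifications, and the wrka_enabled argument stands in for the scsa1394_wrka_symbios variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Vendor IDs from the whitelist above; the flat table is hypothetical. */
static const uint32_t ex_whitelist[] = {
	0x0a27,		/* Apple */
	0xd04b		/* LaCie */
};

/*
 * Decide whether the symbios workaround applies to a device.
 * wrka_enabled models the scsa1394_wrka_symbios tunable.
 */
static bool
ex_need_symbios_wrka(uint32_t vendor_id, bool wrka_enabled)
{
	size_t n = sizeof (ex_whitelist) / sizeof (ex_whitelist[0]);

	assert(n > 0);
	if (!wrka_enabled)
		return (false);
	for (size_t i = 0; i < n; i++) {
		if (ex_whitelist[i] == vendor_id)
			return (false);	/* known-good vendor */
	}
	return (true);
}
```

Any vendor not on the list gets the slower but safer 4K segments unless the tunable is cleared.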

1 These are so called unrestricted page tables. SBP-2 also defines normalized page tables, but at the time of this writing they are not used in scsa1394.

2 ORB: Operation Request Block, prepared by the initiator in I/O memory and read by the target. ORB is a polymorphic data structure: see usr/src/uts/common/sys/sbp2/defs.h for its multiple incarnations.

3 Every 1394 device contains a special address region called configuration ROM. It's a hierarchical key-value structure that describes various device attributes such as device class, vendor/device ID, etc. For those familiar with PCI, it is similar in purpose to PCI configuration space: it is basically what makes these devices self-identifying.


Monday Jun 13, 2005


My name is Artem Kachitchkine 2, I'm an engineer in the Operating Platforms Group. Since I joined Sun a few years ago (virtually straight out of college) I've been working on a number of I/O related projects. I started by fixing bugs in Solaris serial and parallel port drivers, then participated in the bringup of the Sun Blade workstation series. After that, I moved into the USB land and later took ownership of the 1394 (FireWire) framework in Solaris, during which I wrote the av1394(7D) and scsa1394(7D) drivers and introduced framework extensions to support new drivers. I was also part of the team that ported the Solaris Fibre Channel stack, aka Leadville, to x64 platforms. Lately I've been busy working on various aspects of mass storage and removable media management in Solaris.
Update Oct 2007: After finishing Tamarack and starting the WWAN OpenSolaris project, I joined the networking group and am now contributing to Brussels.

1AT&C0 is the Hayes modem command that means "assume data carrier is present". I often feel that way during blogging, i.e. assuming someone is listening.
2 Do not attempt to pronounce or memorize my last name, it will hurt and I don't want to be liable.



