Wednesday Jun 14, 2006

OpenSolaris a Gemini?

Yes, OpenSolaris is a Gemini !!
It is probably more famous than these Geminis were when they were a year old.
Happy Birthday !!

Monday Jun 12, 2006

::softint added for x64 servers

On SPARC-based systems, the ::softint MDB dcmd provides software interrupt information, but it had not been ported to the x64 side. Starting with Solaris 11 Build 42, you can also get live software interrupt information on x64 systems:

# echo ::softint | mdb -k
ffffffff8634d880 0 1 ffffffff8c547000 0 nge_chip_factotum
ffffffff854e6800 0 1 ffffffff8c547000 0 nge_reschedule
ffffffff84ceb380 0 1 ffffffff86a0d000 0 nge_chip_factotum
ffffffff84ceb500 0 1 ffffffff86a0d000 0 nge_reschedule
ffffffff84536f00 0 1 ffffffff82bb7b40 0 errorq_intr
fffffffffbc05b08 0 1 0 0 softlevel1
ffffffff8453c380 0 2 ffffffff82bb2400 0 errorq_intr
ffffffff8453c200 0 2 ffffffff82bb7e00 0 errorq_intr
fffffffffbc008a8 0 2 0 0 cbe_low_level
ffffffff8b487d80 0 4 ffffffff91113200 0 power_soft_intr
ffffffff861db180 0 4 ffffffff856bb198 0 ghd_doneq_process
ffffffff854e6880 0 4 ffffffff856bb198 0 ghd_timeout_softintr
ffffffff8453c800 0 4 0 0 asysoftintr
ffffffff84536d80 0 4 ffffffff842248d8 0 ghd_doneq_process
ffffffff84536c80 0 4 ffffffff842248d8 0 ghd_timeout_softintr
ffffffff8453cf80 0 9 ffffffff8558eda0 0 hcdi_soft_intr
ffffffff8453cb00 0 9 ffffffff82cd88c0 0 hcdi_soft_intr
fffffffffbc00868 0 10 0 0 cbe_softclock

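In this output, the first column is the software interrupt identifier (ADDR), the third is the interrupt priority level (PIL), the fourth and fifth are the two handler arguments (ARG1, ARG2), and the last is the interrupt service routine. There are two argument columns because, with the Advanced DDI Interrupt Interfaces, interrupt service routines take two arguments.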


Friday Apr 14, 2006

Equivalent of 'cat /proc/interrupts'

        New MDB dcmd ::interrupts

Solaris Express now supports a new MDB dcmd, ::interrupts,
whose output is similar to that of 'cat /proc/interrupts' on Linux.

Output of ::interrupts on an x64 system running 64-bit Solaris Express
(the system is using the I/O APIC):

# echo ::interrupts | mdb -k
IRQ  Vector IPL Bus   Type  CPU Share APIC/INT# ISR(s)
3    0xb1   12  ISA   Fixed 1    1    0x0/0x3   asyintr
4    0xb0   12  ISA   Fixed 1    1    0x0/0x4   asyintr
6    0x42   5   ISA   Fixed 3    1    0x0/0x6   fdc_intr
9    0x80   9   PCI   Fixed 1    1    0x0/0x9   acpi_wrapper_isr
15   0x41   5   ISA   Fixed 2    1    0x0/0xf   ata_intr
16   0x81   9   PCI   Fixed 1    3    0x0/0x10  hci1394_isr, uhci_intr, uhci_intr
17   0x82   9   PCI   Fixed 2    1    0x0/0x11  audio810_intr
18   0x40   5   PCI   Fixed 3    3    0x0/0x12  ata_intr, uhci_intr, ata_intr
19   0x22   1   PCI   Fixed 1    1    0x0/0x13  uhci_intr
23   0x20   1   PCI   Fixed 0    1    0x0/0x17  ehci_intr
24   0x45   5   PCI   Fixed 0    1    0x1/0x0   adpu320_intr
48   0x60   6   PCI   Fixed 2    1    0x2/0x0   e1000g_intr
72   0x43   5         MSI   3    1    -         mpt_intr
73   0x44   5         MSI   3    1    -         mpt_intr
160  0xa0   0         IPI   ALL  0    -         poke_cpu
192  0xc0   13        IPI   ALL  1    -         xc_serv
208  0xd0   14        IPI   ALL  1    -         kcpc_hw_overflow_intr
209  0xd1   14        IPI   ALL  1    -         cbe_fire
210  0xd3   14        IPI   ALL  1    -         cbe_fire
224  0xe0   15        IPI   ALL  1    -         xc_serv
225  0xe1   15        IPI   ALL  1    -         apic_error_intr

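Here IPL is the interrupt priority level, APIC/INT# identifies the I/O APIC
and its input pin, CPU is the CPU the interrupt is directed to, and Type
shows whether Fixed (legacy) or MSI interrupts are being used.
The IPI type denotes interprocessor interrupts, such as those used for cross-calls (xcalls).
The Share column shows whether the interrupt is shared and by how many handlers.
With the -d option, driver names are shown as <driver_name>#<instance#> instead of ISR names.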
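If you post-process this output, the APIC/INT# column is easy to decode: two hexadecimal numbers separated by a slash, with '-' for MSI and IPI rows. The helper below is a hypothetical sketch (parse_apic_int() is not part of mdb) that assumes the field looks exactly as in the sample output above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: split an ::interrupts "APIC/INT#" field such as
 * "0x1/0x0" into the I/O APIC number and its input pin.
 * Returns 1 on success, 0 for "-" (MSI and IPI rows) or malformed input.
 */
int
parse_apic_int(const char *field, unsigned int *apic, unsigned int *pin)
{
	assert(field != NULL && apic != NULL && pin != NULL);

	/* %x accepts an optional 0x prefix, as strtoul() does in base 16 */
	if (sscanf(field, "%x/%x", apic, pin) != 2)
		return (0);
	return (1);
}
```

With the sample above, the IRQ 24 entry (0x1/0x0) decodes to I/O APIC 1, pin 0.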

On a uppc(7d) based system, the output looks like this:

# echo ::interrupts | mdb -k
IRQ  Vector IPL(lo/hi) Bus Share ISR(s)
  0   0x20     14/14    -     1  cbe_fire
  3   0x23     12/12   ISA    1  asyintr
  4   0x24     12/12   ISA    1  asyintr
  5   0x25      1/1    PCI    1  ehci_intr
  6   0x26      5/5    ISA    1  fdc_intr
  7   0x27      5/5    ISA    1  ecpp_isr
  9   0x29      9/9     -     1  acpi_wrapper_isr
 10   0x2a      1/9    PCI    3  e1000g_intr, hci1394_isr, uhci_intr
 11   0x2b      1/1    PCI    1  uhci_intr
 12   0x2c      1/9    PCI    2  uhci_intr, audiovia823x_intr
 14   0x2e      5/5    PCI    1  ata_intr
 15   0x2f      5/5    PCI    1  ata_intr
# echo ::interrupts -d | mdb -k
IRQ  Vector IPL(lo/hi) Bus Share Driver Name(s)
  0   0x20     14/14    -     1  cbe_fire
  3   0x23     12/12   ISA    1  asy#1
  4   0x24     12/12   ISA    1  asy#0
  5   0x25      1/1    PCI    1  ehci#0
  6   0x26      5/5    ISA    1  fdc#0
  7   0x27      5/5    ISA    1  ecpp#0
  9   0x29      9/9     -     1  acpi_wrapper_isr
 10   0x2a      1/9    PCI    3  e1000g#0, hci1394#0, uhci#1
 11   0x2b      1/1    PCI    1  uhci#0
 12   0x2c      1/9    PCI    2  uhci#2, audiovia823x#0
 14   0x2e      5/5    PCI    1  ata#0
 15   0x2f      5/5    PCI    1  ata#1

On a Niagara based SunFire T2000, the output looks like this:

# echo ::interrupts | mdb -k

        Device   Shared  Type    MSG #   State   INO     Mondo    Pil    CPU   
      e1000g#1      no   MSI        1    enbl    0x19    0x799      6     26 
      e1000g#0      no   MSI        0    enbl    0x18    0x798      6     18 
          px#0      no   PCIe      27    enbl    0x3b    0x7bb      1     30 
          px#0      no   PCIe      51    enbl    0x3a    0x7ba     14     31 
          px#0      no   PCIe      49    enbl    0x39    0x7b9     14      0 
          px#0      no   PCIe      48    enbl    0x38    0x7b8      9      1 
      e1000g#3      no   MSI        2    enbl    0x1a    0x7da      6     28 
      e1000g#2      no   MSI        1    enbl    0x19    0x7d9      6     27 
          su#0      no   Fixed    ---    enbl    0x2     0x7c2     12     20 
        uata#0      no   Fixed    ---    enbl    0x4     0x7c4      4      3 
        ohci#1      no   Fixed    ---    enbl    0x3     0x7c3      9     22 
        ohci#0      no   Fixed    ---    enbl    0x1     0x7c1      9     23 
         mpt#0      no   MSI        0    enbl    0x18    0x7d8      4     19 
          px#1      no   PCIe      27    enbl    0x3b    0x7fb      1     24 
          px#1      no   PCIe      51    enbl    0x3a    0x7fa     14     25 
          px#1      no   PCIe      49    enbl    0x39    0x7f9     14     26 
          px#1      no   PCIe      48    enbl    0x38    0x7f8      9     27 

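Here, interrupt Type PCIe denotes interrupts signaled by PCI Express messages rather than by MSI or a fixed pin; the MSG # values 27, 48, 49 and 51 match the PCI Express PME_TO_Ack, ERR_COR, ERR_NONFATAL and ERR_FATAL message codes.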

On a SunFire V890, which does not support MSI interrupts, the output looks like this:

# echo ::interrupts | mdb -k
        Device   Shared  Type    MSG #   State   INO     Mondo    Pil    CPU   
        uata#0      no   Fixed    ---    enbl    0x1c    0x21c      4     19 
          ge#0      no   Fixed    ---    enbl    0x0     0x200      6      7 
         qlc#0      no   Fixed    ---    enbl    0x4     0x204      4     18 
          su#1      no   Fixed    ---    enbl    0x2d    0x26d     12      1 
          su#0      no   Fixed    ---    enbl    0x2e    0x26e     12      0 
     pcf8584#2     yes   Fixed    ---    enbl    0x28    0x268      4      2 
     pcf8584#3     yes   Fixed    ---    enbl    0x28    0x268      4      2 
         eri#0      no   Fixed    ---    enbl    0x1d    0x25d      6     17 
        ohci#0      no   Fixed    ---    enbl    0x1f    0x25f      9     20 
          se#0      no   Fixed    ---    enbl    0x22    0x262     12     21 
   todds1287#0      no   Fixed    ---    enbl    0x24    0x264     15     22 
     hpc3130#3      no   Fixed    ---    enbl    0x26    0x266      1     23 
     hpc3130#0     yes   Fixed    ---    enbl    0x27    0x267      1      7 
     hpc3130#1     yes   Fixed    ---    enbl    0x27    0x267      1      7 
     hpc3130#2     yes   Fixed    ---    enbl    0x27    0x267      1      7 
     pcf8584#1     yes   Fixed    ---    enbl    0x23    0x263      4      0 
     pcf8584#0     yes   Fixed    ---    enbl    0x23    0x263      4      0 


Wednesday Mar 29, 2006

New Article on Advanced DDI interrupt interfaces

A new article that has all the details about
Advanced DDI Interrupt Handlers has been posted.

Thanks to John Stearns for making it possible.


Wednesday Dec 28, 2005

A few more MSI-capable drivers

Since initial MSI support was added to OpenSolaris, the number of drivers that
use MSIs (when the underlying hardware supports them) has grown. The list includes:
  • bge (Broadcom GbE)
  • mpt (LSI Logic SCSI HBA)
  • ehci/ohci/uhci (USB EHCI/OpenHCI/UHCI)
  • e1000g (Intel GbE)
  • tavor (InfiniBand HCA)
  • px_pci (SPARC PCIe-to-PCI bridge driver)
  • qlc (QLogic FC HBA)

In addition, the following drivers use the new Advanced DDI Interrupt Framework natively.
  • audio1575 (SPARC audio driver)
  • pcie_pci (x86 PCIe bridge driver)
  • qcn (on SPARC servers)
  • rge (Realtek GbE driver)
  • nge (Nvidia GbE driver)
  • SATA HBA drivers


Friday Nov 04, 2005

Solaris b26: changes to DDI interrupt interfaces

Legacy DDI interrupt interfaces update

More legacy DDI interrupt interfaces are obsoleted

With Solaris build 26 (b26), the following DDI interrupt interfaces and data structures are being obsoleted:
  • ddi_dev_nintrs(9f)
  • ddi_get_iblock_cookie(9f)
  • ddi_intr_hilevel(9f)
  • ddi_add_intr(9f)
  • ddi_remove_intr(9f)
  • ddi_get_soft_iblock_cookie(9f)
  • ddi_add_softintr(9f)
  • ddi_remove_softintr(9f)
  • ddi_trigger_softintr(9f)
  • ddi_idevice_cookie(9s)

See the revised Appendix B of the latest Solaris Express Writing Device Drivers (WDD) guide, which maps each of the
obsoleted interfaces and data structures above to its replacement among the new DDI interrupt interfaces and data structures.

We strongly urge device driver developers to start using the new DDI interrupt interfaces.


Monday Jul 11, 2005

Solaris Express 6/05 WDD released

The Writing Device Drivers (WDD) guide has been updated for Solaris Express 6/05. It has details about the Advanced DDI interrupt interfaces.


Solaris Express 6/05 announces new DDI interrupt interfaces

The new DDI interrupt interfaces are announced in Solaris Express 6/05.
Check out What's New in Solaris Express 6/05.

Friday Jun 24, 2005

Device Driver Development: How to add interrupts in a driver


                    New DDI interrupt interfaces

This writeup is about the new DDI interrupt interfaces introduced in OpenSolaris.
The older DDI interrupt interfaces are being replaced for the following reasons:
    • Support new interrupt types such as Message Signaled Interrupts (MSI and MSI-X)
    • Support new features such as get/set priority, get/set device interrupt capability, get interrupt pending information, set/clear interrupt mask, etc.
    • Support new bus technologies such as PCI Express
    • Provide a generic framework that can support other new (and unknown) interrupt types where possible
    • Support for multiple interrupts by a single device/function
    • Existing interfaces are antiquated and provide limited features.

The Advanced Interrupt Handlers article, added recently, replaces most of this blog entry.
Topics not covered there are retained here.

New DDI Interrupt Data structures

See the OpenSolaris source code, which defines the data structures used with the new DDI interrupt framework:
    • Interrupt handles
    • Interrupt handler
    • Priority for Normal and Soft interrupts

Interrupt handles

All DDI interrupt interfaces now take an interrupt handle as an argument.
  • Normal (hardware) interrupts use this handle
    • ddi_intr_handle_t
  • Soft interrupts use this handle
    • ddi_softint_handle_t

Interrupt handler

The interrupt handler now takes two arguments:
/* Typedef for driver's interrupt handler */
typedef uint_t (ddi_intr_handler_t)(caddr_t arg1, caddr_t arg2);
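A handler written against this typedef receives the two arguments that the driver registered along with it. The userland sketch below assumes the uint_t/caddr_t form of the typedef given in the ddi_intr_add_handler(9F) manual page; uint_t, caddr_t and the DDI_INTR_CLAIMED/DDI_INTR_UNCLAIMED values are local stand-ins for the kernel headers, and struct mydev is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-ins for kernel types and return codes (assumptions). */
typedef unsigned int uint_t;
typedef char *caddr_t;
#define	DDI_INTR_UNCLAIMED	0	/* stand-in value */
#define	DDI_INTR_CLAIMED	1	/* stand-in value */

typedef uint_t (ddi_intr_handler_t)(caddr_t arg1, caddr_t arg2);

/* Hypothetical per-instance soft state. */
struct mydev {
	int	intr_pending;	/* set by the "hardware" in this simulation */
	int	serviced;	/* interrupts handled so far */
};

/*
 * arg1 carries the soft state; arg2 is a second, independent argument
 * (unused here).  Claim the interrupt only if our device raised it: on a
 * shared line, another handler may own it.
 */
static uint_t
mydev_intr(caddr_t arg1, caddr_t arg2)
{
	struct mydev *mydevp = (struct mydev *)(void *)arg1;

	(void) arg2;
	assert(mydevp != NULL);
	if (!mydevp->intr_pending)
		return (DDI_INTR_UNCLAIMED);
	mydevp->intr_pending = 0;
	mydevp->serviced++;
	return (DDI_INTR_CLAIMED);
}

/* The handler must match the typedef exactly. */
ddi_intr_handler_t *mydev_handler = mydev_intr;
```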

Interrupt Priority

Normal interrupt priority should be within the two defines shown below:
#define DDI_INTR_PRI_MIN 1
#define DDI_INTR_PRI_MAX 12

For most drivers, priority is a small integer in the range DDI_INTR_PRI_MIN to DDI_INTR_PRI_MAX
and represents a virtual priority. It is used in lock initialization calls such as mutex_init and rw_init.

Soft interrupt priority should be within the two defines shown below:
#define DDI_INTR_SOFTPRI_MIN 1
#define DDI_INTR_SOFTPRI_MAX 9

By default, most drivers can use the default soft priority, which is defined as
#define DDI_INTR_SOFTPRI_DEFAULT DDI_INTR_SOFTPRI_MIN

Calls to mutex_init() should use the DDI_INTR_PRI() macro:
#define DDI_INTR_PRI(pri) (void *)((uintptr_t)(pri))
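The priority rule above can be sketched as a range check followed by the DDI_INTR_PRI() conversion. This is a userland mock: check_and_convert_pri() and pri_from_cookie() are hypothetical helpers, and the constants are copied from the defines shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	DDI_INTR_PRI_MIN	1
#define	DDI_INTR_PRI_MAX	12
#define	DDI_INTR_PRI(pri)	(void *)((uintptr_t)(pri))

/*
 * Hypothetical helper: return the lock-initialization argument for a
 * normal interrupt priority, or NULL if the priority is out of range.
 */
void *
check_and_convert_pri(unsigned int pri)
{
	if (pri < DDI_INTR_PRI_MIN || pri > DDI_INTR_PRI_MAX)
		return (NULL);
	return (DDI_INTR_PRI(pri));
}

/* Recover the integer priority from the opaque pointer form. */
unsigned int
pri_from_cookie(void *cookie)
{
	assert(cookie != NULL);
	return ((unsigned int)(uintptr_t)cookie);
}
```

In a driver, the converted value is what would be passed to mutex_init(9F), for example mutex_init(&mydevp->lock, NULL, MUTEX_DRIVER, DDI_INTR_PRI(pri)), where mydevp->lock is a hypothetical field.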

How Legacy Interrupt Functions map to new interfaces

To take advantage of the features of the enhanced DDI interrupt framework,
developers should use the new interfaces and avoid the following legacy interfaces,
which are retained for compatibility purposes only.

Legacy Interrupt Functions          Replacements (Recommended)

ddi_add_intr(9F)                    Three-step process:
                                      1. ddi_intr_alloc(9F)
                                      2. ddi_intr_add_handler(9F)
                                      3. ddi_intr_enable(9F)
ddi_add_softintr(9F)                ddi_intr_add_softint(9F)
ddi_dev_nintrs(9F)                  ddi_intr_get_nintrs(9F)
ddi_get_iblock_cookie(9F)           Three-step process:
                                      1. ddi_intr_alloc(9F)
                                      2. ddi_intr_get_pri(9F)
                                      3. ddi_intr_free(9F)
ddi_get_soft_iblock_cookie(9F)      Three-step process:
                                      1. ddi_intr_add_softint(9F)
                                      2. ddi_intr_get_softint_pri(9F)
                                      3. ddi_intr_remove_softint(9F)
ddi_intr_hilevel(9F)                ddi_intr_get_hilevel_pri(9F)
ddi_remove_intr(9F)                 ddi_intr_remove_handler(9F)
ddi_remove_softintr(9F)             ddi_intr_remove_softint(9F)
ddi_trigger_softintr(9F)            ddi_intr_trigger_softint(9F)
ddi_idevice_cookie(9S)              Not applicable
ddi_iblock_cookie(9S)               (void *)(uintptr_t) of the interrupt priority
  for mutex_init(9F) etc.
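For porting work, the function rows of this table can be treated as data. The sketch below is a hypothetical lookup helper, not part of any Solaris tool; it assumes the first row describes ddi_add_intr(9F), and it leaves out ddi_idevice_cookie(9S), which has no replacement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the legacy-to-new mapping table. */
struct intr_map {
	const char	*legacy;
	const char	*replacement;
};

static const struct intr_map intr_table[] = {
	{ "ddi_add_intr",
	    "ddi_intr_alloc, ddi_intr_add_handler, ddi_intr_enable" },
	{ "ddi_add_softintr", "ddi_intr_add_softint" },
	{ "ddi_dev_nintrs", "ddi_intr_get_nintrs" },
	{ "ddi_get_iblock_cookie",
	    "ddi_intr_alloc, ddi_intr_get_pri, ddi_intr_free" },
	{ "ddi_get_soft_iblock_cookie",
	    "ddi_intr_add_softint, ddi_intr_get_softint_pri, "
	    "ddi_intr_remove_softint" },
	{ "ddi_intr_hilevel", "ddi_intr_get_hilevel_pri" },
	{ "ddi_remove_intr", "ddi_intr_remove_handler" },
	{ "ddi_remove_softintr", "ddi_intr_remove_softint" },
	{ "ddi_trigger_softintr", "ddi_intr_trigger_softint" },
};

/* Return the replacement for a legacy function, or NULL if unknown. */
const char *
intr_replacement(const char *legacy)
{
	size_t i;

	assert(legacy != NULL);
	for (i = 0; i < sizeof (intr_table) / sizeof (intr_table[0]); i++) {
		if (strcmp(intr_table[i].legacy, legacy) == 0)
			return (intr_table[i].replacement);
	}
	return (NULL);
}
```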

New Interrupt Interface Examples

This section provides examples for performing the following tasks:
  • Changing soft interrupt priority
  • Checking for pending interrupts
  • Setting interrupt masks
  • Clearing interrupt masks

See Advanced Interrupt Handlers Article for examples on other interfaces.

Example: Using the ddi_intr_set_softint_pri() Function

/* Change the soft interrupt priority to 9 */
if (ddi_intr_set_softint_pri(mydevp->mydev_softint_hdl, 9) != DDI_SUCCESS)
	cmn_err(CE_WARN, "ddi_intr_set_softint_pri failed");

Example: Using the ddi_intr_get_pending() Function

/* Check if an interrupt is pending */
int pending;

if (ddi_intr_get_pending(mydevp->htable[0], &pending) != DDI_SUCCESS) {
	cmn_err(CE_WARN, "ddi_intr_get_pending() failed");
} else if (pending) {
	cmn_err(CE_NOTE, "ddi_intr_get_pending(): Interrupt pending");
}

Example: Using the ddi_intr_set_mask() Function

/* Set interrupt masking to prevent the device from generating interrupts */
if (ddi_intr_set_mask(mydevp->htable[0]) != DDI_SUCCESS)
	cmn_err(CE_WARN, "ddi_intr_set_mask() failed");

Example: Using the ddi_intr_clr_mask() Function

/* Clear interrupt masking. If successful, the device will start generating interrupts */
if (ddi_intr_clr_mask(mydevp->htable[0]) != DDI_SUCCESS)
	cmn_err(CE_WARN, "ddi_intr_clr_mask() failed");


1. See the bge_main.c source code in OpenSolaris, which has examples of how to use the
    new DDI interrupt interfaces for both legacy (fixed) interrupts and MSI interrupts.
2. See the blog entry on MSI and MSI-X.
3. See the blog entry on how interrupts work on Solaris on x86 platforms.
4. See the new article Advanced Interrupt Handlers.


Tuesday Jun 14, 2005

Hardware interrupts overview for Solaris X86


Welcome to OpenSolaris, and the wonders of Solaris 10.

This paper provides a brief introduction to hardware interrupts on x86 platforms, and is relevant for both Intel and AMD based platforms. Interrupts are handled by interrupt controller hardware in the system and are mostly delivered as sideband signals. In-band interrupts, such as the Message Signaled Interrupts (MSIs) introduced with the PCI 2.2 specification, will be discussed in another blog entry. MSIs are becoming mainstream with the advent of new interconnects such as PCI Express.

There are two main kinds of hardware interrupt controllers commonly used on x86 platforms:

1. 82c59(A) PIC (Programmable Interrupt Controller)

This is supported by the Solaris uppc(7d) module, whose source is located at usr/src/uts/i86pc/io/psm/uppc.c. Each PIC can handle 8 vectored, prioritized interrupt sources, and two PICs are cascaded to provide 16 interrupt lines on x86 systems. However, one pin of the first PIC, IRQ2, is used to cascade the second PIC, so only 15 interrupt sources are available. The PIC cannot be used on multiprocessor (MP) systems without major modifications.
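The arithmetic behind the 15 usable sources can be checked with a small model. The names below are hypothetical and are not taken from uppc.c.

```c
#include <assert.h>

#define	PIC_LINES	8	/* inputs per PIC */
#define	NPICS		2	/* first (master) and second (slave) PIC */
#define	CASCADE_IRQ	2	/* first PIC's input wired to the second PIC */

/* Is this IRQ usable by a device on a two-PIC cascaded system? */
int
pic_irq_usable(int irq)
{
	assert(irq >= 0);
	return (irq < PIC_LINES * NPICS && irq != CASCADE_IRQ);
}

/* Count device-usable IRQs: 16 lines minus the cascade input. */
int
pic_usable_count(void)
{
	int irq, n = 0;

	for (irq = 0; irq < PIC_LINES * NPICS; irq++)
		n += pic_irq_usable(irq);
	return (n);
}
```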

2. APIC (Advanced Programmable Interrupt controller)

This is supported by the Solaris pcplusmp(7d) module, whose source is located at usr/src/uts/i86pc/io/pcplusmp/apic.c. It consists of two components: the I/O APIC and the Local APIC. The Local APIC is embedded in the CPU, while the I/O APIC connects the interrupting sources. The Local APIC can also send interprocessor interrupts from one CPU to another, which is why the APIC is used on x86 MP systems. A system can have multiple I/O APICs, and each I/O APIC can have 4, 16, 20 or 24 interrupt pins. Because the Local APIC is built into the CPU and the I/O APIC can handle more than 16 interrupt sources, even single-CPU systems use the APIC rather than the PIC.

Many systems have I/O APICs with 4 inputs. This is typical of PCI-X slotted systems, where each slot is given a dedicated I/O APIC so that each of the slot's INTA-INTD lines has its own input.

Solaris supports multiple I/O APICs.

Here is a diagram that shows the APIC on a two-CPU system:

                processor #1          processor #2
               +-------------+       +-------------+
               |             |       |             |
               |     CPU     |       |     CPU     |
               |             |       |             |
               +-------------+       +-------------+
               | local APIC  |       | local APIC  |
               +-------------+       +-------------+
                      ^                     ^
                      |                     |
                      v                     v
      +------------------------------------------------+
      |  <==========================================>  |  Processor system bus
      |                        ^                       |
      |                        |                       |
      |                        v                       |
      |                   +--------+                   |
      |                   | Bridge |                   |
      |                   +--------+                   |
      |                        ^                       |
      |                        |                       |
      |    PCI                 v                       |
      |  <------------------------------------------>  |
      |                        ^                       |
      |                        |                       |
      |                        v                       |
      |                  +----------+                  |
      |                  |          |<-----------------|---- External
      |                  | I/O APIC |<-----------------|---- Interrupts
      |                  |          |                  |
      |                  +----------+                  |
      |                                                |
      +------------------------------------------------+
                        System chipset

Solaris x86 interrupt handling overview

When a device driver adds an interrupt through ddi_add_intr(9f), the call eventually reaches uppc_addspl() in uppc(7d) if the system uses the PIC, or apic_addspl() in pcplusmp(7d) if it uses the APIC. The interrupt pin is identified and then enabled. The uppc(7d) case is quite simple: the interrupt pin to enable on the PICs is mapped 1-1 to the "IRQ#", or "interrupt" property, of the device on Solaris. For the APIC (pcplusmp(7d)), it is a lot more complicated, because internally it uses either the MP Specification 1.4 [2] or the ACPI specification [3] to locate the right interrupt pin on the right I/O APIC for the device. The system BIOS sets up how the interrupts are routed and saves that information either in the MP Specification table or somewhere that ACPI can easily access it. pcplusmp(7d) then uses that information to initialize and add the interrupts.

During the ddi_add_intr(9f) call, the device interrupt handler's entry point is stored in autovect[], and the interrupt pin is enabled through uppc_addspl() (for the PIC) or apic_addspl() (for the APIC). Before the interrupt pin is enabled, an interrupt vector (referred to as "vector" from now on) must be selected for the CPU to trigger when that particular interrupt arrives. Intel CPUs have a total of 256 vectors; the first 32 are reserved for special functions, so the first vector available for devices is 32 (hex 0x20).

For uppc(7d), vectors are set up so that the first pin, IRQ0, maps to vector 32, IRQ1 to 33, and so on. For pcplusmp(7d) it is not as simple. Solaris handles interrupts based on interrupt priority, and each device is assigned a priority (which the device driver can modify). For example, if device "abc" is assigned priority 5, no other interrupt at priority 5 or lower can be triggered while the interrupt handler of "abc" is executing, but an interrupt of priority 6 or higher can be. Since the APIC has its own mechanism for prioritizing interrupts, pcplusmp(7d) must select the vectors accordingly.
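The uppc(7d) mapping is a fixed offset past the 32 reserved vectors, and the uppc ::interrupts sample in the Apr 14, 2006 entry agrees with it (IRQ 3 uses vector 0x23, IRQ 10 uses 0x2a). A one-function model, with hypothetical names:

```c
#include <assert.h>

#define	NUM_RESERVED_VECTORS	32	/* vectors 0-31 are reserved by the CPU */
#define	UPPC_NIRQ		16	/* IRQ lines on a cascaded PIC pair */

/* uppc-style mapping: IRQ n -> vector 32 + n (0x20 + n). */
int
uppc_irq_to_vector(int irq)
{
	assert(irq >= 0 && irq < UPPC_NIRQ);
	return (NUM_RESERVED_VECTORS + irq);
}
```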

To handle interrupt priority properly, uppc(7d) and pcplusmp(7d) provide a few internal interfaces: uppc_intr_enter()/uppc_intr_exit() for uppc(7d), and apic_intr_enter()/apic_intr_exit() for pcplusmp(7d). After an interrupt is triggered but before its handler is called, uppc_intr_enter() or apic_intr_enter() is called to set the interrupt priority so that all other interrupts with the same or lower priority are blocked. After the interrupt handler completes, uppc_intr_exit() or apic_intr_exit() is called to restore the interrupt priority.
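The enter/exit pairing amounts to saving the current priority, raising it for the duration of the handler and restoring it afterwards. The toy model below (hypothetical names; the real routines program the PIC or APIC hardware) captures only that bookkeeping and the rule that an interrupt is taken only if its priority is higher than the current one.

```c
#include <assert.h>

/* Toy per-CPU priority state. */
struct cpu_pri {
	int	cur_pri;	/* current interrupt priority of this CPU */
};

/* An interrupt is taken only if its priority is above the current level. */
int
intr_can_preempt(const struct cpu_pri *cp, int pri)
{
	return (pri > cp->cur_pri);
}

/* Raise to the handler's priority; return the old level for exit. */
int
intr_enter(struct cpu_pri *cp, int pri)
{
	int old = cp->cur_pri;

	assert(intr_can_preempt(cp, pri));
	cp->cur_pri = pri;
	return (old);
}

/* Restore the level saved by intr_enter() once the handler completes. */
void
intr_exit(struct cpu_pri *cp, int old)
{
	cp->cur_pri = old;
}
```

With device "abc" at priority 5 running, intr_can_preempt() rejects priorities 5 and below and accepts 6 and above.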

On the x86 platform, the local variables of an interrupt handler are on the stack. If the handler calls another function, the parameters passed to that function are on the stack too. In other words, every interrupt handler uses the stack one way or another.

Solaris code that handles interrupts

Below are a few code snippets that deal with interrupts.

To begin with, the 256 vector entries are defined in the autovect[] table shown below:

#define	MAX_VECT	256
struct av_head autovect[MAX_VECT];

struct autovec {
	/*
	 * Interrupt handler and argument to pass to it.
	 */
	struct autovec	*av_link;	/* pointer to next one in chain */
	uint_t		(*av_vector)();
	caddr_t		av_intarg;
	uint_t		av_prilevel;	/* priority level */

	/*
	 * Interrupt handle/id (like intrspec structure pointer) used to
	 * identify a specific instance of interrupt handler in case we
	 * have to remove the interrupt handler later.
	 */
	void		*av_intr_id;
	dev_info_t	*av_dip;
};

av_vector is the device interrupt handler.

struct av_head {
	struct autovec	*avh_link;
	ushort_t	avh_hi_pri;
	ushort_t	avh_lo_pri;
};
- Every interrupt runs at some priority level, and LOCK_LEVEL is the ceiling for thread-based handling:
interrupts below LOCK_LEVEL run as threads.


#define CLOCK_LEVEL 10
#define LOCK_LEVEL 10

- The following sequence shows what is done for each CPU to allocate
 enough interrupt threads to handle the interrupts. Since interrupts
are prioritized, one interrupt thread per priority should be sufficient.


start_other_cpus(int cprboot)
{
	...
	for (who = 0; who < NCPU; who++) {
		...
	}
	...
}

/*
 * Allocate threads and stacks for interrupt handling.
 */
#define	NINTR_THREADS	(LOCK_LEVEL-1)	/* number of interrupt threads */

init_intr_threads(struct cpu *cp)
{
	int i;

	for (i = 0; i < NINTR_THREADS; i++)
		(void) thread_create_intr(cp);
	...
}

thread_create_intr(struct cpu *cp)
- The actual code that handles interrupts on x86 (not reproduced here) uses *setlvl,
which is the wrapper for uppc_intr_enter() or apic_intr_enter().

NOTE: Calling the interrupt handler is done in low level assembly code
which is not discussed here.
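The one-thread-per-level rule can be checked with a toy model. It follows the rule exactly as stated above (interrupts below LOCK_LEVEL run as threads), its names are hypothetical, and it does not reproduce thread_create_intr().

```c
#include <assert.h>

#define	LOCK_LEVEL	10
#define	NINTR_THREADS	(LOCK_LEVEL - 1)	/* number of interrupt threads */
#define	MAX_PIL		15			/* highest level shown in ::interrupts */

/* Rule from the text: interrupts below LOCK_LEVEL run as threads. */
int
runs_as_thread(int pil)
{
	assert(pil >= 1 && pil <= MAX_PIL);
	return (pil < LOCK_LEVEL);
}

/*
 * Toy per-CPU pool: one thread per thread-capable level.  Because a
 * level-p interrupt can only be preempted by a higher level, at most one
 * interrupt per level is active on a CPU at any time.
 */
struct cpu_intr_pool {
	int	nthreads;
};

void
init_intr_pool(struct cpu_intr_pool *cp)
{
	int pil;

	cp->nthreads = 0;
	for (pil = 1; pil <= MAX_PIL; pil++)
		if (runs_as_thread(pil))
			cp->nthreads++;
	assert(cp->nthreads == NINTR_THREADS);
}
```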


1. See chapters 5 and 7 of the Intel Architecture Software Developer's Manual, Volume 3: System Programmer Guide, for details on how interrupts work on the x86 platform.
2. Intel Multi-Processor Specification v1.4
3. Advanced Configuration & Power Interface (ACPI) specification home

PS: Lots of thanks to Johnny Cheung, also in Solaris I/O, for originally contributing to this material.


Using Cfgadm with InfiniBand

Welcome to OpenSolaris, and the wonders of Solaris 10.

This entry provides a brief introduction to managing InfiniBand devices with cfgadm(1m).


An InfiniBand (IB) device is enumerated by the IB nexus driver, ib(7D), based on the interfaces provided by the IB Device Manager (IBDM). The IB nexus driver creates and initializes five types of device nodes:
    • IB Port devices
    • IB HCA service (HCA_SVC) devices
    • IB Virtual Physical Point of Attachment (VPPA) devices
    • I/O Controller (IOC)
    • IB Pseudo devices
See ib(7d), ibdm(7d), ibtl(7d) and ib(4) for details on the InfiniBand nexus driver, Device Manager and related components.

Attachment Point format

The InfiniBand cfgadm plugin supports the five device node types above as 'dynamic' attachment points:

Device Type          Attachment Point Format
Port devices         ib::PORT_GUID,0,servicename
HCA_SVC devices      ib::HCA_GUID,0,servicename
VPPA devices         ib::PORT_GUID,P_Key,servicename
IOC devices          ib::IOC_GUID
Pseudo devices       ib::driver_name,unit_address

servicename    is the name of the communication service
P_Key          is the Partition Key

In addition, two 'static' attachment points are supported:

Static Attachment Point     Attachment Point Format
IB Fabric                   ib
Host Channel Adapter(s)     hca:HCA_GUID
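Port, VPPA and HCA_SVC attachment point IDs share a three-part layout after the ib:: prefix, as in ib::2C90109764441,ffff,ipib from the listing below (GUID, P_Key, service name). The parser below is a hypothetical illustration, not part of cfgadm; it assumes hexadecimal GUID and P_Key fields as they appear in that listing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Components of a three-field dynamic InfiniBand Ap_Id. */
struct ib_apid {
	char		guid[17];	/* up to 16 hex digits */
	unsigned long	pkey;		/* 0 for Port and HCA_SVC devices */
	char		service[32];	/* communication service name */
};

/* Parse "ib::GUID,P_Key,servicename"; return 1 on success, else 0. */
int
parse_ib_apid(const char *apid, struct ib_apid *out)
{
	char pkey[8];

	assert(apid != NULL && out != NULL);
	if (strncmp(apid, "ib::", 4) != 0)
		return (0);
	if (sscanf(apid + 4,
	    "%16[0123456789ABCDEFabcdef],%7[0123456789ABCDEFabcdef],%31s",
	    out->guid, pkey, out->service) != 3)
		return (0);
	out->pkey = strtoul(pkey, NULL, 16);
	return (1);
}
```

Pseudo device IDs such as ib::daplt,0 and static IDs such as hca:2C90109764440 do not match this layout, so the parser rejects them.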

The example below shows all InfiniBand devices:
# cfgadm -a ib hca:2C90109764440 
Ap_Id                          Type         Receptacle   Occupant     Condition
hca:2C90109764440              IB-HCA       connected    configured   ok
ib                             IB-Fabric    connected    configured   ok
ib::2C90109764440,0,svch       IB-HCA_SVC   connected    unconfigured unknown
ib::2C90109764441,0,psvc       IB-PORT      connected    unconfigured unknown
ib::2C90109764441,ffff,ipib    IB-VPPA      connected    unconfigured unknown
ib::2C90109764442,0,psvc       IB-PORT      connected    unconfigured unknown
ib::2C90109764442,ffff,ipib    IB-VPPA      connected    unconfigured unknown
ib::daplt,0                    IB-PSEUDO    connected    configured   ok
ib::rpcib,0                    IB-PSEUDO    connected    configured   ok

The example below shows how to list all kernel clients of a given InfiniBand Host Channel Adapter.
# cfgadm -x list_clients  hca:2C90109764440
Ap_Id                          IB Client                 Alternate HCA
ib::daplt,0                    daplt                     no
ib::rpcib,0                    nfs/ib                    no
-                              ibmf                      no
-                              ibdm                      no

Configure/Unconfigure an IB device

Using cfgadm commands, any of the five device types listed above can be configured for operation or unconfigured. The example below uses an IB Pseudo device, but the same commands apply to any device when the appropriate attachment point is supplied as the command-line argument:

Configuring a device:

# cfgadm -a ib::daplt,0             
Ap_Id                          Type         Receptacle   Occupant     Condition
ib::daplt,0                    IB-PSEUDO    connected    unconfigured unknown
# cfgadm -yc configure ib::daplt,0  
# cfgadm -a ib::daplt,0           
Ap_Id                          Type         Receptacle   Occupant     Condition
ib::daplt,0                    IB-PSEUDO    connected    configured   ok
Unconfiguring a device:
# cfgadm -a ib::daplt,0
Ap_Id                          Type         Receptacle   Occupant     Condition
ib::daplt,0                    IB-PSEUDO    connected    configured   ok
# cfgadm -yc unconfigure ib::daplt,0
# cfgadm -a ib::daplt,0             
Ap_Id                          Type         Receptacle   Occupant     Condition
ib::daplt,0                    IB-PSEUDO    connected    unconfigured unknown
The example below shows how to unconfigure all kernel clients of a given InfiniBand Host Channel Adapter.
# cfgadm -x unconfig_clients hca:2C90109764440 
Unconfigure Clients of HCA /devices/ib:2C90109764440
This operation will unconfigure IB clients of this HCA
Continue (yes/no)? yes <<<<<<< 

Communication Service Commands

InfiniBand Port, HCA_SVC and VPPA devices use communication services. The cfgadm plugin supports operations on communication services such as adding a service, removing a service and listing known services. See the examples below:

Add a communication service:
# cfgadm -x list_services ib   
PORT communication services:

VPPA communication services:
HCA communication services:
# cfgadm -o comm=port,service=srp -x add_service ib
# cfgadm -x list_services ib                       
PORT communication services:
                srp <<<<<<<<<<<<<<

VPPA communication services:
HCA communication services:
Delete a communication service:
# cfgadm -x list_services ib                       
PORT communication services:

VPPA communication services:
HCA communication services:
# cfgadm -o comm=port,service=srp -x delete_service ib
# cfgadm -x list_services ib                          
PORT communication services:

VPPA communication services:
HCA communication services:
Note that the examples are shown only for Port Devices but are applicable to all three device types.

Other useful Commands

Two more useful commands provided by the InfiniBand cfgadm plugin are:
  • update_pkey_tbls
    It updates the P_Key information inside ibtl(7d), the InfiniBand Transport Layer module: ibtl(7d) reads the P_Key tables for all ports of all the HCAs seen by the host.
  • update_ioc_conf
    If the ib static attachment point is supplied, it updates the properties of all IOC devices. If an IOC attachment point is supplied, only that IOC's properties are updated. The properties updated are:
    port-list, port-entries, service-id, and service-name




Message Signaled Interrupts

Welcome to OpenSolaris, and the wonders of Solaris 10.

This paper provides a brief introduction to in-band interrupts: Message Signaled Interrupts. All flavors of PCI (PCI 2.2 onwards, PCI-X, PCI Express) support Message Signaled Interrupts (referred to as MSIs henceforth).


Unlike fixed interrupts, MSIs are in-band messages targeting an address range in the host bridge. Because the messages are in-band, the receipt of the message can be used to "push" any data associated with the interrupt. By definition, MSIs are unshared: each MSI message assigned to a device is guaranteed to be a unique message in the system. PCI functions can request between 1 and 32 MSI messages, in powers of two. The system software may allocate fewer MSI messages to a function than the function requested, and the host bridge has some limit on the number of unique MSI messages that can be allocated to devices.
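The power-of-two request rule and the possibility of a smaller grant can be sketched as follows. The grant policy (the largest power of two that fits both the request and the vectors the host bridge has left) is an assumption for illustration, not the Solaris allocator, and the function names are hypothetical.

```c
#include <assert.h>

#define	MSI_MAX_MSGS	32	/* PCI MSI: at most 32 messages per function */

/* Is n a valid MSI request: a power of two between 1 and 32? */
int
msi_valid_request(unsigned int n)
{
	return (n >= 1 && n <= MSI_MAX_MSGS && (n & (n - 1)) == 0);
}

/*
 * Grant the largest power of two that fits both the request and the
 * vectors still available in the host bridge.  Returns 0 if none.
 */
unsigned int
msi_grant(unsigned int requested, unsigned int available)
{
	unsigned int limit, grant = 1;

	assert(msi_valid_request(requested));
	limit = requested < available ? requested : available;
	if (limit == 0)
		return (0);
	while (grant * 2 <= limit)
		grant *= 2;
	return (grant);
}
```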

The introduction of PCI Express [1] extended PCI and MSI by requiring MSI support in PCI functions. PCI Express is a serial point-to-point interconnect with no external interrupt wires. For legacy purposes, PCI Express includes INTx (INTA-INTD) emulation messages for compatibility with existing software. However, within any one PCI Express domain, the four INTx emulation messages are shared by every device using INTx emulation within that hierarchy. Depending on INTx emulation is therefore generally a bad idea.

Extended MSI (MSI-X)

A PCI-SIG MSI-X ECN [2] extended MSI by allowing a function to allocate more messages (up to 2048), by making the address and data value used for each message independent of every other MSI-X message, and by allowing software to use the same MSI address/data value in multiple MSI-X "slots". The last is an architected way of handling the case in which the system allocates fewer MSI/X messages to the device than the device requested.
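One way system software could use the slot-sharing feature is to reuse the granted messages across all table slots. The round-robin policy and the names below are assumptions for illustration, and the address/data values are placeholders.

```c
#include <assert.h>
#include <stddef.h>

#define	MSIX_MAX_SLOTS	2048	/* table size limit from the MSI-X ECN */

/* One MSI-X table slot: an address/data pair. */
struct msix_slot {
	unsigned long	addr;
	unsigned int	data;
};

/*
 * When only `granted` distinct messages (in `msgs`) are available for
 * `nslots` table slots, reuse them round-robin so every slot still holds
 * a valid address/data pair.
 */
void
msix_fill_table(struct msix_slot *table, size_t nslots,
    const struct msix_slot *msgs, size_t granted)
{
	size_t i;

	assert(nslots <= MSIX_MAX_SLOTS && granted >= 1);
	for (i = 0; i < nslots; i++)
		table[i] = msgs[i % granted];
}
```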

Implementation Notes

MSI and MSI-X shall be referred to collectively as MSI/X from here on. MSI/X is always edge triggered, since the interrupt is signaled by a posted write command from the device targeting a pre-allocated area of "memory" on the host bridge. However, some host bridges can "latch" the acceptance of an MSI/X message and effectively treat it as a level-signaled interrupt.

Devices are permitted to send more than one MSI/X message before an outstanding interrupt has been serviced. However, the PCI specifications state that there is no guarantee that additional MSI/X messages will be serviced until the first of a set of MSI/X messages targeting the same address/data values has been serviced. Therefore, servicing is guaranteed for only one MSI/X message per set of such messages. Other than certain devices that send periodic interrupts, devices should in general send only one MSI/X message per interrupt source until that interrupt has been serviced.

With MSI/X, vectors must be allocated by the implementation and assigned to the device. The default interrupt priority is assigned based on the class code of the device. Native PCI devices should avoid using INTx or INTx emulation when MSI/X is available in the device and supported by the host bridge implementation.

New interrupt DDI interfaces

Upcoming versions of Solaris support MSIs and provide new DDI interfaces to register and unregister interrupts. In addition, these new interfaces allow drivers to:
  • Get and set device's interrupt capabilities
  • Get and set device's interrupt priority
  • Get information if an interrupt is pending
  • Set and clear interrupt mask


1. PCI Express Base Specification v1.0a
2. PCI Express Engineering Change Notice - MSI-X Oct. 31, 2003


