Interrupts are events delivered to CPUs, usually by external devices
(e.g. FC, SCSI, Ethernet and Infiniband adapters). Interrupts can
cause performance and observability problems for applications.
Performance problems are caused when an interrupt "steals" a CPU from
an application thread, halting its progress while the interrupt is
serviced. This is called pinning: an interrupt pins an application
thread if it is delivered to a CPU on which that thread was executing
at the time.
This can affect other threads or processes in the application if, for
example, the pinned thread was holding one or more synchronization
objects (locks, semaphores, etc.).
Observability problems can arise if we are trying to account for the
work the application is completing versus the CPU time it is consuming.
While an interrupt has an application thread pinned, the CPU time the
interrupt consumes is charged to the application.
The SDT provider offers the following probes that indicate
when an interrupt is being serviced: sdt:::interrupt-start and
sdt:::interrupt-complete.
The first argument (arg0) to both probes is the address of a
struct dev_info (AKA dev_info_t *), which can be
used to identify the driver and instance for the interrupt.
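As a minimal sketch of how arg0 can be turned into a driver name and instance, the D script below counts interrupts per device. It assumes the kernel's devnamesp array and the devi_major and devi_instance members of struct dev_info are visible to DTrace, as they are on the Solaris releases mentioned in this article; treat it as an illustration rather than one of the attached scripts.

```dtrace
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Count interrupts by driver name and instance, using arg0. */
sdt:::interrupt-start
/arg0 != 0/
{
	/* arg0 is the address of the device's struct dev_info. */
	this->devi = (struct dev_info *)arg0;

	/* Assumption: devnamesp[] maps a major number to a driver name. */
	@intrs[stringof(`devnamesp[this->devi->devi_major].dn_name),
	    this->devi->devi_instance] = count();
}

dtrace:::END
{
	printf("%-16s %8s %10s\n", "DRIVER", "INSTANCE", "COUNT");
	printa("%-16s %8d %@10d\n", @intrs);
}
```

Run it as root for a representative interval and press Ctrl-C to print the table.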
If the interrupt has indeed pinned a user thread, the following will
be true:
    curthread->t_intr != 0
    curthread->t_intr->t_procp->p_pidp->pid_id != 0
The pid_id field will correspond to the PID of the process that has
been pinned. The thread will remain pinned until either
sdt:::interrupt-complete or fbt::thread_unpin:return fires.
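The conditions above can be combined into a rough measurement of how long each process is pinned. The following D script is a sketch under simplifying assumptions: it treats sdt:::interrupt-complete as the end of the pinned interval and ignores the fbt::thread_unpin:return case, so it may over-count when an interrupt thread blocks. The variable and aggregation names are illustrative only.

```dtrace
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* An interrupt has started on a CPU that was running a user thread. */
sdt:::interrupt-start
/curthread->t_intr != 0 &&
    curthread->t_intr->t_procp->p_pidp->pid_id != 0/
{
	self->pinned_pid = curthread->t_intr->t_procp->p_pidp->pid_id;
	self->start = timestamp;
}

/* The interrupt has finished; charge the elapsed time to the pinned PID. */
sdt:::interrupt-complete
/self->start/
{
	@events[self->pinned_pid] = count();
	@pinned_ns[self->pinned_pid] = sum(timestamp - self->start);
	self->start = 0;
	self->pinned_pid = 0;
}

dtrace:::END
{
	printf("%8s %12s %16s\n", "PID", "PIN EVENTS", "PINNED (ns)");
	printa("%8d %@12d %@16d\n", @events, @pinned_ns);
}
```

Comparing the pinned nanoseconds for a PID against the CPU time reported for that process gives an estimate of how much of its charged CPU time was actually spent servicing interrupts.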
Attached are some scripts that can be used to assess the effect of
pinning. These have been tested with Solaris 10 and Solaris 11.
Probe effect will vary. De-referencing four pointers and then hashing
against a character string device name each time an interrupt fires,
as some of the scripts do, can be expensive. The last two scripts are
designed to have a lower probe effect if your application or system is
sensitive to this.
The primary technique used to improve the performance of an
application experiencing pinning is to "fence" the interrupts from the
application. This involves the use of either processor binding or
processor sets (sets are usually preferable) to either dedicate
CPUs to the application that are known to not have the high-impact
interrupts targeted at them, or to dedicate CPUs to the driver(s)
delivering the high-impact interrupts.
This is not the optimal solution for all situations. Testing is
required.
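Before fencing, it helps to know which CPUs are delivering the pinning interrupts to which processes. The D script below is a sketch of one way to gather that; it uses only integer aggregation keys (the built-in cpu variable and the pinned PID), so its probe effect should be lower than a version that looks up driver names. The output layout is an illustrative choice.

```dtrace
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Count pinning events by CPU and pinned PID, using integer keys only. */
sdt:::interrupt-start
/curthread->t_intr != 0 &&
    curthread->t_intr->t_procp->p_pidp->pid_id != 0/
{
	@pins[cpu, curthread->t_intr->t_procp->p_pidp->pid_id] = count();
}

dtrace:::END
{
	printf("%6s %8s %12s\n", "CPU", "PID", "PIN EVENTS");
	printa("%6d %8d %@12d\n", @pins);
}
```

CPUs that show many pinning events for the application's PIDs are candidates to leave out of the application's processor set, or to dedicate to the driver instead.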
Another technique is to investigate whether the interrupt handling for
the driver(s) in question can be modified. Some drivers allow for
more or less work to be performed by worker threads, reducing the time
during which an interrupt will pin a user thread. Other drivers can
direct interrupts at more than a single CPU, usually depending on the
interface on which the I/O event has occurred. Some network drivers
can wait for more or fewer incoming packets before sending an
interrupt.
Most importantly, only attempt to resolve these issues yourself if you
have a good understanding of the implications, preferably one
backed up by testing. An alternative is to open a service call with
Oracle asking for assistance to resolve a suspected pinning issue.
You can reference this article and include data obtained by using the
attached scripts.
If you have identified that your multi-threaded or multi-process
application is being pinned, but the stolen CPU time does not seem to
account for the drop in performance, the next step in DTrace would be
to identify whether any critical kernel or user locks are being held
during any of the pinning events. This would require marrying
information gained about how long application threads are pinned with
information gained from the lockstat and plockstat providers.