
Event ports and performance

Guest Author
So lots of people have been talking about event ports. They were designed to solve the problem of using poll(2)
with lots of file descriptors. The goal is to scale with the number of
actual events of interest rather than the number of file descriptors
one is listening on, since the former is often much smaller than the
latter.


To make this a little more concrete, consider your giant new server
which maintains a thousand connections, only about a hundred of which
are active at any given time. It has its array with a thousand socket
file descriptors, and sits there in a loop calling poll to figure out
which of them can be read without blocking. poll waits until at least
one of them is ready, returns, and then the program iterates over the
whole array to see which descriptors poll said were ready. Finally, it
processes them one by one. You waste all this time iterating over the
array, not to mention the time copying it between the kernel and
userland every time you call poll.
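
To make that concrete in code, here's a minimal sketch of such a loop (not code from this post). It assumes fds[] has already been filled with a thousand connected sockets, and handle_read() is a stand-in for whatever processing the server does:

/*
 * Sketch of the poll(2) loop described above. The fds[] array is
 * assumed to already hold a thousand connected sockets.
 */
#include <poll.h>
#include <unistd.h>
#include <stdio.h>

#define	NCONNS	1000

static void
handle_read(int fd)
{
	char buf[512];

	(void) read(fd, buf, sizeof (buf));	/* stand-in: just drain the socket */
}

void
poll_loop(struct pollfd fds[NCONNS])
{
	for (;;) {
		/* The whole array is copied into the kernel on every call. */
		int nready = poll(fds, NCONNS, -1);

		if (nready < 0) {
			perror("poll");
			return;
		}

		/*
		 * Even if only a handful of descriptors are ready, we walk
		 * all thousand entries to find them.
		 */
		for (int i = 0; i < NCONNS && nready > 0; i++) {
			if (fds[i].revents & POLLIN) {
				handle_read(fds[i].fd);
				nready--;
			}
		}
	}
}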


It's true that you could multi-thread this and save some time, but the
overhead of thousands of threads (one for each fd) adds up as well.


This is the problem event ports are designed to solve. Interestingly
enough, this was not the first attempt to speed up this scenario. /dev/poll
was introduced in Solaris 7 for the same purpose, and works great,
except that multi-threaded apps can't use it without lots of slow
synchronization. Linux has epoll and the BSDs have kqueue, though I don't pretend to know much about either.


Event ports, as applied to file descriptors, are basically a way of
telling the OS that you're interested in certain events on certain
descriptors, and to deliver (synchronous) notification of said events
on said descriptors to a particular object (...which also happens to be
a file descriptor). Then you can have one or more worker threads
sleeping, waiting for the OS to wake them up and hand them an event,
saying "here you go, this socket's got something to read." You can
actually have lots of worker threads hanging around the same event port
and let the OS do your "synchronization" - they probably don't need to
communicate or share any common state (depending on your app, of
course).
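
In code, the pattern looks roughly like this. It's a minimal sketch of the event-port calls (port_create, port_associate, port_get), not the benchmark code from this post; handle_read() and socks[] are stand-ins:

/*
 * Sketch of the event-port pattern described above. socks[] is assumed
 * to hold already-connected sockets.
 */
#include <port.h>
#include <poll.h>
#include <unistd.h>
#include <stdio.h>

static void
handle_read(int fd)
{
	char buf[512];

	(void) read(fd, buf, sizeof (buf));	/* stand-in: just drain the socket */
}

void
port_loop(const int *socks, int nsocks)
{
	int port = port_create();

	if (port < 0) {
		perror("port_create");
		return;
	}

	/* Tell the OS which events on which descriptors interest us. */
	for (int i = 0; i < nsocks; i++)
		(void) port_associate(port, PORT_SOURCE_FD, socks[i],
		    POLLIN, NULL);

	for (;;) {
		port_event_t pe;

		/* Sleep until the OS hands us one ready descriptor. */
		if (port_get(port, &pe, NULL) < 0) {
			perror("port_get");
			return;
		}

		handle_read((int)pe.portev_object);

		/*
		 * Associations are one-shot: re-associate to keep getting
		 * events on this descriptor. This is the second system
		 * call per event mentioned later in the post.
		 */
		(void) port_associate(port, PORT_SOURCE_FD, pe.portev_object,
		    POLLIN, NULL);
	}
}

Several worker threads could sit in port_get() on the same port at once, which is how the OS ends up doing the "synchronization" described above.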


No expensive copies. No traversing huge arrays finding needles in haystacks.


At least, in theory. Until now, nobody's really measured their performance with any microbenchmarks. My recent addition of event ports to libevent
gave me an excuse to run libevent's own benchmarks and see how it
performed. The benchmark basically just schedules lots of read/write
events, and has a few parameters: the number of file descriptors, the
number of writes, and the number of active connections (the higher this
number, the more the benchmark spreads its writes over the many file
descriptors).
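
For reference, here's roughly how descriptors are registered through libevent's 1.x API (the one current at the time), which hides the choice of polling mechanism behind a single interface; the callback and socket array below are stand-ins, not the actual benchmark code:

/*
 * Sketch of watching sockets through libevent 1.x. on_readable() is a
 * stand-in callback, not the benchmark's handler.
 */
#include <event.h>
#include <unistd.h>
#include <stdlib.h>

static void
on_readable(int fd, short which, void *arg)
{
	char buf[512];

	(void) read(fd, buf, sizeof (buf));	/* stand-in: just drain the socket */
}

int
watch_sockets(const int *socks, int nsocks)
{
	struct event *evs;

	event_init();	/* libevent picks the best available backend here */

	if ((evs = calloc(nsocks, sizeof (struct event))) == NULL)
		return (-1);

	for (int i = 0; i < nsocks; i++) {
		/* EV_PERSIST keeps the event registered after it fires. */
		event_set(&evs[i], socks[i], EV_READ | EV_PERSIST,
		    on_readable, NULL);
		event_add(&evs[i], NULL);
	}

	return (event_dispatch());	/* run the loop until no events remain */
}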

Results

All benchmarks below were run on a uniprocessor x86 machine with 1 GB of
memory. Note the vertical axes: some of the graphs use logarithmic
scales. And remember, these are the results of the libevent benchmark,
so differences in performance can be a result of the underlying polling
mechanism as well as the libevent implementation (though since the
libevent wrappers are thin, it's assumed that most effects are
attributable to the underlying polling mechanism).

Let's start with the simplest example. Consider small numbers of file descriptors, and various methods of polling them:


[Graph: benchmark times for small numbers of file descriptors, one curve per polling method]

On this machine, event ports are fast even for small numbers of file
descriptors. In some circumstances, though, event ports can be slightly
more expensive at this scale, particularly when many of the descriptors
are active. This is because (a) when most of your
descriptors are active, the work done by poll is not such a waste
(you have to do it anyway), and (b) event ports require two
system calls per descriptor event rather than one.

But let's look at much larger numbers of file descriptors, where event
ports really show their stuff. Note that /dev/poll has a maximum of
about 250 fd's (Update 7/12/06: Alan Bateman
points out that you can use setrlimit to update RLIMIT_NOFILE to
increase this to much higher values), and select(3c) can only handle up
to about 500, so
they're omitted from the rest of these graphs. This graph has a
logarithmic scale.

[Graph: benchmark times for large numbers of file descriptors, logarithmic scale]

Wow. That's about a 10x difference for 7,500 fd's. 'Nuff said.

Finally, let's examine what happens when the connections become more
active (more of the descriptors are written to):

[Graph: benchmark times with more active connections, up to 15,000 fd's]

Event ports still win, but not by nearly as much (about 2x, for up to
15,000 fd's). With more of the descriptors active, less of poll's array
traversal is wasted, so the difference is smaller. But in the long run,
event ports scale at a roughly constant rate, while poll scales linearly.
Select is even worse, since it's implemented on top of poll (on Solaris).
/dev/poll is good, but has the problem mentioned earlier of being
difficult to use in a multi-threaded application.

For large numbers of descriptors, event ports are the clear winner.

Benchmarking can be difficult, and it's easy to miss subtle effects. I
invite your comments, especially if you think there's something I've
overlooked.




Comments (4)
  • Merten Schumann Friday, June 30, 2006
    pretty interesting stuff!
    Is there a Java API already available to use these event ports from Java? Via JNI ... I guess event ports haven't made their way into java.io yet :-)
    Thanx
    Merten
  • Alan Saturday, July 15, 2006
    The Selector implementation in Java SE uses /dev/poll. See Dave's follow-on entry to compare /dev/poll and event ports:
    http://blog.sun.com/roller/page/dap?entry=event_ports_and_performance_take
  • Merten Schumann Thursday, July 20, 2006
    Alan, thanx alot
  • Lamb Liu Monday, July 23, 2007
    can you show me the code of your benchmark?