Ricardo Ferreira's Blog

  • October 28, 2011

Enabling HPC (High Performance Computing) with InfiniBand in Java™ Applications

For many IT managers, choosing a computing fabric rarely receives the
center-stage attention it deserves. Familiarity and time have set
infrastructure architects on a seemingly intransigent course toward
settling for fabrics that have become the de facto norm. On one hand is
Ethernet, which is regarded as the solution for an almost ridiculously
broad range of needs, from web surfing to high-performance storage
traffic where every microsecond counts. On the other hand is Fibre
Channel, which provides more deterministic performance and bandwidth,
but at the cost of bifurcating the enterprise network while doubling the
cabling, administrative overhead, and total cost of ownership. But
even with separate fabrics, these networks sometimes fall short of
today's requirements. So, slowly but surely, some enterprises are waking
up to the potential of new fabrics.

There is no other component of the data center that is as important as
the fabric, and the I/O challenges facing enterprises today have been
setting the stage for next-generation fabrics for several years. One of
the more mature solutions, InfiniBand, is sometimes at the forefront
when users are driven to select a new fabric. New
customers are often surprised at what InfiniBand can deliver, and how
easily it integrates into today's data center architectures. InfiniBand
has been leading the charge toward consolidated, flexible, low-latency,
high-bandwidth, lossless I/O connectivity. While other fabrics have just
recently turned to addressing next-generation requirements, with
technologies such as Fibre Channel over Ethernet (FCoE) still seeking
general acceptance and even standardization, InfiniBand's maturity and
bandwidth advantages continue to attract new customers.

Although InfiniBand remains a small industry compared to the Ethernet
juggernaut, it continues to grow aggressively, and this year it is
growing beyond projections. InfiniBand often plays a role
in enterprises with huge datasets. The majority of Fortune 1000
companies are involved in high-throughput processing behind a wide
variety of systems, including business analytics, content creation,
content transcoding, real-time financial applications, messaging
systems, consolidated server infrastructures, and more. In these cases,
InfiniBand has worked its way into the enterprise as a localized fabric
that, via transparent interconnection to existing networks, is sometimes
even hidden from the eyes of administrators.

For high performance computing environments, the capacity to move data
across a network quickly and efficiently is a requirement. Such networks
are typically described as requiring high throughput and low latency. High throughput refers to an environment that can deliver a large amount of processing capacity over a long period of time. Low latency refers to the minimal delay between processing input and providing output, such as you would expect in a real-time application.

In these environments, conventional networking using socket streams can
create bottlenecks when it comes to moving data. Introduced in 1999 by
the InfiniBand Trade Association,
InfiniBand (IB for short) was created to address the need for high performance
computing. One of the most important features of IB is Remote Direct
Memory Access (RDMA). RDMA enables moving data directly from the memory
of one computer to the memory of another, bypassing the operating system of
both computers and resulting in significant performance gains. Using an InfiniBand network is particularly interesting in engineered systems like Oracle Exalogic and SPARC SuperCluster T4-4. In this type of system, you can boost the performance of your application by 3X to 10X without changing a single line of code.

The Sockets Direct Protocol, or SDP for short, is a networking protocol developed to
support stream connections over an InfiniBand fabric. SDP support was
introduced in the JDK 7 release of the Java Platform for applications deployed on the Solaris and Linux operating systems. The Solaris operating system has
supported SDP and InfiniBand since Solaris 10. In the Linux world, the
InfiniBand package is called OFED (OpenFabrics Enterprise Distribution).
The JDK 7 release supports the 1.4.2 and 1.5 versions of OFED.

SDP support is essentially a TCP bypass technology. When SDP is enabled and an application attempts to open a TCP
connection, the TCP mechanism is bypassed and communication goes
directly to the InfiniBand network. For example, when your application attempts
to bind to a TCP address, the underlying software decides, based on
information in the configuration file, whether the socket should be rebound to the SDP
protocol. This conversion can happen during the binding process or the
connecting process, but it happens only once for each socket.

There are no API changes required in your code to take advantage of
the SDP protocol: the implementation is transparent and is supported by
the classic networking and the NIO (New I/O) channels. SDP support is disabled by default. The steps to enable SDP support are:

  • Create a text-based file that will act as the SDP configuration file.
  • Set the system property that specifies the location of this configuration file.

Creating a Configuration File for SDP

An SDP configuration file is a text file, and you decide where on the
file system this file will reside. Every line in the configuration file
is either a comment or a rule. A comment is indicated by the hash
character "#" at the beginning of the line, and everything following the
hash character will be ignored.

There are two types of rules, as follows:

  • A "bind" rule indicates that the SDP protocol transport should be
    used when a TCP socket binds to an address and port that match the rule.
  • A "connect" rule indicates that the SDP protocol transport should be
    used when an unbound TCP socket attempts to connect to an address and
    port that match the rule.

A rule has the following format:

     ("bind"|"connect")1*LWSP-char(hostname|ipaddress)["/"prefix]1*LWSP-char("*"|port)["-"("*"|port)]

The first keyword indicates whether the rule is a bind or a connect rule. The next token specifies either a host name or a literal IP address. When you specify a literal IP address, you can also specify a prefix, which indicates an IP address range. The third and final token is a port number or a range of port numbers. Consider the following notation in this sample configuration file:

     # Enable SDP when binding to 192.168.0.1
     bind 192.168.0.1 *
     # Enable SDP when connecting to all
     # application services on 192.168.0.*
     connect 192.168.0.0/24 1024-*
     # Enable SDP when connecting to the HTTP server
     # or a database on oracle.infiniband.sdp.com
     connect oracle.infiniband.sdp.com 80
     connect oracle.infiniband.sdp.com 3306
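
Before pointing the JVM at a configuration file, it can help to check that every line is a comment or a well-formed rule. The following is a minimal sketch of such a check, written from the rule format described above. The class name SdpConfigLint and its classify method are hypothetical and are not part of the JDK, and the JDK's own parser may accept or reject lines differently (for example, tolerating leading whitespace is an assumption made here).

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not a JDK class): classifies lines of an SDP configuration file. */
final class SdpConfigLint {
    // Follows the rule format in the text: action, host or address with an
    // optional /prefix, then a port or port range. Tokens are separated by
    // one or more spaces or tabs.
    private static final Pattern RULE = Pattern.compile(
        "(bind|connect)[ \\t]+[^ \\t/]+(/\\d+)?[ \\t]+(\\*|\\d+)(-(\\*|\\d+))?");

    /** Returns "blank", "comment", "bind", "connect", or "invalid". */
    static String classify(String line) {
        String trimmed = line.trim(); // assumption: surrounding whitespace is ignored
        if (trimmed.isEmpty()) {
            return "blank";
        }
        if (trimmed.startsWith("#")) {
            return "comment";
        }
        if (RULE.matcher(trimmed).matches()) {
            return trimmed.startsWith("bind") ? "bind" : "connect";
        }
        return "invalid";
    }

    public static void main(String[] args) {
        String[] sample = {
            "# Enable SDP when binding to 192.168.0.1",
            "bind 192.168.0.1 *",
            "connect 192.168.0.0/24 1024-*",
            "connect oracle.infiniband.sdp.com 80",
            "listen 192.168.0.1 80" // not a valid rule keyword
        };
        for (String line : sample) {
            System.out.println(classify(line) + " <- " + line);
        }
    }
}
```

The check is purely syntactic: it does not verify that ports are in range or that a host name resolves.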

The first rule in the sample file specifies that SDP is enabled for any port (*) on the local IP address 192.168.0.1. An InfiniBand adaptor is the equivalent of a network interface card (NIC) for InfiniBand, and you add a bind rule for each local address assigned to one. If you have several InfiniBand adaptors, you use a bind rule for each address that is assigned to those adaptors.

The second rule in the sample file specifies that SDP is enabled whenever the application connects to 192.168.0.* and the target port is 1024 or greater. The prefix on the IP address, /24,
indicates that the first 24 bits of the 32-bit IP address should match
the specified address. Each portion of the IP address uses 8 bits, so 24
bits indicates that the IP address should match 192.168.0 and the final byte can be any value. The "-*"
notation on the port token specifies "and above." A range of ports,
such as 1024-2056, would also be valid and would include the end points
of the specified range.
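
The prefix and port arithmetic described above can be sketched in a few lines of Java. This is only an illustration of the matching logic for literal IPv4 addresses, not the JDK's internal implementation; the class SdpRuleMatch and its methods are hypothetical names, and representing "-*" as Integer.MAX_VALUE is an assumption made for the example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical illustration (not a JDK class) of prefix and port-range matching. */
final class SdpRuleMatch {
    /** True if the first prefixBits bits of addr equal those of network (IPv4 literals only). */
    static boolean inPrefix(String network, int prefixBits, String addr) {
        // A shift by 32 is a no-op on a Java int, so a /0 prefix needs its own case.
        int mask = prefixBits == 0 ? 0 : -1 << (32 - prefixBits);
        return (toInt(network) & mask) == (toInt(addr) & mask);
    }

    /** Inclusive range check; pass Integer.MAX_VALUE as high to model "-*". */
    static boolean inPortRange(int low, int high, int port) {
        return port >= low && port <= high;
    }

    // Packs the four bytes of a literal IPv4 address into one int.
    private static int toInt(String literal) {
        byte[] b;
        try {
            b = InetAddress.getByName(literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an address: " + literal, e);
        }
        if (b.length != 4) {
            throw new IllegalArgumentException("IPv4 only: " + literal);
        }
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }

    public static void main(String[] args) {
        // Rule: connect 192.168.0.0/24 1024-*
        System.out.println(inPrefix("192.168.0.0", 24, "192.168.0.77"));
        System.out.println(inPrefix("192.168.0.0", 24, "192.168.1.5"));
        System.out.println(inPortRange(1024, Integer.MAX_VALUE, 1521));
        System.out.println(inPortRange(1024, Integer.MAX_VALUE, 80));
    }
}
```

With a /24 prefix the mask keeps the first three bytes, so 192.168.0.77 falls inside the rule and 192.168.1.5 does not. Port 1521 satisfies "1024-*", while port 80 does not.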

The final rules in the sample file specify a host name called oracle.infiniband.sdp.com,
first with the port assigned to an HTTP server (80) and then with the
port assigned to a database (3306). Unlike a literal IP address, a host
name can translate into multiple addresses. When you specify a host
name, it matches all addresses that the host name is registered to in
the name service.
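
To see which addresses a host-name rule could cover on a given machine, you can ask the name service directly. The sketch below uses the standard InetAddress.getAllByName call; the class name HostRuleAddresses is hypothetical, and the article does not describe how the JDK itself performs this lookup, so treat the output as a diagnostic aid only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical diagnostic (not a JDK class): lists the addresses registered for a host name. */
final class HostRuleAddresses {
    /** Returns every address the name service reports, or an empty array if the host is unknown. */
    static InetAddress[] resolveAll(String host) {
        try {
            return InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            return new InetAddress[0];
        }
    }

    public static void main(String[] args) {
        // Pass the host name from your rule as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        for (InetAddress address : resolveAll(host)) {
            System.out.println(host + " -> " + address.getHostAddress());
        }
    }
}
```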

How to Enable the SDP Protocol

SDP support is disabled by default. To enable SDP support, set the com.sun.sdp.conf
system property by providing the location of the configuration file.
The following example starts an application named MyApplication using a configuration file
named sdp.conf:

     java -Dcom.sun.sdp.conf=sdp.conf -Djava.net.preferIPv4Stack=true MyApplication
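
Because the relative path sdp.conf in the command above is resolved against the working directory, a mistyped or misplaced file is easy to overlook. A small startup check along the following lines can report what the JVM was actually given. The class SdpConfCheck and its messages are hypothetical; only the property name com.sun.sdp.conf comes from the text, and the sketch does not show how the JDK reacts to an unreadable file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical startup check (not a JDK class) for the com.sun.sdp.conf setting. */
final class SdpConfCheck {
    /** Describes the configuration state for a given property value. */
    static String describe(String confValue) {
        if (confValue == null || confValue.isEmpty()) {
            // SDP support is disabled by default, so an unset property means no SDP.
            return "SDP disabled: com.sun.sdp.conf is not set";
        }
        // Relative paths are resolved against the current working directory.
        Path path = Paths.get(confValue).toAbsolutePath();
        if (!Files.isReadable(path)) {
            return "SDP configuration not readable: " + path;
        }
        return "SDP configuration file: " + path;
    }

    public static void main(String[] args) {
        System.out.println(describe(System.getProperty("com.sun.sdp.conf")));
    }
}
```

Run it with the same -Dcom.sun.sdp.conf option you pass to your application to confirm that the path resolves where you expect.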

MyApplication is the Java client application that attempts to connect over the InfiniBand adaptor. Note that this example also sets another system property, java.net.preferIPv4Stack. Not everything discussed so far is guaranteed to work on the first attempt: there are some technical issues to be aware of when enabling SDP, most of them related to the IPv4 and IPv6 stacks. If something goes wrong in your tests, check with your operating system administrators that InfiniBand is correctly configured at this layer.

APIs and Supported Java Classes

    All classes in the JDK that read or write network sockets can use SDP. As noted earlier, you do not change the way you write the code. This is particularly useful if you are installing an already deployed application in a cluster that uses InfiniBand: when SDP support is enabled, it works without any change to your
    code, and recompiling is not necessary. However, it is important to know that
    a socket is bound only once. A connection is an implicit bind. So, if
    the socket hasn't been previously bound and connect is invoked, the binding occurs at that time. For example, consider the following code:

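    import java.net.InetSocketAddress;
    import java.nio.channels.AsynchronousSocketChannel;
    import java.util.concurrent.Future;

    AsynchronousSocketChannel asyncSocketChannel = AsynchronousSocketChannel.open();
    InetSocketAddress localAddress = new InetSocketAddress("10.0.0.3", 8020);
    InetSocketAddress remoteAddress = new InetSocketAddress("192.168.0.1", 1521);
    asyncSocketChannel.bind(localAddress);
    // The socket is already bound here, so this connect is not converted to SDP.
    Future<Void> result = asyncSocketChannel.connect(remoteAddress);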

    In this example, the asynchronous socket channel is bound to a local TCP address when bind
    is invoked on the socket. Then, the code attempts to connect to a
    remote address by using the same socket. If the remote address uses
    InfiniBand, as specified in the configuration file, the connection will
    not be converted to SDP because the socket was previously bound.
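
For a connect rule to apply, the socket therefore has to be unbound when connect is invoked. The sketch below illustrates the implicit bind with a loopback connection. It does not use InfiniBand or SDP at all; it only shows that a channel reports no local address before connect and has one afterwards. The class name ImplicitBindDemo is hypothetical, and the loopback server exists only to give the client something to connect to.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo (not from the article): connect performs the bind on an unbound channel. */
final class ImplicitBindDemo {
    /** Returns { boundBeforeConnect, boundAfterConnect } for a channel that never calls bind. */
    static boolean[] run() {
        try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open();
             AsynchronousSocketChannel client = AsynchronousSocketChannel.open()) {
            // Port 0 asks the operating system for any free port on the loopback interface.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            // getLocalAddress returns null while the channel is unbound.
            boolean before = client.getLocalAddress() != null;

            // No explicit bind: the socket is bound as part of connect.
            client.connect(server.getLocalAddress()).get(5, TimeUnit.SECONDS);
            boolean after = client.getLocalAddress() != null;

            return new boolean[] { before, after };
        } catch (Exception e) {
            throw new IllegalStateException("loopback demo failed", e);
        }
    }

    public static void main(String[] args) {
        boolean[] bound = run();
        System.out.println("bound before connect: " + bound[0]);
        System.out.println("bound after connect:  " + bound[1]);
    }
}
```

In an SDP deployment, dropping the explicit bind call from the earlier example in this way leaves the socket unbound at connect time, which is the condition the connect rule requires.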

    Conclusion

    InfiniBand is a de facto standard for high performance computing environments that need to guarantee extremely low latency and superior throughput. But to really take advantage of this technology, your applications need to be properly prepared. This article showed how you can maximize the performance of Java applications by enabling SDP support, the network protocol that "talks" to the InfiniBand network.
