Friday Oct 28, 2011

Enabling HPC (High Performance Computing) with InfiniBand in Java™ Applications

For many IT managers, choosing a computing fabric rarely receives the center-stage attention it deserves. Familiarity and time have set infrastructure architects on a seemingly intransigent course toward settling for fabrics that have become the de facto norm. On one hand is Ethernet, which is regarded as the solution for an almost ridiculously broad range of needs, from web surfing to high-performance, every-microsecond-counts storage traffic. On the other hand is Fibre Channel, which provides more deterministic performance and bandwidth, but at the cost of bifurcating the enterprise network while doubling the cabling, administrative overhead, and total cost of ownership. Even with separate fabrics, these networks sometimes fall short of today's requirements, so slowly but surely some enterprises are waking up to the potential of new fabrics.

There is no other component of the data center that is as important as the fabric, and the I/O challenges facing enterprises today have been setting the stage for next-generation fabrics for several years. One of the more mature solutions, InfiniBand, is often at the forefront when users are driven to select a new fabric. New customers are often surprised at what InfiniBand can deliver, and how easily it integrates into today's data center architectures. InfiniBand has been leading the charge toward consolidated, flexible, low-latency, high-bandwidth, lossless I/O connectivity. While other fabrics have only recently turned to addressing next-generation requirements, with technologies such as Fibre Channel over Ethernet (FCoE) still seeking general acceptance and even standardization, InfiniBand's maturity and bandwidth advantages continue to attract new customers.

Although InfiniBand remains a small industry compared to the Ethernet juggernaut, it continues to grow aggressively, and this year it is growing beyond projections. InfiniBand often plays a role in enterprises with huge datasets. The majority of Fortune 1000 companies are involved in high-throughput processing behind a wide variety of systems, including business analytics, content creation, content trans-coding, real-time financial applications, messaging systems, consolidated server infrastructures, and more. In these cases, InfiniBand has worked its way into the enterprise as a localized fabric that, via transparent interconnection to existing networks, is sometimes even hidden from the eyes of administrators.

For high performance computing environments, the capacity to move data across a network quickly and efficiently is a requirement. Such networks are typically described as requiring high throughput and low latency. High throughput refers to an environment that can deliver a large amount of processing capacity over a long period of time. Low latency refers to the minimal delay between processing input and providing output, such as you would expect in a real-time application.

In these environments, conventional networking using socket streams can create bottlenecks when it comes to moving data. Introduced in 1999 by the InfiniBand Trade Association, InfiniBand (IB for short) was created to address the needs of high performance computing. One of the most important features of IB is Remote Direct Memory Access (RDMA). RDMA enables moving data directly from the memory of one computer to the memory of another, bypassing the operating system of both computers and resulting in significant performance gains. Using an InfiniBand network is particularly interesting in engineered systems like Oracle Exalogic and SPARC SuperCluster T4-4. In this type of system, you can boost the performance of your application by 3X to 10X without changing a single line of code.

The Sockets Direct Protocol, or SDP for short, is a networking protocol developed to support stream connections over an InfiniBand fabric. SDP support was introduced in the JDK 7 release of the Java Platform for applications deployed on the Solaris and Linux operating systems. The Solaris operating system has supported SDP and InfiniBand since Solaris 10. In the Linux world, the InfiniBand package is called OFED (OpenFabrics Enterprise Distribution). The JDK 7 release supports the 1.4.2 and 1.5 versions of OFED.

SDP support is essentially a TCP bypass technology. When SDP is enabled and an application attempts to open a TCP connection, the TCP mechanism is bypassed and communication goes directly to the InfiniBand network. For example, when your application attempts to bind to a TCP address, the underlying software will decide, based on information in the configuration file, if it should be rebound to an SDP protocol. This process can happen during the binding process or the connecting process, but happens only once for each socket.

There are no API changes required in your code to take advantage of the SDP protocol: the implementation is transparent and is supported by the classic networking and the NIO (New I/O) channels. SDP support is disabled by default. The steps to enable SDP support are:

  • Create a text-based file that will act as the SDP configuration file.
  • Set the system property that specifies the location of this configuration file.

Creating a Configuration File for SDP

An SDP configuration file is a text file, and you decide where on the file system this file will reside. Every line in the configuration file is either a comment or a rule. A comment is indicated by the hash character "#" at the beginning of the line, and everything following the hash character will be ignored.
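To make the comment/rule distinction concrete, here is a minimal sketch that separates rule lines from comments in the text of a configuration file. The class name SdpConfLines is hypothetical, and skipping blank lines and tolerating whitespace before the hash character are assumptions made for illustration; this is not the JDK's own parser.

```java
import java.util.ArrayList;
import java.util.List;

final class SdpConfLines {
    /** Returns the lines of the configuration text that are neither comments nor blank. */
    static List<String> ruleLines(String confText) {
        List<String> rules = new ArrayList<>();
        for (String line : confText.split("\\R")) {
            String trimmed = line.trim();
            // Assumption: blank lines are skipped, and leading whitespace
            // before '#' is tolerated.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            rules.add(trimmed);
        }
        return rules;
    }

    public static void main(String[] args) {
        String conf = "# Enable SDP when binding to 192.168.0.1\n"
                    + "bind 192.168.0.1 *\n"
                    + "\n"
                    + "connect 192.168.0.0/24     1024-*\n";
        for (String rule : ruleLines(conf)) {
            System.out.println(rule);
        }
    }
}
```

Running main prints only the two rule lines; the comment and the blank line are dropped.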

There are two types of rules, as follows:

  • A "bind" rule indicates that the SDP protocol transport should be used when a TCP socket binds to an address and port that match the rule.
  • A "connect" rule indicates that the SDP protocol transport should be used when an unbound TCP socket attempts to connect to an address and port that match the rule.

A rule has the following format:

("bind"|"connect")1*LWSP-char(hostname|ipaddress["/"prefix])1*LWSP-char("*"|port)["-"("*"|port)]

The first keyword indicates whether the rule is a bind or a connect rule. The next token specifies either a host name or a literal IP address. When you specify a literal IP address, you can also specify a prefix, which indicates an IP address range. The third and final token is a port number or a range of port numbers. Consider the following notation in this sample configuration file:

# Enable SDP when binding to 192.168.0.1
bind 192.168.0.1 *

# Enable SDP when connecting to all
# application services on 192.168.0.*
connect 192.168.0.0/24     1024-*

# Enable SDP when connecting to the HTTP server
# or a database on oracle.infiniband.sdp.com
connect oracle.infiniband.sdp.com   80
connect oracle.infiniband.sdp.com   3306

The first rule in the sample file specifies that SDP is enabled for any port (*) on the local IP address 192.168.0.1. An InfiniBand adaptor is the equivalent of a network interface card (NIC) for InfiniBand. You would add a bind rule for each local address assigned to an InfiniBand adaptor, so if you had several adaptors, you would use one bind rule per assigned address.

The second rule in the sample file specifies that SDP will be enabled whenever connecting to 192.168.0.* on a target port of 1024 or greater. The /24 prefix on the IP address indicates that the first 24 bits of the 32-bit IP address should match the specified address. Each portion of the IP address uses 8 bits, so 24 bits means that the address should match 192.168.0 and the final byte can be any value. The "-*" notation on the port token specifies "and above." A range of ports, such as 1024-2056, would also be valid and would include the end points of the specified range.
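The prefix and port-range semantics described above can be sketched in a few lines of Java. This is only an illustration of the matching logic under stated assumptions, not the JDK's implementation: the class SdpRuleMatch and its two methods are hypothetical names, and treating a leading "*" in a range as port 0 is an assumption.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class SdpRuleMatch {
    /** True if the first prefixBits bits of addr equal those of network. */
    static boolean prefixMatches(byte[] network, int prefixBits, byte[] addr) {
        if (network.length != addr.length) return false;
        if (prefixBits < 0 || prefixBits > network.length * 8) {
            throw new IllegalArgumentException("bad prefix: " + prefixBits);
        }
        int fullBytes = prefixBits / 8;
        for (int i = 0; i < fullBytes; i++) {
            if (network[i] != addr[i]) return false;
        }
        int remaining = prefixBits % 8;
        if (remaining == 0) return true;
        // Compare only the leading bits of the partially covered byte.
        int mask = (0xFF << (8 - remaining)) & 0xFF;
        return (network[fullBytes] & mask) == (addr[fullBytes] & mask);
    }

    /** Tests a port against a token such as "*", "80", "1024-*" or "1024-2056". */
    static boolean portMatches(String token, int port) {
        if (token.equals("*")) return true;
        String[] parts = token.split("-", 2);
        // Assumption: a leading "*" in a range means "from port 0".
        int low = parts[0].equals("*") ? 0 : Integer.parseInt(parts[0]);
        int high = parts.length == 1 ? low
                 : parts[1].equals("*") ? 65535 : Integer.parseInt(parts[1]);
        return port >= low && port <= high; // end points are included
    }

    public static void main(String[] args) throws UnknownHostException {
        // Literal IP addresses are parsed without a name service lookup.
        byte[] network = InetAddress.getByName("192.168.0.0").getAddress();
        byte[] inside = InetAddress.getByName("192.168.0.77").getAddress();
        byte[] outside = InetAddress.getByName("192.168.1.77").getAddress();
        System.out.println(prefixMatches(network, 24, inside));   // true
        System.out.println(prefixMatches(network, 24, outside));  // false
        System.out.println(portMatches("1024-*", 1521));          // true
        System.out.println(portMatches("1024-*", 80));            // false
        System.out.println(portMatches("1024-2056", 2056));       // true
    }
}
```

With the sample rule "connect 192.168.0.0/24 1024-*", a connection to 192.168.0.77 on port 1521 satisfies both checks, while port 80 on the same host does not.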

The final rules in the sample file specify a host name called oracle.infiniband.sdp.com, first with the port assigned to an HTTP server (80) and then with the port assigned to a database (3306). Unlike a literal IP address, a host name can translate into multiple addresses. When you specify a host name, it matches all addresses that the host name is registered to in the name service.
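To see which addresses a host name rule would cover on a given machine, you can ask the name service directly. The sketch below uses the standard InetAddress.getAllByName call; the class name HostAddresses is hypothetical, and returning an empty list for an unresolvable name is a choice made for this example. The sample host oracle.infiniband.sdp.com is illustrative and is unlikely to resolve, so main defaults to localhost.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

final class HostAddresses {
    /** Returns every address the name service reports for host, or an empty list. */
    static List<String> addressesOf(String host) {
        List<String> result = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                result.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Unresolvable name: report no addresses rather than failing.
        }
        return result;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + addressesOf(host));
    }
}
```

Pass a host name as the first argument to list the addresses it resolves to; a host with several registered addresses produces several entries.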

How to Enable the SDP Protocol

SDP support is disabled by default. To enable SDP support, set the com.sun.sdp.conf system property by providing the location of the configuration file. The following example starts an application named MyApplication using a configuration file named sdp.conf:

     java -Dcom.sun.sdp.conf=sdp.conf -Djava.net.preferIPv4Stack=true MyApplication

MyApplication refers to the client application written in Java that is attempting to connect to the InfiniBand adaptor. Note that this example specifies another system property, java.net.preferIPv4Stack. Not everything discussed so far is guaranteed to work on the first attempt: there are some technical issues that you should be aware of when enabling SDP, most of them related to the IPv4 and IPv6 stacks. If something goes wrong in your tests, it is a good idea to check with your operating system administrators whether InfiniBand is correctly configured at this layer.
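When troubleshooting, a quick first check is whether the application actually received the com.sun.sdp.conf property and whether the file it names can be read. The following is a minimal diagnostic sketch; the class name SdpConfCheck is hypothetical, and resolving a relative path against the working directory is an assumption of this check. It only inspects the property and the file, and cannot confirm that SDP is really in use.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SdpConfCheck {
    /** Describes the state of the value passed through -Dcom.sun.sdp.conf. */
    static String describe(String confProperty) {
        if (confProperty == null || confProperty.isEmpty()) {
            return "com.sun.sdp.conf is not set; SDP stays disabled";
        }
        // Assumption: a relative path is resolved against the working directory.
        Path conf = Paths.get(confProperty).toAbsolutePath();
        if (!Files.isReadable(conf)) {
            return "configuration file is not readable: " + conf;
        }
        return "configuration file found: " + conf;
    }

    public static void main(String[] args) {
        System.out.println(describe(System.getProperty("com.sun.sdp.conf")));
        System.out.println("java.net.preferIPv4Stack="
                + System.getProperty("java.net.preferIPv4Stack"));
    }
}
```

Start it with the same -D options as the real application, for example java -Dcom.sun.sdp.conf=sdp.conf -Djava.net.preferIPv4Stack=true SdpConfCheck, to see what the JVM was given.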

APIs and Supported Java Classes

    All classes in the JDK that read or write network sockets can use SDP. As mentioned before, there are no changes in the way you write the code. This is particularly interesting if you are simply installing an already deployed application in a cluster that uses InfiniBand. When SDP support is enabled, it just works without any change to your code, and recompiling is not necessary. However, it is important to know that a socket is bound only once, and that a connection is an implicit bind. So, if the socket hasn't been previously bound and connect is invoked, the binding occurs at that time. For example, consider the following code:

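    import java.net.InetSocketAddress;
    import java.nio.channels.AsynchronousSocketChannel;
    import java.util.concurrent.Future;
    AsynchronousSocketChannel asyncSocketChannel = AsynchronousSocketChannel.open();
    InetSocketAddress localAddress = new InetSocketAddress("10.0.0.3", 8020);
    InetSocketAddress remoteAddress = new InetSocketAddress("192.168.0.1", 1521);
    // Explicit bind: rule matching happens here, the only time for this socket.
    asyncSocketChannel.bind(localAddress);
    // The socket is already bound, so a connect rule no longer applies.
    Future<Void> result = asyncSocketChannel.connect(remoteAddress);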
    

    In this example, the asynchronous socket channel is bound to a local TCP address when bind is invoked on the socket. Then, the code attempts to connect to a remote address by using the same socket. If the remote address uses InfiniBand, as specified in the configuration file, the connection will not be converted to SDP because the socket was previously bound.
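By contrast, a socket that is still unbound when connect is invoked is bound implicitly at that point, which is when a connect rule can be matched. The sketch below shows that ordering on the loopback interface so that it runs on any machine; the class name ImplicitBindDemo and the throwaway loopback server are assumptions for illustration, and without an SDP configuration the connection is plain TCP.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.util.concurrent.TimeUnit;

final class ImplicitBindDemo {
    /** Connects an unbound channel; returns its local address before and after. */
    static SocketAddress[] connectUnbound() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (AsynchronousServerSocketChannel server =
                 AsynchronousServerSocketChannel.open()
                     .bind(new InetSocketAddress(loopback, 0)); // port 0: any free port
             AsynchronousSocketChannel client = AsynchronousSocketChannel.open()) {
            // No bind() call: the channel has no local address yet.
            SocketAddress before = client.getLocalAddress();
            // connect() performs the implicit bind for this socket.
            client.connect(server.getLocalAddress()).get(5, TimeUnit.SECONDS);
            SocketAddress after = client.getLocalAddress();
            return new SocketAddress[] { before, after };
        }
    }

    public static void main(String[] args) throws Exception {
        SocketAddress[] addresses = connectUnbound();
        System.out.println("local address before connect: " + addresses[0]);
        System.out.println("local address after connect:  " + addresses[1]);
    }
}
```

If your application needs SDP on outgoing connections, leaving the socket unbound and relying on a connect rule is the ordering that keeps that option open.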

    Conclusion

    InfiniBand is a de facto standard for high performance computing environments that need to guarantee extremely low latency and superior throughput. But to really take advantage of this technology, your applications need to be properly prepared. This article showed how you can maximize the performance of Java applications by enabling support for SDP, the network protocol that "talks" with the InfiniBand network.

    About

    Ricardo Ferreira is just a regular person who lives in Brazil and is passionate about technology, movies, and his whole family. He currently works at Oracle Corporation.
