This large international system integrator, where today's proofpoint was carried out, had long used and enjoyed Java as a software language for its comfort of development and ease of maintenance. When a classified customer requested a system and network management application involving intensive LAN communication, the partner knew that the non-deterministic nature of Java SE --although Java 5 and 6 brought big improvements in predictability-- would not fit the bill. Indeed, the Java Virtual Machine stops application threads for garbage collection and other maintenance tasks, so it cannot guarantee bounded pauses, especially when the maximum latency allowed for serving requests in this project was on the order of tens of milliseconds (ms).
That said, Java, as a runtime, can take many forms: expressive Java FX for rich clients, lightweight Java ME for mobile devices, transactional Java EE for enterprise services, real-time Java RTS for deterministic applications, etc. Our partner had no previous experience with Java Real Time, but the motivation to stay on Java was so strong that they engaged in a proof-of-concept to evaluate Java RTS 2.1 on Solaris 10, with the support of Sun and our ISV Engineering team.
The idea was to take 2 Solaris x64 machines with a few CPU cores each and have them intensively exchange UDP or TCP messages --hard real time was not a requirement-- with a round-trip latency not exceeding 100 ms for any message (100%). In addition to sending and receiving messages to each other, both systems were kept busy with some business logic, bringing the overall CPU utilization to 60-80% on each machine. The hardware configuration is described in the figure below; each machine ran Solaris 10 Update 5 and Java RTS 2.1 (build 1.5.0_16_Java-RTS-2.1_fcs-b11_RTSJ-1.0.2).
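To make the measurement concrete, here is a minimal sketch of a UDP round-trip timing loop using standard java.net sockets, with both ends on loopback in one process for illustration. The class name, port number, and 1KB message size are illustrative assumptions, not details from the proof-of-concept (which ran sender and receiver on separate machines).

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class RoundTripSketch {
    public static void main(String[] args) throws Exception {
        InetAddress local = InetAddress.getLoopbackAddress();
        DatagramSocket echo = new DatagramSocket(9876, local); // stands in for Machine 2
        DatagramSocket sender = new DatagramSocket(0, local);  // stands in for Machine 1

        // Echo thread: receive one datagram and send it straight back (the acknowledgement).
        Thread echoThread = new Thread(() -> {
            try {
                byte[] buf = new byte[2048];
                DatagramPacket p = new DatagramPacket(buf, buf.length);
                echo.receive(p);
                echo.send(new DatagramPacket(p.getData(), p.getLength(),
                                             p.getAddress(), p.getPort()));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        echoThread.start();

        byte[] msg = new byte[1024]; // 1KB, one of the message sizes used in the test
        long start = System.nanoTime();
        sender.send(new DatagramPacket(msg, msg.length, local, 9876));
        DatagramPacket ack = new DatagramPacket(new byte[2048], 2048);
        sender.receive(ack);
        long rttMs = (System.nanoTime() - start) / 1_000_000;

        // The proof-of-concept required every round trip to stay under 100 ms.
        System.out.println("within budget: " + (rttMs < 100));
        echoThread.join();
        sender.close();
        echo.close();
    }
}
```

The real test recorded a round-trip time like this for every message, since the requirement was a hard bound on the worst case, not an average.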
In addition to an average workload of 400 CPU-bound application threads --some of them real-time threads-- on each system, Machine 1 was configured to run 4 real-time sender threads sending messages of different sizes, and Machine 2 was configured to run 2 real-time sender threads, as shown in the table below. Finally, each system had one real-time thread receiving messages and sending an acknowledgement back, and another real-time thread reading the acknowledgements (and calculating round-trip times). All these communication-handling threads were given the highest real-time priority among all application threads.
Machine 1:
- Thread 1 – sends 120KB every 50ms
- Thread 2 – sends 1KB every 50ms
- Thread 3 – sends 100B every 50ms
- Thread 4 – sends 200B every 50ms

Machine 2:
- Thread 1 – sends 300 every 50ms
- Thread 2 – sends 350 every 50ms
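In Java RTS, the communication handlers above would be javax.realtime.RealtimeThread instances with PriorityParameters near the top of the real-time range. As a rough standard-Java analog --the RTSJ classes need the Java RTS runtime, so this sketch only mirrors the priority split, and the class name is hypothetical-- the setup looks like this:

```java
public class PrioritySplitSketch {
    public static void main(String[] args) throws Exception {
        // Business-logic worker at normal priority (one of the ~400 CPU-bound threads).
        Thread worker = new Thread(() -> { /* CPU-bound business logic */ });
        worker.setPriority(Thread.NORM_PRIORITY);

        // Communication handler at the highest priority available to application
        // threads. Under Java RTS this would be a RealtimeThread that preempts
        // every non-real-time thread, giving the senders/receivers first claim
        // on the CPU whenever a message is due.
        Thread comm = new Thread(() -> { /* send message, read ack, record RTT */ });
        comm.setPriority(Thread.MAX_PRIORITY);

        worker.start();
        comm.start();
        worker.join();
        comm.join();
        System.out.println("comm priority = " + comm.getPriority()); // 10 on a standard JVM
    }
}
```

The key design point carried over from the proof-of-concept is the split itself: communication threads strictly above the bulk workload, so message handling is never queued behind business logic.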
When we first ran the tests, we saw, after a while, periodic delays in communication; a message round trip between the servers could take more than 500 ms. Monitoring the systems, we saw a correlation between those delays and real-time application threads being blocked by GC threads, whose priority is raised when memory comes under severe stress. Such situations can be a sign of a memory leak.
We used DTrace --a hugely popular Solaris 10 innovation-- to troubleshoot this first source of communication delays. We root-caused the problem to unintentionally large String allocations, easily tracked down with 2 DTrace scripts using the RTS probes. First, we ran Script 1, which prints the number of allocated bytes per thread, every second. Next, we ran Script 2, which prints the top allocating stack trace, each second, for a specified thread. In 5-10 minutes, we were able to locate the problematic memory allocations within the code: mainly a recursive concatenation of ever-growing strings, which caused garbage collection to block the application threads from time to time, leading to significant delays.
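The allocation pattern looked roughly like the following hypothetical reconstruction (not the partner's actual code). Repeated String concatenation in a loop copies the whole accumulated string on every pass, generating quadratic short-lived garbage that keeps the collector busy, whereas a StringBuilder allocates amortized-linearly:

```java
public class ConcatGarbageSketch {
    // Each += copies the entire accumulated string into a fresh buffer, so n
    // iterations allocate O(n^2) bytes of garbage. Under memory stress the GC
    // threads' raised priority can then block even real-time threads.
    static String concatInLoop(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "x";
        }
        return s;
    }

    // StringBuilder grows its internal buffer geometrically: amortized O(n)
    // allocation for the same result, and far less pressure on the collector.
    static String buildInLoop(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("equal: " + concatInLoop(n).equals(buildInLoop(n)));
        System.out.println("length = " + buildInLoop(n).length());
    }
}
```

In the proof-of-concept the fix was simply to eliminate the ever-growing concatenation, which removed the memory stress that had been promoting the GC threads above the application's real-time threads.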
[DTrace Script 1] [DTrace Script 2]
A second source of communication delays came from the CPUs handling interrupts. Since the application's real-time threads had a higher priority, they could and would preempt the TCP squeue processing threads, thus getting in the way of TCP/IP processing. Luckily, Solaris has a long-standing feature, processor sets, that enables binding specific processes to a given set of processors, or, conversely, isolating those processors from the rest of the system -- a capability one could count among the foundations of Solaris real time. In this proof-of-concept, we defined a processor set of 7 cores on each machine, bound the application process to this set (psrset -c 1-7; psrset -b 1 <JRTS pid>), and moved all interrupts away from the application's processor set (psrset -f 1).
At the end of the tuning exercise, we contained the maximum latency for a message round trip, including simple processing, under 45 ms for 50KB messages and under 80 ms for 120KB messages --the target was under 100 ms for all messages! Processors ran mostly in the range of 60-80% CPU utilization. The remaining (incompressible) delays were attributed to TCP/IP networking, none of the application's critical threads being interrupted. The system integrator was very satisfied with the proof-of-concept and ordered multiple licenses for commercial development, which is ongoing as we speak. Java RTS kinda brought the best of both worlds: the coding productivity of the Java language --which translates into lower cost of development and maintenance for the software house-- and the performance and predictability typically achieved by a native language like C.
After initial inroads into the Finance industry --check out this previous proofpoint at Gigaspaces--, it looks like Java Real Time is gaining broader adoption in software development. What could it do for you? Java RTS is available for download for a free evaluation, and you can get community support on the Sun Forums. Give it a try!