Talend Integration Suite optimized on Solaris
By Frederic Pariente on Mar 01, 2011
Continuing in the spirit of the Tunathon program (an innovative engineering program to study and tune application performance on Solaris, run at Sun Microsystems in the early 2000s), we at ISV Engineering still run "Tunathon" projects with our partners today, i.e. tuning their applications on Solaris; we have about 5 in flight right now. Tunathon efforts are in fact more and more relevant as computers become more complex, scalable and heterogeneous; think e.g. of a 4-socket quad-core dual-thread system with extra GPU engines. Developers have the impossible job of releasing new business logic faster and faster, while keeping their code fully optimized and scalable on systems that they never get their hands on to test scalability in the first place. And programming frameworks, good for developer productivity and code quality, come as additional layers that can make debugging and optimization a real nightmare.
Recently, Talend, a fast-growing ISV positioned by Gartner in the “Visionaries” quadrant of the “Magic Quadrant for Data Integration Tools”, contacted us to report a serious performance issue at one of their customers, a large bank, using the Talend Integration Suite application on a large 32-way quad-core SPARC M-Series server. Although fully multi-threaded, the software simply did not scale on such a large system. We got on it right away, set up a 128-thread Sun T5140 system in our Lab to reproduce the problem, and took a closer look at the Java code. 99% of the time was spent in a hot routine that used hardly any CPU time, memory or IO bandwidth. It turned out that all threads were fighting for the same resource, a Java synchronized map, where every access locks the entire map, which created heavy contention. We removed this contention by using a thread-safe hash map available since Java SE 5.0, called ConcurrentHashMap. This map allows multiple threads to perform updates concurrently, as long as they are not updating the same segment of the map. A single thread therefore locks only the part it needs to change, not the entire map. The degree of parallelism (the concurrency level), 16 by default, can be set in the constructor, e.g. 256 in the example below:
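// Before: every access locks the entire map
private static Map syncMap = Collections.synchronizedMap(new HashMap());
// After: initial capacity 16, load factor 0.75, concurrency level 256
private static ConcurrentHashMap<Integer, Integer> syncMap =
new ConcurrentHashMap<Integer, Integer>(16, 0.75f, 256);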
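To make the difference between the two map types concrete, here is a minimal sketch, not the Talend code. It assumes Java 8 or later (for lambdas and `Map.merge`), and the class name `MapContentionSketch`, the method `hammer`, and the thread and iteration counts are all made up for illustration. Several threads increment counters under disjoint keys, first in a synchronized map and then in a `ConcurrentHashMap`. Both runs must produce the same total; only the elapsed time differs, and by how much depends entirely on the machine.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MapContentionSketch {

    // Starts `threads` workers, each incrementing counters under its own
    // range of keys `perThread` times, then returns the sum of all counters.
    public static long hammer(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * 1000; // disjoint key range per worker
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // merge() is atomic on both map types used in main()
                    map.merge(base + (i % 1000), 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        long total = 0;
        for (int v : map.values()) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int threads = 8;          // illustrative values, not from the article
        int perThread = 200_000;

        // One lock guards the whole map: workers queue up behind each other.
        Map<Integer, Integer> locked =
                Collections.synchronizedMap(new HashMap<Integer, Integer>());
        long t0 = System.nanoTime();
        long lockedTotal = hammer(locked, threads, perThread);
        long t1 = System.nanoTime();

        // Finer-grained locking: workers on different keys rarely collide.
        Map<Integer, Integer> concurrent =
                new ConcurrentHashMap<Integer, Integer>(16, 0.75f, 256);
        long concurrentTotal = hammer(concurrent, threads, perThread);
        long t2 = System.nanoTime();

        System.out.println("synchronizedMap:   " + lockedTotal
                + " updates in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("ConcurrentHashMap: " + concurrentTotal
                + " updates in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

One caveat if you try this on a recent JVM: since Java 8, `ConcurrentHashMap` no longer uses fixed segments, and the third constructor argument is treated only as a sizing hint. The tuning value of 256 mattered on the Java versions current when this work was done.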
The root cause of this concurrency issue was not easily detectable by the developers because it arose through the use of Hibernate, a popular framework that facilitates the storage and retrieval of Java objects via object-relational mapping. While the developers believed they were using lazy loading mode (a design pattern commonly used to defer initialization of an object until the point at which it is needed), we were surprised to see that each time a user object was used, all the user data and the data of its dependent objects were loaded from the database, put into synchronized Java maps and accessed by numerous threads. To make matters worse, this data loading process was triggered every minute for each connected user, just to count the number of active users logged into the system. We were able to radically change the way active users were counted by means of a simple select count(*) JDBC call to the database.
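For readers less familiar with the lazy loading pattern mentioned above, the following is a minimal plain-Java sketch of the idea. It is not how Hibernate implements it and not the Talend code: the names `LazyLoadingSketch`, `LazyUser`, `expensiveLoad` and `connectedUsers` are hypothetical, and the "database" is simulated by a counter. It shows that counting users should never trigger a load, and that a lazily loaded object pays the loading cost only once, on first access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyLoadingSketch {

    // Counts how many times the simulated expensive load has run.
    public static final AtomicInteger LOADS = new AtomicInteger();

    // Hypothetical user object whose details are fetched only on first access.
    public static final class LazyUser {
        private final String name;
        private final Supplier<List<String>> loader;
        private List<String> details; // stays null until first requested

        public LazyUser(String name, Supplier<List<String>> loader) {
            this.name = name;
            this.loader = loader;
        }

        public String name() {
            return name;
        }

        // Loads the details once, on demand, then reuses the cached result.
        public synchronized List<String> details() {
            if (details == null) {
                details = loader.get();
            }
            return details;
        }
    }

    // Stand-in for a database round trip that fetches dependent objects.
    public static List<String> expensiveLoad(String name) {
        LOADS.incrementAndGet();
        List<String> rows = new ArrayList<>();
        rows.add(name + ":profile");
        rows.add(name + ":permissions");
        return rows;
    }

    // Builds user objects without loading any of their details.
    public static List<LazyUser> connectedUsers(String... names) {
        List<LazyUser> users = new ArrayList<>();
        for (String n : names) {
            users.add(new LazyUser(n, () -> expensiveLoad(n)));
        }
        return users;
    }

    public static void main(String[] args) {
        List<LazyUser> users = connectedUsers("alice", "bob", "carol");
        // Counting users touches no details, so no load is triggered.
        System.out.println("active users: " + users.size() + ", loads: " + LOADS.get());
        users.get(0).details();
        users.get(0).details(); // second call reuses the cached details
        System.out.println("loads after two accesses: " + LOADS.get());
    }
}
```

In this sketch the count comes from the size of an in-memory list; in the real fix, the equivalent step was delegated to the database with a count query, so no user objects had to be materialized at all.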
The figure below shows the improvements in log-in time and server ping time brought by the Tunathon changes. Note the logarithmic scale: we were really hitting a scalability wall in the original code! These changes completely removed the performance bottleneck and are included in the latest release, 4.0.2, of Talend Integration Suite.
As an ISV supporting Solaris, you do not need to wait for your customers to hit the scalability limits of your application on their own large enterprise servers. Oracle has an affordable line of SPARC T-Series systems, including the world's first 16-core processor, that you can use to stress your application in-house: a T5140 system packs 128 threads into only 1U of a standard rack, and a T3-1 system packs 128 threads into 2U. At ISV Engineering, we welcome opportunities to work with you on such Tunathon projects.