Query optimization using composite indexes
By xmarch on Oct 17, 2013
Many times, during the process of creating a new Coherence application, developers do not pay attention to the way cache queries are constructed; they only check that these queries comply with functional specs. Later, performance testing shows that these perform poorly and it is then when developers start working on improvements until the non-functional performance requirements are met.
This post describes the optimization process of a real-life scenario, where using a composite attribute index has brought a radical improvement in query execution times. The execution times went down from 4 seconds to 2 milliseconds!
E-commerce solution based on Oracle ATG – Endeca
In the context of a new e-commerce solution based on Oracle ATG – Endeca, Oracle Coherence has been used to calculate and store SKU prices. In this architecture, a Coherence cache stores the final SKU prices used for Endeca baseline indexing.
Each SKU price is calculated from a base SKU price and a series of calculations based on information from corporate global discounts. Corporate global discounts information is stored in an auxiliary Coherence cache with over 800.000 entries. In particular, to obtain each price the process needs to execute six queries over the global discount cache.
After the implementation was finished, we discovered that the most expensive steps in the price calculation discount process were the global discounts cache query. This query has 10 parameters and is executed 6 times for each SKU price calculation.
The steps taken to optimise this query are described below;
Initial query was:
With the small dataset used for development the cache queries performed very well. However, when carrying out performance testing with a real-world sample size of 800,000 entries, each query execution was taking more than 4 seconds.
First round of optimizations
The first optimisation step was the creation of separate Coherence index for each of the 10 attributes used by the filter. This avoided object deserialization while executing the query. Each index was created as follows:
After adding these indexes the query execution time was reduced to between 450 ms and 1s. However, these execution times were still not good enough.
Second round of optimizations
In this optimisation phase a Coherence query explain plan was used to identify how many entires each index reduced the results set by, along with the cost in ms of executing that part of the query. Though the explain plan showed that all the indexes for the query were being used, it also showed that the ordering of the query parameters was "sub-optimal". Parameters associated to object attributes with high-cardinality should appear at the beginning of the filter, or more specifically, the attributes that filters out the highest of number records should be placed at the beginning.
But examining corporate global discount data we realized that depending on the values of the parameters used in the query the “good” order for the attributes was different. In particular, if the attributes brand and family had specific values it was more optimal to have a different query changing the order of the attributes. Ultimately, we ended up with three different optimal variants of the query that were used in its relevant cases:
Using the appropriate query depending on the value of brand and family parameters the query execution time dropped to between 100 ms and 150 ms. But these these execution times were still not good enough and the solution was cumbersome.
Third and last round of optimizations
The third and final optimization was to introduce a composite index. However, this did mean that it was not possible to use the Coherence Query Language (CohQL), as composite indexes are not currently supporte in CohQL.
As the original query had 8 parameters using EqualsFilter, 1 using GreaterEqualsFilter and 1 using LessEqualsFilter, the composite index was built for the 8 attributes using EqualsFilter. The final query had an EqualsFilter for the multiple extractor, a GreaterEqualsFilter and a LessEqualsFilter for the 2 remaining attributes. All individual indexes were dropped except the ones being used for LessEqualsFilter and GreaterEqualsFilter. We were now running in an scenario with an 8-attributes composite filter and 2 single attribute filters.
The composite index created was as follows:
And the final query was:
Using this composite index the query improved dramatically and the execution time dropped to between 2 ms and 4 ms. These execution times completely met the non-functional performance requirements .
It should be noticed than when using the composite index the order of the attributes inside the ValueExtractor was not relevant.