Query optimization using composite indexes

Many times, during the process of creating a new Coherence application, developers do not pay attention to the way cache queries are constructed; they only check that these queries comply with functional specs. Later, performance testing shows that these perform poorly and it is then when developers start working on improvements until the non-functional performance requirements are met.

This post describes the optimization process of a real-life scenario, where using a composite attribute index has brought a radical improvement in query execution times.  The execution times went down from 4 seconds to 2 milliseconds!

E-commerce solution based on Oracle ATG – Endeca

In the context of a new e-commerce solution based on Oracle ATG – Endeca, Oracle Coherence has been used to calculate and store SKU prices. In this architecture, a Coherence cache stores the final SKU prices used for Endeca baseline indexing.

Each SKU price is calculated from a base SKU price and a series of calculations based on information from corporate global discounts. Corporate global discounts information is stored in an auxiliary Coherence cache with over 800.000 entries. In particular, to obtain each price the process needs to execute six queries over the global discount cache.

After the implementation was finished, we discovered that the most expensive steps in the price calculation discount process were the global discounts cache query. This query has 10 parameters and is executed 6 times for each SKU price calculation.

The steps taken to optimise this query are described below;

Starting point

Initial query was:

String filter = "levelId = :iLevelId AND  salesCompanyId = :iSalesCompanyId AND salesChannelId = :iSalesChannelId "
+ "AND departmentId = :iDepartmentId AND familyId = :iFamilyId AND brand = :iBrand AND manufacturer = :iManufacturer "
+ "AND areaId = :iAreaId AND endDate >=  :iEndDate AND startDate <= :iStartDate";

Map<String, Object> params = new HashMap<String, Object>(10);

// Fill all parameters.

params.put("iLevelId", xxxx);

// Executing filter.

Filter globalDiscountsFilter = QueryHelper.createFilter(filter, params);

NamedCache globalDiscountsCache = CacheFactory.getCache(CacheConstants.GLOBAL_DISCOUNTS_CACHE_NAME);

Set applicableDiscounts = globalDiscountsCache.entrySet(globalDiscountsFilter);

With the small dataset used for development the cache queries performed very well. However, when carrying out performance testing with a real-world sample size of 800,000 entries, each query execution was taking more than 4 seconds.

First round of optimizations

The first optimisation step was the creation of separate Coherence index for each of the 10 attributes used by the filter. This avoided object deserialization while executing the query. Each index was created as follows:

globalDiscountsCache.addIndex(new ReflectionExtractor("getXXX" ) , false, null);

After adding these indexes the query execution time was reduced to between 450 ms and 1s. However, these execution times were still not good enough. 

Second round of optimizations

In this optimisation phase a Coherence query explain plan was used to identify how many entires each index reduced the results set by, along with the cost in ms of executing that part of the query. Though the explain plan showed that all the indexes for the query were being used, it also showed that the ordering of the query parameters was "sub-optimal".  Parameters associated to object attributes with high-cardinality should appear at the beginning of the filter, or more specifically, the attributes that filters out the highest of number records should be placed at the beginning.

But examining corporate global discount data we realized that depending on the values of the parameters used in the query the “good” order for the attributes was different. In particular, if the attributes brand and family had specific values it was more optimal to have a different query changing the order of the attributes. Ultimately, we ended up with three different optimal variants of the query that were used in its relevant cases:

String filter = "brand = :iBrand AND familyId = :iFamilyId AND departmentId = :iDepartmentId AND levelId = :iLevelId "
+ "AND manufacturer = :iManufacturer AND endDate >= :iEndDate AND salesCompanyId = :iSalesCompanyId "
+ "AND areaId = :iAreaId AND salesChannelId = :iSalesChannelId AND startDate <= :iStartDate";

String filter = "familyId = :iFamilyId AND departmentId = :iDepartmentId AND levelId = :iLevelId AND brand = :iBrand "
+ "AND manufacturer = :iManufacturer AND endDate >=  :iEndDate AND salesCompanyId = :iSalesCompanyId "
+ "AND areaId = :iAreaId  AND salesChannelId = :iSalesChannelId AND startDate <= :iStartDate";

String filter = "brand = :iBrand AND departmentId = :iDepartmentId AND familyId = :iFamilyId AND levelId = :iLevelId "
+ "AND manufacturer = :iManufacturer AND endDate >= :iEndDate AND salesCompanyId = :iSalesCompanyId "
+ "AND areaId = :iAreaId AND salesChannelId = :iSalesChannelId AND startDate <= :iStartDate";

Using the appropriate query depending on the value of brand and family parameters the query execution time dropped to between 100 ms and 150 ms. But these these execution times were still not good enough and the solution was cumbersome.

Third and last round of optimizations

The third and final optimization was to introduce a composite index. However, this did mean that it was not possible to use the Coherence Query Language (CohQL), as composite indexes are not currently supporte in CohQL.

As the original query had 8 parameters using EqualsFilter, 1 using GreaterEqualsFilter and 1 using LessEqualsFilter, the composite index was built for the 8 attributes using EqualsFilter. The final query had an EqualsFilter for the multiple extractor, a GreaterEqualsFilter and a LessEqualsFilter for the 2 remaining attributes.  All individual indexes were dropped except the ones being used for LessEqualsFilter and GreaterEqualsFilter. We were now running in an scenario with an 8-attributes composite filter and 2 single attribute filters.

The composite index created was as follows:

ValueExtractor[] ve = { new ReflectionExtractor("getSalesChannelId" ), new ReflectionExtractor("getLevelId" ),
   new ReflectionExtractor("getAreaId" ), new ReflectionExtractor("getDepartmentId" ),
   new ReflectionExtractor("getFamilyId" ), new ReflectionExtractor("getManufacturer" ),
   new ReflectionExtractor("getBrand" ), new ReflectionExtractor("getSalesCompanyId" )};

MultiExtractor me = new MultiExtractor(ve);

NamedCache globalDiscountsCache = CacheFactory.getCache(CacheConstants.GLOBAL_DISCOUNTS_CACHE_NAME);

globalDiscountsCache.addIndex(me, false, null);

And the final query was:

ValueExtractor[] ve = { new ReflectionExtractor("getSalesChannelId" ), new ReflectionExtractor("getLevelId" ),
   new ReflectionExtractor("getAreaId" ), new ReflectionExtractor("getDepartmentId" ),
   new ReflectionExtractor("getFamilyId" ), new ReflectionExtractor("getManufacturer" ),
   new ReflectionExtractor("getBrand" ), new ReflectionExtractor("getSalesCompanyId" )};

MultiExtractor me = new MultiExtractor(ve);

// Fill composite parameters.
String SalesCompanyId = xxxx;
...
AndFilter composite = new AndFilter(new EqualsFilter(me,
                  Arrays.asList(iSalesChannelId, iLevelId, iAreaId, iDepartmentId, iFamilyId, iManufacturer, iBrand, SalesCompanyId)),
                                     new GreaterEqualsFilter(new ReflectionExtractor("getEndDate" ), iEndDate));

AndFilter finalFilter = new AndFilter(composite, new LessEqualsFilter(new ReflectionExtractor("getStartDate" ), iStartDate));

NamedCache globalDiscountsCache = CacheFactory.getCache(CacheConstants.GLOBAL_DISCOUNTS_CACHE_NAME);

Set applicableDiscounts = globalDiscountsCache.entrySet(finalFilter);     

Using this composite index the query improved dramatically and the execution time dropped to between 2 ms and  4 ms.  These execution times completely met the non-functional performance requirements :).

It should be noticed than when using the composite index the order of the attributes inside the ValueExtractor was not relevant.

Comments:

Post a Comment:
  • HTML Syntax: NOT allowed
About

Oracle Coherence, the market leading in-memory datagrid product.
Stay Connected...

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
4
5
6
7
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today