Sunday Apr 27, 2014

Real Time Association Rules Recommendation Engine

This blog shows how you can write a SQL query for Association Rules recommendation; such a query can be used to recommend products (cross-sell) to a customer based on products already placed in his current shopping cart.  Before we can perform the recommendation, we need to build an association rules model that based on previous customer sales transactions. For the demo, I will use the SALES and PRODUCTS tables found in the sample SH schema as input data and build the association model using the free Oracle Data Miner GUI tool.

Association Rules Model Workflow

The SALES table contains time based (TIME_ID) sales transactions of all customers (CUST_ID) product purchases (PROD_ID). The actual product names can be found in the PRODUCTS table, so we join these two tables to get the sales transactions with real product names (instead of looking up the product names using the PROD_ID later).

Enter the following Transaction ids (CUST_ID, TIME_ID) and item id (PROD_NAME) in the Association Rule Build node editor.

Enter the Maximum Rule length of 2 and Minimum Confidence and Support as followings. The lower the Confidence and Support percents will yield more rules; the higher the percents will yield fewer rules. We want the generated rules to have one Antecedent to one Consequent, so we set the Maximum Rule length to 2.

SQL Query for Recommendation

The following SQL query returns the top 3 products recommendation based on products placed in the customer’s current shopping cart.

SELECT rownum AS rank,

  consequent  AS recommendation

FROM

(

  WITH rules AS (

    SELECT AR.rule_id AS "ID",

      ant_pred.attribute_subname antecedent,

      cons_pred.attribute_subname consequent,

      AR.rule_support support,

      AR.rule_confidence confidence

    FROM TABLE(dbms_data_mining.get_association_rules('AR_RECOMMENDATION')) AR,

      TABLE(AR.antecedent) ant_pred,

      TABLE(AR.consequent) cons_pred

  ),

  cust_data AS (

    SELECT 'Comic Book Heroes' AS prod_name FROM DUAL

    UNION

    SELECT 'Martial Arts Champions' AS prod_name FROM DUAL

  )

  SELECT rules.consequent,

    MAX(rules.confidence) max_confidence,

    MAX(rules.support) max_support

  FROM rules, cust_data

  WHERE cust_data.prod_name = rules.antecedent

  AND rules.consequent NOT IN (SELECT prod_name FROM cust_data)

  GROUP BY rules.consequent

  ORDER BY max_confidence DESC, max_support DESC

)

WHERE rownum <=3;


The above SQL query consists of 3 main sections: association rules, current customer data, and product recommendation.

Association Rules

The first section returns the associated rules (antecedent, consequent) and associated confidence and support values discovered by the model (AR_RECOMMENDATION) that was built in the above workflow. You may find the DBMS_DATA_MINING.GET_ASSOCIATION_RULES function reference here.

  WITH rules AS (

    SELECT AR.rule_id AS "ID",

      ant_pred.attribute_subname antecedent,

      cons_pred.attribute_subname consequent,

      AR.rule_support support,

      AR.rule_confidence confidence

    FROM TABLE(dbms_data_mining.get_association_rules('AR_RECOMMENDATION')) AR,

      TABLE(AR.antecedent) ant_pred,

      TABLE(AR.consequent) cons_pred

Current Customer Data

The middle section defines the current customer product selection on the fly (real time). For example, we assume this customer placed the 'Comic Book Heroes' and 'Martial Arts Champions' products in the current shopping cart.

  cust_data AS (

    SELECT 'Comic Book Heroes' AS prod_name FROM DUAL

    UNION

    SELECT 'Martial Arts Champions' AS prod_name FROM DUAL

  )

Product Recommendation

Last but not least is the query to return the recommended products based on the discovered rules and current customer product selection. It is possible that the rules may suggest the same product (consequent) for different customer products (prod_name), so we aggregate the consequents using the MAX function on the confidence and support values. In case of duplicate recommendations, we just use the max confidence and support values for comparison. Moreover, we don’t want the recommended products that are already placed in the customer’s shopping cart, so we add the “NOT IN (SELECTprod_name FROM cust_data)” condition. Finally, the query returns the recommendations in the order of highest confident and support first.

  SELECT rules.consequent,

    MAX(rules.confidence) max_confidence,

    MAX(rules.support) max_support

  FROM rules, cust_data

  WHERE cust_data.prod_name = rules.antecedent

  AND rules.consequent NOT IN (SELECT prod_name FROM cust_data)

  GROUP BY rules.consequent

  ORDER BY max_confidence DESC, max_support DESC

The recommendation query returns the following recommendations for the 'Comic Book Heroes' and 'Martial Arts Champions' products.

RANK   RECOMMENDATION

---------- --------------------------------

         1   Xtend Memory

         2   Endurance Racing

         3   Adventures with Numbers

Alternative SQL Query for Recommendation

The first recommendation query may not be scalable; it returns all possible rules to be processed by the recommendation sub query. The more scalable approach is to push as much processing to the GET_ASSOCIATION_RULES function as possible, so that it returns minimal set of rules for further processing. Here we specify the topn=10, min_confidence=0.1, min_support=0.01, sort_order='RULE_CONFIDENCE DESC', 'RULE_SUPPORT DESC', and the antecedent items to the function, and let it finds the top 10 set of rules that satisfy these criteria. Once we obtain the refined rule set, we filter out recommendations that already in the customer’s shopping cart and also perform aggregation (use MAX() function) on the confidence and support values. Finally, we query the top 3 recommendations based on the order of highest confident and support first.

SELECT rownum AS rank,

  consequent  AS recommendation

FROM

  (SELECT cons_pred.attribute_subname consequent,

    MAX(AR.rule_support) max_support,

    MAX(AR.rule_confidence) max_confidence

  FROM TABLE (DBMS_DATA_MINING.GET_ASSOCIATION_RULES ( 'AR_RECOMMENDATION', 10, NULL, 0.1, 0.01, 2, 1, 

                 ORA_MINING_VARCHAR2_NT ( 'RULE_CONFIDENCE DESC', 'RULE_SUPPORT DESC'), 

                 DM_ITEMS(DM_ITEM('PROD_NAME', 'Comic Book Heroes', NULL, NULL), 

                          DM_ITEM('PROD_NAME', 'Martial Arts Champions', NULL, NULL)), NULL, 1)) AR, TABLE(AR.consequent) cons_pred

  WHERE cons_pred.attribute_subname NOT IN ('Comic Book Heroes', 'Martial Arts Champions')

  GROUP BY cons_pred.attribute_subname

  ORDER BY max_confidence DESC, max_support DESC

  )

WHERE rownum <=3;


Note: another consideration is to order the rules by the lift value; the higher the lift value the more accurate the recommendation.

SQL Query for Recommendation using Customer Previous Sales Transactions
I am going to extend the above recommendation SQL query to include the customer previous sales transactions, so that the recommendation is now based on the previous purchased products and the products in the current shopping cart. Moreover, we don’t want any recommended products that have been purchased previously or already placed in the current shopping cart. For this example, we use a window of 12 months since the last customer purchase as the past sales history used for recommendation.

To include the customer sales history (assume cust_id = 3), a hist_cust_data sub query is added to obtain the previously purchased products. A tot_cust_data sub query is added to include both the products in the current shopping cart and the previously purchased products. The following query returns top 3 recommendations based on customer previously purchased products in the last 12 months and the products in the current shopping cart.

SELECT rownum AS rank, consequent AS recommendation

FROM

(

  WITH rules AS (

    SELECT AR.rule_id AS "ID",

      ant_pred.attribute_subname antecedent,

      cons_pred.attribute_subname consequent,

      AR.rule_support support,

      AR.rule_confidence confidence,

      AR.rule_lift lift

    FROM TABLE(dbms_data_mining.get_association_rules('AR_RECOMMENDATION')) AR,

         TABLE(AR.antecedent) ant_pred,

         TABLE(AR.consequent) cons_pred

  ),

  cur_cust_data AS (

    SELECT 'Comic Book Heroes' AS PROD_NAME FROM DUAL

    UNION

    SELECT 'Martial Arts Champions' AS PROD_NAME FROM DUAL

  ),

  hist_cust_data AS(

    SELECT

      DISTINCT PROD_NAME

    FROM sh.sales s, sh.products p

    WHERE cust_id = 3

      AND s.prod_id = p.prod_id

      -- customer historical purchase for last 12 months

      AND time_id  >= add_months((SELECT MAX(time_id) FROM sh.sales WHERE cust_id = 3), -12) 

  ),

  tot_cust_data AS (

    SELECT PROD_NAME FROM cur_cust_data

    UNION

    SELECT PROD_NAME FROM hist_cust_data

  )

  SELECT rules.consequent,

    SUM(rules.lift) lift_sum,

    SUM(rules.confidence) confidence_sum,

    SUM(rules.support) support_sum

  FROM rules, tot_cust_data

  WHERE tot_cust_data.prod_name = rules.antecedent

    -- don't recommend products that customer already owned or about to purchase  

    AND rules.consequent NOT IN (SELECT prod_name FROM tot_cust_data) 

  GROUP BY rules.consequent

  ORDER BY lift_sum DESC, confidence_sum DESC, support_sum DESC

)

WHERE rownum <= 3;

Conclusion

This blog shows a few examples of how you can write a recommendation SQL query with different flavors (with or without historical sales transactions). You may also consider assign a profit for each product, so that you may come up with a query that returns the top most profitable product recommendations.

About

Everything about Oracle Data Mining, a component of the Oracle Advanced Analytics Option - News, Technical Information, Opinions, Tips & Tricks. All in One Place

Search

Categories
Archives
« August 2015
SunMonTueWedThuFriSat
      
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
     
Today