Tuesday Jan 14, 2014

How to export data from the Explore Node using Data Miner and SQL Developer

Blog posting by Denny Wong, Principal Member of Technical Staff, User Interfaces and Components, Oracle Data Mining Development

The Explorer node generates descriptive statistical data and histogram data for all input table columns.  These statistical and histogram data may help user to analyze the input data to determine if any action (e.g. transformation) is needed before using it for data mining purpose.  An analyst may want to export this data to a file for offline analysis (e.g. Excel) or reporting purpose.  The Explorer node generates this data to a database table specified in the Output tab of the Property Inspector.  In this case, the data is generated to a table named “OUTPUT_1_2”.


To export the table to a file, we can use the SQL Developer Export wizard. Go to the Connections tab in the Navigator Window, search for the table “OUTPUT_1_2” within the proper connection, then bring up the pop-up menu off the table. Click on the Export menu to launch the Export Wizard.


In the wizard, uncheck the “Export DDL” and select the “Export Data” option since we are only interested in the data itself. In the Format option, select “excel” in this example (a dozen of output formats are supported) and specify the output file name. Upon wizard finish, an excel file is generated.


Let’s open the file to examine what is in it. As expected, it contains all statistical data for all input columns. The histogram data is listed as the last column (HISTOGRAMS), and it has this ODMRSYS.ODMR_HISTOGRAMS structure.


For example, let’s take a closer look at the histogram data for the BUY_INSURANCE column:

ODMRSYS.ODMR_HISTOGRAMS(ODMRSYS.ODMR_HISTOGRAM_POINT('"BUY_INSURANCE"',''No'',NULL,NULL,73.1),ODMRSYS.ODMR_HISTOGRAM_POINT('"BUY_INSURANCE"',''Yes'',NULL,NULL,26.9))

This column contains an ODMRSYS.ODMR_HISTOGRAMS object which is an array of ODMRSYS.ODMR_HISTOGRAM_POINT structure. We can describe the structure to see what is in it.


The ODMRSYS.ODMR_HISTOGRAM_POINT contains five attributes, which represent the histogram data. The ATTRIBUTE_NAME contains the attribute name (e.g. BUY_INSURANCE), the ATTRIBUTE_VALUE contains the attribute values (e.g. No, Yes), the GROUPING_ATTRIBUTE_NAME and GROUPING_ ATTRIBUTE_VALUE are not used (these fields are used when the Group By option is specified), and the ATTRIBUTE_PERCENT contains the percents (e.g. 73.1, 26.9) for the attribute values respectively.


As you can see the ODMRSYS.ODMR_HISTOGRAMS complex output format may be difficult to read and it may require some processing before the data can be used. Alternatively, we can “unnest” the histogram data to transactional data format before exporting it. This way we don’t have to deal with the complex array structure, thus the data is more consumable. To do that, we can write a simple SQL query to “unnest” the data and use the new SQL Query node (Extract histogram data) to run this query (see below). We then use a Create Table node (Explorer output table) to persist the “unnested” histogram data along with the statistical data.

1. Create a SQL Query node

Create a SQL Query node and connect the “Explore Data” node to it. You may rename the SQL Query node to “Extract histogram data” to make it clear it is used to “unnest” the histogram data.

2. Specify a SQL query to “unnest” histogram data

Double click the “Extract histogram data” node to bring up the editor, enter the following SELECT statement in the editor:

SELECT
    "Explore Data_N$10002"."ATTR",
    "Explore Data_N$10002"."AVG",
    "Explore Data_N$10002"."DATA_TYPE",
    "Explore Data_N$10002"."DISTINCT_CNT",
    "Explore Data_N$10002"."DISTINCT_PERCENT",
    "Explore Data_N$10002"."MAX",
    "Explore Data_N$10002"."MEDIAN_VAL",
    "Explore Data_N$10002"."MIN",
    "Explore Data_N$10002"."MODE_VALUE",
    "Explore Data_N$10002"."NULL_PERCENT",
    "Explore Data_N$10002"."STD",
    "Explore Data_N$10002"."VAR",
    h.ATTRIBUTE_VALUE,
    h.ATTRIBUTE_PERCENT
FROM
    "Explore Data_N$10002", TABLE("Explore Data_N$10002"."HISTOGRAMS") h

Click OK to close the editor. This query is used to extract out the ATTRIBUTE_VALUE and ATTRIBUTE_PERCENT fields from the ODMRSYS.ODMR_HISTOGRAMS nested object.

Note: you may select only columns that contain the statistics you are interested in.  The "Explore Data_N$10002" is a generated unique name reference to the Explorer node, you may have a slightly different name ending with some other unique number. 

The query produces the following output.  The last two columns are the histogram data in transactional format.

3. Create a Create Table node to persist the “unnested” histogram data

Create a Create Table node and connect the “Extract histogram data” node to it. You may rename the Create Table node to “Explorer output table” to make it clear it is used to persist the “unnested” histogram data.


4. Export “unnested” histogram data to Excel file

Run the “Explorer output table” node to persist the “unnested” histogram data to a table. The name of the output table (OUTPUT_3_4) can be found in the Property Inspector below.


Next, we can use the SQL Developer Export wizard as described above to export the table to an Excel file. As you can see the histogram data are now in transactional format; they are more readable and can readily be consumed.


About

Everything about Oracle Data Mining, a component of the Oracle Advanced Analytics Option - News, Technical Information, Opinions, Tips & Tricks. All in One Place

Search

Categories
Archives
« January 2014 »
SunMonTueWedThuFriSat
   
1
2
3
4
5
6
7
8
9
10
11
12
13
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
 
       
Today