X

Learn about Oracle Machine Learning for Oracle Database and Big Data, on-premises and Oracle Cloud

  • January 14, 2014

How to export data from the Explore Node using Data Miner and SQL Developer

Charlie Berger
Sr. Dir. Product Management, Machine Learning, AI and Cognitive Analytics

Blog posting by Denny Wong, Principal Member of Technical Staff, User Interfaces and Components, Oracle Data Mining Development

The Explorer node generates descriptive statistical data and histogram data for all input table columns.  These statistical and histogram data may help user to analyze the input data to determine if any action (e.g. transformation) is needed before using it for data mining purpose.  An analyst may want to export this data to a file for offline analysis (e.g. Excel) or reporting purpose.  The Explorer node generates this data to a database table specified in the Output tab of the Property Inspector.  In this case, the data is generated to a table named “OUTPUT_1_2”.


 

To export the table to a file, we can use the SQL Developer Export wizard. Go to the Connections tab in the Navigator Window, search for the table “OUTPUT_1_2” within the proper connection, then bring up the pop-up menu off the table. Click on the Export menu to launch the Export Wizard.


 

In the wizard, uncheck the “Export DDL” and select the “Export Data” option since we are only interested in the data itself. In the Format option, select “excel” in this example (a dozen of output formats are supported) and specify the output file name. Upon wizard finish, an excel file is generated.


 

Let’s open the file to examine what is in it. As expected, it contains all statistical data for all input columns. The histogram data is listed as the last column (HISTOGRAMS), and it has this ODMRSYS.ODMR_HISTOGRAMS structure.


 

For example, let’s take a closer look at the histogram data
for the BUY_INSURANCE column:

ODMRSYS.ODMR_HISTOGRAMS(ODMRSYS.ODMR_HISTOGRAM_POINT('"BUY_INSURANCE"',''No'',NULL,NULL,73.1),ODMRSYS.ODMR_HISTOGRAM_POINT('"BUY_INSURANCE"',''Yes'',NULL,NULL,26.9))

This column contains an ODMRSYS.ODMR_HISTOGRAMS object which is an
array of ODMRSYS.ODMR_HISTOGRAM_POINT structure. We can describe the structure to see what is
in it.


 

The ODMRSYS.ODMR_HISTOGRAM_POINT
contains five attributes, which represent the histogram data. The ATTRIBUTE_NAME contains the attribute
name (e.g. BUY_INSURANCE), the ATTRIBUTE_VALUE contains the attribute values
(e.g. No, Yes), the GROUPING_ATTRIBUTE_NAME and GROUPING_ ATTRIBUTE_VALUE are
not used (these fields are used when the Group By option is specified), and the
ATTRIBUTE_PERCENT contains the percents (e.g. 73.1, 26.9) for the attribute
values respectively.


 

As you can see the ODMRSYS.ODMR_HISTOGRAMS
complex output format may be difficult to read and it may require some
processing before the data can be used. Alternatively, we can “unnest” the histogram data to transactional data format
before exporting it. This way we don’t
have to deal with the complex array structure, thus the data is more
consumable. To do that, we can write a
simple SQL query to “unnest” the data and use the new SQL Query node (Extract
histogram data) to run this query (see below). We then use a Create Table node (Explorer output table) to persist the
“unnested” histogram data along with the statistical data.

1. Create a SQL Query node

Create a SQL Query node and
connect the “Explore Data” node to it. You may rename the SQL Query node to “Extract histogram data” to make it
clear it is used to “unnest” the histogram data.

 

2. Specify a SQL query to “unnest” histogram data

Double click the “Extract histogram data” node to bring up the editor, enter
the following SELECT statement in the editor:

SELECT
    "Explore Data_N$10002"."ATTR",
    "Explore Data_N$10002"."AVG",
    "Explore Data_N$10002"."DATA_TYPE",
    "Explore Data_N$10002"."DISTINCT_CNT",
    "Explore Data_N$10002"."DISTINCT_PERCENT",
    "Explore Data_N$10002"."MAX",
    "Explore Data_N$10002"."MEDIAN_VAL",
    "Explore Data_N$10002"."MIN",
    "Explore Data_N$10002"."MODE_VALUE",
    "Explore Data_N$10002"."NULL_PERCENT",
    "Explore Data_N$10002"."STD",
    "Explore Data_N$10002"."VAR",
    h.ATTRIBUTE_VALUE,
    h.ATTRIBUTE_PERCENT
FROM
    "Explore Data_N$10002", TABLE("Explore Data_N$10002"."HISTOGRAMS") h

Click OK to close the editor. This query is used to extract out the ATTRIBUTE_VALUE and ATTRIBUTE_PERCENT fields from the ODMRSYS.ODMR_HISTOGRAMS nested object.

Note: you may select only columns that contain the statistics you are interested in.  The "Explore Data_N$10002" is a generated unique name reference to the Explorer node, you may have a slightly different name ending with some other unique number. 

The query produces the following output.  The last two columns are the histogram data
in transactional format.

 

3. Create a Create Table node to persist the “unnested”
histogram data

Create a Create Table node
and connect the “Extract histogram data” node to it. You may rename the Create Table node to “Explorer
output table” to make it clear it is used to persist the “unnested” histogram
data.


 

4. Export “unnested” histogram data to Excel file

Run the “Explorer output table” node to persist the
“unnested” histogram data to a table. The name of the output table (OUTPUT_3_4) can be found in the Property
Inspector below.


 

Next, we can use the SQL Developer Export wizard as
described above to export the table to an Excel file. As you can see the histogram data are now in
transactional format; they are more readable and can readily be consumed.


 

 

 

Join the discussion

Comments ( 1 )
  • AXEL DAVID VELAZQUEZ Thursday, January 16, 2014

    Very interesting and useful.

    thanks for sharing!


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.