Thursday Sep 27, 2012

ROracle support for TimesTen In-Memory Database

Today's guest post comes from Jason Feldhaus, a Consulting Member of Technical Staff in the TimesTen Database organization at Oracle.  He shares with us a sample session using ROracle with the TimesTen In-Memory database. 

Beginning in version 1.1-4, ROracle includes support for the Oracle Times Ten In-Memory Database, version 11.2.2. TimesTen is a relational database providing very fast and high throughput through its memory-centric architecture.  TimesTen is designed for low latency, high-volume data, and event and transaction management. A TimesTen database resides entirely in memory, so no disk I/O is required for transactions and query operations. TimesTen is used in applications requiring very fast and predictable response time, such as real-time financial services trading applications and large web applications. TimesTen can be used as the database of record or as a relational cache database to Oracle Database.

ROracle provides an interface between R and the database, providing the rich functionality of the R statistical programming environment using the SQL query language. ROracle uses the OCI libraries to handle database connections, providing much better performance than standard ODBC.

The latest ROracle enhancements include:

  • Support for Oracle TimesTen In-Memory Database
  • Support for Date-Time using R's POSIXct/POSIXlt data types
  • RAW, BLOB and BFILE data type support
  • Option to specify number of rows per fetch operation
  • Option to prefetch LOB data
  • Break support using Ctrl-C
  • Statement caching support

Times Ten 11.2.2 contains enhanced support for analytics workloads and complex queries:

  • Analytic functions: AVG, SUM, COUNT, MAX, MIN, DENSE_RANK, RANK, ROW_NUMBER, FIRST_VALUE and LAST_VALUE
  • Analytic clauses: OVER PARTITION BY and OVER ORDER BY
  • Multidimensional grouping operators:
  • Grouping clauses: GROUP BY CUBE, GROUP BY ROLLUP, GROUP BY GROUPING SETS
  • Grouping functions: GROUP, GROUPING_ID, GROUP_ID
  • WITH clause, which allows repeated references to a named subquery block
  • Aggregate expressions over DISTINCT expressions
  • General expressions that return a character string in the source or a pattern within the LIKE predicate
  • Ability to order nulls first or last in a sort result (NULLS FIRST or NULLS LAST in the ORDER BY clause)

Note: Some functionality is only available with Oracle Exalytics, refer to the TimesTen product licensing document for details.

Connecting to TimesTen is easy with ROracle.  Simply install and load the ROracle package and load 
the driver.
  
   > install.packages("ROracle")
   > library(ROracle)
    Loading required package: DBI
   > drv <- dbDriver("Oracle")

Once the ROracle package is installed, create a database connection object and connect to a 
TimesTen direct driver DSN as the OS user.

   > conn <- dbConnect(drv, username ="", password="",
                   dbname = "localhost/SampleDb_1122:timesten_direct")

You have the option to report the server type - Oracle or TimesTen?

   > print (paste ("Server type =", dbGetInfo (conn)$serverType))
   [1] "Server type = TimesTen IMDB"

To create tables in the database using R data frame objects, use the function dbWriteTable.  
In the following example we write the built-in iris data frame to TimesTen. The iris data set is 
a small example data set containing 150 rows and 5 columns. We include it here not to highlight 
performance, but so users can easily run this example in their R session.

   > dbWriteTable (conn, "IRIS", iris, overwrite=TRUE, ora.number=FALSE)
   [1] TRUE

Verify that the newly created IRIS table is available in the database. To list the available tables and 
table columns in the database, use dbListTables and dbListFields, respectively.

   > dbListTables (conn)
   [1] "IRIS"
   > dbListFields (conn, "IRIS")
   [1] "SEPAL.LENGTH" "SEPAL.WIDTH"  "PETAL.LENGTH" "PETAL.WIDTH"  "SPECIES"

To retrieve a summary of the data from the database we need to save the results to a local object. The 
following call saves the results of the query as a local R object, iris.summary. The ROracle function 
dbGetQuery is used to execute an arbitrary SQL statement against the database. When connected to 
TimesTen, the SQL statement is processed completely within main memory for the fastest response 
time.
   
   > iris.summary <- dbGetQuery(conn, 'SELECT
                                       SPECIES,
                                        AVG ("SEPAL.LENGTH") AS AVG_SLENGTH,
                                        AVG ("SEPAL.WIDTH") AS AVG_SWIDTH,
                                        AVG ("PETAL.LENGTH") AS AVG_PLENGTH,
                                        AVG ("PETAL.WIDTH") AS AVG_PWIDTH
                                       FROM IRIS
                                       GROUP BY ROLLUP (SPECIES)')

   > iris.summary
        SPECIES AVG_SLENGTH AVG_SWIDTH AVG_PLENGTH AVG_PWIDTH
   1     setosa    5.006000   3.428000       1.462   0.246000
   2 versicolor    5.936000   2.770000       4.260   1.326000
   3  virginica    6.588000   2.974000       5.552   2.026000
   4       <NA>    5.843333   3.057333       3.758   1.199333


Finally, disconnect from the TimesTen Database.

   > dbCommit (conn)
   [1] TRUE
   > dbDisconnect (conn)
   [1] TRUE

We encourage you download Oracle software for evaluation from the Oracle Technology Network. See these links for our software: Times Ten In-Memory DatabaseROracle.  As always, we welcome comments and questions on the TimesTen and  Oracle R technical forums.







Sunday Sep 23, 2012

Video: Analyzing Big Data using Oracle R Enterprise

Learn how Oracle R Enterprise is used to generate new insight and new value to business, answering not only what happened, but why it happened. View this YouTube Oracle Channel video overview describing how analyzing big data using Oracle R Enterprise is different from other analytics tools at Oracle.

Oracle R Enterprise (ORE),  a component of the Oracle Advanced Analytics Option, couples the wealth of analytics packages in R with the performance, scalability, and security of Oracle Database. ORE executes base R functions transparently on database data without having to pull data from Oracle Database. As an embedded component of the database, Oracle R Enterprise can run your R script and open source packages via embedded R where the database manages the data served to the R engine and user-controlled data parallelism. The result is faster and more secure access to data. ORE also works with the full suite of in-database analytics, providing integrated results to the analyst.



Monday Sep 17, 2012

Podcast interview with Michael Kane

In this podcast interview with Michael Kane, Data Scientist and Associate Researcher at Yale University, Michael discusses the R statistical programming language, computational challenges associated with big data, and two projects involving data analysis he conducted on the stock market "flash crash" of May 6, 2010, and the tracking of transportation routes bird flu H5N1. Michael also worked with Oracle on Oracle R Enterprise, a component of the Advanced Analytics option to Oracle Database Enterprise Edition. In the closing segment of the interview, Michael comments on the relationship between the data analyst and the database administrator and how Oracle R Enterprise provides secure data management, transparent access to data, and improved performance to facilitate this relationship.

Listen now...

About

The place for best practices, tips, and tricks for applying Oracle R Enterprise, Oracle R Distribution, ROracle, and Oracle R Advanced Analytics for Hadoop in both traditional and Big Data environments.

Search

Archives
« September 2012 »
SunMonTueWedThuFriSat
      
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
18
19
20
21
22
24
25
26
28
29
30
      
Today