Oracle's Strategy for Advanced Analytics
By Mark Hornick-Oracle on Apr 16, 2014
At Oracle, our goal is to enable you to get timely insight from all of your data. We continuously enhance Oracle Database so that workloads that have traditionally required extracting data from the database can run in place. We do this to narrow the gap between the data that is available and the insights that can be obtained from it, because any data movement introduces latency, added complexity from more moving parts, the ensuing need for data reconciliation and governance, and increased cost. The Oracle tool set considers the needs of all types of enterprise users: those who prefer GUI-based access to analytics with smart defaults and heuristics out of the box, those who choose to work interactively and quantitatively with data using R, and those who prefer SQL and focus on operationalizing models.
Oracle recognized the need to support data analysts, statisticians, and data scientists with a widely used and rapidly growing statistical programming language. Oracle chose R, recognizing it as the new de facto standard for computational statistics and advanced analytics. Oracle supports R in at least three ways:
- R as the language of interaction with the database
- R as the language in which analytics can be written and executed in the database as a high performance computing platform
- R as the language in which several native high performance analytics have been written that execute in the database
Additionally, of course, you may choose to use any of the CRAN algorithms in R scripts executed at the database server, taking advantage of several forms of data parallelism.
Oracle released Oracle R Distribution, providing the first and only supported commercial distribution of R from an established company. In 2012, Oracle embarked on the Hadoop journey, acknowledging the alternative data management options emerging in open source for managing unstructured or not-yet-structured data. In keeping with our strategy of delivering analytics close to where data is stored, Oracle extended Advanced Analytics capabilities to execute on HDFS-resident data in Hadoop environments. R has been integrated into Hadoop in exactly the same manner as it has been with the database.
Realizing that data is stored in both database and non-database environments, Oracle gives users options for where to store their data (Oracle Database, HDFS, and Spark RDD), where to perform computations (in the database or on the Hadoop cluster), and where to store results (Oracle Database or HDFS). Users can write R scripts that can be leveraged across database and Hadoop environments. Oracle Database, as a preferred location for storing R scripts, data, and result objects, provides a real-time scoring and deployment platform. It is also easy to create a model factory environment with authorization, roles, and privileges, combined with auditing, backup, recovery, and security.
Oracle provides a common infrastructure that supports both in-database and custom R algorithms. Oracle also provides an integrated GUI for business users, so in-database analytics can be reached through both R-based and GUI-based access. A major part of Oracle's strategy is to maintain agility in our portfolio of supported techniques and to remain responsive to customer needs.