Predicting Energy Demand using IoT
The Internet of Things (IoT) presents new opportunities for applying advanced analytics. Sensors are everywhere collecting data – on airplanes, trains, and cars, in semiconductor production machinery and the Large Hadron Collider, and even in our homes. One such sensor is the home energy smart meter, which can report household energy consumption every 15 minutes. This data enables energy companies not only to model each customer's energy consumption patterns, but also to forecast individual usage. Across all customers, energy companies can compute aggregate demand, enabling more efficient deployment of personnel and the redirection or purchase of energy, often days or weeks in advance.
Building one predictive model per customer, when an energy company can have millions of customers, poses some interesting challenges. Consider an energy company with 1 million customers. At one reading every 15 minutes, these smart meters collect over 35 billion readings in a single year, yet each customer generates only about 35,000 readings. On most hardware, R can easily build a model on 35,000 readings. However, if each model takes even 10 seconds to build, building all 1 million models serially would require roughly 116 days. Since the results are needed a few days or weeks out, a delay of months makes this project a non-starter. If powerful hardware, such as Oracle Exadata, is leveraged to compute these models in parallel, say with a degree of parallelism of 128, all models can be computed in less than one day.
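The arithmetic behind these figures can be sketched in a few lines of R (the numbers are purely those of this illustrative example):

```r
# Readings: one every 15 minutes, per customer, per year
readings_per_customer <- 4 * 24 * 365      # 35,040 readings
customers             <- 1e6
readings_per_customer * customers          # ~35 billion readings per year

# Serial model building at 10 seconds per model
customers * 10 / 86400                     # ~116 days

# With a degree of parallelism of 128
customers * 10 / 128 / 3600                # ~21.7 hours, i.e., under one day
```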
While users can leverage parallelism enabled by various R packages, there are several factors that need to be taken into account. For example, what happens if certain models fail? Will the models be stored as 1 million separate flat files – one per customer? For flat files, how will backup, recovery, and security be handled? How can these models be used for forecasting customer usage and where will the forecasts be stored? How can these R models be incorporated into a production environment where applications and dashboards normally work with SQL?
Using the Embedded R Execution capability of Oracle R Enterprise, Data Scientists can focus on the task of building a model for a single customer. This model-building script is stored in the R Script Repository in Oracle Database. ORE enables invoking this script through a single function, ore.groupApply, relying on the database to spawn multiple R engines, pass one partition of the data to each invocation of the Data Scientist's function, and store the resulting model immediately in the R Datastore, again in Oracle Database. This greatly simplifies computing and storing the models. Moreover, the standard database backup and recovery mechanisms already in place apply, avoiding the need to devise separate special practices. Forecasting using these models is handled in an analogous way.
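A minimal sketch of this pattern follows. It assumes an ORE session already connected to the database, and a hypothetical CUST_USAGE table proxied as an ore.frame with a CUST_ID partitioning column; the column names and model formula are illustrative, not from the original example:

```r
library(ORE)

# Build one lm() model per customer: the database partitions CUST_USAGE
# by CUST_ID and runs the function in multiple parallel R engines.
modList <- ore.groupApply(
  CUST_USAGE,                     # ore.frame proxy for the database table
  INDEX = CUST_USAGE$CUST_ID,     # one data partition per customer
  function(dat) {
    # Illustrative per-customer model of usage by time features
    lm(USAGE_KWH ~ HOUR + DAY_OF_WEEK, data = dat)
  },
  parallel = TRUE)                # let the database parallelize the work

# Persist the collection of models in the R Datastore in Oracle Database,
# where standard backup, recovery, and security mechanisms apply
ore.save(modList, name = "custUsageModels", overwrite = TRUE)
```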
To put these R scripts into production, users can invoke the same R scripts produced by the Data Scientist from SQL, both for the model building and forecasting. The forecasts can be immediately available as a database table that can be read by applications and dashboards, or used in other SQL queries. In addition, these SQL statements that invoke the R functions can be scheduled for periodic execution using the DBMS_SCHEDULER package of Oracle Database.
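As a hedged sketch of the SQL side (table, script, procedure, and job names are illustrative, not from the original example), a stored R script can be invoked through ORE's SQL table functions such as rqTableEval, and the invoking statement scheduled with DBMS_SCHEDULER:

```sql
-- Invoke a stored R script (here named 'ForecastUsage') from SQL.
-- The forecast comes back as rows shaped by the output-spec query
-- (third argument), so applications and dashboards can query it directly.
SELECT *
  FROM table(rqTableEval(
         cursor(SELECT * FROM CUST_USAGE),
         cursor(SELECT 1 "ore.connect" FROM dual),  -- enable datastore access
         'SELECT SYSDATE forecast_day, 1 forecast_kwh FROM dual',
         'ForecastUsage'));

-- Schedule a wrapper procedure (hypothetical forecast_usage_proc) that
-- runs the statement above, e.g., daily at 2 a.m.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'FORECAST_USAGE_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN forecast_usage_proc; END;',
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',
    enabled         => TRUE);
END;
/
```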
Leveraging the built-in functionality of ORE, Data Scientists, application developers, and administrators do not have to reinvent the complex code and testing strategies often devised anew for each project. Instead, they benefit from Oracle's integration of R with Oracle Database to easily design and implement R-based solutions for use with applications and dashboards, and to scale to the enterprise.