Simple and Advanced Time series with Oracle R Enterprise

This guest post from Marcos Arancibia describes how to use Oracle R Enterprise to analyze Time Series data.

In this article, we give an overview of how to use Time Series Analysis against data stored in Oracle Database, using the Embedded R Execution capability to send time series computations to the Oracle Database server instead processing at the client. We will also learn how to retrieve the final series or forecasts and retrieve them to the client for plotting, forecasting, and diagnosing.

One key thing to keep in mind when using Time Series techniques with data that is stored in Oracle Database is the order of the rows, or records. Because of the parallel capabilities of Oracle Database, when queried for records, one might end up receiving records out of order if an option for order is not specified.

Simple Example using Stock Data

Let’s start with a simple Time Series example. First we will need to connect to our Oracle Database using ORE. Then, using the package TTR, we will access Oracle Stock data from YahooData service, from January 1, 2008 to January 1, 2013 and push it to the database.

# Load the ORE library and connect to Oracle Database

library(ORE)

ore.connect("myuser","mysid","myserver","mypass",port=1521,all=TRUE)

library(TTR)

# Get data in XTS format

xts.orcl <- getYahooData("ORCL", 20080101, 20130101)

# Convert it to a data frame and gets the date

# Makes the date the Index

df.orcl <- data.frame(xts.orcl)

df.orcl$date <- (data.frame(date=index(xts.orcl))$date)

# Create/overwrite data in Oracle Database

# to a Table called ORCLSTOCK

ore.drop(table="ORCLSTOCK")

ore.create(df.orcl,table="ORCLSTOCK")

# IMPORTANT STEP!!!

# Ensure indexing is kept by date

rownames(ORCLSTOCK) <- ORCLSTOCK$date

# Ensure the data is in the DB

ore.ls()

# Review column names, data statistics and

# print a sample of the data

names(ORCLSTOCK)

>names(ORCLSTOCK)

[1] "Open" "High" "Low" "Close" "Volume"

[6] "Unadj.Close" "Div" "Split" "Adj.Div" "date"

summary(ORCLSTOCK$Close)

>summary(ORCLSTOCK$Close)

Min. 1st Qu. Median Mean 3rd Qu. Max.

13.36 20.53 24.22 24.79 29.70 35.73

head(ORCLSTOCK)

>head(ORCLSTOCK)

Open High Low Close Volume

2008-01-02 01:00:00 21.74414 22.00449 21.58022 21.68629 44360179

2008-01-03 01:00:00 21.62843 22.28413 21.62843 22.28413 43600532

2008-01-04 01:00:00 21.95628 22.06235 21.01130 21.24272 46391263

2008-01-07 01:00:00 21.17523 21.67664 21.01130 21.45486 41527032

2008-01-08 01:00:00 21.44522 21.52236 20.38453 20.39417 45155398

2008-01-09 01:00:00 20.57738 20.91487 20.39417 20.83773 49750304

Unadj.Close Div Split Adj.Div date

2008-01-02 01:00:00 22.49 NA NA NA 2008-01-02

2008-01-03 01:00:00 23.11 NA NA NA 2008-01-03

2008-01-04 01:00:00 22.03 NA NA NA 2008-01-04

2008-01-07 01:00:00 22.25 NA NA NA 2008-01-07

2008-01-08 01:00:00 21.15 NA NA NA 2008-01-08

2008-01-09 01:00:00 21.61 NA NA NA 2008-01-09

Pull data from the database for a simple plot

# Pull data from Oracle Database (only the necessary columns)

orcl <- ore.pull(ORCLSTOCK[,c("date","Close","Open","Low","High")])

# Simple plot with base libraries - Closing

plot(orcl$date,orcl$Close,type="l",col="red",xlab="Date",ylab="US$",

main="Base plot:Daily ORACLE Stock Closing points")

# Simple plot with base libraries - Other Series

plot(orcl$date,orcl$Open,type="l",col="blue",xlab="Date",ylab="US$",

main="Base plot:Daily ORACLE Stock: Open/High/Low points")

lines(orcl$date,orcl$High,col="green")

lines(orcl$date,orcl$Low,col="orange")

legend("topleft", c("Opening","High","Low"),

col=c("blue","green","orange"),lwd=2,title = "Series",bty="n")

A different plot option, using the package xts

library(xts)

# Pull data from Oracle Database (only the necessary columns)

orcl <- ore.pull(ORCLSTOCK[,c("date","Close","Open","Low","High")])

# Convert data to Time Series format

orcl.xts <- as.xts(orcl,order.by=orcl$date,dateFormat="POSIXct")

# Plot original series

plot(orcl.xts$Close,major.ticks='months',minor.ticks=FALSE,

main="Time Series plot:Daily ORACLE Stock Closing points",col="red")

Simple Time Series: Moving Average Smoothing

We might be tempted to call functions like the Smoothing Moving Average from open-source CRAN packages against Oracle Database Tables, but those packages do not know what to do with an “ore.frame”. For that process to work correctly, we can either load the data locally or send the process for remote execution on the Database Server by using Embedded R Execution.

We will also explore the built-in Moving Average process from ore.rollmean() as a third alternative.

ALTERNATIVE 1 - The first example is pulling the data from Oracle Database into a ts (time series) object first, for a Client-side smoothing Process.

library(TTR)

# Pull part of the database table into a local data.frame

sm.orcl <- ore.pull(ORCLSTOCK[,c("date","Close")])

# Convert "Close" attribute into a Time Series (ts)

ts.orcl <- ts(sm.orcl$Close)

# Use SMA - Smoothing Moving Average algorithm from package TTR

ts.sm.orcl <-ts(SMA(ts.orcl,n=30),frequency=365, start=c(2008,1) )

# Plot both Series together

plot(sm.orcl$date,sm.orcl$Close,type="l",col="red",xlab="Date",ylab="US$",

main="ORCL Stock Close CLIENT-side Smoothed Series n=30 days")

lines(sm.orcl$date,ts.sm.orcl,col="blue")

legend("topleft", c("Closing","MA(30) of Closing"),

col=c("red","blue"),lwd=2,title = "Series",bty="n")

ALTERNATIVE 2 – In this alternative, we will use a Server-side example for running the Smoothing via Moving Average, without bringing all data to the client. Only the result is brought locally for plotting. Remember that the TTR package has to be installed on the Server in order to be called.

# Server execution call using ore.tableApply

# Result is an ore.list that remains in the database until needed

sv.orcl.ma30 <-

ore.tableApply(ORCLSTOCK[,c("date","Close")],ore.connect = TRUE,

function(dat) {

library(TTR)

ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

list(res1 <- ts(ordered$Close,frequency=365, start=c(2008,1)),

res2 <- ts(SMA(res1,n=30),frequency=365, start=c(2008,1)),

res3 <- ordered$date)

}

);

# Bring the results locally for plotting

local.orcl.ma30 <- ore.pull(sv.orcl.ma30)

# Plot two series side by side

# (the third element of the list is the date)

plot(local.orcl.ma30[[3]],local.orcl.ma30[[1]],type="l",

col="red",xlab="Date",ylab="US$",

main="ORCL Stock Close SERVER-side Smoothed Series n=30 days")

# Add smoothed series

lines(local.orcl.ma30[[3]],

local.orcl.ma30[[2]],col="blue",type="l")

# Add legend

legend("topleft", c("Closing","Server MA(30) of Closing"),

col=c("red","blue"), lwd=2,title = "Series", bty="n")

ALTERNATIVE 3 – In this alternative we will use a Server-side example with the computation of Moving Averages using the native ORE in-Database functions without bringing data to the client. Only the result is brought locally for plotting.

Just one line of code is needed to generate an in-Database Computation of Moving averages and the creation of a new VIRTUAL column in the Oracle Database. We will call this new column rollmean30.

We will use the function ore.rollmean(). The option align="right" makes the MA look at only the past k days (30 in this case), or less, depending on the point in time. This creates a small difference between this method and the previous methods in the beginning of the series, since ore.rollmean() can actually calculate the first sets of days using smaller sets of data available, while other methods discard this data.

# Moving Average done directly in Oracle Database

ORCLSTOCK$rollmean30 <- ore.rollmean(ORCLSTOCK$Close, k = 30, align="right")

# Check that new variable is in the database

head(ORCLSTOCK)

>head(ORCLSTOCK)

Open High Low Close Volume

2008-01-02 01:00:00 21.74414 22.00449 21.58022 21.68629 44360179

2008-01-03 01:00:00 21.62843 22.28413 21.62843 22.28413 43600532

2008-01-04 01:00:00 21.95628 22.06235 21.01130 21.24272 46391263

2008-01-07 01:00:00 21.17523 21.67664 21.01130 21.45486 41527032

2008-01-08 01:00:00 21.44522 21.52236 20.38453 20.39417 45155398

2008-01-09 01:00:00 20.57738 20.91487 20.39417 20.83773 49750304

Unadj.Close Div Split Adj.Div date rollmean30

2008-01-02 01:00:00 22.49 NA NA NA 2008-01-02 21.68629

2008-01-03 01:00:00 23.11 NA NA NA 2008-01-03 21.98521

2008-01-04 01:00:00 22.03 NA NA NA 2008-01-04 21.73771

2008-01-07 01:00:00 22.25 NA NA NA 2008-01-07 21.66700

2008-01-08 01:00:00 21.15 NA NA NA 2008-01-08 21.41243

2008-01-09 01:00:00 21.61 NA NA NA 2008-01-09 21.31665

# Get results locally for plotting

local.orcl <- ore.pull(ORCLSTOCK[,c("date","Close", "rollmean30")])

sub.orcl <- subset(local.orcl,local.orcl$date> as.Date("2011-12-16"))

# Plot the two series side by side

# First plot original series

plot(local.orcl$date, local.orcl$Close,type="l", col="red",xlab="Date",ylab="US$",

main="ORCL Stock Close ORE Computation of Smoothed Series n=30 days")

# Add smoothed series

lines(local.orcl$date,local.orcl$rollmean30,col="blue",type="l")

# Add legend

legend("topleft", c("Closing","ORE MA(30) of Closing"),

col=c("red","blue"),lwd=2,title = "Series",bty="n")

Seasonal Decomposition for Time Series Diagnostics

Now that we have learned how to execute these processes using Embedded R, we can start using other methodologies required for Time Series using the same Server-side computation and local plotting.

It is typical for an analyst to try to understand a Time Series better by looking at some of the basic diagnostics like the Seasonal Decomposition of Time Series by Loess. These can be achieved by using the stl() command in the following process:

# Server execution

sv.orcl.dcom <-

ore.tableApply(ORCLSTOCK[,c("date","Close")],ore.connect = TRUE,

function(dat) {

ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

ts.orcl <- ts(ordered$Close,frequency=365, start=c(2008,1))

res <- stl(ts.orcl,s.window="periodic")

}

);

# Get result for plotting

local.orcl.dcom <- ore.pull(sv.orcl.dcom)

plot(local.orcl.dcom, main="Server-side Decomposition of ORCL Time-Series",col="blue")

Another typical set of diagnostic charts includes Autocorrelation and Partial Autocorrelation function plots. These can be achieved by using the acf() command with the proper options in Embedded R Execution, so computations happen at the Oracle Database server:

# Server-side ACF and PACF computation

# Use function acf() and save result as a list

sv.orcl.acf <-

ore.tableApply(ORCLSTOCK[,c("date","Close")],ore.connect=TRUE,

function(dat){

ts.orcl <- ts(dat$Close,frequency=365, start=c(2008,1))

list(res1 <- acf(ts.orcl,lag.max=120,type="correlation"),res2 <- acf(ts.orcl,lag.max=30, type="partial"))

}

);

# Get results for plotting

# ACF and PACF as members of the list pulled

local.orcl.acf <- ore.pull(sv.orcl.acf)

plot(local.orcl.acf[[1]],main="Server-side ACF Analysis for Series ORCL",col="blue",lwd=2)

plot(local.orcl.acf[[2]],main="Server-side PACF Analysis for Series ORCL",col="blue",lwd=5)

Simple Exponential Smoothing

Using the popular package “forecast”, we will use the ses() function to calculate a 90 days horizon (h=90) into the future, using the option criterion=MSE for the model. The package forecast needs to be installed on the Oracle Database server R engine.

Then, we will bring the resulting model locally for plotting. Remember to load the library “forecast” locally as well, to be able to interpret the meaning of the ses() output when it’s brought locally.

# Execute ses() call in the server

sv.orcl.ses <-

ore.tableApply(ORCLSTOCK[,c("date","Close")], ore.connect=TRUE,

function(dat) {

library(forecast)

ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

ts.orcl <- ts(ordered$Close,frequency=365, start=c(2008,1) )

res <- ses(ts.orcl, h=90, alpha=0.1, initial="simple")

}

);

# Get SES result locally for plotting

# Since remote object contains a SES model from package forecast,

# load package locally as well

library(forecast)

plot.orcl.ses <- ore.pull(sv.orcl.ses)

plot(plot.orcl.ses,col="blue",fcol="red",

main="ORCL with Server-side SES - Simple Exponential Smoothing Forecast")

Holt Exponential Smoothing

Using the popular package “forecast”, we will use the holt() function to calculate a 90 days horizon (h=90) into the future, requesting the Intervals of confidence of 80 and 95%. Again. the package “forecast” needs to be installed on the Oracle Database server R engine.

Then, we will bring the resulting model locally for plotting. Remember to load the library forecast locally as well, to be able to interpret the meaning of the holt() output when it’s brought locally.

# Execute holt() call in the server

sv.orcl.ets <-

ore.tableApply(ORCLSTOCK[,c("date","Close")], ore.connect=TRUE,

function(dat) {

library(forecast)

ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

ts.orcl <- ts(ordered$Close,frequency=365, start=c(2008,1))

res <- holt(ts.orcl, h=90, level=c(80,95), initial="optimal")

}

);

# Get resulting model from the server

# Since remote object contains a Holt Exponential Smoothing

# model from package forecast, load package locally as well

library(forecast)

local.orcl.ets <- ore.pull(sv.orcl.ets)

plot(local.orcl.ets,col="blue",fcol="red",

main="ORCL Original Series Stock Close with Server-side Holt Forecast")

ARIMA – Auto-Regressive Interactive Moving Average

There are at least two options for fitting an ARIMA model into a Time Series. One option is to use the package “forecast”, that allows for an automatic arima fitting (auto.arima) to find the best parameters possible based on the series.

For more advanced users, the arima() function in the “stats” package itself allows for choosing the model parameters.

# ARIMA models on the server using auto.arima() from package forecast

arimaModel <-

ore.tableApply(ORCLSTOCK[,c("date","Close")], ore.connect=TRUE,

FUN = function(dat){

# load forecast library to use auto.arima

library(forecast)

# sort the table into a temp file by date

ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

# convert column into a Time Series

# format ts(...) and request creation of an automatic

# ARIMA model auto.arima(...)

res <- auto.arima(ts(ordered$Close,frequency=365, start=c(2008,1)),

stepwise=TRUE, seasonal=TRUE)

})

# Alternative using the arima() from package “stats”.

arimaModel <-

ore.tableApply(ORCLSTOCK[,c("date","Close")],ore.connect=TRUE,

FUN = function(dat){

# sort table into a temp file by date

ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

# convert column into a Time Series

# format ts(...) and request creation of a specific

# ARIMA model using arima(), for example an ARIMA(2,1,2)

res <- arima(ts(ordered$Close,frequency=365, start=c(2008,1)),

order = c(2,1,2))

})

# Load forecast package locally to use the model

# for plotting and producing forecasts

library(forecast)

# Show remote resulting Time Series model

>arimaModel

Series: ts(ordered$Close, frequency = 365, start = c(2008, 1))

ARIMA(2,1,0)

Coefficients:

ar1 ar2

-0.0935 -0.0192

s.e. 0.0282 0.0282

sigma^2 estimated as 0.2323: log likelihood=-866.77

AIC=1739.55 AICc=1739.57 BIC=1754.96

# Get remote model using ore.pull for local prediction and plotting

local.arimaModel <- ore.pull(arimaModel)

# Generate forecasts for the next 15 days

fore.arimaModel <- forecast(local.arimaModel, h=15)

# Use the following option if you need to remove scientific notation of

# numbers that are too large in charts

options(scipen=10)

# Generate the plot of forecasts, including interval of confidence

# Main title is generated automatically indicating the type of model

# chosen by the Auto ARIMA process

plot(fore.arimaModel,type="l", col="blue", xlab="Date",

ylab="Closing value (US$)", cex.axis=0.75, font.lab="serif EUC",

sub="Auto-generated ARIMA for ORCL Stock Closing"

)

# Generate and print forecasted data points plus standard errors

# of the next 15 days

forecasts <- predict(local.arimaModel, n.ahead = 15)

>forecasts

$pred

Time Series:

Start = c(2011, 165)

End = c(2011, 179)

Frequency = 365

[1] 33.29677 33.29317 33.29395 33.29395 33.29393 33.29393 33.29393 33.29393 33.29393 33.29393 33.29393

[12] 33.29393 33.29393 33.29393 33.29393

$se

Time Series:

Start = c(2011, 165)

End = c(2011, 179)

Frequency = 365

[1] 0.4819417 0.6504925 0.7807798 0.8928901 0.9924032 1.0827998 1.1662115 1.2440430 1.3172839 1.3866617

[11] 1.4527300 1.5159216 1.5765824 1.6349941 1.6913898

Comments:

Post a Comment:
  • HTML Syntax: NOT allowed
About

The place for best practices, tips, and tricks for applying Oracle R Enterprise, Oracle R Distribution, ROracle, and Oracle R Advanced Analytics for Hadoop in both traditional and Big Data environments.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today