 ## Learn data science best practices

• September 10, 2018

# How to Calculate the Standard Deviation of a Financial Portfolio in R Welcome to the first installment of a three-part series dedicated to portfolio standard deviation, also known as volatility. In this series, you will learn to build a Shiny application in order to visualize total portfolio volatility over time, as well as how each asset has contributed to that volatility. You will accomplish this over the course of three posts, but if you just can’t wait, the final app is available here.

So, why is standard deviation import? Standard deviation is taken as the main measure of portfolio risk or volatility. To learn more about why, we have to head back to 1959 and read Markowitz’s monograph Portfolio Selection: Efficient Diversification of Investments, which talks about the means and variances of returns. In short, standard deviation measures the extent to which a portfolio’s returns are dispersed around its mean. If returns are more dispersed, the portfolio has a higher standard deviation and is seen as riskier or more volatile.

Our volatility project will proceed as follows:

• Part One (this post): Calculate portfolio standard deviation in several ways, visualize portfolio standard deviation, calculate rolling portfolio standard deviation, and visualize rolling portfolio standard deviation with ggplot2 and highcharter.

• Part Two (coming soon): Write a custom function to calculate individual asset contribution to standard deviation, write a custom function to calculate rolling asset contribution to standard deviation, and visualize rolling asset contribution to standard deviation.

• Part 3 (coming soon): Build a Shiny application where end user can construct a custom portfolio and visualize rolling standard deviation and rolling contribution over time.

With that, on to Part One!

Before we start the tutorial, let’s load the packages that we will need. If these are not already in your environment, you will need to run install.packages("package name") for each package:

``````
# install.packages("PerformanceAnalytics")
# install.packages("tidyverse")
# install.packages("tidyquant")
# install.packages("timetk")
# install.packages("tibbletime")
# install.packages("highcharter")
# install.packages("scales")

library(PerformanceAnalytics)
library(tidyverse)
library(tidyquant)
library(timetk)
library(tibbletime)
library(highcharter)
library(scales)
``````

Now we are ready to build our portfolio, which will consist of the following:

• SPY (S&P500 fund), weighted 25%

• EFA (a non-US equities fund), weighted 25%

• IJS (a small-cap value fund), weighted 20%

• EEM (an emerging-markets fund), weighted 20%

• AGG (a bond fund), weighted 10%

This is the same portfolio that I used in a previous post on the Sortino ratio. Similar to what we did in that project, we will need to calculate the monthly returns of the portfolio before we can calculate standard deviation. That means we first need to get the prices of each ETF over a certain time period, convert prices to the monthly returns for each ETF, and convert those individual returns to portfolio returns.

I won’t cover the logic again, but the bare code is in the code block below. For the curious, I have written posts on converting individual prices to monthly returns and on converting to portfolio returns.

``````
symbols <-
c("SPY","EFA", "IJS", "EEM","AGG")

prices <-
getSymbols(symbols,
src = 'yahoo',
from = "2012-12-31",
to = "2017-12-31",
auto.assign = TRUE,
warnings = FALSE) %>%
reduce(merge) %>%
`colnames<-`(symbols)

w <- c(0.25, 0.25, 0.20, 0.20, 0.10)

prices_monthly <-
to.monthly(prices, indexAt = "lastof", OHLC = FALSE)

asset_returns_xts <-
na.omit(Return.calculate(prices_monthly, method = "log"))

portfolio_returns_xts <-
Return.portfolio(asset_returns_xts, weights = w)

asset_returns_long <-
prices_monthly %>%
tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
gather(asset, returns, -date) %>%
group_by(asset) %>%
mutate(returns = (log(returns) - log(lag(returns)))) %>%
na.omit()

portfolio_returns_tidy <-
asset_returns_long %>%
tq_portfolio(assets_col = asset,
returns_col = returns,
weights = w,
col_rename = "returns")
``````

We now have two portfolio returns objects that hold the same data but have different structures. portfolio_returns_xts is an xts object, whereas portfolio_returns_tidy is a tibble (or data frame).

Notice that portfolio_returns_xts does not have a column called date, but rather a date index accessed by index(portfolio_returns_xts). That’s because, by definition, an xts object is a matrix with a date index. It cannot exist as a time-series object without the date index, and so the date index does not exist as a removable column.

portfolio_returns_tidy does have a date column, accessed by portfolio_returns_tidy\$date. A tibble does not need a date index and we could simply remove the date column and still have a tibble. That makes it a bit easier to understand, but it also means that many tibble-oriented functions are not built to handle time series. We will handle that with the tibbletime package.

If you work with financial time-series data, xts and tibble are two of the most common structures that you will see. Some people prefer to work with one or the other, but I see both of them in my industry work, so we will review how to work with both of them. That means we will be solving our substantive tasks twice and arriving at the exact same answer, which can get tedious — but it’s a good way to gain familiarity with different data structures.

## Standard Deviation with xts

Let’s start with our xts object and calculate standard deviation with the built-in StdDev() function from PerformanceAnalytics. The PerformanceAnalytics package has a lot of useful, built-in functions for running portfolio analysis — way too many to even start listing here — and I definitely recommend reading through the package documentation as you start your journey into R for finance.

To calculate the standard deviation of monthly returns, we pass portfolio_sd_xts to StdDev().

``````
portfolio_sd_xts <-
StdDev(portfolio_returns_xts)

portfolio_sd_xts <-
round(portfolio_sd_xts * 100, 2)

portfolio_sd_xts[1,1]
``````
``````
StdDev
2.66
``````

## Standard Deviation with tibbles

For our calculations on the portfolio_returns_tidy tibble, we use summarise() from the dplyr package and pass it the sd() function. Unlike the xts flow, the tibble flow does not know or care that this is time-series data. It is calculating the standard deviation of our returns column without regard for the date column.

``````
portfolio_sd_tidyverse <-
portfolio_returns_tidy %>%
summarise(dplyr = sd(returns)) %>%
mutate(dplyr = round(dplyr, 4) * 100)

portfolio_sd_tidyverse
`````` We can also use the tidyquant package to apply functions from the xts world to data from the tibble world. In this case, we will use tq_performance() to apply the table.Stats() function from PerformanceAnalytics. The table.Stats() function returns a table of statistics for the portfolio, but since we want only standard deviation, we will select() the Stdev column. Normally, table.Stats() would be called on an xts object, but tidyquant lets us operate on our tibble.

``````
portfolio_sd_tq <-
portfolio_returns_tidy %>%
tq_performance(Ra = returns,
Rb = NULL,
performance_fun = table.Stats) %>%
select(Stdev) %>%
mutate(tq_sd = round(Stdev, 4) * 100)
``````

This nicely demonstrates how tidyquant blends together the xts and tibble paradigms. We used a function from PerformanceAnalytics and applied it to a tibble of returns.

Let’s review our calculations:

``````
portfolio_sd_tidyverse %>%
mutate(xts = portfolio_sd_xts,
tq = portfolio_sd_tq\$tq_sd)
`````` We should now feel comfortable calculating portfolio standard deviation starting from different object types and using different code flows. Let’s see how this work can be visualized.

## Visualizing Standard Deviation

Visualizing standard deviation of portfolio returns comes down to visualizing the dispersion of portfolio returns. Let’s start with a scatter plot and pipe our tibble of returns to ggplot().

``````
portfolio_returns_tidy %>%
ggplot(aes(x = date, y = returns)) +
geom_point(color = "cornflowerblue") +
scale_x_date(breaks = pretty_breaks(n = 6)) +
ggtitle("Scatterplot of Returns by Date") +
theme(plot.title = element_text(hjust = 0.5))
`````` ## Dispersion of Portfolio Returns

At first glance — also known as using an unscientific eyeball test  it seems that 2015 was the riskiest year. Let’s add a different color for any monthly returns that are one standard deviation away from the mean.

First, we will create an indicator for the mean return with mean() and one for the standard deviation with sd(). We will call the variables mean_plot and sd_plot, since we plan to use them in a plot and not for anything else.

``````
sd_plot <-
sd(portfolio_returns_tidy\$returns)
mean_plot <-
mean(portfolio_returns_tidy\$returns)
``````

We want to shade the scatter points according to a returns distance from the mean. Accordingly, we mutate(), or create, three new columns based on if-else logic. If the return is one standard deviation below the mean, we want to add that observation to the column we call hist_col_red, else that column should have an N/A. We will create three new columns this way with if_else().

The chart below shows the results when we add color to these columns:

``````
portfolio_returns_tidy %>%
mutate(hist_col_red =
if_else(returns < (mean_plot - sd_plot),
returns, as.numeric(NA)),
hist_col_green =
if_else(returns > (mean_plot + sd_plot),
returns, as.numeric(NA)),
hist_col_blue =
if_else(returns > (mean_plot - sd_plot) &
returns < (mean_plot + sd_plot),
returns, as.numeric(NA))) %>%
ggplot(aes(x = date)) +

geom_point(aes(y = hist_col_red),
color = "red") +

geom_point(aes(y = hist_col_green),
color = "green") +

geom_point(aes(y = hist_col_blue),
color = "blue") +
labs(title = "Colored Scatter", y = "monthly returns") +
scale_x_date(breaks = pretty_breaks(n = 8)) +
theme(plot.title = element_text(hjust = 0.5))
`````` Let’s add a line for the value that is one standard deviation above and below the mean with geom_hline(yintercept = (mean_plot + sd_plot), color = "purple"...) + geom_hline(yintercept = (mean_plot - sd_plot), color = "purple"...) +.

``````
portfolio_returns_tidy %>%
mutate(hist_col_red =
if_else(returns < (mean_plot - sd_plot),
returns, as.numeric(NA)),
hist_col_green =
if_else(returns > (mean_plot + sd_plot),
returns, as.numeric(NA)),
hist_col_blue =
if_else(returns > (mean_plot - sd_plot) &
returns < (mean_plot + sd_plot),
returns, as.numeric(NA))) %>%

ggplot(aes(x = date)) +

geom_point(aes(y = hist_col_red),
color = "red") +

geom_point(aes(y = hist_col_green),
color = "green") +

geom_point(aes(y = hist_col_blue),
color = "blue") +

geom_hline(yintercept = (mean_plot + sd_plot),
color = "purple",
linetype = "dotted") +
geom_hline(yintercept = (mean_plot-sd_plot),
color = "purple",
linetype = "dotted") +
labs(title = "Colored Scatter with Line", y = "monthly returns") +
scale_x_date(breaks = pretty_breaks(n = 8)) +
theme(plot.title = element_text(hjust = 0.5))
`````` The new plot is showing us returns over time, and whether they fall below or above one standard deviation from the mean. One element that jumps out to me is how many red or green circles we see after January 1, 2017. Zero! That is zero monthly returns that are least one standard deviation from the mean during calendar year 2017. When we get to rolling volatility, we should see this reflected as a low rolling volatility through 2017, along with high rolling volatility through 2015.

Let’s build a final visualization to contextualize the standard deviation of our portfolio in a comparative manner. In this case, we can explore how our portfolio return/standard deviation ratio compares to the return/standard deviation ratio of the five individual assets used to construct the portfolio. In theory, our portfolio should have a better ratio, else we don’t need to bother with portfolio construction.

We start with our asset_returns_long data frame and calculate the mean returns and standard deviation of each asset with summarise(expected_return = mean(returns), sd = sd(returns)).

Then we use dplyr's add_row() to add the portfolio standard deviation/mean from portfolio_sd_tidy.

Finally, we end with a call to ggplot() and geom_point().

``````
asset_returns_long %>%
group_by(asset) %>%
summarise(expected_return = mean(returns),
stand_dev = sd(returns)) %>%
stand_dev =
sd(portfolio_returns_tidy\$returns),
expected_return =
mean(portfolio_returns_tidy\$returns)) %>%

ggplot(aes(x = stand_dev,
y = expected_return,
color = asset)) +
geom_point(size = 2) +
geom_text(
aes(x =
sd(portfolio_returns_tidy\$returns) * 1.11,
y =
mean(portfolio_returns_tidy\$returns),
label = "Portfolio")) +
ylab("expected return") +
xlab("standard deviation") +
ggtitle("Returns versus Risk") +
scale_y_continuous(label = scales::percent) +
scale_x_continuous(label = scales::percent, breaks = pretty_breaks(n = 10)) +
# The next line centers the title
theme_update(plot.title = element_text(hjust = 0.5))
`````` The S&P500 has a higher expected return for just a bit more risk than our portfolio. EEM and EFA have a higher risk and lower expected return (no rational investor wants that!) and IJS has a higher risk and a higher expected return (some rational investors do want that!).

## Rolling Standard Deviation

We have calculated the average volatility for the entire life of the portfolio, but it would help if we could better understand how that volatility has changed over time or behaved in different market conditions.

We might miss a three-month or six-month period where the volatility spiked, plummeted, or did both. And the longer our portfolio life, the more likely we are to miss something important. If we had 10 or 20 years of data, and we calculated the standard deviation for that entire period, we could, or most certainly would, fail to notice a period in which volatility was very high, and hence we would fail to ponder the probability that it could occur again.

Imagine a portfolio that had a standard deviation of returns for each six-month period of 3% and never changed. Now imagine a portfolio with volatility that fluctuated every few six-month periods from 0% to 6%. We might find a 3% standard deviation of monthly returns over a 10-year sample for both, but those two portfolios are not exhibiting the same volatility. The rolling volatility of each would show us the differences, allowing us to hypothesize about the past causes and future probabilities of those differences. We might also want to think about dynamically re-balancing our portfolio to better manage volatility if we are seeing large spikes in rolling windows.

## Rolling Standard Deviation with xts

The xts world is purpose-built for time series and, as such, makes calculating rolling standard deviation fairly straightforward.

First, we assign a value of 24 to the variable window.

``````
window < - 24
``````

We then invoke rollapply(), pass it our xts returns object, the sd() function, and a rolling window with width = window.

``````
port_rolling_sd_xts < -
rollapply(portfolio_returns_xts,
FUN = sd,
width = window) %>%
# omit the 23 months for which there is no rolling 24
# month standard deviation
na.omit() %>%
`colnames< -`("rolling_sd")

``````
``````
rolling_sd
2014-12-31 0.02616475
2015-01-31 0.02638196
2015-02-28 0.02770679
``````

## Rolling Standard Deviation with the tidyverse and tibbletime

To calculate the rolling standard deviation of our tibble, we have two options. We can use tidyquant, or we can convert to a time-aware tibble using the tibbletime package.

To use tidyquant, we start with tq_mutate() and supply mutate_fun = rollapply as our mutation function argument. Then, we invoke FUN = sd as the nested function beneath rollapply(). That combination will apply the sd() function on a rolling basis.

``````
port_rolling_sd_tq <-
portfolio_returns_tidy %>%
tq_mutate(mutate_fun = rollapply,
width = window,
FUN = sd,
col_rename = "rolling_sd") %>%
select(date, rolling_sd) %>%
na.omit()
``````

If we wish to use tibbletime, we first call rollify() to define a rolling standard deviation function. We want to roll the sd() function with a width equal to window, so we define sd_roll_24 <- rollify(sd, window = window).

``````
sd_roll_24 < -
rollify(sd, window = window)
``````

Then we use mutate() to pass it into the code flow. Note that we convert our tibble to a tibbletime data frame with as_tbl_time(index = date).

``````
port_rolling_sd_tidy_tibbletime <-
portfolio_returns_tidy %>%
as_tbl_time(index = date) %>%
mutate(sd = sd_roll_24(returns)) %>%
select(-returns) %>%
na.omit()

tail(port_rolling_sd_tidy_tibbletime, 3)
`````` That nifty combination of mutate and tibbletime rolling functions is applicable to other functions beyond standard deviation. Tibbletime is changing and improving rapidly as of the time of this writing (summer of 2018). Stay tuned!

Now, let’s take a quick peek to confirm consistent results from our xts, tibbletime and tidyquant methods:

``````
port_rolling_sd_tidy_tibbletime %>%
mutate(sd_tq = port_rolling_sd_tq\$rolling_sd,
sd_xts = round(port_rolling_sd_xts\$rolling_sd, 4)) %>%
tail(3)
``````

The results look good. We have an xts object called port_rolling_sd_xts, a tibbletime tibble called port_rolling_sd_tidy_tibbletime and a tibble object called port_rolling_sd_tq. Each contains the 24-month rolling standard deviation of portfolio returns. Now let’s try some visualization.

## Visualizing Rolling Standard Deviation with ggplot

We begin our visualization by passing port_rolling_sd_tq to ggplot(). We want to chart rolling standard deviation as a line chart, with date on the x-axis. We call aes(x=date) and then geom_line(aes(y = rolling_sd), color = "cornflowerblue").

``````
port_rolling_sd_tq %>%
ggplot(aes(x = date)) +
geom_line(aes(y = rolling_sd), color = "cornflowerblue") +
scale_y_continuous(labels = scales::percent) +
scale_x_date(breaks = pretty_breaks(n = 8)) +
labs(title = "Rolling Standard Deviation", y = "") +
theme(plot.title = element_text(hjust = 0.5))
`````` Have a look at the chart and note that we did not manually change our decimal to percentage format. When we called scale_y_continuous(labels = scales::percent), the work was done for us by adding the % sign and multiplying by 100.

## Visualizing Rolling Standard Deviation with highcharter

Next, let’s explore how to visualize our xts data with highcharter.

First, some quick background: highcharter is an R package but Highcharts is a JavaScript library — the R package is a hook into the JavaScript library. Highcharts is fantastic for visualizing time series and it comes with great built-in widgets for viewing different time frames. I highly recommend it for visualizing financial time series. It’s free for personal use, but you do need to buy a license to use it in a commercial setting.

Before we get to charting, let’s convert our data to rounded percentages:

``````
port_rolling_sd_xts_hc < -
round(port_rolling_sd_xts, 4) * 100
``````

To build the highcharter visualization, we first call highchart(type = "stock") and then pass that port_rolling_sd_xts_hc to hc_add_series(). Highcharter recognizes the date index of an xts object, so we do not need to point to it.

``````
highchart(type = "stock") %>%
color = "cornflowerblue") %>%
hc_title(text = paste0(window, "-Month Rolling Volatility", sep = "")) %>%
hc_yAxis(
labels = list(format = "{value}%"),
opposite = FALSE) %>%
hc_navigator(enabled = FALSE) %>%
hc_scrollbar(enabled = FALSE) %>%
hc_exporting(enabled= TRUE) %>%
hc_legend(enabled = TRUE)
`````` Do these rolling visualizations add to our understanding of this portfolio? Well, we can see a spike in rolling volatility in 2016 followed by a consistently falling volatility through mid-2017. That makes sense. Remember back to our returns dispersion scatter plot where zero monthly returns were more than one standard deviation away from the mean in 2017. This visualization also let’s us see whether we missed any other spikes or unexpected periods of riskiness. If this portfolio had 30 years of history, this chart would be very helpful indeed.

That’s all for today. Next time, we will review how to calculate and visualize individual contributions to portfolio standard deviation. See you then!