Data exploration is a key aspect of any analytical workflow, and one nice way to explore and analyze data is by using maps. In this post, we will use a map to visualize housing prices in the U.S., specifically the state-by-state House Price Index (HPI) published by Freddie Mac.
Our goal is to articulate how to quickly build an interactive Shiny app that allows a user to click on a state that has been shaded by its annual house price appreciation and display a chart of house prices over time.
In this post we will focus on how to construct a map and shade it according to the annual house price appreciation (HPA) of each state. We will need to do the following:
import the Freddie HPI data from Quandl
use dplyr and pipes to wrangle to the format we want
calculate annual HPA for each state and save in tidy format
import geospatial object to construct a map of the USA
add our HPA data to that object
use leaflet to build a map with states shaded by HPA
First, let's load up the necessary packages.
# Note that things might get tricky with this particular package.
# If you are using a Linux OS, run the following before installing 'tigris':
# 1. sudo apt-get update && sudo apt-get install libgdal-dev libproj-dev
# 2. install.packages("rgdal")
# 3. install.packages("rgeos")
# 4. library(rgdal)
# 5. library(rgeos)
# You might want to supply an api key
# Quandl.api_key("[your key here]")
Now we can import the data from Quandl using the Quandl() function and supplying the data set code FMAC/HPI.
Quandl("FMAC/HPI", order = 'asc') %>%
Quandl supplies a nicely formatted Date column and defaults the import to a data.frame object. That's helpful because we will use dplyr and pipes for our wrangling. As happens quite often, the wrangling is going to be the most challenging part of our project. We need to take those 50 state HPIs and calculate the annual price appreciation for each state.
We will use pipes to go through the following logic:
We need only two data points, the most recent price and the price 12-months previous to that. So, we'll use filter() to grab those two rows.
Our data are currently in a wide format, but we need it to be in a long, tidy format. We'll use gather(state, value, -Date) to change from wide to long.
Then, we group_by(state)
It's worth lingering on these two steps because they are the least intuitive at first glance. Originally, we had 1 date column and 51 state/DC columns. gather() and group_by() will reformat our data to one very long column called state, one very long column called value and one very long column called Date. Why is this useful?
We want to calculate HPA by state, meaning we are going to run 51 separate calculations and store the results in 51 different locations. Changing our data to a long format and grouping by state will allow us to apply our HPA calculation to 1 column (the value column), and store the results in 1 newly created column. This keeps the data in a tidy format.
Back to the substance, to calculate HPA and create a new column for it, we call mutate(hpa = (value - lag(value))/ lag(value)). This creates the hpa column and gives a value, by state, of the house price appreciation in the last 12 months. Remember we have already filtered the dates in step 1.
Next, we reformat the new hpa column with round().
We don't need the date column anymore, so we can select() just the state and hpa columns.
Finally, we rename() the state column to STUSPS, which will be explained later!
We now have wrangled our original 52 column data frame of state HPIs to a 2-column data frame with state HPAs.
Next, let's build a map of the USA that is shaded according to that second column!
First, we need to import an object that holds geometric data for all 50 states and then add that hpa data to the object.
The tigris package makes it easy to import a simple features data frame that has geospatial data for the 50 states. To do so, use the aptly named states() function and set class = "sf".
states <- states(cb = TRUE, class = "sf")
The states object contains the longitude and latitude coordinates of each state polygon in the geometry column, and also has a column for NAME and STUSPS, which are the state abbreviations used by the USPS. This is the reason that when we created our hpa data frame, we renamed the state column to STUSPS. We will use that common column name to merge our HPA data into the simple features spatial data object by calling merge(states, states_wrangled, by = "STUSPS"...). This function will add an hpa column in our spatial data frame according to matches in the STUSPScolumn (there is a column called STUSPS in both of these data frames). For any columns whose STUSPS values don't match, the function will place an NA in the hpa column.
# Now we want to merge by a common column name.
states_hpa_leaflet <- merge(states, states_wrangled, by = "STUSPS", all.x = TRUE)
Notice that there is now a column called hpa. The fourth row has an NA because our spatial object had an entry for American Samoa but our Quandl HPI data did not. When we shade the map, that NA will be a gray polygon.
Next we'll create a shading scheme with the colorNumeric() function from the leaflet package. We'll go with a the blue-green palette by setting palette = "GnBu", and use the argument domain = states_hpa_leaflet$hpa to shade by hpa.
Click on a state to test the popup. Do the darker blues appear to be in states where we expect large house price increases? Do lighter greens/yellows appear in states where we expect low/negative house price movement? Alright, this is the exact map that we'll use in Shiny - literally the exact map because we are just going to load up that leaf-states object in the app. From there we wire up Quandl so a user can access HPI data by clicking the map, then pass it to highcharter for visualizing. The full code for making that connection is available in the source code button on the finished Shiny app. Thanks for reading!
About the author:
Jonathan works with RStudio's financial services customers. He studied International Relations as an undergraduate at Harvard, worked in finance at JPMorgan and then did graduate work in Political Economy at Emory University before joining RStudio.