Introduction

These are exercises using plotly with data from the State Energy Data System (SEDS).

Data

The original data source is available on a state-by-state basis, or for the US as a whole, in a “wide” format.

The code below downloads the data, imports it, and tidies it into a long format. It also adds unit codes, which, at the time of writing, are of my own invention.

The data wrangling code is hidden in this file so that the plots, which are the focus of the coursework, are easier to see. The code can be seen in the Rmd file in this GitHub repository.
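
For readers who have not met the wide and long layouts, here is a minimal base R sketch of the reshaping idea. It is not the wrangling code from the Rmd file: the data frame `toy_wide_df`, the placeholder code `MSN_A` and all the values are made up for illustration, and the column names `State`, `MSN`, `Year` and `value` are assumed only because the plotting code below uses them.

```r
# Toy "wide" table: one row per state and MSN code, one column per year.
# All names and values here are made up for illustration.
toy_wide_df <- data.frame(
  State = c("AK", "AL"),
  MSN = c("MSN_A", "MSN_A"),
  `1960` = c(1.5, 2.5),
  `1961` = c(3.5, 4.5),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column that is not an identifier is treated as a year column.
year_cols <- setdiff(names(toy_wide_df), c("State", "MSN"))

# "Long" table: one row per state, MSN code and year.
# unlist() stacks the year columns one after another, so the identifier
# columns are repeated once per year and each year is repeated once per row.
toy_long_df <- data.frame(
  State = rep(toy_wide_df$State, times = length(year_cols)),
  MSN = rep(toy_wide_df$MSN, times = length(year_cols)),
  Year = rep(as.integer(year_cols), each = nrow(toy_wide_df)),
  value = unlist(toy_wide_df[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

toy_long_df
```

The long layout is what lets a single `Year` column go on the x axis and a single `value` column on the y axis in the plots.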

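#Subset the data by MSN code so that each MSN can be plotted separately
#Row 1 of the matrix holds the MSN code and row 2 holds the matching subset
#Note that creating the matrix with a list data type is important, else you can't add data frames into cells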
msn_subsets_mtrx <- matrix(list(), nrow = 2, ncol = nrow(msn_codes))
col_counter <- 1
for (msn_code in msn_codes$MSN) {
  msn_subsets_mtrx[[1, col_counter]] <- msn_code
  msn_subsets_mtrx[[2, col_counter]] <- subset(all_states_long_df, MSN == msn_code)
  col_counter <- col_counter + 1
}
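
As an aside, base R's `split()` can produce the same kind of per-MSN subsets as a named list, which avoids the counter and the list matrix. The sketch below is an alternative shown on a made-up data frame, not the code used in this document; `toy_long_df`, the placeholder codes `MSN_A` and `MSN_B`, and the values are all hypothetical.

```r
# Toy stand-in for the long data frame; codes and values are made up.
toy_long_df <- data.frame(
  State = c("AK", "AL", "AK", "AL"),
  MSN = c("MSN_A", "MSN_A", "MSN_B", "MSN_B"),
  Year = c(2000L, 2000L, 2000L, 2000L),
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# split() returns a named list holding one data frame per MSN code.
msn_subsets <- split(toy_long_df, toy_long_df$MSN)

# A subset is then looked up by its code rather than by a column position.
names(msn_subsets)
msn_subsets[["MSN_B"]]
```

With this layout, a loop over `names(msn_subsets)` would have the code and its data available together.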

Plots

There are two types of plots here:

Line Plots

These plots show the consumption for all states for a given consumable, identified by an MSN code, across the sampled time period.

#This syntax for using plotly in a loop comes from here:
#https://github.com/ropensci/plotly/issues/273
line_plot_gatherer <- htmltools::tagList()
col_counter <- 1
for (msn_code in msn_codes$MSN) {
  plot_df <- msn_subsets_mtrx[[2, col_counter]]
  if (nrow(plot_df) > 0) {
    line_plot_gatherer[[col_counter]] <- plot_ly(plot_df, x = ~Year, y = ~value, color = ~State, type = "scatter", mode = "lines") %>% 
      layout(title = paste("All US States -", msn_codes[msn_codes$MSN == msn_code, 2], "-", msn_codes[msn_codes$MSN == msn_code, 4]))
  }
  col_counter <- col_counter + 1
  #Stop before too many are created, just to save time and space  
  if (col_counter > 10) {
    break
  }
}
line_plot_gatherer

Choropleth Maps

These plots should show the consumption for all states for a given consumable, identified by an MSN code, for a given year.

This code illustrates what may be a bug when plotting multiple maps, although it could also be user error. It may be related to: https://github.com/ropensci/plotly/issues/273

There are two main issues:

  1. Each map seems to use the same data frame for its values, even though the debug title seems to show that a distinct data frame is in use when the map is created.
  2. When more than two maps are plotted, none of the maps shows any data.
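
One possible explanation for the first issue, not confirmed against this data, is that the map code passes no data frame to `plot_ly` and instead refers to `plot_df` inside formulas such as `~plot_df$value`. A formula stores an expression and the environment it was created in, not the values themselves. If the formulas are only evaluated when the collected maps are finally printed, every map would see whatever `plot_df` holds after the loop has ended, whereas the debug title is computed by `paste()` straight away. The base R sketch below shows only this formula behaviour, with made-up values and no plotly involved; the names `captured`, `eager_values` and `late_values` are hypothetical.

```r
captured <- list()
eager_values <- numeric(3)

for (i in 1:3) {
  plot_df <- data.frame(value = i * 10)  # stand-in for a per-MSN subset
  captured[[i]] <- ~plot_df$value        # stores the expression, not the values
  eager_values[i] <- plot_df$value       # evaluated now, like the debug title
}

# Evaluate the stored formulas only after the loop has finished.
# Each formula looks up plot_df in the environment where it was created,
# and by now plot_df holds the data from the last iteration.
late_values <- vapply(
  captured,
  function(f) eval(f[[2]], environment(f)),
  numeric(1)
)

eager_values  # 10 20 30
late_values   # 30 30 30
```

If this is the cause, passing the data frame as the first argument to `plot_ly` and using plain column formulas, as the line plot code does, would be one thing to try.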

I am keeping this here as an example I can refer to if needed, so I want to publish it also. The coursework deadline is today.

# Make state borders red
borders <- list(color = toRGB("red"))
# Set up some mapping options
map_options <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  showlakes = TRUE,
  lakecolor = toRGB('white')
)
map_plot_gatherer <- htmltools::tagList()
col_counter <- 1
for (msn_code in msn_codes$MSN) {
  plot_df <- msn_subsets_mtrx[[2, col_counter]]
  #Select a specific year
  year <- "2000"
  plot_df <- subset(plot_df, Year == year)
  if (nrow(plot_df) > 0) {
    map_plot_gatherer[[col_counter]] <- plot_ly(z = ~plot_df$value, text = ~plot_df$value, locations = ~plot_df$State, type = 'choropleth', locationmode = 'USA-states', color = ~plot_df$value, colors = 'Blues', marker = list(line = borders)) %>%
      layout(title = paste("Debug sum value is -", sum(subset(plot_df, MSN == msn_code)$value)), geo = map_options)
  #The above is a debug title to see whether I really am using different data sets
  #This is what the title should be:
  #layout(title = paste(msn_codes[msn_codes$MSN == msn_code, 2], "-", year, "-", msn_codes[msn_codes$MSN == msn_code, 4]), geo = map_options)
  }
  col_counter <- col_counter + 1
  #If more than two maps are created, then there is no data, just blank maps. See:
  #https://github.com/ropensci/plotly/issues/273
  #Stopping when col_counter > 2 works, to show two maps. Trying to show three did not work.
  if (col_counter > 3) {
    break
  }
}
map_plot_gatherer