Robert Hyndman is the author of the forecast package in R. I’ve been using the package for long-term time series forecasts. The package comes with some built in methods for plotting forecast data objects in R that Ive wanted to customize for improved clarity and presentation. The following article achieves that goal and shares two scripts for plotting forecast data objects using ggplot.
Script structure
The first script is a custom plotting theme to configure the R graphical device for custom data visualization. The template changes the default plot theme for ggplot and replaces it with the Goggle Docs graph format and some custom label options. The result is a clean plot format and new default template for repeated use.
The second script is a custom function for plotting forecast data objects that utilizes the custom theme. The function presents training, fitted and forecast data along with 3 predictive forecast intervals (80%, 95% and 99%). The function should work with any of the forecast model types for time series data including:
- basic or naive forecast methods
- models that use seasonal dummy variables
- trigonomic models using Fourier series
- exponential smoothing models
- autoregressive moving average models (ARIMA) and
- neural network models
Each of these model types can be easily accommodated since the forecast package utilizes one format for forecast data output and easily supports the required output for 3 predictive intervals.
Custom graph theme
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
library(ggplot2) library(ggthemes) # theme for forecast data objects theme.fxdat <- theme_gdocs() + theme(plot.title = element_text(size = 15), plot.subtitle = element_text(size = 11), plot.caption = element_text(size = 9, hjust = 0, vjust = 0, colour = "grey50"), axis.title.y = element_text(face = "bold", color = "gray30"), axis.title.x = element_text(face = "bold", color = "gray30", vjust = -1), panel.background = element_rect(fill = "grey95", colour = "grey75"), panel.border = element_rect(colour = "grey75"), panel.grid.major.y = element_line(colour = "white"), panel.grid.minor.y = element_line(colour = "white", linetype = "dotted"), panel.grid.major.x = element_line(colour = "white"), panel.grid.minor.x = element_line(colour = "white", linetype = "dotted"), strip.background = element_rect(size = 1, fill = "white", colour = "grey75"), strip.text.y = element_text(face = "bold"), axis.line = element_line(colour = "grey75")) |
Custom function for plotting forecast() objects in ggplot
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 |
# Header ---- # # Name: plot_fx.R # # Title: Custom plotting function for forecast data objects # # Version: 1.0 # Date: 2017-Oct-06 # Author: Bradley Horn # License: GPL (>=2) # # Description: Custom plot function displays model training, fitted and forecast data # as lines and predictive intervals as ribbons # # Details: Function data inpuits are defined as follows: # fx.dat = forecast data object with 3 predictive intervals # PI = logical if predictive intervals are drawn # shade.cols = character string; 3 shade colors for predictive intervals # line.cols = character strine; 3 line colors for training, fitted and forecast data # date.breaks = character string; such as "3 months", "1 year" # data.format = x-axis date format # main.title = character string; main plot title # sub.title = character string; plot sub title # caption = character string; caption # x.title = character string; x-axis title # y.title =character string; y-axis title # # Dev.Notes: NA # # Depends: The following R package libraries are required: # dplyr, ggplot, RColorBrewer, scales, forecast and zoo # # References: NA # ##-------------------------------------------------------------------------------------------## # function for using ggplot2 for forecast objects plot_fx <- function(fx.dat, PI = TRUE, line.cols = NA, shade.cols = NA, show.gap = FALSE, date.breaks = NA, date.format = "%Y-%b", main.title = NA, sub.title = NA, caption = NA, x.title = NA, y.title = NA){ # manage package libraries pkgs <- c("dplyr", "ggplot2", "RColorBrewer", "scales", "forecast", "zoo") attached <- search() attached_pkgs <- attached[grepl("package", attached)] need_to_attach <- pkgs[which(!pkgs %in% gsub("package:", "", attached_pkgs))] if (length(need_to_attach) > 0) { for (i in 1:length(need_to_attach)) { require(need_to_attach[i], character.only = TRUE) } } # data input testing and formatting if (class(fx.dat) != "forecast") { stop("forecast data object required", call. = FALSE) } if (is.na(line.cols[1])) { line.cols = c("black", "darkcyan", "goldenrod1") } if (length(line.cols) != 3) { stop("length of line.cols not equal to 3", call. = FALSE) } if (PI == TRUE) { pi.levels <- fx.dat$level n.levels <- length(pi.levels) if (is.na(shade.cols)) { shade.cols = brewer.pal(n.levels, "PuBuGn") } if (n.levels != length(shade.cols)) { stop("length of shade.cols not equal to number of predictive intervals", call. = FALSE) } } if (!show.gap) { freq <- attr(fx.dat$x, "tsp")[3] last.obs <- fx.dat$x[length(fx.dat$x)] last.time <- time(fx.dat$x)[length(fx.dat$x)] fx.dat$mean <- ts(c(last.obs, fx.dat$mean), start = last.time, frequency = freq) fx.dat$upper <- ts(rbind(last.obs, fx.dat$upper), start = last.time, frequency = freq) fx.dat$lower <- ts(rbind(last.obs, fx.dat$lower), start = last.time, frequency = freq) } if (is.na(date.breaks)) { print("date.breaks to set to '6 months' absent user input") date.breaks <- "6 months" } # define dataframe with training (x), forecast (y) and interval (pi) data len.x <- length(fx.dat$x) len.y <- length(fx.dat$mean) df <- tibble(date = c(as.Date(time(fx.dat$x)), as.Date(time(fx.dat$mean))), x = c(fx.dat$x, rep(NA, len.y)), fitted = c(fx.dat$fitted, rep(NA, len.y)), forecast = c(rep(NA, lenx), fx.dat$mean), lo.80 = c(rep(NA, len.x), fx.dat$lower[, 1]), up.80 = c(rep(NA, len.x), fx.dat$upper[, 1]), lo.95 = c(rep(NA, len.x), fx.dat$lower[, 2]), up.95 = c(rep(NA, len.x), fx.dat$upper[, 2]), lo.99 = c(rep(NA, len.x), fx.dat$lower[, 3]), up.99 = c(rep(NA, len.x), fx.dat$upper[, 3])) # plot training, fitted and forecast data ggplot(df, aes(date, x)) + geom_line(aes(colour = "Training")) + geom_line(data = df, aes(date, fitted, colour = "Fitted"), size = 0.75) + geom_ribbon(data = df, aes(date, ymin = lo.99, ymax = up.99, fill = "99%")) + geom_ribbon(data = df, aes(date, ymin = lo.95, ymax = up.95, fill = "95%")) + geom_ribbon(data = df, aes(date, ymin = lo.80, ymax = up.80, fill = "80%")) + geom_line(data = df, aes(date, forecast, colour = "Forecast"), size = 0.75) + geom_point(data = df, aes(date, forecast, colour = "Forecast"), size = 1) + geom_point(size = 1) + scale_x_date(breaks = seq(df$date[1], df$date[length(df$date)], by = date.breaks), date_labels = date.format) + scale_colour_manual(name = "Model Data", values = c("Training" = line.cols[1], "Fitted" = line.cols[2], "Forecast" = line.cols[3]), breaks = c("Training", "Fitted", "Forecast")) + scale_fill_manual(name = "Forecast Intervals", values = c("99%" = shade.cols[1], "95%" = shade.cols[2], "80%" = shade.cols[3])) + guides(colour = guide_legend(order = 1), fill = guide_legend(order = 2)) + labs(title = main.title, subtitle = sub.title, caption = caption, x = x.title, y = y.title) + theme.fxdat } ##-------------------------------------------------------------------------------------------## |
By way of example, the function can be tested using the following time series data for small-scale residential solar production for the US:
1 2 3 4 5 |
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 2014 248 280 372 420 465 480 496 496 480 434 360 372 2015 341 364 527 600 682 690 744 744 690 620 510 465 2016 527 609 837 930 1054 1080 1147 1085 990 868 720 651 2017 682 784 1147 1290 1426 1470 1519 |
The data is first assigned a variable name (x) and converted into a time series data object (res.gen) using the ts() function with monthly frequency. A naive forecast model (naive.fx) is generated using a a random walk forecast with a forecast horizon (h) of 17 months and 3 predictive intervals (80%, 95% and 99%). The resulting plot of the forecast data object using is pictured at the start of this article. The naive forecast is easily recognized as one where future values equal the most recent observed value:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
# create time series data object (ts) using residential solar production (x) res.gen <- ts(x, frequency = 12, start = c(2014, 1)) # define naive random walk forecast (rwf) forecast with prediction horizon h = 17 months # and 3 predictive intervals 80%, 95% and 99% naive.fx <- rwf(res.gen, h = 17, level = c(80, 95, 99)) # test plot function plot_fxdat(naive.fx$res.gen, PI = TRUE, line.cols = c("black", "darkcyan", "goldenrod1"), shade.cols = NA, show.gap = FALSE, date.breaks = "6 months", date.format = "%b-%y", main.title = "Generation - Residential Solar Sector", sub.title = "Naive forecast", caption = "sources: Short-Term Energy Outlook, eia, July 2017; bxhorn.com", x.title = "Production Months", y.title = "Megawatt Hours (MWhs)") ##-------------------------------------------------------------------------------------------## |
Using the same data, a more realistic forecast can be obtained by capturing the historic data trend and seasonal pattern. This is accomplished by defining a linear model (fit.y) with seasonal dummy variables. The model results are then converted to a forecast data object (fx.y) with the same forecast horizon (h = 17) and predictive intervals. The results are pictured below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
fit.y <- tslm(res.gen ~ trend + season) fx.y <- forecast(fit.y, h = 17, level = c(80, 95, 99)) plot_fx(fx.y, PI = TRUE, line.cols = NA, shade.cols = NA, show.gap = TRUE, date.breaks = "6 months", date.format = "%b-%y", main.title = "Generation - Residential Solar Sector", sub.title = "Linear trend with seasonal dummy variables", caption = "sources: Short-Term Energy Outlook, eia, July 2017; bxhorn.com", x.title = "Production Months", y.title = "Megawatt Hours (MWhs)") ##-------------------------------------------------------------------------------------------## |
In this case, the fitted values capture the trend and seasonal swings in the power generation training data, but the fitted value at the end of the training period is below the actual data. As a result, the forecast data begins below the actual data. Nevertheless, the observed trend and seasonal pattern are still projected forward by the linear model with seasonal dummy variables. This is sufficient for testing the plotting function. Later articles with explore more accurate forecast methods.