Layered plots are one way to achieve new insight and actionable intelligence when working with complex data. ggplot is well suited for layered plots.
Data Pre-Processing
To make graphs with ggplot(), the data must be in a data frame and in “long” (as opposed to wide) format. The reshape2 package handles conversion between the two formats: melt() converts wide to long, and dcast() converts long back to a wide data frame (acast() does the same for arrays). The following code block presents examples of the two data formats.
```r
library(ggplot2)
library(reshape2)

# Define Doha cloud coverage percent by month and by quartile
cloud.wide <- data.frame(
  Quartile = c(25, 40, 50, 60, 75),
  Jan = c(.043, .100, .162, .245, .435),
  Feb = c(.034, .095, .155, .260, .460),
  Mar = c(.040, .110, .200, .285, .485),
  Apr = c(.027, .075, .160, .235, .415),
  May = c(.010, .030, .050, .085, .150),
  Jun = c(.000, .000, .000, .000, .065),
  Jul = c(.000, .011, .030, .045, .115),
  Aug = c(.000, .010, .030, .046, .115),
  Sep = c(.000, .005, .010, .025, .065),
  Oct = c(.005, .030, .035, .040, .120),
  Nov = c(.030, .075, .100, .125, .265),
  Dec = c(.050, .100, .125, .225, .455)
)

# Convert Quartile column to factor
cloud.wide$Quartile <- factor(cloud.wide$Quartile)
print(cloud.wide)

# Convert data from wide to long format using melt().
# Name the new columns explicitly so they match the aesthetics used in the
# plots (month, cloud.percent); the defaults would be "variable" and "value".
cloud.long <- melt(cloud.wide, id.vars = "Quartile",
                   variable.name = "month", value.name = "cloud.percent")
print(cloud.long)

# Create subset data frames for the 25th, 50th, and 75th percentile rows
subset25 <- cloud.long[which(cloud.long$Quartile == 25), ]
subset50 <- cloud.long[which(cloud.long$Quartile == 50), ]
subset75 <- cloud.long[which(cloud.long$Quartile == 75), ]
```
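The reverse conversion, from long back to wide, is not shown above. The following is a minimal sketch of the round trip, assuming reshape2's dcast() with a formula of the form `id ~ column` and a small illustrative data frame (`wide`, `long`, and `wide.again` are names made up for this example; the values are a subset of the Doha data).

```r
library(reshape2)

# Small illustrative wide table: one row per quartile, one column per month
wide <- data.frame(
  Quartile = factor(c(25, 50, 75)),
  Jan = c(.043, .162, .435),
  Feb = c(.034, .155, .460)
)

# Wide -> long: one row per (Quartile, month) pair
long <- melt(wide, id.vars = "Quartile",
             variable.name = "month", value.name = "cloud.percent")

# Long -> wide: quartiles stay as rows, months spread back into columns
wide.again <- dcast(long, Quartile ~ month, value.var = "cloud.percent")

print(long)
print(wide.again)
```

The long table has one row for every quartile and month combination, which is the shape ggplot() expects; the dcast() call should recover the original column layout.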
Basic Plot Commands
The ggplot() function is used to create a plot. It has two main arguments: the data and an aesthetic mapping, which is specified using aes(). There are three basic ways to call ggplot():
```r
ggplot(data = df, aes(x = variable, y = values, <more aesthetics>))
ggplot(data = df)
ggplot()
```
The first method is used if all plot layers rely on the same data and the same set of aesthetics. The second method specifies the default data frame but defines no aesthetics up front. This is useful when aesthetics are defined by layer and geometric objects (or geoms for short) are layered into the plot using the + operator. The third method initializes a skeleton ggplot object with no default data or aesthetics; nothing is drawn until layers that carry their own data and geoms are added. The third method is preferred when creating complex plots that draw on multiple data frames to populate different plot layers.
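As a minimal sketch of how the three call styles relate, the snippet below builds the same scatter plot three ways. The data frame `df` and its columns `variable` and `values` are placeholders invented for this illustration, and layer_data() is used only to inspect what each plot would draw.

```r
library(ggplot2)

# Hypothetical toy data, just large enough to plot
df <- data.frame(variable = c("a", "b", "c"), values = c(1, 2, 3))

# Method 1: data and aesthetics both set in ggplot(); layers inherit them
p1 <- ggplot(data = df, aes(x = variable, y = values)) +
  geom_point()

# Method 2: default data set in ggplot(); aesthetics set in the layer
p2 <- ggplot(data = df) +
  geom_point(aes(x = variable, y = values))

# Method 3: empty ggplot(); the layer supplies both data and aesthetics
p3 <- ggplot() +
  geom_point(data = df, aes(x = variable, y = values))

# Each plot's first layer is computed from the same three observations
print(layer_data(p1)$y)
print(layer_data(p2)$y)
print(layer_data(p3)$y)
```

For a single layer the three forms should be interchangeable; the choice matters once several layers with different data or aesthetics are combined.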
ggplot() Examples
The grammar of graphics is illustrated in the following charts. The plots depict the Doha Cloud Mask, or the percentage of sky covered with clouds, by month. The examples are created using the three different ways of calling ggplot(). The first two charts rely on different data inputs and graphical aesthetics. The final plot uses data inputs from multiple data frames and combines all the aesthetics and graphical layers from the first two plots.
```r
# Method 1: Data set and all aesthetics in call to ggplot
ggplot(data = cloud.long,
       aes(x = month, y = cloud.percent, color = Quartile, group = Quartile)) +
  geom_line() +
  labs(title = "Doha Cloud Mask: Quartiles")

# Method 2: Data set and aesthetics defined by layer
# (geom_area() fills from 0 up to y, so no ymin/ymax aesthetics are needed;
#  ggplot2 >= 3.4.0 prefers linewidth over size for lines)
ggplot(data = subset50) +
  geom_area(aes(x = month, y = cloud.percent, group = 1),
            fill = "darkgreen", alpha = 0.15) +
  geom_line(aes(x = month, y = cloud.percent, group = 1),
            color = "green", size = 1) +
  geom_point(aes(x = month, y = cloud.percent),
             color = "steelblue", size = 3) +
  labs(title = "Doha Cloud Mask: Median")

# Method 3: Multiple data sets and aesthetics defined by layer
ggplot() +
  geom_line(data = cloud.long,
            aes(x = month, y = cloud.percent, group = Quartile, color = Quartile)) +
  geom_area(data = subset50,
            aes(x = month, y = cloud.percent, group = 1),
            fill = "darkgreen", alpha = 0.15) +
  geom_line(data = subset50,
            aes(x = month, y = cloud.percent, group = 1),
            color = "green", size = 1.25) +
  geom_point(data = subset50,
             aes(x = month, y = cloud.percent),
             color = "steelblue", size = 3) +
  labs(title = "Doha Cloud Mask By Month")
```
The examples above are introductory, showing that complex plots can be created with just a few lines of code. The default theme also pre-formats the plot size, plot background, axis spacing and fonts, quartile line colors, major and minor gridlines, and the legend, which is what keeps the syntax so short.
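Any of these defaults can be overridden once a plot exists. The snippet below is only a sketch of the idea, using a three-month slice of the 50th percentile values and a handful of theme() settings chosen arbitrarily for illustration; the object names `demo` and `p` are made up for this example.

```r
library(ggplot2)

# Three months of the 50th percentile series, with month order preserved
demo <- data.frame(
  month = factor(c("Jan", "Feb", "Mar"), levels = c("Jan", "Feb", "Mar")),
  cloud.percent = c(.162, .155, .200)
)

p <- ggplot(demo, aes(x = month, y = cloud.percent, group = 1)) +
  geom_line(color = "green") +
  geom_point(color = "steelblue", size = 3) +
  labs(title = "Doha Cloud Mask (theme overrides)") +
  theme(
    panel.grid.minor = element_blank(),        # drop minor gridlines
    panel.background = element_rect(fill = "white"),  # replace grey background
    plot.title = element_text(face = "bold")   # bold title
  )

print(p)
```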
The objective, of course, is to master low-level control of all plot elements. To this end, the following sections serve as a reference, using basic data and structured examples.