Category Archives: ggplot2
The Grammar of Graphics
Comparison of Graphic Tools
When it comes to producing graphics in R, there are basically three choices:
- Base graphics, which ships with R,
- the lattice package extension, and
- the ggplot2 package extension.
Base graphics was described extensively in the previous few chapters and is the preferred choice for highly customized charts, such as the polar windrose plot below, where flexibility and control over every graph object are essential.
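To make the comparison concrete, here is a minimal sketch (not from the original chapters) that draws the same histogram of eruption times from R's built-in faithful data with each of the three systems. It assumes lattice, a recommended package that normally ships with R, and ggplot2 are installed. The key difference to notice: base graphics draws immediately, while lattice and ggplot2 return plot objects that are drawn when printed.

```r
library(lattice)
library(ggplot2)

# Base graphics: draws immediately; hist() also returns the bin counts
h <- hist(faithful$eruptions, breaks = 20, col = "grey",
          main = "Base graphics", xlab = "Eruption time (min)")

# lattice: returns a "trellis" object that is drawn when printed
p_lattice <- histogram(~ eruptions, data = faithful, nint = 20,
                       main = "lattice", xlab = "Eruption time (min)")
print(p_lattice)

# ggplot2: builds a plot object layer by layer, drawn when printed
p_ggplot <- ggplot(faithful, aes(eruptions)) +
  geom_histogram(bins = 20, fill = "grey", colour = "black") +
  labs(title = "ggplot2", x = "Eruption time (min)")
print(p_ggplot)
```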
Themes (ggplot)
Quick Intro
Themes and theme elements control the non-data components of the plot. The use of predefined and custom themes in ggplot is described below.
ggplot originally provided two built-in themes: theme_grey(), the default, which has a grey plot background, and theme_bw(), a simple black-and-white theme. Current ggplot2 releases add several more, such as theme_minimal() and theme_classic(). ggplot also provides functions to modify theme elements or to create new themes.
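A minimal sketch of the three ideas above, using the built-in mtcars data as a stand-in: applying a complete theme, modifying individual elements with theme(), and saving a custom theme as the session default with theme_set(). The name my_theme is hypothetical.

```r
library(ggplot2)

# Same plot rendered with the two original built-in themes
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p_grey <- p + theme_grey()   # default grey background
p_bw   <- p + theme_bw()     # black-and-white theme

# Modify individual theme elements on top of a complete theme
p_custom <- p + theme_bw() +
  theme(panel.grid.minor = element_blank(),
        axis.title = element_text(face = "bold"))

# A custom theme is a theme object that can be stored and reused
# (my_theme is an illustrative name)
my_theme <- theme_bw() + theme(plot.title = element_text(size = 16))
old <- theme_set(my_theme)   # make it the session default
print(p + ggtitle("mtcars: weight vs. mpg"))
theme_set(old)               # restore the previous default
```

theme_set() returns the previously active theme, which is why it can be restored afterwards.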
Titles (ggplot)
Options to control titles in ggplot are described below.
Chapter Content
- Data
- Basic Plot: No Title Aesthetics
- Title Aesthetics
Data
The diamonds dataset that ships with ggplot.
Basic Plot: No Titles
```r
library(ggplot2)

# Plot skeleton
p0 <- ggplot(diamonds, aes(depth)) + xlim(58, 68)

# Plot1: Histogram with facetting (after_stat() needs ggplot2 >= 3.3.0)
p1 <- p0 +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1,
                 colour = "darkorange", fill = "orange") +
  facet_grid(cut ~ .)

# Print graph object
p1
```
Title Aesthetics
```r
# Plot2: Main title with line break
p1 + labs(title = "Diamond Data\nDepth versus Cut")

# Plot3: Main title with alternative line break and font formats
p1 + labs(title = expression(atop(bold("Diamond Data"),
                                  atop(italic("Depth versus Cut"), "")))) +
  theme(plot.title = element_text(size = 25))

# Plot4: x and y axis titles (depth is a percentage; y shows density)
p1 + labs(x = "Diamond Depth (%)", y = "Density")

# Plot5: x and y axis titles with text and title formats
p1 + labs(x = "Diamond Depth (%)", y = "Density") +
  theme(axis.title.y = element_text(size = 14, vjust = 0.25, face = "bold"),
        axis.text.y = element_text(size = 11, color = "black"),
        axis.title.x = element_text(size = 14, vjust = -0.25, face = "bold"),
        axis.text.x = element_text(size = 11, color = "black", angle = -90),
        strip.text.y = element_text(size = 12, face = "bold", color = "white"),
        strip.background = element_rect(color = "black", fill = "skyblue4"),
        panel.border = element_rect(fill = NA, colour = "navy"))
```
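Since ggplot2 2.2.0, labs() also accepts subtitle and caption arguments, which gives a simpler route to a two-line title than the plotmath atop() trick. A minimal sketch, assuming ggplot2 3.3.0 or later (for after_stat()); the histogram is rebuilt so the block runs on its own, and the caption text is illustrative.

```r
library(ggplot2)

# Rebuild the faceted histogram from the "Basic Plot" example
p1 <- ggplot(diamonds, aes(depth)) + xlim(58, 68) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1,
                 colour = "darkorange", fill = "orange") +
  facet_grid(cut ~ .)

# Title, subtitle and caption set in one labs() call,
# each styled through its own theme element
p_sub <- p1 +
  labs(title = "Diamond Data", subtitle = "Depth versus Cut",
       caption = "Source: ggplot2::diamonds") +
  theme(plot.title = element_text(size = 20, face = "bold"),
        plot.subtitle = element_text(face = "italic"))
print(p_sub)
```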
Axes (ggplot)
Data
A simple sequence of numbers is provided to populate the base graph.
```r
library(ggplot2)
library(reshape2)  # provides melt()

# Define data
sensors <- factor(paste0("loc", 1:10), levels = paste0("loc", 1:10))
df1 <- data.frame(loc = sensors,
                  x = 100/seq(100, 1000, by = 100),
                  y = 100/seq(1000, 100, by = -100))

# Convert from wide to long format
df1 <- melt(df1, id.vars = "loc")

# Base graph; no axes formats
p <- ggplot(df1, aes(loc, value, group = variable, fill = variable)) +
  geom_bar(stat = "identity", color = "green") +
  scale_fill_manual(values = c("grey35", "grey65"))
```
Axes Text Formats
The following examples modify axes text formats with colors and bold font so they are easier to see. The second chart benefits from axis text being rotated 90 degrees.
```r
# Change axes text font and color
p <- p + theme(axis.text.x = element_text(face = "bold", color = "blue", size = 12),
               axis.text.y = element_text(face = "bold", color = "red", size = 12))
print(p)

# Change x-axis angle
p + theme(axis.text.x = element_text(angle = -90, vjust = 0.5))
```
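In ggplot2 3.3.0 and later, the axis guide can also rotate or stagger labels, choosing the text justification automatically. A sketch, using a small hypothetical data frame df with long labels as a stand-in for df1:

```r
library(ggplot2)

# Hypothetical stand-in for the sensor data: ten long category labels
labs10 <- paste0("location_", 1:10)
df <- data.frame(loc = factor(labs10, levels = labs10),
                 value = 100 / seq(100, 1000, by = 100))
p <- ggplot(df, aes(loc, value)) + geom_col(fill = "grey50")

# Rotate the labels through the axis guide instead of theme()
p_rot <- p + scale_x_discrete(guide = guide_axis(angle = 90))

# Or stagger crowded labels over two rows
p_dodge <- p + scale_x_discrete(guide = guide_axis(n.dodge = 2))
print(p_rot)
print(p_dodge)
```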
Layered Plots
Layered plots are one way to achieve new insight and actionable intelligence when working with complex data. ggplot is well suited for layered plots.
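A minimal sketch of a layered plot, using the built-in mtcars data: each + adds a layer that shares the same aesthetic mappings, so raw points, per-group linear fits and a reference line are drawn on one set of axes.

```r
library(ggplot2)

# Layer 1: raw observations; Layer 2: per-group linear fits
p_layers <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Layer 3: a dashed reference line at the overall mean mpg
  geom_hline(yintercept = mean(mtcars$mpg), linetype = "dashed") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
print(p_layers)
```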
Data Pre-Processing
To make graphs with ggplot(), the data must be in a data frame, and it is usually easiest to work in “long” (as opposed to “wide”) format. Converting between “wide” and “long” data formats is facilitated by the reshape2 package. Specifically, the melt() function converts wide to long format, and the dcast() function (acast() when an array is wanted) converts long to wide format. The following code block presents examples of the two data formats.
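A minimal sketch of the round trip, using a small made-up data frame (the values are illustrative only) and assuming reshape2 is installed:

```r
library(reshape2)

# Wide format: one row per location, one column per measured variable
wide <- data.frame(loc = c("loc1", "loc2", "loc3"),
                   x = c(1.0, 0.5, 0.33),
                   y = c(0.1, 0.11, 0.125))

# Wide -> long: one row per (loc, variable) pair
long <- melt(wide, id.vars = "loc")
print(long)

# Long -> wide again: rows from loc, columns from variable
wide2 <- dcast(long, loc ~ variable, value.var = "value")
print(wide2)
```

The long version has six rows (three locations times two variables) and the columns loc, variable and value, which is the shape ggplot() expects when variable is mapped to an aesthetic such as fill.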
Bar Graphs (ggplot)
Data
The mtcars data frame ships with R and was extracted from the 1974 US magazine Motor Trend. It comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
Basic Bar Graph Syntax
A basic plot skeleton is created by declaring the data and the aesthetic mappings. The stat argument of geom_bar() determines the bar heights: the default stat counts the observations in each group (“count” in current ggplot2, “bin” in early releases), while stat = "identity" plots the observed values.
```r
library(ggplot2)

# Plot skeleton
p <- ggplot(mtcars, aes(factor(cyl)))

# geom_bar() counts observations by default, so stat is not specified
# Plot1: Basic bar chart of car count by cylinder
p + geom_bar()
```
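The stat = "identity" case can be sketched as follows: the bar heights are pre-computed (here, mean mpg per cylinder count, an illustrative choice) and passed in as y values instead of being counted. geom_col(), available since ggplot2 2.2.0, is shorthand for the same thing.

```r
library(ggplot2)

# Pre-compute one value per bar: mean mpg for each cylinder count
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# stat = "identity" uses the y values as bar heights instead of counting rows
p_id <- ggplot(mpg_by_cyl, aes(factor(cyl), mpg)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(x = "Cylinders", y = "Mean mpg")
print(p_id)

# Equivalent shorthand
p_col <- ggplot(mpg_by_cyl, aes(factor(cyl), mpg)) + geom_col()
```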
Boxplots (ggplot)
Data
diamonds is a dataset that ships with ggplot2, with observations of almost 54,000 diamonds (53,940 rows). The data can be sliced by price, carat, cut, color, clarity, dimensions (x, y, z), depth and table width. Boxplots are well suited to visualizing data variability.
Basic Boxplot Syntax
```r
library(ggplot2)

# Plot skeleton
p <- ggplot(diamonds, aes(x = factor(color), y = carat))

# Boxplot of diamond carat as a function of diamond color
p + geom_boxplot()
```
Boxplot Aesthetics
geom_boxplot() takes the x input data plus five box statistics: ymin, lower (lower quartile), middle (median), upper (upper quartile) and ymax. These are normally computed for you by stat_boxplot(). Additional aesthetics include colour, fill, linetype, shape, size, weight and alpha (transparency level).
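To see what the five box aesthetics mean, they can be computed by hand and passed to geom_boxplot() with stat = "identity". A sketch under one simplifying assumption: the whiskers here run to the minimum and maximum, whereas stat_boxplot() ends them at the most extreme points within 1.5 × IQR and draws outliers separately.

```r
library(ggplot2)

# Split carat by diamond colour (D to J)
carat_by_color <- split(diamonds$carat, diamonds$color)

# Five box statistics per group; whiskers here run to min/max
box_stats <- data.frame(
  color  = names(carat_by_color),
  ymin   = sapply(carat_by_color, min),
  lower  = sapply(carat_by_color, quantile, probs = 0.25),
  middle = sapply(carat_by_color, median),
  upper  = sapply(carat_by_color, quantile, probs = 0.75),
  ymax   = sapply(carat_by_color, max),
  row.names = NULL
)

# stat = "identity" draws the boxes from the supplied statistics
p_box <- ggplot(box_stats, aes(x = color, ymin = ymin, lower = lower,
                               middle = middle, upper = upper, ymax = ymax)) +
  geom_boxplot(stat = "identity", fill = "skyblue", alpha = 0.7)
print(p_box)
```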
Error Bars (ggplot)
Error bars are a basic chart type in ggplot and are easy to create.
Data
A synthetic sensor data set is created.
```r
library(ggplot2)

# Create synthetic sensor data
set.seed(1)
n <- 4
synth <- data.frame(location = rep(c("1", "2"), each = 2),
                    sensor = rep(c("A", "B"), 2),
                    mean = rnorm(n, 6.5, 2),
                    sd = abs(rnorm(n, 0.5, 0.25)))
# Add a ci column: sd scaled by 2.62
synth <- transform(synth, ci = sd * 2.62)
```
View the data:
```r
print(synth)
  location sensor  mean    sd    ci
1        1      A 7.739 0.380 0.996
2        1      B 6.387 0.604 1.583
3        2      A 6.188 0.839 2.199
4        2      B 3.558 0.474 1.242
```
Basic Error Bar Syntax
```r
# Plot skeleton
p <- ggplot(synth, aes(x = location, y = mean, fill = sensor, group = sensor))

# Plot1: Error bars
p + geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.1)
```
Error Bar Aesthetics
geom_errorbar() takes the following aesthetics: x, ymin and ymax, colour, linetype and alpha (transparency level); the width of the horizontal end bars is set with the width argument. Error bars can be plotted alone or layered with points, lines and bars, as shown below.
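A sketch of layered error bars, recreating the synthetic data so the block runs on its own. Because both sensors share each location, a position_dodge() applied to every layer keeps each error bar centred on its own bar or point; the dodge widths are illustrative choices, and geom_col() requires ggplot2 2.2.0 or later.

```r
library(ggplot2)

# Recreate the synthetic sensor data
set.seed(1)
synth <- data.frame(location = rep(c("1", "2"), each = 2),
                    sensor = rep(c("A", "B"), 2),
                    mean = rnorm(4, 6.5, 2),
                    sd = abs(rnorm(4, 0.5, 0.25)))
synth <- transform(synth, ci = sd * 2.62)

# Bars with error bars: a shared dodge aligns the two layers
dodge <- position_dodge(width = 0.9)
p_bars <- ggplot(synth, aes(x = location, y = mean, fill = sensor)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = mean - ci, ymax = mean + ci),
                position = dodge, width = 0.2)

# Points and lines with error bars, dodged by a smaller amount
pd <- position_dodge(width = 0.2)
p_points <- ggplot(synth, aes(x = location, y = mean,
                              colour = sensor, group = sensor)) +
  geom_line(position = pd) +
  geom_point(position = pd, size = 3) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = pd, width = 0.1)
print(p_bars)
print(p_points)
```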
Facetting (ggplot)
Facetting is a process that combines data sub-setting and data visualization.
Data
The diamonds dataset that ships with ggplot.
Plot Syntax: facet_grid
Facets are multiple small plots, each representing a slice of data. Facetting is a powerful data analysis and exploration tool. facet_grid produces a lattice grid of plot panels with basic row and column identifiers.
```r
library(ggplot2)

# Plot skeleton
p <- ggplot(diamonds, aes(carat, price, color = cut))

# Plot1: Raw data with no facets
p + geom_point(size = 1.5, alpha = 0.75)

# Plot2: facet_grid with diamond cut on columns
p + geom_point(size = 1.5, alpha = 0.75) + facet_grid(. ~ cut)

# Plot3: facet_grid with diamond cut on rows
p + geom_point(size = 1, alpha = 0.75) + facet_grid(cut ~ .)

# Plot4: facet_grid with diamond cut by diamond clarity
p + geom_point(size = 1, alpha = 0.75) + facet_grid(cut ~ clarity)
```
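facet_grid() has a companion, facet_wrap(), which takes a single faceting variable and wraps its panels into rows; both accept a scales argument that frees the axis ranges per panel. A sketch with the same diamonds skeleton (ncol = 4 is an arbitrary choice):

```r
library(ggplot2)

p <- ggplot(diamonds, aes(carat, price, color = cut))

# facet_wrap wraps one faceting variable into a 2-D grid of panels
p_wrap <- p + geom_point(size = 1, alpha = 0.75) +
  facet_wrap(~ clarity, ncol = 4)

# Free y scales let each panel use its own price range
p_free <- p + geom_point(size = 1, alpha = 0.75) +
  facet_wrap(~ clarity, ncol = 4, scales = "free_y")
print(p_wrap)
print(p_free)
```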
Colors in ggplot
There are a number of ways to control the default colors in ggplot.
The HCL Color Wheel
ggplot simplifies color choice with a default palette based on a “color wheel.” The result is a well-balanced graphic that doesn’t draw too much attention to any one color. ggplot uses the HCL color wheel via the hue_pal() function from the scales package, which spaces the hues evenly around the circle: if there are two colors, they are selected from opposite points on the circle; if there are three colors, they are 120° apart; and so on. This keeps the levels of discrete data as distinct in hue as possible for the number of levels present.
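The default palette can be inspected directly. A sketch: hue_pal() with its defaults (hues starting at 15°, chroma 100, luminance 65) returns the familiar ggplot2 colors, and the hypothetical helper default_hues() reproduces the evenly spaced hue angles described above.

```r
library(scales)

# Default discrete palette used by ggplot2
hue_pal()(2)   # "#F8766D" "#00BFC4"
hue_pal()(3)   # "#F8766D" "#00BA38" "#619CFF"

# Hypothetical helper: n hues evenly spaced around the wheel from 15 degrees
default_hues <- function(n) seq(15, 375 - 360 / n, length.out = n)
default_hues(2)  # 15 195
default_hues(3)  # 15 135 255

# Same hues, lighter and less saturated (illustrative values)
hue_pal(l = 80, c = 60)(3)
```

In a plot, the same adjustment is made with scale_fill_hue(l = 80, c = 60) or scale_colour_hue().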
Legends (ggplot)
Legends are a key component of data visualization. The ggplot legend format controls are described below.
Data
The diamonds data that ships with ggplot.
The Default Legend
The following example presents the default legend to be customized.
```r
library(ggplot2)
def <- ggplot(diamonds, aes(cut, price)) + geom_boxplot(aes(fill = factor(cut))) +
  labs(title = "Diamonds Data", x = "Cut", y = "Price (USD)")
def
```
Removing the Legend
Legends are created for different aesthetics, such as fill, color/colour, linetype and shape. Each aesthetic has a scale function that can be called to remove the legend:
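A sketch of three ways to remove the fill legend from the default example. Older code often passes guide = FALSE; recent ggplot2 releases deprecate that form in favour of "none".

```r
library(ggplot2)

# The default legend example
def <- ggplot(diamonds, aes(cut, price)) +
  geom_boxplot(aes(fill = factor(cut))) +
  labs(title = "Diamonds Data", x = "Cut", y = "Price (USD)")

# Option 1: turn off the guide in the fill scale
def_1 <- def + scale_fill_discrete(guide = "none")

# Option 2: the same effect via guides(), set per aesthetic
def_2 <- def + guides(fill = "none")

# Option 3: hide every legend through the theme
def_3 <- def + theme(legend.position = "none")
print(def_1)
```

Options 1 and 2 remove only the fill legend, so any other legends stay; option 3 hides all of them.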
Line Graphs (ggplot)
ggplot line graph examples are provided to illustrate different format options.
Data
The Orange data frame ships with R and records the growth of orange trees (trunk circumference) as a function of age (days). The data object has 35 rows and 3 columns in long data format.
Basic Line Plot Syntax
```r
library(ggplot2)

# Plot skeleton
p <- ggplot(Orange, aes(age, circumference, group = Tree))

# Plot1: Basic line chart of circumference vs. age
p + geom_line()
```
Line Plot Aesthetics
Line chart aesthetics include the x and y data, colour, linetype, line width (size before ggplot2 3.4.0, linewidth from 3.4.0 on) and alpha (transparency level). Line symbols or points are added using geom_point().
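A sketch combining these aesthetics on the Orange data, assuming ggplot2 3.4.0 or later for the linewidth argument (use size in older releases). The styling values are illustrative.

```r
library(ggplot2)

# One coloured line per tree, with points marking each measurement
p_lines <- ggplot(Orange, aes(age, circumference, group = Tree, colour = Tree)) +
  geom_line(linewidth = 0.8, linetype = "solid", alpha = 0.8) +  # ggplot2 >= 3.4.0
  geom_point(size = 2, shape = 21, fill = "white") +
  labs(x = "Age (days)", y = "Trunk circumference (mm)")
print(p_lines)
```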
Multiple Plots (ggplot)
The multiplot() Function
Winston Chang’s R Graphics Cookbook provides a useful function that simplifies the creation of layouts with multiple plots. The function accepts ggplot objects as inputs.
```r
# Multiplot creates a layout structure for multiple plots in one
# window. The function accepts ggplot objects via "..." or as a
# list of objects. Use input "cols" to define the number of
# columns in the plot layout. Alternatively, input "layout" to
# define the row/col matrix of plot elements. If "layout" is
# present, "cols" is ignored. For example, given a layout defined
# by matrix(c(1,2,3,3), nrow=2, byrow=TRUE), then plot #1 goes in
# the upper left, #2 goes in the upper right, and #3 will cross
# the entire bottom row, utilizing both columns.
# (The "file" argument is accepted but not used.)
multiplot <- function(..., plotlist = NULL, file, cols = 1, layout = NULL) {
  # launch the grid graphical system
  library(grid)

  # make a list from the "..." arguments and/or plotlist
  plots <- c(list(...), plotlist)
  numPlots <- length(plots)

  # if layout = NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # make the panel using ncol; nrow is calculated from ncol
    layout <- matrix(seq(1, cols * ceiling(numPlots / cols)),
                     ncol = cols, nrow = ceiling(numPlots / cols))
  }

  if (numPlots == 1) {
    print(plots[[1]])
  } else {
    # set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

    # put each plot in the correct location
    for (i in 1:numPlots) {
      # get the i,j matrix position of the subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}
```
Use of the function is straightforward. First, the multiplot() definition must be run (or sourced) so that the function is available in memory. Then the plots are coded with variable assignments to create plot objects. Finally, multiplot() is called with those plot objects to place them in the predefined plot layout.
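A usage sketch of the three steps, assuming the multiplot() definition from the code block above has already been run in the session (the final call is skipped otherwise). The three plots are arbitrary examples; the which() lookup shows how a layout matrix maps plot 3 onto both cells of the bottom row.

```r
library(ggplot2)

# Step 2: create plot objects with variable assignments
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
p3 <- ggplot(Orange, aes(age, circumference, group = Tree)) + geom_line()

# Layout: p1 upper left, p2 upper right, p3 spans the bottom row
lay <- matrix(c(1, 2, 3, 3), nrow = 2, byrow = TRUE)

# Cells occupied by plot 3 (the same lookup multiplot() performs)
cells3 <- which(lay == 3, arr.ind = TRUE)
print(cells3)

# Step 3: place the plots (requires multiplot() to be defined)
if (exists("multiplot")) multiplot(p1, p2, p3, layout = lay)
```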
Other Geoms (ggplot)
All geoms that ship with ggplot2 are listed below. See the HTML help in R for detailed argument structures and examples.
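The same list can be generated from inside R, which has the advantage of matching whichever ggplot2 version is installed (the exact count varies between releases):

```r
library(ggplot2)

# Every exported ggplot2 function whose name starts with "geom_"
geoms <- grep("^geom_", ls("package:ggplot2"), value = TRUE)
length(geoms)
print(geoms)
```

Any entry can then be looked up with help(), for example help("geom_errorbar").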
R Graphics (ggplot2)
The flexibility of R graphics using the package ggplot is illustrated through a series of structured examples.
The Grammar of Graphics
Building Layered Plots
Scatter Plots (ggplot)
Bar Graphs (ggplot)
Line Graphs (ggplot)
Boxplots (ggplot)
Error Bars (ggplot)
Facetting (ggplot)
Titles (ggplot)
Axes (ggplot)
Legends (ggplot)
Other Geoms (ggplot)
Multiple Plots (ggplot)
Themes (ggplot)