Data
diamonds is a dataset that ships with ggplot2 and contains observations on almost 54,000 diamonds. The data can be sliced by price, carat, cut, color, clarity, size (x, y, z), depth and table width. Boxplots are well suited to visualizing the variability of such data.
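To make the variability claim concrete, the numbers a boxplot summarizes can be computed directly. The following is a minimal sketch, assuming ggplot2 is installed (it supplies the diamonds data); the name carat_quantiles is illustrative and not part of the original post.

```r
library(ggplot2)  # provides the diamonds data set

# Number of rows and columns in the data
dim(diamonds)

# Quantiles (0%, 25%, 50%, 75%, 100%) of carat for each diamond color.
# split() groups carat by color; sapply() returns one column per color.
carat_quantiles <- sapply(split(diamonds$carat, diamonds$color), quantile)
carat_quantiles
```

Each column of carat_quantiles holds the minimum, quartiles and maximum for one color. A default boxplot draws its box from the same quartiles, although its whiskers may stop short of the extremes.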
Basic Boxplot Syntax
    library(ggplot2)

    # Plot skeleton
    p <- ggplot(diamonds, aes(x = factor(color), y = carat))

    # Boxplot of diamond carat as a function of diamond color
    p + geom_boxplot()
Boxplot Aesthetics
geom_boxplot requires an x aesthetic and draws each box from five statistics: ymin (lower whisker), lower (lower quartile), middle (median bar), upper (upper quartile) and ymax (upper whisker). Additional aesthetics include color, fill, linetype, shape, size, weight and alpha (transparency level).
    # Plot2: Coordinate flip
    p + geom_boxplot() + coord_flip()

    # Plot3: Set aesthetics to fixed value
    p + geom_boxplot(fill = "palegreen", color = "blue4", size = 0.5,
                     outlier.color = "blue4", outlier.size = 2)

    # Plot4: Vary fill by diamond color
    p + geom_boxplot(aes(fill = factor(color)))

    # Plot5: Add more dimensions with new aesthetic mappings
    p + geom_boxplot(aes(fill = factor(cut)))
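The five box statistics named above can be inspected rather than taken on faith. This is a small sketch, assuming a ggplot2 version that provides layer_data(); the name box_stats is illustrative only.

```r
library(ggplot2)

# Same plot skeleton as in the basic example
p <- ggplot(diamonds, aes(x = factor(color), y = carat))

# layer_data() returns the data frame ggplot2 computed for one layer;
# for a boxplot layer there is one row per box
box_stats <- layer_data(p + geom_boxplot(), 1)

# Show the statistics that define each box
box_stats[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
```

Comparing these columns across colors is a quick way to check what a given box and its whiskers actually represent.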
Example: Fully Loaded
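By default, a box plot draws its box from the 25th to the 75th percentile, with a bar at the median. Custom probabilities can be plotted by computing the quantiles beforehand and passing them to geom_boxplot with stat = "identity"; the results are then displayed using geom_text. Note that no outliers are plotted, because ymin and ymax are set to the quantiles at probabilities 0 and 1, that is, the minimum and maximum.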
    # Plot6: Change probabilities and display summary data
    library(plyr)      # ddply(), numcolwise()
    library(reshape2)  # dcast()

    # New probabilities: 0%, 10%, 50%, 90%, 100%
    new.prob <- c(0, 0.10, 0.50, 0.90, 1)
    label <- paste0("p", as.character(new.prob * 100))
    p.dist <- ddply(diamonds, .(color), numcolwise(quantile, probs = new.prob))
    p.dist <- transform(p.dist[, c("color", "carat")], probs = rep(label, 7))
    p.dist <- dcast(p.dist, color ~ probs, value.var = "carat")

    ggplot(p.dist, aes(x = color, ymin = p0, lower = p10, middle = p50, upper = p90, ymax = p100)) +
      geom_boxplot(stat = "identity", color = "darkgreen", fill = "plum2") +
      geom_text(data = p.dist, aes(x = color, y = p0, label = p0),
                color = "darkgreen", size = 4, vjust = 1.5, fontface = "bold") +
      geom_text(data = p.dist, aes(x = color, y = p10, label = p10),
                color = "blue", size = 4, vjust = -0.5, fontface = "bold") +
      geom_text(data = p.dist, aes(x = color, y = p50, label = p50),
                color = "red", size = 4, vjust = -0.5, fontface = "bold") +
      geom_text(data = p.dist, aes(x = color, y = p90, label = p90),
                color = "blue", size = 4, vjust = 1.5, fontface = "bold") +
      geom_text(data = p.dist, aes(x = color, y = p100, label = p100),
                color = "darkgreen", size = 4, vjust = -0.5, fontface = "bold") +
      labs(title = expression(atop(bold("Diamond Distributions"),
                                   atop(italic("Probabilities: 0%, 10%, 50%, 90% and 100%"), ""))),
           x = "Color", y = "Carat") +
      theme(plot.title = element_text(size = 20),
            axis.title.y = element_text(size = 14, vjust = .25, face = "bold"),
            axis.text.y = element_text(size = 12, color = "grey15"),
            axis.title.x = element_text(size = 14, vjust = -0.25, face = "bold"),
            axis.text.x = element_text(size = 12, color = "grey15"),
            panel.border = element_rect(fill = NA, colour = "plum4"))
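As a possible alternative to precomputing the quantile table, the same custom probabilities can be handed to ggplot2 through stat_summary. This is only a sketch of that different technique, not the approach used above; quantile_box and p_custom are hypothetical names introduced for illustration, and the text labels are omitted.

```r
library(ggplot2)

# Hypothetical helper: returns the five box statistics for one group,
# using custom probabilities instead of the default quartiles
quantile_box <- function(y, probs = c(0, 0.10, 0.50, 0.90, 1)) {
  q <- unname(quantile(y, probs = probs))
  data.frame(ymin = q[1], lower = q[2], middle = q[3],
             upper = q[4], ymax = q[5])
}

# stat_summary() applies quantile_box to the carat values of each color
# and draws the result with the boxplot geom
p_custom <- ggplot(diamonds, aes(x = color, y = carat)) +
  stat_summary(fun.data = quantile_box, geom = "boxplot",
               color = "darkgreen", fill = "plum2") +
  labs(x = "Color", y = "Carat")

p_custom
```

The trade-off is that the summary values live inside the plot rather than in a separate data frame, so the precomputed table remains the more convenient route when the numbers also need to be printed with geom_text.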