Introduction
RMarkdown provides an authoring system for project and data science reporting. RMarkdown is a core component of the RStudio IDE. It braids together narrative text with embedded chunks of R code. The R code serves to demonstrate the model concepts in the text. RMarkdown produces elegantly formatted document output, including publication quality data plots and tables.
RMarkdown integrates several applications into an easy-to-use framework. The core components are the R programming language and Markdown, a lightweight markup language for text creation. RMarkdown relies on the knitr package to call the R interpreter and to produce model results, report tables and charts. The pandoc application is called next; it is a document conversion tool that renders the text in different output formats. pandoc can also render LaTeX commands for state-of-the-art typesetting, equation editing and document control.
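The two stages described above can be run by hand to see what each tool contributes. The following is a minimal sketch, assuming the knitr and rmarkdown packages are installed; the temporary file and its contents are hypothetical, and the pandoc step is skipped when pandoc cannot be found.

```r
# Hypothetical one-chunk document written to a temporary file.
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "# Demo",
  "",
  "```{r}",
  "1 + 1",
  "```"
), rmd)

# Stage 1: knitr runs the R chunk and writes plain Markdown.
md <- knitr::knit(rmd, output = sub("\\.Rmd$", ".md", rmd), quiet = TRUE)
cat(readLines(md), sep = "\n")

# Stage 2: pandoc converts the Markdown into the target format.
if (rmarkdown::pandoc_available()) {
  rmarkdown::pandoc_convert(md, to = "html",
                            output = sub("\\.md$", ".html", md))
}
```

In day-to-day use both stages are wrapped by a single call to rmarkdown::render(), so running them separately is mainly useful for debugging.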
Benefits of RMarkdown
The benefits of combining text authoring with scientific programming are significant:
- Reported results are immediately reproducible given embedded code and data objects;
- Code will often detail incremental calculations that the academic text might overlook, given the need to simplify text content and equations;
- Reports can be easily updated as data or code objects are updated … no copy/paste effort across files or applications is required;
- Collaboration and communication are greatly enhanced within teams and across teams;
- Dozens of output formats are supported, catering to a wide variety of authoring needs.
Perhaps most important, the RStudio IDE delivers integrated code and text authoring so the entire process is easy to implement and use.
Dependencies
The following applications need to be in place, regardless of the operating system used:
- R programming language
- RStudio
- TeX (markup and typesetting language for high-quality document output)
- Compilers for other programming languages (optional)
The rmarkdown package also needs to be installed; doing so installs dependent packages such as knitr, among others:

```r
install.packages("rmarkdown")
```
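Before knitting, it can help to confirm that the dependencies listed above are actually reachable from R. This is a rough sketch, not an official checklist: it only reports what it finds and installs nothing, and looking for `pdflatex` on the PATH is one assumed way of detecting a TeX installation.

```r
# Report which R packages from the toolchain are installed.
pkgs <- c("rmarkdown", "knitr")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

if (installed[["rmarkdown"]]) {
  # rmarkdown reports whether it can locate a pandoc binary.
  cat("pandoc available:", rmarkdown::pandoc_available(), "\n")
}

# PDF output needs a LaTeX engine; an empty string means none was found.
cat("pdflatex path:", Sys.which("pdflatex"), "\n")
```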
How It Works
The following sequence defines the standard RMarkdown work flow:
- Open a new .Rmd file in RStudio with File ▶ New File ▶ RMarkdown. Use the wizard that opens to pre-populate the file with template options;
- Write the document by editing the template with text narration and code chunks (examples below);
- Knit the document to create the report, either by clicking the Knit button or by calling the render() command in the console;
- Preview the document output in the IDE window;
- Publish the output file (optional) to a web server;
- Examine the build log in the RMarkdown console in case of errors;
- Use the output files (such as *.tex, *.pdf, *.docx, etc.) that are created by pandoc and saved with the original *.Rmd file.
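The knit step of this workflow can also be scripted instead of clicked. The sketch below assumes the rmarkdown package and pandoc are available; the file name `demo-report.Rmd` and its contents are made up for illustration.

```r
library(rmarkdown)

# Hypothetical minimal report written to a temporary directory.
rmd <- file.path(tempdir(), "demo-report.Rmd")
writeLines(c(
  "---",
  "title: \"Demo Report\"",
  "---",
  "",
  "Two plus two is `r 2 + 2`."
), rmd)

# render() needs pandoc, so check for it before knitting.
if (rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(rmd, output_format = "html_document", quiet = TRUE)
  print(out)  # path of the HTML file written next to the .Rmd source
} else {
  out <- NA_character_
  message("pandoc not found; skipping render()")
}
```

Calling render() from a script is convenient when reports need to be rebuilt on a schedule or as part of a larger pipeline.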
RMarkdown Example
The plain text below is an RMarkdown file with the extension *.Rmd:
````markdown
---
title: "Sample RMarkdown Document"
author: "Brad Horn"
date: "January 7, 2019"
output:
  pdf_document: default
  html_document:
    df_print: paged
  word_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r cars}
summary(cars)
```

## Including Plots

You can also embed publication quality plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
````
The plain text file contains three important types of content:
- A header section surrounded by `---` lines, with core document controls;
- Blocks of R code surrounded by triple-backtick fences; and
- Text narration mixed with simple text formatting like `## sub-heading` and `**bold**`.
The example above is very simple in nature with only minimal formatting of text, data and plots. The rendered output looks like this:
Document control and quality are explored next.
Text Formatting
Some basic text formatting commands appear below:
Syntax | Description |
---|---|
*italics* _italics_ | italics text |
**bold** __bold__ | bold text |
superscript^2^ | superscript text |
~~strikethrough~~ | strikethrough text |
[link](www.google.com) | hyperlink |
# Header 1 ## Header 2 ### Header 3 #### Header 4 ##### Header 5 ###### Header 6 | Headers with bold text in various font sizes |
$A = \pi * r^{2}$ | inline equation |
![caption](path/to/image.png) | image insertion |
* item1 * item2 + sub-itemA + sub-itemB | bullet list |
1. item1 2. item2 + sub-itemA + sub-itemB | ordered list |
```{r} paste("Hello", "World") ``` | code chunk |
`r paste("Hello", "World")` | inline code |
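To make the inline-code row concrete, the sketch below shows the values that inline expressions would insert into a sentence. It is plain R with no packages; the variable names are illustrative only, and the circle-area example simply reuses the formula from the inline-equation row.

```r
# What the inline expression `r paste("Hello", "World")` inserts into the text.
greeting <- paste("Hello", "World")
greeting

# Inline code is typically used to report a computed value inside a sentence.
radius <- 2
area <- pi * radius^2
sentence <- sprintf("A circle of radius %d has area %.2f.", radius, area)
sentence
```

Because the number is computed at knit time, the sentence stays correct if `radius` changes.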
Top-quality document formatting can also be achieved by embedding LaTeX commands into the RMarkdown file. An introduction to LaTeX and its many common commands is available online.
Global Document Options
Document output formats are controlled by options in the RMarkdown header. The following example shows an expanded header section. The output section is now set to produce a PDF file and an HTML file for website use. Hence, multiple output documents can be produced from the same source file:
```yaml
---
title: "Statistical Machine Learning"
author: "Bradley J. Horn"
date: "02-Nov-2016"
output:
  pdf_document:
    keep_tex: yes
    number_sections: true
    fig_caption: true
  html_document: default
header-includes:
- \usepackage{amsmath}
- \usepackage{amssymb}
- \usepackage{amsfonts}
- \usepackage{mathtools}
- \usepackage{graphicx}
- \usepackage{placeins}
- \usepackage{wrapfig}
- \usepackage{framed}
- \usepackage[dvipsnames]{xcolor}
- \usepackage[font={small,it}]{caption}
- \usepackage[font={small,it}]{subcaption}
- \usepackage{relsize}
- \usepackage[hang,bottom]{footmisc}
- \usepackage{hyperref}
- \usepackage{pdflscape}
subtitle: "Project Notes and Math Proofs for R-Code Implementation"
geometry: top=0.7in, bottom=0.5in, left=0.5in, right=0.7in
papersize: a4paper
---
```
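A header like this can be inspected from R before knitting, which is a quick way to confirm which output formats are declared. The sketch below assumes the rmarkdown package is installed and uses a shortened, hypothetical copy of the header written to a temporary file.

```r
# Shortened, hypothetical copy of the header above.
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Statistical Machine Learning\"",
  "output:",
  "  pdf_document:",
  "    keep_tex: yes",
  "    number_sections: true",
  "  html_document: default",
  "---",
  "",
  "Body text."
), rmd)

# Parse the YAML header into an R list.
header <- rmarkdown::yaml_front_matter(rmd)
formats <- names(header$output)
formats                                      # formats declared in the header
header$output$pdf_document$number_sections   # option set for the PDF format

# To build every declared format in one call (PDF additionally needs LaTeX):
# rmarkdown::render(rmd, output_format = "all")
```

Note that the Knit button builds only the first listed format by default, whereas `output_format = "all"` asks render() for each one.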
Looking closer at the header details above, several observations are worth sharing:
- The output options will save the *.tex file created by pandoc to render the PDF file. This can be useful for debugging large markdown documents (either in RStudio or in TeX, where more detailed debugging options are available);
- The PDF output has been configured to include section and figure numbering, which is typical of more formal documents;
- The header-includes section now attaches a large number of LaTeX packages for high-quality document control. Included packages enable equation editing, figure insertion and layout placement, non-standard font colors, footnoting, improved hyperlinks, and the ability to switch page layouts from portrait to landscape within a single document;
- Finally, the header defines a sub-title, page geometry or dimensions and specifies a European paper size.
For large documents, the header section expands during document development as new options are required. As a result, some headers can be substantial.
Document Format Types
The following output formats are available to use with RMarkdown. The supplied links provide detailed format instructions for setting up the header of the RMarkdown file:
Documents
- html_notebook – Interactive R Notebooks
- html_document – HTML document w/ Bootstrap CSS
- pdf_document – PDF document (via LaTeX template)
- word_document – Microsoft Word document (docx)
- odt_document – OpenDocument Text document
- rtf_document – Rich Text Format document
- md_document – Markdown document (various flavors)
Presentations (slides)
- ioslides_presentation – HTML presentation with ioslides
- revealjs::revealjs_presentation – HTML presentation with reveal.js
- slidy_presentation – HTML presentation with W3C Slidy
- beamer_presentation – PDF presentation with LaTeX Beamer
- powerpoint_presentation – PowerPoint presentation
More
- flexdashboard::flex_dashboard – Interactive dashboards
- tufte::tufte_handout – PDF handouts in the style of Edward Tufte
- tufte::tufte_html – HTML handouts in the style of Edward Tufte
- tufte::tufte_book – PDF books in the style of Edward Tufte
- html_vignette – R package vignette (HTML)
- github_document – GitHub Flavored Markdown document
You can also build books, websites, and interactive documents with RMarkdown. There are also package solutions that utilize RMarkdown to format text to the specs of different academic journals.
Code Chunk Options
The following table lists options that control the behavior of code chunks in the RMarkdown file:
Option | Default Value | Description |
---|---|---|
EVALUATION | ||
child | NULL | A character vector of filenames. Knitr will knit the files and place them into the main document. |
code | NULL | Set to R code. Knitr will replace the code in the chunk with the code in the code option. |
engine | 'R' | Knitr will evaluate the chunk in the named language, e.g. engine = 'python'. Run names(knitr::knit_engines$get()) to see supported languages. |
eval | TRUE | If FALSE, knitr will not run the code in the code chunk. |
include | TRUE | If FALSE, knitr will run the chunk but not include the chunk in the final document. |
purl | TRUE | If FALSE, knitr will not include the chunk when running purl() to extract the source code. |
RESULTS | ||
collapse | FALSE | If TRUE, knitr will collapse all the source and output blocks created by the chunk into a single block. |
echo | TRUE | If FALSE, knitr will not display the code in the code chunk above its results in the final document. |
results | 'markup' | If 'hide', knitr will not display the code’s results in the final document. If 'hold', knitr will delay displaying all output pieces until the end of the chunk. If 'asis', knitr will pass through results without reformatting them (useful if results return raw HTML, etc.) |
error | TRUE | If TRUE, knitr will display error messages in the output and continue. If FALSE, knitr will stop knitting when the code raises an error. |
message | TRUE | If FALSE, knitr will not display any messages generated by the code. |
warning | TRUE | If FALSE, knitr will not display any warning messages generated by the code. |
CODE FORMAT | ||
comment | '##' | A character string. Knitr will prepend the string to each line of results in the final document. |
highlight | TRUE | If TRUE, knitr will highlight the source code in the final output. |
prompt | FALSE | If TRUE, knitr will add > to the start of each line of code displayed in the final document |
strip.white | TRUE | If TRUE, knitr will remove white spaces that appear at the beginning or end of a code chunk. |
tidy | FALSE | If TRUE, knitr will tidy code chunks for display with the tidy_source() function in the formatR package. |
CHUNKS | ||
opts.label | NULL | The label of options set in knitr::opts_template to use with the chunk. |
R.options | NULL | Local R options to use with the chunk. Options are set with options() at start of chunk. Defaults are restored at end. |
ref.label | NULL | A character vector of labels of the chunks from which the code of the current chunk is inherited. |
CACHE | ||
autodep | FALSE | If TRUE, knitr will attempt to figure out dependencies between chunks automatically by analyzing object names. |
cache | FALSE | If TRUE, knitr will cache the results to reuse in future knits. Knitr will reuse the results until the code chunk is altered. |
cache.comments | NULL | If FALSE, knitr will not rerun the chunk if only a code comment has changed. |
cache.lazy | TRUE | If TRUE, knitr will use lazyLoad() to load objects in chunk. If FALSE, knitr will use load() to load objects in chunk. |
cache.path | 'cache/' | A file path to the directory to store cached results in. Path should begin in the directory that the .Rmd file is saved in. |
cache.vars | NULL | A character vector of object names to cache if you do not wish to cache each object in the chunk. |
ANIMATION | ||
aniopts | 'controls,loop' | Extra options for animations (see the LaTeX animate package). |
interval | 1 | The number of seconds to pause between animation frames. |
PLOTS | ||
dev | 'png' | The R function name that will be used as a graphical device to record plots, e.g. dev='CairoPDF'. |
dev.args | NULL | Arguments to be passed to the device, e.g. dev.args=list(bg='yellow', pointsize=10). |
dpi | 72 | A number for knitr to use as the dots per inch (dpi) in graphics (when applicable). |
external | TRUE | If TRUE, knitr will externalize tikz graphics to save LaTeX compilation time (only for the tikzDevice::tikz() device). |
fig.align | 'default' | How to align graphics in the final document. One of 'left', 'right', or 'center'. |
fig.cap | NULL | A character string to be used as a figure caption in LaTeX. |
fig.env | 'figure' | The LaTeX environment for figures. |
fig.ext | NULL | The file extension for figure output, e.g. fig.ext='png'. |
fig.height fig.width | 7 | The width and height to use in R for plots created by the chunk (in inches). |
fig.keep | 'high' | If 'high', knitr will merge low-level changes into high level plots. If 'all', knitr will keep all plots (low-level changes may produce new plots). If 'first', knitr will keep the first plot only. If 'last', knitr will keep the last plot only. If 'none', knitr will discard all plots. |
fig.lp | 'fig:' | A prefix to be used for figure labels in LaTeX. |
fig.path | 'figure/' | A file path to the directory where knitr should store the graphics files created by the chunk. |
fig.pos | '' | A character string to be used as the figure position arrangement in LaTeX. |
fig.process | NULL | A function to post-process a figure file. Should take a filename and return a filename of a new figure source. |
fig.retina | 1 | Dpi multiplier for displaying HTML output on retina screens. |
fig.scap | NULL | A character string to be used as a short figure caption. |
fig.subcap | NULL | A character string to be used as captions in sub-figures in LaTeX. |
fig.show | 'asis' | If 'hide', knitr will generate the plots created in the chunk, but not include them in the final document. If 'hold', knitr will delay displaying the plots created by the chunk until the end of the chunk. If 'animate', knitr will combine all of the plots created by the chunk into an animation. |
fig.showtext | NULL | If TRUE, knitr will call showtext::showtext_begin() before drawing plots. |
out.extra | NULL | A character string of extra options for figures to be passed to LaTeX or HTML. |
out.height out.width | NULL | The width and height to scale plots to in the final output. Can be in units recognized by output, e.g. 0.8\\linewidth, 50px. |
resize.height resize.width | NULL | The width and height to resize tikz graphics in LaTeX, passed to \resizebox{}{}. |
sanitize | FALSE | If TRUE, knitr will sanitize tikz graphics for LaTeX. |
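Options from this table can be set once for the whole document rather than repeated in every chunk header. The sketch below assumes the knitr package is installed; the chosen values are arbitrary examples, and the call would normally sit in a setup chunk near the top of the .Rmd file.

```r
# Remember the current values so they can be restored afterwards.
old <- knitr::opts_chunk$get(c("echo", "fig.width", "dpi"))

# Set document-wide defaults; individual chunks can still override them.
knitr::opts_chunk$set(echo = FALSE, fig.width = 6, dpi = 150)

current <- knitr::opts_chunk$get("dpi")
current

# Restore the previous values (only needed in an interactive session).
knitr::opts_chunk$set(old)
```

A chunk header such as `{r, echo=TRUE}` then overrides the global default for that chunk only.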
Use of Other Programming Languages
Code chunks written in other languages can also be integrated into RMarkdown documents. Support for multiple languages comes from the knitr package, which provides a large number of language engines. For example, the knitr installation queried below registers 50 engines (some of them, such as theorem or proof, are text environments rather than programming languages):
```r
names(knitr::knit_engines$get())
 [1] "awk"         "bash"        "coffee"
 [4] "gawk"        "groovy"      "haskell"
 [7] "lein"        "mysql"       "node"
[10] "octave"      "perl"        "psql"
[13] "Rscript"     "ruby"        "sas"
[16] "scala"       "sed"         "sh"
[19] "stata"       "zsh"         "highlight"
[22] "Rcpp"        "tikz"        "dot"
[25] "c"           "fortran"     "fortran95"
[28] "asy"         "cat"         "asis"
[31] "stan"        "block"       "block2"
[34] "js"          "css"         "sql"
[37] "go"          "python"      "julia"
[40] "theorem"     "lemma"       "corollary"
[43] "proposition" "conjecture"  "definition"
[46] "example"     "exercise"    "proof"
[49] "remark"      "solution"
```
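Since the engine list varies between knitr versions, it can be worth checking for the engines a document relies on before knitting. A small sketch, assuming knitr is installed; the `wanted` vector is just an example selection.

```r
# Engines registered in the installed knitr version.
engines <- names(knitr::knit_engines$get())

# Hypothetical set of engines a document depends on.
wanted <- c("python", "bash", "sql")
supported <- wanted %in% engines
names(supported) <- wanted
supported
```

A registered engine only means knitr knows how to dispatch the chunk; the underlying interpreter (for example Python) still has to be installed on the machine.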
To use a different language engine, simply change the language name in the chunk header from `r` to the engine name:
````markdown
```{python}
x = 'hello, python world!'
print(x.split(' '))
```
````
Shell scripts, such as bash scripts on Linux and macOS, can also be run in RMarkdown as follows:
````markdown
```{bash}
echo "Hello Bash!"
cat flights1.csv flights2.csv flights3.csv > flights.csv
```
````