# Category Archives: R Data Syntax

## Data Concatenation and Coercion in R

Data concatenation and coercion are common operations in R.

*Data Concatenation*

The c() (concatenate) function combines elements into a vector.

```r
> c(T, F, T)
[1]  TRUE FALSE  TRUE

> c(8.3, 9.2, 11)
[1]  8.3  9.2 11.0
```

When elements are combined from different classes, the c() function coerces to a common type, which is the type of the returned value:

```r
> x <- c(100, "A", TRUE, as(1, "complex"))
> x
[1] "100"  "A"    "TRUE" "1+0i"
> class(x)
[1] "character"
```
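
The coercion rule can be sketched a little further. The snippet below is a minimal illustration, assuming the usual base R ordering logical < integer < double < complex < character; the variable names are made up for this example.

```r
# Each vector is coerced to the "highest" type among its elements
v1 <- c(TRUE, 2L)    # logical + integer
v2 <- c(2L, 3.5)     # integer + double
v3 <- c(3.5, "a")    # double + character

class(v1)            # "integer"
class(v2)            # "numeric"
class(v3)            # "character"

# Explicit coercion back to numeric; strings that are not numbers become NA
nums <- suppressWarnings(as.numeric(c("100", "A")))
nums                 # 100  NA
```

Without suppressWarnings(), the last call would also emit a warning that NAs were introduced by coercion.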

*(256 words, estimated 1:01 mins reading time)*

## Data Formatting in R

There are a number of ways to accomplish data formatting in R.

**Data Options in R**

R supports a range of data formats and controls. The options() function accesses the default settings R establishes at start-up. Session options that can be changed from the command line include:

```r
> names(options())
 [1] "add.smooth"            "bitmapType"            "browser"
 [4] "browserNLdisabled"     "check.bounds"          "continue"
 [7] "contrasts"             "defaultPackages"       "demo.ask"
[10] "device"                "device.ask.default"    "digits"
[13] "dvipscmd"              "echo"                  "editor"
[16] "encoding"              "example.ask"           "expressions"
[19] "help_type"             "help.search.types"     "help.try.all.packages"
[22] "HTTPUserAgent"         "internet.info"         "keep.source"
[25] "keep.source.pkgs"      "locatorBell"           "mailer"
[28] "max.print"             "menu.graphics"         "na.action"
[31] "nwarnings"             "OutDec"                "pager"
[34] "papersize"             "pdfviewer"             "pkgType"
[37] "printcmd"              "prompt"                "repos"
[40] "rl_word_breaks"        "scipen"                "show.coef.Pvalues"
[43] "show.error.messages"   "show.signif.stars"     "str"
[46] "str.dendrogram.last"   "stringsAsFactors"      "texi2dvi"
[49] "timeout"               "ts.eps"                "ts.S.compat"
[52] "unzip"                 "useFancyQuotes"        "verbose"
[55] "warn"                  "warning.length"        "width"
```

Each of these options can be changed to modify how R behaves during the session. For more details on each element, see the help page for the options() function (?options). A practical example is given below.
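
As a minimal sketch of changing one session option, the snippet below sets "digits" and then restores the previous value. The choice of option and the variable names are assumptions for illustration, not the example from the original post.

```r
# options() returns the previous values, so they can be restored later
old <- options(digits = 4)

print(pi)              # prints 3.142 while digits = 4
shown <- format(pi)    # format() also uses getOption("digits") by default

options(old)           # restore the earlier setting
getOption("digits")    # back to the earlier value (7 in a default session)
```

Saving the return value of options() and passing it back is a common way to keep a temporary change from leaking into the rest of the session.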

*(202 words, estimated 48 secs reading time)*

## Data Infix Operators in R

*Intro to Infix Operators in R*

Infix operators in R are unique functions and methods that facilitate basic data expressions or transformations.

Infix refers to placing the operator between its operands. For example, an infix operation is written (a + b), whereas the prefix and postfix forms are written (+ a b) and (a b +), respectively.

The types of infix operators used in R include functions for data extraction, arithmetic, sequences, comparison, logical testing, variable assignments, and custom data functions.
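
A short sketch may help here. In R, every infix operator is an ordinary function that can also be called in prefix form with backticks, and users can define their own operators of the form %name%. The operator %+% below is a hypothetical name chosen for this example.

```r
# An infix call and its prefix (function-call) form give the same result
a <- 2
b <- 3
infix_sum  <- a + b
prefix_sum <- `+`(a, b)

# Built-in infix operators for sequences and matching
1:5                  # sequence operator
3 %in% c(1, 3, 5)    # matching operator, TRUE here

# A user-defined infix operator (hypothetical name)
`%+%` <- function(lhs, rhs) paste(lhs, rhs)
greeting <- "tidy" %+% "data"
greeting             # "tidy data"
```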

*(322 words, 1 image, estimated 1:17 mins reading time)*

## Factors in R

Categorical (i.e. qualitative) data are represented as factors in R. Factors display as character strings (the level labels) but are stored internally as integer codes.

*Creating Factors in R*

Factors may be created by using the factor() or as.factor() function:

```r
# Create data object using factor()
> age <- factor(c(1, 1, 2, 2, 1, 3, 1, 2), labels = c("20-35yrs", "35-55yrs", "55+yrs"))
> age
[1] 20-35yrs 20-35yrs 35-55yrs 35-55yrs 20-35yrs 55+yrs   20-35yrs 35-55yrs
Levels: 20-35yrs 35-55yrs 55+yrs

# Create data object using as.factor()
> age <- c("20-35yrs", "20-35yrs", "35-55yrs", "35-55yrs", "20-35yrs", "55+yrs", "20-35yrs", "35-55yrs")
> age <- as.factor(age)
> age
[1] 20-35yrs 20-35yrs 35-55yrs 35-55yrs 20-35yrs 55+yrs   20-35yrs 35-55yrs
Levels: 20-35yrs 35-55yrs 55+yrs
```

Note that it is not possible to assign labels to the factor levels within the function as.factor().
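
One possible workaround, sketched here with made-up short codes, is to relabel after creation by assigning to levels(). The same snippet also shows the integer codes that sit underneath the labels.

```r
age <- as.factor(c("a", "a", "b", "c", "a"))   # hypothetical short codes

levels(age)        # "a" "b" "c" (sorted alphabetically)
as.integer(age)    # underlying integer codes: 1 1 2 3 1

# Relabel after creation by assigning new level names in the same order
levels(age) <- c("20-35yrs", "35-55yrs", "55+yrs")
as.character(age)
```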

Another way to create factors in R is to split a data object into category groups and then call the factor() function:
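
The original example is not shown in this excerpt. As a rough sketch of the idea, a numeric vector can be grouped with findInterval() and the group numbers passed to factor(). The ages and break points below are assumptions chosen for illustration.

```r
# Hypothetical ages in years
years <- c(22, 31, 40, 52, 28, 67, 34, 45)

# findInterval() assigns each value to a group:
# 1 = [20, 35), 2 = [35, 55), 3 = 55 and over
group <- findInterval(years, c(20, 35, 55))

# Turn the group numbers into a labelled factor
age <- factor(group, levels = 1:3,
              labels = c("20-35yrs", "35-55yrs", "55+yrs"))
table(age)
```

Base R's cut() combines the grouping and factor creation in one call, which may be more convenient when interval-style labels are acceptable.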

*(368 words, estimated 1:28 mins reading time)*

## Tidy Data Transformations

**Package Dependencies**

The core packages for tidy data transformations are listed below:

```r
# Data format/shape
library(tibble)     # simple data.frames
library(tidyr)      # data cleaning/reshaping

# Data transforms
library(dplyr)      # data transforming
library(forcats)    # data factor mgmt
library(lubridate)  # date/time objects
library(hms)        # time-of-day values
library(stringr)    # string mgmt

# Programming
library(purrr)      # functional programming tools
library(purrrlyr)   # intersection of purrr and dplyr
library(magrittr)   # pipe operators
```

Within the “tidyverse”, the dplyr package is by far the most important for data transformation and manipulation.^{1} One advantage of the package is its verb-based functions, whose syntax is much easier to read than the often cryptic syntax of base R.

*(1093 words, 8 images, estimated 4:22 mins reading time)*

## R Data Syntax

The following pages introduce the fundamentals of R data syntax for program scripting and quantitative data analysis.

- Data Object Modes and Classes
- Data Object Management
- Data Formatting
- Dates and Date Formats
- Data Subscripting
- Data Infix Operators
- Data Expressions
- Data Concatenation and Coercion
- Data Sequences and Repetition
- Data Sorting
- Data Distributions
- Regular Expressions

## Regular Expressions (RegEx) in R

In computing, a regular expression (abbreviated regexp) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings. The patterns are often a combination of literal text, metacharacters, and wildcards. Regular expressions are used to search for objects, extract text, or perform find/replace operations. They offer convenience and can have a powerful impact on data or object management.

**regexp** *Functions in R*

Functions in R for regular expressions include:
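
The original list is not shown in this excerpt. As a brief sketch, the snippet below uses a few base R functions that take regular expressions; the sample strings and variable names are invented for this example.

```r
x <- c("apple pie", "banana split", "cherry tart")

# Test for a match in each element
hits <- grepl("an", x)                  # FALSE TRUE FALSE

# Return the elements that start with "a" or "c"
starts <- grep("^[ac]", x, value = TRUE)

# Replace the first match, or every match, in each element
underscored <- sub(" ", "_", x)         # "apple_pie" ...
no_vowels   <- gsub("[aeiou]", "", x)

# Extract the matched text itself (here, the last word)
last_word <- regmatches(x, regexpr("[a-z]+$", x))
last_word                               # "pie" "split" "tart"
```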

*(913 words, estimated 3:39 mins reading time)*

## Tidy Data Preparation

**Package Dependencies**

The core packages for tidy data preparation are listed below:

```r
# Data format/shape
library(tibble)     # simple data.frames
library(tidyr)      # data cleaning/reshaping

# Data transforms
library(dplyr)      # data transforming
library(forcats)    # data factor management
library(lubridate)  # date/time objects
library(hms)        # time-of-day values
library(stringr)    # string mgmt

# Programming
library(purrr)      # functional programming tools
library(purrrlyr)   # intersection of purrr and dplyr
library(magrittr)   # pipe operators
```

Of these, the tibble and tidyr packages are core to data consistency and preparation.^{1}

*Creating* **tibble** *Data*

The tibble package provides a new data class for storing tabular data, the tibble. Tibbles inherit from the data.frame class but improve three behaviors:

- Subsetting – Always returns a new tibble, maintaining data consistency

*(751 words, 4 images, estimated 3:00 mins reading time)*

## Principles of Tidy Data

**Introduction to Tidy Data**

Despite the enormous amount of data available, there is surprisingly little agreement or guidance on how to create clean, consistent, and easy-to-use data.

Human interface with data and code can benefit from some simple principles to facilitate repeatable research and results. The “tidy” approach to data requires that:

- Data are structured consistently and are reusable;
- Code flow relies on simple function calls using the pipe;
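
To illustrate the pipe-based code flow, here is a small sketch. It uses base R's native pipe `|>` (available from R 4.1) as a stand-in for the magrittr pipe `%>%` used in the tidyverse; the data are made up.

```r
scores <- c(4, 9, 16, 25)

# Piped form: read left to right, one simple function call per step
result <- scores |>
  sqrt() |>
  sum()

# Equivalent nested call, which has to be read inside-out
nested <- sum(sqrt(scores))

result   # 14
```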

*(645 words, 1 image, estimated 2:35 mins reading time)*

## Geospatial Data and Mapping in R

Here are the slides I presented at a recent meeting of Doha R users on geospatial data and mapping in R.

Geospatial Data and Mapping in R

## Data Expressions in R

*Data Expressions*

The following table lists data expressions in R that are used to compute basic numerical results for scalars, vectors, and rectangular data objects:

| Function | Description | Comment |
|---|---|---|
| abs() | Absolute value | n/a |
| approx() | Linear interpolation of points | n/a |
| asin(); acos(); atan() | Inverse trigonometric functions | n/a |
| asinh(); acosh(); atanh() | Inverse hyperbolic functions | n/a |
| ceiling() | Round up to nearest integer | Impacts stored precision |
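
A few of the functions in the table can be tried directly at the console. The inputs below are arbitrary values picked for illustration.

```r
abs(-3.7)        # 3.7
ceiling(3.2)     # 4

# approx() returns a list with components x and y; here we interpolate
# linearly between the points (1, 10) and (3, 30) at x = 2
mid <- approx(x = c(1, 3), y = c(10, 30), xout = 2)$y
mid              # 20

asin(1)          # pi / 2
atanh(0)         # 0
```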

*(234 words, estimated 56 secs reading time)*