Categorical (i.e. qualitative) data are represented as factors in R. Factors display as character strings (their labels, called levels), but are stored as integer codes that index those levels.
Creating Factors in R
Factors may be created by using the factor() or as.factor() function:
    # Create data object using factor()
    > age <- factor(c(1, 1, 2, 2, 1, 3, 1, 2), labels = c("20-35yrs", "35-55yrs", "55+yrs"))
    > age
    [1] 20-35yrs 20-35yrs 35-55yrs 35-55yrs 20-35yrs 55+yrs   20-35yrs 35-55yrs
    Levels: 20-35yrs 35-55yrs 55+yrs

    # Create data object using as.factor()
    > age <- c("20-35yrs", "20-35yrs", "35-55yrs", "35-55yrs", "20-35yrs", "55+yrs", "20-35yrs", "35-55yrs")
    > age <- as.factor(age)
    > age
    [1] 20-35yrs 20-35yrs 35-55yrs 35-55yrs 20-35yrs 55+yrs   20-35yrs 35-55yrs
    Levels: 20-35yrs 35-55yrs 55+yrs
Note that it is not possible to assign labels to the factor levels within the function as.factor().
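Labels can still be attached after as.factor() by assigning to levels(). The following is a minimal sketch of that idea, reusing the integer codes from the example above; the variable names age.codes and age.f are introduced here only for illustration.

```r
# Hypothetical variable names; the codes mirror the example above.
age.codes <- c(1, 1, 2, 2, 1, 3, 1, 2)

# as.factor() has no labels argument, so the levels start out as "1", "2", "3".
age.f <- as.factor(age.codes)
levels(age.f)

# Rename the levels in place; the underlying integer codes are unchanged.
levels(age.f) <- c("20-35yrs", "35-55yrs", "55+yrs")
age.f
```

The new labels are matched to the existing levels by position, so they must be given in the same order as levels(age.f).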
Another way to create factors in R is to split a numeric data object into category groups with the cut() function. The result of cut() is already a factor, so the final factor() call below is optional:
    > age <- c(22, 31, 37, 52, 27, 60, 34, 53)
    > age.groups <- cut(age, breaks = c(20, 35, 55, 80), labels = c("20-35yrs", "35-55yrs", "55+yrs"))
    > age <- factor(age.groups)
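Because labels such as "20-35yrs" and "35-55yrs" share an endpoint, it helps to know which group a boundary value lands in. The sketch below is illustrative only (the vector edge.ages is made up for the purpose) and shows how the right argument of cut() controls which side of each interval is closed.

```r
# Hypothetical ages that sit on or next to the break points.
edge.ages <- c(20, 35, 36, 55, 56)

# Default (right = TRUE): intervals are (20,35], (35,55], (55,80],
# so 20 itself falls outside every interval and becomes NA.
right.closed <- cut(edge.ages, breaks = c(20, 35, 55, 80))
right.closed

# right = FALSE closes intervals on the left instead: [20,35), [35,55), [55,80).
left.closed <- cut(edge.ages, breaks = c(20, 35, 55, 80), right = FALSE)
left.closed

# The result of cut() is already a factor.
is.factor(right.closed)
```

With the default settings, an age of exactly 35 is placed in the first group; setting include.lowest = TRUE would additionally keep the lowest break value (20 here) instead of turning it into NA.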
The pretty() function computes “pretty” (evenly spaced, round-valued) break points, which cut() can then use to split the data; cut() generates the interval labels itself. For example:
    > age.raw <- c(22, 31, 37, 52, 27, 60, 34, 53)
    > cut(age.raw, pretty(age.raw))
    [1] (20,30] (30,40] (30,40] (50,60] (20,30] (50,60] (30,40] (50,60]
    Levels: (20,30] (30,40] (40,50] (50,60]
Printing Factors
When you print a factor, R shows the level (label) of each data point rather than the stored integer. To see how a factor is stored internally, remove its class attribute with unclass():
    > unclass(age)
    [1] 1 1 2 2 1 3 1 2
    attr(,"levels")
    [1] "20-35yrs" "35-55yrs" "55+yrs"
The integers serve as indices into the levels attribute. The integer codes are obtained with as.integer(); the codes() function found in older references is no longer available in R.
    > as.integer(age)
    [1] 1 1 2 2 1 3 1 2
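The relationship between codes and levels can be checked directly. This is a small illustrative sketch that uses as.integer() to obtain the codes; the names idx and labs are chosen here for the example.

```r
age <- factor(c(1, 1, 2, 2, 1, 3, 1, 2),
              labels = c("20-35yrs", "35-55yrs", "55+yrs"))

idx  <- as.integer(age)   # the stored integer codes
labs <- levels(age)       # the lookup table of labels

# Indexing the levels by the codes reproduces the displayed labels.
identical(labs[idx], as.character(age))
```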
Or, you can examine the levels of a factor with the levels() function.
    > levels(age)
    [1] "20-35yrs" "35-55yrs" "55+yrs"
To get the number of cases of each level in a factor, call summary():
    > summary(age)
    20-35yrs 35-55yrs   55+yrs
           4        3        1
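Counts are reported for every level, including levels that no longer occur in the data. The sketch below, which rebuilds the age factor so that it runs on its own, illustrates this with table() and droplevels(); the subset name young is introduced only for the example.

```r
age <- factor(c(1, 1, 2, 2, 1, 3, 1, 2),
              labels = c("20-35yrs", "35-55yrs", "55+yrs"))

# table() returns the same per-level counts as summary().
table(age)

# Subsetting keeps all levels, so an emptied level is counted as zero.
young <- age[age != "55+yrs"]
summary(young)

# droplevels() removes levels that have no observations.
summary(droplevels(young))
```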
Creating Ordered Factors in R
By default, factor levels are the sorted unique values of the data (alphabetical order for character data), and the integer codes follow that order. If the order of the levels is meaningful, use the ordered() function to create an ordered factor. The arguments to ordered() are the same as for factor(). The order of the values given in the levels argument determines the order placed on the levels.
    # Create first ordered factor
    > age <- c(1, 1, 2, 2, 1, 3, 1, 2)
    > age.ord1 <- ordered(age, levels = c(1, 2, 3), labels = c("20-35yrs", "35-55yrs", "55+yrs"))
    > levels(age.ord1)
    [1] "20-35yrs" "35-55yrs" "55+yrs"
    > as.integer(age.ord1)
    [1] 1 1 2 2 1 3 1 2
    > summary(age.ord1)
    20-35yrs 35-55yrs   55+yrs
           4        3        1

    # Create second ordered factor with inverted codes
    > age.ord2 <- ordered(age, levels = c(3, 2, 1), labels = c("20-35yrs", "35-55yrs", "55+yrs"))
    > levels(age.ord2)
    [1] "20-35yrs" "35-55yrs" "55+yrs"
    > as.integer(age.ord2)
    [1] 3 3 2 2 3 1 3 2
    > summary(age.ord2)
    20-35yrs 35-55yrs   55+yrs
           1        3        4
The order relationship between the different levels is printed for an ordered factor along with the values:
    > age.ord1
    [1] 20-35yrs 20-35yrs 35-55yrs 35-55yrs 20-35yrs 55+yrs   20-35yrs 35-55yrs
    Levels: 20-35yrs < 35-55yrs < 55+yrs
    > age.ord2
    [1] 55+yrs   55+yrs   35-55yrs 35-55yrs 55+yrs   20-35yrs 55+yrs   35-55yrs
    Levels: 20-35yrs < 35-55yrs < 55+yrs
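One practical consequence of the order relationship is that comparison operators work on ordered factors. The following sketch rebuilds the first ordered factor under the hypothetical name age.ord so that it runs on its own.

```r
age.ord <- ordered(c(1, 1, 2, 2, 1, 3, 1, 2), levels = c(1, 2, 3),
                   labels = c("20-35yrs", "35-55yrs", "55+yrs"))

# Comparisons follow the level order, not the alphabetical order of the labels.
older <- age.ord >= "35-55yrs"
older

# Count the observations in the two older groups.
sum(older)

# min() and max() are also defined for ordered factors.
max(age.ord)
```

The same comparisons on an unordered factor are not meaningful; R returns NA with a warning in that case.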
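Finally, to convert a factor with numeric levels to a numeric vector, convert it to character first. Calling as.numeric() directly on the factor returns the integer codes rather than the numbers the levels represent:
    > as.numeric(as.character(x))
Any level that cannot be parsed as a number is converted to NA, with a warning.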
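The difference between the codes and the values they label is easy to miss, so here is a short illustrative sketch. The factor x is made up for the example.

```r
# Hypothetical factor whose levels look like numbers.
x <- factor(c("10", "20", "10", "5"))

# Levels are sorted as strings: "10" "20" "5".
levels(x)

# as.numeric() alone returns the integer codes, not the values.
as.numeric(x)                 # 1 2 1 3

# Going through character recovers the numbers.
as.numeric(as.character(x))   # 10 20 10 5

# Equivalent result, converting each level only once.
as.numeric(levels(x))[x]
```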
Warning: When a category object is converted to an ordered factor object, the levels attribute is NOT used for factor labels. In order to keep the label names, the category object must first be converted to a factor object and then ordered.
Manipulating Factors in R
The following table summarizes common functions for factor manipulation:
[table id=15 /]