# Factors in R

Categorical (i.e. qualitative) data are represented as factors in R. A factor displays as character strings (its labels, called levels) but is stored as a vector of integer codes that index those levels.

#### Creating Factors in R

Factors may be created by using the factor() or as.factor() function:
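A minimal sketch of both constructors follows. The data and the variable names `responses`, `satisfaction`, and `colours` are made up for illustration.

```r
# Hypothetical survey responses coded as 1, 2, 3
responses <- c(2, 1, 3, 3, 2, 1, 2)

# factor() accepts a labels argument that names the levels
satisfaction <- factor(responses,
                       levels = c(1, 2, 3),
                       labels = c("low", "medium", "high"))
print(satisfaction)

# as.factor() takes only the data; its levels are the sorted unique values
colours <- as.factor(c("red", "blue", "red", "green"))
print(levels(colours))
```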

Note that it is not possible to assign labels to the factor levels within the function as.factor().

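Another way to create a factor in R is to split a numeric data object into category groups with cut(), which returns a factor directly. Calling factor() on the result is only needed to relabel it or to drop unused levels: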
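As an illustration, the sketch below bins a numeric vector with cut(). The `ages` data, the break points, and the group labels are assumptions chosen for the example.

```r
# Hypothetical ages to be grouped into three categories
ages <- c(23, 35, 47, 59, 61, 18, 72)

# Intervals are (0, 30], (30, 60] and (60, Inf]; labels name them in order
age_group <- cut(ages,
                 breaks = c(0, 30, 60, Inf),
                 labels = c("young", "middle", "senior"))

print(age_group)
is.factor(age_group)   # cut() already returns a factor
```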

The pretty() function computes “pretty” break points (evenly spaced, round values that cover the data range), which cut() can then use to split the data and generate interval labels.  For example:
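One possible way to combine the two functions is sketched here. The vector `x` is invented, and include.lowest = TRUE is an added safeguard so that a value equal to the lowest break is not turned into NA.

```r
# Hypothetical continuous measurements
x <- c(3.2, 7.8, 12.5, 18.1, 24.9, 29.4)

# pretty() returns numeric break points that span the range of x
breaks <- pretty(x)
print(breaks)

# cut() builds the interval labels from those break points
binned <- cut(x, breaks = breaks, include.lowest = TRUE)
print(binned)
```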

#### Printing Factors

When you print a factor, R shows the label of each data point followed by the set of levels.  To see how the factor is stored internally, bypass the factor print method with print.default() or strip the class with unclass():
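The sketch below contrasts the normal printed form with the internal storage. It uses unclass() to expose the storage, which is one of several ways to do so, and the factor `f` is a made-up example.

```r
# Hypothetical factor with three levels
f <- factor(c("b", "a", "c", "a"))

# The factor print method shows labels and the set of levels
print(f)

# unclass() leaves the integer codes plus a "levels" attribute
stored <- unclass(f)
print(stored)
```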

The integers serve as indices into the levels attribute. The codes() function from S is not part of current R; obtain the integer codes with as.integer() or unclass() instead.
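To make the indexing relationship concrete, this small sketch recovers the labels from the integer codes. The names `f`, `idx`, and `labels_back` are illustrative only.

```r
# Hypothetical factor
f <- factor(c("b", "a", "c", "a"))

# Integer codes: the position of each value within levels(f)
idx <- as.integer(f)
print(idx)

# Indexing the levels with the codes reproduces the original labels
labels_back <- levels(f)[idx]
print(labels_back)
```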

Or, you can examine the levels of a factor with the levels() function.
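For instance, with an invented factor of survey answers:

```r
# Hypothetical yes/no/maybe answers
answers <- factor(c("no", "yes", "yes", "no", "maybe"))

# levels() returns the distinct labels in level order
lv <- levels(answers)
print(lv)

# nlevels() gives the number of levels
print(nlevels(answers))
```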

To get the number of cases of each level in a factor, call summary():
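A short sketch with made-up data follows. It also calls table(), which reports the same counts in a table object.

```r
# Hypothetical yes/no/maybe answers
answers <- factor(c("no", "yes", "yes", "no", "maybe"))

# summary() on a factor returns a named vector of counts per level
counts <- summary(answers)
print(counts)

# table() gives the same counts as a one-way table
print(table(answers))
```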

#### Creating Ordered Factors in R

By default, factor levels are the sorted unique values of the data (alphabetical order for character data), and the integer codes follow that order.  Use the ordered() function to create an ordered factor if the order of the levels is important. The arguments to ordered() are the same as for factor(). The order of the values given in the levels argument determines the order placed on the levels.
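The difference between the default level order and an explicit one can be sketched as follows. The `sizes` data are hypothetical.

```r
# Hypothetical clothing sizes
sizes <- c("medium", "small", "large", "small")

# Default: levels sorted alphabetically, with no ordering between them
size_default <- factor(sizes)
print(levels(size_default))

# ordered(): the levels argument fixes the order small < medium < large
size_ordered <- ordered(sizes, levels = c("small", "medium", "large"))
print(levels(size_ordered))
print(is.ordered(size_ordered))
```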

The order relationship between the different levels is printed for an ordered factor along with the values:
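In this sketch, built on the same kind of made-up size data, the printed Levels line uses `<` between levels, and comparison operators respect the ordering.

```r
# Hypothetical ordered factor of sizes
size_ordered <- ordered(c("medium", "small", "large", "small"),
                        levels = c("small", "medium", "large"))

# The last printed line reads: Levels: small < medium < large
print(size_ordered)

# Comparisons and max() use the level order, not the alphabet
bigger <- size_ordered > "small"
print(bigger)
print(max(size_ordered))
```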

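Finally, to convert a factor with numeric levels to a numeric vector, use the expression as.numeric(as.character(f)), where f is the factor. Calling as.numeric(f) directly returns the integer codes rather than the level values. The conversion turns every non-numeric level into NA, with a warning.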
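The conversion can be sketched as below, using the common idiom as.numeric(as.character()). The factor `f` is invented, and suppressWarnings() is added only to silence the coercion warning caused by the non-numeric level.

```r
# Hypothetical factor whose levels look numeric, plus one stray label
f <- factor(c("10", "2.5", "n/a", "10"))

# Pitfall: as.numeric() on the factor returns the integer codes
wrong <- as.numeric(f)
print(wrong)

# Convert the labels instead; "n/a" cannot be parsed and becomes NA
right <- suppressWarnings(as.numeric(as.character(f)))
print(right)

# An equivalent idiom converts the levels once, then indexes by the codes
same <- suppressWarnings(as.numeric(levels(f))[f])
```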

Warning: The category objects of S-PLUS do not exist in R. When an existing factor is passed to ordered(), its level labels are kept and its current level order becomes the ordering, so supply the levels argument explicitly if a different order is needed.

#### Manipulating Factors in R

The following table summarizes common functions for factor manipulation:

| Function | Description |
| --- | --- |
| as.numeric() | Returns the numeric codes of a factor |
| cut() | Creates categories from a continuous variable |
| factor(); as.factor() | Create a factor object |
| levels() | Assigns, changes, or prints the levels of a factor |
| ordered() | Creates an ordered factor |
| table() | Creates a contingency table |
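As a brief illustration of two of these functions, the sketch below renames levels with levels() and cross-tabulates two factors with table(). The variables `sex` and `group` and their values are hypothetical.

```r
# Hypothetical factors for five subjects
sex   <- factor(c("m", "f", "f", "m", "f"))
group <- factor(c("a", "b", "a", "a", "b"))

# Assigning to levels() renames them in their existing order ("f", "m")
levels(sex) <- c("female", "male")
print(sex)

# table() counts every combination of the two factors
tab <- table(sex, group)
print(tab)
```

Note that assignment through levels() relabels by position, so the new names must be given in the current level order.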