Data sorting in R is simple and straightforward. The key functions are sort() and order(). The variable you sort by can be numeric, character (string) or factor. Arguments also give you control over how missing values are handled: they can be placed first, placed last or removed.
Data Sorting Examples
```
> x <- c(0.868, -0.066, -0.075, -1.002, 0.646)
> sort(x)
[1] -1.002 -0.075 -0.066  0.646  0.868
> order(x)
[1] 4 3 2 5 1
> x[order(x)]
[1] -1.002 -0.075 -0.066  0.646  0.868
> x <- rep(1:4, each = 2)
> x
[1] 1 1 2 2 3 3 4 4
> unique(x)
[1] 1 2 3 4
> rev(unique(x))
[1] 4 3 2 1
```
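Since the sort variable can also be character or factor, here is a small sketch with made-up values (the names sizes and sizes_f are illustrative only). It shows a difference worth knowing: character vectors sort alphabetically, while factors sort by the order of their levels.

```r
# Hypothetical example values, stored first as a character vector
sizes <- c("medium", "small", "large", "small")

# Character vectors sort alphabetically
sort(sizes)        # "large" "medium" "small" "small"

# The same values as a factor sort by level order, not alphabetically
sizes_f <- factor(sizes, levels = c("small", "medium", "large"))
sort(sizes_f)      # small small medium large

# order() follows the level order too and returns the permutation
order(sizes_f)     # 2 4 1 3
```

If a factor sorts in an unexpected order, checking levels() is usually the first step.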
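To sort in descending order, set decreasing = TRUE in sort() or order(). For numeric variables, a minus sign (-) in front of the sort variable inside order() has the same effect. Note that sort(-x) would return the negated values, not the original values in reverse order. Because x was overwritten above, it is recreated first:
```
> x <- c(0.868, -0.066, -0.075, -1.002, 0.646)
> x[order(-x)]
[1]  0.868  0.646 -0.066 -0.075 -1.002
```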
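Descending order on one variable extends naturally to data frames with several sort keys. The sketch below assumes the built-in mtcars data set and sorts by cyl ascending, then by mpg descending; the column choice is arbitrary and only for illustration, as are the object names.

```r
# Rows of mtcars ordered by cyl (ascending), then mpg (descending) within cyl.
# The minus sign works here because mpg is numeric.
by_cyl_mpg <- mtcars[order(mtcars$cyl, -mtcars$mpg), c("cyl", "mpg")]
head(by_cyl_mpg)

# A minus sign is not defined for character keys; xtfrm() turns them into
# numbers with the same ordering, so -xtfrm() reverses that ordering.
car_names <- rownames(mtcars)
by_name_desc <- mtcars[order(-xtfrm(car_names)), c("cyl", "mpg")]
head(by_name_desc, 3)
```

For a single key, order(x, decreasing = TRUE) is simpler; the minus-sign and xtfrm() forms are mainly useful when keys need different directions.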
Data Sorting with Missing Values
The order() function has an na.last argument that places missing values last, places them first or removes them, by setting it to TRUE (the default), FALSE or NA respectively, as outlined below. (sort() has the same argument, but its default is NA, so it drops missing values unless told otherwise.) First, the numeric vector from the first example is recreated and two of its elements are set to missing.
```
> x <- c(0.868, -0.066, -0.075, -1.002, 0.646)
> x[c(2, 4)] <- NA
> x
[1]  0.868     NA -0.075     NA  0.646
> x[order(x, na.last = TRUE)]
[1] -0.075  0.646  0.868     NA     NA
> x[order(x, na.last = FALSE)]
[1]     NA     NA -0.075  0.646  0.868
> x[order(x, na.last = NA)]
[1] -0.075  0.646  0.868
```
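Because sort() and order() have different na.last defaults, the same data can come back with or without its missing values. A short sketch, recreating the vector with missing values so it runs on its own; rank() is included because it has a similar argument.

```r
x <- c(0.868, -0.066, -0.075, -1.002, 0.646)
x[c(2, 4)] <- NA

# sort() drops missing values by default (na.last = NA)
sort(x)                     # -0.075 0.646 0.868

# ... and keeps them at the end when na.last = TRUE
sort(x, na.last = TRUE)     # -0.075 0.646 0.868 NA NA

# order() keeps them at the end by default (na.last = TRUE)
x[order(x)]                 # -0.075 0.646 0.868 NA NA

# rank() with na.last = "keep" gives missing values a rank of NA
rank(x, na.last = "keep")   # 3 NA 1 NA 2
```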
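Data Subsetting with subset()
A data subset is the portion of a data object that meets one or more logical conditions. The subset() function selects those rows, but it filters rather than sorts: the rows keep their original order, as in the examples below.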
```
subset(iris, Species == "setosa")
subset(iris, Sepal.Width > 4)
```
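Because subset() filters but does not reorder, getting a sorted subset takes a second step with order(). A minimal sketch using the built-in iris data; the object names setosa, setosa_sorted and wide are illustrative only.

```r
# Keep the setosa rows and two columns, then sort by Sepal.Length (descending)
setosa <- subset(iris, Species == "setosa",
                 select = c(Sepal.Length, Sepal.Width))
setosa_sorted <- setosa[order(setosa$Sepal.Length, decreasing = TRUE), ]
head(setosa_sorted, 3)

# The second filter above, written with logical indexing instead of subset()
wide <- iris[iris$Sepal.Width > 4, ]
wide
```

subset() is convenient interactively because column names can be used without the iris$ prefix; logical indexing is the more explicit equivalent.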
Data Sorting Functions in R
Other commonly used functions for sorting or manipulating data sets appear in the table below:
Function | Description | Comment |
---|---|---|
append() | Add elements to a vector | n/a |
cut() | Creates a factor by dividing continuous data into intervals | Generates cuts at specified break points or for a specified number of equal-width intervals |
duplicated() | Returns a logical vector marking elements that repeat an earlier element | x[!duplicated(x)] is equivalent to unique(x) |
intersect() | Returns values shared by two objects | n/a |
is.element(x,y) | Logical test of whether each element of x is in y | Vector of TRUE/FALSE results; same as x %in% y |
length() | Number of elements in a vector | n/a |
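match() | Compares an object to a table and returns the position of each element's first match, or NA where there is no match | x %in% table gives the logical version |
order() | Returns the permutation that sorts the input values in ascending order | Accepts several sort keys to break ties; decreasing = TRUE reverses the order |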
replace() | Replace elements in a vector | n/a |
rev() | Returns input vector in reverse order | n/a |
rle() | Computes the lengths and values of runs of equal values | n/a |
setdiff(x, y) | Determines elements of x not in y | Ignores elements of y not in x |
setequal() | Logical test of whether two objects contain the same elements | Returns a single TRUE or FALSE; ignores order and duplicates |
sort() | Sort in ascending order | n/a |
sort.col() | Sorts values of a column(s) | One or two dimensional data objects only |
subset() | Returns subsets of vectors, matrices or data frames that meet conditions | e.g. subset(airquality, Temp > 80, select = c(Ozone, Temp)) |
unique() | Returns input values without any repetitions | n/a |
is.unsorted() | Logical test to determine if values are unsorted | n/a |
rank() | Returns the sample ranks of the values in a vector. Ties (i.e., equal values) and missing values can be handled in several ways. | n/a |
which() | Gives the indices of the TRUE values in a logical vector | For arrays, arr.ind = TRUE returns array indices (see arrayInd()) |
which.min(), which.max() | Give the index of the first minimum or maximum | n/a |
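Applying several of the table's functions to one small vector may make the descriptions more concrete. The vector v is made up for illustration; the comments give the results each call returns.

```r
v <- c(3, 1, 3, 2, 1, 1)        # hypothetical example vector

duplicated(v)                   # FALSE FALSE TRUE FALSE TRUE TRUE
v[!duplicated(v)]               # 3 1 2, the same as unique(v)
match(c(2, 5), v)               # 4 NA: first matching position, NA if none
c(2, 5) %in% v                  # TRUE FALSE, the logical form of match()
rle(sort(v))                    # lengths 3 1 2 for values 1 2 3
rank(v)                         # 5.5 2 5.5 4 2 2: ties get the average rank
which(v == 1)                   # 2 5 6
which.max(v)                    # 1: index of the first maximum
cut(v, breaks = c(0, 1.5, 3))   # factor with levels (0,1.5] and (1.5,3]
```

Note that rle() only counts consecutive runs, which is why v is sorted first here.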