Iteration in R

Iteration is central to many calculations. Loops are common in R code, but they should be avoided where possible, because vectorized methods can often achieve the same result much faster.

Iteration, or traditional looping, is a brute-force approach to data manipulation that is effective but can be costly. R copies an object when it is modified, so code that grows or repeatedly modifies a large object inside a loop can trigger repeated memory allocation and copying. Loops can therefore consume considerable time and memory. R also provides functions that loop implicitly over the elements of an object: apply(), lapply(), tapply(), sapply() and by(). The traditional looping constructs in R are described below.

The repeat Statement

The repeat statement is the simplest looping construct in R. It performs no tests; it simply evaluates a given expression over and over. Because of this, the body of a repeat loop must include an exit, typically a break statement or a call to return(). The syntax for repeat is:
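A minimal sketch of the form. The general template is shown as a comment because expression is only a placeholder; the counter named count is hypothetical and exists only to give the loop a reason to stop:

```r
# General form (template):  repeat { expression }
# The body must contain an exit such as break or return().
count <- 0
repeat {
  count <- count + 1
  if (count >= 5) break   # without this exit the loop would never end
}
count   # 5
```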

The custom function below uses Newton's method to find the positive real jth root of a number. A convergence test inside the loop uses a break statement to exit the loop.
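The original function is not reproduced here. What follows is a minimal sketch of the same idea, under these assumptions: the function name root_newton() is hypothetical, the input a is positive, the starting guess is x = a, and the loop stops when the relative step size falls below tol. Each pass applies the Newton update x - (x^j - a) / (j * x^(j - 1)) for f(x) = x^j - a:

```r
# Hypothetical helper: positive real j-th root of a > 0 by Newton's method.
# Assumptions: starting guess x = a; stop when the relative step is below tol.
root_newton <- function(a, j = 2, tol = 1e-10) {
  stopifnot(a > 0, j >= 1)
  x <- a
  repeat {
    x_new <- x - (x^j - a) / (j * x^(j - 1))   # Newton update for f(x) = x^j - a
    if (abs(x_new - x) < tol * x_new) break    # convergence test: leave the loop
    x <- x_new
  }
  x_new                                         # value returned after the loop
}

root_newton(27, 3)   # close to 3
root_newton(2)       # close to sqrt(2)
```

A relative tolerance is used so that the test still succeeds for very large roots, where the spacing between adjacent doubles exceeds any fixed absolute tolerance.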

We can replace the break statement inside the loop with a return() statement. This makes it clear what the returned value is and avoids the need for any statements outside the loop:
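Under the same assumptions as before (a hypothetical name, here root_newton2(), a positive input and a relative tolerance), the return() variant might look like this. return() leaves both the loop and the function at once, so nothing is needed after the loop:

```r
# Hypothetical helper: same Newton iteration, but exiting with return().
root_newton2 <- function(a, j = 2, tol = 1e-10) {
  stopifnot(a > 0, j >= 1)
  x <- a
  repeat {
    x_new <- x - (x^j - a) / (j * x^(j - 1))            # Newton update
    if (abs(x_new - x) < tol * x_new) return(x_new)     # exits loop and function
    x <- x_new
  }
}

root_newton2(16, 4)   # close to 2
```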

This modest change makes clear what the function returns. However, return() exits the whole function, not just the loop, so it is unsuitable when code after the loop still needs to run.

The while() Statement

The while() statement evaluates an expression repeatedly for as long as a condition is TRUE. The condition is tested before each pass, so the loop stops as soon as it becomes FALSE. The syntax is:
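A minimal sketch of the form, with the template as a comment. The variables x and halvings are hypothetical; the loop counts how many times 100 can be halved before it drops below 1:

```r
# General form (template):  while (condition) { expression }
x <- 100
halvings <- 0
while (x >= 1) {          # condition is re-tested before every pass
  x <- x / 2
  halvings <- halvings + 1
}
halvings   # 7
```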

For example, the function below returns a vector that corresponds to the binary representation of an integer.
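The original function is not shown, so this is a minimal sketch of one way to do it, using a hypothetical name, to_binary(), and assuming a single non-negative whole number as input. Each pass prepends the remainder modulo 2 and then divides by 2, so the bits come out most significant first:

```r
# Hypothetical helper: binary digits of one non-negative whole number.
to_binary <- function(n) {
  stopifnot(length(n) == 1, n >= 0, n == round(n))
  if (n == 0) return(0)
  bits <- numeric(0)
  while (n > 0) {
    bits <- c(n %% 2, bits)   # prepend the lowest remaining bit
    n <- n %/% 2              # integer division drops that bit
  }
  bits
}

to_binary(10)   # 1 0 1 0
```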

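Warning: the condition in while() must evaluate to a single TRUE or FALSE value, so a function built around a while() loop, like the binary conversion function, handles one value at a time and will not accept a vector of inputs. In R 4.2.0 and later, a condition of length greater than one is an error.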
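One way around this limitation is to loop over the inputs implicitly with lapply() or sapply(). The sketch below redefines a compact version of a hypothetical to_binary() helper so that it runs on its own; lapply() is used because the results have different lengths and cannot be simplified into one vector:

```r
# Hypothetical helper: binary digits of ONE non-negative whole number.
to_binary <- function(n) {
  if (n == 0) return(0)
  bits <- numeric(0)
  while (n > 0) {              # condition must be a single TRUE/FALSE
    bits <- c(n %% 2, bits)
    n <- n %/% 2
  }
  bits
}

# to_binary(c(5, 8)) fails in R >= 4.2.0: the conditions would have length 2.
binaries <- lapply(c(5, 8), to_binary)   # one call per element
binaries
```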

The for() Statement

The basic syntax for a for() loop in R is simple:
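The template is shown as a comment, followed by a concrete instance in which name is fruit and expression1 is a small, hypothetical character vector:

```r
# General form (template):
#   for (name in expression1) {
#     expression2
#   }
for (fruit in c("apple", "banana", "cherry")) {
  print(fruit)   # expression2, evaluated once per element
}
```

After the loop finishes, the loop variable keeps its last value (here "cherry").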

R evaluates expression2 once for each element of expression1 (e.g. a vector), assigning the current element to name before each pass. For example:
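A minimal illustration (the variable total is hypothetical) that accumulates the sum of squares of 1 to 5:

```r
total <- 0
for (i in 1:5) {
  total <- total + i^2   # runs once for each element of 1:5
}
total   # 55
```

The vectorized equivalent, sum((1:5)^2), gives the same result without an explicit loop.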

Note that braces are not required when the loop body is a single expression.
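For instance, a one-expression body can sit on the same line as for(), without braces (squares is a hypothetical, preallocated result vector):

```r
squares <- numeric(3)
for (i in 1:3) squares[i] <- i^2   # single expression: no braces needed
squares   # 1 4 9
```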

There are certain situations in which for() loops may be necessary in R:

  • when the calculation for element i+1 of a vector depends on the result for element i (or earlier elements), as in a recurrence.
  • operations on list components, although lapply() and sapply() loop over list components implicitly and are often more concise and efficient.

For example:
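The original example is not shown; as a sketch, the first part below is a recurrence (the Fibonacci numbers), where each element depends on the two before it, so an explicit loop is natural. The second part loops over a hypothetical list and compares the result with sapply():

```r
# Recurrence: element i depends on elements i - 1 and i - 2
n <- 10
fib <- numeric(n)          # preallocate the result
fib[1:2] <- 1
for (i in 3:n) fib[i] <- fib[i - 1] + fib[i - 2]
fib   # 1 1 2 3 5 8 13 21 34 55

# Looping over list components (hypothetical data)
lst <- list(a = 1:3, b = 4:6)
means <- numeric(length(lst))
for (k in seq_along(lst)) means[k] <- mean(lst[[k]])
means              # 2 5
sapply(lst, mean)  # same values, named a and b, without an explicit loop
```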

To fill a matrix using two nested for() loops, first create an empty matrix of the correct dimensions (for example, one filled with NA), then assign each element in turn:
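A minimal sketch with hypothetical dimensions (3 by 4), where each element is the product of its row and column indices:

```r
nr <- 3
nc <- 4
m <- matrix(NA_real_, nrow = nr, ncol = nc)   # empty matrix of the right size
for (i in seq_len(nr)) {        # outer loop over rows
  for (j in seq_len(nc)) {      # inner loop over columns
    m[i, j] <- i * j
  }
}
m

# Vectorized equivalent for comparison: outer(1:3, 1:4)
```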

Iteration vs. Vectorized Calculations

Programmers with experience in languages other than R are sometimes slow to exploit R's vectorized calculations. A vectorized function operates on an entire vector in a single call; the element-by-element work is still done in a loop, but one written in compiled code rather than in interpreted R. The result is usually much faster execution.

The example below measures the elapsed time of a for() loop that generates a large vector in which each element is calculated from the previous one as a running cumulative sum: element i holds 1 + 2 + ... + i, for i from 1 to N, where N = 10,000,000. The same vector is then generated with a vectorized function, and the two timings are compared.
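The original code is not shown, so this is a sketch under stated assumptions: system.time() does the timing, cumsum() is the vectorized function, and the loop lives in a hypothetical helper, loop_cumsum(), that preallocates its result. as.numeric() is needed because the final sum (about 5e13) exceeds the largest R integer, which would make an integer cumsum() overflow. Timings will differ from machine to machine:

```r
# Hypothetical helper: running sum 1 + 2 + ... + i built with a for() loop.
loop_cumsum <- function(N) {
  x <- numeric(N)                  # preallocate the result (assumes N >= 1)
  x[1] <- 1
  for (i in seq_len(N)[-1]) {      # indices 2..N (empty when N = 1)
    x[i] <- x[i - 1] + i           # each element built from the previous one
  }
  x
}

N <- 1e7
loop_time <- system.time(x_loop <- loop_cumsum(N))
vec_time  <- system.time(x_vec <- cumsum(as.numeric(seq_len(N))))

identical(x_loop, x_vec)                          # do both methods agree?
loop_time[["elapsed"]] / vec_time[["elapsed"]]    # rough speed-up factor
```

The ratio can print as Inf if the vectorized timing rounds to zero on a fast machine.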

In the test reported here, the vectorized calculation was so fast (just over one-tenth of a second) that it could have been repeated 278 times in the time taken by one pass of the for() loop. Exact timings depend on hardware and R version.

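The functions apply(), tapply(), sapply() and lapply() are core functions in base R that let you avoid writing explicit loops. apply() applies a function over the rows, columns or other margins of a matrix or array. tapply() applies a function to groups of values defined by one or more factors, which makes it well suited to ragged arrays (groups of unequal size). lapply() applies a function to each element of a vector or list and always returns a list; sapply() does the same but simplifies the result to a vector or matrix when possible.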
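A short sketch of each function on small, hypothetical data (the objects m, scores, group and lst are illustrative only):

```r
m <- matrix(1:6, nrow = 2)     # columns: (1,2), (3,4), (5,6)
apply(m, 1, sum)               # over rows (MARGIN = 1): 9 12
apply(m, 2, sum)               # over columns (MARGIN = 2): 3 7 11

scores <- c(10, 20, 30, 40, 50)
group  <- factor(c("a", "b", "a", "b", "b"))   # groups of unequal size
tapply(scores, group, mean)    # a: 20, b: about 36.67

lst <- list(x = 1:3, y = 4:10)
lapply(lst, length)            # a list: x = 3, y = 7
sapply(lst, length)            # simplified to a named vector: x = 3, y = 7
```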

Flow of Control Summary

The following table summarizes constructs that override and manage the normal flow of control:

if (condition) {expression}
    Evaluates condition. If TRUE, evaluates expression.
if (condition) {expression1} else {expression2}
    Evaluates condition. If TRUE, evaluates expression1; if FALSE, evaluates expression2.
ifelse(condition, expression1, expression2)
    Vectorized version of the if statement. Evaluates condition element by element and returns a result of the same length, taking elements of expression1 where condition is TRUE and elements of expression2 where it is FALSE.
switch(expression, ...)
    Matches expression (a character string or a number) against the remaining arguments and evaluates the matching one: a character string is matched by argument name, and a number selects an argument by position. An alternative to multiple if/else statements.
break
    Terminates the current loop and passes control to the first statement after the loop.
next
    Ends the current iteration and immediately starts the next iteration of a for, while or repeat loop.
return(expression)
    Terminates the current function and returns the value of expression.
stop("message")
    Signals an error: terminates evaluation of the current function, prints message, and (unless the error is caught) returns to the > prompt.
while (condition) {expression}
    Evaluates condition. If TRUE, evaluates expression and then evaluates condition again, repeating until condition is FALSE.
repeat {expression}
    Simple version of the while statement with no test. expression is evaluated repeatedly until break, return() or stop() is encountered.
for (name in expression1) {expression2}
    Evaluates expression2 once for each element of expression1, assigning that element to name. for() loops are generally less efficient in R than vectorized calculations.
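A few of these constructs in action. The names x, describe(), odd_total and safe_sqrt() are hypothetical, chosen only for illustration:

```r
# ifelse(): element-by-element choice over a whole vector
x <- c(-2, 0, 3)
ifelse(x > 0, "positive", "non-positive")

# switch(): a character value is matched by argument name;
# the final unnamed argument acts as a default
describe <- function(type) {
  switch(type,
         mean   = "arithmetic mean",
         median = "middle value",
         "unknown statistic")
}
describe("median")   # "middle value"
describe("mode")     # "unknown statistic"

# next: skip the rest of the current iteration
odd_total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next      # skip even numbers
  odd_total <- odd_total + i
}
odd_total   # 25

# stop(): signal an error; tryCatch() catches it here instead of
# returning to the prompt
safe_sqrt <- function(v) {
  if (any(v < 0)) stop("negative input")
  sqrt(v)
}
tryCatch(safe_sqrt(-1), error = function(e) conditionMessage(e))   # "negative input"
```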
