Creating R Functions

Creating functions and object-oriented scripts is the preferred way to use R. R functions expand the capabilities of R. By nature, R scripts are a way to organize and save data, complicated expressions, or sequences of operations for re-use. Well-designed R functions rely on proper use of R language concepts and object-oriented structures.

R Scripts vs. R Functions

Scripts and functions have several distinguishing characteristics:

  • Scripts create their intermediate results as objects in the workspace (the global environment), where they persist after the script finishes and are written to disk (the .RData file) if the workspace is saved, cluttering up the workspace. Functions provide more control over which results will be saved and where;
  • A function’s local or intermediate results are stored in separate evaluation frames, and pose no danger of accidentally overwriting other objects with the same names in the workspace;
  • A function keeps its intermediate results in memory, which is usually faster than writing them to disk files and reading them back, so a function tends to run faster than a script that passes results through files;
  • Functions more easily support modular and vectorized code development;
  • Unlike scripts, functions can be deployed as a group by bundling them into an R package.
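
The point about evaluation frames can be seen in a small sketch. The function name double_it and the variable names are made up for illustration. The assignment to x inside the function happens in the function's own evaluation frame and leaves the workspace object of the same name alone.

```r
# An object named x in the workspace
x <- 10

# Hypothetical helper: its local x exists only while the call is running
double_it <- function(v) {
  x <- v * 2   # local to this call; does not touch the workspace x
  x
}

y <- double_it(4)
x   # the workspace x is unchanged
y   # the value returned by the function
```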

Types of R Functions

There are three broad types of R function objects:

  • Standard Call Functions: these functions are user defined or intrinsic to base R.  Examples include mean(), lm(), and summary();
  • Infix Operators: These are the standard operators for data manipulation and transformation, written between their two operands. Examples include +, %*%, etc.;
  • Replacement Functions: These functions appear on the left-hand side of an assignment and modify part of a data object in place. Examples include diag(x) <- value, length(x) <- value, etc.
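
As a rough illustration of the three types, the sketch below calls one built-in function of each kind and then defines a custom infix operator and a custom replacement function. The names %+% and second<- are hypothetical and chosen only for this example.

```r
# Standard call: function name followed by arguments in parentheses
m <- mean(c(1, 2, 3))

# Infix operator: an operator is itself a function and can be called like one
s <- `+`(1, 2)                       # same as 1 + 2

# User-defined infix operator (hypothetical): the name is wrapped in % signs
`%+%` <- function(a, b) paste(a, b)
greeting <- "hello" %+% "world"

# User-defined replacement function (hypothetical): the name ends in <-
# and the last argument must be called value
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
v <- c(1, 2, 3)
second(v) <- 10                      # replaces the second element of v
```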

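Each of these function types can also take part in R’s object-oriented systems, S3 and S4, which let a generic function dispatch to different methods depending on the class of its arguments. In simple terms, the difference is a matter of formality: S3 is an informal convention in which methods are ordinary functions named generic.class, while S4 requires classes, generics and methods to be declared formally with setClass(), setGeneric() and setMethod(). Both systems run in the same R interpreter.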
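
A minimal sketch of a method in each system follows. The generic names area and area4 and the classes circle and Square are assumptions invented for this example, not part of base R.

```r
library(methods)  # S4 tools; attached by default in most R sessions

# S3: a generic plus a method named generic.class
area <- function(shape, ...) UseMethod("area")
area.circle <- function(shape, ...) pi * shape$r^2
circ <- structure(list(r = 1), class = "circle")
s3_area <- area(circ)

# S4: class, generic and method are declared formally
setClass("Square", representation(side = "numeric"))
setGeneric("area4", function(shape) standardGeneric("area4"))
setMethod("area4", "Square", function(shape) shape@side^2)
s4_area <- area4(new("Square", side = 2))
```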

The Function Body

The syntax and generalized body of any user-defined function is defined below:
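
In outline, a definition assigns the result of function(arguments) { expressions } to a name. The sketch below uses a hypothetical function called rescale, with made-up arguments, to show the pieces.

```r
# General form: name <- function(arguments) { expressions }
rescale <- function(x, center = 0, scale = 1) {
  # x has no default and must be supplied; center and scale have defaults
  shifted <- x - center     # intermediate result, local to the function
  shifted / scale           # the value of the last expression is returned
}

rescale(c(2, 4, 6))                          # uses both defaults
rescale(c(2, 4, 6), center = 4, scale = 2)   # overrides the defaults
```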

A function can have no arguments, one argument, or many. Multiple arguments must be separated by commas. To give an argument a default value, use the syntax argument=value in the argument list.

The body of the function is made up of one or more R expressions, which are legal commands. The braces { } are optional for a short, one-line expression, but are necessary when more than one command is specified. Functions can be defined within the body of other functions. Finally, it is good practice to include comments in the function body.
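
The following sketch shows both bracketing styles and a function defined inside another function. All names are illustrative.

```r
# One-line body: the braces may be omitted
square <- function(x) x^2

# Several expressions: the braces are required
sum_of_squares <- function(x) {
  # A function defined inside another function is visible only in here
  sq <- function(v) v^2
  sum(sq(x))
}

square(3)
sum_of_squares(c(1, 2, 3))
```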

The value returned by a function is the value of the last expression evaluated in the function body, which is usually an unassigned expression. Alternatively, a function may be terminated at any stage by calling the return() function. In R, return() accepts a single value; to return several results, combine them in a list object with named components, for example return(list(mean=m, sd=s)).
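
A short sketch of the return styles: an early exit with return(), an implicit return of the last expression, and several results wrapped in list(), which works in all current versions of R. The function names are hypothetical.

```r
safe_ratio <- function(a, b) {
  if (b == 0) {
    return(NA_real_)   # early exit with an explicit return()
  }
  a / b                # otherwise the last expression is the value
}

describe <- function(x) {
  # Several results are combined in one named list
  return(list(mean = mean(x), range = range(x)))
}

res <- describe(c(1, 2, 3, 6))
res$mean
res$range
```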

The on.exit() Function

The on.exit() function evaluates an expression upon exiting the function in which the on.exit() call resides. It is helpful for resetting any global parameters that may be changed within a function body. For example, it may be necessary to change global parameters within a function using options() or par(). This is very common when functions need to change session settings or create graphs. If this is the case, it is good practice to save the old settings and then to restore them on exit.

With these thoughts in mind, the basic function body expands to be:
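
One way to lay this out is sketched below, using options() rather than par() so that no graphics device is needed. The function name format_short is an assumption made for the example.

```r
format_short <- function(x, digits = 3) {
  old <- options(digits = digits)     # options() returns the previous settings
  on.exit(options(old), add = TRUE)   # restore them however the function exits
  format(x)                           # body runs with the temporary setting
}

before <- getOption("digits")
out <- format_short(pi)
after <- getOption("digits")          # same as before the call
```

The same pattern applies to graphics parameters: save the value returned by par() and pass it back to par() inside on.exit().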

Example of Simple Function Writing in R

A function is presented to illustrate R language conventions and formatting styles:
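
A minimal function in that style might look like the following sketch. The name vec_summary and its arguments are illustrative assumptions and do not come from any package.

```r
# Summarize a numeric vector: count, mean and standard deviation
vec_summary <- function(x, na.rm = TRUE) {
  # Drop missing values if requested
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  n <- length(x)
  avg <- mean(x)
  std <- sd(x)
  # The list is the last expression, so it is returned implicitly
  list(n = n, mean = avg, sd = std)
}

stats <- vec_summary(c(2, 4, NA, 6))
stats$mean
```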

The example uses the simple form of the function body (i.e. no par() or options() changes are made, so no on.exit() call is needed). It also returns a list implicitly, without calling return(), which is helpful if function output is complex.

Argument Calling Conventions

Functions may have their arguments specified or unspecified. When the arguments are unspecified, there may be an arbitrary number of them. They are shown in the argument list as three dots (i.e. …) when the function is defined. An example of a function with unspecified arguments is the concatenation function c(…, recursive=FALSE).

Formal arguments are those used in the function definition and actual arguments are those used in the function call. For example, the apply() function can be called as follows:
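
The formal arguments of apply() begin with X, MARGIN and FUN. The sketch below supplies the same actual arguments in four ways; the matrix m is made up for the example.

```r
m <- matrix(1:6, nrow = 2)

r1 <- apply(X = m, MARGIN = 1, FUN = sum)   # all arguments named
r2 <- apply(m, 1, sum)                      # all arguments positional
r3 <- apply(FUN = sum, MARGIN = 1, X = m)   # named, in a different order
r4 <- apply(m, MAR = 1, FUN = sum)          # MAR is a partial match for MARGIN
```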

Each of these function calls is equivalent. Rules for matching actual to formal arguments are listed below in precedence order:

  • First, any actual arguments in the form name=value are matched if the actual name exactly matches the name of a formal argument. In the case of formal arguments which occur after a “…” argument, exact matching is the only way they can be called;
  • Next, arguments specified in the form name=value are processed if the name is a unique partial match for a formal argument, provided the formal argument does not occur after the “…” argument;
  • Any unnamed arguments are then matched to remaining formal arguments one by one based on positional matching;
  • All remaining unmatched actual arguments become part of the “…” formal argument, if there is one; if there is none, they cause an error;
  • A formal argument left without a match takes its default value; if it has no default, an error occurs when the function tries to use it.
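
The rules above can be traced with a small sketch. The function f and its argument names are hypothetical; it simply reports what each formal argument received.

```r
# first comes before the dots, second comes after them
f <- function(first, ..., second = 0) {
  list(first = first, dots = list(...), second = second)
}

# 'fir' is a unique partial match for first, which precedes the dots;
# 2 and 3 have no formal left to match and are collected by the dots
a <- f(fir = 1, 2, 3)

# 'sec' is NOT matched to second, because second follows the dots;
# it is collected by the dots under the name sec
b <- f(1, 2, sec = 3)

# An exact name is required to reach an argument that follows the dots
d <- f(1, 2, second = 3)
```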

The calling convention confirms that a function may have some initial arguments in positional form and some in name form.  Again, the following calls are equivalent in substance, but not in syntax:
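
A sketch with apply() again, mixing positional and named arguments; the matrix m is an assumption made for the example.

```r
m <- matrix(1:6, nrow = 2)

c1 <- apply(m, 2, FUN = max)            # two positional, one named
c2 <- apply(m, FUN = max, MARGIN = 2)   # one positional, two named
c3 <- apply(FUN = max, m, 2)            # named first; m and 2 then fill X and MARGIN
```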

It is good coding practice to supply arguments in the order given by the formal function definition, even when they are named. The argument names, positions and default values for any R function can be found using help(), by printing the function, or by using the args() function.
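
For instance, the argument list of a function can be inspected as sketched below. The function show_args is hypothetical; formals() is a base R function that returns the same information as a list.

```r
show_args <- function(x, scale = 1, ...) x * scale   # hypothetical function

args(show_args)                 # prints the argument list
arg_names <- names(formals(show_args))
default_scale <- formals(show_args)$scale
```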

The Special Argument “…”

The three-dots argument is special since any number of arguments may be matched to it on a function call. The three-dots argument is also the only way to define a function with variable argument input. Finally, the three-dots argument serves to pass arguments on to other functions, including generic functions. For example, the “…” argument supports passing arguments to the par() function if that function is called in the body of the function.
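
A minimal sketch of passing arguments through the three dots, using mean() instead of par() so that no graphics device is opened. The wrapper name mean_of is an assumption.

```r
mean_of <- function(x, ...) {
  n_extra <- length(list(...))    # how many arguments landed in the dots
  # Everything in the dots is handed on to mean() unchanged
  list(value = mean(x, ...), n_extra = n_extra)
}

r <- mean_of(c(1, NA, 3), na.rm = TRUE)   # na.rm travels through the dots
r$value
```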

Printing R Functions

You can print and view any object in the R Console by typing the name of the object.  Since functions are objects, you can view the source code for any function (either one you wrote or one that ships with base R).  For example:
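
A short sketch, using a made-up function named cube alongside the sd() function from the stats package that ships with R:

```r
cube <- function(x) {
  # Raise x to the third power
  x^3
}

cube          # the name without parentheses prints the definition
print(cube)   # same result, written out explicitly
sd            # source of a function that ships with R
```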

This is handy if, for instance, you’re using an old function that you wrote but have forgotten what its arguments are. It’s also useful if you want to explore what a function from an R package actually does. Finally, one of the best ways to learn R programming is to review someone else’s source code.
