Statement of Purpose
This documentation on the R programming language aims to give a comprehensive answer to the question “What is R?” The approach is intended to be accessible to new users, while its reliance on practical examples is meant to provide an applied, long-term reference for seasoned users.
What is R?
R is an open-source implementation of the S programming language, which was developed at Bell Labs “to improve data manipulation, analysis, and visualization.”
Development of S started in 1976 at Bell Labs, the same organization responsible for the transistor, UNIX, and C. In 1993, Bell Labs granted Statistical Sciences, Inc. (StatSci) an exclusive license to develop and sell the S language. The commercial version of S has since been distributed by StatSci’s successor, Insightful Corp., under the product name S-PLUS.
R follows a development path separate and distinct from S. Sometimes described as “GNU S,” R is an open-source platform initially written in the 1990s by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. Since 1997, the open-source project has been managed by an international “R Core” team, and the language has attracted a substantial user base. R’s user base is now significantly larger than that of S/S-PLUS. The language also enjoys strong development momentum, as evidenced by the large number of extension packages available: over 5,000 were released between 2006 and 2013.
R is an interpreted language, not a compiled language. As a result, R code depends on the R interpreter for machine interface and data handling. Reliance on the interpreter simplifies data handling, object-oriented programming, and memory management. Most user-visible functions are written in R itself, while primitive functions are written in C and Fortran. Users can also interface R with C, C++, and Fortran, and can write additional primitives.
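The split between R-level functions and compiled primitives can be inspected from the interpreter itself. The following is a minimal sketch using only base R; `sum` and `sd` are chosen purely as illustrations, and the variable names are assumptions introduced here.

```r
# sum() is a primitive: its implementation lives in compiled code.
sum_is_primitive <- is.primitive(sum)

# sd() is an ordinary R function (a closure) whose source is written in R.
sd_is_primitive <- is.primitive(sd)

print(sum_is_primitive)  # TRUE
print(sd_is_primitive)   # FALSE

# Because R is interpreted, the body of an R-level function can be
# printed and read directly at the prompt.
print(body(sd))
```

Printing the body of a function in this way is often a quick first step when trying to understand what a base or package function does.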
R is easy to learn and easy to implement, and it keeps the focus on data science and large-scale data analytics without the burden of writing code for memory or machine management. R is also attractive for very large scientific and operational data sets. To this end, R relies on a wide array of standard data object structures that can easily support diverse data sets and formats. R’s data object system is also extensible and lends itself to customization. Equally important, the R package management system extends R’s analytical and functional capabilities by providing access to over 10,000 open-source packages. Finally, the R community is extensive and a great source of innovation and help.
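As an illustration of the standard data object structures and of how the object system can be extended, the sketch below builds a vector, a list, and a data frame, then attaches a custom S3 class. The data values and the `station` class are hypothetical and exist only for this example.

```r
# Atomic vector: every element has the same type.
temps <- c(21.5, 19.0, 23.2)

# List: elements may differ in type and length.
site <- list(name = "A", temps = temps, active = TRUE)

# Data frame: a table whose columns are equal-length vectors.
readings <- data.frame(
  day  = c("Mon", "Tue", "Wed"),
  temp = temps
)

# Extending the object system: tag the list with a hypothetical S3 class
# and give that class its own print method.
station <- structure(site, class = "station")
print.station <- function(x, ...) {
  cat("Station", x$name, "with mean temperature", mean(x$temps), "\n")
  invisible(x)
}

print(station)                     # dispatches to print.station()
mean_temp <- mean(readings$temp)   # column access on the data frame
```

No memory allocation or type declarations appear anywhere in the sketch; the interpreter handles both.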
There is great practical value in open-source R. First, open-source R is free. R has also been widely supported for many years without notable performance, credit, or reputation risk, and is now deployed in many companies. More important, the source code (both the base system and all package extensions) is readily available with standardized documentation. Finally, R benefits from vendor support and is one of the most actively discussed programming languages in online forums.
Base R, a core set of packages, and package extensions can be obtained from the Comprehensive R Archive Network (CRAN).
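Obtaining a package extension from CRAN is typically done from within R itself. The sketch below is one possible pattern rather than a prescribed workflow: `ensure_package` is a hypothetical helper defined here, and the mirror URL is an assumption that can be replaced with any CRAN mirror.

```r
# Hypothetical helper: install a package from CRAN only if it is missing.
# The default mirror URL is an assumption; any CRAN mirror can be used.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # Report whether the package can now be loaded.
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with base R, so this call needs no download.
have_stats <- ensure_package("stats")
```

For an extension package, the same call with that package's name would download and install it on first use; `library()` then attaches it to the session.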