Category Archives: Faster R
Best subset regression is a technique for model building and variable selection. The method examines every combination of independent predictor variables for use in a multiple regression model. Model developers and analysts often struggle with variable selection, especially when the number of predictors is high. Ideally, a model is fit to each set of predictors and the best set is selected using a criterion for model performance. The following article provides custom functions for best subset selection that are fast and easy to use.
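To make the idea concrete, here is a minimal sketch of an exhaustive search in base R. It is not the article's custom functions: the helper name best_subset(), its arguments, and the choice of BIC as the performance criterion are all assumptions made for illustration, and the built-in mtcars data stands in for a real data set.

```r
# Hypothetical helper (not from the article): exhaustive best subset search.
# Fits lm() on every non-empty combination of predictors and keeps the model
# with the lowest value of the chosen criterion (lower is better for AIC/BIC).
best_subset <- function(data, response, criterion = AIC) {
  predictors <- setdiff(names(data), response)
  best <- list(score = Inf, vars = character(0), fit = NULL)
  for (k in seq_along(predictors)) {
    # All subsets of size k, returned as a list of character vectors
    combos <- combn(predictors, k, simplify = FALSE)
    for (vars in combos) {
      fit <- lm(reformulate(vars, response = response), data = data)
      score <- criterion(fit)
      if (score < best$score) {
        best <- list(score = score, vars = vars, fit = fit)
      }
    }
  }
  best
}

# Example with four candidate predictors (2^4 - 1 = 15 models are fit)
cars <- mtcars[, c("mpg", "wt", "hp", "qsec", "drat")]
result <- best_subset(cars, response = "mpg", criterion = BIC)
print(result$vars)
print(result$score)
```

The number of candidate models doubles with every added predictor, so a plain loop like this is only practical for a modest number of variables.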
The standard way to read text files into R is to use the read.table() command. However, many users struggle with time delays when loading large data sets. An alternative command that offers significant speed improvements is fread(), or fast read, which can be found in the data.table package. The following code loads a tab-delimited file with a million elements and reveals that fread() reduces load time by almost 99%, as confirmed by the benchmark performance stats at left. The function is still under development, but it is available for download and doesn’t suffer from stability issues. Its argument structure and command syntax, however, can be expected to change over time.
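A comparison along these lines can be sketched as follows. This is not the post's original benchmark: the simulated data, the column names, and the file size (100,000 rows, kept small so the sketch runs quickly) are assumptions, and the timings will vary by machine and data.table version.

```r
library(data.table)  # provides fread()

# Simulated data written to a temporary tab-delimited file (assumed layout)
set.seed(1)
n <- 1e5
df <- data.frame(id = seq_len(n),
                 x  = rnorm(n),
                 g  = sample(letters, n, replace = TRUE))
path <- tempfile(fileext = ".txt")
write.table(df, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Time both readers on the same file
t_base  <- system.time(base_df <- read.table(path, header = TRUE, sep = "\t"))
t_fread <- system.time(dt <- fread(path, sep = "\t"))

# Elapsed seconds for each reader
print(c(read.table = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))

unlink(path)  # remove the temporary file
```

Note that fread() returns a data.table rather than a plain data frame, so downstream code that relies on data frame indexing may need as.data.frame(dt).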