Category Archives: R Programming

Data Sorting in R

Data sorting in R is simple and straightforward.  Key functions include sort() and order().  The variable you sort by can be numeric, a string or a factor.  Arguments also provide flexibility in how missing values are handled:  they can be listed first, listed last or removed.

Data Sorting Examples

It is also possible to sort in reverse order by placing a minus sign (-) in front of a numeric sort variable.  For example:
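The original example is not reproduced in this excerpt, so the following is a minimal sketch rather than the post's own code. The data frame `df` and its columns are hypothetical. It shows sort() on a vector, order() on a data frame, the minus sign for reverse numeric order, and the `na.last` argument for missing values.

```r
# Hypothetical example data; NA marks a missing score
df <- data.frame(
  name  = c("Ann", "Bob", "Cy", "Dee"),
  score = c(82, NA, 95, 67),
  stringsAsFactors = FALSE
)

# sort() works on a vector; missing values are removed by default
sorted_scores <- sort(df$score)

# decreasing = TRUE reverses the order; na.last = TRUE lists NAs last
sorted_desc <- sort(df$score, decreasing = TRUE, na.last = TRUE)

# order() returns row positions, so it can sort a whole data frame
ascending  <- df[order(df$score), ]                   # NAs last by default
descending <- df[order(-df$score), ]                  # minus sign reverses a numeric key
na_first   <- df[order(df$score, na.last = FALSE), ]  # NAs listed first

# The minus sign does not apply to character keys; use decreasing = TRUE instead
by_name_desc <- df[order(df$name, decreasing = TRUE), ]
```

The minus sign only works for numeric keys, which is why the last line falls back on `decreasing = TRUE` for the character column.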

Posted in R Data Objects, R Programming

Continuous Futures Prices in R

Introduction

Crude oil prices by delivery period define the term structure of the market. The term structure changes shape over time as price level and slope shift.  Term structure behavior becomes clear when discrete futures contracts with similar maturities are combined into a continuous time series.  R code is supplied to create continuous prices by delivery period. The purpose is to show term structure behavior and to derive risk and profitability measures for oil production, marketing and trading strategies. The resulting data is tidy and well suited for model training and out-of-sample testing.
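The post's own code is not included in this excerpt. As a rough sketch of the general idea, the base R snippet below ranks the contracts still trading on each date by expiry, so that rank 1 is the nearby contract, rank 2 the second nearby, and so on. The data frame `prices`, its column names and all values are hypothetical, and the simple roll-at-expiry rule with no price adjustment is an assumption that may differ from the author's method.

```r
# Hypothetical settlement prices: three contracts observed on four dates
prices <- data.frame(
  date   = as.Date(rep(c("2015-01-15", "2015-01-16",
                         "2015-02-17", "2015-02-18"), each = 3)),
  expiry = as.Date(rep(c("2015-01-20", "2015-02-20", "2015-03-20"), times = 4)),
  price  = c(46.2, 46.9, 47.5,
             48.7, 49.3, 49.9,
             NA,   53.5, 54.1,
             NA,   52.1, 52.8)
)

# Keep only contracts that have a price and have not yet expired
live <- prices[!is.na(prices$price) & prices$expiry >= prices$date, ]

# Rank contracts by expiry within each date: 1 = nearby, 2 = second nearby, ...
live$nearby <- ave(as.numeric(live$expiry), format(live$date),
                   FUN = function(x) rank(x, ties.method = "first"))

# One row per date, one column per delivery period (price.1, price.2, ...)
wide <- reshape(live[, c("date", "nearby", "price")],
                idvar = "date", timevar = "nearby", direction = "wide")
wide <- wide[order(wide$date), ]
```

Each `price.k` column is then a continuous series for the k-th nearby delivery period, with `NA` where fewer than k contracts are trading.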

Posted in Data, R Programming

Plotting Forecast Data Objects Using ggplot

Robert Hyndman is the author of the forecast package in R. I’ve been using the package for long-term time series forecasts. The package comes with some built-in methods for plotting forecast data objects in R that I’ve wanted to customize for improved clarity and presentation.  The following article achieves that goal and shares two scripts for plotting forecast data objects using ggplot.
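The two scripts themselves are not part of this excerpt. The sketch below illustrates one possible approach and is not the author's code: it copies the `x`, `mean`, `lower` and `upper` components of a forecast object into plain data frames and draws them with ggplot2. The AirPassengers series, the ets() model, the 24-month horizon and the styling are all arbitrary choices for illustration, and the forecast and ggplot2 packages are assumed to be installed.

```r
library(forecast)
library(ggplot2)

# Fit an exponential smoothing model to a built-in series and forecast 24 months
fit <- ets(AirPassengers)
fc  <- forecast(fit, h = 24, level = 95)

# Historical observations
hist_df <- data.frame(
  time  = as.numeric(time(fc$x)),
  value = as.numeric(fc$x)
)

# Point forecasts with the 95% prediction interval
fc_df <- data.frame(
  time  = as.numeric(time(fc$mean)),
  value = as.numeric(fc$mean),
  lower = as.numeric(fc$lower[, 1]),
  upper = as.numeric(fc$upper[, 1])
)

p <- ggplot() +
  geom_line(data = hist_df, aes(x = time, y = value)) +
  geom_ribbon(data = fc_df, aes(x = time, ymin = lower, ymax = upper),
              alpha = 0.2) +
  geom_line(data = fc_df, aes(x = time, y = value), colour = "blue") +
  labs(x = "Year", y = "Passengers (thousands)",
       title = "Forecast with 95% prediction interval")
print(p)
```

Once the forecast is in a data frame, every element of the plot can be restyled with ordinary ggplot2 layers and themes.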

Posted in Data Science, ggplot2, Modeling, R Programming

From Least Squares to k-Nearest Neighbor (kNN)

The linear model is one of the most widely used and most important data science tools.  Another basic tool, by contrast, is the k-nearest neighbor method (kNN).  Both models are used for prediction and classification.  In practice, classification results (i.e., feature classes) are used by machines in many ways: to recognize faces in a crowd, to “read” road signs by distinguishing one letter from another and to set voter registration districts by separating population groups.  This article applies and compares linear and non-linear classification methods.
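As a minimal sketch of the comparison, assuming simulated two-class data rather than the article's data set, the snippet below classifies the same points with a least squares fit (thresholding the fitted value at 0.5) and with kNN from the class package. The sample size, class means and `k = 15` are arbitrary choices for illustration.

```r
library(class)  # provides knn(); a recommended package in standard R installs

set.seed(1)

# Hypothetical two-class data: two Gaussian clouds with different means
n <- 100
x <- rbind(matrix(rnorm(2 * n, mean = 0), ncol = 2),
           matrix(rnorm(2 * n, mean = 2), ncol = 2))
train <- data.frame(x1 = x[, 1], x2 = x[, 2], y = rep(c(0, 1), each = n))

# Least squares: regress the 0/1 label on the features,
# then classify as 1 when the fitted value exceeds 0.5
fit     <- lm(y ~ x1 + x2, data = train)
pred_lm <- as.integer(fitted(fit) > 0.5)

# kNN: majority vote among the k nearest training points
pred_knn <- knn(train = train[, c("x1", "x2")],
                test  = train[, c("x1", "x2")],
                cl    = factor(train$y), k = 15)

# Training-set accuracy of each classifier
acc_lm  <- mean(pred_lm == train$y)
acc_knn <- mean(as.integer(as.character(pred_knn)) == train$y)
```

The least squares rule draws a straight decision boundary, while kNN adapts to local structure. Both accuracies here are measured on the training data, so a fair comparison would use held-out points.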

Posted in Data Science, Modeling, R Programming, Website

Extract Data Tables from PDF Files in R

A new method to extract data tables from PDF files is introduced. The solution combines the R programming language with the open-source Java program Tabula. The result is a convenient method that transforms documents into databases.

Benefits
The ability to train a machine to extract data tables from PDF files has several benefits:

Posted in Data, Misc Tricks, R Data Import, R Programming

Popularity of R Programming Language

The popularity of R is rapidly increasing, and R is well on its way to being a top 10 programming language.  The TIOBE index is a standard indicator of the popularity of programming languages.  It confirms that a subset of languages, those for computational statistics and data analysis, is gaining increased attention. The clear winner of the pack is the open source programming language R.

Posted in Data Science, R Programming

R Source Code

Source code access is one of the great benefits of R.  Source code is available for base R and over 5,000 open source packages.  There are many reasons to view source code: to know what software does when documentation is vague or incomplete; to combine code objects in custom scripts or libraries; and to change source code as needed.  The following post defines the different types of R source code available and how to access R sources.
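The full post is not included here, so the following is only a short sketch of a few base R ways to inspect source code from the console. The functions chosen (`sd`, `print.lm`, `sum`) are arbitrary examples.

```r
# Typing a function name without parentheses prints its R source
sd

# The same source is available as text for searching or reuse
sd_src <- deparse(sd)

# S3 generics only show a call to UseMethod(); list the methods,
# then fetch a specific one even if its package does not export it
lm_methods <- methods(class = "lm")
print_lm   <- getS3method("print", "lm")

# Primitive functions are implemented in C, so no R source is shown;
# their code lives in the R source distribution instead
sum
is.primitive(sum)
```

For functions that are not visible from the search path, `getAnywhere()` and the `:::` operator offer similar access.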

Posted in R Packages, R Programming

Fast File Reads (fread) for Large Data

The standard way to read text files into R is to use the read.table() command.  However, many users struggle with time delays when loading large data sets.  An alternative command that offers significant speed improvements is fread(), or fast read, which can be found in the data.table package.  The following code loads a tab-delimited file with a million elements and reveals that fread() reduces load time by almost 99%, as confirmed by the accompanying benchmark performance statistics.  The function is still under development, but it is available for download and does not suffer from stability issues.  Expect its argument structure and command syntax to change over time, however.
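The benchmark code from the post is not reproduced in this excerpt. The sketch below sets up a comparable test under stated assumptions: it writes a hypothetical tab-delimited file of 100,000 rows by 10 columns (one million values) to a temporary path and times both readers with system.time(). Timings depend on the machine and package version, so no particular speed-up is claimed here.

```r
library(data.table)  # provides fread()

# Write a hypothetical tab-delimited file: 100,000 rows x 10 columns
set.seed(42)
tmp <- tempfile(fileext = ".txt")
dat <- as.data.frame(matrix(rnorm(1e6), ncol = 10))
write.table(dat, tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# Time the base reader and fread() on the same file
t_base  <- system.time(df1 <- read.table(tmp, header = TRUE, sep = "\t"))
t_fread <- system.time(df2 <- fread(tmp, sep = "\t", header = TRUE))

# Elapsed seconds for each reader
elapsed <- c(read.table = t_base[["elapsed"]], fread = t_fread[["elapsed"]])
print(elapsed)

unlink(tmp)  # remove the temporary file
```

Note that fread() returns a data.table rather than a plain data frame, which matters if downstream code relies on data frame indexing semantics.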

Posted in Faster R, R Data Import, R Packages, R Programming

Custom Beamer Template

Beamer is a LaTeX document class that is by far the most practical tool for making presentations involving data science, business analytics, or general research. It is widely used at conferences and lends itself easily to data-intensive reporting and repetitive batch processing.

A custom beamer template is presented that is easy to extend or modify.  The benefits of the beamer document are numerous:

Posted in LaTeX, Misc Tricks, R Programming

Correlation Plots in R

The standard function for correlation plots in R is pairs(), which generates a matrix of scatter plots based on all pairwise combinations of variables in a data object.  The standard graph, after a little color enhancement, is shown in the accompanying figure (plot13).

The code behind this plot is simple:
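The post's code is cut off in this excerpt. A minimal sketch of a colored pairs() plot, using the built-in iris data as a stand-in for the author's data set and an arbitrary palette, might look like this:

```r
# Color each point by species; the palette is an arbitrary choice
palette_cols <- c("steelblue", "darkorange", "forestgreen")
point_cols   <- palette_cols[as.integer(iris$Species)]

# Matrix of scatter plots for all pairwise combinations of the numeric columns
pairs(iris[, 1:4], col = point_cols, pch = 19,
      main = "Pairwise scatter plots of the iris measurements")

# The matching numeric correlations, for reference
cor_mat <- round(cor(iris[, 1:4]), 2)
print(cor_mat)
```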

Posted in Data Science, ggplot2, R Graphics, R Programming

R Syntax Highlighter

Pretty R is an online R syntax highlighter that transforms R source code into HTML code for website development.  The result is easy-to-read R code for high-quality web presentations. The Pretty R webpage is also a good learning tool, as it provides the HTML code details required to deliver syntax highlighting that complies with R documentation from inside-r.org.

Posted in Misc Tricks, R Programming