Author Archives: Brad Horn

Learning R: The Fast Way

I read this on the Internet today:

“When I started learning R I started with base R and it was very time consuming to do simple data manipulation and plots. After I learned ‘Tidyverse’ it was so much easier and a lot less frustrating. This bottom up learning process (starting with base R) made me want to give up learning R many times. Do you think newbies should learn some easy to use packages first, like the Tidyverse universe, and then learn base R to understand what is going on?”

Posted in R Basics

Plotting Forecast Data Objects Using ggplot

Rob Hyndman is the author of the forecast package in R. I’ve been using the package for long-term time series forecasts. The package comes with built-in methods for plotting forecast data objects, which I’ve wanted to customize for improved clarity and presentation. The following article achieves that goal and shares two scripts for plotting forecast data objects using ggplot.

Posted in Data Science, ggplot2, Modeling, R Programming

From Least Squares to k-Nearest Neighbor (kNN)

The linear model is one of the most widely used and most important data science tools. A contrasting basic tool is the k-nearest neighbor (kNN) method. Both models can be used for prediction and for classification. In practice, machines use classification results (i.e., feature classes) in many ways: to recognize faces in a crowd, to “read” road signs by distinguishing one letter from another, and to set voter registration districts by separating population groups. This article applies and compares linear and non-linear classification methods.
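
The post itself works in R. As a language-neutral illustration only, here is a minimal Python/NumPy sketch of both classifiers on a hypothetical two-class toy data set: least squares fitted to 0/1 labels with a 0.5 cut-off, and kNN with Euclidean distance and a vote among the k nearest points. The function names and data are illustrative assumptions, not code from the article.

```python
import numpy as np


def least_squares_classify(X_train, y_train, X_new):
    """Regress 0/1 labels on X by least squares; predict class 1 where the fit exceeds 0.5."""
    A = np.column_stack([np.ones(len(X_train)), X_train])  # intercept column + features
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    return (A_new @ beta > 0.5).astype(int)


def knn_classify(X_train, y_train, X_new, k=3):
    """Predict class 1 where the mean label of the k nearest training points exceeds 0.5."""
    preds = []
    for x in X_new:
        dist = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each training point
        nearest = np.argsort(dist)[:k]              # indices of the k closest points
        preds.append(int(y_train[nearest].mean() > 0.5))
    return np.array(preds)


# Hypothetical two-class toy data in two dimensions
X = np.array([[0.0, 0.0], [0.5, 0.2], [0.2, 0.6], [3.0, 3.0], [3.2, 2.5], [2.6, 3.3]])
y = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[0.3, 0.3], [2.9, 2.8]])

print(least_squares_classify(X, y, X_new))  # → [0 1]
print(knn_classify(X, y, X_new, k=3))       # → [0 1]
```

With an odd k and two classes, the mean-label rule is a majority vote. The least squares rule draws one straight decision boundary, while kNN adapts to the local neighborhood of each point, which is the linear versus non-linear contrast the article explores.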

Posted in Data Science, Modeling, R Programming, Website

Predicting Technology Progress and Solar Growth

Technology progress is key to solar growth and pricing. By extension, the ability to model technology progress is essential to understanding future energy supply and demand.

Solar innovation is widespread. Examples include solar cell efficiency, module manufacturing, and learning-driven innovations in solar system installation and operation. Solar pricing and growth are also supported by innovations in enabling technologies, such as battery storage, smart grids and electric vehicles.

Posted in Economics, Modeling

R Functions for Best Subset Regression

Best subset regression is a technique for model building and variable selection. The method examines all combinations of independent predictor variables for use in a multiple regression model. Model developers and analysts often struggle with variable selection, especially when the number of predictors is high. Ideally, a model is fitted for every subset of predictors, and the best subset is selected using a criterion for model performance. The following article provides custom functions for best subset selection that are fast and easy to use.
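
A minimal sketch of exhaustive best subset search in Python/NumPy, not the article's R functions: every non-empty subset of predictors gets an ordinary least squares fit with an intercept, and subsets are ranked by AIC. The choice of AIC as the performance criterion, the function name `best_subset`, and the simulated data are all assumptions for illustration.

```python
from itertools import combinations

import numpy as np


def best_subset(X, y):
    """Fit OLS (with intercept) on every non-empty subset of the columns of X.

    Returns the best subset (a tuple of column indices) ranked by AIC,
    plus a dict mapping every subset to its AIC score.
    """
    n, p = X.shape
    scores = {}
    for size in range(1, p + 1):
        for cols in combinations(range(p), size):
            A = np.column_stack([np.ones(n), X[:, list(cols)]])  # intercept + chosen predictors
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            k = A.shape[1]  # number of fitted coefficients
            # Gaussian AIC, dropping constants shared by all candidate models
            scores[cols] = n * np.log(rss / n) + 2 * k
    best = min(scores, key=scores.get)
    return best, scores


# Simulated data (hypothetical): y depends on columns 0 and 2 only
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.1, size=50)

best, scores = best_subset(X, y)
print("best subset:", best)  # should include columns 0 and 2
for cols, aic in sorted(scores.items(), key=lambda kv: kv[1]):
    print(cols, round(aic, 1))
```

For p predictors the search fits 2^p - 1 models, so the cost doubles with every added predictor; that growth is why fast implementations matter when the number of predictors is high.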

Posted in Data Science, Faster R, Modeling

Extract Data Tables from PDF Files in R

A new method to extract data tables from PDF files is introduced. The solution combines the R programming language with the open-source Java program Tabula. The result is a convenient method that transforms documents into databases.

Benefits
The ability to train a machine to extract data tables from PDF files has several benefits:

Posted in Data, Misc Tricks, R Data Import, R Programming

SpatialPoints in R: Large Data Case Study

A common task in spatial data analysis is extracting SpatialPoints inside a set of polygons or buffer zones. Analysts can use standard GIS or map tools to extract a set of points within an area of interest using manual “point-and-click” routines. This method is easy but is likely to prove impractical, especially with big data. The alternative is to train a machine to extract the points in a polygon or buffer zone automatically. This post achieves that task and presents a case study with R code.
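
To illustrate what automating this involves, here is a minimal pure-Python sketch of the classic ray-casting point-in-polygon test, plus a circular buffer test. It is not the R code from the case study; the function names, planar coordinates, and sample points are hypothetical. A real large-data workflow would typically add a spatial index or a bounding-box prefilter rather than testing every point against every polygon edge.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: a point is inside if a horizontal ray from it crosses an odd number of edges."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_in_polygon(points, polygon):
    """Keep the points that fall inside the polygon."""
    return [p for p in points if point_in_polygon(p[0], p[1], polygon)]


def points_in_buffer(points, center, radius):
    """Keep the points within a circular buffer (planar distance) around center."""
    cx, cy = center
    return [p for p in points if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= radius ** 2]


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
pts = [(1, 1), (5, 5), (2, 3), (-1, 2)]
print(points_in_polygon(pts, square))       # → [(1, 1), (2, 3)]
print(points_in_buffer(pts, (2, 2), 1.5))   # → [(1, 1), (2, 3)]
```

The parity rule also handles concave polygons, which is why ray casting is a common choice for this task.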

Posted in Data, GDAL, Spatial Analysis

Popularity of R Programming Language

The popularity of R is rapidly increasing, and R is well on its way to becoming a top-10 programming language. The TIOBE index is a standard indicator of the popularity of programming languages. The TIOBE index confirms that a subset of languages, those for computational statistics and data analysis, is gaining increased attention. The clear leader of the pack is the open-source programming language R.

Posted in Data Science, R Programming

Cost Breakdown Structures for Solar PV Projects

Modules, inverters and balance of system costs define the total installed cost of a solar PV system.

These three cost components are simple in concept. In practice, total cost is defined using a detailed cost breakdown structure, which must be applied consistently across projects and over time. The result can be improved cost modeling and management.
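
A cost breakdown structure can be sketched as a fixed template that maps the three top-level components to line items. The minimal Python sketch below rolls line-item costs up into components and rejects any project that does not follow the template, which is one way to enforce consistency across projects. The line-item names and the per-watt figures are hypothetical placeholders, not data from the article.

```python
# Hypothetical CBS template: top-level component -> line items (costs in USD per watt)
CBS_TEMPLATE = {
    "modules": ["modules"],
    "inverters": ["inverters"],
    "balance_of_system": ["racking", "electrical", "labor", "permitting"],
}


def rollup(project_costs):
    """Sum line items into the top-level components, enforcing the shared template."""
    expected = {item for items in CBS_TEMPLATE.values() for item in items}
    unknown = set(project_costs) - expected
    if unknown:
        raise ValueError(f"line items not in the template: {sorted(unknown)}")
    totals = {}
    for component, items in CBS_TEMPLATE.items():
        missing = [item for item in items if item not in project_costs]
        if missing:
            raise ValueError(f"missing line items for {component}: {missing}")
        totals[component] = sum(project_costs[item] for item in items)
    totals["total_installed"] = sum(totals[c] for c in CBS_TEMPLATE)
    return totals


# Placeholder numbers for one hypothetical project
project_a = {"modules": 0.40, "inverters": 0.10, "racking": 0.12,
             "electrical": 0.08, "labor": 0.25, "permitting": 0.05}
print(rollup(project_a))
```

Because every project must supply exactly the same line items, component totals and shares stay comparable across projects and over time.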

Posted in Economics, Engineering

Glare from Solar PV Modules

A common question concerning the safety of photovoltaic (PV) power systems is the impact of reflected sunlight. PV modules have the potential to affect neighboring structures or activities, notably aviation. It is important to know where the reflected light will go and how intense it will be at any point in time.
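
Where the reflected light goes is a question of specular reflection geometry: an incoming direction d reflects off a surface with unit normal n as r = d - 2(d·n)n. Here is a minimal Python sketch of that calculation for a tilted module. It assumes (east, north, up) coordinates, compass azimuths with 180° meaning south, and a perfectly mirror-like module surface; it does not model intensity, and the sun and module angles in the example are hypothetical.

```python
import math


def module_normal(tilt_deg, azimuth_deg):
    """Unit normal of a tilted module in (east, north, up) coordinates.

    azimuth_deg is the compass direction the module faces (180 = south).
    """
    t, a = math.radians(tilt_deg), math.radians(azimuth_deg)
    return (math.sin(t) * math.sin(a), math.sin(t) * math.cos(a), math.cos(t))


def sun_ray(elevation_deg, azimuth_deg):
    """Direction of travel of sunlight (from the sun toward the ground)."""
    e, a = math.radians(elevation_deg), math.radians(azimuth_deg)
    toward_sun = (math.cos(e) * math.sin(a), math.cos(e) * math.cos(a), math.sin(e))
    return tuple(-c for c in toward_sun)


def reflect(d, n):
    """Specular reflection of direction d off a surface with unit normal n: r = d - 2(d.n)n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    return tuple(di - 2 * dot * ni for di, ni in zip(d, n))


# Hypothetical case: sun at 40 deg elevation, azimuth 200; module tilted 25 deg facing south
r = reflect(sun_ray(40, 200), module_normal(25, 180))
elevation = math.degrees(math.asin(max(-1.0, min(1.0, r[2]))))  # clamp guards rounding
direction = "skyward" if r[2] > 0 else "toward the ground"
print(f"reflected ray heads {direction} at {elevation:.1f} deg elevation")
```

A positive vertical component means the glare heads into the sky, which is the case relevant to aviation; repeating the calculation over a day's sun positions traces where the reflection can land.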

Posted in Engineering, Projects

Aerosol Animation

Aerosol Optical Depth (AOD) measures the degree to which aerosols prevent the transmission of sunlight by absorption or scattering. AOD is the extinction coefficient integrated over a vertical column of air. The extinction coefficient can be used to analyze solar extinction and the performance of solar power systems as a function of location and time.
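
As a worked illustration of an integrated extinction coefficient, the sketch below computes AOD by trapezoidal integration of an extinction profile, then applies the Beer-Lambert law, T = exp(-τ·m) with plane-parallel air mass m = 1/cos(zenith), to estimate direct-beam transmittance. The Beer-Lambert step, the exponential profile, and all numbers are assumptions added for illustration, not the article's GOCART-based workflow; the simple air-mass formula is only reasonable well away from the horizon.

```python
import math


def aerosol_optical_depth(z_km, sigma_per_km):
    """Trapezoidal integral of the extinction coefficient sigma(z) over the vertical column."""
    tau = 0.0
    for i in range(1, len(z_km)):
        dz = z_km[i] - z_km[i - 1]
        tau += 0.5 * (sigma_per_km[i] + sigma_per_km[i - 1]) * dz
    return tau


def direct_beam_transmittance(tau, solar_zenith_deg):
    """Beer-Lambert attenuation along a slant path with plane-parallel air mass 1/cos(zenith)."""
    air_mass = 1.0 / math.cos(math.radians(solar_zenith_deg))
    return math.exp(-tau * air_mass)


# Hypothetical profile: sigma(z) = 0.1 * exp(-z / 1.5) per km, sampled every 10 m up to 10 km
z = [i * 0.01 for i in range(1001)]
sigma = [0.1 * math.exp(-zi / 1.5) for zi in z]

tau = aerosol_optical_depth(z, sigma)
print("AOD:", round(tau, 4))
print("direct-beam transmittance at 60 deg zenith:", round(direct_beam_transmittance(tau, 60.0), 4))
```

For this exponential profile the integral also has the closed form 0.1 × 1.5 × (1 - exp(-10/1.5)), which is a handy check on the numerical result.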

GOCART

Posted in Animation, Data, Modeling, R Data Import, R Graphics, Spatial Analysis

Crop Raster Images in R


The maptools package has a pruneMap() function to crop map objects in R. In practice, the function extracts data from SpatialPolygon or SpatialLine objects given a bounding box or specific area of interest. Unfortunately, there is no equivalent function for the high-resolution, large raster images that are common in many Earth Science applications. The following post defines a custom function to crop raster images in R and to extract data from SpatialGridDataFrames. The function is tested using a raster image from the Shuttle Radar Topography Mission (SRTM; shown at left). The resulting data is then mapped using the image() function in R.
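
The core of any raster crop is converting a bounding box from map coordinates to row and column indices. This minimal Python/NumPy sketch (not the article's R function) shows that arithmetic for a north-up grid with square cells; the function name `crop_raster`, its parameters, and the toy grid are hypothetical.

```python
import numpy as np


def crop_raster(grid, x_min, y_max, cell_size, bbox):
    """Crop a north-up raster to a bounding box.

    grid      : 2-D array whose row 0 is the northern edge
    x_min     : x-coordinate of the grid's western edge
    y_max     : y-coordinate of the grid's northern edge
    cell_size : width and height of one cell, in coordinate units
    bbox      : (xmin, xmax, ymin, ymax) area of interest
    Returns the cropped array and its new (x_min, y_max) origin.
    """
    bx_min, bx_max, by_min, by_max = bbox
    n_rows, n_cols = grid.shape
    # Map coordinates -> cell indices, clamped to the grid extent
    col0 = max(0, int(np.floor((bx_min - x_min) / cell_size)))
    col1 = min(n_cols, int(np.ceil((bx_max - x_min) / cell_size)))
    row0 = max(0, int(np.floor((y_max - by_max) / cell_size)))  # rows count down from the north
    row1 = min(n_rows, int(np.ceil((y_max - by_min) / cell_size)))
    cropped = grid[row0:row1, col0:col1]
    return cropped, (x_min + col0 * cell_size, y_max - row0 * cell_size)


# Toy 10 x 10 grid with 0.5-unit cells whose north-west corner is at (20, 40)
grid = np.arange(100).reshape(10, 10)
cropped, origin = crop_raster(grid, 20.0, 40.0, 0.5, (21.0, 22.0, 37.0, 38.5))
print(cropped)
print(origin)  # → (21.0, 38.5)
```

Working with indices rather than per-cell coordinate tests keeps the crop a single array slice, which matters for large images.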

Posted in GDAL, R Data Import, R Graphics, Spatial Analysis

Renewable Energy and the Carbon Majors

The business case for renewable energy and CO2 reduction is changing as carbon tracking and court cases shift focus from carbon emitters to the ultimate producers of coal, oil and natural gas … to the “carbon majors.”

A 2013 study published in the journal Climatic Change names 90 companies responsible for 63% of human-induced greenhouse gas emissions since the beginning of the industrial revolution. National industrial plans (NIPs) and energy companies from the US and Europe dominate the list, which includes familiar names like ChevronTexaco, ExxonMobil, Saudi Aramco, British Coal, and RWE.

Posted in Website

R Source Code

Source code access is one of the great benefits of R. Source code is available for base R and over 5,000 open-source packages. There are many reasons to view source code: to know what software does when documentation is vague or incomplete; to combine code objects in custom scripts or libraries; and to change source code as needed. The following post describes the different types of R source code available and how to access them.

Posted in R Packages, R Programming

Qatar National Grid (QND95)

QND95 is a two-dimensional coordinate reference system that is the standard for geographic mapping in Qatar. QND95 is intended for onshore activities only. QND95 provides up-to-date specifications to calibrate surveying tools, GPS devices, GIS tools, and analysis activities. The coordinate reference system facilitates standardization and consistency across activities.

Posted in GDAL