Category Archives: Data

Web Scraping in R

The World Wide Web presents enormous amounts of data.  Unfortunately, most of that data is not directly available for download.  In response, web scraping uses indirect means to harvest data from websites.  In practice, web scraping is not unusual and is generally legal.  For example, web browsers rely on the Hypertext Transfer Protocol (HTTP) to fetch data, and so does web scraping.  The difference is that, with web scraping, the user retrieves, selects and extracts website content and data that was intended for browser display.  This article shows how web scraping works and presents tools available in the R programming language for both manual and automated web scraping.
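
As a minimal sketch of the retrieve, select and extract steps, the example below parses a small inline HTML page instead of a live website.  It assumes the rvest package is installed; this excerpt does not name the tools the article uses, and the page content and the table id `prices` are made up for illustration.

```r
# Assumes the rvest package is installed: install.packages("rvest")
library(rvest)

# Retrieve: hypothetical page content built in memory.
# A live scrape would call read_html() with a URL instead.
page <- minimal_html('
  <h1>Fuel prices</h1>
  <table id="prices">
    <tr><th>fuel</th><th>price</th></tr>
    <tr><td>diesel</td><td>1.25</td></tr>
    <tr><td>petrol</td><td>1.10</td></tr>
  </table>
')

# Select: locate the table node with a CSS selector.
node <- html_element(page, "#prices")

# Extract: convert the HTML table into a data frame.
prices <- html_table(node)
print(prices)
```

The same three calls apply to a real page once the inline HTML is swapped for a URL and the selector is adapted to that page's markup.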

Posted in Data, R Programming, Web Scrapping

Observation Networks

Introduction to Satellite Observation Networks

Satellite observation networks provide invaluable data on the climate and the layered atmosphere.  Satellite data is a key input for assessing the feasibility and operational integrity of renewable energy power systems.

Posted in Data

Ground Stations

Ground station sensors for weather and climate observation are listed below.  The list is limited to station networks that provide verification of wind and solar resource data.

Posted in Data

Energy Content of Fuels

Energy Content Explained

The energy content of any organic fuel is defined as the fuel’s primary energy.  Primary energy is measured by the fuel’s calorific value, that is, the heat generated by the complete combustion of one unit of fuel under well-defined conditions.  The calorific value can be a gross or net number, depending on whether the heat released includes the latent heat recovered when the water vapor formed in combustion condenses.  Power production efficiency is typically calculated using the Net Calorific Value (NCV), which excludes the heat of water vaporization.
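
The relationship between the gross and net values can be written compactly.  The symbols below are assumptions introduced for illustration: $m_w$ is the mass of water vapor in the combustion products per unit of fuel, $h_{fg}$ is the latent heat of vaporization of water (roughly 2.44 MJ/kg at 25 °C), $E_{out}$ is the energy delivered and $m_{fuel}$ is the fuel consumed.

```latex
\mathrm{NCV} = \mathrm{GCV} - m_w \, h_{fg},
\qquad
\eta_{\mathrm{NCV}} = \frac{E_{out}}{m_{fuel} \cdot \mathrm{NCV}}
```

Because NCV is smaller than GCV, an efficiency quoted on an NCV basis is numerically higher than the same plant's efficiency quoted on a GCV basis.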

Posted in Data, Economics, Engineering

Principles of Tidy Data

Introduction to Tidy Data

Despite the enormous amount of data available, there is surprisingly little agreement or guidance on how to create clean, consistent and easy-to-use data.

Human interaction with data and code benefits from a few simple principles that facilitate reproducible research and results. The “tidy” approach to data requires that:

  • Data is structured consistently and is reusable;
  • Code flow relies on simple function calls using the pipe;
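
A minimal sketch of the pipe-based code flow follows.  It uses base R's native pipe `|>` (R 4.1 or later) and the built-in `mtcars` data set, both chosen here for illustration; the original article may well use the magrittr pipe `%>%` instead.

```r
# Each step is a simple function call; the pipe passes the result
# of one step to the next as its first argument.
result <- mtcars |>
  subset(cyl == 4, select = c(mpg, wt)) |>   # keep 4-cylinder cars, two columns
  transform(wt_kg = wt * 453.592) |>         # wt is in 1000 lb; convert to kg
  head(3)                                    # first three rows

print(result)
```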
Posted in Data, R Basics, R Data Objects, R Data Syntax, Scientific Computing

Continuous Futures Prices in R

Introduction

Crude oil prices by delivery period define the term structure of the market. The term structure changes shape over time given shifts in price level and slope.  Term structure behavior becomes clear by combining discrete futures contracts with similar maturities into a continuous time series.  R code is supplied to create continuous prices by delivery period. The purpose is to show term structure behavior and to derive risk and profitability measures for oil production, marketing and trading strategies. The resulting data is tidy and well suited for model training and out-of-sample testing.
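
One simple way to combine discrete contracts into a continuous series is to rank the live contracts by expiry on each date and keep the n-th nearest.  The base R sketch below illustrates that idea only; it is not the article's code, the prices and dates are made up, and the helper `nearby_series()` is a hypothetical name.  No price adjustment is applied at contract rolls.

```r
# Hypothetical settlement prices for discrete contracts on three dates.
quotes <- data.frame(
  date   = as.Date(c(rep("2018-01-15", 3), rep("2018-02-15", 3),
                     rep("2018-03-15", 3))),
  expiry = as.Date(c("2018-02-20", "2018-03-20", "2018-04-20",
                     "2018-02-20", "2018-03-20", "2018-04-20",
                     "2018-03-20", "2018-04-20", "2018-05-22")),
  price  = c(64.3, 64.1, 63.8, 61.3, 61.1, 60.9, 61.2, 61.0, 60.8)
)

# Keep contracts that have not expired, then rank them by expiry per date.
live <- quotes[quotes$expiry >= quotes$date, ]
live$nearby <- ave(as.numeric(live$expiry), live$date, FUN = rank)

# Hypothetical helper: continuous series for the n-th nearby contract.
nearby_series <- function(n) {
  live[live$nearby == n, c("date", "expiry", "price")]
}

front  <- nearby_series(1)  # first nearby (front month)
second <- nearby_series(2)  # second nearby
print(front)
print(second$price - front$price)  # a crude slope measure per date
```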

Posted in Data, R Programming

Term Structure of Crude Oil 2018

An animation showing the term structure of NYMEX crude oil.  For source code, go here.

Posted in Animation, Data, Economics, R Graphics

Extract Data Tables from PDF Files in R

A new method to extract data tables from PDF files is introduced. The solution combines the R programming language with the open-source Java program Tabula. The result is a convenient method that transforms documents into databases.

Benefits
The ability to train a machine to extract data tables from PDF files has several benefits:

Posted in Data, Misc Tricks, R Data Import, R Programming

SpatialPoints in R: Large Data Case Study

A common task in spatial data analysis is extracting SpatialPoints inside a set of polygons or buffer zones.  Analysts can use standard GIS or map tools to extract a set of points within an area of interest using manual “point-and-click” routines.  This method is easy, but it quickly becomes impractical, especially in cases involving big data.  The alternative is to train a machine to automatically extract the points in a polygon or buffer zone.  This post achieves that task and presents a case study with R code.
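
The automated extraction can be sketched with the sp package's `over()` function, assuming sp is installed.  The coordinates and the single square polygon are made up for illustration and carry no coordinate reference system; the case study's own data and code are not shown in this excerpt.

```r
# Assumes the sp package is installed: install.packages("sp")
library(sp)

# Hypothetical points (x, y) in an arbitrary planar coordinate system.
pts <- SpatialPoints(cbind(x = c(0.5, 1.5, 2.5, 0.2),
                           y = c(0.5, 0.5, 2.5, 0.9)))

# One polygon covering the unit square; the ring is closed explicitly.
square <- Polygon(cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0)))
zone   <- SpatialPolygons(list(Polygons(list(square), ID = "zone1")))

# over() returns, for each point, the index of the polygon that
# contains it, or NA when the point falls outside every polygon.
hit    <- over(pts, zone)
inside <- pts[!is.na(hit), ]

print(coordinates(inside))
```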

Posted in Data, GDAL, Spatial Analysis

Aerosol Animation

Aerosol Optical Depth (AOD) measures the degree to which aerosols prevent the transmission of sunlight by absorption or scattering.  AOD is obtained by integrating the extinction coefficient over a vertical column of air.  The extinction coefficient can be used to analyze solar extinction and the performance of solar power systems as a function of location and time.
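
In symbols, the column integral and its effect on the direct solar beam can be written as follows.  The notation is an assumption introduced here: $\beta_{ext}(z)$ is the aerosol extinction coefficient at height $z$, $z_{TOA}$ is the top of the atmosphere, $I_0$ is the beam irradiance before aerosol attenuation, and $m$ is the relative air mass along the slant path.

```latex
\tau = \int_{0}^{z_{TOA}} \beta_{ext}(z)\, dz,
\qquad
I = I_0 \, e^{-m \tau}
```

The second expression is the Beer-Lambert law applied to aerosols alone; other attenuating constituents, such as water vapor and ozone, are left out of this sketch.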

GOCART

Posted in Animation, Data, Modeling, R Data Import, R Graphics, Spatial Analysis

Qatar: Meteosat Solar Data

This is the second article in a series on Meteosat solar data from EUMETSAT.  The intent is to define the basic parameters of meteorological data coverage for the State of Qatar.  Specifically:

  • Simple trigonometry is defined to assess the resolution of the satellite coverage area; 
  • A land surface analysis is conducted to visualize the geographic coordinates of the satellite pixels across the State of Qatar; 
Posted in Data, GDAL, Spatial Analysis

Binary Data In R

There are many reasons to work with binary data in R.  Solar resource data, solar PV performance data, and real-time grid monitoring data are typically stored and transmitted in binary data formats.  

In practice, accessing binary data in R is difficult without a vendor- or format-specific “can opener” and a properly configured scientific programming environment.  As a result, many business applications bypass binary data altogether or rely instead on secondary sources and summary statistics, with no ability to validate data integrity and accuracy.
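
Where a format's byte layout is documented, the “can opener” can be as small as a few base R calls.  The sketch below writes and then reads a made-up layout (a 4-byte record count followed by 4-byte floats, little-endian); real vendor formats differ, so the layout and values here are assumptions for illustration only.

```r
# Hypothetical layout: one 4-byte little-endian integer giving the record
# count, followed by that many 4-byte little-endian floats.
path   <- tempfile(fileext = ".bin")
values <- c(812.5, 640.25, 0)

# Write the file in binary mode.
con <- file(path, "wb")
writeBin(length(values), con, size = 4, endian = "little")
writeBin(values, con, size = 4, endian = "little")
close(con)

# Read it back: header first, then the payload it describes.
con <- file(path, "rb")
n <- readBin(con, what = "integer", n = 1, size = 4, endian = "little")
x <- readBin(con, what = "numeric", n = n, size = 4, endian = "little")
close(con)

print(n)
print(x)
```

Reading the values back and comparing them with a known reference is one direct way to validate data integrity rather than relying on summary statistics.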

Posted in Data, Data Science, GDAL, R Data Import