Category Archives: Scientific Computing
Many data projects follow a split-apply-combine strategy: a big dataset is split into manageable pieces, a function is applied to each piece, and the results are combined into a single output. The same three steps recur in many analysis projects.
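As a rough illustration of the three steps, here is a minimal sketch in plain Python using only the standard library. The `records` data and the helper names `split_by_key`, `apply_to_each` and `combine` are hypothetical, chosen for this example only; they do not come from any particular package.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical example data: (group, value) records.
records = [
    ("a", 1.0),
    ("b", 4.0),
    ("a", 3.0),
    ("b", 6.0),
    ("c", 5.0),
]


def split_by_key(rows):
    """Split: collect the values belonging to each group key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def apply_to_each(groups, func):
    """Apply: run func on each group's list of values."""
    return {key: func(values) for key, values in groups.items()}


def combine(results):
    """Combine: reassemble the per-group results into one sorted table."""
    return sorted(results.items())


summary = combine(apply_to_each(split_by_key(records), mean))
print(summary)  # [('a', 2.0), ('b', 5.0), ('c', 5.0)]
```

Keeping the three steps as separate functions means the summary function (here `mean`) can be swapped without touching the splitting or combining logic.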
Introduction to Tidy Data
Despite the enormous amount of data available, there is surprisingly little agreement or guidance on how to create clean, consistent and easy-to-use data.
The way people interact with data and code can benefit from a few simple principles that make research and results repeatable. The “tidy” approach to data requires that:
- Data is structured consistently and is reusable;
- Code flow relies on simple function calls chained with the pipe operator;
RMarkdown provides an authoring system for project and data science reporting. RMarkdown is a core component of the RStudio IDE. It weaves narrative text together with embedded chunks of R code, and the code demonstrates the model concepts described in the text. RMarkdown produces elegantly formatted output documents, including publication-quality plots and tables.
With language, we learn to listen and to speak. With literacy, we learn to read and to write. With programming, we learn to use and to make software. Programming is the new literacy of the digital age, and scientific programming is an essential skill. It allows us to respond to new data structures and new technology, to expand Internet interfaces, to challenge conventional wisdom, and to access the control panels of machines and civilization.