Preprocessing dates and times in R requires synchronizing data and formats across data sources, so R dates and times deserve care and attention.
Current Date/Time in R
The functions date(), Sys.Date() and Sys.time() all report the current system date and time, but as different types: date() returns a character string, Sys.Date() a Date object, and Sys.time() a POSIXct object:
    > date()
    [1] "Tue Oct 22 18:43:27 2013"
    > Sys.Date()
    [1] "2013-10-22"
    > Sys.time()
    [1] "2013-10-22 18:45:54 AST"
Each of these functions returns a slightly different result, which raises the obvious question: how best to manage and format dates in large data objects?
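As a minimal sketch of why the differences matter, the snippet below captures the current moment with each function and inspects the class of the result. The variable names are illustrative only.

```r
# Capture the current moment three ways and inspect the class of each result.
now_chr <- date()       # character string
today   <- Sys.Date()   # Date object
now_ct  <- Sys.time()   # POSIXct object

class(now_chr)
class(today)
class(now_ct)

# Date and POSIXct values support arithmetic; the character string does not.
tomorrow   <- today + 1       # Date arithmetic is in days
in_an_hour <- now_ct + 3600   # POSIXct arithmetic is in seconds
```

Because only the Date and POSIXct results carry date semantics, they are usually the better starting point for anything beyond display.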
Classes: R Dates and Times
R provides several options for dealing with date and time data.
- The as.Date() function handles dates (without times);
- the package chron handles dates and times, but does not control for time zones; and
- the POSIXct and POSIXlt classes are ISO compliant data objects that support date/times with time zones and assorted calendar adjustments.
The general rule for date/time data in R is to use the simplest technique possible:
- For date only data, as.Date() will usually be the best choice.
- If you need to handle dates and times, without timezone information, the chron library is a good choice;
- the POSIX classes are especially useful when timezone manipulation is important, and are the common format for SCADA data and atmospheric science applications.
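The rule of thumb above can be sketched with base R alone (chron is left out here because it is a contributed package that may not be installed). The zone "Asia/Riyadh" is an arbitrary choice for illustration, picked because it is a fixed UTC+3 offset with no daylight saving.

```r
# Date-only data: a Date object is enough.
d <- as.Date("2005-10-21")

# Date and time with an explicit zone: POSIXct stores a single instant...
t_utc <- as.POSIXct("2005-10-21 18:47:22", tz = "UTC")

# ...which can be displayed in another zone without changing the instant.
riyadh_clock <- format(t_utc, "%H:%M:%S", tz = "Asia/Riyadh")

class(d)
class(t_utc)
riyadh_clock   # wall-clock time three hours ahead of UTC
```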
Creating Date/Time Objects in R
    > x <- as.Date("2005-10-21 18:47:22", origin = "1990-01-01")
    > y <- as.POSIXct("2005-10-21 18:47:22", tz = "")
    > z <- as.POSIXlt("2005-10-21 18:47:22", tz = "")
    > x
    [1] "2005-10-21"
    > y
    [1] "2005-10-21 18:47:22 AST"
    > z
    [1] "2005-10-21 18:47:22"
    > as.numeric(x)
    [1] 13077       # days since 1970-01-01 (origin is ignored for character input)
    > as.numeric(y)
    [1] 1129909642  # seconds since 1970-01-01 00:00 UTC
    > as.numeric(z)
    [1] 1129909642  # seconds since 1970-01-01 00:00 UTC
    > unlist(z)
      sec   min  hour  mday   mon  year  wday  yday isdst
       22    47    18    21     9   105     5   293     0
    > class(x)
    [1] "Date"
    > class(y)
    [1] "POSIXct" "POSIXt"
    > class(z)
    [1] "POSIXlt" "POSIXt"
POSIXct vs. POSIXlt in R
Both POSIX classes represent a point in time that converts to seconds since January 1, 1970 00:00 UTC. The primary difference between class POSIXct and POSIXlt is that the former is just a numeric value (seconds) and the latter is a named list of vectors representing:
- sec: 0–61, seconds.
- min: 0–59, minutes.
- hour: 0–23, hours.
- mday: 1–31, day of the month.
- mon: 0–11, months after the first of the year.
- year: years since 1900, even though the origin is defined to be 1970!
- wday: 0–6, day of the week, starting on Sunday.
- yday: 0–365, day of the year.
- isdst: Daylight Saving Time flag. Positive if in force, zero if not, negative if unknown.
A convenient way to build a POSIX date/time from numeric components like these is the ISOdate() function, which returns a POSIXct object:
    > ISOdate(2005,10,21,18,47,22,tz="")
    [1] "2005-10-21 18:47:22 AST"
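To make the difference between the two classes concrete, the following sketch converts one instant between POSIXct and POSIXlt and reads a few components. The zone is fixed to UTC here (an assumption made so the numbers do not depend on the local machine).

```r
ct <- as.POSIXct("2005-10-21 18:47:22", tz = "UTC")
lt <- as.POSIXlt(ct, tz = "UTC")

unclass(ct)       # one number (seconds since 1970-01-01 UTC) plus a tzone attribute
lt$year + 1900    # calendar year: year is stored as years since 1900
lt$mon + 1        # calendar month: mon is zero-based
lt$wday           # day of the week, counting from Sunday = 0
lt$yday           # zero-based day of the year

# Converting back to POSIXct recovers the same instant.
same_instant <- identical(as.numeric(as.POSIXct(lt)), as.numeric(ct))
```

POSIXct is generally the more compact choice for storing columns of times, while POSIXlt is handy when individual components are needed.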
Formatting Date/Times in R
Date/time formatting is typical R: lots of flexibility! In practice, date/time objects in R are manipulated in the same way they would be in a C program. The two most important functions in this regard are:
- strptime() for parsing input dates (character to POSIXlt), and
- strftime() for formatting output dates (date/time to character).
Both of these functions use a variety of formatting codes, as listed in the table below. For example, dates in many logfiles are printed in a format like "16/Oct/2005:07:51:00". To create a POSIXlt date from a date in this format, the following call to strptime() could be used:
    > y <- strptime("16/Oct/2005:07:51:00", format = "%d/%b/%Y:%H:%M:%S")
    > y
    [1] "2005-10-16 07:51:00"
    > z <- strftime(y, format = "%Y-%b-%d")
    > z
    [1] "2005-Oct-16"
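As a hedged variation on the logfile example, the sketch below uses a numeric month (%m) instead of %b, since month names depend on the current locale, and fixes the zone to UTC. The timestamp strings are made up for illustration. It also shows that strptime() returns NA rather than raising an error when the input does not match the format.

```r
stamp <- "16/10/2005:07:51:00"
fmt   <- "%d/%m/%Y:%H:%M:%S"

lt <- strptime(stamp, format = fmt, tz = "UTC")   # parsed result is POSIXlt
ct <- as.POSIXct(lt)                              # convert for compact storage

iso <- format(ct, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Input that does not match the format parses to NA.
bad <- strptime("2005-10-16", format = fmt, tz = "UTC")
is.na(bad)
```

Checking for NA after parsing is a cheap way to catch rows whose timestamps use a different layout.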
For pretty printing, the format() function will recognize the class of your input date, and perform any necessary conversions before calling strftime(), so strftime() rarely needs to be called directly. For example:
    > y = ISOdate(2005,10,21,18,47,22,tz="America/Los_Angeles")
    > format(y,"%A, %B %d, %Y %H:%M:%S")
    [1] "Friday, October 21, 2005 18:47:22"
All the available format codes are listed below:
Code | Description |
---|---|
%a | Abbreviated weekday name in the current locale. Also matches full name on input. |
%A | Full weekday name in the current locale. Also matches the abbreviated name on input. |
%b | Abbreviated month name in the current locale. Also matches the full name on input. |
%B | Full month name in the current locale. Also matches the abbreviated name on input. |
%c | Date and time. Locale-specific on output. "%a %b %e %H:%M:%S %Y" on input. |
%d | Day of month as decimal number (01-31). |
%H | Hours as decimal number (00-23). |
%I | Hours as decimal number (01-12). |
%j | Day of year as decimal number (001-366). |
%m | Month as decimal number (01-12). |
%M | Minute as decimal number (00-59). |
%p | AM/PM indicator in the locale. Used in conjunction with %I and not with %H. An empty string in some locales. |
%S | Second as decimal number (00-61), allowing for up to two leap-seconds (POSIX-compliant implementations ignore leap seconds). |
%U | Week of the year as decimal number (00-53) using Sunday as the first day of the week. |
%w | weekday as a decimal number (0-6, Sunday is 0). |
%W | Week of the year as a decimal number (00-53) using Monday as the first day of the week. |
%x | Date. Locale-specific on output. %y/%m/%d on input. |
%X | Time. Locale-specific on output. %H:%M:%S on input. |
%y | Year without century (00-99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 (the behaviour specified by the 2008 POSIX standard; this could change). |
%Y | Year with century. Note: the original Gregorian calendar has no year zero (ISO 8601:2004 defines it as 1 BC). |
%z | Signed offset in hours and minutes from UTC, so +0300 is 3 hours ahead of UTC. |
%Z | Output only. Time zone as a character string (empty if not available). |
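
A few of the codes in the table can be tried out as follows. This is only a sketch: the date is the one used earlier in the article, the zone is fixed to UTC, and locale-dependent codes such as %a, %b and %p are avoided so the results do not vary between machines.

```r
when <- as.POSIXct("2005-10-21 18:47:22", tz = "UTC")

day_of_year <- format(when, "%j")      # one-based day of the year, zero-padded
clock       <- format(when, "%H:%M")   # 24-hour clock
utc_offset  <- format(when, "%z")      # signed offset from UTC

# Two-digit years on input: 00 to 68 map to 20xx, 69 to 99 map to 19xx.
y68 <- format(as.Date("01/02/68", format = "%d/%m/%y"), "%Y")
y69 <- format(as.Date("01/02/69", format = "%d/%m/%y"), "%Y")
```

The two-digit year rule is a common source of surprises with historical data, so four-digit years (%Y) are the safer choice whenever the source allows it.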
Some Common Date/Time Manipulations
The individual components of a POSIX date/time object can be extracted by first converting to POSIXlt if necessary, and then accessing the components directly:
    > mydate = as.POSIXlt('2005-4-19 7:01:00')
    > names(mydate)
    [1] "sec"   "min"   "hour"  "mday"  "mon"   "year"
    [7] "wday"  "yday"  "isdst"
    > mydate$mday
    [1] 19
Many of the statistical summary functions, like mean, min, max, etc., are able to handle date objects. For example:
    # Manually enter dates
    > rdates = scan(what="")
    1: 1.0 29Feb2000
    3: 1.1 15Jun2000
    5: 1.2 15Dec2000
    7: 1.3 22Jun2001
    9: 1.4 19Dec2001
    11: 1.5 29Apr2002
    13: 1.6 1Oct2002
    15: 1.7 16Apr2003
    17: 1.8 8Oct2003
    19: 1.9 12Apr2004
    21: 2.0 4Oct2004
    23:
    Read 22 items
    # Create data frame
    > rdates = as.data.frame(matrix(rdates, ncol = 2, byrow = TRUE))
    > rdates[,2] = as.Date(rdates[,2], format = "%d%b%Y")
    > names(rdates) = c("Release","Date")
    > rdates
       Release       Date
    1      1.0 2000-02-29
    2      1.1 2000-06-15
    3      1.2 2000-12-15
    4      1.3 2001-06-22
    5      1.4 2001-12-19
    6      1.5 2002-04-29
    7      1.6 2002-10-01
    8      1.7 2003-04-16
    9      1.8 2003-10-08
    10     1.9 2004-04-12
    11     2.0 2004-10-04
Once the dates are properly read into R, a variety of calculations can be performed:
    > mean(rdates$Date)
    [1] "2002-05-19"
    > range(rdates$Date)
    [1] "2000-02-29" "2004-10-04"
    > rdates$Date[11] - rdates$Date[1]
    Time difference of 1679 days
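The same kind of calculation can be sketched without interactive input. The snippet below builds a short Date vector from ISO-formatted strings (a subset of the release dates above, rewritten to avoid locale-dependent month names) and uses diff() to get the gap between consecutive dates.

```r
release_dates <- as.Date(c("2000-02-29", "2000-06-15", "2000-12-15", "2004-10-04"))

gaps      <- diff(release_dates)                       # difftime vector, in days
gap_days  <- as.numeric(gaps)                          # plain numbers for further arithmetic
span_days <- as.numeric(max(release_dates) - min(release_dates))

gaps
span_days
```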
If two times are subtracted, R returns the result as a time difference, stored in a difftime object. For example, New York City experienced a major blackout on July 13, 1977, and another on August 14, 2003. To calculate the time interval between the two blackouts, we can simply subtract the two dates, using any of the classes that have been introduced:
    > b1 = ISOdate(1977,7,13)
    > b2 = ISOdate(2003,8,14)
    > b2 - b1
    Time difference of 9528 days
If an alternative unit of time is desired, call the difftime() function with the optional units= argument, which accepts any of the following values: "auto", "secs", "mins", "hours", "days", or "weeks". So to see the difference between blackouts in terms of weeks, we can use:
    > difftime(b2,b1,units='weeks')
    Time difference of 1361.143 weeks
difftime() values can be manipulated like ordinary numeric variables; arithmetic performed with these values will retain the original units.
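A short sketch of that behaviour, using the two blackout dates as Date objects (an assumption made here so the result does not depend on the local time zone):

```r
b1 <- as.Date("1977-07-13")
b2 <- as.Date("2003-08-14")

gap     <- difftime(b2, b1, units = "weeks")
doubled <- gap * 2                 # arithmetic keeps the original units
units(doubled)

# Read the same interval in other units without changing the object...
gap_in_days  <- as.numeric(gap, units = "days")
gap_in_hours <- as.numeric(gap, units = "hours")

# ...or convert the object itself.
units(gap) <- "days"
gap
```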
Date/Time Sequences
The by= argument to the seq() function can be specified either as a difftime value or as a character string naming a unit of time ("day", "week", "month", "quarter" or "year", optionally pluralised or preceded by an integer), making it very easy to generate sequences of dates. For example, to generate a vector of ten dates, starting on July 4, 1976, we could use:
    > seq(as.Date("1976-7-4"), by = "days", length = 10)
    [1] "1976-07-04" "1976-07-05" "1976-07-06" "1976-07-07"
    [5] "1976-07-08" "1976-07-09" "1976-07-10" "1976-07-11"
    [9] "1976-07-12" "1976-07-13"
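Other step sizes follow the same pattern. The sketch below, with illustrative variable names, steps by calendar month, by week up to an end date, and by a difftime value of ten days.

```r
start <- as.Date("1976-07-04")

# Step by calendar month for a fixed number of values.
monthly <- seq(start, by = "month", length.out = 3)

# Step by week until an end date is reached.
weekly <- seq(start, as.Date("1976-07-25"), by = "week")

# Step by an explicit difftime value.
every_ten_days <- seq(start, by = as.difftime(10, units = "days"), length.out = 3)

monthly
weekly
every_ten_days
```

Stepping by "month" keeps the day of the month fixed, which a fixed number of days cannot do because months differ in length.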