Basic Syntax
R provides a family of functions for working with probability distributions and for generating and testing random samples. For any distribution, the density, cumulative probability, quantile function, and random number generator are available. Each is called by prefixing the distribution name with the letter d, p, q or r, as follows:
- Density: d<distrib.name>()
- Cumulative Probability: p<distrib.name>()
- Quantile: q<distrib.name>()
- Random Number: r<distrib.name>()
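As a minimal sketch of the four prefixes, the example below applies each one to the normal distribution with its default parameters (mean 0, sd 1). The variable names and the seed value are illustrative choices, not part of the original text.

```r
# Density: height of the standard normal curve at 0 (about 0.399)
d <- dnorm(0)

# Cumulative probability: P(X <= 1.96) for X ~ N(0, 1) (about 0.975)
p <- pnorm(1.96)

# Quantile: the value x such that P(X <= x) = 0.975 (about 1.96)
q <- qnorm(0.975)

# Random numbers: three draws from N(0, 1); the seed is an arbitrary choice
set.seed(123)
r <- rnorm(3)

# The q and p functions are inverses of each other
all.equal(pnorm(q), 0.975)
```

The same pattern applies to every distribution suffix, for example dbinom(), ppois(), qexp() or runif().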
Data Distributions in R
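The distributions available in base R are listed below. Each name is a suffix to be combined with a d, p, q or r prefix (for example, dnorm() or rexp()); on their own, names such as exp(), gamma() and t() refer to unrelated base functions. A dash in the Defaults column means the parameter has no default and must be supplied. Additional distributions are found in many packages listed on CRAN: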
Name | Description | Parameters | Defaults |
---|---|---|---|
beta() | Beta | shape1, shape2 | -, - |
binom() | Binomial | size, prob | -, - |
cauchy() | Cauchy | location, scale | 0, 1 |
chisq() | Chi-squared | df | - |
exp() | Exponential | rate | 1 |
f() | F | df1, df2 | -, - |
gamma() | Gamma | shape, rate | -, 1 |
geom() | Geometric | prob | - |
hyper() | Hypergeometric | m, n, k | -, -, - |
lnorm() | Lognormal | meanlog, sdlog | 0, 1 |
multinom() | Multinomial (d and r only) | size, prob | -, - |
nbinom() | Negative binomial | size, prob | -, - |
norm() | Normal | mean, sd | 0, 1 |
pois() | Poisson | lambda | - |
t() | Student t | df | - |
unif() | Uniform | min, max | 0, 1 |
weibull() | Weibull | shape, scale | -, 1 |
wilcox() | Wilcoxon rank sum | m, n | -, - |
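To illustrate how the Parameters and Defaults columns translate into function calls, here is a short sketch using three of the distributions above. The probabilities, sizes and ranges are arbitrary values chosen for illustration.

```r
# exp: rate defaults to 1, so these two calls are equivalent
m1 <- qexp(0.5)
m2 <- qexp(0.5, rate = 1)
# The median of an exponential with rate 1 is log(2)
all.equal(m1, log(2))

# binom: size and prob have no defaults and must be supplied
# P(exactly 3 successes in 10 fair trials) = choose(10, 3) / 2^10
pb <- dbinom(3, size = 10, prob = 0.5)

# unif: min and max default to 0 and 1
set.seed(1)
u <- runif(5)

# Defaults can be overridden by naming the parameters
u2 <- runif(5, min = 10, max = 20)
```

Calling a function such as rbinom(5) without its required parameters stops with an error, which is a quick way to find out which parameters lack defaults.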
Repeating Random Draws
The .Random.seed object is updated after each call to a random number function, so successive calls return different values. To reproduce the same random number sequence, either save the .Random.seed object and restore it later, or call set.seed() with a fixed integer before generating the numbers. The examples below illustrate both approaches, starting with set.seed():
    # generate random sequence 1
    > rnorm(5)
    [1]  0.08934727 -0.95494386 -0.19515038  0.92552126  0.48297852

    # generate random sequence 2
    > rnorm(5)
    [1] -0.5963106 -2.1852868 -0.6748659 -2.1190612 -1.2651980

    # set seed to 10 and generate random sequence 3
    > set.seed(10)
    > rnorm(5)
    [1]  0.01874617 -0.18425254 -1.37133055 -0.59916772  0.29454513

    # set seed to 10 again and generate random sequence 4
    > set.seed(10)
    > rnorm(5)
    [1]  0.01874617 -0.18425254 -1.37133055 -0.59916772  0.29454513
The alternative approach to repeating random sequences uses the .Random.seed object as follows:
    # Generate random sequence 1
    > rnorm(5)
    [1]  1.1017795  0.7557815 -0.2382336  0.9874447  0.7413901

    # Generate random sequence 2
    > rnorm(5)
    [1]  0.08934727 -0.95494386 -0.19515038  0.92552126  0.48297852

    # Save .Random.seed and generate random sequence 3
    > old.seed <- .Random.seed
    > rnorm(5)
    [1] -0.5963106 -2.1852868 -0.6748659 -2.1190612 -1.2651980

    # Restore .Random.seed and generate random sequence 4
    > .Random.seed <- old.seed
    > rnorm(5)
    [1] -0.5963106 -2.1852868 -0.6748659 -2.1190612 -1.2651980
For additional information on seeding the random number generator, see this article.
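Both approaches can also be checked programmatically with identical() rather than by comparing printed output. The sketch below is one way to do that. It reads and writes .Random.seed explicitly in the global environment with get() and assign(), which is an assumption added here so the code behaves the same when run inside a function or a sourced script. The variable names are illustrative.

```r
# Approach 1: call set.seed() with the same integer before each draw
set.seed(10)
a <- rnorm(5)
set.seed(10)
b <- rnorm(5)
identical(a, b)

# Approach 2: save and restore .Random.seed
# .Random.seed lives in the global environment and only exists once the
# generator has been used, so seed it first
set.seed(1)
old.seed <- get(".Random.seed", envir = globalenv())
s1 <- rnorm(5)

# Restore the saved generator state and draw again
assign(".Random.seed", old.seed, envir = globalenv())
s2 <- rnorm(5)
identical(s1, s2)
```

In practice, set.seed() is usually the simpler choice. Saving .Random.seed is mainly useful when the generator state has to be captured in the middle of a session.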
Bootstrap Sampling
It is often preferable to draw random samples from a vector of actual observations, that is, from an empirical distribution. The sample() function is used for this purpose:
    # a random permutation
    > x <- 1:12
    > sample(x)
     [1] 10  9 11  6 12  7  8  5  2  3  4  1

    # bootstrap resampling
    > sample(x, replace = TRUE)
     [1]  8  3  2  7  9 10  1  8  7  4 12  3

    # 100 Bernoulli trials
    > sample(c(0,1), 100, replace = TRUE)
     [1] 1 1 1 0 1 1 1 0 1 0 1 1 1 1 0 0 0 0 1 0 0 0 0 1 1 1 1 1 0 1
    [31] 0 0 0 1 0 0 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0
    [61] 0 0 1 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 1 1 1
    [91] 0 1 1 0 0 0 1 1 0 0
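A common use of resampling with replacement is to estimate the variability of a statistic. The sketch below bootstraps the mean of a small vector. The observations in obs are hypothetical values invented for illustration, and the seed and the number of replicates (1000) are arbitrary choices.

```r
# Hypothetical observations (made up for this example)
obs <- c(12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.8, 11.9)

set.seed(42)

# Resample the data with replacement 1000 times and record each mean
boot.means <- replicate(1000, mean(sample(obs, replace = TRUE)))

# Bootstrap estimate of the standard error of the mean
boot.se <- sd(boot.means)

# Simple percentile interval for the mean
boot.ci <- quantile(boot.means, c(0.025, 0.975))
```

Each resample has the same length as obs, so every bootstrap mean falls between the smallest and largest observation.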
Frequency Tables in R
The following code builds a frequency table in R. Random data is first generated, and the cut() function divides it into bins, with the number of bins chosen by nclass.Sturges(). The table() function counts the observations in each bin, and the transform() function adds new columns to the table: cumulative frequency, relative proportion and cumulative proportion, computed with the cumsum() and prop.table() functions.
    # Random data
    set.seed(1)
    r.dat <- rnorm(n = 500, mean = 50, sd = 10)

    # Define data bins using cut()
    bins <- factor(cut(r.dat, breaks = nclass.Sturges(r.dat)))

    # Tabulate data and convert to a data.frame
    freq.table <- as.data.frame(table(bins))

    # Add cumulative frequency, relative and cumulative proportions
    freq.table <- transform(freq.table, Cum.Freq = cumsum(Freq), Rel.Prop = prop.table(Freq))
    freq.table <- transform(freq.table, Cum.Prop = cumsum(Rel.Prop))

    print(freq.table)
              bins Freq Cum.Freq Rel.Prop Cum.Prop
    1  (19.9,26.7]    5        5    0.010    0.010
    2  (26.7,33.5]   18       23    0.036    0.046
    3  (33.5,40.3]   55       78    0.110    0.156
    4  (40.3,47.2]  119      197    0.238    0.394
    5    (47.2,54]  126      323    0.252    0.646
    6    (54,60.8]  105      428    0.210    0.856
    7  (60.8,67.7]   51      479    0.102    0.958
    8  (67.7,74.5]   18      497    0.036    0.994
    9  (74.5,81.3]    2      499    0.004    0.998
    10 (81.3,88.2]    1      500    0.002    1.000
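If the same table is needed for several data sets, the steps above can be wrapped in a small helper. The function name make.freq.table and its breaks argument are hypothetical, introduced only for this sketch. It follows the same sequence of cut(), table() and cumsum(), but computes the relative proportion directly as Freq / sum(Freq).

```r
# Hypothetical helper: build a frequency table for a numeric vector
make.freq.table <- function(x, breaks = nclass.Sturges(x)) {
  # Bin the data; cut() returns a factor of interval labels
  bins <- cut(x, breaks = breaks)

  # Count observations per bin and convert to a data.frame
  out <- as.data.frame(table(bins))

  # Add cumulative frequency, relative and cumulative proportions
  out$Cum.Freq <- cumsum(out$Freq)
  out$Rel.Prop <- out$Freq / sum(out$Freq)
  out$Cum.Prop <- cumsum(out$Rel.Prop)
  out
}

# Usage with the same kind of random data as above
set.seed(1)
ft <- make.freq.table(rnorm(n = 500, mean = 50, sd = 10))
```

As a sanity check, the last value of Cum.Freq should equal the number of observations and the last value of Cum.Prop should equal 1.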