The simulations in this tutorial start with the set.seed() function, which sets the seed of R's random number generator. Doing this ensures that you get the exact same distribution of pseudo-random scores each time you call a random process, like rnorm() or sample(). If you wish to generate a different distribution of scores, you can change the seed number or omit the set.seed() command.
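For example, you can verify this reproducibility with a quick check (an illustrative snippet, not part of the tutorial's data):
set.seed(3) # set seed
rnorm(3) # draw three random scores
set.seed(3) # reset the same seed
rnorm(3) # returns the identical three scores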
The scores obtained from any intelligence test are standardized so we know that in the population \(\mu\) = 100 and \(\sigma\) = 15. Simulate a normal population distribution of N = 100,000 IQ scores with a mean = 100 and standard deviation = 15.
set.seed(3) # set seed
Y <- round(rnorm(100000, mean = 100, sd = 15), 0) # generate whole random numbers; save scores in variable Y
IQ.df <- as.data.frame(Y) # save scores in new data frame
head(IQ.df) # show the top of the data frame
## Y
## 1 86
## 2 96
## 3 104
## 4 83
## 5 103
## 6 100
Create a descriptive statistics summary table for the population IQ scores using the describe() function in the psych package.
library(psych)
describe(IQ.df, skew = FALSE)
## vars n mean sd min max range se
## X1 1 1e+05 100.01 15.06 28 166 138 0.05
Plot the population IQ scores in a frequency distribution graph (histogram).
library(ggplot2)
ggplot(IQ.df, aes(x = Y)) +      # data frame, x = scores to plot
  geom_histogram(
    bins = 200,                  # bins on X-axis
    color = "darkblue",          # line color
    fill = "white") +            # fill color
  labs(x = "IQ score",           # X-axis label
       y = "Frequency")          # Y-axis label
Appendix A shows how to simulate normal distributions for multiple groups with different means and SDs, e.g., female and male IQ scores.
Research summary: In a hypothetical research study, 36 normal human adult participants volunteered to receive once-a-week brain stimulation therapy that had been shown to safely increase cognitive functioning in several non-human animal species, including primates. After 12 therapy sessions, all participants completed a standardized IQ test administered by a licensed psychometrician.
Null hypothesis significance testing (NHST) is a framework, or set of rules, for using sample data to make an inference about unknown population data.
Inference: Drawing a conclusion or making a logical judgment on the basis of evidence and prior conclusions rather than on the basis of direct observation.
Although we can accurately simulate population scores for many psychological variables of interest, like IQ, in most cases we are not able to directly observe all the actual scores in a population. Instead, psychological researchers sample scores from the population of interest and apply the NHST rules to make an inference about the unknown population scores.
Instead of starting with a hypothetical data set, let’s randomly select a sample of n = 36 IQ scores from the N = 100,000 untreated population we created above (IQ.df). Doing so assumes there really is no effect of the therapy, but it’s also a way to demonstrate sampling error, which is a discrepancy between a sample and the population it was drawn from.
set.seed(3) # set seed
Ysample <- sample(IQ.df$Y, size = 36, replace = FALSE) # randomly select 36 scores from IQ.df
IQsample.df <- as.data.frame(Ysample) # save the 36 scores in a new data frame
head(IQsample.df) # show the top of the data frame
## Ysample
## 1 86
## 2 102
## 3 119
## 4 120
## 5 114
## 6 91
Create a descriptive statistics summary table for the sample IQ scores using the describe() function in the psych package. Note how the sample IQ scores compare to the population IQ mean = 100 and SD = 15.
library(psych)
describe(IQsample.df)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 36 101.86 16.6 101 101.7 20.02 72 141 69 0.14 -0.81 2.77
Plot the sample IQ scores in a frequency distribution.
library(ggplot2)
ggplot(IQsample.df, aes(x = Ysample)) +
geom_histogram(bins = 10, color = "darkblue", fill = "white") +
labs(x = "IQ", y = "Frequency")
Use the t.test() function to run the one-sample t-test, which also produces a p-value and a confidence interval.
testIQ <- t.test(IQsample.df$Ysample, mu = 100) # one-sample t-test, mu = population mean
testIQ # show the results
##
## One Sample t-test
##
## data: IQsample.df$Ysample
## t = 0.67263, df = 35, p-value = 0.5056
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
## 96.24398 107.47824
## sample estimates:
## mean of x
## 101.8611
Statistical decision rule for the t-statistic:
For t(35) = 0.67, the p-value = .5056, which is greater than \(\alpha\) = .05, so we fail to reject the null hypothesis. In other words, the sample scores did not differ significantly from the untreated population.
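As a check (not part of the t.test() output above), the two-tailed p-value can be reproduced from the t-statistic and its degrees of freedom with pt():
2 * pt(0.67263, df = 35, lower.tail = FALSE) # two-tailed p-value, approximately .5056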
The following statement shows how the results may be reported in an APA-style report.
Among a sample of 36 normal adult participants, 12 weeks of brain stimulation therapy showed no significant effect on IQ (M = 101.86, SD = 16.6) compared to the general population, t(35) = 0.67, p = .51, 95% CI [96.24, 107.48].
If we selected all possible samples with n = 36, the mean IQ from each of those samples would form the sampling distribution of the mean (see the Monte Carlo simulation in Appendix B). The mean of this sampling distribution will equal the population mean and its standard deviation is the standard error of the mean (SEM). The SEM indicates the average difference between the population mean and a sample mean when n = 36.
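Because the population \(\sigma\) = 15 is known here, the true SEM for samples of n = 36 can be computed directly (a check added for illustration, not part of the original code):
15 / sqrt(36) # true SEM = sigma / sqrt(n) = 2.5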
Appendix B shows a Monte Carlo simulation that can be used to test how changes in sample size and in the number of samples selected affect the standard error of the mean.
The t-statistic is a ratio between an obtained mean difference and an estimate of the SEM based on the sample scores. In other words, it indicates how far a sample mean falls from the population mean in terms of the estimated SEM (\(s_{\bar{Y}}\)). For example, a t-statistic = +1 indicates that the sample mean falls exactly one SEM above the population mean.
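Written out, the estimated SEM and the one-sample t-statistic used below are:
\[
s_{\bar{Y}} = \frac{s}{\sqrt{n}}, \qquad t = \frac{\bar{Y} - \mu}{s_{\bar{Y}}}
\]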
Entering the sample standard deviation = 16.6 and sample size = 36 into the estimated SEM (\(s_{\bar{Y}}\)) formula shows:
SEM <- round(16.6/sqrt(36), 3) # estimated SEM = s / sqrt(n)
SEM
## [1] 2.767
Entering the mean difference = 1.8611 and estimated SEM = 2.767 into the t formula shows:
t <- round((101.8611 - 100)/SEM, 3) # t = mean difference / estimated SEM
t
## [1] 0.673
So, the t calculation shows that the sample mean = 101.8611 IQ points is approximately two-thirds of one standard error above the population mean = 100.
For a t-test with 35 degrees of freedom, the mean difference must exceed 2.03 standard errors to fall in either 2.5% critical region below or above the population mean. Because t = 0.673 falls well short of the positive critical region, we fail to reject the null hypothesis: there is no evidence that the therapy affects IQ.
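The 2.03 cutoff can be verified with the qt() quantile function (a check added here, not shown in the original code):
qt(0.975, df = 35) # two-tailed critical t for alpha = .05 with 35 df, approximately 2.03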
There is a long-standing claim that male and female IQ scores have the same mean but differ in variability. Simulate distributions of IQ scores for males and females, N = 10,000 each, that are consistent with this claim.
set.seed(123)
IQg.df <- data.frame(
  gender = factor(rep(c("Female", "Male"), each = 10000)), # creates a categorical variable with labels
  Y = round(c(rnorm(10000, mean = 100, sd = 13), rnorm(10000, mean = 100, sd = 17)), 0))
head(IQg.df)
## gender Y
## 1 Female 93
## 2 Female 97
## 3 Female 120
## 4 Female 101
## 5 Female 102
## 6 Female 122
Create a descriptive statistics summary table by category using the describeBy() function in the psych package to compare the male and female IQ distributions.
library(psych) # load psych package
summary.df <- describeBy(Y ~ gender, # quantitative variable ~ categorical variable
  data = IQg.df, mat = TRUE, skew = FALSE, digits = 2) # data frame, matrix format, omit skew and kurtosis, round
summary.df # show the results
## item group1 vars n mean sd min max range se
## X11 1 Female 1 10000 99.97 12.98 50 150 100 0.13
## X12 2 Male 1 10000 99.84 17.03 41 164 123 0.17
The male and female distributions can be plotted separately, but it's easier to visually compare two distributions when they appear in the same plot along with a visual indication of their means and SDs. To make the comparison more visually appealing, we will use density plots instead of histograms. In simple terms, a density plot is a smoothed estimate of a frequency distribution.
ggplot(IQg.df) +
  geom_density(aes(x = Y, color = gender, fill = gender), alpha = 0.4) + # plot density
  scale_color_manual(values = c(hsv(0.5, 0.5, 0.5), hsv(0.7, 0.7, 0.7))) + # line color
  scale_fill_manual(values = c(hsv(0.5, 0.5, 0.5), hsv(0.7, 0.7, 0.7))) + # fill color
  geom_vline(data = summary.df, aes(xintercept = mean, color = group1), linetype = "dashed") + # mean lines
  geom_vline(data = summary.df, aes(xintercept = mean + sd, color = group1), linetype = "dotted") + # SD + line
  geom_vline(data = summary.df, aes(xintercept = mean - sd, color = group1), linetype = "dotted") + # SD - line
  labs(x = "IQ score", y = "Density") # axis labels
This Monte Carlo simulation is designed to randomly select samples of size n from a normally distributed population (the standard normal distribution, \(\mu\) = 0, \(\sigma\) = 1). In a real behavioral research situation, the population of interest may be IQ scores, neuroticism scores, or helpfulness scores, but assuming the scores are normally distributed, they will look like this:
curve(dnorm(x),
      xlim = c(-4, 4),
      main = "Hypothetical Population of Raw Scores (Y)",
      xlab = "z",
      ylab = "density",
      bty = "n", # removes borders
      lwd = 2)
Simulate a sampling distribution of the mean
A sampling distribution of the mean is produced by randomly selecting all the possible samples of size n from a population and calculating the sample mean for each. With the large populations and small samples that are typical in behavioral research, the number of repetitions needed to collect “all the possible samples” is very large, but approximating this process with a smaller population and far fewer repetitions will still reveal the important connection between sample size and sampling error.
The following Monte Carlo simulation is “self-contained”, meaning you should be able to clear everything from your R workspace, copy and paste the code into the console, and press Enter. The code for this and many other simulations can be found at econometrics-with-r.org.
# set sample size and number of samples
n <- 100
reps <- 1000

# perform random sampling; 100 x 1,000 sample matrix
samples <- replicate(reps, rnorm(n))

# compute sample means
sample.avgs <- colMeans(samples)
# plot the density histogram
hist(sample.avgs,
     xlim = c(-0.3, 0.3),
     freq = FALSE,
     xlab = paste("n = ", n, " SEM = ", 1/sqrt(n)))
# overlay the theoretical distribution of sample averages on top of the histogram
curve(dnorm(x, sd = 1/sqrt(n)),
      col = "red",
      lwd = 2,
      add = TRUE)
The first two lines of the code set the sample size n = 100 and the number of samples to be selected, 1,000. The third and fourth lines organize the scores into a matrix (one sample per column) and calculate the mean for each sample, \(\bar{Y}\). By inspecting the samples matrix in the “Environment” tab, you can confirm that each of the 1,000 samples is unique.
The last code segment plots the red line, which is the theoretical sampling distribution of the mean with a standard deviation equal to the standard error of the mean, sd = 1/sqrt(n), which is 0.1 when n = 100. A simple visual inspection of the plots below will show that well over half (actually, a bit over 2/3) of the sample means fall within \(\pm\) 1 standard error of \(\mu\) = 0.
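You can confirm this directly after running the simulation (an added check, not in the original code):
mean(abs(sample.avgs) <= 1/sqrt(n)) # proportion of sample means within 1 SEM of mu = 0; roughly .68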
To see how larger sample sizes reduce sampling error, or the discrepancy between \(\mu\) and \(\bar{Y}\), simply rerun the simulation over and over with different sample sizes. Below are sampling distributions of the mean where n = 100, n = 1,000 and n = 10,000. Remember that the hypothetical population from which the samples are drawn had \(\mu\) = 0, so sample means that fall above 0 are overestimates of the population mean and sample means that fall below 0 are underestimates.
Notice that the X-axis scale has been limited from -0.3 to +0.3 to show how the sample means become more and more concentrated around \(\mu\) = 0 with larger sample sizes. This concentration means that many more sample means are piled up in this location, and as a result, the Y-axis scale gets larger with each increase in n even though each plot contains 1,000 sample means. Another way to make the comparison is to overlay the first two prediction lines on the last plot (n = 100 is green and n = 1,000 is blue). Here, you can see how the 1,000 means are squeezed into closer and closer proximity to \(\mu\) = 0, which is what you'd expect to see as your sample means become better and better estimates of \(\mu\).
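One way to rerun the simulation for several sample sizes in a single pass is a short loop like the following sketch (added here as an illustration, not part of the original tutorial; it reuses the same number of repetitions, axis limits, and red theoretical curve as above):
for (n in c(100, 1000, 10000)) {
  sample.avgs <- colMeans(replicate(1000, rnorm(n))) # 1,000 sample means for this n
  hist(sample.avgs,
       xlim = c(-0.3, 0.3),
       freq = FALSE,
       xlab = paste("n = ", n, " SEM = ", 1/sqrt(n)))
  curve(dnorm(x, sd = 1/sqrt(n)), col = "red", lwd = 2, add = TRUE) # theoretical sampling distribution
}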