1 Independent-samples experiments

A study by Zhong, Bohns, and Gino (2010) was designed to test the following research hypothesis:

Darkness may conceal identity and encourage moral transgressions, and if so, participants completing a test in a dimly lit room should self-report a test score that is higher than what they actually earned.

Independent variable (X): Eighty-four college student participants were randomly assigned to one of two testing conditions: (1) a small laboratory room that was normally illuminated, or (2) a small laboratory room with dimmed lighting. The dimmed lighting was sufficient for participants to see the behavioral task, but it was visibly dimmer than the well-lit room.

Behavioral Task: Upon entering the testing room, participants received an envelope that contained ten dollars, which could be earned during the behavioral task. For the task, participants received 20 number matrices on separate pieces of paper, each of which consisted of 12 three-digit numbers in a grid format (see below). Participants had 5 minutes to find two numbers in each matrix that added up to 10.0. At the end of 5 minutes, participants scored their own matrices and collected 50 cents from the envelope for each correct matrix.


[Table omitted: example matrix from the matrix task]
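The matrix values from the study are not reproduced here, but the search that participants performed can be illustrated with a small R sketch (the numbers below are made up for illustration): check every pair of the 12 values in a matrix for a sum of exactly 10.

example.matrix <- c(1.69, 8.31, 4.67, 5.82, 6.36, 3.05,   # made-up values, not the
                    2.91, 7.28, 4.13, 6.89, 5.49, 3.77)   # actual study stimuli
pairs <- combn(example.matrix, 2)                # every possible pair of values
pairs[, abs(colSums(pairs) - 10) < 1e-9]         # the pair that sums to exactly 10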


Dependent variable (Y): Participants’ dishonesty was measured as the discrepancy between their self-reported scores and their actual test performance (scored by the researcher after participants left the room). For example, a participant who reported 16/20 correct and actually scored 16/20 would have a discrepancy score of 0 (no dishonesty), whereas a participant who reported 16/20 but actually scored 10/20 would have a discrepancy score of 6. Higher discrepancy scores therefore indicate more dishonesty.
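If you wanted to compute discrepancy scores yourself, the calculation is simply reported minus actual; here is a minimal sketch with hypothetical values (honest.csv already contains the computed Discrep variable, not the raw reported and actual scores):

reported <- c(16, 16, 12)     # self-reported number of correct matrices (hypothetical)
actual   <- c(16, 10, 12)     # number actually solved correctly (hypothetical)
reported - actual             # discrepancy scores: 0, 6, 0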

Load and view the data in the console:

honest.df = read.csv("https://andrewebrandt.github.io/object/datasets/honest.csv", header = TRUE)
head(honest.df)
##   light.cond Gender Discrep
## 1        Dim      F       3
## 2       Norm      M       5
## 3       Norm      F       2
## 4        Dim      M       4
## 5        Dim      F       4
## 6       Norm      M       2
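Before summarizing, it can be helpful to confirm the variable types and the number of participants per condition (there should be 42 in each):

str(honest.df)                # check variable types and dimensions
table(honest.df$light.cond)   # 42 participants per lighting condition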


1.1 Summarize data by group

Use the describeBy() function in the psych package to create a descriptive statistics summary table for the discrepancy scores in the normal and dim light conditions.

library(psych)
summary.df <- describeBy(Discrep ~ light.cond,               # quantitative variable ~ categorical variable
    data = honest.df, mat = TRUE, skew = FALSE, digits = 2)  # data frame, matrix format, omit skew and kurtosis, round
summary.df                                                   # show the results
##          item group1 vars  n mean   sd min max range   se
## Discrep1    1    Dim    1 42 4.05 1.51   1   7     6 0.23
## Discrep2    2   Norm    1 42 3.00 1.48   0   7     7 0.23


Use the ggplot() function in the ggplot2 package to create a strip chart for the discrepancy scores across conditions. Add a symbol to indicate the mean and error bars to indicate the SEM.

library(ggplot2)
ggplot(honest.df, aes(x = light.cond, y = Discrep, color = light.cond)) + 
  geom_jitter(na.rm = TRUE, position = position_jitter(0.1), 
              size = 3, alpha = 0.7) +
  stat_summary(fun = mean, na.rm = TRUE, geom = "point", 
               shape = 18, size = 3, color = "black") + 
  stat_summary(fun.data = mean_se, na.rm = TRUE, geom = "errorbar", 
               width = .1, color = "black") + 
  scale_x_discrete(name = "Light Condition") +  
  scale_y_continuous(name = "Mean Discrepancy Score") +
  theme_minimal() +
  theme(text = element_text(size = 16), legend.position = "none") + 
  scale_color_brewer(palette="Dark2")                         


Whenever you plot your data, it is good practice to check your work for accuracy by comparing the descriptive values reported in the summary table to those in the plot (e.g., are the means at the correct locations? Do the error bars appear to be the correct size?).
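One way to make that comparison is to recompute the group means and SEMs directly in base R and check them against the summary table and the plot:

aggregate(Discrep ~ light.cond, data = honest.df,
          FUN = function(y) c(mean = mean(y), se = sd(y) / sqrt(length(y))))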

1.2 Null hypothesis significance test framework

  1. State the null hypothesis: \(H_0:\mu_{normal} = \mu_{dim}\)
  2. Define the critical region for the statistical decision: \(\alpha\) = .05
  3. Collect sample data and calculate statistics: Student’s or Welch’s t-statistic
  4. Make a statistical decision: If \(\text{p-value} \le .05\), reject the null, otherwise, fail to reject the null
  5. Evaluate statistical conclusion validity
    • Type I error: Decision to reject \(H_0\) when it’s actually true
    • Type II error: Decision not to reject \(H_0\) when it’s actually false


1.3 Assumptions of Student’s and Welch’s t-statistic

Student’s t-statistic (William Gosset, 1908) was built on three assumptions about the populations from which the sample data were collected, and the degree to which these assumptions are met affects your statistical conclusion validity.

  1. Independence: Scores in the population are independently distributed
  2. Normality: Populations have normally distributed scores
  3. Homogeneity of variance: Populations have equal variance

Welch’s t-statistic (Bernard Welch, 1938) is a modified version of Student’s t in which the pooled variance is replaced by the separate variance of each sample and a more conservative degrees of freedom is used (a hand computation is sketched after the assumptions below). It performs similarly to Student’s t when the population variances are equal, and it is less prone to Type I error when they are not. It comes with only two assumptions.

  1. Independence: Scores in the population are independently distributed
  2. Normality: Populations have normally distributed scores
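The Welch modification described above can be sketched by hand for the present data; the values should match those from t.test(Discrep ~ light.cond, data = honest.df, var.equal = FALSE).

y.dim  <- honest.df$Discrep[honest.df$light.cond == "Dim"]   # dim condition scores
y.norm <- honest.df$Discrep[honest.df$light.cond == "Norm"]  # normal condition scores
v.dim  <- var(y.dim)  / length(y.dim)                        # each sample's variance of the mean
v.norm <- var(y.norm) / length(y.norm)
(mean(y.dim) - mean(y.norm)) / sqrt(v.dim + v.norm)          # Welch's t
(v.dim + v.norm)^2 /                                         # Welch-Satterthwaite df
  (v.dim^2 / (length(y.dim) - 1) + v.norm^2 / (length(y.norm) - 1))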


1.4 Student’s t-statistic

Student’s t-statistic is the ratio between the sample mean difference (numerator) and the estimated standard error of the mean difference (denominator), or simply the ratio of the treatment effect to estimated error.


\(\displaystyle t = \frac{\bar{Y}_{1}-\bar{Y}_{2}}{s_{\bar{Y}_{1} - \bar{Y}_{2}}} = \frac{\text{Treatment Effect}}{\text{Error}}\)
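Computed by hand from the sample means and the pooled variance, t for these data should reproduce the t.test() output below (t = 3.2057):

y.dim  <- honest.df$Discrep[honest.df$light.cond == "Dim"]
y.norm <- honest.df$Discrep[honest.df$light.cond == "Norm"]
n.dim  <- length(y.dim); n.norm <- length(y.norm)
sp2 <- ((n.dim - 1) * var(y.dim) + (n.norm - 1) * var(y.norm)) /  # pooled variance
  (n.dim + n.norm - 2)
se.diff <- sqrt(sp2 / n.dim + sp2 / n.norm)        # estimated SE of the mean difference
(mean(y.dim) - mean(y.norm)) / se.diff             # Student's t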


Find t and its associated p-value using the t.test() function:

t.test(Discrep ~ light.cond, data = honest.df, var.equal = TRUE) # use FALSE for Welch t-test
## 
##  Two Sample t-test
## 
## data:  Discrep by light.cond
## t = 3.2057, df = 82, p-value = 0.001921
## alternative hypothesis: true difference in means between group Dim and group Norm is not equal to 0
## 95 percent confidence interval:
##  0.3975129 1.6977252
## sample estimates:
##  mean in group Dim mean in group Norm 
##           4.047619           3.000000


Make a statistical decision: The results show that \(p \le .05\) so reject the null hypothesis, \(H_0:\mu_{normal} = \mu_{dim}\)
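The same decision can be made programmatically by saving the test result and extracting its p-value:

result <- t.test(Discrep ~ light.cond, data = honest.df, var.equal = TRUE)
result$p.value                                            # 0.001921
ifelse(result$p.value <= .05, "Reject H0", "Fail to reject H0")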


1.5 Effect size

Cohen’s d expresses the magnitude of the treatment effect in terms of standard deviation (pooled).

\(\displaystyle d = \frac{\bar{Y}_{1} - \bar{Y}_{2}}{\sqrt {s_{p}^{2}}}\)


Use the cohens_d() function in the effectsize package to find d:

library("effectsize")
cohens_d(Discrep ~ light.cond, data = honest.df)
## Cohen's d |       95% CI
## ------------------------
## 0.70      | [0.26, 1.14]
## 
## - Estimated using pooled SD.
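As a cross-check, d can also be computed by hand from the pooled variance defined in the formula above; the result should match the 0.70 reported by cohens_d():

y.dim  <- honest.df$Discrep[honest.df$light.cond == "Dim"]
y.norm <- honest.df$Discrep[honest.df$light.cond == "Norm"]
sp2 <- ((length(y.dim) - 1) * var(y.dim) + (length(y.norm) - 1) * var(y.norm)) /
  (length(y.dim) + length(y.norm) - 2)                 # pooled variance
(mean(y.dim) - mean(y.norm)) / sqrt(sp2)               # Cohen's d, about 0.70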


1.6 Reporting the results in APA-style

The following statements show how the results may be reported in an APA-style report.

An independent-samples t test showed that discrepancy scores were significantly higher in the dim light condition (M = 4.05, SD = 1.51) than in the normal light condition (M = 3.00, SD = 1.48), t(82) = 3.21, p = .002, d = 0.70.

2 Repeated-measures experiments

A study by Stephens, Atkins, and Kingston (2009) was designed to test the following research hypothesis:

Yelling a swear word during a painful experience may reduce the pain sensation, and if so, pain tolerance should be higher when yelling swear words during a painful laboratory task compared to when yelling neutral words.

Behavioral Task: The cold-pressor task is a measure of pain tolerance. Participants are asked to submerge their hand in icy water and to keep it there as long as they can before removing it from the water.

Independent variable (X): Sixty-seven college student participants completed two conditions with the cold-pressor task. During the “Neutral Words” condition, participants were instructed to yell non-swear words during the cold-pressor task. During the “Swear Words” condition, the same participants were instructed to yell swear words during the cold-pressor task. Condition order was counterbalanced across participants (e.g., half the participants experienced the Neutral -> Swear condition order and the other half experienced the reverse order).

Dependent variable (Y): Several measures were collected, but we will focus on latency, which is the number of seconds participants kept their hand in the icy water.

Load and view the data in the console:

SBS11.2 = read.csv("https://andrewebrandt.github.io/object/datasets/SBS11.2.csv", header = TRUE)
head(SBS11.2)
##   P gender neutral swear
## 1 1      M      87   102
## 2 2      M     124   159
## 3 3      F     147   162
## 4 4      M     201   224
## 5 5      M     105   150
## 6 6      F     110   125


2.1 Reshape to the long data format

The data in SBS11.2 are in wide format (i.e., latency scores from the neutral and swear conditions are in separate columns), so we will begin by reshaping it into a long data frame called “pain.df” using the melt() function in the reshape2 package:

library(reshape2)
pain.df <- melt(SBS11.2,                   # data frame to stack
     id.vars = c("P", "gender"),           # don't stack these variables
     measure.vars = c("neutral", "swear"), # stack these variables
     variable.name = "X.cond",             # name new X variable
     value.name = "Y.latency")             # name new Y variable
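A quick check of the reshaped data frame: it should contain one row per participant per condition (67 x 2 = 134 rows), with 67 latency scores in each condition.

head(pain.df)             # inspect the stacked X - Y pairs
nrow(pain.df)             # 134 rows = 67 participants x 2 conditions
table(pain.df$X.cond)     # 67 scores per word condition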

Once you have checked that the new data frame contains the correct X - Y pairs in each row, you can save the information to a new .csv file for later use (be sure to set your own file path).

write.csv(pain.df, "D:/My Drive/RStudio Working Directory/pain.df.csv", row.names = FALSE)


2.2 Describe and plot repeated-measures data

Load the psych package, then calculate and save a descriptive summary of the latency scores in the neutral and swear word conditions:

library(psych)
painStats.df <- describeBy(
  pain.df[4],                 # data frame, Y scores in fourth column
  pain.df$X.cond,             # grouping variable
  mat = TRUE,                 # matrix format
  digits = 2)                 # round values to 2 digits
painStats.df                  # show descriptives
##            item  group1 vars  n   mean    sd median trimmed   mad min max range
## Y.latency1    1 neutral    1 67 105.39 42.41    105  103.73 35.58  17 210   193
## Y.latency2    2   swear    1 67 134.94 41.58    131  134.15 38.55  51 239   188
##            skew kurtosis   se
## Y.latency1 0.33     0.07 5.18
## Y.latency2 0.19    -0.01 5.08


Load the ggplot2 package and plot the means and SEMs for each condition:

library(ggplot2)
ggplot(painStats.df, aes(x = group1, y = mean)) +    # plot X and Y
  geom_col(                                          # bar graph
    width = 0.5,
    color = "black",
    fill = hsv(0.3, 0.5, 0.7)) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),  # error bar limits (mean +/- SEM)
                color = "black",                          # error bar color
                width = .1) +                             # error bar width
  xlab("Word Condition") +
  scale_y_continuous(name = "Mean Latency Score", limits = c(0, 150)) +
  ggtitle("Pain tolerance across word conditions")


2.3 Statistical significance test

  1. State the null hypothesis: \(H_0:\mu_{neutral} = \mu_{swear}\)
  2. Define the critical region for the statistical decision: \(\alpha\) = .05
  3. Collect sample data and calculate statistics: Paired-samples t-statistic (same as the one-sample t-statistic)
  4. Make a statistical decision: If \(\text{p-value} \le .05\), reject the null, otherwise, fail to reject the null
  5. Evaluate statistical conclusion validity
    • Type I error: Decision to reject \(H_0\) when it’s actually true
    • Type II error: Decision not to reject \(H_0\) when it’s actually false

The paired-samples t-statistic is based on two assumptions:

  1. Independence: Scores in the population are independently distributed
  2. Normality: Populations have normally distributed scores


2.4 Paired-samples t-statistic

The paired-samples t-statistic is a one-sample t-test applied to difference scores (D); in this case, the difference between a participant’s latency scores in the neutral and swear conditions: \(D = Y_{neutral} - Y_{swear}\)


\(\displaystyle t = \frac{\bar{Y}_{D}}{s_{\bar{Y}_{D}}} = \frac{\text{Treatment Effect}}{\text{Error}}\)
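Computed by hand from the difference scores in the wide data frame, t should reproduce the t.test() output below (t = -22.962 with the neutral minus swear ordering):

D <- SBS11.2$neutral - SBS11.2$swear      # difference score for each participant
mean(D) / (sd(D) / sqrt(length(D)))       # paired-samples t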


t.test(
  Y.latency ~     # data, DV scores
    X.cond,       # grouping variable, IV condition labels
  data = pain.df, # data frame
  paired = TRUE)  # paired t-test 
## 
##  Paired t-test
## 
## data:  Y.latency by X.cond
## t = -22.962, df = 66, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -32.12184 -26.98264
## sample estimates:
## mean difference 
##       -29.55224


Make a statistical decision: The results show that \(p \le .05\) so reject the null hypothesis, \(H_0:\mu_{neutral} = \mu_{swear}\)


2.5 Effect size

Cohen’s d expresses the magnitude of the treatment effect in terms of standard deviation (pooled).


\(\displaystyle d = \frac{\bar{Y}_{1} - \bar{Y}_{2}}{\sqrt {s_{p}^{2}}}\)


library(effectsize)
cohens_d(
  Y.latency ~       # data, DV scores
    X.cond,        # grouping variable, IV condition labels
  data = pain.df)   # data frame
## Cohen's d |         95% CI
## --------------------------
## -0.70     | [-1.05, -0.35]
## 
## - Estimated using pooled SD.


2.6 Reporting the results in APA-style

The following statements show how the results may be reported in an APA-style report.

A paired-samples t test showed that latency scores were significantly higher in the swear word condition (M = 134.94, SD = 41.58) than in the neutral word condition (M = 105.39, SD = 42.41), t(66) = 22.96, p < .001, d = 0.70.