Researchers interested in the factors that affect memory encoding conducted a study to test the following hypothesis:
People presented as cheaters will attract greater attention, and will therefore be more memorable than people who are presented as trustworthy.
Design and Procedure: One hundred twenty college students were recruited for a study described as being about attractiveness, which hid the true purpose of the investigation. In the first session, participants were asked to rate the attractiveness of people in 10 mock newspaper articles, each of which included a photo and a brief written description. Prior to the start of the session, participants were randomly assigned to one of three conditions (X): 1) cheating description, 2) neutral description, or 3) trustworthy description. A week later, participants were invited back and tested on how many of the people they remembered seeing (from a mix of new and previously viewed images). Their accuracy on the memory task was the primary dependent measure (Y).
Load and view the data in the console:
memory.df <- read.csv("https://andrewebrandt.github.io/object/datasets/memory.csv", header = TRUE)
head(memory.df)
## Cond.X Correct.Y
## 1 N 7
## 2 N 8
## 3 N 7
## 4 N 8
## 5 N 8
## 6 N 8
This example is based on Mealey, Daood, and Krage (1996).
Provide clear condition labels in memory.df and change Cond.X from character to factor data type. See more on R data types in this tutorial.
library(dplyr)
memory.df <- memory.df %>% # overwrite data set
  mutate(Cond.X = case_when( # change condition labels
    Cond.X == "N" ~ "Neutral",
    Cond.X == "T" ~ "Trust",
    Cond.X == "C" ~ "Cheat")) %>%
  mutate(Cond.X = as.factor(Cond.X)) # change Cond.X from character to factor
head(memory.df)
## Cond.X Correct.Y
## 1 Neutral 7
## 2 Neutral 8
## 3 Neutral 7
## 4 Neutral 8
## 5 Neutral 8
## 6 Neutral 8
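The relabeling and the conversion to a factor can also be done in a single base R step. The sketch below uses a small hypothetical vector of condition codes (not the study data) and assumes the same N/T/C coding shown above. It is an alternative, not the method used in this example.

```r
# Hypothetical condition codes, for illustration only
cond <- c("N", "T", "C", "N")

# levels = the codes as stored; labels = the names to display.
# The order given here also becomes the order of the factor levels.
cond.f <- factor(cond,
                 levels = c("N", "T", "C"),
                 labels = c("Neutral", "Trust", "Cheat"))

levels(cond.f)
table(cond.f)
```

One design consideration: as.factor() orders levels alphabetically (Cheat, Neutral, Trust), while factor() with explicit levels keeps the order you supply. The level order affects how groups are listed in later output, so the two approaches are not interchangeable without checking downstream results.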
Generate a descriptive summary on memory scores across the cheat, neutral, and trustworthy conditions:
library(psych)
Describe.df <- describeBy(
  memory.df[2], # data frame, Y scores in the second column
  memory.df$Cond.X, # grouping variable
  mat = TRUE, # matrix format
  digits = 2) # round values to 2 digits
Describe.df # show descriptive statistics
## item group1 vars n mean sd median trimmed mad min max range
## Correct.Y1 1 Cheat 1 40 9.10 0.96 9 9.22 1.48 7 10 3
## Correct.Y2 2 Neutral 1 40 7.85 0.83 8 7.81 1.48 6 10 4
## Correct.Y3 3 Trust 1 40 7.92 0.83 8 7.91 1.48 6 10 4
## skew kurtosis se
## Correct.Y1 -0.71 -0.61 0.15
## Correct.Y2 0.27 -0.30 0.13
## Correct.Y3 0.13 -0.29 0.13
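The "se" column is the standard error of the mean, the standard deviation divided by the square root of the group size. As a quick check, the sketch below recomputes it from the rounded "sd" and "n" values printed above. Those values are hard-coded here as an assumption so the block runs without the data set.

```r
# Values copied from the describeBy() output above (rounded as printed)
desc <- data.frame(
  group = c("Cheat", "Neutral", "Trust"),
  n     = c(40, 40, 40),
  sd    = c(0.96, 0.83, 0.83))

# standard error of the mean: sd / sqrt(n)
desc$se <- round(desc$sd / sqrt(desc$n), 2)
desc
```

These are the values used for the error bars in the plot that follows.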
Plot the means and SEMs across the cheat, neutral, and trustworthy conditions:
library(ggplot2)
ggplot(Describe.df, aes(x = group1, y = mean)) + # plot X and Y
geom_col( # bar graph
width = 0.5,
color = "black",
fill = hsv(0.6, 0.4, 0.7)) +
geom_errorbar(aes(ymin = mean-se, ymax = mean+se),# calculate error bar
color = "black", # error bar color
width = .1) + # error bar size
scale_x_discrete(name = "Information Condition", # reorder conditions
limits = c("Neutral", "Trust", "Cheat")) +
scale_y_continuous(name = "Mean Recall Score", limits = c(0, 10)) +
ggtitle("Recall across information conditions") +
theme_minimal()
The F-statistic is a test of mean equivalence based on the ratio of treatment effect (\(MS_A\)) to error (\(MS_E\)). In the R source table, these values appear in the “Mean Sq” column.
Find the F-statistic using the aov() function:
aov.mem <- aov(Correct.Y ~ Cond.X, data = memory.df) # outcome variable ~ predictor variable
summary(aov.mem)
## Df Sum Sq Mean Sq F value Pr(>F)
## Cond.X 2 39.32 19.658 25.71 5.57e-10 ***
## Residuals 117 89.48 0.765
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
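The F ratio in the source table can be reproduced by hand from the sums of squares and degrees of freedom. The sketch below hard-codes the rounded values printed above (an assumption, so that it runs without the data), which means the result agrees with the table only to within rounding.

```r
# Values copied from the ANOVA source table above (rounded as printed)
ss.a <- 39.32   # treatment sum of squares (Cond.X)
ss.e <- 89.48   # error sum of squares (Residuals)
df.a <- 2       # a - 1, with a = 3 conditions
df.e <- 117     # N - a, with N = 120 participants

ms.a <- ss.a / df.a      # mean square for treatment, MS_A
ms.e <- ss.e / df.e      # mean square for error, MS_E
f.ratio <- ms.a / ms.e   # F = MS_A / MS_E

# upper-tail probability of this F under the null hypothesis
p.value <- pf(f.ratio, df.a, df.e, lower.tail = FALSE)

round(c(MS_A = ms.a, MS_E = ms.e, F = f.ratio), 3)
p.value
```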
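The ANOVA source table shows that \(p \le .05\), so the null hypothesis of equal population means is rejected: mean recall differed across the information conditions.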
Cohen’s f is a standardized measure of effect size for the overall difference among the condition means.
Find Cohen’s f:
# find Cohen's f
library(effectsize)
effectsize(aov.mem, type = "f")
## For one-way between subjects designs, partial eta squared is equivalent to eta squared.
## Returning eta squared.
## # Effect Size for ANOVA
##
## Parameter | Cohen's f | 95% CI
## -----------------------------------
## Cond.X | 0.66 | [0.49, Inf]
##
## - One-sided CIs: upper bound fixed at [Inf].
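The note in the output says the function works from eta squared. A common conversion from eta squared to Cohen's f is sketched below, using the rounded sums of squares from the source table. Treating this as the computation behind the reported value is an assumption, although it does reproduce 0.66.

\[
\hat{\eta}^2 = \frac{SS_A}{SS_A + SS_E} = \frac{39.32}{39.32 + 89.48} \approx 0.31,
\qquad
\hat{f} = \sqrt{\frac{\hat{\eta}^2}{1 - \hat{\eta}^2}} = \sqrt{\frac{SS_A}{SS_E}} = \sqrt{\frac{39.32}{89.48}} \approx 0.66
\]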
Researchers may want to check for extreme outliers and the three assumptions about the populations from which the sample data were collected (independent observations, normality, and homogeneity of variance).
Check for outliers (“is.outlier”) and extreme outliers (“is.extreme”):
library(rstatix)
memory.df %>%
  group_by(Cond.X) %>%
identify_outliers(Correct.Y)
## # A tibble: 2 × 4
## Cond.X Correct.Y is.outlier is.extreme
## <fct> <int> <lgl> <lgl>
## 1 Neutral 10 TRUE FALSE
## 2 Trust 10 TRUE FALSE
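The two flags follow the boxplot rule: a score is an outlier if it lies more than 1.5 × IQR beyond the first or third quartile, and extreme if it lies more than 3 × IQR beyond them. The base R sketch below applies that rule to a hypothetical vector of scores (not the study data) and assumes R's default quantile definition. It illustrates the rule and is not the package's own code.

```r
# Hypothetical recall scores for one condition (illustration only)
scores <- c(5, 7, 7, 8, 8, 8, 8, 8, 9, 11, 14)

q1  <- quantile(scores, 0.25, names = FALSE)  # first quartile
q3  <- quantile(scores, 0.75, names = FALSE)  # third quartile
iqr <- q3 - q1                                # interquartile range

# outlier: beyond 1.5 x IQR from the quartiles
is.outlier <- scores < q1 - 1.5 * iqr | scores > q3 + 1.5 * iqr
# extreme: beyond 3 x IQR from the quartiles
is.extreme <- scores < q1 - 3 * iqr | scores > q3 + 3 * iqr

data.frame(scores, is.outlier, is.extreme)
```

In this made-up vector, 5, 11, and 14 are flagged as outliers and only 14 is flagged as extreme.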
Use the Shapiro-Wilk test to assess normality (p < .001 suggests the sample has been drawn from a non-normal population distribution):
library(dplyr)
memory.df %>%
  group_by(Cond.X) %>%
  summarise(statistic = shapiro.test(Correct.Y)$statistic,
            p.value = shapiro.test(Correct.Y)$p.value)
## # A tibble: 3 × 3
## Cond.X statistic p.value
## <fct> <dbl> <dbl>
## 1 Cheat 0.817 0.0000154
## 2 Neutral 0.877 0.000424
## 3 Trust 0.881 0.000552
Use Levene’s test to assess homogeneity of variance (p < .05 suggests heterogeneous variance):
library(car)
leveneTest(Correct.Y ~ Cond.X, data = memory.df)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 1.0257 0.3618
## 117
“Tukey’s honestly significant difference (HSD) test controls the familywise error for the set of all possible pairwise comparisons. It is a simultaneous method for testing hypotheses or constructing confidence intervals, meaning that a single critical value is used to evaluate all contrasts in a set” (Myers, Well, & Lorch, 2010, p. 252).
With some “follow-up tests” like Tukey’s HSD, a significant F-statistic is not a prerequisite (Myers, Well, & Lorch, 2010). For other comparison options, see the DescTools package.
Use tukey_hsd() to run Tukey’s HSD test on recall scores (Correct.Y) between all possible pairs of conditions (Cond.X):
# Tukey's tests on all pairwise comparisons
library(rstatix)
q.mem <- memory.df %>%
  tukey_hsd(Correct.Y ~ Cond.X)
q.mem
## # A tibble: 3 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj p.adj…¹
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Cond.X Cheat Neutral 0 -1.25 -1.71 -0.786 1.04e-8 ****
## 2 Cond.X Cheat Trust 0 -1.18 -1.64 -0.711 6.49e-8 ****
## 3 Cond.X Neutral Trust 0 0.0750 -0.389 0.539 9.22e-1 ns
## # … with abbreviated variable name ¹p.adj.signif
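The "single critical value" in the quotation above can be made concrete. For equal group sizes, the half-width of each Tukey interval is the studentized range critical value multiplied by the square root of MS_E / n. The sketch below hard-codes values from the earlier output (an assumption, so that it runs without the data) and rebuilds the interval for the first row of the table.

```r
ms.e <- 0.765   # error mean square from the ANOVA source table
df.e <- 117     # error degrees of freedom
a    <- 3       # number of conditions
n    <- 40      # participants per condition

# studentized range critical value for a familywise alpha of .05
q.crit <- qtukey(0.95, nmeans = a, df = df.e)

# smallest mean difference that reaches significance (interval half-width)
hsd <- q.crit * sqrt(ms.e / n)

# 95% interval for the first row (estimate = -1.25)
est <- -1.25
c(lower = est - hsd, upper = est + hsd)
```

The half-width comes out at roughly 0.46, in line with the conf.low and conf.high columns above. The same half-width applies to all three comparisons because the group sizes are equal.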
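The output table shows the mean difference (“estimate”), confidence interval (“conf.low”, “conf.high”), and adjusted p-value (“p.adj”) for each pairwise comparison. Each estimate is the group2 mean minus the group1 mean, so the negative estimates in the first two rows mean that the neutral and trust conditions scored lower than the cheat condition. Taken together, the tests indicate that recall scores were significantly higher in the cheat condition than in the neutral or trust conditions, which did not differ significantly from each other.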
Cohen’s d for any pairwise comparison can be calculated by dividing the mean difference by the pooled standard deviation from the ANOVA model, \(d = (M_i - M_j) / \sqrt{MS_E}\), where \(MS_E = 0.765\) in this example.
Using the following two-step method, calculate and save the d values in a new data frame, cohen.df, then use kable() to show the results in a table.
# New data frame with comparison names in the first column and d values in second column
library(knitr) # provides kable()
cohen.df <- data.frame(
  compare = factor(c("Cheat - Neutral", "Cheat - Trust", "Neutral - Trust")),
  d = c(1.25/(sqrt(0.765)), 1.18/(sqrt(0.765)), 0.075/(sqrt(0.765)))) # mean difference / sqrt(MS_E)
# show comparison names and d values in a table
cohen.kbl <- kable(cohen.df, # makes a simple table
  format = "html", # .html format
  table.attr = "style='width:40%;'", # css control for column width
  digits = 2, # round values to 2 decimal places
  caption = "Effect size (d)", # table title
  col.names = c("Comparison","Cohen's d")) # set column names
cohen.kbl # show the table
Comparison | Cohen’s d |
---|---|
Cheat - Neutral | 1.43 |
Cheat - Trust | 1.35 |
Neutral - Trust | 0.09 |
In an APA style paper, the results could be reported as follows:
An independent-samples F-test showed that recall scores significantly differed across the cheat, neutral, and trust information conditions, F(2, 117) = 25.71, p < .001, \(\hat{f}\) = 0.66. Tukey’s tests showed that recall scores were significantly higher in the cheat condition (M = 9.10, SD = 0.96) than in the neutral condition (M = 7.85, SD = 0.83; p < .001, d = 1.43) and the trust condition (M = 7.92, SD = 0.83; p < .001, d = 1.35).