Chapter 9: Comparing Groups

Overview

When you have more than two groups, ANOVA lets you test whether their means differ in a statistically meaningful way. In this walkthrough, we extend the resume study analysis from earlier chapters to compare callback rates and experience levels across multiple categories. This walkthrough accompanies Chapter 9 of Margin of Error.

Setup

# Loads tidyverse, book color palette, and theme_moe()
# Download _common.R from the Datasets page if running locally
source("_common.R")

Load and Explore the Data

The dataset contains 4,870 fictitious resumes sent in response to job postings. Each row records whether the resume received a callback, along with attributes like name type, gender, quality, experience, and education level.

resumes <- read_csv("data/resume-callbacks.csv")

Rows: 4870 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): name_type, gender, resume_quality, education, callback
dbl (2): resume_id, years_experience

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(resumes)

Rows: 4,870
Columns: 7
$ resume_id        <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
$ name_type        <chr> "White-sounding", "White-sounding", "Black-sounding",…
$ gender           <chr> "Female", "Female", "Female", "Female", "Female", "Ma…
$ resume_quality   <chr> "Low", "High", "Low", "High", "High", "Low", "High", …
$ years_experience <dbl> 6, 6, 6, 6, 22, 6, 5, 21, 3, 6, 8, 8, 4, 4, 5, 4, 5, …
$ education        <chr> "Bachelor", "High School", "Bachelor", "High School",…
$ callback         <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No",…

Before jumping into ANOVA, let’s get a feel for callback rates across the key grouping variables.

resumes |>
  group_by(resume_quality, education) |>
  summarize(
    n = n(),
    callback_rate = round(mean(callback) * 100, 1),
    .groups = "drop"
  )

Warning: There were 4 warnings in `summarize()`.
The first warning was:
ℹ In argument: `callback_rate = round(mean(callback) * 100, 1)`.
ℹ In group 1: `resume_quality = "High"` `education = "Bachelor"`.
Caused by warning in `mean.default()`:
! argument is not numeric or logical: returning NA
ℹ Run `dplyr::last_dplyr_warnings()` to see the 3 remaining warnings.

# A tibble: 4 × 4
  resume_quality education       n callback_rate
  <chr>          <chr>       <int>         <dbl>
1 High           Bachelor     1771            NA
2 High           High School   675            NA
3 Low            Bachelor     1733            NA
4 Low            High School   691            NA

One-Way ANOVA: Experience by Education Level

Does years of experience differ by education level? A one-way ANOVA tests whether the group means are equal.

aov_model <- aov(years_experience ~ education, data = resumes)
summary(aov_model)

              Df Sum Sq Mean Sq F value Pr(>F)
education      1     69   68.56   2.695  0.101
Residuals   4868 123838   25.44

The F-statistic measures how much the group means vary relative to within-group variation. A large F (with a small p-value) suggests that at least one education group has a meaningfully different average experience level.

Tukey HSD: Pairwise Comparisons

ANOVA tells us something differs, but not what. Tukey’s Honest Significant Difference test compares every pair of groups while controlling for multiple comparisons.

tukey_result <- TukeyHSD(aov_model)
tukey_result

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = years_experience ~ education, data = resumes)

$education
                          diff         lwr       upr     p adj
High School-Bachelor 0.2641073 -0.05129538 0.5795099 0.1007346

Look for confidence intervals that do not cross zero — those pairs have statistically significant differences.

Effect Size: Eta-Squared

Statistical significance does not tell us how large the effect is. Eta-squared gives the proportion of total variance explained by the grouping variable.

aov_summary <- summary(aov_model)[[1]]
ss_between <- aov_summary[["Sum Sq"]][1]
ss_total <- sum(aov_summary[["Sum Sq"]])
eta_sq <- ss_between / ss_total

cat("Eta-squared:", round(eta_sq, 4), "\n")

Eta-squared: 6e-04

cat("Interpretation:", round(eta_sq * 100, 1), "% of variance in experience is explained by education level\n")

Interpretation: 0.1 % of variance in experience is explained by education level

Guidelines: eta-squared of 0.01 is small, 0.06 is medium, and 0.14 or above is large.

Diagnostic Plots

ANOVA assumes normality of residuals and equal variances across groups. The built-in diagnostic plots help you check these assumptions.

par(mfrow = c(2, 2))
plot(aov_model)
par(mfrow = c(1, 1))

Residuals vs Fitted: look for a random scatter (no pattern means the equal-variance assumption holds).
Normal Q-Q: points should fall roughly on the diagonal line.
Scale-Location: a flat trend line supports equal variance.
Residuals vs Leverage: flags influential observations.

Visualization: Experience by Education

A boxplot with jittered points shows both the distribution shape and individual data points.

ggplot(resumes, aes(x = education, y = years_experience, fill = education)) +
  geom_boxplot(alpha = 0.6, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.15, size = 0.8, color = moe_colors$navy) +
  scale_fill_moe() +
  labs(
    title = "Years of Experience by Education Level",
    subtitle = "Boxplots with individual observations (jittered)",
    x = "Education Level",
    y = "Years of Experience"
  ) +
  theme_moe() +
  theme(legend.position = "none")

Figure 2: Years of experience by education level

Try It Yourself

Two-way ANOVA. Run an ANOVA comparing years_experience across all combinations of name_type and resume_quality. Use the formula aov(years_experience ~ name_type * resume_quality, data = resumes). Examine the summary — is there a significant interaction between name type and resume quality?
QQ plot of residuals. Extract residuals from your two-way ANOVA model with resid() and create a QQ plot using qqnorm() and qqline(). Do the residuals appear approximately normal?