Datasets

All datasets used in Margin of Error are available for download as CSV files. See Appendix B of the book for complete variable descriptions, source documentation, and notes on simulation methodology.

Download

Dataset	Chapter	Rows	Description	Download
`flint-water-lead.csv`	1	271	Lead levels from the Virginia Tech Flint Water Study (2015)	Download
`sampling-demo.csv`	2	200	Simulated student population for sampling-method demos	Download
`acs-household-income.csv`	3	1,200	U.S. household income from the 2022 American Community Survey 1-Year PUMS	Download
`housing-redlining.csv`	4	551	HOLC redlining grades linked to modern demographics	Download
`spurious-correlations.csv`	4	10	Pairs of variables with high correlation but no causal link	Download
`covid-testing.csv`	5	2,000	Simulated COVID test results with true infection status	Download
`birth-weights.csv`	6	944	Birth weights and maternal characteristics (OpenIntro `births14`)	Download
`polling-data.csv`	7	50	Real U.S. election polls with margins of error (FiveThirtyEight)	Download
`energy-reports.csv`	8	200	Home energy report experiment (treatment vs. control)	Download
`resume-callbacks.csv`	9	4,870	Bertrand & Mullainathan (2004) résumé audit study	Download
`education-spending.csv`	10	50	State-level education spending and test scores	Download
`wage-gap.csv`	11	534	CPS 1985 wage data with gender, experience, education, and occupation	Download

Loading Data in R

```r
library(tidyverse)

# Option 1: Download the CSV and read from your local working directory
flint <- read_csv("flint-water-lead.csv")

# Option 2: Read directly from this site
flint <- read_csv("https://stats.marginoferrormedia.com/datasets/flint-water-lead.csv")
```
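If you switch between a downloaded copy and the online version, a small wrapper can check the working directory first and fall back to the URL. This is a minimal sketch that assumes every file in the table is hosted under the same `/datasets/` path as the Flint file; `moe_base_url`, `moe_url()` and `read_moe()` are hypothetical names, not part of the book's code.

```r
# Hypothetical helpers: moe_base_url, moe_url() and read_moe() are
# illustrative names, not part of the book's code.
# Assumption: every CSV in the dataset table is served from the same
# /datasets/ path as flint-water-lead.csv.
moe_base_url <- "https://stats.marginoferrormedia.com/datasets"

# Build the full download URL for one dataset file name.
moe_url <- function(file) {
  paste0(moe_base_url, "/", file)
}

# Read a dataset from the working directory if a local copy exists
# (Option 1); otherwise read it directly from the site (Option 2).
read_moe <- function(file) {
  source_path <- if (file.exists(file)) file else moe_url(file)
  readr::read_csv(source_path, show_col_types = FALSE)
}
```

For example, `flint <- read_moe("flint-water-lead.csv")` reads the local file if you have downloaded it and otherwise fetches it from the site.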

The `_common.R` setup file

The R Companion walkthroughs each begin with `source("_common.R")`. That file loads tidyverse and defines the book's color palette and the `theme_moe()` ggplot theme, so the figures match the book's look. To run a walkthrough on your own machine, download `_common.R` and place it in your R working directory, which is usually the folder that holds your script. If you would rather skip it, replace the `source()` line with `library(tidyverse)`; the analyses still run, and only the styling changes.
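If you share scripts with readers who may or may not have downloaded `_common.R`, you can make that choice automatic. This is a minimal sketch, not the book's own setup code; `use_moe_setup()` is a hypothetical name.

```r
# Hypothetical helper (not part of the book's code): source the setup
# file when it exists, otherwise attach tidyverse with default styling.
use_moe_setup <- function(path = "_common.R") {
  if (file.exists(path)) {
    # _common.R loads tidyverse and defines the palette and theme_moe()
    source(path)
    invisible("common")
  } else {
    # Analyses still run; figures use ggplot2's default look
    library(tidyverse)
    invisible("tidyverse")
  }
}
```

Calling `use_moe_setup()` at the top of a walkthrough, in place of `source("_common.R")`, picks up the book's styling when the file is present and otherwise falls back to plain tidyverse.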

Data Sources

The datasets are a mix of real public data and carefully simulated data calibrated to published research. Real datasets include the Virginia Tech Flint Water Study, FiveThirtyEight's redlining and polling collections, the Bertrand & Mullainathan résumé audit, OpenIntro's births14, and the CPS 1985 wage data from the AER package. Simulated datasets reproduce key statistical features of the published studies while providing individual-level records that can be shared freely for teaching.

See Appendix B for full source citations, variable codebooks, and simulation notes.
