```r
# Load the shared theme (optional; download _common.R from the Datasets page)
source("_common.R")
# Or just load the tidyverse if you don't need the book's theme
# library(tidyverse)

# Use one of the two options below; both create the same `flint` data frame
# Option 1: download the CSV and load it from your local machine
flint <- read_csv("flint-water-lead.csv")

# Option 2: load it directly from the companion site
flint <- read_csv("https://stats.marginoferrormedia.com/data/flint-water-lead.csv")
```

Datasets
All datasets used in Margin of Error are available for download as CSV files. See Appendix B in the book for complete variable descriptions and source documentation.
Download
| Dataset | Chapter | Rows | Description | Download |
|---|---|---|---|---|
| flint-water-lead.csv | 1 | 271 | Lead levels in Flint, Michigan water samples | Download |
| sampling-demo.csv | 2 | 200 | Simulated student population for sampling methods | Download |
| income-inequality.csv | 3 | 500 | Household income data by education, region, and household size | Download |
| housing-redlining.csv | 4 | 551 | HOLC redlining grades linked to modern demographics | Download |
| spurious-correlations.csv | 4 | 10 | Pairs of variables with high correlation but no causal link | Download |
| covid-testing.csv | 5 | 2,000 | Simulated COVID test results with true infection status | Download |
| birth-weights.csv | 6 | 944 | Birth weights and maternal characteristics | Download |
| polling-data.csv | 7 | 50 | Real U.S. election polls with margins of error | Download |
| energy-reports.csv | 8 | 200 | Home energy report experiment (treatment vs. control) | Download |
| resume-callbacks.csv | 9 | 4,870 | Resume audit study on racial discrimination in hiring | Download |
| education-spending.csv | 10 | 50 | State-level education spending and test scores | Download |
| wage-gap.csv | 11 | 534 | Wage data with gender, experience, education, and occupation | Download |
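If you download the CSVs by hand, a quick sanity check is to compare each loaded data frame's row count with the Rows column of the table above. The sketch below hard-codes those counts. check_rows() is a hypothetical helper, not something _common.R provides, and the sketch assumes the Rows figures count data rows only (excluding the CSV header line), which is what nrow() reports after read_csv().

```r
# Expected row counts, copied from the Rows column of the Download table.
# Assumption: these count data rows, not the header line.
expected_rows <- c(
  "flint-water-lead.csv"      = 271L,
  "sampling-demo.csv"         = 200L,
  "income-inequality.csv"     = 500L,
  "housing-redlining.csv"     = 551L,
  "spurious-correlations.csv" = 10L,
  "covid-testing.csv"         = 2000L,
  "birth-weights.csv"         = 944L,
  "polling-data.csv"          = 50L,
  "energy-reports.csv"        = 200L,
  "resume-callbacks.csv"      = 4870L,
  "education-spending.csv"    = 50L,
  "wage-gap.csv"              = 534L
)

# Hypothetical helper: returns TRUE if `df` has the row count listed for
# `file` (matched on the file name, ignoring any folder), and warns otherwise.
check_rows <- function(df, file) {
  name <- basename(file)
  expected <- expected_rows[[name]]
  ok <- nrow(df) == expected
  if (!ok) {
    warning(sprintf("%s: expected %d rows, got %d", name, expected, nrow(df)))
  }
  ok
}

# Usage (assumes flint-water-lead.csv is in the working directory):
# flint <- readr::read_csv("flint-water-lead.csv", show_col_types = FALSE)
# check_rows(flint, "flint-water-lead.csv")
```

A mismatch usually means a partial download or a file saved with extra lines, so re-downloading is the first thing to try.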
R Setup File
The R Walkthroughs on this site use a shared setup file called _common.R. It loads the tidyverse, defines the book’s color palette, and creates a custom ggplot theme (theme_moe()) so all plots have a consistent look.
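If you want to run the walkthrough code on your own machine, download this file and place it in the same folder as your R script or R Markdown document. Strictly speaking, source() looks for the file in R's current working directory (check it with getwd()). When you knit an R Markdown document, that is the document's folder by default, but when you run a plain script you may need to set the working directory to the script's folder first.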
The file defines:

- moe_colors: a named list of colors used throughout the book (navy, teal, amber, coral, etc.)
- moe_palette: a discrete color scale for ggplot
- theme_moe(): a clean ggplot theme with book-consistent styling
- scale_color_moe() and scale_fill_moe(): convenience wrappers
Each walkthrough begins with source("_common.R"). If you prefer not to use it, you can replace that line with library(tidyverse) and the code will still work — the plots will just use default ggplot styling instead of the book’s theme.
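To make one script work whether or not _common.R is present, you can wrap that choice in a small function. setup_moe() is a hypothetical name used only for this sketch: it sources the file when it exists in the working directory and otherwise attaches the tidyverse, which is the fallback described above.

```r
# Hypothetical helper: load the book's setup file if it is in the working
# directory; otherwise attach the tidyverse and keep default ggplot2 styling.
setup_moe <- function(path = "_common.R") {
  if (file.exists(path)) {
    # source() defaults to local = FALSE, so everything the file defines
    # (theme_moe(), moe_colors, ...) lands in the global environment
    source(path)
    "book theme"
  } else {
    library(tidyverse)
    "default ggplot2 styling"
  }
}

# Usage, in place of source("_common.R"):
# setup_moe()
```

The returned string simply reports which branch ran; you can ignore it or print it as a reminder of which styling your plots will use.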
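Loading Data in R
Each dataset loads with read_csv() from the readr package (attached by the tidyverse, and therefore by _common.R), either from a downloaded copy of the CSV or directly from the companion site.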
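If you work with several datasets, a small helper can prefer a local copy and fall back to the companion site. This sketch assumes every CSV is hosted under the same /data/ path as the Flint file; only the Flint URL is confirmed on this page. moe_source() and load_moe_data() are hypothetical helper names, and the reader argument is there so you can swap in another CSV reader.

```r
# Assumption: all datasets share the /data/ path used by the Flint example.
moe_base_url <- "https://stats.marginoferrormedia.com/data/"

# Hypothetical helper: return the local path if the file exists in the
# working directory, otherwise the (assumed) companion-site URL.
moe_source <- function(file, base_url = moe_base_url) {
  if (file.exists(file)) file else paste0(base_url, file)
}

# Hypothetical helper: read a dataset, using readr::read_csv() by default
# (read_csv() accepts both local paths and URLs).
load_moe_data <- function(file, base_url = moe_base_url, reader = readr::read_csv) {
  reader(moe_source(file, base_url))
}

# Usage:
# wages <- load_moe_data("wage-gap.csv")
```

Checking for a local file first means a script keeps working offline once you have downloaded the CSVs.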
Data Sources
Datasets are a mix of real public data and carefully simulated data calibrated to published research. Real datasets include the Flint water study, FiveThirtyEight’s redlining analysis, the Bertrand & Mullainathan resume audit experiment, NCES education spending data, and CPS wage data. Simulated datasets reproduce key statistical features from published studies while making individual-level data freely available for teaching.
See Appendix B in the book for full source citations, variable codebooks, and notes on simulation methodology.