What’s Inside

Twelve chapters. Twelve real stories. Each chapter opens with a case that shows why the statistics matter, then gives you the tools to analyze it yourself.

Chapter 1: Why Statistics Matters Now

In 2014, the water in Flint, Michigan was poisoned. Officials said it was safe. Two researchers — two datasets, two statistical analyses — forced the truth into the open.

What you’ll learn:

  • What statistics is (and is not)
  • Variables, data types, and levels of measurement
  • How to read and critique data summaries
  • Why context matters more than calculations

Dataset: flint-water-lead.csv Key Concept: Statistical thinking

Chapter 2: Asking Good Questions: Research Design

In 2012, Facebook quietly altered the News Feeds of 689,000 users to test whether emotional content could change their moods. The results were modest. The backlash was volcanic. The problem was not the findings — it was the design.

What you’ll learn:

  • Observational vs. experimental studies
  • Populations and samples; sampling methods (SRS, stratified, cluster, systematic)
  • Bias, confounding, and ethics
  • How research design determines what you can conclude

Interactive Tool: Sampling Explorer Dataset: sampling-demo.csv Key Concept: Research design determines what you can conclude

Chapter 3: Summarizing Data with Numbers

The mean says U.S. household income is $105,000. The median says $75,000. Same data, same country, thirty-thousand-dollar gap. Whichever number a politician reports tells you more about their agenda than about the economy.

What you’ll learn:

  • Mean, median, and mode
  • Standard deviation, IQR, and range
  • Skewness and outliers
  • Choosing the right summary for the shape of your data

Dataset: income-inequality.csv Key Concept: Summary statistics are choices, not facts
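
To see how a mean and a median can disagree on the same data, here is a minimal sketch. The nine incomes are invented for illustration (they are not taken from income-inequality.csv) and were chosen so that one very high earner pulls the mean well above the median.

```python
from statistics import mean, median

# Hypothetical household incomes in dollars, made up for illustration.
# One very high value skews the distribution to the right.
incomes = [28_000, 41_000, 52_000, 63_000, 75_000,
           88_000, 104_000, 150_000, 344_000]

avg = mean(incomes)    # sensitive to the single large value
mid = median(incomes)  # the middle value, unaffected by how large the top value is

print(f"mean   = ${avg:,.0f}")
print(f"median = ${mid:,.0f}")
print(f"gap    = ${avg - mid:,.0f}")
```

Replacing the top income with an even larger number moves the mean but leaves the median where it is, which is why the shape of the data should guide the choice of summary.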

Chapter 4: Summarizing Data with Pictures

In the 1930s, federal agents color-coded American neighborhoods — green for “best” (white neighborhoods), red for “hazardous” (Black neighborhoods). Eighty years later, 74% of those redlined neighborhoods are still low-income. The numbers told the story. The maps made you see it.

What you’ll learn:

  • Histograms, boxplots, bar charts, and scatter plots
  • Correlation vs. causation
  • How visualizations can mislead
  • The data-ink ratio

Interactive Tool: Correlation Game Dataset: housing-redlining.csv, spurious-correlations.csv Key Concept: Visualization reveals what tables hide

Chapter 5: Probability: Thinking About Chance

Your COVID test is positive. The test is 90% accurate. So there’s a 90% chance you’re infected, right? Wrong. In a low-prevalence population, the real probability can be about 15%. That means more than 8 in 10 positive results are false alarms.

What you’ll learn:

  • Probability rules
  • Conditional probability and independence
  • Bayes’ theorem
  • The base rate fallacy

Interactive Tool: Distribution Explorer Dataset: covid-testing.csv Key Concept: Bayes’ theorem
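
The arithmetic behind a result like this can be sketched with Bayes’ theorem. The inputs below are assumptions chosen for illustration: "90% accurate" is read as 90% sensitivity and 90% specificity, and prevalence is set to 2%. None of these values come from covid-testing.csv, and the function name is hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return P(infected | positive test) using Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Assumed inputs: 2% prevalence, 90% sensitivity, 90% specificity.
ppv = positive_predictive_value(prevalence=0.02,
                                sensitivity=0.90,
                                specificity=0.90)

print(f"P(infected | positive) = {ppv:.1%}")
print(f"Share of positives that are false alarms = {1 - ppv:.1%}")
```

Raising the assumed prevalence raises the result sharply, which is the base rate fallacy in miniature: the same test means different things in different populations.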

Chapter 6: The Normal Distribution and the Central Limit Theorem

Within minutes of being born, every baby gets weighed. That number gets compared against a distribution. If the baby falls in the tails, clinical decisions happen fast. The bell curve is not just a shape — it is a decision-making tool.

What you’ll learn:

  • Normal distribution properties
  • Z-scores and the empirical rule (68-95-99.7)
  • The Central Limit Theorem
  • When and why normality matters

Interactive Tool: Distribution Explorer Dataset: birth-weights.csv Key Concept: The Central Limit Theorem
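
As a rough sketch of z-scores and the empirical rule, the snippet below uses Python’s standard-library `NormalDist`. The mean of 3,400 g and standard deviation of 500 g are assumed round numbers for illustration, not values from birth-weights.csv.

```python
from statistics import NormalDist

# Assumed, illustrative parameters in grams.
birth_weights = NormalDist(mu=3400, sigma=500)

def z_score(x, dist):
    """Number of standard deviations x lies from the mean of dist."""
    return (x - dist.mean) / dist.stdev

# A hypothetical 2,400 g baby sits two standard deviations below the mean.
z = z_score(2400, birth_weights)
print(f"z = {z:.1f}")

# Empirical rule: probability of falling within k standard deviations.
coverage = {}
for k in (1, 2, 3):
    upper = birth_weights.mean + k * birth_weights.stdev
    lower = birth_weights.mean - k * birth_weights.stdev
    coverage[k] = birth_weights.cdf(upper) - birth_weights.cdf(lower)
    print(f"within {k} SD: {coverage[k]:.1%}")
```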

Chapter 7: Confidence Intervals

On election night 2016, the polls “failed.” Except they didn’t. Clinton +3 with a margin of error of plus or minus 4 meant Trump +1 was always within the range. The polls said exactly that. America heard something else.

What you’ll learn:

  • Point estimates vs. interval estimates
  • Constructing confidence intervals for means and proportions
  • Interpreting margin of error
  • What “95% confidence” actually means

Interactive Tool: CI Simulator Dataset: polling-data.csv Key Concept: Margin of error
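
A margin of error like "plus or minus 4" can be approximated with the standard formula for a proportion. This is a sketch under assumptions: a simple random sample, a hypothetical poll of 600 respondents with 48% support, and a normal approximation. The numbers are not taken from polling-data.csv.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 48% support among 600 respondents.
p_hat, n = 0.48, 600
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe

print(f"margin of error = ±{moe:.1%}")
print(f"95% CI = ({low:.1%}, {high:.1%})")
```

Because the margin shrinks with the square root of the sample size, quadrupling the number of respondents only halves it.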

Chapter 8: Hypothesis Testing

Opower sent 10 million households a letter comparing their energy use to their neighbors’. Usage dropped 2%. Was it real — or was it noise? This is the question hypothesis testing was built to answer.

What you’ll learn:

  • Null and alternative hypotheses
  • Test statistics and p-values
  • Type I and Type II errors; statistical power
  • P-hacking and the replication crisis

Interactive Tool: Hypothesis Playground, P-Hacking Simulator Dataset: energy-reports.csv Key Concept: P-values and their misinterpretation
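
One way to frame "real or noise" is a two-sample z-test on group means. The sketch below uses invented summary statistics (a 2% lower mean in the group that received letters, equal standard deviations, 5,000 households per group); they are assumptions for illustration, not figures from energy-reports.csv, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Two-sided z-test for a difference in means (large samples)."""
    standard_error = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / standard_error
    p = 2 * NormalDist().cdf(-abs(z))  # probability in both tails
    return z, p

# Hypothetical monthly usage in kWh: letter group vs. control group.
z_stat, p_value = two_sample_z(mean_a=980, mean_b=1000,
                               sd_a=300, sd_b=300,
                               n_a=5000, n_b=5000)

print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```

With smaller groups the same 2% difference would produce a larger p-value, which is why sample size and statistical power belong in the same conversation as the p-value itself.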

Chapter 9: Comparing Groups

Economists sent 5,000 resumes to real job postings. The resumes were matched in quality, and the only difference was the name at the top: half had names like Emily and Greg, half had names like Lakisha and Jamal. White-sounding names received 50% more callbacks. Higher qualifications helped white-sounding names. For Black-sounding names, they made almost no difference.

What you’ll learn:

  • Two-sample t-tests
  • ANOVA and post-hoc comparisons (Tukey)
  • Effect sizes
  • Ethics of group comparison research

Interactive Tool: ANOVA Visualizer Dataset: resume-callbacks.csv Key Concept: Comparing group means

Chapter 10: Simple Linear Regression

Per-pupil spending on one axis, test scores on the other, one dot per state. Advocates see a trend. Skeptics see noise. The question is not whether there’s a pattern — it’s how to describe it precisely enough that people can agree on what the data says.

What you’ll learn:

  • The least-squares line
  • Slope and intercept interpretation
  • R-squared and residual analysis
  • Assumptions and diagnostics

Interactive Tool: Regression Explorer Dataset: education-spending.csv Key Concept: Fitting a line through the noise
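
The least-squares line, its slope and intercept, and R-squared can all be computed from a few sums. The five data points below are hypothetical (spending in thousands of dollars per pupil, average test score) and are not drawn from education-spending.csv.

```python
def least_squares(xs, ys):
    """Fit y = intercept + slope * x and return (slope, intercept, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance in y explained by x
    return slope, intercept, r_squared

# Hypothetical data: per-pupil spending ($ thousands) and average score.
spending = [8, 10, 12, 14, 16]
scores = [250, 262, 259, 271, 268]

slope, intercept, r_squared = least_squares(spending, scores)
print(f"score = {intercept:.1f} + {slope:.2f} * spending")
print(f"R-squared = {r_squared:.2f}")
```

Here the slope reads as the expected change in score for each additional thousand dollars of spending, an association in these made-up points rather than evidence of cause.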

Chapter 11: Multiple Regression

The average woman earns 79 cents for every dollar a man earns. “Control for occupation and experience,” one side says, “and the gap nearly vanishes.” “You can’t control for occupation,” the other replies, “when occupation itself reflects discrimination.” What does “controlling for” actually mean?

What you’ll learn:

  • Multiple predictors and partial slopes
  • Interaction terms and multicollinearity
  • Adjusted R-squared
  • What “controlling for” really means — and its limits

Interactive Tool: Regression Explorer Dataset: wage-gap.csv Key Concept: What “controlling for” really means

Chapter 12: Where Do You Go from Here?

You have learned to read data, quantify uncertainty, and test claims. This final chapter is a roadmap — a guided tour of the statistical landscape that lies ahead.

What you’ll learn:

  • Time series and seasonality
  • Machine learning (logistic regression, train/test split)
  • Bayesian thinking (updating beliefs with data)
  • Causal inference methods; where to go next

Key Concept: The road ahead