R Companion — Chapter 7

R Companion — Chapter 7: Confidence Intervals

This walkthrough builds confidence intervals two ways — manually from the formula and via R’s built-in functions — then runs a coverage simulation that makes the 95% claim visible.

Setup

source("_common.R")
Warning: package 'tidyverse' was built under R version 4.5.2
Warning: package 'ggplot2' was built under R version 4.5.2
Warning: package 'tibble' was built under R version 4.5.2
Warning: package 'tidyr' was built under R version 4.5.2
Warning: package 'readr' was built under R version 4.5.2
Warning: package 'purrr' was built under R version 4.5.2
Warning: package 'dplyr' was built under R version 4.5.2
Warning: package 'stringr' was built under R version 4.5.2
Warning: package 'forcats' was built under R version 4.5.2
Warning: package 'lubridate' was built under R version 4.5.2
set.seed(42)

Download _common.R to run this locally — it loads tidyverse and the book’s plot theme. Or replace the source() line with library(tidyverse).

CI by hand

Given a sample mean of 72, SD of 12, and n of 36, the 95% CI for the population mean is:

\[ \bar{x} \pm t^{*} \cdot \frac{s}{\sqrt{n}} \]

xbar <- 72
s    <- 12
n    <- 36

t_star <- qt(0.975, df = n - 1)
margin <- t_star * s / sqrt(n)

c(lower = xbar - margin, upper = xbar + margin)
   lower    upper 
67.93978 76.06022 

qt(0.975, df = 35) looks up the t-value that puts 97.5% of the t-distribution below it. With 35 degrees of freedom, that is roughly 2.03.

Same CI via t.test()

sample_data <- rnorm(36, mean = 72, sd = 12)

result <- t.test(sample_data, conf.level = 0.95)
result$conf.int
[1] 67.90930 77.71205
attr(,"conf.level")
[1] 0.95

The CI from t.test() matches the by-hand calculation in form (slightly different in value because the simulated sample mean isn’t exactly 72). When you have actual data, t.test() is the way you compute a CI in practice.

CI for a proportion

Suppose 220 of 400 voters support a measure.

phat   <- 220 / 400
n_prop <- 400
z_star <- qnorm(0.975)

margin_p <- z_star * sqrt(phat * (1 - phat) / n_prop)
c(lower = phat - margin_p, upper = phat + margin_p) |> round(4)
 lower  upper 
0.5012 0.5988 
prop.test(220, 400, conf.level = 0.95, correct = FALSE)$conf.int
[1] 0.5010010 0.5980478
attr(,"conf.level")
[1] 0.95

prop.test() uses a slightly different formula (Wilson interval by default, classical with correct = FALSE). Both methods give similar answers in the middle of the parameter space; they diverge near 0 or 1.

Real polling data

polls <- read_csv("../datasets/polling-data.csv")
glimpse(polls)
Rows: 50
Columns: 8
$ poll_id          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
$ pollster         <chr> "KFF", "KFF", "RMG Research", "SSRS", "The New York T…
$ date             <date> 2020-09-06, 2020-09-06, 2020-09-11, 2020-09-11, 2020…
$ sample_size      <dbl> 1172, 1298, 941, 787, 440, 2006, 523, 1988, 233, 810,…
$ candidate_a_pct  <dbl> 45.00, 45.00, 48.00, 49.00, 47.00, 47.00, 45.00, 48.0…
$ candidate_b_pct  <dbl> 43.00, 40.00, 43.00, 46.00, 45.00, 37.00, 45.00, 41.0…
$ margin_of_error  <dbl> 2.8, 2.7, 3.2, 3.5, 4.7, 2.2, 4.3, 2.2, 6.4, 3.4, 0.7…
$ confidence_level <dbl> 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95,…
# Pick one poll: candidate A's support, computed CI by hand
one_poll <- polls |> slice(1)
one_poll
# A tibble: 1 × 8
  poll_id pollster date       sample_size candidate_a_pct candidate_b_pct
    <dbl> <chr>    <date>           <dbl>           <dbl>           <dbl>
1       1 KFF      2020-09-06        1172              45              43
# ℹ 2 more variables: margin_of_error <dbl>, confidence_level <dbl>
p     <- one_poll$candidate_a_pct / 100
n_p   <- one_poll$sample_size
moe_pct <- qnorm(0.975) * sqrt(p * (1 - p) / n_p) * 100
c(lower = one_poll$candidate_a_pct - moe_pct,
  upper = one_poll$candidate_a_pct + moe_pct) |> round(2)
lower upper 
42.15 47.85 

The reported margin of error in the dataset (margin_of_error) is the same number (within rounding) — it comes from this same formula.

The coverage simulation: what 95% confidence really means

true_mean <- 50
true_sd   <- 10
n_each    <- 30
n_sim     <- 100

sims <- map_dfr(1:n_sim, function(i) {
  x <- rnorm(n_each, mean = true_mean, sd = true_sd)
  tt <- t.test(x, conf.level = 0.95)
  tibble(
    interval = i,
    lower    = tt$conf.int[1],
    upper    = tt$conf.int[2],
    captures = lower <= true_mean & upper >= true_mean
  )
})

mean(sims$captures)
[1] 0.95
ggplot(sims, aes(y = interval)) +
  geom_segment(aes(x = lower, xend = upper, yend = interval,
                   color = captures), linewidth = 0.5) +
  geom_vline(xintercept = true_mean, linetype = "dashed",
             color = moe_colors$navy) +
  scale_color_manual(values = c("TRUE"  = moe_colors$teal,
                                "FALSE" = moe_colors$coral),
                     labels = c("TRUE" = "Captures truth",
                                "FALSE" = "Misses")) +
  labs(x = "Value", y = "Interval", color = NULL,
       title = paste0(round(100 * mean(sims$captures)),
                      " of 100 intervals captured the true mean")) +
  theme_moe()
Figure 23.1: 100 confidence intervals from independent samples of size n = 30 drawn from N(50, 10). Most contain the true mean (50, dashed line); a small number miss. Over many runs, the miss rate is about 5%.

The proportion of capturing intervals will be close to but rarely exactly 95%. That is the point — “95% confidence” is a long-run frequency statement, not a per-interval guarantee.

Sample size and margin of error

Larger samples shrink the margin by a factor of 1/√n.

size_sims <- map_dfr(c(10, 30, 100, 500), function(n_size) {
  x <- rnorm(n_size, mean = 50, sd = 10)
  tt <- t.test(x, conf.level = 0.95)
  tibble(
    n     = n_size,
    lower = tt$conf.int[1],
    upper = tt$conf.int[2],
    width = tt$conf.int[2] - tt$conf.int[1]
  )
})

size_sims |>
  mutate(n = factor(n, levels = c("10", "30", "100", "500"))) |>
  ggplot(aes(y = n)) +
  geom_segment(aes(x = lower, xend = upper, yend = n),
               color = moe_colors$teal, linewidth = 1.5) +
  geom_vline(xintercept = 50, linetype = "dashed", color = moe_colors$navy) +
  labs(x = "Value", y = "Sample size",
       title = "CI width shrinks with √n") +
  theme_moe()
Figure 23.2: CI width shrinks as sample size grows. Each interval drawn from N(50, 10).

To halve the margin of error you need four times the sample size — the cost-quality trade-off the chapter introduces.

Try it yourself

  1. 90% vs 99%. Re-run the coverage simulation with conf.level = 0.90 and again with 0.99. How do the capture rates compare? How do the interval widths compare?
  2. Sample-size calculator. Write a function that takes a desired margin of error and a population SD, and returns the sample size needed for a 95% CI. Test it on a SD of 10 and a target margin of 1 — how many observations do you need?