Preface

Vivek H. Patil, Ph.D.

Published by Margin of Error Media LLC


For every student who ever walked into a statistics course convinced they were “not a math person.”

For the educators who refuse to let a beautiful subject be reduced to formulas on a board.

And for anyone who has ever been misled by a number and wished they knew how to push back.


About the Author

Vivek H. Patil, Ph.D. has over two decades of experience in marketing, data analytics, and research methodology. His research integrates measurement theory and statistics with frameworks from economics, social psychology, and cognitive psychology to examine human behavior. He has authored or co-authored 25+ peer-reviewed articles in journals including the Journal of Business Research, PLOS ONE, Journal of Marketing Analytics, Scientometrics, and the American Journal of Health Promotion.

He holds a Ph.D. in Business (Marketing) from the University of Kansas, an M.Eng. in Software Systems from BITS Pilani, and a Master of Management Studies from BITS Pilani.

That grounding in both research and practice extends into his other work. He is the founder of VeloxPortfolio.com, an AI-powered tool that transforms resumes into portfolio websites, and the publisher of books on data analytics and research methods through Margin of Error Media LLC. His teaching materials, datasets, interactive tools, and current projects are available at patilv.com.


Preface

This book started with a frustration.

Across more than two decades of teaching courses that draw heavily on data and statistical reasoning — marketing research, business analytics, data visualization, survey design, multivariate statistics — I have watched students arrive convinced that anything involving numbers will be painful, abstract, and irrelevant to their lives. Some of them are right, depending on the course. Too many approaches to statistics treat the subject as a sequence of formulas to memorize and procedures to execute. They drain the life out of one of the most useful and consequential ways of thinking that humans have developed.

Statistics is not a collection of tests. It is a way of reasoning under uncertainty. It is how we figure out whether a medical treatment works, whether a policy is helping or hurting, whether the pattern we see in data is real or a mirage. It is how a pediatrician in Flint, Michigan, showed that children were being poisoned by their own water supply. It is how researchers demonstrated that identical resumes with different names received different callback rates. It is how epidemiologists tracked the spread of a virus and how election forecasters try, and sometimes fail, to tell us what is coming.

Every one of those stories involves people. Real people, with real consequences riding on whether the analysis was done well or done badly. That is the thread that runs through this book: data analysis is a responsibility before it is a skill.

Margin of Error is written for anyone who needs to reason carefully with data — students in undergraduate or graduate courses, working professionals across fields, or anyone who regularly encounters quantitative claims and wants to evaluate them more honestly. The book is rigorous enough to serve as a primary course text and accessible enough for the reader who has not taken a formal statistics course in years.

Three things make this book different from the textbook you might expect.

First, every chapter opens with a real story. Not a contrived example, not a hypothetical scenario, but an actual situation where statistics mattered for real people. The Flint water crisis. The Facebook emotional contagion experiment. Redlining in American housing. Resume discrimination. Election polling. These stories are not decorations. They are the foundation. The statistical concept in each chapter emerges from the story, because that is how statistics works in practice: you start with a question that matters, and you figure out how to answer it with data.

Second, ethics is woven throughout, not siloed. There is no standalone chapter on ethics that students skip or instructors cut for time. Instead, every chapter includes moments where you confront the ethical dimensions of the method you are learning. Who collected this data? Who benefits from this analysis? Who might be harmed? When is it wrong to remove an outlier? What happens when people game the p-value until they get the result they want? These are not philosophical distractions. They are core statistical concerns.

Third, this book takes AI seriously without being an AI book. We are living through a moment when artificial intelligence can generate analyses, predictions, and recommendations at a scale no human could match. That makes statistical literacy more important, not less. AI can compute. It cannot think about what the computation means. Throughout this book, you will encounter “AI Reality Check” boxes that show how AI tools can get statistical reasoning wrong, and what a statistically literate person would catch. Your job is not to out-calculate a machine. Your job is to be the person who knows whether the calculation was the right one to run.

The book is software-agnostic. The main text teaches statistical thinking, not button-clicking. For hands-on work, the companion website at stats.marginoferrormedia.com hosts interactive applications that let you explore concepts visually, plus downloadable datasets for every analysis in the book.

I owe thanks to many people who shaped this book. My students, who over two decades have taught me what confuses them, what excites them, and what they need from a course built around data. My colleagues and collaborators, who create an environment where teaching and scholarship reinforce each other. And the researchers who collected and shared the data used throughout this book, sometimes at considerable personal effort, so that others could learn from real problems.

Most of all, I owe thanks to my family. My wife, Anvita, whose patience with a husband who talks about sampling distributions at dinner is a form of grace I do not deserve. And my daughters, Aashi and Avisha, whose curiosity sparks ideas, whose skepticism discounts the bad ones, and whose presence is a daily reminder to do better work. This book exists because they made the space for it.

Vivek H. Patil


How to Use This Book

Chapter Structure

Every chapter follows the same pattern:

  • Opening Story: A real case where the chapter’s statistical concept played a central role. Read it. It sets up everything that follows.
  • Main Content: The statistical ideas, explained in plain language first and then in formal terms. Formulas are introduced when they clarify thinking, not for their own sake.
  • Notes: Short asides in boxes throughout the text. Historical context, interesting facts, or dry observations. They reward the reader who slows down.
  • Ethics Moments: Short prompts that confront the ethical dimensions of the method at hand. They are not interruptions; they are part of the method.
  • AI Reality Check: Brief examples of how AI tools can stumble on statistical reasoning, and what a careful analyst would catch.
  • Try It Online: Callouts that point you to interactive applications on the companion website. When you see one, open it.
  • Exercises: Three tiers at the end of every chapter.
    • Check Your Understanding: Can you explain the concepts in your own words?
    • Apply It: Can you work through the analysis?
    • Think Deeper: Can you wrestle with what the numbers mean and whether the analysis was done responsibly?

The Companion Website

Visit stats.marginoferrormedia.com for:

  • Interactive Applications: Tools that let you explore sampling, probability, confidence intervals, hypothesis testing, regression, and more — no installation required, everything runs in your browser.
  • Datasets: Downloadable CSV files for every dataset used in the book.
  • Instructor Resources: Under development. Contact patilv@gmail.com for current availability.

A complete, chapter-by-chapter list of the interactive applications used in the book is maintained at stats.marginoferrormedia.com/apps.html. Every “Try It Online” callout in the chapters links into that page, so if an app’s host ever moves, the book’s references stay valid.

Errata and Feedback

This book will improve over time, and I want your help. If you find an error, have a suggestion, or want to flag something that could be explained better, you can open an issue at github.com/patilv/moe-stats-issues or email me directly at patilv@gmail.com. Every piece of feedback is read, and corrections are incorporated into future editions.

For Instructors

The exercise sets are deliberately large so that you can vary assignments across sections and terms. The main text is software-agnostic; companion materials use R with the tidyverse, and instructors using Python or other environments can adapt the datasets and exercises without loss. Instructor materials, including sample syllabi and additional assessment resources, are under development. Contact patilv@gmail.com in the meantime.

For Students

If you are anxious about this material, you are in good company. Most people are. The single best thing you can do is work through the examples and exercises rather than just reading about them. Open the companion site. Run the code. Use the interactive applications. Statistics is not a spectator sport.

What to Expect

Over the next twelve chapters, we will build your statistical toolkit from the ground up.

Part I: Foundations (Chapters 1–4) covers the basics. You will learn to identify variable types, understand research design, summarize data with numbers and pictures, and start developing an eye for the difference between real patterns and noise.

Part II: Probability and Inference (Chapters 5–8) is the heart of the course. You will learn how probability works, why the normal distribution matters, what confidence intervals actually mean (it is not what most people think), and how hypothesis testing allows us to make decisions under uncertainty. This section also tackles the misuse of these tools head-on, including p-hacking, the replication crisis, and the difference between statistical significance and practical importance.

Part III: Comparing and Modeling (Chapters 9–12) puts the tools to work. You will compare groups using ANOVA, model relationships using regression, and examine when statistical analyses can and cannot support causal claims. The final chapter points toward where statistics goes from here, including brief introductions to time series, causal inference, machine learning, and Bayesian thinking.

Every dataset used in the exercises is available on the companion website. Some familiarity with R will help when working through the analyses, though the book is written so that the statistical reasoning stands on its own. If you have never written a line of code, plan to work slowly with the early datasets and give yourself time to get comfortable before the methods become more involved.

Throughout, you will encounter real stories, real questions, and data drawn from real sources where possible. Where a concept is better illustrated with a constructed or simulated dataset, that is noted clearly in the text.


A Note on AI Tools

This book is my own work. I used AI tools at various points to support that work: to assist with building the technical infrastructure behind the companion website and interactive applications, to review and revise drafts, and for targeted research assistance. The primary tool was Claude, developed by Anthropic. Gemini, developed by Google, served as a secondary review tool in limited instances.

Every analytical position reflects my own judgment, shaped by two decades of work in research, teaching, and data analysis. The responsibility for what appears here, including any errors, is entirely mine.

I mention this not as a disclaimer but as a matter of principle. The norms around AI use in scholarly work are still being worked out, and I would rather be transparent about my process than pretend the tools I used do not exist.