Doubt is not a pleasant condition, but certainty is absurd. Voltaire

James G. Scott
Statistical Modeling: A Gentle Introduction

Copyright 2010-12 James G. Scott
www.mccombs.utexas.edu/faculty/james.scott
These lecture notes are copyrighted materials, and are made available solely for educational and not-for-profit use. Any unauthorized use or distribution without written consent is prohibited.
First published January 2010

Introduction

This book is about statistical modeling. Some people define this term loosely as "fitting equations to data." That's close, but inexact. To do a bit better, let's take both words in turn.

Statistics is the study of variation among cases: economic growth rates, dinosaur skull volumes, student SAT scores, genes in a population, Congressional party affiliations, drug dosage levels, your choice of toothpaste versus mine... really any variable that can be measured!

A model is a metaphor, a description of a system that helps us to reason more clearly. Like all metaphors, models are approximations, and will never account for every last detail. In the words of the English statistician George Box: "all models are wrong, but some are useful." Aerospace engineers work with physical models (blueprints, simulations, mock-ups, wind-tunnel prototypes) to help them understand a proposed airplane design. Geneticists work with animal models (fruit flies, mice, zebrafish) to help them understand heredity. We will work with statistical models to help us understand variation.

Like the weather, most variation in the world exhibits some features that are predictable, and some that are unpredictable. Will it snow on Christmas day? It's more likely in Boston than Austin; that much we can anticipate. But even as late as Christmas eve, and even at the North Pole, nobody knows for sure.

The crucial thing about statistical models is that they describe both predictable and unpredictable variation. More than that, they allow us to partition observed variation into its predictable and unpredictable components, and not just in some loose allegorical way, but in a precise mathematical way that can, with perfect accuracy, be described as Pythagorean. (More on that later.) This focus on the structured quantification of uncertainty is what distinguishes statistical modeling from ordinary evidence-based reasoning. "It's important to know what the evidence says," goes this line of thinking. "But it's also important to know what it doesn't say. Sometimes that's the tricky part."

We will use statistical models for three purposes:
(1) to explore a large body of evidence, so that we might identify predictable features or trends amid random variation.
(2) to test our beliefs about cause-and-effect relationships among things in the world.
(3) to predict the future behavior of some system, and to say something useful about what remains unpredictable.

As Hippocrates enjoined doctors in the oath bearing his name: "Declare the past, diagnose the present, foretell the future." [1] These are the goals not merely of statistical modeling, but of the scientific method more generally.

[1] Epidemics, Book I, section 11.

Let's deal up front with some common misconceptions.

First, many people assume that the job of a statistical modeler is to objectively summarize the facts, slap down a few error bars, and get out of the way. This view is mistaken. To be sure, statistical modeling demands a deep respect for facts, and for not allowing one's wishes or biases to change the story one tells with the facts. But the modeling process is inescapably subjective, in a way that should be embraced rather than ignored. Model-building requires not just technical knowledge of statistical ideas; it also requires care and judgment, and cannot be reduced to a flowchart, a table of formulas, or a tidy set of numerical summaries that wring every last drop of truth from a data set. There is almost never a single model that is obviously right. But there are definitely such things as good models and bad models, and learning to tell the difference is important. Just remember: calling a model good or bad requires knowing both the tool and the task. A shop-window mannequin is good for displaying clothes, but bad for training medical students about vascular anatomy.

Second, many people assume that statistical models must be complicated in order to do justice to the real world.
Not always: complexity sometimes comes at the expense of explanatory power. We must avoid building models calibrated so tightly to past experience that they do not generalize to future cases. This idea, that theories should be made as complicated as they need to be, and no more so, is often called Occam's Razor. A good model will be simple enough to understand and interpret, but not so simple that it does any major intellectual violence to the system being modeled. All models of the world must balance these goals, and statistical models are no exception.

Finally, many people also assume that statistical modeling involves difficult, tedious mathematics. Happily, this isn't true at all. In fact, virtually all common statistical models are accessible to anyone with a high-school mathematics education, and these days all the tedious calculations are taken care of by computers. Modeling is even fun, once you get the hang of it!

Modeling then and now

On the time scale of important post-Enlightenment ideas, statistical modeling is middle-aged. An astronomer named Tobias Mayer was using something vaguely like linear modeling as early as 1750. [2] But most scholars credit two later mathematicians (Legendre, a Frenchman, and Gauss, a German) with independently inventing the method of least squares some time between 1794 and 1805. That makes statistical modeling newer than the invention of calculus (credited jointly to Leibniz and Newton in the late 1600s), but older than the idea of evolution by natural selection (credited jointly to Darwin and Wallace over a period spanning the 1830s to the 1850s).

For most of the nineteenth century, statistical modeling largely remained the concern of a highly specialized cadre of astronomers and geophysicists. But by our own age, one of fast, cheap computing and abundant data, it has become ubiquitous. In fact, the very same principle of least squares proposed by Legendre and Gauss remains, over two hundred years later, an important part of the day-to-day toolkit for solving problems in fields from aeronautics to zoology and everywhere in between.

But don't just take my word for it. The director of the White House Office of Management and Budget says so, too:

    The President has made it very clear that policy decisions should be driven by evidence, accentuating the role of Federal statistics as a resource for policymakers.
    Robust, unbiased data are the first step toward addressing our long-term economic needs and key policy priorities. [3]

[2] Stephen M. Stigler, The History of Statistics: The Measurement of Uncertainty before 1900, pp. 16-25. Harvard University Press, 1986.
[3] "Using Statistics to Drive Sound Policy." Office of Management and Budget Blog; May 8, 2009.

So does the Journal of the American Medical Association, indirectly but pointedly, in its intimidating litany of statistical requirements:

    Numerical results should be accompanied by confidence intervals, if applicable, and exact levels of statistical significance. Evaluations of screening and diagnostic tests should include sensitivity, specificity, likelihood ratios, receiver operating characteristic curves, and predictive values. [4]

[4] JAMA Instructions for Authors, jama.ama-assn.org

Even the New York Times says so: "For Today's Graduate, Just One Word: Statistics." [5]

[5] New York Times (Technology section); August 5, 2009.

Of course, for all that, our political and cultural climate still exhibits a streak of distrust toward statistics. Why else would Churchill's brazen instructions to a young protégé sound so depressingly familiar?

    I gather, young man, that you wish to be a Member of Parliament. The first lesson that you must learn is that, when I call for statistics about the rate of infant mortality, what I want is proof that fewer babies died when I was Prime Minister than when anyone else was Prime Minister. [6]

[6] Quoted in The Life of Politics (1968), Henry Fairlie, Methuen, pp. 203-204.

And why else would the famous remark, popularized by Twain and attributed to Disraeli, remain so apt, even a century later?

    Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: "There are three kinds of lies: lies, damned lies, and statistics." [7]

[7] Chapters from My Autobiography, North American Review (1907).

How do you tell the difference between robust, unbiased evidence, misleading irrelevance, and cynical fraud? In considering this question, you will already have appreciated at least two good reasons to learn statistical modeling:

(1) To use data honestly and credibly in the service of an argument you believe in.
(2) To know how and when to be skeptical of someone else's damned lies.

For as John Adams put it,

    Facts are stubborn things; and whatever may be our wishes, our inclinations, or the dictates of our passion, they cannot alter the state of facts and evidence. [8]

[8] Argument in Defense of the Soldiers in the Boston Massacre Trials (1770).
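
The Introduction claims that a statistical model can partition observed variation into predictable and unpredictable components in a way that is literally Pythagorean. As a minimal sketch of what that might look like, assume a straight-line model fit by least squares to a small made-up data set. The data values and the function names `fit_line` and `decompose` are hypothetical, chosen only for illustration, and the code uses plain Python rather than any particular statistics package. For such a fit, the total sum of squares equals the sum of squares of the fitted values about the mean plus the sum of squared residuals, just as the squared sides of a right triangle add up.

```python
# Hypothetical toy data (not from the text): x is a predictor, y a response.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through the data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def decompose(x, y):
    """Split the variation in y into predictable and unpredictable parts.

    Returns (total, predictable, unpredictable), each a sum of squares.
    """
    intercept, slope = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [intercept + slope * xi for xi in x]
    # Total variation of y about its mean.
    total = sum((yi - y_bar) ** 2 for yi in y)
    # Variation captured by the fitted line.
    predictable = sum((fi - y_bar) ** 2 for fi in fitted)
    # Variation left over in the residuals.
    unpredictable = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return total, predictable, unpredictable


total, predictable, unpredictable = decompose(x, y)
print(f"total variation:         {total:.6f}")
print(f"predictable + residual:  {predictable + unpredictable:.6f}")
```

The two printed numbers should agree up to floating-point rounding. The identity relies on the line being fit by least squares with an intercept; for an arbitrary line the cross term does not vanish and the pieces need not add up.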