Project: The Power of a Hypothesis Test

Let's revisit the basics of hypothesis testing for a bit here, shall we? Any hypothesis test contains two mutually exclusive hypotheses, H0 and H1 (AKA, HA). After conducting our hypothesis test, we will either (formally) reject H0, or fail to reject H0. We can see pretty quickly there are four outcomes possible here:

                      H0 TRUE                H0 FALSE
Reject H0             Bad! (Type I error)    Good!
Fail to Reject H0     Good!                  Bad! (Type II error)

Here's another, less formal way of looking at it:

The boy who said "Wolf! Wolf! Everybody! There's a freaking WOLF here!" But there wasn't. Way to go, buddy. (False Positive)

The boy who said, "Nah, there's no wolf here." And there wasn't. So, yeah. Back to work. (True Negative)

The boy who said "Wolf! Wolf! Everybody! There's a freaking WOLF here!" And look at that! There is! Thanks, man! (True Positive)

The boy who said, "Nah, there's no wolf here." But there was. Yikes! (False Negative)

Now, those four cells in the contingency table aren't all equally likely; if they were, a hypothesis test would be as precise as flipping a coin! No, our job in good hypothesis testing is to maximize the good boxes while minimizing the bad ones (the errors).

In our course, we have spent a great deal of time discussing both types of error. The probability of a Type I error, also called the significance level of a test, is often, by default, set to 5%. Why? Tradition, mostly. There are many references (dating well back to the beginnings of the formal study of inferential statistics) that mention 5% as a good starting point for significance (feel free to Google it; there are many, many references to it). We've also discussed, in class, that you might not always want to operate at 95% confidence. This project, in part, will explore why.

Part 1: The Relationship between Type I and Type II Error

As you work through this section, go ahead and open the spreadsheet "errors" that accompanies this project. Familiarize yourself with a few aspects of this sheet (and hypothesis testing in general):

The chance of a Type I ("false positive") error is referred to, symbolically, as α. Its value is indicated by the orange area in the sheet. Similarly, the Type II ("false negative") rate is called β, and it's the blue area.
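If you'd rather see those four cells fill in numerically than take the fable's word for it, here's a minimal Python sketch (my own illustration, not part of the assignment's spreadsheets) that simulates a right-tailed z-test many times, once with H0 true and once with H0 false, and reports how often the test rejects in each case:

```python
# A minimal sketch (not from the assignment): estimate how often each row of
# the table above occurs for a right-tailed z-test of H0: mu = 0 at alpha = 0.05.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
alpha, n, trials = 0.05, 25, 10_000
crit = norm.ppf(1 - alpha)                 # right-tailed critical value (~1.645)

def rejection_rate(true_mu):
    """Fraction of simulated tests that reject H0 when the true mean is true_mu."""
    samples = rng.normal(true_mu, 1.0, size=(trials, n))  # sigma = 1 assumed known
    z = samples.mean(axis=1) * np.sqrt(n)                 # z-statistic for each trial
    return (z > crit).mean()

print("H0 true  (mu = 0.0): Type I error rate ~", rejection_rate(0.0))  # near alpha
print("H0 false (mu = 0.5): power ~            ", rejection_rate(0.5))  # 1 - beta
```

When H0 is true, the rejections are exactly the false positives from the table; when it's false, the non-rejections are the false negatives.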

The Critical Value (I've left off which distribution we're using, as what we're discussing in this assessment is pretty general) is decided upon at the outset of the hypothesis test, and represents the cutoff for "beyond a reasonable doubt" (that is, any test statistic that lands beyond this value triggers belief in the research hypothesis).[1]

1. (2 points) Start by moving the critical value to the right and left (but leaving the sample size and difference in means alone). What do you notice about the areas that represent the errors? Circle the best phrase that completes the following sentence:

As the chance of a Type I error increases, the chance of a Type II error (decreases / stays the same / increases).

Cool! If you want to increase your confidence (without changing the parameters of your study), the ONLY way, mathematically, to do so is to accept a higher false negative rate! This trade-off leaks out into more general applications all the time. For example, if you've ever had your car recalled by the dealer, but nothing was found to be wrong with it, you got a false positive, and most likely, lots of others did, too. That's because a car manufacturer is deathly afraid of a false negative (i.e., leaving unsafe cars on the road), so they accept a higher α in exchange for a lower β.

2. (2 points) We've been testing at 95% confidence all term. It's pretty industry standard. However, suppose you're planning a study where you really want to avoid a Type II error, but you can live comfortably with a Type I error. What should you do at the outset of the study?

Lower the confidence / Raise the confidence

3. (2 points) Now, suppose you're planning a study where you need (dearly) to avoid a Type I error, but you can live with a Type II. Now what should you do at the outset of the study?

Lower the confidence / Raise the confidence

4. (2 points each) OK, now place your critical value somewhere to the right of zero so that you see visible areas for the Type I and Type II errors. Leave it there. Now, adjust the difference in means so that the research mean[2] moves away from, and then toward, the null mean. Then, complete each of the following statements by circling the appropriate phrase:

As the difference between the null and research means increases,
a) the chance of a Type I error (increases / stays the same / decreases)
b) the chance of a Type II error (increases / stays the same / decreases)

[1] In the spreadsheet you're looking at, I'm running a right-tailed test. The same argument, without loss of generality, holds for any direction of testing.
[2] Remember: you never know, for certain, what this is!
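If you'd like to see the same trade-off in numbers instead of shaded areas, here's a short sketch that mimics the spreadsheet. The null mean, research mean, and standard error below are invented for illustration; they're not the values in the actual "errors" sheet:

```python
# Sketch of the alpha/beta trade-off for a right-tailed test. The null mean,
# research mean, and standard error are made-up illustrative values.
from scipy.stats import norm

null_mean, research_mean, se = 0.0, 2.0, 1.0

print(f"{'critical value':>14} {'alpha':>8} {'beta':>8}")
for crit in (1.0, 1.5, 2.0, 2.5):
    alpha = 1 - norm.cdf(crit, loc=null_mean, scale=se)   # the "orange" area
    beta = norm.cdf(crit, loc=research_mean, scale=se)    # the "blue" area
    print(f"{crit:>14.1f} {alpha:>8.3f} {beta:>8.3f}")
# Sliding the cutoff right shrinks alpha but grows beta, and vice versa:
# exactly what question 1 asks you to notice.
```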

Part 2. Looking Deeper: A Case Study in the Difference of Means[3]

Hopefully, the answer to your last question makes sense to you: if there's a large difference in means, then you should be able to (correctly) see it more easily than if there's only a small difference. This hypothesized difference (if it exists) is called the effect size. And, as it turns out, sample size is inherently tied to effect size. Let's do some experiments to talk about how.

Start by making sure your version of Excel is set up in iterative mode. Here's a video I made showing how to do it in Excel 2010 (it's similar in newer versions): https://www.youtube.com/watch?v=zlxmicraxpo. You only need to watch that short video; there are more that follow it, but they're for a different problem.

Next, once you're all set up and ready to go, open the sheet 4coin1.xlsx. This is a spreadsheet designed to (Monte Carlo) model a coin flipping: press the button, and each time you do, the coin flips. The graph you see is the progression of the empirical probability of heads ("empirical" because it doesn't assume that the probability of heads is any given number; it's going to demonstrate its probability through experimentation). It keeps track of the number of heads that appear, and the number of trials, and then divides to arrive at a probability. That's the location of the penny: it sits at the current probability of heads.[4]

5. (2 points) Go ahead and hold the button down for a bit. When the coin is close enough for you to believe that the probability of heads has stabilized (and is not deviating far from that percentage, in your opinion), stop, and write down how many trials it took you to believe it.

Now, in that last one, you would have failed to reject H0; that is, you were (most likely) expecting 50% heads, and that's about what you got, right? Now, we're going to do three experiments where we DO get a rejection; that is, we're going to see a rigged coin.

Open the sheet 4coin2.xlsx. I've set this sheet up to simulate a rigged (that is, NOT 50/50) coin. You're going to repeat the experiment you just did above, but this time, write down how many trials it took until you felt that the coin wasn't fair.

[3] Means, proportions... basically, whatever statistics we're talking about, we're talking about center.
[4] A-HA! Another nonparametric demonstration! I ♥ Monte Carlo.
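(If you don't have Excel handy, the same Monte Carlo idea can be sketched in a few lines of Python. This is my stand-in for the 4coin sheets, and the 0.6 bias below is invented; I have no idea how the real sheets are rigged.)

```python
# Sketch of the 4coin spreadsheets: track the running empirical
# probability of heads for a (possibly rigged) coin.
import numpy as np

rng = np.random.default_rng()

def empirical_heads(p_heads, flips):
    """Running proportion of heads after each of `flips` tosses of a coin
    that lands heads with probability p_heads."""
    tosses = rng.random(flips) < p_heads
    return np.cumsum(tosses) / np.arange(1, flips + 1)

fair = empirical_heads(0.5, 500)    # like 4coin1: should settle near 0.50
rigged = empirical_heads(0.6, 500)  # stand-in for a rigged sheet; 0.6 is a guess

for n in (10, 50, 200, 500):
    print(f"after {n:3d} flips: fair coin at {fair[n-1]:.3f}, rigged coin at {rigged[n-1]:.3f}")
```

The farther the true probability sits from 0.5, the fewer flips it takes for the running proportion to visibly part company with 50%: the effect-size story from Part 1 again.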

6. (2 points) Hold the button down until you feel you've seen enough trials to have spotted the unfair coin. How many trials did it take? Note: I'm not asking you to hypothesize by how much it's rigged; just tell me when you think it's demonstrated it beyond a reasonable doubt.

Ready for another rigged coin? Open up 4coin3.xlsx. Yep, rigged again.

7. (2 points) Hold the button down until you feel you've seen enough trials to have spotted the unfair coin. How many trials did it take?

One last rigged coin, perhaps? Open up 4coin4.xlsx.

8. (2 points) Hold the button down until you feel you've seen enough trials to have spotted the unfair coin. How many trials did it take?

Note: please bring these numbers to class next time so we can collate our results!

So now you can see why you answered the way you did back at the end of Part 1! The greater the deviation from expected, the easier it is to spot! In reality, of course, you won't know if your coin is rigged or not, but you will be able to calculate how large a sample you would need to spot a deviation from the null, if it indeed exists. The calculations are tedious, but thankfully, there's great software around that'll do them for you! Here's one of my faves, if you ever need one: http://www.gpower.hhu.de/en.html.[5]

Part 3. So, what is Power, anyway?

We've spent a lot of time talking about error, but what exactly is power? Quite simply, power is defined to be the complement of a Type II error (in symbols, 1 - β). The more powerful a statistical test is, the more we believe that it can catch a difference in means (if such a difference exists). You'll often hear statistical tests critiqued because of their "low power"; what people often mean is that the sample size was too small to allow for a small value of β (and, hence, a large 1 - β). I'll let you all get into statistical fistfights with colleagues later over how large is large enough. For now, this document serves as an introduction to power, measuring Type II error, and its relationship to Type I error.

[5] It's German, so you know it works.
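G*Power will do these calculations properly; for a rough feel for what it computes, here's a hedged sketch using the usual normal approximation for a right-tailed one-proportion test. The true heads probability of 0.6 is, once more, just an invented example:

```python
# Sketch: power of a right-tailed one-proportion z-test of H0: p = 0.5,
# via the normal approximation. The true p = 0.6 is illustrative only.
from math import sqrt
from scipy.stats import norm

def power_one_prop(p_true, n, p0=0.5, alpha=0.05):
    """Power = 1 - beta for a right-tailed test of H0: p = p0 with n flips."""
    crit = p0 + norm.ppf(1 - alpha) * sqrt(p0 * (1 - p0) / n)  # rejection cutoff for p-hat
    # Probability that p-hat lands past the cutoff when the truth is p_true:
    return 1 - norm.cdf(crit, loc=p_true, scale=sqrt(p_true * (1 - p_true) / n))

for n in (25, 100, 400):
    print(f"n = {n:3d}: power = {power_one_prop(0.6, n):.3f}")
# Bigger samples (and bigger rigging) mean a smaller beta, i.e., more power.
```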

Part 4. What does it all mean to you?

All of these experiments and demonstrations are great[6], but they don't address the real issue: as researchers, how would you least prefer to be wrong? You'll never know with 100% certainty; that's the problem. Making this decision, when you're starting out testing a set of hypotheses, helps you set your confidence and/or sample size. In all honesty, in your careers, you'll most likely deal with this rigorously when the time is right. But I want you thinking about it, critically (not mathematically), right now. Here's how we'll do it: I'll describe a situation that's going to be studied using hypothesis testing methods we've been learning about in class. Your job is to decide which type of error would be worse, and explain to me why. Let's look at an example:

Example: a research company is designing a cholesterol-reducing drug. As part of the design process, they're testing for possible side effects. One side effect is loss of appetite while using this drug. They've created the hypotheses:

H0: The drug does not cause loss of appetite.
H1: The drug does cause loss of appetite.

We'll never know with 100% certainty which of these is true, so we aim to minimize the chance of the one we don't want. Here are two ways to answer "which error type would be worse?"

Type I error is worse: A false positive in this case would be saying (erroneously) that, yes, the drug does cause a loss of appetite when, actually, it doesn't. Why would this error type be worse? Well, maybe people would be hesitant to use the drug if it had this side effect (or, in this case, if they thought it had this side effect).

Type II error is worse: A false negative in this case would be saying (erroneously) that the drug does not cause a loss of appetite when, actually, it does. Why would this error type be worse? Suppose people think they're getting a cholesterol-lowering drug with the added benefit of not negatively affecting appetite! Great! Except, well, it actually is negatively affecting appetite, and you won't know it until it's too late.

So, you see, there isn't one correct answer; either type of error could be bad, depending on your context. The important thing here is to get thinking about it. Let's try two of your own! For each of the following, tell me, in your opinion, which type of error would be worse and why.

9. (4 points) An outdoor goods manufacturer is testing its rock climbing harnesses for safety. Of particular concern is the belay loop, an integral part of the harness that (literally) holds the climber's life at one point. From the CE testing specs, I have learned that belay loops need to withstand a 15 kN force for 3 minutes (if one does, it's considered "safe"). Therefore, the company tests the following hypotheses:

H0: The harness will withstand a 15 kN force for 3 minutes (the harness is "safe").
H1: The harness will not withstand a 15 kN force for 3 minutes (the harness is not "safe").

10. (4 points) A lab is working on a new drug test to cut down on illegal blood doping in professional cycling. In particular, it will test for EPO (erythropoietin)[7] in such a way that the test will flag a rider whose blood has higher-than-permissible levels of EPO. The lab sets the drug test to use the following hypotheses:

H0: The test comes back negative (that is, it fails to detect EPO).
H1: The test comes back positive (that is, it detects EPO).

[6] Well, I think so, anyway.
[7] You might remember it as one of the substances Lance Armstrong finally admitted to using when he won his Tours de France.