CHAPTER FIVE SAMPLING DISTRIBUTIONS, STATISTICAL INFERENCE, AND NULL HYPOTHESIS TESTING


OBJECTIVES

To lay the groundwork for the procedures discussed in this book by examining the general theory of data analysis and describing specific concepts as they apply to confidence intervals, effect sizes, and hypothesis tests.

CONTENTS

5.1 BASIC CONCEPTS BEHIND CONFIDENCE INTERVALS, EFFECT SIZES, AND HYPOTHESIS TESTING
5.2 SAMPLING ERROR
5.3 SIMPLE EXAMPLES INVOLVING AUTHORITARIANISM AND VISUAL CUES TO MEMORY
5.4 SAMPLING DISTRIBUTIONS AND THE STANDARD ERROR
5.5 TEST STATISTICS AND THEIR SAMPLING DISTRIBUTIONS
5.6 MAKING DECISIONS ABOUT THE NULL HYPOTHESIS
5.7 TYPE I AND TYPE II ERRORS
5.8 ONE- AND TWO-TAILED TESTS
5.9 RETAINING OR REJECTING THE NULL HYPOTHESIS
5.10 SUMMARY

In Chapter 2 we examined a number of different statistics and saw how they might be used to describe a set of data or to represent the frequency of the occurrence of some event. Although the description of the data is important and fundamental to any analysis, it is not sufficient to answer many of the most interesting problems we encounter. In a typical experiment, we might treat one group of people in a special way and wish to see whether their scores differ from the scores of people in general. Or we might offer a treatment to one group but not to a control group and wish to compare the means of the two groups on some variable. Descriptive statistics will not tell us, for example, whether the difference between a sample mean and a hypothetical population mean, or the difference between two obtained sample means, is small enough to be explained by chance alone or whether it could represent a true difference that might be attributable to the effect of our experimental treatment(s). Nor will they tell us whether such a difference is meaningful, how much in error we could be in our estimates, or how this study might fit with other studies that have been conducted. The research paper that derives from an experiment must address each of these questions, and to do so requires an understanding of a number of different ways of examining data.

5.1 BASIC CONCEPTS BEHIND CONFIDENCE INTERVALS, EFFECT SIZES, AND HYPOTHESIS TESTING

Traditionally, psychology and the behavioral sciences in general have focused on what is generally referred to as hypothesis testing or, in more current terminology, Null Hypothesis Significance Tests (NHST). Not too long ago it was possible to write a chapter, and even a whole book, focusing almost exclusively on hypothesis testing. (I have done just that, and so have most other authors.)
There we would be interested in answering questions such as "Are the mean scores for these two groups sufficiently different to lead us to conclude, perhaps erroneously, that different treatments produce different results?" (And notice that my question focused almost exclusively on the mean, ignoring other statistics. In some cases we focused on the correlation coefficient, ignoring other interesting possibilities.) Fortunately, we have expanded the kinds of questions we ask and the statistics that they produce. But each of these questions, or ways of evaluating data, depends on the same underlying concepts: most importantly, they depend on what we will refer to as sampling distributions, one- and two-tailed tests, Type I and Type II errors, and the logic of hypothesis testing, each of which will be defined as we go along. Even for those who don't particularly approve of hypothesis testing itself, the underlying concepts are critically important.

Before I launch into a discussion of the whole issue surrounding hypothesis tests, their associated probability values (p), confidence intervals, and effect sizes, I need to lay out the basic concepts that are involved in a discussion of each of those. When I have explained each of the basic concepts, I will come back to the issue of null hypothesis significance testing and explain where it came from, what people have had to say about it, and why it gets people so excited. But I will also point to other statistics that we can use along with hypothesis testing to understand what our data have to say. Those alternative statistics will be elaborated much more in the next chapter. But before going off in that direction, we need a good understanding of the basic material.

5.2 SAMPLING ERROR

One of the most basic concepts in statistics is what statisticians call sampling error. It lies at the heart of all statistical procedures. Sampling error refers to the variability of some observation or statistic from one sample to another. In standard English we usually use the word "error" to refer to some kind of a mistake. That is not what we mean here. We simply mean random variability. In Chapter 3 we considered the distribution of Total Behavior Problem scores from Achenbach's Youth Self-Report form. Total Behavior Problem scores are nearly normally distributed in the population, with a population mean (μ) of 50 and a population standard deviation (σ) of 10. We know that different children show different levels of problem behaviors and therefore have different scores. We also know that if we took a sample of children, their scores would probably not equal exactly 50. One child might have a score of 49, while a second might have a score of 55. The actual scores would depend on the particular children who happened to be included in the sample. If we then go further and calculate the means of two samples of children, we would also expect those means to differ due to sampling error.
(They might also differ because of real differences due to some treatment effect, but that is not what we are talking about here. Here we are just referring to the part that represents random variability.) One mean might be 47.4, while another might be somewhat higher or lower. This expected variability from sample to sample is what is meant when we speak of variability due to chance, or error variance, or sampling error. The phrase refers to the fact that statistics (in this case, means) obtained from samples naturally vary from one sample to another. We need to understand sampling error if we are to evaluate how different groups respond to some experimental treatment, or how much confidence we can place in a statistic that we just computed. Sampling error is fundamental to the calculation of what we will call p values,

of confidence intervals, and of effect sizes, all of which will be defined in this chapter. You cannot ignore it. In examining sampling error, and in any statistical procedures which follow, we will be particularly interested in sampling distributions, which refer to the distributions of scores or, more often, statistics like the mean, and their associated sampling error. Such distributions tell us what kind of variability we can expect in sample means, for example, from one experiment to another. In other words, they plot sampling error. If we want to make meaningful estimates of population means, we need to have confidence in the stability of the sample means on which we base those estimates. Suppose that we have a sample mean of 68, and suppose that we can reasonably estimate that if we ran the same experiment again we would likely have a new sample mean of somewhere between 66 and 70. That looks as if we have a solid basis for concluding that the population mean is probably somewhere in the upper 60s. However, if we think that if we reran the experiment the new sample mean would be somewhere between 52 and 84, we would be much more cautious in our estimate of the true population mean. It is just this variability of sample means from one sample to another that we mean by the sampling distribution of a statistic. And although I have used the sample mean as the statistic of interest in this paragraph, I could just as well have spoken about the sampling distribution of a variance, a correlation coefficient, or a test statistic such as t or F. Every statistic has its own sampling distribution.

5.3 SIMPLE EXAMPLES INVOLVING AUTHORITARIANISM AND VISUAL CUES TO MEMORY

I want to begin with some examples that illustrate the issues we face. Roets, Au, & Van Hiel (2015) examined the relationship between authoritarianism and attitudes toward out-groups.
Many studies have found a negative relationship between these variables, with people high in authoritarianism tending to view minorities as bad, immoral, and deviant. (Gee, Donald Trump comes to mind!) However, the government of Singapore has had a long history of promoting multiculturalism. They have forced people from different cultures to live in the same neighborhoods, and have taken other measures to blend their communities. Roets and Van Hiel wondered if the approach taken by Singapore, which they refer to as an institutionalized intergroup ideology imposed on the people of Singapore, would alter this relationship between authoritarianism and attitudes toward out-groups. They examined two quite different groups. For a Belgian

group of 245 students, the correlation between authoritarianism and a measure of multicultural acceptance was -.28, which was in line with a large body of research. Correlations measure the degree of relationship between two or more variables. For Belgian students, the higher one's authoritarianism score, the more negative one's attitude about minorities. But for a group of 249 students from Singapore, this same relationship was positive, with a correlation of .26. Would we have expected different groups to have such different results if Singapore's approach really has no effect? Is this difference in correlations between two groups with quite different backgrounds toward multiculturalism large enough to indicate a real difference between the two groups and the influence of government policy? What can we say about the potential stability of this difference? Does it represent an important result of Singapore's efforts, or is it a minor effect that we can largely shove aside? Those are the important questions to be answered. Also note that the authors worked with college students. Perhaps future work might focus on a different age group. The point is that the study shouldn't end here: it is part of a body of research that should be pursued. Two correlation coefficients do not exhaust the area of study. Too often we present a significant result, imply that we have answered the question, and then move on to something else.

Now consider a second study. We all know how difficult it sometimes is to remember to do something, e.g., call Mom and wish her a happy birthday. Rogers & Milkman (2016) hypothesized that if you can link a distinctive visual cue to the intention to call, subsequently noticing that cue will facilitate calling. You want to remember to phone your mother when you get home to wish her a happy birthday.
First, think about the bottle of milk that you accidentally left out on the kitchen counter when you set off to class or work, and associate that with the phone call. Seeing that bottle of sour milk when you return home should remind you to make the call.1 They found that of those who were instructed to form such an association, 29/39 = 74% performed the behavior. Of those who were not instructed to form an association, only 16/38 = 42% performed the behavior. We will want to have some way to decide whether the difference between 74% and 42% can be explained away by normal sampling error, in which case having such cues doesn't seem to help. Alternatively, if the difference is sufficiently large that we cannot attribute it solely to sampling

1 I would not suggest that you tell your mother that seeing sour milk prompted your call to her.

error, then we have evidence that such cues do help and Rogers and Milkman are onto something important. Although the statistical calculations required to answer this question are different from those used to answer the one concerning the correlation between authoritarianism and attitudes toward out-groups, the underlying logic is fundamentally the same. We need to be explicit about what the problem is here. The reason for understanding sampling distributions and, from that, for calculating confidence intervals, effect sizes, and hypothesis tests is that data are ambiguous. When we collect data on attitudes toward minorities, for example, the data will vary from occasion to occasion, depending on who happens to be included in our sample. Similarly for data on memory for important tasks. But how large a difference do we need to lead us to conclude that something meaningful is going on? How do we try to assess the importance of that difference? How sure are we that we have estimated it reliably? Those are the problems we are beginning to explore, and those are the subjects of this chapter and the rest of the book.

5.4 SAMPLING DISTRIBUTIONS AND THE STANDARD ERROR

As I have said, the most basic concept underlying all statistical procedures is the sampling distribution of a statistic and its associated sampling error. It is fair to say that if we did not have sampling distributions, we would not have any confidence limits, statistical tests, or other important measures. Sampling distributions tell us what values we might (or might not) expect to obtain for a particular statistic under a set of predefined conditions (e.g., what the differences between our two samples might be expected to be if the true means of the populations from which those samples came are equal).
In addition, the standard deviation of that distribution of differences between sample means (known as the standard error of the distribution) reflects the variability that we would expect to find in the values of that statistic (in this case, differences between means) over repeated trials. Sampling distributions and their standard errors provide the opportunity to evaluate the likelihood, given the value of a sample statistic, that such predefined conditions actually exist. Sampling distributions are almost always derived mathematically, but it is easier to understand what they represent if we consider how they could, in theory, be derived empirically with a simple

sampling experiment. (In several places in this book I will refer to the fact that with the computing power we have available today, we can answer more and more questions by repeated sampling instead of by solving equations. Statistical procedures really do change over time.) We'll begin with the sampling distribution of the mean of a single group. We can then move on to the sampling distribution of the differences between means. The sampling distribution of the mean is the distribution of means of an infinite number of random samples drawn from one population. Suppose we have a population with a known mean and standard deviation. (Here we will suppose that the population mean is 35 and the population standard deviation is 15, though what the values are is not critical to the logic of our argument. In the general case we rarely know the population standard deviation, but for our example suppose that we do.) Further suppose that we draw a very large number (theoretically an infinite number, but I drew 10,000) of random samples from this population, each sample consisting of 100 scores. In this example, for each of the 10,000 samples the R code shown below drew N = 100 observations from a normally distributed population with a mean of 35 and a standard deviation of 15. (I could have sampled from a population that is not normally distributed, but I wanted to keep this example uncomplicated.) I then repeated that process 9,999 more times and stored away all of those 10,000 sample means. (You might profitably repeat this procedure using a larger or smaller sample size, looking to see how the difference influences the resulting sampling distribution.) When I finished drawing the samples, I plotted the distribution of the means. The histogram of this distribution is shown on the left of Figure 5.1, with the Q-Q plot on the right. The code for doing this in R follows.
R Code

# Sampling distribution shown in Figure 5.1
nreps <- 10000                 # Number of replications
n <- 100                       # Size of individual samples
xbar <- numeric(nreps)         # Variable to store the sample means
par(mfrow = c(1,2))            # Set up the graphics display
for (i in 1:nreps) {
  sample <- rnorm(n = n, mean = 35, sd = 15)
  xbar[i] <- mean(sample)
}
# xbar now holds 10,000 elements
Mean <- round(mean(xbar), digits = 2)
StDev <- round(sd(xbar), digits = 2)
cat("The mean of the means is \n", Mean, '\n')
cat("The standard deviation of the means is \n", StDev, '\n')
hist(xbar, breaks = 50, main = "Distribution of Means",
     xlab = "Mean")
legend(29, 500, paste("Mean = ", Mean), bty = "n")
legend(29, 400, paste("St.Dev = ", StDev), bty = "n")
qqnorm(xbar, main = "Q-Q Plot for Distribution \n of Sample Means",
       xlab = "Obtained quantiles", ylab = "Expected quantiles")
qqline(xbar)

Figure 5.1 Distribution of sample means, each based on 10,000 samples of N = 100

I don't think that there is much doubt that this distribution is normally distributed. The Q-Q plot clearly tells us that it is. The center of this distribution is at 34.99, which is almost exactly the population mean. We can see from the figure on the left that sample means between 32 and 38, for example, are quite likely to occur when we sample from this population. We also can see that it is extremely unlikely that we would draw samples from this population with means of 40 or more. The fact that we know the kinds of values to expect for the means of samples drawn from this one population is going to allow us to turn the question around and ask whether an

obtained sample mean can be taken as evidence in favor of the hypothesis that we actually are sampling from this population. In addition to authoritarianism and attitudes toward out-groups, and memory for future activities, we will add a third example, which is one to which we can all relate. It involves those annoying people who spend what seems to us an unreasonable amount of time vacating the parking space we are waiting for. Ruback and Juieng (1997) ran a simple study in which they divided drivers into two groups of 100 participants each: those who had someone waiting for their space and those who did not. They then recorded the amount of time that it took the driver to leave the parking space. For those drivers who had no one waiting, it took an average of 32.15 seconds to leave the space. For those who did have someone waiting, it took an average of 39.03 seconds. The average standard deviation of leaving times within these two groups was 14.6 seconds. Notice that a driver took 6.88 seconds (or nearly half a standard deviation) longer to leave a space when someone was waiting for it. (If you think about it, 6.88 seconds is a long time if you are the person doing the waiting.) Here we have a case where we have two means, and we want to know about the sampling distribution of the difference between two means. Using a program similar to the one above, I drew 10,000 pairs of samples from two identical populations. Both population means were set at 35.6 seconds (the average of the two group means). The standard deviation was set at 14.6 (the common standard deviation of the two groups). Because I was sampling from identical populations, the two groups have the same population mean and standard deviation. The differences between these means are plotted in Figure 5.2. Remember that this is a distribution created by drawing from a case where the hypothesis of equal population means is true: both population means are 35.6.
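The text says only that this was done "using a program similar to the one above," so here is a minimal sketch of what that second program might look like. The seed and the loop structure are my own choices, not necessarily the author's.

```r
# Sketch of the two-group simulation behind Figure 5.2: on each
# replication, draw two samples from identical populations
# (mean 35.6, sd 14.6) and store the difference between their means.
set.seed(42)                    # arbitrary seed, for reproducibility
nreps <- 10000                  # number of replications
n     <- 100                    # drivers per group
diffs <- numeric(nreps)         # storage for the mean differences
for (i in 1:nreps) {
  g1 <- rnorm(n, mean = 35.6, sd = 14.6)   # "no one waiting" group
  g2 <- rnorm(n, mean = 35.6, sd = 14.6)   # "someone waiting" group
  diffs[i] <- mean(g1) - mean(g2)          # difference when H0 is true
}
# How often does a true-null difference reach the observed 6.88 seconds?
mean(abs(diffs) >= 6.88)
```

Across the 10,000 replications only a tiny fraction of these null differences reach 6.88 seconds, which is exactly the kind of comparison used in the text.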

Figure 5.2 Distribution of differences between means

Ruback and Juieng (1997) found a difference of 6.88 seconds in leaving times between the two conditions. It is quite clear from Figure 5.2 that this is very unlikely to have occurred if the true population means were equal. In fact, my sampling study found only 6 cases out of 10,000 in which the mean difference was more extreme than 6.88, for a probability of .0006. We will certainly feel justified in concluding that people take longer to leave their space, for whatever reason, when someone is waiting for it. We have just run our first null hypothesis test.

You should now have a good understanding of three important concepts. There is sampling error, which is random variability from one sample to another, either in terms of individual observations or in terms of a statistic, such as the mean. There is the sampling distribution, which is just the distribution of, for example, a sample mean, or a sample mean difference, when samples are repeatedly drawn from some population. And there is the standard error, which is the standard deviation of the corresponding sampling distribution. Figure 5.2 illustrates a sampling distribution of mean differences, and the variability within that distribution is sampling error. (Similarly, the standard deviation of the distribution of sample means in Figure 5.1, which was 1.5, is the standard error of the mean.)

THE ROLE OF SAMPLING DISTRIBUTIONS AND STANDARD ERRORS

The reason that we need the concepts of sampling distributions and standard errors is that we use them to calculate measures that will help us better understand our data. The initial impetus came from the idea of testing a hypothesis, which, in the case of the Ruback and Juieng study, posited that a population of drivers who had someone waiting and a population of drivers who had no one waiting would have identical means. I will begin with hypothesis testing because it lies at the heart of what we have been doing for many years, but the field and the coverage here have moved well beyond that point to include confidence intervals and effect sizes, which will be defined shortly.

5.5 TEST STATISTICS AND THEIR SAMPLING DISTRIBUTIONS

Although I have not used the term, what we did in the previous example was to reject the null hypothesis. We said that if the null hypothesis (equal population means) were true, we would almost never find the difference we observed. So we rejected that hypothesis in favor of one that said that the population means were not equal. If we had instead found a sample mean difference of .5 seconds, we would not have rejected the null hypothesis of equal population means. (Note where .5 would fall in Figure 5.2.) Knowing what the terms rejection and non-rejection mean is all well and good. But how do we get to that point? What do we do with our data to come up with a probability value such as the one we found in this example? In the not too distant future, we may well do what we did here, which is to draw a huge number of samples from equal populations. But the far more traditional approach is to run a statistical test, compute a test statistic, and evaluate that statistic.
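As a preview of that traditional approach, R's built-in t.test() function carries out all of those steps in one call. The data below are simulated stand-ins whose means and standard deviation roughly mimic the parking study; they are not the actual data.

```r
# Hedged illustration, not the authors' analysis: a two-sample t test
# on simulated data resembling the parking-lot study.
set.seed(1)                                 # arbitrary seed
no_wait <- rnorm(100, mean = 32.15, sd = 14.6)
waiting <- rnorm(100, mean = 39.03, sd = 14.6)
result <- t.test(waiting, no_wait)          # Welch two-sample t test
result$statistic                            # the test statistic, t
result$p.value                              # probability of such a t if H0 were true
```

The function returns both the test statistic and the probability of obtaining a statistic at least that extreme when the null hypothesis is true, which is precisely the evaluation described above.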
We have been discussing the sampling distribution of the mean, but the discussion would have been essentially the same had we dealt instead with the median, the variance, the range, the correlation coefficient (as in our authoritarianism example), proportions (as in our calling-Mom example), or any other statistic you care to consider. (Technically the shapes of these distributions would be different, but I am deliberately ignoring such issues in this chapter.) The statistics just mentioned usually are referred to as sample statistics because they describe characteristics of samples. There is a whole different class of statistics called test statistics, which are associated with specific statistical procedures and which have their own sampling distributions. Test statistics are statistics such as t, F, and χ2, which you have probably run across in the past. (If you are not familiar with them, don't worry; we will consider them separately in later chapters.) This is not the place to go into a detailed explanation of

any test statistic, but it is the place to point out that the sampling distributions for test statistics are obtained and used in essentially the same way as the sampling distribution of the mean. As an illustration, consider the sampling distribution of the statistic t, which will be discussed in Chapter 6. For those who are not familiar with the t test, it is sufficient to say that the t test is often used, among other things, to examine whether two samples were drawn from populations with the same means. Let μ1 and μ2 represent the means of the populations from which the two samples were drawn. The null hypothesis is the hypothesis that the two population means are equal, in other words, H0: μ1 = μ2 (or μ1 - μ2 = 0). (This is what we had in the previous example.) If we wished, we could empirically obtain the sampling distribution of t when H0 is true by drawing an infinite number of pairs of samples, all from two identical populations, calculating t for each pair of samples (by methods to be discussed later), and plotting the resulting values of t. In that case H0 must be true because we forced it to be true by drawing the samples from identical populations. The resulting distribution is the sampling distribution of t when H0 is true. If we later had two samples that produced a particular value of t, we would evaluate the null hypothesis by comparing our obtained t to the sampling distribution of t. We would reject the null hypothesis in favor of our research (alternative) hypothesis if our obtained t did not look like the kinds of t values that the sampling distribution told us to expect when the null hypothesis is true. I could rewrite the preceding paragraph, substituting χ2, or F, or any other test statistic in place of t, with only minor changes dealing with how the statistic is calculated.
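That thought experiment is easy to carry out by simulation. Here is a sketch; the sample sizes, population values, and seed are arbitrary choices of mine, with a finite 10,000 replications standing in for the "infinite number" of pairs of samples.

```r
# Empirically building the sampling distribution of t when H0 is true:
# both samples always come from the same population, so H0 is forced
# to be true and every nonzero t reflects sampling error alone.
set.seed(7)
nreps <- 10000
tvals <- numeric(nreps)
for (i in 1:nreps) {
  s1 <- rnorm(20, mean = 50, sd = 10)
  s2 <- rnorm(20, mean = 50, sd = 10)      # identical population
  tvals[i] <- t.test(s1, s2, var.equal = TRUE)$statistic
}
# The theoretical t distribution on 38 df predicts that about 5% of
# these simulated values should fall beyond the two-tailed .05 cutoffs.
crit <- qt(.975, df = 38)
mean(abs(tvals) > crit)
```

The empirical proportion comes out very close to .05, confirming that the simulated distribution matches the mathematically derived one.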
Thus, you can see that all sampling distributions can be obtained in basically the same way (calculate and plot an infinite number of statistics by sampling from identical populations). At the moment we won't actually draw all of those samples and compute the relevant test statistic, but we could do it that way.

5.6 MAKING DECISIONS ABOUT THE NULL HYPOTHESIS

Figure 5.2 included a test of a null hypothesis concerning the time it takes to leave a parking space. You should recall that we first drew pairs of samples from a population with a mean of 35.6 and a standard deviation of 14.6. Then we calculated the differences between pairs of means in each of 10,000 replications and plotted those. Then we discovered that under those conditions a difference

as large as the one that Ruback and Juieng found (6.88) would happen only about 6 times out of 10,000 trials, for a probability of .0006. That is such an unlikely finding that we concluded that our two means did not come from populations with the same mean. That is a nice straightforward example of how we carry out a statistical test. At this point we have to become involved in the decision-making aspects of hypothesis testing. We must decide whether an event with a probability of .0006 is sufficiently unlikely to cause us to reject H0. Here we traditionally fall back on arbitrary conventions that have been established (perhaps too rigidly) over the years. The rationale, or lack thereof, for these conventions will become clearer as we go along, but for the time being keep in mind that they are merely conventions, and many people object to such conventions.2 One convention calls for rejecting H0 if the probability under H0 is less than or equal to .05 (p ≤ .05), while another convention, one that is more conservative with respect to the probability (p) of rejecting H0, calls for rejecting H0 whenever the probability under H0 is less than or equal to .01. These values of .05 and .01 are often referred to as p values and represent the rejection level, or the significance level, of the test. (When we say that a difference is statistically significant at the .05 level, we mean that a difference that large would occur less than 5% of the time if the null were true.) Whenever the probability obtained under H0 is less than or equal to our predetermined significance level, we will reject H0. Another way of stating this is to say that any outcome whose probability under H0 is less than or equal to the significance level falls in the rejection region, since such an outcome leads us to reject H0. The phrase p value has almost come to be a derogatory term for those who object to null hypothesis testing, but it has played, and continues to play, an important role in statistics.
Don't underestimate its importance. For the purpose of setting a standard level of rejection for this book, we will generally use the p ≤ .05 level of statistical significance, keeping in mind that some people would consider this level to be

2 Cortina and Landis (2011) point out that such conventions do have an important advantage. They take away my role as the experimenter in deciding whether p = .08 is close enough for rejecting H0 and substitute a standard (often p < .05) that has been more-or-less set by the research community. It helps keep me honest.

too lenient. But we will not simply report that such a difference is significant at p < .05 and walk away. There is much more that we need to do, and we need to think carefully about what our ultimate conclusion will be. For our particular example we obtained a probability of p = .0006, which is clearly less than .05. We will probably conclude that we have reasonable evidence to decide that the scores for the two conditions were drawn from populations with different means. But then, as researchers in the behavioral sciences, we should look to build on that result in future research. Ruback and Juieng included two additional studies in their paper that helped to confirm the results that I have given, which is important added information about the general conclusions of this paper. In a more casual replication of this study, McKenzie (2009) reported similar results. The original study is often cited in discussions of territoriality.

5.7 TYPE I AND TYPE II ERRORS

At this point you should have a reasonable understanding of what we mean by a null hypothesis and the methods we have at our disposal to retain or reject that hypothesis. But there are additional statistical issues that come into play. Whenever we reach a decision with a statistical test, there is always a chance that our decision is the wrong one. While this is true of almost all decisions, statistical or otherwise, the statistician has one point in her favor that other decision makers normally lack. She not only makes a decision by some rational process, but she can also specify the conditional probabilities of a decision's being in error. In everyday life we make decisions with only subjective feelings about what is probably the right choice. The statistician, however, can state quite precisely her estimate of the probability that she would make an erroneous rejection of H0 if it were true.
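A short simulation shows where that precision comes from: once the decision rule is fixed in advance, the long-run rate of erroneous rejections under a true null is determined by the rule itself. The population values below are those of the parking example; the seed is my own choice.

```r
# Simulate the null distribution of mean differences for the parking
# example, find the cutoff for the top 5%, and check the long-run
# rate at which true-null differences exceed it.
set.seed(3)
null_diffs <- replicate(10000, {
  mean(rnorm(100, mean = 35.6, sd = 14.6)) -
  mean(rnorm(100, mean = 35.6, sd = 14.6))
})
critical <- quantile(null_diffs, probs = .95)  # one-tailed 5% cutoff
# By construction, a difference exceeds this cutoff about 5% of the
# time even though the two population means are identical.
mean(null_diffs > critical)
```

The statistician can quote that 5% figure in advance precisely because she chose the cutoff to make it so.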
This ability to specify the probability of erroneously rejecting a true H0 follows directly from the logic of hypothesis testing. (You will soon see me back off from the statement that she can specify that probability precisely, but under the general interpretation of hypothesis testing, we operate as if that is the case.) Consider the parking lot example again, this time ignoring the difference in means that Ruback and Juieng found. The situation is diagrammed in Figure 5.3, in which the distribution is the distribution of differences in sample means when the null hypothesis is true, and the shaded

portion represents the upper 5% of the distribution. The actual score that cuts off the highest 5% is called the critical value. Critical values are those values of X (the dependent variable or the test statistic) that describe the boundary or boundaries of the rejection region(s). For this particular example the critical value is the point marking off the shaded area in Figure 5.3.

Figure 5.3 Upper 5% of differences in means

Assume that we have a decision rule that says to reject H0 whenever an outcome falls in the highest 5% of the distribution. This is the rejection level of the test. We will reject H0 whenever the difference in means falls in the shaded area; that is, whenever a difference as large as the one we found has a probability of .05 or less of coming from the situation where the population means are equal. (Here I have represented that probability with the Greek letter α (alpha), which is the traditional notation.) Yet by the very nature of our procedure, 5% of the differences in means, when the presence of a waiting car has no effect on the time to leave, will themselves fall in the shaded portion. Thus if we actually have a situation where the null hypothesis of no mean difference is true, we stand a 5% chance of an obtained sample mean difference being in the shaded tail of the distribution, causing us erroneously to reject the null hypothesis. This kind of error (rejecting H0 when in fact it is true) is called a Type I error, and its conditional probability (the probability of rejecting the null hypothesis given that it is true) is α, the size of the rejection region. In the future, whenever we represent a probability by α, we will be referring to the probability of a Type I error, that is, of erroneously rejecting the null hypothesis. Keep in mind the conditional nature of the probability of a Type I error. This means that you should be sure you understand that when we speak of a Type I error we mean the probability of

16 rejecting H given that it is true. We are not saying that we will reject H on 5% of the hypotheses we test. We would hope to run experiments on important and meaningful variables and, therefore, to reject H often. But when we speak of a Type I error, we are speaking only about erroneously rejecting H in those situations in which the null hypothesis happens to be true. You might feel that a 5% chance of making an error is too great a risk to take and suggest that we make our criterion much more stringent, by rejecting, for example, only the lowest 1% of the distribution. This procedure is perfectly legitimate, but realize that the more stringent you make your criterion, the more likely you are to make another kind of error failing to reject H when it is in fact false and H 1 is true. This type of error is called a Type II error, and its probability is symbolized by β (beta). The major difficulty in terms of Type II errors stems from the fact that if H is false, we almost never know what the true distribution (the distribution under H 1 ) would look like for the population from which our data came. In other words, we never know exactly how false the null hypothesis is. We know only the distribution of scores under H. Put in the present context, we know the distribution of differences in means when having someone waiting for a parking space makes no difference in response time, but we don't know what the difference would be if waiting did make a difference. This situation is illustrated in Figure 5.4, in which the distribution labeled H represents the distribution of mean differences when the null hypothesis is true, the distribution labeled H 1 represents our hypothetical distribution of differences when the null hypothesis is false, and the alternative hypothesis ( H 1 ) is true. Remember that the distribution for H1 is only hypothetical. 
We really do not know the location of that distribution, other than that it lies higher (greater differences) than the distribution under H0. (I have arbitrarily drawn that distribution so that its mean is 2 units above the mean under H0.)
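The arithmetic behind the Figure 5.3 cutoff is easy to sketch. The following Python fragment assumes a normal sampling distribution of mean differences centered at 0; the standard error of 2.1 is my own illustrative choice (the text never reports one), picked because it puts the upper 5% cutoff near the 3.5 used in the example:

```python
from statistics import NormalDist

# Sampling distribution of the difference in means under H0:
# centered at 0. The standard error of 2.1 is an illustrative
# assumption, not a value given in the text.
null_dist = NormalDist(mu=0.0, sigma=2.1)

alpha = 0.05                             # one-tailed rejection level
critical = null_dist.inv_cdf(1 - alpha)  # cutoff for the upper 5%

print(round(critical, 2))  # 3.45, close to the 3.5 in the example
```

Any obtained difference in means larger than this cutoff falls in the shaded rejection region, so under H0 we would (erroneously) reject 5% of the time.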

Figure 5.4 Distribution of mean differences under H0 and H1

As I have said, the darkly shaded portion in the top half of Figure 5.4 represents the rejection region. Any observation falling in that area (i.e., to the right of about 3.5) would lead to rejection of the null hypothesis. If the null hypothesis is true, we know that our observation will fall in this area 5% of the time. Thus, we will make a Type I error 5% of the time. The distribution labeled H1 represents the expected distribution of sample means if the two population means differ by two seconds. (And remember that this distribution would be displaced left or right if I had chosen a different mean difference for the population means.) The cross-hatched portion in the bottom half of Figure 5.4 represents the probability (β) of a Type II error. This is the situation in which having someone waiting does make a difference in leaving time, but the mean difference is not sufficiently large to cause us to reject H0. In the particular situation illustrated in Figure 5.4, where I made up the mean and variance, we can in fact calculate β by using the normal distribution to find the probability of obtaining a difference smaller than 3.5 (the critical value) when H1 is true. The actual calculation is not important for your understanding of β, because this chapter was designed specifically to avoid calculation. I will simply state that this probability (i.e., the area labeled β) is .76. Thus for this example, on 76% of the occasions when waiting times in the population actually differ by two seconds (i.e., H1 is actually true), we will make a Type II error by failing to reject H0 when it is false. From Figure 5.4 you can see that if we were to reduce the level of α (the probability of a Type I error) from .05 to .01 by moving the critical value to the right, we would reduce the probability of Type I errors but increase the probability of Type II errors. Setting α at .01 would mean that β = .92. Obviously there is room for debate over what level of significance to use. The decision rests primarily on your opinion concerning the relative importance of Type I and Type II errors for the kind of study you are conducting. If it were important to avoid Type I errors (such as falsely claiming that the average driver is rude), then you would set a stringent (i.e., small) level of α. If, on the other hand, you want to avoid Type II errors (patting everyone on the head for being polite when actually they are not), you might set a fairly high level of α. (Setting α = .20 in this example would reduce β to .46.)
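β itself is just an area under the assumed H1 distribution. Here is a sketch under the same illustrative assumptions as before (the mean difference of 2 and the critical value of 3.5 come from the figure; the standard error of 2.1 is again my own choice, not a value reported in the text, picked so the result lands near the β = .76 quoted above):

```python
from statistics import NormalDist

critical = 3.5  # critical value from the text's example

# Hypothetical H1 distribution: population means differ by 2 seconds.
# The standard error of 2.1 is an assumed value, not from the text.
h1_dist = NormalDist(mu=2.0, sigma=2.1)

beta = h1_dist.cdf(critical)  # P(fail to reject H0 | H1 true)
power = 1 - beta              # P(reject H0 | H1 true)

print(round(beta, 2), round(power, 2))  # 0.76 0.24
```

Sliding the critical value right (smaller α) makes this area, and hence β, larger, which is exactly the Type I/Type II tradeoff the figure illustrates.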
Unfortunately, in practice most of us choose an arbitrary level of α, such as .05 or .01, and simply ignore β. In many cases this may be all you can do. (In fact you will probably use the alpha level that your instructor recommends.) In other cases, however, there is much more you can do, as you will see in Chapter 8. I should stress again that Figure 5.4 is purely hypothetical. I was able to draw the figure only because I arbitrarily decided that the population means differed by 2 units and that the standard deviation of each population was 15. The answers would be different if I had chosen to draw it with a mean difference of 2.5 and/or a different standard deviation. In most everyday situations we do not know the mean and the variance of that distribution and can make only educated guesses, thus providing only crude estimates of β. On occasion, and this is especially true in medical research, we can select a value of μ under H1 that represents the minimum difference we would like to be able to detect, since larger differences will have even smaller βs. In this situation we don't care if a drug, for example, makes a very small difference that is of no practical importance. We want to look only for differences that are meaningful.

From this discussion of Type I and Type II errors we can summarize the decision-making process with a simple table. Table 5.1 presents the four possible outcomes of an experiment. The items in this table should be self-explanatory, but the one concept that we have not discussed is power. The power of a test is the probability of rejecting H0 when it is actually false. Because the probability of failing to reject a false H0 is β, power must equal 1 - β. I will discuss power and its calculation in Chapter 8.

Table 5.1 Possible outcomes of the decision-making process

                        True State of the World
  Decision              H0 True                        H0 False
  Reject H0             Type I error (p = α)           Correct decision (p = 1 - β = Power)
  Don't reject H0       Correct decision (p = 1 - α)   Type II error (p = β)

5.8 ONE- AND TWO-TAILED TESTS

We have one more concept to cover and then we can move on. The preceding discussion brings us to a consideration of one- and two-tailed tests. In our parking lot example we were concerned with whether people took longer when there was someone waiting, and we decided to reject H0 only if those drivers took longer. In fact, I chose that approach simply to make the example clearer. However, suppose our drivers were really very thoughtful and left several seconds sooner when someone was waiting. Although this is an extremely unlikely event to observe if the null hypothesis is true, it would not fall in the rejection region, which consisted solely of long times. As a result we find ourselves in the position of not rejecting H0 in the face of a piece of data that is very unlikely, but not in the direction expected. The question then arises as to how we can protect ourselves against this type of situation (if protection is thought necessary). One answer is to specify before we run the experiment that we are going to reject a given percentage (say 5%) of the extreme outcomes in each direction, both those that are extremely high and those that are extremely low. But if we reject the lowest 5% and the highest 5%, then we would in fact reject H0 a total of 10% of the time when it is actually true; that is, α = .10. That is not going to work, because we are rarely willing to work with α as high as .10 and prefer to see it set no higher than .05. The way to accomplish this is to reject the lowest 2.5% and the highest 2.5%, making a total of 5%. The situation in which we reject H0 for only the lowest (or only the highest) mean differences is referred to as a one-tailed, or directional, test. We make a prediction of the direction in which the individual will differ from the mean, and our rejection region is located in only one tail of the distribution. When we reject extremes in both tails, we have what is called a two-tailed, or nondirectional, test. It is important to keep in mind that while we gain something with a two-tailed test (the ability to reject the null hypothesis for extreme scores in either direction), we also lose something. A score that would fall in the 5% rejection region of a one-tailed test may not fall in the rejection region of the corresponding two-tailed test, because now we reject only 2.5% in each tail. In the parking example I chose a one-tailed test because it simplified the example.
But that is not a rational way of making such a choice for an actual experiment. In many situations we do not know which tail of the distribution is important (or both are), and we need to guard against extremes in either tail. Such a situation might arise when we are considering a campaign to persuade children not to start smoking. We might find that the campaign leads to a decrease in the incidence of smoking. Or, we might find that campaigns run by adults to persuade children not to smoke simply make smoking more attractive and exciting, leading to an increase in the number of children smoking. In either case we would want to reject H0.
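The one-tailed/two-tailed tradeoff shows up directly in the cutoffs. A sketch on the standard normal (z) scale, rather than in the parking-lot units, comparing the two rejection boundaries at α = .05:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution
alpha = 0.05

one_tailed = z.inv_cdf(1 - alpha)      # upper 5% cutoff
two_tailed = z.inv_cdf(1 - alpha / 2)  # upper 2.5% cutoff

print(round(one_tailed, 3))  # 1.645
print(round(two_tailed, 3))  # 1.96
# A z score between 1.645 and 1.96 would be significant with a
# one-tailed test (in the predicted direction) but not with a
# two-tailed test at the same overall alpha.
```

Splitting the 5% between the two tails pushes each cutoff further out, which is exactly what we "lose" with a two-tailed test.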

In general, two-tailed tests are far more common than one-tailed tests for several reasons. First, the investigator may have no idea what the data will look like and therefore has to be prepared for any eventuality. Although this situation is rare, it does occur in some exploratory work. Moreover, a number of people have suggested that when you are trying to replicate an experiment that you or someone else has already run, the original experiment should be evaluated with a two-tailed test, but the replication can use a one-tailed test because you now have a direction in mind. Another common reason for preferring two-tailed tests is that the investigators are reasonably sure the data will come out one way but want to cover themselves in the event that they are wrong. This type of situation arises more often than you might think. (Carefully formed hypotheses have an annoying habit of being phrased in the wrong direction, for reasons that seem so obvious after the event.) The smoking example is a case in point: there is some evidence that poorly contrived antismoking campaigns actually do more harm than good. A frequent question that arises when the data may come out the other way around is, "Why not plan to run a one-tailed test and then, if the data come out the other way, just change the test to a two-tailed test?" This kind of approach just won't work. If you start an experiment with the extreme 5% of the left-hand tail as your rejection region and then turn around and reject any outcome that happens to fall in the extreme 2.5% of the right-hand tail, you are working at the 7.5% level. In that situation you will reject 5% of the outcomes in one direction (assuming that the data fall in the desired tail), and you are also willing to reject 2.5% of the outcomes in the other direction (when the data are in the unexpected direction). There is no denying that 5% + 2.5% = 7.5%.

To put it another way, would you be willing to flip a coin for an ice cream cone if I have chosen heads but also reserved the right to switch to tails after I see how the coin lands? Or would you think it fair of me to shout "Two out of three!" when the coin toss comes up in your favor? (I used to do that all the time when I was a child, and I often got away with it. I guess that my playmates were not statisticians.) You would object to both of these strategies, and you should. By the same logic, the choice between a one-tailed test and a two-tailed one must be made before the data are collected. This is also one of the reasons that two-tailed tests are usually chosen. A third reason for two-tailed tests concerns cases where we can't really define a one-tailed test. One example is the case in which we have more than two groups. We will consider this situation at length when we discuss the analysis of variance. When we have more than two groups a one-tailed test is pretty much undefined, and we will actually have a multi-tailed test. And when we come to the chi-square test in Chapter 7, the way that the test statistic is defined precludes the idea of a one-tailed test unless we engage in additional steps, which I would usually not suggest. Although the preceding discussion argues in favor of two-tailed tests, and although in this book we generally confine ourselves to such procedures, there are no hard-and-fast rules. The final decision depends on what you already know about the relative severity of different kinds of errors. It is important to keep in mind that with respect to a given tail of a distribution, the difference between a one-tailed test and a two-tailed test is simply that the latter uses a different cutoff. A two-tailed test at α = .05 is more liberal than a one-tailed test at α = .01.

If you have a sound grasp of the logic of testing hypotheses by use of sampling distributions, the remainder of this course will be relatively simple. For any new statistic you encounter, you will need to ask only two basic questions:

1. How, and with what assumptions, is the statistic calculated?
2. What does the statistic's sampling distribution look like under H0?

If you know the answers to these two questions, your test is accomplished by calculating the test statistic for the data at hand and comparing the statistic to its sampling distribution. Because the relevant sampling distributions are tabled in the appendices, or are even available on your cell phone, all you really need to know is which test is appropriate for a particular situation and how to calculate its test statistic. (Of course there is far more to statistics than just hypothesis testing, so perhaps I'm doing a bit of overselling here. There is a great deal to understanding the field of statistics beyond how to calculate, and evaluate, a specific statistical test. Calculation is the easy part, especially with modern computer software.)

5.9 RETAINING OR REJECTING THE NULL HYPOTHESIS

As I have tried to make clear, one of the major goals for behavioral scientists is to evaluate the null hypothesis. No matter your view on using statistical tests to draw conclusions about the variables under study, almost every research paper retains that focus. There is much more to do, but this seems to be the first step. And to understand that, you really need to understand what the fuss is about. I raise the issue here because it applies to almost everything that follows in the book. It is not limited to a few statistical procedures.

NHST: DID THE EXPERIMENT WORK?

Ever since the fights in the 1920s and 1930s between Sir Ronald Fisher, on the one hand, and Neyman and Pearson, on the other, statistics has been involved in one way or another with a very messy issue called "hypothesis testing." As I said, the more current name for this debate is "null hypothesis significance testing" (NHST). What we have today is an amalgam of both sets of ideas, and it is an amalgam that seems to please no one. There have been many papers in the last few years debating the proper way to approach the analysis of data from an experiment, and the debate won't end any time soon, although it is encouraging that there really has been progress. Back in the 1990s the American Psychological Association formed the Task Force on Statistical Inference to deal with this topic. Some people hoped that the task force would suggest banning all statistical tests. Instead, the task force kept hypothesis testing alive, and did something even more useful. They published a report (Wilkinson et al., 1999) describing what people should do when examining and reporting data. They offered many very good, and very clear, suggestions on how an author should approach the whole problem of analyzing data and writing up a report. Hypothesis testing played only a small role in that discussion. When I went back and read it again, I was quite surprised at just how good a report it was. There is far more to conducting and reporting a study than people realize, and statistical testing is only a fairly small part of that. I strongly recommend that you look at that paper; it is available online. But first, a little history to put this issue in perspective.
Back in the 1920s Sir Ronald Fisher approached the problem of deciding whether experimental results were meaningful by positing the existence of a "null hypothesis." (Fisher did not use that term, but it is consistent with his ideas.) Suppose that you are interested in determining whether a new fertilizer produces more wheat than the old one that you have been using for years. (Fisher started his career in agriculture, which is why I chose this example.) You plant your wheat, let it grow, harvest it, and measure the result, for example in bushels per acre. Fisher imagined a null hypothesis that said that the new fertilizer did not differ from the old, and that the mean bushels of wheat it produced were the same as the mean bushels for the old fertilizer. We can abbreviate this as H0: μold = μnew, or equivalently H0: (μold - μnew) = 0, where the μ's refer to the population means.
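Fisher's null hypothesis is exactly what a modern two-sample test evaluates. As a hedged sketch (the yield numbers are made up, and the pooled two-sample t statistic shown here is a modern formulation rather than Fisher's own procedure):

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical bushels-per-acre yields for the two fertilizers
# (made-up data for illustration only).
old = [29, 31, 30, 28, 32]
new = [33, 35, 34, 32, 36]

# Pooled two-sample t statistic for H0: mu_old = mu_new.
n1, n2 = len(old), len(new)
pooled_var = ((n1 - 1) * variance(old) + (n2 - 1) * variance(new)) / (n1 + n2 - 2)
t = (mean(new) - mean(old)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

print(round(t, 2))  # 4.0
# The tabled two-tailed critical value of t for df = 8 at alpha = .05
# is 2.306, so these made-up data would lead us to reject H0.
```

The logic is the one laid out above: compute the statistic, then compare it with its sampling distribution under H0.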


More information

CHAPTER 16: IS SCIENCE LOGICAL?

CHAPTER 16: IS SCIENCE LOGICAL? INTERPRETATION AND CONCLUSIONS CHAPTER 16: IS SCIENCE LOGICAL? An earlier chapter revealed that all models are false. This chapter reveals another blemish on the face of science -- how we decide the fate

More information

Macmillan/McGraw-Hill SCIENCE: A CLOSER LOOK 2011, Grade 3 Correlated with Common Core State Standards, Grade 3

Macmillan/McGraw-Hill SCIENCE: A CLOSER LOOK 2011, Grade 3 Correlated with Common Core State Standards, Grade 3 Macmillan/McGraw-Hill SCIENCE: A CLOSER LOOK 2011, Grade 3 Common Core State Standards for Literacy in History/Social Studies, Science, and Technical Subjects, Grades K-5 English Language Arts Standards»

More information

THE ROLE OF COHERENCE OF EVIDENCE IN THE NON- DYNAMIC MODEL OF CONFIRMATION TOMOJI SHOGENJI

THE ROLE OF COHERENCE OF EVIDENCE IN THE NON- DYNAMIC MODEL OF CONFIRMATION TOMOJI SHOGENJI Page 1 To appear in Erkenntnis THE ROLE OF COHERENCE OF EVIDENCE IN THE NON- DYNAMIC MODEL OF CONFIRMATION TOMOJI SHOGENJI ABSTRACT This paper examines the role of coherence of evidence in what I call

More information

Grade 7 Math Connects Suggested Course Outline for Schooling at Home 132 lessons

Grade 7 Math Connects Suggested Course Outline for Schooling at Home 132 lessons Grade 7 Math Connects Suggested Course Outline for Schooling at Home 132 lessons I. Introduction: (1 day) Look at p. 1 in the textbook with your child and learn how to use the math book effectively. DO:

More information

CAUSATION 1 THE BASICS OF CAUSATION

CAUSATION 1 THE BASICS OF CAUSATION CAUSATION 1 A founder of the study of international relations, E. H. Carr, once said: The study of history is a study of causes. 2 Because a basis for thinking about international affairs is history, he

More information

This report is organized in four sections. The first section discusses the sample design. The next

This report is organized in four sections. The first section discusses the sample design. The next 2 This report is organized in four sections. The first section discusses the sample design. The next section describes data collection and fielding. The final two sections address weighting procedures

More information

Ace the Bold Face Sample Copy Not for Sale

Ace the Bold Face Sample Copy Not for Sale Ace the Bold Face Sample Copy Not for Sale GMAT and GMAC are registered trademarks of the Graduate Management Admission Council which neither sponsors nor endorses this product 3 Copyright, Legal Notice

More information

Religious affiliation, religious milieu, and contraceptive use in Nigeria (extended abstract)

Religious affiliation, religious milieu, and contraceptive use in Nigeria (extended abstract) Victor Agadjanian Scott Yabiku Arizona State University Religious affiliation, religious milieu, and contraceptive use in Nigeria (extended abstract) Introduction Religion has played an increasing role

More information

7AAN2004 Early Modern Philosophy report on summative essays

7AAN2004 Early Modern Philosophy report on summative essays 7AAN2004 Early Modern Philosophy report on summative essays On the whole, the essays twelve in all were pretty good. The marks ranged from 57% to 75%, and there were indeed four essays, a full third of

More information

On the Verge of Walking Away? American Teens, Communication with God, & Temptations

On the Verge of Walking Away? American Teens, Communication with God, & Temptations On the Verge of Walking Away? American Teens, Communication with God, & Temptations May 2009 1 On the Verge of Walking Away? American Teens, Communication with God, & Daily Temptations Recent studies reveal

More information

The Fifth National Survey of Religion and Politics: A Baseline for the 2008 Presidential Election. John C. Green

The Fifth National Survey of Religion and Politics: A Baseline for the 2008 Presidential Election. John C. Green The Fifth National Survey of Religion and Politics: A Baseline for the 2008 Presidential Election John C. Green Ray C. Bliss Institute of Applied Politics University of Akron (Email: green@uakron.edu;

More information

Grade 6 correlated to Illinois Learning Standards for Mathematics

Grade 6 correlated to Illinois Learning Standards for Mathematics STATE Goal 6: Demonstrate and apply a knowledge and sense of numbers, including numeration and operations (addition, subtraction, multiplication, division), patterns, ratios and proportions. A. Demonstrate

More information

Scientific errors should be controlled, not prevented. Daniel Eindhoven University of Technology

Scientific errors should be controlled, not prevented. Daniel Eindhoven University of Technology Scientific errors should be controlled, not prevented Daniel Lakens @Lakens Eindhoven University of Technology 1) Error control is the central aim of empirical science. 2) We need statistical decision

More information

Probability Foundations for Electrical Engineers Prof. Krishna Jagannathan Department of Electrical Engineering Indian Institute of Technology, Madras

Probability Foundations for Electrical Engineers Prof. Krishna Jagannathan Department of Electrical Engineering Indian Institute of Technology, Madras Probability Foundations for Electrical Engineers Prof. Krishna Jagannathan Department of Electrical Engineering Indian Institute of Technology, Madras Lecture - 1 Introduction Welcome, this is Probability

More information

THE TENDENCY TO CERTAINTY IN RELIGIOUS BELIEF.

THE TENDENCY TO CERTAINTY IN RELIGIOUS BELIEF. THE TENDENCY TO CERTAINTY IN RELIGIOUS BELIEF. BY ROBERT H. THOULESS. (From the Department of Psychology, Glasgow University.) First published in British Journal of Psychology, XXVI, pp. 16-31, 1935. I.

More information

Unit. Science and Hypothesis. Downloaded from Downloaded from Why Hypothesis? What is a Hypothesis?

Unit. Science and Hypothesis. Downloaded from  Downloaded from  Why Hypothesis? What is a Hypothesis? Why Hypothesis? Unit 3 Science and Hypothesis All men, unlike animals, are born with a capacity "to reflect". This intellectual curiosity amongst others, takes a standard form such as "Why so-and-so is

More information

Why Good Science Is Not Value-Free

Why Good Science Is Not Value-Free Why Good Science Is Not Value-Free Karim Bschir, Dep. of Humanities, Social and Political Sciences, ETH Zurich FPF 2017 Workshop, Zurich Scientific Challenges in the Risk Assessment of Food Contact Materials

More information

P 97 Personality and the Practice of Ministry

P 97 Personality and the Practice of Ministry P 97 Personality and the Practice of Ministry Statistical Tables Further Resources The accompanying Grove Pastoral booklet has been written as far as possible to make sense to readers who are unfamiliar

More information

How many imputations do you need? A two stage calculation using a quadratic rule

How many imputations do you need? A two stage calculation using a quadratic rule Sociological Methods and Research, in press 2018 How many imputations do you need? A two stage calculation using a quadratic rule Paul T. von Hippel University of Texas, Austin Abstract 0F When using multiple

More information

Marcello Pagano [JOTTER WEEK 5 SAMPLING DISTRIBUTIONS ] Central Limit Theorem, Confidence Intervals and Hypothesis Testing

Marcello Pagano [JOTTER WEEK 5 SAMPLING DISTRIBUTIONS ] Central Limit Theorem, Confidence Intervals and Hypothesis Testing Marcello Pagano [JOTTER WEEK 5 SAMPLING DISTRIBUTIONS ] Central Limit Theorem, Confidence Intervals and Hypothesis Testing Inference This is when the magic starts happening. Statistical Inference Use of

More information

But we may go further: not only Jones, but no actual man, enters into my statement. This becomes obvious when the statement is false, since then

But we may go further: not only Jones, but no actual man, enters into my statement. This becomes obvious when the statement is false, since then CHAPTER XVI DESCRIPTIONS We dealt in the preceding chapter with the words all and some; in this chapter we shall consider the word the in the singular, and in the next chapter we shall consider the word

More information

Introduction Questions to Ask in Judging Whether A Really Causes B

Introduction Questions to Ask in Judging Whether A Really Causes B 1 Introduction We live in an age when the boundaries between science and science fiction are becoming increasingly blurred. It sometimes seems that nothing is too strange to be true. How can we decide

More information

2.1 Review. 2.2 Inference and justifications

2.1 Review. 2.2 Inference and justifications Applied Logic Lecture 2: Evidence Semantics for Intuitionistic Propositional Logic Formal logic and evidence CS 4860 Fall 2012 Tuesday, August 28, 2012 2.1 Review The purpose of logic is to make reasoning

More information

Same-different and A-not A tests with sensr. Same-Different and the Degree-of-Difference tests. Outline. Christine Borgen Linander

Same-different and A-not A tests with sensr. Same-Different and the Degree-of-Difference tests. Outline. Christine Borgen Linander Same-different and -not tests with sensr Christine Borgen Linander DTU Compute Section for Statistics Technical University of Denmark chjo@dtu.dk huge thank to a former colleague of mine Rune H B Christensen.

More information

ANSWER SHEET FINAL EXAM MATH 111 SPRING 2009 (PRINT ABOVE IN LARGE CAPITALS) CIRCLE LECTURE HOUR 10AM 2PM FIRST NAME: (PRINT ABOVE IN CAPITALS)

ANSWER SHEET FINAL EXAM MATH 111 SPRING 2009 (PRINT ABOVE IN LARGE CAPITALS) CIRCLE LECTURE HOUR 10AM 2PM FIRST NAME: (PRINT ABOVE IN CAPITALS) ANSWER SHEET FINAL EXAM MATH 111 SPRING 2009 FRIDAY 1 MAY 2009 LAST NAME: (PRINT ABOVE IN LARGE CAPITALS) CIRCLE LECTURE HOUR 10AM 2PM FIRST NAME: (PRINT ABOVE IN CAPITALS) CIRCLE LAB DAY: TUESDAY THURSDAY

More information

I think, therefore I am. - Rene Descartes

I think, therefore I am. - Rene Descartes CRITICAL THINKING Sitting on top of your shoulders is one of the finest computers on the earth. But, like any other muscle in your body, it needs to be exercised to work its best. That exercise is called

More information

It is One Tailed F-test since the variance of treatment is expected to be large if the null hypothesis is rejected.

It is One Tailed F-test since the variance of treatment is expected to be large if the null hypothesis is rejected. EXST 7014 Experimental Statistics II, Fall 2018 Lab 10: ANOVA and Post ANOVA Test Due: 31 st October 2018 OBJECTIVES Analysis of variance (ANOVA) is the most commonly used technique for comparing the means

More information

HAS DAVID HOWDEN VINDICATED RICHARD VON MISES S DEFINITION OF PROBABILITY?

HAS DAVID HOWDEN VINDICATED RICHARD VON MISES S DEFINITION OF PROBABILITY? LIBERTARIAN PAPERS VOL. 1, ART. NO. 44 (2009) HAS DAVID HOWDEN VINDICATED RICHARD VON MISES S DEFINITION OF PROBABILITY? MARK R. CROVELLI * Introduction IN MY RECENT ARTICLE on these pages entitled On

More information

It Ain t What You Prove, It s the Way That You Prove It. a play by Chris Binge

It Ain t What You Prove, It s the Way That You Prove It. a play by Chris Binge It Ain t What You Prove, It s the Way That You Prove It a play by Chris Binge (From Alchin, Nicholas. Theory of Knowledge. London: John Murray, 2003. Pp. 66-69.) Teacher: Good afternoon class. For homework

More information

Classroom Voting Questions: Statistics

Classroom Voting Questions: Statistics Classroom Voting Questions: Statistics General Probability Rules 1. In a certain semester, 500 students enrolled in both Calculus I and Physics I. Of these students, 82 got an A in calculus, 73 got an

More information

MLLunsford, Spring Activity: Conditional Probability and The Law of Total Probability

MLLunsford, Spring Activity: Conditional Probability and The Law of Total Probability MLLunsford, Spring 2003 1 Activity: Conditional Probability and The Law of Total Probability Concepts: Conditional Probability, Independent Events, the Multiplication Rule, the Law of Total Probability

More information

PHI 1700: Global Ethics

PHI 1700: Global Ethics PHI 1700: Global Ethics Session 3 February 11th, 2016 Harman, Ethics and Observation 1 (finishing up our All About Arguments discussion) A common theme linking many of the fallacies we covered is that

More information

On the Relationship between Religiosity and Ideology

On the Relationship between Religiosity and Ideology Curt Raney Introduction to Data Analysis Spring 1997 Word Count: 1,583 On the Relationship between Religiosity and Ideology Abstract This paper reports the results of a survey of students at a small college

More information

Philosophy 148 Announcements & Such. Inverse Probability and Bayes s Theorem II. Inverse Probability and Bayes s Theorem III

Philosophy 148 Announcements & Such. Inverse Probability and Bayes s Theorem II. Inverse Probability and Bayes s Theorem III Branden Fitelson Philosophy 148 Lecture 1 Branden Fitelson Philosophy 148 Lecture 2 Philosophy 148 Announcements & Such Administrative Stuff I ll be using a straight grading scale for this course. Here

More information

I thought I should expand this population approach somewhat: P t = P0e is the equation which describes population growth.

I thought I should expand this population approach somewhat: P t = P0e is the equation which describes population growth. I thought I should expand this population approach somewhat: P t = P0e is the equation which describes population growth. To head off the most common objections:! This does take into account the death

More information

Six Sigma Prof. Dr. T. P. Bagchi Department of Management Indian Institute of Technology, Kharagpur. Lecture No. # 18 Acceptance Sampling

Six Sigma Prof. Dr. T. P. Bagchi Department of Management Indian Institute of Technology, Kharagpur. Lecture No. # 18 Acceptance Sampling Six Sigma Prof. Dr. T. P. Bagchi Department of Management Indian Institute of Technology, Kharagpur Lecture No. # 18 Acceptance Sampling Good afternoon, we begin today we continue with our session on Six

More information

Is it rational to have faith? Looking for new evidence, Good s Theorem, and Risk Aversion. Lara Buchak UC Berkeley

Is it rational to have faith? Looking for new evidence, Good s Theorem, and Risk Aversion. Lara Buchak UC Berkeley Is it rational to have faith? Looking for new evidence, Good s Theorem, and Risk Aversion. Lara Buchak UC Berkeley buchak@berkeley.edu *Special thanks to Branden Fitelson, who unfortunately couldn t be

More information

Okay, good afternoon everybody. Hope everyone can hear me. Ronet, can you hear me okay?

Okay, good afternoon everybody. Hope everyone can hear me. Ronet, can you hear me okay? Okay, good afternoon everybody. Hope everyone can hear me. Ronet, can you hear me okay? I can. Okay. Great. Can you hear me? Yeah. I can hear you. Wonderful. Well again, good afternoon everyone. My name

More information

In Our Own Words 2000 Research Study

In Our Own Words 2000 Research Study The Death Penalty and Selected Factors from the In Our Own Words 2000 Research Study Prepared on July 25 th, 2001 DEATH PENALTY AND SELECTED FACTORS 2 WHAT BRINGS US TOGETHER: A PRESENTATION OF THE IOOW

More information

The Birthday Problem

The Birthday Problem The Birthday Problem In 1939, a mathematician named Richard von Mises proposed what we call today the birthday problem. He asked: How many people must be in a room before the probability that two share

More information

Argument Writing. Whooohoo!! Argument instruction is necessary * Argument comprehension is required in school assignments, standardized testing, job

Argument Writing. Whooohoo!! Argument instruction is necessary * Argument comprehension is required in school assignments, standardized testing, job Argument Writing Whooohoo!! Argument instruction is necessary * Argument comprehension is required in school assignments, standardized testing, job promotion as well as political and personal decision-making

More information

II Plenary discussion of Expertise and the Global Warming debate.

II Plenary discussion of Expertise and the Global Warming debate. Thinking Straight Critical Reasoning WS 9-1 May 27, 2008 I. A. (Individually ) review and mark the answers for the assignment given on the last pages: (two points each for reconstruction and evaluation,

More information

Macmillan/McGraw-Hill SCIENCE: A CLOSER LOOK 2011, Grade 4 Correlated with Common Core State Standards, Grade 4

Macmillan/McGraw-Hill SCIENCE: A CLOSER LOOK 2011, Grade 4 Correlated with Common Core State Standards, Grade 4 Macmillan/McGraw-Hill SCIENCE: A CLOSER LOOK 2011, Grade 4 Common Core State Standards for Literacy in History/Social Studies, Science, and Technical Subjects, Grades K-5 English Language Arts Standards»

More information

CSSS/SOC/STAT 321 Case-Based Statistics I. Introduction to Probability

CSSS/SOC/STAT 321 Case-Based Statistics I. Introduction to Probability CSSS/SOC/STAT 321 Case-Based Statistics I Introduction to Probability Christopher Adolph Department of Political Science and Center for Statistics and the Social Sciences University of Washington, Seattle

More information

Computational Learning Theory: Agnostic Learning

Computational Learning Theory: Agnostic Learning Computational Learning Theory: Agnostic Learning Machine Learning Fall 2018 Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others 1 This lecture: Computational Learning Theory The

More information

Identity and Curriculum in Catholic Education

Identity and Curriculum in Catholic Education Identity and Curriculum in Catholic Education Survey of teachers opinions regarding certain aspects of Catholic Education Executive summary A survey instrument (Appendix 1), designed by working groups

More information

The Critical Mind is A Questioning Mind

The Critical Mind is A Questioning Mind criticalthinking.org http://www.criticalthinking.org/pages/the-critical-mind-is-a-questioning-mind/481 The Critical Mind is A Questioning Mind Learning How to Ask Powerful, Probing Questions Introduction

More information

Introduction Chapter 1 of Social Statistics

Introduction Chapter 1 of Social Statistics Introduction p.1/22 Introduction Chapter 1 of Social Statistics Chris Lawrence cnlawren@olemiss.edu Introduction p.2/22 Introduction In this chapter, we will discuss: What statistics are Introduction p.2/22

More information

Survey Report New Hope Church: Attitudes and Opinions of the People in the Pews

Survey Report New Hope Church: Attitudes and Opinions of the People in the Pews Survey Report New Hope Church: Attitudes and Opinions of the People in the Pews By Monte Sahlin May 2007 Introduction A survey of attenders at New Hope Church was conducted early in 2007 at the request

More information

How to Generate a Thesis Statement if the Topic is Not Assigned.

How to Generate a Thesis Statement if the Topic is Not Assigned. What is a Thesis Statement? Almost all of us--even if we don't do it consciously--look early in an essay for a one- or two-sentence condensation of the argument or analysis that is to follow. We refer

More information

Final Paper. May 13, 2015

Final Paper. May 13, 2015 24.221 Final Paper May 13, 2015 Determinism states the following: given the state of the universe at time t 0, denoted S 0, and the conjunction of the laws of nature, L, the state of the universe S at

More information

The World Wide Web and the U.S. Political News Market: Online Appendices

The World Wide Web and the U.S. Political News Market: Online Appendices The World Wide Web and the U.S. Political News Market: Online Appendices Online Appendix OA. Political Identity of Viewers Several times in the paper we treat as the left- most leaning TV station. Posner

More information

16 Free Will Requires Determinism

16 Free Will Requires Determinism 16 Free Will Requires Determinism John Baer The will is infinite, and the execution confined... the desire is boundless, and the act a slave to limit. William Shakespeare, Troilus and Cressida, III. ii.75

More information

Video: How does understanding whether or not an argument is inductive or deductive help me?

Video: How does understanding whether or not an argument is inductive or deductive help me? Page 1 of 10 10b Learn how to evaluate verbal and visual arguments. Video: How does understanding whether or not an argument is inductive or deductive help me? Download transcript Three common ways to

More information