POLS 205 Political Science as a Social Science. Making Inferences from Samples


POLS 205 Political Science as a Social Science Making Inferences from Samples Christopher Adolph University of Washington, Seattle May 10, 2010 Chris Adolph (UW) Making Inferences from Samples May 10, 2010 1 / 77

Motivation How do we know what the average American thinks about an issue? Usual approach: conduct an opinion poll, randomly sample 1000 or so people, and present the average of their opinions. But how do we know this matches the average opinion of all Americans? Chris Adolph (UW) Making Inferences from Samples May 10, 2010 2 / 77

Motivation In particular, how do we know how far the sample mean, x̄, is from the true mean, x_true? E(x̄ − x_true) = ? If our sample isn't very representative of the population, these might be far apart. Without knowing anything but the sample, can we estimate the deviation between the sample mean and the population mean? To answer this, we'll need to build up several tools... Chris Adolph (UW) Making Inferences from Samples May 10, 2010 3 / 77

Outline
Constructing a Sample
Probability Distributions
Inference about the Population Mean
Inferences about Differences in the Mean
Chris Adolph (UW) Making Inferences from Samples May 10, 2010 4 / 77

Constructing a Sample Populations & Samples We will consider groups of observations at three distinct levels:

Superpopulation: All the cases in the world we think our theory applies to; a population of populations. Example: average support ᾱ of all Americans over time and space for the income tax.

Population: All the potential units of analysis in our chosen research design. Ideally we'd like to analyze a census, or complete set, of these observations. Example: average support α of all Washingtonians in April 2010 for the income tax.

Sample: The units of analysis actually collected for our study; usually a subset of the population. Example: average support α̂ of 500 randomly selected Washingtonians in April 2010 for the income tax.

Chris Adolph (UW) Making Inferences from Samples May 10, 2010 5 / 77

Constructing a Sample Sampling Frames In an ideal situation, our sample, population, and superpopulation will contain the same cases (a census). Usually, we must instead make inferences about the population (and superpopulation) using a subset, or sample, of cases. We can select this sample in different ways. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 6 / 77

Constructing a Sample Sampling Frames

Random sample: Make a list of the full population and randomly select by identification number. E.g., Random Digit Dialling of phone numbers. If done correctly, makes inference easy.

Stratified sample: If we can't randomly sample properly, but have detailed information on the population, we could re-weight our flawed random sample based on identifiable strata. E.g., if a phone survey fails to reach enough people who work at night, we could give the few we reach extra weight based on their known population frequency (see the sketch below). If done correctly, produces something close to a random sample.

Convenience sample: If we can't form any sort of random sample, we might take people non-randomly who are close at hand. E.g., when studying a hard to reach population, we might ask each member we find to nominate other members, forming a snowball sample. Convenience samples do not allow scientific inference to the population parameters.

Chris Adolph (UW) Making Inferences from Samples May 10, 2010 7 / 77
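To make the stratified re-weighting idea concrete, here is a minimal sketch of a post-stratification weighted mean. All strata shares and support rates below are hypothetical, chosen only to illustrate the calculation.

    # Hypothetical illustration of post-stratification re-weighting.
    # Suppose night workers are 20% of the population but only 5% of our sample.

    # sample estimates of support within each stratum (made-up numbers)
    support = {"day_workers": 0.60, "night_workers": 0.40}

    # shares of each stratum in the sample vs. in the population
    sample_share = {"day_workers": 0.95, "night_workers": 0.05}
    population_share = {"day_workers": 0.80, "night_workers": 0.20}

    # the unweighted sample mean over-represents day workers
    unweighted = sum(support[s] * sample_share[s] for s in support)

    # re-weight each stratum by its known population frequency
    weighted = sum(support[s] * population_share[s] for s in support)

    print(f"unweighted estimate: {unweighted:.3f}")   # 0.590
    print(f"re-weighted estimate: {weighted:.3f}")    # 0.560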

Constructing a Sample When sampling goes wrong

If a random sample is non-representative, will adding more randomly sampled cases help make it so? Yes.

If a stratified sample has the wrong weights, will adding more samples make it representative? No.

Are convenience samples more likely to be representative as they get larger? NO! No matter how large a convenience sample gets, it is likely to be drawn with huge and unknown selection bias.

Chris Adolph (UW) Making Inferences from Samples May 10, 2010 8 / 77

Constructing a Sample Sampling Inference Our next goal is to make scientifically valid inferences from the random or representative sample we've collected. Standard scientific practice requires that we quantify the uncertainty introduced by sampling. To learn how to do this, we need more probability theory. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 9 / 77

Probability Distributions Statistical Independence We say that two events are independent if the occurrence of one doesn't affect the probability that the other occurs. In math, independence implies the conditional probability of an event equals the marginal probability: Pr(a | b) = Pr(a). Another way to think of independence is that knowing how the first event turns out doesn't help us predict the second. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 10 / 77

Probability Distributions Statistical Independence For example, suppose we flip a coin twice. The second flip doesn't depend on the first: Pr(Second coin is heads | First coin is heads) = Pr(Second coin is heads). Gambler's Fallacy: if a coin flip comes out heads many times in a row, the next flip is more likely to be heads because it's due to be heads. In fact, after a dozen straight heads, the probability that flip thirteen will be heads is still 1/2. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 11 / 77
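A quick simulation sketch of this independence claim: conditioning on the first flip being heads should not change the frequency of heads on the second flip. The code is illustrative only.

    # Simulation check of independence: the frequency of heads on the second
    # flip is the same whether or not we condition on the first flip.
    import numpy as np

    rng = np.random.default_rng(0)
    flips = rng.integers(0, 2, size=(100_000, 2))   # 0 = tails, 1 = heads

    p_second_heads = flips[:, 1].mean()
    p_second_given_first_heads = flips[flips[:, 0] == 1, 1].mean()

    print(round(p_second_heads, 3))             # ~0.5
    print(round(p_second_given_first_heads, 3)) # ~0.5 as well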

Probability Distributions Probability Distributions We say a variable is random when there is some probability that it takes on any of the possible values. The mathematical function which relates those probabilities to each value is the probability distribution function (pdf). We can construct many different kinds of pdfs, but it helps to start small. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 12 / 77

Probability Distributions A probability distribution for binary variables Consider a single flip of a coin. The sample space is Ω_coin flip = {H, T}. That is, there is some probability Pr(H) that we see a head when we flip, and some probability Pr(T) that we see a tail. Based on probability assumption 1, we know that: 0 ≤ Pr(H) ≤ 1 and 0 ≤ Pr(T) ≤ 1. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 13 / 77

Probability Distributions A probability distribution for binary variables If H and T are the only possible outcomes, we know from assumption 2 that: Pr(H) + Pr(T) = 1, so Pr(T) = 1 − Pr(H). That is, if we know Pr(H), we know everything there is to know about the probability distribution of our coin flip. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 14 / 77

Probability Distributions A probability distribution for binary variables Let's call the probability of a head Pr(H) = π, following the statistics convention that we write all unknown parameters as Greek letters. And let's call our random variable (whether the flip comes out heads or tails) x, following the statistics convention that known data variables are written as Roman letters. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 15 / 77

Probability Distributions We can summarize the probability distribution for a flip of a coin, or any other binary variable, in a single equation:

f_Bernoulli(x | π) = 1 − π if x = 0; π if x = 1

This equation is clear, but unwieldy. Using exponents, we can reduce it to a single line:

f_Bernoulli(x | π) = π^x (1 − π)^(1−x)

This is the pdf of the Bernoulli distribution, which applies to all binary variables. The first two moments of this distribution are E(x) = π and var(x) = π(1 − π). Chris Adolph (UW) Making Inferences from Samples May 10, 2010 16 / 77
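As a minimal sketch, the Bernoulli pmf and its moments can be checked numerically; the value π = 0.3 below is arbitrary.

    # The Bernoulli pmf pi^x * (1 - pi)^(1 - x), with E(x) = pi and var(x) = pi(1 - pi).
    import numpy as np

    def bernoulli_pmf(x, pi):
        return pi**x * (1 - pi)**(1 - x)

    pi = 0.3
    print(bernoulli_pmf(1, pi), bernoulli_pmf(0, pi))   # 0.3 0.7

    draws = np.random.default_rng(1).binomial(1, pi, size=200_000)
    print(round(draws.mean(), 3), round(draws.var(), 3))  # ~0.3 and ~0.21 = pi(1 - pi)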

Probability Distributions What about continuous data? The Bernoulli distribution is helpful if we are talking about binary data. In fact, it's the only choice available! But what about a continuous random variable? It takes on far more than two possible values. Unfortunately, there are many possible distributions for continuous variables, and choosing one is much more controversial. We will discuss three different choices: the Normal distribution, the χ² distribution, and the t distribution. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 17 / 77

Probability Distributions The Normal Distribution Suppose we have a large number of additive or ratio level variables with unknown (i.e., arbitrary) distributions. These variables need not be related to one another; indeed, they should be independent, i.e., uncorrelated with each other. Let us call these variables x_1i, x_2i, x_3i, ..., x_ki. They might be how much each American i spends on each product & service k for sale in the economy. Now suppose we add together the spending of each American to create X_i. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 18 / 77

Probability Distributions The Normal Distribution According to the Central Limit Theorem, as k → ∞, X_i will follow the so-called Normal distribution:

f_Normal(X_i | µ, σ²) = (2πσ²)^(−1/2) exp[ −(X_i − µ)² / (2σ²) ]

Moments: E(X) = µ and Var(X) = σ².

The Normal distribution is continuous and symmetric, with positive probability everywhere from −∞ to ∞. Also called the Gaussian distribution. (A better name, since it avoids the implication that it is Normal for a variable to follow this distribution.)

Chris Adolph (UW) Making Inferences from Samples May 10, 2010 19 / 77
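A small simulation sketch of the Central Limit Theorem at work: each X_i below is the sum of k independent, decidedly non-Normal (uniform) variables, and the standardized sums line up closely with Normal quantiles. The choice of uniform variables and k = 200 is purely illustrative.

    # CLT illustration: sums of many independent, non-Normal (uniform) variables
    # are approximately Normal with mean k*mu and variance k*sigma^2.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    k = 200                                           # number of variables summed
    X = rng.uniform(0, 1, size=(50_000, k)).sum(axis=1)

    print(round(X.mean(), 1), round(X.var(), 1))      # ~100.0 and ~16.7 (= k/2 and k/12)

    # compare quantiles of the standardized sums with standard Normal quantiles
    z = (X - X.mean()) / X.std()
    for q in (0.05, 0.25, 0.5, 0.75, 0.95):
        print(round(np.quantile(z, q), 2), round(stats.norm.ppf(q), 2))  # close to equal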

Probability Distributions Examples of the Normal Distribution [Figure: density of the N(0,1) distribution] This is the Normal distribution with mean µ = 0 and variance σ² = 1. Known as the Standard Normal. Also the Bell Curve. Roughly 68% of the density is within ±1 sd of the mean; 95% within ±2 sds; and 99.7% within ±3 sds. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 20 / 77

Probability Distributions Examples of the Normal Distribution [Figure: density of the N(0,2) distribution] This is the Normal distribution with mean µ = 0 and variance σ² = 2. The larger variance has spread out the distribution. Still the case that roughly 68% of the density is within ±1 sd of the mean; 95% within ±2 sds; and 99.7% within ±3 sds. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 21 / 77

Probability Distributions Examples of the Normal Distribution [Figure: density of the N(0,5) distribution] This is the Normal distribution with mean µ = 0 and variance σ² = 5. The larger variance has spread out the distribution even more. Still the case that roughly 68% of the density is within ±1 sd of the mean; 95% within ±2 sds; and 99.7% within ±3 sds. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 22 / 77

Probability Distributions Examples of the Normal Distribution [Figure: density of the N(0, 0.25) distribution] This is the Normal distribution with mean µ = 0 and variance σ² = 0.25. The smaller variance tightens the distribution to a spike over the mean. Still the case that roughly 68% of the density is within ±1 sd of the mean; 95% within ±2 sds; and 99.7% within ±3 sds. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 23 / 77

Probability Distributions Examples of the Normal Distribution [Figure: density of the N(1,1) distribution] This is the Normal distribution with mean µ = 1 and variance σ² = 1. Increasing the mean just shifts the distribution rightward. Still the case that roughly 68% of the density is within ±1 sd of the mean; 95% within ±2 sds; and 99.7% within ±3 sds. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 24 / 77

Probability Distributions The χ² distribution What if we have a variable X² that is the sum of n < ∞ squared independent standard Normal random variables?

X² = x_1² + x_2² + ... + x_n²

This is a sum of a finite set of Normal random variables, so the Normal only applies approximately. What distribution does this sum really follow? Chris Adolph (UW) Making Inferences from Samples May 10, 2010 25 / 77

Probability Distributions The χ² distribution X² = x_1² + x_2² + ... + x_n², n < ∞, follows a χ² (chi-squared) distribution,

f_χ²(X² | n) = 1 / (2^(n/2) Γ(n/2)) × (X²)^((n−2)/2) exp(−X²/2)

which has degrees of freedom n. (Γ(·) is the Gamma function, an interpolated factorial.) Moments: E(χ²) = n and Var(χ²) = 2n. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 26 / 77
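A simulation sketch of this result: sums of n squared standard Normal draws should have mean near n, variance near 2n, and quantiles matching scipy's chi-squared distribution. The choice n = 5 is arbitrary.

    # Sums of n squared standard Normal draws behave like a chi-squared with n df:
    # mean ~ n, variance ~ 2n, and quantiles match scipy.stats.chi2(n).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n = 5
    X2 = (rng.standard_normal(size=(100_000, n)) ** 2).sum(axis=1)

    print(round(X2.mean(), 2), round(X2.var(), 2))   # ~5 and ~10
    print(round(np.quantile(X2, 0.95), 2),           # empirical 95th percentile...
          round(stats.chi2.ppf(0.95, df=n), 2))      # ...vs the chi2(5) value, ~11.07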

Probability Distributions χ² approaches the Normal as the degrees of freedom increase [Figure: χ² densities with 1, 4, and 15 degrees of freedom] Chris Adolph (UW) Making Inferences from Samples May 10, 2010 27 / 77

Probability Distributions The t distribution The χ² is a key building block for a more useful distribution. Suppose Z is standard Normally distributed and X² is independently distributed χ² with n degrees of freedom. Define

t = Z / √(X²/n)

which is distributed t with n degrees of freedom:

f_t(t | n) = Γ((n+1)/2) / (√(nπ) Γ(n/2)) × (1 + t²/n)^(−(n+1)/2)

Chris Adolph (UW) Making Inferences from Samples May 10, 2010 28 / 77

Probability Distributions The t distribution

f_t(t | n) = Γ((n+1)/2) / (√(nπ) Γ(n/2)) × (1 + t²/n)^(−(n+1)/2)

Moments: E(t) = 0 (we could change this); Var(t) = n/(n − 2) for n > 2 (not defined for n = 1). As the degrees of freedom grow, the t distribution approximates the Normal. For low degrees of freedom, the t has fatter tails. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 29 / 77
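This construction can be sketched directly in simulation: build t = Z/√(X²/n) from Normal and χ² draws, then compare its tail quantile with scipy's t distribution and with the Normal. Again, n = 5 and the simulation sizes are arbitrary choices.

    # Build t = Z / sqrt(X2/n) from simulated draws and compare with scipy.stats.t;
    # for small n the t has fatter tails than the standard Normal.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    n = 5
    Z = rng.standard_normal(500_000)
    X2 = (rng.standard_normal(size=(500_000, n)) ** 2).sum(axis=1)
    t_draws = Z / np.sqrt(X2 / n)

    print(round(np.quantile(t_draws, 0.975), 2))   # ~2.57 (empirical)
    print(round(stats.t.ppf(0.975, df=n), 2))      # 2.57
    print(round(stats.norm.ppf(0.975), 2))         # 1.96 -- the thinner Normal tail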

Probability Distributions Example t distributions [Figure: t densities with 1, 4, and 15 degrees of freedom] Chris Adolph (UW) Making Inferences from Samples May 10, 2010 30 / 77

Probability Distributions The t distribution Suppose we have a variable t that is t-distributed with mean 0 and 5 degrees of freedom. That is, P(t) = f_t(t | 5). How large would t need to be for us to doubt it came from this distribution? Put another way, what are the critical values of t we would see just once in 10 draws? Once in 20 draws? Once in 100 draws? Put still another way, which critical values will bound the 90% (or 95%, or 99%) most ordinary t draws? Chris Adolph (UW) Making Inferences from Samples May 10, 2010 31 / 77

Probability Distributions Areas under the t: t distribution with 5 degrees of freedom [Figure: 90% of the mass of t(5) is between −2.015 and 2.015; each tail beyond ±2.015 holds 5 percent of the mass, for a total of 10 percent] An unusual value is one in the tails. Critical values = cutoff for unusualness. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 32 / 77

Probability Distributions Areas under the t: t distribution with ∞ degrees of freedom [Figure: 90% of the mass of t(∞) is between −1.645 and 1.645; each tail beyond ±1.645 holds 5 percent of the mass, for a total of 10 percent] The degrees of freedom reflect how much information we have. More information makes the tails thinner. Critical values shrink; estimates get more certain. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 33 / 77

Probability Distributions Areas under the t: t distribution with 5 degrees of freedom [Figure: 95% of the mass of t(5) is between −2.571 and 2.571; each tail beyond ±2.571 holds 2.5 percent of the mass, for a total of 5 percent] Going back to the df = 5 case, notice we can choose what constitutes unusual. Here, we've raised the bar: only the 5% most extreme values are unusual. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 34 / 77

Probability Distributions Areas under the t: t distribution with ∞ degrees of freedom [Figure: 95% of the mass of t(∞) is between −1.96 and 1.96; each tail beyond ±1.96 holds 2.5 percent of the mass, for a total of 5 percent] These are the infinite degrees of freedom critical values for the 95% case. This is the most widely used standard for whether a result is unusual. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 35 / 77

Probability Distributions Areas under the t: t distribution with 5 degrees of freedom [Figure: 99% of the mass of t(5) is between −4.032 and 4.032; each tail beyond ±4.032 holds 0.5 percent of the mass, for a total of 1 percent] The most stringent standard is 99%. In this case, a draw from the t must be in the 1% most extreme region to be considered unusual. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 36 / 77

Probability Distributions Areas under the t: t distribution with ∞ degrees of freedom [Figure: 99% of the mass of t(∞) is between −2.576 and 2.576; each tail beyond ±2.576 holds 0.5 percent of the mass, for a total of 1 percent] The infinite degrees of freedom case for 99%. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 37 / 77

Probability Distributions Critical values of the t distribution We can state how unusual an observation is under the assumption that it is distributed t(n).

Test level          df = 5    df = ∞
0.10 level / 90%    2.015     1.645
0.05 level / 95%    2.571     1.960
0.01 level / 99%    4.032     2.576

These will be very useful for quantifying the uncertainty of estimates. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 38 / 77
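These critical values can be reproduced with scipy as a quick sketch; the ∞-df column is just the Normal limit of the t.

    # Reproduce the two-tailed t critical values in the table above.
    from scipy import stats

    for level, mass in [(0.10, "90%"), (0.05, "95%"), (0.01, "99%")]:
        c5 = stats.t.ppf(1 - level / 2, df=5)    # df = 5
        cinf = stats.norm.ppf(1 - level / 2)     # df = infinity (the Normal limit)
        print(f"{level} level / {mass}: {c5:.3f}  {cinf:.3f}")

    # 0.1 level / 90%: 2.015  1.645
    # 0.05 level / 95%: 2.571  1.960
    # 0.01 level / 99%: 4.032  2.576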

Inference about the Population Mean The Law of Large Numbers When sampling from a population, our estimates of features of that population get better the more data we sample. What do we mean by better estimates? An estimate with smaller error (expected squared deviation from the truth):

E[(Estimate − Truth)²] = E[(Estimate − E(Estimate))²] = var(Estimate)

(the first two expectations are equal when the estimate is unbiased, as the sample mean is). We have a special name for the square root of the variance of an error: we call this special standard deviation the standard error of the estimate, or se(Estimate). Chris Adolph (UW) Making Inferences from Samples May 10, 2010 39 / 77

Inference about the Population Mean The Law of Large Numbers The Law of Large Numbers applies to estimating the mean of a population: when our estimate of the mean, x̄, gets closer to the truth, its standard error, se(x̄), gets smaller. To see this, we need to derive se(x̄), which means we need to first derive var(x̄). Chris Adolph (UW) Making Inferences from Samples May 10, 2010 40 / 77

Inference about the Population Mean Derivation of the standard error of the mean

var(x̄) = var( (1/n) Σ_{i=1}^{n} x_i )
       = E[ ( (1/n) Σ_i x_i − E( (1/n) Σ_i x_i ) )² ]
       = E[ (1/n²) ( Σ_i x_i − E( Σ_i x_i ) )² ]
       = (1/n²) E[ ( Σ_i x_i − E( Σ_i x_i ) )² ]
       = (1/n²) var( Σ_i x_i )

Chris Adolph (UW) Making Inferences from Samples May 10, 2010 41 / 77

Inference about the Population Mean Derivation of the standard error of the mean Now we make use of the fact that for uncorrelated x_1, ..., x_i, ..., x_n, var( Σ_i x_i ) = Σ_i var(x_i), and write:

var(x̄) = (1/n²) var( Σ_i x_i )
       = (1/n²) Σ_i var(x_i)
       = (1/n²) n σ²
       = σ² / n

se(x̄) = σ / √n

Chris Adolph (UW) Making Inferences from Samples May 10, 2010 42 / 77
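A simulation sketch of the result: the standard deviation of sample means across many repeated samples should be close to σ/√n. The population (Normal with σ = 2) and the sample size n = 100 are arbitrary choices.

    # Simulation check of the derivation: the sd of sample means across many
    # repeated samples is close to sigma / sqrt(n).
    import numpy as np

    rng = np.random.default_rng(5)
    sigma, n = 2.0, 100
    samples = rng.normal(loc=10, scale=sigma, size=(20_000, n))  # 20,000 repeated samples
    xbars = samples.mean(axis=1)

    print(round(xbars.std(), 3))    # ~0.2, the simulated se(xbar)
    print(sigma / np.sqrt(n))       # 0.2, the theoretical se(xbar)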

Inference about the Population Mean The Square Root Law

se(x̄) = σ / √n

Remember that the smaller se(x̄) is, the better our estimate. Making n bigger (adding more observations) will indeed shrink se(x̄), but there are diminishing returns. Because se(x̄) depends on √n, to halve the amount of error we must quadruple the amount of data. If our se is 500 dollars of wealth with 100 observations, to reduce our expected error to 250 dollars, we need 400 total observations. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 43 / 77

Inference about the Population Mean The t-statistic The t statistic of an estimate is the estimate, minus a hypothetical level, divided by the standard error of the estimate. For the mean, x̄, this is

t = (x̄ − µ_0) / se(x̄) = (x̄ − µ_0) / (σ/√n)

We will often set our hypothetical comparison level µ_0 = 0, so this frequently reduces to:

t = x̄ / se(x̄) = x̄ / (σ/√n)

Chris Adolph (UW) Making Inferences from Samples May 10, 2010 44 / 77
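Here is a minimal sketch of this calculation on made-up data, using the sample standard deviation in place of σ; scipy's built-in one-sample test is included only as a cross-check.

    # t-statistic of a sample mean against a hypothetical value mu0,
    # using the sample standard deviation in place of sigma (made-up data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    x = rng.normal(loc=0.3, scale=1.0, size=50)     # hypothetical sample
    mu0 = 0.0

    t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
    p = 2 * stats.t.sf(abs(t), df=len(x) - 1)       # two-sided p-value

    print(round(t, 2), round(p, 4))
    print(stats.ttest_1samp(x, popmean=mu0))        # same t and p as the hand calculation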

Inference about the Population Mean The t-statistic Note that the t-statistic should be t distributed!

1. x̄: The mean of the x_i is the sum of a large number of independent variables, and thus will tend to be Normally distributed, by the Central Limit Theorem
2. σ²: The variance of the x_i is the sum of n squared variables, and is thus χ² distributed
3. The ratio of a Normal variable and the square root of a χ² variable is t-distributed

Chris Adolph (UW) Making Inferences from Samples May 10, 2010 45 / 77

Inference about the Population Mean The t-statistic Originally discovered by William Gosset, a statistician working at Guinness Brewery in 1908 on the problem of measuring the quality of beer. Guinness was a pioneer of early statistical quality control, but forbade its statisticians from publishing (trade secrets!). Gosset published his discovery under the pseudonym Student. Hence this is Student's t-test. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 46 / 77

Inference about the Population Mean The t-statistic We can use the t-test to assess how likely it is that the truth deviates from a hypothetical value, given the sample estimate and standard error. That is, given a deviation x̄ − µ_0 as large as the one we saw, and the uncertainty of that estimate σ/√n, how likely is it that the population mean of x is actually µ_0 or smaller? A large t could occur for one of two reasons: 1. an unusual random sample far from the true population mean (which is close to µ_0), or 2. a typical sample from a population whose mean is larger than µ_0. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 47 / 77

Inference about the Population Mean The t-statistic We will never know which situation we are in. But we can calculate how often we would see a t as large as the one we saw by chance. This probability is known as the p-value. To look it up in a table or stat package, we need to know the degrees of freedom (roughly, how much information we have: n − 1). Chris Adolph (UW) Making Inferences from Samples May 10, 2010 48 / 77

Inference about the Population Mean Significance tests We call an estimate statistically significant when we would only expect to see such a large t by chance less often than a prespecified significance level. A statistical significance test checks whether the p-value associated with a t-test is below this level, usually 0.05. Significance tests are tests against a specific null hypothesis, and are conservative in the sense of being likely to favor the null over our own hypothesis. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 49 / 77

Inference about the Population Mean Are significance tests really conservative? Type I error: the probability of falsely rejecting the null. Type II error: the probability of falsely accepting the null. Significance tests minimize the chance of Type I error at the expense of allowing for more Type II error. Is this a good idea? The null hypothesis is usually arbitrary, and our prior belief is usually that it is unlikely. Significance tests may lead to excessive contrarianism, which is not conservative at all. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 50 / 77

Inference about the Population Mean Confidence intervals An alternative to p-values which conveys the same information is the confidence interval. In repeated samples from the same population, the 95% confidence interval contains the true population mean 95% of the time. Warning! We cannot say the truth lies in the confidence interval we calculate with 95% probability; we don't know in this specific case. But if we conduct 20 studies, and in each report a 95% confidence interval, we will expect to be wrong in only one study (1 in 20). Chris Adolph (UW) Making Inferences from Samples May 10, 2010 51 / 77


Inference about the Population Mean Calculating the confidence interval We pick a confidence level, such as 95%. Then we look up the critical value of t bounding 95% of the t distribution, and calculate:

x̄_lower = x̄ − t_{n−1} σ̂_x̄
x̄_upper = x̄ + t_{n−1} σ̂_x̄

Note that for the 95% CI, the critical value with infinite degrees of freedom is ±1.96, so 95% CIs are roughly ±2 standard errors. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 53 / 77
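A small sketch of this calculation on made-up data, where σ̂_x̄ is estimated by the sample standard deviation divided by √n.

    # 95% confidence interval for a mean: xbar +/- t_(n-1) * se(xbar)  (made-up data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    x = rng.normal(loc=5.0, scale=2.0, size=30)

    xbar = x.mean()
    se = x.std(ddof=1) / np.sqrt(len(x))
    t_crit = stats.t.ppf(0.975, df=len(x) - 1)      # ~2.045 for n = 30

    lower, upper = xbar - t_crit * se, xbar + t_crit * se
    print(round(lower, 2), round(upper, 2))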

Inference about the Population Mean Example: Washington State Income Tax Bill Gates Sr. has proposed a state income tax for the November ballot. On April 21, 2010, SurveyUSA sampled 500 Washington adults in order to estimate the statewide support, asking the following: A proposed initiative would create an income tax in Washington state on people making $200,000 per year and on couples making twice that. It would also cut the state's portion of the property tax by 20%, and end the business and occupation tax for small businesses. Do you support or do you oppose this proposed initiative? SurveyUSA found 66 percent supported the measure. How certain are we that the referendum would pass if it were held today? Chris Adolph (UW) Making Inferences from Samples May 10, 2010 54 / 77

Inference about the Population Mean Example: Washington State Income Tax How likely is it that a survey of 500 random individuals from a population would find 66% support for a measure when really only 50% or less support the measure? Let's use a t-test:

t = (x̄ − µ_0) / se(x̄) = (x̄ − µ_0) / (σ/√n) = (0.66 − 0.5) / (0.474/√500) = 7.545

A t this big would appear by chance only 1 in 4,620,000,000,000 random samples (1 in 4.6 trillion), for a p = 0.000000000000216. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 55 / 77
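This calculation can be reproduced as a short sketch. One assumption made explicit here: the sd of 0.474 is taken to be the sample standard deviation of the 0/1 support indicator, roughly √(0.66 × 0.34), and the p-value is treated as two-sided with n − 1 degrees of freedom.

    # Reproduce the slide's t-test for 66% support in a sample of 500 against a
    # null of 50% support. The sd 0.474 is (approximately) the sample sd of the
    # 0/1 support indicator.
    import numpy as np
    from scipy import stats

    n, xbar, mu0 = 500, 0.66, 0.50
    sd = np.sqrt(xbar * (1 - xbar) * n / (n - 1))   # ~0.474 with the n-1 correction

    t = (xbar - mu0) / (sd / np.sqrt(n))
    p = 2 * stats.t.sf(t, df=n - 1)                 # two-sided p-value

    print(round(t, 3))   # ~7.545
    print(p)             # ~2.2e-13, i.e. roughly 1 in 4.6 trillion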

Inference about the Population Mean Example: Washington State Income Tax A t this big would appear by chance only 1 in 4,620,000,000,000 random samples (1 in 4.6 trillion), for a p = 0.000000000000216. Why is this so unlikely? Suppose that on April 21, a bare majority of Washington adults really did oppose the income tax. Then to get 66% approval, instead of the correct 50% approval, SurveyUSA would have needed to sample 500 × (0.66 − 0.50) = 80 more supporters than we would expect on average in 500 random draws. That's as unlikely as flipping a coin 500 times and getting 330 heads and 170 tails. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 56 / 77

Inference about the Population Mean Example: Washington State Income Tax Another way to summarize the uncertainty in our polling results is to calculate the confidence interval. We can also state with 95% confidence that the actual level of support for the income tax among all Washington adults is between 61.8% and 70.2%. Notice these numbers are 66 ± 4.2, which also happens to be the reported margin of error for the poll (what journalists call a confidence interval). Margin of error is misnamed: errors can be bigger than this, and are guaranteed to be bigger 5% of the time! Chris Adolph (UW) Making Inferences from Samples May 10, 2010 57 / 77

Inference about the Population Mean Example: Washington State Income Tax SurveyUSA's sample of Washington voters includes 120 Republicans, 57 percent of whom supported the income tax (!). Is this result certain? Judging by the published margin of error, we might think so: 57% − 4.2% = 52.8%, still a majority of Republicans. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 58 / 77

Inference about the Population Mean Example: Washington State Income Tax Let's do our own t-test to be sure:

t = (x̄ − µ_0) / se(x̄) = (x̄ − µ_0) / (σ/√n) = (0.57 − 0.5) / (0.497/√120) = 1.468

This is a pretty small t-statistic, one we would see by chance in 1 out of 7 random samples. The p-value is 0.150. We find that the 95% confidence interval ranges from 48% to 66%, which is equal to our estimate of 57% ± 9%. We are not at all certain that Washington Republicans support the income tax. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 59 / 77
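A sketch reproducing this subgroup calculation. One assumption: the rounded 57% is taken to correspond to 68 supporters out of the 120 Republicans, which yields a t of about 1.468 as on the slide.

    # Reproduce the Republican-subsample test, assuming 57% corresponds to
    # 68 supporters out of 120 (the slide reports only the rounded percentage,
    # so this count is an assumption).
    import numpy as np
    from scipy import stats

    n, k, mu0 = 120, 68, 0.50
    xbar = k / n                                    # ~0.567
    sd = np.sqrt(xbar * (1 - xbar) * n / (n - 1))   # ~0.497

    t = (xbar - mu0) / (sd / np.sqrt(n))
    p = 2 * stats.t.sf(t, df=n - 1)
    ci = xbar + np.array([-1, 1]) * stats.t.ppf(0.975, df=n - 1) * sd / np.sqrt(n)

    print(round(t, 3))      # ~1.468
    print(round(p, 2))      # ~0.14, i.e. about 1 in 7 samples, as the slide notes
    print(np.round(ci, 2))  # roughly [0.48, 0.66]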

Inference about the Population Mean Example: Washington State Income Tax Why is this different from our last example? Two reasons: 1. uncertainty depends on the size of the sample (which has changed), and 2. uncertainty depends on the variance of the sample (which has changed). Chris Adolph (UW) Making Inferences from Samples May 10, 2010 60 / 77

Inference about the Population Mean Change in Size of Sample Suppose a bare majority of Washington Republicans actually oppose the income tax. Then, to get 57% of Republicans in favor in a sample of 120, SurveyUSA would need to have randomly sampled 120 × (0.57 − 0.50) ≈ 8 more Republicans in favor than they would expect to on average. This is exactly the same as flipping a coin 120 times and getting 68 heads and 52 tails. Unlikely, but not that unlikely. The margin of error reported with a survey applies only to the full population. Any average we calculate for a subgroup (the young, women, Republicans, Hispanics, etc.) will have a unique confidence interval, always bigger than that for the whole sample. The smaller the n, the bigger the confidence interval, the less certain the finding. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 61 / 77

Inference about the Population Mean Change in Variance of the Sample The t-statistic gets bigger the smaller the variance. Is the variance for our Republican sample smaller or larger than the whole sample variance? Note that our outcome is a binary variable. Recall the variance of a binary variable is always var(x) = π(1 − π) = π − π². This is a parabola maximized at π = 0.5. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 62 / 77

Inference about the Population Mean Change in Variance of the Sample [Figure: Var(x) = π(1 − π) plotted against E(x) = π for a binary variable x; the curve rises from 0 at π = 0 to a peak of 0.25 at π = 0.5 and falls back to 0 at π = 1] Chris Adolph (UW) Making Inferences from Samples May 10, 2010 63 / 77
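A tiny numerical check of that parabola, using the two support rates from the running example:

    # var(x) = pi*(1 - pi) for a binary x: largest at pi = 0.5, smaller near 0 or 1.
    def binary_var(pi):
        return pi * (1 - pi)

    print(round(binary_var(0.66), 4))   # 0.2244 -- full sample, 66% support
    print(round(binary_var(0.57), 4))   # 0.2451 -- Republicans, 57% support (larger)
    print(round(binary_var(0.50), 4))   # 0.25   -- the maximum, at pi = 0.5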

Inference about the Population Mean Change in Variance of the Sample Thus, because the estimated probability a Republican supports the income tax is closer to 0.5 than the probability for all surveyed adults, the uncertainty of the proportion of Republicans supporting is also greater. Bad news: not only do margins of error reported in the press only apply to the full sample, they also only apply to one specific question! Good news: with minimal calculation, you can find the right margin of error on your own. Worse news: if error is maximized for probabilities near 0.5, what does that mean for predicting election outcomes? Chris Adolph (UW) Making Inferences from Samples May 10, 2010 64 / 77

Inference about the Population Mean On confidence versus significance There are two ways we could report our finding on Republicans' support for the income tax:

Significance test: Based on a survey of Washington adults, we estimate 57% of Republicans support the income tax. However, this estimate is not statistically significantly different from 50% at the 0.05 level.

Confidence interval: Based on a survey of Washington adults, we estimate 57% of Republicans support the income tax. The 95% confidence interval for this estimate ranges from 48% to 66%, suggesting anywhere from a slight majority against the tax to a large majority in favor.

Chris Adolph (UW) Making Inferences from Samples May 10, 2010 65 / 77

Inference about the Population Mean On confidence versus significance These write-ups present the same results. They rely on the same math and the same statistical theory. The significance test presentation obscures the substantive impact of the result in jargon, and makes it appear ignorable. The confidence interval focuses on the substantive impact of the result, and clarifies what we can and cannot reject: although we aren't sure how many Republicans support the tax, it is very likely that half or more do, and very unlikely that a large percentage of Republicans are opposed. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 66 / 77

Inference about the Population Mean On confidence versus significance The significance test forces you to accept the author's arbitrary null hypothesis. The confidence interval allows you to choose your own null, and shows how robust your findings are to slight changes in the null. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 67 / 77

Inference about the Population Mean The irrelevance of population size

t = (x̄ − µ_0) / (σ/√n)

Notice one number that doesn't appear in this formula: the size of the population. The precision of an estimate doesn't depend on the size of the population, only the size of the sample. That's why you tend to see polls using samples of 500 to 2000 respondents regardless of whether they are sampling from a small town population or the whole country. Chris Adolph (UW) Making Inferences from Samples May 10, 2010 68 / 77

Inference about a Difference in Population Means Comparing two means So far, we have asked how far the mean of our sample might differ from a specific value, e.g., how much does the average support for an income tax differ from 0.5? But what if we want to compare two groups in our sample? That is, what if we want to compare two means to each other? E.g., how much does the average support for an income tax among women differ from support among men? Chris Adolph (UW) Making Inferences from Samples May 10, 2010 69 / 77

Inference about a Difference in Population Means t-test for comparison of means As with a single mean, we will calculate a t-statistic:

t = (x̄ − ȳ) / se(x̄ − ȳ)

then check if the t-statistic exceeds the chosen critical value, or simply calculate the probability of seeing so large a t. The form of the standard error here is a bit messy:

se(x̄ − ȳ) = √[ ( ((n_x − 1) σ̂²_x + (n_y − 1) σ̂²_y) / (n_x + n_y − 2) ) × ( 1/n_x + 1/n_y ) ]

Chris Adolph (UW) Making Inferences from Samples May 10, 2010 70 / 77
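As a closing sketch, here is the pooled two-sample t-statistic computed with the standard error formula above on made-up data; scipy's equal-variance two-sample test is included only as a cross-check. The group means, sds, and sample sizes are hypothetical.

    # Pooled two-sample t-statistic with the standard error formula above
    # (made-up data); scipy's equal-variance t-test gives the same result.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    x = rng.normal(0.66, 0.47, size=300)   # e.g. women's support (hypothetical)
    y = rng.normal(0.60, 0.49, size=200)   # e.g. men's support (hypothetical)

    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    se_diff = np.sqrt(pooled_var * (1 / nx + 1 / ny))

    t = (x.mean() - y.mean()) / se_diff
    p = 2 * stats.t.sf(abs(t), df=nx + ny - 2)

    print(round(t, 3), round(p, 3))
    print(stats.ttest_ind(x, y, equal_var=True))   # matches the hand calculation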