Lies, Damned Lies and Statistics

Berwin A Turlach
School of Mathematics and Statistics, University of Western Australia (UWA)
berwin.turlach@gmail.com
27 September 2011

Some quotes

There are three kinds of lies: lies, damned lies, and statistics.
    attributed to Benjamin Disraeli by Mark Twain

It's easy to lie with statistics; it is easier to lie without them.
    Frederick Mosteller

It is easy to lie with statistics. It is hard to tell the truth without statistics.
    Andrejs Dunkels

Figures don't lie, but liars do figure.
    attributed to Mark Twain by Yates

He uses statistics like a drunken man uses a lamp post, more for support than illumination.
    Andrew Lang

http://stats.stackexchange.com/questions/726/famous-statistician-quotes
http://en.wikiquote.org/wiki/mark_twain

Some quotes (ctd)

Statistical thinking will one day be as necessary a qualification for efficient citizenship as the ability to read and write.
    H.G. Wells

Those who ignore Statistics are condemned to reinvent it.
    Brad Efron

A common mistake

In theory, a comparison of two experimental effects requires a statistical test on their difference. In practice, this comparison is often based on an incorrect procedure involving two separate tests in which researchers conclude that effects differ when one effect is significant (P < 0.05) but the other is not (P > 0.05). We reviewed 513 behavioral, systems and cognitive neuroscience articles in five top-ranking journals (Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience) and found that 78 used the correct procedure and 79 used the incorrect procedure. An additional analysis suggests that incorrect analyses of interactions are even more common in cellular and molecular neuroscience. We discuss scenarios in which the erroneous procedure is particularly beguiling.

Nieuwenhuis, S., Forstmann, B.U. and Wagenmakers, E.J. (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nature Neuroscience 14:1105-1107.
http://www.nature.com/neuro/journal/v14/n9/full/nn.2886.html
http://www.guardian.co.uk/commentisfree/2011/sep/09/bad-science-research-error?cmp=twt_fd

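To make the mistake concrete, here is a minimal sketch using made-up numbers (the effect estimates and standard errors are assumptions for illustration, not data from the paper). It runs the two separate tests and then the test on the difference, assuming independent, approximately normal estimates.

```python
from math import erfc, sqrt


def two_sided_p(estimate, std_error):
    """Two-sided p-value of a z-test of the null hypothesis 'effect = 0'."""
    z = estimate / std_error
    return erfc(abs(z) / sqrt(2))


# Hypothetical effect estimates with their standard errors (illustrative only).
effect_a, se_a = 25.0, 10.0
effect_b, se_b = 10.0, 10.0

# Incorrect procedure: two separate tests, then compare the verdicts.
p_a = two_sided_p(effect_a, se_a)
p_b = two_sided_p(effect_b, se_b)

# Correct procedure: test the difference of the two effects directly.
# Assumption: the estimates are independent, so their variances add.
diff = effect_a - effect_b
se_diff = sqrt(se_a**2 + se_b**2)
p_diff = two_sided_p(diff, se_diff)

print(f"effect A alone:      p = {p_a:.3f}")
print(f"effect B alone:      p = {p_b:.3f}")
print(f"difference A - B:    p = {p_diff:.3f}")
```

With these numbers one effect is significant at the 0.05 level and the other is not, yet the test on their difference is far from significant, so the data give no grounds for claiming that the effects differ.
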
More quotes

Absence of evidence is not evidence of absence.
    Carl Sagan

All we know about the world teaches us that the effects of A and B are always different in some decimal place for any A and B. Thus asking "are the effects different?" is foolish.
    John W. Tukey

... no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas.
    Sir Ronald A. Fisher

Language and Probability

Linda is thirty-one years old, single, outspoken and very bright. She majored in philosophy. As a student she was deeply concerned with issues of discrimination and social justice and participated in anti-nuclear demonstrations.

Which of the following two alternatives is more probable?
A  Linda is a bank teller.
B  Linda is a bank teller and active in the feminist movement.

If C is an event (Linda is a bank teller) and D is another event (Linda is active in the feminist movement), then
$C \cap D \subseteq C \implies P(C \cap D) \le P(C)$

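The inequality for C and D can be checked by simple counting. The sketch below builds a synthetic population with made-up probabilities (both values are assumptions for illustration; D is drawn independently of C only for simplicity) and compares the two relative frequencies.

```python
import random

rng = random.Random(0)

# Hypothetical probabilities, chosen only for illustration.
P_BANK_TELLER = 0.02  # assumed P(C)
P_FEMINIST = 0.90     # assumed P(D), drawn independently of C here

# Each person is a pair (is_bank_teller, is_active_feminist).
population = [
    (rng.random() < P_BANK_TELLER, rng.random() < P_FEMINIST)
    for _ in range(100_000)
]

n = len(population)
freq_c = sum(1 for c, d in population if c) / n
freq_c_and_d = sum(1 for c, d in population if c and d) / n

print(f"relative frequency of C:       {freq_c:.4f}")
print(f"relative frequency of C and D: {freq_c_and_d:.4f}")
```

Everyone counted for "C and D" is also counted for "C", so the second frequency can never exceed the first, however likely D is made. The independence assumption plays no role in that conclusion.
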
Language and Probability (ctd)

Should the word "and" be interpreted as the logical AND ($\wedge$)?

The logical AND is commutative, that is, $a \wedge b$ is equivalent to $b \wedge a$. But this is not how we understand natural language; "and" can have chronological or causal implications:
    Peggy and Paul married and Peggy became pregnant.
    Peggy became pregnant and Peggy and Paul married.

    Mark got angry and Mary left.
    Mary left and Mark got angry.

    Verona is in Italy and Valencia is in Spain.
    Valencia is in Spain and Verona is in Italy.
Only in the last pair is "and" used in the sense of the logical AND.

Even more surprising, we also know without thinking when "and" should be interpreted as the logical OR:
    We invited friends and colleagues.

Gigerenzer, G. (2007). Gut Feelings: The Intelligence of the Unconscious, Penguin Group, New York.

More quotes

My thesis is simply this: probability does not exist.
    Bruno de Finetti

Million to one chances crop up nine times out of ten.
    Terry Pratchett

Conditional probabilities

If probabilities (and expectations) are already hard to understand
    Birthday problem
    de Méré's paradox
    Ellsberg paradox
    interpretation of probability statements (e.g. weather forecasts, side effects of medication)
    human ability to produce/recognise randomness
    various experiments by Amos Tversky and Daniel Kahneman
... what about conditional probabilities?

Conditional probabilities (cont.)

In general: $P(A \mid B) \neq P(B \mid A)$
    The Monty Hall problem
    Mass screening for a rare disease
    In law it is called the Prosecutor's fallacy:
        People vs. Collins in the U.S.A.
        Sally Clark in the U.K.
            Peter Donnelly in http://www.youtube.com/watch?v=klmzxmrcuto
        Lucia de Berk in the Netherlands
    Arguments for and against law changes are often supported by the wrong conditional probability...

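The mass-screening case shows how far apart the two conditional probabilities can be. The following sketch applies Bayes' theorem to made-up test characteristics (prevalence, sensitivity and specificity are all assumed values, not taken from any real screening programme).

```python
# Hypothetical screening test for a rare disease (all numbers are assumptions).
prevalence = 0.001    # P(disease)
sensitivity = 0.99    # P(positive | disease)
specificity = 0.95    # P(negative | no disease)

# Law of total probability: overall chance of a positive result.
p_positive = (
    sensitivity * prevalence
    + (1.0 - specificity) * (1.0 - prevalence)
)

# Bayes' theorem: chance of disease given a positive result.
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(positive | disease) = {sensitivity:.3f}")
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")
```

Under these assumptions a diseased person almost always tests positive, while a person who tests positive has the disease only about 2% of the time, because the few true positives are swamped by false positives from the large healthy majority.
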
Conditional probabilities (cont.)

Note: $P(A \mid B) > P(A)$ implies $P(B \mid A) > P(B)$

A senior policeman was quoted as saying that the proportion of members of an ethnic minority amongst those convicted of mugging was higher than the proportion in the general population. In our language..., $P(E \mid C) > P(E)$, where E is the event of belonging to the ethnic minority and C conviction for mugging. This association implies that $P(C \mid E) > P(C)$, the members of the ethnic minority are more likely to be convicted of mugging than is a random member of the population. To some, the second statement sounds more racist than the first, yet they are equivalent.
    Lindley (2006, Chapter 4.4, p. 53ff)

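The equivalence follows in a few steps from the definition of conditional probability. The derivation below is a short sketch that assumes $P(A) > 0$ and $P(B) > 0$, so that both conditional probabilities are defined.

```latex
\begin{align*}
P(A \mid B) > P(A)
  &\iff \frac{P(A \cap B)}{P(B)} > P(A) \\
  &\iff P(A \cap B) > P(A)\,P(B) \\
  &\iff \frac{P(A \cap B)}{P(A)} > P(B) \\
  &\iff P(B \mid A) > P(B)
\end{align*}
```

The middle line is symmetric in A and B, which is why the two statements about E and C carry exactly the same information.
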
Example of statistical thinking

During the Second World War, a British research group was asked to improve the protection of bombers from anti-aircraft fire. The group collected data on the places on returning aircraft that had bullet and flak holes.
A  Put extra armour plating on all parts of the airplane.
B  Put extra armour plating on the places found to have the most bullet and flak holes.
C  Put extra armour plating on the places with no or few bullet and flak holes.

Example of statistical thinking (ctd)

This is a classical missing data problem.
We do not have data/information on what we are actually interested in, namely: where were the planes that did not return hit?
We do have data/information on where the planes that returned were hit:
    presumably, places with observed hits are not vital.
    presumably, planes that did not return were hit at other places.
Conclusion: Put extra armour plating on the places with no or few bullet and flak holes.

Statistics: A Job for Professionals
http://www.statsoc.org.au/objectlibrary/288?filename=booklet.pdf

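The missing-data argument can be illustrated with a small simulation. Everything in this sketch is hypothetical: the four zones, the assumption that hits land uniformly over them, and the per-hit loss probabilities are invented for illustration and are not historical data.

```python
import random

# Hypothetical zones and numbers, chosen only for illustration.
ZONES = ["engine", "cockpit", "fuselage", "wings"]
# Assumed probability that a single hit in the zone brings the plane down.
LOSS_PROB = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.05, "wings": 0.05}


def simulate(n_planes=20000, hits_per_plane=3, seed=1):
    """Count hits per zone on all planes and on the planes that return."""
    rng = random.Random(seed)
    all_hits = {z: 0 for z in ZONES}
    observed_hits = {z: 0 for z in ZONES}
    for _ in range(n_planes):
        # Assumption: hits land uniformly at random over the four zones.
        hits = [rng.choice(ZONES) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every one of its hits.
        returned = all(rng.random() > LOSS_PROB[z] for z in hits)
        for z in hits:
            all_hits[z] += 1
            if returned:
                observed_hits[z] += 1
    return all_hits, observed_hits


all_hits, observed_hits = simulate()
for z in ZONES:
    print(f"{z:9s} all planes: {all_hits[z]:6d}   returning planes: {observed_hits[z]:6d}")
```

Although every zone is hit about equally often, the returning planes show far fewer holes in the zones assumed to be vital, which is the pattern that makes recommendation C the sensible one.
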
Statistics is Sexy!

Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.
    Aaron Levenstein

Statisticians, like artists, have the bad habit of falling in love with their models.
    George Box

I keep saying that the sexy job in the next 10 years will be statisticians. And I'm not kidding.
    Hal Varian

References

http://www.understandinguncertainty.org/
Gigerenzer, G. (2002). Calculated Risks: How to Know When Numbers Deceive You, Simon & Schuster.
Gigerenzer, G. (2007). Gut Feelings: The Intelligence of the Unconscious, Viking.
Goldacre, B. (2009). Bad Science, Fourth Estate.
Haigh, J. (2003). Taking Chances: Winning with Probability (2nd ed), Oxford University Press.
Hooke, R. (1983). How to Tell the Liars from the Statisticians, Marcel Dekker.
Huff, D. (1954). How to Lie with Statistics, Gollancz.
Lindley, D.V. (2006). Understanding Uncertainty, Wiley & Sons.
Mlodinow, L. (2009). The Drunkard's Walk: How Randomness Rules Our Lives, Vintage.

References (ctd)

Nahin, P.J. (2008). Digital Dice: Computational Solutions to Practical Probability Problems, Princeton University Press.
Olofsson, P. (2007). Probabilities: The Little Numbers That Rule Our Lives, Wiley & Sons.
Rosenthal (2006). Struck by Lightning: The Curious World of Probabilities, Joseph Henry Press.
Salsburg, D. (2001). The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, W.H. Freeman and Company.
Senn, S. (2003). Dicing with Death: Chance, Risk and Health, Cambridge University Press.
Woolfson, M.M. (2008). Everyday Probability and Statistics: Health, Elections, Gambling and War, Imperial College Press.