Fusion Confusion? Comments on Nancy Reid: "BFF Four: Are We Converging?" Deborah G. Mayo. The Fourth Bayesian, Fiducial and Frequentist Workshop (BFF4), Harvard University, May 2, 2017 <1>

I'm delighted to be part of a workshop linking statistics and philosophy of statistics! I thank the organizers for inviting me. Nancy Reid's "BFF Four: Are We Converging?" gives numerous avenues for discussion. She zeroes in on obstacles to fusion: confusion or disagreement on the nature of probability and its use in statistical inference. <2>

From Nancy Reid: Nature of probability.
Probability to describe physical haphazard variability:
- probabilities represent features of the real world, in idealized form
- subject to empirical test and improvement
- conclusions of statistical analysis expressed in terms of interpretable parameters
- enhanced understanding of the data generating process
Probability to describe the uncertainty of knowledge:
- measures a rational, supposedly impersonal, degree of belief, given relevant information (Jeffreys)
- measures a particular person's degree of belief, subject typically to some constraints of self-consistency
- often linked with personal decision-making <3>

As is common, she labels the second "epistemological". But a key question for me is: what's relevant for a normative epistemology, for an account of what's warranted/unwarranted to infer? <4>

Reid quite rightly asks: in what sense are confidence distribution functions, significance functions, structural or fiducial probabilities to be interpreted? Empirically? As degree of belief? The literature is not very clear. <5>

Reid: "We may avoid the need for a different version of probability by appeal to a notion of calibration" (Cox 2006, Reid & Cox 2015). This is my central focus. I approach it indirectly, with an analogy between philosophy of statistics and statistics. <6>

Carnap is to Bayesians as Popper is to frequentists (N-P/Fisher). Carnap: we can't solve induction, but we can build logics of induction or confirmation theories (e.g., Carnap 1962). Define a confirmation relation C(H, e): "logical probabilities" deduced from first-order languages to measure the degree of implication or confirmation that e affords H (syntactical). <7>

Problems:
- Languages too restricted.
- There was a continuum of inductive logics (Carnap tried to restrict it via "inductive intuition").
- How can a priori assignments of probability be relevant to reliability? (the "guide to life")
Few philosophers of science are logical positivists, but the hankering for a logic of induction remains in some quarters. <8>

Popper: "In opposition to [the] inductivist attitude, I assert that C(H, e) must not be interpreted as the degree of corroboration of H by e, unless e reports the results of our sincere efforts to overthrow H." (Popper 1959, 418) "The requirement of sincerity cannot be formalized." (ibid.) "Observations or experiments can be accepted as supporting a theory (or a hypothesis, or a scientific assertion) only if these observations or experiments are severe tests of the theory, or in other words, only if they result from serious attempts to refute the theory." (Popper 1994, 89) But he never successfully formulated the notion. <9>

Ian Hacking (1965) gives a logic of induction that does not require priors, based on the Law of Likelihood (Barnard, Royall, Edwards): data x support hypothesis H1 more than H0 if Pr(x; H1) > Pr(x; H0), i.e., if the likelihood ratio LR > 1. George Barnard: there always is such a rival hypothesis, "viz., that things just had to turn out the way they actually did" (1972, 129). So Pr(LR in favor of H1 over H0; H0) = high. <10>
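Barnard's point can be seen in a few lines. The following is a minimal simulation (my sketch, not from the talk; Python with numpy and scipy assumed): under the Law of Likelihood, the data-dependent rival "the mean is whatever the sample mean happened to be" is always better supported than a true H0.

```python
# Sketch (illustrative, not from the talk): Barnard's trivially favored rival.
# Data x ~ N(0, 1) under H0: mu = 0. The maximally likely rival H1: mu = xbar
# always yields LR > 1, so Pr(LR favors some rival over H0; H0) is maximal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, trials = 10, 10_000
favors_rival = 0
for _ in range(trials):
    x = rng.normal(0.0, 1.0, n)                        # generated under H0
    ll_h0 = stats.norm.logpdf(x, loc=0.0, scale=1.0).sum()
    ll_h1 = stats.norm.logpdf(x, loc=x.mean(), scale=1.0).sum()
    favors_rival += ll_h1 > ll_h0
print(favors_rival / trials)   # ~1.0: the tailor-made rival is always "supported"
```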

Neyman-Pearson: "In order to fix a limit between small and large values of [the likelihood ratio] we must know how often such values appear when we deal with a true hypothesis." (Pearson and Neyman 1967, 106) The sampling distribution of the LR enters: a crucial criticism in statistical foundations. <11>
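To make the Pearson-Neyman remedy concrete, here is a small sketch (my illustration, same Normal setup as above): the limit between "small" and "large" LR values is fixed by simulating how often such values appear when the null hypothesis is true.

```python
# Sketch (illustrative): calibrating the LR cutoff by its sampling distribution
# under H0. Simple hypotheses H0: mu = 0 vs H1: mu = 1, x ~ N(mu, 1), n = 10.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 10, 20_000
lr = np.empty(reps)
for i in range(reps):
    x = rng.normal(0.0, 1.0, n)                        # H0 is true
    lr[i] = np.exp(stats.norm.logpdf(x, 1.0, 1.0).sum()
                   - stats.norm.logpdf(x, 0.0, 1.0).sum())
print((lr > 1).mean())         # how often the data "favor" H1 though H0 is true
print(np.quantile(lr, 0.95))   # a cutoff fixed by the sampling distribution
```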

In statistics: "Sampling distributions, significance levels, power, all depend on something more [than the likelihood function], something that is irrelevant in Bayesian inference, namely the sample space." (Lindley 1971, 436) Once the data are in hand, inference should follow the Likelihood Principle (LP). In philosophy (R. Rosenkrantz, defending the LP): "The LP implies the irrelevance of predesignation, of whether a hypothesis was thought of beforehand or was introduced to explain known effects." (Rosenkrantz 1977, 122) (Don't mix discovery with justification.) <12>

Probabilism vs Performance. Are you looking for a way to assign degree of belief, confirmation, or support to a hypothesis (considered epistemological)? Or to ensure the long-run reliability of methods, coverage probabilities (via the sampling distribution), considered relevant only for long-run behavior, acceptance sampling? <13>

We require a third role: probativism (severe testing): to assess and control erroneous interpretations of data, post-data. The problems with selective reporting (Fisher) and non-novel data (Popper) are not problems about long runs. It's that we cannot say about the case at hand that it has done a good job of avoiding the sources of misinterpretation. <14>

Ian Hacking: "there is no such thing as a logic of statistical inference" (1980, 145). Though responsible for much of the earlier criticism, he came to believe that Neyman, Peirce, and Braithwaite "were on the right lines to follow in the analysis of inductive arguments". Probability enters to qualify a claim inferred: it reports the method's capabilities to control, and alert us to, erroneous interpretations (error probabilities). Assigning probability to the conclusion rather than the method is founded on a false analogy with deductive logic (Hacking, 141). He's convinced by Peirce. <15>

The only two who are clear on the false analogy: Fisher (1935, 54): "In deductive reasoning all knowledge obtainable is already latent in the postulates... the conclusions are never more accurate than the data. In inductive reasoning... [t]he conclusions normally grow more and more accurate as more data are included. It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based." Peirce ("The Probability of Induction", 1878): "In the case of analytic [deductive] inference we know the probability of our conclusion (if the premises are true), but in the case of synthetic [inductive] inferences we only know the degree of trustworthiness of our proceeding." <16>

Neyman and His Performance. You could say Neyman gets his performance idea from trying to clarify Fisher's fiducial intervals. Neyman thought his confidence intervals were the same as Fisher's fiducial intervals. In his (1934) paper (intended to generalize fiducial limits), Neyman said a confidence coefficient refers to "the probability of our being right when applying a certain rule" for making statements set out in advance (623). Fisher was highly complimentary: Neyman "had every reason to be proud of the line of argument he had developed for its perfect clarity" (Fisher's comment in Neyman 1934, 618). <17>

Neyman thinks he's clarifying Fisher's (1936, 253) equivocal reference to "the aggregate of all such statements".[1] "This then is a definite probability statement about the unknown parameter" (Fisher 1930, 533). <18>

It's interesting, too, to hear Neyman's response to Carnap's criticism of Neyman's frequentism. Neyman: "I am concerned with the term degree of confirmation introduced by Carnap... [if] the application of the locally best one-sided test failed to reject the [test] hypothesis" (Neyman 1955, 40). The question is: does a failure to reject the hypothesis confirm it? A sample X = (X1, ..., Xn), each Xi Normal, N(μ, σ²) (NIID), σ assumed known; H0: μ ≤ μ0 against H1: μ > μ0. The test fails to reject H0: d(x0) ≤ cα. <19>

Carnap says yes. Neyman: "...the attitude described is dangerous... the chance of detecting the presence [of discrepancy δ from H0], when only [this number of] observations are available, is extremely slim, even if [δ is present]" (Neyman 1955, 41). "The situation would have been radically different if the power function... [were] greater than 0.95." (ibid.) Merely surviving the statistical test is too easy, occurs too frequently, even when H0 is false. <20>
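Neyman's worry is, at bottom, a power computation. A sketch (my numbers, not Neyman's) for the one-sided Normal test of H0: μ ≤ μ0 vs H1: μ > μ0, which rejects when the sample mean exceeds μ0 + z_α·σ/√n:

```python
# Sketch (illustrative): power of the one-sided Normal test at mu = mu0 + delta.
import numpy as np
from scipy import stats

def power(delta, n, sigma=1.0, alpha=0.05):
    z_alpha = stats.norm.ppf(1 - alpha)
    # Pr(Xbar > mu0 + z_alpha*sigma/sqrt(n); mu = mu0 + delta)
    return 1 - stats.norm.cdf(z_alpha - delta * np.sqrt(n) / sigma)

# With few observations the chance of detecting delta is slim, so a failure
# to reject is poor grounds for "confirming" H0:
print(power(delta=0.5, n=5))    # ~0.30: non-rejection says little
print(power(delta=0.5, n=100))  # ~0.9996: now non-rejection is informative
```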

A post-data analysis is even better*: Mayo and Cox 2006 ("frequentist principle of evidence"). FEV for an insignificant result: a moderate P-value is evidence of the absence of a discrepancy δ from H0 only if there is a high probability (1 − c) that the test would have given a worse fit with H0 (i.e., d(X) > d(x0)) were a discrepancy δ to exist (83-4). That is: if Pr(d(X) > d(x0); μ = μ0 + δ) is high and d(x) ≤ d(x0), infer: any discrepancy from μ0 is less than δ [infer: μ < CIu]. (*"Severity for acceptance": Mayo & Spanos 2006/2011.) <21>
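A minimal computation of the FEV quantity (my sketch, assuming the standardized statistic d(X) = √n(X̄ − μ0)/σ, so that under μ = μ0 + δ, d(X) ~ N(√n·δ/σ, 1)):

```python
# Sketch (illustrative): severity for a nonsignificant result,
# SEV = Pr(d(X) > d(x0); mu = mu0 + delta). If high and d(x) <= d(x0),
# infer: any discrepancy from mu0 is less than delta.
import numpy as np
from scipy import stats

def severity_nonsignificant(d_x0, delta, n, sigma=1.0):
    shift = np.sqrt(n) * delta / sigma      # mean of d(X) under mu0 + delta
    return 1 - stats.norm.cdf(d_x0 - shift)

# Observed d(x0) = 0.5 with n = 25, sigma = 1:
print(severity_nonsignificant(0.5, delta=0.2, n=25))  # ~0.69: weak grounds
print(severity_nonsignificant(0.5, delta=0.6, n=25))  # ~0.99: "mu < mu0 + 0.6" passes severely
```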

How to justify detaching the inference? Rubbing off: the procedure is rarely wrong; therefore, the probability it is wrong in this case is low. What's rubbed off? (It could be a probabilism or a performance.) Bayesian epistemologists (having no other relevant information): a rational degree of belief or epistemic probability rubs off. Attaching the probability to the claim differs from a report of the well-testedness of the claim. <22>

Severe Probing Reasoning. The reasoning of the severe testing theorist is counterfactual: H: μ ≤ x̄0 + 1.96σx̄ (i.e., μ ≤ CIu). H passes severely because, were this inference false, and the true mean μ > CIu, then, very probably, we would have observed a larger sample mean. (I don't saddle Cox with my take, nor Popper.) <23>
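The counterfactual can be checked numerically. A sketch (my illustration, σ known): were the true mean at the upper confidence bound CIu, a sample mean as small as the one observed would occur only about 2.5% of the time.

```python
# Sketch (illustrative): severity behind inferring mu <= CI_u = xbar0 + 1.96*SE.
import numpy as np
from scipy import stats

xbar0, sigma, n = 0.2, 1.0, 100
se = sigma / np.sqrt(n)
ci_u = xbar0 + 1.96 * se

# Pr(Xbar <= xbar0; mu = ci_u): chance of data this small were the claim false
print(stats.norm.cdf(xbar0, loc=ci_u, scale=se))       # 0.025
# So Pr(Xbar > xbar0; mu > ci_u) >= 0.975: H passes with high severity
print(1 - stats.norm.cdf(xbar0, loc=ci_u, scale=se))   # 0.975
```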

How Well Tested (Corroborated, Probed) ≠ How Probable. We can build a logic for severity (it won't be probability): both C and ~C can be poorly tested; low severity is not just "a little bit of evidence", but bad or no evidence. Formal error probabilities may serve to quantify the probativeness or severity of tests (for a given inference), but they do not automatically give this: they must be relevant. <24>

What Nancy Reid's paper got me thinking about is the calibration point. Here's the longer quote: "We may avoid the need for a different version of probability by appeal to a notion of calibration, as measured by the behaviour of a procedure under hypothetical repetition. That is, we study assessing uncertainty, as with other measuring devices, by assessing the performance of proposed methods under hypothetical repetition. Within this scheme of repetition, probability is defined as a hypothetical frequency." (Reid and Cox 2015, 295) <25>
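"Probability as hypothetical frequency" can be put in one loop: the method's stated uncertainty should match its actual performance under hypothetical repetitions of the data-generating process. A sketch (my illustration, not Reid and Cox's code):

```python
# Sketch (illustrative): calibration of a nominal 95% confidence interval
# assessed by its coverage under hypothetical repetition.
import numpy as np

rng = np.random.default_rng(7)
mu_true, sigma, n, reps = 3.0, 2.0, 50, 20_000
covered = 0
for _ in range(reps):
    x = rng.normal(mu_true, sigma, n)
    se = sigma / np.sqrt(n)
    covered += abs(x.mean() - mu_true) <= 1.96 * se
print(covered / reps)   # ~0.95: stated probability matches hypothetical frequency
```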

Notions of calibration also vary! (1) If we calibrate p-values by a Bayes factor or other probabilism, p-values "exaggerate evidence". (2) If we calibrate Bayes factors by performance or severity, they exaggerate what's warranted to infer. Which you say depends on one's philosophy of statistics (Greenland, Senn, Rothman, Carlin, Poole, Goodman, and Altman 2016, 342). Reid: it is unacceptable if a procedure yielding high-probability regions in some non-frequency sense is poorly calibrated. I agree. I take this as calling for the second (2), frequentist, calibration. <26-27>

This takes me to my last point: an irony about today's replication crisis. In some cases it's thought Big Data foisted statistics on fields unfamiliar with its dangers, and Reid discusses some foibles. A lot of consciousness-raising is going on; more hand-wringing than ever regarding cherry-picking and selection effects (p-hacking, significance seeking). R. A. Fisher: it's easy to lie with statistics by selective reporting (1955, 75). New names, same problem. <28>

This returns us to a question from back when the possibility of a logic of induction still seemed viable: can't data speak for themselves? Calls for preregistration are everywhere: "Authors must decide the rule for terminating data collection before data collection begins and report this rule in the article." (Simmons, Nelson, and Simonsohn 2011, 1362) At the same time: "Use of the Bayes factor gives experimenters the freedom to employ optional stopping without penalty." (In fact, Bayes factors "can be used in the complete absence of a sampling plan".) (Bayarri, Benjamin, Berger, and Sellke 2016, 100) <29>
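A sketch (mine) of why these two quotes clash: under H0, "peeking" after every observation and stopping at the first nominally significant z-score inflates the actual type I error rate far beyond 0.05. That inflation is exactly what error probabilities register, and exactly what the final likelihood (and hence the Bayes factor) ignores.

```python
# Sketch (illustrative): optional stopping without a penalty. Test H0: mu = 0
# with known sigma = 1, peeking after each observation up to n_max.
import numpy as np

rng = np.random.default_rng(42)
reps, n_max = 5_000, 100
rejected = 0
for _ in range(reps):
    x = rng.normal(0.0, 1.0, n_max)       # all data generated under H0
    for n in range(1, n_max + 1):
        z = x[:n].mean() * np.sqrt(n)     # z-statistic at interim look n
        if abs(z) > 1.96:                 # stop at first nominal 5% "rejection"
            rejected += 1
            break
print(rejected / reps)   # ~0.4, not 0.05: nominal significance is far too easy
```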

What I take away from Nancy Reid's talk is this: if we don't know what we mean by saying an account "works", we can't tell how to calibrate. <30>

In the severe testing view, for a calibration to be relevant to normative epistemology, that is, to what is warranted to infer (what's well and poorly tested): 1. it must be directly affected by selection effects (cherry picking, multiple testing, stopping rules); 2. it must enable testing assumptions; 3. it must enable statistical falsification. This points to the need for further philosophical-statistical interaction. <31>

Philosophy of Inductive/Statistical Inference:
- Inductive logics: Carnap's C(H, e), Hacking (1965)
- Falsification, testing accounts: Popper

Parallels in Formal Statistics (goes much further):
- Bayesian and likelihoodist accounts. Probability: to assign degree of confirmation, support, belief (posterior or comparative). Probabilisms. Fiducial?
- Fisherian, Neyman-Pearson frequentist methods. Probability: (a) to ensure reliable performance; (b) severity of tests (probativeness). Fiducial? <32>

[1] (endnote) <33>

REFERENCES

Barnard, G. (1972). The Logic of Statistical Inference (review of The Logic of Statistical Inference by Ian Hacking). British Journal for the Philosophy of Science 23(2): 123-132.
Bayarri, M., Benjamin, D., Berger, J., and Sellke, T. (2016). Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses. Journal of Mathematical Psychology 72: 90-103.
Berger, J. O. and Wolpert, R. (1988). The Likelihood Principle. 2nd ed. Vol. 6, Lecture Notes-Monograph Series. Hayward, CA: Institute of Mathematical Statistics.
Carnap, R. (1962). Logical Foundations of Probability. 2nd ed. Chicago: University of Chicago Press.
Cox, D. R. (2006). Principles of Statistical Inference. Cambridge: Cambridge University Press.
Fisher, R. A. (1930). Inverse Probability. Mathematical Proceedings of the Cambridge Philosophical Society 26(4): 528-535.
Fisher, R. A. (1935). The Logic of Inductive Inference. Journal of the Royal Statistical Society 98(1): 39-82.
Fisher, R. A. (1936). Uncertain Inference. Proceedings of the American Academy of Arts and Sciences 71: 248-258.
Fisher, R. A. (1955). Statistical Methods and Scientific Induction. Journal of the Royal Statistical Society, Series B (Methodological) 17(1): 69-78.
Hacking, I. (1965). Logic of Statistical Inference. Cambridge: Cambridge University Press.
Hacking, I. (1972). Review: Likelihood. British Journal for the Philosophy of Science 23(2): 132-137.
Hacking, I. (1980). The Theory of Probable Inference: Neyman, Peirce and Braithwaite. In Mellor, D. (ed.), Science, Belief and Behavior: Essays in Honour of R. B. Braithwaite, pp. 141-160. Cambridge: Cambridge University Press.
Jeffreys, H. (1939). Theory of Probability. Oxford: Oxford University Press.
Lindley, D. (1971). The Estimation of Many Parameters. In Godambe, V. P. and Sprott, D. A. (eds.), Foundations of Statistical Inference, pp. 435-455. Toronto: Holt, Rinehart and Winston.
Mayo, D. G. (1996). Error and the Growth of Experimental Knowledge. Science and Its Conceptual Foundations. Chicago: University of Chicago Press.
Mayo, D. G. (2014). On the Birnbaum Argument for the Strong Likelihood Principle (with discussion). Statistical Science 29(2): 227-239, 261-266.
Mayo, D. G. (2016). Don't Throw Out the Error Control Baby with the Bad Statistics Bathwater: A Commentary on Wasserstein, R. L. and Lazar, N. A. (2016), The ASA's Statement on p-Values: Context, Process, and Purpose. The American Statistician 70(2), supplemental materials.
Mayo, D. G. and Cox, D. R. (2006). Frequentist Statistics as a Theory of Inductive Inference. In Rojo, J. (ed.), Optimality: The Second Erich L. Lehmann Symposium, Lecture Notes-Monograph Series, Institute of Mathematical Statistics (IMS) 49: 77-97.
Mayo, D. G. and Spanos, A. (2006). Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction. British Journal for the Philosophy of Science 57: 323-357.
Mayo, D. G. and Spanos, A. (2011). Error Statistics. In Bandyopadhyay, P. and Forster, M. (eds.), Philosophy of Statistics, Vol. 7, Handbook of the Philosophy of Science, pp. 152-198. The Netherlands: Elsevier.
Neyman, J. (1930). Methodes nouvelles de verification des hypotheses. Compt Rend Premier Congr Math Pays Slaves: 355-366.
Neyman, J. (1934). On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection. In Early Statistical Papers of J. Neyman: 98-141. [Originally published (1934) in Journal of the Royal Statistical Society 97(4): 558-625.]
Neyman, J. (1955). The Problem of Inductive Inference. Communications on Pure and Applied Mathematics 8(1): 13-46.
Pearson, E. and Neyman, J. (1967). On the Problem of Two Samples. In Neyman, J. and Pearson, E. S., Joint Statistical Papers, pp. 99-115. Berkeley: University of California Press. [First published in Bull. Acad. Pol. Sci. (1930): 73-96.]
Peirce, C. S. (1931). Collected Papers of Charles Sanders Peirce. Hartshorne, C. and Weiss, P. (eds.), 6 vols. Cambridge: Harvard University Press.
Popper, K. (1959). The Logic of Scientific Discovery. New York: Basic Books.
Popper, K. (1994). The Myth of the Framework: In Defense of Science and Rationality. Notturno, M. A. (ed.). London and New York: Routledge.
Reid, C. (1997). Neyman. New York: Springer Science & Business Media.
Reid, N. and Cox, D. R. (2015). On Some Principles of Statistical Inference. International Statistical Review 83(2): 293-308.
Rosenkrantz, R. (1977). Inference, Method and Decision: Towards a Bayesian Philosophy of Science. Dordrecht: D. Reidel.
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. London: Chapman and Hall/CRC Press.
Sellke, T., Bayarri, M., and Berger, J. O. (2001). Calibration of p Values for Testing Precise Null Hypotheses. The American Statistician 55(1): 62-71.
Simmons, J., Nelson, L., and Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science 22(11): 1359-1366.