Truth and Evidence in Validity Theory

Journal of Educational Measurement, Spring 2013, Vol. 50, No. 1, pp. 110–114
Copyright © 2013 by the National Council on Measurement in Education

Denny Borsboom, University of Amsterdam
Keith A. Markus, John Jay College of Criminal Justice of The City University of New York

According to Kane (this issue), "the validity of a proposed interpretation or use depends on how well the evidence supports the claims being made." Because truth and evidence are distinct, this means that the validity of a test score interpretation could be high even though the interpretation is false. As an illustration, we discuss the case of phlogiston measurement as it existed in the 18th century. At face value, Kane's theory would seem to imply that interpretations of phlogiston measurement were valid in the 18th century (because the evidence for them was strong), even though amounts of phlogiston do not exist and hence cannot be measured. We suggest that this neglects an important aspect of validity and outline various ways in which Kane's theory could meet this challenge.

We welcome Michael Kane's article updating, extending, and further developing his approach to test validity. We choose to focus our comment on a single issue that we consider fundamental to test validity and on which Kane's account remains ambiguous: How does the truth of the conclusion of the validity argument fit into validity in Kane's approach?

Modern test validity theory contains a dialectic between two metaphors. The first is the mechanical metaphor, in which a test is a machine that measures and is valid if it measures what it is intended to measure. For example, one can think of an alarm clock as a test that is reliable if it sounds consistently at some specific time and valid if it sounds consistently at the intended time. The second is the argument metaphor, in which an interpretation is valid if it follows from test scores by way of a valid argument. Thus the conclusion that the time has come to get up follows appropriately from the sounding of the alarm if the inference is valid. The mechanical metaphor emphasizes the link between test validity and the truth of conclusions drawn from test scores (Borsboom & Mellenbergh, 2007; Borsboom, Mellenbergh, & Van Heerden, 2004). The argument metaphor emphasizes evidence to support the inference.

One way to frame this issue is in terms of the traditional analysis of knowledge as justified true belief (Shope, 1983). The mechanical metaphor emphasizes true belief, whereas the argument metaphor emphasizes justified belief. The argument metaphor begins with the prototype of a deductive argument, in which the conclusion must hold true if the premises hold true. In an ideal Cartesian tree of knowledge, only certainly true beliefs would be admitted as premises, and thus conclusions would follow with certainty. Thus truth and justification hold together. However, even formally valid deductive arguments allow for some slippage: A valid argument can lead to a false conclusion if it begins with false premises. Thus what is required is not just validity but soundness of the argument (validity plus true premises).

As Kane emphasizes, however, science does not typically proceed by deductive argument. Science proceeds by ampliative arguments that draw conclusions going beyond what is made certain by their premises. Extending the metaphor, such inferences can be deemed valid if they tend toward true conclusions without guaranteeing them. A standard example is inductive inference based on a bad sample, in which a sample of all red spheres drawn from an urn containing 50% red spheres leads by justifiable inference to a false conclusion about the proportion of red spheres in the urn. As such, truth and justification can come apart, at least in the short run. Messick (1989) gave the example of test behaviors that may be adaptive in some contexts but not in others being wrongly interpreted as universally adaptive (e.g., consistency) or nonadaptive (e.g., rigidity). An example like this holds even in the long run, because one can collect endless empirical evidence that the scores align with the construct without detecting that the same misinterpretation has been applied both to the construct and to the scores when interpreting the research evidence. The informal arguments emphasized by Kane are clearly ampliative rather than demonstrative, widening the potential gap between justified belief and true belief.

It is not our intent to advocate for one metaphor over the other, as we think that both involve important questions in test validity theory (Markus & Borsboom, in press). Instead, we wish to focus on a highly specific question: How, if at all, is the truth of the conclusions drawn from test scores incorporated into Kane's approach to validity?
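The urn example described above can be made concrete with a few lines of arithmetic. The following Python sketch is our own illustration, not part of Borsboom and Markus's argument; the urn size (10 spheres, 5 red), the sample size (4 draws without replacement), and the function names are hypothetical choices made so the numbers are easy to check by hand. It shows that the sample proportion, an estimator whose expected value equals the true proportion, can nonetheless deliver a false conclusion from a particular sample.

```python
from fractions import Fraction
from math import comb


def sample_proportion(sample):
    """Estimate the urn's proportion of red spheres by the sample proportion."""
    return Fraction(sum(1 for s in sample if s == "red"), len(sample))


def prob_k_red(k, n, red, total):
    """Hypergeometric probability of exactly k red spheres in n draws,
    without replacement, from an urn holding `red` red spheres out of `total`."""
    return Fraction(comb(red, k) * comb(total - red, n - k), comb(total, n))


# Hypothetical urn (an assumption for illustration): 10 spheres, half of them red.
RED, TOTAL, N = 5, 10, 4
truth = Fraction(RED, TOTAL)

# An unlucky but perfectly possible sample: every sphere drawn is red.
unlucky_sample = ["red"] * N
estimate = sample_proportion(unlucky_sample)

# How likely such a sample is, and what the estimator yields on average
# over every possible sample of size N.
p_all_red = prob_k_red(N, N, RED, TOTAL)
expected_estimate = sum(
    prob_k_red(k, N, RED, TOTAL) * Fraction(k, N) for k in range(N + 1)
)

print("true proportion:", truth)                          # 1/2
print("estimate from all-red sample:", estimate)          # 1
print("P(all-red sample):", p_all_red)                    # 1/42
print("expected value of estimate:", expected_estimate)   # 1/2
```

Exact fractions avoid floating-point noise, so the expected estimate comes out as exactly 1/2. In the terms of the text, the procedure tends toward the truth on average, yet about one sample in 42 is all red and supports the false conclusion that every sphere in the urn is red.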
To put a finer point on it, does Kane's approach provide a framework that makes it possible to represent a situation in which the best available evidence leads to a false conclusion? If not, is this a deficiency of Kane's theoretical framework or a deliberate assumption that justification and truth cannot come out of alignment? This is not merely academic curiosity, in our view, because it has practical implications for test validation. If test validity theory emphasizes justified belief to the exclusion of true belief, validation may become an end in itself rather than a means to an end. In such a case, one constructs an argument to support a test score interpretation simply because one wants to support that interpretation. Validity arguments then risk becoming akin to arguments for conspiracy theories: they never fail to support these theories simply because they are explicitly designed to do so. In our view, however, one constructs and evaluates a validity argument as a means to an end, namely, because one wants to arrive at a better understanding of how well the test is functioning. This, however, requires an account of validity that incorporates both justified belief and true belief as distinct elements. The question before us is whether Kane's approach has sufficient conceptual resources to achieve this.

Phlogiston: A Clear Test Case

Cronbach (1971) emphasized explanation, which assumes truth, and Messick (1989) presented an elaborate theory of fallibilism. However, in Kane's presentation of the argument-based approach, the emphasis is entirely on justification to the exclusion of truth. As Kane puts it (this issue, p. 1), "the validity of a proposed interpretation or use depends on how well the evidence supports the claims being made." Rather than link the argument to justification and the concluding interpretation to truth, Kane links the interpretation itself to justification. Thus, taking Kane's statements at face value suggests that the validity of test-score interpretations (not just the arguments supporting them) is essentially independent of their truth. This makes validity an entirely time-dependent concept that is relative to scientists' evidence and theories. Without the qualifications of earlier theorists, Kane asserts, "Validity... may change over time, as the interpretations/uses develop, and as new evidence accumulates" (p. 3). The time-independent element of validity, the one involving truth, seems to get lost.

To make the problem clear, it is useful to illustrate it briefly with a case where evidence for a long time supported a wholly incorrect interpretation of measurement outcomes: the measurement of phlogiston (see Borsboom, Cramer, Kievit, Zand Scholten, & Franic, 2009, for a detailed discussion). The theory of phlogiston ("firestuff") posited the existence of a substance contained by flammable materials that was emitted in the form of fire when these materials were heated. In the 18th century, scholars measured the amount of phlogiston that a piece of material contained by subtracting the weight of the material after burning from the original weight of the material: the difference was thought to equal the relevant amount of phlogiston. Call this test-score interpretation Interpretation P: "the weight of a substance before burning minus the weight of the same substance after burning equals the amount of phlogiston the material contained." Support for Interpretation P existed in the form of a quite impressive theory on the nature of burning as phlogiston emission.
For instance, the theory of phlogiston could explain why some materials burned while others did not (they did not contain phlogiston), why materials that do not burn (e.g., iron) do not lose weight when heated (they do not emit phlogiston), why a burning candle dies out if there is no fresh air supply (the air becomes saturated with phlogiston), and so forth. Thus, until the end of the 18th century, when Lavoisier refuted the theory and showed that burning is a chemical reaction, measurement interpretations in terms of phlogiston enjoyed considerable support.

The example provides an interesting test case for validity theories because it clearly shows how truth and evidence can come apart in test-score interpretations: Interpretation P was never true, but it was supported by significant amounts of evidence and strong arguments. Its negation, not-P, was always true but was not supported by evidence before Lavoisier entered the scene. If we take Kane literally, then the phrase "the validity of a proposed interpretation or use depends on how well the evidence supports the claims being made" would seem to imply that Interpretation P had high validity in, say, 1730. Thus, had the proponents of phlogiston theory applied Kane's theory of validity in the early 18th century, it would have led them to accept phlogiston measurement as valid. The question we would like to pose to Kane is whether this is indeed a correct reading of his theory. If so, how would he evaluate the phlogiston measurement example? As we see it, the inference may have been valid, but the interpretation of measurement outcomes was not. In fact, it seems to us that phlogiston measurement is a textbook example of invalidity in test-score interpretation and that a good theory of test validity should somehow accommodate this. Regardless of how subtly one constructs the relation between test-score interpretations and reality, it would seem that any theory of validity should deem the interpretation invalid.

Possible Strategies for Dealing with the Phlogiston Case

We see several possible responses Kane could give to the phlogiston case. First, he could bite the bullet. That is, Kane could accept that his theory would have supported Interpretation P in the 18th century (not just the inference). This would imply that Kane's theory emphasizes justified belief to the exclusion of true belief. In our view, this would be a heavy price to pay, but Kane could choose to pay it. Moreover, if this were the preferred route, then it would seem that Kane's theory is incomplete and requires a supplement outside of what it labels validity to deal with the relation between measurement interpretations and the world (i.e., true belief about test scores). Even a thoroughgoing pragmatist such as Rorty (2000) recognizes the need for a minimal notion of truth to support fallibilism about belief.

Second, Kane could attempt to show that the argument does not actually stick. This would require that Kane construct an argument to the effect that his theory would not actually have granted validity to Interpretation P in 1730. For instance, Kane could attempt to furnish his approach with the room for fallibilism that all theories of truth require. This could be done by including something like an ultimate argument in his argument-based approach, for example, by making validity relative to the argument that rational scientists would arrive at, should they continue their investigations for an infinity of time. That way, Interpretation P could be valid to the observers in the 18th century but invalid with respect to the ultimate state of scientific knowledge.
Such approaches to truth face many challenges, but this one would allow Kane to incorporate both justified belief and true belief into his approach while keeping the emphasis on justified belief.

Third, Kane could attempt to show that even though his theory is unable to rule correctly on Interpretation P, other theories cannot do so either. This strategy would come down to showing that, even though a theory of validity like that of Borsboom et al. (2004) seems to rule Interpretation P invalid correctly (because phlogiston does not exist and thus cannot produce measurement outcomes, so that a sufficiently strong causal reading cannot be given; Markus, 2004, 2008), this is not actually so. Alternatively, Kane might argue that alternative approaches shortchange justified belief in the same way that his approach shortchanges true belief, and that shortchanging true belief is the lesser of two evils. The challenge would then be to show why it is necessary on such a view to accept shortchanging one or the other.

However Kane chooses to treat the issue, we think that any validity theory should deal with the relation between evidence and truth, however difficult that may be in psychological and educational testing (Markus & Borsboom, in press). In our view, a theory of validity that deals exclusively with how to organize evidence and justify decisions misses an essential psychometric aspect of validity and is unnecessarily impoverished.

References

Borsboom, D., Cramer, A. O. J., Kievit, R. A., Zand Scholten, A., & Franic, S. (2009). The end of construct validity. In R. W. Lissitz (Ed.), The concept of validity (pp. 135–170). Charlotte, NC: Information Age.
Borsboom, D., & Mellenbergh, G. J. (2007). Test validity in cognitive assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 85–115). New York, NY: Cambridge University Press.
Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443–507). Washington, DC: American Council on Education.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: American Council on Education and Praeger.
Markus, K. A. (2004). Varieties of causal modeling: How optimal research design varies by explanatory strategy. In K. van Montfort, J. Oud, & A. Satorra (Eds.), Recent developments on structural equation models: Theory and applications (pp. 175–196). Dordrecht, The Netherlands: Kluwer Academic.
Markus, K. A. (2008). Constructs, concepts and the worlds of possibility: Connecting the measurement, manipulation, and meaning of variables. Measurement, 6, 54–77.
Markus, K. A., & Borsboom, D. (in press). Frontiers of validity theory: Measurement, causation, and meaning. New York, NY: Taylor & Francis.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Washington, DC: The American Council on Education and the National Council on Measurement in Education.
Rorty, R. (2000). Universality and truth. In R. B. Brandom (Ed.), Rorty and his critics (pp. 1–30). Malden, MA: Blackwell.
Shope, R. K. (1983). The analysis of knowing: A decade of research. Princeton, NJ: Princeton University Press.
Authors

DENNY BORSBOOM is Professor of Psychological Methods at the Department of Psychology of the University of Amsterdam, Weesperplein 4, 1018 XA Amsterdam, The Netherlands; dennyborsboom@gmail.com. His primary interests include psychometrics, philosophy of science, and network modeling.

KEITH A. MARKUS is Professor of Psychology at John Jay College of Criminal Justice of The City University of New York, Psychology Department, 524 West 59th Street, New York, NY 10019; kmarkus@aol.com. His primary interests include test validity, causal explanation and inference, and program evaluation.