Philosophers' Imprint, volume 14, no. 2, February 2014

INFINITESIMAL CHANCES

Thomas Hofweber
University of North Carolina at Chapel Hill
© 2014, Thomas Hofweber <www.philosophersimprint.org/014002/>

1. Introduction

Although there is little agreement in the metaphysics of chance about what chances are, there is broad agreement on what the mathematical representation of chance should be and how chance is to be measured. Chance, whatever it is more precisely, is some feature of events that comes in degrees, and the measurement of chance assigns numbers to events in accordance with how likely they are to occur. No matter what one's view is in the metaphysics of chance, it should be no obstacle to an agreement on how chance should be measured. Measurement, after all, concerns the mathematical representation of this feature and its degrees, but leaves open the nature of chance. The measurement of chance might thus uncontroversially be taken to be what it is commonly taken to be: a probability measure in the technical sense that satisfies Kolmogorov's axioms. The metaphysical questions are left open by all this, or so it is natural to think.

But this is mistaken. The standard way to measure chance brings with it a commitment in the metaphysics of chance that is not only substantial but also mistaken. This paper hopes to make clear what this commitment is, why it is mistaken, and how to properly measure chance instead.

That something is not quite right in the status quo is not an unusual sentiment. Contemporary accounts of chance and its measurement include something that many, myself included, consider completely unpalatable, and that is almost universally rejected by those who are theoretically uncontaminated. We are told that we have to swallow it nonetheless, as any acceptable way to measure chance requires it. The claim in question is this:

(1) Some events with 0% chance of happening happen anyway. Some events with 100% chance of happening don't happen nonetheless.

Our initial strong reaction against this, we are told, is something we must learn to live with, for technical reasons to be discussed shortly. The way chance is measured requires that sometimes we need to assign 0 to the chance of events that happen and 1 to the chance of events that don't happen.

Some things that prima facie seem implausible sometimes have to be accepted for theoretical reasons, or so we are told. But alternatively, maybe we simply don't measure chance correctly. Maybe our measurement of chance, our assigning of numbers as measures of how likely an event is, isn't fine-grained enough to be the proper measure of chance. On the proper measure, one might hope, (1) will never occur. Nothing that happens will get 0% chance of happening, and nothing will get 100% chance of happening that doesn't happen.

This issue so far seems to be one about measurement: do we need to make distinctions in our measures that are finer than we have made so far? But the issue goes well beyond that. I will argue that (i) accepting or rejecting (1) is a crucial dividing line between two large-scale metaphysical conceptions of chance, and that (ii) any satisfactory theory of chance must reject (1). The question then is what approach to the measurement of chance can deliver that (1) is ruled out. Here one component is well known: infinitesimals need to be employed in the measurement of chance. However, we will see that this by itself is not enough. We need to augment infinitesimals with two further ideas: non-locality and flexibility. Only all three of those together can give us a satisfactory account of how chance is to be measured and what chances are.

2. The minimal constraint on the measurement of chance

Although we will clarify many of these points below, we can nonetheless take, as a first stab, our task of measuring chance to be as follows: Events are more or less likely to occur, and correspondingly they have larger or smaller chances of occurring. Chance thus is a gradable feature or quantity of events: a quantity that an event can have more or less of, or that it can have to a larger or smaller degree. It is unclear whether this quantity can be reduced to, or understood in terms of, other features of events, or whether it is a primitive or sui generis feature. Fortunately we don't have to settle this important debate here. Instead we need to focus on the connection between the quantity of chance and its measurement. To measure chance we assign numbers to events in accordance with how likely the event is to occur. Here we can simply take the chance of the event as a feature that is given and that obtains independently of our measurements. In fact, as far as this paper is concerned, chance can be a perfectly determinate and fully objective feature of events, a feature that the event has independently of its measurement. Nothing that is to come, I maintain, will speak against chance being this way, but nothing will require it to be this way either.

On the one hand, then, there are the events that have chances of occurring, and on the other hand there are the things we use to measure these chances, that is, how likely these events are to occur. In measuring chance we thus assign numbers to events in accordance with how likely these events are to occur. In this way measuring chance is no different from measuring height. In the latter case we assign numbers to people in accordance with how tall they are. In the former case we assign numbers to events in accordance with how likely they are to occur. Each person has a height, and each event has a chance, and the numbers we assign in measurement are supposed to capture how much of that they have. This is so far only a basic outline of what measuring chance is.
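For reference, here is the standard Kolmogorov picture that the introduction alludes to; this compact restatement is mine, not the paper's, but the axioms themselves are the textbook ones. For a probability measure P on a space of events with sure event Ω:

\[
P(A) \ge 0, \qquad P(\Omega) = 1, \qquad P(A \cup B) = P(A) + P(B) \ \text{for disjoint } A, B,
\]

with countable additivity as the usual strengthening of the last clause. On this picture the measures of chance are real numbers in [0, 1], which is exactly the combination of assumptions the paper goes on to question.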
Not any such assignment of numbers to events will do, of course, and we will need to flesh out some more what such an assignment should be like. And this will get us to the heart of our issue. When we measure height there is a lowest possible measure of height, but no largest. In principle height has no maximum. We thus need to measure height with a half-open interval of some numbers, generally all positive real numbers, including 0 as the lowest possible measure. When we measure chance we generally take there to be a lowest possible chance as well as a highest possible one, and thus we need to employ a closed interval of some number system, generally the unit interval [0, 1] among the real numbers, but in principle we could use some other closed interval of the real numbers, or some other closed interval of some other number system.

Employing the unit interval [0, 1] is technically more elegant than using some other closed interval, and so it is generally the chosen one. Furthermore, the larger the height or chance, the larger the number assigned as its measure. In particular, if an event e is less likely to happen than an event f, then the chance of e is smaller than the chance of f, and thus the number assigned to e as the measure of its chance must be a smaller number than the number assigned to f. And similarly for a number of other uncontroversial general requirements on measurement.

One assumption about chance that is implicitly made when we use numbers like real numbers as measures of chance is worth making explicit. It is the assumption that events are linearly ordered by their chances, which means in particular that any two events can be compared with respect to how likely they are: for any two events e₁ and e₂, either e₁ is more likely than e₂, or e₂ is more likely than e₁, or they are equally likely. Since the numbers used as measures are linearly ordered by their natural ordering of being larger, any measure of chance that assigns all events a number from a linearly ordered number system corresponding to how likely the events are must assume that events are linearly ordered by their chances. We will also make this assumption in the following, as does almost everyone else, but we will revisit it below, in section 7.

We will need to refine several of these general points in the following, but we can for now move on to our central concern. In measuring chance we assign numbers from the unit interval [0, 1] to events in accordance with how likely they are, meeting the above, in effect uncontroversial, requirements on what such an assignment should be like. However, there is also another requirement on the proper measure of chance, which I think has equal standing to the uncontroversial ones mentioned, but which is widely rejected in present theorizing about chance and which is controversial. I consider it to be inevitable, and I think of it as one of several minimal requirements on any measure of chance. For reasons given in just a minute, it deserves the name of the Minimal Constraint:

(MC) If the chance of p is 0, then not p. If the chance of p is 1, then p.

(MC) is a constraint on the proper measurement of chance. Under the assumption that the measures of chance are numbers in the unit interval [0, 1], (MC) requires that only events that do not happen can be assigned the lowest possible measure, and only events that happen can be assigned the highest possible measure of chance. It is something that a measure of chance must deliver for it to be at least a minimally acceptable measure. Any measurement of chance that doesn't meet (MC) can be seen as being defective in one of two ways, which is connected to the fact that (MC) follows from the following two principles. The first is a principle about what the measurement of chance must deliver; the second is one about what the chances of certain events are:

(P1) A (correct and complete) measure of chance assigns the same value to the chance of events just in case they are equally likely.

(P2) An event that happens is more likely to happen than an event that is absolutely guaranteed not to happen. An event that doesn't happen is less likely to happen than an event that is absolutely guaranteed to happen.

(P1) should be uncontroversial.
If it isn't satisfied, then either the measure is not fine enough or else it is incomplete. (P2) compares the likelihood of events that happen or don't happen to those that are absolutely guaranteed to happen or not to happen. The notion of absolute guarantee can have a very strong reading here: for example, it being a conceptual truth that the event will not happen, or it being a necessary truth that the event will not happen, or whichever reading one might prefer. For our point here it is not required to settle on a particular reading of absolute guarantee, since any one of them will work, and different people might be more comfortable with some rather than others. We can leave this notion as a placeholder for one's preferred way to spell it out.

To illustrate with an example: (P2) requires that the chance of the stars being scattered the way they are scattered is larger than the chance that I am taller than myself. No matter how unlikely it was that the stars are distributed the way they are, it is even less likely that I am taller than myself. (P2) is generally rejected, but not because it is by itself judged to be false or implausible (to the contrary), but rather because its rejection is required by the standard way of measuring chance, as I'll explain shortly. But this should lead us only to question the standard approach to the measurement of chance, not to reject (P2). I can't see giving up (P2) as an option while maintaining that we are talking about chance, and not some substitute for it. There is a difference in chance between events that happen and events that are guaranteed not to happen. What this means is that the chance of an event can be so low that its chance guarantees it will not happen. Since the lowest possible chance of an event is lower than the chance of an event that happens, having lowest possible chance guarantees that the event that has it does not happen. This is in part what gives chance its bite. Lowering the chance of an event gets it closer to being ruled out, and getting it all the way down to lowest possible chance really does rule it out. [1]

[1] I take something like (MC) to be endorsed in a well-known quote of David Lewis. In the postscript to his paper "Causation", Lewis states that if the chance of an event is 0, then it doesn't happen. To support it he states, "Zero chance is no chance, and nothing with zero chance ever happens" [Lewis, 1983, 176] (emphasis in the original).

Rejecting (MC) neuters chances. All that chances can then do is make things unlikely or likely, but not guarantee that things happen or don't happen. Neutered chances are chances that have no impact on what events obtain or don't obtain, but instead only on how likely or unlikely they are to obtain. Having the lowest possible chance, on the conception of chances as neutered, doesn't guarantee that the event doesn't happen, only that it is unlikely to happen. But unlikely events might happen, even quite often, which would simply be more unlikely. Similarly, having (neutered) highest possible chance doesn't guarantee that the event happens; it just makes it likely that it happens. But the likely can fail to happen, unlikely as that might be. On the conception of chances as empowered chances, the chance of an event can have an impact on its happening, not just on its likelihood. Empowered chances are such that the chance of an event happening can be so high that its chance guarantees that it will happen. And the chance of an event happening can be so low that its chance rules it out.

If chances are neutered, then chances are free-floating, almost epiphenomenal, affecting only other chances, but they don't hook into the non-chancy part of the world. All that is guaranteed by a chance is what other chances are, but not what will or won't happen. They would relate only to what is likely or unlikely to happen, not to what will or won't happen. But if chances are empowered, then they have a grip on the non-chancy world beyond that. They can be low or high enough to guarantee what will or won't happen. And these two ways of thinking about chance differ significantly on what chances are and how they fit into the world.
The acceptance or rejection of (MC) goes hand in hand with one's conception of chances as neutered or empowered. If (MC) fails, then the chance of an event can be the lowest possible chance while the event still happens. The chance of an event thus can't be low enough to rule it out, and chances are neutered. If (MC) holds, then chances can guarantee that something does or doesn't happen, and chances are thus empowered.

Without trying to sound dogmatic, I take (MC) to be beyond discussion. I can't help but judge that (MC) is a conceptual truth about chance, given that 0 is the lowest and 1 the highest possible measure of chance. Of course, there is no conceptual connection between not happening and the number 0. But (P1) is a conceptual truth about what a (complete and correct) measure of chance is, and (P2) is a conceptual truth about chance. It is a conceptual truth about chance that an event which happens has a better chance of happening than an event which is conceptually incoherent.

(P1) and (P2), together with the fact that 0 is the lowest possible measure of chance and 1 is the highest, entail (MC) and establish it as a conceptual truth as well. Although (MC), on the face of it, looks like a bad candidate for being a conceptual truth, since it connects chance with numbers, which seem conceptually unconnected, it turns out to be a conceptual truth after all, since it is conceptually implied by a conceptual truth about the measurement of chance, a conceptual truth about chance, and the assumption that the measures of chance are from the unit interval [0, 1]. [2] (MC) thus must be accepted as a minimal requirement for the proper measurement of chance.

[2] If the measures are taken from a different interval, then of course (MC) has to be restated to employ the new highest and lowest measures. Its spirit is untouched by this even if its formulation might change.

To say that something is a conceptual truth is not simply to take one's own personal preferences and insist that they are beyond debate. It has some real explanatory power. It helps explain, for example, why almost everyone who encounters it accepts it as a truth. This is exactly the case with (MC). (MC) is almost universally and immediately judged to be true, and not simply to be a conjecture about chance, but to be in part what chance is and what our concept of chance requires. But on the other hand, you can't argue much over what is or isn't a conceptual truth. It is simply a matter of what our concepts do and don't allow, and those who in all honesty think their concept of chance allows for (MC) to fail can't be persuaded by further arguments that their concept really doesn't allow it. All one can do is ask them to reconsider, and all they can do is the same for us. [3]

[3] See [Hofweber and Velleman, 2011] for a discussion of this issue in a different case.

(MC) is a requirement on any measure of chance, that is, any measure that tries to capture all the details about the chances of events. Although a measure of chance needs to respect (MC), other measures need not. For certain purposes we might want to measure something that is related to chance, but more coarse-grained and with fewer details. To take one example, we can call the coarse chance of an event its chance rounded to the nearest ten percent. Coarse chance thus comes in only a few degrees: 0, 0.1, 0.2, ..., 1. Something can have the smallest possible coarse chance of 0 but still happen, since its chance of happening might be 0.04, which rounds to 0. (MC) doesn't have to hold for coarse chance, but it has to hold for chance. Sometimes we might be content to measure something that is less discriminating than chance, and the analog of (MC) won't apply to it. But for a measure of chance (MC) is inevitable.

I can do no other but to accept (MC) as non-negotiable. It is not only a truth but a conceptual truth. And as a conceptual truth it not only happens to be true, but it has to be true. It is a necessary truth about chance, one we can discover just by thinking about what chance is. But on standard approaches to the measurement of chance, it must be given up, and neutered chances are the only option. This approach to the measurement of chance requires that (MC) is not a conceptual truth, since it requires that it is not even a truth, for reasons to be discussed shortly. The standard approach to the measurement of chance thus can't be right, or so we are forced to conclude. We will see below that we can do better.
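To make the entailment claimed above fully explicit, here is one way the reasoning can be spelled out; the reconstruction is mine, but it uses only (P1), (P2), the ordering requirement on measurement stated in section 2, and the choice of [0, 1]:

\[
\text{Suppose } C(p) = 0 \text{ and } p \text{ happens. Let } q \text{ be an event that is absolutely guaranteed not to happen.}
\]
\[
\text{By (P2), } p \text{ is more likely than } q\text{; by (P1) and the ordering requirement, } C(q) < C(p) = 0\text{; but } 0 \text{ is the lowest measure in } [0,1].
\]

The supposition is thus incoherent, so an event with chance 0 does not happen. The case of chance 1 is symmetric, using the second half of (P2) and the top of the interval.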
But even if (MC) is required for the correct measure of chance, there is no guarantee that such a measure can be given. Chance could be an objective feature of the world that simply can't be measured completely and correctly. If we could do no better than to violate (MC), it would show that we can't measure chance, not that (MC) has to be given up. Fortunately, we can do better. But before we can see how, we need to clarify (MC) by contrasting it with two different principles.

3. (MC), regularity, and Cournot's Principle

(MC) is a principle that is related to, but different from, other similar-sounding principles. In this section we will contrast it with two other principles, in part to clarify what it does and does not say.

These two other principles are the principle of regularity and Cournot's Principle.

The principle of regularity says, on one of its formulations, that if p is possible, then p has positive chance: ◇p → C(p) > 0. Or alternatively, if the chance of p is 0, then p is not possible. Or, in another formulation, [4] if the chance of p is 1, then p is necessary. These principles connect chance to modality. They are bridge principles between a claim about the chance of an event and its modal status. Since chance can be understood at least as either credence or objective chance, and modality can be understood at least as either epistemic or metaphysical modality, there are thus a number of different ways the regularity principle could be understood, combining different readings of chance and modality. The most plausible of these is likely a combination of credence and epistemic modality. [5] Such a formulation of regularity thus says that if p is epistemically possible for you, then you should assign it positive credence.

[4] We will discuss shortly whether these formulations are all equivalent. At the moment I am formulating the principles in terms of chance, but as we will also discuss momentarily, it is unclear whether regularity is best seen as a principle connecting probability in an objective or subjective sense to possibility in an epistemic or metaphysical sense.

[5] See [Hájek, 2013] and [Easwaran, 2014] for a discussion of how to properly understand regularity.

(MC) is different from regularity. Whereas regularity connects chance to modality, (MC) connects chance to what does and doesn't happen. Since (MC) is a conceptual truth and thus a necessary truth, this connection is itself necessary, but it is not a connection between chance and modality but a necessary connection between chance and what does and doesn't happen. To contrast how the modal operators are placed differently, consider this comparison table:

    Regularity              (MC)
    C(p) = 1 → □p           □(C(p) = 1 → p)
    ◇p → C(p) > 0           □(p → C(p) > 0)
    C(p) = 0 → ¬◇p          □(C(p) = 0 → ¬p)

In the regularity principle, modality is best understood as epistemic, and chance is best understood as credence. In (MC) chance should be understood as objective chance. The modality in (MC) can be understood as either epistemic or metaphysical, assuming it is a conceptual truth. Conceptual truths are not only necessarily true; they are rationally mandatory as well, at least in the sense that if you form an opinion on the matter at all, it has to be (MC). [6] Regularity might be true, but it doesn't have to be true, and it is not a conceptual truth. Credence is a technical notion that aims to capture some aspect of our minds, but it doesn't have to be understood in such a way that there is a connection between credences and epistemic modality in the sense captured by the regularity principle. Such connections are optional. The notion of a credence could be used to try to capture this connection, or it could be used to capture other features of our minds, but not that connection to epistemic modality. Similarly, (MC) doesn't have to be true if chance is understood as credence. There is no conceptual requirement to assign all truths positive credence. But when chance is understood as objective chance, then (MC) is inevitable.

[6] The different formulations of (MC) are equivalent assuming there are no chance gaps, that is, events that happen but don't have a number assigned as the measure of their chance of happening.
One way this might happen is because of there being non-measurable sets. However, in this case it is not so clear whether there is a chance of the event which simply can't be measured, or whether there is no chance of the event at all. This is similar to the question whether a non-measurable subset of, say, a three-dimensional Euclidean space has no volume at all, or has a volume that one can't measure. We don't need to settle this issue here. If there are chance gaps, then the = and > in the formulations of (MC) need to be understood as they are commonly understood for partial functions: if there is a value at all for C(p), then it is as specified. Thanks here to Alan Hájek.

Although (MC) is true, regularity might still be false even when understood as a principle connecting objective chance to metaphysical modality. (MC) requires that an event with lowest possible chance doesn't happen, but (MC) doesn't require that it can't happen. An event that has chance 0 in this world doesn't happen in this world, but it can have chance 0 in this world and happen in another world. In that other world, of course, it can't have chance 0 of happening, since (MC) is true in all worlds. But still, having chance 0 in this world doesn't guarantee that the event happens in no world. It only guarantees that it doesn't happen in this world. (MC) says only that events with chance 0 don't happen, not that they are impossible. An event doesn't have to have the same chance of happening in all worlds, and so regularity can fail even though (MC) is true at all worlds.

Just like regularity, Cournot's Principle is similar to (MC) but not to be confused with it. This principle is named after the 19th-century mathematician and philosopher Antoine Augustin Cournot, who proposed and defended it. [7] Cournot's Principle is commonly given in two quite different versions: one that is very similar to regularity, and another that is similar to (MC). The first principle often called Cournot's Principle is the principle that you can be morally certain that events with small probability don't happen. This version of the principle connects small probability to certainty, and to a particular kind of certainty at that. To be morally certain, I take it, is to be less certain than absolutely certain, but still certain enough that it is for practical purposes good enough to treat it as absolutely certain. This version of the principle is simply a modified version of regularity, connecting probability to epistemic modality.

[7] See [Shafer, 2007] for a discussion of the history of Cournot's Principle. And, of course, there is [Cournot, 1843]. Thanks to Branden Fitelson and Alan Hájek for pointing me to Cournot.

The second version in which Cournot's Principle is often given is closer to (MC) but still different from it. It states that events with small probabilities don't happen. This version of the principle was endorsed explicitly, since it was taken to make the probability calculus applicable to the physical world: it connects chance to what does or doesn't happen. For chances to be empowered there must be a connection between the chances of events and their happening or not happening. The motivation for Cournot's Principle on this reading is thus spot-on. And, just like (MC), it connects chance to what happens. But Cournot's Principle so understood is false. Events with small but positive probabilities do happen; they are just unlikely to happen. But sometimes the unlikely happens. Sometimes one of many different options will obtain, even though each one is unlikely to be the one that happens. Someone will win the lottery, even though for any particular person it is unlikely that they will win. But someone will win, no matter how low the chances are for any particular person, as long as there is some chance: the chance is positive and not zero. (MC) doesn't make a claim about events that have small, positive chance, just about those that have the smallest or greatest possible chance: chance 0 or 1. Cournot's Principle so understood is stronger than (MC), but it is too strong to be true. However, (MC) and Cournot's Principle both are ways to affirm that chances are not neutered. Chances are tied to what does or doesn't happen, not just to how likely something is.
This is why Cournot's Principle, in the early days of the philosophy of probability, was considered essential to give empirical content to talk about probability. [8]

[8] See [Shafer, 2007].

Even though Cournot's Principle is false, the weaker (MC) is all we need to defend chances as empowered.
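To make the logical relationship just described explicit, here is a schematic contrast; the formalization is mine, with δ standing in for whatever counts as a "small" probability:

\[
\text{Cournot (second version):} \quad \exists \delta > 0\ \forall p\ \big(C(p) < \delta \rightarrow \neg p\big) \qquad\qquad \text{(MC):} \quad \forall p\ \big(C(p) = 0 \rightarrow \neg p\big)
\]

Since C(p) = 0 implies C(p) < δ for any positive δ, the Cournot version entails (MC) but not conversely; and the lottery case, where each of N tickets has chance 1/N < δ for large enough N and yet some ticket wins, refutes the stronger principle while leaving (MC) untouched.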

4. (MC) and infinitesimals

The reasons for rejecting (MC), and consequently neutering chances, are straightforward and familiar. They are based on the following two widely, if not almost universally, held ideas:

1. In the measurement of chance we assign a real number r to an event, with 0 ≤ r ≤ 1.
2. Such assignments must satisfy (at least) finite additivity.

If we grant these two points, then we must give up (MC). Such measurements of chances with real numbers are supposed to be independent of what events we measure the chances of. And although this approach to the measurement of chance is compatible with (MC) in the finite case, it isn't in all cases. If we have infinitely many events that are all equally likely, but only one of them will happen, then no real number r > 0 is small enough to be the chance of such an event happening. Such events might be unusual, but they certainly are possible. Candidates include picking a natural number at random, tossing a coin infinitely many times, throwing a dart at the real line, and so on.

This failure of measuring chances with real numbers is due to a general feature of the real numbers, namely that they form an Archimedean ordered field, where the relevant Archimedean property is this: for any positive numbers r, no matter how small, and s, no matter how large, there is some finite natural number n such that n · r > s. In particular, for any positive number r, no matter how small, there is a natural number n such that adding r to itself n times gives more than 1, i.e., n · r > 1. In other words, any positive real number, no matter how small, is larger than 1/n for sufficiently large n. Thus adding more than n many of those together leads to more than 1. This fact about the real numbers makes them an Archimedean ordered field. [9] And this is just what shouldn't happen when we measure chances. But it does when chances are measured by real numbers and we assume finite additivity.

[9] They form a field since they satisfy general properties tied to addition and multiplication, and an ordered one since they furthermore have a total ordering on them that relates in a natural way to addition and multiplication. As we will see shortly, an ordered field does not have to satisfy the Archimedean property as well.

Thus the only option for a case of infinitely many equally probable events is to assign to each of them a measure smaller than 1/n for every n, but still ≥ 0. And among the real numbers the only such option is to assign them all 0. Thus these events each have chance 0, but one of them is going to happen, and so an event with chance 0 of happening happens anyway. The two above assumptions thus force us to violate the minimal constraint (MC) when we deal with infinite sets of events.

Finite additivity is, in essence, beyond discussion, since without it it would make little sense to relate the chances of different events to each other. But why do we need to measure chances with numbers that satisfy the Archimedean property? Why do chances have to be measured with real numbers? Why could chances not be measured with some other numbers, which do not form an Archimedean field? In particular, we could use numbers that contain the real numbers, but extend them to allow us what we want: numbers so small that adding them to themselves finitely many times always keeps them below 1. In other words, we need to use some extension of the real numbers that is non-Archimedean.
Such an extension would contain infinitely small positive numbers: numbers smaller than 1/n for every positive natural number n, but still larger than 0. Such numbers are infinitesimals, and they can be the measures of the chances of events that are very unlikely but might still happen, as when we pick a natural number at random. This way of defending the principle that events with 0 chance of happening don't happen has been endorsed by David Lewis in [Lewis, 1983, 175f.] and Brian Skyrms in [Skyrms, 1980], amongst others. That such non-Archimedean extensions of the real numbers exist is a well-known mathematical result (more on that shortly). Such extensions of the real numbers are generally called hyperreal numbers. To save (MC) we thus need to replace the real numbers as the measures of chance with hyperreal numbers.
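To put the argument of this section in the form of a short worked derivation (my reconstruction, using only the two assumptions listed at the start of the section):

\[
\text{Let } E_1, E_2, E_3, \dots \text{ be infinitely many equally likely, mutually exclusive events, one of which will happen, each with real-valued chance } r.
\]
\[
\text{If } r > 0\text{: pick } n \text{ with } n \cdot r > 1 \text{ (Archimedean property); then } \mathrm{Prob}(E_1 \cup \dots \cup E_n) = n \cdot r > 1 \text{ by finite additivity.}
\]

No measure bounded by 1 allows this, so r = 0; yet one of the events happens, and (MC) is violated. With an infinitesimal ε in place of r, by contrast, n · ε < 1 for every finite n, and the derivation no longer goes through.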

Although this is in the right spirit, it by itself is not enough to defend the minimal constraint. There are a number of good arguments that show that infinitesimals by themselves do not solve the problem they are introduced to solve. In the following we will look at the two most important arguments of this kind. One of them is an argument that no non-Archimedean extension of the real numbers will work, which was given by Timothy Williamson in [Williamson, 2007]. The other is a group of arguments that no particular non-Archimedean alternative to the real numbers is going to work. Versions of these arguments have, for example, been given by Alan Hájek in [Hájek, 2013], Kenny Easwaran in [Easwaran, 2014], and Alexander Pruss in [Pruss, 2013]. As we will see, none of these arguments are in the end correct, but they do show that infinitesimals by themselves are not enough to defend (MC). We need to learn two more lessons: non-locality and flexibility. The first is connected to what chances are, and both are lessons for how chances are to be measured. All three of them together, infinitesimals, non-locality, and flexibility, give us a coherent and satisfying picture of chance and its measurement that preserves (MC) and leaves chances empowered. Before we can see how this goes we should briefly review some of the basic facts about infinitesimals and non-Archimedean extensions of the real numbers.

5. The short story on infinitesimals

An infinitesimal is simply a number greater than 0 that is nonetheless smaller than 1/n for every n ∈ N. There are no such numbers among the standard real numbers R. Any extension of R that contains infinitesimals, and preserves reasonable properties of addition and multiplication, would be non-Archimedean in the above sense: there are numbers, the infinitesimals, such that adding them to themselves finitely many times never gets you above 1. Let's call an ordered field hyperreal if it contains the real numbers, satisfies the first-order properties of the real numbers, and contains infinitesimals as well. Any such field we can then call a field of hyperreal numbers. Since there are many different, non-isomorphic fields of this kind, there thus is no such thing as the hyperreal numbers. That hyperreal numbers exist in general can be shown quite easily using the compactness theorem. Alternatively one can construct a hyperreal extension of the real numbers directly using an ultraproduct construction over the real numbers. [10]

[10] For more on the ultraproduct construction see section 4 of [Keisler, 1994], which covers the ultraproduct construction of hyperreal numbers, or see chapter 4 of [Chang and Keisler, 1990] for a more general and detailed treatment.

What is important for us in the following are two basic features of hyperreal number systems: First, they can be arbitrarily large in size, or have the same size as R. Second, there are lots of infinitesimals if there are any at all, and they form an interesting and complex structure. Both of these can be seen as quick consequences of the compactness-theorem proof that there are hyperreal fields at all, which we should go over at least in outline.

Let Th(R) be the first-order theory of the real numbers, i.e., all first-order sentences true of the real numbers in the language containing constants for all real numbers, as well as +, ·, and <. Now add a new constant c to the language, and consider the new set of sentences T = Th(R) ∪ {0 < c, 1 < c, 2 < c, ...}. Since every finite subset of T has a model in the real numbers, with c denoting a sufficiently large real number, all of T has a model M, by compactness.
Such a model thus contains an infinite number denoted by c, while still satisfying all the first-order properties of the real numbers, since it is also a model of Th(R). M thus can't contain just one infinite number. One of the first-order properties of the reals is that the sum of any two numbers is always another number, and thus the sum of any standard real number with c must be another number, and it must be another infinite number. Thus there is a whole copy of the standard real numbers among the infinite numbers: each of the c + r, with r a standard real number. But things don't stop there. The sum of two infinite numbers must be another infinite number, but it can't be one we have seen already; it must be even larger. c + c must be larger than any of the c + r, since c is larger than those standard r, which is guaranteed by another first-order property of the real numbers: if a > b and c > 0, then a + c > b + c.

So we need at least another copy of the standard reals, even larger. Again, we can't have just one number d larger than what we have seen so far, but we need all the d + r as well. By the same reasoning we need an infinite ascending chain of copies of the real numbers, getting larger and larger (just add two of the even larger infinite numbers together, and so on). Furthermore, between any two such copies must be another one. It is a first-order property of the real numbers that if a < b, then there is an e such that a + e = b. Such an e must come from a different copy of the reals if a and b come from different copies. And by similar reasoning we can see that there can be no smallest copy of the reals among the infinite numbers. If there are any infinite numbers at all while the first-order properties of the real numbers are preserved, then there must be lots and lots of them.

And here is the rub: all this structure among the infinite numbers gets mirrored in the infinitesimals. Since for every positive number r there is also a number 1/r (another first-order property which is preserved [11]), this means that for each infinite number a there is a corresponding infinitesimal number b such that a · b = 1. If r is an infinite number, then for each n ∈ N: 1/r < 1/n, since r > n. There are thus lots and lots of infinitesimals if there are any at all. And if there are any at all, then this guarantees that the hyperreal numbers form a non-Archimedean field. In such a field we thus have a neighborhood around 0 of lots of infinitesimals that mirror the structure of the infinite numbers, each of which is less than 1/n, for all n ∈ N. And this won't be true just for 0, but for every real number. For any real number r, adding an infinitesimal to r will give us a hyperreal number infinitely close to r. And any finite hyperreal number is infinitesimally close to a unique real number. The hyperreal numbers are thus the real numbers, plus the neighborhoods of infinitesimals around each of them, plus all the infinite numbers.

[11] Since we don't have fractions directly in our language, the proper formulation of the first-order property is rather that the sentence ∀x(x > 0 → ∃y(x · y = 1)) is true in M.

Although the hyperreal numbers have the same first-order properties as the real numbers, they don't have all the same properties. One big difference is the least-upper-bound principle: that a set of numbers which is bounded above has a least such bound. This is true for the real numbers but false for the hyperreal numbers. Take the infinitesimals around 0, for example. They are bounded, since they are all smaller than, say, 1, but there is no least such bound. The sequence of the 1/n is an infinite descending sequence of bounds of all infinitesimals that eventually gets below any other bound, and thus there is no least such bound. Or take, as another example, the regular finite natural numbers in a hyperreal field. They are bounded by all and only the infinite numbers, but since there is no smallest infinite number there is no least upper bound of the finite natural numbers. That the least-upper-bound principle fails for hyperreal numbers has significant consequences, as we will see below. The least-upper-bound principle can fail since it is not one of the first-order features of the real numbers, and thus is not guaranteed to hold for the hyperreal numbers.
The principle is not just about numbers but also about sets of numbers, which takes it beyond a first-order principle. Such principles don't have to carry over from the real numbers to the hyperreal ones, and the least-upper-bound principle is one that doesn't carry over.

There are many fascinating facts about hyperreal numbers, but most are not essential for us here. We can see easily, as outlined above, that such hyperreal fields have lots and lots of infinitesimals in them, and, using the upwards Löwenheim-Skolem theorem, that they exist in arbitrarily large sizes. For much more information about them, see [Keisler, 1994] for a great overview and further references.
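A compact restatement of the facts from this section that matter later, with ε and c as hypothetical names for an infinitesimal and an infinite number:

\[
\varepsilon \text{ is infinitesimal} \iff 0 < \varepsilon < \tfrac{1}{n} \text{ for all } n \in \mathbb{N}^{+}; \qquad \text{if } c > n \text{ for all } n\text{, then } \tfrac{1}{c} \text{ is infinitesimal, since } \tfrac{1}{c} < \tfrac{1}{n} \text{ whenever } c > n.
\]

And the least-upper-bound failure made explicit: the infinitesimals are bounded above by every 1/n, but any upper bound b must itself be non-infinitesimal (otherwise 2b would be a larger infinitesimal), so b ≥ 1/n for some n, and then 1/(n+1) is a still smaller upper bound; hence there is no least one.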

Infinitesimals are perfectly coherent, and hyperreal numbers are a serious alternative to the real numbers as the measures of chance. Can we thus replace the real numbers with some hyperreal field and thereby save (MC) and empower chances? This might seem like it already is the answer, but it's not as easy as it looks. There are some good arguments that using hyperreal numbers as alternatives to the real numbers is not going to solve the problems that lead to the violation of (MC). We will look at these next, and at what lessons we need to draw from them. In the end hyperreal numbers will win out, but only with some help.

6. A tension for the chance of an infinite sequence of heads

In [Williamson, 2007] Timothy Williamson argued that even when we have infinitesimals available to measure chances, we'd still have to assign 0 to the chance of events that happen. Consider an infinite sequence of coin tosses. What is the chance that all the tosses in the sequence come up heads? Williamson argued that it has to be 0. And since it is just as likely an outcome as any other sequence of heads and tails, whichever one will be the result of the coin tosses, and thus will happen, has chance 0 of happening. Here is how his argument goes: Suppose we are tossing a fair coin once every second. Let H(1) be the event of the first toss coming up heads. Let H(1...) be the event of all tosses from the first one onwards coming up heads. Let H(2...) be the event of all tosses from the second one onwards coming up heads. The chance of the first toss coming up heads is 1/2, since it is a fair coin:

(2) Prob(H(1)) = 1/2

Since the coin tosses are independent of each other, the chance of all of them coming up heads is the chance of the first one coming up heads times the chance of the second one onwards all coming up heads:

(3) Prob(H(1...)) = Prob(H(1)) · Prob(H(2...))

and using (2) we thus have

(4) Prob(H(1...)) = 1/2 · Prob(H(2...))

But, Williamson argues, H(1...) and H(2...) must have the same chance, since they are isomorphic events (p. 175). Both of them are ω-sequences of coin tosses, and that one is all heads should be just as likely as that the other one is all heads. Thus

(5) Prob(H(1...)) = Prob(H(2...))

and thus putting (4) and (5) together we get

(6) Prob(H(1...)) = 1/2 · Prob(H(1...))

The only way that equation can be satisfied is for Prob(H(1...)) = 0. In particular, even if we have infinitesimals available to be assigned as the measures of the chances of events, they won't help us here. No infinitesimal ε > 0 is such that ε = 1/2 · ε. The only number that satisfies that equation, even among the hyperreal numbers, is 0. And so Williamson concludes that even if infinitesimals are available as the measures of chance, some events that happen still have chance 0.

But why do isomorphic events have to have the same chance, and thus why is the chance of H(1...) supposed to be the same as that of H(2...)? This is, of course, very plausible for finite events, i.e., events that have only finitely many constitutive parts. But once we deal with infinite events, like an infinite sequence of coin tosses which has infinitely many constitutive events as parts, i.e., the individual coin tosses, this is far from clear. It is a defining feature of the infinite that infinite events can be isomorphic to their own proper parts, just as infinite sets can be in 1-1 correspondence with their own proper subsets. To assign H(2...) the same chance as H(1...) is not implausible if we consider these events in isolation. But it is implausible once we consider the relationship that these events have to one another. That one is a proper sub-event of the other should be relevant for how their chances relate to one another. [12]
To illustrate this, consider an infinite sequence of tosses and a subsequence of it that is fairly sparse, say the sequence that consists only of every millionth toss in the original sequence. What is the chance that the second sequence is all heads? Intuitively it should be more likely that all those tosses come up heads than that all of the original series come up heads.

[12] Ruth Weintraub makes this point in [Weintraub, 2008].

After all, the original series requires that the second sequence is all heads, and that the almost one million tosses in between any two members of the second sequence are all heads as well. But on Williamson's reasoning they are equally likely and both have chance 0. For this it doesn't matter how sparse the subsequence of tosses is, as long as it remains infinite. That any sequence of heads is just as likely as some very sparse subsequence of it is prima facie counterintuitive. But it is also prima facie counterintuitive that the original sequence should get a different chance than the subsequence, no matter how sparse, since, after all, they are both ω-sequences of heads. They are exactly isomorphic to each other, and alike in all other relevant respects. We have a real tension here in what seems prima facie right.

This tension is very much analogous to the tension we have when we try to determine which sets should be considered as being of the same size. It is a tension between parthood and correspondence. On the one hand, proper subsets should intuitively be smaller than their supersets. On the other hand, if two sets are in exact correspondence, i.e., there is a bijection between them, then intuitively they should be of equal size. For infinite sets these two come apart, and it is not clear which side we should pick as the more definitive characterization of having the same size. Our tension here between being a proper sub-event and being an isomorphic event is perfectly analogous. Both sides have some claim on being right, but in the infinite case they are in tension. It is not clear which side should be seen as authoritative, and thus how this tension is to be resolved. Williamson simply picks a side, but his choice is not forced upon us.

We do have other options. One is to hold that loosely speaking the infinite series have the same chance, but strictly speaking they can be slightly different. In other words, the two chances can be different, but have to be infinitesimally close. This way of seeing the situation preserves our prima facie judgment that on the one hand the chances are the same (up to a small infinitesimal difference), but on the other hand they are different (since there is a small, infinitesimal difference). In particular, the infinitesimal difference can be such that the ratio of the two chances is as big as you like. An infinitesimal can be twice as big as another, or 1,000 times as big, or their ratio can be infinite. We will see more on all this below.

In effect, Williamson picks the side analogous to the one Cantor picked, and things turned out pretty well for Cantor. But this doesn't mean that Williamson is right in his choice. Things turned out well for Cantor since his characterization of sizes of collections led to a very fruitful mathematical theory. Williamson's picking sides leads to no new fruitful mathematics; to the contrary, the other side does much better here, as we will see. Measuring chances of events and sizes of sets are simply two different things. A similar tension arises for both, but there is no guarantee that this kind of tension always needs to be resolved in the same way for different cases.

7. An outline of a simple alternative treatment

How else should we deal with Williamson's example? If the chances of the particular outcomes of an infinite sequence of tosses are not 0, then what should they be?
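Before the constraints are spelled out, a quick illustration of the kind of assignment the alternative has in mind; δ is a hypothetical infinitesimal chosen purely for illustration, not a value the paper fixes:

\[
\mathrm{Prob}(H(1...)) = \delta, \qquad \mathrm{Prob}(H(2...)) = 2\delta, \qquad \mathrm{Prob}(H(3...)) = 4\delta, \qquad \dots
\]

On this assignment Prob(H(1...)) = 1/2 · Prob(H(2...)), so the independence constraint behind Williamson's (4) is respected; his (5) fails, but only barely, since the two chances differ by the infinitesimal δ and so are infinitely close; and every outcome that happens gets a positive chance, so (MC) is not violated.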
On the alternative way of thinking about this we will focus on the relationships that events have to each other, and on what is determined by them. The chances of H(1...) and H(2...), for example, do not need to be identical, but they need to be infinitely close to each other and stand in a certain ratio. The former means that

(7) Prob(H(1...)) ≈ Prob(H(2...))

where ≈ holds between two numbers just in case the absolute value of their difference is an infinitesimal. The second, and more important, point is that we can't measure the chances of H(1...) and H(2...) in isolation. Which number should get assigned to the chance of H(2...) isn't something we can tell by looking only at the event by itself; we also need to consider the larger situation at hand, including other events and any constraints on how their chances relate to each other. This can impose restrictions on such a measure of chances. Considering H(1...) and H(2...) in isolation, any infinitesimal will do. But considering that H(1...) and H(2...) relate to each other in a certain way, this isn't true any more.

There is a constraint on which infinitesimals are assigned to H(1...) and H(2...) as the measures of their chance, which requires a proper ratio between them, in this case that

(4) Prob(H(1...)) = 1/2 · Prob(H(2...))

And such constraints on what infinitesimal chances these events get are easily met. In this coin toss example, such an assignment of chance to the events under consideration would need to satisfy at least the following constraints: every individual coin toss outcome gets chance 1/2, and every finite set of toss outcomes gets the chance determined by finite additivity: 1/2^n for n tosses. Every infinite set of toss outcomes will get an infinitesimal chance, but there are constraints on which one. One minimal constraint is a generalization of (4). As usual, call a subset S of N co-finite just in case it contains all but finitely many natural numbers. Call S n-co-finite just in case it contains all but n many natural numbers. Call an infinite subset of N co-infinite just in case there are infinitely many natural numbers it doesn't contain. We will consider the constraints on the chances of outcomes of subsequences of a sequence of coin tosses by considering the cases of how many tosses are left out.

To measure chance we take a particular hyperreal field as given and assign hyperreal numbers from it to particular outcomes. For S ⊆ N, let H(S) be the event that every toss whose index, that is, position in the sequence, is in S comes up heads. Thus H(1...) = H(N), H(2...) = H(N \ {1}), and so on. What hyperreal numbers measure the chances of these H(S)? To start, assign H(1...) some infinitesimal as the measure of its chance. Any one will do (more on that later). H(1...) thus has a smaller chance than any finite sequence of tosses coming up a certain way, but it is still positive. If S is some n-co-finite subset of N, then the chance of H(S) is:

(8) Prob(H(S)) = 2^n · Prob(H(1...))

since H(S) leaves n many tosses out, each of which has chance 1/2 of coming up heads, and all of which are independent of each other. With the chance of H(1...) given, this uniquely determines a chance for all co-finite subsequences of coin tosses. What is left is to specify the chances of co-infinite subsequences of our original sequence of tosses. Here it is not completely clear what constraints need to be met, but there are two minimal constraints that seem unavoidable:

1. The chance of a co-infinite subsequence of tosses coming up all heads should be larger than that of a co-finite subsequence.
2. If S is a subsequence of T, then Prob(H(S)) ≥ Prob(H(T)).

The last requirement is minimal in that it does not specify when two co-infinite sequences, where one is a subsequence of the other, should get the same chance, and when they should get different chances. One might consider changing the ≥ to >, but this might not be well motivated for all cases, although it is plausible for extreme cases, where one is a very sparse subset of the other. Stricter requirements of this kind can be motivated and satisfied as well, but it is at first not clear what such requirements should be more precisely, since it is not clear how the chances of such events relate to each other. For example, it is one thing to find a technical notion that captures how sparse a set is, and another to motivate that sparseness in this sense should correspond to a certain chance. [13] Another issue not dealt with by the two minimal constraints listed above is whether there are any constraints

[13] For example, one could use relative density, when defined, and connect it to chance. The relative density is just the limit, as n goes to infinity, of the ratio of the number of members of the set below n to n itself. This limit exists only in certain simple cases, but even when it does exist it is not obvious that it should be closely tied to the chance associated with the subsequence of heads. One problem for this on the present, infinitesimal-friendly approach is that such a limit can be 0 even though there are infinitely many members in the sequence, for example when there are fewer and fewer heads among the tails as n gets larger, while there still is always another heads outcome. One can improve on the relative-density limit, and this is discussed with an eye to the application in probability theory in [Schurz and Leitgeb, 2008] and in section 3.2 of [Wenmackers and Horsten, 2013].
For example, one could use relative density, when defined, and connect it to chance. Relative density is just the limit of the ratio of how many members there are before n as n goes to infinity. This limit exists only in certain simple cases, but even when it does exist it is not obvious that this limit should be closely tied to the chance associated with the subsequence of heads. One problem for this on the present, infinitesimal-friendly approach is that such a limit can be 0, even though there are infinitely many members in the sequence for example, when there are fewer and fewer heads among the tails as n gets larger, while there still is always another heads outcome. One can improve on the relative-density limit, and this is discussed with an eye to the application in probability theory in [Schurz and Leitgeb, 2008] and in section 3.2 of [Wenmackers and Horsten, 2013]. philosophers imprint - 13 - vol. 14, no. 2 (february 2014)