Discussion Notes for Bayesian Reasoning
Ivan Phillips - http://www.meetup.com/the-chicago-philosophy-meetup/events/163873962/

Bayes' Theorem tells us how we ought to update our beliefs in a set of predefined theories in light of new evidence. It is considered by many to be the gold standard of rational inference. It is not rational to believe a theory just because it is possible. The theory must be inductively probable for it to be rationally believable. There are many side questions surrounding how we come up with theories, and how we ought to set our prior beliefs, but this presentation will keep things simple. To illustrate the theorem, we'll first work through a simple probability problem without it.

A Dice Game

Suppose I have a 4-sided die and a 20-sided die. The 4-sided die is a tetrahedron, a pyramid with triangular sides, numbered 1 through 4. The 20-sided die is an icosahedron with sides numbered 1-20. I take one of these dice at random and roll it. At this stage you have no idea which of the two dice I rolled. However, you have two theories:

T4 = The theory that I chose to roll the 4-sided die.
T20 = The theory that I chose to roll the 20-sided die.

Here is where we introduce our first key concept: the prior probability. Prior probability represents your confidence or degree of belief in each theory. Since you know that I rolled one of the dice, but you don't know which, your confidence is evenly split between the two theories, and your prior probability for each is 0.5 (or 50%). By convention, we write this as

P(T4) = 0.5 and P(T20) = 0.5

Now, I give you a little more information. I report to you that I rolled a 3. Which die did I probably roll? Obviously, either of the dice could possibly give me a 3, because both dice have a side labeled 3. However, we're not asking whether the theories are possibly true, but which theory is probably true.
Humans find probability calculations difficult, and it's often easier for us to work things out in terms of frequencies. To do this, just imagine that we play this game very many times. Let's suppose we play this same game 200 times. When I randomly select between the two dice, I will select the 4-sided die 50% of the time and
the 20-sided die 50% of the time. That means that, in 200 plays, I will select the 4-sided die 100 times and the 20-sided die 100 times. In 100 rolls of the 4-sided die, I will roll a three 25 times. In 100 rolls of the 20-sided die, I will roll a three just 5 times. The relative likelihood of seeing our evidence under each theory will be used to update our beliefs based on the evidence. This information has a name in Bayesian reasoning. It's called a likelihood function, and it expresses the probability of seeing some evidence given a theory.

P(3 | T4) = 25/100 = 25%

This equation says that the probability of getting the evidence (rolling a 3) given T4 is 25%. Similarly:

P(3 | T20) = 5/100 = 5%

Okay, back to our calculation. Overall, 200 plays will result in my rolling a three 30 times, 25 of which will be on the 4-sided die. As a result, I can say that 83.3% of the time, a three is rolled on the 4-sided die. In terms of a formula, I am calculating my updated probability or confidence in T4:

P(T4 | 3) = P(3 | T4) / (P(3 | T4) + P(3 | T20)) = 25% / (25% + 5%) = 83.3%

This ratio gives me a conditional probability of T4 being true given the evidence that a three was rolled. This is how inductive inference works. Based on our knowledge (or ignorance) of the world, and based upon our theories, we can shift our confidence in our theories when we get new evidence. Initially, we were no more confident in T4 than in T20, but after hearing that a three was rolled, we are now 83.3% confident in T4, and only 16.7% confident in T20.

Now, suppose I continue the game. I will roll the same die I selected last time, and report what I have rolled. Obviously, things have now changed. You expect a low number more than you expect a high number. If I roll a 3 again, you will be even more confident that the theory T4 is true. With each round of evidence, our prior belief in each theory has to be taken into account.
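The frequency calculation above can be checked with a few lines of Python, using the likelihoods from the 200-play thought experiment:

```python
# Prior: each die is equally likely to be chosen.
prior_t4 = prior_t20 = 0.5

# Likelihoods: probability of rolling a 3 under each theory.
like_t4 = 1 / 4     # P(3 | T4)  = 25%
like_t20 = 1 / 20   # P(3 | T20) = 5%

# Updated confidence in T4 after hearing that a 3 was rolled.
posterior_t4 = (prior_t4 * like_t4) / (prior_t4 * like_t4 + prior_t20 * like_t20)
print(posterior_t4)  # 0.833..., i.e. 83.3%
```

Because the priors are equal, they cancel out, which is why the simple ratio of likelihoods gives the same 83.3% answer.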
This brings us to Bayes' Theorem:

P(T1 | E) = P(T1) P(E | T1) / [ P(T1) P(E | T1) + P(T2) P(E | T2) ]

This assumes that T1 and T2 are our only theories, and that they are mutually exclusive. The E represents our evidence. Bayesian reasoning is usually presented as an iterative process. We start with our prior probabilities, P(T1) and P(T2). We get some evidence, E, and compute our new confidence in each theory, P(T1 | E) and P(T2 | E). When we go into the next round of experimentation, our conditional probabilities become our new prior probabilities.
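The iterative process can be sketched as a small Python function (the function name and the list representation of the theories are my own choices, not from the notes):

```python
def bayes_update(priors, likelihoods):
    """Apply one round of Bayes' Theorem to mutually exclusive theories.

    priors      -- current confidences P(T_i)
    likelihoods -- P(E | T_i) for the evidence just observed
    Returns the posteriors P(T_i | E).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Round 1: a 3 is reported.
beliefs = bayes_update([0.5, 0.5], [1/4, 1/20])   # about [0.833, 0.167]

# Round 2: the posteriors become the new priors; another low number is reported.
beliefs = bayes_update(beliefs, [1/4, 1/20])
print(beliefs)  # confidence in T4 keeps rising
```

After the second low roll, confidence in T4 rises from 83.3% to about 96%, illustrating how posteriors feed back in as priors.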
Some Intuitive Principles

Bayesian inference can be summed up intuitively in three principles. The first principle, as illustrated above, is this: all things being equal, the theory that makes more definite, restrictive predictions is more likely to be true when the data falls in line with those predictions. A corollary is that theories that are vague or make few predictions are less likely to be true, even though they are compatible with more possible outcomes.

When we iterate this inference process, we can become very confident in a theory. If I roll the selected die another 9 times (a total of 10 rolls), and all 10 rolls are reported as being between one and four (inclusive), you are now extremely confident that I selected the 4-sided die. In fact, the probability of rolling four or less in 10 consecutive rolls of a 20-sided die is approximately 1 in 10 million. This leads us to our second principle: all things are not equal! Evidence accumulates, and can make us extremely confident in our theory.

Finally, once we reach a very high level of confidence, we start to question counterevidence more carefully. If I roll the selected die an 11th time, and report that I rolled a 14, it naively appears that the game is up, and we know for certain that the selected die was the 20-sided die. However, this conclusion is incorrect. How confident are we that the reports you are hearing are true? How confident are you that I read the die correctly, that I spoke my report correctly, and that you heard my report correctly? These highly improbable errors are negligible when our confidence in a theory is relatively low. Let's suppose that the probability of you mishearing my report is 1 in a million. This has a negligible effect on the first conclusion, changing the initial 83.3% estimate by around 1 part in a million.
Similarly, if my first roll had been a 14, our naive posterior probability would be 100% in favor of the 20-sided die, and this, too, changes by only about 1 part in a million. However, once we have overwhelming evidence for the 4-sided die being the selected die, it becomes more likely that we misheard the report than that our well-established theory is incorrect. Thus, our third principle is this: extraordinary claims require extraordinary evidence!

To resolve this problem, we need more empirical data. We need more control over the experiment to ensure that false positives are not corrupting our conclusion.
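The third principle can be sketched numerically, assuming (as above) a 1-in-a-million chance that a report is misheard. The error model here, in which a report of "14" under T4 can only arise from a misheard report, is my own simplification:

```python
ERR = 1e-6  # assumed probability that a report is misheard

# Belief after ten reported rolls, all between 1 and 4 (equal priors to start):
joint_t4 = 0.5 * (1 / 4) ** 10
joint_t20 = 0.5 * (1 / 20) ** 10
p_t4 = joint_t4 / (joint_t4 + joint_t20)
p_t20 = 1 - p_t4                 # roughly 1 in 10 million

# An 11th report of "14" arrives.
like_t4 = ERR        # under T4, a "14" can only be a misheard report
like_t20 = 1 / 20    # under T20, a 14 is an ordinary roll

posterior_t4 = p_t4 * like_t4 / (p_t4 * like_t4 + p_t20 * like_t20)
print(posterior_t4)  # T4 remains far more probable than T20
```

Even after the startling "14", T4 stays above 99% in this model, because a misheard report (1 in a million) is far less surprising than T20 surviving ten low rolls (about 1 in 10 million).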
Evidence

What counts as evidence? The evidence can be anything we experience. We're not limited to the five senses.

Induction and Deduction

The process as described here results in an inductive inference. Inductive inference is inference from the specific to the general. Induction takes data and infers a rule that is probably true, but not necessarily true. However, the process of inductive inference relies on deduction. Deduction takes a general rule and infers specifics that are necessarily true if the general rule is true. If my 4-sided die is a fair die, then we expect (in the statistical sense) 100 rolls to be split evenly across each of the 4 possibilities, i.e., 25 ones, 25 twos, 25 threes and 25 fours. This is true by definition of a fair 4-sided die. In general, our likelihood functions, P(E | T), are deductive calculations. They are necessarily true based on the definitions of our theories.

Note: we teach deduction in high schools, but induction is rarely taught in a formal way, even at the graduate level.

Abduction

The dice game example problem is simplified to make the method of inference easy to see. However, in the real world, situations are usually more complex, and we're not served our theories on a silver platter. Instead, we look at data and must invent theories to plug into our Bayesian inference. This process of invention is usually called abduction, and it requires intelligence and creativity. Creativity is needed to invent the theory, and intelligence is needed to check its self-consistency.

Prior to the scientific theory of evolution, the only known way of getting complexity was through design. By the time Charles Darwin set out on his voyages, it had already been observed that species appeared to have evolved over long periods of time. Darwin recognized that inheritance, natural selection and mutation could result in an evolutionary process that creates new species and causes others to become extinct.
This step of establishing a candidate theory, of establishing that a theory could work, is regarded as separate from the inductive inference that the theory is most likely to be true. Thus, invention of theories in this fashion is not a case of either deduction or Bayesian induction. It's probably best classified as a case of abduction.
Discussion Questions

Does it make sense to say, "I believe that P is true, but I don't think P is probable"? If it does not make sense, then any proper belief has to be tied to some statement of probability, even if that statement is vague.

What shall we say about theories that make no predictions? If a theory makes no predictions, isn't it identical to the Null Hypothesis, i.e., the hypothesis that there is no relationship between phenomena?

The Monty Hall Problem

The Monty Hall problem has a very counterintuitive answer: Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice? Bayes' Theorem makes the correct answer easier to see.

How does Bayesian Reasoning relate to Occam's Razor? Jefferys and Berger (1991) state that "a hypothesis with fewer adjustable parameters will automatically have an enhanced posterior probability, due to the fact that the predictions it makes are sharp". http://quasar.as.utexas.edu/papers/ockham.pdf
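As a sketch, the Monty Hall problem above can be worked through with Bayes' Theorem. The likelihoods assume the standard rules: the host never opens your door or the car's door, and picks at random when he has a choice:

```python
# You pick door 1; the host then opens door 3, revealing a goat.
priors = {1: 1/3, 2: 1/3, 3: 1/3}   # P(car behind door d)

# Likelihood that the host opens door 3, given where the car is:
likelihoods = {
    1: 1/2,   # car behind your door: host chooses door 2 or 3 at random
    2: 1,     # car behind door 2: host is forced to open door 3
    3: 0,     # host never reveals the car
}

joint = {d: priors[d] * likelihoods[d] for d in priors}
total = sum(joint.values())
posteriors = {d: j / total for d, j in joint.items()}
print(posteriors)  # door 1 -> 1/3, door 2 -> 2/3: switching doubles your chances
```

The asymmetry in the likelihoods is the whole trick: the host's choice is informative when the car is behind door 2 (he was forced) and uninformative when it is behind your door.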