
Talk:Bayes' theorem

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Ritchy (talk | contribs) at 15:23, 2 December 2005 (Cookies example revisited). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Hello everyone. There was a recent substantial revision of Bayes' theorem [1]. I'm afraid it doesn't look like an improvement to me. Here are some points to consider.

  1. In the introduction, Bayes' theorem is described in terms of random variables. This isn't necessary to clarify Bayes' theorem, and introduces a whole raft of heavy baggage (to mix metaphors) that is going to be well-nigh incomprehensible to the general readership.
  2. The two-line derivation of Bayes' theorem is put off for several paragraphs by a lengthy digression which introduces some unnecessary notation and includes a verbal statement of Bayes' theorem which is much less clear than the algebraic statement which it displaces.
  3. The example is somewhat problematic. It is formally correct, but it's not very compelling, as it doesn't make use of any relevant prior information; the medical test example and even the cookies example, which were moved to Bayesian inference a while ago, were superior in that respect. Perhaps if an example is needed, we can restore the medical test or cookies (I vote for the medical test fwiw). The example is also misplaced (coming before the algebraic statement) although that's easy to remedy.

Given the difficulties of the recent revision, I'm tempted to revert. Perhaps someone wants to talk me out of it. Regards, Wile E. Heresiarch 22:39, 11 Jul 2004 (UTC)

Perhaps you are right that the really simple material should come first. However, that's not a reason to throw away the example on political opinion polling. That example is in many respects typical of the simplest applications in Bayesian statistical inference. I for one find it compelling for that reason. To say that the simple statement that is followed by the words "... which is Bayes' theorem" is more than just a simple special case is misleading. Michael Hardy 23:41, 11 Jul 2004 (UTC)
Also, the "verbal" version is very useful; in some ways it makes a simple and memorable idea appear that is less-than-clearly expressed by the formula expressed in mathematical notation. The role of the likelihood and the role of the prior are extremely important ideas. Michael Hardy 23:44, 11 Jul 2004 (UTC)
I've moved the example farther down in the article, as it interrupts the exposition. I've also reverted the section "Statement of Bayes' theorem" to its previous form; the newer version did not introduce any new material, and was less clear. I put a paraphrase using the words posterior, prior, likelihood, & normalizing constant into the "Statement" section. -- I'm still not entirely happy with "random variable" in the introduction, but I haven't found a suitable replacement. I'd favor "proposition" but that it is likely not familiar to general readers. Fwiw & happy editing, Wile E. Heresiarch 14:49, 20 Jul 2004 (UTC)

Hello, I've moved the existing content of this page (last edit April 12, 2004) to Talk:Bayes' theorem/Archive1. I used the "move" function (instead of cut-n-paste) so the edit history is now with the archive page. Regards, Wile E. Heresiarch 14:30, 8 Jul 2004 (UTC)

Bayes' theorem vs Bayesian inference

It seems to me that the current version of the Bayes' theorem article contains a little too much Bayesian inference. This is not to deny the importance of Bayesian inference as the premier application of Bayes' theorem, but as far as I can see:

  1. The section explaining terms such as posterior, likelihood, etc. is more appropriate to the Bayesian inference article. None of it is taught with Bayes' theorem in courses on elementary probability (unless, I assume, Bayesian inference is also taught).
  2. The example is one of Bayesian inference, not simply Bayes' theorem. Somewhat ironically, the Bayesian inference article contains some simple examples of Bayes' Theorem that are not Bayesian in nature, and that were moved there from an older version of the Bayes' theorem article!

Some of these things are noted in other posts to this talk page and the talk page of the Bayesian inference article, but I can't see that the current version of either article is a satisfactory outcome of the discussions. The current versions of the articles appear to muddy the distinction between Bayes' theorem and Bayesian inference/probability.

Hence, I propose to change these articles by

  1. swapping the cookie jar and false positive examples from the Bayesian inference article for the example from the Bayes' theorem article;
  2. deleting the section on conventional names of terms in the theorem from the Bayes' theorem article (but noting that there are such conventions as detailed in the Bayesian inference article);
  3. revising the description of the theorem to refer to probabilities of events, since this is the most elementary way of expressing Bayes' theorem, and is consistent with identities given in (for instance) the conditional probability article.

Since this has been a topic of some discussion on the talk pages of both articles, I would like to invite further comment from others before I just go ahead and make these changes. In the absence of such discussion, I'll make the proposed changes in a few days.

Cheers, Ben Cairns 07:55, 23 Jan 2005 (UTC).

Well, I agree the present state of affairs isn't entirely satisfactory. About (1), if you want to move the medical test to Bayes' theorem in exchange for the voters example, I'm OK with that. I'd rather not clutter up Bayes' theorem with the cookies; it's no less complicated than the medical test, and a lot less interesting. (2) I'm OK with cutting the conventional terms from Bayes' theorem. (3) I guess I'm not entirely happy with stating Bayes' theorem as a theorem about events, since "events" has some baggage. I'd be happiest to say something like P(B|A) = P(A|B) P(B)/P(A) whenever A and B are objects for which P(A), P(B), etc., make sense; that might be OK for mathematically-minded readers but maybe not as friendly to the general readership. Any other thoughts about that? Anyway, thanks for reopening the discussion. Now that we've all had several months to think about it, I'm sure we'll make quick progress. 8^) Regards & happy editing, Wile E. Heresiarch 21:56, 23 Jan 2005 (UTC)
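The identity Wile writes, P(B|A) = P(A|B) P(B)/P(A), can be checked numerically against the definition of conditional probability. A minimal Python sketch, using a small made-up joint distribution (the numbers are illustrative only, not from the article):

```python
# Hypothetical joint distribution over two binary "things" A and B,
# used only to check the identity P(B|A) = P(A|B) * P(B) / P(A).
p_joint = {
    (True, True): 0.12,   # P(A and B)
    (True, False): 0.28,
    (False, True): 0.18,
    (False, False): 0.42,
}

p_a = sum(p for (a, b), p in p_joint.items() if a)  # marginal P(A)
p_b = sum(p for (a, b), p in p_joint.items() if b)  # marginal P(B)
p_a_and_b = p_joint[(True, True)]

# Conditional probabilities straight from the definition:
p_b_given_a = p_a_and_b / p_a
p_a_given_b = p_a_and_b / p_b

# Bayes' theorem recovers P(B|A) from P(A|B), P(B), and P(A):
assert abs(p_b_given_a - p_a_given_b * p_b / p_a) < 1e-12
```

The check holds for any joint distribution, which is why the theorem is just a two-line consequence of the definition of conditional probability, as discussed below.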

Thanks for the quick response! I also prefer the medical test example. Perhaps the cookies can be returned home and then deleted. It's not so complicated a theorem that it needs many examples.

I also take your point about events, but it's just that event has a particular meaning. Perhaps a brief, layman's definition would be appropriate, for example:

"Bayes' theorem is a result in probability theory, which gives the conditional probability of an event (an outcome to which we may assign a probability) A given another event B in terms of the conditional probability of B given A and the (marginal) probabilities of A and B alone."

I don't believe this is a foolish consistency; a precise definition of an event is an important component of elementary probability theory, and anyone who would study the area (even in the kind of detail provided by Wikipedia) should come to appreciate that we cannot go around assigning probabilities to just anything. The article Event (probability theory) explains this quite well. It seems to me that the greater danger lies in obscuring the concept with an array of vaguer terms for which we do not have articles explaining the matter. Thanks again, Ben Cairns 22:43, 23 Jan 2005 (UTC).

Well, we seem to have reached an impasse. I'm quite aware that "event" has a prescribed meaning; that's why I want to omit it from the article. Technical difficulties with strange sets never arise in practical problems and for this reason are at most a curiosity -- this is the pov of Jaynes the uber-Bayesian. From what I can tell, Bayesians are in fact happy to assign probability to "just anything" and this is pretty much the defining characteristic of their school. Let me see if I can find some textbook statements from Bayesians to see what is permitted for A and B. Wile E. Heresiarch 16:02, 24 Jan 2005 (UTC)

I don't think we've reached an impasse yet, but perhaps we (presently) disagree on what this article is about. Bayes' theorem is not about Bayesian-anything. It is a simple consequence of the definition of conditional probability. I don't think that this article should be about Bayesian decision theory, inference, probability or any other such approach to the analysis of uncertainty.

Even if my assertion that people "should come to appreciate that we cannot go around assigning probabilities to just anything" is misplaced (and I'm happy to agree that it is), the word 'event' is what probabilists use to denote things to which we can assign probabilities. I cannot speak for Bayesian statisticians, as (despite doing my undergraduate degree in the field) I now do so little statistics that I can avoid declaring my allegiance. But, again, I don't believe that this article is about that at all. (I am aware of strong Bayesian constructions of probability theory, but they are not considered standard, by any means.)

What do you think of: "Bayes' theorem is a result in probability theory, which gives the conditional probability of A given B (where these are events, or simply things to which we may assign probabilities) in terms of the conditional probability of B given A and the (marginal) probabilities of A and B alone."

The main problem I have with the event business is that it's not necessary, and not helpful, in this context. Being told that A and B are elements of a sigma-algebra simply won't advance the understanding of the vast majority of readers -- this is the "not helpful" part. One can make a lot of progress in probability without introducing sigma-algebras until much later in the game -- this is the "not necessary" part. I'd prefer to say A and B are variables -- this avoids unnecessary assumptions. "A and B are simply things to which we may assign probabilities" is OK by me too. For what it's worth, Wile E. Heresiarch 16:24, 25 Jan 2005 (UTC)

The events article isn't that bad; the majority of it concerns a set of simple examples corresponding to the "things to which we may assign probabilities" definition. Of course, it also mentions the definition of events in the context of sigma algebras, but that is as it should be, too (after all, the term is in common use in that context). If you have qualms with the way the events article is presented, perhaps that needs attention, but I don't see that this should be a problem for Bayes' theorem. It seems a little POV to avoid use of the conventional term for "things to which we may assign probabilities" on the grounds that its formal definition, which does not appear in this article and is not the focus of the article on the term itself, may be difficult for some (even many) people to understand. Cheers, Ben Cairns 05:54, 26 Jan 2005 (UTC).

OK, so you saw the "not helpful" part. Can you address the "not necessary" part? Btw I don't have any desire or intent to change the event article. Wile E. Heresiarch 00:31, 27 Jan 2005 (UTC)

I think my comment above covers this to some extent, but to clarify... While the topic can certainly be explained without reference to events, we could just as easily discuss apes without calling them by that name—or worse, by calling them 'monkeys'—but that would obscure the facts that apes are (a) called 'apes', and (b) are not monkeys.

I have to say that I don't understand your resistance to using the word 'events', when you are satisfied with the (essentially) equivalent phrase, "things to which we may assign probabilities." How does adding the word detract from its elementary meaning? I don't deny that one can make a lot of progress without worrying about the details of constructing probability spaces, but providing a link which eventually leads to a discussion of those details hardly requires the reader to assimilate it all in one sitting.

Could you perhaps suggest, as a compromise, a way to present the material that (a) is clear even to the casual reader, and (b) at least hints that these things are called 'events'? Ben Cairns 04:25, 27 Jan 2005 (UTC).

Spelling of possessive ending in 's'

Sorry to be a prude but I thought that names ending in 's' should be spelt 's'-apostrophe-'s', as in "Jones's", and should not end in an apostrophe unless the name is a plural. Shouldn't this page be "Bayes's" or is this rule particular to the UK? --Oniony 15:17, 25 July 2005 (UTC)

The Wikipedia Manual of Style says either is acceptable. I usually see "Bayes' theorem" instead of "Bayes's Theorem." I honestly don't know if this is a US/UK thing or just a matter of taste. (Personally I prefer the former.) --Kzollman 17:56, July 25, 2005 (UTC)
The lower case initial t in theorem is prescribed by Wikipedia's style manual, I think; certainly it's the usual practice here. I titled an article Ewens's sampling formula and created redirects from the various other conventional ways of dealing with possessives and eponymous adjectives, etc. I'm not sure what the style manual says, nor do I have settled preferences on this one. Michael Hardy 20:37, 25 July 2005 (UTC)
Googling for "Bayes' theorem" yields about 144 k hits, while "Bayes's theorem" yields about 6 k. Restricting the search to site:en.wikipedia.org yields 154 and 10, respectively. Searching newsgroups yields about 2500 and 150, respectively. Since both forms are acceptable, let's use "Bayes' theorem", which has much more currency than "Bayes's theorem". Wile E. Heresiarch 03:16, 26 July 2005 (UTC)

"Nontechnical explanation" and cookies example

Hello. I've cut the "nontechnical explanation" and the cookies example for the following reasons. (1) "Nontechnical explanation" is mistaken. Bayes' theorem isn't limited to observable physical events, as suggested by the repeated use of the word "occurring". The author has been misled by the suggestive term "event". (2) The verbiage about the term likelihood is void of meaning: This measure is sometimes called the likelihood, since it is the likelihood of A occurring given that B occurred. It is important not to confuse the likelihood of A given B and the probability of A given B. Even though both notions may seem similar and are related, they are quite different. Uh huh. (3) Descriptions of each term P(A), P(B), etc. are covered elsewhere in the article. (4) P(A), P(B), etc. are called "measures" in the "nontechnical explanation" but they're not; I suppose the author intended "quantities". (5) The description of P(B) is mistaken: This measure is sometimes called the normalising constant, since it will always be the same, regardless of which event A one is studying. No, it is not called a normalizing constant because it is always the same. (6) The cookies example doesn't illustrate anything interesting. (7) The cookies example already appears on the Bayesian inference page. -- The article needs work, and it can be improved, but not by pasting random stuff into it. Wile E. Heresiarch 07:17, 28 November 2005 (UTC)

I agree with some of the points that you raise, but I also believe that there was some good information in the "non-technical" section that you removed. Furthermore, I believe that many math-related articles on Wikipedia, this one included, tend to start immediately with highly technical explanations that only Ph.D. mathematicians can understand. Yes, the articles do need to include the formal mathematical definitions, but I believe that it would be helpful to begin each article with a simple, non-technical explanation that is accessible to the more general reader. Most of these math-related articles have important applications well beyond mathematics -- including physics, chemistry, biology, engineering, economics, finance, accounting, manufacturing, forensics, medicine, etc. You need to consider your audience when you write articles for Wikipedia. The audience is far broader than the population of Ph.D. mathematicians. -- Metacomet 14:37, 28 November 2005 (UTC)
One other point: in my opinion, it is not a good idea in general for articles to point out that they are starting with a non-technical explanation, and that the full technical discussion will come later, as this article originally did. It is better simply to start with simple, non-technical descriptions and then smoothly to transition to the more formal, technical discussion. Sophisticated readers will know immediately that they can skim over the non-technical parts, and read the more advanced section in greater detail. Non-sophisticated readers will appreciate that you have tried to take them by the hand and bring them to a deeper level of understanding. -- Metacomet 14:50, 28 November 2005 (UTC)
Hi, I wrote the non-technical explanation, so I'll chip in with my thoughts. First, the reason I wrote it is that this article is too technical. If you check back the history before I first added the section, you'll see there was a "too technical, please simplify" warning on the page. Hell, I'm a computer engineer, I use Bayes' theorem every day, and even I couldn't figure out what the page was talking about. People who don't have a strong (grad level) mathematical background will be completely lost on this page. There is a definite, undeniable need for a simpler, non-technical explanation of Bayes' Theorem.
That said, the vision I had for the non-technical explanation was for it to be a stand-alone text. The technical explanation seemed complete and coherent, if too advanced for regular readers, so I did not want to mess around with it. I thought it would be both simpler and better to instead begin the page with a complete non-technical text, which regular readers could limit themselves to while advanced readers could skip completely to get to the more technical stuff. That is why, as Heresiarch pointed out, the definitions of Pr(A), Pr(B) etc. are there twice.
So I vote that we restore the non-technical explanation. Heresiarch, if you have a problem with some terms used, such as "occur" or "measure", you should correct those terms, not delete the entire section. But keep in mind when doing those corrections that the people who'll be reading it will have little to no formal background in mathematics – keep it sweet and simple! -- Ritchy 15:11, 28 November 2005 (UTC)
I think there is room for a compromise solution that will make everyone happy and improve the article substantially. Basically, I think Ritchy is correct, the non-technical explanation needs to go back in at the beginning, but it needs to be cleaned up a bit and the transitions need to be a bit smoother. The truth is, the so-called non-technical discussion is not even all that simplified -- it happens to be pretty well written and provides a very good introduction to the topic. Again, I think it just needs a bit of cleaning-up, and it needs to be woven into the article more smoothly. -- Metacomet 15:54, 28 November 2005 (UTC)
As a first step, I have added the simple "cookies" example back, but this time I grouped it with the other example in a single section entitled "Examples." Each example has its own sub-section with its own header. I think it improves the flow of articles when you put all of the examples together in a single section, and begin with simple examples before proceeding to more complicated ones. -- Metacomet 16:11, 28 November 2005 (UTC)
The next step is to figure out a way to weave the non-technical explanation back in near the beginning of the article without sounding too repetitious and with smooth transitions. -- Metacomet 16:11, 28 November 2005 (UTC)
I am not opposed to some remarks that are less technical. I am opposed to restoring the section "Non-technical explanation", as it was seriously flawed. If you want to write something else, go ahead, but please don't just restore the previous "Non-technical explanation". Please bear in mind that just making the article longer doesn't necessarily make it any clearer. Wile E. Heresiarch 02:22, 29 November 2005 (UTC)
Actually, I think it is pretty good as written. You say that it is "seriously flawed." I am confused: what are your specific objections or concerns? -- Metacomet 03:36, 29 November 2005 (UTC)
See items (1) through (5) above under "Nontechnical explanation" and cookies example. Wile E. Heresiarch 07:04, 29 November 2005 (UTC)
I have pasted a copy of the text below for reference. -- Metacomet 04:03, 29 November 2005 (UTC)
I have edited the "Nontechnical explanation" according to criticisms (1) and (4). (2) and (3) are meaningless – it seems Heresiarch just doesn't like things explained too clearly to people who don't know math. (5) seems to be a misunderstanding. Pr(B) is the probability of B, regardless of A. Meaning, if we're computing Pr(A|B), or Pr(C|B), or Pr(D|B), the term Pr(B) will always be the same. That's what I meant by "it will always be the same, regardless of which event A one is studying." If the statement isn't clear enough, I'm open to ideas on how to improve it. -- Ritchy 20:10, 29 November 2005 (UTC)
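Ritchy's point, that Pr(B) is a single shared denominator no matter which hypothesis is being evaluated against the evidence B, can be illustrated with a short sketch. The priors and likelihoods below are hypothetical numbers chosen purely for illustration:

```python
# Three competing hypotheses A, C, D evaluated against the same evidence B.
# Hypothetical priors Pr(hypothesis) and likelihoods Pr(B | hypothesis).
priors = {"A": 0.5, "C": 0.3, "D": 0.2}
likelihoods = {"A": 0.9, "C": 0.5, "D": 0.1}

# Pr(B) by the law of total probability -- computed once, and identical
# regardless of which hypothesis we then condition on.
p_b = sum(priors[h] * likelihoods[h] for h in priors)

# Bayes' theorem for each hypothesis, dividing by the same Pr(B):
posteriors = {h: likelihoods[h] * priors[h] / p_b for h in priors}

# Dividing by the shared Pr(B) makes the posteriors sum to 1,
# which is why Pr(B) is called the normalizing constant.
assert abs(sum(posteriors.values()) - 1.0) < 1e-9
```

This is also the sense in which "normalizing constant" is the apt name: the same Pr(B) rescales every numerator so that the posteriors form a proper probability distribution.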

Non-technical explanation

Simply put, Bayes’ theorem gives the probability of a random event A given that we know the probability of a related event B occurred. This probability is noted Pr(A|B), and is read "probability of A given B". This quantity is sometimes called the "posterior", since it is computed after all other information on A and B is known.

According to Bayes’ theorem, the probability of A given B will be dependent on three things:

  • The probability of A on its own, regardless of B. This is noted Pr(A) and read "probability of A". This quantity is sometimes called the "prior", meaning it precedes any other information – as opposed to the posterior, defined above, which is computed after all other information is known.
  • The probability of B on its own, regardless of A. This is noted Pr(B) and read "probability of B". This quantity is sometimes called the normalising constant, since it will always be the same, regardless of which event A one is studying.
  • The probability of B given the probability of A. This is noted Pr(B|A) and is read "probability of B given A". This quantity is sometimes called the likelihood, since it is the likelihood of A given B. It is important not to confuse the likelihood of A given B and the probability of A given B. Even though both notions may seem similar and are related, they are quite different.

Given these three quantities, the probability of A given B can be computed as

    Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)

Cookies example revisited

The continued expansion of the cookies example isn't improving it. The medical test example, presently in Bayesian inference, is no more complicated, and much more compelling. The medical test, incidentally, is a standard example of the application of Bayes' theorem. I'm going to cut the cookies and copy the medical test unless someone can talk me out of it. Wile E. Heresiarch 15:59, 29 November 2005 (UTC)

I totally disagree. The cookies example, although rather simple, provides a tangible example of the relationship between conditional probabilities and Bayes' theorem. Actually, one of its virtues is the fact that it is such a simple example. If you don't find it interesting, you don't have to read it. If you are so advanced in your understanding of Bayes' theorem that this example is trivial for you, then you don't have to read it. Not all readers of Wikipedia are as smart as you are. What is the harm in leaving it in the article? -- Metacomet 16:36, 29 November 2005 (UTC)
The medical test example is essentially the same as the cookies: bowl 1 = people with disease, bowl 2 = people without, plain = negative test, chocolate chip = positive; Fred has a plain cookie, which bowl is it from = Fred tests negative, does he have the disease. If the cookies example is simple, then so is the medical test, and the latter has the advantage that people (even ordinary readers) truly care about such problems. Wile E. Heresiarch 23:21, 29 November 2005 (UTC)
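The structural equivalence Wile describes can be made concrete in a few lines. The bowl contents below are hypothetical counts chosen for illustration (not necessarily the article's numbers); relabeling bowl 1 as "has disease" and a plain cookie as "tests negative" gives the medical-test computation unchanged:

```python
# Hypothetical cookies setup: two bowls, equally likely to be chosen.
# Bowl 1 holds 30 plain and 10 chocolate chip; bowl 2 holds 20 of each.
p_bowl1 = 0.5                   # prior: Pr(bowl 1)
p_plain_given_bowl1 = 30 / 40   # likelihood: Pr(plain | bowl 1)
p_plain_given_bowl2 = 20 / 40   # likelihood: Pr(plain | bowl 2)

# Normalizing constant: overall probability of drawing a plain cookie.
p_plain = (p_plain_given_bowl1 * p_bowl1
           + p_plain_given_bowl2 * (1 - p_bowl1))

# Posterior: Fred drew a plain cookie -- how likely is bowl 1?
p_bowl1_given_plain = p_plain_given_bowl1 * p_bowl1 / p_plain
print(p_bowl1_given_plain)  # 0.6
```

Under the relabeling, the same arithmetic answers "Fred tests negative; how likely is it that he has the disease?", which is exactly the sense in which the two examples are the same problem.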
Furthermore, although the medical example is interesting, it is confusing and too advanced for a first example meant to introduce basic concepts. Again, the audience that we are writing for is not Ph.D. mathematicians; the audience is a general audience that includes people who do not have the same background that you do. The goal is to explain the concepts, not to show how smart you are by throwing around a lot of techno mumbo-jumbo that no one understands except the elite few. -- Metacomet 16:42, 29 November 2005 (UTC)
That's a nice strawman you have there. If you bothered to check the discussions above, you would see that I've argued against including measure-theoretic stuff (which, I believe, counts as "technical mumbo-jumbo"). More recently, I revised the introduction to remove the technical stuff and make it entirely verbal. Wile E. Heresiarch 23:21, 29 November 2005 (UTC)
Good. I am glad you agree. BTW, I think the revisions that you made recently to the introduction are excellent. I realize that most people above the age of 12 don't care much about bowls of cookies. Nevertheless, I think it illustrates the concepts very well and in a very straightforward way. Finally, I think the term I used was "techno mumbo-jumbo," not "technical mumbo-jumbo".  ;-) -- Metacomet 04:32, 30 November 2005 (UTC)
One more thing: if any of the examples in this article should be removed, it is Example #2 on Bayesian inference and not the cookies example. I have a pretty strong background in math, and I don't have the first clue what this example is all about. What benefit does it provide other than to confuse the reader? -- Metacomet 16:49, 29 November 2005 (UTC)
Agreed. In fact I've argued the same point (item 3 in my edit of July 11, 2004, at the top of the page). Wile E. Heresiarch 23:21, 29 November 2005 (UTC)
I agree with Metacomet. The cookie example is clear and relates to a simple tangible situation. Everyone can easily imagine drawing cookies from a bowl. This makes it an excellent medium to explain Bayes' Theorem. The example is complete, and clearly and accurately illustrates the Theorem. It is explained in plain and simple terms, so that anyone can understand it. Furthermore, it does not require any background knowledge from the reader in any other domain, and does not needlessly take on another topic like medicine or polling, something that only serves to confuse readers. I see no reason to cut it; quite the opposite, it is the perfect example for the page and should definitely be kept. -- Ritchy 20:01, 29 November 2005 (UTC)


Why did the cookie example get cut? There was only one person who didn't like it, and the discussion here clearly highlighted why it was necessary to keep it. You can't possibly think that this medical example is simpler! The cookie example explained Bayes' Theorem much more clearly, and using a situation everyone is familiar with. Unless someone comes up with a good reason why it should be cut today, I'll restore it tomorrow. -- Ritchy 15:23, 2 December 2005 (UTC)