Talk:Correlation does not imply causation

ATTENTION: If you would like to talk about the page title or have a suggestion for the title, please leave your comments under Page title discussion. Please sign your comments with ~~~~.

Suggested Format

The "general pattern" section should be reorganized around the standard threats to causal inference:

1) omitted (unobserved) variables (possibly with a link to the weak page on Mediator variable)

2) selection bias

3) simultaneous causation (reverse causation)

4) measurement error

It would be nice to have vivid examples of each type of problem. Cristo00

Requested move

Proposal :	Correlation implies causation (logical fallacy) → Correlation implies causation
Rationale :	No need for the parentheses, there's no other article on Wikipedia with a similar name, and correlation implies causation redirects to the more complicated title.
Proposer :	Silence 12:52, 30 May 2006 (UTC)

Survey and discussion

Please add * Support or * Oppose followed by a brief explanation, then sign your vote using "~~~~".

Support. No disambiguation necessary. David Kernow 11:43, 31 May 2006 (UTC)
Support, does not need parentheses if the plain title is a redirect to it. JIP | Talk 17:37, 31 May 2006 (UTC)
Support; brackets are pointless in this case. smurrayinchester (User) (Talk) 17:42, 31 May 2006 (UTC)

Done based on consensus, have fun! -- Kim van der Linde at venus 18:57, 3 June 2006 (UTC)

Page title discussion

This article needs to be renamed; unqualified falsehoods just don't make for good titles. --Ryguasu 12:47 Nov 25, 2002 (UTC)

I think once it was a sub-page of fallacy? It would almost make sense as that (I think there is a redirect from Fallacy/Correlation....?). It could stand alone as "Correlation and Causation" if rewritten. But someone should look at the fallacy collection first. Ctwardy 19:33 Nov 25, 2002 (UTC)

Why isn't this named "cum hoc ergo propter hoc"? Post hoc ergo propter hoc is called by its analogous name. Anyway, nobody is ever going to search for an article called "Correlation implies causation" so why not use a more official term? I also think that the article should talk about similarities and differences between post hoc and cum hoc. Just my two cents, I don't know much about this topic. --Leapfrog314 03:46, 8 September 2005 (UTC)

I am completely opposed to using Latin names here. I think it far more likely someone will look for, and spell correctly, an English name than a Latin name. That said, I think this should be rolled into the logical fallacy article, as opposed to creating a new article for each fallacy. StuRat 02:19, 17 September 2005 (UTC)

I found this article by searching Google for "correlation is not causality" and it was exactly what I was looking for, so there goes your "nobody" argument - Anonymous on Tue, 10 Jan 2006 18:54:15 -0800

Why is this article at Correlation implies causation (logical fallacy)? Why can't it simply be at Correlation implies causation? smurrayinchester (Talk) 08:09, 29 May 2006 (UTC)

Anonymous == nobody. Anyways, why not just make it redirect?

Isn't "Correlation implies causation fallacy" a better name for the article? Narssarssuaq 17:18, 3 July 2006 (UTC)

Or what about "Correlation implies causation (fallacy)"? Kmarinas86 17:24, 20 August 2006 (UTC)

Or we could go with "Correlation implies, fallaciously, causation". Aaadddaaammm 06:47, 2 October 2006 (UTC)

How about "False cause (fallacy)" in like kind with these guys. I would think "Correlation fallacy" would work just fine too. Or maybe "Correlation is causation (fallacy)". On second thought, maybe "Correlation is causation (fallacy)" would be best. -- Chris53516 16:51, 2 October 2006 (UTC)

Surely Correlation does not imply causation is a better title; that phrase is widely known and has the benefit of being true to boot. It would require some rewriting, but it is the more important concept. --Salix alba (talk) 20:09, 18 October 2006 (UTC)

Not true, Correlation does imply causation. What correlation doesn't do is prove causation. I know that the phrase 'correlation does not imply causation' is the most widely used, but it is incorrect. I propose that we move the page to Correlation does not prove causation or Correlation proves causation fallacy and mention the more widely used phrase as an alternative name for the fallacy in the first paragraph. Grumpyyoungman01 00:43, 28 October 2006 (UTC)

I agree with Salix alba and think Correlation does not imply causation is the best choice. With regard to Grumpyyoungman01's statement that causation is implied by correlation, I think it would be best to really evaluate that thought before placing it into the article. Right now, I am having a very difficult time agreeing with that, and since most people characterize the idea as "correlation does not imply causation" then we would definitely need a good number of high quality references to support describing the idea otherwise. - Dozenist talk 10:24, 28 October 2006 (UTC)

Correlation only implies causation because we are feeble-minded. In truth, it should never imply causation because of the myriad of ways of looking at the problem, as discussed on the page. For example, one "cause" could actually be an effect. Since correlation is the examination of two events at the same time, the linear time relationship of causation is missing. That is, since one of the events does not precede the other, the definition of "cause" is not met for either event. Therefore, an article with such a title would be misleading and entirely false. I'm in favor of any article title that is NOT a lie, or is plainly displayed as a fallacy (e.g., "Correlation is causation (fallacy)"). -- Chris53516 14:33, 30 October 2006 (UTC)

When talking about correlations, two events can occur at roughly the same time, but not at the exact same time, so that the possibility/plausibility of cause is met. Implication has nothing to do with objective Truth; when you are working with implication, you are speaking about the reactions or ideas of beings who are, as you put it, 'feeble-minded'. Grumpyyoungman01 22:46, 30 October 2006 (UTC)

Additional information

Moved from the main article (articles are not for discussion!!)

But I thought it important to note that even though a logical fallacy, there was a stronger but deeper link between causation and correlation. If you believe Reichenbach's principle, then you believe that robust correlation implies SOME causation, just not necessarily direct links.

An earlier version of this page offered two examples "that show that it is sometimes quite difficult to judge correlations":

Statistics prove that most car accidents happen between vehicles driving at rather low speeds. Few accidents take place at, let's say, 100 mph. Does this mean that it is safer to drive fast? Of course not, most accidents take place within 25 miles of their primary city or suburban residence, usually driving at a moderate speed, ergo most accidents happen at moderate speeds.

Yes indeed. Although mostly this is a common failure to take base rates into account when doing correlations (or any other kind of inference). The prediction is that once you separate out the fact that most driving happens at low speeds, the correlation between speed and accidents will change.

Note that you could make another faulty inference given the explanation. Most accidents do take place within 25 miles of home. Does this mean it is more dangerous to drive near your house than far away? It might mean this. It might mean that people become complacent on familiar roads, and are more alert and safer when travelling. But those seem unlikely, and it is far more likely that it is another base rate effect: by far most driving happens near home than away. After all, every successful trip both starts and ends at home.
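A rough sketch of the base-rate point above, for anyone who wants to see it with numbers. The mileage and accident figures are assumptions made up for illustration, not real traffic statistics. The idea is only that raw accident counts can be higher at low speed while the accident rate per mile driven is higher at high speed.

```python
# Hypothetical figures, invented for illustration only (not real traffic data).
miles_driven = {"low speed": 9_000_000, "high speed": 1_000_000}  # exposure (base rate)
accidents = {"low speed": 90, "high speed": 20}                   # raw accident counts


def rate_per_million_miles(band):
    """Accidents per million miles driven in the given speed band."""
    return accidents[band] / (miles_driven[band] / 1_000_000)


rates = {band: rate_per_million_miles(band) for band in miles_driven}

for band in miles_driven:
    print(f"{band}: {accidents[band]} accidents, "
          f"{rates[band]:.1f} per million miles")
```

With these assumed numbers, most accidents happen at low speed simply because most driving happens at low speed; dividing by exposure reverses the ranking.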

The correlation between reserve parachute deployment failures and death is quite high; if the main parachute fails and then the reserve parachute fails, the parachutist almost always dies. However it is the sudden deceleration when the parachutist hits the ground that causes the death, not the parachute failure. The parachute failure leads almost inevitably to death, but does not cause it.(1)

Actually I strongly disagree here. The reserve failure most certainly does cause the death! A good way to kill your (parachutist) enemies is to disable their chutes. Granted, the ground is a more proximate cause of death, but there will always be a more proximate cause of death. For example, even the impact with the ground is not the most proximate cause. The impact of your organs with your skeleton handles that. Etc.

The correlation between reserve chute failures and death is in fact a good indication of a causal link. Not that we needed statistics to find that one. :-)

Even though there are ways to go more securely from (many) correlations to (some) causal structure, unmeasured common causes can always interfere with those attempts. And most social scientific methods do not even try to do proper causal inference. They merely apply regression measures (correlations) and then make causal inferences from there. All statistics textbooks warn (overly much) against inferring causation from correlation, but in practice correlation methods are almost always used to establish causation. And this practice commonly leads to faulty conclusions drawn from scientific research.

This is in contrast to experimental science with proper controls, where you hold everything else constant and wiggle A. When you do this (and only then), you see B wiggle. That is a pretty foolproof way to establish causation. Lacking perfect control, a randomized clinical study is the best alternative, because it is the best guarantee that other causes are averaged out between the treatment and the non-treatment population.
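A toy simulation may make the contrast above concrete. It is only a sketch under stated assumptions: the hidden trait `u`, the noise levels, and the helper `mean_difference` are all hypothetical, and A is constructed to have no effect on B at all. In the "observational" run, `u` influences who ends up with A, so A and B look related; in the "randomized" run, a coin flip decides A and the apparent relation should largely disappear.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable
N = 10_000


def mean_difference(assign):
    """Mean of B among units with A=1 minus mean of B among units with A=0.

    `assign(u)` decides A for a unit whose hidden trait is u.
    A has no effect on B in this toy world; only u drives B.
    """
    treated, untreated = [], []
    for _ in range(N):
        u = random.gauss(0, 1)      # unmeasured common cause
        a = assign(u)
        b = u + random.gauss(0, 1)  # outcome depends on u, not on a
        (treated if a else untreated).append(b)
    return mean(treated) - mean(untreated)


# Observational: units with high u tend to end up with A=1.
observational = mean_difference(lambda u: u + random.gauss(0, 1) > 0)

# Randomized: a coin flip decides A, independent of u.
randomized = mean_difference(lambda u: random.random() < 0.5)

print(f"observational difference: {observational:.2f}")
print(f"randomized difference:    {randomized:.2f}")
```

The design choice here is deliberate: because the true effect of A is zero, any sizeable difference in the observational run comes from the common cause alone.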

The previous version included links to Problem of induction, and physics along with a 1-paragraph discussion that I feel blurred the important distinction between experimental correlations and observational correlations. For example, I do not think that moving from Newton's laws to the Theory of Relativity is a good example of problems with correlation. There is more on that shift no doubt under Philosophy of science, especially topics like theory choice, confirmation, induction, and paradigm-shift.

This page should probably be broken up.

The discussion in this page is a bit tortured. I've revised the section on "determining causation" to emphasize counterfactual reasoning. This should help readers think about conditions under which correlation provides persuasive evidence of causation. 09:14, 28 August 2006 (UTC)

The Latin name for this fallacy is post hoc ergo propter hoc: literally, "After this, therefore because of this."

(1)The United States Parachute Association (http://www.uspa.org/) terms this type of fatality "Impact with ground"

Political Issue

Since this is an encyclopedia, perhaps the bit about gun control could be changed to a more neutral example. As it stands, the article appears to oppose gun control.

Regards, Rajeev.

I agree. A more neutral example would be better, and I'll change it (sooner or later) if there are no objections. I think it is worse than just being political; the argument seems to assume that either gun ownership or crime is the cause, and the other is correlated whilst ignoring the possibility of a common cause. In fact, none of the examples are very inspiring. Rls 23:49, 24 Aug 2004 (UTC)

Changed it. I don't really like my example, but feel free to edit. Sir Elderberry 00:51, 2 May 2006 (UTC)

Art imitates penguins (or was it the other way around)?

Another example illustrating this fallacy was a study which found that British arts funding levels had an extremely close correlation with Antarctic penguin populations. Neat factoid, but an encyclopedia is not a factoid collection. Please provide a source and/or an explanation for this, as with the other examples. For all our readers know we completely made this up. Removing it for now. 82.92.119.11 20:34, 24 Dec 2004 (UTC)

Vague reasoning

But if there was a common cause, and you had that data as well, then often you can establish what the correct structure is. Likewise (and perhaps more usefully) if you have a common effect of two independent causes.

What does "that data" refer to? Some sort of data about the "common cause", I presume, but what exactly? I can't figure out what these sentences are supposed to mean. 82.92.119.11 20:38, 24 Dec 2004 (UTC)

Reichenbach's principle is closely tied to the Causal Markov Condition used in Bayesian networks. The theory underlying Bayesian networks sets out conditions under which you can infer causal structure, when you have not only correlations, but also partial correlations. In that case, certain nice things happen. For example, once you consider the temperature, the correlation between ice-cream sales and crime rates vanishes, which is consistent with a common-cause (but not diagnostic of that alone).

This is presumably obvious if you know what Bayesian networks are. But let's assume I am just a reader interested in logic—do I care that "certain nice things happen"? What exactly does it mean to "consider" the temperature? Enter the data into the sample? And how does that make the relation "vanish"? If we want to mention Bayesian networks, we should put it in better context for non-technically inclined readers. 82.92.119.11 20:43, 24 Dec 2004 (UTC)
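One possible reading of "consider the temperature" is computing a partial correlation: the correlation between ice-cream sales and crime that remains after the linear influence of temperature is removed from both. The sketch below uses simulated data, so every number in it is an assumption; `pearson` and `partial_corr` are small helper functions written for this example, and the partial correlation uses the standard first-order formula.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling linearly for zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated world: temperature drives both quantities; neither drives the other.
temperature = [random.uniform(0, 30) for _ in range(2000)]
ice_cream = [t + random.gauss(0, 5) for t in temperature]
crime = [t + random.gauss(0, 5) for t in temperature]

raw = pearson(ice_cream, crime)
controlled = partial_corr(ice_cream, crime, temperature)

print(f"raw correlation:                 {raw:.2f}")
print(f"controlling for the temperature: {controlled:.2f}")
```

In this simulated setup the raw correlation is strong, while the partial correlation is close to zero, which is what "vanishes" would mean here. As the quoted passage says, that pattern is consistent with a common cause but does not by itself prove one.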

Expert attention

Someone familiar with statistics needs to give some concrete examples and a clearer, more detailed explanation so that non-scientific readers can understand this concept. -- Beland 02:40, 28 Feb 2005 (UTC)

195.42.89.34 08:56, 26 December 2005 (UTC) Perhaps it is not easy with statistics. I saw a book, whose title in English would be something like "Paradoxes in mathematical statistics", with several examples that are hard to believe. One example is testing new drugs. Testing a new drug in two hospitals independently showed that the drug is effective. Then, summing the numbers of patients in each class, we find that for the two hospitals in total the drug is counter-effective. Hard to believe, but... OK, I still want to contribute: a common Russian joke about statistics states that it was proved that 100% of men who died of cancer had eaten cucumbers. We all know now how harmful cucumbers are.

PS: BTW, another possible example is games. Do wild, cynical games provoke wild, cruel behaviour, or do cruel people prefer such games? See the AD&D history or the later PC game scandals.
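The two-hospital drug example in the comment above is usually called Simpson's paradox, and it is easy to check with arithmetic. The counts below are illustrative assumptions, not data from any trial discussed on this page; the names `data`, `recovery_rate` and `pooled` are made up for the sketch.

```python
# Illustrative counts as (recovered, total); assumptions for this sketch only.
data = {
    "Hospital A": {"drug": (81, 87), "no drug": (234, 270)},
    "Hospital B": {"drug": (192, 263), "no drug": (55, 80)},
}


def recovery_rate(recovered, total):
    """Fraction of patients who recovered."""
    return recovered / total


def pooled(arm):
    """Recovery rate for one arm with both hospitals added together."""
    recovered = sum(arms[arm][0] for arms in data.values())
    total = sum(arms[arm][1] for arms in data.values())
    return recovery_rate(recovered, total)


per_hospital = {
    name: {arm: recovery_rate(*counts) for arm, counts in arms.items()}
    for name, arms in data.items()
}

for name, arm_rates in per_hospital.items():
    print(name, {arm: round(rate, 3) for arm, rate in arm_rates.items()})
print("Pooled", {arm: round(pooled(arm), 3) for arm in ("drug", "no drug")})
```

In these assumed counts the drug does better inside each hospital but worse once the hospitals are pooled, because most drug patients are in Hospital B, where recovery is lower for everyone. The hospital acts as the lurking third variable.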

Hume

There should be more on David Hume, who essentially said that 'correlation is causation' - it is impossible to see causation, all we can know is correlation. - 05:31, 10 November 2005 (UTC)

Actually, there is a way of seeing causation. Conduct an experiment (however horribly impractical and expensive it may be), and you can determine causation, as opposed to comparing sets of data to one another. The statement made by Hume would apply to the Chemical "x" causes Cancer situation. You can see that large concentrations of Chemical "x" might have a correlation with higher cancer rates by comparing data on cancer rates in certain areas with data on concentrations of Chemical "x" within those areas, but that essentially tells you nothing about true causation, though it provides possibilities.

The way to know causation for certain is to actually conduct an experiment with this chemical, by exposing people to it (and controlling other possible variables) and checking for the emergence of cancer from the direct effect of this chemical. That is how you can know causation, by direct experimentation and observation, as opposed to checking various statistics and lining them up for examination. I will admit, however, that Hume is right in a way. Most experimentation of the sort that would provide meaningful proof of causation for statisticians is either unethical or extremely impractical. In all practical terms, David Hume's statement can be assumed to have a grain of truth. It's clear, however, that in simple experimentation and observation, Hume's logic is incorrect; say you were to find a very high correlation between pedaling a bike harder and the bike going faster. This would be, of course, based on two specific sets of data you studied: the speed at which the bike was pedaled, and also the rates of speed that the bike travelled at. You can assume correlation doesn't imply causation here, but by direct experimentation, and observation of your own, it's safe to assume that pedaling harder will make the bike travel faster. Of course, these are no-brainers (and probably worded falsely somewhere... I'm nothing but an amateur). The point is, Hume's logic does indeed apply in the grand scale, but his generalization in smaller cases is more a matter of formality as opposed to the actual case of assuming causation. -- The preceding unsigned comment was added by 70.95.202.104 (talk • contribs).

Even with numerous experiments, all you actually see is correlation, maybe even 100% correlation, but you can't actually see causation. I'm speaking in a philosophical sense here, not in the day-to-day practical science sense. I'm guessing the article on causation, maybe also Popper's falsifiability, deals with some of this Philosophy of Science. - Matthew238 22:25, 22 August 2006 (UTC)

Monty Python example

The Monty Python example is flawed. As everyone knows, after Sir Bedevere's explanation, the accused woman herself says "It's a fair cop", confessing to being a witch. JIP | Talk 17:52, 31 May 2006 (UTC)

Unorganised and confusing -> more systematic

The article as it stood was (is) very unorganised and confusing, trying to explain through loose examples. I've tried to make things more systematic in the first paragraph. The later paragraphs should perhaps point to cases (1)-(4) to make things easier to understand.

Also, does "Teenage girls eat lots of chocolate, teenage girls are most likely to have acne, therefore, chocolate causes acne" give an example of a correlation? I'm not sure it does, and if not, the example should be removed.

I'm also not convinced that (4) actually is a possible outcome of a correlation. Help me out :) Narssarssuaq 16:31, 3 July 2006 (UTC)

I've removed the following because it doesn't contain a correlation:

Health and income

I really don't like the income-health correlation example. I think it is biased towards rich countries. In rich countries it may be true that income does not have much meaning in terms of health because everyone has a good diet. But in poor countries the health-income correlation may well be indicative of good income causing good health. If you have a 1300 calorie diet, a higher income may well mean that you will have a 1600 calorie diet instead. Although neither diet is sufficient, I think the person with 1600 will be healthier.

Another example:

Teenage girls eat lots of chocolate.

Teenage girls are most likely to have acne.

Therefore, chocolate causes acne.

This argument, and any of this pattern, is an example of a false categorical syllogism. One observation about it is that the fallacy ignores (4), the possibility that the correlation is coincidence. We can pick an example where the correlation is as statistically "robust" as we please, but we still cannot assume one factor causes the other. If chocolate-eating and acne were strongly correlated across cultures, and remained strongly correlated for decades or centuries, it may not be a mere coincidence. However, in this particular example, the last statement is a logical fallacy because it ignores the possibility that a third factor may be the cause of eating chocolate and having acne (e.g. being young). See joint effect.

Someone correct me if I'm wrong about this. Narssarssuaq 16:50, 3 July 2006 (UTC)

Cannabis and possible other 'examples'

I think bringing in subjects like this is introducing POV bias and implications into the article. We should be able to demonstrate what a logical fallacy is without bringing up morally contentious issues as examples. Things like 'going to bed with shoes on' are great demonstrative examples, I don't see why we can't continue along those lines with the rest. Even the myopia one isn't really horrible because I doubt you're seeing a lot of people out there with a strong interest in demonstrating causative links between lights being on and myopia.

On the other hand, bringing in a subject like cannabis use here is just opening a door to controversy as well as being a little divisive with regards to the implications of doing so in an article like this. The section is "examples of logical fallacies", and the topic we're given is a statistical link between cannabis use and mental illness. The section does not cite a specific case study (nor probably should it, this article isn't about a large debate like that), but rather makes a strongly implied generalisation that studies like this are or may be committing logical fallacies. For all we know the studies go to great lengths to show that there is a causative effect behind the observed statistical correlations.

Ask yourself this: If most studies about links between cannabis use and mental illness didn't involve logical fallacies (as they supposedly do here), would this be a good example or a confusing/misleading one? I think it would clearly be the latter, and therefore the presence of this example is based on the underlying assumption that a statement about causal link between cannabis use and mental illness is something likely to be a logical fallacy. Do we know this? Have we done surveys of studies, looked at whether or not they are mostly guilty of logical fallacies, or do they mostly take this into account? Frankly, doing so is way beyond the scope of this article anyway.

The fact is, by putting claims like this in the section that demonstrates the fallacy, we are implying without evidence that the statement itself usually is false by way of the fallacy, which is not NPOV--and going to lengths to demonstrate it is in fact NPOV (such statements statistically usually are fallacious) is out of the scope of the article. I simply don't see what we gain by having this here; we can do just as well explaining the topic while using entirely neutral examples. Honestly even the myopia one has problems really, but at least there the topic itself isn't a morally controversial one. --Rankler 00:51, 28 September 2006 (UTC)

I too agree that any possible controversial examples would not be appropriate, or maybe I should say best-suited, for this article. Instead of using a topic that may hinder groups of people from interpreting this article correctly, simple and neutral topics would probably be best to use. - Dozenist talk 01:33, 28 September 2006 (UTC)

NPOV and cannabis

Hi. I read over the section listed as a POV violation and couldn't really see it. I'm not a smoker of cannabis, nor do I really support legalisation, so I don't think it's an issue of my own engrained bias speaking for me.

The article is quite clear that the entire section is "maybe" and "possibly", without stating that the statement is false due to fallacy (any more than any other fallacy in this section - remember, just because a statement falls into Cum Hoc Ergo Propter Hoc doesn't ever mean it's necessarily a 'false' statement (that, in fact, would be a case of Cum Hoc Ergo Propter Hoc itself - just because a lot of statements that imply causation from correlation are false (for instance, shoes to sleep and headaches) doesn't mean that they all are (for instance, cigarettes and lung cancer))).

Anyway, put short, the unsubstantiated fear that some Joe Average might read this article and get confused and immediately assume that pot is/is not harmful/helpful/related/unrelated to mental/physical/spiritual illness/well-being is not, by itself, reason enough to change the example given - which cannot easily be replaced with another, toned-down example, since the entire progression of the article goes from simple examples to more complex, real-worldish examples. In fact, using this argument to justify a rewrite is itself a fallacy - quite ironic, given the subject matter at hand. --151.200.252.164 09:08, 8 October 2006 (UTC)

I re-examined it and found it to be a bit defensive at the very beginning - as stoner stuff tends to be - and removed a bit of the "ALLEGED NATURE OF THE ALLEGED ALLEGED (POSSIBLY UNTRUE) UNVERIFIED [citation needed][citation needed][citation needed]" stuff in the beginning. Consensus? --151.200.252.164 09:13, 8 October 2006 (UTC)

Are the tags still needed?

Are the tags at the start of the article ({{Cleanup-date}}, {{expert}} and {{sources}}) still needed? Seems to me that they aren't and can be removed. Any objections? Rami R 08:07, 18 October 2006 (UTC)

Objection. I do not believe that an expert has weighed in on this subject. There are many points without reference as well, so both the cleanup and sources templates apply. Chris53516 13:10, 18 October 2006 (UTC)

I sought out an expert (Prof. Yaacov Ritov). This was his input. He has, however, suggested that a science philosopher look at the article (mostly because of the terminological discussion in the beginning, and maybe there is a need for extended historical references beyond Hume). So the expert tag can stay. But I'm not sure what needs to be referenced. Could you show me an example? Rami R 07:32, 22 October 2006 (UTC)

Godwin's Law?

Is there any relevance to Godwin's Law in the "See also" section, or should it be removed? Rodo2 08:04, 25 October 2006 (UTC)

Definitely no relevance. I am removing it.--Dylan Lake 23:10, 25 October 2006 (UTC)

Shoes example

The example about waking up with a headache after sleeping with shoes on is really an example of post hoc; shouldn't the first example be more specific to this article? --Tardis 06:52, 31 October 2006 (UTC)

Remove Monty Python Example?

As much as I love Monty Python, the Monty Python example in this article is confusing, and I don't think it helps to illustrate the "Correlation does not imply causation" point. I'm thinking about replacing it with a simpler example from popular culture/news/etc. Would anyone care to comment? Claus Aranha 08:55, 31 October 2006 (UTC)