Correlation vs. Causation - Are interpretations from correlations always bad?

Aditya Kuvalekar makes an insightful point in his recent blog post on correlation vs. causation arguments.
“Correlation is not causation,” “Anecdotal evidence isn’t really an evidence”! I am sure many of us have had these statements thrown at us at some point of time in debunking any argument that we make. True as these statements are, without a doubt, I think we have come dangerously close to interpreting these as “Correlation implies no causation” or “Anecdotes are false”.
I completely agree with this and am glad that someone has made this point. I observe two categories of correlation vs. causation interpretations that people generally make.

The first category is where interpretations are made from a couple of data points. For example, it starts raining just as you land at an airport, so you infer, "it rained because I landed at the airport." This sounds silly, but the irony is that most of us have done something similar at some point, the most common context being cricket. While watching a tense match in a group, if someone joins and a batsman of the team you are supporting gets out, it is common for people to accuse the newcomer of bringing bad luck. Other examples include blaming the fall of a wicket on someone changing the TV channel, and so on. During one such match, there was a series of tweets on my timeline where people were saying, "I did 'X' and the wicket fell." I had compiled them, but unfortunately I have lost them.

The second category is where interpretations are made from a trend. Let us say that you visited a school. It turned out to be a good school, by whatever your definition of good is. You also notice that there is a red pole on top of the school. Next, you visit another school, which is bad, and observe that there is no red pole on top of it. Similarly, you visit a few other schools, find that they are good, and see red poles on top of them too. Now, when you come across a new school with a red pole on it, you infer that this school might be good.

The question now is: is this unacceptable and wrong? The answer is that it depends on the purpose of the interpretation. In the real world, we rarely have complete information to establish causality, nor can we wait until causality is established before taking a decision. We are forced to take decisions based on the incomplete information that we have. In those cases, correlations are of great help. In the above example, if one were asked casually to guess the quality of a school, one would guess that the school with the red pole is a good school. But if the same question were asked for the purpose of school certification by governments, then using the red pole as a criterion would be a bad idea.
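The casual guess in the school example can be sketched in a few lines of Python. This is only an illustration: the visit log below is made up, and the names `visits` and `share_good` are hypothetical. It shows how a correlation gives a usable rule of thumb without saying anything about why the pole and the quality go together.

```python
# Hypothetical visit log: (has_red_pole, is_good).
# The values are invented for illustration; they are not real data.
visits = [
    (True, True),
    (False, False),
    (True, True),
    (True, True),
    (False, False),
    (True, False),
]


def share_good(visits, has_pole):
    """Fraction of visited schools with the given pole status that were good."""
    matching = [good for pole, good in visits if pole == has_pole]
    if not matching:
        return None  # no schools of this kind were visited
    return sum(matching) / len(matching)


with_pole = share_good(visits, True)
without_pole = share_good(visits, False)

print("Share of good schools with a red pole:", with_pole)
print("Share of good schools without a red pole:", without_pole)
```

A gap between the two shares is enough for a casual guess about the next school. It says nothing about whether putting up a red pole would improve a school, which is why it would not do as a certification criterion.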

The same holds for the cricket matches discussed above. If the purpose is to tease friends casually, it may be fine, but if the purpose is to turn the interpretation into a rule or an act, then it is wrong. In development economics, it is very common to come across situations where interpretations have to be made from two data points. Suppose you observe that the growth rates of two countries are different, and you also observe a significant difference in another parameter 'y'. It is not wrong to flag this parameter and say that the growth of these countries seems to be different and, interestingly, they differ in parameter 'y' too. This gives us a starting point to explore the reasons behind the difference in growth rates. But if this interpretation is used to make a policy of increasing 'y' by putting all resources into it, with significant opportunity costs, then it is wrong.

In summary, correlations aren't bad in themselves; it depends on the context in which we use them. We take many decisions based on mere correlations in real life. Frankly, no one does controlled studies for everything before taking a decision. Let us appreciate the utility of correlations and not discard them completely.

I would recommend reading my other post on a similar topic, How much rigor is rigorous? Also see this answer on the Stack Exchange forum: When can correlation be useful without causation?


  1. Applying the same funda in a different context, I think we use "correlation" very liberally when judging people - not distant politicians, but people around us. People make decisions as important as choosing life partners based on mere correlations. How simple would the world have been if correlation had always meant causation - just a fantasy? ;-)

  2. Thanks for this. I came across this xkcd cartoon that I thought you might like: :)