Stop saying "correlation is not causation" unnecessarily

Lately, the phrase "correlation is not causation" is being thrown around loosely, usually to cast aspersions on evidence one disagrees with. Invoking it shifts the entire burden onto the opponent while the critic can sit back and nitpick. We have reached a stage where it is treated as "correlation can never mean causation."

Correlation gives us a good hypothesis to explore. It should by no means be discarded merely because it is a correlation.

Correlation can be causation. If one refuses to accept a causal claim because of possible confounding factors, one needs to name those confounding factors.

There is also a common misconception that causation can only be proven by randomized controlled trials (RCTs). This, too, is incorrect. Causation can be proven if one can demonstrate an appropriate mechanism through which the correlation occurs.

For instance, if you cite a correlation between ice cream sales and the number of books published in a year, it hardly deserves attention, because no plausible mechanism could explain that correlation. On the other hand, a correlation between ice cream sales and the number of ENT cases does deserve attention, because one can clearly see a mechanism through which it could happen. If the mechanism by which ice cream causes throat ailments, which in turn lead to ENT cases, can be demonstrated, that is as good as an RCT.

For a good illustration of this technique, look at Raj Chetty's work on social mobility in the US. He starts with the observation that some areas have high social mobility and others have low mobility. There is a possible confounding factor here: perhaps certain kinds of people tend to live in certain places, and it is the people, not the place, that produce higher mobility. The way to test this is to look at people who have moved between cities and see how their mobility changes. If people who move from a low-mobility area to a high-mobility area end up with higher mobility, one can infer that mobility is driven by the place, not the people. Further, if the gain in mobility is proportional to the number of years of exposure to the high-mobility area, that is further evidence. All of this establishes causality, and there is no RCT here.

It also illustrates the importance of theory. Theory gives us the mechanisms through which the effect occurs.

Ashish Jha does a good job of reminding us of these arguments in the context of his recent study.

On needing an RCT to prove causation
Remember the RCT that assigned people to smoking (versus not) to see if it really caused lung cancer?  Me neither…because it never happened.  So, if you are a strict “correlation is not causation” person who thinks observational data only create hypotheses that need to be tested using RCTs, you should only feel comfortable stating that smoking is associated with lung cancer but it’s only a hypothesis for which we await an RCT.  That’s silly.  Smoking causes lung cancer.
On alternative explanations: confounding factors
There must be an alternative explanation! There must be confounding!  But the critics have mostly failed to come up with what a plausible confounder could be.  Remember, a variable, in order to be a confounder, must be correlated both with the predictor (gender) and outcome (mortality).  We spent over a year working on this paper, trying to think of confounders that might explain our findings.  Every time we came up with something, we tried to account for it in our models.  No, our models aren’t perfect. Of course, there could still be confounders that we missed. We are imperfect researchers. But that confounder would have to be big enough to explain about a half a percentage point mortality difference, and that’s not trivial.  So I ask the critics to help us identify this missing confounder that explains better outcomes for women physicians
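The point in the quote, that a confounder must be related to both the predictor and the outcome, can be made concrete with a small simulation. This is a minimal sketch on entirely hypothetical data: a made-up common cause `z` drives the predictor `x`, and with strength `z_to_y` also drives the outcome `y`, while `x` itself has no effect on `y`. The naive regression slope of `y` on `x` comes out clearly non-zero; adjusting for `z` (here via Frisch-Waugh-Lovell residualization, a standard regression identity) brings it to roughly zero. When `z` drives only `x` (`z_to_y=0`), it is not a confounder and the naive slope is already close to zero.

```python
import random


def slope(x, y):
    """OLS slope of y on x (single predictor, with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def residualize(v, z):
    """Residuals of v after regressing it on z (removes z's linear influence)."""
    b = slope(z, v)
    mz, mv = sum(z) / len(z), sum(v) / len(v)
    return [vi - (mv + b * (zi - mz)) for vi, zi in zip(v, z)]


def adjusted_slope(x, y, z):
    """Slope of y on x holding z fixed (Frisch-Waugh-Lovell: residual on residual)."""
    return slope(residualize(x, z), residualize(y, z))


def simulate(z_to_y, n=20000, seed=0):
    """Hypothetical data: z drives x; z drives y with strength z_to_y; x has NO effect on y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [z_to_y * zi + rng.gauss(0, 1) for zi in z]
    return x, y, z


# Case 1: z is related to both x and y, so it is a confounder.
x, y, z = simulate(z_to_y=2.0)
print("naive slope:   ", round(slope(x, y), 2))             # spurious, around 1
print("adjusted slope:", round(adjusted_slope(x, y, z), 2))  # close to 0

# Case 2: z is related to x only, so it cannot create an x-y association.
x, y, z = simulate(z_to_y=0.0)
print("naive slope:   ", round(slope(x, y), 2))             # already close to 0
```

Adjusting for `z` only works here because `z` is observed. An unmeasured confounder would leave the naive slope biased, which is exactly why the critic has to say what that confounder is.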
Another important aspect is the potential cost of the effect. If smoking is correlated with lung cancer, it is better to err on the side of caution and act as if it causes cancer. The stakes are too high to wait until an RCT proves causation.

The same applies to environmental threats. When the Montreal Protocol was negotiated in 1987, there was no indisputable evidence that CFCs were damaging the ozone layer; there was only preliminary evidence. The indisputable evidence came years later. Countries showed wisdom in recognizing the threat and taking precautions well beforehand.

As evidence becomes increasingly central to public debate, there is unfortunately a trend toward misusing its interpretation. The most common form is simply to point fingers at an opponent's evidence without taking on any burden of proof. Ashish's article is a gentle reminder to call out such arguments.

Correlation is a double-edged sword. It cuts both ways: it can be over-read as causation, and it can be dismissed too readily. One needs to exercise caution and diligence in interpreting it.
