"Inequality at birth is neither just nor unjust. What's just and unjust is the way institutions deal with it" - John Rawls
"I always had a certain dislike for general principles and abstract prescriptions. I think it's necessary to have an "empirical lantern" or a "visit with the patient" before being able to understand what is wrong with him. It is crucial to understand the peculiarity, the specificity, and also the unusual aspects of the case" - Albert O. Hirschman
The Third Caveat of Randomized Controlled Trials: Intra-hypothesis Validity
(This post was first published in Logos, The Takshashila Institute's blog)
Randomized Controlled Trials (RCTs) have gained significant popularity in recent years. This blog post explores one of the nuances in interpreting the results of such evaluations.
For beginners, an RCT is a type of experiment used to test causal links. The subjects of the experiment are split into two groups: the ‘Control group’, which doesn’t get the treatment/intervention, and the ‘Treatment group’, which receives it. This division is common in such experiments, but in RCTs people are assigned to the two groups randomly. This is to ensure: (i) that there are no systematic differences, observable or unobservable, between the groups at the beginning of the study; (ii) that any changes in external factors affect both groups alike. This enables the researchers to reliably attribute any difference between the outcomes of the control group and the treatment group to the intervention alone.
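To make the logic of random assignment concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the baseline scores are simulated, the "true effect" is a number we choose ourselves, and the function name `run_rct` is hypothetical. It only shows that, with random assignment, the difference in group means recovers the effect we built in.

```python
import random
import statistics


def run_rct(baseline_scores, true_effect, seed=0):
    """Randomly split subjects into two groups and return the estimated effect.

    The estimate is the difference between the mean outcome of the
    treatment group and the mean outcome of the control group.
    """
    rng = random.Random(seed)
    indices = list(range(len(baseline_scores)))
    rng.shuffle(indices)  # random assignment: first half gets the treatment
    treatment_ids = set(indices[: len(indices) // 2])

    treated, control = [], []
    for i, score in enumerate(baseline_scores):
        if i in treatment_ids:
            treated.append(score + true_effect)  # outcome with the intervention
        else:
            control.append(score)  # outcome without the intervention
    return statistics.mean(treated) - statistics.mean(control)


# Simulated (not real) baseline test scores for 10,000 subjects.
data_rng = random.Random(42)
baseline = [data_rng.gauss(50, 10) for _ in range(10_000)]

estimate = run_rct(baseline, true_effect=5.0)
placebo_estimate = run_rct(baseline, true_effect=0.0)
print(f"Estimated effect: {estimate:.2f}")
print(f"Estimated effect when there is none: {placebo_estimate:.2f}")
```

Because the groups are formed by chance, they start out alike on average, so the first estimate should land close to the chosen effect of 5 and the second close to zero.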
There are certainly limitations to RCTs. One such limitation is that an intervention which yields results in one context might not yield the same results in another; a successful intervention in an American setting might not be as effective in an Indian setting due to differences peculiar to those parts of the world. The extent to which the results of an experiment carry over to other contexts is referred to as its external validity; a study whose results don’t hold in other contexts is said to lack external validity. Another limitation concerns the ‘implementing organization’; it was only recently realized that who administers the RCT also matters. In some cases, RCTs administered by NGOs were found to show results not replicated by similar RCTs administered by the government. This could possibly be attributed to the logistical and administrative constraints of government-run studies. These are the two most important caveats about RCTs that are generally discussed.
The third, and probably the trickiest, caveat is what this blogger likes to call Intra-hypothesis Validity. This is, admittedly, not a particularly catchy or glamorous name; any suggested replacements would be more than welcome.
Intra-hypothesis validity can be best explained through the following scenario. Suppose that you are conducting an RCT and that there aren’t problems with the external validity or the implementing organization (i.e. you). The causal link that you are seeking to evaluate is the impact of ‘teacher training’ on the learning outcomes of students. To conduct the RCT you will need to design a ‘training module’ with which to actually train the Treatment group of teachers. In this scenario the training module that the teachers go through is the ‘product’, or, in other words, the particular manifestation of the original ‘theme’ – teacher training.
Further suppose that after you complete your evaluation you find out that there wasn’t any increase in the learning outcomes of the students. Which of the following options will be your interpretation?
Teacher trainings don’t increase learning outcomes of children (associating the results with the theme).
The course curriculum and the mode of training weren’t effective (associating the results with the product).
Is the problem with the ‘theme’ or is it with the way the theme was implemented, i.e. the ‘product’? The correct answer is that we cannot know with this limited information. However, such results are often interpreted as the failure of the theme; most examiners would hold that ‘teacher trainings don’t increase the learning outcomes of students’. Some people would probably then use the study to support the position that ‘teacher trainings aren’t the need of the hour’ and that ‘spending money on them isn’t of great utility’.
The biggest problem with this interpretation is its inherent assumption that there is only one way in which teacher training courses can be designed and delivered. Attributing the failure of a single method of training teachers to the theme of ‘teacher training’ as a whole assumes that all alternative methods would also fail. This flies in the face of pedagogy, as different results could well be attained if the content and delivery of the training sessions were changed.
In short, it is an easy mistake to attribute the failure of the product to the whole theme. This danger can be avoided by examining what I call the ‘intra-hypothesis validity’, i.e. validity within a hypothesis: whether different manifestations of the same theme/hypothesis yield results that agree. It is important to realize that we aren’t discussing different rates of usage of the product here but rather different products that are based on the same theme.
A question that naturally arises is ‘How do we know if the RCT is evaluating the theme or the product?’. The easiest test is to see how much intellectual effort is required to administer the RCT. Usually, when the RCT is evaluating the theme, very little effort will be required to turn it into a form deliverable to the test subjects, e.g. direct reservations for women. The ‘reservation for women’ theme needs little change; the only differences would be in the modes of delivery, and those too are generally predetermined by convention. On the other hand, if a significant amount of work has to be done to convert the theme into an administrable form, then the RCT is probably evaluating the product. In the given example of ‘teacher training’, the theme needs to be converted into a ‘training module’ before it can be delivered to teachers. Though the purpose of the RCT might be stated as an evaluation of ‘teacher training’, it is, in essence, the ‘training module’ that is being evaluated.
I personally have reservations about micro finance evaluations, based on the same logic. Micro finance is a broad notion. It can be delivered in multiple forms: different types of schemes with different conditions and metrics, which are the 'products' in our discussion. The results of evaluating a few of these types of programmes can't be generalized to the whole theme of 'micro finance'.
Another related issue is the heterogeneity of the impact. Instead of simply saying that an intervention didn't work, it is useful to see whom it did work for, and to use this evidence to customize and target the product accordingly rather than dismissing it altogether.
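The point about heterogeneity can be illustrated with a small sketch. The data below are simulated, not taken from any study, and the subgroup labels ("novice" and "experienced") and the size of the effect are assumptions chosen purely for illustration. The sketch builds in an effect for one subgroup only, then compares the pooled estimate with the estimates for each subgroup.

```python
import random
import statistics


def mean_difference(records):
    """Treated-minus-control difference in mean outcome for a set of records."""
    treated = [r["outcome"] for r in records if r["treated"]]
    control = [r["outcome"] for r in records if not r["treated"]]
    return statistics.mean(treated) - statistics.mean(control)


def subgroup_effects(records):
    """Estimate the effect separately within each subgroup."""
    names = sorted({r["subgroup"] for r in records})
    return {
        name: mean_difference([r for r in records if r["subgroup"] == name])
        for name in names
    }


# Simulated data (an assumption for illustration): the intervention adds
# 8 points for the "novice" subgroup and nothing for the "experienced" one.
rng = random.Random(7)
records = []
for i in range(8000):
    subgroup = "novice" if i % 4 == 0 else "experienced"
    treated = rng.random() < 0.5  # random assignment to treatment
    effect = 8.0 if (treated and subgroup == "novice") else 0.0
    records.append(
        {"subgroup": subgroup, "treated": treated, "outcome": rng.gauss(50, 10) + effect}
    )

pooled = mean_difference(records)
effects = subgroup_effects(records)
print(f"Pooled estimate: {pooled:.2f}")
for name, value in effects.items():
    print(f"{name}: {value:.2f}")
```

In this setup the pooled estimate looks modest because it averages over a large subgroup with no effect, while the split shows the whole effect sitting in the smaller subgroup. Real analyses of this kind need care, since slicing data into many subgroups after the fact can turn up differences that arise by chance.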
What can be done about this?
There are two aspects to any product: its ability to perform the job and its usability. A product can be extremely good at its job, but if people cannot use it, then it is as good as not having the product at all.
Collect as much data as possible on the product. Administer qualitative surveys to learn about the usability of the product: do people find it easy to use?
It is often useful to first test such products through pilot cases so that the product can be fine-tuned until there is enough anecdotal evidence to justify a more rigorous evaluation. This, at the very least, ensures that the selected product is likely to do its job.
Evaluate different versions of the product or different products based on the same theme before arriving at a conclusion. However, this can be infeasible in practice owing to the costs of administering RCTs.
Be mindful of the intra-hypothesis validity while interpreting such evidence.
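The suggestion to evaluate several products based on the same theme can be sketched as a multi-arm trial. This is a simulation under stated assumptions, not a description of any real evaluation: the arm names `module_a` and `module_b`, the effect sizes, and the function `multi_arm_trial` are all hypothetical. One module is given no effect and the other a positive effect, to show what a single-product evaluation would miss.

```python
import random
import statistics


def multi_arm_trial(true_effects, n_subjects=9000, seed=1):
    """Randomly assign subjects to a control arm or one of several products.

    `true_effects` maps each product name to the effect we build into the
    simulation. Returns the estimated effect of each product versus control.
    """
    rng = random.Random(seed)
    arms = ["control"] + list(true_effects)
    outcomes = {arm: [] for arm in arms}
    for _ in range(n_subjects):
        arm = rng.choice(arms)  # random assignment across all arms
        outcome = rng.gauss(50, 10) + true_effects.get(arm, 0.0)
        outcomes[arm].append(outcome)

    control_mean = statistics.mean(outcomes["control"])
    return {
        arm: statistics.mean(outcomes[arm]) - control_mean for arm in true_effects
    }


# Two hypothetical training modules built on the same theme.
estimates = multi_arm_trial({"module_a": 0.0, "module_b": 6.0})
for module, value in estimates.items():
    print(f"{module}: estimated effect {value:.2f}")
```

An evaluation that had tested only `module_a` would have found nothing and might have written off teacher training as a whole, even though `module_b`, built on the same theme, works in this simulated world.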