Types of reforms: Shock therapy vs Physiotherapy

Reforms in any sector can be broadly grouped into two categories: shock therapy and physiotherapy.

Shock Therapy: In medicine, shock therapy is essentially a strong external stimulus used to treat an ailment. In policy, reforms that require enacting a new law, amending an existing one, or changing certain rules and regulations fall into this category.
  1. These are one-time reforms. The main effort lies in getting the particular rule changed.
  2. One can advocate for them from outside the system and get them done through persuasion or pressure.
Laws to prevent the entry of criminals into politics, the removal of unnecessary restrictions on businesses, etc. come under this category.

Physiotherapy: Physiotherapy is generally used to strengthen injured body parts. It involves regular, repeated exercise. The policy analogue would be tasks such as improving service delivery and getting staff in government offices to do their work. These types of reforms
  1. are about deepening existing procedures
  2. can be boring
  3. are time-consuming
  4. show no visible outcomes in the short term
  5. require regularity and patience
  6. can only be carried out by those inside the governance system

The meaning of test scores - Evaluations in education

If you go to a gym to lose weight, you would probably weigh yourself periodically. Your weight is the metric for measuring the result of going to the gym. Similarly, evaluations measuring the impact of interventions also use certain metrics: if the intervention aims to increase incomes, the evaluation looks at people's incomes, and so on. In education, we often hear of an increase in test scores (expressed in 'standard deviations') as the result of an intervention. Unlike weight, however, 'test scores' isn't a straightforward term: it can mean different things depending on how the test is designed, administered and accounted for in the analysis. This post lists the various possible versions of 'test scores'.
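To make 'expressed in standard deviations' concrete, here is a minimal sketch of one common convention: the difference in mean scores between treatment and control groups, divided by the standard deviation of the control group's scores. This is an assumption for illustration; evaluations also use other denominators (for example, a pooled standard deviation), and the function name and all scores below are hypothetical, not real data. The two cases show the same 1-point raw gain translating into very different 'effects' depending on how spread out the scores are.

```python
from statistics import mean, stdev


def standardized_effect(treatment, control):
    """Difference in mean scores, in units of the control group's sample SD.

    Assumption: the control-group SD is used as the denominator; other
    evaluations may use a pooled SD instead.
    """
    return (mean(treatment) - mean(control)) / stdev(control)


# Hypothetical scores on a 10-question test (illustrative only).
control = [4, 5, 6, 5, 4, 6, 5, 5]           # tightly bunched scores, mean 5
treatment = [s + 1 for s in control]         # every student gains 1 point

wide_control = [1, 3, 5, 7, 9, 5, 5, 5]      # widely spread scores, mean 5
wide_treatment = [s + 1 for s in wide_control]  # same 1-point raw gain

print(round(standardized_effect(treatment, control), 2))            # 1.32
print(round(standardized_effect(wide_treatment, wide_control), 2))  # 0.42
```

The same raw improvement looks about three times larger when scores are tightly bunched, which is one reason a 'standard deviation' figure depends on how the test was designed and who took it.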

Tests of varying difficulty and length: The scores could be based on very simple questions like 'identify the letter', or on questions of varying complexity that test different skills. A test paper could have anywhere from 5 to 50 questions. These tests aren't standardized, and the assessment papers differ across evaluations. An intervention measured on simple questions can show a large increase in scores, whereas the same intervention measured on complex questions might not show the same impact.

Live the data - Proposing a new method of research in development

Research is an exploration of truth. It primarily needs two things:
  1. An understanding of the context.
  2. The ability to analyze: to prove or disprove, make connections with existing knowledge, and derive insights and wisdom.
People who are suffering from the problem have a very good understanding of the context, but they mostly have no need to analyze it critically and derive insights. The researcher, on the other hand, has the second ability and the motivation to derive insights, but lacks a complete understanding of the context. To bridge this gap, surveys are administered in which people are asked about their feelings, the reasons for their actions, and so on.

The fifth caveat of evaluations - Average effects can be misleading

There was a very remote village without access to good educational facilities. A small for-profit company had developed a technology tool that can teach children without a teacher. It set up learning centers in a couple of villages. Some interested parents enrolled their children in these centers, and over time it was observed, both by the center staff and from the students' school grades, that the students were getting very good at Maths.

The company then wanted a rigorous evaluation of its program. An external evaluation agency conducted a Randomized Controlled Trial (RCT) in the same location and reported no significant impact of the program on the students' test scores.

What do we make of this evidence?
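One way an average can hide a real effect can be sketched with a few made-up numbers. The assumption here is purely hypothetical and not taken from the story: of 100 students offered the program, only 15 (say, those whose parents are keen) actually use it, and each of them gains 10 points while the rest gain nothing. Averaged over everyone offered the program, the effect looks small even though it is large for those who engaged.

```python
from statistics import mean


def average_effect(gains):
    """Mean gain across all students in a group."""
    return mean(gains)


# Hypothetical numbers, not from any real evaluation.
N_OFFERED = 100        # students offered the program
N_ENGAGED = 15         # students who actually use it (assumption)
GAIN_IF_ENGAGED = 10.0 # gain in test score for each engaged student (assumption)

# Engaged students gain GAIN_IF_ENGAGED; everyone else gains nothing.
gains = [GAIN_IF_ENGAGED] * N_ENGAGED + [0.0] * (N_OFFERED - N_ENGAGED)

print(average_effect(gains))              # 1.5, averaged over everyone offered
print(average_effect(gains[:N_ENGAGED]))  # 10.0, among those who engaged
```

In a real trial the diluted average would also have to be distinguished from noise, so a program that works well for a small group can easily be reported as having no significant impact overall.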

The fourth caveat of evaluations - Necessary vs Sufficient

In an earlier post, we discussed the nuances of evaluating a theme vs evaluating a product. In this post, we discuss the causal framework used in interpreting evaluations. Most caveats about evaluations aren't necessarily problems with the evaluation design but rather with the communication and interpretation of the results.

In a typical evaluation, the outcome of interest depends on multiple factors. Evaluations can be broadly classified into two categories based on how these factors are dealt with.

  1. Evaluations of programs that span multiple factors: Do cash transfers increase children's learning outcomes?
  2. Evaluations of individual factors: Does providing textbooks to children improve learning outcomes?
In the second case, textbooks and learning materials are part of the process of education. In the first, cash transfers aren't a direct factor in the process of education; they operate by indirectly influencing several other factors that are. This post discusses one of the caveats in interpreting evidence from evaluations of the second kind.