A Note on the Normalization of Scores in CAT 2014


This note illustrates the principles on which the CAT 2014 committee intends to normalize candidates' scores across the four sessions of CAT 2014. It is based on T.I.M.E.’s interpretation of the (limited) official information shared by the IIMs on the official CAT 2014 website, together with the principles of statistical normalization. The official information shared by the IIMs is available on that website.

The key objective of this note is to help you, whether you are interested or just plain curious, make sense of the IIMs' rather cryptic statement that “The Normalization process to be implemented shall adjust for location and scale differences of score distributions across different forms and the scaled scores obtained by this process shall be converted into percentiles for purposes of shortlisting.” If you have wondered what exactly the IIMs mean by terms like location and scale, read on.

Before you start, please note that the exact formula the IIMs will use to normalize CAT 2014 scores is not yet known. Any formula mentioned in this document is given solely to illustrate the fundamental concepts of normalization and the process the IIMs are likely to use.

Introduction:

Firstly, under ideal conditions, the scores of test takers in an exam like CAT can be expected to follow a normal distribution. Any normal distribution is fully characterized (defined) by two statistical parameters: the mean, “µ”, and the standard deviation, “σ”. In simple terms, the mean represents the location of the curve when it is plotted on the coordinate plane, and the standard deviation represents its spread, or scale. The figure given below illustrates how the mean and standard deviation affect the position and shape of the graph of a normal distribution.
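To make “location” and “scale” concrete, the short Python sketch below evaluates the standard normal density formula for a few hypothetical (µ, σ) pairs; the values are purely illustrative and are not CAT data. Changing µ moves the peak along the X-axis without changing the shape of the curve, while doubling σ halves the height of the peak and spreads the curve out.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Hypothetical distributions: (40, 20) vs (60, 20) differ only in location;
# (40, 20) vs (40, 10) differ only in scale.
for mu, sigma in [(40, 20), (60, 20), (40, 10)]:
    peak = normal_pdf(mu, mu, sigma)  # the curve is highest at x = mu
    print(f"mu={mu:>3}, sigma={sigma:>3}: peak at x={mu}, peak height={peak:.4f}")
```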

Interpretation:

For each of the four test sessions in CAT 2014, one such distribution (graph) of candidate scores (in a particular section or in total) can be drawn. In each graph, the possible score levels (sectional or overall) are plotted along the X-axis, and the proportion of students at each score level is plotted along the Y-axis. Each of these distributions can be expected to closely approximate a normal distribution. Across the four sessions, however, the distributions can be expected to differ in the exact values of the two characteristic parameters mentioned above, i.e., the mean (location), “µ”, and the standard deviation (scale), “σ”. The differences could be slight or significant, depending on how close the question papers are in difficulty and on how similar the test-taking groups# are across the four sessions.
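As a rough sketch of what “location and scale per session” means in practice, the Python below computes the mean and standard deviation of two hypothetical lists of Section-I scores, one per session. The scores are invented for illustration, and the use of the population standard deviation (`statistics.pstdev`) is an assumption; the IIMs have not said which variant they use.

```python
import statistics

# Hypothetical Section-I scores for two sessions (illustrative only, not CAT data).
session_scores = {
    "Session 1": [10, 22, 35, 46, 46, 58, 70, 81],
    "Session 2": [8, 20, 31, 44, 44, 55, 66, 84],
}

def location_and_scale(scores):
    """Return (mean, population standard deviation) of a list of scores."""
    return statistics.mean(scores), statistics.pstdev(scores)

for session, scores in session_scores.items():
    mu, sigma = location_and_scale(scores)
    print(f"{session}: mean = {mu:.2f}, standard deviation = {sigma:.2f}")
```

Each session gets its own (µ, σ) pair; it is these per-session pairs that a normalization formula has to reconcile.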

These differences in mean and standard deviation (also referred to as differences in location and scale) can be reconciled with a mathematical formula or transformation, and several approaches are possible. The exact formula could be a simple one or a relatively complicated one, depending on several finer considerations that the CAT 2014 admission committee may be concerned about.

Illustrations:

Given below are two sample formulae that could be used for performing normalization of scores based on the above concepts. These formulae are given only for the purpose of illustration.

The first example is the most basic formula possible based on the above concepts, while the second gives the formula(s) used by the GATE to normalize scores across multiple slots. Note that the information shared by the IIMs mentions the GATE as one of several reputed entrance exams that normalize scores for tests conducted across multiple slots.

Example 1: For simplicity, the first example uses the most elementary approach/formula possible. The formula could be as simple as the one given below:

Normalized score of a candidate = (Score of the candidate - Mean of the distribution of his session) / Standard deviation of the distribution of his session

As an example, suppose that the mean score of all CAT 2014 test takers in, say, Section-I in Session 1 is 46 marks, the standard deviation is 24 marks, and student X (who appeared in Session 1) scores 84 in Section-I. Then the normalized score of student X in Section-I is (84 – 46)/24 ≅ 1.5833.

Similarly, let the mean Section-I score of all CAT 2014 test takers in Session 2 be 44 marks and the standard deviation 22 marks, and let student Y (who appeared in Session 2) also score 84 in Section-I (the same as student X, for comparison). Then the normalized score of student Y in Section-I is (84 – 44)/22 ≅ 1.8182.
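The arithmetic for students X and Y can be reproduced in a few lines of Python. `normalized_score` is a hypothetical helper name that simply implements the Example 1 formula; it is not an official IIM function.

```python
def normalized_score(score: float, session_mean: float, session_sd: float) -> float:
    """Example 1 formula: (score - session mean) / session standard deviation."""
    return (score - session_mean) / session_sd

# Student X: Session 1 (mean 46, SD 24); student Y: Session 2 (mean 44, SD 22).
# Both have the same raw Section-I score of 84.
z_x = normalized_score(84, 46, 24)
z_y = normalized_score(84, 44, 22)
print(f"X: {z_x:.4f}, Y: {z_y:.4f}")  # X: 1.5833, Y: 1.8182
```

The same raw score earns Y the higher normalized score because Y's session had a lower mean and a smaller spread, so 84 stands further above that session's typical performance.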

The normalized scores thus obtained for all the students across all four sessions (in a particular section or the total) will then be used to calculate the sectional and overall percentiles.
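The step from pooled normalized scores to percentiles can be sketched as follows. The percentile definition used here (the percentage of all candidates, across sessions, whose normalized score is at or below the candidate's) is an assumption for illustration; the IIMs have not published their exact definition, and the scores below are hypothetical.

```python
from bisect import bisect_right

def percentiles(normalized_scores):
    """Percentile of each candidate = % of all candidates (pooled across sessions)
    with a normalized score at or below that candidate's score.
    This definition is an illustrative assumption."""
    ordered = sorted(normalized_scores)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, z) / n for z in normalized_scores]

# Hypothetical normalized scores pooled from all sessions.
pooled = [-1.2, -0.4, 0.0, 0.3, 1.5833, 1.8182, 0.9, -0.7]
for z, p in zip(pooled, percentiles(pooled)):
    print(f"normalized score {z:+.4f} -> percentile {p:.2f}")
```

Because all candidates are ranked on one common normalized scale, the session a candidate happened to take no longer matters for the percentile.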

In the above case, since the normalized scores are fractional and not easy for an average reader to compare intuitively, they could be multiplied by a constant and rounded to integers, purely for reporting purposes. For example, if the constant is chosen as 50, the final scaled scores reported for students X and Y would be 79 (i.e., 1.5833 * 50 ≅ 79.167, rounded off to 79) and 91 (i.e., 1.8182 * 50 ≅ 90.909, rounded off to 91) respectively.
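A minimal sketch of this reporting step, using the document's illustrative constant of 50. `reported_score` is a hypothetical helper, and the rounding rule is an assumption: Python's built-in `round` rounds exact halves to the nearest even integer, whereas the document does not specify a rule.

```python
def reported_score(normalized: float, constant: float = 50.0) -> int:
    """Scale a normalized score by a constant and round it, for reporting only."""
    return round(normalized * constant)

z_x = (84 - 46) / 24   # student X, Session 1
z_y = (84 - 44) / 22   # student Y, Session 2
print(reported_score(z_x), reported_score(z_y))  # 79 91
```

Only the reported figure is rounded; the unrounded values `z_x` and `z_y` are what would feed the percentile calculation.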

Note that percentiles are usually calculated from the unrounded scores to ensure fairness; rounding is done solely to simplify the reporting of scaled scores.

Several other modifications may also be made to the normalization/scaling formula to address practical concerns such as negative scores and outliers.

Example 2: As mentioned in the official note by the IIMs, several other well-recognized and reputed entrance exams, such as the GATE, also use the above concepts (with some customization/modification).

The formula used by the GATE till 2012 (when the exam was conducted in a single slot each year, and normalization was done across multiple years rather than across multiple slots within the same year) is given below. This formula is closely based on the basic formula in Example 1 above.

where,

m = Marks obtained by the candidate,

a = Average of marks of all candidates who appeared in that subject, in that year, with marks less than zero converted to zero

S = Standard deviation of marks of all candidates who appeared in that subject, in that year, with marks less than zero converted to zero

a_g = Global average of marks of all candidates who appeared across all subjects in current and past 3 years, with marks less than zero converted to zero

s_g = Global standard deviation of marks of all candidates who appeared across all subjects in current and past 3 years, with marks less than zero converted to zero
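Read together, these definitions describe a rescaling: the candidate's standardized mark within his own subject and year, (m − a)/S, exactly as in Example 1, is re-expressed on the global distribution with mean a_g and standard deviation s_g. The Python below is a minimal sketch assuming the form a_g + s_g * (m − a)/S. The official GATE formula may also apply a scaling constant that is not modelled here, the helper names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def clamp_negative(marks):
    """Convert marks below zero to zero, as in the definitions of a, S, a_g and s_g."""
    return [max(0.0, m) for m in marks]

def gate_style_normalized_mark(m, subject_marks, global_marks):
    """Assumed form a_g + s_g * (m - a) / S (a sketch, not the official formula)."""
    subject = clamp_negative(subject_marks)
    pooled = clamp_negative(global_marks)
    a, S = statistics.mean(subject), statistics.pstdev(subject)        # this subject, this year
    a_g, s_g = statistics.mean(pooled), statistics.pstdev(pooled)      # global reference
    return a_g + s_g * (m - a) / S

# Hypothetical marks: one subject in one year, and the pooled global set.
subject_marks = [-5.0, 10.0, 20.0, 30.0]   # the -5 is treated as 0
global_marks = [0.0, 20.0, 40.0, 60.0]
print(gate_style_normalized_mark(25.0, subject_marks, global_marks))
```

Here a mark of 25, about 0.89 standard deviations above its subject average of 15, lands at roughly 50 on the global scale, which is also about 0.89 global standard deviations above the global average of 30.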

The formula used in the GATE 2014 is relatively more complex and is given in an appendix to this document.

Concluding remarks:

As per T.I.M.E.’s understanding, an approach to normalizing candidate scores based on these principles should be fair and transparent. We consider this a very positive development in the larger interest of the students, and any such move by the IIMs is very welcome.

Appendix

Given below is the relatively more complex formula used for normalization of GATE 2014 scores:
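As we understand it, the GATE 2014 multi-session scheme is a linear mapping anchored at two points per session, commonly written as M̂ij = (M̄gt − Mgq)/(M̄ti − Miq) × (Mij − Miq) + Mgq. Here Mij is the raw mark of candidate j in session i, Miq and Mgq are the mean plus standard deviation of marks in session i and across all sessions, and M̄ti and M̄gt are the average marks of the top 0.1% of candidates in session i and across all sessions. The Python below is a sketch under that assumption. The helper names are hypothetical, and details such as how the top group is counted and the use of the population standard deviation are our assumptions, not the official specification.

```python
import statistics

def mean_plus_sd(marks):
    """Sum of the mean and the (population) standard deviation of a list of marks."""
    return statistics.mean(marks) + statistics.pstdev(marks)

def top_group_mean(marks, fraction=0.001):
    """Average mark of the top `fraction` of candidates (at least one candidate)."""
    k = max(1, int(len(marks) * fraction))
    return statistics.mean(sorted(marks, reverse=True)[:k])

def normalize_multisession(m_ij, session_marks, all_marks, top_fraction=0.001):
    """Linearly map a raw mark from session i onto the all-session scale.

    Session (mean + SD) maps to all-session (mean + SD), and the session's
    top-group average maps to the all-session top-group average.
    """
    m_iq = mean_plus_sd(session_marks)                   # session i: mean + SD
    m_gq = mean_plus_sd(all_marks)                       # all sessions: mean + SD
    m_ti = top_group_mean(session_marks, top_fraction)   # session i: top-group average
    m_gt = top_group_mean(all_marks, top_fraction)       # all sessions: top-group average
    return (m_gt - m_gq) / (m_ti - m_iq) * (m_ij - m_iq) + m_gq

# Tiny hypothetical example; top_fraction is raised to 0.25 only because the
# sample is so small.
session_1 = [10.0, 20.0, 30.0, 40.0]
session_2 = [20.0, 40.0, 60.0, 80.0]
all_marks = session_1 + session_2
print(normalize_multisession(40.0, session_1, all_marks, top_fraction=0.25))
```

In this toy case, the top scorer of session 1 (40 marks) is mapped to approximately 70, the average of the top two marks across both sessions. This is the same location-and-scale idea as Example 1, but with "mean + SD" and "top-group average" as the two reference points instead of the mean and the standard deviation.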

# For all practical purposes, the similarity of the four groups of candidates taking CAT 2014 in the four test sessions is achieved by randomly allocating a test session to each candidate.