Cut-off is a popular word during the admission season, especially in India. At Questionbang, mock-set-plus users often ask us about it: is my score good enough to qualify in the Joint Entrance Examination (JEE)? Hence, we decided to predict the cut-off for the next season and include it in the result analysis.
Background
Most of us have used basic regression analysis in class 12 maths, e.g., time series and forecasting. In reality, the outcome of such predictions depends on several factors, and simple regression may not be sufficient in such cases.
Let us consider our requirement – predicting cut-off marks for JEE. The table below shows cut-off scores for the last 6 years.
Let us try a simple curve fitting approach (Figure 1); it gives a prediction of 69 for the year 2019. However, the fitted curve carries no reasoning about why the points fall where they do. As we can see, the cut-off (Y) was 113 in 2013 and fell to 74 in 2018 (Table 1). Surely many factors influence the cut-off.
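A trend extrapolation of this kind can be sketched in a few lines. The helper name `extrapolate_linear_trend` is hypothetical, and the sketch uses only the two cut-offs quoted above (2013 and 2018), not the full six-year table, so it will not reproduce the value of 69 read from Figure 1.

```python
import numpy as np

def extrapolate_linear_trend(years, cutoffs, target_year):
    """Fit a straight line to (year, cut-off) pairs and evaluate it at target_year."""
    years = np.asarray(years, dtype=float)
    cutoffs = np.asarray(cutoffs, dtype=float)
    # Centre the years so the fit is numerically well behaved.
    origin = years.mean()
    slope, intercept = np.polyfit(years - origin, cutoffs, deg=1)
    return slope * (target_year - origin) + intercept

# Only the two cut-offs quoted in the text are used here (2013 and 2018).
prediction_2019 = extrapolate_linear_trend([2013, 2018], [113, 74], 2019)
print(round(prediction_2019, 1))
```

Like the curve in Figure 1, this only continues a trend over time; it says nothing about seats, difficulty or applicants.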
Let us assume the cut-offs (Table 1) are a measure of competition and hence a function of the following variables:
Number of seats available ($X_1$),
Difficulty level ($X_2$),
Number of applicants ($X_3$).
How each of these variables influences the outcome is what the regression has to estimate.
Choosing a regression method
What is regression analysis?
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables [wiki].
There are many regression techniques. They fall mainly into two categories – linear regression and non-linear regression.
In our case, the cut-off score (the dependent variable) depends on three independent variables ($X_1$: seats available, $X_2$: difficulty level, $X_3$: number of applicants) as discussed before. Assuming the relationship is linear, this is a multiple linear regression scenario.
The table below (Table 2) is an extension of Table 1 (Cut-off scores for the last 6 years) to include the number of seats available, difficulty level and the number of applicants.
About data
The data (cut-off scores, seat availability, number of applicants and difficulty level) has been gathered from online news portals. The JEE format has changed a few times over the past 20+ years. It has had a two-phase (Mains & Advanced) format since 2013, so we will use data from 2013 to 2018.
Revisiting the basics of the least squares regression method – the single independent variable case
Assume a single independent variable and a set of values as below (Table 3). Let us call these observations.
Table 3. Observations. $x$ – independent variable, $y$ – actual dependent variable.
The equation for simple linear regression is:
$$y = a + b x \qquad (1)$$
Let us compute the predictions $\hat{y}$ using the above values (Table 3):
.
In generic form, the equation for the predicted value becomes:
$$\hat{y}_i = a + b x_i \qquad (2)$$
Table 4. Observations and predictions. $\hat{y}$ – predicted value.
Let us verify the accuracy of our prediction (Table 5):
Table 5. Observations, predictions and errors. $e$ = error term.
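As you can see, we subtracted the predictions ($\hat{y}$) from the actual observations ($y$) to compute the errors ($e$). The next objective is to refit the line so that the error ($e$) is minimized.
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (3)$$
From (2) and (3),
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2 \qquad (4)$$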
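The quantity being minimized, the sum of squared differences between observations and predictions, can be computed directly for any candidate line. The sketch below assumes a line of the form a + b·x (with a as intercept and b as slope, which is an assumption here) and uses a small made-up data set, not the values of Table 3; the function names are illustrative.

```python
def predict(a, b, x):
    """Predicted value for one observation, assuming the line a + b * x."""
    return a + b * x

def sum_squared_error(a, b, xs, ys):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - predict(a, b, x)) ** 2 for x, y in zip(xs, ys))

# Made-up observations for illustration only (not the article's Table 3).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sse_guess = sum_squared_error(1.0, 1.0, xs, ys)   # an arbitrary first guess
sse_better = sum_squared_error(2.2, 0.6, xs, ys)  # a line that fits more closely
print(sse_guess, sse_better)
```

The second line gives the smaller total, which is what refitting the line is meant to achieve.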
Eq (4) is a squared error function; we need to find the coefficients a and b that minimize it. Take the partial derivatives of eq (4) with respect to a and b and set them to zero:
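For the single-variable case, this step can be sketched as follows. It assumes the fitted line has the form $\hat{y}_i = a + b x_i$, with $a$ as intercept and $b$ as slope (the roles of the two coefficients are an assumption here), and the equations are left unnumbered.

```latex
\begin{aligned}
\frac{\partial}{\partial a}\sum_{i=1}^{n}(y_i - a - b x_i)^2
  &= -2\sum_{i=1}^{n}(y_i - a - b x_i) = 0 \\
\frac{\partial}{\partial b}\sum_{i=1}^{n}(y_i - a - b x_i)^2
  &= -2\sum_{i=1}^{n}x_i\,(y_i - a - b x_i) = 0 \\[4pt]
\text{which rearrange to}\quad
\sum_{i=1}^{n} y_i &= n a + b\sum_{i=1}^{n} x_i, \qquad
\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 \\[4pt]
\text{and solve to}\quad
b &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad
a = \bar{y} - b\,\bar{x}
\end{aligned}
```

The same idea carries over to several independent variables: one partial derivative per coefficient, each set to zero.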
After substituting the above coefficients, eq (9) becomes,
(17)
We can use the above eq (17) to compute the JEE cut-off.
We will assume the following values for the year 2019:
Number of seats ($X_1$) = 36500,
Difficulty level ($X_2$) = 1 or 0,
Number of applicants ($X_3$) = 11 lakh.
A) Using eq (10), high difficulty ($X_2 = 1$)
.
B) Using eq (10), moderate difficulty ($X_2 = 0$)
.
Cut-off score range: – .
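The whole procedure (fit the coefficients, then evaluate both difficulty scenarios) can be sketched with NumPy's least squares solver. Everything numeric in the training rows below is synthetic and invented for illustration; it is not the data of Table 2, and the resulting range is not a JEE forecast. The coding 1 = high and 0 = moderate difficulty, the unit scaling (seats in thousands, applicants in lakh) and all function names are assumptions of this sketch.

```python
import numpy as np

# SYNTHETIC training rows, invented for illustration (NOT the JEE data of Table 2).
seats_thousands = np.array([30.0, 31.5, 33.0, 34.0, 35.5, 36.0])
difficulty = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])  # assumed: 1 = high, 0 = moderate
applicants_lakh = np.array([12.8, 13.5, 13.0, 12.0, 11.9, 11.4])

# Hypothetical coefficients used only to generate the synthetic cut-offs:
# intercept, seats, difficulty, applicants.
true_coeffs = np.array([50.0, -1.0, -15.0, 5.0])

def design_matrix(seats, diff, applicants):
    """Stack a column of ones (intercept) with the three independent variables."""
    seats, diff, applicants = np.atleast_1d(seats, diff, applicants)
    return np.column_stack([np.ones(len(seats)), seats, diff, applicants])

X = design_matrix(seats_thousands, difficulty, applicants_lakh)
cutoffs = X @ true_coeffs  # synthetic, noise-free cut-offs

# Least squares estimate of the four coefficients.
coeffs, *_ = np.linalg.lstsq(X, cutoffs, rcond=None)

def predict(seats, diff, applicants):
    """Predicted cut-off for one scenario using the fitted coefficients."""
    return float((design_matrix(seats, diff, applicants) @ coeffs)[0])

# Scenario inputs from the text: 36500 seats (36.5 thousand), 11 lakh applicants.
high = predict(36.5, 1.0, 11.0)      # A) high difficulty
moderate = predict(36.5, 0.0, 11.0)  # B) moderate difficulty
print(f"range on synthetic data: {min(high, moderate):.1f} to {max(high, moderate):.1f}")
```

With only six yearly rows and four coefficients, real data leaves very little room to estimate the noise, so any range produced this way should be read with caution.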
Conclusion
The above prediction may not be accurate, as it is based on a very limited set of data. Note also that this prediction is not relevant from 2019 onwards, as the cut-off will be expressed as a percentile rather than a score.
Software Engineer at bispark, Dharwad. Holds a Bachelor's degree in Electronics & Communication from Visvesvaraya Technological University (VTU), Belagavi.