Think of the lift curve as being not so much an illustration of goodness of fit, as an illustration of how good the model is at distinguishing between good risks and bad risks.
Imagine you draw a graph with actual claim frequency plotted against expected claim frequency. If the model is a good fit, the points would lie along the 45 degree line.
But if the model has not distinguished between good risks and bad risks, then the expected claim frequencies (ie the points on the x axis) would be ranked in a different order to the actual claim frequencies.
In that situation, the points would sit too far to the left or to the right, all jumbled up, and the graph wouldn't slope upwards.
The opposite must therefore also be true: a graph that slopes upwards shows the model is doing a good job of telling which risks are good and which are bad; and the steeper the slope, the better the model is at distinguishing between risks.
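To make this concrete, here is a minimal sketch of how a lift curve is often built in practice: sort policies by predicted frequency, split them into bins (deciles here), and compare the mean actual frequency with the mean predicted frequency in each bin. All the data and the 0.8-1.2 noise factor below are synthetic assumptions, not from the post.

```python
# Hypothetical lift-curve sketch: bin policies by predicted claim
# frequency and compare mean actual vs mean predicted per bin.
# All data is synthetic.
import random

random.seed(0)
n = 10_000
# Synthetic "true" risk per policy, plus a model prediction that is
# correlated with (but not equal to) the truth.
true_freq = [random.uniform(0.02, 0.30) for _ in range(n)]
pred_freq = [f * random.uniform(0.8, 1.2) for f in true_freq]
# Simulated actual outcome: one Bernoulli claim indicator per policy.
actual = [1 if random.random() < f else 0 for f in true_freq]

# Sort policies by predicted frequency and split into 10 equal bins.
order = sorted(range(n), key=lambda i: pred_freq[i])
bins = [order[i * n // 10:(i + 1) * n // 10] for i in range(10)]

lift_points = []
for b in bins:
    mean_pred = sum(pred_freq[i] for i in b) / len(b)
    mean_actual = sum(actual[i] for i in b) / len(b)
    lift_points.append((mean_pred, mean_actual))

# A discriminating model gives points rising from bottom-left to
# top-right; plotting them against the 45 degree line shows the fit.
```

If the model had no discriminating power, the actual frequencies would come out roughly flat across the bins, which is the "jumbled up, not sloping upwards" picture described above.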
====
I tend to think of the Gains curve as follows:
Think of the straight line as being the hypothetical dataset where every policy has the same degree of risk. In this case, policy 1 has x amount of risk, policy 2 also has x amount of risk etc, so the cumulative risk for 2 policies is 2x, the cumulative risk for 3 policies is 3x, etc. You can see that this line must be the diagonal from (0,0) to (1,1).
Now, for our own dataset, we know that risks are not uniform. We consider the higher risk policies first. Since these are higher than average, the data rises above the diagonal. This continues until we arrive at policies with a lower than average degree of risk which means we start to sink back towards the diagonal.
So, just as with the lift curve, the gains curve gives an illustration of how good the model is at distinguishing between good risks and bad risks. A graph which rises far above the line is a better predictor.
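The Gains curve construction described above can be sketched in the same spirit: rank policies from highest to lowest predicted risk, then plot the cumulative share of actual claims against the cumulative share of policies. The dataset is synthetic and the noise factor is an assumption, as in the previous sketch.

```python
# Hypothetical Gains-curve sketch: sort policies from highest to lowest
# predicted risk, then accumulate the share of actual claims captured.
# All data is synthetic.
import random

random.seed(1)
n = 5_000
true_freq = [random.uniform(0.02, 0.30) for _ in range(n)]
pred_freq = [f * random.uniform(0.8, 1.2) for f in true_freq]
actual = [1 if random.random() < f else 0 for f in true_freq]

# Consider the higher-risk policies first.
order = sorted(range(n), key=lambda i: pred_freq[i], reverse=True)
total_claims = sum(actual)

gains = [(0.0, 0.0)]  # the curve starts at the origin
cum = 0
for rank, i in enumerate(order, start=1):
    cum += actual[i]
    # (share of policies considered, share of claims captured)
    gains.append((rank / n, cum / total_claims))

# The curve runs from (0, 0) to (1, 1); a discriminating model bows
# it above the 45 degree diagonal, exactly as described above.
```

The diagonal itself corresponds to the hypothetical uniform-risk dataset: every policy contributes the same amount, so the share of claims always equals the share of policies.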
Last edited: Apr 29, 2014