On Jan 11, 2023, at 1:20 AM, Riccardo (Jack) Lucchetti <p002264@staff.univpm.it> wrote:
> On Sat, 17 Dec 2022, Fred Engst wrote:
> Hi Jack, Allin, Sven and all others on the gretl team,
> As you know, the "Number of cases 'correctly predicted'" in a logit/probit
> model can be misleading, even in a 50/50 split case.
> The benchmark we should compare against is not zero cases 'correctly
> predicted', but rather random assignment based on the sample mean.
> If a sample is split 50/50, random assignment would in theory get 50%
> "correctly predicted". If our model's 'correctly predicted' is 70%, we are
> only 20 percentage points above random assignment, which represents an
> improvement of only 40% over the random-assignment model, since
> (0.7 - 0.5) / (1 - 0.5) = 0.4.
> Thus, I would like to propose an alternative output from gretl, namely the
> "extra number of cases 'correctly predicted' over random assignment" (or
> something like that); call this dot_R-square, perhaps.
> dot_R-square = (Y_hat_model - Y_hat_random) / (1 - Y_hat_random)
> where
> Y_hat_model = sum(Y_hat_model_i = Y_i) / N is the model's share of correct predictions,
> Y_hat_random = Y_hat^2 + (1 - Y_hat)^2 is the expected share correct when 1 is assigned at random with probability Y_hat,
> Y_hat is the sample mean of Y, and
> Y_hat_model_i = 1 if Prob(Y_i = 1) > Y_hat, 0 otherwise.
> Unlike the McFadden R-squared, the interpretation of this one is fairly
> straightforward: it is the percentage by which our model improves on a model
> based on random assignment.
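For concreteness, the statistic you describe could be computed in hansl along these lines (a rough sketch only, not polished code; the greene19_1.gdt sample file is just for illustration, and the cut-off at the sample mean follows your definition):

<hansl>
# sketch: the proposed "dot R-squared" after a binary logit
open greene19_1.gdt               # Greene's grade data, shipped with gretl
logit GRADE 0 GPA TUCE PSI

series p = $yhat                  # fitted probabilities P(Y_i = 1)
scalar ybar = mean(GRADE)         # Y_hat: sample mean of the 0/1 outcome
series pred = (p > ybar)          # Y_hat_model_i: 1 if P(Y_i = 1) > Y_hat
scalar hit = mean(pred == GRADE)  # Y_hat_model: share correctly predicted
scalar hit_r = ybar^2 + (1 - ybar)^2  # Y_hat_random: expected random hit rate
scalar dotR2 = (hit - hit_r) / (1 - hit_r)
printf "dot R-squared = %g\n", dotR2
</hansl>

Note the cut-off at the sample mean, per your definition, rather than the 0.5 cut-off that gretl's own "correctly predicted" count uses.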
That said, there are already a fair number of similar statistics available in
the "extra" package, under the name "scores2x2". Have you checked whether your
proposed statistic is in there already?
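For instance, you could get the whole battery of scores via something like this (again a sketch: I'm assuming scores2x2() accepts a 2x2 cross-tabulation of predicted vs observed outcomes plus a verbosity flag; the package's help text has the authoritative interface):

<hansl>
# sketch: feed a 2x2 prediction/outcome table to scores2x2()
include extra.gfn                     # install once via: pkg install extra
open greene19_1.gdt
logit GRADE 0 GPA TUCE PSI --quiet
series pred = ($yhat > mean(GRADE))   # predicted class, cut-off at the sample mean
matrix tab = mxtab(pred, GRADE)       # 2x2 table: predicted vs observed
eval scores2x2(tab, 1)                # print the various 2x2 scores
</hansl>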
You could also check out the "roc" package. It automatically performs the test
you're suggesting, in the context of the area under the ROC curve (AUROC): the
AUROC for a random classifier is also 0.5, and the package tests the model's
AUROC against that null.
Cheers,
PS
Sent from my iPad