[Gretl-users] Some suggestions on the "Number of cases 'correctly predicted'" in logit/probit output

Saturday, 17 December 2022

Hi Jack, Allin, Sven and all others on the gretl team,
As you know, the "Number of cases 'correctly predicted'" reported for a logit/probit model can
be misleading, even with a 50/50 split.
The benchmark we should compare against is not zero cases 'correctly predicted', but random
assignment based on the sample mean.
With a 50/50 split, random assignment would in theory get 50% of cases "correctly
predicted". If our model gets 70% 'correctly predicted', it is only 20 percentage points
better than random assignment. That closes only 40% of the gap between random assignment
and perfect prediction: (0.70 - 0.50) / (1 - 0.50) = 0.40.
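To make the arithmetic concrete, here is a minimal Python sketch (not gretl code; the function name `improvement_over_random` is hypothetical) that takes the model's share of correct predictions and the random-assignment share and returns the fraction of the remaining gap the model closes:

```python
def improvement_over_random(p_model: float, p_random: float) -> float:
    """Hypothetical helper: share of the gap between random assignment
    and perfect prediction (100% correct) that the model closes."""
    return (p_model - p_random) / (1.0 - p_random)


# 50/50 split: random assignment gets 50% right on average; the model gets 70%.
print(round(improvement_over_random(0.70, 0.50), 2))  # 0.4
```

The same 20-point gain would count for more when random assignment is already accurate, e.g. (0.90 - 0.70) / (1 - 0.70) is about 0.67.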
I would therefore like to propose an alternative statistic in gretl's output: the "Extra number of
cases 'correctly predicted' over random assignment" (or something like that), which we might
call dot_R-square.

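Dot_R-square = (Y_hat_model - Y_hat_random) / (1 - Y_hat_random)
where   Y_hat_model   = sum_i [Y_hat_model_i = Y_i] / N, the share of cases the model predicts correctly
        Y_hat_random  = Y_hat^2 + (1 - Y_hat)^2, the expected share correct under random assignment
                        (if we predict 1 with probability Y_hat, independently of Y_i, we are right
                        with probability Y_hat*Y_hat + (1 - Y_hat)*(1 - Y_hat); for a 50/50 split this is 0.5)
        Y_hat         = the sample mean of Y
        Y_hat_model_i = 1 if Pr(Y_i = 1) > Y_hat, and 0 otherwise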
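A minimal sketch of the proposed statistic, written in Python rather than gretl's own scripting language, assuming `y` holds the observed 0/1 outcomes and `prob` the fitted probabilities Pr(Y_i = 1) from a logit/probit model. The helper names `dot_r_square` and `simulated_random_accuracy` are hypothetical, and the toy data are made up for illustration only. Following the definition above, a case is classified as 1 when its fitted probability exceeds the sample mean. The second function is a Monte Carlo check that random assignment really does hit Y_hat^2 + (1 - Y_hat)^2 on average.

```python
import random
from typing import Sequence


def dot_r_square(y: Sequence[int], prob: Sequence[float]) -> float:
    """Hypothetical helper: the proposed dot_R-square.

    y    -- observed 0/1 outcomes
    prob -- fitted probabilities Pr(Y_i = 1) from a logit/probit model
    """
    n = len(y)
    if n == 0 or n != len(prob):
        raise ValueError("y and prob must be non-empty and the same length")
    y_bar = sum(y) / n  # sample mean of Y ("Y_hat" in the formula)
    # Predict 1 when the fitted probability exceeds the sample mean.
    pred = [1 if p > y_bar else 0 for p in prob]
    p_model = sum(1 for yi, pi in zip(y, pred) if yi == pi) / n
    # Expected hit rate of guessing 1 with probability y_bar, independently of y.
    p_random = y_bar ** 2 + (1 - y_bar) ** 2
    if p_random == 1.0:
        raise ValueError("Y has no variation, so dot_R-square is undefined")
    return (p_model - p_random) / (1 - p_random)


def simulated_random_accuracy(y: Sequence[int], reps: int = 2000, seed: int = 0) -> float:
    """Hypothetical helper: Monte Carlo hit rate of random assignment."""
    rng = random.Random(seed)
    y_bar = sum(y) / len(y)
    hits = 0
    for _ in range(reps):
        for yi in y:
            guess = 1 if rng.random() < y_bar else 0
            hits += guess == yi  # True counts as 1
    return hits / (reps * len(y))


# Made-up example: 3 of 10 cases are 1, so Y_hat = 0.3 and Y_hat_random = 0.58.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
prob = [0.8, 0.6, 0.2, 0.4, 0.1, 0.2, 0.1, 0.25, 0.1, 0.2]
print(round(dot_r_square(y, prob), 3))         # 0.524
print(round(simulated_random_accuracy(y), 2))  # close to 0.58
```

Here the model gets 8 of 10 cases right, so dot_R-square = (0.80 - 0.58) / (1 - 0.58), about 0.524: the model closes roughly half of the gap between random assignment and perfect prediction.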

Unlike the McFadden R-squared, this has a fairly straightforward interpretation: the
percentage of the gap between random assignment and perfect prediction that our model closes.

Best,
Fred
