Thank you Jack and Allin,
Jack is right to ask "Why should it be anything else?", and I do not actually
have the answer. I read in the literature that the 0.5 probability
threshold looks somewhat arbitrary, and I was guessing that gretl might be
using some other criterion, such as the ROC curve, to maximize the % of
correctly predicted cases. Stata has some built-in commands for this
purpose, and ROC curves are generally used to compare the performance of
different binary dependent variable models.
As an illustration, I expanded Jack's script and found that the % of
correctly predicted cases is maximized at a threshold of 0.56.
Best,
Artur
<hansl>
open mroz87 --quiet
logit LFP const WA WE KL6
# actual values: number of observed zeros and ones
scalar actual_0 = sum(LFP==0)
scalar actual_1 = sum(LFP==1)
# compute: sensitivity, specificity, 1-specificity, % of correctly
# predicted, threshold
scalar n_thr = 99   # thresholds 0.01, 0.02, ..., 0.99
matrix result = zeros(n_thr,5)
string cnames = "sensitivity specificity 1-specificity correctly_predicted threshold"
colnames(result, cnames)
# integer index loop: avoids floating-point drift in the threshold
loop count=1..n_thr --quiet
    scalar threshold = count/100
    series predict = $yhat>threshold
    scalar correct_0 = sum(predict==0 && LFP==0)
    scalar correct_1 = sum(predict==1 && LFP==1)
    result[count,1] = correct_1/actual_1 # true positive rate; sensitivity
    result[count,2] = correct_0/actual_0 # true negative rate; specificity
    result[count,3] = 1-result[count,2] # 1 - specificity
    result[count,4] = (correct_0 + correct_1)/$nobs # % of correctly predicted
    result[count,5] = threshold
endloop
gnuplot 1 3 --matrix=result --with-lines --suppress-fitted --output=display \
  {set title 'ROC Curve'; set xrange [-0.01:1.01]; set yrange [-0.01:1.01]; \
  set grid; show grid}
gnuplot 4 5 --matrix=result --with-lines --suppress-fitted --output=display \
  {set title 'Correctly predicted %'; set grid; show grid}
</hansl>
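In case it is useful, here is a minimal sketch that picks out the maximizing
threshold directly instead of reading it off the plot. It assumes the same
mroz87 sample and logit specification as above; the names phat, hit,
best_thr and best_share are my own, and ties are resolved in favour of the
lowest threshold.
<hansl>
open mroz87 --quiet
logit LFP const WA WE KL6 --quiet
series phat = $yhat          # fitted probabilities from the logit
scalar best_share = 0        # highest share of correct predictions so far
scalar best_thr = NA         # threshold at which that share was reached
loop i=1..99 --quiet
    scalar thr = i/100
    # hit is 1 where the 0/1 prediction agrees with the observed LFP
    series hit = (phat > thr) == LFP
    scalar share = mean(hit)
    if share > best_share
        best_share = share
        best_thr = thr
    endif
endloop
printf "Best threshold: %.2f (share correctly predicted: %.4f)\n", best_thr, best_share
</hansl>
This only searches the grid 0.01, 0.02, ..., 0.99, so the result is no more
precise than the grid step.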
2014-09-19 19:30 GMT+02:00 Riccardo (Jack) Lucchetti <r.lucchetti(a)univpm.it>:
On Fri, 19 Sep 2014, Artur Bala wrote:
Dear all,
> Does anyone know how the threshold value for the predicted probabilities in
> a logit/probit estimation is calculated in gretl?
>
I assume that by "threshold" you mean "the value of P(x'b) at which we
switch from predicting a 0 to predicting a 1". I haven't looked at the
source code, but I'm pretty sure it's 0.5. Why should it be anything else?
Try this:
<hansl>
open mroz87 --quiet
logit LFP const WA WE KL6
series pred = $yhat>0.5
xtab LFP pred
</hansl>
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------