Re: [Gretl-users] threshold value

Wednesday, 1 October 2014

Thank you Jack and Allin,

Jack is right to ask "Why should it be anything else?", but I do not actually
have the answer. I read in the literature that the 0.5 probability threshold
looks somewhat arbitrary, and I was guessing that gretl might be choosing it
by some other criterion, such as the ROC curve, in order to maximize the
percentage of correctly predicted cases. Stata has some built-in commands for
this purpose, and ROC curves are generally used to compare the performance of
different binary dependent variable models.
As an illustration, I expanded Jack's script and found that the percentage of
correctly predicted cases is maximized at a threshold of 0.56.
Best,
Artur

<hansl>
open mroz87 --quiet
logit LFP const WA WE KL6

# actual values
genr actual_0 = sum(LFP==0)
genr actual_1 = sum(LFP==1)

# compute: sensitivity, specificity, 1-specificity, share of correctly
# predicted cases, threshold

matrix result=zeros(98,5)
string cnames = "sensitivity specificity 1-specificity correctly_predicted threshold"
colnames(result, cnames)

scalar count = 1

loop for (threshold=0.01; threshold<=.99; threshold+=.01) --quiet
    series predict = $yhat>threshold
    correct_0 = sum(predict==0 && LFP==0)
    correct_1 = sum(predict==1 && LFP==1)
    result[count,1] = correct_1/actual_1   # true positive rate (sensitivity)
    result[count,2] = correct_0/actual_0   # true negative rate (specificity)
    result[count,3] = 1 - result[count,2]  # false positive rate (1 - specificity)
    # share of correctly predicted cases
    result[count,4] = (correct_0 + correct_1)/$nobs
    result[count,5]= threshold
    count+=1
endloop

gnuplot 1 3 --matrix=result --with-lines --suppress-fitted --output=display \
  {set title 'ROC Curve'; set xrange [-0.01:1.01]; set yrange [-0.01:1.01]; \
  set grid; show grid}
gnuplot 4 5 --matrix=result --with-lines --suppress-fitted --output=display \
  {set title 'Correctly predicted %'; set grid; show grid}
</hansl>
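
If the matrix `result` from the script above is still in memory, the
maximizing threshold can also be read off numerically rather than from the
plot. What follows is only a minimal sketch, not part of the original script:
it assumes gretl's imaxc() function (row index of each column's maximum), and
the names best_row and best_thr are purely illustrative.

<hansl>
# assumes the matrix "result" built by the loop above is still available
# column 4 = share of correctly predicted cases, column 5 = threshold
scalar best_row = imaxc(result[,4])   # a row where column 4 attains its maximum
scalar best_thr = result[best_row,5]
printf "threshold = %.2f, share correctly predicted = %.4f\n", \
  best_thr, result[best_row,4]
</hansl>

If several thresholds tie for the maximum, only one of them is reported, so it
may still be worth looking at the plot.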

2014-09-19 19:30 GMT+02:00 Riccardo (Jack) Lucchetti <r.lucchetti(a)univpm.it>:

...
 On Fri, 19 Sep 2014, Artur Bala wrote:

> Dear all,
> Does anyone know how the threshold value for the predicted probabilities in
> a logit/probit estimation is calculated in gretl?

 I assume that by "threshold" you mean "the value of P(x'b) at which we
 switch from predicting a 0 to predicting a 1". I haven't looked at the
 source code, but I'm pretty sure it's 0.5. Why should it be anything else?

 Try this:
 <hansl>
 open mroz87 --quiet
 logit LFP const WA WE KL6
 series pred = $yhat>0.5
 xtab LFP pred
 </hansl>

 -------------------------------------------------------
   Riccardo (Jack) Lucchetti
   Dipartimento di Scienze Economiche e Sociali (DiSES)

   Università Politecnica delle Marche
   (formerly known as Università di Ancona)

   r.lucchetti(a)univpm.it
   http://www2.econ.univpm.it/servizi/hpp/lucchetti
 -------------------------------------------------------
