This response to Fred is kinda long but hopefully some of it may be of
interest.
Reminder: Fred reported a crash when running a Tobit model with 5000+
panel-unit dummy variables, and also reported the estimation taking
ages -- even plain OLS with all the panel dummies was painfully slow.
Sven has addressed the econometrics of the Tobit estimation; in this
reply I'm just concerned with the mechanics.
Diagnosing the crash requires being able to run the thing on a
reasonable time scale, so my first task was speeding it up.
The first step in our Tobit is running OLS (to flush out any missing
values and set up a suitably sized model structure), and that step in
itself was taking far too long. Up till now our default OLS engine has
been a gretl-native Cholesky solver. It's fast and effective for
problems of a typical size in econometrics, but this case has exposed
the fact that it bogs down badly when there are thousands of
regressors. So I've now put a size-based switch in place: if the
problem exceeds a certain threshold we hand the job over to LAPACK,
which on most systems these days will be highly optimized (on Mac the
Accelerate framework is used).
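To make that concrete at the script level, here's a minimal
standalone sketch of what the Cholesky route amounts to, using
invpd() (which inverts via Cholesky); the size-based engine switch
itself lives in gretl's C code, so there's nothing to set from hansl.
<hansl>
# minimal sketch of the normal-equations/Cholesky route;
# the actual engine switch happens inside gretl's C code
matrix X = mnormal(200, 5)
matrix Y = mnormal(200, 1)
matrix b = invpd(X'X) * (X'Y)   # invpd() inverts via Cholesky
print b
</hansl>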
For reference, here's the test rig I'm running in emulation of Fred's
example (in the first instance, just the OLS part). With 5001
regressors (5000 unit dummies plus x), the X'X matrix has over 25
million elements.
<hansl>
set seed 1234
N = 5000   # number of panel units
T = 4      # time periods per unit
NT = N*T
nulldata NT --preserve
setobs 4 1:1 --stacked-time-series
series x = normal()
series y = normal()
y = y < 1 ? 0 : y   # pile up the low values at zero (left-censoring)
genr unitdum        # create per-unit dummies, du_1 ... du_N
ols y du* x
</hansl>
With our previous OLS routine, estimation took about 300 seconds on my
desktop; with the new LAPACK switch it takes about 5 seconds.
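(If you want to reproduce the timing, the OLS step can be wrapped in
the same stopwatch idiom as the tobit block below:)
<hansl>
# timing the OLS step
set stopwatch
ols y du* x
printf "elapsed: %gs\n", $stopwatch
</hansl>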
Alright, that's nice, but what about Tobit? Well, our default
estimation algorithm for Tobit is Newton-Raphson, using the analytical
Hessian. The Hessian here will have tens of millions of elements, so
Newton is bound to be slow. I therefore tried using BFGS instead, and
for good measure made the convergence tolerance a bit sloppier than
usual:
<hansl>
set optimizer BFGS
set bfgs_toler 1.0e-5
</hansl>
BFGS produced parameter estimates in a tolerable time, but then we hit
computation of the Hessian for the covariance matrix -- and after
about 5 minutes I couldn't be bothered waiting for that to finish!
Next step: enable the --opg option for tobit (and intreg), to get a
cheaper covariance matrix via the Outer Product of the Gradient. Plus,
I introduced a little parallelization into the tobit loglikelihood
code. Net result: I could now run the whole thing in under 100
seconds. The tobit portion looks like this:
<hansl>
set optimizer BFGS
set bfgs_toler 1.0e-5
set stopwatch
tobit y du* x --opg   # OPG covariance matrix instead of the Hessian
printf "elapsed: %gs\n", $stopwatch
</hansl>
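For the record, the OPG estimator is cheap because it needs only
first derivatives: if G is the n x k matrix of per-observation score
contributions, the covariance estimate is just (G'G)^-1.
Schematically, with a made-up G standing in for the real scores:
<hansl>
# schematic OPG covariance: G stands in for the n x k matrix
# of per-observation score contributions (made up here)
matrix G = mnormal(100, 3)
matrix V = invpd(G'G)   # OPG estimate of the covariance matrix
print V
</hansl>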
I then experimented a bit more with the Hessian and introduced some
parallelization there. 5000 dummies are still a problem, but with 500
dummies the full Newton/Hessian estimation goes through in about 5
seconds.
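If you'd like to try the full Newton run, it's just the rig above
with N cut to 500, run in a fresh session so the optimizer is back at
its default:
<hansl>
# variant of the rig above with 500 panel units, small enough
# for the full Newton/analytical-Hessian estimation
set seed 1234
N = 500
T = 4
NT = N*T
nulldata NT --preserve
setobs 4 1:1 --stacked-time-series
series x = normal()
series y = normal()
y = y < 1 ? 0 : y
genr unitdum
set stopwatch
tobit y du* x   # default Newton optimizer, Hessian-based covariance
printf "elapsed: %gs\n", $stopwatch
</hansl>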
As of now I haven't been able to replicate a crash; I'll dig at that
some more later.
Allin