On Wed, 27 Apr 2016, Mikael Postila wrote:
> We've been experiencing a problem where exactly the same data yields
> different regression results depending on A) which computer is used
> and B) when the regression is being run on the same computer.
Mikael kindly sent me a subset of his data and a sample model
specification, so I've now been able to look into this. Since the
question may be of some general interest I'm including the list in my
reply.
Mikael is estimating a linear model via OLS. In the sample he gave me
there are 5021 observations and 46 regressors, and among the
regressors one, "dLAT1km", has a coefficient which was found to differ
at the 6th digit on different runs of the script (either on different
machines or on different dates): 2.68356 versus 2.68357.
By using gretl's "mpols" command (multiple-precision OLS) I determined
that the correct coefficient for this variable is, to 15 significant
digits, 2.68356548257653. So it's on the rounding cusp between the two
values that Mikael observed, although by the standard "round half away
from zero" convention we'd say that 2.68357 is the "right" 6-digit result.
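To see why the printed value lands where it does, here's a quick check of
that rounding (a small Python sketch; the 15-digit figure is the mpols
value quoted above):

```python
from decimal import Decimal, ROUND_HALF_UP

# 15-significant-digit coefficient obtained via gretl's "mpols"
coef = Decimal("2.68356548257653")

# Round to 6 significant digits; Decimal's ROUND_HALF_UP implements
# "round half away from zero"
print(coef.quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP))  # 2.68357
```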
We usually expect that on linear problems gretl's results will be
correct (and stable) to at least the 6 printed digits (in fact,
several more than that in most cases). So what's going on here?
It turns out the X'X matrix for this regression is very ill-conditioned,
and in a particular way. Invoking the "vif" command after
OLS we find that the Variance Inflation Factor for dLAT1km is
58985220 ("Values > 10.0 may indicate a collinearity problem"),
several orders of magnitude greater than for any other regressor
(there are 6 others with VIFs greater than 100). Running an OLS
regression of dLAT1km on all the other regressors, gretl gets an
R-squared of 0.999999983.
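The VIF and that auxiliary R-squared are two views of the same fact, since
VIF_j = 1/(1 - R^2_j). A back-of-envelope check in Python (using the
rounded R-squared quoted above, so it matches the reported VIF only in
order of magnitude):

```python
# VIF_j = 1 / (1 - R^2_j), where R^2_j is the R-squared from
# regressing regressor j on all the other regressors.
r2 = 0.999999983          # auxiliary R-squared for dLAT1km (rounded)
vif = 1.0 / (1.0 - r2)
print(f"{vif:.3g}")       # same order of magnitude as the reported 58985220
```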
So Mikael's regression is close to the limits of gretl's default OLS
procedure (using Cholesky decomposition) with regard to collinearity.
Close enough that the outcome is likely to depend on the precise
details of the physical computation (which intermediate results get
held in 80-bit registers, with how much "excess precision", and
whether or not they get "spilled" to 64-bit memory locations).
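The underlying numerical issue is that forming X'X squares the condition
number of X, so a normal-equations solve (Cholesky included) can lose many
digits that a QR- or SVD-based solver retains. A toy illustration in
Python/NumPy (synthetic data, not Mikael's; numpy.linalg.solve uses LU
rather than Cholesky, but the normal-equations conditioning problem is
the same):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 1e-7 * rng.normal(size=n)        # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Normal equations: cond(X'X) = cond(X)^2, here around 1e14
b_ne = np.linalg.solve(X.T @ X, X.T @ y)
# SVD-based least squares works on X directly, cond(X) around 1e7
b_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# The individual x1/x2 coefficients disagree in their low digits,
# while the well-conditioned combination b1 + b2 agrees closely.
print(abs(b_ne[1] - b_ls[1]))
print(abs((b_ne[1] + b_ne[2]) - (b_ls[1] + b_ls[2])))
```

The error from the normal-equations route concentrates along the nearly
collinear direction, which is exactly where Mikael's dLAT1km coefficient
lives.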
I'm therefore not surprised that you could get differences at the 6th
digit across different machines (or across different compilers, or
different compiler settings).
It's more surprising that you could get differences on the same
machine (with the same OS and the same pre-built gretl version), on
different days. However, I've read that on MS Windows certain
software, such as DirectX, may reset the precision of floating-point
units. So it may be that -- when the results are so terribly sensitive
to precision -- what gets printed by gretl depends on what else you've
been running lately.
Is this a real problem? I'd say No. As a related experiment I tried
probing the effect of a tiny change to the data. One of the regressors
is "size", which I believe measures the size of properties in square
meters and which seems to have a minimum increment of 0.5. I tried
changing the size of just one of the 5000+ properties by 0.5 and
rerunning Mikael's model. Result: most of the coefficients changed at
the 6th digit or higher (several changed at the 4th or 5th digit).
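That experiment is easy to mimic on synthetic data (a sketch with made-up
data and variable names, not Mikael's file, and an SVD least-squares fit
standing in for gretl's OLS):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# fake "size" variable on a 0.5 square-meter grid
size = np.round(rng.uniform(40.0, 120.0, n) * 2.0) / 2.0
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), size, x])
y = 10.0 + 0.5 * size + 2.0 * x + rng.normal(size=n)

b_before, *_ = np.linalg.lstsq(X, y, rcond=None)

# Change one property's size by the minimum increment, 0.5
X2 = X.copy()
X2[0, 1] += 0.5
b_after, *_ = np.linalg.lstsq(X2, y, rcond=None)

# Even this one-cell tweak shifts the coefficients in their low digits
print(np.abs(b_after - b_before))
```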
Moral: unless the data are considered _perfectly_ accurate you really
don't want to be paying attention to the 6th digit of regression
coefficients. However, you have "mpols" if you think you need it.
Allin Cottrell