Hi
We've been experiencing a problem where exactly the same data yields
different regression results depending on A) which computer is used and
B) when the regression is run on the same computer.
This is the third time the problem has arisen (the first time was in
February, the second in March and the third now in April).
The issue typically arises in the location-specific variable(s) and affects
the last digit of the coefficient.
An example is given below.
Here the variable dLAT1km gets the value 2.68357. Using the same data on the
next day, the coefficient is 2.68356. Everything else is the same.
This particular discrepancy has now been replicated on two computers running
the regressions today and yesterday!
My colleague is doing a test run later this evening to see whether the
problem also arises between two runs on the same day.
                  coefficient   std. error   t-ratio    p-value
-------------------------------------------------------------------
const                 9.61714    0.0383124     251.0     0.0000 ***
size               -0.0360853   0.00170093    -21.22  1.07e-095 ***
size2             0.000328265  2.48074e-05     13.23  2.61e-039 ***
size3            -1.04849e-06  1.14382e-07    -9.167  6.99e-020 ***
age               -0.00931640   0.00120263    -7.747  1.14e-014 ***
age2             -0.000252806  4.43133e-05    -5.705   1.23e-08 ***
age3              4.79820e-06  4.62571e-07     10.37  5.91e-025 ***
D_c_erinom          0.0248313   0.00777210     3.195     0.0014 ***
D_c_tyyd           -0.0711856   0.00418330    -17.02  3.68e-063 ***
D_c_huonot          -0.147710    0.0139738    -10.57  7.66e-026 ***
apartment_sauna     0.0611526   0.00529751     11.54  1.94e-030 ***
Q_7               -0.00308946   0.00694088   -0.4451     0.6563
Q_6              -0.000603071   0.00697798  -0.08642     0.9311
Q_5                -0.0102530   0.00680023    -1.508     0.1317
Q_4                0.00770398   0.00652315     1.181     0.2377
Q_3                0.00443441   0.00680937    0.6512     0.5149
Q_2               -0.00412167   0.00661880   -0.6227     0.5335
Q_1                0.00559893   0.00676218    0.8280     0.4077
dLAT1km               2.68357      1.30054     2.063     0.0391 **
dLON1km              -6.45504      3.12653    -2.065     0.0390 **
DP_84               -0.119379    0.0860273    -1.388     0.1653
lot_ownership      -0.0894216   0.00785090    -11.39  1.09e-029 ***
d1200               -0.300451   0.00921419    -32.61  1.54e-211 ***
The problem
We run the regressions monthly on two different computers (we verify our
valuation by duplicating the process).
Both computers run the same 64-bit Gretl version and the same version of
Windows 7 (Pro). We have now also tested runs on consecutive days, and the
same problem appears even when the same computer is used. Each run consists
of 480 regressions for different properties (the exercise is done for
property valuation), and the problem occurs in c. 1-6 regressions, which are
not the same ones each time, i.e. the problem is rather small. At portfolio
level the difference this time was 7.246e-7 %, but at property level the
error can be as much as 0.2 %.
The monetary value is not the issue; the real problem is that we cannot
reproduce the estimations.
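To pin down which of the 480 regressions differ between two runs, the results
could be compared automatically. Below is a minimal Python sketch of such a
check, assuming the coefficients of each run have been exported from Gretl
into plain dictionaries. The function name differing_coefficients, the
variable names run_day1 and run_day2, and the tolerance are hypothetical and
not part of Gretl; the numbers are the ones from the example above.

```python
import math


def differing_coefficients(run_a, run_b, rel_tol=1e-9):
    """Return {name: (a, b, relative difference)} for coefficients that differ.

    run_a and run_b map variable names to coefficient values from two runs
    of the same regression. rel_tol is an arbitrary illustrative tolerance.
    """
    diffs = {}
    for name, a in run_a.items():
        b = run_b[name]
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0):
            rel = abs(a - b) / max(abs(a), abs(b))
            diffs[name] = (a, b, rel)
    return diffs


# Coefficients as printed in the example; only dLAT1km changed the next day.
run_day1 = {"dLAT1km": 2.68357, "dLON1km": -6.45504}
run_day2 = {"dLAT1km": 2.68356, "dLON1km": -6.45504}

for name, (a, b, rel) in differing_coefficients(run_day1, run_day2).items():
    print(f"{name}: {a} vs {b} (relative difference {rel:.2e})")
```

Comparing the full-precision stored values rather than the printed ones would
show whether the difference is really in the sixth significant digit or only
becomes visible there through rounding of the output.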
In principle, all these regressions ought to be solvable in closed form. We
are just wondering whether one of the following could be the reason:
- some algorithm is used to make the calculations faster
- a random number generator is used somewhere in the Gretl code
- some rounding rule depends on the computer's internal clock (odd/even date)
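On the first possibility, here is a generic illustration of why a faster
algorithm could matter even for a closed-form solution. It is not taken from
the Gretl code, and whether Gretl behaves this way is only an assumption on
our part. Floating-point addition is not associative, so a routine that adds
the same numbers in a different order (for instance because the work is split
differently between runs) can produce a result that differs in the last
digits. The helper naive_sum is hypothetical.

```python
def naive_sum(values):
    """Add the values one by one, strictly in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]

forward = naive_sum(values)             # (0.1 + 0.2) + 0.3
backward = naive_sum(reversed(values))  # (0.3 + 0.2) + 0.1

print(repr(forward), repr(backward))
print("identical:", forward == backward)
```

The two totals agree to about 16 significant digits but are not bit-for-bit
identical. In a regression with nearly collinear regressors, such a tiny
difference could in principle be amplified before it reaches the printed
coefficient.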
Kind Regards
Mikael Postila, MRICS
Head of Analysis
Orava Funds plc
t. +358 (0)50 347 2373
e. Mikael.Postila(a)oravafunds.com
a. Fabianinkatu 14B, FI-00100 Helsinki, FINLAND