Hi

 

We’ve been experiencing a problem where exactly the same data yields different regression results depending on (A) which computer is used and (B) when the regression is run on the same computer.

This is the third time the problem has arisen (first in February, then in March, and now in April).

 

The difference typically shows up in the location-specific variable(s), in the last reported digit of the coefficient.

An example is shown below.

Here the variable dLAT1km gets the coefficient 2.68357. Using the same data on the next day, the coefficient is 2.68356. Everything else is the same.

This particular discrepancy has now been replicated on two computers, running the regressions today and yesterday!

My colleague is doing a test run later this evening to see whether the problem also arises between two runs on the same day.

 

                   coefficient   std. error    t-ratio    p-value
  ----------------------------------------------------------------
  const             9.61714      0.0383124    251.0      0.0000    ***
  size             -0.0360853    0.00170093   -21.22     1.07e-095 ***
  size2             0.000328265  2.48074e-05   13.23     2.61e-039 ***
  size3            -1.04849e-06  1.14382e-07   -9.167    6.99e-020 ***
  age              -0.00931640   0.00120263    -7.747    1.14e-014 ***
  age2             -0.000252806  4.43133e-05   -5.705    1.23e-08  ***
  age3              4.79820e-06  4.62571e-07   10.37     5.91e-025 ***
  D_c_erinom        0.0248313    0.00777210     3.195    0.0014    ***
  D_c_tyyd         -0.0711856    0.00418330   -17.02     3.68e-063 ***
  D_c_huonot       -0.147710     0.0139738    -10.57     7.66e-026 ***
  apartment_sauna   0.0611526    0.00529751    11.54     1.94e-030 ***
  Q_7              -0.00308946   0.00694088    -0.4451   0.6563
  Q_6              -0.000603071  0.00697798    -0.08642  0.9311
  Q_5              -0.0102530    0.00680023    -1.508    0.1317
  Q_4               0.00770398   0.00652315     1.181    0.2377
  Q_3               0.00443441   0.00680937     0.6512   0.5149
  Q_2              -0.00412167   0.00661880    -0.6227   0.5335
  Q_1               0.00559893   0.00676218     0.8280   0.4077
  dLAT1km           2.68357      1.30054        2.063    0.0391    **
  dLON1km          -6.45504      3.12653       -2.065    0.0390    **
  DP_84            -0.119379     0.0860273     -1.388    0.1653
  lot_ownership    -0.0894216    0.00785090   -11.39     1.09e-029 ***
  d1200            -0.300451     0.00921419   -32.61     1.54e-211 ***

 

The problem

 

We run the regressions monthly on two different computers (we verify our valuation by duplicating the process).

Both computers run the same 64-bit Gretl version and the same Windows 7 (Pro) version. We have now also tested runs on successive days, and the same problem appears even when using the same computer. Each run consists of 480 regressions for different properties (the exercise is done for property valuation), and the problem occurs in c. 1-6 of them, which are not the same ones each time; i.e. the problem is rather small. At portfolio level the difference was 7.246e-7 % this time, but at property level the error can be as large as 0.2 %.
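
To pin down which of the 480 regressions differ between two runs, a comparison along the following lines could be used. This is only a minimal sketch in Python, not something Gretl provides: the function name `find_mismatches`, the nested-dictionary layout and the identifier `property_001` are hypothetical, and the coefficients are assumed to have been exported from each run beforehand.

```python
import math


def find_mismatches(run_a, run_b, rel_tol=1e-9):
    """Return (regression_id, variable, value_a, value_b) for every
    coefficient whose relative difference exceeds rel_tol.

    run_a and run_b map a regression id to a {variable: coefficient} dict.
    This layout is an assumption made for illustration only.
    """
    mismatches = []
    for reg_id, coeffs_a in run_a.items():
        coeffs_b = run_b[reg_id]
        for name, a in coeffs_a.items():
            b = coeffs_b[name]
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0):
                mismatches.append((reg_id, name, a, b))
    return mismatches


# The two dLAT1km values quoted above; "property_001" is a made-up id.
run_yesterday = {"property_001": {"dLAT1km": 2.68356, "dLON1km": -6.45504}}
run_today = {"property_001": {"dLAT1km": 2.68357, "dLON1km": -6.45504}}

for reg_id, name, a, b in find_mismatches(run_yesterday, run_today):
    print(f"{reg_id} {name}: {a} vs {b} (relative diff {abs(a - b) / abs(a):.2e})")
```

Comparing with a relative tolerance rather than exact equality makes it possible to separate last-digit differences from larger ones by varying `rel_tol`.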

 

The monetary value is not the issue; the real problem is that we can’t reproduce the estimates.

 

In principle, all these regressions ought to be solvable in closed form. We are wondering whether one of the following could be the reason:

- some algorithm is used to make the calculations faster

- a random number generator is used somewhere in the Gretl code

- some rounding rule depends on the computer’s internal clock (odd/even date)
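
As an illustration of the first guess, here is a small Python sketch of how a closed-form calculation can still give different last digits: floating-point addition is not associative, so any routine that adds the same numbers in a different order may round differently. Whether Gretl, or a numerical library it relies on, actually changes the order of operations between runs is purely an assumption on our part.

```python
import math

values = [0.1, 0.2, 0.3]

# Same three numbers, two different groupings of the additions.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# The two results differ in the last binary digit.
print(left_to_right == right_to_left)
print(repr(left_to_right), repr(right_to_left))

# math.fsum returns the correctly rounded sum, so it does not depend on order.
fsum_forward = math.fsum(values)
fsum_reversed = math.fsum(reversed(values))
print(fsum_forward == fsum_reversed)
```

In this toy case the difference is far below the five or six digits shown in the regression output, so it would only become visible there if the problem is numerically sensitive.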

 

Kind Regards

 

Mikael Postila, MRICS

Head of Analysis

Orava Funds plc

 

t. +358 (0)50 347 2373

e. Mikael.Postila@oravafunds.com

a. Fabianinkatu 14B, FI-00100 Helsinki, FINLAND