On Fri, 27 Dec 2019, Allin Cottrell wrote:
It appears to be a free lunch, pretty much. The speed-up is significant but
not huge, something like 30 percent. That's now in git. But the primary
source of difference between the gretl and numpy times that you quote must be
due to the respective Blas/Lapack implementations.
Here's what I'm seeing on a fairly elderly PC running Fedora with gretl
linked against openblas. I reduced the number of replications to 2000
(impatience) and created a baseline of accuracy by running mpols with
# gretl svd on, using DGELSS
gretl (mols): 8.43043 seconds
maxerr (gretl) = 0.000000000000007
python (linalg.lstsq): 5.63282 seconds
maxerr (python) = 0.000000000000007
# gretl svd on, using DGELSD
gretl (mols): 6.00157 seconds
maxerr (gretl) = 0.000000000000007
python (linalg.lstsq): 5.62002 seconds
maxerr (python) = 0.000000000000007
# gretl svd off (Cholesky)
gretl (mols): 0.396789 seconds
maxerr (gretl) = 0.000000000000005
python (linalg.lstsq): 5.62659 seconds
maxerr (python) = 0.000000000000007
From this, three points are apparent: (1) as stated above, DGELSD is
close to 30% faster than DGELSS; (2) numpy is a little faster than us on SVD;
and (3) you get just as accurate results an order of magnitude faster via
Cholesky (the mols default) provided the regressors (as here) are not
horribly collinear.
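
For anyone who wants to see point (3) outside gretl, here is a minimal numpy
sketch of the two routes on a small, well-conditioned problem. It is not the
benchmark script used above: the sizes, the seed and the variable names are
assumptions made for illustration only.

```python
import numpy as np

# Illustrative data only: sizes and seed are arbitrary assumptions.
rng = np.random.default_rng(42)
n, k = 500, 10
X = rng.standard_normal((n, k))          # regressors, far from collinear
y = X @ np.ones(k) + rng.standard_normal(n)

# Route 1: SVD-based least squares, as in the python timings above.
b_svd, *_ = np.linalg.lstsq(X, y, rcond=None)

# Route 2: Cholesky on the normal equations X'X b = X'y.
# X'X = L L', so solve L z = X'y and then L' b = z.
# (np.linalg.solve is a general solver; it does not exploit the
# triangular structure, which is fine for a sketch.)
L = np.linalg.cholesky(X.T @ X)
z = np.linalg.solve(L, X.T @ y)
b_chol = np.linalg.solve(L.T, z)

# Largest coefficient-wise disagreement between the two routes.
max_diff = np.max(np.abs(b_svd - b_chol))
print(f"max |b_svd - b_chol| = {max_diff:.3e}")
```

With regressors like these the two solutions should agree to near machine
precision; forming X'X squares the condition number, so the agreement
degrades as the columns of X approach collinearity.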
Hm, I'm seeing something weird here: on my home laptop (8 logical cores on
4 physical CPUs) I'm getting results that are fairly consistent with yours.
On the other hand, trying the same on my work PC (a 12-core box) I'm
seeing this:
Old code (with set svd on):
gretl (mols): 7.09208 seconds
maxerr (gretl) = 0.000000000000010
python (linalg.lstsq): 3.92083 seconds
maxerr (numpy) = 0.000000000000008
New code (with set svd on):
gretl (mols): 3.79993 seconds
maxerr (gretl) = 0.000000000000008
python (linalg.lstsq): 3.89591 seconds
New code (with set svd off):
gretl (mols): 120.903 seconds
maxerr (gretl) = 0.000000000000005
python (linalg.lstsq): 5.63783 seconds
It looks as if turning svd off makes mols a good deal slower (by the way:
all 12 cpus go at 100% for the whole time). I have no time now to check
why this happens, but I'll throw an idea in: perhaps, for some reason, I'm
getting the QR decomposition instead of Cholesky?
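
To make the QR-versus-Cholesky distinction concrete, here is a minimal numpy
sketch of both solvers. It says nothing about which path gretl actually takes
on my work PC; the data, sizes and names are assumptions for illustration.

```python
import numpy as np

# Illustrative data only: sizes and seed are arbitrary assumptions.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = rng.standard_normal(200)

# QR route: X = Q R (reduced form), then solve R b = Q'y.
# This works on X directly and never forms X'X.
Q, R = np.linalg.qr(X)
b_qr = np.linalg.solve(R, Q.T @ y)

# Cholesky route: factor the k x k matrix X'X = L L',
# then solve L z = X'y followed by L' b = z.
L = np.linalg.cholesky(X.T @ X)
b_chol = np.linalg.solve(L.T, np.linalg.solve(L, X.T @ y))

# Both routes should give the same coefficients on well-behaved data.
qr_chol_diff = np.max(np.abs(b_qr - b_chol))
print(f"max |b_qr - b_chol| = {qr_chol_diff:.3e}")
```

Since the answers coincide on data like this, only the timing (or a look at
which LAPACK routine gets called) can tell the two paths apart.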
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)