On Sat, 28 Dec 2019, Riccardo (Jack) Lucchetti wrote:
On Fri, 27 Dec 2019, Allin Cottrell wrote:
> It appears to be a free lunch, pretty much. The speed-up is significant but
> not huge, something like 30 percent. That's now in git. But the primary
> source of difference between the gretl and numpy times that you quote must
> be due to the respective BLAS/LAPACK implementations.
>
> Here's what I'm seeing on a fairly elderly PC running Fedora with gretl
> linked against openblas. I reduced the number of replications to 2000
> (impatience) and created a baseline of accuracy by running mpols with
> GRETL_MP_BITS=4096.
>
> # gretl svd on, using DGELSS
> gretl (mols): 8.43043 seconds
> maxerr (gretl) = 0.000000000000007
> python (linalg.lstsq): 5.63282 seconds
> maxerr (python) = 0.000000000000007
>
> # gretl svd on, using DGELSD
> gretl (mols): 6.00157 seconds
> maxerr (gretl) = 0.000000000000007
> python (linalg.lstsq): 5.62002 seconds
> maxerr (python) = 0.000000000000007
>
> # gretl svd off (Cholesky)
> gretl (mols): 0.396789 seconds
> maxerr (gretl) = 0.000000000000005
> python (linalg.lstsq): 5.62659 seconds
> maxerr (python) = 0.000000000000007
>
> From this, three points are apparent: (1) as stated above, DGELSD is
> close to 30% faster than DGELSS; (2) numpy is a little faster than us on
> SVD; and (3) you get just as accurate results an order of magnitude faster
> via Cholesky (the mols default) provided the regressors (as here) are not
> horribly collinear.
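
For illustration, point (3) can be sketched in NumPy roughly as follows. This is not gretl's mols code: the function names ols_svd and ols_cholesky, the sample size, the coefficients and the seed are all made up for the example, and the regressors are deliberately well conditioned.

```python
import numpy as np

def ols_svd(X, y):
    # SVD-based least squares via numpy.linalg.lstsq
    # (the call also returns the singular values of X, discarded here).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def ols_cholesky(X, y):
    # Normal equations X'X b = X'y, solved through the Cholesky
    # factor L, where X'X = L L'. This fails or loses accuracy
    # when the regressors are close to collinear.
    XtX = X.T @ X
    Xty = X.T @ y
    L = np.linalg.cholesky(XtX)
    z = np.linalg.solve(L, Xty)      # solve L z = X'y
    return np.linalg.solve(L.T, z)   # solve L' b = z

# Hypothetical data: a constant plus 9 independent normal regressors.
rng = np.random.default_rng(12345)
X = np.column_stack([np.ones(500), rng.standard_normal((500, 9))])
beta_true = np.arange(1.0, 11.0)
y = X @ beta_true + rng.standard_normal(500)

b_svd = ols_svd(X, y)
b_chol = ols_cholesky(X, y)

# Largest absolute difference between the two sets of estimates.
max_diff = np.max(np.abs(b_svd - b_chol))
print(max_diff)
```

The generic np.linalg.solve is used for the two triangular systems only to keep the sketch NumPy-only; a dedicated triangular solver would be the more natural choice in real code.
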
Hm, I'm seeing something weird here: on my home laptop (an 8-core, 4 real
cpus machine) I'm getting results that are fairly consistent with yours. On
the other hand, trying the same on my work PC (a 12-core box) I'm seeing
this:
----------------------------------------------
Old code (with set svd on):
----------------------------------------------
gretl (mols): 7.09208 seconds
maxerr (gretl) = 0.000000000000010
python (linalg.lstsq): 3.92083 seconds
maxerr (numpy) = 0.000000000000008
----------------------------------------------
New code (with set svd on):
----------------------------------------------
gretl (mols): 3.79993 seconds
maxerr (gretl) = 0.000000000000008
python (linalg.lstsq): 3.89591 seconds
----------------------------------------------
New code (with set svd on):
Should that be, with set svd off?
----------------------------------------------
gretl (mols): 120.903 seconds
maxerr (gretl) = 0.000000000000005
python (linalg.lstsq): 5.63783 seconds
It looks as if turning svd off makes mols a good deal slower (by the way: all
12 cpus go at 100% for the whole time). I have no time now to check why this
happens, but I'll throw an idea in: perhaps, for some reason, I'm getting the
QR decomposition instead of Cholesky?
If that were the case you should see on stderr,
"gretl_matrix_multi_ols: switching to QR decomp"
Are these 12 "real" cores? The timings I showed were from a quad
core box with hyperthreads and I set OMP_NUM_THREADS=4 to prevent
hyperthreading, which slows things down. (If OpenBLAS doesn't use
OMP, you'd set OPENBLAS_NUM_THREADS if wanted.)
Allin
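
As a rough sketch of the thread-count suggestion on the Python side of the comparison: the two variables can be set from within the script, provided that happens before numpy (and hence the BLAS library) is first loaded. The helper name limit_blas_threads and the value 4 are just for illustration; for gretl itself the same variables would be set in the shell before starting the program.

```python
import os

def limit_blas_threads(n):
    # Hypothetical helper: cap the number of threads used by an
    # OpenMP-based BLAS (OMP_NUM_THREADS) or by an OpenBLAS build
    # that uses its own threading (OPENBLAS_NUM_THREADS).
    # Must run before numpy is imported for the first time.
    value = str(n)
    os.environ["OMP_NUM_THREADS"] = value
    os.environ["OPENBLAS_NUM_THREADS"] = value
    return value

# Assumed here: 4 physical cores, as on the quad-core box above.
# Note that os.cpu_count() reports logical CPUs, hyperthreads included.
threads = limit_blas_threads(4)
print("BLAS threads capped at", threads)

# import numpy as np   # import only after the variables are set
```
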