On Fri, 27 Dec 2019, Sven Schreiber wrote:
Hi,
I'm observing an unexpectedly drastic slowdown of mols() when switching
to SVD (with 'set svd on').
My toy benchmark is to regress a 200x10 LHS matrix on a 200x80 RHS
matrix 5000 times. The matrices are not redrawn in order to measure just
the mols speed, not the RNG. On an oldish machine I get:
~ 3 sec without svd
~ 60 sec with svd
For comparison, Python/Numpy's linalg.lstsq takes ~ 8 sec and is said to
use SVD always.
There are at least two possible differences between gretl/svd and numpy:
1) numpy might be using Intel's Math Kernel Library (MKL) in the
background instead of standard Lapack (coming from the conda/Anaconda
distro).
2) The web says that numpy uses Lapack's DGELSD
(http://www.netlib.org/lapack/explore-html/d7/d3b/group__double_g_esolve_g...),
whereas judging from lib/src/gretl_matrix.c in the function
gretl_matrix_SVD_ols (and maybe gretl_matrix_multi_SVD_ols? - don't know
what that is for) it seems that gretl uses DGELSS.
On the Netlib Lapack page it says
(https://www.netlib.org/lapack/lug/node27.html) that "The subroutine
xGELSD is significantly faster than its older counterpart xGELSS", and
it labels DGELSD with "solve LLS using divide-and-conquer SVD".
So, any negative side effects blocking the switch to DGELSD, or is it
something like a free lunch?
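For reference, the DGELSS/DGELSD gap can be probed from Python without touching gretl at all: scipy.linalg.lstsq exposes a lapack_driver switch. A sketch (shapes chosen to match the benchmark above; this says nothing about gretl's own code path):

```python
import time
import numpy as np
from scipy.linalg import lstsq

rng = np.random.default_rng(12345999)
n, k, g = 200, 80, 10                    # same shapes as the benchmark
X = rng.standard_normal((n, k))
Y = X @ np.ones((k, g)) + rng.standard_normal((n, g))

# time both SVD-based LAPACK drivers on the same data
for driver in ("gelss", "gelsd"):
    t0 = time.time()
    for _ in range(2000):
        B, *_ = lstsq(X, Y, lapack_driver=driver)
    print("%s: %.3f seconds" % (driver, time.time() - t0))
```

The relative timings will of course depend on the BLAS/LAPACK scipy is linked against, just as with gretl.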
It appears to be a free lunch, pretty much. The speed-up is
significant but not huge, something like 30 percent. That's now in
git. But the primary source of difference between the gretl and
numpy times that you quote must be due to the respective Blas/Lapack
implementations.
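A quick way to settle the MKL-vs-OpenBLAS question on the numpy side: show_config() reports the BLAS/LAPACK numpy was built against. The exact output format varies across numpy versions, so a rough sketch is to capture and grep it:

```python
import io
import contextlib
import numpy as np

# show_config() prints build-time BLAS/LAPACK info to stdout;
# capture it so we can search for "mkl" or "openblas"
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()
info = buf.getvalue().lower()
print("mkl:", "mkl" in info, "| openblas:", "openblas" in info)
```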
Here's what I'm seeing on a fairly elderly PC running Fedora with
gretl linked against openblas. I reduced the number of replications
to 2000 (impatience) and created a baseline of accuracy by running
mpols with GRETL_MP_BITS=4096.
# gretl svd on, using DGELSS
gretl (mols): 8.43043 seconds
maxerr (gretl) = 0.000000000000007
python (linalg.lstsq): 5.63282 seconds
maxerr (python) = 0.000000000000007
# gretl svd on, using DGELSD
gretl (mols): 6.00157 seconds
maxerr (gretl) = 0.000000000000007
python (linalg.lstsq): 5.62002 seconds
maxerr (python) = 0.000000000000007
# gretl svd off (Cholesky)
gretl (mols): 0.396789 seconds
maxerr (gretl) = 0.000000000000005
python (linalg.lstsq): 5.62659 seconds
maxerr (python) = 0.000000000000007
From this, three points are apparent: (1) as stated above, DGELSD is
close to 30% faster than DGELSS; (2) numpy is a little faster than
us on SVD; and (3) you get just as accurate results an order of
magnitude faster via Cholesky (the mols default) provided the
regressors (as here) are not horribly collinear.
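For illustration only (not gretl's actual implementation), the Cholesky route amounts to solving the normal equations X'X B = X'Y after factorizing X'X, which is why it breaks down when the regressors are collinear but is much cheaper when they are not:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(12345999)
X = rng.standard_normal((200, 80))
Y = X @ np.ones((80, 10)) + rng.standard_normal((200, 10))

c = cho_factor(X.T @ X)          # raises LinAlgError if X'X is not
B_chol = cho_solve(c, X.T @ Y)   # positive definite (collinear X)
B_svd = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.max(np.abs(B_chol - B_svd)))  # tiny for well-conditioned X
```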
My test script:
<hansl>
set verbose off
set seed 12345999
n = 200 # observations
k = 80 # regressors
g = 10 # equations
matrix X = mnormal(n,k)
matrix U = mnormal(n,g)
matrix B = ones(k,g)
matrix Y = X*B + U
# get results in multiple precision as baseline;
# I used GRETL_MP_BITS=4096
Bmp = zeros(k,g)
loop j=1..g -q
    Bmp[,j] = mpols(Y[,j],X)
endloop
# set svd on # or not
matrix Bgr
set stopwatch
loop 2000 -q
    Bgr = mols(Y,X)
endloop
printf "gretl (mols): %#g seconds\n", $stopwatch
printf "maxerr (gretl) = %.15f\n", max(abs(Bgr - Bmp))
mwrite(Y, "Y.mat", 1)
mwrite(X, "X.mat", 1)
foreign language=python
import numpy as np
import time as tm
Y = gretl_loadmat('Y.mat', 1)
X = gretl_loadmat('X.mat', 1)
# numpy OLS
t0 = tm.time()
# range(1, 2000) would do only 1999 replications; match gretl's 2000
for i in range(2000):
    B = np.linalg.lstsq(X, Y, rcond=None)[0]
print("python (linalg.lstsq): %#g seconds" % (tm.time() - t0))
gretl_export(np.asmatrix(B), 'Bpy.mat', 1)
end foreign
matrix Bpy = mread("Bpy.mat", 1)
printf "maxerr (numpy) = %.15f\n", max(abs(Bpy - Bmp))
</hansl>
Allin