On 28.12.2019 at 05:06, Allin Cottrell wrote:
> On Fri, 27 Dec 2019, Sven Schreiber wrote:
>> So, any negative side effects blocking the switch to DGELSD, or is it
>> something like a free lunch?
> It appears to be a free lunch, pretty much. The speed-up is
> significant but not huge, something like 30 percent. That's now in
> git. But the primary source of difference between the gretl and numpy
> times that you quote must be due to the respective Blas/Lapack
> implementations.
Good, thanks!
> From this, three points are apparent: (1) as stated above, DGELSD is
> close to 30% faster than DGELSS; (2) numpy is a little faster than us
> on SVD; and (3) you get just as accurate results an order of magnitude
> faster via Cholesky (the mols default) provided the regressors (as
> here) are not horribly collinear.
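
Just to make the Cholesky-vs-SVD point concrete for myself, here is a
rough numpy sketch (a toy example with made-up dimensions, not Allin's
actual benchmark script):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))      # reasonably well-conditioned regressors
y = X @ np.arange(1.0, 11.0) + rng.standard_normal(1000)

# SVD route: numpy.linalg.lstsq goes through LAPACK's dgelsd
b_svd = np.linalg.lstsq(X, y, rcond=None)[0]

# Cholesky route via the normal equations (the idea behind gretl's mols default)
L = np.linalg.cholesky(X.T @ X)
b_chol = np.linalg.solve(L.T, np.linalg.solve(L, X.T @ y))

print(np.max(np.abs(b_svd - b_chol)))    # negligible unless X is close to collinear

(The generic solve() doesn't exploit the triangular structure of L, of
course; it's only meant to show the two routes side by side.)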
BTW, I came across this whole question because the Python version of
the Tensorflow package (from Google) has a different implementation of
linalg.lstsq: there you can explicitly switch between fast (= Cholesky)
and not so fast (= SVD, I think). (There I could also vectorize away
the loop by using a third dimension, which makes it a little faster
still, but that's a different story. Plain numpy also seems to be
moving in this direction starting with version 1.8, but it hasn't
reached numpy.linalg.lstsq yet.)
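
Roughly what I mean, sketched from memory of the TF docs (the shapes
are just made up, and I haven't double-checked the details of the slow
path):

import tensorflow as tf

# a batch of 2000 small regressions stacked along a leading (third) dimension
X = tf.random.normal((2000, 100, 5))
y = tf.random.normal((2000, 100, 1))

b_fast = tf.linalg.lstsq(X, y, fast=True)    # default: Cholesky on the normal equations
b_slow = tf.linalg.lstsq(X, y, fast=False)   # slower but numerically more robust path

Both results come back with shape (2000, 5, 1), so the explicit Python
loop over the individual problems disappears.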
> for i in range(1, 2000):
Strictly speaking, this loop will run 1999 times, not 2000. You'd have
to use range(0, 2000) or just range(2000). This isn't hansl, hehe!
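
A quick check in the interpreter, just to illustrate the off-by-one:

>>> len(range(1, 2000))
1999
>>> len(range(2000))
2000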
cheers
sven