On 13.01.2020 at 20:52, Allin Cottrell wrote:
We're on the same page! I just pushed to git an openblas-specific
patch for DSYEV* to store the prior number of threads, obtained via
openblas_get_num_threads(); set the number of threads to 1 via
openblas_set_num_threads(); then restore the prior thread count.
The bad news is that this really does seem to be a design flaw in
openblas -- I've now tried matrices of order 400 and multi-threading
is still slower for DSYEV*. The good news is that DGEEV runs a bit
faster with multi-threading for really big matrices (at least on
Sandy Bridge). It remains to be seen how this pans out for other LAPACK
functions.
I have some preliminary evidence that it also affects cholesky() and
qrdecomp() with tiny matrices, at least on Windows. (This is not with
the latest snapshot, but if I understand correctly, the underlying
routines DPOTRF/DPOTRS and DGEQRF/DORGQR are not yet called any
differently there anyway.)
Very roughly speaking, the speedup from using just one OpenMP thread
seems to be about 15%.
I don't have time to do a more thorough analysis right now, unfortunately.
cheers
sven