On 22.01.2020 at 22:44, Allin Cottrell wrote:
4 Hyper-Threaded Core i7-7700U CPUs @ 4.20GHz (haswell)
OMP_NUM_THREADS   best of 3 runs
default: 8        1.9 s
4                 1.5 s
1                 1.4 s
4 Hyper-Threaded Core i7-2600 CPUs @ 3.40GHz (sandybridge)
OMP_NUM_THREADS   best of 3 runs
default: 8        3.9 s
4                 3.0 s
1                 2.7 s
It seems pretty clear that in the SMT/HT case we should limit the number
of blas threads to the number of actual cores by default -- I'll work on
that.
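To make the proposed default concrete, here is a rough Python sketch of the rule "no more BLAS threads than physical cores". It is not gretl code: the function names and the threads_per_core parameter are made up for illustration, and the caller is assumed to already know how many logical CPUs and SMT threads per core the machine has. It also assumes the BLAS library reads OMP_NUM_THREADS when it is initialized, so the variable would have to be set before the library is loaded.

```python
import os


def default_blas_threads(logical_cpus, threads_per_core=2):
    """Return a thread count equal to the number of physical cores.

    Both arguments are assumed to be supplied by the caller; how they
    are detected is platform-specific and not covered here.
    """
    if logical_cpus < 1 or threads_per_core < 1:
        raise ValueError("CPU counts must be positive")
    # With SMT/HT, several logical CPUs share one physical core.
    return max(1, logical_cpus // threads_per_core)


def apply_default(logical_cpus, threads_per_core=2, env=os.environ):
    """Set OMP_NUM_THREADS unless the user has already chosen a value."""
    if "OMP_NUM_THREADS" not in env:
        env["OMP_NUM_THREADS"] = str(
            default_blas_threads(logical_cpus, threads_per_core)
        )
    return env["OMP_NUM_THREADS"]
```

On the two machines above (8 logical CPUs, 2 threads per core) this would pick 4 threads, while an explicit user setting such as OMP_NUM_THREADS=1 is left alone.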
But beyond that point I think we're seriously getting into the weeds.
Restricting openblas to single-threaded operation may be advantageous
for some combinations of architecture, openblas variant, lapack function
called, OS and problem-size but it's very hard to generalize.
Thanks Allin, I guess that's true. In principle of course it's the
responsibility of openblas to get that right, and from the discussions
over there I think they are aware of that but probably just don't have
the resources.
Maybe the matrix_perf package could be extended to run a bunch of
benchmarks on cholesky(), svd(), eigen() with different input sizes, so
that over time we can accumulate more evidence?
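Just to illustrate the shape such a benchmark run could take, here is a small Python sketch, not the matrix_perf package itself. It times each workload over a grid of input sizes and keeps the best of several runs, as in the tables above. All names are hypothetical, and the pure-Python Cholesky routine is only a stand-in for the real library calls to cholesky(), svd() and eigen().

```python
import random
import time


def best_of(fn, runs=3):
    """Return the smallest wall-clock time (seconds) over `runs` calls."""
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


def cholesky(a):
    """Lower-triangular Cholesky factor of an SPD matrix (list of lists).

    Stand-in workload only; a real benchmark would call the library.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (a[i][i] - s) ** 0.5
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def spd_matrix(n, seed=0):
    """Build a symmetric positive definite test matrix B*B' + n*I."""
    rng = random.Random(seed)
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    return [
        [sum(b[i][k] * b[j][k] for k in range(n)) + (n if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def run_grid(workloads, sizes, runs=3):
    """Time every workload at every size; keys are (name, size) pairs."""
    results = {}
    for name, fn in workloads.items():
        for n in sizes:
            a = spd_matrix(n)
            results[(name, n)] = best_of(lambda: fn(a), runs)
    return results
```

A call such as run_grid({"cholesky": cholesky}, [50, 100, 200]) returns a dictionary of timings keyed by function name and size, which could be repeated under different OMP_NUM_THREADS settings and collected over time.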
cheers
sven