On Wed, 22 Jan 2020, Sven S wrote:
On 22.01.20 at 14:47, Marcin Błażejowski wrote:
>
> Hi Sven,
>
> my machine is: 4 Hyper-Threaded Core i7-8550U CPU @ 1.80GHz.
>
> My results:
>
> OMP...           OPENBLAS...      best of 3 runs
> <unset/default>  <unset/default>   7.54
> 4                4                 7.24
> 4                1                 8.36
> 1                4                12.66
> 1                1                12.51
Thanks Marcin, that's interesting! Actually I have the same CPU in a laptop,
so I could cross-check how the Windows package influences the whole thing.
(I'm assuming you were on Linux with that test?)
Another set of timings, from two machines. I'm using an OMP-enabled
build of openblas so I've omitted the OPENBLAS_NUM_THREADS column,
which is ignored by the library.
4 Hyper-Threaded Core i7-7700U CPUs @ 4.20GHz (haswell)

  OMP_NUM_THREADS   best of 3 runs
  default: 8        1.9 s
  4                 1.5 s
  1                 1.4 s

4 Hyper-Threaded Core i7-2600 CPUs @ 3.40GHz (sandybridge)

  OMP_NUM_THREADS   best of 3 runs
  default: 8        3.9 s
  4                 3.0 s
  1                 2.7 s
It seems pretty clear that in the SMT/HT case we should limit the
number of blas threads to the number of actual cores by default --
I'll work on that.
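
For illustration only, here is a rough sketch of what such a default could look like. It is written in Python for brevity and is not the actual implementation. It assumes a Linux-style /proc/cpuinfo that reports "physical id" and "core id" fields, and the helper names (count_physical_cores, default_blas_threads) are hypothetical.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style text."""
    cores = set()
    # Each logical processor is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "core id" in fields:
            # SMT siblings share the same (package, core) pair.
            cores.add((fields.get("physical id", "0"), fields["core id"]))
    return len(cores)


def default_blas_threads(cpuinfo_path="/proc/cpuinfo"):
    """Return the physical core count, falling back to the logical count."""
    try:
        with open(cpuinfo_path) as f:
            n = count_physical_cores(f.read())
    except OSError:
        n = 0  # no /proc/cpuinfo (assumption: non-Linux system)
    return n or os.cpu_count() or 1


if __name__ == "__main__":
    # Only supply a default; a value the user already exported wins.
    os.environ.setdefault("OMP_NUM_THREADS", str(default_blas_threads()))
    print(os.environ["OMP_NUM_THREADS"])
```

Note that an environment variable set this way only takes effect if it is in place before the BLAS library is loaded.
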
But beyond that point I think we're seriously getting into the
weeds. Restricting openblas to single-threaded operation may be
advantageous for some combinations of architecture, openblas
variant, lapack function called, OS and problem size, but it's very
hard to generalize.
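
Since it's hard to generalize, anyone curious can repeat the "best of 3 runs" comparison on their own machine. Below is a minimal sketch of such a harness in Python. The function names (thread_env, best_of) are hypothetical, and the command to time is whatever you pass on the command line; nothing here is specific to any particular program.

```python
import os
import subprocess
import sys
import time


def thread_env(omp_threads):
    """Copy the environment with OMP_NUM_THREADS set, or unset if None."""
    env = dict(os.environ)
    if omp_threads is None:
        env.pop("OMP_NUM_THREADS", None)  # let the library pick its default
    else:
        env["OMP_NUM_THREADS"] = str(omp_threads)
    return env


def best_of(cmd, omp_threads=None, runs=3):
    """Run cmd `runs` times; return the shortest wall-clock time in seconds."""
    env = thread_env(omp_threads)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Placeholder workload if no command is given on the command line.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    for setting in (None, 4, 1):
        label = "default" if setting is None else str(setting)
        print(f"{label:>8}  {best_of(cmd, setting):.2f} s")
```

The timings include process start-up, so this is only meaningful for jobs that run long enough to dominate that overhead.
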
Allin