On 22.01.2020 at 22:44, Allin Cottrell wrote:
On Wed, 22 Jan 2020, Sven S wrote:
> On 22.01.20 at 14:47, Marcin Błażejowski wrote:
>> my machine is: 4 Hyper-Threaded Core i7-8550U CPU @ 1.80GHz.
>> My results:
>> OMP...           OPENBLAS...      best of 3 runs
>> <unset/default>  <unset/default>   7.54
>> 4                4                 7.24
>> 4                1                 8.36
>> 1                4                12.66
>> 1                1                12.51
> Thanks Marcin, that's interesting! Actually I have the same CPU in a
> laptop, so I could cross-check how the Windows package influences the
> whole thing.
So with that same i7-8550U I get on Windows 10 (only OMP varying):
It still isn't clear to me whether you ended up with OpenBLAS or Netlib
actually being active. But apart from that, the single-threaded
performance here might suggest that the CPU's Turbo Boost isn't supported
by the Linux drivers on your system? For the single thread the CPU goes
up all the way to 3.9GHz. Of course in some sense it's comparing
apples to oranges, but OTOH that's the reality of the hardware the stuff
is running on...
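To put Marcin's quoted timings on a common footing, here is a minimal sketch that expresses each row as a speedup relative to the 1/1 row. The numbers are copied from his table; the column labels there are truncated, so the tuple keys simply mirror the two columns, and the helper name `speedups` is made up for illustration.

```python
# Best-of-3 timings from Marcin's table, keyed by the two columns
# (OMP..., OPENBLAS...). "unset" stands for <unset/default>.
timings = {
    ("unset", "unset"): 7.54,
    (4, 4): 7.24,
    (4, 1): 8.36,
    (1, 4): 12.66,
    (1, 1): 12.51,
}


def speedups(timings, baseline=(1, 1)):
    """Return baseline_time / time for every setting (hypothetical helper)."""
    base = timings[baseline]
    return {setting: base / t for setting, t in timings.items()}


for setting, ratio in speedups(timings).items():
    print(setting, round(ratio, 2))
```

On these numbers the 4/4 row comes out roughly 1.7 times faster than 1/1, while the 1/4 row is about level with 1/1.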
But beyond that point I think we're seriously getting into the weeds.
Restricting OpenBLAS to single-threaded operation may be advantageous
for some combinations of architecture, OpenBLAS variant, LAPACK
function called, OS and problem size, but it's very hard to generalize.
I was finally able to get a speed advantage for 4 threads when I
increased the problem size in the script to T=1000, N=40. So of course
you're right that single-thread is not the universal solution. But it
does seem that OpenBLAS tries multithreading much too aggressively.
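Since the outcome depends so much on machine and problem size, it may be easiest to rerun the grid locally. Below is a rough sketch of such a harness, not the script used in this thread. It assumes the truncated column labels in the table refer to the environment variables OMP_NUM_THREADS and OPENBLAS_NUM_THREADS, and that the benchmark can be started as an external command; the function names and the command are placeholders.

```python
import os
import subprocess
import time

# Assumption: these are the variables behind the "OMP..." and
# "OPENBLAS..." columns; None means "leave unset / use the default".
SETTINGS = [(None, None), (4, 4), (4, 1), (1, 4), (1, 1)]


def run_once(cmd, omp=None, openblas=None):
    """Run cmd once with the given thread settings; return wall time in seconds."""
    env = os.environ.copy()
    for name, value in (("OMP_NUM_THREADS", omp),
                        ("OPENBLAS_NUM_THREADS", openblas)):
        if value is None:
            env.pop(name, None)      # make sure the variable is really unset
        else:
            env[name] = str(value)
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def best_of(cmd, omp=None, openblas=None, runs=3):
    """Return the fastest of `runs` repetitions."""
    return min(run_once(cmd, omp, openblas) for _ in range(runs))


def benchmark(cmd, settings=SETTINGS, runs=3):
    """Map each (omp, openblas) setting to its best-of-`runs` time."""
    return {s: best_of(cmd, s[0], s[1], runs=runs) for s in settings}
```

Calling something like `benchmark(["./run_benchmark.sh"])` (a hypothetical command) returns a dict of timings per setting. Note that the wall time includes process startup, so for very small problems that overhead can dominate.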