On 28.01.2020 11:00, Sven Schreiber wrote:
> On 28.01.2020 at 10:53, Marcin Błażejowski wrote:
>> On 28.01.2020 09:18, Sven Schreiber wrote:
>>> But openblas alone doesn't explain the equality of the 4-thread and
>>> the single-thread outcome, at least comparing to the Windows results.
>>> Or did you compile openblas yourself? (Perhaps with a different dgemm
>>> threshold?)
>>>
>> This is 'libopenblas-base:amd64 0.3.7+ds-7', which is a transitional
>> package for (in my case) 'libopenblas0:amd64 0.3.7+ds-7', which is a
>> metapackage; the real one is 'libopenblas0-openmp:amd64 0.3.7+ds-7'.
> Then I don't see an obvious explanation why you get good performance
> with OMP_NUM_THREADS=4 and I did not. When Allin upgraded to
> openblas 0.3.7 in the Windows snapshots this didn't change. Maybe some
> other compile options are still different?
It's possible, since I use quite aggressive gcc optimisations:
-march=skylake -O3 -pipe
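
To make the 1-thread versus 4-thread comparison easy to repeat on both
machines, here is a minimal Python sketch. It only sets OMP_NUM_THREADS
for a child process and records the best wall-clock time. The names
thread_env, time_command and BENCH_CMD are made up for illustration, and
BENCH_CMD is a do-nothing placeholder that should be replaced with the
actual benchmark command.

```python
import os
import subprocess
import sys
import time


def thread_env(n_threads, base=None):
    """Return a copy of the environment with OMP_NUM_THREADS set."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(n_threads)
    return env


def time_command(cmd, n_threads, repeats=3):
    """Run cmd `repeats` times; return the best wall time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=thread_env(n_threads), check=True,
                       stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder workload (assumption): substitute the real benchmark command.
BENCH_CMD = [sys.executable, "-c", "pass"]

for n in (1, 4):
    print(f"OMP_NUM_THREADS={n}: {time_command(BENCH_CMD, n):.3f} s")
```

Taking the best of a few runs reduces noise from other processes; it
says nothing about why two builds differ, only whether they do.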
Cheers,
Marcin
--
Marcin Błażejowski