On 17.01.2020 12:13, Sven Schreiber wrote:
Hi,
more on the issue of the effects of (OpenBLAS?) multithreading, but this
time unrelated to MPI. There are possibly some pretty large gains here,
if my results are correct.
I'm attaching my test script which I ran with yesterday's snapshot on
the following machine specs:
nproc = 4
blascore = "Sandybridge"
os = "windows"
blas = "openblas"
omp_num_threads = 4
ncores = 4
omp = 1
blas_parallel = "OpenMP"
I varied the environment variables OMP_NUM_THREADS and
OPENBLAS_NUM_THREADS:
OMP_NUM_THREADS   OPENBLAS_NUM_THREADS   best of 3 runs
<unset/default>   <unset/default>        6.4 s
4                 4                      6.4 s
4                 1                      6.4 s
1                 4                      2.9 s
1                 1                      2.9 s
So the OPENBLAS_NUM_THREADS setting is irrelevant, but using only a
single OMP thread is twice as fast as gretl's default setting!
(In the script you see that the matrix dimensions are 200x10 and 200x80.)
I guess the underlying Cholesky routine should be checked as the
possible root cause.
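For anyone who wants to reproduce a single run, here is a minimal sketch of how the environment can be pinned before launching gretl from a POSIX shell. Note that `gretlcli` is gretl's command-line binary, while `test.inp` is just a placeholder name for the attached script (an assumption on my part):

```shell
# Pin both threading layers to a single thread before launching gretl.
# "test.inp" is a placeholder for the attached benchmark script.
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
# The actual timed run would then be:
#   time gretlcli -b test.inp
echo "OMP=$OMP_NUM_THREADS OPENBLAS=$OPENBLAS_NUM_THREADS"
```

Exported this way, the settings apply to the gretl process and any BLAS threads it spawns.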
cheers
sven
Hi Sven again,
my machine is still a 4-core Hyper-Threaded Core i7-8550U CPU @
1.80GHz, but this time I used Allin's new compiler/linker settings for
Debian-like systems to force linking against OpenBLAS (0.3.7 + OpenMP).
My results:
OMP_NUM_THREADS   OPENBLAS_NUM_THREADS   best of 3 runs (s)
<unset/default>   <unset/default>        2.067
4                 4                      2.83
4                 1                      2.83
1                 4                      2.81
1                 1                      2.80
------------------------------------------------------------
8                 8                      2.86
8                 4                      2.81
4                 8                      2.86
8                 1                      2.81
1                 8                      2.87
------------------------------------------
I'm attaching a shell script that runs the test for all combinations.
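Something along these lines; a sketch only, assuming the gretl CLI binary is `gretlcli` and the benchmark script is called `test.inp` (both names are assumptions, not taken from the attachment):

```shell
# Enumerate all OMP/OpenBLAS thread-count combinations tested above and
# print the command line for each run. This is a dry run: replace "echo"
# with the real invocation (e.g. prefix with "time") to benchmark.
for omp in 1 4 8; do
  for blas in 1 4 8; do
    echo "OMP_NUM_THREADS=$omp OPENBLAS_NUM_THREADS=$blas gretlcli -b test.inp"
  done
done
```

Setting the variables on the command line like this scopes them to each individual run, so no cleanup between runs is needed.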
Cheers,
Marcin
--
Marcin Błażejowski