On Thu, 16 Jan 2020, Sven Schreiber wrote:
Hi, here's a question about what is really affected by setting the
number of threads used by OpenMP via an environment variable and/or via
the option to an MPI block (as explained in the gretl + MPI documentation).
My test script has an MPI block as its main part which calls a certain
worker function. Basically it's a VAR bootstrap, so it makes heavy use of
mnormal(), varsimul() and mols().
The MPI block is specified with --omp-threads=1. I thought that means
that no multithreading is done at the OpenMP level.
My BLAS is OpenBLAS, and it shows up with blas_parallel = "OpenMP".
However: when I run this script (on a Jan-11 Windows snapshot) with the
environment set to OMP_NUM_THREADS=1 it takes about 1/3 _less_ time than
with OMP_NUM_THREADS=4, on a CPU with 4 physical cores. This happens for
any variation of the number of parallel MPI processes.
OK, so given our recent discussion about the problems of openblas
multithreading, the ranking of the timings is maybe not so surprising
anymore. But: why does OMP_NUM_THREADS=1 have an additional effect over
--omp-threads=1 when everything happens inside the MPI block?
I hope the problem was clearly explained.
Just about -- but there's one thing I need to know: when the script is
taking a relatively long time, do you or don't you have
OMP_NUM_THREADS=4 set in the environment?
Allin