Hi,
more on the effects of (OpenBLAS?) multithreading, but this time
unrelated to MPI. There may be some pretty large gains to be had here,
if my results are correct.
I'm attaching my test script, which I ran with yesterday's snapshot on
a machine with the following specs:
nproc = 4
blascore = "Sandybridge"
os = "windows"
blas = "openblas"
omp_num_threads = 4
ncores = 4
omp = 1
blas_parallel = "OpenMP"
I varied the environment variables OMP_NUM_THREADS and OPENBLAS_NUM_THREADS:
OMP_NUM_THREADS    OPENBLAS_NUM_THREADS    best of 3 runs
<unset/default>    <unset/default>         6.4 s
4                  4                       6.4 s
4                  1                       6.4 s
1                  4                       2.9 s
1                  1                       2.9 s
So the OPENBLAS_NUM_THREADS setting is irrelevant, but using only a
single OMP thread is more than twice as fast as gretl's default setting
(2.9 s vs. 6.4 s)!
(In the script you see that the matrix dimensions are 200x10 and 200x80.)
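For anyone wanting to repeat the sweep without setting the variables by hand, here is a rough Python sketch of a driver. It is only an illustration under some assumptions: the helper names (best_of, sweep) are made up, the command in the example is a harmless placeholder, and it measures wall-clock time of the whole process (including startup), which is not necessarily how the attached script does its timing.

```python
import os
import subprocess
import sys
import time

# The five combinations from the table; None means "leave unset/default".
COMBOS = [(None, None), (4, 4), (4, 1), (1, 4), (1, 1)]


def best_of(cmd, env_overrides, runs=3):
    """Run cmd `runs` times and return the fastest wall-clock time in seconds."""
    env = os.environ.copy()
    for name, value in env_overrides.items():
        if value is None:
            env.pop(name, None)      # make sure the variable is really unset
        else:
            env[name] = str(value)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


def sweep(cmd, combos=COMBOS, runs=3):
    """Time cmd under each (OMP_NUM_THREADS, OPENBLAS_NUM_THREADS) pair."""
    results = {}
    for omp, oblas in combos:
        overrides = {"OMP_NUM_THREADS": omp, "OPENBLAS_NUM_THREADS": oblas}
        results[(omp, oblas)] = best_of(cmd, overrides, runs)
    return results


if __name__ == "__main__":
    # Placeholder command (assumption): substitute the real call,
    # e.g. gretlcli in batch mode with the test script.
    cmd = [sys.executable, "-c", "pass"]
    for (omp, oblas), secs in sweep(cmd).items():
        print(f"OMP={omp!s:<6} OPENBLAS={oblas!s:<6} {secs:.2f} s")
```

Since process startup is included, the absolute numbers would differ from timings taken inside the script, but the ranking of the settings should still show up if the effect is as large as above.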
I guess the underlying Cholesky routine should also be checked, as the
possible root cause.
cheers
sven