On 11.01.2020 at 01:33, Allin Cottrell wrote:
> Another point is that when openblas includes a parallelized version of a
> lapack function, by default all available threads get used in its
> execution. But today's "consumer" CPUs typically offer twice as many
> threads as real/physical cores (hyper-threading), and for dense
> computations such as lapack functions, using all threads can slow things
> down quite a bit. This will probably hurt most for tiny input, where
> multi-threading isn't really justified to start with.
Hm, is openblas really so naive? Or to put it differently, is it the
responsibility of the caller to pick the non-parallel version if needed?
> Anyway, I'm appending below a modified version of your script, with a
> switch to control tiny versus bigger input. I recommend running this
> with OMP_NUM_THREADS set in the environment to the number of physical
> cores on your system. On my (kinda elderly) home system (4 cores, 8
> threads max), here's what I'm seeing:
Hm, I'm using a perhaps even older 4-core system _without_ HT, which
reports omp_num_threads = 4 in $sysinfo, and with the brand-new snapshot I get:
tiny YES: ratio 0.11
tiny NO: ratio 0.37
So eigensym looks pretty bad "always". Since Ioannis had a newer PC,
this doesn't look like a pre-Haswell CPU issue. Perhaps something
Windows-specific (with OpenMP)?
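For anyone wanting to repeat the comparison with threads capped at the physical core count, as recommended above, here is a minimal sketch, assuming Python is available to launch the test. Since the OpenMP runtime normally reads OMP_NUM_THREADS when a process starts, the variable is set in the environment of a child process rather than changed from inside a running program. The core count is passed in by hand, because os.cpu_count() reports logical CPUs (hyper-threads), not physical cores. The helper names, the commented-out gretlcli command line and the script file name are hypothetical.

```python
import os
import subprocess
from typing import Mapping, Optional


def env_with_omp_threads(physical_cores: int,
                         base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of base_env (default: os.environ) with OMP_NUM_THREADS set.

    physical_cores must be supplied by the user: os.cpu_count() counts
    logical CPUs, which on a hyper-threaded machine is usually twice the
    number of physical cores (e.g. 8 on a 4-core/8-thread CPU).
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    # Copy so the caller's mapping (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(physical_cores)
    return env


def run_with_omp_threads(cmd: list, physical_cores: int) -> int:
    """Run cmd as a child process with OMP_NUM_THREADS pinned; return its exit code."""
    result = subprocess.run(cmd, env=env_with_omp_threads(physical_cores))
    return result.returncode


if __name__ == "__main__":
    # Hypothetical invocation; adjust program and script name to your setup:
    # run_with_omp_threads(["gretlcli", "-b", "eigensym_test.inp"], physical_cores=4)
    print(env_with_omp_threads(4, {"PATH": "/usr/bin"}))
```

Running the same script once with the pinned value and once without it should show whether the thread count, rather than the platform, explains the slowdown.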
cheers
sven