On Fri, 10 Jan 2020, Ioannis A. Venetis wrote:
Running the following script I get a huge difference
timerB = 0.010171000
timerA = 0.22064500
ratio = 21.693540
where the fast "timerB" pertains to use of eiggen2() -- now renamed
as eigen(), eigen-decomposition of a general matrix -- and the
seemingly much slower "timerA" pertains to eigensym(),
eigen-decomposition of a symmetric matrix. Both of these employed on
matrices which are by construction symmetric.
There are definitely some points to note here, though I've not been
able to reproduce anything like the asymmetry you found.
First, I think you may need more replications to generate meaningful
results. Second, the matrices you pass to the respective eigen
functions are tiny, 4 x 4. There's nothing wrong with that, but it
means that any function that's at all "optimized" (let alone
parallelized) is almost sure to take longer than the plainest of
plain vanilla algorithms.
We recently (git, snapshots) made eigensym() default to a cleverly
optimized lapack variant, dsyevr(). I've just recently changed that
so that we use use dsyevr only if the order of the input matrix is
at least 10. That helps a little in the case of tiny input.
Another point is that when openblas includes a parallelized version
of a lapack function, by default all available threads get used in
its execution. But today's "consumer" CPUs typically offer twice as
many threads as real/physical cores (hyper-threading) and on dense
computations such as lapack functions using all threads can slow
things down quite a bit. This will probably hurt most for tiny input
where multi-threading isn't really justified to start with.
Anyway, I'm appending below a modified version of your script, with
a switch to control tiny versus bigger input. I recommend running
this with OMP_NUM_THREADS set in the environment to the number of
physical cores on your system. On my (kinda elderly) home system (4
cores, 8 threads max) here's what I'm seeing:
With OMP_NUM_THREADS=4
With tiny_mat = 1 (order 4, 20000 replications):
eigen() time: 0.2217s
eigensym() time: 0.2200s
ratio of eigen() time to eigensym() time: 1.00768
With tiny_mat = 0 (order 40, 2000 replications):
eigen() time: 1.1705s
eigensym() time: 0.6310s
ratio of eigen() time to eigensym() time: 1.85502
Script:
<hansl>
set verbose off
set seed 8402212 # or whatever
tiny_mat = 0 # or 0 for bigger input!
reps = tiny_mat ? 20000 : 2000
dim = tiny_mat ? 4 : 40
matrices AA = array(reps)
loop i=1..reps -q
A = mrandgen(i, -5, 5, dim, dim)
AA[i] = A'A
endloop
matrix V = {}
matrix W = {}
set stopwatch
loop i=1..reps -q
D = eigen(AA[i], &V, &W)
Tmp = mreverse(msortby(Re(D)~Re(V)', 1))
D = Tmp[,1]
V = Tmp[,2:]'
endloop
eigen_time = $stopwatch
printf "eigen() time: %.4fs\n", eigen_time
set stopwatch
loop i=1..reps -q
D = eigensym(AA[i], &V)
endloop
eigensym_time = $stopwatch
printf "eigensym() time: %.4fs\n", eigensym_time
printf "ratio of eigen() time to eigensym() time: %g\n",
eigen_time / eigensym_time
</hansl>
Allin