Thanks to all of you who have run the matrix_perf tests. This will
be helpful in setting gretl's (internal, default) parameters for
using the system BLAS versus OpenMP (where available), versus our
own single-threaded matrix multiplication code.
Let me address some of the little "issues" that have come up, then
I'll offer a brief discussion of the results people have posted.
Issues
* Artur getting "error in function printvec, line 1": I think your
gretl sources must not be quite up to date. I'm not seeing this
problem.
* "nan" getting printed in the Gflops column in some cases. This is
due to division by zero: the timer offered by the OS is too coarse
to register any elapsed time for the operation. In the latest
matrix_perf update (0.4) I've converted such results to "inf", which
is not true (!) but at least should get the rank-ordering right.
* Some uncertainty over gretl's identification of the BLAS variant
against which gretl is linked. As I mentioned, gretl is relying on
the "ldd" program for this info.
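On the "nan" item above, here is a minimal sketch of that kind of guard.
It is written in Python purely for illustration and is not the actual
matrix_perf code; the function name gflops is made up, and it assumes
the usual count of 2*m*n*k floating-point operations for an (m x k)
times (k x n) product.

```python
import math

def gflops(m, n, k, seconds):
    """Gflops for an (m x k) times (k x n) matrix product.

    Assumes the conventional count of 2*m*n*k floating-point operations.
    """
    flops = 2.0 * m * n * k
    if seconds <= 0.0:
        # The timer registered no elapsed time: report "inf" rather
        # than dividing by zero. Not a true rate, but it sorts first.
        return math.inf
    return flops / seconds / 1e9
```

For example, gflops(1000, 1000, 1000, 2.0) gives 1.0, while a zero
elapsed time gives inf.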
I thought that in Artur's case gretl was misidentifying the BLAS as
the Netlib "reference" version when it must be some optimized
variant; that was because it was showing as 5-7 times as fast as
"vanilla" on some of the experiments. But Artur tells me he's using
Debian/Ubuntu libblas version 1.2.20110419-5, and that's Netlib OK.
I now think that the Debian guys must be building libblas with very
aggressive optimization at the compiler level, probably inducing
vectorization for amd64. So it seems our detection was right, but my
expectation that Netlib BLAS couldn't be so fast relative to vanilla
was wrong.
Then there was Hélio seeing on stderr, "detect blas: confused, found
too many libs!". This was because we have, in the ldd output:
libblas.so.3 => /lib64/libblas.so.3
libf77blas.so.3 => /usr/lib64/atlas/libf77blas.so.3
libcblas.so.3 => /usr/lib64/atlas/libcblas.so.3
The first of these lines suggested plain Netlib to gretl, while the
others suggested the BLAS was ATLAS. I'm not sure about this, but I
suspect it really is ATLAS, with ATLAS BLAS having been placed at
/lib64/libblas.so.3 via the "alternatives" mechanism that Sven
mentioned. It would be good if that could be confirmed; then we can
adjust our detection code accordingly.
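To make the ambiguity concrete, here is a rough sketch of one possible
rule for classifying ldd output. It is in Python for illustration only
and is not gretl's detection code; the function names are hypothetical.
It assumes the suspicion above is right, namely that any ATLAS library
in the list should override a plain libblas entry, and that remains
unconfirmed.

```python
import re

def blas_libs(ldd_output):
    """Return (name, path) pairs for ldd lines whose library name mentions 'blas'."""
    libs = []
    for line in ldd_output.splitlines():
        # ldd lines look like: "libfoo.so.N => /path/to/libfoo.so.N (0x...)"
        m = re.match(r"\s*(\S+)\s+=>\s+(\S+)", line)
        if m and "blas" in m.group(1).lower():
            libs.append((m.group(1), m.group(2)))
    return libs

def guess_blas(ldd_output):
    """Guess the BLAS variant; evidence of an optimized library wins over plain libblas."""
    libs = blas_libs(ldd_output)
    if not libs:
        return "unknown"
    text = " ".join(name + " " + path for name, path in libs).lower()
    if "openblas" in text:
        return "openblas"
    if "atlas" in text:
        # Assumption: a plain libblas alongside ATLAS libs is really ATLAS
        return "atlas"
    return "netlib"
```

With the three lines from Hélio's output this rule returns "atlas"
instead of giving up. One way to check the assumption on the machine
itself would be to see where /lib64/libblas.so.3 actually points, for
instance with os.path.realpath() in Python or "readlink -f" in a shell.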
Discussion
I'll just mention some points that seem of interest to me.
* If Hélio's BLAS is really ATLAS (on Fedora 19), it's surprisingly
unimpressive, coming a poor second to gretl's openmp code (or even
third). Maybe it's single-threaded? Or maybe it's not really ATLAS
after all?
* Henrique's Mac Mini (plus follow-up on the MacBook Pro): Apple's
VecLib seems to do quite nicely!
* Ignacio's Dell Xeon: Netlib does quite well in some cases but is
clearly single-threaded and is beaten out by OpenMP on larger
problems.
For comparison, I've posted results for my two systems (a desktop
named "myrtle" and laptop named "waverley") at
http://ricardo.ecn.wfu.edu/~cottrell/tmp/myrtle-mperf.txt
http://ricardo.ecn.wfu.edu/~cottrell/tmp/waverley-mperf.txt
These show openblas operating at up to 50 times the speed of
"vanilla" and up to 10 times the speed of gretl's openmp -- which
explains why I bothered building openblas for the 64-bit Windows
gretl package. I would recommend installing openblas on Linux,
though you'd want to make sure that if you do so, it's a build that
uses OpenMP rather than "raw" pthreads, otherwise it will not play
nicely with gretl's internal uses of OpenMP.
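As a rough way of checking which kind of build you have, one can run
ldd on the openblas shared library itself and look for an OpenMP
runtime among its dependencies. The sketch below (Python, for
illustration; the function name is made up) is only a heuristic: it
assumes that an OpenMP build links against a runtime named libgomp,
libomp or libiomp5, which depends on the compiler used, and the
absence of such a line does not prove that the build uses pthreads.

```python
def links_openmp_runtime(ldd_output):
    """True if the ldd output lists a known OpenMP runtime library.

    Heuristic only: the runtime names below are assumptions that depend
    on the compiler used to build the library.
    """
    runtimes = ("libgomp", "libomp", "libiomp")
    for line in ldd_output.splitlines():
        fields = line.split()
        # The first field of each ldd line is the library name
        if fields and fields[0].lower().startswith(runtimes):
            return True
    return False
```

The input would be the text printed by running ldd on the library
file, wherever your distribution installs it.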
Allin