Hi,
I know this is a very old topic, but I have a new toy (an AMD Phenom II
X6 1100T + 6 GB of RAM), so I decided to check how it handles OpenMP.
Results:
with OpenMP:
     40000:  0.16 s ( 0.249 Mflops)
    160000:  0.31 s ( 0.523 Mflops)
    640000:  0.85 s ( 0.754 Mflops)
   2560000:  1.04 s ( 2.467 Mflops)
  10240000:  2.20 s ( 4.662 Mflops)
without OpenMP:
     40000:  0.08 s ( 0.500 Mflops)
    160000:  0.23 s ( 0.696 Mflops)
    640000:  0.78 s ( 0.821 Mflops)
   2560000:  3.85 s ( 0.665 Mflops)
  10240000: 12.75 s ( 0.803 Mflops)
With Jack's question from the 2nd Gretl Conference in mind ("what
should be done before Gretl jumps to 2.0?"), my answer is/could be:
allow (power) users to use OpenMP inside scripts, splitting big
loops across multiple CPU cores.
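Just to illustrate the kind of thing I mean at the C level (a rough
sketch only, not a proposal for what the script-side syntax in gretl
would look like; the loop and the numbers are made up):

  #include <stdio.h>
  #include <omp.h>

  int main (void)
  {
      int i, n = 10000000;
      double sum = 0.0;

      /* split the iterations of a big loop across the available
         cores; the reduction clause combines the per-thread
         partial sums */
  #pragma omp parallel for reduction(+:sum)
      for (i = 0; i < n; i++) {
          sum += 0.5 * i;
      }

      printf("max threads: %d, sum = %g\n", omp_get_max_threads(), sum);
      return 0;
  }

(compiled with gcc -fopenmp)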
Marcin
On 24.03.2010 04:44, Allin Cottrell wrote:
As some of you know, we're currently experimenting with openmp in
gretl. When building from CVS, use of openmp is the default (if
openmp is supported on the host) unless you pass the option
--disable-openmp to the configure script. In addition the current
snapshots for Windows and OS X are built with openmp support
(using gcc 4.4.3 and gcc 4.2.4 respectively).
This note is just to inform you about the state of play, and to
invite submission of test results if people would like to do that.
Right now, we use openmp only for gretl's native matrix
multiplication. So it'll get used (assuming you have at least two
cores) if you do matrix multiplication in a script, or call a
function that does matrix multiplication (such as qform), or use a
built-in command that happens to call matrix multiplication. If we
decide it's a good idea, we could use openmp directives in other
gretl code (but as long as we rely on lapack for much of our
number-crunching, and as long as lapack is not available in a
parallelized form, the scope for threading will remain somewhat
limited).
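To give non-programmers a flavor of what this means, here is a
stripped-down example of an openmp directive on a matrix
multiplication loop (a simplified illustration only, not the
actual code in gretl):

  /* naive n x n multiply, C = A * B, with the iterations of the
     outer loop shared among threads */
  void matmul (const double *A, const double *B, double *C, int n)
  {
      int i, j, k;

  #pragma omp parallel for private(j, k)
      for (i = 0; i < n; i++) {
          for (j = 0; j < n; j++) {
              double s = 0.0;
              for (k = 0; k < n; k++) {
                  s += A[i*n + k] * B[k*n + j];
              }
              C[i*n + j] = s;
          }
      }
  }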
In a typical current use situation, with gretl running on a
dual-core machine where there's little other demand being placed
on the processors, the asymptotic speed-up from openmp should be
close to a factor of two. However, it takes a big calculation to
get close to the asymptote, and we've found that with small to
moderate sized matrices the overhead from starting and stopping
threads dominates, producing a slowdown relative to serial code.
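One standard way of handling that overhead (shown here just to
illustrate the trade-off, not as a description of what gretl
currently does) is openmp's "if" clause, which keeps a loop serial
below some size cutoff:

  /* scale a vector in place: run in parallel only when n is big
     enough to repay the cost of starting the threads (the cutoff
     of 50000 is an arbitrary number for illustration) */
  void scale_vec (double *x, int n, double a)
  {
      int i;

  #pragma omp parallel for if (n > 50000)
      for (i = 0; i < n; i++) {
          x[i] *= a;
      }
  }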
This is similar to what we found with regard to the ATLAS
optimized blas; see
http://ricardo.ecn.wfu.edu/~cottrell/tmp/gretl_speed.html
Anyway, in case anyone would like to test, I'm attaching a matrix
multiplication script that Jack wrote. Right now this is mostly
useful for people building gretl from source, since you want to
run timings both with and without MP, which requires rebuilding.
But if you're currently using a snapshot from before yesterday
(build date 2010-03-21 or earlier) you could run the script, then
download a current snapshot and run it again.
Allin
--
Marcin Błażejowski
http://www.wrzosy.nsb.pl/~marcin/
GG# 203127