I'm also late to this party. Here are my results from 64-bit Windows 7:
? matrix_perf(0)
dgemm experiment 1, variant 1, speed in Gflops
m n k vanilla openmp openblas
128 128 128 0.51479 2.4324 0.49444
128 128 256 1.3617 2.8694 7.5620
128 128 512 1.3178 3.2178 7.3135
128 128 1024 1.4547 3.2327 6.8889
128 128 2048 1.3938 3.2093 7.5701
result: openblas dominates for mnk >= 4194304
openmp dominates for mnk < 4194304
dgemm experiment 1, variant 2, speed in Gflops
m n k vanilla openmp openblas
128 128 128 1.1285 2.8591 6.7398
256 256 128 1.3835 3.0657 4.7816
512 512 128 1.2706 3.1040 6.3758
1024 1024 128 1.3621 3.1572 6.9379
2048 2048 128 1.3089 3.0210 6.5357
result: openblas dominates
dgemm experiment 1, variant 3, speed in Gflops
m n k vanilla openmp openblas
128 128 128 1.3427 2.2489 3.7473
256 256 256 1.3720 3.2505 7.2717
512 512 512 1.4024 3.2591 7.0756
1024 1024 1024 1.1824 2.9743 8.2696
2048 2048 2048 1.1475 2.5411 8.1775
result: openblas dominates
dgemm experiment 2, variant 1, speed in Gflops
m n k vanilla openmp openblas
8 8 8 0.51573 0.037176 0.57536
16 8 8 0.63935 0.073940 0.86400
32 8 8 0.72757 0.14150 1.3334
64 8 8 0.82663 0.26622 1.5487
128 8 8 0.78510 0.44572 1.9989
256 8 8 0.90749 0.68418 1.2459
512 8 8 0.96497 0.88337 1.8486
1024 8 8 0.87171 1.1631 1.7002
2048 8 8 0.95462 1.2275 2.0776
4096 8 8 0.78187 1.4584 1.9049
result: openblas dominates
dgemm experiment 2, variant 2, speed in Gflops
m n k vanilla openmp openblas
10 2 1000 1.2314 0.62643 2.7906
20 2 1000 1.2397 0.97835 3.0542
40 2 1000 1.2850 1.5531 3.4026
80 2 1000 1.3963 2.2204 3.3032
160 2 1000 1.3556 2.3921 2.9239
320 2 1000 1.3506 2.1077 2.8484
640 2 1000 1.1736 1.9558 2.0293
1280 2 1000 1.0521 1.8458 1.9304
2560 2 1000 1.0719 1.7515 1.7272
5120 2 1000 1.0384 1.9335 1.8989
result: openmp dominates for mnk >= 5120000
openblas dominates for mnk < 5120000
dgemm experiment 2, variant 3, speed in Gflops
m n k vanilla openmp openblas
10 10 1000 1.0817 1.9209 4.1470
20 10 1000 1.0978 2.4580 4.2459
40 10 1000 1.2954 2.1036 6.0306
80 10 1000 1.3542 1.9040 6.2067
160 10 1000 1.4430 2.0096 5.9852
320 10 1000 1.4470 2.0040 5.4259
result: openblas dominates
Operating system: Windows (64-bit)
BLAS library: OpenBLAS
Number of processors: 4
OpenMP enabled: yes
Performance summary:
vanilla -
dominates outright in 0 out of 6 tests
openmp -
dominates outright in 0 out of 6 tests
dominates in 1 test(s) for mnk >= 5120000
dominates in 1 test(s) for mnk < 4194304
openblas -
dominates outright in 4 out of 6 tests
dominates in 1 test(s) for mnk >= 4194304
dominates in 1 test(s) for mnk < 5120000
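For anyone trying to read the "result:" lines above: the mnk thresholds look like they are derived by walking back from the largest problem size while the fastest method stays the same. Below is a minimal Python sketch of that rule as inferred from the printed output only. It is not gretl's actual code, and the function name `dominance` and its return format are hypothetical. The data are the rows of experiment 2, variant 2 above, with mnk = m*n*k (for example 2560*2*1000 = 5120000).

```python
def dominance(rows, names):
    """Inferred rule (an assumption, not gretl's code).

    rows  : list of (mnk, [speed per method]) sorted by increasing mnk
    names : method names, in the same order as the speeds
    """
    # Fastest method for each problem size.
    winners = [names[max(range(len(s)), key=s.__getitem__)] for _, s in rows]

    # Walk back from the largest size while the winner does not change.
    i = len(rows) - 1
    while i > 0 and winners[i - 1] == winners[-1]:
        i -= 1

    if i == 0:
        return {"outright": winners[-1]}

    threshold = rows[i][0]
    result = {"upper": (winners[-1], threshold)}   # wins for mnk >= threshold
    if len(set(winners[:i])) == 1:
        result["lower"] = (winners[0], threshold)  # wins for mnk < threshold
    return result


names = ["vanilla", "openmp", "openblas"]

# Experiment 2, variant 2 from the Windows run above (n = 2, k = 1000).
exp2_var2 = [
    (10 * 2 * 1000,   [1.2314, 0.62643, 2.7906]),
    (20 * 2 * 1000,   [1.2397, 0.97835, 3.0542]),
    (40 * 2 * 1000,   [1.2850, 1.5531, 3.4026]),
    (80 * 2 * 1000,   [1.3963, 2.2204, 3.3032]),
    (160 * 2 * 1000,  [1.3556, 2.3921, 2.9239]),
    (320 * 2 * 1000,  [1.3506, 2.1077, 2.8484]),
    (640 * 2 * 1000,  [1.1736, 1.9558, 2.0293]),
    (1280 * 2 * 1000, [1.0521, 1.8458, 1.9304]),
    (2560 * 2 * 1000, [1.0719, 1.7515, 1.7272]),
    (5120 * 2 * 1000, [1.0384, 1.9335, 1.8989]),
]

if __name__ == "__main__":
    print(dominance(exp2_var2, names))
```

With this rule, a method that wins only the last rows gets an "mnk >=" line, and a "<" line appears only when a single method wins every smaller size. That seems to match why some results above report just one threshold line.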
PS
-----Original Message-----
From: gretl-devel-bounces(a)lists.wfu.edu [mailto:gretl-devel-bounces@lists.wfu.edu] On Behalf Of Riccardo (Jack) Lucchetti
Sent: Wednesday, June 18, 2014 12:13 PM
To: Gretl development
Subject: Re: [Gretl-devel] matrix_perf results
On Tue, 17 Jun 2014, Allin Cottrell wrote:
Thanks to all of you who have run the matrix_perf tests. This will be
helpful in setting gretl's (internal, default) parameters for using
the system BLAS versus OpenMP (where available), versus our own
single-threaded matrix multiplication code.
Sorry I'm late. A few more interesting results here: two machines with
the same operating system (64-bit Debian). One is a low-end dual core;
the other is a modern machine with AVX and two physical processors,
each with 8 hyperthreaded cores.
The results follow, but I believe the moral of the story (also having
seen the results others posted earlier) is quite evident: the "right"
software setup depends heavily on what your hardware/software
combination is.
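Since the answer depends on the machine, a rough way to sanity-check the Gflops figures on your own hardware is sketched here in Python. It assumes the usual convention of counting 2*m*n*k floating-point operations for an (m x k) times (k x n) product; I have not verified that matrix_perf counts flops this way. The helper names `gflops`, `naive_matmul` and `time_matmul` are made up for illustration, and the pure-Python loop will be far slower than any of the three methods in the tables, so only the conversion itself is comparable.

```python
import time


def gflops(m, n, k, seconds):
    # Assumed convention: about 2*m*n*k flops per matrix product.
    return 2.0 * m * n * k / seconds / 1e9


def naive_matmul(a, b):
    # Plain triple loop: a is m x k, b is k x n, result is m x n.
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            aip = a[i][p]
            for j in range(n):
                c[i][j] += aip * b[p][j]
    return c


def time_matmul(m, n, k):
    # Time one product of all-ones matrices and convert to Gflops.
    a = [[1.0] * k for _ in range(m)]
    b = [[1.0] * n for _ in range(k)]
    start = time.perf_counter()
    naive_matmul(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return gflops(m, n, k, elapsed)


if __name__ == "__main__":
    for m, n, k in [(8, 8, 8), (64, 8, 8), (128, 128, 128)]:
        print(m, n, k, round(time_matmul(m, n, k), 5))
```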
Machine #1:
? matrix_perf(1234)
dgemm experiment 1, variant 1, speed in Gflops
m n k vanilla openmp netlib
128 128 128 1.1414 3.8052 4.4472
128 128 256 1.6901 3.9216 9.6005
128 128 512 1.7049 4.0992 12.939
128 128 1024 1.7066 4.0703 14.629
128 128 2048 1.6609 3.0855 14.598
result: netlib dominates
dgemm experiment 1, variant 2, speed in Gflops
m n k vanilla openmp netlib
128 128 128 1.6039 2.8291 12.559
256 256 128 1.6551 3.7319 13.157
512 512 128 1.5689 3.1081 12.157
1024 1024 128 1.7065 3.2241 13.810
2048 2048 128 1.4343 3.1975 12.901
result: netlib dominates
dgemm experiment 1, variant 3, speed in Gflops
m n k vanilla openmp netlib
128 128 128 1.6506 3.6948 9.4904
256 256 256 1.6570 3.6680 12.913
512 512 512 1.4917 3.4395 16.028
1024 1024 1024 0.70373 1.4884 18.831
2048 2048 2048 0.78776 1.5937 17.032
result: netlib dominates
dgemm experiment 2, variant 1, speed in Gflops
m n k vanilla openmp netlib
8 8 8 0.46703 0.37920 0.32284
16 8 8 0.63081 0.60029 0.54729
32 8 8 0.73601 0.89405 0.92477
64 8 8 0.90807 1.1178 1.3149
128 8 8 0.99528 1.4653 1.6911
256 8 8 1.0726 1.5460 1.5603
512 8 8 1.0186 1.8071 2.1102
1024 8 8 1.1191 1.6092 2.1806
2048 8 8 1.0810 1.6515 2.2472
4096 8 8 1.1118 1.6283 2.2618
result: netlib dominates for mnk >= 2048
vanilla dominates for mnk < 2048
dgemm experiment 2, variant 2, speed in Gflops
m n k vanilla openmp netlib
10 2 1000 1.4020 1.0541 2.9305
20 2 1000 1.2824 1.3566 3.2125
40 2 1000 1.4981 1.9113 2.3364
80 2 1000 1.6409 3.3097 3.0373
160 2 1000 1.6510 3.4486 2.8229
320 2 1000 1.2217 2.5190 2.2135
640 2 1000 0.80308 1.6429 1.9382
1280 2 1000 0.78715 1.5664 1.8491
2560 2 1000 0.71995 1.5641 1.8383
5120 2 1000 0.80175 1.5103 1.7516
result: netlib dominates for mnk >= 1280000
dgemm experiment 2, variant 3, speed in Gflops
m n k vanilla openmp netlib
10 10 1000 1.3822 2.9129 7.9036
20 10 1000 1.2999 3.3758 9.7947
40 10 1000 1.5159 3.2116 10.159
80 10 1000 1.6433 3.8072 11.100
160 10 1000 1.4395 4.1740 11.270
320 10 1000 1.1835 2.6255 9.5855
result: netlib dominates
Operating system: Linux (64-bit)
BLAS library: Netlib
Number of processors: 2
OpenMP enabled: yes
Performance summary:
vanilla -
dominates outright in 0 out of 6 tests
dominates in 1 test(s) for mnk < 2048
openmp -
dominates outright in 0 out of 6 tests
netlib -
dominates outright in 4 out of 6 tests
dominates in 2 test(s) for mnk >= (2048, 1280000)
Machine #2:
? matrix_perf(1234)
dgemm experiment 1, variant 1, speed in Gflops
m n k vanilla openmp netlib
128 128 128 0.90944 1.6489 3.6727
128 128 256 1.0361 12.066 4.0342
128 128 512 1.0363 13.360 3.2194
128 128 1024 2.1998 14.647 4.3699
128 128 2048 2.2040 15.157 4.4580
result: openmp dominates for mnk >= 4194304
netlib dominates for mnk < 4194304
dgemm experiment 1, variant 2, speed in Gflops
m n k vanilla openmp netlib
128 128 128 0.99176 7.2687 3.3558
256 256 128 1.0369 10.092 4.5736
512 512 128 1.0251 10.393 5.5707
1024 1024 128 1.0528 12.044 5.8882
2048 2048 128 1.5099 12.170 5.7919
result: openmp dominates
dgemm experiment 1, variant 3, speed in Gflops
m n k vanilla openmp netlib
128 128 128 1.0068 8.0279 3.1235
256 256 256 1.0678 12.979 5.2844
512 512 512 1.2698 14.865 6.8351
1024 1024 1024 2.3056 15.893 7.5533
2048 2048 2048 1.8804 19.930 10.979
result: openmp dominates
dgemm experiment 2, variant 1, speed in Gflops
m n k vanilla openmp netlib
8 8 8 0.31635 0.060822 0.40654
16 8 8 0.37316 0.13059 0.64134
32 8 8 0.48452 0.25061 0.89951
64 8 8 1.0088 0.35585 1.1348
128 8 8 1.2879 0.56582 1.3032
256 8 8 1.3405 0.83658 1.3995
512 8 8 1.3665 1.0493 0.61674
1024 8 8 1.3656 1.2329 0.87132
2048 8 8 1.3579 1.3256 1.0189
4096 8 8 1.3372 1.3590 0.59412
result: openmp dominates for mnk >= 262144
dgemm experiment 2, variant 2, speed in Gflops
m n k vanilla openmp netlib
10 2 1000 0.93204 0.27760 0.39253
20 2 1000 1.8316 0.51884 0.71238
40 2 1000 1.6916 0.60586 0.85907
80 2 1000 1.9516 1.0208 1.2188
160 2 1000 2.1520 1.1731 1.4970
320 2 1000 2.2201 1.4638 1.6964
640 2 1000 2.3148 1.8720 1.9165
1280 2 1000 2.2985 1.9854 2.3177
2560 2 1000 2.1321 2.4638 2.2796
5120 2 1000 1.8681 1.6833 1.5077
result: vanilla dominates for mnk >= 10240000
dgemm experiment 2, variant 3, speed in Gflops
m n k vanilla openmp netlib
10 10 1000 0.75487 0.64920 2.2521
20 10 1000 1.4513 0.99118 3.2809
40 10 1000 1.7493 1.5317 4.3326
80 10 1000 2.0249 2.1957 4.8481
160 10 1000 2.1919 3.3981 5.6694
320 10 1000 2.2979 5.4260 6.2356
result: netlib dominates
Operating system: Linux (64-bit)
BLAS library: Netlib
Number of processors: 32
OpenMP enabled: yes
Performance summary:
vanilla -
dominates outright in 0 out of 6 tests
dominates in 1 test(s) for mnk >= 10240000
openmp -
dominates outright in 2 out of 6 tests
dominates in 2 test(s) for mnk >= (4194304, 262144)
netlib -
dominates outright in 1 out of 6 tests
dominates in 1 test(s) for mnk < 4194304
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------