I just want to add another result obtained on Windows 7 (64-bit) but using
gretl's 32-bit version:
<OUTPUT>
dgemm experiment 1, variant 1, speed in Gflops
m n k vanilla openmp netlib
128 128 128 2.1368 3.4852 2.3743
128 128 256 2.5487 4.4178 2.5388
128 128 512 2.7106 5.0125 2.7782
128 128 1024 2.7998 5.4136 2.8940
128 128 2048 2.7670 5.3196 2.9199
result: openmp dominates
dgemm experiment 1, variant 2, speed in Gflops
m n k vanilla openmp netlib
128 128 128 2.3372 3.4591 2.3988
256 256 128 2.3256 4.4618 2.7094
512 512 128 2.1448 4.6779 2.7654
1024 1024 128 2.4723 4.7650 2.7767
2048 2048 128 2.5796 4.6320 2.7367
result: openmp dominates
dgemm experiment 1, variant 3, speed in Gflops
m n k vanilla openmp netlib
128 128 128 2.6201 3.3397 2.3657
256 256 256 2.8571 5.0325 2.9471
512 512 512 3.1156 4.9564 3.1536
1024 1024 1024 2.3233 5.2080 2.3002
2048 2048 2048 2.3192 4.4718 2.3189
result: openmp dominates
dgemm experiment 2, variant 1, speed in Gflops
m n k vanilla openmp netlib
8 8 8 0.44731 0.027292 0.46398
16 8 8 0.56046 0.050733 0.62706
32 8 8 0.66725 0.10275 0.66642
64 8 8 0.73151 0.18160 0.74646
128 8 8 0.78491 0.30364 0.79847
256 8 8 0.80797 0.45798 0.82017
512 8 8 0.79569 0.61589 0.78943
1024 8 8 0.97164 0.75438 0.82914
2048 8 8 0.83483 0.80910 0.71419
4096 8 8 0.84114 0.85660 0.83595
result: openmp dominates for mnk >= 262144
dgemm experiment 2, variant 2, speed in Gflops
m n k vanilla openmp netlib
10 2 1000 1.9133 0.59579 2.2596
20 2 1000 2.4185 0.93620 2.6299
40 2 1000 2.4249 1.5991 2.4208
80 2 1000 2.5991 2.3573 2.7317
160 2 1000 2.8793 3.3007 2.9413
320 2 1000 2.9538 4.5477 2.9906
640 2 1000 2.2554 3.4918 2.2917
1280 2 1000 2.2609 3.7071 2.2745
2560 2 1000 2.2118 3.3263 2.2296
5120 2 1000 2.2168 3.4272 2.2340
result: openmp dominates for mnk >= 320000
dgemm experiment 2, variant 3, speed in Gflops
m n k vanilla openmp netlib
10 10 1000 1.9689 1.9801 2.3104
20 10 1000 2.4662 3.0143 2.7699
40 10 1000 2.4801 3.8146 2.4038
80 10 1000 2.6682 4.2088 2.7531
160 10 1000 2.9370 4.6238 2.9567
320 10 1000 2.9992 4.6294 2.9979
result: openmp dominates for mnk >= 200000
netlib dominates for mnk < 200000
Operating system: Windows (32-bit)
BLAS library: Netlib
Number of processors: 4
OpenMP enabled: yes
Performance summary:
  vanilla -
    dominates outright in 0 out of 6 tests
  openmp -
    dominates outright in 3 out of 6 tests
    dominates in 3 test(s) for mnk >= (262144, 320000, 200000)
  netlib -
    dominates outright in 0 out of 6 tests
    dominates in 1 test(s) for mnk < 200000
</OUTPUT>
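The "dominates for mnk >= ..." lines can be reproduced from the tabulated speeds. Below is a minimal Python sketch, assuming that a variant "dominates" from the first row after which it is the fastest in every remaining row. That rule is inferred from the output and is not taken from the benchmark script itself. The function name `domination_threshold` is hypothetical, and the rows are copied from experiment 2, variant 1:

```python
# Rows copied from "dgemm experiment 2, variant 1":
# (m, n, k, vanilla, openmp, netlib), speeds in Gflops.
ROWS = [
    (8, 8, 8, 0.44731, 0.027292, 0.46398),
    (16, 8, 8, 0.56046, 0.050733, 0.62706),
    (32, 8, 8, 0.66725, 0.10275, 0.66642),
    (64, 8, 8, 0.73151, 0.18160, 0.74646),
    (128, 8, 8, 0.78491, 0.30364, 0.79847),
    (256, 8, 8, 0.80797, 0.45798, 0.82017),
    (512, 8, 8, 0.79569, 0.61589, 0.78943),
    (1024, 8, 8, 0.97164, 0.75438, 0.82914),
    (2048, 8, 8, 0.83483, 0.80910, 0.71419),
    (4096, 8, 8, 0.84114, 0.85660, 0.83595),
]
NAMES = ("vanilla", "openmp", "netlib")


def domination_threshold(rows, names=NAMES):
    """Return (winner, mnk): winner is fastest in every row with m*n*k >= mnk.

    Assumed rule (hypothetical helper, not gretl's code). If the winner is
    fastest in all rows, mnk is None ("dominates outright").
    """
    # Order the rows by problem size m*n*k.
    rows = sorted(rows, key=lambda r: r[0] * r[1] * r[2])
    # Name of the fastest variant in each row.
    winners = [
        names[max(range(len(names)), key=lambda i: r[3 + i])] for r in rows
    ]
    final = winners[-1]
    # Walk backwards while the same variant keeps winning.
    start = len(rows) - 1
    while start > 0 and winners[start - 1] == final:
        start -= 1
    if start == 0:
        return final, None
    m, n, k = rows[start][:3]
    return final, m * n * k


winner, mnk = domination_threshold(ROWS)
print(winner, mnk)  # prints: openmp 262144
```

Under the same assumption, the rows of variants 2 and 3 of experiment 2 give 320000 and 200000, matching the summary.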
Artur
2014-06-18 21:05 GMT+02:00 Allin Cottrell <cottrell(a)wfu.edu>:
On Wed, 18 Jun 2014, Summers, Peter wrote:
>> Small comment: that poor result from openblas on the very first
>> test is probably just a matter of warming up the machine's vector
>> registers; if you were to run it a second time you'd probably see
>> openblas dominating from the get-go.
>
> Right you are, Allin. The first test was pretty much the first
> thing I did after booting up my computer. 2 subsequent runs show
> near-total openblas domination: 6/6 cases in one run, 5/6 in the
> other. The lone exception was
>
> dgemm experiment 2, variant 2, speed in Gflops
>
> m n k vanilla openmp openblas
> 10 2 1000 1.1782 0.77966 2.1771
> 20 2 1000 1.3308 0.99745 2.6331
> 40 2 1000 1.3111 1.3642 2.7879
> 80 2 1000 1.0477 1.4362 2.8529
> 160 2 1000 1.2724 1.8858 2.6338
> 320 2 1000 1.1903 1.8180 2.4244
> 640 2 1000 1.1162 1.6754 1.6777
> 1280 2 1000 1.1157 1.6019 1.5981
> 2560 2 1000 1.0857 1.7744 1.7199
> 5120 2 1000 1.1078 1.6811 1.7112
>
> result: openblas dominates for mnk >= 10240000
Thanks. Yes, this last result is expected: it's the one "known hole"
in openblas performance. By default, openblas does not do
multi-threading if any of the matrix dimensions m, n, or k is less
than 4. In other batches of results (though not really here) I've
seen openmp do significantly better for matrices that are big but
have one small dimension < 4.
Allin
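The threading rule described above can be written as a one-line check. This is only a sketch of the rule as stated in this message (no multi-threading when any of m, n, or k is less than 4). `openblas_would_multithread` and its `min_dim` parameter are hypothetical names, not part of the OpenBLAS API, and the library's actual logic may differ:

```python
def openblas_would_multithread(m, n, k, min_dim=4):
    """Assumed rule: multi-thread only if every dimension is at least min_dim."""
    return min(m, n, k) >= min_dim


# Shapes taken from the benchmark tables in this thread.
shapes = [(10, 2, 1000), (5120, 2, 1000), (10, 10, 1000), (128, 128, 128)]
for m, n, k in shapes:
    if openblas_would_multithread(m, n, k):
        mode = "multi-threaded"
    else:
        mode = "single-threaded"
    print(f"{m} x {n} x {k}: {mode}")
```

Every shape in experiment 2, variant 2 has n = 2, so under this rule the whole table would run single-threaded in openblas regardless of how large m becomes.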