I'd just like to add another result, obtained on Windows 7 (64-bit) but with the 32-bit build of gretl:

<OUTPUT>
dgemm experiment 1, variant 1, speed in Gflops

         m         n         k   vanilla    openmp    netlib
       128       128       128    2.1368    3.4852    2.3743
       128       128       256    2.5487    4.4178    2.5388
       128       128       512    2.7106    5.0125    2.7782
       128       128      1024    2.7998    5.4136    2.8940
       128       128      2048    2.7670    5.3196    2.9199

result: openmp dominates

dgemm experiment 1, variant 2, speed in Gflops

         m         n         k   vanilla    openmp    netlib
       128       128       128    2.3372    3.4591    2.3988
       256       256       128    2.3256    4.4618    2.7094
       512       512       128    2.1448    4.6779    2.7654
      1024      1024       128    2.4723    4.7650    2.7767
      2048      2048       128    2.5796    4.6320    2.7367

result: openmp dominates

dgemm experiment 1, variant 3, speed in Gflops

         m         n         k   vanilla    openmp    netlib
       128       128       128    2.6201    3.3397    2.3657
       256       256       256    2.8571    5.0325    2.9471
       512       512       512    3.1156    4.9564    3.1536
      1024      1024      1024    2.3233    5.2080    2.3002
      2048      2048      2048    2.3192    4.4718    2.3189

result: openmp dominates

dgemm experiment 2, variant 1, speed in Gflops

         m         n         k   vanilla    openmp    netlib
         8         8         8   0.44731  0.027292   0.46398
        16         8         8   0.56046  0.050733   0.62706
        32         8         8   0.66725   0.10275   0.66642
        64         8         8   0.73151   0.18160   0.74646
       128         8         8   0.78491   0.30364   0.79847
       256         8         8   0.80797   0.45798   0.82017
       512         8         8   0.79569   0.61589   0.78943
      1024         8         8   0.97164   0.75438   0.82914
      2048         8         8   0.83483   0.80910   0.71419
      4096         8         8   0.84114   0.85660   0.83595

result: openmp dominates for mnk >= 262144

dgemm experiment 2, variant 2, speed in Gflops

         m         n         k   vanilla    openmp    netlib
        10         2      1000    1.9133   0.59579    2.2596
        20         2      1000    2.4185   0.93620    2.6299
        40         2      1000    2.4249    1.5991    2.4208
        80         2      1000    2.5991    2.3573    2.7317
       160         2      1000    2.8793    3.3007    2.9413
       320         2      1000    2.9538    4.5477    2.9906
       640         2      1000    2.2554    3.4918    2.2917
      1280         2      1000    2.2609    3.7071    2.2745
      2560         2      1000    2.2118    3.3263    2.2296
      5120         2      1000    2.2168    3.4272    2.2340

result: openmp dominates for mnk >= 320000

dgemm experiment 2, variant 3, speed in Gflops

         m         n         k   vanilla    openmp    netlib
        10        10      1000    1.9689    1.9801    2.3104
        20        10      1000    2.4662    3.0143    2.7699
        40        10      1000    2.4801    3.8146    2.4038
        80        10      1000    2.6682    4.2088    2.7531
       160        10      1000    2.9370    4.6238    2.9567
       320        10      1000    2.9992    4.6294    2.9979

result: openmp dominates for mnk >= 200000
  netlib dominates for mnk < 200000

Operating system: Windows (32-bit)
BLAS library: Netlib
Number of processors: 4
OpenMP enabled: yes

Performance summary:

vanilla -
  dominates outright in 0 out of 6 tests

openmp -
  dominates outright in 3 out of 6 tests
  dominates in 3 test(s) for mnk >= (262144, 320000, 200000)

netlib -
  dominates outright in 0 out of 6 tests
  dominates in 1 test(s) for mnk < 200000

</OUTPUT>
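
For anyone wondering how to read the "dominates for mnk >= X" lines, here is a minimal sketch of one plausible rule: take the smallest mnk from which a single variant is fastest in every remaining row. This is an assumption on my part, not the benchmark script's actual code, and the function names (winners, upper_threshold) are made up for illustration. The data are the rows of experiment 2, variant 3 above.

```python
# Rows of "dgemm experiment 2, variant 3" from the output above:
# (m, n, k, vanilla, openmp, netlib), speeds in Gflops.
ROWS = [
    (10, 10, 1000, 1.9689, 1.9801, 2.3104),
    (20, 10, 1000, 2.4662, 3.0143, 2.7699),
    (40, 10, 1000, 2.4801, 3.8146, 2.4038),
    (80, 10, 1000, 2.6682, 4.2088, 2.7531),
    (160, 10, 1000, 2.9370, 4.6238, 2.9567),
    (320, 10, 1000, 2.9992, 4.6294, 2.9979),
]
NAMES = ("vanilla", "openmp", "netlib")


def winners(rows):
    """Return a list of (mnk, name of the fastest variant), one per row."""
    out = []
    for m, n, k, *speeds in rows:
        best = max(range(len(speeds)), key=lambda i: speeds[i])
        out.append((m * n * k, NAMES[best]))
    return out


def upper_threshold(rows):
    """Hypothetical rule: the variant that wins the last row, together
    with the smallest mnk from which it wins every remaining row."""
    won = winners(rows)
    threshold, last = won[-1]
    for mnk, name in reversed(won):
        if name != last:
            break
        threshold = mnk
    return last, threshold


print(winners(ROWS)[0])       # (100000, 'netlib')
print(upper_threshold(ROWS))  # ('openmp', 200000)
```

Under this reading the sketch reproduces the two reported lines for that table: openmp from mnk = 200000 upwards, netlib below it.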

Artur


2014-06-18 21:05 GMT+02:00 Allin Cottrell <cottrell@wfu.edu>:
On Wed, 18 Jun 2014, Summers, Peter wrote:

>> Small comment: that poor result from openblas on the very first
>> test is probably just a matter of warming up the machine's vector
>> registers; if you were to run it a second time you'd probably see
>> openblas dominating from the get-go.
>
> Right you are, Allin. The first test was pretty much the first
> thing I did after booting up my computer. 2 subsequent runs show
> near-total openblas domination:  6/6 cases in one run, 5/6 in the
> other. The lone exception was
>
> dgemm experiment 2, variant 2, speed in Gflops
>
>         m         n         k   vanilla    openmp  openblas
>        10         2      1000    1.1782   0.77966    2.1771
>        20         2      1000    1.3308   0.99745    2.6331
>        40         2      1000    1.3111    1.3642    2.7879
>        80         2      1000    1.0477    1.4362    2.8529
>       160         2      1000    1.2724    1.8858    2.6338
>       320         2      1000    1.1903    1.8180    2.4244
>       640         2      1000    1.1162    1.6754    1.6777
>      1280         2      1000    1.1157    1.6019    1.5981
>      2560         2      1000    1.0857    1.7744    1.7199
>      5120         2      1000    1.1078    1.6811    1.7112
>
> result: openblas dominates for mnk >= 10240000

Thanks. Yes, this last result is expected: it's the one "known hole"
in openblas performance. By default, openblas does not use
multi-threading if any of the matrix dimensions m, n or k is less
than 4. In other batches of results (though not really here) I've
seen openmp do significantly better on matrices that are big
overall but have one dimension smaller than 4.
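
To make the rule concrete, here is a rough sketch of the decision as described above. The cutoff of 4 is taken from this message; the names MIN_DIM_FOR_THREADS, uses_threads and dgemm_flops are hypothetical and are not OpenBLAS identifiers. The flop count uses the conventional 2*m*n*k for a matrix product, which I'm assuming is also what the benchmark reports.

```python
# Cutoff as stated in the message above; the name is illustrative only.
MIN_DIM_FOR_THREADS = 4


def uses_threads(m, n, k, min_dim=MIN_DIM_FOR_THREADS):
    """True if all three dimensions reach the cutoff, i.e. the
    multiplication would be multi-threaded under the described rule."""
    return min(m, n, k) >= min_dim


def dgemm_flops(m, n, k):
    """Conventional flop count for C = A*B with A m-by-k and B k-by-n:
    one multiply and one add per (i, j, l) triple."""
    return 2 * m * n * k


# Largest case from experiment 2, variant 2: lots of work, but n = 2.
print(uses_threads(5120, 2, 1000))   # False
print(dgemm_flops(5120, 2, 1000))    # 20480000

# A case from experiment 2, variant 3: n = 10 clears the cutoff.
print(uses_threads(320, 10, 1000))   # True
```

The first case shows why the hole matters: about 20 million flops of work would still run on a single thread because one dimension is 2.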

Allin

_______________________________________________
Gretl-devel mailing list
Gretl-devel@lists.wfu.edu
http://lists.wfu.edu/mailman/listinfo/gretl-devel