On Wed, 18 Jun 2014, Summers, Peter wrote:
> Small comment: that poor result from openblas on the very first
> test is probably just a matter of warming up the machine's vector
> registers; if you were to run it a second time you'd probably see
> openblas dominating from the get-go.
Right you are, Allin. The first test was pretty much the first
thing I did after booting up my computer. Two subsequent runs show
near-total openblas domination: 6/6 cases in one run, 5/6 in the
other. The lone exception was:

dgemm experiment 2, variant 2, speed in Gflops

     m   n     k   vanilla    openmp  openblas
    10   2  1000    1.1782   0.77966    2.1771
    20   2  1000    1.3308   0.99745    2.6331
    40   2  1000    1.3111    1.3642    2.7879
    80   2  1000    1.0477    1.4362    2.8529
   160   2  1000    1.2724    1.8858    2.6338
   320   2  1000    1.1903    1.8180    2.4244
   640   2  1000    1.1162    1.6754    1.6777
  1280   2  1000    1.1157    1.6019    1.5981
  2560   2  1000    1.0857    1.7744    1.7199
  5120   2  1000    1.1078    1.6811    1.7112

result: openblas dominates for mnk >= 10240000

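
For anyone wanting to check the "dominates" line by hand, here is a rough sketch in Python. It assumes "dominates" means openblas is the fastest of the three at that size and at every larger size; that reading is my assumption, not taken from the benchmark script, and the function name is made up for illustration. The figures are copied from the table above.

```python
# Gflops figures copied from the table above:
# (m, n, k, vanilla, openmp, openblas)
rows = [
    (10,   2, 1000, 1.1782, 0.77966, 2.1771),
    (20,   2, 1000, 1.3308, 0.99745, 2.6331),
    (40,   2, 1000, 1.3111, 1.3642,  2.7879),
    (80,   2, 1000, 1.0477, 1.4362,  2.8529),
    (160,  2, 1000, 1.2724, 1.8858,  2.6338),
    (320,  2, 1000, 1.1903, 1.8180,  2.4244),
    (640,  2, 1000, 1.1162, 1.6754,  1.6777),
    (1280, 2, 1000, 1.1157, 1.6019,  1.5981),
    (2560, 2, 1000, 1.0857, 1.7744,  1.7199),
    (5120, 2, 1000, 1.1078, 1.6811,  1.7112),
]


def domination_threshold(rows):
    """Hypothetical helper: smallest m*n*k such that openblas is the
    fastest variant at that size and at all larger sizes.
    Returns None if openblas is not fastest at the largest size."""
    threshold = None
    # Walk from the largest problem down; stop at the first size openblas loses.
    by_size = sorted(rows, key=lambda r: r[0] * r[1] * r[2], reverse=True)
    for m, n, k, vanilla, openmp, openblas in by_size:
        if openblas <= max(vanilla, openmp):
            break
        threshold = m * n * k
    return threshold


print(domination_threshold(rows))
```

Under that reading the sketch gives 10240000, the same figure as the result line: openmp edges out openblas at m = 1280 and m = 2560, so only the m = 5120 row counts.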
Thanks. Yes, this last result is expected: it's the one "known hole"
in openblas performance. By default, openblas does not do
multi-threading if any of the matrix dimensions m, n, or k is less
than 4. In other batches of results (though not really here) I've
seen openmp do significantly better for matrices that are big but
have one small dimension < 4.
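
To make the rule concrete, a minimal sketch in Python of the cutoff as I've described it. This is only an illustration of the description above, not openblas code; the function name and the constant are hypothetical, and the value 4 is the figure quoted in this message rather than something checked against the openblas source.

```python
# Cutoff as described in this message (assumption: not verified against
# the openblas source).
SMALL_DIM = 4


def would_multithread(m, n, k, small_dim=SMALL_DIM):
    """Hypothetical predicate: True if none of the dgemm dimensions
    falls below the cutoff, i.e. multi-threading would be allowed."""
    return min(m, n, k) >= small_dim


# Every case in the table above has n = 2, so all of them would run
# single-threaded under this rule.
print(would_multithread(5120, 2, 1000))
print(would_multithread(5120, 4, 1000))
```

Under this reading, the whole of experiment 2, variant 2 falls in the single-threaded region, whatever the size of m.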
Allin