Hi,
some time ago I found an interesting discussion on code optimisation in
gcc with the '-mavx2' flag in the case of copying blocks of matrices. So I
wrote a simple script and got the following results:
? set verbose off
10,586272
9,690394
10,030351
10,335968
8,9853768
Difference 1: 0,895878
Difference 2: -0,305616
I have to say that I'm a little surprised. Although I would expect a
difference between the first and the second loop ('Difference 1:
0,895878'), I wouldn't expect any difference between the timings of loops
three and four ('Difference 2: -0,305616'). And, after all, why is the
difference between the last loop and the other loops so big?
Marcin
<hansl>
set verbose off
clear
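# each variant copies rows BLOCK_start..BLOCK_start+BLOCK_size-1
# (all COL columns) of A into its own B matrix, LOOP times
scalar ROW = 10000
scalar COL = 20
scalar BLOCK_size = 2000
scalar BLOCK_start = 115
scalar LOOP = 600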
matrix A = mnormal(ROW, COL)
matrix B1 = zeros(ROW, COL)
matrix B2 = B1
matrix B3 = B1
matrix B4 = B1
matrix B5 = B1
# Loop 1: i = 1..BLOCK_size, row index BLOCK_start - 1 + i written out on both sides
set stopwatch
loop LOOP --quiet
    loop i=1..BLOCK_size --quiet
        loop j=1..COL --quiet
            B1[BLOCK_start - 1 + i, j] = A[BLOCK_start - 1 + i, j]
        endloop
    endloop
endloop
est1 = $stopwatch
eval est1
# Loop 2: i = 0..BLOCK_size-1, row index BLOCK_start + i written out on both sides
set stopwatch
loop LOOP --quiet
    loop i=0..BLOCK_size-1 --quiet
        loop j=1..COL --quiet
            B2[BLOCK_start + i, j] = A[BLOCK_start + i, j]
        endloop
    endloop
endloop
est2 = $stopwatch
eval est2
# Loop 3: as loop 1, but the row index is computed once into tmp1
set stopwatch
loop LOOP --quiet
    loop i=1..BLOCK_size --quiet
        loop j=1..COL --quiet
            tmp1 = BLOCK_start - 1 + i
            B3[tmp1, j] = A[tmp1, j]
        endloop
    endloop
endloop
est3 = $stopwatch
eval est3
# Loop 4: as loop 2, but the row index is computed once into tmp2
set stopwatch
loop LOOP --quiet
    loop i=0..BLOCK_size-1 --quiet
        loop j=1..COL --quiet
            tmp2 = BLOCK_start + i
            B4[tmp2, j] = A[tmp2, j]
        endloop
    endloop
endloop
est4 = $stopwatch
eval est4
# Loop 5: the loop counter i runs over the block rows directly (no index arithmetic)
set stopwatch
loop LOOP --quiet
    loop i=BLOCK_start..BLOCK_start-1+BLOCK_size --quiet
        loop j=1..COL --quiet
            B5[i, j] = A[i, j]
        endloop
    endloop
endloop
eval $stopwatch
printf "Difference 1: %f\n", est1 - est2
printf "Difference 2: %f\n", est3 - est4
</hansl>
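For comparison, a sketch of the same block copy done with a single range
assignment per repetition instead of element-wise loops. It is not part
of the timings above; the names B6, BLOCK_end and est6 are introduced
only for this sketch, and it assumes gretl's row-range indexing
A[r1:r2,] (all columns) works on both sides of an assignment.

<hansl>
set verbose off
clear
scalar ROW = 10000
scalar COL = 20
scalar BLOCK_size = 2000
scalar BLOCK_start = 115
scalar LOOP = 600
# BLOCK_end is a helper introduced for this sketch: last row of the block
scalar BLOCK_end = BLOCK_start - 1 + BLOCK_size
matrix A = mnormal(ROW, COL)
matrix B6 = zeros(ROW, COL)
set stopwatch
loop LOOP --quiet
    # copy all COL columns of rows BLOCK_start..BLOCK_end in one assignment
    B6[BLOCK_start:BLOCK_end,] = A[BLOCK_start:BLOCK_end,]
endloop
est6 = $stopwatch
eval est6
# largest absolute difference between the copied block and the source block
eval maxc(maxr(abs(B6[BLOCK_start:BLOCK_end,] - A[BLOCK_start:BLOCK_end,])))
</hansl>

The last line should report zero if B6 holds exactly the rows of A that
B1 to B5 are meant to receive.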
--
Marcin Błażejowski