On Mon, 14 Oct 2019, Allin Cottrell wrote:

> So all the data seems to (pretty much) agree: the point at which
> reduction in copy-time turns into increase, as we crank up the
> number of columns to copy at once, is in the neighbourhood of the
> L2 cache size, which is typically 1 MB (2^20) these days.
Oof, sorry people! I'm afraid the matrix-copy timings based on the
script I posted are mostly artifacts of a bug in that script --
revealed when I finally checked that B == A after the copy. The limit
@n for the inner loop across columns was wrong, with the result that
not all columns were getting copied. Here are my current timings,
which are relatively flat with respect to the number/size of chunks:
matrix size = 2500000, (20000000 bytes)
1 columns per chunk: 2.4073s
2 columns per chunk: 2.3623s
5 columns per chunk: 2.4675s
10 columns per chunk: 2.4984s
25 columns per chunk: 2.5216s
50 columns per chunk: 2.5999s
100 columns per chunk: 3.2440s
125 columns per chunk: 3.5543s
500 columns per chunk: 2.8366s
And here's the corrected script:
<hansl>
set verbose off
clear
scalar ROW = 5000
scalar COL = 500
scalar LOOP = 600
matrix chunkcols = {1, 2, 5, 10, 25, 50, 100, 125, 500}
matrix A = mnormal(ROW, COL)
matrix B = zeros(ROW, COL)
printf "matrix size = %d, (%d bytes)\n", ROW*COL, ROW*COL*8
loop k=1..nelem(chunkcols) --quiet
cols = chunkcols[k]
# n = COL / cols # WRONG !!
n = COL - cols + 1
B .= 0
set stopwatch
loop LOOP --quiet
loop for (j=1; j<=n; j+=cols) --quiet
# printf "copy cols %d to %d (n=%d)\n", j, j+cols-1, n
B[,j:j+cols-1] = A[,j:j+cols-1]
endloop
endloop
printf "%3d columns per chunk: %.4fs\n", cols, $stopwatch
# printf "max(abs(A-B)) = %g\n", max(abs(A-B))
endloop
</hansl>
Some evidence remains that smaller chunks are better, but nothing
like as striking as before.
Allin