[Gretl-devel] Re: Speed of matrix "block" operations

Saturday, 12 October 2019

On Sat, 12 Oct 2019, Marcin Błażejowski wrote:

> ...
> some time ago I found an interesting discussion on code optimisation
> in gcc with '-avx2' flag in case of copying blocks of matrices.

Marcin, here's another observation, which chimes with what I think 
may have been your original intent.

As I mentioned in my previous reply, one really wants to copy by 
column when possible, and take advantage of libgretl's use of 
memcpy() as opposed to copying element-by-element. But... the 
question arises: is there such a thing as being "too greedy" in use 
of memcpy? Might it help to divide the data to be copied into 
smaller blocks? And the answer is Yes, if the matrix is big enough.
(I guess this has to do with the available cache.)
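
To make the idea concrete, here is a minimal C sketch of copying by chunks of columns. It assumes column-major storage of doubles, so that any run of adjacent columns is one contiguous block. The names copy_cols_chunked and copy_elementwise are hypothetical, for illustration only; they are not libgretl functions, and this is not a description of libgretl's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a rows x cols column-major matrix of doubles from src to dst,
 * issuing one memcpy() per chunk of at most `chunk` columns.
 * In column-major storage, column j starts at offset j * rows, so a
 * run of adjacent columns is a single contiguous block of memory.
 * (Hypothetical helper for illustration; not a libgretl function.) */
static void copy_cols_chunked(double *dst, const double *src,
                              size_t rows, size_t cols, size_t chunk)
{
    assert(chunk > 0);
    for (size_t j = 0; j < cols; j += chunk) {
        /* The last chunk is narrower if chunk does not divide cols. */
        size_t ncols = (cols - j < chunk) ? cols - j : chunk;
        memcpy(dst + j * rows, src + j * rows,
               ncols * rows * sizeof(double));
    }
}

/* Element-by-element baseline, for comparison. */
static void copy_elementwise(double *dst, const double *src,
                             size_t rows, size_t cols)
{
    for (size_t j = 0; j < cols; j++) {
        for (size_t i = 0; i < rows; i++) {
            dst[j * rows + i] = src[j * rows + i];
        }
    }
}
```

With chunk equal to cols this collapses to a single memcpy() of the whole matrix; with chunk equal to 1 it copies one column per call. Which setting is fastest is an empirical question and will depend on the machine.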

I'm appending an example script below. We have a big matrix (5000 x 
500) and we'd like to copy its entire content. We try copying by 
chunks of columns, starting at 1 column per chunk and going up to 
the full 500 in a single chunk. At first the copy time declines, but 
in this example the "too greedy" point arrives when copying 50 
columns at a time. And if we try to copy all 500 columns in one go, 
that's actually worse than going by individual columns.

Here are my timings:

  1 columns per chunk: 2.4016s
  2 columns per chunk: 1.1832s
  5 columns per chunk: 0.3720s
 10 columns per chunk: 0.1427s
 25 columns per chunk: 0.0708s
 50 columns per chunk: 0.1604s
100 columns per chunk: 0.5870s
125 columns per chunk: 0.8519s
500 columns per chunk: 2.8247s
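
For a sense of scale regarding the cache guess, the chunk sizes can be converted to bytes. The small C helper below is hypothetical, and it assumes 8-byte doubles. With 5000 rows, one column is 40,000 bytes, 25 columns are 1,000,000 bytes, 50 columns are 2,000,000 bytes, and the full 500 columns are 20,000,000 bytes. Whether any of these sizes lines up with a particular cache level depends on the hardware, which is not specified here.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved when copying one chunk of `ncols` columns, each column
 * holding `rows` doubles. Assumes sizeof(double) == 8, which holds on
 * common platforms. (Hypothetical helper for illustration.) */
static size_t chunk_bytes(size_t rows, size_t ncols)
{
    return rows * ncols * sizeof(double);
}
```

For example, chunk_bytes(5000, 25) gives the size of the 25-column chunk in the table above.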

And here's the script:

<hansl>
set verbose off
clear

scalar ROW = 5000
scalar COL = 500
scalar LOOP = 600

matrix A = mnormal(ROW, COL)
matrix B = zeros(ROW, COL)

matrix chunkcols = {1, 2, 5, 10, 25, 50, 100, 125, 500}

loop k=1..nelem(chunkcols) --quiet
     cols = chunkcols[k]
     set stopwatch
     loop LOOP --quiet
         # j is the first column of each chunk, so it must run up to COL;
         # every chunk size in chunkcols divides COL exactly
         loop for (j=1; j<=COL; j+=cols) --quiet
             B[,j:j+cols-1] = A[,j:j+cols-1]
         endloop
     endloop
     printf "%3d columns per chunk: %.4fs\n", cols, $stopwatch
endloop
</hansl>

Quite interesting, and maybe we can make use of this in libgretl's 
internals.

Allin
