[Gretl-devel] Re: Speed of matrix "block" operations

Saturday, 12 October 2019

On Sat, 12 Oct 2019, Marcin Błażejowski wrote:

...
 some time ago I found an interesting discussion on code optimisation in
 gcc with '-avx2' flag in case of copying blocks of matrices. So, I
 wrote a simple script and I got the following results [...]

I'm attaching a modified version of your script which may clarify
things. The relative execution times of your variants are mostly a
function of how much excess indexation arithmetic you're doing. Do as
little arithmetic as possible, in the inner loop in particular. Your
first variant does 160000 additions/subtractions where 2 will do just
fine.
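
To illustrate the point about indexation arithmetic, here is a minimal C
sketch. It is not the attached script: the function names, the 0-based
indexing and the "ops" counter are assumptions made for illustration only.
Both functions copy the same n x n block; the first recomputes shifted
indices for every element, the second computes the loop bound once.

```c
#include <stddef.h>

/* Copy the n x n block with top-left corner (k, k) from src to dst.
 * Both matrices are column-major with `rows` rows, 0-based indices.
 * The return value counts only the explicit shifted-index additions,
 * the analogue of the script-level index arithmetic discussed above. */
size_t copy_block_naive(const double *src, double *dst,
                        size_t rows, size_t k, size_t n)
{
    size_t ops = 0;
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < n; i++) {
            size_t si = i + k; ops++;   /* redone for every element */
            size_t sj = j + k; ops++;   /* redone for every element */
            dst[sj * rows + si] = src[sj * rows + si];
        }
    }
    return ops;
}

/* Same copy, but the loops run directly over the target index range,
 * so the only addition needed is the one that computes the bound. */
size_t copy_block_hoisted(const double *src, double *dst,
                          size_t rows, size_t k, size_t n)
{
    size_t ops = 0;
    size_t end = k + n; ops++;          /* done once, outside the loops */
    for (size_t j = k; j < end; j++) {
        for (size_t i = k; i < end; i++) {
            dst[j * rows + i] = src[j * rows + i];
        }
    }
    return ops;
}
```

In this sketch the naive variant performs 2*n*n counted additions and the
hoisted variant performs 1. An optimizing C compiler may well remove the
difference; in an interpreted script each such operation has a real cost.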

That said, copying element-by-element by row -- as in all your
variants -- is very inefficient for two reasons.

First, gretl matrices are in column-major order: column elements are
adjacent in memory, row elements are separated by the number of rows
in the matrix. So go by columns whenever possible.
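
The layout described above can be sketched in C as follows. The helper
names are hypothetical and indices are 0-based here; the point is only
that element (i, j) of a column-major matrix sits at offset j*rows + i, so
stepping down a column moves by 1 and stepping along a row moves by the
number of rows.

```c
#include <stddef.h>

/* Offset of element (i, j) in a column-major matrix with `rows` rows. */
size_t cm_index(size_t rows, size_t i, size_t j)
{
    return j * rows + i;
}

/* Column-wise copy: the inner loop walks adjacent memory locations. */
void copy_by_columns(double *dst, const double *src,
                     size_t rows, size_t cols)
{
    for (size_t j = 0; j < cols; j++) {
        for (size_t i = 0; i < rows; i++) {
            dst[cm_index(rows, i, j)] = src[cm_index(rows, i, j)];
        }
    }
}

/* Row-wise copy: the inner loop jumps `rows` elements at each step,
 * which gives the same result with a less cache-friendly access pattern. */
void copy_by_rows(double *dst, const double *src,
                  size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            dst[cm_index(rows, i, j)] = src[cm_index(rows, i, j)];
        }
    }
}
```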

Second, one should use ranges rather than single-element indices
whenever possible. If the data in the given range are contiguous in
memory, libgretl will use the C library's memcpy() to copy a chunk of
data in one call.
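
As a rough illustration of why ranges help, the following C sketch copies
a block one column chunk at a time with memcpy(). It is not libgretl's
actual code: the function names and the 0-based indexing are assumptions,
and the two matrices are assumed to have the same shape and not to overlap.

```c
#include <stddef.h>
#include <string.h>

/* Copy rows i0 .. i0+len-1 of column j from src to dst. In column-major
 * storage these elements are contiguous, so one memcpy() call suffices. */
void copy_column_range(double *dst, const double *src, size_t rows,
                       size_t j, size_t i0, size_t len)
{
    size_t off = j * rows + i0;
    memcpy(dst + off, src + off, len * sizeof(double));
}

/* Copy the n x n block with top-left corner (k, k): one chunk per
 * column instead of one assignment per element. */
void copy_block_chunked(double *dst, const double *src,
                        size_t rows, size_t k, size_t n)
{
    for (size_t j = k; j < k + n; j++) {
        copy_column_range(dst, src, rows, j, k, n);
    }
}
```

For an n x n block this makes n calls to memcpy() rather than n*n
individual element assignments.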

Here are my timings for the 6 variants in my version of your script
(on i7, Arch Linux):

loop 1: 5.3523 (add/sub = 160000)
loop 2: 4.6882 (add/sub = 80000)
loop 3: 4.9248 (add/sub = 80000)
loop 4: 4.5084 (add/sub = 40001)
loop 5: 3.3973 (add/sub = 2)
loop 6: 0.0109 (add/sub = 2; column chunks)

Note the huge speed-up when copying columns as chunks.

Allin
