[Gretl-devel] Re: Speed of matrix "block" operations

Monday, 14 October 2019

On 12.10.2019 22:01, Allin Cottrell wrote:
...
 On Sat, 12 Oct 2019, Marcin Błażejowski wrote:

 I'm attaching a modified version of your script which may clarify
 things. The relative execution times of your variants are mostly a
 function of how much excess indexation arithmetic you're doing. Do as
 little arithmetic as possible in the inner loop in particular. Your
 first variant does 160000 additions/subtractions where 2 will do just
 fine.

OK. And that is something I would expect.
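
To illustrate the point about index arithmetic, here is a minimal C sketch. It is not the attached script; the function names, the ELEM macro and the block layout are assumptions made for illustration only. The first function re-adds the source offsets for every element, the second applies them once before the loops. An optimizing C compiler may well hoist this on its own, so the sketch shows the idea rather than a measurable C speed-up; in an interpreted script every such addition is paid for.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major access: element (i, j) of a matrix with `rows` rows. */
#define ELEM(m, rows, i, j) ((m)[(size_t)(j) * (rows) + (i)])

/* Hypothetical "naive" variant: r0 and c0 are added again for every
   element, i.e. 2 * n * n extra additions for an n x n block. */
void copy_block_naive(double *dst, size_t drows,
                      const double *src, size_t srows,
                      size_t r0, size_t c0, size_t n)
{
    assert(r0 + n <= srows && n <= drows);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            ELEM(dst, drows, i, j) = ELEM(src, srows, i + r0, j + c0);
}

/* Hypothetical "hoisted" variant: the offsets are applied once, outside
   the loops, and the loops then use plain (i, j) on a shifted base. */
void copy_block_hoisted(double *dst, size_t drows,
                        const double *src, size_t srows,
                        size_t r0, size_t c0, size_t n)
{
    assert(r0 + n <= srows && n <= drows);
    const double *base = src + c0 * srows + r0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            ELEM(dst, drows, i, j) = ELEM(base, srows, i, j);
}
```

Both functions copy the same block; only the amount of arithmetic inside the loops differs.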
...
 That said, copying element-by-element by row -- as in all your
 variants -- is very inefficient for two reasons.

 First, gretl matrices are in column-major order: column elements are
 adjacent in memory, row elements are separated by the number of rows
 in the matrix. So go by columns whenever possible.

OK. Does this mean that if I have a matrix which I expand by adding new
"tuples", I should append these tuples as new columns instead of new rows?
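
What column-major storage implies for growing a matrix can be sketched in C as follows. The Mat struct and both functions are hypothetical and written only for this illustration; the thread does not say how gretl's own concatenation is implemented. With this layout, appending a column leaves the existing data where they are, while appending a row forces every column after the first to move.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical column-major matrix: element (i, j) is val[j * rows + i]. */
typedef struct {
    size_t rows, cols;
    double *val;
} Mat;

/* Append one column: the new data go right after the existing block,
   so a single memcpy is enough. Returns 0 on success, -1 on failure. */
int mat_append_col(Mat *m, const double *col)
{
    assert(m->rows > 0);
    double *p = realloc(m->val, (m->cols + 1) * m->rows * sizeof *p);
    if (p == NULL) return -1;
    memcpy(p + m->cols * m->rows, col, m->rows * sizeof *p);
    m->val = p;
    m->cols += 1;
    return 0;
}

/* Append one row: each column grows by one element, so the columns
   must be shifted apart. Returns 0 on success, -1 on failure. */
int mat_append_row(Mat *m, const double *row)
{
    assert(m->cols > 0);
    size_t r = m->rows, c = m->cols;
    double *p = realloc(m->val, (r + 1) * c * sizeof *p);
    if (p == NULL) return -1;
    /* Go from the last column backwards so that no column is
       overwritten before it has been moved. */
    for (size_t j = c; j-- > 0; ) {
        memmove(p + j * (r + 1), p + j * r, r * sizeof *p);
        p[j * (r + 1) + r] = row[j];
    }
    m->val = p;
    m->rows = r + 1;
    return 0;
}
```

Under these assumptions the layout favours appending columns, although the reallocation cost is present in both cases.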
...
 Second, one should use ranges rather than single-element indices
 whenever possible. If the data in the given range are contiguous in
 memory, libgretl will use the C library's memcpy() to copy a chunk of
 data in one call.

I know. I just wanted to replicate the example I mentioned at the
beginning of my email, because people discovered that gcc was able to
'vectorize' only "a[i,j] = b[i,j]" copying (i.e. load chunks of the
matrices into YMM/XMM registers and apply AVX instructions to them).
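
The chunked copy described in the quoted paragraph can be sketched in C like this. It is an illustration under stated assumptions, not libgretl's code: the function name and its parameters are made up here, and only memcpy() itself is taken from the thread. In column-major storage a run of rows within one column is contiguous, so each column of the block needs one memcpy(); when the block spans whole columns of both matrices, the entire block is contiguous and one call suffices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy an nr x nc block from src (srows rows, block starting at row sr0,
   column sc0) to dst (drows rows, block starting at row dr0, column dc0).
   Both matrices are column-major. Hypothetical helper for illustration. */
void copy_block_chunks(double *dst, size_t drows, size_t dr0, size_t dc0,
                       const double *src, size_t srows, size_t sr0, size_t sc0,
                       size_t nr, size_t nc)
{
    assert(dr0 + nr <= drows && sr0 + nr <= srows);

    if (nr == drows && nr == srows) {
        /* Whole columns on both sides: the block is one contiguous run. */
        memcpy(dst + dc0 * drows, src + sc0 * srows, nr * nc * sizeof *dst);
        return;
    }

    /* Otherwise each column segment is contiguous: one memcpy per column. */
    for (size_t j = 0; j < nc; j++) {
        memcpy(dst + (dc0 + j) * drows + dr0,
               src + (sc0 + j) * srows + sr0,
               nr * sizeof *dst);
    }
}
```

Compared with an element-by-element loop, this replaces nr * nc indexed assignments with at most nc library calls.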
...
 Here are my timings for the 6 variants in my version of your script
 (on i7, Arch Linux):

 loop 1: 5.3523 (add/sub = 160000)
 loop 2: 4.6882 (add/sub = 80000)
 loop 3: 4.9248 (add/sub = 80000)
 loop 4: 4.5084 (add/sub = 40001)
 loop 5: 3.3973 (add/sub = 2)
 loop 6: 0.0109 (add/sub = 2; column chunks)

 Note the huge speed-up when copying columns as chunks.

My results (Inspiron laptop with Debian, i7-8550U CPU @ 1.80GHz Skylake):

loop 1:  11.1638 (add/sub = 160000)
loop 2:   9.6939 (add/sub = 80000)
loop 3:   9.8530 (add/sub = 80000)
loop 4:   9.5306 (add/sub = 40001)
loop 4a:  8.7002 (add/sub = 2001)
loop 5:   8.5831 (add/sub = 2)
loop 6:   0.0163 (add/sub = 2; column chunks)

Marcin

-- 
Marcin Błażejowski
