On Sun, 13 Oct 2019, Riccardo (Jack) Lucchetti wrote:
> On Sat, 12 Oct 2019, Allin Cottrell wrote:
>
>> Here are my timings:
>
> [...]
>
> and here are mine, on two different machines:
>
> Laptop at home:
>
> 1 columns per chunk: 4.1346s
> 2 columns per chunk: 2.1247s
> 5 columns per chunk: 0.7616s
> 10 columns per chunk: 0.2831s
> 25 columns per chunk: 0.1298s
> 50 columns per chunk: 0.3060s
> 100 columns per chunk: 0.9900s
> 125 columns per chunk: 1.2825s
> 500 columns per chunk: 4.1578s
>
> Desktop at work:
>
> 1 columns per chunk: 2.4007s
> 2 columns per chunk: 0.6390s
> 5 columns per chunk: 0.1440s
> 10 columns per chunk: 0.0739s
> 25 columns per chunk: 0.0363s
> 50 columns per chunk: 0.0724s
> 100 columns per chunk: 0.1957s
> 125 columns per chunk: 0.3581s
> 500 columns per chunk: 2.8391s
Thanks, Jack. So, given the options I posited, our machines agree on
a best chunk size of 25 * 5000 * 8 bytes (about 1 MB) for use with
memcpy, 5000 being the number of rows in the matrix to be copied and
8 the number of bytes needed to represent a double-precision
floating-point value.
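
To be concrete about what's being timed, here's a rough stand-alone
sketch of the sort of loop I have in mind (not the actual libgretl
code; the matrix dimensions match the test case above, but the
repetition count is an arbitrary choice). Since the matrix data are
stored column-major, K adjacent columns form one contiguous block,
so each chunk is a single memcpy of K * 5000 doubles:

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>

  #define ROWS 5000
  #define COLS 500
  #define REPS 1000   /* arbitrary: just enough to get stable timings */

  int main (void)
  {
      double *src = calloc((size_t) ROWS * COLS, sizeof *src);
      double *dst = malloc((size_t) ROWS * COLS * sizeof *dst);
      int chunks[] = {1, 2, 5, 10, 25, 50, 100, 125, 500};
      int i, k, rep;

      for (i = 0; i < (int) (sizeof chunks / sizeof chunks[0]); i++) {
          int K = chunks[i];
          clock_t t0 = clock();

          for (rep = 0; rep < REPS; rep++) {
              /* copy the matrix K contiguous columns at a time */
              for (k = 0; k < COLS; k += K) {
                  memcpy(dst + (size_t) k * ROWS,
                         src + (size_t) k * ROWS,
                         (size_t) K * ROWS * sizeof *src);
              }
          }
          printf("%d columns per chunk: %.4fs\n", K,
                 (double) (clock() - t0) / CLOCKS_PER_SEC);
      }

      free(src);
      free(dst);
      return 0;
  }
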
Now, to optimize libgretl's copying of contiguous data, we just have
to figure out how that relates to the size of L1 or L2 cache, or
whatever is truly the relevant hardware parameter here!
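
One quick way to check those cache sizes, at least on Linux with
glibc (the _SC_LEVEL* selectors are a glibc extension to sysconf,
not POSIX), is something like:

  #include <stdio.h>
  #include <unistd.h>

  int main (void)
  {
      /* glibc-specific: returns the cache sizes in bytes */
      printf("L1d cache: %ld bytes\n", sysconf(_SC_LEVEL1_DCACHE_SIZE));
      printf("L2 cache:  %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
      printf("L3 cache:  %ld bytes\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
      return 0;
  }

If the ~1 MB sweet spot turns out to track one of those figures, that
would give us a reasonably portable way of picking the chunk size at
run time.
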
Allin