On 13.10.2019 02:15, Allin Cottrell wrote:
Thanks, Jack. So, given the options I posited, our machines agree on
a best chunk size of (25 * 5000 * 8) bytes (=~ 1 MB) for use with
memcpy. (5000 being the number of rows in the matrix to be copied, and
8 the number of bytes to represent a double-precision floating point
value.)
Now, to optimize libgretl's copying of contiguous data, we just have
to figure out how that relates to the size of L1 or L2 cache, or
whatever is truly the relevant hardware parameter here!
But Allin, isn't this something that could/should(?) be handled by AVX
extensions? I have to admit, though, that I have no idea how to use such
low-level optimisation in a scripting language.
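
For reference, here is a minimal sketch in C of the kind of chunked copy
discussed above. It is only an illustration under stated assumptions: the
function name chunked_copy and its parameters are hypothetical and are not
part of libgretl's API, and the chunk size is passed in rather than derived
from any cache parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not libgretl API): copy n doubles from src to dst,
   issuing one memcpy call per chunk of at most chunk_elems doubles.
   The two buffers must not overlap, as memcpy requires. */
void chunked_copy(double *dst, const double *src,
                  size_t n, size_t chunk_elems)
{
    size_t done = 0;

    /* A zero chunk size would make the loop below spin forever. */
    assert(chunk_elems > 0);

    while (done < n) {
        size_t k = n - done;          /* elements still to copy */

        if (k > chunk_elems) {
            k = chunk_elems;          /* cap at one chunk */
        }
        memcpy(dst + done, src + done, k * sizeof(double));
        done += k;
    }
}
```

With the figures quoted above, chunk_elems would be 25 * 5000 = 125000
doubles, i.e. 1000000 bytes per memcpy call. Whether that size is best on
other hardware is exactly the open question.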