On 15.10.2019 00:44, Allin Cottrell wrote:
> On Mon, 14 Oct 2019, Allin Cottrell wrote:
>> So all the data seems to (pretty much) agree: the point at which
>> reduction in copy-time turns into increase, as we crank up the number
>> of columns to copy at once, is in the neighbourhood of the L2 cache
>> size, which is typically 1 MB (2^20) these days.
>
> Oof, sorry people! I'm afraid that matrix-copy timings based on the
> script I posted are mostly artifacts of an error in the script --
> revealed when I finally checked for B == A after the copy. The limit
> @n for the inner loop across columns was wrong, with the result that
> not all columns were getting copied. Here are my current timings --
> relatively flat in respect of the number/size of chunks:
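
For readers following along, here is a minimal sketch of the kind of check described above: copy a column-major matrix a fixed number of columns at a time, then confirm B == A before trusting the timing. This is not the script from the thread. It is plain Python rather than hansl, and the function name `copy_in_chunks`, the matrix dimensions and the flat-list storage are all assumptions made for illustration.

```python
import time


def copy_in_chunks(a, rows, cols, chunk_cols):
    """Copy a column-major matrix, stored as a flat list, chunk_cols columns at a time."""
    b = [0.0] * (rows * cols)
    for start in range(0, cols, chunk_cols):
        # Clamp the upper limit so the last (possibly shorter) chunk is still copied.
        stop = min(start + chunk_cols, cols)
        lo, hi = start * rows, stop * rows
        b[lo:hi] = a[lo:hi]
    return b


if __name__ == "__main__":
    rows, cols = 200, 1000  # hypothetical dimensions, not those used in the thread
    a = [float(i) for i in range(rows * cols)]
    for k in (1, 2, 5, 10, 25, 50, 100, 125, 500):
        t0 = time.perf_counter()
        b = copy_in_chunks(a, rows, cols, k)
        elapsed = time.perf_counter() - t0
        # A timing only means something if every column was actually copied.
        assert b == a, f"incomplete copy with {k} columns per chunk"
        print(f"{k} columns per chunk: {elapsed:.4f}s")
```

The equality check sits outside the timed region, so it guards against an incomplete copy without affecting the reported numbers.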
My results:
Dell desktop with i5-4460 @3.2 GHz (L1 (data): 4x32 KB, L2: 4x256 KB,
L3: 6 MB):
1 columns per chunk: 2,7478s
2 columns per chunk: 2,6180s
5 columns per chunk: 2,7415s
10 columns per chunk: 3,3991s
25 columns per chunk: 3,3736s
50 columns per chunk: 6,5499s
100 columns per chunk: 6,6246s
125 columns per chunk: 6,6132s
500 columns per chunk: 6,9947s
Dell laptop with i7-8550U @1.80 GHz (L1 (data): 4x32 KB, L2: 4x256 KB,
L3: 8 MB):
1 columns per chunk: 2,5714s
2 columns per chunk: 2,3895s
5 columns per chunk: 2,4755s
10 columns per chunk: 2,4302s
25 columns per chunk: 2,4552s
50 columns per chunk: 2,3811s
100 columns per chunk: 2,7048s
125 columns per chunk: 3,2614s
500 columns per chunk: 3,1312s
Desktop with AMD Phenom II X6 1100T @3.3 GHz (L1 (data): 6x64 KB, L2:
6x512 KB, L3: 6 MB):
1 columns per chunk: 5,9561s
2 columns per chunk: 5,7848s
5 columns per chunk: 5,6697s
10 columns per chunk: 6,3291s
25 columns per chunk: 6,6432s
50 columns per chunk: 7,0944s
100 columns per chunk: 9,3844s
125 columns per chunk: 9,8562s
500 columns per chunk: 6,5650s
Marcin
--
Marcin Błażejowski