On 14.07.2023 at 22:05, Allin Cottrell wrote:
> Actually, I think it's worth fixing, even if partially, now.
>
> First, I've come to understand a point that I wasn't clear on before,
> namely that the expensive GUI-specific check is only about more
> explicit error-reporting; it won't catch any errors beyond those
> caught by the check run by "setobs". That's because if there are any
> duplicated pairs of values in the unit and period series, this is
> bound to show up as (total number of observations) > (number of
> distinct units) * (number of distinct periods), a condition which is
> checked by setobs, and also checked in the GUI _prior_ to doing the
> expensive thing.
>
> So we can immediately make the more expensive check (EC) conditional
> on an error exposed by the simpler check. If your index series are
> OK, we won't call EC. That's done in git.
OK, thanks. That sounds like the most relevant case.
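
For my own understanding, here's roughly how I picture the two-stage
logic; a minimal sketch in C with made-up names (count_distinct,
check_panel_indices, expensive_check), certainly not the actual gretl
code:

#include <stdlib.h>
#include <string.h>

/* comparator for qsort() on ints */
static int cmp_int (const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;

    return (x > y) - (x < y);
}

/* number of distinct values in x[0..n-1], assuming n > 0:
   sort a copy, then count the steps between neighbors */
static int count_distinct (const int *x, int n)
{
    int *tmp = malloc(n * sizeof *tmp);
    int i, d = 1;

    memcpy(tmp, x, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_int);
    for (i = 1; i < n; i++) {
        d += (tmp[i] != tmp[i-1]);
    }
    free(tmp);
    return d;
}

/* stand-in for the expensive check (EC): its job would be the
   detailed error report, e.g. naming a duplicated pair */
static int expensive_check (const int *unit, const int *period, int n)
{
    /* if n > U*P, the pigeonhole principle guarantees that at
       least one (unit, period) pair occurs twice */
    return 1;
}

/* run the cheap setobs-style check first; call EC only if it
   fails, purely for more explicit error reporting */
int check_panel_indices (const int *unit, const int *period, int n)
{
    int U = count_distinct(unit, n);
    int P = count_distinct(period, n);

    if (n <= U * P) {
        return 0;  /* index series OK: EC is not called */
    }
    return expensive_check(unit, period, n);
}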
> Second (not surprisingly, doh!) the expensive check can be reduced
> from O(n^2) to O(n log n). That too is in git.
>
> The remaining question: Is n log n still too complex for a jumbo
> dataset?
>
> ...
>
> It runs pretty fast for me with 10000 units and 20 periods.
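
I guess the reduction works by sorting the (unit, period) pairs and
then scanning adjacent entries, so the all-pairs comparison becomes an
O(n log n) sort plus an O(n) scan. A sketch of what I imagine, with
guessed names and not necessarily what's in git:

#include <stdlib.h>

typedef struct {
    int unit;
    int period;
} obs_pair;

/* lexicographic order: by unit, then by period */
static int cmp_pair (const void *a, const void *b)
{
    const obs_pair *pa = a, *pb = b;

    if (pa->unit != pb->unit) {
        return (pa->unit > pb->unit) - (pa->unit < pb->unit);
    }
    return (pa->period > pb->period) - (pa->period < pb->period);
}

/* return 1 if some (unit, period) pair occurs more than once:
   O(n log n) for the sort, O(n) for the neighbor scan */
int has_duplicate_pairs (const int *unit, const int *period, int n)
{
    obs_pair *p = malloc(n * sizeof *p);
    int i, dup = 0;

    for (i = 0; i < n; i++) {
        p[i].unit = unit[i];
        p[i].period = period[i];
    }
    qsort(p, n, sizeof *p, cmp_pair);
    for (i = 1; i < n && !dup; i++) {
        dup = (p[i].unit == p[i-1].unit &&
               p[i].period == p[i-1].period);
    }
    free(p);
    return dup;
}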
So if I understand the background correctly, we have n = 200K in that
test, and apart from some constant factor we're comparing 40bn (4e10)
to 2.4m (2.4e6). That sounds like a sufficient speedup factor to me!
My biggest dataset right now has something like 6000 units and close
to 900 periods, so almost n = 5.4m. There the comparison is 2.9e13 vs.
8.4e7, even more radical.
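
(In case anyone wants to verify my arithmetic: those figures come out
right with the natural log, e.g.:)

#include <stdio.h>
#include <math.h>

int main (void)
{
    double n1 = 2.0e5;  /* your test: 10000 units x 20 periods */
    double n2 = 5.4e6;  /* my dataset: ~6000 units x ~900 periods */

    printf("n^2:      %.1e vs %.1e\n", n1 * n1, n2 * n2);
    printf("n*log(n): %.1e vs %.1e\n", n1 * log(n1), n2 * log(n2));
    return 0;
}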
BTW, all that dataset shrinking didn't really pay off in the end. I
guess this is due to gretl's requirement of a "nominally balanced"
panel: gretl basically reinstates most of the dropped observations as
missings, because they are needed to fill up the rectangular grid. On
top of that, using panel index variables means the values of those
series have to be stored in any case.
cheers
sven