On 02.01.2014 19:14, Allin Cottrell wrote:
> On Wed, 1 Jan 2014, Sven Schreiber wrote:
>
>> I'd like to raise an issue which is probably quite fundamental in terms
>> of data handling. I'm currently working on a large panel dataset,
>> meaning that gretl occupies more than 600MB of memory with the data
>> loaded. In terms of file sizes, the Stata file version occupies 42MB,
>> the gretl workfile only about 3.5MB. This shows that gretl stores the
>> data very efficiently (by zipping), but OTOH opening and saving takes
>> quite some time. Actually it is much faster even in gretl to import the
>> Stata file instead of the native gretl file.
>
> I'd like to experiment with this. Can you give a little more detail
> on the characteristics of the data file? That is (roughly) how many
> observations? And how many variables? And what sort of ratio of
> quantitative variables to small-integer coded variables?
First of all, I just found out that opening and saving the same data is
much faster if the dataset is left as undated, as opposed to being
structured with panel index variables. On saving, gretl's pop-up window
reports 177052KB in the undated case versus 571712KB in the
panel-structured case. I'm not sure if that's expected; the difference
seems quite extreme.
The maximum N and T dimensions are 3157 and 19, but the panel is quite
unbalanced. Gretl reports n=59983, but I think this must include all the
missing observations. There are about 1200 variables in the file. It's
hard to tell how many of them are discrete (any scripting idea here?),
but definitely most of them.
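For what it's worth, something along these lines might do the counting --
an untested hansl sketch, assuming that a bare wildcard works in a list
definition and using isdiscrete() on each series name:

```hansl
# build a list of all series in the dataset (wildcard list syntax)
list ALL = *
scalar n_disc = 0
loop foreach i ALL
    # $i expands to the series name; isdiscrete() returns 1 for
    # series flagged as discrete, 0 otherwise
    n_disc += isdiscrete($i)
endloop
printf "%d of %d series are discrete\n", n_disc, nelem(ALL)
```

If the wildcard doesn't cover everything, looping over ID numbers
1..$nvars-1 would be an alternative.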
hth,
sven