On Wed, 1 Jan 2014, Sven Schreiber wrote:
> I'd like to raise an issue which is probably quite fundamental in
> terms of data handling. I'm currently working on a large panel
> dataset, meaning that gretl occupies more than 600MB of memory with
> the data loaded. In terms of file sizes, the Stata file version
> occupies 42MB, the gretl workfile only about 3.5MB. This shows that
> gretl stores the data very efficiently (by zipping), but OTOH opening
> and saving take quite some time. In fact, even within gretl it is
> much faster to import the Stata file than to open the native gretl
> file.
I'd like to experiment with this. Can you give a little more detail
on the characteristics of the data file? That is, roughly how many
observations? How many variables? And what sort of ratio of
quantitative variables to small-integer coded variables?
I've tried generating a random dataset with 10000 observations on
850 variables, 50 of them normal and the remaining 800 binary. On
disk this occupies 26MB uncompressed and 7MB with maximal gzip
compression. Reading the gzipped version takes a little longer, but
in neither case is the delay very noticeable. So I'd like to know
which dimension(s) to increase to make the gzipped load time
problematic.
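
For reference, a minimal hansl sketch along those lines; the exact
store options (in particular the --gzipped=level syntax) and the 0.5
threshold for the binaries are illustrative assumptions, not
necessarily what was actually run:

  # create an empty dataset with 10000 observations
  nulldata 10000
  # 50 standard-normal series
  loop i=1..50
    series z$i = normal()
  endloop
  # 800 binary (0/1) series
  loop i=1..800
    series d$i = (uniform() < 0.5)
  endloop
  # save uncompressed, then with (assumed) maximal gzip compression
  store plain.gdt
  store packed.gdt --gzipped=9
  # time the two loads
  set stopwatch
  open plain.gdt
  printf "uncompressed load: %g seconds\n", $stopwatch
  open packed.gdt
  printf "gzipped load: %g seconds\n", $stopwatch
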
Allin