On 12.01.2021 at 16:34, Allin Cottrell wrote:
> On Mon, 11 Jan 2021, Sven Schreiber wrote:
>> Sorry, observing this on Windows here, I just did a much cruder
>> comparison: loading the same dataset from gdtb (new, I think)
If by "new" you mean the new pure-binary format, a gretl data file
will be in that format only if saved via CLI with the --purebin option.
Sorry for the confusion: no, I meant that in the GUI, on saving, I get a
choice of how to represent missing values, the "new" or the old/compatible
way. I chose new.
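
For the record, I guess the CLI route you mention would look something
like this (untested sketch; the filenames are made up, and I'm assuming
"store" is the command that takes the --purebin flag):

  # open an existing dataset, then save it in pure-binary format
  open mydata.gdt
  store mydata.gdtb --purebin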
> What are the specs of the machine in question? On my 2014-vintage
> desktop the 26MB load time is less than a second for both formats
> (and 20 milliseconds for the new binary format).
That PC is a veteran Ivy Bridge (I think) quad-core i5 with 8GB memory and
SSD drives.
I just checked on a newer i7/16GB laptop and yes, it is somewhat faster
(also Win10). But I guess the relevant factor might be that gretl's memory
consumption goes up from <10MB to 700MB when loading the 26MB gdtb file.
So I suppose it's not so much the on-disk file size as the heavy
compression of the many discrete-valued or dummy series in the dataset.
(This was originally a Stata file.) And/or the representation of missings.
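
In case anyone wants to reproduce the timing, something like this should
work in a script (sketch; filename made up):

  # time how long opening the dataset takes
  set stopwatch
  open mydata.gdtb
  printf "load time: %g seconds\n", $stopwatch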
>> Hm, then I wanted to check what the file size would be without
>> compression. So I exported to gdtb, setting compression to 0 (all in the
>> GUI); after a while I get an error window: ".... error zipping"!?
> I can't reproduce that, either on Linux or Windows.
On this laptop I don't see the error anymore, either. It could be that the
system partition on the other PC was almost full. So the uncompressed gdtb
file indeed takes up 745MB.
OK, that was with the "new" missings representation. If I instead use the
old missings choice, that brings me back to the question about
compression, because:
- old missings, compression level 0, gdtb: 25900KB
- old missings, compression level 1, gdtb: 25900KB
How can the size come out as exactly the same number of KB?
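
I did all of this in the GUI; I suppose the script equivalent would be
something like the following (sketch only; I'm not sure whether store's
--gzipped option accepts a level, or level 0 in particular, and the
filenames are made up):

  # assuming store takes a gzip level, as the GUI dialog suggests
  store level0.gdtb --gzipped=0
  store level1.gdtb --gzipped=1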
(Sorry, can't send the file, it's sensitive data.)
thanks
sven