On Mon, 11 Jan 2021, Sven Schreiber wrote:
On 11.01.2021 at 16:12, Sven Schreiber wrote:
> Hi,
> another minor glitch: I had a large Stata dta file and imported it into
> gretl - this actually worked flawlessly via dragging-and-dropping (on
> Windows). Then I went on to save it; in the save dialog gretl inserted
> the extension "gdt" automatically, and when I changed the format to the
> binary gdtb type in the drop-down box at the bottom, in the end I had a
> file ending in .gdt.gdtb.
> Obviously very easy to correct manually in various ways along the steps
> taken, but maybe now we want to shake out even minor stuff like this.
Hm, I'm observing something strange with this conversion to gdtb when
using the "new missing codes" choice.
Please try the test under "data-io" at
https://sourceforge.net/p/gretl/workspace/ci/master/tree/
Here's the output I get from running MB26.sh, which creates a
dataset somewhat larger than the 26 MB you mention, then saves
and loads it in various ways:
plain gdt     : save time 1.649s, size 72745837 bytes
plain gdt     : load time 0.950s
trad. gdtb    : save time 0.889s, size 25103074 bytes
trad. gdtb    : load time 0.148s
gdtb gzipped=6: save time 1.204s, size 25007970 bytes
gdtb gzipped=6: load time 0.140s
purebin       : save time 0.022s, size 29121024 bytes
purebin       : load time 0.021s
There's not much to be got out of gzip compression because the data
are random normal. "Traditional" gdtb is faster for both writing and
reading than plain gdt. The new "purebin" is of course much faster
still.
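
For anyone who wants to check the point about gzip and random normal
data independently of gretl, here is a minimal Python sketch. It is not
gretl code and does not reproduce the gdt, gdtb or purebin formats; it
only packs standard-normal doubles into raw bytes and runs them through
zlib at level 6. The sample size and seed are arbitrary choices made
for the illustration.

```python
import random
import struct
import zlib

# Illustration only: raw little-endian doubles, not any gretl file format.
random.seed(12345)          # arbitrary seed, just for repeatability
n = 200_000                 # arbitrary sample size

# Draw n standard-normal values and pack them as 8-byte doubles.
values = [random.gauss(0.0, 1.0) for _ in range(n)]
raw = struct.pack(f"<{n}d", *values)

# Compress at level 6, the same level as in the "gzipped=6" run.
packed = zlib.compress(raw, 6)

ratio = len(packed) / len(raw)
print(f"raw {len(raw)} bytes, compressed {len(packed)} bytes, "
      f"ratio {ratio:.3f}")
```

The ratio should stay fairly close to 1, since most of each 8-byte
double consists of effectively random mantissa bits that a
general-purpose compressor cannot shrink.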
Allin