On 08.01.2014 04:31, Allin Cottrell wrote:
On Tue, 7 Jan 2014, Sven Schreiber wrote:
1) Writing data as text
...
What's new here is that I've worked out a substantially faster way
of determining the appropriate format specification for each
series. In my tests this cuts several seconds off big data writes.
This improvement is independent of the compression level and the
skip-padding status.
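
For readers who want the format-selection idea in concrete terms, here is a minimal sketch in C of one straightforward way to pick a per-series precision: try increasing numbers of significant digits until the printed value parses back to the same double. This is only an illustration. It is not the faster method mentioned above, whose details are not given in this message, and the names min_roundtrip_digits and series_digits are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Smallest number of significant digits (1..17) for which printing x
   with "%.*g" and parsing the text back yields exactly x.
   Hypothetical helper for illustration, not gretl's actual routine. */
int min_roundtrip_digits(double x)
{
    char buf[64];
    int p;

    if (x != x) {
        return 1; /* NaN: no digits needed, missing values are written differently */
    }
    for (p = 1; p < 17; p++) {
        snprintf(buf, sizeof buf, "%.*g", p, x);
        if (strtod(buf, NULL) == x) {
            return p;
        }
    }
    return 17; /* 17 significant digits always round-trip an IEEE 754 double */
}

/* Precision needed to write a whole series without loss:
   the maximum over its observations. */
int series_digits(const double *x, int n)
{
    int i, p, pmax = 1;

    assert(x != NULL && n >= 0);
    for (i = 0; i < n; i++) {
        p = min_roundtrip_digits(x[i]);
        if (p > pmax) {
            pmax = p;
        }
    }
    return pmax;
}
```

A series holding only small integers comes out at one or two digits under this scheme, while a series with arbitrary real values tends towards 15 to 17. The brute-force loop is the slow part, which is presumably where a smarter approach saves time.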
Yes, the results are impressive! I'm just wondering whether it would also
help in this context to use the information on which variables are
officially "discrete". Most of them (not all) will be integer-valued for
example.
Also, in principle the data are now changed with respect to the old
format I guess; hopefully just within the precision error margin of
doubles, but this should probably be tested -- or did you already?
BTW, I have always wondered how gretl actually determines the true/false
value of something like "indicator == 1" given that the series are
stored as floating-point values and the corresponding representation
problems of integers...
2) Writing binary data
Tweaks to our writing of data in text form are useful, but
there's no question that if you want raw speed you're better off using
C's fwrite and fread to zap big swathes of bytes from RAM to disk or
vice versa. I've implemented a --binary option to "store" that
causes gretl to write out an XML .gdt file containing the metadata
plus a binary .bdt file containing doubles.
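
As a rough illustration of the fwrite/fread approach, the following C sketch writes an array of doubles to a file in one call and reads it back. It assumes nothing about the actual layout of the .bdt file, which is not described in this message: there is no header, no byte-order handling, and the function names write_doubles and read_doubles are hypothetical.

```c
#include <stdio.h>

/* Write n doubles to path as raw bytes.
   Returns 0 on success, -1 on failure. */
int write_doubles(const char *path, const double *x, size_t n)
{
    FILE *fp = fopen(path, "wb");
    size_t nw;

    if (fp == NULL) {
        return -1;
    }
    nw = fwrite(x, sizeof *x, n, fp);
    /* fclose can also report a write error (flushed buffers) */
    if (fclose(fp) != 0 || nw != n) {
        return -1;
    }
    return 0;
}

/* Read exactly n doubles from path into x.
   Returns 0 on success, -1 if the file is missing or too short. */
int read_doubles(const char *path, double *x, size_t n)
{
    FILE *fp = fopen(path, "rb");
    size_t nr;

    if (fp == NULL) {
        return -1;
    }
    nr = fread(x, sizeof *x, n, fp);
    fclose(fp);
    return nr == n ? 0 : -1;
}
```

The speed comes from skipping all number formatting and parsing. The price is that the raw bytes are only portable between machines with the same double representation and byte order, unless the metadata records which one was used.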
Hm, that sounds as if now a .gdt file could indicate either a
traditional standalone file or a new metadata file, which are quite
different things, no? A new suffix would seem in order. (.mgdt?)
This shows the relationship between binary vs text, compression vs
none, and skip padding vs not. As you'll see, if speed is the only
concern (and disk space not an issue), a straight binary write/read
with no compression or skipping of padding wins the race. On the
other hand you can get quite nice performance using skip-padding and
compression level 1: write in 2.4 seconds, read in 1.2 seconds, and
use only 4 percent of the maximal disk space.
Yes I'm a big fan of compression level 1 -- why not make that the default?
Again, impressive!
thanks,
sven