On Wed, 22 Apr 2009, Sven Schreiber wrote:

> I was wondering why the file size of a gretl database is so big
> compared to the other storage formats. For example, I have a
> relatively large dataset which fills up 1.4MB when exported to CSV.
> (Of course, gretl's native and zipped file is much smaller, about
> 170KB.) When I save this data to gretl binary database format I get
> a 2MB file. And I thought that text files like CSV could not be
> beaten in terms of explicitness and bloat...
It depends on the nature of the data. In gretl db format, data are
stored as single-precision floats (4 bytes each). This preserves
about 7 significant digits, which is OK for primary economic data.
So the size of a gretl .bin file is perfectly predictable: 4 bytes
per observation per series. With CSV, gretl checks the number of
significant digits and does not write more bytes than necessary, so
a CSV file may be larger or smaller than a db .bin file.
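As an aside, the round-trip behavior of 4-byte floats is easy to check
outside gretl. This minimal Python sketch (not part of gretl, just an
illustration of IEEE single precision) packs a value into 4 bytes and
unpacks it again:

```python
import struct

def float32_roundtrip(x):
    """Pack x into an IEEE 754 single (4 bytes), then unpack it."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# Roughly 7 significant digits survive the 4-byte representation;
# digits beyond that are generally lost.
x = 1.23456789
print(abs(float32_roundtrip(x) - x))   # tiny, but not zero

# Even a "simple" decimal like 0.1 is not represented exactly.
print(float32_roundtrip(0.1) == 0.1)
```

The same trade-off drives the file sizes: 4 bytes per value regardless
of how many digits the value "needs".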
The following script illustrates:
<script>
nulldata 10000
loop i=1..20
  # lots of digits
  series x$i = normal()
endloop
store random1.csv x* --csv
store random1.bin x* --database
loop i=1..20
  # few digits
  series y$i = i
endloop
store random2.csv y* --csv
store random2.bin y* --database
</script>
On one run here I get, for example:
waverley:~/src/build/cli$ ls -l random1.bin random1.csv
-rw-r--r-- 1 cottrell users 800000 2009-04-22 12:58 random1.bin
-rw-r--r-- 1 cottrell users 2699928 2009-04-22 12:58 random1.csv
waverley:~/src/build/cli$ ls -l random2.bin random2.csv
-rw-r--r-- 1 cottrell users 800000 2009-04-22 12:58 random2.bin
-rw-r--r-- 1 cottrell users 510071 2009-04-22 12:58 random2.csv
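The .bin sizes follow directly from the storage rule: 20 series times
10000 observations times 4 bytes per value. A quick Python check of the
arithmetic (counts taken from the script above):

```python
nseries = 20
nobs = 10000
bytes_per_value = 4  # single-precision float

bin_size = nseries * nobs * bytes_per_value
print(bin_size)  # 800000, matching both random1.bin and random1.bin's twin
```

The CSV sizes, by contrast, depend on how many characters each value
takes to print, which is why random1.csv and random2.csv differ so much.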
random2.bin can be gzipped to 871 bytes (less than 4 times the
size of the generating script), but of course this is a rather
special case.
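The extreme compressibility is easy to reproduce: each y series is one
4-byte pattern repeated 10000 times, so gzip's LZ-style matching
collapses it almost entirely. A minimal Python sketch that mimics the
data payload only (not gretl's actual .bin layout, which also carries
header information):

```python
import gzip
import struct

# 20 constant series, 10000 single-precision values each: 800000 bytes
data = b''.join(struct.pack('<f', float(i)) * 10000 for i in range(1, 21))
print(len(data))  # 800000

compressed = gzip.compress(data)
# Highly repetitive input shrinks by orders of magnitude;
# 20 series of normal() draws would compress far less.
print(len(compressed))
```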
Allin.