On Wed, 22 Apr 2009, Sven Schreiber wrote:

> I was wondering why the file size of a gretl database is so big
> compared to the other storage formats. For example, I have a
> relatively large dataset which fills up 1.4MB when exported to CSV.
> (Of course, gretl's native and zipped file is much smaller, about
> 170KB.) When I save this data to gretl binary database format I get
> a 2MB file. And I thought that text files like CSV could not be
> beaten in terms of explicitness and bloat...
It depends on the nature of the data. In gretl db format, data are
stored as single-precision floats (4 bytes each). This preserves
about 7 significant digits, which is OK for primary economic data.
So the size of a gretl .bin file is perfectly predictable: 4 bytes
per observation per series. With CSV, gretl checks the number of
significant digits and does not write more bytes than necessary, so
a CSV file may be larger or smaller than a db .bin file.
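As an aside, the round-trip behavior of 4-byte floats is easy to check
outside gretl. This minimal Python sketch (not part of gretl, just an
illustration of IEEE single precision) packs a value into 4 bytes and
unpacks it again:

```python
import struct

def float32_roundtrip(x):
    """Pack x into an IEEE 754 single (4 bytes), then unpack it."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# Roughly 7 significant digits survive the 4-byte representation;
# digits beyond that are generally lost.
x = 1.23456789
print(abs(float32_roundtrip(x) - x))   # tiny, but not zero

# Even a "simple" decimal like 0.1 is not represented exactly.
print(float32_roundtrip(0.1) == 0.1)
```

The same trade-off drives the file sizes: 4 bytes per value regardless
of how many digits the value "needs".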
The following script illustrates:
<script>
nulldata 10000
loop i=1..20
  # lots of digits
  series x$i = normal()
endloop
store random1.csv x* --csv
store random1.bin x* --database
loop i=1..20
  # few digits
  series y$i = i
endloop
store random2.csv y* --csv
store random2.bin y* --database
</script>
On one run here I get, for example:
waverley:~/src/build/cli$ ls -l random1.bin random1.csv
-rw-r--r-- 1 cottrell users 800000 2009-04-22 12:58 random1.bin
-rw-r--r-- 1 cottrell users 2699928 2009-04-22 12:58 random1.csv
waverley:~/src/build/cli$ ls -l random2.bin random2.csv
-rw-r--r-- 1 cottrell users 800000 2009-04-22 12:58 random2.bin
-rw-r--r-- 1 cottrell users 510071 2009-04-22 12:58 random2.csv
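The .bin sizes follow directly from the storage rule: 20 series times
10000 observations times 4 bytes per value. A quick Python check of the
arithmetic (counts taken from the script above):

```python
nseries = 20
nobs = 10000
bytes_per_value = 4  # single-precision float

bin_size = nseries * nobs * bytes_per_value
print(bin_size)  # 800000, matching both random1.bin and random1.bin's twin
```

The CSV sizes, by contrast, depend on how many characters each value
takes to print, which is why random1.csv and random2.csv differ so much.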
random2.bin can be gzipped to 871 bytes (less than 4 times the
size of the generating script), but of course this is a rather
special case.
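The extreme compressibility is easy to reproduce: each y series is one
4-byte pattern repeated 10000 times, so gzip's LZ-style matching
collapses it almost entirely. A minimal Python sketch that mimics the
data payload only (not gretl's actual .bin layout, which also carries
header information):

```python
import gzip
import struct

# 20 constant series, 10000 single-precision values each: 800000 bytes
data = b''.join(struct.pack('<f', float(i)) * 10000 for i in range(1, 21))
print(len(data))  # 800000

compressed = gzip.compress(data)
# Highly repetitive input shrinks by orders of magnitude;
# 20 series of normal() draws would compress far less.
print(len(compressed))
```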
Allin.