On Wed, 1 Jan 2014, Sven Schreiber wrote:
> I'd like to raise an issue which is probably quite fundamental in
> terms of data handling. I'm currently working on a large panel
> dataset, meaning that gretl occupies more than 600MB of memory with
> the data loaded. In terms of file sizes, the Stata file version
> occupies 42MB, the gretl workfile only about 3.5MB. This shows that
> gretl stores the data very efficiently (by zipping), but OTOH opening
> and saving take quite some time. In fact, even within gretl it is
> much faster to import the Stata file than to open the native gretl
> file.
I'd like to experiment with this. Can you give a little more detail
on the characteristics of the data file? That is, roughly how many
observations? How many variables? And what sort of ratio of
quantitative variables to small-integer coded variables?
I've tried generating a random dataset with 10000 observations on
850 variables, 50 of them normal and the remaining 800 binary. On
disk this occupies 26MB uncompressed and 7MB with maximal gzip
compression. Reading the gzipped version takes a little longer, but
in neither case is the delay very noticeable. So I'd like to know
which dimension(s) to increase to make the gzipped load time
problematic.
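
For reference, a minimal hansl sketch along those lines; the exact
store options (in particular the --gzipped=level syntax) and the 0.5
threshold for the binaries are illustrative assumptions, not
necessarily what was actually run:

  # create an empty dataset with 10000 observations
  nulldata 10000
  # 50 standard-normal series
  loop i=1..50
    series z$i = normal()
  endloop
  # 800 binary (0/1) series
  loop i=1..800
    series d$i = (uniform() < 0.5)
  endloop
  # save uncompressed, then with (assumed) maximal gzip compression
  store plain.gdt
  store packed.gdt --gzipped=9
  # time the two loads
  set stopwatch
  open plain.gdt
  printf "uncompressed load: %g seconds\n", $stopwatch
  open packed.gdt
  printf "gzipped load: %g seconds\n", $stopwatch
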
Allin