On 02.01.2014 19:14, Allin Cottrell wrote:
> On Wed, 1 Jan 2014, Sven Schreiber wrote:
>
>> I'd like to raise an issue which is probably quite fundamental in terms
>> of data handling. I'm currently working on a large panel dataset,
>> meaning that gretl occupies more than 600MB of memory with the data
>> loaded. In terms of file sizes, the Stata file version occupies 42MB,
>> the gretl workfile only about 3.5MB. This shows that gretl stores the
>> data very efficiently (by zipping), but OTOH opening and saving takes
>> quite some time. Actually it is much faster even in gretl to import the
>> Stata file instead of the native gretl file.
>
> I'd like to experiment with this. Can you give a little more detail
> on the characteristics of the data file? That is (roughly) how many
> observations? And how many variables? And what sort of ratio of
> quantitative variables to small-integer coded variables?
First of all, I just found out that opening and saving the same data is
much faster if the dataset is left as undated, as opposed to being
structured with panel index variables. On saving, gretl's pop-up window
reports 177052KB in the undated case versus 571712KB in the
panel-structured case. I'm not sure if that's expected; the difference
seems quite extreme.
The maximum N and T dimensions are 3157 and 19, but the panel is quite
unbalanced. Gretl reports n=59983, but I think this must include all the
missing observations. There are about 1200 variables in the file. It's
hard to tell how many of them are discrete (any scripting idea here?),
but definitely most of them.
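For what it's worth, something along these lines might do the counting --
an untested hansl sketch, assuming that a bare wildcard works in a list
definition and using isdiscrete() on each series name:

```hansl
# build a list of all series in the dataset (wildcard list syntax)
list ALL = *
scalar n_disc = 0
loop foreach i ALL
    # $i expands to the series name; isdiscrete() returns 1 for
    # series flagged as discrete, 0 otherwise
    n_disc += isdiscrete($i)
endloop
printf "%d of %d series are discrete\n", n_disc, nelem(ALL)
```

If the wildcard doesn't cover everything, looping over ID numbers
1..$nvars-1 would be an alternative.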
hth,
sven