Re: [Gretl-devel] memory with many discrete variables in large (panel) dataset

Wednesday, 1 January 2014

On 01.01.2014 15:14, Riccardo (Jack) Lucchetti wrote:
...
 On Wed, 1 Jan 2014, Riccardo (Jack) Lucchetti wrote:

> On Wed, 1 Jan 2014, Sven Schreiber wrote:
>
>> Nevertheless, given the fact that many of the variables in there are
>> discrete integer-valued which could be saved as a signed single byte,
>> and that (I think) I know that gretl stores every variable as
>> double-precision 8-byte numbers, there are huge potential memory savings
>> and speed improvements. Maybe that's something to think about for the
>> longer term.
>
> This is an issue Allin and I have been discussing since the Toruń
> conference. Eventually we dropped the idea, because to properly
> support series which contain anything but doubles we'd have to rewrite
> pretty much everything internally (however, examining the issue did
> lead us to a substantial rewrite of the internals, which made things
> more efficient and streamlined --- see the commits made around the end
> of July 2011 if you're curious).

 I realise I wasn't quite clear: the reason why I said "we've been
 discussing" is that I, at least, haven't totally given up on the idea.
 However, it was put on hold indefinitely because it wasn't obvious that
 the advantages would outweigh the massive rewrite effort that would be
 necessary.

It is not surprising that the whole issue is problematic, and nobody
wants a disruptive rewrite just for that. But actually, the RAM usage
itself is not such a big deal IMO, given today's typical RAM
capacities.
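
For scale, the per-value difference between 8-byte doubles and signed
single bytes can be put into numbers with a small Python sketch. This is
not gretl code; the series length `n_obs` is a made-up figure, and the
`array` module is only used because it exposes the element size directly.

```python
from array import array

# Hypothetical number of observations in one series (illustrative only).
n_obs = 1_000_000

# The same all-zero series stored two ways.
as_double = array("d", [0.0]) * n_obs  # double precision, 8 bytes each
as_int8 = array("b", [0]) * n_obs      # signed char, 1 byte each

bytes_double = as_double.itemsize * len(as_double)
bytes_int8 = as_int8.itemsize * len(as_int8)

print("doubles:", bytes_double, "bytes")
print("int8:   ", bytes_int8, "bytes")
print("ratio:  ", bytes_double // bytes_int8)
```

So the saving per discrete series is a factor of eight, which only
matters in practice once the dataset approaches the available RAM.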

Instead, what made me think about this at all was the noticeable delay
when opening and especially saving the workfile. So maybe it would be
enough to speed up the gzipping, perhaps at the cost of a larger
compressed file. For example, instead of the current 42:3.5 size ratio
between the Stata and gretl files, a 10MB gretl file would still be much
smaller than the Stata equivalent, and accepting that threefold increase
in file size might make the gzip step sufficiently faster?
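
To illustrate the kind of trade-off I mean, here is a rough Python
sketch that gzips the same data at different compression levels and
reports size and time. It is only an experiment under assumptions: the
data are randomly generated small integers standing in for a dataset
with many discrete variables, not a real gretl workfile, and I don't
know which level gretl actually uses when saving.

```python
import gzip
import random
import time

random.seed(0)

# Hypothetical stand-in data: 5000 rows of 50 small integer values,
# written as comma-separated text.
rows = [
    ",".join(str(random.randint(0, 4)) for _ in range(50))
    for _ in range(5000)
]
raw = "\n".join(rows).encode("ascii")


def gzip_at(level, data):
    """Compress data at the given gzip level; return (bytes, seconds)."""
    start = time.perf_counter()
    packed = gzip.compress(data, compresslevel=level)
    return packed, time.perf_counter() - start


results = {}
for level in (1, 6, 9):
    packed, elapsed = gzip_at(level, raw)
    # Whatever the level, decompression must give back the original.
    assert gzip.decompress(packed) == raw
    results[level] = (len(packed), elapsed)
    print(f"level {level}: {len(packed)} bytes, {elapsed:.4f} s")

print(f"uncompressed: {len(raw)} bytes")
```

The actual numbers will depend on the machine and on how compressible
the real data are, so this only shows how one could measure whether a
lower level buys enough speed for an acceptable increase in size.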

thanks,
sven
