On Wed, 1 Jan 2014, Sven Schreiber wrote:
On 01.01.2014 15:14, Riccardo (Jack) Lucchetti wrote:
> On Wed, 1 Jan 2014, Riccardo (Jack) Lucchetti wrote:
>
>> On Wed, 1 Jan 2014, Sven Schreiber wrote:
>>
>>> Nevertheless, given that many of the variables in there are
>>> discrete and integer-valued, and so could be stored as signed single
>>> bytes, and that (as far as I know) gretl stores every variable as a
>>> double-precision 8-byte number, there are huge potential memory savings
>>> and speed improvements. Maybe that's something to think about for the
>>> longer term.
>>
>> This is an issue Allin and I have been discussing since the Toruń
>> conference. Eventually we dropped the idea, because to properly
>> support series that contain anything other than doubles we'd have to rewrite
>> pretty much everything internally (however, examining the issue did
>> lead us to a substantial rewrite of the internals, which made things
>> more efficient and streamlined --- see the commits made around the end
>> of July 2011 if you're curious).
>
> I realise I wasn't quite clear: the reason I said "we've been
> discussing" is that I, at least, haven't totally given up on the idea.
> However, it was put on hold indefinitely because it wasn't obvious that
> the advantages would outweigh the massive rewrite effort that would be
> necessary.
>
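To put a number on the memory saving mentioned above, here is a minimal Python sketch (not gretl code; the function name and the data are hypothetical). It stores the same small integer codes once as 8-byte doubles and once as signed single bytes, using the standard `array` module, where typecode `"d"` is a C double and `"b"` a signed char.

```python
from array import array


def storage_bytes(values):
    """Return (bytes as 8-byte doubles, bytes as signed single bytes).

    Hypothetical helper for illustration only.
    """
    # Every value widened to a C double, as in a double-only series store.
    as_double = array("d", (float(v) for v in values))
    # Same values as signed chars; only valid for the range -128..127.
    as_int8 = array("b", values)
    return (len(as_double) * as_double.itemsize,
            len(as_int8) * as_int8.itemsize)


if __name__ == "__main__":
    # Hypothetical discrete variable with 1,000,000 observations.
    codes = [0, 1, -1, 3, 2] * 200_000
    d, b = storage_bytes(codes)
    print(d, b, d // b)  # 8000000 1000000 8
```

The factor of 8 is just the ratio of the item sizes; whether it would translate into comparable speed gains is a separate question.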
It is not surprising that the whole issue is problematic, and nobody
wants a disruptive rewrite just for this. But actually, RAM usage
itself is not such a big deal IMO, given typical RAM capacities today.
Instead, what made me think about this at all was the noticeable delay
when opening and especially when saving the workfile. So maybe it would
be enough to speed up the gzipping, perhaps at the cost of a larger
compressed file. For example, instead of the current 42:3.5 size ratio
between the Stata and gretl files, a 10MB gretl file would still be much
smaller than the Stata competition, so perhaps the roughly threefold
increase in file size would let gzip run sufficiently faster?
That's worth experimenting with. At present we apply maximal
compression, but maybe that's not always the best thing to do.
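A minimal sketch of such an experiment, in Python with the standard `gzip` module rather than gretl's own C code. The data are synthetic (small integer codes packed as 8-byte doubles), and the function names are hypothetical. It times gzip at a fast level (1), an intermediate level (6, zlib's usual default) and the maximal level (9), and reports the resulting sizes.

```python
import gzip
import random
import struct
import time


def make_discrete_doubles(n, seed=0):
    """Hypothetical test data: n small integer codes packed as 8-byte doubles."""
    rng = random.Random(seed)
    values = [float(rng.randint(-5, 5)) for _ in range(n)]
    return struct.pack(f"<{n}d", *values)


def compare_levels(data, levels=(1, 6, 9)):
    """Return {level: (compressed_size_in_bytes, seconds)} for each gzip level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = gzip.compress(data, compresslevel=level)
        elapsed = time.perf_counter() - start
        results[level] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    data = make_discrete_doubles(200_000)  # 1.6 MB of raw doubles
    for level, (size, secs) in compare_levels(data).items():
        print(f"level {level}: {size / 1e6:.3f} MB in {secs:.3f} s")
```

Timings depend on the machine and on the data, so a real test would use an actual workfile; the point is only that the level is a single parameter that can be traded against output size.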
Allin