On Wed, 8 Jan 2014, Sven Schreiber wrote:
> On 08.01.2014 04:31, Allin Cottrell wrote:
> > On Tue, 7 Jan 2014, Sven Schreiber wrote:
> >
> >
> > 1) Writing data as text
> >
> > ...
> >
> > What's new here is that I've worked out a substantially faster way
> > of determining the appropriate format specification for each
> > series. In my tests this cuts several seconds off big data writes.
> > This improvement is independent of the compression level and the
> > skip-padding status.
> Yes, the results are impressive! I'm just wondering whether it would
> also help in this context to use the information on which variables
> are officially "discrete". Most of them (though not all) will be
> integer-valued, for example.
Good idea. Even if they're not integer-valued, they certainly should
not require 17 digits. The artifacts that I mentioned disappear if
you use the printf format %.15g, and this could safely be applied to
series marked as discrete without any need for elaborate testing.
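To make that concrete, here is a tiny standalone illustration (just a
sketch, not the gretl code): a value that carries binary representation
noise, such as 0.1 + 0.2, prints "clean" at 15 significant digits but
shows the artifact at 17, while an integer-valued observation from a
discrete series is unaffected either way.

#include <stdio.h>

int main (void)
{
    double x = 0.1 + 0.2;  /* carries a representation artifact */
    double d = 7.0;        /* typical value from a discrete series */

    printf("x: %%.17g -> %.17g, %%.15g -> %.15g\n", x, x);
    printf("d: %%.17g -> %.17g, %%.15g -> %.15g\n", d, d);
    return 0;
}

On a typical IEEE 754 system this prints 0.30000000000000004 versus
0.3 for x, and plain 7 in both cases for d.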
> Also, in principle the data are now changed with respect to the old
> format I guess; hopefully just within the precision error margin of
> doubles, but this should probably be tested -- or did you already?
I did. We need 17 significant digits if we're to reproduce exactly
results obtained using (e.g.) logs and random numbers (by "reproduce"
I mean: run a regression, save the data, reopen the data, run the
regression again). But we still use 17 digits if that's required --
that is, if printing to 15 digits doesn't leave trailing zeros.
[Personally, I think logs and random numbers should be generated by
script, not saved in a data file, but anyway.]
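For the record, the kind of check involved can be sketched in a few
lines of C. This is only an illustration, not the actual gretl source:
the digits_needed() helper is made up for the example, and it decides
via a simple print-and-reparse round trip rather than the test used in
gretl itself.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Return the %g precision (15 or 17) needed to write x so that
   reading it back reproduces the same double exactly. */
static int digits_needed (double x)
{
    char buf[32];

    sprintf(buf, "%.15g", x);
    if (strtod(buf, NULL) == x) {
        return 15;  /* 15 significant digits round-trip exactly */
    }
    return 17;      /* fall back to full precision */
}

int main (void)
{
    double vals[] = { 2.5, 100.0, log(3.0), 1.0/3 };
    int i;

    for (i = 0; i < 4; i++) {
        printf("%.17g needs %%.%dg\n", vals[i], digits_needed(vals[i]));
    }
    return 0;
}

"Ordinary" data values such as 2.5 come out needing only 15 digits,
while values produced by log() or the RNG generally require the full
17 to survive the save/reopen round trip unchanged.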
Allin