On Wed, 17 Apr 2019, Logan Kelly wrote:
> I have students who are working with a very big dataset -- around
> 9 million observations. I had one student try to load a 4 GB CSV
> file into gretl, and gretl loaded it! But with some errors.
What sort of errors -- can you elaborate?
> So my questions are:
> 1. What is the largest dataset one should expect gretl to handle?
Well, that's going to depend on how much RAM you have.
> 2. Are there any suggestions for handling large datasets in gretl?
For one thing, with many millions of observations any tiny, tiny
effect will be "statistically significant"; it's probably a good
idea to down-sample (perhaps at random) to an n in the hundreds of
thousands.
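For instance, here's a minimal hansl sketch of random down-sampling
(the 5 percent fraction and the seed are just illustrative):

  # keep a random subset of roughly 5 percent of the observations
  set seed 20190417
  series u = uniform()
  smpl u < 0.05 --restrict
  # ... run your estimation on the reduced sample ...
  smpl full   # restore the full dataset when done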
> 3. Is there a better file type than CSV to import large datasets
> into gretl?
Not really; our CSV importer is about the most effective of our
various importers.
A general comment: In gretl, every data value is stored as a
"double" (a double-precision floating-point value, which occupies 64
bits or 8 bytes). But in some huge datasets many of the variables
may be representable in a much smaller data type, such as a single
byte (8 bits). If you're loading a 4 GB CSV file with a lot of 0s
and 1s as data values, those values will be expanded by a factor of
8 in gretl's in-memory version -- which may make the difference
between feasible and infeasible, for given RAM.
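To make the arithmetic concrete (the series count here is just for
illustration): 9 million observations of one series take
9,000,000 x 8 bytes, roughly 72 MB, so a dataset of, say, 60 such
series already needs over 4 GB of RAM in gretl's native
representation -- while the same values stored as single bytes
would fit in about 540 MB.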
This is something we may want to think about in future. It will not
be easy to allow smaller data types for series, but maybe that's
something we should aim for eventually.
Allin