On Fri, 26 Oct 2007, Sven Schreiber wrote:
> Riccardo (Jack) Lucchetti wrote:
>>
>> A possible alternative may be the following: first, read all
>> the data as if they were all strings. Then, with the data
>> already in RAM, convert to numeric whenever possible. This
>> way, you read the datafile only once, and the door stays open
>> if we want, for instance, to flag some of the variables as
>> dummies or discrete variables straight away.
>
> Jack's idea sounds good. If I understand correctly, it's an
> approach that converts as much as possible to usable variables
> and data, and informs the user about the rest (rather than
> throwing errors and stopping). That would be good.
I like Jack's idea too, with a couple of reservations.
First, I'm not too keen on reading all the data into RAM as
strings. To ensure no data loss, these strings would have to be
fairly long -- say 32 characters. Now with something like PUMS
you can have tens or hundreds of thousands of observations on
hundreds of variables. This makes for a big memory chunk when
stored as doubles, and perhaps 4 times as big when stored as
strings. So I tend to favour two passes.
Second, I think that attempting to parse all non-numeric stuff as
coded data should probably be governed by an explicit option.
It'll work fine on a well-formed PUMS file, but it could make a
nasty mess of a very large data file that contains a few
extraneous non-numeric characters: 100% CPU for a long time.
Think of a file with 200,000 observations and a stray 'x' on the
last row.
> BTW, on an (only loosely) related issue, it would be useful if
> gretl could handle files like some I recently downloaded from
> the US BLS site; they report quarterly data with an additional
> row for year averages, like so:
>
> 1950Q01 3.5
> 1950Q02 4.2
> 1950Q03 9.4
> 1950Q04 5.3
> 1950Q05 <you do the calc ;-)>
Yes, I've seen data of that sort too. I'll think about that
issue.
Allin.