On Thu, 25 Oct 2007, Allin Cottrell wrote:
When gretl encounters non-numeric data for a particular variable
in a CSV import, it treats the values of that variable as strings,
constructs a numeric coding, and creates a "string table" that
presents the coding to the user. BUT this is done only if
non-numeric data are encountered in the first data row for the
variable in question. That is, if we read (apparently) numeric
data on rows 1 to k-1, then encounter non-numeric data on row k,
we flag an error and stop reading.
The trouble is that some of the PUMS variables are codings, some
but not all values of which contain non-numeric characters. For
example, NAICSP, the "NAICS Industry Code", which has values
(among others) of 1133 and 113M.
Here's a solution, perhaps not permanent if we can think of
something better: I've added a new parameter to the "set" command,
namely "codevars". You can do, for example,
[...]
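Just to make the failure mode concrete: the check that trips over a
value like 113M is, in essence, whether the whole field parses as a
number. A minimal sketch (plain C, not the actual importer code)
could look like this:

#include <stdio.h>
#include <stdlib.h>

/* 1 if the whole field parses as a number, 0 otherwise:
   "1133" passes, "113M" does not */
int field_is_numeric (const char *s)
{
    char *endp;

    strtod(s, &endp);
    return *s != '\0' && *endp == '\0';
}

int main (void)
{
    const char *vals[] = { "1133", "113M" };
    int i;

    for (i = 0; i < 2; i++) {
        printf("%s -> %s\n", vals[i],
               field_is_numeric(vals[i]) ? "numeric" : "string");
    }

    return 0;
}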
The problem I see with the codevars approach is that one has to know
in advance which variables must be treated specially. With large
datasets, you may not; the improved debugging info does help, but IMO
only to an extent. A possible alternative would be the following:
first, read all the data as if they were strings. Then, with the data
already in RAM, convert to numeric whenever possible. This way, you
read the datafile only once, and the way stays open, if we want, to
flag some of the variables as dummies or discrete variables straight
away.
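Something along these lines, just as a rough illustration (the struct
and helper names below are made up for the example, not taken from
the gretl sources): hold each column as strings first, then either
convert the whole column to numbers or, failing that, code the
distinct strings as 1, 2, 3, ...

#include <stdlib.h>
#include <string.h>

typedef struct {
    char **sval;    /* raw string values, one per row */
    double *xval;   /* numeric values after conversion/coding */
    char **table;   /* distinct strings, if a coding was needed */
    int n;          /* number of rows */
    int ntab;       /* number of distinct coded strings */
} column;

/* 1 if the whole field parses as a number, writing the value to *px */
int parse_numeric (const char *s, double *px)
{
    char *endp;

    *px = strtod(s, &endp);
    return *s != '\0' && *endp == '\0';
}

/* return the 1-based code for s, appending it to the table if new */
double string_code (column *c, const char *s)
{
    int i;

    for (i = 0; i < c->ntab; i++) {
        if (!strcmp(c->table[i], s)) {
            return i + 1;
        }
    }

    /* not seen before: extend the table (error checks omitted) */
    c->table = realloc(c->table, (c->ntab + 1) * sizeof *c->table);
    c->table[c->ntab] = strdup(s);
    c->ntab += 1;

    return c->ntab;
}

/* second pass, in memory: make the column numeric if every value
   parses, otherwise build a string table and code the values;
   assumes sval and xval are allocated, table is NULL and ntab is 0 */
void convert_column (column *c)
{
    double x;
    int t, all_numeric = 1;

    /* first check whether every value parses as a number */
    for (t = 0; t < c->n && all_numeric; t++) {
        all_numeric = parse_numeric(c->sval[t], &x);
    }

    /* then either convert or code each value */
    for (t = 0; t < c->n; t++) {
        if (all_numeric) {
            parse_numeric(c->sval[t], &c->xval[t]);
        } else {
            c->xval[t] = string_code(c, c->sval[t]);
        }
    }
}

(Error checking and memory management are left out to keep the sketch
short.)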
What do you think?
Riccardo (Jack) Lucchetti
Dipartimento di Economia
Università Politecnica delle Marche
r.lucchetti(a)univpm.it
http://www.econ.univpm.it/lucchetti