Gretl and PUMS Data
by Allin Cottrell
I recently responded to this question from a gretl user:
> I have been trying to figure out how to use Gretl for Public Use
> Micro Data Sample (PUMS). I am wondering if you can point me in
> the right direction. Your response is greatly appreciated.
My response is below (you may have seen it on gretl-users),
followed by a design question.
<initial response>
I haven't made much use of PUMS data myself, but here's what I
found on quick experimentation. I went to
http://factfinder.census.gov/home/en/acs_pums_2006.html
and downloaded the 2006 Population Records for North Carolina in
CSV format. Gretl was close to being able to read this straight
off, but there was one problem.
When gretl encounters non-numeric data for a particular variable
in a CSV import it treats the values of that variable as strings,
constructs a numeric coding, and creates a "string table" that
presents the coding to the user. BUT this is done only if
non-numeric data are encountered in the first data row for the
variable in question. That is, if we read (apparently) numeric
data on rows 1 to k-1, then encounter non-numeric data on row k,
we flag an error and stop reading.
The trouble is that some of the PUMS variables are codings, some
but not all values of which contain non-numeric characters. For
example, NAICSP, the "NAICS Industry Code", takes values
(among others) of 1133 and 113M.
Here's a solution, perhaps not permanent if we can think of
something better: I've added a new parameter to the "set" command,
namely "codevars". You can do, for example,
set codevars NAICSP SOCP
prior to importing a CSV file. This tells gretl that the
variables NAICSP and SOCP should be interpreted as string-coded,
even if the first values look to be numeric.
(In general you say: "set codevars <varnames>", where <varnames>
is a space-separated list of names. You can say "set codevars
null" to clean out the list.)
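To make the first-row rule and the "codevars" override concrete, here is a minimal Python sketch of a single-pass CSV reader that behaves as described above. It is not gretl's code. The function read_csv_first_row_rule, its codevars argument and its error text are hypothetical, and blank (missing) cells are not given any special treatment.

```python
import csv
import io


def is_numeric(value):
    """Return True if the cell parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def read_csv_first_row_rule(text, codevars=()):
    """Hypothetical single-pass reader: a column is string-coded only if
    its first data value is non-numeric or its name is in codevars."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    forced = set(codevars)
    # The decision is made once, from the first data row (or the override).
    is_string = [name in forced or not is_numeric(value)
                 for name, value in zip(header, data[0])]
    columns = {name: [] for name in header}
    for k, row in enumerate(data, start=1):
        for j, (name, value) in enumerate(zip(header, row)):
            if is_string[j]:
                columns[name].append(value)
            elif is_numeric(value):
                columns[name].append(float(value))
            else:
                # Numeric so far, then a non-numeric value on row k: give up.
                raise ValueError(f"Variable {j + 1} ({name}), observation {k}, "
                                 f"'{value}': non-numeric data")
    return columns
```

With sample = "SERIALNO,NAICSP\n1,1133\n2,113M\n", calling read_csv_first_row_rule(sample) raises a ValueError at observation 2. By contrast, read_csv_first_row_rule(sample, codevars=["NAICSP"]) keeps NAICSP as the strings "1133" and "113M", which is roughly what "set codevars NAICSP" achieves.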
For the North Carolina PUMS data, this now works to open the file
in gretl:
set codevars NAICSP SOCP
open ss06pnc.csv
This feature is in CVS gretl, and also in the current Windows
snapshot at
http://ricardo.ecn.wfu.edu/pub/gretl/gretl_install.exe
You may have to engage in some trial and error. I've beefed up
the error reporting a little. So, in relation to the example
above, if you do
set codevars NAICSP
open ss06pnc.csv
you then see:
Variable 106 (SOCP), observation 12, '434XXX':
Extraneous character 'X' in data
which in effect tells you that you need to add SOCP to the
"codevars" list, if it seems to you that 434XXX is a legitimate
value for that variable.
</initial response>
Now here's my question. I wonder if it might be better (or
complementary, perhaps) to add an option flag to open/import, that
forces gretl to treat all data columns containing non-numeric
values as legitimate codings. (There could be a corresponding
checkbox in the GUI.)
Internally, this would require two passes through the file: one to
assess which variables need special treatment, and a second to
actually read (and code) the data.
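Here is a rough Python sketch of the proposed two-pass import. It assumes the whole file fits in memory, so a "pass" here means a scan over the parsed rows rather than a second read of the file. The names read_csv_two_pass and is_numeric are hypothetical. So is the choice to number codes 1, 2, ... in order of first appearance. The sketch illustrates the idea and does not describe how gretl builds its string tables.

```python
import csv
import io


def is_numeric(value):
    """Return True if the cell parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def read_csv_two_pass(text):
    """Hypothetical two-pass reader. Pass 1 flags every column containing
    any non-numeric value; pass 2 reads the data, giving flagged columns
    an integer coding plus a string table (string -> code)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    # Pass 1: a column needs coding if ANY of its values is non-numeric.
    coded = [any(not is_numeric(row[j]) for row in data)
             for j in range(len(header))]
    columns, tables = {}, {}
    # Pass 2: read values; new strings get the next code 1, 2, ...
    for j, name in enumerate(header):
        if coded[j]:
            table = {}
            columns[name] = [table.setdefault(row[j], len(table) + 1)
                             for row in data]
            tables[name] = table
        else:
            columns[name] = [float(row[j]) for row in data]
    return columns, tables
```

Unlike the first-row rule, this accepts NAICSP even when 113M first appears on a later row, because the first pass has already seen it. The trade-off is the one noted above: the reader can no longer tell a legitimate coding apart from a corrupted numeric column.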
The general issue here is that non-numeric values are sometimes
legit, but sometimes reflect a screwed-up data file. It might be
useful for the user to be able to say, "I know that anything
non-numeric in this file is in fact legit".
Allin.
Bug in sort data menu entry
by andreas.rosenblad@ltv.se
A bug in the sort data menu entry:
Using version 1.6.6pre2 build date 10/21/2007 on Windows XP and the data
set Ramanathan data2-1: Selecting the menu entry Data > Sort data... to
sort by vsat ascending, nothing changes. The data set is not sorted.
However, when I type "dataset sortby vsat" in the gretl console, the dataset
is sorted correctly.
Best regards
Andreas
Re: Re: [Gretl-devel] Improvements in the user-defined function packages
by andreas.rosenblad@ltv.se
cottrell(a)wfu.edu @ INTERNET wrote 2007-10-18 17:15:52:
> On Thu, 18 Oct 2007, andreas.rosenblad(a)ltv.se wrote:
>
> > Currently, when writing a user-defined function, the parameters
> > can be of six types: bool (scalar variable acting as a Boolean
> > switch), int (scalar variable acting as an integer), scalar
> > (scalar variable), series (data series), list (named list of
> > series) and matrix (named matrix or vector).
> >
> > To make it easier to write a user-defined function and make it
> > more flexible I would appreciate if three other parameter types
> > could be included:
> >
> > * "discrete" (a data series with only discrete values)
> > * "proportion" or "quantile" (a scalar whose value is between 0 and 1)
> > * "select" (providing a list of alternatives to select from).
> >
> > As it is now, the first two parameter types, "discrete" and
> > "proportion", could be input to the function as "series" and
> > "scalar", and the user has to let the function check if they are
> > discrete or between 0 and 1. It would be nice if it instead was
> > automatically checked already at the input by the parameter type
> > chosen.
>
> "proportion" would be quite easy to do;
"proportion" would be very useful, e.g. when the function requires the user
to specify a significance level alpha, which of course has to be between 0
and 1.
> "discrete" less so. The
> trouble is that I don't think "discrete" denotes a well-defined data
> type. It would, I suppose, if we restricted it to integer values,
> but some people think that's too restrictive. As things stand, I
> think it's up to the function writer to decide what counts as
> discrete in any given context.
You are right. Well, could you then instead create a parameter type
"intseries" for a series containing only integers, just as there
currently is a parameter type "int" for a scalar variable acting as an
integer, please?
> > For the third parameter type, "select", I imagine writing e.g.
> > function(select rejectionregion["two sided","one sided lower","one
> > sided upper"]) would in the GUI produce a drop-down list for this
> > parameter
>
> That's definitely a nice idea. We could have a go at that after the
> 1.6.6 release.
It would be really great. I am currently writing a function where I want
the user to specify the scale type of the variable, i.e. nominal, ordinal,
interval or ratio scale. This is hard to do at present; it would be easier
with e.g. function(select scaletype["nominal scale","ordinal scale",
"interval scale","ratio scale"]).
Best regards
Andreas
Re: Re: [Gretl-devel] mxtab command is not documented in the manual
by andreas.rosenblad@ltv.se
cottrell(a)wfu.edu @ INTERNET wrote 2007-10-18 18:10:34:
> On Wed, 17 Oct 2007, andreas.rosenblad(a)ltv.se wrote:
>
> > The new mxtab command is not documented in the Gretl Command Reference, the
> > Gretl User's Guide or the help files. It would be useful to have it
> > described there.
>
> That's now added in CVS. I've also modified this function so that
> it will accept two (n x 1) matrices as arguments as well as two series.
Thank you very much.
Andreas
mxtab command is not documented in the manual
by andreas.rosenblad@ltv.se
The new mxtab command is not documented in the Gretl Command Reference, the
Gretl User's Guide or the help files. It would be useful to have it
described there.
Best regards
Andreas
Improvements in the user-defined function packages
by andreas.rosenblad@ltv.se
Currently, when writing a user-defined function, the parameters can be of
six types: bool (scalar variable acting as a Boolean switch), int (scalar
variable acting as an integer), scalar (scalar variable), series (data
series), list (named list of series) and matrix (named matrix or vector).
To make it easier to write a user-defined function and make it more
flexible I would appreciate if three other parameter types could be
included:
* "discrete" (a data series with only discrete values)
* "proportion" or "quantile" (a scalar whose value is between 0 and 1)
* "select" (providing a list of alternatives to select from).
As it is now, the first two parameter types, "discrete" and "proportion",
could be input to the function as "series" and "scalar", and the user has
to let the function check if they are discrete or between 0 and 1. It would
be nice if it instead was automatically checked already at the input by the
parameter type chosen.
For the third parameter type, "select", I imagine writing e.g.
function(select rejectionregion["two sided","one sided lower","one sided
upper"]) would in the GUI produce a drop-down list for this parameter where
one can select one of the alternatives "two sided", "one sided lower" or
"one sided upper". This is one of the features that I currently miss in
gretl when writing my own functions and which I would find very useful.
Best regards and thanks for great software
Andreas
Re: Re: [Gretl-devel] Cumbersome to enter new data
by andreas.rosenblad@ltv.se
I have now found out what the problem is. When I enter a value with the
typewriter keys on the keyboard I have no problems, but when I use the numeric
keypad (with Num Lock on) I get the problems I described earlier.
I am attaching a couple of screenshots showing the problem.
In screenshot 1, I hit "5", which results in screenshot 2, which in turn
changes to screenshot 3 after a few seconds.
Andreas
(See attached file: gretl screenshot 1.jpg)
(See attached file: gretl screenshot 2.jpg)
(See attached file: gretl screenshot 3.jpg)
cottrell(a)wfu.edu @ INTERNET wrote 2007-09-19 07:27:11:
> On Tue, 18 Sep 2007, I wrote:
>
> > On Tue, 18 Sep 2007, andreas.rosenblad(a)ltv.se wrote:
> > > It is quite cumbersome to create a new data set in gretl. In the
> > > edit data window, when I insert a value in a column and press
> > > Enter, Down arrow or Tab to come to the next row, I cannot enter
> > > a new value directly, but have to press Enter first, before I
> > > can enter a new value.
> >
> > Hmm, this is on Windows? On Linux I'm able to enter values
> > directly, without having to hit Enter an extra time. I'll look into
> > that.
>
> I've looked into it: with the current gretl snapshot, running on XP,
> I can enter a column of numbers by typing
>
> value
> Enter (or down-arrow)
> value
> Enter
> value etc.
>
> That is, the cell that Enter moves into is ready to accept numerical
> input, as it should be.
>
> The cell-cursor admittedly does not immediately look like the
> editing cursor (this issue may perhaps be fixable), but it turns
> into the editing cursor as soon as I type.
>
> Allin.
Bug using print command in a loop?
by andreas.rosenblad@ltv.se
When running the following script in gretl (latest Windows snapshot on XP):
matrix A = {1,2,3;1,2,3}
print A
loop for i=1..3
print A
endloop
I got the following result:
gretl version 1.6.6.pre2
Current session: 2007/10/16 12:14
? matrix A = {1,2,3;1,2,3}
Replaced matrix A
? print A
A (2 x 3)
1 2 3
1 2 3
? loop for i=1..3
> print A
> endloop
loop: i = 1
Command has insufficient arguments
>> print A
I.e., I can print a matrix from outside a loop, but not from inside a loop.
Is this a bug or am I doing something wrong?
Best regards
Andreas
Re: Re: Re: [Gretl-devel] Bugs, missing features and a suggestion
by andreas.rosenblad@ltv.se
>svetosch at gmx.net @ INTERNET wrote 2007-09-25 14:02:40:
>
>> andreas.rosenblad at ltv.se schrieb:
>>
>> >
>> > If it is a missing feature, please do implement it, so that the
>> > combined expression X = 1 & Y = 2 can be used for matrices
>>
>> I'm not against that feature, but note that you can do for example
>> ((x=1).*(y=2) = 1), or ((x=1)+(y=2) = 2).
>
>
>Yes, but it is more cumbersome and less intuitive. Especially since X = 1 &
>Y = 2 is allowed for scalars and data series, it should also be allowed for
>matrices.
Is there any chance that Z = (X = 1 & Y = 2) will be allowed for matrices
too, considering that it is allowed for scalars and data series?
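Sven's workaround relies on a comparison like (x=1) producing a 0/1 matrix. An element-wise product of two such matrices, or their sum compared with 2, then acts as a logical AND. The Python sketch below checks that both forms agree with a direct element-wise AND on a small example. It uses plain nested lists and two hypothetical helpers, eq and elementwise, to mimic the gretl expressions rather than calling gretl.

```python
def eq(m, value):
    """Element-wise comparison, mimicking (M = value): 1 where equal, else 0."""
    return [[1 if v == value else 0 for v in row] for row in m]


def elementwise(op, a, b):
    """Apply a binary operation element by element to two same-shape matrices."""
    return [[op(u, v) for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


X = [[1, 2], [1, 1]]
Y = [[2, 2], [3, 2]]

a = eq(X, 1)  # 0/1 matrix for X = 1
b = eq(Y, 2)  # 0/1 matrix for Y = 2

# ((x=1).*(y=2) = 1): the product is 1 only where both flags are 1.
via_product = eq(elementwise(lambda u, v: u * v, a, b), 1)
# ((x=1)+(y=2) = 2): the sum is 2 only where both flags are 1.
via_sum = eq(elementwise(lambda u, v: u + v, a, b), 2)
# What the requested X = 1 & Y = 2 would mean element by element.
logical_and = elementwise(lambda u, v: int(bool(u) and bool(v)), a, b)
```

All three give [[1, 0], [0, 1]] here. The workarounds are correct but, as Andreas says, less direct than writing the AND itself.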
Best regards
Andreas
Update documentation, please
by andreas.rosenblad@ltv.se
I noticed that the new "ceil()" function is not documented in the PDF
User's Guide or Command Reference, nor in Help > Command reference >
Plain text.
Andreas