On Fri, 27 Feb 2009, Cameron Kaplan wrote:
Is there any good way of importing data into gretl in ascii
format when the file has no variable names at the top and/or the
data is in fixed-width columns (e.g. data from ICPSR.org)? In
Stata, SPSS and SAS, this is done by writing dictionary/setup
programs. Can the same thing be done in gretl? If not, is
there a workaround?
My personal feeling is that pre-processing a file of the sort
you're talking about is pretty easy using free, non-interactive
text-editing tools, such as sed, awk, and cut.
Up till now gretl has not offered any functionality of this sort
-- it's assumed that any plain text data file will have its
columns delimited in some coherent fashion. But it's not
difficult to provide at least some such functionality, and in
current gretl CVS and the MS Windows snapshot,
we do so. There's now a (command-line only) option:
    open <filename> --cols=<colspec>
<filename> should be the name of a plain text file, and <colspec>
should be a column-reading specification, taking the form of a set
of comma-separated integers. For example:
    open mydata.txt --cols=1,6,20,3
The details are subject to change, but at present the "cols"
numbers are interpreted as a set of pairs. The first element of
each pair denotes a starting column, measured in bytes from the
beginning of the line with 1 indicating the first byte; and the
second element indicates how many bytes should be read for the
given field. The example above is therefore parsed as follows:
For variable 1, read 6 bytes starting at column 1; and for
variable 2, read 3 bytes starting at column 20.
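To make the pair interpretation concrete, here is a minimal Python sketch of the rule just described. It is only an illustration, not gretl's actual code: the function names are made up, rejecting an odd number of integers is my assumption, and Python string slicing counts characters, which coincides with bytes only for plain ASCII input.

```python
def parse_colspec(spec):
    """Split a comma-separated colspec into (start, width) pairs."""
    nums = [int(tok) for tok in spec.split(",")]
    if len(nums) % 2 != 0:
        # Assumption: an unpaired trailing integer is treated as an error.
        raise ValueError("colspec must contain an even number of integers")
    return list(zip(nums[0::2], nums[1::2]))


def extract_fields(line, pairs):
    """Return the raw text of each field; start columns are 1-based."""
    return [line[start - 1:start - 1 + width] for start, width in pairs]


pairs = parse_colspec("1,6,20,3")
print(pairs)  # [(1, 6), (20, 3)]
```

With "1,6,20,3" the first field covers columns 1 to 6 and the second covers columns 20 to 22; everything in between is skipped.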
This option assumes a well-formatted file. Lines that are blank,
or that begin with '#', are ignored, but otherwise the
column-reading template is applied, and if anything other than a
valid numerical value is found an error is flagged.
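The reading rules above can be sketched in Python as follows. Again this is a hypothetical illustration rather than gretl's implementation: read_fixed_width is an invented name, and I use Python's float() as a stand-in for "valid numerical value", which may accept or reject slightly different inputs than gretl does.

```python
def read_fixed_width(lines, pairs):
    """Apply (start, width) pairs to each data line and return rows of floats."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Blank lines and lines beginning with '#' are ignored.
        if line.strip() == "" or line.startswith("#"):
            continue
        row = []
        for start, width in pairs:
            field = line[start - 1:start - 1 + width]
            try:
                row.append(float(field))
            except ValueError:
                # Anything that is not a number is flagged as an error.
                raise ValueError(
                    f"line {lineno}: invalid numeric field {field!r}"
                ) from None
        rows.append(row)
    return rows
```

Note that a line shorter than the template yields an empty or truncated field, which in this sketch also ends up as an error if it cannot be parsed as a number.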
The column specs must be given left-to-right, and must be
non-empty: it's an error if the starting column for variable j is
non-positive, or (for j > 1) is not greater than that for variable
j-1; and it's an error if the column width is non-positive.
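For anyone who wants to check a column spec before handing it to gretl, the three conditions above translate into a short Python sketch like this one. The function name is hypothetical, and the check follows the rule exactly as stated: it compares starting columns only, so it says nothing about whether overlapping fields are accepted.

```python
def validate_pairs(pairs):
    """Raise ValueError if the (start, width) pairs break the stated rules."""
    prev_start = None
    for j, (start, width) in enumerate(pairs, start=1):
        if start <= 0:
            raise ValueError(f"variable {j}: starting column must be positive")
        if prev_start is not None and start <= prev_start:
            raise ValueError(
                f"variable {j}: starting column must exceed that of variable {j - 1}"
            )
        if width <= 0:
            raise ValueError(f"variable {j}: column width must be positive")
        prev_start = start


validate_pairs([(1, 6), (20, 3)])  # passes silently
```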
If the data are read successfully, the variables will be named
"v1", "v2", etc. It's up to the user to provide meaningful names
and/or descriptions using the commands "rename" and/or "setinfo".