On Thu, 24 Sep 2009, Sven Schreiber wrote:
> Alan G Isaac wrote:
> >
> > There is also a different kind of argument. Sure, CSV files
> > are plain text and can therefore be easily manipulated by a
> > variety of tools. But they are also a core spreadsheet
> > format. Thus the --rowoffset and --coloffset options of
> > `open` should work with CSV too. Or so I claim, despite
> > offering no code to do this.
> Hi Alan,
> in principle I agree that this one's a valid point. However:
> * --coloffset doesn't convince me in the case of csv files (as opposed
> to --rowoffset in csv files, or --coloffset in spreadsheet files)
I'm willing to go with --coloffset in the case of CSV; the columns
are supposed to be well defined. But it may take a little while
to implement.
> * my teaching experience is that it's even more manageable for
> students if you tell them: ok, first open the file with a text
> editor (jedit is cross-platform for example, but there are
> hundreds of others of course), insert # signs in the first couple
> of rows, then you're ready to go. Many will like this (arguably
> inefficient) approach even more than specifying --rowoffset in a
> script file.
Now here I'm very much with Sven. And in fact I'm not so sure
that the suggested approach (checking and revising the CSV file in
a text editor before loading it into gretl) is inefficient; it may
well be optimal. This is a serious point.
I base this judgment on extensive experience with CSV files
generated by various statistical agencies around the world. My
experience tells me (a) that such files are often "broken" and
moreover (b) that they are not infrequently broken in ways that
you wouldn't expect. More than once I've thought that I had
figured out how to fix Agency X's broken CSV using a fairly simple
algorithm, only to find that _some_ of the files produced by X
were broken in a different way!
Therefore, I would not trust any statistical analysis based on
such data unless the researcher had actually opened the raw files
in a text editor and scanned them for problems.
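As an aid to that scanning (not a substitute for it), one could tally
how many fields each row has, so that the odd rows stand out and you
know where to look first. A minimal sketch in Python, assuming a
comma-delimited file; the function name `field_count_report` and the
sample text are made up for illustration:

```python
import csv
import io
from collections import Counter

def field_count_report(text, delimiter=","):
    """Return (counts, odd_rows): a tally of fields per row, and the
    1-based row numbers whose field count differs from the most common one.
    Row numbers match line numbers unless a quoted field spans lines."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    counts = Counter(len(row) for row in rows)
    if not counts:
        return counts, []
    usual = counts.most_common(1)[0][0]
    odd_rows = [i for i, row in enumerate(rows, start=1) if len(row) != usual]
    return counts, odd_rows

# Hypothetical sample: a title line on top, and a stray trailing comma below.
sample = "Source: Agency X\nobs,gdp,cpi\n2008,1.5,2.1\n2009,0.3,1.8,\n"
counts, odd_rows = field_count_report(sample)
print(odd_rows)  # rows worth inspecting by eye
```

This only tells you where the file is irregular; what to do about each
such row is still a decision to make with the file open in front of you.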
DO NOT rely on simple algorithmic fixes, and don't suggest that
your students do so. Open the data and take a look. Comment out
lines you don't want. Use Find/Replace to get rid of idiocy such
as the ghost column
,""
appended to every line (well, maybe every line, but can you be
sure?) of IFS files.
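On the "can you be sure?" question: before reaching for Find/Replace
it is cheap to check which lines actually carry the trailing ,"" and
which do not. A rough sketch in Python; the function name and the
sample lines are invented for illustration, and the code only reports,
it changes nothing:

```python
def ghost_column_report(lines, ghost=',""'):
    """Split 1-based line numbers into those that end with the ghost
    field and those that do not. Blank lines are ignored."""
    with_ghost, without_ghost = [], []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        if not stripped:
            continue
        if stripped.endswith(ghost):
            with_ghost.append(number)
        else:
            without_ghost.append(number)
    return with_ghost, without_ghost

# Hypothetical sample: the third line lacks the ghost column.
sample = [
    '"Country","2007","2008",""\n',
    '"Austria","1.0","2.0",""\n',
    '"Belgium","3.0","4.0"\n',
]
with_ghost, without_ghost = ghost_column_report(sample)
print(with_ghost)
print(without_ghost)
```

If the second list is non-empty, a blanket Find/Replace is not
touching every line, and the file deserves a closer look.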
Allin.