On 20.08.2013 02:42, Allin Cottrell wrote:
On Tue, 20 Aug 2013, Riccardo (Jack) Lucchetti wrote:
> On Mon, 19 Aug 2013, Allin Cottrell wrote:
>
>> On Mon, 19 Aug 2013, Allin Cottrell wrote:
>>
>> So here's a suggestion: when we determine that a certain column of a
>> CSV file represents a string-valued variable, by default we treat
>> all non-blank values as string literals. But we provide a "set"
>> variable ("missing_string" or some such) so that the user can
>> specify a missing-code for string-valued input. E.g. when reading
>> from Alfred's tab-separated files one could say
>>
>> set missing_string "."
>>
>> (This would not just be for "join", but for any delimited-text
>> read.)
>
> We already have "set csv_na"; we could use that.

That occurred to me too, but I think it would be confusing. The
(only) role of "set csv_na" at present is to set the string used to
represent NA on _output_ of CSV from gretl. This string may be
specific to the anticipated use of gretl's CSV output by a
third-party program (e.g. Stata or Ox, which have their own notions
of what represents NA).

But what we're talking about now is the string to interpret as NA on
_input_ of string-valued variables from CSV. This string may be
specific to the third-party producer of the CSV file (e.g. Alfred).
The string that's appropriate for one of these uses is not
necessarily appropriate for the other use; or to put it differently,
"set csv_na" (used as per its current role) could have unwanted
side-effects if it were also to govern the reading of string-valued
variables from CSV subsequently.

I agree; actually I had looked up the docs for that 'set' option because
it sounded like it could be relevant. There may be a case to rename (or
introduce aliases for) these options to distinguish input and output
more clearly, like "set csv_write_na" and "set csv_read_na". But then
you knew already that I'm always for more verbosity and explicitness...
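
To make the input/output distinction concrete, here is a minimal sketch in
Python (not gretl or hansl code). The function names and the read_na and
write_na parameters are hypothetical, chosen only to mirror the option names
suggested above; nothing here reflects how gretl actually handles CSV files.

```python
import csv
import io


def read_string_column(text, column, read_na=None, delimiter="\t"):
    """Read one string-valued column; cells equal to read_na become None.

    read_na is a hypothetical stand-in for an input-side missing code.
    """
    rows = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    values = []
    for row in rows:
        cell = row[column]
        # Blank cells are always treated as missing; read_na adds a
        # producer-specific code on top of that (e.g. "." for some sources).
        if cell.strip() == "" or (read_na is not None and cell == read_na):
            values.append(None)
        else:
            values.append(cell)
    return values


def write_string_column(name, values, write_na="NA"):
    """Write one column as CSV text; None is written as write_na.

    write_na is a hypothetical stand-in for an output-side missing code.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name])
    for value in values:
        writer.writerow([write_na if value is None else value])
    return out.getvalue()


# Made-up tab-separated input in which "." marks a missing string value.
sample = "date\tregion\n2013-01\tnorth\n2013-02\t.\n2013-03\tsouth\n"

as_literals = read_string_column(sample, "region")
with_code = read_string_column(sample, "region", read_na=".")
exported = write_string_column("region", with_code, write_na="NA")
```

With no input code set, "." is kept as an ordinary string literal; with
read_na="." it is read as missing, and it can then be written out under a
different marker chosen for whatever program consumes the output.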
thanks,
sven