On 23.07.2017 at 21:44, Allin Cottrell wrote:
> The problem: some time ago we decided to ease the task of parsing "CSV"
> by deleting quotation marks from each line of input. (We can and do
> recognize string-valued input, but only by determining that it cannot be
> parsed as numeric.) Quotation is sometimes used inconsistently and
> arbitrarily in "CSV" files.

I am absolutely no CSV fundamentalist (like people who don't accept
semicolons or tabs as column separators), but could you remind us why
coping with CSV files with inconsistent quotation has to be done?
Spontaneously I'd say such files are really the problem of their creators.

> So, I've been working on a revision of our CSV reader in which we
> "respect" quotation in this sense: we do not delete quotation marks in
> CSV input, and if it turns out that all the values in a given column are
> quoted integers, we take that column to be an encoding of a categorical
> variable.

Except if they're years, I hope... No, seriously, doesn't this mess with
a lot of variables that may be only integers but that we usually treat
as quasi-continuous?
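
To make the concern concrete, here is a minimal Python sketch of the two rules as I understand them from the description. It is not gretl's actual reader: the function names, the naive comma split (no embedded delimiters) and the sample data are all made up for illustration.

```python
def is_number(text):
    """True if text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def is_quoted_integer(raw):
    """True if the raw field is an integer wrapped in double quotes, e.g. "42"."""
    raw = raw.strip()
    if len(raw) < 3 or raw[0] != '"' or raw[-1] != '"':
        return False
    inner = raw[1:-1]
    if inner[0] in "+-":          # allow an optional sign
        inner = inner[1:]
    return inner.isascii() and inner.isdigit()


def classify_column(raw_values, respect_quotes=True):
    """Classify a column of raw (still quoted) fields.

    respect_quotes=True  : hypothetical new rule, all-quoted-integer
                           columns are taken as categorical.
    respect_quotes=False : old rule, quotes are stripped first and a
                           column is a string column only if some value
                           cannot be parsed as numeric.
    """
    if respect_quotes and raw_values and all(is_quoted_integer(v) for v in raw_values):
        return "categorical"
    stripped = [v.strip().strip('"') for v in raw_values]
    if all(is_number(s) for s in stripped):
        return "numeric"
    return "string"


# Made-up sample: quoted years, a quoted region code, an unquoted series.
lines = [
    'year,region,income',
    '"2015","1",1200.5',
    '"2016","2",1350.0',
    '"2017","1",1410.25',
]
header = lines[0].split(",")
columns = list(zip(*(line.split(",") for line in lines[1:])))

new_rule = {name: classify_column(list(col)) for name, col in zip(header, columns)}
old_rule = {name: classify_column(list(col), respect_quotes=False)
            for name, col in zip(header, columns)}

print(new_rule)
print(old_rule)
```

With this sample the quoted "year" column comes out as categorical under the new rule and as numeric under the old one, which is exactly the case I am worried about.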
thanks,
sven