On Sun, 19 May 2019, Sven Schreiber wrote:
On 19.05.2019 at 19:13, Riccardo (Jack) Lucchetti wrote:
>>
>> Hmm, interesting idea. I think this could be made to work quite
>> nicely. Internally, nothing prevents us from creating a new, temporary
>> "hidden" dataset (then turning it into a matrix) without disturbing
>> the existing dataset or absence of dataset.
>
> This would be very nice of course, but in that case I would imagine the
> job would be less straightforward than it seems, because of the
> intrinsic differences between the eventual aims.
Given that we have a function for reading a matrix from a file (mread), I
think the natural aim should be to extend that function eventually to
read from CSV, either with a new option or perhaps simply by recognizing
a ".csv" file extension.
(I'm speaking purely from a user's point of view here.)
But if that isn't feasible in the short term, maybe a transitory
function in "extra" could indeed be the solution.
A few points on this.
1) Jack's csv2mat is an outstanding example of accomplishing a lot
with just a few lines of hansl. Of course this is not in the least
unusual for Jack, but for the rest of us it's noteworthy all the
same!
2) I take Jack's point that the "no error" criterion for reading a
dataset from CSV (which we already do) is more restrictive than that
for reading a matrix from CSV -- where we don't have to care about
valid variable names, nor about handling non-numeric values, which
we can just map onto "NA" without further ado.
3) Nonetheless, I find that it's not too difficult to handle the
issues under point 2 in the context of our current CSV importation
code. In current git, you can try out reading CSV into a matrix via
mread() when the filename (or URL) has a ".csv" extension. Two
comments on that: (a) "CSV" really just means delimited text; the
delimiter doesn't have to be a comma; and (b) if we want to pursue
this option we could admit some other filename extensions.
4) One point supported by Jack's hansl code that is not supported by
our built-in CSV importer is malformed CSV (e.g. some lines have
more fields than others). I don't think we'd want to support this in
our C code -- and actually I kinda wonder about the wisdom of
supporting it at all.
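
To make points 2 to 4 concrete, here is a rough sketch of what "lenient"
matrix-style reading of delimited text might look like. It is written in
Python purely for illustration; it is neither gretl's C importer nor
Jack's csv2mat, and the name csv_to_matrix and its parameters are
hypothetical. It assumes a single header row that can be discarded, maps
any non-numeric field to NaN (standing in for "NA"), accepts an arbitrary
delimiter, and pads short rows with NaN, which is one possible way of
tolerating malformed input.

```python
import csv
import io
import math


def csv_to_matrix(text, delimiter=",", skip_header=True):
    """Parse delimited text into a rectangular list of float rows.

    Non-numeric or empty fields become NaN (standing in for "NA"),
    and short rows are padded with NaN so every row has equal length.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if skip_header and rows:
        rows = rows[1:]  # column names are irrelevant for a matrix
    rows = [r for r in rows if r]  # drop completely blank lines

    def to_number(field):
        try:
            return float(field)
        except ValueError:
            return math.nan

    width = max((len(r) for r in rows), default=0)
    matrix = []
    for r in rows:
        values = [to_number(f) for f in r]
        values += [math.nan] * (width - len(values))  # pad ragged rows
        matrix.append(values)
    return matrix


# Semicolon-delimited sample: one non-numeric field, one short row.
sample = "x;y;z\n1;2;3\n4;oops;6\n7;8\n"
m = csv_to_matrix(sample, delimiter=";")
```

Padding is only one choice here; rejecting ragged input outright, as
the built-in importer does, is arguably the safer default.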
I'm attaching a sample script that derives from Jack's original
upthread. It requires, and compares results with, Jack's
csv2mat.inp.
Allin