On Mon, 27 Aug 2018, Sven Schreiber wrote:
On 27.08.2018 at 21:42, Allin Cottrell wrote:
> And here's what I find:
> If <UTF8_filename> contains accented characters that are in my Windows'
> locale codepage (for example c-cedilla), and <TO_ENCODING> is just ""
> (empty string to indicate current locale), the second invocation of
> file.info() works OK, though not surprisingly the first one fails (actually,
> produces a bunch of NAs).
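To illustrate the in-codepage case outside R, here is a minimal Python sketch. It is an illustration only, not what R or gretl actually run. It assumes the Windows locale codepage is cp1252 (Western European), and the filename "façade.mat" and the helper representable_in() are hypothetical. A name whose every character exists in the codepage survives the UTF-8 to locale recoding. Here the c-cedilla becomes one byte.

```python
def representable_in(name: str, codepage: str = "cp1252") -> bool:
    """True if every character of `name` has a byte in `codepage`.

    Hypothetical helper; cp1252 is an assumed Windows locale codepage.
    """
    try:
        name.encode(codepage)  # strict mode: raises on unmappable characters
        return True
    except UnicodeEncodeError:
        return False


# The name arrives as UTF-8 bytes (as gretl passes it); decode, then recode.
utf8_bytes = "façade.mat".encode("utf-8")
name = utf8_bytes.decode("utf-8")
print(representable_in(name))   # True: c-cedilla exists in cp1252
print(name.encode("cp1252"))    # c-cedilla becomes the single byte 0xE7
```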
So this case might be caught in functions like gretl.loadmat, which
gretl offers to R, right?
Or actually I guess I might mean the string gretl.dotdir, really.
Both gretl.dotdir and the matrix filename might contain non-ASCII
characters. There's now in git, though not yet in the snapshots, a
Windows-specific call to R's iconv() in gretl.loadmat that converts
the full filename (dotdir + matrix file) to the locale encoding.
(Though actually, perhaps gretl.dotdir should be converted in its own
right.) That should, hopefully, fix the case of non-ASCII filenames
that are representable in the locale's codepage.
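Here is a rough Python analogue of converting the full path (dotdir + matrix file) in one go. It is a sketch and does not reproduce the actual R code in gretl.loadmat. The helper locale_path(), the sample directories and the cp1252 codepage are all assumptions for illustration. It returns None when a conversion to the locale cannot succeed.

```python
import ntpath
from typing import Optional


def locale_path(dotdir: str, matfile: str,
                codepage: str = "cp1252") -> Optional[bytes]:
    """Join dotdir and matrix filename (hypothetical helper), then recode
    the full path to the locale codepage.

    Returns None if any character, in either component, has no byte in
    `codepage`: a locale conversion cannot rescue that case.
    """
    full = ntpath.join(dotdir, matfile)  # Windows-style join, on any host OS
    try:
        return full.encode(codepage)
    except UnicodeEncodeError:
        return None


print(locale_path("C:\\Users\\François\\gretl", "m.mat"))
print(locale_path("C:\\Users\\Дмитрий\\gretl", "m.mat"))  # None
```

Converting the joined path handles, or rejects, a non-ASCII character in either component in one place. With a stateless single-byte codepage such as cp1252, converting gretl.dotdir separately would give the same bytes.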
> If <UTF8_filename> contains out-of-codepage characters (I
> conversion to "" fails. Not surprising. In addition, iconv() with
> to="UTF16" fails. It seems to me this last is an R bug. The function
> iconvlist() shows that UTF16 is a valid target.
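The out-of-codepage case looks like this in Python's codecs, which stand in here for R's iconv(). Greek letters have no cp1252 byte, so that conversion fails. Encoding to UTF-16 succeeds because UTF-16 covers all of Unicode. This is consistent with the view that UTF-16 ought to work as a target, but it says nothing about why R's iconv() fails. The helper try_encodings() is hypothetical.

```python
from typing import Dict, Optional


def try_encodings(name: str,
                  targets=("cp1252", "utf-16")) -> Dict[str, Optional[bytes]]:
    """Encode `name` into each target; None marks a failed conversion."""
    results: Dict[str, Optional[bytes]] = {}
    for enc in targets:
        try:
            results[enc] = name.encode(enc)
        except UnicodeEncodeError:
            results[enc] = None
    return results


res = try_encodings("Αθήνα.mat")    # Greek letters: outside cp1252
print(res["cp1252"])                 # None: not representable in the codepage
print(res["utf-16"] is not None)     # True: UTF-16 covers all of Unicode
```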
Aha, interesting. I was puzzled by some of the failures in this area. Where
are R bugs reported, anyway? It turns out it's not only hard to google for
"R" stuff, but also for R bugs, because that's actually an R package of
"bugzilla" is the magic string to include with "R". But... you
should be very sure that what you think is an R bug really is one
before reporting it, unless you want a roasting!
One more thought, maybe "UTF-16LE" on little-endian
I don't think that's the issue, but it should be checked.
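The byte-level difference between "UTF-16" and "UTF-16LE" shows up clearly in Python's codecs. The "utf-16" codec prepends a byte-order mark (BOM) and uses the machine's native byte order. "utf-16-le" fixes little-endian order and omits the BOM, which is also the form the Windows wide-character ("W") file APIs take. Whether R's iconv() on Windows treats the two names differently is what would need checking. This sketch only shows how the bytes differ.

```python
import sys

name = "Αθήνα.mat"
with_bom = name.encode("utf-16")     # BOM + native byte order
le = name.encode("utf-16-le")        # little-endian, no BOM
be = name.encode("utf-16-be")        # big-endian, no BOM

# Drop the 2-byte BOM and the rest matches the native-order variant.
native = le if sys.byteorder == "little" else be
print(with_bom[:2])                  # the BOM itself
print(with_bom[2:] == native)        # True
print(len(with_bom) - len(le))       # 2
```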
> But I suspect R suffers from the problem that gretl had before
> namely that you can't access files that have out-of-codepage names (and
> whose names must therefore be given in UTF16).
Sounds plausible -- way to go, gretl!
I did think of one other approach, namely having gretl recode the
script it's sending to R from UTF-8 to UTF-16 at source. That didn't
do any good; R barfed on the script from the start.
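One plausible reason R barfed on a UTF-16 script from the first line is an assumption that has not been checked against R's parser. In UTF-16 every ASCII character gains a zero byte, so a reader expecting 8-bit or UTF-8 text meets an embedded NUL at the very first character. The tiny R snippet below is hypothetical.

```python
script = 'x <- 1\nprint(x)\n'   # hypothetical R script text

raw16 = script.encode("utf-16-le")
print(raw16[:4])                 # b'x\x00 \x00': each ASCII char gains a NUL byte
print(raw16.count(b"\x00"))      # one NUL per (ASCII) character
print(len(script.encode("utf-8")), len(raw16))  # UTF-16 doubles the size
```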