On Mon, 27 Aug 2018, Sven Schreiber wrote:
On 27.08.2018 at 16:49, Allin Cottrell wrote:
>
> fname <- iconv(fname, from="utf-8", to="")
I'm getting "NA" as the result because the non-latin isn't possible to
convert to my locale.
> I haven't tested that yet but I'll explore a bit.
Now I have explored a bit. My test rig for R is in this form:
<hansl>
foreign language=R
fname <- <UTF8_filename>
file.info(fname)
fconv <- iconv(fname, from="utf-8", to=<TO_ENCODING>)
file.info(fconv)
end foreign
</hansl>
And here's what I find:
If <UTF8_filename> contains accented characters that are in my
Windows' locale codepage (for example c-cedilla), and <TO_ENCODING> is
just "" (empty string to indicate current locale), the second
invocation of file.info() works OK, though not surprisingly the first
one fails (actually, produces a bunch of NAs).
If <UTF8_filename> contains out-of-codepage characters (I tried
Greek), conversion to "" fails. Not surprising. In addition, iconv()
with to="UTF16" fails. It seems to me this last is an R bug. The
function iconvlist() shows that UTF16 is a valid target. The reported
cause of failure is that the "UTF-8" string contains an embedded NUL
character, which is impossible with valid UTF-8.
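One possible reading of that error, not verified on the setup above, is that the NUL bytes are in the UTF-16 output rather than in the UTF-8 input: R character strings cannot hold embedded NULs, while UTF-16 encodes every ASCII character with a zero byte. A minimal sketch using iconv()'s toRaw argument, which returns raw vectors instead of strings (the Greek file name is hypothetical, and "UTF-16LE" is assumed to be among the encodings listed by iconvlist()):

```r
# Hypothetical Greek file name, written with \u escapes so the source is ASCII.
fname <- "\u03b1\u03b2\u03b3.txt"

# toRaw = TRUE makes iconv() return a list of raw vectors rather than
# character strings, so zero bytes in the UTF-16 output are representable.
u16 <- iconv(fname, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
print(u16)

# The ASCII part of the name (".txt") produces zero bytes in UTF-16.
has_nul <- any(u16 == as.raw(0))

# Round trip back to UTF-8 to check that nothing was lost.
back <- iconv(list(u16), from = "UTF-16LE", to = "UTF-8")
same <- identical(enc2utf8(back), enc2utf8(fname))
```

This only shows that the conversion itself can be done; it does not give a way to hand the resulting bytes to file.info(), which expects a character string.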
So I don't know what would happen if one were able to pass a UTF16
representation of the filename to a function such as file.info().
But I suspect R suffers from the problem that gretl had before 2018b,
namely that you can't access files that have out-of-codepage names
(and whose names must therefore be given in UTF16).
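An untested alternative to converting the name at all is to leave it in UTF-8 and mark it as such, on the assumption that R's file functions on Windows treat UTF-8-marked strings differently from unmarked ones. A minimal sketch, with a hypothetical file name built from raw UTF-8 bytes to mimic a string that arrives in R without an encoding mark:

```r
# Hypothetical name: Greek alpha and beta followed by ".txt", as UTF-8 bytes.
bytes <- as.raw(c(0xce, 0xb1, 0xce, 0xb2, 0x2e, 0x74, 0x78, 0x74))
fname <- rawToChar(bytes)

# rawToChar() does not mark the encoding, so R assumes the native one.
enc_before <- Encoding(fname)

# Declare the bytes to be UTF-8; this changes the mark, not the bytes.
Encoding(fname) <- "UTF-8"
enc_after <- Encoding(fname)

# Assumption to check on Windows: whether file.info() now finds a file
# whose name is outside the locale codepage. No such file is created here,
# so this call just returns a row of NAs.
info <- file.info(fname)
```

Whether this helps depends on how the name reaches R in the first place, which the sketch does not address.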
Allin