On Mon, 27 Aug 2018, Allin Cottrell wrote:
> [Y]ou should be very sure what you think might be an R bug really is an R
> bug before reporting it, unless you want a roasting!
I'm getting closer to sure. My test involves the string "Anastasia" spelled
out in Greek letters. If I run this by itself through R's iconv(string,
from="utf-8", to="UTF16") it works fine. But if I put any Roman letters into
the string, as in "Anastasia/test", then it fails, complaining "embedded nul
in string". But of course both UTF-8 and UTF-16 should have no trouble
combining Greek and Roman characters.
Sorry to go on about this, but actually I now see why it _might_ not
be considered a bug. The UTF-16 sequence corresponding to
"Anastasia" in Greek letters contains no embedded nul byte, since
each of the Greek letters requires 2 non-empty bytes for its
representation. But the appended ASCII characters will each be
represented by a single "active" byte followed by a nul. (UTF-16
requires at least two bytes for each character, and pads with nuls
as needed.)
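
To make the byte-level point concrete, here is a minimal sketch in Python
(not R), used only to inspect the encoded bytes. The exact Greek spelling,
the choice of little-endian UTF-16 without a byte-order mark, and the helper
names are assumptions for illustration; they are not taken from the test
described above.

```python
# Assumed spelling of "Anastasia" in Greek letters, written as escapes
# so the code points are unambiguous.
greek = "\u0391\u03bd\u03b1\u03c3\u03c4\u03b1\u03c3\u03af\u03b1"
mixed = greek + "/test"


def utf16_bytes(text):
    """Encode text as little-endian UTF-16 without a byte-order mark."""
    return text.encode("utf-16-le")


def has_nul_byte(data):
    """Report whether any single byte in the sequence is zero."""
    return b"\x00" in data


greek_bytes = utf16_bytes(greek)
mixed_bytes = utf16_bytes(mixed)

# Each Greek letter has a non-zero high byte (0x03), so no byte is nul.
print(has_nul_byte(greek_bytes))
# Each ASCII character has a zero high byte, so nul bytes appear.
print(has_nul_byte(mixed_bytes))
print(mixed_bytes[len(greek_bytes):].hex(" "))
```

Under these assumptions the Greek-only sequence contains no zero byte, while
every appended ASCII character contributes one.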
So I think what R's error message is trying to say is that the
result of conversion doesn't qualify as a string, where "string"
means a sequence of bytes _terminated_ by a nul byte. And true
enough, if you pass the additional argument 'toRaw=TRUE', iconv() gives
you the UTF-16 byte array, including embedded nuls, without
complaint. Now whether you can pass that array to an R function in
the role of a filename and expect anything useful to happen, I very
much doubt.
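
As a rough illustration of why such an array is unlikely to work as a
filename, the following Python sketch (again not R) shows what a consumer
that treats the bytes as a nul-terminated string would see. The helper
c_string_view is hypothetical, and the Greek spelling and little-endian
encoding are assumptions.

```python
def c_string_view(data):
    """Return the bytes a nul-terminated reader would see.

    Hypothetical helper: it cuts the sequence at the first zero byte,
    as a C-style string consumer would.
    """
    end = data.find(b"\x00")
    return data if end == -1 else data[:end]


# Assumed Greek spelling of "Anastasia", followed by ASCII "/test".
mixed = "\u0391\u03bd\u03b1\u03c3\u03c4\u03b1\u03c3\u03af\u03b1/test"
mixed_bytes = mixed.encode("utf-16-le")

visible = c_string_view(mixed_bytes)
# The reader stops right after the first byte of "/", losing "test".
print(len(mixed_bytes), len(visible))
```

Anything past the first nul is invisible to such a reader, so the name it
ends up with is not the one intended.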
Allin