Re: [Gretl-users] R (foreign language): non-Ascii chars in gretl.loadmat on Windows

Tuesday, 28 August 2018

On Tue, 28 Aug 2018, Sven Schreiber wrote:

...
 On 28.08.2018 at 04:06, Allin Cottrell wrote:
> 
> 
> Sorry to go on about this, but actually I now see why it _might_ not be
> considered a bug. The UTF-16 sequence corresponding to "Anastasia" in Greek
> letters contains no embedded nul byte, since each of the Greek letters
> requires 2 non-empty bytes for its representation. But the appended ASCII
> characters will each be represented by a single "active" byte followed by a
> nul. (UTF-16 requires at least two bytes for each character, and pads with
> nuls as needed.)
> 
> So I think what R's error message is trying to say is that the result of
> conversion doesn't qualify as a string, where "string" means a sequence of
> bytes _terminated_ by a nul byte.
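To make the byte-level argument above concrete, here is a small Python sketch. It is not from the thread; Python is used only because it makes raw bytes easy to inspect. It assumes UTF-16 little-endian without a byte-order mark (the form Windows uses), and it appends a hypothetical ".mat" suffix to stand in for "the appended ASCII characters". The helper as_c_string() is made up for illustration and mimics how C-style code reads a nul-terminated string.

```python
def utf16le_bytes(text: str) -> bytes:
    """Encode text as UTF-16 little-endian, without a byte-order mark."""
    return text.encode("utf-16-le")


def as_c_string(buf: bytes) -> bytes:
    """Mimic C string handling (hypothetical helper): keep bytes before the first nul."""
    return buf.split(b"\x00", 1)[0]


greek = "Αναστασία"       # "Anastasia" in Greek letters
name = greek + ".mat"     # hypothetical ASCII suffix, for illustration only

greek_u16 = utf16le_bytes(greek)
name_u16 = utf16le_bytes(name)

# Each Greek letter is a code unit 0x03xx, so neither of its two bytes is zero.
print(b"\x00" in greek_u16)                       # False
# Each ASCII character is 0x00xx: one "active" byte followed by a nul.
print(utf16le_bytes("."))                         # b'.\x00'
print(b"\x00" in name_u16)                        # True
# Read as a nul-terminated string, the name stops right after the first suffix byte.
print(as_c_string(name_u16) == greek_u16 + b".")  # True
```

The last line shows why such a buffer cannot be treated as an ordinary string: everything after the first ASCII character would be silently lost.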

 But wouldn't that imply that R considers all UTF-16 strings invalid as
 long as there are some "simple" characters in there? If that's the case, it
 would very much defeat the purpose of Unicode being a superset of more
 restrictive encodings. So it still sounds like a bug, no?

Not really, IMO. No program can support "narrow" strings (ASCII, 8-bit
codepages, UTF-8) and "wide" ones (UTF-16, etc.) interchangeably.
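A quick Python sketch (illustrative only, not R code) shows why the two kinds cannot be swapped. The same character has unrelated byte sequences in UTF-8 and UTF-16LE, and narrow-string code handed wide bytes either fails or quietly picks up nul characters.

```python
alpha = "\u03b1"  # GREEK SMALL LETTER ALPHA

narrow = alpha.encode("utf-8")      # UTF-8: two bytes, b'\xce\xb1'
wide = alpha.encode("utf-16-le")    # UTF-16LE: one 16-bit unit, b'\xb1\x03'
print(narrow, wide)

# Feeding the wide bytes to code that expects narrow UTF-8 fails outright...
try:
    wide.decode("utf-8")
    wide_as_narrow_ok = True
except UnicodeDecodeError:
    wide_as_narrow_ok = False
print(wide_as_narrow_ok)            # False

# ...or, for ASCII text, "succeeds" but smuggles in nul characters.
print(repr("ab".encode("utf-16-le").decode("utf-8")))  # 'a\x00b\x00'
```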

I think R's error message is not very good, but the deal is basically, 
"Look, we're not going to give you a UTF-16 byte array as if it were a 
string, because we only do narrow strings. If you really want the 
bytes, then use toRaw=TRUE and we'll give you them; what you do with 
them is then up to you, but don't expect them to work where a string 
is wanted."
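Python draws much the same line, which gives a quick illustration of the "raw bytes" route. This is a sketch, not R code, and the filename is hypothetical. Encoding gives you bytes you can keep or pass around as data, but the OS path functions refuse them. The exact exception message differs between platforms, so the sketch only checks that a ValueError is raised.

```python
import os

name = "Αναστασία.mat"            # hypothetical filename, for illustration only
raw = name.encode("utf-16-le")    # raw UTF-16 bytes: fine to store or pass along as data

try:
    os.stat(raw)                  # ...but not usable where a (narrow) path is wanted
    rejected = False
except ValueError:                # embedded nul / undecodable bytes; message varies by platform
    rejected = True
except OSError:                   # would only mean "file not found", i.e. the bytes were accepted
    rejected = False

print("rejected as a path:", rejected)
```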

Like gretl, R supports Unicode OK but only in the form of UTF-8. What 
R apparently lacks is a mechanism to translate UTF-8 filenames to 
UTF-16 on Windows automagically.
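What such a translation step involves can be sketched in a few lines of Python. This is purely illustrative and is not R's or gretl's code: utf8_to_wide() and wide_strlen() are hypothetical helpers. The key point is that a wide string is terminated by a zero 16-bit code unit (two zero bytes), so the single nul bytes that trip up narrow-string code do no harm there.

```python
def utf8_to_wide(utf8_name: bytes) -> bytes:
    """Re-encode a UTF-8 filename as nul-terminated UTF-16LE, the layout
    taken by Windows "wide" (W-suffixed) file APIs."""
    return utf8_name.decode("utf-8").encode("utf-16-le") + b"\x00\x00"


def wide_strlen(buf: bytes) -> int:
    """Count 16-bit code units up to the first zero code unit (not zero byte)."""
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            return i // 2
    raise ValueError("no 16-bit nul terminator found")


utf8_name = "Αναστασία.mat".encode("utf-8")   # hypothetical filename
wide = utf8_to_wide(utf8_name)

# The single nul bytes after each ASCII character do not end the wide string;
# only the final pair of zero bytes does.
print(wide_strlen(wide) == len(utf8_name.decode("utf-8")))  # True
```

For comparison, CPython on Windows performs essentially this conversion itself whenever a str path is passed to open() or the os functions. That is the kind of "automagic" step described above.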

Allin
