On Sat, 19 Apr 2014, Shintaro Nakagawa wrote:
> Next question: where did the Yen signs come from? I'll work on that
> problem. Getting the text encoding "right" for RTF for a range of
> locales is non-trivial, since the RTF "standard" is an unprincipled
> mash-up of so-called "ansi" and hex-coded UTF-16.
I have no idea how Gretl treats Unicode, but if iconv is used
somewhere, it might be replacing backslashes with Yen signs.
When I run the following command from Terminal,
    echo -n \\ | iconv -f Shift_JIS -t UTF-16 | hexdump
I get
    0000000 fe ff 00 a5
    0000004
where 00a5 is the code point of the Yen sign (U+00A5).
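For anyone who wants to double-check the reading of that hexdump, here is a minimal Python sketch. It only decodes the four bytes shown above; it does not call iconv and says nothing about what gretl itself does. The variable names are just for illustration.

```python
# The four bytes exactly as they appear in the hexdump output above.
raw = bytes.fromhex("feff00a5")

# Python's "utf-16" codec reads the byte-order mark (fe ff = big-endian)
# and then decodes the remaining two bytes as a single code unit.
text = raw.decode("utf-16")

print(f"U+{ord(text):04X}")    # code point of the one decoded character
print(text == "\u00a5")        # True if that character is YEN SIGN
print(hex(ord("\\")))          # the backslash we started from, for comparison
```

So the single input byte 0x5C went in and U+00A5 came out, which is consistent with iconv's Shift_JIS table treating 0x5C as the Yen sign rather than as a backslash.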
Thanks. Here's what happens when gretl is asked to put RTF onto the
clipboard/pasteboard.
* The text is printed into a buffer, translated if gretl is being run in a
supported non-English locale. Since gretl is not translated into Japanese,
this will come out as English in your case.
* For output as RTF, two things must be checked regarding the text buffer.
First, the text may contain "real" Unicode minus signs: if so, these need
to be recoded as simple dashes. Second, everything is in UTF-8 inside
gretl but RTF doesn't support this encoding, so we check the text
(heuristic: it's not pure ASCII but validates as UTF-8) and if we find
it's UTF-8 we try to recode it to the locale encoding (via GLib).
* We then use the GTK clipboard mechanism or, on the Mac, pbcopy.
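The steps above can be sketched roughly as follows. This is a Python illustration of the logic as described, not gretl's actual C code: the function name prepare_for_rtf is made up, the locale encoding is passed in by the caller instead of being looked up via GLib, and the choice to substitute unrepresentable characters rather than fail is an assumption.

```python
UNICODE_MINUS = "\u2212".encode("utf-8")  # the bytes e2 88 92


def prepare_for_rtf(buf: bytes, locale_encoding: str) -> bytes:
    """Hypothetical helper mirroring the steps listed above."""
    # Step 1: recode "real" Unicode minus signs as simple ASCII dashes.
    buf = buf.replace(UNICODE_MINUS, b"-")

    # Step 2a: pure ASCII needs no recoding at all.
    if buf.isascii():
        return buf

    # Step 2b: heuristic from the text: not ASCII, but validates as UTF-8.
    try:
        text = buf.decode("utf-8")
    except UnicodeDecodeError:
        return buf  # not UTF-8 either, so leave it untouched

    # Step 2c: recode to the locale encoding (assumption: unrepresentable
    # characters are replaced rather than raising an error).
    return text.encode(locale_encoding, errors="replace")


english = "{\\rtf1 coeff \u2212 0.5}".encode("utf-8")
print(prepare_for_rtf(english, "cp1252"))

french = "\u00e9cart".encode("utf-8")
print(prepare_for_rtf(french, "cp1252"))
```

Note that in the English case the buffer is pure ASCII once the minus sign is replaced, so the function returns before any recoding and every backslash is still the byte 0x5C.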
Here's the thing that's puzzling me: since the text you're getting is in
English and not Japanese, it seems it should be ASCII (after any Unicode
minus signs are replaced) and therefore the recoding phase shouldn't be
activated. In that case it would seem that it's pbcopy that is
interpreting the backslash byte (ASCII: 0x5C) as the Yen symbol (as in
JIS X 0201, the Japanese variant of ISO 646).
I could add some debugging output to verify whether or not gretl is
invoking GLib's recoding mechanism.
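The kind of debugging output I have in mind would report something like the following. Again this is only a Python sketch with a made-up function name (describe_buffer), meant to show which facts about the buffer would settle the question; the real check would live in gretl's C code.

```python
def describe_buffer(buf: bytes) -> dict:
    """Hypothetical diagnostic: would the recoding phase be triggered?"""
    # Offsets of any bytes outside the 7-bit ASCII range.
    non_ascii = [i for i, b in enumerate(buf) if b >= 0x80]

    try:
        buf.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    return {
        "non_ascii_offsets": non_ascii,
        "valid_utf8": valid_utf8,
        # Recoding is attempted only for non-ASCII text that is valid UTF-8.
        "would_recode": bool(non_ascii) and valid_utf8,
        # Number of 0x5C bytes handed on to the clipboard step.
        "backslashes": buf.count(b"\\"),
    }


report = describe_buffer(b"{\\rtf1\\ansi x - 1}")
print(report)
```

If would_recode comes out false and the backslash count is what we expect, then the 0x5C bytes leave gretl intact and the substitution must be happening downstream.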
Allin