On Thu, 8 Feb 2018, Sven Schreiber wrote:
On 07.02.2018 at 23:27, Allin Cottrell wrote:
>> Couldn't handle 'C:\Users\P\Desktop\I`I½I±IƒI,I±IƒI_I± spss\excel
>> I'IµI'I¿I¼I-I½I±.xlsx': Invalid byte sequence in conversion input
>
> Thanks, Periklis! That "Couldn't handle" line gives me a good clue, if I
> can just figure exactly what it means.
What are the Windows APIs that readfile() etc. use?
We don't use Windows APIs directly; we use standard C-library
functions plus GLib wrappers (both cross-platform). However, I see
what the problem was with readfile(): the GLib function we're calling
requires that the filename is given as UTF-8 on Windows, and we
weren't ensuring that filenames in locale-specific encoding were being
converted when necessary. That fix is in git and snapshots.
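
For illustration only, here is a minimal Python sketch of that kind of "convert when necessary" step. It is not gretl's C code: the helper name ensure_utf8, the cp1252 default and the sample filename are all assumptions made for the example. In GLib terms the analogous calls would be g_utf8_validate() followed by g_locale_to_utf8().

```python
def ensure_utf8(raw: bytes, locale_encoding: str = "cp1252") -> bytes:
    """Return the filename bytes as UTF-8, converting only when needed.

    Hypothetical helper: `locale_encoding` stands in for whatever the
    current locale code page happens to be.
    """
    try:
        raw.decode("utf-8")      # already valid UTF-8: pass through unchanged
        return raw
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume locale encoding and convert.
        return raw.decode(locale_encoding).encode("utf-8")


# A German filename ("Übung.xlsx", hypothetical) as a cp1252 byte string.
name_cp1252 = "\u00dcbung.xlsx".encode("cp1252")
name_utf8 = ensure_utf8(name_cp1252)
```

Note that the validity check is only a heuristic: a byte string in a locale encoding can occasionally also be valid UTF-8, so real code is better off knowing where each filename came from.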
(BTW, I also see errors in an analogous script to the above --with the
latest snapshot-- when I put German-specific letters in there. So as
Henrique already showed, this is not a non-Latin issue, but more
generally a non-ASCII thing.)
The issue with Periklis's filenames is more difficult -- although
there's at least a partial fix in place now. Here's the thing: we've
been assuming that we can get the native Windows counterpart of a
UTF-8 filename by calling GLib's g_locale_from_utf8(). But that only
works if the filename in question can be represented in the locale
codepage, and apparently that's not the case with those mixed Greek
and English filenames; they seem to require representation in UTF-16.
(This is not the case with the examples Henrique gave.)
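
The representability point can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of Periklis's setup: the filenames and the choice of code pages (cp1252 for Western European, cp1253 for Greek) are hypothetical, and Python's codecs stand in for g_locale_from_utf8() and g_utf8_to_utf16().

```python
def representable(name: str, codepage: str) -> bool:
    """True if every character of `name` exists in the given code page."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


greek = "\u0394\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03b1.xlsx"  # "Δεδομένα.xlsx"
mixed = "\u00dcbung " + greek        # German and Greek letters in one name

print(representable(greek, "cp1253"))   # Greek code page
print(representable(greek, "cp1252"))   # Western European code page
print(representable(mixed, "cp1252"), representable(mixed, "cp1253"))

# UTF-16 can represent any such name, whatever the locale code page is.
wide = mixed.encode("utf-16-le")
```

A name is only safe to pass through the locale code page if representable() holds for the code page actually in force; the UTF-16 form has no such restriction.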
In order to handle mixed-language UTF-16 Windows filenames reliably,
with full generality, we'll have to revamp a lot of path-related
functionality in libgretl. That may be worth doing, but not for the
imminent release.
Allin