On Mon, 21 Oct 2013, Ignacio Diaz-Emparanza wrote:
On 18/10/13 18:40, Allin Cottrell wrote:
> On Fri, 18 Oct 2013, Ignacio Diaz-Emparanza wrote:
>
>> On 18/10/13 15:12, Allin Cottrell wrote:
>>> Perhaps we should offer an optional second argument to readfile(),
>>> allowing the user to specify the source codeset.
>> I think it is a good idea.
> OK, it's now implemented. Suppose I want to use readfile() on
> a text file encoded in MS codepage 1251 (Russian), and that is
> not my locale codeset. I can then do
>
> string s = readfile("russky.txt", "cp1251")
>
> (Case doesn't matter in the codeset name).
>
Thanks!
With respect to the 'open' command (importing CSV), I think we may leave the
responsibility for using a correct UTF-8 codeset to the user, but the error
message that appears on trying to import from an incorrect codeset could
probably be more explicit.
With the table I sent you, the error I obtain is
<output>
Binary data (225) encountered (line 9:4): this is not a valid text file
</output>
I assume that under these conditions the program cannot distinguish an
accented character or symbol in a non-UTF-8 codeset from any other binary
element [...]
Well, we could try making the (admittedly Eurocentric) assumption that if
the file is not in UTF-8 it might be ISO-8859, as with the new readfile()
default. That's now in CVS.
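
For illustration only, the fallback just described could be sketched as
below. This is Python, not gretl's actual C implementation; the function
name decode_with_fallback is hypothetical, and ISO-8859-1 is an assumed
choice, since the message says only "ISO-8859" without naming a variant.

```python
def decode_with_fallback(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode raw as UTF-8 if it is valid UTF-8; otherwise assume the
    fallback codeset (an assumption: ISO-8859-1 by default)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: make the (admittedly Eurocentric) guess that
        # the bytes are in an ISO-8859 codeset instead.
        return raw.decode(fallback)


# Byte 225 (0xE1) is "á" in ISO-8859-1 but is not valid UTF-8 here.
print(decode_with_fallback(b"\xe1rbol"))  # prints "árbol"
```

Note that ISO-8859-1 assigns a character to every byte value, so the
fallback never fails; a file in some other 8-bit codeset (e.g. cp1251)
would be decoded without error but to the wrong characters.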
Apart from that, I see that in my table the first accented character is at
line 10, position 5, so I think the location given in the error message
(line 9:4) is incorrect.
Ah, 1-based versus 0-based counting. That's now fixed.
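
To make the off-by-one concrete, here is a small Python sketch (again
not gretl's code; the function name first_invalid_utf8 is hypothetical)
that locates the first byte that breaks UTF-8 decoding and reports it
with 1-based line and column numbers. The column is counted in bytes,
which is an assumption of this sketch.

```python
def first_invalid_utf8(raw: bytes):
    """Return (line, column, byte_value) for the first byte that breaks
    UTF-8 decoding, with line and column 1-based, or None if raw is
    valid UTF-8. The column is a byte offset within the line."""
    try:
        raw.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        offset = err.start                        # 0-based byte offset
        line0 = raw.count(b"\n", 0, offset)       # 0-based line index
        line_start = raw.rfind(b"\n", 0, offset) + 1
        col0 = offset - line_start                # 0-based column index
        # Add 1 to each index for human-readable reporting.
        return line0 + 1, col0 + 1, raw[offset]


# Nine short lines, then byte 225 as the fifth byte of line 10.
data = b"a\n" * 9 + b"abcd\xe1x\n"
print(first_invalid_utf8(data))  # prints (10, 5, 225)
```

Without the two "+ 1" adjustments the same input would be reported as
9:4, which matches the message quoted above.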
Allin