Re: [Gretl-devel] readline() and non utf8

Tuesday, 22 October 2013

On Mon, 21 Oct 2013, Ignacio Diaz-Emparanza wrote:

...
 On 18/10/13 18:40, Allin Cottrell wrote:
> On Fri, 18 Oct 2013, Ignacio Diaz-Emparanza wrote:
> 
>> On 18/10/13 15:12, Allin Cottrell wrote:
>>> Perhaps we should offer an optional second argument to readfile(),
>>> allowing the user to specify the source codeset.
>> I think it is a good idea.
> OK, it's now implemented. Suppose I want to use readfile() on
> a text file encoded in MS codepage 1251 (Russian), and that is
> not my locale codeset. I can then do
> 
> string s = readfile("russky.txt", "cp1251")
> 
> (Case doesn't matter in the codeset name).
> 
 Thanks !

 With respect to the 'open' (importing CSV) command I think we may leave the 
 responsibility of using a correct UTF8 codeset to the user, but probably the 
 error message that emerges in trying to import from an incorrect codeset 
 could be more explicit.

 With the table I sent you, the error I obtain is

 <output>
 Binary data (225) encountered (line 9:4): this is not a valid text file
 </output>

 I assume the program under these conditions cannot distinguish an accent or 
 symbol of a non-UTF8 codeset from another binary element [...] 
Well, we could try making the (admittedly Eurocentric) assumption that if 
the file is not in UTF-8 it might be ISO-8859, as with the new readfile() 
default. That's now in CVS.
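
As an illustration of that fallback idea only (this is not gretl's actual C code), here is a minimal Python sketch. The function name decode_with_fallback is hypothetical, and the choice of ISO-8859-1 among the ISO-8859 family is an assumption, since the message above does not say which part is used.

```python
def decode_with_fallback(raw, fallback="iso-8859-1"):
    """Return (text, codeset_used): try UTF-8 first, then a fallback codeset.

    The default fallback is an assumption made for this sketch.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: guess a single-byte ISO-8859 encoding instead.
        # Every byte value is defined in ISO-8859-1, so this cannot fail,
        # but a wrong guess produces the wrong characters silently.
        return raw.decode(fallback), fallback
```

Byte 225, the value quoted in the error message above, decodes to an accented "a" under ISO-8859-1, which is consistent with the accented character in the table.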

...
 Apart from that, I am seeing that in my table the first accented character 
 is at line 10, position 5, so I think the information given in the error 
 message (line 9:4) is incorrect. 
Ah, 1-based versus 0-based counting. That's now fixed.
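
To make the off-by-one concrete, here is a hedged Python sketch (again, not the gretl source) that locates the first byte that breaks UTF-8 decoding and reports it with 1-based line and column numbers. The function name first_invalid_utf8 is hypothetical, and columns are counted in bytes, which is an assumption.

```python
def first_invalid_utf8(raw):
    """Return the 1-based (line, column) of the first byte that is not
    valid UTF-8, or None if the whole buffer decodes cleanly."""
    try:
        raw.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        offset = err.start                            # 0-based byte offset
        line0 = raw.count(b"\n", 0, offset)           # 0-based line index
        line_start = raw.rfind(b"\n", 0, offset) + 1  # offset of line start
        col0 = offset - line_start                    # 0-based column
        # Convert to the 1-based numbering a user sees in an editor.
        return line0 + 1, col0 + 1
```

With nine complete lines followed by a line whose fifth byte is 225, the 0-based values are 9 and 4, and the function returns (10, 5).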

Allin
