On 18/10/13 15:12, Allin Cottrell wrote:

The question is, what codeset are we to assume we're converting from?

To date we have assumed that if text is not UTF-8 it will be in the 
encoding of the current locale, and have used the GLib function 
g_locale_to_utf8() on the imported text. This will fail if your locale 
codeset is in fact UTF-8, which I presume is what's happening in your 
case.

I've now made this a little smarter in CVS. First we check if the current 
locale codeset is UTF-8. If so, we avoid using g_locale_to_utf8() and 
instead use g_convert(), guessing at ISO-8859-15 as the source codeset.
If the current locale codeset is _not_ UTF-8 we try using that as the 
source encoding; but if that fails we try ISO-8859-15.
It works! (By the way, the same problem seems to occur with 'open' when importing the CSV file.)
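
For readers who want to see the fallback order spelled out, here is a minimal sketch in Python. It is not gretl's actual code, which is C and uses the GLib calls named above; the function names to_utf8_text and is_utf8_name and the locale_codeset parameter are made up for illustration. Only the order of attempts is taken from the description: leave valid UTF-8 alone, try the locale codeset when it is not UTF-8, and otherwise guess ISO-8859-15.

```python
import codecs
import locale

# Assumption: the same last-resort guess as described in the message.
FALLBACK = "iso8859-15"


def is_utf8_name(name):
    """Return True if a codec name refers to UTF-8 (e.g. 'UTF-8', 'utf8')."""
    try:
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        return False


def to_utf8_text(raw, locale_codeset=None):
    """Decode imported bytes using the fallback order sketched above."""
    # Text that is already valid UTF-8 needs no conversion.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    if locale_codeset is None:
        locale_codeset = locale.getpreferredencoding(False)

    # Non-UTF-8 locale: try the locale's own codeset first.
    if not is_utf8_name(locale_codeset):
        try:
            return raw.decode(locale_codeset)
        except (UnicodeDecodeError, LookupError):
            pass

    # UTF-8 locale, or the locale codeset failed: guess ISO-8859-15.
    return raw.decode(FALLBACK)
```

One caveat with this kind of guessing: a single-byte codeset such as ISO-8859-1 accepts every byte sequence, so when it is the locale codeset the conversion never "fails" and the ISO-8859-15 fallback is never reached, even if the result is wrong.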


Perhaps we should offer an optional second argument to readfile(), 
allowing the user to specify the source codeset.
I think that is a good idea.
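
To make the proposal concrete, here is a rough Python sketch of what an optional codeset argument could look like. It is purely illustrative: read_text and its codeset parameter are hypothetical names, not the signature of gretl's readfile(), and the ISO-8859-15 default is only the guess mentioned earlier in this thread.

```python
def read_text(path, codeset=None):
    """Read a file as text; `codeset` optionally names the source encoding."""
    with open(path, "rb") as f:
        raw = f.read()

    if codeset is not None:
        # The caller states the encoding: use it, and let a wrong
        # choice raise an error instead of silently guessing.
        return raw.decode(codeset)

    # No codeset given: accept valid UTF-8, else fall back to the guess.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso8859-15")  # assumed default guess
```

With an explicit argument the user's knowledge overrides any guessing, for example read_text("data.csv", "cp1252") for a file known to come from a Windows machine.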


By the way, the correspondence on this topic illustrates how tricky this 
whole business is. Ignacio attached a file which he said was in ISO-8859, 
The "file" command reported it only as ISO-8859. Looking at it in more detail, I see it was really ISO-8859-15.
but in fact what came across via email was an ASCII file with question 
marks in place of accented characters. Helio gave an inline example of a 
file that was again supposed to be in ISO-8859-15, but what came across 
via email here was in fact UTF-8.

If you want to illustrate anything to do with codesets via email, it's 
necessary to zip or tar the files in question; otherwise you have no idea 
what your reader is going to see!

Yes, you are right, and I should know this. I am old enough to remember the times when sending attachments in such encodings caused us many problems.


--
Firma Arista
Ignacio Díaz-Emparanza
Zuzendaria/Director
ignacio.diaz-emparanza@ehu.es
94 6013732
EKONOMIA APLIKATUA III SAILA (EKONOMETRIA ETA ESTATISTIKA)/ DEPARTAMENTO DE ECONOMÍA APLICADA III (ECONOMETRÍA Y ESTADÍSTICA)
UPV/EHU

Avda. Lehendakari Aguirre, 83 | 48015 BILBAO
T.: +34 946013740 | F.: +34 946013754
www.ea3.ehu.es