On Tue, 18 Aug 2009, Allin Cottrell wrote:
On Mon, 17 Aug 2009, Hebert Suarez Cahuana wrote:
> I am importing data from Stata into Gretl, but I find that for
> variables whose label contains the letter "ñ", the label cannot be
> imported, although the variable itself can.
If you can send me a copy of the Stata file you're trying to open
I will take a look and see what we can do. The trouble is that I
don't know what encoding Stata uses for accented characters, but
with an example we can perhaps figure that out.
I've now taken a look at this issue (though having a sample
datafile might still be helpful).
The difficulty with handling non-ASCII labels in Stata dta files
is that the format does not handle character encoding properly.
The online Stata documentation for the dta format says "Strings
use ASCII encoding", which is a lie! Apparently, the files use
whatever encoding is in force on the platform where the dta file
was created, but the file contains no record of the encoding used.
See
http://cluelessresearch.com/2007/05/stata-and-accents-diacriticals/
What gretl does with such labels is:
* If a label validates as UTF-8 (which includes ASCII as a
subset), we use the label as is. (I've verified that the Spanish
letter "ñ" comes out right in dta labels if the dta file is
created on a platform that uses UTF-8.)
* Otherwise we try recoding the label from the current locale to
UTF-8. This will work only if the current locale happens to be the
same as that under which the dta file was created.
* New in CVS: if the above does not work we guess that the
encoding of the label might be Windows CP1252, and try converting
from that.
If none of the above works, you're out of luck. Complain to Stata!
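
To make the order of those attempts concrete, here is a rough sketch in Python. It is not gretl's actual code (gretl is written in C); the function name label_to_utf8 and the locale_encoding parameter are made up for illustration, and the parameter exists only so the "current locale" step can be exercised with a known encoding.

```python
import locale


def label_to_utf8(raw, locale_encoding=None):
    """Decode a raw dta label (bytes); return str, or None if every step fails.

    Hypothetical helper: mirrors the three steps described above,
    not gretl's real implementation.
    """
    # Step 1: a label that validates as UTF-8 (ASCII included) is used as is.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Step 2: try recoding from the current locale's encoding.
    # This gives the right text only if the file was written under
    # that same encoding.
    enc = locale_encoding or locale.getpreferredencoding(False)
    try:
        return raw.decode(enc)
    except (UnicodeDecodeError, LookupError):
        pass

    # Step 3: guess Windows CP1252 as a last resort.
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        return None  # out of luck
```

For example, the bytes b"a\xf1o" (the word "año" written under CP1252) fail the UTF-8 check, fail again if the locale is UTF-8, and are recovered by the CP1252 guess. Note that a single-byte locale such as ISO-8859-1 accepts any byte sequence, so step 2 can "succeed" with the wrong characters; that is the sense in which it works only when the locales match.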
Allin Cottrell