On Wed, 13 Sep 2017, Allin Cottrell wrote:
> Not sure about this, but my initial reaction is that it may be assuming too
> much about our "discrete" series.
>
> In R, isn't a "factor" a variable that (in gretl parlance) has to be
> "dummified" before use in regression? That is, an arbitrary encoding of a
> qualitative characteristic?
Yes, you're right.
> If so, then I think the above is wrong, since a gretl-discrete series could
> be a perfectly valid (albeit quantized) quantitative variable; for example,
> years of education or number of bedrooms.
>
> But if I'm wrong about what a "factor" is to R, my objection may fall.
> Sorry, I should have added: we now have the facility, under the "setinfo"
> command, of marking a series as "coded". And when we write a "coded" series
> as CSV we quote the numerical values, in response to which R automatically
> treats the series as a "factor". So I think we already have what you're
> aiming at here.
I agree that the mapping to R's factors would be much more accurate if we
used the "coded" bit. However, R doesn't seem to make this distinction
automagically for integer-valued coded series. Example:
<hansl>
nulldata 50
cont1 = normal()
disc1 = floor(uniform(1,5))
disc2 = floor(uniform(4,18))
stringify(disc1, defarray("a", "b", "c", "d")) # string-valued series
list D = disc1 disc2
loop foreach i D
setinfo $i --coded
endloop
foreign language=R --send-data
summary(gretldata);
is.factor(gretldata$disc1);
is.factor(gretldata$disc2);
end foreign
</hansl>
Perhaps we could force R to treat variables as factors via an additional
option to foreign, something like
foreign language=R --send-data --as-factors=X
where X is a list.
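Until something like that exists, a possible manual workaround is to do the
conversion on the R side inside the foreign block. This is only a sketch: it
assumes the data frame is exposed to R as "gretldata" (as in the example
above) and hard-codes the series name, which is exactly what the proposed
option would spare us.

```hansl
nulldata 50
disc2 = floor(uniform(4,18))
setinfo disc2 --coded
foreign language=R --send-data
# disc2 is coded in gretl but arrives in R as a numeric column,
# so coerce it to a factor by hand (hypothetical workaround)
gretldata$disc2 <- factor(gretldata$disc2)
print(is.factor(gretldata$disc2))
end foreign
```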
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------