On Thu, 14 Sep 2017, Riccardo (Jack) Lucchetti wrote:
On Wed, 13 Sep 2017, Allin Cottrell wrote:
>> Not sure about this, but my initial reaction is that it may be assuming
>> too much about our "discrete" series.
>>
>> In R, isn't a "factor" a variable that (in gretl parlance) has to be
>> "dummified" before use in regression? That is, an arbitrary encoding of a
>> qualitative characteristic?
Yes, you're right.
>> If so, then I think the above is wrong, since a gretl-discrete series
>> could be a perfectly valid (albeit quantized) quantitative variable; for
>> example, years of education or number of bedrooms.
>>
>> But if I'm wrong about what a "factor" is to R, my objection may fall.
>
> Sorry, I should have added: we now have the facility, under the "setinfo"
> command, of marking a series as "coded". And when we write a "coded" series
> as CSV we quote the numerical values, in response to which R automatically
> treats the series as a "factor". So I think we already have what you're
> aiming at here.
I agree that the mapping to R's factors would be much more accurate if we used the
"coded" bit. However, R doesn't seem to make this distinction automagically
for integer-valued coded strings. Example:
<hansl>
nulldata 50
cont1 = normal()
disc1 = floor(uniform(1,5))
disc2 = floor(uniform(4,18))
stringify(disc1, defarray("a", "b", "c", "d")) # string-valued series
list D = disc1 disc2
loop foreach i D
setinfo $i --coded
endloop
foreign language=R --send-data
summary(gretldata);
is.factor(gretldata$disc1);
is.factor(gretldata$disc2);
end foreign
</hansl>
Hmm, I see what you mean. Not sure where I got the idea that "quoted
in CSV" means factor to R, but apparently it's not true in general.
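The general point can be checked in isolation. A minimal sketch, using only base R's read.csv on inline text (the column names x and y and the object name d are made up for illustration): by default R interprets quotes in every field and then type-converts the column, so quoted digits still come back as numbers.
<hansl>
foreign language=R
    # two columns: quoted digits (x) and quoted letters (y)
    csv <- c('x,y', '"1","a"', '"2","b"')
    d <- read.csv(text = csv)
    # quoting alone does not turn x into a factor
    print(class(d$x))
    print(is.factor(d$x))
end foreign
</hansl>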
Perhaps we could force R to treat variables as factors via an additional
option to foreign, something like
foreign language=R --send-data --as-factors=X
where X is a list.
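Short of a new option, the conversion can also be done by hand inside the foreign block. A minimal sketch, reusing disc2 from the example above and assuming, as in that example, that the data frame on the R side is named gretldata:
<hansl>
nulldata 50
disc2 = floor(uniform(4,18))
foreign language=R --send-data
    # explicitly convert the integer-coded column to a factor
    gretldata$disc2 <- as.factor(gretldata$disc2)
    print(is.factor(gretldata$disc2))
end foreign
</hansl>
This has to be repeated for every series concerned, which is the manual step an option or an automatic mechanism would remove.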
In git there's now something more automated than that: we have a
variant of what you sketched in
http://lists.wfu.edu/pipermail/gretl-devel/2017-September/007916.html
whereby we send R a matrix that identifies any "coded" series in
gretl as factors for R. (We handle the case where the data passed to
R are a subset of the full dataset, as in --send-data=Rlist.)
Allin