Allin Cottrell wrote:
On Sat, 17 Apr 2010, Lee Adkins wrote:
> > OK, they may make _some_ distinction, but if they evaluate 0*NA as
> > NA (as we've heard) then they are not doing it right.
>
> Except if the 0 is a value of a dummy variable. In that case
> 0*NA = NA: 0 is not really zero in ordinal data--it's arbitrary
> and just indicates a category. Otherwise recoding the dummy
> variables changes the effective data set and leads to unexpected
> results.
Good point. Just thinking it through: If one had (D1: female = 1,
male = 0) and (D2: married = 1, unmarried = 0), and for some
individuals D2 was NA, and then one created an interaction D1*D2,
one would wrongly categorize males of unknown marital status as
unmarried males, if 0*NA = 0. I'll have to think about that.
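The case described above can be sketched in a few lines of Python. This is an illustration only, not gretl code or gretl's implementation: `None` stands in for NA, and the helper `interact` and its `zero_absorbs` flag are hypothetical names made up for this example.

```python
# Illustration only: None stands in for NA; this is not gretl code.

def interact(d1, d2, zero_absorbs):
    """Product of two dummies under one of two NA rules.

    zero_absorbs=True  -> 0*NA evaluates to 0
    zero_absorbs=False -> any NA operand yields NA
    """
    if d1 is None or d2 is None:
        if zero_absorbs and (d1 == 0 or d2 == 0):
            return 0
        return None
    return d1 * d2

# D1: female = 1, male = 0; D2: married = 1, unmarried = 0, None = unknown
people = {
    "unmarried male": (0, 0),
    "male, marital status unknown": (0, None),
    "female, marital status unknown": (1, None),
}

for label, (d1, d2) in people.items():
    # Columns: label, D1*D2 with 0*NA = 0, D1*D2 with 0*NA = NA
    print(label, interact(d1, d2, True), interact(d1, d2, False))
```

Under the first rule the male of unknown marital status gets the same interaction value as the unmarried male; under the second rule his interaction value stays NA.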
Yep, this whole thread started when I was handling dummies and got
unexpected results like that.
Now Lee points out that gretl assumes that NAs are (unknown) numbers on
a cardinal scale. But if you allow me to rephrase myself a little (and
without referring to NaNs this time, which is a related but slightly
different issue):
I am actually also worried about assuming _any_ type of number for NAs.
What does the abbreviation "NA" stand for in English? I can think of at
least two meanings: "not available" or "not applicable". The
interpretation "not available" may well allow treating NAs as numbers,
but I really think that "not applicable" will usually not be hiding a
meaningful number underneath. And it seems to me that missing values in
economic/social statistics quite often carry the meaning of "not
applicable". At least they did in my unbalanced panel data set where I
encountered the problems.
All in all, while the cleanest solution may be to introduce and properly
differentiate NAs, NaNs, Infs and all the rest, wouldn't the easiest way
simply be to make any operation on NA return NA, including 0*NA=NA?
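As a rough sketch of that rule (in Python, purely for illustration; gretl's internals may look nothing like this), a sentinel object can absorb every arithmetic operation. The class name `Missing` and the constant `NA` are made up for this example.

```python
# Illustration only: a missing-value sentinel that propagates through arithmetic.

class Missing:
    """Sentinel for a missing value: every arithmetic operation returns NA."""

    def _propagate(self, other):
        # Whatever the other operand is (including 0), the result is still NA.
        return self

    __add__ = __radd__ = _propagate
    __sub__ = __rsub__ = _propagate
    __mul__ = __rmul__ = _propagate
    __truediv__ = __rtruediv__ = _propagate

    def __repr__(self):
        return "NA"


NA = Missing()

print(0 * NA)  # int cannot multiply a Missing, so Python falls back to NA.__rmul__
print(NA + 5)
print(1 / NA)
```

With this rule there is no special case for zero, so recoding a dummy variable cannot change which observations end up missing.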
cheers,
sven