I totally agree with Gordon, especially on 0*NA = NA to avoid unwanted results.
I did read Allin's explanation on this, and he is also right. Maybe the user should be asked what to do in these specific situations, or a warning should be issued about the automated decision.
For advanced users, some parameter, flag or environment variable could set the desired behavior.
Hélio
Can I raise a dissenting voice? Do you REALLY want to expend the
effort of distinguishing between NA and NaN in every single procedure
and (presumably) every function, etc.? It would be even worse if you
added +/-Inf. My reaction is that there are better ways to spend
time in developing the program.
Anyone learning statistics or econometrics rapidly comes across the
need to deal with missing values of several different
kinds. Recognising that lx=log(x) for x <= 0 causes a problem is a
very elementary and early lesson. Masking it by, for example,
setting lx=-Inf is likely to mislead. Since this would be very
different from what most other programs do, it is likely to generate
more problems of consistency across analyses performed using
different packages.
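
To make the concern concrete, here is a minimal sketch in Python (not gretl script, and not how gretl is implemented). The names NA, log_ieee, log_na and mean_skipping_na are hypothetical, and None merely stands in for a missing value. It contrasts letting IEEE-style values through with marking non-positive arguments as missing and skipping them:

```python
import math

NA = None  # hypothetical stand-in for a missing value, not gretl's internal code

def log_ieee(x):
    """log with IEEE-style results: -inf at zero, nan for negative input."""
    if x > 0:
        return math.log(x)
    return -math.inf if x == 0 else math.nan

def log_na(x):
    """log that reports a missing value for any non-positive argument."""
    return math.log(x) if x > 0 else NA

def mean_skipping_na(values):
    """Mean over the non-missing observations only."""
    kept = [v for v in values if v is not NA]
    return sum(kept) / len(kept)

x = [1.0, math.e, 0.0, -2.0]

ieee = [log_ieee(v) for v in x]      # two finite values, then -inf and nan
with_na = [log_na(v) for v in x]     # two finite values, then two NAs

ieee_mean = sum(ieee) / len(ieee)    # not finite: nan contaminates the sum
na_mean = mean_skipping_na(with_na)  # about 0.5, from the two valid observations
```

Under the first convention a single bad observation silently turns the summary statistic into a non-number, while under the second the bad observations are dropped and the result stays usable.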
As far as I understand, the original argument was prompted by the
question: should 0*NA be 0 or NA? My personal view is that it should
always be NA because it is always possible for the user to override
this result by explicitly recognising that NA is really NaN and using
conditional generate statements. The default of generating a missing
value when there is any doubt is easily addressed by the user if a
different result is required, but it is much safer for the unwary.
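
A minimal sketch of that default-plus-override pattern, written in Python purely for illustration (it is not gretl script, and mul and mul_zero_wins are hypothetical names; None stands in for a missing value):

```python
NA = None  # hypothetical stand-in for a missing value

def mul(a, b):
    """Default rule: the product is missing if either operand is missing."""
    if a is NA or b is NA:
        return NA
    return a * b

def mul_zero_wins(a, b):
    """Explicit user override: a known zero forces the product to zero."""
    if a == 0 or b == 0:
        return 0.0
    return mul(a, b)

d = [0.0, 1.0, 0.0]
z = [NA, NA, 3.0]

default = [mul(a, b) for a, b in zip(d, z)]             # [NA, NA, 0.0]
override = [mul_zero_wins(a, b) for a, b in zip(d, z)]  # [0.0, NA, 0.0]
```

The safe rule is what you get without doing anything. The user who knows that a zero should dominate has to say so explicitly, which is the conditional-generate step described above.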
Gordon Hughes
>If we want to distinguish between true NAs and nan/inf (as we
>probably should), some other design questions come up, as a
>consequence of the fact that we would be allowing non-finite
>values in series and scalar variables. (Unless, that is, we make
>it an error to put non-finite values into such variables.)
>
>I presume that in simple, per observation, calculations such as y
>= log(x) or y = x*z we'd want to let IEEE rules prevail, but what
>about more complex calculations?
>
>At present we automatically exclude observations with NAs from
>regression calculations, means and variances and so on. Should we
>do the same for nan/inf, or should we let IEEE rules prevail -- or
>should we add a "set" switch to control this?
>
>A practical use case is this:
>
>series lx = log(x)
>ols y 0 lx
>
>where the series x contains non-positive values. Right now the bad
>log x values are converted to NA and skipped. If we leave them as
>nan or -inf then what should we do?
>
>Allin
_______________________________________________
Gretl-devel mailing list
Gretl-devel@lists.wfu.edu
http://lists.wfu.edu/mailman/listinfo/gretl-devel