Can I raise a dissenting voice? Do you REALLY want to expend the
effort to distinguish between NA and NaN in every single procedure
and (presumably) every function, etc? It would be even worse if you
added +/-Inf. My reaction is that there are better ways to spend
time in developing the program.

Anyone learning statistics or econometrics rapidly comes across the
need to deal with missing values of several different
kinds. Recognising that lx=log(x) for x <= 0 causes a problem is a
very elementary and early lesson. Masking it by, for example,
setting lx=-Inf is likely to mislead. Since this would be very
different from what most other programs do, it is likely to generate
more problems of consistency across analyses performed using

As far as I understand, the original argument was generated by the
question: should 0*NA be 0 or NA? My personal view is that it should
always be NA because it is always possible for the user to override
this result by explicitly recognising that NA is really NaN and using
conditional generate statements. The default of generating a missing
value when there is any doubt is easily addressed by the user if a
different result is required, but it is much safer for the unwary.
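
To make the 0*NA question concrete, here is a minimal Python sketch (not the program's own code). The names NA and na_mul are hypothetical, invented only for illustration, and NA is modelled as Python's None. It contrasts the conservative "missing wins" rule with what plain IEEE 754 arithmetic does for NaN and Inf.

```python
import math

NA = None  # hypothetical stand-in for a "true" missing value


def na_mul(a, b):
    """Conservative product: the result is NA if either operand is NA."""
    if a is NA or b is NA:
        return NA
    return a * b


# Conservative rule: 0 * NA stays NA.
zero_times_na = na_mul(0.0, NA)

# IEEE 754 arithmetic: 0 * NaN and 0 * Inf are both NaN, never 0.
zero_times_nan = 0.0 * math.nan
zero_times_inf = 0.0 * math.inf
```

Under either convention the default result is "not a usable number"; a user who really wants 0 has to say so explicitly, which is the point made above.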
If we want to distinguish between true NAs and nan/inf (as we
probably should), some other design questions come up, as a
consequence of the fact that we would be allowing non-finite
values in series and scalar variables. (Unless, that is, we make
it an error to put non-finite values into such variables.)

I presume that in simple, per observation, calculations such as y
= log(x) or y = x*z we'd want to let IEEE rules prevail, but what
about more complex calculations?
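
As a rough illustration of "letting IEEE rules prevail" for a per-observation calculation, the Python sketch below mimics the usual C library convention for log: -Inf at zero and NaN for negative arguments. The helper name ieee_log is an assumption made for this example; it is needed because Python's own math.log raises an exception for non-positive arguments rather than returning a non-finite value.

```python
import math


def ieee_log(x):
    """Natural log that returns non-finite values instead of raising."""
    if math.isnan(x) or x < 0:
        return math.nan      # log of a negative number (or NaN) is NaN
    if x == 0:
        return -math.inf     # log(0) is -Inf
    return math.log(x)


# One "series" with a good value, a zero and a negative value.
x = [2.0, 0.0, -1.0]
lx = [ieee_log(v) for v in x]
```

Here lx holds one finite value, one -Inf and one NaN, which is exactly the mixture a series would have to be able to store if such values were not converted to NA.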
At present we automatically exclude observations with NAs from
regression calculations, means and variances and so on. Should we
do the same for nan/inf, or should we let IEEE rules prevail -- or
should we add a "set" switch to control this?
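
The two policies for summary statistics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the skip_nonfinite flag are hypothetical, standing in for whatever "set" switch might be chosen, and NaN is used here to represent any bad observation.

```python
import math


def mean(values, skip_nonfinite=True):
    """Mean of a series under one of two policies.

    skip_nonfinite=True  : drop NaN/Inf observations, as is done for NA now.
    skip_nonfinite=False : let IEEE rules prevail, so one bad value
                           contaminates the whole result.
    """
    if skip_nonfinite:
        values = [v for v in values if math.isfinite(v)]
    if not values:
        return math.nan  # nothing usable left
    return sum(values) / len(values)


series = [1.0, math.nan, 3.0]
skipped = mean(series)                           # uses the 2 finite values
propagated = mean(series, skip_nonfinite=False)  # NaN under IEEE rules
```

With the skipping policy the mean is computed from the two finite observations; with IEEE rules the single NaN makes the result NaN, and a single -Inf would make it -Inf.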
A practical use case is this:

  series lx = log(x)
  ols y 0 lx

where the series x contains non-positive values. Right now the bad
log x values are converted to NA and skipped. If we leave them as
nan or -inf then what should we do?
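
One possible answer, sketched in Python purely for illustration, is to treat nan and -inf like NA at estimation time and drop the affected observations. The function name simple_ols and the data are made up for this example; it fits y on a constant and one regressor by the textbook closed-form formulas and is not the program's estimator.

```python
import math


def simple_ols(y, x):
    """OLS of y on a constant and x, skipping non-finite observations.

    Returns (intercept, slope, number of observations used).
    """
    pairs = [(yi, xi) for yi, xi in zip(y, x)
             if math.isfinite(yi) and math.isfinite(xi)]
    n = len(pairs)
    ybar = sum(yi for yi, _ in pairs) / n
    xbar = sum(xi for _, xi in pairs) / n
    sxx = sum((xi - xbar) ** 2 for _, xi in pairs)
    sxy = sum((xi - xbar) * (yi - ybar) for yi, xi in pairs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope, n


# lx as it would look if the bad logs were left as -inf and nan.
lx = [0.0, 1.0, -math.inf, math.nan, 2.0]
y = [1.0, 3.0, 9.0, 9.0, 5.0]
intercept, slope, n_used = simple_ols(y, lx)
```

If instead the non-finite values were passed straight into the sums, every estimate would come out as nan, so some explicit policy is needed either way.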