New subject: NA and nan: next steps

Thursday, 15 April 2010

Can I raise a dissenting voice?  Do you REALLY want to expend the 
effort on distinguishing between NA and NaN in every single procedure 
and (presumably) every function, etc.?  It would be even worse if you 
added +/-Inf.  My reaction is that there are better ways to spend 
time developing the program.

Anyone learning statistics or econometrics rapidly comes across the 
need to deal with missing values of several different 
kinds.  Recognising that lx=log(x) for x <=0 causes a problem is a 
very elementary and early lesson.  Masking it by, for example, 
setting lx=-Inf is likely to mislead.  Since this would be very 
different from what most other programs do, it is likely to generate 
more problems of consistency across analyses performed using 
different packages.

As far as I understand, the original argument was prompted by the 
question: should 0*NA be 0 or NA?  My personal view is that it should 
always be NA, because the user can always override this result by 
explicitly recognising that the NA is really a NaN and using 
conditional generate statements.  Defaulting to a missing value 
whenever there is any doubt is easy for the user to override if a 
different result is required, and it is much safer for the unwary.
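
To make the point above concrete, here is a minimal sketch in Python (not gretl script, and not a description of what gretl does). The names NA, times and times_zero_wins are hypothetical, with None standing in for a missing-value code. The default propagates NA; the explicit override is the analogue of a conditional generate statement.

```python
import math

NA = None  # hypothetical stand-in for a missing-value code


def times(a, b):
    """Default rule: the product is NA if either operand is missing."""
    if a is NA or b is NA:
        return NA
    return a * b


def times_zero_wins(a, b):
    """Explicit override: the user decides that 0 * NA should be 0."""
    if a == 0 or b == 0:
        return 0.0
    return times(a, b)


# For comparison, plain IEEE arithmetic gives nan for 0 * nan and 0 * inf.
ieee_zero_times_nan = 0.0 * float("nan")
ieee_zero_times_inf = 0.0 * float("inf")
```

With this arrangement the cautious result is what the unwary user gets, and anyone who wants 0*NA = 0 has to ask for it by name.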

Gordon Hughes

...
If we want to distinguish between true NAs and nan/inf (as we
probably should), some other design questions come up, as a
consequence of the fact that we would be allowing non-finite
values in series and scalar variables. (Unless, that is, we make
it an error to put non-finite values into such variables.)

I presume that in simple, per observation, calculations such as y
= log(x) or y = x*z we'd want to let IEEE rules prevail, but what
about more complex calculations?

At present we automatically exclude observations with NAs from
regression calculations, means and variances and so on.  Should we
do the same for nan/inf, or should we let IEEE rules prevail -- or
should we add a "set" switch to control this?

A practical use case is this:

series lx = log(x)
ols y 0 lx

where the series x contains non-positive values. Right now the bad
log x values are converted to NA and skipped. If we leave them as
nan or -inf then what should we do?
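
One possible answer to the use case above, sketched in Python rather than gretl script: take the log, then drop any observation that is either NA or non-finite before estimating. This is only an illustration of the "skip them" option, not a statement of what gretl does; log_or_na, complete_cases and ols_simple are hypothetical helper names, None stands in for NA, and the data are made up.

```python
import math

NA = None  # hypothetical stand-in for a missing-value code


def log_or_na(x):
    """Return log(x), or NA when x is missing or non-positive."""
    if x is NA or x <= 0:
        return NA
    return math.log(x)


def complete_cases(y, x):
    """Keep only (y, x) pairs where both values are present and finite."""
    return [
        (yi, xi)
        for yi, xi in zip(y, x)
        if yi is not NA and xi is not NA
        and math.isfinite(yi) and math.isfinite(xi)
    ]


def ols_simple(pairs):
    """Intercept and slope of y on a constant and x, by least squares."""
    n = len(pairs)
    ybar = sum(p[0] for p in pairs) / n
    xbar = sum(p[1] for p in pairs) / n
    sxx = sum((p[1] - xbar) ** 2 for p in pairs)
    sxy = sum((p[1] - xbar) * (p[0] - ybar) for p in pairs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Made-up data: x contains a zero and a negative value.
x = [1.0, math.e, 0.0, -2.0, math.e ** 2]
y = [1.0, 3.0, 5.0, 7.0, 5.0]

lx = [log_or_na(v) for v in x]      # two entries become NA
pairs = complete_cases(y, lx)       # those two observations are skipped
intercept, slope = ols_simple(pairs)
```

The same complete_cases filter also removes nan and +/-inf, so it covers the case where the bad logs are left as non-finite values instead of being converted to NA.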

Allin 

Re: [Gretl-devel] NA and nan: next steps