On Mon, 18 Jul 2011, Allin Cottrell wrote:
> I'm thinking it might be good to revise the functions
> firstobs() and lastobs() so they restrict their checks for
> non-missing values to the current sample range (right now they
> scan the entire dataset).
> This would be backward incompatible, but I'd be surprised if
> it would cause much trouble. Although it's so stated in the
> manual, I think it's unintuitive for function writers that
> firstobs() and lastobs() can give you observation indices that
> are outside of the sample passed by the caller, and therefore
> inaccessible.
> Any support/objections?
The more I think about this, the more I get confused.
Let me start with the easy part: Allin's proposal makes a lot of sense to
me, and I am 100% in favour. If you absolutely need to know what data you
have regardless of the current sample range, do a "smpl full" before the
real work begins, store the two results in scalars, and go ahead.
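To make the difference concrete, here is a rough sketch in Python rather
than gretl script. It is not how gretl implements these functions; the
names first_obs, last_obs, t1 and t2 are made up for the illustration,
indices are 1-based as in gretl, and None stands in for NA.

```python
def first_obs(y, t1=1, t2=None):
    """1-based index of the first non-missing value of y within [t1, t2]."""
    t2 = len(y) if t2 is None else t2
    for t in range(t1, t2 + 1):
        if y[t - 1] is not None:
            return t
    return None  # no valid observation in the range


def last_obs(y, t1=1, t2=None):
    """1-based index of the last non-missing value of y within [t1, t2]."""
    t2 = len(y) if t2 is None else t2
    for t in range(t2, t1 - 1, -1):
        if y[t - 1] is not None:
            return t
    return None  # no valid observation in the range


# Hypothetical series with 8 observations; None marks a missing value.
y = [None, 2.0, 3.5, None, 4.1, 5.0, None, None]

# Current behaviour: scan the entire dataset, whatever the sample is.
full = (first_obs(y), last_obs(y))

# Proposed behaviour: scan only the caller's sample, here assumed to be 4..8.
sub = (first_obs(y, 4, 8), last_obs(y, 4, 8))

print(full)  # (2, 6)
print(sub)   # (5, 6)
```

With the sample set to 4..8, the full-dataset scan reports 2 as the first
valid observation, which lies outside the sample; the restricted scan
reports 5, which the caller can actually access.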
That said, I've begun to wonder how useful those two functions are in the
first place. Clearly, they're no use for cross-section data. How about
panel datasets? Either they return something entirely different (vectors,
maybe?) or they're no use in that case either. Moreover, I would guess
that most people use firstobs() and lastobs() in some sort of loop-based
algorithm that deals with time-series data. In that case, either you're
absolutely sure there are no NAs between firstobs() and lastobs(), or you
need to put some sort of check in place. But if you need the check
anyway, what's the use of firstobs() and lastobs()?
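The kind of check I have in mind looks roughly like this, again sketched
in Python rather than gretl script. The helper name interior_gaps is made
up, and the bounds 2 and 6 are simply what a full-dataset first/last scan
would give for this hypothetical series (None stands in for NA).

```python
def interior_gaps(y, t_first, t_last):
    """1-based indices strictly between t_first and t_last where y is missing."""
    return [t for t in range(t_first + 1, t_last) if y[t - 1] is None]


# Hypothetical series: first valid observation at 2, last at 6.
y = [None, 2.0, 3.5, None, 4.1, 5.0, None, None]

gaps = interior_gaps(y, 2, 6)

# A loop over 2..6 still has to skip the hole at t = 4.
usable = [t for t in range(2, 7) if y[t - 1] is not None]

print(gaps)    # [4]
print(usable)  # [2, 3, 5, 6]
```

Knowing the first and last valid observations narrows the loop bounds,
but it does not remove the need to test each observation inside them.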
Note: I'm not making a definite proposal (other than agreeing with Allin's
original point). I'd just like to hear your opinions on this.
Riccardo (Jack) Lucchetti
Dipartimento di Economia
Università Politecnica delle Marche
r.lucchetti(a)univpm.it
http://www.econ.univpm.it/lucchetti