On Tue, 1 Oct 2013, Riccardo (Jack) Lucchetti wrote:
> This whole thing needs some more investigation. I guess for consistency we
> should either scrap the --balanced option or reactivate the balancing mode
> that's now disabled. Whenever I try to think this issue through it makes my
> head hurt, but I'll see what I can do...
>
> Personal opinion: if a panel structure is in force, --balanced should be
> implicit in smpl. Rationale: a panel dataset is a panel dataset. It simply
> can't stop being a panel dataset because you are sub-sampling it. If you
> really want it to be interpreted differently, then you have setobs for that.

I hear you, but not all users have that expectation. At one time what the
--balanced option does was the default, but I changed that (a long time
ago now) after hearing from some users that they found the effect
surprising, even disconcerting ("I thought I'd got rid of those rows, but
there they still are, just with NAs substituted").
Anyway, I have a related idea that may be worth floating. When we
"subsample" so as to preserve panelness, we could do this:
1) If the sampling criterion excludes any leading or trailing groups, move
the sample endpoints accordingly.
2) Beyond that, don't mess with the panel structure at all, just write NAs
into all the deselected "interior" rows (while, of course, backing up the
original data from those rows).
I think this would be a more robust variant on what we have at present.
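
To make steps 1) and 2) concrete, here is a rough sketch in Python. It is
not gretl code: the function names, the row-list layout (units stacked one
after another, T rows each) and the use of NaN as the NA marker are all
assumptions made for illustration only.

```python
import math

NA = float("nan")  # assumption: NaN stands in for a missing value


def panel_subsample(data, n_units, T, keep):
    """Restrict a stacked panel while preserving its structure.

    data    : list of rows (lists of floats), unit 0 first, T rows per unit
    keep    : one boolean per row, True if the row satisfies the criterion
    Returns : (t1, t2, backup), the new inclusive sample endpoints plus a
              dict holding the original contents of the rows set to NA.
    """
    if not (len(data) == len(keep) == n_units * T):
        raise ValueError("data and keep must both have n_units * T rows")

    # A unit survives if at least one of its rows is selected.
    unit_kept = [any(keep[u * T:(u + 1) * T]) for u in range(n_units)]
    if not any(unit_kept):
        raise ValueError("the criterion excludes every row")

    # Step 1: drop leading and trailing units by moving the endpoints.
    first = unit_kept.index(True)
    last = n_units - 1 - unit_kept[::-1].index(True)
    t1, t2 = first * T, (last + 1) * T - 1

    # Step 2: inside the new range, back up and blank deselected rows.
    backup = {}
    for i in range(t1, t2 + 1):
        if not keep[i]:
            backup[i] = list(data[i])
            data[i] = [NA] * len(data[i])
    return t1, t2, backup


def restore_rows(data, backup):
    """Undo step 2 by writing the saved rows back in place."""
    for i, row in backup.items():
        data[i] = list(row)
```

With 4 units of 2 periods each and a criterion that drops all of unit 0,
one row of unit 1 and all of unit 2, the sample range shrinks to units 1-3
and only the three deselected interior rows are overwritten with NAs.
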
Better still would be to construct a binary mask which would be applied
such that all deselected rows were ignored in all statistical procedures.
That would be a lot more work, but worth considering for 2.0.
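
For comparison, a minimal sketch of the mask idea, again in Python and
again purely illustrative (masked_mean is a made-up name, not an existing
gretl function). The point is that the data are never modified: each
procedure consults the mask and skips the deselected rows.

```python
def masked_mean(values, mask):
    """Mean of the values whose mask entry is True; data left untouched."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    selected = [v for v, m in zip(values, mask) if m]
    if not selected:
        raise ValueError("the mask deselects every row")
    return sum(selected) / len(selected)
```

Every statistical routine would need the same treatment, which is where
the extra work comes from, but nothing has to be backed up or restored.
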
Allin