On Wed, 20 Jan 2021, Allin Cottrell wrote:
> However, Stata skips zeros when computing the effective sample size: the
> vector stata_se reproduces Stata's algorithm. This introduces an
> inconsistency in Stata when using the w1 series. The estimates are the
> same, but the standard errors are quite different. Of course, the
> inconsistency vanishes for large sample sizes, but it has to be taken into
> account when comparing results.
Ah, interesting point. However, I've committed a "fix" which brings us in
line with Stata (and R). Prior to that, we were netting out observations with
zero weights from n in calculating standard errors if and only if the weights
series was a 0/1 dummy. Yet in the model printout we were always reporting a
number of observations net of the zero-weighted ones. Now we always net out
points with zero weight from the start.
I think this is defensible. Data points with zero weight have no effect on
the parameter estimates by construction. Points with very small weights may
or may not have a non-negligible effect on the estimates; that'll depend on
their leverage as well as the size distribution of the weights (perhaps
they're all tiny?).
Hm. To be honest, I'm not 100% comfortable with this. My gut feeling is
that using weights to surreptitiously perform subsampling feels a bit like
cheating. I don't much like the idea that you can dramatically alter
an essential statistic such as a hypothesis test simply by replacing 0
with 1.0e-9 or vice versa.
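
To make the point concrete, here is a minimal sketch in Python/NumPy (not
gretl code). It assumes the "net out zero weights" convention described
above; the function name wls, the simulated data and the 10/10 split are
purely illustrative assumptions, not anything taken from gretl or Stata.

```python
import numpy as np

def wls(y, X, w, drop_zero_weights=True):
    """WLS as OLS on data scaled by sqrt(w).

    Returns (coefficients, standard errors, n used for the df correction).
    Hypothetical helper for illustration only.
    """
    y, X, w = np.asarray(y), np.asarray(X), np.asarray(w)
    if drop_zero_weights:
        # Assumed convention: zero-weighted points leave the sample entirely.
        keep = w != 0
        y, X, w = y[keep], X[keep], w[keep]
    s = np.sqrt(w)
    ys, Xs = y * s, X * s[:, None]
    XtX_inv = np.linalg.inv(Xs.T @ Xs)
    b = XtX_inv @ Xs.T @ ys
    resid = ys - Xs @ b
    n, k = Xs.shape
    sigma2 = resid @ resid / (n - k)          # df correction uses n
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return b, se, n

# Simulated data: 20 observations, intercept plus one regressor.
rng = np.random.default_rng(12345)
n_obs = 20
x = rng.normal(size=n_obs)
X = np.column_stack([np.ones(n_obs), x])
y = 1.0 + 0.5 * x + rng.normal(size=n_obs)

# Same weights, except the last 10 are exactly 0 in one case
# and 1.0e-9 in the other.
w_zero = np.ones(n_obs)
w_zero[10:] = 0.0
w_tiny = w_zero.copy()
w_tiny[10:] = 1.0e-9

b0, se0, n0 = wls(y, X, w_zero)
b1, se1, n1 = wls(y, X, w_tiny)

print("n used:        ", n0, n1)
print("coefficients:  ", b0, b1)
print("se ratio:      ", se1 / se0)
```

The coefficients agree to many digits, but n goes from 10 to 20, so the
degrees of freedom go from 8 to 18 and the standard errors shrink by a
factor of roughly sqrt(8/18), about 0.67, with essentially no change in the
information used.
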
In my view, if you want some observations to be out of your sample, you
should do it the "proper" way, that is, by using the smpl command. What
Stata does applies when you use the so-called "analytic" weights, which (if
I'm not mistaken) are meant to be used in the context of stratified
samples, so you would expect the weighting variable to hold values
typically larger than 1. In that context, zero weight essentially means
"nobody". But in econometrics, WLS can be used in contexts very different
from stratified samples (e.g. heteroskedasticity). In those contexts, I'm
not really at ease with the idea that 0 and 1.0e-9 mean _very_ different
things. But maybe I just have to get used to this.
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------