On Wed, 20 Jan 2021, Riccardo (Jack) Lucchetti wrote:
IMO, there's another point we should consider here, if you want to compare
our results with Stata's, namely computation of the sample size, which may
become relevant for the computation of the covariance matrix with small
samples.
Consider the following example (adapted from Artur's):
<hansl>
set verbose off
open greene12_1.gdt
list lx = const income ownrent selfemp
set seed 100
# generate a weights series with a few zeros in
z = uniform() < 0.1
w = abs(normal())
w0 = z ? 0 : w
w1 = z ? 1.0e-9 : w
wls w0 expend lx
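# rescale the standard errors to the degrees of freedom Stata uses,
# i.e. with the zero-weight observations excluded from the count
stata_se = $stderr * sqrt(($nobs - $ncoeff)/($T - $ncoeff))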
print stata_se
wls w1 expend lx
foreign language=stata --send-data
reg expend income ownrent selfemp [aw=w0]
reg expend income ownrent selfemp [aw=w1]
end foreign
</hansl>
The two series w0 and w1 are, for all intents and purposes, identical.
Therefore, the estimates of the coefficients are the same. It only makes
sense that the standard errors should also be the same, as they are in gretl.
However, Stata skips zeros when computing the effective sample size: the
vector stata_se reproduces Stata's algorithm. This introduces an
inconsistency in Stata between the w0 and w1 results: the estimates are the
same, but the standard errors are quite different. Of course, the
inconsistency vanishes for large sample sizes, but it has to be taken into
account when comparing results.
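The rescaling behind stata_se can be written out as follows. This is a sketch, assuming (as the script's correction factor implies) that both programs arrive at the same weighted residual sum of squares and differ only in the observation count m used for the degrees of freedom. The symbols are introduced here for illustration: n is the full sample size ($nobs), n_+ is the number of observations with nonzero weight ($T in the script) and k is the number of coefficients ($ncoeff).

```latex
\widehat{\mathrm{se}}_j(m)
  = \sqrt{\frac{\sum_i w_i \hat{e}_i^{\,2}}{m - k}\,
          \bigl[(X'WX)^{-1}\bigr]_{jj}},
\qquad
\frac{\widehat{\mathrm{se}}_j(n_+)}{\widehat{\mathrm{se}}_j(n)}
  = \sqrt{\frac{n - k}{n_+ - k}}
```

With purely illustrative numbers, n = 100, k = 4 and 10 zero weights give a ratio of sqrt(96/86), roughly 1.06, so the gap is visible in small samples and shrinks towards 1 as n grows.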
Ah, interesting point. However, I've committed a "fix" which brings
us in line with Stata (and R). Prior to that, we were netting out
observations with zero weights from n in calculating standard errors
if and only if the weights series was a 0/1 dummy. Yet in the model
printout we were always reporting a number of observations net of
the zero-weighted ones. Now we always net out points with zero
weight from the start.
I think this is defensible. Data points with zero weight have no
effect on the parameter estimates by construction. Points with very
small weights may or may not have a non-negligible effect on the
estimates; that'll depend on their leverage as well as the size
distribution of the weights (perhaps they're all tiny?).
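In formula terms, this is the textbook WLS algebra rather than a description of gretl's internals (x_i denotes the regressor vector for observation i, W = diag(w_1, ..., w_n)):

```latex
\hat{\beta}
  = \Bigl(\sum_{i=1}^{n} w_i x_i x_i'\Bigr)^{-1} \sum_{i=1}^{n} w_i x_i y_i
  = \Bigl(\sum_{i:\,w_i>0} w_i x_i x_i'\Bigr)^{-1} \sum_{i:\,w_i>0} w_i x_i y_i,
\qquad
h_i = w_i\, x_i' (X'WX)^{-1} x_i
```

Terms with w_i = 0 drop out of both sums exactly, so those points cannot move the estimates. The weighted leverage h_i is unchanged if all weights are multiplied by a common constant, so a weight of 1.0e-9 is negligible only relative to the other weights, not in absolute terms.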
Allin