[Gretl-users] Re: gretl vs Stata: different WLS estimates for weights=0

Wednesday, 20 January 2021

On Wed, 20 Jan 2021, Riccardo (Jack) Lucchetti wrote:

...
 IMO, there's another point we should consider here, if you want to compare
 our results with Stata's, namely computation of the sample size, which may
 become relevant for the computation of the covariance matrix with small
 samples.

 Consider the following example (adapted from Artur's):

 <hansl>
 set verbose off
 open greene12_1.gdt
 list lx = const income ownrent selfemp
 set seed 100

 # generate a weights series with a few zeros in
 z = uniform() < 0.1
 w = abs(normal())
 w0 = z ? 0 : w
 w1 = z ? 1.0e-9 : w

 wls w0 expend lx
 stata_se = $stderr * sqrt(($nobs - $ncoeff)/($T - $ncoeff))
 print stata_se

 wls w1 expend lx

 foreign language=stata --send-data
  	reg expend income ownrent selfemp [aw=w0]
  	reg expend income ownrent selfemp [aw=w1]
 end foreign
 </hansl>

 The two series w0 and w1 are, for all intents and purposes, identical.
 Therefore, the estimates of the coefficients are the same. It just makes
 sense that the standard errors should also be the same (as they are in gretl).

 However, Stata skips zeros when computing the effective sample size: the
 vector stata_se reproduces Stata's algorithm. This introduces an
 inconsistency between Stata's results for w0 and w1: the estimates are the
 same, but the standard errors are quite different. Of course, the
 inconsistency vanishes for large sample sizes, but it has to be taken into
 account when comparing results.

Ah, interesting point. However, I've committed a "fix" which brings 
us in line with Stata (and R). Prior to that, we were netting out 
observations with zero weights from n in calculating standard errors 
if and only if the weights series was a 0/1 dummy. Yet in the model 
printout we were always reporting a number of observations net of 
the zero-weighted ones. Now we always net out points with zero 
weight from the start.
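
For reference, the adjustment in Jack's stata_se line can be written out
as below. This is a reading of the script rather than something stated
in the thread: n is taken to be the full sample size ($nobs), n_0 the
number of zero-weighted observations (so that $T = n - n_0 in that run),
k the number of coefficients, and both programs are assumed to work from
the same weighted sum of squared residuals, so that only the degrees of
freedom differ.

\[
\hat\sigma^2_{\text{full}} = \frac{\hat e' W \hat e}{n - k}, \qquad
\hat\sigma^2_{\text{net}} = \frac{\hat e' W \hat e}{n - n_0 - k}
\quad\Longrightarrow\quad
\mathrm{se}_{\text{net}} = \mathrm{se}_{\text{full}}
  \sqrt{\frac{n - k}{n - n_0 - k}}
\]

Under these assumptions the factor is 1 when n_0 = 0 and tends to 1 as n
grows with n_0/n fixed and small, which matches the remark that the
discrepancy matters mainly in small samples.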

I think this is defensible. Data points with zero weight have no 
effect on the parameter estimates by construction. Points with very 
small weights may or may not have a non-negligible effect on the 
estimates; that'll depend on their leverage as well as the size 
distribution of the weights (perhaps they're all tiny?).

Allin
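
A minimal sketch of how one might check the "no effect by construction"
point, reusing the data file and variable names from Jack's script. The
names b_all, b_pos, se_all, se_pos and cmp are made up for this
illustration. The coefficients should agree up to rounding; whether the
two columns of standard errors also agree depends on whether the gretl
build in use already nets out zero-weighted observations.

<hansl>
set verbose off
open greene12_1.gdt --quiet
list lx = const income ownrent selfemp
set seed 100

# weights series with a few exact zeros, as in the script above
series z = uniform() < 0.1
series w0 = z ? 0 : abs(normal())

# WLS on the full sample, zero weights included
wls w0 expend lx --quiet
matrix b_all = $coeff
matrix se_all = $stderr

# same regression after dropping the zero-weighted observations
smpl w0 > 0 --restrict
wls w0 expend lx --quiet
matrix b_pos = $coeff
matrix se_pos = $stderr
smpl full

printf "max |coefficient difference| = %g\n", maxc(abs(b_all - b_pos))
# standard errors side by side: full sample, then restricted sample
matrix cmp = se_all ~ se_pos
print cmp
</hansl>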
