Heckit

Saturday, 26 May 2007

Hello all,

I'm working on the Heckit estimator, and I need some feedback on the 
treatment of missing observations.

Some background first: suppose we want to estimate a model

y = X \beta + u (1)

but we only observe y if another variable d equals one. Assume that

P(d=1) = \Phi(Z \gamma) (2)

As you all know, there are two ways to estimate \beta:

a) Estimate a probit model first for d, compute the Mills ratios and stick 
them into (1), which you estimate by OLS. With an appropriate correction 
of the OLS std. errors, what you get is the "two-step" estimator.

b) Estimate \beta, \gamma and the correlation parameter together by 
maximum likelihood. This is arguably preferable.
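
For reference, the regression behind (a) can be written out as below. This is only a sketch under the textbook assumptions, which the description above does not spell out: d = 1 when Z \gamma + e > 0, and (u, e) are jointly normal with Var(e) = 1, Var(u) = \sigma^2 and correlation \rho. The symbols e, \sigma, \rho, \lambda and \hat{\gamma} are notation introduced here.

```latex
\begin{align*}
  E[y \mid X, Z, d = 1] &= X \beta + \rho \sigma \, \lambda(Z \gamma), \\
  \lambda(z) &= \frac{\phi(z)}{\Phi(z)} .
\end{align*}
```

Step one estimates \gamma from the probit (2); step two regresses y on X and \lambda(Z \hat{\gamma}) by OLS over the observations with d = 1. Since \lambda is evaluated at an estimate rather than at the true \gamma, the plain OLS standard errors need the correction mentioned in (a).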

Now suppose you have some missing observations in X, in Z or both (far 
from unusual in large micro datasets). Obviously, for the ML estimator you 
can only use the observations that have no missing values for any of the 
variables.

With the two-step estimator, however, you may have different samples for 
the two equations (1) and (2): if there are missing data in X only, 
nothing forbids you from estimating (2) on the full sample and then (1) on 
the subset for which you actually have data.
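
To make the two options concrete, here is a minimal Python sketch of how the row sets for (2) and (1) could be chosen. The function names and the `matching` flag are hypothetical, and the rule coded for "matching" (drop a row from the probit whenever X or Z has a missing value) is my own reading, not necessarily what any existing package does.

```python
import math


def complete(row):
    """True if the row contains no None and no NaN."""
    return all(
        v is not None and not (isinstance(v, float) and math.isnan(v))
        for v in row
    )


def heckit_samples(d, X, Z, matching):
    """Return (selection_rows, outcome_rows) as lists of row indices.

    d        : list of 0/1 selection indicators
    X, Z     : lists of rows (lists) of regressors, possibly with missing values
    matching : if True, the probit (2) uses only rows where X and Z are both
               complete; if False, it uses every row where Z is complete.
    The outcome equation (1) always needs d == 1, complete X, and complete Z
    (Z is required to compute the Mills ratio for that row).
    """
    selection_rows, outcome_rows = [], []
    for i, (di, xi, zi) in enumerate(zip(d, X, Z)):
        x_ok, z_ok = complete(xi), complete(zi)
        if z_ok and (x_ok or not matching):
            selection_rows.append(i)
        if di == 1 and x_ok and z_ok:
            outcome_rows.append(i)
    return selection_rows, outcome_rows


# Toy data: X is missing in rows 1 and 3, Z is missing in row 4.
d = [1, 1, 0, 0, 1]
X = [[1.0], [None], [2.0], [float("nan")], [3.0]]
Z = [[0.5], [0.1], [0.2], [0.3], [None]]

print(heckit_samples(d, X, Z, matching=True))   # ([0, 2], [0])
print(heckit_samples(d, X, Z, matching=False))  # ([0, 1, 2, 3], [0])
```

In this toy case the outcome sample is the same either way; the only difference is that the non-matching rule lets rows 1 and 3 contribute to the estimate of \gamma.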

Would this be good or bad? My answer so far is that, on the one hand, using 
all the data for (2) gives you better estimates of \gamma, which in turn 
gives you better estimates of the Mills ratios and hence of \beta. On the 
other hand, this assumes that the probabilistic mechanism which dictates 
which rows of X are missing is independent of everything else; otherwise, 
this could be a VERY bad idea.

For the two-step estimator, Stata uses matching samples. What should WE 
do?

Comments welcome. And, oh, before you say "let the user choose", let me 
just say that yes, this is a possibility, but then, what should the 
default behaviour be?

Riccardo (Jack) Lucchetti
Dipartimento di Economia
Università Politecnica delle Marche

r.lucchetti(a)univpm.it
http://www.econ.univpm.it/lucchetti
