On Thu, 10 Feb 2005, Allin Cottrell wrote:
> For instance there's a "black" dummy variable in your model. Since
> nobody changed their color over the 8 years, the deviation from the
> "group mean" is always zero for this variable. I need to devise a
> way of handling such issues of perfect collinearity.
> Of course, the same problem arises if you have a variable like
> "black" and try to include a dummy variable specific to each person
> -- it's just that the problem may not be so apparent.
> If anyone on the list has thoughts on the Right Way to handle this
> issue, I'd be very glad to hear them!
Well, this is a known problem in panel data estimation. Collinearity here
is a by-product of under-identification. Allow me a brief
recap of the econometric issues involved (I'm going to use LaTeX
notation, sorry for those unfamiliar):
In general, a static panel data model can be written as:
y_{it} = X_{it} \beta + Z_i \gamma + \alpha_i + \epsilon_{it}
where the X's are time-varying regressors and the Z's are time-invariant
(like "black" in the example); note that Z also includes the constant
term. The fundamental difference between the within estimator and the GLS
estimator is in the treatment of the individual effects \alpha_i.
- the within estimator treats the \alpha's as given. No assumptions are
made on the distribution of the individual effects, and the \alpha's are
treated, for all purposes, as nuisance parameters. Since Z_i \gamma does
not vary over time, it cannot be told apart from \alpha_i, so \gamma is
unidentified. In this context, the only parameter it makes sense to
estimate is \beta.
- in GLS estimation, we introduce an additional hypothesis on the
distribution of the \alpha's, that is:
E(\alpha_i | X_{it} , Z_i) = 0 ;
under this hypothesis, it becomes possible to estimate not only \beta, but
\gamma as well, and GLS just happens to be the best linear unbiased
estimator in this case; some estimators have been devised for dealing
with the case where the \alpha_i's are correlated with the explanatory
variables, but I consider it a niche market for gretl, at least for now.
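To spell out why \gamma drops out of the within estimator, here is the
usual group-demeaning step applied to the model above. The barred symbols
are shorthand I'm introducing here for the time averages of unit i; they
are not notation used elsewhere in this message.

```latex
\begin{align*}
\bar{y}_i &= \bar{X}_i \beta + Z_i \gamma + \alpha_i + \bar{\epsilon}_i \\
y_{it} - \bar{y}_i &= (X_{it} - \bar{X}_i) \beta
                      + (\epsilon_{it} - \bar{\epsilon}_i)
\end{align*}
```

Both Z_i \gamma and \alpha_i cancel in the subtraction, so the demeaned
equation carries information on \beta only.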
This is the reason why the Hausman test commonly used just contrasts the
within and GLS estimates of \beta (as opposed to \beta AND \gamma):
simply, no estimator of \gamma exists unless we make some assumptions on
\alpha_i.
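For reference, the contrast is usually written as the quadratic form
below. The hats, the W and GLS subscripts and the V(.) notation for the
estimated covariance matrices are my own shorthand, and k stands for the
number of time-varying regressors.

```latex
H = (\hat{\beta}_{W} - \hat{\beta}_{GLS})'
    \left[ V(\hat{\beta}_{W}) - V(\hat{\beta}_{GLS}) \right]^{-1}
    (\hat{\beta}_{W} - \hat{\beta}_{GLS})
```

Under the hypothesis E(\alpha_i | X_{it} , Z_i) = 0, H is asymptotically
\chi^2 with k degrees of freedom.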
I feel that we ought to have a panel estimation command that, given a list
of regressors:
1) spits out the OLS (pooled) estimate;
2) performs within estimation with time-varying regressors only, and
prints the results out;
3) does GLS for all regressors;
4) computes and prints the Breusch-Pagan poolability test (aka OLS vs
rest of the world) and the Hausman test (aka GLS vs within).
Ideally, I would like to have a command like (ok, I know the command
"panel" is already taken):
panel y const x1 x2 z1 x3
having gretl figure out which regressors belong to the X's and which
don't. An acceptable alternative could be
panel y x1 x2 x3 ; const z1
where it's the user's responsibility to separate the two sets.
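As for having the program figure out which regressors belong to the X's, a
rough sketch of the check is below. It is written in Python purely for
illustration and is not gretl code; the function name split_regressors,
its arguments and the tolerance are all made up by me. A regressor is
assigned to the Z's if it never changes within any unit, which is the same
as saying that all its deviations from the group mean are zero.

```python
from collections import defaultdict


def split_regressors(unit_ids, columns, tol=1e-12):
    """Partition regressors into time-varying (X) and time-invariant (Z).

    unit_ids: one unit identifier per observation (hypothetical layout)
    columns:  dict mapping regressor name -> list of values, same length
    tol:      assumed numerical tolerance for "does not vary"
    """
    # Collect the row indices belonging to each cross-sectional unit.
    rows_by_unit = defaultdict(list)
    for row, unit in enumerate(unit_ids):
        rows_by_unit[unit].append(row)

    time_varying, time_invariant = [], []
    for name, values in columns.items():
        # A regressor is time-varying if its range within at least one
        # unit exceeds the tolerance.
        varies = any(
            max(values[r] for r in rows) - min(values[r] for r in rows) > tol
            for rows in rows_by_unit.values()
        )
        (time_varying if varies else time_invariant).append(name)
    return time_varying, time_invariant


# Toy data: two units observed over three periods each.
unit_ids = [1, 1, 1, 2, 2, 2]
columns = {
    "const": [1, 1, 1, 1, 1, 1],
    "x1": [0.5, 0.7, 0.9, 1.1, 1.0, 1.4],
    "black": [0, 0, 0, 1, 1, 1],
}
x_names, z_names = split_regressors(unit_ids, columns)
print(x_names, z_names)
```

On this toy data the constant and "black" end up among the Z's, and x1
among the X's. The within step would then use x_names only, while GLS
would use both lists.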
Thoughts, anybody?
Riccardo `Jack' Lucchetti
Dipartimento di Economia
Università di Ancona
jack(a)dea.unian.it
http://www.econ.unian.it/lucchetti