On Sat, 11 Apr 2009, Riccardo (Jack) Lucchetti wrote:

> On Sat, 11 Apr 2009, Allin Cottrell wrote:
>
>> Yes, I like it. But note that at present it produces a horrid
>> mess for two-step heckit since the covariance matrix is stuffed
>> with NAs/nans. I guess we should be able to fix that up without
>> too much difficulty.
>
> Not really, if the coefficients we test for 0 are only those for
> the main equation, leaving the selection equation alone (as I
> think we should: the selection equation may be interesting in
> its own right, but the model you care about is the main
> equation); $vcv is block diagonal, so we should be ok.

OK, granted; you just have to be careful to limit the test to the
main equation.
I do have one reservation. As you put it, one typically wants the
R^2 as a quick check on whether "a particular model contains any
explanatory variables worth keeping". Yes, there's that, but one
also wants a simple measure of "goodness of fit", and the two can
diverge. Here's a silly tsls example:
<script>
open data4-10
ols ENROLL 0 2 3
tsls ENROLL 0 2 3 ; 3 4 5 6
# Wald-based R^2: joint test of the slope coefficients
matrix b = $coeff[2:]
matrix V = $vcv[2:,2:]
scalar W = qform(b', invpd(V))
scalar R2_wald = W / (W + $T - $ncoeff)
# correlation-based R^2, as we print currently
scalar R2_corr = corr(ENROLL, $yhat)^2
printf "Wald-based R^2 = %g, correlation R^2 = %g\n", R2_wald, R2_corr
</script>
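For what it's worth, for plain OLS the two definitions coincide exactly: the joint Wald statistic for the slopes works out to (T-k) R^2/(1-R^2), so W/(W + T - k) just recovers the ordinary R^2. A quick sanity check of the algebra, in numpy with made-up data (not hansl, for convenience):

```python
import numpy as np

# synthetic data, purely illustrative
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(n)

# OLS estimates and classical covariance matrix
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - k)
V = s2 * np.linalg.inv(X.T @ X)

# Wald statistic for the slope coefficients (constant excluded)
bs, Vs = b[1:], V[1:, 1:]
W = bs @ np.linalg.solve(Vs, bs)

r2_wald = W / (W + n - k)
r2_corr = np.corrcoef(y, X @ b)[0, 1] ** 2
print(r2_wald, r2_corr)
```

The two printed values agree to machine precision; it's only outside OLS that they can come apart.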
The correlation-based R^2 that we print currently (and which is
reproduced at the end of the script) is just slightly lower for the
tsls model than for OLS. And in one sense that seems right -- the
_fit_ is only slightly reduced by instrumenting variable 2. On the
other hand, no coefficient is significant in the tsls variant, and
the Wald-based R^2 is much lower.
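The divergence can be made as dramatic as you like with a weak instrument. Here's a small simulation sketch (in numpy rather than hansl, with hypothetical numbers): x is endogenous and the instrument z is nearly useless, so the 2SLS slope is estimated very imprecisely. The correlation-based R^2 stays high -- with a single regressor it is just corr(y, x)^2, whatever the slope estimate -- while the Wald-based R^2 collapses toward zero:

```python
import numpy as np

# hypothetical simulated data: x endogenous, z a deliberately weak instrument
rng = np.random.default_rng(42)
n = 500
z = rng.standard_normal(n)
u = rng.standard_normal(n)                       # common shock -> endogeneity
x = 0.1 * z + u                                  # very weak first stage
y = 0.2 * x + 0.8 * u + 0.3 * rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])             # structural regressors
Z = np.column_stack([np.ones(n), z])             # instruments
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)     # first-stage fitted values
b = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)      # 2SLS estimates
e = y - X @ b
s2 = e @ e / (n - X.shape[1])
V = s2 * np.linalg.inv(Xhat.T @ Xhat)            # classical 2SLS covariance

bs, Vs = b[1:], V[1:, 1:]                        # slope block only
W = bs @ np.linalg.solve(Vs, bs)                 # Wald stat for the slope
r2_wald = W / (W + n - X.shape[1])
r2_corr = np.corrcoef(y, X @ b)[0, 1] ** 2
print(r2_wald, r2_corr)
```

The Wald-based figure comes out near zero while the correlation-based one stays large, which is just the data4-10 story in exaggerated form.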
Allin.