On Sat, 11 Apr 2009, Allin Cottrell wrote:
> I do have one reservation. As you put it, one typically wants the
> R^2 as a quick check on whether "a particular model contains any
> explanatory variables worth keeping". Yes, there's that, but one
> also wants a simple measure of "goodness of fit", and the two can
> diverge.
True. You have a valid point here. However, let me state my point of view
more clearly (sorry, this WILL be slightly verbose).
OLS is quite exceptional among estimation methods, in that the OLS
statistic \hat{\beta} has a dual interpretation: it is at the same time a
nice descriptive statistic (the solution of a purely algebraic, or
geometric if you prefer, minimisation problem) and a smart choice as an
estimator, under certain circumstances.
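To fix ideas: the descriptive side is just

  \hat{\beta} = argmin_b (y - Xb)'(y - Xb) = (X'X)^{-1} X'y

and nothing probabilistic is involved up to this point; the inferential
reading only kicks in once you put assumptions on u.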
This lucky coincidence allows the R2 statistic to have a dual
interpretation too: "goodness of fit" and "overall validity of the
model".
The first interpretation comes quite naturally when the dependent variable
is continuous and you think of a statistical model as a machine that
yields the "best" approximation to it. It makes very good sense to judge
the approximation on the basis of correlation (and square it if you feel
like it). A notable advantage of this interpretation is that it involves
no probability/inference concepts. To take it to the extreme, it's just
the squared cosine of an angle. As such, it's a nice measure.
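In symbols: if \tilde{y} and \hat{\tilde{y}} are the dependent variable
and the fitted values, both in deviation from their means, then

  R^2 = cos^2(\theta)
      = (\tilde{y}'\hat{\tilde{y}})^2 / [(\tilde{y}'\tilde{y}) (\hat{\tilde{y}}'\hat{\tilde{y}})]

that is, the squared correlation between actual and fitted values.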
Unfortunately, this interpretation breaks down in several circumstances.
In a Tobit model, for example, you have a non-null share of the sample for
which the dependent variable is 0. Does it make sense to use those values
when computing the correlation? Or should you just ignore them? What
should you do when the dependent variable is discrete, as in a probit
model? Or, worse, in a _multinomial_ probit model, in which the numbers
often have only a conventional value? And the list goes on.
Hence, what I had in mind when I proposed the Wald-based R2 was to build
on the second interpretation instead (note that other versions of a
"generalised R2" that have been proposed in the literature, like
McFadden's, have a similar justification).
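McFadden's measure, for the record, is 1 - logL1/logL0, where logL1 and
logL0 are the log-likelihoods of the full and of the constant-only model.
Just to show what I mean, here is a minimal Python sketch (assuming
statsmodels is available; the data are made up, purely for illustration):

  import numpy as np
  import statsmodels.api as sm

  # made-up data for a probit model
  rng = np.random.default_rng(42)
  n = 500
  X = sm.add_constant(rng.normal(size=(n, 2)))
  y = (X @ np.array([0.5, 1.0, -0.8]) + rng.normal(size=n) > 0).astype(int)

  res = sm.Probit(y, X).fit(disp=0)
  mcfadden = 1.0 - res.llf / res.llnull  # llnull: constant-only log-likelihood
  print(mcfadden, res.prsquared)         # prsquared is the same quantity

Note the clean inferential flavour: it compares likelihoods, not actual
versus fitted values.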
In this sense, the TSLS example you give is very well chosen, as TSLS lies
very near the border. It makes sense to compute fitted values for TSLS
models, but in general you should not expect them to fit the data
particularly well. If you have a model
y = Xb + u
in which you have reasons to say that X and u are correlated, you're
saying (with a choice of words that has well-known historical reasons)
that the vector of coefficients you're interested in is not the parameter
of the conditional mean, but something else, with a richer behavioural
meaning. As a consequence, TSLS is not an "approximation machine", but
rather a very clever way to solve a problem of interpretation of the
estimates, since the estimates of "b" you end up with have no special
property in terms of "fit" (well, OK, they do, but only if you think in
terms of oblique rather than orthogonal projections).
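To make the point concrete, here is a minimal numpy sketch with a made-up
DGP (again, purely illustrative): the 2SLS coefficients come from an
oblique projection through the instruments, and the squared correlation
between y and the implied fitted values is a different animal from a
Wald-type significance measure on b.

  import numpy as np

  rng = np.random.default_rng(1)
  n = 1000
  z = rng.normal(size=n)                   # instrument
  v = rng.normal(size=n)
  u = 0.9 * v + 0.3 * rng.normal(size=n)   # structural error, correlated with x
  x = 0.5 * z + v                          # endogenous regressor
  y = 1.0 + 2.0 * x + u

  X = np.column_stack([np.ones(n), x])
  Z = np.column_stack([np.ones(n), z])

  # 2SLS: beta = (X'Pz X)^{-1} X'Pz y, with Pz the projection onto span(Z)
  Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
  XtPzX = X.T @ Pz @ X
  beta = np.linalg.solve(XtPzX, X.T @ (Pz @ y))

  yhat = X @ beta                          # fitted values from structural coefs
  resid = y - yhat
  r2_fit = np.corrcoef(y, yhat)[0, 1]**2   # "goodness of fit" version

  # Wald-type statistic on the slope (homoskedasticity assumed)
  sigma2 = resid @ resid / n
  V = sigma2 * np.linalg.inv(XtPzX)
  wald = beta[1]**2 / V[1, 1]
  print(beta, r2_fit, wald)

The two numbers answer different questions, and nothing forces them to
agree.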
In this context, the "goodness of fit" measure may diverge from "overall
significance" to a large extent. The question is: which one should we
appoint to the R2 office? That question, of course, has no obvious answer.
So... I don't know! Maybe we just ought to leave this to somebody else and
confine R2 to least-squares-based models. Maybe not. I'm open!
PS By the way: happy Easter to everybody!
Riccardo (Jack) Lucchetti
Dipartimento di Economia
Università Politecnica delle Marche
r.lucchetti(a)univpm.it
http://www.econ.univpm.it/lucchetti