On Fri, 24 Mar 2017, Allin Cottrell wrote:
> On Fri, 24 Mar 2017, Sven Schreiber wrote:
>> On 23.03.2017 at 19:05, Allin Cottrell wrote:
>>
>>> 1) The convention when calculating BIC from a model estimated by least
>>> squares is to set "k" to the number of regression coefficients (leaving
>>> aside the error variance), while the convention under MLE is to include
>>> the variance estimator in k. (Or at least I think that's a fair
>>> statement of the case.)
>>
>> One more follow-up here: can you give a source for the convention? I
>> guess that in principle one can make the case that the error variance
>> could also be fixed a priori rather than estimated, and so k should
>> change accordingly. But right now I don't see why that argument
>> wouldn't apply to OLS as well. (Or are there some block-diagonal and/or
>> asymptotic independence arguments that would apply to one estimator
>> here and not the other?)
>
> I don't know of any canonical source for the convention, and in fact
> it's not universal. Some writers argue for including the variance
> parameter in the "k" count for least squares, but it seems that most
> software doesn't do that (Stata, SAS, SPSS at least). R does include the
> extra term, however. William Greene doesn't include it in his account of
> information criteria in Econometric Analysis, though he doesn't comment
> on the matter.
>
> I guess using k = (number of regressors) in the least squares case is
> motivated by the fact that k in that sense is the standard measure of
> the loss of degrees of freedom in estimation.

My guess is that, for the purpose for which information criteria are
designed, including the variance or not is quite irrelevant, so either
version is legitimate.
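
Written out under the usual definitions, the two conventions differ only by a constant that does not depend on the model's fit. This assumes the following for the sketch: $\hat L$ is the maximized Gaussian likelihood of a regression with k coefficients fit to n observations, and SSR is its sum of squared residuals.

```latex
\begin{aligned}
-2\ln\hat{L} &= n\left(1 + \ln 2\pi + \ln\frac{\mathrm{SSR}}{n}\right),\\
\mathrm{BIC}_{\mathrm{LS}} &= -2\ln\hat{L} + k\ln n,\\
\mathrm{BIC}_{\mathrm{ML}} &= -2\ln\hat{L} + (k+1)\ln n,\\
\mathrm{BIC}_{\mathrm{ML}} - \mathrm{BIC}_{\mathrm{LS}} &= \ln n.
\end{aligned}
```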

The main virtue of an IC such as BIC is being consistent, that is,
picking the right model with probability approaching 1 as the sample
size grows. In an OLS model you don't really get a choice as to whether
to estimate the variance or not (you have to), so "picking the right
model" essentially means "making the right choice of regressors". So in
that case I suppose (I don't have a proof, it's just my gut feeling)
that using ln(n)*k or ln(n)*(k+1) as the penalty term doesn't make a
difference asymptotically. Of course, as usual, asymptotically
equivalent choices may not be equivalent at all in finite samples (see
e.g. Wald vs. LM vs. LR tests).
As the saying goes, "in theory, there's no difference between theory and
practice; in practice, there is".
But again, this is just my intuition, and I could be very wrong.
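
Here is a quick numerical check, offered as an illustrative sketch only (it is not gretl code). Two hypothetical Gaussian linear regressions, constant-only and constant plus one regressor, are fit by closed-form OLS to the same simulated sample. BIC is then computed once with k and once with k+1. The helper names (gaussian_loglik, bic, ssr_const_only, ssr_simple_ols) and the simulated data are made up for the example.

```python
import math
import random


def gaussian_loglik(ssr: float, n: int) -> float:
    """Gaussian log-likelihood at the ML estimates (sigma^2 = ssr / n)."""
    return -0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(ssr / n))


def bic(loglik: float, n: int, n_params: int) -> float:
    """BIC = -2 ln L + (number of counted parameters) * ln n."""
    return -2.0 * loglik + n_params * math.log(n)


def ssr_const_only(y):
    """SSR of a regression of y on a constant only."""
    ybar = sum(y) / len(y)
    return sum((yi - ybar) ** 2 for yi in y)


def ssr_simple_ols(x, y):
    """SSR of OLS of y on a constant and x (closed-form solution)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical simulated data: y = 1 + 0.5 x + standard normal noise.
random.seed(42)
n = 100
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]

# (label, k = number of regression coefficients, SSR)
models = [
    ("const", 1, ssr_const_only(y)),
    ("const + x", 2, ssr_simple_ols(x, y)),
]

for label, k, ssr in models:
    ll = gaussian_loglik(ssr, n)
    bic_ls = bic(ll, n, k)      # error variance not counted
    bic_ml = bic(ll, n, k + 1)  # error variance counted
    print(f"{label:10s} BIC(k) = {bic_ls:9.3f}  BIC(k+1) = {bic_ml:9.3f}  "
          f"diff = {bic_ml - bic_ls:.4f}")

# Model selected under each convention (smallest BIC wins).
best_ls = min(models, key=lambda m: bic(gaussian_loglik(m[2], n), n, m[1]))
best_ml = min(models, key=lambda m: bic(gaussian_loglik(m[2], n), n, m[1] + 1))
print("selected (k):", best_ls[0], "  selected (k+1):", best_ml[0])
```

The "diff" column is ln(100), about 4.6052, for both rows. Within a set of least-squares models fit to the same observations, counting the variance shifts every candidate's BIC by the same amount, so in this setting both conventions pick the same model. The count would matter only when BIC values computed under different conventions are compared with each other, for example values taken from software that counts differently.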
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------