On 17.04.2025 at 17:15, Cottrell, Allin wrote:
> Yes, it seems clear that BIC is not an effective criterion in the fat
> case. The trouble is that as a saturated model is approached the SSR
> will continue to decline somewhat, increasing the log-likelihood,
> while the penalty 'k' in the BIC formula (which in regls lasso is the
> number of non-zero coefficients) tends to stabilize at around the
> number of observations.
Ah, our messages overlapped; see my other email. Your very intuitive
reasoning explains my open question about gretl's BIC results there.
> The net effect is that the BIC continues to "improve" regardless as
> lambda shrinks. Use of cross-validation, on the other hand, produces
> sensible results.
> I notice that glmnet does two relevant things in the fat case: (1) it
> sets by default a relatively large value (0.01) for the smallest
> lambda fraction when the user just specifies a number of lambda
> values, and (2) it automatically terminates exploration of small
> lambda when the R^2 reaches 0.991 or so.
Yes, I read that too, although I'm not sure if the auto-termination
already applied in my example, since I got the full 50 requested
lambda values.
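The two glmnet behaviours mentioned above can be sketched as follows. This is a pure-Python illustration, not glmnet's actual code: the grid is log-spaced from lambda_max down to min_ratio * lambda_max (0.01 being the default fraction in the fat case, per the description above), and the stopping rule is a simplified stand-in that cuts the path once a supplied R^2 function exceeds a threshold.

```python
import math

def lambda_grid(lambda_max, nlambda=50, min_ratio=0.01):
    """Log-spaced lambda sequence from lambda_max down to
    min_ratio * lambda_max, in the style of glmnet's default grid."""
    step = math.log(min_ratio) / (nlambda - 1)
    return [lambda_max * math.exp(i * step) for i in range(nlambda)]

def truncate_path(grid, r2_of_lambda, r2_max=0.95):
    """Stop exploring smaller lambdas once R^2 exceeds r2_max
    (an illustrative stand-in for glmnet's early termination)."""
    kept = []
    for lam in grid:
        kept.append(lam)
        if r2_of_lambda(lam) >= r2_max:
            break
    return kept

grid = lambda_grid(1.0, nlambda=50)
print(grid[0], grid[-1])  # from 1.0 down to ~0.01

# Toy R^2 curve that rises as lambda shrinks: the path gets cut early.
kept = truncate_path(grid, lambda lam: 1.0 - lam, r2_max=0.95)
print(len(kept))  # fewer than the 50 requested values
```

If the full 50 values come back, as in the example above, the R^2 threshold was presumably never hit along the grid.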
> We might do something similar. Glmnet doesn't produce BIC values
> (though I see that scikit-learn does); if we continue to show them we
> should probably issue a warning in the fat case.
Yes; since under your reasoning the minimal BIC will tend to be a corner
solution at the right edge, it might be worth reporting the local
minimum too, although finding a local minimum is of course slightly
more involved than finding a global one.
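The extra work is modest, though. A minimal sketch (not gretl code) of reporting both candidates over a BIC-versus-lambda path: the global minimum is a single pass, and the first interior local minimum just needs a comparison with both neighbours. Ties and plateaus are not handled specially here.

```python
def first_local_min(values):
    """Index of the first strict interior local minimum, or None."""
    for i in range(1, len(values) - 1):
        if values[i - 1] > values[i] < values[i + 1]:
            return i
    return None

# A hypothetical BIC path: a dip at index 2, then a further decline
# toward the corner solution at the right edge (the global minimum).
bic_path = [5.0, 3.0, 2.0, 2.5, 1.5, 0.5]

local = first_local_min(bic_path)
global_min = min(range(len(bic_path)), key=bic_path.__getitem__)
print(local, global_min)  # 2 5
```

Reporting both indices would let the user see when the global minimum is just the right-edge corner while a more meaningful interior minimum exists.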
thanks
sven