On 12.11.2016 at 21:56, Allin Cottrell wrote:
> On Sat, 12 Nov 2016, Sven Schreiber wrote:
>> While we're on the subject of the condition number and collinearity, I
>> have a question about the following example: open the example data
>> hall.gdt and select the two variables "consrat" and "ewr"; the
>> correlation output (again, right-click and then select from the
>> context menu) then shows a correlation coefficient of just 0.16. The new
>> collinearity analysis gives a condition number of a whopping 634, an
>> order of magnitude greater than the rule-of-thumb value of 50. The
>> bkw.gfn package confirms this value.
>> These two results strike me as qualitatively very different, and offhand
>> I'm not sure why that is so. Any ideas?
> The Pearson correlation coefficient is undefined if one or both of the
> terms are constants. However, the Belsley condition number can handle a
> constant, and presumably the big condition number in the example you
> describe (but note, only when you include a constant) is due to the fact
> that consrat itself is almost a constant.
That's not quite right: consrat actually has a distribution not too far
from normal. What's true is that its scaling/range is quite different from
that of ewr. But you can take 10*consrat, and then the two series are more
comparable (apart from the mean). That rescaling, however, affects neither
the correlation coefficient nor the condition number, so I'm inclined to
say that my question still stands.
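To make the two statistics in this exchange concrete, here is a minimal sketch in Python with NumPy. It assumes the usual Belsley-Kuh-Welsch convention (each column of the regressor matrix scaled to unit length without centering, condition number = largest over smallest singular value); it is not gretl's or bkw.gfn's code. The series x1 and x2 are synthetic stand-ins, not the hall.gdt data: x1 is built so that its standard deviation is small relative to its mean, and x2 has mean zero.

```python
import numpy as np

def belsley_condition_number(X):
    """Ratio of largest to smallest singular value of X after scaling
    each column to unit Euclidean length (no centering), assuming the
    Belsley-Kuh-Welsch convention."""
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)        # unit-length columns
    s = np.linalg.svd(Xs, compute_uv=False)  # singular values only
    return s.max() / s.min()

rng = np.random.default_rng(0)
n = 200
# Synthetic data (NOT hall.gdt): x1 has mean 1 and std 0.01,
# x2 is independent of x1 with mean 0 and std 0.05.
x1 = 1.0 + 0.01 * rng.standard_normal(n)
x2 = 0.05 * rng.standard_normal(n)
const = np.ones(n)

r = np.corrcoef(x1, x2)[0, 1]
k_const = belsley_condition_number(np.column_stack([const, x1, x2]))
k_noconst = belsley_condition_number(np.column_stack([x1, x2]))
# Rescaling a column is undone by the unit-length scaling step.
k_rescaled = belsley_condition_number(np.column_stack([const, 10 * x1, x2]))

print(f"correlation(x1, x2)           = {r:.3f}")
print(f"cond. number with constant    = {k_const:.1f}")
print(f"cond. number without constant = {k_noconst:.2f}")
print(f"with constant, 10*x1 instead  = {k_rescaled:.1f}")
```

In this synthetic setup the correlation is low and the condition number without a constant is close to 1, while with a constant it is large, because x1's unit-length column is nearly parallel to the column of ones (the angle between them is roughly x1's coefficient of variation). Replacing x1 by 10*x1 changes neither statistic, consistent with the rescaling point above.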
cheers,
sven