On Sat, 12 Nov 2016, Sven Schreiber wrote:
On 12.11.2016 at 21:56, Allin Cottrell wrote:
> On Sat, 12 Nov 2016, Sven Schreiber wrote:
>> While we're on the subject of the condition number and collinearity, I
>> have a question about the following example: Open the example data
>> hall.gdt and select the two variables "consrat" and "ewr"; the
>> correlation output (again, right-click and then select from the
>> context menu) then shows a corr coeff of just 0.16. The new
>> collinearity analysis gives a whopping 634, an order of magnitude
>> greater than the rule-of-thumb value 50. The bkw.gfn package confirms
>> this value.
>> This strikes me as qualitatively very different, and spontaneously I'm
>> not sure why that is so. Any ideas?
>
> The Pearson correlation coefficient is undefined if one or both of the
> terms are constants. However, the Belsley condition number can handle a
> constant, and presumably the big condition number in the example you
> describe (but note, only when you include a constant) is due to the fact
> that consrat itself is almost a constant.
That's not quite right; consrat actually has a distribution not too far
from a normal. What's true is that the scaling/range is quite different. But
you can take 10*consrat, and the series are then more comparable (apart from
the mean). That affects neither the correlation coefficient nor the condition
number, however, so I'm inclined to say that my question still stands.
Hmm, I guess so. Maybe worth noting that Belsley's condition number
calculation involves scaling but not centering. So try the
following:
<hansl>
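open hall.gdt
# rescaled copy: same shape, 10 times the range
series c10 = consrat * 10
# centered copy: mean removed, spread unchanged
series cc = consrat - mean(consrat)
# condition number with the raw, rescaled and centered series in turn
eval cnumber({const, ewr, consrat})
eval cnumber({const, ewr, c10})
eval cnumber({const, ewr, cc})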
</hansl>
Multiplying consrat by 10 makes no difference to the condition
number, as you say. But centering it reduces the value a great deal.
(With correlation, of course, neither scaling nor centering makes
any difference.)
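
For anyone who wants to see where the number comes from, here is a
minimal sketch of the calculation done by hand. It assumes that the
Belsley condition number is the square root of the ratio of the largest
to the smallest eigenvalue of X'X once that matrix has been scaled to
have a unit diagonal, with no centering; that assumption is not checked
against the gretl source here. The names X, A, d, R and ev are
illustrative only.

<hansl>
open hall.gdt
# data matrix, constant included, columns left uncentered
matrix X = {const, ewr, consrat}
matrix A = X'X
# d holds 1/(length of each column of X)
matrix d = 1 ./ sqrt(diag(A))
# scale the cross-product matrix so that its diagonal is all ones
matrix R = A .* (d * d')
# eigensym returns the eigenvalues in ascending order
matrix ev = eigensym(R)
# assumed definition: sqrt(largest eigenvalue / smallest eigenvalue)
eval sqrt(ev[rows(ev)] / ev[1])
</hansl>

If the assumption holds, this should come out close to what cnumber()
reports for the same three columns. It also suggests why centering
matters: after scaling to unit length, an uncentered series whose mean
is large relative to its spread points in nearly the same direction as
the constant column.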
Allin