On 31/05/2024 20:43, Allin Cottrell wrote:
On Fri, 31 May 2024, Sven Schreiber wrote:
> On 30.05.2024 at 19:28, g s wrote:
>>
>> My data set has some missing values for some variables. When I do
>> correlation, using the drop down menus (view and then correlation
>> matrix), I notice that gretl reports a correlation matrix excluding
>> all cases with any missing values for any of the variables.
>
> I don't think that's correct. Notice that in the dialog window there's
> a tick box saying "ensure uniform sample". And the results differ
> depending on whether you tick the box or not.
Right, we calculate each pairwise correlation on the maximum available
number of observations for the two series unless that box is ticked --
which corresponds to giving the --uniform option with the "corr" command.
This is certainly useful, and I don't mean in any way to suggest we
shouldn't do it, but users should be aware that the collection of
individual correlation coefficients computed in this way may not be a
valid correlation matrix, because it may violate the
positive-semidefiniteness requirement. Example:
<hansl>
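# 9 observations; each one has exactly one of x, y, z missing
nulldata 9
matrix c = {1;0;-1}
matrix u = {0; 1; 0}
matrix m = mshape(NA, 3, 1)   # 3x1 block of missing values
series x = c | c | m
series y = c | m | c
series z = m | c | u
list L = x y z
print -o
# fill C with pairwise correlations, each one computed on the
# observations available for that particular pair of series
C = I(3)
loop i = 2 .. 3
    loop j = 1 .. i-1
        C[i,j] = corr(L[i], L[j])
        C[j,i] = C[i,j]
    endloop
endloop
print C
# a negative eigenvalue means C is not positive semidefinite
eval eigen(C)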
</hansl>
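For reference, the coefficients in this example can be worked out by hand. x and y overlap only on observations 1-3, where both equal c, so their correlation is 1; the same holds for x and z on observations 4-6. y and z overlap only on observations 7-9, where c and u have zero covariance, so their correlation is 0. A minimal check that enters the resulting matrix directly (using eigensym here is my choice, relying on C being symmetric):

<hansl>
# pairwise correlation matrix implied by the example above
matrix C = {1, 1, 1; 1, 1, 0; 1, 0, 1}
# characteristic polynomial: (1-l) * ((1-l)^2 - 2) = 0,
# so the eigenvalues are 1 - sqrt(2), 1 and 1 + sqrt(2)
eval eigensym(C)
eval 1 - sqrt(2)
</hansl>

The smallest eigenvalue, 1 - sqrt(2), is about -0.414, so C is not positive semidefinite.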
This problem is avoided by construction if the --uniform switch is in place,
since all the coefficients are then computed on one common sample.
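A small sketch of the difference, on made-up data. The series a, b, d and the positions of the missing values are illustrative assumptions only. The example above cannot be reused for this purpose, because every one of its 9 observations has a missing value, so the uniform sample would be empty.

<hansl>
# hypothetical data: 8 observations, two of them incomplete
nulldata 8
series a = index
series b = index^2
series d = cos(index)
a[1] = NA
b[8] = NA
list L = a b d
# pairwise: each coefficient uses its own available sample
corr L
# uniform: all coefficients use observations 2 to 7 only
corr L --uniform
# the same thing by hand: restrict the sample first, then
# compute the correlation matrix and inspect its eigenvalues
smpl L --no-missing
matrix R = mcorr({L})
eval eigensym(R)
</hansl>

Since R is computed from a single rectangular block of data, its eigenvalues cannot be negative, apart from rounding error.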
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------