On Mon, 16 Apr 2007, ab.news wrote:
In the correlation matrix window, in the presence of missing
values, does the 5% critical value make any sense since the
number of observations "n" is not the same for all pairwise
correlations? I guess that, when some values are missing,
we'll need either a 5% critical value for each correlation
coefficient or a correlation matrix computed on a listwise
basis, meaning that the same observations are used throughout
the dataset.
This is quite a tricky issue. Here's what I have come up with:
* We keep track of how many observations, n_ij, are used in
calculating each correlation coefficient in the matrix.
* We then calculate the proportional difference between the
maximum and minimum n_ij values. If this is less than 0.1, we
report the 5 percent critical value, using min(n_ij) to be on the
conservative side. If the difference is greater than 0.1, we
don't show a critical value.
* The above is the default behaviour. But at the command line you
can give the "--uniform" flag (uniform sample size) to the corr
command. In this case gretl will determine the maximum sample for
which all variables are observed, and calculate all the
coefficients using that sample. A critical value will be
reported. (Obviously, this option makes a difference to the
output only if there are missing values, and they're not perfectly
aligned across the chosen variables.)
* Note also that you can always get a critical value for a
specific correlation by calling "corr" with only 2 arguments.
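
To make the default rule and the "--uniform" sample concrete, here is a
rough Python sketch. It is not gretl's actual code. The function names,
the use of None to mark a missing value, and the reading of "proportional
difference" as (max - min) / max are all assumptions made for
illustration. The case where the difference is exactly 0.1 is not
specified above, so the sketch treats it as "no critical value".

```python
# Illustrative sketch only; not taken from gretl's source.
from itertools import combinations


def pairwise_counts(data):
    """Return {(a, b): n_ab}: rows where both variables are observed."""
    counts = {}
    for a, b in combinations(list(data), 2):
        counts[(a, b)] = sum(
            1
            for x, y in zip(data[a], data[b])
            if x is not None and y is not None
        )
    return counts


def critical_value_n(counts, tol=0.1):
    """Return min(n_ij) if the sample sizes are close enough, else None."""
    n_max = max(counts.values())
    n_min = min(counts.values())
    # Assumption: "proportional difference" means (max - min) / max.
    if n_max > 0 and (n_max - n_min) / n_max < tol:
        return n_min  # conservative choice: the smallest pairwise sample
    return None  # sample sizes differ too much; show no critical value


def uniform_rows(data):
    """Indices of rows where every variable is observed (listwise sample)."""
    columns = list(data.values())
    return [
        t
        for t, row in enumerate(zip(*columns))
        if all(v is not None for v in row)
    ]


# Hypothetical data: x and y each have one missing value, in different rows.
data = {
    "x": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    "y": [2.0, None, 3.0, 5.0, 4.0, 7.0],
    "z": [0.5, 1.5, 2.5, 3.0, 3.5, 4.5],
}

counts = pairwise_counts(data)
n_for_cv = critical_value_n(counts)
rows = uniform_rows(data)
print(counts, n_for_cv, rows)
```

With this toy data the pairwise sample sizes are 4, 5 and 5, so the
proportional difference is 0.2 and no critical value would be shown by
default. The listwise sample keeps rows 0, 3, 4 and 5, which is the
sample every coefficient would share in the "--uniform" case.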
These changes are in CVS and the current Windows snapshot.
Allin.