Hi all

Question: when I ran the correlation, all I got was the correlation coefficients. Is it possible for the correlation output to show the n and p-value for each pair? This is for the case where there are missing values and the correlation matrix shows pairwise correlations.
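For anyone who wants per-pair n and p-values outside gretl in the meantime, here is a minimal Python sketch. It assumes pairwise deletion (each pair uses only the observations where both series are present) and the usual t test of zero correlation with n - 2 degrees of freedom. The helper names `pairwise_corr` and `t_two_tailed_p` are hypothetical, not gretl functions, and the p-value uses the standard closed-form series for Student's t with integer degrees of freedom rather than any gretl routine.

```python
import math
from itertools import combinations


def t_two_tailed_p(t, df):
    """Two-tailed p-value for Student's t with integer df >= 1.

    Closed-form finite series for integer df, with
    theta = atan(|t| / sqrt(df)) and A = P(|T| < |t|); returns 1 - A.
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        # Odd df: A = (2/pi) * (theta + sin * [cos + (2/3)cos^3 + ...])
        total, term = 0.0, c
        if df > 1:
            total = term
            for k in range(3, df - 1, 2):
                term *= (k - 1) / k * c * c
                total += term
        a = 2.0 / math.pi * (theta + s * total)
    else:
        # Even df: A = sin * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...]
        total, term = 1.0, 1.0
        for k in range(2, df - 1, 2):
            term *= (k - 1) / k * c * c
            total += term
        a = s * total
    return max(0.0, 1.0 - a)


def pairwise_corr(columns):
    """Pairwise-deletion correlations: {(a, b): (n, r, p)} for every pair."""
    results = {}
    for a, b in combinations(columns, 2):
        # Keep only observations where both series are present.
        pairs = [(x, y) for x, y in zip(columns[a], columns[b])
                 if x is not None and y is not None]
        n = len(pairs)
        if n < 3:
            continue  # the t test needs at least n - 2 = 1 degree of freedom
        mx = sum(x for x, _ in pairs) / n
        my = sum(y for _, y in pairs) / n
        sxy = sum((x - mx) * (y - my) for x, y in pairs)
        sxx = sum((x - mx) ** 2 for x, _ in pairs)
        syy = sum((y - my) ** 2 for _, y in pairs)
        if sxx == 0 or syy == 0:
            continue  # correlation undefined when one series is constant
        r = sxy / math.sqrt(sxx * syy)
        if abs(r) >= 1.0:
            p = 0.0  # perfect correlation: t is infinite
        else:
            t = r * math.sqrt((n - 2) / (1 - r * r))
            p = t_two_tailed_p(t, n - 2)
        results[(a, b)] = (n, r, p)
    return results


# Hypothetical toy data (None = missing); not the data from this thread.
data = {
    "a": [1, 2, 3, 4, None],
    "b": [2, 4, 6, 8, 10],
    "c": [None, 1, 0, 1, 0],
}
for pair, (n, r, p) in pairwise_corr(data).items():
    print(pair, n, round(r, 4), round(p, 4))
```

Each pair gets its own n, which is exactly why the pairwise matrix can't report a single n in its header.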


Thanks

Gene



On Sunday, June 9, 2024 at 06:46:09 PM EDT, g s <gsociology@yahoo.com> wrote:


Hi all

Just fyi, I used the gretl_install-64.exe from June 2nd, and redid the correlation, including all my variables (BirthRate, Agriculture, Service, GDPPerCap, Population, InfantMort) in the correlation analysis. I got the same results as SAS gave me, so we are good.

Now on to regression.

Thanks

Gene



On Monday, June 3, 2024 at 11:59:01 PM EDT, g s <gsociology@yahoo.com> wrote:


Hi Sven, Allin

Thanks! I'll recheck when the snapshot is updated.

Gene


On Monday, June 3, 2024 at 11:07:25 PM EDT, Allin Cottrell <cottrell@wfu.edu> wrote:


On Sun, 2 Jun 2024, Sven Schreiber wrote:


> On 02.06.2024 at 06:37, g s wrote:
>> Yes, I'm not exactly clear on what the results are showing.
>>
>>
>> 1) I did a correlation matrix of the following variables: BirthRate, Agriculture, Service, GDPPerCap, Population, InfantMort. I did NOT click on "ensure uniform sample size".
>>
>> The top of the results box says
>>       Correlation Coefficients, using the observations 4 - 229
>>       (missing values were skipped)
>>       Two-tailed critical values for n = 221: 5% 0.1320, 1% 0.1729
>>
>> A couple of results:
>> BirthRate and Agriculture = 0.7021
>> ...
> ...
>>
>> 3) Next, I did a correlation matrix of JUST BirthRate and Agriculture. Here are the results:
>>
>> corr(BirthRate, Agriculture) = 0.68261942
>> Under the null hypothesis of no correlation:
>> t(220) = 13.855, with two-tailed p-value 0.0000
>>
>> The correlation here is different from steps 1 or 2,
>
> Yes, I can confirm that, and indeed the difference between 1 and 3 (values 0.702 and 0.683) is unexpected, I'd say. Perhaps there's something wrong when a subset of variables is selected and missing values are all over the place.
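The t statistic in step 3 follows from the usual relation t = r * sqrt(n - 2) / sqrt(1 - r^2). This small Python check (function names are hypothetical, not gretl's) reproduces the quoted t(220) and makes explicit that 220 degrees of freedom means n = 222, not the n = 221 in the matrix header. The inverse relation r = t / sqrt(t^2 + n - 2) can be used to turn a critical t into critical correlation values like those printed in that header.

```python
import math


def corr_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def r_from_t(t, n):
    """Inverse relation: the correlation that gives statistic t with n - 2 df."""
    return t / math.sqrt(t * t + n - 2)


# Step 3 reported corr = 0.68261942 with t(220); df = n - 2 = 220, so n = 222.
t = corr_t_stat(0.68261942, 222)
print(f"t(220) = {t:.3f}")
```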


True. In corr without the --uniform option we were trimming, from the start and end of the sample range, observations with at least one missing value among the selected series. That was wrong: we should only trim observations with missing values for _all_ of the selected series.

So some of the individual correlations could end up using fewer observations than the maximum possible. In the example given above, the BirthRate,Agriculture and Population,GDPPerCap correlations were being calculated with n = 221 and n = 225, respectively, when they should have used n = 222 and n = 228.
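To see why the trimming rule matters, here is a minimal Python sketch of the two rules on hypothetical toy data (None marks a missing value). It is not gretl's actual code: `sample_range` and `pairwise_n` are illustrative names. Trimming an end observation when *any* selected series is missing narrows the range for every pair, so pairs that had valid data in the trimmed observations lose them; trimming only when *all* series are missing keeps them.

```python
from itertools import combinations

# Hypothetical toy data (None = missing); not the data from this thread.
data = {
    "x": [None, 1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 1.5, 2.5, 3.5, None, None],
    "z": [1.0, 2.0, None, 4.0, 3.0, 6.0],
}


def sample_range(columns, rule):
    """Trim leading/trailing observations; return inclusive (start, end).

    rule="any": drop an end observation if ANY selected series is missing
                (the old behaviour described above).
    rule="all": drop it only if ALL selected series are missing (the fix).
    """
    test = any if rule == "any" else all
    nobs = len(next(iter(columns.values())))

    def drop(t):
        return test(col[t] is None for col in columns.values())

    start, end = 0, nobs - 1
    while start <= end and drop(start):
        start += 1
    while end >= start and drop(end):
        end -= 1
    return start, end


def pairwise_n(columns, start, end):
    """Observations in [start, end] where both series of each pair are present."""
    return {
        (a, b): sum(1 for t in range(start, end + 1)
                    if columns[a][t] is not None and columns[b][t] is not None)
        for a, b in combinations(columns, 2)
    }


for rule in ("any", "all"):
    start, end = sample_range(data, rule)
    print(rule, (start, end), pairwise_n(data, start, end))
```

Under the "any" rule the x,z pair is counted over observations 1 to 3 only, although x and z are both present in observations 4 and 5 as well.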

That's now fixed in git; snapshots will follow before long.

Thanks, Gene, for probing this matter.

Allin Cottrell