Hi all
Question: when I ran the correlation, all I got was the correlation coefficients. Is it
possible for the correlation output to show the n and p-value for each pair? This is for
the case where there are missing values and the correlations are computed pairwise.
Thanks
Gene
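
For anyone wanting to check per-pair figures by hand, here is a minimal sketch in Python (not gretl or hansl). It assumes missing values are coded as None and uses pairwise deletion; the names corr_t and pairwise_report and the toy data are hypothetical. It reports n, r, the t statistic and its degrees of freedom for each pair, using t = r * sqrt(df / (1 - r^2)) with df = n - 2.

```python
import math
from itertools import combinations


def corr_t(r, n):
    """t statistic and degrees of freedom for H0: no correlation."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r)), df


def pairwise_report(data):
    """Map each pair of series names to (n, r, t, df), skipping missing values pairwise."""
    report = {}
    for a, b in combinations(data, 2):
        # Keep only observations where both series are present.
        pairs = [(u, v) for u, v in zip(data[a], data[b])
                 if u is not None and v is not None]
        n = len(pairs)
        mean_a = sum(u for u, _ in pairs) / n
        mean_b = sum(v for _, v in pairs) / n
        sab = sum((u - mean_a) * (v - mean_b) for u, v in pairs)
        saa = sum((u - mean_a) ** 2 for u, _ in pairs)
        sbb = sum((v - mean_b) ** 2 for _, v in pairs)
        r = sab / math.sqrt(saa * sbb)
        t, df = corr_t(r, n)
        report[(a, b)] = (n, r, t, df)
    return report


# Hypothetical toy data, not the dataset discussed in this thread.
toy = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "z": [None, 1.0, 2.0, None, 3.0, None],
}
report = pairwise_report(toy)
for pair, (n, r, t, df) in report.items():
    print(pair, "n =", n, "r = %.4f" % r, "t(%d) = %.3f" % (df, t))
```

Called as corr_t(0.68261942, 222), this gives t of about 13.855 on 220 degrees of freedom, which matches the bivariate output quoted further down the thread. A two-tailed p-value can then be read from a t distribution, for instance with scipy.stats.t.sf if SciPy is available.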
On Sunday, June 9, 2024 at 06:46:09 PM EDT, g s <gsociology(a)yahoo.com> wrote:
Hi all
Just fyi, I used the gretl_install-64.exe from June 2nd, and redid the correlation,
including all my variables (BirthRate, Agriculture, Service, GDPPerCap, Population,
InfantMort) in the correlation analysis. I got the same results as SAS gave me, so we are
good.
Now on to regression.
Thanks
Gene
On Monday, June 3, 2024 at 11:59:01 PM EDT, g s <gsociology(a)yahoo.com> wrote:
Hi Sven, Allin
Thanks! I'll recheck when the snapshot is updated.
Gene
On Monday, June 3, 2024 at 11:07:25 PM EDT, Allin Cottrell <cottrell(a)wfu.edu>
wrote:
On Sun, 2 Jun 2024, Sven Schreiber wrote:
Am 02.06.2024 um 06:37 schrieb g s:
> Yes, I'm not exactly clear on what the results are showing.
>
>
> 1) I did a correlation matrix of the following variables: BirthRate, Agriculture,
> Service, GDPPerCap, Population, InfantMort. I did NOT click on "ensure uniform sample
> size".
>
> The top of the results box says
> Correlation Coefficients, using the observations 4 - 229
> (missing values were skipped)
> Two-tailed critical values for n = 221: 5% 0.1320, 1% 0.1729
>
> A couple of results:
> BirthRate and Agriculture = 0.7021
> ...
...
>
> 3) Next, I did a correlation matrix of JUST BirthRate and Agriculture. Here are the
> results:
>
> corr(BirthRate, Agriculture) = 0.68261942
> Under the null hypothesis of no correlation:
> t(220) = 13.855, with two-tailed p-value 0.0000
>
> The correlation here is different from steps 1 or 2,
Yes, I can confirm that, and indeed the difference between 1 and 3 (values 0.702 and
0.683) is unexpected, I'd say. Perhaps there's something wrong when a subset of
variables is selected and missing values are all over the place.
True. In corr without the --uniform option we were trimming, from the start and end of
the sample range, any observations with at least one missing value among the selected
series. That was wrong: we should only trim observations with missing values for _all_ of
the selected series.
So some of the individual correlations could end up using fewer observations than the
maximum possible. In the example given above, the BirthRate,Agriculture and
Population,GDPPerCap correlations were being calculated with n = 221 and n = 225,
respectively, when this should have been n = 222 and n = 228.
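
To illustrate the difference between the two trimming rules, here is a minimal sketch in Python. It is not gretl's actual code: the function names, the None coding for missing values and the toy data are all assumptions made for the example. Both rules trim only leading and trailing observations, then count the usable observations for each pair within the remaining range.

```python
from itertools import combinations


def trim_range(rows, drop_row):
    """Return (start, stop) after dropping leading and trailing rows that drop_row flags."""
    start, stop = 0, len(rows)
    while start < stop and drop_row(rows[start]):
        start += 1
    while stop > start and drop_row(rows[stop - 1]):
        stop -= 1
    return start, stop


def pairwise_n(data, drop_row):
    """Count, for each pair of series, the observations with both values present."""
    names = list(data)
    rows = list(zip(*(data[k] for k in names)))
    start, stop = trim_range(rows, drop_row)
    counts = {}
    for a, b in combinations(names, 2):
        xa, xb = data[a][start:stop], data[b][start:stop]
        counts[(a, b)] = sum(u is not None and v is not None
                             for u, v in zip(xa, xb))
    return counts


def any_missing(row):
    # Old rule as described above: trim if at least one series is missing.
    return any(v is None for v in row)


def all_missing(row):
    # Corrected rule as described above: trim only if every series is missing.
    return all(v is None for v in row)


# Hypothetical toy data: z is missing at both ends of the range.
data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "z": [None, 1.0, 2.0, None, 3.0, None],
}
old = pairwise_n(data, any_missing)
new = pairwise_n(data, all_missing)
print("old rule:", old)
print("new rule:", new)
```

In this toy case the x,y pair has six complete observations, but the old rule discards the first and last rows because z is missing there, so x,y is computed on only four. The pairs involving z are unaffected, which is the same pattern as in the n values above.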
That's now fixed in git; snapshots will follow before long.
Thanks, Gene, for probing this matter.
Allin Cottrell