pca bug and issues reloaded

Sunday, 14 December 2014

Hi,

there was an open-ended thread started by Paulo Grahl
(http://lists.wfu.edu/pipermail/gretl-users/2013-December/009475.html)
about gretl's 'pca' command. I checked again, and I think there is still
a bug, even with a very recent snapshot, although it differs slightly
from what Paulo reported. Here's an example script:

<hansl>
open denmark
list vars = IBO IDE

# compare 'pca' and 'princomp()' in the full sample
matrix P1 = princomp({vars}, 1)
pca vars --save=1	# turns out they coincide; good

# now compare them in the reduced sample
smpl 1980:1 1985:1
matrix P2 = princomp({vars}, 1)
pca vars --save=1 # matrix and series differ; bad

# check if the PCs are different in the overlapping range
if sum(abs(PC1 - PC11)) > 0.01	# abs() so differences cannot cancel; PC naming is fragile...
    print "ok"
else
    print "PCs are the same although they should differ" # I get this
endif

smpl --full
</hansl>

Summary: The 'princomp()' function seems to work fine, but 'pca'
apparently uses the full sample for calculating the pca, even if a
reduced sample is specified. What's different from Paulo's report is
that the PCs are saved only over the reduced sample range (but the
values are still wrong).

I would also like to re-raise some other issues with pca:

- Accessor for the loadings, as suggested by Henrique
(http://lists.wfu.edu/pipermail/gretl-users/2012-March/007346.html) and
in terms of the princomp() function by myself. Allin answered that it's
easy to get them as the eigenvectors of the correlation matrix. This is
of course correct, but first it's a convenience issue, and secondly if
you perhaps want to do some simulations it seems like an avoidable
inefficiency to compute the eigenvectors twice (first implicitly in
princomp, and then explicitly by hand).

- Automatic printing of the workfile variables: When using 'pca' to save
some PCs to the workfile, gretl automatically prints all the variables
in the workfile. IMHO this contaminates the script output for no good
reason (I currently have thousands of variables in there, and it really
is a long list in the output...). So could this be switched off?
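
For reference, the manual route to the loadings that Allin described could
look roughly like the sketch below. It reuses the dataset and variables
from the script above; the names X, C, V, lambda and load1 are just
illustrative, and the sign of the eigenvector may differ from what pca
reports. As far as I understand, eigensym() returns the eigenvalues in
ascending order, so the first PC's loadings sit in the last column.

```hansl
open denmark
list vars = IBO IDE
smpl 1980:1 1985:1

matrix X = {vars}
matrix C = mcorr(X)     # correlation matrix of the columns of X

# eigenvalues are returned directly, eigenvectors are written to V
matrix V = {}
matrix lambda = eigensym(C, &V)

# assuming ascending order: the last column of V belongs to the
# largest eigenvalue, i.e. the loadings of the first PC
matrix load1 = V[,cols(V)]
print lambda load1

smpl --full
```

This is exactly the duplicated eigen-decomposition that a loadings
accessor would make unnecessary.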

Thanks,
sven
