Thanks for the comments, Sven.
I did as you suggested, and the results are not the same: the principal components computed on the original dataset with the sample restricted differ from those computed on a new dataset in which the post-2011:12 values are set to missing.

So back to square one. Clearly one could do the PCA calculation directly with matrices, and then it would be easy to pick only the observations that I want for my analysis -- but I am not too familiar with gretl script syntax.
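
For reference, a rough and untested sketch of that matrix route might look like the following. The series names x1, x2, x3 are hypothetical placeholders, and it assumes there are no missing values inside the 2000:01 to 2011:12 window. The scores may differ from those of the "pca" command by sign or by a scale factor, since eigenvectors are only defined up to sign and sdc() divides by n by default.

```
# Hypothetical series names: replace x1 x2 x3 with the actual variables
list L = x1 x2 x3
smpl 2000:01 2011:12                    # restrict to the window of interest

matrix X = { L }                        # T x k data matrix over the current sample
matrix Z = (X .- meanc(X)) ./ sdc(X)    # standardize each column
matrix V = {}
matrix lambda = eigensym(mcorr(X), &V)  # eigenvalues come back in ascending order
matrix scores = Z * V                   # last column pairs with the largest eigenvalue

series PC1 = scores[, cols(V)]          # first component, filled for this sample only
smpl full
```

Because PC1 is created while the sample is restricted, it should be missing after 2011:12 once the full sample is restored.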

Any idea how I can handle that using simple gretl commands?

On Wed, Dec 4, 2013 at 4:31 PM, Sven Schreiber wrote:
On 04.12.2013 19:16, Paulo Grahl wrote:
> Hello,
> I am struggling with a simple issue:
>       I have a data set of monthly time series that spans 2000:01 to 2012:12
>       I want to run "pca" in a subset, from 2000:01 to 2011:12,
>       So I change the sample using "smpl ; 2011:12" command before
> running "pca" command and save all the principal components.
>       But when I change back to full sample I can see the principal
> components running through all my sample -- so I assume the "pca" runs
> in the full dataset and does not respect the "smpl" command.
> Is this the case? If so, how do I run "pca" in a subset of data?

I don't know if it's the case (which would look like a bug), but a
clunky workaround would be to save copies of the original data and set
all the post-2011m12 values to missing there. You can then also compare
the saved components with the first ones to see if indeed there's a
difference (there shouldn't be one if the sample selection is honored).
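
A minimal sketch of that workaround, assuming hypothetical series names x1, x2, x3 and relying on gretl leaving a newly created series missing outside the current sample range:

```
# Hypothetical series names: replace x1 x2 x3 with the actual variables
smpl 2000:01 2011:12
series x1_sub = x1      # copies are filled only within the current sample
series x2_sub = x2
series x3_sub = x3
smpl full

# run PCA on the copies and save all components for comparison
pca x1_sub x2_sub x3_sub --save-all
```

The saved components can then be compared with those obtained from the original series under the restricted sample.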



Dr. Paulo Gustavo Grahl, CFA
+55(21) 8809-9254