Re: [Gretl-users] Principal Components

Saturday, 23 January 2010

On Sat, 23 Jan 2010, Riccardo (Jack) Lucchetti wrote:

...
 On Fri, 22 Jan 2010, Henrique wrote:

 > Dear Gretl community,
 >
 > I want to create an index from five variables using gretl's principal
 > components analysis, and I would like to know if I'm doing it properly.
 > Here are my steps:
 >
 > Step 1: Compute principal components (Main window, View -> Principal
 > components);
 > Step 2: Save all components (PC1, ..., PC5);
 > Step 3: index = PC1 + ... + PC5.
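
For readers who want to see what Steps 1 to 3 compute, here is a rough sketch in Python/NumPy rather than gretl. It assumes the components come from the correlation matrix, i.e. from standardized series, which I believe is the default for gretl's `pca`. The scaling of the scores may differ from gretl's, the helper name `principal_components` is hypothetical, and the data are synthetic.

```python
import numpy as np

def principal_components(X):
    """Return (eigenvalues, eigenvectors, scores) for the columns of X.

    Assumption: PCA on the correlation matrix, i.e. on standardized
    columns; other software may scale the scores differently.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each column
    R = np.corrcoef(X, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # returned in ascending order
    order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                               # column j holds PC(j+1)
    return eigvals, eigvecs, scores

# Synthetic stand-in for five correlated series (not real data).
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
X = common + 0.5 * rng.normal(size=(200, 5))

eigvals, eigvecs, pcs = principal_components(X)   # Steps 1 and 2
index = pcs.sum(axis=1)                            # Step 3: PC1 + ... + PC5
```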

 Sorry, Henrique, maybe I'm missing the point, but what are you trying to
 do? Principal components are useful exactly because they are orthogonal
 (uncorrelated, if you prefer) to each other, so they carry non-overlapping
 information. If you want an index that contains the maximum possible
 amount of information that one single variable can contain, take the first
 PC (the one associated with the highest eigenvalue) and you're ok. If you
 take their sum you end up with a variable that contains LESS information,
 not more.

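Both parts of Jack's point can be checked numerically: the sample covariance matrix of the components is diagonal, so they are uncorrelated, and PC1 has the largest variance (equal to the largest eigenvalue) of any unit-length combination of the standardized series. A sketch in Python/NumPy, assuming correlation-matrix PCA and using synthetic data rather than any real file:

```python
import numpy as np

# Synthetic data: five series driven by one common factor (an assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1)) + 0.5 * rng.normal(size=(300, 5))

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardized series
eigvals, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
eigvals, V = eigvals[::-1], V[:, ::-1]               # largest eigenvalue first
pcs = Z @ V                                          # PC1 ... PC5 as columns

# Covariance matrix of the components: diagonal, with the eigenvalues on
# the diagonal, so the PCs are mutually uncorrelated.
C = np.cov(pcs, rowvar=False)
print(np.round(C, 6))
print("share of total variance in PC1:", eigvals[0] / eigvals.sum())
```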
I gather via Google that there's a literature out there on
constructing index variables of one sort or another on the basis
of PCA. I suppose if you add up all the components you get some
sort of weighted sum of the original series, which might perhaps
be useful in certain contexts (supposing that you _want_ to lose
information relative to the original set of data).
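
The weighted-sum intuition can be made precise. If Z holds the standardized series and V the eigenvectors, the components are Z V, so their sum is Z (V 1): a fixed weighted sum of the standardized series, with weights given by the row sums of V. One caveat: each eigenvector is only defined up to sign, so those weights, and hence PC1 + ... + PC5, depend on whatever sign convention the software happens to use. A NumPy sketch, assuming correlation-matrix PCA and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(250, 1)) + 0.5 * rng.normal(size=(250, 5))

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardized series
_, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))  # eigenvectors as columns
pcs = Z @ V                                          # all five components

idx = pcs.sum(axis=1)   # PC1 + ... + PC5 (column order does not matter for a sum)
w = V.sum(axis=1)       # implied weight on each standardized series
print("implied weights:", np.round(w, 3))
print("idx equals Z @ w:", np.allclose(idx, Z @ w))
```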

This untutored example doesn't prove anything at all, but I found
it moderately interesting:

<script>
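open data7-12
# principal components of five car characteristics; save PC1 ... PC5
pca wbase length width height weight --save-all
# index = sum of all the components
series idx = PC1 + PC2 + PC3 + PC4 + PC5
# for comparison: the raw sum of the characteristics
series xsum = wbase + length + width + height + weight
# regress price on a constant (0) plus each candidate in turn
ols price 0 idx
ols price 0 xsum
ols price 0 PC1
ols price 0 weight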
</script>

The sum of the PCs of the cars' characteristics predicts price better
than the sum of the characteristics themselves (which is hardly
meaningful, since they are measured in different units), and a bit
better than PC1 alone, but it doesn't do nearly as well as the
most relevant of the original variables, namely the weight of the
car.
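
Each regression in the script has a constant and a single regressor, and in that case R-squared equals the squared correlation between the dependent variable and the regressor. So, if "predicts better" is judged by R-squared, the comparison amounts to ranking corr(price, x)^2 across the four candidates. Because that quantity is unchanged by rescaling x, the arbitrary scaling of the components plays no role here. A small check of the identity in NumPy, with synthetic data and a hypothetical helper `r_squared`:

```python
import numpy as np

def r_squared(y, x):
    """R-squared from OLS of y on a constant and a single regressor x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dev = y - y.mean()
    return 1.0 - (resid @ resid) / (dev @ dev)

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2.0 + 0.8 * x + rng.normal(size=100)   # synthetic data, not the car file

# The two numbers agree: R-squared equals the squared correlation.
print(r_squared(y, x), np.corrcoef(y, x)[0, 1] ** 2)
```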

Allin.
