On Sat, 23 Jan 2010, Riccardo (Jack) Lucchetti wrote:
> On Fri, 22 Jan 2010, Henrique wrote:
>
>> Dear Gretl community,
>>
>> I want to create an index from five variables using gretl's principal
>> components analysis, and I would like to know if I'm doing it properly.
>> I'll describe my steps:
>>
>> Step 1: Compute principal components (Main window, View -> Principal
>> components);
>> Step 2: Save all components (PC1, ..., PC5);
>> Step 3: index = PC1 + ... + PC5.
>
> Sorry, Henrique, maybe I'm missing the point, but what are you trying to
> do? Principal components are useful exactly because they are orthogonal
> (uncorrelated, if you prefer) to each other, so they carry non-overlapping
> information. If you want an index that contains the maximum possible
> amount of information that one single variable can contain, take the first
> PC (the one associated with the highest eigenvalue) and you're ok. If you
> take their sum you end up with a variable that contains LESS information,
> not more.

I gather via Google that there's a literature out there on
constructing index variables of one sort or another on the basis
of PCA. I suppose if you add up all the components you get some
sort of weighted sum of the original series, which might perhaps
be useful in certain contexts (supposing that you _want_ to lose
information relative to the original set of data).
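As a rough check on the "weighted sum" remark, here is a minimal
sketch, assuming the data7-12 sample file and the default behaviour
of pca (the list name X is an arbitrary choice for illustration). The
correlations among the saved components should be zero off the
diagonal, and since each component is a linear combination of the
standardized series, regressing the sum of the components on the
original series plus a constant should give an essentially perfect fit:
<script>
open data7-12
# X is just a convenient name for the five characteristics
list X = wbase length width height weight
pca X --save-all
# the saved components should be mutually uncorrelated
corr PC1 PC2 PC3 PC4 PC5
# sum of all components, as in the proposed index
series idx = PC1 + PC2 + PC3 + PC4 + PC5
# idx is an exact linear function of the originals, so the
# R-squared here should be 1 up to rounding
ols idx 0 X
</script>
The coefficients from that last regression would be the weights that
the summed index implicitly places on each original series.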
This untutored example doesn't prove anything at all but I found
it moderately interesting:
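<script>
# sample data on cars' characteristics and prices
open data7-12
# principal components of five characteristics; save all as PC1..PC5
pca wbase length width height weight --save-all
# candidate indices: sum of all PCs, and plain sum of the originals
series idx = PC1 + PC2 + PC3 + PC4 + PC5
series xsum = wbase + length + width + height + weight
# regress price on a constant plus each candidate in turn
ols price 0 idx
ols price 0 xsum
ols price 0 PC1
ols price 0 weight
</script>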
The sum of PCs of cars' characteristics predicts price better than
the sum of characteristics (which is hardly meaningful), and a bit
better than PC1 alone, but it doesn't do nearly as well as the
most relevant of the original variables, namely the weight of the
car.
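To put the four fits side by side, one possible sketch, assuming
R-squared is an acceptable yardstick for "predicts better" (that
choice is an assumption made for this illustration, as is the output
layout):
<script>
open data7-12
pca wbase length width height weight --save-all
series idx = PC1 + PC2 + PC3 + PC4 + PC5
series xsum = wbase + length + width + height + weight
# run each one-regressor model quietly and print its R-squared
loop foreach i idx xsum PC1 weight
    ols price 0 $i --quiet
    printf "%-8s R-squared = %.4f\n", "$i", $rsq
endloop
</script>
With a single regressor in each model the R-squared values are
directly comparable, so no adjustment for the number of regressors
is needed.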
Allin.