On Wed, 29 Feb 2012, Lee Adkins wrote (in response to Talha):
> These people were from State Planning Agency. They told me that they
> have about 60 series (which have different levels of collinearity) and
> they use SPSS to do principal components analysis to create regional
> development indices (Turkey has 81 provinces). I am not very familiar
> with pca but I can call them and learn more.
>
I think you just need the eigenvectors. Put all the series into a matrix,
get the eigenvectors, peel off the first few, and put these into
series; these are the principal components.

You can do PCA manually if you like, but I guess Talha's
students were not too keen on that idea. But in fact gretl has
a perfectly good "pca" command (accessible via GUI and script)
that takes a list of series as input, and outputs either all
of the components or just those whose associated eigenvalues
are above the mean. By default the correlation matrix is used
as the basis for the PCs but there's an option to use the
covariance matrix instead.
In current CVS and snapshots I've enhanced the pca command
slightly: there's now a --quiet option if you don't care to
see all the eigen-analysis, and the --save option now takes an
optional integer parameter so you can specify exactly how many
components to save.
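
A minimal sketch of the usage described above, assuming a gretl build
recent enough to include --quiet and the integer parameter to --save.
The data4-1 sample file and the choice of series are only illustrative
assumptions, not anything from the State Planning Agency data:

```hansl
# illustration only: data4-1 is a small sample dataset shipped with gretl
open data4-1
list X = sqft bedrms baths

# suppress the eigen-analysis printout and save exactly two components
# (the correlation matrix is the default basis)
pca X --save=2 --quiet

# the saved components should appear in the dataset as PC1 and PC2
print PC1 PC2 --byobs

# same thing, but based on the covariance matrix instead
pca X --covariance --quiet
```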
>>> 2)- Some students suggested (and all others agreed) that it would be
>>> very useful to have a predict command, which will provide predicted
>>> values as well as slopes (given Xs) for various nonlinear models such
>>> as polynomial regressions, logit, probit etc. I think this could be
>>> nice to have as a command as well as a GUI entry next to the forecast
>>> item. Maybe a small goodie to consider for the 2.0 release? They said
>>> Stata has this.
>>
>> I don't see what the difference is between "predicted values"
>> and what we offer already (in sample fitted values and
>> out-of-sample forecasts). Can you expand on what you mean?
> Maybe I just didn't know how to fully use gretl in this
> context. The issue arose on 2 occasions:
> (1) I had a polynomial regression and I was showing them to enter from
> the GUI the command something like:
> prediction = $coeff[1] + $coeff[2]*x + $coeff[3]*x^2
Yes, in gretl that sort of thing is required if you want to
predict for some x-value that's not in the dataset. Or you
could do it by adding one observation to the dataset,
containing the x-values for which you want a prediction, then
asking for an out of sample forecast. I guess we could add a
variant of "fcast" (or maybe a new command-word is wanted),
for use after estimating a model, which takes a vector of
x-values as input and produces the prediction plus a standard
error. That would be quite simple.
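
In the meantime, the manual route can be sketched in a script as
follows. This is a rough illustration, not a built-in facility: the
dataset (data4-1), the quadratic specification and the evaluation
point x0 = 2000 are all assumptions made up for the example. Note that
the standard error computed here is that of the fitted mean only; it
does not include the variance of the disturbance.

```hansl
# illustration only: quadratic regression on a gretl sample dataset
open data4-1
series sqft2 = sqft^2
ols price const sqft sqft2 --quiet

# hypothetical x-value at which a prediction is wanted
scalar x0 = 2000
matrix xrow = {1, x0, x0^2}

# point prediction: x'b
scalar pred = xrow * $coeff
# standard error of the fitted mean: sqrt(x' V x)
scalar se = sqrt(qform(xrow, $vcv))
# slope of the fitted curve at x0
scalar slope = $coeff[2] + 2 * $coeff[3] * x0

printf "x0 = %g: prediction %g, s.e. %g, slope %g\n", x0, pred, se, slope
```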
> (2) I was showing an ordered logit example and I had long commands like:
> pcut0 = 1 / (1+exp(-$coeff[1]-x*$coeff[2])+exp(-$coeff[1]-x*$coeff[3]))
> pcut1= exp(-$coeff[1]-x*$coeff[2]) /
> (1+exp(-$coeff[1]-x*$coeff[2])+exp(-$coeff[1]-x*$coeff[3]))
> pcut2= exp(-$coeff[1]-x*$coeff[3]) /
> (1+exp(-$coeff[1]-x*$coeff[2])+exp(-$coeff[1]-x*$coeff[3]))
>
> ...and they said Stata (supposedly) has a command where you enter x
> and get the prediction and slope for different models :-P
Predictions for nonlinear limited dependent variable models are
trickier; as Lee says, there are various things one might want to
see under this heading, and gretl's offerings are limited. This
is a case where people can write gretl addons to do the job.
Open source suggests perhaps more modularity than one gets
with proprietary software like Stata (though nearly
everything it does is executed in .do or .ado add-ons).
What we discussed in Torun was the idea that Allin and Jack
would work on the back-bone and that others would try to
develop the expertise with the bundle concept to add
specific functionality. So, the question is, is enhancing
prediction or marginal effects a back-bone issue or an
add-on? (I'm not sure)
Good question, but I tend to think it could be addressed by
addons (though the writers of such addons might reasonably
request some additional built-in functions to make their lives
easier).
My idea of a back-bone issue would be the introduction of
factor variables (version 2.0). In it, variables are
defined as being continuous or discrete. They can be
interacted in various combinations (continuous-discrete,
continuous-continuous and discrete-discrete) very easily
within Stata's .do files.
Gretl does have the continuous/discrete distinction. But we
could make more use of it.
Allin