Allin Cottrell wrote:
Marcin B. has been in touch with me about providing an accessor or
function such that you could retrieve ACF/PACF values for further
analysis, and I thought I'd bring this issue to the list.
IMO it would be a great feature to have.
This could work, but my alternative thought was to provide a
function acf() -- or corrgm(), the name doesn't matter much --
that works in a "stand-alone" manner to provide an acf vector.
This seemed a little more "natural" to me than using an accessor.
Now I take your point that one might also want the partial
autocorrelations, or for that matter the cross-correlogram of two
variables. We could do this by providing three functions, e.g.,
acf(y, p)
pacf(y, p)
xcf(x, y, p)
where x and y are series and p is the maximum lag. It would be
possible to combine acf and pacf into one function by using a
third boolean parameter to distinguish the cases -- or, I suppose,
we could have a corrgm() function which returns a p x 2 matrix with
the ACF in column 1 and the PACF in column 2. I'm not sure which
of these options is best.
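
To make the options above concrete, here is a minimal sketch in Python
(not gretl code) of what acf(), pacf() and a combined corrgm() might
compute. The function names follow the proposal; the use of the
Durbin-Levinson recursion for the PACF and the list-of-rows return type
are assumptions made purely for illustration.

```python
def acf(y, p):
    """Sample autocorrelations of the series y at lags 1..p."""
    n = len(y)
    if not 0 < p < n:
        raise ValueError("p must satisfy 0 < p < len(y)")
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, p + 1)]


def pacf(y, p):
    """Partial autocorrelations at lags 1..p (Durbin-Levinson recursion).

    Assumption: this recursion is one common way to get the PACF;
    an actual implementation might use regressions instead.
    """
    r = acf(y, p)
    phi_prev = []  # AR coefficients of order k-1
    out = []
    for k in range(1, p + 1):
        if k == 1:
            phi_kk = r[0]
        else:
            num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j]
                                 for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
        # Update the order-k coefficients from the order-(k-1) ones.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def corrgm(y, p):
    """Combined variant: p rows, ACF in column 1 and PACF in column 2."""
    return [[a, b] for a, b in zip(acf(y, p), pacf(y, p))]
```

With separate functions a caller pays only for what is asked for; the
combined corrgm() always computes both columns.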
I have no strong opinions here, but for heavy-duty number-crunching in
simulations I guess it would be better to only calculate the results
that are really needed to save cpu time. So no automatic returning of
both ACF and PACF unless it is explicitly requested.
Also, I would be in favor of not introducing too many new function
names, so that you can quickly find what you want in the function
documentation. So I would prefer having one function, say 'acf' (but
there may be better names), with a parameter that determines whether
you want the standard acf, the pacf, or the cross-correlation.
In general I would suggest having only one data argument, which must be
either a series, a list of series, or a matrix. Some further ideas:
If an n-element list or an n-column matrix is provided, the dimension of
the returned matrix depends on the extra parameter:
* if standard acf or pacf is requested, then there would be n columns,
one for each variable in the input (and p rows for the lags)
* if cross-correlation is requested, there would be 0.5*n*(n-1) columns
(or thereabouts....) for all combinations of variables, and something
like 2*p+1 rows for leads and lags including contemporaneous
correlation. The documentation would tell you the column index for
the (i,j) combination of variables. (This is inspired by the
cross-spectrum functions in R; I don't remember the formula right now,
but of course it's easy to reconstruct once you try.)
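
As a rough illustration of the layout described above, here is a small
Python sketch of the output dimensions and of one possible column
index for the (i,j) pair. The ordering (0,1), (0,2), ..., (1,2), ... is
an assumption chosen for simplicity, not necessarily the one R uses,
and all function names here are hypothetical.

```python
def xcf_shape(n, p):
    """(rows, columns) of the cross-correlation matrix for n series:
    2*p+1 rows (lags -p..p) and one column per unordered pair."""
    return (2 * p + 1, n * (n - 1) // 2)


def lag_to_row(k, p):
    """0-based row holding lag k, with rows running from -p to +p."""
    if not -p <= k <= p:
        raise ValueError("lag out of range")
    return k + p


def pair_to_column(i, j, n):
    """0-based column for the pair (i, j), 0 <= i < j < n.

    Assumed ordering: (0,1), (0,2), ..., (0,n-1), (1,2), ...
    """
    if not 0 <= i < j < n:
        raise ValueError("need 0 <= i < j < n")
    # Pairs whose first index is below i, plus the offset within row i.
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def column_to_pair(c, n):
    """Inverse of pair_to_column under the same assumed ordering."""
    if not 0 <= c < n * (n - 1) // 2:
        raise ValueError("column out of range")
    i = 0
    while c >= n - 1 - i:
        c -= n - 1 - i
        i += 1
    return (i, i + 1 + c)
```

For example, with n = 4 series and p = 3 the matrix would have 7 rows
and 6 columns, and the pair (1,3) would sit in column 4 (0-based) under
this ordering.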
Once we have all that, we only need functions that return some spectral
weighting windows and we're all set for more frequency-domain analysis!
So, gretl's future is bright (as always... ;-)
thanks,
sven