On Mon, 9 Feb 2009, Sven Schreiber wrote:
> On 09.02.2009 21:35, Allin Cottrell wrote:
> > Thinks: we could ban renaming of variables that have parent status
> > -- maybe not a bad idea.
> >
> Hey hey hey -- it's still a free country, isn't it?
> Seriously, I don't think that the approach of behind-the-scenes,
> trying to be smarter than the user will work. IMHO if somebody
> wants to have forecast errors based on lag polynomials, then
> these polynomials should be made explicit. Which brings to mind
> the lag specification dialogs in the VAR case for example. To me
> it seems that something like that would be needed for
> forecasting in the single-equation case. (It also reminds me of
> PcGive again, but that's not a bad thing, because they get the
> dynamic stuff right...)
[Warning: long discursive reply!]
Yes, I will admit that there are diminishing returns to pursuing
our current design strategy, and that we may have something to
learn from PcGive.
The thing is that for historical reasons -- but not _just_
historical reasons, because it works very nicely, up to a point --
we have a very simple basic mechanism for specifying a regression
in gretl. Namely, the "list": an array of integers. The first
value tells you how many values follow, and the remaining values
are, first, the dependent variable (i.e. the position of that
variable within a two-dimensional array of series), then the
independent variables (again, as indices into the data array).
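To make the layout concrete, here is a minimal sketch in C of how such a list might be read. The accessor names (sketch_depvar and friends) are hypothetical and are not taken from gretl's source; only the layout itself (count first, then dependent variable, then regressors) comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessors for a regression "list" laid out as
   described above: list[0] = number of values that follow,
   list[1] = dependent variable, list[2..] = regressors.
   The names are illustrative, not gretl's API. */

int sketch_depvar (const int *list)
{
    assert(list != NULL && list[0] >= 1);
    return list[1];
}

int sketch_n_regressors (const int *list)
{
    assert(list != NULL && list[0] >= 1);
    return list[0] - 1;
}

/* i counts regressors from 0 */
int sketch_regressor (const int *list, int i)
{
    assert(i >= 0 && i < sketch_n_regressors(list));
    return list[2 + i];
}
```

For instance, the array {3, 1, 0, 2} would describe a regression of series 1 on series 0 and series 2.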
It's easy to extend this pattern to estimators such as two-stage
least squares (which requires a distinct list of instruments) by
defining a "list separator" integer value; and we have a
well-honed suite of functions, in gretl_list.c, that will pick apart or
splice sub-lists defined in this way. And it's not difficult to
extend the mechanism further, by passing an additional integer
(not part of the list itself) to certain estimators, to specify a
global lag order (as in VARs and VECMs).
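The separator idea might be sketched as follows. The separator value and the function names are placeholders chosen for illustration; they are not necessarily what gretl_list.c uses. The only assumption about the separator is that it cannot be mistaken for a valid series index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical separator: a negative value can never be a
   valid index into the data array. Not gretl's actual value. */
#define SKETCH_LISTSEP (-1)

/* Return the position of the separator within the list
   (1-based, as list[0] is the count), or 0 if there is none. */
int sketch_sep_position (const int *list)
{
    int i;

    assert(list != NULL);
    for (i = 1; i <= list[0]; i++) {
        if (list[i] == SKETCH_LISTSEP) {
            return i;
        }
    }
    return 0;
}

/* Copy the sub-list that follows the separator (e.g. the
   instruments) into out, which the caller must size to hold
   at least list[0] + 1 ints. Returns the sub-list length. */
int sketch_sublist_after_sep (const int *list, int *out)
{
    int pos = sketch_sep_position(list);
    int n = (pos == 0) ? 0 : list[0] - pos;
    int i;

    out[0] = n;
    for (i = 1; i <= n; i++) {
        out[i] = list[pos + i];
    }
    return n;
}
```

With this encoding, {6, 1, 0, 2, -1, 0, 3} would read as: dependent variable 1, regressors 0 and 2, instruments 0 and 3.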
It's maybe worth remarking that, in the case of GUI dialogs that
present a more "structured" appearance with regard to lags, what's
going on in the background (for the most part) is that these
structured representations are converted into "list" form for
internal consumption. That is, the required lags are generated as
additional series in the data array, and a list is composed
containing the indices of the lagged variables (as if they were
just "any old" variables).
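A rough sketch of that conversion, under simplifying assumptions: the data array is modelled as a small struct (sketch_data, hypothetical), NaN stands in for a missing value, and the function names are invented for illustration rather than taken from gretl.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical stand-in for the data array */
typedef struct {
    int v;        /* number of series */
    int n;        /* observations per series */
    double **Z;   /* Z[i][t]: series i at observation t */
} sketch_data;

/* Append lag `lag` of series `src` as a new series.
   Returns the new series' index, or -1 on allocation failure.
   NaN marks the observations lost at the start (an assumption). */
int sketch_add_lag (sketch_data *d, int src, int lag)
{
    double **Z;
    double *x;
    int t;

    assert(d != NULL && src >= 0 && src < d->v && lag > 0);
    x = malloc(d->n * sizeof *x);
    if (x == NULL) return -1;
    Z = realloc(d->Z, (d->v + 1) * sizeof *Z);
    if (Z == NULL) {
        free(x);
        return -1;
    }
    d->Z = Z;
    for (t = 0; t < d->n; t++) {
        x[t] = (t < lag) ? NAN : d->Z[src][t - lag];
    }
    d->Z[d->v] = x;
    return d->v++;
}

/* Generate lags 1..maxlag of `src` and return a list of their
   indices (list[0] = maxlag). On failure returns NULL; any
   series already added are left in place. */
int *sketch_lag_list (sketch_data *d, int src, int maxlag)
{
    int *list;
    int k;

    assert(maxlag > 0);
    list = malloc((maxlag + 1) * sizeof *list);
    if (list == NULL) return NULL;
    list[0] = maxlag;
    for (k = 1; k <= maxlag; k++) {
        list[k] = sketch_add_lag(d, src, k);
        if (list[k] < 0) {
            free(list);
            return NULL;
        }
    }
    return list;
}
```

The point is that the resulting list carries nothing but plain series indices: the fact that they are lags of `src` is not recorded in it.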
Now, if we had designed from scratch a program for analysis of
time series, it's unlikely we would have chosen such a
representation. The "list", although it can contain several
sub-lists, is inherently one-dimensional, and so is not a very
convenient mechanism for specifying tuples of the form (variable,
lag specification).
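By way of contrast, a (variable, lag specification) tuple might look something like the following in C. This is only a sketch: the type and function names are hypothetical, and a contiguous lag range is assumed for simplicity (a lag specification with gaps would need something richer).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (variable, lag specification) tuple */
typedef struct {
    int varnum;   /* index of the series in the data array */
    int minlag;   /* first lag included (0 = contemporaneous) */
    int maxlag;   /* last lag included, >= minlag */
} sketch_term;

/* Number of regressors a single term expands to */
int sketch_term_count (const sketch_term *term)
{
    assert(term != NULL && term->maxlag >= term->minlag);
    return term->maxlag - term->minlag + 1;
}

/* Total number of regressors implied by an array of terms */
int sketch_total_regressors (const sketch_term *terms, int n_terms)
{
    int i, total = 0;

    for (i = 0; i < n_terms; i++) {
        total += sketch_term_count(&terms[i]);
    }
    return total;
}
```

Here the association between a series and its lags is explicit, which is exactly what a flat array of indices does not give you.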
Note that the design issue here is a matter of relative
convenience and clarity, not possibility/impossibility. The
Turing machine has a linear tape, yet can compute any computable
function. We could certainly pack fully explicit lag-polynomial
specifications into a "list" (perhaps with the use of more than
one "separator" integer). And this may be worth considering.
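One way such packing could go, purely as an illustration: use one separator between a variable and its lags, and a second between successive terms. Both separator values and the function name below are hypothetical, and the sketch assumes lags are non-negative so that they cannot collide with the (negative) separators.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical separators, not gretl's */
#define SKETCH_TERMSEP (-1)  /* between successive terms */
#define SKETCH_LAGSEP  (-2)  /* between a variable and its lags */

/* Assumed layout: list[0] = count, list[1] = dependent variable,
   then terms of the form  var, LAGSEP, lag, lag, ...  separated
   by TERMSEP. Returns the number of lags recorded for series
   `varnum`, or -1 if that series does not appear as a term. */
int sketch_packed_lag_count (const int *list, int varnum)
{
    int i = 2;

    assert(list != NULL);
    while (i <= list[0]) {
        int v = list[i];
        int n = 0;

        i++;
        if (i <= list[0] && list[i] == SKETCH_LAGSEP) {
            i++;
            while (i <= list[0] && list[i] != SKETCH_TERMSEP) {
                n++;
                i++;
            }
        }
        i++;  /* step past TERMSEP, or past the end of the list */
        if (v == varnum) {
            return n;
        }
    }
    return -1;
}
```

So {11, 1, 2, -2, 0, 1, 2, -1, 3, -2, 1, 4} would encode series 1 regressed on lags 0, 1 and 2 of series 2 plus lags 1 and 4 of series 3. It works, but the amount of index bookkeeping needed just to read it back is a fair indication of the convenience issue.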
But maybe the time is approaching when we should think about
substituting a more structured data type for the regression
"list".
This is a bit daunting, since the gretl code base contains
hundreds if not thousands of calls to regression-related functions
that currently expect a list argument, but it's doable.
Allin.