I'd like to put to people an issue that Jack and I have been
discussing lately, and that seems to require a
backward-incompatible change at some point.

By way of background, recall that user-defined functions in gretl
are supposed to respect the principle of encapsulation.
Basically, this means that

* external variables are accessible within a function only insofar
  as they are passed as arguments; and

* no variables are changed at the caller level other than via
  assignment of a return value offered by a function.

In addition, program parameters set with the "set" command inside
a function are confined to that function: we save the values at
the caller level and restore them on exit from the function.  And
when a model is estimated within a function, this does not
displace the "last model" (the target for accessors such as
$coeff) from the point of view of the caller: for the caller, the
last model is still the one estimated before the function was
called.

So far so good, but the trouble is that the current handling of
lists, both as arguments to functions and as return values, is not
wholly consistent with encapsulation.  Sven noted one aspect of
this a while back; Jack noticed another aspect recently.

(1) Sven's point: Suppose a function constructs and returns a list
of variables.  As a toy example, suppose it returns a list holding
the cubes of the variables given in a list argument.  As a
function writer, one might construct the names of the variables in
the return list as <varname_i>_3, where <varname_i> represents the
name of the variable at position i in the input list.

The issue is this: it's true that nothing will be changed at the
caller level unless this return value is assigned, but all the
same such assignment may have an unintended effect.  If there
already exists, say, a variable named "x_3", and if a variable "x"
is passed to the cube function, then x_3 will be overwritten by
the cube of x.
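
To make the hazard concrete, here is a minimal sketch under the
current behaviour, where "$i" in a foreach loop still stands for the
name of the series.  The function name cubes_old and the toy data are
made up for illustration only.

```
function cubes_old (list xlist)
    list retlist = null
    loop foreach i xlist
        # "$i" is replaced by the series name, so for x this
        # creates a series named x_3 inside the function
        series $i_3 = $i^3
        list retlist += $i_3
    end loop
    return list retlist
end function

nulldata 10
series x = 2
# a pre-existing series that has nothing to do with the function
series x_3 = 99
list xl = x
# assigning the return value replaces the caller's x_3
list cubes = cubes_old(xl)
```

After the last line x_3 should no longer hold 99 but the cube of x,
although the caller never mentioned x_3 in the call.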

I would argue that this is not an insurmountable problem.  I think
we can avoid excessive trouble by ensuring that any public gretl
function packages that do this sort of thing are very explicit in
their documentation.  We could add a required field to gretl
function packages that return lists -- a warning along the lines
of "This function returns a list of variables named on the pattern
<pattern>.  Please note that any existing variables of the same
name will be overwritten."

(2) Jack's point: Consider a function that accepts a list as an
argument, and suppose that the function is called with a list
argument that includes a series named "x".

Now let's say the writer of this function wants to create a local
scalar called "x".  What happens?  You'll get a type error, from
trying to overwrite a series with a scalar.  There can be only one
variable named "x" inside the function, and the series x is made
"visible" within the function by virtue of its inclusion in a list
argument.

And what happens if the function writer tries to create a local
series named "x"?  If the list argument is marked "const" you'll
get an error; if not, the external "x" will be silently
overwritten.  Bad.
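
A minimal sketch of the first case (the local scalar), again under
the current behaviour.  The function name local_clash and the data
are hypothetical, chosen just to trigger the collision.

```
function local_clash (list xlist)
    # If xlist contains a series named x, this line fails with a
    # type error: x is already visible here as a series
    scalar x = 1
    return scalar x
end function

nulldata 10
series x = 2
list xl = x
scalar s = local_clash(xl)
```

The function writer cannot guard against this, since the names in
the list argument are not known when the function is written.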

A moment's thought shows that this problem is less tractable than
the first: it's a design flaw, and it can't be solved simply by
documenting clearly what the function does.

Here's a proposal for solving the second problem.  It is
implemented in CVS but is not enabled by default: to try it you
have to edit lib/src/gretl_func.c and change the definition

  #define PROTECT_LISTS 0

to

  #define PROTECT_LISTS 1

The essential component is that, when a series is passed to a
function as part of a list argument, the series is _not_
visible/accessible by name within the function.  In relation to
the problem cases mentioned above, it will then be OK for the
function writer to create a local scalar or series "x": this won't
collide with a list-argument series "x" since the latter will
never be "seen" by name within the function.

Accordingly, we modify the action of a "foreach i" loop over the
members of a list argument to a function.  Instead of "$i"
retrieving the name of the series at position i in the list, it
retrieves the ID number of that series (and series ID numbers are
always unique, regardless of where we are in terms of function
execution).

Two questions are likely to arise here:

1. How do you construct a new series name based on the name of a
series given via a list argument (as in the cubes example)?

Answer: You can get the input variable's name as a string variable
using the built-in varname() function, with the variable's ID
number as argument.

2. How do you get hold of the values of a series given via a list
argument, for use on the right-hand side of a genr expression?

Answer: Use the new built-in function varcopy().  Again, this
takes a variable's ID number as argument, circumventing the
possible collision of names.

Illustration: Take the trivial cubes example -- here is a
new-style implementation:

  function make_cubes (list xlist)
    list retlist = null
    loop foreach i xlist
      # $i is now the ID number of the series, not its name
      string oldname = varname($i)
      sprintf newname "%14s_3", oldname
      # fetch the values by ID number and cube them
      series @newname = varcopy($i)^3
      setinfo @newname -d "cube of @oldname"
      list retlist += @newname
    end loop
    return list retlist
  end function
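
For completeness, a rough sketch of how this would be used from the
caller's side.  It assumes a build with PROTECT_LISTS set to 1 and
the make_cubes definition above; the data and the names xl and cubes
are made up.

```
nulldata 10
series x = 2
series y = 3
list xl = x y
# nothing changes at caller level until the return value is assigned
list cubes = make_cubes(xl)
print cubes --byobs
```

Note that the first problem is untouched by this: if a series with
one of the generated names already exists at the caller level, the
assignment will still overwrite it.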

Two questions for you: First, does this solution seem acceptable?
Second, if it's OK, how soon should we aim to make the change?

Allin.