[Gretl-devel] gretl functions and lists (long)

Saturday, 12 July 2008

I'd like to put to people an issue that Jack and I have been 
discussing lately, and that seems to require a 
backward-incompatible change at some point.

By way of background, recall that user-defined functions in gretl 
are supposed to respect the principle of encapsulation.  
Basically, this means that

* external variables are accessible within a function only insofar 
as they are passed as arguments; and

* no variables are changed at the caller level other than via 
assignment of a return value offered by a function.

In addition, settings of program parameters using the "set" 
command, within a function, are confined to that function: we save 
the values at the caller level and restore them on exit from the 
function.  And when a model is estimated within a function, this 
does not displace the "last model" (the target for accessors such 
as $coeff) from the point of view of the caller: this will be the 
last model estimated before the function was called.

So far so good, but the trouble is that the current handling of 
lists, both as arguments to functions and as return values, is not 
wholly consistent with encapsulation.  Sven noted one aspect of 
this a while back; Jack noticed another aspect recently.

(1) Sven's point: Suppose a function constructs and returns a list 
of variables.  As a toy example, suppose it returns a list holding 
the cubes of the variables given in a list argument.  As a 
function writer, one might construct the names of the variables in 
the return list as <varname_i>_3, where <varname_i> represents the 
name of the variable at position i in the input list.

The issue is this: it's true that nothing will be changed at the 
caller level unless this return value is assigned, but all the 
same such assignment may have an unintended effect.  If there 
already exists, say, a variable named "x_3", and if a variable "x" 
is passed to the cube function, then x_3 will be overwritten by 
the cube of x.
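
To make the hazard concrete, here is a toy model in Python rather than gretl script.  It is only a sketch: a plain dict stands in for the caller's dataset, and the function make_cubes below is a hypothetical stand-in, not gretl's implementation.

```python
# Toy model only: plain dicts stand in for gretl's dataset; this is not gretl code.

def make_cubes(workspace, names):
    """Return {new_name: values} holding the cube of each named series."""
    return {f"{name}_3": [v ** 3 for v in workspace[name]] for name in names}

# Caller-level variables: a series x and an unrelated, pre-existing x_3.
workspace = {"x": [1, 2, 3], "x_3": [10, 20, 30]}

result = make_cubes(workspace, ["x"])
unchanged = dict(workspace)   # snapshot: calling the function alone changes nothing

workspace.update(result)      # "assigning" the returned list adds its series by name
print(workspace["x_3"])       # the pre-existing x_3 now holds the cube of x
```

The function call itself leaves the caller's variables alone; it is the assignment of the return value, matched by name, that replaces the old x_3.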

I would argue that this is not an insurmountable problem.  I think 
we can avoid excessive trouble by ensuring that any public gretl 
function packages that do this sort of thing are very explicit in 
their documentation.  We could add a required field to gretl 
function packages that return lists -- a warning along the lines 
of "This function returns a list of variables named on the pattern 
<pattern>. Please note that any existing variables of the same 
name will be overwritten."

(2) Jack's point: Consider a function that accepts a list as an 
argument, and suppose that the function is called with a list 
argument that includes a series named "x".

Now let's say the writer of this function wants to create a local 
scalar called "x".  What happens?  You'll get a type error, from 
trying to overwrite a series with a scalar.  There can be only one 
variable named "x" inside the function, and the series x is made 
"visible" within the function by virtue of its inclusion in a list 
argument.

And what happens if the function writer tries to create a local 
series named "x"?  If the list argument is marked "const" you'll 
get an error; if not, the external "x" will be silently 
overwritten.  Bad.

A moment's thought shows that this problem is less tractable than 
the first: it's a design flaw, and it can't be solved simply by 
documenting clearly what the function does.

Here's a proposal for solving the second problem.  It is 
implemented in CVS but is not enabled by default: to try it, edit 
lib/src/gretl_func.c and change the definition

#define PROTECT_LISTS 0

to

#define PROTECT_LISTS 1

The essential component is that, when a series is passed to a 
function as part of a list argument, the series is _not_ 
visible/accessible by name within the function.  In relation to 
the problem cases mentioned above, it will then be OK for the 
function writer to create a local scalar or series "x": this won't 
collide with a list-argument series "x" since the latter will 
never be "seen" by name within the function.

Accordingly, we modify the action of a "foreach i" loop over the 
members of a list argument to a function.  Instead of "$i" 
retrieving the name of the series at position i in the list, it 
retrieves the ID number of that series (and series ID numbers are 
always unique, regardless of where we are in terms of function 
execution).

Two questions are likely to arise here:

1. How do you construct a new series name based on the name of a 
series given via a list argument (as in the cubes example)?

Answer: You can get the input variable's name as a string variable 
using the built-in varname() function, with the variable's ID 
number as argument.

2. How do you get hold of the values of a series given via a list 
argument, for use on the right-hand side of a genr expression?

Answer: Use the new built-in function varcopy().  Again, this 
takes a variable's ID number as argument, circumventing the 
possible collision of names.

Illustration: Take the trivial cubes example.  Here is a 
new-style implementation:

function make_cubes (list xlist)
   list retlist = null
   loop foreach i xlist
      string oldname = varname($i)
      sprintf newname "%14s_3", oldname
      series @newname = varcopy($i)^3
      setinfo @newname -d "cube of @oldname"
      list retlist += @newname
   end loop
   return list retlist
end function
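
For readers who want to see the mechanics of the ID-based scheme in isolation, here is a toy model in Python.  It is a sketch under stated assumptions, not gretl's implementation: the Dataset class is hypothetical, its varname and varcopy methods merely mimic the built-ins described above, and it glosses over how returned series are handed back to the caller.

```python
# Toy model only: the class and helper names are hypothetical, not gretl's API.

class Dataset:
    """Series stored by ID number; names are just labels attached to IDs."""

    def __init__(self):
        self.names = []    # names[i] is the name of the series with ID i
        self.values = []   # values[i] is the data of the series with ID i

    def add(self, name, data):
        self.names.append(name)
        self.values.append(list(data))
        return len(self.names) - 1          # the new series' ID

    def varname(self, series_id):
        return self.names[series_id]

    def varcopy(self, series_id):
        return list(self.values[series_id])  # a copy, so the original is untouched


def make_cubes(dset, id_list):
    local = {}         # function-local names; the caller's names are not visible here
    local["x"] = 3     # a local scalar "x" does not collide with the caller's series x
    new_ids = []
    for series_id in id_list:               # the loop index is an ID, not a name
        newname = dset.varname(series_id) + "_3"
        cubes = [v ** local["x"] for v in dset.varcopy(series_id)]
        new_ids.append(dset.add(newname, cubes))
    return new_ids


dset = Dataset()
x_id = dset.add("x", [1, 2, 3])
ret = make_cubes(dset, [x_id])
print(dset.varname(ret[0]), dset.values[ret[0]])
print(dset.values[x_id])
```

The point of the sketch is that the function never looks up the caller's "x" by name, so its own local "x" is harmless and the input series is left unchanged.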

Two questions for you: First, does this solution seem acceptable? 
Second, if it's OK, how soon should we aim to make the change?

Allin.
