I apologise for not following up the replies to my e-mail earlier this
week, but I have been preoccupied with other business - in part
the result of difficulties in trying to get Linux to work reliably on
a new Thinkpad instead of Windows Vista. The distributions I started
with - Mandriva & Ubuntu - were unable to handle the wireless network
and other hardware properly. In addition, VMware Server, which I use
to run Windows XP in a virtual machine, has become a large unwieldy beast.
Anyway, the problem raised in Allin's response really concerns the
correct way of dealing with directories when writing sample
scripts. I have assumed that both of the scripts (example and
function) plus the data are stored in the user's working
directory. The missing function identified by Allin, "sfa_eff", is one
component of the new (renamed) functions that I have not uploaded to
the server, because I wanted to ensure that everything is properly
documented and working reliably under different operating systems.
On my Windows system my working directory is E:\gretl\sfa_work, but
it is different under Linux & Mac OS X. Maybe every sample script
should include an explicit statement of this assumption, along the
following lines:
# IMPORTANT NOTE:
# 1. Before executing this script, store the dataset xxx.gdt and the
# script yyy.inp in a single directory - e.g. d:\gretl\work
# 2. Use the GUI command File/Working directory to set this directory
# - e.g. d:\gretl\work - as your working directory.
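Alternatively - if I have understood the "set workdir" command
correctly, and assuming it is available on all platforms - the script
could set the directory itself, so the note becomes self-enforcing.
The path below is just a placeholder:

```
# hypothetical: point gretl at the directory holding the files
# before opening them (adjust the path for your own system)
set workdir d:\gretl\work
open xxx.gdt
include yyy.inp
```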
I can't access the ftp server at Wake Forest, so I have not checked
whether the problem with numerical derivatives still occurs in the
most recent Windows snapshot.
I have prepared a gdt file of the famous Galton data showing regression
toward the mean (see the data description below). I propose adding
this to gretl's sample data files.
Also attached is a plot of the data that I found somewhere on the
net. I was wondering whether it is possible to reproduce a similar
graph using gretl. This would be useful.
What would also be very useful is the ability to add an arbitrary line
(or curve) to a graph by entering its formula in gretl's plot
controls. Currently we can't even add a 45-degree line.
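For what it's worth, something along the following lines might already
get close to that plot, assuming a galton.gdt file with series parent
and child, and assuming I have the literal-gnuplot braces syntax
right; the arrow is a stand-in for the missing 45-degree line:

```
open galton.gdt
# scatter of child against parent, with literal gnuplot commands
# passed in braces (syntax assumed, not verified)
gnuplot child parent { set title "Galton's height data" ; \
  set xlabel "mid-parent height (inches)" ; \
  set ylabel "child height (inches)" ; \
  set arrow from 64,64 to 73,73 nohead ; }
```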
Francis Galton's Mid-parent Child Height Data
Galton's famous comparison of the heights of 928 adult children with
those of their 205 pairs of parents. It shows that when the parents
are taller than the median, their children tend to be shorter and when
the parents are shorter than the median, their children tend to be
taller. Galton termed this "regression towards mediocrity."
parent = Mid-parent height in inches (Range 64 - 73)
child = Child height in inches (Range 61.7 - 73.7)
Galton, F., "Regression Towards Mediocrity in Hereditary Stature,"
Journal of the Anthropological Institute of Great Britain and Ireland,
15, 246--263, 1886.
"Remember not only to say the right thing in the right place, but far
more difficult still, to leave unsaid the wrong thing at the tempting
moment." - Benjamin Franklin (1706-1790)
Did the e-mail that I sent reporting a possible bug in the most
recent version of mle get through? I am not sure what happens to
postings that have attachments. Anyway, the problem is still there
in the most recent Windows CVS version.
An edited version of the report follows:
"However, in experimenting with it I have come across what appears to
be a bug associated with the introduction of the --numeric option. I
have run an identical script (without the --numeric option) using the
Windows CVS versions dated 30Sep08 and 3Oct08. Everything executes
properly when the 30Sep08 version is installed, but when the 3Oct08
version is installed the code for numeric derivatives fails saying
that the numeric Hessian cannot be computed. Unfortunately, the
difference in behaviour is not consistent: some scripts run OK under
both versions while others do not."
Output from mle using numeric derivatives:
Panel Stochastic Frontier Analysis: Normal-Truncated Normal
Production Function specification with Time-Varying Inefficiency
Starting values generated by the program
Using numerical derivatives
Tolerance = 1e-005
Failed to compute numerical Hessian
Error executing script: halting
> matrix sfa_coeff2 = sfa_panel_mod(lnwidgets, xlist, "P", "Y", "N")
Following up the replies to my questions:
A. The redundant functions on the server which should be removed are
(a) sfa_eff, and (b) sfa_het_eff. These have been replaced by
renamed versions for consistency with the rest of my functions.
C. I will see what I can do about making my test dataset available
to other users.
D. The new option for mle & nls. Thank you for implementing this
change. Going a little further, is there any way of controlling the
option via a flag? At the moment I have different but almost
identical blocks of code that are executed if a flag is set to
request numeric or analytical derivatives. It would be nice, though
not essential, if the "-n" option could be parameterised, but I don't
know whether this is possible and, if so, how.
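For what it's worth, one way this might be fudged is via string
substitution, so that a single block serves both cases. The key
assumption - untested - is that @-substitution works on the "end mle"
line; the toy likelihood below (a normal mean/variance fit on the
stock data4-1 file) is only there to make the sketch self-contained:

```
# hypothetical sketch: choose the derivative mode via a flag
open data4-1
scalar use_numeric = 1
string opt = use_numeric ? "--numeric" : ""
scalar mu = mean(price)
scalar s2 = var(price)
mle logl = -0.5*log(2*$pi*s2) - 0.5*(price - mu)^2/s2
    params mu s2
end mle @opt
```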
Using pvalue(n,x) instead of 1-cnorm(x). I have made this
change. It seems to improve the behaviour of mle a little,
but it is still difficult to avoid the gradient search blowing up in
various circumstances. Strangely, the use of analytical derivatives
seems to create particular problems when I start the procedure from
the coefficient values obtained using a restricted specification
which implies that a particular parameter eta is zero. It is clear
that the likelihood function isn't globally concave, but using
numerical derivatives seems to be a lot more robust in such circumstances.
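The numerical point is easy to see: for large x, cnorm(x) rounds to 1
in double precision, so 1-cnorm(x) underflows to exactly zero (and its
log to minus infinity), whereas pvalue computes the upper tail
directly. A quick check along these lines, using the same pvalue(n, x)
form as in my scripts:

```
scalar x = 10
printf "1 - cnorm(x) = %g\n", 1 - cnorm(x)   # 0 in double precision
printf "pvalue(n, x) = %g\n", pvalue(n, x)   # small but positive
```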
I have revised and extended my collection of functions to estimate
stochastic frontier models. Now, the complete set includes versions
for panel and non-panel data, both with and without heteroskedastic errors.
I have some queries before I upload the new versions to the server.
A. I have changed the conventions that I have used to name
functions. As a result the new versions will not just overwrite the
old versions. Is there a method by which I can delete or deprecate
the old versions, so as to ensure that users get access to the
current ones?
B. Allin mentioned the possibility of providing multiple entry
points in a function package. Has this been implemented? At the
moment, each of my function packages provides (i) a function for
estimating the relevant model, and (ii) another function for
obtaining the efficiency residuals using the estimated model. In the
case of the panel functions there are subsidiary functions that are
shared between the functions in the package. Hence, it would be
convenient to load the functions as a package with multiple entry
points rather than as separate functions.
C. The example runs for the non-panel functions use one of Greene's
data files from the standard gretl install package. However, I
haven't identified a suitable test dataset for the panel functions,
so I have created my own test data (an extension of the test data used by
Stata). Is there a way of uploading this data along with the sample
programs so that users can experiment with a known example?
D. I have mentioned previously that the functions can easily run
into non-concave regions, which can cause mle using analytical
derivatives to fail. [The root problem appears to lie in the
behaviour of the function cnorm(x) for large values of x, since the
likelihood function contains terms involving 1/(1-cnorm(x)). The
fix-up causes discontinuities in the analytical derivatives, but not
in the numerical derivatives.] This is a particular problem for the
models with heteroskedastic errors. In my experiments I have found
that mle using numerical derivatives is less likely to fail, though
it may be slow. The current versions are programmed to use
analytical derivatives, but it would be easy, though a bit tedious,
to provide alternative versions that use numerical derivatives - just
an extra option in the calling routine leading to a branch in
selecting the versions of mle that are used. The disadvantage is
that the function packages will become rather large - in excess of
1,000 lines of code each. Before I do this, I would like to enquire
whether it might be possible to embed this directly as an option in
mle, rather than via the presence of deriv or parameters commands.
Lately I have found that, when I write a script that does some kind of
estimation, most of the time I have to write a long and boring function
to display the results "nicely".
So I thought this kind of thing could be done once and for all via a
command. The patch you'll find attached[*] implements a "modprint"
command, which I believe will turn out useful to people like me, Ignacio,
Gordon, Sven, Stefano, Franck etc.
In practice, once you have your estimates, you pack your estimated
coefficients and their standard errors into an n x 2 matrix (call it X),
store their names in a string (call it parnames), using the comma as a
separator, and issue the modprint command as follows:
modprint parnames X
If you have additional statistics that you want printed, you collect
them in a column vector (call it addstats), which you specify as a
third argument.
An example script is also attached, which should hopefully clarify what I
have in mind. Bear in mind this is still preliminary work; my main idea as
of now is to hear your comments.
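For concreteness, usage might look like this minimal sketch (the
numbers are made up, and the argument order follows the description
above):

```
# fake estimates: coefficients in column 1, standard errors in column 2
matrix X = { 1.52, 0.31 ; -0.74, 0.12 }
string parnames = "alpha,beta"
modprint parnames X
# optional extra statistics go in a column vector as a third argument
matrix addstats = { 0.95 ; 2.01 }
modprint parnames X addstats
```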
[*] How do you apply the patch? Simple:
1) save the diff file somewhere.
2) from the unix shell, go to your gretl source main directory (the one
you run ./configure from)
3) be sure you have a fresh CVS source; run cvs up if necessary
4) issue the command
patch -p0 < /path/where/you/saved/the/diff/file/modprint.diff
5) run make etcetera
Riccardo (Jack) Lucchetti
Dipartimento di Economia
Università Politecnica delle Marche