On Thu, 23 Feb 2012, Allin Cottrell wrote:
I know we discussed gretl 2.0 issues a while back, and I know I need to
re-read what was said then, so that previous work is not just discarded. But
in the meantime here are a few thoughts off the top of my head.
One general question: do we want to make a big deal of gretl 2.0, or do we
want to "do a Linus"?
[ ... ]
I don't think we are in a position to gallantly ignore the psychological
and image implications of a major version number change like Linus did.
Compared to Linux, I'd venture to say that gretl is a slightly less
recognisable brand.
In gretl's case it would be quite natural to roll over from 1.9.9
to 2.0.0.
As Jack says, we could instead roll to 1.10.0 (or 1.9.10) but multi-digit
minor numbers are perhaps a little unsightly and liable to cause confusion in
some contexts.
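
One concrete context where a multi-digit minor number can cause confusion is
plain string comparison of version labels. The following is a minimal Python
sketch, not anything taken from gretl itself; parse_version is a hypothetical
helper introduced only for illustration.

```python
def parse_version(text):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


old, new = "1.9.9", "1.10.0"

# Plain string comparison works character by character, so the "1" in
# "1.10.0" is compared with the "9" in "1.9.9" and the newer release
# appears to be the older one.
string_says_newer = new > old

# Comparing the numeric components gives the intended ordering.
tuple_says_newer = parse_version(new) > parse_version(old)

print(string_says_newer, tuple_says_newer)
```

Any script or sorted file listing that treats version labels as plain strings
would be exposed to this, whereas a rollover from 1.9.9 to 2.0.0 orders
correctly either way.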
IMHO, there would be more confusion after "Linussing"; casual users would
simply go "WOW! Oh, wait... what...?"
OK, so much for doing a Linus: what's the case for making 2.0 a real
milestone? (And deferring that milestone until something special is ready.) I
can see a few possibilities (there may be more). Version 2.0 should contain
one or more of:
1) Major new functionality
2) Major changes in the GUI
3) A major backward-incompatible clean-up of hansl
4) A major change in the libgretl API, to make it easier for third
parties to use
5) A major purge of bugs and update/completion of documentation
My current thinking (sorry if this is disappointing!) is that number 5
provides the strongest case for a "2.0 milestone". Let's go through the
list.
* Major new functionality: Well, if we're talking C code, then at present
that means stuff that Jack and I will produce. I put my view on this at the
2011 gretl conference: I think we now have a good enough baseline that people
ought to be able to add functionality to gretl in the form of function
packages and "addons". I certainly stand ready to fix bugs and tweak the C
code (including the GUI code and the "gretl server" infrastructure) to make
that easier. But right now I myself have no plans to add major econometric
functionality in C form. Jack has been working on substantial new stuff, but
in the form of (brilliant) hansl code rather than C.
Thanks for the kind words, but in my experience hansl is absolutely and by
far the best language to work in from an applied econometrician's
viewpoint, so producing nice hansl code for doing even hard stuff is
surprisingly easy.
You may think that mine is a slightly biased opinion, having contributed
to the creation of hansl itself: it's a bit like Linus Torvalds saying he
likes Linux or Larry Wall saying he prefers Perl to Python (cue for Sven to
start the flamefest). You'd probably be right, but let me add one element:
unlike LT and LW, coding is not my job and I never even had any formal
training in it. My job is being an academic economist (and a
bass player, but I digress). I think I can claim I have a much better
understanding of the needs of the end user than LT or LW can have, because
I'm one of them!
And I think we can take as a fact that most applied economists learn how
to use _one_ econometrics package or perhaps two, depending on their field
of activity; some (surely not the majority) learn how to program in _one_
language; a few become proficient in that language; only a tiny minority
can claim to be at ease with more than one language. Older time-series
people like Gauss or Ox, younger fellows go with Matlab, the micro
community worships Stata, the eccentric brag about R, corporate
practitioners are not even aware that anything other than Eviews exists,
some pockets of resistance still exist in the TSP mountains, etcetera. I
have some familiarity with all of these, and I'm ready to defend the point
that NOTHING BEATS HANSL.
There are only three areas in which I see the necessity of low-level code
work (but maybe I'm missing something):
a) Massive parallelisation is definitely the future of scientific
computation. So far, we have cautiously explored some possibilities, but
the time will come when properly parallelising the internals of gretl will
become unavoidable. But that's not for 2.0; I see it more as a 3.0 thing.
b) Both hansl and gretl (heh) may strongly benefit from setting up an
infrastructure for managing data sets like Stata does. That is, do the
things that, ideally, you'd use an RDBMS for, but you can't ask an applied
economist to study SQL, can you? I'm talking about merging, splitting and
sorting datasets, keeping or dropping variables and cases, etcetera. Anybody
who's ever worked with large micro databases knows exactly what I'm
talking about. Stata is, to my knowledge, the only econometrics package
that attempts to do this and does it, in my opinion, badly.
c) There may be a case for extending the way data are stored in gretl
from a double-only representation to a more general one. This would enable
us to have string and int variables. Allin and I talked a little about
this in Toruń, but this is HUGE. The project currently contains about
400,000 lines of C code, and my guesstimate is that at least half of this
would have to be thoroughly revised, if not rewritten. Allin already has
done some rationalisation work in libgretl which makes this a little
easier, but it's a loooooooong way away.
* Major changes in the GUI: That's up to me alone, and I have no plans in
that area. Nor do I expect to have time to implement truly big ideas that
others may come up with, though I'm always ready to consider incremental
improvements and bug fixes.
100% agree. (Notice my elegant silence on the issue of decimal
separators.)
* Major backward-incompatible clean-up of hansl: consistency and cleanliness
are good, but so is continuing backward compatibility. I can surely see a
case for scrapping some archaisms. But I seem to recall some folk wisdom from
computer science: the production of a backward-incompatible "cleaned up"
version 2 of language L often results in fragmentation of the user base and
decline of L.
IMO, it's a matter of common sense. We may scrap a few things that really
are obsolete, but that's it.
* Clean-up of the libgretl API: A good idea. But this can be done without
much (if any) change that is visible to users of gretl itself, so it's
probably not very pertinent to the "2.0" question.
* Purge of bugs and update/completion of documentation: Here I can really get
on board. One conception of gretl 2.0 is that it has achieved a degree of
maturity where we have squashed as many bugs as we can find over an extended
period of testing, and have documented in a reasonably comprehensible and
cross-referenced form all that the program can do.
Agree.
Riccardo (Jack) Lucchetti
Dipartimento di Economia
Università Politecnica delle Marche
r.lucchetti(a)univpm.it
http://www.econ.univpm.it/lucchetti