Hello all panel-interested people,
while using gretl for teaching with panel data (which I hadn't done much
before) I noticed the following, let's say, interface nuisances compared
to the usual luxury gretl offers for time series:
1: The sample and/or range in the main window (bottom) are given as pure
index numbers, even if "panel data marker strings" (cf. user guide p.23)
are defined. At least for the time dimension it would be useful to show
the sample periods in a human-readable form (through the markers). Also,
I noticed that the period numbers shown do not always coincide with the
values of the "time" index variable, if subsampling is in effect. (Seen
in the CEL.gdt dataset after applying the sample restriction year>1970.)
1b: A slightly more general suggestion, also for non-panel data: The
active sample restriction criterion could be shown next to the resulting
active sample in the main window. (At least for simple restrictions,
maybe not for complex, multiple ones.)
2: Menu Sample -> Set range: Only the group range can be chosen, not the
periods. Actually, given the often arbitrary ordering of groups, this is
really the less useful dimension to choose a contiguous range from. (I
know I can use "set sample based on criterion" for periods, but that's
not the point.)
3: About pshrink(): A version that returns a full panel series (with
repeated values like pmean() etc.) could be useful -- practical example:
in growth regressions one needs the initial value of output-per-worker
as a regressor. Also maybe it should be called "pfirst()" or something
similar.
4: Time-constant variables: I'm not sure how to create variables that
only vary along the cross-section, as is done with the built-in
pmean() etc. functions. Or how to append them (like the user guide p.114
"adding a time series", but along the other panel dimension).
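To make points 3 and 4 concrete, here is a small sketch (in Python, not
hansl, and with a function name of my own choosing) of the logic a
hypothetical "pfirst()" could implement: take each unit's first-period
value and repeat it across all T periods, giving a full-length,
time-constant panel series in the same way pmean() broadcasts the group
mean.

```python
def pfirst(y, T):
    """y: a panel series stacked unit by unit, with len(y) a multiple
    of T. Returns a list of the same length in which each unit's
    first-period value is repeated T times (a time-constant series)."""
    out = []
    for i in range(0, len(y), T):   # i indexes the start of each unit's block
        out.extend([y[i]] * T)      # broadcast the initial value over T periods
    return out

# two units, three periods each: the initial output-per-worker values
# (1.0 and 10.0) become regressor-ready time-constant series
y = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
print(pfirst(y, T=3))  # [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]
```

The same pattern (compute one value per unit, then repeat it T times)
covers any cross-section-only variable, which is what point 4 asks for.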
5: Constant in a fixed-effects regression: I don't understand what gretl
reports as the global constant term in a fixed-effects model, and it
doesn't seem to be defined in the guide. It's also confusing that gretl
complains if one wants to discard the constant in the specification
dialog (when fixed effects are selected). (But obviously gretl
estimates the right thing as the comparison with explicit LSDV
regression shows, just the constant is mysterious -- even if it's the
average of the fixed effects it's not clear where the standard errors
come from.)
6: Lags not showing in model spec dialog when sample is restricted to a
single period: If I restrict the CEL.gdt data with year==1985, I cannot
include any previously created lags (of y for example) in the
regression, because they don't show up in the variable selector. Because
the subsampled dataset is now treated as "undated", there's also no
"lags..." button in the dialog. -- Actually I don't understand why gretl
"temporarily forgets" the panel structure of the dataset when a single
period is active. It would seem less problematic to treat even a T=1
sample as a special case of panel data if the underlying dataset has a
panel structure; especially in conjunction with point 1 above about
showing the selected periods in the sample.
Ok, that was a long post, sorry, but still necessary I think.
[The developments mentioned here are in CVS but not yet in the
snapshots -- I'm waiting until the new "save data as binary"
facility has stabilized before I prepare new snapshots, but hopefully
that should happen soon.]
1. The bug reported by Artur in
2. Binary data files: I've now implemented Jack's suggestion in
So here's how things stand: the --binary option to "store" is gone;
instead, to save data as binary, use the new ".gdtb" filename
extension, as in
store foo.gdtb <optional-series-list>
This results in creation of a zip file containing data.xml
(metadata) and data.bin (binary component). The compression level of
the zip file is governed by the --gzipped option, with a default
level of 1 (relatively fast, "light" compression).
The binary data option is not yet fully represented in the GUI.
Before I do that I'd like to get reactions to a suggestion.
Obviously, adding a new native datafile type will complicate the GUI
somewhat. At the same time I'd like to remove some old complexity
that I think is no longer needed.
Under the menu "/File/Open data" we have both "User file", which
leads to a file selector equipped with a long list of file types
(both native and "import" formats) and an "import" item which lists
the supported import formats. I'd like to get rid of the "import"
item since it really just repeats in a different form what's
available under "User file". And similarly for the "Append data"
menu: I'd like to switch to a single file selector with a drop-down
list of filters.
There's one possible downside of this, but I doubt that it's a real
issue. That is, suppose one has a misnamed file (e.g. a plain text
file with ".xls" extension). In principle you could handle this by
selecting "text/CSV" from the current import menu (hence forcing the
interpretation of the file) and then using the "All files" filter in
the dialog, or manually typing the spurious filename. The option of
pre-forcing the interpretation of a file regardless of its name
would be gone under my suggestion. But how many users have ever done
this sort of thing? (Besides, this is a work-around for a problem
that has an obvious and simple correct solution: rename the file in
question. If you're smart enough to figure out that the "import"
menu allows you to work around broken filenames, you're smart enough
to fix the file!)
I recently updated my system to Mint-KDE 16, and when I want to
compile gretl I get "Please install libcurl ..."
However, I already have libcurl: running "sudo apt-get install
libcurl3" says I have the recent version. I thought maybe I need the
dev package but could not see anything like libcurl3-dev.
Also, saying "--without-gnome" during configure gives "unrecognized
options: --without-gnome". Since this has changed now, are there other
new things to be careful about? More specifically, what would be a
good configure command for 64 bit Linux Mint KDE? (For a long time I
have been using "./configure --prefix=/opt/gretl --without-gnome")
Here's the "RFC" I promised in response to Sven's posting at
Saving a gretl dataset in binary form
1. Why? Because for very large datasets it is _much_ faster to save
(and to reload) data in binary form than as text. For details see
2. How? At present (in CVS) this can be done only by passing the
--binary option to gretl's "store" command (where the name of the file
to be saved has the ".gdt" extension). There is not yet any means of
doing this via the GUI, pending further discussion of potential issues.
3. File details: At present if you use the --binary option gretl
writes two files, a gdt file containing exactly the same sort of
metadata that is stored in a regular (pure XML) gdt file and a file
with extension ".bdt" containing double-precision floating-point
values written in the platform endianness. The gdt file contains an
attribute named "binary" in the "gretldata" element, which has value
either "little-endian" or "big-endian". This attribute is "IMPLIED"
(in XML jargon) and is omitted in pure XML gdt files. When gretl
finds the "binary" attribute in a gdt file it knows to open the
companion bdt file, which must have the same name but with the ".gdt"
suffix replaced by ".bdt".
4. Recent change: As of CVS of 2014-01-16 the bdt file contains a
small header, by way of sanity check. This is a string that says
either "gretl-bdt:little-endian" or "gretl-bdt:big-endian", in either
case padded to 24 bytes with nuls. Gretl will not proceed to read the
file if this header is missing, or if the endianness it indicates
disagrees with that stated in the gdt file. If anyone wants to read a
gretl binary data file created before this change, the shell script
fixbin.sh (attached) can be used to update the bdt file.
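The header format described in points 3 and 4 is easy to reproduce for
testing. Below is a hedged sketch (function names are mine, not gretl's)
that writes and validates a bdt-style blob: the tag
"gretl-bdt:little-endian" or "gretl-bdt:big-endian", nul-padded to 24
bytes, followed by raw doubles in the platform's endianness.

```python
import struct
import sys

HEADER_LEN = 24  # per the post: tag padded to 24 bytes with nuls

def make_bdt(values):
    """Build a bdt-style byte string: 24-byte header plus native-endian
    doubles ('=' in struct means native byte order, standard sizes)."""
    tag = ("gretl-bdt:little-endian" if sys.byteorder == "little"
           else "gretl-bdt:big-endian")
    header = tag.encode("ascii").ljust(HEADER_LEN, b"\x00")
    return header + struct.pack("=%dd" % len(values), *values)

def check_bdt_header(blob):
    """Mimic gretl's sanity check: refuse to proceed unless the 24-byte
    header carries one of the two known tags; return the endianness."""
    tag = blob[:HEADER_LEN].rstrip(b"\x00").decode("ascii", "replace")
    if tag not in ("gretl-bdt:little-endian", "gretl-bdt:big-endian"):
        raise ValueError("missing or unknown bdt header")
    return tag.split(":", 1)[1]  # "little-endian" or "big-endian"
```

A pre-change bdt file (no header) would fail check_bdt_header(), which
is exactly the situation fixbin.sh is meant to repair.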
5. Discussion: When I first introduced this idea, Sven and Jack
remarked that it would be desirable to use an extension other than
".gdt" for the XML component of a metadata/binary pair of files on the
new pattern, so as to avoid potential confusion. I can see the case
for this, but I'm not sure it's a good idea. I explain my misgivings
below.
Internally, a gdt file is just a gdt file, regardless of whether it
has a binary companion file: it's XML conforming to a common DTD. The
functions to read and write such files are in common. There are many
places in the gretl code where it's assumed that a native data file
has the ".gdt" extension and it would be a pain to go through all of
those and adjust for the possibility of another extension. In other
words, there's no internal rationale for a different extension; this
would be purely for users' convenience.
But would it in fact be convenient for users? So far as GUI use is
concerned, I don't see any reason why users should care. The format is
mostly "hidden": all you have to bother with is (say) marking a check
box saying "Use binary format" if you have a huge data file and
write/read speed is an issue. (And I'm thinking this box might not be
shown for datasets smaller than some reasonable threshold.) It would
seem "fussy" to have a drop-down selector for different extensions in
file dialogs pertaining to native gretl data files.
It's true, there is some possibility of confusion in CLI use. The main
issue I see is that someone might save a dataset as binary, then later
decide to send it to a colleague or move it to another directory: in
that case she has to know to send/move the bdt file as well as the gdt.
Of course, she'd have to know to do that even if the XML component
were named differently, but there would be some visual clue if it
had, say, an ".mdt" suffix. On the other hand, it would be easy enough
to check the size of the gdt file: if you've stored tens or hundreds
of MB of data and the gdt file is 3K, there's your clue. We could also
provide a little command-line helper program that tests a gdt file and
tells you some stuff about it, including whether it has a binary
companion (this could also be provided as a GUI menu item).
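Such a helper would be simple, since the marker is just the optional
"binary" attribute on the gretldata element described in point 3. A
hedged sketch of the core check (assuming plain XML input; real gdt
files may be gzipped, which this ignores):

```python
import xml.etree.ElementTree as ET

def binary_companion(xml_text):
    """Return the declared endianness ("little-endian"/"big-endian")
    if the gdt announces a bdt companion, else None for pure-XML gdt
    files, where the attribute is omitted (it is IMPLIED in the DTD)."""
    root = ET.fromstring(xml_text)
    if root.tag != "gretldata":
        raise ValueError("not a gretl data file")
    return root.get("binary")  # None when the attribute is absent
```

So the helper could print "has binary companion (little-endian)" or
"pure XML" depending on the return value.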
Besides, there would be some possibility of confusion if we used
different extensions. Suppose you first have a dataset in pure XML
format, then you decide to save as XML + binary. Then maybe you make
some changes to the data and re-save. Now some time later you want to
reopen it: which file should you open? OK, it's not very hard to look
at the last modification dates of the files to see which is more
recent, but there's no uncertainty if a native data-save uses just one
extension.
I have a txt-file with no headers. Gretl informs me correctly that "it
seems there are no variable names".
However, the imported variables do not include the first row, which should
not happen, right?
regarding the recently discussed binary workfile format, I don't see the
option in the GUI save-as dialog -- perhaps on purpose?
And slightly different: If I "export" data in the GUI, I don't get the
compression level setting in the file save dialog which I do have under
All with the latest (yesterday's) Windows snapshot.
an existing script of mine stopped working at the line:
set RNG SFMT
I suspect this is related to these changelog entries:
" Enable use of arrays in the SFMT random number generator"
" Internals: add support for multiple, independent PRNGs"
But is it a bug, or a deliberate change?