Hello panel devotees,
I believe I'm seeing a bug in cumulative sample restrictions in the panel
context. Before I can set up a minimal example, let me describe,
hopefully in enough detail, what I'm doing, what's failing, and which
workaround exists. I can't supply the dataset (yet) because
it's big.
The full data is structured and recognized by gretl as follows (gretl's
German output, rendered in English here):
Full data range: 1:001 - 5860:900 (n = 5274000)
The data are monthly and in general heavily unbalanced. The index
variables, derived from $unit and $obsdate, are "stationindex" and "iso"
(for ISO date format).
For certain data cleaning reasons I'm first applying the following
restriction:
(1) smpl STATIONS_ID==291 --restrict --replace
which gives me:
Full dataset: 5274000 observations
Current sample: 1288 observations
So far, so good. Then I want to keep only those obs where a certain
variable "temp" is non-missing, so I do:
(2) smpl ok(temp) --restrict
which then yields:
Full dataset: 5274000 observations
Current sample: 1:001 - 2:156 (n = 312)
Notice that gretl figured out by itself that this subsample has a valid
and feasible panel structure; I didn't tell it that. Does this side
effect matter? I don't know.
At this point let me demonstrate that there are no missings in this
subsample; in the console, asking for "=sum(missing(dataset))" yields 0.
Finally I want to retain only the most recent time period, and since
"iso" is my time variable, I try to apply this restriction:
(3) smpl iso==max(iso) --restrict
In theory there should be at least one observation where any given
variable takes on its own max (and in fact in this case there are two
such obs) -- however, now I'm getting an error message:
"No obs would be left!" (retranslated)
This can't be true, can it? So I tried the following workaround in place
of step (3): starting from scratch, I apply restrictions (1) and (2)
again and then create a helper series:
series checkiso = iso==max(iso)
And indeed, "=sum(checkiso)" gives me the correct answer 2. Now I try
the equivalent workaround restriction:
(3b) smpl checkiso --restrict
... and it works nicely, as expected, leaving only two obs in the active
sample!
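
For convenience, the whole sequence can be collected into one script. This is only a sketch built from the commands quoted above: it assumes the (unshared) dataset is already open with the series STATIONS_ID, temp and iso, and the two printf lines are additions for checking the counts, not part of my original session.

```
# Sketch only: assumes the panel dataset described above is already open
# step (1): keep one station, replacing any earlier restriction
smpl STATIONS_ID==291 --restrict --replace
# step (2): cumulative restriction to non-missing temp
smpl ok(temp) --restrict
# step (3) fails at this point with "No obs would be left!":
# smpl iso==max(iso) --restrict
# workaround: build the dummy first, then restrict on it
series checkiso = iso==max(iso)
printf "obs with iso at its max: %d\n", sum(checkiso)
# step (3b)
smpl checkiso --restrict
printf "obs left in sample: %d\n", $nobs
```

If step (3) is uncommented instead of using the helper series, the error should show up right after step (2), at least for station values where the subsample is recognized as a panel.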
So why does step (3) not work? I believe it's a bug, but maybe I'm
missing something.
Ah, one more remark: for other values in the step-(1) restriction I'm
seeing that gretl does not recognize a panel dataset after step (2),
and apparently in those cases there is no problem! That's why I mentioned
it above; maybe it's relevant.
Sorry for the long message, I hope I was clear enough.
thanks
sven