Hi everybody,
this is a bit of an advanced panel data handling problem, but at the
same time practically relevant I think... Perhaps it might even qualify
as a bug, but let's see: I'm wondering whether appending a second panel
dataset (say, file2.gdt) to an open panel dataset (say, file1.gdt) is
made "too" simple in gretl, potentially resulting in a rubbish combined
dataset. The core problem seems to be that 'append' matches panel units
by their internal numbering, even though this numbering isn't really
meaningful.
Let me try to explain: The attached "file1_test.gdt" holds unbalanced
data for 4 panel units, while "file2_test.gdt" holds data for 5 panel
units. The overall time dimension for both is the same. With the 1st
dataset loaded, when I append the 2nd dataset (simple 'append', no
'join' or whatever), gretl does that without any complaints.
But suppose that the 4 panel units in file1 do _not_ correspond to the
first 4 units in file2. This is something that can easily happen in real
life. In that case the append operation pairs up the wrong units.
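
To make the mismatch concrete, here is a minimal sketch in plain Python
(not gretl code). The unit labels are hypothetical and are not taken
from the attached files; the sketch only contrasts pairing units by
position with pairing them by an identifier.

```python
# Hypothetical unit labels; the attached test files may look different.
file1_units = ["AT", "DE", "FR", "IT"]        # 4 units in the open dataset
file2_units = ["BE", "AT", "DE", "FR", "IT"]  # 5 units, one extra at the front


def match_by_position(units_a, units_b):
    """Pair units by internal index: 1st with 1st, 2nd with 2nd, ..."""
    return list(zip(units_a, units_b))


def match_by_label(units_a, units_b):
    """Pair units by their identifier, ignoring where they sit in the file."""
    in_b = set(units_b)
    return [(u, u) for u in units_a if u in in_b]


positional = match_by_position(file1_units, file2_units)
by_label = match_by_label(file1_units, file2_units)

# Pairs where positional matching combines two different units.
mismatched = [(a, b) for a, b in positional if a != b]
print(mismatched)
```

With these made-up labels every positional pair combines two different
units, while matching by label recovers the four units the files share.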
In addition, I tried a variation where the two panel data structures
were defined properly with panel index variables that had no missing
values. Still, 'append' doesn't care about that and always just uses
the internal unit numbering of each file separately.
Yes, I'm aware that I can get the desired result using 'join', that's
not the point.
The point is about the following claim from the 'append' command
reference: "In the case of adding series, compatibility requires either
(a) that the number of observations for the new data equals that for the
current data, or (b) that the new data carries clear observation
information so that gretl can work out how to place the values." -- In
my example, condition (a) is clearly not met, and I'd argue that either
condition (b) is violated as well, or the panel-index-variable
information is simply ignored by 'append'. Perhaps
'append' should refuse to proceed in my example case, or it should ask
the user, or it should make use of the panel index variables, but
arguably it should not use the arbitrary internal unit numbers.
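
Just to illustrate the kind of behaviour I have in mind, here is a rough
sketch in plain Python. It is not how gretl works internally; the
function name plan_append and the numeric unit IDs are made up for this
example. It refuses to proceed when no unit identifiers are available,
and otherwise places the new units by identifier rather than by
position.

```python
def plan_append(open_ids, new_ids):
    """Map positions in the new file to positions in the open dataset.

    Matching is done on unit identifiers only. Returns (placed, unmatched):
    placed maps new-file position -> open-dataset position, and unmatched
    lists identifiers from the new file that the open dataset lacks.
    """
    if open_ids is None or new_ids is None:
        # Without identifiers the only option would be positional matching.
        raise ValueError("no unit identifiers: refusing to match by position")
    if len(set(open_ids)) != len(open_ids) or len(set(new_ids)) != len(new_ids):
        raise ValueError("duplicate unit identifiers")

    position = {uid: i for i, uid in enumerate(open_ids)}
    placed = {j: position[uid] for j, uid in enumerate(new_ids) if uid in position}
    unmatched = [uid for uid in new_ids if uid not in position]
    return placed, unmatched


# Hypothetical IDs: the new file has one extra unit (5) in first position.
placed, unmatched = plan_append([10, 20, 30, 40], [5, 10, 20, 30, 40])
print(placed, unmatched)
```

What to do with the unmatched units (drop them, add them as new units,
or ask the user) is a separate design question that this sketch leaves
open.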
I hope that the problem description was clear enough!
I'm curious what your reactions and answers will be. Thanks,
sven