lrvar bug (?) in panels
by Sven Schreiber
Hi,
I suspect that the lrvar function (long-run variance estimation) is
slightly wrong in the panel case. Consider the following example:
<hansl>
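function matrix lrvarFE(series x, int b "bandwidth")
    # average of the unit-wise long-run variances of x
    # (x is expected to have a zero mean within each unit)
    scalar N = max($unit)
    scalar out = 0
    loop i = 1..N
        # restrict the sample to panel unit i
        smpl i i --unit
        errorif(abs(mean(x)) > 1e-10, "whoops, found non-zero within mean")
        out += lrvar(x, b)
    endloop
    return out / N
end function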
# -- test case
open grunfeld
# do an explicit FE demeaning first, for easier comparison
panel kstock const
series kres = $uhat
eval lrvarFE(kres, 5) # 2.0608e+005
# comparison
eval lrvar(kres, 5) # 177215.62
# the following is wrong on purpose:
eval lrvar({kres}, 5) # same result as the previous line
</hansl>
Explanation: The last calculation (which is wrong on purpose) converts
the kres series directly into a matrix. Given gretl's panel storage
format of stacked time series, this means that the values for all units
are simply stacked in the resulting single column. The Bartlett kernel of
the lrvar() function then partly connects neighboring panel units,
because the lagged cross-products pair the last observations of one unit
with the first observations of the next. That link is spurious and
distorts the result.
Given that gretl's result for the plain series in the penultimate line is
identical to that, I guess that's what happens there, too.
Instead, the output of my lrvarFE function is what I believe to be the
correct one.
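The mechanism can be illustrated outside gretl. What follows is only a minimal Python sketch under stated assumptions: it uses the textbook Bartlett weights 1 - j/(b+1) and divides by the sample length, which may differ in detail from what lrvar() does internally. The function names and the toy data are made up for illustration. For a balanced toy panel it shows that the pooled (stacked) estimate equals the average of the unit-wise estimates plus a term consisting only of cross-products between different units.

```python
def bartlett_lrvar(x, b):
    """Bartlett-kernel long-run variance of an already demeaned sequence.

    Assumption: weights 1 - j/(b+1), autocovariances divided by len(x).
    """
    T = len(x)
    total = sum(v * v for v in x) / T
    for j in range(1, b + 1):
        w = 1.0 - j / (b + 1)
        gamma_j = sum(x[t] * x[t - j] for t in range(j, T)) / T
        total += 2.0 * w * gamma_j
    return total


def cross_unit_part(units, b):
    """Part of the pooled estimate that pairs observations of different units."""
    stacked = [v for u in units for v in u]
    owner = [i for i, u in enumerate(units) for _ in u]
    n = len(stacked)
    total = 0.0
    for j in range(1, b + 1):
        w = 1.0 - j / (b + 1)
        s = sum(stacked[t] * stacked[t - j]
                for t in range(j, n) if owner[t] != owner[t - j])
        total += 2.0 * w * s / n
    return total


# two hypothetical units, each with a zero within mean
unit1 = [1.0, -1.0, 2.0, -2.0]
unit2 = [3.0, 1.0, -1.0, -3.0]
units = [unit1, unit2]
b = 2

pooled = bartlett_lrvar(unit1 + unit2, b)            # stacked, as with {kres}
per_unit = sum(bartlett_lrvar(u, b) for u in units) / len(units)
spill = cross_unit_part(units, b)                    # spurious cross-unit terms

print(pooled, per_unit, spill)
```

With equal unit lengths, pooled and per_unit + spill coincide, so the gap between the two approaches is exactly the cross-unit term. With unbalanced units the weighting of the units would differ as well.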
The similar lrcovar() function is not directly affected, because it
takes a matrix instead of (a list of) series as input, and so it's the
caller's responsibility to handle the data correctly. (Although it would
be nice if it accepted a list, but that's a different issue.)
But I could imagine that the filter functions suffer from a similar
problem (bkfilt, bwfilt, hpfilt, perhaps pergm?, kdensity?); I haven't
checked that, though.
Does all that sound right?
Sven
pitfalls of appending panel datasets
by Sven Schreiber
Hi everybody,
this is a bit of an advanced panel data handling problem, but at the
same time practically relevant I think... Perhaps it might even qualify
as a bug, but let's see: I'm wondering whether appending a second panel
dataset (say, file2.gdt) to an open panel dataset (say, file1.gdt) is
perhaps made "too" simple with gretl, potentially resulting in a rubbish
combined dataset. The core problem seems to be that gretl (for
appending) matches the internal numbering of panel units, even though
this internal numbering isn't really meaningful.
Let me try to explain: The attached "file1_test.gdt" holds unbalanced
data for 4 panel units, while "file2_test.gdt" holds data for 5 panel
units. The overall time dimension for both is the same. With the 1st
dataset loaded, when I append the 2nd dataset (simple 'append', no
'join' or whatever), gretl does that without any complaints.
But suppose that the 4 panel units in file1 do _not_ correspond to the
first 4 units in file2. This is something that can easily happen in real
life. So the append operation doesn't get the matching right.
In addition, I tried a variation where the two panel data structures
were defined properly with panel index variables that had no missings.
Still, 'append' doesn't care about that, always just referring to the
internal but separate unit index numbering.
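To make the mismatch concrete, here is a small Python sketch. It is not gretl's actual 'append' logic, only a toy model of the behavior described above; the unit labels, values, and function names are invented for illustration. The position of an entry in each list stands in for the internal unit number.

```python
# Hypothetical toy data: (unit label, value) pairs; the list position
# plays the role of the internal unit numbering 1, 2, 3, ...
file1 = [("B", 10), ("C", 20), ("D", 30), ("E", 40)]        # 4 units
file2 = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)]  # 5 units


def merge_by_position(left, right):
    """Pair the i-th unit of left with the i-th unit of right, ignoring labels."""
    return [(l_id, r_id, l_val, r_val)
            for (l_id, l_val), (r_id, r_val) in zip(left, right)]


def merge_by_key(left, right):
    """Pair units sharing a label; a unit missing on one side gets None."""
    l, r = dict(left), dict(right)
    return [(k, l.get(k), r.get(k)) for k in sorted(set(l) | set(r))]


positional = merge_by_position(file1, file2)
keyed = merge_by_key(file1, file2)

# rows where positional matching combined data of two different units
mismatches = [row for row in positional if row[0] != row[1]]

print(len(mismatches), "of", len(positional), "positional pairs are wrong")
for row in keyed:
    print(row)
```

In this toy case every positional pair combines two different units, while matching on the label keeps unit A separate and aligns the other four correctly.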
Yes, I'm aware that I can get the desired result using 'join', that's
not the point.
The point is about the following claim from the 'append' command
reference: "In the case of adding series, compatibility requires either
(a) that the number of observations for the new data equals that for the
current data, or (b) that the new data carries clear observation
information so that gretl can work out how to place the values." -- In
my example, condition (a) is clearly not met, and I'd argue that either
condition (b) is violated as well, or the panel index variable
information is simply ignored by 'append'. Perhaps 'append' should
refuse to proceed in my example case, or it should ask the user, or it
should make use of the panel index variables, but arguably it should not
rely on the arbitrary internal unit numbers.
I hope that the problem description was clear enough!
I'm curious what your reactions and answers will be. Thanks,
sven